Word2vec Guessing Game

Welcome to the guessing game assignment. Think of a target word and enter "clue" words below that describe it. The word2vec model will return the 10 words closest to the semantic mean of your clues. See if you can make the computer guess your target word.
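The text does not say how "closest" is measured. Word2vec neighbours are commonly ranked by cosine similarity, so the sketch below assumes that. It uses a hypothetical four-word vocabulary with made-up 3-dimensional vectors instead of the real Finnish vectors, and the names `VECTORS`, `cosine` and `nearest` are illustrative only.

```python
import math

# Hypothetical toy vocabulary with made-up 3-d vectors; the real game uses
# pretrained Finnish word2vec vectors with many more dimensions.
VECTORS = {
    "koira": [0.9, 0.1, 0.0],  # dog
    "kissa": [0.8, 0.3, 0.0],  # cat
    "susi": [0.7, 0.0, 0.6],   # wolf
    "auto": [0.0, 0.9, 0.1],   # car
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, k=10):
    """Return up to k vocabulary words, most similar to query first."""
    ranked = sorted(VECTORS, key=lambda w: cosine(query, VECTORS[w]), reverse=True)
    return ranked[:k]


print(nearest([1.0, 0.2, 0.1], k=2))  # ['koira', 'kissa']
```

With `k=10` and only four words in this toy vocabulary, `nearest` simply returns all four, ranked.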


Things the model can do for you

If you separate the words with spaces, commas, or + signs, the model takes the vector sum of the words' semantic representations. It then normalizes the sum to length 1. Because the sum and the mean of the same vectors point in the same direction, the normalized sum equals the normalized mean, so in effect the clues are averaged. Examples:

susi lemmikki
susi, lemmikki
susi + lemmikki
susi + lemmikki + häntää + haukkua
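A minimal sketch of the summing and normalizing described above, assuming the clue text is split with a regular expression on runs of whitespace, commas and + signs. The 2-dimensional vectors for susi (wolf) and lemmikki (pet) are made up, and `parse_clues`, `unit` and `clue_vector` are hypothetical names, not the game's actual code. The last lines check that normalizing the sum and normalizing the mean give the same unit vector.

```python
import math
import re

# Hypothetical 2-d toy vectors standing in for the real Finnish vectors.
VECTORS = {
    "susi": [3.0, 1.0],      # wolf
    "lemmikki": [1.0, 3.0],  # pet
}


def parse_clues(text):
    """Split on runs of whitespace, commas and '+' signs; drop empty pieces."""
    return [w for w in re.split(r"[\s,+]+", text) if w]


def unit(v):
    """Scale v to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def clue_vector(text):
    """Sum the clue vectors, then normalize the sum to length 1."""
    words = parse_clues(text)
    total = [sum(dims) for dims in zip(*(VECTORS[w] for w in words))]
    return unit(total)


# The three separator styles produce the same word list and query vector.
for text in ["susi lemmikki", "susi, lemmikki", "susi + lemmikki"]:
    print(parse_clues(text), clue_vector(text))

# Sum and mean point the same way, so their unit vectors agree (up to rounding).
words = parse_clues("susi lemmikki")
total = [sum(dims) for dims in zip(*(VECTORS[w] for w in words))]
mean = [x / len(words) for x in total]
print(unit(total), unit(mean))
```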

Separating words with a - sign makes the model take the vector difference of their semantic representations. On its own, a difference vector is not very informative. Combined with summation, however, it lets you start at the representation of one word and move in a chosen direction. For example, “king - man” computes the difference vector (i.e. the direction) from man to king, and “king - man + woman” adds this difference vector to the semantic representation of “woman”. In essence, this asks “a man is to a king as a woman is to a …?” Examples:

kuningas - mies + nainen
kuningatar - kuningas + mies
Taiwan - Kiina + Venäjä
jalka - polvi + kyynärpää
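The subtraction can be sketched the same way. This sketch assumes each - or + sign applies to the word right after it, that a word with no sign in front of it is added, and that the clue words themselves are left out of the results, as gensim's `most_similar` does; the text above does not say whether the game excludes them. The toy 2-dimensional vectors (one axis for royalty, one for male versus female) and the functions `signed_terms`, `cosine` and `analogy` are hypothetical. Real word2vec vectors only approximate this geometry, so the examples above will not always land exactly on the expected word.

```python
import math
import re

# Hypothetical 2-d toy vectors: axis 0 ~ royalty, axis 1 ~ male (+) vs female (-).
VECTORS = {
    "kuningas": [1.0, 1.0],     # king
    "kuningatar": [1.0, -1.0],  # queen
    "mies": [0.0, 1.0],         # man
    "nainen": [0.0, -1.0],      # woman
    "prinssi": [0.7, 1.0],      # prince
}


def signed_terms(text):
    """Turn 'a - b + c' into [(1, 'a'), (-1, 'b'), (1, 'c')].

    Assumption: '-' is always an operator, so hyphenated words would be split.
    """
    terms, sign = [], 1
    for tok in re.findall(r"[+-]|[^\s,+-]+", text):
        if tok == "-":
            sign = -1
        elif tok == "+":
            sign = 1
        else:
            terms.append((sign, tok))
            sign = 1  # a word with no sign in front of it is added
    return terms


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(text, k=1):
    """Add/subtract the clue vectors, then rank the other words by cosine."""
    terms = signed_terms(text)
    query = [0.0, 0.0]
    for sign, word in terms:
        query = [q + sign * x for q, x in zip(query, VECTORS[word])]
    # No explicit normalization: cosine similarity ignores the query's length.
    used = {word for _, word in terms}  # assumption: clue words are excluded
    candidates = [w for w in VECTORS if w not in used]
    ranked = sorted(candidates, key=lambda w: cosine(query, VECTORS[w]), reverse=True)
    return ranked[:k]


print(analogy("kuningas - mies + nainen"))  # ['kuningatar']
```

With these toy vectors, `analogy("kuningatar - kuningas + mies")` ranks nainen first. Skipping the explicit normalization step is safe here because cosine similarity does not depend on vector length.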

Where does the data come from?

The data was collected by the Turku NLP group. The main publication describing the dataset is:

[1] J. Luotolahti, J. Kanerva, V. Laippala, S. Pyysalo, and F. Ginter. Towards Universal Web Parsebanks. In Proceedings of the International Conference on Dependency Linguistics (Depling’15), 2015.

More information about the word2vec algorithm

[2] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.

[3] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.

[4] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.