
4.1 Vector Space Model (VSM)
Citation: Pro Deep Learning with TensorFlow
In NLP information-retrieval systems, a document is generally represented simply as a vector of the counts of the words it contains. To retrieve documents similar to a given document, either the cosine of the angle or the dot product between that document's vector and the vectors of the other documents is computed. The cosine of the angle between two vectors measures how similar their compositions are, that is, how closely their directions agree regardless of their lengths. To illustrate this fact, let us look at two vectors x, y ∈
\begin{align}
\phi(x,y) &= \phi \left(\sum_{i=1}^n x_i e_i, \sum_{j=1}^n y_j e_j \right)
= \sum_{i=1}^n \sum_{j=1}^n x_i y_j \phi(e_i, e_j) \\
&= (x_1, \ldots, x_n)
\begin{pmatrix}
\phi(e_1,e_1) & \cdots & \phi(e_1,e_n) \\
\vdots & \ddots & \vdots \\
\phi(e_n,e_1) & \cdots & \phi(e_n,e_n)
\end{pmatrix}
\begin{pmatrix}
y_1 \\ \vdots \\ y_n
\end{pmatrix}
\end{align}
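
As a minimal sketch of the ideas above, the following Python snippet builds word-count vectors for a few toy documents and compares them with the cosine of the angle. It also evaluates the bilinear form φ(x, y) as a row vector times a matrix times a column vector, taking the matrix of values φ(e_i, e_j) to be the identity, which is the case in which φ reduces to the ordinary dot product. The helper names (`count_vector`, `cosine_similarity`, `bilinear_form`), the whitespace tokenization, and the example documents are illustrative assumptions, not taken from the cited book.

```python
import math
from collections import Counter


def count_vector(text, vocabulary):
    """Return the word counts of `text`, ordered by `vocabulary` (hypothetical helper)."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return [counts[word] for word in vocabulary]


def dot(x, y):
    """Ordinary dot product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; defined here as 0.0 if either vector is zero."""
    norm_x = math.sqrt(dot(x, x))
    norm_y = math.sqrt(dot(y, y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot(x, y) / (norm_x * norm_y)


def bilinear_form(x, gram, y):
    """Evaluate sum_i sum_j x_i * y_j * gram[i][j], where gram[i][j] stands for phi(e_i, e_j)."""
    return sum(
        x[i] * gram[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


# Toy corpus (made up for illustration): the second document repeats the first twice.
docs = [
    "the cat sat on the mat",
    "the cat sat on the mat the cat sat on the mat",
    "stock prices fell sharply",
]
vocabulary = sorted({word for d in docs for word in d.lower().split()})
vectors = [count_vector(d, vocabulary) for d in docs]

# Same word proportions, different length: the cosine is (numerically) 1.
sim_same = cosine_similarity(vectors[0], vectors[1])
# No shared words: the vectors are orthogonal, so the cosine is 0.
sim_diff = cosine_similarity(vectors[0], vectors[2])

# With phi(e_i, e_j) equal to the identity matrix, the bilinear form is the dot product.
n = len(vocabulary)
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
phi_xy = bilinear_form(vectors[0], identity, vectors[1])

print(sim_same, sim_diff, phi_xy)
```

In this sketch the dot product between the first two documents grows with document length, while the cosine does not change, which is one reason the cosine is often preferred when documents differ greatly in size.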