Text mining - Linking related topics (IR)
How can I link terms (keywords, entities) that are related to each other across text documents? An example is Google: when you search for a person, it shows recommendations of other people related to that person.
In the picture, I figured out that the relations are spouse, presidential candidate, and equal designation.
I am using a frequency count technique: the more often two terms occur in the same document, the higher the chance that they are related. However, this also links unrelated terms such as page marks, verbs, and page references in the text document.
How should I improve this? Is there another easy, reliable technique?
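For reference, the frequency count technique described in the question can be sketched roughly as follows. This is only an illustration: the function name `cooccurrence_counts`, the whitespace tokenisation, and the sample documents are assumptions, not the asker's actual code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """For every pair of terms, count the documents that contain both."""
    counts = Counter()
    for text in documents:
        # Naive tokenisation (an assumption): lowercase, split on whitespace.
        # Using a set means each pair is counted at most once per document.
        terms = sorted(set(text.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Hypothetical sample documents.
docs = [
    "obama married michelle",
    "obama met clinton",
    "michelle and obama visited",
]
pairs = cooccurrence_counts(docs)

# Pairs with the highest counts are the strongest candidate relations.
top_pairs = pairs.most_common(3)
```

Note that filler words such as "and" or "met" are paired with every name they share a document with, which is the noise problem the question describes.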
You should try a few techniques:
1.) Stop word filtering: it is common in text mining to filter out words that are typically not important but too frequent, such as "the", "a", "is", "on". There are predefined dictionaries for this.
2.) TF/IDF: TF/IDF re-weights words by how well they separate documents.
3.) Named entity recognition: for the task at hand, it might be sufficient to focus on names. Named entity recognition can extract names from documents.
4.) Latent Dirichlet allocation: LDA finds concepts in documents. A concept is a set of words that frequently appear together.
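The first two techniques in the list can be combined in a few lines of plain Python. The sketch below is a minimal illustration under stated assumptions: the `STOP_WORDS` set is a tiny hand-made stand-in for a predefined dictionary, tokenisation is a simple whitespace split, the TF/IDF variant is the basic `tf * log(N / df)` form, and the sample documents are made up.

```python
import math
from collections import Counter

# Tiny hypothetical stop word list; real dictionaries are much larger.
STOP_WORDS = {"the", "a", "is", "on", "and", "of", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    docs_tokens = [tokenize(d) for d in documents]
    n_docs = len(docs_tokens)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))

    weights = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical sample documents; "page" appears in every one of them.
docs = [
    "the page is on obama and michelle",
    "the page is on clinton",
    "obama met clinton on the page",
]
weights = tf_idf(docs)
```

In this sketch a term that occurs in every document, such as "page", gets a weight of zero, so it can be dropped before counting co-occurrences. Named entity recognition and LDA need a dedicated library and are not shown here.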