ReAKKT: First steps in text mining with R
Everyone is preparing for Christmas Eve dinner. No one is calling and there is little email. Looks like a perfect time to start researching text mining in R :)
The problems I'm trying to solve:
- extract keywords from multiple texts
- summarize texts > sentence extraction
- group and relate products based on their descriptions > classification / clustering
- add relevant information to a text based on similar / related texts
I've started with the tm package:
- http://text-analysis.googlecode.com/files/Text_Mining_Infrastructure_in_R.pdf
- http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf - introduction to tm package
- http://cran.r-project.org/doc/Rnews/Rnews_2008-2.pdf - introduction to text mining in R
- http://epub.wu.ac.at/1923/1/document.pdf - text mining in R and its applications
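A rough sketch of what first steps with tm could look like for two of the problems above: keyword candidates and grouping products by description. The product descriptions are invented for illustration, plain term frequency is only a crude stand-in for real keyword extraction, and hierarchical clustering with `k = 2` is just one possible choice, not something prescribed by the tm documentation.

```r
library(tm)

# Hypothetical product descriptions, made up for this example
docs <- c(
  "Red cotton shirt with short sleeves",
  "Blue cotton shirt with long sleeves",
  "Stainless steel kitchen knife",
  "Ceramic kitchen knife with steel handle"
)

# Build a corpus and apply some common cleaning steps
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Rows are documents, columns are terms, cells are term counts
dtm <- DocumentTermMatrix(corpus)

# Crude keyword candidates: terms that occur at least twice overall
frequent_terms <- findFreqTerms(dtm, lowfreq = 2)
print(frequent_terms)

# Group documents by the similarity of their term vectors
# (Euclidean distance and k = 2 are assumptions for this toy data)
m <- as.matrix(dtm)
clusters <- cutree(hclust(dist(m)), k = 2)
print(clusters)
```

On real descriptions a TF-IDF weighting (`weightTfIdf` in tm) and a cosine-style distance would probably be a better starting point than raw counts.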
Then I jumped to the TextRank algorithm for keyword and sentence extraction. TextRank does not seem to be present in tm, but Java source code is available, so it should be possible to call it from R.
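To get a feel for the idea before wiring up the Java code, here is a minimal TextRank-style keyword ranking in base R. It is a simplified sketch, not the Java implementation mentioned above: it links words that co-occur within a small window and runs a PageRank-style iteration over that graph. The function name `textrank_keywords`, its parameters, the stop word list, and the sample text are all made up for illustration, and the part-of-speech filtering used in the original TextRank paper is skipped.

```r
# Minimal TextRank-style keyword ranking (illustrative sketch only)
textrank_keywords <- function(text, window = 2, damping = 0.85,
                              iterations = 50, stop_words = character()) {
  # Naive tokenization: lowercase, split on anything that is not a letter
  tokens <- tolower(unlist(strsplit(text, "[^A-Za-z]+")))
  tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]
  vocab <- unique(tokens)
  n <- length(vocab)
  idx <- match(tokens, vocab)

  # Undirected co-occurrence graph: connect words at most `window` tokens apart
  adj <- matrix(0, n, n, dimnames = list(vocab, vocab))
  for (i in seq_along(idx)) {
    for (offset in seq_len(window)) {
      j <- i + offset
      if (j <= length(idx) && idx[i] != idx[j]) {
        adj[idx[i], idx[j]] <- 1
        adj[idx[j], idx[i]] <- 1
      }
    }
  }

  # PageRank-style update: each word shares its score equally among neighbours
  degree <- rowSums(adj)
  degree[degree == 0] <- 1  # avoid division by zero for isolated words
  scores <- rep(1, n)
  for (k in seq_len(iterations)) {
    scores <- (1 - damping) + damping * as.vector(adj %*% (scores / degree))
  }
  names(scores) <- vocab
  sort(scores, decreasing = TRUE)
}

# Hypothetical input text and stop word list
text <- paste(
  "Text mining in R starts with the tm package.",
  "The tm package builds a corpus, and text mining turns the corpus into keywords."
)
stop_words <- c("in", "with", "the", "a", "and", "into")

scores <- textrank_keywords(text, stop_words = stop_words)
print(head(scores, 3))
```

The same graph-ranking step can be applied to sentences instead of words (with edges weighted by sentence similarity), which is how TextRank is used for sentence extraction.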
I will need to compare TextRank to KEA. The latter is implemented for R in the RKEA package.
Looks promising.