http://www.tuicool.com/articles/r2QJVr
http://so.searchtech.pro/articles/2013/06/16/1371392427213.html
What I believe to be the best combination is: a map-reduce implementation such as Apache Hadoop, GridGain, or JPPF (for processing large datasets) + JDMP for data mining + a NoSQL database for query and retrieval (Neo4j, Bigtable, etc.). The exact use case is still not clear, though ;-)
Also see this question for more details: Do you know batch log processing tools for hadoop (zohmg alternatives)?
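To make the map-reduce part of that suggestion concrete, here is a minimal single-process sketch in Python of how a log-counting job breaks into map, shuffle, and reduce steps. It is not Hadoop, GridGain, or JPPF code. The log line format and every name in it (SAMPLE_LOGS, map_line, shuffle, reduce_counts, count_levels) are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample data. The "<date> <time> <LEVEL> <message>" layout
# is an assumption, not a format taken from any particular tool.
SAMPLE_LOGS = [
    "2013-06-16 10:00:01 ERROR disk full",
    "2013-06-16 10:00:02 INFO job started",
    "2013-06-16 10:00:03 ERROR disk full",
    "2013-06-16 10:00:04 WARN slow response",
]


def map_line(line):
    """Map step: emit (level, 1) for each line that has a level field."""
    parts = line.split(None, 3)
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group values by key, which a real framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def count_levels(lines):
    """Run map, shuffle, and reduce in-process over an iterable of log lines."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(count_levels(SAMPLE_LOGS))
```

In a real cluster the map and reduce functions would run on different nodes and the framework would handle the shuffle. The per-key totals are the kind of result that could then be stored in the NoSQL database for querying.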
http://stackoverflow.com/questions/154982/what-is-the-best-log-analysis-tool-that-you-used
http://cuddletech.com/?p=795
http://blog.csdn.net/chinalinuxzend/article/category/265273