  • [Javascript] Identify the most important words in a document using tf-idf in Natural

    Tf-idf, or term frequency-inverse document frequency, is a statistic that indicates how important a word is to one document within a collection of documents. A word scores high when it appears often in that document but in few of the others. This lesson explains term frequency and inverse document frequency, and shows how to use tf-idf with the Natural library to identify the most relevant words in a body of text.
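
    To make the two ingredients concrete, here is a minimal hand-rolled sketch that does not use Natural. It assumes the textbook weighting, tf multiplied by log(N / df), and a deliberately naive tokenizer. The helper names (tokenize, tf, idf, tfidfScore) are made up for this illustration, and Natural's own weighting may differ in detail, so these numbers need not match the library's output.

    ```javascript
    // Split text into lowercase word tokens (naive: letters and digits only).
    function tokenize(text) {
        return text.toLowerCase().match(/[a-z0-9]+/g) || [];
    }

    // Term frequency: how many times `term` occurs in one document's tokens.
    function tf(term, tokens) {
        return tokens.filter(function (t) { return t === term; }).length;
    }

    // Inverse document frequency: log(N / df), where N is the number of
    // documents and df is the number of documents containing `term`.
    // Returns 0 when no document contains the term.
    function idf(term, docs) {
        var df = docs.filter(function (tokens) {
            return tokens.indexOf(term) !== -1;
        }).length;
        return df === 0 ? 0 : Math.log(docs.length / df);
    }

    // tf-idf of `term` for one document, relative to the whole collection.
    function tfidfScore(term, tokens, docs) {
        return tf(term, tokens) * idf(term, docs);
    }

    var docs = [
        'this document is about node.',
        'this document is about ruby.',
        'this document is about ruby and node.'
    ].map(tokenize);

    docs.forEach(function (tokens, i) {
        console.log('document #' + i +
            ' node: ' + tfidfScore('node', tokens, docs).toFixed(3) +
            ', document: ' + tfidfScore('document', tokens, docs).toFixed(3));
    });
    ```

    Under this weighting, a word that occurs in every document (such as "document" here) scores zero everywhere, because log(N / N) is 0. A word found in only some documents gets a positive score in the documents that contain it.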

    Find the tf-idf of specific words for each of the given documents:

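    var natural = require('natural');
    var TfIdf = natural.TfIdf;
    var tfidf = new TfIdf();
    
    // Each call adds one document to the collection; indexes start at 0.
    tfidf.addDocument('this document is about node.');
    tfidf.addDocument('this document is about ruby.');
    tfidf.addDocument('this document is about ruby and node.');
    
    // Score the terms "node" and "ruby" against every document.
    // The callback receives the document index and the combined measure.
    tfidf.tfidfs('node ruby', function(i, measure) {
        console.log('document #' + i + ' is ' + measure);
    });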
    
    /*
    document #0 is 1
    document #1 is 1
    document #2 is 2
    */

    List the most important words in a document:

    tfidf.listTerms(0 /*document index*/).forEach(function(item) {
        console.log(item.term + ': ' + item.tfidf);
    });
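
    The loop above prints every term in the document. When only the few highest-scoring words are wanted, a small helper can pick them out. The sketch below is not part of Natural: topTerms is a hypothetical name, it only assumes items shaped like { term, tfidf } as used above, and the sample scores are invented for illustration.

    ```javascript
    // Return the n highest-scoring items without mutating the input array.
    function topTerms(items, n) {
        return items
            .slice() // copy first, because sort() works in place
            .sort(function (a, b) { return b.tfidf - a.tfidf; })
            .slice(0, n);
    }

    // Hypothetical sample data in the { term, tfidf } shape used above;
    // these scores are made up, not real library output.
    var sampleItems = [
        { term: 'about', tfidf: 0.2 },
        { term: 'node', tfidf: 1.5 },
        { term: 'document', tfidf: 0.1 }
    ];

    topTerms(sampleItems, 2).forEach(function (item) {
        console.log(item.term + ': ' + item.tfidf);
    });
    ```

    With a real TfIdf instance, the result of tfidf.listTerms(0) could be passed in place of sampleItems. Sorting inside the helper means it does not rely on the list arriving in any particular order.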
  • Original article: https://www.cnblogs.com/Answer1215/p/7624434.html