zoukankan      html  css  js  c++  java
  • 【第一课】kaggle初识

    Evernote Export

    Crowdflower搜索结果相关性

    文件和数据描述
    train.csv训练数据集包括

    • id:产品ID查询:使用的搜索词
    • product_description:完整的产品说明以及HTML格式标记
    • median_relevance:3个评分者的中位数相关性得分。该值是1到4之间的整数。
    • relevant_variance:评分者给出的相关性分数的变化。

    测试集 test.csv

    • id:产品ID查询:使用的搜索词
    • product_description:完整的产品说明以及HTML格式标记
      sampleSubmission.csv
    • 格式正确的示例提交文件允许使用外部数据,例如词典,词库,语言语料库。但是,它们不得与此特定数据集直接相关。必须将您的外部数据来源发布到论坛,以确保社区中所有参与者的公平性。
    packagemodelmodel_selectfeatureweighting
    XGBoost gblinear MSE High/Low Yes
    XGBoost gblinear COCR High/Low Yes
    XGBoost gblinear Softmax High/Low Yes
    XGBoost gblinear Softkappa High/Low Yes
    XGBoost gbtree MSE Low Yes
    XGBoost gbtree COCR Low Yes
    XGBoost gbtree Softmax Low Yes
    XGBoost gbtree Softkappa Low Yes
    Sklearn GradientBoostingRegressor   Low Yes
    Sklearn ExtraTreeRegressor   Low Yes
    Sklearn RandomForestRegressor   Low Yes
    Sklearn SVR   Low Yes
    Sklearn Ridge   High/Low No
    Sklearn Lasso   High/Low No
    Sklearn LogisticRegression   High/Low No
    Keras NN Regression   Low No
    RGF Regression   Low No

    集成学习

    **集成学习:**是目前机器学习的一大热门方向,所谓集成学习简单理解就是指采用多个分类器对数据进行预测,从而提高整体分类器的泛化能力
    三种常见框架:bagging、boosting、stacking
    bagging:决定用某一种类型的分类器的时候,通过抽样的方法抽样出不同的子训练集(自助抽样)
    boosting:选择基模型数据集,由基模型(弱模型)等根据权重的方式集成为强模型
    stacking:堆叠集成学习方式,底层基模型不断训练给上层的模型进行预测

    集成模型的选择

    Bias 方差与偏差
    岭回归是有偏的,但是方差结果显示更好
    bagging公式

    E(F)=γimE(fi)=σ2ρ+mσ2(1ρ)

    boosting的偏差与方差

    E(F)=γimE(fi)=m2γ2σ2

    支持向量机回归(SVR)

    数据预处理的步骤

    1.剔除HTML标签

    • 通过bs4库提取HTML中的文本信息
      2.单词替换
    • 拼写错误修正
    • 同义词替换
    • 其他单词替换
      3.词干化

    特征提取

    1.词频数目统计

    • 词出现次数
      2.距离特征统计
    • 分词后之间的距离,查询关键词和产品描述之间的距离,分组距离、统计量等
      3.术语频率和逆文档频率统计
    • tf-idf 自然语言处理的方面应用的词向量
      4.id统计
    • 查询id热编码操作
    • query的独热编码 独热编码
      独热编码:即 One-Hot编码,又称为一位有效编码,其方法是使用N位状态寄存器来对N个状态进行编码,每个状态都由他独立的寄存器位,并且在任意时候,其中只有一位有效

    自然状态码:000,001,010,011,100,101
    独热编码:000001,000010,000100,001000,010000,100000
    距离特征:Jaccard coeffcient JaccardCoef(A,B)=ABAB
    Dice distance DiceDist(A,B)=A+B2AB
    基本距离特征:

    • D(ngram(qi,n),ngram(ti,n))
    • D(ngram(qi,n),ngram(di,n))
    • D(ngram(ti,n),ngram(ti,n))

    距离特征

    • 统计距离特征

    • 1.根据查询或者其他中位数等进行分组

    • Gr=iri=r

    • Gq,r=iqi=q,ri=r
      其中qϵqirϵ1,2,3,4

    • 2.对于每一个样本计算一堆距离

    • Si,r,n=D(ngram(ti,n),ngram(tj,n)jϵGr,j̸=i)

    • SQi,r,n=D(ngram(ti,n),ngram(tj,n)jϵGq,r,j̸=i)
      其中 rϵ1,2,3,4D(,)ϵJaccardCoef(,),DiceDist(,)

    • 3.对于Si,r,nSQi,r,n来说需要计算的值有

    • 最小值

    • 中位数(2分位)

    • 最大值

    • 平均值

    • 标准差

    • 其他评估标准

    TF-IDF特征

    • 基本TF-IDF特征
      • TF-IDF
    %23%23%23%20Crowdflower%E6%90%9C%E7%B4%A2%E7%BB%93%E6%9E%9C%E7%9B%B8%E5%85%B3%E6%80%A7%0A%0A%3E%0A**%E6%96%87%E4%BB%B6%E5%92%8C%E6%95%B0%E6%8D%AE%E6%8F%8F%E8%BF%B0**%0A**train.csv%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE%E9%9B%86%E5%8C%85%E6%8B%AC**%EF%BC%9A%0A-%20id%EF%BC%9A%E4%BA%A7%E5%93%81ID%E6%9F%A5%E8%AF%A2%EF%BC%9A%E4%BD%BF%E7%94%A8%E7%9A%84%E6%90%9C%E7%B4%A2%E8%AF%8D%0A-%20product_description%EF%BC%9A%E5%AE%8C%E6%95%B4%E7%9A%84%E4%BA%A7%E5%93%81%E8%AF%B4%E6%98%8E%E4%BB%A5%E5%8F%8AHTML%E6%A0%BC%E5%BC%8F%E6%A0%87%E8%AE%B0%0A-%20median_relevance%EF%BC%9A3%E4%B8%AA%E8%AF%84%E5%88%86%E8%80%85%E7%9A%84%E4%B8%AD%E4%BD%8D%E6%95%B0%E7%9B%B8%E5%85%B3%E6%80%A7%E5%BE%97%E5%88%86%E3%80%82%E8%AF%A5%E5%80%BC%E6%98%AF1%E5%88%B04%E4%B9%8B%E9%97%B4%E7%9A%84%E6%95%B4%E6%95%B0%E3%80%82%0A-%20%C2%A0relevant_variance%EF%BC%9A%E8%AF%84%E5%88%86%E8%80%85%E7%BB%99%E5%87%BA%E7%9A%84%E7%9B%B8%E5%85%B3%E6%80%A7%E5%88%86%E6%95%B0%E7%9A%84%E5%8F%98%E5%8C%96%E3%80%82%0A%0A**%E6%B5%8B%E8%AF%95%E9%9B%86%C2%A0test.csv**%0A%20-%20id%EF%BC%9A%E4%BA%A7%E5%93%81ID%E6%9F%A5%E8%AF%A2%EF%BC%9A%E4%BD%BF%E7%94%A8%E7%9A%84%E6%90%9C%E7%B4%A2%E8%AF%8D%0A%20-%20product_description%EF%BC%9A%E5%AE%8C%E6%95%B4%E7%9A%84%E4%BA%A7%E5%93%81%E8%AF%B4%E6%98%8E%E4%BB%A5%E5%8F%8AHTML%E6%A0%BC%E5%BC%8F%E6%A0%87%E8%AE%B0%0A**sampleSubmission.csv**%0A%20-%20%E6%A0%BC%E5%BC%8F%E6%AD%A3%E7%A1%AE%E7%9A%84%E7%A4%BA%E4%BE%8B%E6%8F%90%E4%BA%A4%E6%96%87%E4%BB%B6%E5%85%81%E8%AE%B8%E4%BD%BF%E7%94%A8%E5%A4%96%E9%83%A8%E6%95%B0%E6%8D%AE%EF%BC%8C%E4%BE%8B%E5%A6%82%E8%AF%8D%E5%85%B8%EF%BC%8C%E8%AF%8D%E5%BA%93%EF%BC%8C%E8%AF%AD%E8%A8%80%E8%AF%AD%E6%96%99%E5%BA%93%E3%80%82%E4%BD%86%E6%98%AF%EF%BC%8C%E5%AE%83%E4%BB%AC%E4%B8%8D%E5%BE%97%E4%B8%8E%E6%AD%A4%E7%89%B9%E5%AE%9A%E6%95%B0%E6%8D%AE%E9%9B%86%E7%9B%B4%E6%8E%A5%E7%9B%B8%E5%85%B3%E3%80%82%E5%BF%85%E9%A1%BB%E5%B0%86%E6%82%A8%E7%9A%84%E5%A4%96%E9%83%A8%E6%95%B0%E6%8D%AE%E6%9D%A5%E6%BA%90%E5%8F%91%E5%B8%83%E5%88%B0%E8%AE%BA%E5%9D%9B%EF%BC%8C%E4%BB%A5%E7%A1%AE%E4%BF%9D%E7%A4%BE%E5%8C%BA%E4%B8%AD%E6%89%80%E6%9C%89%E5%8F%82%E4%B8%8E%E8%80%85%E7%9A%84%E5%85%AC%E5%B9%B3%E6%80%A7%E3%80%82%0A%20!%5B5112a781d48ea385babd833bfcdde1cd.png%5D(en-resource%3A%2F%2Fdatabase%2F1342%3A1)%0A%20%0A%0A%7C%20**package**%20%7C%20**model**%20%7C%20**model_select**%20%7C%20**feature**%20%7C**weighting**%20%7C%0A%7C%20%3A---%3A%20%7C%20%3A---%3A%20%7C%20%3A---%3A%20%7C%20%3A---%3A%20%7C%20%3A---%3A%20%7C%0A%7C%20*XGBoost*%20%7C%20gblinear%20%7C%20MSE%20%7C%20High%2FLow%20%7C%20Yes%20%7C%0A%7C%20*XGBoost*%20%7C%20gblinear%20%7C%20COCR%20%7C%20%20High%2FLow%20%7C%20Yes%20%20%7C%0A%7C*XGBoost*%20%7C%20gblinear%7C%20Softmax%20%7C%20High%2FLow%20%7C%20Yes%20%20%7C%0A%7C*XGBoost*%20%7C%20gblinear%7CSoftkappa%20%20%7C%20High%2FLow%20%7C%20Yes%20%20%7C%0A%7C%20*XGBoost*%20%7C%20gbtree%20%7C%20MSE%20%7C%20Low%20%7C%20Yes%20%7C%0A%7C*XGBoost*%20%7C%20gbtree%7C%20%20COCR%20%7C%20Low%7CYes%20%20%7C%0A%7C%20*XGBoost*%20%7C%20gbtree%20%20%7C%20Softmax%20%7C%20Low%20%7CYes%20%20%7C%0A%7C%20*XGBoost*%20%7C%20gbtree%20%7C%20Softkappa%20%7C%20Low%20%7C%20Yes%20%7C%0A%7C%20*Sklearn*%20%7C%20GradientBoostingRegressor%7C%20%7C%20Low%20%7C%20Yes%20%20%20%7C%0A%7C%20*Sklearn*%20%7C%20ExtraTreeRegressor%20%7C%20%20%7C%20%20Low%20%7C%20Yes%20%20%20%7C%0A%7C%20*Sklearn*%20%7C%20RandomForestRegressor%20%7C%20%20%7C%20%20Low%20%7C%20Yes%20%20%20%7C%0A%7C%20*Sklearn*%20%7C%20SVR%20%7C%20%20%7C%20%20Low%20%7C%20Yes%20%20%20%7C%0A%7C%20*Sklearn*%20%7C%20Ridge%20%7C%20%20%7C%20%20%20High%2FLow%20%7C%20No%20%20%7C%0A%7C%20*Sklearn*%20%7C%20Lasso%20%7C%20%20%7C%20%20High%2FLow%20%7C%20No%20%20%7C%0A%7C%20*Sklearn*%20%7CLogisticRegression%20%20%7C%20%20%20%7C%20High%2FLow%20%7C%20No%20%20%7C%0A%7C%20*Keras*%20%7C%20NN%20Regression%20%7C%20%20%7C%20Low%20%7C%20No%20%7C%0A%7C%20*RGF*%20%7C%20Regression%20%7C%20%20%7C%20Low%20%7CNo%7C%0A%0A%23%23%23%23%20%E9%9B%86%E6%88%90%E5%AD%A6%E4%B9%A0%0A**%E9%9B%86%E6%88%90%E5%AD%A6%E4%B9%A0%EF%BC%9A**%E6%98%AF%E7%9B%AE%E5%89%8D%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E7%9A%84%E4%B8%80%E5%A4%A7%E7%83%AD%E9%97%A8%E6%96%B9%E5%90%91%EF%BC%8C%E6%89%80%E8%B0%93%E9%9B%86%E6%88%90%E5%AD%A6%E4%B9%A0%E7%AE%80%E5%8D%95%E7%90%86%E8%A7%A3%E5%B0%B1%E6%98%AF%E6%8C%87%E9%87%87%E7%94%A8%E5%A4%9A%E4%B8%AA%E5%88%86%E7%B1%BB%E5%99%A8%E5%AF%B9%E6%95%B0%E6%8D%AE%E8%BF%9B%E8%A1%8C%E9%A2%84%E6%B5%8B%EF%BC%8C%E4%BB%8E%E8%80%8C%E6%8F%90%E9%AB%98%E6%95%B4%E4%BD%93%E5%88%86%E7%B1%BB%E5%99%A8%E7%9A%84%E6%B3%9B%E5%8C%96%E8%83%BD%E5%8A%9B%0A%E4%B8%89%E7%A7%8D%E5%B8%B8%E8%A7%81%E6%A1%86%E6%9E%B6%EF%BC%9Abagging%E3%80%81boosting%E3%80%81stacking%0Abagging%EF%BC%9A%E5%86%B3%E5%AE%9A%E7%94%A8%E6%9F%90%E4%B8%80%E7%A7%8D%E7%B1%BB%E5%9E%8B%E7%9A%84%E5%88%86%E7%B1%BB%E5%99%A8%E7%9A%84%E6%97%B6%E5%80%99%EF%BC%8C%E9%80%9A%E8%BF%87%E6%8A%BD%E6%A0%B7%E7%9A%84%E6%96%B9%E6%B3%95%E6%8A%BD%E6%A0%B7%E5%87%BA%E4%B8%8D%E5%90%8C%E7%9A%84%E5%AD%90%E8%AE%AD%E7%BB%83%E9%9B%86(%E8%87%AA%E5%8A%A9%E6%8A%BD%E6%A0%B7)%0Aboosting%EF%BC%9A%E9%80%89%E6%8B%A9%E5%9F%BA%E6%A8%A1%E5%9E%8B%E6%95%B0%E6%8D%AE%E9%9B%86%EF%BC%8C%E7%94%B1%E5%9F%BA%E6%A8%A1%E5%9E%8B(%E5%BC%B1%E6%A8%A1%E5%9E%8B)%E7%AD%89%E6%A0%B9%E6%8D%AE%E6%9D%83%E9%87%8D%E7%9A%84%E6%96%B9%E5%BC%8F%E9%9B%86%E6%88%90%E4%B8%BA%E5%BC%BA%E6%A8%A1%E5%9E%8B%0Astacking%EF%BC%9A%E5%A0%86%E5%8F%A0%E9%9B%86%E6%88%90%E5%AD%A6%E4%B9%A0%E6%96%B9%E5%BC%8F%EF%BC%8C%E5%BA%95%E5%B1%82%E5%9F%BA%E6%A8%A1%E5%9E%8B%E4%B8%8D%E6%96%AD%E8%AE%AD%E7%BB%83%E7%BB%99%E4%B8%8A%E5%B1%82%E7%9A%84%E6%A8%A1%E5%9E%8B%E8%BF%9B%E8%A1%8C%E9%A2%84%E6%B5%8B%0A%23%23%23%23%20%E9%9B%86%E6%88%90%E6%A8%A1%E5%9E%8B%E7%9A%84%E9%80%89%E6%8B%A9%0ABias%20%E6%96%B9%E5%B7%AE%E4%B8%8E%E5%81%8F%E5%B7%AE%0A%E5%B2%AD%E5%9B%9E%E5%BD%92%E6%98%AF%E6%9C%89%E5%81%8F%E7%9A%84%EF%BC%8C%E4%BD%86%E6%98%AF%E6%96%B9%E5%B7%AE%E7%BB%93%E6%9E%9C%E6%98%BE%E7%A4%BA%E6%9B%B4%E5%A5%BD%0Abagging%E5%85%AC%E5%BC%8F%0A%24%24E(F)%20%3D%20%5Cgamma%20%5Ccdot%20%5Csum%5Em_i%20E(f_i)%20%3D%20%5Csigma%5E2*%5Crho%2B%5Cfrac%7B%5Csigma%5E2*(1-%5Crho)%7D%7Bm%7D%24%24%0Aboosting%E7%9A%84%E5%81%8F%E5%B7%AE%E4%B8%8E%E6%96%B9%E5%B7%AE%0A%24%24E(F)%20%3D%20%5Cgamma%20%5Ccdot%20%5Csum%5Em_i%20E(f_i)%3Dm%5E2%20*%20%5Cgamma%5E2%20*%20%5Csigma%5E2%24%24%0A%23%23%23%23%20%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA%E5%9B%9E%E5%BD%92(SVR)%0A%0A%0A%23%23%23%23%20%E6%95%B0%E6%8D%AE%E9%A2%84%E5%A4%84%E7%90%86%E7%9A%84%E6%AD%A5%E9%AA%A4%0A**1.%E5%89%94%E9%99%A4HTML%E6%A0%87%E7%AD%BE**%0A-%20%E9%80%9A%E8%BF%87bs4%E5%BA%93%E6%8F%90%E5%8F%96HTML%E4%B8%AD%E7%9A%84%E6%96%87%E6%9C%AC%E4%BF%A1%E6%81%AF%0A**2.%E5%8D%95%E8%AF%8D%E6%9B%BF%E6%8D%A2**%0A-%20%E6%8B%BC%E5%86%99%E9%94%99%E8%AF%AF%E4%BF%AE%E6%AD%A3%0A-%20%E5%90%8C%E4%B9%89%E8%AF%8D%E6%9B%BF%E6%8D%A2%0A-%20%E5%85%B6%E4%BB%96%E5%8D%95%E8%AF%8D%E6%9B%BF%E6%8D%A2%0A**3.%E8%AF%8D%E5%B9%B2%E5%8C%96**%0A%0A%23%23%23%23%20%E7%89%B9%E5%BE%81%E6%8F%90%E5%8F%96%0A**1.%E8%AF%8D%E9%A2%91%E6%95%B0%E7%9B%AE%E7%BB%9F%E8%AE%A1**%0A-%20%E8%AF%8D%E5%87%BA%E7%8E%B0%E6%AC%A1%E6%95%B0%0A**2.%E8%B7%9D%E7%A6%BB%E7%89%B9%E5%BE%81%E7%BB%9F%E8%AE%A1**%0A-%20%E5%88%86%E8%AF%8D%E5%90%8E%E4%B9%8B%E9%97%B4%E7%9A%84%E8%B7%9D%E7%A6%BB%EF%BC%8C%E6%9F%A5%E8%AF%A2%E5%85%B3%E9%94%AE%E8%AF%8D%E5%92%8C%E4%BA%A7%E5%93%81%E6%8F%8F%E8%BF%B0%E4%B9%8B%E9%97%B4%E7%9A%84%E8%B7%9D%E7%A6%BB%EF%BC%8C%E5%88%86%E7%BB%84%E8%B7%9D%E7%A6%BB%E3%80%81%E7%BB%9F%E8%AE%A1%E9%87%8F%E7%AD%89%0A**3.%E6%9C%AF%E8%AF%AD%E9%A2%91%E7%8E%87%E5%92%8C%E9%80%86%E6%96%87%E6%A1%A3%E9%A2%91%E7%8E%87%E7%BB%9F%E8%AE%A1**%0A-%20tf-idf%20%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86%E7%9A%84%E6%96%B9%E9%9D%A2%E5%BA%94%E7%94%A8%E7%9A%84%E8%AF%8D%E5%90%91%E9%87%8F%0A**4.id%E7%BB%9F%E8%AE%A1**%0A-%20%E6%9F%A5%E8%AF%A2id%E7%83%AD%E7%BC%96%E7%A0%81%E6%93%8D%E4%BD%9C%0A-%20query%E7%9A%84%E7%8B%AC%E7%83%AD%E7%BC%96%E7%A0%81%20**%5B%E7%8B%AC%E7%83%AD%E7%BC%96%E7%A0%81%5D(https%3A%2F%2Fzhuanlan.zhihu.com%2Fp%2F35287916)**%0A%E7%8B%AC%E7%83%AD%E7%BC%96%E7%A0%81%EF%BC%9A%E5%8D%B3%20One-Hot%E7%BC%96%E7%A0%81%EF%BC%8C%E5%8F%88%E7%A7%B0%E4%B8%BA%E4%B8%80%E4%BD%8D%E6%9C%89%E6%95%88%E7%BC%96%E7%A0%81%EF%BC%8C%E5%85%B6%E6%96%B9%E6%B3%95%E6%98%AF%E4%BD%BF%E7%94%A8N%E4%BD%8D%E7%8A%B6%E6%80%81%E5%AF%84%E5%AD%98%E5%99%A8%E6%9D%A5%E5%AF%B9N%E4%B8%AA%E7%8A%B6%E6%80%81%E8%BF%9B%E8%A1%8C%E7%BC%96%E7%A0%81%EF%BC%8C%E6%AF%8F%E4%B8%AA%E7%8A%B6%E6%80%81%E9%83%BD%E7%94%B1%E4%BB%96%E7%8B%AC%E7%AB%8B%E7%9A%84%E5%AF%84%E5%AD%98%E5%99%A8%E4%BD%8D%EF%BC%8C%E5%B9%B6%E4%B8%94%E5%9C%A8%E4%BB%BB%E6%84%8F%E6%97%B6%E5%80%99%EF%BC%8C%E5%85%B6%E4%B8%AD%E5%8F%AA%E6%9C%89%E4%B8%80%E4%BD%8D%E6%9C%89%E6%95%88%0A%3E%E8%87%AA%E7%84%B6%E7%8A%B6%E6%80%81%E7%A0%81%EF%BC%9A000%2C001%2C010%2C011%2C100%2C101%0A%3E%E7%8B%AC%E7%83%AD%E7%BC%96%E7%A0%81%EF%BC%9A000001%2C000010%2C000100%2C001000%2C010000%2C100000%0A%3E%E8%B7%9D%E7%A6%BB%E7%89%B9%E5%BE%81%3AJaccard%20coeffcient%20%24JaccardCoef(A%2CB)%20%3D%20%5Cfrac%7B%7CA%20%5Cbigcap%20B%7C%7D%7B%7CA%20%5Cbigcup%20B%7C%7D%24%0A%3EDice%20distance%20%24DiceDist(A%2CB)%20%3D%20%5Cfrac%7B2%7CA%20%5Cbigcap%20B%7C%7D%7B%7CA%7C%2B%7CB%7C%7D%24%0A%3E%E5%9F%BA%E6%9C%AC%E8%B7%9D%E7%A6%BB%E7%89%B9%E5%BE%81%EF%BC%9A%0A%3E*%20%24D(ngram(q_i%2Cn)%2Cngram(t_i%2Cn))%24%0A%3E*%20%24D(ngram(q_i%2Cn)%2Cngram(d_i%2Cn))%24%0A%3E*%20%24D(ngram(t_i%2Cn)%2Cngram(t_i%2Cn))%24%0A%0A%23%23%23%23%20%E8%B7%9D%E7%A6%BB%E7%89%B9%E5%BE%81%0A-%20%E7%BB%9F%E8%AE%A1%E8%B7%9D%E7%A6%BB%E7%89%B9%E5%BE%81%0A-%201.%E6%A0%B9%E6%8D%AE%E6%9F%A5%E8%AF%A2%E6%88%96%E8%80%85%E5%85%B6%E4%BB%96%E4%B8%AD%E4%BD%8D%E6%95%B0%E7%AD%89%E8%BF%9B%E8%A1%8C%E5%88%86%E7%BB%84%0A-%20%24G_r%20%3D%20%7Bi%7Cr_i%3Dr%7D%24%0A-%20%24G_q%2Cr%20%3D%20%7Bi%7Cq_i%3Dq%2Cr_i%3Dr%7D%24%0A%E5%85%B6%E4%B8%AD%24q%20%5Cepsilon%20q_i%20r%20%5Cepsilon%20%7B1%2C2%2C3%2C4%7D%24%0A%0A-%202.%E5%AF%B9%E4%BA%8E%E6%AF%8F%E4%B8%80%E4%B8%AA%E6%A0%B7%E6%9C%AC%E8%AE%A1%E7%AE%97%E4%B8%80%E5%A0%86%E8%B7%9D%E7%A6%BB%0A-%20%24S_i%2Cr%2Cn%20%3D%20D(ngram(t_i%2Cn)%2Cngram(t_j%2Cn)%7Cj%20%5Cepsilon%20G_r%2Cj%20%5Cnot%3D%20i)%24%0A%20-%20%24SQ_i%2Cr%2Cn%20%3D%20D(ngram(t_i%2Cn)%2Cngram(t_j%2Cn)%7Cj%20%5Cepsilon%20G_q%2Cr%2Cj%20%5Cnot%3D%20i)%24%0A%20%E5%85%B6%E4%B8%AD%20%24r%20%5Cepsilon%201%2C2%2C3%2C4%20D(-%2C-)%20%5Cepsilon%20JaccardCoef(-%2C-)%2CDiceDist(-%2C-)%24%0A%20-%203.%E5%AF%B9%E4%BA%8E%24S_i%2Cr%2Cn%24%E5%92%8C%24SQ_i%2Cr%2Cn%24%E6%9D%A5%E8%AF%B4%E9%9C%80%E8%A6%81%E8%AE%A1%E7%AE%97%E7%9A%84%E5%80%BC%E6%9C%89%0A%20-%20%E6%9C%80%E5%B0%8F%E5%80%BC%0A%20-%20%E4%B8%AD%E4%BD%8D%E6%95%B0(2%E5%88%86%E4%BD%8D)%0A%20-%20%E6%9C%80%E5%A4%A7%E5%80%BC%0A%20-%20%E5%B9%B3%E5%9D%87%E5%80%BC%0A%20-%20%E6%A0%87%E5%87%86%E5%B7%AE%0A%20-%20%E5%85%B6%E4%BB%96%E8%AF%84%E4%BC%B0%E6%A0%87%E5%87%86%0A%20%0A%20%23%23%23%23%20TF-IDF%E7%89%B9%E5%BE%81%0A%20-%20%E5%9F%BA%E6%9C%ACTF-IDF%E7%89%B9%E5%BE%81%0A%20%20%20%20-%20TF-IDF%20Features%0A%20%20%20%20-%20Basic%20Cosine%20Similarity%0A%20%20%20%20-%20Statistical%20Cosine%20Similarity%0A%20%20%20%20-%20SVD%20Reduced%20Features%0A%20%20%20%20-%20Basic%20Cosine%20Similarity%20Based%20on%20SVD%20Reduced%20Features%0A%20%20%20%20-%20Statistical%20Cosine%20Similarity%20Based%20on%20SVD%20Reduced%20Features%0A%20%20%20%20%20
    Win a contest, win a challenge
  • 相关阅读:
    sgu175pascal
    poj3659pascal
    poj3411
    传球游戏pascal
    ubuntu 9.04 wubi安装的问题 2009511 13:58:00 (21ic)
    任何道理都是有局限的 20081020 11:07:00 (21ic)
    浓浓的亲情 2008106 8:53:00 (21ic)
    卖买房所得
    黑色的一周 20081018 3:23:00 (21ic)
    度的把握 2008929 20:59:00
  • 原文地址:https://www.cnblogs.com/pandaboy1123/p/10365235.html
Copyright © 2011-2022 走看看