zoukankan      html  css  js  c++  java
  • TextBlob Quick Start

    安装

    pip install textblob
    
    import nltk
    pip install nltk
    nltk.download('punkt')   # 安装一些语料库, 国内安装有问题
    nltk.download('averaged_perceptron_tagger')

    基本操作

    情感分析

    该sentiment属性返回形式的namedtuple 。极性在[-1.0,1.0]范围内浮动。主观性是在[0.0,1.0]范围内的浮动,其中0.0是非常客观的,而1.0是非常主观的。Sentiment(polarity, subjectivity)

    text = "I am happy today. I feel sad today."
    
    from textblob import TextBlob
    blob = TextBlob(text)
    blob.sentences   # 能够分句
    
    print(blob.sentences[0].sentiment)
    print(blob.sentences[1].sentiment)
    print(blob.sentiment)
    # Sentiment(polarity=0.15000000000000002, subjectivity=1.0)

    拼写纠正

    >>> b = TextBlob("I havv goood speling!")
    >>> print(b.correct())
    I have good spelling!

    返回的貌似是字符数组,最好用str转成字符串.

    原理见 How to Write a Spelling Corrector

    例子:amazon评论情感分析

    1. 获取原始评论数据

    import pandas as pd
    
    df=pd.read_csv('hair_dryer.tsv', sep='	', usecols=['review_body','review_date'])
    
    df.head()

    2. 由于评论中有大量的拼写错误,加个单词纠正

    def correct(text):
      b = TextBlob(text)
      return str(b.correct())
    
    
    df['corrected_review_body'] = df['review_body'].map(correct)
    df.head()

    3. 得到情感极性和主观性

    def get_polarity(text):
      testimonial = TextBlob(text)
      return testimonial.sentiment.polarity
    
    def get_subjectivity(text):
      testimonial = TextBlob(text)
      return testimonial.sentiment.subjectivity
    
    
    df['polarity'] = df['review_body'].map(get_polarity)
    df['subjectivity'] = df['review_body'].map(get_subjectivity)
    
    print(df)

    4. 统计词频

    统计词频主要是依靠 collections.Counter(word_list).most_common() 方法来自动统计并排序,如下:

    row = df.shape[0]
    words = ""
    for i in range(0, row):
      words = words + " " + p[i]
    
    print(words)   #汇总每个评论
    
    import collections
    collections.Counter(words.split()).most_common(50)  #显示top50

    具体得到某个单词的频数:

    coll = collections.Counter(words.split())
    coll["good"]

    5. 保存结果到CSV

    df.to_csv("情感分析.csv")

    参考链接:

    1. https://textblob.readthedocs.io/en/dev/quickstart.html#get-word-and-noun-phrase-frequencies

    2. https://blog.csdn.net/heykid/article/details/62424513#%E7%BB%9F%E8%AE%A1%E8%AF%8D%E9%A2%91 

    3. https://blog.csdn.net/sinat_29957455/article/details/79059436

  • 相关阅读:
    序列
    笔算开方法
    笔算开方法
    【AFO】闷声发大财
    P1092 虫食算[搜索]
    数据结构总结
    P1486 [NOI2004]郁闷的出纳员[权值线段树]
    P1850 换教室[dp+期望]
    P4281 [AHOI2008]紧急集合 / 聚会[LCA]
    P5021 赛道修建[贪心+二分]
  • 原文地址:https://www.cnblogs.com/lfri/p/12436340.html
Copyright © 2011-2022 走看看