  • Python: Lexical Diversity of Sina Weibo Entities (Word, Screen Name)

    CODE:

    #!/usr/bin/python 
    # -*- coding: utf-8 -*-
    
    '''
    Created on 2014-7-10
    @author: guaguastd
    @name: weiboLexicalDiversity.py
    '''
    
    if __name__ == '__main__':
        
        # log in and get an API client for the Sina Weibo API
        from sinaWeiboLogin import sinaWeiboLogin
        sinaWeiboApi = sinaWeiboLogin()
        
        # import sinaWeibo
        from sinaWeibo import extractWeiboEntities
        
        # import sinaWeiboStatuses
        from sinaWeiboStatuses import publicTimeline
        
        # import sinaWeiboLexicalDiversity
        from sinaWeiboLexicalDiversity import weibo_lexical_diversity, weibo_average_words
        
        # fetch the 5 newest statuses from the public timeline
        weiboNum = 5
        statuses = publicTimeline(sinaWeiboApi, weiboNum)
        status_texts,screen_names,words = extractWeiboEntities(statuses)  
    
        # lexical diversity of the word list and of the screen-name list
        for token in (words, screen_names):
            print '\nLexical diversity of %s: ' % token
            print weibo_lexical_diversity(token)
      
        # average number of words per status
        for status in (status_texts,):
            print '\nAverage words of %s: ' % status
            print weibo_average_words(status)
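
The post does not show `extractWeiboEntities`. The following is only a minimal sketch of what such a helper might do, assuming each status is a dict with a `text` field and a nested `user` dict holding `screen_name`, and that words are obtained by splitting on whitespace. The name `extract_entities` and the sample data are hypothetical, not the author's code.

```python
def extract_entities(statuses):
    """Return (status_texts, screen_names, words) from a list of status dicts.

    Assumption: each status looks like
    {'text': ..., 'user': {'screen_name': ...}}.
    """
    status_texts = [status['text'] for status in statuses]
    screen_names = [status['user']['screen_name'] for status in statuses]
    # Whitespace tokenization: runs of spaces count as a single separator.
    words = [word for text in status_texts for word in text.split()]
    return status_texts, screen_names, words


# Made-up sample data standing in for the API response.
sample_statuses = [
    {'text': 'hello weibo  world', 'user': {'screen_name': 'alice'}},
    {'text': 'second status', 'user': {'screen_name': 'bob'}},
]
status_texts, screen_names, words = extract_entities(sample_statuses)
```

Whitespace splitting matches the shape of the word list in the output below, where a URL or an @mention separated by spaces becomes its own token, while unsegmented Chinese text stays as one long token.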
                

    RESULT:

    Lexical diversity of [u'[moc\u8f6c\u53d1]2014\u65b0\u6b3e\u590f\u88c5\u5370\u82b1\u77ed\u8896\u8fde\u8863\u88d9\u9ad8\u7aef\u5927\u7801\u4e2d\u5e74\u5973\u88c5\u4fee\u8eab\u663e\u7626\u857e\u4e1d\u8fde\u8863\u88d9', u'http://t.cn/RvCLdgN', u'[\u795e\u9a6c]\u963f\u4f9d\u83b2\u8fde\u8863\u88d9', u'ccdd\u5973\u88c52014\u590f\u88c5\u65b0\u6b3e', u'\u97e9\u7248', u'\u5c0f\u9999\u98ce\u857e\u4e1d\u516c\u4e3b\u88d9', u'\u6b63\u54c1', u'http://t.cn/RvCyo4X', u'\u590f\u65e5\u5ea6\u5047\u6e05\u51c9\u88c5~~>>>>>>\u559c\u6b22\u70b9\u8fd9\u91cc\uff1ahttp://t.cn/RvEqd5R', u'\u6211\u6b63\u5728\u6b66\u4fa0\u5361\u724c\u624b\u6e38\u201c\u5927\u638c\u95e8\u201d\u4e2d\u51b2\u51fb\u8840\u6218\u699c\u5355\uff0c\u613f\u5404\u4f4d\u5927\u4fa0\u62d4\u5200\u76f8\u52a9\uff01\u6ce8\u518c\u5927\u638c\u95e8\uff0c\u586b\u5199\u6211\u7684\u9080\u8bf7\u7801\u30102zr7\u3011\uff0c\u5171\u540c\u83b7\u53d6\u4e30\u539a\u5956\u52b1\u3002http://t.cn/8FUZSTe', u'@\u5927\u638c\u95e8\u6e38\u620f', u'\u8f7b\u8f68\u65e9\u4e0a\u7684\u7a7a\u8c03\u5f00\u5f97\u7565\u5927']:
    1.0
    
    Lexical diversity of [u'kathyisangel', u'wangbinrona', u'\u5168\u7403\u6d41\u884c\u670d\u9970\u6f6e\u7f8e\u98ce\u5c1a\u63a7', u'\u624b\u673a\u7528\u62372454403221', u'\u6b63\u76f4\u4f60\u4e00\u8138\u7684\u52c7\u6562\u541b']:
    1.0
    
    Average words of [u'[moc\u8f6c\u53d1]2014\u65b0\u6b3e\u590f\u88c5\u5370\u82b1\u77ed\u8896\u8fde\u8863\u88d9\u9ad8\u7aef\u5927\u7801\u4e2d\u5e74\u5973\u88c5\u4fee\u8eab\u663e\u7626\u857e\u4e1d\u8fde\u8863\u88d9  http://t.cn/RvCLdgN', u'[\u795e\u9a6c]\u963f\u4f9d\u83b2\u8fde\u8863\u88d9 ccdd\u5973\u88c52014\u590f\u88c5\u65b0\u6b3e \u97e9\u7248 \u5c0f\u9999\u98ce\u857e\u4e1d\u516c\u4e3b\u88d9 \u6b63\u54c1  http://t.cn/RvCyo4X', u'\u590f\u65e5\u5ea6\u5047\u6e05\u51c9\u88c5~~>>>>>>\u559c\u6b22\u70b9\u8fd9\u91cc\uff1ahttp://t.cn/RvEqd5R', u'\u6211\u6b63\u5728\u6b66\u4fa0\u5361\u724c\u624b\u6e38\u201c\u5927\u638c\u95e8\u201d\u4e2d\u51b2\u51fb\u8840\u6218\u699c\u5355\uff0c\u613f\u5404\u4f4d\u5927\u4fa0\u62d4\u5200\u76f8\u52a9\uff01\u6ce8\u518c\u5927\u638c\u95e8\uff0c\u586b\u5199\u6211\u7684\u9080\u8bf7\u7801\u30102zr7\u3011\uff0c\u5171\u540c\u83b7\u53d6\u4e30\u539a\u5956\u52b1\u3002http://t.cn/8FUZSTe @\u5927\u638c\u95e8\u6e38\u620f ', u'\u8f7b\u8f68\u65e9\u4e0a\u7684\u7a7a\u8c03\u5f00\u5f97\u7565\u5927']:
    2.4
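
The module `sinaWeiboLexicalDiversity` is not listed in the post, so the definitions below are an assumption: lexical diversity is taken as the number of distinct tokens divided by the total number of tokens, and average words as the total number of whitespace-separated words divided by the number of statuses. Under that reading the output above is consistent: 12 words, all distinct, give 1.0, and 12 words over 5 statuses give 2.4. The function names and sample inputs here are hypothetical stand-ins.

```python
from __future__ import division  # true division on Python 2 as well


def lexical_diversity(tokens):
    """Ratio of distinct tokens to total tokens (assumed definition)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


def average_words(status_texts):
    """Mean number of whitespace-separated words per status (assumed definition)."""
    status_texts = list(status_texts)
    if not status_texts:
        return 0.0
    total_words = sum(len(text.split()) for text in status_texts)
    return total_words / len(status_texts)


# Placeholder texts with the same per-status word counts as the run above
# (2, 6, 1, 2, 1), not the real status contents.
sample_texts = ['w1 w2', 'w3 w4 w5 w6 w7 w8', 'w9', 'w10 w11', 'w12']
sample_words = [word for text in sample_texts for word in text.split()]

diversity = lexical_diversity(sample_words)
avg_words = average_words(sample_texts)
```

A diversity of 1.0 on only five statuses mostly reflects the small sample and the whitespace tokenization; a larger sample, or a Chinese word segmenter in place of `split()`, would likely produce repeated tokens and a lower ratio.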


  • Original article: https://www.cnblogs.com/lcchuguo/p/4566188.html