  • Paper Crawling (Part 3)

    Keyword frequency statistics for English papers

    # -*- coding: utf-8 -*-
    import sys

    sys.path.append('../')

    import jieba
    import jieba.analyse
    import MysqlUtil
    from optparse import OptionParser

    # file_name = "test.txt"
    #
    # content = open(file_name, 'rb').read()
    # content = "Few-shot learning is an important area of research. Conceptually, humans are readily able to understand new concepts given just a few examples, while in more pragmatic terms, limited-example training situations are common practice. Recent effective approaches to few-shot learning employ a metric-learning framework to learn a feature similarity comparison between a query (test) example, and the few support (training) examples. However, these approaches treat each support class independently from one another, never looking at the entire task as a whole. Because of this, they are constrained to use a single set of features for all possible test-time tasks, which hinders the ability to distinguish the most relevant dimensions for the task at hand. In this work, we introduce a Category Traversal Module that can be inserted as a plug-and-play module into most metric-learning based few-shot learners. This component traverses across the entire support set at once, identifying task-relevant features based on both intra-class commonality and inter-class uniqueness in the feature space. Incorporating our module improves performance considerably (5%-10% relative) over baseline systems on both miniImageNet and tieredImageNet benchmarks, with overall performance competitive with the most recent state-of-the-art systems."
    # topK=10 returns the 10 highest-weighted keywords
    # tags = jieba.analyse.extract_tags(content, topK=10, withWeight=True)
    #
    # print(tags)
    # print(",".join(tags))


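    def getKey(rows):
        """Count word frequencies across all rows and store the top 20 via MysqlUtil."""
        counts = {}
        for row in rows:
            content = jieba.lcut(row[0])  # row[0] is presumably the abstract text
            for word in content:
                # Skip single-character tokens (spaces, punctuation) and stop words
                if len(word) == 1 or word in nolist:
                    continue
                counts[word] = counts.get(word, 0) + 1  # add 1 for each occurrence

        items = list(counts.items())  # convert the key-value pairs to a list
        items.sort(key=lambda x: x[1], reverse=True)  # sort by count, descending

        # Slicing avoids an IndexError when fewer than 20 distinct words exist
        for word, count in items[:20]:
            MysqlUtil.insert_key(word, count)
            print('{0:<5}{1:<5}'.format(word, count))

        return items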


    if __name__ == '__main__':
        # Stop words excluded from the counts (read as a global by getKey)
        nolist = {'are', 'is', 'am', 'and', 'of', 'but', 'so', 'which', 'where', 'when', 'how', 'what', 'that', 'who', 'whose', 'in', 'at', 'with', 'for', 'the', 'a', 'an', 'to', 'on', 'we', 'We', 'this', 'by', 'from', 'our', 'as', 'The', 'can', 'he', 'He', 'be', 'In'}
        res = MysqlUtil.select_ab()
        # print(res[0])
        getKey(res)
  • Original article: https://www.cnblogs.com/mumulailai/p/14912261.html