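  • A Collection of Programming Tips

    Below are some small programming tips of my own, written down to record, summarize, and improve.

    1. For membership lookups in Python collections, prefer dict or set.

    Both dict and set are implemented internally as hash tables, so a lookup takes O(1) time on average. That beats a linear scan of a list (O(n)) and even a binary search on sorted data (O(log n)).

    Once a program performs more than roughly 1K lookups, it is time to consider organizing the data in a dict or set.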

    # coding: utf-8
    from urllib.request import urlopen
    import re
    import string
    import operator
    import datetime

    commonWords = ["the", "be", "and", "of", "a", "in", "to", "have", "it", "i", "that", "for", "you", "he", "with", "on", "do", "say", "this", "they", "is", "an", "at", "but","we", "his", "from", "that", "not", "by", "she", "or", "as", "what", "go", "their","can", "who", "get", "if", "would", "her", "all", "my", "make", "about", "know", "will","as", "up", "one", "time", "has", "been", "there", "year", "so", "think", "when", "which", "them", "some", "me", "people", "take", "out", "into", "just", "see", "him", "your", "come", "could", "now", "than", "like", "other", "how", "then", "its", "our", "two", "more", "these", "want", "way", "look", "first", "also", "new", "because", "day", "more", "use", "no", "man", "find", "here", "thing", "give", "many", "well"]
    # Uncomment the next line to turn the list into a set, then run the program again and compare the timings.
    #commonWords = set(commonWords)

    def isCommon(word):
        # Membership test: a linear scan for a list, a hash lookup for a set.
        return word in commonWords


    def cleanText(input):
        # Collapse newlines, drop citation markers such as [12], and squeeze repeated spaces.
        input = re.sub(r'\n+', " ", input).lower()
        input = re.sub(r'\[[0-9]*\]', "", input)
        input = re.sub(' +', " ", input)
        input = re.sub(r"u\.s\.", "us", input)
        # Discard any non-ASCII characters.
        input = bytes(input, "UTF-8")
        input = input.decode("ascii", "ignore")
        return input

    def cleanInput(input):
        input = cleanText(input)
        cleanInput = []
        input = input.split(' ')
        for item in input:
            item = item.strip(string.punctuation)
            # Keep multi-letter words plus the single-letter words "a" and "i".
            if len(item) > 1 or (item.lower() == 'a' or item.lower() == 'i'):
                cleanInput.append(item)

        # Filter out the common words; this is where the lookups happen.
        cleanContent = []
        for word in cleanInput:
            if not isCommon(word):
                cleanContent.append(word)
        return cleanContent

    def getNgrams(input, n):
        # Count every n-gram of consecutive cleaned words.
        input = cleanInput(input)
        output = {}
        for i in range(len(input)-n+1):
            ngramTemp = " ".join(input[i:i+n])
            if ngramTemp not in output:
                output[ngramTemp] = 0
            output[ngramTemp] += 1
        return output

    def getFirstSentenceContaining(ngram, content):
        sentences = content.split(".")
        for sentence in sentences:
            if ngram in sentence:
                return sentence
        return ""

    content = str(urlopen("http://pythonscraping.com/files/inaugurationSpeech.txt").read(), 'utf-8')

    # Time 50 passes so the difference between list and set becomes visible.
    print('Container used for common words:', type(commonWords).__name__)
    print('Begin:',datetime.datetime.now())
    for i in range(50):
        ngrams = getNgrams(content, 2)
        sortedNGrams = sorted(ngrams.items(), key = operator.itemgetter(1), reverse = True)
    print('End:',datetime.datetime.now())
    print(sortedNGrams)
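
    The same comparison can be tried without a network connection. The following is only a minimal sketch: the word data, the query mix, and the helper name `count_hits` are made up for illustration and are not part of the program above.

```python
import timeit

# Hypothetical data: 1,000 distinct words, stored once as a list and once as a set.
words_list = ["word%d" % i for i in range(1000)]
words_set = set(words_list)

def count_hits(container, queries):
    """Count how many of the queries are found in the container."""
    return sum(1 for q in queries if q in container)

# Half of the queries hit and half miss; every miss forces a full scan of the list.
queries = ["word%d" % i for i in range(500, 1500)]

list_time = timeit.timeit(lambda: count_hits(words_list, queries), number=20)
set_time = timeit.timeit(lambda: count_hits(words_set, queries), number=20)

print("list: %.4f s" % list_time)
print("set:  %.4f s" % set_time)
```

    The absolute numbers depend on the machine, but the gap should widen as the container grows, since each list lookup scans elements one by one while each set lookup is a single hash probe on average.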

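    2. Inserting data into a database from Python

    Before inserting a record, remember to run a query first and check whether it is already in the database.

    First, this makes the program more robust; second, it also spares a follow-up query later.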
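
    A check-before-insert step might look roughly like the sketch below. It assumes SQLite through the standard `sqlite3` module; the table `items`, its columns, and the helper `insert_if_absent` are hypothetical names chosen for illustration, since the post does not say which database or schema was used.

```python
import sqlite3

def insert_if_absent(conn, name):
    """Insert name unless it is already stored; return True if a row was added."""
    row = conn.execute("SELECT 1 FROM items WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return False  # already present, nothing to do
    conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
    conn.commit()
    return True

# In-memory database with a hypothetical table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

print(insert_if_absent(conn, "apple"))   # first insert adds a row
print(insert_if_absent(conn, "apple"))   # second call finds the existing row
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```

    In SQLite, a UNIQUE constraint combined with `INSERT OR IGNORE` is another way to get the same effect in one statement; whether that fits depends on the database in use.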

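    3. When creating database tables, it is best to add indexes

    Recently I had to insert millions of rows into a database. Once the table grew past a few hundred thousand rows, the database became extremely slow and disk I/O was saturated.

    I later found that the number of queries was the problem, so I rebuilt the table and added indexes while I was at it, in particular a unique index. My guess was that a hash map sits behind it, though most relational databases actually implement indexes, unique ones included, as B-trees, so a lookup costs O(log n) rather than O(1).

    With the indexes in place, the load on the disk dropped sharply and query time stayed nearly constant, although the growing database still slowed reads and writes somewhat (which is only to be expected).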
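
    The effect of an index on a lookup can be observed directly. The sketch below assumes SQLite via the standard `sqlite3` module; the table `items`, the index name `idx_items_name`, and the helper `query_plan` are hypothetical and stand in for whatever schema the real project used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("item%d" % i,) for i in range(10000)],
)

def query_plan(conn):
    """Return SQLite's plan description for a lookup by name."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM items WHERE name = ?", ("item42",)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)   # no index yet: full table scan
conn.execute("CREATE UNIQUE INDEX idx_items_name ON items (name)")
plan_after = query_plan(conn)    # the lookup can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

    The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the table and the second a search using `idx_items_name`.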

    4. Handling escape characters in Python strings

    In Python, the escape character occupies 4 positions, not the usual 8.
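
    Assuming the escape character meant here is the tab, `\t` (an assumption, since the note does not name it), the sketch below shows that a tab is a single character in the string and that its displayed width is a matter of tab stops. `str.expandtabs` makes the width explicit; its default tab size is 8.

```python
text = "a\tb"

# The tab escape is one character in the string, whatever width it is shown at.
print(len(text))                 # 3 characters: 'a', tab, 'b'

# expandtabs() replaces each tab with spaces up to the next tab stop.
print(repr(text.expandtabs()))   # default tab stops every 8 columns
print(repr(text.expandtabs(4)))  # tab stops every 4 columns
```

    How wide a raw tab looks when printed depends on the terminal or editor, so converting with `expandtabs` is one way to get predictable alignment.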

  • Original article: https://www.cnblogs.com/flyinghorse/p/5735276.html