zoukankan      html  css  js  c++  java
  • python编程技巧

    对c++或者java熟悉的同学,写python代码时通常会用c++,java方式.有些情况下,用python的方法实现一些功能会更方便.

    遍历目录下的所有文件

    # coding:utf-8
    import os
    filepath = r"D:	est"
    files = [f for f in os.listdir(filepath) if os.path.isfile(os.path.join(filepath, f))]
    print(files)
    

    列出文件夹filepath下的所有文件名.一行代码解决.同理可以列出所有文件夹.注意,没有列出子目录的内容.

    数字与一个元素的数组相乘.

    # coding:utf-8
    print(3 * [2])  # [2, 2, 2]
    

    结果是一个数组.这个技巧在分类,聚类算法中,初始化类编号最常用.

    np.random.RandomState

    经常初用来构建一个乱序的numpy类型的数组.

    # coding:utf-8
    import numpy as np
    random_state = np.random.RandomState(0)
    indices = np.arange(100)
    random_state.shuffle(indices)
    print(indices)
    

    np.random.RandomState的参数一样时,构造的数组一定一样.不同的参数构建的数组一定不一样.

    sys.maxunicode

    python内部编码只可能是UCS-2,UCS-4中的某一种.sys.maxunicode为65535时表示该版本内部编码是unicode是UCS-2,sys.maxunicode为1114111时, 表示该版本内部编码是UCS-4.

    print(sys.maxunicode)
    

    zip用法

    labels = [(1, 2), (3, 4), (5, 6)]
    labels, categories = zip(*labels)
    print(labels)
    print(categories)
    

    可以把一个元素组成的数组转化成2个数组.也可以把2个ndarray合成一个tuple

    import numpy as np
    a = np.array(
        [[1, 2],
        [3, 4],
        [5, 6],
        [2, 3],
        [6, 9]]
    )
    b = np.array([[1], [2], [3], [4], [5]])
    for c in zip(a, b):
        print(c)
    

    去掉数组中的一些数组

    用numpy.in1d()可以构建一个bool类型的数组.通过该数组,可以把数组中的一些元素去掉.这在分类算法中去掉一些数据集时非常有用.

    # coding:utf-8
    import numpy as np
    a = [1, 2, 3, 4, 5]
    b = [1, 4]
    mask = np.in1d(a, b)
    names = np.array(["aaa", "bbb", "ccc", "ddd", "fff"])
    names = names[mask]
    print(names)  # ['aaa' 'ddd']
    

    根据url地址下载文件

    # coding:utf-8
    from urllib.request import urlopen
    URL = "http://download.labs.sogou.com/dl/sogoulabdown/categories_2012.txt"
    opener = urlopen(URL)
    with open("test.txt", 'wb') as f:
        f.write(opener.read())
    

    持久化存储

    在处理大文件时经常会遇到这个问题.求一大批文档的tfidf时会产生一个很大但很稀疏的矩阵,而numpy的各种运算的参数又是numpy数组.不能把稀疏矩阵直接转化成numpy数组(内在装不上),解决方法是在预处理的时候把稀疏矩阵存成很多小文件,比如50行存成一个小文件,在训练的时候每次读取一个小文件.这就现实了小内存处理大文件.

    import pickle
    import numpy as np
    fpath = r"D:seri.dat"
    a = {}
    a['aaa'] = 1
    a['bbb'] = 2
    b = np.array([1, 2, 3])
    with open(fpath, 'wb') as f:
        pickle.dump(b, f)  # 把dict存成一个文件
    with open(fpath, 'rb') as f:
        obj2 = pickle.load(f)
        print(obj2)  # 把dict读到内存中
    

    用sklearn库创建tfidf矩阵

    创建矩阵

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    corpus = np.array(["aaa bbb ccc", "aaa bbb ddd"])
    cv1 = CountVectorizer()
    cv1output = cv1.fit_transform(corpus)
    print(cv1.get_feature_names())
    tfidfTrans1 = TfidfTransformer()
    print(tfidfTrans1.fit_transform(cv1output))
    

    tfidfTrans1就是最终的tfidf矩阵.这时候有一个测试集("aaa vvv ccc", "ccc ccc rrr"),注意vvv,rrr都不在训练集中,要忽略.所以要以训练集的单词为基准,建立测试矩阵.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    corpus = np.array(["aaa bbb ccc", "aaa bbb ddd"])
    cv1 = CountVectorizer()
    cv1output = cv1.fit_transform(corpus)
    print(cv1.get_feature_names())
    tfidfTrans1 = TfidfTransformer()
    print(tfidfTrans1.fit_transform(cv1output))
    
    corpus1 = np.array(["aaa vvv ccc", "ccc ccc rrr"])
    cv2 = CountVectorizer(vocabulary=cv1.vocabulary_)
    cv2output = cv2.fit_transform(corpus1)
    tfidfTrans2 = TfidfTransformer()
    print(tfidfTrans2.fit_transform(cv2output))
    

    数组转化成onehot类型

    # coding:utf-8
    import numpy as np
    def dense_to_one_hot(input_data, class_num):
        data_num = input_data.shape[0]
        # numpy.arange(num_labels)产生一个[0,1,2,3,4,5,6,7,8,9,0,1,3]的数组,* num_classes是把所有数乘以10
        index_offset = np.arange(data_num) * class_num  # [0,10,20,30,40,50,60,70,80,90,100,110,120]
        labels_one_hot = np.zeros((data_num, class_num))  # (13*10)的数组
        # index_offset        [0,10,20,30,40,50,60,70,80,90,100,110, 120]
        # input_data.ravel()  [0,1, 2, 3, 4, 5, 6, 7, 8, 9, 0,    1,  3]
        # sum                 [0,11,22,33,44,55,66,77,88,99,100, 111, 123]
        labels_one_hot.flat[index_offset + input_data.ravel()] = 1
        return labels_one_hot
    input_data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 3])
    class_num = 10
    print(dense_to_one_hot(input_data, class_num))
    

    namedtuple

    tuple可以表示不变集合,例如,一个点的二维坐标就可以表示成:

    p = (1, 2)
    

    但是,看到(1, 2),很难看出这个tuple是用来表示一个坐标的.定义一个class又小题大做了,这时,namedtuple就派上了用场:

    from collections import namedtuple
    Point = namedtuple('Point', ['x', 'y'])
    p = Point(1, 2)
    print(p.x)
    print(p.y)
    

    参考资料:廖雪峰的官方网站

    *args 和**kwargs的用法

    *args表示传递不定长的参数.

    def fun_var_args(farg, *args):
        print("arg:", farg)
        for value in args:
            print("another arg:", value)
    fun_var_args(1, "two", 3)  # *args可以当作可容纳多个变量组成的list
    

    **kwargs也表示传递不定长的参数.和*args的区别是**kwargs传的是key, value的结构.

    def fun_var_kwargs(farg, **kwargs):
        print("arg:", farg)
        for key in kwargs:
            print("another keyword arg: %s: %s" % (key, kwargs[key]))
    fun_var_kwargs(farg=1, myarg2="two", myarg3=3)  # myarg2和myarg3被视为key, 感觉**kwargs可以当作容纳多个key和value的dictionary
    

    命令行传参数

    原生的方法

    # coding:utf-8
    import sys
    print(sys.argv[1])
    print(sys.argv[2])
    

    用tensorflow的方法

    # coding:utf-8
    import tensorflow as tf
    
    flags = tf.app.flags
    flags.DEFINE_string("zipfilepath", "a", "zip file path")
    flags.DEFINE_string("unzipfolder", "b", "unzip folder")
    FLAGS = flags.FLAGS
    print(FLAGS.zipfilepath)
    print(FLAGS.unzipfolder)
    

    解压zip文件

    import zipfile
    zip_ref = zipfile.ZipFile(r"D:	est.zip")
    zip_ref.extractall(r"D:unfolder")
    zip_ref.close()
    

    python打包指定类型的压缩文件

    python setup.py sdist --formats=gztar
    

    对数组中的元素做某一操作

    Z[:] = [0 if x > 0.5 else 1 for x in Z]

    dict中的元素key与value互换

    a = dict()
    a[1] = 2
    a[2] = 3
    a[3] = 4
    d = {v: k for k, v in a.items()}
    print(d)
    

    把数组套数组的结构转化成一维数组

    import itertools
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(list(itertools.chain.from_iterable(a)))
    

    从二维数组中选出指定行

    import numpy as np
    matrix = np.random.random([1024, 64])  # 64-dimensional embeddings
    ids = np.array([0, 5, 17, 33])
    print(matrix[ids].shape)  # prints a matrix of shape [4, 64]
    

    可以对比tensorflow的tf.nn.embedding_lookup功能

    显示python的版本

    import platform
    if platform.python_version().startswith("3"):
        print("a")
    

    numpy数组四舍五入后转成整型

    import numpy as np
    a = np.random.rand(10)
    print(a)
    b = np.around(a)
    print(b.astype(int))
    

    dict按key排列,按value排列

    data = dict()
    data[1] = 2
    data[13] = 1
    data[5] = 9
    count_pairs = sorted(data.items())
    print(count_pairs)
    count_pairs = sorted(data.items(), key=lambda x: (x[1], x[0]))
    print(count_pairs)
    

    向函数中传不定长参数

    import numpy as np
    def rand_arr(a, b, *args):
        np.random.seed(0)
        return np.random.rand(*args) * (b - a) + a
    
    a = rand_arr(0, 1, 2, 3)
    print(a)
    

    双星

    双星表示平方

    import numpy as np
    a = np.array([1, 2, 3])
    print(a ** a)
    

    输出结果为[ 1 4 27]

    产生0,1个数相等的0,1序列

    import numpy as np
    a = np.random.choice(2, 50000, p=[0.5, 0.5])
    print(len(a))
    print(a[0: 10])
    

    ndarray数组右移若干位

    import numpy as np
    x = np.arange(10)
    print(x)
    print(np.roll(x, 3))
    

    多维transpose

    import numpy as np
    arr1 = np.arange(12).reshape(2, 2, 3)
    print("---------------------------------转换前---------------------------------")
    print(arr1)
    print("---------------------------------转换后---------------------------------")
    print(arr1.transpose((1, 0, 2)))
    
    arr1 = np.arange(12).reshape(2, 2, 3)
    print("---------------------------------转换前---------------------------------")
    print(arr1)
    print("---------------------------------转换后---------------------------------")
    print(arr1.transpose((0, 2, 1)))
    

    用"数组的数组"来理解多维数组.arr1[2][2][3]是一个有2个元素的数组,每个元素又是长度为2的数组,而长度为2的数组的每个元素又是一个长度为3的数组.arr1.transpose((1, 0, 2))的意思是第3维不变.可以这样认为,arr1原始的结构如下:

    [egin{equation*} left[ egin{array}{cc} A & B \ C & D \ end{array} ight] end{equation*} ]

    其中A=[0, 1, 2],B=[3, 4, 5],C=[6, 7, 8],D=[9, 10, 11],现在要转置第1维和第2维,所以转后为

    [egin{equation*} left[ egin{array}{cc} A & C \ B & D \ end{array} ight] end{equation*} ]

    A,B,C,D的内容不变,这就解释了arr1.transpose((1, 0, 2)的值.用类似的思路可以解释arr1.transpose((0, 2, 1)的值.对于arr1.transpose((0, 2, 1)可以认为是第1维不变,第2,3维转置.第1维的每个元素都是一个2*3的矩阵,转置后变成3*2,这就解释了arr1.transpose((0, 2, 1)的输出.

    defaultdict

    通常情况下从dict中按key取一个值,如果key不存在会报错.可以用defaultdict定义dict,key不存在时不会报错.

    from collections import defaultdict
    a = defaultdict(int)
    a["3"] = 1
    print(a["3"])
    print(a["45"])
    

    format用法

    print('{0:.2f} finished. Epoch {1}'.format(1.1234, 2.3354))
    g = "{0:.2f}, {1}".format(1.1234, "aa")
    

    @property作用

    可以像使用属性那样用函数.

    # coding:utf-8
    class Person(object):
        def __init__(self, first_name, last_name):
            """Constructor"""
            self.first_name = first_name
            self.last_name = last_name
        @property
        def full_name(self):
            return "%s %s" % (self.first_name, self.last_name)
    person = Person("zhang", "san")
    print(person.full_name)  # 如果去掉 @property就显示不出来full name
    

    __call__函数

    class Person(object):
        def __init__(self, name, gender):
            self.name = name
            self.gender = gender
        def __call__(self, friend):
            print('My name is %s...' % self.name)
            print('My friend is %s...' % friend)
    p = Person('Bob', 'male')
    p('Tim')  # 对象可以当作方法使用,调用的是__call__函数
    

    获取命令行输出

    # coding:utf-8
    import os
    command = 'ps a'
    with os.popen(command) as p:
        info = p.read()
        print(info)
    

    将文档转化成定长字符串

    # coding:utf-8
    from __future__ import division
    from __future__ import absolute_import
    from __future__ import print_function
    import hashlib
    s1 = "中华人民共和国"
    s2 = "美国"
    print(hashlib.md5(s1.encode("utf-8")).hexdigest())
    print(hashlib.md5(s1.encode("utf-8")).hexdigest())
    print(hashlib.md5(s2.encode("utf-8")).hexdigest())
    

    向mongodb中插入数据

    # coding:utf-8
    from __future__ import division
    from __future__ import absolute_import
    from __future__ import print_function
    from pymongo import MongoClient
    client = MongoClient('localhost', 27017)
    db = client.weichat
    db.docs.insert_one(
        {"class_type": "canvas",
         "content": "春江潮水连海平",
         })
    

    统计子串数

    a = "aaa bbb ccc ddd eee aaa bbb aaa aaa"
    print(a.count("aaa"))
    

    判断类是否有某个属性

    # coding:utf-8
    class MyClass:
        def __init__(self):
            self.name = "xiaohua"
    
        def process(self):
            return self.name
    
    t = MyClass()
    print(hasattr(t, "name"))  # name属性是否存在
    print(hasattr(t, "process"))  # process属性是否存在
    print(getattr(t, "name"))  # 获取name属性值,存在就打印出来
    print(getattr(t, "process"))  # 获取run方法,存在就打印出方法的内存地址
    print(getattr(t, "process")())  # 获取process方法,后面加括号可以将这个方法运行
    print(getattr(t, "age", "18"))
    

    判断操作系统类型

    # coding:utf-8
    import platform
    print(platform.platform())
    

    命令行接收参数

    # coding:utf-8
    from __future__ import print_function
    import argparse
    def build_parser():
        parser = argparse.ArgumentParser()
        parser.add_argument('--run_type', type=str, required=True)
        args = parser.parse_args()
        return args
    if __name__ == '__main__':
        args = build_parser()
        if args.run_type == "train":
            print("train")
        else:
            print("test")
    

    判断是否是汉语词

    # coding:utf-8
    import re
    import jieba
    WORD_FORMAT = r"[u4e00-u9fa5A-Za-z]+$"
    content = "我们都有一个家,名字叫中国08"
    seg_list = jieba.cut(content)
    pattern = re.compile(WORD_FORMAT)
    doc = " ".join(word for word in seg_list if pattern.search(word))
    print(doc)
    

    list逆序

    m = ["a", "b", "c", "d", "e", "f"]
    print(m[::-1])
    

    list取最后3个

    m = ["a", "b", "c", "d", "e", "f"]
    print(m[-3:])
    
  • 相关阅读:
    InnoDB 事务
    InnoDB 索引
    MySQL 8 事务管理、数据库维护、改善性能
    MySQL 7 存储过程、游标、触发器
    MySQL 6 插入数据(INSERT INTOVALUESSELECT FROM)、更新和删除数据(UPDATE SET WHEREDELETE)、创建和操纵表、视图
    MySQL 5 联结表、创建高级联结、组合查询、全文本搜索
    MySQL 4 数据处理函数、汇总数据、分组数据、子查询
    MySQL 3 通配符、正则、计算字段
    MySQL 2 SQL数据使用(检索、排序、过滤:SELECT/FROM/LIMIT/ORDER BY/DESC/WHERE/AND/OR/IN/NOT)
    JavaScript相关-深入理解函数2
  • 原文地址:https://www.cnblogs.com/zhouyang209117/p/6601321.html
Copyright © 2011-2022 走看看