
Python Learning Diary (8): Modules, Part 1 (sys, os, hashlib, random, time, re)

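A module is a collection of code that implements a particular piece of functionality.

As in functional and procedural programming, a function implements one piece of functionality that other code can simply call, which improves code reuse and reduces coupling between pieces of code. A complex feature, however, may need several functions (which may live in different .py files); a collection of code made up of n .py files is called a module.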

For example, os is the module for system-related operations, and os.path is the module for file-path operations.

There are three kinds of modules:

Custom modules
Third-party modules
Built-in modules

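Using modules

Importing modules

Part of the reason Python is used more and more widely is that it gives programmers a large number of modules to work with. To use a module, you must import it first. There are several ways to import a module: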

import module
from module.xx.xx import xx
from module.xx.xx import xx as rename 
from module.xx.xx import *

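Importing a module simply tells the Python interpreter to execute that .py file:

When you import a .py file, the interpreter executes that file.
When you import a package, the interpreter executes the __init__.py file in that package [Python 2.7; Python 3 does the same when the file exists].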
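To make the import forms above concrete, here is a minimal sketch that uses the standard-library os.path module. The helper name demo_imports is made up for illustration.

```python
import os                                   # import module
from os.path import join                    # from module.xx import xx
from os.path import basename as base_name   # from module.xx import xx as rename


def demo_imports():
    """Build a path and read its file name back through two import styles."""
    path = join("pkg", "mod.py")
    # os.path.basename and base_name are the same function under two names
    return os.path.basename(path), base_name(path)


print(demo_imports())   # ('mod.py', 'mod.py')
```

The "from ... import *" form is left out on purpose: it pulls every public name into the current namespace, which makes it hard to see where a name came from.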

So the question is: which paths does Python use as the base when it looks for a module to import? The answer is sys.path.

import sys
for i in sys.path:
    print(i)

Result:
C:\Users\Sullivan\PycharmProjects\q1\day11      # added by PyCharm
C:\Users\Sullivan\PycharmProjects\q1            # added by PyCharm
C:\python36\python36.zip
C:\python36\DLLs
C:\python36\lib
C:\python36
C:\python36\lib\site-packages

If the path you need is not in the sys.path list, you can add it with sys.path.append('path').

import sys
sys.path.append("D:\\")    # add the root of drive D: to the search path
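As a rough illustration of how an appended path makes a module importable, the sketch below writes a tiny module into a temporary directory, appends that directory to sys.path, and imports it. The module name demo_mod_hypothetical and the function name are invented for this example.

```python
import importlib
import os
import sys
import tempfile


def import_from_extra_dir():
    """Create a throwaway module, add its directory to sys.path, import it."""
    extra_dir = tempfile.mkdtemp()
    module_file = os.path.join(extra_dir, "demo_mod_hypothetical.py")
    with open(module_file, "w") as f:
        f.write("VALUE = 42\n")

    sys.path.append(extra_dir)              # the new directory is now searched
    try:
        importlib.invalidate_caches()       # make sure the new file is noticed
        module = importlib.import_module("demo_mod_hypothetical")
        return module.VALUE
    finally:
        sys.path.remove(extra_dir)          # leave sys.path as we found it


print(import_from_extra_dir())   # 42
```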

Modules

Built-in modules ship with Python. To use a feature of a built-in module, you must first import the module and then use it.

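I. sys

Provides operations related to the Python interpreter:

# the sys module interacts with the Python interpreter
sys.argv           list of command-line arguments; the first element is the path of the script itself
sys.exit(n)        exit the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxsize        largest value of the platform's Py_ssize_t type (Python 2's sys.maxint, the largest int, was removed in Python 3)
sys.path           module search path, initialized from the PYTHONPATH environment variable plus installation defaults
sys.platform       name of the operating system platform
sys.stdin          standard input
sys.stdout         standard output
sys.stderr         standard error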

A few examples:

# argv
print(sys.argv)
Result:
C:\Users\Sullivan\PycharmProjects\q1\day10>python module-sys.py
['module-sys.py']

C:\Users\Sullivan\PycharmProjects\q1\day10>python module-sys.py 1 2
['module-sys.py', '1', '2']

# platform
print(sys.platform)
Result:
win32

# exit can also print a message
# sys.exit("Goodbye!")

# write to the screen; this behaves differently from print
sys.stdout.write("hello")   # no automatic newline, so the output stays on the same line
print("hello")              # appends a newline automatically
sys.stdout.write("hello")
Result:
hellohello           # sys.stdout.write emitted no newline, so the hello from print follows it directly
hello                # print added a newline after its hello, so this hello appears on the next line
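Since every element of sys.argv is a string, a script usually separates the script name from its arguments and converts them itself. A minimal sketch, with the helper name split_argv chosen only for this example:

```python
import sys


def split_argv(argv):
    """Return (script_name, arguments) from an argv-style list."""
    return argv[0], argv[1:]


# In a real script you would pass sys.argv; a literal list is used here
# so the example behaves the same however it is launched.
script, args = split_argv(["module-sys.py", "1", "2"])
print(script)                      # module-sys.py
print([int(a) for a in args])      # [1, 2]
```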

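Example: a progress bar with a percentage

import sys,time

for i in range(31):
    sys.stdout.write('\r')  # return to the start of the line so the next write overwrites it
    sys.stdout.write("%s%% | %s" % (int(i/30*100) , int(i/30*100)*'*'))
    sys.stdout.flush()      # force the output to the screen
    time.sleep(0.3)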
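One possible refinement is to separate building the bar text from printing it, which keeps the bar at a fixed width and makes the formatting easy to check. This is only a sketch; render_bar and its width parameter are names introduced here, not part of the original example.

```python
def render_bar(done, total, width=30):
    """Return text such as '50% | ***************' for a bar of the given width."""
    percent = int(done / total * 100)
    filled = int(done / total * width)
    return "%d%% | %s" % (percent, "*" * filled)


print(render_bar(15, 30))   # 50% | ***************
```

In the loop above you would then write '\r' followed by render_bar(i, 30) and flush as before.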

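II. os

Provides system-level operations:

os.getcwd()                 get the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")         change the script's working directory; like cd in a shell
os.curdir                   the current directory string: ('.')
os.pardir                   the parent directory string: ('..')
os.makedirs('dir1/dir2')    create nested directories recursively
os.removedirs('dirname1')   remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')         create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')         remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')       return a list of all files and subdirectories in the directory, including hidden files
os.remove()                 delete a file
os.rename("oldname","new")  rename a file or directory
os.stat('path/filename')    get information about a file or directory
os.sep                      OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep                  line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep                  string used to separate search paths: ";" on Windows, ":" on Linux
os.name                     string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")   run a shell command; its output goes straight to the screen
os.environ                  the system environment variables
os.path.abspath(path)       return the normalized absolute version of path
os.path.split(path)         split path into a (directory, file name) tuple
os.path.dirname(path)       return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)      return the final file name in path; if path ends with / or \, an empty string is returned; i.e. the second element of os.path.split(path)
os.path.exists(path)        return True if path exists, False otherwise
os.path.isabs(path)         return True if path is an absolute path
os.path.isfile(path)        return True if path is an existing file, False otherwise
os.path.isdir(path)         return True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  join several path components; components that come before an absolute path are discarded
os.path.getatime(path)      return the last access time of the file or directory that path points to
os.path.getmtime(path)      return the last modification time of the file or directory that path points to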

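A few examples:

import os

# stat ('joel' is a file in the current directory)
print(os.stat('joel'))
info = os.stat('joel')
print(info.st_size)      # file size in bytes
print(info.st_atime)     # time of last access, shown as a timestamp
print(info.st_mtime)     # time of last modification

# system -- runs a shell command, so there is no need to type it into a console
# print(os.system("dir"))

# path joining is important; remember to use it
os.path.join('dirname', 'filename')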
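Several of the functions listed above can be tried safely inside a temporary directory. The sketch below is one way to do that; the function name make_and_inspect and the file name note.txt are made up for the example.

```python
import os
import tempfile


def make_and_inspect():
    """Create dir1/dir2/note.txt in a temporary directory and inspect it."""
    base = tempfile.mkdtemp()
    nested = os.path.join(base, "dir1", "dir2")   # portable path joining
    os.makedirs(nested)                           # creates dir1 and dir2
    target = os.path.join(nested, "note.txt")
    with open(target, "w") as f:
        f.write("hello")
    return {
        "listing": os.listdir(nested),
        "size": os.stat(target).st_size,
        "basename": os.path.basename(target),
        "is_file": os.path.isfile(target),
        "parent_is_dir": os.path.isdir(os.path.dirname(target)),
    }


print(make_and_inspect())
```

Using os.path.join instead of concatenating strings with "/" or "\\" keeps the code working on both Windows and Linux.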

III. hashlib

Provides hashing (message digest) operations. It replaces the old md5 and sha modules and mainly offers the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.

import hashlib

hash = hashlib.md5()                                        
hash.update(bytes('123',encoding='utf-8'))
print(hash.hexdigest())

Result:
202cb962ac59075b964b07152d234b70

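The algorithms above are still very strong, but they have a weakness: the digest of a common input can be reversed by looking it up in a table of precomputed hashes. It is therefore worth mixing a custom key into the hash, commonly known as salting.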

import hashlib   
                                
hash = hashlib.md5(bytes('joel-love-ellie',encoding='utf-8'))
hash.update(bytes('123',encoding='utf-8'))
print(hash.hexdigest())

Result:
178172ac856c2dae457bdb731229d01c
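It may help to see what passing the key to the constructor actually does: the hash object simply processes the key bytes first, so the result equals the digest of the key followed by the data. A small sketch to check this, with md5_hex as a helper name invented here:

```python
import hashlib


def md5_hex(data, salt=b""):
    """Return the MD5 hex digest of salt followed by data."""
    h = hashlib.md5(salt)      # same as creating md5() and calling update(salt)
    h.update(data)
    return h.hexdigest()


salted = md5_hex(b"123", salt=b"joel-love-ellie")
combined = hashlib.md5(b"joel-love-ellie123").hexdigest()
print(salted == combined)      # True: the salt is simply hashed before the data
```

For storing real passwords, a per-user random salt with a deliberately slow function such as hashlib.pbkdf2_hmac is the usual recommendation; a single fixed key with MD5 is shown here only because it mirrors the example above.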

Other hash algorithms

import hashlib
######## sha1 ########
 
hash = hashlib.sha1()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
 
# ######## sha256 ########
 
hash = hashlib.sha256()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
 
 
# ######## sha384 ########
 
hash = hashlib.sha384()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
 
# ######## sha512 ########
 
hash = hashlib.sha512()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
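Because all of these objects share the same update/hexdigest interface, the repeated blocks can be folded into one helper built on hashlib.new, which takes the algorithm name as a string. The helper name hexdigest is an assumption of this sketch.

```python
import hashlib


def hexdigest(algorithm, text):
    """Hash text (encoded as UTF-8) with the named algorithm."""
    h = hashlib.new(algorithm)
    h.update(text.encode("utf-8"))
    return h.hexdigest()


for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    print(name, hexdigest(name, "admin"))
```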

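IV. random

import random

print(random.random())                   # a random float in [0, 1)
print(random.randint(1, 2))              # both endpoints are included, so the right-hand value (2) can be returned
print(random.randrange(1, 10))           # the right-hand value (10) is excluded
print(random.choice('hello'))            # pick one character at random from the string
print(random.choice(['ciri','ellie']))   # pick one element at random from the list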
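The ranges described in the comments above are easier to verify when the generator is seeded, because a seeded generator repeats the same sequence. A minimal sketch using random.Random; the function name sample_values is made up here.

```python
import random


def sample_values(seed):
    """Draw one value from each function using a private, seeded generator."""
    rng = random.Random(seed)
    return (
        rng.random(),          # float in [0, 1)
        rng.randint(1, 2),     # 1 or 2
        rng.randrange(1, 10),  # 1..9
        rng.choice("hello"),   # one character of 'hello'
    )


print(sample_values(8) == sample_values(8))   # True: same seed, same values
```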

Random verification code

# Method 1:
import random

def captcha_code():
    code = ''
    for i in range(6):
        # choose at random between a random digit and a random uppercase letter
        add = random.choice([random.randrange(10),chr(random.randrange(65,91))])
        code = code + str(add)
    print(code)

captcha_code()


# Method 2:
import random
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)    # a digit is used when the random number equals i, a letter otherwise
    if current != i:
        temp = chr(random.randint(65,90))
    else:
        temp = random.randint(0,9)
    checkcode += str(temp)
print(checkcode)
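A shorter variant of the same idea picks each character from one alphabet string and joins the picks. This is only a sketch: ALPHABET, make_code and the rng parameter are names introduced for illustration, and unlike Method 1 every character (digit or letter) is equally likely.

```python
import random
import string

# uppercase letters plus digits, 36 characters in total
ALPHABET = string.ascii_uppercase + string.digits


def make_code(length=6, rng=random):
    """Return a random code of the given length drawn from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


print(make_code())      # six random characters
print(make_code(4))     # four random characters
```

The random module is fine for a display captcha; for tokens that must be unpredictable, the standard-library secrets module is the usual choice.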

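V. time

Time-related operations. Time can be represented in three ways:

Timestamp: the number of seconds since January 1, 1970, i.e. time.time()
Formatted string: 2014-11-11 11:11, i.e. time.strftime('%Y-%m-%d')
Structured time: a time.struct_time tuple containing the year, day, weekday and so on, i.e. time.localtime()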

import time

print(time.time())
print(time.mktime(time.localtime()))

print(time.gmtime())    # optionally takes a timestamp argument
print(time.localtime()) # optionally takes a timestamp argument
print(time.strptime('2014-11-11', '%Y-%m-%d'))

print(time.strftime('%Y-%m-%d')) # uses the current time by default
print(time.strftime('%Y-%m-%d',time.localtime())) # same result, with the current time passed explicitly
print(time.asctime())
print(time.asctime(time.localtime()))
print(time.ctime(time.time()))
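The three representations can be converted into one another. The sketch below walks a fixed timestamp through all three and back; it uses time.gmtime and calendar.timegm (the UTC counterparts of localtime and mktime) so that the result does not depend on the machine's time zone. FMT and roundtrip are names chosen for this example.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"


def roundtrip(timestamp):
    """timestamp -> struct_time -> string -> struct_time -> timestamp."""
    struct = time.gmtime(timestamp)          # timestamp -> struct_time (UTC)
    text = time.strftime(FMT, struct)        # struct_time -> formatted string
    parsed = time.strptime(text, FMT)        # formatted string -> struct_time
    return text, calendar.timegm(parsed)     # struct_time (UTC) -> timestamp


print(roundtrip(1000000000))   # ('2001-09-09 01:46:40', 1000000000)
```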
   
import datetime
'''
datetime.date: class representing a date; common attributes are year, month, day
datetime.time: class representing a time of day; common attributes are hour, minute, second, microsecond
datetime.datetime: represents a date and time
datetime.timedelta: represents a duration, i.e. the length between two points in time
timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, hours[, weeks]]]]]]])
strftime("%Y-%m-%d")
'''
print(datetime.datetime.now())
print(datetime.datetime.now() - datetime.timedelta(days=5))
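With a fixed starting point instead of now(), the timedelta arithmetic becomes easy to follow. A small sketch; the helper name days_before is made up, and the date is the sample date used earlier in this section.

```python
import datetime


def days_before(moment, days):
    """Return the datetime that lies the given number of days before moment."""
    return moment - datetime.timedelta(days=days)


start = datetime.datetime(2014, 11, 11, 11, 11)
earlier = days_before(start, 5)
print(earlier.strftime("%Y-%m-%d %H:%M"))   # 2014-11-06 11:11
print((start - earlier).days)               # 5
```

Subtracting two datetime objects gives a timedelta back, which is why the last line can read its days attribute.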

Format directives:

    %Y  Year with century as a decimal number.
    %m  Month as a decimal number [01,12].
    %d  Day of the month as a decimal number [01,31].
    %H  Hour (24-hour clock) as a decimal number [00,23].
    %M  Minute as a decimal number [00,59].
    %S  Second as a decimal number [00,61].
    %z  Time zone offset from UTC.
    %a  Locale's abbreviated weekday name.
    %A  Locale's full weekday name.
    %b  Locale's abbreviated month name.
    %B  Locale's full month name.
    %c  Locale's appropriate date and time representation.
    %I  Hour (12-hour clock) as a decimal number [01,12].
    %p  Locale's equivalent of either AM or PM.

VI. re

What is a regular expression (RE for short)?

A regular expression is a small language in its own right (so it has its own syntax). In Python it is used through the re module.

1. Basics

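Metacharacters: . ^ $ * + ? {} [] | () \

	" . " -- wildcard; one "." matches exactly one character (any character except a newline)
	" ^ " -- caret; matches at the start (it has another special meaning inside a character set)
	" $ " -- dollar sign; matches at the end
	" * " -- repeat 0 or more times
	" + " -- repeat 1 or more times
	" ? " -- repeat 0 or 1 time
	"{} " -- repeat an explicit number of times, e.g. {3} or {2,5}
	"[] " -- character set; most metacharacters lose their special meaning inside it
	" | " -- pipe; alternation (or)
	"() " -- grouping
	" \ " -- a backslash followed by a metacharacter removes its special meaning
	         a backslash followed by certain ordinary characters gives them a special meaning (only some characters, not all)
	         a backslash followed by a number refers to the string matched by the group with that number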

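Special meanings of a backslash "\" followed by an ordinary character

*** The uppercase letter always means the negation ***
\d -- matches any decimal digit; equivalent to the class [0-9]
\D -- matches any non-digit character; equivalent to the class [^0-9]
\s -- matches any whitespace character; equivalent to the class [ \t\n\r\f\v]    # note the space: the first character in the class is a space
\S -- matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v]    # note the space right after the caret
\w -- matches any alphanumeric character; equivalent to the class [a-zA-Z0-9_]
\W -- matches any non-alphanumeric character; equivalent to the class [^a-zA-Z0-9_]
\b -- matches a word boundary (a special zero-width position), e.g. the position between a word and a space
These equivalences hold for ASCII text; by default, str patterns in Python 3 also match the corresponding Unicode characters.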
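The escapes above are easiest to understand by running them on a short string. The sample text below is invented for the example; note the r prefix, which stops Python from interpreting the backslashes before re sees them.

```python
import re

text = "room 42, floor 7"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of word characters
pieces = re.split(r"\s+", text)          # split on whitespace
whole = re.findall(r"\bfloor\b", "floor floors floor")  # whole words only

print(digits)   # ['42', '7']
print(words)    # ['room', '42', 'floor', '7']
print(pieces)   # ['room', '42,', 'floor', '7']
print(whole)    # ['floor', 'floor']
```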

2. Built-in functions of the re module

match

# match: tries to match from the start of the string; returns a match object on success and None on failure


 match(pattern, string, flags=0)
 # pattern: the regular expression
 # string : the string to match against
 # flags  : matching flags
     X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
     I  IGNORECASE  Perform case-insensitive matching.
     M  MULTILINE   "^" matches the beginning of lines (after a newline)
                    as well as the string.
                    "$" matches the end of lines (before a newline) as well
                    as the end of the string.
     S  DOTALL      "." matches any character at all, including the newline.

     A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                    match the corresponding ASCII character categories
                    (rather than the whole Unicode categories, which is the
                    default).
                    For bytes patterns, this flag is the only available
                    behaviour and needn't be specified.

     L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
     U  UNICODE     For compatibility only. Ignored for string patterns (it
                    is the default), and forbidden for bytes patterns.
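A short sketch of the three flags used most often (I, M and S); the sample text is made up for the example.

```python
import re

text = "Ciri\nciri prime"

ignore_case = re.findall(r"ciri", text, re.I)     # I: case-insensitive
line_starts = re.findall(r"^\w+", text, re.M)     # M: ^ matches at every line start
no_dotall = re.match(r"Ciri.ciri", text)          # "." does not match the newline
dotall = re.match(r"Ciri.ciri", text, re.S)       # S: "." matches the newline too

print(ignore_case)            # ['Ciri', 'ciri']
print(line_starts)            # ['Ciri', 'ciri']
print(no_dotall)              # None
print(repr(dotall.group()))   # 'Ciri\nciri'
```

Flags can be combined with |, for example re.I | re.M.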

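Why use groups? To extract specific parts of what was matched (the whole pattern has to match first; the parts captured by the groups are then pulled out of the match).

import re

origin = "ciri prime deborah ellie joel"

# without groups
r = re.match(r"c\w+", origin)
print(r.group())       # the whole match
print(r.groups())      # all captured groups, as a tuple
print(r.groupdict())   # all named groups, as a dict
Result:
ciri
()
{}

# with groups
r = re.match(r"(c)(\w+)", origin)
print(r.group())       # the whole match
print(r.groups())      # all captured groups, as a tuple
print(r.groupdict())   # only the groups that were given a name (key)
Result:
ciri            # group() is the same with or without groups, because it always returns the whole match
('c', 'iri')    # one tuple element per pair of parentheses
{}

r = re.match(r"(?P<n1>c)(\w+)" , origin)    # ?P<n1> gives the group a name
print(r.groupdict())
Result:
{'n1': 'c'}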

search

# search: scans the whole string and returns the first match; returns None if nothing matches
# search(pattern, string, flags=0)

With and without groups

origin = "ciri prime deborah ellie joel"

# without groups
r = re.search(r"c\w+", origin)
print(r.group())       # the whole match
print(r.groups())      # all captured groups, as a tuple
print(r.groupdict())   # all named groups, as a dict
Result:
ciri
()
{}

# with groups
r = re.search(r"(c)(\w+)", origin)
print(r.group())       # the whole match
print(r.groups())      # all captured groups, as a tuple
print(r.groupdict())   # only the groups that were given a name (key)
Result:
ciri
('c', 'iri')
{}

r = re.search(r"(?P<n1>c)(\w+)" , origin)    # ?P<n1> gives the group a name
print(r.groupdict())
Result:
{'n1': 'c'}
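The examples above give the same output for match and search because the pattern happens to match at the very start of the string. The difference shows up when the first possible match lies further in. A sketch, with the helper name first_word_starting_with invented for this example:

```python
import re

origin = "ciri prime deborah ellie joel"


def first_word_starting_with(letter, text):
    """Return (result of match, result of search) for letter + word characters."""
    pattern = re.escape(letter) + r"\w+"
    at_start = re.match(pattern, text)     # only tries position 0
    anywhere = re.search(pattern, text)    # tries every position in turn
    return (at_start.group() if at_start else None,
            anywhere.group() if anywhere else None)


print(first_word_starting_with("c", origin))   # ('ciri', 'ciri')
print(first_word_starting_with("p", origin))   # (None, 'prime')
print(first_word_starting_with("z", origin))   # (None, None)
```

Both functions return None when nothing matches, so check the result before calling group() on it.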

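findall

# findall: returns a list of all non-overlapping matches
# with one group in the pattern, it returns a list in which each match is a string
# with several groups in the pattern, it returns a list in which each match is a tuple

# empty matches are included in the result
# findall(pattern, string, flags=0)

With and without groups

origin = "ciri cirila prime deborah ellie joel"
# without groups
r = re.findall(r"c\w+" , origin)
print(r)
Result:
['ciri', 'cirila']

# with groups
r = re.findall(r"(c\w+)" , origin)   # same result as above, as if there were no parentheses
print(r)
r = re.findall(r"c(\w+)" , origin)   # now findall returns only what the group captured
print(r)
r = re.findall(r"(c)(\w+)" , origin)
print(r)
Result:
['ciri', 'cirila']
['iri', 'irila']
[('c', 'iri'), ('c', 'irila')]

# findall puts everything from groups() of each match into the list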

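More on findall:

findall is effectively a series of search calls, one after another, with the groups() result of each match collected into a list

n = re.findall(r"\d+\w\d+","a2b3c4d5")
print(n)
Result:
['2b3', '4d5']
# the output shows that after finding a match, findall continues searching from the end of that match

n = re.findall('','ciri')
print(n)
Result:
['', '', '', '', '']
# one more than the number of characters, because an empty match is also found at the end of the string

n = re.findall(r'(\w)(\w)(\w)(\w)','ciri')
print(n)
Result:
[('c', 'i', 'r', 'i')]

n = re.findall(r'(\w){4}','ciri')
print(n)
Result:
['i']

# the pattern still matches all four characters, but the single group is repeated four times, so only its last repetition is kept
# matching is one thing and capturing is another: you get one captured value per pair of parentheses

n = re.findall(r'(\w)*','ciri')
print(n)
Result:
['i', '']
# the trailing '' is the empty match at the end of the string, where * matches zero times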
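If the parentheses are needed only for repetition and you want the whole match back, a non-capturing group (?:...) avoids the "last repetition only" effect. When positions are needed as well, re.finditer yields match objects instead of strings. A brief sketch:

```python
import re

print(re.findall(r"(\w){4}", "ciri"))      # ['i']
print(re.findall(r"(?:\w){4}", "ciri"))    # ['ciri']

# finditer gives access to the position of each match
spans = [(m.group(), m.start(), m.end())
         for m in re.finditer(r"\d+\w\d+", "a2b3c4d5")]
print(spans)    # [('2b3', 1, 4), ('4d5', 5, 8)]
```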

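sub

# sub: replaces the parts of the string that match the pattern

sub(pattern, repl, string, count=0, flags=0)
# pattern: the regular expression
# repl   : the replacement string, or a callable
# string : the string to match against
# count  : maximum number of replacements (0 means replace all)
# flags  : matching flags

Without groups

origin = "ciri ciri ciri prime deborah ellie joel"

r = re.sub(r"c\w+" ,"666" , origin , 2)
print(r)
Result:
666 666 ciri prime deborah ellie joel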
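The repl argument can also refer back to groups or be a function. A small sketch of both, with sample strings made up for the example:

```python
import re

# \1 and \2 in the replacement refer to the first and second group
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "ciri prime")

# a callable receives the match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b22")

# count limits how many matches are replaced
limited = re.sub(r"\d", "#", "1-2-3", count=2)

print(swapped)   # prime ciri
print(doubled)   # a2 b44
print(limited)   # #-#-3
```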

split

# split: splits the string wherever the pattern matches

split(pattern, string, maxsplit=0, flags=0)
# pattern : the regular expression
# string  : the string to match against
# maxsplit: maximum number of splits (0 means no limit)
# flags   : matching flags

With and without groups

origin = "sulli ciri prime deborah ellie joel"
# without groups
r = re.split("ciri"  , origin , 1)
print(r)
Result:
['sulli ', ' prime deborah ellie joel']

# with groups
r = re.split("(ciri)"  , origin , 1)      # the text captured by the group is kept in the result
print(r)
r = re.split("c(iri)"  , origin , 1)
print(r)
r = re.split("(ci(ri))"  , origin , 1)    # one extra element per pair of parentheses
print(r)
Result:
['sulli ', 'ciri', ' prime deborah ellie joel']
['sulli ', 'iri', ' prime deborah ellie joel']
['sulli ', 'ciri', 'ri', ' prime deborah ellie joel']
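A common practical use of re.split is splitting on several different separators at once, which str.split cannot do. A short sketch with made-up input strings:

```python
import re

# one or more commas, semicolons or whitespace characters act as a separator
print(re.split(r"[,;\s]+", "a, b;c  d"))     # ['a', 'b', 'c', 'd']

# a capturing group keeps the separators in the result
print(re.split(r"([,;])", "a,b;c"))          # ['a', ',', 'b', ';', 'c']

# maxsplit stops after the given number of splits
print(re.split(r",", "a,b,c", maxsplit=1))   # ['a', 'b,c']
```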

Commonly used regular expressions:

IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile phone number:
^1[3458][0-9]\d{8}$
Email address:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
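As a sketch of how such a pattern might be used, the IP address expression can be compiled once and wrapped in a small check. OCTET, IP_RE and is_ipv4 are names made up for this example, and the re.ASCII flag is an addition so that \d only accepts 0-9.

```python
import re

# one number from 0 to 255
OCTET = r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)"
# four octets separated by literal dots
IP_RE = re.compile(r"^" + OCTET + r"(\." + OCTET + r"){3}$", re.ASCII)


def is_ipv4(text):
    """Return True if text looks like a dotted IPv4 address."""
    return IP_RE.match(text) is not None


print(is_ipv4("192.168.1.1"))   # True
print(is_ipv4("256.1.1.1"))     # False
print(is_ipv4("1.2.3"))         # False
```

For production code the standard-library ipaddress module is a more robust way to validate addresses; the regular expression is kept here because it is the subject of this section.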

To be continued...


Original article: https://www.cnblogs.com/houzhaohui/p/7429750.html