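Files and I/O
1. Reading and Writing Text Data

(1) Use open() with mode rt to read the contents of a text file (t, text mode, is the default).

(2) To write, use mode wt. If the target file already exists, its previous contents are cleared and overwritten.

(3) To append to the end of an existing file, use mode at.

(4) To write only when the file does not already exist, use mode x (xt for text). open() raises FileExistsError if the file is already there.

(5) sys.getdefaultencoding() reports Python's default string encoding. When open() is called without an encoding argument, it uses the locale's preferred encoding (see locale.getpreferredencoding()), so passing encoding explicitly is the safer choice.

(6) If you do not use a with statement to manage the context, remember to close the file yourself.

(7) Newline recognition: Unix uses \n, Windows uses \r\n, and classic Mac OS used \r. With newline=None (the default), universal newlines mode is enabled: on reading, every line ending is converted to a single \n character, and on writing, \n is converted to the system's default line separator. To turn this translation off, pass newline=''.

(8) The optional errors argument is a string that specifies how encoding and decoding errors are handled, for example 'ignore' or 'replace'. It cannot be used in binary mode.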
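The modes and the errors argument listed above can be tried out together. The following is a minimal sketch, not part of the original notes: the file name demo.txt and the sample strings are made up for illustration, and everything happens inside a temporary directory.

```python
import os
import tempfile

# Work inside a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')   # hypothetical file name

    # 'wt' creates the file, or truncates it if it already exists.
    with open(path, 'wt', encoding='utf-8') as f:
        f.write('first line\n')

    # 'at' appends to the end of the existing file.
    with open(path, 'at', encoding='utf-8') as f:
        f.write('second line\n')

    # 'xt' refuses to overwrite: it raises FileExistsError here.
    try:
        with open(path, 'xt', encoding='utf-8') as f:
            f.write('never written\n')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'rt' reads the text back; iterating the file yields one line at a time.
    with open(path, 'rt', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

    # errors='replace' substitutes U+FFFD for bytes that cannot be decoded.
    with open(path, 'wb') as f:
        f.write(b'caf\xe9\n')                  # Latin-1 bytes, not valid UTF-8
    with open(path, 'rt', encoding='utf-8', errors='replace') as f:
        replaced = f.read()
```

Passing encoding explicitly, as done here, avoids depending on whatever default the platform happens to use.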

2. Redirecting Output to a File
with open('./hello.txt', 'wt', encoding='utf8') as f:
    print('Hello World!', file=f)
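3. Printing with a Different Separator or Line Ending

Use the sep and end arguments of print() to change the separator and the line ending. Passing end is also the way to suppress the trailing newline in the output.

>>> print('ACME', 50, 99, sep=',')
ACME,50,99

>>> print('ACME', 50, 99, sep=',', end='!!\n')
ACME,50,99!!

>>> row = ('ACME', 50, 91.5)
>>> print(*row, sep=',')
ACME,50,91.5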
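As a rough comparison, the same comma-separated output can be built with str.join(), which needs an explicit conversion for non-string values. This sketch is an illustration only; the row tuple is sample data, and a StringIO object is used just to capture what print() produces.

```python
import io

row = ('ACME', 50, 91.5)

# str.join() only accepts strings, so mixed types need an explicit conversion.
joined = ','.join(str(x) for x in row)

# print() converts each argument itself; capture its output in a StringIO
# object so the result can be compared with the joined string.
buf = io.StringIO()
print(*row, sep=',', end='', file=buf)
printed = buf.getvalue()
```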

　　

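4. Reading and Writing Binary Data

Use open() with mode rb or wb to read or write binary data.

Indexing or iterating over a byte string yields integer byte values, not one-character strings.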
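The difference between text strings and byte strings is easiest to see side by side. A small sketch with made-up sample values:

```python
text = 'Hello'
data = b'Hello'

first_char = text[0]          # a one-character str
first_byte = data[0]          # an int in range(256)
byte_values = list(data)      # iteration also yields ints

# Converting between the two always goes through an explicit encoding.
decoded = data.decode('utf-8')
encoded = text.encode('utf-8')
```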

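(1) For binary I/O, objects such as arrays and C structures can be written directly, without first converting them to a bytes object.

This applies to any object that implements the buffer interface.
import array

nums = array.array('i', [1, 2, 3, 4, 5, 6])
with open('./data.bin', 'wb') as f:
    f.write(nums)
(2) To read binary data directly into an object's underlying memory, use the readinto() method of the file object.
import array

nums = array.array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
with open('./data.bin', 'rb') as f:
    f.readinto(nums)
# nums is now array('i', [1, 2, 3, 4, 5, 6, 0, 0, 0, 0])
readinto() fills an existing buffer with data instead of allocating a new object and returning it.
import os.path

def read_into_buffer(filename):
    # Allocate a buffer of exactly the file's size, then fill it in place.
    buf = bytearray(os.path.getsize(filename))
    with open(filename, 'rb') as f:
        f.readinto(buf)
    return buf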
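Because readinto() reuses a buffer, it also suits reading a stream in fixed-size pieces without allocating a new bytes object for each read. The sketch below is an assumption-laden illustration: the record size and the sample bytes are invented, and io.BytesIO stands in for a file opened in rb mode.

```python
import io

record_size = 4                       # hypothetical record length in bytes
source = io.BytesIO(b'abcdefghij')    # stands in for a file opened in 'rb' mode
buf = bytearray(record_size)          # allocated once, reused for every read

records = []
while True:
    n = source.readinto(buf)          # returns the number of bytes actually read
    if n == 0:
        break
    records.append(bytes(buf[:n]))    # copy out only the bytes that were filled
```

Checking the return value matters: the last read may fill only part of the buffer, and the rest still holds bytes from the previous read.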
6. Performing I/O Operations on a String

When you need to mimic an ordinary file, the io.StringIO and io.BytesIO classes are the best fit.
>>> import io
>>> s = io.StringIO()
>>> s.write('Hello World!!!\n')
15
>>> print('This is a test', file=s)
>>> s.getvalue()
'Hello World!!!\nThis is a test\n'
>>> s = io.StringIO('Hexxxx HHHH')
>>> s.read(4)
'Hexx'
>>> s = io.BytesIO()
>>> s.write(b'Hello World')
11
>>> s.getvalue()
b'Hello World'
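A typical use is feeding test data to code that expects a file. In this sketch, count_words() is a hypothetical helper written for the illustration; it works on any text file-like object, so a StringIO can replace a real file.

```python
import io

def count_words(f):
    """Count whitespace-separated words in a text file-like object (hypothetical helper)."""
    return sum(len(line.split()) for line in f)

# A StringIO object supports iteration line by line, just like an open text file.
fake_file = io.StringIO('one two\nthree\n')
total = count_words(fake_file)
```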
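7. Reading and Writing Compressed Data Files

The gzip and bz2 modules handle compressed files. Their open() functions default to binary mode, so pass rt or wt to work with text. The compresslevel keyword argument sets the compression level; the default is 9, the highest level.
import gzip, bz2

# Read compressed text
with gzip.open('./somefile.gz', 'rt') as f:
    text = f.read()
with bz2.open('./somefile.bz2', 'rt') as f:
    text = f.read()

# Write compressed text
with gzip.open('./somefile.gz', 'wt') as f:
    f.write(text)
with bz2.open('./somefile.bz2', 'wt') as f:
    f.write(text)
gzip.open() and bz2.open() can also be layered on top of an existing file object that was opened in binary mode.
import gzip

f = open('somefile.gz', 'rb')    # the underlying file must be opened in binary mode
with gzip.open(f, 'rt') as g:
    text = g.read()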
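The compresslevel argument and the layering technique can be combined in one round trip. This is a sketch under assumptions: demo.gz is a made-up file name inside a temporary directory, and level 5 is an arbitrary choice.

```python
import gzip
import os
import tempfile

text = 'hello world\n' * 1000

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.gz')   # hypothetical file name

    # Lower levels compress faster; higher levels produce smaller output.
    with gzip.open(path, 'wt', compresslevel=5, encoding='utf-8') as f:
        f.write(text)

    compressed_size = os.path.getsize(path)

    # Layer gzip on top of a file object that was opened in binary mode.
    with open(path, 'rb') as raw:
        with gzip.open(raw, 'rt', encoding='utf-8') as g:
            restored = g.read()
```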
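8. Iterating Over Fixed-Sized Records

To iterate over fixed-size records or chunks of data, combine iter() with functools.partial().
from functools import partial

RECORD_SIZE = 32

# Open in binary mode so that read() returns bytes and the b'' sentinel can match.
with open('somefile.txt', 'rb') as f:
    records = iter(partial(f.read, RECORD_SIZE), b'')
    for r in records:
        ...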
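To see what the iterator yields, the same pattern can be run against in-memory data. In this sketch the record size and the sample bytes are invented, and io.BytesIO stands in for a file opened in rb mode.

```python
import io
from functools import partial

RECORD_SIZE = 4                     # hypothetical record length
f = io.BytesIO(b'AAAABBBBCC')       # stands in for open(..., 'rb')

# iter(callable, sentinel) keeps calling f.read(RECORD_SIZE) until it returns b''.
records = list(iter(partial(f.read, RECORD_SIZE), b''))
```

The last record is shorter than RECORD_SIZE when the data length is not an exact multiple of it.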
9. Memory Mapping Binary Files

(1) First prepare a binary file:
size = 100000
with open('data', 'wb') as f:
    f.seek(size - 1)
    f.write(b'\x00')
(2) The mapping function:
import os
import mmap

def memory_map(filename, access=mmap.ACCESS_WRITE):
    size = os.path.getsize(filename)
    fd = os.open(filename, os.O_RDWR)
    return mmap.mmap(fd, size, access=access)
(3) Reading and writing:
>>> m = memory_map('data')
>>> len(m)
100000
>>> m[0:10]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m[0]
0
>>> m[0:11] = b'Hello World'
>>> m.close()
(4) The mmap object returned by mmap() can also be used as a context manager:
with memory_map('data') as m:
    print(len(m))
    print(m[0:10])
(5) For read-only access, pass mmap.ACCESS_READ. To modify the data locally without writing the changes back to the original file, pass mmap.ACCESS_COPY.

(6) Memory mapping a file does not read the entire file into memory. In other words, the file is not copied into some kind of memory buffer or array.
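The difference between ACCESS_COPY and ACCESS_WRITE shows up in what reaches the file on disk. The sketch below assumes a small made-up file named data inside a temporary directory, and maps the whole file by passing a length of 0.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data')          # hypothetical file name
    with open(path, 'wb') as f:
        f.write(b'\x00' * 16)

    with open(path, 'r+b') as f:
        # ACCESS_COPY: writes affect memory only and never reach the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            m[0:5] = b'Hello'
            in_memory = bytes(m[0:5])

    with open(path, 'rb') as f:
        on_disk_after_copy = f.read(5)

    with open(path, 'r+b') as f:
        # ACCESS_WRITE: changes are written through to the underlying file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[0:5] = b'Hello'

    with open(path, 'rb') as f:
        on_disk_after_write = f.read(5)
```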

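11. Manipulating Pathnames

Use the functions in os.path to find the base filename, directory name, absolute path, and related information.

(1) Get the last component of the path:

>>> os.path.basename(path)    # path = '/User/firefly/Data' => 'Data'

(2) Get the directory name:

>>> os.path.dirname(path)

(3) Join path components:

>>> os.path.join('tmp', 'data')

(4) On Unix and Windows, replace a leading ~ or ~user in the argument with that user's home directory:

>>> os.path.expanduser(path)    # path = '~/Data/firefly/Data/data.csv' => '/Users/beazley/...'

(5) Split off the file extension:

>>> os.path.splitext(path)    # path = '~/Data/data.csv' => ('~/Data/data', '.csv')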
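The functions above operate purely on strings, so they can be combined without the path existing on disk. A small sketch with a hypothetical path:

```python
import os.path

path = '/home/user/Data/data.csv'   # hypothetical path, need not exist

base = os.path.basename(path)       # last component
directory = os.path.dirname(path)   # everything before the last component
root, ext = os.path.splitext(base)  # split the name from its extension

# join() inserts the platform's separator between the components.
joined = os.path.join('tmp', 'data', base)
```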

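12. Testing for the Existence of a File

To test whether a file or directory exists:

>>> os.path.exists('/etc/passwd')

To test what kind of file it is, use isfile(), isdir(), and islink(). realpath() returns the file that a symbolic link points to.

To get the file size or modification time, use getsize() and getmtime().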
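These tests behave differently for a missing path: the predicates return False, while the metadata functions raise an exception. The sketch below uses made-up file names inside a temporary directory.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.txt')      # hypothetical file name
    missing = os.path.join(tmpdir, 'missing.txt')  # never created

    with open(path, 'wb') as f:
        f.write(b'12345')

    exists = os.path.exists(path)
    is_file = os.path.isfile(path)
    is_dir = os.path.isdir(tmpdir)
    size = os.path.getsize(path)                    # size in bytes
    modified = time.ctime(os.path.getmtime(path))   # mtime as readable text

    # The predicates simply return False for a missing path...
    missing_exists = os.path.exists(missing)
    # ...but getsize() and getmtime() raise OSError instead.
    try:
        os.path.getsize(missing)
        raised = False
    except OSError:
        raised = True
```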

13. Getting a Directory Listing

Use os.listdir() to get the list of files in a directory.
import os

# Regular files only
names = [name for name in os.listdir('somedir')
         if os.path.isfile(os.path.join('somedir', name))]

# Directories only
names = [name for name in os.listdir('somedir')
         if os.path.isdir(os.path.join('somedir', name))]

# Filter by file name
names = [name for name in os.listdir('somedir')
         if name.endswith('.py')]

# Pattern matching with glob or fnmatch
import glob
pyfiles = glob.glob('somedir/*.py')

from fnmatch import fnmatch
pyfiles = [name for name in os.listdir('somedir')
           if fnmatch(name, '*.py')]
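One detail worth checking is that os.listdir() returns bare names while glob.glob() returns paths that include the directory. The sketch below builds a small temporary directory with invented file names and also collects sizes through os.stat().

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    for name in ('a.py', 'b.py', 'notes.txt'):   # hypothetical file names
        with open(os.path.join(tmpdir, name), 'w') as f:
            f.write('x')
    os.mkdir(os.path.join(tmpdir, 'sub'))

    # os.listdir() returns bare names of files, subdirectories, links, and so on.
    everything = sorted(os.listdir(tmpdir))

    # glob.glob() returns paths that include the directory part.
    pyfiles = sorted(os.path.basename(p)
                     for p in glob.glob(os.path.join(tmpdir, '*.py')))

    # os.stat() supplies metadata such as the size of each entry.
    sizes = {name: os.stat(os.path.join(tmpdir, name)).st_size
             for name in pyfiles}
```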
　　

14. Bypassing Filename Encoding

>>> os.listdir(b'.')    # returns the filenames as bytes

>>> with open(b'jalapen\xcc\x83o.txt') as f:
...     print(f.read())

16. Adding or Changing the Encoding of an Already Open File

(1) Add text encoding and decoding to a file object that is already open in binary mode:
import urllib.request
import io

u = urllib.request.urlopen('http://www.baidu.com')
f = io.TextIOWrapper(u, encoding='utf8')
text = f.read()
(2) Change the encoding of sys.stdout:
>>> import sys, io
>>> sys.stdout.encoding
'UTF-8'
>>> sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='latin-1')
>>> sys.stdout.encoding
'latin-1'
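The same wrap-and-detach steps can be tried without a network connection or touching sys.stdout. In this sketch, io.BytesIO stands in for any binary stream, and the sample text is invented.

```python
import io

raw = io.BytesIO('Jalape\u00f1o\n'.encode('latin-1'))   # stands in for a binary stream

# Wrap the binary stream with a decoder.
f = io.TextIOWrapper(raw, encoding='latin-1')
text = f.read()

# detach() hands back the underlying binary layer and leaves f unusable,
# so the same bytes can be wrapped again with a different encoding.
binary = f.detach()
binary.seek(0)
g = io.TextIOWrapper(binary, encoding='utf-8', errors='replace')
reread = g.read()
```

Decoding the Latin-1 bytes as UTF-8 fails for the accented character, which is why the second read uses errors='replace'.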
17. Writing Bytes to a Text File

(1) To write raw bytes to a file opened in text mode, write the byte data directly to the file's underlying buffer.

>>> sys.stdout.buffer.write(b'Hello\n')
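The buffer attribute exists on ordinary text files too, not just on sys.stdout. The sketch below assumes a made-up file name in a temporary directory, and flushes the text layer first so the two kinds of writes land in order.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mixed.txt')     # hypothetical file name

    # newline='' turns off newline translation so the bytes on disk are predictable.
    with open(path, 'wt', encoding='utf-8', newline='') as f:
        f.write('text line\n')
        f.flush()                       # push pending text down before bypassing the text layer
        f.buffer.write(b'raw bytes\n')  # goes straight to the binary layer

    with open(path, 'rb') as f:
        content = f.read()
```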

19. Making Temporary Files and Directories

(1) Create a temporary file:
from tempfile import TemporaryFile

with TemporaryFile('w+t') as f:
    f.write('Hello World\n')
    f.write('Testing\n')
    f.seek(0)          # rewind before reading the data back
    data = f.read()
(2) Keep the temporary file after it is closed:
from tempfile import NamedTemporaryFile

with NamedTemporaryFile('w+t', delete=False) as f:
    print('filename is:', f.name)
(3) Create a temporary directory:
from tempfile import TemporaryDirectory

with TemporaryDirectory() as dirname:
    print('dirname is:', dirname)
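What happens to these objects after the with block differs. The following sketch, with an invented file name scratch.txt, checks that a file created with delete=False survives and that a temporary directory does not.

```python
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

# delete=False keeps the file after the with block; remove it manually.
with NamedTemporaryFile('w+t', delete=False) as f:
    f.write('Hello World\n')
    kept_name = f.name
still_there = os.path.exists(kept_name)
os.remove(kept_name)

# A temporary directory and everything in it is removed on exit.
with TemporaryDirectory() as dirname:
    inner = os.path.join(dirname, 'scratch.txt')   # hypothetical file name
    with open(inner, 'w') as g:
        g.write('data')
    existed_inside = os.path.exists(inner)
gone_after = not os.path.exists(dirname)
```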
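21. Serializing Python Objects

(1) Object => file

>>> pickle.dump(obj, f)    # f is a file opened in binary mode ('wb')

(2) File => object

>>> obj = pickle.load(f)    # f is a file opened in binary mode ('rb')

(3) Object => byte string

>>> s = pickle.dumps(obj)

(4) Byte string => object

>>> obj = pickle.loads(s)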
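The four calls above can be exercised in one round trip. In this sketch the dictionary is sample data, and io.BytesIO stands in for a file opened in wb or rb mode.

```python
import io
import pickle

data = {'name': 'ACME', 'shares': 50, 'prices': [91.5, 92.0]}   # sample object

# Object <=> byte string
blob = pickle.dumps(data)
from_bytes = pickle.loads(blob)

# Object <=> file; BytesIO stands in for a file opened in 'wb'/'rb' mode.
f = io.BytesIO()
pickle.dump(data, f)
f.seek(0)
from_file = pickle.load(f)
```

Unpickling produces a new, equal object rather than returning the original one.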

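Certain kinds of objects cannot be pickled. These are typically objects that involve some kind of external system state, such as open files, open network connections, threads, processes, and stack frames.

A class can work around these limitations by providing __getstate__() and __setstate__() methods.
# countdown.py
import time
import threading

class Countdown:
    def __init__(self, n):
        self.n = n
        self.thr = threading.Thread(target=self.run)
        self.thr.daemon = True
        self.thr.start()

    def run(self):
        while self.n > 0:
            print('T-minus', self.n)
            self.n -= 1
            time.sleep(5)

    def __getstate__(self):
        # Save only the counter; the thread itself cannot be pickled.
        return self.n

    def __setstate__(self, n):
        # Re-running __init__ restarts the countdown thread.
        self.__init__(n)
An experiment:
>>> import countdown
>>> c = countdown.Countdown(30)
>>> T-minus 30
...
>>> f = open('cstate.p', 'wb')
>>> import pickle
>>> pickle.dump(c, f)
>>> f.close()
Exit Python, start it again, and load the file:
>>> import pickle
>>> f = open('cstate.p', 'rb')
>>> pickle.load(f)
<countdown.Countdown object at 0x10069e2d0>
T-minus 19
T-minus 18
...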
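A common variation keeps most of the instance dictionary and drops only the attribute that cannot be pickled. The Counter class below is a hypothetical example written for this sketch, not code from the original notes; it holds a threading.Lock, which pickle rejects.

```python
import pickle
import threading

class Counter:
    """Hypothetical class holding a lock, which pickle cannot serialize."""
    def __init__(self, n):
        self.n = n
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable attribute.
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

c = Counter(10)
restored = pickle.loads(pickle.dumps(c))
```

Returning a dictionary from __getstate__() has a practical advantage over a bare value: the state stays truthy even when the counter reaches 0, so __setstate__() is still called on load.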

Original article: https://www.cnblogs.com/5poi/p/11512955.html