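There are three common ways to iterate over the rows of a pandas DataFrame.
- iterrows(): returns the index and the row as separate variables, but is significantly slower
- itertuples(): faster than iterrows(), but returns the index together with the row values; element [0] of each tuple is the index
- zip: the fastest, but does not by itself give access to the row index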
import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

0. for i in df: is not a way to iterate over rows
for i in df:
    print(i)  # prints the column labels 'a' and 'b', not the rows
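To make this concrete, here is a minimal sketch on a three-row frame (the name `small` and its values are made up for illustration). It shows that iterating over a DataFrame directly yields column labels, which can then be used to reach each column.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
small = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Iterating over the DataFrame itself yields the column labels.
labels = [label for label in small]
print(labels)  # ['a', 'b']

# Each label can still be used to look up the whole column.
for label in small:
    print(label, small[label].tolist())
```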

Precisely because for i in df does not iterate over rows, the following methods are worth examining.
1. iterrows(): returns the index and the row as separate variables, but is significantly slower
df.iterrows() yields one tuple per row: (index, Series)
count = 0
for i, r in df.iterrows():
    print(i, '-->', r, type(r))
    count += 1
    if count > 5:
        break
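One consequence of getting each row as a Series is that a row has a single dtype, so values may be upcast when the columns have different types. The sketch below illustrates this on a hypothetical two-column frame (`mixed` is an assumed name, not part of the example above).

```python
import pandas as pd

# Illustrative frame with one integer column and one float column.
mixed = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})

# Take the first (index, Series) pair produced by iterrows().
index, row = next(mixed.iterrows())

print(mixed['n'].dtype)  # integer dtype of the original column
print(row.dtype)         # float64: the row Series holds a single dtype
print(type(row['n']))    # the integer value has been upcast to a float
```

If exact dtypes matter while iterating, itertuples() is usually the safer choice, since it does not build a Series per row.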

2. itertuples(): faster than iterrows(), but returns the index together with the row values; element [0] of each tuple is the index
count = 0
for tup in df.itertuples():
    print(tup[0], '-->', tup[1:], type(tup[1:]))
    count += 1
    if count > 5:
        break
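Besides positional access, the tuples produced by itertuples() are namedtuples, so fields can also be read by name. The following sketch, on a small made-up frame, shows attribute access along with the `index` and `name` parameters of itertuples().

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
small = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# Default behaviour: namedtuples with the index in the Index field.
first = next(small.itertuples())
print(type(first).__name__)           # Pandas
print(first.Index, first.a, first.b)  # same values as first[0], first[1], first[2]

# index=False drops the index; name=None yields plain tuples instead.
plain = list(small.itertuples(index=False, name=None))
print(plain)
```

Attribute access only works for column names that are valid Python identifiers; otherwise positional access, as in the loop above, remains available.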

3. zip: the fastest, but does not by itself give access to the row index
count = 0
for tup in zip(df['a'], df['b']):
    print(tup, type(tup))
    count += 1
    if count > 5:
        break
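If the row index is needed together with zip, one option is to pass df.index to zip as an extra iterable. This is a minimal sketch on a made-up frame with string labels, not something the timing code in this post measures.

```python
import pandas as pd

# Illustrative frame with a non-default index so the labels are visible.
small = pd.DataFrame({'a': [1, 2], 'b': [10, 20]}, index=['x', 'y'])

# Zipping the index alongside the columns restores access to the row label.
rows = list(zip(small.index, small['a'], small['b']))
for label, a, b in rows:
    print(label, a, b)
```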

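4. Performance comparison
import time

import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

# iterrows(): each row is built as a Series
list1 = []
start = time.perf_counter()
for i, r in df.iterrows():
    list1.append((r['a'], r['b']))
print("iterrows elapsed:  ", time.perf_counter() - start)

# itertuples(): ir[0] is the index, ir[1] and ir[2] are columns a and b
list1 = []
start = time.perf_counter()
for ir in df.itertuples():
    list1.append((ir[1], ir[2]))
print("itertuples elapsed:", time.perf_counter() - start)

# zip: iterate over the two columns directly
list1 = []
start = time.perf_counter()
for r in zip(df['a'], df['b']):
    list1.append((r[0], r[1]))
print("zip elapsed:       ", time.perf_counter() - start)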
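A single timed run can be noisy, so as a possible alternative here is a sketch that wraps each approach in a function and times it with the standard-library timeit module. The function names, the smaller frame size, and the repeat counts are all assumptions chosen so the sketch runs quickly; absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Smaller frame than in the text so the sketch finishes quickly.
df_bench = pd.DataFrame({'a': range(1000), 'b': range(1000, 2000)})

def via_iterrows(frame):
    # Build (a, b) pairs from the Series returned for each row.
    return [(r['a'], r['b']) for _, r in frame.iterrows()]

def via_itertuples(frame):
    # Position 0 is the index; positions 1 and 2 are columns a and b.
    return [(t[1], t[2]) for t in frame.itertuples()]

def via_zip(frame):
    # Iterate over the two columns in parallel.
    return list(zip(frame['a'], frame['b']))

# All three approaches should produce the same list of pairs.
assert via_iterrows(df_bench) == via_itertuples(df_bench) == via_zip(df_bench)

for func in (via_iterrows, via_itertuples, via_zip):
    # Best of 3 repeats, 5 calls per repeat.
    best = min(timeit.repeat(lambda: func(df_bench), number=5, repeat=3))
    print(f"{func.__name__}: {best / 5:.6f} s per call")
```

Taking the minimum over several repeats reduces the influence of unrelated background activity on the measurement.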
