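Handling time objects in pandas
Time series types
Timestamp: a specific moment in time
Fixed period: e.g. July 2017
Time interval: a start time and an end time
Handling time objects with the Python standard library: datetime
Flexible parsing of time objects: dateutil
dateutil.parser.parse()
Handling time objects in bulk: pandas
pd.to_datetime()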
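The three kinds of time value listed above map roughly onto pandas objects. The following is a minimal sketch; picking `pd.Timestamp`, `pd.Period` and `pd.Interval` as the representatives is an assumption made for illustration, not something these notes state.

```python
import pandas as pd

# Timestamp: a specific moment
ts = pd.Timestamp('2017-07-15 08:30')

# Fixed period: the whole of July 2017
period = pd.Period('2017-07', freq='M')

# Interval: a start time and an end time (closed on the right by default)
interval = pd.Interval(pd.Timestamp('2017-07-01'), pd.Timestamp('2017-07-31'))

print(ts)
print(period.start_time, period.end_time)
print(ts in interval)
```

A period knows its own boundaries through `start_time` and `end_time`, while an interval only answers containment and length questions.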
In [11]:
import datetime
import pandas as pd
import numpy as np
In [2]:
datetime.datetime.strptime('2010-01-01','%Y-%m-%d')
Out[2]:
In [3]:
datetime.datetime.strptime('2010/01/01','%Y/%m/%d')
Out[3]:
In [4]:
import dateutil.parser  # import the submodule explicitly; a bare `import dateutil` may not expose it
In [6]:
dateutil.parser.parse('03/08/2020 14:35')  # read month-first by default: 8 March 2020
Out[6]:
In [7]:
dateutil.parser.parse('2020-Mar-8')
Out[7]:
In [13]:
pd.to_datetime(['2001-01-01','2020/Mar/08'])  # mixed formats; pandas 2.0+ may need format='mixed'
Out[13]:
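When the strings share one layout, passing an explicit `format` to `pd.to_datetime` avoids guessing, and `errors='coerce'` turns unparseable entries into `NaT` instead of raising. A small sketch with made-up input strings:

```python
import pandas as pd

# All strings follow the same layout, so state it explicitly
dates = pd.to_datetime(['2001-01-01', '2020-03-08'], format='%Y-%m-%d')

# errors='coerce' replaces entries that do not match with NaT
with_bad = pd.to_datetime(['2001-01-01', 'not a date'],
                          format='%Y-%m-%d', errors='coerce')

print(dates)
print(with_bad)
```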
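pandas: handling time objects
Generate an array of time objects | pd.date_range |
---|---|
start | Start time |
end | End time |
periods | Number of periods to generate |
freq | Frequency, default 'D' (calendar day); other options include H(our), W(eek), B(usiness), S(emi-)M(onth), M(onth), (min)T(es), S(econd), A(year) |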
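The parameters in the table combine in more than one way: a range can be pinned down by `start` and `end`, or by `start`, `periods` and `freq`. A short sketch with arbitrarily chosen dates:

```python
import pandas as pd

# start + end, daily frequency by default
by_end = pd.date_range(start='2019-07-23', end='2019-07-29')

# start + periods + freq describes the same seven days
by_periods = pd.date_range(start='2019-07-23', periods=7, freq='D')

# Anchored weekly frequency: every Monday on or after the start date
mondays = pd.date_range('2019-07-23', periods=3, freq='W-MON')

print(by_end.equals(by_periods))
print(mondays)
```

Because 23 July 2019 is a Tuesday, the first Monday produced is 29 July 2019.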
In [15]:
pd.date_range('2019/7/23','2021/7/23')
Out[15]:
In [16]:
pd.date_range('2019-7-23',periods=720)
Out[16]:
In [17]:
pd.date_range('2019/7/23',periods=30,freq='M')  # month end; pandas 2.2+ prefers 'ME'
Out[17]:
In [18]:
pd.date_range('2019-7-23',periods=30,freq='W-MON')
Out[18]:
B business day frequency
C custom business day frequency (experimental)
D calendar day frequency
W weekly frequency
M month end frequency
SM semi-month end frequency (15th and end of month)
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
SMS semi-month start frequency (1st and 15th)
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter end frequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA business year end frequency
AS year start frequency
BAS business year start frequency
BH business hour frequency
H hourly frequency
T, min minutely frequency
S secondly frequency
L, ms milliseconds
U, us microseconds
N nanoseconds
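A few of the aliases above in use. Newer pandas releases (2.2 and later, as far as I know) rename several aliases, so this sketch deliberately sticks to ones that have stayed stable: `B`, `MS`, `QS` and a multiple of `D`.

```python
import pandas as pd

start = '2020-01-01'  # a Wednesday

business_days = pd.date_range(start, periods=5, freq='B')    # skips the weekend
month_starts = pd.date_range(start, periods=3, freq='MS')    # first day of each month
quarter_starts = pd.date_range(start, periods=2, freq='QS')  # first day of each quarter
every_other_day = pd.date_range(start, periods=3, freq='2D') # a multiple of an alias

for idx in (business_days, month_starts, quarter_starts, every_other_day):
    print(idx)
```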
In [23]:
pd.date_range('2019-7-23',periods=60,freq='B') #B Business Day
Out[23]:
In [24]:
dt = _  # in IPython/Jupyter, `_` holds the previous cell's output
dt[0]
Out[24]:
In [27]:
dt[0].to_pydatetime()
Out[27]:
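Elements of a `DatetimeIndex` are `pd.Timestamp` objects, which can be converted to and from the standard library's `datetime.datetime`. A minimal round-trip sketch:

```python
import datetime
import pandas as pd

idx = pd.date_range('2019-07-23', periods=3, freq='B')

first = idx[0]                 # pandas Timestamp
as_py = first.to_pydatetime()  # plain datetime.datetime
back = pd.Timestamp(as_py)     # and back to a Timestamp

print(type(first).__name__, type(as_py).__name__)
print(back == first)
```

`pd.Timestamp` subclasses `datetime.datetime`, so the conversion mostly matters when a library insists on the exact standard type.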
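pandas time series
A time series is a Series or DataFrame indexed by time objects.
When datetime objects are used as an index, they are stored in a DatetimeIndex object.
Special features of time series
Slice by passing a "year" or "year-month" string
Slice by passing a date range
Rich function support, e.g. resample and truncate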
In [29]:
sr = pd.Series(np.arange(100),index=pd.date_range('2020-3-8',periods=100))
In [30]:
sr
Out[30]:
In [31]:
sr.index
Out[31]:
In [32]:
sr['2020-3']
Out[32]:
In [33]:
sr['2020-3':'2020-4']
Out[33]:
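Partial-string selection can also be written with `.loc`, which makes it explicit that the lookup is label-based. The sketch below rebuilds the same 100-day series and counts what each selection returns:

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(100), index=pd.date_range('2020-03-08', periods=100))

march = sr.loc['2020-03']                     # every row in March 2020
march_april = sr.loc['2020-03':'2020-04']     # both end months are included

print(len(march), len(march_april))
```

The series starts on 8 March, so March contributes 24 rows and March plus April contributes 54.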
In [34]:
sr.resample('W').sum()
Out[34]:
In [36]:
sr.resample('M').sum()  # month-end bins; pandas 2.2+ prefers 'ME'
Out[36]:
In [37]:
sr.resample('M').mean()
Out[37]:
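To see what `resample` does to the bins, a smaller series is easier to check by hand. This sketch uses two full weeks starting on a Monday (dates chosen for illustration); `'W'` bins end on Sunday, and `'MS'` labels each month by its first day.

```python
import numpy as np
import pandas as pd

# 14 daily values, Monday 2 March 2020 through Sunday 15 March 2020
sr = pd.Series(np.arange(14), index=pd.date_range('2020-03-02', periods=14))

weekly = sr.resample('W').sum()     # one row per week, labelled by its Sunday
monthly = sr.resample('MS').mean()  # one row per month, labelled by its first day

print(weekly)
print(monthly)
```

The first week sums 0 through 6 (21) and the second sums 7 through 13 (70).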
In [38]:
sr.truncate(before='2020-4-1')
Out[38]:
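`truncate` drops everything before and/or after the given labels, keeping both boundary dates. A small sketch on a 14-day series with illustrative dates:

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(14), index=pd.date_range('2020-03-02', periods=14))

tail = sr.truncate(before='2020-03-10')                         # keep 10 March onward
middle = sr.truncate(before='2020-03-05', after='2020-03-07')   # keep 5 to 7 March

print(tail)
print(middle)
```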
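pandas file handling
Common data file format: csv (values separated by a delimiter)
Reading files with pandas: load data from a file name, a URL, or a file object
read_csv uses a comma as the default delimiter
read_table uses a tab as the default delimiter
read_csv, read_table | Main parameters: |
---|---|
sep | Delimiter; regular expressions such as '\s+' are accepted |
header=None | Declares that the file has no header row |
names | Column names to use |
index_col | Column to use as the index |
skiprows | Rows to skip |
na_values | Strings to treat as missing values |
parse_dates | Which columns to parse as dates; a boolean (True parses the index) or a list |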
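Several of these parameters can be tried without a file on disk, since `read_csv` also accepts a file object. The sketch below feeds it an in-memory CSV; the column names and values are invented for illustration and are not taken from `600519.csv`.

```python
import io
import pandas as pd

# Hypothetical data standing in for a real file
raw = io.StringIO(
    "date,open,close\n"
    "2020-03-09,10.0,10.5\n"
    "2020-03-10,None,10.8\n"
)

df = pd.read_csv(
    raw,
    index_col='date',     # use the date column as the index
    parse_dates=True,     # parse the index as dates
    na_values=['None'],   # treat the string 'None' as missing
)

print(df)
print(df.index)
```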
In [39]:
pd.read_csv('600519.csv')
Out[39]:
In [40]:
pd.read_csv('600519.csv',index_col=0)
Out[40]:
In [41]:
pd.read_csv('600519.csv',index_col='date')
Out[41]:
In [42]:
df = pd.read_csv('600519.csv',index_col='date')
In [43]:
df.index[0]
Out[43]:
In [44]:
df.index
Out[44]:
In [46]:
pd.read_csv('600519.csv',index_col='date',parse_dates=True).index
Out[46]:
In [52]:
pd.read_csv('600519.csv',index_col='date',parse_dates=['date']).index
Out[52]:
In [55]:
pd.read_csv('600519.csv',header=None,names=list('abcdefgh'))
Out[55]:
In [56]:
pd.read_csv('600519.csv',header=None,skiprows=[1,2,3])
Out[56]:
In [58]:
pd.read_csv('600519.csv',header=None,skiprows=[1,2,3],na_values=['None'])
Out[58]:
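For whitespace-separated data with no header row, `sep`, `header=None`, `names` and `skiprows` work together. A minimal sketch on invented in-memory data (the comment line and column names are hypothetical):

```python
import io
import pandas as pd

raw = io.StringIO(
    "# exported 2020-03-11\n"
    "2020-03-09   10.0   10.5\n"
    "2020-03-10   10.4   10.8\n"
)

df = pd.read_csv(
    raw,
    sep=r'\s+',                       # any run of whitespace is a delimiter
    header=None,                      # the data has no header row
    names=['date', 'open', 'close'],  # supply column names instead
    skiprows=1,                       # skip the leading comment line
)

print(df)
```

Passing an integer to `skiprows` skips that many lines from the top, whereas a list such as `[1, 2, 3]` skips those specific line numbers.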