DA
From Jack's Lab
Revision as of 11:18, 17 February 2020 (Mon)
1 Overview
2 Descriptive Statistics
import pandas as pd
from scipy import stats
>>> p = pd.read_csv('../DA/data/da01-press.csv', index_col='time', date_parser=lambda x: pd.to_datetime(float(x)))
>>> p = p.drop(columns=['name'])
>>> p.mean()
Press 3685.248525
>>> stats.trim_mean(p, 0.1) # stats.trimboth(p['Press'],0.1).mean()
array([3680.07826531])
>>> p.median()
Press 3677.105
>>> p.describe()
Press
count 122.000000
mean 3685.248525
std 123.990939
min 3484.480000
25% 3618.402500
50% 3677.105000
75% 3747.742500
max 4672.060000
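The session above reports a mean (3685.25) that sits above both the trimmed mean (3680.08) and the median (3677.11), which is what a few large readings such as the 4672.06 maximum would produce. The following is a minimal sketch of that effect, assuming a small made-up array named press (hypothetical values, not taken from da01-press.csv). It also checks that stats.trim_mean gives the same result as averaging the output of stats.trimboth, the form mentioned in the comment above.

```python
import numpy as np
from scipy import stats

# Hypothetical readings: nine typical values plus one large outlier.
press = np.array([3600.0, 3620.0, 3640.0, 3660.0, 3680.0,
                  3700.0, 3720.0, 3740.0, 3760.0, 4700.0])

# The plain mean is pulled upward by the outlier.
plain_mean = press.mean()

# Cut 10% of the observations from each end (one value per side here),
# then average what remains.
trimmed = stats.trim_mean(press, 0.1)

# Equivalent form: trim explicitly, then take the mean.
trimmed_explicit = stats.trimboth(press, 0.1).mean()

# The median ignores the size of the outlier entirely.
median = np.median(press)

print(plain_mean, trimmed, trimmed_explicit, median)
```

With these numbers the trimmed mean and the median agree, while the plain mean is noticeably higher.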
3 Exploring the Data Distribution
3.1 bar
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
hb = pd.read_csv("../DA/data/ncp-hb-new.csv", index_col='Date', parse_dates=True, skipinitialspace=True)
cn = pd.read_csv("../DA/data/ncp-cn-new.csv", index_col='Date', parse_dates=True, skipinitialspace=True)
xhb = cn-hb
plt.gca().xaxis.set_major_formatter(mdate.DateFormatter('%m-%d'))
#plt.bar(hb.index, hb['Confirmed'].values)
plt.bar(xhb.index, xhb['Confirmed'].values)
plt.show()
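The subtraction cn-hb above relies on pandas aligning both frames on their index and column labels before subtracting. A minimal sketch of that behavior, assuming two tiny frames with made-up counts (hypothetical values, not the contents of the ncp-*.csv files) where one frame is missing a date:

```python
import pandas as pd

dates = pd.to_datetime(['2020-02-01', '2020-02-02', '2020-02-03'])

# Hypothetical cumulative counts, for illustration only.
cn = pd.DataFrame({'Confirmed': [100, 150, 210]}, index=dates)
# The second frame has no row for the last date.
hb = pd.DataFrame({'Confirmed': [70, 100]}, index=dates[:2])

# Subtraction aligns on index and column labels; a date present in only
# one frame yields NaN instead of raising an error.
xhb = cn - hb
print(xhb)

# Keep only the dates available in both frames before plotting.
xhb_clean = xhb.dropna()
```

If the two CSV files do not cover exactly the same dates, the NaN rows show up as gaps in the bar chart, so dropping or inspecting them first is one reasonable option.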
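Show the Hubei and non-Hubei bar charts together:
xhb_cf = xhb['Confirmed'].values  # assumed definition: confirmed counts outside Hubei
plt.bar(xhb.index, xhb_cf, align='edge', width=0.3, label='Outside Hubei')
plt.bar(hb.index, hb['Confirmed'].values, align='edge', width=-0.4, label='Hubei')
plt.legend()
plt.gcf().autofmt_xdate()
plt.show()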
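The side-by-side layout comes from align='edge' combined with the sign of width: a positive width draws the bar to the right of its x position and a negative width draws it to the left. The sketch below reproduces this with made-up numbers (the hubei and outside lists are hypothetical) and uses the non-interactive Agg backend so that it runs without a display. In an interactive session, drop the backend line and call plt.show().

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
import pandas as pd

dates = pd.date_range('2020-02-01', periods=5, freq='D')
hubei = [10, 14, 19, 25, 31]    # hypothetical counts
outside = [5, 6, 8, 9, 11]      # hypothetical counts

fig, ax = plt.subplots()

# Positive width with align='edge': each bar extends to the right of its date.
right = ax.bar(dates, outside, align='edge', width=0.4, label='Outside Hubei')
# Negative width with align='edge': each bar extends to the left of its date.
left = ax.bar(dates, hubei, align='edge', width=-0.4, label='Hubei')

# Month-day tick labels, slanted so adjacent dates do not overlap.
ax.xaxis.set_major_formatter(mdate.DateFormatter('%m-%d'))
ax.legend()
fig.autofmt_xdate()
```

Because the x axis is in days, a width of 0.4 covers 40% of one day on each side, leaving a gap between neighboring dates.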
4 Time Series Analysis
>>> x = pd.date_range('2020-1-9','2020-2-15',freq='1d')
>>> print(x)
DatetimeIndex(['2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',
'2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
'2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20',
'2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24',
'2020-01-25', '2020-01-26', '2020-01-27', '2020-01-28',
'2020-01-29', '2020-01-30', '2020-01-31', '2020-02-01',
'2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05',
'2020-02-06', '2020-02-07', '2020-02-08', '2020-02-09',
'2020-02-10', '2020-02-11', '2020-02-12', '2020-02-13',
'2020-02-14', '2020-02-15'],
dtype='datetime64[ns]', freq='D')
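One possible use of such a range is as a complete daily index onto which sparse observations are placed. The sketch below is only an illustration under that assumption: the obs series and its values are hypothetical, and linear interpolation is one of several ways to fill the gaps.

```python
import pandas as pd

# Same range as above; both endpoints are included.
x = pd.date_range('2020-1-9', '2020-2-15', freq='1d')
n_days = len(x)

# Hypothetical sparse observations on two of those days.
obs = pd.Series([1.0, 4.0],
                index=pd.to_datetime(['2020-01-09', '2020-01-12']))

# Reindexing onto the full range inserts NaN for every missing day.
daily = obs.reindex(x)

# Linear interpolation fills the gap between the two known values;
# days after the last observation carry its value forward.
filled = daily.interpolate()

print(n_days)
print(filled.head())
```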
5 Reference
- Numpy API reference
- Pandas API reference
- matplotlib Gallery
- Change the Colors Changes to the default style
- matplotlib.pyplot.plot()
- matplotlib.pyplot.figure()
- Time Series Analysis Example
- Introduction to Data Science
- Data Visualization tutorial
- FlowingData Tutorials