== Descriptive statistics ==

=== Estimates of location ===

An intuitive example. The outlier 100 pulls the mean up to 15.625, while the trimmed mean (4.0) and the median (3.0) stay close to the bulk of the data:

<source lang=python>
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> d = np.array([1, 2, 2, 100, 3, 3, 6, 8])   # 100 is an outlier
>>> np.mean(d); stats.trim_mean(d, 0.2); np.median(d)
15.625
4.0
3.0
>>> plt.plot(d, 'o'); plt.show()
</source>

A real-world example:

<source lang=python>
>>> import pandas as pd
>>> from scipy import stats
>>> # The parser reads 'time' as epoch nanoseconds and adds 28800000000000 ns
>>> # (8 hours, presumably a UTC+8 offset).
>>> # Note: date_parser is deprecated since pandas 2.0.
>>> p = pd.read_csv('../DA/data/da01-press.csv', index_col='time',
...                 date_parser=lambda x: pd.to_datetime(float(x)+28800000000000))
>>> p = p.drop(columns=['name'])
>>> p.mean()
Press    3685.248525
>>> stats.trim_mean(p, 0.1)   # or: stats.trimboth(p['Press'], 0.1).mean()
array([3680.07826531])
>>> p.median()
Press    3677.105
>>> p.describe()
             Press
count   122.000000
mean   3685.248525
std     123.990939
min    3484.480000
25%    3618.402500
50%    3677.105000
75%    3747.742500
max    4672.060000
</source>

* [https://numpy.org/doc/1.18/reference/routines.statistics.html NumPy Statistics]
* [http://docs.scipy.org/doc/scipy/reference/stats.html SciPy Statistical functions]
* [https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html?highlight=statistics#descriptive-statistics Pandas Statistics]
<br>

=== Estimates of variability ===

<source lang=python>
>>> d = np.array([3, 1, 5, 3, 15, 6, 7, 2])
>>> # Constant arrays, so that each statistic is drawn as a horizontal line
>>> meanl = np.array([np.mean(d)]*len(d))
>>> trimmeanl = np.array([stats.trim_mean(d, 0.2)]*len(d))
>>> medianl = np.array([np.median(d)]*len(d))
>>> iqrv = np.array([stats.iqr(d)]*len(d))
>>> down = medianl - iqrv; up = medianl + iqrv   # band of median ± IQR
>>> plt.plot(d, 'o', color='C1')
>>> plt.plot(meanl, ':C2', label='Mean')
>>> plt.plot(trimmeanl, ':r', label='Trim mean')
>>> plt.plot(medianl, '-g', label='Median')
>>> plt.plot(up, '-C1'); plt.plot(down, '-C1')
>>> plt.legend(); plt.grid(); plt.show()
</source>
<br>

=== Estimates of correlation ===

<source lang=python>
>>> t1 = pd.read_csv('../DA/data/da02-temp-0948.csv', index_col='time',
...                  date_parser=lambda x: pd.to_datetime(float(x)+28800000000000))
>>> t2 = pd.read_csv('../DA/data/da02-temp-0019.csv', index_col='time',
...                  date_parser=lambda x: pd.to_datetime(float(x)+28800000000000))
>>> # t3 is not defined in the original listing; assumed here to be random
>>> # noise around t1's mean, used as an uncorrelated reference series
>>> t3 = np.random.normal(t1['Temp'].mean(), t1['Temp'].std(), len(t1))
>>> plt.plot(t1.index, t1['Temp'], label='t1')
>>> plt.plot(t2.index, t2['Temp'], label='t2')
>>> plt.plot(t1['Temp'].index, t3, label='t3')
>>> plt.legend(); plt.show()
</source>
<br>

==== Pearson ====

<source lang=python>
>>> t1 = np.array([1,2,3,4,3,2,1])
>>> t2 = np.array([2,4,6,8,6,4,2])     # t2 = 2 * t1
>>> t3 = np.random.normal(4, 1, 7)     # random, so results involving t3 vary per run
>>> stats.pearsonr(t1, t2)
(0.9999999999999998, 1.411088991461081e-39)
>>> stats.pearsonr(t2, t3)
(0.13788121813127208, 0.7681442360425068)
>>> stats.pearsonr(t1, t3)
(0.13788121813127208, 0.7681442360425068)
>>> t4 = np.array([1,2,3,4,3,2,1])     # identical to t1
>>> stats.pearsonr(t1, t4)
(0.9999999999999998, 1.411088991461081e-39)
</source>

stats.pearsonr() returns two values: the Pearson correlation coefficient (Pearson's correlation) and a p-value. The p-value is two-tailed: it is the probability, under the null hypothesis that the two samples are uncorrelated, of obtaining a correlation at least as extreme as the one observed. A small p-value therefore suggests that the observed correlation is unlikely to be due to chance. Note that pearsonr(t2, t3) and pearsonr(t1, t3) give the same result because t2 is t1 scaled by a positive constant, which does not change the Pearson coefficient.

[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html scipy.stats.pearsonr()]
<br>

==== Spearman ====

Spearman's rank correlation coefficient (Spearman's correlation coefficient for ranked data)

<source lang=python>
>>> print(stats.spearmanr([1,2,3,4,5], [5,6,7,8,7]))
SpearmanrResult(correlation=0.8207826816681233, pvalue=0.08858700531354381)
</source>

[https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html scipy.stats.spearmanr()]

p-value: the two-sided p-value; the null hypothesis is that the two sets of data are uncorrelated.
<br>
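To see how the two coefficients differ, the following minimal sketch compares them on data that is monotonic but not linear. The arrays <code>x</code> and <code>y</code> are made-up illustration data, not taken from the CSV files above. Because Spearman works on ranks, it should report a perfect monotonic relationship, while Pearson, which measures linear association, should come out noticeably below 1.

<source lang=python>
import numpy as np
from scipy import stats

# Hypothetical data: y increases with x, but exponentially rather than linearly
x = np.arange(1, 11)
y = np.exp(x)

# Pearson measures linear association on the raw values
r, r_pvalue = stats.pearsonr(x, y)

# Spearman measures monotonic association on the ranks of the values
rho, rho_pvalue = stats.spearmanr(x, y)

print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
</source>

When a relationship is expected to be monotonic but not necessarily linear, or when outliers are present, Spearman is often the more robust choice of the two.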