[pandas] Descriptive Statistics & Simple Plotting

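Method	Description
count	Number of non-NaN values
describe	Summary (descriptive) statistics
min, max	Minimum, maximum
idxmin, idxmax	Index label of the minimum/maximum (usable with loc)
quantile	Quantile
sum	Sum
mean	Mean
median	Median
mad	Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0)
var	Variance (sample, ddof=1 by default)
std	Standard deviation (sample, ddof=1 by default)
skew	Skewness (based on the third moment)
kurt	Kurtosis (based on the fourth moment; pandas reports excess kurtosis, so a normal distribution gives 0)
cumsum	Cumulative sum
cumprod	Cumulative product
cummin, cummax	Cumulative minimum, maximum
diff	Difference from the previous row
pct_change	Percentage change from the previous row (axis=0 by default)
corr	Correlation: df.corr() gives the correlation matrix, se1.corr(se2) a single coefficient; to correlate every column with one Series use df.corrwith(se1)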
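A few of the methods in the table, sketched on a small hand-made DataFrame (the columns `a` and `b` are hypothetical, chosen so `b = 2 * a` and the results can be checked by hand):

```python
import pandas as pd

# Hypothetical data: b is exactly twice a, so their correlation is 1
df = pd.DataFrame({"a": [1.0, 2.0, 4.0, 8.0], "b": [2.0, 4.0, 8.0, 16.0]})

cum = df["a"].cumsum()      # running total: 1, 3, 7, 15
prod = df["a"].cumprod()    # running product: 1, 2, 8, 64
diffs = df["a"].diff()      # difference from the previous row: NaN, 1, 2, 4
pct = df["a"].pct_change()  # relative change from the previous row: NaN, 1.0, 1.0, 1.0

corr_matrix = df.corr()           # 2x2 Pearson correlation matrix
pair = df["a"].corr(df["b"])      # one coefficient between two Series
by_column = df.corrwith(df["a"])  # each column of df correlated with one Series

print(cum.tolist(), prod.tolist())
print(diffs.tolist(), pct.tolist())
print(corr_matrix)
print(pair, by_column.tolist())
```

The first row of `diff` and `pct_change` is NaN because there is no previous row to compare with.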

Descriptive statistics

df.describe()
df.describe(include='all') # also summarizes categorical/string columns: count, unique, top (most frequent value), freq

Returns the summary statistics as a DataFrame (a Series when called on a Series).
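To see what `include='all'` adds, here is a sketch on a hypothetical two-column DataFrame with one numeric and one string column:

```python
import pandas as pd

# Hypothetical data: one numeric column, one string column
df = pd.DataFrame({
    "score": [70, 85, 90, 85],
    "grade": ["B", "A", "A", "A"],
})

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # adds unique/top/freq for the string column

print(numeric_summary)
print(full_summary)
```

In `full_summary`, cells that do not apply to a column (for example `mean` for `grade`) are NaN.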

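df.info() # concise summary: dtypes, non-null counts, memory usage (printed; returns None)
df.shape # (number of rows, number of columns)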

max, min, mean, std, sum, etc.
return one statistic per column (axis=0 by default; pass axis=1 for one value per row).
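The effect of `axis` on these reductions, sketched with a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

col_sums = df.sum()        # one value per column (axis=0, the default)
row_sums = df.sum(axis=1)  # one value per row
col_max = df.max()
col_mean = df.mean()

print(col_sums, row_sums, col_max, col_mean, sep="\n")
```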

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(16).reshape(-1, 4), columns=list('wxyz'))
df.idxmax() # row label of the maximum in each column
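With the default integer index, `idxmax` looks like a position. A string index (hypothetical labels `a`, `b`, `c` here) shows that it actually returns a label, which can be passed straight to `.loc`:

```python
import pandas as pd

# Hypothetical data with a non-integer index
df = pd.DataFrame(
    {"w": [0.2, 0.9, 0.4], "x": [0.7, 0.1, 0.3]},
    index=["a", "b", "c"],
)

labels = df.idxmax()              # row label of each column's maximum
w_max = df.loc[labels["w"], "w"]  # the label works directly with .loc
print(labels, w_max, sep="\n")
```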

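add, sub, mul, div, mod
These are the method forms of +, -, *, /, %. Their axis, level and fill_value parameters control how the operands are aligned and how missing values are filled.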
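A minimal sketch of `fill_value` and `axis`, using hypothetical DataFrames `a` and `b` whose indexes only partly overlap:

```python
import numpy as np
import pandas as pd

# Hypothetical data: b has no row r3, and a has a NaN in r2
a = pd.DataFrame({"v": [1.0, np.nan, 3.0]}, index=["r1", "r2", "r3"])
b = pd.DataFrame({"v": [10.0, 20.0]}, index=["r1", "r2"])

# Plain "+": a NaN, or a label present on only one side, yields NaN
plain = a + b
# fill_value=0: a value missing on ONE side is treated as 0 before adding;
# if both sides were missing, the result would still be NaN
filled = a.add(b, fill_value=0)

# axis=0 aligns the Series with the row index, so each row has its own mean subtracted
df = pd.DataFrame({"x": [1.0, 2.0], "y": [10.0, 20.0]})
centered = df.sub(df.mean(axis=1), axis=0)

print(plain, filled, centered, sep="\n\n")
```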

Data preparation

import pandas as pd
import numpy as np
df=pd.DataFrame(np.arange(16).reshape(-1,2),columns=list('wx'))
df

Passing a list of quantiles to pd.qcut: bin the data at those quantile cut points

from scipy.stats import norm
df = pd.DataFrame(norm().rvs(size=100), columns=list('w'))
pd.qcut(df.w, [0, 0.1, 0.5, 0.8, 0.9, 1]).value_counts() # quantile cut points must lie in [0, 1]
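The same binning, sketched with NumPy's random generator in place of `scipy.stats.norm` (a swap made only to avoid the SciPy dependency; the seed is arbitrary). With 100 distinct values, the cut points 0, 0.1, 0.5, 0.8, 0.9, 1 put 10, 40, 30, 10 and 10 values into the bins:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"w": rng.standard_normal(100)})

# 0 and 1 correspond to the minimum and maximum of the data
bins = pd.qcut(df["w"], [0, 0.1, 0.5, 0.8, 0.9, 1])
counts = bins.value_counts().sort_index()  # order by interval, not by frequency

# An integer q gives equal-frequency bins; labels=False returns bin numbers instead of Intervals
codes = pd.qcut(df["w"], 4, labels=False)

print(counts)
print(codes.value_counts())
```

`value_counts()` sorts by frequency, so `sort_index()` is what puts the intervals back in order.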

df.count()

Returns the number of non-NaN values in each column.
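`count` is the complement of counting NaNs. A sketch on a hypothetical DataFrame with some missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values
df = pd.DataFrame({"w": [1.0, np.nan, 3.0], "x": [np.nan, np.nan, 6.0]})

non_null = df.count()       # non-NaN values per column: w -> 2, x -> 1
missing = df.isna().sum()   # NaN values per column:     w -> 1, x -> 2
per_row = df.count(axis=1)  # non-NaN values per row:    1, 0, 2

# count + missing always equals the number of rows
print((non_null + missing).tolist(), len(df))
```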

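value_counts counts the distinct values of a Series. DataFrame.value_counts also exists (pandas 1.1+), but it counts unique rows, so select a column first to count that column's values: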

df.loc[:, 'x'].value_counts() # same as df['x'].value_counts(); uses the w/x DataFrame from "Data preparation"
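Series versus DataFrame `value_counts`, sketched on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"w": [1, 1, 2, 2], "x": ["a", "a", "a", "b"]})

per_value = df["x"].value_counts()             # each distinct value in one column
row_counts = df.value_counts()                 # each distinct (w, x) row; pandas >= 1.1
shares = df["x"].value_counts(normalize=True)  # relative frequencies instead of counts

print(per_value, row_counts, shares, sep="\n\n")
```

`row_counts` has a MultiIndex of (w, x) pairs, which is why the Series method is the one to use when counting a single column.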