IC formula
IC = Corr(Rank(Factor), Rank(Return))
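Here Rank(Factor) ranks the factor values across stocks, Rank(Return) ranks the stock returns, and Corr is the correlation coefficient. Because the correlation is taken on ranks, this is the rank IC (a Spearman correlation).
IC ranges from -1 to 1: 1 means the two rankings agree perfectly, -1 means they are exactly reversed, and 0 means no correlation. In general, a higher IC indicates that the factor explains the variation in stock returns better.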
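IR (Information Ratio) is the factor's information ratio: the mean IC scaled by how much the IC fluctuates from period to period. It is computed as:
IR = mean(per-period IC series) / std(per-period IC series)
Note: each rebalancing period yields one IC value, and the final IC is the mean of all of them. It describes the relationship between factor values and returns across all stocks in the market, so its emphasis is on correlation.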
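The IC and IR definitions above can be sketched on a small panel. This is a minimal illustration, not a reference implementation: the three rebalancing periods, the five stocks A to E, their numbers, and the helper name `rank_ic` are all made up for the example, and the standard deviation is the sample one (pandas default, `ddof=1`).

```python
import pandas as pd

def rank_ic(factor: pd.Series, ret: pd.Series) -> float:
    """Rank IC for one period: correlation of factor ranks with return ranks."""
    return factor.rank().corr(ret.rank())

# Hypothetical toy panel: rows are rebalancing periods, columns are stocks.
factors = pd.DataFrame(
    [[1, 2, 3, 4, 5],
     [5, 3, 1, 2, 4],
     [2, 1, 4, 3, 5]],
    columns=list("ABCDE"), dtype=float)
returns = pd.DataFrame(
    [[0.01, 0.02, 0.03, 0.04, 0.05],
     [0.04, 0.03, 0.01, 0.02, 0.05],
     [0.01, 0.02, 0.03, 0.04, 0.05]],
    columns=list("ABCDE"))

# One IC per rebalancing period.
ic_series = pd.Series(
    [rank_ic(factors.iloc[t], returns.iloc[t]) for t in range(len(factors))]
)

mean_ic = ic_series.mean()        # the reported IC
ir = mean_ic / ic_series.std()    # IR = mean(IC) / std(IC), sample std

print("IC series:", ic_series.round(4).tolist())
print("mean IC:", round(mean_ic, 4))
print("IR:", round(ir, 4))
```

In this toy panel the per-period ICs are 1.0, 0.9 and 0.8, so the mean IC is 0.9 and the IR is about 9. Real IC series are far noisier.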
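SR (Sharpe ratio) = (return - risk-free rate) / standard deviation of return
Note: the Sharpe ratio in a factor performance report is computed after grouping, for the long group or the short group. It describes only that group's performance, cannot represent all stocks in the market, and its emphasis is on absolute return.
So when IC/IR look good but the Sharpe ratio does not, the likely cause is that the factor's return is concentrated on the short side: for example, the smaller the factor value, the worse the return, while stocks with large factor values perform only moderately.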
import numpy as np
import pandas as pd
from scipy.stats import rankdata
# Suppose we have factor values and the corresponding returns
factor_data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
return_data = pd.Series([-1, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1])
# Rank the factor values and the returns
ranked_factor = rankdata(factor_data)
ranked_return = rankdata(return_data)
# Compute IC
IC = np.corrcoef(ranked_factor, ranked_return)[0, 1]
print("IC:", IC)
# Picking the group with the largest factor values earns only -0.1 every time, so its Sharpe ratio is bound to be poor
IC: 0.9999999999999999
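The toy example above computes only a single IC. To make the contrast with the Sharpe ratio concrete, the sketch below repeats the same cross-section over several periods and also computes the Sharpe ratio of the top group. Everything beyond the original ten returns is an assumption made for illustration: the per-period `market_shift` values, the choice of the top two stocks as the long group, and a risk-free rate of 0. The Sharpe ratio is per period, not annualized.

```python
import numpy as np
import pandas as pd

factor = pd.Series(np.arange(1, 11), dtype=float)   # factor values 1..10
base_return = np.linspace(-1.0, -0.1, 10)           # same returns as the toy example
market_shift = [0.00, 0.02, -0.02, 0.01]            # hypothetical per-period shift
risk_free = 0.0                                     # assumed risk-free rate

top_index = factor.nlargest(2).index                # long group: two largest factor values

ic_list, top_returns = [], []
for shift in market_shift:
    ret = pd.Series(base_return + shift)
    # Rank IC for this period
    ic_list.append(factor.rank().corr(ret.rank()))
    # Equal-weighted return of the long group in this period
    top_returns.append(ret.loc[top_index].mean())

top_returns = np.array(top_returns)
sharpe_top = (top_returns.mean() - risk_free) / top_returns.std(ddof=1)

print("IC per period:", np.round(ic_list, 4))
print("Mean return of top group:", round(top_returns.mean(), 4))
print("Sharpe of top group:", round(sharpe_top, 4))
```

The ranking is perfect in every period, so each IC is 1, yet the top group loses money on average and its Sharpe ratio is negative. The factor separates bad stocks from less bad ones without picking any winners.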
import pandas as pd
import numpy as np
import statsmodels.api as sm
# Three random series (no seed is set, so the printed values differ between runs)
data1 = np.random.rand(100)
data2 = np.random.rand(100)
data3 = np.random.rand(100)
print("Correlation of data2 and data3:", np.corrcoef(data2, data3)[0, 1])
Correlation of data2 and data3: -0.07471176558245518
# Build two factors that share the component data1, so they are collinear
factor1 = data1 + data2
factor2 = data1 + data3
print("Correlation of factor1 and factor2:", np.corrcoef(factor1, factor2)[0, 1])
Correlation of factor1 and factor2: 0.3972515837839092
# Linear regression of factor2 on factor1: the residual becomes the new factor value
X = sm.add_constant(factor1.copy())
y = factor2.copy()
model = sm.OLS(y, X).fit()
factor2_new = model.resid
print(np.corrcoef(factor1, factor2_new)[0, 1])
2.6236760527960756e-17
# What if there is a third factor?
# Construct the third factor
factor3 = data1 + data2 + data3
print("Correlation of factor3 and factor1:", np.corrcoef(factor1, factor3)[0, 1])
print("Correlation of factor3 and factor2:", np.corrcoef(factor2, factor3)[0, 1])
Correlation of factor3 and factor1: 0.8364920050174843
Correlation of factor3 and factor2: 0.8266069776918507
train_df = pd.DataFrame({"factor1":factor1, "factor2_new":factor2_new})
X = sm.add_constant(train_df)
y = factor3.copy()
model = sm.OLS(y, X).fit()
factor3_new = model.resid
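print("Correlation of factor3 and factor1 after neutralization:", np.corrcoef(factor1, factor3_new)[0, 1])
print("Correlation of factor3 and factor2 after neutralization:", np.corrcoef(factor2, factor3_new)[0, 1])
Correlation of factor3 and factor1 after neutralization: -3.9221239628054596e-16
Correlation of factor3 and factor2 after neutralization: 1.0268127416886495e-16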
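The two regressions above follow one pattern: regress each new factor on an intercept plus all previously processed factors and keep the residual. A minimal sketch of that pattern for any number of factors is given below. The function name `orthogonalize` is hypothetical, `numpy.linalg.lstsq` stands in for `sm.OLS`, and a seed is fixed only so the check is reproducible.

```python
import numpy as np

def orthogonalize(factors):
    """Sequentially residualize each column on an intercept plus all
    earlier (already residualized) columns. Column 0 is kept unchanged."""
    factors = np.asarray(factors, dtype=float)
    n_obs, n_factors = factors.shape
    result = np.empty_like(factors)
    result[:, 0] = factors[:, 0]
    for j in range(1, n_factors):
        # Design matrix: intercept + factors processed so far
        X = np.column_stack([np.ones(n_obs), result[:, :j]])
        beta, *_ = np.linalg.lstsq(X, factors[:, j], rcond=None)
        # Residual of the regression is the new factor value
        result[:, j] = factors[:, j] - X @ beta
    return result

# Same construction as above, with a fixed seed (an assumption for reproducibility)
rng = np.random.default_rng(0)
data1, data2, data3 = rng.random((3, 100))
raw = np.column_stack([data1 + data2, data1 + data3, data1 + data2 + data3])

ortho = orthogonalize(raw)
corr = np.corrcoef(ortho, rowvar=False)
print(np.round(corr, 6))   # off-diagonal entries are zero up to floating-point error
```

The result depends on column order: the first factor is left untouched and each later factor keeps only what the earlier ones cannot explain, so the factor to be preserved should go first.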