WorldQuant 101 Alphas: Factor Construction and Factor Testing


Author: bigquant
Reading time: 5 minutes
Published by BigQuant Quant Academy (宽客学院). Difficulty: ☆☆☆

Summary: This article shows how to use bigexpr expressions to construct the 101 alphas published by WorldQuant, and how to test them as factors.

Background

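WorldQuant's paper "101 Formulaic Alphas" gives explicit formulas for 101 alpha factors. Unlike traditional approaches, these alphas were found through data mining. Reportedly, about 80% of them were still effective and in use in live trading.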

On the BigQuant strategy research platform, you can build factors and label data quickly with expressions, without writing lengthy code by hand.

Introduction to expressions

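In machine learning and deep learning, factors (also called features) are a central concept, and feature selection is the key to developing an AI algorithm. Simple base factors are easy to build. For example, the past 5-day return is $close\_0/close\_5-1$. A factor such as the correlation between daily returns and volume over the past 5 days, however, takes a lot of code to compute by hand. That is why we designed the bigexpr expression engine.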

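bigexpr is an expression engine developed by BigQuant. You write a short expression, and the engine evaluates it over the data, so a wide range of computations needs no code of your own.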

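bigexpr is used throughout the platform. M.advanced_auto_labeler and M.derived_feature_extractor are both driven by bigexpr, so you can define labeling targets and perform derived feature extraction with expressions alone.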

For example, the factor mentioned above, the correlation between daily return and volume over the past 5 days, can be defined as:

$$correlation(close\_0/shift(close\_0,1)-1,volume\_0,5)$$
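The terms are as follows:
- $correlation$ computes the correlation coefficient.
- $close\_0$ is today's close.
- $shift(close\_0,1)$ is the previous day's close.
- $volume\_0$ is today's volume.

No extra code is needed: the expression alone defines the factor.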
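The plain-Python sketch below shows what the expression computes for a single stock. It first takes daily returns $close\_0/shift(close\_0,1)-1$, then computes their Pearson correlation with volume over a trailing 5-day window. It also computes the past 5-day return. The prices and volumes are made up. The NaN handling is an assumption, not necessarily how bigexpr treats gaps: incomplete windows or windows with missing values yield NaN.

```python
import math


def pct_change(xs):
    """Daily return close_0/shift(close_0,1)-1; the first day has no prior close, so NaN."""
    return [math.nan] + [cur / prev - 1 for prev, cur in zip(xs, xs[1:])]


def rolling_corr(xs, ys, d):
    """Pearson correlation of xs and ys over trailing windows of d days.

    Windows that are incomplete, contain NaN, or have zero variance give NaN.
    """
    out = []
    for t in range(len(xs)):
        if t + 1 < d:
            out.append(math.nan)
            continue
        wx, wy = xs[t + 1 - d:t + 1], ys[t + 1 - d:t + 1]
        if any(math.isnan(v) for v in wx + wy):
            out.append(math.nan)
            continue
        mx, my = sum(wx) / d, sum(wy) / d
        cov = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
        vx = sum((a - mx) ** 2 for a in wx)
        vy = sum((b - my) ** 2 for b in wy)
        out.append(cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else math.nan)
    return out


# Hypothetical daily data for one stock, oldest day first.
close = [10.0, 10.2, 10.1, 10.4, 10.8, 10.6, 11.0]
volume = [100.0, 120.0, 90.0, 150.0, 200.0, 110.0, 210.0]

# Past 5-day return for the latest day: close_0 / close_5 - 1.
ret_5d = close[-1] / close[-6] - 1

# correlation(close_0/shift(close_0,1)-1, volume_0, 5) for every day.
factor = rolling_corr(pct_change(close), volume, 5)
print(round(ret_5d, 4), [None if math.isnan(v) else round(v, 3) for v in factor])
```

The first five values are NaN. Four windows are too short, and the fifth still contains the undefined first-day return.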

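Function reference

The expression engine provides many simple functions. Some of them are explained below:

  • Functions fall into two groups, cross-sectional and time-series; time-series function names mostly start with $ts\_$

  • $abs(x)$ and $log(x)$ are the absolute value and the natural logarithm of $x$

  • $rank(x)$ is a stock's ascending rank by $x$ across the cross-section, normalized to the closed interval [0,1]

  • $delay(x,d)$ is the value of $x$ $d$ days ago

  • $delta(x,d)$ is the latest value of $x$ minus its value $d$ days ago

  • $correlation(x,y,d)$ and $covariance(x,y,d)$ are the Pearson correlation coefficient and the covariance of $x$ and $y$ over a window of $d$ days

  • $ts\_min(x,d)$, $ts\_max(x,d)$, $ts\_argmax(x,d)$, $ts\_argmin(x,d)$, $ts\_rank(x,d)$, $sum(x,d)$, $stddev(x,d)$ and similar functions do what their names suggest, over a window of the past $d$ days

  • $group\_mean(key, x)$ groups by both date and $key$ and averages $x$ within each group, for example:
    group_mean(industry_sw_level1_0, pe_ttm_0): the simple average pe_ttm of each industry

  • $ta\_sma(x, timeperiod)$ computes the simple moving average over $timeperiod$ periods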
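For intuition, the sketch below gives plain-Python versions of several functions above. Time-series functions take one stock's history, oldest first. `rank` and `group_mean` take one date's cross-section.

Some conventions are not specified above, so the sketch assumes them:
- `ts_rank` is the fraction of window values not above today's value.
- `ts_argmax` is the 0-based position of the maximum, counted from the oldest day in the window.
- `rank` maps ascending ranks linearly onto [0,1], with ties averaged.
- NaN handling is omitted.

```python
import math
from collections import defaultdict


def delay(xs, d):
    """delay(x, d): the value d days earlier; NaN where history is too short."""
    return [xs[t - d] if t >= d else math.nan for t in range(len(xs))]


def delta(xs, d):
    """delta(x, d): today's value minus the value d days earlier."""
    return [x - p for x, p in zip(xs, delay(xs, d))]


def ts_window(xs, d, fn):
    """Apply fn to each trailing window of d values (NaN until d values exist)."""
    return [fn(xs[t + 1 - d:t + 1]) if t + 1 >= d else math.nan for t in range(len(xs))]


def ts_min(xs, d):
    return ts_window(xs, d, min)


def ts_max(xs, d):
    return ts_window(xs, d, max)


def ts_argmax(xs, d):
    # Assumed convention: 0-based position of the max, counted from the oldest day.
    return ts_window(xs, d, lambda w: w.index(max(w)))


def ts_rank(xs, d):
    # Assumed convention: fraction of window values <= today's value.
    return ts_window(xs, d, lambda w: sum(v <= w[-1] for v in w) / len(w))


def rank(values):
    """One date's ascending ranks scaled to [0, 1]; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2  # 0-based average position of the tie group
        i = j + 1
    n = len(values)
    return [r / (n - 1) if n > 1 else 0.5 for r in ranks]


def group_mean(keys, values):
    """group_mean(key, x) on one date: each value becomes the mean of its key group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return [sums[k] / counts[k] for k in keys]


close = [10.0, 11.0, 9.0, 12.0, 13.0]  # hypothetical closes, oldest first
print(delta(close, 1), ts_rank(close, 3))
print(rank([3.0, 1.0, 2.0]))  # one date's cross-section
print(group_mean(["bank", "bank", "tech"], [6.0, 8.0, 20.0]))
```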

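Factors

The BigQuant platform provides more than 2,000 built-in factors, including basic information, price-volume, valuation, financial statement and technical indicator factors. A few of them are introduced below.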

Basic information factors

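  • list_days # days since listing
  • list_board_0 # listing board
  • company_found_date_0 # days since the company was founded
  • industry_sw_level1_0 # Shenwan (SW) level-1 industry
  • st_status_0 # ST status
  • in_sse50_0 # whether the stock is an SSE 50 index constituent
  • in_csi300_0 # whether the stock is a CSI 300 index constituent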

Price-volume factors

  • open_0 # today's open
  • open_1 # previous day's open
  • close_0 # today's close
  • high_0 # today's high
  • low_0 # today's low
  • volume_0 # today's volume
  • amount_0 # today's turnover (traded value)
  • adjust_factor_0 # price adjustment factor

Valuation factors

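  • market_cap_0 # total market capitalization
  • rank_market_cap_0 # rank of total market capitalization
  • pe_ttm_0 # P/E ratio (TTM)
  • rank_pe_ttm_0 # ascending percentile rank of P/E (TTM)
  • pe_lyr_0 # P/E ratio (LYR)
  • pb_lf_0 # P/B ratio (LF)
  • ps_ttm_0 # P/S ratio (TTM)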

Financial statement factors

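  • fs_net_profit_0 # net profit attributable to shareholders of the parent company
  • fs_net_profit_yoy_0 # year-over-year growth of net profit attributable to parent-company shareholders
  • fs_net_profit_qoq_0 # quarter-over-quarter growth of net profit attributable to parent-company shareholders
  • fs_roe_0 # return on equity (ROE)
  • fs_roa_0 # return on assets (ROA)
  • fs_gross_profit_margin_0 # gross profit margin
  • fs_net_profit_margin_0 # net profit margin
  • fs_eps_0 # earnings per share
  • fs_bps_0 # book value per share
  • fs_cash_ratio_0 # cash ratio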
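The lists above follow a suffix convention. `_0` is the current day and `_1` the previous day, so `close_5` is the close five trading days earlier. The hypothetical helpers below illustrate that lookup. They also illustrate the VWAP proxy `amount_0/volume_0*adjust_factor_0` that the alpha list uses throughout.

That proxy (turnover divided by volume, times the adjustment factor) is assumed to put VWAP on the same adjusted scale as `close_0`. This assumption is inferred from the alpha list, not documented here. `field_value` and `adjusted_vwap` are illustrative names, not platform APIs.

```python
import math
import re


def field_value(history, name):
    """Look up a suffixed field such as 'close_0' or 'open_1' (illustrative helper).

    history maps a base field name ('close', 'open', ...) to values ordered oldest -> newest;
    the numeric suffix counts trading days back from the latest day (0 = latest).
    """
    m = re.fullmatch(r"([a-z_]+?)_(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognised field name: {name!r}")
    base, lag = m.group(1), int(m.group(2))
    series = history[base]
    return series[-1 - lag] if lag < len(series) else math.nan


def adjusted_vwap(amount, volume, adjust_factor):
    """amount_0/volume_0*adjust_factor_0: turnover / volume, rescaled by the adjustment factor."""
    return amount / volume * adjust_factor


history = {"close": [10.0, 10.5, 11.0], "open": [9.8, 10.1, 10.7]}  # hypothetical, oldest first
print(field_value(history, "close_0"), field_value(history, "open_1"))  # latest close, previous open
print(adjusted_vwap(2_100_000.0, 200_000.0, 1.5))
```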

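Data labeling

Like factor construction, data labeling is a very important part of a machine-learning workflow. For more detail, see the Custom Labeling (自定义标注) documentation.

Before expressions were available, labeling was done mainly with fast_auto_label. Now it is done mainly with advanced_auto_label. The whole labeling logic lives in the label expression. You get the list of securities with the M.instruments module, then write the label expression in the M.advanced_auto_labeler module, as in the code below.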

m1 = M.instruments.v2(
    start_date='2014-01-01',
    end_date='2015-01-01',
    market='CN_STOCK_A',
    instrument_list='',
    max_count=0
)
m2 = M.advanced_auto_labeler.v2(
    instruments=m1.data,
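    label_expr="""# lines starting with # are comments
# 0. One expression per line, executed in order; from the second line on, the label field can be used
# 1. Available data fields: https://bigquant.com/docs/data_history_data.html
#   Add the benchmark_ prefix to use the corresponding benchmark data
# 2. Available operators and functions: `Expression engine <https://bigquant.com/docs/big_expr.html>`_

# Compute the return: close 5 days later (sell price) divided by the next day's open (buy price)
shift(close, -5) / shift(open, -1)

# Outlier handling: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Map the scores to classes; 20 classes are used here
all_wbins(label, 20)

# Filter out one-price limit-up days (label set to NaN; NaN labels are ignored in later processing and training)
where(shift(high, -1) == shift(low, -1), NaN, label)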
""",
    start_date='',
    end_date='',
    benchmark='000300.SHA',
    drop_na_label=True,
    cast_label_int=True
)

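The example code works as follows:

  • label_expr holds four expressions, one per line, executed in order; together they define how labels are produced (see the Expression engine documentation for details)
  • The future return (close 5 days ahead relative to the next day's open) is the raw basis for the label, computed quickly with a bigexpr expression
  • The clip and all_quantile functions handle outliers
  • The raw values are discretized; equal-width or equal-frequency binning can be used, each with its own trade-offs
  • The where function filters out samples on one-price limit-up days (the next day's high equals its low, so the stock could not actually be bought)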
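The four label steps can be mimicked in plain Python as below. This is a sketch under these assumptions:
- Each row holds precomputed future prices. `close_t5`, `open_t1`, `high_t1` and `low_t1` are made-up names.
- `all_quantile` is a linear-interpolation quantile over all rows.
- `all_wbins` splits the clipped range into equal-width bins numbered from 0.

A filtered row comes back as None, standing in for NaN.

```python
import math


def quantile(xs, q):
    """Linear-interpolation quantile of the non-NaN values in xs."""
    s = sorted(x for x in xs if not math.isnan(x))
    pos = (len(s) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def make_labels(rows, n_bins=20):
    """Mimic the four label_expr lines on hypothetical rows (one per stock-date)."""
    # 1. forward return ratio: close 5 days later / next day's open
    raw = [r["close_t5"] / r["open_t1"] for r in rows]
    # 2. clip at the 1% and 99% quantiles over all rows
    lo, hi = quantile(raw, 0.01), quantile(raw, 0.99)
    clipped = [min(max(x, lo), hi) for x in raw]
    # 3. equal-width bins over [lo, hi], numbered 0 .. n_bins-1
    width = (hi - lo) / n_bins
    bins = [min(int((x - lo) / width), n_bins - 1) if width > 0 else 0 for x in clipped]
    # 4. next day trades at one price (high == low): drop the label (None stands in for NaN)
    return [None if r["high_t1"] == r["low_t1"] else b for r, b in zip(rows, bins)]


rows = [
    {"close_t5": 10.0, "open_t1": 10.0, "high_t1": 10.2, "low_t1": 9.9},
    {"close_t5": 11.0, "open_t1": 10.0, "high_t1": 10.3, "low_t1": 9.8},
    {"close_t5": 9.0, "open_t1": 10.0, "high_t1": 10.1, "low_t1": 9.7},
    {"close_t5": 12.0, "open_t1": 10.0, "high_t1": 10.6, "low_t1": 9.9},
    {"close_t5": 10.5, "open_t1": 10.0, "high_t1": 11.0, "low_t1": 11.0},  # one-price day
]
labels = make_labels(rows)
print(labels)
```

As in the label expression, the quantiles are computed before the one-price filter, so filtered rows still influence the clip bounds.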

List of the 101 Alphas

Alpha_1   	'where(mean(amount_0,20)<volume_0,((-1*ts_rank(abs(delta(close_0,7)),60))*sign(delta(close_0,7))),-1)'

Alpha_2   'rank(ts_argmax(signedpower(where(close_0/shift(close_0,1)-1<0,std(close_0/shift(close_0,1)-1,20),close_0),2),5))-0.5'

Alpha_3     '-1*correlation(rank(delta(log(volume_0),2)),rank(((close_0-open_0)/open_0)),6)'

Alpha_4	    '-1*correlation(rank(open_0),rank(volume_0),10)'   

Alpha_5	    '-1*ts_rank(rank(low_0),9)'

Alpha_6	    'rank((open_0-(sum(amount_0/volume_0*adjust_factor_0,10)/10)))*(-1*abs(rank((close_0-amount_0/volume_0*adjust_factor_0))))'

Alpha_7	    '-1*correlation(open_0,volume_0,10)' 

Alpha_8	    'where(mean(amount_0,20)<volume_0,((-1*ts_rank(abs(delta(close_0,7)),60))*sign(delta(close_0,7))),-1)'  

Alpha_9	    '(-1*rank(((sum(open_0,5)*sum(close_0/shift(close_0,1)-1,5))-delay((sum(open_0,5)*sum(close_0/shift(close_0,1)-1,5)),10))))'

Alpha_10	'where(0<ts_min(delta(close_0,1),5),delta(close_0,1),where(ts_max(delta(close_0,1),5)<0,delta(close_0,1),-1*delta(close_0,1)))'

Alpha_11	'rank(where(0<ts_min(delta(close_0,1),4),delta(close_0,1),where(ts_max(delta(close_0,1),4)<0,delta(close_0,1),-1*delta(close_0,1))))'

Alpha_12	'(rank(ts_max((amount_0/volume_0*adjust_factor_0-close_0),3))+rank(ts_min((amount_0/volume_0*adjust_factor_0-close_0),3)))*rank(delta(volume_0,3))'

Alpha_13	'sign(delta(volume_0,1))*(-1*delta(close_0,1))'

Alpha_14	'-1*rank(covariance(rank(close_0),rank(volume_0),5))'

Alpha_15	'(-1*rank(delta(close_0/shift(close_0,1)-1,3)))*correlation(open_0,volume_0,10)'

Alpha_16	'-1*sum(rank(correlation(rank(high_0),rank(volume_0),3)),3)'

Alpha_17	'-1*rank(covariance(rank(high_0),rank(volume_0),5))'

Alpha_18	'((-1*rank(ts_rank(close_0,10)))*rank(delta(delta(close_0,1),1)))*rank(ts_rank((volume_0/mean(amount_0,20)),5))'

Alpha_19	'-1*rank(((std(abs((close_0-open_0)),5)+(close_0-open_0))+correlation(close_0,open_0,10)))'

Alpha_20	'(-1*sign(((close_0-delay(close_0,7))+delta(close_0,7))))*(1+rank((1+sum(close_0/shift(close_0,1)-1,250))))'

Alpha_21	'((-1*rank((open_0-delay(high_0,1))))*rank((open_0-delay(close_0,1))))*rank((open_0-delay(low_0,1)))'

Alpha_22	'where(sum(close_0,8)/8+stddev(close_0,8)<sum(close_0,2)/2,-1,where(mean(close_0,2)<mean(close_0,8)-std(close_0,8),1,where((1<volume_0/mean(amount_0,20)) |(volume_0/mean(amount_0,20)==1),1,-1)))'

Alpha_23	'-1*(delta(correlation(high_0,volume_0,5),5)*rank(std(close_0,20)))'

Alpha_24	'where(sum(high_0,20)/20<high_0,-1*delta(high_0,2),0)'

Alpha_25	'where((delta(mean(close_0,100),100)/delay(close_0,100)<0.05)  |(delta(mean(close_0,100),100)/delay(close_0,100)==0.05) ,-1*(close_0-ts_min(close_0,100)),-1*delta(close_0,3))'

Alpha_26	'rank(-1*(close_0/shift(close_0,1)-1)*mean(amount_0,20)*amount_0/volume_0*adjust_factor_0*(high_0-close_0))'

Alpha_27	'-1*ts_max(correlation(ts_rank(volume_0,5),ts_rank(high_0,5),5),3)'

Alpha_28	'where(0.5<rank((sum(correlation(rank(volume_0),rank(amount_0/volume_0*adjust_factor_0),6),2)/2.0)),-1,1)'

Alpha_29	'scale(correlation(mean(amount_0,20),low_0,5)+(high_0+low_0)*0.5-close_0)'   

Alpha_30    'min(product(rank(rank(scale(log(sum(ts_min(rank(rank((-1*rank(delta((close_0-1),5))))),2),1))))),1),5)+ts_rank(delay((-1*(close_0/shift(close_0,1)-1)),6),5)'

Alpha_31	'((1.0-rank(((sign((close_0-delay(close_0,1)))+sign((delay(close_0,1)-delay(close_0,2)))) +sign((delay(close_0,2)-delay(close_0,3))))))*sum(volume_0,5))/sum(volume_0,20)'

Alpha_32	'(rank(rank(rank(decay_linear((-1*rank(rank(delta(close_0,10)))),10))))+rank((-1*delta(close_0,3))))+sign(scale(correlation(mean(amount_0,20),low_0,12)))'

Alpha_33	'scale(((sum(close_0,7)/7)-close_0))+20*scale(correlation(amount_0/volume_0*adjust_factor_0,delay(close_0,5),230))'

Alpha_34	'rank((-1*((1-(open_0/close_0)))))'

Alpha_35	'rank(((1-rank((std(close_0/shift(close_0,1),2)/stddev(close_0/shift(close_0,1)-1,5))))+(1-rank(delta(close_0,1)))))' 

Alpha_36	'ts_rank(volume_0,32)*(1-ts_rank(((close_0+high_0)-low_0),16))*(1-ts_rank(close_0/shift(close_0,1)-1,32))'

Alpha_37	'((((2.21*rank(correlation((close_0-open_0),delay(volume_0,1),15)))+(0.7*rank((open_0-close_0))))+(0.73*rank(ts_rank(delay((-1*(close_0/shift(close_0,1)-1)),6),5))))+rank(abs(correlation(amount_0/volume_0*adjust_factor_0,mean(amount_0,20),6))))+(0.6*rank((((sum(close_0,200)/200)-open_0)*(close_0-open_0))))'

Alpha_38	'rank(correlation(delay((open_0-close_0),1),close_0,200))+rank((open_0-close_0))'

Alpha_39	'(-1*rank(ts_rank(close_0,10)))*rank((close_0/open_0))'

Alpha_40	'((-1*rank((delta(close_0,7)*(1-rank(decay_linear((volume_0/mean(amount_0,20)),9))))))*(1 +rank(sum(close_0/shift(close_0,1)-1,250))))'

Alpha_41	'((-1*rank(std(high_0,10)))*correlation(high_0,volume_0,10))'

Alpha_42	'(((high_0*low_0)**0.5)-amount_0/volume_0*adjust_factor_0)'

Alpha_43	'(rank((amount_0/volume_0*adjust_factor_0-close_0))/rank((amount_0/volume_0*adjust_factor_0+close_0)))'

Alpha_44	'(ts_rank((volume_0/mean(amount_0,20)),20)*ts_rank((-1*delta(close_0,7)),8))'

Alpha_45	'(-1*correlation(high_0,rank(volume_0),5))'

Alpha_46	'(-1*((rank((sum(delay(close_0,5),20)/20))*correlation(close_0,volume_0,2))*rank(correlation(sum(close_0,5),sum(close_0,20),2))))'

Alpha_47	'where((0.25<(((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))),-1,where(((((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))<0),1,((-1*1)*(close_0-delay(close_0,1)))))'

Alpha_48	'(((rank((1/close_0))*volume_0)/mean(amount_0,20))*((high_0*rank((high_0-close_0)))/(sum(high_0,5) /5)))-rank((amount_0/volume_0*adjust_factor_0-delay(amount_0/volume_0*adjust_factor_0,5)))' 

Alpha_49	'((correlation(delta(close_0,1),delta(delay(close_0,1),1),250)*delta(close_0,1))/close_0)/group_mean(industry_sw_level1_0,((correlation(delta(close_0,1),delta(delay(close_0,1),1),250)*delta(close_0,1))/close_0))/sum(((delta(close_0,1)/delay(close_0,1))**2),250)'    

Alpha_50	'where(((((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))<(-1*0.1)),1,(close_0-delay(close_0,1))*(-1))   '

Alpha_51	'(-1*ts_max(rank(correlation(rank(volume_0),rank(amount_0/volume_0*adjust_factor_0),5)),5))'

Alpha_52	'where((((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))<(-1*0.05),1,-1*(close_0-delay(close_0,1)))'

Alpha_53	'(((-1*ts_min(low_0,5))+delay(ts_min(low_0,5),5))*rank(((sum(close_0/shift(close_0,1)-1,240)-sum(close_0/shift(close_0,1)-1,20))/220)))*ts_rank(volume_0,5)'

Alpha_54	'(-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9))'

Alpha_55	'((-1*((low_0-close_0)*(open_0**5)))/((low_0-high_0)*(close_0** 5)))' 

Alpha_56	'-1*correlation(rank(((close_0-ts_min(low_0,12))/(ts_max(high_0,12)-ts_min(low_0,12)))),rank(volume_0),6)'

Alpha_57	'0-1*(1*(rank((sum(close_0/shift(close_0,1)-1,10)/sum(sum(close_0/shift(close_0,1)-1,2),3)))*rank(((close_0/shift(close_0,1)-1)*market_cap_0))))' 

Alpha_58	'(0-(1*((close_0-amount_0/volume_0*adjust_factor_0)/decay_linear(rank(ts_argmax(close_0,30)),2))))' 

Alpha_59	'(-1*ts_rank(decay_linear(correlation( amount_0/volume_0*adjust_factor_0/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),volume_0,4),8),5))'

Alpha_60	'(0-(1*((2*scale(rank(((((close_0-low_0)-(high_0-close_0))/(high_0-low_0))*volume_0))))-scale(rank(ts_argmax(close_0,10))))))'

Alpha_61	'(rank((amount_0/volume_0*adjust_factor_0-ts_min(amount_0/volume_0*adjust_factor_0,16)))<rank(correlation(amount_0/volume_0*adjust_factor_0,mean(amount_0,180),18)))'

Alpha_62	'(rank(correlation(amount_0/volume_0*adjust_factor_0,sum(mean(amount_0,20),22),10))<rank(((rank(open_0)+rank(open_0))<(rank(((high_0+low_0)/2))+rank(high_0)))))*-1'

Alpha_63	'((rank(decay_linear(delta(close_0/group_mean(industry_sw_level1_0,close_0),2),8))-rank(decay_linear(correlation(((amount_0/volume_0*adjust_factor_0*0.318108)+(open_0*(1-0.318108))),sum(mean(amount_0,180),37),14),12)))*-1)'

Alpha_64	'((rank(correlation(sum(((open_0*0.178404)+(low_0*(1-0.178404))),13),sum(mean(amount_0,20),13),17))<rank(delta(((((high_0+low_0)/2)*0.178404)+(amount_0/volume_0*adjust_factor_0*(1-0.178404))),4)))*-1)'

Alpha_65	'((rank(correlation(((open_0*0.00817205)+(amount_0/volume_0*adjust_factor_0*(1-0.00817205))),sum(mean(amount_0,60),9),6))<rank((open_0-ts_min(open_0,14))))*-1)'

Alpha_66	'((rank(decay_linear(delta(amount_0/volume_0*adjust_factor_0,4),7))+ts_rank(decay_linear(((((low_0* 0.96633)+(low_0*(1-0.96633)))-amount_0/volume_0*adjust_factor_0)/(open_0-((high_0+low_0)/2))),11),7))*-1)'

Alpha_67	'((rank((high_0-ts_min(high_0,2)))**rank(correlation( amount_0/volume_0*adjust_factor_0 /group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),mean(amount_0,20)/group_mean(industry_sw_level1_0,mean(amount_0,20)),6)))*-1)'

Alpha_68	'((ts_rank(correlation(rank(high_0),rank(mean(amount_0,15)),9),14)<rank(delta(((close_0*0.518371)+(low_0*(1-0.518371))),1)))*-1)'

Alpha_69	'((rank(ts_max(delta(amount_0/volume_0*adjust_factor_0/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),3),5))**ts_rank(correlation(((close_0*0.490655)+(amount_0/volume_0*adjust_factor_0*(1-0.490655))),mean(amount_0,20),5),9))*-1)'

Alpha_70	'((rank(delta(amount_0/volume_0*adjust_factor_0,1))**ts_rank(correlation(  close_0/group_mean(industry_sw_level1_0,close_0),mean(amount_0,50),18),18))*-1)'

Alpha_71	'max(ts_rank(decay_linear(correlation(ts_rank(close_0,3),ts_rank(mean(amount_0,180),12),18),4),16),ts_rank(decay_linear((rank(((low_0+open_0)-(amount_0/volume_0*adjust_factor_0 +amount_0/volume_0*adjust_factor_0)))**2),16 ),4))'

Alpha_72	'(rank(decay_linear(correlation(((high_0+low_0)/2),mean(amount_0,40),9),10)) /rank(decay_linear(correlation(ts_rank(amount_0/volume_0*adjust_factor_0,4),ts_rank(volume_0,19),7),3)))'

Alpha_73	'(max(rank(decay_linear(delta(amount_0/volume_0*adjust_factor_0,5),3)),ts_rank(decay_linear(((delta(((open_0* 0.147155)+(low_0*(1-0.147155))),2 ) /((open_0* 0.147155)+(low_0*(1-0.147155))))*-1),3),17))*-1)'      

Alpha_74	'(rank(correlation(close_0,sum(mean(amount_0,30),37),15))<rank(correlation(rank(high_0*0.0261661+amount_0/volume_0*adjust_factor_0*(1-0.0261661)),rank(volume_0),11)))*-1'

Alpha_75	'rank(correlation(amount_0/volume_0*adjust_factor_0,volume_0,4 ))<rank(correlation(rank(low_0),rank(mean(amount_0,50)),12))'

Alpha_76	'max(rank(decay_linear(delta(amount_0/volume_0*adjust_factor_0,1),12)),ts_rank(decay_linear(ts_rank(correlation( low_0/group_mean(industry_sw_level1_0,low_0),mean(amount_0,81),8 ),20),17),19))*-1'

Alpha_77	'min(rank(decay_linear(((((high_0+low_0)/2)+high_0)-(amount_0/volume_0*adjust_factor_0+high_0)),20 )),rank(decay_linear(correlation(((high_0+low_0)/2),mean(amount_0,40),3),6)))'

Alpha_78	'rank(correlation(sum(((low_0*0.352233)+(amount_0/volume_0*adjust_factor_0*(1-0.352233))),20),sum(mean(amount_0,20),20),7))**rank(correlation(rank(amount_0/volume_0*adjust_factor_0),rank(volume_0),6))'

Alpha_79	'rank(delta((close_0*0.60733+open_0*(1-0.60733))/ group_mean(industry_sw_level1_0,(close_0*0.60733+open_0*(1-0.60733))),1))<rank(correlation(ts_rank(amount_0/volume_0*adjust_factor_0,4),ts_rank(mean(amount_0,150),9),115))'

Alpha_80	'(rank(sign(delta((open_0*0.868128+high_0*(1-0.868128))/group_mean(industry_sw_level1_0,(open_0*0.868128+high_0*(1-0.868128))),4)))**ts_rank(correlation(high_0,mean(amount_0,10),5),6))*-1'

Alpha_81	'(rank(log(product(rank((rank(correlation(amount_0/volume_0*adjust_factor_0,sum(mean(amount_0,10),50),8))**4)),15)))<rank(correlation(rank(amount_0/volume_0*adjust_factor_0),rank(volume_0),5)))*-1'

Alpha_82	'min(rank(decay_linear(delta(open_0,1),15)),ts_rank(decay_linear(correlation( volume_0/group_mean(industry_sw_level1_0,volume_0),((open_0*0.634196) +(open_0*(1-0.634196))),17),7),13))*-1'

Alpha_83	'(rank(delay(((high_0-low_0)/(sum(close_0,5)/5)),2))*rank(rank(volume_0)))/(((high_0-low_0)/(sum(close_0,5)/5))/(amount_0/volume_0*adjust_factor_0-close_0))'

Alpha_84	'signedpower(ts_rank((amount_0/volume_0*adjust_factor_0-ts_max(amount_0/volume_0*adjust_factor_0,15)),20),delta(close_0,5))'

Alpha_85	'rank(correlation(((high_0*0.876703)+(close_0*(1-0.876703))),mean(amount_0,30),10))**rank(correlation(ts_rank(((high_0+low_0)/2),4),ts_rank(volume_0,10),7))'

Alpha_86	'(ts_rank(correlation(close_0,sum(mean(amount_0,20),15),6),20)<rank(((open_0+close_0)-(amount_0/volume_0*adjust_factor_0+open_0))))*-1'

Alpha_87	'max(rank(decay_linear(delta(((close_0*0.369701)+(amount_0/volume_0*adjust_factor_0*(1-0.369701))),2),3)),ts_rank(decay_linear(abs(correlation( mean(amount_0,81) /group_mean(industry_sw_level1_0,mean(amount_0,81)) ,close_0,14)),5),14))*-1'

Alpha_88	'min(rank(decay_linear(((rank(open_0)+rank(low_0))-(rank(high_0)+rank(close_0))),8)),ts_rank(decay_linear(correlation(ts_rank(close_0,8),ts_rank(mean(amount_0,60),21),8),7),3))'

Alpha_89	'ts_rank(decay_linear(correlation(((low_0*0.967285)+(low_0*(1-0.967285))),mean(amount_0,10),7),6),4)-ts_rank(decay_linear(delta( amount_0/volume_0*adjust_factor_0/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),3),10),15)'

Alpha_90	'(rank((close_0-ts_max(close_0,5)))**ts_rank(correlation(mean(amount_0,40)/group_mean(industry_sw_level1_0,mean(amount_0,40)),low_0,5),3))*-1'

Alpha_91	'(ts_rank(decay_linear(decay_linear(correlation(close_0/group_mean(industry_sw_level1_0,close_0),volume_0,10),16),4),5)-rank(decay_linear(correlation(amount_0/volume_0*adjust_factor_0,mean(amount_0,30),4),3)))*-1'

Alpha_92	'min(ts_rank(decay_linear(((((high_0+low_0)/2)+close_0)<(low_0+open_0)),15),19),ts_rank(decay_linear(correlation(rank(low_0),rank(mean(amount_0,30)),8),7),7))'

Alpha_93	'ts_rank(decay_linear(correlation((amount_0/volume_0*adjust_factor_0)/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0) ,mean(amount_0,81),17),20),8)/rank(decay_linear(delta(((close_0*0.524434)+(amount_0/volume_0*adjust_factor_0*(1-0.524434))),3),16))'

Alpha_94	'(rank((amount_0/volume_0*adjust_factor_0-ts_min(amount_0/volume_0*adjust_factor_0,12)))**ts_rank(correlation(ts_rank(amount_0/volume_0*adjust_factor_0,20),ts_rank(mean(amount_0,60),4),18),3))*-1'

Alpha_95	'rank((open_0-ts_min(open_0,12)))<ts_rank((rank(correlation(sum(((high_0+low_0)/ 2),19),sum(mean(amount_0,40),19),13))**5),12)'

Alpha_96	'max(ts_rank(decay_linear(correlation(rank(amount_0/volume_0*adjust_factor_0),rank(volume_0),4),4),8),ts_rank(decay_linear(ts_argmax(correlation(ts_rank(close_0,7),ts_rank(mean(amount_0,60),4),4),13),14),13))*-1'     

Alpha_97	'(rank(decay_linear(delta(((low_0*0.721001)+(amount_0/volume_0*adjust_factor_0*(1-0.721001)))/group_mean(industry_sw_level1_0,(low_0*0.721001)+(amount_0/volume_0*adjust_factor_0*(1-0.721001))),3),20)) -ts_rank(decay_linear(ts_rank(correlation(ts_rank(low_0,8),ts_rank(mean(amount_0,60),17),5),16),16),7))*-1'

Alpha_98	'rank(decay_linear(correlation(amount_0/volume_0*adjust_factor_0,sum(mean(amount_0,5),26),5),7))-rank(decay_linear(ts_rank(ts_argmin(correlation(rank(open_0),rank(mean(amount_0,15)),21),9),7),8))'

Alpha_99	'(rank(correlation(sum(((high_0+low_0)/2),20),sum(mean(amount_0,60),20),9)) <rank(correlation(low_0,volume_0,6)))*-1'

Alpha_100	'-1*(((1.5*scale(rank(((((close_0-low_0)-(high_0-close_0))/(high_0-low_0))*volume_0))/group_mean(industry_sw_level2_0,rank(((((close_0-low_0)-(high_0-close_0))/(high_0-low_0))*volume_0)))))-scale((correlation(close_0,rank(mean(amount_0,20)),5)-rank(ts_argmin(close_0,30)))/group_mean(industry_sw_level2_0,(correlation(close_0,rank(mean(amount_0,20)),5)-rank(ts_argmin(close_0,30))))))*(volume_0/mean(amount_0,20)))'

Alpha_101	'(close_0-open_0)/((high_0-low_0)+0.001)' 
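Several alphas use functions not covered in the function reference, notably decay_linear, scale and signedpower. The sketch below implements them following the definitions in the "101 Formulaic Alphas" paper:
- decay_linear: weights d, d-1, ..., 1 with the newest day heaviest, rescaled to sum to 1.
- scale: rescales a cross-section so the absolute values sum to a (default 1).
- signedpower: sign(x)·|x|^a.

It then evaluates Alpha_101 for one made-up stock-day. bigexpr's own implementations may differ in details such as NaN handling.

```python
import math


def decay_linear(xs, d):
    """Weighted average of the last d values with weights 1..d (newest = d), normalized to sum to 1."""
    total = d * (d + 1) / 2
    out = []
    for t in range(len(xs)):
        if t + 1 < d:
            out.append(math.nan)  # not enough history yet
            continue
        window = xs[t + 1 - d:t + 1]  # oldest -> newest
        out.append(sum(w * x for w, x in zip(range(1, d + 1), window)) / total)
    return out


def scale(values, a=1.0):
    """Cross-sectional rescaling so that the absolute values sum to a."""
    s = sum(abs(v) for v in values)
    return [a * v / s for v in values] if s > 0 else [0.0 for _ in values]


def signedpower(x, a):
    """sign(x) * |x| ** a."""
    return math.copysign(abs(x) ** a, x)


def alpha_101(open_, high, low, close):
    """Alpha_101: (close_0-open_0)/((high_0-low_0)+0.001) for one stock-day."""
    return (close - open_) / ((high - low) + 0.001)


print(decay_linear([1.0, 2.0, 3.0, 4.0], 3))
print(scale([1.0, -3.0]), signedpower(-2.0, 2))
print(round(alpha_101(10.0, 10.5, 9.8, 10.3), 4))
```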


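Above are the 101 alphas published by WorldQuant, together with their expressions. If you are interested, experiment with the single-factor test code below; the only change needed is to swap in the factor you want to test. We hope you can develop strategies that profit consistently and discover new alphas.

Note that the numbering above differs from the paper's:
- Alpha_2 through Alpha_59 correspond to the paper's Alpha#1 through Alpha#58.
- Alpha_1 repeats Alpha_8 (the paper's Alpha#7).
- The paper's Alpha#59 is missing.
- From Alpha_60 onward, the numbers match.

Also, the paper's IndNeutralize(x, g) demeans x within each industry group. These expressions approximate it with the ratio x / group_mean(industry, x).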

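Note: some factors are Boolean-like, taking only the values 1 or -1. Passing such a single factor to StockRanker may raise an error and make model training fail.

Some long-expression factors also need a short alias, as in the test case below, and must be converted to their short names before being passed to the training model. The model reads factor data from the data table using the given feature names as column names. After aliasing, only the column named by the short alias exists, so passing the full expression finds no factor data and training fails.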
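The hypothetical helpers below sketch the aliasing described in the note. They write each long expression in the `name=expression` form used by the test case's feature list (`factor=shift(close_0,15) / close_0`). They then recover the column names that should be passed to the model. The parsing rule (an identifier followed by a single `=`) is an assumption, not BigQuant's documented parser. It keeps comparisons such as `==` inside an unaliased expression from being mistaken for an alias.

```python
import re

# An alias line is "name=expression": an identifier followed by a single '='.
# The (?!=) lookahead keeps '==' inside an unaliased expression from matching.
ALIAS_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$")


def build_feature_text(alphas):
    """alphas: dict short_name -> bigexpr expression.

    Returns the feature text (one 'name=expression' per line) and the short names.
    """
    lines = ["# one feature per line, in name=expression form"]
    lines += [f"{name}={expr}" for name, expr in alphas.items()]
    return "\n".join(lines) + "\n", list(alphas)


def feature_column_name(line):
    """Column name a feature line produces: the alias if present, else the expression itself."""
    m = ALIAS_RE.match(line)
    return m.group(1) if m else line.strip()


def model_feature_names(feature_text):
    """Names to hand to the model: skip comments and blank lines, map aliases to short names."""
    return [feature_column_name(line) for line in feature_text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


text, names = build_feature_text({
    "alpha_101": "(close_0-open_0)/((high_0-low_0)+0.001)",
    "alpha_013": "sign(delta(volume_0,1))*(-1*delta(close_0,1))",
})
print(text)
print(names, model_feature_names(text))
```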

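Single-factor testing

Here we take the factor 'shift(close_0,15) / close_0' as an example to show how to run a single-factor test and build an AI strategy based on a single factor.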
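The workflow below tests the factor by training StockRanker on it and backtesting. A lighter, commonly used complement, not part of this article's workflow, is the rank IC. On each date, it is the Spearman correlation between the factor and the subsequent return across stocks. The sketch computes it in plain Python on made-up data. The instrument codes, dates and returns are hypothetical. `reversal_factor` simply evaluates `shift(close_0,15) / close_0` on one stock's closing prices, oldest first.

```python
import math


def avg_ranks(xs):
    """1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else math.nan


def rank_ic(factor_by_date, fwd_ret_by_date, min_stocks=3):
    """Per-date Spearman rank IC (Pearson correlation of ranks) and its mean.

    Both arguments map date -> {instrument: value}.
    """
    ics = {}
    for date, factors in factor_by_date.items():
        rets = fwd_ret_by_date.get(date, {})
        common = sorted(set(factors) & set(rets))
        if len(common) < min_stocks:
            continue
        ic = pearson(avg_ranks([factors[s] for s in common]),
                     avg_ranks([rets[s] for s in common]))
        if not math.isnan(ic):
            ics[date] = ic
    mean_ic = sum(ics.values()) / len(ics) if ics else math.nan
    return mean_ic, ics


def reversal_factor(closes):
    """shift(close_0,15) / close_0 for the latest day; closes ordered oldest -> newest."""
    return closes[-16] / closes[-1] if len(closes) >= 16 else math.nan


# Made-up factor values and forward returns for four hypothetical stocks on two dates.
factor_by_date = {
    "2016-01-04": {"A": 0.9, "B": 1.0, "C": 1.1, "D": 1.2},
    "2016-01-05": {"A": 0.9, "B": 1.0, "C": 1.1, "D": 1.2},
}
fwd_ret_by_date = {
    "2016-01-04": {"A": -0.02, "B": 0.00, "C": 0.01, "D": 0.03},
    "2016-01-05": {"A": 0.03, "B": 0.01, "C": 0.00, "D": -0.02},
}
mean_ic, ics = rank_ic(factor_by_date, fwd_ret_by_date)
print(mean_ic, ics)
```

A factor whose mean IC stays consistently away from zero across many dates is a better candidate for the model-based test that follows.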

Clone strategy

    {"Description":"实验创建于2017/8/26","Summary":"","Graph":{"EdgesInternal":[{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"},{"DestinationInputPortId":"-107:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data1","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:data"},{"DestinationInputPortId":"-107:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-114:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-123:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-130:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-133:input_1","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-119:training_ds","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data"},{"DestinationInputPortId":"-123:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62:data"},{"DestinationInputPortId":"-142:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62:data"},{"DestinationInputPortId":"-114:input_data","SourceOutputPortId":"-107:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data2","SourceOutputPortId":"-114:data"},{"DestinationInputPortId":"-130:input_data","SourceOutputPortId":"-123:data"},{"DestinationInputPortId":"-119:predict_ds","SourceOutputPortId":"-130:data"},{"DestinationInputPortId":"-142:options_data","SourceOutputPortId":"-119:predictions"},{"DestinationInputPortId":"-119:features","SourceOutputPortId":"-133:data_1"}],"ModuleNodes":[{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","ModuleId":"BigQuantSpace.instruments.instruments-v2","
ModuleParameters":[{"Name":"start_date","Value":"2014-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"2015-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"market","Value":"CN_STOCK_A","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_list","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_count","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"rolling_conf","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":1,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","ModuleId":"BigQuantSpace.advanced_auto_labeler.advanced_auto_labeler-v2","ModuleParameters":[{"Name":"label_expr","Value":"# #号开始的表示注释\n# 0. 每行一个,顺序执行,从第二个开始,可以使用label字段\n# 1. 可用数据字段见 https://bigquant.com/docs/data_history_data.html\n# 添加benchmark_前缀,可使用对应的benchmark数据\n# 2. 
可用操作符和函数见 `表达式引擎 <https://bigquant.com/docs/big_expr.html>`_\n\n# 计算收益:5日收盘价(作为卖出价格)除以明日开盘价(作为买入价格)\nshift(close, -5) / shift(open, -1)\n\n# 极值处理:用1%和99%分位的值做clip\nclip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))\n\n# 将分数映射到分类,这里使用20个分类\nall_wbins(label, 20)\n\n# 过滤掉一字涨停的情况 (设置label为NaN,在后续处理和训练中会忽略NaN的label)\nwhere(shift(high, -1) == shift(low, -1), NaN, label)\n","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"benchmark","Value":"000300.SHA","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na_label","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"cast_label_int","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":2,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-24","ModuleId":"BigQuantSpace.input_features.input_features-v1","ModuleParameters":[{"Name":"features","Value":"# #号开始的表示注释\n# 多个特征,每行一个,可以包含基础特征和衍生特征\nfactor=shift(close_0,15) / 
close_0\n","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features_ds","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":3,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-53","ModuleId":"BigQuantSpace.join.join-v3","ModuleParameters":[{"Name":"on","Value":"date,instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"how","Value":"inner","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"sort","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data1","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data2","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":7,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-62","ModuleId":"BigQuantSpace.instruments.instruments-v2","ModuleParameters":[{"Name":"start_date","Value":"2015-01-01","ValueType":"Literal","LinkedGlobalParameter":"交易日期"},{"Name":"end_date","Value":"2017-01-01","ValueType":"Literal","LinkedGlobalParameter":"交易日期"},{"Name":"market","Value":"CN_STOCK_A","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_list","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_count","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"ro
lling_conf","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":9,"IsPartOfPartialRun":null,"Comment":"预测数据,用于回测和模拟","CommentCollapsed":false},{"Id":"-107","ModuleId":"BigQuantSpace.general_feature_extractor.general_feature_extractor-v7","ModuleParameters":[{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"before_start_days","Value":0,"ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"-107"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-107"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-107","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":15,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-114","ModuleId":"BigQuantSpace.derived_feature_extractor.derived_feature_extractor-v3","ModuleParameters":[{"Name":"date_col","Value":"date","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_col","Value":"instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"remove_extra_columns","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_data","NodeId":"-114"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-114"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-114","OutputType":null}],"UsePreviousResults":true,"moduleIdFo
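    # This code was generated automatically by the visual strategy environment on 2019-07-23 09:56.
    # This cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.
    import math  # used for the rank-decay weights below
    
    
    # Backtest engine: initialization function, runs only once
    def m19_initialize_bigquant_run(context):
        # Load the prediction data passed in via options; read_df loads it into memory as a DataFrame
        context.ranker_prediction = context.options['data'].read_df()
    
        # The system already sets default commission and slippage; use the call below to change the commission
        context.set_commission(PerOrder(buy_cost=0.0003, sell_cost=0.0013, min_cost=5))
        # Number of stocks to buy: the top 5 in the predicted ranking
        stock_count = 5
        # Per-stock weights; this scheme gives more capital to higher-ranked stocks: [0.339160, 0.213986, 0.169580, ..]
        context.stock_weights = T.norm([1 / math.log(i + 2) for i in range(0, stock_count)])
        # Maximum fraction of capital any single stock may occupy
        context.max_cash_per_instrument = 0.2
        context.options['hold_days'] = 5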
    
    # Backtest engine: daily data-handling function, runs once per trading day
    def m19_handle_data_bigquant_run(context, data):
        # Keep only today's predictions
        ranker_prediction = context.ranker_prediction[
            context.ranker_prediction.date == data.current_dt.strftime('%Y-%m-%d')]
    
        # 1. Capital allocation
        # The average holding period is hold_days and stocks are bought every day, so each day is expected to use 1/hold_days of the capital.
        # Buying is never exact, so during the first hold_days days an equal amount is used each day; afterwards, as much of the remaining cash as possible is used (capped here at 1.5x the equal amount).
        is_staging = context.trading_day_index < context.options['hold_days'] # still in the position-building period (first hold_days days)?
        cash_avg = context.portfolio.portfolio_value / context.options['hold_days']
        cash_for_buy = min(context.portfolio.cash, (1 if is_staging else 1.5) * cash_avg)
        cash_for_sell = cash_avg - (context.portfolio.cash - cash_for_buy)
        positions = {e.symbol: p.amount * p.last_sale_price
                     for e, p in context.perf_tracker.position_tracker.positions.items()}
    
        # 2. Sell orders: selling starts only after hold_days days; held stocks are dropped from the bottom of the model's predicted ranking
        if not is_staging and cash_for_sell > 0:
            equities = {e.symbol: e for e, p in context.perf_tracker.position_tracker.positions.items()}
            instruments = list(reversed(list(ranker_prediction.instrument[ranker_prediction.instrument.apply(
                    lambda x: x in equities and not context.has_unfinished_sell_order(equities[x]))])))
            # print('rank order for sell %s' % instruments)
            for instrument in instruments:
                context.order_target(context.symbol(instrument), 0)
                cash_for_sell -= positions[instrument]
                if cash_for_sell <= 0:
                    break
    
        # 3. Buy orders: buy the top stock_count stocks in the model's predicted ranking
        buy_cash_weights = context.stock_weights
        buy_instruments = list(ranker_prediction.instrument[:len(buy_cash_weights)])
        max_cash_per_instrument = context.portfolio.portfolio_value * context.max_cash_per_instrument
        for i, instrument in enumerate(buy_instruments):
            cash = cash_for_buy * buy_cash_weights[i]
            if cash > max_cash_per_instrument - positions.get(instrument, 0):
                # Make sure the position never exceeds the per-stock capital cap
                cash = max_cash_per_instrument - positions.get(instrument, 0)
            if cash > 0:
                context.order_value(context.symbol(instrument), cash)
    
    # Backtest engine: data preparation function, runs only once
    def m19_prepare_bigquant_run(context):
        pass
    
    
    m1 = M.instruments.v2(
        start_date='2014-01-01',
        end_date='2015-01-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m2 = M.advanced_auto_labeler.v2(
        instruments=m1.data,
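        label_expr="""# Lines starting with # are comments
    # 0. One expression per line, executed in order; from the second line on, the label field can be used
    # 1. Available data fields: https://bigquant.com/docs/data_history_data.html
    #   Add the benchmark_ prefix to use the corresponding benchmark data
    # 2. Available operators and functions: `Expression engine <https://bigquant.com/docs/big_expr.html>`_
    
    # Compute the return: close 5 days ahead (sell price) divided by the next day's open (buy price)
    shift(close, -5) / shift(open, -1)
    
    # Outlier handling: clip at the 1% and 99% quantiles
    clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
    
    # Map the scores to classes, here 20 classes
    all_wbins(label, 20)
    
    # Filter out one-price limit-up days (label set to NaN; NaN labels are ignored in later processing and training)
    where(shift(high, -1) == shift(low, -1), NaN, label)
    """,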
        start_date='',
        end_date='',
        benchmark='000300.SHA',
        drop_na_label=True,
        cast_label_int=True
    )
    
    m3 = M.input_features.v1(
        features="""# Lines starting with # are comments
    # Multiple features, one per line; both base and derived features are allowed
    factor=shift(close_0,15) / close_0
    """
    )
    
    m15 = M.general_feature_extractor.v7(
        instruments=m1.data,
        features=m3.data,
        start_date='',
        end_date='',
        before_start_days=0
    )
    
    m16 = M.derived_feature_extractor.v3(
        input_data=m15.data,
        features=m3.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=True,
        remove_extra_columns=False
    )
    
    m7 = M.join.v3(
        data1=m2.data,
        data2=m16.data,
        on='date,instrument',
        how='inner',
        sort=False
    )
    
    m6 = M.features_short.v1(
        input_1=m3.data
    )
    
    m9 = M.instruments.v2(
        start_date=T.live_run_param('trading_date', '2015-01-01'),
        end_date=T.live_run_param('trading_date', '2017-01-01'),
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m17 = M.general_feature_extractor.v7(
        instruments=m9.data,
        features=m3.data,
        start_date='',
        end_date='',
        before_start_days=0
    )
    
    m18 = M.derived_feature_extractor.v3(
        input_data=m17.data,
        features=m3.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=True,
        remove_extra_columns=False
    )
    
    m4 = M.stock_ranker.v2(
        training_ds=m7.data,
        features=m6.data_1,
        predict_ds=m18.data,
        learning_algorithm='排序',
        number_of_leaves=30,
        minimum_docs_per_leaf=1000,
        number_of_trees=20,
        learning_rate=0.1,
        max_bins=1023,
        feature_fraction=1,
        slim_data=True
    )
    
    m19 = M.trade.v4(
        instruments=m9.data,
        options_data=m4.predictions,
        start_date='',
        end_date='',
        initialize=m19_initialize_bigquant_run,
        handle_data=m19_handle_data_bigquant_run,
        prepare=m19_prepare_bigquant_run,
        volume_limit=0.025,
        order_price_field_buy='open',
        order_price_field_sell='close',
        capital_base=1000000,
        auto_cancel_non_tradable_orders=True,
        data_frequency='daily',
        price_type='后复权',
        product_type='股票',
        plot_charts=True,
        backtest_only=False,
        benchmark='000300.SHA'
    )
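The three steps of `handle_data` can be replayed outside the platform as plain Python functions, which makes the sizing rules easy to check by hand. This is a sketch, not BigQuant's code: `T.norm` is assumed to rescale the list so it sums to 1 (consistent with the weights quoted in the `initialize` comment), the helper names `rank_decay_weights`, `daily_cash_budget`, `pick_sells` and `buy_amounts` are hypothetical, and order submission, the staging check before selling and the unfinished-sell-order filter are left out.

```python
import math


def rank_decay_weights(stock_count):
    """Weights proportional to 1 / ln(rank + 2), rescaled to sum to 1.
    Stand-in for T.norm, assumed here to divide by the total."""
    raw = [1 / math.log(i + 2) for i in range(stock_count)]
    total = sum(raw)
    return [w / total for w in raw]


def daily_cash_budget(portfolio_value, cash, hold_days, trading_day_index):
    """Step 1 of handle_data: today's buy budget and the amount to free up."""
    is_staging = trading_day_index < hold_days      # still building positions
    cash_avg = portfolio_value / hold_days          # 1/hold_days of equity per day
    cash_for_buy = min(cash, (1 if is_staging else 1.5) * cash_avg)
    # Positive only if the cash left after buying falls short of cash_avg
    cash_for_sell = cash_avg - (cash - cash_for_buy)
    return is_staging, cash_for_buy, cash_for_sell


def pick_sells(ranked_instruments, positions, cash_for_sell):
    """Step 2: walk the ranking from the bottom, selling held stocks until
    their market value covers cash_for_sell."""
    to_sell = []
    if cash_for_sell <= 0:
        return to_sell
    for inst in reversed(ranked_instruments):
        if inst not in positions:
            continue
        to_sell.append(inst)
        cash_for_sell -= positions[inst]
        if cash_for_sell <= 0:
            break
    return to_sell


def buy_amounts(ranked_instruments, weights, cash_for_buy, positions,
                portfolio_value, max_frac=0.2):
    """Step 3: split cash_for_buy over the top len(weights) names, capping
    each holding at max_frac of portfolio value."""
    cap = portfolio_value * max_frac
    orders = {}
    for inst, w in zip(ranked_instruments, weights):
        cash = min(cash_for_buy * w, cap - positions.get(inst, 0))
        if cash > 0:
            orders[inst] = cash
    return orders


if __name__ == "__main__":
    print([round(w, 6) for w in rank_decay_weights(5)])
    # Day 10 of a backtest: 1,000,000 equity, 50,000 idle cash, hold_days = 5
    print(daily_cash_budget(1_000_000, 50_000, 5, 10))
```

Because `zip` stops at the shorter list, `buy_amounts` only considers the first `len(weights)` names, mirroring the `[:len(buy_cash_weights)]` slice in the strategy.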
    
    Set a test dataset to view NDCG across training iterations
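    • Total return 43.34%
    • Annualized return 20.43%
    • Benchmark return -6.33%
    • Alpha 0.24
    • Beta 0.97
    • Sharpe ratio 0.6
    • Win rate 0.56
    • Profit/loss ratio 0.95
    • Return volatility 38.65%
    • Information ratio 0.07
    • Max drawdown 47.34%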
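The annualized figure can be cross-checked against the total return. A minimal sketch, assuming a 252-day trading year and roughly 488 A-share trading days between 2015-01-01 and 2017-01-01 (both are assumptions; the platform's exact formula is not given here): compounding 43.34% over 488/252 years comes out at about 20.43% per year, consistent with the reported value. The `max_drawdown` helper only illustrates what the drawdown metric measures, on a made-up equity curve.

```python
def annualized_return(total_return, n_trading_days, days_per_year=252):
    """Compound a total return into a yearly rate (252-day year assumed)."""
    return (1 + total_return) ** (days_per_year / n_trading_days) - 1


def max_drawdown(equity_curve):
    """Largest peak-to-trough drop of an equity curve, as a positive fraction."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                     # running high-water mark
        worst = max(worst, (peak - value) / peak)   # drop from that mark
    return worst


if __name__ == "__main__":
    # ~488 trading days in the 2015-2016 backtest window (assumption)
    print(round(annualized_return(0.4334, 488), 4))
    # Hypothetical curve: peak 1.2, trough 0.6 -> 50% drawdown
    print(max_drawdown([1.0, 1.2, 0.9, 1.1, 0.6, 0.8]))
```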


    Summary: with the methods above, you can quickly build factors and label data using expressions on the strategy research platform.
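To see what the two expressions in this strategy compute, here is a rough pandas analogue of the m3 feature `shift(close_0,15) / close_0` and the m2 label expression. It is a sketch under stated assumptions, not bigexpr's implementation: the input is assumed to be a long-format DataFrame with hypothetical columns `date`, `instrument`, `open`, `high`, `low` and `close`; `all_quantile` is assumed to take quantiles over the whole sample, and `all_wbins` is assumed to mean equal-width binning, approximated here with `pandas.cut`.

```python
import numpy as np
import pandas as pd


def add_factor(df):
    """factor = shift(close_0, 15) / close_0, computed per instrument."""
    df = df.sort_values(["instrument", "date"]).copy()
    close_15 = df.groupby("instrument")["close"].shift(15)  # close 15 days ago
    df["factor"] = close_15 / df["close"]
    return df


def add_label(df, n_bins=20):
    """Rough analogue of the m2 label_expr (bigexpr semantics assumed)."""
    df = df.sort_values(["instrument", "date"]).copy()
    g = df.groupby("instrument")
    # 1. Forward return: close 5 days ahead / next day's open
    label = g["close"].shift(-5) / g["open"].shift(-1)
    # 2. Clip at the 1% / 99% quantiles of the whole sample
    label = label.clip(label.quantile(0.01), label.quantile(0.99))
    # 3. Equal-width bins over the whole sample -> classes 0 .. n_bins-1
    label = pd.cut(label, bins=n_bins, labels=False)
    # 4. NaN label when the next bar is a one-price bar (high == low)
    flat_next = g["high"].shift(-1) == g["low"].shift(-1)
    df["label"] = label.where(~flat_next, np.nan)
    return df
```

Calling `add_label(add_factor(df))` on daily bars yields both columns per instrument and date; rows whose label is NaN correspond to the ones `drop_na_label=True` would discard.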


       This article was published by BigQuant Quant Academy (宽客学院). Copyright belongs to BigQuant; please credit the source when reposting.
    


    How do I write a price-volume divergence factor?
    (zykphzx) #2

    How do I open the link to the Jupyter-format .ipynb file?


    (iQuant) #3

    It's been fixed, please take a look.


    (zykphzx) #4

    It still doesn't seem to work.


    (iQuant) #5

    Try cloning the strategy again.


    (yangziriver) #6

    The order of the 101 factors here seems to differ from the original paper?