The Secret to Quickly Defining 282 Factors: The Expression Engine

Tags: custom factors, ai_alphas, feature extraction, strategy sharing

(iQuant) #1

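Earlier this month we published the article "AI Alphas (A-share Edition)", which was well received and earned a good deal of praise in the quant community. Many users wrote in asking about the factors mentioned in it and how they are constructed (although the original article's appendix contains the complete factor list), so that they can develop AI strategies quickly. That is the purpose of this post.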

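Because a few factors were duplicated or constructed incorrectly, this post lists 274 factors in order, each corresponding one-to-one to a factor used in "AI Alphas (A-share Edition)". In that article, factors were built with the M.user_feature_extractor module. BigQuant has recently launched the M.derived_feature_extractor module, with which defining a factor requires no code: a few expressions are enough to define factors (extract features) quickly. The help documentation for the expression engine is: bigexpr. This is a major change: defining features through expressions greatly speeds up strategy development and experiment iteration.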

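What is a factor expression? It is a way of describing the idea behind a factor in a simple, fixed language. For example, to define a 5-day simple moving average, the factor previously had to be written as: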

(close_0+close_1+close_2+close_3+close_4)/5

This definition is simple and intuitive but inconvenient: a 60-day simple moving average, for example, would require writing out 60 closing prices.

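The expression engine borrows ideas from platforms such as WorldQuant, Tongdaxin (通达信), and TB, and introduces a set of simple, commonly used function expressions so that factors can be built quickly. The factor above, for example, can be written as: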

  mean(close_0,5)
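To make the equivalence concrete, here is a minimal pandas sketch (not BigQuant code) that computes the 5-day average both ways. The price series is made up for illustration, and it is an assumption that mean(close_0, 5) behaves like a plain rolling mean over the last 5 closes.

```python
import pandas as pd

# Hypothetical closing prices for one stock, oldest first.
close = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0, 11.3, 11.1])

# Manual form: (close_0 + close_1 + ... + close_4) / 5,
# where close_k is the close k days before the current row.
manual = sum(close.shift(k) for k in range(5)) / 5

# Operator form: a 5-day rolling mean, the pandas counterpart of mean(close_0, 5).
rolling = close.rolling(5).mean()

print(pd.concat({"manual": manual, "rolling": rolling}, axis=1))
```

Changing the window to 60 only requires changing one number in the rolling form, whereas the manual form would need 60 terms.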

This is far quicker and more flexible. A few more examples:

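  1. shift(close_0, 5): the closing price 5 days ago, equivalent to close_5
  2. delta(close_0, 3): today's closing price minus the closing price 3 days ago, equivalent to close_0-close_3
  3. correlation(close_0, volume_0, 20): the correlation between closing price and volume over the past 20 days
  4. ts_min(low_0,7): the lowest low over the past 7 days (a time-series operation)
  5. rank(fs_pettm_0): the percentile rank by price-to-earnings ratio among all stocks on that day (a cross-sectional operation)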
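The difference between the time-series and cross-sectional operators in this list can be sketched in pandas on a dates-by-stocks table. This is only an illustration of the described semantics, not the engine's implementation: the data is random, the instrument codes are hypothetical, and the ascending direction of the percentile rank is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=30, freq="B")
stocks = ["AAA", "BBB", "CCC"]  # hypothetical instrument codes


def panel(lo, hi):
    """Random dates-by-stocks table with values in [lo, hi)."""
    values = rng.uniform(lo, hi, (len(dates), len(stocks)))
    return pd.DataFrame(values, index=dates, columns=stocks)


close = panel(10, 12)
low = close - panel(0.1, 0.5)
volume = panel(1e5, 1e6)
pe = panel(5, 50)

# Time-series operators run down each column (one stock through time).
shift_5 = close.shift(5)              # shift(close_0, 5), i.e. close_5
delta_3 = close - close.shift(3)      # delta(close_0, 3), i.e. close_0 - close_3
corr_20 = pd.DataFrame(               # correlation(close_0, volume_0, 20)
    {s: close[s].rolling(20).corr(volume[s]) for s in stocks}
)
ts_min_7 = low.rolling(7).min()       # ts_min(low_0, 7)

# The cross-sectional operator runs across each row (all stocks on one day).
pe_rank = pe.rank(axis=1, pct=True)   # rank(fs_pettm_0), assumed ascending

print(pe_rank.tail(3))
```

The time-series results start with missing values until the window is full, while the rank is defined on every day because it only needs that day's row.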

These are only a few commonly used expressions; for more, see: bigexpr expression engine

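The two-hundred-plus factors mentioned in the article are available here: link
With BigStudio, factors can be defined quickly. We use the following sample factors from the 274 to show how.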

Sample factors:

  • return_5
  • close_0 / close_1
  • mean(amount_0,9)/mean(amount_0,3)
  • amount_5/amount_0
  • (-1*correlation(open_0,volume_0,10))
  • (-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9))
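Factors like these are evaluated per stock on a long table with one row per date and instrument. The sketch below mimics that in pandas for three of the sample factors; it is an approximation under stated assumptions, not the platform's code. The data is random, the instrument codes are hypothetical, and the `date` and `instrument` column names are chosen to match the output table in this post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-02", periods=12, freq="B")

# Build a long table: one row per (instrument, date).
frames = []
for inst in ["AAA", "BBB"]:  # hypothetical instrument codes
    frames.append(pd.DataFrame({
        "date": dates,
        "instrument": inst,
        "close_0": 10 + rng.random(len(dates)),
        "amount_0": 1e6 * (1 + rng.random(len(dates))),
    }))
df = pd.concat(frames, ignore_index=True).sort_values(["instrument", "date"])

# Group by instrument so lags and windows never mix two stocks.
by_inst = df.groupby("instrument")

df["close_0 / close_1"] = df["close_0"] / by_inst["close_0"].shift(1)

mean_9 = by_inst["amount_0"].transform(lambda s: s.rolling(9).mean())
mean_3 = by_inst["amount_0"].transform(lambda s: s.rolling(3).mean())
df["mean(amount_0,9)/mean(amount_0,3)"] = mean_9 / mean_3

df["amount_5/amount_0"] = by_inst["amount_0"].shift(5) / df["amount_0"]

print(df.tail())
```

Grouping by instrument is the important design choice here: without it, the first rows of one stock would borrow lagged values from the last rows of another.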

    In [7]:
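    # This code was auto-generated by the visual strategy environment on 2017-10-13 21:36
    # This code cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and then modify it.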
    
    
    m1 = M.instruments.v2(
        start_date='2017-01-01',
        end_date='2017-06-01',
        market='CN_STOCK_A',
        max_count=0
    )
    
    m2 = M.input_features.v1(
        features="""
    # Lines starting with # are comments
    # Multiple features, one per line; both base features and derived features are allowed
    return_5
    close_0 / close_1
    mean(amount_0,9)/mean(amount_0,3)
    amount_5/amount_0
    (-1*correlation(open_0,volume_0,10))
    (-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9)) 
    """
    )
    
    m3 = M.general_feature_extractor.v6(
        instruments=m1.data,
        features=m2.data,
        start_date='2017-01-01',
        end_date='2017-06-01'
    )
    
    m4 = M.derived_feature_extractor.v2(
        input_data=m3.data,
        features=m2.data,
        date_col='date',
        instrument_col='instrument',
        user_functions={}
    )
    
    [2017-10-13 21:34:26.571447] INFO: bigquant: instruments.v2 started running..
    [2017-10-13 21:34:26.574461] INFO: bigquant: cache hit
    [2017-10-13 21:34:26.575449] INFO: bigquant: instruments.v2 finished [0.004048s].
    [2017-10-13 21:34:26.580222] INFO: bigquant: input_features.v1 started running..
    [2017-10-13 21:34:26.584707] INFO: bigquant: input_features.v1 finished [0.004496s].
    [2017-10-13 21:34:26.590777] INFO: bigquant: general_feature_extractor.v6 started running..
    [2017-10-13 21:34:29.954418] INFO: general_feature_extractor: year 2017, feature rows=286124
    [2017-10-13 21:34:29.961024] INFO: general_feature_extractor: total rows: 286124
    [2017-10-13 21:34:29.966115] INFO: bigquant: general_feature_extractor.v6 finished [3.375337s].
    [2017-10-13 21:34:29.974944] INFO: bigquant: derived_feature_extractor.v2 started running..
    [2017-10-13 21:34:30.204243] INFO: derived_feature_extractor: extracted close_0 / close_1, 0.002s
    [2017-10-13 21:34:34.155928] INFO: derived_feature_extractor: extracted mean(amount_0,9)/mean(amount_0,3), 3.950s
    [2017-10-13 21:34:34.160413] INFO: derived_feature_extractor: extracted amount_5/amount_0, 0.003s
    [2017-10-13 21:35:01.290493] INFO: derived_feature_extractor: extracted (-1*correlation(open_0,volume_0,10)), 27.129s
    [2017-10-13 21:35:01.384221] INFO: derived_feature_extractor: extracted (-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9)), 0.092s
    [2017-10-13 21:35:01.505779] INFO: derived_feature_extractor: /y_2017, 286124
    [2017-10-13 21:35:02.494084] INFO: bigquant: derived_feature_extractor.v2 finished [32.519127s].
    
    In [9]:
    m4.data.read_df().tail()
    
    Out[9]:
    | | amount_0 | amount_5 | close_0 | close_1 | date | high_0 | instrument | low_0 | open_0 | return_5 | volume_0 | close_0 / close_1 | mean(amount_0,9)/mean(amount_0,3) | amount_5/amount_0 | (-1*correlation(open_0,volume_0,10)) | (-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9)) |
    | 286119 | 41908720.0 | 77158384.0 | 34.299999 | 37.160000 | 2017-06-01 | 36.799999 | 603991.SHA | 34.080002 | 36.799999 | 0.883793 | 1179418 | 0.923036 | 1.299168 | 1.841106 | -0.445650 | 10.018940 |
    | 286120 | 325846880.0 | 209245664.0 | 13.137134 | 13.589025 | 2017-06-01 | 13.556747 | 603993.SHA | 13.008022 | 13.556747 | 0.959906 | 79660839 | 0.966746 | 0.998587 | 0.642159 | -0.102105 | 1.583360 |
    | 286121 | 67045340.0 | 40373360.0 | 26.524748 | 26.095942 | 2017-06-01 | 26.601322 | 603996.SHA | 25.651821 | 25.743708 | 1.088262 | 3913236 | 1.016432 | 0.629411 | 0.602180 | -0.382270 | -0.312278 |
    | 286122 | 14990906.0 | 39956024.0 | 51.039860 | 52.178440 | 2017-06-01 | 52.531792 | 603998.SHA | 50.922073 | 52.531792 | 0.939306 | 1136900 | 0.978179 | 1.307715 | 2.665351 | 0.387761 | 11.133095 |
    | 286123 | 34542324.0 | 53836952.0 | 22.516445 | 23.588083 | 2017-06-01 | 23.527880 | 603999.SHA | 22.468283 | 23.527880 | 0.895594 | 1803040 | 0.954569 | 1.157170 | 1.558579 | -0.026805 | 20.641499 |

    (1899) #2

    Hello, is it possible to view the Python source code of some of the platform's functions?


    (小Q) #3

    Which functions do you mean?


    (1899) #4

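    For example the stock-ranker model or the labeling functions. Sometimes I would like to read the source to understand how it works and see whether I can modify it to better fit my needs.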


    (小Q) #5

    Open source is part of our philosophy. We have already uploaded some modules to GitHub and will upload more code in the future.


    (1899) #6

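    Hello, a question: can something like the 5-day average volume line crossing above the 35-day average volume line be used as a feature on the platform for training a machine-learning model? Or can this kind of signal only be used for traditional quantitative backtesting?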


    (小Q) #7

    Yes, it can be used as a feature. For example, mean(close_0, 5) > mean(close_0, 35) can be passed to the algorithm as a feature.
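For the volume version asked about, the same pattern with volume_0 in place of close_0 would be mean(volume_0, 5) > mean(volume_0, 35). That comparison is true on every day the fast average sits above the slow one, while an upward cross is only the day the comparison flips from false to true. The pandas sketch below shows the distinction on a made-up volume series; it illustrates the idea only and does not claim the expression engine offers an equivalent one-liner.

```python
import pandas as pd

# Hypothetical daily volume: flat, then a dip, then a surge.
volume = pd.Series([100.0] * 30 + [50.0] * 10 + [200.0] * 10)

ma_fast = volume.rolling(5).mean()    # counterpart of mean(volume_0, 5)
ma_slow = volume.rolling(35).mean()   # counterpart of mean(volume_0, 35)

# State feature: fast average is above the slow average today.
# Comparisons against the initial NaN values evaluate to False.
above = ma_fast > ma_slow

# Event feature: above today but not above yesterday, i.e. an upward cross.
cross_up = above & ~above.shift(1, fill_value=False)

print(cross_up[cross_up].index.tolist())
```

Either boolean column can be cast to 0/1 and used as a model input; which one is more useful depends on whether the state or the event is the signal of interest.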


    (1899) #8

    And how would I write the feature "free-float market cap below 10 billion"?


    (小Q) #9

    market_cap_float_0 < 10000000000


    (a20180322) #10

    How can I find out which factor has had the greatest impact recently, for example in March 2018?


    (iQuant) #11

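    Which factor has had the best return recently has to be determined through statistical analysis. The traditional Barra approach is to run a linear regression of returns on factor exposures.
    For example, collect the stocks' factor data as of early March and combine it with their returns over the whole of March; a cross-sectional regression then gives the factor returns (the regression betas), and the factors with large factor returns are the ones with the greatest recent impact.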
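The cross-sectional regression described here can be sketched with NumPy as follows. Everything in the example is synthetic: the factor names are hypothetical, the exposures and returns are simulated with known factor returns so the result can be checked, and a plain least-squares fit stands in for whatever weighting scheme a full Barra-style model would use.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stocks = 500
factor_names = ["value", "momentum", "size"]  # hypothetical factors

# Factor exposures at the start of the month: one row per stock.
exposures = rng.standard_normal((n_stocks, len(factor_names)))

# Simulated returns over the month, driven by known factor returns plus noise.
true_factor_returns = np.array([0.02, -0.01, 0.0])
monthly_returns = exposures @ true_factor_returns + rng.normal(0, 0.01, n_stocks)

# Standardize each factor across stocks so the betas are comparable.
z = (exposures - exposures.mean(axis=0)) / exposures.std(axis=0)

# Cross-sectional regression with an intercept: returns ~ 1 + exposures.
X = np.column_stack([np.ones(n_stocks), z])
coef, *_ = np.linalg.lstsq(X, monthly_returns, rcond=None)

# coef[0] is the intercept; the remaining entries are the factor returns.
factor_returns = dict(zip(factor_names, coef[1:]))
ranked = sorted(factor_returns, key=lambda k: abs(factor_returns[k]), reverse=True)

for name in ranked:
    print(f"{name}: {factor_returns[name]:+.4f}")
```

Ranking by absolute value treats a strongly negative factor return as just as influential as a strongly positive one; ranking by signed value instead would answer which factor paid best.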