[Featured] How to build custom factors to fit your needs


(iQuant) #1

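The BigQuant factor library already contains many factors, but how do you construct a personalized, custom factor, such as a stock's return relative to the market? This post walks through several examples.

Custom factors can be built in many ways. Below are a few commonly used approaches.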

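1. Build a market return factor

Step 1: Use the DataSource module to fetch the close column for the index '000001.HIX', then use the derived feature extraction module to compute its 5-day return. A column selection module keeps only the date column and the market 5-day return factor column.
Step 2: Use the basic and derived feature extraction modules to fetch the other factor data for the stocks.
Step 3: Join the two results horizontally, using the date column as the common key.
In the flowchart for this example, the left branch is Step 1, the right branch is Step 2, and the m8 module joins the factors together.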
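The three steps above can be sketched in plain Python, independent of the BigQuant modules. This is only an illustration under stated assumptions: the 5-day return is taken to be `close[t] / close[t-5] - 1`, and the dates, prices, instrument codes and the column name `bm_ret_5` are hypothetical.

```python
from typing import Dict, List, Optional


def n_day_return(closes: List[float], n: int = 5) -> List[Optional[float]]:
    """Return close[t] / close[t - n] - 1 for each t, or None when t < n."""
    return [
        closes[t] / closes[t - n] - 1 if t >= n else None
        for t in range(len(closes))
    ]


# Step 1: hypothetical index closes (illustrative numbers, not real data).
index_dates = ["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-07",
               "2019-01-08", "2019-01-09", "2019-01-10"]
index_close = [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 101.0]
bm_ret_by_date: Dict[str, Optional[float]] = dict(
    zip(index_dates, n_day_return(index_close, 5))
)

# Step 2: hypothetical per-stock rows carrying other factor data.
stock_rows = [
    {"date": "2019-01-09", "instrument": "000001.SZA", "close": 10.2},
    {"date": "2019-01-09", "instrument": "000002.SZA", "close": 25.0},
    {"date": "2019-01-10", "instrument": "000001.SZA", "close": 10.4},
]

# Step 3: join on date, so every stock on a given day gets the same market return.
merged = [
    {**row, "bm_ret_5": bm_ret_by_date[row["date"]]}
    for row in stock_rows
    if row["date"] in bm_ret_by_date
]
```

Because the index has one value per date while the stock table has one row per date and instrument, joining on date alone broadcasts the market return to every stock traded that day.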


The result can then be inspected:
[image]

Strategy link for this example

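2. Build a market-relative return factor

Step 1: Use the DataSource module to fetch the close column for the index '000001.HIX', then use the derived feature extraction module to compute its 5-day return. A column selection module keeps only the date column and the market 5-day return column, bmret.
Step 2: Use the basic and derived feature extraction modules to obtain the stock return column, stockret.
Step 3: Join the two results horizontally, using the date column as the common key.
Step 4: Use the derived feature extraction module m17 to compute the relative return column, relative_ret.
In the flowchart for this example, the left branch is Step 1 and the right branch is Step 2; after the m8 module joins the factors together, the relative return is computed.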
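The post does not state the formula used for relative_ret. The sketch below assumes the simplest definition, the stock return minus the market return; in the expression syntax seen elsewhere in this thread that would presumably be written `relative_ret=stockret-bmret`. The rows are hypothetical.

```python
def relative_return(stockret: float, bmret: float) -> float:
    """Excess return of the stock over the benchmark (assumed definition)."""
    return stockret - bmret


# Hypothetical rows as they might look after the date join in Step 3.
rows = [
    {"date": "2019-01-09", "instrument": "000001.SZA", "stockret": 0.03, "bmret": 0.01},
    {"date": "2019-01-09", "instrument": "000002.SZA", "stockret": -0.02, "bmret": 0.01},
]

# Step 4: derive the new column row by row.
for row in rows:
    row["relative_ret"] = relative_return(row["stockret"], row["bmret"])
```

A ratio such as `(1 + stockret) / (1 + bmret) - 1` is another common definition; which one fits depends on how the factor will be used.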


The result can then be inspected:
[image]

Strategy link for this example

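3. Compute the rolling average of ROE over three annual reports

Step 1: Use the DataSource module to fetch fs_roe from the financial_statement_CN_STOCK_A table and filter it down to annual reports. The derived feature extraction module m5 then computes the 3-year rolling average of ROE.
Step 2: Extract the trading-day data for the stocks.
Step 3: Merge the two results on the date and instrument columns using an outer join. A financial report's release date is not necessarily a trading day, so an outer join keeps the full sequence of dates.
Step 4: Group the merged data by stock and fill the 3-year rolling ROE average for every trading day, filling both forward and backward.
Step 5: Merge the filled data with the trading-day data again, this time using an inner join, to remove non-trading days.
In the flowchart for this example, the left branch is Step 1 and the right branch is Step 2; the m9 module joins the factors together, the m11 module fills the data per stock, and the m12 module performs the inner join that removes non-trading days.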
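The rolling average in Step 1 can be illustrated in plain Python. The sketch assumes that the platform expression `mean(fs_roe,3)` behaves like a trailing mean over the last three rows of each instrument once only annual reports remain; the function name, the output column name and the sample numbers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


def rolling_mean_by_instrument(rows: List[dict], field: str, window: int = 3) -> List[dict]:
    """Append a trailing mean of `field` over the last `window` rows per instrument.

    Rows must already be sorted by date within each instrument. The mean is
    None until `window` observations are available for that instrument.
    """
    history: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
    out = []
    for row in rows:
        values = history[row["instrument"]]
        values.append(row[field])
        mean: Optional[float] = sum(values) / window if len(values) == window else None
        out.append({**row, f"{field}_avg{window}": mean})
    return out


# Hypothetical annual-report rows (one per fiscal year), already filtered to year-end reports.
annual = [
    {"date": "2012-03-30", "instrument": "A", "fs_roe": 6.0},
    {"date": "2013-03-29", "instrument": "A", "fs_roe": 9.0},
    {"date": "2014-03-28", "instrument": "A", "fs_roe": 12.0},
    {"date": "2015-03-27", "instrument": "A", "fs_roe": 15.0},
    {"date": "2014-04-10", "instrument": "B", "fs_roe": 3.0},
    {"date": "2015-04-09", "instrument": "B", "fs_roe": 3.0},
]

result = rolling_mean_by_instrument(annual, "fs_roe", window=3)
```

Keeping a separate history per instrument is what makes the window "roll" over one stock's own reports instead of mixing values from neighbouring rows of other stocks.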

The result can then be inspected:

[image]

Strategy link for this example

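Conclusion: building a factor consists of reading data, processing it, and joining it. Once you are familiar with the fields of the platform's commonly used data tables and combine them with the computing power of the expression engine, building custom factors is straightforward.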


(w890912y) #2

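How can different stocks have exactly the same three-year rolling ROE on the same date? That does not look right. Also, does the expression fs_eps_avg3=mean(fs_roe,3) actually express a rolling calculation?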


(达达) #3

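This is caused by how the missing data is filled: the fill that copies values back onto earlier dates is the culprit. See the example below, which only carries values forward onto later dates (ffill).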
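The difference between the two fill directions can be sketched in plain Python for a single stock. This is an illustration only: the helper names and the sample timeline are hypothetical, and in the real pipeline the fill has to be applied per instrument, as `df.groupby('instrument').ffill()` does in the code from this thread.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last known value forward in time; leading gaps stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def backward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Pull the next known value back in time; trailing gaps stay None."""
    return forward_fill(values[::-1])[::-1]


# Hypothetical timeline for one stock; None marks dates without a report value.
dates = ["2014-01-20", "2014-01-21", "2014-01-22", "2014-01-23"]
roe_avg3 = [None, None, 7.6, None]  # the value becomes known on 2014-01-22

ffill_only = forward_fill(roe_avg3)            # dates before 2014-01-22 stay empty
both = backward_fill(forward_fill(roe_avg3))   # earlier dates receive a later value
```

With the forward fill only, the dates before the report stay empty and can be dropped afterwards. Filling in both directions assigns the value to dates on which it was not yet published.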

    In [6]:
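    # This code was auto-generated by the visual strategy environment on 2019-07-30 11:45.
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and modify it there.
    # M, DataSource and Outputs are not imported here; the code relies on the BigQuant environment to provide them.
    
    
    # Python entry function: input_1/2/3 map to the three input ports, data_1/2/3 to the three output ports
    def m11_run_bigquant_run(input_1, input_2, input_3):
        # Forward-fill within each stock only, then drop rows that still contain missing values
        df = input_1.read_df()
        result = df.groupby('instrument').ffill().dropna()
        data_1 = DataSource.write_df(result)
        return Outputs(data_1=data_1, data_2=None, data_3=None)
    
    # Optional post-processing function. Its input is the output of the main function; use it to process the data or to return a friendlier outputs format. The output of this function is not cached.
    def m11_post_run_bigquant_run(outputs):
        return outputs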
    
    
    m3 = M.input_features.v1(
        features="""
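    # Lines starting with # are comments; a comment must be on its own line
    # Multiple features, one per line; basic and derived features are both allowed, and they must be features of this platform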
    fs_roe
    fs_quarter_index"""
    )
    
    m6 = M.input_features.v1(
        features="""
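    # Lines starting with # are comments; a comment must be on its own line
    # Multiple features, one per line; basic and derived features are both allowed, and they must be features of this platform
    # Note: despite its name, fs_eps_avg3 holds the 3-period mean of fs_roe, not of EPS
    fs_eps_avg3=mean(fs_roe,3)"""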
    )
    
    m2 = M.instruments.v2(
        start_date='2012-01-01',
        end_date='2018-01-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m1 = M.use_datasource.v1(
        instruments=m2.data,
        features=m3.data,
        datasource_id='financial_statement_CN_STOCK_A',
        start_date='',
        end_date=''
    )
    
    m4 = M.filter.v3(
        input_data=m1.data,
        expr='fs_quarter_index==4',
        output_left_data=False
    )
    
    m5 = M.derived_feature_extractor.v3(
        input_data=m4.data,
        features=m6.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=False,
        remove_extra_columns=False,
        user_functions={}
    )
    
    m7 = M.dropnan.v1(
        input_data=m5.data
    )
    
    m10 = M.input_features.v1(
        features="""
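    # Lines starting with # are comments; a comment must be on its own line
    # Multiple features, one per line; basic and derived features are both allowed, and they must be features of this platform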
    st_status"""
    )
    
    m8 = M.use_datasource.v1(
        instruments=m2.data,
        features=m10.data,
        datasource_id='stock_status_CN_STOCK_A',
        start_date='',
        end_date=''
    )
    
    m14 = M.select_columns.v3(
        input_ds=m8.data,
        columns_ds=m10.data,
        columns='',
        reverse_select=True
    )
    
    m9 = M.join.v3(
        data1=m7.data,
        data2=m14.data,
        on='date,instrument',
        how='outer',
        sort=False
    )
    
    m11 = M.cached.v3(
        input_1=m9.data,
        run=m11_run_bigquant_run,
        post_run=m11_post_run_bigquant_run,
        input_ports='',
        params='{}',
        output_ports=''
    )
    
    m12 = M.join.v3(
        data1=m11.data_1,
        data2=m14.data,
        on='date,instrument',
        how='inner',
        sort=True
    )
    

    View the result

    The 3-year rolling average of ROE is shown in the table below

    In [8]:
    m12.data.read_df().head()
    
    Out[8]:
       date        instrument  fs_quarter_index   fs_roe  fs_eps_avg3
    0  2014-01-21  300108.SZA               4.0   6.9435     7.606467
    1  2014-01-21  300181.SZA               4.0  10.9572    10.436667
    2  2014-01-22  000860.SZA               4.0   6.6472     7.388667
    3  2014-01-22  002473.SZA               4.0   0.5955     3.305033
    4  2014-01-22  300108.SZA               4.0   6.9435     7.606467