[Academy Tutorial] Batch-Generating Factors with the Expression Engine

User Growth Series

(iQuant) #1

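You are probably already familiar with the platform's expression engine. When building factors we often need to generate them in batches, for example close_0, close_1, close_2, …, close_20, or to write a factor made of many repeated terms that differ only in their parameters, for example the regular cyclic factor close_0*turn_0 + close_1*turn_1 + …. Is there a shortcut for producing such factors? This post shows how to use list comprehensions to batch-generate factors and to build regular cyclic factors.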

1. Batch-generating factors with a list comprehension

Enter a list comprehension in the feature (factor) list, for example:

['close_{}*turn_{}'.format(k,k) for k in range(3)]

This generates three factors in one go: [close_0*turn_0, close_1*turn_1, close_2*turn_2]
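To check which strings the comprehension produces before pasting it into the feature list, it can be run in plain Python. This is a minimal sketch that uses only built-in string formatting; the variable name `factors` is illustrative and not part of the platform.

```python
# Expand the list comprehension locally to inspect the generated factor expressions.
factors = ['close_{}*turn_{}'.format(k, k) for k in range(3)]

for expr in factors:
    print(expr)
# close_0*turn_0
# close_1*turn_1
# close_2*turn_2
```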

If the loop parameters differ, pair them up with zip:

['close_{}*turn_{}'.format(k,j) for k,j in zip(range(3),range(1,6,2))]

This generates three factors: [close_0*turn_1, close_1*turn_3, close_2*turn_5]
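The pairing done by zip can be inspected the same way. The sketch below runs in plain Python; the names `pairs`, `factors`, and `truncated` are illustrative. It also shows one thing to watch for: zip stops at the shorter input, so two ranges of unequal length silently produce fewer factors.

```python
# zip pairs the k-th element of the first range with the k-th element of the second.
pairs = list(zip(range(3), range(1, 6, 2)))
print(pairs)  # [(0, 1), (1, 3), (2, 5)]

factors = ['close_{}*turn_{}'.format(k, j) for k, j in pairs]
print(factors)  # ['close_0*turn_1', 'close_1*turn_3', 'close_2*turn_5']

# range(1, 4, 2) has only two elements (1 and 3), so zip yields two pairs, not three.
truncated = list(zip(range(3), range(1, 4, 2)))
print(truncated)  # [(0, 1), (1, 3)]
```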

2. Generating regular cyclic expressions

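With '+'.join(list comprehension), we generate the terms in a batch and join them with plus signs. The same approach works with other operators such as minus, multiplication, and division.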

alpha1='+'.join(['close_{}*turn_{}'.format(k,k) for k in range(3)])
alpha2='-'.join(['close_{}*turn_{}'.format(k,j) for k,j in zip(range(0,3),range(1,7,2))])
alpha3='+'.join(['shift(close_0,{})/shift(close_0,{})-1'.format(k,j) for k,j in zip(range(0,66,22),range(22,88,22))])
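Evaluating the three definitions in plain Python shows the expression strings they expand to. This is a sketch for inspection only: it builds the strings and does not evaluate them against market data.

```python
# Build the three expression strings exactly as defined above and print them.
alpha1 = '+'.join(['close_{}*turn_{}'.format(k, k) for k in range(3)])
alpha2 = '-'.join(['close_{}*turn_{}'.format(k, j)
                   for k, j in zip(range(0, 3), range(1, 7, 2))])
alpha3 = '+'.join(['shift(close_0,{})/shift(close_0,{})-1'.format(k, j)
                   for k, j in zip(range(0, 66, 22), range(22, 88, 22))])

print(alpha1)  # close_0*turn_0+close_1*turn_1+close_2*turn_2
print(alpha2)  # close_0*turn_1-close_1*turn_3-close_2*turn_5
print(alpha3)
# shift(close_0,0)/shift(close_0,22)-1+shift(close_0,22)/shift(close_0,44)-1+shift(close_0,44)/shift(close_0,66)-1
```

Note that joining with '-' yields first term minus every later term, which is how alpha2 reads once expanded.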

If the expression is too long, it may raise an error. In that case, wrap each term in parentheses, for example:

alpha4='+'.join(['(close_{}*turn_{})'.format(k,k) for k in range(300)])
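The wrapping step can be factored into a small helper. The function `join_terms` below is hypothetical (it is not a platform API) and is only a sketch of the same string construction. Assuming the expression engine follows the usual arithmetic precedence, the parentheses also keep each term grouped when the joining operator is '*' or '/'.

```python
def join_terms(terms, op='+', wrap=True):
    """Join expression terms with `op`, optionally wrapping each term in parentheses."""
    if wrap:
        terms = ['({})'.format(t) for t in terms]
    return op.join(terms)

# Same string as alpha4 above: 300 parenthesized terms joined with '+'.
terms = ['close_{}*turn_{}'.format(k, k) for k in range(300)]
alpha4 = join_terms(terms, '+')
print(len(terms))   # 300
print(alpha4[:33])  # (close_0*turn_0)+(close_1*turn_1)

# Without parentheses, 'a-1*b-1' would not mean (a-1)*(b-1).
print(join_terms(['a-1', 'b-1'], '*'))  # (a-1)*(b-1)
```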

A complete example:

    In [8]:
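    # This code was generated automatically by the visual strategy environment on 2019-07-15 10:18
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and modify it there.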
    
    
    m1 = M.instruments.v2(
        start_date='2010-01-01',
        end_date='2016-12-31',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m2 = M.input_features.v1(
        features="""
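    # Lines starting with # are comments; a comment must be on its own line
    # Multiple features, one per line; base and derived features are both allowed, and features must be platform features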
    alpha1='+'.join(['close_{}*turn_{}'.format(k,k) for k in range(3)])
    alpha2='-'.join(['close_{}*turn_{}'.format(k,j) for k,j in zip(range(0,3),range(1,7,2))])
    alpha3='+'.join(['shift(close_0,{})/shift(close_0,{})-1'.format(k,j) for k,j in zip(range(0,66,22),range(22,88,22))])"""
    )
    
    m3 = M.general_feature_extractor.v7(
        instruments=m1.data,
        features=m2.data,
        start_date='',
        end_date='',
        before_start_days=200
    )
    
    m4 = M.derived_feature_extractor.v3(
        input_data=m3.data,
        features=m2.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=False,
        remove_extra_columns=False,
        user_functions={}
    )
    
    In [11]:
    m4.data.read().tail()
    
    Out[11]:
               close_0    close_1    close_2        date  instrument    turn_0    turn_1    turn_2    turn_3    turn_5      alpha1       alpha2    alpha3
    4067996  37.290604  37.350811  39.120819  2016-12-26  603999.SHA  4.848342  5.095364  6.146203  9.435085  6.827219  611.558044  -429.485321  0.216330
    4067997  36.664478  37.290604  37.350811  2016-12-27  603999.SHA  3.714744  4.848342  5.095364  6.146203  4.946309  507.312683  -236.182343  0.117074
    4067998  36.568153  36.664478  37.290604  2016-12-28  603999.SHA  3.602861  3.714744  4.848342  5.095364  9.435085  448.746704  -402.817566  0.109103
    4067999  35.905903  36.568153  36.664478  2016-12-29  603999.SHA  4.598748  3.602861  3.714744  4.848342  6.146203  433.071350  -273.278259  0.073060
    4068000  35.303860  35.905903  36.568153  2016-12-30  603999.SHA  4.199971  4.598748  3.602861  3.714744  5.095364  445.147400  -157.355698  0.057441
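As a sanity check, alpha1 and alpha2 can be recomputed by hand from the first output row (2016-12-26). The sketch below copies the displayed values into a dictionary named `row` (an illustrative name) and applies the expanded expressions. The results agree with the table to within about 1e-4; the small gap is presumably floating-point rounding in the stored or displayed values.

```python
import math

# Values copied from the 2016-12-26 row of the output table.
row = {
    'close_0': 37.290604, 'close_1': 37.350811, 'close_2': 39.120819,
    'turn_0': 4.848342, 'turn_1': 5.095364, 'turn_2': 6.146203,
    'turn_3': 9.435085, 'turn_5': 6.827219,
}

# alpha1 = close_0*turn_0 + close_1*turn_1 + close_2*turn_2
alpha1 = sum(row['close_{}'.format(k)] * row['turn_{}'.format(k)] for k in range(3))

# alpha2 = close_0*turn_1 - close_1*turn_3 - close_2*turn_5
terms = [row['close_{}'.format(k)] * row['turn_{}'.format(j)]
         for k, j in zip(range(0, 3), range(1, 7, 2))]
alpha2 = terms[0] - sum(terms[1:])

# Compare with the table values, allowing for rounding.
print(math.isclose(alpha1, 611.558044, abs_tol=1e-3))   # True
print(math.isclose(alpha2, -429.485321, abs_tol=1e-3))  # True
```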
