Tutorial: How to rename factors when extracting features

(chad) #1

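This post is about the input feature list module (`input_features`). Strategy development often calls for many complex features, and giving those features custom names makes later analysis much easier, so here is an example.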

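In this example we build a 5-day moving average of the closing price and an industry-neutralized market-cap factor, and rename them as follows:

ma_5 # 5-day moving average of the closing price
neutral_market_cap # industry-neutralized market-cap factor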
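
To make the naming rule concrete, here is a minimal sketch of how a feature list like the one in this post could be split into column names and expressions. The `parse_features` helper is hypothetical and is not part of the BigQuant API. It encodes the rule as inferred from this example only: a line written as `name = expression` produces a column called `name`, while a line without `=` keeps the expression itself as the column name.

```python
def parse_features(spec: str) -> dict:
    """Map each output column name to the expression that produces it.

    Hypothetical helper illustrating the assumed rule; the platform's real
    parser may differ (for example, this sketch would mishandle `==`).
    """
    features = {}
    for raw in spec.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            # Renamed feature: the left side is the custom column name.
            name, expr = line.split("=", 1)
            features[name.strip()] = expr.strip()
        else:
            # Plain feature: the column keeps the expression as its name.
            features[line] = line
    return features


spec = """
# lines starting with # are comments
return_5
ma_5 = mean(close_0,5)
neutral_market_cap=group_mean(industry_sw_level1_0, market_cap_float_0)
"""

parsed = parse_features(spec)
for name, expr in parsed.items():
    print(f"{name:20s} <- {expr}")
```

Whitespace around `=` does not matter in this sketch, which matches the two spellings used in the example (`ma_5 = ...` and `neutral_market_cap=...`).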

Let's see what the extracted features look like:

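As you can see, the two custom features ma_5 and neutral_market_cap have been built. Note that in this small sample the two stocks belong to different industries, so neutral_market_cap simply equals market_cap_float_0 for each row. The full strategy source is below for anyone who wants to study it.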

    In [2]:
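    # This code was auto-generated by the visual strategy environment on 2018-11-01 22:30
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and modify it there.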
    
    
    m2 = M.input_features.v1(
        features="""
    # Lines starting with # are comments
    # Multiple features, one per line; both basic and derived features are allowed
    return_5
    ma_5 = mean(close_0,5)
    neutral_market_cap=group_mean(industry_sw_level1_0, market_cap_float_0)"""
    )
    
    m3 = M.instruments.v2(
        start_date='2018-01-01',
        end_date='2018-10-10',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=3
    )
    
    m1 = M.general_feature_extractor.v7(
        instruments=m3.data,
        features=m2.data,
        start_date='',
        end_date='',
        before_start_days=90
    )
    
    m4 = M.derived_feature_extractor.v3(
        input_data=m1.data,
        features=m2.data,
        date_col='date',
        instrument_col='instrument',
        user_functions={}
    )
    
    [2018-11-01 22:32:10.979349] INFO: bigquant: input_features.v1 started..
    [2018-11-01 22:32:10.983985] INFO: bigquant: cache hit
    [2018-11-01 22:32:10.985064] INFO: bigquant: input_features.v1 finished [0.005738s].
    [2018-11-01 22:32:10.987568] INFO: bigquant: instruments.v2 started..
    [2018-11-01 22:32:11.014007] INFO: bigquant: instruments.v2 finished [0.026405s].
    [2018-11-01 22:32:11.020971] INFO: bigquant: general_feature_extractor.v7 started..
    [2018-11-01 22:32:11.207333] INFO: basic feature extraction: year 2017, feature rows=180
    [2018-11-01 22:32:11.276427] INFO: basic feature extraction: year 2018, feature rows=555
    [2018-11-01 22:32:11.286066] INFO: basic feature extraction: total rows: 735
    [2018-11-01 22:32:11.288135] INFO: bigquant: general_feature_extractor.v7 finished [0.267161s].
    [2018-11-01 22:32:11.292197] INFO: bigquant: derived_feature_extractor.v3 started..
    [2018-11-01 22:32:11.332962] INFO: derived_feature_extractor: extracted ma_5 = mean(close_0,5), 0.008s
    [2018-11-01 22:32:11.481784] INFO: derived_feature_extractor: extracted neutral_market_cap=group_mean(industry_sw_level1_0, market_cap_float_0), 0.147s
    [2018-11-01 22:32:11.520796] INFO: derived_feature_extractor: /y_2017, 180
    [2018-11-01 22:32:11.554691] INFO: derived_feature_extractor: /y_2018, 555
    [2018-11-01 22:32:11.590329] INFO: bigquant: derived_feature_extractor.v3 finished [0.298084s].
    
    In [4]:
    m4.data.read_df().tail()
    
    Out[4]:
             close_0        date  industry_sw_level1_0  instrument  market_cap_float_0  return_5         ma_5  neutral_market_cap
    730  3145.829346  2018-10-08                430000  000002.SZA        2.142195e+11  0.918367  3427.170654        2.142195e+11
    731  1140.811523  2018-10-09                480000  000001.SZA        1.813178e+11  0.989691  1156.152002        1.813178e+11
    732  3113.015625  2018-10-09                430000  000002.SZA        2.119850e+11  0.845736  3346.705908        2.119850e+11
    733  1128.928101  2018-10-10                480000  000001.SZA        1.794291e+11  0.990521  1150.534375        1.794291e+11
    734  3103.029053  2018-10-10                430000  000002.SZA        2.113050e+11  0.882711  3263.387793        2.113050e+11
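
To get a feel for what the two renamed columns hold, here is a rough plain-Python sketch on made-up data. It is not the platform's implementation, and it rests on assumptions: `mean(close_0, n)` is treated as a per-instrument rolling mean over the last `n` rows, `group_mean(key, value)` is treated as the mean of `value` across all rows sharing the same date and key, and incomplete windows are returned as `None` (a choice of this sketch). The helper names and all toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy rows: (date, instrument, industry, close, float market cap). Values are made up.
rows = [
    ("2018-10-08", "A", "bank", 10.0, 100.0),
    ("2018-10-08", "B", "bank", 20.0, 300.0),
    ("2018-10-08", "C", "realty", 30.0, 500.0),
    ("2018-10-09", "A", "bank", 12.0, 120.0),
    ("2018-10-09", "B", "bank", 22.0, 280.0),
    ("2018-10-09", "C", "realty", 33.0, 520.0),
]


def rolling_mean_by_instrument(rows, window):
    """Mean of each instrument's last `window` closes; rows must be sorted by date."""
    history = defaultdict(list)
    out = []
    for _, instrument, _, close, _ in rows:
        history[instrument].append(close)
        recent = history[instrument][-window:]
        # Assumption of this sketch: no value until the window is full.
        out.append(mean(recent) if len(recent) == window else None)
    return out


def group_mean_by_date(rows):
    """Mean float market cap over all rows sharing the same (date, industry)."""
    groups = defaultdict(list)
    for date, _, industry, _, cap in rows:
        groups[(date, industry)].append(cap)
    return [mean(groups[(date, industry)]) for date, _, industry, _, _ in rows]


ma_2 = rolling_mean_by_instrument(rows, window=2)  # a 2-day window keeps the toy data short
industry_cap = group_mean_by_date(rows)

for row, ma, cap in zip(rows, ma_2, industry_cap):
    print(row[0], row[1], row[2], ma, cap)
```

Stock `C` is the only member of its industry in the toy data, so its group mean equals its own market cap. The same thing happens in the output above, where each of the two sampled stocks sits alone in its industry and neutral_market_cap matches market_cap_float_0.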