Question about the sample code for derived feature extraction


(jijingling777) #1

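In this part of the documentation's sample code:

Then compute one more derived feature, e.g. rank(return_10 / return_20)

m2_2 = M.derived_feature_extractor.v3(
    input_data=m2.data,
    features=['rank(return_10 / return_20)'])

How is the (features=['rank(return_10 / return_20)']) part entered and generated in the visual interface?
As I understood it, the derived feature extraction module automatically picks the non-base features out of the input feature list module. So why does the code in this sample pass rank(return_10 / return_20) directly? I could not find anywhere in the module's visual view to enter this formula.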



#### Derived feature extraction:
Sample code

start_date = '2014-01-01'
end_date = '2015-01-01'
instruments = D.instruments(start_date, end_date, market='CN_STOCK_A')
features = [
    'return_5',  # 5-day return
    'return_10',  # 10-day return
    'return_20',  # 20-day return
    'avg_amount_0/avg_amount_5',  # today's / 5-day average trading amount
    'avg_amount_5/avg_amount_20',  # 5-day / 20-day average trading amount
]

Extract the base features, e.g. return_5, return_10, …

m2 = M.general_feature_extractor.v7(
    instruments=instruments,
    start_date=start_date, end_date=end_date,
    features=features)

Compute the derived features, e.g. avg_amount_0/avg_amount_5 …

m2_1 = M.derived_feature_extractor.v3(input_data=m2.data, features=features)

Then compute one more derived feature, e.g. rank(return_10 / return_20)

m2_2 = M.derived_feature_extractor.v3(
    input_data=m2.data,
    features=['rank(return_10 / return_20)'])
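
The sample passes the same `features` list to both extractors, so something has to decide which raw columns to load and which entries are expressions to evaluate. The sketch below illustrates one way such a split could work. It is an assumption for illustration only, not BigQuant's actual implementation, and `base_features` and `is_derived` are hypothetical helper names.

```python
import re

# Hypothetical helpers, not part of the BigQuant API.
# An identifier followed by "(" is treated as a function name (e.g. rank);
# every other identifier is treated as a base feature column.
_IDENT = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()")


def base_features(expressions):
    """Return the distinct column names the expressions refer to, in first-seen order."""
    seen = []
    for expr in expressions:
        for name in _IDENT.findall(expr):
            if name not in seen:
                seen.append(name)
    return seen


def is_derived(expr):
    """An entry counts as derived if it is anything other than one bare column name."""
    return _IDENT.fullmatch(expr.strip()) is None


features = [
    'return_5',
    'return_10',
    'return_20',
    'avg_amount_0/avg_amount_5',
    'avg_amount_5/avg_amount_20',
    'rank(return_10 / return_20)',
]

needed_columns = base_features(features)
derived = [f for f in features if is_derived(f)]
```

Under this reading, `needed_columns` holds the six raw columns a base extractor would have to load, and `derived` holds the three expressions left for the derived extractor to evaluate.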


(达达) #2
    In [1]:
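    # This code was generated automatically by the visual strategy environment on 2019-11-06 10:00
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and modify it there.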
    
    
    m1 = M.input_features.v1(
        features="""
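    # Lines starting with # are comments; a comment must be on its own line
    # Multiple features, one per line; base and derived features are both allowed, and every feature must be one supported by this platform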
    rank(return_10 / return_20)
    """
    )
    
    m2 = M.instruments.v2(
        start_date='2019-01-01',
        end_date='2019-03-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m3 = M.general_feature_extractor_vx1.v1(
        instruments=m2.data,
        features=m1.data,
        start_date='',
        end_date='',
        before_start_days=90
    )
    
    m4 = M.derived_feature_extractor.v3(
        input_data=m3.data,
        features=m1.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=False,
        remove_extra_columns=False,
        user_functions={}
    )
    
    In [4]:
    m4.data.read_df().head()
    
    Out[4]:
             date  instrument  return_10  return_20  rank(return_10 / return_20)
    0  2018-10-08  002415.SZA   0.942441   0.822576                     0.965807
    1  2018-10-08  300177.SZA   0.938521   0.927692                     0.390032
    2  2018-10-08  600527.SHA   0.950820   0.935484                     0.440452
    3  2018-10-08  300003.SZA   0.960068   0.842216                     0.960591
    4  2018-10-08  300267.SZA   1.067183   1.176638                     0.029557
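
The values in the last column of Out[4] are consistent with an ascending percentile rank of return_10 / return_20 taken across all stocks on the same date. That reading is an assumption drawn from the output, not a statement of the platform's exact formula. A minimal pure-Python sketch of such a rank, with the hypothetical helper name `rank_pct_by_date`:

```python
from collections import defaultdict


def rank_pct_by_date(rows):
    """rows: iterable of (date, instrument, value).

    Returns {(date, instrument): rank} where rank is position / count within
    the date, ascending. Ties are not averaged here; they fall back to
    instrument order, which is a simplification.
    """
    by_date = defaultdict(list)
    for date, instrument, value in rows:
        by_date[date].append((value, instrument))

    ranks = {}
    for date, items in by_date.items():
        items.sort()
        count = len(items)
        for position, (_value, instrument) in enumerate(items, start=1):
            ranks[(date, instrument)] = position / count
    return ranks


# The five rows shown in Out[4]: (date, instrument, return_10, return_20).
sample = [
    ("2018-10-08", "002415.SZA", 0.942441, 0.822576),
    ("2018-10-08", "300177.SZA", 0.938521, 0.927692),
    ("2018-10-08", "600527.SHA", 0.950820, 0.935484),
    ("2018-10-08", "300003.SZA", 0.960068, 0.842216),
    ("2018-10-08", "300267.SZA", 1.067183, 1.176638),
]
rows = [(date, inst, r10 / r20) for date, inst, r10, r20 in sample]
ranks = rank_pct_by_date(rows)
ordered = [inst for (_date, inst) in sorted(ranks, key=ranks.get)]
```

Ranking only these five rows cannot reproduce the numbers in Out[4], because the platform ranks against the full cross-section for the date. The ordering does agree, though: 300267.SZA is lowest and 002415.SZA is highest in both.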

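(jijingling777) #3

m2_2 = M.derived_feature_extractor.v3(
    input_data=m2.data,
    features=['rank(return_10 / return_20)'])

The strategy in your reply does not generate the code above. In it, the derived feature extraction module takes its features from the upstream module, which is not the same as the sample code! What I am asking is how the code above was generated; I already understand the usage shown in your reply.
m4 = M.derived_feature_extractor.v3(
    input_data=m3.data,
    features=m1.data,
    date_col='date',
    instrument_col='instrument',
    drop_na=False,
    remove_extra_columns=False,
    user_functions={}
)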


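(达达) #4

This is a legacy usage. The platform now keeps only the latest version of each module, so this code cannot be produced through the visual modules.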
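
For anyone moving between the two styles in this thread: the legacy sample passes features as a Python list, while the generated code in #2 passes a text block with one feature per line and `#` comments on their own lines. A small sketch for converting between the two forms follows. The helper names `parse_feature_text` and `to_feature_text` are hypothetical and not part of the platform, and the parsing rules are assumed from the comments in that text block.

```python
def parse_feature_text(text):
    """Turn a feature text block into a list.

    Assumed rules, taken from the comments in the generated code:
    one feature per line, and lines starting with # are comments.
    """
    features = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            features.append(line)
    return features


def to_feature_text(features):
    """Turn a list of feature expressions into a one-per-line text block."""
    return "\n".join(features) + "\n"


block = """
# Lines starting with # are comments; a comment must be on its own line
rank(return_10 / return_20)
"""

as_list = parse_feature_text(block)
as_text = to_feature_text(['return_10', 'return_20', 'rank(return_10 / return_20)'])
```

Whether a given module version accepts a list, a text block, or only the output of the input feature list module is something to check against the current module documentation.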