A scheme for keeping backtests and simulated trading consistent

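  • Observation:

If the feature list (see the feature list m1) contains expression factors that span multiple days (for example, 20 days), backtest and simulation results can diverge.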

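  • Cause:

In a backtest, the date range is long enough that at its right end (the most recent dates) the derived factors are all non-NaN.

In simulation, the range is a single day, and only 120 calendar days of history are available for computing derived factors. After a long trading suspension, those 120 calendar days may contain fewer than 20 actual trading days, which can produce NaN values.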
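To make the cause concrete, here is a minimal sketch in plain pandas rather than the platform's own operators. It assumes that a 20-row rolling minimum (standing in for ts_min(low_0,20)) needs 20 rows before it yields a value. The helper last_ts_min, the synthetic prices, and the business-day calendar are illustrative assumptions, not platform behaviour.

```python
import numpy as np
import pandas as pd


def last_ts_min(trading_days: pd.DatetimeIndex, window: int = 20) -> float:
    """Rolling minimum over `window` rows, read off on the last available day."""
    # Synthetic, steadily rising "low" prices; only the number of rows matters here.
    low = pd.Series(np.linspace(10.0, 11.0, len(trading_days)), index=trading_days)
    return low.rolling(window).min().iloc[-1]


end = pd.Timestamp("2018-01-29")

# Simulation-like fetch: only the last 120 calendar days are loaded.
# Business days are used as a rough stand-in for the exchange calendar.
calendar = pd.bdate_range(end - pd.Timedelta(days=120), end)

# A stock that traded throughout has far more than 20 rows in the window.
always_traded = calendar

# Hypothetical stock that resumed trading only 15 business days ago.
long_suspension = calendar[-15:]

full_value = last_ts_min(always_traded)
short_value = last_ts_min(long_suspension)

print(len(always_traded), full_value)
print(len(long_suspension), short_value)
```

The second stock has fewer than 20 rows inside the 120-calendar-day fetch, so its rolling value on the last day is NaN, whereas a backtest that loads a much longer history would usually still have enough rows.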

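  • Solution:

First, in the base feature extraction module m3, set the number of days to look back (before_start_days) to 120.

Next, write a custom module m4 that computes, for every stock on every day, the number of trading days (i.e. non-suspended days) within the most recent 120 calendar days.

This step effectively produces a new factor: u_trade_num_in_n120

In practice, simply filter on u_trade_num_in_n120, for example: u_trade_num_in_n120>20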
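The filtering step could look roughly like the sketch below. It works on an ordinary pandas DataFrame that already carries the u_trade_num_in_n120 column. The function name filter_by_trade_days, the default threshold, and the instrument codes in the sample are made up for illustration.

```python
import pandas as pd


def filter_by_trade_days(df: pd.DataFrame, min_days: int = 20) -> pd.DataFrame:
    """Keep rows whose stock traded on more than `min_days` days in the last 120 calendar days."""
    return df[df["u_trade_num_in_n120"] > min_days].reset_index(drop=True)


# Hypothetical sample: one normal stock, one recently resumed, one exactly at the threshold.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-29"] * 3),
    "instrument": ["AAA.SHA", "BBB.SHA", "CCC.SZA"],
    "u_trade_num_in_n120": [80.0, 12.0, 20.0],
})

kept = filter_by_trade_days(sample)
print(kept["instrument"].tolist())
```

Applying the same filter in both backtest and simulation means a stock with too little recent history is excluded in both, instead of silently producing NaN in only one of them.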

    In [1]:
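    # This code was generated automatically by the visual strategy environment, 2018-01-30 10:39
    # This code cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.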
    
    
    m1 = M.input_features.v1(
        features="""
    ts_min(low_0,20)/adjust_factor_0
    ts_max(high_0,20)/adjust_factor_0
    
    """
    )
    
    m2 = M.instruments.v2(
        start_date='2016-01-01',
        end_date='2018-01-29',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m3 = M.general_feature_extractor.v6(
        instruments=m2.data,
        features=m1.data,
        start_date='',
        end_date='',
        before_start_days=120
    )
    
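    # Imports used by the function below (the platform may already preload them)
    import datetime
    import pandas as pd
    
    # Python entry function: input_1/2/3 map to the three input ports, data_1/2/3 to the three output ports
    def m4_run_bigquant_run(input_1, input_2, input_3):
        '''
        Count each stock's trading days within the most recent 120 calendar days.
        
            input_1  base factor data
            input_2  unused
            input_3  unused
        '''
        
        # Raw input
        df = input_1.read_df()
        
        # Index by (date, instrument) and flag every existing row as a trading day
        dff = df[['date','instrument']].set_index(['date','instrument'])
        dff["flag"] = 1
        
        # MultiIndex => date rows x instrument columns; missing (NaN) cells become non-trading days
        tmp_df = dff['flag'].unstack().fillna(0)
        
        # Start date of the 120-calendar-day window that ends on each date
        tmp_df["start_date"] = list(map(lambda x: x-datetime.timedelta(120), tmp_df.index))
    
        # Count trading days per stock inside each window, then convert back to a MultiIndex Series
        def func(xs):
            return tmp_df[(tmp_df.index>=xs.start_date)&(tmp_df.index<=xs.name)].iloc[:,:-1].sum()
        tmp_df = tmp_df.apply(func,axis=1).stack()
        
        # Attach the count; aligning on dff's index also drops suspended days and keeps the original order
        dff["u_trade_num_in_n120"] = tmp_df
        
        # Drop the MultiIndex to get back a layout similar to df
        dff = dff["u_trade_num_in_n120"].reset_index(['date','instrument'])
        
        # Merge the existing factors with the new one
        df = pd.merge(df,dff,on=['date','instrument'])
        
        # Output
        data_1 = DataSource.write_df(df)
        return Outputs(data_1=data_1, data_2=None, data_3=None)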
    
    m4 = M.cached.v3(
        input_1=m3.data,
        run=m4_run_bigquant_run
    )
    
    m5 = M.derived_feature_extractor.v2(
        input_data=m4.data_1,
        features=m1.data,
        date_col='date',
        instrument_col='instrument'
    )
    
    [2018-01-30 10:24:42.125353] INFO: bigquant: input_features.v1 started..
    [2018-01-30 10:24:42.134975] INFO: bigquant: cache hit
    [2018-01-30 10:24:42.139704] INFO: bigquant: input_features.v1 finished [0.014469s].
    [2018-01-30 10:24:42.160151] INFO: bigquant: instruments.v2 started..
    [2018-01-30 10:24:42.165084] INFO: bigquant: cache hit
    [2018-01-30 10:24:42.166801] INFO: bigquant: instruments.v2 finished [0.006644s].
    [2018-01-30 10:24:42.211149] INFO: bigquant: general_feature_extractor.v6 started..
    [2018-01-30 10:24:44.896755] INFO: base feature extraction: year 2015, feature rows=190352
    [2018-01-30 10:24:49.800078] INFO: base feature extraction: year 2016, feature rows=641546
    [2018-01-30 10:25:01.532844] INFO: base feature extraction: year 2017, feature rows=743233
    [2018-01-30 10:25:02.076076] INFO: base feature extraction: year 2018, feature rows=65333
    [2018-01-30 10:25:02.109552] INFO: base feature extraction: total rows: 1640464
    [2018-01-30 10:25:02.113815] INFO: bigquant: general_feature_extractor.v6 finished [19.902694s].
    [2018-01-30 10:25:02.135611] INFO: bigquant: cached.v3 started..
    [2018-01-30 10:25:26.892127] INFO: bigquant: cached.v3 finished [24.756578s].
    [2018-01-30 10:25:27.190428] INFO: bigquant: derived_feature_extractor.v2 started..
    [2018-01-30 10:25:34.049201] INFO: derived_feature_extractor: extracted ts_max(high_0,20)/adjust_factor_0, 5.881s
    [2018-01-30 10:25:39.236616] INFO: derived_feature_extractor: extracted ts_min(low_0,20)/adjust_factor_0, 5.185s
    [2018-01-30 10:25:40.089113] INFO: derived_feature_extractor: /data, 1640464
    [2018-01-30 10:25:42.644726] INFO: bigquant: derived_feature_extractor.v2 finished [15.454307s].
    
    In [2]:
    m5.data.read_df().tail(10)
    
    Out[2]:
             adjust_factor_0        date     high_0  instrument      low_0  u_trade_num_in_n120  ts_max(high_0,20)/adjust_factor_0  ts_min(low_0,20)/adjust_factor_0
    1640454         1.509191  2018-01-29  18.532866  603987.SHA  18.170660                 80.0                          13.100000                         11.700000
    1640455         1.513542  2018-01-29  62.539555  603988.SHA  60.420597                 80.0                          53.830000                         39.919999
    1640456         1.608156  2018-01-29  60.145035  603989.SHA  58.665531                 80.0                          43.820003                         36.480001
    1640457         1.004616  2018-01-29  36.357052  603990.SHA  35.161560                 80.0                          39.499998                         34.999999
    1640458         1.000000  2018-01-29  27.760000  603991.SHA  27.200001                 80.0                          28.969999                         26.700001
    1640459         3.227797  2018-01-29  25.693264  603993.SHA  24.983149                 80.0                           8.450000                          6.820000
    1640460         1.531452  2018-01-29  27.183273  603996.SHA  26.754467                 80.0                          18.720001                         17.400000
    1640461         1.531580  2018-01-29  17.521275  603997.SHA  17.230274                 80.0                          11.910000                         10.760000
    1640462         3.929271  2018-01-29  44.047127  603998.SHA  43.143394                 80.0                          12.100000                         10.980000
    1640463         2.419014  2018-01-29  19.981056  603999.SHA  19.231161                 80.0                           8.900000                          7.330000
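The log above shows the custom cached.v3 step taking about 25 seconds, since it slices the whole table once per date. The same count can probably be obtained with a time-based rolling window. The sketch below is an alternative in plain pandas, not the module used in this post. It assumes daily timestamps at midnight, so that a window of (days + 1) calendar days matches the inclusive range [date - days, date] used by m4. The function name add_trade_day_count and the toy data are hypothetical.

```python
import pandas as pd


def add_trade_day_count(df: pd.DataFrame, days: int = 120) -> pd.DataFrame:
    """Append u_trade_num_in_n120: rows per instrument within [date - days, date]."""
    # date rows x instrument columns; 1.0 where a row exists, 0.0 where the stock did not trade.
    flags = (
        df[["date", "instrument"]]
        .assign(flag=1.0)
        .pivot(index="date", columns="instrument", values="flag")
        .fillna(0.0)
        .sort_index()
    )
    # An offset window of (days + 1) days covers (date - days - 1, date],
    # which for midnight timestamps is the same as [date - days, date].
    counts = flags.rolling(f"{days + 1}D").sum()
    counts = counts.stack().rename("u_trade_num_in_n120").reset_index()
    # A left merge keeps only the rows present in df (suspended days stay out) and keeps df's order.
    return df.merge(counts, on=["date", "instrument"], how="left")


# Toy data: "A" trades on all five days, "B" only on the first and the last day.
dates = pd.date_range("2018-01-01", periods=5)
rows = [(d, "A") for d in dates] + [(dates[0], "B"), (dates[4], "B")]
toy = pd.DataFrame(rows, columns=["date", "instrument"])

result = add_trade_day_count(toy, days=2)
print(result)
```

With days=2 each window spans three calendar days, so "A" reaches a count of 3 from the third day on, while "B" counts only 1 on its last day because its earlier trading day has already left the window.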