Generating a target factor from financial data


(chaoskey) #1

An example of generating a target factor from financial data

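Some derived financial factors that I need, and that are fairly tedious to compute, are not provided by the platform. So I compute them myself from the raw financial data the platform does provide.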

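Below, I use the compound annual growth rate (CAGR) of total operating revenue (`fs_gross_revenues`) as an example to show how such a factor can be generated.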
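For reference, the CAGR over n years is (R_end / R_start)^(1/n) - 1; the factor in this post reports it in percent with n = 3. The sketch below is plain Python. The helper name `cagr_pct` is hypothetical (not a BigQuant function), and returning NaN for non-positive values is my own assumption, since a growth rate between a zero or negative revenue figure and a later one is not meaningful.

```python
def cagr_pct(start_value: float, end_value: float, years: int) -> float:
    """Hypothetical helper: compound annual growth rate in percent.

    CAGR = ((end / start) ** (1 / years) - 1) * 100
    """
    if start_value <= 0 or end_value <= 0:
        # Assumption: treat non-positive endpoints as "no meaningful CAGR"
        return float("nan")
    return ((end_value / start_value) ** (1.0 / years) - 1.0) * 100.0


# Revenue doubling over 3 years is roughly 26% per year, not 100% / 3
print(round(cagr_pct(100.0, 200.0, 3), 2))  # 25.99
```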

Concretely, I implemented a custom expression in the derived feature extraction module (`derived_feature_extractor`):

# Latest CAGR of total operating revenue (3-year span, based on annual reports)
fu_gross_revenues_cagr(fs_quarter_year_0,fs_quarter_index_0)
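The two arguments describe the latest published report: `fs_quarter_year_0` is its fiscal year and `fs_quarter_index_0` its quarter index, where 4 marks an annual report. When the latest report is an interim one, the most recent annual report belongs to the previous year. A minimal pandas sketch of that rule on made-up rows (column names as in the post, values purely illustrative):

```python
import pandas as pd

# Made-up rows: fiscal year and quarter index of each stock's latest report
df = pd.DataFrame({
    "instrument": ["A", "B", "C"],
    "fs_quarter_year_0": [2017.0, 2017.0, 2016.0],
    "fs_quarter_index_0": [3.0, 4.0, 4.0],
})

# Subtract one year unless the latest report is itself an annual report (index 4)
is_interim = (df["fs_quarter_index_0"] < 4).astype(int)
df["last_annual_year"] = (df["fs_quarter_year_0"] - is_interim).astype(int)

print(df["last_annual_year"].tolist())  # [2016, 2017, 2016]
```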

    In [1]:
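    # This code was generated automatically by the visual strategy environment on 2018-02-03 13:33
    # This cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.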
    
    
    m1 = M.instruments.v2(
        start_date='2018-01-01',
        end_date='2018-01-31',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m2 = M.input_features.v1(
        features="""# Latest CAGR of total operating revenue (3-year span, based on annual reports)
    fu_gross_revenues_cagr(fs_quarter_year_0,fs_quarter_index_0)
    """
    )
    
    m3 = M.general_feature_extractor.v6(
        instruments=m1.data,
        features=m2.data,
        start_date='',
        end_date='',
        before_start_days=0
    )
    
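    import numpy as np
    import pandas as pd
    
    def fu_gross_revenues_cagr(df, fs_quarter_year_0, fs_quarter_index_0):
        """Latest 3-year CAGR (%) of total operating revenue, based on annual reports."""
        # Last date in the input; financial data is loaded up to this date
        end_date = df['date'].max().strftime('%Y-%m-%d')
    
        instruments = D.instruments(start_date='2005-01-01', end_date=end_date)
    
        dff = D.financial_statements(instruments, start_date='2005-01-01', end_date=end_date,
                                     fields=['instrument', 'fs_quarter_year', 'fs_quarter_index', 'fs_gross_revenues'])
        # Keep annual reports only (fs_quarter_index == 4)
        dff = dff[dff.fs_quarter_index == 4]
    
        # Index the annual reports by (fs_quarter_year, instrument)
        tmp_df = dff.set_index(["fs_quarter_year", "instrument"])
    
        # Pivot total operating revenue: rows = fs_quarter_year, columns = instrument
        gross_revenues = tmp_df["fs_gross_revenues"].unstack().astype(np.float64)
    
        # CAGR over the last 3 years: (R_t / R_{t-3}) ** (1/3) - 1, in percent
        gross_revenues_cagr = ((gross_revenues / gross_revenues.shift(3)) ** (1 / 3) - 1) * 100
    
        # Stack the CAGR back; it aligns with tmp_df on (fs_quarter_year, instrument)
        tmp_df["u_gross_revenues_cagr"] = gross_revenues_cagr.stack()
    
        # Turn the two index levels back into ordinary columns
        dff = tmp_df.reset_index(["fs_quarter_year", "instrument"])
    
        # Fiscal year of the latest available annual report:
        # if the latest report is an interim one (index < 4), use the previous year
        last_quarter_year = df["fs_quarter_year_0"] - (df["fs_quarter_index_0"] < 4).astype(int)
    
        # Look up that annual report's CAGR for every (date, instrument) row;
        # .copy() avoids pandas' SettingWithCopyWarning on the next assignment
        tmp_df = df[["date", "instrument"]].copy()
        tmp_df["fs_quarter_year"] = last_quarter_year
        tmp_df = pd.merge(tmp_df, dff[['instrument', 'fs_quarter_year', 'u_gross_revenues_cagr']],
                          on=['instrument', 'fs_quarter_year'],
                          how='left')
    
        # Target factor, re-aligned to the index of the input df (merge resets the index)
        return pd.Series(tmp_df['u_gross_revenues_cagr'].values, index=df.index)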
    
    m4_user_functions_bigquant_run = {
        'fu_gross_revenues_cagr':  fu_gross_revenues_cagr
    }
    
    
    m4 = M.derived_feature_extractor.v2(
        input_data=m3.data,
        features=m2.data,
        date_col='date',
        instrument_col='instrument',
        user_functions=m4_user_functions_bigquant_run
    )
    
    [2018-02-03 13:32:36.570319] INFO: bigquant: instruments.v2 started..
    [2018-02-03 13:32:36.576159] INFO: bigquant: cache hit
    [2018-02-03 13:32:36.577293] INFO: bigquant: instruments.v2 finished [0.006995s].
    [2018-02-03 13:32:36.585773] INFO: bigquant: input_features.v1 started..
    [2018-02-03 13:32:36.588507] INFO: bigquant: cache hit
    [2018-02-03 13:32:36.589640] INFO: bigquant: input_features.v1 finished [0.003873s].
    [2018-02-03 13:32:36.609553] INFO: bigquant: general_feature_extractor.v6 started..
    [2018-02-03 13:32:36.611735] INFO: bigquant: cache hit
    [2018-02-03 13:32:36.612642] INFO: bigquant: general_feature_extractor.v6 finished [0.003048s].
    [2018-02-03 13:32:36.674144] INFO: bigquant: derived_feature_extractor.v2 started..
    [2018-02-03 13:32:36.677020] INFO: bigquant: cache hit
    [2018-02-03 13:32:36.677977] INFO: bigquant: derived_feature_extractor.v2 finished [0.00385s].
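The core of `fu_gross_revenues_cagr` is a pivot, a `shift(3)` and a stack back to long format. The sketch below reproduces only that part on a tiny made-up table so it runs outside the platform (no `D` calls). One caveat it makes visible: `shift(3)` moves by rows, not by years, so the result is a true 3-year CAGR only if the pivoted fiscal years are consecutive.

```python
import numpy as np
import pandas as pd

# Made-up long-format annual reports (already filtered to fs_quarter_index == 4)
reports = pd.DataFrame({
    "instrument": ["A"] * 4 + ["B"] * 4,
    "fs_quarter_year": [2013, 2014, 2015, 2016] * 2,
    "fs_gross_revenues": [100.0, 120.0, 150.0, 200.0, 80.0, 80.0, 90.0, 80.0],
})

# Pivot: one row per fiscal year, one column per instrument
wide = (reports.set_index(["fs_quarter_year", "instrument"])["fs_gross_revenues"]
        .unstack()
        .astype(np.float64))

# shift(3) moves each column down three rows, so row t is divided by row t-3
cagr = ((wide / wide.shift(3)) ** (1.0 / 3) - 1) * 100

# Back to long format with index (fs_quarter_year, instrument); drop rows without a value
cagr_long = cagr.stack().dropna().rename("u_gross_revenues_cagr").reset_index()
print(cagr_long)
```

Years 2013 to 2015 have no value three rows earlier, so they come out as NaN. The explicit `.dropna()` keeps the result the same under the newer `stack()` behaviour (`future_stack=True`, available since pandas 2.1), which does not drop missing values.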
    
    In [2]:
    m4.data.read_df().head()
    
    Out[2]:
        date        fs_quarter_index_0  fs_quarter_year_0  instrument  fu_gross_revenues_cagr(fs_quarter_year_0,fs_quarter_index_0)
    0   2018-01-02  3.0                 2017.0             000001.SZA  27.320717
    1   2018-01-02  3.0                 2017.0             000002.SZA  21.096526
    2   2018-01-02  3.0                 2017.0             000004.SZA  58.107242
    3   2018-01-02  3.0                 2017.0             000005.SZA  109.959338
    4   2018-01-02  3.0                 2017.0             000008.SZA  83.332854
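The last step of the function looks up, for every (date, instrument) row, the CAGR of its latest annual report via a left merge. Two pandas details are worth guarding against: `merge` returns a new default index rather than the caller's, and duplicate keys on the right side would silently add rows. A sketch on toy data (all names and values made up) that uses `validate="many_to_one"` to fail loudly on duplicate keys and re-attaches the original index:

```python
import pandas as pd

# Made-up daily rows, deliberately with a non-default index
daily = pd.DataFrame(
    {"date": ["2018-01-02"] * 3,
     "instrument": ["A", "B", "C"],
     "fs_quarter_year": [2016, 2016, 2016]},
    index=[10, 11, 12],
)
# Made-up per-report factor table; C has no annual CAGR
annual = pd.DataFrame({
    "instrument": ["A", "B"],
    "fs_quarter_year": [2016, 2016],
    "u_gross_revenues_cagr": [25.99, 0.0],
})

# Left merge keeps the daily rows in order; validate raises if right keys repeat
merged = daily.merge(annual, on=["instrument", "fs_quarter_year"],
                     how="left", validate="many_to_one")

# merge resets the index, so put the caller's index back before returning
factor = pd.Series(merged["u_gross_revenues_cagr"].to_numpy(), index=daily.index)
print(factor)
```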

    (sold) #2

    @chaoskey Your posts are my favorite; you share a lot of really useful material. Upvoting!