
Alphalens Factor Analysis Template

This is only a template; modify it freely to run a factor analysis tailored to your needs.

1) Factor data extraction

    In [1]:
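    # This code was generated automatically by the visual strategy environment on 2021-06-08 15:09
    # This code cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.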
    
    
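    # Entry function for the Python module: input_1/2/3 map to the three input ports, data_1/2/3 to the three output ports
    def m5_run_bigquant_run(input_1, input_2, input_3):
        '''
            input_1: data input
            input_2: unused
            input_3: parameter input
        '''
        
        # Parameters
        params = input_3.read_pickle()
        
        # Input
        df = input_1.read_df()
        # Keep only rows inside the [start_date, end_date] range
        df = df[(df.date >= params["start_date"]) & (df.date <= params["end_date"])]
        # Output
        data_1 = DataSource.write_df(df)
        return Outputs(data_1=data_1, data_2=None, data_3=None)
    
    # Optional post-processing function. Its input is the output of the main function; process the data here or return a friendlier outputs format. The output of this function is not cached.
    def m5_post_run_bigquant_run(outputs):
        return outputs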
    
    
    m1 = M.input_features.v1(
        features="""# Lines starting with # are comments
    # One feature per line; base features and derived features are both allowed
    return_5
    return_10
    return_20
    avg_amount_0/avg_amount_5
    avg_amount_5/avg_amount_20
    rank_avg_amount_0/rank_avg_amount_5
    rank_avg_amount_5/rank_avg_amount_10
    rank_return_0
    rank_return_5
    rank_return_10
    rank_return_0/rank_return_5
    rank_return_5/rank_return_10
    pe_ttm_0
    """
    )
    
    m2 = M.instruments.v2(
        start_date='2016-01-01',
        end_date='2018-01-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m6 = M.advanced_auto_labeler.v2(
        instruments=m2.data,
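        label_expr="""# Lines starting with # are comments
    # 0. One expression per line, executed in order; from the second line on, the label field can be used
    # 1. Available data fields: https://bigquant.com/docs/data_history_data.html
    #   Add the benchmark_ prefix to use the corresponding benchmark data
    # 2. Available operators and functions: expression engine, https://bigquant.com/docs/big_expr.html
    
    # Compute the return: close price 5 days ahead (sell price) divided by the next day's open price (buy price)
    shift(close, -5) / shift(open, -1)
    
    # Outlier handling: clip at the 1% and 99% quantiles
    clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
    
    # Map the score to classes; 20 classes are used here
    all_wbins(label, 20)
    
    # Filter out days locked at limit-up with high == low (set label to NaN; NaN labels are ignored in later processing and training)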
    where(shift(high, -1) == shift(low, -1), NaN, label)
    """,
        start_date='',
        end_date='',
        benchmark='000300.SHA',
        drop_na_label=True,
        cast_label_int=True
    )
    
    m9 = M.general_feature_extractor.v7(
        instruments=m2.data,
        features=m1.data,
        start_date='',
        end_date='',
        before_start_days=120
    )
    
    m10 = M.derived_feature_extractor.v3(
        input_data=m9.data,
        features=m1.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=False,
        remove_extra_columns=False
    )
    
    m5 = M.cached.v3(
        input_1=m10.data,
        input_3=m2.data,
        run=m5_run_bigquant_run,
        post_run=m5_post_run_bigquant_run,
        input_ports='',
        params='{}',
        output_ports=''
    )
    
    m7 = M.join.v3(
        data1=m6.data,
        data2=m5.data_1,
        on='date,instrument',
        how='inner',
        sort=False
    )
    
    m11 = M.dropnan.v2(
        input_data=m7.data
    )
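
The label expression in the cell above can be read as four pandas steps. The sketch below reproduces them on a small synthetic frame; it is not BigQuant's implementation. The toy prices, the instrument names, and the reading of `all_wbins` as equal-width binning over the whole sample are assumptions.

```python
import numpy as np
import pandas as pd

# Toy daily bars for two hypothetical instruments (synthetic values).
dates = pd.date_range("2016-01-04", periods=10, freq="B")
rng = np.random.default_rng(0)
bars = pd.DataFrame({
    "date": np.tile(dates, 2),
    "instrument": np.repeat(["AAA", "BBB"], len(dates)),
})
bars["open"] = 10 + rng.normal(0, 0.2, len(bars)).cumsum()
bars["close"] = bars["open"] * (1 + rng.normal(0, 0.01, len(bars)))
bars["high"] = bars[["open", "close"]].max(axis=1) * 1.01
bars["low"] = bars[["open", "close"]].min(axis=1) * 0.99

by_inst = bars.groupby("instrument")

# 1) Return: close 5 days ahead divided by the next day's open.
label = by_inst["close"].shift(-5) / by_inst["open"].shift(-1)

# 2) Clip at the 1% and 99% quantiles of the whole sample.
label = label.clip(label.quantile(0.01), label.quantile(0.99))

# 3) Map to 20 classes (assumed here to be equal-width bins).
label_bins = pd.cut(label, bins=20, labels=False)

# 4) Blank out days where the next day's high equals its low.
locked = by_inst["high"].shift(-1) == by_inst["low"].shift(-1)
bars["label"] = label_bins.where(~locked)

print(bars.dropna(subset=["label"]).head())
```

The last five rows of each instrument have no label because the close five days ahead does not exist yet; `drop_na_label=True` in the module call removes such rows.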
    
    In [3]:
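    data = m11.data.read_df()
    # Build a (date, instrument) MultiIndex, sorted by instrument within each date, as Alphalens expects
    df = data.groupby('date').apply(lambda x: x.sort_values('instrument').set_index("instrument"))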
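
Alphalens expects factor values indexed by (date, asset). A more direct route to that shape is `set_index` followed by `sort_index`; the sketch below shows it on synthetic rows. The toy values are assumptions, and unlike the `groupby(...).apply(...)` call above this variant does not keep `date` as a regular column.

```python
import pandas as pd

# Synthetic long-format rows, deliberately unsorted (values are made up).
data = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-05", "2016-01-04", "2016-01-04", "2016-01-05"]),
    "instrument": ["000002.SZA", "000002.SZA", "000001.SZA", "000001.SZA"],
    "pe_ttm_0": [12.1, 12.0, 7.4, 7.5],
})

# Index by (date, instrument) and sort both levels.
df = data.set_index(["date", "instrument"]).sort_index()

print(list(df.index.names))
print(df)
```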
    

2) Import dependencies

    In [5]:
    import pandas as pd
    import numpy as np
    
    from alphalens.tears import (
        create_returns_tear_sheet,
        create_information_tear_sheet,
        create_turnover_tear_sheet,
        create_summary_tear_sheet,
        create_full_tear_sheet,
        create_event_returns_tear_sheet,
        create_event_study_tear_sheet,
    )
    
    from alphalens.plotting import plot_quantile_statistics_table
    
    from alphalens.utils import get_clean_factor_and_forward_returns
    
    import warnings
    warnings.filterwarnings('ignore')
    

3) Prepare the data for analysis

This example uses the open price as the price series and pe_ttm as the factor.

    In [6]:
    #
    # Prepare price data: wide format, one row per date and one column per instrument
    #
    prices = df["m:open"]
    prices = prices.unstack()
    
    #
    # Prepare factor data: a Series indexed by (date, instrument)
    #
    factor = df["pe_ttm_0"]
    
    #
    # Prepare industry grouping data: a dict mapping instrument to its SW level-1 industry code
    #
    m2_dict = m2.data.read_pickle()
    
    industry_data = D.history_data(m2_dict['instruments'], m2_dict['start_date'], m2_dict['end_date'], ['industry_sw_level1'])
    industry_data = industry_data.drop('date', axis=1).drop_duplicates()
    ticker_sector = dict(zip(industry_data['instrument'], industry_data['industry_sw_level1']))
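
The two reshaping steps in this cell can be checked on a tiny example. The sketch below uses made-up prices and industry codes to show what `unstack()` and the `dict(zip(...))` mapping produce; none of the values come from the notebook's data.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2016-01-04", "2016-01-05"]), ["000001.SZA", "000002.SZA"]],
    names=["date", "instrument"],
)
long_prices = pd.Series([11.0, 24.0, 11.2, 23.5], index=idx, name="open")

# unstack() moves the inner index level (instrument) into the columns.
wide_prices = long_prices.unstack()

# Hypothetical industry rows, one per (date, instrument), collapsed to one per instrument.
industry = pd.DataFrame({
    "date": idx.get_level_values("date"),
    "instrument": idx.get_level_values("instrument"),
    "industry_sw_level1": [480000, 430000, 480000, 430000],
})
industry = industry.drop("date", axis=1).drop_duplicates()
ticker_sector = dict(zip(industry["instrument"], industry["industry_sw_level1"]))

print(wide_prices)
print(ticker_sector)
```

If an instrument changes industry during the sample, `drop_duplicates()` keeps one row per distinct code and the dict ends up holding the last one seen.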
    

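4) Factor cleaning and return alignment

Get the cleaned factor values together with their forward returns (with or without an industry grouping) and align the two.

The factor data, price data, and industry classification are aligned by index and combined into one table whose index is a MultiIndex of date and asset.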

    In [10]:
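    # Build the aligned factor table
    factor_data = get_clean_factor_and_forward_returns(
        factor,  # factor values
        prices,  # prices
        groupby=ticker_sector,  # asset-to-group mapping (industry)
        quantiles=7,  # number of equal-count quantile buckets per date
        periods=(1, 3),  # forward-return horizons, in periods
        filter_zscore=None)  # z-score threshold for dropping outlier forward returns; None disables it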
    
    Dropped 0.7% entries from factor data: 0.7% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
    max_loss is 35.0%, not exceeded: OK!
    
    In [11]:
    factor_data.head(10)
    
    Out[11]:
                                  1D         3D       factor   group  factor_quantile
    date       asset
    2016-01-04 000001.SZA  -0.060833  -0.049167     7.420235  480000                2
               000004.SZA  -0.183007  -0.161220   295.369110  370000                7
               000005.SZA  -0.184000  -0.096000   211.351349  410000                7
               000006.SZA  -0.157941  -0.116056    25.699854  430000                2
               000008.SZA  -0.184343  -0.092593   272.514771  640000                7
               000009.SZA  -0.170379  -0.136971    35.605896  510000                3
               000010.SZA  -0.156496  -0.137795  -128.686234  620000                1
               000011.SZA  -0.146593  -0.139711    55.311367  430000                4
               000012.SZA  -0.148455  -0.119066    47.028557  610000                4
               000014.SZA  -0.164114  -0.163239    89.095406  430000                5
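
The `1D`, `3D`, and `factor_quantile` columns can be approximated with plain pandas. The sketch below is not Alphalens's implementation; it assumes a forward return over `p` periods is `price[t+p] / price[t] - 1` and that buckets are equal-count per date, and it uses synthetic prices and factor values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2016-01-04", periods=6, freq="B")
assets = ["A", "B", "C", "D"]
rng = np.random.default_rng(1)

# Wide prices: rows are dates, columns are assets (synthetic random walk).
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0, 0.01, (len(dates), len(assets))).cumsum(axis=0)),
    index=dates, columns=assets,
)
# Long factor series indexed by (date, asset) (synthetic values).
factor = pd.Series(
    rng.normal(size=len(dates) * len(assets)),
    index=pd.MultiIndex.from_product([dates, assets], names=["date", "asset"]),
    name="factor",
)

def forward_returns(prices, periods):
    """Return from t to t+p for each period p, as (date, asset) rows."""
    out = {}
    for p in periods:
        fwd = prices.shift(-p) / prices - 1
        out[f"{p}D"] = fwd.stack()
    df = pd.DataFrame(out)
    df.index = df.index.set_names(["date", "asset"])
    return df

# Align returns with the factor and drop rows with no complete forward window.
factor_data = forward_returns(prices, periods=(1, 3)).join(factor).dropna()

# Equal-count buckets within each date; bucket 1 holds the lowest factor values.
factor_data["factor_quantile"] = (
    factor_data.groupby(level="date")["factor"]
    .transform(lambda x: pd.qcut(x, 2, labels=False) + 1)
)
print(factor_data)
```

The rows dropped here are the last dates, whose 3-period window runs past the end of the price data; this matches the "forward returns computation" loss that Alphalens reports.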

5) Factor quantile statistics

    In [12]:
    plot_quantile_statistics_table(factor_data)
    
    Quantiles Statistics
    
                                min           max         mean           std   count    count %
    factor_quantile
    1                -474317.156250  1.915473e+01  -487.712128   7543.278573  191872  14.301815
    2                    -16.002075  3.353682e+01    20.587755      5.832009  191586  14.280497
    3                     20.937843  4.889155e+01    34.620502      4.758090  191582  14.280198
    4                     33.186512  6.957501e+01    48.939014      6.965004  191598  14.281391
    5                     42.147785  1.053466e+02    68.267998     12.392537  191515  14.275204
    6                     55.265610  1.984626e+02   105.946342     26.297321  191653  14.285491
    7                     91.218582  2.432540e+06  1318.697266  33950.086287  191786  14.295404
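
The table is a per-quantile summary of the `factor` column. A minimal pandas sketch of the same aggregation, on made-up values, is shown below. Because buckets are assigned separately on each date, the pooled min/max ranges of neighbouring quantiles can overlap, as they do in the table above.

```python
import pandas as pd

# Synthetic factor values with already-assigned quantile buckets.
factor_data = pd.DataFrame({
    "factor": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "factor_quantile": [1, 1, 1, 2, 2, 2],
})

stats = (
    factor_data.groupby("factor_quantile")["factor"]
    .agg(["min", "max", "mean", "std", "count"])
)
# Share of all observations that fall into each bucket.
stats["count %"] = stats["count"] / stats["count"].sum() * 100
print(stats)
```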

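6) Factor return analysis

The returns section includes histograms and violin plots of excess returns by factor quantile, cumulative return curves, excess return curves, factor-weighted returns, violin plots of the factor return distribution, the factor spread, and related results.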

    In [13]:
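    create_returns_tear_sheet(factor_data,
                              long_short=False,  # whether to compute returns for a long-short portfolio
                              group_neutral=False,  # whether to compute industry-neutral (group-adjusted) returns
                              by_group=False)  # whether to show results separately per industry group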
    
    Returns Analysis
    
                                                       1D      3D
    Ann. alpha                                     -0.035  -0.070
    beta                                            0.169   0.229
    Mean Period Wise Return Top Quantile (bps)     -3.458  -5.725
    Mean Period Wise Return Bottom Quantile (bps)  -3.033  -0.640
    Mean Period Wise Spread (bps)                  -0.425  -5.397
    <Figure size 432x288 with 0 Axes>
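
The quantile rows of this table are mean forward returns per bucket, expressed in basis points, and the spread compares the top and bottom buckets. The sketch below shows that idea on made-up 1D returns, pooled across dates for simplicity. Alphalens aggregates differently (for example per date, and with a per-period rate conversion for multi-day horizons), so this is an illustration rather than a reproduction of the numbers above.

```python
import pandas as pd

# Synthetic 1-period forward returns with assigned quantile buckets.
factor_data = pd.DataFrame({
    "1D": [0.010, -0.020, 0.000, 0.030, -0.010, 0.020],
    "factor_quantile": [1, 1, 2, 2, 3, 3],
})

# Mean forward return per bucket, converted to basis points (1 bp = 0.01%).
mean_ret_bps = factor_data.groupby("factor_quantile")["1D"].mean() * 1e4

# Spread: top bucket minus bottom bucket.
top, bottom = mean_ret_bps.index.max(), mean_ret_bps.index.min()
spread_bps = mean_ret_bps.loc[top] - mean_ret_bps.loc[bottom]

print(mean_ret_bps)
print(spread_bps)
```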

7) Factor IC analysis

The IC section includes the factor IC table, the IC time series, the IC distribution and Q-Q plots, the IC heatmap, and related results.

    In [14]:
    create_information_tear_sheet(factor_data,group_neutral=False,by_group=False)
    
    Information Analysis
    
                          1D      3D
    IC Mean           -0.006  -0.024
    IC Std.            0.104   0.114
    Risk-Adjusted IC  -0.058  -0.210
    t-stat(IC)        -1.267  -4.604
    p-value(IC)        0.206   0.000
    IC Skew           -0.061  -0.049
    IC Kurtosis        0.011  -0.251
    <Figure size 432x288 with 0 Axes>
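
The information coefficient (IC) is the rank correlation between factor values and forward returns, computed per date. The sketch below computes a daily Spearman IC and the first four summary rows of the table on synthetic data. The data-generating process and the helper name `daily_ic` are assumptions, and the t-statistic is a plain one-sample test of the mean IC against zero.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2016-01-04", periods=30, freq="B")
assets = [f"S{i:02d}" for i in range(20)]
idx = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
rng = np.random.default_rng(2)

factor_data = pd.DataFrame({"factor": rng.normal(size=len(idx))}, index=idx)
# Synthetic forward return with a weak positive dependence on the factor.
factor_data["1D"] = 0.002 * factor_data["factor"] + rng.normal(0, 0.01, len(idx))

def daily_ic(group):
    # Spearman correlation equals the Pearson correlation of the ranks.
    return group["factor"].rank().corr(group["1D"].rank())

ic = factor_data.groupby(level="date").apply(daily_ic)

ic_mean = ic.mean()
ic_std = ic.std()
risk_adjusted_ic = ic_mean / ic_std
t_stat = ic_mean / (ic_std / np.sqrt(len(ic)))

print(f"IC mean={ic_mean:.3f} std={ic_std:.3f} "
      f"risk-adjusted={risk_adjusted_ic:.3f} t-stat={t_stat:.3f}")
```

The table above follows the same relations: for the 1D column, -0.006 / 0.104 gives roughly the reported risk-adjusted IC of -0.058.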

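8) Factor turnover analysis

The turnover section includes the mean turnover of each factor quantile, the turnover time series, and the factor rank autocorrelation.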

    In [19]:
    # create_turnover_tear_sheet(factor_data)  # disabled: fails because of an index problem
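
With the tear sheet disabled, the two quantities named above can still be illustrated by hand. The sketch below is a simplified stand-in, not Alphalens's implementation: turnover is taken as the share of names in a bucket that were not in it on the previous date, and rank autocorrelation as the correlation of per-date factor ranks between consecutive dates. The helper name and all values are made up.

```python
import pandas as pd

dates = pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"])
assets = ["A", "B", "C", "D"]
idx = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
factor_data = pd.DataFrame({
    "factor":          [1, 2, 3, 4,   1, 3, 2, 4,   4, 3, 2, 1],
    "factor_quantile": [1, 1, 2, 2,   1, 2, 1, 2,   2, 2, 1, 1],
}, index=idx)

def quantile_turnover(factor_data, quantile):
    """Share of names in `quantile` on each date that are new versus the previous date."""
    members = {}
    for date, asset in factor_data.index[factor_data["factor_quantile"] == quantile]:
        members.setdefault(date, set()).add(asset)
    out, previous = {}, None
    for date in sorted(members):
        if previous is not None:
            out[date] = len(members[date] - previous) / len(members[date])
        previous = members[date]
    return pd.Series(out, name=f"quantile_{quantile}")

turnover = quantile_turnover(factor_data, quantile=2)

# Per-date ranks in wide form, then correlate each date with the one before it.
ranks = factor_data["factor"].groupby(level="date").rank().unstack("asset")
rank_autocorr = pd.Series({
    cur: ranks.loc[cur].corr(ranks.loc[prev])
    for prev, cur in zip(ranks.index[:-1], ranks.index[1:])
})

print(turnover)
print(rank_autocorr)
```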
    

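Event study

Create a sample tear sheet to view the factor's average cumulative return within a window before and after the event.

The event study section includes how the mean excess return of each factor quantile evolves over time and how each quantile's excess returns are distributed over time.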

    In [17]:
    create_event_returns_tear_sheet(factor_data, prices, avgretplot=(3, 11),
                                    long_short=False, group_neutral=False, by_group=False)
    
    <Figure size 432x288 with 0 Axes>