基于因子分组的快速回测工具

用户成长系列
标签: #<Tag:0x00007fc4d27ebd48>

(iQuant) #1

本文介绍如何基于机器学习模型的预测结果,将预测值作为因子进行快速分组回测。

一、背景介绍

分组回测是检验单因子有效性的一种常见方式,通过不同分组的股票收益率表现可以查看因子对股票分层的有效性。同样的道理,我们可以通过把训练集模型预测值作为一个因子进行分组快速回测,通过统计分组回测的结果来对一个机器学习模型效果进行评价。也可以把预测集模型预测值通过因子分组回测的方式快速评价,提高模型回测效率和灵活性。

二、案例介绍

这里给出了一个快速回测的例子,用的是stockranker分类模型,不管使用什么机器学习模型,无论是xgboost,还是随机森林,还是GBDT,或者是官网提供的stockranker模型,我们本质得到是验证集数据的预测值,此时我们把预测值作为因子进行快速验证。对于样本内的训练效果我们可以把训练集数据作为预测集数据得到样本内的预测值来考察。

我们这里介绍对验证集数据的预测结果进行因子分组测试。对于预测结果首先查看其预测值列名,以stockranker算法为例,得到的每日的预测数据排序列名为position或score,这里我们根据每天每只股票这个position的值进行分组,设置好调仓周期rabalance_period和手续费率buy_commission_rate和sell_commission_rate,然后我们可以勾选分组的方式,默认分5组,选择long_port_is_big_factor时,表示第5组对应的是预测值position最大的一组;否则,表示第1组对应的是预测值position最大的一组。我们通过“因子分组快速回测”这个模块填入相关参数后就可以进行快速分析。

克隆策略

    {"Description":"实验创建于2017/8/26","Summary":"","Graph":{"EdgesInternal":[{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"},{"DestinationInputPortId":"-215:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data1","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:data"},{"DestinationInputPortId":"-215:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-222:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-231:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-238:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"-107:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-84:input_data","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data"},{"DestinationInputPortId":"-220:input_1","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60:predictions"},{"DestinationInputPortId":"-231:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62:data"},{"DestinationInputPortId":"-220:input_2","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62:data"},{"DestinationInputPortId":"-107:training_ds","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-84:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60:data","SourceOutputPortId":"-86:data"},{"DestinationInputPortId":"-222:input_data","SourceOutputPortId":"-215:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data2","SourceOutputPortId":"-222:data"},{"DestinationInputPortId":"-238:input_data","SourceOutputPortId":"-231:data"},{"DestinationInputPortId":"-86:input_data","SourceOutputPortId":"-238:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60:model","SourceOutputPortId":"-107:model"}],"ModuleNodes":[{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","ModuleId":"BigQuantSpace.instruments.instruments-v2","ModuleParameters":[{"Name":"start_date","Value":"2010-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"2015-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"market","Value":"CN_STOCK_A","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_list","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_count","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"rolling_conf","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":1,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","ModuleId":"BigQuantSpace.advanced_auto_labeler.advanced_auto_labeler-v2","ModuleParameters":[{"Name":"label_expr","Value":"# #号开始的表示注释\n# 0. 每行一个,顺序执行,从第二个开始,可以使用label字段\n# 1. 可用数据字段见 https://bigquant.com/docs/develop/datasource/deprecated/history_data.html\n# 添加benchmark_前缀,可使用对应的benchmark数据\n# 2. 可用操作符和函数见 `表达式引擎 <https://bigquant.com/docs/develop/bigexpr/usage.html>`_\n\n# 计算收益:5日收盘价(作为卖出价格)除以明日开盘价(作为买入价格)\nshift(close, -5) / shift(open, -1)\n\n# 极值处理:用1%和99%分位的值做clip\nclip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))\n\n# 将分数映射到分类,这里使用20个分类\nall_wbins(label, 20)\n\n# 过滤掉一字涨停的情况 (设置label为NaN,在后续处理和训练中会忽略NaN的label)\nwhere(shift(high, -1) == shift(low, -1), NaN, label)\n","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"benchmark","Value":"000300.SHA","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na_label","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"cast_label_int","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":2,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-24","ModuleId":"BigQuantSpace.input_features.input_features-v1","ModuleParameters":[{"Name":"features","Value":"# #号开始的表示注释\n# 多个特征,每行一个,可以包含基础特征和衍生特征\nreturn_5\nreturn_10\nreturn_20\navg_amount_0/avg_amount_5\navg_amount_5/avg_amount_20\nrank_avg_amount_0/rank_avg_amount_5\nrank_avg_amount_5/rank_avg_amount_10\nrank_return_0\nrank_return_5\nrank_return_10\nrank_return_0/rank_return_5\nrank_return_5/rank_return_10\npe_ttm_0\n","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features_ds","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":3,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-53","ModuleId":"BigQuantSpace.join.join-v3","ModuleParameters":[{"Name":"on","Value":"date,instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"how","Value":"inner","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"sort","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data1","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data2","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":7,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-60","ModuleId":"BigQuantSpace.stock_ranker_predict.stock_ranker_predict-v5","ModuleParameters":[{"Name":"m_lazy_run","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"model","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60"}],"OutputPortsInternal":[{"Name":"predictions","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60","OutputType":null},{"Name":"m_lazy_run","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-60","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":8,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-62","ModuleId":"BigQuantSpace.instruments.instruments-v2","ModuleParameters":[{"Name":"start_date","Value":"2015-01-01","ValueType":"Literal","LinkedGlobalParameter":"交易日期"},{"Name":"end_date","Value":"2017-01-01","ValueType":"Literal","LinkedGlobalParameter":"交易日期"},{"Name":"market","Value":"CN_STOCK_A","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_list","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_count","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"rolling_conf","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-62","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":9,"IsPartOfPartialRun":null,"Comment":"预测数据,用于回测和模拟","CommentCollapsed":false},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-84","ModuleId":"BigQuantSpace.dropnan.dropnan-v1","ModuleParameters":[],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-84"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-84","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":13,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-86","ModuleId":"BigQuantSpace.dropnan.dropnan-v1","ModuleParameters":[],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_data","NodeId":"-86"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-86","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":14,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-215","ModuleId":"BigQuantSpace.general_feature_extractor.general_feature_extractor-v7","ModuleParameters":[{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"before_start_days","Value":0,"ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"-215"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-215"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-215","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":15,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-222","ModuleId":"BigQuantSpace.derived_feature_extractor.derived_feature_extractor-v3","ModuleParameters":[{"Name":"date_col","Value":"date","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_col","Value":"instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"remove_extra_columns","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_data","NodeId":"-222"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-222"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-222","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":16,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-231","ModuleId":"BigQuantSpace.general_feature_extractor.general_feature_extractor-v7","ModuleParameters":[{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"before_start_days","Value":0,"ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"-231"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-231"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-231","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":17,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-238","ModuleId":"BigQuantSpace.derived_feature_extractor.derived_feature_extractor-v3","ModuleParameters":[{"Name":"date_col","Value":"date","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_col","Value":"instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"remove_extra_columns","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_data","NodeId":"-238"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-238"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-238","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":18,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-107","ModuleId":"BigQuantSpace.stock_ranker_train.stock_ranker_train-v6","ModuleParameters":[{"Name":"learning_algorithm","Value":"排序","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"number_of_leaves","Value":30,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"minimum_docs_per_leaf","Value":1000,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"number_of_trees","Value":20,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"learning_rate","Value":0.1,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_bins","Value":1023,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"feature_fraction","Value":1,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"data_row_fraction","Value":1,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"ndcg_discount_base","Value":1,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"m_lazy_run","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"training_ds","NodeId":"-107"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-107"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"test_ds","NodeId":"-107"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"base_model","NodeId":"-107"}],"OutputPortsInternal":[{"Name":"model","NodeId":"-107","OutputType":null},{"Name":"feature_gains","NodeId":"-107","OutputType":null},{"Name":"m_lazy_run","NodeId":"-107","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":5,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-220","ModuleId":"BigQuantSpace.factor_group__fast_backtest.factor_group__fast_backtest-v1","ModuleParameters":[{"Name":"rabalance_period","Value":21,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"long_port_is_big_factor","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"buy_commission_rate","Value":"0.0003","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"stamp_tax","Value":0.001,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"sell_commission_rate","Value":0.0003,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"rename_factor","Value":"position","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_1","NodeId":"-220"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_2","NodeId":"-220"}],"OutputPortsInternal":[],"UsePreviousResults":true,"moduleIdForCode":4,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"-101","ModuleId":"BigQuantSpace.factorlens.factorlens-v1","ModuleParameters":[{"Name":"title","Value":"因子分析: {factor_name}","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"start_date","Value":"2019-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"2019-12-31","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"rebalance_period","Value":22,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"stock_pool","Value":"全市场","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"quantile_count","Value":5,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"commission_rate","Value":0.0016,"ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_price_limit_stocks","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_st_stocks","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_new_stocks","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"normalization","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"neutralization","Value":"%7B%22enumItems%22%3A%5B%7B%22value%22%3A%22%E8%A1%8C%E4%B8%9A%22%2C%22displayValue%22%3A%22%E8%A1%8C%E4%B8%9A%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%B8%82%E5%80%BC%22%2C%22displayValue%22%3A%22%E5%B8%82%E5%80%BC%22%2C%22selected%22%3Atrue%7D%5D%7D","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"metrics","Value":"%7B%22enumItems%22%3A%5B%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E8%A1%A8%E7%8E%B0%E6%A6%82%E8%A7%88%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E8%A1%A8%E7%8E%B0%E6%A6%82%E8%A7%88%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E5%88%86%E5%B8%83%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E5%88%86%E5%B8%83%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E8%A1%8C%E4%B8%9A%E5%88%86%E5%B8%83%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E8%A1%8C%E4%B8%9A%E5%88%86%E5%B8%83%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E5%B8%82%E5%80%BC%E5%88%86%E5%B8%83%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E5%B8%82%E5%80%BC%E5%88%86%E5%B8%83%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22IC%E5%88%86%E6%9E%90%22%2C%22displayValue%22%3A%22IC%E5%88%86%E6%9E%90%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E4%B9%B0%E5%85%A5%E4%BF%A1%E5%8F%B7%E9%87%8D%E5%90%88%E5%88%86%E6%9E%90%22%2C%22displayValue%22%3A%22%E4%B9%B0%E5%85%A5%E4%BF%A1%E5%8F%B7%E9%87%8D%E5%90%88%E5%88%86%E6%9E%90%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E4%BC%B0%E5%80%BC%E5%88%86%E6%9E%90%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E4%BC%B0%E5%80%BC%E5%88%86%E6%9E%90%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E6%8B%A5%E6%8C%A4%E5%BA%A6%E5%88%86%E6%9E%90%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E6%8B%A5%E6%8C%A4%E5%BA%A6%E5%88%86%E6%9E%90%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%9B%A0%E5%AD%90%E5%80%BC%E6%9C%80%E5%A4%A7%2F%E6%9C%80%E5%B0%8F%E8%82%A1%E7%A5%A8%22%2C%22displayValue%22%3A%22%E5%9B%A0%E5%AD%90%E5%80%BC%E6%9C%80%E5%A4%A7%2F%E6%9C%80%E5%B0%8F%E8%82%A1%E7%A5%A8%22%2C%22selected%22%3Atrue%7D%2C%7B%22value%22%3A%22%E5%A4%9A%E5%9B%A0%E5%AD%90%E7%9B%B8%E5%85%B3%E6%80%A7%E5%88%86%E6%9E%90%22%2C%22displayValue%22%3A%22%E5%A4%9A%E5%9B%A0%E5%AD%90%E7%9B%B8%E5%85%B3%E6%80%A7%E5%88%86%E6%9E%90%22%2C%22selected%22%3Atrue%7D%5D%7D","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"-101"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"user_factor_data","NodeId":"-101"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-101","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":6,"Comment":"","CommentCollapsed":true}],"SerializedClientData":"<?xml version='1.0' encoding='utf-16'?><DataV1 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><Meta /><NodePositions><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-8' Position='211,64,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-15' Position='70,183,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-24' Position='765,21,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-53' Position='249,375,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-60' Position='660,681,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-62' Position='1115,54,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-84' Position='376,467,200,200'/><NodePosition Node='-86' Position='1078,418,200,200'/><NodePosition Node='-215' Position='381,188,200,200'/><NodePosition Node='-222' Position='385,280,200,200'/><NodePosition Node='-231' Position='1078,236,200,200'/><NodePosition Node='-238' Position='1081,327,200,200'/><NodePosition Node='-107' Position='638,561,200,200'/><NodePosition Node='-220' Position='777,785,200,200'/><NodePosition Node='-101' Position='-22.323974609375,537.644287109375,200,200'/></NodePositions><NodeGroups /></DataV1>"},"IsDraft":true,"ParentExperimentId":null,"WebService":{"IsWebServiceExperiment":false,"Inputs":[],"Outputs":[],"Parameters":[{"Name":"交易日期","Value":"","ParameterDefinition":{"Name":"交易日期","FriendlyName":"交易日期","DefaultValue":"","ParameterType":"String","HasDefaultValue":true,"IsOptional":true,"ParameterRules":[],"HasRules":false,"MarkupType":0,"CredentialDescriptor":null}}],"WebServiceGroupId":null,"SerializedClientData":"<?xml version='1.0' encoding='utf-16'?><DataV1 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><Meta /><NodePositions></NodePositions><NodeGroups /></DataV1>"},"DisableNodesUpdate":false,"Category":"user","Tags":[],"IsPartialRun":true}
    In [3]:
    # 本代码由可视化策略环境自动生成 2020年3月18日 11:07
    # 本代码单元只能在可视化模式下编辑。您也可以拷贝代码,粘贴到新建的代码单元或者策略,然后修改。
    
    
    m1 = M.instruments.v2(
        start_date='2010-01-01',
        end_date='2015-01-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m2 = M.advanced_auto_labeler.v2(
        instruments=m1.data,
        label_expr="""# #号开始的表示注释
    # 0. 每行一个,顺序执行,从第二个开始,可以使用label字段
    # 1. 可用数据字段见 https://bigquant.com/docs/develop/datasource/deprecated/history_data.html
    #   添加benchmark_前缀,可使用对应的benchmark数据
    # 2. 可用操作符和函数见 `表达式引擎 <https://bigquant.com/docs/develop/bigexpr/usage.html>`_
    
    # 计算收益:5日收盘价(作为卖出价格)除以明日开盘价(作为买入价格)
    shift(close, -5) / shift(open, -1)
    
    # 极值处理:用1%和99%分位的值做clip
    clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
    
    # 将分数映射到分类,这里使用20个分类
    all_wbins(label, 20)
    
    # 过滤掉一字涨停的情况 (设置label为NaN,在后续处理和训练中会忽略NaN的label)
    where(shift(high, -1) == shift(low, -1), NaN, label)
    """,
        start_date='',
        end_date='',
        benchmark='000300.SHA',
        drop_na_label=True,
        cast_label_int=True
    )
    
    m3 = M.input_features.v1(
        features="""# #号开始的表示注释
    # 多个特征,每行一个,可以包含基础特征和衍生特征
    return_5
    return_10
    return_20
    avg_amount_0/avg_amount_5
    avg_amount_5/avg_amount_20
    rank_avg_amount_0/rank_avg_amount_5
    rank_avg_amount_5/rank_avg_amount_10
    rank_return_0
    rank_return_5
    rank_return_10
    rank_return_0/rank_return_5
    rank_return_5/rank_return_10
    pe_ttm_0
    """
    )
    
    m15 = M.general_feature_extractor.v7(
        instruments=m1.data,
        features=m3.data,
        start_date='',
        end_date='',
        before_start_days=0
    )
    
    m16 = M.derived_feature_extractor.v3(
        input_data=m15.data,
        features=m3.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=False,
        remove_extra_columns=False
    )
    
    m7 = M.join.v3(
        data1=m2.data,
        data2=m16.data,
        on='date,instrument',
        how='inner',
        sort=False
    )
    
    m13 = M.dropnan.v1(
        input_data=m7.data
    )
    
    m5 = M.stock_ranker_train.v6(
        training_ds=m13.data,
        features=m3.data,
        learning_algorithm='排序',
        number_of_leaves=30,
        minimum_docs_per_leaf=1000,
        number_of_trees=20,
        learning_rate=0.1,
        max_bins=1023,
        feature_fraction=1,
        data_row_fraction=1,
        ndcg_discount_base=1,
        m_lazy_run=False
    )
    
    m9 = M.instruments.v2(
        start_date=T.live_run_param('trading_date', '2015-01-01'),
        end_date=T.live_run_param('trading_date', '2017-01-01'),
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m17 = M.general_feature_extractor.v7(
        instruments=m9.data,
        features=m3.data,
        start_date='',
        end_date='',
        before_start_days=0
    )
    
    m18 = M.derived_feature_extractor.v3(
        input_data=m17.data,
        features=m3.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=False,
        remove_extra_columns=False
    )
    
    m14 = M.dropnan.v1(
        input_data=m18.data
    )
    
    m8 = M.stock_ranker_predict.v5(
        model=m5.model,
        data=m14.data,
        m_lazy_run=False
    )
    
    m4 = M.factor_group__fast_backtest.v1(
        input_1=m8.predictions,
        input_2=m9.data,
        rabalance_period=21,
        long_port_is_big_factor=True,
        buy_commission_rate=0.0003,
        stamp_tax=0.001,
        sell_commission_rate=0.0003,
        rename_factor='position'
    )
    
    m6 = M.factorlens.v1(
        title='因子分析: {factor_name}',
        start_date='2019-01-01',
        end_date='2019-12-31',
        rebalance_period=22,
        stock_pool='全市场',
        quantile_count=5,
        commission_rate=0.0016,
        drop_price_limit_stocks=True,
        drop_st_stocks=True,
        drop_new_stocks=True,
        normalization=True,
        neutralization=['行业', '市值'],
        metrics=['因子表现概览', '因子分布', '因子行业分布', '因子市值分布', 'IC分析', '买入信号重合分析', '因子估值分析', '因子拥挤度分析', '因子值最大/最小股票', '多因子相关性分析']
    )
    
    设置测试数据集,查看训练迭代过程的NDCG
    bigcharts-data-start/{"__type":"tabs","__id":"bigchart-7e93798199cb4ccd806d379979a22b89"}/bigcharts-data-end

    三、结果解读

    我们通过分组测试,可以看到预测值作为因子分组后的各组收益率具有典型的分层效果,说明我们的模型对股票具有一定的区分能力也就是预测能力。
    同时,我们通过观察第一组曲线和最后一组曲线的相对净值走势能判断出模型的多头组合/空头组合预测效果的增强/减弱趋势,以及模型对多头/空头组合各自的判断能力哪个占据主导。
    此外根据多头组合超额收益的分布,可以判断出模型的超额收益来源的集中程度或稳健性。通过多空收益率和净值可以观察模型多头/空头组合对冲收益表现,作为模型之间比对的参考指标。

    小结:通过因子分组快速回测模块,可以将预测值作为因子进行快速分组回测,帮助我们对模型进行快速评价,为我们模型比对/模型监控/模型改进提供支持。我们还可以把预测结果传给因子分析模块获取更全面的回测结果。


    (公孙睿) #2

    老师,请问 这里根据position值进行分组,具体是怎么分?
    是先排序(大–>小)后,取一定量的股票为一组,这样吗?比如排序后为1-100,则取1-5,6-10,11-15,16-20,21-25,这五组?


    (polll) #3

    每日全市场所有股票按照得分等股票数量分组


    (公孙睿) #4

    不好意思,还是没看明白,能举个例子吗?比如3600只股票,怎么分?分成几组?


    (polll) #5

    你需要用单因子分析模块连到预测结果而不是这个帖子中的模块!并指定单因子分析模块属性栏中的组数,分60组不就是每组6只股票么,按因子排序前6名不就是第0组么,单因子分析模块画的就是各组股票的收益率。你去搜索“因子看板”相关的教程帖子就知道了。


    (公孙睿) #6

    明白了,谢谢老师!