Regression Tree Stock Selection Strategy - Development Experience and Lessons Learned (this version is the authoritative one)

Strategy Sharing

(lulalulaheiii) #1

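Hello everyone, I'm Barry, a university student who hopes to keep improving my ML quant skills.

While developing this regression tree strategy, I picked up some experience with tuning and optimizing the modules, but I also ran into a few problems that I find confusing and hard to explain. So I'd like to share the development experience and some takeaways from this strategy, in the hope that we can exchange ideas and learn from each other.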

    In [20]:
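    # This code was auto-generated by the visual strategy environment on 2020-09-06 17:25
    # This code cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.
    
    import math  # used by math.log below; the BigQuant notebook may already preload it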
    
    # Backtest engine: initialization function, runs only once
    def m13_initialize_bigquant_run(context):
        # Load the prediction data passed in through options; read_df loads it into memory as a DataFrame
        context.ranker_prediction = context.options['data'].read_df()
    
        # The system already sets a default commission and slippage; use the call below to override the commission
        context.set_commission(PerOrder(buy_cost=0.0003, sell_cost=0.0013, min_cost=5))
        # Number of stocks to buy: the top 5 of the ranked prediction list
        context.stock_count = 5
        # Per-stock weights; this scheme gives higher-ranked stocks more capital: [0.339160, 0.213986, 0.169580, ..]
        context.stock_weights = T.norm([1 / math.log(i + 2) for i in range(0, context.stock_count)])
        # Maximum fraction of capital that any single stock may occupy
        context.max_cash_per_instrument = 0.2
        context.options['hold_days'] = 5
    
    # Backtest engine: daily data handler, runs once per trading day
    def m13_handle_data_bigquant_run(context, data):
        #########
        # step 1: filter the predictions down to today's date
        #########
        
        # First day: only produce predictions, sell nothing
        if context.trading_day_index == 0:
        
            ranker_prediction = context.ranker_prediction[context.ranker_prediction.date == data.current_dt.strftime('%Y-%m-%d')]
            context.instruments = list(ranker_prediction.instrument[:context.stock_count])
        
        # Afterwards, every 5th trading day is a rebalancing day
        elif context.trading_day_index%5==0:
            
            # On a rebalancing day, first liquidate every held stock at the open price
            for instrument in context.instruments:
                # order_target: trade so that the final holding reaches the target amount (number of shares)
                # Here the target for context.symbol(instrument) is 0, i.e. sell the entire position in that stock
                context.order_target(context.symbol(instrument), 0)
            
            # Then pick the next batch of stocks from today's predictions
            ranker_prediction = context.ranker_prediction[context.ranker_prediction.date == data.current_dt.strftime('%Y-%m-%d')]
            context.instruments = list(ranker_prediction.instrument[:context.stock_count])
            
        # Then build positions every day according to the weights
    
        cash_avg = context.portfolio.portfolio_value / context.options['hold_days']
        
        last_day_staging = (context.trading_day_index+1)%5==0
        # During the build-up period, spend at most cash_avg per day; on the last build-up day, spend at most 1.5*cash_avg.
        # Spending can never exceed the cash left in the account, context.portfolio.cash
        cash_for_buy = min(context.portfolio.cash, (1.5 if last_day_staging else 1) * cash_avg)
        
        positions = {e.symbol: p.amount * p.last_sale_price
                     for e, p in context.perf_tracker.position_tracker.positions.items()}
        
        buy_cash_weights = context.stock_weights # [0.339160, 0.213986, 0.169580, ..]
        buy_instruments = context.instruments
        
        # With the initial capital of 1,000,000 this cap is 200,000
        max_cash_per_instrument = context.portfolio.portfolio_value * context.max_cash_per_instrument
        
        # Note: i is the day's position within the 5-day cycle, not the stock's rank,
        # so every stock receives the same weight on a given day
        i = context.trading_day_index%5
        for instrument in buy_instruments:
            # Cash allocated to buying this stock
            cash = cash_for_buy * buy_cash_weights[i]
    
            # Make sure the position never exceeds the per-stock capital cap.
            # positions.get(instrument, 0) is the value already held in this stock (0 if it has not been bought yet),
            # so max_cash_per_instrument - positions.get(instrument, 0) is the room left for buying this stock
            if cash > max_cash_per_instrument - positions.get(instrument, 0):
                cash = max_cash_per_instrument - positions.get(instrument, 0)
            
            if cash > 0:
                # Finally buy the stock with this much cash; cash > 0 means a long purchase, so short selling is not supported here
                context.order_value(context.symbol(instrument), cash)
            
    # Backtest engine: data preparation, runs only once
    def m13_prepare_bigquant_run(context):
        pass
    
    
    g = T.Graph({
    
        'm1': 'M.instruments.v2',
        'm1.start_date': '2013-07-31',
        'm1.end_date': '2018-07-30',
        'm1.market': 'CN_STOCK_A',
        'm1.instrument_list': '',
        'm1.max_count': 0,
    
        'm2': 'M.advanced_auto_labeler.v2',
        'm2.instruments': T.Graph.OutputPort('m1.data'),
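        'm2.label_expr': """# Lines starting with # are comments
    # 0. One expression per line, executed in order; from the second line on, the label field can be used
    # 1. Available data fields: https://bigquant.com/docs/data_history_data.html
    #   Add the benchmark_ prefix to use the corresponding benchmark data
    # 2. Available operators and functions: `expression engine <https://bigquant.com/docs/big_expr.html>`_
    
    # Compute the return: the open price 5 trading days ahead (sell price) divided by today's close (buy price)
    #shift(open, -5) / shift(close, 0) - shift(benchmark_open, -5) / shift(benchmark_close, 0)
    shift(open, -5) / shift(close,0)
    # Outlier handling: clip at the 1% and 99% quantiles
    #clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
    
    # Map the score to classes (disabled; the commented-out call would use 3 bins)
    #all_wbins(label, 3)
    
    # Filter out one-price limit-up days (set label to NaN; NaN labels are ignored in later processing and training)
    where(label>1.09,NaN,label)
    where(shift(high, -1) == shift(low, -1), NaN, label)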
    """,
        'm2.start_date': '',
        'm2.end_date': '',
        'm2.benchmark': '000001.SHA',
        'm2.drop_na_label': True,
        'm2.cast_label_int': False,
    
        'm5': 'M.input_features.v1',
        'm5.features': """# Lines starting with # are comments
    # Multiple features, one per line; both base features and derived features are allowed
    return_5
    rank_return_5
    avg_amount_5
    pe_ttm_0
    rank_pe_lyr_0
    rank_fs_net_profit_yoy_0
    rank_fs_roe_0
    rank_fs_eps_0
    
    """,
    
        'm3': 'M.general_feature_extractor.v7',
        'm3.instruments': T.Graph.OutputPort('m1.data'),
        'm3.features': T.Graph.OutputPort('m5.data'),
        'm3.start_date': '',
        'm3.end_date': '',
        'm3.before_start_days': 30,
    
        'm4': 'M.derived_feature_extractor.v3',
        'm4.input_data': T.Graph.OutputPort('m3.data'),
        'm4.features': T.Graph.OutputPort('m5.data'),
        'm4.date_col': 'date',
        'm4.instrument_col': 'instrument',
        'm4.drop_na': False,
        'm4.remove_extra_columns': False,
    
        'm7': 'M.join.v3',
        'm7.data1': T.Graph.OutputPort('m2.data'),
        'm7.data2': T.Graph.OutputPort('m4.data'),
        'm7.on': 'date,instrument',
        'm7.how': 'inner',
        'm7.sort': True,
    
        'm8': 'M.dropnan.v2',
        'm8.input_data': T.Graph.OutputPort('m7.data'),
    
        'm6': 'M.instruments.v2',
        'm6.start_date': T.live_run_param('trading_date', '2018-07-31'),
        'm6.end_date': T.live_run_param('trading_date', '2020-07-31'),
        'm6.market': 'CN_STOCK_A',
        'm6.instrument_list': '',
        'm6.max_count': 0,
    
        'm9': 'M.general_feature_extractor.v7',
        'm9.instruments': T.Graph.OutputPort('m6.data'),
        'm9.features': T.Graph.OutputPort('m5.data'),
        'm9.start_date': '',
        'm9.end_date': '',
        'm9.before_start_days': 60,
    
        'm10': 'M.derived_feature_extractor.v3',
        'm10.input_data': T.Graph.OutputPort('m9.data'),
        'm10.features': T.Graph.OutputPort('m5.data'),
        'm10.date_col': 'date',
        'm10.instrument_col': 'instrument',
        'm10.drop_na': False,
        'm10.remove_extra_columns': False,
    
        'm11': 'M.dropnan.v2',
        'm11.input_data': T.Graph.OutputPort('m10.data'),
    
        'm15': 'M.decision_tree_regressor.v1',
        'm15.training_ds': T.Graph.OutputPort('m8.data'),
        'm15.features': T.Graph.OutputPort('m5.data'),
        'm15.predict_ds': T.Graph.OutputPort('m11.data'),
        'm15.criterion': 'mse',
        'm15.feature_fraction': 0.95,
        'm15.max_depth': 8,
        'm15.min_samples_per_leaf': 245,
        'm15.key_cols': 'date,instrument',
        'm15.other_train_parameters': {},
    
        'm12': 'M.sort.v4',
        'm12.input_ds': T.Graph.OutputPort('m15.predictions'),
        'm12.sort_by': 'pred_label',
        'm12.group_by': 'date',
        'm12.keep_columns': '',
        'm12.ascending': False,
    
        'm13': 'M.trade.v4',
        'm13.instruments': T.Graph.OutputPort('m6.data'),
        'm13.options_data': T.Graph.OutputPort('m12.sorted_data'),
        'm13.start_date': '',
        'm13.end_date': '',
        'm13.initialize': m13_initialize_bigquant_run,
        'm13.handle_data': m13_handle_data_bigquant_run,
        'm13.prepare': m13_prepare_bigquant_run,
        'm13.volume_limit': 0.025,
        'm13.order_price_field_buy': 'close',
        'm13.order_price_field_sell': 'open',
        'm13.capital_base': 1000000,
        'm13.auto_cancel_non_tradable_orders': True,
        'm13.data_frequency': 'daily',
        'm13.price_type': '后复权',  # backward-adjusted prices
        'm13.product_type': '股票',  # stocks
        'm13.plot_charts': True,
        'm13.backtest_only': False,
        'm13.benchmark': '000001.SHA',
    })
    
    # g.run({})
    
    
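    def m14_param_grid_builder_bigquant_run():
        # Candidate values for each tunable parameter of the m15 regression tree.
        param_grid = {}
    
        # 0.80, 0.81, ..., 0.95. Rounding keeps float noise such as
        # 0.9500000000000001 out of the candidate list.
        fraction_list = [round(0.8 + 0.01 * i, 2) for i in range(16)]
        
        # Only criterion is searched in this run; uncomment a line below
        # to search that parameter instead.
        param_grid['m15.criterion'] = ['mse', 'mae', 'friedman_mse']
        #param_grid['m15.feature_fraction'] = fraction_list
        #param_grid['m15.max_depth'] = list(range(7, 22, 1))
        #param_grid['m15.min_samples_per_leaf'] = list(range(235, 255, 1))
    
        return param_grid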
    
    
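    def m14_scoring_bigquant_run(result):
        # Score a trial by the last Sharpe ratio recorded in its backtest.
        # The original post used 'm15' here (the regression tree), which raises the
        # AttributeError reported in reply #5. read_raw_perf() is presumably only
        # available on the backtest module's outputs, which is m13 in this graph.
        score = result.get('m13').read_raw_perf()['sharpe'].iloc[-1]
    
        return score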
    
    
    m14 = M.hyper_parameter_search.v1(
        param_grid_builder=m14_param_grid_builder_bigquant_run,
        scoring=m14_scoring_bigquant_run,
        search_algorithm='随机搜索',  # random search
        search_iterations=50,
        workers=1,
        worker_distributed_run=True,
        worker_silent=True,
        run_now=True,
        bq_graph=g
    )
    
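    • Return: 30.93%
    • Annualized return: 14.96%
    • Benchmark return: 15.37%
    • Alpha: 0.09
    • Beta: 0.47
    • Sharpe ratio: 0.86
    • Win rate: 0.59
    • Profit/loss ratio: 1.08
    • Return volatility: 13.91%
    • Information ratio: 0.02
    • Max drawdown: 10.2%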
    In [24]:
    print(m14.result.best_params_)
    print(m14.result.best_score_)
    
    {'m15.min_samples_per_leaf': 245, 'm15.feature_fraction': 0.9500000000000001}
    -inf
    
    In [16]:
    print(m14.result.best_params_)
    
    {'m15.criterion': 'mse'}
    

    Current best parameters: criterion = mse, min_samples_per_leaf = 245, max_depth = 8, feature_fraction = 0.95
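
Note that the `best_score_` printed above is `-inf`, which suggests that no trial produced a usable score. Below is a minimal sketch of what that can do to the reported "best" parameters. It assumes (this is not confirmed for `M.hyper_parameter_search`) that a failed trial is scored as `-inf` and that the earliest candidate wins a tie; `pick_best` and `broken_scoring` are hypothetical names.

```python
import math

def pick_best(candidates, scoring):
    """Return (best_candidate, best_score); a trial whose scoring raises counts as -inf."""
    best, best_score = None, -math.inf
    for cand in candidates:
        try:
            score = scoring(cand)
        except Exception:
            score = -math.inf  # assumption: failures are swallowed and scored as -inf
        # The first candidate seeds the result; strict '>' keeps the earliest one on ties.
        if best is None or score > best_score:
            best, best_score = cand, score
    return best, best_score

def broken_scoring(params):
    # Stand-in for a scoring function that fails on every trial.
    raise AttributeError("'Outputs' object has no attribute 'read_raw_perf'")

grid = [{'m15.max_depth': d} for d in range(4, 31)]
best_params, best_score = pick_best(grid, broken_scoring)
print(best_params, best_score)
```

If the platform behaves this way, a grid over max_depth = [4, ..., 30] would report max_depth = 4 simply because it comes first, not because it scored best.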

    In [11]:
    m15.feature_gains()
    
    Out[11]:
       feature                       gain
    0  return_5                  0.357418
    2  avg_amount_5              0.308070
    1  rank_return_5             0.151206
    3  pe_ttm_0                  0.138084
    4  rank_pe_lyr_0             0.033761
    7  rank_fs_eps_0             0.008346
    5  rank_fs_net_profit_yoy_0  0.002149
    6  rank_fs_roe_0             0.000967
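
One way to read the gain table is to add the gains up by factor family. The sketch below uses only the numbers printed above; the grouping into price/volume, valuation, and financial-statement factors is my own assumption based on the feature names, and `group_of` is a hypothetical helper.

```python
# Gains copied from the feature_gains() output.
gains = {
    'return_5': 0.357418,
    'avg_amount_5': 0.308070,
    'rank_return_5': 0.151206,
    'pe_ttm_0': 0.138084,
    'rank_pe_lyr_0': 0.033761,
    'rank_fs_eps_0': 0.008346,
    'rank_fs_net_profit_yoy_0': 0.002149,
    'rank_fs_roe_0': 0.000967,
}

def group_of(name):
    """Assign a feature to a factor family by its name (assumed naming convention)."""
    if '_fs_' in name:
        return 'financial statement'
    if name.startswith('pe_') or '_pe_' in name:
        return 'valuation'
    return 'price/volume'

total = sum(gains.values())
shares = {}
for name, gain in gains.items():
    family = group_of(name)
    shares[family] = shares.get(family, 0.0) + gain / total

for family, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f'{family}: {share:.1%}')
```

With the numbers from the post, the three `rank_fs_*` factors together account for roughly 1% of the total gain.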
    

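    First, the configuration of each module in the strategy:

    • Training set: 2013/07/31 - 2018/07/30. Test set: 2018/07/31 - 2020/07/30.

      I chose a fairly long span of market data for training and testing because I think it reduces, to some extent, the influence of short-term market swings (during the pandemic, for example, even a very good algorithm could hardly push annual returns beyond single digits).

      I have seen strategies on the forum whose developers backtested over only one quarter or one month. Such a strategy may look very good in the short run, but once the window is stretched (to a year, say), the performance of many of them drops off sharply.

    • Feature selection: [return_5, rank_return_5, avg_amount_5, pe_ttm_0, rank_pe_lyr_0, rank_fs_net_profit_yoy_0, rank_fs_roe_0, rank_fs_eps_0].

      I did not use any fancy factors as features. I mostly chose them to match the "5-day rolling trade" logic of the backtest module, and added a few common financial-statement and valuation factors.

    • Labeling: where(label>1.09,NaN,label)

      The strategy did not run well at first. While inspecting modules m1 and m2, I found that some companies had absurdly large 5-day returns (1500%, for instance), which defies common sense. My guess is that these companies happened to go through a financing round or a merger during those five days, so the share price shot up.

      But such companies are not necessarily good companies. So I use where(label>1.09,NaN,label) to filter out the companies whose 5-day return exceeds 1.09. Now the largest return_5 in the training set no longer exceeds 1.09, and the backtest performance visibly moved up a notch.

    • Machine learning module: criterion = mse. It is worth mentioning that, in my experience with regression trees, mse is the one to use rather than mae or friedman_mse. Whenever I built regression trees before, no matter how the other modules were written, mse always performed better than mae and friedman_mse. I do not know why. The other parameters came from a random hyperparameter search, which gave [criterion='mse', min_samples_per_leaf=245, max_depth=8, feature_fraction=0.95].

    • Backtest module: the rolling trade method that ships with the BigQuant platform.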
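
The label filter in the list above can be sketched in plain Python. This is an illustration only: `filter_labels` and `drop_nan_labels` are hypothetical helpers, the sample rows are made up, and the sketch assumes that the label is a price ratio (so 1.09 means +9%) and that rows with a NaN label are dropped before training.

```python
import math

def filter_labels(rows, cap=1.09):
    """Mimic where(label > cap, NaN, label): labels above the cap become NaN."""
    return [
        {**row, 'label': math.nan if row['label'] > cap else row['label']}
        for row in rows
    ]

def drop_nan_labels(rows):
    """Drop rows whose label is NaN so they never reach the training step."""
    return [row for row in rows if not math.isnan(row['label'])]

# Made-up sample: row C has a ratio of 16.0, i.e. a 1500% gain of the kind described above.
sample = [
    {'instrument': 'A', 'label': 1.02},
    {'instrument': 'B', 'label': 0.97},
    {'instrument': 'C', 'label': 16.0},
    {'instrument': 'D', 'label': 1.09},
]

train_rows = drop_nan_labels(filter_labels(sample))
print([row['instrument'] for row in train_rows])
```

Because the comparison is strict, a label of exactly 1.09 is kept; only values above the cap are removed.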

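    That is roughly what I used in each module, along with my tuning experience. Next, I would like to share a few things that confuse me:

    • My confusion is mainly about the m14 hyperparameter search module. My usual approach is a random search first, followed by local tuning around its result with a grid search, in the hope of finding better parameters. Yet every time I feed the result of the local (grid) search into the learning model, the backtest comes out worse, so the parameters from the random search are basically what end up in the learning module. That is the first problem.

    • I also found that when the parameter to tune is numeric, e.g. max_depth = [4,5,6,…,29,30] (as opposed to a categorical one like criterion), the grid search always returns the smallest allowed value, e.g. max_depth=4. If I dared to change the list to [1,2,3,4…], would it hand me back "max_depth = 1"?

      This result is clearly illogical, because with a training set this large a tree of only 4 levels cannot possibly be enough. And if I really do put 4 into the max_depth field, the backtest is terrible.

    • The third problem is the choice of score function in the hyperparameter search. Logically, shouldn't the objective be 'algorithm_period_return'? In other words, shouldn't the final backtest return be the score?

      But with that as the objective, however the tree structure is optimized, the backtest never exceeds 10% annualized (2% or 3% is already good, and it is often negative). With 'sharpe', the Sharpe ratio of the backtest, as the objective, the optimized result jumps up a level: it is basically always above 10% annualized.

      This strikes me as odd. Why does setting the objective to 'sharpe' end up performing better than setting it to 'return'? And is the relationship between the Sharpe ratio and the strategy's final return really that strong?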
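
To make the third question concrete, here is a minimal sketch that computes both candidate objectives from a list of daily returns. The formulas are the textbook ones (annualized with 252 trading days, risk-free rate taken as zero); BigQuant's 'sharpe' and 'algorithm_period_return' fields may be defined slightly differently, and the two return series are made up.

```python
import math
import statistics

def total_return(daily_returns):
    """Compound the daily returns into one period return."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + r
    return value - 1.0

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Mean daily return over daily volatility, annualized (risk-free rate assumed 0)."""
    mean = statistics.mean(daily_returns)
    std = statistics.stdev(daily_returns)
    return mean / std * math.sqrt(periods_per_year)

# Two made-up strategies over the same 8 days.
steady = [0.002, 0.001, 0.003, 0.002, 0.001, 0.002, 0.003, 0.002]
jumpy = [0.05, -0.04, 0.06, -0.05, 0.04, -0.03, 0.05, -0.04]

for name, series in (('steady', steady), ('jumpy', jumpy)):
    print(name, round(total_return(series), 4), round(sharpe_ratio(series), 2))
```

In this toy case the jumpy series has the higher total return but the far lower Sharpe ratio, so the two objectives would rank the same pair of parameter sets in opposite order.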

    Those are my questions. I hope anyone who reads this post will be willing to discuss them.
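
On the first question (random search followed by a local grid search), it may help to see how large the full search space would be if all four lines in `m14_param_grid_builder_bigquant_run` were active at once, compared with the 50 iterations given to the random search. Only values that appear in the posted code are used; the variable names are my own.

```python
import math

# Candidate lists copied from the param grid builder in the post.
param_grid = {
    'm15.criterion': ['mse', 'mae', 'friedman_mse'],
    'm15.feature_fraction': [round(0.8 + 0.01 * i, 2) for i in range(16)],
    'm15.max_depth': list(range(7, 22, 1)),
    'm15.min_samples_per_leaf': list(range(235, 255, 1)),
}

# A full grid search visits every combination once.
grid_size = math.prod(len(values) for values in param_grid.values())

# search_iterations=50 in the m14 call.
random_iterations = 50
coverage = random_iterations / grid_size

print(grid_size, f'{coverage:.2%}')
```

Under these settings the random search samples well under 1% of the combinations, so two searches can easily land on different "best" points.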

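    (Finally, a bit of self-promotion, haha. I am still a university student, but I am close to graduating, so I am hoping to find an internship in quant / Fintech / data analysis. If you have an opening, feel free to email 118020044@link.cuhk.edu.cn. Thank you very much!)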


    (yangziriver) #2

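    From my limited experience, you first need a good feature combination; tuning is only meaningful on top of that. If the feature combination does not capture the character of the market in that period, no amount of tuning will help. Your strategy may have too many financial-statement factors, which could hurt its performance. A feature combination that performs well under many conditions takes a great deal of time and effort to find. My guess is that using the Sharpe ratio as the objective avoids overfitting, so the strategy's return improves, whereas using 'return' as the objective leads to overfitting, so performance does not improve: the feature combination cannot uncover the market's real regularities, and after training the model overfits.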


    (royshu) #3

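    Take a look at which stocks you are actually picking and what the long-run excess return is. You are tuning parameters blindly without understanding the basics. What is the use? For a strategy that really works, the default template is enough. This makes no sense.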


    (神龙斗士) #4

    Keep it up!!


    (redmoon10) #5

    I get this error: ERROR: moduleinvoker: module name: hyper_parameter_search, module version: v1, trackeback: Traceback (most recent call last):
    AttributeError: 'Outputs' object has no attribute 'read_raw_perf'

    AttributeError Traceback (most recent call last)
    in ()
    256 worker_silent=True,
    257 run_now=True,
    --> 258 bq_graph=g
    259 )

    in m14_scoring_bigquant_run(result)
    242
    243 def m14_scoring_bigquant_run(result):
    --> 244 score = result.get('m15').read_raw_perf()['sharpe'].tail(1)[0]
    245
    246 return score

    AttributeError: 'Outputs' object has no attribute 'read_raw_perf'
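
The traceback says that the object returned by result.get('m15') has no read_raw_perf method. Below is a minimal sketch of a scoring helper that fails with a clearer message, under the assumption that only the backtest module's outputs expose read_raw_perf(). `score_from_backtest`, `FakeBacktestOutputs`, and `FakeTreeOutputs` are hypothetical stand-ins so that the sketch runs outside the platform.

```python
class FakeBacktestOutputs:
    """Stand-in for a backtest module's outputs (hypothetical, for illustration)."""
    def read_raw_perf(self):
        # The real method returns a table; a dict of lists is enough for this sketch.
        return {'sharpe': [0.10, 0.55, 0.86]}

class FakeTreeOutputs:
    """Stand-in for a model module's outputs, which have no read_raw_perf."""

def score_from_backtest(result, module_id, metric='sharpe'):
    """Return the last value of `metric` from the backtest stored under module_id."""
    outputs = result.get(module_id)
    if outputs is None or not hasattr(outputs, 'read_raw_perf'):
        raise ValueError(
            f"{module_id!r} has no backtest performance; "
            "pass the id of the backtest module instead"
        )
    return outputs.read_raw_perf()[metric][-1]

result = {'m13': FakeBacktestOutputs(), 'm15': FakeTreeOutputs()}
print(score_from_backtest(result, 'm13'))
```

In the graph from the first post the backtest is m13 (`M.trade.v4`) and m15 is the regression tree, so pointing the scoring function at 'm13' is the likely fix.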