{"Description":"Experiment created on 2017/8/26","Summary":"","Graph":{"EdgesInternal":[{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-29:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data1","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-29:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-35:features","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-35:input_data","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-29:data"},{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data2","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-35:data"},{"DestinationInputPortId":"-316:input_1","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53:data"},{"DestinationInputPortId":"-316:input_2","SourceOutputPortId":"-179:data"}],"ModuleNodes":[
{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","ModuleId":"BigQuantSpace.instruments.instruments-v2","ModuleParameters":[{"Name":"start_date","Value":"2016-12-20","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"2017-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"market","Value":"CN_STOCK_A","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_list","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_count","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"rolling_conf","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":1,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","ModuleId":"BigQuantSpace.advanced_auto_labeler.advanced_auto_labeler-v2","ModuleParameters":[{"Name":"label_expr","Value":"# Lines starting with # are comments\n# 0. One expression per line, executed in order; from the second expression on, the label field can be used\n# 1. Available data fields: https://bigquant.com/docs/data_history_data.html\n#    Add the benchmark_ prefix to use the corresponding benchmark data\n# 2. Available operators and functions: `expression engine <https://bigquant.com/docs/big_expr.html>`_\n\n# Compute the return: close price 5 days ahead (the sell price) divided by the next day's open price (the buy price)\nshift(close, -5) / shift(open, -1)\n\n# Outlier handling: clip at the 1% and 99% quantiles\nclip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))\n\n# Map the score to classes; 20 classes are used here\nall_wbins(label, 20)\n\n# Filter out days locked at the limit-up price, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)\nwhere(shift(high, -1) == shift(low, -1), NaN, label)\n","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"benchmark","Value":"000300.SHA","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na_label","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"cast_label_int","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":2,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-24","ModuleId":"BigQuantSpace.input_features.input_features-v1","ModuleParameters":[{"Name":"features","Value":"# Lines starting with # are comments\n# Multiple features, one per line; basic and derived features are both allowed\npb_lf_0\nreturn_10\npb_lf_0*2\nlog10(market_cap_float_0+1)\nindustry_sw_level1_0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features_ds","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-24","OutputType":null}],"UsePreviousResults":false,"moduleIdForCode":3,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-29","ModuleId":"BigQuantSpace.general_feature_extractor.general_feature_extractor-v6","ModuleParameters":[{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"before_start_days","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-29"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-29"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-29","OutputType":null}],"UsePreviousResults":false,"moduleIdForCode":4,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-35","ModuleId":"BigQuantSpace.derived_feature_extractor.derived_feature_extractor-v2","ModuleParameters":[{"Name":"date_col","Value":"date","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_col","Value":"instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPor
tsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-35"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-35"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-35","OutputType":null}],"UsePreviousResults":false,"moduleIdForCode":5,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-53","ModuleId":"BigQuantSpace.join.join-v3","ModuleParameters":[{"Name":"on","Value":"date,instrument","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"how","Value":"inner","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"sort","Value":"False","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data1","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"data2","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-53","OutputType":null}],"UsePreviousResults":false,"moduleIdForCode":6,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"-316","ModuleId":"BigQuantSpace.cached.cached-v3","ModuleParameters":[{"Name":"run","Value":"import numpy as np\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.preprocessing import scale\n\n# Industry and market-cap neutralization module\ndef bigquant_run(input_1, input_2, input_3):\n\n    # 1. Load the feature values\n    df = input_1.read_df()\n    df = df[df['industry_sw_level1_0']>0].copy()  # drop stocks with no industry code\n    industry_List = df['industry_sw_level1_0'].unique()  # all industry codes\n    factors_all = df.columns  # all factor (column) names\n\n    # 2. Load the factors to neutralize against, usually industry and market cap\n    factor0 = input_2.read_pickle()\n\n    # 3. Factors that need cleaning and neutralizing\n    factors_need_cal = [k for k in set(factors_all)-set(factor0) if k!='date' and k!='instrument' and k[:2]!='m:' and k!='label']\n\n    # 4. Missing values: fill with the same-day mean of the stocks in the same level-1 industry\n    for fac in factors_need_cal:\n        fac_mean = df.groupby(['date','industry_sw_level1_0'])[fac].transform('mean')\n        df[fac] = df[fac].fillna(fac_mean)\n\n    # 5. Factor outlier handling\n    # Fixed-quantile method\n    #for fac in factors_need_cal:\n    #    df[fac][df[fac]>df[fac].quantile(0.99)]=df[fac].quantile(0.99)\n    #    df[fac][df[fac]<df[fac].quantile(0.01)]=df[fac].quantile(0.01)\n    # Mean and standard deviation method\n    #print(df[factors_need_cal].head())\n    #for fac in factors_need_cal:\n    #    df[fac][df[fac]>df[fac].mean()+3*df[fac].std()]=df[fac].mean()+3*df[fac].std()\n    #    df[fac][df[fac]<=df[fac].mean()-3*df[fac].std()]=df[fac].mean()-3*df[fac].std()\n    # MAD method\n    #print(df[factors_need_cal].head())\n    for fac in factors_need_cal:\n        median = df[fac].median()\n        MAD = (df[fac] - median).abs().mean()  # mean absolute deviation from the median\n        df[fac] = df[fac].clip(lower=median-6*MAD, upper=median+6*MAD)  # clip values more than 6 MAD away from the median\n\n    # Build the industry dummy variables\n    dfTmp = df.copy()  # work on a copy when adding the dummy columns\n    for n in range(len(industry_List)):  # one 0/1 column per industry\n        dfTmp['industry_%d' % n] = (df['industry_sw_level1_0']==industry_List[n]).astype(int)\n\n    # Prepare the linear regression inputs\n    model0 = LinearRegression()\n    # Regressors: the industry dummy columns plus the neutralizing factors, without the raw industry code column\n    X = dfTmp[['industry_%d' % n for n in range(len(industry_List))]+[f for f in factor0 if f!='industry_sw_level1_0']]\n\n    # Neutralize each feature against industry and market cap\n    for fac in factors_need_cal:\n        y = df[fac]  # factor exposure to neutralize\n        model0.fit(X, y)\n        df[fac] = y-model0.predict(X)  # residual of the exposure after regressing on the industry dummies and the neutralizing factors\n        #df[fac]=(df[fac]-np.mean(df[fac]))/np.std(df[fac])  # roughly equivalent to scale\n        df[fac] = scale(df[fac])  # z-score the residuals and write them back to df, which completes the neutralization\n\n    # Multicollinearity analysis\n    from sklearn.decomposition import PCA\n    import matplotlib.pyplot as plt\n    pca = PCA(n_components=len(factors_need_cal))\n    pca.fit(df[factors_need_cal])\n    var = pca.explained_variance_ratio_  # share of variance explained by each principal component\n    var1 = np.cumsum(np.round(pca.explained_variance_ratio_, decimals=4)*100)  # cumulative explained variance (%)\n    plt.plot(var1)\n    print(var)\n    data_1 = DataSource.write_df(df)\n    print(data_1)\n    return Outputs(data_1=data_1, data_2=None, data_3=None)\n","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_1","NodeId":"-316"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_2","NodeId":"-316"},{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"input_3","NodeId":"-316"}],"OutputPortsInternal":[{"Name":"data_1","NodeId":"-316","OutputType":null},{"Name":"data_2","NodeId":"-316","OutputType":null},{"Name":"data_3","NodeId":"-316","OutputType":null}],"UsePreviousResults":false,"moduleIdForCode":7,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},
{"Id":"-179","ModuleId":"BigQuantSpace.input_features.input_features-v1","ModuleParameters":[{"Name":"features","Value":"\n# Lines starting with # are comments\n# Multiple features, one per line; basic and derived features are both allowed\nlog10(market_cap_float_0+1)\nindustry_sw_level1_0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"features_ds","NodeId":"-179"}],"OutputPortsInternal":[{"Name":"data","NodeId":"-179","OutputType":null}],"UsePreviousResults":false,"moduleIdForCode":8,"IsPartOfPartialRun":null,"Comment":"Factors to neutralize against","CommentCollapsed":false}],
"SerializedClientData":"<?xml version='1.0' encoding='utf-16'?><DataV1 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><Meta /><NodePositions><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-8' Position='211,62.07228088378906,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-15' Position='70,183,200,200'/><NodePosition
Node='287d2cb0-f53c-4101-bdf8-104b137c8601-24' Position='702,11.072284698486328,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-29' Position='381,184.07228088378906,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-35' Position='385,278,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-53' Position='249,371.0722961425781,200,200'/><NodePosition Node='-316' Position='503,612.2891082763672,200,200'/><NodePosition Node='-179' Position='765,387,200,200'/></NodePositions><NodeGroups /></DataV1>"},"IsDraft":true,"ParentExperimentId":null,"WebService":{"IsWebServiceExperiment":false,"Inputs":[],"Outputs":[],"Parameters":[{"Name":"交易日期","Value":"","ParameterDefinition":{"Name":"交易日期","FriendlyName":"交易日期","DefaultValue":"","ParameterType":"String","HasDefaultValue":true,"IsOptional":true,"ParameterRules":[],"HasRules":false,"MarkupType":0,"CredentialDescriptor":null}}],"WebServiceGroupId":null,"SerializedClientData":"<?xml version='1.0' encoding='utf-16'?><DataV1 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><Meta /><NodePositions></NodePositions><NodeGroups /></DataV1>"},"DisableNodesUpdate":false,"Category":"user","Tags":[],"IsPartialRun":true}
[2018-04-16 19:53:42.278918] INFO: bigquant: instruments.v2 started..
[2018-04-16 19:53:42.287157] INFO: bigquant: cache hit
[2018-04-16 19:53:42.294609] INFO: bigquant: instruments.v2 finished [0.015699s].
[2018-04-16 19:53:42.321896] INFO: bigquant: advanced_auto_labeler.v2 started..
[2018-04-16 19:53:42.335150] INFO: bigquant: cache hit
[2018-04-16 19:53:42.346146] INFO: bigquant: advanced_auto_labeler.v2 finished [0.024241s].
[2018-04-16 19:53:42.358930] INFO: bigquant: input_features.v1 started..
[2018-04-16 19:53:42.375850] INFO: bigquant: input_features.v1 finished [0.016902s].
[2018-04-16 19:53:42.403558] INFO: bigquant: general_feature_extractor.v6 started..
[2018-04-16 19:53:44.115632] INFO: basic feature extraction: year 2016, feature rows=25291
[2018-04-16 19:53:48.213777] INFO: basic feature extraction: year 2017, feature rows=0
[2018-04-16 19:53:48.232906] INFO: basic feature extraction: total rows: 25291
[2018-04-16 19:53:48.237236] INFO: bigquant: general_feature_extractor.v6 finished [5.833682s].
[2018-04-16 19:53:48.252682] INFO: bigquant: derived_feature_extractor.v2 started..
[2018-04-16 19:53:48.344577] INFO: derived_feature_extractor: extracted log10(market_cap_float_0+1), 0.002s
[2018-04-16 19:53:48.347877] INFO: derived_feature_extractor: extracted pb_lf_0*2, 0.002s
[2018-04-16 19:53:48.420425] INFO: derived_feature_extractor: /y_2016, 25291
[2018-04-16 19:53:48.818625] INFO: bigquant: derived_feature_extractor.v2 finished [0.565954s].
[2018-04-16 19:53:48.834354] INFO: bigquant: join.v3 started..
[2018-04-16 19:53:48.965400] INFO: join: /y_2016, rows=10994/25291, elapsed=0.084517s
[2018-04-16 19:53:48.990888] INFO: join: final rows: 10994
[2018-04-16 19:53:48.995605] INFO: bigquant: join.v3 finished [0.161239s].
[2018-04-16 19:53:49.007517] INFO: bigquant: input_features.v1 started..
[2018-04-16 19:53:49.022138] INFO: bigquant: input_features.v1 finished [0.014603s].
[2018-04-16 19:53:49.039680] INFO: bigquant: cached.v3 started..
[ 5.02731332e-01 2.79213248e-01 2.18055420e-01 1.64450534e-32]
DataSource(d9079a7e416c11e8968b0242ac110013, v2_t2)
[2018-04-16 19:53:58.889870] INFO: bigquant: cached.v3 finished [9.850176s].
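
The residual-based neutralization that the `cached` module performs can be tried outside the platform with a few lines of NumPy. The block below is a minimal sketch under stated assumptions, not the module's code: it uses `numpy.linalg.lstsq` in place of scikit-learn's `LinearRegression`, the function name `neutralize` is hypothetical, and the industry codes and factor values are made-up toy data.

```python
import numpy as np

def neutralize(factor, industry, size):
    """Regress `factor` on industry dummies and `size`; return z-scored residuals."""
    y = np.asarray(factor, dtype=float)
    industry = np.asarray(industry)
    codes = np.unique(industry)
    # One 0/1 column per industry; the full dummy set also plays the role of the intercept.
    dummies = (industry[:, None] == codes[None, :]).astype(float)
    X = np.column_stack([dummies, np.asarray(size, dtype=float)])
    # Least-squares fit (assumption: stands in for sklearn's LinearRegression).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Z-score with the population standard deviation, as sklearn's scale() does.
    return (resid - resid.mean()) / resid.std()

# Toy data (hypothetical): 3 industries with 20 stocks each.
rng = np.random.default_rng(0)
industry = np.repeat([110000, 210000, 220000], 20)
size = rng.normal(10.0, 0.5, size=60)  # stands in for log10(market_cap_float_0+1)
raw_factor = 0.3 * (industry == 210000) + 2.0 * size + rng.normal(size=60)

neutral = neutralize(raw_factor, industry, size)
```

Because the residual is orthogonal to every regressor, `neutral` has zero mean within each industry and zero sample correlation with `size`, which is the property the neutralization step is after.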