[Quant Academy] Custom Labeling


(iQuant) #1
Author: bigquant
Reading time: 15 minutes
This article is published by BigQuant Quant Academy. Difficulty: ☆☆☆☆

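Introduction: Despite its title, this article is really about how to label data flexibly so that the resulting machine learning model predicts better.

Before discussing labeling, let us briefly review classification and regression in machine learning.

Classification is a core problem in supervised learning. When the output variable Y takes a finite number of discrete values, the prediction problem is a classification problem. Supervised learning fits a classification model, called a classifier, to the data. The classifier then predicts the output for new inputs, and this process is called classification.

When Y takes a finite number of discrete values, we have a classification problem. What if Y is continuous? That is a regression problem, and a regression algorithm can handle it. In many cases, however, regression algorithms predict poorly. We can then label the continuous values, assigning Y to several classes, and solve the problem with a classification algorithm instead. Data labeling is common in image recognition, text analysis, and speech analysis, and the idea runs throughout machine learning. Labeling data with a set of discrete values is called classification labeling; labeling data with continuous values is called regression labeling.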
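The difference between the two kinds of labels can be sketched in a few lines. This is a minimal illustration using pandas, not the BigQuant expression engine; the return values are made-up numbers and the choice of four classes is an assumption for the example.

```python
import pandas as pd

# Hypothetical 5-day returns (in percent) for eight stocks; illustration only.
returns = pd.Series([-6.0, -2.5, -0.5, 0.0, 1.0, 2.0, 4.5, 9.0])

# Regression labeling: the label is the continuous value itself.
regression_label = returns.copy()

# Classification labeling: bucket the same values into 4 ordered classes
# with equal sample counts (quartiles), coded 0 (lowest) to 3 (highest).
classification_label = pd.qcut(returns, q=4, labels=False)

print(regression_label.tolist())
print(classification_label.tolist())
```

A classifier trained on `classification_label` only has to learn which bucket a stock falls into, which is often an easier target than the exact return.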

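Can we label stocks, combine the labels with stock features, and train a model with predictive power? This is exactly what many machine learning approaches to quantitative stock selection attempt. Because the labels directly affect how an AI strategy performs, they matter a great deal. The rest of this article explains in detail how to label stocks.

Points to keep in mind when labeling data:

  • Data labeling covers both classification labeling and regression labeling. Classification labeling divides the data into several distinguishable classes; regression labeling leaves the labels as continuous values. Classification labeling is the more common of the two.

  • Labels should match the prediction goal of the machine learning algorithm as closely as possible. If the goal is to predict stocks with high returns, base the label on stock returns; if the goal is to predict stocks with low volatility, base the label on stock volatility.

  • Labels should separate the data clearly, but not too finely. For example, stocks can be divided by return into five classes (high, above average, average, below average, low), after which a classification algorithm can be applied. If the classes are too fine, the algorithm may learn a lot of noise from the training set and generalize poorly.

  • In classification labeling, a class does not have to be a named category; it can be defined by numeric ranges. For example, "value >= 20" is a high-return stock, "15 <= value < 20" is an above-average stock, "10 <= value < 15" is an average stock, "5 <= value < 10" is a below-average stock, and "value < 5" is a low-return stock.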
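The threshold rule in the last point can be written as a small lookup. This is a sketch in plain Python; the function and constant names are hypothetical, and it assumes that a value lying exactly on a boundary (5, 10, 15, 20) belongs to the higher class.

```python
from bisect import bisect_right

# Lower bounds of every class above the lowest one, in ascending order.
THRESHOLDS = [5, 10, 15, 20]
CLASS_NAMES = ["low", "below average", "average", "above average", "high"]


def label_return(value):
    """Return (class_index, class_name) for a numeric return value."""
    # bisect_right counts how many thresholds are <= value,
    # which is exactly the index of the class the value falls into.
    index = bisect_right(THRESHOLDS, value)
    return index, CLASS_NAMES[index]


for v in (3, 5, 12.5, 20, 31):
    print(v, label_return(v))
```

Either the integer index or the name can serve as the label; most classifiers expect the integer.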

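Data labeling is as important as feature engineering; together they determine the predictive power of a machine learning algorithm. Merging the labels produced by data labeling with the factor data produced by feature engineering gives the training set, which is enough to train a model. Feeding the test-set factor data into the trained model yields predictions, and backtesting those predictions is how an AI strategy is developed. See the figure below: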
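The merge step can be sketched with pandas. The two tables, the column name factor_a, and the values are all hypothetical; only the idea of joining labels and factors on date and instrument comes from the text.

```python
import pandas as pd

# Hypothetical label table produced by the labeling step.
labels = pd.DataFrame({
    "date": ["2018-01-02", "2018-01-02", "2018-01-03"],
    "instrument": ["000001.SZA", "600000.SHA", "000001.SZA"],
    "label": [12, 7, 15],
})

# Hypothetical factor table produced by feature engineering.
features = pd.DataFrame({
    "date": ["2018-01-02", "2018-01-02", "2018-01-03", "2018-01-03"],
    "instrument": ["000001.SZA", "600000.SHA", "000001.SZA", "600000.SHA"],
    "factor_a": [0.8, 1.1, 0.9, 1.0],
})

# Inner join on (date, instrument): rows without a label are dropped,
# so every training row carries both its factors and its label.
train = features.merge(labels, on=["date", "instrument"], how="inner")
print(train)
```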

image

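BigQuant provides a dedicated labeling module so that you can label data efficiently and flexibly. This article lists a few examples of labeling; we hope they help you develop better AI strategies.

First, build the following workflow in a visual strategy.

As the figure shows, the m1 module (instrument list) specifies the stocks to label and the start and end dates, and the m2 module defines the label through the expression engine. Changing the labeling code in the m2 module gives different custom labels. Several variants are listed below: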

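1. Default: classification labeling by return

m2 module code example
# Compute return: close 5 days ahead (the sell price) divided by tomorrow's open (the buy price)
shift(close, -5) / shift(open, -1)

# Handle outliers: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Map the score to classes; 20 classes are used here
all_wbins(label, 20)

# Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)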
where(shift(high, -1) == shift(low, -1), NaN, label)
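For readers who want to see what these four steps do to the data, here is a rough pandas approximation. It is a sketch, not the platform's implementation: the function name default_label and the synthetic prices are hypothetical, and the exact bin edges of all_wbins may differ from those of pd.cut.

```python
import numpy as np
import pandas as pd


def default_label(df, horizon=5, bins=20):
    """Approximate the four labeling steps for a frame with columns
    date, instrument, open, high, low, close (sorted by date)."""
    g = df.groupby("instrument")
    # Step 1: future return = close `horizon` days ahead / tomorrow's open.
    label = g["close"].shift(-horizon) / g["open"].shift(-1)
    # Step 2: clip at the 1% and 99% quantiles of all samples.
    label = label.clip(label.quantile(0.01), label.quantile(0.99))
    # Step 3: equal-width bins over the clipped range, coded 0..bins-1.
    label = pd.cut(label, bins=bins, labels=False)
    # Step 4: blank out samples whose next day has high == low.
    locked = g["high"].shift(-1) == g["low"].shift(-1)
    return label.mask(locked)


# Synthetic random-walk prices for three made-up instruments.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", periods=30, freq="D")
frames = []
for name in ["A", "B", "C"]:
    close = 10 * np.exp(np.cumsum(rng.normal(0, 0.02, len(dates))))
    frames.append(pd.DataFrame({
        "date": dates, "instrument": name,
        "open": close * 0.99, "high": close * 1.01,
        "low": close * 0.98, "close": close,
    }))
prices = pd.concat(frames, ignore_index=True)
prices["label"] = default_label(prices)
print(prices["label"].value_counts().sort_index())
```

The last five rows of each instrument have no label because the close five days ahead is not yet known; that is the same reason the platform drops unlabeled samples.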

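Run m2.plot_label_counts() to view the labeling result:


This bar chart shows the distribution of labels over the whole training set. In this example returns are divided into 20 bins numbered 0 to 19, and the height of each bar is the number of samples in that bin.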

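Code walkthrough
  • To label by the return over a different number of future days, change the -5 in shift(close, -5).
  • clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99)) handles outliers using quantiles.
  • all_wbins(label, 20) divides returns into 20 equal-width classes: every class spans a return interval of the same size, but the sample counts may differ.
    all_cbins(label, 20) can be used instead to divide returns into 20 equal-frequency classes: every class holds the same number of samples, but the return intervals differ in size.
  • If no relative return is involved, there is no need to pass in a benchmark index; the default benchmark is 000300.SHA.
    image
  • Samples without a label are dropped by default.
  • Labels are converted to integers by default.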
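The contrast between equal-width and equal-frequency binning is easiest to see on a skewed sample. The sketch below uses pandas pd.cut and pd.qcut as stand-ins; they are analogous in spirit to all_wbins and all_cbins, but this is an assumption and not a statement about how the platform implements them.

```python
import pandas as pd

# Skewed toy sample: most values are small, one is very large.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# Equal-width bins: every bin covers an interval of the same size.
equal_width = pd.cut(values, bins=4, labels=False)

# Equal-frequency bins: every bin holds the same number of samples.
equal_freq = pd.qcut(values, q=4, labels=False)

print(equal_width.value_counts().sort_index().to_dict())
print(equal_freq.value_counts().sort_index().to_dict())
```

With equal-width bins a single outlier pushes almost every sample into the lowest class, which is one reason the examples in this article clip extreme values before binning.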

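2. Labeling by return value

m2 module code example
# Compute return: close 5 days ahead (the sell price) divided by tomorrow's open (the buy price)
shift(close, -5) / shift(open, -1)

# Handle outliers: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)
where(shift(high, -1) == shift(low, -1), NaN, label)

Also clear the "convert labels to integers" option, so that the label stays a continuous value.
image
The label distribution plot is as follows: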

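3. Labeling by relative return

m2 module code example
# Compute return: 5-day return relative to the benchmark
shift(close, -5) / shift(open, -1)-shift(benchmark_close, -5)/shift(benchmark_close, -1)

# Handle outliers: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Map the score to classes; 20 classes are used here
all_wbins(label, 20)

# Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)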
where(shift(high, -1) == shift(low, -1), NaN, label)
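The first expression above is simple arithmetic, which the following plain Python sketch spells out. The function name and the prices are hypothetical. Note that the stock leg starts from tomorrow's open while the benchmark leg starts from tomorrow's close, exactly as in the expression.

```python
def relative_return(sell_close, buy_open, bench_close_end, bench_close_start):
    """Stock price ratio minus benchmark price ratio over the same window."""
    stock_ratio = sell_close / buy_open
    bench_ratio = bench_close_end / bench_close_start
    return stock_ratio - bench_ratio


# Made-up numbers: the stock goes from 10.0 to 10.8 (ratio 1.08) while
# the benchmark goes from 4000 to 4120 (ratio 1.03).
excess = relative_return(10.8, 10.0, 4120.0, 4000.0)
print(round(excess, 4))
```

A positive value means the stock beat the benchmark over the window, so the resulting classes rank stocks by outperformance and not by raw return.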

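The label distribution plot is as follows:

4. Labeling by volatility-adjusted return

The label is the return adjusted for volatility (similar to the Sharpe ratio).

m2 module code example
# Compute return: 5-day volatility-adjusted return
shift(close, -5) / shift(open, -1)/ std(shift(close, -2) / shift(open, -1),5)**0.5

# Handle outliers: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Map the score to classes; 20 classes are used here
all_wbins(label, 20)

# Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)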
where(shift(high, -1) == shift(low, -1), NaN, label)
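As a point of comparison, a Sharpe-style adjustment is usually written as a return divided by a standard deviation of returns. The sketch below shows that generic form in pandas on made-up closing prices. It is not a reproduction of the expression above: the window choices are assumptions, and it divides by the standard deviation itself.

```python
import pandas as pd

# Made-up daily closing prices for one stock.
close = pd.Series([10.0, 10.2, 10.1, 10.4, 10.3, 10.6,
                   10.8, 10.7, 11.0, 10.9, 11.2, 11.1])

# Forward 5-day simple return, known only in hindsight (this is the label).
forward_return = close.shift(-5) / close - 1

# Trailing 5-day standard deviation of daily returns.
daily_return = close.pct_change()
trailing_vol = daily_return.rolling(5).std()

# Sharpe-style label: return per unit of recent volatility.
adjusted = forward_return / trailing_vol
print(adjusted.dropna())
```

Only the rows that have both five days of history and five days of future data receive a value; every other row is NaN and would be dropped as unlabeled.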

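The label distribution plot is as follows:

5. Labeling by return adjusted for average true range (ATR)

m2 module code example
# Compute return: 5-day return adjusted for average true range
shift(close, -5) / shift(open, -1)/ shift(ta_atr(high, low, close ,5),-5)

# Handle outliers: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Map the score to classes; 20 classes are used here
all_wbins(label, 20)

# Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)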
where(shift(high, -1) == shift(low, -1), NaN, label)
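For reference, average true range can be sketched as follows. This version averages the true range with a simple rolling mean; the platform's ta_atr may use a different smoothing (Wilder's smoothing is common), so treat the function below as a hypothetical illustration of the idea and not as its implementation.

```python
import pandas as pd


def average_true_range(high, low, close, period=5):
    """Simple-moving-average ATR over `period` bars."""
    prev_close = close.shift(1)
    # True range: the largest of the bar's own range and the two gaps
    # measured from the previous close.
    true_range = pd.concat([
        high - low,
        (high - prev_close).abs(),
        (low - prev_close).abs(),
    ], axis=1).max(axis=1)
    return true_range.rolling(period).mean()


# Tiny made-up example with a 2-bar window.
high = pd.Series([11.0, 12.0, 13.0])
low = pd.Series([9.0, 10.0, 12.0])
close = pd.Series([10.0, 11.0, 12.5])
atr = average_true_range(high, low, close, period=2)
print(atr.tolist())
```

Dividing the return by ATR penalizes stocks whose daily price swings are large relative to the gain, in the same spirit as the volatility adjustment in the previous section.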

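6. Labeling by return rank

m2 module code example
# Compute the rank of the 5-day return
rank(shift(close, -5) / shift(open, -1))

# Handle outliers: clip at the 1% and 99% quantiles
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)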
where(shift(high, -1) == shift(low, -1), NaN, label)
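A rank label of this kind can be sketched in pandas as a per-date percentile rank. This assumes that rank ranks each stock against all other stocks on the same date in ascending order and returns a fraction between 0 and 1; check the expression engine documentation for the exact definition. The data is made up.

```python
import pandas as pd

# Made-up 5-day price ratios for four stocks on two dates.
df = pd.DataFrame({
    "date": ["d1"] * 4 + ["d2"] * 4,
    "instrument": list("ABCD") * 2,
    "ret": [1.02, 0.97, 1.05, 1.00, 0.99, 1.01, 0.98, 1.03],
})

# Percentile rank within each date: the best stock of the day gets 1.0.
df["label"] = df.groupby("date")["ret"].rank(pct=True)
print(df)
```

Because the rank is computed date by date, the label measures how a stock did relative to its peers on that day, which removes the effect of the overall market level.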

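7. Labels computed by custom code

You can write code in a custom module to compute the label values yourself; by default the platform treats the column named label as the label column.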
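As a sketch of what such custom code might compute, the function below ranks each stock's return within its industry on each date and writes the result to a label column. The function name, the column names other than label, and the data are hypothetical; how the frame is passed into and out of a custom module on the platform is not shown here.

```python
import pandas as pd


def add_industry_rank_label(df):
    """Add a `label` column: percentile rank of `ret` within each
    (date, industry) group. Returns a new frame."""
    out = df.copy()
    out["label"] = out.groupby(["date", "industry"])["ret"].rank(pct=True)
    return out


# Made-up sample: one date, two industries.
sample = pd.DataFrame({
    "date": ["d1"] * 5,
    "instrument": list("ABCDE"),
    "industry": [1, 1, 2, 2, 2],
    "ret": [1.02, 0.97, 1.05, 1.00, 1.10],
})
labeled = add_industry_rank_label(sample)
print(labeled)
```

Ranking inside an industry compares each stock only with its direct peers, so a strong sector does not crowd out the top labels.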

    In [124]:
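    # This code was generated automatically by the visual strategy environment on 2019-01-24 19:06
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and then modify it.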
    
    
    m1 = M.instruments.v2(
        start_date='2018-01-01',
        end_date='2018-09-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m6 = M.input_features.v1(
        features="""
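    # Lines starting with # are comments; a comment must be on its own line
    # Multiple features, one per line; basic and derived features are both allowed, and all must be platform features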
    close_0
    open_0
    high_0
    low_0
    industry_sw_level1_0"""
    )
    
    m2 = M.general_feature_extractor.v7(
        instruments=m1.data,
        features=m6.data,
        start_date='',
        end_date='',
        before_start_days=90
    )
    
    m15 = M.auto_labeler_on_datasource.v1(
        input_data=m2.data,
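        label_expr="""# Lines starting with # are comments
    # 0. One expression per line, executed in order; from the second expression on, the label field can be used
    # 1. For available data fields, see https://bigquant.com/docs/data_history_data.html
    # 2. For available operators and functions, see `Expression Engine <https://bigquant.com/docs/big_expr.html>`_
    
    # Compute return: rank of the 5-day return (close 5 days ahead over tomorrow's open, minus 1) within each industry_sw_level1_0 group
    group_rank(industry_sw_level1_0,shift(close_0,-5)/shift(open_0,-1)-1)
    
    # Handle outliers: clip at the 1% and 99% quantiles
    clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
    
    # Map the score to classes; 20 classes are used here
    all_wbins(label, 20)
    
    # Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)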
    where(shift(high_0, -1) == shift(low_0, -1), NaN, label)
    """,
        drop_na_label=True,
        cast_label_int=True,
        date_col='date',
        instrument_col='instrument',
        user_functions={}
    )
    
    [2019-01-24 19:06:26.652761] INFO: bigquant: instruments.v2 started..
    [2019-01-24 19:06:26.659276] INFO: bigquant: cache hit
    [2019-01-24 19:06:26.660503] INFO: bigquant: instruments.v2 finished [0.007759s].
    [2019-01-24 19:06:26.663556] INFO: bigquant: input_features.v1 started..
    [2019-01-24 19:06:26.668660] INFO: bigquant: cache hit
    [2019-01-24 19:06:26.669799] INFO: bigquant: input_features.v1 finished [0.006249s].
    [2019-01-24 19:06:26.675387] INFO: bigquant: general_feature_extractor.v7 started..
    [2019-01-24 19:06:26.679937] INFO: bigquant: cache hit
    [2019-01-24 19:06:26.681620] INFO: bigquant: general_feature_extractor.v7 finished [0.00623s].
    [2019-01-24 19:06:26.684382] INFO: bigquant: auto_labeler_on_datasource.v1 started..
    [2019-01-24 19:06:27.004980] INFO: auto labeler (any data source): labeling started ..
    [2019-01-24 19:06:29.736701] INFO: bigquant: auto_labeler_on_datasource.v1 finished [3.05228s].
    
    In [125]:
    m15.plot_label_counts()
    

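    Summary: As shown above, there are many ways to label stock data, which leaves strategy researchers a great deal of room to experiment. Good labels combined with well-chosen features can directly determine the predictive power of an AI algorithm.


       This article is published by BigQuant Quant Academy. Copyright belongs to BigQuant; please credit the source when reposting.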
    


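    AI Quant Strategy Development, Step 2: Data Labeling
    (w890912y) #2

    Doesn't std() already give the standard deviation? Why is a square root taken after the function (std()**0.5)?


    (yangziriver) #3

    I cloned the strategy and it fails when I run it.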

      In [ ]:
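      # This code was generated automatically by the visual strategy environment on 2019-09-12 17:27
      # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and then modify it.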
      
      
      m1 = M.instruments.v2(
          start_date='2015-01-01',
          end_date='2018-09-01',
          market='CN_STOCK_A',
          instrument_list='',
          max_count=0
      )
      
      m6 = M.input_features.v1(
          features="""
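      # Lines starting with # are comments; a comment must be on its own line
      # Multiple features, one per line; basic and derived features are both allowed, and all must be platform features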
      close_0
      open_0
      high_0
      low_0
      industry_sw_level1_0"""
      )
      
      m2 = M.general_feature_extractor.v7(
          instruments=m1.data,
          features=m6.data,
          start_date='',
          end_date='',
          before_start_days=90
      )
      
      m15 = M.auto_labeler_on_datasource.v1(
          input_data=m2.data,
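          label_expr="""# Lines starting with # are comments
      # 0. One expression per line, executed in order; from the second expression on, the label field can be used
      # 1. For available data fields, see https://bigquant.com/docs/data_history_data.html
      # 2. For available operators and functions, see `Expression Engine <https://bigquant.com/docs/big_expr.html>`_
      
      # Compute return: rank of the 5-day return (close 5 days ahead over tomorrow's open, minus 1) within each industry_sw_level1_0 group
      group_rank(industry_sw_level1_0,shift(close_0,-5)/shift(open_0,-1)-1)
      
      # Handle outliers: clip at the 1% and 99% quantiles
      clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
      
      # Map the score to classes; 20 classes are used here
      all_wbins(label, 20)
      
      # Filter out one-price limit-up days, where high equals low (set label to NaN; NaN labels are ignored in later processing and training)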
      where(shift(high_0, -1) == shift(low_0, -1), NaN, label)
      """,
          drop_na_label=True,
          cast_label_int=True,
          date_col='date',
          instrument_col='instrument',
          user_functions={}
      )
      
      In [125]:
      m15.plot_label_counts()
      


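      Could you please take a look and tell me what is causing this?

      (yangziriver) #4

      I tried several more times. With the original date range, 2018-01-01 to 2018-09-01, it runs fine, but whenever the date range spans more than one calendar year this error occurs.


      (iQuant) #5

      Question received. It has been passed on to the strategy engineers, and we will reply as soon as possible.


      (PeterYashoi) #6

      I think he wrote it wrong.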