有关自动数据标注的使用问题?

可视化
用户成长系列
新手专区
标签: #<Tag:0x00007fcf692bf990> #<Tag:0x00007fcf692bf738> #<Tag:0x00007fcf692bf580>

(ceiwei) #1
在可视化策略中标注表达式里的参数是否就是所执行的语句

# 计算收益:5日收盘价(作为卖出价格)除以明日开盘价(作为买入价格)
shift(close, -5) / shift(open, -1)

# 极值处理:用1%和99%分位的值做clip
clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))

# 将分数映射到分类,这里使用20个分类
all_wbins(label, 20)

# 过滤掉一字涨停的情况 (设置label为NaN,在后续处理和训练中会忽略NaN的label)
where(shift(high, -1) == shift(low, -1), NaN, label)

**比如我修改shift(close, -5) / shift(open, -1)这个语句是否就是修改了表达式?也就是我这一句我可以用我认为合适的其他表达式代替?**
**另外all_wbins(label, 20)我不太看得明白,搜索了文档也没有这个函数的解释,请问具体是什么?谢谢!**

(luckychan) #2

all_wbins的用法可以参考表达式引擎,里面有注释


(大胡子) #3

是的,你可以修改标注语句,修改了就会通过不同的标注方式来对股票进行数据标注。
all_wbins(label,20)的目的是将标注语句确定的数值进行离散化,假设这里的标注语句是:shift(close, -5) / shift(open, -1),标注的含义是未来5天的收益率。
于是,将股票的收益率数据进行离散化,一共分成20个分类,最高数值为19,最小数值是0,数值高表明未来5天的收益率高。
你可以通过一只股票(000002)来详细了解下数据标注模块的使用。

克隆策略
In [5]:
m2.data.read_df().tail()
Out[5]:
m:close m:open instrument m:high m:amount m:low date label
1181 1432.700439 1480.785889 000002.SZA 1514.075806 2.874782e+09 1424.069702 2014-12-18 6
1182 1415.438965 1432.700439 000002.SZA 1433.933350 2.075060e+09 1378.450195 2014-12-19 11
1183 1445.030029 1417.904907 000002.SZA 1501.746216 3.266151e+09 1394.478638 2014-12-22 14
1184 1448.728882 1442.564087 000002.SZA 1504.212158 3.420036e+09 1399.410522 2014-12-23 15
1185 1359.955688 1445.030029 000002.SZA 1448.728882 3.056208e+09 1347.626099 2014-12-24 19

    {"Description":"实验创建于2017/8/26","Summary":"","Graph":{"EdgesInternal":[{"DestinationInputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15:instruments","SourceOutputPortId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8:data"}],"ModuleNodes":[{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","ModuleId":"BigQuantSpace.instruments.instruments-v2","ModuleParameters":[{"Name":"start_date","Value":"2010-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"2015-01-01","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"market","Value":"CN_STOCK_A","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"instrument_list","Value":"000002.SZA","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"max_count","Value":"0","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"rolling_conf","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-8","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":1,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true},{"Id":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","ModuleId":"BigQuantSpace.advanced_auto_labeler.advanced_auto_labeler-v2","ModuleParameters":[{"Name":"label_expr","Value":"# #号开始的表示注释\n# 0. 每行一个,顺序执行,从第二个开始,可以使用label字段\n# 1. 可用数据字段见 https://bigquant.com/docs/data_history_data.html\n# 添加benchmark_前缀,可使用对应的benchmark数据\n# 2. 可用操作符和函数见 `表达式引擎 <https://bigquant.com/docs/big_expr.html>`_\n\n# 计算收益:5日收盘价(作为卖出价格)除以明日开盘价(作为买入价格)\nshift(close, -5) / shift(open, -1)\n\n# 极值处理:用1%和99%分位的值做clip\nclip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))\n\n# 将分数映射到分类,这里使用20个分类\nall_wbins(label, 20)\n\n# 过滤掉一字涨停的情况 (设置label为NaN,在后续处理和训练中会忽略NaN的label)\nwhere(shift(high, -1) == shift(low, -1), NaN, label)\n","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"start_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"end_date","Value":"","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"benchmark","Value":"000300.SHA","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"drop_na_label","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"cast_label_int","Value":"True","ValueType":"Literal","LinkedGlobalParameter":null},{"Name":"user_functions","Value":"","ValueType":"Literal","LinkedGlobalParameter":null}],"InputPortsInternal":[{"DataSourceId":null,"TrainedModelId":null,"TransformModuleId":null,"Name":"instruments","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15"}],"OutputPortsInternal":[{"Name":"data","NodeId":"287d2cb0-f53c-4101-bdf8-104b137c8601-15","OutputType":null}],"UsePreviousResults":true,"moduleIdForCode":2,"IsPartOfPartialRun":null,"Comment":"","CommentCollapsed":true}],"SerializedClientData":"<?xml version='1.0' encoding='utf-16'?><DataV1 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><Meta /><NodePositions><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-8' Position='211,64,200,200'/><NodePosition Node='287d2cb0-f53c-4101-bdf8-104b137c8601-15' Position='446.16552734375,183,200,200'/></NodePositions><NodeGroups /></DataV1>"},"IsDraft":true,"ParentExperimentId":null,"WebService":{"IsWebServiceExperiment":false,"Inputs":[],"Outputs":[],"Parameters":[{"Name":"交易日期","Value":"","ParameterDefinition":{"Name":"交易日期","FriendlyName":"交易日期","DefaultValue":"","ParameterType":"String","HasDefaultValue":true,"IsOptional":true,"ParameterRules":[],"HasRules":false,"MarkupType":0,"CredentialDescriptor":null}}],"WebServiceGroupId":null,"SerializedClientData":"<?xml version='1.0' encoding='utf-16'?><DataV1 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><Meta /><NodePositions></NodePositions><NodeGroups /></DataV1>"},"DisableNodesUpdate":false,"Category":"user","Tags":[],"IsPartialRun":false}
    In [1]:
    # 本代码由可视化策略环境自动生成 2018年5月1日 23:05
    # 本代码单元只能在可视化模式下编辑。您也可以拷贝代码,粘贴到新建的代码单元或者策略,然后修改。
    
    
    m1 = M.instruments.v2(
        start_date='2010-01-01',
        end_date='2015-01-01',
        market='CN_STOCK_A',
        instrument_list='000002.SZA',
        max_count=0
    )
    
    m2 = M.advanced_auto_labeler.v2(
        instruments=m1.data,
        label_expr="""# #号开始的表示注释
    # 0. 每行一个,顺序执行,从第二个开始,可以使用label字段
    # 1. 可用数据字段见 https://bigquant.com/docs/data_history_data.html
    #   添加benchmark_前缀,可使用对应的benchmark数据
    # 2. 可用操作符和函数见 `表达式引擎 <https://bigquant.com/docs/big_expr.html>`_
    
    # 计算收益:5日收盘价(作为卖出价格)除以明日开盘价(作为买入价格)
    shift(close, -5) / shift(open, -1)
    
    # 极值处理:用1%和99%分位的值做clip
    clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))
    
    # 将分数映射到分类,这里使用20个分类
    all_wbins(label, 20)
    
    # 过滤掉一字涨停的情况 (设置label为NaN,在后续处理和训练中会忽略NaN的label)
    where(shift(high, -1) == shift(low, -1), NaN, label)
    """,
        start_date='',
        end_date='',
        benchmark='000300.SHA',
        drop_na_label=True,
        cast_label_int=True
    )
    
    [2018-05-01 23:03:35.978555] INFO: bigquant: instruments.v2 开始运行..
    [2018-05-01 23:03:35.988199] INFO: bigquant: instruments.v2 运行完成[0.009655s].
    [2018-05-01 23:03:36.049635] INFO: bigquant: advanced_auto_labeler.v2 开始运行..
    [2018-05-01 23:03:36.077704] INFO: 自动数据标注: 加载历史数据: 1192 行
    [2018-05-01 23:03:36.078870] INFO: 自动数据标注: 开始标注 ..
    [2018-05-01 23:03:36.347982] INFO: bigquant: advanced_auto_labeler.v2 运行完成[0.298319s].
    


    (yilong10) #4

    shift(close, -5) / shift(open, -1) 这样是否使用了未来函数,这样得到的数据有什么参考价值呢