M.fast_auto_labeler

Definition

M.fast_auto_labeler.v8(self, instruments, start_date, end_date, label_expr, hold_days, post_label_map={}, buy_at='open', sell_at='close', benchmark=None, is_regression=False, atr_days=50, filter_price_limit=True, plot_charts=True)

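Supervised machine learning requires labeled data. fast_auto_labeler labels data automatically, based on quantities such as the return and volatility over a given number of future days. With suitable parameter settings it covers most labeling scenarios.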

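Parameters:
  • instruments (list of str) – list of stock codes; see Stock codes
  • start_date (str) – start date, e.g. '2017-01-01'
  • end_date (str) – end date, e.g. '2017-02-01'
  • label_expr (str | list of str) –

    Labeling expression. A list can be used to supply several expressions, which are evaluated in order. Only the integer part of the final result is kept. Available variables:

    • return: return
    • volatility: volatility
    • buy_price: buy price
    • sell_price: sell price
    • low_price: lowest price during the holding period
    • high_price: highest price during the holding period
    • atr: ATR
    • benchmark_return: benchmark return
    • benchmark_volatility: benchmark volatility

    e.g. ['100 * return / exp(0.6 * log(volatility))', 'where(label > {0}, {0}, where(label < -{0}, -{0}, label)) + {0}'.format(20)]; see the list of supported functions for more

  • hold_days (positive int) – holding period in days, used to compute the return and volatility
  • post_label_map (dict[int->int]) – optional mapping from key to value applied to the label after it has been truncated to an integer, e.g. {2: 20} changes every label of 2 to 20
  • buy_at ('open'|'close') – buy price, used to compute the return and related quantities
  • sell_at ('open'|'close') – sell price, used to compute the return and related quantities
  • benchmark (str) – benchmark; if given, the benchmark_* variables become available. 000300.SHA is a common choice. Default is None
  • is_regression (boolean) – whether the label is used to train a regression model. Default is False
  • atr_days (positive int) – window size used to compute the ATR indicator. Default is 50 trading days
  • filter_price_limit (boolean) – whether to filter out data for days on which the price stays locked at the limit-up or limit-down price. Default is True
  • plot_charts (boolean) – (optional) whether to plot the result data. Default is True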
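
The label_expr example and post_label_map can be read as a small pipeline: compute a score, clamp and shift it, keep the integer part, then remap selected labels. The NumPy sketch below illustrates that reading only; it is not the module's implementation. The function name apply_label_pipeline is hypothetical, the use of `label` to refer to the previous expression's result is inferred from the example, and "integer part" is assumed to mean truncation.

```python
import numpy as np

def apply_label_pipeline(ret, volatility, clip=20, post_label_map=None):
    """Hypothetical re-creation of the documented example; not BigQuant code."""
    ret = np.asarray(ret, dtype=float)
    volatility = np.asarray(volatility, dtype=float)  # assumed strictly positive

    # Expression 1: '100 * return / exp(0.6 * log(volatility))'
    label = 100 * ret / np.exp(0.6 * np.log(volatility))

    # Expression 2: clamp to [-clip, clip], then shift into [0, 2 * clip]
    label = np.where(label > clip, clip,
                     np.where(label < -clip, -clip, label)) + clip

    # Keep the integer part (assumption: truncation)
    label = label.astype(int)

    # Optional key -> value remapping of the final integer labels
    if post_label_map:
        label = np.array([post_label_map.get(int(v), int(v)) for v in label])
    return label

labels = apply_label_pipeline([0.5, -0.5, 0.0], [0.01, 0.01, 0.01],
                              post_label_map={0: 1})
print(labels.tolist())
```

Because the shift makes every score non-negative before truncation, truncation and flooring give the same result in this example.
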
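Returns:

Labeled data

  • .data: DataSource containing the labeled data
  • .label_counts: number of samples for each label
  • .plot_label_counts(): function that plots the distribution of label counts

Return type:

Outputs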

Example code

View the available module versions
In [2]:
M.fast_auto_labeler.m_get_version
Out[2]:
<bound method BigQuantModule.m_get_version of Module: fast_auto_labeler
Available versions (the latest is recommended): v8, v7>
Labeling with returns

The following code uses the return over the next 5 days as the label score.

Classification labels:
In [3]:
label_expr = [
    # Multiply the fractional return by 100 to express it in percent
    'return * 100',
    # where clamps the score to [-20, 20]; adding 20 shifts it to [0, 40]
    'where(label > {0}, {0}, where(label < -{0}, -{0}, label)) + {0}'.format(20)
]
m = M.fast_auto_labeler.v8(
    instruments=['000001.SZA', '600519.SHA'], start_date='2017-01-01', end_date='2017-02-01',
    label_expr=label_expr, hold_days=5,
    benchmark='000300.SHA', sell_at='open', buy_at='open')
[2017-11-14 14:13:56.766997] INFO: bigquant: fast_auto_labeler.v8 started running..
[2017-11-14 14:13:56.776737] INFO: bigquant: cache hit
[Chart: column chart titled "label"; x-axis: label, series: count]
[2017-11-14 14:13:56.791212] INFO: bigquant: fast_auto_labeler.v8 finished running [0.024201s].
In [4]:
m.label_counts
Out[4]:
[[18, 3], [19, 5], [20, 8], [21, 3], [22, 3], [23, 2]]
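
The [label, count] pairs above can be post-processed with plain Python, for example to check class balance before training. The sketch below copies the output shown above into a literal rather than calling the module; the variable names total and shares are illustrative only.

```python
# Output of m.label_counts copied from above: [label, count] pairs
label_counts = [[18, 3], [19, 5], [20, 8], [21, 3], [22, 3], [23, 2]]

# Total number of labeled samples
total = sum(count for _, count in label_counts)

# Share of each label in the sample
shares = {label: count / total for label, count in label_counts}

for label, share in shares.items():
    print(f"label {label}: {share:.1%}")
```
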
View the labeling results
In [5]:
m.data.read_df().head()
# The label column (type: integer) holds the final label score
Out[5]:
date instrument m:sell_price m:buy_price m:atr m:low_price m:high_price m:Return m:Return_f1 m:volatility m:not_available m:benchmark_Return m:benchmark_volatility label m:cannot_trading_f1
0 2017-01-03 000001.SZA 957.490417 958.538025 10.978668 954.347656 961.680786 -0.001093 1.000000 0.002513 False 0.004563 0.004797 19 False
1 2017-01-03 600519.SHA 2437.730957 2344.004395 40.904834 2343.864258 2520.249512 0.039986 1.051859 0.010368 False 0.004563 0.004797 23 False
2 2017-01-04 000001.SZA 956.442871 960.633179 10.894863 954.347656 961.680786 -0.004362 1.001092 0.002514 False -0.010585 0.004825 19 False
3 2017-01-04 600519.SHA 2427.573730 2451.740967 42.521582 2406.208496 2520.249512 -0.009857 0.985309 0.008693 False -0.010585 0.004825 19 False
4 2017-01-05 000001.SZA 957.490417 960.633179 10.769154 954.347656 960.633179 -0.003272 0.995638 0.001248 False -0.014521 0.004712 19 False
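
As a sanity check, the label column in the rows above can be reproduced from m:buy_price and m:sell_price. The sketch assumes that return is defined as sell_price / buy_price - 1, which matches the m:Return values shown, and that the integer part is taken by truncation. The helper label_from_prices is hypothetical and not part of the module.

```python
# (buy_price, sell_price, label) copied from the five rows shown above
rows = [
    (958.538025, 957.490417, 19),
    (2344.004395, 2437.730957, 23),
    (960.633179, 956.442871, 19),
    (2451.740967, 2427.573730, 19),
    (960.633179, 957.490417, 19),
]

def label_from_prices(buy_price, sell_price, clip=20):
    """Hypothetical helper mirroring the two label_expr steps used above."""
    ret = sell_price / buy_price - 1             # assumed definition of return
    score = ret * 100                            # 'return * 100'
    score = max(-clip, min(clip, score)) + clip  # clamp to [-20, 20], shift by 20
    return int(score)                            # keep the integer part

for buy, sell, expected in rows:
    print(label_from_prices(buy, sell), expected)
```
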
In [6]:
m.plot_label_counts()
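# Same chart as the one shown earlier
# Label distribution: the x-axis is the label score, the y-axis the number of samples with that score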
[Chart: column chart titled "label"; x-axis: label, series: count]
Regression labels:
In [8]:
label_expr = [
    # Multiply the fractional return by 10
    'return * 10',
    # where clamps the score to [-1, 1]; adding 1 shifts it to [0, 2]
    'where(label > {0}, {0}, where(label < -{0}, -{0}, label)) + {0}'.format(1)
]
m = M.fast_auto_labeler.v8(
    instruments=['000001.SZA', '600519.SHA'], start_date='2012-01-01', end_date='2017-02-01',
    label_expr=label_expr, hold_days=5,
    benchmark='000300.SHA', sell_at='open', buy_at='open', is_regression=True)
[2017-11-14 14:14:48.127734] INFO: bigquant: fast_auto_labeler.v8 started running..
[2017-11-14 14:14:48.334400] INFO: fast_auto_labeler: load history data: 3360 rows
[2017-11-14 14:14:48.370124] INFO: fast_auto_labeler: start labeling
[Chart: column chart titled "label"; x-axis: label (buckets from 0.0 to 2.0 in steps of 0.1), series: count]
[2017-11-14 14:14:48.739410] INFO: bigquant: fast_auto_labeler.v8 finished running [0.611664s].
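
The chart for this run has fractional buckets between 0.0 and 2.0, which suggests that with is_regression=True the score is not truncated to an integer. That is an inference from the output, not documented behavior. Under that assumption, the two expressions used here amount to the following NumPy sketch; regression_label is a hypothetical name.

```python
import numpy as np

def regression_label(ret, scale=10, clip=1):
    """Hypothetical equivalent of the regression label_expr above."""
    # 'return * 10'
    score = np.asarray(ret, dtype=float) * scale
    # Clamp to [-clip, clip], then shift into [0, 2 * clip];
    # no integer truncation (assumed for is_regression=True)
    return np.where(score > clip, clip,
                    np.where(score < -clip, -clip, score)) + clip

print(regression_label([0.5, -0.5, 0.0, 0.03]))
```

With scale=10 and clip=1, any 5-day return beyond +10% or -10% saturates at 2 or 0 respectively.
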