AI Strategy Research for Beginners

(q83831295) #1

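1. Create a new visual AI stock-selection strategy

Instrument list (training)

By default, this contains every valid stock code in the China A-share market (CN_STOCK_A) from 2010-01-01 to 2015-01-01.
Trading market: CN_STOCK_A

Basic feature extraction (training)

By default, this covers data from 2010-01-01 to 2015-01-01.

Input feature list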

return_5
return_10
return_20
avg_amount_0/avg_amount_5
avg_amount_5/avg_amount_20
rank_avg_amount_0/rank_avg_amount_5
rank_avg_amount_5/rank_avg_amount_10
rank_return_0
rank_return_5
rank_return_10
rank_return_0/rank_return_5
rank_return_5/rank_return_10
pe_ttm_0

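Factor library

Instrument list (prediction)

By default, this contains every valid stock code in the China A-share market (CN_STOCK_A) from 2010-01-01 to 2017-01-01.

Basic feature extraction (prediction)

By default, this covers data from 2010-01-01 to 2017-01-01.

StockRanker training

The default machine learning algorithm is StockRanker.

Learning algorithm: ranking
Number of leaves: 30
Minimum samples per leaf: 1000
Number of trees: 20
Learning rate: 0.1
Number of feature bins: 1023
Feature fraction: 1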

https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514341613175

2. Modify the input features

Train with market capitalization as the only input feature.

Input feature list

market_cap_0

https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514345282928

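3. Modify the StockRanker parameters

StockRanker training

  • Change the learning algorithm
Learning algorithm: binary classification
Number of leaves: 30
Minimum samples per leaf: 1000
Number of trees: 20
Learning rate: 0.1
Number of feature bins: 1023
Feature fraction: 1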

https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514347096197

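  • Increase the number of leaves
Learning algorithm: ranking
Number of leaves: 60
Minimum samples per leaf: 1000
Number of trees: 20
Learning rate: 0.1
Number of feature bins: 1023
Feature fraction: 1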

https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514345784848

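  • Increase the number of trees
Learning algorithm: ranking
Number of leaves: 30
Minimum samples per leaf: 1000
Number of trees: 40
Learning rate: 0.1
Number of feature bins: 1023
Feature fraction: 1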

https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514346470040

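  • Increase the learning rate
Learning algorithm: ranking
Number of leaves: 30
Minimum samples per leaf: 1000
Number of trees: 20
Learning rate: 0.25
Number of feature bins: 1023
Feature fraction: 1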

https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514347089078
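
Each of the four experiments above changes exactly one setting relative to the defaults from step 1. A minimal sketch of tracking them as overrides on a baseline follows. The dictionary keys (`number_of_leaves`, `feature_bins`, and so on) and the helper names are hypothetical labels chosen for this illustration; they are not confirmed StockRanker parameter names.

```python
# Baseline settings from step 1. Key names are illustrative, not the platform's API.
BASELINE = {
    "learning_algorithm": "ranking",
    "number_of_leaves": 30,
    "min_samples_per_leaf": 1000,
    "number_of_trees": 20,
    "learning_rate": 0.1,
    "feature_bins": 1023,
    "feature_fraction": 1,
}

# Each experiment in step 3 overrides exactly one baseline value.
EXPERIMENTS = {
    "binary_classification": {"learning_algorithm": "binary classification"},
    "more_leaves": {"number_of_leaves": 60},
    "more_trees": {"number_of_trees": 40},
    "higher_learning_rate": {"learning_rate": 0.25},
}


def build_config(name):
    """Return the full parameter set for one named experiment."""
    overrides = EXPERIMENTS[name]
    unknown = set(overrides) - set(BASELINE)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**BASELINE, **overrides}


def changed_keys(config):
    """List the parameters whose values differ from the baseline."""
    return sorted(k for k, v in config.items() if BASELINE[k] != v)


for name in EXPERIMENTS:
    print(name, changed_keys(build_config(name)))
```

Keeping one change per run makes it easier to attribute a difference in backtest results to a single parameter.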

4. Change the model algorithm

Replace the algorithm used for both training and prediction with a random forest.
https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514342262293

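5. Adjust the training and prediction date ranges

Basic feature extraction (training)

By default, this uses the same date range as the instrument list (training).
Here it is changed to 2014-01-01 through 2015-12-31.

Basic feature extraction (prediction)

By default, this uses the same date range as the instrument list (prediction).
Here it is changed to 2016-01-01 through 2016-12-31.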
https://i.bigquant.com/user/q83831295/lab/share/userlib%2fai%e9%80%89%e8%82%a1%e7%ad%96%e7%95%a5.ipynb?_t=1514356227166
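
With the ranges used in step 5 (training from 2014-01-01 through 2015-12-31, prediction from 2016-01-01 through 2016-12-31), it can be worth checking that the two windows do not overlap, so that no prediction date was seen during training. The sketch below uses only the standard library; `DateRange` is a hypothetical helper written for this example, not part of the platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Inclusive date range; a hypothetical helper for this sketch."""
    start: date
    end: date

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def overlaps(self, other):
        """True when the two inclusive ranges share at least one day."""
        return self.start <= other.end and other.start <= self.end

    @property
    def days(self):
        """Number of calendar days in the range, counting both endpoints."""
        return (self.end - self.start).days + 1


train = DateRange(date.fromisoformat("2014-01-01"), date.fromisoformat("2015-12-31"))
predict = DateRange(date.fromisoformat("2016-01-01"), date.fromisoformat("2016-12-31"))

print("overlap:", train.overlaps(predict))
print("training days:", train.days, "prediction days:", predict.days)
```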

6. Interactive data exploration

Instrument list m1 (training)

m1_data = m1.data.read_pickle()  # read the data
type(m1_data)  # type of m1_data

dict

m1_data.keys()

dict_keys(['instruments', 'end_date', 'start_date'])

m1_data['start_date'], m1_data['end_date'], len(m1_data['instruments']), m1_data['instruments'][:5], m1_data['instruments'][-5:]

('2010-01-01',
 '2015-01-01',
 2617,
 ['000001.sza', '000002.sza', '000004.sza', '000005.sza', '000006.sza'],
 ['603993.sha', '603998.sha', '900935.sha', '900949.sha', '900950.sha'])
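
Since `m1_data` is a plain dict, ordinary Python tools are enough to summarize it. As a small sketch, the snippet below counts instrument codes by the suffix after the dot. It uses a stand-in dict holding only the ten codes printed above (the real list has 2617 entries); `m1_sample` is a hypothetical name for this illustration.

```python
from collections import Counter

# Stand-in for m1_data, holding only the ten codes printed above
# (the real instrument list has 2617 entries).
m1_sample = {
    "start_date": "2010-01-01",
    "end_date": "2015-01-01",
    "instruments": [
        "000001.sza", "000002.sza", "000004.sza", "000005.sza", "000006.sza",
        "603993.sha", "603998.sha", "900935.sha", "900949.sha", "900950.sha",
    ],
}

# Count codes per suffix (the part after the final dot).
suffix_counts = Counter(
    code.rsplit(".", 1)[-1] for code in m1_sample["instruments"]
)
print(suffix_counts)
```

In the notebook, the same expression can be applied to `m1_data['instruments']` directly.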

Automatic data labeling m2

m2_data = m2.data.read_df()
m2_data.head()
        m:high       m:open  instrument      m:close        date      m:amount        m:low  label
0   882.557983   880.403625  000001.sza   851.320190  2010-01-04  5.802495e+08   850.242981      5
1  1204.440674  1202.224609  000002.sza  1174.523560  2010-01-04  1.034345e+09  1174.523560      6
2    56.068981    55.698277  000005.sza    55.512924  2010-01-04  1.334784e+08    54.771515      7
3   125.116898   124.896423  000006.sza   122.581490  2010-01-04  7.054856e+07   122.471252      6
4    39.145252    39.145252  000007.sza    38.261860  2010-01-04  1.810142e+07    38.096226      9
m2_data.tail()
            m:high     m:open  instrument    m:close        date     m:amount      m:low  label
2624850  15.393583  15.393583  603766.sha  14.800723  2014-12-24  291056576.0  14.561498      4
2624851  39.380001  38.549999  603806.sha  39.240002  2014-12-24   75985936.0  38.520000      6
2624852  38.470001  35.000000  603988.sha  38.470001  2014-12-24  114484752.0  34.970001      5
2624853   9.342248   9.342248  603993.sha   8.984550  2014-12-24  212258688.0   8.700495      9
2624854  36.410000  34.450001  603998.sha  35.919998  2014-12-24  311016352.0  33.340000      9

Input feature list m3

m3_data = m3.data.read_pickle()
type(m3_data)

list

m3_data

['return_5',
 'return_10',
 'return_20',
 'avg_amount_0/avg_amount_5',
 'avg_amount_5/avg_amount_20',
 'rank_avg_amount_0/rank_avg_amount_5',
 'rank_avg_amount_5/rank_avg_amount_10',
 'rank_return_0',
 'rank_return_5',
 'rank_return_10',
 'rank_return_0/rank_return_5',
 'rank_return_5/rank_return_10',
 'pe_ttm_0']

Basic feature extraction m4 (training)

m4_data = m4.data.read_df()
m4_data.head()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_return_10 rank_return_5 return_10 return_20 return_5
0 5.802495e+08 679993792.0 618105024.0 2010-01-04 000001.sza 78.793991 0.971779 0.977118 0.977259 0.036196 0.086121 0.622607 0.945375 0.980157 1.043574
1 1.293477e+09 701774784.0 753136768.0 2010-01-05 000001.sza 77.431465 0.991400 0.980173 0.980911 0.090295 0.027881 0.464901 0.937626 0.955701 1.033259
2 9.444537e+08 708039744.0 798424320.0 2010-01-06 000001.sza 76.102173 0.984653 0.980793 0.980900 0.218942 0.012407 0.164612 0.934313 0.956159 0.983677
3 8.041663e+08 652998336.0 837255808.0 2010-01-07 000001.sza 75.271362 0.969883 0.981378 0.980283 0.751538 0.022967 0.203704 0.946511 0.895257 0.960153
4 6.506674e+08 633099392.0 799411712.0 2010-01-08 000001.sza 75.105194 0.978435 0.980637 0.981401 0.237477 0.031855 0.030998 0.959253 0.891167 0.926609
m4_data.tail()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_return_10 rank_return_5 return_10 return_20 return_5
569943 216834864.0 nan 199364688.0 2014-12-25 603998.sha 45.220268 0.796814 0.686182 0.734395 0.010331 0.969854 0.309944 1.226283 nan 0.926713
569944 130147696.0 nan 179246800.0 2014-12-26 603998.sha 44.779728 0.662936 0.712441 0.712010 0.208351 0.907878 0.682307 1.103801 nan 0.986583
569945 146767104.0 nan 177739136.0 2014-12-29 603998.sha 45.596024 0.638637 0.733707 0.714286 0.885196 0.810531 0.932240 1.021777 nan 1.065395
569946 325962560.0 nan 213011456.0 2014-12-30 603998.sha 48.537285 0.840462 0.771110 0.768110 0.982441 0.800257 0.993571 0.988912 nan 1.235896
569947 211115424.0 nan 223640672.0 2014-12-31 603998.sha 45.686722 0.794193 0.742281 0.779160 0.006846 0.370926 0.801029 0.891304 nan 1.057588
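
Apart from `date` and `instrument`, the 13 columns in the m4 output match the distinct names that appear in the feature expressions from m3. The sketch below shows one way to recover that set by pulling identifiers out of each expression. It assumes expressions contain only names and operators such as `/`, and it is not a description of how the platform itself parses them; `base_features` is a hypothetical helper.

```python
import re

FEATURES = [
    "return_5", "return_10", "return_20",
    "avg_amount_0/avg_amount_5", "avg_amount_5/avg_amount_20",
    "rank_avg_amount_0/rank_avg_amount_5", "rank_avg_amount_5/rank_avg_amount_10",
    "rank_return_0", "rank_return_5", "rank_return_10",
    "rank_return_0/rank_return_5", "rank_return_5/rank_return_10",
    "pe_ttm_0",
]

# An identifier is a letter or underscore followed by word characters.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def base_features(expressions):
    """Collect every distinct identifier used in the feature expressions."""
    names = set()
    for expr in expressions:
        names.update(IDENTIFIER.findall(expr))
    return sorted(names)


print(base_features(FEATURES))
```

The sorted result lists the same names, in the same order, as the feature columns of `m4_data`.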

Derived feature extraction m5 (training)

m5_data = m5.data.read_df()
m5_data.head()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_return_5 return_10 return_20 return_5 avg_amount_0/avg_amount_5 avg_amount_5/avg_amount_20 rank_avg_amount_0/rank_avg_amount_5 rank_avg_amount_5/rank_avg_amount_10 rank_return_0/rank_return_5 rank_return_5/rank_return_10
0 5.802495e+08 679993792.0 618105024.0 2010-01-04 000001.sza 78.793991 0.971779 0.977118 0.977259 0.036196 0.622607 0.945375 0.980157 1.043574 0.938755 0.908986 0.994393 1.000144 0.058137 7.229403
1 1.293477e+09 701774784.0 753136768.0 2010-01-05 000001.sza 77.431465 0.991400 0.980173 0.980911 0.090295 0.464901 0.937626 0.955701 1.033259 1.717453 1.073189 1.010693 1.000753 0.194224 16.674467
2 9.444537e+08 708039744.0 798424320.0 2010-01-06 000001.sza 76.102173 0.984653 0.980793 0.980900 0.218942 0.164612 0.934313 0.956159 0.983677 1.182897 1.127655 1.003827 1.000109 1.330053 13.267694
3 8.041663e+08 652998336.0 837255808.0 2010-01-07 000001.sza 75.271362 0.969883 0.981378 0.980283 0.751538 0.203704 0.946511 0.895257 0.960153 0.960479 1.282171 0.989391 0.998885 3.689366 8.869369
4 6.506674e+08 633099392.0 799411712.0 2010-01-08 000001.sza 75.105194 0.978435 0.980637 0.981401 0.237477 0.030998 0.959253 0.891167 0.926609 0.813933 1.262695 0.996978 1.000779 7.661002 0.973098
m5_data.tail()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_return_5 return_10 return_20 return_5 avg_amount_0/avg_amount_5 avg_amount_5/avg_amount_20 rank_avg_amount_0/rank_avg_amount_5 rank_avg_amount_5/rank_avg_amount_10 rank_return_0/rank_return_5 rank_return_5/rank_return_10
2642808 216834864.0 nan 199364688.0 2014-12-25 603998.sha 45.220268 0.796814 0.686182 0.734395 0.010331 0.309944 1.226283 nan 0.926713 1.087629 nan 1.084994 1.070264 0.033333 0.319578
2642809 130147696.0 nan 179246800.0 2014-12-26 603998.sha 44.779728 0.662936 0.712441 0.712010 0.208351 0.682307 1.103801 nan 0.986583 0.726081 nan 0.931076 0.999396 0.305363 0.751541
2642810 146767104.0 nan 177739136.0 2014-12-29 603998.sha 45.596024 0.638637 0.733707 0.714286 0.885196 0.932240 1.021777 nan 1.065395 0.825744 nan 0.894092 0.973529 0.949537 1.150160
2642811 325962560.0 nan 213011456.0 2014-12-30 603998.sha 48.537285 0.840462 0.771110 0.768110 0.982441 0.993571 0.988912 nan 1.235896 1.530258 nan 1.094195 0.996109 0.988799 1.241564
2642812 211115424.0 nan 223640672.0 2014-12-31 603998.sha 45.686722 0.794193 0.742281 0.779160 0.006846 0.801029 0.891304 nan 1.057588 0.943994 nan 1.019294 1.049682 0.008547 2.159538
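
The extra columns in m5 are the ratio expressions from the feature list, evaluated row by row. As a check, the sketch below recomputes four of them from the base values in the first row of the head output above. `evaluate_ratio` is a hypothetical helper that handles only the simple `a/b` form.

```python
# Base values copied from row 0 of m5_data.head() above.
row = {
    "avg_amount_0": 5.802495e+08,
    "avg_amount_5": 618105024.0,
    "avg_amount_20": 679993792.0,
    "rank_avg_amount_0": 0.971779,
    "rank_avg_amount_5": 0.977259,
    "rank_avg_amount_10": 0.977118,
}

DERIVED = [
    "avg_amount_0/avg_amount_5",
    "avg_amount_5/avg_amount_20",
    "rank_avg_amount_0/rank_avg_amount_5",
    "rank_avg_amount_5/rank_avg_amount_10",
]


def evaluate_ratio(expr, values):
    """Evaluate an 'a/b' expression against one row of base values."""
    numerator, denominator = expr.split("/")
    return values[numerator] / values[denominator]


derived = {expr: evaluate_ratio(expr, row) for expr in DERIVED}
for expr, value in derived.items():
    print(f"{expr} = {value:.6f}")
```

The results agree with the printed m5 values to within display rounding. A NaN base value, such as `avg_amount_20` in the tail rows, yields a NaN ratio, which matches the `nan` entries in the derived columns.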

Join data m7

m7_data = m7.data.read_df()
m7_data.head()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_avg_amount_0/rank_avg_amount_5 rank_avg_amount_5/rank_avg_amount_10 rank_return_0/rank_return_5 rank_return_5/rank_return_10 m:high m:open m:close m:amount m:low label
0 5.802495e+08 679993792.0 618105024.0 2010-01-04 000001.sza 78.793991 0.971779 0.977118 0.977259 0.036196 0.994393 1.000144 0.058137 7.229403 882.557983 880.403625 851.320190 5.802495e+08 850.242981 5
1 1.293477e+09 701774784.0 753136768.0 2010-01-05 000001.sza 77.431465 0.991400 0.980173 0.980911 0.090295 1.010693 1.000753 0.194224 16.674467 858.142212 852.756409 836.598877 1.293477e+09 816.850830 6
2 9.444537e+08 708039744.0 798424320.0 2010-01-06 000001.sza 76.102173 0.984653 0.980793 0.980900 0.218942 1.003827 1.000109 1.330053 13.267694 834.803589 834.803589 822.236694 9.444537e+08 815.773682 3
3 8.041663e+08 652998336.0 837255808.0 2010-01-07 000001.sza 75.271362 0.969883 0.981378 0.980283 0.751538 0.989391 0.998885 3.689366 8.869369 827.622498 822.236694 813.260315 8.041663e+08 804.283936 4
4 6.506674e+08 633099392.0 799411712.0 2010-01-08 000001.sza 75.105194 0.978435 0.980637 0.981401 0.237477 0.996978 1.000779 7.661002 0.973098 816.850830 807.874451 811.465027 6.506674e+08 802.488647 3
m7_data.tail()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_avg_amount_0/rank_avg_amount_5 rank_avg_amount_5/rank_avg_amount_10 rank_return_0/rank_return_5 rank_return_5/rank_return_10 m:high m:open m:close m:amount m:low label
555186 250855056.0 nan 188973760.0 2014-12-18 603998.sha 45.388710 0.781169 nan 0.656922 0.012038 1.189136 nan 0.012286 nan 37.689999 37.660000 35.029999 250855056.0 34.860001 8
555187 155813104.0 nan 214812672.0 2014-12-19 603998.sha 42.797291 0.615418 0.437931 0.699397 0.044358 0.879926 1.597049 0.053257 nan 34.700001 34.680000 33.029999 155813104.0 32.830002 11
555188 114328616.0 nan 233299184.0 2014-12-22 603998.sha 39.272961 0.478898 0.485345 0.719638 0.231697 0.665470 1.482736 1.013183 0.230972 32.799999 32.799999 30.309999 114328616.0 30.010000 17
555189 147340160.0 nan 256806592.0 2014-12-23 603998.sha 43.198959 0.699355 0.542599 0.765591 0.996129 0.913483 1.410971 2.814095 0.355971 33.340000 30.799999 33.340000 147340160.0 30.799999 14
555190 311016352.0 nan 210717616.0 2014-12-24 603998.sha 46.541893 0.846121 0.637344 0.729741 0.971121 1.159480 1.144973 2.678954 0.366131 36.410000 34.450001 35.919998 311016352.0 33.340000 9
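
The m7 output carries the feature columns together with the `m:` columns and `label` from m2, which suggests the two tables were matched on `date` and `instrument`. The sketch below illustrates an inner join on that key with plain Python. Both the key and the inner-join behavior are assumptions based on the printed output, and the two small tables are deliberately incomplete stand-ins built from values shown above.

```python
# Tiny stand-ins for the feature table (m5) and the label table (m2).
features = [
    {"date": "2010-01-04", "instrument": "000001.sza", "pe_ttm_0": 78.793991},
    {"date": "2010-01-05", "instrument": "000001.sza", "pe_ttm_0": 77.431465},
]
labels = [
    {"date": "2010-01-04", "instrument": "000001.sza", "label": 5},
    {"date": "2010-01-04", "instrument": "000002.sza", "label": 6},
]

KEY = ("date", "instrument")


def inner_join(left, right, key=KEY):
    """Keep only rows whose key appears in both tables, merging their columns."""
    index = {tuple(r[k] for k in key): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in key))
        if match is not None:
            joined.append({**row, **match})
    return joined


joined = inner_join(features, labels)
print(joined)
```

Only the row present in both stand-in tables survives. With DataFrames, `left.merge(right, on=["date", "instrument"], how="inner")` in pandas expresses the same operation.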

Missing-data handling m13 (training)

m13_data = m13.data.read_df()
m13_data.head()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_avg_amount_0/rank_avg_amount_5 rank_avg_amount_5/rank_avg_amount_10 rank_return_0/rank_return_5 rank_return_5/rank_return_10 m:high m:open m:close m:amount m:low label
0 5.802495e+08 679993792.0 618105024.0 2010-01-04 000001.sza 78.793991 0.971779 0.977118 0.977259 0.036196 0.994393 1.000144 0.058137 7.229403 882.557983 880.403625 851.320190 5.802495e+08 850.242981 5
1 1.293477e+09 701774784.0 753136768.0 2010-01-05 000001.sza 77.431465 0.991400 0.980173 0.980911 0.090295 1.010693 1.000753 0.194224 16.674467 858.142212 852.756409 836.598877 1.293477e+09 816.850830 6
2 9.444537e+08 708039744.0 798424320.0 2010-01-06 000001.sza 76.102173 0.984653 0.980793 0.980900 0.218942 1.003827 1.000109 1.330053 13.267694 834.803589 834.803589 822.236694 9.444537e+08 815.773682 3
3 8.041663e+08 652998336.0 837255808.0 2010-01-07 000001.sza 75.271362 0.969883 0.981378 0.980283 0.751538 0.989391 0.998885 3.689366 8.869369 827.622498 822.236694 813.260315 8.041663e+08 804.283936 4
4 6.506674e+08 633099392.0 799411712.0 2010-01-08 000001.sza 75.105194 0.978435 0.980637 0.981401 0.237477 0.996978 1.000779 7.661002 0.973098 816.850830 807.874451 811.465027 6.506674e+08 802.488647 3
m13_data.tail()
avg_amount_0 avg_amount_20 avg_amount_5 date instrument pe_ttm_0 rank_avg_amount_0 rank_avg_amount_10 rank_avg_amount_5 rank_return_0 rank_avg_amount_0/rank_avg_amount_5 rank_avg_amount_5/rank_avg_amount_10 rank_return_0/rank_return_5 rank_return_5/rank_return_10 m:high m:open m:close m:amount m:low label
555178 264997776.0 277396480.0 185800512.0 2014-12-18 603993.sha 27.061989 0.788908 0.693368 0.650043 0.876182 1.213624 0.937515 1.252844 2.049498 9.973481 9.542139 9.742029 264997776.0 9.457974 5
555179 252664112.0 285901504.0 201766832.0 2014-12-19 603993.sha 27.471132 0.754953 0.678879 0.681309 0.842808 1.108091 1.003579 1.067649 1.654469 10.047125 9.657865 9.889317 252664112.0 9.479015 6
555180 438923808.0 302409152.0 244854976.0 2014-12-22 603993.sha 28.903139 0.835487 0.703879 0.736434 0.972438 1.134503 1.046251 1.063589 1.086109 10.415344 9.868276 10.404824 438923808.0 9.742029 3
555181 279892576.0 307435264.0 268354448.0 2014-12-23 603993.sha 26.009903 0.825376 0.720310 0.775914 0.006452 1.063747 1.077195 0.007677 1.326304 10.383782 10.204933 9.363290 279892576.0 9.363290 7
555182 212258688.0 301725888.0 270177920.0 2014-12-24 603993.sha 24.957817 0.787069 0.728331 0.787931 0.042241 0.998906 1.081831 0.091846 0.973121 9.342248 9.342248 8.984550 212258688.0 8.700495 9
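
Comparing the two tails, the 603998.sha rows that contain `nan` in m7 no longer appear at the end of m13, which is consistent with this step dropping rows that have missing values. That reading is an assumption drawn from the output. The sketch below shows the idea on two rows, one taken from the m7 tail and one from the m13 tail, reduced to a few columns.

```python
import math

nan = float("nan")

# One row from the m7 tail (with a missing value) and one from the m13 tail.
rows = [
    {"date": "2014-12-24", "instrument": "603998.sha", "avg_amount_20": nan, "label": 9},
    {"date": "2014-12-24", "instrument": "603993.sha", "avg_amount_20": 301725888.0, "label": 9},
]


def has_missing(row):
    """True if any float value in the row is NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in row.values())


def drop_missing(table):
    """Return only the rows that contain no NaN values."""
    return [row for row in table if not has_missing(row)]


clean = drop_missing(rows)
print([r["instrument"] for r in clean])
```

On a DataFrame, `DataFrame.dropna()` in pandas performs the same row-wise filtering.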

StockRanker prediction m8

m8_predictions = m8.predictions.read_df()  # predicted stock ranking data
m8_predictions.head()
      score        date  instrument  position
0  2.104507  2015-01-05  300391.SZA         1
1  2.022820  2015-01-05  300038.SZA         2
2  1.935834  2015-01-05  300367.SZA         3
3  1.867642  2015-01-05  300302.SZA         4
4  1.701532  2015-01-05  300109.SZA         5
m8_predictions.tail()
             score        date  instrument  position
1202053  -0.394825  2016-12-30  600215.SHA      2784
1202054  -0.422984  2016-12-30  300567.SZA      2785
1202055  -0.424847  2016-12-30  000538.SZA      2786
1202056  -0.425943  2016-12-30  002346.SZA      2787
1202057  -0.436981  2016-12-30  002822.SZA      2788
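
In the output above, `position` appears to be the rank of `score` within each date: position 1 has the highest score on 2015-01-05, and position 2788 has the lowest shown on 2016-12-30. Under that reading, choosing the top candidates for a day means taking the highest-scoring rows for that date. A minimal sketch using the five head rows follows; `top_n_by_date` is a hypothetical helper.

```python
from collections import defaultdict

# The five head rows of m8_predictions shown above.
predictions = [
    {"score": 2.104507, "date": "2015-01-05", "instrument": "300391.SZA", "position": 1},
    {"score": 2.022820, "date": "2015-01-05", "instrument": "300038.SZA", "position": 2},
    {"score": 1.935834, "date": "2015-01-05", "instrument": "300367.SZA", "position": 3},
    {"score": 1.867642, "date": "2015-01-05", "instrument": "300302.SZA", "position": 4},
    {"score": 1.701532, "date": "2015-01-05", "instrument": "300109.SZA", "position": 5},
]


def top_n_by_date(rows, n):
    """Group rows by date and keep the n highest-scoring instruments per date."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)
    return {
        day: [
            r["instrument"]
            for r in sorted(group, key=lambda r: r["score"], reverse=True)[:n]
        ]
        for day, group in by_date.items()
    }


top3 = top_n_by_date(predictions, 3)
print(top3)
```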

https://i.bigquant.com/user/q83831295/lab/share/userlib%2F%E5%8F%AF%E8%A7%86%E5%8C%96%E7%AD%96%E7%95%A5%20-%20AI%E9%80%89%E8%82%A1%E7%AD%96%E7%95%A5.ipynb?_t=1514366944598