Some factor computations fail when testing with 282 factors

Beginners
Q&A

(qci133) #1

Some factor computations fail when testing with 282 factors. First it reported an exception computing ta_rsi; after commenting that one out, it failed again saying the sqrt computation failed. What is the cause?

Exception:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-cc710da306be> in <module>()
     17 m4 = M.join.v2(data1=m1.data, data2=m3.data, on=['date', 'instrument'], sort=True)
     18 # StockRanker training
---> 19 m5 = M.stock_ranker_train.v3(training_ds=m4.data, features=conf.features)

KeyError: 'sqrt'
Clone Strategy

Example 1: using the price-to-book ranking factor

Use basic factors directly (e.g. rank_pb_lf_0), or combine basic factors into derived factors via expressions (e.g. (close_1 + close_2 + close_3) / close_0).

In [5]:
# Basic configuration
class conf:
    start_date = '2010-01-01'
    end_date='2017-10-01'
    # Data before split_date is used for training; data after it is used for evaluation
    split_date = '2015-01-01'
    # D.instruments: https://bigquant.com/docs/data_instruments.html
    instruments = D.instruments(start_date, split_date)
    hold_days = 40
    
    # Labeling function for the machine learning target
    # The expression below is equivalent to min(max(holding-period return * 100 / sqrt(hold_days/3), -20), 20) + 20 (M.fast_auto_labeler later rounds to integers)
    # Note: max/min clip the label to [-20, 20]; +20 then makes it non-negative (StockRanker requires non-negative integer labels)
    label_expr = ['return * 100/%s'%(np.sqrt(hold_days/3)), 'where(label > {0}, {0}, where(label < -{0}, -{0}, label)) + {0}'.format(20)]
   
    # Features https://bigquant.com/docs/data_features.html; any feature can be constructed via expressions
    features = [
        'turn_0',
        'return_6',
        'fs_roe_0',
        'fs_eps_0',
        'fs_bps_0',
        'fs_roa_0',
        'return_20',
        'rank_turn_0',
        'rank_turn_9',
        #'ta_rsi(close_0,28)',
        'rank_pb_lf_0',
        'fs_roa_ttm_0',
        'fs_roe_ttm_0',
        'high_0/low_0',
        'fs_eps_yoy_0',
        'sqrt(high_0*low_0)-amount_0/volume_0*adjust_factor_0',
        'sum(max(0,high_0-delay(close_0,1)),20)/sum(max(0,delay(close_0, 1)-low_0),20)*100',
        '((close_0-open_0)/((high_0-low_0)+.001))',
        'turn_9',
        'ta_ema(((high_0+low_0)/2-(delay(high_0,1)+delay(low_0,1))/2)*(high_0-low_0)/volume_0,7)',
        'turn_1',
        'fs_operating_revenue_yoy_0',
        'fs_operating_revenue_qoq_0',
        'fs_net_profit_margin_ttm_0',
        'fs_gross_profit_margin_ttm_0',
        'rank_pe_lyr_0',
        'rank_pe_ttm_0',
        'rank_ps_ttm_0',
        'rank_return_9',
        'rank_fs_bps_0',
        'rank_return_6',
        'rank_return_15',
        'close_1/open_0',
        'open_0/close_0',
        'high_0/close_1',
        'close_0/open_0',
        'rank_return_30',
        'rank_return_20',
        'rank_avg_turn_1',
        'close_9/close_0',
        'rank_avg_turn_6',
        'fs_cash_ratio_0',
        'close_4/close_0',
        'close_6/close_0',
        'close_2/close_0',
        'close_3/close_0',
        'close_5/close_0',
        'close_1/close_0',
        'rank_avg_turn_0',
        'volume_0/mean(volume_0, 3)*100', 
        'rank_avg_turn_3',
        'rank_avg_turn_9',
        'close_20/close_0',
        'rank_avg_turn_15',
        'close_15/close_0',
        'rank_avg_turn_20',
        'rank_market_cap_0',
        'amount_2/amount_0',
        'rank_fs_eps_yoy_0',
        'return_5/return_0',
        'amount_4/amount_0',
        'rank_fs_roe_ttm_0',
        'return_9/return_0',
        'amount_3/amount_0',
        'amount_5/amount_0',
        '(-1*correlation(open_0,volume_0,10))' ,
        '(-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9))', 
        'ta_atr(high_0,low_0,close_0,5)',
        '((-1*((low_0-close_0)*(open_0**5)))/((low_0-high_0)*(close_0**5)))',
        'turn_6',
        '-1*delta(((close_0-low_0)-(high_0-close_0))/(high_0-low_0),1)',
        'turn_3',
        'std(volume_0,10)',
        'ta_ema(((high_0+low_0-0)/2-(delay(high_0,1)+delay(low_0,1))/2)*(high_0-low_0)/volume_0,15)',
        '(close_0-mean(close_0,12))/mean(close_0,12)*100',
        '(close_0-delay(close_0,6))/delay(close_0,6)*volume_0',
        '(volume_0-delay(volume_0,5))/delay(volume_0,5)*100',
        'sum(((close_0-low_0)-(high_0-close_0))/(high_0-low_0)*volume_0,20)',
        '(close_0-mean(close_0,24))/mean(close_0,24)*100',
        '((sum(close_0,7)/7)-close_0)+correlation(amount_0/volume_0*adjust_factor_0,delay(close_0,5),230)',
        'turn_15',
        'rank((-1*((1-(open_0/close_0))**1)))', 
        'mean(close_0,12)/close_0',
        'ta_ema((close_0-ts_min(low_0,9))/(ts_max(high_0,9)-ts_min(low_0,9))*100,3)',
        'turn_20',
        '(close_0-delay(close_0,20))/delay(close_0,20)*100',
        'close_0-delay(close_0,5)',
        'ta_ema(volume_0, 21)',
        'close_0/delay(close_0,5)',
        'std(amount_0,20)',
        'sum(((close_0-low_0)-(high_0-close_0))/(high_0-low_0)*volume_0,6)',
        '((high_0+low_0+close_0)/3-mean((high_0+low_0+close_0)/3,12))/(0.015*mean(abs(close_0-mean((high_0+low_0+close_0)/3,12)),12))',
        'std(amount_0,6)',
        'ta_ema(((ts_max(high_0,6)-close_0)/(ts_max(high_0,6)-ts_min(low_0,6))*100),20)',
        'ta_ema(ta_ema((close_0-ts_min(low_0,9))/(ts_max(high_0,9)-ts_min(low_0,9))*100,3),3)',
        '(close_0-delay(close_0,6))/delay(close_0,6)*100',
        '(((high_0*low_0)**0.5)-amount_0/volume_0*adjust_factor_0)',
        '(mean(close_0,3)+mean(close_0,6)+mean(close_0,12)+mean(close_0,24))/(4*close_0)',
        'ta_ema(close_0-delay(close_0,5),5)',
        'ta_ema(high_0-low_0,10)/ta_ema(ta_ema(high_0-low_0,10),10)',
        '((high_0-ta_ema(close_0,15))-(low_0-ta_ema(close_0,15)))/close_0',
        '(close_0+high_0+low_0)/3',
        'std(volume_0,20)',
        'open_0/shift(close_0,1)-1 ',  
        'return_9',
        '(mean(close_0,3)+mean(close_0,6)+mean(close_0,12)+mean(close_0,24))/4',
        'rank(delta(((((high_0+low_0)/2)*0.2)+(amount_0/volume_0*adjust_factor_0*0.8)),4)*-1)',
        '(rank(sign(delta((((open_0*0.85)+(high_0 *0.15))),4)))*-1)',
        '(-1*correlation(close_0,volume_0, 10))',
        'close_0-delay(close_0,20)',
        '(close_0-delay(close_0,1))/delay(close_0,1)*volume_0',
        '(close_0-delay(close_0,12))/delay(close_0,12)*volume_0',
        'return_3',
        'return_0',
        '(high_0-low_0-ta_ema(high_0-low_0, 11))/ta_ema(high_0-low_0, 11)*100',
        'return_1',
        'mean(abs(close_0-mean(close_0,6)),6)',
        '-1*((low_0-close_0*(open_0**5)))/((close_0-high_0)*(close_0**5))',
        'mean(amount_0,20)',
        'return_30',
        'return_15',
        '(rank((amount_0/volume_0*adjust_factor_0-close_0))/rank((amount_0/volume_0*adjust_factor_0 + close_0)))', 
        '((rank(max((amount_0/volume_0*adjust_factor_0-close_0),3))+rank(min((amount_0/volume_0*adjust_factor_0-close_0), 3)))*rank(delta(volume_0, 3)))',
        'ta_beta(high_0,low_0,12)',
        'correlation(amount_0/volume_0*adjust_factor_0,volume_0,5)',
        'ta_adx(high_0,low_0,close_0,14)',
        'rank_turn_3',
        'rank_turn_1',
        'correlation(high_0/low_0,volume_0,4)',
        'rank_turn_6',
        #'ta_rsi(close_0,14)',
        'rank_turn_15',
        'rank_turn_20',
        'rank_fs_roa_0',
        'rank_fs_roe_0',
        'rank_fs_eps_0',
        'rank_return_3',
        'rank_return_1',
        'rank_return_0',
        'low_0/close_1',
        'return_4/return_0',
        'rank_fs_roa_ttm_0',
        'amount_1/amount_0',
        'ta_wma(close_0,5)/close_0',
        'mean(close_0,5)/close_0',
        'ta_ema(close_0,5)/close_0',
        'ta_atr(high_0,low_0,close_0,14)/close_0',
        'avg_turn_9/turn_0',
        'avg_turn_1/turn_0',
        'ta_wma(close_0,30)/close_0',
        'return_9/return_5',
        'avg_turn_6/turn_0',
        'return_3/return_0',
        'ta_atr(high_0,low_0,close_0,28)/close_0',
        'close_0/mean(close_0,10)',
        'return_1/return_5',
        'return_0/return_3',
        'mean(close_0,30)/close_0',
        'return_1/return_0',
        'return_9/return_3',
        'ta_ema(close_0,30)/close_0',
        'avg_turn_3/turn_0',
        'return_1/return_3',
        'close_0/mean(close_0,30)',
        'return_6/return_5',
        'return_6/return_0',
        'close_0/mean(close_0,20)',
        'return_0/return_5',
        'return_6/return_3',
        'fs_net_profit_yoy_0',
        'fs_net_profit_qoq_0',
        'return_90/return_5',
        'return_15/return_0',
        'avg_turn_15/turn_0',
        'return_20/return_5',
        'return_50/return_5',
        'rank_sh_holder_num_0',
        'return_30/return_5',
        'avg_turn_20/turn_0',
        'return_30/return_0',
        'return_30/return_3',
        'return_20/return_0',
        'return_20/return_3',
        'return_15/return_5',
        'rank_fs_cash_ratio_0',
        'return_70/return_5',
        'return_60/return_5',
        'return_80/return_5',
        'return_15/return_3',
        'return_30/return_10',
        'return_70/return_10',
        'amount_0/avg_amount_5',
        'return_80/return_10',
        'return_50/return_10',
        'return_20/return_10',
        'return_90/return_10',
        'amount_0/avg_amount_3',
        'return_120/return_5',
        'return_60/return_10',
        'fs_net_profit_margin_0',
        '(high_0-low_0)/close_0',
        'return_120/return_10',
        'mean(close_0,20)/mean(close_0,30)',
        'mean(close_0,30)/mean(close_0,60)',
        'mean(close_0,10)/mean(close_0,60)',
        '(low_1-close_0)/close_0',
        'rank_market_cap_float_0',
        'mean(close_0,10)/mean(close_0,20)',
        '(low_1-close_1)/close_0',
        '(close_1-low_0)/close_0',
        '(low_0-close_1)/close_0',
        'mean(close_0,10)/mean(close_0,30)',
        'rank_fs_net_profit_qoq_0',
        'rank_sh_holder_avg_pct_0',
        'fs_gross_profit_margin_0',
        '(high_0-close_1)/close_0',
        '(high_1-close_0)/close_0',
        'rank_fs_net_profit_yoy_0',
        '(open_0-close_0)/close_0',
        '(close_1-high_0)/close_0',
        '(high_1-close_1)/close_0',
        '(high_0-low_0)/(close_0-open_0)',
        'rank_fs_operating_revenue_yoy_0',
        'rank_fs_operating_revenue_qoq_0',
        '(open_0-close_0)/(high_0-low_0)',
        'rank_sh_holder_avg_pct_6m_chng_0',
        'rank_sh_holder_avg_pct_3m_chng_0',
        'mean(close_0,3)/close_0',
        'mean(amount_0,3)/amount_0',
        'mean(volume_0,3)/volume_0',
        'avg_mf_net_amount_6/mf_net_amount_0',
        'avg_mf_net_amount_9/mf_net_amount_0',
        'avg_mf_net_amount_3/mf_net_amount_0',
        'avg_mf_net_amount_20/mf_net_amount_0',
        'avg_mf_net_amount_15/mf_net_amount_0',
        'avg_mf_net_amount_12/mf_net_amount_0',
        'avg_mf_net_amount_9/avg_mf_net_amount_3',
        'avg_mf_net_amount_6/avg_mf_net_amount_3',
        'close_0/mean(close_0,3)',
        'avg_mf_net_amount_20/avg_mf_net_amount_3',
        'avg_mf_net_amount_12/avg_mf_net_amount_3',
        'avg_mf_net_amount_15/avg_mf_net_amount_3',
        'amount_0/mean(amount_0,3)',
        '((close_0-low_0)-(high_0-close_0))/(high_0-close_0)',
        '(high_0-low_0+high_1-low_1+high_2-low_2)/close_0',
        'mean(close_0,6)/close_0',
        'mean(amount_0,6)/amount_0',
        'mean(volume_0,6)/volume_0',
        '3/1*(high_0-low_0)/(high_0-low_0+high_1-low_1+high_2-low_2)',
        'mean(close_0,6)/mean(close_0,3)',
        'mean(close_0,9)/close_0',
        'mean(amount_0,6)/mean(amount_0,3)',
        'mean(amount_0,9)/amount_0',
        'mean(volume_0,9)/volume_0',
        '(mean(high_0,6)-mean(low_0,6))/close_0',
        'mean(close_0,9)/mean(close_0,3)',
        'mean(amount_0,9)/mean(amount_0,3)',
        'mean(close_0,15)/close_0',
        '(mean(high_0,9)-mean(low_0,9))/close_0',
        'mean(amount_0,15)/amount_0',
        'mean(volume_0,15)/volume_0',
        '(mean(high_0,6)-mean(low_0,6))/(mean(high_0,3)-mean(low_0,3))',
        'mean(close_0,15)/mean(close_0,3)',
        'mean(amount_0,15)/mean(amount_0,3)',
        'mean(close_0,20)/close_0',
        'mean(amount_0,20)/amount_0',
        'mean(volume_0,20)/volume_0',
        'mean(close_0,20)/mean(close_0,3)',
        '(mean(high_0,9)-mean(low_0,9))/(mean(high_0,3)-mean(low_0,3))',
        'mean(amount_0,20)/mean(amount_0,3)',
        '(sum(high_0,15)-sum(low_0,15))/close_0',
        '(mean(high_0,15)-mean(low_0,15))/(mean(high_0,3)-mean(low_0,3))',
        '(sum(high_0,20)-sum(low_0,20))/close_0',
        '(mean(high_0,20)-mean(low_0,20))/(mean(high_0,3)-mean(low_0,3))',
    ]
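The labeling pipeline defined by `conf.label_expr` above (scale the holding-period return, clip it, shift it non-negative) can be sketched with the standard library; the sample returns here are made-up illustration values, not platform data:

```python
import math

hold_days = 40
returns = [-0.5, -0.1, 0.0, 0.05, 0.8]  # made-up holding-period returns

def label(r, hold_days=hold_days, cap=20):
    # 'return * 100 / sqrt(hold_days / 3)'
    score = r * 100 / math.sqrt(hold_days / 3)
    # 'where(label > 20, 20, where(label < -20, -20, label)) + 20'
    return min(max(score, -cap), cap) + cap

scores = [label(r) for r in returns]  # every score lies in [0, 40]
```

A zero return maps to exactly 20, and any return large enough to hit the cap maps to 40, which is why StockRanker's non-negative-integer requirement is satisfied after rounding.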
In [6]:
# Label the data: score each row (sample); a higher score generally means better
m1 = M.fast_auto_labeler.v8(
    instruments=conf.instruments, start_date=conf.start_date, end_date=conf.split_date,
    label_expr=conf.label_expr, hold_days=conf.hold_days,
    benchmark='000300.SHA', sell_at='open', buy_at='open')

# Compute the feature data
m2 = M.general_feature_extractor.v5(
    instruments=conf.instruments, start_date=conf.start_date, end_date=conf.split_date,
    features=conf.features)
# Preprocessing: handle missing data and normalize; T.get_stock_ranker_default_transforms prepares the data for the StockRanker model
m3 = M.transform.v2(
    data=m2.data, transforms=T.get_stock_ranker_default_transforms(),
    drop_null=True, astype='int32', except_columns=['date', 'instrument'],
    clip_lower=0, clip_upper=200000000)
# Join the label and feature data
m4 = M.join.v2(data1=m1.data, data2=m3.data, on=['date', 'instrument'], sort=True)
# StockRanker training
m5 = M.stock_ranker_train.v3(training_ds=m4.data, features=conf.features)
[2017-10-12 07:01:33.352926] INFO: bigquant: fast_auto_labeler.v8 started..
[2017-10-12 07:01:33.356315] INFO: bigquant: cache hit
[2017-10-12 07:01:33.361737] INFO: bigquant: fast_auto_labeler.v8 finished [0.008872s].
[2017-10-12 07:01:33.376946] INFO: bigquant: general_feature_extractor.v5 started..
[2017-10-12 07:01:33.379164] INFO: bigquant: cache hit
[2017-10-12 07:01:33.380081] INFO: bigquant: general_feature_extractor.v5 finished [0.00314s].
[2017-10-12 07:01:33.390706] INFO: bigquant: transform.v2 started..
[2017-10-12 07:01:33.392467] INFO: bigquant: cache hit
[2017-10-12 07:01:33.393199] INFO: bigquant: transform.v2 finished [0.002497s].
[2017-10-12 07:01:33.399965] INFO: bigquant: join.v2 started..
[2017-10-12 07:01:33.401987] INFO: bigquant: cache hit
[2017-10-12 07:01:33.402967] INFO: bigquant: join.v2 finished [0.002993s].
[2017-10-12 07:01:33.411562] INFO: bigquant: stock_ranker_train.v3 started..
[2017-10-12 07:01:33.423215] INFO: stock_ranker_train: 202dd4c2 training: 2239612 rows
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-cc710da306be> in <module>()
     17 m4 = M.join.v2(data1=m1.data, data2=m3.data, on=['date', 'instrument'], sort=True)
     18 # StockRanker training
---> 19 m5 = M.stock_ranker_train.v3(training_ds=m4.data, features=conf.features)

KeyError: 'sqrt'
In [ ]:
## Backtest https://bigquant.com/docs/module_trade.html
# Backtest engine: data preparation, runs once
def prepare(context):
    # context.start_date / end_date are passed in as trader parameters for backtests; in live trading the system substitutes the live dates
    n1 = M.general_feature_extractor.v5(
        instruments=D.instruments(),
        start_date=context.start_date, end_date=context.end_date,
        model_id=context.options['model_id'])
    n2 = M.transform.v2(
        data=n1.data, transforms=T.get_stock_ranker_default_transforms(),
        drop_null=True, astype='int32', except_columns=['date', 'instrument'],
        clip_lower=0, clip_upper=200000000)
    n3 = M.stock_ranker_predict.v2(model_id=context.options['model_id'], data=n2.data)
    context.instruments = n3.instruments
    context.options['predictions'] = n3.predictions
    

# Backtest engine: initialization, runs once
def initialize(context):
    # Load the prediction data
    context.ranker_prediction = context.options['predictions'].read_df()
    buy_cost = 0.0003
    sell_cost = 0.0013
    # The system sets default commissions and slippage; use the function below to change commissions
    context.set_commission(PerOrder(buy_cost=buy_cost, sell_cost=sell_cost, min_cost=5))
    # The prediction data is passed in via options and loaded into memory as a DataFrame with read_df
    # Number of stocks to buy: here, the top 5 of the predicted ranking
    stock_count = 5
    # Per-stock weights; this allocation gives higher-ranked stocks a bit more capital, [0.339160, 0.213986, 0.169580, ..]
    context.stock_weights = T.norm([1 / math.log(i + 2) for i in range(0, stock_count)])
    # Maximum fraction of capital per stock
    context.max_cash_per_instrument = 0.2

# Backtest engine: daily data handler, runs once per day
def handle_data(context, data):
    # Filter by date to get today's predictions
    ranker_prediction = context.ranker_prediction[
        context.ranker_prediction.date == data.current_dt.strftime('%Y-%m-%d')]

    # 1. Capital allocation
    # The average holding period is hold_days; stocks are bought every day, so about 1/hold_days of the capital is expected to be used per day
    # In practice there is some buying error, so during the first hold_days days equal amounts are used; after that, as much of the remaining cash as possible is used (capped here at 1.5x the equal amount)
    is_staging = context.trading_day_index < context.options['hold_days'] # whether we are in the position-building phase (the first hold_days days)
    cash_avg = context.portfolio.portfolio_value / context.options['hold_days']
    cash_for_buy = min(context.portfolio.cash, (1 if is_staging else 1.5) * cash_avg)
    cash_for_sell = cash_avg - (context.portfolio.cash - cash_for_buy)
    positions = {e.symbol: p.amount * p.last_sale_price
                 for e, p in context.perf_tracker.position_tracker.positions.items()}

    # 2. Generate sell orders: selling starts only after hold_days days; held stocks at the bottom of the StockRanker ranking are dropped first
    if not is_staging and cash_for_sell > 0:
        equities = {e.symbol: e for e, p in context.perf_tracker.position_tracker.positions.items()}
        instruments = list(reversed(list(ranker_prediction.instrument[ranker_prediction.instrument.apply(
                lambda x: x in equities and not context.has_unfinished_sell_order(equities[x]))])))
        # print('rank order for sell %s' % instruments)
        for instrument in instruments:
            context.order_target(context.symbol(instrument), 0)
            cash_for_sell -= positions[instrument]
            if cash_for_sell <= 0:
                break

    # 3. Generate buy orders: buy the top stock_count stocks in the StockRanker ranking
    buy_cash_weights = context.stock_weights
    buy_instruments = list(ranker_prediction.instrument[:len(buy_cash_weights)])
    max_cash_per_instrument = context.portfolio.portfolio_value * context.max_cash_per_instrument
    for i, instrument in enumerate(buy_instruments):
        cash = cash_for_buy * buy_cash_weights[i]
        if cash > max_cash_per_instrument - positions.get(instrument, 0):
            # Make sure a position never exceeds the per-stock capital cap
            cash = max_cash_per_instrument - positions.get(instrument, 0)
        if cash > 0:
            context.order_value(context.symbol(instrument), cash)
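The per-stock weights configured in `initialize` above (1 / log(i + 2), normalized) can be reproduced with the standard library. This sketch assumes `T.norm` is plain sum-normalization, which matches the example values in the comment:

```python
import math

stock_count = 5
raw = [1 / math.log(i + 2) for i in range(stock_count)]
total = sum(raw)
stock_weights = [w / total for w in raw]  # assumes T.norm divides by the sum
# stock_weights[0] ~ 0.339160, [1] ~ 0.213986, [2] ~ 0.169580, matching the comment
```

The log denominator makes the decay between neighbouring ranks gentle: the top stock gets about a third of the daily buy budget rather than dominating it.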
In [ ]:
m6 = M.trade.v2(
    instruments=None,
    start_date=conf.split_date,
    end_date=conf.end_date,
    prepare=prepare,
    initialize=initialize,
    handle_data=handle_data,
    order_price_field_buy='open',       # buy at the open
    order_price_field_sell='close',     # sell at the close
    capital_base=100000,                # initial capital
    benchmark='000300.SHA',             # benchmark for comparison; does not affect backtest results
    # pass the prediction data and parameters to the backtest engine via options
    options={'hold_days': conf.hold_days, 'model_id': m5.model_id})

Example 2: using the Guotai Junan 100 factors

M.user_feature_extractor can be used to build factors of arbitrary complexity

In [ ]:
class conf:
    start_date = '2011-01-01'
    end_date='2017-07-18'
    # Data before split_date is used for training; data after it is used for evaluation
    split_date = '2016-01-01'
    # D.instruments: https://bigquant.com/docs/data_instruments.html
    instruments = D.instruments(start_date, split_date)
    hold_days = 6
    
    # Labeling function for the machine learning target
    # The expression below is equivalent to min(max(holding-period return * 100 / sqrt(hold_days/3), -10), 10) + 10 (M.fast_auto_labeler later rounds to integers)
    # Note: max/min clip the label to [-10, 10]; +10 then makes it non-negative (StockRanker requires non-negative integer labels)
    label_expr = ['return * 100/%s'%(np.sqrt(hold_days/3)), 'where(label > {0}, {0}, where(label < -{0}, -{0}, label)) + {0}'.format(10)]

# Custom factor example
user_features = {
     'gtja_100':lambda x:x.volume.rolling(20).std()
}
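For reference, the `gtja_100` lambda above computes a 20-day rolling sample standard deviation of volume. A standard-library sketch of the same computation (pandas' rolling `std` uses the sample statistic, ddof=1, like `statistics.stdev`), with a made-up volume series:

```python
from statistics import stdev

def rolling_std(values, window=20):
    # None for the first window-1 rows, like pandas' leading NaNs
    out = [None] * (window - 1)
    for i in range(window - 1, len(values)):
        out.append(stdev(values[i - window + 1 : i + 1]))
    return out

volumes = list(range(1, 31))         # made-up volume series
gtja_100 = rolling_std(volumes, 20)  # first defined value at index 19
```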
In [ ]:
m1 = M.fast_auto_labeler.v8(
        instruments=conf.instruments, start_date=conf.start_date, end_date=conf.split_date,
        label_expr=conf.label_expr, hold_days=conf.hold_days,
        benchmark='000300.SHA', sell_at='open', buy_at='open',plot_charts=False)

m2_u = M.user_feature_extractor.v1(
    instruments=conf.instruments, start_date=conf.start_date, end_date=conf.split_date,
    history_data_fields=['volume'],look_back_days=30,
    features_by_instrument={'gtja_100': lambda x:x.volume.rolling(20).std()},
                     )

m3 = M.transform.v2(
    data=m2_u.data, transforms=T.get_stock_ranker_default_transforms()+[('.*', None)],
    drop_null=True, astype='int32', except_columns=['date', 'instrument'],
    clip_lower=0, clip_upper=200000000)

m4 = M.join.v2(data1=m1.data, data2=m3.data, on=['date', 'instrument'], sort=True)


# Train with just this one feature
m5 = M.stock_ranker_train.v2(training_ds=m4.data, features=['gtja_100'])
In [ ]:
def prepare(context):
    instruments = D.instruments(context.start_date, context.end_date)
    n1 = M.user_feature_extractor.v1(
        instruments=instruments, start_date=context.start_date, end_date=context.end_date,
        history_data_fields=['open', 'close', 'high', 'low', 'volume', 'adjust_factor', 'amount'],look_back_days=30,
        features_by_instrument={'gtja_100': lambda x:x.volume.rolling(20).std()},
    )
    n2 = M.transform.v2(
        data=n1.data, transforms=T.get_stock_ranker_default_transforms()+[('.*', None)],
        drop_null=True, astype='int32', except_columns=['date', 'instrument'],
        clip_lower=0, clip_upper=200000000)
    n3 = M.stock_ranker_predict.v2(model_id=context.options['model_id'], data=n2.data)
    context.instruments = n3.instruments
    context.options['predictions'] = n3.predictions 

# Backtest engine: initialization, runs once
def initialize(context):
    # Load the prediction data
    context.ranker_prediction = context.options['predictions'].read_df()
    buy_cost = 0.0003
    sell_cost = 0.0013
    # The system sets default commissions and slippage; use the function below to change commissions
    context.set_commission(PerOrder(buy_cost=buy_cost, sell_cost=sell_cost, min_cost=5))
    # The prediction data is passed in via options and loaded into memory as a DataFrame with read_df
    # Number of stocks to buy: here, the top 5 of the predicted ranking
    stock_count = 5
    # Per-stock weights; this allocation gives higher-ranked stocks a bit more capital, [0.339160, 0.213986, 0.169580, ..]
    context.stock_weights = T.norm([1 / math.log(i + 2) for i in range(0, stock_count)])
    # Maximum fraction of capital per stock
    context.max_cash_per_instrument = 0.2

# Backtest engine: daily data handler, runs once per day
def handle_data(context, data):
    # Filter by date to get today's predictions
    ranker_prediction = context.ranker_prediction[
        context.ranker_prediction.date == data.current_dt.strftime('%Y-%m-%d')]

    # 1. Capital allocation
    # The average holding period is hold_days; stocks are bought every day, so about 1/hold_days of the capital is expected to be used per day
    # In practice there is some buying error, so during the first hold_days days equal amounts are used; after that, as much of the remaining cash as possible is used (capped here at 1.5x the equal amount)
    is_staging = context.trading_day_index < context.options['hold_days'] # whether we are in the position-building phase (the first hold_days days)
    cash_avg = context.portfolio.portfolio_value / context.options['hold_days']
    cash_for_buy = min(context.portfolio.cash, (1 if is_staging else 1.5) * cash_avg)
    cash_for_sell = cash_avg - (context.portfolio.cash - cash_for_buy)
    positions = {e.symbol: p.amount * p.last_sale_price
                 for e, p in context.perf_tracker.position_tracker.positions.items()}

    # 2. Generate sell orders: selling starts only after hold_days days; held stocks at the bottom of the StockRanker ranking are dropped first
    if not is_staging and cash_for_sell > 0:
        equities = {e.symbol: e for e, p in context.perf_tracker.position_tracker.positions.items()}
        instruments = list(reversed(list(ranker_prediction.instrument[ranker_prediction.instrument.apply(
                lambda x: x in equities and not context.has_unfinished_sell_order(equities[x]))])))
        # print('rank order for sell %s' % instruments)
        for instrument in instruments:
            context.order_target(context.symbol(instrument), 0)
            cash_for_sell -= positions[instrument]
            if cash_for_sell <= 0:
                break

    # 3. Generate buy orders: buy the top stock_count stocks in the StockRanker ranking
    buy_cash_weights = context.stock_weights
    buy_instruments = list(ranker_prediction.instrument[:len(buy_cash_weights)])
    max_cash_per_instrument = context.portfolio.portfolio_value * context.max_cash_per_instrument
    for i, instrument in enumerate(buy_instruments):
        cash = cash_for_buy * buy_cash_weights[i]
        if cash > max_cash_per_instrument - positions.get(instrument, 0):
            # Make sure a position never exceeds the per-stock capital cap
            cash = max_cash_per_instrument - positions.get(instrument, 0)
        if cash > 0:
            context.order_value(context.symbol(instrument), cash)
In [ ]:
m6 = M.trade.v2(
    instruments=None,
    start_date=conf.split_date,
    end_date=conf.end_date,
    prepare=prepare,
    initialize=initialize,
    handle_data=handle_data,
    order_price_field_buy='open',       # buy at the open
    order_price_field_sell='close',     # sell at the close
    capital_base=100000,                # initial capital
    benchmark='000300.SHA',             # benchmark for comparison; does not affect backtest results
    # pass the prediction data and parameters to the backtest engine via options
    options={'hold_days': conf.hold_days, 'model_id': m5.model_id})

(iQuant) #6

Found the cause: sqrt support was added later, so we will update this template shortly.
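Until the template catches up, one workaround is to rewrite `sqrt(x)` into the `(x)**0.5` form that the feature list already uses elsewhere (compare `sqrt(high_0*low_0)-...` with `(((high_0*low_0)**0.5)-...)`). A small parenthesis-matching rewriter, assuming `sqrt` only ever appears as a standalone call in these expressions:

```python
def rewrite_sqrt(expr):
    """Replace every sqrt(...) call with (...)**0.5, matching nested parentheses."""
    while 'sqrt(' in expr:
        i = expr.index('sqrt(')
        depth = 0
        for j in range(i + 4, len(expr)):  # i+4 is the opening '('
            if expr[j] == '(':
                depth += 1
            elif expr[j] == ')':
                depth -= 1
                if depth == 0:
                    break
        inner = expr[i + 5:j]
        expr = expr[:i] + '(' + inner + ')**0.5' + expr[j + 1:]
    return expr
```

For example, `rewrite_sqrt('sqrt(high_0*low_0)-amount_0/volume_0*adjust_factor_0')` yields `'(high_0*low_0)**0.5-amount_0/volume_0*adjust_factor_0'`.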


(qci133) #7

It's not just sqrt; ta_rsi errors too. I haven't tested the others yet.


(iQuant) #8

@qci133 The quickest way to fix this is to create a new strategy and edit the feature list there.

  1. New > Visual strategy - AI stock selection

  2. Select the Input Features list module

  3. Open the code editor window and enter the features

  4. Run the strategy

Notes:

  • We do not recommend putting all 282 factors into the feature list for testing
  • Under the StockRanker algorithm some factors may fail during training; you can switch to another machine learning algorithm
  • For help using BigStudio, see this link
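Before rebuilding the strategy, it can help to find which entries of a long feature list call operators the current platform version does not support, by scanning each expression against a whitelist. The whitelist below is a hypothetical subset for illustration, not the platform's actual operator list:

```python
import re

# Hypothetical whitelist for illustration; the real set depends on the platform version
SUPPORTED = {'mean', 'sum', 'std', 'max', 'min', 'abs', 'sign', 'rank', 'delay',
             'delta', 'shift', 'where', 'correlation', 'ts_min', 'ts_max',
             'ta_ema', 'ta_wma', 'ta_atr', 'ta_adx', 'ta_beta'}

def unsupported_calls(expr):
    """Return the set of called function names not in the whitelist."""
    called = set(re.findall(r'([A-Za-z_][A-Za-z_0-9]*)\s*\(', expr))
    return called - SUPPORTED
```

Run over the feature list above, this would flag `'sqrt(high_0*low_0)-...'` as `{'sqrt'}` and `'ta_rsi(close_0,28)'` as `{'ta_rsi'}`, while leaving pure-column expressions like `'close_1/open_0'` alone.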

(qci133) #9

OK, I'll give it a try. One more question: are you saying that features like ta_rsi that errored before will work when placed in a visual strategy, but fail when placed in a pure-code strategy?

Also, a suggestion: one thing about BigStudio feels inconvenient. Dragging the canvas and dragging modules on the canvas are two separate modes, switched via a small icon at the bottom right of the screen, which makes for awkward interaction. Could it be changed so that grabbing the canvas background pans the canvas, while grabbing a module moves the module? Ideally with mouse-wheel zooming of the canvas as well.


(iQuant) #11

Thanks very much for the feedback:

  1. The visual interface ultimately generates code too (you can switch views in the top-right corner). It's just that this code template hasn't been updated to the latest version yet, hence the errors.
  2. Canvas panning conflicts somewhat with rubber-band selection of modules, and we still need to think about how to reconcile them. For now, panning is done via the navigation map at the bottom left. Wheel zoom interacted badly with page scrolling, so it is disabled for the moment. We keep improving the visual interface and welcome more feedback and suggestions.

(qci133) #12

I have a question about that. My understanding is that larger trees, or deeper neural networks, should actually be better suited to higher-dimensional feature spaces. Why do you advise against testing with all 282 factors? Is it a performance concern?

When you say some factors may fail to train under StockRanker, do you mean the program throws an exception, or that some factors are logically unsuited to training with StockRanker?

I've asked quite a few questions these past few days; I hope I haven't been too much of a bother.


(iQuant) #13
  • These 282 factors are too strongly collinear
  • StockRanker may raise errors and fail to train the model
  • Under this algorithm, more factors is not necessarily better
  • With many factors, training takes longer, and extracting the factors also takes longer
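The collinearity point can be checked pairwise before training. A standard-library Pearson-correlation sketch with made-up data (neighbouring return windows such as return_5 and return_6 overlap heavily, so they typically correlate strongly like this):

```python
def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Made-up factor values: y is x with a small perturbation, i.e. nearly collinear
x = [float(i) for i in range(1, 11)]
y = [v + (0.1 if i % 2 else -0.1) for i, v in enumerate(x)]
r = pearson(x, y)  # close to 1.0 -> the two factors carry nearly the same information
```

Dropping one factor from each highly correlated pair shrinks the feature list with little loss of information.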

(qci133) #14

Understood; some of the factors are correlated with each other.
As for the cost of factor extraction, my thought is that a better caching scheme should let different users and different strategies mostly hit precomputed features. My guess is that the current cache is per-strategy: each strategy has its own kernel (or container?), and only re-running a backtest of that same strategy benefits from the cache. If a comprehensive global feature cache were maintained (covering the platform's own thousands of factors; user-defined factors needn't be supported), it would surely yield a big speedup.
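The global-cache idea above can be sketched as memoization keyed by (feature, date range). This is only an in-process illustration: real cross-user caching would need a shared store rather than a local dict, and `load_feature_uncached` is a hypothetical stand-in for the platform's extractor:

```python
from functools import lru_cache

calls = {'count': 0}

def load_feature_uncached(feature, start_date, end_date):
    # hypothetical stand-in for the platform's expensive feature extraction
    calls['count'] += 1
    return f'{feature}:{start_date}:{end_date}'

@lru_cache(maxsize=None)
def load_feature(feature, start_date, end_date):
    # every strategy requesting the same (feature, range) shares one computation
    return load_feature_uncached(feature, start_date, end_date)

load_feature('rank_pb_lf_0', '2010-01-01', '2015-01-01')  # computed once
load_feature('rank_pb_lf_0', '2010-01-01', '2015-01-01')  # served from the cache
```

Since feature expressions and date ranges are plain strings, they hash cleanly as cache keys; the hard part in practice is invalidation when the underlying market data is revised.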