DQN Single-Stock Timing Strategy: Research and Improvements


(iQuant) #1

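We previously shared an initial version of a reinforcement learning strategy in the community. We have since adjusted and optimized it, and this post introduces the new version and analyzes its results.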

1. Differences from the initial version

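The main difference between the new version and the initial one is how the state is defined. The initial version used the current day's OHLCV plus 7 commonly used factors as one state. The new version adds a window_size parameter: looking back from the current day, it takes window_size days of closing prices as a block, then computes 1 / e^(-(block[i+1] - block[i])) for each pair of consecutive values in the block, which yields one state record, as shown in the figure below:
[Figure 1]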
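The state construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function name build_state, the left-padding of short histories with the earliest close, and the example prices are all assumptions. The transform is implemented exactly as the formula is written in the post (algebraically this equals e^(block[i+1] - block[i])). A logistic sigmoid, 1 / (1 + e^(-x)), is a common way to bound such differences, but the post does not say whether the module uses one.

```python
import math


def build_state(closes, t, window_size):
    """Build one state for day index t from the last `window_size` closes.

    Only closes[t - window_size + 1 .. t] are read, so no future data is used.
    Padding short histories with the earliest close is an assumption.
    """
    start = t - window_size + 1
    if start >= 0:
        block = list(closes[start:t + 1])
    else:
        # Not enough history yet: left-pad with the first available close.
        block = [closes[0]] * (-start) + list(closes[:t + 1])
    # Formula as written in the post: 1 / e^(-(block[i+1] - block[i]))
    return [
        1.0 / math.exp(-(block[i + 1] - block[i]))
        for i in range(len(block) - 1)
    ]


# Hypothetical closing prices, for illustration only.
closes = [10.0, 10.5, 10.2, 10.2, 11.0, 10.8, 10.9]
state = build_state(closes, t=6, window_size=6)
print(len(state))  # one value per consecutive pair, i.e. window_size - 1
print(state)
```

Because the block only reaches back from day t, changing prices after day t leaves the state for day t unchanged.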

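The backtest was also changed. In the initial version, a buy signal triggered a purchase of a fixed number of shares; consecutive buy signals led to repeated purchases as long as funds allowed, and selling worked the same way. In the new version, a buy signal invests all available capital and a sell signal liquidates the entire position. The backtest also reports the return from buying at the start of the period and holding to the end, so the two can be compared.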
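The all-in / all-out rule and the buy-and-hold comparison can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: commissions, slippage, and lot sizes are ignored, fractional shares are allowed, and the function names and prices are hypothetical. The action encoding (0 = buy, 1 = sell) follows the handle_data function in the strategy code of this post.

```python
def all_in_all_out_return(prices, actions, cash=100000.0):
    """Return of a full-position switching rule.

    actions[t] is 0 (buy), 1 (sell) or None (no signal) for each day.
    Assumption: trades fill at prices[t] with no costs.
    """
    initial_cash = cash
    shares = 0.0
    for price, action in zip(prices, actions):
        if action == 0 and shares == 0:
            # Buy signal while flat: invest all available cash.
            shares = cash / price
            cash = 0.0
        elif action == 1 and shares > 0:
            # Sell signal while holding: liquidate the whole position.
            cash = shares * price
            shares = 0.0
    final_value = cash + shares * prices[-1]
    return final_value / initial_cash - 1.0


def buy_and_hold_return(prices):
    """Return from buying on the first day and holding to the last."""
    return prices[-1] / prices[0] - 1.0


# Hypothetical prices and signals, for illustration only.
prices = [10.0, 12.0, 9.0, 11.0]
actions = [0, 1, 0, None]
print(all_in_all_out_return(prices, actions))
print(buy_and_hold_return(prices))
```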

2. Results

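We use the buy-and-hold return described above (buy at the start of the period, hold to the end) as the benchmark return. We trained and tested on multiple stocks; some of the results are shown in the table below: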

Stock        DQN return   Benchmark return
000523.SZA   -0.09        -0.16
000333.SZA    0.44         0.47
000001.SZA   -0.01        -0.021
000004.SZA    0.49         0.08
300003.SZA   -0.14        -0.21

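DQN beats the benchmark return on four of the five stocks listed (000333.SZA is the exception). Although we tested only a handful of stocks, they cover upward trends, downward trends, and wide-range oscillation, so the results give a preliminary indication of the advantage of reinforcement learning.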
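The comparison can be made explicit by computing the excess return (DQN return minus benchmark return) for each row of the table. The figures below are copied from the table; the variable names are only for illustration.

```python
# (DQN return, benchmark return) per stock, copied from the table above.
results = {
    "000523.SZA": (-0.09, -0.16),
    "000333.SZA": (0.44, 0.47),
    "000001.SZA": (-0.01, -0.021),
    "000004.SZA": (0.49, 0.08),
    "300003.SZA": (-0.14, -0.21),
}

# Excess return of DQN over buy-and-hold, rounded to the table's precision.
excess = {code: round(dqn - bench, 3) for code, (dqn, bench) in results.items()}
beats = [code for code, diff in excess.items() if diff > 0]

for code, diff in excess.items():
    print(code, diff)
print(len(beats), "of", len(results), "stocks beat buy-and-hold")
```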

3. Tips

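After the initial version was shared, some users pointed out that it offered no interface for parameter tuning, so this version adds tunable parameters such as batch_size, window_size, total_episode, and learning_rate.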
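One way to organize tuning runs over these parameters is a simple grid. The sketch below only generates the combinations; the helper name parameter_grid and the candidate values are arbitrary illustrations (only 32, 6, 5, and 0.01 appear in this post's example), and each resulting dict would be passed as keyword arguments to the DQN module in the same way the strategy code passes them.

```python
from itertools import product


def parameter_grid(**choices):
    """Return a list of dicts, one per combination of the given candidates."""
    names = list(choices)
    return [
        dict(zip(names, combo))
        for combo in product(*(choices[name] for name in names))
    ]


# Candidate values are illustrative, not recommendations.
grid = parameter_grid(
    batch_size=[32, 64],
    window_size=[6, 10],
    total_episode=[5],
    learning_rate=[0.01, 0.001],
)

print(len(grid))  # number of parameter combinations to try
print(grid[0])
```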

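There are also parameters of the deep neural network itself, such as the number of layers, the output dimension of each layer, and the activation functions. These are harder to encapsulate, so the module does not expose them directly as parameters. We will keep adjusting it to support as many tunable parameters as possible.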
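To make concrete what exposing these network parameters could look like, here is a hypothetical pure-Python sketch of a feed-forward Q-network whose layer sizes and activations are plain arguments. It is not the module's implementation: the function names, the weight initialization, the layer sizes, and the choice of two outputs (one Q-value per action, matching the two pre_action values 0 and 1 seen in this post's output) are all assumptions.

```python
import math
import random

# Activation functions selectable by name (illustrative set).
ACTIVATIONS = {
    "relu": lambda v: max(0.0, v),
    "tanh": math.tanh,
    "linear": lambda v: v,
}


def init_network(input_dim, layer_dims, seed=0):
    """Create (weights, biases) for each layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    prev = input_dim
    for dim in layer_dims:
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(prev)] for _ in range(dim)]
        biases = [0.0] * dim
        layers.append((weights, biases))
        prev = dim
    return layers


def forward(layers, activations, x):
    """Run one input vector through the network, layer by layer."""
    for (weights, biases), name in zip(layers, activations):
        act = ACTIVATIONS[name]
        x = [
            act(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return x


# With window_size = 6 the state sketched in section 1 has 5 values.
net = init_network(input_dim=5, layer_dims=[16, 8, 2])
q_values = forward(net, ["relu", "relu", "linear"], [1.0] * 5)
print(q_values)  # one Q-value per action
```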

4. Closing remarks

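The new version still leaves a lot of room for adjustment, and we will continue to research and improve on it. This update is mainly meant to give everyone a direction for tuning; you are welcome to experiment and discuss with us.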

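References

1. DQN Single-Stock Timing Strategy Research (《DQN个股择时策略研究》)
2. Introduction to Deep Q-network
3. Deep Reinforcement Learning (Part 1): Deep Q Network (DQN) (深度强化学习(一): Deep Q Network(DQN))
4. CS294-112 Deep Reinforcement Learning course, Fall 2018
5. Simple Reinforcement Learning with Tensorflow Part 4: Deep Q-Networks and Beyond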

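    # This code was generated automatically by the visual strategy environment on 2019-08-14 11:23
    # This code cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.
    
    
    # Backtest engine: initialization function, executed only once
    def m3_initialize_bigquant_run(context):
        # Load the predicted actions
        context.pre_act = context.options['data'].read_df()
        # The system sets default commission and slippage; use the call below to change the commission
        context.set_commission(PerOrder(buy_cost=0.0003, sell_cost=0.0013, min_cost=5))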
         
    
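    # Backtest engine: daily data handler, executed once per trading day
    def m3_handle_data_bigquant_run(context, data):
        sid = context.symbol(context.instruments[0])
        
        # Look up the predicted action for the current date
        action = context.pre_act[context.pre_act['date'] == data.current_dt.strftime('%Y-%m-%d')]
        
        # Current position size
        cur_position = context.portfolio.positions[sid].amount
        
        if len(action['pre_action']) > 0:
            # Take the single prediction for this date as a plain int
            pre_action = int(action['pre_action'].iloc[0])
    
            if pre_action == 0 and cur_position == 0:
                # Action 0 with no position: buy with all available capital
                context.order_target_percent(sid, 1)
            elif pre_action == 1 and cur_position > 0:
                # Action 1 while holding: sell the entire position
                context.order_target_percent(sid, 0)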
    
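    # Backtest engine: data preparation, executed only once
    def m3_prepare_bigquant_run(context):
        pass
    
    # Backtest engine: called once before each time unit starts, i.e. once before each day's market open
    def m3_before_trading_start_bigquant_run(context, data):
        pass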
    
    
    # Backtest period and instrument; also passed to the DQN module as input_2
    m4 = M.instruments.v2(
        start_date='2016-01-01',
        end_date='2017-01-01',
        market='CN_STOCK_A',
        instrument_list='000004.SZA',
        max_count=0
    )
    
    # Earlier period passed to the DQN module as input_1 (presumably the training data)
    m1 = M.instruments.v2(
        start_date='2010-01-01',
        end_date='2016-01-01',
        market='CN_STOCK_A',
        instrument_list='000004.SZA',
        max_count=0
    )
    
    m6 = M.DQN_v2.v3(
        input_1=m1.data,
        input_2=m4.data,
        batch_size=32,
        window_size=6,
        total_episode=5,
        epsilon_decay=0.995,
        learning_rate=0.01,
        gamma=0.95
    )
    
    m2 = M.dropnan.v1(
        input_data=m6.data_1
    )
    
    m3 = M.trade.v4(
        instruments=m4.data,
        options_data=m2.data,
        start_date='',
        end_date='',
        initialize=m3_initialize_bigquant_run,
        handle_data=m3_handle_data_bigquant_run,
        prepare=m3_prepare_bigquant_run,
        before_trading_start=m3_before_trading_start_bigquant_run,
        volume_limit=0.025,
        order_price_field_buy='open',
        order_price_field_sell='open',
        capital_base=100000,
        auto_cancel_non_tradable_orders=True,
        data_frequency='daily',
        price_type='后复权',
        product_type='股票',
        plot_charts=True,
        backtest_only=False,
        benchmark='000300.SHA'
    )
    
    Stock under validation in this run: 000004.SZA
    episode: 0
    episode time:  286 current_value: 171476.50234413147  init_value:  6063.861846923828  value if not act: 21384.622192382812
    reward: 150091.88015174866
    episode: 1
    episode time:  294 current_value: 189621.60607528687  init_value:  6063.861846923828  value if not act: 21384.622192382812
    reward: 168236.98388290405
    episode: 2
    episode time:  282 current_value: 213051.11242294312  init_value:  6063.861846923828  value if not act: 21384.622192382812
    reward: 191666.4902305603
    episode: 3
    episode time:  281 current_value: 189959.02869987488  init_value:  6063.861846923828  value if not act: 21384.622192382812
    reward: 168574.40650749207
    episode: 4
    episode time:  281 current_value: 210317.75874710083  init_value:  6063.861846923828  value if not act: 21384.622192382812
    reward: 188933.13655471802
    buy and hold benchmark return :  0.0869038912887421
    prediction action result :              date
    pre_action      
    0            234
    1              3
    
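    • Return: 49.1%
    • Annualized return: 51.06%
    • Benchmark return (000300.SHA): -11.28%
    • Alpha: 0.54
    • Beta: 0.73
    • Sharpe ratio: 1.22
    • Win rate: 1.0
    • Profit/loss ratio: 0.0
    • Return volatility: 36.81%
    • Information ratio: 0.11
    • Maximum drawdown: 19.57%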
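A note on reading the per-episode log lines above: in every episode the printed reward matches current_value minus "value if not act" to within floating-point rounding, so the reward appears to measure how much the agent's final value exceeds that of simply holding. The quick check below uses only numbers copied from the log; the variable names are illustrative.

```python
# (current_value, value_if_not_act, reward) copied from the five episode log lines.
episodes = [
    (171476.50234413147, 21384.622192382812, 150091.88015174866),
    (189621.60607528687, 21384.622192382812, 168236.98388290405),
    (213051.11242294312, 21384.622192382812, 191666.4902305603),
    (189959.02869987488, 21384.622192382812, 168574.40650749207),
    (210317.75874710083, 21384.622192382812, 188933.13655471802),
]

# Collect any episode where reward differs from current_value - value_if_not_act.
mismatches = [
    index
    for index, (current_value, value_if_not_act, reward) in enumerate(episodes)
    if abs((current_value - value_if_not_act) - reward) > 1e-6
]

print("episodes checked:", len(episodes))
print("mismatching episodes:", mismatches)
```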

(tkyz) #4

Can this only be used on a single stock? I ran it on the whole market's stocks and got an error.


(focus666) #5

Learned a lot, thanks for sharing.


(小Q) #6

As the title says, this is timing prediction for individual stocks. We do plan to release a whole-market version later.


(Quant_Squall) #7

Great work, very impressive!


(outside) #8

If data at i+1 is used to predict the state at i, doesn't that introduce look-ahead bias (a "future function")?


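(xgl891) #9

Here state[i] does not mean the state on day i. Each day's state is a list-like object, and state[i] is the i-th value within a single state record. Every state record is computed only from data looking backward from the current day.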