How likely is a stock to keep rising after several consecutive up days?

Data visualization
Stock data analysis

(think) #1

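Doing stock data analysis and visualization on BigQuant is really convenient. You only need to write two lines of code; everything else is assembling building blocks.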

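When trading, we often wonder: if a stock has already risen for several days in a row (say 3), is it likely to keep rising? (Admittedly this is retail-investor thinking, since it looks at only one dimension.) Besides the share of streak stocks that rise again, we also need the market-wide share of rising stocks as a baseline for comparison (otherwise the analysis would be amateurish).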

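  • Build the following model
  • Feature inputs
    • cond = (close_0 / close_1 > 1.01) & (close_1 / close_2 > 1.01) & (close_2 / close_3 > 1.01): True if the stock rose by more than 1% on each of the last three days, otherwise False
    • label = where(shift(return_0, -1) > 1.01, 1, 0): 1 means the stock rises by more than 1% the next day; 0 means it does not (it falls, stays flat, or gains 1% or less)
  • Compute with a custom module, mainly using pandas groupby
    g1 = df['label'].groupby(pd.Grouper(freq='M')).apply(lambda x: x.sum() / len(x))
    g2 = df['label'][df['cond']].groupby(pd.Grouper(freq='M')).apply(lambda x: x.sum() / len(x))
  • Finally, use the plot DataFrame module
  • Result: at first glance, a three-day streak does not imply a further rise; the ratio instead tracks how strong or weak the overall market is
  • Improvements:
    • Look at a longer time period
    • Consider different date parameters, and so on
    • Consider bringing in other dimensions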
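The feature definitions and the groupby step above can also be tried outside the platform with plain pandas. The following is only a sketch under stated assumptions, not the BigQuant implementation: the column names `date`, `instrument` and `close`, the helper names `add_cond_and_label` and `monthly_up_ratio`, and the toy prices are all hypothetical. It also assumes `return_0` is the ratio close_0 / close_1, which the `> 1.01` comparison suggests. Because the label is 0 or 1, the group mean equals `x.sum() / len(x)`.

```python
import pandas as pd


def add_cond_and_label(prices: pd.DataFrame, threshold: float = 1.01) -> pd.DataFrame:
    """Add `cond` (three-day streak) and `label` (next-day rise) columns.

    Assumes a long-format table with columns date, instrument, close.
    """
    out = prices.sort_values(["instrument", "date"]).reset_index(drop=True)
    stock = out["instrument"]
    # Daily ratio close_0 / close_1, shifted within each stock only.
    ratio = out["close"] / out.groupby("instrument")["close"].shift(1)
    ratio_1 = ratio.groupby(stock).shift(1)      # close_1 / close_2
    ratio_2 = ratio.groupby(stock).shift(2)      # close_2 / close_3
    next_ratio = ratio.groupby(stock).shift(-1)  # tomorrow's ratio
    out["cond"] = (ratio > threshold) & (ratio_1 > threshold) & (ratio_2 > threshold)
    out["label"] = (next_ratio > threshold).astype(int)
    # Keep only rows with a full three-day lookback and a known next day.
    return out[ratio_2.notna() & next_ratio.notna()]


def monthly_up_ratio(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly share of label == 1, market-wide and for streak rows only."""
    df = df.assign(month=df["date"].dt.to_period("M"))
    market = df.groupby("month")["label"].mean()
    streak = df[df["cond"]].groupby("month")["label"].mean()
    return pd.DataFrame({"market_up_ratio": market, "up_ratio_after_streak": streak})


# Toy data: stock A gains 2% a day for four days, then dips; stock B is flat.
dates = pd.date_range("2016-01-04", periods=6, freq="B")
demo = pd.concat(
    [
        pd.DataFrame({"date": dates, "instrument": "A",
                      "close": [100, 102, 104.04, 106.1208, 108.243216, 108.0]}),
        pd.DataFrame({"date": dates, "instrument": "B", "close": [50.0] * 6}),
    ],
    ignore_index=True,
)
features = add_cond_and_label(demo)
result = monthly_up_ratio(features)
print(result)
```

Shifting within each instrument keeps one stock's prices from leaking into another's streak, which a plain `shift` on a stacked table would not guarantee.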
    In [52]:
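    # This code was auto-generated by the visual strategy environment on March 13, 2019 at 17:16
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and modify it there.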
    
    
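    # Python entry-point function: input_1/2/3 correspond to the three input ports, data_1/2/3 to the three output ports
    # Note: pd, DataSource, Outputs and M are not imported here; the code assumes the BigQuant environment provides them
    def m2_run_bigquant_run(input_1):
        # Sample code follows. Write your own code here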
        df = input_1.read()
        df.set_index('date', inplace=True)
        g1 = df['label'].groupby(pd.Grouper(freq='M')).apply(lambda x: x.sum() / len(x))
        g2 = df['label'][df['cond']].groupby(pd.Grouper(freq='M')).apply(lambda x: x.sum() / len(x))
        ds = DataSource.write_df(pd.DataFrame({'Market-wide up ratio': g1, 'Up ratio after three consecutive up days': g2}))
        return Outputs(data_1=ds)
    
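    # Post-processing function, optional. Its input is the main function's output; you can process the data here or return outputs in a friendlier format. This function's output is not cached.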
    def m2_post_run_bigquant_run(outputs):
        return outputs
    
    
    m1 = M.instruments.v2(
        start_date='2016-01-01',
        end_date='2019-03-01',
        market='CN_STOCK_A',
        instrument_list='',
        max_count=0
    )
    
    m3 = M.input_features.v1(
        features="""# Lines starting with # are comments
    # Multiple features, one per line; both basic and derived features are allowed
    cond = (close_0 / close_1 > 1.01) & (close_1 / close_2 > 1.01) & (close_2 / close_3 > 1.01)
    label = where(shift(return_0, -1) > 1.01, 1, 0)
    """
    )
    
    m15 = M.general_feature_extractor.v7(
        instruments=m1.data,
        features=m3.data,
        start_date='',
        end_date='',
        before_start_days=0
    )
    
    m16 = M.derived_feature_extractor.v3(
        input_data=m15.data,
        features=m3.data,
        date_col='date',
        instrument_col='instrument',
        drop_na=True,
        remove_extra_columns=False
    )
    
    m2 = M.cached.v3(
        input_1=m16.data,
        run=m2_run_bigquant_run,
        post_run=m2_post_run_bigquant_run,
        input_ports='',
        params='{}',
        output_ports=''
    )
    
    m4 = M.plot_dataframe.v1(
        input_data=m2.data_1,
        title='',
        chart_type='column',
        x='',
        y='',
        options={
            'chart': {
                'height': 400
            }
        },
        candlestick=False,
        pane_1='',
        pane_2='',
        pane_3='',
        pane_4=''
    )
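One of the improvements listed in the post is to try different date parameters. If that is read as varying the streak length, the `cond` expression passed to `M.input_features.v1` could be generated instead of typed by hand. The helper below is a hypothetical sketch: `streak_cond_expr` is not a BigQuant function, and it assumes that `close_N` features exist on the platform for every lag it references.

```python
def streak_cond_expr(n_days: int = 3, threshold: float = 1.01) -> str:
    """Build a `cond = ...` feature expression for an n-day rising streak.

    Hypothetical helper: it only formats a string in the same style as the
    post's three-day condition; it does not call any platform API.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    # One term per day: close_i / close_{i+1} > threshold
    terms = [f"(close_{i} / close_{i + 1} > {threshold})" for i in range(n_days)]
    return "cond = " + " & ".join(terms)


# Feature text for a five-day streak, keeping the post's label definition.
features_text = "\n".join([
    streak_cond_expr(n_days=5),
    "label = where(shift(return_0, -1) > 1.01, 1, 0)",
])
print(features_text)
```

With `n_days=3` and the default threshold, the helper reproduces the `cond` line used in the post, so the generated string can stand in for the hand-written one in the `features` argument.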
    

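Are there any examples of plotting historical data as line charts? Thanks.
(wujunjun) #2

When will I be as good at pandas as you are?


(think) #3

I learned it by following the BigQuant platform :)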