How to convert data formats after reading external data


(iQuant) #1

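BigQuant supports importing and reading external data, and imported data can then be converted to the required format using modules. Taking the conversion of the date column to a datetime type as an example, there are two ways to do it: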

I. Use the Read CSV File module

  1. Find the "Read CSV File" module in the module list
    [image]

  2. Change the corresponding parameters in the module
    [image]
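
The module's parameter names are only visible in the screenshots, so they are not reproduced here. As a rough plain-pandas analogue (an assumption, not the BigQuant module itself), the same result can be obtained by asking `pandas.read_csv` to parse the `date` column while reading. The CSV text below is made-up sample data standing in for the imported file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the imported CSV file
csv_text = (
    "date,instrument,close\n"
    "2020-06-23,300096.SZA,39.09435\n"
    "2020-06-23,002734.SZA,47.037056\n"
)

# parse_dates converts the listed columns to datetime while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)
```

Converting at read time avoids a separate conversion step, which is the idea behind this first approach.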

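II. Use the Read Data (File) module

  1. Find the "Read Data (File)" module (`datahub_load_file`) and the custom Python module (`cached`) in the module list:
    [image]
    [image]
  2. Connect the two modules and define the conversion in the custom module
    [image]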
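# Python entry function: input_1/2/3 are the three input ports, data_1/2/3 the three output ports
# (pd, DataSource and Outputs are assumed to be provided by the BigQuant environment)
def bigquant_run(input_1, input_2, input_3):
    # Read the upstream DataSource into a pandas DataFrame
    data_1 = input_1.read_df()
    # Parse the date column (read from the CSV as strings) into datetime values
    data_1.date = pd.to_datetime(data_1.date)
    # Wrap the DataFrame back into a DataSource for downstream modules
    data_1 = DataSource.write_df(data_1)
    return Outputs(data_1=data_1, data_2=None, data_3=None)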
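
To try the conversion step outside the platform, here is a minimal sketch that keeps only the pandas part of the function above. `DataSource` and `Outputs` are platform objects and are left out. The helper name `convert_date_column` and the sample rows are hypothetical.

```python
import pandas as pd


def convert_date_column(df, column="date"):
    """Return a copy of df with `column` parsed as datetime."""
    out = df.copy()
    # With the default errors="raise", unparseable values raise an error
    out[column] = pd.to_datetime(out[column])
    return out


# Made-up rows; the date column starts out as plain strings
raw = pd.DataFrame({"date": ["2020-06-23", "2020-06-24"], "close": [39.09, 47.04]})
converted = convert_date_column(raw)
print(raw["date"].dtype, "->", converted["date"].dtype)
```

Working on a copy leaves the input frame unchanged, which makes the before/after dtypes easy to compare.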
  3. Example
In [9]:
# This line only creates the file data_to_read.csv to serve as the "external data". You can ignore it and use only the visual modules.
DataSource('bar1d_CN_STOCK_A').read(start_date='2020-06-23', end_date='2020-06-23').to_csv('data_to_read.csv', index=False)

    In [10]:
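    # This code was generated automatically by the visual strategy environment on 2020-06-24 13:59
    # This code cell can only be edited in visual mode. You can also copy the code into a new code cell or strategy and modify it there.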
    
    
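    # Python function that builds the file path dynamically
    def m1_file_path_bigquant_run():
        # Sample code below. Write your code here
        # If the path does not need to be generated dynamically, just return it directly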
        # import datetime
        # import os
        # base_path = "/var/app/data/datahub"
        # data_path = os.path.join(base_path, "bigcrawler")
        # date = datetime.datetime.now().strftime("%Y%m%d")
        # file_path = os.path.join(data_path, "{}.csv".format(date))
        # return file_path
    
        return "data_to_read.csv"
    
    # Python entry function: input_1/2/3 are the three input ports, data_1/2/3 the three output ports
    def m2_run_bigquant_run(input_1, input_2, input_3):
        # Sample code below. Write your code here
        data_1 = input_1.read_df()
        data_1.date = pd.to_datetime(data_1.date)
        data_1 = DataSource.write_df(data_1)
        return Outputs(data_1=data_1, data_2=None, data_3=None)
    
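    # Post-processing function (optional). Its input is the main function's output; process the data here or return the outputs in a friendlier format. The output of this function is not cached.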
    def m2_post_run_bigquant_run(outputs):
        return outputs
    
    
    m1 = M.datahub_load_file.v1(
        file_path=m1_file_path_bigquant_run,
        file_type='csv',
        csv_delimiter=',',
        h5_data_key='data'
    )
    
    m2 = M.cached.v3(
        input_1=m1.data,
        run=m2_run_bigquant_run,
        post_run=m2_post_run_bigquant_run,
        input_ports='',
        params='{}',
        output_ports=''
    )
    

    Read Data (File): data statistics (first 3872 rows)

                adjust_factor  amount   close    date    deal_number  high     instrument  low      open     turn     volume
    count(Nan)  0              0        0        0       0            0        0           0        0        0        0
    type        float64        float64  float64  object  float64      float64  object      float64  float64  float64  float64

    Read Data (File): data preview (first 5 rows)

       adjust_factor  amount       close      date        deal_number  high       instrument  low        open       turn      volume
    0  5.123768       38536857.0   39.094350  2020-06-23  4105.0       39.709200  300096.SZA  38.940636  39.709200  1.338301  5039000.0
    1  3.165347       230194233.0  47.037056  2020-06-23  19610.0      47.321938  002734.SZA  44.821312  45.644302  4.900450  15704121.0
    2  4.605921       15872418.0   23.444138  2020-06-23  1337.0       23.582315  002533.SZA  23.398079  23.536257  0.595412  3112828.0
    3  8.016884       22200309.0   21.485249  2020-06-23  2567.0       22.126600  002379.SZA  21.405080  22.126600  0.883406  8183875.0
    4  3.636345       3080365.0    10.218129  2020-06-23  238.0        10.254493  600423.SHA  10.109039  10.109039  0.137625  1099200.0
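
The statistics above are labelled as the output of the Read Data (File) module, where `date` is still listed as `object`. The following sketch is one way to check in plain pandas that the conversion took effect, by building a similar two-row summary before and after calling `pd.to_datetime`. The `summarize` helper and the small sample frame are hypothetical and are not part of the platform output.

```python
import pandas as pd

# Made-up sample with the date column stored as strings
df = pd.DataFrame({
    "date": ["2020-06-23", "2020-06-23"],
    "instrument": ["300096.SZA", "002734.SZA"],
    "close": [39.094350, 47.037056],
})


def summarize(frame):
    # One row of missing-value counts and one row of dtype names per column
    return pd.DataFrame({
        "count(NaN)": frame.isna().sum(),
        "type": frame.dtypes.astype(str),
    }).T


before = summarize(df)
df["date"] = pd.to_datetime(df["date"])
after = summarize(df)
print(before)
print(after)
```

After the conversion, the `type` row for `date` should report a datetime dtype instead of a string/object one.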
