What is a good way to store historical research data?

datasource

(pivotll) #1

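My historical data set is large, and I have to fetch all of it on every run. If I switch to appending new data incrementally, what is a good way to store the earlier data so that it is fast to read and compact on disk? Could you give an example?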


Paper trading cannot read *.h5 files. How can this be solved?
(iQuant) #2

The following shows how to store and read data with DataSource. Take a look at it first, and feel free to @ us if you still have questions.


Save the data as a DataSource

In [1]:
# D, DataSource, Outputs and M come from the BigQuant notebook environment.
def foo1():
    # Date range, instruments and feature fields to fetch
    start_date = '2018-02-01'
    end_date = '2018-02-28'
    ins = ['000002.SZA','000001.SZA','000333.SZA']
    fields = ['close_0','open_0','high_0','low_0','daily_return_0','market_cap_0','fs_roe_0']
    # Fetch the features as a DataFrame, then write it to a DataSource
    df = D.features(ins, start_date, end_date, fields)
    ds = DataSource.write_df(df)
    return Outputs(data=ds)

# Use M.cached so the DataSource can be reused
m1 = M.cached.v2(run=foo1)
[2018-03-08 14:22:18.013833] INFO: bigquant: cached.v2 started running..
[2018-03-08 14:22:18.070114] INFO: bigquant: cached.v2 finished running [0.056321s].

The id of the DataSource

In [2]:
ds_id = m1.data.id

Fetch the data by id

In [4]:
data = DataSource(id=ds_id).read_df()

Inspect the data

In [5]:
data.head()
Out[5]:
        open_0  market_cap_0        date  fs_roe_0       high_0      close_0  daily_return_0  instrument        low_0
0  1483.007690  2.409009e+11  2018-02-01    9.1144  1520.215698  1491.512329        0.998576  000001.SZA  1471.313721
1  5148.751953  4.139682e+11  2018-02-01    9.6520  5222.894043  5148.751953        0.998402  000002.SZA  5066.372070
2   257.851990  3.885896e+11  2018-02-01   22.6654   258.281738   255.015610        0.991976  000333.SZA   249.858566
3  1478.755249  2.412443e+11  2018-02-02    9.1144  1498.953979  1493.638550        1.001426  000001.SZA  1448.988770
4  5011.452148  4.126435e+11  2018-02-02    9.6520  5203.671875  5132.275879        0.996800  000002.SZA  4915.341797
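
On the incremental part of the original question: the thread does not show how to combine stored history with newly fetched rows, so here is a minimal sketch of one possible approach, in plain pandas. The helper name `merge_incremental` and the toy frames are hypothetical and only for illustration. The sketch assumes each row is uniquely identified by `date` and `instrument`, as in the output above.

```python
import pandas as pd


def merge_incremental(old_df, new_df, keys=("date", "instrument")):
    """Append new_df to old_df, keeping one row per key combination.

    Hypothetical helper, not part of any platform API.
    """
    combined = pd.concat([old_df, new_df], ignore_index=True)
    # If a key appears in both frames, keep the row from new_df.
    combined = combined.drop_duplicates(subset=list(keys), keep="last")
    return combined.sort_values(list(keys)).reset_index(drop=True)


# Toy frames standing in for the stored history and a freshly fetched slice.
# The values are made up for illustration.
old_df = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-01", "2018-02-02"]),
    "instrument": ["000001.SZA", "000001.SZA"],
    "close_0": [1.0, 2.0],
})
new_df = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-02", "2018-02-05"]),
    "instrument": ["000001.SZA", "000001.SZA"],
    "close_0": [2.5, 3.0],
})

merged = merge_incremental(old_df, new_df)
print(merged)
```

On the platform, `old_df` would presumably come from `DataSource(id=ds_id).read_df()`, `new_df` from a `D.features` call covering only the dates after the last stored one, and the merged frame could be written back with `DataSource.write_df(merged)` as in In [1]. That combination is an untested assumption, not something confirmed in this thread.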