Taking the financial statement data as the reference, both the history data and the factor data have missing records


(chaoskey) #1

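To spot the problem quickly, take total operating revenue (fs_gross_revenues) as an example and look at the annual-report data for 000004.SZA.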

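Viewed through the financial statement data (D.financial_statements), the data is complete, with one annual report for every fiscal year from 2004 to 2016: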

In [1]:
# Instrument list for the date range
instruments = D.instruments(start_date='2005-01-01', end_date='2018-02-02')

# Query the financial statement data
df = D.financial_statements(instruments, start_date='2005-01-01', end_date='2018-02-02',
                            fields=['instrument', 'fs_publish_date', 'fs_quarter', 'fs_quarter_year',
                                    'fs_quarter_index', 'fs_gross_revenues'])

# Annual reports (fs_quarter_index == 4) of 000004.SZA
df[(df.instrument == "000004.SZA") & (df.fs_quarter_index == 4)]
Out[1]:
       instrument  fs_publish_date  fs_quarter  fs_quarter_year  fs_quarter_index  fs_gross_revenues
    2  000004.SZA       2005-04-19    20041231             2004                 4        1.06748e+08
 5430  000004.SZA       2006-04-20    20051231             2005                 4        6.13767e+07
10851  000004.SZA       2007-04-27    20061231             2006                 4        4.76686e+07
16643  000004.SZA       2008-04-24    20071231             2007                 4        4.04495e+07
22902  000004.SZA       2009-04-20    20081231             2008                 4        4.33148e+07
29333  000004.SZA       2010-04-17    20091231             2009                 4        6.00806e+07
36634  000004.SZA       2011-04-23    20101231             2010                 4        1.31331e+08
45282  000004.SZA       2012-04-21    20111231             2011                 4        7.45037e+07
54855  000004.SZA       2013-04-20    20121231             2012                 4        9.73633e+07
64731  000004.SZA       2014-04-22    20131231             2013                 4        7.27846e+07
74783  000004.SZA       2015-04-30    20141231             2014                 4        8.06088e+07
85592  000004.SZA       2016-04-30    20151231             2015                 4        1.20454e+08
97033  000004.SZA       2017-04-11    20161231             2016                 4         2.8767e+08

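Viewed through the factor data (D.features), the data is incomplete. The annual reports for 2007 and for 2010 to 2016 are missing: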

In [2]:
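# Query the corresponding fields from the factor data
df = D.features(instruments, start_date='2005-01-01', end_date='2018-02-02',
                fields=['fs_publish_date_0', 'fs_quarter_year_0', 'fs_quarter_index_0', 'fs_gross_revenues_0'])

# Keep the first row for each (instrument, fiscal year, quarter) combination
df = df.drop_duplicates(["instrument", "fs_quarter_year_0", "fs_quarter_index_0"], keep='first')

# Annual reports (fs_quarter_index_0 == 4) of 000004.SZA
df[(df.instrument == "000004.SZA") & (df.fs_quarter_index_0 == 4)]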
Out[2]:
               date  instrument  fs_quarter_index_0  fs_quarter_year_0  fs_publish_date_0  fs_gross_revenues_0
    507  2005-04-19  000004.SZA                 4.0             2004.0                0.0          106747752.0
 314881  2006-04-21  000004.SZA                 4.0             2005.0                1.0           61376724.0
 602924  2007-04-27  000004.SZA                 4.0             2006.0                0.0           47668640.0
1286639  2009-04-21  000004.SZA                 4.0             2008.0                1.0           43314824.0
1661876  2010-04-19  000004.SZA                 4.0             2009.0                2.0           60080600.0

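Viewed through the history data (D.history_data), the data is also incomplete. The annual reports for 2007 and for 2010 to 2015 are missing: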

In [3]:
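# Query the same fields from the history data
df = D.history_data(instruments, start_date='2005-01-01', end_date='2018-02-02',
                    fields=['fs_publish_date', 'fs_quarter', 'fs_quarter_year', 'fs_quarter_index', 'fs_gross_revenues'])

# Keep the first row for each (instrument, fiscal year, quarter) combination
df = df.drop_duplicates(["instrument", "fs_quarter_year", "fs_quarter_index"], keep='first')

# Annual reports (fs_quarter_index == 4) of 000004.SZA
df[(df.instrument == "000004.SZA") & (df.fs_quarter_index == 4)]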
Out[3]:
         fs_quarter        date  fs_quarter_index  instrument  fs_gross_revenues  fs_quarter_year  fs_publish_date
  94608    20041231  2005-04-19                 4  000004.SZA        106747752.0             2004       2005-04-19
 433990    20051231  2006-04-20                 4  000004.SZA         61376724.0             2005       2006-04-20
 787309    20061231  2007-04-27                 4  000004.SZA         47668640.0             2006       2007-04-27
1549094    20081231  2009-04-20                 4  000004.SZA         43314824.0             2008       2009-04-20
1963596    20091231  2010-04-19                 4  000004.SZA         60080600.0             2009       2010-04-17
6257816    20161231  2017-04-11                 4  000004.SZA        287670016.0             2016       2017-04-11
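
The gaps can also be listed programmatically instead of by eye. The following is only a minimal sketch using plain pandas: the helper names `missing_reports` and `annual_reports` are hypothetical, and the toy frames contain nothing but the fiscal years visible in Out[1], Out[2] and Out[3]. With real query results, the D.features columns would presumably need their `_0` suffix renamed and their float values cast to int before comparing.

```python
import pandas as pd

KEYS = ["instrument", "fs_quarter_year", "fs_quarter_index"]

def missing_reports(reference, other, keys=KEYS):
    """Rows of `reference` (by key columns) that have no match in `other`."""
    ref = reference[keys].drop_duplicates()
    oth = other[keys].drop_duplicates()
    # indicator=True adds a "_merge" column telling where each row was found
    merged = ref.merge(oth, on=keys, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)

def annual_reports(years):
    """Toy frame: one annual report (quarter index 4) of 000004.SZA per fiscal year."""
    return pd.DataFrame({
        "instrument": "000004.SZA",
        "fs_quarter_year": list(years),
        "fs_quarter_index": 4,
    })

# Fiscal years copied from the three outputs above
statements = annual_reports(range(2004, 2017))                    # Out[1]
features = annual_reports([2004, 2005, 2006, 2008, 2009])         # Out[2]
history = annual_reports([2004, 2005, 2006, 2008, 2009, 2016])    # Out[3]

missing_in_features = missing_reports(statements, features)
missing_in_history = missing_reports(statements, history)

print(missing_in_features["fs_quarter_year"].tolist())
# [2007, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
print(missing_in_history["fs_quarter_year"].tolist())
# [2007, 2010, 2011, 2012, 2013, 2014, 2015]
```

Using a left merge with `indicator=True` keeps the check valid across many instruments at once, since the comparison is made on the full (instrument, year, quarter) key rather than on the year alone.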

(小Q) #2

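To get complete financial data, we recommend using the D.financial_statements interface.
A stock may be suspended from trading for a period of time, and we do not fill in the data that is missing during the suspension. We have recently been refactoring our data in the hope of providing a simpler, faster and more efficient data API, and we will take this request of yours into consideration.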
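
One possible workaround on the user side, until the data is filled, is to start from the complete financial statement data and attach the most recently published report to each daily row yourself. The sketch below uses plain pandas (`pd.merge_asof`) and is not a platform API. The three statement rows are copied from Out[1] in the first post, while the daily dates are hypothetical placeholders for whatever daily data you are working with.

```python
import pandas as pd

# Three annual reports of 000004.SZA, copied from Out[1]
statements = pd.DataFrame({
    "instrument": ["000004.SZA"] * 3,
    "fs_publish_date": pd.to_datetime(["2007-04-27", "2008-04-24", "2009-04-20"]),
    "fs_quarter_year": [2006, 2007, 2008],
    "fs_gross_revenues": [4.76686e+07, 4.04495e+07, 4.33148e+07],
})

# Hypothetical daily rows (assumption: any daily table with instrument and date)
daily = pd.DataFrame({
    "instrument": ["000004.SZA"] * 4,
    "date": pd.to_datetime(["2007-04-26", "2007-04-27", "2008-04-25", "2009-04-21"]),
})

# For every daily row, take the latest report published on or before that date.
# merge_asof requires both frames to be sorted by their time key.
aligned = pd.merge_asof(
    daily.sort_values("date"),
    statements.sort_values("fs_publish_date"),
    left_on="date",
    right_on="fs_publish_date",
    by="instrument",
    direction="backward",
)

print(aligned[["date", "fs_publish_date", "fs_quarter_year", "fs_gross_revenues"]])
```

In this toy data the row for 2007-04-26 stays empty because no report had been published yet, and the row for 2008-04-25 picks up the 2007 annual report, which is the one missing from both the D.features and D.history_data outputs in the first post.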


(chaoskey) #3

Understood, thanks.