M.user_feature_extractor
Definition
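M.user_feature_extractor.v1(instruments, start_date, end_date, history_data_fields, look_back_days, features_by_instrument=None, features_by_date=None)
Factor/feature computation for user-defined factors. At least one of features_by_instrument and features_by_date must be provided. During computation, features_by_instrument is evaluated first and features_by_date second, so the per-date functions can use features produced by the per-instrument functions.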
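Parameters: - instruments (list of str) – list of stock codes; see Stock Codes
- start_date (str) – start date, e.g. '2017-01-01'
- end_date (str) – end date, e.g. '2017-02-01'
- history_data_fields (list of str) – historical data fields the features depend on; see Historical Data
- look_back_days (int) – number of extra days of data to load before start_date, so that the first values of the computed factors are not NaN (for example, a 5-day rolling mean needs earlier rows to fill its window)
- features_by_instrument (dict) – keys are feature names, values are the functions that compute them; computed with the data grouped by instrument
- features_by_date (dict) – keys are feature names, values are the functions that compute them; computed with the data grouped by date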
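Returns: feature data
- .data: DataSource, the feature data, containing the base data fields together with the derived features computed from them
Return type: Outputs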
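The two-stage order described above (per-instrument features first, then per-date features) and the role of look_back_days can be sketched in plain pandas. This is a hypothetical re-implementation, not BigQuant's code: the function name compute_user_features, the long-format input with 'instrument' and 'date' columns, and trimming the look-back rows at the end are all assumptions made for illustration.

```python
import pandas as pd


def compute_user_features(df, start_date, features_by_instrument=None, features_by_date=None):
    """Hypothetical sketch of the two-stage computation (not BigQuant's actual code).

    df is long-format (one row per instrument and date) and already includes
    the look-back rows that precede start_date.
    """
    if not features_by_instrument and not features_by_date:
        raise ValueError("provide features_by_instrument and/or features_by_date")
    # Sort so that rolling()/shift() see each instrument's rows in date order
    df = df.sort_values(["instrument", "date"]).reset_index(drop=True)
    # Stage 1: time-series features, each instrument's history processed separately
    for name, func in (features_by_instrument or {}).items():
        df[name] = pd.concat([func(g) for _, g in df.groupby("instrument")])
    # Stage 2: cross-sectional features, all instruments on one date together;
    # these may read the columns created in stage 1
    for name, func in (features_by_date or {}).items():
        df[name] = pd.concat([func(g) for _, g in df.groupby("date")])
    # Drop the look-back rows that were only loaded to warm up rolling windows
    return df[df["date"] >= pd.Timestamp(start_date)].reset_index(drop=True)


# Usage with toy data: 10 business days per instrument, the first 4 are look-back rows
dates = pd.bdate_range("2017-01-02", periods=10)
raw = pd.DataFrame({
    "instrument": ["A"] * 10 + ["B"] * 10,
    "date": list(dates) * 2,
    "close": [float(i) for i in range(1, 11)] + [float(i) for i in range(20, 30)],
    "open": [i + 0.5 for i in range(1, 11)] + [i - 0.5 for i in range(20, 30)],
})
result = compute_user_features(
    raw, start_date="2017-01-06",
    features_by_instrument={
        "u_ma5": lambda x: x.close.rolling(5).mean(),
        "u_opening_gap": lambda x: x.open / x.shift(1).close,
    },
    features_by_date={
        "u_rank_opening_gap": lambda x: x.u_opening_gap.rank(pct=True),
    },
)
print(result)
```

Because the four rows before 2017-01-06 are loaded and then dropped, u_ma5 already has a full 5-day window on the first output date instead of starting with NaN.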
Example code
View the latest version and definition
In [1]:
M.user_feature_extractor.m_latest_version
Out[1]:
M.user_feature_extractor.v1(instruments, start_date, end_date, history_data_fields, look_back_days, features_by_instrument=None, features_by_date=None)
Custom features
In [42]:
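# To avoid clashing with names in the factor library, prefix custom factor names with 'u_'
m1 = M.user_feature_extractor.v1(
    instruments=['000001.SZA', '600519.SHA'], start_date='2017-01-01', end_date='2017-02-01',
    history_data_fields=['close', 'open'], look_back_days=30,
    # Grouped by instrument: x holds the rows of a single stock
    features_by_instrument={
        'u_ma5': lambda x: x.close.rolling(5).mean(),          # 5-day moving average of close
        'u_opening_gap': lambda x: x.open / x.shift(1).close,  # today's open / previous close
    },
    # Grouped by date: x holds the rows of all stocks on a single day
    features_by_date={
        'u_rank_opening_gap': lambda x: x.u_opening_gap.rank(pct=True)  # cross-sectional percentile rank
    }
)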
[2017-06-28 15:28:48.155197] INFO: bigquant: user_feature_extractor.v1 start ..
[2017-06-28 15:28:48.156951] INFO: bigquant: hit cache
[2017-06-28 15:28:48.157713] INFO: bigquant: user_feature_extractor.v1 end [0.00253s].
In [43]:
m1.data.read_df().head()
Out[43]:
| | close | amount | open | instrument | date | u_opening_gap | u_ma5 | u_rank_opening_gap |
|---|---|---|---|---|---|---|---|---|
| 42 | 959.585571 | 4.205952e+08 | 954.347656 | 000001.SZA | 2017-01-03 | 1.001099 | 952.881079 | 1.0 |
| 43 | 2343.583984 | 6.950053e+08 | 2341.622803 | 600519.SHA | 2017-01-03 | 1.000389 | 2305.897363 | 0.5 |
| 44 | 959.585571 | 4.115035e+08 | 958.538025 | 000001.SZA | 2017-01-04 | 0.998908 | 954.557202 | 0.5 |
| 45 | 2465.120361 | 2.245052e+09 | 2344.004395 | 600519.SHA | 2017-01-04 | 1.000179 | 2341.972949 | 1.0 |
| 46 | 960.633179 | 3.157697e+08 | 960.633179 | 000001.SZA | 2017-01-05 | 1.001092 | 956.861877 | 1.0 |
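The derived columns in the table can be checked by hand with plain pandas; this is an illustrative check that reuses numbers from the table, not a call into BigQuant. With only two instruments, rank(pct=True) on each date can only return 0.5 (smaller gap) or 1.0 (larger gap); with N instruments and no ties the values are k/N for k = 1..N. u_opening_gap is today's open divided by the previous row's close for the same stock.

```python
import pandas as pd

# Cross-section for 2017-01-03; u_opening_gap values copied from the table above
day = pd.DataFrame({
    "instrument": ["000001.SZA", "600519.SHA"],
    "u_opening_gap": [1.001099, 1.000389],
})
# Same expression as the features_by_date lambda: percentile rank within one date
day["u_rank_opening_gap"] = day.u_opening_gap.rank(pct=True)
print(day)

# u_opening_gap for 000001.SZA on 2017-01-04:
# open on 2017-01-04 divided by close on 2017-01-03
gap = 958.538025 / 959.585571
print(round(gap, 6))  # 0.998908, matching the table
```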
In [48]:
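# To feed these factors into StockRanker, their values must be non-negative integers.
m2 = M.transform.v2(
    data=m1.data,
    # StockRanker's default transforms: they mainly map features into the non-negative
    # integer range, because StockRanker requires non-negative integer input features
    transforms=T.get_stock_ranker_default_transforms() + [
        ('u_opening_gap', lambda x: x * 1000),       # scale ratios near 1.0 so the int cast keeps 3 decimals
        ('u_rank_opening_gap', lambda x: x * 1000),  # scale percentile ranks in (0, 1] to (0, 1000]
        ('.*', None),
    ],
    drop_null=True,  # missing data: drop any row that contains a null column
    astype='int32',  # convert the data type
    except_columns=['date', 'instrument'],  # columns to skip (not processed)
    # finally, clip the data so that values fall into this range
    clip_lower=0, clip_upper=200000000)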
[2017-06-28 15:32:25.157474] INFO: bigquant: transform.v2 start ..
[2017-06-28 15:32:25.207936] INFO: transform: transformed /y_2017, 36/36
[2017-06-28 15:32:25.211408] INFO: transform: transformed rows: 36/36
[2017-06-28 15:32:25.213367] INFO: bigquant: transform.v2 end [0.055912s].
In [50]:
m2.data.read_df().head()
Out[50]:
| | close | amount | open | instrument | date | u_opening_gap | u_ma5 | u_rank_opening_gap |
|---|---|---|---|---|---|---|---|---|
| 42 | 959 | 200000000 | 954 | 000001.SZA | 2017-01-03 | 1001 | 952 | 1000 |
| 43 | 2343 | 200000000 | 2341 | 600519.SHA | 2017-01-03 | 1000 | 2305 | 500 |
| 44 | 959 | 200000000 | 958 | 000001.SZA | 2017-01-04 | 998 | 954 | 500 |
| 45 | 2465 | 200000000 | 2344 | 600519.SHA | 2017-01-04 | 1000 | 2341 | 1000 |
| 46 | 960 | 200000000 | 960 | 000001.SZA | 2017-01-05 | 1001 | 956 | 1000 |
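The effect of the transform.v2 call on these columns can be approximated in plain pandas. This is a hypothetical sketch, not the real implementation: the function name stock_ranker_like_transform is made up, it omits T.get_stock_ranker_default_transforms() (whose contents are not documented here), it assumes ('.*', None) leaves the remaining columns unscaled, and it assumes clipping happens before the int32 cast. That order is inferred from the output: the 600519.SHA amount of 2.245052e+09 exceeds the int32 maximum of 2,147,483,647 yet appears as 200000000.

```python
import pandas as pd


def stock_ranker_like_transform(df, scale_cols, except_columns=("date", "instrument"),
                                lower=0, upper=200_000_000):
    """Hypothetical approximation of the transform.v2 call above."""
    out = df.dropna().copy()  # drop_null=True: drop rows with any null column
    for col in scale_cols:
        out[col] = out[col] * 1000  # e.g. ('u_opening_gap', lambda x: x*1000)
    cols = [c for c in out.columns if c not in except_columns]
    for col in cols:
        # Clip first (clip_lower/clip_upper), then cast (astype='int32');
        # casting float to int truncates toward zero
        out[col] = out[col].clip(lower=lower, upper=upper).astype("int32")
    return out


# Rows 42 and 45 of the m1 output above
rows = pd.DataFrame({
    "close": [959.585571, 2465.120361],
    "amount": [4.205952e+08, 2.245052e+09],
    "open": [954.347656, 2344.004395],
    "instrument": ["000001.SZA", "600519.SHA"],
    "date": ["2017-01-03", "2017-01-04"],
    "u_opening_gap": [1.001099, 1.000179],
    "u_ma5": [952.881079, 2341.972949],
    "u_rank_opening_gap": [1.0, 1.0],
})
out = stock_ranker_like_transform(rows, ["u_opening_gap", "u_rank_opening_gap"])
print(out)
```

The truncating cast explains why close 959.585571 becomes 959 and u_ma5 952.881079 becomes 952, while both amounts are capped at clip_upper.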