M.user_feature_extractor

Definition

M.user_feature_extractor.v1(self, instruments, start_date, end_date, history_data_fields, look_back_days, features_by_instrument=None, features_by_date=None)

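Factor/feature computation for user-defined factors. At least one of features_by_instrument and features_by_date must be set. At run time, features_by_instrument is computed first, and features_by_date is computed afterwards.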
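
The two-stage order described above can be sketched with plain pandas. This is an illustration only, not BigQuant's implementation: the helper apply_grouped and the toy data are hypothetical, and the real module may group, sort, or align data differently. It shows why a by-date feature can refer to a column produced by a by-instrument feature.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the example on this page.
df = pd.DataFrame({
    "instrument": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"] * 2),
    "open": [10.0, 10.5, 11.0, 20.0, 19.0, 21.0],
    "close": [10.2, 10.8, 11.1, 19.5, 20.0, 21.5],
})

features_by_instrument = {
    "u_opening_gap": lambda x: x.open / x.shift(1).close,
}
features_by_date = {
    # Uses u_opening_gap, so it must run after the by-instrument stage.
    "u_rank_opening_gap": lambda x: x.u_opening_gap.rank(pct=True),
}


def apply_grouped(frame, key, features):
    """Hypothetical helper: run each feature function per group, store it as a column."""
    frame = frame.copy()
    for name, func in features.items():
        parts = [func(group) for _, group in frame.groupby(key)]
        frame[name] = pd.concat(parts)  # aligned back on the original index
    return frame


df = df.sort_values(["instrument", "date"]).reset_index(drop=True)
step1 = apply_grouped(df, "instrument", features_by_instrument)  # stage 1
result = apply_grouped(step1, "date", features_by_date)          # stage 2
```

The first row of each instrument has no previous close, so its u_opening_gap is NaN in this sketch.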

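Parameters:
  • instruments (list of strings) – list of stock codes; see Stock Codes
  • start_date (string) – start date, e.g. '2017-01-01'
  • end_date (string) – end date, e.g. '2017-02-01'
  • history_data_fields (list of strings) – the historical data fields that the feature computation depends on; see Historical Data
  • look_back_days (integer) – number of extra days of data to load before start_date, so that the first values of the computed factors are not NaN
  • features_by_instrument (dict) – keys are feature names, values are the corresponding functions; computed per instrument group
  • features_by_date (dict) – keys are feature names, values are the corresponding functions; computed per date group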
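
The purpose of look_back_days can be illustrated with a small pandas sketch. The price series here is made up, and how the module counts days (calendar or trading days) is not stated on this page, so the example only shows the general effect: a rolling feature needs history before start_date, otherwise its first values are NaN.

```python
import pandas as pd

# Hypothetical daily closes that include some history before start_date.
dates = pd.bdate_range("2016-12-19", "2017-01-13")
close = pd.Series(range(100, 100 + len(dates)), index=dates, dtype="float64")
start_date = "2017-01-03"

# Without extra history: the first 4 values of a 5-day mean are NaN.
no_lookback = close.loc[start_date:].rolling(5).mean()

# With extra history: compute on the longer series, then trim to start_date.
with_lookback = close.rolling(5).mean().loc[start_date:]

print(no_lookback.isna().sum(), with_lookback.isna().sum())
```
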
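Returns:

Feature data

  • .data: DataSource, feature data, containing the base features as well as the base features from which derived features are built

Return type:

Outputs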

Example code

View the latest version and definition
In [1]:
M.user_feature_extractor.m_latest_version
Out[1]:
M.user_feature_extractor.v1(instruments, start_date, end_date, history_data_fields, look_back_days, features_by_instrument=None, features_by_date=None)
Custom features
In [42]:
# To avoid clashing with names in the factor library, prefix custom factor names with 'u_'
m1 = M.user_feature_extractor.v1(
    instruments=['000001.SZA', '600519.SHA'], start_date='2017-01-01', end_date='2017-02-01',
    history_data_fields=['close', 'open'], look_back_days=30,
    features_by_instrument={
        'u_ma5':lambda x:x.close.rolling(5).mean(),
        'u_opening_gap':lambda x:x.open/x.shift(1).close,
    },
    features_by_date={
        'u_rank_opening_gap':lambda x:x.u_opening_gap.rank(pct=True)
    }
)
[2017-06-28 15:28:48.155197] INFO: bigquant: user_feature_extractor.v1 start ..
[2017-06-28 15:28:48.156951] INFO: bigquant: hit cache
[2017-06-28 15:28:48.157713] INFO: bigquant: user_feature_extractor.v1 end [0.00253s].
In [43]:
m1.data.read_df().head()
Out[43]:
          close        amount         open  instrument        date  u_opening_gap        u_ma5  u_rank_opening_gap
42   959.585571  4.205952e+08   954.347656  000001.SZA  2017-01-03       1.001099   952.881079                 1.0
43  2343.583984  6.950053e+08  2341.622803  600519.SHA  2017-01-03       1.000389  2305.897363                 0.5
44   959.585571  4.115035e+08   958.538025  000001.SZA  2017-01-04       0.998908   954.557202                 0.5
45  2465.120361  2.245052e+09  2344.004395  600519.SHA  2017-01-04       1.000179  2341.972949                 1.0
46   960.633179  3.157697e+08   960.633179  000001.SZA  2017-01-05       1.001092   956.861877                 1.0
In [48]:
# To feed the factors into stockranker, their values must be non-negative integers.
m2 = M.transform.v2(
    data=m1.data,
    # stockranker's default transforms, which mainly map features onto a non-negative integer range, because stockranker requires non-negative integer input features
    transforms=T.get_stock_ranker_default_transforms()+
    [
        ('u_opening_gap',lambda x:x*1000),
        ('u_rank_opening_gap',lambda x:x*1000),
        ('.*',None)
    ],
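    drop_null=True, # missing-data handling: drop any row that has an empty column
    astype='int32', # data type conversion
    except_columns=['date', 'instrument'], # columns to skip, left unprocessed
    # clip the final data so that the input falls within the range below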
    clip_lower=0, clip_upper=200000000)
[2017-06-28 15:32:25.157474] INFO: bigquant: transform.v2 start ..
[2017-06-28 15:32:25.207936] INFO: transform: transformed /y_2017, 36/36
[2017-06-28 15:32:25.211408] INFO: transform: transformed rows: 36/36
[2017-06-28 15:32:25.213367] INFO: bigquant: transform.v2 end [0.055912s].
In [50]:
m2.data.read_df().head()
Out[50]:
    close     amount  open  instrument        date  u_opening_gap  u_ma5  u_rank_opening_gap
42    959  200000000   954  000001.SZA  2017-01-03           1001    952                1000
43   2343  200000000  2341  600519.SHA  2017-01-03           1000   2305                 500
44    959  200000000   958  000001.SZA  2017-01-04            998    954                 500
45   2465  200000000  2344  600519.SHA  2017-01-04           1000   2341                1000
46    960  200000000   960  000001.SZA  2017-01-05           1001    956                1000
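
The effect of the transform step on the values above (scaling, dropping rows with nulls, clipping, integer conversion) can be approximated with plain pandas. This is a rough sketch under assumptions, not the behavior of M.transform.v2 itself: the sample rows and the scale dictionary are hypothetical, the stockranker default transforms are not reproduced, and clipping before the int32 cast is a choice made here to avoid overflow rather than something this page confirms.

```python
import pandas as pd

# Hypothetical rows shaped like m1.data above (values rounded, one null added).
df = pd.DataFrame({
    "instrument": ["000001.SZA", "600519.SHA", "000001.SZA"],
    "date": ["2017-01-03", "2017-01-03", "2017-01-04"],
    "close": [959.59, 2343.58, None],
    "amount": [4.2e8, 6.9e8, 4.1e8],
    "u_opening_gap": [1.001099, 1.000389, 0.998908],
})

except_columns = ["date", "instrument"]  # left untouched
scale = {"u_opening_gap": 1000}          # per-column multiplier, like lambda x: x*1000

out = df.dropna().copy()                 # analogous to drop_null=True
for col, factor in scale.items():
    out[col] = out[col] * factor

for col in out.columns:
    if col in except_columns:
        continue
    # Clip first (assumed order), then truncate to int32.
    out[col] = out[col].clip(lower=0, upper=200000000).astype("int32")

print(out)
```

As in the output above, amount saturates at the upper clip bound of 200000000, and fractional parts are truncated by the integer cast.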