Refining the Workflow of an AI Visual Template Strategy

Created by ypyu; last updated by ypyu on 2021-04-29 03:25. Viewed by 6 users.

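   Many users of the platform want to build a more fine-grained AI visual workflow. The figure below shows the AI visual workflow that the platform generates by default.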

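image|602x398

The default workflow can be refined piece by piece by adding commonly used functions such as instrument-code filtering, custom-function feature definitions, and feature filtering, as shown in the figure below.

image|602x387

The next figure labels the modules that were added.

image|602x384

The code for each module follows.

Code filter: custom Python module. Example code: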

    params = input_1.read_pickle()  # dict with start_date, end_date and instruments; read it once
    ins = params['instruments']
    ins_filter = [name for name in ins if name[0] == '3']  # keep only ChiNext codes
    df = {'start_date': params['start_date'], 'end_date': params['end_date'], 'instruments': ins_filter}
    data_1 = DataSource.write_pickle(df)
    return Outputs(data_1=data_1, data_2=None, data_3=None)
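
The filtering step can be tried outside the platform. The following is a minimal sketch in plain Python; the helper name `filter_by_prefix` and the sample instrument codes are illustrative assumptions, not part of the platform API.

```python
def filter_by_prefix(instruments, prefix="3"):
    """Keep only the instrument codes that start with the given prefix."""
    return [code for code in instruments if code.startswith(prefix)]


# Hypothetical sample codes, used only to show the effect of the filter.
sample = ["300001.SZA", "000001.SZA", "600000.SHA", "300750.SZA"]
chinext = filter_by_prefix(sample)
print(chinext)
```

Changing `prefix` selects a different group of codes, so the same module can be reused for other boards.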

Custom-function factor list: input feature list module. Example code:

cyc(close_0,open_0,turn_0) 
# Declares a feature named cyc(close_0,open_0,turn_0); computing it requires the close_0, open_0 and turn_0 data

Custom-function factor calculation: derived feature extraction module. Example code:

import numpy as np
import pandas as pd

def dma(X,A):
    result=pd.DataFrame(np.zeros((len(X),X.columns.size)),index=list(X.index),columns=X.columns)
    result.iloc[0]=X.iloc[0]*A.iloc[0]
    for i in range(1,len(X)):
        result.iloc[i]=A.iloc[i]*X.iloc[i]+(1-A.iloc[i])*result.iloc[i-1]        
    return result
# Computes the DMA (dynamic moving average): dma(X,A)[n]=A[n]*X[n]+(1-A[n])*dma(X,A)[n-1]
def cal_cyc(df,N):
    hsl=pd.pivot_table(df,values='turn_0',index=['date'],columns=['instrument'])/100
    if N>0:    
        AN=dma(N*hsl/(1+(N-1)*hsl),(1+(N-1)*hsl)/N)
    else:
        AN=hsl
    mclose=pd.pivot_table(df,values='close_0',index=['date'],columns=['instrument'])
    mopen=pd.pivot_table(df,values='open_0',index=['date'],columns=['instrument'])
    mid=(mclose+mopen)/2
    AN_1=AN.shift(1)
    AN_1.iloc[0]=0
    if N>0:
        CYCN=dma(mid*hsl/(AN-AN_1*(1-hsl)*4/5),1-(N-1)*AN_1*(1-hsl)/N/AN)
    else:
        CYCN=dma(mid*hsl/(AN-AN_1*(1-hsl)),1-AN_1*(1-hsl)/AN)
    cycn=CYCN.T.unstack().reset_index()
    cycn.columns=['date','instrument','cyc'+str(N)]
    df1=df.merge(cycn,on=['date','instrument']).copy()
    df1['close_0']=df1['cyc'+str(N)]
    return df1['close_0']
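# cal_cyc computes the CYC (cost moving average) indicator. It does not use groupby and apply to process stocks one by one, because that is slow on large datasets; it uses pivot tables instead. Note that the returned value must be the 'close_0', 'open_0' or 'turn_0' column of a DataFrame, i.e. a column named after one of the arguments of the cyc function!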
def cyc(df,close_0,open_0,turn_0):
    return cal_cyc(df,5)	# compute the 5-day cost moving average
# Define the cyc function (note that it has the same name as the custom-function factor)
bigquant_run = {
    'cyc':  cyc
}
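
To make the DMA recurrence concrete, here is a minimal sketch of the same formula on plain Python lists. The name `dma_list` and the sample numbers are illustrative assumptions; like the `dma` function above, it seeds the first value with `A[0]*X[0]`.

```python
def dma_list(x, a):
    """Dynamic moving average: out[n] = a[n]*x[n] + (1 - a[n])*out[n-1]."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    out = []
    for n, (xn, an) in enumerate(zip(x, a)):
        if n == 0:
            out.append(an * xn)  # seed value, matching the DataFrame version
        else:
            out.append(an * xn + (1 - an) * out[-1])
    return out


# Hypothetical inputs: three prices and a constant smoothing weight of 0.5.
prices = [10.0, 12.0, 11.0]
weights = [0.5, 0.5, 0.5]
smoothed = dma_list(prices, weights)
print(smoothed)
```

Each weight sets how strongly that day's value pulls the average. In `cal_cyc` the weights are derived from the turnover rate, so they change from day to day.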

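Merging custom-function factors, basic factors and expression factors into one factor input list: custom Python module. This module combines the basic factors, derived factors and custom-function factors into a single list that serves as the machine-learning input. Example code: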

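    df1=input_1.read_pickle()  # first factor list
    df2=input_2.read_pickle()  # second factor list
    df=df1+df2  # list concatenation: despite the name, df is a list here, not a DataFrame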
    print(df)
    data_1 = DataSource.write_pickle(df)
    return Outputs(data_1=data_1, data_2=None, data_3=None)
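
Plain `+` concatenation keeps duplicates when the same factor appears in both lists. As a variation, the sketch below merges lists while dropping repeats and preserving order. The helper `merge_feature_lists` and the sample factor names are illustrative assumptions, not platform code.

```python
def merge_feature_lists(*feature_lists):
    """Concatenate feature lists, keeping the first occurrence of each name."""
    seen = set()
    merged = []
    for features in feature_lists:
        for name in features:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical inputs: a basic factor list and a list holding the custom factor.
base = ["close_0", "open_0", "return_5"]
custom = ["cyc(close_0,open_0,turn_0)", "close_0"]
all_features = merge_feature_lists(base, custom)
print(all_features)
```

Inside the custom module, the merged list would still be passed on with `DataSource.write_pickle`, as in the example above.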

Finally, a common question is how write_pickle() and write_df(), and likewise read_pickle() and read_df(), differ in custom modules.

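If the data passed through an input or output port is a DataFrame, use read_df() and write_df(). Other formats, such as the dictionary output by the instruments module or the list output by the factor list module, must be read with read_pickle(). The labeling module, in turn, requires the upstream module's output to be a dictionary written with write_pickle(). The full strategy is available at: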

https://bigquant.com/experimentshare/6225131b0df64ed0b3d432be8693093a
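
The read/write rule described above can be summarized as a small decision helper. This is only a sketch: `pick_io_methods` is a hypothetical name, it returns method names rather than calling the platform, and it detects a DataFrame by class name so that it runs without any dependencies.

```python
def pick_io_methods(payload):
    """Return the (write, read) method names suggested for this payload."""
    # Assumption: comparing the class name stands in for isinstance(payload, pd.DataFrame).
    is_dataframe = type(payload).__name__ == "DataFrame"
    if is_dataframe:
        return ("write_df", "read_df")
    # Dicts (e.g. instruments with dates) and lists (e.g. factor lists) go through pickle.
    return ("write_pickle", "read_pickle")


print(pick_io_methods({"instruments": ["300001.SZA"]}))
print(pick_io_methods(["close_0", "open_0"]))
```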