Mastering pandas for Finance

创建于 2025-04-30T14:03:53.258646+08:00 更新于 2025-05-19T18:36:17.266476+08:00

摘要

本报告全面系统地介绍了如何使用Python及其pandas库处理金融数据，涵盖了历史股价获取、时间序列分析、量化交易策略构建与回测、期权定价与风险管理、多因子投资组合优化等核心金融计算技术。通过丰富的代码示例和图表演示，深入剖析了量化策略逻辑、策略信号生成与绩效评估方法，特别利用Zipline平台实现算法交易模拟，并采用Mibian库完成Black-Scholes期权定价，全面展现了pandas在金融量化分析中的应用[page::13][page::14][page::185][page::223].

速读内容

pandas的核心数据结构是Series和DataFrame，Series为带索引的一维数组，DataFrame是多列Series的组合，支持数据对齐和灵活切片 [page::31][page::32][page::33].

- 提供了pandas环境的快速搭建指引，推荐使用Wakari.io的在线数据分析环境，讲解了依赖包matplotlib、Zipline、Quandl和Mibian的安装和升级过程 [page::6][page::14].

时间序列处理支持高效的DatetimeIndex和Period，通过频率转换、升采样和降采样，可以灵活地调整时序频度，支持滑动窗口计算及移动均值、方差等统计指标 [page::91][page::92][page::148].

- 深度挖掘历史股价数据，进行了收盘价、成交量的可视化，绘制了蜡烛图与量价关系图，通过计算日收益率和累积收益率反映投资绩效，利用直方图、Q-Q图、箱线图分析收益分布，对股票间的相关性进行了散点矩阵展示 [page::117][page::134][page::137][page::143][page::145].

构建了基于Google趋势数据的量化策略，复现了文献中基于搜索词“debt”对标普指数的交易信号生成方法，策略基于滚动均值判断信号方向，实现买卖决策，最终回测结果呈现策略明显优于随机策略 [page::164][page::168][page::179][page::182].

- 量化策略框架以Zipline库为核心，实现了买入苹果、双均线交叉和配对交易三大策略。详细说明了策略的信号逻辑、交易执行过程及绩效输出，结合可视化直观展现买卖点和资金变化 [page::185][page::199][page::211][page::213].

期权分析围绕Yahoo! Finance选取AAPL期权数据，展示了期权的定价、隐含波动率(smile和smirk形态)、期权的盈亏结构，通过Mibian库计算Black-Scholes模型价格及希腊值，揭示了期权风险敏感度 [page::223][page::233][page::239][page::258].

- 投资组合部分介绍了现代投资组合理论（MPT），通过计算资产组合收益率、方差协方差矩阵和夏普比率实现风险调整后的资产配置优化，利用SciPy优化工具生成高效前沿，并计算投资组合VaR风险度量 [page::263][page::270][page::273][page::277][page::283][page::284].

深度阅读

金融领域pandas应用详尽分析报告

---

1. 元数据与概览

报告标题：Mastering pandas for Finance
作者：Michael Heydt
出版机构：Packt Publishing Ltd.
出版日期：2015年5月
主题：Python数据分析库pandas在金融领域的应用，涵盖公司股票数据分析、算法交易、期权建模及投资组合管理等。

该报告核心目的是通过多个金融领域的实际数据案例，系统介绍pandas如何用于时间序列分析、交易策略构建、期权定价及现代投资组合分析。整书逐步带领读者从pandas基础、金融时间序列数据处理，直到复杂的算法交易和期权定价技巧，实现金融数据的建模与分析。无评分及目标价，此书为技术指导与工具书，旨在传授pandas在金融领域的精通技能。

---

2. 章节内容深度解读

2.1 关于pandas及环境建设（第1-4章）

第一章介绍了使用在线数据分析平台Wakari.io完成Python及pandas环境配置，包括更新Matplotlib、安装Zipline等必要库，确保后续案例环境一致。[page::19-29]

- 第二章介绍pandas核心数据结构Series和DataFrame，强调Series的索引对齐优势及DataFrame的二维对齐表示，演示了索引、重建索引、布尔索引及基于标签及位置的行列选择方法。[page::31-42]

第三章围绕数据重组，阐述如何使用concat完成多数据框拼接，merge实现数据库连接式表间合并，pivot、stack/unstack实现数据透视表变换，melt功能用于格式转化；重点演示了对历史股价数据的合并及整理，配合分组聚合概念。[page::63-72]

- 第四章关注时间序列，详细介绍DatetimeIndex及Period时间区间对象，时间频率转换、时间序列的位移（shift）和滑动窗口计算，时间序列重采样和插值，强调了时间序列数据的频率转换及如何生成不同采样频率的时间序列，解释了时间标签和数据对齐的差异，规范了对日期、时期的建模方式。[page::91-115]

2.2 历史股票数据分析（第5章）

介绍如何获取多支股票和标准普尔500指数的历史数据，数据来源Yahoo! Finance及Quandl。

- 构建调整收盘价的透视表，演示单股票及多股票价格和成交量的绘制，使用条形图展示成交量关系。

通过计算每日百分比变化、累计收益、滚动波动率等指标，绘制累计收益曲线揭示长短期收益趋势。

- 通过直方图、Q-Q图及箱线图对股票日收益率的分布进行统计分析，验证收益率近似正态分布但存在微小偏斜。

利用散点图（两两股票间日收益率折线）和矩阵图观察不同股票收益的关系和相关性差异，进行了MSFT与AAPL及DAL与UAL的相关性对比，必然领域内公司相关性较强。[page::117-144]

2.3 利用Google Trends数据进行交易策略构建（第6章）

复现了一篇运用Google搜索量预测股市变动的论文框架，核心思路是基于金融词汇（如“debt”）的搜索量变动，构建买入卖出信号，实现超市场平均收益策略。

- 重点说明如何用pandas读取论文中提供的数据、如何动态爬取Dow Jones数据（借助Quandl替代Yahoo!停供数据），Google Trends数据采集难点和处理方式。

计算滚动均线作为搜索量比较的基准，通过信号生成对投资组合每周调整头寸，并计算累积收益，复现策略产生约2.7倍收益的结论，但指出量化交易未考虑交易成本及前瞻性偏差等现实因素。[page::163-183]

2.4 算法交易及Zipline框架（第7章）

介绍基于pandas和Quantopian平台开源引擎Zipline的算法交易实现，涵盖基本买入策略（buy apple），双移动平均交叉策略和配对交易策略。

- 重点讲解了algorithmic trading的动量与均值回归策略，简易和指数加权的移动平均计算，对交叉点买卖信号的判定。

通过Zipline的初始化及handledata事件驱动模型实现交易。详细Demo剖析订单生成、资金波动及头寸管理流程。

- 模拟示例演示了基本买入策略如何随时间调整投资组合市值。
展示了双均线交叉策略买卖信号及其对投资组合表现的影响。

- 展示了基于PEP/KO的配对交易策略，通过计算价差和z-score量化偏离，做多低价卖空高价资产进行套利，结合交易日志和时序价格走势评估表现。[page::185-221]

2.5 期权数据处理与Black-Scholes定价（第8章）

详细介绍期权交易相关知识：定义、买卖双方类型、看涨和看跌期权特征与原理，风险收益关系。

- 利用pandas读入Yahoo! Finance获取的AAPL期权数据，整理期权类型、行权价、隐含波动率等基本属性，并展示示例数据。
介绍期权价格的隐含波动率及其“波动率微笑”和“微笑歪斜”现象，解释市场供需影响其分布的原理。

- 介绍期权盈亏（payoff）计算函数，分别给出了看涨和看跌期权的Payoff模型，给出了买卖双方盈亏的变动曲线及图示。
采用Mibian库简化了Black-Scholes期权价格模型与隐含波动率计算，通过可视化展示期权价格随剩余天数减小而贬值的特征。

- 说明了期权“希腊字母”（Delta, Gamma, Theta, Vega等）的含义、计算及其在风险管理中的意义，给出不同价格平面上希腊字母数值的动态曲线。[page::223-261]

2.6 投资组合与风险管理（第9章）

介绍了现代投资组合理论（MPT）及其基于历史收益率分布构建有效前沿的数学基础，阐明回报与风险的均衡关系。

- 利用pandas模拟并计算组合收益，解释了多样化对降低风险的作用，演示了负相关资产构建低方差投资组合的实例。
通过历史股价数据转换为对数收益率，年化收益和方差矩阵计算，建立投资组合收益和风险公式。

- 计算商誉比率（Sharpe Ratio）以判别单位风险下的收益优劣，并利用SciPy优化工具求解最优权重以最大化Sharpe Ratio，进而确定有效前沿。
通过画图展现有效前沿曲线（Markowitz连续体）和不同权重的风险回报组合。

- 介绍了风险量化指标Value at Risk (VaR)，通过统计学方法结合Z分数来估计特定置信水平下的潜在最大亏损，示例计算了AAPL 2014年的单日VaR。[page::263-287]

---

3. 图表数据深度解读

图表：AAPL、MSFT等多支股票历年收盘价及成交量折线图，展示金融时间序列的周期性起伏和整体趋势。[page::122,123,125]

- 滚动均线图：MSFT 2012年不同窗口大小的滚动均线对比原始收盘价时间序列，揭示均线平滑与延迟特性。[page::147-150]
收益分布直方图：单支和多支股票每日收益率直方图，呈现近似正态分布，暗示经典风险假设的适用范围及偏斜特征。[page::135,137]

- 概率Q-Q图：AAPL日收益率与理论正态分布的拟合直观显示高拟合度，说明日收益率接近正态，尾部略带偏斜。[page::139]
相关性矩阵散点图：多支股票间日收益率散点矩阵图，评估不同股票间的相关性，区分高度正相关与低相关资产对。[page::145]

- 波动率动态曲线：多支股票约75日滚动波动率变化曲线，显示股票间周期性波动风险差异，PEP最低，UAl最高。[page::152]
滚动相关性折线：AAPL和MSFT日收益率滚动相关系数，展示股票间相关性的时间演变，呈现相关性变化波动特征。[page::154]

- 期权隐含波动率曲线：2015年2月AAPL认购期权隐含波动率随行权价变化曲线，形状似“微笑”，中间临近标的价格处波动较低。[page::231]
期权价格与期限关系图：AAPL看涨期权价格随到期日倒数增长曲线，验证期权价值随时间衰减特性。[page::258]

- 期权希腊字母曲线：示意不同标的价格下的Delta、Gamma、Theta、Vega曲线形态，反映期权敏感性参数变化规律。[page::261]
有效前沿曲线：模拟三支股票的标准差与预期收益的关系曲线，描绘Markowitz有效投资组合最优状态。[page::283]

- VaR分布示意图：标准正态分布与置信区间示意，反映投资组合最大可能亏损的统计学基础。[page::284]
股票收益波动及组合图：演示组合中负相关股票的收益波动被显著削弱，体现回报分散降低风险效应。[page::271]

- 算法交易买卖点线图：双均线交叉策略中买入/卖出信号点标注于价格及组合价值曲线，直观呈现策略行为与其效果。[page::213]
配对交易z-score及价格走势线图：可视化价格差与标准分数关系及买卖时点，演示交易信号与投资组合价值的动态关系。[page::220]

---

4. 估值分析

期权定价部分重点采用了Black-Scholes模型，通过Mibian库实现。Black-Scholes提供了基于股价、执行价、剩余期限、利率和波动率的闭式欧式期权定价公式。报告详细罗列了定价主要参数：

$S$：标的资产当前价格

- $K$：执行价格
$T$：剩余到期时间（年化）

- $r$：无风险利率
$\sigma$：标的资产波动率

核心公式为：
$$ C = N(d1) S - N(d2) K e^{-rT} $$
$$ P = N(-d2) K e^{-rT} - N(-d1) S $$

其中$N(\cdot)$为标准正态分布累积函数，$d1,d_2$为时间加权与波动率调整的风险因子。作者没有详细推导，但给出Python实现思路并用Mibian实现定价、隐含波动率与希腊字母计算。

该模型为欧式期权定价基准。美式期权由于提前行权等复杂性，模型定价采用了二叉树模型，本文未详细涉诉。总体估值分析块与实际市场标价偏差数值计算，明确标的价差对期权价格及买卖双方盈亏的重要性。

---

5. 风险因素评估

报告风险点主要集中在：

历史数据的代表性：例如Google Trends数据采集限制、权威数据更新及标准普尔500指数历史数据源变更，可能导致分析样本的时效性、完整性不足。

- 策略假设与现实环境差异：诸如量化策略未计交易成本、滑点、资金限制和市场冲击，可能致使回测与实盘表现出入。

模型合理性和假设局限：Black-Scholes模型多项假设（无套利、常数利率、波动率等）在实际中可能不成立，导致期权估值偏差。

- 数据预处理与偏差：如时间序列对齐、收益率计算中的附近期覆盖不一致造成的先验偏差（look-ahead bias）等。

计算误差和参数敏感：优化算法局部最优风险；隐含波动率估计的敏感性；行权价格选择对盈亏计算影响。

- 交易仿真模型受限：交易执行流程简化，忽略具体市场机制及订单流管理，可能影响算法可信度。

报告虽然未针对风险提供明晰缓解方案，但通过多数据来源、价格合理性对比及多种算法简介，暗含了多重验证思路和谨慎的数据处理原则。

---

6. 审慎评价

报告整体技术细致，涵盖面广，实用案例丰富，易于读者上手金融量化分析框架。

- 个别示例中环境依赖性较强（如Wakari、旧版pandas、Yahoo!历史数据等），可能影响在现有环境下复现。

部分内容，如Black-Scholes数学推导未详尽，希腊字母计算未完全展开，针对金融数学深度不足，目的更偏工具应用。

- 对量化策略效果验证叙述较为乐观，忽略现实中交易成本、市场冲击等因素的影响，评估略显理想化。

某些代码示例中有语法问题或因版本变化失效（如.ix索引器已废弃），需结合最新pandas版本调整。

- 报告跨领域结合金融与Python工具，带来了较高的学习门槛，但作者通过实例与图示尽量降低难度。

---

7. 结论性综合

本报告系统展示了Python数据分析库pandas在金融领域的综合应用。涵盖了数据环境搭建、时间序列数据操作、历史及社交数据融合、常见股票统计技术分析、算法交易策略设计及回测、期权定价与风险的计算，以及现代投资组合理论的实现。

通过丰富的实际案例，展示了pandas强大的时间序列索引对齐能力、组群分组聚合、灵活的重塑数据结构Pivot/Stack等关键功能，对效率优化与计算准确性均有体现。算法交易章节结合Zipline框架，演示了现代交易系统的核心模型与设计方法。期权章节则借助Mibian库补充了数量金融复杂定价模型的实现与应用。

报告中多处图表有效阐释了金融数据的动态变化，如股票价格及期权价格趋势、波动率微笑、投资组合的有效边界以及量化策略的信号触发点，拥有良好的可视化指导意义。

虽然报告在数据时效性、部分程序版本兼容性及算法现实适应性方面存在局限，其高水准的技术详解及案例丰富度为有志于金融数据科学的技术人员提供了宝贵的学习与实践资源。

综上，报告立足于技术实现，成功搭建起金融分析与Python数据科学的桥梁，适合中高级技术读者及量化金融研究人员系统深造，具备明确的指导价值和实操参考意义。[page::1,31,63,117,163,185,223,263]

---

如需提取或重构报告中某章节的细节解析或对图表做更深层注释，可进一步提供指令。 initialize") if BuyApple.trace: print(context) if BuyApple.trace: print( $"<--"$ initialize")

def handledata(self, context):

if BuyApple.trace: print("---> handledata") if BuyApple.trace: print(context) self.order("AAPL", 1) if BuyApple.trace: print("<-- handledata")

Trading simulation starts with the call to the static .initialize() method. This is your opportunity to initialize the trading simulation. In this sample, we do not perform any initialization other than printing the context for examination.

The implementation of the actual trading is handled in the override of the handledata method. This method will be called for each day of the trading simulation. It is your opportunity to analyze the state of the simulation provided by the context and make any trading actions you desire. In this example, we will buy one share of AAPL regardless of how AAPL is performing.

The trading simulation can be started by instantiating an instance of BuyApple() and calling that object's .run method, thereby passing the base data for the simulation, which we will retrieve from Zipline's own method for accessing data from Yahoo! Finance:

In [12]: import zipline.utils.factory as zpf data $\mathbf{\sigma}=\mathbf{\sigma}$ zpf.loadfromyahoo(stocks $\mathbf{\lambda}=$ ['AAPL'],

indexes={},
start $\mathbf{\sigma}=$ datetime(1990, 1, 1),
end $\c=$ datetime(2014, 1, 1),
adjusted $\c=$ False)

data.plot(figsize $\mathbf{\lambda}=$ (12,8));

Our first simulation will purposely use only one week of historical data so that we can easily keep the output to a nominal size that will help us to easily examine the results of the simulation:

In [13]: result $\mathbf{\lambda}=\mathbf{\lambda}$ BuyApple().run(data['2000-01-03':'2000-01-07']) -> initialize BuyApple( capitalbase $\mathbf{\sigma}=$ 100000.0 simparams= SimulationParameters( periodstart=2006-01-01 00:00:00+00:00, periodend=2006-12-31 00:00:00+00:00,

capitalbase=100000.0, datafrequency $\mathbf{\bar{\Pi}}=\mathbf{\bar{\Pi}}$ daily, emissionrate $\mathbf{\sigma}=$ daily, firstopen=2006-01-03 14:31:00+00:00, lastclose $:=$ 2006-12-29 21:00:00+00:00), initialized $\c=$ False, slippage $\mathbf{\sigma}=$ VolumeShareSlippage( volumelimit ${.=0}$ .25, priceimpact=0.1), commission $\c=$ PerShare(cost $\mathtt{\Omega}=0$ .03, min trade cost $\mathbf{\sigma}=$ None), blotter $\mathbf{\sigma}=$ Blotter( transactpartial $\mathbf{\sigma}=\mathbf{\sigma}$ (VolumeShareSlippage( volumelimi $t=0.25$ , priceimpact $\mathbf{\sigma}{:=0.1}$ ), PerShare(cost $=0$ 0.03, min trade cost $\mathbf{\sigma}=$ None)), openorders $\mathbf{\sigma}=$ defaultdict(, {}), order $s=\{\}$ , neworders=[], currentdt=None), recordedvars={}) <--- initialize ---> handledata BarData({'AAPL': SIDData({'volume': 1000, 'sid': 'AAPL', 'sourceid': 'DataFrameSource-fc37c5097c557f0d46d6713256f4eaa3', 'dt': Timestamp('2000-01-03 00:00:00+0000', tz $\mathbf{\sigma}=\mathbf{\sigma}$ 'UTC'), 'type': 4, 'price': 111.94})}) <-- handledata ---> handledata [2015-04-16 21:53] INFO: Performance: Simulated 5 trading days out of 5. [2015-04-16 21:53] INFO: Performance: first open: 2000-01-03 14:31:00+00:00 [2015-04-16 21:53] INFO: Performance: last close: 2000-01-07 $21:00:00+00:00$ BarData({'AAPL': SIDData({'price': 102.5, 'volume': 1000, 'sid': 'AAPL', 'sourceid': 'DataFrameSourcefc37c5097c557f0d46d6713256f4eaa3', 'dt': Timestamp('2000-01-04

00:00:00+0000', tz $\mathbf{\lambda}=$ 'UTC'), 'type': 4})})

<-- handledata -> handledata
BarData({'AAPL': SIDData({'price': 104.0, 'volume': 1000, 'sid':
'AAPL', 'sourceid': 'DataFrameSource
fc37c5097c557f0d46d6713256f4eaa3', 'dt': Timestamp('2000-01-05
00:00:00+0000', tz='UTC'), 'type': 4})})
<-- handledata -> handledata
BarData({'AAPL': SIDData({'price': 95.0, 'volume': 1000, 'sid':
'AAPL', 'sourceid': 'DataFrameSource
fc37c5097c557f0d46d6713256f4eaa3', 'dt': Timestamp('2000-01-06
$00:00:00+0000:$ , t $\mathbf{z}={\mathrm{~}}$ 'UTC'), 'type': 4})})
<-- handledata -> handledata
BarData({'AAPL': SIDData({'price': 99.5, 'volume': 1000, 'sid':
'AAPL', 'sourceid': 'DataFrameSource
fc37c5097c557f0d46d6713256f4eaa3', 'dt': Timestamp('2000-01-07
$00:00:00+0000:$ ', tz $\mathbf{\lambda}=$ 'UTC'), 'type': 4})})
<-- handledata

The context in the initialize method shows us some parameters that the simulation will use during its execution. The context also shows that we start with a base capitalization of 100000.0. There will be a commission of $\$0.03$ assessed for each share purchased.

The context is also printed for each day of trading. The output shows us that Zipline passes the price data for each day of AAPL. We do not utilize this information in this simulation and blindly purchase one share of AAPL.

The result of the simulation is assigned to the result variable, which we can analyze for detailed results of the simulation on each day of trading. This is a DataFrame where each column represents a particular measurement during the simulation, and each row represents the values of those variables on each day of trading during the simulation.

We can examine a number of the variables to demonstrate what Zipline was doing during the processing. The orders variable contains a list of all orders made during the day. The following command gets the orders for the first day of the simulation:

In [14]: result.iloc[0].orders

Out[14]: [{'amount': 1,

'commission': None,
'created': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'),
'dt': Timestamp('2000-01-03 00:00:00+0000', t $\mathbf{z}=$ 'UTC'),
'filled': 0,
'id': 'dccb19f416104f259a7f0bff726136a2',
'limit': None,
'limitreached': False,
'sid': 'AAPL',
'status': 0,
'stop': None,
'stopreached': False}]

This tells us that Zipline placed an order in the market for one share of AAPL on 2000-01-03. The order filled the value 0, which means that this trade has not yet been executed in the market.

On the second day of trading, Zipline reports that two orders were made:

In [15]: result.iloc[1].orders

Out[15]: [{'amount': 1, 'commission': 0.03, 'created': Timestamp('2000-01-03 00:00:00+0000', t $\mathbf{z}=\mathbf{\cdot}$ UTC'), 'dt': Timestamp('2000-01-04 00:00:00+0000', t $\mathbf{z}={\boldsymbol{\cdot}}$ UTC'), 'filled': 1, 'id': 'dccb19f416104f259a7f0bff726136a2', 'limit': None, 'limitreached': False, 'sid': 'AAPL', 'status': 1, 'stop': None, 'stopreached': False}, {'amount': 1, 'commission': None, 'created': Timestamp('2000-01-04 00:00:00+0000', t $\mathbf{z}={\mathrm{~}}$ 'UTC'), 'dt': Timestamp('2000-01-04 00:00:00+0000', t $\pmb{z}=$ 'UTC'), 'filled': 0, 'id': '1ec23ea51fd7429fa97b9f29a66bf66a',

'limit': None,
'limitreached': False,
'sid': 'AAPL',
'status': 0,
'stop': None,
'stopreached': False}]

The first order listed has the same ID as the order from day one. This tells us that this represents that same order, and we can see this from the filled key, which is now 1 and from the fact that this order has been filled in the market.

The second order is a new order, which represents our request on the second day of trading, which will be reported as filled at the start of day two.

During the simulation, Zipline keeps track of the amount of cash we have (capital) at the start and end of the day. As we purchase stocks, our cash is reduced. Starting and ending cash is represented by the startingcash and endingcase variables of the result.

Zipline also accumulates the total value of the purchases of stock during the simulation. This value is represented in each trading period using the endingvalue variable of the result.

The following command shows us the running values for endingcash and endingvalue, along with endingvalue:

In [16]:

result[['startingcash', 'endingcash', 'endingvalue']]

Out[16]:

Ending cash represents the amount of cash (capital) that we have to invest at the end of the given day. We made an order on day one for one share of the apple, but since the transaction did not execute until the next day, we still have our starting seed at the end of the day. But on day two, this will execute at the value reported at the close of day one, which is 111.94. Hence, our endingcash is reduced by 111.94 for one share and also deducted is the $\$0.03$ for the commission resulting in 9987.47.

At the end of day two, our endingvalue, that is, our position in the market, is 102.5 as we have accumulated one share of AAPL, and it closed at 102.5 on day two.

We did not print startingcash and startingvalue as this will always be equal to our initial capitalization of 100000.0 and a portfolio value of 0.0 as we have not yet bought any securities.

While investing, we would be interested in the overall value of our portfolio, which, in this case, would be the value of our on-hand cash $^+$ our position in the market. This can be easily calculated:

In [17]: pvalue $\mathbf{\lambda}=\mathbf{\lambda}$ result.endingcash $^+$ result.endingvalue pvalue

Out[17]:

2000-01-03 21:00:00 100000.00000
2000-01-04 21:00:00 99999.96999
2000-01-05 21:00:00 100001.43998
2000-01-06 21:00:00 99983.40997
2000-01-07 21:00:00 99996.87996
dtype: float64

There is also a convenient shorthand to retrieve this result:

In [18]: result.portfoliovalue

Out[18]:

2000-01-03 21:00:00 100000.00000
2000-01-04 21:00:00 99999.96999
2000-01-05 21:00:00 100001.43998
2000-01-06 21:00:00 99983.40997
2000-01-07 21:00:00 99996.87996
Name: portfoliovalue, dtype: float64

In a similar vein, we can also calculate the daily returns on our investment using .pctchange():

In [19]: result.portfoliovalue.pctchange()

Out[19]:

2000-01-03 21:00:00 NaN
2000-01-04 21:00:00 -3.00103e-07
2000-01-05 21:00:00 1.46999e-05
2000-01-06 21:00:00 -1.80297e-04
2000-01-07 21:00:00 1.34722e-04
Name: portfoliovalue, dtype: float64

This is actually a column of the results from the simulation, so we do not need to actually calculate it:

In [20]: result['returns']

Out[20]: 2000-01-03 21:00:00 NaN 2000-01-04 21:00:00 -3.00103e-07 2000-01-05 21:00:00 1.46999e-05 2000-01-06 21:00:00 -1.80297e-04 2000-01-07 21:00:00 1.34722e-04 Name: portfoliovalue, dtype: float64

Using this small trading interval, we have seen what type of calculations Zipline performs during each period. Now, let's run this simulation over a longer period of time to see how it performs. The following command runs the simulation across the entire year 2000:

In [21]: resultfor2000 $\mathbf{\sigma}=\mathbf{\sigma}$ BuyApple().run(data['2000'])

Out[21]: [2015-02-15 05:05] INFO: Performance: Simulated 252 trading days out of 252. [2015-02-15 05:05] INFO: Performance: first open: 2000-01-03 $14:31:00+00:00$ [2015-02-15 05:05] INFO: Performance: last close: 2000-12-29 $21:00:00+00:00$

The following command shows us our cash on hand and the value of our investments throughout the simulation:

In [22]: resultfor2000[['endingcash', 'endingvalue']]

Out[22]:

[252 rows x 2 columns]

The following command visualizes our overall portfolio value during the year 2000:

[23]: resultfor2000.portfoliovalue.plot(figsize $\mathbf{\lambda}=$ (12,8));

Our strategy has lost us money over the year 2000. AAPL generally trended downward during the year, and simply buying every day is a losing strategy.

The following command runs the simulation over 5 years:

In [24]:

result $\mathbf{\lambda}=\mathbf{\lambda}$ BuyApple().run(data['2000':'2004']).portfoliovalue result.plot(figsize $\mathbf{\lambda}=$ (12,8)); [2015-04-16 22:52] INFO: Performance: Simulated 1256 trading days out of 1256. [2015-04-16 22:52] INFO: Performance: first open: 2000-01-03
14:31:00+00:00 [2015-04-16 22:52] INFO: Performance: last close: 2004-12-31
21:00:00+00:00

Hanging in with this strategy over several more years has paid off as AAPL had a marked upswing in value starting in mid-2013.

Algorithm – dual moving average crossover

We now analyze a dual moving average crossover strategy. This algorithm will buy apple once its short moving average crosses its long moving average. This will indicate upward momentum and a buy situation. It will then begin selling shares once the averages cross again, which will represent downward momentum.

We will load data for AAPL for 1990 through 2014, but we will only use the data from 1990 through 2001 in the simulation:

In [25]: subdata $\mathbf{\lambda}=\mathbf{\lambda}$ data['1990':'2002-01-01'] subdata.plot();

The following class implements a double moving average crossover where investments will be made whenever the short moving average moves across the long moving average. We will trade only at the cross, not continuously buying or selling until the next cross. If trending down, we will sell all of our stock. If trending up, we buy as many shares as possible up to 100. The strategy will record our buys and sells in extra data returned from the simulation:

In [26]:

class DualMovingAverage(zp.TradingAlgorithm):

def initialize(context):

we need to track two moving averages, so we will set # these up in the context the .addtransform method # informs Zipline to execute a transform on every day # of trading

the following will set up a MovingAverge transform,

named shortmavg, accessing the .price field of the

data, and a length of 100 days

context.addtransform(zp.transforms.MovingAverage, 'shortmavg', ['price'], windowlength $\mathtt{\Omega}=\mathtt{1}00$ )

and the following is a 400 day MovingAverage

context.addtransform(zp.transforms.MovingAverage, 'longmavg', ['price'], windowlength $\mathtt{\Gamma}=400$ )

this is a flag we will use to track the state of # whether or not we have made our first trade when the # means cross. We use it to identify the single event # and to prevent further action until the next cross context.invested $\mathbf{\lambda}=\mathbf{\lambda}$ False

def handledata(self, data):

access the results of the transforms shortmavg $\mathbf{\sigma}=\mathbf{\sigma}$ data['AAPL'].shortmavg['price'] longmavg $\mathbf{\sigma}=\mathbf{\sigma}$ data['AAPL'].longmavg['price']

these flags will record if we decided to buy or sell

buy $\mathbf{\sigma}=\mathbf{\sigma}$ False
sell $\mathbf{\sigma}=\mathbf{\sigma}$ False

check if we have crossed if shortmavg $>$ longmavg and not self.invested: # short moved across the long, trending up # buy up to 100 shares self.ordertarget('AAPL', 100) # this will prevent further investment until

the next cross self.invested $\mathbf{\sigma}=\mathbf{\sigma}$ True buy $\mathbf{\sigma}=\mathbf{\sigma}$ True # records that we did a buy elif shortmavg $\mathbf{\Psi}<$ longmavg and self.invested: # short move across the long, trending down # sell it all! self.ordertarget('AAPL', -100) # prevents further sales until the next cross self.invested $\mathbf{\sigma}=\mathbf{\sigma}$ False sell $\mathbf{\lambda}=\mathbf{\lambda}$ True # and note that we did sell

add extra data to the results of the simulation to

give the short and long ma on the interval, and if

we decided to buy or sell

self.record(shortmavg=shortmavg, longmavg $\mathbf{\sigma}=\mathbf{\sigma}$ longmavg, buy=buy, sell=sell)

We can now execute this algorithm by passing it data from 1990 through 2001, as shown here:

In [27]:

results $\mathbf{\lambda}=\mathbf{\lambda}$ DualMovingAverage().run(subdata) [2015-02-15 22:18] INFO: Performance: Simulated 3028 trading days out of 3028. [2015-02-15 22:18] INFO: Performance: first open: 1990-01-02 $14:31:00+00:00$ [2015-02-15 22:18] INFO: Performance: last close: 2001-12-31 $21:00:00+00:00$

To analyze the results of the simulation, we can use the following function that creates several charts that show the short/long means relative to price, the value of the portfolio, and the points at which we made buys and sells:

In [28]: def analyze(data, perf): fig $\mathbf{\sigma}=\mathbf{\sigma}$ plt.figure() ax1 $\mathbf{\sigma}=\mathbf{\sigma}$ fig.addsubplot(211, ylabel $\mathbf{\sigma}=\mathbf{\sigma}$ 'Price in \$') data['AAPL'].plot( $a x=a x1$ , color $\mathbf{\bar{\rho}}=\mathbf{\rho}$ 'r', $z{\mathbf{w}=2}$ .) perf[['shortmavg', 'longmavg']].plot( $\mathtt{a x}\mathtt{=a x}1$ , $1w=2$ .) ax1.plot(perf.ix[perf.buy].index, perf.shortmavg[perf.buy],

'^', markersize $\scriptstyle:=10$ , $\mathsf{c o l o r}\mathbf{=}^{\mathsf{\pm}}\mathbf{m}^{\mathsf{\pm}}\mathrm{~.~}$ ) ax1.plot(perf.ix[perf.sell].index, perf.shortmavg[perf.sell 'v', markersize ${\mathbf{\lambda}=\mathbf{1}0}$ , color='k') ax2 $\mathbf{\sigma}=\mathbf{\sigma}$ fig.addsubplot(212, ylabel $\mathbf{\lambda}=\mathbf{\lambda}^{\prime}$ Portfolio value in \$') perf.portfoliovalue.plot( $a x=a x2$ , $1w=2$ .) ax2.plot(perf.ix[perf.buy].index, perf.portfoliovalue[perf.buy], '^', markersize $\mathtt{\Omega}=\mathtt{10}$ , $\mathsf{c o l o r}\mathsf{\Gamma}=\mathsf{\Omega}^{\mathsf{\Gamma}}\mathsf{m}^{\mathsf{\Gamma}},$ ) ax2.plot(perf.ix[perf.sell].index, perf.portfoliovalue[perf.sell], 'v', markersize $\mathtt{\Omega}=\mathtt{10}$ , color='k') plt.legend( $\scriptstyle10c=0$ ) plt.gcf().setsizeinches(14, 10)

Using this function, we can plot the decisions made and the resulting portfolio value as trades are executed:

In [29]: analyze(subdata, results)

The crossover points are noted on the graphs using triangles. Upward-pointing red triangles identify buys and downward-pointing black triangles identify sells. Portfolio value stays level after a sell as we are completely divested from the market until we make another purchase.

Algorithm – pairs trade

To demonstrate a pairs trade algorithm, we will create one such algorithm and run data for Pepsi and Coca-Cola through the simulation. Since these two stocks are in the same market segment, their prices tend to follow each other based on common influences in the market.

If there is an increase in the delta between the two stocks, a trader can potentially make money by buying the stock that stayed the same and selling the increasing stock. The assumption is that the two stocks will revert to a common spread on the mean. Hence, if the stock that stayed normal increases to close the gap, then the buy will result in increased value. If the rising stock reverts, then the sell will create profit. If both happen, then even better.

To start with, we will need to gather data for Coke and Pepsi:

In [30]:

data $\mathbf{\sigma}=\mathbf{\sigma}$ zpf.loadfromyahoo(stocks $\mathbf{\sigma}=$ ['PEP', 'KO'], data.plot(figsize $\mathbf{\sigma}=$ (12,8));

PEP

KO

www.it-ebooks.info

Analyzing the chart, we can see that the two stocks tend to follow along the same trend line, but that there is a point where Coke takes a drop relative to Pepsi (August 1997 through December 1997). It then tends to follow the same path although with a wider spread during 1998 than in early 1997.

We can dive deeper into this information to see what we can do with pairs trading. In this algorithm, we will examine how the spread between the two stocks change. Therefore, we need to calculate the spread:

In [31]: data['PriceDelta'] $\mathbf{\sigma}=\mathbf{\sigma}$ data.PEP - data.KO data['1997':].PriceDelta.plot(figsize $\mathbf{\lambda}=$ (12,8)) plt.ylabel('Spread') plt.axhline(data.Spread.mean());

Using this information, we can make a decision to buy one stock and sell the other if the spread exceeds a particular size. In the algorithm we implement, we will normalize the spread data on a 100-day window and use that to calculate the $z\cdot$ -score on each particular day.

If the $z\cdot$ -score is $>2$ , then we will want to buy PEP and sell KO as the spread increases over our threshold with PEP taking the higher price. If the $z$ -score is $<-2,$ then we want to buy KO and sell PEP, as PEP takes the lower price as the spread increases. Additionally, if the absolute value of the z-score $<0.5$ , then we will sell off any holdings we have in either stock to limit our exposure as we consider the spread to be fairly stable and we can divest.

One calculation that we will need to perform during the simulation is calculating the regression of the two series prices. This will then be used to calculate the z-score of the spread at each interval. To do this, the following function is created:

In [32]: @zp.transforms.batchtransform def olstransform(data, ticker1, ticker2): $\mathtt{p0}\ =$ data.price[ticker1] $\mathtt{p1}\mathtt{\mathtt{=}}$ sm.addconstant(data.price[ticker2], prepend=True) slope, intercept $\mathbf{\lambda}=\mathbf{\lambda}$ sm.OLS(p0, p1).fit().params return slope, intercept

You may wonder what the $@$ zp.transforms.batchtransform code does. At each iteration of the simulation, Zipline will only give us the data representing the current price. Passing the data from handledata to this function would only pass the current day's data. This decorator will tell Zipline to pass all of the historical data instead of the current day's data. This makes this very simple as, otherwise, we would need to manage multiple windows of data manually in our code.

The actual algorithm is then implemented using a 100-day window where we will execute on the spread when the $z\cdot$ -score is $>2.0$ or $<-2.0$ . If the absolute value of the $z$ -score is $<0.5$ , then we will empty our position in the market to limit exposure:

In [33]:

class Pairtrade(zp.TradingAlgorithm): def initialize(self, windowlength $\mathtt{\Omega}=\mathtt{100}$ ): self.spreads ${\bf\varepsilon}=[]{\bf\varepsilon}$ self.invested=False self.windowlength $\c=$ windowlength self.olstransform $\mathbf{\sigma}{\cdot}=\mathbf{\sigma}{\cdot}$ \ olstransform(refreshperiod $\c=$ self.windowlength, windowlength $\c=$ self.windowlength)

def handledata(self, data):

calculate the regression, will be None until 100 samples params $\mathbf{\lambda}=$ self.olstransform.handledata(data, 'PEP', 'KO') if params:

intercept, slope $\mathbf{\lambda}=$ params
zscore $\mathbf{\lambda}=$ self.computezscore(data, slope, intercept)
self.record(zscore $\mathbf{\lambda}=$ zscore)
self.placeorders(data, zscore)
ef computezscore(self, data, slope, intercept):

calculate the spread spread $\c=$ (data['PEP'].price-(slope\data['KO'].price+ intercept)) self.spreads.append(spread) # record for z-score calc self.record(spread $\mathbf{\lambda}=$ spread) spreadwind $\c=$ self.spreads[-self.windowlength:] zscore $\mathbf{\lambda}=$ (spread - np.mean(spreadwind))/np.std(spreadwin return zscore

def placeorders(self, data, zscore): if zscore> $^{\cdot=2}$ .0 and not self.invested:

buy the spread, buying PEP and selling KO self.order('PEP', int(100/data['PEP'].price)) self.order('KO', -int(100/data['KO'].price)) self.invested $\c=$ True self.record(action $\mathbf{\sigma}=$ "PK") elif zscore $<=-2.0$ and not self.invested: # buy the spread, buying KO and selling PEP self.order('PEP', -int(100 / data['PEP'].price)) self.order('KO', int(100 / data['KO'].price)) self.invested $\mathbf{\sigma}=\mathbf{\sigma}$ True self.record(action $.=\boldsymbol{\cdot}\boldsymbol{\mathtt{K P}}^{\boldsymbol{\cdot}}$ ) elif abs(zscore)<.5 and self.invested: # minimize exposure koamount $\mathbf{\sigma}=$ self.portfolio.positions['KO'].amount self.order('KO', -1\koamount) pepamount $\mathbf{\sigma}=$ self.portfolio.positions['PEP'].amount self.order('PEP', -1\pepamount) self.invested $\c=$ False self.record(action $\mathbf{\sigma}=\mathbf{\sigma}$ 'DE') else: # take no action self.record(action $\mathbf{\sigma}=$ 'noop')

Then, we can run the algorithm with the following command:

In [34]:

perf $\mathbf{\lambda}=\mathbf{\lambda}$ Pairtrade().run(data['1997':]) [2015-02-16 01:54] INFO: Performance: Simulated 356 trading days out of 356. [2015-02-16 01:54] INFO: Performance: first open: 1997-01-02 $14:31:00+00:00$ [2015-02-16 01:54] INFO: Performance: last close: 1998-06-01 $20:00:00+00:00$

During the simulation of the algorithm, we recorded any transactions made, which can be accessed using the action column of the result DataFrame:

In [35]: selection $\mathbf{\sigma}=\mathbf{\sigma}$ ((perf.action $\mathbf{=}=$ 'PK') | (perf.action=='KP') (perf.action $\mathbf{=}=$ 'DE')) actions $\mathbf{\sigma}=\mathbf{\sigma}$ perf[selection][['action']] actions

Out[35]:
1997-07-16 20:00:00 KP
1997-07-22 20:00:00 DE
1997-08-05 20:00:00 PK
1997-10-15 20:00:00 DE
1998-03-09 21:00:00 PK
1998-04-28 20:00:00 DE

Our algorithm made six transactions. We can examine these transactions by visualizing the prices, spreads, z-scores, and portfolio values relative to when we made transactions (represented by vertical lines):

In [36]: ax1 $\mathbf{\lambda}=\mathbf{\lambda}$ plt.subplot(411) data[['PEP', 'KO']].plot( $a x=a x1$ ) plt.ylabel('Price') data.Spread.plot( $a x=a x2$ ) plt.ylabel('Spread') ax3 $\mathbf{\sigma}=\mathbf{\sigma}$ plt.subplot(413) perf['1997':].zscore.plot()

ax3.axhline(2, $\mathtt{c o l o r}\mathtt{=}^{\mathtt{i}}\mathtt{k}^{\mathtt{-}}\mathtt{^{|}}$ )
ax3.axhline(-2, color='k')
plt.ylabel('Z-score')
ax4 $\mathbf{\lambda}=\mathbf{\lambda}$ plt.subplot(414)
perf['1997':].portfoliovalue.plot()
plt.ylabel('Protfolio Value')
for ax in [ax1, ax2, ax3, ax4]: for d in actions.index[actions.action $\underline{{\boldsymbol{\mathbf{\Pi}}}}=\underline{{\boldsymbol{\mathbf{\delta\pi}}}}$ 'PK']: ax.axvline(d, $\mathsf{c o l o r}="\mathfrak{g}".$ ) for d in actions.index[actions.action $\mathbf{=}\mathbf{=}$ 'KP']: ax.axvline(d, $\mathsf{c o l o r}\mathsf{\mathbf{\Psi}="}\mathsf{c}^{\textsf{\textsf{\textsf{\textsf{\textsf{\alpha}}}}}})$ for d in actions.index[actions.action $\mathbf{=}=$ 'DE']: ax.axvline(d, color='r')

plt.gcf().setsizeinches(16, 12)

The first event is on 1997-7-16 when the algorithm saw the spread become less than -2, and, therefore, triggered a sale of KO and a buy of PEP. This quickly turned around and moved to a z-score of 0.19 on 1997-7-22, triggering a divesting of our position. During this time, even though we played the spread, we still lost because a reversion happened very quickly.

On 1997-08-05, the z-score moved above 2.0 to 2.12985 and triggered a purchase of KO and a sale of PEP. The z-score stayed around 2.0 until 1997-10-15 when it dropped to -0.1482 and, therefore, we divested. Between those two dates, since the spread stayed fairly consistent around 2.0, our playing of the spread made us consistent returns as we can see with the portfolio value increasing steadily over that period.

On 1998-03-09, a similar trend was identified, and again, we bought KO and sold PEP.
Unfortunately the spread started to minimize and we lost a little during this period.

Summary

In this chapter, we took an adventure into learning the fundamentals of algorithmic trading using pandas and Zipline. We started with a little theory to set a framework for understanding how the algorithms would be implemented. From there, we implemented three different trading algorithms using Zipline and dived into the decisions made and their impact on the portfolios as the transactions were executed. Finally, we established a fundamental knowledge of how to simulate markets and make automated trading decisions.

8 Working with Options

In this chapter, we will examine working with options data provided by Yahoo! Finance using pandas. Options are a type of financial derivative and can be very complicated to price and use in investment portfolios. Because of their level of complexity, there have been many books written that are focus heavily on the mathematics of options. Our goal will not be to cover the mathematics in detail but to focus on understanding several core concepts in options, retrieving options data from the Internet, manipulating it using pandas, including determining their value, and being able to check the validity of the prices offered in the market.

In this chapter, we will specifically cover:

A brief introduction to options
Retrieving options data from Yahoo! Finance
Examining the attributes of an option
Implied volatility, including smiles and smirks
Calculating the payoff of options
Determining the profit and loss of options
The pricing of options using Black-Scholes
Using Mibian to price and determine the implied volatility of options
with Black-Scholes
An introduction to the Greeks
Examining the behavior of the Greeks

Introducing options

An option is a contract that gives the buyer the right, but not the obligation, to buy or sell an underlying security at a specific price on or before a certain date. Options are considered derivatives as their price is derived from one or more underlying securities. Options involve two parties: the buyer and the seller. The parties buy and sell the option, not the underlying security.

There are two general types of options: the call and the put. Let's look at them in detail:

Call: This gives the holder of the option the right to buy an underlying security at a certain price within a specific period of time. They are similar to having a long position on a stock. The buyer of a call is hoping that the value of the underlying security will increase substantially before the expiration of the option and, therefore, they can buy the security at a discount from the future value.
Put: This gives the option holder the right to sell an underlying security at a certain price within a specific period of time. A put is similar to having a short position on a stock. The buyer of a put is betting that the price of the underlying security will fall before the expiration of the option and they will, thereby, be able to gain a profit by benefitting from receiving the payment in excess of the future market value.

The basic idea is that one side of the party believes that the underlying security will increase in value and the other believes it will decrease. They will agree upon a price known as the strike price, where they place their bet on whether the price of the underlying security finishes above or below this strike price on the expiration date of the option.

Through the contract of the option, the option seller agrees to give the buyer the underlying security on the expiry of the option if the price is above the strike price (for a call).

The price of the option is referred to as the premium. This is the amount the buyer will pay the seller to receive the option. The price of an option depends upon many factors, of which the following are the primary factors:

The current price of the underlying security
How long the option needs to be held before it expires (the expiry date)
The strike price on the expiry date of the option
The interest rate of capital in the market
The volatility of the underlying security
There being an adequate interest between buyer and seller around the
given option

The premium is often established so that the buyer can speculate on the future value of the underlying security and be able to gain rights to the underlying security in the future at a discount in the present.

The holder of the option, known as the buyer, is not obliged to exercise the option on its expiration date, but the writer, also referred to as the seller, however, is obliged to buy or sell the instrument if the option is exercised.

Options can provide a variety of benefits such as the ability to limit risk and the advantage of providing leverage. They are often used to diversify an investment portfolio to lower risk during times of rising or falling markets.

There are four types of participants in an options market:

Buyers of calls Sellers of calls Buyers of puts Sellers of puts

Buyers of calls believe that the underlying security will exceed a certain level and are not only willing to pay a certain amount to see whether that happens, but also lose their entire premium if it does not. Their goal is that the resulting payout of the option exceeds their initial premium and they, therefore, make a profit. However, they are willing to forgo their premium in its entirety if it does not clear the strike price. This then becomes a game of managing the risk of the profit versus the fixed potential loss.

Sellers of calls are on the other side of buyers. They believe the price will drop and that the amount they receive in payment for the premium will exceed any loss in the price. Normally, the seller of a call would already own the stock. They do not believe the price will exceed the strike price and that they will be able to keep the underlying security and profit if the underlying security stays below the strike price by an amount that does not exceed the received premium. Loss is potentially unbounded as the stock increases in price above the strike price, but that is the risk for an upfront receipt of cash and potential gains on the loss of price in the underlying instrument.

A buyer of a put is betting that the price of the stock will drop beyond a certain level. By buying a put they gain the option to force someone to buy the underlying instrument at a fixed price. By doing this, they are betting that they can force the sale of the underlying instrument at a strike price that is higher than the market price and in excess of the premium that they pay to the seller of the put option.

On the other hand, the seller of the put is betting that they can make an offer on an instrument that is perceived to lose value in the future. They will offer the option for a price that gives them cash upfront, and they plan that at maturity of the option, they will not be forced to purchase the underlying instrument. Therefore, it keeps the premium as pure profit. Or, the price of the underlying instruments drops only a small amount so that the price of buying the underlying instrument relative to its market price does not exceed the premium that they received.

Notebook setup

The examples in this chapter will be based on the following configuration in IPython:

In [1]:

import pandas as pd
import numpy as np
import pandas.io.data as web
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
pd.setoption('display.notebookreprhtml', False)
pd.setoption('display.maxcolumns', 7)
pd.setoption('display.maxrows', 15)
pd.setoption('display.width', 82)
pd.setoption('precision', 3)

Options data from Yahoo! Finance

Options data can be obtained from several sources. Publicly listed options are exchanged on the Chicago Board Options Exchange (CBOE) and can be obtained from their website. Through the DataReader class, pandas also provides built-in (although in the documentation, this is referred to as experimental) access to options data.

The following command reads all currently available options data for AAPL:

In [2]:

aaploptions $\mathbf{\lambda}=\mathbf{\lambda}$ web.Options('AAPL', 'yahoo') aaploptions $\mathbf{\lambda}=\mathbf{\lambda}$ aaploptions.getalldata().resetindex()

This operation can take a while as it downloads quite a bit of data. Fortunately, it is cached so that subsequent calls will be quicker, and there are other calls to limit the types of data downloaded (such as just getting puts).

For convenience, the following command will save this data to a file for quick reload at a later time. Also, it helps with the repeatability of the examples. The data retrieved changes very frequently, so the actual examples in the book will use the data in the file provided with the book. It saves the data for later use (it's commented out for now so it does not overwrite the existing file). Here's the command we are talking about:

In [3]:

#aaploptions.tocsv('aaploptions.csv')

This data file can be reloaded with the following command:

In [4]: aaploptions $\mathbf{\lambda}=\mathbf{\lambda}$ pd.readcsv('aaploptions.csv', parsedates $\mathbf{\lambda}=$ ['Expiry'])

I highly recommend that you use the data file for the purposes of going along with the chapter as options data changes very frequently and loading directly from the Web will make the results you get completely different from those in the chapter.

Whether from the Web or the file, the following command restructures and tidies the data into a format best used in the examples that follow:

In [5]:

aos $\mathbf{\lambda}=\mathbf{\lambda}$ aaploptions.sort(['Expiry', 'Strike'])[ ['Expiry', 'Strike', 'Type', 'IV', 'Bid', 'Ask', 'UnderlyingPrice']]
aos['IV'] $\mathbf{\sigma}=\mathbf{\sigma}$ aos['IV'].apply(lambda x: float(x.strip('%')))

Now, we can take a look at the data retrieved:

In [6]: aos

Out[6]:

www.it-ebooks.info

There are 1,103 rows of options data available. The data is sorted by Expiry and then the Strike price to help demonstrate examples.

Expiry is the data at which the particular option will expire and potentially be exercised. We have the following expiry dates that were retrieved. Options typically are offered by an exchange on a monthly basis and within a short overall duration from several days to perhaps two years. In this dataset, we have the following expiry dates:

In [7]: aos['Expiry'].unique()

Out[7]:

array(['2015-02-26T17:00:00.000000000-0700', '2015-03-05T17:00:00.000000000-0700', '2015-03-12T18:00:00.000000000-0600', '2015-03-19T18:00:00.000000000-0600', '2015-03-26T18:00:00.000000000-0600', '2015-04-01T18:00:00.000000000-0600', '2015-04-16T18:00:00.000000000-0600', '2015-05-14T18:00:00.000000000-0600', '2015-07-16T18:00:00.000000000-0600', '2015-10-15T18:00:00.000000000-0600', '2016-01-14T17:00:00.000000000-0700', '2017-01-19T17:00:00.000000000-0700'],
dtype $\mathbf{\sigma}=$ 'datetime64[ns]')

For each option's expiration date, there are multiple options available, split between puts and calls, and with different strike values, prices, and associated risk values.

As an example, the option with the index 158 that expires on 2015-02-27 is for buying a call on AAPL with a strike price of $\$75$ . The price we would pay for each share of AAPL would be the bid price of $\$53.60$ . Options typically sell 100 units of the underlying security, and, therefore, this would mean that this option would cost $100\times\$53.60$ or $\$5,360$ upfront:

In [8]: aos.loc[158]

Out[8]:

This $\$5,360$ does not buy us 100 shares of AAPL. It gives us the right to buy 100 shares of AAPL on $2015-02-27$ at $\$75$ per share. We should only buy if the price of AAPL is above $\$75$ on 2015-02-27. If not, we will have lost our premium of $\$5360$ and purchasing below will only increase our loss.

Also, note that these quotes were retrieved on 2015-02-25. This specific option has only two days until it expires. That has a huge effect on its pricing. We will examine the payout on options in detail in the next section, but in short, we can derive the following points from this purchase:

We have paid $\$5,360$ for the option to buy 100 shares of AAPL on 2015-02-27 if the price of AAPL is above $\$75$ on that date. The price of AAPL when the option was priced was $\$128.79$ per share. If we were to buy 100 shares of AAPL now, we would have paid $\$12,879$ . If AAPL is above $\$75$ on $2015-02-27$ , we can buy 100 shares for $\$7500$ .

There is not a lot of time between the quote and Expiry of this option. With AAPL being at $\$128.79$ , it is very likely that the price will be above $\$75$ in two days' time.

Therefore, in two days' time:

We can walk away if the price is $\$75$ or above. Since we paid $\$5360$ , we probably wouldn't want to do that.
At $\$75$ or above, we can force the execution of the option, where we give the seller $\$7,500$ and receive 100 shares of AAPL. If the price of AAPL is still $\$128.79$ per share, then we will have bought $\$12,879$ of AAPL for $\$7,500+95,360$ , or $\$12,860$ in total. Technically, we will have saved $\$19$ over two days! But only if the price didn't drop.
If, for some reason, AAPL dropped below $\$75$ in two days, we kept our loss to our premium of $\$5,360$ . This is not great, but if we had bought $\$12,879$ of AAPL on 2015-02-5 and it dropped to $\$74.99$ on 2015-02-27, we would have lost $\$12,879-\$499,499,$ or $\$5,380.50,$ we actually would have saved $\$20$ in loss by buying the call option.

It is interesting how this math works out. Excluding transaction fees, options are a zero-loss game. It just comes down to how much risk is involved in the option versus your upfront premium and how the market moves. If you feel you know something, it can be quite profitable. Of course, it can also be devastatingly unprofitable.

We will not examine the put side of this example. It would suffice to say it works out similarly from the side of the seller.

Implied volatility

There is one more field in our dataset that we didn't look at—implied volatility (IV). We won't get into the details of the mathematics of how this is calculated, but this reflects the amount of volatility that the market has factored into the option.

This is different to historical volatility (which is typically the standard deviation of the previous year of returns). We will look at pricing the option in a later section, but this comes out of pricing models as the amount of volatility needed for the strike price/premium value over the duration of the option contract to make those numbers work out nicely, as we have previously shown.

In general, it is informative to examine the IV relative to the strike price on a particular Expiry date. The following command shows this in tabular form for calls on 2015-02-27:

In [9]: calls1 $\mathbf{\lambda}=\mathbf{\lambda}$ aos[(aos.Expiry $\mathbf{\sigma}=\mathbf{\sigma}$ '2015-02-27') & (aos.Type $\mathbf{\sigma}=\mathbf{\sigma}$ 'call')] calls1[:5]

Out[9]:

It appears that as the strike price approaches the underlying price, the implied volatility decreases. Plotting this shows it even more clearly:

In [10]:

ax = aos[(aos.Expiry $\mathrel{\mathop=}\infty$ '2015-02-27') & (aos.Type $\mathbf{=}\mathbf{=}$ 'call')] \ .setindex('Strike')[['IV']].plot(figsize $\mathbf{\lambda}=$ (12,8)) ax.axvline(calls1.UnderlyingPrice.iloc[0], color='g');

The shape of this curve is important as it defines points where options are considered to be either in or out of the money. A call option is referred to as in the money when the options strike price is below the market price of the underlying instrument. A put option is in the money when the strike price is above the market price of the underlying instrument. Being in the money does not mean that you will profit; it simply means that the option is worth exercising.

Where and when an option is in our out of the money can be visualized by examining the shape of its implied volatility curve. Because of this curved shape, it is generally referred to as a volatility smile as both ends tend to turn upwards at both ends, particularly, if the curve has a uniform shape around its lowest point. This is demonstrated in the following graph, which shows the nature of being in or out of the money for both puts and calls:

A skew on the smile demonstrates a relative demand that is greater toward the option being either in or out of the money. When this occurs, the skew is often referred to as a smirk.

Volatility smirks

Smirks can either be reverse or forward. The following graph demonstrates a reverse skew, similar to what we have seen with our AAPL 2015-02-27 call:

In a reverse-skew smirk, the volatility for options at lower strikes is higher than at higher strikes. This is the case with our AAPL options expiring on 2015-02-27. This means that the in-the-money calls and out-of-the-money puts are more expensive than the out-of-the-money calls and in-the-money puts.

A popular explanation for the manifestation of the reverse volatility skew is that investors are generally worried about market crashes and buy puts for protection. One piece of evidence supporting this argument is the fact that the reverse skew did not show up for equity options until after the crash of 1987.

Another possible explanation is that in-the-money calls have become popular alternatives to outright stock purchases as they offer leverage and, hence, increased ROI. This leads to greater demand for in-the-money calls and, therefore, increased IV at the lower strikes.

The other variant of the volatility smirk is the forward skew. In the forward-skew pattern, the IV for options at the lower strikes is lower than the IV at higher strikes. This suggests that the out-of-the-money calls and in-the-money puts are in greater demand compared to the in-the-money calls and out-of-the-money puts:

The forward-skew pattern is common for options in the commodities market. When supply is tight, businesses would rather pay more to secure supply than to risk supply disruption, for example, if weather reports indicate a heightened possibility of an impending frost, fear of supply disruption will cause businesses to drive up demand for out-of-the-money calls for the affected crops.

Calculating payoff on options

The payoff of an option is a relatively straightforward calculation based upon the type of the option and is derived from the price of the underlying security on expiry relative to the strike price. The formula for the call option payoff is as follows:

$$
P a y o f f(c a l l)=M a x(S{T}-X,0)
$$

The formula for the put option payoff is as follows:

$$
P a y o f f(p u t)=M a x(X-S{T},0)
$$

We will model both of these functions and visualize their payouts.

The call option payoff calculation

An option gives the buyer of the option the right to buy (a call option) or sell (a put option) an underlying security at a point in the future and at a predetermined price. A call option is basically a bet on whether or not the price of the underlying instrument will exceed the strike price. Your bet is the price of the option (the premium). On the expiry date of a call, the value of the option is 0 if the strike price has not been exceeded. If it has been exceeded, its value is the market value of the underlying security.

The general value of a call option can be calculated with the following function:

In [11]:

def callpayoff(priceatmaturity, strikeprice): return max(0, priceatmaturity - strikeprice)

When the price of the underlying instrument is below the strike price, the value is 0 (out of the money). This can be seen here:

In [12]: callpayoff(25, 30)

Out[12]:

0

When it is above the strike price (in the money), it will be the difference between the price and the strike price:

In [13]:

callpayoff(35, 30)

Out[13]:

5

The following function returns a DataFrame object that calculates the return for an option over a range of maturity prices. It uses np.vectorize() to efficiently apply the callpayoff() function to each item in the specific column of the DataFrame:

In [14]:

def callpayoffs(minmaturityprice, maxmaturityprice, strikeprice, step $\mathbf{\tau}=\mathbf{1}$ ): maturities $\mathbf{\sigma}=\mathbf{\sigma}$ np.arange(minmaturityprice, maxmaturityprice $^+$ step, step) payoffs $\mathbf{\lambda}=\mathbf{\lambda}$ np.vectorize(callpayoff)(maturities, strikeprice) df $\mathbf{\sigma}=\mathbf{\sigma}$ pd.DataFrame({'Strike': strikeprice, 'Payoff': payoffs}, index $\v=$ maturities) df.index.name $\mathbf{\lambda}=\mathbf{\lambda}$ 'Maturity Price'
return df

The following command demonstrates the use of this function to calculate the payoff of an underlying security at finishing prices ranging from 10 to 25 and with a strike price of 15:

In [15]: callpayoffs(10, 25, 15)

Out[15]:

Using this result, we can visualize the payoffs using the following function:

In [16]:

def plotcallpayoffs(minmaturityprice, maxmaturityprice, strikeprice, step $\mathbf{\tau}=\mathbf{1}$ ): payoffs $\mathbf{\sigma}=\mathbf{\sigma}$ callpayoffs(minmaturityprice, maxmaturityprice, strikeprice, step) plt.ylim(payoffs.Payoff.min() - 10, payoffs.Payoff. $\tt m a x()+\tt10)$ plt.ylabel("Payoff") plt.xlabel("Maturity Price") plt.title('Payoff of call option, Strike={0}' .format(strikeprice)) plt.xlim(minmaturityprice, maxmaturityprice) plt.plot(payoffs.index, payoffs.Payoff.values);

The payoffs are visualized as follows:

In [17]: plotcallpayoffs(10, 25, 15)

The put option payoff calculation

The value of a put option can be calculated with the following function:

In [18]: def putpayoff(priceatmaturity, strikeprice): return max(0, strikeprice - priceatmaturity)

While the price of the underlying is below the strike price, the value is 0:

In [19]: putpayoff(25, 20)

Out[19]:

0

When the price is below the strike price, the value of the option is the difference between the strike price and the price:

In [20]: putpayoff(15, 20)

Out[20]:

5

This payoff for a series of prices can be calculated with the following function:

In [21]:

def putpayoffs(minmaturityprice, maxmaturityprice, strikeprice, step $\mathbf{\tau}=\mathbf{1}$ ): maturities $\mathbf{\sigma}=\mathbf{\sigma}$ np.arange(minmaturityprice, maxmaturityprice $^+$ step, step) payoffs $\mathbf{\lambda}=\mathbf{\lambda}$ np.vectorize(putpayoff)(maturities, strikeprice) df $\mathbf{\sigma}=\mathbf{\sigma}$ pd.DataFrame({'Payoff': payoffs, 'Strike': strikeprice}, index $\c=$ maturities) df.index.name $\mathbf{\lambda}=\mathbf{\lambda}$ 'Maturity Price' return df

The following command demonstrates the values of the put payoffs for prices of 10 through 25 with a strike price of 25:

In [22]: putpayoffs(10, 25, 15)

Out[22]:

Payoff Strike Maturity Price 10 5 15 11 4 15 12 3 15 13 2 15 14 1 15 21 0 15 22 0 15 23 0 15 24 0 15 25 0 15 [16 rows x 2 columns]

The following function will generate a graph of payoffs:

In [23]:

f plotputpayoffs(minmaturityprice, maxmaturityprice, strikeprice, step $\mathbf{\tau}=\mathbf{1}$ ): payoffs $\mathbf{\sigma}=\mathbf{\sigma}$ putpayoffs(minmaturityprice, maxmaturityprice, strikeprice, step) plt.ylim(payoffs.Payoff.min() - 10, payoffs.Payoff.max() + 10) plt.ylabel("Payoff") plt.xlabel("Maturity Price") plt.title('Payoff of put option, Strike={0}' .format(strikeprice)) plt.xlim(minmaturityprice, maxmaturityprice) plt.plot(payoffs.index, payoffs.Payoff.values);

The following command demonstrates the payoffs for prices between 10 and 25 with a strike price of 15:

In [24]: plotputpayoffs(10, 25, 15)

Profit and loss calculation

The general idea with an option is that you want to make a profit on speculation on the movement of the price of a security in the market, over a predetermined time frame.

The amount of profit or loss from the option can be calculated using a combination of the upfront premium and the payoff value of the option upon expiration. It is a zero-sum game as when a buyer profits by a certain amount, the seller loses the same amount, and vice versa.

The following table summarizes all of the profit and loss situations for both the buyer and seller when entering into options contracts:

The call option profit and loss for a buyer

A buyer of a call will pay to the seller the premium to obtain the option being in a loss situation until the payoff exceeds the premium.

This can be demonstrated using the following function, which given the premium and strike price and returns a DataFrame of return values for a range of maturity prices for the buyer of a call:

In [25]:

def callpnlbuyer(premium, strikeprice, minmaturityprice, maxmaturityprice, step $\mathbf{\lambda}=\mathbf{\lambda}{1})$ ):

payoffs $\mathbf{\lambda}=\mathbf{\lambda}$ callpayoffs(minmaturityprice,
maxmaturityprice, strikeprice)
payoffs['Premium'] $\mathbf{\sigma}=\mathbf{\sigma}$ premium
payoffs['PnL'] $\mathbf{\sigma}=\mathbf{\sigma}$ payoffs.Payoff - premium
return payoffs

The following command calculates the values of a call option starting at a price of 12 and with a strike price of 15 through the maturity values of 10 to 30:

In [26]: pnlbuyer $\mathbf{\lambda}=\mathbf{\lambda}$ callpnlbuyer(12, 15, 10, 35) pnlbuyer

Out[26]:

[26 rows x 4 columns]

The following function will visualize information in this DataFrame:

In [27]:

def plotpnl(pnldf, okind, who): plt.ylim(pnldf.Payoff.min() - 10, pnldf.Payoff.max() + 10) plt.ylabel("Profit / Loss") plt.xlabel("Maturity Price")

plt.title('Profit and loss of {0} option, {1}, Premium={2} Strike={3}' .format(okind, who, pnldf.Premium.iloc[0], pnldf.Strike.iloc[0])) plt.ylim(pnldf.PnL.min()-3, pnldf.PnL.max() + 3) plt.xlim(pnldf.index[0], pnldf.index[len(pnldf.index)-1]) plt.plot(pnldf.index, pnldf.PnL) plt.axhline(0, color $\mathbf{\sigma}=$ 'g');

This visualizes the particular DataFrame with the following chart:

In [28]: plotpnl(pnlbuyer, "put", "Buyer")

The profit and loss stays at a loss of the initial premium until the payoff begins to increase from 0 as the maturity price exceeds the strike price. There is a loss until the payoff exceeds the premium, which, in this case, is at $\$27$ (the premium and the strike price).

The call option profit and loss for the seller

A seller of a call will initially profit from the receipt of the premium from the buyer. The profit for a seller will be the premium as long as the price at maturity is below the strike price. As the payoff increases for the buyer, the profit for the seller decreases and will eventually become a loss once the buyer moves into profit.

This can be demonstrated using the following function, which, given the premium and strike price, returns a DataFrame of returns values for a range of maturity prices for the seller of a call:

In [29]:

def callpnlseller(premium, strikeprice, minmaturityprice, maxmaturityprice, step $\mathbf{\lambda}=\mathbf{\lambda}\mathbf{1}\mathbf{\dot{\eta}}$ ): payoffs $\mathbf{\sigma}=\mathbf{\sigma}$ callpayoffs(minmaturityprice, maxmaturityprice, strikeprice) payoffs['Premium'] $\mathbf{\sigma}=\mathbf{\sigma}$ premium payoffs['PnL'] $\mathbf{\sigma}=\mathbf{\sigma}$ premium - payoffs.Payoff return payoffs

The following command calculates the values of a call option starting at a price of 12 and with a strike price of 15 through the maturity values of 10 to 30:

In [30]: pnlseller $\mathbf{\lambda}=\mathbf{\lambda}$ callpnlseller(12, 15, 10, 35) pnlseller

This visualizes a particular DataFrame with the following chart:

In [31]:

plotpnl(pnlseller, "call", "Seller")

The profit and loss stays at a profit matching the premium until the payoff begins to increase from 0 as the maturity price exceeds the strike price. There is a profit obtained until the payoff amount exceeds the premium, which in this case is at $\$27$ (the premium $^+$ the strike price), at which point the seller of the call will increasingly be at a loss as the maturity value increases.

Combined payoff charts

There will be many instances where you will see the payoffs/profit and loss for both the buy and seller represented on a single chart. The following function will do this for us:

Out[32]:

def plotcombinedpnl(pnldf): plt.ylim(pnldf.Payoff.min() - 10, pnldf.Payoff.max() + 10) plt.ylabel("Profit / Loss")

plt.xlabel("Maturity Price")
plt.title('Profit and loss of call option Strike={0}' .format(pnldf.Strike.iloc[0]))
plt.ylim(min(pnldf.PnLBuyer.min(), pnldf.PnLSeller.min())-3, max(pnldf.PnLBuyer.max(), pnldf.PnLSeller.max())+3)
plt.xlim(pnldf.index[0], pnldf.index[len(pnldf.index)-1])
plt.plot(pnldf.index, pnldf.PnLBuyer, color $\mathbf{\bar{\rho}}=\mathbf{\rho}$ 'b')
plt.plot(pnldf.index, pnldf.PnLSeller, color='r')
plt.axhline(0, color='g');

This function expects to be given a DataFrame, which combines data from both the profit and loss functions' calls and puts. This DataFrame can be constructed as follows:

In [33]: pnlcombined $\mathbf{\sigma}=\mathbf{\sigma}$ pd.DataFrame({'PnLBuyer': pnlbuyer.PnL, 'PnLSeller': pnlseller.PnL, 'Premium': pnlbuyer.Premium, 'Strike': pnlbuyer.Strike, 'Payoff': pnlbuyer.Payoff})

pnlcombined

Out[33]:

Now, passing this in to the function, we are presented with the following graph with both series of profit and loss plotted:

In [34]: plotcombinedpnl(pnlcombined)

This shows how the overall effect of buying and selling an option is a zero-sum game. There are fixed losses or gains for the buyer and seller as long as the maturity price is below the strike price. A maturity price above the strike price begins to flow value back to the buyer from the seller. Conceptually, there is unlimited upside for the buyer and unlimited downside for the seller.

The put option profit and loss for a buyer

A buyer of a put pays a premium to the put seller. They are at a loss of the premium if the maturity price exceeds the strike price. As the maturity price falls below the strike price at maturity, the loss will decrease. There will be an overall loss until the payoff exceeds the premium.

This can be demonstrated using the following function which, given the premium and strike price, returns a DataFrame of returns values for a range of maturity prices for the buyer of a put option:

In [35]:

def putpnlbuyer(premium, strikeprice, minmaturityprice, maxmaturityprice, step $\mathbf{\lambda}=\mathbf{\lambda}\mathbf{1}\mathbf{\dot{\lambda}}$ ): payoffs $\mathbf{\lambda}=\mathbf{\lambda}$ putpayoffs(minmaturityprice, maxmaturityprice, strikeprice) payoffs['Premium'] $\mathbf{\sigma}=\mathbf{\sigma}$ premium payoffs['Strike'] $\mathbf{\sigma}=\mathbf{\sigma}$ strikeprice payoffs['PnL'] $\mathbf{\sigma}=\mathbf{\sigma}$ payoffs.Payoff - payoffs.Premium return payoffs

The following command calculates the profit and loss of a put option for the buyer starting at a price of 2 and with a strike price of 15 through the maturity values of 10 to 30:

In [36]: pnlputbuyer $\mathbf{\sigma}=\mathbf{\sigma}$ putpnlbuyer(2, 15, 10, 30) pnlputbuyer

Out[36]:

The following function will visualize information in this DataFrame:

In [37]:

plotpnl(pnlputbuyer, "put", "Buyer")

There is a tendency to read this chart as the put buyer profiting at the purchase of the put option. Remember that the horizontal axis is not time that increases from left to right. Although it looks as though the buyer profits by $\$3$ at the onset of purchasing the option, this chart really shows how profit and loss varies at maturity for different maturity prices. As long as the maturity price is greater than the strike price, there is only a loss of the amount of the premium. The more the maturity price finishes below the strike price, the better the chance to earn profit.

The put option profit and loss for the seller

A seller of a put receives the premium from the buyer of the put. They have a profit of the premium if the maturity price exceeds the strike price. As the maturity price falls below the strike price at maturity, the profit will decrease by the amount of the payoff.

This can be demonstrated using the following function, which, given the premium and strike price, returns a DataFrame of returns values for a range of maturity prices for the seller of a put option:

In [38]:

def putpnlseller(premium, strikeprice, minmaturityprice, maxmaturityprice, step $\mathbf{\lambda}=\mathbf{\lambda}\mathbf{1}\mathbf{\dot{\lambda}}$ ): payoffs $\mathbf{\lambda}=\mathbf{\lambda}$ putpayoffs(minmaturityprice, maxmaturityprice, strikeprice) payoffs['Premium'] $\mathbf{\sigma}=\mathbf{\sigma}$ premium payoffs['Strike'] $\mathbf{\sigma}=\mathbf{\sigma}$ strikeprice payoffs['PnL'] $\mathbf{\sigma}=\mathbf{\sigma}$ payoffs.Premium - payoffs.Payoff return payoffs

The following command calculates the profit and loss of a put option for the seller starting at a price of 2 and with a strike price of 15 through the maturity values of 10 to 30:

In [39]: pnlputseller $\mathbf{\sigma}=\mathbf{\sigma}$ putpnlseller(30, 45, 20, 50) pnlputseller

Out[39]:

The following function will visualize information in this DataFrame:

In [40]:

plotpnl(pnlputseller, "put", "Seller")

The pricing of options

There are two general styles of options: European and American. A European option is an option that cannot be exercised before its expiration date. An American option can be exercised at any point before its expiration date. American options are the most common form of options traded in the market.

The pricing model of the two styles of options is significantly different. Since a European option can only be exercised at its expiration, there exists a closed form calculation for its market price. The common form of modeling for a European option is the Black-Scholes pricing model.

The pricing of American options is complicated by their ability to be exercised at any time, which prevents them having a closed-form pricing model. However, there are several ways to price an American option, one of which we will examine later in the chapter and is known as the binomial tree method.

A general characteristic of an American option compared to a European option is that its price generally will be higher due to the flexibility and increased risk on the counterparty side.

We will examine the pricing of European options using the Black-Scholes formula. Our purpose is not to derive a complete understanding of how the prices are derived but to use a pricing library to verify the price and implied volatility of options retrieved from Yahoo! Finance.

Additionally, we will examine several underlying characteristics of the options referred to as The Greeks, which are various partial derivatives of the Black-Scholes formula relative to the various parameters of the function. These values are often used in decision making with respect to the purchase of options.

The pricing of options with Black-Scholes

The Black-Scholes formula was developed by Fischer Black and Myron Scholes and is a stochastic partial-differential equation that estimates the price of an option, specifically a European option, which is an option that can only be exercised at the end of its life. This is in contrast to an American option, which can be exercised at any point after its purchase.

The basic idea behind Black-Sholes is to determine the value today of an options contract for an underlying security in a year. The contract will have different values depending upon whether the stock goes up or down, so the payoff curve is not symmetrical. The model helps us to derive an underlying measure of the probabilities of the underlying security ending up at various values at the end of the year. If we can determine this, then we can also estimate a value for the contract.

The Black-Scholes model also makes several assumptions to keep the modeling simple:

There is no arbitrage
There is the ability to borrow money at a constant risk-free interest rate
throughout the life of the option
There are no transaction costs
The pricing of the underlying security follows a Brownian motion with
constant drift and volatility
No dividends are paid from the underlying security

This seems to be a list of very important assumptions but it is needed to get a baseline model in place. More complicated scenarios can then be handled with other derivations, but even with these assumptions, the resulting model is quite representative of actual prices (as we will see).

Deriving the model

There are three primary factors that are taken into account for determining the value of an option:

The value of the cash to buy the option The value of the underlying security that is received (if any) The volatility of the underlying price during the life of the option

We have seen these three factors taken into account in our payoff models. We now need to quantify these a bit more to be able to work out their expected values and derive a value for the contract.

The value of the cash to buy

If the option is exercised, then the cash is paid only if the underlying stock price is above the strike at maturity. Therefore, we need to determine the expected value based upon the probability that the stock finishes above the strike price. The strike price will be referred to as $k^{\star},$ , and the probability of the stock finishing above $\mathbf{k}^{\mathrm{r}}$ will be referred to as $N(\vec{a}{\perp})$ . The expected value is then $N(d{2})F{2}^{r}$ with $N()$ representing the cumulative normal function. The $\vec{a}{2}$ variable represents a formulation of the probability of the option exceeding the strike price (a little more on this later).

Given that the expected value is $N(\vec{a}{2})\vec{K},$ , this amount can be discounted using $e^{-r(T-t)}$ to give us the value of the cash to buy the option today as $N(d{2})K E^{-p(T-t)}$ .

The value of the stock received

If the option is exercised, then we take possession of the underlying security at its value in the market at the maturity of the option. It happens that the expected value of this is proportional to the current value of the stock, which is referred to as $5$ . In the Black-Scholes model, this expected value is referred to as $N(d{1})5$ .

$N(\vec{a}{1})$ represents the proportion of the value of the current value of the stock, $5$ at maturity of the option only if the option is exercised and 0 otherwise. Like $d{\geq},d{\mathbf{1}}$ will be stated a little later.

The formulas

Options are either calls or puts, so there are two derivations of the model. The simpler of the two is the model for call options:

$$
C(S,t)=N(d{1})S-N(d{2})K\bar{e}^{-r(T-t)}
$$

This states that the value of the call is the difference between the stock price and the strike price using the probability scaling of each and discounting the strike price.

The formula for a put is slightly more complicated but similar:

$$
P(S,t)=K e^{-r(T-t)}-S+C(S,t)=N(-d{2})K e^{-r(T-t)}-N(-d{1})S
$$

d1 and d2

Finally, we get to $d{1}$ and $d{2}$ . These formulas are at the heart of the Black-Scholes model. The mathematics of $\vec{a}{1}$ and $d{2}$ are fairly complex and represent the probability scale factors for the stock price $(d{1})$ and strike price $(d{2})$ using the cumulative normal function $\pmb{N}[\widetilde{\bigtriangledown}$ . These will be presented as follows without further explanation in this text. The formula for $\vec{a{\mathrm{I}}}$ is as follows:

$$
d{1}={\frac{1}{\sigma{\sqrt{T-t}}}}\left[\ln\left({\frac{S}{K}}\right)+\left[r+{\frac{\sigma^{2}}{2}}\right\right]
$$

The formula for $d{2}$ is as follows:

$$
d{2}={\frac{1}{\sigma{\sqrt{T-t}}}}\left\ln\left({\frac{S}{K}}\right)+\left[r-{\frac{\sigma^{2}}{2}}\right\right]=d{1}-\sigma{\sqrt{T}}
$$

These appear complex (and their derivation is) but are easily implemented in a programming language with the values simply plugged in. Also, the volatility of the underlying price is represented in these equations by the sigma variable.

The parameters that can be plugged in are the following:

N: The cumulative normal function
T: Time to maturity expressed in years
S: The stock price or other underlying assets
K: The strike price
r: The risk-free interest rate

You may have noticed that we have not parameterized the volatility. This is one of the things you need to remember using Black-Scholes. The volatility will be implied via the other parameters.

Now, with this all in hand, we can now implement the Black-Sholes algorithm in Python.

Black-Scholes using Mibian

For the sake of brevity, we will not get into the actual implementation of BlackScholes in Python. Instead, we will use a small but convenient library: MibianLib. MibianLib is available at http://code.mibian.net/ and is open source. It provides several methods for options price calculation, one of which is Black-Scholes. You can examine the implementation to verify the previous formulations.

Now, let's examine the basic use of Mibian to calculate values using Black-Scholes. To do this, we will examine two options that we retrieved from Yahoo! Finance earlier in the chapter—the put and call expiring on 2015-01-15 with IV of 57.23 (the put) and 52.73 (the call):

In [41]: aos[aos.Expiry $\scriptstyle==$ '2016-01-15'][:2]

Out[41]:

At the time of retrieving these, these options are 324 days from expiring:

In [42]:

date(2016, 1, 15) - date(2015, 2, 25)

Out[42]:

datetime.timedelta(324)

We have now collected all of the parameters to use the Black-Scholes pricing (using an assumed 1 percent interest rate):

In [43]: import mibian c $\mathbf{\lambda}=\mathbf{\lambda}$ mibian.BS([128.79, 34.29, 1, 324], 57.23)

The call price can be retrieved via the .callPrice property:

In [43]: c.callPrice

Out[44]: 94.878970089456217

Our result is a few cents off the actual quoted bid but between the bid and ask prices. Given that we assumed a 1 percent interest rate, the result is right in the range we would expect.

The put price is retrieved via the .putPrice property:

In [45]: c.putPrice

Out[45]: 0.075934592996542705

This is very close to the ask value of the put option.

We can also use Mibian to calculate the implied volatility:

In [46]: c $\mathbf{\lambda}=\mathbf{\lambda}$ mibian.BS([128.79, 34.29, 1, 324], callPrice $\mathbf{\sigma}=$ 94.878970089456217 )
Out[46]: 57.22999572753906

Charting option price change over time

It can be useful to plot the price of an option until its expiration. We can do this by varying the time to expiration and plotting the results. This can be done very easily using pandas.

The following command calculates the call price for the AAPL option, varying from 1 to 364 days to expiry, and plots the change in price showing that the price of the call decreases as the number of days to expiry increases:

In [47]: df $\mathbf{\sigma}=\mathbf{\sigma}$ pd.DataFrame({'DaysToExpiry': np.arange(364, 0, -1)}) df

DaysToExpiry 0 364 1 363 2 362 3 361 4 360 .. · 359 5 360 4 361 3 362 2 363 1 [364 rows x 1 columns]

In [48]:

bsv1 $\mathbf{\sigma}=\mathbf{\sigma}$ mibian.BS([128.79, 34.29, 1, 324], volatility $\mathbf{\bar{\rho}}=\mathbf{\rho}$ 57.23)
calccall $\mathbf{\sigma}=\mathbf{\sigma}$ lambda r: mibian.BS([128.79, 34.29, 1, r.DaysToExpiry], volatility $\scriptstyle\mathbf{\varepsilon=}57$ .23).callPrice
df['CallPrice'] $\mathbf{\lambda}=\mathbf{\lambda}$ df.apply(calccall, axis $\mathbf{\tau}=\mathbf{1}$ )
df

DaysToExpiry CallPrice 0 364 94.96 1 363 94.96 2 362 94.96 3 361 94.96 4 360 94.95 .. ... ... 359 5 94.50 360 4 94.50 361 3 94.50 362 2 94.50 363 1 94.50 [364 rows x 2 columns]

The following graph shows the call price decreasing as the days to expiry also decreases:

In [49]: df[['CallPrice']].plot();

The Greeks

The Greeks are quantities representing the sensitivity of the price of options to the change in the underlying parameters of the valuation of the derivative. The first-order Greeks of options represent the change value relative to the change in price, volatility, and time to expiry. Second-order and third-order Greeks do exist, but we will only focus on the first-order Greeks and a single second-order Greek known as Gamma.

The first-order Greeks are named and represented in the following table:

The Greeks are important tools in risk management to manage the exposure of individual investments or combinations, such as in an investment portfolio. We will not get into the detailed use for risk management as that is beyond the scope of this book (and pandas), but they are worth mentioning in a chapter on options pricing.

Calculation and visualization

The Greeks in Black-Scholes are straightforward to calculate and are given with the following formulas:

We will not examine their implementation in this book, especially since they are implemented in Mibian. However, we will demonstrate how the Greeks vary in value by creating a DataFrame to alternate the values of the input in the Black-Scholes pricing algorithm:

In [50]:

greeks $\mathbf{\lambda}=\mathbf{\lambda}$ pd.DataFrame()
delta $\mathbf{\lambda}=\mathbf{\lambda}$ lambda r: mibian.BS([r.Price, 60, 1, 180], volatility $=30$ ).callDelta
gamma $\mathbf{\lambda}=\mathbf{\lambda}$ lambda r: mibian.BS([r.Price, 60, 1, 180], volatility $=30$ ).gamma
theta $\mathbf{\lambda}=\mathbf{\lambda}$ lambda r: mibian.BS([r.Price, 60, 1, 180], volatility $=30$ ).callTheta
vega $\mathbf{\sigma}=\mathbf{\sigma}$ lambda r: mibian.BS([r.Price, 60, 1, 365/12], volatility $=30$ ).vega
greeks['Price'] $\mathbf{\sigma}=\mathbf{\sigma}$ np.arange(10, 70)
greeks['Delta'] $\mathbf{\sigma}=\mathbf{\sigma}$ greeks.apply(delta, axis $\mathbf{\tau}=\mathbf{1}$ )
greeks['Gamma'] $\mathbf{\lambda}=\mathbf{\lambda}$ greeks.apply(gamma, axis $\mathbf{\tau}=\mathbf{1}$ )
greeks['Theta'] $\mathbf{\sigma}=\mathbf{\sigma}$ greeks.apply(theta, axis $\mathbf{\tau}=\mathbf{1}$ )
greeks['Vega'] $\mathbf{\sigma}=\mathbf{\sigma}$ greeks.apply(vega, axis $\mathbf{\tau}=\mathbf{1}$ )
greeks[:5]

Out[50]:

The following plot demonstrates how the different values for Delta, Gamma, Theta, and Vega change for this particular option relative to change in their respective parameters:

In [51]: greeks[['Delta', 'Gamma', 'Theta', 'Vega']].plot();

Summary

In this chapter, we examined several techniques for using pandas to calculate the prices of options, their payoffs, and the profit and loss for the various combinations of calls and puts for both buyers and sellers. We started with a brief introduction to options, covered how to load current market data for options from Yahoo! Finance, and then examined the properties of the data retrieved from the web services.

We then examined the pricing of options using Black-Scholes with a brief explanation of how the algorithm models option prices. We also used the Mibian library to calculate prices using Black-Scholes. We finished with a brief explanation of the Greeks and how to calculate their values for various configurations of options.

In the next chapter, we will look at the modeling of investment portfolios using Python and pandas and how we can calculate optimal portfolios that balance risk and return for different investor types.

9 Portfolios and Risk

A portfolio is a grouping of financial assets, which may include stocks, bonds, and mutual funds. It is generally accepted that a portfolio is designed based upon an investor's risk tolerance, time frames, and investment goals. The allocation of the assets in a portfolio, referred to as asset allocation, influences the risk/reward ratio of the portfolio. The specific assets in a portfolio and the relative weighting of the assets within the portfolio are designed to maximize the expected return, while also minimizing the risk.

The process of determining the proper assets and their proportion relative to each other within a portfolio involves a concept known as modern portfolio theory (MPT). This is a theory in finance that has evolved since the 1950s and describes the mathematics of constructing an optimal portfolio based upon risk and return parameters. This involves selecting assets that are correlated based upon historical returns, in such a manner that they function to diversify the portfolio.

In this chapter, we will examine the concepts of modern portfolio theory. We will first start with an overview of MPT and how it utilizes a concept known as the 'efficient frontier' to determine an optimal portfolio. We will then examine a means of modeling a portfolio with pandas, and then implement the mathematics of MPT to calculate optimum portfolios and determine and visualize the efficient frontier for a particular mix of assets. The chapter then closes of with a brief discussion of Value at Risk, which helps us to understand the level potential loss that can be expected in a portfolio for a specific period of time.

In this chapter, we will cover the following:

An overview of modern portfolio theory
Mathematical models of portfolios
Risk and expected return
The concepts of diversification and the efficient frontier

Modeling a portfolio with pandas Gathering historical stock data within a portfolio Modeling different weights of assets in a portfolio Optimization and minimization using SciPy Calculating the Sharpe ratio of a portfolio Constructing an efficient portfolio Visualizing the efficient frontier for a set of assets Computing Value at Risk (VaR)

Notebook setup

The examples in this chapter will be based upon the following configuration in IPython. One main difference in this setup is that in this chapter, we will be using SciPy, specifically its optimization and statistical features, so this has imports that are required for several of the examples:

In [1]: import pandas as pd import numpy as np import pandas.io.data as web from datetime import datetime import scipy as sp import scipy.optimize as scopt import scipy.stats as spstats import matplotlib.pyplot as plt import matplotlib.mlab as mlab %matplotlib inline

pd.setoption('display.notebookreprhtml', False) pd.setoption('display.maxcolumns', 7) pd.setoption('display.maxrows', 10) pd.setoption('display.width', 82) pd.setoption('precision', 3)

An overview of modern portfolio theory

Modern portfolio theory (MPT) is a theory of finance that attempts to maximize the expected return on a set of investments (known as the portfolio), relative to the overall risk of the combined items in the portfolio. The concept is that given a particular level of risk, the return will be maximized for that risk. This is common in retirement plans. The younger the investor and the smaller the amount in the portfolio, the more there is a willingness to take risks on higher returns. As the investor comes close to retirement and the total value of the portfolio is higher, the more likely they are to take lower risks, to ensure that the base of the portfolio is not lost but that at the tradeoff of potential gains being lower.

MPT provides a mathematical model of diversified investment with the goal of selecting a collection of investments that has a combined risk that is less than any individual asset in the portfolio. This is achievable by selecting individual investments that have opposite correlations such that when one particular investment goes down in value, another gains similarly in value and the overall net of the portfolio remains consistent or at least minimizes the loss during downturns. However, at the same time, this may also lower the overall gains in upturns. And additionally, diversification has a tendency to also lower risk even if various assets in the portfolio are not negatively correlated as the diversity itself tends to give an overall less risky portfolio.

MPT assumes an individual investment's returns as normally distributed and then defines risk as the standard deviation of the returns. It then models a portfolio as a weighted combination of the assets such that the return of the overall portfolio is a weighted sum of the combination of the returns of the assets. Then, by selecting a set of investments that are not perfectly correlated, MPT attempts to reduce the total variance of the overall portfolio return.

MPT was developed in the 1950s and through to the 1970s and represented a significant advance in financial modeling. As a theory, it is interesting and does have practical applications. But like other models of finance (for example, BlackScholes), it is heavily dependent on those assumptions and can lead to suboptimal results when those conditions are not met. Nonetheless, it is an important financial concept—one that can be implemented effectively using pandas and Python—and is important to understand before branching out into more detailed models.

Concept

The basic idea behind MPT is that assets in a portfolio should not be selected individually based upon their individual performance. It is instead important to consider how each asset changes in value relative to other assets in the portfolio. This represents a tradeoff between risk and expected return. The stocks in an efficient portfolio are chosen based on the investor's risk tolerance, with an efficient portfolio having at least two stocks above the minimum variance portfolio. For a given amount of risk, MPT describes how to select a portfolio out of a set of investments that has the highest expected return while being at or below the specified risk level. On the flip side, for a given return, MPT specifies how to select a portfolio with the least possible risk.

Mathematical modeling of a portfolio

In this chapter, we will examine the classical model of MPT. There have been many extensions, but we will focus on the core.

Risk and expected return

A fundamental assumption of MPT is that investors are risk averse. This means that given two portfolios that offer the same expected return, the investor will prefer the less risky portfolio. Therefore, an investor will only take on a riskier portfolio if higher expected returns make it worthwhile. And conversely, an investor wanting higher expected returns must accept greater risk.

MPT makes the assumption that the standard deviation of returns can be used as an accurate representation of risk. This is valid if asset returns are normally jointly distributed, which are otherwise elliptically distributed.

Then, under the model:

Portfolio return is the proportion-weighted combination of the constituents of the returns of the assets
Portfolio volatility is a function of the correlations of the constituent assets, for all pairs of assets $(i,j)$ .

This goes on up to $n$ assets in a portfolio. We will return to these formulas later when we implement them in Python and then with pandas when we optimize portfolios.

Diversification

An investor can then reduce risk by holding combinations of instruments that are not positively correlated. If the asset pairs are perfectly uncorrelated (correlation of 0), then the portfolio's return variance is the sum over all the instruments of the square of the fraction held in the instrument multiplied by the instrument's return variance.

The efficient frontier

Using this model, the risk and expected returns of all possible combinations of risky assets is computed. This can then be plotted in the risk-return space, a two-dimensional space with the risk along the $x$ axis and the expected return along the $y$ axis. The collection of all such portfolios will define a region of the graph, with the left edge of what forms a hyperbola. This following hyperbola is often referred to as the Markowitz Bullet:

The upper portion of the hyperbola, represented with a solid line, represents the efficient frontier. All portfolios along the solid portion on the line can only increase in return with increased risk. However, also note that any portfolio on the efficient frontier also has a matching portfolio on the lower half of the bullet, which represents a portfolio with the same amount of risk but with less expected return. All things considered, an investor will want to take the portfolio with higher return over one with lower return and with the same risk. Hence, only portfolios on the portion of the hyperbola at higher returns than the minimum variance portfolio are considered on the efficient frontier.

Modeling a portfolio with pandas

A basic portfolio model consists of a specification of one or more investments and their quantities. A portfolio can be modeled in pandas using a DataFrame with one column representing the particular instrument (such as a stock symbol) and the other representing the quantity of the item held.

The following command will create a DataFrame representing a portfolio:

In [2]: def createportfolio(tickers, weights $\mathbf{\sigma}=$ None): if (weights is None): shares $\mathbf{\lambda}=\mathbf{\lambda}$ np.ones(len(tickers))/len(tickers) portfolio $\mathbf{\lambda}=\mathbf{\lambda}$ pd.DataFrame({'Tickers': tickers, 'Weights': weights}, index $\c=$ tickers)

return portfolio

Using this, we can create a portfolio of two instruments, Stock A and Stock B. The amount of shares for each is initialized to 1. This would represent an equally weighted portfolio as the number of shares of each stock is the same:

In [3]: portfolio $\mathbf{\lambda}=\mathbf{\lambda}$ createportfolio(['Stock A', 'Stock B'], [1, 1]) portfolio

Out[3]:

Shares Tickers Stock A 1 Stock A Stock B 1 Stock B

We can then model mock returns for the last 5 years. The values used for returns are picked to demonstrate a point about creating an equally-weighted portfolio and to use negatively correlated instruments to create a representation of the diversification effect:

In [4]:

returns $\mathbf{\sigma}=\mathbf{\sigma}$ pd.DataFrame({'Stock A': [0.1, 0.24, 0.05, -0.02, 0.2],'Stock B': [-0.15, -0.2, -0.01, 0.04, -0.15]})

returns

Out[4]:

Stock A Stock B
0 0.10 -0.15
1 0.24 -0.20
2 0.05 -0.01
3 -0.02 0.04
4 0.20 -0.15

Using the portfolio share values and the returns, the following function will compute the equally-weighted return for the underlying instruments:

In [5]:

def calculateweightedportfoliovalue(portfolio, returns, name $\mathbf{\lambda}=$ 'Value'): totalweights $\mathbf{\sigma}=\mathbf{\sigma}$ portfolio.Weights.sum() weightedreturns $\mathbf{\lambda}=\mathbf{\lambda}$ returns $\star$ (portfolio.Weights / totalweights) return pd.DataFrame({name: weightedreturns.sum(axi $\mathsf{s}=\mathsf{1}$ )})

We can now calculate the equally-weighted portfolio and concatenate it with our original DataFrame of returns:

In [6]:

wr $\mathbf{\sigma}=\mathbf{\sigma}$ calculateweightedportfoliovalue(portfolio, returns, "Value")
withvalue $\mathbf{\lambda}=\mathbf{\lambda}$ pd.concat([returns, wr], axis $\mathbf{\tau}=\mathbf{1}$ )
withvalue

Out[6]:

We can examine the volatility of each of the individual instruments combined with the results of the weighted portfolio, as shown here:

In [7]: withvalue.std()

Out[7]:

Stock A 0.106677
Stock B 0.103102
Value 0.020310
dtype: float64

Stock A had a volatility of 11 percent and Stock B of 10 percent. The combined portfolio represented significantly lower volatility of 2 percent. This is because we picked two negatively correlated stocks with similar volatility and combining them has therefore reduced the overall risk.

We can visualize this using the following function:

In [8]: def plotportfolioreturns(returns, title $\mathrel{\mathop:}=$ None): returns.plot(figsize $\mathbf{\lambda}=$ (12,8)) plt.xlabel('Year') plt.ylabel('Returns') if (title is not None): plt.title(title) plt.show()

Also examine the following graph:

In [9]: plotportfolioreturns(withvalue)

It becomes apparent from this graph that the overall portfolio had much less variability, and hence risk, than those of the individual instruments in the portfolio.

Just to check, we can also calculate the correlation of the original returns:

In [10]: returns.corr()

Out[10]:

Stock A Stock B Stock A 1.000000 -0.925572 Stock B -0.925572 1.000000

The returns of our two stocks have a negative correlation of -0.93, which tells us that they can be used to offset each other's volatility.

This scenario used an equally-weighted portfolio of stocks that have a strong negative correlation and returns of similar magnitude. The real trick that we will examine in the upcoming sections will be to select an optimal portfolio from a set of stocks and to also determine the proper weighting for each stock to reach the optimized portfolio- that is, the efficient frontier.

Constructing an efficient portfolio

At the beginning of the chapter, we briefly covered the formulas to calculate the estimated return and variance of a portfolio. We will now dive into implementations of those calculations along with selecting portfolios that are on the efficient frontier.

To do this, we will need to cover the following concepts:

Gathering of historical returns on the assets in the portfolio Formulation of portfolio risk based on historical returns Determining the Sharpe ratio for a portfolio Selecting optimal portfolios based upon Sharpe ratios

Gathering historical returns for a portfolio

In our examples, we will use data retrieved from Yahoo! Finance to create historical returns for the stocks in the portfolio. The calculations we will perform will utilize annualized returns. Yahoo! Finance data represents daily prices for the stocks, so we will need to convert those prices into annualized returns.

We can start this process using the following function, which will retrieve the adjusted closing prices for a list of stocks between the two dates and organize it in a convenient way for the processes we will undertake:

In [11]:

def gethistoricalcloses(ticker, startdate, enddate): $\ p\ =$ web.DataReader(ticker, "yahoo", startdate, enddate) d = p.toframe()['Adj Close'].resetindex() d.rename(columns={'minor': 'Ticker', 'Adj Close': 'Close'}, inplace $\mathbf{\sigma}=$ True) pivoted $\mathbf{\sigma}=\mathbf{\sigma}$ d.pivot(index $\mathbf{\sigma}{\cdot}=\mathbf{\sigma}{\cdot}$ 'Date', columns $\mathbf{\lambda}=$ 'Ticker') pivoted.columns $\mathbf{\sigma}=\mathbf{\sigma}$ pivoted.columns.droplevel(0) return pivoted

Our examples will utilize $\mathtt{A A P L},$ MSFT, and KO stocks, from 2010-01-01 through 2014-12-31. We can retrieve those daily prices as follows:

In [12]:

closes $\mathbf{\lambda}=\mathbf{\lambda}$ efgethistoricalcloses(['MSFT', 'AAPL', 'KO'], '2010-01-01', '2014-12-31')

In [13]: closes[:5]

Out[13]:

Using this data, the following function will calculate annualized returns for each of the stocks. We start with the following function, which converts daily prices into daily returns:

In [14]: def calcdailyreturns(closes): return np.log(closes/closes.shift(1))

Our daily returns are shown here:

In [15]: dailyreturns $\mathbf{\sigma}=\mathbf{\sigma}$ calcdailyreturns(closes) dailyreturns[:5]

Out[15]:

From the daily returns, we can calculate annualized returns using the following function:

In [16]:

def calcannualreturns(dailyreturns): grouped $\mathbf{\lambda}=\mathbf{\lambda}$ np.exp(dailyreturns.groupby( lambda date: date.year).sum())-1 return grouped

This gives us the following as the annual returns:

In [17]: annualreturns $\mathbf{\sigma}=\mathbf{\sigma}$ calcannualreturns(dailyreturns) annualreturns

Out[17]:

Ticker AAPL KO MSFT
2010 0.507219 0.189366 -0.079442
2011 0.255580 0.094586 -0.045156
2012 0.325669 0.065276 0.057989
2013 0.080695 0.172330 0.442979
2014 0.406225 0.052661 0.275646

Formulation of portfolio risks

Since we now have a return matrix, we can estimate its variance-covariance matrix, and by combining it with a vector of weights for each of the assets, we can calculate the overall portfolio variance (this flows into the Sharpe ratio calculation we will do next).

The formulation of the portfolio variance starts with the calculation of the mean of the returns for an individual stock:

$$
\overline{{R}}=\frac{\sum{i=1}^{n}R{i}}{n}
$$

Using this, we can then calculate the variance in the returns of a single stock:

$$
\begin{array}{r}{\upsigma^{2}=\frac{\sum{i=1}^{n}(R{i}-\overline{{R}})^{2}}{n-1}}\end{array}
$$

Here, $R{i}$ is the stock's return for period $i,\overline{{R}}$ is the mean of the returns, and $\mathbb{r l}$ is the number of the observations.

The return volatility is simply the square root of the variance:

$$
\upsigma=\sqrt{\upsigma^{2}}
$$

A portfolio will consist of one or more stocks. The return matrix for those stocks consists of $n$ stocks and $m$ returns:

$$
R=\binom{R{1,1}}{\vdots}\quad\ddots\quad\vdots\atop{R{n,1}}
$$

Using this return matrix, we can derive the formula for the expected return of stock i:

$$
E(R{i})=\sum{i=1}^{n}w{i}R{i,n}
$$

Each stock will make up a certain percentage of the portfolio. We represent this mix of the stock in the portfolio using a vector of weights, $w,$ which necessarily sums up to 1:

$$
w=(w{1},w{1},w{1},\cdots,w{n})
$$

We can apply this vector of weights to the assets in an n-stock portfolio, resulting in the following formula that gives us the weighted expected return of the portfolio:

$$
E(R{p o r t})=\sum{i=1}^{n}w{i}E(R{i})
$$

The variance of an n-stock portfolio is formulated using the following formula:

$$
\upsigma{p o r t}^{2}=\sum{i=1}^{n}\sum{j=1}^{n}w{i}w{j}\upsigma{i}\upsigma{j}\rho{i j}
$$

Here $\rho{i j}$ is the correlation coefficient between returns on assets $i$ and $j,$ and $\rho{i j}=1$ for $i{=}j$ .

Examining this formula more closely, the following equation can be seen:

$$
\boldsymbol{\Sigma}=\mathbb{\sigma}{i}\mathbb{\sigma}{j}\mathbb{\rho}{i j}
$$

Sigma happens to be the covariance matrix calculated from the returns matrix.

Pulling this all together with the summations, we come to the following formula, which describes the variance of a weighted portfolio of n-stocks:

$$
{\upsigma}{p o r t}^{2}=W\sum{W}{'}
$$

Therefore, the variance of a portfolio is determined by multiplying the weights vector by the covariance matrix of the returns, and then multiplying that result by the transpose of the weights vector.

This can be very succinctly implemented in Python using NumPy arrays and matrices and the np.cov() function, which will calculate the covariance of the returns:

In [18]:

def calcportfoliovar(returns, weights $\mathbf{\sigma}=$ None): if (weights is None): weights $\mathbf{\sigma}=\mathbf{\sigma}$ np.ones(returns.columns.size) / \ returns.columns.size sigma $\mathbf{\lambda}=\mathbf{\lambda}$ np.cov(returns.T,ddof $\mathtt{\Omega}=0$ ) var $\mathbf{\sigma}=\mathbf{\sigma}$ (weights $\star$ sigma $\star$ weights.T).sum() return var

Using this function, the variance of our portfolio (using equal weighting for each stock) is determined by the following command:

In [19]: calcportfoliovar(annualreturns)
Out[19]: 0.0028795357274894692

The Sharpe ratio

The Sharpe ratio is a measurement of the risk-adjusted performance of portfolios. It is calculated by subtracting the risk-free rate from the expected return of a portfolio and then by dividing that result by the standard deviation of the portfolio returns. It is described by the following equation:

$$
S h a r p e=\frac{E(R)-R{f}}{\sigma{p}}
$$

The Sharpe ratio tells us whether a portfolio's returns are due to smart investment decisions or a result of excess risk. Although one portfolio or fund can reap higher returns than its peers, it is only a good investment if those higher returns do not come with too much additional risk. The greater a portfolio's Sharpe ratio, the better its risk-adjusted performance has been. A negative Sharpe ratio indicates that a less risky asset would perform better than the security being analyzed.

The following function calculates Sharp Ratio for a portfolio with specified returns, weights, and a risk-free rate:

In [20]:

def sharperatio(returns, weights $\mathbf{\lambda}=\mathbf{\lambda}$ None, riskfreerate $\mathbf{\lambda}=\mathbf{\lambda}$ 0.015): $\textbf{n}=$ returns.columns.size if weights is None: weights $\mathbf{\lambda}=\mathbf{\lambda}$ np.ones(n)/n var $\mathbf{\lambda}=\mathbf{\lambda}$ calcportfoliovar(returns, weights) means $\mathbf{\lambda}=\mathbf{\lambda}$ returns.mean() return (means.dot(weights) - riskfreerate)/np.sqrt(var)

www.it-ebooks.info

We can use this to evaluate the Sharpe ratio of our current portfolio with equal weights using the following statement:

In [21]: Sharperatio(returns)

Out[21]:

3.2010949029381952

Now that we can calculate the Sharpe ratio for a portfolio with a given set of weights, we need to be able to simulate the generation of different combinations of weights and select the weights where the Sharpe ratio is maximized. This will give us the efficient portfolio. This simulation of weights will be performed using SciPy's optimization capabilities.

Optimization and minimization

We now need to perform optimizations to find the efficient portfolio. Optimizations in Python can be performed using scipy.optimize. We will first demonstrate optimization using a basic example and then later, we will optimize portfolios based on Sharpe ratios.

Our basic example will be to minimize the following objective function:

$$
y=2+x^{2}
$$

Intuitively, we know that when $\mathbf{x}$ is 0, y is minimized. We can use this to check the results of the minimization. The first step is to define the function we wish to minimize:

In [22]: def $\mathbf{y}{-}\mathbf{\sigma}\mathbf{\hat{t}}\left(\mathbf{x}\right)$ : return $2+\mathbf{x}\times\mathbf{\times}2$

We can perform the optimization using SciPy's fmin() function. The value 1000 is passed as a seed value for $\mathbf{x},$ and the function will iterate values of $\mathbf{x}$ to find the value of $\mathbf{x}$ where $\boldsymbol{\mathrm{y\f}}$ is minimized:

In [23]:

scopt.fmin(yf, 1000)
Optimization terminated successfully. Current function value: 2.000000 Iterations: 27

Function evaluations: 54

Out[23]:

array([ 0.])

The fmin() function ran 27 iterations, called $\boldsymbol{\Upsilon}{-}\mathbf{f}\left(\mathbf{x}\right)$ with 54 different values of $\mathbf{x},$ , and determined that the minimum result is 2.0. The array that is returned contains the values for $\mathbf{x}$ at which $\begin{array}{r l r}{\mathbf{\nabla}\mathbf{\boldsymbol{Y}}\^{\mathrm{~f~}}(\mathbf{x})}&{{}=}&{2.}\end{array}$ , which is a single value $\scriptstyle\mathbf{x}=0$ .

Constructing an optimal portfolio

We are now able to create a function to use fmin() to determine the set of weights that maximize the Sharpe ratio for a given set of returns representing the stocks in our portfolio.

Since fmin() finds a minimum of the applied function, and the efficient portfolio exists at the maximized Sharpe ratio, we need to provide a function that, in essence, returns the negative of the Sharpe ratio, hence allowing fmin() to find a minimum:

In [24]:

def negativesharperationminus1stock(weights,

"""

Given n-1 weights, return a negative sharpe ratio

"""

return -sharperatio(returns, weights2, riskfreerate)

Our final function is given a DataFrame of returns, and a risk-free rate will run a minimization process on our negative sharpe function. The process is seeded with an array of equal weights, and fmin() will start from those values and try different combinations of weights until we find the minimized negative Sharpe ratio. The function then returns a tuple of the weights satisfying the minimization, along with the optimal Sharpe ratio:

In [25]:

def optimizeportfolio(returns, riskfreerate): w0 = np.ones(returns.columns.size-1, dtype $\mathbf{\lambda}=$ float) \ 1.0 / returns.columns.size w1 $\mathbf{\sigma}=\mathbf{\sigma}$ scopt.fmin(negativesharperationminus1stock,

w0, args $\mathbf{\lambda}=$ (returns, riskfreerate)) finalw = sp.append(w1, 1 - np.sum(w1)) finalsharpe $\mathbf{\lambda}=\mathbf{\lambda}$ sharperatio(returns, finalw, riskfreerate) return (finalw, finalsharpe)

Using this function, we can now determine the most efficient portfolio:

I n [26]:

optimizeportfolio(annualreturns, 0.0003)

Optimization terminated successfully. Current function value: -7.829864 Iterations: 46 Function evaluations: 89

Out[26]:

(array([ 0.76353353, 0.2103234 , 0.02614307]), 7.8298640872716048)

We are told that our best portfolio would have 76.4 percent AAPL, 21.0 percent KO, and 2.6 percent MSFT, and that portfolio would have a Sharpe ratio of 7.8298640872716048.

Visualizing the efficient frontier

Our optimization code generated the portfolio that is optimal for the specific risk-free rate of return. This is one type of? portfolio. To be able to plot all of the portfolios along the Markowitz bullet, we can change the optimization around a little bit.

The following function takes a weights vector, the returns, and a target return and calculates the variance of that portfolio with an extra penalty the further the mean is from the target return, so as to help push portfolios with weights further from the mean considering they are on the frontier:

In [27]:

def objfun(W, R, targetret): stockmean $\mathbf{\sigma}=\mathbf{\sigma}$ np.mean(R,axis=0) portmean $\mathbf{\sigma}=\mathbf{\sigma}$ np.dot(W,stockmean) cov=np.cov(R.T) portvar $\mathbf{\sigma}=\mathbf{\sigma}$ np.dot(np.dot(W,cov),W.T) penalty $\mathbf{\sigma}=\mathbf{\sigma}$ 2000\abs(portmean-targetret) return np.sqrt(portvar) $^+$ penalty

We now create a function that will run through a set of desired return values, ranging from the lowest returning stock to the highest returning stock. These create the bounds for the possible rates of returns.

Each of these desired returns is passed to an optimizer, which will create a weights vector that satisfies the minimization of the Sharpe ratio of a portfolio that matches that specific level of risk.

For each optimal set of weights, the program will return the mean and standard deviation (and weights) that represent the curve of the efficient frontier:

In [28]:

def calcefficientfrontier(returns): resultmeans $\mathbf{\sigma}=\mathbf{\sigma}$ [] resultstds $\mathbf{\lambda}=\mathbf{\lambda}$ [] resultweights $\mathbf{\sigma}=\mathbf{\sigma}$ [] means $\mathbf{\lambda}=\mathbf{\lambda}$ returns.mean() minmean, maxmean $\mathbf{\lambda}=\mathbf{\lambda}$ means.min(), means.max() nstocks $\mathbf{\lambda}=\mathbf{\lambda}$ returns.columns.size for r in np.linspace(minmean, maxmean, 100): weights $\mathbf{\lambda}=\mathbf{\lambda}$ np.ones(nstocks)/nstocks bounds $\mathbf{\sigma}=\mathbf{\sigma}$ [(0,1) for i in np.arange(nstocks)] constraints $\mathbf{\sigma}=\mathbf{\sigma}$ ({'type': 'eq', 'fun': lambda W: np.sum(W) - 1}) results $\mathbf{\sigma}=\mathbf{\sigma}$ scopt.minimize(objfun, weights, (returns, r), method='SLSQP', constraints $\mathbf{\lambda}=\mathbf{\lambda}$ constraints, bounds $\mathbf{\lambda}=\mathbf{\lambda}$ bounds) if not results.success:

handle error raise Exception(result.message) resultmeans.append(np.round(r,4)) # 4 decimal places std $\scriptstyle=\mathtt{n p}$ .round(np.std(np.sum(returns\results.x,axis $\mathbf{\tau}=\mathbf{1}$ )),6) resultstds.append(std)

resultweights.append(np.round(results.x, 5)) return {'Means': resultmeans, 'Stds': resultstds, 'Weights': resultweights}

Given our previous set of stocks (AAPL, MSFT, and KO), the following command will calculate all of the pairs of standard deviation and mean returns that fall on the efficient frontier:

In [29]: frontierdata $\mathbf{\lambda}=\mathbf{\lambda}$ calcefficientfrontier(annualreturns)

The frontierdata function is a dictionary that contains an array for each of the calculated standard deviations, mean returns, and weights that resulted from the optimization.

We can examine the results by inspecting the values of several of the items in the dictionary. The following command examines the first five standard deviations, means, and entries in an array of optimal weights:

In [30]: frontierdata['Stds'][:5]

Out[30]: [0.055842999999999997, 0.053446, 0.052564, 0.051706000000000002, .050871]

In [31]: frontierdata['Stds'][:5]

Out[31]:

[0.1148, 0.1169, 0.11890000000000001, 0.12089999999999999, 0.1229]

In [32]: frontierdata['Weights'][:5]

Out[32]:

[array([-0., 1., 0.]), array([ 0.00512, 0.9308 , 0.06408]), array([ 0.01497, 0.9177 , 0.06733]), array([ 0.02469, 0.90303, 0.07228]), array([ 0.03458, 0.89049, 0.07493])]

We can use the following function to visualize this efficient frontier:

In [33]:

def plotefficientfrontier(efdata): plt.figure(figsize $\mathbf{\lambda}=$ (12,8)) plt.title('Efficient Frontier') plt.xlabel('Standard Deviation of the porfolio (Risk))') plt.ylabel('Return of the portfolio') plt.plot(efdata['Stds'], efdata['Means'], '--');

The following shows how our efficient frontier look:

In [34]: plotefficientfrontier(frontierdata)

Value at Risk

Value at Risk (VaR) is a statistical technique used to measure the level of financial risk within an investment portfolio, over a specific timeframe. It measures in three variables—the amount of potential loss, the probability of the loss, and the timeframe.

As an example, a portfolio may have a 1-month 5 percent $\mathrm{VaR}$ of $\$1$ million. This means that there is a 5 percent probability that the portfolio will fall in value by more than $\$1$ million over a 1-month period. Likewise, it also means that a $\$1$ million loss should be expected once every 20 months.

The most common means of measuring VaR is by calculating the volatility. There are three common means of calculating the volatility: using historical data, variance-covariance, and the Monte Carlo simulation. We will examine the variance-covariance method here, as there is a straightforward formulation for the VaR once you have historical returns.

VaR assumes that returns are normally distributed. The returns for a stock or portfolio over the desired period of time can then be created, and then we can examine the amount of distribution of returns that fits within a z-score for the desired confidence interval.

This concept can be visualized using a normal distribution curve. Common percentages for VaR calculations typically are 1 percent and 5 percent. The following example demonstrates calculating a 99 percent confidence interval, which is where we would find the area in the normal distribution where the z-score less than -2.33:

To apply this to the returns of a stock, the formula for the VaR for a given period is shown here:

$$
V a R{p e r i o d}=p o s i t i o n(\mu{p e r i o d}-z\sigma{p e r i o d})
$$

The position is the current market value of the stock, $\mu{\tilde{\textrm{p e r i o d}}}$ is the mean of the returns for the specific period, and $\sigma{\mathscr{P}\mathscr{E}\mathscr{r}i\mathscr{Q}d}$ is the volatility (standard deviation of the returns); $z$ is the $z\cdot$ -score representing the specific confidence interval— $\scriptstyle\mathtt{z}=2.33$ for a 99 percent confidence interval, and $\scriptstyle{\mathtt{Z}}=1.64$ for a 95 percent confidence interval.

To demonstrate this, we will examine the 1-year VaR for AAPL using returns from the entirety of 2014. To calculate this, we can reuse the functions that we created for calculating an efficient frontier.

We start the analysis by loading the daily prices for 2014 for AAPL and calculating the daily returns:

In [35]: aaplcloses $\mathbf{\sigma}=\mathbf{\sigma}$ gethistoricalcloses(['AAPL'], datetime(2014, 1, 1), datetime(2014, 12, 31)) aaplcloses[:5]

Out[35]:

Ticker AAPL Date 2014-01-02 77.08570 2014-01-03 75.39245 2014-01-06 75.80357 2014-01-07 75.26144 2014-01-08 75.73806

In [36]: returns $\mathbf{\lambda}=\mathbf{\lambda}$ calcdailyreturns(aaplcloses) returns[:5]

Out[36]: Ticker AAPL Date

2014-01-02 NaN
2014-01-03 -0.022211
2014-01-06 0.005438
2014-01-07 -0.007177
2014-01-08 0.006313

We can plot these returns in a histogram to check that they appear to be normally distributed:

In [37]: plt.figure(figsize $\mathbf{\lambda}=$ (12,8)) plt.hist(returns.values[1:], bins $\mathtt{\Omega}=\mathtt{1}00$ );

We can explicitly code $z$ for the confidence interval, but we can also get the value of z for any percentage using norm.ppf() from scipy.stats:

In [38]: z = spstats.norm.ppf(0.95) z

Out[38]:

1.6448536269514722

We will model our position as though we have 1,000 shares of AAPL on 2014-12-31:

In [39]: position $\mathbf{\sigma}=\mathbf{\sigma}$ 1000 \ aaplcloses.ix['2014-12-31'].AAPL position

Out[39]: 109950.0

The VaR is calculated as follows:

In [40]: VaR $\mathbf{\sigma}=\mathbf{\sigma}$ position $\star$ (z \ returns.AAPL.std()) VaR

Out[40]: 2467.5489391697483

This states that our holdings in AAPL at $\$109,950$ have a VaR of $\$2,647$ . Therefore, our maximum loss in the next year is $\$2,647$ with a confidence of 95 percent.

Summary

In this chapter, we examined how to combine combinations of assets into a portfolio and how to model those portfolios using pandas objects. Using a portfolio, we examined how to calculate the overall risk involved in the portfolio, and learned how we can use negatively correlated assets to be able to minimize risk.

We then expanded upon this concept of risk minimization, using concepts from modern portfolio theory to be able to determine whether our portfolio represents the best mix of assets to yield the highest return at a specific level of risk. This included calculating the efficiency of a portfolio using the Sharpe ratio, and then using optimization tools from SciPy to determine the optimum allocation of instruments in the portfolio.

In closing, we went on a significant tour of using pandas to perform various tasks related to finance. We touched on a number of the features built directly into pandas to be able to model and manipulate financial data, particularly using time-series data and the capabilities pandas provides to help solve complicated date- and time-related problems. We also dived into other domain-specific analyses, such as historical stock analysis, analyzing social data to make trading decisions, algorithmic trading, options pricing, and portfolio management, thus offering a practical set of examples for you to learn these concepts.

Index

A

aggregating 63, 70-72
algorithmic trading about 168 mean-reversion strategies 169 momentum strategies 169 process 168 with Zipline 181
American option 233, 234
arithmetic operations, on DataFrame performing 36-38

B

backtesting 167
Black-Scholes deriving 235 formulas 236 implementing, Mibian used 237, 238 used, for pricing of options 234 value of cash, determining 235 value of received stock, determining 235
Boolean selection rows, selecting with 35, 36
box-and-whisker plots 122, 123
buyer 207
buyers of calls 207
buyers of puts 207

C

call option about 206 used, for calculating payoff on options 216-218

used, for profit and loss calculation of buyer 223-225 used, for profit and loss calculation of seller 226, 227 Chicago Board Options Exchange (CBOE) 208 classical model, MPT diversification 249 efficient frontier 249 expected return 248 risk 248 Coca-Cola (KO) 179 crossover about 177 example 178 pairs trading 179, 180 cumulative returns 163-165

D

data reorganizing 48 reshaping 48
data collection about 148, 149 data, from paper 149, 150 DJIA data, gathering from Quandl 151-154 Google Trends data 154-158
DataFrame about 15 arithmetic operations, performing 36-38 basics 15 code samples 26, 27 columns, selecting 27-29 creating 23-26 reindexing 39-42 rows, selecting by .iloc[] 32 rows, selecting by .ix[] property 33 rows, selecting by .loc[] 32 rows, selecting with index 30 scalar lookup, by label with .at[] 34 scalar lookup, by location with .iat[] 34 slicing, [] operator used 31
DataFrame objects merging 56-58
date representation URL 108
Delta 241
distribution of returns, analyzing about 116 box-and-whisker plots 122, 123 histograms 117-119 Q-Q plots 120, 121
Dow Jones Industrial Average (DJIA) 14

E

efficient frontier visualizing 262-264 European option 233, 234 exponentially weighted moving average 173-176

F

financial time-series data visualizatio about 103 candlesticks, plotting 107-111 closing prices, plotting 103-105 combined price and volumes 106 volume-series data, plotting 105
first-order Greeks about 240 Delta 241 Gamma 241 Rho 241 Theta 241 Vega 241
formulas, Black-Scholes for d1 236, 237 for d2 236, 237
frequency conversion, time-series data 91, 92
functions, for rolling windows rollingapply 128 rollingcorr 128 rollingcount 128 rollingcov 128 rollingkurt 128 rollingmax 128 rollingmean 128 rollingmedian 128 rollingmin 128 rollingquantile 128 rollingskew 128 rollingstd 128 rollingsum 128 rollingvar 128
fundamental financial calculations about 111 daily percentage change comparison, between stocks 124-126 distribution of returns, analyzing 116 simple daily cumulative returns, calculating 115 simple daily percentage change, calculating 112-114

G

Gamma 240, 241
Google Trends using 147, 148
Google Trends data 154-158
Greeks about 240, 241 calculation 241, 242 first-order Greeks 240 visualization 241, 242
grouping 63

H

histograms 117-119
historical quotes American Airlines (AA) 101 Apple (AAPL) 101 Coca-Cola (KO) 101 Delta Airlines (DAL) 101 General Electric (GE) 101 IBM (IBM) 101

www.it-ebooks.info

Microsoft (MSFT) 101 Pepsi (PEP) 101 United Airlines (UAL) 101 historical stock data fetching, from Yahoo! 101 loading 46 obtaining 100 organizing, for examples 47

I

implied volatility (IV) about 212-214 smirks 214, 215
index data fetching, from Yahoo! 102
inter-quartile range (IQR) 123

joins, pd.merge() inner 57 left 57 outer 57 right 57

M

matplotlib 1
mean-reversion strategies 169
melting 62
Mibian about 1 URL 237 used, for implementing Black-Scholes 237, 238
MibianLib 237
modern portfolio theory. See MPT
momentum strategies 169
moving averages about 169 exponentially weighted moving average 173-176 simple moving average 169-173
moving windows calculating 128
MPT about 245 classical model 248 concept 248 overview 247
multiple DataFrame objects concatenating 48-55

Notebook

implied volatility (IV) 212-214
options data, obtaining from Yahoo!
Finance 208-211
setting up 14, 46, 146, 208
setting up, SciPy used 246

O

online pandas documentation URL 74
optimal portfolio constructing 261, 262
options about 205, 206 benefits 207 call 206 data obtaining, from Yahoo! Finance 208-211 participants 207 payoff, calculating 216 put 206

P

pairs trading about 179 example 179, 180
pandas portfolio, modeling 250-254
pandas data structures DataFrame 15 Series 14
participants, options buyers of calls 207 buyers of puts 207 sellers of calls 207 sellers of puts 207
payoff, on options calculating 216 calculating, with call option 216-218 calculating, with put option 219-221
Pepsi (PEP) 179
pivoting 59
portfolio about 245 constructing 254 historical returns, gathering 254-256 minimization 260, 261 modeling, with pandas 250-254 optimization 260, 261 risks, formulation 256-259 Sharpe ratio 259, 260
premium 206
price, of options about 233 American 233, 234 charting, until expiration 238-240 European 233, 234 factors 206 Greeks 240, 241 with Black-Scholes 234
profit and loss calculation combined payoff charts 227-229 performing 221-223 with call option, for buyer 223-225 with call option, for seller 226, 227 with put option, for buyer 229-231 with put option, for seller 231, 232
put option about 206 used, for calculating payoff on options 219, 221 used, for profit and loss calculation of buyer 229-231 used, for profit and loss calculation of seller 231, 232

Q

Q-Q plots about 120, 121 URL 121

Quandl about 1, 8 DJIA data, gathering from 151-154 URL 8, 151
Quantifying Trading Behavior, in financial markets 147, 148
Quantopian about 9, 167 URL 9

R

resampling, time-series about 93 downsampling 93-97 upsampling 93-97
returns computing 161, 162
Rho 241
rolling windows calculating 128-132
rows selecting, with Boolean selection 35, 36

S

SciPy about 1 used, for setting up Notebook 246
sellers of calls 207
sellers of puts 207, 208
Series about 14 alignment, via index labels 21, 22 basics 15 creating 16-18 reindexing 39-42 shape, determining 20 size, determining 19 uniqueness, determining 20
Sharpe ratio 259, 260
simple moving average (SMA) about 169, 173 drawbacks 173 example 170-172
smirks 214, 215

S&P 500 stocks comparing 138-143 splitting 63-69 stacking 60-62

technical analysis techniques about 177 crossover 177, 178
Theta 241
time-series about 73 creating, with specific frequencies 82, 83 Notebook setup 74 Period objects, used for representing intervals of time 83-86 resampling 93-97
time-series data and DatetimeIndex 75-81 frequency conversion 91, 92 lagging 87-90 manipulating 74-81 Notebook, setting up 100 shifting 87-90
trade order signals generating 159-161

new packages, installing 7-9
reference 4
samples, installing 10-12
URL 2

Yahoo! Finance options data, obtaining 208-211

Zipline

about 1, 167, 181
buy apple example 181-191
dual moving average crossover
example 192-196
pairs trade example 196-203
URL 167
used, for algorithmic trading 181

U

unstacking 60-62

Value at Risk (VaR) 246, 266-269
volatility calculation about 133-135 least-squares regression of returns 136, 137 rolling correlation of returns 135, 136

W

Wakari

about 1, 2
cloud account, creating 3-6
existing packages, updating 6

Thank you for buying Mastering pandas for Finance

About Packt Publishing

Packt, pronounced 'packed', published its first book, Mastering phpMyAdmin for Effective MySQL Management, in April 2004, and subsequently continued to specialize in publishing highly focused books on specific technologies and solutions.

Our books and publications share the experiences of your fellow IT professionals in adapting and customizing today's systems, applications, and frameworks. Our solution-based books give you the knowledge and power to customize the software and technologies you're using to get the job done. Packt books are more specific and less general than the IT books you have seen in the past. Our unique business model allows us to bring you more focused information, giving you more of what you need to know, and less of what you don't.

Packt is a modern yet unique publishing company that focuses on producing quality, cutting-edge books for communities of developers, administrators, and newbies alike. For more information, please visit our website at www.packtpub.com.

About Packt Open Source

In 2010, Packt launched two new brands, Packt Open Source and Packt Enterprise, in order to continue its focus on specialization. This book is part of the Packt Open Source brand, home to books published on software built around open source licenses, and offering information to anybody from advanced developers to budding web designers. The Open Source brand also runs Packt's Open Source Royalty Scheme, by which Packt gives a royalty to each open source project about whose software a book is sold.

Writing for Packt

We welcome all inquiries from people who are interested in authoring. Book proposals should be sent to author@packtpub.com. If your book idea is still at an early stage and you would like to discuss it first before writing a formal book proposal, then please contact us; one of our commissioning editors will get in touch with you.

We're not just looking for published authors; if you have strong technical skills but no writing experience, our experienced editors can help you develop a writing career, or simply get some additional reward for your expertise.

[PACKT] open source\ communityexperiencedistilled PUBLISHING

IPython Notebook Essentials

ISBN: 978-1-78398-834-1

Paperback: 190 pages

Compute scientific data and execute code interactively with NumPy and SciPy

Perform Computational Analysis interactively.

2. Create quality displays using matplotlib and Python Data Analysis.

Step-by-step guide with a rich set of examples and a thorough presentation of the IPython Notebook.

Python for Finance

ISBN: 978-1-78328-437-5 Paperback: 408 pages

Build real-life Python applications for quantitative finance and financial engineering

1. Estimate market risk, form various portfolios, and estimate their variance-covariance matrixes using real-world data.

Explains many financial concepts and trading strategies with the help of graphs.

3. A step-by-step tutorial with many Python programs that will help you learn how to apply Python to finance.

Please check www.PacktPub.com for information on our titles

Learning IPython for Interactive Computing and Data Visualization

ISBN: 978-1-78216-993-2 Paperback: 138 pages

Learn IPython for interactive Python programming, high-performance numerical computing, and data visualization

1. A practical step-by-step tutorial, which will help you to replace the Python console with the powerful IPython command-line interface.

Use the IPython Notebook to modernize the way you interact with Python.

3. Perform highly efficient computations with NumPy and pandas.

ISBN: 978-1-78328-481-8 Paperback: 512 pages

Over 100 hands-on recipes to sharpen your skills in high-performance numerical computing and data science with Python

IPython Interactive Computing and Visualization Cookbook

1. Leverage the new features of the IPython Notebook for interactive web-based big data analysis and visualization.

Become an expert in high-performance computing and visualization for data analysis and scientific modeling.

3. A comprehensive coverage of scientific computing through many hands-on, example-driven recipes with detailed, step-by-step explanations.

Please check www.PacktPub.com for information on our titles

-->