2016-01-02

EthereumでMacからhello worldする備忘

Ethereumでhello worldするのが意外と大変だったので備忘

Ethereum公式の技術説明

Tutorial:
https://github.com/ethereum/wiki/wiki/%5BJapanese%5D--Ethereum-Development-Tutorial

White Paper:
https://github.com/ethereum/wiki/wiki/%5BJapanese%5D-White-Paper

Ethereumでhello world

一番てっとり早いturorial:
http://consensys.github.io/developers/articles/101-noob-intro/#browser_ides

Ethereum公式tutorial:
https://ethereum.gitbooks.io/frontier-guide/content/
はじめてのDaap:
https://dappsforbeginners.wordpress.com/tutorials/your-first-dapp/

0. node.jsの環境構築

1. testrpcでethereumのテスト環境を立ち上げる(本体につなぐと大変)
https://github.com/ConsenSys/eth-testrpc

# brew install eth-testrpc
# testrpc
→デフォルトでport8545を使用

2. node.js上でweb3.jsをEthereumへのインターフェースとして利用
https://github.com/ethereum/web3.js/tree/master

オブジェクト生成
# Web3=require('web3')
# web3=Web3.new()
# web3.setProvider(new web3.providers.HttpProvider('http://localhost:8545'));

番外編

GUIの開発環境を利用する
http://meteor-dapp-cosmo.meteor.com/

本番環境でethereumを掘る
同期にmacbook air(SSD)で3時間ほどかかりました

2015-01-11

Quandmodで金融データをとってStanで同時相関の変化を見る

Stan

中盤はテクニカルな話題なので、統計や経済に馴染みのない人は読み飛ばして最後だけ読んでください

金融関係のレポートを見ていると、よく相関係数の変化を示すグラフが登場します。

(zero hedge http://www.zerohedge.com/news/2014-12-30/spot-difference)

（日銀「最近の株価と為替の同時相関関係の強まりについて」）

だいたいの場合において、こうした相関係数はその地点の前100日〜500日くらいの期間をとってヒストリカルな相関（sample correlation）をローリングして可視化するだけといった簡単なものになっています。
統計的推定の文脈でいうと、これはfiltering（過去のデータのみを用いてその地点の真の値を推定する）で得られる値に近く、smoothing（後のデータも加味して後だしじゃんけんで推定する）ではありません。
予測の文脈では過去のデータのみしか使えないのでfilteringが大事なのですが、だいたいこういう相関係数は「いつどの指数同士がどう相関していたか」という振り返りに使われているので、smoothingのほうが適切です。もちろん「smoothingっぽくするには前後のx日をとればいい」ということにもなりますが、期間xのとりかた如何で相関係数の変化の具合も変わるので、もっと洗練された方法はないのか、という話になります。

ということで、相関係数ρ自体がランダムウォークで時間変化する非線形状態空間モデルをStanで推定してしまいましょう。
モデルとしてはこんな感じです。

$\binom{y_{t1}}{y_{t2}} \sim Normal \left( \binom{0}{0},\begin{pmatrix} \sigma_1^2 & logistic(\rho_t)\sigma_1\sigma_2 \\ logistic(\rho_t)\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right) \\ \\ \rho_{t+1} \sim Normal \left( \mu_{\rho} (1-\phi_{\rho}) + \phi_{\rho} \rho_t, \sigma_{\rho}^2 \right ) \\ \\ logistic(x)=\frac{2}{1+exp^{-x}}-1$

t期の各系列の対数リターン(y_t1,y_t2)が正規分布に従うという典型的なランダムウォークですが、ノイズの部分が相関を持つというものです。こうした構造は統計用語ではSeemingly Unrelated(SUR model, 見かけ上無相関のモデル)という用語で呼ばれます。

さらにこの相関係数ρをランダムウォークさせてしまおうというのが今回の目論みです。ここまでモデルが複雑になると、一般に太刀打ちが難しい（世の中に存在するパッケージソフトやライブラリでは到底推定不可能）のですが、StanはMCMC(マルコフ連鎖モンテカルロ)という汎用的なソリューションを用いているので、モデルを記述するだけで解決してくれます。

ちなみにこれは、2変量確率的ボラティリティ変動モデル(Bivariate Stochastic Volatility Model)の特殊ケースにあたります。
相関係数は-1〜1という制約があるので、ランダムウォークしてもそこに収まるようにうまくロジスティック関数で変換しています。
（各系列の分散の変化も考える場合には、コレスキー分解済みの三角行列の各要素をランダムウォークさせて、それの積をとって共分散行列をつくったほうが楽かもしれません。）

【もととなるデータの取得】
Rのライブラリquantmodが楽です。データソースの指定に関しては、株はYahooやGoogle,FXはOandaなど、ほかのソースからもとってくることが出来ますが、FREDhttp://research.stlouisfed.org/fred2/(FRBが管理している経済データ集)がおすすめです。

library(vars)

library(quantmod)

library(dplyr)

library(car)

library(ggplot2)

library(reshape2)

library(PerformanceAnalytics)

data(Canada)
master_quote=read.csv("./master_quotes.csv",stringsAsFactors=F)
dat_all=list()

daily_return=list()

com_indx=c()

daily_return_bundle=list()

close_bundle=list()
get_data=function(quote,src){

  if(src=="yahooj"){

    tmp_dat=data.frame(na.omit(rfj(quote,start="2000-1-1")))

    ret=tmp_dat["Close"]

    rownames(ret)=as.Date(gsub(" JST","",tmp_dat$Date)) 

    colnames(ret)=quote

    r=as.xts(ret)

    index(r)=as.Date(gsub(" JST","",tmp_dat$Date)) 

    r

  }else{

    na.omit(getSymbols(quote,src=src,auto.assign=F))

  }

}
for(i in 1:nrow(master_quote)){

  quote=master_quote[i,"quote"]

  src=master_quote[i,"src"]

  adj=master_quote[i,"adj"]

  dat_allquote=get_data(quote,src)

  index(dat_allquote)=index(dat_allquote)+adj

  

  daily_returnquote=dailyReturn(dat_allquote,type="log")  

  

  if(i==1){

    com_indx=index(daily_returnquote)    

  }else{

    com_indx=intersect(com_indx,index(daily_returnquote))

  }

}

for(i in 1:nrow(master_quote)){

  quote=master_quote[i,"quote"]

  src=master_quote[i,"src"]

  daily_return_bundlequote=daily_returnquote[as.Date(com_indx)]
  close_bundlequote=if(src=="oanda"||src=="FRED"||src=="yahooj"){dat_allquote[index(dat_allquote)%in%as.Date(com_indx)]}else{Cl(dat_allquote)[as.Date(com_indx)]}

  colnames(daily_return_bundlequote)=quote

  colnames(close_bundlequote)=quote

}

drb.df=data.frame(daily_return_bundle)

close.df=data.frame(close_bundle)
drb.ts=ts(drb.df)

close.ts=ts(close.df)
roll.cor = function(x) {

  cor(x)[1,2]

}
pdf("test.pdf",family="Japan1")
chart.Correlation(drb.df)
cnames=colnames(drb.df)

for(cnum1 in 1:length(cnames)){

  for(cnum2 in 1:length(cnames)){

    if(cnum1<cnum2){

      cname1=cnames[cnum1]

      cname2=cnames[cnum2]

      drb.df_filter=drb.df[c(cname1,cname2)]

      roll.cor.vals = rollapply(as.zoo(drb.df_filter), width=300,

                                by.column=FALSE,

                                FUN=roll.cor, 

                                align="right")

      roll.cor.vals.df=data.frame(roll.cor.vals)

      correlation=roll.cor.vals.df[rownames(close.df),]

      close.df_selected=close.df[c(cname1,cname2)]

      colnames(close.df_selected)=c(master_quote[master_quote$quote==cname1,"name"],master_quote[master_quote$quote==cname2,"name"])

      

      d=cbind(correlation,close.df_selected)

      

      d$date=rownames(d)

      d_plot=melt(d,id="date")

      d_plot$date=as.Date(d_plot$date)

      

      g=ggplot(d_plot,aes(date, value, col=variable, group=1))+

        geom_line()+

        facet_grid(variable~., scale='free_y')

      plot(g)

    }

  }

}
dev.off()
write.csv(drb.df,file="drb.df.csv")

write.csv(close.df,file="close.df.csv")

参照しているcsvデータは以下になります。

quote,src,adj,name

NIKKEI225,FRED,0,日経平均

NASDAQCOM,FRED,0,ナスダック

SP500,FRED,0,S&P500

DCOILWTICO,FRED,0,WTI原油

DEXJPUS,FRED,0,ドル円(FED)

VIXCLS,FRED,0,VIX

TEDRATE,FRED,0,TEDスプレッド

USD6MTD156N,FRED,0,6ヵ月LIBOR

USEPUINDXD,FRED,0,政策の不確実性

GOLDAMGBD228NLBM,FRED,0,金(LONDON)

DTWEXM,FRED,0,ドルインデックス

DEXUSAL,FRED,0,米ドル豪ドル(FRED)

実行すると、ローリングでの相関係数の変化がまとめてpdfファイルに出力されます。

【Stanによる推定】

ただ、これだと物足りないのでStanで相関係数ρの変化をモデリングします。
ここでは日経平均とドル円の変化率（対数収益率）の同時相関を上記のモデルで捉えます。データの数が多いと収束が怪しいので、10営業日ごとにリターンをまとめたものを用います。

Stanのモデルは以下になります。

data {

  int<lower=1> T;

  vector[2] y[T];

}
parameters {

  real<lower=0.0001> sigma[2];

  real rho_logistic[T];

  real mu_rho;

  real<lower=-0.999,upper=0.999> phi_rho;

  real<lower=0.0001> eta_rho;

 }
transformed parameters {

  cov_matrix[2] Sigma[T];

  real rho[T];

  vector[2] mu;
  for(t in 1:T){

    rho[t] <- 2/(1+exp((-1)*rho_logistic[t]))-1;

    Sigma[t][1,1]<-sigma[1]*sigma[1];

    Sigma[t][1,2]<-sigma[1]*sigma[2]*rho[t];

    Sigma[t][2,1]<-sigma[1]*sigma[2]*rho[t];

    Sigma[t][2,2]<-sigma[2]*sigma[2];

  }

  mu[1]<-0;

  mu[2]<-0;

}
model {

  y[1]~multi_normal(mu',Sigma[1]);

  rho_logistic[1]~normal(mu_rho,eta_rho*eta_rho/(1-phi_rho*phi_rho));

  for (t in 2:T){    

    rho_logistic[t]~normal(mu_rho*(1-phi_rho)+phi_rho*rho_logistic[t-1],eta_rho*eta_rho);

    y[t]~multi_normal(mu',Sigma[t]);

  }

}

Rでキックするコードは以下になります。

library(rstan)

library(reshape2)

library(ggplot2)

library(dplyr)
model.text=scan("sur.rstan",what = character(),sep="@" , blank.lines.skip = F)

model.text=paste(model.text,collapse="\n")




cnames=c("NIKKEI225","DEXJPUS")

by=10
dat=read.csv("drb.df.csv",stringsAsFactor=F)

close.df=read.csv("close.df.csv",stringsAsFactors=F)

dat=cbind(aggregate(.~floor((1:nrow(dat)/by)), dat[cnames], FUN=sum),aggregate(.~floor((1:nrow(dat)/by)), dat[c("X")], FUN=last))
d=list()

d[["T"]]=nrow(dat)

d[["y"]]=dat[cnames]*100
fitchan <- stan(model_code = model.text,

                 data = d, 

                 iter  = 3000, warmup=500,

                 chain = 1)

traceplot (fitchan,par=c("sigma"),inc_warmup=F)
fit=extract(fitchan,permuted=T)

fit.posterior=get_posterior_mean(fitchan)
# From Small Data Scientist Memorandum

# (http://heartruptcy.blog.fc2.com/blog-entry-163.html)

plot_timecourse <- function(la_m, time_scale){

  qua <- apply(la_m, 2, quantile, prob = c(0.025, 0.5, 0.975))

  est <- data.frame(t=time_scale, t(qua))

  colnames(est) <- c('t', 'p2.5', 'p50', 'p97.5')

  est_med <- data.frame(t=time_scale, value=apply(la_m, 2, mean))

  p <- ggplot()

  p <- p + geom_ribbon(data=est, aes(x=t, ymin=p2.5, ymax=p97.5), fill="darkorange3", alpha=I(1/3))

  p <- p + geom_line(data=est_med, aes(x=t, y=value), color="red")

  p <- p + labs(x="time", y="value")

  p

}
pdf("time-series_correlation_SUR_model.pdf",font="Japan1")
plot(plot_timecourse(data.frame(fit$rho),as.Date(dat$X)))


d=data.frame(correlation=apply(fit$rho,2,mean),close.df[close.df$X%in%dat$X,cnames])
d$date=dat$X

d_plot=melt(d,id="date")

d_plot$date=as.Date(d_plot$date)
g=ggplot(d_plot,aes(date, value, col=variable, group=1))+

  geom_line()+

  facet_grid(variable~., scale='free_y')

plot(g)


ggdat=melt(fit["sigma"])
graph  <- ggplot (ggdat, aes(x = value))+geom_histogram(aes(y=..density..),colour="blue", fill="grey",alpha=.1)

graph  <- graph + geom_density(alpha=.5,fill="blue") +facet_wrap(Var2~L1,ncol=1,scale="free")

plot(graph)
dev.off()

結果は以下になります。

相関係数の変化のグラフは次のようになります。
90%信用区間が薄い色がついたところ、事後平均が赤い線です。

原系列とならべると次のようになります。

ポートフォリオ理論だと相関係数は一定と仮定することが多く、普通は期間をきめてヒストリカルをとることが多いですが、時期による相関係数の変化は無視できないほど大きく、しかも急に変動することがあるということがわかりました。

一方で、90%信頼区間ベースではかなり推定値が広がってしまう、つまりきっちり推定するのが難しいということもわかりました。

グラフからは、長期的に見て負相関（円高株高）から正相関（円安株高）へと変化してきていて、例えば2009年では一時的にその傾向が弱まったということが言えそうです。

円安株高の同時相関の高まりに関しては、日本株に投資する海外投資家がヘッジのために立てている円ポジションのリバランスが要因なのではないかという説が有力です（上述日銀ペーパーなど）。つまり、株高になるとそれに応じてヘッジのための追加の円shortのポジションが発生するために、円売りの圧力がかかるということです。これについては海外投資家比率の高まりと、アルゴリズムトレードによって統計的裁定機会の解消速度があがったこと（＝より短期スパンで見たときにも強い同時相関が観測される）が指摘されています。
そして巷に言われるように、円安は（長期的に）輸出企業の業績を上げることを通じて株高の呼び水となり、正のフィードバックが働いているようです。

2014-07-19

Stanで確率的ボラティリティ変動モデルの推定

Stan入門として確率的ボラティリティ変動モデルの推定をやってみた。前々回の投稿SVモデルでのオプションのプライシングでは潜在変数の推定結果を論文から引用していたが、今回はその推定を行う。

なぜボラティリティの精緻なモデリングが大切かというと

金融機関では、たとえばVaRを用いて保有資産がある額以下になる確率（分布の裾確率=tail riskにあたる）を一定以内に抑えるようにリスク管理しているが、その確率は資産価格のモデルの仮定に依存する。もっとも単純なランダムウォークモデルでは将来の価格の分布は正規分布となり、裾確率が過小評価されることで知られている。
デリバティブの価格はボラティリティの状態に大きく依存するので市場より正しく予測できてミスプライシングを発見できれば儲けられる。

正攻法(MCMC)で推定しようとすると死ぬほどめんどくさいが、Stanでやってみるとかなり簡単にできることがわかる。

【Stanとは】

モデルを記述してデータを与えるだけで勝手にベイズ推定してくれるソフト。推定にはHMC(ハイブリッドモンテカルロ)などを使っている。
似たソフトにWinBUGSやJAGSなどがある。時系列特化だとLiBiというソフトも。
ギブスサンプラーなどを自分で書かなくてよいのが利点。
収束が遅かったり、おかしな値にはまったりする可能性があるのが欠点。

【確率的ボラティリティ変動モデルとは】

ファイナンスで用いられるモデル。ボラティリティが常に一定であるランダムウォークとは異なり、ボラティリティ自体もランダムウォークするというモデル。
たとえば株式市場が長期にわたって不安定になる(=ボラティリティが高い状態は一定期間持続する)様子などがうまく記述できる。
確率的ボラティリティ変動モデルは非線形状態空間モデルというクラスに位置し推定が難しく、実務ではより推定が容易なGARCHなどが用いられてきたが、こちらはデータへの当てはまりが悪いことで知られている。
頻度主義的な推定法も存在するが、推定効率が悪いのでベイズ推定が主流。その場合はMCMCを用いて、「潜在パラメーターを既知として状態パラメーターをサンプリング→状態パラメーターを既知として潜在パラメーターをサンプリング」を繰り返す（MH in Gibbs Sampler）か、PMCMCやSMC^2といった逐次モンテカルロベースの手法を用いるなどして求めるが、実装はかなり面倒。
Rなどで手軽に利用できるライブラリはいまのところない。論文用等で作られたコードはいくつか公開されている。
式にすると

$y_t=log(x_t/x_{t-1}) \\y_t =exp(h_t/2)\epsilon_t \\h_{t+1}=\mu+\phi(h_t-\mu)+\sigma_{\eta}\eta_t \\\eta_t,\epsilon_t \sim_{i.i.d}N(0,1)$

...なので時間がない人はStanで求めてしまいましょう。

【インストール】

最初はwindowsでやろうとしたが、Rcppまわりでつまずいてギブアップ。RがProgram Files下にインストールされている（デフォルト設定）と、フォルダ名に空白がはいっているためうまく動かない。
LinuxではチュートリアルどおりにRのコマンドに打ち込めばすぐに使えるようになる。

【実装】

StanはモデルをStanのドメイン特化言語(テキスト形式)で記述する。
data,parameters,modelの三つのブロックで定義する
- data:外部から与えるデータ（データの次元の情報なども整数として与える）
- parameters:推定する潜在変数
- model:変数同士の分布を介した関係
今回はファイルを別にし、sv.stanにモデルを記述し、sv.Rでそれをテキストファイルとして読み込んでいる。
R側からStanを呼ぶには、stan関数を用いる。
- model_code: sv.stanに記述されているテキストデータ
- data: リストでデータを渡す
- iter: イテレーション回数
- chain: チェーンの本数

【結果】
イテレーション10000回（うちバーンイン5000回）でやって30min~1hくらいで終わりました。そんなに多くないですが、パスを見る限り収束は問題なさそうです。

パラメーターの推定値

パス

ボラティリティの推移

2008年のリーマンショックのときにボラティリティが急上昇しているのがわかる。2014年の7月現在はレンジの狭いボックス相場となっており、ボラティリティが低下している。
※中央の青の線が期待値、緑と赤の線がそれぞれ10%,90%の線。

【コード】
下の二つのファイルですべて。事前分布がソフトまかせなのはご愛嬌ということで。

#sv.R

library(rstan)

library(reshape)

library(ggplot2)
model.text=scan("sv.rstan",what = character(),sep="@" , blank.lines.skip = F)

model.text=paste(model.text,collapse="\n")
ind=read.csv("./topix_all.csv")[[1]]

dat=read.csv("./topix_all_return.csv")[[2]]
fitchan <- stan (model_code = model.text,

                 data = list(y=dat,N=length(dat)), 

                 iter  = 10000, 

                 chain = 1)

traceplot (fitchan,par=c("mu","sigeta","phi"))
fit=extract(fitchan,permuted=T)

fit.posterior=get_posterior_mean(fitchan)
h=fit$h

upper=apply(h,2,function(x){quantile(x,.9)})

lower=apply(h,2,function(x){quantile(x,.1)})

expected=apply(h,2,mean)
h_range=melt(data.frame(list(upper=exp(upper/2),lower=exp(lower/2),expected=exp(expected/2),date=ind[2:length(ind)])))

graph=ggplot(h_range,aes(x=as.Date(date),y=value,group=variable,color=variable))+geom_line()
ggdat=melt(fit[c("mu","phi","sigeta")])
graph  <- ggplot (ggdat, aes(x = value))

graph  <- graph + geom_density () + facet_grid(. ~ L1, scales = "free") + theme_bw()

#sv.rstan

data {

        int<lower=1> N;

        real y[N];

    }
parameters {

        real h[N];

        real mu;

        real<lower=-0.999,upper=0.999> phi;

        real<lower=0.0001> sigeta;

}
model {
        h[1] ~ normal(mu,sigeta/sqrt(1-phi*phi));

        y[1] ~ normal(0,exp(h[1]/2));
        for (i in 2:N){

           h[i] ~ normal((1-phi)*mu+phi*h[i-1],sigeta);

           y[i] ~ normal(0,exp(h[i]/2));

        }

}

2014-07-17

システムトレードする方法

個人でシステムトレードする方法が意外とまとまって書いてないので、まとめておく。

対象者

プログラムが書ける人
自動でマーケットデータをとって発注したいが、ほかの人が書いたアルゴリズムで発注したいわけではない人
FX以外で運用するひと(FX関係については今回は調べてない)

【証券会社】

の二択です。

この二社については、

エクセルのアドインとしてリアルタイムスプレッドシート(RSS)に対応しており、リアルタイムにマーケットデータをエクセルに落とせます。
楽天については現在の値、岡三についてはヒストリカルを含めて落とせるようです。
また発注については岡三証券では専用の関数がありますが、楽天では面倒なようです。
基本的にはVBAで書くことになるますが、DDEをかませてC#とかでも書けるみたいです。このへんはwindows上でのプログラミングに慣れてないので詳しくはわかりません。＜参考＞楽天証券 MarketSpeed RSS サンプルプログラム

]

【結論】
マーケットデータをとって、発注もできるAPIでも公開してくれればいいのですが、そういうダイレクトなサービスを提供している証券会社は現状ありません。なのでどうしてもwindowsベースのツールに縛られますし、それゆえに全部自動でやる環境を構築しようとすると辛そうです。

2014-07-06

SVモデルでのオプションのプライシング

ボラティリティを予測して儲けましょうというお話。

本当は資産の上がり下がり（リターン）を見極めて運用したいのだけれども、将来の上がり下がりの予想は価格に織り込まれている（つまり上がりそうなものは既に高い価格になってしまっている）からなかなかそうもいかない。
そこで資産価格の変動のしやすさ（ボラティリティ）に注目する。実はデリバティブ（たとえばオプション）を使えば、ボラティリティを正しく予想することで儲けることができる。たとえばコールオプションとプットオプションを買い建てるロングスストラドル・ストラングル戦略(【参考】http://lounge.monex.co.jp/advance/fop/strategy-long-straddle.html)をとれば、上がり下がりに関わらず大きく価格が変動しさえすれば儲けることができる。

オプションのプライシングにはブラックショールズモデルが一般的に使われているが、ボラティリティに関して非常に単純な仮定を置いている。そこで、確率的ボラティリティ変動モデルという少し込み入ったモデルでボラティリティを予測して、オプションが買いか売りかを判断しましょうというのが本題。

従来のブラックショールズモデルとの違いは

ブラックショールズモデルはボラティリティを定数と仮定
確率的ボラティリティ変動モデルはボラティリティがランダムウォークすると仮定

$y_t=log(x_t/x_{t-1}) \\y_t =exp(h_t/2)\epsilon_t \\h_{t+1}=\mu+\phi(h_t-\mu)+\sigma_{\eta}\eta_t \\\eta_t,\epsilon_t \sim_{i.i.d}N(0,1)$

ここから少し技術的になるが、プライシングは

SVモデルの潜在パラメーター推定自体は時間がかかる（かつ学習期間によって急激に変わることは想定されない）ので、先行研究から借用し([http://www.imf.org/external/pubs/ft/wp/2003/wp03125.pdf:title=http://www.imf.org/external/pubs/ft/wp/2003/wp03125.pdf])
状態変数（すなわち現在のボラティリティ）の推定のみを粒子フィルターで行う

【参考】

お前の逐モン、GETだぜ！ from tera monagi

将来の資産価格の変動(および満期日でのオプションの行使結果)については1000個の粒子でボラティリティの変動を考慮してシミュレーションし、リスク中立で単なる期待割引現在価値で測定した

原資産(日経225)の推移

ボラティリティの推移

現在ボラティリティは小さく、長期的には本来の水準(より高い水準)に戻ると予想される。

※logスケールなので、リターンの標準偏差に変換するには、sd=exp(h/2)とする必要。直近100営業日のヒストリカルからのフィルタリング結果と、次の100営業日の予測値。真ん中のラインが期待値で、上下がそれぞれ95%信用区間

オプションの理論価格

sv=確率的ボラティリティ変動モデル,bs=ブラックショールズ
実際の価格は大阪証券取引所の指数日報(http://www.ose.or.jp/market/daily_report/pdf/20140704/n09.pdf)で見ることができる。
行使日が7月の場合ほとんど差がないが、8月の場合かなり差がある。ボラティリティの上昇予想を反映して、svモデルではより高い理論価格をつけている。

2014-07-11　がSQ(清算日)のコールオプションの理論価格

## | strike| sv| bs|
## |------:|---------:|---------:|
## | 11250| 4195.7330| 4190.9821|
## | 11500| 3945.8186| 3941.0677|
## | 11750| 3695.9042| 3691.1533|
## | 12000| 3445.9898| 3441.2389|
## | 12250| 3196.0754| 3191.3245|
## | 12500| 2946.1610| 2941.4101|
## | 12750| 2696.2466| 2691.4957|
## | 13000| 2446.3322| 2441.5813|
## | 13250| 2196.4178| 2191.6669|
## | 13500| 1946.5034| 1941.7525|
## | 13750| 1696.5890| 1691.8381|
## | 14000| 1446.6746| 1441.9240|
## | 14250| 1196.7602| 1192.0200|
## | 14500| 947.0345| 942.3026|
## | 14750| 701.2854| 694.4499|
## | 15000| 467.3743| 456.9516|
## | 15250| 269.0365| 252.3625|
## | 15500| 129.4894| 109.1894|
## | 15750| 52.0619| 34.8709|
## | 16000| 17.3579| 7.8874|
## | 16250| 4.3944| 1.2344|
## | 16500| 1.2284| 0.1324|
## | 16750| 0.4325| 0.0097|
## | 17000| 0.1636| 0.0005|
## | 17250| 0.0000| 0.0000|
## | 17500| 0.0000| 0.0000|
## | 17750| 0.0000| 0.0000|
## | 18000| 0.0000| 0.0000|
## | 18250| 0.0000| 0.0000|
## | 18500| 0.0000| 0.0000|
## | 18750| 0.0000| 0.0000|
## | 19000| 0.0000| 0.0000|
## | 19250| 0.0000| 0.0000|

2014-08-08 の分のコールオプション

## | strike| sv| bs|
## |------:|---------:|---------:|
## | 11250| 4233.3824| 4206.3772|
## | 11500| 3983.8101| 3956.8049|
## | 11750| 3734.2378| 3707.2326|
## | 12000| 3484.6655| 3457.6604|
## | 12250| 3235.0932| 3208.0882|
## | 12500| 2985.5209| 2958.5169|
## | 12750| 2736.1552| 2708.9507|
## | 13000| 2487.3576| 2459.4089|
## | 13250| 2239.4128| 2209.9656|
## | 13500| 1992.7853| 1960.8567|
## | 13750| 1750.0035| 1712.7134|
## | 14000| 1513.0153| 1466.9587|
## | 14250| 1282.6914| 1226.3058|
## | 14500| 1062.5552| 995.1290|
## | 14750| 859.2617| 779.3620|
## | 15000| 673.7090| 585.6741|
## | 15250| 512.9096| 420.0272|
## | 15500| 377.2256| 286.1252|
## | 15750| 269.1247| 184.4295|
## | 16000| 187.4256| 112.1631|
## | 16250| 127.6027| 64.2334|
## | 16500| 85.9461| 34.5990|
## | 16750| 56.4683| 17.5210|
## | 17000| 36.6471| 8.3425|
## | 17250| 22.4926| 3.7369|
## | 17500| 13.5932| 1.5762|
## | 17750| 7.4098| 0.6267|
## | 18000| 3.3847| 0.2353|
## | 18250| 1.3087| 0.0835|
## | 18500| 0.6687| 0.0281|
## | 18750| 0.3579| 0.0089|
## | 19000| 0.1084| 0.0027|
## | 19250| 0.0000| 0.0008|

ということで、ボラティリティが本来の水準に戻るべく上昇していくにも関わらず、市場は織り込んでいないようなので、SVモデルによると「コールオプションの買い」という結論になる。

以下コード。

### Price of European Option

### Under Stochastic Volatility Model Using Particle Filter
## Model

# y_t=log(x_t/x_{t-1})

# y_t~N(0,exp(h_t/2))*epsilon

# h_{t+1}=mu+phi(h_t-mu)+sig.eta*epsilon.eta

# epsilon,epsilon.eta~i.i.d N(0,1)
library(quantmod)

library(reshape2)

library(scales)

library(ggplot2)
# number of particle

Nx=1000
# sv parameters

# source: http://www.imf.org/external/pubs/ft/wp/2003/wp03125.pdf

sv_params=list(

  mu= -8.55,

  phi=0.944,

  sig.eta=0.178

  )
# simulator
state_sim=function(particles,mu,phi,sig.eta,Nx){

  mu+(particles-mu)*phi+sig.eta*rnorm(Nx)

}
measurement_eq=function(particles,obs,Nx){

  volatility=exp(particles/2)

  dnorm(obs/volatility)/volatility

}
pf_sim=function(dat,sv_params,Nx,pred_t){

  

  #sv params

  mu=sv_params$mu

  phi=sv_params$phi

  sig.eta=sv_params$sig.eta

  particles=replicate(Nx,mu)

  

  volatility.hist=xts(matrix(0,ncol=3,nrow=nrow(dat)),index(dat))
  for(i in 1:nrow(dat)){

    new_particles=state_sim(particles,mu,phi,sig.eta,Nx)

    obs=dat[i]

    particles=sample(new_particles,prob=measurement_eq(new_particles,obs,Nx),replace=T)

    volatility.hist[i,]=c(mean(particles),quantile(particles,c(0.05,0.95)))

  }

  

  result=list()

  result$volatility.hist=volatility.hist

  

  pred_p=index(dat)[nrow(dat)]+(1:(pred_t*3))

  pred_p=pred_p[weekdays(pred_p)%in%c("月曜日","火曜日","水曜日","木曜日","金曜日")][1:pred_t]
  

  result$volatility.pred=xts(matrix(0,ncol=3,nrow=pred_t),pred_p)

  result$return.pred=xts(matrix(0,ncol=Nx,nrow=pred_t),pred_p)

  

  for(i in 1:nrow(result$return.pred)){

    particles=state_sim(particles,mu,phi,sig.eta,Nx)

    mhl=c(mean(particles),quantile(particles,c(0.05,0.95)))

    result$volatility.pred[i,]=mhl

    result$return.pred[i,]=exp(particles/2)*rnorm(Nx)

  }

  result$return_cum.pred=cumsum(result$return.pred)

  

  result

}
# simulate option

sim_option=function(underlying_name,pred_t=100,CP="C",r=0.01,prev_t=100){

  getSymbols(underlying_name)

  underlying_dat=get(gsub("\\^","",underlying_name))

  underlying_return=dailyReturn(underlying_dat,type="log")  

  result=pf_sim(underlying_return,sv_params,Nx,pred_t)

  
  prev_p=index(underlying_return)

  prev_p=prev_p[(length(prev_p)-prev_t+1):length(prev_p)]

  pred_p=index(result$volatility.pred)

  s=as.numeric(tail(Cl(underlying_dat),1))

  

  # plot underlying

  #underlying.plot=data.frame(list(t=index(underlying_dat),ret=as.numeric(Cl(underlying_dat))))

  #result$underlying.plot=ggplot(underlying.plot,aes(x=as.Date(t),y=ret))+geom_line()+ scale_x_date(labels = date_format("%m/%d"))

  

  result$underlying.plot=underlying_dat[prev_p,]
  

  # plot volatility

  volatility.plot=data.frame(rbind(result$volatility.hist[prev_p],result$volatility.pred[pred_p]))

  volatility.plot$t=rownames(volatility.plot)

  volatility.plot <- melt(volatility.plot,id="t",measure=c("X1","X2","X3"))

  result$volatility.plot=ggplot(volatility.plot,aes(x=as.Date(t),y=value,group=variable,color=variable))+geom_line()+ scale_x_date(labels = date_format("%m/%d"))+ geom_vline(xintercept = as.numeric(as.Date(prev_p[length(prev_p)])),linetype=4)

  

  # date of strike

  d=as.data.frame(list(index=pred_p,month=as.factor(months(pred_p)),d=as.numeric(pred_p),dofw=weekdays(pred_p)))

  selected=d[d$dofw=="金曜日" & as.numeric(date_format("%d")(d$index))%in%8:14,"index"]

  

  result$theoretical_price.table=list()

  result$option.plot=list()

  

  for(i in 1:length(selected)){

    result_index=as.character(selected[i])

    num_of_days=length(pred_p[pred_p<=selected[i]])

    

    # define pricing function in standardized form

    pricing=function(rn_value,r,K,term){

      if(CP=="C"){

        sapply(rn_value,function(x){max(x-K,0)/(1+r)^term})

      }else{

        sapply(rn_value,function(x){max(K-x,0)/(1+r)^term})

      }

    }  

    BS_pricing=function(S,K,q,r,sigma,term){

      d1=(log(S/K)+(r-q+.5*sigma^2)*term)/(sigma*term^.5)

      d2=d1-sigma*term^.5

      exp(-1*q*term)*S*pnorm(d1)-exp(-1*r*term)*K*pnorm(d2)

    }

    

    f=function(x){num_of_days;rn_value=exp(result$return_cum.pred[selected[i],]);

                  sapply(x/s,function(y){s*mean(pricing(rn_value,r/365,y,num_of_days))})}

    fbs=function(x){num_of_days;volatility=exp(result$volatility.hist[nrow(result$volatility.hist),1]/2);

                    sapply(x/s,function(y){s*mean(BS_pricing(1,y,0,r/365,volatility,num_of_days))})}

    

    strikes=  floor(s/250)*250+(-16:16)*250

    

    

    result$option.plotresult_index=ggplot(data.frame(list(x=s*c(0.75,1.25))),aes(x=x))+

      stat_function(fun=f,n=30,aes(color="sv"))+

      stat_function(fun=fbs,n=30,aes(color="bs"))+

      geom_vline(xintercept = s)

    

    result$theoretical_price.tableresult_index=data.frame(list(strike=strikes,sv=f(strikes),bs=fbs(strikes)))

  }

  result$result_indices=as.character(selected)

  result

}
test=function(){

  sim_option("^N225")

}

レポート用

Daily Report `r format(Sys.time(), "%Y %b%d%a %H:%M:%S")`

========================================================
```{r ,echo=FALSE,warning=FALSE}

source("./pf.R")

library("xtable")

library(quantmod)

result=sim_option("^N225",pred_t=100,r=0.025,prev_t=100)
#chartSeries(result$underlying.plot)

#chartSeries(tail(result$underlying.plot,10))
kable(tail(underlying.plot,10))


plot(result$volatility.plot)
for(i in result$result_indices){

  print(i)

  plot(result$option.ploti)

  kable(result$theoretical_price.tablei)

}

```{r}

メールで配信

library("EasyHTMLReport")
easyHtmlReport(rmd.file="report.Rmd",

               from="hogehoge@piyo.com",

               to="piyopiyo@hoge.com",subject="DailyFinancialReport")

2013-11-05

cythonのコンパイル

Cython(pythonライクな文法のcのメタ言語)を自分でコンパイルする方法がわかりにくかったのでメモ
http://ja.wikipedia.org/wiki/Cython

pythonとほぼ同じ文法
それをcythonがc/c++言語に変換してくれる
変換されて出てきたc/cppファイル(libpython2.6にのみ依存)は自由にコンパイルして使える
cの自動生成機としてだけではなく、pythonとのインターフェースも充実（本来的にはこっちが大事）
基本的にcライブラリしかcallできないが、numpyは使える←重要
ただし型情報を追加しないと普通のnumpy同様遅い

http://omake.accense.com/static/doc-ja/cython/src/tutorial/numpy.html

自分でコンパイルする方法を調べた理由は、nvcc(cudaのc++コンパイラ)でコンパイルする方法がわからなかったため。
普通にpythonのライブラリを書く分にはsetup.pyでbuildするほうが話が早いです...

使い方(python2.6 ubuntuの場合)

.pyxファイルを作成(型情報を付与したpythonみたいな感じ)
cython *.pyx する
- このときmain関数にしたかったら--embedオプションをつける
任意のcコンパイラでコンパイル(includeファイルは/usr/include/python2.6/)
任意のcコンパイラで-lpython2.6とリンク

まだg++でしか確かめてないので、nvccなどでうまくいくかは不明。

# building by two steps                                                                                                           

## use --embed if you want to include main functin in the output c file                                                           

cython helloworld.pyx --embed

## helloworld.c is created (with main function if --embed option is given)                                                        

## compile                                                                                                                        

g++ -c helloworld.c -I/usr/include/python2.6/

## link                                                                                                                           

g++  helloworld.o -lpython2.6 -o sample1.exe

./sample1.exe 

              

# building by one step                                                                                                          

cython helloworld.pyx --embed

g++ helloworld.c -I/usr/include/python2.6/ -lpython2.6 -o sample2.exe

./sample2.exe

2012-08-09

financeの授業のコード

library("quadprog")
### general

# loading file

csv_to_df<-function(file_path){

  matrix<-read.table(file_path, sep=',',row.names=1,header=T)

  return(matrix)

  }
### Problem 1

## making r vector (mean of rmat)

rmat_to_rvec<-function(rmat){

  rvec <- apply(rmat,2,mean)

  return(rvec)

}

## calculating T score

rmat_to_t<-function(rmat){

  result<-apply(rmat,2,

               function(vec){

                  r<-mean(vec)

                  return( mean(

                    sapply(vec,function(x){

                      return((x-r)**2)

                  })))

               })

  result<-sapply(result*(nrow(rmat))/(nrow(rmat)-1),sqrt)

   return(result)

}

# calculating cov

rmat_to_v<-function(rmat){

  # x-E(x)

  tmp<-apply(rmat,2,

        function(vec){

          r<-mean(vec)

          return(sapply(vec,function(x){

                      return(x-r)

                  }))

    

  })

  return(t(tmp)%*%tmp*(1.0/nrow(tmp))*(nrow(rmat))/(nrow(rmat)-1))

}

# calculating correlation

rmat_to_cor<-function(rmat){

  cov<-rmat_to_v(rmat)

  size<-nrow(cov)

  result<-matrix(0,size,size)

  rownames(result)<-rownames(cov)

  colnames(result)<-colnames(cov)

  for(i in 1:size){

    for(j in 1:size)

      result[i,j]=cov[i,j]/(sqrt(cov[i,i])*sqrt(cov[j,j]))

  }

  return(result)

}
problem1<-function(file_path){

  rmat<-csv_to_df(file_path)

  print("return vector")

  print(rmat_to_rvec(rmat))

  print("T")

  print(rmat_to_t(rmat))

  print("covariance")

  print(rmat_to_v(rmat))

  print("correlation")

  print(rmat_to_cor(rmat))

}
### Problem 2

# stop development

cor_nearest_select <-function(rmat,x){

  cov<-cov(rmat)

  cor<-cor(rmat)

  meigaras=colnames(cor)

  min_pair=c(meigaras[1],meigaras[2])

  for(meigara1 in meigaras){

    for(meigara2 in meigaras){

     

    }

      

  }

}
#

plot_2diagram <-function(rmat,meigara1,meigara2){

  cov=cov(rmat)

  rvec=rmat_to_rvec(rmat)

  points=matrix(NA,1000,2)

  i=1

  for(w1 in seq(0,1.0,by=0.01)){

    w2=1.0-w1

    var=cov[meigara1,meigara2]*2*w1*w2+cov[meigara1,meigara1]*w1*w1+cov[meigara2,meigara2]*w2*w2

    points[i,2]<-rvec[meigara1]*w1+rvec[meigara2]*w2

    points[i,1]<-sqrt(var)

    i=i+1

  }

  na.omit(points)

  return(points)

}
### Problem 3

rate_to_cov<-function(matrix){

  dmat <- var(matrix)

  return(dmat)

  }
solver<-function(rmat,dmat){

  

  # 

  rvec <- rmat_to_rvec(rmat)

  MIN <- min(rvec)

  MAX <- max(rvec)

  size <- nrow(dmat)# dmat size

  meigara = list(colnames(dmat))

  

  ## official description

  #min(-d^Tb+1/2b^TD^b)

  # subject to A^Tb>=b_0 (wrriten in Tex style) 

  #Dmat: matrix appering in the quadratic func to be minimized

  #dvec: vector appering in the quadratic func to be minimized

  #Amat: matrix defining the constraints

  #bvec: vector holding the value of b_0

  

  ###arguments

  ## -d^Tb

  dvec <- rep(0,size)

  ## 1/2b^TD^b

  # D(=Dmat) is given as dmat

  

  ## binding A^Tb>=b_0 where

  # e^T weight_vec >= (0,0,..)^T

  # and (1,1...)^T weight_vec >= 1

  # and weight_vec^T rvec = given r

  # Amat = A , b_0 =bvec

  # Amat is (1,1,1..)^T and e combined,bvec is (1,given r,0,0,...)^T

  

  # making e

  tmp <- matrix(0,nrow=size,ncol=size)

  diag(tmp) <- 1

  e = tmp

  # making Amat

  Amat <- cbind(rep(1,size),rvec,e)

  # making bvec

  bvec <- c(1.0, MIN, rep(0,size))

  #init setting

  sol <- solve.QP(dmat,dvec,Amat,bvec=bvec,meq=2)

  sol["return"]=MIN

  sol["meigara"]=meigara

  sol<-list(sol)

  for(iret in seq(MIN,MAX,by=0.001)){

    bvec[2] <- iret

    # store result in sol

    solution <-solve.QP(dmat,dvec,Amat,bvec=bvec,meq=2)

    solution["return"]=iret

    solution["meigara"]=meigara

    sol <- c(sol,list(solution))

    }

  return(sol)

  }
plot_curve<-function(solutions){

  #c=lapply(solution,function(ls){return(c(ls[["return"]],ls[["value"]]))})

  i=0

  curve=0

  for(sol in solutions){

    if(i==1){curve=c(sqrt(sol[["value"]]),sol[["return"]])}

    else {curve=rbind(curve, c(sqrt(sol[["value"]]),sol[["return"]]))}

    i=i+1

  }

  plot(curve)

  curve=as.matrix(curve)

  return(curve)

}
show_min_v<-function(solutions){

  min=list()

  i=1

  for(sol in solutions){

    if(i==1){min=sol}

    else{

      if(sol[["value"]]<min[["value"]]){

        min=sol

      }

    }

    i=i+1

  }

  print("==when minimum variance==")

  print("=return=")

  print(min[["return"]])

  print("=variance=")

  print(min[["value"]])

  print("=meigara=")

  print(min[["meigara"]])

  print("=weight=")

  print(min[["solution"]]*100)

}
# mapping from return_rate_csvfile  to the minimized distribution curve

min_v<-function(file_path){

  rmat.F<-csv_to_df(file_path)

  rmat.M <- as.matrix(rmat.F)

  dmat.M <- rate_to_cov(rmat.M)

  solutions=solver(rmat.M,dmat.M)

  c<-plot_curve(solutions)

  show_min_v(solutions)

  return(c)

}