Math Problem--principal component analysis

R.BAGGIO

National Team Player
Oct 19, 2002
5,702
0
Toronto
#1
Hi all

I am working on a question and I am kind of stuck. The question asks me to estimate sigma_kj and r using Principal Component Analysis when
the given data S_k(t_i) satisfy the following model:

dS_k = sum(j=1 to r) sigma_kj*dW_j, where W_j(t) are independent Brownian motions.

I have previously used PCA to reduce the dimension of a set of deterministic differential equations, but I've never used it for estimation
purposes. If anyone could give me a hint as to how to start this problem, I would appreciate it.

Thanks
 

spinhead

Elite Member
Oct 24, 2002
2,124
201
United States of Amnesia
#2
I might be able to make some suggestions if you could write down the question verbatim. There seems to be some information missing from your description above. For example, what is t_i in S_k(t_i), as opposed to 't' in W_j(t)?
 

R.BAGGIO

National Team Player
Oct 19, 2002
5,702
0
Toronto
#3
Hi
Thanks for your reply. Here's the entire question verbatim.
The file Data.csv contains time series of daily prices {S_k(t_i)} for 10 firms and 501 days. The prices satisfy the model:
dS_k = sum(j=1 to r) a_kj*dW_j,  k=1,2,…,10
where W_j(t) are independent Brownian motions. The number r is unknown a priori.
Using Principal Component Analysis, determine r and estimate the coefficients {a_kj}.

I think t_i refers to the 501 days of the data.
Thanks
 

spinhead

Elite Member
Oct 24, 2002
2,124
201
United States of Amnesia
#4
Here are some thoughts that may or may not be helpful.

I assume that what you have in file Data.csv is a table with 501 rows and 10 columns. Each of the 10 columns holds the prices of one company, and each of the 501 rows holds the prices on one day. For example, S_k(t_i) would be the price of the k-th company on the i-th day. So define a (501x10) price matrix S=[S_ik] where S_ik=S_k(t_i), i is the row index (i=1,2,...,501) and k is the column index (k=1,2,...,10).
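
As a rough sketch of why the covariance of the daily price changes carries the coefficients: assume a constant time step \Delta t between consecutive rows (the question does not state it; with time measured in days, \Delta t = 1) and write A = [a_kj] for the 10 x r coefficient matrix. Both \Delta t and A are notation introduced here for illustration.

```latex
\Delta S_k(t_i) = S_k(t_{i+1}) - S_k(t_i) = \sum_{j=1}^{r} a_{kj}\,\Delta W_j(t_i),
\qquad \Delta W_j(t_i) \sim \mathcal{N}(0,\Delta t)\ \text{independent},
\\[4pt]
\operatorname{Cov}\bigl(\Delta S_k,\Delta S_l\bigr) = \Delta t \sum_{j=1}^{r} a_{kj}\,a_{lj}
\quad\Longleftrightarrow\quad
\operatorname{Cov}(\Delta S) = \Delta t\, A A^{T}.
```

So the 10x10 covariance matrix has rank at most r, and A can only be recovered up to an orthogonal rotation of the Brownian motions, since A Q (A Q)^T = A A^T for any orthogonal Q.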


I suggest the following procedure for solving the problem:

(1) Create a (500x10) price change matrix D by taking the first difference in rows of S, that is, D_ik = S_(i+1)k – S_ik. Notice that D will only have 500 rows as opposed to S which has 501 for the obvious reason.

(2) Average all rows of D and subtract the computed average from each row. This means that each of the 10 columns of the resulting matrix would have zero mean. For simplicity, I will still call this mean-corrected matrix D.

(3) Compute the 10x10 symmetric matrix D^T*D where D^T means the transpose of D.

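(4) Do a PCA on D^T*D to obtain: D^T*D = U L U^T, where the columns of U are the eigenvectors and L is the diagonal matrix of eigenvalues, sorted from largest to smallest. Note that the mean-correction in step 2 does not force any eigenvalue to zero: it removes one degree of freedom among the 500 rows of D, not among its 10 columns, so D^T*D can still have rank 10. Under your model, however, D^T*D has rank r, so only the first r diagonal elements of L should be noticeably different from zero.

(5) So, let’s call the columns of matrix U: U_j, where j=1,2,…,10, and the diagonal elements of L: L_j (j=1,…,10). Since the covariance of the price changes is (A A^T)*dt with A=[a_kj], and D^T*D/499 estimates that covariance, I believe that the solution to your problem would be: a_kj = sqrt(L_j/(499*dt))*U_jk for j=1,…,r, where dt is the time step between rows (dt=1 if you measure time in days) and U_jk is the kth element of U_j, when you think of it as U_j = [U_j1 U_j2 … U_j,10]^T. Keep in mind that the a_kj are only determined up to an orthogonal rotation of the Brownian motions, so this is one valid choice rather than the only one.

(6) Finally, the maximum r that you can have is 10. If the data follow the model exactly, r is simply the number of eigenvalues L_j that are not numerically zero. With noisy data the cutoff is less clear-cut; in whatever context this problem may have arisen, they may have discussed a model order selection method. You can use that method to find an optimum order r<10, or just take r=10. By the way, if such a model selection technique has been discussed, I would be interested in knowing it.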
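
For what it's worth, here is a minimal NumPy sketch of steps (1) to (6). The function name estimate_loadings, the dt and tol parameters, and the relative cutoff used to pick r are assumptions made for illustration, not something given in the question. The loadings are scaled by sqrt(L_j/((N-1)*dt)), which follows from Cov(dS) = A A^T dt with N = 500 mean-corrected price changes. The data at the bottom are simulated, not the contents of Data.csv.

```python
import numpy as np


def estimate_loadings(S, dt=1.0, tol=1e-8):
    """Estimate r and A = [a_kj] from a (days x firms) price matrix S.

    dt (time step between rows) and tol (relative eigenvalue cutoff)
    are illustrative assumptions, not part of the original question.
    """
    D = np.diff(S, axis=0)              # step 1: price changes, one row fewer than S
    D = D - D.mean(axis=0)              # step 2: make every column zero-mean
    G = D.T @ D                         # step 3: symmetric (firms x firms) matrix
    L, U = np.linalg.eigh(G)            # step 4: G = U diag(L) U^T (ascending order)
    order = np.argsort(L)[::-1]         # reorder so the largest eigenvalue comes first
    L, U = L[order], U[:, order]
    L = np.clip(L, 0.0, None)           # drop tiny negative values caused by round-off
    r = int(np.sum(L > tol * L[0]))     # step 6: count the non-negligible eigenvalues
    n = D.shape[0]
    # step 5: column j of A is eigenvector j scaled by sqrt(L_j / ((n - 1) * dt))
    A = U[:, :r] * np.sqrt(L[:r] / ((n - 1) * dt))
    return r, A, L


# Simulated example: 10 firms, 501 days, 3 hidden Brownian motions, dt = 1.
rng = np.random.default_rng(0)
A_true = rng.normal(size=(10, 3))
dW = rng.normal(size=(500, 3))
S_sim = 100.0 + np.vstack([np.zeros((1, 10)), np.cumsum(dW @ A_true.T, axis=0)])

r_hat, A_hat, eigvals = estimate_loadings(S_sim)
print("estimated r:", r_hat)
print("eigenvalues:", eigvals)
```

To run it on the real file, something like S = np.loadtxt("Data.csv", delimiter=",") should work if the file holds only numbers with no header row (I have not seen the file, so that is a guess). Because of the rotation ambiguity, compare A_hat @ A_hat.T with the sample covariance of the price changes rather than comparing A_hat entry by entry with any "true" A.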

Hope this helps, my 2 cents.