Hi
Thanks for your reply. Here's the entire question verbatim.
The file Data.csv contains time series of daily prices {S_k(t_i)} for 10 firms and 501 days. The prices satisfy the model:
dS_k = sum(j=1 to r) a_kj dW_j, k=1,2,…,10
where W_j(t) are independent Brownian motions. The number r is unknown a priori.
Using Principal Component Analysis, determine r and estimate the coefficients {a_kj}.
I think t_i refers to the 501 days of the data.
Thanks
Here are some thoughts that may or may not be helpful.
I assume that what you have in file Data.csv is a table with 501 rows and 10 columns. The 10 columns correspond to the 10 companies and the 501 rows hold their prices on each of the 501 days. For example, S_k(t_i) would be the price of the k-th company on the i-th day. So define a (501x10) price matrix S=[S_ik] where S_ik=S_k(t_i), i is the row index (i=1,2,...,501) and k is the column index (k=1,2,...,10).
I suggest the following procedure for solving the problem:
(1) Create a (500x10) price change matrix D by taking the first difference in rows of S, that is, D_ik = S_(i+1)k - S_ik. Notice that D will only have 500 rows as opposed to S which has 501, because 501 prices give only 500 day-to-day changes.
(2) Compute the mean of each column of D (that is, average over the 500 rows) and subtract that column mean from every entry in the column. This means that each of the 10 columns of the resulting matrix would have zero mean. For simplicity, I will still call this mean-corrected matrix D.
(3) Compute the 10x10 symmetric matrix D^T*D where D^T means the transpose of D.
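(4) Do a PCA on D^T*D, that is, compute its eigendecomposition D^T*D = U L U^T, where the columns of U are orthonormal eigenvectors and L is a diagonal matrix with the eigenvalues sorted in decreasing order. Note that the mean correction in step 2 removes one degree of freedom from the 500 rows of D, not from its 10 columns, so it does not by itself force any eigenvalue to zero; D^T*D can have rank up to 10. If the model holds exactly with r<10, the last 10-r eigenvalues will be zero (in practice, negligible compared with the largest one).
(5) So, let's call the columns of matrix U as U_j and the diagonal elements of L as L_j (j=1,…,10). The model implies that the covariance matrix of the price changes over one time step dt is A A^T dt, where A=[a_kj], and the sample covariance of the daily changes is D^T*D/499. I believe that the solution to your problem would therefore be: a_kj = sqrt(L_j/(499*dt))*U_jk for j=1,…,r, where U_jk is the kth element of U_j, when you think of it as U_j = [U_j1 U_j2 … U_j,10]^T, and dt=1 if time is measured in days. Keep in mind that A is only determined up to an orthogonal rotation of the Brownian motions, since A*Q gives the same covariance for any orthogonal matrix Q.
(6) Finally, r is the number of eigenvalues L_j that are not negligible, so the maximum r that you can have is 10. In whatever context this problem may have arisen, they may have discussed a model order selection method. You can use that method to choose r, or simply count the eigenvalues above a small threshold. By the way, if such a model selection technique has been discussed, I would be interested in knowing it.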
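In case it helps to see the steps as code, here is a minimal Python/NumPy sketch of the procedure. Several things in it are my assumptions rather than part of the problem statement: the function name estimate_loadings is made up, the time step is taken as dt=1 day, the eigenvectors are scaled by the square root of the eigenvalues of the sample covariance so that A*A^T reproduces that covariance, and r is chosen by counting eigenvalues above a relative threshold tol that you would have to tune for real data. The demo uses synthetic prices with a known r=3 instead of Data.csv.

```python
import numpy as np

def estimate_loadings(S, dt=1.0, tol=1e-6):
    """Hypothetical helper. S is an (n_days, n_firms) array of prices.

    Returns (r, A, eigvals) where A is (n_firms, r) and A @ A.T
    approximates the sample covariance of the price changes per unit time.
    """
    D = np.diff(S, axis=0)                    # step 1: daily price changes
    D = D - D.mean(axis=0)                    # step 2: zero-mean columns
    C = D.T @ D / ((D.shape[0] - 1) * dt)     # step 3, scaled to a covariance
    eigvals, U = np.linalg.eigh(C)            # step 4: symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]         # eigh returns ascending order
    eigvals, U = eigvals[order], U[:, order]
    eigvals = np.clip(eigvals, 0.0, None)     # remove tiny negative round-off
    # step 6 (assumed rule): count eigenvalues above a relative threshold
    r = int(np.sum(eigvals > tol * eigvals[0]))
    # step 5: scale each eigenvector by the square root of its eigenvalue
    A = U[:, :r] * np.sqrt(eigvals[:r])
    return r, A, eigvals

# Synthetic demo: 10 firms, 501 days, driven by 3 independent Brownian motions.
rng = np.random.default_rng(0)
A_true = rng.normal(size=(10, 3))
dW = rng.normal(size=(500, 3))                # unit-variance daily increments
changes = dW @ A_true.T
S = 100.0 + np.vstack([np.zeros((1, 10)), np.cumsum(changes, axis=0)])

r, A, eigvals = estimate_loadings(S)
print("estimated r:", r)
print("eigenvalues:", eigvals)
```

For the real file you would load the prices with something like np.loadtxt("Data.csv", delimiter=","), adjusting for any header row, and pass the resulting array in. The estimated A will generally not match the true coefficients entry by entry, only up to an orthogonal rotation, so compare A @ A.T instead.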
Hope this helps, my 2 cents.