You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
Extract monthly stock return data out of a table
1 view (last 30 days)
Show older comments
I have this 251x5 table with monthly stock return data over a 10 year time frame for a high number of stocks (ID). (see attachement)
My goal is to create 41 different "Reporting datasets" each depicting an 11 month time frame. I would expect to get 41 tables in the same format as my original table. For more detailed description, look at the notes below.
Note: The actual table of data contains a lot more lines than this short demo table.
%% Form 12-1 month Reporting datasets
% Out of the monthly returns I want to create 41 groups of 11-months-Reporting datasets
% I want to create a new Reporting dataset every 3 months (quaterly)
% R1 09/01 to 09/11 [year / month]
% R2 09/04 to 10/02
% R3 09/07 to 10/05
% ...
% R41 19/1 to 19/11
1 Comment
Accepted Answer
Star Strider
on 21 Mar 2021
The data do not go all the way to 2019, instead stopping at 2017 1.
Also, if I understand corrctly what you want to do, there are 81, not 41, tables in the result.
Try this:
D = load('210320_ShortData_C.mat');
G = D.G;
First15Rows = G(1:15,:);
Last15Rows = G(end-14:end,:);
start = 1:3:size(G,1)-10;
finish = start+10;
for k = 1:numel(start)
DS{k} = G(start(k):finish(k),:);
end
with:
DS_01 = DS{1}
DS_02 = DS{2}
DS_80 = DS{end-1}
DS_81 = DS{end}
producing:
DS_01 =
11×5 table
ID year month date monthlyRET
_____ ____ _____ __________ __________
10001 2009 1 20091 0.043223
10001 2009 2 20092 0.064757
10001 2009 3 20093 -0.069798
10001 2009 4 20094 0.052383
10001 2009 5 20095 0.004916
10001 2009 6 20096 0.02546
10001 2009 7 20097 -0.043847
10001 2009 8 20098 0.057805
10001 2009 9 20099 0.015785
10001 2009 10 2.0091e+05 0.042746
10001 2009 11 2.0091e+05 0.007961
DS_02 =
11×5 table
ID year month date monthlyRET
_____ ____ _____ __________ __________
10001 2009 4 20094 0.052383
10001 2009 5 20095 0.004916
10001 2009 6 20096 0.02546
10001 2009 7 20097 -0.043847
10001 2009 8 20098 0.057805
10001 2009 9 20099 0.015785
10001 2009 10 2.0091e+05 0.042746
10001 2009 11 2.0091e+05 0.007961
10001 2009 12 2.0091e+05 0.16691
10001 2010 1 20101 -0.016774
10001 2010 2 20102 0.000271
DS_80 =
11×5 table
ID year month date monthlyRET
_____ ____ _____ __________ __________
10025 2015 12 2.0151e+05 -0.16054
10025 2016 1 20161 0.1172
10025 2016 2 20162 -0.063814
10025 2016 3 20163 -0.16176
10025 2016 4 20164 -0.059612
10025 2016 5 20165 -0.022708
10025 2016 6 20166 0.30606
10025 2016 7 20167 0.012423
10025 2016 8 20168 0.39732
10025 2016 9 20169 -0.010573
10025 2016 10 2.0161e+05 0.004453
DS_81 =
11×5 table
ID year month date monthlyRET
_____ ____ _____ __________ __________
10025 2016 3 20163 -0.16176
10025 2016 4 20164 -0.059612
10025 2016 5 20165 -0.022708
10025 2016 6 20166 0.30606
10025 2016 7 20167 0.012423
10025 2016 8 20168 0.39732
10025 2016 9 20169 -0.010573
10025 2016 10 2.0161e+05 0.004453
10025 2016 11 2.0161e+05 0.074598
10025 2016 12 2.0161e+05 -0.01457
10025 2017 1 20171 -0.050697
Those appear to me to be what you requested.
.
21 Comments
Fabian Niederreiter
on 21 Mar 2021
Thanks for you support!
"The data do not go all the way to 2019, instead stopping at 2017 1." You are right there because the Demo-dataset I attached did only contain 3 different stocks, that all got liquidated before 2019. In total I have about 13k stocks in my dataset where the fast majority provides data from 2009-2019.
I will attach you another slightly bigger and more accurate demo dataset.
There was indeed a little misunderstanding: If you divide 2009/01 - 2019/11 into 11month timeframes as you correctly did, you get 41 total reporting periods as I stated.
The only difference to your solution is that i want the 11 month return data of all stocks (e.g. for timeframe 09/1-09/11) in one table. So each of the 41 tables should have 11 * "total number of stocks" lines.
Star Strider
on 21 Mar 2021
Edited: Star Strider
on 21 Mar 2021
I did not completely understand what you wanted.
Try this:
D = load('210320_ShortData_C.mat')
G = D.G;
Gu = unstack(G,'monthlyRET','ID'); % Unstack Using ‘ID’
Gu.Properties.VariableNames = strrep(Gu.Properties.VariableNames, 'x','ID_'); % Change ‘ID’ Names
Gu = Gu(Gu.year <= 2019,:); % Limit Rows
start = 1:3:size(Gu,1)-10;
finish = start+10;
for k = 1:numel(start)
DS{k} = Gu(start(k):finish(k),:);
end
with:
DS_01 = DS{1}
DS_02 = DS{2}
DS_40 = DS{end-1}
DS_41 = DS{end}
producing:
DS_01 =
11×9 table
year month date ID_10001 ID_10002 ID_10025 ID_10026 ID_10028 ID_10032
____ _____ __________ _________ _________ _________ _________ _________ _________
2009 1 20091 0.043223 -0.43835 -0.18457 -0.022692 -0.18829 -0.13439
2009 2 20092 0.064757 -0.33206 -0.082562 -0.096532 -0.39222 -0.10564
2009 3 20093 -0.069798 0.21945 0.22589 0.10504 0.023135 0.10171
2009 4 20094 0.052383 0.12377 0.30476 0.12002 0.19034 0.51421
2009 5 20095 0.004916 -0.40998 0.21662 -0.030642 -0.001542 -0.18203
2009 6 20096 0.02546 -0.29761 0.076464 -0.039256 0.14498 0.12626
2009 7 20097 -0.043847 -0.012998 0.19584 0.19526 -0.13307 0.23759
2009 8 20098 0.057805 0.32204 0.1964 0.010337 0.38647 -0.016389
2009 9 20099 0.015785 -0.024919 0.044851 -0.008996 0.14316 0.045934
2009 10 2.0091e+05 0.042746 -0.072414 -0.12666 -0.095975 0.068962 -0.032813
2009 11 2.0091e+05 0.007961 -0.12658 0.085056 -0.078458 -0.051351 0.075131
DS_02 =
11×9 table
year month date ID_10001 ID_10002 ID_10025 ID_10026 ID_10028 ID_10032
____ _____ __________ _________ _________ ________ _________ _________ _________
2009 4 20094 0.052383 0.12377 0.30476 0.12002 0.19034 0.51421
2009 5 20095 0.004916 -0.40998 0.21662 -0.030642 -0.001542 -0.18203
2009 6 20096 0.02546 -0.29761 0.076464 -0.039256 0.14498 0.12626
2009 7 20097 -0.043847 -0.012998 0.19584 0.19526 -0.13307 0.23759
2009 8 20098 0.057805 0.32204 0.1964 0.010337 0.38647 -0.016389
2009 9 20099 0.015785 -0.024919 0.044851 -0.008996 0.14316 0.045934
2009 10 2.0091e+05 0.042746 -0.072414 -0.12666 -0.095975 0.068962 -0.032813
2009 11 2.0091e+05 0.007961 -0.12658 0.085056 -0.078458 -0.051351 0.075131
2009 12 2.0091e+05 0.16691 0.03871 0.018124 0.10658 0.052497 0.050052
2010 1 20101 -0.016774 0.3426 -0.08201 0.050076 -0.020832 0.18898
2010 2 20102 0.000271 0.15655 0.004063 0.022007 0.17511 0.018585
DS_40 =
11×9 table
year month date ID_10001 ID_10002 ID_10025 ID_10026 ID_10028 ID_10032
____ _____ __________ ________ ________ ________ _________ _________ _________
2018 10 2.0181e+05 NaN NaN NaN 0.035841 -0.18726 0.005142
2018 11 2.0181e+05 NaN NaN NaN 0.005752 -0.023593 0.046395
2018 12 2.0181e+05 NaN NaN NaN -0.074561 -0.0235 -0.17362
2019 1 20191 NaN NaN NaN 0.075106 -0.060102 0.098057
2019 2 20192 NaN NaN NaN 0.006812 0.084715 0.096575
2019 3 20193 NaN NaN NaN 0.026897 0.016577 -0.010477
2019 4 20194 NaN NaN NaN -0.009117 0.045077 -0.008486
2019 5 20195 NaN NaN NaN 0.024136 0.53771 -0.19107
2019 6 20196 NaN NaN NaN 0.004705 0.63712 0.16685
2019 7 20197 NaN NaN NaN 0.15242 -0.43362 0.026426
2019 8 20198 NaN NaN NaN 0.040125 0.50335 -0.038035
DS_41 =
11×9 table
year month date ID_10001 ID_10002 ID_10025 ID_10026 ID_10028 ID_10032
____ _____ __________ ________ ________ ________ _________ _________ _________
2019 1 20191 NaN NaN NaN 0.075106 -0.060102 0.098057
2019 2 20192 NaN NaN NaN 0.006812 0.084715 0.096575
2019 3 20193 NaN NaN NaN 0.026897 0.016577 -0.010477
2019 4 20194 NaN NaN NaN -0.009117 0.045077 -0.008486
2019 5 20195 NaN NaN NaN 0.024136 0.53771 -0.19107
2019 6 20196 NaN NaN NaN 0.004705 0.63712 0.16685
2019 7 20197 NaN NaN NaN 0.15242 -0.43362 0.026426
2019 8 20198 NaN NaN NaN 0.040125 0.50335 -0.038035
2019 9 20199 NaN NaN NaN -0.001763 -0.026812 0.092446
2019 10 2.0191e+05 NaN NaN NaN -0.005651 0.078102 0.17332
2019 11 2.0191e+05 NaN NaN NaN -0.028654 0.10819 0.027209
MATLAB does not allow strictlly numeric variable names, so it is necessary to replace the ‘ID’ labels with compatible variable names.
EDIT — (21 Mar 2021 at 15:02)
Corrected typographical errors.
Fabian Niederreiter
on 21 Mar 2021
This is excactly what I was looking for! :)
After seeing your quick and fitting answers, I would love to post a follow up question:
I don't know if this is appropriate or if I should rather open a new Thread? If you want me to open a new thread just tell me in the next comment and I will do that. Otherwise I just formulate my follow up problem here:
I once again attached the output 1x41 cell I got on my demo dataset with the help of your code.
What I am looking for right now is two more 1x41 cells. The first one should include all stocks (in the same structure as before) out of the top 10% decile based on their summed up return for each 11 month reporting period. The second one should inlcude all stocks (in the same structure as before) out of the bottom 10% decile based on their summed up returns for each 11month reporting period.
Example: DS_01 right now includes 10 columns of different stocks.
As an output I would need DS_01_top with the top 10% decile stocks in this reporting period based on stock performance as well as a DS_01_bottom with the bottom 10% decile stocks in this reporting period based on stock performance.
Star Strider
on 21 Mar 2021
Thank you!
I do not understand what you want, specifically with respect to what the tables should contain. I understand about the top 10% and bottom 10%, however I have no idea how you want them calculated, for example the top and bottom 10% of what, exactly?
That aside, if my Answer helped you solve your problem, please Accept it!
.
Fabian Niederreiter
on 21 Mar 2021
This is what everyone of the 41 tables looks like right now.
The numbers in each stock of the 10 stock columns depict the monthly returns of every stock.
The top 10% decile table should now only contain the 10% of stocks with the highest returns over the reporting period. So speaking with the highest sum of the stock column.
With a sample size of 10 stocks the top 10% decile table should look like this (assuming ID_10001 has the highest summed up returns):
I just accepted your answer.
I hope you will still answer this thread. :)
Star Strider
on 21 Mar 2021
Now I understand!
This took a few minutes of experimenting:
D = load('210320_ShortData_C.mat')%, 'VariableNamingRule','preserve');
G = D.G;
Gu = unstack(G,'monthlyRET','ID'); % Unstack Using ‘ID’
Gu.Properties.VariableNames = strrep(Gu.Properties.VariableNames, 'x','ID_'); % Change ‘ID’ Names
Gu = Gu(Gu.year <= 2019,:); % Limit Rows
start = 1:3:size(Gu,1)-10;
finish = start+10;
for k = 1:numel(start)
T11 = Gu(start(k):finish(k),:); % Same Construction As Previously
VarSum = sum(T11{:,4:end}); % Sum Stock Columns
thrshld = prctile(VarSum,[10 90]); % Calculate [10 90] Percentiles
colsl = find(VarSum<=thrshld(1))+3; % Column With Lowest
colsh = find(VarSum>=thrshld(2))+3; % Column With Highest
DS_10H{k,:} = T11(:,[1:3 colsh]); % Create Table For Highest
DS_10L{k,:} = T11(:,[1:3 colsl]); % Create Table For Lowest
end
with:
DS_10_H01 = DS_10H{1}
DS_10_L01 = DS_10L{1}
DS_10_H41 = DS_10H{end}
DS_10_L41 = DS_10L{end}
producing:
DS_10_H01 =
11×4 table
year month date ID_10025
____ _____ __________ _________
2009 1 20091 -0.18457
2009 2 20092 -0.082562
2009 3 20093 0.22589
2009 4 20094 0.30476
2009 5 20095 0.21662
2009 6 20096 0.076464
2009 7 20097 0.19584
2009 8 20098 0.1964
2009 9 20099 0.044851
2009 10 2.0091e+05 -0.12666
2009 11 2.0091e+05 0.085056
DS_10_L01 =
11×4 table
year month date ID_10002
____ _____ __________ _________
2009 1 20091 -0.43835
2009 2 20092 -0.33206
2009 3 20093 0.21945
2009 4 20094 0.12377
2009 5 20095 -0.40998
2009 6 20096 -0.29761
2009 7 20097 -0.012998
2009 8 20098 0.32204
2009 9 20099 -0.024919
2009 10 2.0091e+05 -0.072414
2009 11 2.0091e+05 -0.12658
DS_10_H41 =
11×4 table
year month date ID_10028
____ _____ __________ _________
2019 1 20191 -0.060102
2019 2 20192 0.084715
2019 3 20193 0.016577
2019 4 20194 0.045077
2019 5 20195 0.53771
2019 6 20196 0.63712
2019 7 20197 -0.43362
2019 8 20198 0.50335
2019 9 20199 -0.026812
2019 10 2.0191e+05 0.078102
2019 11 2.0191e+05 0.10819
DS_10_L41 =
11×4 table
year month date ID_10026
____ _____ __________ _________
2019 1 20191 0.075106
2019 2 20192 0.006812
2019 3 20193 0.026897
2019 4 20194 -0.009117
2019 5 20195 0.024136
2019 6 20196 0.004705
2019 7 20197 0.15242
2019 8 20198 0.040125
2019 9 20199 -0.001763
2019 10 2.0191e+05 -0.005651
2019 11 2.0191e+05 -0.028654
Some documentation —
It first calculates the column sums, then uses the Statistics and Machine Learning Toolbox prctile function to calculate the percentiles, finds the columns with the highest and lowest percentiles, and creates new tables for each. The code is essentially the same as the previous code, in that it creates the same initial tables, however then edits them to include only the desired column (variable) in each new table.
I used the earlier data for this, noting that if more than one column meets the criteria for being greater than or less than than the 90th or 10th percentiles, the resulting tables could have more than one variable (column). That is not the situation here, although it could be with a sufficient number of variables (columns) in each table.
I checked a few of the returned tables in detail, and it appears to return the requested information correctly.
.
Fabian Niederreiter
on 22 Mar 2021
Thanks Star Strider!
Clean work as usual.
In order to finish my analysis I would need input on two more tasks. I hope you don't bother if I asked you for help on these two as well.
Step 1:
%% Form 12 month Analysing Portfolios
% These Analysing Portfolios should now be consisting out of the stocks from
% the assiociated TOP and BOTTOM Reporting Portfolios (DS_10H and DS_10L).
% To be more precise A1Top should include the monthly returns for the 12 month
% analysing period 10/1 to 10/12 (see below) from the stocks of DS_10H{1,1}, whereas
% A1Bottom should include the monthly returns for the 12 month analysing
% period 10/1 to 10/12 (see below) from the stocks of DS_10L{1,1}.
% This should once agian result in 41 TOP Analysing Portfolios and 41
% BOTTOM Analysing portfolios for the following 12-month time frames
% A1Top 10/1 to 10/12 [year/month]
% A1Bottom 10/1 to 10/12
% A2Top 10/4 to 11/03
% A2Bottom 10/4 to 11/03
% A3Top 10/7 to 11/06
% A3Bottom 10/7 to 11/06
% ...
% A41Top 20/1 to 20/12
% A41Bottom 20/1 to 20/12
Step 2:
%% Compare avg Return of Top- to Bottom Analysing Portfolios
% Here I'd like to compare the avg monthly returns of all stocks in the Top
% Analysing portfolios over all 41 time periods with the avg monthly returns of all stocks in the
% Bottom Analysing portfolios.
% Resulting in just two simple avg monthly return numbers: The avg Top
% monthly return and the avg Bottom monthly return.
The output should contain these two numbers as well as the difference between the two numbers similar to the attached photo (Sell = Bottom; Buy = Top; Buy - Sell = Difference):
The numbers in parentheses are not needed as of right now.
PS: Please let me know if there is a way to express my gratitute for your effort! Like a place where I can somehow write a positive evalutation or something. :)
Star Strider
on 22 Mar 2021
I am not certain what you want. It would be straightforward to use the bounds function to get the high and low values for each of the ‘10H’ and ‘10L’ tables, and then subtracting them to do the necessary analysis. Those could be added as additional rows (easiest) to the existing tables.
‘PS: Please let me know if there is a way to express my gratitute for your effort!’
Voting for my Answer as well (and any other of my Answers you happen to find) would always be appreciated! I am not certain anything else would be necessary, since my Answers stand on their own merits otherwise. A Comment to your original Question mentioning that this approach works is the only other item I could suggest, simply for the benefit of anyone else who has a similar problem and would like a quick synopsis of an otherwise lengthy thread.
I very much appreciate your compliment!
Fabian Niederreiter
on 22 Mar 2021
Edited: Fabian Niederreiter
on 22 Mar 2021
Thanks for your answer.
I try to rephrase my problem from Step 1 again by giving you a little more backround on it:
In the previous step we created a Top 10% and a Bottom 10% portfolio for each timeframe (R1Top 09/1 to 09/11, R1Bottom 09/1 to 09/11;...;R41Top 19/1 to 19/11, R41Bottom 19/1 to 19/11). These are our "Reporting portfolios".
Now I want to analyise how the stocks of our Reporting portfolios are doing in the following 12 months after skipping one month (A1Top 10/1 to 10/12, A1Bottom 10/1 to 10/12; ... ; A41Top 20/1 to 20/12, A41Bottom 20/1 to 20/12). Meaning R1 09/1 to 09/11 corresponding with A1Top 10/1 to 10/12, A1Bottom 10/1 to 10/12.
Example: "We take the stocks of our R1 09/1-09/11 Top 10% portfolio and now analyse their returns in the corresponding timeframe A1Top 10/1 to 10/12.
Likewise we take the stocks of our R1 09/1-09/11 Bottom 10% portfolio and now analyse their returns in the corresponding timeframe A1Bottom 10/1 to 10/12."
Then we repeat these two steps for all 41 timeframes and we are done.
Note: As we need the stocks data until 20/12 you'd need to take my initial table G with the complete data for the 10 sample stocks. I reattached it to this post again.
Star Strider
on 22 Mar 2021
I am a bit lost with respect ro the ‘R’ and ‘A’ designations. I assume ‘R’ are the original data and ‘A’ are the highest and lowewst 10%. Not all entries end up in the highest 10% or lowest 10% by design, and in these examples only one does.
Do we compare a given highest 10% and lowest 10% entries with how it did a month earlier? This will obviously not work for all of them because there can be only 40 comparisons, not 41.
Fabian Niederreiter
on 22 Mar 2021
Sorry for that. I might have not expressed this clearly enough.
DS_10_H01 = DS_10H{1} this is one table within the 41x1cell we created with your previous code. The table contains the Top 10% of stocks (in our case only one stock per portfolio) for the first timeframe (09/01-09/11). In my description above this equals R1Top 09/1 to 09/11.
DS_10_L01 = DS_10L{1} is also one table within the 41x1cell we created with your previous code. The table contains the Bottom 10% of stocks (in our case only one stock per portfolio) for the first timeframe (09/01-09/11). In my description above this equals R1Bottom 09/1 to 09/11.
Now we want to analyse how these exact portfolios (in our case only one stock per portfolio) is doing in the corresponding analysing timeframe (10/1 to 10/12). In my description above this equals A1Top 10/1 to 10/12 and A1Bottom 10/1 to 10/12.
As a result for every Reporting timeframe R1Top/R1Bottom to R41Top/R41Bottom we should be having a corresponding analysing timeframe A1Top/A1Bottom to A41Top/A41Bottom.
Star Strider
on 22 Mar 2021
I still have absolutely no clue as to what you want with this.
Try this:
filename = '210320_ShortData_C.mat';
D = load(filename);%, 'VariableNamingRule','preserve');
G = D.G;
Gu = unstack(G,'monthlyRET','ID'); % Unstack Using ‘ID’
Gu.Properties.VariableNames = strrep(Gu.Properties.VariableNames, 'x','ID_'); % Change ‘ID’ Names
Gu = Gu(Gu.year <= 2019,:); % Limit Rows
start = 1:3:size(Gu,1)-10;
finish = start+10;
for k = 1:numel(start)
T11 = Gu(start(k):finish(k),:); % Same Construction As Previously
VarSum = sum(T11{:,4:end}); % Sum Stock Columns
thrshld = prctile(VarSum,[10 90]); % Calculate [10 90] Percentiles
colsl = find(VarSum<=thrshld(1))+3; % Column With Lowest
colsh = find(VarSum>=thrshld(2))+3; % Column With Highest
DS_10H{k,:} = T11(:,[1:3 colsh]); % Create Table For Highest
DS_10L{k,:} = T11(:,[1:3 colsl]); % Create Table For Lowest
DS_10H_Perf{k,:} = DS_10H{k,:}([1 end],:); % Create Performance Table For Highest
LL_H = varfun(@(x)diff(x,1),DS_10H_Perf{k}); % Calculate Performance Table For Highest
DS_10H_Perf{k,:}(end+1,4) = LL_H(:,4); % Add Row
DS_10L_Perf{k,:} = DS_10L{k,:}([1 end],:); % Create Performance Table For Lowest
LL_L = varfun(@(x)diff(x,1),DS_10L_Perf{k}); % Calculate Performance Table For Lowest
DS_10L_Perf{k,:}(end+1,4) = LL_L(:,4); % Add Row
end
DS_10_H01 = DS_10H{1}
DS_10_H01P = DS_10H_Perf{1}
DS_10_L01 = DS_10L{1};
DS_10_L01P = DS_10L_Perf{1}
producing:
DS_10_H01P =
3×4 table
year month date ID_10078
____ _____ __________ ________
2009 1 20091 0.14615
2009 11 2.0091e+05 0.04109
0 0 0 -0.10506
DS_10_L01P =
3×4 table
year month date ID_10002
____ _____ __________ ________
2009 1 20091 -0.43835
2009 11 2.0091e+05 -0.12658
0 0 0 0.31177
The ’performance’ metric is simply the difference between the first and alst prices, and is the last row of each of these table.
This is not my area of expertise.
Fabian Niederreiter
on 28 Mar 2021
Hello again :)
Sorry for the late reply, I once again appreciate your effort a lot. Unfortunatelly these are still not the outputs I was looking for.
As I think that I wasn't clear enough in describing the targeted solution again, I will give it one more try before stopping to bother you with my questions in here. :)
I will attach the current status of my code as well as the data in order for us to be on the same page again.
Let us also please ignore what I wrote as Step 2 before and solely focus on what I describe right now:
The .mat file I attached contains the 3 cell arrays we created in previous steps plus the table 'G' with the monthly return data (2009-2020) of our now 20 sample stocks.
DS_10T and DS_10B now both contain 41 tables with the top/bottom 10% percentile of stocks based on their performance in the respective time frame.
I decided to give you a step by step instruction on what I'd like to see the code doing:
- Look into table DS_10T{1, 1} (top 10% of stocks in reporting time frame R1 09/01 - 09/11)
- Take the Stock_IDs out of DS_10T{1, 1} and search for their return data in table 'G'
- Extract the return data of these stocks out of 'G' for the corresponding analysing time frame A1Top 10/1 - 10/12 (12 months, skipping one month after reporting time frame)
- Save the extracted return data as the first table of a new 41x1 cell DS_10T_An
- Look into table DS_10T{2, 1} (top 10% of stocks in reporting time frame R1 09/04 - 10/02)
- Take the Stock_IDs out of DS_10T{2, 1} and search for their return data in table 'G'
- Extract the return data of these stocks out of 'G' for the corresponding analysing time frame A1Top 10/4 - 11/03 (12 months, skipping one month after reporting time frame)
- Save the extracted return data as the second table of the 41x1 cell DS_10T_An
... continue this until the we created A41Top as the last Analysing portfolio of our Top performing stocks.
Then perform excactly the same task for the 41 Bottom Portfolios:
- Look into table DS_10B{1, 1} (bottom 10% of stocks in reporting time frame R1 09/01 - 09/11)
- Take the Stock_IDs out of DS_10B{1, 1} and search for their return data in table 'G'
- Extract the return data of these stocks out of 'G' for the corresponding analysing time frame A1Bottom 10/1 - 10/12 (12 months, skipping one month after reporting time frame)
- Save the extracted return data as the first table of a new 41x1 cell DS_10B_An
- Look into table DS_10B{2, 1} (bottom 10% of stocks in reporting time frame R1 09/04 - 10/02)
- Take the Stock_IDs out of DS_10B{2, 1} and search for their return data in table 'G'
- Extract the return data of these stocks out of 'G' for the corresponding analysing time frame A1Bottom 10/4 - 11/03 (12 months, skipping one month after reporting time frame)
- Save the extracted return data as the second table of the 41x1 cell DS_10B_An
... continue this until the we created B41Top as the last Analysing portfolio of our Bottom performing stocks.
I assume we put all of this into a loop as we did before and therefore create our desired 41x1 cell 'DS_10T_An' (with the top 10% stocks data as input) as well as the other 41x1 cell 'DS_10B_An' (with the bottom 10% stocks data as input).
I hope this time I was precise enough to make you want to tackle it one last time for me.
Thanks a lot in advance!
Star Strider
on 28 Mar 2021
I’m not following everything.
Try this:
D = load('210322_ShortData_C.mat');
DS_10T = D.DS_10T;
DS_10B = D.DS_10B;
G = D.G;
for k = 1:numel(DS_10T)
IDv = sscanf([DS_10T{k}.Properties.VariableNames{4:end}],'ID_%d');
G_ext = G(ismember(G.ID,categorical(IDv)),:);
Hi{k,:} = unstack(G_ext,'monthlyRET','ID');
end
for k = 1:numel(DS_10B)
IDv = sscanf([DS_10B{k}.Properties.VariableNames{4:end}],'ID_%d');
G_ext = G(ismember(G.ID,categorical(IDv)),:);
Lo{k,:} = unstack(G_ext,'monthlyRET','ID');
end
It may be necessary to tweak this to get the desired result. I leave that to you.
Fabian Niederreiter
on 29 Mar 2021
Thanks again! With some working around I got my desired output :)
Fabian Niederreiter
on 1 Apr 2021
Hello again,
Now that I managed to get my analysing portfolios sorted, I asked the community for some advice on the tiny last step of my project... unfortunatelly we did not find a fitting solution until now. That's why I am coming back to you once again and will ask you for one more advise. :)
I would like to compare the average of the monthly stock return data listed in the 41x1 cell array 'DS_10T' with the data listed in the cell array 'DS_10B'.
As there are some NaNs within my data I hope that one can somehow not factor them into the averages.
I once again attached a short version of the relevant data as my original dataset would need significantly more memory, because it includes a lot more stocks.
Even tough every cell array contains 41 tables with data from different timeframes I would only need a comparison of the overall average monthly return out of all tables in 'DS_10T' versus the overall average monthly return of all tables of 'DS_10B' (so literally just 1 number for each of the two cell arrays) in the following format:
Where "Sell" depicts the average monthly returns of the 'DS_10B' array, "Buy" the average monthly returns of the 'DS_10T' array and Buy-Sell the difference between the two values. The numbers in parentheses can be ignored for now.
Fabian Niederreiter
on 5 Apr 2021
Hey StarStrider,
Do you think you will find time to look into it? I'm really struggling here and would be very happy about any kind of advise.
I already tried myself a bit and came up with the idea to first delete all of the three month, year, date colomns in both of my cell arrays.
Then we literally just need to calculate the mean for all of the remaining numbers for the two cell arrays. I think for an expert like you it should not be a problem to come up with a fitting for loop or even an easier way.
Would really appreciate your help on this! :)
Star Strider
on 6 Apr 2021
I still do not understand what you want. Apparently, others are having the same problem.
Fabian Niederreiter
on 6 Apr 2021
You are right that things got a little messy right there.
Thats why I posted a new and easier to understand question in this new thread:
Happy to hear your thoughts right there :)
More Answers (0)
See Also
Categories
Find more on Performance and Memory in Help Center and File Exchange
Tags
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)