# Linear regression with two subsamples with different numbers of observations

Jesper Solheimsnes on 19 Sep 2020
Hi,
I am trying to set up a linear regression to test the difference in mean stock prices between two subsamples: before and after a market crash.
The problem is that the subsamples have different numbers of observations: 334 before and 374 after.
I am supposed to fit the linear regression y_i = β0 + β1 d_i, where y contains the values from both subsamples and d is a dummy variable equal to 0 for values before the crash and 1 for values after.
I need to extend my code to estimate β(hat)0 and β(hat)1.
How do I do this when the subsamples have different numbers of observations?
Thank you for all help,
J.

Aditya Patil on 22 Sep 2020
As I understand it, you want to fit a linear regression model, but the two datasets you are using have different variables.
One way to overcome this is to check whether there is an appropriate fill value, such as zero, that can be used for the missing variables in both datasets. However, this probably only makes sense if just a few variables differ. If all the variables are different, filling in values is unlikely to be meaningful.
On the other hand, if your issue is that the observations are not for the same stocks, then it might not be useful to compare them, assuming they are independent variables. If you want to test the hypothesis that they are somehow related, then you need to check the relationship between the two sets of variables. You could do this by predicting the after variables from the before variables.
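
For the dummy-variable regression described in the question, unequal subsample sizes should not be an obstacle in themselves: the two subsamples can be stacked into a single response vector of length 334 + 374, with d set to 0 for the first block and 1 for the second. Below is a minimal Python sketch of that idea. The function name `dummy_regression` and the short price lists are made up for illustration and are not taken from the thread. It uses the standard closed-form OLS formulas for a regression with one regressor.

```python
from statistics import mean


def dummy_regression(before, after):
    """OLS fit of y_i = b0 + b1 * d_i, with d_i = 0 for 'before' and 1 for 'after'.

    Hypothetical helper for illustration; both inputs must be non-empty.
    """
    # Stack both subsamples into one response vector; their lengths may differ.
    y = list(before) + list(after)
    d = [0.0] * len(before) + [1.0] * len(after)

    # Closed-form OLS for a single regressor:
    # slope = S_dy / S_dd, intercept = y_bar - slope * d_bar.
    y_bar, d_bar = mean(y), mean(d)
    s_dy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    s_dd = sum((di - d_bar) ** 2 for di in d)
    b1 = s_dy / s_dd
    b0 = y_bar - b1 * d_bar
    return b0, b1


# Made-up illustrative prices (3 observations before, 4 after), not real data.
before = [10.0, 12.0, 11.0]
after = [8.0, 9.0, 7.0, 8.0]
b0, b1 = dummy_regression(before, after)
print(b0, b1)
```

With a single 0/1 dummy, the intercept estimate equals the mean of the "before" subsample and the slope estimate equals the "after" mean minus the "before" mean, so testing β1 = 0 is the test for a difference in means.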