Hi,
1) Does MATLAB implement R, R^2, S, K-S, or other tools that indicate how good a fit is?
2) This is a general question: what kind of indicator can I use to see whether a fit to a small portion of a data set explains the data as well as a fit to the whole data set?
For example: I fit 100% of the data set (say, linearly), and I also fit the first 10% of the same data set (also linearly). Both fitted curves "try" to explain "almost" the same thing. Is there any kind of tool I can use to compare these two fits?
Andre

 Accepted Answer

Star Strider on 20 May 2014

  1. Yes. Look in the Statistics Toolbox functions. There are also functions in the Curve Fitting Toolbox, but I don’t have it. (With the Statistics and Optimization Toolboxes, I don’t need it.)
  2. That’s generally not a good idea. With fewer data there are fewer degrees of freedom, so parameters that would be significant with the larger data set might have confidence limits that make them non-significant with a small one. It doesn’t take MATLAB much longer to estimate the parameters of a full data set than of a subset, especially for a linear problem. Give the regression functions all your data from the outset.
  3. I would not suggest that approach. If you want to use subsets of your data, there are a number of Resampling Techniques in the Statistics Toolbox that will do that reliably.
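As a rough illustration of point 1, the sketch below computes a few common goodness-of-fit indicators (Pearson's r, R^2, and the root-mean-square error, which is presumably the "S" in the question) for a straight-line fit using only base MATLAB. The data are synthetic and all variable names are assumptions made for this example. With the Statistics Toolbox, `fitlm(x, y)` returns a model whose `Rsquared` and `RMSE` properties report the same kind of quantities, and `kstest` can be applied to standardized residuals as a K-S check.

```matlab
% Synthetic data (an assumption, for illustration only)
rng(0);                              % reproducible random numbers
x = linspace(0, 10, 1000)';
y = 2*x + 1 + randn(size(x));        % noisy straight line

p    = polyfit(x, y, 1);             % p(1) = slope, p(2) = intercept
yhat = polyval(p, x);                % fitted values
res  = y - yhat;                     % residuals

SSres = sum(res.^2);                 % residual sum of squares
SStot = sum((y - mean(y)).^2);       % total sum of squares
R2    = 1 - SSres/SStot;             % coefficient of determination
RMSE  = sqrt(SSres/(numel(y) - 2));  % root-mean-square error (2 fitted parameters)
R     = corrcoef(x, y);              % 2x2 correlation matrix; R(1,2) is Pearson's r

fprintf('R^2 = %.4f, RMSE = %.4f, r = %.4f\n', R2, RMSE, R(1,2));
```

For a straight-line fit with an intercept, R^2 equals the square of Pearson's r, which is a quick sanity check on the calculation.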

6 Comments

Andre on 20 May 2014
Star Strider, always answering my crazy ideas. Thanks.
Let me explain better:
I have a data set with 1,000,000 points. On a scatter plot it looks like a "pseudo-progressive line", and I can fit it with a simple linear regression.
If I take the first 100,000 points of that same data set, the scatter plot suggests I can fit a linear regression to them too. Can I trust this second fit?
The problem is that I have a 36-dimension matrix (with 100,000 rows) to fit with SVR, and it's too slow. But I know beforehand that the first 10,000 rows give a good view of the whole data set.
Thanks, Andre
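One hedged way to check whether the fit on the first rows can be trusted is to see how well it predicts the rows it never saw, and compare that with the full-data fit. The sketch below does this with ordinary least squares standing in for SVR, on synthetic data with 3 predictor columns instead of 36. The data, the sizes, and every variable name are assumptions made for illustration.

```matlab
% Synthetic stand-in for the real data (an assumption)
rng(1);
n    = 100000;
X    = randn(n, 3);                    % hypothetical predictors
beta = [1.5; -2; 0.5];
y    = 4 + X*beta + 0.5*randn(n, 1);

A        = [ones(n, 1), X];            % design matrix with intercept column
idxFirst = 1:n/10;                     % first 10% of the rows
idxRest  = (n/10 + 1):n;               % remaining 90%

bFirst = A(idxFirst, :) \ y(idxFirst); % least-squares fit on the first 10%
bFull  = A \ y;                        % least-squares fit on all rows

% Prediction error of each fit on the rows the subset fit never saw
rmseFirstOnRest = sqrt(mean((y(idxRest) - A(idxRest, :)*bFirst).^2));
rmseFullOnRest  = sqrt(mean((y(idxRest) - A(idxRest, :)*bFull ).^2));

fprintf('RMSE on remaining rows: subset fit %.4f, full fit %.4f\n', ...
        rmseFirstOnRest, rmseFullOnRest);
```

If the two errors are close, the first rows are behaving like a representative sample. If the data are ordered or drift over time, the subset fit will usually do noticeably worse on the remaining rows.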
Image Analyst
Wow! I've never heard of a matrix with that many dimensions. The biggest array I use has only 7 dimensions, and the longest dimension is probably only about 128 long. Is each dimension 100,000 long? That would be 100000^36 elements (about 10^180), far more than any computer can hold. How do you handle that? Using a sparse matrix? Or do you perhaps have only a 2-dimensional matrix with 100,000 rows and 36 columns?
Andre on 20 May 2014
Hi Image Analyst,
That's it: 36 x 100000.
For a second, I thought that I could use this set as I said before (use the first 10% of the whole data set...).
Andre
Star Strider
My pleasure!
Not crazy at all! Experimenting is how we all learn.
In your situation, the resampling idea seems to be your best option. I’m not guaranteeing that it will be faster (it could be), but it will likely be just as reliable, especially for a linear problem, and the statistics are well-established. (I don’t have any recent experience with it, so my ability to help you implement it is limited.)
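A minimal sketch of the resampling idea, assuming a simple straight-line model and synthetic data: instead of always taking the first rows, draw many random subsamples (with replacement), fit each one, and look at how much the estimated slope varies. The sample sizes and variable names are assumptions made for this example. The Statistics Toolbox function `bootstrp` packages this kind of loop.

```matlab
% Synthetic data (an assumption, for illustration only)
rng(2);
n = 100000;
x = rand(n, 1)*10;
y = 3*x + 2 + randn(n, 1);

m = 10000;                            % rows per resample (10% of the data)
B = 200;                              % number of resamples
slopes = zeros(B, 1);
for k = 1:B
    idx = randi(n, m, 1);             % m row indices drawn with replacement
    p   = polyfit(x(idx), y(idx), 1); % straight-line fit to this resample
    slopes(k) = p(1);                 % keep the slope
end

pFull = polyfit(x, y, 1);             % fit to all rows, for comparison
fprintf('Full-data slope:  %.4f\n', pFull(1));
fprintf('Resampled slopes: mean %.4f, std %.4f\n', mean(slopes), std(slopes));
```

A small spread of slopes around the full-data value suggests that a 10% subset carries enough information. A large spread suggests the subset is too small or too noisy to rely on.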
Andre on 20 May 2014
Thank you very much, I will research the resampling techniques.
Star Strider
My pleasure!
I’ll help if I can. It’s probably beneficial for me to review resampling techniques as well. I may ask you to post some of your data and describe what you want to do.

Asked: on 20 May 2014

Commented: on 20 May 2014
