[1] Agresti, A. *Categorical Data Analysis*, 2nd Ed.
Hoboken, NJ: John Wiley & Sons, Inc., 2002.

[2] Allwein, E., R. Schapire, and Y. Singer.
“Reducing multiclass to binary: A unifying approach for margin
classifiers.” *Journal of Machine Learning
Research*. Vol. 1, 2000, pp. 113–141.

[3] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised
Classification Learning Algorithms.” *Neural Computation*, Vol.
11, No. 8, 1999, pp. 1885–1892.

[4] Blackard, J. A. and D. J. Dean. “Comparative accuracies of artificial
neural networks and discriminant analysis in predicting forest cover types from cartographic
variables.” *Computers and Electronics in Agriculture*, Vol. 24, Issue
3, 1999, pp. 131–151.

[5] Bottou, L., and Chih-Jen Lin. “Support Vector Machine
Solvers.” *Large Scale Kernel Machines* (L. Bottou, O. Chapelle,
D. DeCoste, and J. Weston, eds.). Cambridge, MA: MIT Press, 2007.

[6] Bouckaert, R. “Choosing Between Two
Learning Algorithms Based on Calibrated Tests.” *International
Conference on Machine Learning*, pp. 51–58, 2003.

[7] Bouckaert, R. and E. Frank. “Evaluating the Replicability of
Significance Tests for Comparing Learning Algorithms.” In *Advances in
Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference*, 2004, pp.
3–12.

[8] Breiman, L. “Bagging Predictors.” *Machine
Learning*, Vol. 24, 1996, pp. 123–140.

[9] Breiman, L. “Random Forests.” *Machine Learning*,
Vol. 45, 2001, pp. 5–32.

[10] Breiman, L. `https://www.stat.berkeley.edu/~breiman/RandomForests/`

[11] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone.
*Classification and Regression Trees.* Boca Raton, FL: Chapman &
Hall, 1984.

[12] Cristianini, N., and J. Shawe-Taylor. *An Introduction to
Support Vector Machines and Other Kernel-Based Learning Methods*. Cambridge,
UK: Cambridge University Press, 2000.

[13] Dietterich, T. “Approximate statistical tests for comparing
supervised classification learning algorithms.” *Neural
Computation*, Vol. 10, No. 7, 1998, pp. 1895–1923.

[14] Dietterich, T., and G. Bakiri. “Solving
Multiclass Learning Problems Via Error-Correcting Output Codes.” *Journal
of Artificial Intelligence Research*. Vol. 2, 1995, pp.
263–286.

[15] Escalera, S., O. Pujol, and P. Radeva.
“On the decoding process in ternary error-correcting output
codes.” *IEEE Transactions on Pattern Analysis and
Machine Intelligence*. Vol. 32, Issue 1, 2010, pp. 120–134.

[16] Escalera, S., O. Pujol, and P. Radeva.
“Separability of ternary codes for sparse designs of error-correcting
output codes.” *Pattern Recognition Letters*. Vol.
30, Issue 3, 2009, pp. 285–297.

[17] Fan, R.-E., P.-H. Chen, and C.-J. Lin. “Working
set selection using second order information for training support
vector machines.” *Journal of Machine Learning Research*,
Vol. 6, 2005, pp. 1889–1918.

[18] Fagerland, M. W., S. Lydersen, and P. Laake. “The
McNemar Test for Binary Matched-Pairs Data: Mid-p and Asymptotic Are
Better Than Exact Conditional.” *BMC Medical Research
Methodology*. Vol. 13, 2013, pp. 1–8.

[19] Freund, Y. “A more robust boosting
algorithm.” arXiv:0905.2138v1, 2009.

[20] Freund, Y. and R. E. Schapire. “A Decision-Theoretic
Generalization of On-Line Learning and an Application to Boosting.” *Journal of
Computer and System Sciences*, Vol. 55, 1997, pp. 119–139.

[21] Friedman, J. “Greedy function approximation: A gradient
boosting machine.” *Annals of Statistics*, Vol. 29, No. 5, 2001, pp.
1189–1232.

[22] Friedman, J., T. Hastie, and R. Tibshirani. “Additive
logistic regression: A statistical view of boosting.” *Annals of Statistics*,
Vol. 28, No. 2, 2000, pp. 337–407.

[23] Hastie, T., and R. Tibshirani. “Classification
by Pairwise Coupling.” *Annals of Statistics*.
Vol. 26, Issue 2, 1998, pp. 451–471.

[24] Hastie, T., R. Tibshirani, and J. Friedman. *The Elements of
Statistical Learning*, second edition. New York: Springer, 2008.

[25] Ho, C. H. and C. J. Lin. “Large-Scale
Linear Support Vector Regression.” *Journal of Machine
Learning Research*, Vol. 13, 2012, pp. 3323–3348.

[26] Ho, T. K. “The random subspace method for constructing
decision forests.” *IEEE Transactions on Pattern Analysis and Machine
Intelligence*, Vol. 20, No. 8, 1998, pp. 832–844.

[27] Hsieh, C. J., K. W. Chang, C. J. Lin,
S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent
Method for Large-Scale Linear SVM.” *Proceedings
of the 25th International Conference on Machine Learning, ICML ’08*,
2008, pp. 408–415.

[28] Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. *A
Practical Guide to Support Vector Classification*. Available at `https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf`.

[29] Hu, Q., X. Che, L. Zhang, and D. Yu. “Feature
Evaluation and Selection Based on Neighborhood Soft Margin.” *Neurocomputing*.
Vol. 73, 2010, pp. 2114–2124.

[30] Kecman, V., T.-M. Huang, and M. Vogt. “Iterative
Single Data Algorithm for Training Kernel Machines from Huge Data
Sets: Theory and Performance.” In *Support Vector
Machines: Theory and Applications*. Edited by Lipo Wang,
255–274. Berlin: Springer-Verlag, 2005.

[31] Kohavi, R. “Scaling Up the Accuracy of Naive-Bayes
Classifiers: a Decision-Tree Hybrid.” *Proceedings
of the Second International Conference on Knowledge Discovery and
Data Mining*, 1996.

[32] Lancaster, H.O. “Significance Tests
in Discrete Distributions.” *JASA*, Vol.
56, Number 294, 1961, pp. 223–234.

[33] Langford, J., L. Li, and T. Zhang. “Sparse
Online Learning Via Truncated Gradient.” *J. Mach.
Learn. Res.*, Vol. 10, 2009, pp. 777–801.

[34] Loh, W.Y. “Regression Trees with
Unbiased Variable Selection and Interaction Detection.” *Statistica
Sinica*, Vol. 12, 2002, pp. 361–386.

[35] Loh, W.Y. and Y.S. Shih. “Split
Selection Methods for Classification Trees.” *Statistica
Sinica*, Vol. 7, 1997, pp. 815–840.

[36] McNemar, Q. “Note on the Sampling
Error of the Difference Between Correlated Proportions or Percentages.” *Psychometrika*,
Vol. 12, Number 2, 1947, pp. 153–157.

[37] Meinshausen, N. “Quantile Regression
Forests.” *Journal of Machine Learning Research*,
Vol. 7, 2006, pp. 983–999.

[38] Mosteller, F. “Some Statistical Problems
in Measuring the Subjective Response to Drugs.” *Biometrics*,
Vol. 8, Number 3, 1952, pp. 220–226.

[39] Nocedal, J. and S. J. Wright. *Numerical
Optimization*, 2nd ed., New York: Springer, 2006.

[40] Schapire, R. E. et al. “Boosting the margin: A new
explanation for the effectiveness of voting methods.” *Annals of Statistics*,
Vol. 26, No. 5, 1998, pp. 1651–1686.

[41] Schapire, R., and Y. Singer. “Improved boosting algorithms
using confidence-rated predictions.” *Machine Learning*, Vol. 37, No. 3, 1999,
pp. 297–336.

[42] Shalev-Shwartz, S., Y. Singer, and N.
Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for
SVM.” *Proceedings of the 24th International Conference
on Machine Learning, ICML ’07*, 2007, pp. 807–814.

[43] Seiffert, C., T. Khoshgoftaar, J. Hulse, and A. Napolitano.
“RUSBoost: Improving classification performance when training data is
skewed.” *19th International Conference on Pattern Recognition*, 2008, pp.
1–4.

[44] Warmuth, M., J. Liao, and G. Ratsch. “Totally corrective
boosting algorithms that maximize the margin.” *Proc. 23rd Int’l. Conf. on
Machine Learning*, ACM, New York, 2006, pp. 1001–1008.

[45] Wu, T. F., C. J. Lin, and R. Weng. “Probability
Estimates for Multi-Class Classification by Pairwise Coupling.” *Journal
of Machine Learning Research*. Vol. 5, 2004, pp. 975–1005.

[46] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo.
“Sparse Reconstruction by Separable Approximation.” *Trans.
Sig. Proc.*, Vol. 57, No. 7, 2009, pp. 2479–2493.

[47] Xiao, Lin. “Dual Averaging Methods
for Regularized Stochastic Learning and Online Optimization.” *J.
Mach. Learn. Res.*, Vol. 11, 2010, pp. 2543–2596.

[48] Xu, Wei. “Towards Optimal One Pass
Large Scale Learning with Averaged Stochastic Gradient Descent.” *CoRR*,
abs/1107.2490, 2011.

[49] Zadrozny, B. “Reducing Multiclass
to Binary by Coupling Probability Estimates.” *NIPS
2001: Proceedings of Advances in Neural Information Processing Systems
14*, 2001, pp. 1041–1048.

[50] Zadrozny, B., J. Langford, and N. Abe. “Cost-Sensitive Learning
by Cost-Proportionate Example Weighting.” *Third IEEE International
Conference on Data Mining*, 2003, pp. 435–442.

[51] Zhou, Z.-H. and X.-Y. Liu. “On Multi-Class Cost-Sensitive
Learning.” *Computational Intelligence*. Vol. 26, Issue 3,
2010, pp. 232–257.