Volume 9, June 2005
Page(s) 323 - 375
Published online 15 November 2005
  1. R. Ahlswede, P. Gács and J. Körner, Bounds on conditional probabilities with applications in multi-user communication. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 34 (1976) 157–177. (correction in 39 (1977) 353–354). [CrossRef] [Google Scholar]
  2. M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, The method of potential functions for the problem of restoring the characteristic of a function converter from randomly observed points. Automat. Remote Control 25 (1964) 1546–1556. [Google Scholar]
  3. M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, The probability problem of pattern recognition learning and the method of potential functions. Automat. Remote Control 25 (1964) 1307–1323. [Google Scholar]
  4. M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning. Automat. Remote Control 25 (1964) 917–936. [Google Scholar]
  5. M.A. Aizerman, E.M. Braverman and L.I. Rozonoer, Method of potential functions in the theory of learning machines. Nauka, Moscow (1970). [Google Scholar]
  6. H. Akaike, A new look at the statistical model identification. IEEE Trans. Automat. Control 19 (1974) 716–723. [Google Scholar]
  7. S. Alesker, A remark on the Szarek-Talagrand theorem. Combin. Probab. Comput. 6 (1997) 139–144. [CrossRef] [MathSciNet] [Google Scholar]
  8. N. Alon, S. Ben-David, N. Cesa-Bianchi and D. Haussler, Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM 44 (1997) 615–631. [Google Scholar]
  9. M. Anthony and P.L. Bartlett, Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999). [Google Scholar]
  10. M. Anthony and N. Biggs, Computational Learning Theory. Cambridge Tracts in Theoretical Computer Science (30). Cambridge University Press, Cambridge (1992). [Google Scholar]
  11. M. Anthony and J. Shawe-Taylor, A result of Vapnik with applications. Discrete Appl. Math. 47 (1993) 207–217. [CrossRef] [MathSciNet] [Google Scholar]
  12. A Antos, L. Devroye and L. Györfi, Lower bounds for Bayes error estimation. IEEE Trans. Pattern Anal. Machine Intelligence 21 (1999) 643–645. [CrossRef] [Google Scholar]
  13. A. Antos, B. Kégl, T. Linder and G. Lugosi, Data-dependent margin-based generalization bounds for classification. J. Machine Learning Res. 3 (2002) 73–98. [CrossRef] [Google Scholar]
  14. A. Antos and G. Lugosi, Strong minimax lower bounds for learning. Machine Learning 30 (1998) 31–56. [CrossRef] [Google Scholar]
  15. P. Assouad, Densité et dimension. Annales de l'Institut Fourier 33 (1983) 233–282. [Google Scholar]
  16. J.-Y. Audibert and O. Bousquet, Pac-Bayesian generic chaining, in Advances in Neural Information Processing Systems 16, L. Saul, S. Thrun and B. Schölkopf Eds., Cambridge, Mass., MIT Press (2004). [Google Scholar]
  17. J.-Y. Audibert, PAC-Bayesian Statistical Learning Theory. Ph.D. Thesis, Université Paris 6, Pierre et Marie Curie (2004). [Google Scholar]
  18. K. Azuma, Weighted sums of certain dependent random variables. Tohoku Math. J. 68 (1967) 357–367. [Google Scholar]
  19. Y. Baraud, Model selection for regression on a fixed design. Probability Theory and Related Fields 117 (2000) 467–493. [Google Scholar]
  20. A.R. Barron, L. Birgé and P. Massart, Risks bounds for model selection via penalization. Probab. Theory Related Fields 113 (1999) 301–415. [Google Scholar]
  21. A.R. Barron, Logically smooth density estimation. Technical Report TR 56, Department of Statistics, Stanford University (1985). [Google Scholar]
  22. A.R. Barron, Complexity regularization with application to artificial neural networks, in Nonparametric Functional Estimation and Related Topics, G. Roussas Ed. NATO ASI Series, Kluwer Academic Publishers, Dordrecht (1991) 561–576. [Google Scholar]
  23. A.R. Barron and T.M. Cover, Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 (1991) 1034–1054. [CrossRef] [MathSciNet] [Google Scholar]
  24. P. Bartlett, S. Boucheron and G. Lugosi, Model selection and error estimation. Machine Learning 48 (2001) 85–113. [Google Scholar]
  25. P. Bartlett, O. Bousquet and S. Mendelson, Localized Rademacher complexities. Ann. Statist. 33 (2005) 1497–1537. [CrossRef] [MathSciNet] [Google Scholar]
  26. P.L. Bartlett and S. Ben-David, Hardness results for neural network approximation problems. Theoret. Comput. Sci. 284 (2002) 53–66. [CrossRef] [MathSciNet] [Google Scholar]
  27. P.L. Bartlett, M.I. Jordan and J.D. McAuliffe, Convexity, classification, and risk bounds. J. Amer. Statis. Assoc., to appear (2005). [Google Scholar]
  28. P.L. Bartlett and W. Maass, Vapnik-Chervonenkis dimension of neural nets, in Handbook Brain Theory Neural Networks, M.A. Arbib Ed. MIT Press, second edition. (2003) 1188–1192. [Google Scholar]
  29. P.L. Bartlett and S. Mendelson, Rademacher and gaussian complexities: risk bounds and structural results. J. Machine Learning Res. 3 (2002) 463–482. [CrossRef] [Google Scholar]
  30. P. L. Bartlett, S. Mendelson and P. Philips, Local Complexities for Empirical Risk Minimization, in Proc. of the 17th Annual Conference on Learning Theory (COLT), Springer (2004). [Google Scholar]
  31. O. Bashkirov, E.M. Braverman and I.E. Muchnik, Potential function algorithms for pattern recognition learning machines. Automat. Remote Control 25 (1964) 692–695. [Google Scholar]
  32. S. Ben-David, N. Eiron and H.-U. Simon, Limitations of learning via embeddings in Euclidean half spaces. J. Machine Learning Res. 3 (2002) 441–461. [CrossRef] [Google Scholar]
  33. G. Bennett, Probability inequalities for the sum of independent random variables. J. Amer. Statis. Assoc. 57 (1962) 33–45. [CrossRef] [Google Scholar]
  34. S.N. Bernstein, The Theory of Probabilities. Gostehizdat Publishing House, Moscow (1946). [Google Scholar]
  35. L. Birgé, An alternative point of view on Lepski's method, in State of the art in probability and statistics (Leiden, 1999), Inst. Math. Statist., Beachwood, OH, IMS Lecture Notes Monogr. Ser. 36 (2001) 113–133. [Google Scholar]
  36. L. Birgé and P. Massart, Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields 97 (1993) 113–150. [CrossRef] [MathSciNet] [Google Scholar]
  37. L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam: Research papers in Probability and Statistics, E. Torgersen D. Pollard and G. Yang Eds., Springer, New York (1997) 55–87. [Google Scholar]
  38. L. Birgé and P. Massart, Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli 4 (1998) 329–375. [CrossRef] [MathSciNet] [Google Scholar]
  39. G. Blanchard, O. Bousquet and P. Massart, Statistical performance of support vector machines. Ann. Statist., to appear (2006). [Google Scholar]
  40. G. Blanchard, G. Lugosi and N. Vayatis, On the rates of convergence of regularized boosting classifiers. J. Machine Learning Res. 4 (2003) 861–894. [CrossRef] [Google Scholar]
  41. A. Blumer, A. Ehrenfeucht, D. Haussler and M.K. Warmuth, Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36 (1989) 929–965. [CrossRef] [Google Scholar]
  42. S. Bobkov and M. Ledoux, Poincaré's inequalities and Talagrands's concentration phenomenon for the exponential distribution. Probab. Theory Related Fields 107 (1997) 383–400. [CrossRef] [MathSciNet] [Google Scholar]
  43. B. Boser, I. Guyon and V.N. Vapnik, A training algorithm for optimal margin classifiers, in Proc. of the Fifth Annual ACM Workshop on Computational Learning Theory (COLT). Association for Computing Machinery, New York, NY (1992) 144–152. [Google Scholar]
  44. S. Boucheron, O. Bousquet, G. Lugosi and P. Massart, Moment inequalities for functions of independent random variables. Ann. Probab. 33 (2005) 514–560. [CrossRef] [MathSciNet] [Google Scholar]
  45. S. Boucheron, G. Lugosi and P. Massart, A sharp concentration inequality with applications. Random Structures Algorithms 16 (2000) 277–292. [CrossRef] [MathSciNet] [Google Scholar]
  46. S. Boucheron, G. Lugosi and P. Massart, Concentration inequalities using the entropy method. Ann. Probab. 31 (2003) 1583–1614. [CrossRef] [MathSciNet] [Google Scholar]
  47. O. Bousquet, A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Acad. Sci. Paris 334 (2002) 495–500. [Google Scholar]
  48. O. Bousquet, Concentration inequalities for sub-additive functions using the entropy method, in Stochastic Inequalities and Applications, C. Houdré E. Giné and D. Nualart Eds., Birkhauser (2003). [Google Scholar]
  49. O. Bousquet and A. Elisseeff, Stability and generalization. J. Machine Learning Res. 2 (2002) 499–526. [CrossRef] [Google Scholar]
  50. O. Bousquet, V. Koltchinskii and D. Panchenko, Some local measures of complexity of convex hulls and generalization bounds, in Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT), Springer (2002) 59–73. [Google Scholar]
  51. L. Breiman, Arcing classifiers. Ann. Statist. 26 (1998) 801–849. [CrossRef] [MathSciNet] [Google Scholar]
  52. L. Breiman, Some infinite theory for predictor ensembles. Ann. Statist. 32 (2004) 1–11. [CrossRef] [MathSciNet] [Google Scholar]
  53. L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, Classification and Regression Trees. Wadsworth International, Belmont, CA (1984). [Google Scholar]
  54. P. Bühlmann and B. Yu, Boosting with the l2-loss: Regression and classification. J. Amer. Statis. Assoc. 98 (2004) 324–339. [Google Scholar]
  55. A. Cannon, J.M. Ettinger, D. Hush and C. Scovel, Machine learning with data dependent hypothesis classes. J. Machine Learning Res. 2 (2002) 335–358. [CrossRef] [Google Scholar]
  56. G. Castellan, Density estimation via exponential model selection. IEEE Trans. Inform. Theory 49 (2003) 2052–2060. [Google Scholar]
  57. O. Catoni, Randomized estimators and empirical complexity for pattern recognition and least square regression. Preprint PMA-677. [Google Scholar]
  58. O. Catoni, Statistical learning theory and stochastic optimization. École d'été de Probabilités de Saint-Flour XXXI. Springer-Verlag. Lect. Notes Math. 1851 (2004). [Google Scholar]
  59. O. Catoni, Localized empirical complexity bounds and randomized estimators (2003). Preprint. [Google Scholar]
  60. N. Cesa-Bianchi and D. Haussler, A graph-theoretic generalization of the Sauer-Shelah lemma. Discrete Appl. Math. 86 (1998) 27–35. [CrossRef] [MathSciNet] [Google Scholar]
  61. M. Collins, R.E. Schapire and Y. Singer, Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 (2002) 253–285. [CrossRef] [Google Scholar]
  62. C. Cortes and V.N. Vapnik, Support vector networks. Machine Learning 20 (1995) 1–25. [Google Scholar]
  63. T.M. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electronic Comput. 14 (1965) 326–334. [CrossRef] [Google Scholar]
  64. P. Craven and G. Wahba, Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31 (1979) 377–403. [Google Scholar]
  65. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, Cambridge, UK (2000). [Google Scholar]
  66. I. Csiszár, Large-scale typicality of Markov sample paths and consistency of MDL order estimators. IEEE Trans. Inform. Theory 48 (2002) 1616–1628. [CrossRef] [MathSciNet] [Google Scholar]
  67. I. Csiszár and P. Shields, The consistency of the BIC Markov order estimator. Ann. Statist. 28 (2000) 1601–1619. [CrossRef] [MathSciNet] [Google Scholar]
  68. F. Cucker and S. Smale, On the mathematical foundations of learning. Bull. Amer. Math. Soc. (2002) 1–50. [Google Scholar]
  69. A. Dembo, Information inequalities and concentration of measure. Ann. Probab. 25 (1997) 927–939. [CrossRef] [MathSciNet] [Google Scholar]
  70. P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Prentice-Hall, Englewood Cliffs, NJ (1982). [Google Scholar]
  71. L. Devroye, Automatic pattern recognition: A study of the probability of error. IEEE Trans. Pattern Anal. Machine Intelligence 10 (1988) 530–543. [CrossRef] [Google Scholar]
  72. L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York (1996). [Google Scholar]
  73. L. Devroye and G. Lugosi, Lower bounds in pattern recognition and learning. Pattern Recognition 28 (1995) 1011–1018. [CrossRef] [Google Scholar]
  74. L. Devroye and T. Wagner, Distribution-free inequalities for the deleted and holdout error estimates. IEEE Trans. Inform. Theory 25(2) (1979) 202–207. [Google Scholar]
  75. L. Devroye and T. Wagner, Distribution-free performance bounds for potential function rules. IEEE Trans. Inform. Theory 25(5) (1979) 601–604. [Google Scholar]
  76. D.L. Donoho and I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3) (1994) 425–455. [Google Scholar]
  77. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. John Wiley, New York (1973). [Google Scholar]
  78. R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification. John Wiley and Sons (2000). [Google Scholar]
  79. R.M. Dudley, Central limit theorems for empirical measures. Ann. Probab. 6 (1978) 899–929. [CrossRef] [MathSciNet] [Google Scholar]
  80. R.M. Dudley, Balls in Rk do not cut all subsets of k + 2 points. Advances Math. 31 (3) (1979) 306–308. [Google Scholar]
  81. R.M. Dudley, Empirical processes, in École de Probabilité de St. Flour 1982. Lect. Notes Math. 1097 (1984). [Google Scholar]
  82. R.M. Dudley, Universal Donsker classes and metric entropy. Ann. Probab. 15 (1987) 1306–1326. [CrossRef] [MathSciNet] [Google Scholar]
  83. R.M. Dudley, Uniform Central Limit Theorems. Cambridge University Press, Cambridge (1999). [Google Scholar]
  84. R.M. Dudley, E. Giné and J. Zinn, Uniform and universal Glivenko-Cantelli classes. J. Theoret. Probab. 4 (1991) 485–510. [CrossRef] [MathSciNet] [Google Scholar]
  85. B. Efron, Bootstrap methods: another look at the jackknife. Ann. Statist. 7 (1979) 1–26. [CrossRef] [MathSciNet] [Google Scholar]
  86. B. Efron, The jackknife, the bootstrap, and other resampling plans. SIAM, Philadelphia (1982). [Google Scholar]
  87. B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap. Chapman and Hall, New York (1994). [Google Scholar]
  88. A. Ehrenfeucht, D. Haussler, M. Kearns and L. Valiant, A general lower bound on the number of examples needed for learning. Inform. Comput. 82 (1989) 247–261. [CrossRef] [Google Scholar]
  89. T. Evgeniou, M. Pontil and T. Poggio, Regularization networks and support vector machines, in Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett B. Schölkopf and D. Schuurmans, Eds., Cambridge, MA, MIT Press. (2000) 171–203. [Google Scholar]
  90. P. Frankl, On the trace of finite sets. J. Combin. Theory, Ser. A 34 (1983) 41–45. [Google Scholar]
  91. Y. Freund, Boosting a weak learning algorithm by majority. Inform. Comput. 121 (1995) 256–285. [Google Scholar]
  92. Y. Freund, Self bounding learning algorithms, in Proceedings of the 11th Annual Conference on Computational Learning Theory (1998) 127–135. [Google Scholar]
  93. Y. Freund, Y. Mansour and R.E. Schapire, Generalization bounds for averaged classifiers (how to be a Bayesian without believing). Ann. Statist. (2004). [Google Scholar]
  94. Y. Freund and R. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55 (1997) 119–139. [CrossRef] [Google Scholar]
  95. J. Friedman, T. Hastie and R. Tibshirani, Additive logistic regression: a statistical view of boosting. Ann. Statist. 28 (2000) 337–374. [Google Scholar]
  96. M. Fromont, Some problems related to model selection: adaptive tests and bootstrap calibration of penalties. Thèse de doctorat, Université Paris-Sud (December 2003). [Google Scholar]
  97. K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, New York (1972). [Google Scholar]
  98. E. Giné, Empirical processes and applications: an overview. Bernoulli 2 (1996) 1–28. [CrossRef] [MathSciNet] [Google Scholar]
  99. E. Giné and J. Zinn, Some limit theorems for empirical processes. Ann. Probab. 12 (1984) 929–989. [CrossRef] [MathSciNet] [Google Scholar]
  100. E. Giné, Lectures on some aspects of the bootstrap, in Lectures on probability theory and statistics (Saint-Flour, 1996). Lect. Notes Math. 1665 (1997) 37–151. [CrossRef] [Google Scholar]
  101. P. Goldberg and M. Jerrum, Bounding the Vapnik-Chervonenkis dimension of concept classes parametrized by real numbers. Machine Learning 18 (1995) 131–148. [Google Scholar]
  102. U. Grenander, Abstract inference. John Wiley & Sons Inc., New York (1981). [Google Scholar]
  103. P. Hall, Large sample optimality of least squares cross-validation in density estimation. Ann. Statist. 11 (1983) 1156–1174. [MathSciNet] [Google Scholar]
  104. T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning. Springer Series in Statistics. Springer-Verlag, New York (2001). [Google Scholar]
  105. D. Haussler, Decision theoretic generalizations of the pac model for neural nets and other learning applications. Inform. Comput. 100 (1992) 78–150. [CrossRef] [Google Scholar]
  106. D. Haussler, Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. J. Combin. Theory, Ser. A 69 (1995) 217–232. [Google Scholar]
  107. D. Haussler, N. Littlestone and M. Warmuth, Predicting {0,1} functions from randomly drawn points, in Proc. of the 29th IEEE Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA (1988) 100–109. [Google Scholar]
  108. R. Herbrich and R.C. Williamson, Algorithmic luckiness. J. Machine Learning Res. 3 (2003) 175–212. [CrossRef] [Google Scholar]
  109. W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 (1963) 13–30. [Google Scholar]
  110. P. Huber, The behavior of the maximum likelihood estimates under non-standard conditions, in Proc. Fifth Berkeley Symposium on Probability and Mathematical Statistics, Univ. California Press (1967) 221–233. [Google Scholar]
  111. W. Jiang, Process consistency for adaboost. Ann. Statist. 32 (2004) 13–29. [CrossRef] [MathSciNet] [Google Scholar]
  112. D.S. Johnson and F.P. Preparata, The densest hemisphere problem. Theoret. Comput. Sci. 6 (1978) 93–107. [Google Scholar]
  113. I. Johnstone, Function estimation and gaussian sequence models. Technical Report. Department of Statistics, Stanford University (2002). [Google Scholar]
  114. M. Karpinski and A. Macintyre, Polynomial bounds for vc dimension of sigmoidal and general pfaffian neural networks. J. Comput. Syst. Sci. 54 (1997). [Google Scholar]
  115. M. Kearns, Y. Mansour, A.Y. Ng and D. Ron, An experimental and theoretical comparison of model selection methods, in Proc. of the Eighth Annual ACM Workshop on Computational Learning Theory, Association for Computing Machinery, New York (1995) 21–30. [Google Scholar]
  116. M.J. Kearns and D. Ron, Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Comput. 11(6) (1999) 1427–1453. [Google Scholar]
  117. M.J. Kearns and U.V. Vazirani, An Introduction to Computational Learning Theory. MIT Press, Cambridge, Massachusetts (1994). [Google Scholar]
  118. A.G. Khovanskii, Fewnomials. Translations of Mathematical Monographs 88, American Mathematical Society (1991). [Google Scholar]
  119. J.C. Kieffer, Strongly consistent code-based identification and order estimation for constrained finite-state model classes. IEEE Trans. Inform. Theory 39 (1993) 893–902. [CrossRef] [MathSciNet] [Google Scholar]
  120. G.S. Kimeldorf and G. Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Statist. 41 (1970) 495–502. [CrossRef] [MathSciNet] [Google Scholar]
  121. P. Koiran and E.D. Sontag, Neural networks with quadratic vc dimension. J. Comput. Syst. Sci. 54 (1997). [Google Scholar]
  122. A.N. Kolmogorov, On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR 114 (1957) 953–956. [MathSciNet] [Google Scholar]
  123. A.N. Kolmogorov and V.M. Tikhomirov, ε-entropy and ε-capacity of sets in functional spaces. Amer. Math. Soc. Transl., Ser. 2 17 (1961) 277–364. [Google Scholar]
  124. V. Koltchinskii, Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 (2001) 1902–1914. [Google Scholar]
  125. V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization. Manuscript (September 2003). [Google Scholar]
  126. V. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning, in High Dimensional Probability II, E. Giné, D.M. Mason and J.A. Wellner, Eds. (2000) 443–459. [Google Scholar]
  127. V. Koltchinskii and D. Panchenko, Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30 (2002). [Google Scholar]
  128. S. Kulkarni, G. Lugosi and S. Venkatesh, Learning pattern classification – a survey. IEEE Trans. Inform. Theory 44 (1998) 2178–2206. Information Theory: 1948–1998. Commemorative special issue. [CrossRef] [MathSciNet] [Google Scholar]
  129. S. Kutin and P. Niyogi, Almost-everywhere algorithmic stability and generalization error, in UAI-2002: Uncertainty in Artificial Intelligence (2002). [Google Scholar]
  130. J. Langford and M. Seeger, Bounds for averaging classifiers. CMU-CS 01-102, Carnegie Mellon University (2001). [Google Scholar]
  131. M. Ledoux, Isoperimetry and gaussian analysis in Lectures on Probability Theory and Statistics, P. Bernard Ed., École d'Été de Probabilités de St-Flour XXIV-1994 (1996) 165–294. [Google Scholar]
  132. M. Ledoux, On Talagrand's deviation inequalities for product measures. ESAIM: PS 1 (1997) 63–87. [Google Scholar]
  133. M. Ledoux and M. Talagrand, Probability in Banach Space. Springer-Verlag, New York (1991). [Google Scholar]
  134. W.S. Lee, P.L. Bartlett and R.C. Williamson, The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory 44 (1998) 1974–1980. [CrossRef] [MathSciNet] [Google Scholar]
  135. O.V. Lepskiĭ, E. Mammen and V.G. Spokoiny, Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25 (1997) 929–947. [Google Scholar]
  136. O.V. Lepskiĭ, A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatnost. i Primenen. 35 (1990) 459–470. [Google Scholar]
  137. O.V. Lepskiĭ, Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor. Veroyatnost. i Primenen. 36 (1991) 645–659. [Google Scholar]
  138. Y. Li, P.M. Long and A. Srinivasan, Improved bounds on the sample complexity of learning. J. Comput. Syst. Sci. 62 (2001) 516–527. [CrossRef] [Google Scholar]
  139. Y. Lin, A note on margin-based loss functions in classification. Technical Report 1029r, Department of Statistics, University Wisconsin, Madison (1999). [Google Scholar]
  140. Y. Lin, Some asymptotic properties of the support vector machine. Technical Report 1044r, Department of Statistics, University of Wisconsin, Madison (1999). [Google Scholar]
  141. Y. Lin, Support vector machines and the bayes rule in classification. Data Mining and Knowledge Discovery 6 (2002) 259–275. [CrossRef] [MathSciNet] [Google Scholar]
  142. F. Lozano, Model selection using Rademacher penalization, in Proceedings of the Second ICSC Symposia on Neural Computation (NC2000). ICSC Adademic Press (2000). [Google Scholar]
  143. M.J. Luczak and C. McDiarmid, Concentration for locally acting permutations. Discrete Math. 265 (2003) 159–171. [CrossRef] [MathSciNet] [Google Scholar]
  144. G. Lugosi, Pattern classification and learning theory, in Principles of Nonparametric Learning, L. Györfi Ed., Springer, Wien (2002) 5–62. [Google Scholar]
  145. G. Lugosi and A. Nobel, Adaptive model selection using empirical complexities. Ann. Statist. 27 (1999) 1830–1864. [CrossRef] [MathSciNet] [Google Scholar]
  146. G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods. Ann. Statist. 32 (2004) 30–55. [MathSciNet] [Google Scholar]
  147. G. Lugosi and M. Wegkamp, Complexity regularization via localized random penalties. Ann. Statist. 2 (2004) 1679–1697. [Google Scholar]
  148. G. Lugosi and K. Zeger, Concept learning using complexity regularization. IEEE Trans. Inform. Theory 42 (1996) 48–54. [CrossRef] [MathSciNet] [Google Scholar]
  149. A. Macintyre and E.D. Sontag, Finiteness results for sigmoidal “neural” networks, in Proc. of the 25th Annual ACM Symposium on the Theory of Computing, Association of Computing Machinery, New York (1993) 325–334. [Google Scholar]
  150. C.L. Mallows, Some comments on Cp. Technometrics 15 (1997) 661–675. [Google Scholar]
  151. E. Mammen and A. Tsybakov, Smooth discrimination analysis. Ann. Statist. 27(6) (1999) 1808–1829. [Google Scholar]
  152. S. Mannor and R. Meir, Weak learners and improved convergence rate in boosting, in Advances in Neural Information Processing Systems 13: Proc. NIPS'2000 (2001). [Google Scholar]
  153. S. Mannor, R. Meir and T. Zhang, The consistency of greedy algorithms for classification, in Proceedings of the 15th Annual Conference on Computational Learning Theory (2002). [Google Scholar]
  154. K. Marton, A simple proof of the blowing-up lemma. IEEE Trans. Inform. Theory 32 (1986) 445–446. [CrossRef] [MathSciNet] [Google Scholar]
  155. K. Marton, Bounding Formula -distance by informational divergence: a way to prove measure concentration. Ann. Probab. 24 (1996) 857–866. [CrossRef] [MathSciNet] [Google Scholar]
  156. K. Marton, A measure concentration inequality for contracting Markov chains. Geometric Functional Analysis 6 (1996) 556–571. Erratum: 7 (1997) 609–613. [Google Scholar]
  157. L. Mason, J. Baxter, P.L. Bartlett and M. Frean, Functional gradient techniques for combining hypotheses, in Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett, B. Schölkopf and D. Schuurmans Eds., MIT Press, Cambridge, MA (1999) 221–247. [Google Scholar]
  158. P. Massart, Optimal constants for Hoeffding type inequalities. Technical report, Mathematiques, Université de Paris-Sud, Report 98.86, 1998. [Google Scholar]
  159. P. Massart, About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 (2000) 863–884. [CrossRef] [MathSciNet] [Google Scholar]
  160. P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse IX (2000) 245–303. [Google Scholar]
  161. P. Massart, École d'Eté de Probabilité de Saint-Flour XXXIII, chapter Concentration inequalities and model selection, LNM. Springer-Verlag (2003). [Google Scholar]
  162. P. Massart and E. Nédélec, Risk bounds for statistical learning, Ann. Statist., to appear. [Google Scholar]
  163. D.A. McAllester, Some pac-Bayesian theorems, in Proc. of the 11th Annual Conference on Computational Learning Theory, ACM Press (1998) 230–234. [Google Scholar]
  164. D.A. McAllester, pac-Bayesian model averaging, in Proc. of the 12th Annual Conference on Computational Learning Theory. ACM Press (1999). [Google Scholar]
  165. D.A. McAllester, PAC-Bayesian stochastic model selection. Machine Learning 51 (2003) 5–21. [CrossRef] [Google Scholar]
  166. C. McDiarmid, On the method of bounded differences, in Surveys in Combinatorics 1989, Cambridge University Press, Cambridge (1989) 148–188. [Google Scholar]
  167. C. McDiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics, M. Habib, C. McDiarmid, J. Ramirez-Alfonsin and B. Reed Eds., Springer, New York (1998) 195–248. [Google Scholar]
  168. C. McDiarmid, Concentration for independent permutations. Combin. Probab. Comput. 2 (2002) 163–178. [Google Scholar]
  169. G.J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. John Wiley, New York (1992). [Google Scholar]
  170. S. Mendelson, Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 (2002) 1977–1991. [CrossRef] [MathSciNet] [Google Scholar]
  171. S. Mendelson, A few notes on statistical learning theory, in Advanced Lectures in Machine Learning. Lect. Notes Comput. Sci. 2600, S. Mendelson and A. Smola Eds., Springer (2003) 1–40. [Google Scholar]
  172. S. Mendelson and P. Philips, On the importance of “small” coordinate projections. J. Machine Learning Res. 5 (2004) 219–238. [Google Scholar]
  173. S. Mendelson and R. Vershynin, Entropy and the combinatorial dimension. Inventiones Mathematicae 152 (2003) 37–55. [CrossRef] [MathSciNet] [Google Scholar]
  174. V. Milman and G. Schechman, Asymptotic theory of finite-dimensional normed spaces, Springer-Verlag, New York (1986). [Google Scholar]
  175. B.K. Natarajan, Machine Learning: A Theoretical Approach, Morgan Kaufmann, San Mateo, CA (1991). [Google Scholar]
  176. D. Panchenko, A note on Talagrand's concentration inequality. Electron. Comm. Probab. 6 (2001). [Google Scholar]
  177. D. Panchenko, Some extensions of an inequality of Vapnik and Chervonenkis. Electron. Comm. Probab. 7 (2002). [Google Scholar]
  178. D. Panchenko, Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31 (2003) 2068–2081. [CrossRef] [MathSciNet] [Google Scholar]
  179. T. Poggio, S. Rifkin, S. Mukherjee and P. Niyogi, General conditions for predictivity in learning theory. Nature 428 (2004) 419–422. [CrossRef] [PubMed] [Google Scholar]
  180. D. Pollard, Convergence of Stochastic Processes, Springer-Verlag, New York (1984). [Google Scholar]
  181. D. Pollard, Uniform ratio limit theorems for empirical processes. Scand. J. Statist. 22 (1995) 271–278. [MathSciNet] [Google Scholar]
  182. W. Polonik, Measuring mass concentrations and estimating density contour clusters–an excess mass approach. Ann. Statist. 23(3) (1995) 855–881. [Google Scholar]
  183. E. Rio, Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Related Fields 119 (2001) 163–175. [CrossRef] [MathSciNet] [Google Scholar]
  184. E. Rio, Une inegalité de Bennett pour les maxima de processus empiriques, in Colloque en l'honneur de J. Bretagnolle, D. Dacunha-Castelle et I. Ibragimov, Annales de l'Institut Henri Poincaré (2001). [Google Scholar]
  185. B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press (1996). [Google Scholar]
  186. W.H. Rogers and T.J. Wagner, A finite sample distribution-free performance bound for local discrimination rules. Ann. Statist. 6 (1978) 506–514. [CrossRef] [MathSciNet] [Google Scholar]
  187. M. Rudelson, R. Vershynin, Combinatorics of random processes and sections of convex bodies. Ann. Math, to appear (2004). [Google Scholar]
  188. N. Sauer, On the density of families of sets. J. Combin. Theory, Ser A 13 (1972) 145–147. [Google Scholar]
  189. R.E. Schapire, The strength of weak learnability. Machine Learning 5 (1990) 197–227. [Google Scholar]
  190. R.E. Schapire, Y. Freund, P. Bartlett and W.S. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26 (1998) 1651–1686. [Google Scholar]
  191. B. Schölkopf and A. J. Smola, Learning with Kernels. MIT Press, Cambridge, MA (2002). [Google Scholar]
  192. D. Schuurmans, Characterizing rational versus exponential learning curves, in Computational Learning Theory: Second European Conference. EuroCOLT'95, Springer-Verlag (1995) 272–286. [Google Scholar]
  193. C. Scovel and I. Steinwart, Fast rates for support vector machines. Los Alamos National Laboratory Technical Report LA-UR 03-9117 (2003). [Google Scholar]
  194. M. Seeger, PAC-Bayesian generalisation error bounds for Gaussian process classification. J. Machine Learning Res. 3 (2002) 233–269. [CrossRef] [Google Scholar]
  195. J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson and M. Anthony, Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inform. Theory 44 (1998) 1926–1940. [Google Scholar]
  196. S. Shelah, A combinatorial problem: Stability and order for models and theories in infinitary languages. Pacific J. Mathematics 41 (1972) 247–261. [Google Scholar]
  197. G.R. Shorack and J. Wellner, Empirical Processes with Applications in Statistics. Wiley, New York (1986). [Google Scholar]
  198. H.U. Simon, General lower bounds on the number of examples needed for learning probabilistic concepts, in Proc. of the Sixth Annual ACM Conference on Computational Learning Theory, Association for Computing Machinery, New York (1993) 402–412. [Google Scholar]
  199. A.J. Smola, P.L. Bartlett, B. Schölkopf and D. Schuurmans Eds, Advances in Large Margin Classifiers. MIT Press, Cambridge, MA (2000). [Google Scholar]
  200. A.J. Smola, B. Schölkopf and K.-R. Müller, The connection between regularization operators and support vector kernels. Neural Networks 11 (1998) 637–649. [Google Scholar]
  201. D.F. Specht, Probabilistic neural networks and the polynomial Adaline as complementary techniques for classification. IEEE Trans. Neural Networks 1 (1990) 111–121. [CrossRef] [Google Scholar]
  202. J.M. Steele, Existence of submatrices with all possible columns. J. Combin. Theory, Ser. A 28 (1978) 84–88. [Google Scholar]
  203. I. Steinwart, On the influence of the kernel on the consistency of support vector machines. J. Machine Learning Res. (2001) 67–93. [Google Scholar]
  204. I. Steinwart, Consistency of support vector machines and other regularized kernel machines. IEEE Trans. Inform. Theory 51 (2005) 128–142. [CrossRef] [MathSciNet] [Google Scholar]
  205. I. Steinwart, Support vector machines are universally consistent. J. Complexity 18 (2002) 768–791. [CrossRef] [MathSciNet] [Google Scholar]
  206. I. Steinwart, On the optimal parameter choice in ν-support vector machines. IEEE Trans. Pattern Anal. Machine Intelligence 25 (2003) 1274–1284. [CrossRef] [Google Scholar]
  207. I. Steinwart, Sparseness of support vector machines. J. Machine Learning Res. 4 (2003) 1071–1105. [CrossRef] [Google Scholar]
  208. S.J. Szarek and M. Talagrand, On the convexified Sauer-Shelah theorem. J. Combin. Theory, Ser. B 69 (1997) 183–192. [Google Scholar]
  209. M. Talagrand, The Glivenko-Cantelli problem. Ann. Probab. 15 (1987) 837–870. [CrossRef] [MathSciNet] [Google Scholar]
  210. M. Talagrand, Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 (1994) 28–76. [Google Scholar]
  211. M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.E.S. 81 (1995) 73–205. [Google Scholar]
  212. M. Talagrand, The Glivenko-Cantelli problem, ten years later. J. Theoret. Probab. 9 (1996) 371–384. [CrossRef] [MathSciNet] [Google Scholar]
  213. M. Talagrand, Majorizing measures: the generic chaining. Ann. Probab. 24 (1996) 1049–1103. (Special Invited Paper). [CrossRef] [MathSciNet] [Google Scholar]
  214. M. Talagrand, New concentration inequalities in product spaces. Inventiones Mathematicae 126 (1996) 505–563. [Google Scholar]
  215. M. Talagrand, A new look at independence. Ann. Probab. 24 (1996) 1–34. (Special Invited Paper). [Google Scholar]
  216. M. Talagrand, Vapnik-Chervonenkis type conditions and uniform Donsker classes of functions. Ann. Probab. 31 (2003) 1565–1582. [CrossRef] [MathSciNet] [Google Scholar]
  217. M. Talagrand, The generic chaining: upper and lower bounds for stochastic processes. Springer-Verlag, New York (2005). [Google Scholar]
  218. A. Tsybakov, On nonparametric estimation of density level sets. Ann. Statist. 25 (1997) 948–969. [Google Scholar]
  219. A.B. Tsybakov, Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 (2004) 135–166. [Google Scholar]
  220. A.B. Tsybakov, Introduction à l'estimation non-paramétrique. Springer (2004). [Google Scholar]
  221. A. Tsybakov and S. van de Geer, Square root penalty: adaptation to the margin in classification and in edge estimation. Ann. Statist., to appear (2005). [Google Scholar]
  222. S. van de Geer, A new approach to least-squares estimation, with applications. Ann. Statist. 15 (1987) 587–602. [CrossRef] [MathSciNet] [Google Scholar]
  223. S. van de Geer, Estimating a regression function. Ann. Statist. 18 (1990) 907–924. [CrossRef] [MathSciNet] [Google Scholar]
  224. S. van de Geer, Empirical Processes in M-Estimation. Cambridge University Press, Cambridge, UK (2000). [Google Scholar]
  225. A.W. van der Vaart and J.A. Wellner, Weak convergence and empirical processes. Springer-Verlag, New York (1996). [Google Scholar]
  226. V. Vapnik and A. Lerner, Pattern recognition using generalized portrait method. Automat. Remote Control 24 (1963) 774–780. [Google Scholar]
  227. V.N. Vapnik, Estimation of Dependencies Based on Empirical Data. Springer-Verlag, New York (1982). [Google Scholar]
  228. V.N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995). [Google Scholar]
  229. V.N. Vapnik, Statistical Learning Theory. John Wiley, New York (1998). [Google Scholar]
  230. V.N. Vapnik and A.Ya. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264–280. [CrossRef] [Google Scholar]
  231. V.N. Vapnik and A.Ya. Chervonenkis, Theory of Pattern Recognition. Nauka, Moscow (1974). (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin (1979). [Google Scholar]
  232. V.N. Vapnik and A.Ya. Chervonenkis, Necessary and sufficient conditions for the uniform convergence of means to their expectations. Theory Probab. Appl. 26 (1981) 821–832. [Google Scholar]
  233. M. Vidyasagar, A Theory of Learning and Generalization. Springer, New York (1997). [Google Scholar]
  234. V. Vu, On the infeasibility of training neural networks with small mean squared error. IEEE Trans. Inform. Theory 44 (1998) 2892–2900. [CrossRef] [MathSciNet] [Google Scholar]
  235. M. Wegkamp, Model selection in nonparametric regression. Ann. Statist. 31(1) (2003) 252–273. [Google Scholar]
  236. R.S. Wenocur and R.M. Dudley, Some special Vapnik-Chervonenkis classes. Discrete Math. 33 (1981) 313–318. [CrossRef] [MathSciNet] [Google Scholar]
  237. Y. Yang, Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45(7) (1999) 2271–2284. [Google Scholar]
  238. Y. Yang, Minimax nonparametric classification. II. Model selection for adaptation. IEEE Trans. Inform. Theory 45(7) (1999) 2285–2292. [Google Scholar]
  239. Y. Yang, Adaptive estimation in pattern recognition by combining different procedures. Statistica Sinica 10 (2000) 1069–1089. [MathSciNet] [Google Scholar]
  240. V.V. Yurinskii, Exponential bounds for large deviations. Theory Probab. Appl. 19 (1974) 154–155. [Google Scholar]
  241. V.V. Yurinskii, Exponential inequalities for sums of random vectors. J. Multivariate Anal. 6 (1976) 473–499. [CrossRef] [MathSciNet] [Google Scholar]
  242. T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 (2004) 56–85. [CrossRef] [MathSciNet] [Google Scholar]
  243. D.-X. Zhou, Capacity of reproducing kernel spaces in learning theory. IEEE Trans. Inform. Theory 49 (2003) 1743–1752. [CrossRef] [MathSciNet] [Google Scholar]
