Compressed Empirical Measures (in finite dimensions)
Annales Henri Lebesgue, Volume 9 (2026), pp. 471-605

Metadata

Keywords Kernel methods ,  coresets ,  compressed empirical measure

Abstract

We study approaches for compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs). In this context, the empirical measure is contained within a natural convex set and can be approximated using convex optimization methods. Such an approximation gives rise to a coreset of data points. A key quantity that controls how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set. The bulk of our work is concerned with deriving high probability lower bounds on the size of such a ball under various conditions and in various settings: we show how conditions on the density of the data and the kernel function can be used to infer such lower bounds; we further develop an approach that uses a lower bound on the smallest eigenvalue of a covariance operator to provide lower bounds on the size of such a ball; we extend the approach to approximate covariance operators and we show how it can be used in the context of kernel ridge regression. We also derive compression guarantees when standard algorithms like the conditional gradient method are used and we discuss variations of such algorithms to improve the runtime of these standard algorithms. We conclude with a construction of an infinite dimensional RKHS for which the compression is poor, highlighting some of the difficulties one faces when trying to move to infinite dimensional RKHSs.


References

[AHPV05] Agarwal, Pankaj K.; Har-Peled, Sariel; Varadarajan, Kasturi R. Geometric approximation via coresets, Combinatorial and computational geometry (Mathematical Sciences Research Institute Publications), Volume 52, Cambridge University Press, 2005, pp. 1-30 | Zbl

[Aro50] Aronszajn, Nachman Theory of reproducing Kernels, Trans. Am. Math. Soc., Volume 68 (1950), pp. 337-404 | DOI | Zbl | MR

[BBDG19] Briol, François-Xavier; Barp, Alessandro; Duncan, Andrew B.; Girolami, Mark Statistical inference for generative models with maximum mean discrepancy (2019) | arXiv

[BBZ07] Blanchard, Gilles; Bousquet, Olivier; Zwald, Laurent Statistical properties of kernel principal component analysis, Mach. Learn., Volume 66 (2007) no. 2-3, pp. 259-294 | DOI | Zbl

[BCC + 22] Braun, Gábor; Carderera, Alejandro; Combettes, Cyrille W.; Hassani, Hamed; Karbasi, Amin; Mokhtari, Aryan; Pokutta, Sebastian Conditional gradient methods (2022) | arXiv | Zbl

[Bee93] Beer, Gerald Topologies on closed and closed convex sets, Mathematics and its Applications (Dordrecht), 268, Kluwer Academic Publishers, 1993 | Zbl | DOI | MR

[BHPI02] Bādoiu, Mihai; Har-Peled, Sariel; Indyk, Piotr Approximate clustering via coresets, Proceedings of the thirty-fourth annual ACM symposium on theory of computing (STOC 2002), Montreal, ACM Press, 2002, pp. 250-257 | Zbl | MR

[BLJO12] Bach, Francis; Lacoste-Julien, Simon; Obozinski, Guillaume On the equivalence between herding and conditional gradient algorithms, ICML 2012 International Conference on Machine Learning (2012) (https://hal.science/hal-00681128)

[BLM13] Boucheron, Stéphane; Lugosi, Gábor; Massart, Pascal Concentration inequalities: A nonasymptotic theory of independence, Oxford University Press, 2013 | Zbl | DOI | MR

[BM03] Bartlett, Peter L.; Mendelson, Shahar Rademacher and Gaussian complexities: risk bounds and structural results, J. Mach. Learn. Res., Volume 3 (2003) no. 3, pp. 463-482 | DOI | Zbl | MR

[BT04] Beck, Amir; Teboulle, Marc A conditional gradient method with linear rate of convergence for solving convex linear systems, Math. Methods Oper. Res., Volume 59 (2004) no. 2, pp. 235-247 | DOI | Zbl | MR

[CM09] Campbell, Stephen L.; Meyer, Carl D. Generalized inverses of linear transformations, Classics in Applied Mathematics, 56, Society for Industrial and Applied Mathematics, 2009 | DOI | Zbl | MR

[CWS10] Chen, Yutian; Welling, Max; Smola, Alexander Supersamples from kernel-herding, Proceedings of the Conference on Uncertainty in Artificial Intelligence (2010) (https://arxiv.org/abs/1203.3472)

[DF93] Defant, Andreas; Floret, Klaus Tensor norms and operator ideals, North-Holland Mathematics Studies, 176, North-Holland, 1993 | Zbl | MR

[DM21] Dwivedi, Raaz; Mackey, Lester Kernel thinning (2021) | arXiv | Zbl

[DU77] Diestel, Joseph; Uhl, J. Jerry jun. Vector measures, Mathematical Surveys, 15, American Mathematical Society, 1977 | Zbl | DOI | MR

[Dud14] Dudley, Richard M. Uniform central limit theorems, Cambridge Studies in Advanced Mathematics, 142, Cambridge University Press, 2014 | DOI | Zbl | MR

[Eng89] Engelking, Ryszard General topology, Sigma Series in Pure Mathematics, 6, Heldermann Verlag, 1989 | Zbl | MR

[Fre01] Fremlin, David H. Measure theory, Torres Fremlin, 2001

[FW56] Frank, Marguerite; Wolfe, Philip An algorithm for quadratic programming, Nav. Res. Logist. Q., Volume 3 (1956) no. 1-2, pp. 95-110 | DOI | MR

[FW95] Floyd, Sally; Warmuth, Manfred K. Sample compression, learnability, and the Vapnik–Chervonenkis dimension, Mach. Learn., Volume 21 (1995), pp. 269-304 | DOI

[GBR + 12] Gretton, Arthur; Borgwardt, Karsten M.; Rasch, Malte J.; Schölkopf, Bernhard; Smola, Alexander A Kernel two-sample test, J. Mach. Learn. Res., Volume 13 (2012), pp. 723-773 | Zbl | MR

[GN16] Giné, Evarist; Nickl, Richard Mathematical foundations of infinite-dimensional statistical models, Cambridge Series in Statistical and Probabilistic Mathematics, 40, Cambridge University Press, 2016 | DOI | Zbl | MR

[HCB16] Huggins, Jonathan H.; Campbell, Trevor; Broderick, Tamara Coresets for scalable bayesian logistic regression, Proceedings of the 30th International Conference on Neural Information Processing Systems (Advances in Neural Information Processing Systems), Volume 29, Curran Associates, Inc. (2016), pp. 4080-4088

[HS14] Harvey, Nick; Samadi, Samira Near-optimal herding, Proceedings of The 27th Conference on Learning Theory (Proceedings of Machine Learning Research), Volume 35, PMLR (2014), pp. 1165-1182

[Kol18] Koltchinskii, Vladimir Asymptotic efficiency in high-dimensional covariance estimation, Proceedings of the international congress of mathematicians 2018, ICM 2018. Volume IV. Invited lectures, World Scientific; Sociedade Brasileira de Matemática (2018), pp. 2903-2923 | DOI | Zbl | MR

[LLW23] Li, Jian; Liu, Yong; Wang, Weiping Optimal convergence rates for agnostic Nyström kernel learning, Proceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research), Volume 202, PMLR (2023), pp. 19811-19836

[LT91] Ledoux, Michel; Talagrand, Michel Probability in Banach spaces. Isoperimetry and processes, Classics in Mathematics, Springer, 1991 | DOI | Zbl | MR

[LW86] Littlestone, Nick; Warmuth, Manfred K. Relating data compression and learnability (1986) (Technical report)

[MFSS17] Muandet, Krikamol; Fukumizu, Kenji; Sriperumbudur, Bharath; Schölkopf, Bernhard Kernel mean embedding of distributions: A review and beyond, Found. Trends Mach. Learn., Volume 10 (2017) no. 1-2, pp. 1-144 | DOI | Zbl

[Mur90] Murphy, Gerard J. C * -algebras and operator theory, Academic Press Inc., 1990 | Zbl | MR

[Pin94] Pinelis, Iosif Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab., Volume 22 (1994) no. 4, pp. 1679-1706 | DOI | Zbl | MR

[Pin99] Pinelis, Iosif Correction : optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab., Volume 27 (1999) no. 4, p. 2119-2119 | DOI | MR

[PR16] Paulsen, Vern I.; Raghupathi, Mrinal An introduction to the theory of reproducing kernel Hilbert spaces, Cambridge Studies in Advanced Mathematics, 152, Cambridge University Press, 2016 | DOI | Zbl | MR

[PT20] Phillips, Jeff M.; Tai, Wai Ming Near-optimal coresets of kernel density estimates, Discrete Comput. Geom., Volume 63 (2020) no. 4, pp. 867-887 | DOI | Zbl | MR

[RCR15] Rudi, Alessandro; Camoriano, Raffaello; Rosasco, Lorenzo Less is more: Nyström computational regularization, Advances in Neural Information Processing Systems, Volume 28, Curran Associates, Inc. (2015), pp. 1657-1665

[Roc97] Rockafellar, R. Tyrrell Convex analysis, Princeton Mathematical Series, 28, Princeton University Press, 1997 | DOI | Zbl | MR

[RS72] Reed, Michael; Simon, Barry Methods of modern mathematical physics. 1: Functional analysis, Academic Press Inc., 1972 | Zbl | MR

[SC08] Steinwart, Ingo; Christmann, Andreas Support vector machines, Information Science and Statistics, Springer, 2008 | DOI | Zbl | MR

[Sim11] Simon, Barry Convexity: an analytic viewpoint, Cambridge Tracts in Mathematics, 187, Cambridge University Press, 2011 | DOI | Zbl | MR

[SSM97] Schölkopf, Bernhard; Smola, Alexander; Müller, Klaus-Robert Kernel principal component analysis, Artificial Neural Networks — ICANN’97 (Lecture Notes in Computer Science), Volume 1327, Springer (1997), pp. 583-588 | DOI

[SSRR20] Sterge, Nicholas; Sriperumbudur, Bharath; Rosasco, Lorenzo; Rudi, Alessandro Gain with no pain: efficiency of kernel-PCA by Nyström sampling, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Volume 108, PMLR (2020), pp. 3642-3652

[Tem11] Temlyakov, Vladimir N. Greedy approximation, Cambridge Monographs on Applied and Computational Mathematics, 20, Cambridge University Press, 2011 | DOI | Zbl | MR

[Wer02] Werner, Dirk Funktionalanalysis, Springer-Lehrbuch, Springer, 2002 | Zbl

[WS00] Williams, Christopher K. I.; Seeger, Matthias Using the Nyström method to speed up kernel machines, Proceedings of the 14th International Conference on Neural Information Processing Systems, MIT Press (2000), p. 661–667