Random polytopes and the wet part for arbitrary probability distributions

We examine how the measure and the number of vertices of the convex hull of a random sample of $n$ points from an arbitrary probability measure in $\mathbf{R}^d$ relate to the wet part of that measure. This extends the classical results of B\'ar\'any and Larman [1988] for the uniform distribution on a convex body. The lower bound of B\'ar\'any and Larman continues to hold in the general setting, but the upper bound must be relaxed by a factor of $\log n$. We show by an example that this is tight.


Introduction and Main Results
Let $K$ be a convex body (a compact convex set with non-empty interior) in $\mathbf{R}^d$, and let $X_n = \{x_1, \ldots, x_n\}$ be a random sample of $n$ uniform independent points from $K$. The set $P_n = \operatorname{conv} X_n$ is a random polytope in $K$. For $t \in [0, 1)$ we define the wet part $K_t$ of $K$:
$$K_t := \{x \in K : \text{there is a halfspace } h \text{ with } x \in h \text{ and } \operatorname{Vol}(h \cap K) \le t \operatorname{Vol} K\}.$$
The name "wet part" comes from the mental picture when $K$ is in $\mathbf{R}^3$ and contains water of volume $t \operatorname{Vol} K$. Bárány and Larman [2] proved that the measure of the wet part captures how well $P_n$ approximates $K$ in the following sense:
Theorem 1 ([2, Theorem 1]). There are constants $c$ and $N_0$ depending only on $d$ such that for every convex body $K$ in $\mathbf{R}^d$ and for every $n > N_0$,
$$\tfrac{1}{4} \operatorname{Vol} K_{1/n} \le E[\operatorname{Vol}(K \setminus P_n)] \le \operatorname{Vol} K_{c/n}.$$
By Efron's formula (see (2) below), this directly translates into bounds for the expected number of vertices of $P_n$; see Section 1.2.
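As a concrete illustration (not from the paper), one can estimate $E[\operatorname{Vol}(K \setminus P_n)]$ by Monte Carlo for $K$ the unit square; the hull and area routines below are standard (Andrew's monotone chain and the shoelace formula), and the sample sizes are arbitrary choices.

```python
import random

def cross(o, a, b):
    # z-component of the cross product (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    # Andrew's monotone chain; returns hull vertices in counterclockwise order.
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def build(seq):
        h = []
        for q in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], q) <= 0:
                h.pop()
            h.append(q)
        return h
    lower, upper = build(pts), build(pts[::-1])
    return lower[:-1] + upper[:-1]

def hull_area(h):
    # Shoelace formula for a polygon given in cyclic order.
    n = len(h)
    return abs(sum(h[i][0] * h[(i + 1) % n][1] - h[(i + 1) % n][0] * h[i][1]
                   for i in range(n))) / 2

def expected_defect(n, trials, rng):
    # Monte Carlo estimate of E[Vol(K \ P_n)] for K = [0,1]^2.
    total = 0.0
    for _ in range(trials):
        sample = [(rng.random(), rng.random()) for _ in range(n)]
        total += 1.0 - hull_area(convex_hull(sample))
    return total / trials

rng = random.Random(0)
defect_50 = expected_defect(50, 300, rng)
defect_500 = expected_defect(500, 300, rng)
```

For the square, the missed volume is known to shrink like $\log n / n$, which the decrease between the two estimates reflects.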
1.1. Results for general measures. The notions of random polytope and wet part extend to a general probability measure $\mu$ defined on the Borel sets of $\mathbf{R}^d$. The definition of a $\mu$-random polytope $P_n^\mu$ is clear: $X_n$ is a sample of $n$ random independent points chosen according to $\mu$, and $P_n^\mu = \operatorname{conv} X_n$. The wet part $W_t^\mu$ is defined as
$$W_t^\mu := \{x \in \mathbf{R}^d : \text{there is a halfspace } h \text{ with } x \in h \text{ and } \mu(h) \le t\}.$$
The $\mu$-measure of the wet part is denoted by $w^\mu(t) := \mu(W_t^\mu)$. Here is an extension of Theorem 1 to general measures:
Theorem 2. For any probability measure $\mu$ in $\mathbf{R}^d$ and $n \ge 2$,
$$\tfrac{1}{4}\, w^\mu(1/n) \le E[1 - \mu(P_n^\mu)] \le w^\mu\!\left(\tfrac{(d+2)\ln n}{n}\right) + \tfrac{\varepsilon_d(n)}{n},$$
where $\varepsilon_d(n) \to 0$ as $n \to +\infty$ and is independent of $\mu$.
A similar upper bound, albeit with worse constants, follows from a result of Vu [13, Lemma 4.2], which states that $P_n^\mu$ contains $\mathbf{R}^d \setminus W_{c \ln n/n}^\mu$ with high probability. Since a containment with high probability is usually stronger than an upper bound in expectation, one may have hoped that the $\log n/n$ in the upper bound of Theorem 2 could be reduced. Our main result shows that this is not possible, not even in the plane:
Theorem 3. There exists a probability measure $\nu$ on $\mathbf{R}^2$ such that
$$E[1 - \nu(P_n^\nu)] > \tfrac{1}{2}\, w^\nu\!\left(\tfrac{\log_2 n}{n}\right)$$
for infinitely many $n$.
The measure that we construct actually has compact support and can be embedded into $\mathbf{R}^d$ for any $d \ge 2$. It will be apparent from the proof that the same construction has the stronger property that for every constant $C > 0$, the inequality $E[1 - \nu(P_n^\nu)] > \frac{1}{2}\, w^\nu(C \log_2 n / n)$ holds for infinitely many values of $n$.

1.2.
Consequences for $f$-vectors. Let $f_0(P_n^\mu)$ denote the number of vertices of $P_n^\mu$. For non-atomic measures (measures where no single point has positive probability), Efron's formula [7] relates $E[f_0(P_n^\mu)]$ and $E[\mu(P_n^\mu)]$:
(1) $\quad E[f_0(P_n^\mu)] = \sum_{i=1}^n \Pr\left[x_i \notin \operatorname{conv}(X_n \setminus \{x_i\})\right] = n \Pr\left[x_n \notin P_{n-1}^\mu\right],$
that is,
(2) $\quad E[f_0(P_n^\mu)] = n\, E[1 - \mu(P_{n-1}^\mu)].$
For any measure, this still holds as an inequality:
(3) $\quad E[f_0(P_n^\mu)] \ge n\, E[1 - \mu(P_{n-1}^\mu)].$
The measure that is constructed in Theorem 3 is non-atomic. As a consequence, Theorems 2 and 3 give the following bounds for the number of vertices:
Theorem 4. (i) For any non-atomic probability measure $\mu$ in $\mathbf{R}^d$ and $n \ge 2$,
$$\tfrac{n}{e}\, w^\mu(1/n) \le E[f_0(P_n^\mu)] \le n\, w^\mu\!\left(\tfrac{(d+2)\ln n}{n}\right) + \varepsilon_d(n),$$
where $\varepsilon_d(n) \to 0$ as $n \to +\infty$ and is independent of $\mu$. (ii) There exists a non-atomic probability measure $\nu$ on $\mathbf{R}^2$ such that $E[f_0(P_n^\nu)] > \frac{n}{2}\, w^\nu\!\left(\frac{\log_2 n}{n}\right)$ for infinitely many $n$.
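As an illustrative sanity check of (2) (not from the paper), take $n = 4$ and $\mu$ uniform on the unit square: $f_0(P_4)$ is $4$ minus the number of sample points falling inside the triangle spanned by the other three, while $\mu(P_3)$ is the area of a random triangle, whose expectation is the classical value $11/144$.

```python
import random

def tri_area(a, b, c):
    # Shoelace area of triangle abc.
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2

def in_triangle(p, a, b, c):
    # p lies in the (closed) triangle abc iff the three signed areas agree in sign.
    d1 = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    d2 = (c[0] - b[0]) * (p[1] - b[1]) - (c[1] - b[1]) * (p[0] - b[0])
    d3 = (a[0] - c[0]) * (p[1] - c[1]) - (a[1] - c[1]) * (p[0] - c[0])
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

rng = random.Random(1)
trials = 100000
sum_f0 = 0        # accumulates f_0(P_4)
sum_missed = 0.0  # accumulates 1 - mu(P_3), i.e. 1 minus the triangle area

for _ in range(trials):
    pts = [(rng.random(), rng.random()) for _ in range(4)]
    others = lambda i: [pts[j] for j in range(4) if j != i]
    inside = sum(1 for i in range(4) if in_triangle(pts[i], *others(i)))
    sum_f0 += 4 - inside
    sum_missed += 1.0 - tri_area(pts[0], pts[1], pts[2])

lhs = sum_f0 / trials          # estimate of E[f_0(P_4)]
rhs = 4 * sum_missed / trials  # estimate of 4 * E[1 - mu(P_3)]
```

Both estimates should agree with $4(1 - 11/144) \approx 3.694$.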
Theorem 4 follows from Theorems 2 and 3, except that Efron's formula (2) induces a shift in indices, as it relates $f_0(P_n^\mu)$ to $\mu(P_{n-1}^\mu)$. This shift affects only the constant in the lower bound of Theorem 4(i), which goes from $\frac14$ to $\frac1e$; see Section 3.1.
The upper bound of Theorem 4(i) fails for general distributions. For instance, if $\mu$ is a discrete distribution on a finite set, then $w^\mu(t) = 0$ for any $t$ smaller than the smallest mass of a point, while $E[f_0(P_n^\mu)]$ does not tend to $0$; the upper bound therefore cannot hold as $n \to \infty$. Of course, in that case Inequality (3) is strict.
For convex bodies, the number $f_i(P_n)$ of $i$-dimensional faces of $P_n$ can also be controlled via the measure of the wet part, since Bárány [1] proved that $E[f_i(P_n)] = \Theta(n \operatorname{Vol} K_{1/n})$ for every $0 \le i \le d-1$. No similar generalization of Theorem 2 is possible. Indeed, consider a measure $\mu$ in $\mathbf{R}^4$ supported on two circles, one in the $(x_1, x_2)$-plane, the other in the $(x_3, x_4)$-plane, and uniform on each circle; every segment joining a sample point of one circle to a sample point of the other is an edge, so $P_n^\mu$ has $\Omega(n^2)$ edges almost surely.
Before we get to the proofs of Theorems 2 (Section 3) and 3 (Section 4), we discuss in Section 2 a key difference between the wet parts of convex bodies and of general measures.

Wet part: convex sets versus general measures
A key ingredient in the proof of the upper bound of Theorem 1 in [2] is that for a convex body $K$ in $\mathbf{R}^d$, the measure of the wet part $K_t$ cannot change too abruptly as a function of $t$: if $c \ge 1$, then
(4) $\quad \operatorname{Vol} K_{ct} \le c' \operatorname{Vol} K_t,$
where $c'$ is a constant that depends only on $c$ and $d$ [2, Theorem 7]. In particular, a multiplicative factor can be taken out of the volume parameter of the wet part, and the upper bound in Theorem 1 can be equivalently expressed as
(5) $\quad E[\operatorname{Vol}(K \setminus P_n)] \le c'' \operatorname{Vol} K_{1/n}.$
(This is in fact how the upper bound of Theorem 1 is actually formulated in [2, Theorem 1].) This alternative formulation shows immediately that the lower bound of Theorem 1 (and hence also of Theorem 2) cannot be improved by more than a constant factor.
2.1. Two circles and a sharp drop. The right inequality in (4) does not extend to general measures. An easy example showing this is the following "drop construction". It is a probability measure $\mu$ in the plane supported on two concentric circles, uniform on each of them, and with measure $p$ on the outer circle:
(6) $\quad \mu := (1-p)\,\sigma_1 + p\,\sigma_2,$
where $\sigma_1$ and $\sigma_2$ denote the uniform (normalized arc-length) probability measures on the inner and the outer circle, respectively. Let $\tau$ denote the measure of a halfplane externally tangent to the inner circle; remark that $\tau < p/2$. The measure $w^\mu(t)$ of the wet part drops at $t = \tau$: the outer circle is wet for every $t > 0$, while the inner circle is wet only for $t \ge \tau$, so that $w^\mu(t) = p$ for $0 < t < \tau$ and $w^\mu(t) = 1$ for $t \ge \tau$. We can make this drop arbitrarily sharp by choosing a small $p$. In particular, for any given $c'$, setting $p < 1/c'$ makes it impossible to fulfill the right inequality in (4) for $t < \tau < ct$.
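These quantities are easy to evaluate; the sketch below (with the hypothetical parameter choices $p = 1/100$ and radii $1$ and $2$, matching Figure 1) computes $\tau$ and the step function $w^\mu$.

```python
import math

def tau(p, r1, r2):
    # Measure of a halfplane tangent to the inner circle: it contains the arc
    # of the outer circle with half-angle arccos(r1/r2), i.e. a fraction
    # arccos(r1/r2)/pi of the outer circle, which carries total mass p.
    return p * math.acos(r1 / r2) / math.pi

def wet_measure(t, p, r1, r2):
    # w(t) for the two-circle measure: the outer circle is always wet (a
    # tangent halfplane has measure 0); the inner circle is wet iff t >= tau.
    return 1.0 if t >= tau(p, r1, r2) else p

p = 1 / 100
t_drop = tau(p, 1.0, 2.0)   # arccos(1/2) = pi/3, hence t_drop = p/3 = 1/300
```

Note that $t_{\text{drop}} = 1/300$ explains the location of the drop in Figure 1.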
This example also challenges Inequality (5). As shown in Figure 1 (top), the function $w^\mu(1/n)$ has a sharp drop, while $E[1 - \mu(P_n^\mu)]$ shifts from the higher to the lower branch of the step in a gradual way. For this construction, the straightforward extension of Theorem 1 would imply that $E[1 - \mu(P_n^\mu)]$ remains within a constant multiplicative factor of $w^\mu(1/n)$; thus $E[1 - \mu(P_n^\mu)]$ would have to follow the steep drop.
Figure 1. The quantities involved in Theorems 1–4 for the drop construction with $p = 1/100$, when the outer circle has twice the radius of the inner circle. Top: $E[1 - \mu(P_n^\mu)]$ and $w(1/n)$, the $x$-axis being a logarithmic scale. Bottom: $E[f_0(P_n^\mu)]$ and $n \cdot w(1/n)$ on a doubly-logarithmic scale.

2.2.
A drop for the number of vertices. The fact that $E[1 - \mu(P_n^\mu)]$ cannot drop too sharply is more easily seen by examining $E[f_0(P_n^\mu)]$. Since the measure defined in Equation (6) is non-atomic, Efron's formula (2) applies, so let us compare $E[f_0(P_n^\mu)]$ and $n \cdot w^\mu(1/n)$. As illustrated in Figure 1 (bottom), $n \cdot w^\mu(1/n)$ has a sawtooth shape with a sharp drop from $300$ to $3$ at $n = 300$, and $E[f_0(P_n^\mu)]$ does actually shift from the higher to the lower branch of the sawtooth in a gradual way.
The fact that $E[f_0(P_n^\mu)]$ can decrease is perhaps surprising at first sight, but this phenomenon is easy to explain. We pick random points one by one. As long as all points lie on the inner circle, $f_0(P_n^\mu) = n$. The first point to fall on the outer circle swallows a constant fraction of the points into the interior of $P_n^\mu$, while adding only a single new point on the convex hull, causing a big drop. This happens around $n \approx 1/p$. Again, the straightforward extension of Theorem 1 would imply that $E[f_0(P_n)]$ follows the steep drop. Yet, on average, a single additional point can reduce $f_0(P_n)$ by a factor of at most $1/2$. Hence, for $p$ small enough, the drop of $E[f_0(P_n)]$ cannot be as abrupt as the drop of $n \cdot w^\mu(1/n)$.
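This gradual transition is visible in a small simulation (an illustrative sketch, not the paper's computation): with $p = 1/100$ and radii $1$ and $2$, the expected number of hull vertices is much larger at $n = 150$ than at $n = 600$, even though $n$ has grown.

```python
import math, random

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    # Andrew's monotone chain; returns the list of hull vertices.
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def build(seq):
        h = []
        for q in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], q) <= 0:
                h.pop()
            h.append(q)
        return h
    lower, upper = build(pts), build(pts[::-1])
    return lower[:-1] + upper[:-1]

def sample_two_circles(n, p, rng):
    # Mass p on the outer circle (radius 2), mass 1 - p on the inner (radius 1).
    pts = []
    for _ in range(n):
        r = 2.0 if rng.random() < p else 1.0
        a = rng.random() * 2 * math.pi
        pts.append((r * math.cos(a), r * math.sin(a)))
    return pts

def mean_f0(n, p, trials, rng):
    # Monte Carlo estimate of E[f_0(P_n)].
    return sum(len(convex_hull(sample_two_circles(n, p, rng)))
               for _ in range(trials)) / trials

rng = random.Random(2)
f0_150 = mean_f0(150, 1 / 100, 50, rng)
f0_600 = mean_f0(600, 1 / 100, 50, rng)
```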

2.3.
A sequence of drops. We prove Theorem 3 in Section 4 by an explicit construction that sets up a sequence of such drops. The function $n \cdot w^\nu(1/n)$ reaches larger and larger peaks as $n$ increases, while dropping down more and more steeply between those peaks. Our proof of Theorem 3 will not actually refer to any drop or oscillating behavior. We will simply identify a sequence of values $n = n_1, n_2, \ldots$ for which $E[1 - \nu(P_n^\nu)]$ is larger than $\frac12\, w^\nu(\log_2 n / n)$.

2.4. Open questions.
It is an outstanding open problem whether a drop as exhibited by our two-circle construction can occur for the uniform selection from a convex body: can the expectation of the number of vertices of a random polytope decrease in such a setting? This is impossible in the plane [6] and for the three-dimensional ball [4], but open in general; see [5] and the discussion therein. Perhaps Theorem 1 remains valid for some restricted class of measures $\mu$, for instance, log-concave measures. One approach to circumvent the "impossibility result" of Theorem 3 would be to first extend (4) and establish that for every $c > 1$ there is a $c'$ such that for all $t > 0$,
$$w^\mu(ct) \le c'\, w^\mu(t).$$
The second step would be to derive from this property the extension of Theorem 1. We do not know whether either of these two steps is valid.
We can weaken the claim of Theorem 1 in a different way, while maintaining it for all measures. For example, it is plausible that the upper bound in the theorem holds for a subset of the integers $n$ of positive density. On the other hand, we do not know whether there is a measure for which the bound of Theorem 1 is valid only for a finite number of natural numbers $n$.

Proof of Theorem 2
Let µ be a probability measure in R d . For better readability we drop all superscripts µ.
3.1. Lower bound. The proof of the lower bound is similar to the one in the convex-body case. For every fixed point $x \in W_t$ there exists, by definition, a half-space $h$ with $x \in h$ and $\mu(h) \le t$. If $h \cap P_n$ is empty, then $x$ is not in $P_n$, and therefore, for $x \in W_t$,
(7) $\quad \Pr[x \notin P_n] \ge \Pr[X_n \cap h = \emptyset] = (1 - \mu(h))^n \ge (1-t)^n.$
Then, for any $t$,
$$E[1 - \mu(P_n)] = \int_{\mathbf{R}^d} \Pr[x \notin P_n]\, d\mu(x) \ge (1-t)^n\, w(t).$$
We choose $t = 1/n$. Since the sequence $(1 - \frac1n)^n$ is increasing, for $n \ge 2$ we get $E[1 - \mu(P_n)] \ge (1 - \frac1n)^n\, w(1/n) \ge \frac14\, w(1/n)$. To obtain the analogous lower bound of Theorem 4(i), we write
$$E[f_0(P_n)] = n\, E[1 - \mu(P_{n-1})] \ge n\, (1-t)^{n-1}\, w(t).$$
Again, choosing $t = 1/n$ yields the claimed lower bound, since the sequence $(1 - \frac1n)^{n-1}$ is now decreasing to $\frac1e$.
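The two elementary facts about these sequences can be checked numerically (a quick sketch):

```python
import math

# a_n = (1 - 1/n)^n increases from 1/4 (at n = 2) toward 1/e, while
# b_n = (1 - 1/n)^(n-1) decreases from 1/2 (at n = 2) toward 1/e.
ns = range(2, 2001)
a = [(1 - 1 / n) ** n for n in ns]
b = [(1 - 1 / n) ** (n - 1) for n in ns]
inv_e = 1 / math.e

a_increasing = all(x < y for x, y in zip(a, a[1:]))
b_decreasing = all(x > y for x, y in zip(b, b[1:]))
```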

3.2.
Floating bodies and $\varepsilon$-nets. Before we turn our attention to the upper bound, we point out a connection to $\varepsilon$-nets. Consider a probability space $(U, \mu)$ and a family $\mathcal H$ of measurable subsets of $U$. An $\varepsilon$-net for $(U, \mu, \mathcal H)$ is a subset $S \subseteq U$ that intersects every member of $\mathcal H$ of measure greater than $\varepsilon$. In the special case where $U = (\mathbf{R}^d, \mu)$ and $\mathcal H$ consists of all half-spaces, if a set $S$ is an $\varepsilon$-net, then the convex hull $P$ of $S$ contains $\mathbf{R}^d \setminus W_\varepsilon$. Indeed, assume that there exists a point $x$ in $\mathbf{R}^d \setminus W_\varepsilon$ and not in $P$. Consider a closed halfspace $h$ that contains $x$ and is disjoint from $P$. Since $x \notin W_\varepsilon$ we must have $\mu(h) > \varepsilon$; as $h$ misses $S$, the set $S$ cannot be an $\varepsilon$-net.
We call the region R d \ W ε the floating body of the measure µ with parameter ε, by analogy to the case of convex bodies. The relation between floating bodies and ε-nets was first observed by Van Vu, who used the ε-net Theorem to prove that P µ n contains R d \ W c log n/n with high probability [13, Lemma 4.2] (a fact previously established by Bárány and Dalla [3] when µ is the normalized Lebesgue measure on a convex body). This implies that, with high probability, 1−µ(P n ) ≤ w(c log n/n). The analysis we give in Section 3.3 refines Vu's analysis to sharpen the constant. Note that Theorem 3 shows that Vu's result is already asymptotically best possible.

3.3. Upper bound.
For $d = 1$, the proof of the upper bound is straightforward, and the bound may actually be improved. Indeed, we have $w(t) \le \min\{2t, 1\}$, since each of the two tails $\{x : \mu((-\infty, x]) \le t\}$ and $\{x : \mu([x, +\infty)) \le t\}$ has measure at most $t$; moreover $f_0(P_n) \le 2$, so Inequality (3) yields $E[1 - \mu(P_n)] \le \frac{2}{n+1}$. We will therefore assume $d \ge 2$.
We use a lower bound on the probability that a random sample from $U$ is an $\varepsilon$-net for $(U, \mu, \mathcal H)$. We define the shatter function (or growth function) of the family $\mathcal H$ as
$$\varphi_{\mathcal H}(m) := \max \{ |\{A \cap H : H \in \mathcal H\}| : A \subseteq U,\ |A| = m \}.$$
Lemma 5. Let $(U, \mu)$ be a probability space and $\mathcal H$ a family of measurable subsets of $U$. Let $X_s$ be a sample of $s$ random independent elements chosen according to $\mu$. For any integer $N > s$, the probability that $X_s$ is not an $\varepsilon$-net for $(U, \mu, \mathcal H)$ is at most
$$2\, \varphi_{\mathcal H}(N)\, \left(1 - \tfrac{s}{N}\right)^{(N-s)\varepsilon - 1}.$$
Lemma 5 is a quantitative refinement of a foundational result in learning theory [14, Theorem 2]. It is commonly used to prove that small $\varepsilon$-nets exist for range spaces of bounded Vapnik–Chervonenkis dimension [9]; see also [12, Theorem 3.1] or [11, Theorem 15.5]. For that application, it is sufficient to show that the probability of failure is less than 1; this works for $\varepsilon \approx d \ln n / n$ (with appropriate lower-order terms), where $d$ is the Vapnik–Chervonenkis dimension. In our proof, we will need a smaller failure probability, of order $o(1/n)$, and we will achieve this by setting $\varepsilon \approx (d+2) \ln n / n$. We will apply the lemma in the case where $U = \mathbf{R}^d$ and $\mathcal H$ is the set of halfspaces in $\mathbf{R}^d$. We mention that by increasing $\varepsilon$ more aggressively, the probability of failure can be made exponentially small.
For the family $\mathcal H$ of halfspaces in $\mathbf{R}^d$, we have the following sharp bound on the shatter function [8]:
$$\varphi_{\mathcal H}(n) = 2 \sum_{i=0}^{d} \binom{n-1}{i} \le 2 n^d.$$
The proof of the upper bound of Theorem 2 starts by remarking that for any $\varepsilon \in [0, 1]$ we have
$$E[1 - \mu(P_n)] = \int_{\mathbf{R}^d} \Pr[x \notin P_n]\, d\mu(x) \le w(\varepsilon) + \int_{\mathbf{R}^d \setminus W_\varepsilon} \Pr[x \notin P_n]\, d\mu(x) \le w(\varepsilon) + \Pr\left[\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n\right].$$
Here, the last inequality holds since the event $x \notin P_n$ trivially implies that $\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n$ when $x \in \mathbf{R}^d \setminus W_\varepsilon$. We now want to set $\varepsilon$ so that $\Pr[\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n]$ is at most $\varepsilon_d(n)/n$ with $\varepsilon_d(n) \to 0$ as $n \to \infty$. As shown in Section 3.2, the event $\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n$ implies that $X_n$ fails to be an $\varepsilon$-net. The probability can thus be bounded from above using Lemma 5 with $s = n$. Taking logarithms, for any $N > n$,
$$\ln \Pr\left[\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n\right] \le \ln 2 + \ln \varphi_{\mathcal H}(N) + ((N-n)\varepsilon - 1) \ln\left(1 - \tfrac{n}{N}\right).$$
We set $N = \lceil n \ln n \rceil$, so that
$$\ln \Pr\left[\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n\right] \le d \ln n + d \ln \ln n + ((N-n)\varepsilon - 1) \ln\left(1 - \tfrac{n}{N}\right) + \ln 2 + o(1).$$
We then set $\varepsilon = \delta \ln n / n$, with $\delta \approx d$ to be fine-tuned later. If $n$ is large enough, the factor $((N-n)\varepsilon - 1) \approx \delta \ln^2 n$ is nonnegative, and we can use the inequality $\ln(1-x) \le -x$ for $x \in [0, 1)$ to bound the last term:
$$((N-n)\varepsilon - 1) \ln\left(1 - \tfrac{n}{N}\right) \le -\left((N-n)\varepsilon - 1\right) \tfrac{n}{N} \le -\delta \ln n + \delta + \tfrac{1}{\ln n}.$$
Altogether, we get
$$\ln \Pr\left[\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n\right] \le -(\delta - d) \ln n + d \ln \ln n + O(1),$$
so that $\Pr[\mathbf{R}^d \setminus W_\varepsilon \not\subseteq P_n] \le \varepsilon_d(n)/n$ with $\varepsilon_d(n) = O\left((\ln n)^d\, n^{d+1-\delta}\right) \to 0$ as $n \to \infty$, provided $\delta > d + 1$. Setting $\delta = d + 2$ yields the claimed bound.
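The bookkeeping above can be traced numerically (an illustrative sketch for $d = 2$, using Cover's counting formula for the shatter function of halfspaces): with $\delta = d + 2$, the quantity $n \cdot \Pr[\text{failure}]$ indeed tends to $0$.

```python
import math

def phi_halfspaces(n, d):
    # Cover's count: halfspaces in R^d realize at most 2 * sum_{i<=d} C(n-1, i)
    # distinct subsets of an n-point set (attained in general position).
    return 2 * sum(math.comb(n - 1, i) for i in range(d + 1))

def log_failure_bound(n, d, delta):
    # Logarithm of the bound of Lemma 5 with s = n, N = ceil(n ln n),
    # eps = delta * ln(n) / n, and phi(N) bounded by 2 * N^d.
    N = math.ceil(n * math.log(n))
    eps = delta * math.log(n) / n
    return math.log(2) + d * math.log(N) + ((N - n) * eps - 1) * math.log(1 - n / N)

d = 2
# n * Pr[failure] should tend to 0 when delta = d + 2.
n_times_pr = [n * math.exp(log_failure_bound(n, d, d + 2)) for n in (100, 1000, 10000)]
```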

Proof of Theorem 3
In this section, logarithms are base 2. For better readability we drop the superscripts ν.

4.1. The construction.
The measure $\nu$ is supported on a sequence of concentric circles $C_1, C_2, \ldots$, where $C_i$ has radius $r_i = 1 - \frac{1}{i+1}$. On each $C_i$, $\nu$ is uniform, implying that $\nu$ is rotationally invariant. We let $D_i = \bigcup_{j \ge i} C_j$. For $i \ge 1$ we put
$$\nu(D_i) = s_i := 4 \cdot 2^{-2^i}$$
and remark that $\nu(\mathbf{R}^2) = s_1 = 1$, so $\nu$ is a probability measure. The sequence $(s_i)_{i \in \mathbf{N}}$ decreases very rapidly. The probabilities of the individual circles are
$$p_i := \nu(C_i) = s_i - s_{i+1}.$$
The inequality of Theorem 3 will be established along an infinite increasing sequence of values $n = n_1, n_2, \ldots$, with one value $n_i$ associated to each circle $C_i$. In Section 4.2, we examine the wet part and prove that $w\left(\frac{\log n_i}{n_i}\right) \le s_i$. We then want to establish the complementary bound $E[1 - \nu(P_{n_i})] > s_i/2$. Since $\nu$ is non-atomic, Efron's formula yields
$$E[1 - \nu(P_{n_i})] = \frac{E[f_0(P_{n_i+1})]}{n_i + 1},$$
and it suffices to establish that $E[f_0(P_{n_i+1})] > (n_i+1)\, s_i/2$. This is what we do in Section 4.3.
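A quick numerical sanity check of these parameters (an illustrative sketch):

```python
def s(i):
    # s_i = 4 * 2^(-2^i), the mass nu(D_i) of the circles C_i, C_{i+1}, ...
    return 4 * 2.0 ** -(2 ** i)

def p(i):
    # p_i = nu(C_i) = s_i - s_{i+1}.
    return s(i) - s(i + 1)

total_mass = s(1)  # nu(R^2) = s_1, should equal 1
# The circle masses telescope: sum_{j >= i} p_j = s_i (up to float underflow).
tail_err = [abs(sum(p(j) for j in range(i, 40)) - s(i)) for i in range(1, 8)]
# "Decreases very rapidly": s_{i+1} = s_i^2 / 4, a doubly exponential decay.
squares = all(s(i + 1) == s(i) ** 2 / 4 for i in range(1, 9))
```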

4.2.
The wet part. Let us again drop the superscript $\nu$. Let $h_i$ be a closed halfplane that has a single point in common with $C_i$, so that its bounding line is tangent to $C_i$, and put $\tau_i := \nu(h_i)$; this is the smallest measure of a halfplane containing a point of $C_i$. We have $\tau_1 > \tau_2 > \cdots$, and the circle $C_i$ is wet at level $t$ exactly when $t \ge \tau_i$. So, as $t$ decreases, $w(t)$ drops step by step, each step being from $s_i$ to $s_{i+1}$. In particular,
(8) $\quad w(t) \le s_{i+1} \quad \text{whenever } t < \tau_i.$
For $j > i$, the portion of $C_j$ contained in $h_i$ is an arc of angle $2 \arccos(r_i/r_j)$. Hence,
$$\tau_i = \frac{1}{\pi} \sum_{j > i} p_j \arccos(r_i/r_j).$$
We will bound the term $\arccos(r_i/r_j)$ from below by a more explicit expression in terms of $i$. To get rid of the arccos function, we use the fact that $\cos x \ge 1 - x^2/2$ for all $x \in \mathbf{R}$. We obtain, for $0 \le y \le 1$,
$$\arccos(y) \ge \sqrt{2(1-y)}.$$
Moreover, the ratio $r_i/r_j$ can be bounded as follows: for $j > i$,
$$\frac{r_i}{r_j} \le \frac{r_i}{r_{i+1}} = \frac{i(i+2)}{(i+1)^2} = 1 - \frac{1}{(i+1)^2}.$$
Thus we deduce that
$$\arccos(r_i/r_j) \ge \frac{\sqrt{2}}{i+1}.$$
We have established a lower bound on $\arccos(r_i/r_j)/\pi$, which is the fraction of a single circle $C_j$ that is contained in $h_i$. Hence, considering all circles $C_j$ with $j > i$ together, we get
$$\tau_i \ge \frac{\sqrt{2}}{\pi (i+1)} \sum_{j > i} p_j = \frac{\sqrt{2}\, s_{i+1}}{\pi (i+1)}.$$
We check that
$$\frac{\log n_i}{n_i} < \frac{\sqrt{2}\, s_i}{\pi i} \le \tau_{i-1}$$
for all $i \ge 4$. Using (8), this gives our desired bound: $w\left(\frac{\log n_i}{n_i}\right) \le s_i$ for all $i \ge 4$. With little effort, one can show that actually $w\left(\frac{\log n_i}{n_i}\right) = s_i$. One can also see that, for any $C > 0$, the condition $w\left(\frac{C \log n_i}{n_i}\right) \le s_i$ holds if $i$ is large enough, because the exponential factor $2^{-i}$ dominates any constant factor $C$ in the last chain of inequalities. This justifies the remark that we made after the statement of Theorem 3.
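The two elementary bounds used in this argument can be verified numerically (a quick sketch):

```python
import math

# From cos x >= 1 - x^2/2: arccos(y) >= sqrt(2(1 - y)) on [0, 1].
ys = [k / 1000 for k in range(1001)]
arccos_ok = all(math.acos(y) >= math.sqrt(2 * (1 - y)) - 1e-12 for y in ys)

def r(i):
    # r_i = 1 - 1/(i+1)
    return 1 - 1 / (i + 1)

# r_i / r_j <= r_i / r_{i+1} = 1 - 1/(i+1)^2 for j > i.
ratio_ok = all(abs(r(i) / r(i + 1) - (1 - 1 / (i + 1) ** 2)) < 1e-12
               for i in range(1, 100))
```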

4.3.
The random polytope. Assume now that $X_n$ is a set of $n$ points sampled independently from $\nu$. We intend to bound from below the expectation $E[f_0(\operatorname{conv} X_{n_i+1})]$. Observe that if no point of $X_n$ lies in $D_{i+1}$, then every point of $X_n \cap C_i$ is a vertex of $\operatorname{conv} X_n$; hence, for any $n \in \mathbf{N}$, one has
$$E[f_0(\operatorname{conv} X_n)] \ge n\, p_i\, (1 - s_{i+1})^{n-1}.$$
Intuitively, as $n$ varies in the range near $n_i$, many points of $X_n$ lie on $C_i$ and yet no point of $X_n$ lies in $D_{i+1}$. So $P_n$ has, in expectation, at least $n p_i \approx n s_i$ vertices. At the same time, the term $w(\log n/n)$ in the claimed lower bound drops to $s_i$. So the expected number of vertices is about $n s_i$, which is larger than $\frac12 n s_i = \frac{n}{2}\, w(\log n/n)$. Formally, we estimate the expected number of vertices when $n = n_i + 1$:
$$E[f_0(P_{n_i+1})] \ge (n_i+1)\, p_i\, (1 - s_{i+1})^{n_i} = (n_i+1)\, s_i \left[\frac{p_i}{s_i}\, (1 - s_{i+1})^{n_i}\right].$$
The last square bracket tends to $1$ as $i \to \infty$. In particular, it is larger than $\frac12$ for $i \ge 4$. This shows that for all $i \ge 4$,
$$E[f_0(P_{n_i+1})] > (n_i+1)\, s_i / 2.$$
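The behavior of the square bracket can be illustrated numerically; the growth rate $n_i \approx i \cdot 2^i / s_i$ used below is a hypothetical stand-in (an assumption made for illustration only, chosen so that $n_i\, s_{i+1} \to 0$), not the paper's exact choice of $n_i$.

```python
import math

def s(i):
    # s_i = 4 * 2^(-2^i)
    return 4 * 2.0 ** -(2 ** i)

def bracket(i, n):
    # (p_i / s_i) * (1 - s_{i+1})^n, the square bracket above,
    # evaluated in log-space for numerical stability.
    p_over_s = 1 - s(i + 1) / s(i)
    return p_over_s * math.exp(n * math.log1p(-s(i + 1)))

# Hypothetical growth rate n_i of order i * 2^i / s_i (assumption, see above).
brackets = [bracket(i, int(i * 2 ** i / s(i))) for i in range(2, 9)]
```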

4.4. Higher dimension.
We can embed the plane containing $\nu$ in $\mathbf{R}^d$ for $d \ge 3$. The analysis remains true, but the random polytope is of course flat with probability 1. To get a full-dimensional example, we can replace each circle by a $(d-1)$-dimensional sphere, all other parameters being kept identical: all spheres are centered at the same point, $C_i$ has radius $1 - \frac{1}{i+1}$, the measure is uniform on each $C_i$, and the measure of $\bigcup_{j \ge i} C_j$ is $4 \cdot 2^{-2^i}$. The analysis holds mutatis mutandis.
As another example, which does not require new calculations, we can combine $\nu$ with the uniform distribution on the edges of a regular $(d-2)$-dimensional simplex in the $(d-2)$-dimensional subspace orthogonal to the plane that contains the circles, mixing the two distributions in the ratio $50 : 50$.
In all our constructions, the measure is concentrated on lower-dimensional manifolds of R d , circles, spheres, or line segments. If a continuous distribution is desired, one can replace each circle in the plane by a narrow annulus and each sphere by a thin spherical shell, without changing the characteristic behaviour.

An alternative treatment of atomic measures
Even for measures with atoms, one can give a precise meaning to Efron's formula: the expression in (1) counts the expected number of convex hull vertices of $P_n$ that are unique in the sample $X_n$. From this, it is obvious that Efron's formula (2) is a lower bound on $E[f_0(P_n)]$, which is Inequality (3).
For dealing with atomic measures, there is an alternative possibility. The resulting statements involve different quantities than our original results, but they have the advantage of holding for every measure. We denote by $\bar f_0(X_n)$ the number of points of the sample $X_n$ that lie on the boundary of their convex hull $P_n$, counted with multiplicity in case of coincident points. We denote by $\mathring P_n$ the interior of $P_n$. Then a derivation analogous to (1)–(2) leads to the following variation of Efron's formula:
(9) $\quad E[\bar f_0(X_n)] = n\, E[1 - \mu(\mathring P_{n-1})].$
We emphasize that we mean the boundary and interior with respect to the ambient space $\mathbf{R}^d$, not the relative boundary or interior. Even for some non-atomic measures, this gives different results. Consider the uniform distribution on the boundary of an equilateral triangle. Then $E[\bar f_0(X_n)] = n$, while $E[f_0(P_n)] \le 6$. Accordingly, $E[\mu(\mathring P_n)] = 0$, while $E[\mu(P_n)]$ converges to 1.
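The triangle example is easy to verify by simulation (an illustrative sketch): no sample point on the boundary of the triangle can fall in the interior of the hull, and the hull has at most two vertices per side.

```python
import random

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    # Monotone chain with a small tolerance so that nearly collinear points
    # (samples on a common triangle side) are not reported as vertices.
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def build(seq):
        h = []
        for q in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], q) <= 1e-9:
                h.pop()
            h.append(q)
        return h
    lower, upper = build(pts), build(pts[::-1])
    return lower[:-1] + upper[:-1]

def strictly_inside(p, hull, eps=1e-12):
    # p is in the open interior iff it is strictly left of every CCW edge.
    m = len(hull)
    return all(cross(hull[k], hull[(k + 1) % m], p) > eps for k in range(m))

def sample_on_triangle(rng):
    # Uniform w.r.t. arc length on the boundary of an equilateral triangle.
    verts = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
    k = rng.randrange(3)
    t = rng.random()
    a, b = verts[k], verts[(k + 1) % 3]
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

rng = random.Random(3)
pts = [sample_on_triangle(rng) for _ in range(200)]
hull = convex_hull(pts)
num_interior = sum(strictly_inside(p, hull) for p in pts)
num_vertices = len(hull)
```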
We denote the closure of the wet part $W_t^\mu$ by $\bar W_t^\mu$ and its measure by $\bar w^\mu(t) := \mu(\bar W_t^\mu)$. With these concepts, we can prove the following analog of Theorems 2–4:
Theorem 6. (i) For any probability measure $\mu$ in $\mathbf{R}^d$ and $n \ge 2$,
(10) $\quad \tfrac14\, \bar w^\mu(1/n) \le E[1 - \mu(\mathring P_n^\mu)] \le \bar w^\mu\!\left(\tfrac{(d+2)\ln n}{n}\right) + \tfrac{\varepsilon_d(n)}{n},$
where $\varepsilon_d(n) \to 0$ as $n \to +\infty$ and is independent of $\mu$. (ii) There exists a probability measure $\nu$ on $\mathbf{R}^2$ such that $E[1 - \nu(\mathring P_n^\nu)] > \frac12\, \bar w^\nu\left(\frac{\log_2 n}{n}\right)$ for infinitely many $n$.
Observe that for a measure $\mu$ with $\mu(H) = 0$ for every hyperplane $H$, the content of this theorem is the same as that of the previous ones.
Proof sketch. Since the derivation is parallel to the proofs in Sections 3-4, we only sketch a few crucial points.
(i) For proving the lower bound in (10), we modify the initial argument leading to (7): for every fixed $x \in \bar W_t$, there is a closed half-space $h$ with $x \in h$ whose corresponding open halfspace $\mathring h$ has measure $\mu(\mathring h) \le t$. Therefore,
$$\Pr[x \notin \mathring P_n] \ge \Pr[X_n \cap \mathring h = \emptyset] = (1 - \mu(\mathring h))^n \ge (1-t)^n.$$
The remainder of the proof can be adapted in a straightforward way. In Section 3.2, we have established that for an $\varepsilon$-net $S$, its convex hull $P$ contains $\mathbf{R}^d \setminus W_\varepsilon$. Since the interior operator is monotone, this implies that $\mathbf{R}^d \setminus \bar W_\varepsilon \subseteq \mathring P$. Therefore, the $\varepsilon$-net argument of Section 3.3 applies to the modified setting and establishes the upper bound in (10).
(ii) The lower-bound construction of Theorem 3 gives zero measure to every hyperplane, and therefore all quantities in part (ii) are equal to the corresponding quantities in Theorem 3 and Theorem 4(ii).