A Generalization of Hierarchical Exchangeability on Trees to Directed Acyclic Graphs

Motivated by problems in Bayesian nonparametrics and probabilistic programming discussed in Staton et al. (2018), we present a new kind of partial exchangeability for random arrays which we call DAG-exchangeability. In our setting, a given random array is indexed by certain subgraphs of a directed acyclic graph (DAG) of finite depth, where each nonterminal vertex has infinitely many outgoing edges. We prove a representation theorem for such arrays which generalizes the Aldous-Hoover representation theorem. In the case that the DAG is a finite collection of certain rooted trees, our arrays are hierarchically exchangeable in the sense of Austin and Panchenko (2014), and we recover the representation theorem proved by them. Additionally, our representation is fine-grained in the sense that representations at lower levels of the hierarchy are also available. This latter feature is important in applications to probabilistic programming, thus offering an improvement over the Austin-Panchenko representation even for hierarchical exchangeability.


Introduction
Motivated by issues arising in the study of spin glasses, in [AP14], Austin and Panchenko consider a random array indexed by the paths of a collection of infinitary rooted trees of finite depth, where each path starts from the root of a tree and ends at a leaf of the tree. A random array is hierarchically exchangeable, as defined in [AP14], if its joint distribution remains invariant under rearrangements that preserve the structure of each tree in the forest (collection of trees) underlying the index set (see Example 3.3(c) below for a more precise description). In their work, Austin and Panchenko prove that such tree-indexed arrays have a representation in the spirit of the celebrated Aldous-Hoover representation for exchangeable arrays of random variables. In the special case where all trees in the collection have a depth of one, i.e. are copies of N rooted at ∅ (see Figure 1), hierarchical exchangeability reduces to separate exchangeability, also known as row-column exchangeability. The number of trees in the collection corresponds to the dimension of the random array. We refer to [Kal05, Ch. 7] (see also [Ald85, Aus12]) for the definition of separate exchangeability and for additional background on what are now classic results in the theory of exchangeable random arrays.
In this work, motivated by exchangeable random processes used in Bayesian nonparametrics for modeling data and by problems related to probabilistic programming, we consider a type of partial exchangeability for random arrays indexed by certain subgraphs of a directed acyclic graph (DAG) of finite depth, where each nonterminal vertex has infinitely many outgoing edges. In analogy to [AP14], we say that a random array indexed by such subgraphs of a DAG is DAG-exchangeable if its joint distribution remains invariant under rearrangements that preserve the structure of the DAG; a precise description is given later.
Our main result proves a representation theorem for such arrays which generalizes the Aldous-Hoover representation theorem. In the case that the DAG is a collection of certain rooted trees, our arrays are hierarchically exchangeable, and we recover the representation theorem proved in [AP14]. Our representation is fine-grained in the sense that we are immediately able to also obtain representations for sub-DAGs which are closed with respect to a partial order that we discuss later. In the special case of hierarchical exchangeability, this allows us to represent arrays corresponding to higher levels of the hierarchy.
There are many reasons for considering such generalized random arrays as we do here. On a purely mathematical level, the study of various forms of partial exchangeability has been vibrant since the 1980s, and two somewhat recent surveys of the early results in this field are given in [Aus08] and [Ald09]. Indeed, in the foundational work [Hoo79], the question of when representations arise for partially exchangeable random arrays is posed in Section 7. On the level of applications, we provide a summary of our motivations with respect to Bayesian nonparametric models and probabilistic programming in the next section; a forthcoming companion paper explains these applications in much more detail.
Following the applications presented in the next section, the rest of the paper is organized as follows. In Section 3, we introduce the precise setup within which we work, present some examples from the theory which motivate our main result, and finally define our notion of partial exchangeability. The main result is presented in Section 4, while its proof comprises Section 5. In the appendix we indicate an alternate route to proving our representation, without the fine-graining discussed above. This alternate method is model-theoretic and is based on the work of [CT17].
Since the paper is notationally heavy, we offer here a printable notation guide which may be a useful reference for readers. In terms of applications, our motivation comes from studying generative models of array-like structures through probabilistic programming languages¹. These are high-level languages for statistical modeling that come equipped with a separate Bayesian inference engine (such as MCMC simulation).
In that context, the first attempt at modeling using infinite exchangeable arrays would be simply to use the Aldous-Hoover representation. For a 2-dimensional array, this kind of programming involves the following basic operations:
• find a fresh array (formally, draw a sample from a uniform distribution);
• pick a fresh row in a given array (formally, draw a sample from a uniform distribution);
• pick a fresh column in a given array (formally, draw a sample from a uniform distribution);
• enquire as to the contents of an array in a given row and column (use the representing function together with another uniform sample).
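The four operations above can be sketched as follows. This is a minimal toy implementation of ours (not from any library), with an arbitrary illustrative choice of representing function f; the class and method names are our own.

```python
import random

# A 2-dimensional exchangeable array in Aldous-Hoover form
#   X[i, j] = f(u, u_row[i], u_col[j], u_cell[i, j]),
# where f is a fixed measurable "representing function" (a toy choice here).

def f(u, u_row, u_col, u_cell):
    # toy representing function: a Bernoulli entry whose bias mixes the
    # global, row, and column uniforms
    p = (u + u_row + u_col) / 3.0
    return 1 if u_cell < p else 0

class ExchangeableArray:
    def __init__(self):
        self.u = random.random()          # find a fresh array
        self.rows, self.cols = [], []
        self.cells = {}

    def fresh_row(self):                  # pick a fresh row
        self.rows.append(random.random())
        return len(self.rows) - 1

    def fresh_col(self):                  # pick a fresh column
        self.cols.append(random.random())
        return len(self.cols) - 1

    def entry(self, i, j):                # enquire as to the contents
        if (i, j) not in self.cells:      # one uniform per cell, drawn lazily
            self.cells[(i, j)] = random.random()
        return f(self.u, self.rows[i], self.cols[j], self.cells[(i, j)])
```

Querying the same cell twice returns the same value, and the joint law of the entries is invariant under permuting the chosen rows or columns.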
However, an Aldous-Hoover representation might not be efficient or even computable [AAF+, FR12]. In many cases it is preferable to use an implementation that appears to have the same interface but which, internally, builds up the array lazily (on-the-fly). This is analogous to using a Pólya urn to simulate the Beta-Bernoulli relationship without ever directly sampling from a Beta distribution. Sampling from a distribution satisfies a property known in computer science as 'dataflow': operations can be freely reordered as long as the flow of data is respected. Any computer implementation, however lazy, and whether based on urns, stick-breaking or so forth, will still satisfy the dataflow property. For random arrays, the dataflow property coincides with the statistical property of exchangeability [SYA+17, SSY+18].
From this programming perspective it is natural and easy to design more elaborate generative models. For a simple extension, consider an array where each cell in the array itself contains an infinite array. This would amount to the following additional functionality:
• pick a fresh subrow in a given cell (a given row and column) in a given array;
• pick a fresh subcolumn in a given cell in a given array;
• enquire as to the contents of an array in a given subrow and subcolumn of a given cell in a given array.

¹ At this stage, Church [GMR+08] and Anglican [WvdMM14, TvdMYW16] have some support for advanced Bayesian nonparametric models, through the XRP feature and the produce/absorb constructs for random processes, but we are really thinking of a next generation of probabilistic programming languages, e.g. [SYA+17], with proper module and library functionality.
If we also retain the idea of each cell containing a value, as well as containing another array, then this functionality is what we call 'fine-grained': although we can enquire based on all the indices (the array, row, column, subrow, and subcolumn) we may also enquire as to the value based on a subset of the indices (the array, row and column) alone.
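The fine-grained interface can be sketched in a few lines: uniforms are attached lazily to tagged multi-indices, and each query uses only the uniforms for the indices it mentions. The names (`value`, `sub_entry`) and the combining functions are illustrative assumptions of ours, not taken from the text.

```python
import random

uniforms = {}

def U(key):
    # one [0, 1]-uniform per tagged multi-index, drawn lazily on first use
    if key not in uniforms:
        uniforms[key] = random.random()
    return uniforms[key]

def value(a, r, c):
    # enquire as to the value using only the outer indices (array, row, column)
    return (U(('a', a)) + U(('r', a, r)) + U(('c', a, c))
            + U(('rc', a, r, c))) % 1.0

def sub_entry(a, r, c, sr, sc):
    # enquire using all five indices: the cell (a, r, c) also holds an array
    return (U(('rc', a, r, c)) + U(('sr', a, r, c, sr))
            + U(('sc', a, r, c, sc)) + U(('cell', a, r, c, sr, sc))) % 1.0
```

Note that `value` and `sub_entry` share the uniform attached to the cell (a, r, c), mirroring how a representation at a lower level of the hierarchy reuses the randomness of the levels above it.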
If such a complex generative model still has the dataflow property, i.e. is suitably exchangeable, then our theorem says that, apart from the computability issues, it could just as well have been implemented by sampling from uniform distributions. Thus dataflow in programming and exchangeability in statistics are intimately connected more generally.

Setup
In this section, we describe a formal setting for specifying random arrays indexed by certain subgraphs of a directed acyclic graph (DAG), and also for expressing the probabilistic symmetries that arise from the structure of the DAG.

Overview of Infinitary DAGs and Their Finite Presentations
To begin, we recall the index sets of four well-known exchangeable random arrays, and explain how their elements correspond to subgraphs of infinitary DAGs of finite depth; these subgraphs will serve as indices of the random arrays we will consider. Infinitary here means that each vertex, other than a terminal vertex, has infinitely many outgoing edges. For (a) an exchangeable sequence of de Finetti type, (b) a row-column exchangeable array and (c) a general ℓ-dimensional exchangeable array of Aldous-Hoover type, the index sets are respectively given by N, N² and N^ℓ. In the hierarchical-exchangeability setting of [AP14], case (c) is generalized so that each separately exchangeable 'dimension' of the array is indexed, not by numbers in N, but rather by sequences of numbers, i.e. elements of N^{r_i}. The index set in this case is (c') N^{r_1} × ⋯ × N^{r_ℓ}. All four index sets above are associated with infinitary DAGs, in fact forests or collections of trees, of finite depth. A given index in each of the four index sets then corresponds to a collection of leaves of the associated DAG (one leaf from each tree in the forest). While it is possible to also view indices of general DAGs as a collection of "terminal vertices", for complete generality it is much more convenient to associate a subgraph to this collection of leaves; this subgraph will just be a collection of paths (one path from each tree in the forest). For instance, the set N in (a) is associated with a tree of depth 1 whose root has infinitely many children, i.e. a single tree in Figure 1, and each index α ∈ N corresponds to a path from the root to the α-th leaf. The index set in (c') is associated with a DAG consisting of ℓ infinitary trees of depths r_1, …, r_ℓ, respectively. The element (v^{(1)}, …, v^{(ℓ)}), where v^{(i)} = (v^{(i)}_1, …, v^{(i)}_{r_i}) ∈ N^{r_i}, corresponds to a subgraph made out of ℓ paths, where the i-th path starts from the root of the i-th tree and repeatedly moves toward the leaves by taking the v^{(i)}_j-th child at step j until the path hits a leaf.

Figure 2: An Austin-Panchenko forest with ℓ trees.
The presence of such an associated DAG and the correspondence between indices and subgraphs is not accidental. There is a general method for constructing an index set by first building an infinitary DAG of finite depth. The four index sets that we discussed above can all be naturally constructed by this method.
Assume that a finite DAG G is given. This G depicts the skeleton of an infinitary DAG to be built. In case (a), the finite DAG G is just a single vertex v and no edges; in case (b), G is the DAG with two vertices, r (row) and c (column), and no edges; in case (c'), it is the DAG consisting of ℓ disjoint paths of lengths r_1, …, r_ℓ (see Figure 3).
Next, we generate an infinitary DAG G′ from G by making infinitely many copies of vertices of G and connecting these copies by edges in an appropriate manner which we now describe. For every vertex v of G, define the downset of v to be D_v = {u ∈ G : u ⪯ v}, and take as vertices of G′ the functions α : D_v → N for some v. The right way to understand such a vertex α is as one of the infinitely many copies of v that is assigned an identifier α. This identifier is then used to connect the various copied vertices with directed edges. In particular, there is a directed edge from α : D_v → N to β : D_w → N whenever v → w is a directed edge of G and β|_{D_v} = α. For each of (a), (b), (c) and (c') from above, the associated infinitary DAGs are constructed this way, and they turn out to be forests (see Figure 2). We will soon see examples where the infinitary DAGs are not forests (see Figure 6). Finally, we set the index set to be {α : α is a function from the vertex set of G to N}.
One should view this index set as a collection of subgraphs of G′ in the following way: for each α : G → N, the subgraph of G′ associated to α has vertex set {α|_{D_v} : v ∈ G} and edge set {(α|_{D_v}, α|_{D_w}) : v → w is an edge of G}.
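The construction just described can be rendered as a small toy script (our own, with an assumed edge-list encoding of G): D_v is computed from the edges, and the subgraph of G′ associated with α : G → N has one copied vertex α|_{D_v} per vertex v of G.

```python
def downset(edges, v):
    # D_v: all vertices u with a directed path u -> ... -> v, including v itself
    out = {v}
    for u, w in edges:
        if w == v:
            out |= downset(edges, u)
    return out

def subgraph_of_index(vertices, edges, alpha):
    # copied vertex for v: the restriction of alpha to D_v, as a frozenset
    copy = {v: frozenset((u, alpha[u]) for u in downset(edges, v))
            for v in vertices}
    return set(copy.values()), [(copy[v], copy[w]) for v, w in edges]

# case (b): two vertices r, c and no edges; alpha picks a row and a column
verts, es = subgraph_of_index({'r', 'c'}, [], {'r': 3, 'c': 5})
```

In case (b) the resulting subgraph has exactly two copied vertices (one row, one column) and no edges, matching the path-pair picture above.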

Formal Setting and Examples
We will often abuse notation by writing G when we are referring to the vertex set of G.
Recall that a finite DAG is the same thing as a finite partially ordered set (that is, a set with a binary relation that is reflexive, transitive, and anti-symmetric). In particular, for a given DAG, we use the partial order on its vertices: v ⪯ w whenever there is a path from v to w. This is the reflexive-transitive closure of its acyclic directed edge relation. Conversely, we can regard a finite partially ordered set as a DAG with a directed edge v → w if v ≺ w and there is no v′ such that v ≺ v′ ≺ w (here, ≺ denotes the strict order, i.e. ⪯ without equality).
We say that a subset W of a DAG G is (ancestrally) closed (with respect to the partial ordering) if whenever w′ ∈ W and w ⪯ w′ we have w ∈ W. Let us point out that in the context of hierarchical exchangeability, being "higher" in the hierarchy than v corresponds to ancestorship of v, or equivalently (and somewhat confusingly) being in the closure or the downset of v, i.e. satisfying w ⪯ v.
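Ancestral closedness is easy to check mechanically. The following toy code (ours, with an assumed edge-list encoding) tests the condition that w ⪯ w′ and w′ ∈ W imply w ∈ W, and enumerates all closed subsets of a small DAG by brute force.

```python
from itertools import combinations

def ancestors(edges, v):
    # all u with u ⪯ v, i.e. u = v or a directed path u -> ... -> v
    out = {v}
    for u, w in edges:
        if w == v:
            out |= ancestors(edges, u)
    return out

def is_closed(edges, W):
    return all(ancestors(edges, w) <= set(W) for w in W)

def closed_subsets(vertices, edges):
    vs = sorted(vertices)
    return [set(c) for k in range(len(vs) + 1)
            for c in combinations(vs, k) if is_closed(edges, c)]

# for the chain a -> b, the closed sets are {}, {a} and {a, b}; {b} is not closed
```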
Definition 3.1. Let C be a closed subset of some fixed finite DAG G. A C-type multi-index is a function α from the vertices in C to N. We write N^C for the set of all C-type multi-indices. If C ⊆ D for some closed D, then every D-type multi-index α ∈ N^D can be restricted to a C-type multi-index α|_C ∈ N^C.

Definition 3.2. Let X be a Borel space, C ⊆ G be closed, and C be a sequence of distinct closed sets. A C-type random array in X is a family X_C = (X_α : α ∈ N^C) of random variables, indexed by C-type multi-indices. Here each X_α is a random variable valued in X. A C-type random array collection in X is a sequence (X_C : C ∈ C) of random variable families where each X_C is a C-type random array.

Remarks.
1. We have generalized to collections of random arrays in order to obtain later, in our main result, a fine-grained representation which also allows for representations on sub-DAGs. As mentioned in the abstract, in the special case of hierarchical exchangeability, this allows for an improvement over the Austin-Panchenko representation which is important for applications.
2. We often do not mention the Borel space X , and just say C-type random array and/or C -type random array collection.
Example 3.3. Most discretely indexed stochastic processes can be viewed as G-type random arrays for some DAG G. We illustrate this perspective with basic examples from the literature matching those described in Section 3.1.
(a) The most common discrete-time stochastic processes are G-type random arrays where G is the graph with only a single vertex v and no edges. The index set N^G in this case is N^{{v}} ≃ N. Thus, these G-type random arrays are N-indexed families of random variables.
(b) The graph with two vertices r, c and no edges corresponds to infinite random matrices or random arrays indexed by N². In other words, the index set N^G is N^{{r,c}} ≃ N². Thus, a G-type index is a pair of two numbers, one denoting a row index and the other denoting a column index. These G-type random arrays have the form (X_{n,m} : n, m ∈ N) and are random matrices with countably many rows and columns.

Figure 3: The DAG for multi-path-indexed random arrays in [AP14]

(c') Let r_1, …, r_ℓ be nonnegative integers. Austin and Panchenko studied a stochastic process indexed by a tuple of paths over ℓ countably-branching trees that have heights r_1, …, r_ℓ, respectively [AP14]. Formally, this process is a family of random variables of the form (X_α : α ∈ N^{r_1} × ⋯ × N^{r_ℓ}). This stochastic process is a G-type array for the DAG G in Figure 3, so N^G ≃ N^{r_1} × ⋯ × N^{r_ℓ}. Thus, (X_α : α ∈ N^{r_1} × ⋯ × N^{r_ℓ}) is the same thing as a G-type array.
Of course, our formal setting is not limited to just recasting well-known exchangeable stochastic processes. Its recipe for defining multi-indices via a DAG makes it easy to define a random array with unusual multi-indices. Furthermore, by moving from random arrays to random array collections, we can express multiple random-variable families whose multi-index sets are related.
Example 3.4. In order to illustrate the generality of our setting, we present three instances of DAG-exchangeable arrays that we have not seen discussed previously in the exchangeability literature.
(i) Sequences of Random Matrices: The multi-index set for the DAG G in Figure 4 is N^{{s,r,c}}, which can be thought of as indexing an infinite sequence of matrices of the type found in Example 3.3. For α ∈ N^G, the number α(s) tells us which matrix to look at, while α(r) and α(c) tell us which row and column of the matrix to look at.

Figure 4: The DAG for sequences of random matrices

This example gives us an additional aid on how to view an infinitary DAG G′ associated with some finite G. For instance, simply using the method described in Section 3.1, the infinitary G′ is just some tree. However, per the discussion above, it is natural to identify the vertex (α(s), α(r)) of G′ with row α(r) of matrix α(s), and to identify the vertex (α(s), α(c)) of G′ with column α(c) of matrix α(s); such identifications help us to arrange/interpret the tree in an intuitive manner.
(ii) Random Block Matrices: The multi-index set for this DAG G is N^{{r_0,c_0,r_1,c_1}}, which can be understood as the set of indices of an infinite matrix each of whose entries is again an infinite matrix. For α ∈ N^G, the pair (α(r_0), α(c_0)) specifies the row and column of the outer matrix, and (α(r_1), α(c_1)) those of the nested matrix. Thus, in a random G-type array, each random variable X_α stores the value of the (α(r_1), α(c_1))-th entry of the nested matrix, which is itself stored at the (α(r_0), α(c_0))-th entry of the outer nesting matrix. Let C = {r_0, c_0}, a closed subset of G. A small generalization of the random block matrix, alluded to in Section 2, is a random structure that is simultaneously a random matrix (Ex. 3.3(b)) and a random block matrix. This can be thought of as a random matrix where each cell contains both a value in X and another random matrix. It comprises both a C-type random array and a G-type random array. In other words, it is a (C, G)-type random array collection.
(iii) Random Walls: Here is another example of a random array collection where C is different from {G}. Consider the graph G consisting of three vertices x, y, z and no edges. Let C be the sequence of closed sets C_xy = {x, y}, C_yz = {y, z} and C_zx = {z, x}. A C-type random array collection consists of three random variable families, namely X_{C_xy}, X_{C_yz} and X_{C_zx} (see Figure 7). These families use different yet related multi-index sets, N^{{x,y}}, N^{{y,z}} and N^{{z,x}}, respectively. A good way to understand this array collection is to imagine a 3-dimensional grid at points in N^{{x,y,z}}. The collection associates a random variable to each point in the xy, yz and zx planes with the respective missing coordinate set to 1. Viewing the tuple (X_{C_xy}, X_{C_yz}, X_{C_zx}) in this way, rather than just as three 2-dimensional random arrays, makes it easy to state and study symmetries which involve all three families, as we explain soon.
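The wall indexing can be pictured concretely: a point α ∈ N^{{x,y,z}} projects onto the three coordinate planes, and each projection is a multi-index for one of the three families. A tiny sketch (the helper name `wall_indices` is ours, purely illustrative):

```python
PLANES = [('x', 'y'), ('y', 'z'), ('z', 'x')]

def wall_indices(alpha):
    # restrict alpha to each of the closed sets C_xy, C_yz, C_zx
    return {plane: {v: alpha[v] for v in plane} for plane in PLANES}

alpha = {'x': 2, 'y': 7, 'z': 4}
walls = wall_indices(alpha)
# the xy-wall index of alpha forgets the z coordinate
```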
By definition, a G-automorphism τ induces a bijection on the C-type multi-indices β ∈ N^C for any closed C ⊆ G, given by τ(α|_C) = τ(α)|_C for α ∈ N^G. Slightly abusing notation, we reuse τ to denote this induced map.
Also, a bijection τ : N^G → N^G is a G-automorphism if it satisfies the condition in (1) for all α, β ∈ N^G.

Definition 3.6. Let C be a closed subset of some fixed finite DAG G. We say that a C-type random array X_C is DAG-exchangeable if for every G-automorphism τ, the arrays (X_α : α ∈ N^C) and (X_{τ(α)} : α ∈ N^C) have the same distribution.

DAG-exchangeability generalizes several popular notions of exchangeability from the literature, if we choose G appropriately and set C to be the singleton sequence consisting of G. For instance, when the graph G consists of only one vertex and does not have any edges, DAG-exchangeability becomes the standard notion of exchangeability for random sequences in de Finetti's classic result. When G is the graph for the infinite random matrix of Example 3.3(b), DAG-exchangeability coincides with separate (row-column) exchangeability.
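For the edgeless (separately exchangeable) special case, the action of an automorphism and its compatibility with restriction can be checked in a few lines. Representing each vertex's relabelling by its own injection π_v : N → N is an assumption valid in this edgeless case only; the code is ours and purely illustrative.

```python
def make_automorphism(pis):
    # pis: one injection on N per vertex; act coordinatewise on multi-indices
    def tau(alpha):
        return {v: pis[v](n) for v, n in alpha.items()}
    return tau

def restrict(alpha, C):
    # the induced map on a restriction alpha|_C is tau applied coordinatewise
    return {v: alpha[v] for v in C}

tau = make_automorphism({'r': lambda n: n + 1, 'c': lambda n: 2 * n})
alpha = {'r': 3, 'c': 5}
# restricting then acting agrees with acting then restricting
same = tau(restrict(alpha, {'r'})) == restrict(tau(alpha), {'r'})
```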

Main Result
In this section we present our main result. Let G be a finite DAG, and recall that for each closed set C, N^C is the set of C-type multi-indices. To state our result, we denote the set of (ancestrally) closed subsets of C by A_C. Also, set I_G = ⋃_{C ∈ A_G} N^C, and let Dom : I_G → A_G be the map where Dom(α) is the domain of α for each multi-index α ∈ I_G, i.e. the set of vertices where α is defined.
Let us start with a simpler version of our main representation theorem; this version makes it easy to compare our result with the existing representation theorems in the literature.
Theorem 4.1. If a G-type random array X is DAG-exchangeable, then there exists a measurable function f such that (X_α : α ∈ N^G) has the same distribution as (f(U_{α|_C} : C ∈ A_G) : α ∈ N^G), where α|_C is the restriction of α to the vertices in C, and the U_β are independent [0, 1]-uniform random variables.
In the statement of the theorem, we removed one level of indirection in the indices of the random variables, and wrote X_α instead of the technically accurate X_{G,α}.
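A toy sampler shaped like the representation in Theorem 4.1 attaches one lazy uniform to each closed restriction α|_C, and computes the entry as a fixed function of those uniforms. The DAG (two incomparable vertices r, c) and the representing function f below are illustrative choices of ours.

```python
import random

uniforms = {}

def U(restriction):
    # one uniform per restriction alpha|_C, drawn lazily and cached
    key = frozenset(restriction)
    if key not in uniforms:
        uniforms[key] = random.random()
    return uniforms[key]

# A_G for the edgeless DAG on {r, c}: the empty set, {r}, {c}, and {r, c}
CLOSED = [(), ('r',), ('c',), ('r', 'c')]

def f(us):
    return sum(us) / len(us)              # toy measurable function

def X(alpha):
    # alpha: dict from vertices to N; one uniform per restriction alpha|_C
    return f([U((v, alpha[v]) for v in C) for C in CLOSED])
```

Entries X({'r': 1, 'c': 2}) and X({'r': 1, 'c': 9}) share the global uniform and the row uniform, so the sampled array exhibits exactly the dependence structure the representation dictates.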
We prove this theorem in two different ways, probabilistically and model-theoretically. The first, probabilistic proof in fact establishes the more refined representation theorem presented next, and employs the type of arguments found in the probability theory literature, such as [Kal05] and [AP14]. The second, model-theoretic proof uses a result of Crane and Towsner on the representation of relatively exchangeable random structures [CT17], and appears in Appendix A. Crane and Towsner's result is formulated and proved in a model-theoretic setting. A large part of our second proof consists of translating the graph-theoretic statement of Theorem 4.1 into a model-theoretic one, showing that the translated statement satisfies the conditions of Crane and Towsner's representation theorem, and checking that, translated back, their conclusion gives the claimed representation of our theorem.
One limitation of Theorem 4.1 is that it does not apply to C-type random array collections in general; it does so only when C consists of the single closed set G. For instance, it does not provide a representation for the last example (X_{C_xy}, X_{C_yz}, X_{C_zx}) of the previous section.
Our main result lifts this limitation and refines the representation of Theorem 4.1. We say that a G-automorphism τ of N^G fixes α ∈ I_G if τ(α) = α. Let C be a sequence of distinct closed sets of vertices. Let (X_C : C ∈ C) be a C-type random array collection, for which we often write simply (X_C). We can define F_α to be the sub-σ-field of σ(X_C) consisting of (X_C)-measurable events that are invariant under every α-fixing G-automorphism τ. One can think of F_α as containing the information about the symmetries in the arrays which fix the multi-index α. The above definition takes a little time to digest. We give another representation of this σ-field in (19) below, which may help the reader by giving an alternative perspective.
Since the elements of each X_C take values in a Borel space and each F_α is countably generated, we can choose some random array S = (S_α : α ∈ I_G) (after extending the underlying probability space if needed) so that
1. σ(S_α) = F_α for all α, and
2. the random array collection of pairs ((X_C, S_C) : C ∈ C) is DAG-exchangeable.
To actually construct such an S, for each C ∈ C, choose one α ∈ N^C and find a measurable f such that S_α = f(X_C) satisfies σ(S_α) = F_α; the remaining S_β for β ∈ N^C are then defined by applying G-automorphisms τ with τ(α) = β. The definition is independent of the choice of τ, and S constructed this way satisfies the desired properties. The array S contains all the information about the DAG symmetries in the arrays (X_C). Note that if β ∈ I_G is a restriction of α, then F_β ⊆ F_α, so the random variable S_β is σ(S_α)-measurable. For α ∈ N^G, we consider the sets {α|_C : C ∈ A_G} and {α|_C : C ∈ A_G, C ≠ G}: the former consists of all restrictions of the multi-index α, while the latter takes only the strict restrictions of α.
Theorem 4.2. Let C be a sequence of distinct closed sets, and let (X_C : C ∈ C) be a C-type random array collection. If (X_C : C ∈ C) is DAG-exchangeable, there exist a family of measurable functions (g_C : C ∈ A_G) and an array U of independent [0, 1]-uniform random variables giving the representation (8) of (S_α : α ∈ I_G).

The representation in the theorem gives rise to a representation of (X_C) in terms of U, which is similar to the one found in standard representation theorems for an exchangeable family of random variables. More concretely, we can use induction and convert g_{Dom(α)} to a function f_{Dom(α)} for each α; the key part of this inductive conversion is to set f_{Dom(α)} so that f applied to the uniforms for all restrictions of α reproduces g applied to the uniforms together with the previously converted values at the strict restrictions of α. Now note that X_{C,α} is measurable with respect to S_α by the choice of S_α. Furthermore, X_{C,α} takes values in a Borel space. Thus, there exists a measurable h_α such that X_{C,α} = h_α(S_α) almost surely. By the DAG-exchangeability of the collection of ordered pairs (X_C, S_C), we can pick h_α such that it depends only on Dom(α) and not on the value of α itself. This means that we can write X_{C,α} = h_{Dom(α)}(S_α) almost surely. By combining this observation and the representation for (S_α : α ∈ I_G) in (8), we obtain a representation of (X_C) for some family of measurable functions (h_C : C ∈ C). One key benefit of the representation in our theorem is that it tells us how the global information encoded in U gets used by the array S, and it also tells us what information is shared by two random variables S_α and S_β. As noted before, the array S captures the various restricted versions of partial exchangeability of (X_C). This fine-grained representation eventually allows us to prove Theorem 4.2 inductively.
Remark. Before getting into the proof of the main result, we recall a generic property of exchangeable structures. Whenever X = (X_n : n ∈ N) is exchangeable, by Kolmogorov's extension theorem, this is equivalent to the seemingly weaker condition that the distribution of X is invariant under the action of finite permutations (permutations fixing all but finitely many elements). In particular, if X is exchangeable, then X has the same distribution as (X_{τ(n)} : n ∈ N) for any injection τ : N → N. (In fact, Ryll-Nardzewski's theorem tells us the converse is also true.) We can extend this sort of argument to other random variables associated to the symmetries of X.
Let Y be X-measurable and let K be a subgroup of the infinite permutations. By definition, Y is invariant under the action of K if and only if (Y, X) has the same distribution as (Y, (X_{ρ(n)} : n ∈ N)) for any injection ρ on N such that its arbitrary restriction to finite sets can be extended to an element in K.
In the setup of this paper, N and K correspond to N^G and the group of G-automorphisms, respectively. Call a function τ between subsets of N^G a G-homomorphism if τ satisfies the condition in (1) for all α, β in its domain, instead of all of N^G. Then, our discussion so far implies that if any G-homomorphism between finite subsets of N^G can be extended to a G-automorphism on N^G, the distribution of the array is invariant under the action of G-homomorphic injections. This extendability property is called ultrahomogeneity. It appears in model-theoretic results on exchangeability (see for example [CT17]). We discuss this in the appendix, along with the proof that N^G and the group of G-automorphisms satisfy the ultrahomogeneity condition.

Proof of the Main Theorem
We will sometimes write (ξ_a)_{a∈I} to mean a family of random variables, and also refer to such a family as an array. We will also use the following notation for conditional distribution properties.
• ξ ⫫_F η (ξ and η are conditionally independent given F).
• ⫫_F (ξ_a)_{a∈I} (the family (ξ_a)_{a∈I} is conditionally independent given F).
The first lemma is a standard result from probability theory, whose proof we omit.
Lemma 5.1. Let (ξ_a, η_a)_{a∈I} be a multi-indexed family of random variables, and let F be a σ-field. Assume the following hold:

The next lemma, which is a simple application of the previous result, is used to synchronize representations using different functions.
Lemma 5.2. Let ξ_0, (ξ_a)_{a∈I} be random variables such that ξ_0 has the same distribution as each ξ_a, and let ζ = (ζ_a)_{a∈I} be a family of independent random variables, which are also independent from ξ_0, (ξ_a)_{a∈I}. Let (η_a)_{a∈I} be random variables such that for some Borel measurable functions φ_a, the following hold:

Our overall plan to prove the main result is to (i) obtain representations for closed proper subsets of G by using induction, and then (ii) glue everything into a joint distributional equality using conditional independence arguments for the S_α's. The above lemmas allow for basic gluing. More complicated gluing will follow from more sophisticated conditional independence arguments, of which our next key proposition is one example. We first need more terminology.
Let the set of all terminal vertices (vertices with no descendants) of G be denoted by T, and let t = |T|. Also, define G_0 = G ∖ T, the set of nonterminal vertices, and for each s = 1, …, t write v_s for the s-th terminal vertex. Clearly, we have G = G_0 ∪ {v_1, …, v_t}. The result (Proposition 5.3) may seem obvious at first glance, and it is indeed easy to prove for the case m = 1. However for m > 1, the joint conditional independence seems to be a rather subtle issue. We postpone the proof of this proposition to the next subsection. Using this result, we obtain the following corollary.
We immediately also obtain the following result.
Proposition 5.5. For k = 0, …, t, define the σ-fields G_k and H_k as follows. Then, given G_k, the set H_{k+1} is an independent family of σ-fields for all k < t.
By Corollary 5.4, we have the required conditional independence; we also have the analogous statement for G_{k∖A}, from which the result follows.
The following lemma is the final piece of the puzzle, joining all the representations obtained from the inductive hypothesis. For each J ⊆ I_G and generic array (ξ_α : α ∈ I_G), we will write ξ_J ≝ (ξ_α : α ∈ J).

Lemma 5.6. There exists a Borel measurable function f_G such that, for any array U of independent [0, 1]-uniform random variables, the stated representation holds, where S_J ≝ (S_β : β ∈ J) for each J ⊆ I_G.
The proof of this lemma is similar to that of Proposition 5.3. We postpone it for now in order to present the proof of the main theorem.
Proof of Theorem 4.2. Without loss of generality, we will assume that C is the set A_G with some fixed ordering. We use induction on the number of vertices of G. The n = 1 case is simply the de Finetti-Hewitt-Savage theorem (see, for example, Lemma 7.1 and Theorem 1.1 of [Kal05]). Now assume that the theorem holds for all DAGs with fewer than n vertices, and let |G| = n. Recall the notation T and t in (9) for, respectively, the set of terminal vertices and its cardinality, and the symbol G_0 in (10) for the set of nonterminal vertices. The case where |T| = 1 is obtained directly from Lemma 5.6 and the inductive hypothesis. In the rest of the proof, we assume that |T| = t > 1.
By the inductive hypothesis, there exist Borel functions (g C : C ∈ A G0 ) and (g s C : C ∈ A G0∪{vs} ) for s = 1, . . ., t, as well as an array of independent [0, 1]-uniform random variables, U, which we can assume to be independent from (S α : α ∈ I G ), such that and Now, set By repeatedly using the second equation of (13), we can express each S s α with α ∈ I G0∪{vs} in terms of the S s β 's and U γ 's with β ∈ I G0 and γ ∈ I G0∪{vs} \ I G0 .The resulting equations can be written as θ s = F s (ξ s , ζ s ) for an appropriate measurable F s .By Proposition 5.5, ⊥ ⊥ ξ (η s ) s .By construction, = ξ s are independent from (ζ s ) s , and (ξ, η s ) d = (ξ s , θ s ).Therefore, by Lemma 5.2, we have Thus we can join the representations given by ( 12) and ( 13) to obtain the following joint distributional equality where S 1 α = S ′ α if α ∈ I G0 and the rest of the S 1 α 's are defined by the recursive formulae This is a joint representation of in the case where k = 1.We will next induct on k to achieve an analogous joint representation for k = t − 1.Let k < t − 1 be fixed and assume that we have the following joint representation: Since we have assumed that we have a representation for every proper sub-DAG of G, then for any fixed B ⊆ T with |B| = k + 1 (note that k + 1 ≤ t − 1), we have some Borel measurable functions (g B C : Define the following arrays (to ease notation we do not use boldface for these): where Finally, by repeatedly using the second line of (16), we can express each S B α with α ∈ I G0∪B in terms of the S B β 's in ϕ B and the U γ 's in ψ B .Thus, for an appropriate F B , we have Also, by the same reasoning but with the equation in ( 15) instead of the one in ( 16), we get for some F .Now, by Lemma 5.2, and in particular Moreover, we have ⊥ ⊥ an independent family which is also independent from ϑ.Thus, a slight variation of Lemma 5.1 shows that Therefore, we have where S k+1 α = S k α for α ∈ I G0∪A , A ⊆ T, |A| = k and for other α ∈ N G0∪B , the 
random variable S^{k+1}_α is defined through the recursive formulae. Thus we have built a (k + 1)-version of (15). By induction on k, we obtain the following representation, which involves everything except for the S_α's for α ∈ N_G.

Proof of Proposition 5.3 and Lemma 5.6
Let us say that an automorphism τ is separate if, for all v ∈ G, there exists ... The term comes from the fact that an array is separately exchangeable if and only if its distribution is invariant under the action of every separate automorphism. Note that every DAG-exchangeable array X is automatically separately exchangeable. Therefore, for X, we may define F^sep_α to be the σ-field of all events which are invariant under the action of any separate automorphism that fixes some multi-index α ∈ N_H. We emphasize that F^sep_α is defined for any α ∈ N_H for an arbitrary subset H ⊆ G, which is not necessarily closed. Below we will use the letter H to denote arbitrary subsets of G.
The missing ingredient, common to the proofs of both Proposition 5.3 and Lemma 5.6, at least in the setting of separate exchangeability, is the following conditional independence result, which appears as Corollary 5.6 in the celebrated paper of Hoover [Hoo79].

Proposition 5.7. Define F^sep_α as above. Let I_1, I_2, I_3 ⊆ ∪_{H⊆G} N_H be such that, for all α_1 ∈ I_1,

Since the above result is essentially for separately exchangeable arrays, in order to apply it we must first establish a relationship between the σ-fields related to separate exchangeability and to DAG-exchangeability, respectively. The next two lemmas serve to establish that relationship.
Fix a sequence C of distinct elements of A_G and, for α ∈ N_G, define ... We can rewrite the definition of F_α for α ∈ I_G accordingly. Let us point out that an α-fixing automorphism τ fixes α|_C for C ∈ A_{Dom(β)}, but not for every arbitrary subset H. For a subset H ⊆ G, not necessarily closed, let H^o denote the largest subset of H which is closed in G. The closed graph H^o is well-defined since a union of closed subsets is again closed. The closure of a graph H is the smallest closed graph containing H.

Proof. Define an injection τ_k : N → N. Since ρ_1 fixes α, we have ρ_1(E) = E for all E ∈ F_α, and one can easily check that ρ^n_1(F) ∈ G^n_α for any event F. Thus, by letting ρ^n_1 act on X, we obtain F_α ⊆ G^n_α. Now we show the other direction. Consider an automorphism ρ that fixes α and is such that ρ(β)(v) = β(v) whenever there is u ⪯ v such that β(u) > n. We claim that G^n_α is invariant under the action of ρ. Note that, as we take n → ∞, such ρ's generate all the finite automorphisms fixing α.
Therefore, combined with the remark at the end of Section 4, this shows that ∩_{n≥1} G^n_α is invariant under any automorphism fixing α. Therefore, define ρ_2 by ρ_2(β)(v) = β(v) if β(u) = α(u) for all u ⪯ v, and ρ_2(β)(v) = β(v) + 1 otherwise. Then ρ_2 is an injection fixing α.
Claim. If H ⊆ Dom(α) and H is not a closed subset, then for all β such that β|_H = α|_H ...

We have ρ^n_2(E) = E for all E ∈ F_α. By the claim, and by Lemma 5.8, we immediately obtain the desired inclusion. Similarly to the proof of the previous lemma, one can show that G^n_{α^o} is fixed by any automorphism τ that fixes α^o and satisfies τ(β)(v) = β(v) whenever β(u) > n for some u ⪯ v. This shows that F^sep_α ⊆ F_{α^o}. Now we are ready to complete the proofs of our main results.
Proof of Proposition 5.3. By Lemma 5.8, together with Lemma 5.9, the claim follows.

Theorem A.1 (Crane, Towsner). Let M = (I, R_1, ..., R_n) be a countably infinite set I with equivalence relations R_k on it, and let X be a Borel space. Assume that
• M is an ultrahomogeneous structure, and ...
Then, given a family of X-valued random variables X = (X_α : α ∈ I), if the family is relatively exchangeable with respect to M (in short, M-exchangeable), there exists a measurable function f such that ... where the collection of all equivalence classes [α]_{R_k} of α with respect to the R_k's is partially ordered by set inclusion.

Remark. The original theorem in [CT18] has an additional condition that M satisfies the so-called ω-DAP condition up to the R_k's. In this paper, we consider only a special case of the theorem, and in that case this condition always holds. It is thus omitted in our presentation of the theorem.
Most of the boldfaced terms are concepts from model theory. In the rest of this subsection, we explain slightly simplified versions of their definitions. For the official definitions and detailed background on this terminology, see Crane and Towsner's papers [CT17, CT18].
A structure M of type n, for some natural number n, is a tuple (I, R_1, ..., R_n) of a set I and binary relations {R_k} on I. When another structure N = (J, S_1, ..., S_n) of the same type satisfies J ⊆ I and S_k ⊆ R_k for all k, we say that it is a substructure of M. A common way of generating a substructure is to restrict M to a subset J_0 of I. An embedding τ from a structure N = (J, S_1, ..., S_n) to a structure M = (I, R_1, ..., R_n) is a function τ : J → I such that τ is injective and satisfies the relation-preservation condition. Here [n] = {m ∈ N : 1 ≤ m ≤ n}. Note that an embedding from N to M implies that N is essentially the same as M|_{τ(J)}, and provides a sense in which N is a substructure of M modulo renaming of the elements of N. When the embedding is surjective and M = N, we call τ an automorphism.
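For concreteness, the embedding condition can be checked mechanically on finite structures. The following Python sketch is our own illustration, not part of the paper; the function name and the example structures are ours. It tests whether a map τ between two finite structures of the same type is injective and carries each S_k exactly onto R_k on the image:

```python
def is_embedding(tau, N_rels, M_rels):
    """tau: dict mapping J into I.  Checks that tau is injective and that,
    for each k, (a, b) is in S_k exactly when (tau(a), tau(b)) is in R_k."""
    if len(set(tau.values())) != len(tau):
        return False  # not injective
    return all(
        ((a, b) in S) == ((tau[a], tau[b]) in R)
        for S, R in zip(N_rels, M_rels)
        for a in tau for b in tau
    )

# M = ({0,...,4}, <) and the substructure N = ({0, 1}, {(0, 1)}).
less_than = {(x, y) for x in range(5) for y in range(5) if x < y}
print(is_embedding({0: 1, 1: 3}, [{(0, 1)}], [less_than]))  # order-preserving map
print(is_embedding({0: 3, 1: 1}, [{(0, 1)}], [less_than]))  # order-reversing map
```

The first map preserves the relation and is accepted; the second reverses it and is rejected, because an embedding must reflect relations exactly, not merely preserve them.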
Crane and Towsner used a structure M = (I, R_1, ..., R_n) with a countably infinite I to specify an index set for a family of random variables and also a symmetry property of that family. The index set is I itself. They say that a family of random variables X = (X_α : α ∈ I) with this index set is relatively exchangeable with respect to M, or M-exchangeable, if for all finite subsets J of I and embeddings τ, where τ(X) := (X_{τ(α)} : α ∈ I). Embeddings play the role of finite permutations of N in the standard notion of exchangeability for random sequences.
Nearly all of the remaining terminology in Theorem A.1 describes properties of a structure M = (I, R_1, ..., R_n). More specifically, these notions impose requirements on the R_k's, and in doing so they calibrate the strength of the M-exchangeability condition.
Definition A.2. The structure M is ultrahomogeneous if for all finite substructures N = (J, S_1, ..., S_n) of M and embeddings τ from N to M, there exists an automorphism υ on I that extends τ, i.e., υ|_J = τ.
A representative example of an ultrahomogeneous structure is (Q, <), the set of rational numbers with the usual less-than relation, while a representative counterexample is (Z, <), the set of integers with the less-than relation. The latter is not ultrahomogeneous because the function τ mapping 2 to 2 and 3 to 4 is an embedding from ({2, 3}, <) to (Z, <), but it cannot be extended to the required global function υ on Z: the lack of any integers strictly between 2 and 3 prevents the construction of such a υ. The structure (Q, <) is dense and does not suffer from this kind of problem. These examples highlight one intuition behind ultrahomogeneity: M does not add any constraint or information beyond what is already present in an embeddable finite structure.
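This failure can be verified directly. Using the standard fact that the order-automorphisms of (Z, <) are exactly the translations x ↦ x + c, it suffices to check that no translation agrees with τ. A small Python check (ours, purely illustrative; the identifiers are not from the paper):

```python
def extends(c, tau):
    """Does the translation x -> x + c agree with the partial map tau?"""
    return all(x + c == y for x, y in tau.items())

tau = {2: 2, 3: 4}  # tau(2) = 2 forces c = 0, while tau(3) = 4 forces c = 1
assert not any(extends(c, tau) for c in range(-1000, 1000))
```

By contrast, a partial map that shifts both points by the same amount, such as {2: 3, 3: 4}, is extended by the translation c = 1.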
Our next task is to explain when a sequence (R_k : k ≤ n) of equivalence relations of the structure M is orderly. Many binary relations on the underlying set I of M will appear in our explanation. We call such binary relations simply relations, without mentioning that they are on the set I. Also, R_k refers to the R_k of M. Finally, we remind the reader that I is a countable set, so an equivalence relation on I has only countably many equivalence classes.
Definition A.3. A relation R is basic explicit in R_1, ..., R_k if it has one of the following three forms:
• R = I_0 × I or R = I × I_0 for some subset I_0 of I that can be defined by a first-order logic formula ϕ. The formula ϕ here has one free variable, say x, and may use n symbols r_1, ..., r_n for binary relations that are interpreted as R_1, ..., R_n, in addition to the usual quantifiers and logical connectives of first-order logic. This means I_0 = {α : ϕ(x) holds when x = α}.

To gain intuition, consider the special case where the structure M is (N^2, R_1, R_2) with the following equivalence relations R_1 and R_2. Note that R_2 is just the equality relation. The relation R_1 freely contains R_2. It contains R_2 because it is a coarser equivalence relation than R_2, the equality relation. This containment is even because each equivalence class of R_1 contains countably many equivalence classes of R_2. Checking the remaining condition of free containment is less immediate, but only slightly. Let D be an equivalence class of R_1 and let {D_i : i ∈ N} be a partition of D consisting of equivalence classes of R_2. Then D has the form D = {(k_0, k) : k ∈ N} for some fixed k_0, and each D_i is a singleton set of the form {(k_0, k_i)} for some k_i. Given a permutation τ on N, we may fulfill the condition of free containment using the following automorphism υ on N^2. When k = k_0, so that (k, k') is in the equivalence class D, this function permutes the second component k' according to τ, thus meeting the first bullet point of the condition. Otherwise, (k, k') is not in D, and the function acts as the identity, as required by the second bullet point.
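The automorphism υ just described can be written out explicitly. The following Python sketch is our own illustration (all names are ours), acting on pairs (k, k') of natural numbers:

```python
def upsilon(pair, k0, tau):
    """Permute the R2-classes (singletons) inside the R1-class
    D = {(k0, k) : k in N} according to tau; fix every other pair."""
    k, kp = pair
    return (k, tau(kp)) if k == k0 else pair

# A permutation of N swapping 0 and 1 and fixing everything else.
swap01 = lambda m: {0: 1, 1: 0}.get(m, m)

print(upsilon((5, 0), 5, swap01))  # inside D (k0 = 5): second coordinate permuted
print(upsilon((3, 0), 5, swap01))  # outside D: fixed
```

Note that υ never changes the first coordinate, so it maps each R_1-class onto itself and each R_2-class (a singleton) onto an R_2-class, exactly as the two bullet points of free containment require.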
Definition A.5. Let R_1, R_2 be equivalence relations that are contained in an equivalence relation R. Then R_1 and R_2 are said to be orthogonal within R if for any equivalence classes ...

A good example of an orderly sequence is (R_1, R_2) made out of the relations R_i in (21). The required relations R'_1 and R'_2 are the complete relation N^2 × N^2 and the relation R_1, respectively. We focus on R'_2. We have already shown that R_1 freely contains R_2. It is also explicit in R_1, simply because it is R_1. To check the third condition, consider an equivalence relation S explicit in R_1 and strictly contained in R'_2. Although we do not present a detailed calculation, it is possible to show that being explicit implies that S has to be one of the following three relations. But only the equality relation is strictly contained in R'_2. Thus, S must be the equality relation; that is, S = R_2. Our argument so far shows that no S meets the assumptions of the third condition, and so the condition holds vacuously.
The remaining concept is anti-chain. In a set A with a partial order ⪯, an anti-chain is a subset A_0 of A such that no two distinct elements of A_0 are comparable under ⪯; that is, for all a, b ∈ A_0, if a ≠ b, then neither a ⪯ b nor b ⪯ a. In Theorem A.1, A_0 is a set of certain subsets of I that are equivalence classes of some equivalence relations, and it is ordered by the subset relation.
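As a finite illustration (ours, not from the paper), the anti-chain condition for a family of sets ordered by inclusion can be checked directly:

```python
from itertools import combinations

def is_antichain(family):
    """True if no member of `family` (an iterable of sets) contains another,
    i.e. the family is an anti-chain under the subset order."""
    return all(not (a <= b or b <= a)
               for a, b in combinations([set(s) for s in family], 2))

print(is_antichain([{1, 2}, {2, 3}, {1, 3}]))  # pairwise incomparable
print(is_antichain([{1}, {1, 2}]))             # {1} is contained in {1, 2}
```

The first family is an anti-chain because its members are pairwise incomparable; the second is not, since {1} ⊆ {1, 2}.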

A.2 Proof of Theorem 4.1
Let G = (V, E) be the DAG in Theorem 4.1. Set n to be the cardinality of V. The first step is to enumerate the vertices of G so that the order of the enumeration respects the directed edges in E. We use this enumeration to build a structure M that has N_G as its underlying set and satisfies the conditions of Theorem A.1, especially the orderly condition.
Lemma A.7. There exists an enumeration (v_ℓ : 1 ≤ ℓ ≤ n) of V such that the set {v_1, ..., v_ℓ} is closed for every 1 ≤ ℓ ≤ n.

Proof. This is a well-known simple result. A process for enumerating V in this way is called topological sort in combinatorics and computer science. For completeness, we explain the construction of the sequence (v_ℓ : 1 ≤ ℓ ≤ n) in the lemma. We construct the sequence inductively. Since V is finite and G is acyclic, there exists a minimal vertex v_1. Our inductive construction starts with the sequence (v_1).
Assume that we have enumerated ℓ elements such that V_ℓ := {v_1, ..., v_ℓ} is closed. Now consider V \ V_ℓ. Since V is finite and partially ordered, so is V \ V_ℓ, and there exists a minimal element v' ∈ V \ V_ℓ. We set v_{ℓ+1} to be this v'. Then, by the minimality of v' in V \ V_ℓ, all ancestors of v_{ℓ+1} already lie in V_ℓ, so the set {v_1, ..., v_{ℓ+1}} is closed, as required.
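The construction in the proof of Lemma A.7 is the standard topological sort. As an illustration (our own sketch, not part of the paper; all identifiers are ours), the following Python code enumerates a DAG by repeatedly picking a minimal remaining vertex, and checks that every prefix of the enumeration is closed:

```python
from collections import defaultdict

def closed_enumeration(vertices, edges):
    """Kahn-style topological sort: repeatedly pick a minimal vertex of the
    remaining graph, so every prefix {v_1, ..., v_l} is ancestor-closed."""
    indegree = {v: 0 for v in vertices}
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    ready = [v for v in vertices if indegree[v] == 0]  # minimal vertices
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for w in children[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order

def is_closed(subset, edges):
    """A subset is closed if, with every vertex, it contains all its parents."""
    s = set(subset)
    return all(u in s for u, v in edges if v in s)

# A diamond DAG: a -> b -> d and a -> c -> d.
E = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
order = closed_enumeration(['a', 'b', 'c', 'd'], E)
assert all(is_closed(order[:l], E) for l in range(1, 5))
```

On the diamond, any valid enumeration starts at the unique minimal vertex a and ends at the unique terminal vertex d, matching the role of v_1 and v_m in the proofs above.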
From now on, we write M = (I, R_{v_1}, ..., R_{v_n}), where the v_k are enumerated as in Lemma A.7 and R_v is defined by α [R_v] β ⇐⇒ α(w) = β(w) for all w ⪯ v.
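Concretely, R_v compares two multi-indices only on the coordinates at v and its ancestors. A Python sketch of this predicate (our illustration; the identifiers are ours):

```python
def ancestors(v, edges):
    """All w with w ⪯ v: v itself plus every vertex with a directed path to v."""
    anc = {v}
    changed = True
    while changed:
        changed = False
        for u, w in edges:
            if w in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def R(v, edges):
    """alpha [R_v] beta  iff  alpha(w) = beta(w) for all w ⪯ v."""
    A = ancestors(v, edges)
    return lambda alpha, beta: all(alpha[w] == beta[w] for w in A)

# Diamond DAG a -> b -> d, a -> c -> d; alpha and beta agree on a and b only.
E = [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
alpha = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
beta  = {'a': 1, 'b': 2, 'c': 9, 'd': 9}
print(R('b', E)(alpha, beta))  # ancestors of b are {a, b}: the indices agree
print(R('d', E)(alpha, beta))  # ancestors of d include c: the indices disagree
```

Each R_v is clearly an equivalence relation, since it is defined by equality of finitely many coordinates.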
Lemma A.8. The structure M is ultrahomogeneous.

Proof. We use induction on n, the cardinality of the vertex set of G. For n = 1, the claim is equivalent to the fact that every bijection between finite subsets of N extends to a permutation of N, which is obviously true. Now assume that the claim holds for n ≤ m − 1. We will prove the claim for the case n = m.
Let N = (J, S_1, ..., S_m) be a substructure of M, and let τ be an embedding from N to M. Because of the way we constructed the enumeration (v_ℓ : 1 ≤ ℓ ≤ m), the last vertex v_m is maximal with respect to the partial order induced by G; that is, v_m is a terminal vertex. Let G' be the subgraph of G with vertex set W = {v_1, ..., v_{m−1}}. Construct a permutation of N, say π_β, so that π_β(β'(v_m)) = τ(β')(v_m) for all β' ∈ J_β. This is possible because J_β is finite. Define υ : I → I accordingly. Then υ is the desired extension of τ.
Lemma A.9. The sequence R_{v_1}, ..., R_{v_n} is orderly.

2. S is strictly contained in R'_k; and
3. it is not the case that S is evenly contained in R_k with #_{R_k}(S) = ∞.
A more careful analysis of the equivalence relations explicit in R_{v_1}, ..., R_{v_{k−1}} for this particular model reveals that they are exactly the equivalence relations of the form ∩_{i∈I} R_{v_i} for I ⊆ {1, ..., k − 1}. Firstly, the third clause in the notion of basic explicit is redundant on this occasion, for the I_0 there must be either empty or all of I: these are the only two definable sets. Secondly, in this circumstance, if a Boolean combination of relations in R_{v_1}, ..., R_{v_{k−1}} is an equivalence relation, then it must actually be an intersection of such relations; we show this by considering the disjunctive normal forms that a transitive relation may have in this particular model.
From this we can conclude that S is an intersection of R'_k with some R_{v_j}'s where v_j is not an ancestor of v_k. Since {v_1, ..., v_{k−1}} is closed, v_k is not an ancestor of v_j either. Thus, v_j and v_k are incomparable. We use this to show that S is orthogonal to R_{v_k} within R'_k. To this end, consider equivalence classes D, D_1, D_2 of R'_k, S, R_{v_k}, respectively, with D_1, D_2 ⊆ D. Pick α_1 ∈ D_1 and α_2 ∈ D_2, so that α_1 [R'_k] α_2, and let β ∈ D be given accordingly.

Proof of Theorem 4.1. The previous lemmas imply that the conditions of Theorem A.1 hold. Thus we can apply the theorem and get the corresponding representation of X. The function ϕ is well-defined: in the first case of the definition, there is only one α; in the second case, there may be multiple choices of α, but they all give rise to the same element of J.

Figure 1: A collection of rooted trees of depth 1

Figure 7: The DAG for random walls; C = {C_xy, C_xz, C_yz}

When G is the graph in Example 3.3(b), DAG-exchangeability becomes Aldous-Hoover (separate) exchangeability. When G is the graph in Example 3.3(c), DAG-exchangeability becomes Austin and Panchenko's hierarchical exchangeability. Note that DAG-exchangeability goes further and specifies new kinds of symmetries, as displayed by the DAGs in Example 3.4. DAG-exchangeability is quite strong and is satisfied only by some C-type random array collections. It is natural to ask what such DAG-exchangeable random array collections look like. Answering this question by means of a representation theorem forms the rest of this paper.

Θ \ Ψ_B denotes deletion of the array Ψ_B. Then, by Corollary 5.4, Θ and Θ_B are conditionally independent given ϕ^0_B, and by construction so are ϕ^0 and ϕ_B, all of them independent from ψ and ψ_B. Also, by the first lines of (15) and (16), we have Θ =_d ϑ and Θ_B =_d ϑ_B.

A relation is explicit in R_1, ..., R_k if it is a Boolean combination of basic explicit relations in R_1, ..., R_k.

Definition A.4. An equivalence relation S contains an equivalence relation R if x [R] y =⇒ x [S] y, or equivalently, every equivalence class of R is contained in one of the equivalence classes of S. If, in addition, every equivalence class of S contains the same number (possibly countably infinite) of equivalence classes of R, we say that S evenly contains R, and write #_R(S) for that number. The relation S is said to freely contain R if S not only evenly contains R but also satisfies the following condition: for all equivalence classes D of S, partitions {D_1, ..., D_m} of D made out of equivalence classes D_i of R, and permutations π on [m], there exists an automorphism υ on M such that
• υ(D_k) = D_{π(k)} for all k ∈ [m]; and
• υ(D') = D' for all the other equivalence classes D' of R.
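In a finite analog (the counts in the paper are countably infinite; this toy version and its identifiers are ours), even containment can be checked by counting the classes of the finer relation inside each class of the coarser one:

```python
def classes(rel, universe):
    """Partition `universe` into equivalence classes of the predicate rel."""
    parts = []
    for x in universe:
        for p in parts:
            if rel(x, p[0]):
                p.append(x)
                break
        else:
            parts.append([x])
    return parts

def evenly_contains(S, R, universe):
    """S evenly contains R if each S-class splits into the same number
    of R-classes; also return the set of observed counts."""
    counts = {len(classes(R, D)) for D in classes(S, universe)}
    return (len(counts) == 1), counts

# On {0,1,2} x {0,1,2,3}: S = same first coordinate, R = equality.
U = [(i, j) for i in range(3) for j in range(4)]
even, counts = evenly_contains(lambda x, y: x[0] == y[0],
                               lambda x, y: x == y, U)
print(even, counts)  # each S-class contains the same number of R-classes
```

Here every S-class has exactly four R-classes (singletons), so the containment is even, mirroring the N^2 example above where each R_1-class contains countably many R_2-classes.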

Proof.
For each k ≤ n, define R'_k := ∩ {R_{v_j} : j < n, v_j ⪯ v_k and v_j ≠ v_k}. Clearly, R'_k is explicit in R_{v_1}, ..., R_{v_{k−1}}, and R'_k freely contains R_{v_k}. Now consider S such that

1. S is an equivalence relation explicit in R_{v_1}, ..., R_{v_{k−1}};