2. Inflationary Flows pfODE

As argued in the section on three views of DBMs, the probability flow ODE (pfODE) offers a means of deterministically transforming an arbitrary data distribution into a simpler form via a score function learnable through DBM training. Here, we introduce a specialized class of pfODEs, Inflationary Flows, that follow from an intuitive picture of local dynamics and asymptotically give rise to stationary Gaussian solutions of the Fokker-Planck equation.

We begin by considering a sequence of marginal transformations in which the original data distribution is convolved with Gaussians of increasingly large covariance \( \mathbf{C}(t) \):

\[p_t(\mathbf{x}) = p_0(\mathbf{x}) * \mathcal{N}(\mathbf{x}; \mathbf{0}, \mathbf{C}(t))\]

It is straightforward to show (Appendix A.1 of our paper) that this class of time-varying densities satisfies the Fokker-Planck equation when \(\mathbf{f} = \mathbf{0}\) and \(\mathbf{GG^\top} = \mathbf{\dot{C}}\). This can be viewed as a process of deterministically “inflating” each point in the data set, or equivalently as smoothing the underlying data distribution on ever coarser scales, similar to denoising approaches to DBMs (Raphan & Simoncelli, 2011; Kadkhodaie & Simoncelli, 2021).
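The convolution above has a simple sampling interpretation: a draw from \(p_t\) is a draw from \(p_0\) plus independent Gaussian noise with covariance \(\mathbf{C}(t)\), so covariances add. The following is a minimal NumPy sketch of that fact, not code from the paper. The toy values of `Sigma0` and `C` and the helper name `inflate_samples` are assumptions chosen purely for illustration.

```python
import numpy as np

def inflate_samples(x0, C, rng):
    """Sample from p_t = p_0 * N(0, C) by adding N(0, C) noise to samples of p_0.

    x0 has shape (n_samples, d); C is a (d, d) covariance matrix.
    """
    d = C.shape[0]
    noise = rng.multivariate_normal(np.zeros(d), C, size=x0.shape[0])
    return x0 + noise

rng = np.random.default_rng(0)

# Toy "data" distribution: a correlated 2D Gaussian (illustrative assumption).
Sigma0 = np.array([[2.0, 0.6],
                   [0.6, 0.5]])
# Toy inflation covariance C(t) at one fixed time (illustrative assumption).
C = np.array([[1.0, 0.0],
              [0.0, 3.0]])

x0 = rng.multivariate_normal(np.zeros(2), Sigma0, size=200_000)
xt = inflate_samples(x0, C, rng)

# Up to sampling error, the empirical covariance of xt matches Sigma0 + C.
emp_cov = np.cov(xt, rowvar=False)
print(emp_cov)
print(Sigma0 + C)
```

The same construction works for non-Gaussian data, since adding independent noise always convolves the densities; only the Gaussian shape of the result is specific to this toy example.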

Eventually, if the smoothing kernel grows much larger than \( \boldsymbol \Sigma_0 \), the covariance of the original data, then the total covariance \( \boldsymbol \Sigma(t) \equiv \boldsymbol\Sigma_0 + \mathbf{C}(t) \rightarrow \mathbf{C}(t)\), so that \(p_t(\mathbf{x}) \approx \mathcal{N}(\mathbf{0}, \mathbf{C}(t))\) and all information about the original distribution has been removed. However, because it is numerically inconvenient for the variance of the asymptotic distribution \(p_\infty(\mathbf{x})\) to grow much larger than that of the data, we follow previous work (Song et al., 2021; Karras et al., 2022) in adding a time-dependent coordinate rescaling \(\mathbf{\tilde{x}}(t) = \mathbf{A}(t) \cdot \mathbf{x}(t)\), resulting in an asymptotic solution \(p_\infty(\mathbf{x}) = \mathcal{N}(\mathbf{0}, \mathbf{A} \boldsymbol\Sigma \mathbf{A^\top})\) of the corresponding Fokker-Planck equation when \( \boldsymbol{\dot \Sigma } = \mathbf{\dot{C}} \) and \(\mathbf{\dot{A}}\boldsymbol\Sigma\mathbf{A}^\top + \mathbf{A}\boldsymbol\Sigma\mathbf{\dot{A}}^\top = \mathbf{0}\) (Appendix A.2 of our paper).

Together, these assumptions give rise to the general pfODE (Appendix A.3 of our paper):

\[\frac{\mathrm{d}\mathbf{\tilde{x}}}{\mathrm{d}t} = \mathbf{A}(t) \cdot \left( -\frac{1}{2} \mathbf{\dot{C}}(t) \cdot \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right)+ \left( \mathbf{\dot{A}}(t) \cdot \mathbf{A^{-1}}(t) \right) \cdot \mathbf{\tilde{x}}\]

where the score function is evaluated at \(\mathbf{x} = \mathbf{A}^{-1} \cdot \mathbf{\tilde{x}}\). Notably, this general pfODE is equivalent to the general pfODE form given in Karras et al. (2022) when both \(\mathbf{C}(t)\) and \(\mathbf{A}(t)\) are isotropic (Appendix A.4 of our paper), with \(\mathbf{C}(t)\) playing the role of injected noise and \(\mathbf{A}(t)\) the role of the scale schedule.
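As a brief sketch of that isotropic reduction (the full argument is in the appendix cited above), assume \(\mathbf{C}(t) = \sigma^2(t)\,\mathbf{I}\) and \(\mathbf{A}(t) = s(t)\,\mathbf{I}\), where the scalar schedules \(\sigma(t)\) and \(s(t)\) are notation introduced here for illustration. Then \(\mathbf{\dot{C}} = 2\sigma\dot{\sigma}\,\mathbf{I}\) and \(\mathbf{\dot{A}} \cdot \mathbf{A}^{-1} = (\dot{s}/s)\,\mathbf{I}\), and the general pfODE becomes

\[\frac{\mathrm{d}\mathbf{\tilde{x}}}{\mathrm{d}t} = \frac{\dot{s}(t)}{s(t)}\, \mathbf{\tilde{x}} - s(t)\, \dot{\sigma}(t)\, \sigma(t)\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \Big|_{\mathbf{x} = \mathbf{\tilde{x}}/s(t)}\]

If the gradient is instead taken with respect to the rescaled coordinate, \(\nabla_{\mathbf{x}} = s(t)\, \nabla_{\mathbf{\tilde{x}}}\), the prefactor \(s(t)\) becomes \(s(t)^2\), which is the form in which the scaled probability flow ODE is usually written.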

In the following sections, we will show how to choose both of these in ways that either preserve or reduce intrinsic data dimensionality.
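As a sanity check on the general pfODE, the sketch below integrates it for Gaussian data, where the score is known in closed form, and compares the covariance of the transported points with \(\mathbf{A}\boldsymbol\Sigma\mathbf{A}^\top\). Everything specific here is an assumption made for illustration: the toy `Sigma0`, the linear schedule `C(t) = t * C_rate`, the isotropic rescaling `A(t) = exp(-t/2) I`, and the fixed-step RK4 integrator. None of these is the choice developed in the following sections, and in practice the analytic `score` would be replaced by a learned score network.

```python
import numpy as np

Sigma0 = np.array([[2.0, 0.6],
                   [0.6, 0.5]])      # toy data covariance (assumption)
C_rate = np.diag([1.0, 3.0])         # hypothetical schedule: C(t) = t * C_rate

def C(t):
    return t * C_rate

def C_dot(t):
    return C_rate

def A(t):                            # hypothetical isotropic rescaling
    return np.exp(-0.5 * t) * np.eye(2)

def A_dot(t):
    return -0.5 * np.exp(-0.5 * t) * np.eye(2)

def score(x, t):
    """Exact score of p_t = N(0, Sigma0 + C(t)); columns of x are points."""
    return -np.linalg.solve(Sigma0 + C(t), x)

def pfode_rhs(x_tilde, t):
    """Right-hand side of the general pfODE in rescaled coordinates."""
    x = np.linalg.solve(A(t), x_tilde)              # x = A^{-1} x_tilde
    inflate = A(t) @ (-0.5 * C_dot(t) @ score(x, t))
    rescale = A_dot(t) @ x                          # (A_dot A^{-1}) x_tilde
    return inflate + rescale

def integrate(x_tilde, t_end, n_steps=1000):
    """Fixed-step classical RK4 from t = 0 to t = t_end."""
    h = t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        k1 = pfode_rhs(x_tilde, t)
        k2 = pfode_rhs(x_tilde + 0.5 * h * k1, t + 0.5 * h)
        k3 = pfode_rhs(x_tilde + 0.5 * h * k2, t + 0.5 * h)
        k4 = pfode_rhs(x_tilde + h * k3, t + h)
        x_tilde = x_tilde + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x_tilde

# Columns of the Cholesky factor satisfy X0 @ X0.T == Sigma0. Because the flow
# is linear for Gaussian data, X_T @ X_T.T is the covariance of the flowed density.
T = 2.0
X0 = np.linalg.cholesky(Sigma0)      # A(0) = I, so x_tilde(0) = x(0)
X_T = integrate(X0, T)

cov_T = X_T @ X_T.T
target = A(T) @ (Sigma0 + C(T)) @ A(T).T
print(np.max(np.abs(cov_T - target)))  # small, limited by integration error
```

The agreement holds even though `C_rate` and `Sigma0` do not commute, which is the anisotropic setting the general pfODE is meant to cover.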
