6. Score Function Approximation from DBMs
Having chosen inflation and rescaling schedules, the last component needed for our general pfODE is the score function $s(x, t) \equiv \nabla_x \log p_t(x)$. Our strategy is to exploit the correspondence described previously between diffusion models' forward SDEs and their corresponding pfODEs, which give rise to the same marginals (i.e., both satisfy the same Fokker–Planck equation; cf. the three views of DBMs above). That is, we will learn an approximation to $s(x, t)$ by fitting the DBM corresponding to our desired pfODE, since both make use of the same score function.
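To make this correspondence concrete, here is a sketch in the scalar-diffusion notation of Song et al. (2021); our general pfODE uses matrix-valued schedules, but the logic is identical. The forward SDE and its probability flow ODE,

$$dx = f(x, t)\, dt + g(t)\, dW_t \qquad \Longleftrightarrow \qquad \frac{dx}{dt} = f(x, t) - \tfrac{1}{2}\, g^2(t)\, \nabla_x \log p_t(x),$$

share the same marginals because both solve the same Fokker–Planck equation:

$$\partial_t p_t = -\nabla \cdot \big( f\, p_t \big) + \tfrac{1}{2}\, g^2(t)\, \Delta p_t = -\nabla \cdot \Big[ \big( f - \tfrac{1}{2}\, g^2(t)\, \nabla_x \log p_t \big)\, p_t \Big].$$

The only unknown term in the deterministic flow is the score $\nabla_x \log p_t(x)$, which is exactly what DBM training estimates.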
Briefly, in line with previous work on DBMs (Karras et al., 2022), we train neural networks to estimate a de-noised version $D(x, C(t))$ of a noise-corrupted data sample $x$ given noise level $C(t)$ (cf. Appendix A.4 of our paper for the correspondence between $C(t)$ and noise). That is, we model $D_\theta(x, C(t))$ with a neural network and train it by minimizing a standard $L_2$ de-noising error:
$$\mathbb{E}_{y \sim \text{data}}\; \mathbb{E}_{n \sim \mathcal{N}(0,\, C(t))} \left\| D(y + n;\, C(t)) - y \right\|_2^2$$

De-noised outputs can then be used to compute the desired score term via $\nabla_x \log p(x, C(t)) = C^{-1}(t) \cdot \big( D(x; C(t)) - x \big)$ (Song et al., 2021; Karras et al., 2022). Moreover, as in Karras et al. (2022), we also adopt a series of preconditioning factors aimed at making training with the above $L_2$ loss and our noising scheme more amenable to gradient descent techniques (Appendix B.1 of our paper).
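As an illustration, the following is a minimal PyTorch sketch of this training objective and score conversion, assuming a diagonal noise covariance $C(t)$ represented by a vector of per-dimension variances. The function names and the denoiser interface `D(x, c_t)` are hypothetical, and the actual implementation additionally applies the preconditioning factors mentioned above (Appendix B.1).

```python
import torch

def denoising_loss(D, y, c_t):
    """Monte Carlo estimate of the L2 de-noising loss
    E_{y ~ data} E_{n ~ N(0, C(t))} ||D(y + n; C(t)) - y||_2^2.

    D   -- denoiser network; called as D(x, c_t) (hypothetical interface)
    y   -- batch of clean data samples, shape (B, d)
    c_t -- per-dimension variances of the diagonal covariance C(t), shape (d,)
    """
    n = torch.randn_like(y) * c_t.sqrt()   # n ~ N(0, diag(c_t))
    x = y + n                              # noise-corrupted sample
    return ((D(x, c_t) - y) ** 2).sum(dim=-1).mean()

def score_from_denoiser(D, x, c_t):
    """Score estimate grad_x log p(x; C(t)) = C^{-1}(t) (D(x; C(t)) - x)."""
    # For diagonal C(t), the matrix inverse reduces to elementwise
    # division by the per-dimension variances.
    return (D(x, c_t) - x) / c_t
```

Note that for diagonal $C(t)$ the inverse $C^{-1}(t)$ is just elementwise division by the variances, which is what the sketch exploits; a full-covariance schedule would require a matrix solve instead.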