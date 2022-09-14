But where does the text prompt come in?



I lied! SD does NOT learn a function f(x, t) that denoises x a "little bit" back in time.



It actually learns a function f(x, t, y), where y is the "context" that guides the denoising of x.
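
To make the extra argument concrete, here is a toy sketch of what "f takes y too" means. Everything in it is an assumption for illustration: the sizes, the random (untrained) weights, the 0.1 step size, and the stand-in context vectors are made up, and this is not SD's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: a flattened 4x4 "image" and an 8-dim context vector.
X_DIM, Y_DIM, HIDDEN = 16, 8, 32

# Random weights standing in for a trained network (nothing is learned here).
W_in = rng.normal(size=(X_DIM + 1 + Y_DIM, HIDDEN)) * 0.1
W_out = rng.normal(size=(HIDDEN, X_DIM)) * 0.1

def f(x, t, y):
    """Toy conditional denoiser: nudges x using the step t AND the context y."""
    inp = np.concatenate([x, [t], y])   # the network sees x, t and y together
    h = np.tanh(inp @ W_in)
    noise_guess = h @ W_out             # same shape as x
    return x - 0.1 * noise_guess        # arbitrary small step, a "little bit" only

x = rng.normal(size=X_DIM)              # a noisy "image"
y_fox = rng.normal(size=Y_DIM)          # stand-in context for one label
y_cat = rng.normal(size=Y_DIM)          # stand-in context for a different label

out_fox = f(x, 0.5, y_fox)
out_cat = f(x, 0.5, y_cat)
print(out_fox.shape, np.allclose(out_fox, out_cat))
```

Same x, same t, different y: the two outputs differ, which is the whole point of conditioning on the context.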



Below, y is the image label "arctic fox".



8/15 pic.twitter.com/z4WVWJ8NVu