https://www.youtube.com/watch?v=vJo7hiMxbQ8 autoencoders
https://www.youtube.com/watch?v=x6T1zMSE4Ts NVAE: A Deep Hierarchical Variational Autoencoder
https://www.youtube.com/watch?v=eyxmSmjmNS0 GAN paper
and then of course you need to check the Stable Diffusion architecture.
oh, also lurking on Reddit to simply see the enormous breadth of ML theory: https://old.reddit.com/r/MachineLearning/search?q=VAE&restri...
and then of course, maybe if someone's nickname has fourier in it, they probably have a sizeable headstart when it comes to math/theory heavy stuff :)
and some hands-on tinkering never hurts! https://towardsdatascience.com/variational-autoencoder-demys...
In regular terms he's saying the outputs aren't coming out in the range the next stages can work with properly: they want values between -1 and +1, and the model isn't guaranteeing that. Then he's saying you can make it quicker to process by putting the data into a more compact structure for the next stage.
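A minimal sketch of the first point, assuming the fix is simply squashing raw outputs through tanh (the function names here are illustrative, not from the thread):

```python
import numpy as np

# Hypothetical illustration: raw model outputs can land anywhere on the
# real line, but the next stage wants values in [-1, +1]. Passing them
# through tanh guarantees that range.
def to_unit_range(x):
    """Map arbitrary real-valued outputs into (-1, 1) with tanh."""
    return np.tanh(x)

raw = np.array([-5.0, 0.0, 3.2, 100.0])
squashed = to_unit_range(raw)
print(squashed.min() >= -1.0 and squashed.max() <= 1.0)  # True
```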
The discriminator could be improved, i.e., we could capture better input.
KL divergence is not an accurate tool for manipulating the data, and we have better options.
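For context, the KL divergence term being criticized is (in a vanilla VAE) the closed-form divergence between the encoder's diagonal Gaussian and a standard normal prior. A sketch, with illustrative names not taken from the thread:

```python
import numpy as np

# Closed-form KL divergence between a diagonal Gaussian N(mu, sigma^2)
# and the standard normal N(0, I) -- the regularization term in a
# vanilla VAE loss.
def kl_to_standard_normal(mu, logvar):
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

# The KL term is zero exactly when the encoder matches the prior:
mu = np.zeros(4)
logvar = np.zeros(4)  # log(1) = 0, i.e. unit variance
print(kl_to_standard_normal(mu, logvar))  # 0.0
```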
ML is, to a large extent, a huge pot of turning regular computer science and maths into impressive-sounding papers. If you'd like reassurance, look up something like min/max functions and sigmoids. You've likely worked with these since you progressed from HelloWorld.cpp, but wouldn't care to shout about them in public.
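To make that concrete, here's roughly what those two dressed-up concepts boil down to (a sketch, not anyone's official definitions):

```python
import math

# The "sigmoid" that papers lean on is a one-liner:
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# And a min/max clamp is plain old comparison logic:
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

print(sigmoid(0.0))      # 0.5
print(clamp(5, -1, 1))   # 1
```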