14.7 Vanishing and Exploding Gradients
Right, so you’ve built your first few neural networks. They’re training, the loss is (mostly) going down, and you’re feeling pretty good about yourself. Then you try to build something a bit deeper, say ten, twenty, or a hundred layers. Suddenly, your model’s performance flatlines. The loss stops improving, or worse, the model starts outputting complete gibberish from the very first epoch. Welcome to the two gremlins that have haunted deep learning since its inception: vanishing and exploding gradients.