You likely understand how to minimize a continuously differentiable function. Here, you are minimizing a continuously differentiable error function (which measures the difference between the output of your hypothesis function and the actual data) with respect to the adjustable weights and biases that determine how values pass from the neurons of one layer to the next. The complexity comes from the fact that the hypothesis function is a composition of many functions, one per layer, and that there is usually a large number of neurons. Fundamentally, though, you are doing the same thing many times over.
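To make this concrete, here is a minimal sketch (with an assumed toy data set and a single hidden neuron) of exactly that process: the hypothesis is a composition of functions, the error is a continuously differentiable function of the weights and biases, and gradient descent repeats the same chain-rule step for every parameter.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed tiny data set of (input, target) pairs, for illustration only.
data = [(0.0, 0.0), (1.0, 0.8), (2.0, 0.9)]

# Hypothesis h(x) = w2 * sigmoid(w1*x + b1) + b2 -- a composition of
# functions, with four adjustable parameters.
w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0
lr = 0.1  # learning rate

def mean_squared_error():
    return sum((w2 * sigmoid(w1 * x + b1) + b2 - y) ** 2
               for x, y in data) / len(data)

initial_error = mean_squared_error()

for step in range(2000):
    gw1 = gb1 = gw2 = gb2 = 0.0
    for x, y in data:
        a = sigmoid(w1 * x + b1)         # hidden-layer activation
        h = w2 * a + b2                  # hypothesis output
        dE = 2.0 * (h - y) / len(data)   # dE/dh for mean squared error
        # Chain rule through the composition, one layer at a time:
        gw2 += dE * a
        gb2 += dE
        da = dE * w2
        dz = da * a * (1.0 - a)          # sigmoid'(z) = a * (1 - a)
        gw1 += dz * x
        gb1 += dz
    # Gradient descent: the same update, repeated for every parameter.
    w1 -= lr * gw1
    b1 -= lr * gb1
    w2 -= lr * gw2
    b2 -= lr * gb2

final_error = mean_squared_error()
```

A real network simply has many more such parameters and many more composed layers; the per-parameter computation stays the same.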