00:00:00 | Neural Networks II |
00:01:09 | Mini-batch stochastic gradient descent |
00:03:55 | Finding an effective learning rate |
00:06:15 | Using a learning schedule |
00:07:35 | Complex loss surfaces and local minima |
00:09:12 | Adding momentum to gradient descent |
00:12:50 | Adaptive optimizers (RMSProp and Adam) |
00:15:08 | Local minima are rarely a problem |
00:15:21 | Activation functions (sigmoid, tanh, and ReLU) |
00:19:35 | Weight initialization techniques (Xavier/Glorot and He) |
00:21:15 | Feature scaling (normalization and standardization) |
00:23:28 | Batch normalization for training stability |
00:28:26 | Regularization (early stopping, L1, L2, and dropout) |
00:33:11 | DEMO: building a basic deep learning model for NLP |
00:56:19 | Deep learning is about learning representations |
00:58:18 | Sensible defaults when building deep learning models |
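A minimal sketch, not from the video, of the "sensible defaults" the final chapters refer to, pulling together topics from the list above (ReLU with He initialization, batch normalization, dropout, the Adam optimizer, mini-batch training, and early stopping). The framework (Keras), layer sizes, and training settings are assumptions for illustration only.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Assumed input width and layer sizes; adjust for your own data.
model = keras.Sequential([
    layers.Input(shape=(100,)),
    layers.Dense(128, kernel_initializer="he_normal"),  # He init pairs well with ReLU
    layers.BatchNormalization(),                         # training stability
    layers.Activation("relu"),
    layers.Dropout(0.3),                                 # regularization
    layers.Dense(1, activation="sigmoid"),               # binary classification head
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # adaptive optimizer default
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Early stopping: halt training when validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

# Example training call (x_train / y_train are placeholders):
# model.fit(x_train, y_train, validation_split=0.1, epochs=50,
#           batch_size=32, callbacks=[early_stop])  # mini-batch training
```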