Thanks to the David Sprott Distinguished Lecture by Professor Trevor Hastie and a question raised by Professor Ali Ghodsi in the Q&A session today, I realized several things about Deep Learning. I think there are three main reasons why Deep Learning models work so well:
- Deep Learning is like a stack of simpler models. Simple models, such as linear models, are composed into a certain, often hierarchical, structure. As a result, Deep Learning models usually contain many more parameters than classical models, which may explain why they perform better on bigger data: Deep Learning summarizes data using more parameters.
- A lot of Deep Learning models take data structure into account. For example, the convolution in a CNN is a local kernel that captures locally-correlated features in an image. Deep Learning models also make use of data augmentation, as if we were adding a “Gaussian cloud” around the data. This augmentation gives the model more power because we get more data for free: jittering or rotating a picture does not change its label.
- Deep Learning supports online learning: we can re-train a pre-trained model on new data instead of starting from scratch. This is very handy in a lot of applications.
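The first point, that a deep network is a stack of simple models, can be sketched in a few lines of NumPy. This is only an illustration under assumed layer sizes (4 inputs, 16 hidden units, 1 output), not any particular architecture from the lecture; it shows how composing two linear models through a nonlinearity multiplies the parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Elementwise nonlinearity that joins the stacked linear models
    return np.maximum(0.0, x)

# Each layer is itself a simple linear model: y = Wx + b
# (layer sizes 4 -> 16 -> 1 are illustrative, not from the lecture)
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

def forward(x):
    h = relu(W1 @ x + b1)   # first simple model, then nonlinearity
    return W2 @ h + b2      # second simple model on the learned features

x = rng.normal(size=4)
y = forward(x)

# Stacking is what inflates the parameter count:
# 16*4 + 16 + 1*16 + 1 = 97 parameters for this tiny network
n_params = W1.size + b1.size + W2.size + b2.size
```

Even this toy two-layer stack has 97 parameters, far more than the 5 a single linear model on the same 4 inputs would use.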
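The “Gaussian cloud” idea in the second point can be made concrete with a minimal augmentation sketch. The image here is a zero array standing in for a real picture, and the `sigma` value is an assumption; the point is only that the perturbed example keeps its label.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_jitter(image, sigma=0.05):
    """Surround a training example with a small 'Gaussian cloud'.

    The label is unchanged: a slightly noisy picture is still a
    picture of the same thing, so each draw is extra data for free.
    """
    return image + rng.normal(scale=sigma, size=image.shape)

image = np.zeros((8, 8))   # stand-in for a real image
label = 3                  # its class label, untouched by augmentation

augmented = gaussian_jitter(image)
```

Rotations and crops work the same way: transform the input, keep the label.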
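The third point, re-training a pre-trained model on new data, comes down to resuming gradient descent from the current weights rather than from a random initialization. A minimal sketch with a linear model and one SGD step on squared error (the random `w0` stands in for saved pre-trained weights, and the learning rate is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pre-trained weights loaded from disk
w0 = rng.normal(size=3)

def sgd_update(w, x, y, lr=0.1):
    """One online SGD step on squared error (pred - y)^2.

    Training continues from the current weights, so nothing learned
    so far is thrown away when new data arrives.
    """
    pred = w @ x
    grad = 2.0 * (pred - y) * x
    return w - lr * grad

# New example arrives after deployment; refine w without retraining
x_new, y_new = np.array([1.0, 2.0, -1.0]), 0.5
w1 = sgd_update(w0, x_new, y_new)

loss_before = (w0 @ x_new - y_new) ** 2
loss_after = (w1 @ x_new - y_new) ** 2
```

A full network updates every layer the same way; this is also why fine-tuning a pre-trained model on a new task is so common in practice.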