Lecture 07

  • Recap
  • Today's topic: Convolutional Neural Networks (CNNs)
  • CNN Visualization
  • Image representation:
    • A 1920x1080 RGB image has 1920 x 1080 x 3 = 6,220,800 values
    • Feeding such an image to an MLP with one hidden layer of 100 units requires 100 x 6,220,800 = 622,080,000 weights in the first layer alone (not counting biases)!
    • The model should also be able to handle pixel translation, noise, etc.
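The arithmetic above can be checked in a few lines. This is only a sketch: the convolutional layer used for comparison (100 kernels of size 3x3) is an assumption chosen for illustration and is not taken from the lecture.

```python
# Input size of a 1920x1080 RGB image
width, height, channels = 1920, 1080, 3
num_inputs = width * height * channels

# Fully connected hidden layer with 100 units (biases ignored)
hidden_units = 100
mlp_weights = hidden_units * num_inputs

# Hypothetical comparison: a conv layer with 100 kernels of size 3x3.
# Each kernel spans all input channels, and its weights are shared
# across every image position, so the count does not depend on image size.
num_kernels, kernel_size = 100, 3
conv_weights = num_kernels * kernel_size * kernel_size * channels

print(f"inputs:       {num_inputs:,}")
print(f"MLP weights:  {mlp_weights:,}")
print(f"conv weights: {conv_weights:,}")
```

Weight sharing is what keeps the convolutional count small: the same few weights are reused at every location instead of being learned separately per pixel.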
  • Convolutional network architecture
    • CNN = Feature learning + classification
    • Feature learning = convolution + pooling
      • Convolution makes the data deeper (it increases the number of channels)
      • Pooling shrinks the width and height
      • The number of output channels equals the number of kernels
    • Analogy to the human brain
    • Summary of components: kernel size, stride, pooling method, ...
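The effect of convolution and pooling on the data shape can be sketched with a naive NumPy implementation. The function names `conv2d` and `max_pool2d`, the 8x8 input, and the choice of 8 kernels are all assumptions made for this illustration; real frameworks use much faster implementations and usually add a bias and a nonlinearity.

```python
import numpy as np

def conv2d(x, kernels, stride=1):
    """Naive 'valid' convolution (cross-correlation, as used in CNNs).

    x:       input of shape (C, H, W)
    kernels: filters of shape (K, C, kh, kw)
    returns: output of shape (K, out_h, out_w)
    """
    k, c, kh, kw = kernels.shape
    _, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((k, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = x[:, top:top + kh, left:left + kw]
            # One dot product per kernel over the whole (C, kh, kw) patch
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]  # drop rows/columns that do not fill a window
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))                # 3 channels, 8x8 pixels
kernels = rng.standard_normal((8, 3, 3, 3))  # 8 kernels of size 3x3

features = conv2d(image, kernels)
pooled = max_pool2d(features)

print(image.shape)     # (3, 8, 8)
print(features.shape)  # (8, 6, 6): 8 kernels give 8 channels
print(pooled.shape)    # (8, 3, 3): width and height are halved
```

Here the channel count goes from 3 to 8 because 8 kernels are applied, while pooling leaves the channel count unchanged and shrinks only the spatial dimensions.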
  • Filter types:
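    • median filter: used for noise reduction; noise usually shows up as extreme values, so taking the median avoids it
    • mean filter: used to produce a smoothed image
    • Gaussian filter: used to produce an even smoother image
    • edge filter: used to extract edge pixels
    • learnable filter: lets the weights adapt automatically to the shape of the features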
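The hand-crafted filters above can be tried on tiny test images. This is a minimal sketch: the helper names `filter2d` and `median_filter`, the 3x3 kernel values (a common Gaussian approximation and a Sobel kernel for the edge filter), and the two 5x5 test images are assumptions for illustration.

```python
import numpy as np

def filter2d(img, kernel):
    """Slide a kernel over a 2-D image (valid region only, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    h, w = img.shape
    out = np.zeros((h - size + 1, w - size + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.median(img[i:i + size, j:j + size])
    return out

# Fixed (hand-crafted) kernels
mean_kernel = np.full((3, 3), 1 / 9)
gaussian_kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # responds to vertical edges

# Flat image with a single extreme-valued noise pixel in the middle
noisy = np.full((5, 5), 10.0)
noisy[2, 2] = 255.0

# Image with a vertical step edge between columns 2 and 3
step = np.zeros((5, 5))
step[:, 3:] = 1.0

denoised = median_filter(noisy)           # the outlier disappears entirely
blurred = filter2d(noisy, mean_kernel)    # the outlier is spread out, not removed
smoothed = filter2d(noisy, gaussian_kernel)
edges = filter2d(step, sobel_x)           # non-zero only near the step

print(denoised)
print(blurred.round(2))
print(edges)
```

A learnable filter uses the same sliding operation as `filter2d`, except that the kernel values are not fixed by hand: they are weights updated during training.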
  • Convolutional layer
  • Relationship among layer sizes
  • Feature locality: each convolution extracts features from a small area. As the network gets deeper, each neuron combines more and more local features, so the region of the input it depends on grows and the features it captures become increasingly "global" and "abstract"
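Both the relationship among layer sizes and the growth of locality can be traced numerically. The sketch below uses the standard output-size formula, out = floor((in + 2 x padding - kernel) / stride) + 1, together with the usual receptive-field recurrence. The function name `trace_layers`, the 32x32 input, and the four-layer stack are assumptions for illustration, not the network from the lecture.

```python
def trace_layers(input_size, layers):
    """Track spatial size and receptive field through a stack of layers.

    layers: list of (kernel, stride, padding) tuples; conv and pooling
            layers follow the same size arithmetic.
    returns: list of (output_size, receptive_field), one entry per layer.
    """
    size = input_size
    receptive_field = 1  # a single input pixel sees only itself
    jump = 1             # distance in input pixels between adjacent outputs
    trace = []
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        receptive_field += (kernel - 1) * jump
        jump *= stride
        trace.append((size, receptive_field))
    return trace

# Hypothetical stack: conv 3x3 -> pool 2x2 -> conv 3x3 -> pool 2x2
stack = [
    (3, 1, 0),  # conv, 3x3 kernel, stride 1, no padding
    (2, 2, 0),  # max pool, 2x2 window, stride 2
    (3, 1, 0),  # conv
    (2, 2, 0),  # max pool
]

for (size, rf), layer in zip(trace_layers(32, stack), stack):
    print(f"layer {layer}: output {size}x{size}, receptive field {rf}x{rf}")
```

For this stack the spatial size shrinks from 32 to 6 while each output neuron ends up depending on a 10x10 patch of the input, which is one way to read "locality becomes more global" in deeper layers.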