DL Related Image Tasks

Image Classification

Challenges
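- Semantic gap: a machine cannot directly map raw pixel values to the meaning of an image
- Viewpoint variation: the pixels change as the camera moves around the object (e.g. a 360° panorama of a cat)
- Background clutter: the object blends into its background (e.g. a camouflaged cat)
- Illumination: lighting changes the pixel values (e.g. a cat in shadow)
- Occlusion: the object is partly hidden (e.g. a cat hiding)
- Deformation: the object changes shape (e.g. a cat in an odd pose)
- Scale: the object appears at different sizes
- Motion blur: fast motion smears the object (e.g. the afterimage of a running cat)
- Intraclass variation: members of one class look different (e.g. orange, tabby, and black-and-white cats)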
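Image feature extraction: bag of features
- 1. Extract local features
  - Sample patches and extract descriptors; these play the role of keywords
- 2. Learn a “visual vocabulary”: cluster the descriptors (e.g. with K-means) to obtain representative cluster centers; the set of cluster centers forms the dictionary (codebook)
  - Each of the K center vectors is called a visual word
  - How to choose the vocabulary size
    - Too small: the words are not representative
    - Too large: overfitting
  - How to make the computation more efficient
    - Vocabulary trees
- 3. Quantize local features using the visual vocabulary
  - For each feature of the image, find the nearest cluster center (visual word) in the dictionary
- 4. Represent images by frequencies of “visual words”
  - Count how often each cluster center occurs and assemble the counts into a histogram vector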
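
The quantization and histogram steps can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes the codebook has already been learned (for example by K-means), and the names `nearest_word` and `bof_histogram` as well as the toy 2-D descriptors are hypothetical.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))

def bof_histogram(descriptors, codebook):
    """Quantize each descriptor and return a normalized visual-word histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # An image with no descriptors gets an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical toy data: 3 visual words, 4 local descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
hist = bof_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

The resulting histogram is the fixed-length vector that a classifier would consume, regardless of how many local descriptors the image produced.
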
Classifiers
- KNN
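  - For a new point, find the k closest points in the training data, then assign the class label by majority vote over those k points
  - Hyperparameters: k and the distance metric
    - Use cross-validation to find the best hyperparameters
  - Cons
    - All training data must be stored
    - Most of the computation happens at prediction time
    - Hyperparameters must be chosen
  - Pros
    - Non-parametric, with no training phase
    - Simple to implement
    - Handles multiple classes
    - Nonlinear decision boundary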
- SVM
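  - $\min_{w,b} \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \max(0, 1 - y_i(w^\top x_i + b))$
  - Kernel trick: data that are not linearly separable in the input space can become separable in a high-dimensional feature space
  - Cons
    - Natively a binary classifier; multi-class problems need a scheme such as one-vs-rest or one-vs-one
    - Computational complexity and memory cost
  - Pros
    - Flexible and powerful framework
    - Convex optimization problem, so the global optimum can be found
    - Does not need many training samples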
- Perceptron(Linear)
  - Formula: $f(x,W) = Wx + b$
  - Pros
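    - Fast
  - Cons
    - Best suited to binary classification
    - Requires a training phase
    - If the data are not linearly separable, add nonlinear activation functions and extend to a multi-layer perceptron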
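
The KNN voting rule from the classifier list can be sketched as below. This is a minimal sketch that assumes Euclidean distance as the metric and resolves ties in favor of the label seen first among the nearest neighbors; `knn_predict` and the toy points are hypothetical names and data.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query (assumed metric: Euclidean).
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    # most_common keeps first-seen order among equal counts, so ties go
    # to the label that appears first among the nearest neighbors.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
prediction = knn_predict(train_points, train_labels, (0.5, 0.5), k=3)
print(prediction)  # cat
```

Note that there is no training function at all: every prediction scans the full training set, which is exactly the storage and prediction-time cost listed under Cons.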

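To make the soft-margin SVM objective concrete, the sketch below only evaluates it for a fixed, hand-picked linear model; it does not solve the optimization. The function name `svm_objective` and the toy data are hypothetical, and labels are assumed to be +1 or -1.

```python
def svm_objective(w, b, X, y, C=1.0):
    """Evaluate 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # Hinge loss is zero once the point is on the correct side with margin >= 1.
        hinge_total += max(0.0, 1.0 - yi * score)
    return margin_term + C * hinge_total

# Hypothetical toy data; the third point is on the wrong side of the boundary.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, -1, -1]
w, b = (1.0, 0.0), 0.0
value = svm_objective(w, b, X, y, C=1.0)
print(value)  # 2.0
```

Only the misclassified point contributes a hinge penalty (1.5); the other two lie outside the margin and contribute nothing, so the total is 0.5 + 1.5. A larger `C` would weight that penalty more heavily relative to the margin term.
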
Object Localization

Overview: classify + regress, bounding box(x,y,w,h)

Semantic Segmentation

Overview: pixel-wise classify

Object Detection

Overview: per region: classify + regress, bounding box(x,y,w,h).
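1. Exhaustive Method
- Template matching / sliding window
  - Traverse the image with boxes of different sizes and use a similarity score to measure how well each window matches
  - High complexity and inefficient. A similarity score is sufficient for detecting a specific instance, but detecting any object of a given class still requires features and a classifier.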
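
A minimal sliding-window template match might look like the sketch below. It makes several simplifying assumptions: a single window size equal to the template (the notes describe multiple box sizes), grayscale images stored as nested lists, and negative sum of squared differences as the similarity score. The name `sliding_window_match` is hypothetical.

```python
def sliding_window_match(image, template):
    """Return (row, col, score) of the best window; score = -SSD (higher is better)."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best = None
    # Visit every top-left corner where the template fits inside the image.
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(h) for j in range(w))
            score = -ssd
            if best is None or score > best[2]:
                best = (r, c, score)
    return best

# Hypothetical toy image containing an exact copy of the template at (1, 1).
image = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
match = sliding_window_match(image, template)
print(match)  # (1, 1, 0)
```

The nested loops make the cost proportional to the number of window positions times the template area, and repeating this over several window sizes multiplies it again, which is the inefficiency noted above.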

Before training the classifier, we split the original image into small local regions to improve classifier performance. This is the job of region proposal algorithms.

2. Region Proposal Algorithms and Related Techniques
- Histogram of oriented gradients (HOG): a feature descriptor
- Deformable part model (DPM): a part-based detector
- Selective search (SS): a region proposal algorithm
- Non-max suppression (NMS): post-processing that removes duplicate detections
  - Basic rule: measure the overlap of two regions with intersection over union (IoU), also called the Jaccard index: $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$
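
IoU and a greedy form of non-max suppression can be sketched as follows. Boxes use the `(x, y, w, h)` convention from these notes, with `(x, y)` assumed to be the top-left corner; the threshold value and the names `iou` and `nms` are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical detections: the first two boxes cover nearly the same region.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 5, 5)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```

The first two boxes have an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed, while the distant third box survives.
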
3. Advanced Algorithms
- R-CNN
- Spatial pyramid pooling (SPP)
- YOLO (You Only Look Once)

Instance Segmentation

Overview: per region: pixel-wise classify

Image Segmentation

Overview: Classic methods are mostly built on clustering such as K-means; advanced methods are mostly built on neural networks. In both cases segmentation is a pixel-level problem.
Algorithms
- Super-pixel
- K-means
- Mean shift
- FCN (fully convolutional network)
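
As a rough illustration of clustering-based segmentation, the sketch below runs K-means on grayscale pixel intensities and labels each pixel with its nearest center. It is a toy under stated assumptions: intensity is the only feature, the initial centers are fixed by hand instead of chosen randomly, and `kmeans_1d` and `segment` are hypothetical names.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalar values, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each value joins its nearest center.
        for v in values:
            k = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[k].append(v)
        # Update step: move each center to its cluster mean (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

def segment(image, centers):
    """Label every pixel with the index of its nearest center."""
    return [[min(range(len(centers)), key=lambda j: abs(p - centers[j]))
             for p in row] for row in image]

# Hypothetical 3x3 grayscale image with a dark region and a bright region.
image = [
    [10, 12, 200],
    [11, 198, 202],
    [9, 201, 199],
]
pixels = [p for row in image for p in row]
centers = kmeans_1d(pixels, centers=[0.0, 255.0])
labels = segment(image, centers)
print(centers)  # [10.5, 200.0]
print(labels)   # [[0, 0, 1], [0, 1, 1], [0, 1, 1]]
```

Each pixel receives a label, so the output has the same shape as the input image, which is what makes this a pixel-level task.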

Image Retrieval

Overview: Retrieval relies mainly on feature extraction: extract features from the query image and search the database for the most similar images.
- Database: Most collections of images and videos are not image databases. An image database must satisfy two properties: a DBMS manages the data, and it supports complex queries.
- Feature for retrieval
  - Color
    - Mean
    - Overall distribution -> Color histogram (has no information about pixel locations)
    - Relative locations -> Layout templates
  - Texture
    - Linear filters
    - Textures of textures
  - Shape (mainly based on segmentation, including DL-based methods; global shape measures include boundary length, enclosed area, boundary curvature, moments, projections onto axes, and the tangent-angle histogram)
    - Sketches
    - Segmented objects
  - Others
- Feature similarity metrics
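  - Euclidean distance: the straight-line ($L_2$) distance between feature vectors; the smaller the distance, the more similar
  - Cosine similarity: the closer the value is to 1, the more similar; the closer to -1, the less similar
  - Hamming distance: the number of differing bits between two binary codes; can be implemented with XOR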
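
The three similarity metrics can be sketched in a few lines. This is a minimal illustration: feature vectors are plain tuples, binary codes are assumed to be stored as Python integers, and the cosine function assumes neither vector is all zeros.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.dist(a, b)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumes both are nonzero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def hamming(code_a, code_b):
    """Number of differing bits: XOR the codes, then count the 1 bits."""
    return bin(code_a ^ code_b).count("1")

print(euclidean((0, 0), (3, 4)))            # 5.0
print(cosine_similarity((1, 0), (-1, 0)))   # -1.0
print(hamming(0b1011, 0b0010))              # 2
```

Hamming distance on compact binary codes is cheap to compute, while Euclidean and cosine measures operate on real-valued feature vectors.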