DIP-review-3
DIP_review_3 Image Segmentation
What’s image segmentation
Partition an image into its constituent regions or objects, relying mainly on properties of the image intensity values
Edge-based segmentation: Discontinuity
Region-based segmentation: Similarity
1 Edge detection
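Mainly uses differentiation (finite differences)
First order: $x'[n] = x[n+1] - x[n] = -(x[n] - x[n+1])$
First-order derivatives generally produce thicker edges
Second order: $x''[n] = x'[n] - x'[n-1] = x[n+1] - 2x[n] + x[n-1]$
Second-order derivatives produce a light/dark or dark/light double line at an edge (with a zero crossing between the two); the order of light and dark tells how the edge transitions, and second-order derivatives are better at picking up fine detail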
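A minimal NumPy sketch of the two differences on a 1-D ramp edge. The intensity profile `x` is a made-up example, not data from the course.

```python
import numpy as np

# Hypothetical 1-D intensity profile: flat, ramp up, flat.
x = np.array([0, 0, 0, 1, 2, 3, 3, 3], dtype=float)

# First-order difference: x'[n] = x[n+1] - x[n]
d1 = x[1:] - x[:-1]

# Second-order difference: x''[n] = x[n+1] - 2 x[n] + x[n-1]
d2 = x[2:] - 2 * x[1:-1] + x[:-2]

print(d1)  # nonzero along the whole ramp, hence a thick edge
print(d2)  # nonzero only where the ramp starts and ends, with opposite signs
```

The sign change between the two nonzero entries of `d2` is the zero crossing that marks the edge.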
1.1 Poin ...
DIP_review_2_4
DIP_review_2_4 - Wavelet and Other Image Transforms
1 Unitary transform (find a suitable set of basis functions, i.e. a complete orthonormal basis, to compress the data while conserving energy)
Forward transform: $t = Af$, $t[k] = \sum_{n=0}^{N-1} A[k,n] f[n]$
Inverse transform: $f = A^H t$, $f[n] = \sum_{k=0}^{N-1} A^{*}[k,n] t[k]$
A transform is unitary when the inverse of A equals its conjugate transpose: $A^{-1} = A^H = (A^T)^{*}$, i.e. $AA^H = I$
eg1. Image rotation matrix
eg2. PCA parameter (projection) matrix
eg3. DFT (discrete Fourier transform)
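A small check of the unitary conditions, using the DFT as the example. This is a sketch: the $1/\sqrt{N}$ normalization is assumed so that the DFT matrix is exactly unitary, and the size `N = 8` and the random signal `f` are arbitrary.

```python
import numpy as np

N = 8
idx = np.arange(N)

# Unitary DFT matrix: A[k, n] = exp(-2*pi*j*k*n / N) / sqrt(N)
A = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

f = np.random.default_rng(0).standard_normal(N)
t = A @ f                  # forward transform  t = A f
f_rec = A.conj().T @ t     # inverse transform  f = A^H t

is_unitary = np.allclose(A @ A.conj().T, np.eye(N))
energy_preserved = np.isclose(np.sum(np.abs(f) ** 2), np.sum(np.abs(t) ** 2))
reconstructed = np.allclose(f_rec, f)
print(is_unitary, energy_preserved, reconstructed)
```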
2D unitary transform
2. Frequency Domain Extension
2.1 DC ...
DIP_review_2_3
DIP_review_2_3 - Color image processing & Morphological image processing
1. Color image processing
1.1 Standard color spaces
1.1.1 RGB
Blue
Red
Green
Mainly used for media that emit their own light
1.1.2 CMYK
$$\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
Cyan = White – Red
Magenta = White – Green
Yellow = White – Blue
Black = White – Red - Green - Blue
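The subtraction above can be sketched in NumPy, assuming channel values normalized to [0, 1]. The function names are illustrative only.

```python
import numpy as np

def rgb_to_cmy(rgb):
    """[C, M, Y] = [1, 1, 1] - [R, G, B], with channels in [0, 1]."""
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    """Inverse mapping: [R, G, B] = [1, 1, 1] - [C, M, Y]."""
    return 1.0 - np.asarray(cmy, dtype=float)

red = [1.0, 0.0, 0.0]
print(rgb_to_cmy(red))              # pure red needs no cyan, full magenta and yellow
print(cmy_to_rgb(rgb_to_cmy(red)))  # round trip returns the original RGB
```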
Mainly used for media that do not emit light themselves, under illumination ...
DIP_review_2_2
DIP_review_2_2 - Image reconstruction & Restoration
1. Noise
Noise can generally be described by a PDF (probability density function), so we can estimate its parameters
$\mu$, $\sigma^2$
1.1 Periodic Noise
sinusoidal noise (appears in the frequency domain as a pair of conjugate impulses)
$f(x,y) = A\sin(u_0 x + v_0 y)$
$F(u,v) = \frac{A\pi}{j}\left[\delta(u-u_0, v-v_0) - \delta(u+u_0, v+v_0)\right]$
cosine noise
$\cdots$
Well suited to frequency-domain filters such as the band-pass filter and the notch filter
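A sketch of the impulse pair and a notch filter on a synthetic image. Everything here is assumed for illustration: the 64x64 test image, the amplitude `A`, and the frequencies `u0`, `v0`, which are expressed in cycles per image (the discrete counterpart of the angular frequencies in the formula above).

```python
import numpy as np

M, N = 64, 64
y, x = np.mgrid[0:M, 0:N]          # y: row index, x: column index

# Hypothetical clean image: a bright square on a dark background.
clean = np.zeros((M, N))
clean[16:48, 16:48] = 1.0

# Sinusoidal noise with assumed frequencies u0, v0 (cycles per image).
A, u0, v0 = 0.5, 8, 4
noise = A * np.sin(2 * np.pi * (u0 * x / N + v0 * y / M))
noisy = clean + noise

# The noise spectrum is concentrated in two conjugate-symmetric bins.
F_noise = np.fft.fft2(noise)
top2 = np.argsort(np.abs(F_noise).ravel())[-2:]
peak_idx = sorted(
    tuple(int(i) for i in np.unravel_index(p, F_noise.shape)) for p in top2
)

# Notch filter: zero exactly those two bins of the noisy spectrum.
G = np.fft.fft2(noisy)
for r, c in peak_idx:
    G[r, c] = 0.0
restored = np.fft.ifft2(G).real

err_before = np.abs(noisy - clean).max()
err_after = np.abs(restored - clean).max()
print(peak_idx, err_before, err_after)
```

A real notch filter would usually suppress a small neighbourhood around each impulse rather than a single bin; the single-bin version works here only because the synthetic noise falls exactly on DFT bins.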
1.2 Random Noise
Gaussian noise
...
Deconvolution
Torch Function
torch.nn.ConvTranspose2d()
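Commonly used wherever upsampling is needed, i.e. wherever features have to be recovered or an image enlarged. It is called "deconvolution" (transposed convolution) because its way of upsampling is the "inverse process" of the convolution operation. The inversion is limited to restoring the spatial extent, however; it cannot recover how the original feature values were distributed within the convolution window.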
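The "shape is restored, values are not" point can be illustrated with a single-channel NumPy sketch. This is not PyTorch's implementation; `conv2d` and `conv_transpose2d` are illustrative helpers that assume no padding, no output padding, and no dilation, in which case the output size is `(H_in - 1) * stride + kernel_size`.

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid cross-correlation of a single-channel image with kernel k."""
    kh, kw = k.shape
    H = (x.shape[0] - kh) // stride + 1
    W = (x.shape[1] - kw) // stride + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)
    return out

def conv_transpose2d(y, k, stride=1):
    """Scatter each input value, weighted by the kernel, into the output."""
    kh, kw = k.shape
    H = (y.shape[0] - 1) * stride + kh
    W = (y.shape[1] - 1) * stride + kw
    out = np.zeros((H, W))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += y[i, j] * k
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0
y = conv2d(x, k, stride=2)               # 5x5 -> 2x2
x_up = conv_transpose2d(y, k, stride=2)  # 2x2 -> 5x5
print(y.shape, x_up.shape, np.allclose(x_up, x))
```

The upsampled array has the original 5x5 shape, but its values differ from `x`.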
Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
stride (int ...
CV_learning_notes(4)
DL Related Image Tasks
Image Classification
Challenges
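Semantic gap: a machine cannot directly relate the meaning of an image to its pixel values
Viewpoint variation: as the camera moves, the corresponding pixels change as well (a 360° panorama of a cat)
Background clutter: a messy background (a camouflaged cat)
Illumination: lighting (a cat in the shadows)
Occlusion: partly hidden objects (a cat in hiding)
Deformation: changes of shape (a cat in a strange pose)
Scale: size
Motion blur: blur from movement (the afterimage of a fast-moving cat)
Intraclass variation: diversity within a class (orange cats, tabby cats, black-and-white "cow" cats)
Image feature extraction: Bags of features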
Extract local features
Sample patches and extract descriptors, i.e. obtain the "keywords"
Learn “visual vocabulary”: run a clustering algorithm (K-means) on the keywords to obtain representative cluster centers, finally ...
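A rough sketch of the vocabulary step with a hand-written K-means. The random 16-dimensional descriptors stand in for real local descriptors, and the final histogram step is the usual continuation of bag-of-features, assumed here because the excerpt is cut off.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means; returns the k cluster centers (the visual words)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def bof_histogram(descriptors, centers):
    """Assign each descriptor to its nearest word and count occurrences."""
    dist = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
all_descriptors = rng.standard_normal((200, 16))   # hypothetical training descriptors
image_descriptors = rng.standard_normal((30, 16))  # hypothetical descriptors of one image

vocabulary = kmeans(all_descriptors, k=5)
hist = bof_histogram(image_descriptors, vocabulary)
print(vocabulary.shape, hist)
```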
CV_learning_notes(3)
DL Related Network Part
CNN
Neural Network (MLP, multilayer perceptron / fully connected network)
Goal: introducing an activation function yields a nonlinear classifier
Activation functions
Process: forward pass + backpropagation (gradient descent)
Convolutional Neural Networks
Overview
Convolution layer
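N: number of images in the batch
Cin: number of input channels (3 for an RGB image)
H, W: height and width of the image
Cout: number of output activation maps (output channels)
Kw, Kh: width and height of the filter
H', W': height and width of the output activation map after convolution
Output size without padding: $\frac{N-F}{stride} + 1$ (in these two formulas N is the input side length and F the filter size)
Output size with padding: $\frac{N+2P-F}{stride} + 1$ ...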
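The output-size formula as a small helper. Integer (floor) division is an assumption for the case where the stride does not divide evenly, and the function names are illustrative.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size: (n + 2*padding - f) / stride + 1, rounded down."""
    return (n + 2 * padding - f) // stride + 1

def conv_output_shape(batch, h, w, c_out, kh, kw, stride=1, padding=0):
    """Shape (N, Cout, H', W') of the activation maps."""
    return (
        batch,
        c_out,
        conv_output_size(h, kh, stride, padding),
        conv_output_size(w, kw, stride, padding),
    )

print(conv_output_size(32, 5))             # 28: no padding shrinks the map
print(conv_output_size(32, 5, padding=2))  # 32: padding 2 keeps the size for a 5x5 filter
print(conv_output_shape(16, 32, 32, c_out=6, kh=5, kw=5))  # (16, 6, 28, 28)
```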
CV_learning_notes(2)
DIP Related Part
Filter
Types of image
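Binary: each pixel takes only the value 0 or 1, i.e. it is either black or white; a binary image can be thought of as a black-and-white image
Gray scale: each pixel holds a single intensity value (0 to 255 for an 8-bit image), ranging from black through shades of gray to white
$Gray = 0.299R + 0.587G + 0.114B$
Color: each image is a combination of three gray-scale images giving the intensities of the red, green, and blue channels in turn, i.e. the familiar RGB; a color image is therefore treated as a three-dimensional array [height, width, 3]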
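The weighted sum can be applied to a [height, width, 3] array in one line. A sketch, assuming RGB channel order and float values in [0, 1]:

```python
import numpy as np

def rgb_to_gray(img):
    """Gray = 0.299 R + 0.587 G + 0.114 B for an array of shape [H, W, 3]."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.asarray(img, dtype=float) @ weights

# Hypothetical 1x2 image: one white pixel and one pure-green pixel.
img = np.array([[[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]])
gray = rgb_to_gray(img)
print(gray.shape, gray)
```

The three weights sum to 1, so white stays at full intensity.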
Filters
Linear
Box filter
Gaussian filter
How to choose kernel width
rule of thumb: set the half-width of the filter to about $3\sigma$
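A sketch of a Gaussian kernel sized by this rule of thumb. Rounding the half-width up with `ceil` is an assumption, and `gaussian_kernel` is an illustrative name.

```python
import numpy as np

def gaussian_kernel(sigma):
    """2-D Gaussian kernel with half-width ceil(3 * sigma), normalized to sum to 1."""
    half = int(np.ceil(3 * sigma))
    ax = np.arange(-half, half + 1)
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = np.outer(g, g)   # the 2-D Gaussian is separable
    return kernel / kernel.sum()

k1 = gaussian_kernel(1.0)
k2 = gaussian_kernel(1.5)
print(k1.shape, k2.shape)   # the size is always odd: 2 * half + 1
```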
Non-linear
Median filter
Kernel
The kernel size should be odd, ...
CV_learning_notes(1)
Basic Concepts in Computer Vision
Camera
Pinhole camera
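A good deal of information is lost in the projection
length (similar triangles)
angles (evident by inspection)
Straight lines are preserved, but vanishing points are introduced
Definition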
All parallel lines converge to a vanishing point
Each direction in space is associated with its own vanishing point
Exception: directions parallel to the image plane
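Properties
The projections of parallel 3D lines intersect at vanishing points
The projections of parallel 3D planes intersect at a vanishing line
But lines that intersect in the image are not necessarily parallel in 3D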
Perspective distortion
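Caused by the way the image plane cuts the projection rays, not by an optical illusion or a lens defect
Homogeneous coordinates
Make graphics and geometric computation in projective space possible
Homogeneous coordinates can represent points at infinity.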
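A sketch that ties vanishing points to homogeneous coordinates: the vanishing point of a 3D direction is the projection of its point at infinity (d, 0). The focal length, the direction `d`, and the two starting points are made-up values.

```python
import numpy as np

f = 1.0  # hypothetical focal length

# Pinhole projection acting on homogeneous 3D points (X, Y, Z, W).
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

def project(Xh):
    """Project a homogeneous 3D point and return inhomogeneous image coordinates."""
    x = P @ Xh
    return x[:2] / x[2]

d = np.array([1.0, 0.0, 2.0])              # shared direction of two parallel lines
starts = [np.array([0.0, -1.0, 1.0]),
          np.array([3.0, -1.0, 1.0])]      # two different starting points

# Point at infinity in direction d: homogeneous coordinates (d, 0).
vp = project(np.append(d, 0.0))

# Points very far along each line project close to the same image point.
far = [project(np.append(s + 1e6 * d, 1.0)) for s in starts]
print(vp, far)
```

A direction parallel to the image plane has zero Z component, so the division by `x[2]` fails; this matches the exception noted above.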
Convert homogeneous ...
DIP_review_2_1(2)
DIP_review_2_1 - Frequency Domain Filtering(2)
Discrete Convolution Theorem
Convolution theorem
$f(x,y) \star h(x,y) \Leftrightarrow F(u,v)H(u,v)$
$f(x,y)h(x,y) \Leftrightarrow \frac{1}{MN} F(u,v) \star H(u,v)$
Wraparound problem
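Under the DFT a discrete signal is treated as periodic (it is periodically extended), so the convolution obtained is a circular convolution rather than a linear convolution
Linear convolution
$f(n) \otimes h(n)$
Circular convolution
When $L < M+N-1$ ...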
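A 1-D sketch of the wraparound effect, with M and N taken as the lengths of the two sequences and L as the DFT length. The sequences are arbitrary examples.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])   # length M = 4
h = np.array([1.0, 1.0, 1.0])        # length N = 3

linear = np.convolve(f, h)           # linear convolution, length M + N - 1 = 6

def circular_conv(f, h, L):
    """Circular convolution of period L via the DFT (inputs zero-padded to L)."""
    F = np.fft.fft(f, n=L)
    H = np.fft.fft(h, n=L)
    return np.fft.ifft(F * H).real

wrapped = circular_conv(f, h, L=4)   # L < M + N - 1: the tail wraps onto the head
padded = circular_conv(f, h, L=6)    # L = M + N - 1: matches the linear result

print(linear)   # [1. 3. 6. 9. 7. 4.]
print(wrapped)  # approximately [8. 7. 6. 9.]
print(padded)   # approximately equal to linear
```

With L = 4 the last two linear samples (7 and 4) are added onto the first two (1 and 3), which is the wraparound error that zero padding avoids.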