The theory
In 1979, professor Nobuyuki Otsu proposed a statistical method for dividing an image histogram into N groups of similar gray values; you can read the paper here. Otsu's method is also based on Fisher's linear discriminant.
Let's say we have a set of multidimensional elements, for example 2D points, where a point is defined by its position in X and Y; or 3D points, where each point is defined by its (X, Y, Z) coordinates; or even 1D elements with just one component. The number of components doesn't matter as long as they are measurable. Now trace a division between the elements, wherever you want, and calculate the mean and deviation of the elements on each side. With this data, calculate the deviation between each group's mean and the total mean, and finally calculate the mean of the deviations of all groups.
Before continuing: in discriminant analysis, the groups of data are called classes, the mean distance between the class means and the total mean is called the between-class variance ($\sigma_B^2$), and the mean distance between the elements within each class is called the within-class variance ($\sigma_W^2$).
Fisher's linear discriminant establishes a relation between the between-class variance ($\sigma_B^2$) and the within-class variance ($\sigma_W^2$): the bigger the between-class variance and the smaller the within-class variance, the more likely it is that we are dealing with two (or more) different groups of data.
Summarizing in simple words: we want to find the division that leaves the elements on both sides far enough apart, and each side compact enough, to treat them as two completely different groups of elements.
If you want a more detailed explanation of how Linear Discriminant Analysis works, I recommend watching this video.
Creating the histogram
First of all, we need to create a histogram for the luminance channel. A histogram gives us the number of repetitions of each value in a data set; in this case, we count the occurrences of each luminance value.
```cpp
// Load the image
QImage image(":/otsu/lena.png");

// Convert it to gray scale
QImage img = image.convertToFormat(QImage::Format_Grayscale8);

QVector<int> histogram(256, 0);

for (int y = 0; y < img.height(); y++) {
    const quint8 *line = reinterpret_cast<const quint8 *>(img.constScanLine(y));

    for (int x = 0; x < img.width(); x++)
        histogram[line[x]]++;
}
```
The histogram for the following example image will be this:
The maths
Let's start with the simplest case, that is, 2 classes (groups) and 1 threshold. We will divide the histogram into two parts, and calculate the mean and the quadratic deviation of each side along the color axis.
$$ \begin{matrix} \mu_0=\sum_{i=0}^{k}\frac{p_i}{\omega_0}i & \sigma_0^2=\sum_{i=0}^{k}\frac{p_i}{\omega_0}(i-\mu_0)^2 & (left) \\ \mu_1=\sum_{i=k+1}^{L}\frac{p_i}{\omega_1}i & \sigma_1^2=\sum_{i=k+1}^{L}\frac{p_i}{\omega_1}(i-\mu_1)^2 & (right) \\ \end{matrix} $$
Where $p_i$ are the normalized weights for each color, $k$ is the selected threshold, and $L$ is the maximum gray level, 255 in this case. $\omega_0$ and $\omega_1$ are the summations of the weights on each side, that is:
$$ \begin{matrix} \omega_0=\sum_{i=0}^{k}p_i & (left) \\ \omega_1=\sum_{i=k+1}^{L}p_i & (right) \\ \end{matrix} $$
where:
$$\omega_0+\omega_1=1$$
and the total mean is:
$$\omega_0\mu_0+\omega_1\mu_1=\mu_T$$
Confused? You can visualize the effect of changing the threshold in the following graph:
Linear Discriminant Analysis
In the graph above, you will notice that moving the $k$ index changes $\mu_0$ and $\mu_1$: moving $k$ to the left brings $\mu_0$ closer to $k$ but moves $\mu_1$ away from it, while moving $k$ to the right brings $\mu_1$ closer to $k$ but moves $\mu_0$ away from it. We need to find a threshold for which $\mu_0$ and $\mu_1$ are as far as possible from $k$, while the difference between the colors inside each group stays minimal. For that reason I will make use of Fisher's linear discriminant, that is: $$ \lambda=\frac{\sigma_B^2}{\sigma_W^2} $$ $\sigma_B^2$ is the between-class variance and $\sigma_W^2$ is the within-class variance. We need to find a value of $k$ that maximizes $\lambda$, because a bigger value of $\lambda$ means it is more likely we are dealing with two clearly different groups (classes). So we have:
$$ \begin{matrix} \sigma_B^2=\omega_0(\mu_0-\mu_T)^2+\omega_1(\mu_1-\mu_T)^2 \\ \sigma_W^2=\omega_0\sigma_0^2+\omega_1\sigma_1^2 \end{matrix} $$ $\sigma_B^2$ is also: $$ \begin{matrix} \sigma_B^2= & \omega_0(\mu_0-\mu_T)^2+\omega_1(\mu_1-\mu_T)^2 \\ & \omega_0(\mu_0-\omega_0\mu_0-\omega_1\mu_1)^2+\omega_1(\mu_1-\omega_0\mu_0-\omega_1\mu_1)^2 \\ & \omega_0[\mu_0(1-\omega_0)-\omega_1\mu_1]^2+\omega_1[\mu_1(1-\omega_1)-\omega_0\mu_0]^2 \\ & \omega_0[\mu_0\omega_1-\omega_1\mu_1]^2+\omega_1[\mu_1\omega_0-\omega_0\mu_0]^2 \\ & \omega_0\omega_1^2(\mu_0-\mu_1)^2+\omega_0^2\omega_1(\mu_0-\mu_1)^2 \\ & \omega_0\omega_1(\mu_0-\mu_1)^2(\omega_0+\omega_1) \\ & \omega_0\omega_1(\mu_0-\mu_1)^2 \end{matrix} $$ and $\sigma_W^2$ is: $$ \begin{matrix} \sigma_W^2= & \omega_0\sigma_0^2+\omega_1\sigma_1^2 \\ & \omega_0\left(\sum_{i=0}^{k}\frac{p_i}{\omega_0}(i-\mu_0)^2\right)+\omega_1\left(\sum_{i=k+1}^{L}\frac{p_i}{\omega_1}(i-\mu_1)^2\right) \\ & \left(\sum_{i=0}^{k}p_i(i-\mu_0)^2\right)+\left(\sum_{i=k+1}^{L}p_i(i-\mu_1)^2\right) \\ & \left(\sum_{i=0}^{k}p_i(i^2+\mu_0^2-2i\mu_0)\right)+\left(\sum_{i=k+1}^{L}p_i(i^2+\mu_1^2-2i\mu_1)\right) \\ & \left(\sum_{i=0}^{k}p_ii^2+\sum_{i=0}^{k}p_i\mu_0^2-2\sum_{i=0}^{k}p_ii\mu_0\right)+\left(\sum_{i=k+1}^{L}p_ii^2+\sum_{i=k+1}^{L}p_i\mu_1^2-2\sum_{i=k+1}^{L}p_ii\mu_1\right) \\ & \left(\sum_{i=0}^{k}p_ii^2+\omega_0\mu_0^2-2\omega_0\mu_0^2\right)+\left(\sum_{i=k+1}^{L}p_ii^2+\omega_1\mu_1^2-2\omega_1\mu_1^2\right) \\ & \left(\sum_{i=0}^{k}p_ii^2-\omega_0\mu_0^2\right)+\left(\sum_{i=k+1}^{L}p_ii^2-\omega_1\mu_1^2\right) \\ & \sum_{i=0}^{L}p_ii^2-\omega_0\mu_0^2-\omega_1\mu_1^2 \end{matrix} $$ The total quadratic deviation is: $$ \begin{matrix} \sigma_T^2= & \sum_{i=0}^{L}p_i(i-\mu_T)^2 \\ & \sum_{i=0}^{L}p_i(i^2+\mu_T^2-2i\mu_T) \\ & \sum_{i=0}^{L}p_ii^2+\mu_T^2\sum_{i=0}^{L}p_i-2\mu_T\sum_{i=0}^{L}p_ii \\ & \sum_{i=0}^{L}p_ii^2+\mu_T^2-2\mu_T^2 \\ & \sum_{i=0}^{L}p_ii^2-\mu_T^2 \end{matrix} $$ So $\sigma_W^2$ can be written as: $$ \begin{matrix} \sigma_W^2= & \sigma_T^2+\mu_T^2-\omega_0\mu_0^2-\omega_1\mu_1^2 \\ & \sigma_T^2+(\omega_0\mu_0+\omega_1\mu_1)^2-\omega_0\mu_0^2-\omega_1\mu_1^2 \\ & \sigma_T^2+\omega_0^2\mu_0^2+\omega_1^2\mu_1^2+2\omega_0\omega_1\mu_0\mu_1-\omega_0\mu_0^2-\omega_1\mu_1^2 \\ & \sigma_T^2+\omega_0\mu_0^2(\omega_0-1)+\omega_1\mu_1^2(\omega_1-1)+2\omega_0\omega_1\mu_0\mu_1 \\ & \sigma_T^2-\omega_0\omega_1\mu_0^2-\omega_0\omega_1\mu_1^2+2\omega_0\omega_1\mu_0\mu_1 \\ & \sigma_T^2-\omega_0\omega_1(\mu_0-\mu_1)^2 \\ & \sigma_T^2-\sigma_B^2 \end{matrix} $$ At the end we find that: $$ \sigma_B^2+\sigma_W^2=\sigma_T^2 $$ and because $\sigma_T^2$ is a constant, $\sigma_B^2$ and $\sigma_W^2$ depend on each other, so we can rewrite $\lambda$ as: $$ \begin{matrix} \lambda=\frac{\sigma_B^2}{\sigma_T^2-\sigma_B^2} & or & \lambda=\frac{\sigma_T^2-\sigma_W^2}{\sigma_W^2} \end{matrix} $$ In both cases, it's possible to increase the value of $\lambda$ by increasing $\sigma_B^2$ or by decreasing $\sigma_W^2$, but $\sigma_B^2$ is preferred because $\sigma_W^2$ would require calculating $\mu_T$ first.
In summary, we must iterate $k$ from 1 to $L-1$, and the $k$ that gives the maximum value of $\sigma_B^2$ will be the Otsu threshold.
Generalization for multi level thresholding
In general, for multiple classes, we can calculate $\sigma_B^2$ as:
$$ \begin{matrix} \sigma_B^2= & \sum_{i=0}^{C}\omega_i(\mu_i-\mu_T)^2 \\ & \sum_{i=0}^{C}\omega_i(\mu_i^2+\mu_T^2-2\mu_i\mu_T) \\ & \sum_{i=0}^{C}\omega_i\mu_i^2+\mu_T^2\sum_{i=0}^{C}\omega_i-2\mu_T\sum_{i=0}^{C}\omega_i\mu_i \\ & \sum_{i=0}^{C}\omega_i\mu_i^2+\mu_T^2-2\mu_T^2 \\ & \sum_{i=0}^{C}\omega_i\mu_i^2-\mu_T^2 \end{matrix} $$ where $C$ is the number of classes minus one, and because $\mu_T^2$ is a constant, we just need to find the maximum value of: $$ h=\sum_{i=0}^{C}\omega_i\mu_i^2 $$
Algorithm speedup
In 2001, prof. Ping-Sung Liao, Tse-Sheng Chen and Pau-Choo Chung proposed a method for speeding up the computation of Otsu's thresholds using lookup tables. You can read the paper here.
Suppose we split the histogram into N groups (classes), so we will have N - 1 thresholds; the interval for each group will be $[k_i, k_{i+1}]$, where $k_0=0$ and $k_N=255$, so $[k_1, k_{N-1}]$ are the thresholds we are looking for. The scan ranges for each $k_i$ will then be: $$ \begin{matrix} k_1 \in [1, L-C+1] \\ k_2 \in [2, L-C+2] \\ k_3 \in [3, L-C+3] \\ \vdots \\ k_{C-1} \in [C-1, L-1] \end{matrix} $$ where $L$ is the maximum gray level. We can iterate over the thresholds using the following code:
```cpp
void for_loop(int u, int vmax, int level, QVector<int> *index)
{
    int classes = index->size() - 1;

    for (int i = u; i < vmax; i++) {
        (*index)[level] = i;

        if (level + 1 >= classes) {
            // Reached the end of the for loop.
            // Calculate maximum between-class variance
            // and return current values of 'index' as solution.
        } else {
            // Start a new for loop level, one position after current one.
            for_loop(i + 1, vmax + 1, level + 1, index);
        }
    }
}
```

We call the code above as:
```cpp
int classes = 5;
int levels = 256;

QVector<int> index(classes + 1);
index[index.size() - 1] = levels - 1;
index[0] = 0;

for_loop(1, levels - classes + 1, 1, &index);
```

Let's return to the formula: $$ h=\sum_{i=0}^{C}\omega_i\mu_i^2 $$ $\mu_i$ can be written as: $$ \mu_i=\frac{1}{\omega_i}\sum_{j=u_i}^{v_i}p_jj $$ Both $\omega_i$ and $\mu_i$ can be calculated using cumulative sum tables, a trick we used before. So we can rewrite $h$ as: $$ h=\sum_{i=0}^{C}\frac{(S_{v_i}-S_{u_i})^2}{P_{v_i}-P_{u_i}} $$ where: $$ \begin{matrix} S_v=\sum_{i=0}^{v}p_ii \\ P_v=\sum_{i=0}^{v}p_i \end{matrix} $$ The $H$ table is then as follows:
| u\v | 0 | 1 | 2 | $\cdots$ | $L$ |
|---|---|---|---|---|---|
| 0 | 0 | $\frac{(S_1-S_0)^2}{P_1-P_0}$ | $\frac{(S_2-S_0)^2}{P_2-P_0}$ | $\cdots$ | $\frac{(S_L-S_0)^2}{P_L-P_0}$ |
| 1 | 0 | 0 | $\frac{(S_2-S_1)^2}{P_2-P_1}$ | $\cdots$ | $\frac{(S_L-S_1)^2}{P_L-P_1}$ |
| 2 | 0 | 0 | 0 | $\cdots$ | $\frac{(S_L-S_2)^2}{P_L-P_2}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ |
| $L$ | 0 | 0 | 0 | 0 | 0 |
And this is the code for generating the table:
```cpp
// Build the lookup tables from the histogram.
inline QVector<qreal> buildTables(const QVector<int> &histogram)
{
    // Create cumulative sum tables.
    QVector<quint64> P(histogram.size() + 1);
    QVector<quint64> S(histogram.size() + 1);
    P[0] = 0;
    S[0] = 0;
    quint64 sumP = 0;
    quint64 sumS = 0;

    for (int i = 0; i < histogram.size(); i++) {
        sumP += quint64(histogram[i]);
        sumS += quint64(i * histogram[i]);
        P[i + 1] = sumP;
        S[i + 1] = sumS;
    }

    // Calculate the between-class variance for the interval u-v.
    QVector<qreal> H(histogram.size() * histogram.size(), 0.);

    for (int u = 0; u < histogram.size(); u++) {
        qreal *hLine = H.data() + u * histogram.size();

        for (int v = u + 1; v < histogram.size(); v++)
            hLine[v] = qPow(S[v] - S[u], 2) / (P[v] - P[u]);
    }

    return H;
}
```

At this point, I think I've explained almost all the key parts of the algorithm, so from here I'm pretty sure you will be able to understand the rest by looking at the code below.
Online Otsu threshold test
But before finishing this article, I've written the following code so you can test the Otsu algorithm with images from your computer and see how it reacts to different histograms.


And here is the code for C++/Qt and JavaScript, enjoy :).