Image Segmentaion is to divide the image into disjoint homogeneous regions or classes, where all the pixels in the same class must have some common characterestics. With recent growth in image data in various fields like Medical Imaging- MRI images, fMRI images, X-Ray images, PET Scans, CT Scans, Ultrasound; Facial Recognition; Handwritten Notes Scans, etc it becomes important to devise a method to segment parts of image to study specific ROI(Region of Interests) in MRI, etc.
A Grayscale(Color) image, afterall, is nothing but a(a set of) matrix(matrices) i.e. single channel(3 channels -RGB) of intensity values(0 to 1, both inclusive). So, there are parts of images with higher intensity and parts with lower intensity, parts of images which share common intensity pattern. So our goal is to exploit this feature of image to segment or classify the pixels.
Image is thought of as a random process, particularly, a mixture of random processes.
Where K is the number of processes or classes that need to be extracted from the image, \(\theta_k\ \forall\ k=1,2,\dots, K\) is a parameter vector of class and its form \([\mu_k,\ \sigma_k]\) such that \(\mu_k,\ \sigma_k\) are mean and standard deviation of the distribution \(k\) with density \(G_k(x|\theta_k)\), respectively, \(p_k\) is mixing proportion of class \(k\) (\(0<p_k<1,\ \forall\ k=1,\dots,K\) and \(\sum_kp_k=1\)), \(x_i\) is the intensity of pixel \(i\), and \(\Phi = \{p_1,\dots, p_k, \mu_1,\dots,\mu_k \}\) is called the parameters vector of the mixture.
The parameter vector \(\Phi\) is the missing data that defines the mixture. The EM Algorithm utilize the maximum log-likelihood estimator to estimate the value of \(\Phi\). The likelihood is given by, \[L(\Phi) = f(x_1,\dots,x_n|\Phi) = \prod_{i=1}^n f(x_i|\Phi)\] We use the EM-Algorithm to solve for \(\Phi_{ML}\), which maximizes the likelihood (otherwise computationally expensive(sometimes intractable)) i.e., \[L(\Phi_{ML}) = arg\ max\ (L(\Phi))\] Hence, \(\Phi_{ML}\) is called the maximum log-likelihood estimato of \(\Phi\).
Once the true value of the parameters are found then the class of each pixel in the image can be easily determined by computing the probability of this pixel to be belong to each class in the image. The EM algorithm is used to find the unknown parameters of a mixture model, that maximizes the log-likelihood of the sample.
The Expectation(E) Step: Compute the expected value of \(z_{ik}\) use the current estimate of the parameter vector \(\Phi\), \[z_{ik}^{(t)} = \frac{p_k^{(t)}.G(x_i|\theta_k^{(t)})}{f(x_i|\Phi^{(t)})}\] Where \(z_{ik}\) is the posteriori probability that given \(x_i,\ x_i\) comes from class k. The \(N\times K\) matrix \(Z=[z_{ik}]\) of posteriori probability satisfies the constraints, (\(0\leq z_{ik}\leq 1, \sum_k z_{ik} = 1, \sum_i z_{ik}>0, 1\leq i\leq N, 1\leq k\leq K\)). \(x_i\) is the value of the pixel \(i\). \(G(x_i|\theta_k^{(t)})\) is the probability of pixel i given it is a member of class k.
The Maximization(M) Step: Using the Expectation Step Data as if it were actually measured data, \[\mu_k^{(t+1)} = \frac{\sum_{i=1}^N z_{ik}^{(t)}.x_i}{\sum_{i=1}^N z_{ik}^{(t)}}\] \[{\sigma^2_{k}}^{(t+1)} = \frac{\sum_{i=1}^N z_{ik}^{(t)}.(x_i-\mu^{(t+1)})^2}{\sum_{i=1}^{N} z_{ik}^{(t)}}\] \[p_k^{(t+1)} = \frac{\sum_{i=1}^N z_{ik}^{(t)}}{N}\] We keep on iterating the above two steps until the parameter estimates change in an iteration falls below some tolerance level or when NaN comes in the optimization after optimizing implying no further numerical optimization can be done.
The Classification Step: After we have applied EM-Algorithm and we have the parameter estimate, we use it to find the probability(density) at each point belonging to each of the \(K\) class. Then, we classify the point to be belonging to the class which attains the maximum probability(density) for that point out of the \(K\) classes.
Here we will apply the above described algorithm on some Brain MRI Images, etc. - We have the following Brain MRI Image and taking a look at it we can visually see 3 different classes is the most probable i.e. the background, the CSF and the gray matter.
Now, coming to the most interesting part that is seeing the result of the segmentation. The Segmentation works almost perfectly and separates the required 3 parts without any manual intervention apart from deciding on the number of clusters, tolerance and iteration limit.
Let’s Apply our Segmentation function on some other images, to see how well it works. One possible application of this technique can be to scan handwritten notes, sketches, etc. and from below it looks like a perfectly scanned note(We obv. used 2 clusters).
We used EM-Algorithm to Segment Image into sections which have common characteristics. This has the following advantages:
Drawbacks: