 # Introduction to Image Processing

### Applications of Image Processing

In this lesson we are going to give a brief overview of some applications of image processing.

Image processing is a very popular tool across different fields.

In medicine, it is used in diagnostic machines for MRI, PET scans and X-ray imaging.

In the military, it is used for tasks like object detection for controlling UAVs.

In consumer electronics, we use image processing for things like rendering and editing.

In security applications, image processing is again used for things like object detection, as well as in biometric techniques like fingerprint processing, hand recognition, facial recognition, etcetera.

## What is an image

We can think of an image as a discrete representation of a type of data which possesses both spatial and intensity information.

Based on this simple definition, we can say that a digital image is a representation of a two-dimensional image using a finite number of points, usually called pixels.

The two-dimensional discrete, digital image I(m,n) represents the response of some sensor (or simply a value of some interest) at a series of fixed positions (m = 1, 2, …, M; n = 1, 2, …, N) in 2-D Cartesian coordinates.

The indices m and n designate the rows and columns of the image, respectively. The individual picture elements, or pixels, of the image are therefore referred to by their 2-D (m,n) index. I(m,n) denotes the response of the pixel located at row m and column n, starting from a top-left image origin, as we can see here.
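To make the indexing concrete, here is a small sketch using a made-up 3 × 4 image stored as a Python list of rows. Python counts from 0 whereas the text counts m and n from 1, so the helper `pixel` (a hypothetical name, not from the course) converts between the two conventions.

```python
# A tiny 3 x 4 grey-scale image stored as a list of rows (values are made up).
image = [
    [10, 20, 30, 40],     # row m = 1 (top of the image)
    [50, 60, 70, 80],     # row m = 2
    [90, 100, 110, 120],  # row m = 3 (bottom of the image)
]

M = len(image)     # number of rows
N = len(image[0])  # number of columns


def pixel(img, m, n):
    """Return I(m, n) using the 1-based (m, n) convention from the text.

    Python lists are 0-based, so 1 is subtracted from each index.
    The origin (m = 1, n = 1) is the top-left pixel.
    """
    return img[m - 1][n - 1]


print(M, N)                # 3 4
print(pixel(image, 1, 1))  # top-left pixel: 10
print(pixel(image, 2, 3))  # row 2, column 3: 70
```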

Although the images we consider in the course will be discrete, it is often theoretically convenient to treat an image as a continuous spatial signal.

In the next lesson we shall talk about the scope of image processing. I will see you there.

## Image Color

The simplest type of image data is black and white. It is a binary image since each pixel is either 0 or 1.

The next, more complex type of image data is gray scale, where each pixel takes on a value between zero and the number of gray levels minus one (for example, 0 to 255 for 256 levels). These images appear like common black-and-white photographs: they are black, white, and shades of gray. Gray scale images often have 256 shades of gray. Humans can distinguish about 40 shades of gray.

The most complex type of image is the color image. Color images are similar to gray scale except that there are three bands, or channels, corresponding to the colors red, green, and blue.

Hence each pixel has three values associated with it.

Image color can also be represented in the HSV format, which describes the hue, the saturation and the value (brightness, or intensity) of each pixel.
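As a quick illustration, Python's standard `colorsys` module can convert a single RGB pixel to HSV. It works on floats in [0, 1] and returns hue as a fraction of a full turn (0.0 is red, about 0.333 is green). The 8-bit input values and the helper name `rgb8_to_hsv` are assumptions for this sketch.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to (hue, saturation, value), each in [0, 1].

    colorsys expects floats in [0, 1], so each channel is divided by 255.
    """
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


print(rgb8_to_hsv(255, 0, 0))      # pure red: (0.0, 1.0, 1.0)
print(rgb8_to_hsv(0, 255, 0))      # pure green: hue is about 1/3
print(rgb8_to_hsv(128, 128, 128))  # grey: zero saturation, hue 0.0
```

Note that a grey pixel has zero saturation, so its hue carries no information.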

The size of the 2-D pixel grid together with the data size stored for each individual image pixel determines the spatial resolution and colour quantization of the image.

The representational power (or size) of an image is defined by its resolution. The resolution of an image source (e.g. a camera) can be specified in terms of three quantities: spatial resolution, temporal resolution and bit resolution.

Spatial resolution: the column (C) by row (R) dimensions of the image define the number of pixels used to cover the visual space captured by the image. This relates to the sampling of the image signal and is sometimes referred to as the pixel or digital resolution of the image. It is commonly quoted as C × R, so we end up with resolutions like 640 × 480, 800 × 600, 1024 × 768, etc.

For a continuous capture system such as video, the temporal resolution is the number of images captured in a given time period. It is commonly quoted in frames per second (fps), where each individual image is referred to as a video frame (for example commonly broadcast TV operates at 25 fps; 25–30 fps is suitable for most visual surveillance; higher frame-rate cameras are available for specialist science/engineering capture).

Bit resolution defines the number of possible intensity or colour values that a pixel may have and relates to the quantization of the image information. For instance, a binary image has just two colours (black or white), a grey-scale image commonly has 256 different grey levels ranging from black to white, while for a colour image it depends on the colour range in use. The bit resolution is commonly quoted as the number of binary bits required for storage at a given quantization level, e.g. binary is 1 bit, grey-scale is 8 bit and colour (most commonly) is 24 bit. The range of values a pixel may take is often referred to as the dynamic range of an image.
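The relationship between bit resolution, the number of possible values and raw storage can be checked with a little arithmetic. This sketch assumes uncompressed pixels packed with no per-row padding and no file header, so real file sizes will differ.

```python
def grey_levels(bits):
    """Number of distinct values a pixel can take with the given bit resolution."""
    return 2 ** bits


def raw_size_bytes(columns, rows, bits_per_pixel):
    """Uncompressed storage for a C x R image (no header, no row padding)."""
    return columns * rows * bits_per_pixel // 8


print(grey_levels(1))                # binary: 2
print(grey_levels(8))                # grey-scale: 256
print(grey_levels(24))               # 24-bit colour: 16777216
print(raw_size_bytes(640, 480, 24))  # 921600 bytes for a 640 x 480 colour image
```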

## Image Formats

GIF stands for Graphics Interchange Format; it is limited to 256 colors, meaning 8 bits per pixel.

BMP stands for bitmap (the Windows bitmap format); it is one of the basic formats in use.

PNG stands for Portable Network Graphics. It was designed to replace GIF images.

TIFF stands for Tagged Image File Format. TIFFs are generally highly flexible and adaptable.

Now that we have seen the various image formats, in the next lesson we shall look at the image data types these formats come in. I will see you in the next lesson.

## Image Data Types

So, the choice of image format is largely determined not just by the image contents, but also by the actual image data type required for storage. We can classify images into four data types: the binary image, the intensity or greyscale image, the RGB or true-colour image and the floating-point image.

Binary images are 2-D arrays that assign one of two numerical values, 0 or 1, to each pixel in the image. These are sometimes referred to as logical images: black corresponds to zero (an ‘off’ or ‘background’ pixel) and white corresponds to one (an ‘on’ or ‘foreground’ pixel). As no other values are permissible, these images can be represented as a simple bit-stream, but in practice they are represented as 8-bit integer images in the common image formats. A fax image is an example of a binary image.
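To see why a binary image can be a simple bit-stream, this sketch packs one row of 0/1 pixels into bytes, eight pixels per byte, most significant bit first. The helper name `pack_row` and the zero-padding of a final partial byte are assumptions for illustration, not a specific file format's layout.

```python
def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, 8 pixels per byte, MSB first.

    A final incomplete byte is padded with zero bits on the right.
    """
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        value = 0
        for b in chunk:
            value = (value << 1) | (b & 1)  # append one pixel as one bit
        value <<= 8 - len(chunk)            # pad an incomplete final byte
        packed.append(value)
    return bytes(packed)


row = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]  # 10 pixels
print(list(pack_row(row)))  # [176, 192]: 2 bytes instead of 10 at 8 bits per pixel
```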

Intensity or grey-scale images are 2-D arrays that assign one numerical value to each pixel which is representative of the intensity at this point. As discussed previously, the pixel value range is bounded by the bit resolution of the image and such images are stored as N-bit integer images with a given format.

RGB or true-colour images are 3-D arrays that assign three numerical values to each pixel, each value corresponding to the red, green and blue (RGB) image channel component respectively.

Floating-point images differ from the other image types we have discussed. By definition, they do not store integer colour values. Instead, they store a floating-point number which, within a given range defined by the floating-point precision of the image bit resolution, represents the intensity. They may (commonly) represent a measurement value other than simple intensity or colour as part of a scientific or medical image. Floating-point images are commonly stored in the TIFF image format.
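A common convention (an assumption here, not the only one) is to map 8-bit grey levels 0..255 onto floats in [0.0, 1.0] for processing and to clip and round back when saving as an integer image. A minimal sketch with hypothetical helper names:

```python
def to_float(img8):
    """Map 8-bit integer grey levels 0..255 to floats in [0.0, 1.0]."""
    return [[p / 255.0 for p in row] for row in img8]


def to_uint8(imgf):
    """Map floats back to 8-bit integers, clipping to [0, 1] and rounding."""
    return [[round(min(max(p, 0.0), 1.0) * 255) for p in row] for row in imgf]


img8 = [[0, 128, 255]]
f = to_float(img8)
print(to_uint8(f))                # [[0, 128, 255]]: the round trip is lossless
print(to_uint8([[-0.2, 1.5]]))    # [[0, 255]]: out-of-range values are clipped
```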

## Scope of image processing

So there are three levels of image processing operations: low-level, mid-level and high-level.

Low-level operations are primitive operations where both the input and the output are images; these include operations like noise reduction, contrast enhancement, etcetera.

Mid-level operations deal with the extraction of attributes from images; these include things like edge extraction, contour extraction, region extraction, etcetera.

High-level operations are mainly about the analysis and interpretation of the content of an image scene.

In the next lesson we shall take a quick look at some of the commonly used image processing operations. I will see you there.

## Example of some image processing operations

Image processing has a wide range of techniques and algorithms, which we shall try out one by one in this course. For now, let’s take a look at some of the results these techniques produce.

Over here we have two images. On the left we have the original image, and on the right is what the image looks like when we sharpen it.

Sharpening is a technique by which the edges and fine details of an image are enhanced for human viewing. We shall see how to do this in both the spatial domain and the frequency domain.
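A minimal sketch of spatial-domain sharpening, assuming an 8-bit grey-scale image stored as a list of rows. It uses one common 3 × 3 Laplacian-based kernel (centre weight 5, four neighbours -1); the course may use a different kernel later. Border pixels are simply copied unchanged, which is one of several possible border choices.

```python
# Laplacian-based sharpening kernel: original image plus the negated Laplacian.
KERNEL = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]


def sharpen(img):
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += KERNEL[i][j] * img[m + i - 1][n + j - 1]
            out[m][n] = min(max(acc, 0), 255)  # clip to the 8-bit range
    return out


step = [[10, 10, 200, 200],
        [10, 10, 200, 200],
        [10, 10, 200, 200]]
print(sharpen(step)[1])  # [10, 0, 255, 200]: both sides of the edge are pushed apart
```

Because the kernel is symmetric, applying it as a correlation (as here) gives the same result as a convolution. In a flat region the kernel weights sum to 1, so the pixel value is unchanged.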

Image processing filters can be used to reduce the amount of noise in an image before processing it any further. Depending on the type of noise, different noise removal techniques are used. We shall see how to do this in the noise removal section.
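As one example, a 3 × 3 median filter is a common choice for impulse ("salt-and-pepper") noise. The text does not say which noise-removal technique it will use, so treat this as an illustrative sketch: each interior pixel is replaced by the median of its 3 × 3 neighbourhood, and border pixels are left unchanged.

```python
def median_filter3(img):
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            window = [img[m + i][n + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
            out[m][n] = sorted(window)[4]  # middle of the 9 sorted values
    return out


noisy = [[50, 50, 50],
         [50, 255, 50],  # a single "salt" pixel
         [50, 50, 50]]
print(median_filter3(noisy)[1][1])  # 50: the isolated noise pixel is removed
```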

An image may appear blurred for many reasons, ranging from improper focusing of the lens to an insufficient shutter speed for a fast-moving object. We shall see how to design de-blurring algorithms to solve this issue.

It is sometimes necessary to blur an image in order to minimize the importance of texture and fine detail in a scene, for instance, in cases where objects can be better recognized by their shape. We shall see how to do this as well.
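One simple way to blur is a 3 × 3 box (mean) filter, which replaces each interior pixel by the average of its neighbourhood. This is a sketch of one technique among several (a Gaussian blur is another common choice); border pixels are left unchanged here.

```python
def box_blur3(img):
    rows, cols = len(img), len(img[0])
    out = [list(map(float, row)) for row in img]  # border pixels copied unchanged
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            total = sum(img[m + i][n + j] for i in (-1, 0, 1) for j in (-1, 0, 1))
            out[m][n] = total / 9
    return out


dot = [[0, 0, 0],
       [0, 90, 0],
       [0, 0, 0]]
print(box_blur3(dot)[1][1])  # 10.0: fine detail is spread out and attenuated
```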

Extracting edges from an image is a fundamental preprocessing step used to separate objects from one another before identifying their contents. On the right here we can see what an image looks like after edge extraction.
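A sketch of edge extraction using the Sobel operator, one standard choice (the lesson does not name a specific detector, so this is an assumption). The edge strength at each interior pixel is approximated as |gx| + |gy|; border pixels are set to 0.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_edges(img):
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]  # border pixels left at 0
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[m + i - 1][n + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            # Applied as a correlation; flipping the kernels only changes
            # the sign of gx and gy, which abs() discards.
            out[m][n] = abs(gx) + abs(gy)
    return out


step = [[0, 0, 0, 100, 100],
        [0, 0, 0, 100, 100],
        [0, 0, 0, 100, 100]]
print(sobel_edges(step)[1])  # [0, 0, 400, 400, 0]: strong response only at the edge
```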

In many image analysis applications, it is often necessary to reduce the number of gray levels in a monochrome image to simplify and speed up its interpretation. Reducing a grayscale image to only two levels of gray which are black and white is usually referred to as binarization. On the right hand side here we see what the image on the left will look like after binarization.
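Binarization with a single global threshold can be sketched in one line. The threshold of 128 is an arbitrary assumption; in practice it is often chosen automatically (for example with Otsu's method).

```python
def binarize(img, threshold=128):
    """Map each grey level to 1 (white) if it is >= threshold, else 0 (black)."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]


print(binarize([[12, 200], [127, 128]]))  # [[0, 1], [0, 1]]
```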

In order to improve an image for human viewing as well as make other image processing tasks such as edge extraction easier, it is often necessary to enhance the contrast of an image.

Over here we have two images. On the left we have the original image; on the right we see what it looks like after contrast enhancement.
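One simple contrast-enhancement technique is linear contrast stretching: the darkest pixel is mapped to 0, the brightest to 255, and everything in between is scaled linearly. This is a sketch of that one method (histogram equalization is a common alternative), assuming an 8-bit grey-scale image.

```python
def stretch_contrast(img, out_min=0, out_max=255):
    """Linearly map the image's [min, max] range onto [out_min, out_max]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # a flat image has no contrast to stretch
        return [row[:] for row in img]
    return [[round(out_min + (p - lo) * (out_max - out_min) / (hi - lo)) for p in row]
            for row in img]


low_contrast = [[100, 120],
                [140, 150]]
print(stretch_contrast(low_contrast))  # [[0, 102], [204, 255]]
```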

The task of segmenting and labeling objects within a scene is a prerequisite for object recognition and classification systems. Once the relevant objects have been segmented and labeled, their relevant features can be extracted and used to classify, compare, cluster, or recognize the objects in question. We shall talk about this later, in its own section.

## Overview of a machine vision system

Nowadays it is almost impossible to talk about image processing without mentioning machine vision. In this lesson we shall give an overview of a machine vision system.

This arrangement over here shows the various processes contained in a machine vision system.

Let’s say this is a system for facial recognition: the system recognizes the face and then, say, the door to a high-security office opens.

The problem domain, in this case, is facial recognition. The goal is to be able to extract various features of the face, such as face size, eye size and color, nose and mouth parameters, etcetera.

The acquisition block is in charge of acquiring one or more images containing a human face.

This can be implemented using a camera and by controlling the lighting conditions to ensure that the image will be suitable for further processing. The output of this block is a digital image that contains a view of the face.

The goal of the preprocessing stage is to improve the quality of the acquired image. Possible algorithms to be employed during this stage include contrast improvement, brightness correction, and noise removal.

The segmentation block is responsible for partitioning an image into its main components: relevant foreground objects and background. It produces at its output a number of labeled regions or “sub-images.” In this particular case, segmentation may be performed at two levels: the first stage involves extracting the face from the rest of the original image, and the second stage deals with segmenting facial features within the face area.

The feature extraction block is also known as representation and description block and consists of algorithms responsible for encoding the image contents in a concise and descriptive way. Typical features include measures of color (or intensity) distribution, texture, and shape of the most relevant (previously segmented) objects within the image. These features are usually grouped into a feature vector that can then be used as a numerical indicator of the image contents for the subsequent stage, where such contents will be recognized or classified.
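As a toy illustration of packing measurements into a feature vector, this sketch computes three simple intensity features of a grey-scale region. The particular features, the threshold and the helper name `feature_vector` are hypothetical choices; a real face recognizer would use far more descriptive features.

```python
from statistics import mean, pstdev


def feature_vector(img, threshold=128):
    """Return a hypothetical 3-element feature vector for a grey-scale region:
    [mean intensity, intensity standard deviation, fraction of bright pixels]."""
    pixels = [p for row in img for p in row]
    bright = sum(1 for p in pixels if p >= threshold)
    return [mean(pixels), pstdev(pixels), bright / len(pixels)]


print(feature_vector([[0, 255], [255, 0]]))  # [127.5, 127.5, 0.5]
```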

Once the most relevant features of the image (or its relevant objects, in this case facial features) have been extracted and encoded into a feature vector, the next step is to use this K-dimensional numerical representation as an input to the pattern classification (also known as recognition and interpretation) stage.

At this point, image processing meets classical pattern recognition and benefits from many of its tried-and-true techniques, such as minimum distance classifiers, probabilistic classifiers, neural networks, and many more. The ultimate goal of this block is to classify (meaning, assign a label to) each individual facial feature. I hope this example shows you the relevance of image processing in computer vision applications.
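The minimum distance classifier mentioned above can be sketched in a few lines: compute the mean feature vector of each class from labelled examples, then assign a new vector to the class whose mean is closest in Euclidean distance. The labels and 2-D feature values below are made up for illustration.

```python
import math


def class_means(samples):
    """samples: dict mapping label -> list of feature vectors (all the same length)."""
    means = {}
    for label, vectors in samples.items():
        k = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]
    return means


def classify(x, means):
    """Return the label whose class mean is nearest to feature vector x."""
    return min(means, key=lambda label: math.dist(x, means[label]))


# Hypothetical training data: two people, two features each.
training = {
    "person_a": [[1.0, 2.0], [1.2, 1.8]],
    "person_b": [[4.0, 4.0], [3.8, 4.2]],
}
means = class_means(training)
print(classify([1.0, 2.1], means))  # person_a
print(classify([4.1, 3.9], means))  # person_b
```

`math.dist` requires Python 3.8 or later.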

## Image Formation

Let’s begin by understanding how images are formed.

The image formation process can be summarized as a small number of key elements. In general, a digital image s can be formalized as a mathematical model comprising a functional representation of the scene (known as the object function, denoted by o) and that of the capture process (known as the point-spread function, or PSF, denoted by p). Additionally, the image will contain additive noise, denoted by n.
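A compact way to write this model, using the symbols s, o, p and n introduced above and with ∗ denoting convolution, is:

$$s(x,y) = p(x,y) \ast o(x,y) + n(x,y)$$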

In summary, in the process of image formation we have several key elements:

- The PSF describes the way information on the object function is spread as a result of recording the data. It is a characteristic of the imaging instrument (like the camera) and is a deterministic function that operates in the presence of noise.
- The object function describes the object (or scene) that is being imaged (its surface or internal structure, for example) and the way light is reflected from that structure to the imaging instrument.
- Noise is a nondeterministic function which can, at best, only be described in terms of some statistical noise distribution (like Gaussian). Noise is a consequence of all the unwanted external disturbances that occur during the recording of the image data.
- Convolution is a mathematical operation which ‘smears’ (i.e. convolves) one function with another. We shall see how this is done later.
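The elements above can be put together in a small 1-D simulation: convolve an object function with a PSF, then add noise. The 3-tap PSF, the choice of Gaussian noise and the helper names are assumptions for this sketch; a fixed random seed makes the noisy output repeatable.

```python
import random


def convolve1d(signal, kernel):
    """Full discrete convolution: out[k] = sum over j of signal[j] * kernel[k - j]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for j, s in enumerate(signal):
        for i, h in enumerate(kernel):
            out[j + i] += s * h
    return out


def form_image(obj, psf, noise_sigma=0.0, seed=0):
    """s = p * o + n, with n drawn from a zero-mean Gaussian (an assumption)."""
    rng = random.Random(seed)
    blurred = convolve1d(obj, psf)
    return [v + (rng.gauss(0.0, noise_sigma) if noise_sigma > 0 else 0.0) for v in blurred]


obj = [0, 0, 1, 0, 0]     # a single bright point in the scene
psf = [0.25, 0.5, 0.25]   # hypothetical PSF of the instrument
print(form_image(obj, psf))                   # [0.0, 0.0, 0.25, 0.5, 0.25, 0.0, 0.0]
print(form_image(obj, psf, noise_sigma=0.05)) # the same shape, perturbed by noise
```

Notice that, with no noise, imaging a single point simply reproduces the PSF; this idea is developed formally in the later lessons.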

Now let’s take a look at the image formation system and the effect of the Point-Spread-Function.

Here, the function of the light reflected from the object or scene is what is known as the object function and is transformed into the image data representation by convolution with the point-spread-function. This function characterizes the image formation or capture process. The process is affected by noise.

The PSF is a characteristic of the imaging instrument (like the camera). It represents the response of the system to a point source in the object plane, as we can see in this arrangement over here. We can also consider an imaging system as a function mapping an input distribution to an output distribution, consisting of both the point-spread function itself and additive noise.

The input distribution is the scene light and the output distribution is the image pixels.

In the next lesson we shall see the formation of images from a mathematical perspective.

## Understanding the mathematics of image formation

In a general mathematical sense, we may view image formation as a process which transforms an input distribution into an output distribution.

Therefore, a simple lens may be viewed as a ‘system’ that transforms a spatial distribution of light in one domain (the object plane) to a distribution in another (the image plane).

Similarly, a medical ultrasound imaging system transforms a set of spatially distributed acoustic reflection values into a corresponding set of intensity signals which are visually displayed as a grey-scale intensity image.

Whatever the specific physical nature of the input and output distributions, the concept of a mapping between an input distribution and an output distribution is valid.

By input distribution I simply mean the thing you want to investigate, see or visualize, and by output distribution I mean what the system produces.

The systems theory approach to imaging is a simple and convenient way of conceptualizing the imaging process. Any imaging device is a system, or a ‘black box’, whose properties are defined by the way in which an input distribution is mapped to an output distribution.

The process by which an imaging system transforms the input into an output can be viewed from an alternative perspective, namely that of the Fourier or frequency domain. From this perspective, images consist of a superposition of harmonic functions of different frequencies. Imaging systems then act upon the spatial frequency content of the input to produce an output with a modified spatial frequency content. Frequency-domain methods are powerful and important in image processing, and we shall talk about such methods in the frequency domain section of the course. First, however, we are going to spend some time understanding the basic mathematics of image formation.

Linear systems and operations are extremely important in image processing because the majority of real-world imaging systems may be well approximated as linear systems. Nonlinear theory is still much less developed and understood, and deviations from strict linearity are often best handled in practice by approximation techniques which exploit the better understood linear methods.

An imaging system described by operator S is linear if for any two input distributions X and Y and any two scalars a and b we have:

$$S(aX + bY) = a\,S(X) + b\,S(Y)$$

In other words, applying the linear operator to a weighted sum of two inputs yields the same result as first applying the operator to the inputs independently and then combining the weighted outputs. To make this concept concrete, let’s consider the two simple inputs in this image: A, consisting of a circle, and B, consisting of a cross.

These radiate some arbitrary flux (say optical photons, X-rays, ultrasonic waves or whatever) which is imperfectly imaged by our linear system.

The first row shows the two input distributions, their sum and then the result of applying the linear operator (here simply a blur) to the sum. The second row shows the result of first applying the operator to the individual distributions and then summing them. In each case the final result is the same. The operator applied in this example is a convolution with a Gaussian kernel (a Gaussian blur), a topic we will discuss later.
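The same experiment can be run numerically in 1-D. This sketch uses a 3-tap binomial blur [0.25, 0.5, 0.25] as a small stand-in for a Gaussian, and compares it with squaring, which is not linear. Passing the check for one choice of inputs does not prove linearity in general; it only illustrates the definition.

```python
import math


def blur(signal):
    """Linear operator S: 3-tap blur [0.25, 0.5, 0.25] with zero padding."""
    kernel = [0.25, 0.5, 0.25]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - 1  # neighbour index: i - 1, i, i + 1
            if 0 <= k < n:
                acc += w * signal[k]
        out.append(acc)
    return out


def square(signal):
    """A nonlinear operator for comparison: squares every sample."""
    return [x * x for x in signal]


def satisfies_linearity(op, X, Y, a, b):
    """Check S(aX + bY) == a S(X) + b S(Y) for one choice of inputs and scalars."""
    lhs = op([a * x + b * y for x, y in zip(X, Y)])
    rhs = [a * u + b * v for u, v in zip(op(X), op(Y))]
    return all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(lhs, rhs))


A = [0, 1, 0, 0, 0, 0]  # 1-D stand-in for input A
B = [0, 0, 0, 2, 2, 0]  # 1-D stand-in for input B
print(satisfies_linearity(blur, A, B, 3.0, -0.5))    # True
print(satisfies_linearity(square, A, B, 3.0, -0.5))  # False
```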

Let’s consider this image over here, where we have some general 2-D input function f(x’,y’) in an input domain (x’,y’) and the 2-D response g(x,y) of our imaging system to this input in the output domain (x,y). In the most general case, we should allow for the possibility that each and every point in the input domain may contribute in some way to the output. If the system is linear, however, the contributions to the final output must combine linearly. For this reason, basic linear image formation is described by an integral operator called the linear superposition integral:

$$g(x,y) = \iint f(x',y')\, h(x,y;x',y')\, dx'\, dy'$$

This integral expresses something very simple. To understand this, consider some arbitrary but specific point in the output domain with coordinates (x,y). We wish to know g(x,y), the output response at that point. The linear superposition integral can be understood by breaking it down into three steps:

First we take the value of the input function f at some point in the input domain (x’,y’) and multiply it by some weight h, with h determining the amount by which the input flux at this particular point contributes to the output point.

We repeat this for each and every valid point in the input domain multiplying by the appropriate weight each time.

Finally, we integrate (that is, sum) all such contributions to give the response g(x,y).

It is really the weighting function h which determines the basic behaviour of the imaging system. This function tells us the specific contribution made by each infinitesimal point in the input domain to each infinitesimal point in the resulting output domain.

In the most general case, it is a function of four variables, since we must allow for the contribution of a given point (x’, y’) in the input space to a particular point (x, y) in the output space to depend on the exact location of both points.
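A discrete 1-D analogue of the superposition integral makes the three steps concrete: the integral becomes a sum, and the weighting function becomes a matrix `h[x][xp]` (two indices in 1-D rather than four in 2-D). The helper name and the example matrices are hypothetical.

```python
def superpose(f, h):
    """Discrete 1-D linear superposition: g[x] = sum over x' of f[x'] * h[x][x'].

    h[x][xp] is the weight with which input point xp contributes to output point x.
    Because h depends on both x and xp, the system may be shift-variant.
    """
    return [sum(f[xp] * h[x][xp] for xp in range(len(f))) for x in range(len(h))]


f = [1.0, 2.0, 3.0]

# Ideal system: each input point maps only to the same output point.
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
print(superpose(f, identity))  # [1.0, 2.0, 3.0]

# A hypothetical shift-variant blur: the spreading differs from row to row.
h = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.2, 0.8]]
print(superpose(f, h))  # approximately [1.0, 1.5, 2.8]
```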

In the context of imaging, the weighting function h is referred to as the point-spread function (PSF). The reason for this name will be demonstrated shortly. First, however, let’s introduce an important concept in imaging: the (Dirac) delta or impulse function.

In image processing, the delta or impulse function is used to represent mathematically a bright intensity source which occupies a very small (in fact, infinitesimal) region in space. It can be modelled in a number of ways, but arguably the simplest is to consider it as the limiting form of a scaled rectangle function as the width of the rectangle tends to zero. The 1-D rectangle function is defined as:

$$\operatorname{rect}(x) = \begin{cases} 1 & |x| \le \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

Accordingly the 1-Dimensional delta function can be expressed like this:

$$\delta(x) = \lim_{a \to 0} \frac{1}{a} \operatorname{rect}\left(\frac{x}{a}\right)$$

The 2-dimensional delta function can be expressed like this:

$$\delta(x,y) = \lim_{a \to 0} \frac{1}{a^2} \operatorname{rect}\left(\frac{x}{a}\right) \operatorname{rect}\left(\frac{y}{a}\right)$$

This image shows the behaviour of the scaled rectangle function as a approaches zero.

We can see that:

- As $a \to 0$, the support (meaning the nonzero region) of the function tends to a vanishingly small region on either side of x = 0.
- As $a \to 0$, the height of the function tends to infinity, but the total area under the function remains equal to one.
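Both observations can be checked numerically. This sketch evaluates (1/a) rect(x/a) at x = 0 and approximates its area with a midpoint Riemann sum on [-1, 1]; the step count is an arbitrary choice that keeps the approximation error small for the widths tried.

```python
def rect(x):
    """1-D rectangle function: 1 for |x| <= 1/2, else 0."""
    return 1.0 if abs(x) <= 0.5 else 0.0


def scaled_rect(x, a):
    """(1/a) * rect(x/a): width a, height 1/a."""
    return rect(x / a) / a


def area(a, lo=-1.0, hi=1.0, steps=100_000):
    """Midpoint Riemann-sum approximation of the area under scaled_rect on [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(scaled_rect(lo + (k + 0.5) * dx, a) for k in range(steps)) * dx


for a in (1.0, 0.1, 0.01):
    # The height at x = 0 grows like 1/a, while the area stays close to 1.
    print(a, scaled_rect(0.0, a), round(area(a), 3))
```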

$\delta(x)$ is therefore a function which is zero everywhere except precisely at x = 0. At this point, the function tends to a value of infinity but retains a finite, unit area under the function.

Therefore:

$$\delta(x) = \begin{cases} \infty & x = 0 \\ 0 & x \neq 0 \end{cases}$$

It follows that a shifted delta function corresponding to an infinitesimal point located at $x = x_0$ is defined in exactly the same way:

$$\delta(x - x_0) = \begin{cases} \infty & x = x_0 \\ 0 & x \neq x_0 \end{cases}$$

This works the same way in two dimensions and more:

$$\delta(x,y) = \begin{cases} \infty & x = 0 \text{ and } y = 0 \\ 0 & \text{otherwise} \end{cases}$$

The most important property of the delta function is defined by its action under an integral sign; this is known as the sifting property. The sifting theorem is expressed in one dimension by

$$\int_{-\infty}^{\infty} f(x)\, \delta(x - x_0)\, dx = f(x_0)$$

and in two dimensions by

$$\iint f(x,y)\, \delta(x - x_0, y - y_0)\, dx\, dy = f(x_0, y_0)$$

This means that, whenever the delta function appears inside an integral, the result is equal to the remaining part of the integrand evaluated at those precise coordinates for which the delta function is nonzero.
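The sifting property can be seen numerically by replacing the delta function with the scaled rectangle (1/a) rect((x - x0)/a). The integral then equals the average of f over a window of width a around x0, which approaches f(x0) as a shrinks. The test function cos and the point x0 = 0.3 are arbitrary choices for this sketch.

```python
import math


def sift_approx(f, x0, a, steps=10_000):
    """Approximate the integral of f(x) * (1/a) * rect((x - x0) / a) dx.

    The integrand is zero outside [x0 - a/2, x0 + a/2], so only that interval
    is summed (midpoint rule). The result is the average of f over the window.
    """
    dx = a / steps
    start = x0 - a / 2
    return sum(f(start + (k + 0.5) * dx) for k in range(steps)) * dx / a


x0 = 0.3
for a in (1.0, 0.1, 0.001):
    # As a shrinks, the result approaches f(x0) = cos(0.3).
    print(a, sift_approx(math.cos, x0, a), math.cos(x0))
```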

In summary, the delta function is formally defined by its three properties of singularity, unit area and the sifting property.

Delta functions are widely used in optics and image processing as idealized representations of point and line sources (or apertures). So, this is all there is for this lesson; if you have any questions, send me a message or leave them in the Q&A section.

## The point-spread function

The point-spread function of a system is defined as the response of the system to an input distribution consisting of a very small intensity point. In the limit that the point becomes infinitesimal in size, we can represent the input function as a Dirac delta function:

$$f(x',y') = \delta(x' - x_0, y' - y_0)$$

For a linear system, we can substitute this into the linear superposition integral to obtain:

$$g(x,y) = \iint \delta(x' - x_0, y' - y_0)\, h(x,y;x',y')\, dx'\, dy'$$

Applying the sifting theorem we saw in the previous lesson, we end up with:

$$g(x,y) = h(x,y;x_0,y_0)$$

In other words, the response to the point-like input (sometimes called the impulse response) is precisely equal to h. In the context of imaging, the weighting function h(x,y;x’,y’) in the superposition integral is called the point-spread function, or PSF, because it measures the spread in output intensity response due to an infinitesimal (i.e. infinitely sharp) input point.

You may wonder why the point-spread function is such an important measure.

Consider for a moment that any input distribution may be considered to consist of a very large (or in fact infinitely many) collection of very small (or infinitesimally small) points of varying intensity. The point-spread function tells us what each of these points will look like in the output; so, through the linearity of the system, the output is given by the sum of the point-spread function responses. It is therefore clear that the point-spread function of such a linear imaging system (in the absence of noise) completely describes its imaging properties and, therefore, is of primary importance in specifying the behaviour of the system.
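This argument can be checked on a small discrete example. The sketch below defines a hypothetical shift-invariant linear system (a 3-tap blur), feeds it an impulse to obtain its PSF, and then rebuilds the response to an arbitrary input as a weighted sum of shifted PSFs, one per input point.

```python
import math

KERNEL = [0.2, 0.6, 0.2]  # hypothetical PSF samples, centred on the middle tap


def system(f):
    """A shift-invariant linear blur: each input point is spread by KERNEL."""
    n = len(f)
    g = [0.0] * n
    for xp, value in enumerate(f):  # each input point x'
        for j, w in enumerate(KERNEL):
            x = xp + j - 1          # output point it spreads to
            if 0 <= x < n:
                g[x] += value * w
    return g


def impulse(n, x0):
    """Discrete stand-in for a delta function located at x0."""
    return [1.0 if i == x0 else 0.0 for i in range(n)]


print(system(impulse(7, 3)))  # the PSF: [0.0, 0.0, 0.2, 0.6, 0.2, 0.0, 0.0]

f = [0.0, 1.0, 0.0, 3.0, 0.0, 0.0, 2.0]
direct = system(f)

# Rebuild the output as a weighted sum of PSFs, one per input point.
from_psfs = [0.0] * len(f)
for xp, value in enumerate(f):
    psf = system(impulse(len(f), xp))
    for x in range(len(f)):
        from_psfs[x] += value * psf[x]

print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(direct, from_psfs)))  # True
```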

It also follows from what we have said so far that an ideal imaging system would possess a point-spread function that exactly equalled the Dirac delta function for all combinations of the input and output coordinates. Such a system would map each infinitesimal point in the input distribution to a corresponding infinitesimal point in the output distribution. There would be no blurring or loss of detail and the image would be ‘perfect’. In fact, the fundamental physics of electromagnetic diffraction dictates that such an ideal system can never be achieved in practice, but it nonetheless remains a useful theoretical concept. By extension, then, a good or ‘sharp’ imaging system will generally possess a narrow point-spread function, whereas a poor imaging system will have a broad point-spread function, which has the effect of considerably overlapping the output responses to neighbouring points in the input.

Let’s say we have this image consisting of 4 white dots. As the point-spread function becomes increasingly broad, the dots in the original input become broader and overlap. So, in effect, the image looks poorer as the point-spread function increases in width.
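A 1-D version of this experiment: two point sources 10 samples apart are imaged through sampled Gaussian PSFs of increasing width, and we compare the brightness halfway between the dots with the brightness at a dot. The Gaussian shape, the widths and the spacing are assumptions chosen for illustration, not the PSFs used in the figure.

```python
import math


def gaussian_psf(sigma, radius):
    """Sampled, normalized 1-D Gaussian PSF covering offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def image_through(f, psf):
    """Spread every input sample by the PSF (discrete convolution, zero padding)."""
    r = len(psf) // 2
    g = [0.0] * len(f)
    for xp, value in enumerate(f):
        for j, w in enumerate(psf):
            x = xp + j - r
            if 0 <= x < len(g):
                g[x] += value * w
    return g


# Two point sources 10 samples apart: a 1-D stand-in for two of the dots.
f = [0.0] * 41
f[15] = f[25] = 1.0

dips = []
for sigma in (1.0, 3.0, 6.0):
    g = image_through(f, gaussian_psf(sigma, radius=20))
    # Brightness halfway between the dots relative to the brightness at a dot.
    # Near 0: clearly separated dots; at or above 1: the dots have merged.
    dips.append(g[20] / g[15])
    print(sigma, round(dips[-1], 3))
```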