July to September 2005 (10 weeks)

### What is UROP

The Undergraduate Research Opportunities Programme (UROP) is a worldwide programme that offers students the chance to work alongside professors and graduates in all fields, gaining insight and experience in research.

I worked under the supervision of Professor Wayne Luk in the Custom Computing group. I was asked to research immunologic algorithms with a view to applying them to sensor processing.

I've included below some of my findings, alongside some machine learning tools based on immuno-computing.

### Email reports to my supervisor, Prof. Wayne Luk

- Week starting 11th July (webpage) (pdf)
- Week starting 18th July (webpage) (pdf)
- Week starting 25th July (webpage) (pdf)
- Week starting 1st August (webpage) (pdf)
- Week starting 8th August (webpage) (pdf)
- Week starting 22nd August (webpage) (pdf)
- Week starting 29th August (webpage) (pdf)
- Week starting 5th September (webpage) (pdf)
- Week starting 12th September (webpage) (pdf)
- Week starting 19th September (webpage) (pdf)
- Week starting 26th September (webpage) (pdf)

### Gait Analysis

Available below are the tools I developed for classifying gait signatures. The Euclidean distance classifier is an expensive but accurate algorithm. The DC classifier is an implementation of an immune-inspired machine learning algorithm from the research paper "Introducing Dendritic Cells as a Novel Immune-Inspired Algorithm for Anomaly Detection" by Julie Greensmith, Uwe Aickelin, and Steve Cayzer. The first helps to determine how well different gait classes can be told apart using the simplistic gait signatures available, whereas the second is an investigation into possibly more efficient alternatives.

### Euclidean Distance classification

The source code contains a class that applies a straightforward Euclidean distance classification in an N-dimensional shape-space of attributes of any range. Upon instantiation, you should provide the object with a training set that represents a wide sample of valid input. You can then switch to classification mode and ask it to predict a class based on that training. After training, the class can automatically normalise input, based on the variance of the training samples, to give each dimension a fairer weighting.
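To make the train-then-classify flow concrete, here is a minimal sketch of a variance-normalised Euclidean classifier. It is not the original source: the class name `EuclideanSketch`, its method names, and the choice to compute the variance per dimension over all training samples are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

using Signature = std::vector<double>;

// Hypothetical sketch, not the original Classifier class.
// Assumes every signature has the same number of attributes.
class EuclideanSketch {
public:
    // Training mode: store a labelled sample.
    void train(const std::string& label, const Signature& s) {
        samples_[label].push_back(s);
    }

    // Switch to classification mode: derive a per-dimension scale
    // (standard deviation over all training samples).
    void finishTraining() {
        std::size_t count = 0, dims = 0;
        for (const auto& cls : samples_)
            for (const auto& s : cls.second) { dims = s.size(); ++count; }
        std::vector<double> mean(dims, 0.0), var(dims, 0.0);
        scale_.assign(dims, 1.0);
        if (count == 0) return;
        for (const auto& cls : samples_)
            for (const auto& s : cls.second)
                for (std::size_t d = 0; d < dims; ++d) mean[d] += s[d] / count;
        for (const auto& cls : samples_)
            for (const auto& s : cls.second)
                for (std::size_t d = 0; d < dims; ++d)
                    var[d] += (s[d] - mean[d]) * (s[d] - mean[d]) / count;
        for (std::size_t d = 0; d < dims; ++d)
            if (var[d] > 0.0) scale_[d] = std::sqrt(var[d]);
    }

    // Predict the class with the smallest mean normalised distance.
    // Returns an empty string if nothing has been trained.
    std::string classify(const Signature& s) const {
        std::string best;
        double bestDist = std::numeric_limits<double>::infinity();
        for (const auto& cls : samples_) {
            if (cls.second.empty()) continue;
            double total = 0.0;
            for (const auto& t : cls.second) total += distance(s, t);
            const double avg = total / cls.second.size();
            if (avg < bestDist) { bestDist = avg; best = cls.first; }
        }
        return best;
    }

private:
    // Euclidean distance after dividing each dimension by its scale.
    double distance(const Signature& a, const Signature& b) const {
        double sum = 0.0;
        for (std::size_t d = 0; d < a.size() && d < b.size(); ++d) {
            const double sd = d < scale_.size() ? scale_[d] : 1.0;
            const double diff = (a[d] - b[d]) / sd;
            sum += diff * diff;
        }
        return std::sqrt(sum);
    }

    std::map<std::string, std::vector<Signature>> samples_;
    std::vector<double> scale_;
};
```

Comparing a new signature against every stored training sample is what makes this approach expensive: the cost of one prediction grows with the size of the training set. The normalisation step stops an attribute with a large numeric range from dominating the distance.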

The main application uses the classifier class along with some file processing routines to read CSV files of 'signatures' (targeted largely at the gait analysis CSV format), selected by wildcard file specifications for batch processing.

The current working directory must be the one containing the data you are using. For example, given the following signatures in the current directory:

P0_S1_C2_M0_G1_T1.sig P0_S1_C2_M0_G1_T2.sig P0_S1_C2_M0_G2_T1.sig P0_S1_C2_M0_G2_T2.sig

You could issue the command `euclidean P0_S1_C2_M0_G*_T*.sig _G1_ 0.50` to train on 50% of the files and classify the rest. Files whose names contain the substring `_G1_` are treated as 'normal' and all others as 'abnormal'; these are the two classes we are interested in. You should read the program's usage message for exact details.
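The labelling and train/classify split described above can be sketched roughly as follows. This is an illustration only: the function names `isNormal` and `splitFiles`, the `Split` struct, and the rounding of the training count (truncation) are assumptions, and the list of file names is taken to be already expanded from the wildcard.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// True if the file name contains the marker for the 'normal' class
// (for example "_G1_"); every other file counts as 'abnormal'.
bool isNormal(const std::string& fileName, const std::string& marker) {
    return fileName.find(marker) != std::string::npos;
}

// Hypothetical container for the two groups of files.
struct Split {
    std::vector<std::string> training;
    std::vector<std::string> testing;
};

// Shuffle the files, then use the first `trainFraction` of them for
// training and the remainder for classification.
Split splitFiles(std::vector<std::string> files, double trainFraction,
                 std::mt19937& rng) {
    std::shuffle(files.begin(), files.end(), rng);
    const double clamped = std::min(1.0, std::max(0.0, trainFraction));
    // Assumption: the training count is truncated towards zero.
    const auto nTrain = static_cast<std::ptrdiff_t>(files.size() * clamped);
    Split out;
    out.training.assign(files.begin(), files.begin() + nTrain);
    out.testing.assign(files.begin() + nTrain, files.end());
    return out;
}
```

With the four example files and a fraction of 0.50, two files would be used for training and two for classification. Which two depends on the shuffle, so repeated runs can give different results.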

There are three main command line parameters that affect classification. The first is the proportion of files used for training, with the remainder used for classification. Note that the program is non-deterministic and processes files in a random order. This helps you to gauge the strength of the signature representation, as you can repeat your experiments. The second parameter controls how many previous signatures are considered in a lastN rolling average. This lets you take the continuity of signatures into account, acting as a resistance-to-change control. The final parameter specifies what fraction of a training class is used for comparison against a new signature. If, for example, the parameter has a value of 0.2, the closest 20% of a class's training signatures are averaged to determine the distance of a signature to that class. This setting helps you configure the system to deal with multi-modal clusters of data, where considering the whole population would be misleading.
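The second and third parameters can be sketched as two small helpers. These are illustrative assumptions, not the program's code: `nearestFractionDistance` averages the closest fraction of the distances to a class (rounding the count up and always using at least one), and `RollingAverage` smooths a per-signature score over the last N values. Which quantity the original program feeds into its rolling average is not stated, so the use of a scalar score here is a guess.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <deque>
#include <limits>
#include <numeric>
#include <vector>

// Mean of the smallest `fraction` of the distances from a new signature
// to one class's training signatures. Returns infinity for an empty class.
double nearestFractionDistance(std::vector<double> distances, double fraction) {
    if (distances.empty()) return std::numeric_limits<double>::infinity();
    std::sort(distances.begin(), distances.end());
    // Assumption: round the count up and clamp it to [1, size].
    std::size_t k = fraction > 0.0
        ? static_cast<std::size_t>(std::ceil(distances.size() * fraction))
        : 0;
    k = std::max<std::size_t>(1, std::min(k, distances.size()));
    const auto end = distances.begin() + static_cast<std::ptrdiff_t>(k);
    return std::accumulate(distances.begin(), end, 0.0) / k;
}

// Rolling average over the last N values added.
class RollingAverage {
public:
    explicit RollingAverage(std::size_t lastN)
        : lastN_(std::max<std::size_t>(1, lastN)) {}

    // Add a value and return the average of the current window.
    double add(double value) {
        window_.push_back(value);
        sum_ += value;
        if (window_.size() > lastN_) {
            sum_ -= window_.front();
            window_.pop_front();
        }
        return sum_ / window_.size();
    }

private:
    std::size_t lastN_;
    std::deque<double> window_;
    double sum_ = 0.0;
};
```

Using only the nearest fraction means a class made of several separate clusters is judged by the cluster closest to the new signature, rather than by an average pulled towards the empty space between clusters. A larger N in the rolling average makes the output slower to respond to a single unusual signature.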

Although the featured program supports only two class types (i.e. normal and abnormal), the Classifier C++ class supports any number of classes and is very easy to use. Since the gait signature data we have is quite simplistic in terms of attributes, it does not show enough differentiation to be used for classifying multiple gait classes (a conclusion drawn from preliminary tests on the large gait signature dataset).

### Dendritic Cell Classifier

The Dendritic Cell Classifier is comparable to the tool above, but is targeted at more efficient and robust classification, based on continuity and reinforcement. The algorithm used is introduced in "Introducing Dendritic Cells as a Novel Immune-Inspired Algorithm for Anomaly Detection" by Julie Greensmith, Uwe Aickelin, and Steve Cayzer [ICARIS-05].

The classifier has limitations, and you should read its usage message for instructions on how to use it. The Euclidean distance classifier will most likely outperform this algorithm in accuracy, though this classifier is less expensive.

### Sample Gait signatures

The signatures available above can be analysed using the provided tools.