Classification of vehicles for urban traffic scenes
An investigation into the detection and classification of vehicles and pedestrians from video in urban traffic scenes is presented. The final aim is to produce systems that guide surveillance operators and reduce the human resources needed to observe hundreds of cameras in urban traffic surveillance. Cameras are a well-established means for traffic managers to observe traffic states and improve journey experiences. Firstly, per-frame vehicle detection and classification is performed using 3D models on calibrated cameras. Motion silhouettes (from background estimation) are extracted and compared with projected model silhouettes to identify the ground-plane position and class of vehicles and pedestrians. The system has been evaluated with the reference i-LIDS data sets from the UK Home Office. Performance has been compared for varying numbers of classes, for three different weather conditions, and for different video input filters. The full system, including detection and classification, achieves a recall of 87% at a precision of 85.5%, outperforming similar systems in the literature.

To improve robustness, the use of local image patches to incorporate object appearance is investigated for surveillance applications. As an example, a novel texture saliency classifier is proposed to detect people in a video frame by identifying salient texture regions. The image is classified into foreground and background in real time; no temporal image information is used during the classification. The system is used for the task of detecting people entering a sterile zone, a common scenario in visual surveillance. Testing has been performed on the i-LIDS sterile zone benchmark data set of the UK Home Office. The basic detector is extended by fusing its output with simple motion information, which significantly outperforms standard motion tracking. A lower detection time can be achieved by combining texture classification with Kalman filtering.
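The silhouette-matching step described above can be sketched as follows. This is a minimal illustration only, not the thesis implementation: the masks, class names, and the overlap score (intersection over union) are assumptions standing in for the projected 3D-model silhouettes and the matching criterion actually used.

```python
import numpy as np

def silhouette_overlap(motion_mask, model_mask):
    """Score agreement between a motion silhouette (from background
    estimation) and a projected model silhouette, both boolean arrays
    of the same shape, as intersection over union in [0, 1]."""
    inter = np.logical_and(motion_mask, model_mask).sum()
    union = np.logical_or(motion_mask, model_mask).sum()
    return inter / union if union else 0.0

def classify(motion_mask, model_masks):
    """Return the class whose projected silhouette best matches the
    observed motion silhouette, together with all class scores."""
    scores = {cls: silhouette_overlap(motion_mask, mask)
              for cls, mask in model_masks.items()}
    return max(scores, key=scores.get), scores

# Toy example: a 4x4 frame with a 2x2 foreground blob.
motion = np.zeros((4, 4), dtype=bool)
motion[1:3, 1:3] = True

model_car = motion.copy()                 # hypothetical "car" projection
model_van = np.zeros((4, 4), dtype=bool)  # hypothetical "van" projection
model_van[0:2, 0:2] = True

best, scores = classify(motion, {"car": model_car, "van": model_van})
print(best, scores)  # the "car" silhouette matches perfectly
```

In practice the model silhouette would be re-projected for many candidate ground-plane positions, and the position/class pair with the highest score would be reported.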
The fusion approach, running at 10 frames per second, gives the highest result of F1=0.92 on the 24-hour test data set. Based on the good results for local features, a novel classifier is introduced that combines the concept of 3D models with local features to overcome the limitations of conventional silhouette-based methods and of local features in 2D. The appearance of vehicles varies substantially with the viewing angle, and local features may often be occluded. In this thesis, full 3D models are used for the object categories to be detected, and the feature patches are defined over these models. A calibrated camera allows an affine transformation of the observation into a normalised representation from which '3DHOG' features (3D extended histogram of oriented gradients) are defined. A variable set of interest points is used in the detection and classification processes, depending on which points of the 3D model are visible. The 3DHOG feature is compared with features based on the FFT and on simple histograms, and also with the motion silhouette baseline on the same data. The results demonstrate that the proposed method achieves comparable performance. In particular, an advantage of the proposed method is that it is robust against mis-shaped motion silhouettes, which can be caused by variable lighting, camera quality and occlusions from other objects. The proposed algorithms are evaluated further on a new data set from a different camera with higher resolution, which demonstrates the portability of the training data to novel camera views. Kalman filter tracking is introduced to obtain trajectory information, which is used for behaviour analysis. A rate of 94% correctly detected tracks is achieved, outperforming a baseline motion tracker (OpenCV) tested under the same conditions. A demonstrator for bus lane monitoring is introduced using the output of the detection and classification system. The thesis concludes with a critical analysis of the work and an outlook on future research opportunities.
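The Kalman filter tracking used to obtain trajectory information can be illustrated with a minimal constant-velocity filter over 2D ground-plane positions. This is a sketch under assumed parameters (noise covariances, time step); the thesis's actual filter design and tuning are not reproduced here.

```python
import numpy as np

class KalmanTracker:
    """Minimal constant-velocity Kalman filter over 2D positions.
    State is [x, y, vx, vy]; only the position (x, y) is observed.
    All noise parameters are illustrative assumptions."""

    def __init__(self, x, y, dt=0.1):
        self.s = np.array([x, y, 0.0, 0.0])                    # state
        self.P = np.eye(4)                                     # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt                       # constant-velocity model
        self.H = np.eye(2, 4)                                  # observe position only
        self.Q = 0.01 * np.eye(4)                              # process noise
        self.R = 0.25 * np.eye(2)                              # measurement noise

    def predict(self):
        """Propagate the state one time step ahead."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        """Correct the prediction with a position measurement z = (x, y)."""
        y = np.asarray(z, dtype=float) - self.H @ self.s       # innovation
        S = self.H @ self.P @ self.H.T + self.R                # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)               # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2]

# Track a vehicle moving at 1 unit/step along x; the filtered
# trajectory converges towards the true positions.
tracker = KalmanTracker(0.0, 0.0, dt=1.0)
for step in range(1, 20):
    tracker.predict()
    estimate = tracker.update([float(step), 0.0])
print(estimate)
```

The resulting smoothed trajectories (position plus estimated velocity) are what feed the behaviour analysis, e.g. deciding whether a classified vehicle is travelling in a bus lane.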