Towards Automated Sign Language Recognition from Video

 

Goal: 

 

To advance the design of robust computer representations and algorithms for recognizing American Sign Language from video. (An overview PPT file is here.)

Broader Impact:

 

*  To facilitate communication between the Deaf and hearing populations.

*  To bridge the gap in access to next-generation human-computer interfaces.

Scientific Contributions:

 

We have developed representations and approaches that can:

 

*  Capture the global (Gestalt) configuration of the hand-face relationship using relational distributions. This representation is somewhat robust to segmentation errors and does not require part tracking.

*  Learn sign models from examples without supervision, through automated common-motif extraction based on Markov Chain Monte Carlo methods.

*  Recognize signs in the presence of movement epenthesis, i.e., hand movements that appear between two signs, using an enhanced Level Building approach.

*  Automatically segment an ASL sentence into signs using Conditional Random Fields.

*  Match signs and gestures in the presence of segmentation noise using fragment-Hidden Markov Models (frag-HMM).
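As a rough illustration of the first item, a relational distribution can be thought of as a normalized histogram of pairwise offsets between low-level foreground features. The sketch below is an assumption made for illustration, not the project's implementation: the function name `relational_distribution`, the `bin_size` parameter, and the choice of raw pixel coordinates as features are all hypothetical. It only shows why such a descriptor needs no part tracking and is unaffected by where the signer stands in the frame.

```python
from collections import Counter
from itertools import combinations


def relational_distribution(pixels, bin_size=4):
    """Normalized histogram of pairwise (dx, dy) offsets between pixels.

    pixels: iterable of (x, y) foreground coordinates (hypothetical input,
            e.g. pixels from a hand/face segmentation).
    bin_size: hypothetical quantization step, in pixels.
    """
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(pixels, 2):
        dx, dy = x2 - x1, y2 - y1
        # Treat each pair as unordered: flip the offset so that dx >= 0.
        if dx < 0 or (dx == 0 and dy < 0):
            dx, dy = -dx, -dy
        counts[(dx // bin_size, dy // bin_size)] += 1
    total = sum(counts.values())
    # Dividing by the number of pairs turns counts into probabilities.
    return {offset: n / total for offset, n in counts.items()} if total else {}


if __name__ == "__main__":
    blob = [(0, 0), (1, 0), (10, 0), (11, 0)]  # two small clusters
    print(relational_distribution(blob, bin_size=4))
```

Because only offsets between pixels enter the histogram, translating every pixel by the same amount leaves the result unchanged, and a few missing or spurious pixels perturb the bin probabilities only slightly.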

 


Publications

 

*  S. Nayak, S. Sarkar, B. Loeding, “Automated Extraction of Signs from Continuous Sign Language Sentences using Iterated Conditional Modes,” IEEE Conference on Computer Vision and Pattern Recognition, June 2009. Poster, Results on complete data

 

*  R. Yang and S. Sarkar, “Handling Movement Epenthesis and Hand Segmentation Ambiguities in Continuous Sign Language Recognition using Nested Dynamic Programming,” IEEE Transactions on Pattern Analysis and Machine Intelligence, accepted Jan 2009.

 

*  R. Yang and S. Sarkar, “Coupled Grouping and Matching for Sign and Gesture Recognition,” Computer Vision and Image Understanding, vol. 113, no. 6, pp. 663-681, June 2009.

 

*  S. Nayak, S. Sarkar, B. Loeding, “Distribution-based dimensionality reduction applied to articulated motion recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, May 2009.

 

*  R. Yang, S. Sarkar, B. Loeding, “Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign Language Recognition,” IEEE Conference on Computer Vision and Pattern Recognition, 2007.

 

*  R. Yang and S. Sarkar, “Gesture Recognition using Hidden Markov Models from Fragmented Observations,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 766-773, 17-22 June 2006.

 

*  R. Yang and S. Sarkar, “Detecting Coarticulation in Sign Language using Conditional Random Fields,” International Conference on Pattern Recognition, vol. 2, pp. 108-112, 20-24 Aug. 2006.

 

*  S. Nayak, S. Sarkar, and B. Loeding, “Unsupervised Modeling of Signs Embedded in Continuous Sentences,” IEEE Workshop on Vision for Human-Computer Interaction, vol. 3, pp. 81, June 2005.

 

*  R. Yang, S. Sarkar, B. L. Loeding, A. I. Karshmer, “Efficient Generation of Large Amounts of Training Data for Sign Language Recognition: A Semi-automatic Tool,” International Conference on Computers Helping People with Special Needs, 2006: 635-642.

 

*  S. Nayak, S. Sarkar, and K. Sengupta, “Modeling Signs using Functional Data Analysis,” IAPR Conference on Computer Vision, Graphics, and Image Processing, 2004.

 

*  B. L. Loeding, S. Sarkar, A. Parashar, A. Karshmer, “Progress in Automated Computer Recognition of Sign Language,” International Conference on Computers Helping People with Special Needs, 2004: 1079-1087.

 

*  Sunita Nayak, “A vision-based approach for unsupervised modeling of signs embedded in continuous sentences,” Master’s thesis, University of South Florida, Tampa, 2005. 

 

*  Ayush S Parashar, “Representation and interpretation of manual and non-manual information for automated American Sign Language recognition,” Master’s thesis, University of South Florida, Tampa, 2003.

Data Sets

 

A data set of 25 sentences, with 5 instances of each sentence recorded against a plain background, is available for distribution. The vocabulary is drawn from an airport security check scenario. For the signs in 4 instances of each sentence, we also have manually generated ground truth for the dominant hand locations (each hand pixel is marked) and for the head location (marked with a rectangle). To obtain the data, a release document must be signed by a permanent faculty member or researcher, and we need the original signed document.

 

Code

*  Frag-HMM source code is HERE. Frag-HMM is a type of HMM whose observations are the grouping results of a low-level over-segmentation output.

*  Matching of signs to sentences using enhanced-Level Building and other associated tools

*  A tool to annotate sign language sentences

Funding Acknowledgement

This work was supported in part by the National Science Foundation under ITR grant IIS 0312993 and is currently being supported by funds from the USF Center for Pattern Recognition. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Last revised: 19 October 2009