Hand Centered Studies of Human Movement Project

Technical Report 96-1

Hand Gestures for HCI

Research on human movement behaviour reviewed in the context of hand centred input.

Prepared by: Axel Mulder, School of Kinesiology, Simon Fraser University, February 1996

Acknowledgement: This work was supported in part by a strategic grant from the Natural Sciences and Engineering Research Council of Canada.

© Copyright 1996 Simon Fraser University. All rights reserved.


Contents

Summary

Classifying Hand Movement

Gestural Communication

Hand Gestures for HCI

Appendix A: Verbal Synonyms for Hand Movements

References


Summary

This paper focusses on the design issues involved in implementing human computer communication by means of full hand movements, i.e. based on hand position and shape.

A variety of terms for describing and defining hand movements are examined, to gain a better insight into possible ways of classification. Hand movements can be grouped according to function as semiotic, ergotic or epistemic (Cadoz, 1994). Semiotic hand movements can be classified as iconic, metaphoric, deictic or beat-like (McNeill, 1992) and, according to their linguisticity, as gesticulation, language-like gestures, pantomime, emblems or sign language (Kendon, 1988). Human communication comes in many modalities, including speech, gestures, and facial and bodily expressions, which appear to implement, in close cooperation, some or all aspects of an expression, such as its temporo-spatial, visual, structural and emotional aspects. Thus, human communication is not only symbolic. Emotional aspects of an expression modulate its other aspects.

Research efforts in the design of gestural interfaces and other types of input devices which capture hand shape and position are reviewed. The incorporation of research results from human movement behaviour as listed above is gradually taking place, although there are still tendencies to ignore the importance of these findings. Difficulties in the design and development of gestural interfaces are discussed taking some of these findings into account. Issues discussed include hand movement tracking needs, context detection, gesture segmentation, feature description and gesture identification. The identification of a method to define a set of standard gestures is addressed.


Classifying Hand Movement

This paper is a reflection of an ongoing effort to examine results of research into human communication through movement to benefit the design and development of computer interfaces that more adequately capture such forms of human communication. Human communication comes in many modalities, including speech, gestures, facial and bodily expressions. A variety of forms of expression, such as poetry, sign language, mimicry, music and dance, exploit specific capacities of one or more of these modalities. This paper focusses on the design issues involved in implementing human computer communication by means of hand movements.

Examples

To refresh the mind, consider some examples of hand movements, as listed in Appendix A: goal-directed manipulation of objects, empty-handed gestures such as waving, pointing or counting, and haptic exploration such as stroking, tapping or rubbing.

Definitions

A brief discussion of the word gesture and its possible meanings is appropriate. Gesture has been used in place of posture and vice versa; the tendency, however, is to see gesture as dynamic and posture as static. In prose and poetry, gesture is often used to mean the initiation or conclusion of some human interaction, where no human movement need be involved. The notion of a musical gesture without actual human movement is quite common. Obviously, musical expression is intimately connected with human movement, hence the existence of such idioms.

In this paper, a hand gesture and a hand movement are both defined as the motion of fingers, hands and arms. A hand posture is defined as the position of the hand and fingers at one instant in time. Hand posture and hand gesture, however, describe situations where the hands are used as a means to communicate with either machine or human. Empty-handed gestures and free-hand gestures are terms generally used to indicate use of the hands for communication purposes without physical manipulation of any object.
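To make these definitions concrete, the following sketch (illustrative only; all type and field names are assumptions, not drawn from this report) represents a hand posture as a snapshot of hand and finger configuration at one instant, and a hand gesture or movement as a time-ordered sequence of such snapshots:

```python
# A minimal sketch of the definitions above (assumed field names):
# a posture is one snapshot, a gesture is a sequence of snapshots over time.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HandPosture:
    """Hand and finger configuration at a single instant in time."""
    time: float                               # seconds
    position: Tuple[float, float, float]      # hand position in space
    orientation: Tuple[float, float, float]   # e.g. roll, pitch, yaw
    finger_flexion: Tuple[float, ...]         # one flexion value per tracked joint


@dataclass
class HandGesture:
    """A hand movement: the motion of fingers, hand and arm over time."""
    samples: List[HandPosture]

    def duration(self) -> float:
        return self.samples[-1].time - self.samples[0].time if self.samples else 0.0
```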

The motivation for these definitions will become apparent in the course of this paper, while slight nuances will also be added.

Verbal Synonyms

The spoken English language has over the ages incorporated a number of expressions and words that signify hand actions and gestures. The mere fact that these words and expressions exist indicates that the goals they identify, though not necessarily the corresponding hand actions and gestures, are common in daily life. McNeill (1992) pointed out that gestures are not equivalent to speech, but that gestures and speech complement each other (this will be further discussed below). In other words, the speech modality may have developed such that certain communication can only be expressed using gestures. Consequently, the available verbal language may not represent a number of common and/or important gestures. In Appendix A a list of words describing hand movements is given. The list can be divided into a number of groups:

Classifications

It is almost immediately clear from the above discussion that hand movements can be divided into two major groups, one involving communication (such as empty-handed gestures), the other involving manipulation and prehension. Somewhat in between lie the hand movements identified as haptic exploration actions. Similar considerations must have led Cadoz (1994) to classify hand movements according to their function as semiotic (communication), ergotic (manipulation of the physical world) or epistemic (exploration through touch).

All three functions may be augmented using an instrument, e.g. a handkerchief for a semiotic good-bye movement, a pair of scissors for an ergotic cutting movement, a stick for an epistemic poking movement. For the purpose of this paper we can by definition substitute gestures for semiotic hand movements. Hand actions commonly identified as prehension are a subset of ergotic hand movements.

Similarly, Thieffry, in Malek et al. (1981), classifies hand movements as transitive or intransitive:

Transitive hand movements are part of an uninterrupted sequence of interconnected structured hand movements that are adapted in time and space, with the aim of completing a program, such as prehension. These hand movements could be equally classified as Cadoz's ergotic hand movements.

Intransitive hand movements, or gestures, have a universal language value, especially for the expression of affective and aesthetic ideas. Such gestures can be indicative, exhortative, imperative, rejective, among others. The gesture alone fully expresses the intention and motivation of its author. These gestures could equally be classified as Cadoz's semiotic hand movements.

We can further classify Cadoz's semiotic hand movements or gestures and ergotic hand movements.

Semiotic Hand Movements

Many researchers consider gestures, or semiotic hand movements, as intimately connected with speech and some conclude that speech is complementary to gesture.

McNeill (1992) compares his classification scheme with those of a number of other researchers (Efron, Freedman, Hoffman, Ekman and Friesen) and concludes that all use very similar categories. He classifies gestures as iconic, metaphoric, deictic or beat gestures.

Any of these gestures can be cohesive gestures, i.e. gestures that tie together thematically related but temporally separated parts of the discourse.

Kendon (1988) classifies gestures along a continuum, discussed in more depth below: gesticulation, language-like gestures, pantomimes, emblems and sign languages.

Nespoulos & Lecours (1986) take a more detailed approach and suggest a three-level scheme for classification. This approach seems of limited use due to its lack of clear distinctions: while it summarizes a number of gestural behaviours, it does not suggest some form of underlying structure or model for the processes involved in gestural expression. Nevertheless, the concept of arbitrariness of gestures is duly noted. It refers to the fact that gestures may be somewhat formalized and generally recognizable by others, but such formalization exists only within a culture; there is no such thing as a universal gesture. Kendon's continuum brings forward an apparent principle, that of different levels of linguisticity of gestures. McNeill's classification is clearly recognizable and, like Kendon's, emphasizes the strong connection between speech and gesture.

Ergotic Hand Movements

It seems likely that the oldest purpose of our hands is to manipulate the physical world so that it better suits our needs. For objects of a size on the order of our hands we can change the object's position, orientation and shape. Objects can be solid, fluid or gaseous. Ergotic hand movements can therefore be classified according to physical characteristics:

The value of such a classification is limited because no reference is made to the task at hand, although the level of indirection bears some relation to the notion of a task. Basically, the model of ergotic hand movements suggested by this classification omits the importance of cognitive processes in such hand movements.

It is more common to classify ergotic hand movements according to their function, i.e. as either prehensile or non-prehensile. Non-prehensile movements include pushing, lifting, tapping and punching. MacKenzie (1994) defines prehension as the application of functionally effective forces by the hand to an object for a task, given numerous constraints. While various taxonomies exist, one readily recognizable classification scheme (Napier, 1993) identifies a prehensile movement as either a precision grip, a power grip, a hook grip or a scissor grip.

The type of grip used in any given activity is a function of the activity itself and does not depend on the shape or size of the object to be gripped, although in extreme cases this does not always hold. While this classification relates to the musculo-skeletal properties of the hand, notably opposition, it incorporates the notion of a task, namely actions requiring precision or power. However, neither the scissor grip nor the hook grip can be related in a similar way to the notion of a task; these grips merely refer to a frequently used hand movement. The classification is therefore somewhat ambiguous.

Pressing (1991) lists some more ways to classify ergotic hand movements:

Classification 1 is based on a control task taxonomy and is useful for specific purposes. Classification 2 puts the emphasis on the observer's point of view and extracts the semiotic function of the hand movement, although the movement may be purely ergotic from the executor's point of view; it demonstrates that the semiotic function of hand movements is always present. Classification 3 omits hand shape as a parameter and has limitations similar to those of the first classification discussed above.


Gestural Communication

Evolution of Human Communication

Kimura (1993) and others have pointed out that there is evidence that (hand) gestures preceded speech in the evolution of communication systems amongst hominids. This finding supports the modeling of gesture and speech as forms of expression generated by a system in which formalized linguistic representation is not the main form from which gestures are derived. Instead, it is conjectured by McNeill (1992) that gestures and speech are an integrated form of expression of utterances, where speech and gestures are complementary.

Linguistic Aspects of Gesture

Many have investigated the relation between human gestures and speech. Kendon (1988) ordered gestures of varying nature along a continuum of "linguisticity":

Gesticulation - Language-like gestures - Pantomimes - Emblems - Sign languages

Observe that, going from gesticulation to sign languages, the obligatory presence of speech declines, the degree to which the expression exhibits the properties of a language increases, and idiosyncratic gestures are replaced by socially regulated signs.

In other words, the formalized, linguistic component of the expression present in speech is replaced by signs going from gesticulation to sign languages. This supports the idea that gesture and speech are generated by one integral system as suggested above.

In an effort to further define the underlying structure of gestures, McNeill, Levy and Pedelty (1990) propose a diagram (based upon Kendon's work) that clarifies the relations between the units at each level of the speaker's gestural discourse. Each unit consists of one or more units of the next, more fine-grained level: the discourse is built from gesture units, a gesture unit from one or more gesture phrases, and a gesture phrase from phases such as preparation, stroke and retraction.

From this structure it can be seen that the word gesture is quite often used to identify just the stroke. Perhaps this simplification is due to the fact that most of the linguistic aspects of the expression are communicated in the stroke. In a similar vein, it should be noted that a hand posture often involves a preparation and retraction phase (cf. the OK sign). The definition of a hand posture as solely the position of hand and fingers at one particular point in time is therefore somewhat misleading: sequences of postures occur with other hand movements in between.
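The hierarchy discussed above can be made concrete with a small sketch (illustrative only; the class and field names are assumptions, not the authors' notation): a gesture unit contains one or more gesture phrases, and each phrase is an ordered sequence of phases, of which the stroke carries most of the expressive content.

```python
# A minimal sketch (assumed names) of gesture units, phrases and phases.
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    PREPARATION = "preparation"
    STROKE = "stroke"          # carries most of the linguistic/expressive content
    RETRACTION = "retraction"


@dataclass
class GesturePhrase:
    phases: List[Phase]

    def stroke_only(self) -> bool:
        """True if the phrase has been reduced to its stroke alone."""
        return self.phases == [Phase.STROKE]


@dataclass
class GestureUnit:
    """A stretch of movement grouping one or more gesture phrases."""
    phrases: List[GesturePhrase] = field(default_factory=list)


# Example: a hand posture such as the OK sign, embedded in preparation
# and retraction phases as noted in the text.
ok_sign = GesturePhrase([Phase.PREPARATION, Phase.STROKE, Phase.RETRACTION])
```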

Gesticulation

McNeill (1992) concluded that there is no body "language", but that instead gestures complement spoken language. In Kendon's (1980) words: the phrases of gesticulation that co-occur with speech are not to be thought of either as mere embellishments of expression or as by-products of the speech process. They are, rather, an alternate manifestation of the process by which ideas are encoded into patterns of behaviour which can be apprehended by others as reportive of those ideas. Such hand movements convey, voluntarily but also involuntarily, extra information beyond speech about the internal mental processes of the speaker. Obviously, McNeill is concerned with gestures similar to gesticulation as defined in Kendon's continuum. McNeill supports his conclusion by finding that gesticulation-type gestures have the following non-linguistic properties:

Due to the lack of linguistic features of gesticulation, such gestures cannot be analysed with the tools developed for studying spoken language, and caution must be taken when summarizing the pragmatics, semantics and syntax of such gestures. McNeill's arguments for the hypothesis that gesture and spoken language form a single system are based on the following findings:

Sign Language

Examples of sign languages are American Sign Language (ASL) and the Deaf and Dumb Language. ASL developed in part as an amalgam with French Sign Language. Other systems of formally coded hand and arm signals include pidgin and creole sign systems, and a gesture language used by the women of the Warlpiri, an Aboriginal people living in the north central Australian desert.

In ASL, the prevalent form of signing consists of unilateral or bilateral series of movements, usually involving the whole arm. Typically, a particular hand shape is moved through a pattern in a location specified with respect to the body. Each sign roughly corresponds to a concept such as a thing or an event, but there is not necessarily an exact equivalence with English words or with words of any spoken language. A native sign language like ASL is therefore quite different from a manual depiction of spoken language, such as signed English, or from co-verbal gesticulation.

A manual sign is claimed to be distinguished from other signs by 4 features (Stokoe, 1980):

Sign languages exhibit language-like properties since they are used by people who have to communicate symbolically encoded messages without the use of the speech channel; McNeill (1992) lists a number of such properties. In sign languages meaning can be modulated, e.g. by the addition of emotional expression, by varying a number of parameters of the movement (personal communication with sign language instructors; Klima & Bellugi, 1979).

Ergotic versus Semiotic and Emotions

The relation between the semiotic and ergotic functions of hand movements, how they differ, and their possible (mutual) dependencies in terms of neuro-motor systems have barely been researched. Kimura (1993) suggests that manual praxis is essential for signing, although manual praxis and signing are not identical. This suggests a model where gestural communication occupies a higher level in the hierarchy of systems involved in the creation of hand movements. As discussed in the section on ergotic hand movements, it appears that the semiotic function is always present, whether consciously intended or not. This can be explained by remembering that communication requires a sender and a receiver: the receiver can always decide to interpret signals that the sender has sent unintentionally. Emotion functions in human communication as a means for modulating the semiotic content, for example by adding emphasis. The expression of emotions during ergotic movements adds some semiotic content to the actions, so that the hand movements are interpreted differently by an observer. This mechanism may be used, either by the sender (by amplifying the emotional content, such as dropping items to draw attention to an issue or a problem) or by the receiver (by amplifying the focus on emotional content, such as the initial remarks in a conversation like "you seem rather tense today ... anything wrong?"), to initiate a communication with more semiotic content, usually of a verbal nature. Perhaps emotions could therefore be seen as a bridge between ergotic and semiotic movements.

Handedness

As far as cortico-spinal systems are concerned, arm, hand and finger movements are controlled contralaterally, while arm and shoulder movements may also be controlled ipsilaterally, i.e. proximal movements can be controlled contra- as well as ipsilaterally, while distal movements are only controlled contralaterally. The left hemisphere is specialized for complex movement programming, i.e. manual praxis. Consequently, movements involving identical commands to the two limbs, i.e. motor commands resulting in mirror-image movements, whether temporally coinciding or not, are more frequent in natural gestural communication. In contrast, different, simultaneously expressed hand postures occur frequently due to the more disparate control of the distal musculature. The fact that the manual praxis system is left-hemisphere based is also thought to be the origin of the prevalent right-handedness, and is supported by findings such as the preference of signers to use the right hand if only one hand can be used, and the correlation between sign language aphasia, manual apraxia and left hemisphere damage. Although the left hemisphere is essential for selecting movements, the left hand is thought to be better at executing independent finger movements than the right hand (Kimura, 1993).

Since the human brain exhibits different capabilities for the right and left hemispheres, differences can be expected between the capabilities of the left and right hand. This is not as noticeable at the physical level, i.e. in ergotic hand movements, but more apparent in communicating with the hands. Our left hand, it can be speculated, may perform better in expressing holistic concepts and dealing with spatial features of the environment, while our right hand may perform better in communicating symbolic information.

Napier (1993) discusses handedness and suggests that the dominance of right-handedness (on the order of 90% of people are right-handed) gradually evolved from a slight left-handedness in nonhuman primates, perhaps under the influence of social and cultural pressures rather than because the capabilities of the right hand supersede those of the left hand. Although exceptional, some humans are known to be almost perfectly ambidextrous.

Other Modalities

From the above, a clear relation can be seen between gestures and speech as well as body posture. Napier (1993) explicitly includes facial expressions and bodily movements when examining the use of gestures as:

It is well known that sign language can be almost entirely replaced by facial expressions. In musical conducting, the overall body posture and dynamics are considered by many to be an integral part of the expression of the conductor. A more formalized conducting method (Saito, 199?) prescribes that only gestures of the upper body are allowed.

Human communication consists of a number of perceptually distinguishable channels which operate through different modalities. It appears that a single system underlies this communication, a system which can direct some aspects of the expression through one modality while using other modalities for other aspects. Each modality has its intrinsic limitations, due for example to constraints of a musculo-skeletal and neuro-motor nature. In co-verbal gesticulation, for instance, the structured content of the expression is present in linguistic forms; both hand gestures and speech can be used to express such linguistic forms. Further research is needed to establish in more detail how these aspects of human communication are distributed amongst the various modalities and under which conditions. It can easily be observed that, in humans with speaking ability, speech is the most efficient channel for structural aspects of expression (relating concepts and identifying how), while temporo-spatial and visual representation aspects (identification of where, when, which and what) are more easily conveyed through hand gestures. Perhaps posture and facial expression, as well as gestures, are best applied for the communication of emotional aspects (the modulation of concepts).


Hand Gestures for HCI

Definitions

In the HCI literature the word gesture has been used to identify many types of hand movements for the control of computer processes. Perhaps to avoid confusion, Sturman (1993) defines whole hand input as the full and direct use of the hand's capabilities for the control of computer-mediated tasks, thereby indicating more precisely which type of human movement is involved (i.e. not just positioning but also hand shape) as well as for which purpose it is applied. By using the word input the association is made with information theory, conceiving of the hand as a device which outputs information to be received and interpreted by another device. Although the link is made with the semiotic function of hand movements, the hand is really a communication device, i.e. it can both receive and send information. The limitation to capabilities of the hand only, however, excludes the suggestion that semiotic hand movements should be considered part of an integrated system for human expression. In fact, "capabilities of the hand" strongly suggests a focus on musculo-skeletal capabilities, since those are literally the capabilities of the hand; as such the definition overlooks the neuro-motor systems and cognitive abilities involved in hand movements. Somewhat more appropriate, hand centred input, as a shorthand for human computer interaction involving hand movements, emphasizes the context as created by other modalities. However, it is not specific as to the type of hand movements: gesturing with a mouse, empty-handed gestures and movements with a joystick are all included.

None of these terms addresses the distinction between semiotic and ergotic hand movements, let alone refers to the existence of epistemic hand movements. In the following, the focus will be on the use of hand movements to their full extent in computer-mediated tasks, i.e. with as few limitations imposed on the hand movements as the current state of the art in movement tracking technology permits. Such applications of hand movements will be called whole hand centred input. To include a reference to epistemic hand movements, input could be replaced by communication. A hand gesture interface will mean an HCI system employing whole hand centred input which specifically exploits the semiotic function of hand movements. Certain applications may, however, not exploit the full capabilities of the hand; mouse gesturing and joystick control, amongst others, will not be examined.
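As an illustration of what whole hand centred input asks of an implementation, the following toy sketch strings together the processing stages named in the summary (movement tracking, segmentation, feature description, identification). It is written under simplifying assumptions; all function names and the rest-posture segmentation criterion are hypothetical and not taken from any cited system.

```python
# A hypothetical sketch of a hand gesture interface pipeline:
# track -> segment -> describe -> identify. Illustrative only.
from typing import Callable, Iterable, List, Sequence

HandSample = Sequence[float]        # raw tracker output for one time step


def segment(stream: Iterable[HandSample],
            is_rest: Callable[[HandSample], bool]) -> List[List[HandSample]]:
    """Split the continuous movement stream into candidate gestures,
    here naively delimited by rest postures."""
    segments, current = [], []
    for sample in stream:
        if is_rest(sample):
            if current:
                segments.append(current)
                current = []
        else:
            current.append(sample)
    if current:
        segments.append(current)
    return segments


def describe(gesture: List[HandSample]) -> List[float]:
    """Reduce a segment to a fixed-length feature vector
    (toy version: the mean of each tracked parameter)."""
    n = len(gesture)
    return [sum(col) / n for col in zip(*gesture)]


def identify(features: List[float], templates: dict) -> str:
    """Nearest-template classification of the feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda name: dist(features, templates[name]))
```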

Prior Art

The following applications that mainly exploit the ergotic function of hand movements have been found in the literature:

Other possible applications not found in the literature include sound design, stage sound mixing and game playing.

The following applications that mainly exploit the semiotic function of hand movements have been found in the literature:

Other possible applications not found in the literature include airplane and other traffic control, game scoring and playing, as well as stock trading and legal transactions (Hibbits, 1995).

Often applications implement only the use of hand signs or postures. In the case of Pook (1995), the use of hand signs to indicate to the robot that it should execute a movement pattern to fulfill a task is questionable, since such a mapping could be implemented more easily using simple function keys on a keyboard. The value of the approach is that it allows the user to act more naturally, since no cognitive effort is required in mapping function keys to robotic hand actions. Applications involving navigation essentially implement deictic gestures. The applications involving sign language are technically impressive but, since they basically implement a nicely formalized system of human communication of use to a relatively small (but important) group of people only, they do not provide us with many clues as to the interpretation, definition or modeling of gestures in computer-mediated tasks.

The research into the use of hand gestures with speech (or speech with gestures) has gained special attention. Cavazza (1995), Wexelblatt (1995), Hauptmann & McAvinney (1993), Sparrell (1993) and Cassell et al. (1994) all implement gestural communication theory as developed by McNeill, Kendon, Birdwhistell and others. Sparrell (1993) implemented a system for the interpretation of co-verbal iconic gestures as defined by McNeill (see above).

Multimodal interaction, involving not only hand gestures and speech but also facial expressions and body posture, is another distinguishable subject, researched by Gao (1995), Maggioni (1995), Hataoka et al. (1995) and Bohm (1995) amongst others. These research efforts are generally rather poor from a human behaviour researcher's point of view: they simply build a system able to detect each modality, whose information is then cross-checked against the other modalities, so that each modality is basically interpreted as communicating the same information. Little or no effort is made to integrate the system's abilities using a somewhat sophisticated model of human expression, e.g. one where each modality is deemed to express specific aspects of the expression, as discussed above, which cannot simply be used to confirm the correct interpretation of another modality.
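The following toy sketch illustrates the distinction being drawn: instead of using each modality merely to confirm the same information, an integrated interpretation lets each modality contribute the aspects it conveys best. All names and the particular assignment of aspects to modalities are illustrative assumptions, not a description of any cited system.

```python
# Illustrative sketch: composing complementary aspects from several
# modalities instead of cross-checking identical ones.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Utterance:
    action: Optional[str] = None                      # structural aspect, e.g. from speech
    referent: Optional[str] = None                    # which/what, e.g. from deictic gesture
    location: Optional[Tuple[float, float, float]] = None   # where, e.g. from pointing
    emphasis: float = 0.0                             # emotional modulation, e.g. from the face


def integrate(speech: dict, gesture: dict, face: dict) -> Utterance:
    """Compose complementary aspects of one expression from three modalities."""
    return Utterance(
        action=speech.get("verb"),
        referent=gesture.get("referent") or speech.get("object"),
        location=gesture.get("target_position"),
        emphasis=face.get("arousal", 0.0),
    )


# "Put that there", with the hand supplying the what and the where:
print(integrate({"verb": "put"},
                {"referent": "blue block", "target_position": (0.4, 0.1, 0.0)},
                {"arousal": 0.2}))
```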

Hand Gesture Interface Design

Using the above classification of and investigation into hand movements, we can now proceed with evaluating and analysing how hand movements have been used in processes mediated through computers. Such an evaluation is of interest since the following problems in the design and implementation of whole hand centred input applications have remained unsolved:

In general, many applications have been technology driven and not based on knowledge of human behaviour and/or a proper task analysis. In many cases no consideration is given to the different functions of hand movements, in particular the semiotic and ergotic functions.

From the many aspects discussed in the foregoing, some basic comments can be made for the better design of hand gesture interfaces:

Standard Hand Gestures

Many applications can be criticized for their idiosyncratic choice of hand gestures or postures to control or direct the computer-mediated task (Baudel, Harrison). However, the choice was probably perfectly natural for the developer of the application! This shows the dependence of gestures on their cultural and social environment. For specialized, frequent tasks, where the learning of a particular set of gestures and postures is worth the investment, such applications may have value. In everyday life, however, it is quite unlikely that users will be interested in a device for which they have to learn some specific set of gestures and postures, unless the adoption of such a gestural protocol offers an obvious increase in efficiency or ease of use over existing methods of hand centred input. On the other hand, the economics of the marketplace may dictate such a set independently of its compatibility with existing cultural and/or social standards, just as the keyboard and mouse have set a standard. Especially when users are allowed to expand or create their own sets, such a protocol may gain some acceptance.

The problem in defining a possible standard for a gesture set is that it is very easy to let the graphical user interface dictate which type of hand movements is required to complete certain computer-mediated tasks. However, the computer can be programmed to present tasks in a variety of graphical and auditory ways. The aim is to make the computer and its peripherals transparent, meaning that the tasks are presented in such a way that their execution is most natural. Unfortunately, while the concept of naturalness may apply to ergotic hand movements, semiotic hand movements preclude such naturalness for all of humankind, unless the system knows to which culture the user belongs.

Zimmerman, in a personal communication, suggested that only seven parameters are needed to describe hand movements for the bulk of the applications: hand position and orientation (three parameters each) and hand grip aperture, where the last parameter could be just binary, i.e. open/closed. Such a standard puts a lot of emphasis on only the ergotic function of hand movements and will severely restrict access to the semiotic function. Augustine Su (1994) suggests touching, pointing and gripping as a minimal set of gestures that need to be distinguished. These at least include a reference to the tasks involved, but they do not make any reference to the semiotic function of hand movements. An analysis of sign language is needed to extract the very basic gestures and postures that minimize the amount of learning required of the user. An initial parametrization is proposed by Stokoe (1980), as discussed above. Interestingly, his 4 features are mostly body-centred, which suggests that for the semiotic function a body-centred coordinate system is more appropriate when analysing gestures. For ergotic functions, in contrast, a world-based coordinate system seems more obvious.
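As a concrete reading of Zimmerman's suggestion, the following sketch (field names are assumptions made for illustration) captures the seven parameters: hand position (three), hand orientation (three) and grip aperture (one), with the aperture optionally reduced to a binary open/closed flag.

```python
# A sketch of the seven-parameter hand description reported above.
# Field names and the 0.5 threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class MinimalHandState:
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float
    aperture: float          # 0.0 = closed fist ... 1.0 = fully open

    @property
    def grip_closed(self) -> bool:
        """Binary version of the seventh parameter (open/closed)."""
        return self.aperture < 0.5
```

As the text notes, such a reduction mainly serves the ergotic function (reaching, orienting, grasping) and discards the hand shape information on which much of the semiotic function depends.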

These suggestions for standards may be combined if some way is found to recognize the relevance of either the ergotic or the semiotic function, or both, during a specific hand movement. Perhaps the analysis of emotional behaviour can provide clues. For a gesture set to gain major acceptance in the marketplace, it is advisable to examine the tasks and semiotic functions most frequently executed, and then choose a hand gesture set that appears natural, at least to a number of different people within a social group or even a culture, when executing those tasks and functions. Simple market economics will then do the rest.


References

Augustine Su, S. and Richard Furuta (1993). A Logical Hand Device in Virtual Environments. Virtual Reality Software & Technology: Proceedings of the ACM VRST'94 Conference, edited by Gurminder Singh, Steven K. Feiner & Daniel Thalmann, pages 33-42, World Scientific Publishing Co., Singapore, 1994 (Singapore, August 23-26, 1994)

Augustine Su, S. and Richard Furuta (1994). A Specification of 3D Manipulations in Virtual Environments. ISMCR'94: Topical Workshop on Virtual Reality, Proceedings of the Fourth International Symposium on Measurement and Control in Robotics, pages 64-68, NASA Conference Publication 10163, November 1994 (Houston, Texas, November 30 - December 3, 1994)

Augustine Su, S. (1993). Hand Modeling in Virtual Environment. Master Scholarly Paper (No presentation) Department of Computer Science, University of Maryland, College Park, Maryland, 1993

Augustine Su, S. and Richard Furuta (1993). The Virtual Panel Architecture: A 3D Gesture Framework Proceedings of the 1993 IEEE Virtual Reality Annual International Symposium (VRAIS'93) pages 387-393, IEEE, 1993 (Seattle, Washington, September 18-22, 1993)

Augustine Su, S., Furuta, R. (1994). A logical hand device in virtual environments. Conference proceedings VRST 94. Available through anonymous ftp.

Baudel, T., Beaudoin-Lafon, M. (1993). Charade: Remote control of objects using free-hand gestures. Communications of the ACM 36(7) p29-35.

Baudel, Thomas (1991). Spécificités de l'interaction gestuelle dans un environnement multimodal IHM'91, p. 11-16, 1991

Baudel, Thomas (1994). A Mark-Based Interaction Paradigm for Free-Hand Drawing ACM-SIGGRAPH & SIGCHI, Proc. ACM Symposium on User Interface Software and Technology (UIST), 1994

Baudel, Thomas and Annelies Braffort (1993). Reconnaissance de gestes de la main en environnement réel. EC2, L'interface des mondes réels et virtuels, Montpellier, France.

Baudel, Thomas and Beaudouin-Lafon, Michel and Annelies Braffort and Daniel Teil (1992). An Interaction Model Designed for Hand gesture Input. Technical report no. 772, LRI, Université de Paris-Sud. Available through anonymous ftp.

Baudel, Thomas and Yacine Bellik and Jean Caelen and Chatty, Stéphane and Joelle Coutaz and Francis Jambon and Solange Karsenty and Daniel Teil (1993). Systèmes d'analyse des interactions Homme-Ordinateur. IHM'93.

Bergamasco, M. (1994). Manipulation and exploration of virtual objects. Magnenat-Thalmann, N., Thalmann, D., Artificial life and virtual reality. Wiley.

Bohm, K.; V. Kuehn, J. Zedler (1995). Multimodal interaction in virtual environments. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Bohm, K.; Vaananen, K. (1993). Gesture driven interaction as a human factor in virtual environments - an approach with neural networks. In: Earnshaw, R. (ed) Virtual reality systems, p 93-107. London, UK: Academic press.

Brooks, F. (1988). Grasping reality through illusion: Interactive graphics serving science. Proceedings CHI '88 Conference - Human Factors in Computing Systems, p 1-11. New York, USA: ACM.

Cadoz, C. (1994). Les realites virtuelles. Dominos, Flammarion.

Cassell, J.; M. Steedman, N.I. Badler, C. Pelachaud, M. Stone, B. Douville, S. Prevost, B. Achorn (1994). "Modeling the Interaction between Speech and Gesture", Proceedings of the 16th Annual Conference of the Cognitive Science Society, Georgia Institute of Technology, Atlanta, USA.

Cavazza, M. (1995). Integrated semantic processing of speech and gestures. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Coutaz, J., Crowley, J. (1995). Interpreting human gesture with computer vision. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Feiner, S.; Beshers, C. (1990). Worlds within worlds: metaphors for exploring n-dimensional virtual worlds. Proceedings User Interface Software and technology '90, p 76-83. New York, USA: ACM.

Fels, S. Sidney (1990). Building adaptive interfaces with neural networks: the glove-talk pilot study. Technical report CRG-TRG-90-1. University of Toronto, Toronto, Canada.

Fels, S. Sidney; Hinton, Geoffrey E. (1990). Building Adaptive Interfaces with Neural Networks: the Glove-Talk Pilot Study Human Computer Interaction - INTERACT '90, D. Diaper et al (editors), IFIP, pp 683-688. Elsevier Science Publishers Amsterdam, NL

Fels, S.S., (1994). Glove-Talk II: Mapping hand gestures to speech using neural networks - An approach to building adaptive interfaces. PhD thesis, University of Toronto, Toronto, Canada ftp cs.toronto.edu in pub/ssfels/phdthesis.short.ps.Z

Fels, S.S., Hinton, G.E., (1993). Glove-Talk: a neural network interface between a data-glove and a speech synthesizer. IEEE Transactions on Neural Networks, 4 (1): 2-8.

Fels, S.S., Hinton, G.E., (1994). Glove-Talk II: Mapping hand gestures to speech using neural networks. Proceedings of the Conference on Neural Information Processing Systems (NIPS).

Gao, W. (1995). Enhancement of human-computer interaction by hand gesture recognition. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Gao, W. (??) On human body language understanding. ???

Gao, W., Brooks, R. (??). Hand gesture recognition for enhanced human computer interaction. ???

Harrison, D., Jaques, M., Strickland, P. (1993). Design by manufacture simulation using a glove input. In: Warwick, K., Gray, J. and Roberts, D., Virtual reality in engineering. UK: The institution of electrical engineers.

Hataoka, N., Ando, H. (1995). Prototype development of multimodal interfaces using speech and pointing gestures. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Hauptmann, A.G.; McAvinney, P. (1993). Gestures with speech for graphic manipulation. International Journal of Man-Machine Studies Vol: 38 Iss: 2 p. 231-49.

Hibbits, B.J. (1995). (no title) Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Katkere, Arun, Hunter, Edward, Kuramura, Don, Schlenzig, Jennifer, Moezzi, Saied, Jain, Ramesh . ROBOGEST: Telepresence using Hand Gestures. Technical Report VCL-94-104, Visual Computing Laboratory, University of California, San Diego, December 1994. PostScript version.

Kendon, A. (1988). How gestures can become like words. In Potyatos, F. (ed), Crosscultural perspectives in nonverbal communication, p 131-141. Toronto, Canada: Hogrefe.

Kendon, A. (1980). Gesticulation and speech: two aspects of the process of utterance. In: Key, M.R., The relation of verbal and nonverbal communication. The Hague, The Netherlands: Mouton.

Kimura, D. (1993). Neuromotor mechanisms in human communication. Oxford, UK: Oxford University Press.

Klima, E. & Bellugi, U. (1979). The signs of language. Cambridge, MA, USA: Harvard university press.

Kramer, J., Leifer, L. (1987). The "Talking Glove": An expressive and receptive "verbal" communication aid for the deaf, deaf-blind, and nonvocal. In: Murphy, H.J., Proceedings of the third annual conference "computer technology / special education / rehabilitation", California state university, Northridge, October 15-17, 1987, p335-340.

Kramer, J., Leifer, L. (1989). The "Talking Glove": A speaking aid for nonvocal deaf and deaf-blind individuals. RESNA 12th Annual conference, New Orleans, Louisiana, USA, p471-472.

Kramer, J., Leifer, L. (1990). A "talking glove" for nonverbal deaf individuals. Technical report CDR-19900312, Center for design research, Stanford university, CA, USA.

Kramer, J.F. (1995). The CyberGlove and its many uses as a gestural input device. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Lederman, S.J. & Klatzky, R.L. (1987). Hand movements: A window into haptic object recognition. Cognitive Psychology, 19, 342-368.

Machover, T. & Chung, J. (1989). Hyperinstruments: Musically intelligent and interactive performance and creativity systems. Proceedings International Computer Music Conference, Columbus, Ohio, USA. San Francisco, CA, USA: International Computer Music Association.

Maggioni, C. (1995). Gesture computer. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Malek, R., Harrison, S. and Thieffry, S. (1981). Prehension and gestures. In: Tubiana, R., The hand. Philadelphia, USA: Saunders.

McNeill, David; E.T. Levy, L.L. Pedelty (1990). Speech and gesture. In: Cerebral control of speech and limb movements, Hammond, G.E. (editor), pp 203-256. Elsevier Science Publishers, Amsterdam, The Netherlands.

McNeill, D. (1992). Hand and mind: what gestures reveal about thought. Chicago, USA: University of Chicago Press.

Morita, H., Hashimoto, S. & Ohteru, S. (1991). A computer music system that follows a human conductor. IEEE Computer, July, p 44-53.

Mulder, A.G.E. (1994). Virtual Musical Instruments: Accessing the sound synthesis universe as a performer.

Mulder, A.G.E. (1994). Human Movement Tracking Technology.

Napier, J.R. (1993). Hands. Princeton, N.J.: Princeton University Press.

Nespoulos, J.L., Roch Lecours, A., (1986). Gestures: nature and function. In: Nespoulos, J.L., Perron, P., Roch Lecours, A., The biological foundations of gestures: motor and semiotic aspects, p 49-62. Hillsdale, New Jersey, USA: Lawrence Erlbaum Associates.

Papper, M.J., Gigante, M.A. (1993). Using gestures to control a virtual arm. In: Earnshaw, R., Virtual reality systems. UK: Academic press.

Pook, P.K.; D.H. Ballard (1995). Teleassistance: A gestural sign language for teleoperation. Position paper for the workshop gesture at the user interface, CHI 95, Denver, CO, USA, May 1995.

Pressing, J. (1991). Synthesizer performance and real-time techniques. Madison, WI, USA: A-R editions.

Saito (). The Saito conducting method.

Sparrell, C.J. (1993). Coverbal iconic gesture in human-computer interaction. Msc. Thesis, Media arts and sciences, MIT. Available through anonymous ftp.

Starner, T.E. (1995). Visual recognition of American Sign Language using hidden Markov models. MSc thesis, MIT Media Lab. Available through anonymous ftp.

Stokoe, W.C. (1980). Sign language structure. Annual review of anthropology, 9, 365-390.

Sturman, D.J. (1992). Whole Hand Input. Ph.D. Thesis. [Available via anonymous ftp at media-lab.mit.edu, ./pub/sturman/WholeHandInput]. Cambridge, MA: Massachusetts Institute of Technology.

Sturman, D.J., and Zeltzer, D. (1994). A Survey of Glove-Based Input. IEEE Computer Graphics and Applications, 14 (1) (january), 30-39.

Sturman, D.J., Zeltzer, D. (1993). A design method for "whole-hand human computer interaction". ACM transactions on information systems, 11(3), p219-238.

Sturman, D.J., Zeltzer, D., Pieper, S. (1989). Hands-on interaction with virtual environments. Proceedings ACM SIGGRAPH symposium on user interface software and technology, Williamsburg, VA, USA, 13-15 november 1989, p19-24.

Watson, Richard (1993). A Survey of Gesture Recognition Techniques. Technical Report TCD-CS-93-11, Department of Computer Science, University of Dublin, Trinity College, July 1993.

Weimer, D., Ganapathy, S.K. (1992). Interaction techniques using hand tracking and speech recognition. In: Blattner, M.M., Dannenberg, R.B. (eds), Multimedia interface design. New York, NY, USA.


Appendix A: Descriptions of Hand Movements

Goal directed manipulation

Empty-handed gestures

twiddle, wave, snap, point, hand over, give, take, urge, show, size, count, wring, draw, tickle, fondle, nod, wriggle, shrug

Haptic exploration

touch, stroke, strum, thrum, twang, knock, throb, tickle, strike, beat, hit, slam, tap, nudge, jog, clink, bump, brush, kick, prick, poke, pat, flick, rap, whip, hit, slap, struck, caress, pluck, drub, wallop, whop, thwack, rub, swathe