TOMI: A Children's Tangible Online Musical Instrument


Concept and Introduction

Communication through touch, the creation of sounds, and the grasping of objects is a fundamental activity between a young child and a parent. TOMI extends this activity into a shared online medium. TOMI is an online musical toy for children and parents that allows for the creation and sharing of sound events and accompanying visualizations. The goal of this product is to allow for a new form of interaction between parent, child, and family. This form of interaction would not only mimic current activities but would also use the unique strengths of the medium to create a new experience. The expected result is a collaborative environment where children, parents, and family can create “music” and animations as an aid to child development.

TOMI consists of a physical device, the toy, connected to a computer, and an accompanying web site. The web site serves as a means for sharing and further creation. The device consists of three brightly coloured soft toys of various sizes connected to a computer, which allow a child and her parent to compose a series of sounds through play and then share them with family and friends via the internet. The toys control a sophisticated interaction that is hidden from both parent and child. By grasping, shaking, and throwing the toys, the child manipulates pre-created sound events and their corresponding symbols. The screen interface forms a semantic relationship, that is, a metaphorical meaning, between each sound and its representative animated symbol.

The web site serves as a means to record these play activities. It shares the same interface concept, but its focus is more on sharing than on high-level music creation. During the use of TOMI, parents have the option of saving the play session's sound and visual composition. This saved session can then be automatically uploaded to the web site to allow for sharing and repeat performance away from the TOMI device. Parents can download the saved play session for their child to view at another time. These saved sessions can also be shared among family and friends. The web site also allows family and friends (or potentially parents at work) to create unique sessions for the child to view. This could allow for a form of remote play for those unable to be there for the “live” play session.

The Interface and the Design of Sounds
There are a number of ways in which to represent and design sounds in TOMI.
Barrass (1997) describes seven approaches in his Ph.D. thesis. Two of them, the semantic and the connotative, are a useful starting point for this project; the others he describes include the pragmatic, perceptual, task-oriented, and device-oriented approaches.

“The semantic approach focuses on the metaphorical meaning of the sound. The connotative method is concerned with the cultural and aesthetic implications of the sound.” (Barrass 1997).

“Composers and sound designers are concerned with aesthetic, affective connotations…[of sounds]. These connotations reflect the value that society places on both the signifier and the signified in a sign.” The connotations implied by the sounds are likely to influence how well TOMI is received by its users. Positive connotations may encourage use of the toy and more play.

The semantic approach focuses on what is signified by the sound. This is where the relationship between sound, toy, and visualizations can be seen. The following is a general discussion of this approach.

Gaver’s (1997) examination of two different strategies for the usage of sound provides a simplified view. One possibility he discusses is the creation of sounds whose origins are analogous to what they represent. These are characterised as auditory icons and are based on everyday sounds. The other strategy, contrary to the first, is to use sounds that are arbitrarily linked to what they represent. “Earcons” is a commonly used term for these symbolic sounds. An earcon is a musical sound that can be created from any sound source. A compromise between the iconic and symbolic strategies produces metaphoric or “representational” (Brewster 1998) sounds, which share an abstract feature with what they represent. An example of a sound metaphor would be to use a high-pitched sound to represent a high spatial position on the screen. Using a low sound pressure to represent a far distance would be an iconic relationship, as this is what would happen in the real world. The boundaries between these three types can be difficult to distinguish.
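The two example mappings just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the frequency range of 220 to 880 Hz, and the use of a simple inverse-distance law for loudness are my own choices for the sketch and are not taken from Gaver or Brewster.

```python
def pitch_for_height(y_norm, low_hz=220.0, high_hz=880.0):
    """Metaphoric mapping: a higher screen position gives a higher pitch.

    y_norm is 0.0 at the bottom of the screen and 1.0 at the top.
    The frequency range is an assumed, illustrative choice.
    """
    y = min(max(y_norm, 0.0), 1.0)  # clamp to the visible screen
    # Interpolate on a logarithmic scale so equal steps in height
    # correspond to equal musical intervals.
    return low_hz * (high_hz / low_hz) ** y


def gain_for_distance(distance, reference=1.0):
    """Iconic mapping: a source that is farther away sounds quieter.

    Uses an inverse-distance law relative to an assumed reference
    distance; anything closer than the reference plays at full gain.
    """
    d = max(distance, reference)
    return reference / d
```

The first function is a metaphor because pitch and height share only an abstract "high/low" feature; the second is iconic because it imitates how sound pressure actually falls off with distance.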

There are a number of advantages to using auditory icons in interfaces. Gaver (1997) states that iconic sounds are easily learned and remembered, though they may not be entirely intuitive. When auditory and visual interfaces are combined, it is ideal to have both kinds of icon rely on the same analogy, thereby producing the most comprehensive hybrids. In Gaver’s (1989) Sonic Finder for Macintosh computers, auditory icons were grouped into “feedback families”. The interface used “wooden” sounds to indicate interaction with text files. Selecting a text file makes a tapping sound, erasing it creates a splintering effect, and so on. Another possibility introduced in the Sonic Finder was the parameterising of auditory icons, where one sound quality or parameter corresponds to a feature of the selected object. In the Sonic Finder, the pitch of the text file sound is parameterised to indicate size, so that selecting a large text file results in a tapping sound with a lower pitch and selecting a small text file results in a higher pitch.
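As a rough illustration of a parameterised auditory icon, the sketch below maps file size to the pitch of the tapping sound so that larger files sound lower. It is not the Sonic Finder's actual implementation; the size limits, the pitch range, the logarithmic scaling, and all names are assumptions made for this example.

```python
import math

# A "feedback family": one material (wood) shared by all actions on text files.
WOODEN_FAMILY = {"select": "tap", "erase": "splinter"}


def tap_pitch_hz(file_size_bytes, min_size=1_000, max_size=10_000_000,
                 low_hz=200.0, high_hz=1600.0):
    """Return a tapping pitch that falls as the file gets larger.

    All numeric defaults are hypothetical values chosen for illustration.
    """
    size = min(max(file_size_bytes, min_size), max_size)
    # Position of the file on a log scale: 0.0 = smallest, 1.0 = largest.
    t = (math.log(size) - math.log(min_size)) / (
        math.log(max_size) - math.log(min_size))
    # Invert the direction so that large files land at the low end.
    return high_hz * (low_hz / high_hz) ** t
```

The sound family tells the listener what kind of object is involved, while the single parameter (pitch) carries one property of that object.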

A possible problem with auditory icons is that it may prove difficult to find suitable iconic sounds for all events in an interface, as some events may not correspond to a sound-producing event in the real world. Compromises would produce inconsistency. Norman (1990) stresses the importance of developing a conceptual model that is understandable to the user. A well-designed interface should require as few interpretation strategies as possible. Another problem is that the sounds from the most realistic mappings might not be suitable, because the user might confuse the sounds of the interface with the everyday sounds around her. In large interface sets it might prove challenging to find enough varied sounds that do not interfere with one another or with the environment in which they will be used.

Symbolic sounds are arbitrarily mapped to what they represent and are not limited by any need for similarity. Every earcon can be designed with emphasis on its aesthetic qualities, which is difficult when developing auditory icons. Since these symbolic sounds can be freely designed, they open up greater possibilities for musical interfaces that may be more pleasant and less tiring. As well, music provides a sophisticated system for the manipulation of groups of sounds (Gaver 1997). When designing a musical interface, complex information can be conveyed by sounds that are parameterised in many dimensions.
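To make the idea of an earcon parameterised in several dimensions concrete, here is a minimal sketch. It assumes, purely for illustration, that TOMI might encode which toy is in use through timbre and starting pitch, the gesture through rhythm, and the force of the gesture through loudness. The specific timbres, rhythms, and MIDI note numbers are hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Earcon:
    timbre: str       # instrument-like sound quality
    rhythm: tuple     # note lengths in beats
    midi_note: int    # starting pitch as a MIDI note number
    loudness: float   # 0.0 (silent) to 1.0 (full volume)


# Hypothetical lookup tables: each dimension carries one piece of information.
TIMBRE_BY_TOY = {"small": "bell", "medium": "marimba", "large": "drum"}
NOTE_BY_TOY = {"small": 72, "medium": 60, "large": 48}
RHYTHM_BY_GESTURE = {
    "grasp": (1.0,),
    "shake": (0.25, 0.25, 0.25, 0.25),
    "throw": (0.5, 1.5),
}


def earcon_for(toy, gesture, intensity):
    """Combine the independent dimensions into a single earcon."""
    loudness = min(max(intensity, 0.0), 1.0)  # clamp to the valid range
    return Earcon(TIMBRE_BY_TOY[toy], RHYTHM_BY_GESTURE[gesture],
                  NOTE_BY_TOY[toy], loudness)
```

Because the mapping is arbitrary, each table can be tuned for how pleasant the result sounds without breaking any real-world analogy.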

In the process of building the body of work which has led to this proposal, I have come to articulate a set of desiderata for the creation of TOMI.

Easily understood, instantly responsive interfaces. Online and offline interfaces should be able to manipulate specific creative parameters in the pre-created sound objects. The online interface must be behaviour driven, creating new metaphors rather than relying on the traditional knobs, buttons, and keys. Traditional musical instruments generally suggest traditions and a seriousness that quickly preclude non-musicians from using them. The offline device's interface must allow for a kind of “purposeful purposelessness” (John Cage): a type of playing with visuals and sound that is in turn explorative and engaging, intuitive and enjoyable.

The system must allow for the recording and indexing of past performances and for a “snapshot” of the current event.

Future extensions should allow for the sharing of video of the play session.

While a computer screen may be the easiest display device, a television or projector would be preferred. A television allows the child to lie on the floor and play for longer periods. A projector would allow the animation to be focused on a ceiling or large space.

The device must have high-quality audio capabilities.
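The recording, indexing, and “snapshot” requirement listed among the desiderata above could be approached along the following lines. This is a sketch only: the class names, the fields stored per event, and the choice of JSON as the saved format are all assumptions made for illustration, and no upload to the web site is shown.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class SoundEvent:
    t: float        # seconds since the session started
    toy: str        # which toy was used (hypothetical label)
    gesture: str    # e.g. "grasp", "shake", "throw"


@dataclass
class PlaySession:
    session_id: str
    started_at: float = field(default_factory=time.time)
    events: list = field(default_factory=list)

    def record(self, t, toy, gesture):
        """Append one play event to the running session."""
        self.events.append(SoundEvent(t, toy, gesture))

    def snapshot(self):
        """Return the session so far as a JSON string."""
        return json.dumps(asdict(self), sort_keys=True)


class SessionIndex:
    """In-memory index of saved snapshots, keyed by session id."""

    def __init__(self):
        self._by_id = {}

    def add(self, session):
        # Store a frozen copy; later events do not change the saved state.
        self._by_id[session.session_id] = session.snapshot()

    def get(self, session_id):
        return json.loads(self._by_id[session_id])

    def list_ids(self):
        return sorted(self._by_id)
```

Storing the snapshot as text rather than keeping a reference to the live session means a saved state can be replayed or shared later without being altered by continued play.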


Barrass, Stephen. 1997. Auditory Information Design. Ph.D. thesis, Australian National University.

Brewster, Stephen A. 1998. “Using Non Speech Sounds to Provide Navigation Cues.” ACM Transactions on Computer-Human Interaction 5:3: 224–259.

Gaver, William W. 1989. “The Sonic Finder.” Human-Computer Interaction 4:1. Elsevier.

Gaver, William W. 1997. “Auditory Interfaces.” In Handbook of Human-Computer Interaction, 2nd ed., 1003–1041.

Kivy, Peter. 1984. “Representation and Expression in Music.” In Sound and Semblance. Princeton University Press.

Norman, Donald A. 1990. The Design of Everyday Things. New York: Doubleday.

Tufte, E.R. 1983. The Visual Display of Quantitative Information. Cheshire: Graphics Press.