From the SciAm article:
David Ostry [link], a neuroscientist with co-appointments at McGill University and the New Haven, Conn.-based speech center Haskins Laboratories, has spent years studying the relation between speech and the somatosensory system, the network of receptors in skin and muscle that reports information on tactile stimuli to the brain. In his most recent study, published in the Proceedings of the National Academy of Sciences USA, he and two Haskins colleagues found that subjects heard words differently when their mouths were stretched into different positions. The results have implications for neuroscientists studying speech and hearing, as well as for therapists looking for new ways to treat speech disorders. The study involved seventy-five young, hearing, American-English-speaking volunteers listening to computer-generated speech (a single word, derived from recordings of a human speaker saying either "head" or "had" and then subjected to various frequency modifications) and pressing a button to indicate which word they thought they heard.
Also, they were to do this while hooked up to this thing:
That thing is a robot that's been programmed to pull on those little plastic tabs to stretch the wearer's mouth in a certain way (Fig. 1, taken from Ito et al.).
Here's the PNAS article's description:

We programmed a small robotic device (Phantom 1.0, SensAble Technologies) to apply skin-stretch loads (Fig. 1). The skin stretch was produced by using small plastic tabs (2 x 3 cm), which were attached bilaterally to the skin at the sides of the mouth and were connected to the robotic device through thin wires. The wires were supported by wire supports with pulleys to avoid contact between the wires and facial skin. By changing the configuration of the robotic device and the wire supports, the facial skin was stretched in different directions.

(That last sentence is a good example of why the passive voice, while seemingly a good choice for scientists looking for a nice, impersonal way to write about methods without going "We did this, and then we did this and this and this" over and over again, can also be a grammatical and semantic minefield. I try never to write in passive voice without consciously asking myself, at least twice, what the subject of my sentence is!)
Anyway. I was writing about methods before my inner grammar Nazi interrupted, so let's get back to that.
The seventy-five volunteers were split into five groups of fifteen, each of which got a different sort of stimulation from the robot facehugger. Two of the groups were designated as control groups, which in this context means that the ways in which their faces were stretched bore no resemblance to any part of human speech:
We used a robotic device (Fig. 1) to create patterns of facial skin deformation that would be similar to those involved in the production of head and had. We tested 3 directions of skin stretch (up, down, backward) with 3 different groups of subjects. We also carried out 2 control tests in which we assessed the effects on speech perception of patterns of skin stretch that would not be experienced during normal facial motion in speech. One involved a twitch-like pattern of skin stretch; the other involved static skin deformation.
They found that, when participants' lips were pulled upward (like the motion used to shape the short "e" sound in "head"), they were more likely to hear the intermediate sounds as "head," whereas when their lips were pulled downward (like the short, nasal "a" in "had"), they were more likely to hear those same sounds as "had." Pulling the corners of the mouth straight back had no effect on which word they were likelier to hear.
One thing they didn't measure, but that I kind of wish they had, was whether the presence of extraneous sensory input (i.e., the skin stretching) had any effect on the participants' ability to register what they heard. The way the study was designed, all they could tell was how the "perceptual boundary" could be shifted one way or another --- not whether perception itself was disrupted, as I might expect it to be with such an intrusive competing stimulus!
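To make the "perceptual boundary" idea concrete: in this kind of identification task, responses along the head-to-had continuum typically trace out an S-shaped (logistic) curve, and the boundary is the continuum step where listeners say "had" 50% of the time. Here's a toy sketch in Python of how you'd estimate that boundary per condition and measure the shift between conditions. The numbers are made up for illustration, not taken from the paper, and the fitting method (a simple grid search over a logistic curve) is just one convenient choice.

```python
import math

def logistic(x, boundary, slope=1.5):
    """Probability of a 'had' response at continuum step x,
    for a curve centered at `boundary` (the 50% point)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))

def fit_boundary(steps, p_had, slope=1.5):
    """Estimate the 50% boundary by grid search:
    try candidate boundaries and keep the one whose logistic
    curve best matches the observed response proportions."""
    candidates = [b / 100.0 for b in range(100, 701)]  # 1.00 .. 7.00
    def sse(b):
        return sum((logistic(x, b, slope) - p) ** 2
                   for x, p in zip(steps, p_had))
    return min(candidates, key=sse)

# A hypothetical 7-step head-to-had continuum with invented
# response proportions (proportion of 'had' responses per step):
steps = [1, 2, 3, 4, 5, 6, 7]
p_down = [logistic(x, 3.5) for x in steps]  # downward stretch: more 'had'
p_up   = [logistic(x, 4.5) for x in steps]  # upward stretch: more 'head'

b_down = fit_boundary(steps, p_down)
b_up = fit_boundary(steps, p_up)
print(b_down, b_up, round(b_up - b_down, 2))  # boundary shift of 1.0 step
```

The point of the design is visible right in the numbers: the downward-stretch boundary sits closer to the "head" end of the continuum (more intermediate stimuli heard as "had"), and the difference between the two fitted boundaries is the shift the study reports, without saying anything about whether overall discrimination got worse.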
Then again, I can also see how a consistent effect that varies with the type of somatosensory stimulus would argue against any interference between the two sensory processes (i.e., hearing and touch), and would even suggest that they aren't separate processes at all.
That seems to be what the study's authors suggest:
The modulation of speech perception observed in the present study may arise as a consequence of facial somatosensory input to facial motor and premotor areas, a pattern that would be consistent with the motor theory of speech perception. However, somatosensory inputs may also affect auditory processing more directly. A number of studies have reported bidirectional effects linking somatosensory and auditory cortices. Indeed, activity due to somatosensory inputs has been observed within a region of the human auditory cortex that is traditionally considered unisensory.

That's the part that really sounds like synesthesia to me, albeit a somewhat weaker version where sensations of one type (say, a feeling of tension or pressure on facial skin) influence your interpretation of sensations of another type (say, sounds that may or may not be words), rather than creating those partnered sensations de novo, as happens in full-blown synesthesia.
Ito, T., Tiede, M., & Ostry, D. (2009). Somatosensory function in speech perception. Proceedings of the National Academy of Sciences, 106(4), 1245-1248. DOI: 10.1073/pnas.0810063106