Proposal for a PhD position beginning in September 2014:
“Bayesian model of the joint development of perception, action and phonology”
Context
The Speech Unit(e)s project focuses on the speech unification
process that associates the auditory, visual and motor streams in the human brain,
through an interdisciplinary approach combining cognitive psychology, neuroscience,
phonetics (both descriptive and developmental) and computational models. The
framework is provided by the “Perception-for-Action-Control Theory (PACT)”
developed by the PI (Schwartz et al., 2012).
PACT is a perceptuo-motor theory of speech communication that connects,
in a principled way, perceptual shaping and procedural motor knowledge in
multisensory speech processing. The communication unit in PACT is neither a sound nor
a gesture but a perceptually shaped gesture, that is, a perceptuo-motor unit. It
is characterised by both articulatory coherence – provided by its gestural
nature – and perceptual value – necessary for being functional. PACT considers
two roles for the perceptuo-motor link in speech perception: online unification
of the sensory and motor streams through audio-visuo-motor binding, and offline
joint emergence of the perceptual and motor repertoires in speech development.
Objectives of the PhD position
In the debates between auditory and motor theories of speech perception,
and in their modern revival concerning the role of the dorsal route (Hickok
& Poeppel, 2004, 2007), there has been no real reflection on the functional role
that perceptuo-motor coupling could play in speech perception. The “dorsal route” is
assumed to be useful in
“adverse conditions”, e.g. in noise or with a foreign language (Callan et al.,
2004; Zekveld et al., 2006). However, no theoretical explanation has yet been
proposed for this potential efficiency of motor processes in adverse
conditions.
We have recently developed a computational framework that makes it possible to compare
the predictions of auditory, motor and perceptuo-motor theories in various
kinds of situations (Moulin-Frier et al., 2012). Casting these theories into a
single, unified mathematical framework is an efficient way to compare the
theories and their properties in a systematic manner. Bayesian modeling is a
mathematical framework that precisely allows such comparisons. The trick is
that the same tool, namely probabilities, can be used both for defining the
models and for comparing them (see e.g. Myung & Pitt, 2009).
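As a toy illustration of how probabilities can serve both to define models and to compare them, the sketch below scores two hypothetical models of a binary data set by their marginal likelihoods and turns these scores into posterior probabilities over the models. The two models, the function names and the data are illustrative assumptions and are unrelated to COSMO itself.

```python
import math

def log_marginal_fixed(k, n, p=0.5):
    """Log-probability of one sequence with k successes in n trials, fixed rate p."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def log_marginal_uniform(k, n):
    """Same sequence, but with an unknown rate and a uniform prior on [0, 1].

    Integrating p**k * (1 - p)**(n - k) over p gives the Beta function
    B(k + 1, n - k + 1), computed here through log-gamma.
    """
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)

def model_posterior(k, n, prior_fixed=0.5):
    """Posterior probabilities [fixed-rate model, unknown-rate model] given the data."""
    log_weights = [
        math.log(prior_fixed) + log_marginal_fixed(k, n),
        math.log(1.0 - prior_fixed) + log_marginal_uniform(k, n),
    ]
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data sets: 5 and 9 successes out of 10 trials.
print(model_posterior(5, 10))
print(model_posterior(9, 10))
```

With balanced data the simpler fixed-rate model receives the higher posterior, whereas strongly skewed data favour the model with a free rate: the marginal likelihood penalizes flexibility that the data do not require.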
The generic model we developed is called COSMO, which stands for
"Communicating about Objects using Sensory-Motor Operations". The
COSMO acronym also represents the five variables around which the basic
structure of the model is built. In COSMO, communication (C) is a success when
an object O_S in the speaker’s mind is transferred, via sensory and
motor means S and M, to the listener’s mind, where it is correctly recovered as
O_L. COSMO assumes that a communicating agent, which is both a
speaker and a listener, internalizes the communication situation in an
internal model.
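One way to write this internal model down is as a joint probability distribution over the five variables, with O_S and O_L denoting the speaker's and the listener's object. The factorization below is a sketch that follows the communication chain described above (speaker's object, motor gesture, sensory signal, listener's object, success). It is believed to match the decomposition used in Moulin-Frier et al. (2012), but should be checked against that paper.

```latex
P(C, O_S, S, M, O_L) = P(O_S)\, P(M \mid O_S)\, P(S \mid M)\, P(O_L \mid S)\, P(C \mid O_S, O_L)
```

Under this reading, P(M | O_S) plays the role of a motor repertoire, P(S | M) of a sensory-motor link, P(O_L | S) of an auditory classifier, and P(C | O_S, O_L) states when communication counts as successful, for instance when O_S = O_L.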
The PhD project aims at developing COSMO in two major directions.
(1) Joint acquisition of perceptual and motor repertoires in a syllabic
framework. Experiments in COSMO have so far mainly concerned simple stimuli, e.g. in
abstract one-dimensional sensory-motor spaces, or with restricted vowel
samples. We will explore strategies for automatically learning to produce and
perceive complex sequences such as plosive-vowel CV sequences, which display
systematic coarticulation phenomena. Various kinds of exploration and learning mechanisms
are available from cognitive and developmental robotics (Moulin-Frier &
Oudeyer, 2012). Validation tests will be inspired by real data, e.g. on locus
equations for plosive acoustics (Sussman et al., 1998), robustness to
perturbations in production (Lindblom et al., 1979; Savariaux et al., 1995), or
the coupling of perceptual and motor idiosyncrasies.
(2) Comparison of auditory, motor and perceptuo-motor theories for speech
processing in various conditions. Once these perception and production
components are established in COSMO, we will compare auditory, motor and
perceptuo-motor speech perception theories in challenging conditions, such as
noise, speaker normalization, or foreign accent. We will test the ability to
develop a perceptuo-motor phonology from auditory and motor experience, e.g. to
acquire a category such as “plosive place of articulation” through the
discovery of perceptuo-motor links in learning. We will also test COSMO on
natural CV stimuli, exploiting natural multi-speaker corpora of CV sequences
for learning and perceptual tests.
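To make the planned comparison concrete, here is a minimal sketch of how the three families of theories could be expressed as inference in a discrete toy model. It assumes a COSMO-like chain in which the speaker's object selects a gesture M, the gesture produces a sensory signal S, and an auditory classifier maps S back to an object. All probability tables and function names are invented for illustration and are not taken from the COSMO papers.

```python
# Toy tables (illustrative assumptions): 2 objects, 3 gestures, 3 sensory values.
P_O = [0.5, 0.5]                    # prior over objects
P_M_GIVEN_O = [[0.8, 0.2, 0.0],     # motor repertoire: row = object, column = gesture
               [0.0, 0.2, 0.8]]
P_S_GIVEN_M = [[0.8, 0.2, 0.0],     # sensory-motor link: row = gesture, column = signal
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]]
P_OL_GIVEN_S = [[0.9, 0.1],         # auditory classifier: row = signal, column = object
                [0.5, 0.5],
                [0.1, 0.9]]

def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def auditory(s):
    """Auditory theory: read the object directly off the sensory signal."""
    return list(P_OL_GIVEN_S[s])

def motor(s):
    """Motor theory: ask which object's gestures would have produced the signal."""
    weights = []
    for o, prior in enumerate(P_O):
        likelihood = sum(P_M_GIVEN_O[o][m] * P_S_GIVEN_M[m][s]
                         for m in range(len(P_S_GIVEN_M)))
        weights.append(prior * likelihood)
    return normalize(weights)

def perceptuo_motor(s):
    """Perceptuo-motor theory (toy version): fuse both sources of evidence by product."""
    a, m = auditory(s), motor(s)
    return normalize([a[o] * m[o] for o in range(len(P_O))])

for s in range(3):
    print(s, auditory(s), motor(s), perceptuo_motor(s))
```

In this toy setting the perceptuo-motor posterior is the normalized product of the other two, which follows if communication succeeds exactly when the speaker's and listener's objects coincide. Adverse conditions could be simulated, for example, by flattening the rows of the hypothetical P_S_GIVEN_M table.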
The work will be carried out within a multidisciplinary team bringing together
expertise in speech communication, cognitive theories and Bayesian modeling
(Jean-Luc Schwartz at GIPSA-Lab Grenoble, Julien Diard at LPNC Grenoble, Pierre
Bessière at ISIR Paris), in collaboration with Pierre-Yves Oudeyer at INRIA
Bordeaux.
Practical information
The PhD position is open from September 2014, or slightly later if
necessary.
Candidates should hold a Master's degree and have some knowledge of
speech and cognitive modeling, as well as the ability to program and to develop
computational models.
They must send a CV, together with a letter explaining why
they are interested in the project. They should also provide the names of two
referees (with email addresses) who can provide recommendations.
Applications should be sent before June 15th to Jean-Luc Schwartz (Jean-Luc.Schwartz@gipsa-lab.grenoble-inp.fr).
Preselected candidates will be interviewed, and a decision will be made in the following weeks.