The SMILES (Sensorimotor Interaction, Language and Embodiment of Symbols) Workshop will take place virtually at ICDL 2020 (IEEE International Conference on Development and Learning).

* Call for abstracts and papers:
- Deadline: 25th of September 2020
 - paper abstracts: 2 pages
 - long papers: 4-6 pages
- Submissions: smiles.conf@gmail.com
- Paper format: same as for the ICDL conference: https://cis.ieee.org/publications/t-cognitive-and-developmental-systems/tcds-information-for-authors
- Workshop dates (to be announced): in the days just before or after the ICDL conference (26-30 October 2020)

* Workshop Short Description
On the one hand, models of sensorimotor interaction are embodied in the environment and in the interaction with other agents. On the other hand, recent Deep Learning developments in Natural Language Processing (NLP) allow models to capture increasing language complexity (e.g. compositional representations, word embeddings, long-term dependencies). However, these NLP models are disembodied in the sense that they are learned from static datasets of text or speech. How can we bridge the gap from low-level sensorimotor interaction to high-level compositional symbolic communication? The SMILES workshop will address this issue through an interdisciplinary approach involving researchers from fields including (but not limited to):
- Sensorimotor learning,
- Emergent communication in multi-agent systems,
- Chunking of perceptuo-motor gestures (gestures in a general sense: motor, vocal, ...),
- Symbol grounding and symbol emergence,
- Compositional representations for communication and action sequence,
- Hierarchical representations of temporal information,
- Language processing and acquisition in brains and machines,
- Models of animal communication,
- Understanding composition and temporal processing in neural network models, and
- Enaction, active perception, perception-action loop.

* More info
- contact: smiles.conf@gmail.com
- organizers: Xavier Hinaut, Clement Moulin-Frier, Silvia Pagliarini, Joni Zhong, Loo CHU KIONG, Michael Spranger, Tadahiro Taniguchi
- invited speakers (coming soon)
- workshop website (coming soon): https://sites.google.com/view/smiles-workshop/
- ICDL conference website: https://cdstc.gitlab.io/icdl-2020/

* Workshop Long Description
Recently, Deep Learning networks have broken many benchmarks in Natural Language Processing (NLP), e.g. [1]. Such breakthroughs are realized by a few mechanisms (e.g. continuous representations like word embeddings, attention mechanisms, ...). However, no equivalent breakthrough has happened in understanding which mechanisms enable the brain to perform similar functions. Deep Learning reproduces neither the learning mechanisms nor the dynamics of the brain: the brain needs to parse incoming stimuli and learn from them incrementally; it cannot unfold time the way deep learning algorithms such as Back-Propagation Through Time (BPTT) do. Thus, we still lack the key neuronal mechanisms needed to properly model the (hierarchies of) functions in language perception and production. Other models of language processing have been developed that reproduce the behavior of brain dynamics (Event-Related Potentials (ERPs) [2] or functional Magnetic Resonance Imaging (fMRI) [3]). However, such models often lack the explanatory power to demonstrate the causes of the observed dynamics: i.e. what is computed, and why it is computed (for which purpose)? We need learning mechanisms that are more biologically plausible while also producing causal explanations of the experimental data being modelled.
There is converging evidence that language production and comprehension are not separate processes in a modular mind; rather, they are interwoven, and this interweaving is what enables people to predict themselves and each other [4]. The interweaving of action and perception is important because it allows a learning agent (or a baby) to learn from its own actions: for instance, by learning the perceptual consequences (e.g. the heard sounds) of its own actions (e.g. vocal productions) during babbling [5]. Thus, the agent learns in a self-supervised way (see the sketch below) instead of relying only on supervised learning, which, in contrast, implies non-biological teacher signals cleverly designed by the modeler. Explicit neuronal models explaining which mechanisms shape these perceptuo-motor units through development are still missing. The existence of sensorimotor (i.e. mirror) neurons at abstract representation levels (often called action-perception circuits [6]), jointly with the perceptuo-motor shaping of sensorimotor gestures [5], suggests that similar action-perception mechanisms are implemented at different levels of the hierarchy. How could we move towards such hierarchical architectures based on action-perception mechanisms?
Importantly, a language processing model needs a way to acquire the semantics of the (symbolic) perceptuo-motor gestures and of the more abstract representations; otherwise it would consider only the morphosyntactic and prosodic features of language. These symbolic gestures, i.e. signs, need to be grounded in the mental concepts, i.e. the signified, that they represent. Several theories and robotic experiments give examples of how symbols could be grounded or how symbols could emerge [7]. However, current neurocomputational models aiming to explain brain processes are not grounded. Robotics has an important role to play here in the grounding of semantics, through experiencing the world via interactions with the physical environment and with humans. We need mechanisms that start from raw sensory perception and raw motor commands, letting plausible representations emerge through development instead of imposing arbitrary ones.
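To make the self-supervised action-perception loop concrete, here is a minimal sketch: an agent babbles random motor commands, hears the consequences through a toy "vocal tract", and fits a forward model from that experience alone. It is an illustration under stated assumptions, not a model from the cited literature; the vocal-tract mapping, the linear forward model and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: an unknown motor-to-sound mapping standing in
# for a vocal tract (3 motor dimensions -> 4 auditory features).
W_true = rng.standard_normal((3, 4))

def vocal_tract(motor):
    return np.tanh(motor @ W_true) + 0.01 * rng.standard_normal(4)

W_forward = np.zeros((3, 4))  # the agent's (linear) forward model
lr = 0.1

for step in range(5000):
    motor = rng.uniform(-1, 1, size=3)        # babble a random motor command
    heard = vocal_tract(motor)                # perceive its auditory consequence
    predicted = motor @ W_forward             # predict it with the forward model
    error = heard - predicted                 # self-generated teaching signal
    W_forward += lr * np.outer(motor, error)  # delta-rule update, no external teacher

# The prediction error on fresh commands shrinks as the forward model improves
# (it does not vanish: a linear model only approximates the nonlinear mapping).
test = rng.uniform(-1, 1, size=(100, 3))
print("mean prediction error:", np.abs(np.tanh(test @ W_true) - test @ W_forward).mean())
```

The teaching signal here is the agent's own prediction error on the consequences of its own actions; no teacher signal designed by the modeler is required.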
Finally, computational models of emergent communication in agent populations are currently gaining interest in the machine learning community [8], [9]. These contributions show how a communication system can emerge to solve cooperative tasks in sequential environments. However, they are still relatively disconnected from the earlier theoretical and computational literature that aims to understand how language might have emerged from a prelinguistic substrate [10], [11]. We need to conceive of communication as the emergent result of a collective behavior optimization process (a toy illustration follows below) and to ground the resulting computational models in the theoretical literature on language evolution. The SMILES workshop will aim to discuss each of the questions mentioned above, together with original approaches to integrating them.
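As a deliberately tiny illustration of communication emerging from collective reward optimization, the following sketch implements a Lewis signaling game with Roth-Erev reinforcement. It is an assumption-laden toy, not the multi-agent deep reinforcement learning setups of [8], [9]; all names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = n_signals = n_actions = 3

# Propensity tables: speaker maps states to signals, listener maps signals to actions.
speaker = np.ones((n_states, n_signals))
listener = np.ones((n_signals, n_actions))

def sample(propensities):
    p = propensities / propensities.sum()
    return rng.choice(len(p), p=p)

for episode in range(20000):
    state = rng.integers(n_states)            # the speaker privately observes a state
    signal = sample(speaker[state])           # it emits a signal
    action = sample(listener[signal])         # the listener acts on the signal alone
    reward = 1.0 if action == state else 0.0  # cooperative task: reconstruct the state
    speaker[state, signal] += reward          # Roth-Erev reinforcement on both sides
    listener[signal, action] += reward

# The signals usually converge to a shared convention: each state maps to a
# distinct signal that the listener decodes correctly (occasionally the system
# settles on a partially ambiguous "pooling" equilibrium instead).
print((speaker / speaker.sum(axis=1, keepdims=True)).round(2))
print((listener / listener.sum(axis=1, keepdims=True)).round(2))
```

Neither agent is told what the signals mean; the convention emerges purely from the shared reward, which is the sense in which communication is the result of a collective behavior optimization process.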


[1] J. Devlin et al., 2018.
[2] H. Brouwer and J. C. J. Hoeks, 2013.
[3] M. Garagnani et al., 2008.
[4] M. J. Pickering and S. Garrod, 2013.
[5] J.-L. Schwartz et al., 2012.
[6] F. Pulvermüller and L. Fadiga, 2010.
[7] T. Taniguchi et al., 2016.
[8] S. Sukhbaatar et al., 2016.
[9] I. Mordatch and P. Abbeel, 2017.
[10] M. Tomasello et al., 2005.
[11] P.-Y. Oudeyer and L. Smith, 2015.

Xavier Hinaut
Inria Researcher (CR)
Mnemosyne team, Inria
LaBRI, Université de Bordeaux
Institut des Maladies Neurodégénératives
+33 5 33 51 48 01
www.xavierhinaut.com