Dr. MARCO BARONI


Dr. Marco Baroni    Researcher
sector: L-LIN/01
member: council


contacts
tel.: 0543-374744
fax: 0543-374717
e-mail: baroni@sslmit.unibo.it
office: D14

Notes

General:
E-mail Address: baroni AT sslmit unibo it
Homepage: http://sslmit.unibo.it/~baroni
Address: SSLMIT / Corso della Repubblica 136 / 47100 Forlì (FC) / Italy

Languages:
Italian (native)
English (extremely fluent)
German (I manage to communicate)
I am currently studying Japanese
I've studied and nearly forgotten bits and pieces of several other languages (Latin, Ancient Greek, Romanian, Korean, Hausa)

Programming and Scripting:
perl
C/C++
(g)awk and other UNIX text processing tools
some UNIX shell scripting
XML

Areas of Interest:
Corpus-based computational morphology
Automated, unsupervised acquisition of various aspects of morphology;
Computational/quantitative approaches to the study of morphological productivity, morphological compositionality and morphology in general;
Corpus-based lexicography and terminology
Automated extraction of semantic and lexical information from raw corpora;
Automated construction of semantic/syntactic/morphological word classes;
Automated extraction of terms and collocations;
Web-mining for linguistic data extraction;
Corpus annotation;
How to combine different forms of evidence to improve the performance of unsupervised/knowledge-poor natural language learning algorithms;
Fast development of corpora, lexica and other language resources using knowledge-poor techniques.

Education:
Ph.D. in Linguistics, University of California, Los Angeles, June 2000
Dissertation Title: Distributional cues in morpheme discovery: A computational model and empirical evidence
Dissertation Committee: Bruce Hayes (chair), Carson Schütze, Edward Stabler, Donca Steriade, Jody Kreiman
M.A. in Linguistics, University of California, Los Angeles, December 1997
Thesis Title: The representation of prefixed forms in the Italian lexicon: Evidence from the distribution of intervocalic [s] and [z]
Thesis Committee: Bruce Hayes (chair), Sun-Ah Jun, Carson Schütze, Donca Steriade
Laurea in Linguistica ("110 e lode"), University of Padua, Italy, April 1995
Thesis Title: La relazione tra struttura segmentale e costituenza moraica [The relation between segmental structure and moraic constituency]
Thesis Co-Chairs: Alberto Mioni and Laura Vanelli

Work Experience:
October 2002 - present
Researcher (tenured position)
Dipartimento di Studi Interdisciplinari su Traduzione, Lingue e Cultura (SITLEC)
Scuola Superiore di Lingue Moderne per Interpreti e Traduttori (SSLMIT)
Università di Bologna (Sede di Forlì), Italy
SITLEC website: http://www.disitlec.unibo.it
SSLMIT website: http://www.ssit.unibo.it

September 2001 - August 2002
Researcher (position funded by EU R&D project FASTY)
Natural Language Processing Group
Austrian Research Institute for Artificial Intelligence (ÖFAI)
Vienna, Austria
ÖFAI NLP group website: http://www.ai.univie.ac.at/oefai/nlu/
FASTY project website: http://www.fortec.tuwien.ac.at/reha.e/projects/fasty/fasty.html

July 2000 - August 2001
Computational Linguist
Language Development Team / Core Technologies Team
Conversay
Redmond WA, USA
Conversay website: http://www.conversay.com/

January - December 1999
Research Assistant to Prof. Pat Keating (position funded by NSF project KDI)
Phonetics Laboratory
Department of Linguistics
University of California, Los Angeles
Los Angeles CA, USA
KDI project website: http://www.hei.org/research/projects/comneur/kdipage.htm

Summer 1998
Summer Research Intern
Spoken Language Processes Laboratory
House Ear Institute
Los Angeles CA, USA

Teaching:
Winter 2004
Computational Linguistics
SSLMIT, Università di Bologna
Fall 2002 - present
Phonetics/Phonology/Morphology modules of General Linguistics class
SSLMIT, Università di Bologna
Fall 1996 - Fall 1998
Teaching Assistant for the classes Introduction to Linguistics, Experimental Phonetics and Introduction to General Phonetics
Department of Linguistics, University of California, Los Angeles

Other Activities:
Co-organized and co-taught intensive mini-course A Practical Introduction to Corpus Work, Bertinoro University Center, October 2003
Co-coordinated the CORAL (CORpora e Apprendimento Linguistico) e-learning project http://www.e-learning.sslmit.unibo.it/COR/
Helped organize the Interdepartmental Workshop on Science and Common Sense, University of Padua, May 1995
Reviewer for: Journal of the International Phonetic Association (2002, 2003), Phonetica (2001), Journal of the Acoustical Society of America (2000), Journal of Phonetics (1998)

Honors:
Chancellor Fellowship, University of California, Los Angeles, 1995-2000
Summer School Fellowship, San Marino Center for Semiotic and Cognitive Studies, 1995
Education Abroad Program Fellowship, University of California, Los Angeles, 1993-1994
Summer School Fellowship, University of Bucharest, Romania, 1993

Publications
 
BOOKS AND EDITED VOLUMES
·  2003: Metodi non-supervisionati per la scoperta di morfemi e relazioni morfologiche [Unsupervised methods for the discovery of morphemes and morphological relations].
·  2004: Introducing the La Repubblica corpus: A large, annotated, TEI(XML)-compliant corpus of newspaper Italian, co-authored with S. Bernardini, F. Comastri, L. Piccioni, A. Volpi, G. Aston, M. Mazzoleni.
·  2004: Using cooccurrence statistics and the web to discover synonyms in a technical language, co-authored with S. Bisi.
 
ARTICLES AND ESSAYS
·  1993: Teorie della sottospecificazione e restrizioni sulle code consonantiche in italiano [Underspecification theories and consonantal coda constraints in Italian], (Rivista di Grammatica Generativa 18, pp. 3-59).
·  1994: Moraic structure and vowel length in Galeatese, (Romance Linguistics and Literature Review 7, pp. 24-52).
·  1995: Iambic senarii, (Quaderni Patavini di Linguistica 14, pp. 13-38).
·  1996: The natural classes of Lughese vowels and why they are natural, (UCLA Working Papers in Phonology 1, pp. 1-17).
·  1998: The phonetic nature of the Northern Italian allophones [s] and [z] in words with variable realization: Electroglottographic and acoustic evidence, (UCLA Working Papers in Phonetics 96, pp. 166-174).
·  1999: Il contrasto di lunghezza vocalica in friulano [The vowel length contrast in Friulian], co-authored with L. Vanelli, Rome, Bulzoni, (Fonologia e morfologia dell'italiano e dei dialetti d'Italia: atti del 31º Congresso della Società di Linguistica Italiana (Padova, 25-27 settembre 1997), edited by Paola Benincà, Alberto Mioni, Laura Vanelli, pp. 291-317).
·  2000: The relationship between vowel length and consonantal voicing in Friulian, co-authored with L. Vanelli, Amsterdam, John Benjamins, (Lori Repetti (ed.), Phonological theory and the dialects of Italy, pp. 13-44).
·  2001: How do languages get crazy constraints? Phonetically-based phonology and the evolution of the Galeata Romagnolo vowel system, (UCLA Working Papers in Phonology 5, pp. 152-178).
·  2001: The representation of prefixed forms in the Italian lexicon: Evidence from the distribution of intervocalic [s] and [z] in northern Italian, (Yearbook of Morphology 1999, pp. 121-152).
·  2002: FASTY: A multilingual approach to text prediction, (Elsnews 11.2, pp. 11-12).
·  2002: FASTY: A multi-lingual approach to text prediction, co-authored with J. Matiasek and H. Trost, Berlin, Springer-Verlag, (Proceedings of the 8th International Conference on Computers Helping People with Special Needs (Linz, July 2002), edited by K. Miesenberger et al., pp. X-X).
·  2003: A preliminary analysis of collocational differences in monolingual comparable corpora, co-authored with S. Bernardini, Lancaster, UCREL, (Corpus Linguistics 2003 (Lancaster University, 28-31 March 2003), edited by D. Archer, P. Rayson, A. Wilson and T. McEnery, pp. 82-91).
·  2003: Distribution-driven morpheme discovery: A computational/experimental study, (Yearbook of Morphology, pp. 213-248).
 
CONFERENCE PRESENTATIONS
·  1993: Teorie della sottospecificazione e restrizioni sulle code consonantiche in italiano [Underspecification theories and consonantal coda constraints in Italian].
·  1996: An acoustic study of Italian unstressed mid-vowels.
·  1997: Il contrasto di lunghezza vocalica in friulano [The vowel length contrast in Friulian], co-authored with L. Vanelli.
·  2000: Articulation of word and sentence stress, co-authored with P. Keating, T. Cho, S. Mattys, L. Bernstein, B. Chaney, A. Alwan.
·  2000: Using distributional information to discover morphemes: A distribution-driven prefix learner.
·  2000: Using distributional information to discover morphemes: An automated distribution-driven prefix learner.
·  2002: Using textual association measures and minimum edit distance to discover morphological relations, co-authored with J. Matiasek and H. Trost, (International Workshop on Computational Approaches to Collocations (Vienna, July 2002), edited by Brigitte Krenn).
·  2003: Assessing morphological productivity via automated measures of semantic transparency, co-authored with S. Vegnaduzzo, (Workshop "Explaining Productivity" at the 25th Annual Meeting of the German Society for Linguistics (DGfS) (Munich, Feb 26-28), edited by Peter Bosch).
·  2003: Estrazione non supervisionata di informazioni morfologiche da corpora non annotati [Unsupervised extraction of morphological information from unannotated corpora].
·  2004: I toponimi stranieri nella stampa quotidiana italiana: Una ricerca sul corpus de La Repubblica [Foreign toponyms in the Italian daily press: A study of the La Repubblica corpus], co-authored with M. Mazzoleni.
 
CONFERENCE PROCEEDINGS
·  2002: Predicting the components of German nominal compounds, co-authored with J. Matiasek and H. Trost, Amsterdam, IOS Press, (Proceedings of the 15th European Conference on Artificial Intelligence (ECAI 2002) (Lyon, June 2002), edited by F. van Harmelen, pp. 470-474).
·  2002: Unsupervised discovery of morphologically related words based on orthographic and semantic similarity, co-authored with J. Matiasek and H. Trost, Philadelphia (PA), USA, ACL / University of Pennsylvania, (Proceedings of the Workshop on Morphological and Phonological Learning of ACL/SIGPHON 2002 (Philadelphia (PA), USA, July 2002), edited by M. Maxwell, pp. 48-57).
·  2002: Wordform- and class-based prediction of the components of German nominal compounds in an AAC system, co-authored with J. Matiasek and H. Trost, Taipei, Taiwan, ACL, (COLING 2002, Proceedings of the 19th International Conference on Computational Linguistics (Taipei (Taiwan), August 2002), edited by S.-C. Tseng, pp. 57-63).
·  2003: Exploiting long distance collocational relations in predictive typing, co-authored with J. Matiasek, Budapest, ACL, (Proceedings of the EACL Workshop on Language Modeling for Text Entry Methods (Budapest, April 14 2003), pp. 1-8).
·  2003: Optical phonetics and visual perception of lexical and phrasal stress in English, co-authored with P. Keating, S. Mattys, R. Scarborough, A. Alwan, E. Auer, L. Bernstein, Barcelona, ICPhS, (Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS) (Barcelona, August 2003), pp. 2071-2074).
·  2004: Identifying subjective adjectives through web-based mutual information, co-authored with S. Vegnaduzzo.
·  2004: Retrieving Japanese specialized terms and corpora from the World Wide Web, co-authored with M. Ueyama.