Jeroen Geertzen

I am a postdoctoral scientist, working at the intersection of natural language processing (NLP), computational linguistics (CL), and machine learning (ML).

My research involves the computational modelling of linguistic phenomena, both to gain a better understanding of how we use language and to design computer systems that interpret written and spoken language.

I work at Elsevier, where I lead an NLP group focussed on text mining, and I am affiliated with the Language Technology Lab. I developed NLP- and ML-driven product features for second-language learning platforms at EF Education First, and was a research associate at the University of Cambridge (DTAL). Previously, I worked at the Centre for Speech, Language and the Brain, and at the Center for Language Technology (Macquarie University). My Ph.D. was carried out at the Tilburg center for Cognition and Communication.

Topics & projects

  • Concept mining in academic text, content-based recommendation systems
  • Data-driven approaches to second language learning (e.g. [2] and [4], and the EFCAMDAT corpus)
  • Computational modelling of meaning activation in speech processing (e.g. [3])
  • Unsupervised machine-learning of structure from symbolic sequential data (e.g. [5] and [7])
  • Spoken dialogue modelling (e.g. [5] and [6])

Selected publications

  1. Geertzen, J., Blevins, J.P. & Milin, P. (2016). The Informativeness of Linguistic Unit Boundaries. Italian Journal of Linguistics, 28(2):1-24
  2. Alexopoulou, T., Geertzen, J., Korhonen, A. & Meurers, D. (2015). Exploring Big Educational Learner Corpora for SLA Research: Perspectives on Relative Clauses. International Journal of Learner Corpus Research, 1(1):96-129
  3. Devereux, B.J., Taylor, K.I., Randall, B., Geertzen, J. & Tyler, L.K. (2015). Feature Statistics Modulate the Activation of Meaning During Spoken Word Processing. Cognitive Science
  4. Geertzen, J., Alexopoulou, T. & Korhonen, A. (2014). Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT). In: Selected Proc. of the 2012 Second Language Research Forum, pages 240-254
  5. Geertzen, J. (2009). Dialogue Act Prediction Using Stochastic Context-Free Grammar Induction. In: Proc. of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference, pages 7-15, Association for Computational Linguistics
  6. Geertzen, J., Petukhova, V. & Bunt, H. (2007). A Multidimensional Approach to Utterance Segmentation and Dialogue Act Classification. In: Proc. of the 8th SIGdial Workshop on Discourse and Dialogue, pages 140-149
  7. Geertzen, J. & van Zaanen, M. (2004). Grammatical Inference using Suffix Trees. In: Proc. of the 7th International Colloquium on Grammatical Inference (ICGI), 3264:163-174, Springer-Verlag

Selected software & resources