ABL : Alignment-Based Learning
Provide at least two sentences that have words in common, one sentence per line, and see how the alignment of substitutable parts aids the learning of structure.
Alignment-Based Learning (ABL), introduced by van Zaanen (2001), is a symbolic grammar inference framework that has been successfully applied to several unsupervised machine learning tasks in Natural Language Processing (NLP). Given only sequences of symbols, a system that implements ABL induces structure by aligning and comparing the input sequences. As a result, the input sequences are augmented with the induced structure.
The demo on this page allows you to apply ABL to unstructured text.
A C++ implementation was developed by Menno van Zaanen and me, partially supported by a grant from Macquarie University. The package is maintained by Menno (see his page for stable release downloads), and I keep a development version on GitHub.
The first phase in ABL, alignment learning, builds a search space of possible structures, called hypotheses, by comparing the sequences and clustering sub-sequences that share similar context. For example, given the following input sentences:
she is smart
she is tired
aligning them reveals the shared part "she is" and the substitutable parts "smart" and "tired", which are hypothesized to be constituents of the same type:
(0 she is (1 smart )1 )0
(0 she is (1 tired )1 )0
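The alignment step can be sketched in Python. This is an illustrative stand-in, not the package's own code: it uses difflib's longest-matching-block alignment where ABL supports several alignment algorithms (e.g. edit-distance variants), and `align_pair` and `bracket` are hypothetical helper names:

```python
from difflib import SequenceMatcher

def align_pair(s1, s2):
    """Align two tokenized sentences; the unequal spans become hypotheses."""
    a, b = s1.split(), s2.split()
    hyps1, hyps2 = [], []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b).get_opcodes():
        if tag != 'equal':           # unequal parts are substitutable
            hyps1.append((i1, i2))   # hypothesized constituent in s1
            hyps2.append((j1, j2))   # hypothesized constituent in s2
    return hyps1, hyps2

def bracket(sentence, hyps):
    """Render hypotheses in ABL's numbered-bracket notation."""
    toks = sentence.split()
    # insert inner brackets right-to-left so earlier offsets stay valid
    for n, (i, j) in sorted(enumerate(hyps, start=1), key=lambda t: -t[1][0]):
        toks[i:j] = [f'({n}'] + toks[i:j] + [f'){n}']
    return '(0 ' + ' '.join(toks) + ' )0'
```

Applied to the pair above, `bracket("she is smart", align_pair("she is smart", "she is tired")[0])` reproduces the hypothesis `(0 she is (1 smart )1 )0`.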
With more varied input, such as:
she runs from CBD to Bondi
he drives from CBD to Bondi
he walks terribly slowly
In the second phase, selection learning, the most probable combination of hypothesized structures is selected in an Expectation-Maximization search, and overlapping constituents are eliminated according to a heuristic. The final induced structure thus becomes:
(0 (1 she runs )1 from CBD to Bondi )0
(0 (1 he drives )1 from CBD to Bondi )0
(0 he (2 walks terribly slowly )2 )0
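A crude sketch of the selection idea, not the actual EM search the system performs: score each hypothesis by how often its yield occurs, keep hypotheses greedily, and drop any bracket that would cross one already kept (nesting is allowed). The tuple layout and function names are assumptions for illustration:

```python
from collections import Counter

def crosses(a, b):
    """True when two constituents in the same sentence overlap without nesting."""
    (sa, s1, e1, _), (sb, s2, e2, _) = a, b
    if sa != sb or e1 <= s2 or e2 <= s1:
        return False                      # different sentence or no overlap
    nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return not nested

def select(hypotheses):
    """Greedy, frequency-based stand-in for ABL's selection learning.

    Each hypothesis is (sentence_id, start, end, yield). The real system
    estimates constituent probabilities in an EM-style search; this sketch
    simply prefers frequent yields and rejects crossing brackets.
    """
    freq = Counter(h[3] for h in hypotheses)
    kept = []
    for h in sorted(hypotheses, key=lambda h: -freq[h[3]]):
        if not any(crosses(h, k) for k in kept):
            kept.append(h)
    return kept
```

The essential constraint this encodes is the one mentioned above: after selection, no two surviving constituents may overlap without one properly containing the other.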
If desired, the induced structure can be transformed into a context-free grammar (CFG):
0 -> 1 from CBD to Bondi
0 -> he 2
1 -> she runs
1 -> he drives
2 -> walks terribly slowly
- van Zaanen, M. (2001). Bootstrapping Structure into Language: Alignment-Based Learning. PhD Thesis, School of Computing, University of Leeds, UK.
- Geertzen, J. and van Zaanen, M. (2004). Grammatical Inference using Suffix Trees. In: Proc. of the 7th International Colloquium on Grammatical Inference (ICGI), LNCS 3264:163-174, Springer-Verlag.