An established method for MWE extraction is the combined use of previously identified POS-patterns and association measures. However, the selection of such POSpatterns is rarely debated. Focusing on Italian MWEs containing at least one adjective, we set out to explore how candidate POS-patterns listed in relevant literature and lexicographic sources compare with POS sequences exhibited by statistically significant n-grams including an adjective position extracted from a large corpus of Italian. All literature-derived patterns are found—and new meaningful candidate patterns emerge—among the top-ranking trigrams for three association measures. We conclude that a final solid set to be used for MWE extraction will have to be further refined through a combination of association measures as well as manual inspection.
|Titolo:||Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology|
|Data di pubblicazione:||2014|
|Appare nelle tipologie:||04.01 Contributo in atti di convegno|