Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian
Castagnoli, Sara;
2020-01-01
Abstract
The paper describes the creation of a manually validated dataset of Italian multiword expressions, built from candidates automatically extracted from corpora of written Italian. It also discusses the main features of the resource, such as the distribution of POS patterns and lemmas, together with possible applications.

Files in this product:
File | Size | Format |
---|---|---|
Masini_Multiword-expressions_2020.pdf (open access; description: full article; type: publisher's version, published with the publisher's layout; license: Creative Commons) | 360.5 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.