Comparing contextual embeddings for semantic textual similarity in Portuguese

Andrade Junior, José E.; Cardoso-Silva, Jonathan; and Bezerra, Leonardo C.T. (2021) Comparing contextual embeddings for semantic textual similarity in Portuguese. In: Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, Proceedings, Part 2. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer Science and Business Media Deutschland GmbH, pp. 389-404. ISBN 9783030916985


Semantic textual similarity (STS) measures how semantically similar two sentences are. For Portuguese, the STS literature is still incipient but includes important initiatives such as the ASSIN and ASSIN 2 shared tasks. The state of the art for those datasets is a contextual embedding produced by a BERT model pre-trained on Portuguese and then fine-tuned. In this work, we investigate the application of Sentence-BERT (SBERT) contextual embeddings to these datasets. Compared to BERT, SBERT is a more computationally efficient approach, enabling its application to scalable unsupervised learning problems. Given the absence of SBERT models pre-trained in Portuguese and the computational cost of such training, we adopt multilingual models and also fine-tune them for Portuguese. Results showed that SBERT embeddings were competitive, especially after fine-tuning, numerically surpassing the results of BERT on ASSIN 2 and the results observed during the shared tasks for all datasets considered.
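The efficiency argument in the abstract rests on SBERT being a bi-encoder: each sentence is encoded once into a fixed-size vector, and sentence pairs are then compared with a cheap similarity function, whereas a BERT cross-encoder must run the full model once per sentence pair. The following is a minimal sketch of that scoring pipeline, assuming mean pooling over token vectors and cosine similarity (common SBERT defaults, not necessarily the configuration used in this paper). Toy vectors stand in for real model outputs, `pearson` is included only because correlation with gold scores is a standard way to evaluate STS, and all helper names (`mean_pool`, `cosine`, `pearson`, `model_passes`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_pool(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one fixed-size sentence vector (assumed pooling)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two sentence vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between gold STS scores and predicted similarities."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def model_passes(n_sentences: int) -> Tuple[int, int]:
    """Forward passes needed to score every pair: (cross-encoder, bi-encoder)."""
    cross_encoder = n_sentences * (n_sentences - 1) // 2  # one pass per sentence pair
    bi_encoder = n_sentences  # one pass per sentence; pairs then compared with cosine
    return cross_encoder, bi_encoder


if __name__ == "__main__":
    # Hypothetical 3-dimensional token embeddings for two short sentences.
    sent_a = mean_pool([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    sent_b = mean_pool([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    print(f"cosine(a, b) = {cosine(sent_a, sent_b):.4f}")
    print(model_passes(10_000))  # (49995000, 10000)
```

With 10,000 sentences, scoring every pair takes 49,995,000 cross-encoder passes but only 10,000 bi-encoder passes; a gap of that size is what makes unsupervised tasks over large collections, such as clustering, practical with sentence embeddings.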

Item Type	Chapter
Keywords	deep learning, natural language processing, semantic textual similarity, word embeddings
Departments	LSE Methodology
DOI	10.1007/978-3-030-91699-2_27
Date Deposited	22 Feb 2022 10:30
URI	https://researchonline.lse.ac.uk/id/eprint/113795

Explore Further

https://www.lse.ac.uk/DSI/People/Jonathan-Cardoso-Silva (Author)

Full text: Accepted Version (PDF)
