dc.date.accessioned | 2016-06-10T12:17:42Z | |
dc.date.available | 2016-06-10T12:17:42Z | |
dc.date.created | 2016-06-02T14:25:35Z | |
dc.date.issued | 2016 | |
dc.identifier.citation | Lison, Pierre; Tiedemann, Jörg. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). 2016, 923-929. European Language Resources Association | |
dc.identifier.uri | http://hdl.handle.net/10852/50459 | |
dc.description.abstract | We present a new major release of the OpenSubtitles collection of parallel corpora. The release is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages. The release also incorporates a number of enhancements in the preprocessing and alignment of the subtitles, such as the automatic correction of OCR errors and the use of meta-data to estimate the quality of each subtitle and score subtitle pairs. | en_US |
dc.language | EN | |
dc.language.iso | en | en_US |
dc.publisher | European Language Resources Association | |
dc.rights | Attribution-NonCommercial 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
dc.title | OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles | en_US |
dc.type | Chapter | en_US |
dc.creator.author | Lison, Pierre | |
dc.creator.author | Tiedemann, Jörg | |
cristin.unitcode | 185,15,5,56 | |
cristin.unitname | Forskningsgruppen for språkteknologi | |
cristin.ispublished | true | |
cristin.fulltext | preprint | |
dc.identifier.cristin | 1359276 | |
dc.identifier.bibliographiccitation | info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.btitle=Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)&rft.spage=923&rft.date=2016 | |
dc.identifier.startpage | 923 | |
dc.identifier.endpage | 929 | |
dc.identifier.pagecount | 6000 | |
dc.identifier.urn | URN:NBN:no-54046 | |
dc.type.document | Book chapter | en_US |
dc.type.peerreviewed | Peer reviewed | |
dc.source.isbn | 978-2-9517408-9-1 | |
dc.identifier.fulltext | Fulltext https://www.duo.uio.no/bitstream/handle/10852/50459/4/947_Paper.pdf | |
dc.type.version | PublishedVersion | |
cristin.btitle | Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) | |