dc.date.accessioned: 2016-06-10T12:17:42Z
dc.date.available: 2016-06-10T12:17:42Z
dc.date.created: 2016-06-02T14:25:35Z
dc.date.issued: 2016
dc.identifier.citation: Lison, Pierre; Tiedemann, Jörg. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). 2016, 923-929. European Language Resources Association.
dc.identifier.uri: http://hdl.handle.net/10852/50459
dc.description.abstract: We present a new major release of the OpenSubtitles collection of parallel corpora. The release is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages. The release also incorporates a number of enhancements in the preprocessing and alignment of the subtitles, such as the automatic correction of OCR errors and the use of metadata to estimate the quality of each subtitle and score subtitle pairs. [en_US]
dc.language: EN
dc.language.iso: en [en_US]
dc.publisher: European Language Resources Association
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.title: OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles [en_US]
dc.type: Chapter [en_US]
dc.creator.author: Lison, Pierre
dc.creator.author: Tiedemann, Jörg
cristin.unitcode: 185,15,5,56
cristin.unitname: Forskningsgruppen for språkteknologi (Research Group for Language Technology)
cristin.ispublished: true
cristin.fulltext: preprint
dc.identifier.cristin: 1359276
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.btitle=Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)&rft.spage=923&rft.date=2016
dc.identifier.startpage: 923
dc.identifier.endpage: 929
dc.identifier.pagecount: 6000
dc.identifier.urn: URN:NBN:no-54046
dc.type.document: Bokkapittel (Book chapter) [en_US]
dc.type.peerreviewed: Peer reviewed
dc.source.isbn: 978-2-9517408-9-1
dc.identifier.fulltext: https://www.duo.uio.no/bitstream/handle/10852/50459/4/947_Paper.pdf
dc.type.version: PublishedVersion
cristin.btitle: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)


This item's license is: Attribution-NonCommercial 4.0 International