dc.date.accessioned: 2022-01-28T18:48:10Z
dc.date.available: 2022-01-28T18:48:10Z
dc.date.created: 2021-10-06T14:37:28Z
dc.date.issued: 2021
dc.identifier.citation: Põldvere, Nele; Frid, Johan; Johansson, Victoria; Paradis, Carita. Challenges of releasing audio material for spoken data: The case of the London–Lund Corpus 2. Research in Corpus Linguistics. 2021, 9(1), 35-62
dc.identifier.uri: http://hdl.handle.net/10852/90265
dc.description.abstract: This article aims to describe key challenges of preparing and releasing audio material for spoken data and to propose solutions to these challenges. We draw on our experience of compiling the new London-Lund Corpus 2 (LLC-2), where transcripts are released together with the audio files. However, making the audio material publicly available required careful consideration of how to, most effectively, 1) align the transcripts with the audio and 2) anonymise personal information in the recordings. First, audio-to-text alignment was solved through the insertion of timestamps in front of speaker turns in the transcription stage, which, as we show in the article, may later be used as a valuable complement to more robust automatic segmentation. Second, anonymisation was done by means of a Praat script, which replaced all personal information with a sound that made the lexical information incomprehensible but retained the prosodic characteristics. The public release of the LLC-2 audio material is a valuable feature of the corpus that allows users to extend the corpus data relative to their own research interests and, thus, broaden the scope of corpus linguistics. To illustrate this, we present three studies that have successfully used the LLC-2 audio material.
dc.language: EN
dc.publisher: Asociación Española de Lingüística de Corpus (AELINCO)
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Challenges of releasing audio material for spoken data: The case of the London–Lund Corpus 2
dc.type: Journal article
dc.creator.author: Põldvere, Nele
dc.creator.author: Frid, Johan
dc.creator.author: Johansson, Victoria
dc.creator.author: Paradis, Carita
cristin.unitcode: 185,14,34,70
cristin.unitname: Russia, Central Europe and the Balkans
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 1943839
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Research in Corpus Linguistics&rft.volume=9&rft.spage=35&rft.date=2021
dc.identifier.jtitle: Research in Corpus Linguistics
dc.identifier.volume: 9
dc.identifier.issue: 1
dc.identifier.startpage: 35
dc.identifier.endpage: 62
dc.identifier.doi: https://doi.org/10.32714/ricl.09.01.04
dc.identifier.urn: URN:NBN:no-92860
dc.type.document: Journal article
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 2243-4712
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/90265/1/Challenges%2Bof%2Breleasing%2Baudio%2Bmaterial%2Bfor%2Bspoken%2Bdata.pdf
dc.type.version: PublishedVersion


This item's license is: Attribution 4.0 International