Session Handle:
http://hdl.handle.net/2196/48069396-15c6-4b31-ad54-95496ba80df3
Title:
Lista 001 tonos completos 2010-12-12-b
Description:
This file (and derivative files, e.g., mono conversions of stereo elicitations and edited mono files [see below]) was given on a hard disk to the Endangered Language Archive (ELAR) via David Nathan on 9 Jan. 2011 at the Pittsburgh LSA Conference. Along with other material on the disk, it was not accessioned, and on 4 April 2012 a second hard disk with these and other files was given to Tom Castle.

Word list 001 is archived in three related files, Archived-elicitation-list-001_261-word-tokens-for-all-tonal-patterns_261-words, with .doc (Word document), .xls (Excel spreadsheet), and .pdf/A (portable document format) extensions. These files present the lists of the words pronounced in the 20 sound files used for segmentation.

Ten speakers were asked to repeat 261 words in two sessions each, so there should be 20 wav files. There are actually 21, as the first recording was redone and never segmented for Praat analysis: Yolox_Elict_CTB501_Lista-001-tonos-completos_2010-12-08-a.wav. The other 20 files (2 sessions x 10 speakers) were all segmented (see below). The ten speakers were: Constantino Teodoro Bautista, Teodoro Celso, Esteban Castillo García, Esteban Guadalupe Sierra, Estela Santiago Castillo, Guillermina Nazario Sotero, Rey Castillo García, Soledad García Bautista, Victorino Ramos Rómulo, and Zoila Guadalupe Sierra.

Each speaker was asked to repeat the 261 words 3 times in each session (x 2 sessions = 6 tokens per word). The targeted speaker was miked on one channel (usually left) and Rey Castillo García on the other (usually right). Rey would try to elicit without pronouncing the target word, though this wasn't always possible. Rey would listen and, if the speaker uttered a tonal sequence that was not the targeted pattern, he would re-elicit; thus there were sometimes 4 or 5 tokens.

ANALYSIS AND SEGMENTATION: The first process was to isolate the channel of the targeted speaker.
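The channel-isolation step can be sketched in Python with only the standard library; this is a minimal illustration, not the tool actually used in the project, and it assumes 16-bit stereo WAV input (the function name is mine):

```python
import wave

def extract_channel(stereo_path: str, mono_path: str, channel: int = 0) -> None:
    """Copy one channel (0 = left, 1 = right) of a 16-bit stereo WAV to a mono WAV."""
    with wave.open(stereo_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # Frames are interleaved 16-bit samples: L0 R0 L1 R1 ...
    # Keep every other sample, starting at the requested channel.
    # (Assumes a little-endian host, which matches WAV sample order.)
    samples = memoryview(frames).cast("h")
    mono = samples[channel::2].tobytes()
    with wave.open(mono_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(framerate)
        dst.writeframes(mono)
```

For the targeted speaker this would be called with `channel=0` (usually the left channel), producing the `_mono.wav` files named in the description.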
Thus from the file Yolox_Elict_CTB501_Lista-001-tonos-completos_2010-12-08-c.wav, the left channel was isolated as a mono file and named accordingly: Yolox_Elict_CTB501_Lista-001-tonos-completos_2010-12-08-c_mono.wav. Rey Castillo then cut out of Yolox_Elict_CTB501_Lista-001-tonos-completos_2010-12-08-c_mono all the sounds that were not the targeted words. This left a clean sound file of pure tokens, an average of 3 per word per session (3 x 261 = 783 tokens).

At this point William Poser segmented each token in an automated process. Rey Castillo had previously supplied a list of the number of repetitions of each word (e.g., 001,3; 002,3; 003,4; 004,2 ...). Poser then segmented all 20 sessions into tokens and recombined the tokens of each word into a single file (e.g., 001x6_CTB501.wav). Leandro DiDomenico, a graduate student in France, was then hired to segment in Praat the phonemes of the first and second utterances of each session; generally these were the first, second, fourth, and fifth tokens of the six-token sequence. Much later, while on a postdoc at Haskins Laboratories, Christian DiCanio went over and corrected each TextGrid (e.g., Yolox_Elict_List-01_0001x6_CTB501.TextGrid, associated with Yolox_Elict_List-01_0001x6_CTB501.wav). These revised four-token TextGrids will soon be superseded by complete six-token TextGrids, which are the TextGrids that will be archived at ELAR and AILLA. The total number of hand-segmented tokens, therefore, is 10 speakers x 6 repetitions x 261 words = 15,660 individual tokens.

Finally, as part of the NSF project led by Doug Whalen, two automated segmenters were evaluated for accuracy against the hand-segmented tier. A short article whose principal author is Christian DiCanio was written about the results of this test: "Assessing agreement level between forced alignment models with data from endangered language documentation corpora."

Duration: 070:13. Recording device: Marantz PMD 670. Microphone:
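The per-word repetition list quoted above (e.g., 001,3; 002,3; 003, 4; 004,2) lends itself to simple machine parsing, which is presumably what made the automated segmentation possible. The sketch below is illustrative only; the function name is hypothetical and not part of the archived materials:

```python
def parse_repetitions(spec: str) -> dict[str, int]:
    """Parse entries like '001,3; 002,3; 003, 4' into {word_id: repetition_count}."""
    counts: dict[str, int] = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        word_id, n = (part.strip() for part in entry.split(","))
        counts[word_id] = int(n)
    return counts

# The fragment quoted in the description:
counts = parse_repetitions("001,3; 002,3; 003, 4; 004,2")
total = sum(counts.values())  # 3 + 3 + 4 + 2 = 12 tokens for these four words
```

Such per-word counts tell an automated segmenter how many tokens to expect in each session's clean mono file, including the occasional re-elicited 4th or 5th repetition.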
Date created:
2010-12-12