10.17910/B7.1062
Databrary
Volume
Databrary Access Agreement
An Isolated-Signing RGBD Dataset of 100 American Sign Language Signs Produced by Fluent ASL Signers
Huenerfauth, Matt
Rochester Institute of Technology
2020
The ASL-100-RGBD dataset consists of color and depth videos collected from ASL signers at the Linguistic and Assistive Technologies Laboratory under the direction of Matt Huenerfauth, as part of a collaborative research project with researchers at the Rochester Institute of Technology and the City University of New York.
Access: After becoming an authorized user of Databrary, please contact Matt Huenerfauth if you have difficulty accessing this volume.
We have collected a new dataset consisting of color and depth videos of fluent American Sign Language signers performing sequences of 100 ASL signs. This dataset was originally collected as part of an ongoing collaborative project, to aid in the development of a sign-recognition system for identifying occurrences of these 100 signs in video. The set of signs consists of vocabulary items that would commonly be learned in a first-year ASL course offered at a university, although the specific signs selected for inclusion in the dataset were motivated by project-related factors. Given increasing interest among sign-recognition and other computer-vision researchers in red-green-blue-depth (RGBD) video, we release this dataset for use by the research community. In addition to the video files, we share depth data files from a Kinect v2 sensor, as well as additional motion-tracking files produced through post-processing of these data.
Organization of the Dataset: The dataset is organized into sub-folders with codenames such as "F19" or "F21". Each codename refers to a specific human signer recorded in this dataset. A signer may have been recorded up to three times; in that case, that signer's sub-folder contains the files from each of those recording sessions.
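As an illustration, the recordings can be enumerated per signer with a few lines of Python. The root folder name below is a placeholder for your local copy of the volume, and the exact file-naming scheme inside each folder should be checked against the downloaded data.

from pathlib import Path

# Placeholder path to a local copy of the Databrary volume; adjust as needed.
DATASET_ROOT = Path("ASL-100-RGBD")

# Each sub-folder name (e.g. "F19", "F21") is a signer codename; a signer
# recorded more than once has files from each session in the same folder.
for signer_dir in sorted(p for p in DATASET_ROOT.iterdir() if p.is_dir()):
    files = sorted(f.name for f in signer_dir.iterdir())
    print(f"{signer_dir.name}: {len(files)} files")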
Task: During the recording session, the participant was met by a member of our research team who was a native ASL signer; no other individuals were present during the data-collection session. The participant was shown a sequence of videos of a native ASL signer performing each of the 100 desired signs and was asked to perform those 100 individual ASL signs in sequence, without lowering their hands between signs. Signers were encouraged to hold their hands in a comfortable neutral position in the signing space between signs. Time permitting, we collected two to three videos per signer, with each video containing up to one production of each of the 100 ASL signs. This process yielded a collection of 42 video files, each containing about 100 signs, for approximately 4,150 tokens in total.
Demographics: A total of 22 Deaf and Hard-of-Hearing (DHH) participants were recruited on the Rochester Institute of Technology campus; all were fluent ASL signers. As a screening question, we asked each participant: Did you use ASL at home growing up, or did you attend a school as a very young child where you used ASL? All participants responded affirmatively. Participants included 15 men and 7 women, aged 20 to 51 (median = 23). Fifteen participants reported that they began using ASL when they were seven years old or younger; the remaining seven reported that they had been using ASL for at least 6 years and that they regularly used ASL at work or school.
Filetypes:
*.eaf: The videos were annotated using ELAN, using the gloss labels listed below, to indicate the start time and stop time of each token. At times, participants accidentally omitted a sign that had been requested, and at other times participants intentionally did not produce one of the requested signs: participants were encouraged to produce a sign only if it was a sign they would produce themselves, and if they did not use a particular sign, e.g. due to regional/dialectal variation, they were instructed to skip that sign. At other times, a participant accidentally performed a different sign than the specific form requested (as shown in the stimulus video). For these reasons, our team watched the resulting videos carefully to ensure that the signs included in each video were the specific 100 signs that had been requested. Sign productions that differed from the requested token, e.g. with the signer using a different handshape or other variation, were not annotated.
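Because .eaf files are XML, the token boundaries can be extracted with a standard XML parser. The following is a minimal sketch: it reads all time-aligned annotations from every tier, since the tier names used in this corpus are not documented here, and the example filename is hypothetical.

import xml.etree.ElementTree as ET

def read_eaf_annotations(eaf_path):
    """Return (gloss, start_ms, end_ms) tuples from an ELAN .eaf file."""
    root = ET.parse(eaf_path).getroot()

    # TIME_ORDER maps time-slot IDs to millisecond offsets.
    time_slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.find("TIME_ORDER")
        if ts.get("TIME_VALUE") is not None
    }

    tokens = []
    for tier in root.findall("TIER"):
        # Time-aligned annotations carry two time-slot references.
        for ann in tier.findall(".//ALIGNABLE_ANNOTATION"):
            gloss = ann.findtext("ANNOTATION_VALUE", default="")
            start = time_slots[ann.get("TIME_SLOT_REF1")]
            end = time_slots[ann.get("TIME_SLOT_REF2")]
            tokens.append((gloss, start, end))
    return tokens

for gloss, start, end in read_eaf_annotations("F19_session1.eaf"):  # hypothetical filename
    print(f"{gloss}: {start}-{end} ms")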
*.avi, *_dep.bin: The ASL-100-RGBD dataset was captured using a Kinect v2 RGBD camera. The output of this camera system includes multiple channels: RGB, depth, skeleton joints (25 joints per video frame), and HD face (1,347 points). The video resolution is 1920 x 1080 pixels for the RGB channel and 512 x 424 pixels for the depth channel. Due to limitations on the filetypes that may be shared on Databrary, the binary *_dep.bin files produced directly by the Kinect v2 camera system could not be posted on the platform. If your research requires the original binary *_dep.bin files, please contact Matt Huenerfauth.
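For researchers who obtain the original *_dep.bin files, a minimal reader might look like the sketch below. The binary layout is an assumption on our part: a headerless stream of 16-bit little-endian depth values in 512 x 424 frames, a common convention for Kinect v2 dumps. Verify this against your files (e.g., check that the file size is a multiple of one frame) before relying on it.

import numpy as np

# Kinect v2 depth resolution, per the dataset description above.
DEPTH_WIDTH, DEPTH_HEIGHT = 512, 424
FRAME_PIXELS = DEPTH_WIDTH * DEPTH_HEIGHT

def read_depth_frames(path):
    """Read raw depth frames from a *_dep.bin file.

    ASSUMPTION: frames are stored back-to-back as unsigned 16-bit
    little-endian depth values with no per-frame header. Validate this
    layout against your own files first.
    """
    raw = np.fromfile(path, dtype="<u2")
    n_frames = raw.size // FRAME_PIXELS
    return raw[: n_frames * FRAME_PIXELS].reshape(n_frames, DEPTH_HEIGHT, DEPTH_WIDTH)

frames = read_depth_frames("F19_session1_dep.bin")  # hypothetical filename
print(frames.shape, frames.dtype)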
*_face.txt, *_HDface.txt, *_skl.txt: To make it easier for future researchers to use this dataset, we have also performed some post-processing of the Kinect data. To extract skeleton coordinates from the RGB videos, we used the OpenPose system, which can detect body, hand, facial, and foot keypoints of multiple people in single images in real time. The output of OpenPose includes estimates of 70 keypoints for the face, including the eyes, eyebrows, nose, mouth, and face contour. The software also estimates 21 keypoints for each hand (Simon et al., 2017), including 4 keypoints along each finger. Additionally, 25 keypoints are estimated for the body pose, including the feet (Cao et al., 2017; Wei et al., 2016).
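The exact column layout of these keypoint text files should be inspected before use. The sketch below assumes one video frame per line, with (x, y, confidence) triples for each keypoint in OpenPose's standard ordering; both the layout and the filenames are assumptions for illustration.

import numpy as np

def read_keypoints(path, n_points):
    """Parse a keypoint text file into an array of shape (frames, n_points, 3).

    ASSUMPTION: each line is one video frame, containing x, y, confidence
    values for every keypoint. Check the actual column layout of the
    *_face.txt / *_skl.txt files before relying on this.
    """
    rows = np.atleast_2d(np.loadtxt(path))
    return rows.reshape(rows.shape[0], n_points, 3)

face = read_keypoints("F19_session1_face.txt", n_points=70)  # 70 face keypoints
body = read_keypoints("F19_session1_skl.txt", n_points=25)   # 25 body/foot keypoints
print(face.shape, body.shape)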
Reporting Bugs or Errors:
Please contact Matt Huenerfauth to report any bugs or errors that you identify in the corpus. We appreciate your help in improving the quality of the corpus over time.
List of Glosses:
ALWAYS
CAN'T_CANNOT
DODO1
DODO2
DON'T_CARE
DON'T_KNOW
DON'T_LIKE
DON'T_MIND
DON'T_WANT
EIGHT_O_CLOCK1
EIGHT_O_CLOCK2
ELEVEN_O_CLOCK
EVERY_AFTERNOON
EVERY_DAY
EVERY_FRIDAY
EVERY_MONDAY
EVERY_MORNING
EVERY_NIGHT
EVERY_SATURDAY
EVERY_SUNDAY
EVERY_THURSDAY
EVERY_TUESDAY
EVERY_WEDNESDAY
FIVE_O_CLOCK1
FIVE_O_CLOCK2
FOR_FOR
FOUR_O_CLOCK1
FOUR_O_CLOCK2
FRIDAY
HOW1
HOW2
I_ME
IF_SUPPOSE
IX_HE_SHE_IT
IX_THEY_THEM
LAST_WEEK
LAST_YEAR
MIDNIGHT1
MONDAY
MONTH
MORNING
NEVER
NEXT_WEEK1
NEXT_WEEK2
NEXT_YEAR
NIGHT
NINE_O_CLOCK1
NINE_O_CLOCK2
NO
NO_ONE
NONE
NOON1
NOT
NOW
ONE_O_CLOCK1
ONE_O_CLOCK2
PAST_PREVIOUS
QMWG
QUESTION
RECENT
SATURDAY
SEVEN_O_CLOCK1
SEVEN_O_CLOCK2
SINCE_UP_TO_NOW
SIX_O_CLOCK1
SIX_O_CLOCK2
SOMETIMES
SOON1
SOON2
SUNDAY
TEN_O_CLOCK
THREE_O_CLOCK1
THREE_O_CLOCK2
THURSDAY
THURSDAY2
TIME
TODAY
TOMORROW
TONIGHT
TUESDAY
TWELVE_O_CLOCK
TWO_O_CLOCK1
TWO_O_CLOCK2
WAVE_NO
WEDNESDAY
WEEK
WHAT1
WHAT2
WHEN1
WHEN2
WHERE
WHICH
WHO1
WHO2
WHO3
WHY1
WHY2
WILL_FUTURE
YESTERDAY
YOU
National Science Foundation (NSF)
10.13039/100000001
american sign language
dataset
rgbd video