Massive Conversational Data Ecosystem

A massive data set of English conversation data, including actual online English conversations, is being built together with scenarios designed to allow balanced observation of each student’s ability to speak English.

In general, one of the most difficult aspects of AI and machine learning is the overall design of the data cycle ecosystem. In conversational AI research, the fundamental problem in recent years has been a severe shortage of adequate speech corpora. The Tutorial English program at Waseda University conducted more than 60,000 hours of group English conversation in the 2018 fiscal year alone. No other research project in the world has recorded and annotated this type of structured, purpose-driven multimodal conversation data (in general, total recording time is in the tens to hundreds of hours). If conversation data on such a scale could be properly recorded, it would be an epoch-making achievement in the field.

Target Dataset to Collect

The underlying data set for this theme is divided into two categories.

Data Set for English Speaking Proficiency Assessment: Three English conversation task scenarios will be set up to observe communication capabilities that cannot be measured by existing tests. Sample data from approximately 200 students will be prepared first, selected to reflect the distribution of English proficiency in the actual Tutorial English student population (with additional recordings if necessary).
Online English Conversation Data Set: English conversation data from Tutorial English Online, which began in May 2020, is recorded using the video recording function of the online teaching platform.
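
As an illustration of how a sample of roughly 200 students could be matched to the proficiency distribution of the student population, the following is a minimal sketch of proportional allocation with largest-remainder rounding. The level labels and population counts are hypothetical placeholders, not figures from the project, and the actual selection procedure used in the study may differ.

```python
from math import floor


def allocate_samples(population_counts, total_samples):
    """Split total_samples across levels in proportion to population_counts.

    Uses largest-remainder rounding so the quotas always sum to total_samples.
    """
    population = sum(population_counts.values())
    if population <= 0:
        raise ValueError("population_counts must contain at least one student")

    # Exact (fractional) share of the sample for each level.
    exact = {
        level: total_samples * count / population
        for level, count in population_counts.items()
    }
    # Start from the rounded-down quotas.
    quotas = {level: floor(share) for level, share in exact.items()}

    # Hand out the remaining seats to the levels with the largest remainders.
    leftover = total_samples - sum(quotas.values())
    by_remainder = sorted(
        exact, key=lambda level: exact[level] - quotas[level], reverse=True
    )
    for level in by_remainder[:leftover]:
        quotas[level] += 1
    return quotas


# Hypothetical population counts per proficiency band (illustrative only).
hypothetical_population = {
    "A1": 300, "A2": 1200, "B1": 1500, "B2": 700, "C1": 250, "C2": 50,
}
quotas = allocate_samples(hypothetical_population, 200)
print(quotas)
```

Largest-remainder rounding is used here only because it guarantees that the per-level quotas add up exactly to the target sample size; any other proportional scheme would serve the same purpose.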

Data Collection Methods

Data set for English speaking ability assessment

In the current study, the requirements for the English speaking proficiency data set are summarized as follows:

A natural conversation between two parties.
A scenario that presents the opportunities needed for assessment at speaking proficiency level 6.
A conversation whose complexity is kept to a reproducible level, on the presumption that one speaker will be replaced by an AI agent in the future.

Definition of the English speaking task

Based on the above requirements, the current study has set up several English conversation tasks to observe six different types of speaking abilities which cannot be measured by existing communication tests. Each task is categorized according to the nature of the conversation as defined in the CEFR.

Tutorial English Online Course Structure

Each Tutorial English Online class consists of one tutor and two to four students, who are grouped according to prior reading and listening tests, and is conducted via the Moodle online learning platform described above. Each course consists of ten 90-minute lessons delivered twice weekly. Each class usually incorporates two learning objectives based on the CEFR. The class begins with a simple icebreaker activity. The tutor explains the English expressions and the learning task, and students pair up with each other to reflect on the entire lesson. The tutor observes the students’ comments and actions during pair work and provides feedback accordingly. Tutors are trained based on guiding principles such as “follow the 80:20 principle of speaking opportunities,” “allow students to take the lead,” “manage time to ensure proper performance assessment,” “provide explanations tailored to students’ level,” and “prioritize and point out errors in an appropriate way.” This policy also serves as a design guide for Theme 3, “Smart Online English Conversation Learning Experience Environment.”
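
To make the 80:20 guideline concrete, the following is a minimal sketch of how speaking-time shares could be computed from time-stamped speaker segments. The segment format, the role labels, and the reading of 80:20 as students speaking about 80% of the time are all assumptions made for illustration; the source does not specify how the principle is measured.

```python
def speaking_shares(segments):
    """Return each role's share of total speaking time.

    segments: iterable of (role, start_sec, end_sec) tuples.
    """
    totals = {}
    for role, start, end in segments:
        if end < start:
            raise ValueError("segment ends before it starts")
        totals[role] = totals.get(role, 0.0) + (end - start)

    spoken = sum(totals.values())
    if spoken == 0:
        return {role: 0.0 for role in totals}
    return {role: duration / spoken for role, duration in totals.items()}


# Hypothetical segments from one activity (illustrative only).
hypothetical_segments = [
    ("tutor", 0.0, 60.0),
    ("student", 60.0, 240.0),
    ("tutor", 240.0, 300.0),
    ("student", 300.0, 600.0),
]
shares = speaking_shares(hypothetical_segments)
print(shares)
```

In practice the segments would have to come from manual annotation or automatic speaker diarization of the recordings, which is outside the scope of this sketch.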

Online Tutorial English Course

Recording of online English speaking data

This online data collection is handled by the Language Education Department of Waseda Academic Solutions Co., Ltd. (the subcontractor), which conducts the Tutorial English program. After recording permission has been obtained from the tutor and students of each participating English conversation class, the tutors make the recordings themselves, and the data is aggregated on a server managed by the Institute of Perceptual Information Systems, Waseda University. Permission to gather the data was obtained through Waseda University’s ethics review for research involving human subjects.