Trial version of Taiwan Hakka Corpus available online - Hakka Culture Development Center

Source: Hakka Culture Development Center (客家文化發展中心)
Publication Date: 2022/03/11
Last updated: 2022/04/06

The Taiwan Hakka Corpus Website

The Taiwan Hakka Corpus, launched by the Hakka Affairs Council (HAC) in 2017, has collected more than six million words of Hakka language material in written and spoken form. A trial version of the corpus is now available online.

The HAC commissioned National Chengchi University (NCCU) to construct the Taiwan Hakka Corpus, which draws on written texts and spoken content from across the country, including publications, TV programs, recordings from field research, interviews, speeches, everyday conversations, and oral histories told by elders. Before entering the corpus, these materials must be transliterated and revised by native Hakka speakers and professional linguists through several complex procedures.

The Hakka corpus is the product of interdisciplinary cooperation. Its construction is time-consuming and relies on experts in linguistics, computer science, and communication, who led the team in collecting the language material, processing the data, and building the whole system. As a result, Hakka can be transformed into information that links to other languages and can be used by the general public.

The salient feature of the Hakka language database is its support for speech recognition across different Hakka accents and for speech synthesis, which can be combined with artificial intelligence technology to advance the digitalization of Hakka.

The Hakka corpus has multiple functions. First, it visualizes data and displays multimedia, enabling users to quickly browse commonly used Hakka vocabulary. Second, it collects and preserves Hakka language heritage, covering Taiwan Hakka in six accents: Sixian (四縣), Hailu (海陸), Dapu (大埔), Raoping (饒平), Zhao'an (詔安), and Southern Sixian (南四縣). Third, it helps promote the digitalization of Hakka research and language education, providing a channel, built with information technology, for making use of language data.

Combined with artificial intelligence technologies, the Hakka corpus will help preserve Hakka cultural heritage and give the general public quick and easy access to written and spoken Hakka language material.