Datasets
We create high-quality datasets.
Introduction to High-Quality Voice Dataset (Chinese Mandarin) KKQ202406
Compliance: To ensure full legal compliance, we worked with our legal department from the early stages of building this voice dataset. First, for the recording scripts, we purchased scripts totaling more than 1.5 million words from a copyright company and signed a copyright contract with it. Second, for the recordings themselves, we signed a recording authorization contract with each voice talent stating that the recordings may be used for AI. In addition, each talent recorded a separate paragraph stating their agreement that we may authorize third parties to use their recordings.
Recording Quality: All voice talents are professionals who recorded with professional equipment in professional recording environments. To ensure high quality, all recordings are in WAV format: 48,000 Hz, 24-bit, mono.
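A delivered file can be checked against the stated format with Python's standard-library `wave` module. This is a minimal sketch, assuming the files use a standard PCM WAV header that `wave` can open; the constant names are illustrative only.

```python
import wave

# Format stated above: WAV, 48,000 Hz, 24-bit, mono.
EXPECTED_RATE = 48_000   # samples per second
EXPECTED_SAMPWIDTH = 3   # bytes per sample (3 bytes = 24-bit)
EXPECTED_CHANNELS = 1    # mono


def check_wav_format(path):
    """Return a list of deviations from the stated format (empty list = OK).

    Assumes a standard PCM WAV header readable by the `wave` module.
    """
    problems = []
    with wave.open(str(path), "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate is {wf.getframerate()} Hz")
        if wf.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"sample width is {8 * wf.getsampwidth()}-bit")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels")
    return problems
```

Running `check_wav_format` over every `.wav` file in a delivery and collecting the non-empty results gives a quick list of files that do not match the specification.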
Splitting and Annotation: All recordings were manually split at natural pauses into clips of 1 to 20 seconds, and each clip was then manually annotated. Every audio file comes with two annotation files: one in Pinyin and one in Chinese characters. Annotation files are plain text (.txt) in UTF-8 encoding.
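Because each clip has a Pinyin file and a Chinese character file, the two can be cross-checked. The sketch below assumes the convention visible in the samples: one Pinyin syllable per Chinese character (erhua written as a separate syllable, as in "dai4 hui4 er2" for 待会儿), each syllable ending in a tone digit 1-5 with 5 marking the neutral tone, and punctuation as separate tokens. The function names are hypothetical.

```python
import re

# One Pinyin syllable: lowercase letters (u-umlaut may appear as "ü" or "v")
# followed by a tone digit 1-5, where 5 marks the neutral tone.
SYLLABLE = re.compile(r"[a-zü]+[1-5]")
TONELESS = re.compile(r"[a-zü]+")
HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs (basic block)


def read_annotation(path):
    # Annotation files are UTF-8 plain text; strip the trailing newline.
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def check_annotation_pair(pinyin_line, hanzi_line):
    """Compare one Pinyin annotation with its Chinese character annotation.

    Returns a list of problems; an empty list means every syllable carries a
    tone digit and there is exactly one syllable per Chinese character.
    """
    problems = []
    n_syllables = 0
    for token in pinyin_line.split():
        if SYLLABLE.fullmatch(token):
            n_syllables += 1
        elif TONELESS.fullmatch(token):
            n_syllables += 1
            problems.append(f"missing tone digit: {token!r}")
        # Any other token (",", "。", "?") is treated as punctuation.
    n_chars = len(HAN_CHAR.findall(hanzi_line))
    if n_syllables != n_chars:
        problems.append(f"{n_syllables} syllables vs {n_chars} characters")
    return problems
```

For example, `check_annotation_pair("xiong1 ba1 ba1 de5 。", "凶巴巴的。")` returns an empty list, while a syllable written without its tone digit is reported by name.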
Proportion of Audio Lengths: 【1-5s, 20%】【6-10s, 20%】【11-15s, 30%】【16-20s, 30%】
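The length proportions can be recomputed from the audio files themselves. This is a sketch under one stated assumption: the bands are read as [1, 5], (5, 10], (10, 15] and (15, 20] seconds, so a 5.4 s clip counts toward "6-10s". The helper names are illustrative.

```python
import wave
from bisect import bisect_left

# Upper edges (seconds) of the four length bands listed above.
# Assumption: bands are [1, 5], (5, 10], (10, 15], (15, 20].
EDGES = [5.0, 10.0, 15.0, 20.0]
LABELS = ["1-5s", "6-10s", "11-15s", "16-20s"]
TARGET = {"1-5s": 0.20, "6-10s": 0.20, "11-15s": 0.30, "16-20s": 0.30}


def clip_seconds(path):
    """Duration of a WAV clip in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def length_distribution(durations):
    """Fraction of clips falling into each length band."""
    if not durations:
        raise ValueError("no clips given")
    counts = dict.fromkeys(LABELS, 0)
    for d in durations:
        if not 1.0 <= d <= 20.0:
            raise ValueError(f"clip of {d:.2f} s is outside the 1-20 s range")
        # bisect_left puts a clip exactly on an edge into the lower band.
        counts[LABELS[bisect_left(EDGES, d)]] += 1
    return {label: n / len(durations) for label, n in counts.items()}
```

Comparing the result with `TARGET` band by band shows how closely a given delivery matches the advertised proportions; clips outside the 1-20 s range stated for splitting are rejected rather than silently binned.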
Dataset Samples
KKQ 081
KKQ 096
KKQ 013
KKQ 015
(1356 files, 452 audio files, 452 Pinyin scripts, 452 Chinese char scripts)
Pinyin:
jin3 guan3 ou3 er3 ta1 ye3 hui4 shun4 bian4 zai4 qi2 ta1 tong2 shi4 hui2 jia1 ,
Chinese Chars:
尽管偶尔他也会顺便载其他同事回家,
Pinyin:
hu1 ran2 kan4 jian4 fu4 yan2 fei1 dan1 shou3 fu2 zhe5 zi4 xing2 che1 zai4 bu4 yuan3 chu4 zhan4 zhe5 ,
Chinese Chars:
忽然看见傅延飞单手扶着自行车在不远处站着,
Pinyin:
chi1 bao3 le5 dai4 hui4 er2 jiu4 chi1 bu4 xia4 fan4 le5 。
Chinese Chars:
吃饱了待会儿就吃不下饭了。
Pinyin:
xu3 song4 ya3 yan3 zheng1 zheng1 di4 kan4 zhe5 ta1 ba3 di4 gua1 sai1 jin4 le ta1 de5 shu1 bao1 li3 , xiao3 sheng1 sui4 sui4 nian4 ,
Chinese Chars:
许颂雅眼睁睁地看着他把地瓜塞进了他的书包里,小声碎碎念,
Pinyin:
xiong1 ba1 ba1 de5 。
Chinese Chars:
凶巴巴的。
(1293 files, 431 audio files, 431 Pinyin scripts, 431 Chinese char scripts)
Pinyin:
dan4 shi4 wo3 zhi1 dao4 ta1 jiu3 hou4 shi4 cong2 lai2 bu4 xi1 yan1 de1 ,
Chinese Chars:
但是我知道她酒后是从来不吸烟的,
Pinyin:
er2 qie3 wo3 shuo1 de1 zhe4 xie1 ye3 suan4 bu4 shang4 shen2 me1 zheng4 ju4 ,
Chinese Chars:
而且我说的这些也算不上什么证据,
Pinyin:
zhe4 shi4 yun2 jin3 lai2 te4 qin2 yi1 zhan4 hou4 di4 yi1 ci4 yu4 dao4 zhe4 me1 zhong4 da4 de1 huo3 qing2 ,
Chinese Chars:
这是云锦来特勤一站后第一次遇到这么重大的火情,
Pinyin:
zhuan3 yan3 jian1 jiu4 bu4 jian4 le1 zong1 ying3 。
Chinese Chars:
转眼间就不见了踪影。
Pinyin:
yun2 jin3 yi1 zhuan3 tou2 , zheng4 hao3 kan4 jian4 yang2 yue4 fei1 shen1 shang4 le1 yi1 liang4 xiao1 fang2 che1 ,
Chinese Chars:
云锦一转头,正好看见杨钺飞身上了一辆消防车,
(1413 files, 471 audio files, 471 Pinyin scripts, 471 Chinese char scripts)
Pinyin:
wo3 deng3 wei4 he2 yao4 gao4 su4 ni3 yuan2 yin1 a1
Chinese Chars:
我等为何要告诉你原因啊
Pinyin:
gai1 zen3 me5 sheng1 huo2 shi4 wo3 men2 shu4 ren2 zi4 ji3 de5 shi4 qing2
Chinese Chars:
该怎么生活是我们树人自己的事情
Pinyin:
lao3 shu4 ren2 hou4 fang1 de5 yi1 ke1 shu4 mu4 tu1 ran2 cha1 hua4 dao4
Chinese Chars:
老树人后方的一棵树木突然插话道
Pinyin:
heng1 , xiao3 bei4 , ni3 wen4 gou4 le5 mei2 you3 ?
Chinese Chars:
哼,小辈,你问够了没有?
Pinyin:
yu3 ni3 zhe4 ge4 wai4 ren2 mei2 you3 guan1 xi4 ba1 ?
Chinese Chars:
与你这个外人没有关系吧?
(1281 files, 427 audio files, 427 Pinyin scripts, 427 Chinese char scripts)
Pinyin:
ta1 dou1 bu4 ke3 neng2 dui4 ta1 yi3 yuan4 bao4 yuan4
Chinese Chars:
他都不可能对他以怨报怨
Pinyin:
wu2 lun4 ta1 dui4 ta1 zuo4 le5 shen2 me5 guo4 fen1 de5 shi4
Chinese Chars:
无论他对他做了什么过分的事
Pinyin:
er2 ling4 yi1 ge4 ren2 , bian4 shi4 ta1 de5 mei4 mei4 fu4 shi1 yun2
Chinese Chars:
而另一个人,便是他的妹妹傅诗云
Pinyin:
yi1 shi4 ta1 xin1 yi2 de5 su1 luo4 yu3
Chinese Chars:
一是他心仪的苏洛雨
Pinyin:
ke3 shi4 zhe4 ge4 shi4 jie4 shang4 , you3 liang3 ge4 ren2 yu3 ta1 er2 yan2 shi4 yi4 wai4
Chinese Chars:
可是这个世界上,有两个人于他而言是意外
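Each sample set above reports three files per clip (for example, 1356 files = 3 × 452 audio files). A minimal sketch of checking that every clip in a speaker folder has its audio file and both annotation files is shown below. The file naming scheme is a hypothetical assumption, since the actual file names are not given here.

```python
from collections import defaultdict

# Hypothetical naming scheme (not documented for this dataset):
#   <clip_id>.wav         audio
#   <clip_id>.pinyin.txt  Pinyin annotation
#   <clip_id>.hanzi.txt   Chinese character annotation
SUFFIXES = {".wav": "audio", ".pinyin.txt": "pinyin", ".hanzi.txt": "hanzi"}


def check_speaker_folder(file_names):
    """Return (file count, clip count, clip ids missing any of the three files)."""
    clips = defaultdict(set)
    for name in file_names:
        for suffix, kind in SUFFIXES.items():
            if name.endswith(suffix):
                clips[name[: -len(suffix)]].add(kind)
                break
    incomplete = sorted(cid for cid, kinds in clips.items() if len(kinds) < len(SUFFIXES))
    return len(file_names), len(clips), incomplete
```

Passing `os.listdir(folder)` for a complete folder should yield a file count equal to three times the clip count and an empty list of incomplete clips.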