/qa/ - Questions and Answers

Questions and Answers about QA

File:__zundamon_voiceroid_and_1….png (632 KB,876x1125)


Thanks to a link from one guy that led to a discussion, and a link from another guy, we have links to software that makes cute voices!

I'm using the first link personally because it has Zundamon, which is the absolute cutest artificial voice. I'm making a thread here on /qa/ because the original thread is on /jp/ and /jp/ threads can fall off after a couple weeks and this is too cool to lose that quickly.


File:VOICEVOX_t1v0dq7dUx.png (133.82 KB,1300x822)

I don't speak or understand any Japanese symbols, so I'm using a katakana converter to turn text into something the program can read. This is the one I'm using, but I don't know if there's anything better: https://www.sljfaq.org/cgi/e2k.cgi

I can't figure out how to get back to the opening screen, so I can't give a tutorial for what to do from there. Once you're at this screen, though, you can use the right side to adjust various sound tweaks that affect everything, or the lower bar and its lower-left tabs to adjust the enunciation, emphasis and so on. I don't really know how to do these effectively, but maybe trial and error will help.

You can click the little icon next to the 'typed text here' to select a voice, and two of the characters have multiple options. After you type stuff, hit the play button in the lower left to build up the audio. I've found that deleting the "・" between words makes it flow a lot better, but leave it if you want it to be read very slowly or to insert a pause after a sentence.

As for exporting... well, you can hit E on the keyboard to extract a WAV, but you'll need to convert it to something else to upload it on kissu. (WAVs are so large and inefficient)
Personally, I just use OBS to capture the window and audio to share here.
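
For anyone who would rather convert the exported WAV than screen-capture it, here is a rough sketch of one way to do it. It assumes ffmpeg is installed and on your PATH (nobody in the thread mentions it), and the file name is made up. Other posts in this thread attach .mp3 and .ogg files, so those seem like reasonable targets.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(wav_path: str, out_ext: str = ".ogg") -> list[str]:
    """Build an ffmpeg command that re-encodes a WAV next to the original.

    ffmpeg chooses an encoder based on the output extension (.ogg, .mp3, .flac).
    """
    src = Path(wav_path)
    dst = src.with_suffix(out_ext)
    # -y overwrites an existing output file, -i names the input file.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert(wav_path: str, out_ext: str = ".ogg") -> None:
    """Run the conversion; raises if ffmpeg is missing or the encode fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(wav_path, out_ext), check=True)


if __name__ == "__main__":
    # "zundamon_test.wav" is a hypothetical file name.
    print(build_ffmpeg_command("zundamon_test.wav"))
```

Calling convert("zundamon_test.wav", ".mp3") would write zundamon_test.mp3 into the same folder, provided your ffmpeg build includes an MP3 encoder.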


File:2022-05-29 20-00-16.mp4 (3.63 MB,1300x822)

The plus sign in the lower right lets you add more lines, which lets you do this!
*fixed with the lines I forgot*


File:zundamon_tsukuyomi.mp4 (2.07 MB,1920x1080)

Thanks for this thread!

For those who want some more, here is another free TTS program with 5 built-in voices:
(You might need to bypass the GDrive download limit)

It uses the same UI as Voicevox and also uses a machine learning method, but the algorithm is different, so some features are missing (intonation and length).

However, the developer provides a tool that lets you make your own voice for the software, and by now 28 additional voices can be downloaded from external sites.


File:メールだよ.mp3 (42.23 KB)

What can we do with this? Making cute voice clips is cute and all but what's some practical use for this? Like creating notification sounds, etc.


It will mean that people can easily voice hentai games, even single devs making games in RPG Maker (well, I guess they don't have to be hentai games), or they could use it in mods as well. People could use it to make proper 3D anime type things, like Burning Melling but with voices. One could even use it to make vt*bers, if they were that way inclined....


Hmm. This is nice and all, but how was neropaso using this for TTS or were they using something else? Were they actually using this >>90447 ?

Anything you would want voices for! Just as an example with something like this, given the number of voicebanks, you could hypothetically create a voiced game or something like that.


File:8c8e8f4b82cf5f908949a0e22b….jpg (322.5 KB,1752x2557)

Mostly for making videos, like adding a narration. Lots of TTS software of this type actually have features specifically designed for this, like outputting a text file at the same time so an automatic subtitle is generated.
Also useful for adding vocals to a song.
Or adding a voice for a cute AI girl you make etc.


Voicevox has an API. There is definitely special software designed to work with that and the YouTube chat API, so that voices for the chat messages are generated automatically.
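
To give an idea of what talking to that API looks like, here is a minimal sketch. It assumes the VOICEVOX engine is running locally on port 50021 and exposes the two-step `audio_query` then `synthesis` flow over HTTP; that matches my understanding of the engine, but check its own API docs before relying on it. The speaker ID and output file name in the usage lines are placeholders, not values from this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the engine is running locally on its default port.
ENGINE_URL = "http://127.0.0.1:50021"


def endpoint(path: str, **params) -> str:
    """Build an engine URL with percent-encoded query parameters."""
    return f"{ENGINE_URL}/{path}?{urllib.parse.urlencode(params)}"


def synthesize(text: str, speaker: int) -> bytes:
    """Return WAV bytes for `text`. Needs a running engine."""
    # Step 1: ask the engine for an "audio query" (readings, accents, timing).
    request = urllib.request.Request(
        endpoint("audio_query", text=text, speaker=speaker),
        data=b"",
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        query = json.load(response)

    # Step 2: send that query back to get the synthesized audio.
    request = urllib.request.Request(
        endpoint("synthesis", speaker=speaker),
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder speaker ID; the engine can list the real ones.
    with open("chat_message.wav", "wb") as out:
        out.write(synthesize("こんにちは", speaker=1))
```

A chat reader would presumably just call synthesize() once per incoming message and play the result, and the JSON from step 1 could be tweaked in between to change speed or pitch.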


Someone had to do it.


File:shut up.webm (144.59 KB,358x166)


File:VOICEVOX.jpg (66.3 KB,1101x797)

Translated the "How to" for VOICEVOX.
"Reader" refers to the character that reads the text, with "character" referring to letters.
Struck-through text is my own input.

How to use

Before we begin
This document is for learning how to use the text-to-speech voice synthesizer software VOICEVOX.
Make sure to read the Terms and Conditions first: https://voicevox.hiroshiba.jp/term/
We also have a video that teaches you the basics of the program: https://www.youtube.com/watch?v=4yVpklclxwU

How to run
When trying to run this program on Windows, you might get a warning dialog saying "Windows protected your PC". If you do, click on "More info" and choose "Run anyway".

On a Mac, during the first run of the program, you might get a dialog saying that this program is not registered with Apple:
If so, in Finder, click on VOICEVOX's icon while holding down the Control key. From the shortcut menu, select "Open" and then click on "Open".
You can also select "System Preferences" from the Apple menu and, from the General tab, select something along the lines of "Open Anyway".
If you are on a system that runs on Apple Silicon:
When trying to run this program for the first time, if prompted to install Rosetta, please follow the install wizard and install it:

Running the voice synthesis engine
The first thing to start up is the voice synthesis engine. If you have an NVIDIA GPU with at least 3 GB of memory, you can use the faster GPU mode.
* GPU mode is not available for Mac.

Voice synthesis
Click on the empty space to the right of the character icon to input text. From now on, this row is referred to as the "text row", and the input field within it as the "input area".
Press the Enter key to confirm the text; doing so shows the readings and accents for the text at the bottom of the screen. From now on, this is referred to as the "customization area".
Once you click the "play" button, the voice will first be generated and then played back.

Adding or removing text
Clicking on the "+" on the bottom right will add an empty text row. When you hover over a text row, a trash can icon appears; click it to delete that text row. You can also select multiple text rows at once.

Changing the reader
Click on the reader icon(s) towards the left of the text row(s) and select a reader from the dropdown menu:
You can change the order of the readers through "キャラクター並べ替え". It's the third option in the Settings menu; drag and drop the character names on the right of the screen.

Changing the order of the text rows
You can change the order of the text rows by clicking in the vicinity of the input area and dragging.

Changing the spacing between characters
When characters are unintentionally connected or separated, you can adjust this by clicking on the empty space between the characters in the "Accent" tab.
Clicking on the gap between the "two words":
you can turn it into a "single word".
Similarly, when you want to add a separation, click on the empty space between the characters:

Changing the accent
If the desired accent is not achieved, you can change it in two ways. The recommended way is to change the accent slider.
For example, if you want "Deeplearning" to be read as "↑deepuraa↓ningu", drag the slider up to right above where the「ラ」character is:

Changing the intonation
If the desired outcome is not achieved even after adjusting the accents, or if you want to make more delicate changes, you can change the intonation of each individual character as well.
You can change the intonation for each character from the "Intonation" tab:
You can also increase the size of the customization area to make further detailed changes to the intonations:
You can also move the slider with the mouse wheel. Hold "Ctrl" while using the mouse wheel to lessen the amount of change with every scroll.
Furthermore, when characters like「キ」,「ツ」, and「ス」are muted, their sliders are grayed out in the Intonation tab. You can unmute a character by clicking on it:
You can only mute/unmute characters that end with either an「イ」("e") or an「ウ」("oo") sound.


Correcting the readings
When the reading is not as expected, you can correct it by clicking on the character(s) in the "Accent" tab:

Changing the style
Depending on the reader, there are a couple of styles (the way they talk) to choose from. Same as with changing the reader, you can change the style by clicking on the reader icon towards the left of the text row(s):
You can change the default style for each reader from「デフォルトスタイル」(translator's note: the fourth item) in the Settings menu.

Changing the length of audio
You can also change the length of the audio for each character, such as lengthening the end of a character or adjusting the length of silent (muted) characters.
To adjust the length, click on the Length tab:

Exporting audio
Clicking on the「音声書き出し」item (translator's note: second item) in the File menu exports the audio of all the text rows as a .wav file. The file will be saved as [SNo.]_[reader name]_[text head].wav. You can change the settings to save a text file along with the audio file via the「オプション」item (translator's note: last item) in the Settings menu.

Importing a text file
You can import a text file by clicking on the「テキスト読み込み」item (translator's note: the fifth item) in the File menu. The text can be delimited with either a newline or a half-width comma (,) (translator's note: it's the normal comma, I think). If a block of delimited text only contains a reader's name, the following text will be read by that reader.
For example:
will be imported as:
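
The import rule described above can be sketched roughly like this. It is only an illustration of the rule as translated, not VOICEVOX's actual code: the function name and the reader names in the usage note are hypothetical, and trimming whitespace and skipping empty blocks are my own assumptions.

```python
import re


def parse_import_text(text: str, readers: set[str], default_reader: str) -> list[tuple[str, str]]:
    """Split text into (reader, line) rows following the rule described above."""
    rows = []
    current = default_reader
    # Blocks are separated by newlines or half-width commas.
    for block in re.split(r"[,\n]", text):
        block = block.strip()
        if not block:
            continue  # assumption: empty blocks (e.g. blank lines) are ignored
        if block in readers:
            current = block  # a block that is only a reader name switches reader
        else:
            rows.append((current, block))
    return rows
```

For instance, with readers {"ReaderA", "ReaderB"} and "ReaderA" as the default, the text "hello,ReaderB,good morning" followed by a second line "good night" would give one row read by ReaderA and two rows read by ReaderB.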

Saving/importing a project file
Your text(s), reader choice(s), and customization(s) can be saved as a project file and imported in later sessions. Both options are in the File menu (translator's note: second-to-last item for saving, last item for importing). The project file will be saved as a `.vvproj` file.

You can customize the shortcuts from the「キー割り当て」item (translator's note: first item) in the Settings menu.
- Up and down arrow keys: Change text row selection.
- Space: Play audio.
- Shift + Enter: Add text row.
- Shift + Delete: Delete text row.
- Ctrl + S: Save project.
- Ctrl + E: Save audio.
- Ctrl + Z: Undo previous change.
- Ctrl + Y: Redo previous undo.
- Esc: De-select input area.
- 1: Activate "Accent" tab.
- 2: Activate "Intonation" tab.
- 3: Activate "Length" tab.
- Scrolling the mouse wheel when atop a slider: Change slider value.
Hold Ctrl to lessen the amount of change.
Hold Alt when adjusting Intonation or Length sliders to simultaneously adjust the Accent values (translator's note: not sure what they meant by this).

Customize the toolbar
You can customize which buttons appear in the toolbar and where they are placed, via the second item in the Settings menu.

Change reader order or listen to reader sample
You can change the order the readers appear in from the「キャラクターの並び替え・試聴」item (translator's note: third item) in the Settings menu. You can also listen to sample audio for each character.

Default style
From the「デフォルトスタイル」item (translator's note: fourth item) in the Settings menu, you can set the default style for each reader.

Readings and accent dictionary
Readers may mispronounce difficult or new words; you can use the dictionary feature to register them. You can find the dictionary feature under「読み方&アクセント辞書」(translator's note: second-to-last item) in the Settings menu.
When you open the 「読み方&アクセント辞書」screen, you will see a list of registered words on the left. Use the「追加」button to register new words:
Enter the word you want to register in「単語」and add the hiragana or katakana reading for the word in「読み」. In the「アクセント調整」area, you can set the default accent for the word:


You can make changes to multiple settings in the「オプション」item (translator's note: last item) in the Settings menu.

エンジン <Engine>
Change the operation mode for the engine. You need an NVIDIA GPU with at least 3 GB of memory to utilize the GPU mode.

操作 <Operation>
パラメータの引き継ぎ <Parameter inheritance>
Whether or not the customizations to existing text row(s) are inherited by newly added text rows.
再生位置を追従 <Track playback position>
Set how the program deals with text going out of screen during playback.

保存 <Save>
文字コード <Character code>
Set character code for saving/importing.
書き出し先を固定 <Fix export destination>
Set a fixed export destination for audio files, so that files are exported there without you having to select a destination each time.
上書き防止 <Prevent overwrite>
When saving/exporting, add a serial number to the filename when a file with the same filename is discovered.
txtファイルを書き出し <Export a txt file>
Whether or not to export a text file when exporting.
labファイルを書き出し <Export a lab file>
Whether or not to export a lab file when exporting. It stores information needed for lip-syncing (translator's note: dunno what that's for), such as phoneme and timing information.

高度な設定 <Advanced settings>
音声をステレオ化 <Change audio to stereo>
Change playback and export audio from mono to stereo.
再生デバイス <Playback device>
Change device for audio playback.
音声のサンプリングレート <Sampling rate for audio>
Change the sampling rate for playback and export audio. Setting a high sampling rate does not increase the quality of audio.

実験的機能 <Experimental features>
You can choose to utilize unfinished/under development features.
プリセット機能 <Preset feature>
Feature to register customizations. The presets persist for following run(s) of the program.
疑問文自動調整 <Automatic adjusting questions>
Will raise the intonation for end(s) of word(s) for question sentences, making it sound more like an actual question sentence.

データ収集 <Data collection>
ソフトウェア利用状況のデータ収集を許可する <Allow data collection when the program is running>
The data on the utilization factor for each UI component will be used to further improve VOICEVOX. Rest assured that we do not collect data pertaining to user input text or audio outputs.

You can fix the window as the top most window with the pin button at the top right of the window.

You can read the Terms and conditions here.

Uninstalling
On Windows: if the program was installed with the installer, run `Uninstall VOICEVOX.exe` in the install directory.
If it was installed from the zip file, delete the zip file along with the extracted directory.

On Mac: if the program was installed with the installer, drag and drop VOICEVOX from "Applications" into the trash.
If it was installed from the zip file, delete the zip file along with the extracted directory.

See: https://voicevox.hiroshiba.jp/qa/

For feedback and requests, tweet with the hashtag `#VOICEVOX` to reach us on Twitter.
For bug reports, tweet with the hashtag `#VOICEVOX` or tweet directly at us at `@voicevox_pj`.
Furthermore, reach us at `@voicevox_pj` for questions not already covered in the Q/A page.


File:Apo~n_dance_Yume.gif (728.4 KB,640x360)

Probably some continuity errors or some other errors, but this should convey the gist of everything. Hopefully.


File:802b92a88710079828b5c93f9b….jpg (370.72 KB,1995x2706)

>It stores information needed when lipsyncing dunno what thats for such as phoneme, and timing information.
It's for making narration videos where the characters speaking have sprites on screen. Lip-syncing info lets them swap between the correct mouth shapes while speaking.
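
As a rough illustration of how a video tool might use that file, here is a sketch. It assumes the lab file uses the common HTK-style label layout, with one "start end phoneme" entry per line and times in 100-nanosecond units; I have not confirmed that against VOICEVOX's own output, so check a real exported file first. The sprite names are made up.

```python
# Assumption: times are written in 100-nanosecond units (HTK-style labels).
UNITS_PER_SECOND = 10_000_000

VOWELS = {"a", "i", "u", "e", "o"}


def parse_lab(text: str) -> list[tuple[float, float, str]]:
    """Parse 'start end phoneme' lines into (start_sec, end_sec, phoneme)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, phoneme = parts
        rows.append((int(start) / UNITS_PER_SECOND, int(end) / UNITS_PER_SECOND, phoneme))
    return rows


def mouth_shape(phoneme: str) -> str:
    """Pick a hypothetical sprite name: vowels get their own mouth, the rest close it."""
    if phoneme.lower() in VOWELS:
        return f"mouth_{phoneme.lower()}"
    return "mouth_closed"
```

A video script could then walk through the parsed rows and show mouth_shape(phoneme) for each span between its start and end time.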


File:2022-06-01 15-52-08.mp4 (431.5 KB,960x676)

She remembered!


File:2022-06-25 15-01-15.mp4 (2.06 MB,800x600)

It feels like so much effort to do musical things, but I know it's in here somewhere.
I wonder which order I should do things in. Also thanks a lot for >>90556
I'm going to look at this program some more soon...


File:99270887_p0.jpg (340.33 KB,1270x1754)

Zundamon's popularity in pixiv seems to have grown in recent times. She has 20 pages (1188 images) on pixiv with the oldest entry being of the animal form in 2014. The first image of page 15 is from the middle of January 2022, so something happened that created quite a craze and it seems to have maintained it.


File:portrait-sayo.webp (59.33 KB,1280x1280)

Since the OP was posted, VOICEVOX now has even more voices (22 in total) available!


File:104757134_p0.jpg (333.81 KB,1738x1736)

Hmm, are any of them popular? Zundamon was by far the cutest at the time.
I wonder if you can use this stuff for AI. I think it'd be a lot easier to make some sentences for training data and then just use the AI stuff. At least for English.
I'm sure you can't beat the voice control of this stuff for singing, though.


Jashin-chan is getting her own voice synthesis software. A 3D MMD model and a Live2D model of her have also been released. There's some kind of contest in April for submitting videos of her using these assets.
A trial version is available.


That's impressive and strange and cool. I wonder what the motivation for it is, was it maybe a Kickstarter incentive or something? It's just so out there that I have trouble thinking of a reason why it exists. It's really cool, though, but... huh.


File:megatest.ogg (222.57 KB)

Cutest Voice for test files!


A few days later I'm reminded that Miku was in her show so this must be some weird osmosis thing going on. I just got done listening to a bunch of songs and now I want to hear Jashin singing!


File:uegh... kimo.mp3 (18.61 KB)

I was having a bit of fun with this the other day.


Can I ERP with it


File:konbanwa jashin-chan.ogg (22.16 KB)


File:harukanana.mp4 (988.29 KB,1280x1080)

Well, Haruka Nana (voiced by Nanahira) was just added! Now there are 25 voices available.
(she also got an updated version of the UTAU voicebank this year...)


She's got a HUGE head!


My own personal Nanahira...


when do you think they're going to give zundamon her own anime


File:107486113_p0.jpg (14.19 MB,3349x4302)

If you go by art, which is probably a terrible way to gauge popularity, she's not too popular when compared to others like Yuzuki.
Yuzuki Yukari: 43,804
Akane Kotonoha: 14,090
Aoi Kotonoha: 12,174
Zunndamonn: 3,341
(also is that really how it's pixiv Latin-ized or whatever the term was? Seems kind of awkward)

Zundamon came out in 2014 so she's definitely had time to get numbers. I think her design and dual existence as a humanoid and little furry creature definitely lends itself more to a show than other characters, though.


>Zunndamonn: 3,341
>(also is that really how it's pixiv Latin-ized or whatever the term was? Seems kind of awkward)
That's an extremely bizarre way of romanizing the name. It's like they used the keystrokes for alphabetic kana input (you type "n" twice to get the ん character, because it needs to leave your options open after the first "n" to continue into na, ni, nu, ne, no, or n). It'd be like romanizing "レティ" (Letty) as Retexi, because you use "xi" to get the small ィ character.
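
The keystroke idea above can be shown with a tiny sketch. The table only covers the kana needed for the two names in this post and is my own illustration, not how pixiv actually generates its tags.

```python
# Keystrokes an alphabetic kana IME expects for each kana (illustrative subset).
KEYSTROKES = {
    "ず": "zu", "ん": "nn", "だ": "da", "も": "mo",
    "レ": "re", "テ": "te", "ィ": "xi",
}


def keystroke_romanize(kana: str) -> str:
    """Spell out the keys you would type to enter `kana`, one kana at a time."""
    return "".join(KEYSTROKES[ch] for ch in kana)


if __name__ == "__main__":
    print(keystroke_romanize("ずんだもん"))  # zunndamonn
    print(keystroke_romanize("レティ"))      # retexi
```

Mapping ん to a single "n" in the table is all it would take to get the usual "zundamon" back.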


Romanization and the reverse (Japanification?) often have things like this. Often I'll think of ways of writing some English word in kana, but then the canonically accepted way is very different.


Ehh, "katakanization"/loanword adaptation is a phonological process while romanization is purely orthographic. Zundamon and Zunndamonn stand for exactly the same sounds referring to the exact same thing, while enerugii and enajii are two separate words that ultimately have the same Greek root but went down different paths.


File:tzUrvEz6xAPRmAG_.mp4 (202.34 KB,1280x720)

It's clear to see why. She's adorable.


File:18a74fa082654f0697b51a1254….jpg (131.62 KB,944x1569)

Who could dislike this funny little green squirrel girl with green bean ears?


Zundamon is indeed extremely cute and it's a bit of a mystery why there isn't loads and loads of art made each day. She does still seem to be increasing in popularity, though, so good times ahead for Zundamon fans.


I think a lot of dedicated cute artists are devoted to certain franchises




on topic sager


how can it be that i'm the one who gets like 90% of these replies it's not fair



I feel like I will die in 7 days now




File:[MTBB] Oshi no Ko - 08 [CC….jpg (278.34 KB,1920x1080)

Where do I know this song from? It's somewhere in the recesses of my memory


I felt like I watched a cursed video


the original is wildly popular got like a bajillion covers


File:[anon] The Idolmaster Cind….jpg (275.53 KB,1920x1080)

Huh, is it really not older than that? I guess my brain is playing tricks on me then


cute brat song





now it's zunda that's armed and ready to go




File:110699327_p0.png (8.38 KB,617x680)

Nice. GYARI is pretty popular so this could usher in a whole bunch of stuff. I can't wait to be a Zundamon hipster


zundamon is so cute


File:110737332_p0.png (12.46 KB,1064x952)

Good artist


Wow this is the first time I've seen a Japanese program natively support Linux


probably happened by accident from him choosing electron


insanely cute


File:00083-2641630778.png (962.78 KB,864x1152)

Zundamon sound board! https://aidn.jp/zundaroe/
(didn't feel like looking at pixiv so take a nonsensical AI image)


File:zundaroe_1703190060.mp4 (3.69 MB,1280x720)

nice site


Anyone tried using this for english? Just copy/pasting the text doesn't work too well, but converting it to katakana first works a lot better. Haven't found a way to do it locally, though.


File:116422198_p0.jpg (1.13 MB,2048x2048)

I tried, but it's an exercise in frustration. I think I did try the katakana thing, but it was still hard to do the pitches and such. I think I also tried general TTS tricks, which I can't quite remember. (I think it was like typing "hoo" instead of "who".) I didn't fully understand the UI either, but I guess that's part of the whole "unable to understand Japanese" thing.
After seeing all the AI cover stuff I don't know if this has any chance to gain traction outside Japanese; it's just too much work.


File:eng.flac (577.2 KB)

i have better luck with voice cloning tools and 10 seconds of reference audio
it's not perfect but really sounds like how japanese speak english


Ohh, I forgot about this thread.

By voice cloning you mean the AI stuff, right? Yeah, it's made vocaloid stuff seem like backbreaking labor by comparison, but vocaloid still has superior charm and special sounds I think.


File:9902e43858ea0512760cd03fd5….jpg (221.61 KB,1667x2048)

yes, i generated the audio with https://github.com/fishaudio/fish-speech
ai is progressing so fast, now the large language models are being integrated for text to speech so i don't have to care about language-specific models and preprocessing anymore, i can enter text in any language and it just works
