Vocaloid, UTAU, and the culture of synthesized vocals in doujin music – Part 1

When most people think of Vocaloid, the twin-tailed internet diva Hatsune Miku comes to mind. But Vocaloid is a lot more than a single character, and its history and influence go well beyond Miku’s voice. In fact, Miku didn’t exist until Vocaloid2, let alone the production of any Japanese Vocaloid doujin work. The voice synthesizer software itself was what gave many artists a chance to add vocals to their music, and it eventually became its own sub-culture. Not only that, but it also spun off other efforts and sub-cultures like MikuMikuDance, Voiceroids, and UTAU (all of which I’ll talk more about in Part 2). Surprisingly, Vocaloid is still confined to a niche community in the West compared to Japan, even though the software was originally released for English-speaking producers.



The Vocaloid software is a singing voice synthesizer with roots in research done at Pompeu Fabra University in Spain in the early 2000s. It began as a joint research project exploring the human voice and its synthesis in the field of singing. Jordi Bonada, a researcher with an extensive background in music perception, vocal characteristics, and overall sound theory, led the project. But the one who pushed the commercial application and the research into synthesis for Yamaha’s R&D division was Hideki Kenmochi (if anybody is interested in the theory behind the software, I found the original research paper on EpR vocal synthesis which was the basis for Vocaloid!). The same team also founded the voice synthesis company Voctro Labs, which still assists Yamaha R&D with Vocaloid and created Bruno & Clara (the first Spanish language virtual singers). In 2003, the first version of Vocaloid was announced in Germany and it started to be used later that year.


Vocaloid software GUI

At first, the Vocaloid software might seem a bit daunting, and I wouldn’t blame anyone, as the number of options and settings makes using it far from a simple task. But the basics behind creating VSQ files (the format Vocaloid2 uses to store score information) are not too difficult to understand. The image above is the main view when using the software – a piano roll with phrases and added effects. In its most basic form, you can type out what the voice should be saying at a particular pitch, control the duration, and add various effects. That is really the bare minimum needed to get a singing voice out of the software, but without some extra effort the resulting voice might sound less human-like than one would hope. That’s where the more detailed work such as pitch bending, vibrato, portamento, and other effects comes into play. For example, the turquoise wave with white dots is a pitch bend that results in a finely controlled vibrato-like effect. Vocaloid was most definitely state-of-the-art in commercial voice manipulation software in its day, and producers understood that.
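To make the piano roll idea a bit more concrete, here is a minimal Python sketch of a single note (lyric, pitch, duration) with a sine-shaped pitch bend layered on top to imitate vibrato. Everything in it is my own illustration: the names `Note`, `vibrato_curve`, and `bent_frequency`, the tick resolution, and the default depth and rate are assumptions, not Vocaloid's actual file format or API.

```python
import math
from dataclasses import dataclass


@dataclass
class Note:
    """One piano-roll note (hypothetical structure, for illustration only)."""
    lyric: str         # what the voice should sing
    midi_pitch: int    # MIDI note number, 69 = A4 (440 Hz)
    start_tick: int    # where the note begins on the timeline
    length_ticks: int  # how long the note is held


def vibrato_curve(length_ticks, depth_cents=50.0, rate_cycles=4.0, step=30):
    """Return (tick, cents) control points for a sine-shaped pitch bend.

    depth_cents is the peak deviation from the written pitch, and
    rate_cycles is how many full wobbles fit inside the note.
    """
    if length_ticks <= 0:
        raise ValueError("length_ticks must be positive")
    points = []
    for tick in range(0, length_ticks + 1, step):
        phase = 2 * math.pi * rate_cycles * tick / length_ticks
        points.append((tick, depth_cents * math.sin(phase)))
    return points


def bent_frequency(note, cents):
    """Frequency in Hz of the note after bending it by `cents`."""
    base = 440.0 * 2 ** ((note.midi_pitch - 69) / 12)
    return base * 2 ** (cents / 1200)


note = Note(lyric="la", midi_pitch=69, start_tick=0, length_ticks=480)
curve = vibrato_curve(note.length_ticks)
frequencies = [bent_frequency(note, cents) for _, cents in curve]
```

Each control point plays the same role as one of the white dots on the turquoise curve: the more points you add and the more carefully you shape them, the less mechanical the held note sounds.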

Box art of Vocaloid Leon


In Yamaha’s vision, Vocaloid was intended to be “a singer in a box” designed to fully replace a traditional singer. When phrases are typed into the software to be synthesized, a database of phonetic vocal samples is used and modified to produce the final sound. This means that you could take the same VSQ file, change the singer database, and it would sound like a completely different person. But what’s really unique about Vocaloid is the way it is licensed – what Yamaha made available to others was a voice synthesis engine which would take a voice database and singing directives to produce realistic and melodic vocals, but they did NOT sell the voice itself. They chose to let other companies develop their own voice databases and sell those under a different license, a move which paved the way for the first Vocaloids commercially sold to music producers – Leon and Lola, developed by the English company Zero-G in 2004.
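The split between the score and the singer database can be sketched in a few lines of Python. This is a toy model under my own assumptions: `ScoreNote`, `render_plan`, the two voicebank dictionaries, and the sample paths are all hypothetical, and a real engine does far more than look up samples (it also stretches, pitch-shifts, and blends them).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreNote:
    """One sung syllable in the score (hypothetical structure)."""
    phoneme: str
    midi_pitch: int
    length_ticks: int


def render_plan(score, voicebank):
    """Pair every note with the sample this voicebank holds for its phoneme.

    The score decides WHAT is sung (phoneme, pitch, length); the voicebank
    only decides WHOSE recorded samples are used to sing it.
    """
    plan = []
    for note in score:
        sample = voicebank.get(note.phoneme)
        if sample is None:
            raise KeyError(f"voicebank has no sample for {note.phoneme!r}")
        plan.append((sample, note.midi_pitch, note.length_ticks))
    return plan


# The same score...
score = [
    ScoreNote("la", 64, 240),
    ScoreNote("li", 66, 240),
    ScoreNote("lu", 67, 480),
]

# ...and two made-up singer databases covering the same phonemes.
singer_a = {"la": "singer_a/la.wav", "li": "singer_a/li.wav", "lu": "singer_a/lu.wav"}
singer_b = {"la": "singer_b/la.wav", "li": "singer_b/li.wav", "lu": "singer_b/lu.wav"}

plan_a = render_plan(score, singer_a)
plan_b = render_plan(score, singer_b)
```

Both plans share the same pitches and lengths and differ only in which recordings they draw from, which is the sense in which swapping the database gives you a different singer for the same song.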


But these two Vocaloids were not as popular as Zero-G and Yamaha had hoped. According to Kenmochi, the biggest reason for their sluggish sales was the packaging and presentation of the Vocaloid concept. Rather than selling the characters behind the voices, Zero-G sold them as “virtual soul vocalists”, which drew a subdued response. Personally, I think the main failure was the lack of exposure to doujin creators, as Nico Nico Douga (a Japanese video sharing site) had not picked up yet. Although producers were interested in the product, it didn’t catch on because it was a faceless voice as opposed to a character with a voice. In addition, there was a backlash because of what is called the “uncanny valley” in robotics – as synthetic vocals get closer to real human singing, there is a point where the near-realism becomes off-putting. Whether people were afraid of it or just saw it as a toy, the reception was so bad in the West that the spread was forever stunted. Luckily, the Sapporo-based company Crypton Future Media stepped in with great vision and good timing to develop the most popular and well-known Vocaloid, internet diva Hatsune Miku.


Box art of Vocaloid Meiko

But before we get into Miku, there’s some history behind Crypton’s success with her release for Vocaloid2. Even before Miku, Crypton was fairly successful with their previous two Vocaloid releases, Meiko and Kaito. Although Crypton marketed them as their own products, these two were actually produced jointly with Yamaha – Crypton’s role was the marketing and distribution of the product, as that had been their forte at the time. But what’s really interesting about marketing is how seemingly small choices can completely change the course of a product, and that’s what happened when Crypton decided to put character art of their Vocaloids on the boxes to show their visual characteristics. What made these two more popular than Leon and Lola was that Crypton was not selling only voicebanks, but actual characters that people could get behind (it’s interesting to note that Kaito was not as popular because male voices were not in demand back then). They figured out that the doujin community really cared about characterization, so they used what they learned to develop the turquoise-haired girl we all know for their landmark release in 2007.


Box art of Vocaloid Hatsune Miku


It wouldn’t be a stretch to call Crypton Future Media producer Wataru Sasaki the father of Hatsune Miku. In designing her, the choice was made to leave her as more of a blank canvas – they listed her design, age (16), height (5’2″), weight (92.5 lbs) and nothing more. Her name comes from the Japanese words for first (初 hatsu), sound (音 ne), and the nanori reading of 未来 future (ミク miku). Voice actress Saki Fujita was picked to provide the donor voice, and after many long recording sessions the voicebank was completed. Realizing that the appeal of their previous Vocaloid Meiko was due to her existence as a character, they wanted to build upon that and let the doujin community decide how to define Miku’s personality. Soon after her release, Crypton opened an online community called Piapro (which stands for “peer production”) where Vocaloid fans could upload their doujin works. This, coupled with Nico Nico Douga’s rise in popularity at the time, drove Hatsune Miku sales through the roof – so much that Crypton couldn’t keep up with the initial demand.




Crypton’s Vocaloid2 family (Kaito, Meiko, Miku, Len, Rin, Luka) – Render by Sango2000

Over the next few years, Crypton released Rin, Len, and Luka for the Vocaloid2 software, and all of them had similar success following in their sister’s footsteps. Other companies moved in for a piece of the pie as well, and Vocaloids continued to boom in Japan. Some people attribute the popularity of Vocaloid in Japan to the country’s Shinto-influenced culture, in which anthropomorphism is a core element. There are many other examples of this type of object-personality in Japan, such as OS-tan and anime like Akikan (lol). But sadly, the culture in the US is very different, and cute moe designs are regarded as weird (although Crypton blames it on a fear of robots in the US…). Nevertheless, it’s pretty interesting to note that Crypton’s label, KarenT, makes a majority of their sales from American consumers at the iTunes store. Funny how these things turn out!


The Vocaloid craze spawned many different sub-cultures around voice synthesis as Miku gained popularity through 2008. One of the most interesting ones is something called UTAU – think Vocaloid, but with user-created voicebanks. More on this, Miku, and other Vocaloid-inspired software in Part 2!



====> Make sure to catch Part 2 here! <====


• • •


Also, for some HIGHLY suggested reading check out this feature article on Vocaloids which chronicles Bonada, Kenmochi, and the Vocaloid craze (and some pretty cool quotes/interviews).




