174 lines
8.3 KiB
Markdown
174 lines
8.3 KiB
Markdown
# libsonic Home Page
|
|
|
|
[Download the latest tar-ball from here](download).
|
|
|
|
The source code repository can be cloned using git:
|
|
|
|
$ git clone git://github.com/waywardgeek/sonic.git
|
|
|
|
The source code for the Android version, sonic-ndk, can be cloned with:
|
|
|
|
$ git clone git://github.com/waywardgeek/sonic-ndk.git
|
|
|
|
There is a simple test app for android that demos capabilities. You can
|
|
[install the Android application from here](Sonic-NDK.apk)
|
|
|
|
There is a new native Java port, which is very fast! Checkout Sonic.java and
|
|
Main.java in the latest tar-ball, or get the code from git.
|
|
|
|
## Overview
|
|
|
|
Sonic is free software for speeding up or slowing down speech. While similar to
|
|
other algorithms that came before, Sonic is optimized for speed ups of over 2X.
|
|
There is a simple sonic library in ANSI C, and one in pure Java. Both are
|
|
designed to easily be integrated into streaming voice applications, like TTS
|
|
back ends. While a very new project, it is already integrated into:
|
|
|
|
- espeak
|
|
- Debian Sid as package libsonic
|
|
- Android Astro Player Nova
|
|
- Android Osplayer
|
|
- Multiple closed source TTS engines
|
|
|
|
The primary motivation behind sonic is to enable the blind and visually impaired
|
|
to improve their productivity with free software speech engines, like espeak.
|
|
Sonic can also be used by the sighted. For example, sonic can improve the
|
|
experience of listening to an audio book on an Android phone.
|
|
|
|
Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved. It is released
|
|
as under the Apache 2.0 license. Feel free to contact me at
|
|
<waywardgeek@gmail.com>. One user was concerned about patents. I believe the
|
|
sonic algorithms do not violate any patents, as most of it is very old, based
|
|
on [PICOLA](http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
|
|
the new part, for greater than 2X speed up, is clearly a capability most
|
|
developers ignore, and would not bother to patent.
|
|
|
|
## Comparison to Other Solutions
|
|
|
|
In short, Sonic is better for speech, while WSOLA is better for music.
|
|
|
|
A popular alternative is SoundTouch. SoundTouch uses WSOLA, an algorithm
|
|
optimized for changing the tempo of music. No WSOLA based program performs well
|
|
for speech (contrary to the inventor's estimate of WSOLA). Listen to [this
|
|
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
|
|
it to [this sonic sample](sonic.wav). Both are sped up by 2X. WSOLA
|
|
introduces unacceptable levels of distortion, making speech impossible to
|
|
understand at high speed (over 2.5X) by blind speed listeners.
|
|
|
|
However, there are decent free software algorithms for speeding up speech. They
|
|
are all in the TD-PSOLA family. For speech rates below 2X, sonic uses PICOLA,
|
|
which I find to be the best algorithm available. A slightly buggy
|
|
implementation of PICOLA is available in the spandsp library. I find the one in
|
|
RockBox quite good, though it's limited to 2X speed up. So far as I know, only
|
|
sonic is optimized for speed factors needed by the blind, up to 6X.
|
|
|
|
Sonic does all of it's CPU intensive work with integer math, and works well on
|
|
ARM CPUs without FPUs. It supports multiple channels (stereo), and is also able
|
|
to change the pitch of a voice. It works well in streaming audio applications,
|
|
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
|
|
or 8-bit unsigned formats. The source code is in plain ANSI C. In short, it's
|
|
production ready.
|
|
|
|
## Using libsonic in your program
|
|
|
|
Sonic is still a new library, but is in Debian Sid. It will take a while
|
|
for it to filter out into all the other distros. For now, feel free to simply
|
|
add sonic.c and sonic.h to your application (or Sonic.java), but consider
|
|
switching to -lsonic once the library is available on your distro.
|
|
|
|
The file [main.c](main.c) is the source code for the sonic command-line application. It
|
|
is meant to be useful as example code. Feel free to copy directly from main.c
|
|
into your application, as main.c is in the public domain. Dependencies listed
|
|
in debian/control like libsndfile are there to compile the sonic command-line
|
|
application. Libsonic has no external dependencies.
|
|
|
|
There are basically two ways to use sonic: batch or stream mode. The simplest
|
|
is batch mode where you pass an entire sound sample to sonic. All you do is
|
|
call one function, like this:
|
|
|
|
sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);
|
|
|
|
This will change the speed and pitch of the sound samples pointed to by samples,
|
|
which should be 16-bit signed integers. Stereo mode is supported, as
|
|
is any arbitrary number of channels. Samples for each channel should be
|
|
adjacent in the input array. Because the samples are modified in-place, be sure
|
|
that there is room in the samples array for the speed-changed samples. In
|
|
general, if you are speeding up, rather than slowing down, it will be safe to
|
|
have no extra padding. If your sound samples are mono, and you don't want to
|
|
scale volume or playback rate, and if you want normal pitch scaling, then call
|
|
it like this:
|
|
|
|
sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);
|
|
|
|
The other way to use libsonic is in stream mode. This is more complex, but
|
|
allows sonic to be inserted into a sound stream with fairly low latency. The
|
|
current maximum latency in sonic is 31 milliseconds, which is enough to process
|
|
two pitch periods of voice as low as 65 Hz. In general, the latency is equal to
|
|
two pitch periods, which is typically closer to 20 milliseconds.
|
|
|
|
To process a sound stream, you must create a sonicStream object, which contains
|
|
all of the state used by sonic. Sonic should be thread safe, and multiple
|
|
sonicStream objects can be used at the same time. You create a sonicStream
|
|
object like this:
|
|
|
|
sonicStream stream = sonicCreateStream(sampleRate, numChannels);
|
|
|
|
When you're done with a sonic stream, you can free it's memory with:
|
|
|
|
sonicDestroyStream(stream);
|
|
|
|
By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
|
|
no change at all to the sound stream. Sonic detects this case, and simply
|
|
copies the input to the output to reduce CPU load. To change the speed, pitch,
|
|
rate, or volume, set the parameters using:
|
|
|
|
sonicSetSpeed(stream, speed);
|
|
sonicSetPitch(stream, pitch);
|
|
sonicSetRate(stream, rate);
|
|
sonicSetVolume(stream, volume);
|
|
|
|
These four parameters are floating point numbers. A speed of 2.0 means to
|
|
double speed of speech. A pitch of 0.95 means to lower the pitch by about 5%,
|
|
and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
|
|
exceed the maximum range of a 16-bit integer. Speech rate scales how fast
|
|
speech is played. A 2.0 value will make you sound like a chipmunk talking very
|
|
fast. A 0.7 value will make you sound like a giant talking slowly.
|
|
|
|
By default, pitch is modified by changing the rate, and then using speed
|
|
modification to bring the speed back to normal. This allows for a wide range of
|
|
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
|
|
too. If you want to make the person sound like the same person, but talking at
|
|
a higher or lower pitch, then enable the vocal chord emulation mode for pitch
|
|
scaling, using:
|
|
|
|
sonicSetChordPitch(stream, 1);
|
|
|
|
However, only small changes to pitch should be used in this mode, as it
|
|
introduces significant distortion otherwise.
|
|
|
|
After setting the sound parameters, you write to the stream like this:
|
|
|
|
sonicWriteShortToStream(stream, samples, numSamples);
|
|
|
|
You read the sped up speech samples from sonic like this:
|
|
|
|
samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
|
|
if(samplesRead > 0) {
|
|
/* Do something with the output samples in outBuffer, like send them to
|
|
* the sound device. */
|
|
}
|
|
|
|
You may change the speed, pitch, rate, and volume parameters at any time, without
|
|
having to flush or create a new sonic stream.
|
|
|
|
When your sound stream ends, there may be several milliseconds of sound data in
|
|
the sonic stream's buffers. To force sonic to process those samples use:
|
|
|
|
sonicFlushStream(stream);
|
|
|
|
Then, read those samples as above. That's about all there is to using libsonic.
|
|
There are some more functions as a convenience for the user, like
|
|
sonicGetSpeed. Other sound data formats are supported: signed char and float.
|
|
If float, the sound data should be between -1.0 and 1.0. Internally, all sound
|
|
data is converted to 16-bit integers for processing.
|