Re: [gnuspeech-contact] Re: Parameter names and meanings
From: David Hill
Date: Fri, 3 Feb 2006 14:24:09 -0800
Hi Eric,
This is the first time I've seen this message, so maybe there was a
list problem. Very odd!
On Feb 3, 2006, at 4:40 AM, Eric Zoerner wrote:
> I haven't seen a reply to this message, which was sent to the list
> on 3 Jan. Is this information not available?
> Thanks,
> Eric
> On 3 Jan 2006, at 18:47, Eric Zoerner wrote:
> Maybe I'm not looking in the right place, but there seems to be
> some information missing in the GnuSpeech documentation which is
> causing some difficulty for me.
> I am having some difficulty with the names used for the
> parameters, both at the posture (Synthesizer) and suprasegmental
> (Monet) levels, in that I cannot find documentation on how the
> parameters relate to the tools. There is general discussion of the
> parameters in some cases, but there is no direct mapping of the
> concepts to the parameter names.
I'll have to look into this.
> For example, the Synthesizer app allows you to adjust the
> "breathiness" of a posture, but there is no reference to how
> adjusting this affects the output parameters of the Synthesizer
> app. My best guess is that it may affect the "fricBW" parameter.
The "breathiness" parameter actually injects noise as part of the
glottal excitation. The parameter simulates the fact that with some
voices, there is a part of the glottis that does not fully close, and
air passing through that unclosed portion, as the main glottal
closure increases, causes "breathy" noise at the glottis itself.
This is one of the features that distinguishes most female voices
("she had a really husky voice" indicates a more extreme case) from
most male voices.
You can find the source for the software Tube Resonance Model engine
as "tube.c" on the GNU site, under
"trillium/src/softwareTRM/tube.c".
The most relevant part of the code is:
    /* CREATE GLOTTAL PULSE (OR SINE TONE) */
    pulse = oscillator(f0);

    /* CREATE PULSED NOISE */
    pulsed_noise = lp_noise * pulse;

    /* CREATE NOISY GLOTTAL PULSE */
    /* breathinessFactor cross-fades the clean pulse into pulsed noise */
    pulse = ax * ((pulse * (1.0 - breathinessFactor)) +
                  (pulsed_noise * breathinessFactor));

    /* CROSS-MIX PURE NOISE WITH PULSED NOISE */
    if (modulation) {
        /* share of pulsed noise grows with ax, capped at 1.0 */
        crossmix = ax * crossmixFactor;
        crossmix = (crossmix < 1.0) ? crossmix : 1.0;
        signal = (pulsed_noise * crossmix) +
                 (lp_noise * (1.0 - crossmix));
    }
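The breathiness step can be tried in isolation. The following is a minimal sketch, not code from tube.c: the helper name noisy_glottal_pulse is hypothetical, the caller supplies the glottal pulse and low-pass noise samples that tube.c obtains from oscillator(f0) and its own noise source, and the 0..1 range for the breathiness factor is an assumption. Only the mixing arithmetic follows the excerpt.

```c
#include <assert.h>

/* Hypothetical helper mirroring the "noisy glottal pulse" arithmetic
 * of the tube.c excerpt (not the actual gnuspeech function).
 *   pulse              one sample of the glottal pulse
 *   lp_noise           one sample of low-pass filtered noise
 *   ax                 glottal ("larynx") amplitude
 *   breathiness_factor assumed 0..1: 0 = clean pulse, 1 = pulsed noise */
double noisy_glottal_pulse(double pulse, double lp_noise,
                           double ax, double breathiness_factor)
{
    assert(breathiness_factor >= 0.0 && breathiness_factor <= 1.0);

    /* Noise gated by the pulse, so it arrives in glottal-rate bursts */
    double pulsed_noise = lp_noise * pulse;

    /* Linear cross-fade from the clean pulse to the pulsed noise,
     * scaled by the glottal amplitude */
    return ax * ((pulse * (1.0 - breathiness_factor)) +
                 (pulsed_noise * breathiness_factor));
}
```

For example, with pulse = 0.5, lp_noise = -0.4 and ax = 1.0, a breathiness factor of 0.0 returns the clean pulse value 0.5, while 1.0 returns the pulsed-noise value -0.2.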
Note that when voiced fricatives (especially "z") are synthesised,
the frication noise is pulsed at the glottal frequency. This is a
different effect, and it is what the noise cross-mix is all about.
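The cross-mix can be sketched the same way, again as hypothetical standalone helpers rather than the tube.c code itself. They follow the clamp in the excerpt: the share of pulsed noise is ax * crossmixFactor, capped at 1.0, so the more strongly voiced the sound, the more the frication noise is pulsed at the glottal rate, as in a voiced "z". Only the modulation-on branch shown in the excerpt is modelled.

```c
#include <assert.h>

/* Hypothetical helper: weight given to pulsed noise, following the
 * excerpt's clamp  crossmix = min(ax * crossmixFactor, 1.0). */
double crossmix_weight(double ax, double crossmix_factor)
{
    assert(ax >= 0.0 && crossmix_factor >= 0.0);
    double w = ax * crossmix_factor;
    return (w < 1.0) ? w : 1.0;
}

/* One noise sample with modulation on: blend pulsed noise (gated by
 * the glottal pulse) with plain low-pass noise by that weight. */
double crossmixed_noise(double pulsed_noise, double lp_noise,
                        double ax, double crossmix_factor)
{
    double w = crossmix_weight(ax, crossmix_factor);
    return (pulsed_noise * w) + (lp_noise * (1.0 - w));
}
```

With an illustrative crossmixFactor of 2.0, any ax of 0.5 or more yields fully pulsed noise, while ax = 0 leaves the noise unpulsed.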
> It would be helpful to get a description of each parameter, what
> the abbreviation stands for, what it means, and how it is adjusted
> by the tools.
Agreed. It shall be done.
> r1 through r8 are fairly obvious, and I've figured out the
> meanings of fricVol, fricPos, etc., but the ones I'm not sure of
> include fricCF (is this the throat transmission Cutoff
> Frequency?), and fricBW (bandwidth??).
fricCF is "fricative center frequency" (and fricBW, correspondingly,
is "fricative bandwidth"). Fricatives are well imitated by noise with
a particular bandwidth and centre frequency (real fricatives are more
complex when you analyse them in detail, and they are also quite
variable). Thus a "sh" sound has a wide bandwidth and a low CF
(2600 Hz and 2500 Hz respectively), whilst an "s" sound has a narrow
bandwidth and a higher CF (500 Hz and 5500 Hz respectively). This kind
of information is in the posture data entries.
> In the Monet documentation appendix, the BW and AX are listed, but
> no explanation of what these mean that I can find. (I did find
> explanation of qss, and duration parameters in transitions).
BW is "bandwidth" -- I am not sure of the context you had in mind.
AX is an old term that should have been updated. It dates back to
the days of Lawrence's "Parametric Artificial Talker" (or "PAT"),
probably the first fully functional formant-based synthesiser (like
the later MITalk and DECtalk), which toured the US in 1953 with its
British inventor. AX simply stands for "Larynx Amplitude" -- the A
being "amplitude" and the X standing for "larynx" -- more properly
referred to as glottis or vocal folds these days. Don't ask me to
justify it ;-) Computer memory was at a premium back in those days,
and PAT did not, at that time, even use a computer, but the
techniques of character saving spilled over, I suppose. AX stands
for the amplitude of the glottal waveform, but even there some
qualification is needed: waveform "amplitude" can be specified as
peak amplitude, RMS amplitude, energy flow (power), and so on. We are
using peak amplitude. An energy measure might be better. This is like
the debate between VU readings and other measures of audio output for
audio equipment.
Please keep asking questions. I am very happy to supply answers, and
it will help me to see what is missing and steer me to producing a
document to fill in the gaps.
I'll check out what might be needed and make a start asap, but I am
currently trying to get a working version of the "Synthesizer"
application up (it is going quite well), and I also want to get a
"real-time Monet" working too.
Thanks for writing.
All good wishes.
david
----
David Hill
Imagination is more important than knowledge. (Albert Einstein)
Kill your television!