On 01/04/2016 00:03, David Hill wrote:
Dear Alex,
I have interpolated my reply in your email, in what follows.
On Mar 30, 2016, at 16:30, Farlie A wrote:
On 30/03/2016 21:14, David Hill wrote:
The platform-independent gnuspeechsa
does not yet incorporate the Monet facility, though
I believe Marcelo is working on that aspect,
judging by some of the image material he has
previewed to me.
Thanks.
In order to get different accents, intonation
and rhythm, as required for your examples, you may
have to get involved in significant manual work,
modifying the databases. For intonation, you'd
have to create the required intonation contour
manually.
Hmm, and as I am not a speech professional, this may be
beyond my level of expertise, other than marking notes in
the script as to intonation intent. Your note about
adding tonic feet below is something I was missing.
Something else that will need to be worked out is how to
translate between Gnuspeech's phoneme names and eSpeak's
(based on the Kirshenbaum encoding; see my other
recent e-mail).
I think that would be a bad idea. The gnuspeech phonetic
representation is well described in the Monet manual. You
shouldn't need to arbitrarily change the set of phonetic
symbols. That is likely to cause problems and seems
pointless. The input is punctuated, plain English text. If
you want to modify the phonetic script produced, learn the
symbols. They are very intuitive and are documented in the
manual.
My reasons for the comment were to do with the fact that eSpeak
(and eSpeak NG) have some interface code which allows their use with
Microsoft's SAPI speech API in its Windows port, making the voice
potentially callable from any application which supports SAPI. A
better idea might be to encourage the eSpeak NG developers to also
contribute to gnuspeech's development, effectively adding a similar
SAPI->Gnuspeech bridge. :)
In order to make the process easier and less
trouble, the user and application dictionaries
should be added and made usable. Then particular
dictionaries (a lot smaller than the main
dictionary) could be set up for particular
dialogue and accent requirements.
Hmm... Would some kind of "Unintophonic" (universal
intonation phonetic encoding) be worth considering, to
represent both sounds and intonation intent? An older
speech synth program I found called Superior Speech!
(running under RISC OS 3, years ago) allowed for at least
8 different (albeit fixed) intonation pitches on individual
phonemes, as well as some more advanced features for
"singing" phonemes at specific notes (something which I
understand is an area of current research by others). There
are some possible encodings like X-SAMPA which incorporate
intonation advice. MBROLA (which is non-free) stores
intonation data in a format which deals at a much lower
level, so it is possible to do much more finely tuned
intonation contouring, if I understand what that means
correctly. (Thought: if there were a way to add MBROLA's
PHO-style data to GNUspeech/eSpeak input files....
hmmm...)
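For illustration of what that lower-level data looks like: MBROLA's .pho input is line-oriented, giving a phoneme name, a duration in milliseconds, and then optional (position-percent, pitch-Hz) pairs that shape the pitch contour across the phoneme. A minimal parser sketch follows; the sample utterance data is invented for illustration, not taken from a real MBROLA voice.

```python
# Sketch of a parser for MBROLA-style .pho lines: each line is a phoneme
# name, a duration in milliseconds, then optional (percent, pitch-Hz)
# pairs describing the pitch contour across that phoneme.
# The sample input below is made up for illustration.

def parse_pho(text):
    """Return a list of (phoneme, duration_ms, [(percent, hz), ...]) tuples."""
    segments = []
    for line in text.splitlines():
        line = line.split(';')[0].strip()  # ';' starts a comment in .pho files
        if not line:
            continue
        fields = line.split()
        phoneme, duration = fields[0], int(fields[1])
        numbers = [float(f) for f in fields[2:]]
        contour = list(zip(numbers[0::2], numbers[1::2]))  # (percent, Hz) pairs
        segments.append((phoneme, duration, contour))
    return segments

sample = """
_ 50
h 80
@ 120 0 118 80 132   ; pitch rises across the vowel
l 90
@U 180 0 130 100 110 ; falling contour toward the end
_ 50
"""
for seg in parse_pho(sample):
    print(seg)
```

The interesting point for intonation contouring is the pitch-point list: the contour is specified per phoneme rather than per utterance, which is what allows the fine-grained control mentioned above.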
You really need to read the Monet manual. I have just
updated my university web site, specifically the page
accessed through the left-hand menu selection "Gnuspeech
material", to include both the TRAcT and Monet manuals,
together with precompiled versions of both TRAcT and Monet.
Monet needs Mac OS X 10.10.x or better to run; TRAcT will
run on OS X 10.6 or higher. On that same page, in the list
of papers relevant to Gnuspeech, there's also a new
historical view of the work on intonation and rhythm that
may be of interest (the first paper in the list), and there's
access to the early data on which the rhythm model was based
(the last item, which is a report that was presented to an
Acoustical Society of America conference in 1977).
There are a whole bunch more papers, less specific to
Gnuspeech, but undoubtedly some of interest, under the
left-hand menu selection "Published papers", which takes
you to a new main page.
Duly bookmarked your page.
The cut-in and phrase echoing would have to be
done by synthesising the cut-in phrase and then
mixing, or possibly in the future by having two
copies of Monet running.
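The "synthesise, then mix" approach could be sketched as overlaying one mono sample buffer onto another at a chosen offset. The function name and the sample data here are hypothetical; real input would come from the synthesiser's output, e.g. WAV files read via Python's stdlib `wave` module.

```python
# Sketch: mix a synthesised "cut-in" phrase into a main speech track by
# adding samples at a given offset, clipping to the 16-bit sample range.
# All names and data are illustrative.

def mix(main, cut_in, offset, lo=-32768, hi=32767):
    """Overlay `cut_in` onto `main` starting at sample index `offset`."""
    out = list(main)
    # Extend the output if the cut-in runs past the end of the main track.
    end = offset + len(cut_in)
    if end > len(out):
        out.extend([0] * (end - len(out)))
    for i, s in enumerate(cut_in):
        mixed = out[offset + i] + s
        out[offset + i] = max(lo, min(hi, mixed))  # clip to 16-bit range
    return out

main_track = [1000, 1000, 1000, 1000]
interruption = [500, 500, 500]
# Overlaps the last two samples and extends the track by one:
print(mix(main_track, interruption, 2))  # → [1000, 1000, 1500, 1500, 500]
```

Running two synthesiser instances would remove the offset bookkeeping, but a post-hoc mix like this works with a single instance today.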
That's what I thought the current situation was likely to
need. However, for audio-drama this is less of an issue,
given that the generated speech audio will probably be
edited together in a non-linear way anyway. Marking the
cut-ins then becomes a partitioning(?) issue during the
lexical parsing(?) and timecoding in any automated
scripts that would be generated to reassemble the audio output.
Muse (http://www.muse-sequencer.org/)
is certainly scriptable, and depending on programmer
interest, it looks possible that a future gnuspeech might
be able to pipe output directly into the tool via various
audio interfaces like LV2, JACK etc. Granted that
'scripted' semi-automated editing for cues is outside your
area of focus on the speech generation portion.
Having access to the source code and databases,
you could in principle create any facilities you
needed to facilitate the kinds of dramatic
dialogue for which you are looking. Do you have a
programmer with whom you could work? It would
amount to creating a "dramatic dialogue"
application, based on gnuspeech.
I don't yet, but was considering asking around on
projects like Wikipedia/Wikisource/Wikiversity, given that
certain aspects of it are quite broad.
You could put out a request on the gnuspeech list
(address in the "Copy" field of this email). People reading
the list are quite likely to be interested.
I will consider that.
On a different but related topic... from some of your
papers, you built an approximate tract model. This is
presumably flexible enough to cope with most human
characteristics, including voices that "sound like that guy
from the trailer, that's been smoking since he was old
enough to buy them" (another 'staged' voice type I will
add to my earlier examples of vocal types).
If you read the Monet manual, you'll find that there are
various controls to change various aspects of the voice --
and yes, they include changing the settings for the tube
resonance model (TRM). You can investigate the quality
directly by using the TRAcT application to play with the
TRM, but it isn't dynamic. Monet is dynamic and can speak
and has an equivalent bunch of controls.
I'll have a read over the manual.
Humanoid-like aliens are also a possibility. The
so-termed Nordic types would probably have a voice closest
to human (from a tract-model perspective, based on
internet accounts of alleged encounters). "Stage" aliens,
such as in old radio/TV, are from what I recall mostly
accented human language, albeit with much-modified grammar
or intonational rhythm. On the other hand, you may have
aliens that have "clicks" in their language (not sure
what these are called in speech/IPA terms) in addition
to tonal and noise-based phonemes.
Clicks are not yet in the repertoire! You'd have to
generate them somewhere else, for now, and edit them in. :-(
Noted. I'm not aware of many human languages that have them, so
editing them in manually seems fair. (In editing together an audio
drama, I'd eventually be adding in other non-human SFX anyway.)
As you say, I need to make some experiments when I am able to get
hold of the relevant platform's hardware.
Alex Farlie.