Re: [Accessibility] Can you help write a free version of HTK?
From: Eric S. Johansson
Subject: Re: [Accessibility] Can you help write a free version of HTK?
Date: Fri, 09 Jul 2010 17:06:57 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100608 Thunderbird/3.1
I've been reading the list for a while, and Bill's posting finally prompts me to
introduce myself.
I am a former developer, injured in 1994 after an 18-year software development
career. I have been a successful user of speech recognition for writing but
little else, because of fundamental mismatches between speech recognition and
computer interfaces. I was involved in programming-by-voice efforts through
roughly the early two thousands, when I organized a couple of workshops for
like-minded people to discuss programming-by-voice issues, as well as a training
session at Dragon Systems on the use of Joel Gould's NatPython system. I've been
a student of speech user interfaces and an observer of how the general dictation
vocabulary market collapsed from various factors, ranging from price erosion
through false competition to monopolistic acquisition of competitors.
In the mid-two-thousands, I was one of the founding members of the Open Source
Speech Recognition Initiative (OSSRI), a nonprofit organization, and again
observed its subsequent failure because of a lack of resources (i.e.,
developers). In OSSRI, we had some seriously high-level people involved, ranging
from computational linguists to speech recognition engine designers (Sphinx 4)
and cutting-edge users (ISS and Mars mission applications).
The consensus of this august body was that all of the speech recognition
toolkits out there (Julius, HTK, Sphinx) were designed to keep graduate students
busy, not for use in the real world. I did take a look at Simon; it looks like
the closest of the bunch, but I estimate it is somewhere between five and eight
years away from being useful (i.e., at parity with NaturallySpeaking). Based on
my experience in OSSRI, you could shorten that timeline if you had around
$10-15 million to spend on full-time developer effort, but you're not looking at
anything faster than three years. Speech recognition is an unbelievably hard
problem that doesn't work very well, but works well enough to keep people
trying. This is why there is little or no competition in the market (high cost,
low results).
It may seem like I'm trying to drag things down, but mostly I try to keep people
from making the same old mistakes I've lived through multiple times in the past.
What I believe is necessary to support disabled people is not going to be
pleasant for those driven by OSS ideology. For example:
Handicap accessibility trumps politics.
If a disabled person is kept from working because of ideology, then the ideology
is wrong. I use NaturallySpeaking because for a fair number of tasks it works,
and works far better than typing. I'm not even going to try an open-source
equivalent, because it's still too much work that burns up my hands, which I
need for other tasks so I can feed myself (cooking and making money). If someone
were to tell me they had a fully featured programming-by-voice package for a
thousand dollars, complete with a restrictive license, I would use it without a
second thought, except about how to get the money. I wouldn't lose a second of
sleep over the licensing as long as it let me make money to live.
From my perspective, OSS ideology blinds developers and organizations to the
real problem: keeping disabled developers and others operating computers at a
level equivalent to TAB (temporarily able-bodied) usability. This tells me that any OSS
accessibility interface should work from the application in towards the
accessibility tool. For example, any tools used to make applications accessible
should be built first using existing core technology such as NaturallySpeaking.
Developing recognition engines should be dead last because they have the
smallest impact on employability or usability.
We should be putting more effort into building appropriate speech-level user
interfaces instead of replicating the same cruel mistakes and useless hacks of
the past 15 to 20 years. Instead of trying to get people to speak the keyboard,
or building interfaces which have been proven to destroy people's voices, we
should be spending our time looking at other solutions: enabling applications
without any application modifications, and solving the command discovery
problem. Both of these can reduce vocal and cognitive load, which is a good
thing. I've seen too many people try to use speech recognition in inappropriate
ways (i.e., programming by voice using macros) and end up doubly disabled, in
both the hands and the throat. Talk about well and truly screwed.
I've worked out a few models of how to produce better speech interfaces. Given
that my hands don't work well and I can't write code anymore, I have not been
able to implement prototypes. I'll spare you the descriptions and only say that
I have talked about them with people involved in the speech recognition world
and gotten a double thumbs up on the ideas.
The current accessibility toolkits are doomed to fail because there is a roughly
15-year history of that model failing. They count on application developers to
do things they have no financial interest in doing. In the speech recognition
world, the number of applications explicitly integrated with NaturallySpeaking
is virtually unchanged since NaturallySpeaking version 4. The number of
incidentally integrated applications (through the use of a "standard edit
control") has dropped because more people are using multiplatform toolkits that
don't follow standard practices or use standard edit controls. There is exactly
one OSS application which was enabled for speech recognition, but it has fallen
into disrepair because, I've been told, "it would encourage the use of
proprietary packages". Nice way to treat the disabled.
I would like to see accessibility start focusing on the edges, the tools where
people work. I use Buzzword, a Flash-based word processor, because it works
better and faster, with better recognition, than any open-source word processor.
I'm even considering going back to Microsoft Word because it is specifically
supported and enabled. Why not make something like OpenOffice or AbiWord work
with speech recognition? That lets people make an open-source choice at a level
that matters to them. All the other crap can come later, once they understand
the benefits of open-source applications.
I also suggest looking to history. Look at all the things that have failed
repeatedly. I can give you a very long list that's very discouraging, but the
nice thing about the list is that it forces you to think differently. Don't try
to impose a GUI interface on speech recognition; build a user interface which
has discoverability. Don't try to force a disabled user to work on a single
machine; embrace the fact that your applications, data, etc. run on a different
machine. Remember that with speech recognition, you don't just need to enter
data, you also need to edit it.
http://xvoice.sourceforge.net/
Xvoice was in fact used by programmers with typing impairments up
until the day IBM stopped selling licenses to ViaVoice for Linux.
When IBM did that, those programmers lost the ability to program by
voice natively in Linux. IBM derailed programming by voice in Linux
for a decade, and we still have not recovered. In case you didn't
know, Microsoft owns HTK, not Cambridge University. So, every Linux
project that depends on HTK can be killed at any time by Microsoft.
That's not exactly what happened. In the first place, programming by voice has
never really been practical. Creating code by voice became more practical with
the voice coder project: not wonderful, but better than straight dictation,
except that it ruins your ability to dictate comments. IBM had nothing to do
with ruining your ability to program by voice. It was that we couldn't get the
attention of anyone in the open-source community to help us with the problem.
We have a solution, and it does some really nice things, but I think the problem
needs to be solved by going in a different direction.
As a person who actually tried to use the IBM product: it was a stinking pile of
crap with a boatload of errors that IBM had no interest in fixing. When I posted
a list of failures, that message was censored from the list. I sent it to a
bunch of people who asked questions after seeing the list, and the second time
it got through. As far as I'm concerned, it wasn't useful; it was a cruel joke
that ate a hundred hours of my life and my hands, which was a loss I didn't need
at the time.
As for the whole HTK thing, I really don't care. I use NaturallySpeaking; if
Nuance stops selling it, I can keep using my license. If anything gets in the
way, I go to court to get a remedy. I suspect I would not be the only disabled
person working with the courts, either. If you take the same approach to HTK
(i.e., mirror it in case of legal disaster), you can move on with your life and
deal with the problem when it comes up. I believe the courts look favorably on
innovative solutions that solve disability problems without impairing normal
commercial activity.
Also, I know people don't want to hear this, but programming by voice is
independent of the speech recognition engine. If you build on top of the
Dragonfly SDK, you don't care whether you are using Microsoft or Nuance for your
speech recognition engine. If you want to really support disabled people, help
build applications using Dragonfly, and once you solve the problem for disabled
users, then go build a speech recognition engine. Remember, handicap
accessibility trumps politics. If we can't work, it's bloody useless.
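To make that concrete, here is a rough sketch of what a command grammar looks
like on top of Dragonfly. The command names are made up and this is only an
illustration, not a finished tool, but notice that nothing in it ever names the
recognition engine: the same file runs against NaturallySpeaking through NatLink
or against the Microsoft engine through SAPI 5.

    # Illustrative Dragonfly grammar: the rule maps spoken phrases to
    # keystrokes and text; the engine underneath (NaturallySpeaking via
    # NatLink, or the Microsoft SAPI 5 engine) is never mentioned here.
    from dragonfly import Grammar, MappingRule, Key, Text

    class EditingCommands(MappingRule):
        mapping = {
            "save file":     Key("c-s"),        # send Ctrl+S to the focused app
            "new paragraph": Key("enter:2"),    # press Enter twice
            "sign off":      Text("--- eric"),  # type literal text
        }

    grammar = Grammar("engine independent editing")
    grammar.add_rule(EditingCommands())
    grammar.load()   # whichever engine is loaded handles the recognition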
Because of the HTK license, Simon is not going to be fully integrated
into Vinux, or Ubuntu which is the upstream distro we test
technologies for. Simon built on HTK can never be included in Debian or
Fedora. In other words, Simon is dead, because of HTK. Typing
impaired programmers around the world will not benefit from all the
hard work of either the Julius or Simon project. If you can't tell,
this really pisses me off.
Cool. But you're getting pissed off for the wrong reason.
Fortunately, we can freely read the HTK source code, and can learn how
it works. We can then go rewrite it, and hopefully do a better job.
I propose we start an open-source effort to do exactly that, in order
to enable Simon and other accessibility software to be freely used to
help typing impaired people. There is already a similar effort under
way, with a proper license:
http://sourceforge.net/projects/ghmm/
Let me reinforce this. Typing-impaired people (a really bad nomenclature, since
I'm also driving impaired, door-opening impaired, preparing-food impaired,
hugging-other-people impaired...) don't care about licenses. They care about
being able to participate online, work, write, etc. Full native-language
dictation is the most important feature. If your speech recognition package
can't be used to create a message like this e-mail, then you have failed.
Completely and totally failed.
How about this: let's start with something simple, like fixing Emacs vr-mode so
we can use NaturallySpeaking with Emacs on multiple platforms. If you can't get
a useful tool for disabled programmers working, then something is seriously
wrong and I don't believe you have the interests of disabled programmers in
mind. Harsh words, but right now I can't use Emacs, and I go to a proprietary
editor because that's the only choice I have if I'm going to work.
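To give a flavor of what a starting point could look like (this is not vr-mode
itself, just a hypothetical sketch assuming Dragonfly with NaturallySpeaking on
the Windows side and an Emacs that has run server-start, with emacsclient
reachable), something this small would already let you speak text into a live
Emacs buffer:

    # Hypothetical sketch, not vr-mode: push dictated text into a running
    # Emacs server via emacsclient from a Dragonfly/NaturallySpeaking rule.
    import subprocess
    from dragonfly import Grammar, CompoundRule, Dictation

    class EmacsDictationRule(CompoundRule):
        spec = "emacs say <text>"        # e.g. "emacs say hello world"
        extras = [Dictation("text")]

        def _process_recognition(self, node, extras):
            words = str(extras["text"])
            # Insert the recognized words at point in the running Emacs.
            subprocess.call([
                "emacsclient", "--eval",
                '(insert "%s")' % words.replace('"', '\\"'),
            ])

    grammar = Grammar("emacs dictation bridge")
    grammar.add_rule(EmacsDictationRule())
    grammar.load()

It is nowhere near real editing support, but it shows how little plumbing stands
between NaturallySpeaking and Emacs once the grammar layer is engine-independent.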
Maybe this other example might help. When the Free Software Foundation first
started up, Emacs ran on a bunch of proprietary platforms. It showed people the
benefits of open source. Then came a whole bunch of other components in the GNU
toolchain. Eventually, thanks to Linux, a TAB was able to use a completely free
system or a broadly functional, not-so-free system. Right now, we are back at
the beginning. We don't even have the basic Emacs equivalent among handicap
accessibility applications. Let's start with Emacs again and gradually add
speech recognition enhancements throughout the entire system.
--- eric