Re: [Accessibility] Can you help write a free version of HTK?
From: Eric S. Johansson
Subject: Re: [Accessibility] Can you help write a free version of HTK?
Date: Fri, 09 Jul 2010 17:06:57 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100608 Thunderbird/3.1
I've been reading the list for a while, and Bill's posting finally prompts me to
introduce myself.
I am a former developer, injured in 1994 after an 18-year software development
career. I have been a successful user of speech recognition for writing but
little else, because of fundamental mismatches between speech recognition and
computer interfaces. I was involved in programming-by-voice efforts through
roughly the early two thousands, when I organized a couple of workshops for
like-minded people to discuss programming-by-voice issues, as well as a training
session at Dragon Systems on the use of Joel Gould's NatPython system. I've been
a student of speech user interfaces and an observer of how the general dictation
vocabulary market collapsed from various factors, ranging from price erosion
through false competition to monopolistic acquisition of competitors.
In the mid-two-thousands, I was one of the founding members of the Open Source
Speech Recognition Initiative (OSSRI), a nonprofit organization, and again
observed its subsequent failure because of a lack of resources (i.e.,
developers). In OSSRI, we had some seriously high-level people involved, ranging
from computational linguists to speech recognition engine designers (Sphinx 4)
and cutting-edge users (ISS and Mars mission applications).
The consensus of this august body was that all of the speech recognition
toolkits out there (Julius, HTK, Sphinx) were designed to keep graduate students
busy, not for use in the real world. I did take a look at Simon; it looks like
the closest of the bunch, but I estimate it is somewhere between five and eight
years away from being useful (i.e., at parity with NaturallySpeaking). Based on
my experience in OSSRI, you could shorten that timeline if you had around
$10-15 million to spend on full-time developer effort, but you're not looking at
anything faster than three years. Speech recognition is an unbelievably hard
problem that doesn't work very well, but works well enough to keep people
trying. This is why there is little or no competition in the market (high cost,
low results).
It may seem like I'm trying to drag things down, but mostly I try to keep people
from making the same old mistakes I've lived through multiple times in the past.
What I believe is necessary to support disabled people is not going to be
pleasant for those driven by OSS ideology. For example:
Handicap accessibility trumps politics.
If a disabled person is kept from working because of ideology, then the ideology
is wrong. I use NaturallySpeaking because for a fair number of tasks it works,
and works far better than typing. I'm not even going to try an open-source
equivalent, because it's still too much work that burns up my hands, which I
need for other tasks so I can feed myself (cooking and making money). If someone
were to tell me they had a fully featured programming-by-voice package for a
thousand dollars, complete with a restrictive license, I would use it without a
second thought, except about how to get the money. I wouldn't lose a second of
sleep over the licensing as long as it let me make money to live.
From my perspective, OSS ideology blinds developers and organizations to the
real problem: keeping disabled developers and others operating computers at a
level equivalent to TAB (temporarily able-bodied) usability. This tells me that any OSS
accessibility interface should work from the application in towards the
accessibility tool. For example, any tools used to make applications accessible
should be built first using existing core technology such as NaturallySpeaking.
Developing recognition engines should be dead last because they have the
smallest impact on employability or usability.
We should be putting more effort into building appropriate speech-level user
interfaces instead of replicating the same cruel mistakes and useless hacks of
the past 15 to 20 years. Instead of trying to get people to speak the keyboard,
or building interfaces which have been proven to destroy people's voices, we
should be spending our time looking at other solutions: enabling applications
without any application modifications, and solving the command discovery
problem. Both of these can reduce vocal and cognitive load, which is a good
thing. I've seen too many people try to use speech recognition in inappropriate
ways (i.e., programming by voice using macros) and end up doubly disabled, in
both the hands and the throat. Talk about well and truly screwed.
I've worked out a few models of how to produce better speech interfaces. Given
that my hands don't work well and I can't write code anymore, I have not been
able to implement prototypes. I'll spare you the descriptions and only say that
I have talked about them with people involved in the speech recognition world
and gotten a double thumbs up on the ideas.
The current accessibility toolkits are doomed to fail because there is a roughly
15-year history of that model failing. They count on application developers to
do things they have no financial interest in doing. In the speech recognition
world, the number of applications explicitly integrated with NaturallySpeaking
is virtually unchanged since NaturallySpeaking version 4. The number of
incidentally integrated applications (through the use of a "standard edit
control") has dropped because more people are using multiplatform toolkits that
don't follow standard practices or use standard edit controls. There is exactly
one OSS application which was enabled for speech recognition, but it has fallen
into disrepair because, I've been told, "it would encourage the use of
proprietary packages". Nice way to treat the disabled.
I would like to see accessibility start focusing on the edges, the tools where
people work. I use Buzzword, a Flash-based word processor, because it works
better and faster, with better recognition, than any open-source word processor.
I'm even considering going back to Microsoft Word because it is specifically
supported and enabled. Why not make something like OpenOffice or AbiWord work
with speech recognition? That lets people make an open-source choice at a level
that matters to them. All the other crap can come later, once they understand
the benefits of open-source applications.
I also suggest looking to history. Look at all the things that have failed
repeatedly. I can give you a very long list that's very discouraging, but the
nice thing about the list is that it forces you to think differently. Don't try
to impose a GUI interface on speech recognition; build a user interface which
has discoverability. Don't try to force a disabled user to work on a single
machine; embrace the fact that your applications, data, etc. run on a different
machine. Remember that with speech recognition, you don't just need to enter
data, you also need to edit it.
http://xvoice.sourceforge.net/
Xvoice was in fact used by programmers with typing impairments up
until the day IBM stopped selling licenses to ViaVoice for Linux.
When IBM did that, those programmers lost the ability to program by
voice natively in Linux. IBM derailed programming by voice in Linux
for a decade, and we still have not recovered. In case you didn't
know, Microsoft owns HTK, not Cambridge University. So, every Linux
project that depends on HTK can be killed at any time by Microsoft.
That's not exactly what happened. In the first place, programming by voice has
never really been practical. Creating code by voice became more practical with
the voice coder project: not wonderful, but better than straight dictation,
except that it ruins your ability to dictate comments. IBM had nothing to do
with ruining your ability to program by voice. It was that we couldn't get the
attention of anyone in the open-source community to help us with the problem.
We have a solution, and it does some really nice things, but I think the problem
needs to be solved by going in a different direction.
As a person who actually tried to use the IBM product: it was a stinking pile of
crap with a boatload of errors that IBM had no interest in fixing. When I posted
a list of failures, that message was censored from the list. I sent it to a
bunch of people who asked questions after seeing the list, and the second time
it got through. As far as I'm concerned, it wasn't useful; it was a cruel joke
that ate a hundred hours of my life and my hands, which was a loss I didn't need
at the time.
As for the whole HTK thing, I really don't care. I use NaturallySpeaking; if
Nuance stops selling it, I can keep using my license. If anything gets in the
way, I go to court to get a remedy. I suspect I would not be the only disabled
person working with the courts, either. If you take the same approach to HTK
(i.e., mirror it in case of legal disaster), you can move on with your life and
deal with the problem when it comes up. I believe the courts look favorably on
innovative solutions that solve disability problems without impairing normal
commercial activity.
Also, I know people don't want to hear this, but programming by voice is
independent of the speech recognition engine. If you build on top of the
Dragonfly SDK, you don't care whether you are using Microsoft or Nuance for your
speech recognition engine. If you want to really support disabled people, help
build applications using Dragonfly, and once you solve the problem for disabled
users, then go build a speech recognition engine. Remember, handicap
accessibility trumps politics. If we can't work, it's bloody useless.
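To make that concrete, here is a rough sketch of what a command grammar looks
like on top of Dragonfly. The command names are made up and this is only an
illustration, not a finished tool, but notice that nothing in it ever names the
recognition engine: the same file runs against NaturallySpeaking through NatLink
or against the Microsoft engine through SAPI 5.

    # Illustrative Dragonfly grammar: the rule maps spoken phrases to
    # keystrokes and text; the engine underneath (NaturallySpeaking via
    # NatLink, or the Microsoft SAPI 5 engine) is never mentioned here.
    from dragonfly import Grammar, MappingRule, Key, Text

    class EditingCommands(MappingRule):
        mapping = {
            "save file":     Key("c-s"),        # send Ctrl+S to the focused app
            "new paragraph": Key("enter:2"),    # press Enter twice
            "sign off":      Text("--- eric"),  # type literal text
        }

    grammar = Grammar("engine independent editing")
    grammar.add_rule(EditingCommands())
    grammar.load()   # whichever engine is loaded handles the recognition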
Because of the HTK license, Simon is not going to be fully integrated
into Vinux, or Ubuntu which is the upstream distro we test
technologies for. Simon built on HTK can never be included in Debian or
Fedora. In other words, Simon is dead, because of HTK. Typing
impaired programmers around the world will not benefit from all the
hard work of either the Julius or Simon project. If you can't tell,
this really pisses me off.
Cool. But you're getting pissed off for the wrong reason.
Fortunately, we can freely read the HTK source code, and can learn how
it works. We can then go rewrite it, and hopefully do a better job.
I propose we start an open-source effort to do exactly that, in order
to enable Simon and other accessibility software to be freely used to
help typing impaired people. There is already a similar effort under
way, with a proper license:
http://sourceforge.net/projects/ghmm/
Let me reinforce this. Typing-impaired people (a really bad nomenclature, since
I'm also driving impaired, door-opening impaired, preparing-food impaired,
hugging-other-people impaired...) don't care about licenses. They care about
being able to participate online, work, write, etc. Full native-language
dictation is the most important feature. If your speech recognition package
can't be used to create a message like this e-mail, then you have failed.
Completely and totally failed.
How about this: let's start with something simple, like fixing Emacs vr-mode so
we can use NaturallySpeaking with Emacs on multiple platforms. If you can't get
a useful tool for disabled programmers working, then something is seriously
wrong and I don't believe you have the interests of disabled programmers in
mind. Harsh words, but right now I can't use Emacs, and I go to a proprietary
editor because that's the only choice I have if I'm going to work.
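To give a flavor of what a starting point could look like (this is not vr-mode
itself, just a hypothetical sketch assuming Dragonfly with NaturallySpeaking on
the Windows side and an Emacs that has run server-start, with emacsclient
reachable), something this small would already let you speak text into a live
Emacs buffer:

    # Hypothetical sketch, not vr-mode: push dictated text into a running
    # Emacs server via emacsclient from a Dragonfly/NaturallySpeaking rule.
    import subprocess
    from dragonfly import Grammar, CompoundRule, Dictation

    class EmacsDictationRule(CompoundRule):
        spec = "emacs say <text>"        # e.g. "emacs say hello world"
        extras = [Dictation("text")]

        def _process_recognition(self, node, extras):
            words = str(extras["text"])
            # Insert the recognized words at point in the running Emacs.
            subprocess.call([
                "emacsclient", "--eval",
                '(insert "%s")' % words.replace('"', '\\"'),
            ])

    grammar = Grammar("emacs dictation bridge")
    grammar.add_rule(EmacsDictationRule())
    grammar.load()

It is nowhere near real editing support, but it shows how little plumbing stands
between NaturallySpeaking and Emacs once the grammar layer is engine-independent.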
Maybe this other example might help. When the Free Software Foundation first
started up, Emacs ran on a bunch of proprietary platforms. It showed people the
benefits of open source. Then came a whole bunch of other components in the GNU
toolchain. Eventually, thanks to Linux, a TAB was able to use a completely free
system or a broadly functional, not-so-free system. Right now, we are back at
the beginning. We don't even have the basic Emacs equivalent among handicap
accessibility applications. Let's start with Emacs again and gradually add
speech recognition enhancements throughout the entire system.
--- eric