master's thesis
Real time speech recognition based on FPGA

Matija Labak (2016)
Sveučilište Josipa Jurja Strossmayera u Osijeku
Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek
Zavod za računalno inženjerstvo i automatiku
Katedra za računalno inženjerstvo
Metadata
Title: Prepoznavanje govora u stvarnom vremenu pomoću FPGA
Author: Matija Labak
Mentor(s): Željko Hocenski (thesis advisor)
Tomislav Matić (thesis advisor)
Abstract
This thesis gives an overview of the history of speech recognition systems. The basic algorithms used by speech recognition systems are described. It explains what speech signal features are and how they are produced. Speech recognition systems use algorithms for selecting language units. Dynamic time warping is an algorithm in which one speech signal is compared with another by dynamically warping the signal features along the time axis. Artificial neural networks are a tool applied to pattern search problems, and therefore also to speech recognition. Hidden Markov models successfully model various language units and are among the most commonly used algorithms in speech recognition systems. Deep neural networks are used in systems with demanding requirements. The implementation uses the Altium NanoBoard 3000 system and was developed in Altium Designer. An embedded computer system based on the TSK3000A processor was developed; it includes an audio unit used to capture sound from the board's audio input. The implemented system is based on linear predictive analysis. Linear predictive analysis produces LPC coefficients, which are ultimately compared with averaged features in a system for recognizing Croatian phones. The resulting system for recognizing Croatian phones was tested. The accuracy of the system over the entire experiment is 44%. For the recordings from which the reference averaged features were built, the accuracy is 93%. Measurement of the feature computation time confirmed that the system in its current state cannot meet the timing requirement for real-time computation.
Keywords: automatic speech recognition; Altium NanoBoard; FPGA; voice signal features; linear predictive analysis; Croatian phone
Parallel title (English): Real time speech recognition based on FPGA
Committee Members: Željko Hocenski (committee chairperson)
Ivan Aleksi (committee member)
Tomislav Matić (committee member)
Granter: Sveučilište Josipa Jurja Strossmayera u Osijeku
Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek
Lower level organizational units: Zavod za računalno inženjerstvo i automatiku
Katedra za računalno inženjerstvo
Place: Osijek
State: Croatia
Scientific field, discipline, subdiscipline: TECHNICAL SCIENCES
Electrical Engineering
Telecommunications and Informatics
Study programme type: university
Study level: graduate
Study programme: Graduate University Study Programme in Electrical Engineering; specializations in: Communications and Informatics, Power Engineering
Study specialization: Communications and Informatics
Academic title abbreviation: mag.ing.el.
Genre: master's thesis
Language: Croatian
Defense date: 2016-07-19
Parallel abstract (English)
This thesis gives an overview of the historical development of speech recognition systems. The basic algorithms used in speech recognition systems are described. Speech signal features are explained, along with the process by which they are created. Speech recognition systems use algorithms for the selection of language units. Dynamic time warping is an algorithm in which one speech signal is compared to another by dynamically warping the signal features along the time axis. Artificial neural networks are a tool for solving pattern comparison problems, which means they are also used for speech recognition. Hidden Markov models are successfully used to model different kinds of language units, so they are among the most frequently used algorithms in speech recognition systems. Deep neural networks are used in systems with large requirements. The implementation is designed for the Altium NanoBoard 3000 system in the Altium Designer software. An embedded computer system based on the TSK3000A processor has been developed. The system also uses an audio unit that records sound from the audio input located on the board. The implemented system is based on linear predictive analysis. The results of linear predictive analysis are LPC coefficients, which are compared with mean features in the system for recognition of Croatian phones. The system for recognition of Croatian phones has been tested. The accuracy of the system is 44% over all recordings of individual phones. For the recordings that were used to generate the mean features, the accuracy is 93%. Measurement of the feature extraction time confirmed that the current system does not meet the requirement for real-time calculation.
Parallel keywords (Croatian): automatsko prepoznavanje govora; Altium NanoBoard; FPGA; značajke signala govora; linearno-prediktivna analiza; glasovi hrvatskog jezika
Resource type: text
Access condition: Open access
Terms of use: http://rightsstatements.org/vocab/InC/1.0/
URN:NBN: https://urn.nsk.hr/urn:nbn:hr:200:200954
Committer: Anka Ovničević
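
Illustrative sketch (not part of the repository record). The abstract says the implemented system computes LPC coefficients by linear predictive analysis and compares them with averaged reference features for each Croatian phone. The Python sketch below shows one common way such a pipeline can be built: autocorrelation followed by the Levinson-Durbin recursion, then nearest-reference classification. The abstract does not state the LPC order, the framing or windowing, or the distance measure, so the Euclidean distance, the function names (`lpc`, `mean_features`, `classify`), and the use of Python rather than code for the TSK3000A processor are all assumptions made for illustration only.

```python
import math


def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Estimate predictor coefficients a[1..order] with Levinson-Durbin.

    The coefficients approximate s[n] by sum_k a[k] * s[n - k].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this step
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # remaining prediction error energy
    return a[1:]


def mean_features(examples):
    """Average several coefficient vectors into one reference vector."""
    count = len(examples)
    return [sum(column) / count for column in zip(*examples)]


def classify(features, references):
    """Return the label of the closest reference vector.

    Euclidean distance is an assumption; the abstract does not name
    the distance measure used in the thesis.
    """
    return min(references,
               key=lambda label: math.dist(features, references[label]))


if __name__ == "__main__":
    # Two synthetic "phones" (hypothetical data, not from the thesis).
    phone_a = [0.5 ** n for n in range(30)]
    phone_b = [(-0.5) ** n for n in range(30)]
    references = {
        "a": mean_features([lpc(phone_a, 2)]),
        "b": mean_features([lpc(phone_b, 2)]),
    }
    unknown = [0.6 ** n for n in range(30)]
    print(classify(lpc(unknown, 2), references))
```

In this sketch each reference is the mean of the LPC vectors of its training recordings. That fits the gap the abstract reports between accuracy on the recordings used to build the references (93%) and accuracy over the whole experiment (44%), because a nearest-mean comparison tends to favor the data the means were computed from.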