This page surveys freely downloadable software programs for conducting acoustic analyses of speech recordings.
Highlighted resources:
Audacity (in the first section) and Praat and WaveSurfer (both in the second section) are the most popular.
For online resources where you can learn more about acoustics, see the Acoustic Phonetics section on the Web Portal page.
Programs for handling recordings
Audacity
-
Overview: Audacity is a free, open-source, cross-platform program for recording and editing sounds.
For more information, see the Audacity homepage and the Wikipedia page.
-
Download:
Audacity can be downloaded from the project's SourceForge page.
-
Documentation:
See Audacity's online Manual and Wiki.
You can also take a workshop on Audacity from Indiana University's IT Training department (or download the workshop's files and teach yourself).
-
Support:
See the help page on the Audacity site for the various available resources, including a forum and mailing lists.
Software with a graphical user interface
For information on the commercial program "Adobe Audition" (formerly Cool Edit Pro), see the download page for Creative Suite 6 Production Premium on IUware (Windows, Mac) and the workshops offered through IT Training.
Note that a comparison of the different software programs listed below is available from Peter Roach (University of Reading).
Praat
-
Overview:
Praat (the Dutch word for "talk") is the most widely used phonetics software program today, both for research and for pronunciation teaching.
See also the Wikipedia page.
-
Creators:
Paul Boersma and David Weenink at the University of Amsterdam
-
Download:
Praat is available for Windows, Mac, and Linux, among others.
The source code is also available.
-
Documentation:
A copy of the official Praat manual comes with an installation of the program and is also available online.
See also the Praat FAQ.
For information about the various algorithms implemented in Praat, see the list of publications on Praat.
Unofficial tutorials have been created by Will Styler (University of Colorado) and Sidney Wood (University of Lund).
-
Support:
Co-creator Paul Boersma moderates the Praat Users Group, a Yahoo! group where users can post questions.
-
Scripts:
The following websites serve as central repositories for the numerous Praat scripts available all around the web:
-
Plug-ins:
Plug-ins are special scripts that are embedded into the Praat user interface. (See the manual page for details.)
-
External tools:
Several tools have been developed that allow external programs to interface with Praat and its files.
- Praat Launcher: A Microsoft Excel add-in that launches Praat from Excel worksheets
- PraatR: An architecture for controlling Praat with R code
- textgridR: Read Praat TextGrid annotations into R so that they can be manipulated as R objects
- Phon: An external transcription program that can connect with Praat to import/export TextGrid annotations
- Quick Look Plugin for viewing Praat script files and TextGrids in Mac OS X
- Syntax highlighting for Praat's scripting language is available for Notepad++ (Windows) and for the Kate Editor (Linux)
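Tools such as textgridR work by reading the plain-text TextGrid file and turning its intervals into objects in a scripting language. The idea can be sketched in a few lines of Python. This is only an illustration, not how any of the tools listed above is implemented: it assumes the long ("full") TextGrid text format, handles interval tiers only, and the names `INTERVAL_RE`, `read_intervals`, and `SAMPLE` are made up for this example.

```python
import re

# Matches one interval entry in the long TextGrid text format.
# Assumption: each entry lists xmin, xmax, and text in that order.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[-\d.eE+]+)\s*'
    r'xmax = (?P<xmax>[-\d.eE+]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def read_intervals(textgrid_text):
    """Return (start, end, label) tuples for every interval in every tier."""
    return [
        (
            float(m.group("xmin")),
            float(m.group("xmax")),
            # A doubled quote inside a label stands for one literal quote.
            m.group("text").replace('""', '"'),
        )
        for m in INTERVAL_RE.finditer(textgrid_text)
    ]


# A small hand-written TextGrid used only for demonstration.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.7
            text = "hello"
        intervals [2]:
            xmin = 0.7
            xmax = 1.5
            text = "world"
'''

for start, end, label in read_intervals(SAMPLE):
    print(start, end, label)
```

The sketch ignores tier names and point tiers, and it does not handle the short text format or binary TextGrid files, so the dedicated tools listed above remain the better choice for real data.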
-
Published reference:
Boersma (2001)
WaveSurfer
-
Overview:
WaveSurfer offers less functionality than the latest versions of Praat, but it is more user-friendly for beginners.
See also the Wikipedia page.
-
Creators:
Kåre Sjölander and Jonas Beskow at the Royal Institute of Technology in Stockholm, Sweden
-
Download:
The latest version of WaveSurfer can be downloaded from the WaveSurfer SourceForge page.
-
Documentation:
Much information is still available at the old WaveSurfer homepage, including a manual and a FAQ.
In addition, unofficial tutorials have been created by Kyōko Nagao (now at University of Delaware) and Bruce Hayes (University of California, Los Angeles).
-
Support:
Support is available either by posting to the forum or by submitting a ticket for a bug, support request, patch, or feature request.
-
Published reference:
Sjölander & Beskow (2000).
EMU Speech Database System
Speech Filing System
-
Overview:
While less widely used, Speech Filing System (SFS) has a wide variety of speech signal processing capabilities.
Its smaller, more lightweight companion program is WASP (Waveforms, Annotations, Spectrograms, and Pitch).
-
Creators:
Originally the product of the collaborative effort of several researchers, the program is currently being developed and maintained by Mark Huckvale at University College London.
-
Download:
SFS is available from the SFS download page.
See also the installation instructions.
Alternatively, SFS-related files can be accessed at the SFS FTP site.
(See the readme.)
-
Documentation:
Several forms of documentation are available for SFS, most notably the manual and FAQ.
Moreover, how-to guides are available for a number of specific tasks, like transcription/segmentation and formant extraction.
See also the introduction on how to use SFS on Windows.
-
Support:
Questions regarding SFS can be posted to the Speech Tools listserv, which is linked from the manual page.
-
Published reference:
Huckvale, Brookes, Dworkin, Johnson, Pearce, and Whitaker (1987).
Packages for scripting languages
R
-
Overview:
R is a free programming language and software environment for statistical computing and graphics.
Further details can be found at the R homepage and its Wikipedia page.
See the Statistics page for ways to learn R at IU.
-
Official releases: R has several packages that are useful for phonetics:
-
phonTools:
("Functions for phonetics in R")
-
phonR:
("R tools for phoneticians and phonologists")
-
vowels:
("Vowel manipulation, normalization, and plotting")
-
tuneR:
("Analysis of music and speech")
-
signal:
("Signal processing")
-
seewave:
("Sound analysis and synthesis")
-
audio:
("Audio interface for R")
-
Interfaces: R can control many of the other phonetics software programs described on this page:
MATLAB / GNU Octave
-
Overview:
MATLAB is a numerical computing environment and programming language sold by MathWorks.
Its open-source alternative is GNU Octave.
Further details can be found at the Wikipedia pages for MATLAB and for GNU Octave.
See also the list of resources on MATLAB prepared by IU's Research Analytics Group, especially the 'Getting Started' tutorial.
IU students can use MATLAB for free in the Student Technology Center (STC) labs on campus, and graduate-level courses on MATLAB are available in various departments (e.g. VSCI-V 768, PSY-P 657, or GEOG-G 577).
-
Official releases:
Of MathWorks' official toolboxes, a useful one for phonetics is the Signal Processing Toolbox.
In addition, many phonetics-related scripts are shared on the MATLAB File Exchange.
-
External resources:
Libraries for advanced programming
Snack Sound Toolkit
-
Overview:
Snack is one of the most widely used libraries for signal processing, especially for speech applications in computer science and engineering.
For details on its functionality, see the Snack homepage and the Wikipedia page.
-
Creator: Kåre Sjölander at the Royal Institute of Technology in Stockholm, Sweden
-
Download: See the Snack download page.
-
Documentation: Separate documentation is available for each of the different versions of Snack:
-
Support:
See the official FAQ.
-
Published reference:
Sjölander & Beskow (2000) describes how Snack is used for the core functionality of WaveSurfer.
Entropic Signal Processing System (ESPS)
-
History: Originally a commercial product of Entropic Research Laboratory, Inc. (commonly abbreviated 'Entropic'), the following bundle of software was extremely popular in phonetics labs around the world in the 1990s:
- Entropic Signal Processing System (ESPS): a collection of over 200 UNIX commands and programming libraries (written in C) for speech signal processing
- waves+: a set of functions for interactive data visualization and manipulation of data processed by ESPS. The program that forms the core of the waves+ package is known as 'xwaves'.
- EnSig: A higher-level graphical user interface for setting waves+ preferences and behavior with interactive prompts
After Entropic was acquired by Microsoft in 1999, the rights to the final legacy version of the ESPS source code (i.e. the first item above, but not waves+ or EnSig) were donated to the Center for Speech Technology (Centrum för TalTeknologi, CTT) in the Department of Speech, Music, and Hearing (Tal musik och hörsel, TMH) at the Royal Institute of Technology (Kungliga Tekniska högskolan, KTH) in Stockholm, Sweden. This code was incorporated into the Snack Sound Toolkit. (See above.)
-
Download: The ESPS source code can be obtained in any of the following ways:
-
Documentation:
The full manual is available from Roberto Togneri (University of Western Australia).
Edinburgh Speech Tools Library
Speech Signal Processing Toolkit
-
Overview:
The Speech Signal Processing Toolkit (SPTK) is a set of speech signal processing tools for Unix environments.
-
Creators:
SPTK working group at Tokyo Institute of Technology and Nagoya Institute of Technology (See the readme.)
-
Download:
SPTK can be downloaded from the project's SourceForge page.
-
Documentation:
A link to the SPTK manual can be found on the SPTK home page.
Advanced Speech Signal Processor
-
Overview:
The Advanced Speech Signal Processor (ASSP) library provides functionality for handling speech signal files, both as a command-line tool by itself ('libassp') and with a Tcl extension ('tclassp').
-
Creators:
Lasse Bombien (Ludwig Maximilian University of Munich) and Michel Scheffers (Christian Albrechts University of Kiel)
-
Download:
Both libassp and tclassp can be downloaded from the project's SourceForge page.
-
Documentation:
A manual is available for tclassp.
-
Support:
Bugs and feature requests can be submitted on the project's SourceForge page. See also the mailing list.