Acoustics - Indiana University Phonetics Lab

This page surveys the numerous freely downloadable software programs for conducting acoustic analyses of speech recordings.

Highlighted resources: Audacity (under 1st section), Praat, and WaveSurfer (both under 2nd section) are the most popular.

For online resources where you can learn more about acoustics, see the Acoustic Phonetics section on the Web Portal page.

Programs for handling recordings

Audacity

Overview: Audacity is free, open-source, cross-platform software for recording and editing sounds. For more information, see the Audacity homepage and the Wikipedia page.
Download: Audacity can be downloaded from the project's SourceForge page.
Documentation: See Audacity's online Manual and Wiki. You can also take a workshop on Audacity from Indiana University's IT Training department (or download the workshop's files and teach yourself).
Support: See the help page on the Audacity site for the various available resources, including a forum and mailing lists.

Miscellaneous

To convert audio files (format, sampling rate, etc.): Sound eXchange (SoX)
To quickly browse the contents of recordings: BROWSE
To add event markers within a soundfile: AUDINDEX
To concatenate multiple WAV files together: WaveCat
To transcribe speech in a recording, many tools are available, cf. Phon or the list on Digital Resource Tools Directory.
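
For readers who prefer scripting to a dedicated utility, the concatenation task above can also be sketched with Python's standard wave module. This is only an illustrative stand-in, not how WaveCat itself works; the function name concatenate_wavs is hypothetical, and the sketch assumes uncompressed PCM WAV files that all share the same channel count, sample width, and sampling rate.

```python
import wave


def concatenate_wavs(input_paths, output_path):
    """Join PCM WAV files end to end; all inputs must share one format."""
    if not input_paths:
        raise ValueError("need at least one input file")

    reference = None  # (channels, sample width in bytes, frame rate) of first file
    chunks = []
    for path in input_paths:
        with wave.open(path, "rb") as src:
            fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if reference is None:
                reference = fmt
            elif fmt != reference:
                # Mixing formats would corrupt the output, so stop early.
                raise ValueError(f"{path} has format {fmt}, expected {reference}")
            chunks.append(src.readframes(src.getnframes()))

    with wave.open(output_path, "wb") as dst:
        dst.setnchannels(reference[0])
        dst.setsampwidth(reference[1])
        dst.setframerate(reference[2])
        for chunk in chunks:
            dst.writeframes(chunk)
```

A call such as concatenate_wavs(["a.wav", "b.wav"], "joined.wav") (file names are placeholders) writes the inputs back to back. Files that differ in sampling rate or channel count would first need to be converted, for example with SoX.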

Software with a graphical user interface

For information on the commercial program "Adobe Audition" (formerly Cool Edit Pro), see the download page for Creative Suite 6 Production Premium on IUware (Windows, Mac) and the workshops offered through IT Training.

Note that a comparison of the different software programs listed below is available from Peter Roach (University of Reading).

Praat

Overview: Praat (the Dutch word for "talk") is the most widely used phonetics software program today, both for research and for pronunciation teaching. See also the Wikipedia page.
Creators: Paul Boersma and David Weenink at the University of Amsterdam
Download: Praat is available for Windows, Mac, and Linux, among others. The source code is also available.
Documentation: A copy of the official Praat manual comes with an installation of the program and is also available online. See also the Praat FAQ. For information about the various algorithms implemented in Praat, see the list of publications on Praat. Unofficial tutorials have been created by Will Styler (University of Colorado) and Sidney Wood (University of Lund).
Support: Co-creator Paul Boersma moderates the Praat Users Group, a Yahoo! group where users can post questions.
Scripts: The following websites serve as central repositories for the numerous Praat scripts available all around the web:

Speech Corpus Toolkit from Mietta Lennes (University of Helsinki)
Praat Script Archives from Joe Toscano (University of Illinois at Urbana-Champaign)
Praat Script Resources from the Phonetics Lab at the University of California, Los Angeles
Praat Resources from Die Praatpfanne

Plug-ins: Plug-ins are special scripts that are embedded into the Praat user interface. (See the manual page for details.)

Bartlomiej Plichta (University of Minnesota): Akustyk (vowel analysis software package)
Ramon Miret Corretgé (University of Barcelona): Praat Vocal Toolkit (Automated scripts for voice processing)
Jean-Philippe Goldman (University of Geneva): EasyAlign, Inter-rater agreement, File tools
Volker Dellwo (University of Zurich): Duration Analyzer, Sentence Presenter, Sasasa Delexicalizer, CV Tier Creator, etc.

External tools: Several tools have been developed that allow external programs to interface with Praat and its files.

Praat Launcher: A Microsoft Excel add-in that launches Praat from Excel worksheets
PraatR: An architecture for controlling Praat with R code
textgridR: Read Praat TextGrid annotations into R so that they can be manipulated as R objects
Phon: An external transcription program that can connect with Praat to import/export TextGrid annotations
Quick Look Plugin for viewing Praat script files and TextGrids in Mac OS X
Syntax highlighting for Praat's scripting language is available for Notepad++ (Windows) and for the Kate Editor (Linux)

Published reference: Boersma (2001)

WaveSurfer

Overview: WaveSurfer provides less overall functionality than the latest versions of Praat, but it is more user-friendly for beginners. See also the Wikipedia page.
Creators: Kåre Sjölander and Jonas Beskow at the Royal Institute of Technology in Stockholm, Sweden
Download: The latest version of WaveSurfer can be downloaded from the WaveSurfer SourceForge page.
Documentation: Much information is still available at the old WaveSurfer homepage (including a manual and a FAQ). In addition, unofficial tutorials have been created by Kyōko Nagao (now at University of Delaware) and Bruce Hayes (University of California, Los Angeles).
Support: Support is available either by posting to the forum or by submitting a ticket for a bug, support request, patch, or feature request.
Published reference: Sjölander & Beskow (2000).

EMU Speech Database System

Overview: EMU is a software program for creating and querying hierarchically structured annotations of speech databases. EMU is available both as a downloadable software program and as a browser application.
Creators: Steve Cassidy (Macquarie University) and Jonathan Harrington (Ludwig Maximilian University of Munich)
Download/Access:
- The EMU software program can be downloaded for Windows, Mac, or Linux at the project's SourceForge page. Installation instructions are available under the 'Downloads' and 'FAQ' tabs on the left side of the EMU home page.
- The EMU browser application can be accessed from an Internet browser at the EMU-webApp page.
Documentation:
- The most in-depth documentation is in the section on the EMU core from the official EMU manual.
- Additional documentation can be found in Harrington's 2010 monograph Phonetic Analysis of Speech Corpora. [IUCAT]
- Video tutorials are on the book's website and under the 'Documentation' tab on the left side of the EMU home page.
Support:
- Posting a question to one of the forums
- Submitting a ticket for a bug, support request, patch, or feature request
- Searching the archives of, and/or posting a question to, the EMU mailing lists
Published references: Cassidy & Harrington (1996), Cassidy & Harrington (2001), Bombien et al. (2006), Williams (2008) (See also the 'Publications' tab on the left side of the EMU home page.)

Speech Filing System

Overview: While less widely used, Speech Filing System (SFS) has a wide variety of speech signal processing capabilities. Its smaller, more light-weight companion program is WASP (Waveforms, Annotations, Spectrograms, and Pitch).
Creators: Originally the product of the collaborative effort of several researchers, the program is currently being developed and maintained by Mark Huckvale at University College London.
Download: SFS is available from the SFS download page. See also the installation instructions. Alternatively, SFS-related files can be accessed at the SFS FTP site. (See the readme.)
Documentation: Several forms of documentation are available for SFS, most notably the manual and FAQ. Moreover, how-to guides are available for a number of specific tasks, like transcription/segmentation and formant extraction. See also the introduction on how to use SFS on Windows.
Support: Questions regarding SFS can be posted to the Speech Tools listserv, which is linked from the manual page.
Published reference: Huckvale, Brookes, Dworkin, Johnson, Pearce, and Whitaker (1987).

Speech Analyzer

Overview: Speech Analyzer is lightweight and easy to use, hence it is ideal for beginners. While it can only perform basic signal processing functions by itself, plug-in extensions are available, e.g. for plotting formant scatterplots. For further information, see the Speech Analyzer homepage and its software catalog entry.
Creator: Summer Institute of Linguistics (SIL) International
Download: See the Speech Analyzer download page or download as part of the Speech Tools software bundle.
Documentation: See the 'Help' menu inside the program or this SlideShare tutorial from Cristina González Rico.
Support: See the Speech Analyzer technical support page for an e-mail address to send bug reports.

Packages for scripting languages

R

Overview: R is a free programming language and software environment for statistical computing and graphics. Further details can be found at the R homepage and its Wikipedia page. See the Statistics page for ways to learn R at IU.
Official releases: R has several packages that are useful for phonetics:
- phonTools: ("Functions for phonetics in R")
- phonR: ("R tools for phoneticians and phonologists")
- vowels: ("Vowel manipulation, normalization, and plotting")
- tuneR: ("Analysis of music and speech")
- signal: ("Signal processing")
- seewave: ("Sound analysis and synthesis")
- audio: ("Audio interface for R")
Interfaces: R can control many of the other phonetics software programs described on this page:
- Praat: With PraatR, you can execute most commands on most of Praat's object types and thus control Praat with R code. With textgridR, you can read Praat TextGrid annotations into R so that they can be manipulated as R objects.
- EMU: See the tab for the package 'Emu/R' on the left side of the home page for EMU. Walk-through tutorials are available in Harrington (2010) [IUCAT] and the book's companion website.
- ASSP: R can access the functionality of libassp through a wrapper program known as wrassp.
- Others: Use system() on a compiled program (e.g. Snack Sound Toolkit or a component of Speech Filing System)

Python

Overview: Python is a widely used general-purpose, high-level programming language. Further details can be found at the Python homepage and its Wikipedia page. Note that you can learn Python at Indiana University either in a workshop from IT Training or in a class from the Computational Linguistics program (in particular "Programming for Computational Linguists").
Official releases:
- scikits.talkbox ("Talkbox: A set of python modules for speech/signal processing")
- pyssp ("Python speech signal processing library for education")
- voicing ("Speech signal processing libraries and scripts"; One component of Speech Research Tools)
External resources:
- Penn Phonetics Toolkit: A collection of Python and Praat scripts and other tools to aid speech research
- OpenSauce-Python: A Python port of VoiceSauce / OpenSauce
- pyo: Python digital signal processing module
- There is also a version of the Snack Sound Toolkit for Python. For details, see the Snack section on this page.

MATLAB / GNU Octave

Overview: MATLAB is a numerical computing environment and programming language sold by MathWorks. Its open-source alternative is GNU Octave. Further details can be found at the Wikipedia pages for MATLAB and for GNU Octave. See also the list of resources on MATLAB prepared by IU's Research Analytics Group, especially the 'Getting Started' tutorial. IU students can use MATLAB for free in the Student Technology Center (STC) labs on campus, and graduate-level courses on MATLAB are available in various departments (e.g. VSCI-V 768, PSY-P 657, or GEOG-G 577).
Official releases: Of MathWorks' official toolboxes, a useful one for phonetics is the Signal Processing Toolbox. In addition, many phonetics-related scripts are shared on the MATLAB File Exchange.
External resources:
- VoiceSauce, a program for voice analysis, from the University of California, Los Angeles (For Octave, see OpenSauce.)
- Voicebox, a speech processing toolbox for MATLAB, from Mike Brookes (Imperial College)
- COLEA, a MATLAB software tool for speech analysis, from Philip Loizou (University of Texas at Dallas)
- Speech and Audio Processing Toolbox from Roger Jang (National Taiwan University)
- MATLAB API to Speech Filing System from Mark Huckvale (University College London)
- MATLAB realtime speech tools by Hideki Kawahara (Wakayama University)

Libraries for advanced programming

Snack Sound Toolkit

Overview: Snack is one of the most widely used libraries for signal processing, especially for speech applications in computer science and engineering. For details on its functionality, see the Snack homepage and the Wikipedia page.
Creator: Kåre Sjölander at the Royal Institute of Technology in Stockholm, Sweden
Download: See the Snack download page.
Documentation: Separate documentation is available for each of the different versions of Snack:
- Tcl: Installation notes; Tutorial; Manual; Example scripts; Web applications
- Python: Installation notes; Manual
- Ruby: An unofficial version of Snack for Ruby has been written by Steve Legrand.
- C/C++: Extensions to Snack can be written using the Snack C library.
Support: See the official FAQ.
Published reference: Sjölander & Beskow (2000) describes how Snack is used for the core functionality of WaveSurfer.

Entropic Signal Processing System (ESPS)

History: Originally a commercial product of Entropic Research Laboratory, Inc. (commonly abbreviated 'Entropic'), the following bundle of software was extremely popular in phonetics labs around the world in the 1990s:
1. Entropic Signal Processing System (ESPS): a collection of over 200 UNIX commands and programming libraries (written in C) for speech signal processing
2. waves+: a set of functions for interactive data visualization and manipulation of data processed by ESPS. The program that forms the core of the waves+ package is known as 'xwaves'.
3. EnSig: A higher-level graphical user interface for setting waves+ preferences and behavior with interactive prompts
After Entropic was acquired by Microsoft in 1999, the rights to the final legacy version of the ESPS source code (i.e. #1 above, but not #2 or #3) were donated to the Center for Speech Technology (Centrum för TalTeknologi, CTT) in the Department of Speech, Music, and Hearing (Tal musik och hörsel, TMH) at the Royal Institute of Technology (Kungliga Tekniska högskolan, KTH) in Stockholm, Sweden. This code was incorporated into the Snack Sound Toolkit. (See above.)
Download: The ESPS source code can be obtained in any of the following ways:
- as a zip file directly from the Center for Speech Technology
- from a mirror on GitHub
- as a .deb file updated to compile and run on Ubuntu 12.04 from the University of Oxford Phonetics Laboratory
Documentation: The full manual is available from Roberto Togneri (University of Western Australia).

Edinburgh Speech Tools Library

Overview: The Edinburgh Speech Tools ("EST") library is a collection of C++ classes, functions, and programs for speech signal processing. EST forms the backbone of the Festival Speech Synthesis System (part of the Festvox project).
Creators: Researchers at University of Edinburgh (King, Clark, Richmond, Strom) and Carnegie Mellon University (Black)
Download: EST can be downloaded from the Festvox website. (Look for the file name beginning with "speech_tools".) There is also an unofficial fork on GitHub.
Documentation: See the official EST manual.
Support: Questions can be posted to the Festival mailing lists.

Speech Signal Processing Toolkit

Overview: The Speech Signal Processing Toolkit (SPTK) is a set of speech signal processing tools for Unix environments.
Creators: SPTK working group at Tokyo Institute of Technology and Nagoya Institute of Technology (See the readme.)
Download: SPTK can be downloaded from the project's SourceForge page.
Documentation: A link to the SPTK manual can be found on the SPTK home page.

Advanced Speech Signal Processor

Overview: The Advanced Speech Signal Processor (ASSP) library provides functionality for handling speech signal files - both as a command-line tool by itself ('libassp') and with a Tcl extension ('tclassp').
Creators: Lasse Bombien (Ludwig Maximilian University of Munich) and Michel Scheffers (Christian Albrecht University of Kiel)
Download: Both libassp and tclassp can be downloaded from the project's SourceForge page.
Documentation: A manual is available for tclassp.
Support: Bugs and feature requests can be submitted on the project's SourceForge page. See also the mailing list.