Recently in Audio Category

Audio production and editing software.

  1. Oct15

    The current state of audio search

    Online audio is definitely on an upswing, fueled by the iPod revolution, improved online playback, and broadband penetration. Audio search is keeping up with demand for new content, thanks in part to national security spending in the Cold War and beyond. In this post I will outline the current state of audio search, and how machines make sense of spoken word, progressing from easy to difficult.

    First, let's define the space. I'm interested in how a search engine might index content whose metadata was not professionally produced. The President's weekly radio address comes with a full transcript. Music catalogs are available for purchase from Muze and others to provide structured data about Bob Dylan and what he's saying. A voicemail message or a podcast might not be as thoroughly described.

    Let's take a look at audio files a search engine might discover during a web crawl and current methods of understanding the content.

    Filetype identification

    Audio content can be broken down into a few unique file extensions that hint at the remote audio container.

    wav
    The waveform audio format is a common form of uncompressed audio on Windows PCs.
    aiff
    The Audio Interchange File Format is a common form of uncompressed audio on Apple computers.
    mp3
    MPEG-1 Audio Layer 3 is a popular form of distribution for compressed audio files.
    wma
    Windows Media Audio, popular on Windows machines.
    asf
    Advanced Systems Format, a container for streaming audio and video commonly used by Microsoft products.
    m4a
    MPEG 4 audio files, most likely Advanced Audio Coding compressed audio created by Apple software.
    ra
    RealAudio format by Real Networks
    ogg
    Ogg Vorbis open source compression format.
    flac
    The Free Lossless Audio Codec is a compressed format used by audioheads and for archival purposes.

    A web search engine can take a look at all of the links in its index and identify possible audio files based on these file extensions without retrieving any file information from the host server. You can search Google for URLs containing "MP3" and referencing "Bob Dylan." Audio files are not currently supported in Google's file type operator. del.icio.us exposes bookmarked audio through the system:media:audio tag.
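    As a rough sketch of that extension check, here is how a crawler might classify a URL using only the link text it already holds. The table is copied from the list above; the function name and the example URLs are made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Extension-to-format table taken from the list in this post.
AUDIO_EXTENSIONS = {
    "wav": "Waveform audio",
    "aiff": "Audio Interchange File Format",
    "mp3": "MPEG-1 Audio Layer 3",
    "wma": "Windows Media Audio",
    "asf": "Advanced Systems Format",
    "m4a": "MPEG-4 audio",
    "ra": "RealAudio",
    "ogg": "Ogg Vorbis",
    "flac": "Free Lossless Audio Codec",
}


def guess_audio_format(url):
    """Return a format name if the URL path ends in a known audio
    extension, otherwise None. No request is made to the host."""
    path = unquote(urlsplit(url).path)  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return AUDIO_EXTENSIONS.get(suffix)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(guess_audio_format("http://example.com/media/speech.MP3?download=1"))
    print(guess_audio_format("http://example.com/index.html"))
```

    An extension is only a hint: asf and ogg name containers that may also carry video, so a match here is a candidate for closer inspection, not a confirmed audio file.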

    HTML markup

    Audio files found in the wild are often described and referenced from within HTML pages. Here's an example of how an audio file might be described within a web page link:

    
    <a href="speech.mp3"
     type="audio/mpeg"
     hreflang="en-us"
     title="A longer description of the target audio">
     A short description</a>
    

    The href attribute points to the location of the audio file. The audio/mpeg type value provides a hint for user agents about the type of file on the other end of the link. The hreflang attribute communicates the base language of the linked content. The title attribute provides more information about the linked resource, and may be displayed as a tooltip in some browsers. The element value, "A short description," is the linked text on the page.

    Most publishers are unlikely to supply anything beyond the href needed to make the link work. The title attribute is semi-visible and therefore somewhat more likely to be filled in, but it is still uncommon. A link can be identified as audio by a MIME type such as audio/mpeg, but few sites provide the advisory type hint in their HTML markup. Collecting a file's MIME type from the server requires "touching" the remote file, and the response will most likely reflect the default settings of popular hosting applications such as Apache or IIS, so a search engine is likely better off relying on a local list of mapped extensions and helper application behaviors.
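    To make the link attributes concrete, here is a minimal sketch of pulling them out of a page with Python's standard html.parser module. The class and function names are my own invention, and a real crawler would need sturdier handling of malformed markup and nested elements.

```python
from html.parser import HTMLParser


class AudioLinkParser(HTMLParser):
    """Collect <a> elements whose type or href hints at audio."""

    AUDIO_SUFFIXES = (".mp3", ".wav", ".aiff", ".wma", ".asf",
                      ".m4a", ".ra", ".ogg", ".flac")

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # link being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        mime = attributes.get("type") or ""
        # Accept either the advisory MIME hint or a known file extension.
        if mime.startswith("audio/") or href.lower().endswith(self.AUDIO_SUFFIXES):
            self._current = {
                "href": href,
                "type": attributes.get("type"),
                "hreflang": attributes.get("hreflang"),
                "title": attributes.get("title"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse whitespace left over from line-wrapped markup.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def extract_audio_links(html):
    """Return a list of dicts, one per audio-looking link in the HTML."""
    parser = AudioLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

    Fed the example link above, this returns one entry with the href, type, hreflang, title, and linked text. Most links in the wild would come back with everything except href and text set to None, which is exactly the sparseness described here.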

    Syndication formats

    It is possible for a publisher to include more information about a file using a syndication feed combined with a specialized namespace such as the iTunes podcasting spec or Yahoo! Media RSS. A search engine may parse these feeds to gather more information about a particular audio item such as title, description, and length, which often describes the audio more closely than a link on a web page does.
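    Here is a small sketch of reading that extra data from an RSS 2.0 feed with the standard library. The sample feed is invented for illustration, and I only read the enclosure element plus one iTunes field (itunes:duration). A production parser would cover more of each namespace and cope with feeds that are not well formed.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the iTunes podcasting extensions.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

# Made-up feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example podcast</title>
    <item>
      <title>Episode one</title>
      <description>A made-up episode used for illustration.</description>
      <enclosure url="http://example.com/episode1.mp3"
                 length="10485760" type="audio/mpeg"/>
      <itunes:duration>22:00</itunes:duration>
    </item>
    <item>
      <title>Text-only post</title>
      <description>No enclosure here.</description>
    </item>
  </channel>
</rss>"""


def audio_items(feed_xml):
    """Return one dict per feed item that carries an audio enclosure."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue
        if not enclosure.get("type", "").startswith("audio/"):
            continue
        length = enclosure.get("length", "")
        items.append({
            "title": item.findtext("title"),
            "description": item.findtext("description"),
            "url": enclosure.get("url"),
            "length_bytes": int(length) if length.isdigit() else None,
            "duration": item.findtext("{%s}duration" % ITUNES_NS),
        })
    return items


if __name__ == "__main__":
    for entry in audio_items(SAMPLE_FEED):
        print(entry)
```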

    Hosted audio

    Large search engines such as Google, Yahoo!, and Microsoft have not created the same sort of hosted audio community for user-generated content that exists for images or video. The Internet Archive and similar sites host audio, for example a Grateful Dead concert, complete with data including artist, title, performance date, equipment used, and audio editors.

    Apple's GarageBand software is one example of integrated recording, compression, descriptive markup, and remote hosting.

    Metadata containers

    Once the search engine reaches out and "touches" the audio file, it can discover more descriptive information embedded within. An ID3 tag describes the track title, artist, album, genre, and other information provided by the publisher. The metadata container might hold additional information such as album art, lyrics, or descriptions of individual segments of the audio file, known as "chapters." An audio metadata parser reads each frame it understands and extracts the associated descriptive data.

    ID3v2 tags are normally placed at the beginning of the file to assist streaming applications (the older ID3v1 tag occupies the last 128 bytes instead), so a metadata indexer might not grab the entire audio file, opting instead to look for data only in those first bytes.
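    As a sketch of what "those first bytes" look like, an ID3v2 tag opens with a 10-byte header: the characters "ID3", a version byte, a revision byte, a flags byte, and a 4-byte "synchsafe" size that uses only the low 7 bits of each byte. The function names below are mine. The code reads only the header, not the frames that follow.

```python
ID3V2_HEADER_LENGTH = 10


def synchsafe_to_int(data):
    """Decode a synchsafe integer: 7 useful bits per byte, high bit unused."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def read_id3v2_header(first_bytes):
    """Return (major_version, revision, tag_size) for an ID3v2 header,
    or None if the bytes do not start with one.

    tag_size counts the bytes that follow the 10-byte header.
    """
    if len(first_bytes) < ID3V2_HEADER_LENGTH or first_bytes[:3] != b"ID3":
        return None
    major, revision = first_bytes[3], first_bytes[4]
    tag_size = synchsafe_to_int(first_bytes[6:10])
    return major, revision, tag_size


if __name__ == "__main__":
    # Synthetic header: ID3v2.3, no flags, size bytes 00 00 02 01.
    header = b"ID3" + bytes([3, 0, 0]) + bytes([0, 0, 2, 1])
    print(read_id3v2_header(header))
```

    With the size in hand, an indexer could fetch just the header plus tag_size bytes, for instance with an HTTP range request where the server supports one, rather than downloading the whole file.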

    Parsing spoken word

    Speech recognition has enjoyed rapid improvement over the last decade, thanks in part to large national security budgets for indexing spoken words captured through ECHELON and other methods. Similar technology is now being applied to medical and legal transcription and to creating more searchable content for each podcast.

    Speech-to-text software such as AVOKE from BBN Technologies is used to create transcripts of phone calls to call centers, the nightly news, and government surveillance. The system utilizes known vocabularies by language applied over a continuous density hidden Markov model to analyze speech phonemes in various contexts. The system uses multiple passes to determine context and associative clustering of words and phrases.

    The consumer search engine PodZinger uses spoken word analysis to find a search term and jump to the marker within the file where the term is spoken. You can search for audio containing mentions of the Athletics and Tigers and view your results in the context of the file with direct links to that segment of the audio program.
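    I don't know how PodZinger stores its data, but the jump-to-marker idea can be sketched with a time-aligned transcript: a list of (seconds, word) pairs produced by a speech-to-text pass. Everything below, including the sample transcript and the function names, is hypothetical.

```python
from collections import defaultdict


def build_term_index(timed_words):
    """Map each lowercased word to the offsets (in seconds) where it is spoken."""
    index = defaultdict(list)
    for seconds, word in timed_words:
        term = word.lower().strip(".,!?\"'")
        if term:
            index[term].append(seconds)
    return dict(index)


def format_offset(seconds):
    """Render an offset in seconds as m:ss for a link label."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    # Invented transcript fragment with word start times in seconds.
    transcript = [
        (0.0, "The"), (0.4, "Athletics"), (1.1, "beat"),
        (1.5, "the"), (1.7, "Tigers."), (75.2, "Tigers"), (75.9, "pitching"),
    ]
    index = build_term_index(transcript)
    for offset in index.get("tigers", []):
        print("Tigers mentioned at", format_offset(offset))
```

    A player given these offsets can seek straight to each mention instead of starting the program from the beginning.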

    Summary

    Online audio content will only continue to get bigger, as more content makes its way online and into the ears of consumers on a PC, iPod, or other listening device. The maturity of online audio and the current business feasibility should consolidate audio format offerings into audio understood by dominant market players in the desktop, portable, and home theater markets.

    I expect even more speech-to-text work in the future as the available CPUs, memory, and disk space continue to become cheaper, both computationally and monetarily. Perhaps we might even see client-side analysis of content similar to analysis work being conducted on images. Windows Media Player and iTunes are just two examples of popular media players that connect to the Internet to retrieve more information about your media files, from album art to recorded year. In the future such applications might also query data services such as Last.fm, MusicBrainz, or the Music Genome Project to apply more data to each file based on a purchased database, collective intelligence, or expert analysis.

    Creating new sources of audio content is becoming easier. The popularity of VoIP will place new value on microphones connected to our PCs, gaming systems, and other connected electronics devices. Voice will become an integrated feature, allowing you to easily save a compressed audio file of a recent planning call or your Halo trash-talking session.

    I think many search engines have looked past audio search due to the litigious nature of the RIAA and others evidenced by last year's MGM vs. Grokster Supreme Court ruling. Google's recent $1.65 billion purchase of YouTube is perhaps a sign that search technology will continue to advance, challenging any emergent legal roadblocks along the way.

    As with most search sectors, audio search is still in very early stages. Expect known vocabularies and relationship mappings to increase over time, providing more insight not only into each word, but also speaker identification, tone, and possibly even relationships between events such as a power outage's correlation to customer service calls. We'll keep talking and publishing and search will attempt to keep up with our rate of speech, accents, and methods of describing our creations.

  2. Feb13

    Odeo audio messages

    Odeo introduced some new features last week including extended profiles and the ability to send any Odeo member an audio message. I've been asking Ev for audio comments for a few months and I'm glad it's finally here!

    Anyone with a Flash player can send an audio message to an Odeo member. You can add a special button to your podcast site and instantly collect audio comments for each episode. Choose from over 20 pre-made buttons to include on your site and you can enable audio comments in minutes. I added a "Send me an Odeo" button to my contact page and my podcast site.

  3. Jan31

    VoIP, not just for cheap calls

    The latest episode of Om and Niall PodSessions is now available. This week Om and I talk about VoIP and the new applications with seamless integration of new voice technologies.

    A recent study by In-Stat found 73% of all VoIP subscribers have migrated to VoIP without making a conscious decision to adopt the new technology. On Sunday my dad asked me about Vonage, and the various boxes he saw advertised with the services in the Sunday newspaper inserts. To him, Vonage was just another long distance provider and happened to have cheap rates to call Ireland. He had no clue what he was supposed to do with the Linksys box pictured in the ad. My mom also mentioned a bunch of parents are using "voice chat" to talk to their sons and daughters serving in Iraq. The lower costs and the integrated connectivity to endpoints across the world are driving adoption in my small sample of the suburban household. Big changes are underway in how we connect to each other using some of the same technologies that power the Internet, so Om and I decided to have a chat about what's changing and what's coming.

    This week's podsession, titled VoIP, not just for cheap calls, is 22 minutes long and a 10 MB download.

  4. Dec02

    Odeo Studio released

    Odeo just released their online audio recording software called Odeo Studio. Odeo Studio was previously available to a limited group of users. The new version 0.14 is the first public release of the software.

    Giving people the ability to record content via a web page or telephone takes away a lot of the complexity of podcasting or casual voice message creation. No worries about getting the proper recording software, encoding the audio, uploading to a server. Now all you have to do is hit record on a web page and everything is done for you.

    Recordings are limited to only 3 minutes, so Odeo is not yet a realistic tool for full podcasts. It's just the right amount of time for podcast listeners to leave comments on shows. Yep, I'm requesting features already!

  5. Jan29

    Digital identity event at Future Salon

    Last night I attended a Future Salon presentation about digital and online identities. The event was hosted at SAP in Palo Alto.

    Eric Sachs of Google spoke about Google's relatively new entry into the digital identity realm with services such as Orkut and Gmail. Jeff Hodges of Liberty Alliance talked about identity systems in the enterprise marketplace. Fen Labalme of Identity Commons talked about identity systems built at the grassroots level for non-governmental organizations.

    I recorded all three speeches as well as the question and answer period using a directional microphone from my seat in the front row.

    Eric Sachs
    MP3 audio
    19:14, 8.7 MB
    Jeff Hodges
    MP3 audio
    15:40, 7.1 MB
    Fen Labalme
    MP3 audio
    22:49, 10.3 MB
    Questions & Answers
    MP3 audio
    36:34, 16.6 MB
  6. Aug13

    NeroSoft TimeTrax: Record XM Radio to MP3

    NeroSoft TimeTrax is an application for XM Satellite Radio subscribers with XM PCR hardware that allows users to convert the XM Satellite Radio stream into individual MP3 (with complete ID3 information) or WAV files per song. You can also limit your recordings by artist. See the documentation pages for more details. (via Gizmodo)
  7. Jun02

    Windows Media Player 10 Technical Beta

    Windows Media Player 10 Technical Beta is now available for download from Microsoft's Web site. The software only works with Windows XP. New design, better sync abilities, and an integrated online digital media mall.
  8. Oct16

    Apple iTunes for Windows

    Today I downloaded the new Apple iTunes for Windows 4.1.0.52. I have used the OS X version a few times on a friend's PowerBook, but this is my first time testing all of its different features. I use Windows Media Player 9 daily.

    When I mentioned to other people that I had downloaded iTunes for Windows and was playing around with it their assumption was that I was buying music with the software. Interesting what features have been getting the most publicity.

    iTunes automatically located the Music in the My Music folder of My Documents. When I tried to add an additional folder to my library, unmapped network drives were not an option and there was no space for me to specify the exact location of the folder of interest. (i.e. \\motherlode )

    I ripped a few albums in 128kbps AAC. About 4.3 MB for a 4:30 song. iTunes uses the CDDB database for its album information, which is inferior to Windows Media Player's database in my opinion. Why? Because when I insert a CD into Media Player it pulls up detailed album information and even the cover art. Lyricists change from song to song, and the individual track information is very interesting. There is no special field to place lyrics, but I decided to enter some into the general comments area of a track.

    My entire music library is saved as an XML file. Very cool. Here is the Apple DTD if you are interested.

    More to come later as I install iTunes at home, purchase a track or two, try out the radio stations, and have it index my tens of gigabytes worth of WMA, AAC, and MP3 files.

    • Posted at 3:42PM
    • Updated at 1:32AM

Niall Kennedy is a web technologist in San Francisco, California in the United States. I am very interested in the world of...
