Data portability is only useful if outside systems can comprehend the exported data. Well-described and interoperable data sets open new possibilities for context-aware social applications, such as importing your friends, photos, or genetic markup from an existing system into your current tool of choice. In this post I will discuss best practices for websites that want to export portable, descriptive data sets. This post builds upon user authorization concepts covered in my last post.

Exchanging data between two unrelated systems is difficult at best. You need a shared vocabulary to explain even the most basic data points (time, person, etc.). A good data export represents as much information as possible with the least possible data loss.

Voyager spacecraft gold disc

NASA launched the Voyager 1 spacecraft into space in September 1977 with a golden record onboard. The record communicates small pieces of human knowledge to any intelligent life that may discover our small explorer. The graphic above is humanity’s attempt at data interoperability, teaching alien explorers the proper positioning of an included stylus over a record rotating once every 3.6 seconds (time is expressed in terms of the fundamental transition of the hydrogen atom). Thankfully web developers do not have to worry about interoperability with so many unknown measures, but your data could just as easily be lost and never played back for other worlds to hear.

Identify exportable data

The first step in data export is identifying the unique pieces of information you would like to package and ship outside your walls. What information might be useful to a user seeking to back up or otherwise export his or her data? How would you like to import such data back into your own website?

Gmail inbox message view

Pictured above is a list of messages stored in Gmail. One message is part of a continuing conversation or thread, another message is flagged, and two messages have custom labels. A typical e-mail system might just export a list of raw messages but could possibly lose key data such as a flagged state or labels/tags.
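To make the point concrete, here is a minimal sketch of an export that keeps thread membership, flagged state, and labels alongside each message. The `Message` class and all of its field names are my own illustrative assumptions, not a Gmail API or an existing standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Message:
    """One exported message. Every field name here is a hypothetical choice."""
    message_id: str
    subject: str
    thread_id: str  # groups messages that belong to the same conversation
    flagged: bool = False  # flagged/starred state a raw dump would lose
    labels: list = field(default_factory=list)  # user-defined labels or tags


def export_messages(messages):
    """Serialize messages to JSON, including state beyond the raw message."""
    return json.dumps([asdict(m) for m in messages], indent=2)


def import_messages(payload):
    """Rebuild Message objects from the output of export_messages."""
    return [Message(**item) for item in json.loads(payload)]


if __name__ == "__main__":
    inbox = [
        Message("m1", "Lunch?", "t1", flagged=True, labels=["friends"]),
        Message("m2", "Re: Lunch?", "t1"),
    ]
    print(export_messages(inbox))
```

If the export can be read back by `import_messages` without losing the flag or the labels, you have also answered the question of how to import the data into your own site.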

Research existing data standards

Data interoperability is not a new concept, and your current challenges may be easily solved by existing certified and de facto standards. Standards increase the chances your data will be consumed, processed, and understood by others. You could invent an entirely new dialect and vocabulary to describe your information, but you will be much more successful at disseminating data if it is easily interpreted.

Standards organizations have spent years analyzing the essential elements and interoperability requirements of many common forms of data. Below are just a few standard data formats for elements of the social web.

People, Places, and Things
    vCard
    xNAL
    KML
    LDAP
Events
    iCalendar
News articles
    Atom Syndication Format
    News Industry Text Format
Human DNA
    NCBI Homo sapiens genome build 36.2, FASTA

Each data markup has a specific set of required data intended for a specific audience or interpreter. Google Maps prefers a feed of business listings and locations in xNAL while Google Earth prefers KML, for example. Bloggers output news articles in Atom for consumption by a specific set of tools, while mainstream publications mark up their stories in a news industry format for increased granularity. Some formats may not be applicable if your product does not store all the required types of data (e.g. you know a member's name but not their hometown). Your company will need to select a target output format based on expected external use and how your information might map onto a format’s required elements.
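One way to evaluate that mapping is to list each candidate format's required elements and check them against what you actually store. The sketch below is an assumption-heavy illustration: the internal field names are hypothetical, and the required sets are a simplified reading of vCard 3.0, an iCalendar event, and an Atom entry rather than a full validator.

```python
# Required elements per target format, expressed in hypothetical internal
# field names. Simplified from the specs; consult each spec before relying on it.
REQUIRED_FIELDS = {
    "vcard": {"formatted_name", "name"},
    "icalendar_event": {"uid", "timestamp", "start"},
    "atom_entry": {"id", "title", "updated"},
}


def missing_fields(record, target_format):
    """Return the required fields the record cannot supply for a format."""
    required = REQUIRED_FIELDS[target_format]
    present = {key for key, value in record.items() if value not in (None, "")}
    return required - present


def viable_formats(record):
    """List the formats whose required fields the record fully covers."""
    return sorted(
        fmt for fmt in REQUIRED_FIELDS if not missing_fields(record, fmt)
    )


if __name__ == "__main__":
    member = {
        "formatted_name": "Ada Lovelace",
        "name": "Lovelace;Ada",
        "hometown": None,
    }
    print(viable_formats(member))
    print(missing_fields(member, "atom_entry"))
```

A check like this will not choose a format for you, but it quickly shows which formats your stored data cannot satisfy.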

Extend where appropriate

Each format supports extended namespaces for custom data not covered by the base vocabulary. A member’s favorite food or soccer club is unlikely to be an essential concern of an international standards body, but you can easily describe it with your own custom namespace where appropriate.

The same rules of data loss apply to custom namespaces: custom definitions are more likely to be missed, while common namespaces are more easily understood. Extended namespaces may already be in active use by a big company or a coalition, increasing your chances of data visibility. An AOL Instant Messenger screenname is defined as “X-AIM” in a vCard context, for example, where the X- prefix marks an extension element.
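As a rough sketch of how an extension element sits next to the base vocabulary, the function below builds a small vCard 3.0 string with an X-AIM line. The function name, its parameters, and the X-EXAMPLE-FAVORITE-FOOD property in the usage are my own inventions for illustration. The sketch also skips details a real exporter needs, such as folding long lines.

```python
def escape_vcard_text(value):
    """Escape characters that have special meaning in vCard text values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def build_vcard(family_name, given_name, aim_screenname=None, extensions=None):
    """Build a minimal vCard 3.0 string with optional X- extension lines."""
    family = escape_vcard_text(family_name)
    given = escape_vcard_text(given_name)
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{family};{given};;;",  # structured name
        f"FN:{given} {family}",  # formatted display name
    ]
    if aim_screenname:
        # Widely used extension element mentioned above.
        lines.append(f"X-AIM:{escape_vcard_text(aim_screenname)}")
    for name, value in (extensions or {}).items():
        # Custom, site-specific properties; other consumers may ignore them.
        lines.append(f"X-{name.upper()}:{escape_vcard_text(value)}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_vcard(
        "Doe",
        "Jane",
        aim_screenname="janedoe",
        extensions={"example-favorite-food": "pizza"},
    ))
```

A consumer that already understands X-AIM can pick up the screenname, while the custom food property will only be useful to tools that know about your namespace.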

Summary

Data portability and interoperability on the social web continue to be a hot topic. While there are PR benefits for first movers, I expect there will not be widespread adoption until portable data has a remote consumer. Startups with limited resources will need to see a possible consuming service for their exported data before carving out part of their product cycle for the new feature. I think data portability is a great project for this summer’s interns, providing deep exposure to data complexity and the industry as a whole while balancing proper authentication and privacy concerns.