Open Data

Check our data

The IN COMMON database is licensed under the terms of:

  • Open Database License (ODbL) for the structure,
  • Creative Commons CC-BY-SA 4.0 International for the data.

We therefore make it available for anyone to reuse under those terms.

We dump it every night at 03:02 GMT.

Download IN COMMON Database

You can download the latest version from:

https://incommon.cc/data/incommon-api_latest.sql.gz
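Here is a minimal sketch of fetching a dump and loading it with the standard PostgreSQL client tools. It assumes that curl, gunzip, createdb and psql are installed. It also assumes that the dated dumps the nightly script keeps next to the latest one can be reached under the same /data/ path, using the script's naming (incommon-api_YYYY-MM-DD.sql.gz). That URL pattern is inferred from the script and is not a documented endpoint. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for fetching and loading the IN COMMON public dump.
# Assumption: dated dumps use the nightly script's naming
# (incommon-api_YYYY-MM-DD.sql.gz) and sit next to the _latest link.

INCOMMON_DATA_URL=https://incommon.cc/data

# Print the URL of the dump for a given day (YYYY-MM-DD), or of the latest one.
incommon_dump_url() {
        printf '%s/incommon-api_%s.sql.gz\n' "$INCOMMON_DATA_URL" "${1:-latest}"
}

# Download a dump, then load it into a new local database.
# Usage: incommon_restore TARGET_DB [YYYY-MM-DD]
incommon_restore() {
        target_db=$1
        file=$(mktemp) || return 1
        # Download to a file first, so a failed transfer never reaches psql.
        if curl -fsSL -o "$file" "$(incommon_dump_url "$2")" && createdb "$target_db"; then
                gunzip -c "$file" | psql -q -d "$target_db"
                status=$?
        else
                status=1
        fi
        rm -f "$file"
        return "$status"
}
```

For example, `incommon_restore incommon_copy` loads the latest dump into a new database named incommon_copy. The dump is plain SQL produced by pg_dump, so any statements that assign ownership to roles from the production server (`ALTER ... OWNER TO`) may report errors if those roles do not exist locally. psql carries on past such errors by default.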

Note regarding personally identifiable information

The database contains user information. However, we do not store email addresses or any other personally identifying information in it: we rely on remote authentication so we don’t have to. Usernames are encrypted, and the rows of the users table are excluded from the public dump. Everything in the data dump is public: it was either provided voluntarily for public display (e.g., an organization’s contact details) or taken from public sources.

We could store personally identifiable information, but we chose not to because we’re not interested in tracking people. Many systems do not care about their users’ data and include it in their databases just because they can, and may monetize it at some point. By design, we can’t, and we won’t.

We explicitly exclude the users table to avoid leaking authorization information that could compromise the integrity of the data (e.g., an attacker who learned a user’s API key and nonce tokens could impersonate that user). We do not exclude it because it holds personally identifying information in clear text; it does not. Here’s an example entry from the users table:

{
  "id": 2,
  "username": "1jvZMiHavdA28ois5V326xnNRMDA64qILDE=--bB95ULff9Qg+l9xJ--TYAJju8SA8ZyBhtQsCcXJw==",
  "password_digest": "$2a$10$foVgOl0QMMDUBxAM/z.Z7.sbzUCA5IOZUHx1gezREYVJyGMIbgFjG",
  "api_token": null,
  "uuid": "4a6f21c7-2064-4032-9422-f2f731632e84",
  "auth_token": "LsnqqkxRf2ik5xpACTEGrrss",
  "token": "EomYTEFZLrr1VZBeD4MvB2mv",
  "created_at": "2018-10-28T12:58:58.854Z",
  "updated_at": "2018-10-28T12:59:37.746Z",
  "sha256_hash": "0da432d6312bf1ab567b29f5db87dd0d39344ea9c65635d0e4376ca3f2b1b2f6"
}
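The rows of the users table must stay out of the public file, so it is worth being able to check a downloaded dump for them. Below is a minimal sketch. It assumes the dump uses pg_dump's default plain-text COPY format, in which each table's rows begin with a line such as `COPY public.users (...) FROM stdin;`. With `--exclude-table-data=users`, the table definition remains but that COPY block is absent. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: check that a downloaded dump carries no rows for the users table.
# Assumption: default COPY format; some older pg_dump versions may omit the
# "public." schema prefix, so the pattern accepts both forms.

# Usage: dump_has_no_user_rows FILE.sql.gz
# Exit status: 0 = no users rows, 1 = users rows found, 2 = unreadable file.
dump_has_no_user_rows() {
        # Refuse to vouch for a file that is missing or not valid gzip data.
        gunzip -t "$1" || return 2
        if gunzip -c "$1" | grep -Eq '^COPY (public\.)?users '; then
                echo "users rows found in $1" >&2
                return 1
        fi
        return 0
}
```

A `CREATE TABLE public.users` statement does not match the pattern. Only a COPY block carrying the table's data does.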

How it works

The postgres user has the following crontab:

PATH=/usr/local/bin:/usr/bin:/bin

# Every day at 03:02, dump incommon-api database
2 3 * * * public-data-dump.sh

The public-data-dump.sh script:

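#!/bin/sh

PATH=/bin:/usr/bin

if [ "$(id -un)" != 'postgres' ]; then
        echo "you must run this script as the postgres user." >&2
        exit 1
fi

DB_NAME=incommon-api

PUBLIC_DATA=/srv/www/incommon.cc/www/data
PUBLIC_DATA_DUMP="$PUBLIC_DATA/${DB_NAME}_$(date +%F).sql.gz"

# Dump the current database: the users table definition is kept, its rows are not.
# -Z 9 makes pg_dump gzip its own plain-format output, so a failed dump is not
# masked by the exit status of a separate gzip; writing to a .part file and
# renaming it means a partial dump never appears under the dated name.
if ! pg_dump -d "$DB_NAME" --exclude-table-data=users -Z 9 -f "$PUBLIC_DATA_DUMP.part"; then
        echo "pg_dump failed; keeping the previous dumps." >&2
        rm -f "$PUBLIC_DATA_DUMP.part"
        exit 1
fi
mv "$PUBLIC_DATA_DUMP.part" "$PUBLIC_DATA_DUMP"

# Link new backup to latest
ln -sf "$PUBLIC_DATA_DUMP" "$PUBLIC_DATA/${DB_NAME}_latest.sql.gz"

# Keep only the 5 most recent dated backups (the pattern skips the _latest link)
old_files=$(ls -t "$PUBLIC_DATA/${DB_NAME}"_[0-9]*.sql.gz | awk 'NR>5')
if [ -n "$old_files" ]; then
        rm -- $old_files
fi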
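The retention step can also be pulled out into a function and tried against a scratch directory before it touches the real data directory. The sketch below is written under that assumption. `prune_dumps` and its arguments are hypothetical, and the file pattern follows the script's naming (DB_NAME_YYYY-MM-DD.sql.gz). The pattern does not match the _latest symlink, so the link is never counted among the kept copies.

```shell
#!/bin/sh
# Hypothetical helper: delete all but the KEEP most recent dated dumps.
# Usage: prune_dumps DIR DB_NAME KEEP
# Only DIR/DB_NAME_<digit>*.sql.gz files are considered; DB_NAME_latest.sql.gz
# does not match that pattern, so the symlink is neither counted nor removed.
prune_dumps() {
        dir=$1 db=$2 keep=$3
        # ls -t lists newest first by modification time; awk drops the first KEEP lines.
        old_files=$(ls -t "$dir/${db}"_[0-9]*.sql.gz 2>/dev/null | awk -v keep="$keep" 'NR > keep')
        if [ -n "$old_files" ]; then
                # Word splitting is intended: dump names never contain spaces.
                rm -- $old_files
        fi
}
```

For example, `prune_dumps /srv/www/incommon.cc/www/data incommon-api 5` keeps the five newest dated dumps in the public data directory.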