Updates

  1. 18F’s Melody Kramer interviewed me about this project and I went deeper into some of my takeaways.
  2. Open data strategist Rayna Stamboliyska translated parts of the blog into French for her DataColada newsletter.

Governments have been flocking to GitHub.

Their reasons are plenty: the promise of “private sector” tools, a conviction that publicly-funded code should be public, the company’s evangelism (and stickers), etc. Whatever the case, GitHub now hosts at least 600 government organizations, with over 9,000 public repositories between them.

I had a notion of the global ecosystem this activity has sprouted—the players and their interactions—but wanted to back it up with data.

So, using GitHub’s API, I compiled a database of government GitHub organizations, their repositories, members, and contributors and dove in.

Summary

Overall, reuse within the government GitHub “ecosystem” is uneven and limited.

Nearly all popular repositories (inside and outside of government) were created by US and UK national organizations. The bulk are standards or frameworks. Modular products, like data.gov.uk’s CKAN extensions, also seem relatively reusable.

Collaborative work and reuse are most concentrated within the large US and UK national-level networks. This may point to the importance of scale, “real world” interactions (e.g. talks, meet-ups, employees switching between organizations), and the alignment of policy priorities, timelines, licensing, and tech stacks.

14% of repositories have no further activity after being posted to GitHub. 46% remain under development a year after they were created.

I didn’t find a license file for half of the repositories. At least 13% use the MIT license. At least 8% use some version of the GPL. License choice varies geographically.

Government GitHub organizations are bringing some new users to the platform along with them. But 45% of the users predate the government organizations they contribute to.

Estonia has the most government repositories per capita at 72.8 per million residents (hover over and click to zoom in on the map up top).

Notes and Caveats

  • The code to generate the database is on GitHub.
  • The list of government GitHub organizations is certainly incomplete—add more if you know any!
  • Unless otherwise specified, ‘repositories’ refers to repositories that are not themselves forks.
  • All repository, membership, and contribution statistics include only public information. I assume most repositories are kept public. Organizational membership, however, is private by default.
  • The GitHub API only checks for license files at the root of the repository. Some may be embedded in the README or placed in sub-folders.
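
As a rough illustration of the non-fork convention in the notes above, here is a minimal sketch of listing an organization's public repositories and dropping the forks. It assumes the GitHub REST endpoint `/orgs/{org}/repos` and the `fork` field on each repository object. The function names are hypothetical and this is not necessarily how the database code does it. Unauthenticated requests are heavily rate-limited, so a real run would need a token.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_org_repos(org, per_page=100):
    """Fetch every public repository of an organization, page by page."""
    repos, page = [], 1
    while True:
        url = f"{API_ROOT}/orgs/{org}/repos?type=public&per_page={per_page}&page={page}"
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means everything has been read
            return repos
        repos.extend(batch)
        page += 1


def non_fork_repos(repos):
    """Keep only repositories that are not themselves forks."""
    return [repo for repo in repos if not repo.get("fork", False)]
```

Calling `non_fork_repos(fetch_org_repos("18F"))` would then give the kind of list the repository counts are based on.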

The Government GitHub Network

Government development teams interact and influence one another through various channels. These include:

  • sharing members or repository contributors;
  • forking, starring, or cloning each others’ repositories (and possibly submitting pull requests);
  • contributing issues;
  • reading each others’ code, READMEs, and blogs; and
  • talking to each other in person or on other platforms.

Because the GitHub API gives us access to membership, contribution, and forking relationships, I’ll focus on those.

Contribution Network

This graph shows 277 nodes (or organizations) connected to one another with 1270 edges. If two organizations share an edge, they have at least one contributor in common.

Overall, 941 unique users account for 2751 individual contributor connections between the organizations (see user statistics).

323 organizations are “loners,” with no contributors shared with other organizations. These don’t appear in the graph.

The thicker edges represent more contributors in common. Nodes are sized by the number of other nodes they’re connected to (their “degree”). If you view the full size graph, each node links to its GitHub site.
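
The graph construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the input is a mapping from organization to its set of contributor logins, and the organization names and logins below are made up. Edge weight is the number of contributors in common, and degree is the number of other organizations a node is connected to.

```python
from collections import defaultdict
from itertools import combinations


def shared_contributor_graph(contributors_by_org):
    """Return (edges, degree) for organizations linked by common contributors.

    edges maps a sorted (org_a, org_b) pair to the number of contributors
    the two share; degree maps each connected organization to the number
    of other organizations it is connected to.
    """
    edges = {}
    for org_a, org_b in combinations(sorted(contributors_by_org), 2):
        common = contributors_by_org[org_a] & contributors_by_org[org_b]
        if common:
            edges[(org_a, org_b)] = len(common)

    degree = defaultdict(int)
    for org_a, org_b in edges:
        degree[org_a] += 1
        degree[org_b] += 1
    return edges, dict(degree)


# Hypothetical organizations and logins, for illustration only.
sample = {
    "org-a": {"alice", "bob", "carol"},
    "org-b": {"bob", "carol"},
    "org-c": {"carol"},
    "org-d": {"dave"},
}
edges, degree = shared_contributor_graph(sample)
# Organizations with no shared contributors are the "loners" left off the graph.
loners = sorted(set(sample) - set(degree))
```

In this toy input, `org-d` shares nobody with the others, so it ends up in `loners` and would not appear in the drawing.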

Two main clusters stick out. Up top in bright green are UK national organizations. On the right in purple are the US federal organizations.

The City of Philadelphia has the most prominent non-national organization. You might also notice the DC Government in the mix, as well as the USGS/NOAA, Brazil, Canada, and Australia sub-networks.

It’s likely some of the connections aren’t real, but are the artifacts of cloned repositories. These may retain the original repository’s contribution history in addition to any new commits, but the API won’t mark them as forks.

This turns the contributor graph into a mish-mash of genuine collaboration of one organization’s members with another’s, non-members who contribute code to multiple organizations, and reuse.

Membership Network

We can do the same thing with shared members.

This graph of 96 nodes (organizations) is tied together using 148 edges. Behind these edges are 327 individual member connections from 137 unique users (see user statistics).

Again, we find the highly inter-linked US federal sub-network. There are also the smaller UK and Canada membership sub-networks.

GitHub makes membership private by default, so there are likely more member connections in reality. But, in general, it makes sense that this graph would be much sparser than the contribution graph.

Why is the US federal sub-network comparatively dense? Many of its organizations have large memberships, so there are more potential connections to be made. Some (like 18F) have a policy requiring that staff make their membership public. A number operate as consultancies to other federal agencies. And, from what I’ve seen, many “techies” enter the US federal government through one of these agencies and then hop around.

Forking Network

This graph shows organization forking connections. Arrows connect the forked repository’s source → to its destination.

121 organizations have forked other organizations’ repositories 223 times (138 edges). For comparison, 1858 forks come from non-government organizations or users and 9032 repositories are not forks.

Most of the graph’s forks go unreciprocated; only 8 government organizations (mainly US federal) have forked one another’s repositories (↔).
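
To make the difference between forks, edges, and reciprocated edges concrete, here is a small sketch. It assumes each fork is recorded as a (source organization, forking organization) pair. The function names and the sample records are hypothetical.

```python
def fork_edges(forks):
    """Collapse (source_org, destination_org) fork records into unique directed edges."""
    return {(src, dst) for src, dst in forks if src != dst}


def reciprocated_orgs(edges):
    """Organizations on a two-way edge, i.e. each has forked the other."""
    return {org for src, dst in edges if (dst, src) in edges for org in (src, dst)}


# Hypothetical fork records, for illustration only.
forks = [
    ("org-a", "org-b"),
    ("org-a", "org-b"),  # a second fork between the same pair adds no new edge
    ("org-b", "org-a"),
    ("org-a", "org-c"),
]
edges = fork_edges(forks)
mutual = reciprocated_orgs(edges)
```

Several forks between the same pair of organizations collapse into one edge, which is why the fork count is larger than the edge count.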

data.gov.uk is in an unusual position: it is disconnected from the rest of the UK, yet it is the source for organizations in other countries (Romania, Estonia, Paraguay, Canada, and the US).

Open data projects, like data.gov.uk’s CKAN contributions, seem poised for cross-border reuse. This may be because use cases are quite standardized and modular extensions address any differences.

Organization, Repository & User Statistics

I provide some additional views into government GitHub organization, repository, and user data below.

There are certainly many other questions to look into. Check out this repository to generate your own database (or reuse the one there).

Organizations

The list I used included 600 government GitHub organizations. You can see their geographic distribution on the map up top.

No. repositories

Of note, not only does the UK's Government Digital Service make the top 10, but so does its GitHub organization for retired repositories! Neat appearances from the Norwegian Meteorological Institute, the Gemeinsamer Bibliotheksverbund in Germany, and the National Library of Finland.

This includes only repositories that are not themselves forks.
rank organization repositories
1 18F 437
2 Government Digital Service 345
3 Ministry of Justice 344
4 UKHomeOffice 177
5 Consumer Financial Protection Bureau 169

No. contributors

rank organization contributors
1 Government Digital Service 577
2 18F 507
3 Ministry of Justice 212
4 National Geospatial-Intelligence Agency 189
5 U.S. General Services Administration 187

No. members

Member counts are likely quite a bit higher in reality. Because GitHub defaults to private membership, in my experience many users don't switch their preferences to be public. I bet that this isn't intentional in most cases. GitHub would do well to make this option more obvious when you first join an organization.
rank organization members
1 18F 149
2 U.S. Geological Survey 131
3 Web Experience Toolkit (WET) 99
4 Consumer Financial Protection Bureau 63
5 Presidential Innovation Fellows 61

No. times forked by others

rank organization forks
1 18F 2519
2 Government Digital Service 1515
3 Consumer Financial Protection Bureau 1418
4 The White House 1327
5 US Army Research Laboratory 1106

No. repositories that are forks

rank organization forks
1 Government Digital Service 129
2 18F 109
3 U.S. General Services Administration 75
4 Ministry of Justice 67
5 Plein Overheid 64

Repositories

The data showed 11113 public repositories—2081 forked repositories and 9032 non-forked.

Licensing

Listed below are the license types, the number of repositories using each, the regions that use each license most frequently, and the percentage of each region's repositories that carry it.

I only include regions with at least 10 repositories and 2 organizations.

Note: The GitHub API only looks for a license file at the root of the repository, so licenses embedded in the README or stored in a subfolder are marked as having no license. Italy's not really that bad!
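
For readers who want to reproduce this kind of breakdown, here is a minimal sketch of the per-region percentages with the two thresholds applied. It assumes each repository has already been reduced to a (region, organization, license name) tuple, with `None` where no license file was detected. The function name and the tuple layout are assumptions for illustration.

```python
from collections import Counter, defaultdict


def license_share_by_region(repos, min_repos=10, min_orgs=2):
    """Percentage of each qualifying region's repositories carrying each license.

    repos is an iterable of (region, organization, license_name) tuples,
    with license_name set to None when no license file was detected.
    """
    counts = defaultdict(Counter)
    orgs = defaultdict(set)
    for region, org, license_name in repos:
        counts[region][license_name or "None"] += 1
        orgs[region].add(org)

    shares = {}
    for region, by_license in counts.items():
        total = sum(by_license.values())
        if total < min_repos or len(orgs[region]) < min_orgs:
            continue  # region is too small to report
        shares[region] = {
            name: round(100 * n / total, 2) for name, n in by_license.items()
        }
    return shares
```

Ranking regions within a license is then a matter of sorting `shares` by the value for that license name.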
None | 4733 repositories

Italy 90.0% • Ecuador 88.89% • The Netherlands 84.43% • Bolivia 83.33% • Germany 74.38%

MIT License | 1192 repositories

U.K. 34.68% • International 30.77% • Belgium 28.36% • Lithuania 26.47% • Switzerland 20.27%

Other | 1179 repositories

Chile 27.08% • International 23.08% • Canada 21.33% • U.S. 18.91% • France 17.69%

Apache License 2.0 | 507 repositories

Australia 11.68% • Canada 10.14% • Japan 8.74% • U.K. 8.5% • U.S. 4.99%

Creative Commons Zero v1.0 Universal | 289 repositories

Japan 22.33% • Estonia 5.88% • U.S. 5.84% • Sweden 3.74% • Germany 1.65%

GNU General Public License v2.0 | 262 repositories

Norway 26.26% • Brazil 22.29% • Chile 20.83% • Venezuela 13.64% • Colombia 12.5%

GNU General Public License v3.0 | 249 repositories

Mexico 71.74% • Venezuela 43.18% • Belgium 20.9% • Colombia 12.5% • Finland 10.23%

GNU Affero General Public License v3.0 | 198 repositories

Sweden 33.64% • Switzerland 20.27% • France 16.15% • Finland 15.34% • Ecuador 11.11%

The Unlicense | 168 repositories

U.S. 3.88% • Argentina 1.19% • Sweden 0.93% • The Netherlands 0.82% • Brazil 0.6%

BSD 3-clause "New" or "Revised" License | 116 repositories

New Zealand 28.32% • Australia 10.22% • Estonia 7.35% • Chile 4.17% • The Netherlands 3.28%

GNU Lesser General Public License v3.0 | 65 repositories

Venezuela 6.82% • Colombia 6.25% • Singapore 5.26% • Belgium 1.49% • U.S. 1.11%

BSD 2-clause "Simplified" License | 27 repositories

Belgium 4.48% • Canada 1.05% • Japan 0.97% • Sweden 0.93% • The Netherlands 0.82%

GNU Lesser General Public License v2.1 | 22 repositories

Estonia 17.65% • Brazil 1.2% • Norway 0.56% • France 0.38% • U.S. 0.07%

Mozilla Public License 2.0 | 9 repositories

Chile 2.08% • Australia 0.36% • Canada 0.35% • U.K. 0.25% • U.S. 0.02%

ISC License | 7 repositories

Canada 0.35% • U.S. 0.14%

Eclipse Public License 1.0 | 4 repositories

Germany 0.83% • U.K. 0.05% • U.S. 0.05%

Artistic License 2.0 | 2 repositories

Finland 0.57% • U.S. 0.02%

Do What The F*ck You Want To Public License | 1 repository

France 0.38%

Microsoft Public License | 1 repository

U.K. 0.05%

Open Software License 3.0 | 1 repository

Age

Repository count by date created

Things have certainly picked up.

Oldest Repositories

rank repository date created
1 nysenate/Newsclips June 10, 2009
2 sfcta/androidtracks November 30, 2009
3 HHS/pillbox_docs December 12, 2009
4 erasme/check_cciss March 03, 2010
5 usnationalarchives/fr2 April 12, 2010

Of the 10 oldest, 4 have had commits in the past month.

Development Lifetime

Many government GitHub repositories have a fairly short development lifespan. 14% show no further development after they were first pushed to GitHub. However, 46% were under development a year in, 18% two years in, and 6% three years in.

11 repositories have Git histories earlier than their initial push date. Seems like a lot of development happens locally and without version control, then the repository is dumped on GitHub.
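
One way to approximate these lifespan figures is to compare each repository's creation time with its most recent push. The sketch below assumes the `created_at` and `pushed_at` timestamps that the GitHub API returns for a repository, and it treats "under development N days in" as "last push at least N days after creation". Both the function names and that definition are assumptions, and the figures above may have been computed differently.

```python
from datetime import datetime, timedelta


def parse(ts):
    """Parse a GitHub-style UTC timestamp such as 2015-04-08T12:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def share_active_after(repos, days):
    """Percentage of repositories whose last push came at least `days` after creation."""
    if not repos:
        return 0.0
    threshold = timedelta(days=days)
    active = sum(
        1
        for repo in repos
        if parse(repo["pushed_at"]) - parse(repo["created_at"]) >= threshold
    )
    return 100 * active / len(repos)


# Hypothetical repositories, for illustration only.
sample = [
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2014-01-01T00:00:00Z"},
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2015-06-01T00:00:00Z"},
    {"created_at": "2014-01-01T00:00:00Z", "pushed_at": "2014-03-01T00:00:00Z"},
    {"created_at": "2013-01-01T00:00:00Z", "pushed_at": "2016-01-02T00:00:00Z"},
]
one_year = share_active_after(sample, 365)
two_years = share_active_after(sample, 730)
```

Note that this simple version counts every repository in the denominator, including ones too young to have reached the threshold.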

No. contributors

rank repository contributors
1 alphagov/govuk-puppet 146
T2 alphagov/government-service-design-manual 124
T2 bcgov/esm-server 124
4 adsib/20150408_ves_anup 122
5 18F/18f.gsa.gov 113

No. stars

Understandably, people are most interested in tools, standards, and frameworks. US Army's net-sec tool beating 18F was a surprise, though.
rank repository stars
1 USArmyResearchLab/Dshell 4638
2 18F/web-design-standards 2995
3 WhiteHouse/api-standards 1454
4 GovernmentCommunicationsHeadquarters/Gaffer 1146
5 WhiteHouse/petitions 953

No. forks (gov → gov)

rank repository forks
1 project-open-data/project-open-data.github.io 7
2 WhiteHouse/api-standards 6
T3 usds/playbook 5
T3 18F/analytics-reporter 5
T3 18F/analytics.usa.gov 5

No. forks (gov → all)

rank repository forks
1 USArmyResearchLab/Dshell 1093
2 wet-boew/wet-boew 480
3 18F/web-design-standards 340
4 project-open-data/project-open-data.github.io 325
5 WhiteHouse/wh-app-ios 260

No. forks (non-gov → gov)

Government forks little from within and little from without. The government.github.com repository is likely from organizations adding themselves to the "official" list.
rank repository forks
1 github/government.github.com 45
2 ckan/ckan 17
3 twbs/bootstrap 9
T4 ckan/ckanext-harvest 7
T4 ckan/ckanext-spatial 7

Users

The data showed 7887 public contributors to government repositories (that are not forks) and 1512 public members of government organizations.

User join date vs. organization creation date

Are government GitHub organizations spurring new users to join the platform? We don't know the date users joined an organization, so we'll have to proxy.

The histograms below show the difference (in days) between 1) when a user joined GitHub and 2) the earliest creation date of all the government organizations to which they contribute or belong.

There's clearly a bump in the center. That spike shows users who joined GitHub at the same time as the government organization with which they're involved. Some of those users who came to the platform later may also have joined a government organization the same day as their arrival.

In each case, about half of users predate their organization. This says to me that public, social coding is generally new at an individual level—not just at an institutional one.
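
The proxy described above boils down to a date subtraction per user. Here is a minimal sketch, assuming we have each user's join date and the creation dates of the organizations they contribute to or belong to. The sign convention (negative means the user came first) and the function names are assumptions for illustration.

```python
from datetime import date


def join_offset_days(user_joined, org_created_dates):
    """Days between a user's GitHub join date and their earliest organization's creation.

    Negative values mean the user was on GitHub before any of their
    organizations existed.
    """
    return (user_joined - min(org_created_dates)).days


def share_predating(offsets):
    """Percentage of users who joined GitHub before their earliest organization."""
    return 100 * sum(1 for days in offsets if days < 0) / len(offsets)


# Hypothetical users, for illustration only.
offsets = [
    join_offset_days(date(2012, 1, 1), [date(2013, 1, 1), date(2014, 5, 5)]),
    join_offset_days(date(2014, 3, 10), [date(2014, 3, 10)]),
    join_offset_days(date(2015, 1, 1), [date(2014, 1, 1)]),
    join_offset_days(date(2014, 1, 11), [date(2014, 1, 1)]),
]
predating = share_predating(offsets)
```

A histogram of `offsets` would put users who joined on the same day as their organization at zero, which is where the spike sits.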

No. repositories contributed to

The top contributors (in terms of government repository count, at least) are all members of the UK's Government Digital Service, 18F, or the Consumer Financial Protection Bureau.

A majority of users are one-off contributors (see percentiles).
rank user repositories
1 mattbostock 136
2 bradwright 129
3 alext 125
4 gbinal 111
5 jabley 110

  • 25th percentile: 1
  • 50th percentile: 1
  • 75th percentile: 3

No. organizations a member of

All but one (mattbostock of the UK) of the top ten members are part of the web of U.S. federal programs—the USDS-18F-PIF-CFPB-White House connection.

It's quite rare for a user to be a member of more than one organization.
rank user organizations
1 blacktm 8
T2 jgrevich 7
T2 tyronegrandison 7
T4 robertsosinski 5
T4 adelevie 5
T4 amoose 5
T4 leahbannon 5

  • 25th percentile: 1
  • 50th percentile: 1
  • 75th percentile: 1

Made it down this far?

Huzzah! You deserve a cookie.