No matter the size of the organization, running an Open Source Programs Office requires staying on top of several things at once. While the processes between organizations might vary, many of us run into a common set of needs and have subsequently developed a set of tools to manage corporate-scale open source needs. As part of the TODO Group, we have started sharing those tools with each other and the open source community at large.
The road to strategic use of open source starts with a carefully planned, organized, and empowered open source program office to guide and manage its creation, distribution, and use. But, that’s just a first step. To get such an office underway and running smoothly, you need the right tools. These mission-critical tools will be used to track goals and metrics in departments from engineering and legal to executive leadership, PR and marketing to HR, and give each of these functions all the resources they need to gather data, provide snapshots of performance, and manage the daily use of open source within your company.
This guide provides details and scenarios for how to get your open source tool collection started, including information about the most important tools to use to track and manage your open source projects. Many of the tools have been created and open-sourced by The Linux Foundation and other leaders in the field, providing free and easy access for your projects. You’ll also find an example dashboard setup, which combines information from multiple tools for central review.
Table of Contents
- Why you need special tools for open source program management
- How to select and plan your tools
- Elements of a basic toolset
- Tools for managing source code
- Tools for tracking project health
- Tools for communications and collaboration
- Tools for corporate-scale GitHub management
- Final words
Why you need special tools for open source program management
Once your open source program office is up and running, it’s time to collect the right software tools that will allow your development teams to manage, track, guide, and advance their open source projects, consumption, contributions, and releases.
These tools are critical because using open source for business strategy requires its own methodologies and processes which are very different than those needed when using and releasing proprietary software. Open source tools allow companies to do a myriad of tasks:
- Provide a workplace for collaboration and code building.
- Manage project health.
- Automate critical and repeatable tasks such as code review and tracking and license compliance.
- Generate data to prove ROI for your program office and open source strategy, in general.
- Oversee project quality and make sure that guardrails are in place if issues arise.
Having the right, targeted tools as you begin your open source journey will also make jobs easier for developers and other employees, will provide better insight to results, and will become the basis for successful collaboration and communication of a company’s open source projects.
“If you have more than 100 code repositories or 100 people that you're trying to manage, you really can’t have someone doing it manually with spreadsheets anymore. Obviously, people still do it that way. But it starts to become ad hoc and laborious. That’s where tools come into play. They allow you to scale.” – Jeff McAffer, Director of the Open Source Programs Office at Microsoft
How to select and plan your tools
Most of the early discussions about which open source tools are needed by a company will depend on its business, products, and services and how it serves its customers and employees. As the planning process and strategy map are developed by its open source program office, tools can be chosen to integrate the company’s goals, processes and infrastructure.
Ultimately, the only way to know which tools you will need is to understand what you want to do with open source.
Below are the basic steps for choosing the tools you’ll need for managing your open source program office:
- Get buy-in and selection preferences from developers and community members. To accomplish this, you’ll want to conduct detailed discussions with developers and community members. They can describe which tools have worked, or would work, best for them. Take those recommendations and requests very seriously. Listen to the people who are going to get you to your goal. They have most likely been using many of these tools already, so benefit from their experiences.
- Understand necessary software dependencies and integrations for business-critical applications. This means understanding and knowing which open source software your business depends on so you can stay up to date with security issues and ensure software continuity.
- Research existing tools and decide what you can use as-is, or build out to suit your needs. Don’t start from scratch for every tool. See what is out there and being used in the open source communities you are in and get advice and feedback about those tools. Linger in online development communities to see what works and ask for recommendations and advice. Ask questions at open source conferences, talk to fellow developers in Birds-of-a-Feather sessions, and learn from others who are already doing what you want to do.
Once selected, the tools must then be implemented, which requires several additional steps:
- Create an internal infrastructure to support, manage, and use the tools. Through your newly-formed open source program office, designate someone to maintain and build the internal infrastructure that will distribute the tools through an online internal portal where they are kept and organized into tasks and features. In this tool portal you can make the tools available to all developers or restrict them to specific users through authentications and permissions based on their jobs and requirements.
- Provide training plans for employees who will use the tools. Just getting the tools isn’t enough. Now you have to be sure that your developers know how to use them and are mastering their capabilities. This is where training programs, whether online, in classrooms or in small lunchtime group settings, will be important to reap the benefits of their use. Ask your developers which learning methods work best for them and let them choose how they want to learn.
- Ensure the tools are centrally visible in your organization. Make it easy for developers to find and use them, preferably integrated into any existing developer dashboards that track development progress. Again, this is where the internal tool portal is going to help your company organize and distribute the critical tools for your operations.
Implementation is helpful to keep in mind as you are choosing your tools, as this may also affect your decision. A tool with a steep learning curve, for example, may require more training.
Leverage existing tools
After you have a good idea of what your team needs to meet your organization’s open source goals and the possible limitations of your own dependencies and infrastructure, the first step is to explore and learn about existing tools that are ready-built and available for you today. Since most are open source tools themselves, if they don’t meet your exact needs at the start, your development teams can contact the builders of the tools to see if they can collaborate and contribute to take the tools in new directions by adding features.
Ironically, many open source program offices don’t always reuse the tools developed by others, or collaborate with other companies to work on the tools they require to manage their open source programs. Often, they want to do that, but many businesses, including Facebook and Microsoft, already have existing tool suites which were in place before collaboration really became a discussion topic. Because they already have their tool sets and have made those investments, they seem to have less desire to adopt those of other companies.
That’s where companies that are just starting to build out their own open source programs have a significant advantage. Since they are now establishing their own open source program offices and diving into open source, they don’t have to be bothered with such limitations.
Instead, they can wisely take advantage of the experiences and successes of others and build their open source toolboxes using the proven tools created by companies which led the way in recent years. The Linux Foundation’s open source industry organization, the TODO Group (Talk Openly Develop Openly), collects a list of these tools in this document.
Create a dashboard
Along with the proper tools, companies should also incorporate central dashboards which allow them to monitor and track their open source projects and development in real time. Many companies likely have such dashboards for existing development work and applications and may be able to integrate the existing dashboards with their open source work. If not, they should create or adopt new dashboards to improve the management of their open source deployments.
“On dashboards, there are many ways to create them, and it’s really an art in terms of how companies want to display development-related information. Some people build these fancy screens with rotating dashboards, but the key thing is to have a central location, preferably co-located with your existing dev dashboards, where people can go to learn more about open source project health, metrics, and so on.” – Chris Aniszczyk, COO of the Cloud Native Computing Foundation.
Elements of a basic toolset
The abundance of tools available for managing and reporting on open source projects can quickly become overwhelming. If your open source program is just getting started, it helps to focus your research on just a few of the basic tools that you’ll need to get up and running.
Then as your program grows and you’ve gained more experience using these tools, you can start to adopt new tools to help you automate and streamline your processes as the need arises. Remember that you want the tools you choose to complement and support your internal culture and processes – not lead them.
The sections below give the basic categories of tools that pretty much all open source programs use on a daily basis. This is a good way to organize your research.
Tools which automate processes are among the most important you will select and use for your company’s open source program. The tasks for such tools are broad, including automating procedures for contributor license agreements (CLAs), which are legal documents stating that a developer created the code and didn’t copy it from anywhere else illegally. Traditionally these kinds of agreements were done manually by printing out the agreements and then signing and faxing them in to comply. But in a world of email and instant communications, that’s crazy today. Instead, the process can be automated using bots that request electronic signatures and then track and handle the submissions.
Other automation tools can tell you who exactly is contributing to your projects and can help remove procedural friction which slows down progress in projects as they get larger and scale to meet the needs of companies.
In Microsoft’s open source program office, where some 8,000 repositories are managed on GitHub involving some 11,000 contributors, about 40,000 internal requests came in to use open source in projects in 2016, according to the company. To manage those requests as well as the code that’s created and the code versions which are being updated, the company turns to tools which can automate the chaos. And because the code is likely being used in potentially hundreds of other projects, it must be tracked carefully so that if a security bug arises all software impacts can quickly be mapped out and fixed. At such a large scale, automation is critical and manual updates would be almost impossible.
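To illustrate why this kind of tracking matters, here is a minimal sketch of mapping the impact of a vulnerable component across projects. The project names, the `DEPENDENCIES` inventory, and the `affected_projects` helper are all hypothetical and are not taken from Microsoft's tooling; in practice the inventory would come from a scanning tool rather than a hand-written dictionary.

```python
from collections import defaultdict, deque

# Hypothetical inventory: each project mapped to the components it uses directly.
DEPENDENCIES = {
    "web-portal": ["http-lib", "json-lib"],
    "billing-service": ["json-lib"],
    "http-lib": ["tls-lib"],
    "json-lib": [],
    "tls-lib": [],
}


def affected_projects(dependencies, vulnerable):
    """Return every project that depends, directly or transitively, on `vulnerable`."""
    # Invert the graph: component -> projects that use it directly.
    used_by = defaultdict(set)
    for project, components in dependencies.items():
        for component in components:
            used_by[component].add(project)

    # Breadth-first walk along the reversed edges.
    affected, queue = set(), deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in used_by[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return sorted(affected)
```

Calling `affected_projects(DEPENDENCIES, "tls-lib")` walks from the vulnerable library up to everything that ultimately relies on it, which is the list a security team would need to patch or rebuild.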
Manage critical tasks
Other important tools to be considered and acquired are those which help manage critical tasks, such as project management, tracking project health and ensuring clear and quick communications between developers, open source communities, and others inside a company.
Source code management
Most corporate software projects being developed through open source program offices use GitHub as their centralized hosting and development platform.
GitHub is an online source code management site that allows open source developers to manage and house their code in a central “repository” or storage space where participants can collaborate and build their code together. Some 64 million open source coding projects are hosted within GitHub today, involving some 23 million developers.
GitHub users can add code, review submitted code, propose changes, get and offer feedback and provide project management using the service. GitHub uses the Git Version Control System, the open source project developed by Linux creator Linus Torvalds which provides organization for the code and people who are collaborating on open source. Each “contributor” has their own copy of the project repository they are working on, where they can make changes on their own computer and then submit them back to the project for future inclusion. That “pull request,” or code contribution, is then reviewed, discussed, modified and approved or rejected by the project organizers.
Also important are code scanning and compliance tools, which help track code provenance and license requirements. It’s important for companies to watch over the open source code being brought into its own infrastructure, products, and services to ensure license requirements are met.
Your applications, for example, could include several thousand open source components. To protect your company from legal issues it’s critical to know these details. In scenarios that are high risk, users must dive into the code to deeply validate and verify that the licenses are what they say they are, depending on where your business is on a risk spectrum. (See our guide on using and distributing open source code.)
“You must understand your risk profile, because in the end scanning is all about risk management. You can stick your head in the sand at one end then just trust and hope that you are OK. Or you could say ‘If I get sued, it’s going to devastate my business.’ You need to really be sure. So, you crack open the package and you look through all the lines of code and you find everything that could possibly be in there.” – Jeff McAffer, director of the Open Source Programs Office at Microsoft.
Tools for managing source code
As we discussed earlier, GitHub is the go-to source code management system for most open source program offices these days. But GitHub alone won’t meet all your program’s code management needs – especially as you scale up your efforts.
Some of the tools used in the world of open source are aimed at improving GitHub itself by adding features it lacks, such as support for checking Developer Certificate of Origin (DCO) statements to be sure that code can be legally licensed and used in an open source project.
GitHub also has some deficiencies when it comes to code reviews, so there are available tools that can automatically send questionable code back to the contributors who created it and ask them to review and make needed changes. GitHub doesn’t have a way to force someone to review their code, so these clever tools make that happen to improve workflows.
Other GitHub-specific tools expand on GitHub’s performance metrics capabilities, which tend to be very project specific rather than providing detailed information across whole organizations. For companies that maintain many open source code repositories across multiple GitHub projects, better tools are needed to organize and aggregate them to make sense of it all. A wide range of such tools are available from Amazon, Netflix, and Microsoft to help with those tasks.
Here are some of the most popular and useful source code management tools which can streamline and help your GitHub presence:
Source code scanning and license compliance
Antepedia Reporter – A commercial, fee-based application from Antepedia, Reporter is a report-generation product which lets developers, project managers, legal advisors and others create license compliance audits and IP rights management reports about the open source, public and private components in your code base.
Black Duck Hub – The commercial Hub service scans code to identify all embedded open source components, and then automatically searches for known vulnerabilities for remediation. It can send alerts when new vulnerabilities are found in your code.
Black Duck Protex – Protex is a commercial, fee-based license compliance management tool from Black Duck which integrates with existing tools to automatically scan, identify and inventory open source software, while also enforcing license compliance and corporate policy requirements.
Copyright review tools – This collection of open source command line tools helps make initial copyright file construction and subsequent review and update easier.
dep-checker – A free dependency checker tool from The Linux Foundation, dep-checker performs a complete analysis of linkages between code packages.
FlexNet Code Insight – Flexera, which acquired licensing compliance vendor Palamida in 2016, commercially offers FlexNet Code Insight to help automate corporate open source use among developers, legal teams and security staffers.
FOSSA – This is a commercial tool that automatically performs code dependency tracking and license compliance scanning in the background.
FOSSID – FOSSID is a commercial tool for license and vulnerability scanning. Rather than relying upon declared components and licenses, FOSSID uses a large database of projects and code fragments to scan for code snippets. This enables detection of copied/pasted code, or code where license declarations were not properly preserved. In particular, this is useful when auditing code received from a third party or when preparing to open source code that was originally developed for internal use only.
FOSSology – A Linux Foundation project, FOSSology is an open source license compliance software toolkit which can run license, copyright and export control scans from the command line. A database and web UI are also available to create compliance workflows.
janitor.git – Code Janitor is an open source tool that helps evaluate source code for compliance with open source licenses. From The Linux Foundation, Code Janitor can be used with other products to check code.
LicenseFinder – An open source tool which detects the licenses of the code being used in your projects, compares those licenses against a user-defined whitelist and then provides an actionable report.
Protecode Enterprise Analyzer – This commercial application is used to analyze and identify all code in any directory to determine code ownership and ensure open source license compliance based on predetermined internal policies.
REUSE – A free software tool to help adopt and check the application of licenses in a code repository. It is based on best practices, including the SPDX specification. It offers a badge API service to market the compliance.
scancode-toolkit – From nexB, the open source ScanCode suite of utilities scans code for licenses, copyright and dependencies to find, discover and inventory open source and third-party components used in your code.
SPDX – The Software Package Data Exchange (SPDX) specification is a standard format used to describe the components, licenses and copyrights associated with software packages. The SPDX standard aids compliance with free and open source software licenses by standardizing the way license information is shared between developers and companies. The SPDX specification is developed by the SPDX workgroup, which is hosted by The Linux Foundation. The group offers open source tools to help users of SPDX documents.
WhiteSource – Provides licensing, security, code quality and reporting analysis for managing open source components in real-time by automatically and continuously scanning dozens of open source repositories on a commercial basis.
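Several of the tools above compare detected licenses against a company policy. As a rough sketch of that idea only, the code below looks for `SPDX-License-Identifier` tags in file contents and checks them against an approved list. The function names and the sample policy are assumptions made for illustration; this is not how any of the listed products is implemented, and it handles only a single identifier per file, not compound SPDX expressions such as `MIT OR Apache-2.0`.

```python
import re

# Matches tags such as "SPDX-License-Identifier: Apache-2.0" in a file header.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def declared_license(source_text):
    """Return the first SPDX license identifier declared in the text, or None."""
    match = SPDX_TAG.search(source_text)
    return match.group(1) if match else None


def check_against_allowlist(files, allowed):
    """Split files into approved ones and ones that need human review.

    `files` maps a file path to its text; `allowed` is a set of SPDX identifiers.
    """
    approved, needs_review = [], []
    for path, text in sorted(files.items()):
        license_id = declared_license(text)
        if license_id in allowed:
            approved.append(path)
        else:
            # Missing tags and non-approved licenses both go to a reviewer.
            needs_review.append((path, license_id))
    return approved, needs_review
```

A scan like this only reports what a file declares about itself. For higher-risk scenarios, the declared license still has to be verified against the actual code.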
Bug and issue tracking
Bugzilla – Open source, server-based software featuring an advanced query tool that can remember searches, integrated email capabilities and a comprehensive permissions system. Bugzilla is used by Mozilla as its bug tracking system.
GitHub Issues – GitHub’s own integrated feedback and bug tracker, GitHub Issues is available as part of GitHub’s project hosting.
GitLab – This bug tracking tool unifies issue tracking, code review, Git repository management, activity streams, wikis and more in a single UI to assist your open source projects. GitLab is available as a service or as commercial software.
JIRA – From Atlassian, JIRA contains custom filters, developer tool integrations, customizable workflows and rich APIs to integrate JIRA with other applications. JIRA is available as commercial software.
Archiving and release management
Artifactory – Artifactory is a repository manager from JFrog which supports software packages created in any code language. It integrates with all major DevOps and continuous integration and continuous deployment tools. Artifactory is available as a commercial or as an open source tool.
Bintray – A commercial archiving tool from JFrog that allows companies to publish their code release archives to maintain storage for older and larger files.
Docker Hub – A cloud-based registry service which allows users to link to code repositories and build and test their images. It also stores manually-pushed images and links to Docker Cloud so users can deploy images to project hosts. Docker Hub is a centralized resource for container image discovery, distribution and change management, collaboration and workflow automation throughout the development pipeline.
Tools for tracking project health
Monitoring and tracking the overall health of open source projects as they grow and mature is a core task for an enterprise open source program. To accomplish it, you must gather tools which report on how individual open source projects are performing and being received by their communities – often across dozens, hundreds or even thousands of projects at once. The tools also must be able to roll the data into meaningful, useful, and actionable information about overall project performance across your entire open source portfolio.
The bottom line here is it’s all about the critical and useful insights you can glean from the data – not about vanity metrics such as detailing how many “watcher” stars a project has logged, how many contributors have been part of the project since its start, or other metrics that lack important context.
The best project health tools must also help the project teams be responsive to the communities which support their efforts and encourage engagement and diversity with contributing developers. That means the tools help maintainers quickly respond to questions or feedback posted by community members so they remain enthusiastically engaged and don’t get bored and move on to other projects.
Some open source communities will have large groups of contributors, while others will have small niche groups of community members. The project health tools need to be able to work with projects of all sizes.
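As one example of a responsiveness metric that carries more context than a star count, the sketch below computes the median time to a first reply on issues. The record layout (`opened_at`, `first_response_at`) and the function name are hypothetical; real data would be pulled from your issue tracker by one of the tools listed below.

```python
from datetime import datetime
from statistics import median


def median_hours_to_first_response(issues):
    """Median hours between an issue being opened and its first reply.

    Each issue is a dict with "opened_at" and "first_response_at" datetimes.
    Issues that have not been answered yet are skipped.
    """
    waits = [
        (issue["first_response_at"] - issue["opened_at"]).total_seconds() / 3600
        for issue in issues
        if issue.get("first_response_at") is not None
    ]
    return median(waits) if waits else None


# Illustrative sample data only.
sample_issues = [
    {"opened_at": datetime(2024, 1, 1, 9), "first_response_at": datetime(2024, 1, 1, 11)},
    {"opened_at": datetime(2024, 1, 2, 9), "first_response_at": datetime(2024, 1, 3, 9)},
    {"opened_at": datetime(2024, 1, 3, 9), "first_response_at": datetime(2024, 1, 3, 15)},
    {"opened_at": datetime(2024, 1, 4, 9), "first_response_at": None},
]
```

The median is used rather than the mean so that a handful of very old, neglected issues does not hide how quickly typical questions are answered.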
“Regarding existing tools and systems, my hope is that we're quickly getting to a point where a company’s open source program office should not need to create any tools or technologies on their own. They should be able to find and use existing open source tools which can be used to manage their open source programs.” – Jeff McAffer, Director of the Open Source Programs Office at Microsoft
Here are some of the most popular and useful project statistics and project health tracking tools:
- CatWatch – CatWatch is an open source metrics dashboard from Zalando that fetches GitHub statistics for your GitHub accounts, processes and saves your GitHub data in a database. The data reveals the popularity of your open source projects, your most active contributors and other interesting statistics.
- Gander – Gander is an open source dashboard which generates usable metrics for a range of open source projects in one quick look. Created by PayPal, Gander is designed for individuals who are responsible for running Open Source Program Offices or keeping track of multiple open source projects.
- GHCrawler – Created by Microsoft, GHCrawler is an open source GitHub API crawler that crawls a GitHub-hosted project and automatically tracks, retrieves, and stores its contents. GHCrawler is primarily intended for people trying to track sets of organizations and data repositories.
- Gittagstats – Gittagstats is an open source tool which generates statistics reports from a set of tags for a Git repository. The tool was created by Qualcomm.
- GrimoireLab – GrimoireLab has a variety of open source tools to measure open source project statistics and visualize them, from git repositories, GitHub pull requests or Bugzilla tickets to mailing lists, Meetup groups or Slack channels. GrimoireLab is a project in CHAOSS, a collaborative group on open source development metrics.
- OSS-dashboard – The Open Source Program Dashboard, which comes from Amazon, is a multi-function dashboard which can be used to view and monitor many GitHub organizations and/or users at one time.
- OSS Tracker – OSS Tracker, from Netflix, collects data about a GitHub organization and aggregates it across all projects within that organization in a single user interface. All repositories are listed and metrics are combined for an organization, but community managers can also organize projects into functional areas and appoint administrators to assign management and engineering leads.
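The common thread in these dashboards is rolling per-repository data up to the organization level. A minimal sketch of that roll-up is shown here; the field names (`org`, `open_issues`, `open_pull_requests`) are assumptions for illustration and do not reflect the data model of any tool in the list.

```python
from collections import defaultdict


def aggregate_by_org(repo_stats):
    """Sum per-repository counters into one summary per organization.

    `repo_stats` is a list of dicts with hypothetical keys:
    "org", "repo", "open_issues", "open_pull_requests".
    """
    totals = defaultdict(
        lambda: {"repos": 0, "open_issues": 0, "open_pull_requests": 0}
    )
    for stats in repo_stats:
        summary = totals[stats["org"]]
        summary["repos"] += 1
        summary["open_issues"] += stats["open_issues"]
        summary["open_pull_requests"] += stats["open_pull_requests"]
    return dict(totals)
```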
“The goal is to have the tools, along with transparent data and metrics-related information, which can be used to guide the organization.” – Chris Aniszczyk, COO of the Cloud Native Computing Foundation
The TODO Group also offers a helpful list that adds other tools as well:
For better code reviews:
- mention-bot – Developed by Facebook, this tool automatically mentions potential reviewers for code contributions by community members to speed up the review process.
- PullApprove – Brings more formalization to code contributions – or pull requests – by improving code quality through peer-review, enforcing style guidelines, catching errors and providing security checks on code.
- sentinel – A repository management bot which reviews and tests code contributions, builds a list of maintainers for the repository and communicates the status of a pull request with users.
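To give a feel for how automatic reviewer suggestion can work, here is a simplified heuristic: suggest the people who have most often changed the files touched by a pull request. This is an illustrative sketch under that assumption, not the actual algorithm of mention-bot or any other tool listed above, and the `file_history` structure is hypothetical.

```python
from collections import Counter


def suggest_reviewers(changed_files, file_history, author, limit=2):
    """Suggest reviewers who most often touched the changed files in the past.

    `file_history` maps a file path to the list of people who previously
    modified it (one entry per past change). The pull request author is excluded.
    """
    counts = Counter()
    for path in changed_files:
        for person in file_history.get(path, []):
            if person != author:
                counts[person] += 1
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [person for person, _ in ranked[:limit]]
```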
For Contributor License Agreements
CLA Assistant – Contributed by SAP, the CLA Assistant streamlines workflows by handling the legal side of contributions for users. The Assistant asks code contributors to sign CLAs as they make their code contributions and authenticates each contributor with his or her GitHub account. It also updates the status of a pull request when the contributor agrees to the CLA and automatically asks users to re-sign the CLA for each new pull request if changes are made to the CLA.
CLA Portal – From VMware, CLA Portal adds a workflow to enable contributors to digitally sign a CLA for pull requests to your GitHub repositories. When a developer opens a pull request, they are prompted to sign the agreement if needed. Also included is an administrator interface for CLA authoring, CLA-to-project mapping, and agreement reviews.
DCOB – A Developer Certificate of Origin Bot which helps to enforce developer certificate of origin sign-offs for each code change in a pull request. The DCOB sets the status for each accepted code change, as required by the Developer Certificate of Origin.
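The core check behind a DCO bot can be sketched in a few lines: every commit message in a pull request should carry a `Signed-off-by:` trailer. The sketch below additionally assumes the sign-off email must match the commit author's email, which is a common policy but an assumption here, and the commit record keys (`id`, `message`, `author_email`) are hypothetical. It is not the implementation of DCOB itself.

```python
import re

# A DCO sign-off is a commit-message trailer such as:
#   Signed-off-by: Jane Doe <jane@example.com>
SIGN_OFF = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>]+)>$", re.MULTILINE
)


def has_sign_off(commit_message, author_email):
    """True if the message carries a sign-off matching the author's email."""
    return any(
        match.group("email").lower() == author_email.lower()
        for match in SIGN_OFF.finditer(commit_message)
    )


def unsigned_commits(commits):
    """Return the ids of commits that lack a matching sign-off.

    `commits` is a list of dicts with hypothetical keys
    "id", "message" and "author_email".
    """
    return [
        commit["id"]
        for commit in commits
        if not has_sign_off(commit["message"], commit["author_email"])
    ]
```

A bot built on this check would mark the pull request as failing while `unsigned_commits` returns anything, and as passing once every commit is signed off.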
GitHub Management at Corporate Scale
- hubcommander – A Slack bot for GitHub organization management, HubCommander uses chat-ops – or conversation-driven development – to help manage GitHub projects. It creates a simple way to perform privileged GitHub organization management tasks without granting administrative or owner privileges to your GitHub organization members.
- opensource-portal – From Microsoft, this tool is designed to help large organizations with their large-scale GitHub management operations, onboarding and more. This is one of a suite of tools provided by the Open Source Programs Office at Microsoft.
- settings – This app syncs repository settings defined in .github/settings.yml to GitHub, so that changes to repository settings can be proposed and reviewed through pull requests.
- zappr – Zappr is a GitHub integration built to enhance project workflows. From Zalando, zappr helps developers to increase productivity and improve open-source project quality by removing bottlenecks around pull request approval and helping project owners halt “rogue” pull requests before they're merged into the project master branches.
- CII Best Practices Badging – From The Linux Foundation, the Core Infrastructure Initiative (CII) Best Practices badge is a way for Free/Libre and Open Source Software (FLOSS) projects to show that they follow best practices. Projects can voluntarily self-certify for free by using this web application to explain how they follow each best practice.
- CodeClimate – Code Climate empowers organizations to take control of their code quality by incorporating fully configurable test coverage and maintainability data throughout the development workflow. It’s free for open source projects!
Tools for communications and collaboration
Of course, open source development isn’t just about the code. It also requires healthy communications and collaborations between a diverse group of people who are working on the projects inside and outside of enterprises, as well as by staff members in a company’s Open Source Program Office.
For that, developers can lean on tools they may already be using for other projects, including Internet Relay Chat (IRC), where developers can post inquiries and get quick responses to development-related topics. Another example is TWiki, which is an open source enterprise Wiki and web collaboration platform where developers can discuss code and projects and related topics.
Communications can also be fostered through social media platforms, web portals, open source project repositories and other places where input, questions and discussions can be found and fostered.
Other useful tools include mention-bot from Facebook, which can help get fast input turnaround on pull requests by automatically mentioning potential reviewers for reviews. This is especially appreciated when GitHub projects get too big for community members to subscribe to all of a project’s notifications.
Then there’s Slack, which is an online team project management and communications platform where users can access and share messages and files, organize workflows, perform searches for information and more. Slack can be configured to receive notifications for support requests, code check-ins, error logs and other tasks as well.
And don’t forget your company’s public relations and marketing staff when it comes to shouting out your company’s participation and support of open source. Social media accounts with sites including Twitter, Reddit, Facebook, LinkedIn and others are important, as well as the use of internal and external blogs and websites. Customer Relationship Management (CRM) software, as well as email blasts and newsletters, can help companies keep customers and clients informed about their open source progress.
Tools for corporate-scale GitHub management
When it comes to the tools your company provides and uses for its corporate open source projects, the most important ones are arguably those which help companies manage their corporate-scale GitHub operations. GitHub is a perfect platform for many operations, but for large, complex companies such as Google, Microsoft, Facebook, Twitter, LinkedIn and others, there can be many limitations to using the standard GitHub offerings.
Large enterprises need many more capabilities, including things like identity management, settings and permissions management, security and two-factor authentication enforcement, as well as deeper means to understand and track code repositories.
That’s where specialized, automated tools often need to be built to handle tasks such as onboarding, offboarding, enforcing security policies and letting developers request access to repositories.
Microsoft responded to its own unique requirements by building its own tools to handle many such tasks to streamline and improve its open source program. Microsoft has a healthy presence on GitHub, with some 1,345 repositories involving about 3,580 developers to date.
“That management of your GitHub presence is something that as you scale, it becomes important. You get a GitHub organization, which is a collection of repositories, and then you get members and you have teams. Managing all of that stuff becomes a little bit complicated, especially if it starts to scale out to hundreds of repositories, hundreds of people and multiple organizations on GitHub.” – Jeff McAffer, Director of the Open Source Programs Office at Microsoft
One of the things Microsoft created was a custom-built, self-service GitHub management and onboarding portal for organizing its projects, repositories, and teams. At its simplest level, the web-based portal allows developers to map their Microsoft company ID to their GitHub ID, which bolsters system security and helps simplify the organization of large numbers of developers who are involved in large numbers of important projects.
The portal also lets employees authenticate with GitHub and Microsoft, which creates a “virtual link” of their identities so they can do their work while giving them needed permissions for tasks depending on their work roles. If employees leave the company, the system can be adjusted to remove or reclassify their access rights as needed.
The portal runs on one or more cloud servers and relies on a cache to help with sessions and reduce pressure on the GitHub API. The Microsoft portal, which averages about 1,000 unique users daily as a tool for its engineers, is part of the company’s growing open source efforts, which now include more than 10,000 engineers who are using, contributing to and releasing open source code.
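The text does not say what kind of cache the portal uses, so the following is only a minimal sketch of the general idea: an in-memory cache with a time-to-live, so that repeated lookups of the same data do not each trigger an API call. The class name, the key format and the stand-in fetch function are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache: entries expire `ttl_seconds` after being stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # still fresh: no upstream call
        value = fetch()                # missing or stale: call upstream once
        self._entries[key] = (now, value)
        return value


calls = []


def fetch_team_members():
    # Stand-in for a real GitHub API request (hypothetical data).
    calls.append(1)
    return ["alice", "bob"]


cache = TTLCache(ttl_seconds=300)
cache.get_or_fetch("example-org/example-team", fetch_team_members)
cache.get_or_fetch("example-org/example-team", fetch_team_members)
print(len(calls))  # the second lookup is served from the cache
```

A shared cache service would be needed once the portal runs on more than one server; the in-process dictionary here only shows the expiry logic.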
Corporate Scale GitHub Management
Managing a company's presence on GitHub is a deceptively complex challenge: maintaining proper permissions, tracking team members, and understanding the many (potentially thousands of) repositories. Something as simple as understanding who's who is hard. If an employee is going to be working on open source, then you need to map their corporate identity to their external GitHub account. This mapping is important for several reasons. Using two-factor authentication (2FA) is a best practice for everyone and a requirement for many companies. Knowing the ID mapping enables 2FA tooling and auditing. If an employee has administrative rights over company teams or orgs, and that employee leaves the company, they may need to have their rights adjusted. If a company has multiple orgs on GitHub (not uncommon when companies acquire each other over time), knowing an employee's public ID quickly resolves any questions about code provenance and removes the need for them to sign a Contributor License Agreement (CLA).
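To make the value of the mapping concrete, here is a small sketch of an audit that uses it. Everything in it is illustrative: the `Link` record, the identifiers and the report categories are assumptions, and in practice the member list and 2FA status would come from GitHub while the list of active employees would come from the corporate directory.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Link:
    corporate_id: str   # hypothetical corporate directory identifier
    github_login: str


def audit_members(
    org_members: Iterable[str],
    links: Iterable[Link],
    active_employees: Set[str],
    members_without_2fa: Set[str],
) -> Dict[str, List[str]]:
    """Classify GitHub org members using the corporate-to-GitHub mapping."""
    by_login = {link.github_login: link.corporate_id for link in links}
    report: Dict[str, List[str]] = {"unlinked": [], "departed": [], "no_2fa": []}
    for login in sorted(org_members):
        corporate_id = by_login.get(login)
        if corporate_id is None:
            report["unlinked"].append(login)    # no known corporate identity
        elif corporate_id not in active_employees:
            report["departed"].append(login)    # rights may need adjusting
        elif login in members_without_2fa:
            report["no_2fa"].append(login)      # follow up with the employee
    return report


links = [Link("e100", "alice-gh"), Link("e200", "bob-gh")]
report = audit_members(
    org_members=["alice-gh", "bob-gh", "mallory"],
    links=links,
    active_employees={"e100"},
    members_without_2fa={"alice-gh"},
)
print(report)
```

Without the mapping, none of the three questions (who is unknown, who has left, who to contact about 2FA) can be answered from GitHub data alone.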
Similar topics show up in org/repo/team creation and management, where issues of security, compliance and scale come into play. Who can create teams? What licenses are used? Can they be marked public? How do you search across dozens of orgs and thousands of repos? How are settings and permissions managed across many repos and orgs? These are all reasonably easy for a few dozen users/teams/repos. But when you scale to dozens of orgs, hundreds of teams and thousands of repos and users, the challenge quickly gets out of hand.
The Azure team from Microsoft has released an open source portal that addresses many of these issues. The portal onboards employees through a workflow that creates the mapping between their corporate ID and their GitHub.com ID, configures two-factor authentication, and adds them to a base set of orgs and teams. The portal also suggests teams and projects that the user might want to join, provides the capability to audit 2FA use, facilitates identifying and removing employees who are no longer with the company, and more. This portal is in use today across all of Azure (3 orgs, hundreds of repos, 2,000 users) and is being rolled out to all of Microsoft (40+ orgs, 4,000-5,000 repos and up to 50,000 users).
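One way to keep questions like these answerable at scale is to encode the policy as a check that runs over every repository. The sketch below is illustrative only: the approved-license list and the function name are assumptions, and the dictionaries mimic the `private` and `license.spdx_id` fields that GitHub's REST API returns for a repository.

```python
from typing import Dict, List, Optional

APPROVED_LICENSES = {"Apache-2.0", "MIT"}   # hypothetical company policy


def policy_violations(repo: Dict) -> List[str]:
    """Return human-readable policy problems for one repository record."""
    problems: List[str] = []
    license_info: Optional[Dict] = repo.get("license")
    spdx = license_info.get("spdx_id") if license_info else None
    if not repo.get("private", False):
        # Only public repositories are checked in this sketch.
        if spdx is None:
            problems.append("public repository has no detected license")
        elif spdx not in APPROVED_LICENSES:
            problems.append(f"license {spdx} is not on the approved list")
    return problems


repos = [
    {"name": "sdk", "private": False, "license": {"spdx_id": "MIT"}},
    {"name": "demo", "private": False, "license": None},
    {"name": "fork", "private": False, "license": {"spdx_id": "GPL-3.0"}},
]
for repo in repos:
    for problem in policy_violations(repo):
        print(f"{repo['name']}: {problem}")
```

Running the same function across every org turns a manual review into a report that can be regenerated whenever the policy or the repositories change.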
Project Statistics and GitHub Data
Open Source Programs Offices also need to report on how individual open source projects are performing or being received. It's easy to look at a project on GitHub and evaluate its health based on number and age of issues, the status of pull requests, how recently the project was updated, stars, forks, and other activity. While this process is easy for a single project, it becomes time consuming when a company has several pages of projects.
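The manual check described above can be approximated with a few lines of code once repository metadata has been collected. This is a minimal sketch rather than any of the dashboards discussed here: the field names follow GitHub's REST API repository records (`stargazers_count`, `forks_count`, `open_issues_count`, `pushed_at`), while the sample data and the 180-day staleness threshold are arbitrary assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, List


def days_since_push(repo: Dict, now: datetime) -> int:
    """Whole days between the last push and `now` (both treated as UTC)."""
    pushed = datetime.strptime(repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
    return (now - pushed.replace(tzinfo=timezone.utc)).days


def summarize(repos: List[Dict], now: datetime, stale_after_days: int = 180) -> List[Dict]:
    """Build one summary row per repository."""
    rows = []
    for repo in repos:
        idle = days_since_push(repo, now)
        rows.append({
            "name": repo["name"],
            "stars": repo["stargazers_count"],
            "forks": repo["forks_count"],
            "open_issues": repo["open_issues_count"],
            "idle_days": idle,
            "stale": idle > stale_after_days,   # threshold is an assumption
        })
    return rows


sample = [
    {"name": "busy-lib", "stargazers_count": 950, "forks_count": 120,
     "open_issues_count": 14, "pushed_at": "2016-01-27T12:00:00Z"},
    {"name": "quiet-tool", "stargazers_count": 12, "forks_count": 1,
     "open_issues_count": 30, "pushed_at": "2015-03-01T00:00:00Z"},
]
rows = summarize(sample, now=datetime(2016, 2, 1, tzinfo=timezone.utc))
for row in rows:
    print(row)
```

Passing `now` explicitly keeps the summary reproducible, which matters when the same report is compared from week to week.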
Amazon has released the dashboard they use to track this information. Using the tool, one can quickly see whether projects are seeing activity, how popular they are, and whether they need some guidance or additional work. It also provides the capability to plug in custom reports, along with some auditing capability for user accounts and 2FA.
Netflix's open source program spans multiple years of shaping the architecture of the public cloud. Recently, Netflix open source has been evolving to focus their efforts on improving the community health of their open source offerings. They have updated their website to organize projects into consistent areas that align well with their internal engineering organizations. Each of these areas has shepherds that focus on the health of the area using a tool they open sourced, Netflix OSS Tracker, which displays ownership and health metrics across an entire GitHub organization. To power this tool, tags were introduced across their open source repositories that clearly indicate the lifecycle of any project (active vs. maintenance vs. archived). For more information, see their recent OSS Meetup video presentation.
Also, PayPal released a proof of concept tool called Gander that provides an open source metrics dashboard. On top of the basic metrics, the tool allows sorting by number of open issues and pull requests. If pull requests or issues are accumulating, it can be an indicator that project ownership has fallen by the wayside.
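The lifecycle tagging idea can be sketched as a simple grouping step. How Netflix actually stores and reads these tags is not described here, so the `tags` field, the sample repositories and the `needs-tagging` bucket are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

LIFECYCLE_TAGS = ("active", "maintenance", "archived")


def group_by_lifecycle(repos: Iterable[Dict]) -> Dict[str, List[str]]:
    """Bucket repository names by their single lifecycle tag."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for repo in repos:
        found = [tag for tag in repo.get("tags", []) if tag in LIFECYCLE_TAGS]
        # Zero or conflicting lifecycle tags both need a human decision.
        bucket = found[0] if len(found) == 1 else "needs-tagging"
        groups[bucket].append(repo["name"])
    return dict(groups)


repos = [
    {"name": "service-a", "tags": ["active", "java"]},
    {"name": "tool-b", "tags": ["archived"]},
    {"name": "lib-c", "tags": []},
]
for lifecycle, names in group_by_lifecycle(repos).items():
    print(lifecycle, names)
```

Surfacing untagged or ambiguously tagged projects is often the most useful output, since those are the ones whose ownership is least clear.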
Insight into the behavior and state of open source projects is critically important for both project teams and potential consumers. Open Source Programs Offices often spend tremendous effort promoting best practices and effectiveness within the projects their company drives. Similarly, taking a dependency on an open source project can have very deep implications for a company (legal, security, support). Operating confidently in open source requires insights.
GHTorrent is an open source research project run by Georgios Gousios (@gousiosg) that archives all public GitHub events, all entities referenced from those events (transitively), and a set of links derived from these events and entities. The data is stored in a combination of MySQL and MongoDB and goes back in time to 2012. This is a phenomenal resource for open source communities. It is easy to get raw data on GitHub usage, as @ghtorrent tweeted recently:
Yesterday, Github got 10k new users, 43k new repos, 40k new issues, 27k new PRs, 84k new issue coments and 600k new commits #scale— GHTorrent (@ghtorrent) January 27, 2016
The data enables deeper insights, such as an interactive site that charts programming language usage.
Microsoft has started working with the GHTorrent team to enable the use of this data more broadly. The first step was to ensure the GHTorrent.org infrastructure is on solid ground. It is now running on Azure sponsored machines with enough power to ensure smooth operation. The next step, underway now, is to make the data widely available and consumable. In addition to the current daily database dumps, they are working to both make all of the data immediately available as it arrives, and pump it all into Azure Data Lake. With the info in Data Lake, users can apply big data technology like Hadoop, HDFS, Spark, HBase and so on to develop the insights they need without having to make and manage their own copy of the ~10TB of data. The team is also looking at enabling personal GHTorrents for more focused use on private repositories.
Outside of what the members within the TODO Group have developed, there are other tools that Open Source Programs Offices may learn from or find valuable.
Zalando has released a tool that provides an open source metrics dashboard called CatWatch. This dashboard gives the company the ability to view which projects are popular on GitHub. Projects must self-identify to be shown on the list. You can see it in action at https://zalando.github.io
Bitergia have released several tools in this area. The first is a pairing of the MetricsGrimoire toolkit and VizGrimoire that can be used to generate metrics and insights into project health. These tools can include data from source control systems other than GitHub, issue tracking systems like Bugzilla and JIRA, mailing lists and other data sources related to Open Source and Inner Source development.
Bitergia are updating MetricsGrimoire to a new toolkit (GrimoireLab), and you can follow its progress on their blog. It provides actionable metrics dashboards through a customized Kibana dashboard, but the data can also be plugged into regular Kibana or other BI tools.
These solutions are best suited for projects with many different data sources and repositories. Bitergia recently released a GitHub-focused service running from biterg.io that is free for a small number of projects.
Final words
Hey, nobody said it was going to be simple to move your company into the world of open source. But plenty of other companies, including giants like Microsoft and Google, have done this before you and have provided detailed road maps, code, suggestions, and more to make your own journey easier.
The creation of an open source program office and the selection of a package of critical tools to get your efforts started are within your grasp. And they are likely already inspiring great anticipation among your developers, many of whom are probably already contributing to open source projects on their own (or at work, under cover of darkness).
By collaborating on open source projects and inviting others to collaborate with you, your company can gain immeasurable benefits and drive its progress forward with energy and innovation.
Having the right tools is critical to empowering your company’s open innovation.
- Chris Aniszczyk, COO of the Cloud Native Computing Foundation.
- Jeff McAffer, Director of the Open Source Programs Office at Microsoft.