Building an upstream-focused OSPO at G-Research

by VanL

G-Research wanted to invest in its open source supply chain, but traditional engagement didn't quite fit. Instead they built what they call a "muscular OSPO" with a deep investment in advancing upstream projects. Alex Scammon, head of G-Research's OSPO, talked to us about it.

Creating an OSPO focused on the upstream

Can you tell us a little bit about G-Research and about your OSPO there?

G-Research is what's known as a quantitative research firm. We essentially try to predict movements in markets. What that means in practice is that we take machine learning and data science tools and we try to apply them to the markets and make money out of money. That's how we make a living and what we are trying to support with the OSPO.

We have been able to build a fairly unique OSPO because G-Research has this peculiar kind of business model where we're not trying to sell to anybody, we're not trying to do any marketing. We're not trying to get people to buy our products or anything like that. Instead, I lead a fairly large number of people in the OSPO and we really get to just contribute to upstream projects. It's guided by what G-Research is interested in - data science, machine learning, ML Ops, and all the infrastructure under that. But within that very, very large realm we get to contribute code philanthropically.

Is there any push by G-Research management to say, "We want to improve the particular products that we are consuming," or is it really anything that would help the machine learning area is in scope?

Everything is fair game. Of course there is some focus on the things that we're using internally, but we can contribute upstream anyplace it will be of value. I'll give you a couple of examples in both directions.

We use the normal data science and machine learning tools like Spark, TensorFlow, Horovod, Dask, Ray, Arrow, Pandas and NumPy and Numba... all of them. You go down the list of normal packages and tools and products that are associated with data science and machine learning and we are almost certainly using them. As a result we run into feature requests and bug fixes from inside G-Research. Our team goes and takes those feature requests and bug fix requests and tries to merge them upstream. We have contributed to all of those projects in some fashion. This is similar to what other organizations are doing in open source.

What distinguishes us is that we have the scope to take a much more active role in open source generally than just contributing minor patches. For example, we are on the technical steering committee for Horovod. But there are examples of things that we have contributed to where we knew that G-Research wasn't going to benefit directly.

Take our work in OpenStack. We did a bunch of work focused on securing internal networks within OpenStack using TLS, including patches making it easy to source certificates from Let's Encrypt. But we knew out of the gate that we weren't gonna be able to use this work. G-Research's systems are air-gapped. We wouldn't be able to use the certificates from Let's Encrypt because we wouldn't be able to get them. But we decided to do the work anyway because we thought it would be a really good and positive community contribution first and foremost.

We don't do extraneous projects all the time. Mostly it's things that G-Research will use. But we do have the leeway and the capacity to make those decisions. If it's going to make the community stronger, we can go and do it.

Using an OSPO technical team to provide value

What's the size of your technical team? It sounds like you are really focusing on upstream contributions.

We have probably 20 to 25 people who are technically inclined and then another five to ten people who surround that with other types of contributions, like developer relations and tech writing. Those are technically related but not specifically engineering roles. So all in all, it's maybe 25-35 people working upstream as part of the OSPO. It's a good size team.

It's very different than what I generally run into in the OSPO world. What I have seen in other OSPOs, even in companies that are much much larger than we are, is usually a couple of people, maybe four or five. Usually less than ten. There may be some others that deal with non-technical work, but frequently there are only a few active technical contributors.

One question that frequently comes up when talking to leaders of OSPOs is funding. It always seems like any time revenues start to go down, the OSPO is one of the first places they look to cut. It always seems like investing upstream is something they could do later if needed. Have you felt that sort of pressure with your OSPO?

That's a great point. And sort of paradoxically, because we have real engineering resources on the team, we've been able to position the team to deliver direct value to the company. We're not just an organization that cheerleads or helps smooth the path to open source contributions or is somehow just a consultative resource for the organization. Because we are actually delivering engineering value, it's easier for me to show them, "This is what we delivered here, this is what this is equivalent to in real dollar values." Right from the beginning, I was careful to draw those parallels and to do the accounting on how much money we were either saving for the company or how much value we were delivering. Because we were doing that math right from the start, if we ever needed to have a conversation about how much value we were delivering, we could give that number. It might be sort of vague in some senses, but no less so than other measurements that companies sometimes use.

So what you're doing with your greater engineering resources is directly engaging with the upstream to solve upstream problems that affect downstream use by G-Research. And that is similar in concept to what other OSPOs are doing. The difference there is that most organizations don't have engineers dedicated to upstream issues - they are working on their existing products. Whereas you are more aggressively managing the upstream supply chain for the software that you're using.

Yes, you're right. The way that a lot of people talk about open source contributions is that if we deliver upstream, eventually the company will benefit from it. Our reality is that by delivering contributions upstream, we benefit right now in positive monetary ways. If you're doing it right, both sides win immediately.

Let me give one example. There's one bit of work that we did in the NuGet project. G-Research is, for historical reasons, primarily a Windows shop. We're moving towards Linux, but there were a few things holding us back from moving beyond .NET 5. We have a lot of stuff running on .NET Core, and as we tried to upgrade to .NET 6, we found that Microsoft had removed a feature in .NET 6, and its removal made NuGet restores incredibly slow for projects of our size. As a result we were unable as a company to move beyond .NET 5. But that's not acceptable for a business. You can't be stuck at an old version indefinitely. My team worked with Microsoft to develop a fix for this issue that got into .NET 7, and it was so important that Microsoft back-ported it to .NET 6. This allowed us to move ahead with our upgrades and eventually adopt .NET 7, 8, and 9. Just doing some back-of-the-napkin calculations on how much value we received, it was worth well more than the one engineer's time that was spent fixing the issue. There was also the cost of all the other engineers in the company wasting time and effort, and of losing engineers because they were bored of working on .NET 5. The value of this is just immense. You don't have to be a genius to see that on balance, the half a year's worth of time that we put into this issue is reaping a much bigger reward.

This is a good example of open source theory being put into practice. A lot of companies say, "If we happen to come up with a fix, if we happen to need something, our OSPO will help push it upstream." In contrast, it seems like the benefits that you are getting came from a more conscious effort to say, "No, our effort is to improve the upstream."

Yes, absolutely. It was a conscious decision that we made to make engineering a first-class citizen of the OSPO itself, and not merely focused on releasing internal company projects as open source. We do try and engage the internal engineering teams, but we were always clear from the beginning that we wanted a real engineering team as part of the OSPO to make the communities that we were helping better and stronger.

The benefits of an upstream-first orientation

That also sounds like a lot less interaction with lawyers. A much less compliance-centered focus.

That's for sure. We can operate outside of many of those considerations because we're starting from the perspective of contributing upstream rather than trying to externalize work that was previously done inside. Although it's funny that one of the very first projects that we undertook was exactly that - getting some code developed inside into the community. But yes, it is much easier to just be working in the open from the beginning.

Does your focus on upstream engagement help you to approach projects in a different way than you see other OSPOs or other organizations?

Totally. There's a thing that I've been musing about recently where I think we fill a really interesting middle space in terms of the open source contributions that we do. Going back to the example of us fixing that issue in NuGet: the problem that we worked on had been around for three or four years. Microsoft knew about it, but it just wasn't quite important enough for them to deal with it. But it was also a lot of work and not merely something that somebody with a lower level of engagement could do. It was really involved and took a full-time person on our team a long time to understand and fix. Even after our engineer had figured out a way through the technical problem, he had to work with Microsoft to make sure that it got over the line and got into the code base. We fill this middle ground where you still have to pay someone to do this work that isn't at the top of the priority list for a primary sponsor, but is more than you would expect from unpaid contributors.

If I can make a baseball analogy, every team needs players that can consistently hit solid doubles and triples. Maybe they're not hitting the singles, and they're not getting the home runs and the grand slams. But there's a lot of value in getting those doubles and triples for the team.

Building community trust

Tell me about your community relationships.

We think about our community relationships on a much longer timeline. If we are using a piece of software and the community is strong and healthy, and we believe in the product, we want it to stay around for longer. If it stays around for longer we will have to do fewer migrations to different projects or products. Migrations between technologies cost us a huge amount of time and money. So it's in our best interest for the projects that we're using to make them stronger and make them healthier. We know that if the project isn't healthy, if we don't invest, then we suffer in all kinds of weird, indirect ways.

Maybe some of the good examples are the couple of projects where we have taken over the maintainership of the project. Spark Magic was the first one where we took over the maintainership. At the time there were a lot of people who were contributing PRs (pull requests). There was a reasonable community interested in the project, but the original maintainer didn't have the time to review and integrate their contributions. As a result people were creating all of these personal forks of Spark Magic that each had various fixes or features that we would want, or the community would want. We weren't totally sure, to be honest, whether Spark Magic was going to be a long-term play in our ecosystem. But it was very clear to us that if we had to give up on Spark Magic and move to something else, it would cost us a lot of money and a lot of engineering time to figure out a different way of doing things.

Instead, I had one of our engineers reach out to the maintainer and get permission to review and integrate fixes from others. Our engineer then reached out to all of the other maintainers of all the other forks that had been created, invited them back into the party and invited them to resubmit their requests to the main fork. We worked with everyone to get their fixes in.

That's a lot of footwork.

It was. Although it wasn't a lot of engineering time, it was a lot of community work to organize all those people and to get it all going again. Now it's its own little community again, one that is vibrant and is continuing forward.

The other big one that we did that for was Consul.NET, which was almost the exact same story, and now there's a ton of activity around that project too. Lots of people use Consul.NET, and it was languishing just because the maintainer didn't have the time to keep up that level of effort. Again, if we had let the project effectively die, our only alternative would have been to spend a huge amount of time and effort doing something else - building it from scratch maybe. The alternative was really doing the community legwork to take it over in the right way and make sure that everyone was aware that the project was back on track and that contributions would be evaluated and receive a response. Even continuing with that work now is still way less difficult than having to shift to another package or build something of our own.

A lot of organizations are very worried about the perception that they are trying to come in and "own" the project. Clearly you were successful in having an effective transition of leadership. Can you describe some of the reasons you were successful, or some of the things you did to be successful in facilitating that transition?

That's a really good question. It's different in each case, but it helps that we have a track record of doing philanthropic work and it's clear that we're not trying to sell anything or own anything. I also think it is just coming over and over again to the community with the right attitude and the right message to maintainers. It's tricky sometimes to build up trust quickly enough, particularly in a situation where there's a project that has been discontinued and there isn't a lot of time to build trust. You need to keep coming back to the table.

And now that we have a couple of examples under our belt, we can point to them to say exactly what we want to do with this project. I'd also add that we've done the reverse in a couple of situations. We have built projects in the open that were eventually given away to other companies or other maintainers because we thought that it would be in the project's best interest to be maintained by somebody else. One example is the Aerospike Vault plugin. We built that for our own purposes but ended up handing over maintainership of the project to Aerospike, because we thought it would have a better life there.

Inviting the community into a project

A lot of projects are heavily focused around one particular company, so much that it's not always a community project even if it is open source. Have you had that experience? How do you engage with the entire spectrum of open source?

We've definitely encountered that. In our experience we've seen that with our work on Apache Ozone. There's quite a large community around Ozone, but the driver behind that project is Cloudera. We have willingly engaged with them within the project and it has been interesting to watch. I would say first Cloudera has been really good about inviting us into the project and extending a lot of time and energy in helping our team get up to speed so that we can contribute meaningfully. One reason to do that, even if you use an open source project to drive business, is to avoid the problem where there is only one group driving the project. It can rub people the wrong way and it heightens the risk of your company building the wrong thing. If you have your blinders on and you're not actually listening to the community, you can get too focused on what you want to build and not on what your community really needs.

So how do you avoid having a project be too vendor-focused?

You have to put in the time to invite people into the project again and again. The places where we've been very successful so far have been in the ILGPU project and perhaps the Fantomas project. In the ILGPU project, Marcel - our employee and the project founder - was an individual who decided to build this compiler that targets GPUs for .NET programs. He saw there was a hole in what was available and it was a hole he wanted to fill. So he built ILGPU and along the way he created a really great community of people who are really into compilers and who also want to nerd out at this level. Marcel has been very good at building the community. He has done "meet the developer" sessions, I think weekly, and really encouraged the community to come and participate. As a result, there are people who are contributing willingly to that project just because they're so into it and because they're using it so much.

In the Fantomas project, we were more directly involved in helping create that community. A few years ago the primary developer was getting tired and a little burnt out. All these requests were coming in and he felt like nobody understood him or the project. All these people wanted things that he didn't want. We actually did a lot of work with the developer to turn his thinking about the community around. Instead of seeing the community as a pool of people who were demanding things from him, he could instead see them as potential partners that could be engaged with him. There was this emotional turnaround that came once the developer made this switch. It was night and day - all of a sudden he was happier. That's one of the more beautiful examples of our work in open source.

One of the more important things that we helped with in Fantomas was just a couple of community-oriented rules. The first was that when someone sends in a contribution, the only thing you have to do is say "thank you." That is the number one thing - you always thank every volunteer. That rule actually came from me watching my mother run a volunteer organization when I was growing up. The number one rule she had was that you always thanked every single volunteer, no matter who they were, what they did, or even how annoying you thought they were. It didn't matter - you always thanked them. The second was that, before you offer to do anything, you just ask, "Are you willing to work on this problem that you just submitted?" It helps them feel they are welcome and it helps them understand that they have a stake in the project as well.

Closing thoughts

Any last thoughts?

There are many ways to create open source programs, and I am not saying that one way is better than another. But I would invite anyone with an open source program office to consider more direct upstream investment. We joke and say that we have a "muscular" OSPO, because we do the work, rather than a "skinny" OSPO that stays on the sidelines. Working in open source becomes more rewarding, both personally and financially, when you do the work to engage with the community - and that means having the engineers available to focus on upstream needs and really make a difference.