Making the Kitchen Great Again
A Retrospective on OpenStack & Chef
This was influenced by Graham Hayes’ State of the Project for Designate.
I have been asked recently “what is going on with the OpenStack-Chef project?”, “how is the state of the cookbooks?”, and “hey sc, how are those integration tests coming?”. Having been the PTL for the Newton and Ocata cycles, yet having not shipped a release, is the unthinkable, and deserves at least a sentence or two.
It goes without saying, this is disheartening and depressing to me and
everybody who has devoted their time to making the cookbooks a solid
and viable method for deploying OpenStack. OpenStack-Chef is among the
oldest[1] and most mature solutions for deploying OpenStack, though it is
not the most feature-rich.
*TL;DR* if you don't want to keep going - OpenStack-Chef is not in a good place and is not sustainable.
OpenStack-Chef has always been a small project with a big responsibility. The Chef approach to OpenStack historically has required a level of investment within the Chef ecosystem, which is a hard enough sell when you started out with Puppet or Ansible. Despite the unicorns and rainbows of being Chef cookbooks, OpenStack-Chef always asserted itself as an OpenStack project first, up to and including joining the Big Tent, whatever it takes. To beat that drum, we are OpenStack.
There is no cool factor from deploying and managing OpenStack using Chef, unless you’ve been running Chef, because insert Xzibit meme here and jokes about turtles. Unless you break something with automation, then it’s applause or facepalm. Usually both. At the same time.
As with any kitchen, it must be stocked and well maintained, and OpenStack-Chef is no exception. Starting out, there was a vibrant community producing organic, free-range code. Automation is invisible, assumed to be there in the background. Once it’s in place, it isn’t touched again unless it breaks. Upgrades in complex deployments can be fraught with error, even in an automated fashion.
As has been seen in previous surveys[2], once an OpenStack release has been chosen by an operator, some tend to not upgrade for the next cycle or three, to get the immediate bugs worked out. Though there are now multinode and upgrade scenarios supported with the Puppet OpenStack and TripleO projects, they do not use Chef, so Chef deployers do not directly benefit from any of this testing.
Being a deployment project, we are responsible for not one aspect of the OpenStack project but as many as can be reasonably supported.
We were very fortunate in the beginning, having support from public cloud providers, as well as large private cloud providers. Stackalytics shows a vibrant history, a veritable who’s-who of OpenStack contributors, too many to name. They’ve all moved on, working on other things.
As a previous PTL for the project once joked, the Chef approach to OpenStack was the “other deployment tool that nobody uses”. As time has gone by, that has become more of a true statement.
There are a few of us still cooking away, creating new recipes and cookbooks. The pilot lights are still lit and there’s usually something simmering away on the back burner, but there is no shouting of orders, and not every dish gets tasted. We think there might be rats, too, but we’re too shorthanded to maintain the traps.
We have yet to see many (meaningful) contributions from the community, however.
We have some amazing deployers that file bugs, and if they can, push up a patch.
It delights me when someone other than a core weighs in on a review. They are
highly appreciated and incredibly valuable, but they are very tactical
contributions. A project cannot live on such contributions.
Where does that leave OpenStack-Chef? Let’s take a look at the numbers:
+----------+---------+
| Cycle    | Commits |
+----------+---------+
| Havana   |     557 |
+----------+---------+
| Icehouse |     692 |
+----------+---------+
| Juno     |     424 |
+----------+---------+
| Kilo     |     474 |
+----------+---------+
| Liberty  |     259 |
+----------+---------+
| Mitaka   |      85 |
+----------+---------+
| Newton   |     112 |
+----------+---------+
| Ocata    |      78 |
+----------+---------+
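For anyone who wants to put a percentage on the trend, here is a minimal sketch that works only from the commit counts in the table above. The function names `cycle_changes` and `drop_from_peak` are made up for illustration; they are not part of any OpenStack or Stackalytics tooling.

```python
# Commit counts per release cycle, copied from the table above.
commits = {
    "Havana": 557,
    "Icehouse": 692,
    "Juno": 424,
    "Kilo": 474,
    "Liberty": 259,
    "Mitaka": 85,
    "Newton": 112,
    "Ocata": 78,
}


def cycle_changes(counts):
    """Return (cycle, percent change versus the previous cycle) pairs."""
    names = list(counts)  # dicts keep insertion order, so this is release order
    changes = []
    for prev, cur in zip(names, names[1:]):
        pct = 100 * (counts[cur] - counts[prev]) / counts[prev]
        changes.append((cur, round(pct, 1)))
    return changes


def drop_from_peak(counts):
    """Return (peak cycle, percent drop from the peak to the latest cycle)."""
    peak = max(counts, key=counts.get)
    latest = list(counts)[-1]
    pct = 100 * (counts[peak] - counts[latest]) / counts[peak]
    return peak, round(pct, 1)


if __name__ == "__main__":
    for cycle, pct in cycle_changes(commits):
        print(f"{cycle:<9} {pct:+.1f}%")
    peak, pct = drop_from_peak(commits)
    print(f"Down {pct}% from the {peak} peak")
```

By this arithmetic, Ocata sits roughly 88.7% below the Icehouse peak. Commit counts are a crude proxy for project health, so treat the figure as a rough indicator rather than a verdict.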
As of the time of this writing, Newton has not yet branched. Yes, you read correctly. This means the Ocata cycle has gone to ensuring that Newton just functions. In a virtual quasi-vacuum, without input from larger scale deployments, who are running releases older than Newton, reporting bugs we’ve fixed in master. Supporting Newton required implementing support for Ubuntu 16.04, as well as client and underlying cookbook changes, due to deprecations that started prior to Newton. Here is the output from berks viz for a top-down view into the complexity on just the Chef side.
For the Pike cycle, Jan Klare will be reprising the role of PTL. I do not intend to speak for him, but there are few paths forward in the Big Tent:
- Branching stable/newton and stable/ocata with the quickness.
- Improving OpenStack CI to the point of being able to trust it again for testing patches, as well as extending testing scenarios (including multinode).
For branching stable/newton, the external CI has been proving useful in overall
confidence in cutting a release. We’re way behind schedule, but nearly there. I
have begun working on implementing some basic multinode gates, as our allinone
no longer fits within the confines of the 8GB instances. But, it’s Chef, so
triangle wheels, yo. Some of the cross-project efforts translate to Chef, but
not all. With square spinners.
So… how did this happen?
As it was in the case of Designate, so it is in the case of OpenStack-Chef. There is no one single reason or cause that brought us to this point.
The main catalyst was internal support shifting, which impacted the sponsored developers and contributors. OpenStack-Chef became less and less a priority, and one by one they shifted to other focuses. At the Austin 2016 Summit, we said farewell to all but the PTL and one core. This put OpenStack-Chef in a bad place given its mission and scope, but onward we go.
Due to the volume of work done by this small group and the lack of feedback during development, it became more and more difficult to tell when a release could be considered “done”. We could no longer trust our CI framework, as the developers with intrinsic knowledge had been refocused, with little more than commit history to go on.
Users were okay with leaving us work, which we added to the heap. This, with the departure of contributors, resulted in the majority of the development being funded by just two companies, which left the project at risk from changes in direction by those companies. Without regular feedback or guidance beyond release notes and the occasional chat in another project’s channel, the focus shifted away from features to just ship it, as long as it passes allinone and/or multinode locally, if there’s time. Does it pass lint/unit/style? Fuck it. Ship it, deal with the fallout. Yeah. This is bad on so many levels.
The Big Tent really did not do as much as advertised for OpenStack-Chef, as harsh an opinion as that sounds. Larger, more well-funded projects have since created processes, frameworks and test suites that were developed for their own use cases, not necessarily taking into account Chef’s own blend of automation. That left us having to go and discover how to make fire on our own to make the cookbooks work on each supported platform and release of OpenStack. In the Big Tent, we were effectively left to our own devices. Just another OpenStack project. We numbered nine cores when we moved from StackForge to the Big Tent. Developer peak, though we did not know it yet.
Initially, the cookbooks had a very heavy dependency: Chef Server. If not Chef Server, Chef Solo, which still had its own quirks, and nobody liked Chef Solo anyway. Not even Chef Solo liked itself. During the Juno cycle, we switched to the Chef Development Kit, which gave us chef-provisioning. This decreased turnaround time for testing patches being submitted, and boosted confidence all around. Until Juno, it was difficult to run functional tests against the cookbooks. That’s when we discovered how to create fire. We could run OpenStack! In virtual machines! On our laptops! OMyG you guys! Suddenly, OpenStack on the laptop became easy, push button, single command, automated. We could test a patch without a long spin-up. With that, came integration gates, and periodic jobs. From days to minutes. We were cooking with gas! But… let’s not make those integration gates voting… yet.
Mitaka brought a significant overhaul and simplification, with the introduction of a multinode chef-provisioning recipe and more modular cookbooks. The pieces finally existed, but the damage had been done, and unfortunately, this momentum did not last. Internal priorities changed within companies sponsoring developers, many of whom could not be fully committed in the first place, and we started shedding contributors, which happens. At this point is usually where someone comes in to either play Grim Reaper or lifesaver. By Austin, our numbers waned until just two cores remained, Jan and myself. We could hardly be in worse locations to communicate, he in Germany and I in California.
In the Newton and Ocata cycles, development progressed in a lurching capacity without a team. Due to the overhaul in Mitaka, patches slowed in frequency from the outside community, many of whom continued deploying and running on older, EOL branches, or got frustrated enough to switch to other automation flavors. The remaining team had little overlapping time to communicate, being on different continents in conflicting time zones. What was difficult to do with a larger team spread across three continents and five time zones became impossible with just two. Day jobs increasingly took priority over OpenStack-Chef care and feeding, and some cookbooks started to go rancid (sorry, Ironic, Sahara, Swift and Trove. Nobody was able to support a deployment with you). Interaction within the development team was limited to an hour or two a day, eventually down to once or twice a week if we had time. Day jobs proceeded to consume the development team, with sporadic development as the months ticked on.
Over the Newton cycle, one cookbook was offered, EC2, with inadequate coverage for our support matrix. The most desired integration API. In the end, it was not integrated due to the time and commitment required to support such a feature, given the inadequate resources at our disposal. During Ocata, the project had one cookbook contributed from the community, Murano, that could be integrated, and grew an appendage in the form of the client cookbook. It is the closest anyone has gotten to new features since the Mitaka cycle. We added one core reviewer during this time.
Communication is a big part of any project, particularly a geographically diverse effort like OpenStack. Prior to the Big Tent, we held weekly meetings using Hangouts, which were open and publicized for the mailing list subscribers and channel denizens. Upon joining the Big Tent, we gave up the regular face-time in favor of text-based IRC meetings, per governance. Without the high bandwidth requirement of a video call once a week, one by one, cores had day job meetings do what they do, and take priority. In the Newton cycle, we relinquished our weekly time slot after it became apparent that neither of us could make the meetings. We have not held a scheduled meeting since then, as it is next to impossible to carve out adequate overlapping time.
We still have many of these problems to this day. Documentation is a mess or nonexistent. Despite the flexibility of the tooling, users have but two representative deployment examples: allinone or a rudimentary multinode. OpenStack-Chef gained modularity at the expense of features, and there was an overwhelming non-reaction to the deprecation of those features.
All of this results in a project that is not very friendly to new users, and
Chef does not look as attractive a deployment option as other, more
feature-rich flavors. This has real business decisions behind it. One only need
look at the steady usage decline in the surveys to see how negatively things
appear to existing and new users. This, for a project that has roots in the
very cubicles where OpenStack was born.[1]
But it’s not all bad
For all the negativity, this story has upsides. I call to the people who actually use this software in their deployments, in whatever shape it’s in, to not abandon OpenStack-Chef, or retire it to bitrot. We need help, not funeral arrangements. Share your pain, so that we may find a way forward together, not alone.
In my time with the project, I’ve gotten to know that there are some pretty big names that leverage Chef in their deployments, and some of them even use it for OpenStack. Some cookbook forks also exist, all serving to solve the same problems that face OpenStack-Chef. Without feedback from real-world deployments, OpenStack-Chef will continue to wither on the vine. This fragmentation harms more than it solves.
What do we need? It’s easier to list what we don’t need. Developers, tooling, documentation, testing, any and all are welcome and greatly appreciated. We don’t need much, but what we are able to do is limited by our size and time available.
We need developers with time and funding budgeted for contributing within the OpenStack ecosystem. We need better representation at events such as the PTG. For the first PTG, no OpenStack-Chef members will be attending, though we intend to meet up in Boston, maybe. Given how far our physical locations are from Atlanta, it made sense to stay home and communicate over IRC/code review. It doesn’t mean our development has ceased, we’re just too few and far away for it to make sense.
We also need help from cross-project teams to understand and assimilate the work done to solve the problems that we are working to solve. We are all working toward the same goal, though with a different dialect and coat of arms.
OpenStack needs choice in how to OpenStack. Limiting to a few dialects of automation makes things look less like an ecosystem and more like a distro, which is fine, if that’s how people want to go about it. We can talk about that, too.
I am happy to talk to anyone about how they can help. OpenStack-Chef has roots in the pioneers of OpenStack, and, in my (not so humble) opinion, is way too nifty to just let fall to the wayside.
I do not have team visuals to represent how the team has grown and shrunk over the years, so let me leave you with some mental visuals. At the end of Mitaka, we numbered nine. At the start of Ocata, we numbered just two. In Boston, we anticipate there will be, hopefully, all three of us, representing the sixteen active subprojects that make up OpenStack-Chef.
Update - April 29, 2017
I will be in Boston for the Summit, but I will be the only one representing the team. I would love to connect with anyone willing to talk about anything at all, be it the application of DevOps, Chef, OpenStack, beer, whiskey… you get the idea.
I would especially love to connect with operators that are using Chef. Your feedback is crucial, and much appreciated.
Update - May 15, 2017
The User Survey results[4] (page 43!) came out, and it revealed that OpenStack-Chef/Chef OpenStack/The Kangaroo still holds out as the #4 deployment tool of choice. It really puts things in perspective. THANK YOU!