A broad question we ask at balena that encompasses the scope of the problem is, “How can we make the most of our collective skills, talents, and intelligence?” A follow-up question this raises is: “Rather than succeeding despite organizational structure, how does a company benefit from its organizational structure and, by doing so, become greater than the sum of its parts?” Our answer, in the broadest sense, is this: structure information first, then structure people around information. And we structure information as a feedback loop, or in short, a Loop.
Loops are balena's answer to whether companies should be structured as functional departments or product-faced divisions. Our answer is that functional departments should see themselves as products/platforms and judge their performance the exact same way any other product does. Thus, these Loops are less static “departments,” and more dynamic processes that use feedback to continuously align better with the needs of their customers. Balena currently has four main Loops: balena.io, TeamOS, ProductOS, and CompanyOS. Each Loop is responsible for a different aspect of the company’s operation.
balena.io covers the commercial balena product itself: balenaCloud, balenaOS, balenaEngine, balenaFin, balenaEtcher, balenaEtcherPro, and Open balena. The mission of balena.io is to “unlock the promise of physical computing by reducing friction for fleet owners.” Everything an embedded device fleet owner might need to succeed, including balena’s cloud services, hardware offerings, software infrastructure, and any other customer-facing aspect of the commercial product, is a part of the balena.io Loop.
ProductOS is a platform for building products. The supporting infrastructure of “loops” is itself a “product” that would come from ProductOS. This one is a bit meta and harder to grok, so for now we’ll just leave it at this: ProductOS is the part of balena responsible for coming up with and building better systems to get things done, and its customers are product builders. For our purposes, ProductOS’s biggest function is building and maintaining Jellyfish, balena's proprietary internal software that organizes Loops and provides the communication tools for the organization’s unique structure.
TeamOS strives to deliver the best fit team for each Loop. Anything team members need to make the best of their time in the company lives in TeamOS. This includes everything from how people are hired and how they work within the company, including interactions with other members, to evolving their broadly defined roles.
CompanyOS deals with the needs of shareholders, board members, and other legal and regulatory authorities. CompanyOS covers the financial, legal, and all other administrative aspects necessary to allow every other Loop to thrive in its unconventional structure, safe from the harsh world outside.
The four main Loops are the minimum set of Loops required to build a fully functional company. The main product Loop supports all other Loops financially, TeamOS provides talent to each Loop, CompanyOS provides a legal and financial framework to each Loop, and ProductOS provides software infrastructure to each Loop. Each Loop supports all the others, and together they cover the entirety of our operational concerns. On this foundation, more (externally available) product Loops can be added, or even some of the “internal” Loops can be made more broadly available.
This is core to our organizational philosophy: The key difference between a product and an organization is that a product can be seen from the outside, examined as a whole, questioned, and intentionally and continuously iterated. Instead of tribal knowledge, half-written wikis, and painful human-driven processes, each Loop aspires to be as smooth and automated as our main product is. If we could make something as complex as fleet management easy and self-serve, why can’t we do the same for the core functions our team needs, especially given the fact that our team is the one thing we spend the vast majority of our budget on? Teams tend to take care of the things that are intended for them last, leading to all sorts of chaos and pain. This bias is why they need to be seen as customers to internal platforms, and it’s how we aspire to stave off entropy in all aspects of balena.
This is all well and good in theory, but how does a Loop actually work? Let’s now turn to the process of a Loop. They’re called Loops in the first place because inherent to their function is a “loop” process of constant feedback that results in continuous improvement. This process is itself also called a loop (lowercase-l), and it defines the way in which the big four (capital-L) Loops improve based on feedback.
All four Loops use the same loop process to improve. Let’s go through the steps of that process to get an idea. We’ll start from the node at the top called “surface”.
The surface is everything that is externally available to users: what the loop is responsible for toward third parties. Whether it is the main product, documentation, a blog post, or a Twitter account, anything that the outside world can see and justifiably associate with the product is part of the surface. In addition, any ongoing operational activities that are required for these things to continue to be available are also included in the surface. The surface should be able to stand alone and provide value, if we were to ignore the dimension of change over time. Several parts of the surface are instrumented such that they emit signals. These could include customer support, social media, backend infrastructure, various pieces of software, surveys, security bounties, etc.
Any signal that is generated from the surface is posted to a channel. Channels such as customer support, outreach, and security are monitored and processed by the team, with two main goals: first, handle the issue at hand, and second, learn as much as possible so the issue does not recur. Of the two, the second is the more important, as it prevents future occurrences of the same or similar issues. All channel signals should ideally be attached to patterns in the knowledge base.
The knowledge base consists of patterns (or symptoms) that the loop receives signals about, across all channels. It’s important that a single knowledge base holds all patterns, so that signals can be unified across channels and patterns can be identified even if individual channels give only partial and weak signals. As a pattern receives higher-volume or higher-urgency signals, its relative importance is raised so that the loop team can see it emerge.
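As a rough illustration of how one shared knowledge base lets weak signals from separate channels add up, here is a minimal Python sketch. It is not how Jellyfish works internally: the class names, the pattern names, the 1-to-3 urgency scale, and the scoring rule (importance as the sum of signal urgencies) are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Signal:
    channel: str  # e.g. "support", "social media", "backend"
    urgency: int  # hypothetical scale: 1 (low) to 3 (high)


@dataclass
class Pattern:
    name: str
    signals: list[Signal] = field(default_factory=list)

    @property
    def importance(self) -> int:
        # Assumed scoring rule: every signal adds its urgency, so both
        # higher volume and higher urgency raise the pattern's importance.
        return sum(s.urgency for s in self.signals)


class KnowledgeBase:
    """A single store of patterns shared by every channel."""

    def __init__(self) -> None:
        self.patterns: dict[str, Pattern] = {}

    def attach(self, pattern_name: str, signal: Signal) -> Pattern:
        # Create the pattern on first sight, then attach the signal to it.
        pattern = self.patterns.setdefault(pattern_name, Pattern(pattern_name))
        pattern.signals.append(signal)
        return pattern

    def ranked(self) -> list[Pattern]:
        # Most important patterns first, so the loop team sees them emerge.
        return sorted(
            self.patterns.values(), key=lambda p: p.importance, reverse=True
        )


kb = KnowledgeBase()
# Three weak-to-moderate signals from different channels hit the same pattern.
kb.attach("device fails to provision", Signal("support", 1))
kb.attach("device fails to provision", Signal("social media", 1))
kb.attach("device fails to provision", Signal("backend", 2))
kb.attach("docs typo", Signal("support", 1))

top = kb.ranked()[0]
print(top.name, top.importance)
```

No single channel here reports the provisioning problem loudly, yet the pattern still rises to the top because all three signals land in the same place.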
Patterns, preferentially the higher-priority ones, are linked to improvements. Improvements are proposals for changes to the surface (and can apply to any number of components of the surface) that address one or more patterns. Improvements are expressed not in terms of the surface itself, but in terms of a high-level model of the surface, or in short, the Model. The relationship between the surface and the Model can be thought of as that between a phenotype and a genotype. On occasion, improvements can be proposed without being backed by patterns, as an expression of a broader vision, but in our experience even those improvements usually reflect broad-based learning drawn from reviewing a large number of patterns. Improvements cover the “why, what, how” questions of a change as holistically as possible.
An improvement that is approved for implementation is converted into issues that are filed in the respective source code repositories. From this point on, things start to look a lot like a classic software development process, with issues becoming pull requests (PRs), which in turn get merged into new versions of a particular source code repository.
These versions of various components flow into the various products. A component may be used in multiple products, in which case it can trigger all those products to produce a new release.
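The fan-out from one component version to several product releases can be sketched as a simple lookup. This is only an illustration: the product and component names, and the `products_to_release` helper, are hypothetical.

```python
# Hypothetical mapping of products to the components they are built from.
PRODUCT_COMPONENTS = {
    "product-a": {"component-x", "component-y"},
    "product-b": {"component-y"},
    "product-c": {"component-z"},
}


def products_to_release(component: str) -> list[str]:
    """Return every product that should cut a new release when
    `component` publishes a new version."""
    return sorted(
        product
        for product, components in PRODUCT_COMPONENTS.items()
        if component in components
    )


# component-y is shared, so a new version of it triggers two releases.
print(products_to_release("component-y"))
```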
The product releases in turn are deployed back into the surface, getting back to where we started with our loop. At any point of the loop, if a technical problem, lack of clarity, confusion, or other unforeseen issue appears, the team is encouraged to submit their issue for a brainstorm. Brainstorms happen almost every day of the week, dedicated to different loops and aspects of each (we usually differentiate architecture or “how” questions from product or “what” questions).
A unique feature of the loop is worth mentioning here. Because all the aforementioned terms are tracked as interlinked data objects, when a new release is deployed to production we are able to trace back our steps and automatically mark versions as released, issues as resolved, improvements as completed, and patterns as addressed, and even reach all the way back to the original channel signals that started the process. For instance, if a feature that originated from a support request is released, we have enough information in the system to automatically resurface the appropriate support ticket, let the user know of the change made as a result of their request, and ask them whether it works for them.
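The trace-back described above can be pictured as a walk over linked objects. The Python sketch below is an assumption-laden illustration, not balena's implementation: the `Node` type, the status strings, and the rule that everything reachable from a released version is marked done are all simplifications invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "version", "issue", "improvement", "pattern", or "signal"
    name: str
    status: str = "open"
    # The objects this one was derived from, e.g. an issue's improvement.
    sources: list[Node] = field(default_factory=list)


# Assumed status applied to each kind of object once a release is deployed.
DONE_STATUS = {
    "version": "released",
    "issue": "resolved",
    "improvement": "completed",
    "pattern": "addressed",
    "signal": "follow-up due",
}


def trace_back(released_versions: list[Node]) -> list[Node]:
    """Walk from released versions back to the originating signals,
    marking every object on the way. Returns the signals to follow up on."""
    follow_ups: list[Node] = []
    seen: set[int] = set()
    stack = list(released_versions)
    while stack:
        node = stack.pop()
        if id(node) in seen:  # shared objects are visited only once
            continue
        seen.add(id(node))
        node.status = DONE_STATUS[node.kind]
        if node.kind == "signal":
            follow_ups.append(node)
        stack.extend(node.sources)
    return follow_ups


# Hypothetical chain: support ticket -> pattern -> improvement -> issue -> version.
ticket = Node("signal", "support ticket")
pattern = Node("pattern", "example pattern", sources=[ticket])
improvement = Node("improvement", "example improvement", sources=[pattern])
issue = Node("issue", "example issue", sources=[improvement])
version = Node("version", "example version", sources=[issue])

to_notify = trace_back([version])
print([n.name for n in to_notify], improvement.status)
```

In a real system a pattern would likely count as addressed only once every linked improvement has shipped; the sketch skips that bookkeeping to keep the traversal visible.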
We have been running this workflow, which we call snapback, at balena for years. The surprised reactions of our customers when they realize that a support request led to a material improvement without any need for payment or cajoling, and that we even remembered their original thread to inform them, are priceless. It is this kind of otherwise-impossible operation that makes us hopeful that the loop is the foundation of an entirely novel approach to collective intelligence, one that can even exhibit collective memory of specific contexts in a way organizations typically don’t.