All software APIs are about abstractions. The API itself is a way to abstract away implementation details, so that the code consuming the API doesn't need to worry about those details. Most of the time, that's a terrific benefit. Consider something like an operating system windowing API. As a developer, that windowing API means you don't need to learn how to do things like draw individual pixels onto the screen. For the most part, software abstractions are great. But when those abstractions start "leaking," things grow complicated very quickly. In this post, we're going to talk about what leaky abstractions are and how you should approach one when you find it.
In an ideal world, abstractions are airtight. The ideal API requires that you know absolutely nothing about how it does what it does. As a developer, you simply read the documentation, program your implementation, and things work exactly the way they should. You happily go on your way, probably to get a sandwich or something, never thinking about the underlying details behind that API. But many APIs don't work that way. Many APIs "leak" their implementation details out to developers who work with them. The more of those underlying details you need to understand, the more "leaky" we say that abstraction is. One of the most common examples of a leaky abstraction is the object-relational mapper, or ORM.
One of the difficulties of working with databases in programming languages is that when you make a query, the results are often unwieldy. This is especially true for relational databases, where you're dealing with data in tables, rows, and columns. The idea behind an ORM is that instead of picking through those results using something like a multi-dimensional array, the ORM will read the results and map them to software objects. That functionality provides a variety of advantages to software developers who use it. But it's the sort of abstraction that tends to leak quite a bit.
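To make the mapping idea concrete, here's a minimal sketch using Python's built-in sqlite3 module. The restaurants table, the Restaurant class, and the fetch_restaurants function are all made up for illustration, and a real ORM does far more than this. The core move is the same, though: rows go in, objects come out.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Restaurant:
    # Hypothetical model class; the fields mirror the table's columns.
    id: int
    name: str
    city: str


def fetch_restaurants(conn: sqlite3.Connection) -> List[Restaurant]:
    """Turn each result row (a plain tuple) into a Restaurant object."""
    rows = conn.execute("SELECT id, name, city FROM restaurants ORDER BY id")
    return [Restaurant(*row) for row in rows]


# An in-memory database with two sample rows (invented data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO restaurants (name, city) VALUES (?, ?)",
    [("Slice House", "Boston"), ("Noodle Bar", "New York")],
)

restaurants = fetch_restaurants(conn)
print(restaurants[0].name)  # attribute access instead of row[1]
```

The calling code never touches a column index, which is exactly the convenience an ORM is selling.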
The promise behind most ORMs is that your developers don't need to understand the details of the database they're working on. In the earliest days of ORMs, their developers would even claim that you could entirely change your underlying database without needing to change a single line of application code. Moreover, many ORMs will try to entirely abstract away the differences between database systems, meaning that you write the same code to query different kinds of databases.
All of that sounds great. When it works, it works terrifically. But speaking as an experienced developer: there are an awful lot of places it doesn't work. For starters, the reality of different query engines means that no matter how good your ORM, you're not really writing the same code for a lot of queries. Yes, you'll write similar or identical code for simple queries. But you'll need to write much more specific queries as the data you model grows more complicated.
The most common way that abstractions leak in ORMs happens when you deal with database-specific features or have to write complex queries. Maybe the most famous example of this comes from ActiveRecord, the ORM bundled with Ruby on Rails. ActiveRecord has long sought to be database-agnostic, supporting the goal that we talked about earlier, of being able to use any database system without rewriting your code. That's an admirable goal, but it means that they avoid APIs for things that aren't supported in every database. The most common of these is the SQL LIKE query. Because that query isn't supported by every database, ActiveRecord doesn't provide a native API for it. So, when a developer wants to make a query that matches on a substring, they need to write a pure SQL fragment to add to their query.
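Here's a rough sketch of what that kind of leak looks like, written in Python rather than Ruby. To be clear, UserQuery is a toy query builder invented for this post, not ActiveRecord's API or any real library's. It offers a tidy where method for equality checks, but for a substring match the only option is where_sql, which takes a SQL fragment written by hand.

```python
import sqlite3
from typing import Any, List, Tuple


class UserQuery:
    """Hypothetical, minimal query builder; not any real ORM's API."""

    def __init__(self) -> None:
        self._clauses: List[str] = []
        self._params: List[Any] = []

    def where(self, **equals: Any) -> "UserQuery":
        # The airtight part: callers pass column=value and never see SQL.
        # (Column names are interpolated, so they must come from trusted code.)
        for column, value in equals.items():
            self._clauses.append(f"{column} = ?")
            self._params.append(value)
        return self

    def where_sql(self, fragment: str, *params: Any) -> "UserQuery":
        # The leak: there is no substring API, so callers write SQL by hand.
        self._clauses.append(fragment)
        self._params.extend(params)
        return self

    def to_sql(self) -> Tuple[str, List[Any]]:
        sql = "SELECT name FROM users"
        if self._clauses:
            sql += " WHERE " + " AND ".join(self._clauses)
        return sql + " ORDER BY name", self._params


# Invented sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Hannah", "Boston"), ("Joanna", "Boston"),
     ("Bob", "Boston"), ("Susannah", "New York")],
)

sql, params = (
    UserQuery().where(city="Boston").where_sql("name LIKE ?", "%ann%").to_sql()
)
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)
```

The moment you type "name LIKE ?", you're responsible for details the builder was supposed to hide, such as whether LIKE is case-sensitive on your particular database.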
That approach of writing a pure SQL query directly from within your application is a classic example of a leaky abstraction. Knowing how the ActiveRecord API works is no longer enough; you also need to know how it compiles your code to the underlying SQL. The way that ORMs leak their abstractions leads to a situation where ORMs can be a great fit for the first phase of a project, but require more and more care and feeding as your project grows.
Let's consider a real-life application and how an ORM abstraction can leak. Consider an example food delivery app. Traditional ORMs provide great help for some of the things you'll want to do. For instance, if a user wants to see a list of their historical orders, that's probably no problem. It's likely that you'll have an object concept called a User, and then that will have a property called orders which lists out all of their historical orders. You might also be able to call additional filtering functions on that orders property, which might allow you to see all of the orders in the last month, or something similar.
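As a sketch of that easy case, here's roughly what the object model might look like in plain Python. The User and Order classes, the orders_since method, and the sample data are all assumptions made for illustration. A real ORM would typically turn the filter into a WHERE clause instead of filtering a list in memory, but the calling code would read about the same.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Order:
    restaurant: str
    placed_on: date


@dataclass
class User:
    name: str
    orders: List[Order] = field(default_factory=list)

    def orders_since(self, cutoff: date) -> List[Order]:
        # In-memory stand-in for the filtering an ORM would push into SQL.
        return [order for order in self.orders if order.placed_on >= cutoff]


# A fixed "today" keeps the example deterministic.
today = date(2024, 6, 30)
user = User(
    "sam",
    [
        Order("Slice House", date(2024, 6, 20)),
        Order("Noodle Bar", date(2024, 3, 2)),
    ],
)

recent = user.orders_since(today - timedelta(days=30))
print([order.restaurant for order in recent])
```

Nothing here requires you to think about tables or joins, which is why this kind of query is where ORMs look their best.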
But the ORM probably doesn't handle more complicated queries very well. For instance, let's imagine that a user shows up, wanting to place a new order. They likely have a specific type of food that they're looking for, and obviously they'll have a specific area in mind. Showing a user in Boston a list of restaurants that deliver in New York doesn't do any good. So, all of a sudden you need to start bounding your query by geographic parameters. Now imagine that the user is looking for a specific subtype of a particular food. Maybe they want pizza, but they also want a crust that's gluten free. It's unlikely that your database will store every possible subtype of food configuration in explicit columns. So now you're probably querying based on JSON attributes stored in specific JSON column types. Or maybe you're querying information from another source, like your geographic database.
In that situation, the dynamic query you build is probably pretty gnarly. You might be trying to cross-reference data from different data sources, or using multiple database functions that don't have an API within your ORM.
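For a feel of what that looks like, here's a small sketch using SQLite from Python. Everything in it is assumed for the sake of the example: the restaurants table, the menu JSON column, the sample rows, and the latitude/longitude bounding box standing in for a real delivery area. It also assumes your SQLite build includes the JSON functions, which most recent ones do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants (name TEXT, lat REAL, lon REAL, menu TEXT)"
)
# Invented sample rows; menu holds a JSON document as text.
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, ?)",
    [
        ("Slice House", 42.36, -71.06,
         '{"food": "pizza", "gluten_free_crust": true}'),
        ("Crust Bros", 42.35, -71.07,
         '{"food": "pizza", "gluten_free_crust": false}'),
        ("Big Apple Pies", 40.71, -74.01,
         '{"food": "pizza", "gluten_free_crust": true}'),
    ],
)

# Hand-written SQL: a geographic bounding box plus two JSON lookups.
# json_extract is a SQLite function; other databases spell this differently.
QUERY = """
SELECT name FROM restaurants
WHERE lat BETWEEN :min_lat AND :max_lat
  AND lon BETWEEN :min_lon AND :max_lon
  AND json_extract(menu, '$.food') = :food
  AND json_extract(menu, '$.gluten_free_crust') = 1
ORDER BY name
"""

# A rough box around Boston (approximate coordinates).
boston_box = {
    "min_lat": 42.2, "max_lat": 42.5,
    "min_lon": -71.2, "max_lon": -70.9,
    "food": "pizza",
}
names = [row[0] for row in conn.execute(QUERY, boston_box)]
print(names)
```

Even in this toy version, the query depends on one database's JSON syntax and on how it represents a JSON true value, and none of that is visible from the object model.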
No matter what kind of API you write or use, it will have leaks. That's just the nature of software development. So when we say that an abstraction is leaky, it's not the end of the world. It's simply an attribute of the code that you use. And ORMs are one of the most common examples. But that doesn't mean that you need to live with whatever leaks your ORM ships with. One of the most effective ways to ship high-quality software is to understand your APIs and pick the ones that suit your needs the best.
Sometimes, you pick one thing when you start out, and that works well for a while. Then, over time, your database access logic grows more complex, because your business does, too. That's when looking at Neurelo to simplify your database interactions can be a big help. Neurelo helps to break through the inefficiencies of ORMs, helping you to write clearer, more concise logic more quickly. If you're finding your ORM is leaking all over your application, today is a great day to try out a different approach.
This post was written by Eric Boersma. Eric is a software developer and development manager who's done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he's learned along the way, and he enjoys listening to and learning from others as well.