Separating SQL from code with files: Initial Thoughts

Recently I've been working on a side project (that may or may not get published publicly, I haven't decided yet). For this project, I decided I'd try a lot of new things, not all of which worked out as well as I thought (which may be why it stays private). I've tried a new Java/Kotlin framework, briefly tried a new JVM build system (and then went back to Maven), tried new control flow mechanisms, new error handling techniques, etc. It's been more of a playground for new ideas than a true product.
One of the new ideas I've been trying deals with managing SQL queries. A lot of applications and frameworks I've worked with rely on Object Relational Mappers (ORMs) to manage the interface between code and the database, which is sort of okay, but it has always felt weird to me, especially once you get multiple data types involved in the same request. Which is why I'm experimenting, to see if there's a more natural or flexible way of interacting with the database.
SQL through ORMs
For instance, let's consider a site where users can log in and select different site-wide settings, such as themes, fonts, etc. To make our logic simpler throughout the code, we want to ensure that users always have a settings entry. This means that when we create a user, we'll add a record to the users table *and* we'll add a record to the user settings table. To do that with an ORM, we would have code similar to the following.
// Imports are hidden for brevity
// src/main/kotlin/User.kt
@Dao
data class User (val email: String, @JsonIgnore val password: String)
// src/main/kotlin/UserSettings.kt
@Dao
data class UserSettings(
val userId: Long,
val theme: String,
val font: String,
// ... Any other settings go here
)
// src/main/kotlin/UserRepository.kt
class UserRepository : Repository<User, Long>() { /* Methods go here */ }
// src/main/kotlin/UserSettingsRepository.kt
class UserSettingsRepository : Repository<UserSettings, Long>() { /* Methods go here */ }
// src/main/kotlin/UserController.kt
@Controller
@RequestMapping("/users")
class UserController {
@Autowired lateinit var userRepository: UserRepository
@Autowired lateinit var userSettingsRepository: UserSettingsRepository
@RequestMapping("/new")
@ResponseBody
fun createUser(params: NewUserRequest /* class definition not shown for brevity */) {
NewUserValidator(params).assertIsValid() /* class & method definition not shown for brevity */
val id = userRepository.create(User(params.email, PasswordHasher.hash(params.password)))
// User is in the DB but not settings
// A crash here could be bad
userSettingsRepository.create(UserSettings(id, "light", "Consolas"))
}
}
// src/main/kotlin/App.kt
@SpringBootApplication
class App {}
fun main(args: Array<String>) = SpringApplication.run(App::class.java, *args)
ORMs have some issues. First, there's a lot of "abstraction" boilerplate. Not only are we defining the objects we're working with, but we're also defining a repository for each object. So anytime I add a new table or data type, that's two more classes to add. Additionally, anytime I want to use a table, I have to inject that table's specific repository. That's a lot of friction for adding a table to the DB.
There's another side effect too. Since my classes and my database are tightly coupled, I can't change one without impacting the other. This means that optimization techniques, such as normalizing or denormalizing the database, must be reflected in code. That increases costs for optimizations dramatically.
There's a third issue. Since we're using an ORM, we can no longer hire a DBA to come in and just profile or optimize our SQL queries. All of our queries are generated by the ORM, and that's it. If we wanted to optimize our queries, we'd have to change the code (and the DB structure).
Finally, there's one last piece that seems off. Whenever we have to work with multiple related tables, we often have to make separate requests for each table. For example, above we make one request to create the user entry, we then get the id for the new user, and then we make a second request for the settings. Not only is this slower (twice as many network requests), but it's a lot more dangerous, especially if the ORM doesn't wrap the entire HTTP request in a transaction.
Assume that there isn't a transaction surrounding the request. This is fairly realistic for older systems or systems that have "custom" ORMs. When a failure occurs between the two requests (either due to faulty logic being introduced or a network connection being lost), suddenly our database has corrupted data. For instance, we may have intended for a user to always have a user settings entry. But if we fail after inserting a user into the database but before saving the settings entry, then the settings entry will be missing. If any endpoint assumes that our intended invariant always holds, then the user will encounter an error whenever their request hits that code path. This could result in weird settings, or the user not being able to "log in" since the site crashes as soon as it tries to load their profile.
Now let's assume we use an ORM which wraps each incoming HTTP request in a single database transaction. On a failure, the database will roll everything back, so there are no partial writes and no missing settings entry. So everything's good, right? Not quite. Even with a transaction, the system can get into a bad state. Imagine that we had an email sending system, and we wanted to always send a welcome email whenever a user is created. Someone who wanted to guarantee that the email is always sent might add that code to the user repository so that no one ever forgets to call the "email sending method." Now, I know that adding it to the UserRepository is not "best practice" since it breaks "single responsibility", but I've seen many devs write code like this, so it's almost certainly in a few different production systems.
class UserRepository : Repository<User, Long>() {
@Autowired lateinit var emailService: EmailService
override fun create(user: User) {
super.create(user)
emailService.sendWelcomeTo(user.email)
}
}
As a reminder, here is our user create code:
@RequestMapping("/new")
@ResponseBody
fun createUser(params: NewUserRequest /* class definition not shown for brevity */) {
NewUserValidator(params).assertIsValid() /* class & method definition not shown for brevity */
val id = userRepository.create(User(params.email, PasswordHasher.hash(params.password)))
userSettingsRepository.create(UserSettings(id, "light", "Consolas"))
}
With one little hidden change, the impact of a failure changes. The database will realize that saving the settings failed and will roll back the user record. But the email system won't "unsend" the welcome email, so the user will still receive it. The user gets an email saying they have an account, but when they try to log in or recover their account, they're told it doesn't exist.
Now imagine that instead of simply sending an email, we processed a payment for a service we offered. Now the user is without the service we promised them, and we've also taken their money.
With ORMs that need multiple requests, these types of problems are often hidden, or at least non-obvious. I've seen them be the source of some hard-to-find and harder-to-fix bugs, where developers usually add some sort of "hack" and a comment saying "this fixes something scary, don't touch" (and I almost always come across that comment when trying to fix the scary thing the hack supposedly "fixed").
So, what alternatives are there to ORMs?
SQL through Stored Procedures
One practice I've used in the past is writing the SQL I need as stored procedures in the database. There are a few nice things about this approach. The first is that we have separated our code and our SQL. This means that we no longer have to reflect the logical SQL structure of tables in our code (or vice versa). It also means that DBAs can start looking at our queries and begin to optimize as needed.
It also means that our code can now run more complex queries in a single database request. For instance, instead of having two requests to create our user and user settings, it could be done in one request, as follows:
-- SQL code
-- newUser stored procedure
CREATE PROCEDURE newUser
@email VARCHAR(255),
@password VARCHAR(64),
@theme VARCHAR(25),
@font VARCHAR(32)
AS
BEGIN TRANSACTION
DECLARE @ID int;
INSERT INTO users (email, password) VALUES (@email, @password);
SELECT @ID = scope_identity();
INSERT INTO user_settings (user_id, font, theme) VALUES (@ID, @font, @theme);
COMMIT
GO
// Updated Kotlin code
// DAOs are no longer needed, so we're excluding them in favor of maps until
// they are useful
// src/main/kotlin/UserController.kt
@Controller
@RequestMapping("/users")
class UserController {
@RequestMapping("/new")
@ResponseBody
fun createUser(body: Map<String, *>, @Autowired dbConn: Connection) {
NewUserValidator(body).assertIsValid() /* class & method definition not shown for brevity */
val pdo = dbConn.prepareStatement("EXEC newUser @email = ?, @password = ?, @theme = ?, @font = ?")
pdo.setString(1, body["email"] as String)
pdo.setString(2, PasswordHasher.hash(body["password"] as String))
pdo.setString(3, "light")
pdo.setString(4, "Consolas")
pdo.execute()
}
}
// src/main/kotlin/App.kt
@SpringBootApplication
class App {}
fun main(args: Array<String>) = SpringApplication.run(App::class.java, *args)
A few things to note. One is that we now have some SQL in the code, namely the command to run a specific stored procedure. This isn't terrible as now we're only bound to the stored procedure name and parameters. However, it does open up the door for a developer to start using more and more custom SQL queries. This is especially problematic since prepared statements are harder to manage than inline queries (more on this below), so extra precautions will be needed to make sure that our inline SQL stays minimal.
The other thing to note is that we have prepared statements being used. In JDBC, prepared statements are fairly verbose, but they do give a lot more safety than just concatenating values into the SQL string, so they're worth it.
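To see concretely what that safety buys us, here's a pure string demonstration (no database involved, and the hostile input is made up): with naive concatenation, the input becomes part of the SQL text itself, while a prepared statement would bind the same value as data.

```java
public class ConcatDemo {
    // Naive string concatenation: the input is spliced into the SQL text
    public static String concatenated(String email) {
        return "SELECT * FROM users WHERE email = '" + email + "'";
    }

    public static void main(String[] args) {
        // A crafted "email" value smuggles a second statement into the query
        String hostile = "x'; DROP TABLE users; --";
        System.out.println(concatenated(hostile));
        // With a prepared statement ("... WHERE email = ?"), the same value
        // would be bound as data and could never alter the statement itself.
    }
}
```

The verbose `setString` calls are the price for making that class of bug structurally impossible.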
One nice thing is that we have gotten rid of four classes: User, UserSettings, UserRepository, and UserSettingsRepository. None of those classes are needed anymore since we can just use the built-in map type for the incoming data (and we can just use the map type for any data returned directly to the user).
One of the downsides of this approach is that we now have to figure out how to manage the stored procedure. Generally this is done through a database migration, but this now means that any change to how we query the database (such as getting an additional column in a SELECT statement) requires a migration. I'm not a huge fan of these additional migrations since they add a lot of friction to the development pipeline, especially in early stages of development when SELECTs and JOINs are in flux as teams try to figure out what data is needed to accomplish the (changing) feature requests.
One thing I did notice in the past is that this system quickly broke down for a lot of teams, who were annoyed at having to write a new DB migration just to add a new field to a query. This was especially common if a team member had previously caught the attention of the DBAs, which meant their migration scripts were more heavily scrutinized. These problems were amplified by poor management practices, frequent changes to requirements, and insane pressure to deliver features quickly and consistently (they literally announced that they would fire the devs who delivered the fewest features each year).
The result was that devs usually ended up turning their SQL tables into a key/value document store where all the data was a giant JSON blob. This meant they only ever had a few stored procedures (add entry, update entry, get entry, delete entry) which were set up once, and then they never worried about SQL migrations again. Instead, they would load everything into memory and process the data using their programming language of choice, ignoring the power of the database. All of this happened because a) the friction to make changes in code was much lower than the friction to change the database and b) developers were exposed to that friction often enough to engineer their way around it.
While not every organization which does stored procedures for everything will encounter these issues, it is important to look out for developers taking shortcuts (e.g. key/value stores and massive JSON blobs) and to take time to understand why these shortcuts are happening. It's usually not because the developers are bad at their job; rather, it tends to happen because the system the company set up encourages the "bad practices" (e.g. high friction combined with frequent exposure to that friction).
Teams that already have a database migration system will find this system pretty straightforward to implement. Many teams with a conducive environment and sense of discipline could maintain this pattern for a while. However, scaling this pattern organization wide and having it succeed is extremely difficult, if not impossible for many organizations.
SQL as files
Now we get to the system that I've been experimenting with. It's similar to the stored procedures with one main exception: queries are stored in resource files inside the codebase instead of as stored procedures inside the database. Below is an example to load and run SQL queries:
-- src/resources/createUser.sql
-- Note: you should use named parameters when using a driver which supports named parameters
BEGIN TRANSACTION
DECLARE @ID int;
INSERT INTO users (email, password) VALUES (?, ?);
SELECT @ID = scope_identity();
INSERT INTO user_settings (user_id, font, theme) VALUES (@ID, ?, ?);
COMMIT
// DAOs are no longer needed, so we're excluding them in favor of maps for now
// src/main/kotlin/SqlLoader.kt
@Component
class SqlLoader {
@Autowired lateinit var dbConn: Connection
fun loadFile(sqlFile: String): PreparedStatement {
// Load contents of SQL file from resources and return it
// Caching can be added if needed
val path = Path.of(Objects.requireNonNull(SqlLoader::class.java.classLoader.getResource(sqlFile)).toURI())
// Creating prepared statement since this is an SQL script
return dbConn.prepareStatement(Files.readString(path))
}
}
// src/main/kotlin/UserController.kt
@Controller
@RequestMapping("/users")
class UserController {
@Autowired lateinit var sqlLoader: SqlLoader
@RequestMapping("/new")
@ResponseBody
fun createUser(body: Map<String, *>) {
NewUserValidator(body).assertIsValid() /* class & method definition not shown for brevity */
val pdo = sqlLoader.loadFile("createUser.sql")
pdo.setString(1, body["email"] as String)
pdo.setString(2, PasswordHasher.hash(body["password"] as String))
pdo.setString(3, "light")
pdo.setString(4, "Consolas")
pdo.execute()
}
}
// src/main/kotlin/App.kt
@SpringBootApplication
class App {}
fun main(args: Array<String>) = SpringApplication.run(App::class.java, *args)
A few things to note about this example. First, it's longer than just using the prepared statement since we now have a class to manage loading our scripts. However, that class can be used for all of our queries. It's not a "one class per table" situation. It is more overhead, but it's reusable overhead, which is more palatable than overhead for "abstraction's" sake.
Second, we've removed all SQL from our code, including generated queries from ORMs.
Furthermore, we've removed the need to manage prepared statements and views. Instead, we now have SQL files that devs can easily change, iterate on, commit, review, and revert. The separation of SQL like this also comes with a nice advantage. A lot of SQL-based tools can now be used to write and debug the query without having to run the application code. This is especially helpful for IDEs that have SQL tooling built in, like IntelliJ.
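On the "caching can be added if needed" comment in SqlLoader: one subtlety is that a PreparedStatement holds bound parameters and belongs to a single connection, so what you'd cache is the SQL text, not the statement itself. Here's a minimal sketch of that idea (SqlCache and loadSql are hypothetical names, written in plain Java so it stands alone):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;

public class SqlCache {
    // Cache the SQL text, not the PreparedStatement: statements carry bound
    // parameters and are tied to one connection, so they shouldn't be shared.
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Path baseDir;

    public SqlCache(Path baseDir) { this.baseDir = baseDir; }

    public String loadSql(String name) {
        return cache.computeIfAbsent(name, n -> {
            try {
                return Files.readString(baseDir.resolve(n));
            } catch (Exception e) {
                throw new RuntimeException("Could not load SQL file: " + n, e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("sql-demo");
        Files.writeString(dir.resolve("createUser.sql"), "INSERT INTO users (email, password) VALUES (?, ?)");
        SqlCache cache = new SqlCache(dir);
        System.out.println(cache.loadSql("createUser.sql")); // reads from disk once, then cached
    }
}
```

Each caller would then build its own PreparedStatement from the cached text on its own connection.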
One thing that isn't as great is that we now have a separation between where the substituted parameters are declared and where they're used. If someone changes the query slightly, we could end up substituting in the wrong values. This problem is amplified on the JVM since plain JDBC prepared statements only support positional parameters, not named parameters. However, there are many drivers and libraries for various languages with named parameter support, and using named parameters greatly reduces this issue.
We are also referencing file names throughout our code. If a popular query's file gets renamed, it would require lots of changes (prepared statements have the same problem, but ORMs don't).
My solution to some of these problems is to have an enum of all possible queries and to have the enum maintain a "mapping" between parameter names and their SQL type and position. For this, I defined an enum class with all the queries and their parameters. Then, in my code, instead of specifying a file name, I specified an enum value and passed in the values. I also set up my SQL queries to use variables, so that how the data is used can be adjusted and moved around without breaking the whole integration flow. Below is what my enum class looked like and how I used it:
// src/main/kotlin/SqlQueries.kt
enum class SqlQueries(
val sqlFile: String,
val params: List<SqlParam> /* SqlParam maps JVM types to SQL types; omitted for brevity */
) {
NEW_USER(
"create_user.sql",
listOf(
SqlParam("email", String::class, 1),
SqlParam("password", String::class, 2),
SqlParam("font", String::class, 3),
SqlParam("theme", String::class, 4),
)
),
// ... rest of queries
;
}
// src/main/kotlin/Sql.kt
@Component
class Sql {
@Autowired lateinit var sqlLoader: SqlLoader
fun execute(query: SqlQueries, params: Map<String, *>) {
val pdo = sqlLoader.loadFile(query.sqlFile)
addParamsToPreparedStatement(pdo, query.params, params)
pdo.execute()
}
private fun addParamsToPreparedStatement(pdo: PreparedStatement, queryParams: List<SqlParam>, params: Map<String, *>) {
// ...
// Lots of boilerplate code with reflection to map an "Any"/"Object" type to the
// correct SQL type and position
// Algorithm was pretty simple though
// For every SqlParam in queryParams, pull out the map entry with the same name
// Then, coerce the value into the proper SQL data type (e.g. an array to an SQL array)
// Finally, call the proper "addX" method (e.g. addString) on the prepared statement
// and pass in the index in SqlParam
// ...
}
}
// src/main/kotlin/UserController.kt
@Controller
@RequestMapping("/users")
class UserController {
@Autowired lateinit var sql: Sql
@RequestMapping("/new")
@ResponseBody
fun createUser(body: Map<String, *>) {
NewUserValidator(body).assertIsValid() /* class & method definition not shown for brevity */
// This is where we run our query
sql.execute(
// Specify which query to use via an enum
SqlQueries.NEW_USER,
// Map of parameters to values
mapOf(
"email" to body["email"],
"password" to PasswordHasher.hash(body["password"] as String),
"theme" to "light",
"font" to "Consolas"
)
)
}
}
-- src/main/resources/create_user.sql
WITH
variables (v_email, v_password, v_font, v_theme) as (
VALUES (?, ?, ?, ?)
),
userInsert AS (
INSERT INTO users ("password", email) SELECT v_password, v_email FROM variables
RETURNING user_id
)
INSERT INTO user_settings (user_id, font, theme)
SELECT user_id, v_font, v_theme FROM userInsert, variables;
Again, it's not perfect. There's still potential for SQL parameters to not get set properly, but it's better (and more readable) than before, and it does shield a lot of the code from changes. For instance, if we changed the order of our parameters, we'd only have to adjust the query list in the enum and the rest of the code would stay the same. Likewise, if we removed a parameter, we wouldn't have to change the calling code since the extra keys in the map would just be ignored. Or if we renamed a file, we'd only have to update the enum and not the entire codebase.
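The "lots of boilerplate code with reflection" step described in the comments above is, at its core, just reordering a name→value map into positional slots. Here's a minimal sketch of that core (in plain Java, with a hypothetical simplified SqlParam carrying only a name and a 1-based position; the SQL type coercion described above is omitted):

```java
import java.util.List;
import java.util.Map;

public class ParamMapper {
    // Hypothetical, simplified SqlParam: just a name and a 1-based position
    public record SqlParam(String name, int position) {}

    // Reorder named values into positional slots; extra keys in the map are ignored
    public static Object[] toPositional(List<SqlParam> queryParams, Map<String, Object> values) {
        Object[] slots = new Object[queryParams.size()];
        for (SqlParam p : queryParams) {
            if (!values.containsKey(p.name()))
                throw new IllegalArgumentException("Missing parameter: " + p.name());
            slots[p.position() - 1] = values.get(p.name());
        }
        return slots;
    }

    public static void main(String[] args) {
        List<SqlParam> newUser = List.of(
            new SqlParam("email", 1), new SqlParam("password", 2),
            new SqlParam("font", 3), new SqlParam("theme", 4));
        Object[] slots = toPositional(newUser,
            Map.of("email", "a@b.com", "password", "hash", "theme", "light", "font", "Consolas"));
        System.out.println(java.util.Arrays.toString(slots));
        // → [a@b.com, hash, Consolas, light]
    }
}
```

The real version would then walk the slots and call the matching setter (setString, setArray, etc.) on the prepared statement.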
Of course, there is the question of "why not use named parameters?" The answer is that named parameters don't exist in vanilla JDBC. I'd have to pull in a library for that. Right now, I'm trying to keep the ideas and libraries I'm playing with limited to what JDBC provides, since I'm already pushing it for a quick experiment project.
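That said, the core of named-parameter support is small enough to sketch: rewrite each `:name` placeholder to `?` and remember its positions. This is only a sketch, not a production parser — it doesn't skip string literals, comments, or Postgres `::` casts, which is exactly why a battle-tested library is preferable:

```java
import java.util.*;
import java.util.regex.*;

public class NamedParamSql {
    public final String sql;                           // rewritten SQL with '?' placeholders
    public final Map<String, List<Integer>> positions; // name -> 1-based positions

    public NamedParamSql(String namedSql) {
        // Naive placeholder scan; does NOT handle literals, comments, or :: casts
        Matcher m = Pattern.compile(":([A-Za-z_][A-Za-z0-9_]*)").matcher(namedSql);
        Map<String, List<Integer>> pos = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        int index = 0, copied = 0;
        while (m.find()) {
            out.append(namedSql, copied, m.start()).append('?');
            copied = m.end();
            pos.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(++index);
        }
        out.append(namedSql.substring(copied));
        this.sql = out.toString();
        this.positions = pos;
    }

    public static void main(String[] args) {
        NamedParamSql q = new NamedParamSql(
            "INSERT INTO users (email, password) VALUES (:email, :password)");
        System.out.println(q.sql);        // INSERT INTO users (email, password) VALUES (?, ?)
        System.out.println(q.positions);  // {email=[1], password=[2]}
    }
}
```

A setter method can then look up a name's positions and bind the same value to each one.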
That said, not all languages and code bases are limited to those restrictions. If we were in Node and had a library with named parameters, we could do something like the following:
// sql.ts
export enum QUERIES {
NEW_USER = "create_user.sql"
}
export async function runQuery(query: QUERIES, params: {[key: string]: any}): Promise<unknown> {
return await (new PreparedStatement(await fs.readFile(query, 'utf8'))).execute(params)
}
// user_controller.ts
import {QUERIES, runQuery} from 'sql.ts'
export interface User {
email: string,
password?: string
}
export async function create_user(user: User) {
await runQuery(
// Specify our query with a constant
QUERIES.NEW_USER,
// Map of named parameters
{
"email": user.email,
"password": hashPassword(user.password),
"font": "Consolas",
"theme": "light"
}
)
}
-- resources/create_user.sql
WITH userInsert AS (
INSERT INTO users (email, "password") VALUES (:email, :password) RETURNING user_id
)
INSERT INTO user_settings (user_id, font, theme)
SELECT user_id, :font, :theme FROM userInsert;
The concept does become simpler with the right tooling available. At some point, I may look at adding a named-parameter library to my Kotlin implementation. For now though, what I have works for my personal experiment project, and I'll probably continue the experiment with a non-JVM language in the future anyway.
I am looking forward to playing around with the concept more. The idea hasn't been fully tested yet: I still need to look at long-term maintainability, which will take time, and at some point I'll need to see how it does with collaboration.
In theory, it should be pretty straightforward in a collaborative environment. All the queries are defined in enum types which can be discovered with autocomplete and enforced by the compiler. However, theory does not always align with practice. For instance, having to create a new enum entry for every SQL file could add enough friction that the system starts to break down when scaled up. Or perhaps writing SQL queries carries more friction than creating boilerplate classes at scale.
The idea has promise, and I personally like it so far. We'll see how I still feel in a year or two.