<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Coding with Matt]]></title><description><![CDATA[Diving into code and exploring languages, patterns, performance, and other aspects of making code better.]]></description><link>https://matthewtolman.com</link><image><url>https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png</url><title>Coding with Matt</title><link>https://matthewtolman.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 04:18:56 GMT</lastBuildDate><atom:link href="https://matthewtolman.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Matt Tolman]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mtolman@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mtolman@substack.com]]></itunes:email><itunes:name><![CDATA[Matt Tolman]]></itunes:name></itunes:owner><itunes:author><![CDATA[Matt Tolman]]></itunes:author><googleplay:owner><![CDATA[mtolman@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mtolman@substack.com]]></googleplay:email><googleplay:author><![CDATA[Matt Tolman]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Complexity Killed the Code]]></title><description><![CDATA[While everyone is rushing to generate as much code as possible, no one is stopping to ask what code should exist]]></description><link>https://matthewtolman.com/p/complexity-killed-the-code</link><guid 
isPermaLink="false">https://matthewtolman.com/p/complexity-killed-the-code</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Wed, 06 May 2026 01:19:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently I switched jobs. I went from a very large company to a smaller company. I went from everything self-hosted and on-premises (source control, CI/CD, ticket tracking, service deployment, feature flag management, internal package mirror, etc.) to everything being hosted on a cloud SaaS provider. It is very painful. Not because of anything that my new company has done wrong (they&#8217;re fabulous), but because the service providers are doing a lot of things wrong.</p><p>So, I checked the service providers out. I saw what they&#8217;ve been up to. Every single one that has had reliability issues, without fail, is generating as much code as they can with AI. Every single one.</p><p>The stable SaaS companies? Very little to no mention of using AI. Something is rotting in the software world. I&#8217;ll leave it for you to find out, because that&#8217;s not what I&#8217;m interested in writing about.</p><p>What I am interested in is the secondary trend that I found, am experiencing, and am seeing in tech. The reliable systems were architecturally simple. Their blogs talked about monolithic architectures. Or they were in the very beginnings of microservices, so their service was still mostly monolithic. Many had self-hosting options. Monorepos were common. The SaaS companies that had the most issues were also overly complicated from a software technology perspective. 
Tons of talk about microservices, aggressive scaling, eventing, micro frontends, micro repositories, chained deployment pipelines, etc<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. Anyone who architected these software &#8220;solutions&#8221; would be unable to deploy on anything less than an entire data center, much less a handful of servers (or even one server). Doesn&#8217;t matter how big the servers are. These people have designed a system with so many pods, so many clusters, so many dependencies that they exceed the operational capacity of an entire rack in just container overhead.</p><p>The scary thing was that there was no sign of the complexity getting better, only of it getting worse. And the reliability is going with it.</p><p>And yet, I can&#8217;t help but feel like just saying &#8220;make things simple&#8221; won&#8217;t help. People justify complexity a lot. Sometimes too much. And sure, the naive solution won&#8217;t always scale. But it&#8217;s hard to understand simplicity when our textbook examples of how to do basic things are overly complex.</p><h2>The Textbook CRUD Application</h2><p>I&#8217;m going to start with a textbook CRUD application. Most SaaS code grows from these. I&#8217;m going to assume we&#8217;re using middleware for session management, authentication, and route authorization. That means when I show an endpoint, I&#8217;m not going to show the code loading the session from a datastore and verifying it (or getting it from a JWT, or whatever). I&#8217;m just going to show the CRUD endpoint. We can also assume the user has access to whatever the CRUD endpoint touches, since unauthorized requests are filtered out before they reach this code. Same thing goes for rate limiting, CSRF mitigations, CORS, database migrations, etc. 
Basically, anything not in the database communication flow is assumed to be taken care of elsewhere.</p><p>This series of assumptions is to only make the example more focused. We want to focus on a small part of the application - the textbook example - rather than the entire application. We can show how the rest of the application fares when applying the same techniques later on. For now, let&#8217;s just focus.</p><p>The textbook example we&#8217;re going to focus on is simply CRUD endpoints for a todo app. Yes, a todo app. The simplest, most plain, brain-dead, overdone code that anybody could copy-paste from millions of first-year college students and get something reasonable. We&#8217;re going to start there, since apparently simplicity in that type of application is hard to find.</p><p>Also, we&#8217;re only going to do an API for todo items. To help solidify things, here&#8217;s the schema we&#8217;re working with in a Postgres database.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;1a662100-8a67-4b94-943d-3fe9dd23f590&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">CREATE TABLE todo_items (
  id BIGSERIAL PRIMARY KEY NOT NULL,
  user_id BIGINT NOT NULL REFERENCES users(id),
  item_name TEXT NOT NULL,
  item_description TEXT NULL,
  done BOOLEAN NOT NULL DEFAULT false,
  deleted BOOLEAN NOT NULL DEFAULT false
);
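
-- (Illustrative addition, not part of the original schema.) A partial index
-- matching the "user_id = ? AND deleted = false" lookups used by the
-- endpoints later in this post; the index name is an assumption.
CREATE INDEX todo_items_active_by_user_idx
  ON todo_items (user_id)
  WHERE deleted = false;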

-- session management and user data not shown to help us focus on todo items</code></pre></div><h3>The Textbook Method</h3><p>So, let&#8217;s define the textbook method. The &#8220;best practice&#8221; way to build our application. I&#8217;m not going to mirror an &#8220;ideal&#8221; MVC or MVVM or WYAI<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> method. I&#8217;m going to do things the Enterprise, Gold-Standard, In-Production, Patent-Pending methodology I&#8217;ve seen in so many codebases, with parallels advertised throughout my entire career (MVC and MVVM being some of those parallels - and yes, there really isn&#8217;t much difference that&#8217;s actually significant in any way). I&#8217;m talking about classes.</p><p>In this model, we usually have a class representing the database table. We also have a class representing the API schema we&#8217;re exporting to. If we need to do any logic/processing/filtering on the server side, we&#8217;ll have a &#8220;service model&#8221; class to represent the todo item internally - that way we can process &#8220;ephemeral&#8221; todo items that may be from a database record, or an event queue, or an API request<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><p>So, let&#8217;s create our models. I&#8217;m going to do them in Java because it&#8217;s where this type of code is widely accepted, even though it sucks. Feel free to use AI to translate it to TypeScript or C# or PHP or Go or whatever. 
Though I doubt you&#8217;ll need to because most of the ceremony is utter nonsense anyway.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;6e52f17f-1c48-4d03-9029-632564c9ca0f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">package db.models;

@Entity
@Table(name = "todo_items")
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
class TodoItem {
  @Id
  @GeneratedValue
  @EqualsAndHashCode.Include
  private long id;

  @Column(name = "user_id")
  private long userId;

  @Column(name = "item_name")
  private String name;

  @Column(name = "item_description")
  private String description;

  @Column(name = "done")
  private boolean done;

  @Column(name = "deleted")
  private boolean deleted;
}

package api.models;

@Data
@Builder
class TodoItem {
  private String id;

  private String name;

  private String desc;

  private boolean done;

  private boolean del;
}

package service.models;

@Data
@Builder
class TodoItem {
  private long id;
  private long userId;
  private String description;
  private boolean done;
  private boolean deleted;
}

package utilities;

class TodoItemMapper {
  @Autowired
  private SessionManager sessionManager;

  public service.models.TodoItem dbToService(db.models.TodoItem input) {
    return service.models.TodoItem.builder()
      .id(input.getId())
      .userId(input.getUserId())
      .description(input.getDescription())
      .done(input.isDone())
      .deleted(input.isDeleted())
      .build();
  }

  public db.models.TodoItem serviceToDb(service.models.TodoItem input) {
    return db.models.TodoItem.builder()
      .id(input.getId())
      .userId(input.getUserId())
      .description(input.getDescription())
      .done(input.isDone())
      .deleted(input.isDeleted())
      .build();
  }

  public service.models.TodoItem apiToService(api.models.TodoItem input) {
    var builder = service.models.TodoItem.builder()
      .userId(sessionManager.getCurrentUser().getId())
      .description(input.getDesc())
      .done(input.isDone())
      .deleted(input.isDel());

    try {
      builder = builder.id(Long.parseLong(input.getId()));
    } catch (Exception e) {
      builder = builder.id(0);
    }

    return builder.build();
  }

  public api.models.TodoItem serviceToApi(service.models.TodoItem input) {
    return api.models.TodoItem.builder()
      .id(String.valueOf(input.getId()))
      .desc(input.getDescription())
      .done(input.isDone())
      .del(input.isDeleted())
      .build();
  }
}

package repositories;

@Repository
interface TodoItemRepository extends CrudRepository&lt;TodoItem, Long&gt; {
  TodoItem findByIdAndUserId(long id, long userId);

  List&lt;TodoItem&gt; findByUserId(long userId);
}

package api.routes;

import api.models.TodoItem;
// other imports excluded because they're self-obvious

@RestController
public class TodoItemController {
  @Autowired
  private TodoItemRepository todoItemRepository;

  @Autowired
  private SessionManager sessionManager;

  @Autowired
  private TodoItemMapper mapper;

  @GetMapping("/")
  public List&lt;TodoItem&gt; index() {
    return todoItemRepository.findByUserId(sessionManager.getCurrentUser().getId())
        .stream()
        .map(x -&gt; mapper.dbToService(x))
        .filter(x -&gt; !x.isDeleted())
        .map(x -&gt; mapper.serviceToApi(x))
        .collect(Collectors.toList());
  }

  @GetMapping("/{id}")
  public TodoItem getById(@PathVariable Long id) {
    var record = todoItemRepository.findByIdAndUserId(id, sessionManager.getCurrentUser().getId());
    if (record == null) {
      throw new ResourceNotFoundException();
    }
    var item = mapper.dbToService(record);
    if (item.isDeleted()) {
      throw new ResourceNotFoundException();
    }
    return mapper.serviceToApi(item);
  }

  // so many more methods, I don't feel like continuing this pain
}</code></pre></div><p>This is the &#8220;textbook,&#8221; &#8220;maintainable,&#8221; &#8220;easy&#8221; code. It&#8217;s so verbose. It&#8217;s so much code. There&#8217;s so much abstraction. There&#8217;s so much going on. And it&#8217;s slow. And it&#8217;s not maintainable. And it&#8217;s not easy to use. And it&#8217;s really just nonsense.</p><p>Almost all of that code is useless. And I&#8217;ll prove it to you.</p><h3>Simple Code</h3><p>Let&#8217;s replace all that code, with the following.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;1124857d-aac0-4a48-a634-2f436c61c4ce&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">
package api.routes;

import api.models.TodoItem;
// other imports excluded because they're self-obvious

@RestController
public class TodoItemController {
  @Autowired
  private JdbcTemplate jdbcTemplate;

  @Autowired
  private SessionManager sessionManager;

  @GetMapping("/")
  public List&lt;Map&lt;String, Object&gt;&gt; index() {
    String sql = """
SELECT
  id AS "id",
  item_name AS "name",
  item_description AS "desc",
  done AS "done",
  deleted AS "del"
FROM todo_items
WHERE deleted = false AND user_id = ?
""";
    List&lt;Map&lt;String, Object&gt;&gt; data = jdbcTemplate.queryForList(sql, sessionManager.getCurrentUser().getId());
    return data;
  }

  @GetMapping("/{id}")
  public Map&lt;String, Object&gt; getById(@PathVariable Long id) {
    String sql = """
SELECT
  id AS "id",
  item_name AS "name",
  item_description AS "desc",
  done AS "done",
  deleted AS "del"
FROM todo_items
WHERE deleted = false AND user_id = ? AND id = ?
LIMIT 1
""";
    List&lt;Map&lt;String, Object&gt;&gt; rows = jdbcTemplate.queryForList(sql, sessionManager.getCurrentUser().getId(), id);
    if (rows.isEmpty()) {
      throw new ResourceNotFoundException();
    }
    return rows.get(0);
  }
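
  // Illustrative sketch (not from the original post): create and soft-delete
  // endpoints in the same direct-SQL style. Annotation and helper names are
  // assumed to match the rest of this example.
  @PostMapping("/")
  public Map&lt;String, Object&gt; create(@RequestBody Map&lt;String, Object&gt; body) {
    // Postgres RETURNING gives back the created row in the API's shape,
    // so no separate read is needed after the insert
    String sql = """
INSERT INTO todo_items (user_id, item_name, item_description)
VALUES (?, ?, ?)
RETURNING id AS "id", item_name AS "name", item_description AS "desc", done AS "done", deleted AS "del"
""";
    return jdbcTemplate.queryForMap(sql,
        sessionManager.getCurrentUser().getId(),
        body.get("name"),
        body.get("desc"));
  }

  @DeleteMapping("/{id}")
  public void deleteById(@PathVariable Long id) {
    // Soft delete: the row stays, but every query above filters it out
    String sql = "UPDATE todo_items SET deleted = true WHERE user_id = ? AND id = ?";
    int updated = jdbcTemplate.update(sql, sessionManager.getCurrentUser().getId(), id);
    if (updated == 0) {
      throw new ResourceNotFoundException();
    }
  }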
} </code></pre></div><p>So much simpler. So much less code. And so much more composable. For instance, let&#8217;s say we want to make sure the fields we select are standardized across our methods. We can easily do that with string concatenation.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;6bdeb5ec-3b2e-4e77-b4a3-17f87294fc7e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">@RestController
public class TodoItemController {
  @Autowired
  private JdbcTemplate jdbcTemplate;

  @Autowired
  private SessionManager sessionManager;

  private static final String FIELDS = """
  todo_items.id AS "id",
  todo_items.item_name AS "name",
  todo_items.item_description AS "desc",
  todo_items.done AS "done",
  todo_items.deleted AS "del"
""";

  @GetMapping("/")
  public List&lt;Map&lt;String, Object&gt;&gt; index() {
    String sql = "SELECT\n" + FIELDS + """
FROM todo_items
WHERE deleted = false AND user_id = ?
""";
    List&lt;Map&lt;String, Object&gt;&gt; data = jdbcTemplate.queryForList(sql, sessionManager.getCurrentUser().getId());
    return data;
  }

  @GetMapping("/{id}")
  public Map&lt;String, Object&gt; getById(@PathVariable Long id) {
    String sql = "SELECT\n" + FIELDS + """
FROM todo_items
WHERE deleted = false AND user_id = ? AND id = ?
LIMIT 1
""";
    List&lt;Map&lt;String, Object&gt;&gt; rows = jdbcTemplate.queryForList(sql, sessionManager.getCurrentUser().getId(), id);
    if (rows.isEmpty()) {
      throw new ResourceNotFoundException();
    }
    return rows.get(0);
  }
}</code></pre></div><p>And just like that, we have one place to update our field selection for our class. Similar things could be done with other aspects; however, it&#8217;s probably not necessary.</p><p>Now you might be thinking &#8220;well, that&#8217;s great if you are doing <em>basic</em> queries - but I need joins and nested data structures - no way that works!&#8221; Well, it does. You just haven&#8217;t learned SQL. Here&#8217;s one of my favorite SQL queries I&#8217;ve ever done, all in Postgres, and it handles formatting nested data, lists, etcetera. This was for quizzes, but I love it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;5228f456-2271-4d18-bb4b-e37cfa00a87f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT q.id as id,
        json_build_object('id', qn.id, 'text', qn.text) as question,
        json_build_object('id', m.id, 'type', m.type, 'url', m.url) as media,
        json_agg(json_build_object('id', a.id, 'text', a.text) ORDER BY a.pos ASC) as answers,
        ua."answerId" as "previousAnswer",
        ua."isCorrect" as "isCorrect",
        json_build_object('correctCount', uao."correctCount", 'totalAttempts', uao."totalAttempts") as summary
 FROM "quizSection" q
          INNER JOIN questions qn ON q."questionId" = qn.id
          LEFT JOIN media m ON qn."mediaId" = m.id
          INNER JOIN answers a ON a."quizId" = q.id
          LEFT JOIN (
     SELECT ua."quizId", ua."userId", ua."answerId", a."isCorrect"
     FROM user_answers ua
              INNER JOIN answers a ON ua."answerId" = a.id
     WHERE ua."userId" = ?
 ) ua ON ua."quizId" = q.id
  INNER JOIN user_answer_overview uao ON q.id = uao."quizId" AND ua."userId" = uao."userId"
 GROUP BY q.id, qn.id, m.id, ua."answerId", ua."isCorrect", uao."correctCount", uao."totalAttempts"</code></pre></div><p>This handles grouping quiz sections with the question data, potential media information, answers (sorted in a specific display order), what the user&#8217;s previous answer was, if the user got the answer correct, and the user&#8217;s total attempts (and number of correct attempts). It formats all of this as nested objects that I could simply JSON serialize and return to the user. It&#8217;s all in the format my API desired.</p><p>One query to aggregate a lot of data from different tables, and then simply serialize and return it. No mess of database models, service models, and API models. A single input parameter for the current user id. My server code was literally just the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">async (req: Request, res: Response) =&gt; {
   res.send(await db.query(`SELECT q.id as id,
        json_build_object('id', qn.id, 'text', qn.text) as question,
        json_build_object('id', m.id, 'type', m.type, 'url', m.url) as media,
        json_agg(json_build_object('id', a.id, 'text', a.text) ORDER BY a.pos ASC) as answers,
        ua."answerId" as "previousAnswer",
        ua."isCorrect" as "isCorrect",
        json_build_object('correctCount', uao."correctCount", 'totalAttempts', uao."totalAttempts") as summary
 FROM "quizSection" q
          INNER JOIN questions qn ON q."questionId" = qn.id
          LEFT JOIN media m ON qn."mediaId" = m.id
          INNER JOIN answers a ON a."quizId" = q.id
          LEFT JOIN (
     SELECT ua."quizId", ua."userId", ua."answerId", a."isCorrect"
     FROM user_answers ua
              INNER JOIN answers a ON ua."answerId" = a.id
     WHERE ua."userId" = ?
 ) ua ON ua."quizId" = q.id
  INNER JOIN user_answer_overview uao ON q.id = uao."quizId" AND ua."userId" = uao."userId"
 GROUP BY q.id, qn.id, m.id, ua."answerId", ua."isCorrect", uao."correctCount", uao."totalAttempts"`, req.session.user))
}</code></pre></div><p>Could things be simpler? Absolutely! The database model was definitely a mess. Having a view built-in to the database, less normalization (like why is quiz section and question data separate when there is a one-to-one mapping? why is media and question data separate when there is at most a one-to-one mapping?), and more would help. Ideally the query from the server would simply become the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:&quot;868c6b6a-c58f-449b-b74d-e77e331347ea&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">async (req: Request, res: Response) =&gt; {
   res.send(await db.query(`SELECT * FROM quiz_view WHERE user_id = ?`, req.session.user))
}</code></pre></div><p>Then, all the complexity above would be handled in the database layer with the migrations. I could then do things like use <a href="https://www.postgresql.org/docs/current/rules-materializedviews.html">materialized views</a> to cache read-heavy workloads, add additional indexes based on the query plan, and even modify the table structure without breaking application code (the view just has to map). There wouldn&#8217;t be this nonsense of having a data object match the table structure just so we then map it to a service model so we can operate on it in-memory and then map it to an API model so we can remove the data that we don&#8217;t show the user.</p><p>Which also brings up another point of mine.</p><h3>To the nay-sayers</h3><p>Some people are going to argue that the SQL code above is not really all that simple, or that it&#8217;s hard to read, or debug, or whatever. Except it&#8217;s not. Compare the SQL with the &#8220;much simpler&#8221; Repository code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;java&quot;,&quot;nodeId&quot;:&quot;8343f62a-a2c5-4fae-98b9-b4b8a40b4126&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-java">@Repository
interface QuizRepository extends CrudRepository&lt;Quiz, Long&gt; {
  List&lt;Quiz&gt; findQuizByUserId(long userId);
}</code></pre></div><p>Now, tell me what those functions above do <em>exactly</em>. Tell me what the SQL plan is. Tell me if there&#8217;s a scan or an index lookup. Tell me what joins happen, and when. Tell me how many network trips happen. Tell me when those network trips happen. Is it when I access a field, or sub-field, or a list element? Is it all at once, or is it lazy? Tell me when the framework switches between lazy and eager. Tell me how to debug it when it&#8217;s getting the wrong fields, or not enough fields, or making too many requests. Tell me what happens if I lose my database connection. Tell me what happens if I lose my database connection after I accessed some fields, but not others. Tell me how it runs on the database. Show me the execution plan. Tell me how it operates under load. Tell me how to make it faster. If you can&#8217;t answer all of those from that snippet and a connection to your production database, then your solution is objectively worse than mine.</p><p>With my snippet you can get all of that information and more. You can even get the current worst-case scenario and best-case scenario. You can see every index used and not used. You can control how much is loaded and when. You can add in materialized views with ease.</p><p>At the end of the day, for both code snippets to be functionally equivalent, the SQL in the second example must be <em>at least</em> as complex as what I showed earlier. You just can&#8217;t see it. Instead, you see mindless classes that offer no real value outside of extra memory allocations and memory copies for your garbage collector to clean up. You see annotations that are &#8220;magic&#8221; without thinking about the runtime cost (hint: most of those annotations use reflection - one of the slowest operations in any runtime/interpreter). You don&#8217;t see SQL, you see class method accesses. You don&#8217;t even see your class data members, just getters and setters. 
Meanwhile that SQL - the thing your database runs, and the slowest part of your request - is generated on the fly, at runtime, and run against your database without any sort of review at all. No DBA looked at it, no team member gave it a glance, and you did not even check if it&#8217;s using an index or not. You just shipped it.</p><p>At the end of the day, your users suffer because the query that you didn&#8217;t write - only generated - was slow. Your server suffers because you filled it with garbage memory and wasted millions of cycles per request copying data from class to class to class to class. Your garbage collector suffers from frequent, short-lived memory allocations blocking it from cleaning up the real stuff taking up memory (that&#8217;s how generational garbage collectors work - short-lived allocations block it from cleaning up long-lived allocations). Your database suffers because you&#8217;re hitting it with thousands of unvetted, unoptimized queries, and you don&#8217;t even know what they do, where they came from, why they sometimes get everything at once, or why they sometimes make hundreds of micro-queries per field.</p><p>All of this in the name of &#8220;abstraction.&#8221; Your abstraction brought in complexity, it killed your code, and AI cannot save you since it&#8217;s been trained to increase the poison - not lower it.</p><p>In the best case, you have DBAs pinging you at 2:00 in the morning asking what code is running such-and-such query that&#8217;s killing the database and you don&#8217;t know because you never wrote the query - and you have to find out. In the worst case, that urgent message is never sent because nobody cares, and your application joins the rest of the rot. Reliability goes from 99.99% to 99.9% to 99%, 98%, 95%, 92%, 90%, 85%, 70%, and down and down and down. Users start questioning your company. They look for alternatives. 
<a href="https://mitchellh.com/writing/ghostty-leaving-github">Some are forced off even though they once loved your product</a>.</p><p>And so, the product dies. Slowly. Very slowly. Until something breaks and a competitor gains an advantage. Then, irrelevancy. Just like MySpace and Yahoo Search. Maybe a rebrand happens, like Hotmail to Windows Live Hotmail. Or maybe an acquisition, like Sun Microsystems. Or just bankruptcy, like Palm.</p><p>Of course, that may not matter much for you. There&#8217;s always another dying piece of software that needs more poison, more bad abstractions, and more cycles wasted in the name of productivity and best practice. If, after all of this, you&#8217;re still fine with the complexity of the common abstractions, then go ahead. Use your AI tools as long as they&#8217;re affordable. Go nuts. I can&#8217;t convince you, so this post isn&#8217;t for you. If, on the other hand, you&#8217;re starting to question the textbook, then keep questioning it. Question the &#8220;best practice&#8221; because, at the end of the day, it&#8217;s almost all a cargo cult. Built to hide, not to know.</p><h2>Back to Microservices, AI, and SaaS</h2><p>We just looked at a very basic &#8220;best practice&#8221; that kills code. Now, stack that on top of itself. We have controllers, views, models, and database models. Add in eventing with its own models and controllers, and tie those into your existing models and databases. Add in services with API requests. Now wrap those requests in a cache layer, wrap those cache layers in models, and integrate those models with your existing models and services.</p><p>Now do that again. And again. And again. And you have a little series of services that does one little part of a feature. Now keep doing it several dozen more times. Now you have most of a feature. Do it several dozen more times, and now you have a button that can send an email some time in the future for an authenticated user. Keep going. Now you can send emails immediately. 
Keep going. Do it a few thousand times. Maybe a few million times. Eventually, doing it enough times, you get Gmail. Or maybe GitHub. Or NPM. Or any other SaaS<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><p>And yet, they didn&#8217;t need the microservices. They didn&#8217;t need the ORM, or the many model layers. Many of them don&#8217;t even need the eventing (or at least, not &#8220;proper&#8221; eventing - a database table with a status column and a single background job often suffices).</p><p>Even at scale, many companies don&#8217;t need microservices. It actually makes scaling harder. Now, to scale up one service, you need to scale up everything it touches - whether through event pipelines or direct API calls, anything it puts load on needs to scale too. But, to scale those services, you need to scale everything they touch. And so on.</p><p>And, if you ever want to do a sweeping change - like updating from Java 8 to Java 11 - you now need to do that work thousands to millions of times. Every service needs to be updated, tested, fixed, and tested again.</p><p>Now do that for your frontend too. Do that with Angular. Or React. Try to keep your Node versions updated. Upgrade your build pipelines so you can upgrade your Node. Do that thousands of times.</p><p>And keep the network stable. Make sure you don&#8217;t overload the network. Make sure that you don&#8217;t have too much latency. Make sure you don&#8217;t route too much traffic from thousands of services to a single server.</p><p>And now debug things when they go wrong. Try to figure out what actually happened when a user clicked a button. Which services were called, when were they called, how were they called (API? gRPC? event?). Which ones failed. Which ones tried to fix the failure and made it worse. 
Which API gateway decided to drop the message before it hit the service - and why.</p><p>Now handle incidents. Try to get the right teams on an incident call. I&#8217;m sure they all kept your wiki updated - especially after that reorg last month. Try to keep track of which services are unreliable. Track the cascading failure. Which ones take down your site? What unexpected parts went down?</p><p>Now do this across time zones. Have some developers in Central India and some on the US East Coast. They&#8217;ll never meet, or talk. Except maybe in an incident. How do they respond, under pressure, with complete strangers, in the middle of the night, after a long day of work, while the other people are about to go to lunch, and when the optics are bad?</p><p>Now add in some acquisitions. Little bubbles with completely different technologies, languages, patterns, design, and philosophy. Now apply pressure for them to conform or leave.</p><p>And finally, add in AI. Something designed to take all of the data of every instance of any of the above situations and amplify it. Automate it. But, do it in a way that is scientifically shown to be the most enticing and pleasing.</p><p>What do you expect the outcome to be? I expect it to mirror what we&#8217;re already seeing. Increased outages, especially from high-profile companies. Increased severity of outages when they happen. Decreased public satisfaction with software. Increased concern about relying on SaaS. In short, I expect our complexity to kill our code.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>It&#8217;s also interesting how the companies with the most complex code are the ones relying on AI the most, while the companies with less complex code don&#8217;t rely on AI as much. 
It&#8217;s almost like building a system that no person could ever hope to understand breeds a helplessness that heavy AI usage can prey on, while building a system that&#8217;s comprehensible builds a sort of resistance to AI.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Whatever-Your-Acronym-Is</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>And yes, this is very much real. I have worked on many production code bases with this as the simplest case. Usually it was a lot worse.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Almost all of them send emails now or in the future.</p></div></div>]]></content:encoded></item><item><title><![CDATA[A Library for JavaScript Threads]]></title><description><![CDATA[Complete with mutexes and other SharedArrayBuffer primitives]]></description><link>https://matthewtolman.com/p/a-library-for-javascript-threads</link><guid isPermaLink="false">https://matthewtolman.com/p/a-library-for-javascript-threads</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Sat, 28 Mar 2026 18:50:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently I wrote quite a bit about <a href="https://matthewtolman.com/p/playing-with-threads-in-javascript">threads in JavaScript</a>, 
<a href="https://matthewtolman.com/p/sharing-memory-across-threads-in">sharing memory</a>, <a href="https://matthewtolman.com/p/better-mutexes-in-javascript">building mutexes</a> (and <a href="https://matthewtolman.com/p/javascript-condition-variables">other synchronization primitives</a>), etc. It&#8217;s a lot of details, and it was really fun to write about. But, I had a problem. I didn&#8217;t have a way for developers to easily use the content without running into weird, undocumented, technical issues - and that&#8217;s kind of a problem.</p><p>It turns out, threads aren&#8217;t used all that much, and quite a bit of information I encountered was outdated or flat-out wrong (the highest-ranking article on how to use threads claimed that only strings could be passed as messages - something that is absolutely untrue). So, there was a lot of trial and error in just getting something basic for normal workers.</p><p>Since it&#8217;s such a problem, I decided to make a library (which I&#8217;ll cover in a bit). That library represents a culmination of trial and error, learning, and testing. Is it finished? Not yet. Mostly I&#8217;m reworking the algorithm for handling crashed threads in a thread pool. But, it&#8217;s good enough for a pre-1.0 release.</p><h2>Show Me the Code!</h2><p>First, let&#8217;s install it with NPM.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;9987c6b4-e9f8-4d72-a901-83119928b946&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">npm i -S peak-threads</code></pre></div><p>Now let&#8217;s do something simple. Send work, receive a response. No shared memory. 
Here&#8217;s the example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;20daa2ff-f102-4dfe-9913-d72024e72cc0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// main.js
import {Thread} from "peak-threads"

async function simpleExample() {
  const thread = await Thread.spawn('worker.js', {type: 'module'})
  return await thread.sendWork({op: 'add', inputs: [2, 5]})
}
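
// Note: as written, this example leaks the spawned thread. See the
// "Managing Threads" section below for close() and the closeWhenIdle option.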

// worker.js
import "peak-threads"

onwork = ({op, inputs}) =&gt; {
  switch (op) {
    case 'add': return inputs.reduce((a, b) =&gt; a + b)
  }
}</code></pre></div><p>The above code is fairly straightforward. In the main thread, you simply import the library, spawn a thread<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, wait for it to initialize, and then send it work. In the worker thread, you simply register a callback handler for when work comes, and you return the result. No need to manage listening or anything.</p><p>Of course, JavaScript has many ways of doing things, and TypeScript doesn&#8217;t like global handlers of unknown type like the one above. So, I have provided a function wrapper for registering callbacks that can be used instead. Example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;b2ebbcc2-8782-4c96-b2f7-5c3c5219db6f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import {registerHandler} from 'peak-threads'

registerHandler('work', ({op, inputs}) =&gt; {
  switch (op) {
    case 'add': return inputs.reduce((a, b) =&gt; a + b)
  }
})</code></pre></div><p>Practically the same code, just a little different dressing and more compatible with TypeScript.</p><p>We can also avoid copies for large data objects (e.g. image data) by &#8220;transferring&#8221; data over. For example, here we send an array buffer over:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;48b154a3-a91e-4831-9440-fe40e0719c5e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// main.js
import {Thread} from "peak-threads"

async function simpleExample() {
  const data = new ArrayBuffer(1024 * 1024 * 5)
  const array = new Int8Array(data)
  const thread = await Thread.spawn('worker.js', {type: 'module'})
  return await thread.sendWork(array, {transfer: array.buffer})
}

// worker.js
import {ResponseWithTransfer} from "peak-threads"

onwork = (arr) =&gt; {
  // do some work
  arr.set([1, 2, 3], 0)
  // transfer the memory back
  return ResponseWithTransfer(arr, [arr.buffer])
}</code></pre></div><h3>Managing Threads</h3><p>The above examples do leak resources since we don&#8217;t close the thread. We can fix that with either a call to &#8220;close&#8221; or a &#8220;closeWhenIdle&#8221; parameter to our thread spawn function.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;fbda0a12-5661-4dfc-b93d-1bccbef4442e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// main.js
import {Thread} from "peak-threads"

async function simpleExample() {
  // close when idle for 100ms
  const thread = await Thread.spawn('worker.js', {type: 'module', closeWhenIdle: 100})
  try {
    return await thread.sendWork({op: 'add', inputs: [2, 5]})
  }
  finally {
    // immediately close
    thread.close()
  }
}</code></pre></div><p>Better yet, let&#8217;s simply create a global thread pool to manage threads for us. We can even make the pool global so our whole app can use it!</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;dc175735-859c-4e83-a74f-6f850ee1341f&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// main.js
import {ThreadPool} from "peak-threads"

let getPool = ThreadPool.spawn('worker.js', {type: 'module'}) // returns a promise

async function simpleExample() {
  const pool = await getPool // since this is a promise, we need to await
  return await pool.sendWork({op: 'add', inputs: [2, 5]})
}</code></pre></div><p>The thread pool will now handle managing the threads for us!</p><h3>React and Vite</h3><p>If we&#8217;re in React (or another framework with contexts or async state), we could simply wrap our pool and not render any components that rely on the pool until it&#8217;s ready. If we&#8217;re using Vite, it might look something like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;baa4b3bc-624b-4bcf-a7f6-3275a491fd33&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import {ThreadPool} from "peak-threads";
import React, {createContext, useContext, useEffect, useState} from "react";

// Tell Vite to compile a separate worker entry point
// We'll then pass this URL to our thread pool
import WorkerUrl from "./worker.ts?worker&amp;url";

export const PoolContext = createContext((null as any) as ThreadPool);

export function ReactThreadPool({children}: any) {
    const [pool, setPool] = useState&lt;ThreadPool&gt;(undefined as any)

    useEffect(() =&gt; {
        ThreadPool.spawn(WorkerUrl, {type: 'module'}).then(p =&gt; setPool(p))
    }, [])

    return (
        &lt;PoolContext value={pool}&gt;
            {pool ? children : &lt;&gt;&lt;div&gt;Initializing...&lt;/div&gt;&lt;/&gt;}
        &lt;/PoolContext&gt;
    )
}

// app.tsx
function App() {
  return (
    &lt;ReactThreadPool&gt;
      &lt;MyComponentThatUsesPools /&gt;
    &lt;/ReactThreadPool&gt;
  )
}

// my-component-that-uses-pools.tsx
export function MyComponentThatUsesPools() {
    const [result, setResult] = useState(0)
    const [running, setRunning] = useState(false)
    const pool = useContext(PoolContext)

    return &lt;&gt;
        &lt;button
            disabled={running}
            onClick={
                async () =&gt; {
                    setRunning(true)
                    setResult(await pool.sendWork({type: 'expensive-calculation'}))
                    setRunning(false)
                }
            }
        &gt;
            Run Calculation
        &lt;/button&gt;
        &lt;p&gt;
            {result}
        &lt;/p&gt;
    &lt;/&gt;
}</code></pre></div><p>A little verbose, but not too bad. Most of it is &#8220;one-time&#8221; boiler plate that can easily be wrapped in a library. Also, it hides the &#8220;getting a pool is async&#8221; issue, so all of our threading code can assume a pool is ready.</p><p>Of course, most existing libraries out there already do stuff like this just fine. They can handle wrapping workers, or sending threads, or correlating messages just fine. So, let&#8217;s take it up a notch. Let&#8217;s go where other libraries don&#8217;t (or at least, not yet).</p><h2>The Cool Code</h2><p>First, while most libraries let you wrap workers in another class, they don&#8217;t let you <em>send</em> or <em>receive</em> a class. I do. Here&#8217;s the code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:&quot;ed689eb5-bbf3-4186-b458-d31399ec116c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">// my-typescript-class.ts
import {registerDeHydration} from 'peak-threads'

export class MyTypeScriptClass {
  private a: number
  private b: number

  constructor(a: number, b: number) {
    this.a = a
    this.b = b
  }

  public sum() {
    return this.a + this.b
  }

  //// Hydrate/Dehydrate Methods used for sending a class
  //// This is where the magic happens

  static hydrate({a, b}: {a: number, b: number}) {
    return new MyTypeScriptClass(a, b)
  }

  static dehydrate(instance: MyTypeScriptClass) {
    return {a: instance.a, b: instance.b}
  }
}

// magic line that makes the above hydrate/dehydrate methods work
registerDeHydration({key: 'MyTypeScriptClass', type: MyTypeScriptClass})
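
// Conceptually, the library does something like the following when sending an
// instance (illustrative only - the actual wire format is internal):
//   wire = {__key: 'MyTypeScriptClass', data: MyTypeScriptClass.dehydrate(c)}
//   ...wire crosses the thread boundary via structured clone...
//   c2 = MyTypeScriptClass.hydrate(wire.data)  // hydrate looked up via the __key tag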

// main.ts
import {MyTypeScriptClass} from './my-typescript-class.ts'
import {getPool} from './get-pool' // &lt;- sets up pool like we showed above

async function doSum(a: number, b: number) {
  const pool = await getPool
  const c = new MyTypeScriptClass(a, b)
  return pool.sendWork(c)
}

// worker.ts
import {registerHandler} from 'peak-threads'
import {MyTypeScriptClass} from './my-typescript-class.ts'

registerHandler('work', (c: MyTypeScriptClass) =&gt; c.sum())</code></pre></div><p>Here, we &#8220;send&#8221; a class and we &#8220;receive&#8221; a class. Really, behind the scenes we&#8217;re calling the dehydrate method to get a transferable object, tagging the object  with the key string from the register call, transferring it, and reversing the process on the other side (using the tag to know what hydrate method to call). It&#8217;s really just automatic serialization and deserialization. But, it works really well.</p><p>We can also use this coolness combined with initialization data. When a thread is spawned, we can set an option &#8220;initData&#8221; with our initial data to send to the thread. The thread is guaranteed to receive that data before the promise returns - so we know that everything is all setup and ready. Our thread registers an &#8220;oninit&#8221; handler which receives the initial data - that way we can save it or process it however we need to. Here&#8217;s an example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;c762300b-f6d2-4b55-9db3-14f373d2b82e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// main.ts
import {MyTypeScriptClass} from './my-typescript-class.ts'
import {Thread} from 'peak-threads'

async function doSum(a: number, b: number) {
  const c = new MyTypeScriptClass(a, b)
  const thread = await Thread.spawn('worker.js', {initData: c})
  return thread.sendWork()
}

// worker.ts
import {registerHandler} from 'peak-threads'
import {MyTypeScriptClass} from './my-typescript-class.ts'

let c: MyTypeScriptClass

registerHandler('init', (i: MyTypeScriptClass) =&gt; c = i)
registerHandler('work', () =&gt; c.sum())</code></pre></div><p>I use the serialization and initial data for transferring objects with SharedArrayBuffers provided by the library - such as mutexes! Here&#8217;s an example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;1174fe7f-d6fc-483d-ab85-b6158111b355&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// main.js
import {Mutex, Thread} from "peak-threads"

async function sharedMemExample() {
    const mem = new Int32Array(new SharedArrayBuffer(64))
    const mux = Mutex.make()
    const [thread1, thread2] = await Promise.all([
      // Initialize our thread with shared memory and a mutex
      Thread.spawn('worker.js', {initData: {mem, mux}}),
      Thread.spawn('worker.js', {initData: {mem, mux}})
    ])

    // lock the mutex, write to memory, and queue work
    await mux.lockAsync()
    mem.set([1, 2, 3], 0)
    
    const promise = Promise.all([
      thread1.sendWork({add: {v: 10, i: 0}}),
      thread2.sendWork({add: {v: 20, i: 2}}),
    ])
    // unlock to let them run
    mux.unlock()

    // wait for the results
    const [r1, r2] = await promise
    
    // Prints: 11, 23
    console.log(r1, r2)
}


// worker.js
import "peak-threads"

let memory, mutex
oninit = ({mem, mux}) =&gt; {
    // Save our initial data
    memory = mem
    mutex = mux
}

onwork = ({add}) =&gt; {
    // lock our memory
    mutex.lock()
    try {
        // Read from our memory, do some math, return
        // returned data is automatically sent back to the caller
        return add.v + memory.at(add.i)
    }
    finally {
        mutex.unlock()
    }
}</code></pre></div><p>The code above demonstrates how we can share memory between threads in a fairly straightforward manner.</p><p>Some might be wondering: why do we need to pass the mutex in the initial data? The answer is, there&#8217;s a hidden race condition where sometimes, when a shared array buffer is being sent to one thread (Thread A) while another thread (Thread B) is writing, Thread A may reset the shared buffer and lose Thread B&#8217;s data. I am not at all sure why this happens. I&#8217;ve been debugging it for a very long time. I get magical &#8220;resets&#8221; that don&#8217;t happen from my code, just the browser. I&#8217;ve been able to reproduce it quite reliably - though it only happens about 1 out of every 10,000 runs. The best workaround I&#8217;ve found is that I just need to wait for shared memory to &#8220;settle&#8221; before I start using it. That&#8217;s why I have initial data and an asynchronous spawn - it&#8217;s so that shared memory can &#8220;settle&#8221; before it&#8217;s used.</p><h2>Other Features</h2><p>My library has other features in it as well, such as sending messages without waiting for responses (called &#8220;events&#8221; - they can be sent/received from both sides), barriers, condition variables, wait groups, semaphores, etc. Also, I have optional debug logging that you can turn on with &#8220;setLogging&#8221;. This will print a lot of debug messages whenever events are sent or transformed, and it will print the thread id that it&#8217;s tied to (the thread ids also show the parent thread chain, so you can see &#8220;oh, this is a child of a child thread&#8221;, which I have found helpful). If you want to get a thread&#8217;s id, simply use &#8220;curThread()&#8221;. Do note that &#8220;setLogging&#8221; only turns on logs for <em>that thread</em>. It&#8217;s not a &#8220;global&#8221; logging setter. 
I did that so you can focus on debugging specific threads and not get a bunch of background worker noise (e.g. from another thread pool).</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;10dd8cf5-9f6d-4c3a-b4ca-65fccb86b912&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">import {curThread, setLogging} from 'peak-threads'

// This works from any thread - including the main thread!
setLogging(true) // turn debug logs on
setLogging(false) // turn debug logs off - default
console.log(curThread()) // prints the current thread id</code></pre></div><p>One other important note is that I have overridden the &#8220;postMessage&#8221; and &#8220;onmessage&#8221; handlers for workers (and Worker objects). From what I can tell, most libraries do this to some extent, as there really isn&#8217;t a lot you can do without overloading or wrapping it in some way. I just overloaded it instead of wrapping it. This means if you call &#8220;postMessage&#8221; you&#8217;ll get the automatic class sending, debug logging, etc. (but it will be triggered as an &#8220;event&#8221;, not a &#8220;work response&#8221; or &#8220;work request&#8221;). </p><h2>Links</h2><p>The source code can be found on <a href="https://github.com/matthewtolman/peak-threads">GitHub</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. I have also published it to NPM under the name &#8220;<a href="https://www.npmjs.com/package/peak-threads?activeTab=readme">peak-threads</a>&#8221;. Feel free to check it out there.</p><p>The license is MPL 2.0 - meaning you can use the project in commercial or non-commercial products without releasing your code. Only direct changes to the library itself need to be public (so if you make a change/bug fix, that change/bug fix needs to be shared somewhere<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>). 
I chose this license since it strikes a nice balance: commercial closed-source products can use the code, while the library itself remains open.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The &#8220;type: module&#8221; simply says &#8220;spawn this thread with ESM module support&#8221; - which is needed if you import my library with &#8220;import&#8221; rather than using &#8220;importScripts&#8221; and specifying a URL to the IIFE bundle. In other words, using ESM &#8220;import&#8221; means you get an ESM module. Using &#8220;importScripts&#8221; and IIFE means you get a good old-fashioned JavaScript library.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Technically, GitHub is a mirror, but I&#8217;m using it for issue tracking so that&#8217;s why I list it first.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>By &#8220;somewhere&#8221; I really do mean &#8220;somewhere&#8221;. It does NOT have to be a direct contribution to my repository, or to someone&#8217;s fork, or a community repository, or whatever. It doesn&#8217;t even have to be on the internet. It could be on a floppy drive or a piece of paper you mail to someone. That&#8217;s allowed by the license. I don&#8217;t really care where you put the modified copy. The goal is to make sure that people are sharing their changes to a free, publicly available library and not hoarding those changes. As for the end product - make money off of it. Keep the rest of your code private. 
Lock it in an underground vault. Do whatever. That&#8217;s your code. My code that you use stays open.</p></div></div>]]></content:encoded></item><item><title><![CDATA[WaitGroups in JavaScript]]></title><description><![CDATA[Signalling to the main thread without message passing]]></description><link>https://matthewtolman.com/p/waitgroups-in-javascript</link><guid isPermaLink="false">https://matthewtolman.com/p/waitgroups-in-javascript</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Sun, 01 Mar 2026 01:25:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So far in my JavaScript threading series, we&#8217;ve covered <a href="https://matthewtolman.com/p/playing-with-threads-in-javascript">spawning threads &amp; passing messages</a>, <a href="https://matthewtolman.com/p/javascript-mutexes">mutexes</a>, and <a href="https://matthewtolman.com/p/javascript-condition-variables">condition variables</a>. We can now start to have signaling and synchronization between threads, and we could start trying to make higher-level abstractions on top of what we have, such as channels (<a href="https://archives.matthewtolman.com/articles/2025-04-building-channels-in-cpp.html">which is something I did in C++ a while back</a>).</p><p>However, before we do, there is one additional primitive I want to introduce, and that is <a href="https://pkg.go.dev/sync#WaitGroup">Go-style WaitGroups</a>. Currently, we are still using message passing in our examples to tell the main thread that our thread&#8217;s work is &#8220;done&#8221; and that it can now read the value - which really undermines the whole reason we went through all the trouble to share memory in the first place. 
If we&#8217;re still going to pass messages to signal the main thread<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, then have we gained anything from sharing memory?</p><p>We <em>could</em> use a Condition Variable to signal things - but then we would also need to manage a mutex as well. Instead, what we want is a signaling primitive that is stand-alone - no mutex needed.</p><p>This is where Go&#8217;s WaitGroups come in. They are basically a waitable-counter. The main thread sets the initial value of the counter, and then waits until that counter hits zero. All of the other threads decrement the counter once they&#8217;re done with their work.</p><p>Here&#8217;s an example of how this would look in JavaScript:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;99d21cb5-c12d-4438-adf3-c81ee4303542&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// This would be in your main code
async function mainThread() {
  // Setup our memory and wait group
  const sab = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * 3)
  const ints = new Int32Array(sab)
  const wg = new WaitGroup(ints, 0)

  // Initialize our thread
  const worker = new Worker('my-worker.js')

  // Setup our work
  wg.add(1)
  worker.postMessage({__type: 'square', wg: 0, mem: ints, input: 3, dest: 1})

  // Setup another work item
  wg.add(1)
  worker.postMessage({__type: 'cube', wg: 0, mem: ints, input: 4, dest: 2})

  // Wait for the work to get done
  await wg.waitAsync()

  // Read our results
  console.log('Square of 3: ', ints.at(1))
  console.log('Cube of 4: ', ints.at(2))
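
  // Note: in a real app, terminate (or reuse) the worker once the results are
  // read - e.g. worker.terminate() - so this example doesn't leak the thread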
}</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;b92b0d0a-a8a5-43cc-9402-999c9f45c423&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// my-worker.js

// onmessage receives a MessageEvent - the payload is on its .data property
onmessage = ({data: msg}) =&gt; {
  if (msg.__type === 'square') {
     const v = msg.input
     msg.mem.set([v * v], msg.dest)
     new WaitGroup(msg.mem, msg.wg).done()
  }
  else if (msg.__type === 'cube') {
     const v = msg.input
     msg.mem.set([v * v * v], msg.dest)
     new WaitGroup(msg.mem, msg.wg).done()
  }
}</code></pre></div><p>With this code, we can now send tasks off to the worker thread, tell it where we want the results written, and then wait for the tasks to get done. Once the tasks are done, we simply read the memory. Everything is also self-contained in the message<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, so the worker needs no prior setup.</p><p>In a way, this is <a href="https://matthewtolman.com/p/playing-with-threads-in-javascript">simpler than the outstanding work map that stored Promise resolvers</a> which we had before. All of the control flow for the main thread is linear. There is no magic data structure somewhere to correlate sent messages with received messages, and there&#8217;s no weird onmessage handler which then uses said data structure. Instead, we send some work, we wait for it, and we read the result. Very beautiful.</p><p>On the worker side, it&#8217;s also fairly simple. We just call <code>done</code> when we&#8217;re done, rather than post a response back. We don&#8217;t have to parse and propagate a message id either. We call <code>done</code> and it takes care of the rest.</p><h2>Actually Making a WaitGroup</h2><p>So, now that we see how we would use a WaitGroup, let&#8217;s make one!</p><p>Fortunately, they&#8217;re very simple and only require a few atomics. No mutexes or condition variables necessary! (Hence why I call it a primitive.)</p><p>Here&#8217;s the code in TypeScript:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:&quot;7ec82de0-66f6-4d7d-9fce-5c9b5c36fc82&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">class WaitGroup {
    private memory: Int32Array
    private offset: number

    constructor(memory: Int32Array, offset: number) {
        this.memory = memory
        this.offset = offset
    }

    public add(count: number = 1) {
        // Very simple, just an atomic add
        Atomics.add(this.memory, this.offset, count)
    }

    public done() {
        // Check whether we should notify the waiters
        // We notify once the counter hits zero
        // Since Atomics.sub returns the *old* value, that means
        // we notify when the old value was 1 (i.e. the new value is 0)
        if (Atomics.sub(this.memory, this.offset, 1) === 1) {
            // with no count argument, notify wakes all waiters at this address
            Atomics.notify(this.memory, this.offset)
        }
        }
    }

    public wait(timeout: number = Infinity) {
        let lastTime = Date.now()

        // Loop until our counter hits zero
        while (true) {
            // Load our counter and see if we're zero!
            const cur = Atomics.load(this.memory, this.offset);
            if (cur == 0) {
                return true;
            }

            // Suspend when we're not zero
            if (Atomics.wait(this.memory, this.offset, cur, timeout) === 'timed-out') {
                return false
            }

            // remember to update the timeout value whenever we loop!
            if (Number.isFinite(timeout)) {
                let curTime = Date.now()
                let elapsed = curTime - lastTime
                timeout -= elapsed
                lastTime = curTime
                if (timeout &lt;= 0) {
                    return false
                }
            }
        }

    }

    public async waitAsync(timeout: number = Infinity) {
        // Same thing as above, but with promises now!
        let lastTime = Date.now()
        while (true) {
            const cur = Atomics.load(this.memory, this.offset);
            if (cur == 0) {
                return true;
            }

            // Yay promises!
            const {async, value} = (Atomics as any).waitAsync(this.memory, this.offset, cur, timeout)
            if (async) {
                if (await value === 'timed-out') {
                    return false
                }
            } else if (value === 'timed-out') {
                return false
            } else {
                // Always ensure we suspend for at least one micro-tick per cycle
                await new Promise(res =&gt; res(null))
            }


            if (Number.isFinite(timeout)) {
                let curTime = Date.now()
                let elapsed = curTime - lastTime
                timeout -= elapsed
                lastTime = curTime
                if (timeout &lt;= 0) {
                    return false
                }
            }
        }
    }
}</code></pre></div><p>The general idea is pretty straightforward:</p><ul><li><p>When we call &#8220;add&#8221;, we atomically increment the counter</p></li><li><p>When we call &#8220;done&#8221;, we atomically decrement the counter</p><ul><li><p>We&#8217;ll also signal once we hit zero</p></li></ul></li><li><p>When we call &#8220;wait&#8221;, we just loop until the counter hits zero while guaranteeing a suspend in every iteration</p></li></ul><p>Pretty straightforward stuff.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Unfortunately, due to how the standard operates the main thread will always need to send at least one message to the worker. This is due to the fact that the standard requires us to use message passing to pass in shared memory. So, when I say &#8220;remove message passing&#8221; I&#8217;m talking about removing the messages sent from the worker to the main/parent thread.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>We do have to reconstruct the wait group inside the worker simply because classes and class instances cannot be passed with message passing. We could use some very clever overrides of global methods to make the system automated - however we would still be reconstructing it on the other end. 
For explicitness, I&#8217;ll leave the manual reconstructions in plain sight for this series.</p></div></div>]]></content:encoded></item><item><title><![CDATA[JavaScript Condition Variables]]></title><description><![CDATA[Waiting for conditions to change]]></description><link>https://matthewtolman.com/p/javascript-condition-variables</link><guid isPermaLink="false">https://matthewtolman.com/p/javascript-condition-variables</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Mon, 16 Feb 2026 01:29:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a continuation of my threading in JavaScript series.</p><p>So far we&#8217;ve <a href="https://matthewtolman.com/p/better-mutexes-in-javascript?r=2alomj&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=true&amp;triedRedirect=true">created locks</a>, which let us make sure that only one thread accesses <a href="https://matthewtolman.com/p/sharing-memory-across-threads-in">shared memory</a> at a time. We&#8217;ve also updated the locks to have promises, so we can use them from the main thread. However, we don&#8217;t have a way to signal to the main thread that our work is done - at least not using shared memory<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. What we need is a way to signal to other threads. And we need something more robust and complete than simply a futex.</p><p>This is where condition variables come in. They let threads wait until a <em>condition</em> changes, not just when a memory address changes. These conditions can be complex or simple, and the developer (us) gets to define them.
For instance, we can wait until the account balance for a user is below a threshold, or we can wait until a queue has items or is empty. We could wait for other threads to finish their work. There are lots of possibilities!</p><p>Of course, when we&#8217;re dealing with so many possibilities, it usually means we&#8217;re dealing with a primitive, and that we&#8217;ll need to build up those possibilities ourselves. Which is true, condition variables are a synchronization primitive. But, once we have a primitive, we can learn the patterns around using that primitive (which I&#8217;ll cover more next post).</p><p>So, let&#8217;s get started.</p><h2>Condition Variable Usage</h2><p>Before we get too far into the implementation details, let&#8217;s look at how a condition variable will work. Here&#8217;s an example C program to highlight the usage patterns:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;1ebcd8c1-cb1c-4286-b7e3-2549b70174ef&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">typedef struct {
    Mutex   lock; // mutex for locking shared data
    CondVar sendCv; // Condition variable to wait for data to be sendable
    CondVar recvCv; // Condition variable to wait for data to be receiveable
    int     message; // Data to send/receive
} SharedQueue;

int receive(SharedQueue* q) {
    mutex_lock(&amp;q-&gt;lock); // lock our mutex

    // wait for a message
    while (q-&gt;message == 0) {
        // notice that we don't unlock here
        cond_var_wait(&amp;q-&gt;recvCv, &amp;q-&gt;lock);
    }

    // read our message
    printf("Received: %d\n", q-&gt;message);
    int res = q-&gt;message;
    q-&gt;message = 0;

    // Wake up a single waiter waiting to send
    cond_var_notify(&amp;q-&gt;sendCv, 1);

    // now we unlock
    mutex_unlock(&amp;q-&gt;lock);
   
    return res;
}

void send(SharedQueue* q, int msg) {
    mutex_lock(&amp;q-&gt;lock); // lock our mutex

    // wait for the queue to be empty
    while (q-&gt;message != 0) {
        // we don't unlock here either
        cond_var_wait(&amp;q-&gt;sendCv, &amp;q-&gt;lock);
    }

    printf("Sending: %d\n", msg);

    // send the message
    q-&gt;message = msg;

    // notify someone that the message was sent
    cond_var_notify(&amp;q-&gt;recvCv, 1);

    // unlock
    mutex_unlock(&amp;q-&gt;lock);
}</code></pre></div><p>Things look a little odd. We&#8217;re getting the lock so no one else can change things, but then we hold onto the lock while waiting for someone to change things. It appears that we have a deadlock. Except, we don&#8217;t.</p><p>What the above example doesn&#8217;t show is how the condition variable works. It turns out, when we call wait on a condition variable the condition variable will unlock the lock. That&#8217;s why we have to pass in the lock with the wait call - so the condition variable knows what to unlock.</p><p>But, wait, if it unlocks, then why aren&#8217;t we relocking when the method returns? Simply put, the condition variable will lock immediately before returning.</p><p>This &#8220;lock/unlock&#8221; inside a condition variable is what allows the condition to change, all while our code runs inside a locked context. Neat!</p><p>Though, you may be wondering what happens with the notify. We&#8217;re notifying before we unlock. Shouldn&#8217;t we switch the order?</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;2ae21c5a-4fe7-4820-9f10-f6b4c86fd791&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">    // unlock
    mutex_unlock(&amp;q-&gt;lock);

    // notify someone that the message was sent
    cond_var_notify(&amp;q-&gt;recvCv, 1);</code></pre></div><p>Well, we definitely could! Both patterns are generally valid. The only difference is where the other thread is waiting. The other thread is either waiting on the condition variable signal, or it&#8217;s waiting on the lock&#8217;s unlock signal. In both cases, it can&#8217;t proceed until we both unlock and notify.</p><p>One other thing to note, we have the whole &#8220;while&#8221; loop around the condition variable, but wasn&#8217;t the point to notify when something was done?</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;3ead4bca-b050-4f47-85e6-b182b832addb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">    // wait for a message
    while (q-&gt;message == 0) {
        // notice that we don't unlock here
        cond_var_wait(&amp;q-&gt;recvCv, &amp;q-&gt;lock);
    }</code></pre></div><p>Well, yes. But, we&#8217;re doing two separate atomic operations. We&#8217;re first notifying a thread, and then locking a mutex (or vice versa). Between those two atomic operations, another thread could come right on in and change the condition again (in this case, steal our message!). To handle that scenario, we need to loop and retry.</p><p>With condition variables, and pretty much any synchronization primitive, it&#8217;s very important for us to understand how it should be used. The usage will help drive the implementation, especially when we have to consider all sorts of edge cases - like threads stealing our state.</p><p>Now that we&#8217;ve covered the basics of how condition variables are used, let&#8217;s start making one!</p><h2>Building a Condition Variable</h2><p>There are two main operations we&#8217;ll be doing with a condition variable:</p><ul><li><p>Wait</p></li><li><p>Notify</p></li></ul><p>Our wait will have an async (Promise-based) and synchronous (blocking) version - both of which will have timeouts. Our notify will allow us to specify how many waiters we want to notify. We&#8217;ll have separate async/sync versions of the wait be separate methods. This gives us three methods in total. Here&#8217;s our scaffolding:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;fa98ee6a-94f1-456f-b72f-c33ac55763ff&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">function CondVariable() {
    // initialize here
    return {
        // mutex from previous posts
        wait: (mux) =&gt; {
            // TODO
        },
        waitAsync: async (mux) =&gt; {
            // TODO
        },
        notify: (count) =&gt; {
           // TODO
        }
    }
}</code></pre></div><p>We&#8217;ll start with the blocking wait and notify methods first, and then we&#8217;ll make the other methods from that. Also, let&#8217;s get things working before we do a timeout. </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://matthewtolman.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://matthewtolman.com/subscribe?"><span>Subscribe now</span></a></p><h3>Blocking Wait</h3><p>Let&#8217;s start off with the lock/unlock code first in our wait method, and stub the other data.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;019aa352-57bb-4a44-b06e-9b82d236e676&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">wait: (mux) =&gt; {
    // TODO: do something here to indicate we're waiting
   
    mux.unlock()

    // TODO: do something here to wait until something changes

    mux.lock()
}</code></pre></div><p>Well, so far so good. But now we have a problem. We need some sort of internal state that we can share across threads<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. </p><p>So, we&#8217;ll need to update our constructor to take in some memory and offset positions. We&#8217;re going to use two pieces of state. The very first piece is going to be the counter we&#8217;re waiting on. The second piece of state is a safeguard to make sure our counter becomes a different value after we started waiting<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. The safeguard becomes needed once we start getting more threads involved.</p><p>So, let&#8217;s update our constructor:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;c67d5760-7276-4c27-8756-c5e21bfe360b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// prevOffset is used to mitigate wrapping behaviors
// valOffset is used to wakeup our thread
function CondVariable(memory, prevOffset, valOffset) {
    // initialize here
    return {</code></pre></div><p>Now that we have our memory, let&#8217;s make our wait function. What we&#8217;re going to do is take our value that we&#8217;re waiting on, store it in our previous counter, and then wait for our value to change. By storing the previous value, we&#8217;ll allow our notifier to know what we waited on, and guarantee we get something unique regardless of how many threads are competing. Here&#8217;s the code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;2355af51-3a6a-4684-bb7e-1624e2cfc258&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">wait: (mux) =&gt; {
    const val = Atomics.load(memory, valOffset)
    Atomics.store(memory, prevOffset, val)
   
    mux.unlock()

    // If a notifier changed the value between unlock() and here, Atomics.wait
    // sees the mismatch and returns 'not-equal' immediately - no lost wakeup
    Atomics.wait(memory, valOffset, val)

    mux.lock()
}</code></pre></div><p>Now let&#8217;s write our notification code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;a7d0d501-907f-421a-84ee-152b6c862bc6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">notify: (count) =&gt; {
    const val = Atomics.load(memory, prevOffset)
    Atomics.store(memory, valOffset, (val + 1) | 0)
    Atomics.notify(memory, valOffset, count) // wake up to "count" waiters
}</code></pre></div><p>Notice something odd about this? We store the current value into previous, and then when we notify we do one more than previous, not one more than the current address.</p><p>The reason for this is to &#8220;rollback&#8221; when there are multiple waiters. Each waiter only cares if the value is different from &#8220;previous&#8221; when it&#8217;s woken up, not if it&#8217;s different from &#8220;previous + 1&#8221;. So, we can reuse the same value (&#8220;previous + 1&#8221;) to wake up all of our threads.</p><p>There&#8217;s also another odd thing. We&#8217;re doing &#8220;(val + 1) | 0&#8221;. This <a href="https://archives.matthewtolman.com/articles/unsigned-integers-in-javascript.html">emulates signed 32-bit integers</a> which gives us very specific overflow patterns - in this case, two&#8217;s complement wrapping. If we didn&#8217;t do this, at a certain point we&#8217;d hit the maximum &#8220;safe integer&#8221; range in JavaScript and we&#8217;d get stuck at the same number forever. So, it&#8217;s best that we define the wrapping mechanism.</p><p>The async code is very similar. Here it is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;e7a0405e-6d49-41eb-8a13-163172721439&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">waitAsync: async (mux) =&gt; {
    const val = Atomics.load(memory, valOffset)
    Atomics.store(memory, prevOffset, val)
   
    mux.unlock()

    const {async, value} = Atomics.waitAsync(memory, valOffset, val)
    if (async) { await value }
    else { await new Promise(r =&gt; r()) }

    await mux.lockAsync()
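
    // Usage sketch (hypothetical names - "cv", "mux", and "results" are not
    // defined in this post): the main thread can await a condition without
    // blocking the event loop:
    //   await mux.lockAsync()
    //   while (results[0] === 0) { await cv.waitAsync(mux) }
    //   mux.unlock()
    // If the value already changed, Atomics.waitAsync resolves synchronously
    // with { async: false, value: 'not-equal' }, hence the micro-tick branch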
}</code></pre></div><h3>Async and Timeouts</h3><p>Adding timeouts is very straightforward. We don&#8217;t have any loops, but we do have two separate calls that need timeouts. We&#8217;ll still do the timeout adjustment.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;9e09e28f-ba23-4ffd-97a7-347bfb50b739&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">wait: (mux, timeout = Infinity) =&gt; {
    const start = Date.now()
    const val = Atomics.load(memory, valOffset)
    Atomics.store(memory, prevOffset, val)
   
    mux.unlock()

    if (Atomics.wait(memory, valOffset, val, timeout) === 'timed-out') {
        return false // gave up; note the mutex is NOT re-acquired on timeout
    }

    if (Number.isFinite(timeout)) {
        const end = Date.now()
        const elapsed = end - start
        timeout -= elapsed
        if (timeout &lt;= 0) return false;
    }

    // propagate lock's own timeout result instead of always claiming success
    return mux.lock(timeout)
},
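
// Usage sketch (hypothetical names - "cv", "mux", and "sharedState" are not
// part of this post): a worker-side receiver with a 500 ms budget. A false
// return means we both gave up AND no longer hold the mutex:
//   mux.lock()
//   while (sharedState[0] === 0) {
//       if (!cv.wait(mux, 500)) { return } // timed out; mutex not held
//   }
//   mux.unlock()
// (Atomics.wait itself reports 'timed-out' when the budget expires.)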
waitAsync: async (mux, timeout = Infinity) =&gt; {
    const start = Date.now()
    const val = Atomics.load(memory, valOffset)
    Atomics.store(memory, prevOffset, val)
   
    mux.unlock()

    const {async, value} = Atomics.waitAsync(memory, valOffset, val, timeout)
    if (async) {
        if (await value === 'timed-out') return false
    }
    else if (value === 'timed-out') return false
    else await new Promise(r =&gt; r())

    if (Number.isFinite(timeout)) {
        const end = Date.now()
        const elapsed = end - start
        timeout -= elapsed
        if (timeout &lt;= 0) return false;
    }

    return await mux.lockAsync(timeout)
}</code></pre></div><h2>Wrap Up</h2><p>Well, that&#8217;s it! We&#8217;ve created a new synchronization primitive. Obviously, we need to spend more time figuring out how to use it (that&#8217;s next time!). However, between atomics, futexes, mutexes and condition variables we can basically create any other primitive we want. We can even start creating some more &#8220;modern&#8221; synchronization patterns, like Go&#8217;s <a href="https://pkg.go.dev/sync#WaitGroup">WaitGroup</a> and <a href="https://go.dev/tour/concurrency/2">Channels</a>, or C++20&#8217;s <a href="https://en.cppreference.com/w/cpp/thread/barrier.html">barrier</a>. Read/Write locks are also something we can make.</p><p>All of which I&#8217;ll be getting to in future posts, so stay tuned!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://matthewtolman.com/p/javascript-condition-variables?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://matthewtolman.com/p/javascript-condition-variables?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>We can do the built-in <code>postMessage</code> and <code>onmessage</code> <a href="https://matthewtolman.com/p/playing-with-threads-in-javascript">style message passing</a>, but we&#8217;re trying to move beyond that and do everything with <code>SharedArrayBuffers</code>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>In fact, we&#8217;ll need 
<em>two </em>pieces of internal state.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Remlab has a great article about why the two variables are needed: <a href="https://www.remlab.net/op/futex-condvar.shtml">https://www.remlab.net/op/futex-condvar.shtml</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Better Mutexes in JavaScript]]></title><description><![CDATA[Adding timeouts and async to our synchronization]]></description><link>https://matthewtolman.com/p/better-mutexes-in-javascript</link><guid isPermaLink="false">https://matthewtolman.com/p/better-mutexes-in-javascript</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Sun, 15 Feb 2026 02:29:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously we talked about some of the issues that <a href="https://matthewtolman.com/p/deadlocks-and-starvation-in-javascript">can happen when doing multi-threading</a>. Some of the mitigations involved used timeouts or &#8220;try&#8221; locks without actually blocking. One other issue we didn&#8217;t really discuss is that we can&#8217;t call blocking atomic methods (i.e. <code>Atomics.wait</code>) inside our main thread (basically our main JavaScript code).</p><p>If we take a step back and <a href="https://matthewtolman.com/p/javascript-mutexes">look at the mutexes we&#8217;ve made previously</a>, we&#8217;ll notice that we&#8217;re lacking on all of those fronts. We block, so our main thread can&#8217;t be used. We don&#8217;t offer timeouts. 
And we don&#8217;t offer a &#8220;try&#8221; to lock mechanism.</p><p>Let&#8217;s fix that.</p><p>For a refresher, here&#8217;s our mutex code (with some minor cleanup to be more readable):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;24c1739c-ffa5-4164-96b3-e942c86ddaaa&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">function mutex(memory, offset) {
    const unlocked = 0
    const locked = 1
    const contended = 2
    return {
        lock: () =&gt; {
            // Try to get the lock (will only lock if we're unlocked)
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) {
                return // got the lock
            }

            while(true) {
                // signal contention
                if (cur !== contended) {
                    Atomics.compareExchange(memory, offset, locked, contended)
                }

                // Wait until we're unlocked
                Atomics.wait(memory, offset, contended)

                // try to lock again
                cur = Atomics.compareExchange(memory, offset, unlocked, contended)

                if (cur === unlocked) {
                    return // got the lock
                }
            }
        },
        unlock: () =&gt; {
            // try to unlock
            if (Atomics.sub(memory, offset, 1) !== locked) {
                // Lock was contended, so we need to unlock and signal
                Atomics.store(memory, offset, unlocked)
                Atomics.notify(memory, offset, 1)
            }
        }
    }
}</code></pre></div><p>So, now we need to create two new methods: <code>tryLock</code> and <code>lockAsync</code>. We also need to insert a timeout as an optional parameter to both <code>lock</code> and <code>lockAsync</code>.</p><h2>tryLock</h2><p>Try lock is fortunately very easy. All we do is try to lock without doing the loop. If we fail to lock, we return false, otherwise we return true. We can do this by simply copying our lock code up until the while loop, and then adjust the return values. Here&#8217;s tryLock.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;fa4e472d-1ee5-4958-aa1b-bbb8314d53b8&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">    tryLock: () =&gt; {
            // Try to get the lock (will only lock if we're unlocked)
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) {
                return true // got the lock
            }
            return false // didn't get the lock
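
            // Usage sketch ("mtx" is a hypothetical instance of this mutex):
            // take the lock only if it's free, otherwise skip the work -
            //   if (mtx.tryLock()) {
            //       try { /* critical section */ } finally { mtx.unlock() }
            //   }
            // compareExchange returns the previous value, so only the thread
            // that observed "unlocked" (0) wins the lock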
        },</code></pre></div><h2>lockAsync</h2><p>Making an async version of our lock method is also fairly straightforward. We mostly need to swap our <code>Atomics.wait</code> call with <code>Atomics.waitAsync</code>, and then await the promise we get back.</p><p>Except, we don&#8217;t always get a promise back. So, we can only await sometimes.</p><p>If we <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics/waitAsync">look at the documentation</a>, we&#8217;ll get an object back with an <code>async</code> flag, and a <code>value</code>. If the <code>async</code> flag is true, then <code>value</code> has a Promise. Otherwise, it has a string.</p><p>With that information, we can now create an async lock method.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;a38b4732-d45a-4dc6-bfa3-28c5cda2c046&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">        lockAsync: async () =&gt; {
            // Try to get the lock (will only lock if we're unlocked)
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) {
                return // got the lock
            }

            while(true) {
                // signal contention
                if (cur !== contended) {
                    Atomics.compareExchange(memory, offset, locked, contended)
                }

                // Wait until we're unlocked
                const {async, value} = Atomics.waitAsync(memory, offset, contended)
                if (async) {
                    await value
                } else {
                    // Suspend for one micro-tick
                    await new Promise(res =&gt; res())
                }

                // try to lock again
                cur = Atomics.compareExchange(memory, offset, unlocked, contended)

                if (cur === unlocked) {
                    return // got the lock
                }
            }
        },</code></pre></div><h2>Adding Timeouts</h2><p>Adding timeouts is a little trickier, but not by much. Both <code>wait</code> and <code>waitAsync</code> will provide a timeout value on a timeout, so we&#8217;ll just check the result (in the <code>waitAsync</code> case it&#8217;s the <code>value</code> field when <code>async</code> is false). We&#8217;ll also need to check the result of awaiting the promise, in case that times out. Once we know if we&#8217;ve timed out or not, we will need to copy the &#8220;true/false&#8221; behavior from <code>tryLock</code>.</p><p>The only tricky thing is what happens when we loop. If we failed to get the lock after a wait, then we need to update our timeout based on how much time has passed. We&#8217;ll do that by tracking the start time, and then at the end of the loop we&#8217;ll query the current time. We&#8217;ll then subtract the start time from the current time to get the elapsed time. As for the timeout, we&#8217;ll default the timeout value to <code>Infinity</code> - which means no timeout.</p><p>Let&#8217;s take a look at how to adjust the <code>lock</code> function (<code>lockAsync</code> is very similar).</p><p>Similar adjustments can be made for the <code>lockAsync</code> function.</p><p>With this, we&#8217;ve added timeouts to our mutex.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;420ecc66-d6d3-4048-9b21-05f00f227b05&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">        lock: (timeout = Infinity) =&gt; {
            // Try to get the lock (will only lock if we're unlocked)
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) {
                return true // got the lock
            }

            // Time to track the time to make sure we can timeout when needed
            let lastTime = Date.now()

            while(true) {
                // signal contention
                if (cur !== contended) {
                    Atomics.compareExchange(memory, offset, locked, contended)
                }

                // Wait until we're unlocked, or until we timeout
                const res = Atomics.wait(memory, offset, contended, timeout)

                // Check for a timeout
                if (res === 'timed-out') {
                    return false; // timed out
                }

                // try to lock again
                cur = Atomics.compareExchange(memory, offset, unlocked, contended)

                if (cur === unlocked) {
                    return true // got the lock
                }

                // Check elapsed time and then update the timeout (if we have one)
                if (Number.isFinite(timeout)) {
                    const curTime = Date.now()
                    const elapsed = curTime - lastTime
                    timeout -= elapsed

                    // Make sure we didn't timeout
                    if (timeout &lt;= 0) {
                        return false // timed out
                    }
                }
            }
        },</code></pre></div><h2>Wrap Up</h2><p>We&#8217;ve added try locks, async, and timeouts to our mutexes, making them far more robust and versatile to use for our JavaScript code.</p><p>For reference, here&#8217;s the full mutex with all of our improvements.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;0e5d225a-7b09-4d1e-8f6d-21ca976b92b5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">function mutex(memory, offset) {
    const unlocked = 0
    const locked = 1
    const contended = 2
    return {
        /**
         * Locks the mutex (blocking)
         *
         * If given a timeout, then it will try to lock before the timeout occurs, otherwise it will fail to lock
         *
         * @param timeout Timeout (in milliseconds) for obtaining the lock
         * @returns {boolean} True if got the lock, false if timed out
         */
        lock: (timeout = Infinity) =&gt; {
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) { return true /* got the lock */ }
            let lastTime = Date.now()

            while(true) {
                if (cur !== contended) {
                    Atomics.compareExchange(memory, offset, cur, contended)
                }
                const r = Atomics.wait(memory, offset, contended, timeout)
                if (r === "timed-out") {
                    return false
                }

                cur = Atomics.compareExchange(memory, offset, unlocked, contended)
                if (cur === unlocked) {
                    return true /* got the lock */
                }

                if (Number.isFinite(timeout)) {
                    let curTime = Date.now()
                    let elapsed = curTime - lastTime
                    timeout -= elapsed
                    lastTime = curTime
                    if (timeout &lt;= 0) {
                        return false
                    }
                }
            }
        },
        /**
         * Asynchronously locks a mutex.
         * Returns a promise which resolves to true if the lock was obtained, or false otherwise
         * @param timeout Timeout (in milliseconds) for obtaining the lock
         * @returns {Promise&lt;boolean&gt;} Promise that resolves to true if got the lock, false if timed out
         */
        lockAsync: async (timeout = Infinity) =&gt; {
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) { return true /* got the lock */ }
            let lastTime = Date.now()

            while(true) {
                if (cur !== contended) {
                    Atomics.compareExchange(memory, offset, cur, contended)
                }
                const {async, value} = Atomics.waitAsync(memory, offset, contended, timeout)
                if (async) {
                    const r = await value
                    if (r === 'timed-out') {
                        return false
                    }
                }
                else if (value === 'timed-out') {
                    return false
                } else {
                    await new Promise(res =&gt; res())
                }

                cur = Atomics.compareExchange(memory, offset, unlocked, contended)
                if (cur === unlocked) {
                    return true /* got the lock */
                }

                if (Number.isFinite(timeout)) {
                    let curTime = Date.now()
                    let elapsed = curTime - lastTime
                    timeout -= elapsed
                    lastTime = curTime
                    if (timeout &lt;= 0) {
                        return false
                    }
                }
            }
        },
        /**
         * Tries to get a lock without waiting. Only locks if the mutex is unlocked and not contended
         * @returns {boolean} True if it got the lock, false otherwise
         */
        tryLock: () =&gt; {
            // Try to get the lock (will only lock if we're unlocked)
            let cur = Atomics.compareExchange(memory, offset, unlocked, locked)
            if (cur === unlocked) {
                return true // got the lock
            }
            return false // didn't get the lock
        },
        /**
         * Unlocks the mutex
         */
        unlock: () =&gt; {
            // Atomics.sub returns the previous value; anything other than `locked`
            // means the mutex was contended, so fully unlock and wake one waiter
            if (Atomics.sub(memory, offset, 1) !== locked) {
                Atomics.store(memory, offset, unlocked)
                Atomics.notify(memory, offset, 1)
            }
        }
    }
}</code></pre></div>]]></content:encoded></item><item><title><![CDATA[Deadlocks and Starvation in JavaScript]]></title><description><![CDATA[Threads in the browser are still threads]]></description><link>https://matthewtolman.com/p/deadlocks-and-starvation-in-javascript</link><guid isPermaLink="false">https://matthewtolman.com/p/deadlocks-and-starvation-in-javascript</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Tue, 10 Feb 2026 01:30:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously I showed how to <a href="https://matthewtolman.com/p/javascript-mutexes">share memory between threads and create mutexes inside of JavaScript</a>. Which is really cool since it avoids the overhead of creating, queuing, and polling messages when passing simple data. And it can make it a lot easier for multiple threads to share data (the alternatives are broadcasting data or doing a map-reduce model).</p><p>However, not all is calm in multi-threading land. As soon as we bring multiple threads in, we get a whole host of problems (data races, logic races, starvation, etc.). And now that we&#8217;ve brought in locks, we get even more problems (deadlocks). Before we get too much further into all the cool ways to create primitives or to do cool things with threads, I wanted to take a step back. The rest of our journey will have many perils, so let&#8217;s rest and tell tales before continuing.</p><h2>Starvation</h2><p>Starvation is when greedy threads repeatedly take the resources from other threads, thereby blocking (or starving) them from making any progress. Signs of starvation include some threads perpetually hanging and some tasks being unusually delayed while other tasks are going through just fine. 
Another sign is tasks that are way too old to be running end up getting completed anyway (and sometimes corrupting the data of more recent tasks).</p><p>Depending on the environment and severity, starvation can start to affect resources beyond just CPU time or task completion time. Memory usage and disk usage can grow as work gets queued but threads don&#8217;t get the resources to process it. </p><p>Thread starvation is related to thread contention, but starvation is a more serious (and dangerous) condition than contention. Contention is when all threads still make some progress, but many (or all) are getting slowed down by synchronization (i.e. your program runs slower). Starvation is when the contention gets so out of hand that some threads are unable to make <em>any</em> progress.</p><p>The risks and dangers of starvation are very real, <a href="https://aws.amazon.com/message/101925/">even for massive tech companies</a>. And, paradoxically, some of the same patterns that help with contention increase the risks of starvation.</p><h3>Common Patterns which cause Starvation</h3><p>One such pattern is exponential back-off. The idea is simple: if a resource is highly shared, then having every thread constantly retry it can make every thread slower. Additionally, sometimes the resource has limits (rate limits, throughput limits, etc.), so hammering it at the same rate can overwhelm it. The most common solution is to simply &#8220;back off&#8221; with longer and longer time gaps, giving a contended resource additional time to reach a non-contended state. Often this is achieved by doubling the wait period (or multiplying it by some other constant). The doubling is why this is called &#8220;exponential&#8221; back-off: the wait period grows exponentially, which massively decreases the load on the contended resource. Sounds good, right?</p><p>Well, almost. What happens when a resource is highly contended?
What happens when a resource is locked 80% or 90% or 99% of the time? In these scenarios, when a thread tries to get a lock, it will fail to get a lock 80% or 90% or 99% of the time. If it fails to get a lock, it will wait longer and longer between each retry.</p><p>Exponential back-offs become very long very quick. At an initial 1ms delay and a doubling each time, it only takes 10 retries before it&#8217;s been a full second. At 1 second, modern CPUs have wasted 3-4 billion cycles. After 16 retries, it&#8217;s been over a minute. That&#8217;s 180-240 billion cycles.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m_p3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m_p3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 424w, https://substackcdn.com/image/fetch/$s_!m_p3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 848w, https://substackcdn.com/image/fetch/$s_!m_p3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 1272w, https://substackcdn.com/image/fetch/$s_!m_p3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!m_p3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png" width="879" height="664" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/401759f7-e978-413a-9a44-77b896c6ebad_879x664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:879,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34775,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://matthewtolman.com/i/187334601?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m_p3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 424w, https://substackcdn.com/image/fetch/$s_!m_p3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 848w, https://substackcdn.com/image/fetch/$s_!m_p3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 1272w, https://substackcdn.com/image/fetch/$s_!m_p3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F401759f7-e978-413a-9a44-77b896c6ebad_879x664.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>Another common cause is simply over-allocating threads or over-sharing data between threads. Adding more threads doesn&#8217;t always make things faster; it can instead make things slower due to contention. The more threads, the more contention, and the more likely some poor thread gets repeatedly locked out.</p><p>More shared data isn&#8217;t good either, and it may be more acceptable to duplicate and copy data than to share it.
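</p><p>To make the growth concrete, here is a quick sketch (assuming the 1ms initial delay and doubling described above) of how fast the waits accumulate:</p>

```javascript
// Sketch: cumulative wait time for exponential back-off.
// Assumes a 1ms initial delay that doubles on every retry.
let delayMs = 1
let totalMs = 0
for (let retry = 1; retry <= 16; retry++) {
    totalMs += delayMs
    delayMs *= 2 // exponential growth: 1, 2, 4, 8, ...
}
console.log(totalMs) // 65535 -- over a minute of waiting after 16 retries
```

<p>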
Generally, it&#8217;s best to minimize the amount of data shared between threads as much as possible.</p><p>Other common patterns involve overuse of read-write locks, over-locking or holding locks for extended periods of time, and doing network requests or expensive computation tasks while locked.</p><p>Read-write locks are an optimization that sounds good (and is often very good for typical performance). However, they have a dark side too. Many read-write locks are not &#8220;fair&#8221; in that they don&#8217;t give readers and writers equal priority. Instead, they will prefer one caste over the other - usually readers since they can be more concurrent. The segregation of read-write locks can cause massive issues if the preferred caste is extremely pervasive in usage. For instance, a read-preferring lock will starve write locks if there is always at least one active reader - a situation that can become very common in high-traffic scenarios.</p><p>Over-locking is another issue where more locks are acquired than needed, or where locks are used when simple atomic operations would have sufficed. Locks are blocking, and by being blocking they open up starvation points.</p><p>Holding locks for long periods of time also causes starvation. If a lock is held while performing expensive calculations or network requests, then no other thread can proceed until those calculations are done. If no other threads can proceed, they end up creating a backlog of stuck threads. Stuck threads create contention, and that can lead to starvation.</p><h3>Mitigations</h3><p>The first mitigation is less about preventing starvation, and more about detecting it (and preventing it from propagating outdated data). It&#8217;s also rather simple: add enforced timeouts. If something takes too long, kill it and raise an alert. Get too many alerts, and there&#8217;s a starvation/high-contention problem.
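</p><p>A minimal sketch of that timeout-enforcement idea (assuming a hypothetical mutex object whose <code>lockAsync(timeoutMs)</code> resolves to false when the wait times out):</p>

```javascript
// Sketch: run a critical section under an enforced timeout.
// `mutex` is a hypothetical object with lockAsync(timeoutMs) -> Promise<boolean>
// and unlock(); the thrown error is where an alert would be raised.
async function withTimeout(mutex, timeoutMs, criticalSection) {
    const gotLock = await mutex.lockAsync(timeoutMs)
    if (!gotLock) {
        // Took too long: kill the task and surface an alert for monitoring
        throw new Error(`possible starvation: no lock after ${timeoutMs}ms`)
    }
    try {
        return criticalSection()
    } finally {
        mutex.unlock()
    }
}
```

<p>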
The downside to this approach, though, is that something you wanted to get done didn&#8217;t actually get done. That little fact can cause its own set of issues. But, in many scenarios, it&#8217;s far better for stale work to not get done than for it to get done and cause other issues.</p><p>A similar approach specific to retries (especially with exponential back-off) is to limit the number of retries allowed. If we only allow 5 retries before failing, then we guarantee we will succeed or fail in a fixed time period.</p><p>Another way to prevent starvation is to reduce contention. A simple solution is to reduce the number of threads (usually by using thread pools instead of spinning up a new thread per task). Other ways involve reorganizing code and memory access patterns to not require nearly as much sharing (prioritize independence over synchronization). Reducing the number of locks obtained (while maintaining correctness) also helps. Optimizing (or removing) work done in a locked section will help prevent threads from backing up. All of these strategies can involve new data structures (e.g. concurrent queues), large refactors of the code, or new algorithms and coding patterns (e.g. map-reduce).</p><p>Using wait-free algorithms and data structures can also help as they ensure that every thread makes progress. However, there are many drawbacks. Many wait-free algorithms have higher base latency, meaning a lock-based system with low contention can outperform a wait-free algorithm. Additionally, wait-free algorithms are hard to implement, hard to verify, and even harder to invent. Not every situation may have an appropriate wait-free solution.</p><h2>Data Races</h2><p>Data races are a very simple type of race condition. A program reads a value, modifies it, the work becomes stale, and finally the stale value is written back.</p><p>Work becomes stale when another thread, process, or computer modifies the original value before the work was written back.
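</p><p>That read-modify-write staleness can be sketched directly, along with an atomic compare-exchange guard (a minimal illustration; <code>counter</code> here is an assumed Int32Array view over a SharedArrayBuffer):</p>

```javascript
// Sketch: a read-modify-write data race, and a compare-exchange retry loop.
// Assumes `counter` would be shared with worker threads via a SharedArrayBuffer.
const counter = new Int32Array(new SharedArrayBuffer(4))

function racyIncrement() {
    const read = counter[0] // 1. read
    const next = read + 1   // 2. modify (another thread can write here, making `next` stale)
    counter[0] = next       // 3. write back a possibly-stale result
}

function safeIncrement() {
    while (true) {
        const read = Atomics.load(counter, 0)
        const next = read + 1
        // Only writes `next` if the value is still `read`; returns the previous value
        if (Atomics.compareExchange(counter, 0, read, next) === read) {
            return next
        }
    }
}
```

<p>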
We saw a <a href="https://matthewtolman.com/p/javascript-mutexes">very simple example</a> earlier when adding to the same address from different threads. However, this type of problem can arise in much more complex code or environments. It can happen with outdated cache layers, contention at the database layer, lack of idempotence in event processing, and lack of locks inside a program.</p><p>Generally, data races occur when there aren&#8217;t enough locks around shared data (in memory or on disk). Often, the solution is to either add more locks or reduce the need for shared memory. However, that&#8217;s not the only solution.</p><p>One other solution is to use more &#8220;compare-exchange&#8221; semantics. Basically, perform an atomic read, do the calculation, and then do a compare-exchange to write the answer back only if the value didn&#8217;t change. If it did change, read it, recalculate, and retry. This approach is called &#8220;optimistic concurrency&#8221; in that it&#8217;s optimistic a lock won&#8217;t be necessary since the value rarely changes.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Of course, only use optimistic concurrency when that holds true, as unfounded optimism can lead to headaches when debugging production issues.</p><p>One side note: out of all the issues that we&#8217;ll talk about today, data races are the only concurrency issue Rust addresses, and only when they are limited to a program&#8217;s own memory! Data races across servers, cache layers, and databases aren&#8217;t covered by Rust&#8217;s compile-time detection as it has no insight into those systems. High contention, starvation, logic races, and deadlocks aren&#8217;t covered either.</p><h2>Logic Races</h2><p>Logic races are similar to data races in that some work is stale (or incorrect) based on the work another thread did.
However, the difference is that the race condition doesn&#8217;t happen because some memory was changed when it should have been locked. Instead, it has to do with a mismatched ordering of the systems or tasks involved.</p><p>To help clarify this distinction, we&#8217;ll use an example. Suppose that Jim has $30 in his account. He goes to the bank and deposits a $200 check. Simultaneously, he purchases a $60 game on his phone while the banker is processing the check. In a data race, we would get a sequence like the following:</p><ul><li><p>Banker machine reads current balance of $30</p></li><li><p>Jim&#8217;s phone reads current balance of $30</p></li><li><p>Banker machine adds $200 to the balance resulting in $230</p></li><li><p>Jim&#8217;s phone deducts $60 from the balance resulting in ($30) and an overdraft fee</p></li><li><p>Banker machine writes $230 to the account</p></li><li><p>Jim&#8217;s phone writes ($30) to the account and an overdraft fee</p></li></ul><p>In this scenario, Jim&#8217;s $200 disappeared due to a data race - the actual data of his balance was not properly locked.</p><p>Now, let&#8217;s look at a logic race.
We have two possible scenarios.</p><p><strong>Scenario 1</strong></p><ul><li><p>Banker machine locks Jim&#8217;s balance</p></li><li><p>Jim&#8217;s phone waits for lock</p></li><li><p>Banker machine updates Jim&#8217;s balance to $230 and unlocks</p></li><li><p>Jim&#8217;s phone locks Jim&#8217;s balance</p></li><li><p>Jim&#8217;s phone deducts $60, sets new balance to $170, and unlocks</p></li></ul><p><strong>Scenario 2</strong></p><ul><li><p>Jim&#8217;s phone locks Jim&#8217;s balance</p></li><li><p>Banker machine waits for lock</p></li><li><p>Jim&#8217;s phone sets Jim&#8217;s balance to ($30), adds an overdraft fee, and unlocks</p></li><li><p>Banker machine locks Jim&#8217;s balance</p></li><li><p>Banker machine sets Jim&#8217;s balance to $170 and unlocks</p></li></ul><p>In both scenarios, we still end up with the correct final balance of $170. But, in Scenario 1 Jim doesn&#8217;t have an overdraft fee, yet in Scenario 2 he does! This difference based on the ordering of simultaneous events is called a <em>logic race</em> and is a separate type of race condition from data races. What&#8217;s even more insidious is they&#8217;re incredibly hard to detect ahead of time<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, and very hard to replicate.</p><p>To make matters worse, the code often appears correct during code reviews, and the code will run correctly during testing since it is correct when run sequentially (i.e. banker first, then Jim&#8217;s phone). These types of bugs can also lie hidden for years in a production system where simultaneous events are extremely rare. And tracking these types of bugs down can be downright maddening as they appear and disappear like phantoms in the night. Validating a fix is also almost impossible.
How do you check that a bug which only occurs once in tens of millions of button clicks is actually fixed?</p><p>The other awful part of these bugs is that adding any other synchronization step (a debugger, printing or logging messages, allocating memory) can be enough to impose a reliable-enough ordering to prevent the bug from appearing inside a development or test environment. Truly maddening stuff.</p><h2>Deadlocks</h2><p>Deadlocks are one of the most dreaded concurrency bugs as they don&#8217;t just affect one thread or one customer. They can bring entire systems and products to their knees.</p><p>A deadlock is simple. Multiple threads get stuck waiting for a lock they will never get. This isn&#8217;t starvation, where they won&#8217;t get it for a long, long time or there&#8217;s a small probability that they won&#8217;t get it. No. This is <em>never</em>. As in, not even with <em>infinite</em> time.</p><p>There are a few ways a deadlock can happen. The first is that a thread never calls <code>unlock</code> on a lock. Often, this happens because either the thread threw an exception and didn&#8217;t have the unlock in a finally block, or because the thread unexpectedly terminated prematurely - bypassing the finally block altogether<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.
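</p><p>That first failure mode has a mechanical fix: always pair the lock with an unlock in a finally block (a sketch, assuming a mutex exposing <code>lock()</code> and <code>unlock()</code>):</p>

```javascript
// Sketch: guarantee unlock even when the critical section throws.
function withLock(mutex, criticalSection) {
    mutex.lock()
    try {
        return criticalSection()
    } finally {
        // Runs on normal return AND on exceptions, so the lock is always released
        mutex.unlock()
    }
}
```

<p>Of course, this only guards against exceptions; it can&#8217;t help when the thread itself is terminated before the finally block runs.</p><p>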
This is the easy case to create, find, and fix.</p><p>The other common way a deadlock happens is when multiple threads need multiple locks, but the way they try to get those locks leaves them waiting on each other.</p><p>For instance, assume we have two threads (1 and 2) and two locks (A and B).</p><ul><li><p>Thread 1 acquires lock B</p></li><li><p>Thread 2 acquires lock A</p></li><li><p>Thread 1 does some work, but realizes it now needs lock A</p></li><li><p>Thread 1 blocks on lock A, waiting for Thread 2 to finish</p></li><li><p>Thread 2 does some work, but realizes it cannot finish without lock B</p></li><li><p>Thread 2 blocks on lock B, waiting for Thread 1 to finish</p></li><li><p>Both threads are now waiting for the other, but neither thread can finish</p></li></ul><p>Congratulations! We just made a deadlock. And these types of deadlocks are really easy to create accidentally. In the above example, we were doing some work, but found we needed more locks than we originally had. When we tried to get them, we ended up getting blocked, forever.</p><h3>Common Deadlock Patterns</h3><p>I&#8217;ve commonly run into this type of deadlock when dealing with typical Object Oriented Programming (OOP) and/or concurrent data structures. In OOP, there&#8217;s a very heavy emphasis on <em>encapsulation</em>, where as much as possible is private. Often, this includes implementation details like locks. In OOP, there&#8217;s also the idea of reuse through interfaces and hierarchies. So when we pass a class as a parameter, we don&#8217;t actually pass it as that specific class. We pass it as a higher-level abstraction with no relation to the implementation. Meaning, the code calling the class has no idea if a method will block or not.</p><p>This means that we could end up having one thread start in class WorkQueue and then end up calling into class TaskProcessor, all while another thread starts in TaskProcessor and calls WorkQueue.
And if both TaskProcessor and WorkQueue have class-wide locks, then we&#8217;ve made a deadlock!</p><h3>Mitigations</h3><p>Fortunately, there are mitigation strategies. First, having a thread lock everything it will (or possibly will) need at the start of a critical section can go a long way. It prevents the &#8220;I have resource A, but now I need B&#8221; scenario we saw. That said, when locking everything needed, make sure it&#8217;s done in the same order for every thread (i.e. lock A then B everywhere)! Otherwise, you&#8217;ll still end up with deadlocks.</p><p>Locking up-front has another benefit: since we haven&#8217;t done any critical work yet, if we fail to get even one lock, we can simply unlock everything, wait for a random period of time (minimizing the chance of contention on our next attempt), and then retry. This prevents deadlock (some thread can now make progress), but we could end up with starvation if we aren&#8217;t careful (one thread stuck in a constant unlock-and-retry cycle).</p><p>If we want a strategy that works regardless of whether we lock up-front or as needed, then we need to adjust things a little more. We can introduce a &#8220;try-lock&#8221; method. A &#8220;try-lock&#8221; will try to acquire a lock <em>without blocking</em>. If it fails to lock, it will throw an error (or return an error code/false boolean - it all depends on your coding pattern!). We can now use this to retry acquiring the inner lock while doing a back-off with a retry limit. Again, this approach will cause some tasks to fail, but it&#8217;s better than blocking the whole system.</p><p>Another strategy is to add a timeout to every lock. That way, if we hit a deadlock, the system will abort the wait, we&#8217;ll unlock, and we&#8217;ll end with a failure. Not ideal to fail a task, but at least the system continues moving forward.
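</p><p>The up-front &#8220;lock everything in order, or release and retry&#8221; strategy can be sketched like this (assuming hypothetical locks exposing <code>tryLock()</code> and <code>unlock()</code>):</p>

```javascript
// Sketch: acquire every lock up-front in a fixed order, or release them all and retry.
// Each lock is assumed to expose tryLock() -> boolean and unlock().
async function lockAllOrFail(locks, maxRetries = 5) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        const acquired = []
        for (const lock of locks) { // same order everywhere prevents lock cycles
            if (lock.tryLock()) {
                acquired.push(lock)
            } else {
                break
            }
        }
        if (acquired.length === locks.length) {
            return true // got everything; safe to do the critical work
        }
        // Missed one: release what we hold so other threads can make progress...
        for (const lock of acquired.reverse()) lock.unlock()
        // ...then wait a random period to reduce contention on the next attempt
        await new Promise(res => setTimeout(res, Math.random() * 10))
    }
    return false // give up; failing the task beats deadlocking the system
}
```

<p>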
Though, keep in mind that these timeouts will need to be short enough to prevent the whole system from stalling (as we still hold a lock while we&#8217;re waiting), all while being long enough to prevent false positives (i.e. aborting on a resource that would have been freed had we waited just a little longer).</p><h2>Thrashing and Contention</h2><p>These last two are more performance-related than &#8220;end the world&#8221; related. Threading does not automatically make programs faster. And, throwing more threads at the problem only works up to a point. After that, it makes things worse.</p><p>For one, adding more threads causes more contention on locks. More contention causes more waits, and more waits increase the chance of starvation. However, contention and starvation aren&#8217;t the only issues with more threads.</p><p>There&#8217;s also thrashing.</p><p>CPUs only see the state of the currently running thread. The operating system uses some special hardware interrupts to schedule a switch to another thread. Switching to another thread involves saving off the current thread&#8217;s state (the instructions being executed, the registers, etc.) to memory, and then loading the other thread&#8217;s state from memory. This process also clears out pipelines and requires re-warming the CPU cache. This whole process is called &#8220;context switching.&#8221;</p><p>Context switching isn&#8217;t bad; it&#8217;s just a necessary part of having threads. It is, however, costly, so it&#8217;s best to minimize it.</p><p>Unfortunately, the more threads there are trying to do stuff, the more the operating system needs to trigger context switches, which means the slower everything goes. You may have noticed this when you open way too many applications at once, and they&#8217;re all trying to do stuff.
The OS tries its best to schedule each one to do its work, but the time it takes the hardware to switch causes slowdowns.</p><p>Overloading the hardware with context switches is &#8220;thrashing.&#8221; We&#8217;re basically beating down the hardware with so many context switches that it&#8217;s unable to get anything done.</p><p>Fortunately, modern CPUs and OSes are actually really good at handling a lot of threads, so having slightly too many threads won&#8217;t make a big noticeable difference for us. However, trying to spawn hundreds or thousands of threads definitely will cause issues.</p><p>The general rule of thumb is to spawn 1-2 threads per logical core<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>: 1 thread if it&#8217;s not going to be blocked waiting for I/O very often, 2 if it will be (that way one thread can work while the other is loading from disk).</p><p>So, how do we know how many logical cores there are? By using <code>navigator.hardwareConcurrency</code> - a number that tells us how many logical cores the machine has.</p><p>That said, on the web we have a little more to worry about. We don&#8217;t really want to go too far above the logical core count for our entire site. So if we&#8217;re using a lot of <code>Worker</code> instances we need to keep in mind how many active tabs or contexts the user will have at once. Whereas if we&#8217;re using a global <code>SharedWorker</code> we need to keep in mind how many legacy versions of the worker will be on the user&#8217;s machine at once.</p><p>For the <code>Worker</code> scenario, we may need to scale back our worker count or limit how many iframes we embed. For the <code>SharedWorker</code> we may need to consider forced refreshes of the site if the code gets too stale.</p><h2>Wrap Up</h2><p>We looked at a lot of different edge cases and issues that arise from multi-threaded programming.
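</p><p>The worker-sizing rule of thumb above can be sketched as a tiny helper (an assumption-filled sketch; the fallback of 4 cores for environments without <code>navigator</code> is arbitrary):</p>

```javascript
// Sketch: pick a worker-pool size from the logical core count.
// navigator.hardwareConcurrency reports logical cores in browsers;
// the fallback of 4 is an arbitrary assumption for other environments.
function workerPoolSize(ioBound = false) {
    const cores = (typeof navigator !== 'undefined' && navigator.hardwareConcurrency) || 4
    // 1 worker per core for CPU-bound work; 2 per core if workers often wait on I/O
    return ioBound ? cores * 2 : cores
}
```

<p>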
They aren&#8217;t impossible to deal with - developers have been dealing with them for decades now. But, it&#8217;s also good to get familiar with the concepts at a high level prior to diving head-first into more complex multi-threaded programming.</p><p>Next time will be more about code rather than theory, as I&#8217;ll be going into some more synchronization primitives and their inner workings. I&#8217;ll first revisit the Mutex in more depth, and we&#8217;ll create an async version that uses promises instead of blocking a thread. We&#8217;ll also add timeouts to it.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For optimistic concurrency, it is very important that the calculation is side-effect free (e.g. doesn&#8217;t send emails or update the database record) as it will be run repeatedly in a loop. Also, be wary of where it&#8217;s used.
If optimistic concurrency is used where there are frequent writes to memory, then it could cause starvation as writes get continually stuck in recalculation loops!</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Rust cannot detect logic races at compile time, only the easier data race subset.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>For Rustaceans, think &#8220;a thread panicked while it held a lock.&#8221; Also, Rust doesn&#8217;t prevent deadlocks, and badly misplaced panics can cause deadlocks as they kill the <em>thread</em> and not the <em>process</em>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Logical cores include things like hyper-threaded cores (basically a &#8220;virtual&#8221; core) that software can use.
Since it&#8217;s what our code can use, it&#8217;s generally what&#8217;s preferred in the software layer.</p></div></div>]]></content:encoded></item><item><title><![CDATA[JavaScript Mutexes]]></title><description><![CDATA[Adding locks to our threads]]></description><link>https://matthewtolman.com/p/javascript-mutexes</link><guid isPermaLink="false">https://matthewtolman.com/p/javascript-mutexes</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Thu, 05 Feb 2026 01:21:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my previous post, I introduced shared memory to JavaScript workers. Sharing memory ended up introducing data races. We fixed them using atomics, but atomics can only get us so far. At some point, we&#8217;ll need more than just a simple add, subtract, load or exchange. We&#8217;ll need to lock for longer, more complex logic.</p><p>Mutexes (mutual exclusion locks) allow us to do just that. They&#8217;re one of the fundamental threading primitives in most languages. A mutex lets us &#8220;lock&#8221; a memory location and later unlock it, and with that we can synchronize entire blocks of code.</p><p>So, let&#8217;s just pull out a mutex and&#8230;</p><p>Um, JavaScript doesn&#8217;t have mutexes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. Instead, it has three little functions: <code>Atomics.wait</code>, <code>Atomics.waitAsync</code>, and <code>Atomics.notify</code>.
The <code>waitAsync</code> function will give a promise<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> so that it can be used in the main thread. The <code>wait</code> function just blocks, so it cannot be used in the main thread. For this article, I&#8217;m just going to focus on <code>wait</code> for simplicity. If you need to use <code>waitAsync</code>, you&#8217;ll just need to translate over to promises.</p><p>Okay, so we have functions to wait and notify on a memory address. But, those aren&#8217;t mutexes. So, how does this help us?</p><p>Well, it turns out that wait and notify form the foundation of a <em><a href="https://en.wikipedia.org/wiki/Futex">futex</a></em> - which is an even lower-level primitive that we can use to make a mutex. A futex lets us wait on an address if the address stores a specific value. It also lets us change the value at that address, and then signal to one or more waiters that the value has changed and they should stop waiting. It&#8217;s basically a way to wait on a condition and control the signal that the condition changed.</p><p>And this was the last piece we needed. With futexes and atomic operations, we can build up any sort of synchronization primitive - whether it be mutexes, semaphores, condition variables, barriers, wait groups, or anything else. They end up being really powerful - <a href="https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf">but are also tricky to use</a>.</p><p>To make matters worse, JavaScript doesn&#8217;t really have good multi-threaded debuggers, sanitizers, or analysis tools. The whole &#8220;threading in JavaScript&#8221; is really new, which means developing our own primitives inside JavaScript will be pretty tricky.</p><p>So, we won&#8217;t develop them in JavaScript. 
We&#8217;ll develop them in a language with really good tooling for making your own primitives that people normally give you, and then translate the result into JavaScript.</p><p>Out of all the languages to do it in, the one I found to be best at having those types of tools is C. Why? Because every operating system provides futexes, a lot of C developers make their own libraries (including &#8220;standard&#8221; libraries) and tools to improve C, and there are far more papers about making these types of primitives that assume either C or C-style C++. So, it&#8217;s our best shot at the initial design and validation prior to a translation.</p><p>That said, I&#8217;m not going to use a lot of the scary features in C. I&#8217;m not going to use malloc or free or macros. I won&#8217;t use the Win32 API (just the Linux libraries) and I won&#8217;t use inline assembly. I am going to use pointers, but those are just memory addresses (basically the index into our shared memory buffer where something lives). </p><p>So, let&#8217;s get started.</p><h2>Building our Mutex</h2><p>For this mutex, I looked into a lot of implementations people put online<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. I did find some back and forth between the articles where some people were trying to show someone else&#8217;s implementation was slow or incorrect in a pathological edge case (like missing the lock some 2^64-1 times in a row). 
Eventually, I landed on a design with the following lock steps:</p><ul><li><p>A single integer is stored at the mutex address</p></li><li><p>The mutex address is what is passed to the lock/unlock functions</p></li><li><p>The integer starts at 0 which signifies &#8220;unlocked&#8221;</p></li><li><p>When locking, a compare exchange is done to try to turn from 0 to 1 (locked)</p></li><li><p>If the compare exchange succeeds, then the lock is obtained</p></li><li><p>Otherwise, we enter a loop</p></li><li><p>In the loop, we do a compare exchange to set the value to 2 (contested)</p></li><li><p>We then wait on the integer&#8217;s memory address until it changes from 2 to something else</p></li><li><p>Then we try to acquire the lock and loop again if needed</p></li></ul><p>For unlocking, I do the following:</p><ul><li><p>Subtract 1 from the atomic value. If I end with 0, then no one was waiting and I return</p></li><li><p>Otherwise, someone was waiting, so I signal that I&#8217;m done and wake up 1 thread</p></li></ul><p>The C code I ended up with looks like the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;eb5e24f5-3bd5-4fe2-8ef9-444f67998e63&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">#include &lt;linux/futex.h&gt;
#include &lt;sys/syscall.h&gt;
#include &lt;unistd.h&gt;  // syscall
#include &lt;stdio.h&gt;   // perror
#include &lt;stdlib.h&gt;  // abort
#include &lt;errno.h&gt;

typedef int Mutex;

void futex_wait(int* addr, int waitWhen);
void futex_wake(int* addr, int amount);
int compare_exchange(int* addr, int* expected, int desired);
int fetch_sub(int* addr, int val);
void store(int* addr, int val);

// GCC, linux
void lock(Mutex* mux) {
    int cur = 0;

    // do a compare exchange (update if the value at the address matches our expected)
    // cur is our "expected" value
    // this always sets cur to what was at the address
    // it returns true if the exchange happened
    if (compare_exchange(mux, &amp;cur, 1)) {
       // we got the lock!
       return;
    }

    do {
        // indicate contention
        if (cur != 2) {
            compare_exchange(mux, &amp;cur, 2);
        }

        futex_wait(mux, 2);
        cur = 0;

    // if we had contention, assume there are other threads when we lock
    } while (!compare_exchange(mux, &amp;cur, 2));
    // we got the lock
}

void unlock(Mutex* mux) {
    if (fetch_sub(mux, 1) != 1) {
        // always indicate we're unlocked
        // contended threads will indicate that there is contention
        store(mux, 0);

        futex_wake(mux, 1); // only wake 1 thread
    }
}

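// Not in the original design - a sketch of a non-blocking "try lock":
// take the lock only if it is currently 0 (unlocked), and report
// whether we got it. Uses the same GCC builtin as compare_exchange.
// (Our Mutex type is just an int, so int* is used here.)
int try_lock(int* mux) {
    int cur = 0;
    return __atomic_compare_exchange_n(mux, &amp;cur, 1, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
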
void futex_wait(int* addr, int waitWhen) {
    // we put it in a loop since the OS may cause spurious wake ups
    do {
        if (syscall(SYS_futex, addr, FUTEX_WAIT, waitWhen, 0, 0, 0) == -1) {
            if (errno == EAGAIN) return; // value changed, no wait needed
            perror("futex_wait"); abort(); // something went wrong!
        }
    } while (__atomic_load_n(addr, __ATOMIC_SEQ_CST) == waitWhen);
}

int compare_exchange(int* addr, int* expected, int desired) {
    return __atomic_compare_exchange_n(
        addr,
        expected,
        desired,
        0, // strong compare-exchange, not weak
        __ATOMIC_SEQ_CST,
        __ATOMIC_SEQ_CST
    );
}

int fetch_sub(int* addr, int val) {
    return __atomic_fetch_sub(addr, val, __ATOMIC_SEQ_CST);
}

void store(int* addr, int val) {
    __atomic_store_n(addr, val, __ATOMIC_SEQ_CST);
}

void futex_wake(int* addr, int amt) {
    if (syscall(SYS_futex, addr, FUTEX_WAKE, amt, 0, 0, 0) == -1) {
        perror("futex_wake"); abort(); // something went wrong
    }
}</code></pre></div><p>That&#8217;s a lot of code. But most of it is fairly straightforward. We have some wrapper methods to hide verbose function names or functions with large parameter lists. We&#8217;re using a lot of atomics (which are wrapped), and we&#8217;re doing a futex wait/wake (wake being the C equivalent of notify).</p><p>The main takeaway is that we do 0 for unlocked, 1 for locked, and 2 for contended. We then have logic split between the locker and unlocker to maintain that differentiation.</p><p>As far as how it works, I&#8217;ve run the above code (<a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/67444f1f2f271d773c6e78cf6edc1eae66bef67f/src/mutex.c">with some modifications for namespacing and additional features like timeouts</a>) thousands of times on multiple machines, both with and without thread sanitization turned on. That&#8217;s not to say it&#8217;s &#8220;perfect&#8221; or &#8220;bug-free&#8221; - only that it works well enough for me, so that&#8217;s what we&#8217;ll translate over.</p><h2>Translating to JavaScript</h2><p>Translating the C code isn&#8217;t all that difficult. Our pointers become a &#8220;memory array + offset&#8221;, and we use the Atomics methods rather than syscalls. There are some differences between the C and JavaScript atomic APIs, but nothing complicated.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;763fff22-6aca-480d-95bd-a3187ea3578d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">function mutex(memory, offset) {
    return {
        lock: () =&gt; {
            let expected = 0

            let cur = Atomics.compareExchange(memory, offset, expected, 1)
            if (cur === expected) {
                // got the lock
                return
            }

            while(true) {
                if (cur !== 2) {
                    expected = cur
                    Atomics.compareExchange(memory, offset, expected, 2)
                }

                Atomics.wait(memory, offset, 2)

                expected = 0
                cur = Atomics.compareExchange(memory, offset, expected, 2)

                if (cur === 0) {
                    // got the lock
                    return
                }
            }
        },
        unlock: () =&gt; {
            if (Atomics.sub(memory, offset, 1) !== 1) {
                Atomics.store(memory, offset, 0)
                Atomics.notify(memory, offset, 1)
            }
        }
    }
}</code></pre></div><p>The biggest difference now is that instead of passing only one memory address to our workers, we need to pass two: one for the working data, and another for the mutex location. Our worker code now looks like the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;395f80c9-41f0-4576-b98a-2b9f3b396232&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">importScripts('mutex.js') // load our mutex code

let memory = new Int32Array(new SharedArrayBuffer(1024))
let offset = 0
let muxOffset = 1 // must be different from offset!

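// Sketch (mine, not from the original post): the lock/try/finally
// pattern used below can be wrapped in a helper so the unlock can
// never be forgotten. Shown for illustration; the loop below calls
// lock/unlock directly.
function withLock(mux, fn) {
    mux.lock()
    try {
        return fn()
    } finally {
        mux.unlock()
    }
}
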
onmessage = (e) =&gt; {
    if (e.data.type === 'init') {
        memory = new Int32Array(e.data.memory) // use the memory buffer we're given
        offset = e.data.offset
        muxOffset = e.data.muxOffset
        // don't respond
    }
    else if (e.data.type === 'run') {
        const mux = mutex(memory, muxOffset)
        let lastRead = 0
        for (let i = 0; i &lt; 200; ++i) {
            mux.lock()
            try {
                memory.set([memory.at(offset) + 1], offset)
                // we can do more instructions now!
                lastRead = memory.at(offset)
            } finally {
                // unlock (in finally so we always run it)
                mux.unlock()
            }
        }
        postMessage({final: lastRead})
    }
}</code></pre></div><p>Our worker has been updated! Time to update our runner so we pass both offsets.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;61dcc8c1-8c42-47f0-85c4-3cc3ac108c92&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">/// our config
const memory = new SharedArrayBuffer(1024)
const arr = new Int32Array(memory)
const muxOffset = 1
const offset = 2
const numRunners = 4

const runners = []

for (let i = 0; i &lt; numRunners; ++i) {
    runners.push(run(memory, muxOffset, offset))
}

await Promise.all(runners)

console.log("Memory data: ", arr.at(offset))

function run(memory, muxOffset, offset) {
    const worker = new Worker('mutex-worker.js')
    const wait = new Promise((resolve) =&gt; {
        worker.onmessage = (e) =&gt; {
            resolve(e.data)
        }
    })
    worker.postMessage({type: 'init', offset, muxOffset, memory})
    worker.postMessage({type: 'run'}) // start the work (the worker only replies to 'run')
    return wait
}</code></pre></div><p>If we run that code, we see our expected final result of 800!</p><p>Of course, there is more to multi-threading. However, this should be enough to get you started. You&#8217;ve seen how to synchronize, and how to take C code and convert it over to JavaScript. With this, you should be able to turn my other C threading primitives (<a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/67444f1f2f271d773c6e78cf6edc1eae66bef67f/src/semaphore.c">semaphores</a>, <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/67444f1f2f271d773c6e78cf6edc1eae66bef67f/src/cond_variable.c">condition variables</a>, <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/67444f1f2f271d773c6e78cf6edc1eae66bef67f/src/barrier.c">barriers</a>) over to JavaScript.</p><p>When I next post about JavaScript threads, I&#8217;ll talk about some of the other edge cases you&#8217;ll hit (deadlocks, starvation, and logic race conditions).</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>They do have the Web Locks API - but that&#8217;s more for Indexed DB and Local Storage locks. It&#8217;s based around string names rather than addresses in a shared array buffer. Not saying you can&#8217;t use it - every hammer can be used not-as-intended. Just that it&#8217;s outside the topic of <em>this</em> series as we&#8217;re focused on the APIs intended for workers and shared memory - and those APIs are the <code>Atomics</code> interface added with everything else.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>It actually gives an object with two fields: a boolean indicating whether a promise was given, and a value holding either that promise or a string result. 
This is because the wait only happens conditionally - more on that later.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf">Futexes are Tricky</a>; <a href="https://eli.thegreenplace.net/2018/basics-of-futexes/">Basics of futexes</a>; <a href="https://web.archive.org/web/20190213215555/http://locklessinc.com/articles/mutex_cv_futex/">Mutexes and Condition Variables using Futexes</a>; <a href="https://www.collabora.com/news-and-blog/blog/2022/02/08/landing-a-new-syscall-part-what-is-futex/">Landing a new syscall: What is futex?</a> and so many more that I lost the links to</p></div></div>]]></content:encoded></item><item><title><![CDATA[Sharing Memory Across Threads in JavaScript]]></title><description><![CDATA[Enter the races]]></description><link>https://matthewtolman.com/p/sharing-memory-across-threads-in</link><guid isPermaLink="false">https://matthewtolman.com/p/sharing-memory-across-threads-in</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Tue, 03 Feb 2026 18:27:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously I wrote about <a href="https://matthewtolman.com/p/sharing-threads-in-javascript">sharing threads across tabs</a>. Now we&#8217;ll talk about sharing memory across threads.</p><p>Which is harder to get set up than simply using the API. 
It turns out, there are little hardware vulnerabilities called <a href="https://meltdownattack.com/">Spectre and Meltdown</a>, and these rely on some strange hardware behavior when it comes to timing, threads, and shared memory. To mitigate this, shared memory in JavaScript contexts has <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer#security_requirements">security restrictions</a> that your site must meet.</p><h2>Making Shared Memory</h2><p>The requirements are that your site must be in a <a href="https://developer.mozilla.org/en-US/docs/Web/Security/Defenses/Secure_Contexts">secure context</a> (i.e. localhost or HTTPS) and must be <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/crossOriginIsolated">cross origin isolated</a>. Cross origin isolated basically means that your page is only pulling resources from locations allowed in CORS, that popups are only allowed for the same origin, and that embedding is restricted as well. In other words, CORS has to be set up properly, and it needs to be at least somewhat restricted.</p><p>In my experience, this has usually meant that I can&#8217;t just use <code>python3 -m http.server</code> to run my shared memory code as it doesn&#8217;t send the required headers. Instead, I need to create a test server that sets the proper headers.</p><p>To check if you&#8217;re cross origin isolated, read the boolean value <code>window.crossOriginIsolated</code>. If it&#8217;s true, you&#8217;re good.</p><p>Once those security requirements are met, it&#8217;s time to actually share memory.</p><p>To share memory, we first create a region of linear memory<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> to share. Then we send that memory off to the worker (either <code>Worker</code> or <code>SharedWorker</code>) for use. 
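</p><p>As an aside: since a plain static server won&#8217;t send the headers needed for cross origin isolation, a small dev server helps. Here&#8217;s a minimal sketch of one in Node (my own, not from any library - the file handling is intentionally bare-bones):</p><div class="highlighted_code_block" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">const http = require('http')
const fs = require('fs')
const path = require('path')

// These two headers are what make window.crossOriginIsolated true
const isolationHeaders = {
    'Cross-Origin-Opener-Policy': 'same-origin',
    'Cross-Origin-Embedder-Policy': 'require-corp',
}

// Serve files out of rootDir with the isolation headers attached
function createDevServer(rootDir) {
    return http.createServer(function (req, res) {
        const file = path.join(rootDir, req.url === '/' ? 'index.html' : req.url)
        fs.readFile(file, function (err, data) {
            if (err) {
                res.writeHead(404)
                res.end('not found')
                return
            }
            res.writeHead(200, isolationHeaders)
            res.end(data)
        })
    })
}

// createDevServer('.').listen(8080)</code></pre></div><p>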
Once the memory is shared, we can use it to communicate data between threads, just like we would in threaded programming languages like Java and C++. To create shared memory, we simply create a <code>SharedArrayBuffer</code>.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;9e7ba320-d5e9-4356-a4dc-065ea24176c6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">const sab = new SharedArrayBuffer(1024); // 1KB
worker.postMessage(sab); // share it with the worker</code></pre></div><p>Simple enough!</p><h2>Using Array Buffers</h2><p>So, now we need to create our workers that use the shared memory. Since multi-threading itself can be difficult, we&#8217;ll start with a worker that begins in an isolated, non-multi-threaded state which we can then &#8220;upgrade&#8221; to a multi-threaded state. That way we can separate shared memory issues from logic or implementation issues, which will help a lot.</p><p>One of the first issues that we&#8217;ll run into with the linear memory approach is that array buffers (shared or non-shared) cannot be accessed directly. Instead, they need to be wrapped inside a typed array to &#8220;view&#8221; into the buffer. I don&#8217;t know why JavaScript did things this way, but that&#8217;s the way the standard went.</p><p>We&#8217;ll start with a basic worker that just increments an index some number of times. Here&#8217;s our code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;10b4f2b6-5428-44b2-86d2-da8bcdfebbff&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">let memory = new Int32Array(new ArrayBuffer(1024))
let offset = 0

onmessage = (e) =&gt; {
    for (let i = 0; i &lt; 200; ++i) {
        memory.set([memory.at(offset) + 1], offset)
    }
    postMessage({final: memory.at(offset)})
}</code></pre></div><p>Here we create an array buffer, and we get an integer array view over that buffer<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. We then increment the value at our offset 200 times, and then return the final value at that offset.</p><p>We can easily test our code by spinning up a worker and sending a message, like so:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;b0b47e9a-8df9-4412-af97-745feea2ccf5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">const worker = new Worker('worker-01.js')
worker.onmessage = (e) =&gt; console.log(e.data)
worker.postMessage('run')</code></pre></div><p>With that, we get back our <code>{final: 200}</code> just as expected. If we post another message, we&#8217;ll get <code>{final: 400}</code> and so on.</p><p>Nothing too surprising. Now, let&#8217;s update our worker to use a shared memory buffer. We&#8217;ll need to update our message receiving. While we&#8217;re in there, we&#8217;ll also pass in the offset we&#8217;re writing to. Here&#8217;s the new worker code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;9cc34ae5-355e-42e5-b33c-aed78b756d3e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">let memory = new Int32Array(new ArrayBuffer(1024))
let offset = 0

onmessage = (e) =&gt; {
    if (e.data.type === 'init') {
        memory = new Int32Array(e.data.memory) // use the memory buffer we're given
        offset = e.data.offset
        // don't respond
    }
    else if (e.data.type === 'run') {
        for (let i = 0; i &lt; 200; ++i) {
            memory.set([memory.at(offset) + 1], offset)
        }
        postMessage({final: memory.at(offset)})
    }
}</code></pre></div><p>Now, setting up our worker is a little more complicated, but not much. We&#8217;ll create a shared array buffer this time, and then we&#8217;ll send that to our worker and run it. We can then also wait for a response, and then read from our memory to make sure the values line up. Here&#8217;s our runner code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;bb1117a8-fcf6-4a76-a7db-41aae900ae2b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// our config
const memory = new SharedArrayBuffer(1024)
const arr = new Int32Array(memory)
const offset = 1

await run(memory, offset)
console.log("Memory data: ", arr.at(offset))

function run(memory, offset) {
    const worker = new Worker('example-02.js')
    const wait = new Promise((resolve) =&gt; {
        worker.onmessage = (e) =&gt; {
            resolve(e.data)
        }
    })
    worker.postMessage({type: 'init', offset, memory})
    worker.postMessage({type: 'run'}) // start the work (the worker only replies to 'run')
    return wait
}</code></pre></div><p>We get the same response back from the worker, but this time we can also read the data directly. Once we read the data, we see that we indeed have 200 in our main thread&#8217;s memory!</p><blockquote><p>I am aware that the top-level await will require the method to be in an async function in most JavaScript/TypeScript engines. I&#8217;m omitting the wrapper async function for brevity.</p></blockquote><h2>Racing with Multiple Threads</h2><p>Now that we have one thread writing to our data, let&#8217;s add more! We want all of them working towards the same goal (in this case, adding 200 to a piece of memory). It should be as simple as spawning more workers, and giving them each the same offset and memory, right? Let&#8217;s give it a try.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;895e595b-35d9-482c-b5e2-fad07c5c7b0e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">/// our config
const memory = new SharedArrayBuffer(1024)
const arr = new Int32Array(memory)
const offset = 1
const numRunners = 4

const runners = []

for (let i = 0; i &lt; numRunners; ++i) {
    runners.push(run(memory, offset))
}

await Promise.all(runners)

console.log("Memory data: ", arr.at(offset))

// omitting the run function, same as above example</code></pre></div><p>Let&#8217;s run that and&#8230; we don&#8217;t get 800. At least, not always. In fact, we pretty much get different results in all of our test runs. For one of my runs, I got 688 total. When I looked at the messages from each of my runners for that run, I got 288, 288, 488, and 688. When I ran it again, I got 284, 303, 503, and 703. What&#8217;s going on?</p><p>Well, we introduced a <a href="https://en.wikipedia.org/wiki/Race_condition">race condition</a> into our code. All of our threads are trying to read and write the same memory at the same time. These reads and writes don&#8217;t have any sort of sequencing guarantees, so they get interleaved arbitrarily which causes the odd data. To resolve this, we must instruct the code on how to sequence memory access.</p><h2>Sequencing Shared Memory</h2><p>The simplest method to sequence memory is with <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics">atomics</a>. Atomics allow a single operation on memory (read, set, exchange, add, subtract) to be done in a single, sequenced operation. However, the catch is that <em>only</em> the atomic operation is sequenced. If we have two atomic operations, then they are sequenced <em>separately</em>, meaning that interleaving (and thus data races) can still happen. This means that the following code still has the same bug as before, even though it uses atomics:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;b93ba1b3-3036-4954-901d-6b0b943f9ef5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">let memory = new Int32Array(new SharedArrayBuffer(1024))
let offset = 0

onmessage = (e) =&gt; {
    if (e.data.type === 'init') {
        memory = new Int32Array(e.data.memory) // use the memory buffer we're given
        offset = e.data.offset
        // don't respond
    }
    else if (e.data.type === 'run') {
        for (let i = 0; i &lt; 200; ++i) {
            Atomics.exchange(memory, offset, Atomics.load(memory, offset) + 1)
        }
        postMessage({final: memory.at(offset)})
    }
}</code></pre></div><p>Between the load and exchange, the CPU can interleave other instructions from other threads, which isn&#8217;t what we desire. To fix this, we need to have the <em>entire</em> operation happen in one atomic operation. This can be done as follows:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;64bf4dfe-9156-4a1c-8b69-74f0a5752f21&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">let memory = new Int32Array(new SharedArrayBuffer(1024))
let offset = 0

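// Sketch (mine, not from the original post): Atomics.add covers plain
// addition, but any read-modify-write can be made atomic with a
// compare-exchange retry loop - re-read and retry until no other
// thread wrote between our load and our exchange. Not used below;
// shown as the general form of the technique.
function atomicUpdate(view, at, fn) {
    while (true) {
        const old = Atomics.load(view, at)
        const next = fn(old)
        if (Atomics.compareExchange(view, at, old, next) === old) {
            return next
        }
    }
}
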
onmessage = (e) =&gt; {
    if (e.data.type === 'init') {
        memory = new Int32Array(e.data.memory) // use the memory buffer we're given
        offset = e.data.offset
        // don't respond
    }
    else if (e.data.type === 'run') {
        for (let i = 0; i &lt; 200; ++i) {
            Atomics.add(memory, offset, 1)
        }
        postMessage({final: Atomics.load(memory, offset)})
    }
}</code></pre></div><p>Now if we run this updated threading code, we&#8217;ll get 800 as our final result. We can run it several times and we&#8217;ll always end up with 800 in the end.</p><p>There&#8217;s a lot more to threading than what we&#8217;ve covered in this article. This at least should be enough to get some wheels turning.</p><p>In future posts, I&#8217;ll cover locking for more complicated synchronization, growing shared memory, and more. JavaScript doesn&#8217;t include locking primitives most developers are familiar with (outside of atomics), so we&#8217;ll need to build our own locking primitives, such as mutexes and semaphores.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Linear memory is basically an array of bytes. There&#8217;s a starting index (0) and an ending index (the maximum size of the memory). Most low-level systems and embedded code view memory this way since it closely mirrors the way the hardware operates. Memory allocators will then take this linear memory and subdivide it into &#8220;allocations&#8221; - which are essentially just pieces of memory designated for use. Allocations are pretty much just the allocator saying &#8220;this chunk of memory is already used.&#8221; When the allocated memory is no longer needed, it is &#8220;freed&#8221; by simply marking that section as &#8220;available&#8221; for reuse - the memory itself isn&#8217;t destroyed.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The array interface provides <code>at</code> and <code>set</code> to access the memory, so that&#8217;s what we use. 
The set method does take an <code>ArrayLike</code> for the first parameter since it allows setting an array of values at once. We&#8217;re just setting one element, so we pass an array of one element.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Sharing Threads in JavaScript]]></title><description><![CDATA[Previously I wrote about the basics of threading in JavaScript. In short, threads are the worker specification, and you can pass messages to workers and receive messages from workers.]]></description><link>https://matthewtolman.com/p/sharing-threads-in-javascript</link><guid isPermaLink="false">https://matthewtolman.com/p/sharing-threads-in-javascript</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Mon, 02 Feb 2026 22:14:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously I wrote about the <a href="https://matthewtolman.com/p/playing-with-threads-in-javascript">basics of threading in JavaScript</a>. In short, threads are the worker specification, and you can pass messages to workers and receive messages from workers.</p><p>In the previous article, we saw how we could spin up a thread and use it to do some job outside of the main event loop. Which is perfect for most use cases!</p><p>Except for when it isn&#8217;t.</p><p>Most users don&#8217;t just have one tab open, they have many tabs open. And they may have many tabs open for the same site. This is especially true for sites where there&#8217;s a lot of data that users need to compare, correct, and enter. 
Often users will have two (or more) tabs open side-by-side so they can compare and contrast data, or copy data over.</p><p>With our previous example, each tab gets its own background thread - completely isolated from all other tabs. Which means if a user has two tabs open, then they have two additional background threads, and if they have 10 tabs open then they have 10 additional background threads.</p><p>The problem compounds once we start having more threads. It turns out, workers can spawn other workers which in turn creates more threads, and those workers can spawn even more. If we have such a structure, then the extra resource consumption can start to cascade.</p><p>Of course, modern browsers have some safeguards to control the impact. But, we don&#8217;t want to be wasteful developers here, especially if most of our threads are idle a majority of the time (like in my example where I offload long-running user interactions to a background process). Instead, we&#8217;d like a way to say &#8220;browser, give me up to this many threads for my <em>site</em> and share those threads across tabs.&#8221;</p><p>Fortunately, we can have threads be shared across tabs simply by using <code>SharedWorker</code> instead of <code>Worker</code>.</p><p>Well, almost. Since <code>SharedWorker</code> is shared across tabs, the browser needs to know which tab is sending or receiving data. This is done with <em>ports</em>. Ports simply indicate which thread is doing the communicating. As a result, every <code>postMessage</code> and <code>onmessage</code> must be tied to a port - on both the worker and page side. 
If we tried to access <code>postMessage</code> or <code>onmessage</code> directly, we would get a &#8220;function undefined&#8221; error.</p><p>So, with that background knowledge done, let&#8217;s make a shared worker!</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;5a11b3ba-1a77-4942-9cdc-cf9610030144&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">importScripts('calcs.js')

let jobCount = 0

onconnect = (e) =&gt; {
    const port = e.ports[0]
    port.onmessage = e =&gt; {
        ++jobCount
        const job = e.data.job
        const jobId = e.data.jobId
        if (job.hasOwnProperty('factorialOf')) {
            return sendResponse(port, jobId, factorial(job.factorialOf))
        }
        if (job.hasOwnProperty('fibonacciOf')) {
            return sendResponse(port, jobId, fibonacci(job.fibonacciOf))
        }
        sendError(port, jobId, "bad job type")
    }
}
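
// Aside (added, not in the original post): the post later mentions we
// could store every connected port and broadcast events to all open
// tabs. A minimal sketch - `allPorts` and `broadcast` are hypothetical
// names, and onconnect would also need to do `allPorts.push(port)`:
const allPorts = []

function broadcast(msg) {
    allPorts.forEach(p =&gt; p.postMessage(msg))
}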


function sendResponse(port, jobId, res) {
    port.postMessage({jobId, res, jobCount})
}

function sendError(port, jobId, err) {
    port.postMessage({jobId, err, jobCount})
}</code></pre></div><p>And now let&#8217;s use it!</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;bb70db2f-4406-4af3-b78b-f19ea0e14694&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">const sharedBackgroundThread = new SharedWorker('shared-background-process.js')
let incId = 0 // fallback id counter for contexts without crypto.randomUUID
const workQueue = {}

function enqueueSharedJob(job) {
    const jobId = crypto.randomUUID ? crypto.randomUUID() : ++incId
    const ret = new Promise((resolve, reject) =&gt; {
        workQueue[jobId] = {
            resolve,
            reject
        }
    })
    sharedBackgroundThread.port.postMessage({jobId, job})
    return ret
}

sharedBackgroundThread.port.onmessage = e =&gt; {
    const jobId = e.data.jobId
    try {
        console.log('Shared worker job count', e.data.jobCount)
        if (typeof e.data.res !== "undefined") {
            workQueue[jobId].resolve(e.data.res)
        } else {
            workQueue[jobId].reject(e.data.err || 'BAD MESSAGE FORMAT')
        }
    } finally {
        delete workQueue[jobId]
    }
}


await enqueueSharedJob({factorialOf: 5})</code></pre></div><p>It has a few extra steps with the ports, but otherwise it&#8217;s not that much different from using a normal worker.</p><blockquote><p>One thing you may have noticed is that I didn&#8217;t include the code for the actual calculations. Instead, I have an <code>importScripts</code> call. The <code>importScripts</code> call will load another JavaScript file into the context of the worker. It&#8217;s how we can load libraries and reuse code between workers and pages. In this case, I put the code for the factorial and fibonacci calculations inside a different file which I&#8217;m then loading.</p></blockquote><p>There is one other advantage to using a shared worker over a normal worker, and that&#8217;s communication <em>across tabs</em>! We can store all the ports, and have events broadcast to every open tab, or have a cache, or some sort of shared state.</p><p>For simplicity, we&#8217;re going to just add a cache. The cache is going to be a simple &#8220;for this function input, cache this value&#8221; type of cache. It won&#8217;t have cleanup or upper bounds on size, but it&#8217;ll do for a small example.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;12c7b926-dd21-48e3-ab66-8280a1a6fd61&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// calcs.js
const cache = {
    factorial: {},
    fibonacci: {}
}

function fibonacci(n) {
    if (typeof cache.fibonacci[n] !== 'undefined') {
        return cache.fibonacci[n]
    }

    let n1 = 0n
    let n2 = 1n

    for (let i = 0; i &lt; n; ++i) {
        [n1, n2] = [n2, n1 + n2]
    }
    cache.fibonacci[n] = n1
    return n1
}
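
// Added sanity check (not in the original): the memoization means a
// second call for the same n is a cache hit returning the stored BigInt.
console.assert(fibonacci(10) === 55n)       // computes and caches F(10) = 55
console.assert(cache.fibonacci[10] === 55n) // now served from the cache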

function factorial(n) {
    // BigInt primitives are not `instanceof BigInt`, so check typeof instead
    if (typeof n !== 'bigint') {
        n = BigInt(n)
    }
    let original = n
    if (typeof cache.factorial[n] !== 'undefined') {
        return cache.factorial[n]
    }
    let res = 1n
    while (n &gt; 1n) {
        res *= n
        n -= 1n
    }
    // cache the result (BigInt keys stringify, so this works for any input)
    cache.factorial[original] = res
    return res
}</code></pre></div><p>The cache will then persist so long as our shared worker does, and we will be able to use any of the calculated values across our tabs.</p><p>But, how long does our shared worker last? Well, <a href="https://html.spec.whatwg.org/multipage/workers.html#the-worker's-lifetime">per the spec</a> it&#8217;s basically so long as there&#8217;s at least one context referring to that shared worker (page, iframe, window, etc.) with a grace period for page navigation that&#8217;s defined by the browser.</p><p>For us, it means that any data we store in a shared worker is not durable. For background jobs, that&#8217;s totally fine.</p><p>One other detail to keep in mind is that there&#8217;s a shared origin policy for shared workers. Basically, if a page gets loaded on your site, and another page has already made a shared worker, than the new page will use the existing shared worker - even if the script file has changed since the worker was first made.</p><p>This can make live updates more complicated, since it&#8217;s not as simple as &#8220;just logout and log back in&#8221; anymore. It&#8217;s &#8220;close all the tabs, wait 10 to 30 seconds, then reopen everything.&#8221;</p><p>That&#8217;s not to say there aren&#8217;t solutions - they&#8217;re just not elegant. We just need to have a version in the script.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;beb59055-f6af-4b8b-88a9-0ece154d1db7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">// old-page.html
const worker = new SharedWorker('worker-script-v1.0.0.js');

// new-page.html
const worker = new SharedWorker('worker-script-v1.1.0.js');</code></pre></div><p>All this to say, shared workers offer some really cool stuff, but due to their shared nature they&#8217;re a little trickier to get right.</p><p>Of course, we&#8217;re not done yet. There&#8217;s still more to go when it comes to JavaScript threads. I&#8217;ll post more soon.</p>]]></content:encoded></item><item><title><![CDATA[Archives of my old blog now online!]]></title><description><![CDATA[Read what I wrote years ago]]></description><link>https://matthewtolman.com/p/archives-of-my-old-blog-now-online</link><guid isPermaLink="false">https://matthewtolman.com/p/archives-of-my-old-blog-now-online</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Wed, 28 Jan 2026 23:29:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently (this year) I switched from my own self-hosted blogging solution to Substack. As part of my transition, I was not able to bring my blog history with me. Mostly since I used a custom blog engine and Substack doesn&#8217;t know how to import blog posts from that engine (and they shouldn&#8217;t have to - expecting them to know how every blog engine made by every random person works isn&#8217;t practical. In fact, I&#8217;m quite impressed that they&#8217;re able to import blogs from just <em>some</em> engines!).</p><p>I didn&#8217;t manually bring my posts over either since I couldn&#8217;t find a way for Substack to release posts in the past or even with a publish date in the past (if someone knows, please let me know!). 
So, my solution was to make a dump of the raw, final HTML pages and then later get them up online under an &#8220;archives&#8221; site.</p><p>Well, they&#8217;re up now at <a href="https://archives.matthewtolman.com/">https://archives.matthewtolman.com/</a>! Anyone can read my old posts now. Sadly, there isn&#8217;t the ability for people to comment on the archives, but you can comment on this post. Have fun!</p>]]></content:encoded></item><item><title><![CDATA[Playing with threads in JavaScript]]></title><description><![CDATA[So, lately I&#8217;ve been doing a lot of multi-threading in C.]]></description><link>https://matthewtolman.com/p/playing-with-threads-in-javascript</link><guid isPermaLink="false">https://matthewtolman.com/p/playing-with-threads-in-javascript</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Wed, 28 Jan 2026 22:35:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So, lately I&#8217;ve been doing a lot of multi-threading in C. Which has been a lot of fun. But, I also know that there&#8217;s a way to do it with JavaScript using workers.</p><p>I&#8217;ve used service workers in the past to do some caching and pre-emptive reloading. When I used it in the past, it cut load times of our worst pages by half and made other pages feel instantaneous. However, service workers are meant more as a caching layer/custom proxy rather than as a way of doing general computations. It&#8217;s definitely not a thread I&#8217;d want to clog up in a modern web app given that they make so many network calls.</p><p>But, that still doesn&#8217;t mean that doing some sort of threading wouldn&#8217;t be useful. 
I&#8217;ve had a lot of pages get bogged down because there&#8217;s a large computation done in the main thread - a big no-no for anyone familiar with desktop development.</p><p>The problem with JavaScript is that it&#8217;s a notoriously single-threaded environment where all the code you write is in that main thread - or is it?</p><p>Well, surprise surprise, someone on the committee realized only having one thread was a bad idea, so they created something called &#8220;workers&#8221; which are basically background threads you pass messages to and receive messages from. Really handy.</p><p>The model is less the &#8220;C/Java&#8221; model where memory is shared (with some very limited exceptions) and much more the Erlang/Elixir model of message passing. Though it&#8217;s also pretty low level and doesn&#8217;t come with a good protocol for &#8220;awaiting&#8221; an event you send off. That said, it&#8217;s not too difficult to create a very crude protocol, and there are some really polished libraries out there.</p><p>For my simple use case, I went with just doing a simple protocol where I auto-assign ids, create a promise, and then put that promise&#8217;s resolve and reject methods in a map, followed by returning the promise. Then when I send a response back from my background thread, I include the same id as well as the data (or error), and then I lookup the corresponding resolve/reject method and call it to do a dispatch.</p><p>The reason for needing a map is simply because we only have a global &#8220;onmessage&#8221; handler which doesn&#8217;t have the context of what we just sent. 
So, we need a way to tell that handler where the resolver or rejector is so it can forward things along.</p><p>Here&#8217;s the little snippet of code I use:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;1d647c03-ef93-44cf-a6e0-575238ab0e58&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">let incId = 0
const backgroundThread = new Worker('background-process.js')

const workQueue = {}

backgroundThread.onmessage = e =&gt; {
&#9;const jobId = e.data.jobId
&#9;try {
&#9;&#9;console.log('Page worker job count', e.data.jobCount)
&#9;&#9;if (typeof e.data.res !== "undefined") {
&#9;&#9;&#9;workQueue[jobId].resolve(e.data.res)
&#9;&#9;} else {
&#9;&#9;&#9;workQueue[jobId].reject(e.data.err || 'BAD MESSAGE FORMAT')
&#9;&#9;}
&#9;} finally {
&#9;&#9;delete workQueue[jobId]
&#9;}
}

function asyncJob(job) {
&#9;const jobId = crypto.randomUUID()
&#9;const ret = new Promise((resolve, reject) =&gt; {
&#9;&#9;workQueue[jobId] = {
&#9;&#9;&#9;resolve,
&#9;&#9;&#9;reject
&#9;&#9;}
&#9;})
&#9;backgroundThread.postMessage({jobId, job})
&#9;return ret
}</code></pre></div><p>And just like that, we have a really easy way to send jobs to our background thread and get a response back. In our code, it will look like the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;1e39d426-e791-42e9-81ca-f0e12daad1ad&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">async function doCalc(n) {
  return await asyncJob({expensiveCalculation: n})
}</code></pre></div><p>Of course, every protocol has two sides. So, let&#8217;s look at the other side: the background thread.</p><p>For this, we also have only a global &#8220;onmessage&#8221; handler. However, we don&#8217;t have to coordinate the state with some other context&#8217;s data, so we can pretty much do everything in the handler. Here&#8217;s my handler code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;javascript&quot;,&quot;nodeId&quot;:&quot;4fd71993-2142-4ca0-905f-b9d8b1f1255d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-javascript">let jobCount = 0

onmessage = e =&gt; {
    ++jobCount
    const job = e.data.job
    const jobId = e.data.jobId
    if (job.hasOwnProperty('expensiveCalculation')) {
        return sendResponse(jobId, Math.pow(13, job.expensiveCalculation))
    }
    sendError(jobId, "bad job type")
}

function sendResponse(jobId, res) {
    postMessage({jobId, res, jobCount})
}

function sendError(jobId, err) {
    postMessage({jobId, err, jobCount})
}</code></pre></div><p>This preserves the job id, does the calculation, and sends it back. Not too bad.</p><p><a href="https://matthewtolman.dev/demos/workers/index.html">There is a demo I have online which shows this off</a>, as well as shared workers (very different from service workers) which I&#8217;ll talk more about in another post. To help illustrate the lack of stuttering, I have an animation that plays in the JS loop with an update every frame (it&#8217;s just a really small square bouncing animation). There&#8217;s also a button that runs everything in the main thread instead of the background thread for compare/contrast. I also use something a little more expensive (fibonacci + factorial of large numbers with BigInt) to make the stutter more noticeable on the main thread example.</p>]]></content:encoded></item><item><title><![CDATA[Property Testing in C]]></title><description><![CDATA[A poor-man's fuzzer]]></description><link>https://matthewtolman.com/p/property-testing-in-c</link><guid isPermaLink="false">https://matthewtolman.com/p/property-testing-in-c</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Mon, 12 Jan 2026 19:20:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uqu-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last time I wrote about <a href="https://matthewtolman.com/p/snapshot-tests-in-c">snapshot testing</a>, which is great for capturing behavior of a current system - especially in a way that can be quickly checked. I use it as a complement to traditional example-based unit testing where we check a system gives expected (known) output to known input.</p><p>However, snapshot testing and example-based testing don&#8217;t find bugs. They&#8217;re not meant to. 
They instead act as anchor points for a system, saying that the system behaves this way at this point - but they say nothing about any other point.</p><p>For instance, let&#8217;s say we wrote a custom memory allocator, and we&#8217;re writing tests for it. We might write an example-based test like the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;269d2263-8879-430d-9937-c28477babc75&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">TEST(my_allocator) {
    int initUsage = my_alloc_usage();
    void *ptr = my_alloc(24);

    int curUsage = my_alloc_usage() - initUsage;
    int expectedUsage = 24;

    ASSERT_EQ(expectedUsage, curUsage);
    ASSERT_NE(NULL, ptr);

    my_free(ptr);

    int finalUsage = my_alloc_usage();
    ASSERT_EQ(initUsage, finalUsage);

    PASS();
}</code></pre></div><p>It&#8217;s not hard to add a single point. The above code shows a test that allocates and frees some amount of memory, with different assertions making sure things work. We can add more tests for different sizes, or maybe multiple allocations. Each test will act as an anchor checking how our allocator performs.</p><p>But, most bugs in production don&#8217;t happen at the anchored input - and many don&#8217;t happen close around the input. Instead, they tend to happen in the gaps between each anchor - gaps which tend to be very large and obscure to find. How the system really behaves is unknown until the code is ran with that input.</p><p>To illustrate how hard it is to get good coverage, let&#8217;s look at our use case. Users making sure that the allocator behaves a certain way way at that <em>input point</em> - but only at that input point. In our case, the input is the combination of:</p><ul><li><p>The order of allocations</p></li><li><p>The order of frees for each allocation</p></li><li><p>The size of each allocation</p></li><li><p>The order allocations and frees are interleaved</p></li></ul><p>It turns out, changing any one of those parameters can cause our allocator to run through any number of different code paths - and trigger any number of hidden bugs. </p><p>We could add a lot of test cases and try to cover as many possibilities by hand - but we would be typing test code until the sun burned out trying to catch every possible combination. And, most of the tests would look suspiciously similar.</p><p>Well, what if we wrote a program that tried to break our app? We could do that - it&#8217;s called a fuzzer. And there are several programs already out there for different languages. However, writing a fuzzer is non-trivial, and getting an existing fuzzer to work would require not only installing it on every dev machine (and build server), but also time to set it up, training on how to use it, and dealing with updates. 
For some projects, it&#8217;s worth the effort; for many others, it isn&#8217;t.</p><p>Is there an in-between? Something simple enough to write as part of tests but powerful enough to test many different cases? Well, yes!</p><p>They&#8217;re property tests!</p><p>Property tests are just a fancy name saying &#8220;with random data shaped like this, throw it into my code and expect it to sort of behave like that.&#8221; In the case of our allocator, we just take a bunch of random allocation and free requests and throw them at our code. Similar to the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;1d7a6ddf-ecd4-4236-b4f5-a018cbd461ab&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">#ifndef MAXROUNDS
#define MAXROUNDS 1000
#endif

#ifndef MAXALLOCS
#define MAXALLOCS 100
#endif

TEST(my_allocator_rand) {
  size_t initUsage = my_alloc_usage();
  void *allocations[MAXALLOCS] = {NULL};
  srand(time(0));
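
  // Added sketch (not in the original test): the post later recommends
  // printing the seed and allowing an override so failures can be
  // reproduced. MY_TEST_SEED is a hypothetical compile-time override.
  // This re-seeds, replacing the srand(time(0)) call above.
#ifdef MY_TEST_SEED
  unsigned seed = MY_TEST_SEED;
#else
  unsigned seed = (unsigned)time(0);
#endif
  printf("property test seed: %u\n", seed);
  srand(seed);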

  for (size_t round = 0; round &lt; MAXROUNDS; ++round) {
    size_t slot = (rand() / 16) % MAXALLOCS;
    if (allocations[slot]) {
      my_free(allocations[slot]);
      allocations[slot] = NULL;
    } else {
      size_t allocAmount = (rand() / 16) % 4000 + 64;
      allocations[slot] = my_alloc(allocAmount);
      ASSERT_NE(NULL, allocations[slot]);
    }
  }

  for (size_t slot = 0; slot &lt; MAXALLOCS; slot++) {
    if (allocations[slot]) {
      my_free(allocations[slot]);
    }
  }

  size_t finalUsage = my_alloc_usage();
  ASSERT_EQ(initUsage, finalUsage);

  PASS();
}</code></pre></div><p>The above code randomizes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> the interleaving of allocations with frees and randomizes the allocation size. It&#8217;s a very simple check, and yet it covers a wide range of possibilities.</p><p>However, property tests aren&#8217;t a silver bullet - and there are some notable drawbacks. But before we get to those drawbacks, let&#8217;s show another example that will highlight the drawbacks even more. Let&#8217;s move from memory allocators to math functions - like sin. Here&#8217;s a test for sin.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;84ebf7a5-9812-4798-be6c-12e36be167fb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">#ifndef MAXROUNDS
#define MAXROUNDS 1000
#endif

TEST(my_sin) {

  for (size_t round = 0; round &lt; MAXROUNDS; ++round) {
    double in = (double)(rand() / 16) / 50000.0;
    double out = my_sin(in);
    
    ASSERT_LE(out, 1);
    ASSERT_GE(out, -1);
  }

  PASS();
} </code></pre></div><p>If you&#8217;re wondering how this test ensures each output is the correct approximation for the corresponding input, the answer is very simple: it doesn&#8217;t. Which leads me into a big downside of property tests: they don&#8217;t really check for correctness.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>We&#8217;re running through a lot of questions - but we can&#8217;t verify if they&#8217;re right. So, what&#8217;s the point?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uqu-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uqu-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!uqu-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!uqu-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!uqu-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uqu-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png" width="1456" height="1048" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:221078,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://matthewtolman.com/i/184163390?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uqu-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!uqu-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!uqu-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!uqu-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F433f8723-5863-4201-90d5-ca17273b6cd9_1456x1048.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The point is we&#8217;re looking for <em>unknown</em> <em>failures</em> not <em>known</em> <em>successes</em>. This is most commonly manifested with exceptions (e.g. 
error codes), failed assertions (via <code>assert</code><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>), segmentation faults, and behavior detected by sanitizers (like clang&#8217;s thread sanitizer, address sanitizer, memory sanitizer, and UB sanitizer).</p><p>For instance, our free code may look something like the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;0d092734-c829-4e30-85ad-f4b0148c413d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">void my_free(void *ptr) {
  if (ptr == NULL) return;

  pthread_mutex_lock(&amp;global_alloc_lock);

  struct MyBlockHeader *block = my_align_pointer_down(
    (struct MyBlockHeader *) ptr - 1, MY_MEM_ALIGNMENT);
  
  // Using assert here for the example; my real code uses an "ensure"
  // macro that runs in debug and release builds and, on failure, prints
  // a formatted message with file and line info to quickly id the failed code
  assert(block != NULL);

  // Ensure that the header is valid (e.g. look for "magic bytes")
  assert(my_block_is_valid(block));

  // Detect "double frees"
  assert(block-&gt;used == true);
  block-&gt;used = false; // mark it as freed

  merge_with_neighbors(block);

  // make sure we didn't corrupt the current (or neighboring) blocks
  assert(my_block_is_valid(block));
  assert(block-&gt;next == NULL || my_block_is_valid(block-&gt;next));
  assert(block-&gt;prev == NULL || my_block_is_valid(block-&gt;prev));

  // release the lock acquired at the top of the function
  pthread_mutex_unlock(&amp;global_alloc_lock);
}</code></pre></div><p>These asserts add a lot more power to property tests. If we fail an assert, we&#8217;ll know. And since we&#8217;re doing a lot more tests and a lot more input variation, we make it a lot more likely that we&#8217;ll trip a bug which will fail an assert.</p><p>But wait - we&#8217;re using random numbers in our property tests! Which means that even if we find a bug we may not be able to reproduce it.</p><p>That is unless we do two things - print the seed at the start of every run, and provide a way to seed a run if we desire. Once we do that, we&#8217;re able to reproduce a failure<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> and start debugging it.</p><p>There are other aspects to property testing that people tend to focus a whole lot on (rules for input generation, how to define &#8220;properties&#8221;, shrinking, etc.). However, I&#8217;ve found that they&#8217;re either very easy to do, not very important, or mostly used when trying to create a regression test (example or snapshot test) around a failed property test. In other words, they&#8217;re &#8220;nice to have&#8221; but aren&#8217;t the real meat of property testing, and can often distract or take away from the use of property testing.</p><p>If this post inspired you to start using property testing, I do have some library recommendations for you.</p><ul><li><p><a href="https://github.com/silentbicycle/theft">theft</a> is a C99 property testing library that&#8217;s been around for a while. It is pretty heavy-handed when it comes to </p></li><li><p><a href="https://github.com/emil-e/rapidcheck">rapdicheck</a> is a C++ property testing library which I&#8217;ve used to test C code</p></li><li><p><a href="https://gitlab.com/mtolman/test_jam">Test_Jam</a> is a C++/C library I made for property testing a few years back. 
The C interface is very primitive and poorly tested as I focused primarily on the C++ interface.</p></li><li><p><a href="https://git.matthewtolman.dev/mtolman/peaksc">PeaksC</a> is a library I&#8217;m currently working on which has some property testing utilities built in. I&#8217;m currently working on this project. It is highly opinionated though, so if you need a drop-in solution for a large code base, then one of the other solutions may be better.</p></li></ul><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://matthewtolman.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://matthewtolman.com/subscribe?"><span>Subscribe now</span></a></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Yes, I know rand() is a linear congruent generator and isn&#8217;t very &#8220;random&#8221; - but it works well enough for basic use cases and can be quickly switched out.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>A property test I can think of  which could verify sin is to generate a right triangle and solve for all the angles without using sin, and then verify that sin lines up with the solved solution. There are often cases like this, but they take a lot more working out to get right, and by the time you do when a test case fails it&#8217;s unclear if it&#8217;s a bug in the code or a bug in the test. 
Which makes debugging way more difficult.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The two downsides to assert are that it only runs in debug builds - meaning it&#8217;s not going to catch bugs in a release build - and that it has very limited debug information when the program crashes. In my code, I have a macro called &#8220;ensure&#8221; which runs in debug and release builds, and which allows printing a formatted string that also includes the file and line number.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>There are some other caveats, like if a test is non-deterministic by nature (e.g. relies on calls to the system clock or doing network calls) then a seed won&#8217;t be sufficient to reproduce without additional work. But at that point, we&#8217;re leaving the realm of property tests and entering either mocking, deterministic simulation testing, or record-and-replay debugging.</p><p>Of which mocking is the worst option because we now exclude any possibility of finding unknown failures in a whole segment of our code - which also tends to be the most flaky and complex dependency of our code due to its non-deterministic nature. Couple this with the fact that mocked code is almost always someone else&#8217;s code (e.g. the OS or a network service), which means we&#8217;re not testing our integration layer with that code. 
Not only that, but we aren&#8217;t actually running the code we&#8217;ve mocked - so we aren&#8217;t even testing our known successes in addition to not finding unknown failures!</p></div></div>]]></content:encoded></item><item><title><![CDATA[Snapshot Tests in C]]></title><description><![CDATA[Lately I&#8217;ve been working on a fairly big C library (it&#8217;s still early days, and I&#8217;ll write more about it over time).]]></description><link>https://matthewtolman.com/p/snapshot-tests-in-c</link><guid isPermaLink="false">https://matthewtolman.com/p/snapshot-tests-in-c</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Fri, 09 Jan 2026 02:44:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/de7d13c1-884a-42aa-8df4-c0dcca2aae4d_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Lately I&#8217;ve been working on a <a href="https://git.matthewtolman.dev/mtolman/peaksc">fairly big C library</a> (it&#8217;s still early days, and I&#8217;ll write more about it over time). As part of this library, I&#8217;ve been creating my own test framework.</p><p>Most of my tests are <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/7be02e2461d607ea48a608859f194ab39e1e52d6/test/pagealloc.c">typical unit tests</a> - you have some setup, run your code, and then do a series of assertions with hard-coded values. Nothing too surprising, and the test framework for that isn&#8217;t too difficult to make.</p><p>For most of my tests, that style is ideal. It&#8217;s very focused (both writing the tests and what is being tested), and it&#8217;s not too hard to set up. Also, since I&#8217;m writing the examples, it works really well when I know what the output should be (which is most of the time).</p><p>However, it does have a lot of drawbacks. I&#8217;m not going to cover all of the drawbacks, or how I&#8217;m trying to address each one. 
Rather, I&#8217;m going to focus on one singular drawback: I need to know the answer/behavior ahead of time.</p><p>Again, for most use cases, this is fairly trivial. Memory allocators have well-defined behavior (allocates memory sufficiently large or returns an error, memory doesn&#8217;t overlap, memory isn&#8217;t leaked, etc.), so for my memory allocators it&#8217;s pretty easy to write some typical unit tests. Similar thing for most of the other code I write. I know what behavior I want it to have.</p><p>Where it&#8217;s not so true is when it comes to low-level math functions - like sin, cos, tan, etc. I <em>don&#8217;t</em> know what the answer should be for every number ahead of time. I&#8217;m not someone who regularly calculates sin by hand or has a book full of tan lookup tables sitting on my shelf.</p><p>This means that I need to look up or calculate the correct values. Fortunately, the standard library provides many math functions, so I wrote something like the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;2b15eb71-e06c-4888-9f25-994048c9661a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">for (int i = 0; i &lt; numInputs; ++i) {
    double expected = sin(inputs[i]);
    double actual = mySin(inputs[i]);

    if (!approx_equal(expected, actual, 0.0000001)) {
        fail();
    }
}
pass();</code></pre></div><p>This works great - so long as you only stick with one standard library implementation. It turns out, there are different ways of implementing sin. A lot of open source standard libraries I&#8217;ve gone through appear to be based off of the <a href="https://github.com/freebsd/freebsd-src/blob/main/lib/msun/src/s_sin.c">Sun Microsystems code in FreeBSD</a> - which to be fair is what I&#8217;m basing most of my math code around too. This means that they&#8217;re very compatible, with similar precision, error tolerances, and biases (sin methods are an <em>approximation</em> not a <em>calculation</em>, so a lot can change due to different trade-offs made in the implementation).</p><p>However, not everything is open source, and Microsoft&#8217;s MSVC standard library notably is not. So much so that when I used FreeBSD&#8217;s approximation, my above test failed<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. Which is honestly what I expected. The reason I wanted to have my own sin was simply that I want my code to behave the same way on every platform (or at least as close as I can get it). Having a math library that executes the same code on every platform is a huge step towards that determinism.</p><p>Fortunately, my laziness got the better of me and I didn&#8217;t want to manually generate a table of expected and actual values and try to keep that updated. I have that pattern in a few places in my code and it&#8217;s a nightmare to debug and maintain.</p><p>So, for a while I just wrapped whatever was in the standard library, that way my simple test would pass on both open-source and closed-source libraries. 
In the meantime, I was thinking about how to write a test that better captured the FreeBSD-based version of sin I was wanting to have.</p><p>One day I read about snapshot tests and was reminded of <a href="https://blog.janestreet.com/the-joy-of-expect-tests/">Jane Street&#8217;s</a> blog post on their snapshot testing.</p><p>I had known about snapshot tests when I first wrote my library, but I&#8217;ve only used snapshot tests in the context of React components, and React snapshot tests <em>suck</em>. I&#8217;m not going to go into too much detail in this post, but here are the main gripes I have with React snapshot tests:</p><ul><li><p>They check the <em>exact</em> HTML output, not the <em>visual</em> output and not the <em>semantic</em> output. This makes tests flaky in the worst way - they fail for changes you don&#8217;t care about and pass for changes you do care about</p></li><li><p>The snapshots are incredibly verbose, so when things do change the deltas can be massive - which means no one looks at them</p></li><li><p>Snapshots aren&#8217;t stored in the code, but rather in a separate directory. So now there are multiple files that you need to look at to understand what a test is doing</p></li></ul><p>Because of the issues I&#8217;ve had with React snapshots, I had initially written off snapshot testing <em>entirely</em>. And when I had first read Jane Street&#8217;s post I couldn&#8217;t comprehend how the mess of snapshot testing could be enjoyable.</p><p>And then, after pondering my own predicament, I realized it was precisely what I wanted.</p><p>I already had a way to generate correct cases - simply test against GCC/Clang/Musl. All I needed was a way to capture that output and preserve it in the code. And snapshot tests let me do that.</p><p>So, I quickly wrote my own <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/7be02e2461d607ea48a608859f194ab39e1e52d6/include/testing.h#L403-L437">snapshot testing macro</a>. 
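</p><p>The core idea can be sketched in a few lines. This is a simplified, hypothetical illustration (the names and behavior here are mine, not the library&#8217;s actual API): the expected string is embedded at the call site, and a mismatch reports both values so the embedded snapshot can be corrected.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a snapshot check. Returns 1 on match; on mismatch
 * prints both strings with the source location so the embedded expected
 * value can be updated. */
static int snapshot_matches(const char *actual, const char *expected,
                            const char *file, int line) {
    if (strcmp(actual, expected) == 0) {
        return 1;
    }
    fprintf(stderr, "%s:%d snapshot mismatch\n  expected: %s\n  actual:   %s\n",
            file, line, expected, actual);
    return 0;
}

#define CHECK_SNAPSHOT(actual, expected) \
    snapshot_matches((actual), (expected), __FILE__, __LINE__)
```

<p>A real snapshot system does more than this on a mismatch, as described next. </p><p>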
My macro follows Jane Street&#8217;s methodology: the expected output is embedded directly in the source code rather than kept in separate files. For ease of use with how many cases I was going to be generating, I gave it the ability to <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/7be02e2461d607ea48a608859f194ab39e1e52d6/src/testing.c">update the source files directly</a>, though I normally keep it off with a &#8220;return false&#8221; at the start. That way I only update the test cases when I want them to be updated, not all the time.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>So far it&#8217;s been really great! I have <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/7be02e2461d607ea48a608859f194ab39e1e52d6/test/math.c#L323-L456">a lot of snapshot tests</a> in my math code right now. This allows me to separate my math tests from the standard library implementation, which is needed for me to standardize the math functions I use across platforms.</p><p>That said, there are some limitations in my system. For one, my parser is really simple - it basically looks for the phrase &#8220;CHECK_SNAPSHOT&#8221; followed by an open parenthesis and then a comma (while ignoring commas inside quotes) in the C file. If you&#8217;ve ever used C macros, you&#8217;ll know this isn&#8217;t enough to handle complex use cases. Something simple like the following would break my snapshot test code:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;c&quot;,&quot;nodeId&quot;:&quot;5af686bf-2804-48ab-ae1f-8b894cd152d9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-c">#define COMMA ,

TEST(my_test) {
    CHECK_SNAPSHOT("hello" COMMA "bye");
}</code></pre></div><p>Also, it only looks for the first occurrence on a line - it doesn&#8217;t try to understand if it&#8217;s the &#8220;right&#8221; occurrence. So each snapshot check must be on a separate line.</p><p>Of course, getting that level of robustness takes a <em>lot</em> of work. I&#8217;d have to have a preprocessor, C compiler front-end, etc. just to identify what code to modify - something that would take a lot of effort and provide only marginal benefit.</p><p>The bigger issue is that my snapshot testing requires values to be strings - which is a little annoying in C due to memory being manually managed. Combine this with the fact I have a limited parser and now it&#8217;s a lot harder for someone to macro their way to &#8220;elegant&#8221; test code.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>My solution is just to have a <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/7be02e2461d607ea48a608859f194ab39e1e52d6/test/math.c#L174">fixed buffer</a> that I reuse and to have <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/7be02e2461d607ea48a608859f194ab39e1e52d6/test/math.c#L197-L198">macros define how to write</a> to that buffer.</p><p>Overall, I&#8217;m really pleased with my snapshot test solution.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The numbers are fairly similar, but they&#8217;re within a percentage value of each other rather than a fixed distance, and my approximate equality method only handled a fixed distance. 
I didn&#8217;t feel like trying to figure out what the tolerance value was, so I just moved on to other options - especially since at the end of the day I want to match GCC/Clang/Musl since those are closer to the FreeBSD library anyways.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>At some point, this will become a CLI/environment variable option. However, I get the most bang for effort by simply using a return statement for now and then refactoring it later once my usage/needs get more complex and I know what features I really need.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I do have a little trick where it will match a phrase with &#8220;CHECK_SNAPSHOT&#8221; in it rather than doing an exact match - but that may still be too limiting for some use cases. Also, I haven&#8217;t actually tested how it works with extended phrases, so it may not actually work.</p></div></div>]]></content:encoded></item><item><title><![CDATA[A build system around nix-shell]]></title><description><![CDATA[I finally got my CI/CD pipeline moved over to Forgejo actions. As part of my migration, I had to learn a lot about Forgejo (since it&#8217;s my first time using Forgejo), but I also needed to learn a lot about how a NixOS host works with Forgejo too. 
Mostly since I&#8217;m still new to NixOS and there&#8217;s a fairly steep learning curve.]]></description><link>https://matthewtolman.com/p/a-build-system-around-nix-shell</link><guid isPermaLink="false">https://matthewtolman.com/p/a-build-system-around-nix-shell</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Thu, 18 Dec 2025 19:01:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zSn1!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf1be78b-421d-4c20-91ad-957de09d96fe_720x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I finally got my CI/CD pipeline moved over to <a href="https://git.matthewtolman.dev/mtolman/peaksc/actions">Forgejo actions</a>. As part of my migration, I had to learn a lot about Forgejo (since it&#8217;s my first time using Forgejo), but I also needed to learn a lot about how a NixOS host works with Forgejo too. Mostly since I&#8217;m still new to NixOS and there&#8217;s a fairly steep learning curve.</p><p>As such, nothing here is &#8220;idiomatic&#8221; nix or Forgejo. It&#8217;s just what I managed to cobble together after several days of trial and error (which you&#8217;ll see reflected in my action run history - there was a <em>lot</em> of trial and error).</p><p>That said, I did get something working, and I&#8217;m pretty happy with it and just wanted to share it briefly.</p><h2>Part 1: The Project</h2><p>The project I chose to work on first is my most recent C project, which is basically a collection of functions, macros, etc. for me to use in other C projects. It has its own unit test library, syscall wrappers, threading primitives, allocators, etc. In short, it&#8217;s trying to be my own &#8220;standard&#8221; library that I can use to minimize my reliance on other standard (or platform-specific) libraries. 
It also adds in additional functionality that I find standard libraries lack.</p><p>I chose this project as a starting point for a few reasons. Namely:</p><ol><li><p>It uses C99 with no 3rd-party libraries outside of standard and OS-libraries (which it wraps), so it&#8217;s very simple to get a build environment set up.</p></li><li><p>I&#8217;ve done the most experiments with using nix-shell for setting up the build system in this project, so I already have a good starting point.</p></li><li><p>I&#8217;ve done a lot of tests to make sure that the code compiles with many different tool-chains. This means that if I run into any errors it&#8217;s most likely to be a configuration issue, not a code issue.</p></li><li><p>I have minimal artifacts being produced right now (pretty much just documentation), which means I don&#8217;t have to set up an artifact repository quite yet. This helps narrow the scope of my experiment (and the work needed to get off the ground).</p></li><li><p>It&#8217;s a library not an application, so there&#8217;s no deploy pipeline to worry about, which greatly reduces the scope of my experiment</p></li></ol><p>My experiments in this project included using nix-shell to spin up different versions of GCC and Clang, as well as cross-compile for Windows and Linux on different architectures - and run those cross-compiled versions in virtual machines. So, I had a pretty good starting point to say the least. 
Before we go too far into the CI/CD part, let&#8217;s first dive into my starting point a little more.</p><h3>Nix-Shell for local builds</h3><p>I had three parts<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> for my nix-shell builds:</p><ol><li><p>A <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/build.sh">build.sh</a> file that took in a lot of environment variables to tweak things as needed</p></li><li><p>A <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/branch/main/nix">folder of *.nix files</a> with instructions on how to run build.sh for every tool-chain and platform</p></li><li><p>A <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/branch/main/nix">nix-build.sh</a> file which went through every one of my nix files in order and ran them</p></li></ol><p>The build.sh file was a little complicated - not because building my library is complicated (it&#8217;s pretty much just compiling my <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/branch/main/nix">peaksc.c</a> file which then includes all my other .c files). Rather, it&#8217;s because I have <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/build.sh#L97-L129">code generation</a> for my utility executables. This code generation handles things like <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/generators/xxd.c">turning text and data files into C-style byte arrays</a>, or <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/generators/data_structures.c">generating type-specialized data structures</a>. 
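</p><p>To give a feel for the xxd-style generation, here is a rough sketch (hypothetical - the function name and output format are illustrative, not the repository&#8217;s actual generator): it reads a file and emits a C byte-array definition that can be compiled into the library.</p>

```c
#include <stdio.h>

/* Hypothetical sketch of an xxd-style generator: read a file and print a
 * C byte-array definition (plus a length variable) to the given stream.
 * Returns 0 on success, -1 if the input file can't be opened. */
int emit_byte_array(const char *path, const char *array_name, FILE *out) {
    FILE *in = fopen(path, "rb");
    if (!in) {
        return -1;
    }

    fprintf(out, "unsigned char %s[] = {", array_name);
    int c;
    unsigned long n = 0;
    while ((c = fgetc(in)) != EOF) {
        /* Start a new line every 12 bytes to keep the output readable. */
        fprintf(out, "%s0x%02x,", (n++ % 12 == 0) ? "\n    " : " ", c);
    }
    fprintf(out, "\n};\nunsigned long %s_len = %luUL;\n", array_name, n);

    fclose(in);
    return 0;
}
```

<p>Emitting plain C arrays like this keeps embedded data portable across toolchains, since the output is just ordinary source code. </p><p>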
These utilities then have their outputs used for further steps, like running additional tests or even as a prerequisite for the other utilities.</p><p>What makes this more complex is that not every one of my tool chains can reliably run these executables - or at least not as part of a &#8220;build&#8221; step. <a href="https://emscripten.org/docs/porting/files/file_systems_overview.html">Emscripten has a lot of restrictions where it can&#8217;t directly access the disk but uses a virtual file system</a>, at least with default build settings (due to how WASM works and the need to &#8220;expose&#8221; disk access through a JS API). This means that generating code files doesn&#8217;t really work since changes would be lost.</p><p>I also run into issues with Wine and MinGW since MinGW and Wine don&#8217;t like being loaded as part of the same nix-shell (from what I can tell it has something to do with the &#8220;<a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/nix/mingw-shell.nix#L3">crossSystem</a>&#8221; config in my nix-shell file).</p><p>So, this means that my build.sh file needs a way to skip running the utilities. However, I still want to compile them to make sure that I didn&#8217;t break anything there (like use a linux-only API in a windows executable) - so I still need a way to get the generated files.</p><p>On my local machine, I get those generated files by running a non-emulated version of my generators <em>before</em> I run anything that can&#8217;t generate those files. That way they live on-disk already and I can just reuse them.</p><p>Overall, this system works great - although it&#8217;s really slow, especially as configurations have grown a lot over time. 
It turns out running 9 different emulation layers (some for different versions of Windows, others for different CPU architectures) and 12 non-emulated toolchains (including different sanitizers, older versions of compilers, and some esoteric compilers) just takes time. I am glad that I&#8217;m using C for doing this crazy experiment - trying to run all of these setups in C++ or Rust would be painfully slow!</p><p>The other benefit is that running just one tool-chain was really easy to do: just run &#8220;nix-shell nix/&lt;toolchain&gt;-shell.nix&#8221; and suddenly you had a reliable, reproducible run on that tool-chain - even if it needed emulation!</p><h2>Part 2: Plan vs Reality</h2><p>My initial plan was actually really simple. I have NixOS installed. I can simply set up a Forgejo runner with my host NixOS (I know it&#8217;s not &#8220;secure&#8221; - but this is self-hosted for me with no one else able to contribute or trigger or log in), and then I would just run my &#8220;nix-build.sh&#8221; script and be done. Simple, right?</p><p>Well, it would be, if there weren&#8217;t safety mechanisms built in where the runner actually ran in a virtual Nix environment and didn&#8217;t have access to the nix-shell command - which is what my entire local build system was built around. Argh.</p><p>Okay, new plan. I had two options:</p><ul><li><p>Rebuild my entire build system so that I have all the tools globally available and I just run those specific tools</p></li><li><p>Figure out how to get nix-shell to work</p></li></ul><p>Option 1 is what I&#8217;ve seen in most enterprise places I&#8217;ve worked at. The build system is set up fundamentally differently than the local environment. And it&#8217;s awful. Things work in the build that don&#8217;t work locally and vice versa. A true pain. Also, I don&#8217;t think it would be possible to do easily given that I have multiple versions of the <em>same</em> tool-chain. 
I don&#8217;t want to figure out how to keep gcc-9 and gcc-12 versions straight. With all of this, Option 1 felt like it wasn&#8217;t a real option - at least not for long-term success. So I crossed it off quickly.</p><p>So, Option 2 it is. How do I get nix-shell to work?</p><p>Well, I tried a lot of things. Exposing programs, installing the &#8220;nix&#8221; package manager directly on the hidden VM, etc. Eventually, what I settled on was to not run on the host machine at all and instead run everything in a NixOS Docker container. That way it was a &#8220;clean&#8221; NixOS, not a nix VM in NixOS. Of course, the base nix container doesn&#8217;t quite have everything installed that Forgejo needs. Forgejo is built for NodeJS devs, and so they assume NodeJS is installed when they do a git checkout. This meant I had to update my run steps to <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/.forgejo/workflows/change.yml#L6">install NodeJS before I did a checkout</a>. Also, Docker runs in headless mode by default, but Wine (and winecfg) don&#8217;t. Which caused some interesting issues. Fortunately, there&#8217;s <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/nix/wine-win11-shell.nix#L17">xvfb-run</a> which will stub out the GUI stuff so I can effectively run GUI apps in headless mode. But after all that, it worked!</p><p>Except, it was slow.</p><p>The hardware I&#8217;m using for my server isn&#8217;t <em>bad</em>, but it is <em>older</em> and energy <em>efficient</em> - so it&#8217;s not all that quick. Plus, now I was running inside a container, and the container was acting - funny. I hadn&#8217;t quite nailed it down yet (I would soon). Plus, everything was running <em>in serial</em>, when it didn&#8217;t really have to.</p><h2>Part 3: Optimizing</h2><p>Fortunately, I&#8217;ve had to fix a lot of build systems at work. 
So I had a really good idea of what was going on, and how to fix it.</p><p>The first issue was everything was serial - which was needed for the dependencies between generating files. Well, that&#8217;s okay. First step is to simply break it from one command to a series of jobs to run each nix-shell, and then I can reorder, parallelize, or remove really slow bits that I don&#8217;t care about. But once I get it into jobs, it&#8217;ll be easier to work with.</p><p>So I did, and then very quickly realized what that funky behavior I&#8217;d been seeing was. Every job that couldn&#8217;t generate code was now failing. And this was because every job was getting a <em>new</em> Docker container. It was never reusing a container - even between jobs. This was different from what I&#8217;m used to, where a clean step was mandatory since the same git repo would be reused between runs.</p><p>Okay, so the quick patch is to just run a job that can generate files in every step. No sweat.</p><p>Once that was done, I started parallelizing (which just meant spinning up new runners).</p><p>But, the fact I was regenerating files everywhere bugged me. I only really needed to generate once and download everywhere it was needed. So, I took a look at Forgejo artifacts, realized it was easy to upload and then download output between steps, and then proceeded to add <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/.forgejo/workflows/change.yml#L9-L14">uploading of generated sources</a> followed by <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/.forgejo/workflows/change.yml#L129-L131">downloading of those sources</a> everywhere it was needed. I then had a parallel build system with only a few sections in serial.</p><p>I then noticed something interesting: Forgejo runs in the order declared whenever it can. So I did one more trick: I put the blocking steps <em>first</em>. 
That way the blocked steps would be unblocked as quickly as possible.</p><p>And with that, I had my build system!</p><h3>Where I&#8217;m headed</h3><p>So far, I only have Linux and Windows support. And right now, Windows builds are only automated through MinGW - I still have to do MSVC builds manually. What I want to do is automate MSVC builds somehow (maybe through a VM or wine or a native windows box, not sure yet).</p><p>Additionally, I want to add support for OSX. There are a few places that will need to be updated, but it shouldn&#8217;t be too hard. I have a Mac laying around somewhere - I just need to get it dusted off and add in the platform-specific wrappers where they&#8217;re needed.</p><p>After that, well, I&#8217;ll probably keep working on the library again. I have a lot of functionality I&#8217;m trying to add as I build my way up to an application. Right now I&#8217;m focused on adding in more testing functionality. I&#8217;ll then want to add a network stack and a window management system - which will bring oh so many levels of &#8220;fun&#8221; when trying to merge it with the current build pipeline.</p><p>At some point I&#8217;ll start adding support for more platforms. Arduino is pretty high on my list (I already have Raspberry Pi), as well as probably another micro-controller or single-board computer or two. Android and iOS support would be nice at some point, but I&#8217;m not really big into mobile development so it&#8217;s not a high priority for me yet. FreeBSD support would also be interesting - but again it&#8217;s not very high on my list. I&#8217;ve only used FreeBSD a handful of times, and that&#8217;s always been exclusively in a VM. I know that some work is needed since currently I have a few Linux-specific APIs instead of only generic POSIX APIs (yes, there&#8217;s a difference).</p><p>At some point, I&#8217;ll get more of my other projects moved over. 
So far I&#8217;m liking the nix-shell system, especially now that I&#8217;ve gotten a lot of the tool-chains working.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Technically, I also have a <a href="https://git.matthewtolman.dev/mtolman/peaksc/src/commit/ddd42b1a82dd2b27364d1c1a0a99c664360f740d/CMakeLists.txt">CMake setup</a>. However, I mostly use that for IDE support (like CLion and Visual Studio). I do try to keep the build.sh and CMake systems in sync. However, I chose not to build my nix-shell stuff around CMake since that&#8217;s just a lot more complexity, and honestly bash scripting this stuff isn&#8217;t that hard.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Nix: The wounded siren]]></title><description><![CDATA[A few months ago I switched from Fedora to NixOS as my daily driver for my laptop.]]></description><link>https://matthewtolman.com/p/nix-the-wounded-siren</link><guid isPermaLink="false">https://matthewtolman.com/p/nix-the-wounded-siren</guid><dc:creator><![CDATA[Matt Tolman]]></dc:creator><pubDate>Sun, 14 Dec 2025 04:47:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/597428b1-50b9-4476-a7e7-faed5987f708_1200x830.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few months ago I switched from Fedora to NixOS as my daily driver for my laptop. So far I&#8217;ve liked it enough that I recently replaced Ubuntu on my home server with NixOS as well. It called to me like a siren, and I have been captivated by that call.</p><p>Yet, the call is not perfect. It&#8217;s remarkably imperfect. 
So much so that unlike many &#8220;hype&#8221; technologies, this one struggles to be recommended by those it ensnares - me included.</p><p>This realization didn&#8217;t dawn on me until after a few different conversations I had recently. For the first, I was talking to someone who was switching from Windows to Linux. They asked what I used, and I replied that I use NixOS - but that I don&#8217;t recommend it. I instead recommended Fedora or Mint or one of the other distros. </p><p>Later, I talked to someone who was an experienced Linux user about which distros we were both using. Again, I said I use NixOS - but that I don&#8217;t recommend it.</p><p>Which poses the question - if I love NixOS so much that I&#8217;m switching my daily driver and my servers to it, why am I not recommending it?</p><h2>The Software &#8220;Problem&#8221;</h2><p>Nix is built around a solution to a problem that you may - or may not - have. The entire philosophy around how everything works is built around this solution. And that brings a lot of headaches. 
And that problem is how software is installed and managed on your machine.</p><p>This problem ranges from simply installing different versions of the same software, to installing software with many shared - and sometimes conflicting - libraries. There are lots of solutions to this problem - package managers, Homebrew, installers, Flatpaks, etc. However, Nix takes the problem deeper than simply &#8220;installing&#8221; or &#8220;isolating&#8221; software.</p><p>Nix strives to make software installation reproducible, declarative, and reliable.</p><p>In other words, you should be able to say &#8220;I want this software with these settings&#8221; and it should just happen. You can even say &#8220;for this project I want this <em>version</em> of this software with these settings, and for this other project I want a <em>different</em> version and different settings&#8221; and Nix will do it.</p><p>This is a very specific problem, and one many people have not had - mostly because people don&#8217;t care about the version or settings, so long as the software works.</p><p>However, I <em>do</em> care about the version and settings.</p><h3>Legacy Code, Interpreters, and Compilers</h3><p>One of the big things I&#8217;ve done throughout my career - at least so far - is work on legacy systems. These are systems with not just old code, but old versions of everything: outdated build systems, language versions, interpreters, runtimes, native extensions, etc.</p><p>Often, the new version brings a series of major breaking changes - or it simply doesn&#8217;t exist. The language extension the whole system was built around simply stopped being maintained 6 years ago, and now the new language runtime made some internal change, so the extension won&#8217;t work.</p><p>When this happens, a rewrite or refactor or new system is usually started with the latest software - but it too quickly becomes legacy.
Rinse and repeat a few times, and soon there&#8217;s a mess of legacy systems with very different version dependencies and incompatibilities all somehow working together to get a job done.</p><p>And this is the type of situation that I don&#8217;t just work in, but thrive in.</p><p>These old systems are a canvas of opportunities. They also tend to be what makes the business the most money - way more than that shiny new microservice someone just built. That means these services are important to care for, extend, maintain, grow, and develop.</p><p>But we very quickly run into a problem. Getting just one of these systems running on a development machine is difficult. Getting two is excruciating. Three is almost impossible. Very quickly, package systems start fighting.</p><p>I&#8217;ve had Homebrew completely break my PHP runtime because it updated my NodeJS install, which bumped some shared library and deleted the old version. I&#8217;ve had NVM install x86 libraries on an M2 Mac, which then prevented Homebrew from installing an ARM version of Ruby since it tried to link against those x86 libraries. I&#8217;ve had Node 18 be unable to build a Node 14 project, and Node 20 be unable to run <code>npm install</code> for a Node 18 project. I&#8217;ve seen apt-get delete old versions of a compiler, Homebrew installs get unlinked by the OS, nvm lose its mind, asdf get removed from the path by Homebrew, snap completely stop working, Flatpak break auto-updates, the dotnet CLI completely brick itself trying to update, and so much more. Not to mention managing database versions, port conflicts, VPNs, Xcode developer tools, the MSVC compiler toolchain, Windows vs. Linux vs. macOS incompatibilities, etc.</p><p>And then there&#8217;s the whole saga of &#8220;breaking changes&#8221; in programming languages. I&#8217;ve gone through the JVM 8 to 9 to 11 trek, the PHP 5 to 7 to 8 landslide, and the Python 2 to 3 transition.
And then there&#8217;s TypeScript, which broke so many things when it introduced the &#8220;unknown&#8221; type.</p><p>And things just continue for my side projects. I love learning what&#8217;s new in languages. But when some languages have multiple implementations (e.g. C++), this quickly becomes an issue - especially when I try sharing my code with someone and they have a different compiler <em>version</em> with different <em>syntax rules</em>.</p><p>All this to say, I very much <em>do</em> care about what versions and settings I have. Having an easy way to manage a version is the difference between a productive day and a week spent rebuilding my developer machine.</p><p>And I&#8217;ve tried a lot of options. Vagrant, Docker, cloud workspaces, separate machines for different configurations, etc. None of them seemed to stick.</p><h2>The Nix &#8220;Solution&#8221;</h2><p>And then along came Nix, promising to fix all of my problems.</p><p>Well, it did a really good job addressing that one problem - and then gave me quite a few new ones (more on that in a bit). But it really did solve my main problem: installing things intelligently and in an isolated way without breaking other things.</p><p>At first, how it does this looks similar to Docker or Vagrant or cloud workspaces. There&#8217;s a file in a custom DSL that describes how to build a machine, and then some program uses that file to build the machine. The difference, though, is that this isn&#8217;t running in some container or virtualization layer (which always brings headaches when trying to interface with said layer). Instead, it&#8217;s running natively on your machine. So much so that in my case, it <em>is</em> my machine.</p><p>What&#8217;s even better is that it archives each configuration so that you can roll back to a previous version. This is awesome when I&#8217;m trying to &#8220;improve&#8221; my Linux distribution.
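</p><p>To give a feel for that DSL, here&#8217;s a rough sketch of a system configuration (a hypothetical fragment - these option names come from the NixOS manual, but treat the exact values as illustrative):</p><pre><code>{ config, pkgs, ... }:
{
  # System-wide settings live in one declarative file
  time.timeZone = "America/Denver";
  networking.firewall.allowedTCPPorts = [ 80 443 ];

  # Programs installed "globally"
  environment.systemPackages = with pkgs; [ git vim ];
}</code></pre><p>Rebuilding the system from that file - and keeping the old builds around - is what makes the archive-and-revert behavior possible.</p><p>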
If I totally screw something up (like installing a new login window and forgetting to disable the old one), then I can revert to the previous install that wasn&#8217;t broken.</p><p>This ability to &#8220;revert&#8221; extends into user space as well. We can create a temporary &#8220;machine&#8221; with a Nix shell, and that machine can use different compiler versions, add new programs, or even drop programs we have installed &#8220;globally.&#8221; Then when we&#8217;re done, we can exit the shell, thereby &#8220;reverting&#8221; to our machine&#8217;s global configuration.</p><p>And, since Nix is configuration-driven, we can write config files describing those temporary machines and add them to our source control. This lets us share those temporary machines with other Nix users so they can run them locally. And again, with NixOS it&#8217;s not a VM - it&#8217;s their actual machine. With full access to their file system, with full access to the GUI, hardware, etc. No SSH, no remote debugging, no slowness from a VM hogging all the memory/CPU. It&#8217;s just a temporary &#8220;update&#8221; for their machine.</p><p>And all of these shells are <em>isolated</em>. So you could have two, or three, or four &#8220;versions&#8221; of your machine running simultaneously, all reading the same data on your system.</p><p>We can then take it a step further and build VMs with Nix for things we couldn&#8217;t do natively on our machine - like testing x86, ARM, RISC-V, PowerPC, and MIPS versions of our code all at once - <a href="https://gitlab.com/mtolman/peaksc/-/tree/main/nix?ref_type=heads">something I recently did for one of my projects</a>.</p><p>This is where I fell in love with Nix.
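</p><p>As a concrete sketch, the checked-in file for one of those temporary machines might look something like this (hypothetical - <code>mkShell</code> and the <code>nodejs_18</code> attribute exist in recent nixpkgs, though attribute names do shift between releases):</p><pre><code>{ pkgs ? import &lt;nixpkgs&gt; {} }:
pkgs.mkShell {
  # Tools that exist only inside this shell
  packages = [
    pkgs.nodejs_18  # a specific Node major version
    pkgs.php
  ];
}</code></pre><p>Anyone with Nix can run <code>nix-shell</code> in that directory to get exactly those tools, then exit the shell to get their normal machine back.</p><p>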
It gave me something that solves the problem I have - the problem that wastes so much of my time.</p><p>Of course, no lunch comes for free, and Nix has a lot of drawbacks.</p><p>Again, most people don&#8217;t have the problem I have. They just want to install the default option and have things work. They don&#8217;t care about comparing options, or having the ability to switch versions, or anything like that. They&#8217;re usually not installing a lot of things anyway. So, does Nix serve these types of people?</p><p>No, not really.</p><p>One of Nix&#8217;s biggest drawbacks is that <em>everything is driven by a global config file</em>. Your installed programs, user list, firewall, hosts file, timezone, printer discovery settings, etc. are all determined by <code>/etc/nixos/configuration.nix</code>. Changing this file requires root access. Once the file is changed, it needs to be applied with a special command, <code>nixos-rebuild switch</code> (which also requires root access). And this file is in a unique &#8220;Nix&#8221; configuration language. It&#8217;s not YAML, or JSON, or Lua, or TOML, or C, or JavaScript, or Bash - it&#8217;s Nix, and nothing else. That makes for a steep learning curve just to install a new program.</p><p>Not only that, but not everything works out of the box. It turns out, every time I installed NixOS, Bluetooth was <em>disabled by default!</em> And when I installed it on my laptop, I didn&#8217;t have proper WiFi drivers since I had to download and install a special hardware configuration for my laptop first (luckily I had an Ethernet adapter which worked, but this was terrifying for a bit!).</p><p>And then there&#8217;s a &#8220;fun&#8221; little quirk where you can&#8217;t just run executables built for generic Linux. I tried doing that, and I got a little pop-up saying that NixOS can&#8217;t open executables made for generic Linux. So, things need to be specially packaged.
This makes getting printer drivers installed a real pain (way more painful than on other Linux distros).</p><p>Plus, there are just some general annoyances that get in the way. Nix isn&#8217;t necessarily an immutable distro, but it does make a lot of standard configuration files immutable (like your bash/zsh configuration file, hosts file, etc.).</p><p>Also, Nix documentation is horrendously fragmented, and the parts you do find often come across as some random person&#8217;s notes they wrote on the back of a napkin before wiping ketchup all over it and throwing it in the trash. There are definitely signs of a &#8220;eureka&#8221; moment someone had, but there&#8217;s no context at all as to where it should go in the configuration file, or what dependencies it needs, or really what anything means - and the page is really old and most likely will never be updated.</p><p>And then there are the little things that are just broken in my day-to-day life. I&#8217;ve become so used to them and developed so many workarounds by now that I&#8217;d only ever notice them if someone sat down, watched me use my computer, and immediately pointed out everything that doesn&#8217;t work well. I know that stuff is there, but the many versions of each random compiler and interpreter I use work so much more reliably that I just don&#8217;t notice the stuff that&#8217;s broken anymore.</p><p>These pain points, the wounds of Nix, limit the reach of its call. The tune is filled with both promise and despair. A beautiful melody is maimed by an ugly harmony. At some point, perhaps the siren will no longer be encumbered by its shortcomings. And on that day, perchance those it has enraptured will be able to joyfully sing its praise. But until then, its tune charms only those who can focus on the benefits and become blind to the defects.</p><p>P.S.
For those who do want to try out Nix, I don&#8217;t recommend starting with Nix as a package manager/shell on another distro (e.g. Debian). I know it&#8217;s an option, but it really sucks: at that point it&#8217;s just another opinionated package manager conflicting with everything else you&#8217;re doing. It was an awful experience that almost made me give up on Nix as just another gimmicky package manager. A lot of the shortcomings of the standalone Nix shell are addressed in NixOS.</p>]]></content:encoded></item></channel></rss>