On Readers and Writers

Written by Matt Tolman
Published: Jun. 30, 2025
Estimated reading time: 15 min

Recently I've been "Zigifying" my C++ code. As part of this, I've been moving away from C++'s streams to custom readers and writers. Honestly, I like it a lot better. Not only is it a lot simpler, but it's also a lot easier to compose. I've also diverged from Zig's readers and writers, which are limited to bytes, by templatizing mine so I can do more advanced things, like automatic encoding.

C++ Streams

C++ streams allow piping text in and out of some medium (such as printing to the screen, reading from a file, etc.).

Streams also handle formatting data automatically with the << operator. There are also format controls like std::hex which can change how something is formatted.

In theory, it sounds great. However, I don't actually use streams for a lot of reasons.

One of the main ones is that stream formatting is not very intuitive to me. I just prefer format strings, like printf and the new std::format.

Also, I really don't like having << split up my output so often. Not only is it annoying to type, it makes things harder to read since there's a lot more noise and the output gets broken into more pieces.
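
To make that concrete, here's the same line of output written with streams and then with std::format. This is plain standard C++ (C++20 for std::format), nothing from my library yet:

#include <format>
#include <iostream>

int main() {
  int id = 255;
  double ratio = 0.75;

  // Stream version: the output is chopped up by << and sticky manipulators
  std::cout << "id=" << std::hex << id << std::dec << " ratio=" << ratio << '\n';

  // Format-string version: the whole message reads in one piece
  std::cout << std::format("id={:x} ratio={}\n", id, ratio);
}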

Thinking "how hard could it be to write my own replacement," I decided to write my own reader/writer/formatting system. For this, I drew a lot of inspiration from Zig.

Zig's Readers and Writers

Zig has a concept of a reader which reads bytes from somewhere, and a writer which writes bytes to somewhere.

It's pretty simple to define a new reader or writer. Basically, for readers we define a read function that puts data into an output parameter and returns the number of bytes put into that parameter (or an error). For writers, we define a function that takes bytes, "writes" them somewhere, and then returns the number of bytes written. Easy.

This simplicity is also powerful. Not only is it simple to make new destinations and sources (just one function each), but we can have writers write to other writers or have readers read from other readers.

Zig does have a limit that writers and readers only work with bytes. But, ours doesn't have to have that limit. So, let's go further.

Custom Readers

My readers are part of my mtcore library, so I'll be using quite a few types I defined there. The main ones are Slice (which is pretty much a pointer and length) and Result (which is my errors-as-values pattern).

I also use some concepts for trait checking like WriterImpl and ReaderImpl. But, those aren't really necessary to understand the code and can just be replaced with typename.

Let's get to our custom readers. Here's my code:


template<ReaderImpl Impl>
struct Reader {
  // Type of elements we're reading (e.g. char, u8, wchar_t)
  using ReadElem = typename Impl::ReadElem;
  // Type of error we're dealing with
  using ErrType  = typename Impl::ErrType;

  // Slice used for buffer parameter (what we write to) 
  using BuffSlice = Slice<std::remove_const_t<ReadElem>>;

  // Slice used for result (we return a subslice of buffer)
  using ResSlice  = Slice<std::add_const_t<ReadElem>>;

  // Underlying iterator 
  Impl underlying;

  // Basic read-into-buffer function
  // Will read as much as possible
  Result<ResSlice, ErrType> read(BuffSlice buff) {
    // read is the only method that we need to define
    return underlying.read(buff);
  }
  
  // Try to read the entire sequence into a buffer
  // Will fail if it doesn't all fit
  Result<ResSlice, ErrType> read_all(BuffSlice buff) {
    auto res = read(buff);
    if (res.is_error()) {
      return res.error();
    }
    auto resSlice = res.value();
    if (resSlice.size() < buff.size()) {
      return resSlice;
    }

    // If there's another element not read, then we didn't read everything
    if (auto ro = read_one(); ro.is_success() || ro.error().code != ErrType::END_OF_FILE) {
      return error(ErrType::SIZE_EXCEEDED);
    }
    return resSlice;
  }

  // ... rest of functions omitted for brevity
  // see https://mtcore.matthewtolman.dev/reader_8hpp_source.html
};

Our Reader class takes a type for an underlying reader implementation. All that implementation needs to define is a read method which takes a slice it can write to and returns a subslice with the read values (plus a few type declarations). With this, the reader can read anything, whether it's bytes, runes, numbers, etc.

We then take that simple implementation and add standardized methods to extend behavior, such as adding a "read all" method. We can add any extensions we need or want, so long as they can all be written using only the underlying read method.
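
For instance, the read_one method used by read_all above could, at least in principle, be layered purely on top of read. This is a sketch of the idea rather than the actual mtcore code (it also assumes a Slice can be built from a pointer and a length):

  // Sketch only (the real mtcore version may differ): a read_one
  // extension built entirely on the underlying read method
  Result<std::remove_const_t<ReadElem>, ErrType> read_one() {
    std::remove_const_t<ReadElem> single{};
    // Assumes a Slice can be built from a pointer and a length
    auto res = read(BuffSlice{&single, 1});
    if (res.is_error()) {
      return res.error();
    }
    if (res.value().size() == 0) {
      // Nothing left to read
      return error(ErrType::END_OF_FILE);
    }
    return success(single);
  }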

Here's a sample implementation of a reader:


namespace impl {
template<typename T>
struct SliceReaderImpl {
  // Slice we're reading from
  Slice<T> buff;

  // State variables
  size_t curReadIndex = 0;
  bool first          = true;

  // Type declarations needed for Reader
  using ReadElem = T;
  using ErrType  = SliceReaderError;

  using BuffSlice = Slice<std::remove_const_t<ReadElem>>;
  using ResSlice  = Slice<std::add_const_t<ReadElem>>;

  // Our actual read function
  Result<ResSlice, ErrType> read(BuffSlice out) {
    // If we've returned something and we hit the end,
    // return an END_OF_FILE error
    // This does mean if we don't have anything to read,
    // then we'll return one empty slice before returning an EOF
    if (!first && curReadIndex >= buff.size()) {
      return error(ErrType::END_OF_FILE);
    }

    // Track that we've returned something
    first        = false;
    size_t count = 0;

    // Copy from input to output
    for (; curReadIndex < buff.size() && count < out.size(); ++curReadIndex, ++count) {
      out[count] = buff[curReadIndex];
    }

    // Return our substring as a success result
    return success(out.sub(0, count).to_const());
  }
};
}  // namespace impl

// Convenience function to make a slice reader from a slice
template<typename T>
Reader<impl::SliceReaderImpl<T>> slice_reader(Slice<T> buff) {
  return Reader<impl::SliceReaderImpl<T>>{.underlying = impl::SliceReaderImpl<T>{.buff = buff}};
}

Our slice implementation is also templatized, meaning we can create a slice reader over a slice of numbers, characters, lists, or even other readers!
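
Here's a quick usage sketch. mut_slice_from and ensure are the same mtcore helpers that show up in the writer example later in this post; treat the exact calls as illustrative:

// Source data and a reader over it
auto src = std::array<int, 4>{1, 2, 3, 4};
auto reader = mtcore::slice_reader(mtcore::mut_slice_from(src));

// Buffer to read into (bigger than the source)
auto dest = std::array<int, 8>{};
auto res = reader.read_all(mtcore::mut_slice_from(dest));

// Everything fit, so read_all succeeds with a slice of all 4 values
ensure(res.is_success() && res.value().size() == 4);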

One of my favorite possibilities is having a reader that takes another reader and parses the data. This would work well for streamed data, like CSV or JSONL, where the reader reads CSV values from an underlying disk or network reader.
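
To show the shape of that composition, here's a toy sketch of a reader implementation that wraps another reader and upper-cases the chars passing through it (it would get wrapped in Reader just like the slice reader above). A CSV or JSONL reader would have the same structure, just with real parsing in the middle. This is illustrative, not actual mtcore code:

namespace impl {
template<ReaderImpl RI>
struct UppercaseReaderImpl {
  // Reader we pull raw characters from
  Reader<RI> &underlying;

  // Interface declarations mirror the slice reader
  using ReadElem = typename Reader<RI>::ReadElem;
  using ErrType  = typename Reader<RI>::ErrType;

  using BuffSlice = Slice<std::remove_const_t<ReadElem>>;
  using ResSlice  = Slice<std::add_const_t<ReadElem>>;

  Result<ResSlice, ErrType> read(BuffSlice out) {
    // Pull data from the wrapped reader first
    auto res = underlying.read(out);
    if (res.is_error()) {
      return res.error();
    }
    // Transform in place through the mutable output buffer (needs <cctype>)
    for (size_t i = 0; i < res.value().size(); ++i) {
      out[i] = static_cast<std::remove_const_t<ReadElem>>(
          std::toupper(static_cast<unsigned char>(out[i])));
    }
    return res;
  }
};
}  // namespace impl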

Of course, we're not done with just readers. We also have writers to make too.

Custom Writers

Here is part of my writer implementation class:


template<WriterImpl Impl>
struct Writer {
  // Declarations for types being written/errors
  using WriteElem = typename Impl::WriteElem;
  using ErrType   = typename Impl::ErrType;

  // Underlying writer
  Impl underlying;

  // Partial writes for mutable slices
  // Will return how much was written
  Result<size_t, ErrType> write(const Slice<std::remove_const_t<WriteElem>> &s) { return write(s.to_const()); }
  
  // Partial writes for immutable slices
  // Will return how much was written
  Result<size_t, ErrType> write(const Slice<std::add_const_t<WriteElem>> &s) { return underlying.write(s); }

  // Will return an error if not all of the slice was written
  Result<size_t, ErrType> write_all(const Slice<std::add_const_t<WriteElem>> &s) {
    Result<size_t, ErrType> r = underlying.write(s);
    if (r.is_error()) {
      return r.error();
    }
    // if we're dealing with a write-through writer, we have to rely on the final check in the stack
    // as there will be transformations, so we can't check for out of room here
    if constexpr (!is_write_through<Impl>) {
      if (r.value() != s.size()) {
        return error(ErrType::OUT_OF_ROOM);
      }
    }
    return success(r.value());
  }

  // ... Rest of functions omitted for brevity
  // see https://mtcore.matthewtolman.dev/io_2writer_8hpp_source.html
};

Our writer follows a very similar pattern to our reader. Only one function needs to be defined, and the types used are abstracted away. We also maintain the ability to add additional methods as needed.
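
For example, the write_n_times method that shows up near the end of this post could, at least in principle, be built entirely out of write_all. This is a sketch of the idea rather than the actual mtcore code:

  // Sketch only (the real mtcore version may differ): repeat-write an
  // element n times using only write_all
  Result<size_t, ErrType> write_n_times(std::remove_const_t<WriteElem> elem, size_t n) {
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
      // Assumes a Slice can be built from a pointer and a length
      auto r = write_all(Slice<std::add_const_t<WriteElem>>{&elem, 1});
      if (r.is_error()) {
        return r.error();
      }
      written += r.value();
    }
    return success(written);
  }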

Here is our slice writer:


enum class SliceWriteError {
  OUT_OF_ROOM,  // necessary for writers
};

namespace impl {
  template<typename T>
  struct SliceWriterImpl {
    // Make sure our slice isn't immutable
    static_assert(!std::is_const_v<T>, "Cannot write to const pointer!");

    // Slice we're writing to
    Slice<T> out;

    // current state
    size_t curWriteIndex = 0;

    // Declarations for writer interface
    using WriteElem = T;
    using ErrType   = SliceWriteError;

    // Actual write method
    Result<size_t, ErrType> write(Slice<std::add_const_t<T>> bytes) {
      size_t i = 0;
      // Simply copy bytes to our output buffer
      for (; i < bytes.size() && curWriteIndex < out.size(); ++i, ++curWriteIndex) {
        out[curWriteIndex] = bytes[i];
      }

      // Return how many bytes we wrote
      return success(i);
    }
  };
}  // namespace impl

// Convenience function 
template<typename T>
Writer<impl::SliceWriterImpl<T>> slice_writer(Slice<T> out) {
  return Writer<impl::SliceWriterImpl<T>>{.underlying = impl::SliceWriterImpl<T>{.out = out}};
}
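
A quick usage sketch (again, mut_slice_from and ensure are the mtcore helpers used elsewhere in this post, and the exact calls are illustrative):

// Destination buffer and a writer over it
auto buff = std::array<char, 32>{};
auto writer = mtcore::slice_writer(mtcore::mut_slice_from(buff));

// Data to write
auto msg = std::array<char, 3>{'h', 'i', '!'};
auto res = writer.write_all(mtcore::mut_slice_from(msg).to_const());

// All three characters fit in the buffer
ensure(res.is_success() && res.value() == 3);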

Pretty straightforward. Now let's get into the fun stuff.

Writers that Map Data

What if we want to transform the data before writing by say, converting UTF-16 to UTF-8, or formatting integers before we write them? Well, let's make a generic "mapper" writer.


// Templates to handle declaring the type we write,
// declaring the underlying writer type,
// and extracting the error and lambda types
namespace impl {
template<
  typename InType,
  WriterImpl WI,
  typename Err = typename Writer<WI>::ErrType,
  typename FuncType = std::function<
    Result<size_t, Err>(Writer<WI> &, Slice<std::add_const_t<InType>>)
  >>
struct WriteTransformer {
  // meant for trait detection that this is an intermediate writer
  static constexpr bool IsWriteThrough = true;

  // interface declarations
  using WriteElem                      = InType;
  using ErrType                        = Err;

  // Underlying writer that we'll write to
  Writer<WI> &underlying;

  // Function we'll use to map data
  FuncType mapper;
  
  /**
    * Writes elements to the writer
    * (will be transformed before being passed through)
    * @param elems Elements to write
    */
  Result<size_t, ErrType> write(Slice<std::add_const_t<InType>> elems) {
    // Our mapper handles writing things directly
    // Allows writer to buffer or stream as desired
    // Also makes it easier to do zero-allocation methods
    return mapper(underlying, elems);
  }
};
}  // namespace impl

// template soup. Only T needs to be passed in
template<typename T,
        WriterImpl WI,
        typename Err = typename Writer<WI>::ErrType,
        typename FuncType = std::function<
          Result<size_t, Err>(Writer<WI> &, Slice<std::add_const_t<T>>)
        >>
auto write_transformer(Writer<WI> &writer, FuncType mapper)
 -> Writer<impl::WriteTransformer<T, WI, Err, FuncType>> {
  // Make our writer
  return Writer<impl::WriteTransformer<T, WI, Err, FuncType>>{
    writer, mapper
  };
}

We can use the above writer as follows:


// Writer for final data
auto outBuff = std::array<char, 1000>{0};
auto outSlice = mtcore::mut_slice_from(outBuff);
auto sw = mtcore::slice_writer(outSlice);

// Writer we'll be actually writing to
auto writer = mtcore::write_transformer<int>(
  sw,
  // we have a mutable lambda variable for proper spacing
  [first=true](
    decltype(sw)& w, // Underlying slice writer we write to
    mtcore::Slice<const int> i // we get a slice of data to map
  ) mutable
    // declare we're returning a result object
    -> mtcore::Result<size_t, typename decltype(sw)::ErrType>
  {
    // Iterate our slice and write the transformed data
    auto iter = i.iter();
    int cur;
    size_t written = 0;
    while (iter.next().copy_if_present(cur)) {
      // We're writing our numbers as hex with
      // spaces between them
      const auto fmtStr = first ? "{x}" : " {x}";
      first = false;

      // Write our numbers
      auto r = mtcore::print(w, fmtStr, cur);

      // handle errors
      if (r.is_error()) {
        return r.error();
      }
      else {
        // increment how much was written
        written += r.value();
      }
    }
    
    // Return how much was written
    return written;
  }
);

// Write some integer a bunch
ensure(writer.write_n_times(1493534, 14).is_success());

And, just like that, we've made a writer that takes integers and transforms them into strings.

We don't have to be limited to just simple transformations though. We could create more complex writers which do CSV encodings, or JSON, or whatever.

Further Extensions

Of course, we could add "optional" methods that we hook into if present, and fall back to a default (e.g. a no-op) if not. One example would be adding a "flush" method to our writer wrapper which calls the underlying implementation's "flush" method if it exists, and does nothing otherwise. This lets us have buffered writers that we can periodically flush without requiring every writer to implement a flush method.

We could also use static asserts and requires checks to disable optional methods on the writers if preconditions aren't met (e.g. only allow "resetting" a writer if the underlying writer can be reset). I'm still experimenting with this idea, so maybe I'll go into more detail later once I've had a better chance to flesh it out.
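
In the meantime, here's a rough sketch of the flush example using a C++20 requires-expression. This is illustrative, not actual mtcore code:

  // Sketch only: an optional flush() on the Writer wrapper. If the
  // underlying implementation provides flush(), forward to it;
  // otherwise this is a no-op.
  void flush() {
    if constexpr (requires(Impl &i) { i.flush(); }) {
      underlying.flush();
    }
  }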

Formatting

I didn't just stop at writing bytes into a buffer. I also started working on custom formatters to allow formatting non-string types (like integers, floating points, etc.).

For formatting, I have a templatized struct that can be specialized to define a format for a new type. I went this route so I can have format strings.
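
The struct itself isn't shown in this post, but the shape I'm describing is roughly a specialization per type, something like this purely hypothetical sketch (not mtcore's real interface, and it assumes a Slice can be built from a pointer and a length):

// Hypothetical sketch of the pattern: a templatized Formatter struct
// specialized per type, receiving the writer, the options pulled from
// the format string, and the value to format
template<typename T>
struct Formatter;  // primary template left undefined

template<>
struct Formatter<bool> {
  template<WriterImpl WI>
  static auto fmt(Writer<WI> &writer, Slice<const char> /*options*/, bool value) {
    const char t[] = {'t', 'r', 'u', 'e'};
    const char f[] = {'f', 'a', 'l', 's', 'e'};
    // Assumes a Slice can be built from a pointer and a length
    return value ? writer.write_all(Slice<const char>{t, 4})
                 : writer.write_all(Slice<const char>{f, 5});
  }
};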

The formatter only worries about formatting single elements, while my print method worries about formatting multiple elements. My print method is used as follows:


auto printRes = print(someWriter, "Default formatting: {}\nHex Formatting: {x}", 12, 12);

Currently, I support formatting strings (slices of chars), slices (e.g. of ints), optionals, results, IP addresses (v4 and v6), integers, and floating point numbers. I'm adding more types over time, but so far it's a good start.

What I do like is that I can use it to easily format into an in-memory buffer as well as into C++ streams. At some point I'll add file support and a quick way to get readers/writers for stdin/stdout/stderr.

Performance

I haven't done any serious optimization yet. I've used "fast" algorithms for some things, like floating point formatting, but I haven't optimized my implementations of those algorithms. This makes a big difference when it comes to formatting vs writing. For instance, my unoptimized integer formatting code definitely isn't going to beat the standard library's heavily optimized integer formatting simply because I haven't put in the effort to make it fast. Whereas my writers copying raw bytes can be faster than std::stringstream simply because they're much simpler.

Because I'm comparing unoptimized code against optimized code, measuring performance isn't terribly useful. Sure, my implementation may be slower in some cases, but that doesn't mean the core idea has to be slower; I just haven't put in the time to make it not-slow yet. Still, even if the comparison isn't very useful, we can make it. I've done a few performance checks, and these are my current results in a release build:

Test                        | std::stringstream | mtcore::writer
Writing 8192 characters     | 0.022ms           | 0.002ms
Writing 8192 integers       | 0.086ms           | 0.111ms
Writing 8192 integers (hex) | 0.085ms           | 0.221ms
Writing 8192 doubles        | 0.754ms           | 0.278ms
The 10x speedup when simply writing raw characters is huge! I wanted to prove to myself that I could get within spitting distance of the standard library with this idea, and I did!

And yeah, my integer formatting sucks. But my floating point formatting is faster, so it's not a problem with the idea, but rather with either the algorithm I picked or my implementation of it. Both of those I can fix in isolation from the rest of my writer/formatting code (thanks to the design). Also, while the integer formatting is notably slower, it's not orders of magnitude slower (at least for the non-hex case), so it may just be a matter of fine-tuning.

Wrap Up

Personally, I like the writer/reader idea a lot better than C++'s stream idea, especially since I'm not overloading operators. It also gives me a lot more control over formatting variables, which I really like. I'm currently experimenting with more complex format strings too, like padding (e.g. { <12;}), date formatting (e.g. {YYYY-MM-dd}), etc.

The more advanced formatting I'm playing with goes beyond what Zig does, which has its pros and cons. Zig has a really good unified syntax, so it's really easy to guess how to format a new type, but that makes it harder to get the level of control I'd want in a formatter. My formatters don't have that same uniformity, so using them requires reading more documentation, but there's a lot more control, which lets me get exactly what I want. Pros and cons.

I've been enjoying playing around with readers, writers, and creating my own formatters. It turns out, there's a lot more to formatters than I thought (such as floating point numbers). I'm going to keep experimenting and developing ideas, and I may write a few more posts along the way.