Switching to Erlang
Published:
Recently I rewrote my blog again. This time, I moved from a language I know really well (TypeScript/JavaScript) to one I know much less well (Erlang). I'm familiar enough with Erlang, but I hadn't built anything of this scale or complexity with it (the blog has custom parsers, HTML generators, statistics metrics, etc.).
I chose Erlang for two main reasons:
- Erlang has live deploys, and I wanted to be able to update my blog without taking it offline
- I wanted to learn Erlang more
I think I accomplished both goals. The blog is hooked up to the CI/CD pipeline I already use for my other (smaller) Erlang projects, so I can deploy without taking it offline. I also learned a lot more about Erlang. Here are my learning highlights.
Leex and Yecc are pretty cool
Leex and Yecc are Erlang's built-in parser generation tools. They're similar to Lex and Yacc, but they generate Erlang instead of C. That said, they're still parser generators and come with all the usual downsides, like obtuse error messages and difficult debugging. I got something cobbled together well enough, and it seems to be less buggy than what I had previously thrown together in TypeScript, but it has a very different set of limitations, which have proved annoying to work with.
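For anyone who hasn't seen Leex, here's a minimal sketch of what a lexer definition looks like. It's a toy, not the grammar my blog actually uses: the file name calc_lexer.xrl and the token names are made up for illustration.

```
%% calc_lexer.xrl: hypothetical toy lexer for integers and "+".
Definitions.

D  = [0-9]
WS = [\s\t\r\n]

Rules.

%% Each rule is "Regex : Erlang expression." Leex binds TokenChars
%% (the matched text) and TokenLine (its line number) for each rule.
{D}+  : {token, {integer, TokenLine, list_to_integer(TokenChars)}}.
\+    : {token, {'+', TokenLine}}.
{WS}+ : skip_token.

Erlang code.
```

Running leex:file("calc_lexer.xrl") generates calc_lexer.erl; after compiling that, calc_lexer:string("1 + 2") should return {ok, [{integer,1,1},{'+',1},{integer,1,2}], 1}, a token list you would then hand to a Yecc-generated parser.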
Text processing is rough
Erlang has multiple ways to represent text. It has iolists, which are (possibly nested) lists whose elements are bytes (integers from 0 to 255), binaries, or other iolists. It has Unicode lists, which are lists of Unicode code points. It has binaries. It has Unicode-encoded binaries (UTF-8, UTF-16, UTF-32). It has iodata, which is either an iolist or a binary. It's not great.
The worst part is that there isn't a reliable way to tell whether a list is text or just a list of integers. You just have to know, which is frustrating. It's even more frustrating when things simply don't work for legacy reasons.
For instance, binaries are the newer representation, but some of the old string-processing functions only accept lists (only the newer ones accept binaries). That sucks, but okay, I can convert them. The problem is that my I/O hands me binaries (both my HTTP server and the file functions return them), so I really want to stay with binaries to avoid copies.
However, Unicode throws a wrench into everything. I have Unicode in some of my blog posts (especially the ones about localization). The same legacy functions that don't accept binaries also don't handle lists containing Unicode code points; they only work reliably with ASCII. And converting a UTF-8 binary into a list with the default conversion functions (such as binary_to_list/1) turns every byte into its own list element, so each multi-byte character gets interpreted as several Latin-1 characters. Once the code points are mangled like that, it's really hard to put them back together.
The unicode module has functions to convert binaries into lists of Unicode code points, but a lot of legacy functions still interpret the elements of those lists as Latin-1 (extended ASCII) characters, so weird things can happen when processing them (which I do for parsing and generation). Eventually I figured things out (mostly), but for any processing where I could get away with dropping Unicode characters (like creating summaries or calculating reading time), I did. I only kept Unicode when I knew the only processing was parsing and then concatenation. Anything else (like trimming) I had to treat as "unsafe." This was the worst language-level text handling I've dealt with, even worse than Windows UTF-16 nonsense.
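Here's a small sketch of both headaches as a throwaway module (the module and function names are just for illustration). shapes/0 shows that several shapes of the same text flatten to the same binary and that nothing in a list marks it as text; utf8/0 shows what happens to a non-ASCII character under byte-wise versus Unicode-aware conversion.

```erlang
-module(text_demo).
-export([shapes/0, utf8/0]).

%% The same text as a flat list, a binary, and a nested iolist.
%% iolist_to_binary/1 flattens all of them to the same binary.
shapes() ->
    Flat = iolist_to_binary("hello"),
    FromBin = iolist_to_binary(<<"hello">>),
    Nested = iolist_to_binary(["he", <<"ll">>, $o]),
    SameBinary = (Flat =:= FromBin) andalso (FromBin =:= Nested),
    %% A string literal is just a list of integers, so nothing in the
    %% term itself says "this is text".
    StringIsIntList = ("hi" =:= [104, 105]),
    %% io_lib:printable_list/1 is only a heuristic over those integers.
    LooksPrintable = io_lib:printable_list([104, 105]),
    {SameBinary, StringIsIntList, LooksPrintable}.

%% "héllo" encoded as UTF-8: the é (U+00E9) takes two bytes, 195 and 169.
utf8() ->
    Bin = <<"h", 16#E9/utf8, "llo">>,
    %% binary_to_list/1 makes one element per byte, so é becomes two
    %% separate integers (195, 169), which read as two Latin-1 characters.
    Bytes = binary_to_list(Bin),
    %% unicode:characters_to_list/1 decodes the UTF-8 into code points.
    Chars = unicode:characters_to_list(Bin),
    %% unicode:characters_to_binary/1 re-encodes code points as UTF-8.
    RoundTrips = (unicode:characters_to_binary(Chars) =:= Bin),
    %% byte_size/1 counts bytes; string:length/1 counts grapheme clusters.
    {Bytes, Chars, RoundTrips, byte_size(Bin), string:length(Bin)}.
```

utf8/0 returns {[104,195,169,108,108,111], [104,233,108,108,111], true, 6, 5}: six bytes, but only five characters once the UTF-8 is actually decoded.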
Hot Deploys can be tricky
Erlang uses relup files (built from per-application .appup files) to describe how to upgrade from one version to another. These files are pretty poorly documented, and they have to be written for every sub-module. What's more, the error messages for a failed deploy are abysmal. There are no details in them, just (hopefully) a massive crash dump. I eventually got my deploy figured out, but it took hours because of how much new stuff I had added (database configs, many new sub-modules, etc.).
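For context, the hand-written part of an upgrade is an .appup file per application, which systools:make_relup (or a build tool such as rebar3) combines into the release's relup. A minimal sketch for a hypothetical blog application moving from 1.0.0 to 1.1.0 might look like this; the application, module names, and versions are made up.

```erlang
%% blog.appup: hypothetical application and module names.
%% Format: {NewVsn, [{FromVsn, UpInstrs}], [{ToVsn, DownInstrs}]}.
{"1.1.0",
 [{"1.0.0",
   [{add_module, blog_stats},              % brand-new module
    {load_module, blog_render},            % plain functional change
    {update, blog_server, {advanced, []}}  % gen_server: suspend, then code_change/3
   ]}],
 [{"1.0.0",
   [{update, blog_server, {advanced, []}},
    {load_module, blog_render},
    {delete_module, blog_stats}
   ]}]}.
```

The {update, Mod, {advanced, Extra}} instruction suspends the processes running Mod and calls their code_change/3 with Extra, which is where any state migration has to happen.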
Mnesia is pretty cool
Mnesia is a key-value store built into Erlang/OTP. It has options to set up distribution for in-memory and on-disk copies of data across different nodes in a cluster, which is really cool. It also has its own query system, select (which takes match specifications), that reminds me a bit of Datalog (but is definitely not Datalog). It took a bit to get used to, but once I did I really liked it. Currently, my entire blog runs from Mnesia.
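Here's roughly what a select looks like, as a self-contained sketch. The post record and its fields are hypothetical (my actual schema isn't shown here), and the sketch relies on Mnesia using a RAM-only schema when no disc schema has been created, so nothing is written to disk.

```erlang
-module(mnesia_demo).
-export([published_slugs/0]).

%% Hypothetical record; not the blog's real schema.
-record(post, {slug, title, published}).

%% Starts Mnesia (RAM-only schema, since no disc schema was created),
%% stores two posts, and selects the slugs of the published ones.
published_slugs() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(post,
        [{attributes, record_info(fields, post)}]),
    {atomic, ok} = mnesia:transaction(fun() ->
        ok = mnesia:write(#post{slug = "switching-to-erlang",
                                title = "Switching to Erlang",
                                published = true}),
        mnesia:write(#post{slug = "unfinished-draft",
                           title = "Draft",
                           published = false})
    end),
    %% Match spec: [{Head, Guards, Result}]. The head binds slug to '$1',
    %% requires published = true, and ignores every other field.
    MatchSpec = [{#post{slug = '$1', published = true, _ = '_'}, [], ['$1']}],
    {atomic, Slugs} = mnesia:transaction(fun() ->
        mnesia:select(post, MatchSpec)
    end),
    Slugs.
```

published_slugs() returns ["switching-to-erlang"]. The match specification uses the same format as ets:select/2, which is why it feels more like pattern matching over records than like SQL.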
Folsom for realtime stats is cool
Folsom is a third-party library, but it's been pretty nice. It provides metric types like histograms, meters, and gauges, all tracked in memory. It doesn't do any visualization (you still have to do that part yourself), but it makes tracking whatever metrics you want pretty easy. I use it for tracking response times, CPU load, memory usage, etc.
Observer is really cool
Observer is a built-in GUI monitoring tool for Erlang. It isn't web-based, but you can run it on a node that has a GUI, connect that node to headless nodes, and monitor the headless nodes from there. It shows a lot of detail (processes, applications, memory, ETS tables, and more), and it's really cool.
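Connecting from a machine with a display looks roughly like this shell session. The node name blog@myserver and the cookie are placeholders, and as far as I know the headless node needs the runtime_tools application in its release for Observer to inspect it.

```erlang
%% Erlang shell on the machine with a display, started with a node name
%% and the same cookie as the blog node, e.g.:
%%   erl -sname observer -setcookie blog_cookie
%% blog@myserver and blog_cookie are placeholder names.
pong = net_adm:ping('blog@myserver').  % connect to the headless node
observer:start().                      % then pick it from the Nodes menu
```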
What are my plans?
For now, I just plan on writing more blog posts and creating more dashboards for myself. I've been using D3 to make my own charts, and I have a few more ideas I want to build, but I wanted to get the blog deployed before spending too much time on dashboards.
Other than dashboards, I'm going to continue my C journey; there's a lot I want to do there. As for my blog engine, I don't have much planned. It's pretty much what I want (minus a few dashboards I haven't gotten to yet).