Stupid Swift Tricks #7½

Writing a User Guide In It

In the first part of this article, I explained that I put together a system for producing user guides for apps, by writing the guides in Swift itself, and having the compiler turn that into code/data. It can then be turned into HTML on demand, or inspected by the app to generate the table of contents, search indices, and so on.

In this part, we’re going to look into how that actually happens: How does it work? How does the code get turned into a data structure? And most importantly, how do you make it easy and comfortable for a human to write a user guide in a language intended for programming computers?

There are a few parts to this. Here are the things we’re going to need to be able to do:

  • Write text, large blocks of it, without having to get bogged down by compiler junk like escaping newlines or quotes
  • Apply general text replacements such as smart quotes, and properly escape characters like “&” that have special meaning in HTML
  • Embed tokens that will get replaced at runtime, for things like the device or platform name
  • Embed special formatting within a string, such as on-screen buttons or keyboard keys
  • Mark blocks or phrases that require special formatting, such as chapter titles, lists, definitions, and call-out boxes (for tips, notes and warnings)
  • Mark blocks that should only be included on certain platforms — and ideally support this for parts of sentences as well (so we don’t have to duplicate an entire paragraph due to minor platform differences)
  • Automatically generate a table of contents, and allow linking to chapters and other headings, by some kind of reference that gets checked at compile-time

Let’s see how these are achieved using Swift.

Writing Blocks of Text

This part is pretty straightforward: Swift supports multi-line string literals, which open with triple quote marks ("""). Until the closing triple quotes, Swift treats newlines and any other kind of quote as part of the text. Since triple quotes are never used in the user guide, this means we can plough ahead and write text with impunity, caring not for what Swift thinks of it.
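For anyone who hasn’t used them, here’s a minimal illustration of a multi-line string literal (the sample text is made up):

```swift
// Everything between the opening and closing triple quotes is text,
// including newlines and ordinary quote marks, with no escaping needed.
let paragraph = """
    Tap the "Redo" button to re-apply the last edit.
    Apostrophes (as in "don't") and 'single quotes' are fine too.
    """

print(paragraph)
```

The indentation of the closing delimiter is stripped from every line, and neither the newline after the opening quotes nor the one before the closing quotes is included, so the text can be indented to match the surrounding code.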

But there is something else going on. We could write the entire user guide as a single giant multi-line string, but then we’d have to parse it at runtime using code we wrote ourselves. We want the structure (the paragraphs, the chapters, and more) to be parsed by Swift as it compiles. That flags certain kinds of errors in advance, makes the guide render faster when the user opens it, and, because the already-parsed guide ends up in data structures, lets us inspect them to provide search and so on.

How might we do this? Well, one crude way (not recommended) might be to have a big array of tuples, where the first element is the “type” of the block (heading, paragraph, list, etc) and the second is the text contained within it.
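To make the comparison concrete, here’s a rough sketch of that crude approach. BlockType, its cases and the sample text are hypothetical; this is the design being argued against, not one the guide uses:

```swift
// The crude design: an array of (block type, text) tuples.
enum BlockType {
    case heading
    case paragraph
    case note
}

let crudeGuide: [(BlockType, String)] = [
    (.heading, "Getting Started"),
    (.paragraph, "Tap the Record button to begin."),
    (.note, "Recordings are saved automatically."),
]

// Rendering has to switch on the block type of every element.
func render(_ guide: [(BlockType, String)]) -> String {
    var html: [String] = []
    for (kind, text) in guide {
        switch kind {
        case .heading: html.append("<h1>\(text)</h1>")
        case .paragraph: html.append("<p>\(text)</p>")
        case .note: html.append("<div class='note'>\(text)</div>")
        }
    }
    return html.joined(separator: "\n")
}

print(render(crudeGuide))
```

Each element carries exactly one flat string, so there’s no natural place to put, say, a list inside a note.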

This is simple to design and implement, but it’s… not ergonomic, to put it lightly. It’s a pain to write and maintain, and also quite limited.

For example, although we could make the “block type” an enum and hang various parameters off the enum cases, it still doesn’t support nested content (e.g. lists with nested lists, or tables with other content inside), or allow for attaching any kind of metadata that varies within the content itself.

Let’s fix that first:

User Guide Content

We’re gonna need more than a tuple, but we don’t want to make one big overstuffed struct or class that tries to Do It All.

If we instead define a Content protocol and ensure anything that can go into the user guide conforms to it (and avoid Self or associated types) then we can define all sorts of different types of content, and mix and match them in a single array.

This allows us to make very flexible structures to represent the content, and the protocol’s pretty simple, too:


Content protocol
```swift
protocol Content {
    func render(context: inout Context, into: Output)
    func content(for context: Context) -> [Content]
}
```

The render(context: into:) method is the main thing here. Context is a struct with flags and other information that can be useful for customising the output (e.g. what device it’s being run on), and Output is something that receives pieces of raw HTML for display.

The content(for:) method isn’t used during rendering, but it is used when inspecting the data structure to discover chapters, keywords, etc. This also takes a Context so that chapters/keywords/etc can avoid being scanned if they’re not included on a particular device.
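As a rough sketch of that kind of inspection, here’s a recursive walk that collects chapter titles using content(for:). Chapter, IOS13Only, chapterTitles(in:context:), the isIOS13 flag and the minimal Output class are hypothetical stand-ins, not the guide’s real types; the point is that content hidden for a given Context never gets visited:

```swift
// Hypothetical stand-ins for the guide's types.
struct Context {
    var isIOS13 = true
}

final class Output {
    private(set) var html = ""
    func write(_ fragment: String) { html += fragment }
}

protocol Content {
    func render(context: inout Context, into output: Output)
    func content(for context: Context) -> [Content]
}

extension String: Content {
    func render(context: inout Context, into output: Output) { output.write(self) }
    func content(for context: Context) -> [Content] { [] }  // nothing nested
}

struct Chapter: Content {
    let title: String
    let body: [Content]

    func render(context: inout Context, into output: Output) {
        output.write("<h2>\(title)</h2>")
        for item in body { item.render(context: &context, into: output) }
    }

    func content(for context: Context) -> [Content] { body }
}

// Content that only exists on iOS 13; when hidden, it reports no children.
struct IOS13Only: Content {
    let body: [Content]

    func render(context: inout Context, into output: Output) {
        guard context.isIOS13 else { return }
        for item in body { item.render(context: &context, into: output) }
    }

    func content(for context: Context) -> [Content] {
        context.isIOS13 ? body : []
    }
}

// Recursively descend the tree, collecting every visible chapter title.
func chapterTitles(in items: [Content], context: Context) -> [String] {
    var titles: [String] = []
    for item in items {
        if let chapter = item as? Chapter {
            titles.append(chapter.title)
        }
        titles += chapterTitles(in: item.content(for: context), context: context)
    }
    return titles
}

let guide: [Content] = [
    Chapter(title: "Recording", body: ["Tap Record to begin."]),
    IOS13Only(body: [
        Chapter(title: "Keyboard Shortcuts", body: ["Press Space to play."]),
    ]),
]

print(chapterTitles(in: guide, context: Context(isIOS13: true)))   // ["Recording", "Keyboard Shortcuts"]
print(chapterTitles(in: guide, context: Context(isIOS13: false)))  // ["Recording"]
```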

(Why is the Context marked inout? As well as global flags, it has some transient state that rendering can use for its own purposes. For example, smart quotes use it to track open/closed state between blocks, so you can have a quote extend across multiple paragraphs without getting confused as to which way round they should go. Most of the Context is read-only though.)
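Here’s a minimal sketch of why that state needs to be inout, assuming a Context with a single insideDoubleQuote flag (a hypothetical name) and handling only straight double quotes; the real smart-quote logic presumably does more, such as apostrophes and single quotes:

```swift
// Hypothetical Context holding transient smart-quote state.
struct Context {
    var insideDoubleQuote = false  // true between an opening and a closing quote
}

let leftQuote: Character = "\u{201C}"   // “
let rightQuote: Character = "\u{201D}"  // ”

// Swap straight double quotes for curly ones. The open/closed state lives in
// the context, so it carries over from one paragraph to the next.
func smartenQuotes(_ text: String, context: inout Context) -> String {
    var result = ""
    for character in text {
        if character == "\"" {
            result.append(context.insideDoubleQuote ? rightQuote : leftQuote)
            context.insideDoubleQuote.toggle()
        } else {
            result.append(character)
        }
    }
    return result
}

var context = Context()
let first = smartenQuotes("She said, \"This quote runs on", context: &context)
let second = smartenQuotes("into the next paragraph.\"", context: &context)
print(first)   // She said, “This quote runs on
print(second)  // into the next paragraph.”
```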

Okay, so now we have some flexibility. What do we do with it?


The first thing we can do is make String and Substring conform to Content. Their content(for:) implementations return an empty array (there’s nothing “nested inside” them), and their render(context: into:) implementations just write the text, suitably HTML-escaped, to the Output.

(I also take this opportunity to apply a few other simple search-and-replaces, to apply things like smart quotes and Markdown-style bold and italic emphasis.)

This handles plain text, but a number of other structs are also going to make up the user guide system, and now they don’t need to worry about HTML escaping or other formatting niceties: if their content includes any Strings, those will now handle themselves automatically.

(There’s one exception: I do also have a RawHTML struct that contains a single String member. RawHTML writes that to the output directly, unaltered, so if I ever do need to inject some code directly, I have that escape hatch by writing RawHTML("..."))
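A sketch of those conformances, plus RawHTML, might look something like this. Output is assumed here to be a simple class that accumulates HTML, and escapeHTML(_:) is a hypothetical helper; the extra replacements (smart quotes, bold and italic) are left out:

```swift
struct Context {}

// Receives raw HTML fragments (hypothetical stand-in for the article's Output).
final class Output {
    private(set) var html = ""
    func write(_ fragment: String) { html += fragment }
}

protocol Content {
    func render(context: inout Context, into output: Output)
    func content(for context: Context) -> [Content]
}

// Escape the characters that have special meaning in HTML.
func escapeHTML<S: StringProtocol>(_ text: S) -> String {
    var escaped = ""
    for character in text {
        switch character {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        case "\"": escaped += "&quot;"
        default: escaped.append(character)
        }
    }
    return escaped
}

extension String: Content {
    func render(context: inout Context, into output: Output) {
        output.write(escapeHTML(self))
    }
    func content(for context: Context) -> [Content] { [] }  // nothing nested
}

extension Substring: Content {
    func render(context: inout Context, into output: Output) {
        output.write(escapeHTML(self))
    }
    func content(for context: Context) -> [Content] { [] }
}

// The escape hatch: written to the output verbatim, unescaped.
struct RawHTML: Content {
    let html: String
    init(_ html: String) { self.html = html }

    func render(context: inout Context, into output: Output) { output.write(html) }
    func content(for context: Context) -> [Content] { [] }
}

var context = Context()
let output = Output()
let heading = "Tips & Tricks!"
let items: [Content] = [
    heading.dropLast(),   // a Substring: "Tips & Tricks"
    RawHTML("<br>"),      // written as-is
    "Press <Tab>",        // a String: escaped
]
for item in items { item.render(context: &context, into: output) }
print(output.html)  // Tips &amp; Tricks<br>Press &lt;Tab&gt;
```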

What about the more complex substitutions, though? For example, embedding buttons and glyphs, or replacing substitution tokens like .device with the type of device (e.g. iPhone, iPad) the guide’s being displayed on? Is this just more runtime search-and-replace?


Climbing the Glyph Case

These tokens, macros and other replacements are handled by Swift during compilation. This means that they can be checked at compile time (e.g. if you mis-spell the name of a glyph, you’ll get a compile error, instead of a broken image at runtime, which you might not notice unless you carefully scan through every line of the user guide). They’re also faster to render, and since they’re native Swift data types, also amenable to introspection by code.

This is probably the most important feature, maybe the feature that tipped me over the edge into using Swift for this.

See, I already had an enum Glyphs { ... } in both Ferrite and Hokusai, with a case for every glyph used by the app. This is because in iOS 13, these glyphs are all SF Symbols, but for backward compatibility with iOS 12, there needs to be a table to map them back to the images used in older versions of the app. Plus, it helps maintain consistency and avoid issues due to typos in image filenames.
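Something along these lines, as a hypothetical sketch: the case names, SF Symbols names and legacy image names below are illustrative, not the ones Ferrite or Hokusai actually use:

```swift
// Hypothetical sketch: one case per glyph used by the app.
enum Glyphs: String, CaseIterable {
    case redo
    case delete
    case record

    // Title used in button text, e.g. "Redo".
    var title: String {
        switch self {
        case .redo: return "Redo"
        case .delete: return "Delete"
        case .record: return "Record"
        }
    }

    // SF Symbols name for iOS 13 and later (illustrative choices).
    var symbolName: String {
        switch self {
        case .redo: return "arrow.uturn.right"
        case .delete: return "trash"
        case .record: return "record.circle"
        }
    }

    // Bundled image for iOS 12 (hypothetical naming scheme).
    var legacyImageName: String {
        "legacy-\(rawValue)"
    }
}

// Pick the right image name for the OS the app is running on.
func imageName(for glyph: Glyphs, hasSFSymbols: Bool) -> String {
    hasSFSymbols ? glyph.symbolName : glyph.legacyImageName
}

print(imageName(for: .delete, hasSFSymbols: true))   // trash
print(imageName(for: .delete, hasSFSymbols: false))  // legacy-delete
```

In an app, hasSFSymbols would typically come from an #available(iOS 13, *) check. And because everything refers to glyphs as Glyphs cases, a typo like .redu is a compile error rather than a missing image.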

They’re pretty big tables, but by using Swift, I can just reference Glyphs directly, instead of having to maintain an entirely separate table for the user guide. If I ever change, add or remove glyphs from the app, the user guide stays in sync, or lets me know in the Buildtime Errors tab of Xcode if something needs updating.

And Swift’s contextual lookup rules mean that it’s pretty ergonomic, too. Describing a button with a glyph looks like this:


Button example
```swift
"This string has a \(button: .redo) button in it"
```

This will render as “This string has a Redo button in it”, tweaked slightly depending on context: in the print version, the text “Redo” will be styled inside button chrome, while inside an iOS 13 app, it will render the SF Symbols Redo glyph inside the button chrome, then place the “Redo” text after it, outside the chrome, for additional context.
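A hypothetical sketch of that context-dependent rendering; the isPrintVersion flag, the CSS class names and the HTML structure are assumptions, and the real Button also handles text-only buttons and custom titles:

```swift
struct Context {
    var isPrintVersion = false  // hypothetical flag
}

final class Output {
    private(set) var html = ""
    func write(_ fragment: String) { html += fragment }
}

enum Glyphs: String {
    case redo, delete

    var title: String {
        switch self {
        case .redo: return "Redo"
        case .delete: return "Delete"
        }
    }
}

struct Button {
    let glyph: Glyphs

    func render(context: inout Context, into output: Output) {
        if context.isPrintVersion {
            // Print: just the title, styled inside button chrome.
            output.write("<span class='button'>\(glyph.title)</span>")
        } else {
            // In the app: the glyph inside the chrome, the title after it.
            output.write("<span class='button'><span class='glyph-\(glyph.rawValue)'></span></span> \(glyph.title)")
        }
    }
}

var printContext = Context(isPrintVersion: true)
var appContext = Context(isPrintVersion: false)
let printOutput = Output()
let appOutput = Output()
Button(glyph: .redo).render(context: &printContext, into: printOutput)
Button(glyph: .redo).render(context: &appContext, into: appOutput)
print(printOutput.html)  // <span class='button'>Redo</span>
print(appOutput.html)    // <span class='button'><span class='glyph-redo'></span></span> Redo
```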

How does this work?

Care And Feeding Of Literally Literals

In Swift, a string literal can construct not only native strings, but also custom types. In other words, if you have a type like this:


example type:
```swift
struct FourCharacterCode: ExpressibleByStringLiteral { ... }
```

And a function like this:


example function:
```swift
func Example(_ someParameter: FourCharacterCode) { ... }
```

You don’t have to write Example(FourCharacterCode(stringLiteral: "RIFF")), spelling out the conversion yourself.

You can instead just write Example("RIFF"), and the transform is done automatically by Swift at compile time.

In this example, FourCharacterCode is a struct that turns a four character string, like “RIFF”, into a UInt32 equivalent based on ASCII codes, in this case, 0x52494646. Lots of file formats use codes like this, and so do many Apple APIs, so it’s useful to have around. C-family languages have native support for this, but Swift doesn’t. As you can see though, we can easily add it.
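FourCharacterCode’s body is elided above; here’s one plausible way to write it, assuming the string must be exactly four ASCII characters, packed most significant byte first. This Example(_:) returns the value only so the result can be checked:

```swift
// One plausible FourCharacterCode: four ASCII characters packed into a
// UInt32, first character in the most significant byte (like 'RIFF' in C).
struct FourCharacterCode: ExpressibleByStringLiteral {
    let value: UInt32

    init(stringLiteral literal: String) {
        let bytes = Array(literal.utf8)
        precondition(bytes.count == 4 && bytes.allSatisfy { $0 < 0x80 },
                     "A FourCharacterCode needs exactly four ASCII characters")
        value = bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    }
}

// Stand-in for the article's Example(); returns the code so it can be checked.
func Example(_ someParameter: FourCharacterCode) -> UInt32 {
    someParameter.value
}

let riff: FourCharacterCode = "RIFF"
print(String(riff.value, radix: 16))  // 52494646
print(Example("RIFF") == 0x52494646)  // true
```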

This conversion is done with a bunch of string/character operations and math, so you might expect this to be pretty inefficient — if you compiled the above, then disassembled the result, you might expect to see a String being constructed, passed into the FourCharacterCode constructor, where a bunch of math happens, and the result comes back and is then passed on to Example().

But no! Because it’s taking a string literal, not a string variable, the compiler can figure it all out in advance. So in release builds, it all boils away at compile time, and the code just passes 0x52494646 to Example(). Neat!

What does this have to do with our replacement tokens, glyphs and buttons?

Literally Everything

Normally, Swift detects text in strings \(written like this) and replaces it with the result of evaluating the content of the brackets. So "A string with \(6*7) in it" turns into "A string with 42 in it".

But since Swift 5, types like FourCharacterCode that can be constructed from string literals can also adopt ExpressibleByStringInterpolation, which lets you easily and comprehensively customise what happens to the text inside the brackets. Our button renderer is one such customisation.

It’s pretty straightforward, yet flexible: Swift will call a method named appendLiteral() for each chunk of regular text, and appendInterpolation() for each piece of interpolated text. You just have to make sure those methods exist; the contents of the brackets in your string get copied into the method call’s parameters. So this:


Example string, before
```swift
"a \(button: .redo) button"
```

Is rewritten by the compiler into something like:


Example string, after
```swift
text.appendLiteral("a ")
text.appendInterpolation(button: .redo)
text.appendLiteral(" button")
```

You can write as many overloads as you like for the method, so you can make a rich and diverse syntax inside those brackets if you wish, by overloading with different types, or explicit parameter names. What your methods do with all that is up to you, but typically you’d collect or combine the pieces into an array or something. At the end of the string, Swift hands the finished interpolation to your type’s init(stringInterpolation:), which uses it to initialise your custom type.
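To see that rewrite happen, here’s a toy type (CallLog is hypothetical, with a one-case Glyphs enum as a stand-in) whose interpolation simply records each call the compiler makes:

```swift
enum Glyphs { case redo }

// Records the appendLiteral/appendInterpolation calls generated for a literal.
struct CallLog: ExpressibleByStringInterpolation {
    var calls: [String]

    struct StringInterpolation: StringInterpolationProtocol {
        var calls: [String] = []

        init(literalCapacity: Int, interpolationCount: Int) {}

        mutating func appendLiteral(_ literal: String) {
            calls.append("appendLiteral(\"\(literal)\")")
        }

        mutating func appendInterpolation(button: Glyphs) {
            calls.append("appendInterpolation(button: .\(button))")
        }
    }

    init(stringLiteral value: String) {
        calls = ["appendLiteral(\"\(value)\")"]
    }

    init(stringInterpolation: StringInterpolation) {
        calls = stringInterpolation.calls
    }
}

let log: CallLog = "a \(button: .redo) button"
for call in log.calls { print(call) }
// appendLiteral("a ")
// appendInterpolation(button: .redo)
// appendLiteral(" button")
```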

So the user guide has a custom TextContent type (conforming to our Content protocol), and that’s what allows us to embed substitution tokens like \(.device) and custom formatting like \(button: .delete) into the documentation and have it compile, and do something useful. Without having to write our own parser for it — it’s a concept that’s native to Swift, so it Just Works™.

It’s pretty simple to put together:


TextContent (part of it, anyway)
```swift
struct TextContent: Content, ExpressibleByStringInterpolation {
    let content: [Content]

    struct Interpolation: StringInterpolationProtocol {
        var content: [Content] = []

        init(literalCapacity: Int, interpolationCount: Int) {}

        // regular text
        mutating func appendLiteral(_ literal: String) {
            content.append(literal)
        }

        // substitution tokens (like .device)
        mutating func appendInterpolation(_ token: Tokens) {
            content.append(token)
        }

        // text-only button
        mutating func appendInterpolation(button: TextContent) {
            content.append(Button(button))
        }

        // graphical button; the default title is the glyph name, but it can be overridden
        mutating func appendInterpolation(button: Glyphs, _ title: TextContent? = nil) {
            content.append(Button(button, title))
        }

        // key caps
        mutating func appendInterpolation(key: TextContent) {
            content.append(KeyCap(key))
        }

        // ... and a bunch more ...
    }

    init(stringLiteral value: String) {
        content = [value]
    }

    init(stringInterpolation: Interpolation) {
        content = stringInterpolation.content
    }

    func render(context: inout Context, into output: Output) {
        for item in content {
            item.render(context: &context, into: output)
        }
    }

    func content(for context: Context) -> [Content] {
        content
    }
}
```

This takes care of all our special formatting and substitutions inside of strings: Tokens is an enum that has cases for the various substitution tokens like device; it conforms to Content, and its render(context: into:) writes whatever the appropriate string is for that enum case. The appendInterpolation() overload for Tokens doesn’t have a named parameter, but Swift understands which one to call based on the type.

Other types like Button and KeyCap are structs that conform to Content (or functions that return structs which conform to it) and do the work of rendering those styles in HTML.
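Putting those pieces together, here’s a cut-down, self-contained sketch with one token and a KeyCap. The deviceName field on Context, the <kbd> markup and the minimal Output class are assumptions for illustration, the stored array is named items rather than content, and HTML escaping is skipped to keep it short:

```swift
struct Context {
    var deviceName = "iPhone"  // hypothetical: filled in by the app at runtime
}

final class Output {
    private(set) var html = ""
    func write(_ fragment: String) { html += fragment }
}

protocol Content {
    func render(context: inout Context, into output: Output)
    func content(for context: Context) -> [Content]
}

// Plain text (HTML escaping omitted here for brevity).
extension String: Content {
    func render(context: inout Context, into output: Output) { output.write(self) }
    func content(for context: Context) -> [Content] { [] }
}

// Substitution tokens, resolved against the Context at render time.
enum Tokens: Content {
    case device

    func render(context: inout Context, into output: Output) {
        switch self {
        case .device: output.write(context.deviceName)
        }
    }

    func content(for context: Context) -> [Content] { [] }
}

// A keyboard key, wrapped in <kbd> markup.
struct KeyCap: Content {
    let label: String

    func render(context: inout Context, into output: Output) {
        output.write("<kbd>\(label)</kbd>")
    }

    func content(for context: Context) -> [Content] { [] }
}

struct TextContent: Content, ExpressibleByStringInterpolation {
    let items: [Content]

    struct Interpolation: StringInterpolationProtocol {
        var items: [Content] = []

        init(literalCapacity: Int, interpolationCount: Int) {}

        mutating func appendLiteral(_ literal: String) { items.append(literal) }
        mutating func appendInterpolation(_ token: Tokens) { items.append(token) }
        mutating func appendInterpolation(key: String) { items.append(KeyCap(label: key)) }
    }

    init(stringLiteral value: String) { items = [value] }
    init(stringInterpolation: Interpolation) { items = stringInterpolation.items }

    func render(context: inout Context, into output: Output) {
        for item in items { item.render(context: &context, into: output) }
    }

    func content(for context: Context) -> [Content] { items }
}

let text: TextContent = "On your \(.device), press \(key: "Space") to play."
var context = Context(deviceName: "iPad")
let output = Output()
text.render(context: &context, into: output)
print(output.html)  // On your iPad, press <kbd>Space</kbd> to play.
```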

Some of these methods have explicitly named parameters, because if you just include a plain text string, Swift needs to know if that should be rendered inside a Button or a KeyCap or something else — even in the cases where no glyph or modifier key (another enum) is included.

Okay, so now we’ve covered the bulk of the formatting of paragraphs, but what about larger chunks of the document? Chapters and subsections, lists, tables of commands or key mappings, call-outs (like tips, notes and warnings), and blocks that should be included on some devices but not others?

Well, if you look back at TextContent, you can see that it’s a simple struct that contains an array of other Content, and its renderer just gets each piece of Content in the array to render itself. It’s simple enough to make other structs that do the same, wrapping them in HTML tags. For example, a warning call-out box might look something like this:


Hypothetical example
```swift
struct Warning: Content {
    let content: [Content]

    // The argument label is "into"; "output" is the name used inside the method.
    func render(context: inout Context, into output: Output) {
        output.write("<div class='warning'><strong>Warning:</strong> ")
        for item in content {
            item.render(context: &context, into: output)
        }
        output.write("</div>")
    }

    func content(for context: Context) -> [Content] {
        return content
    }
}
```

And, sure, this works! But, isn’t this going to be an ugly mess to write in the user guide itself? Aren’t we going to have to write something like:


Example Of What Not To Do
```swift
let userGuide = [
    // lots of awkward stuff like
    Scope(.iOS13, content: [
        Definitions(content: [
            Item("blah blah", definition: [
                "First bit of definition",
                "Another bit of definition"
            ]),
            Warning(content: [
                "Some stuff",
                "Some more stuff"
            ])
        ])
    ])
]
```

(The Scope tag restricts things to only appearing for certain devices/situations/etc)

I mean, this works, but it’s not really the clean usable document layout format we’re looking for.

As it happens, we don’t have to do this. Thanks to some recently-added Swift features, in reality it looks like this:


Example Of What We Actually Do
```swift
Scope(.iOS13) {
    Definitions {
        Item("blah blah") {
            "First bit of definition"
            "Another bit of definition"
        }
        Warning {
            "Some stuff"
            "Some more stuff"
        }
    }
}
```

Much cleaner! In some ways, it’s funny because it’s not a huge difference, mostly getting rid of commas and moving stuff out of the nested brackets. But the readability difference is pretty clear (to my tired eyes anyway…).

(Also — and this is a big win — you can use if statements in there, which you can’t inside of arrays.)

How does that work?

Block the Builder

Inspired by SwiftUI, I solved a main piece of the user guide puzzle using Function Builders. These are… arguably misnamed? I feel like they should probably be called Builder Functions (they are “functions that build things”, not “things that build functions”?). They are one of the keys to how SwiftUI works, and they’re how we can write a bunch of arbitrary statements with no context and still have them turned into a data structure.

In our system, Scope looks something like this:


Scope struct
```swift
struct Scope: Content {
    let requirements: Context.Requirements
    let content: [Content]

    // ContentBuilder.buildBlock returns [Content], so the block does too.
    init(_ requirements: Context.Requirements, @ContentBuilder block: () -> [Content]) {
        self.requirements = requirements
        self.content = block()
    }

    func render(context: inout Context, into output: Output) {
        guard requirements.match(context) else { return }
        for item in content {
            item.render(context: &context, into: output)
        }
    }

    func content(for context: Context) -> [Content] {
        requirements.match(context) ? content : []
    }
}
```

The real “magic” is in the @ContentBuilder. It’s what allows us to just write a bunch of stuff in { braces } after the Scope() tag, and have that turned into our [Content] array. So what is it?

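ContentBuilder is just a struct, but it’s tagged with @_functionBuilder, which means any closure parameter marked with it (like the block parameter of Scope’s initialiser) gets special treatment: when the block passed to Scope’s initialiser is compiled, Function Builder rules get applied to it. (Since Swift 5.4 this feature is official, under the name result builders, and the attribute is spelled @resultBuilder.)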

These cause Swift to internally rewrite the affected block, so that any unused values are gathered together, and at the end of the block, they’re passed to the ContentBuilder for processing.

Whatever it returns is the new result of the block. As it happens, it barely needs to do anything:


Content Builder
```swift
@_functionBuilder
struct ContentBuilder {
    static func buildBlock(_ component: Content...) -> [Content] {
        return component
    }
}
```

Nothing to it, right? All the unused Content inside the builder block is gathered into an array and returned. And that’s it. It gets saved by the Scope struct. It seems almost pointless — and yet, it declutters the code a whole bunch.
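Here’s a self-contained sketch of the builder in action with a cut-down Scope and Warning. It uses the Swift 5.4+ spelling, @resultBuilder, rather than @_functionBuilder, and replaces Context.Requirements with a hypothetical two-case Requirement enum; the HTML markup is illustrative:

```swift
struct Context {
    var isIOS13 = true
}

final class Output {
    private(set) var html = ""
    func write(_ fragment: String) { html += fragment }
}

protocol Content {
    func render(context: inout Context, into output: Output)
    func content(for context: Context) -> [Content]
}

extension String: Content {
    func render(context: inout Context, into output: Output) { output.write("<p>\(self)</p>") }
    func content(for context: Context) -> [Content] { [] }
}

// Gathers every statement in a builder block into one array.
@resultBuilder
struct ContentBuilder {
    static func buildBlock(_ components: Content...) -> [Content] {
        components
    }
}

// Hypothetical stand-in for Context.Requirements.
enum Requirement {
    case iOS13, notIOS13

    func match(_ context: Context) -> Bool {
        switch self {
        case .iOS13: return context.isIOS13
        case .notIOS13: return !context.isIOS13
        }
    }
}

struct Scope: Content {
    let requirement: Requirement
    let items: [Content]

    init(_ requirement: Requirement, @ContentBuilder block: () -> [Content]) {
        self.requirement = requirement
        self.items = block()
    }

    func render(context: inout Context, into output: Output) {
        guard requirement.match(context) else { return }
        for item in items { item.render(context: &context, into: output) }
    }

    func content(for context: Context) -> [Content] {
        requirement.match(context) ? items : []
    }
}

struct Warning: Content {
    let items: [Content]

    init(@ContentBuilder block: () -> [Content]) {
        items = block()
    }

    func render(context: inout Context, into output: Output) {
        output.write("<div class='warning'>")
        for item in items { item.render(context: &context, into: output) }
        output.write("</div>")
    }

    func content(for context: Context) -> [Content] { items }
}

let section = Scope(.iOS13) {
    "Some stuff"
    Warning {
        "Careful now"
    }
}

var context = Context(isIOS13: true)
let output = Output()
section.render(context: &context, into: output)
print(output.html)  // <p>Some stuff</p><div class='warning'><p>Careful now</p></div>
```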

(Supporting if statements requires an extra method. It’s no biggie. I don’t use that right now for the user guide, but I do for some other things — I’ve been experimenting with using builder blocks to make it easier to set up Auto Layout Constraints).
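One way to add if support, sketched here rather than taken from the guide’s source: have buildExpression wrap each statement in an array, so buildOptional and buildEither can return arrays that buildBlock flattens. That needs a few extra methods rather than one, but avoids having to make arrays conform to Content. It uses @resultBuilder (Swift 5.4+) and an empty stand-in Content protocol; section(_:) and guideText(isIOS13:showTips:) are hypothetical:

```swift
// An empty stand-in for the guide's Content protocol.
protocol Content {}
extension String: Content {}

@resultBuilder
struct ContentBuilder {
    // Wrap each statement in an array...
    static func buildExpression(_ expression: Content) -> [Content] {
        [expression]
    }

    // ...so a block can simply flatten its statements' arrays together.
    static func buildBlock(_ components: [Content]...) -> [Content] {
        components.flatMap { $0 }
    }

    // `if` without `else`: an absent branch contributes nothing.
    static func buildOptional(_ component: [Content]?) -> [Content] {
        component ?? []
    }

    // `if`/`else`: whichever branch ran supplies its content.
    static func buildEither(first component: [Content]) -> [Content] {
        component
    }

    static func buildEither(second component: [Content]) -> [Content] {
        component
    }
}

func section(@ContentBuilder _ block: () -> [Content]) -> [Content] {
    block()
}

func guideText(isIOS13: Bool, showTips: Bool) -> [Content] {
    section {
        "Always shown."
        if isIOS13 {
            "Tap the glyph button."
        } else {
            "Tap the text button."
        }
        if showTips {
            "Tip: long-press for more options."
        }
    }
}

print(guideText(isIOS13: true, showTips: false).compactMap { $0 as? String })
// ["Always shown.", "Tap the glyph button."]
```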

Extra Features

A handful of other minor features:

  • I define a new operator to create list items automatically.
  • I define && and || operators on Context.Requirements so more complex Scope requirements can be built up.
  • The Context being passed around also includes the data structure of the document, as well as methods to recursively descend it and call some method on each piece of content (used by the TableOfContents).
  • It’s trivial to add new functions for special features, or add whatever we want inside the functions that already exist. This lets us enhance the user guide as the user sees it, without cluttering up its source. For example, all headings automatically have Best Possible Ampersands applied, which I used to have to do by hand in the HTML, but is now taken care of inside the renderer for headings.
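As a hypothetical sketch of the && and || operators, Context.Requirements can be modelled as a small indirect enum that forms an expression tree, evaluated against a Context later; the flags and case names here are made up:

```swift
struct Context {
    var isIOS13 = true
    var isIPad = false
    var isPrintVersion = false
}

extension Context {
    // Hypothetical: requirements as a small expression tree.
    indirect enum Requirements {
        case iOS13
        case iPad
        case printVersion
        case both(Requirements, Requirements)
        case either(Requirements, Requirements)

        func match(_ context: Context) -> Bool {
            switch self {
            case .iOS13: return context.isIOS13
            case .iPad: return context.isIPad
            case .printVersion: return context.isPrintVersion
            case let .both(lhs, rhs): return lhs.match(context) && rhs.match(context)
            case let .either(lhs, rhs): return lhs.match(context) || rhs.match(context)
            }
        }

        // Build up a combined requirement instead of evaluating it immediately.
        static func && (lhs: Requirements, rhs: Requirements) -> Requirements {
            .both(lhs, rhs)
        }

        static func || (lhs: Requirements, rhs: Requirements) -> Requirements {
            .either(lhs, rhs)
        }
    }
}

let requirement: Context.Requirements = .iOS13 && (.iPad || .printVersion)
print(requirement.match(Context(isIOS13: true, isIPad: true)))  // true
print(requirement.match(Context(isIOS13: true)))                // false
```

Because these operators only accept Requirements operands, they stay out of the way of ordinary Boolean && and ||, though the Mystery Meat concerns about overloading still apply.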

Mystery Meat?

One potential problem with these techniques is that they can add a lot of junk to native types like String, as well as additional or overloaded operators and fairly generic type names like Content. I really didn’t want to pollute the rest of the codebase with all of this!

There’s a saying I read once — decades ago, so unfortunately I don’t remember where — that says “First, C++ programmers learn how to overload operators. Then, they learn not to.” It’s about what I came to think of as “mystery meat”: when you don’t really know what you’re getting.

I mean, a programming language is (should be) deterministic so you can always check the rules to find out exactly what the compiler will do, but past a certain level of complexity, you’re adding work, not taking it away.

So, even though I know that * has precedence over +, I will often add the parentheses to an expression anyway, and for any precedence rules more complex than that, they’re a must.

It’s similar with things like how overloading gets resolved, or how operators behave. It can also have an impact on compile times if the compiler has to try out a lot of additional possibilities for what you might mean in any given situation.

I don’t want to ever be writing general app code, and have the compiler do something surprising like take a string designed for machine use, and interpret it as being HTML TextContent, for example. Or try to turn something with an && into a Scope() requirement, instead of a regular boolean test.

There’s a simple, fairly shotgun solution to this: the entire user guide is thrown into a separate Swift Module, so the whole thing, with all its kooky extensions and operators and stuff, is completely isolated from the rest of the codebase.

No code-pollution worries at all, and the only parts that are public (visible to the rest of the app) are methods to get the finished HTML, get the search index, and an enum that defines platform/environment flags.

So, I feel comfortable going all-out with “magic” features that I would ordinarily avoid (to keep the rest of the codebase sane), because they’re constrained to this sandbox of sorts.

And that’s how Ferrite’s new user guide is put together.