Zero DOM: Compile-Time Rendering Part 2 - Latency

This is part 5 of a 12-part series called Distilling the Web to Zero. The web is far from done improving - particularly as it pertains to building rich web applications that are, to users, developers, and businesses alike, more desirable than their native-app counterparts. This series is prefaced by defining the biggest challenges on the road towards that goal and the 12 essays that follow explore potential solutions using concrete examples.

On response times

The speed of light sucks.
– John Carmack

The web community has adopted an apathetic attitude towards response times. “Ruby has been fast enough for 13 years,” protests DHH. Ruby’s not alone, though. This disregard for response times spans nearly every popular language and framework used to build the web. Here are a few benchmarks put together by Rich Harris, creator of Svelte. They aim to test SSR in isolation, using mock HTTP requests to eliminate any TCP overhead.


name        renders/sec   average (ms)   relative to react
react       770           1.3            baseline
sveltekit   612           1.6            1.26 x slower
solid       589           1.7            1.31 x slower
remix       467           2.1            1.65 x slower
vue         297           3.4            2.59 x slower
nuxt        290           3.4            2.66 x slower
next        54            18.4           14.26 x slower

What makes this engineering oddity even more peculiar is that plenty of very fast options do exist. Many are even 1000x faster yet they remain generally disregarded by the web community.

“It doesn’t matter,” they say. The performance engineer in me hates to admit it, but they’re kind of right.

How did we get here?

Obscured by "wire time"

It’s easy to conflate render time with response time. It’s not uncommon to see 99% of the response time spent either awaiting DB queries or round-tripping across the country to the user. When the total response time is 200 ms and roughly 2 ms is spent on rendering, it simply doesn’t move the needle to fixate on your render times.

Money can’t buy faster render times

Many problems can be solved by throwing more money at them. Increasing your capacity is one such problem, and cloud providers are more than happy to be enablers of this strategy. A VM with CPU speeds equivalent to a modern laptop can cost you $27,869 per year. Here’s a comparison of various AWS EC2 instances and a maxed-out dev machine using Geekbench, a benchmarking tool that runs your CPU through a collection of stress tests. Users can upload their results to make it easy to compare various systems. (Included are each system’s costs per year for on-demand instances in the US.)

NOTE

The laptop’s cost is amortized over a period of 5 years in an attempt to compare yearly subscription costs to a one-time laptop purchase.

[Figure: vCPU Benchmarks.svg]

There’s a clear correlation between performance and the number of cores. Now contrast that with their single-core benchmarks. Since UI rendering is nearly always an atomic, indivisible operation, it isn’t split across multiple cores. Therefore spending 40x more money doesn’t improve your minimum render times by any margin; it only serves to increase your maximum concurrent workload. Put simply, money is an easy way to support more users, but no amount of money can fix your latency issues.
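The distinction between capacity and latency can be sketched with a little arithmetic. The 2 ms render time and the core counts below are hypothetical values, and the model assumes requests are fully independent of each other. `Capacity`, `RendersPerSecond`, and `MinLatencyMs` are illustrative names, not part of any library.

```csharp
using System;

// Compare one core against 64 cores for the same hypothetical 2 ms render.
foreach (int cores in new[] { 1, 64 })
{
    Console.WriteLine(
        $"{cores,2} cores: {Capacity.RendersPerSecond(cores, 2.0):F0} renders/sec, " +
        $"best-case latency {Capacity.MinLatencyMs(cores, 2.0)} ms");
}

static class Capacity
{
    // Throughput scales with cores because independent requests run in parallel.
    public static double RendersPerSecond(int cores, double renderMs) =>
        cores * (1000.0 / renderMs);

    // A single render is not split across cores, so the core count is ignored here.
    public static double MinLatencyMs(int cores, double renderMs) => renderMs;
}
```

Under this simple model, extra cores multiply the number of renders per second while the time any single user waits for a render stays the same.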

With this knowledge of wire-time, CPU constraints, and the inability to throw money at the problem, it’s understandable how framework developers might normalize this apathetic attitude towards response times almost as if it were a law of nature we have no choice but to accept.

The Unforgiving Edge™

Success is a lousy teacher. It seduces smart people into thinking they can't lose.
– Bill Gates

CDNs help reduce “wire time” by distributing content to a multitude of locations that are in closer proximity to their users. This has the potential to shave hundreds of milliseconds off response times. Historically, this content was limited to only static or cached assets, but as our edge infrastructure became more sophisticated, naturally there was interest in running our application code on The Edge as well.

It did not go well.

Backpedaling from the edge

The web community brought this ethos of “acceptable latency” to The Edge. And why shouldn’t they? They all saw unprecedented success with regional cloud hosting for several decades! The internet has minted more billionaires than any other technological advance in human history.

What was discovered was that the best practices that thrived in The Cloud did not have the same impact when employed on The Edge. The path of innovation is never a straight line. With honest and candid humility, Vercel’s CEO and VP of Product provided some valuable insights.


To be clear, this was NOT an issue of skill or lack of effort. Vercel is a highly respected company composed of a large portion of the planet’s greatest web talent with hundreds of millions in funding.

The issue was that all our best practices that succeeded back in The Cloud suddenly became anti-patterns in the context of The Edge. In The Cloud, capacity is king (i.e. throughput) and as discussed above, latency matters very little. On The Edge, this equation is flipped upside down.

On The Edge, “latency is the mind killer”

The Cloud (web2) has a core tenet of capacity which causes latency to be wastefully disregarded since it's bounded by factors outside its control. At The Edge (web4), the core tenet is latency and it isn’t just important – it's the raison d'être.

Why exactly is The Edge so different from The Cloud? In short, when operating on The Edge your “wire time” rapidly approaches zero: there’s no waiting on a database when the content is static or cached, and users are located milliseconds away. Suddenly all that remains to be optimized is the time it takes to get those bytes on the wire.
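To make that shift concrete, here is a minimal sketch of the arithmetic. The cloud figures (roughly 2 ms of rendering inside a 200 ms response) come from earlier in this essay. The edge figure of 3 ms for everything other than rendering is an assumption chosen purely for illustration, and `Latency` and `RenderShare` are hypothetical names.

```csharp
using System;

// Cloud: ~2 ms of a 200 ms response is rendering (figures from this essay).
Console.WriteLine(
    $"cloud: {Latency.RenderShare(renderMs: 2, otherMs: 198):P0} of the response is rendering");

// Edge: the 3 ms of non-render time is a hypothetical value for illustration only.
Console.WriteLine(
    $"edge:  {Latency.RenderShare(renderMs: 2, otherMs: 3):P0} of the response is rendering");

static class Latency
{
    // Fraction of the total response time that is spent rendering.
    public static double RenderShare(double renderMs, double otherMs) =>
        renderMs / (renderMs + otherMs);
}
```

With the render time held at 2 ms, its share of the response grows from 1% to 40% under these assumed numbers, which is why render time stops being negligible once the wire time shrinks.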

Artur Bergman, founder of Fastly, gives a 30-minute crash course on the many insane things they do at Fastly to minimize latency. It’s a quick peek at just how different a ballgame it is to be operating at The Edge compared to The Cloud.

On The Edge, latency is the new north star, so bringing a web2 ethos of gluttonous render times mixed with blocking dependencies on cloud-located databases brought ruinous results. Apps became slower, not faster, and trying to fix the problem by “adding more web2” only served to worsen performance, never improving it. The difficult decision was made to retreat from The Edge.

[Image: yoda-fail.jpg]

Fix needed, please advise

web4 is interesting in that it's not an extension of web2 but rather an initiative in the opposite direction. This makes it a tool that's very suitable for duty on The Edge. Read more on web4 from March. To be clear, The Edge’s latency challenges go deeper than just rendering latency. There’s also the issue of rendering in the absence of data. Techniques like suspense and streaming are only band-aids; the problem needs solving at a more fundamental level, more akin to what is seen in native mobile apps. Such local-first concepts will be covered later this year. For now, this essay will remain focused on optimizing rendering latencies.

If rendering needs to be not just incrementally faster but rather optimal, then the best approach to take is to offload as much runtime work as possible by way of compile-time rendering.

HTML, being a structural language, is notoriously verbose. Many web2 frameworks, like React, construct giant trees as a part of their rendering (and diffing) process. However, on the server side, there is no need to construct a DOM – neither a virtual DOM nor a regular DOM. Instead, it’s orders of magnitude faster to simply line up those bytes in contiguous memory so they can be fired down the wire piece by piece. This has the added advantage of avoiding any string concatenation, as seen in many other web2 frameworks without a VDOM, which negatively impacts garbage collection (covered in more depth in part 1).

Interpolated string handlers

Interpolated string handlers are new in C# 10. They are comparable to tagged template literals in JavaScript. They enable you to customize how and when a string gets interpolated (or even forgo the process altogether). For example, the following two lines produce the same output:
"Hello " + name + "!"
$"Hello {name}!"

If you create an interpolated string handler, the compiler will hand that string to you in pieces, allowing you to interpolate it however best fits your use case.

One practical use case is logging. You can delay the construction of the string until you know it qualifies for output. There is no need to waste memory by creating a whole string on the heap before passing it into the method if the log message’s priority level is configured to be ignored. This saves CPU time and GC pressure. Another example use case is sanitizing SQL inputs.
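As a minimal sketch of the logging case, the handler below only builds the message when the level is enabled. `Logger`, `LogLevel`, `LogHandler`, and `Expensive` are hypothetical names invented for this example; the attributes and the constructor shape (`literalLength`, `formattedCount`, and a trailing `out bool`) are the pattern the C# 10 compiler looks for.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Text;

var logger = new Logger { MinLevel = LogLevel.Error };
int expensiveCalls = 0;
string Expensive() { expensiveCalls++; return "big payload"; }

logger.Log(LogLevel.Debug, $"state: {Expensive()}"); // below MinLevel: never built
logger.Log(LogLevel.Error, $"state: {Expensive()}"); // built and printed
Console.WriteLine($"Expensive() ran {expensiveCalls} time(s)");

public enum LogLevel { Debug, Info, Error }

public class Logger
{
    public LogLevel MinLevel { get; set; } = LogLevel.Info;
    public string? LastMessage { get; private set; }

    // "" forwards this Logger and "level" forwards the level argument
    // to the handler's constructor.
    public void Log(LogLevel level,
        [InterpolatedStringHandlerArgument("", "level")] LogHandler message)
    {
        if (!message.IsEnabled) return;
        LastMessage = $"[{level}] {message.GetText()}";
        Console.WriteLine(LastMessage);
    }
}

[InterpolatedStringHandler]
public ref struct LogHandler
{
    private readonly StringBuilder? _builder;

    public LogHandler(int literalLength, int formattedCount,
        Logger logger, LogLevel level, out bool isEnabled)
    {
        // When isEnabled is false the compiler skips every Append call,
        // so the interpolated expressions are never evaluated.
        isEnabled = level >= logger.MinLevel;
        _builder = isEnabled ? new StringBuilder(literalLength) : null;
    }

    public bool IsEnabled => _builder is not null;
    public void AppendLiteral(string s) => _builder!.Append(s);
    public void AppendFormatted<T>(T value) => _builder!.Append(value?.ToString());
    public string GetText() => _builder?.ToString() ?? string.Empty;
}
```

In this sketch the Debug call never invokes `Expensive()` at all, so the skipped message costs neither a string allocation nor the work of computing its arguments.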

xUI takes advantage of this feature to indicate which parts of the HTML are static and guaranteed to never change and which parts are dynamic. It hangs on to both types in an array instead of constructing a tree, tag by tag.

As an additional benefit, these static parts are treated as string literals and are never recreated (a helpful feature when trying to handle millions of requests per second).

Finally, isolating the dynamic values and storing them in an array interlaced with pointers to string literals of the static content makes implementing a diffing algorithm exceedingly simple: the cost is O(1) with respect to the amount of markup. Adding more tags/elements increases the workload by zero. Adding dynamic values will increase the work, but only linearly. Compare that to diffing two arbitrary trees, such as a virtual DOM, which in the general case has a complexity of O(n³); 1000 elements would require on the order of one billion comparisons. While heuristics can help mitigate that complexity, it’s still vastly more work to build and diff two trees compared to an approach where there is no tree.

Here’s a deeper look at how the C# compiler lowers that syntactic sugar. This example shows two raw string literals but one is assigned to a default type of string while the other is assigned to a custom interpolated string handler of my own creation called NotReallyAString.
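As a rough illustration of that setup, here is a minimal sketch of such a pair of assignments. `NotReallyAString` is the name used in this essay, but its body below, the `Chunk` record, and the `Chunks` property are assumptions made for illustration, not xUI's actual implementation. The sketch uses ordinary interpolated strings rather than raw string literals so that it compiles with C# 10, and it collects chunks in a `List`, which does allocate.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

int count = 6;

string asString = $"<span>{count}</span>";           // joined into one heap string
NotReallyAString asChunks = $"<span>{count}</span>"; // handed over in pieces, never joined

Console.WriteLine(asString);
foreach (Chunk chunk in asChunks.Chunks)
{
    string kind = chunk.IsLiteral ? "static " : "dynamic";
    Console.WriteLine($"{kind}: {chunk.Value}");
}

// One piece of the template: either a string literal or a dynamic value.
public readonly record struct Chunk(object? Value, bool IsLiteral);

[InterpolatedStringHandler]
public readonly struct NotReallyAString
{
    private readonly List<Chunk> _chunks;

    // The compiler supplies the total literal length and the number of holes.
    public NotReallyAString(int literalLength, int formattedCount) =>
        _chunks = new List<Chunk>(formattedCount * 2 + 1);

    public IReadOnlyList<Chunk> Chunks => _chunks;

    // Called by compiler-generated code for each static part of the template.
    public void AppendLiteral(string s) => _chunks.Add(new Chunk(s, IsLiteral: true));

    // Called by compiler-generated code for each {hole}; value types are boxed here.
    public void AppendFormatted<T>(T value) => _chunks.Add(new Chunk(value, IsLiteral: false));
}
```

The same template text yields a finished string in one case and three separate chunks (static, dynamic, static) in the other, purely because of the type on the left-hand side.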

Below is the lowered code generated by the compiler. (Check out the SharpLab of this code if you want to poke at it yourself from your browser to explore what the compiler is doing under the hood.) For the first method, the main thing to notice is the last line, where it calls ToStringAndClear(). That call does the work of allocating a new string on the heap at the properly calculated length and copying the collected chunks into it, ready for use as a string.

The main thing to notice about the second method is that it never calls ToStringAndClear(). Instead, after it collects each chunk, it just hands you your custom handler so you can do whatever you want with the chunks – zero memory allocated (depending on how you collect your chunks)!
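For readers without SharpLab at hand, the `string` case can be approximated by calling the built-in `DefaultInterpolatedStringHandler` by hand. This is a hand-written approximation of the lowering, not actual compiler output; the `Lowering` class and its method names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

Console.WriteLine(Lowering.Sugar(6) == Lowering.ByHand(6));

static class Lowering
{
    // What you write.
    public static string Sugar(int count) => $"<span>{count}</span>";

    // Approximately what the compiler turns it into.
    public static string ByHand(int count)
    {
        // 13 = combined length of the literal parts, 1 = number of holes.
        var handler = new DefaultInterpolatedStringHandler(13, 1);
        handler.AppendLiteral("<span>");
        handler.AppendFormatted(count);
        handler.AppendLiteral("</span>");
        return handler.ToStringAndClear(); // the final string is allocated here

        // With a custom handler type, the same Append calls are made on that
        // type instead, and the last line simply hands the handler back.
    }
}
```

Both methods produce the same text; the difference a custom handler makes is that the final joining step is left out entirely.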

In the context of xUI, this is the perfect format for two things:

  • Zero memory allocations
    Skip the expensive string-building and write those raw chunks directly to the network. HTML pages are frequently over 100KB and composed of hundreds/thousands of these “chunks.” There’s no need to have a single giant contiguous array of bytes as a string object in the heap. Those string literals never need to be copied unnecessarily. Just treat them like a bandolier and rapid-fire them directly to the network.
  • Fast diffing
    If you have two of these NotReallyAString objects that were created from the same raw string literal, then they’re guaranteed to have the same number of chunks and, even better, it’s guaranteed that the order of the chunks never shifts either! Detecting if any of the dynamic values have changed (e.g. count) is a simple process of iterating through the appended chunks of two NotReallyAStrings, ignoring the string literals (because they’re immutable and cannot change) and performing a simple == operation at each index. Any index with unequal values can trigger a signal for HTTP/X to update the DOM.
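Both points can be sketched in a few lines. Everything here is an assumption for illustration rather than xUI's real code: the compact chunk-collecting handler is included only so the block stands alone, and `Views`, `Buttons`, `ChangedIndexes`, and `WriteTo` are hypothetical names. Because this sketch boxes dynamic values, it compares them with `Equals` rather than `==`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;

NotReallyAString before = Views.Buttons(replies: 6, reposts: 27);
NotReallyAString after = Views.Buttons(replies: 6, reposts: 28);

// Indexes of the dynamic chunks whose values differ between the two renders.
Console.WriteLine(string.Join(",", Views.ChangedIndexes(before, after)));

// The static chunks of both renders are the very same string-literal instances.
Console.WriteLine(object.ReferenceEquals(before.Chunks[0].Value, after.Chunks[0].Value));

// Stream the chunks out one by one; no combined string is ever built.
Views.WriteTo(Console.Out, after);

public readonly record struct Chunk(object? Value, bool IsLiteral);

[InterpolatedStringHandler]
public readonly struct NotReallyAString
{
    private readonly List<Chunk> _chunks;

    public NotReallyAString(int literalLength, int formattedCount) =>
        _chunks = new List<Chunk>(formattedCount * 2 + 1);

    public IReadOnlyList<Chunk> Chunks => _chunks;
    public void AppendLiteral(string s) => _chunks.Add(new Chunk(s, IsLiteral: true));
    public void AppendFormatted<T>(T value) => _chunks.Add(new Chunk(value, IsLiteral: false));
}

public static class Views
{
    // Both renders come from the same template, so chunk count and order match.
    public static NotReallyAString Buttons(int replies, int reposts) =>
        $"<b>{replies}</b><i>{reposts}</i>";

    public static List<int> ChangedIndexes(NotReallyAString a, NotReallyAString b)
    {
        var changed = new List<int>();
        for (int i = 0; i < a.Chunks.Count; i++)
        {
            if (a.Chunks[i].IsLiteral) continue; // static chunks cannot change
            if (!object.Equals(a.Chunks[i].Value, b.Chunks[i].Value)) changed.Add(i);
        }
        return changed;
    }

    public static void WriteTo(TextWriter output, NotReallyAString html)
    {
        foreach (Chunk chunk in html.Chunks) output.Write(chunk.Value);
    }
}
```

Only the repost count differs between the two renders, so the diff reports a single index, and the amount of comparison work depends on the number of dynamic values rather than on how much markup surrounds them.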

Here’s an example from X/Twitter. Their row of buttons below each tweet is a combination of 8.5KB of structural markup and 4 dynamic values – the counts after each icon. Instead of building a tree of each tag and its attributes and children, it's far more efficient to split the string based on its dynamic values and store the segments as an array (visualized after this screenshot). As a bonus, since it's the compiler doing the “splitting,” these segments get special treatment as string literals too!

[Figure: xbuttons.png]



Here's a visual of that array. It may represent 8.5KB of HTML, but in RAM it's exceedingly efficient since the immutable chunks are pointers to string literals that are never duplicated. This becomes particularly useful when components are repeated or the number of concurrent users climbs.

Chunk[]
0:
<div class="css-175oi2r"><div class="css-175oi2r"><div aria-label="6 replies, 28 reposts, 69 likes, 47 bookmarks, 28501 views" role="group" class="css-175oi2r r-1kbdv8c r-18u37iz r-1wtj0ep r-1ye8kvj r-1s2bzr4" id="id__2b6rax94p6y"><div class="css-175oi2r r-18u37iz r-1h0z5md r-13awgt0"><button aria-label="6 Replies. Reply" role="button" class="css-175oi2r r-1777fci r-bt1l66 r-bztko3 r-lrvibr r-1loqt21 r-1ny4l3l" data-testid="reply" type="button"><div dir="ltr" class="css-146c3p1 r-bcqeeo r-1ttztb7 r-qvutc0 r-37j5jr r-a023e6 r-rjixqe r-16dba41 r-1awozwy r-6koalj r-1h0z5md r-o7ynqc r-clp7b1 r-3s2u2q" style="text-overflow: unset; color: rgb(113, 118, 123);"><div class="css-175oi2r r-xoduu5"><div class="css-175oi2r r-xoduu5 r-1p0dtai r-1d2f490 r-u8s1d r-zchlnj r-ipm5af r-1niwhzg r-sdzlij r-xf4iuw r-o7ynqc r-6416eg r-1ny4l3l"></div><svg viewBox="0 0 24 24" aria-hidden="true" class="r-4qtqp9 r-yyyyoo r-dnmrzs r-bnwqim r-lrvibr r-m6rgpd r-1xvli5t r-1hdv0qi"><g><path d="M1.751 10c0-4.42 3.584-8 8.005-8h4.366c4.49 0 8.129 3.64 8.129 8.13 0 2.96-1.607 5.68-4.196 7.11l-8.054 4.46v-3.69h-.067c-4.49.1-8.183-3.51-8.183-8.01zm8.005-6c-3.317 0-6.005 2.69-6.005 6 0 3.37 2.77 6.08 6.138 6.01l.351-.01h1.761v2.3l5.087-2.81c1.951-1.08 3.163-3.13 3.163-5.36 0-3.39-2.744-6.13-6.129-6.13H9.756z"></path></g></svg></div><div class="css-175oi2r r-xoduu5 r-1udh08x"><span data-testid="app-text-transition-container" style="transition-property: transform; transition-duration: 0.3s; transform: translate3d(0px, 0px, 0px);"><span class="css-1jxf684 r-1ttztb7 r-qvutc0 r-poiln3 r-n6v787 r-1cwl3u0 r-1k6nrdp r-n7gxbd" style="text-overflow: unset;"><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" style="text-overflow: unset;">
1:
6
2:
</span></span></span></div></div></button></div><div class="css-175oi2r r-18u37iz r-1h0z5md r-13awgt0"><button aria-expanded="false" aria-haspopup="menu" aria-label="28 reposts. Repost" role="button" class="css-175oi2r r-1777fci r-bt1l66 r-bztko3 r-lrvibr r-1loqt21 r-1ny4l3l" data-testid="retweet" type="button"><div dir="ltr" class="css-146c3p1 r-bcqeeo r-1ttztb7 r-qvutc0 r-37j5jr r-a023e6 r-rjixqe r-16dba41 r-1awozwy r-6koalj r-1h0z5md r-o7ynqc r-clp7b1 r-3s2u2q" style="text-overflow: unset; color: rgb(113, 118, 123);"><div class="css-175oi2r r-xoduu5"><div class="css-175oi2r r-xoduu5 r-1p0dtai r-1d2f490 r-u8s1d r-zchlnj r-ipm5af r-1niwhzg r-sdzlij r-xf4iuw r-o7ynqc r-6416eg r-1ny4l3l"></div><svg viewBox="0 0 24 24" aria-hidden="true" class="r-4qtqp9 r-yyyyoo r-dnmrzs r-bnwqim r-lrvibr r-m6rgpd r-1xvli5t r-1hdv0qi"><g><path d="M4.5 3.88l4.432 4.14-1.364 1.46L5.5 7.55V16c0 1.1.896 2 2 2H13v2H7.5c-2.209 0-4-1.79-4-4V7.55L1.432 9.48.068 8.02 4.5 3.88zM16.5 6H11V4h5.5c2.209 0 4 1.79 4 4v8.45l2.068-1.93 1.364 1.46-4.432 4.14-4.432-4.14 1.364-1.46 2.068 1.93V8c0-1.1-.896-2-2-2z"></path></g></svg></div><div class="css-175oi2r r-xoduu5 r-1udh08x"><span data-testid="app-text-transition-container" style="transform: translate3d(0px, 0px, 0px); transition-property: transform; transition-duration: 0.3s;"><span class="css-1jxf684 r-1ttztb7 r-qvutc0 r-poiln3 r-n6v787 r-1cwl3u0 r-1k6nrdp r-n7gxbd" style="text-overflow: unset;"><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" style="text-overflow: unset;">
3:
27
4:
</span></span></span></div></div></button></div><div class="css-175oi2r r-18u37iz r-1h0z5md r-13awgt0"><button aria-label="69 Likes. Like" role="button" class="css-175oi2r r-1777fci r-bt1l66 r-bztko3 r-lrvibr r-1loqt21 r-1ny4l3l" data-testid="like" type="button"><div dir="ltr" class="css-146c3p1 r-bcqeeo r-1ttztb7 r-qvutc0 r-37j5jr r-a023e6 r-rjixqe r-16dba41 r-1awozwy r-6koalj r-1h0z5md r-o7ynqc r-clp7b1 r-3s2u2q" style="text-overflow: unset; color: rgb(113, 118, 123);"><div class="css-175oi2r r-xoduu5"><div class="css-175oi2r r-xoduu5 r-1p0dtai r-1d2f490 r-u8s1d r-zchlnj r-ipm5af r-1niwhzg r-sdzlij r-xf4iuw r-o7ynqc r-6416eg r-1ny4l3l"></div><svg viewBox="0 0 24 24" aria-hidden="true" class="r-4qtqp9 r-yyyyoo r-dnmrzs r-bnwqim r-lrvibr r-m6rgpd r-1xvli5t r-1hdv0qi"><g><path d="M16.697 5.5c-1.222-.06-2.679.51-3.89 2.16l-.805 1.09-.806-1.09C9.984 6.01 8.526 5.44 7.304 5.5c-1.243.07-2.349.78-2.91 1.91-.552 1.12-.633 2.78.479 4.82 1.074 1.97 3.257 4.27 7.129 6.61 3.87-2.34 6.052-4.64 7.126-6.61 1.111-2.04 1.03-3.7.477-4.82-.561-1.13-1.666-1.84-2.908-1.91zm4.187 7.69c-1.351 2.48-4.001 5.12-8.379 7.67l-.503.3-.504-.3c-4.379-2.55-7.029-5.19-8.382-7.67-1.36-2.5-1.41-4.86-.514-6.67.887-1.79 2.647-2.91 4.601-3.01 1.651-.09 3.368.56 4.798 2.01 1.429-1.45 3.146-2.1 4.796-2.01 1.954.1 3.714 1.22 4.601 3.01.896 1.81.846 4.17-.514 6.67z"></path></g></svg></div><div class="css-175oi2r r-xoduu5 r-1udh08x"><span data-testid="app-text-transition-container" style="transform: translate3d(0px, 0px, 0px); transition-property: transform; transition-duration: 0.3s;"><span class="css-1jxf684 r-1ttztb7 r-qvutc0 r-poiln3 r-n6v787 r-1cwl3u0 r-1k6nrdp r-n7gxbd" style="text-overflow: unset;"><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" style="text-overflow: unset;">
5:
67
6:
</span></span></span></div></div></button></div><div class="css-175oi2r r-18u37iz r-1h0z5md r-13awgt0"><a href="/LukeW/status/1790446293568962910/analytics" aria-label="28501 views. View post analytics" role="link" class="css-175oi2r r-1777fci r-bt1l66 r-bztko3 r-lrvibr r-1ny4l3l r-1loqt21"><div dir="ltr" class="css-146c3p1 r-bcqeeo r-1ttztb7 r-qvutc0 r-37j5jr r-a023e6 r-rjixqe r-16dba41 r-1awozwy r-6koalj r-1h0z5md r-o7ynqc r-clp7b1 r-3s2u2q" style="text-overflow: unset; color: rgb(113, 118, 123);"><div class="css-175oi2r r-xoduu5"><div class="css-175oi2r r-xoduu5 r-1p0dtai r-1d2f490 r-u8s1d r-zchlnj r-ipm5af r-1niwhzg r-sdzlij r-xf4iuw r-o7ynqc r-6416eg r-1ny4l3l"></div><svg viewBox="0 0 24 24" aria-hidden="true" class="r-4qtqp9 r-yyyyoo r-dnmrzs r-bnwqim r-lrvibr r-m6rgpd r-1xvli5t r-1hdv0qi"><g><path d="M8.75 21V3h2v18h-2zM18 21V8.5h2V21h-2zM4 21l.004-10h2L6 21H4zm9.248 0v-7h2v7h-2z"></path></g></svg></div><div class="css-175oi2r r-xoduu5 r-1udh08x"><span data-testid="app-text-transition-container" style="transform: translate3d(0px, 0px, 0px); transition-property: transform; transition-duration: 0.3s;"><span class="css-1jxf684 r-1ttztb7 r-qvutc0 r-poiln3 r-n6v787 r-1cwl3u0 r-1k6nrdp r-n7gxbd" style="text-overflow: unset;"><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3" style="text-overflow: unset;">
7:
27K
8:
</span></span></span></div></div></a></div><div class="css-175oi2r r-18u37iz r-1h0z5md r-1wron08"><button aria-label="Bookmark" role="button" class="css-175oi2r r-1777fci r-bt1l66 r-bztko3 r-lrvibr r-1loqt21 r-1ny4l3l" data-testid="bookmark" type="button"><div dir="ltr" class="css-146c3p1 r-bcqeeo r-1ttztb7 r-qvutc0 r-37j5jr r-a023e6 r-rjixqe r-16dba41 r-1awozwy r-6koalj r-1h0z5md r-o7ynqc r-clp7b1 r-3s2u2q" style="text-overflow: unset; color: rgb(113, 118, 123);"><div class="css-175oi2r r-xoduu5"><div class="css-175oi2r r-xoduu5 r-1p0dtai r-1d2f490 r-u8s1d r-zchlnj r-ipm5af r-1niwhzg r-sdzlij r-xf4iuw r-o7ynqc r-6416eg r-1ny4l3l"></div><svg viewBox="0 0 24 24" aria-hidden="true" class="r-4qtqp9 r-yyyyoo r-dnmrzs r-bnwqim r-lrvibr r-m6rgpd r-1xvli5t r-1hdv0qi"><g><path d="M4 4.5C4 3.12 5.119 2 6.5 2h11C18.881 2 20 3.12 20 4.5v18.44l-8-5.71-8 5.71V4.5zM6.5 4c-.276 0-.5.22-.5.5v14.56l6-4.29 6 4.29V4.5c0-.28-.224-.5-.5-.5h-11z"></path></g></svg></div></div></button></div><div class="css-175oi2r" style="justify-content: inherit; display: inline-grid; transform: rotate(0deg) scale(1) translate3d(0px, 0px, 0px);"><div class="css-175oi2r r-18u37iz r-1h0z5md"><button aria-expanded="false" aria-haspopup="menu" aria-label="Share post" role="button" class="css-175oi2r r-1777fci r-bt1l66 r-bztko3 r-lrvibr r-1loqt21 r-1ny4l3l" type="button"><div dir="ltr" class="css-146c3p1 r-bcqeeo r-1ttztb7 r-qvutc0 r-37j5jr r-a023e6 r-rjixqe r-16dba41 r-1awozwy r-6koalj r-1h0z5md r-o7ynqc r-clp7b1 r-3s2u2q" style="text-overflow: unset; color: rgb(113, 118, 123);"><div class="css-175oi2r r-xoduu5"><div class="css-175oi2r r-xoduu5 r-1p0dtai r-1d2f490 r-u8s1d r-zchlnj r-ipm5af r-1niwhzg r-sdzlij r-xf4iuw r-o7ynqc r-6416eg r-1ny4l3l"></div><svg viewBox="0 0 24 24" aria-hidden="true" class="r-4qtqp9 r-yyyyoo r-dnmrzs r-bnwqim r-lrvibr r-m6rgpd r-1xvli5t r-1hdv0qi"><g><path d="M12 2.59l5.7 5.7-1.41 1.42L13 6.41V16h-2V6.41l-3.3 3.3-1.41-1.42L12 2.59zM21 15l-.02 3.51c0 1.38-1.12 2.49-2.5 
2.49H5.5C4.11 21 3 19.88 3 18.5V15h2v3.5c0 .28.22.5.5.5h12.98c.28 0 .5-.22.5-.5L19 15h2z"></path></g></svg></div></div></button></div></div></div></div></div>

Results

Coming soon...

Artifacts