Devlog
This page contains a curated list of recent changes to main branch Zig.
Also available as an RSS feed.
This page contains entries from the current year. Previous years are available in the Devlog archive page.
Improved UBSan Error Messages
Author: David Rubin
Lately, I've been working extensively with C interop, and one thing that's been sorely missing is clear error messages from UBSan. When compiling C with zig cc, Zig provides better defaults, including implicitly enabling -fsanitize=undefined. This has been great for catching subtle bugs and makes working with C more bearable. However, due to the lack of a UBSan runtime, all undefined behavior was previously caught with a trap instruction.
For example, consider this C program:
#include <stdio.h>

int foo(int x, int y) {
    return x + y;
}

int main() {
    int result = foo(0x7fffffff, 0x7fffffff);
    printf("%d\n", result);
}
Running this with zig cc used to result in an unhelpful error:
$ zig run test.c -lc
fish: Job 1, 'zig run test.c -lc' terminated by signal SIGILL (Illegal instruction)
Not exactly informative! To understand what went wrong, you'd have to run the executable in a debugger. Even then, tracking down the root cause could be daunting. Many newcomers ran into this Illegal instruction error without realizing that UBSan was enabled by default, leading to confusion. This issue was common enough to warrant a dedicated Wiki page.
With the new UBSan runtime merged, the experience has completely changed. Now, instead of an obscure SIGILL, you get a much more helpful error message:
$ zig run test.c -lc
thread 208135 panic: signed integer overflow: 2147483647 + 2147483647 cannot be represented in type 'int'
/home/david/Code/zig/build/test.c:4:14: 0x1013e41 in foo (test.c)
    return x + y;
             ^
/home/david/Code/zig/build/test.c:8:18: 0x1013e63 in main (test.c)
    int result = foo(0x7fffffff, 0x7fffffff);
                 ^
../sysdeps/nptl/libc_start_call_main.h:58:16: 0x7fca4c42e1c9 in __libc_start_call_main (../sysdeps/x86/libc-start.c)
../csu/libc-start.c:360:3: 0x7fca4c42e28a in __libc_start_main_impl (../sysdeps/x86/libc-start.c)
???:?:?: 0x1013de4 in ??? (???)
???:?:?: 0x0 in ??? (???)
fish: Job 1, 'zig run test.c -lc' terminated by signal SIGABRT (Abort)
Now, not only do we see what went wrong (signed integer overflow), but we also see where it happened -- two critical pieces of information that were previously missing.
Remaining Limitations
While the new runtime vastly improves debugging, there are still two features that LLVM's UBSan runtime provides which ours doesn't support yet:
- In C++, UBSan can detect when an object's vptr indicates the wrong dynamic type or when its lifetime hasn't started. Supporting this would require replicating the Itanium C++ ABI, which isn't worth the extreme complexity.
- Currently, the runtime doesn't show the exact locations of attributes like assume_aligned and __nonnull. This should be relatively straightforward to add, and contributions are welcome!
If you've ever been frustrated by cryptic SIGILL errors while trying out Zig, this update should make debugging undefined behavior a lot easier!
No-Libc Zig Now Outperforms Glibc Zig
Author: Andrew Kelley
Alright, I know I'm supposed to be focused on issue triage and merging PRs for the upcoming release this month, but in my defense, I do some of my best work while procrastinating.
Jokes aside, this week we had CI failures due to Zig's debug allocator creating too many memory mappings. This was interfering with Jacob's work on the x86 backend, so I spent the time to rework the debug allocator.
Since this was a chance to eliminate the dependency on a compile-time-known page size, I based my work on contributor archbirdplus's patch to add runtime-known page size support to the Zig standard library. With this change landed, Zig finally works on Asahi Linux. My fault for originally making page size compile-time known. Sorry about that!
Along with detecting page size at runtime, the new implementation no longer memsets each page to 0xaa bytes then back to 0x00 bytes, no longer searches when freeing, and no longer depends on a treap data structure. Instead, the allocation metadata is stored inline, on the page, using a pre-cached lookup table that is computed at compile-time:
/// This is executed only at compile-time to prepopulate a lookup table.
fn calculateSlotCount(size_class_index: usize) SlotIndex {
    const size_class = @as(usize, 1) << @as(Log2USize, @intCast(size_class_index));
    var lower: usize = 1 << minimum_slots_per_bucket_log2;
    var upper: usize = (page_size - bucketSize(lower)) / size_class;
    while (upper > lower) {
        const proposed: usize = lower + (upper - lower) / 2;
        if (proposed == lower) return lower;
        const slots_end = proposed * size_class;
        const header_begin = mem.alignForward(usize, slots_end, @alignOf(BucketHeader));
        const end = header_begin + bucketSize(proposed);
        if (end > page_size) {
            upper = proposed - 1;
        } else {
            lower = proposed;
        }
    }
    const slots_end = lower * size_class;
    const header_begin = mem.alignForward(usize, slots_end, @alignOf(BucketHeader));
    const end = header_begin + bucketSize(lower);
    assert(end <= page_size);
    return lower;
}
It's pretty nice because you can tweak some global constants and then get optimal slot sizes. That assert at the end means that if the constraints can't be satisfied, you get a compile error. Meanwhile, in C land, equivalent code has to resort to handcrafted lookup tables. Just look at the top of malloc.c from musl:
const uint16_t size_classes[] = {
    1, 2, 3, 4, 5, 6, 7, 8,
    9, 10, 12, 15,
    18, 20, 25, 31,
    36, 42, 50, 63,
    72, 84, 102, 127,
    146, 170, 204, 255,
    292, 340, 409, 511,
    584, 682, 818, 1023,
    1169, 1364, 1637, 2047,
    2340, 2730, 3276, 4095,
    4680, 5460, 6552, 8191,
};
Not nearly as nice to experiment with different size classes. The water's warm, Rich, come on in! 😛
Anyway, as a result of reworking this allocator, not only does it work with a runtime-known page size and avoid creating too many memory mappings, it also performs significantly better than before. The motivating test case for these changes was this degenerate ast-check task, with a debug compiler:
Benchmark 1 (3 runs): master/bin/zig ast-check ../lib/compiler_rt/udivmodti4_test.zig
  measurement       mean ± σ          min … max          outliers   delta
  wall_time         22.8s ± 184ms     22.6s … 22.9s      0 ( 0%)    0%
  peak_rss          58.6MB ± 77.5KB   58.5MB … 58.6MB    0 ( 0%)    0%
  cpu_cycles        38.1G ± 84.7M     38.0G … 38.2G      0 ( 0%)    0%
  instructions      27.7G ± 16.6K     27.7G … 27.7G      0 ( 0%)    0%
  cache_references  1.08G ± 4.40M     1.07G … 1.08G      0 ( 0%)    0%
  cache_misses      7.54M ± 1.39M     6.51M … 9.12M      0 ( 0%)    0%
  branch_misses     165M ± 454K       165M … 166M        0 ( 0%)    0%
Benchmark 2 (3 runs): branch/bin/zig ast-check ../lib/compiler_rt/udivmodti4_test.zig
  measurement       mean ± σ          min … max          outliers   delta
  wall_time         20.5s ± 95.8ms    20.4s … 20.6s      0 ( 0%)    ⚡- 10.1% ±  1.5%
  peak_rss          54.9MB ± 303KB    54.6MB … 55.1MB    0 ( 0%)    ⚡-  6.2% ±  0.9%
  cpu_cycles        34.8G ± 85.2M     34.7G … 34.9G      0 ( 0%)    ⚡-  8.6% ±  0.5%
  instructions      25.2G ± 2.21M     25.2G … 25.2G      0 ( 0%)    ⚡-  8.8% ±  0.0%
  cache_references  1.02G ± 195M      902M … 1.24G       0 ( 0%)    -  5.8% ± 29.0%
  cache_misses      4.57M ± 934K      3.93M … 5.64M      0 ( 0%)    ⚡- 39.4% ± 35.6%
  branch_misses     142M ± 183K       142M … 142M        0 ( 0%)    ⚡- 14.1% ±  0.5%
I didn't stop there, however. Even though I had release tasks to get back to, this left me itching to make a fast allocator - one that was designed for multi-threaded applications built in ReleaseFast mode.
It's a tricky problem. A fast allocator needs to avoid contention by storing thread-local state; however, it does not directly learn when a thread exits, so one thread must periodically attempt to reclaim another thread's resources. There is also the producer-consumer pattern to contend with: one thread only allocates while another only frees. A naive implementation would never reclaim that memory.
Inspiration struck, and 200 lines of code later I had a working implementation... after Jacob helped me find a couple logic bugs.
I created Where in the World Did Carmen's Memory Go? and used it to test a couple of specific usage patterns. The idea is to collect, over time, a robust test suite, do fuzzing, benchmarking, etc., to make it easier to try out new Allocator ideas in Zig.
After getting good scores on those contrived tests, I turned to the real world use cases of the Zig compiler itself. Since it can be built with and without libc, it's a great way to test the performance delta between the two.
Here's that same degenerate case above, but with a release build of the compiler - glibc zig vs no libc zig:
Benchmark 1 (32 runs): glibc/bin/zig ast-check ../lib/compiler_rt/udivmodti4_test.zig
  measurement       mean ± σ          min … max          outliers   delta
  wall_time         156ms ± 6.58ms    151ms … 173ms      4 (13%)    0%
  peak_rss          45.0MB ± 20.9KB   45.0MB … 45.1MB    1 ( 3%)    0%
  cpu_cycles        766M ± 10.2M      754M … 796M        0 ( 0%)    0%
  instructions      3.19G ± 12.7      3.19G … 3.19G      0 ( 0%)    0%
  cache_references  4.12M ± 498K      3.88M … 6.13M      3 ( 9%)    0%
  cache_misses      128K ± 2.42K      125K … 134K        0 ( 0%)    0%
  branch_misses     1.14M ± 215K      925K … 1.43M       0 ( 0%)    0%
Benchmark 2 (34 runs): SmpAllocator/bin/zig ast-check ../lib/compiler_rt/udivmodti4_test.zig
  measurement       mean ± σ          min … max          outliers   delta
  wall_time         149ms ± 1.87ms    146ms … 156ms      1 ( 3%)    ⚡-  4.9% ± 1.5%
  peak_rss          39.6MB ± 141KB    38.8MB … 39.6MB    2 ( 6%)    ⚡- 12.1% ± 0.1%
  cpu_cycles        750M ± 3.77M      744M … 756M        0 ( 0%)    ⚡-  2.1% ± 0.5%
  instructions      3.05G ± 11.5      3.05G … 3.05G      0 ( 0%)    ⚡-  4.5% ± 0.0%
  cache_references  2.94M ± 99.2K     2.88M … 3.36M      4 (12%)    ⚡- 28.7% ± 4.2%
  cache_misses      48.2K ± 1.07K     45.6K … 52.1K      2 ( 6%)    ⚡- 62.4% ± 0.7%
  branch_misses     890K ± 28.8K      862K … 1.02M       2 ( 6%)    ⚡- 21.8% ± 6.5%
Outperforming glibc!
And finally here's the entire compiler building itself:
Benchmark 1 (3 runs): glibc/bin/zig build -Dno-lib -p trash
  measurement       mean ± σ          min … max          outliers   delta
  wall_time         12.2s ± 99.4ms    12.1s … 12.3s      0 ( 0%)    0%
  peak_rss          975MB ± 21.7MB    951MB … 993MB      0 ( 0%)    0%
  cpu_cycles        88.7G ± 68.3M     88.7G … 88.8G      0 ( 0%)    0%
  instructions      188G ± 1.40M      188G … 188G        0 ( 0%)    0%
  cache_references  5.88G ± 33.2M     5.84G … 5.90G      0 ( 0%)    0%
  cache_misses      383M ± 2.26M      381M … 385M        0 ( 0%)    0%
  branch_misses     368M ± 1.77M      366M … 369M        0 ( 0%)    0%
Benchmark 2 (3 runs): SmpAllocator/fast/bin/zig build -Dno-lib -p trash
  measurement       mean ± σ          min … max          outliers   delta
  wall_time         12.2s ± 49.0ms    12.2s … 12.3s      0 ( 0%)    +  0.0% ± 1.5%
  peak_rss          953MB ± 3.47MB    950MB … 957MB      0 ( 0%)    -  2.2% ± 3.6%
  cpu_cycles        88.4G ± 165M      88.2G … 88.6G      0 ( 0%)    -  0.4% ± 0.3%
  instructions      181G ± 6.31M      181G … 181G        0 ( 0%)    ⚡-  3.9% ± 0.0%
  cache_references  5.48G ± 17.5M     5.46G … 5.50G      0 ( 0%)    ⚡-  6.9% ± 1.0%
  cache_misses      386M ± 1.85M      384M … 388M        0 ( 0%)    +  0.6% ± 1.2%
  branch_misses     377M ± 899K       377M … 378M        0 ( 0%)    💩+  2.6% ± 0.9%
I feel that this is a key moment in the Zig project's trajectory. This last piece of the puzzle marks the point at which the language and standard library have become strictly better to use than C and libc.
While other languages build on top of libc, Zig instead has conquered it!
LLDB Fork for Zig
Author: Alex Rønne Petersen
One of the major things Jacob has been working on is good debugging support for Zig. This includes an LLDB fork with enhancements for the Zig language, primarily intended for use with Zig's self-hosted backends. With the self-hosted x86_64 backend becoming much more usable in the upcoming 0.14.0 release, I decided to type up a wiki page with instructions for building and using the fork.
If you're already trying out Zig's self-hosted backend in your workflow, please take the LLDB fork for a spin and see how it works for you.