KernRift | Bare-Metal Systems Language

KernRift is a systems language with a self-hosting compiler written entirely in KernRift: ~226K tokens of source across 19 files, plus an 18-module standard library. No Rust, no C, no LLVM, no external assembler; the compiler emits raw machine bytes directly.

The compiler ships with an SSA-based IR backend. The AST lowers to a target-independent intermediate representation, liveness analysis runs, and a graph-coloring register allocator with Briggs/George copy coalescing assigns physical registers. The optimizer performs constant folding, DCE, CSE, and LICM, alongside an AST-level function inliner with cost-aware rotation-shape detection. Dedicated emitters then produce native x86_64 and AArch64 machine code.

By default, krc produces BCJ+LZ-Rift-compressed fat binaries containing 8 platform slices: Linux, Windows, macOS, and Android, each with x86_64 and AArch64 native code. The compiler self-hosts on all 8 targets; CI verifies the bootstrap fixed point and runs 448 tests on every push.

Recent work: v2.8.25+ added eight codegen peepholes (LEA-immediate, LEA[base+idx×scale], 3-operand IMUL, CMP-with-immediate, 32-bit ROR pattern recognition, w32-clean mask elimination, FFMA fusion, and a cost-aware rotation-shape inliner) that cut sort runtime by 29% and sha-256 runtime by 29% vs v2.8.24, and shrank the self-host binary by 10%. Earlier, v2.8.23 fixed three IR-emitter memory leaks, dropping peak self-compile RSS by 96-99% (single-arch 806 MB → 33 MB; fat 6.3 GB → 87 MB) and making fat self-compile feasible on a 4 GB Raspberry Pi 400.

Kernel-first primitives include device blocks for typed MMIO, load/store/vload/vstore pointer builtins, slice parameters [T] name with .len, static and struct arrays, inline assembly, signed comparisons, atomic operations, bitfield operations, and --freestanding mode.

Self-Hosting Compiler

KernRift compiles itself. The compiler is ~226K tokens of KernRift across 18 source files, plus 18 stdlib modules. It achieves a bootstrap fixed point — compiling itself twice produces bit-identical binaries, verified on all 8 platform targets (Linux x86_64/ARM64, macOS ARM64/x86_64, Windows x86_64/ARM64, Android ARM64/x86_64) with 448 tests passing on every CI run. No LLVM, no Rust, no C, no external assembler — just KernRift all the way down.

Stage	Input	Output	Verified
krc → krc2	~45,000 lines (.kr)	~1.10 MB native ELF (IR, default) / ~1.19 MB (legacy)	✓
krc2 → krc3	~45,000 lines (.kr)	identical output to krc2	✓
krc3 → krc4	~45,000 lines (.kr)	identical output to krc3	krc3 == krc4 ✓
Self-compile time	~45,000 lines	~180ms (legacy) / ~1.05s (IR + optimizer, default)	~8.2s fat binary (8 slices, 3.81 MB) ✓
Cross-compile	x86_64 host	ARM64 + Windows PE + macOS Mach-O	Tested on all platforms ✓
Fat binary	any .kr file	.krbo (8 slices, BCJ+LZ-Rift)	Default output ✓
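
The fixed-point rows above amount to a byte-for-byte comparison of successive compiler stages. A minimal Python sketch of such a check, assuming the stage binaries have already been built and saved under hypothetical file names such as krc3 and krc4 (the project's actual CI script is not shown on this page):

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_fixed_point(stage_a, stage_b):
    """True when two compiler stage binaries are bit-identical."""
    return sha256_of(stage_a) == sha256_of(stage_b)


# Example call (hypothetical paths): is_fixed_point("krc3", "krc4")
```

Comparing digests rather than file sizes matters here: two stages can have the same length and still differ in a single emitted instruction.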

Benchmarks vs. C and Rust

KernRift produces the smallest binaries in every benchmark below and compiles them roughly 14× to 158× faster than gcc and rustc, going by the compile-time table. Runtime is competitive with unoptimized C. All measurements on AMD Ryzen 9 7900X, Linux 6.17, gcc 13.3, rustc 1.93.

Compile Time

Benchmark	krc	gcc -O0	gcc -O2	rustc	rustc -O2
Fibonacci (recursive, fib(40))	1ms	68ms	44ms	158ms	79ms
Sort (bubble 10K ints)	2ms	32ms	32ms	115ms	94ms
Sieve (primes to 10⁶)	2ms	27ms	30ms	77ms	89ms
Matrix Multiply (200×200 int)	2ms	32ms	31ms	71ms	85ms
Self-compile krc (legacy / IR default)	~180ms / ~1.05s	N/A
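
The compile-time ratios can be recomputed directly from the table. The Python sketch below copies the four benchmark rows and divides each gcc/rustc time by the krc time; the dictionary layout and function name are illustrative only. Because krc's times are reported at 1-2 ms resolution, the resulting ratios are coarse.

```python
# Compile times in milliseconds, copied from the Compile Time table.
COMPILE_MS = {
    "Fibonacci": {"krc": 1, "gcc -O0": 68, "gcc -O2": 44, "rustc": 158, "rustc -O2": 79},
    "Sort": {"krc": 2, "gcc -O0": 32, "gcc -O2": 32, "rustc": 115, "rustc -O2": 94},
    "Sieve": {"krc": 2, "gcc -O0": 27, "gcc -O2": 30, "rustc": 77, "rustc -O2": 89},
    "Matrix Multiply": {"krc": 2, "gcc -O0": 32, "gcc -O2": 31, "rustc": 71, "rustc -O2": 85},
}


def speedups(table, baseline="krc"):
    """Return {(benchmark, compiler): compiler_time / baseline_time}."""
    ratios = {}
    for bench, times in table.items():
        for compiler, ms in times.items():
            if compiler != baseline:
                ratios[(bench, compiler)] = ms / times[baseline]
    return ratios


ratios = speedups(COMPILE_MS)
largest = max(ratios, key=ratios.get)
smallest = min(ratios, key=ratios.get)
print(f"largest ratio:  {ratios[largest]:.1f}x {largest}")
print(f"smallest ratio: {ratios[smallest]:.1f}x {smallest}")
```

With the table's numbers, the largest ratio is 158× (Fibonacci, rustc) and the smallest is 13.5× (Sieve, gcc -O0).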

Binary Size

Benchmark	krc	gcc -O0	gcc -O2	rustc	rustc -O2
Fibonacci	176 B	15.8 KB	15.8 KB	3.9 MB	3.9 MB
Sort (bubble)	352 B	16.0 KB	16.0 KB	3.9 MB	3.9 MB
Sieve	344 B	16.0 KB	16.0 KB	3.9 MB	3.9 MB
Matrix Multiply	992 B	16.0 KB	16.0 KB	3.9 MB	3.9 MB
Self-compiled compiler (x86_64 IR default / legacy)	~1.10 MB / ~1.19 MB	N/A
Self-compiled fat binary (all 8 platforms)	~3.81 MB	BCJ + LZ-Rift compressed

Runtime (median of 3)

Benchmark	krc	gcc -O0	gcc -O2	rustc	rustc -O2
Fibonacci fib(40)	416ms	385ms	79ms	387ms	165ms
Sort (bubble 10K)	79ms	155ms	274ms	2,657ms	45ms
Sieve (primes to 10⁶)	3ms	4ms	2ms	21ms	2ms
Matrix 200×200	26ms	16ms	4ms	129ms	3ms

krc's SSA IR backend uses a graph-coloring register allocator (6 colors on x86_64, 10 on AArch64) with Briggs/George copy coalescing, a partial used-callee-save prologue, and a cross-register spill-reload peephole. The optimizer adds constant folding, DCE, CSE, and LICM, plus an AST-level function inliner with cost-aware rotation-shape detection. Codegen peepholes cover LEA-immediate, LEA[base+idx×scale], 3-operand IMUL, CMP-with-immediate, 32-bit ROR recognition, and FFMA fusion. There is no auto-vectorization. The v2.8.25+ codegen peepholes cut sort runtime by 29% and sha-256 runtime by 29% vs v2.8.24. Compile time and binary size are where krc excels; runtime is competitive with unoptimized C.

Self-Compilation Across Platforms

Self-compile = krc rebuilding its own ~226K-token source. Single-arch produces a native binary for the host architecture; fat binary bundles all 8 platform slices into one BCJ+LZ-Rift-compressed .krbo. v2.8.23 fixed three per-function memory leaks in the IR emitter and made the liveness scratch buffer reusable across basic blocks, dropping peak RSS ~96-99% vs v2.8.22 and making fat self-compile feasible on a 4 GB Raspberry Pi 400 (was OOM-bound). Peak RSS is the high-water mark of the compiler process; the resident set never grows beyond it during the run.

Platform / Device	CPU / RAM	Single Arch (time / peak RSS)	Fat Binary (time / peak RSS)
Linux x86_64 (desktop)	AMD Ryzen 9 7900X (12c/24t, 32 GB)	1.06 s / 33 MB	8.4 s / 87 MB
Windows 11 x86_64 (laptop)	Intel Core Ultra 9 275HX (24c, 64 GB)	2.04 s / 38 MB	15.2 s / 91 MB
Linux ARM64 (Raspberry Pi 400)	Cortex-A72 @ 1.8 GHz (4c, 4 GB)	23.8 s / 33 MB	3 min 11 s / 87 MB
Android ARM64 (Redmi Note 8 Pro)	MediaTek Helio G90T (2× A76 + 6× A55, 6 GB)	19.5 s / 33 MB	2 min 35 s / 80 MB
Android ARM64 (Galaxy Z Fold 5)	Snapdragon 8 Gen 2 (1×X3 + 4×A715/A710 + 3×A510, 12 GB)	6.84 s / 33 MB	52.9 s / 80 MB
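
The "~96-99%" peak-RSS reduction can be checked against the before/after figures quoted in the introduction (single-arch 806 MB → 33 MB; fat 6.3 GB → 87 MB). A small Python worked example, assuming decimal units (6.3 GB taken as 6300 MB):

```python
def reduction_percent(before_mb, after_mb):
    """Percentage by which peak RSS fell from before_mb to after_mb."""
    return (1 - after_mb / before_mb) * 100


# Peak RSS before (v2.8.22) and after (v2.8.23), in MB.
single_arch = reduction_percent(806, 33)
fat_binary = reduction_percent(6300, 87)  # assumes 6.3 GB = 6300 MB
print(f"single-arch: {single_arch:.1f}%  fat: {fat_binary:.1f}%")
# prints "single-arch: 95.9%  fat: 98.6%"
```

Both values fall inside the stated 96-99% range once rounded to whole percentages.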

How Other Compilers Compare (Self-Build)

Compiler	Self-Build Time	Binary Size	Source
KernRift krc (IR + optimizer, default)	1.05s	~1.10 MB	measured (Ryzen 9 7900X)
KernRift krc (legacy, --legacy flag)	180ms	~1.19 MB	measured (Ryzen 9 7900X)
TCC	<1s (est.)	~100 KB	bellard.org/tcc
Go toolchain	~1-3 min	~50 MB	go.dev/rebuild
LLVM + Clang	~4-5 min	~50 MB	OpenBenchmarking (Ryzen 9 7950X)
rustc (stage 2)	~6-8 min	~80 MB	dtolnay/buck2-rustc-bootstrap
GCC (3-stage)	~20-90 min	~30 MB (cc1)	OpenBenchmarking (220+ runs avg)

krc is not directly comparable to production compilers — it has an SSA IR with constant-folding / DCE / CSE / LICM, an AST inliner with rotation-shape cost model, codegen peepholes (LEA-imm, LEA[base+idx×scale], 3-operand IMUL, CMP-imm, 32-bit ROR, w32-clean mask elim, FFMA fusion), and a graph-coloring register allocator with Briggs/George copy coalescing, but no auto-vectorizer and no external linker in the loop. The comparison shows where a short, self-contained compiler sits on the build-time spectrum. TCC is the closest analog. External data is from public benchmarks on comparable hardware; see linked sources.

~45,000 lines, ~1.10 MB self-compiled binary. 448 tests. Bootstrap fixed point verified on all 8 targets (Linux, macOS, Windows, Android × x86_64, ARM64). Fat binary: 8 slices, BCJ+LZ-Rift compressed.

Self-Hosting

The compiler is written in KernRift and compiles itself. After a one-time bootstrap from the Rust bootstrap compiler, krc is fully self-sustaining. No external toolchain needed.

Universal Fat Binaries

By default, krc bundles 8 platform slices (Linux, Windows, macOS, Android × x86_64 + ARM64) into a single BCJ+LZ-Rift-compressed .krbo file. LZ-Rift compression uses 24-bit offsets, 65K hash tables, and lazy matching for arch-pair blobs. Use --arch=x86_64 or --arch=arm64 for a single-architecture native binary.
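
BCJ (branch/call/jump) filtering is a general preprocessing step used ahead of LZ-style compressors: relative call targets are rewritten as absolute addresses, so repeated calls to the same function become identical byte sequences that the match finder can reuse. KernRift's own filter is not described on this page; the Python sketch below is the textbook x86 variant for the E8 (CALL rel32) opcode, shown only to illustrate the idea. The function name and the assumption that the code starts at address 0 are illustrative.

```python
import struct


def bcj_x86(data, encode=True):
    """Rewrite CALL rel32 operands as absolute addresses (or back)."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:  # x86 CALL rel32 opcode
            (operand,) = struct.unpack_from("<I", out, i + 1)
            next_ip = i + 5  # the displacement is relative to the next instruction
            if encode:
                operand = (operand + next_ip) & 0xFFFFFFFF
            else:
                operand = (operand - next_ip) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, operand)
            i += 5  # skip the operand that was just rewritten
        else:
            i += 1
    return bytes(out)


# Two calls to the same target (address 0x100) from different positions.
code = (b"\xE8" + struct.pack("<I", 0x100 - 5) + b"\x90\x90\x90"
        + b"\xE8" + struct.pack("<I", 0x100 - 13))
filtered = bcj_x86(code, encode=True)
restored = bcj_x86(filtered, encode=False)
```

After filtering, both call instructions carry the same five bytes, and decoding restores the original code exactly because the encoder and decoder visit the same opcode positions.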

Device Blocks for MMIO

device UART0 at 0x3F201000 { Data at 0x00 : u32 } declares a hardware register set. Reads and writes to UART0.Data compile directly to volatile loads and stores with the right width, plus the appropriate memory barrier — mfence on x86_64, DSB SY on ARM64.

Clean Pointer Builtins

load8/16/32/64(addr) and store8/16/32/64(addr, val) replace the verbose unsafe { *(addr as uint32) = val } form. Volatile variants vload*/vstore* add memory barriers for MMIO. Same codegen, much cleaner to read.
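
To make the width semantics concrete, here is a Python model of the load/store family over a byte buffer. It is a sketch under stated assumptions, not KernRift's implementation: memory is taken to be little-endian, stores are assumed to keep only the low bits of the value, and loads are assumed to zero-extend. The generic load/store helpers with a width argument are hypothetical stand-ins for the fixed-width builtins.

```python
import struct

# struct format codes for little-endian unsigned integers of each width.
_FMT = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}


def store(mem, addr, width, value):
    """Model storeN(addr, val): write the low `width` bits of value at addr."""
    struct.pack_into(_FMT[width], mem, addr, value & ((1 << width) - 1))


def load(mem, addr, width):
    """Model loadN(addr): read a `width`-bit unsigned value (zero-extended)."""
    return struct.unpack_from(_FMT[width], mem, addr)[0]


mem = bytearray(16)            # stands in for a small region of RAM
store(mem, 0, 32, 0xDEADBEEF)  # like store32(addr, 0xDEADBEEF)
word = load(mem, 0, 32)        # like load32(addr)
low_byte = load(mem, 0, 8)     # like load8(addr): lowest-addressed byte
```

Under the little-endian assumption the byte at the lowest address is the least significant one, so the 8-bit load of the same address yields 0xEF.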

Slice Parameters

fn foo([u8] data) takes a fat pointer (ptr, len). Inside the function, data.len reads the length and data is a plain pointer for indexing. Callers pass two arguments. Classic C (ptr, len) idiom with a nicer symbolic name.

Static & Struct Arrays

static u8[4096] page gives you a zero-initialized data-section buffer. Point[10] pts gives you a fixed array of struct instances with full pts[i].field syntax. Both work locally and at module scope.

Zero Dependencies

The compiler is a single static binary. It produces native ELF, PE, and Mach-O executables without cc, ld, ar, or any external tool. Each output binary is fully static — no libc, no dynamic linking, no runtime.

Cross-Platform Output

8 platform targets from any host: ELF (Linux + Android), PE/COFF (Windows), and Mach-O (macOS), each for x86_64 and AArch64. On Windows, install.ps1 sets up the toolchain and kr.exe runs fat binaries natively. Also supported: KrboFat containers, AR archives, and KRBO portable objects.

Standard Library

18 modules (~4,100 lines) covering strings, I/O, math, formatting, memory management (bump-allocated arenas with guard pages, fixed-size pools with double-free detection — std/alloc.kr), dynamic arrays, hash maps, colors, fixed-point arithmetic, fast memory operations, framebuffer graphics, font rendering, UI widgets, time/clock access, structured logging, floating-point math, and raw socket networking. Import with import "std/string.kr" — the compiler resolves stdlib paths automatically via ~/.local/share/kernrift/.

VS Code LSP

First-class editor support via the KernRift VS Code extension (v0.2.3). Includes syntax highlighting, diagnostics from krc check, completions, hover documentation, and go-to-definition. Available on the VS Code Marketplace.

Import System

import "file.kr" brings in declarations from other source files with recursive dependency resolution and stdlib search paths. No more concatenation — the compiler resolves the dependency graph automatically.

Match Statements

match expr { val => { body } ... } for clean multi-way branching. Combined with enums, match provides exhaustive pattern handling for state machines and dispatch logic.

Methods & Short Aliases

fn Point.sum(Point self) attaches a method to a struct; call it with p.sum(). Short type aliases u8/u16/u32/u64 and i8..i64 are synonyms for the long forms. Nested struct access (a.b.c) works naturally.

Inline Assembly

asm("cli") or asm { "cli"; "sti" } emits raw machine instructions. Supports x86_64 privileged instructions (cr0/cr3, lgdt, lidt, wrmsr, cpuid, in/out) and ARM64 system instructions (msr, mrs, svc, wfi, dsb, dmb). Raw hex bytes for anything else.

Kernel Annotations

@naked functions skip prologue/epilogue — pure assembly bodies for ISR entry points. @noreturn marks diverging functions. @packed structs for hardware register layouts. @section(".text.init") for linker section placement.

Bitfield & Signed Ops

bit_get, bit_set, bit_clear, bit_range, bit_insert for hardware register manipulation. signed_lt/gt/le/ge for signed comparisons (the default <, >, <=, >= operators are unsigned). Stack size warnings at compile time.
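
Because the default comparison operators are unsigned, a value such as -1 (all 64 bits set) compares as the largest possible number unless a signed helper is used. The Python model below illustrates that difference and the single-bit helpers. The (value, bit) argument order follows the bit_set(cr0_val, 31) call in the kernel.kr example and is assumed to carry over to bit_get and bit_clear; bit_range and bit_insert are left out because their argument order is not given here.

```python
MASK64 = (1 << 64) - 1  # all values are modelled as 64-bit patterns


def bit_get(value, bit):
    """Return bit number `bit` of value as 0 or 1."""
    return (value >> bit) & 1


def bit_set(value, bit):
    """Return value with bit number `bit` set."""
    return (value | (1 << bit)) & MASK64


def bit_clear(value, bit):
    """Return value with bit number `bit` cleared."""
    return value & ~(1 << bit) & MASK64


def to_signed(value):
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def signed_lt(a, b):
    """Signed less-than on two 64-bit patterns."""
    return to_signed(a) < to_signed(b)


minus_one = MASK64                 # bit pattern of -1
unsigned_result = minus_one < 1    # False: unsigned view sees 2**64 - 1
signed_result = signed_lt(minus_one, 1)  # True: signed view sees -1
cr0 = bit_set(bit_set(0, 31), 0)   # PG (bit 31) and PE (bit 0), as in enable_paging
```

The same reinterpretation explains why clamp_signed in kernel.kr takes uint64 parameters: the bits are unchanged, and only the comparison decides whether they are read as signed.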

Freestanding Mode

krc --freestanding disables the startup trampoline, auto-exit insertion, and OS-specific syscall wrappers — producing bare-metal code ready for kernel entry points, bootloaders, and embedded firmware.

Atomic Operations

Lock-free primitives for concurrent data structures: atomic_load, atomic_store, atomic_cas (compare-and-swap), atomic_add, atomic_sub, atomic_and, atomic_or, and atomic_xor. Compiled to native LOCK-prefixed instructions on x86_64 and LDXR/STXR exclusive pairs on ARM64.

Floating-Point & Multi-Return

f32 and f64 types with full arithmetic, comparisons, conversions, and a math library (sin, cos, exp, log, pow, sqrt, fmt_f64). f16 for storage. Hardware sqrt, software trig/exp/log. Multi-return with return (a, b) and 2-tuple destructuring (u64 q, u64 r) = divmod(17, 5). Inline asm I/O constraints: asm { "rdtsc" } out(rax -> lo, rdx -> hi).

Framebuffer & UI

Stdlib modules for bare-metal graphics: std/fb.kr for framebuffer pixel, line, and rectangle drawing; std/font.kr for bitmap font rendering; std/widget.kr for panels, labels, buttons, progress bars, and text fields.

Quickstart

No dependencies required. The compiler is a single static binary. No Rust, no C compiler, no linker needed.

Install

# Linux / macOS (installs krc, kr, and stdlib to ~/.local/)
curl -sSf https://raw.githubusercontent.com/Rift-Intelligence/KernRift/main/install.sh | sh

# Windows PowerShell (installs krc.exe, kr.exe, and stdlib to %LOCALAPPDATA%\KernRift\)
irm https://raw.githubusercontent.com/Rift-Intelligence/KernRift/main/install.ps1 | iex

# Homebrew (macOS / Linux)
brew install kernrift

# Scoop (Windows)
scoop bucket add kernrift https://github.com/Rift-Intelligence/KernRift
scoop install kernrift

# Winget (Windows)
winget install Pantelis23.KernRift

# AUR (Arch Linux)
yay -S kernrift

# Or download directly (x86_64 / ARM64)
curl -L -o krc https://github.com/Rift-Intelligence/KernRift/releases/latest/download/krc-linux-x86_64
curl -L -o kr  https://github.com/Rift-Intelligence/KernRift/releases/latest/download/kr
chmod +x krc kr && sudo mv krc kr /usr/local/bin/

fibonacci.kr — Recursive Functions

fn fib(uint64 n) -> uint64 {
    if n <= 1 { return n }
    return fib(n - 1) + fib(n - 2)
}

struct Point {
    uint64 x
    uint64 y
}

static uint64 counter = 0

fn main() {
    Point p
    p.x = fib(10)     // 55
    p.y = 42
    uint64 msg = "KernRift!\n"
    write(1, msg, 10)
    exit(p.x + p.y)   // 97
}

memory.kr — Unsafe Pointer Operations

// Direct memory access for kernel development
fn main() {
    uint64 buf = alloc(4096)

    // Write a uint32 to memory
    unsafe { *(buf as uint32) = 0xDEADBEEF }

    // Read it back
    uint32 val = 0
    unsafe { *(buf as uint32) -> val }

    // Array operations
    uint8[256] table
    uint64 i = 0
    while i < 256 {
        table[i] = i
        i += 1
    }
    exit(table[42])  // 42
}

kernel.kr — Inline Assembly & Kernel Features

// Naked ISR entry — no compiler-generated prologue
@naked fn isr_timer() {
    asm { "cli"; "nop"; "sti"; "iretq" }
}

// Hardware register manipulation with bitfields
fn enable_paging(uint64 cr0_val) -> uint64 {
    uint64 pg = bit_set(cr0_val, 31)   // set PG bit
    uint64 pe = bit_set(pg, 0)          // set PE bit
    return pe
}

// Signed comparisons for kernel math
fn clamp_signed(uint64 val, uint64 lo, uint64 hi) -> uint64 {
    if signed_lt(val, lo) { return lo }
    if signed_gt(val, hi) { return hi }
    return val
}

Compile & Run

# Compile to fat binary (8 platform slices, BCJ+LZ-Rift-compressed)
$ krc program.kr -o program.krbo

# Run on any platform
$ kr program.krbo

# Or compile for a single architecture and run directly
$ krc --arch=x86_64 program.kr -o program
$ ./program

# Safety analysis
$ krc check module.kr

# Living compiler — patterns + fitness score
$ krc lc program.kr
=== KernRift Living Compiler Report ===
Functions: 12
Calls: 45
Unsafe ops: 3
Patterns found: 2
Fitness score: 85/100

# The compiler compiles itself
$ krc --arch=x86_64 krc-source.kr -o krc