Managing RAM & avoiding heap fragmentation on CircuitPython

This is a general CircuitPython / MicroPython memory note (not specific to games) — it applies to any long-running program that handles large buffers: graphics surfaces, network responses, file/audio streams, data parsing, etc. The technique at the end (a pre-allocated arena) is broadly reusable.

The trap: `gc.mem_free()` lies (it’s not the largest free block)

MicroPython/CircuitPython use a non-moving mark-and-sweep GC: it frees unreachable objects but never moves live ones (objects are referenced by raw pointers, and the C stack is scanned conservatively — so relocating them safely isn’t possible). Adjacent free blocks are merged on gc.collect(), but free space split by live objects stays split.

Consequence: after a program has allocated and freed many differently-sized buffers, the heap fragments. You can have lots of total free RAM but no single contiguous block big enough for the next large allocation:

```python
gc.mem_free()       # -> 90000: 90 KB free in total...
bytearray(51200)    # ...but this raises MemoryError (no contiguous 51 KB run)
```

gc.mem_free() is total free; what a big allocation needs is the largest contiguous free block, which can be far smaller and which shrinks as a session fragments.
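To see the effect without hardware, here is a toy model in plain Python (it also runs on CPython). It is a sketch under loud assumptions: the hypothetical `ToyHeap` class is a first-fit allocator over fixed-size blocks that never moves anything, not MicroPython's real allocator. Small long-lived objects allocated between large transient buffers pin the gaps in place, so the total free count stays high while the largest free run stays small:

```python
class ToyHeap:
    """Toy non-moving heap: each block is None (free) or an allocation id.
    First-fit allocation; blocks are never moved once handed out."""

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks
        self.last_id = 0

    def alloc(self, n):
        run = 0
        for i, owner in enumerate(self.blocks):
            run = run + 1 if owner is None else 0
            if run == n:                        # first run of n free blocks
                self.last_id += 1
                for j in range(i - n + 1, i + 1):
                    self.blocks[j] = self.last_id
                return self.last_id
        raise MemoryError("no run of %d free blocks" % n)

    def free(self, obj_id):
        self.blocks = [None if b == obj_id else b for b in self.blocks]

    def total_free(self):          # the number gc.mem_free() reports
        return self.blocks.count(None)

    def largest_free_run(self):    # the number a big allocation needs
        best = run = 0
        for owner in self.blocks:
            run = run + 1 if owner is None else 0
            best = max(best, run)
        return best


heap = ToyHeap(100)
# Interleave 10-block transient buffers with 1-block long-lived objects...
pairs = [(heap.alloc(10), heap.alloc(1)) for _ in range(9)]
# ...then free only the big buffers; the small survivors stay put.
for big, _small in pairs:
    heap.free(big)

print(heap.total_free(), heap.largest_free_run())   # 91 10
try:
    heap.alloc(11)
except MemoryError as err:
    print("MemoryError:", err)     # 91 blocks free, yet no run of 11
```

Freeing more big buffers only adds more 10-block holes; as long as the small survivors cannot move, nothing larger than 10 blocks becomes available again.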

Measuring the largest contiguous block

There’s no built-in for it; binary-search it (after gc.collect()):

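```python
import gc

def largest_block():
    """Largest bytearray size (bytes) that can be allocated right now,
    found by trial allocation; a lower bound, accurate to about 256 bytes."""
    gc.collect()                   # merge adjacent free blocks first
    lo, hi = 0, gc.mem_free()      # lo is known to fit; hi is an upper bound
    while hi - lo > 256:
        m = (lo + hi) // 2
        try:
            b = bytearray(m)       # trial allocation
            del b
            lo = m                 # m bytes fit: search higher
        except MemoryError:
            hi = m                 # m bytes don't fit: search lower
        gc.collect()               # release the trial buffer before the next probe
    return lo
```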

`micropython.mem_info(1)` (after `import micropython`) dumps the full heap map, showing what's live and where, if you need to see why it's fragmented.

When it bites

Any pattern that repeatedly allocates and frees a large buffer during one run:

Networking / web: reading an HTTP response, a JSON/MQTT payload, a TLS record, an image download — each request grabbing (and freeing) a fresh kilobyte-scale buffer.
File / stream processing: reading a file in chunks, decompressing, parsing.
Audio: per-clip sample buffers.
Graphics: full-/large-screen drawing surfaces (e.g. a displayio/picogame Canvas) created per screen/level.

A single big buffer allocated once at boot and kept forever is fine (it gets a contiguous block while the heap is fresh). The problem is the churn.
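A minimal illustration of churn versus allocate-once, using `io.BytesIO` as a stand-in for a file or socket. The function names, `CHUNK` and `BUF` are hypothetical. Both functions compute the same 16-bit checksum; the first gets a brand-new `bytes` object for every chunk, the second reads every chunk into one buffer that was allocated once and kept:

```python
import io

CHUNK = 1024

def checksum_churn(stream):
    """Churn: stream.read() returns a brand-new bytes object every chunk."""
    total = 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            return total
        total = (total + sum(chunk)) & 0xFFFF

def checksum_reuse(stream, buf):
    """No churn: every chunk lands in the same caller-owned buffer."""
    view = memoryview(buf)
    total = 0
    while True:
        n = stream.readinto(buf)
        if not n:
            return total
        total = (total + sum(view[:n])) & 0xFFFF   # only the n valid bytes

data = bytes(range(256)) * 10        # 2560 bytes of sample input
BUF = bytearray(CHUNK)               # allocated once (at boot) and kept
print(checksum_churn(io.BytesIO(data)) == checksum_reuse(io.BytesIO(data), BUF))  # True
```

On a board you would allocate `BUF` at boot, next to your other long-lived buffers, and pass it to every reader that needs a chunk.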

The fix: a pre-allocated arena

Grab one big buffer once, early (when the heap is fresh and contiguous), then hand out slices of it for the large transient buffers. Those buffers are then never allocated or freed on the heap at runtime, so they can't fragment it. Reuse the same arena bytes for work that doesn't overlap in time.
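The idea fits in a few lines. This is a minimal bump-allocator sketch with a hypothetical `ByteArena` class sized in bytes; it is not the source of `lib/picogame_arena.py` (whose `Arena` takes a pixel count) and has no canvas support:

```python
class ByteArena:
    """Bump allocator over one buffer allocated up front (sized in bytes)."""

    def __init__(self, nbytes):
        self._buf = bytearray(nbytes)       # the one big heap allocation
        self._view = memoryview(self._buf)
        self._cursor = 0                    # next unused byte

    def alloc(self, nbytes):
        """Next nbytes of the arena as a memoryview. The slice object itself
        is small; its bytes come from the arena, not from a new heap block."""
        end = self._cursor + nbytes
        if end > len(self._buf):
            raise MemoryError("arena exhausted: %d requested, %d left"
                              % (nbytes, len(self._buf) - self._cursor))
        view = self._view[self._cursor:end]
        self._cursor = end
        return view

    def reset(self):
        """Rewind the cursor. Every earlier slice will be handed out again."""
        self._cursor = 0


arena = ByteArena(64)
first = arena.alloc(16)
second = arena.alloc(48)      # arena now full
first[0] = 1
arena.reset()                 # start the next, non-overlapping use
third = arena.alloc(16)
print(third[0])               # 1: third reuses first's bytes
```

The last line shows the rule behind `reset()`: after a rewind, old slices alias new ones, so only rewind between uses that never overlap in time.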

lib/picogame_arena.py is a tiny, general implementation (it’s in the picogame lib but the Arena class is not game-specific):

```python
import picogame_arena

# --- networking example: reuse ONE response buffer instead of churning ---
# Arena(pixels) allocates pixels * 2 bytes, so 2048 pixels = 4096 bytes.
AR = picogame_arena.Arena(2048)        # grabbed once, up front (size = your max)
while True:                            # sock / process: your socket and parser
    AR.reset()                         # reuse the same bytes each request
    buf = AR.alloc(4096)               # memoryview slice of the arena, no big alloc
    n = sock.recv_into(buf)            # read straight into the arena slice
    if n == 0:                         # peer closed the connection
        break
    process(buf[:n])                   # parse without allocating another big buffer

# --- graphics example (picogame): back big Canvases with arena memory ---
AR = picogame_arena.Arena(320 * 80)    # 25600 pixels = 51200 bytes: the largest screen
AR.reset(); road = AR.canvas(320, 80)  # one screen's big surface
# later, a different screen (not alive at the same time) reuses the same arena;
# its canvases coexist, so they share it: 320*44 + 160*48 = 21760 pixels.
# Don't touch `road` after this reset: its bytes now belong to `shapes`.
AR.reset(); shapes = AR.canvas(320, 44); btn = AR.canvas(160, 48)
```

API:
Arena(pixels): allocates pixels * 2 bytes (2 bytes per pixel) up front.
alloc(nbytes) -> memoryview: a slice of the arena.
canvas(w, h, transparent=None) -> Canvas: a Canvas backed by arena memory; needs firmware whose Canvas accepts the buffer= argument.
reset(): rewinds the cursor; call it at the start of each non-overlapping use.
free()
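Sizing rule, as a worked example: canvases on the same screen are alive together, so their sizes add; different screens never overlap, so the arena needs only the largest per-screen total. This is a sketch with a hypothetical helper `arena_pixels`, assuming the arena packs slices back to back with no padding (check `picogame_arena` for any alignment it adds):

```python
BYTES_PER_PIXEL = 2   # Arena(pixels) allocates pixels * 2 bytes

def arena_pixels(screens):
    """screens: one list of (w, h) canvas sizes per screen. Canvases on the
    same screen add up; only the largest screen total has to fit."""
    return max(sum(w * h for w, h in canvases) for canvases in screens)

screens = [
    [(320, 80)],               # road screen
    [(320, 44), (160, 48)],    # shapes + button screen
]
px = arena_pixels(screens)
print(px, px * BYTES_PER_PIXEL)   # 25600 51200
```

For the graphics example above that is 25600 pixels, i.e. 51200 bytes, which is exactly why the road screen sets the arena size.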

Key point: the arena makes the big allocation happen once, at startup, and handing out slices never asks the heap for another large block, so a session can run indefinitely without the “90 KB free but can’t allocate 51 KB” failure.

Other techniques (combine as needed)

Allocate big/long-lived buffers first, at boot, and keep them — don’t free and re-create them per iteration.
Object pools for many small same-size objects (e.g. sprites, requests) — reuse instead of alloc/free churn. (picogame: picogame_pool.)
recv_into / readinto (and similar *_into APIs) read into an existing buffer instead of allocating a new bytes object each call.
gc.collect() at natural boundaries (end of a request/level) to merge adjacent free blocks — necessary but not sufficient (it can’t move live objects).
gc.threshold(n) to also trigger a collection every n bytes allocated, reclaiming garbage earlier (at some CPU cost) instead of only when an allocation fails.
CircuitPython already relocates import-time “long-lived” objects to the end of the heap on the first GC, keeping the low heap contiguous for working allocations — so importing your modules up front (not lazily, mid-run) helps.
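A fixed-size object pool in the same spirit, as a sketch (hypothetical `Pool` and `Bullet` classes, not the `picogame_pool` API). It keeps free objects in a preallocated list with a stack index rather than using `pop()`/`append()`, since changing the list's length can make the interpreter resize its storage:

```python
class Pool:
    """Fixed-size object pool: every object is created up front, then recycled."""

    def __init__(self, factory, size):
        self._items = [factory() for _ in range(size)]   # allocated once
        self._top = size          # self._items[:self._top] are free objects

    def acquire(self):
        if self._top == 0:
            raise RuntimeError("pool exhausted")
        self._top -= 1
        return self._items[self._top]

    def release(self, obj):
        # Caller must release each acquired object exactly once.
        self._items[self._top] = obj
        self._top += 1

    def available(self):
        return self._top


class Bullet:
    def __init__(self):
        self.x = self.y = 0
        self.alive = False


bullets = Pool(Bullet, 8)             # at boot
shot = bullets.acquire()              # per frame: no allocation
shot.x, shot.y, shot.alive = 10, 20, True
shot.alive = False
bullets.release(shot)                 # back to the pool, not to the GC
again = bullets.acquire()             # caller resets fields before reuse
print(again is shot)                  # True: the same object, recycled
```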

Why not just defragment?

A true compacting/defragmenting GC isn’t feasible as an add-on: MicroPython objects reference each other by raw pointers (in Python, in C modules, in bytecode), and the GC scans the C stack conservatively — so it cannot safely move an object and rewrite every reference to it. That would require a different (precise / handle-based) object model in the VM core. The arena pattern is the practical answer: don’t let the big buffers churn in the first place.
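To make the handle idea concrete, here is a toy compacting heap in plain Python (hypothetical `HandleHeap`; a sketch, not a real VM design). Callers hold handles, which are keys into a table, never raw offsets; `compact()` slides live data down and patches only the table, so every handle stays valid. With raw offsets, any copy of an old offset held elsewhere (in another object, or in a C local on the stack) would silently point at the wrong bytes after the move:

```python
class HandleHeap:
    """Toy compacting heap. Callers hold handles (keys into a table), never
    raw offsets, so compact() may move data freely."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.table = {}           # handle -> (offset, length)
        self.top = 0              # bump pointer: mem[top:] is free
        self.last_handle = 0

    def alloc(self, data):
        n = len(data)
        if self.top + n > len(self.mem):
            raise MemoryError("heap full")
        self.mem[self.top:self.top + n] = data
        self.last_handle += 1
        self.table[self.last_handle] = (self.top, n)
        self.top += n
        return self.last_handle

    def free(self, handle):
        del self.table[handle]    # leaves a hole; only compact() reclaims it

    def get(self, handle):
        off, n = self.table[handle]
        return bytes(self.mem[off:off + n])

    def compact(self):
        """Slide live blocks down over the holes, then patch the table.
        Handles are untouched, so every caller's reference stays valid."""
        dst = 0
        by_offset = sorted(self.table.items(), key=lambda kv: kv[1][0])
        for handle, (off, n) in by_offset:
            self.mem[dst:dst + n] = self.mem[off:off + n]   # RHS is a copy
            self.table[handle] = (dst, n)
            dst += n
        self.top = dst


hheap = HandleHeap(12)
ha = hheap.alloc(b"AAAA")
hb = hheap.alloc(b"BBBB")
hc = hheap.alloc(b"CCCC")     # heap is now full
hheap.free(hb)                # 4-byte hole in the middle
hheap.compact()               # hc's data moves from offset 8 to offset 4
hd = hheap.alloc(b"DDDD")     # fits at the top again
print(hheap.get(hc), hheap.get(hd))   # b'CCCC' b'DDDD'
```

Without the `compact()` call, the final `alloc` would raise `MemoryError`: 4 bytes are free, but they are the hole left by `hb`, and this bump allocator only allocates at `top`.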

See also the engine’s Canvas(..., buffer=) argument (back a drawing surface with arena memory) and the helper picogame_pool (object pools). Build and measure in the desktop simulator first; optimise only once you’ve measured where the RAM actually goes.