Modern C++ Firmware: Proven Strategies for Tiny, Critical Systems (Part 9/10)

Observability Belongs on the PC, Not in the Production Binary

Part 7 covered host-first testing. Part 8 added hardware-in-the-loop testing with an IoTest image and a Python harness. Part 9 is about what you do when the system is running and you need to understand behavior without turning the firmware into a logging framework.

The guiding principle is simple:

  • Keep target observability minimal and deterministic.
  • Move heavy analysis, visualization, and introspection to the host.

This avoids firmware bloat, keeps timing predictable, and makes debugging better rather than noisier.

The boundary: on-target telemetry vs off-target analysis

Most embedded observability failures come from mixing these concerns:

  • On-target code tries to format rich logs, allocate strings, and emit verbose traces.
  • Those logs change timing, overflow buffers, and create new failure modes.
  • Developers then debug the logging system instead of the firmware.

A better split:

  • On target: emit small, fixed-format events and counters.
  • Off target: decode, correlate, visualize, and analyze.

The target should produce data. The host should produce insight.

What “minimal” looks like on the target

Minimal does not mean “no observability.” It means “observability that cannot break determinism.”

A good target-side observability set:

  • Counters
    • loop slip count
    • queue overflow counts
    • parser error counts
    • watchdog resets, brownout events
  • State snapshots
    • current mode/state id
    • last fault code
    • a small set of key inputs and outputs
  • Event stream (optional)
    • fixed-size event records in a ring buffer
    • drained periodically, not emitted from ISRs unless absolutely necessary
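The counter and snapshot items above can be sketched as plain structs. This is a hypothetical layout (the field names and widths are illustrative, not from any standard): all fields are fixed-width integers, so the host can read the whole block in one shot with no parsing ambiguity.

```cpp
#include <cstdint>

// Illustrative sketch: a fixed-layout health block. No strings, no heap,
// no formatting on target -- just integers the host can decode.
struct HealthCounters final {
    std::uint32_t loop_slips{0U};
    std::uint32_t queue_overflows{0U};
    std::uint32_t parser_errors{0U};
    std::uint16_t watchdog_resets{0U};
    std::uint16_t brownout_events{0U};
};

struct StateSnapshot final {
    std::uint16_t mode_id{0U};
    std::uint16_t last_fault_code{0U};
    std::int32_t  key_input{0};
    std::int32_t  key_output{0};
};
```

Because the layout is fixed, "reading the snapshot" costs the same every time, whether the system is healthy or on fire.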

Avoid:

  • dynamic formatting
  • iostreams
  • variable-length strings
  • “log everything” builds shipped as production candidates

Unsolicited advice: if your observability changes the system’s behavior, it is not observability. It is a new subsystem.

Use fixed-size event records

If you need traces, use fixed-size records so storage and bandwidth are predictable.

A typical record:

  • timestamp or tick count
  • event id
  • a small number of integral parameters

Keep it boring. Boring is debuggable.

One tight C++ example:

#include <array>
#include <cstdint>

struct TraceEvent final {
    std::uint32_t ticks{0U};
    std::uint16_t id{0U};
    std::int32_t  a{0};
    std::int32_t  b{0};
};

template <std::size_t N>
class TraceBuffer final {
public:
    void push(const TraceEvent& e) noexcept {
        // When full, the oldest record is overwritten; the loss is made
        // visible to the host through drop_count_.
        this->buf_[this->write_] = e;
        this->write_ = (this->write_ + 1U) % N;
        if (this->count_ < N) {
            ++this->count_;
        } else {
            ++this->drop_count_;
        }
    }

    [[nodiscard]] std::uint32_t drop_count() const noexcept { return this->drop_count_; }

private:
    std::array<TraceEvent, N> buf_{};
    std::size_t write_{0U};
    std::size_t count_{0U};
    std::uint32_t drop_count_{0U};
};

Notes:

  • This is deterministic: fixed memory, fixed record size, explicit drop behavior.
  • You can flush it on demand via a command or periodically in a non-hot path.

If you are using ETL for target containers, the same pattern applies. The principle is fixed capacity and explicit overflow behavior.
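Flushing "on demand or in a non-hot path" can take the shape below. This is a sketch, not the article's TraceBuffer: it adds a read index so records can be drained oldest-first, and `send_record` is a hypothetical stand-in for whatever transport you use (UART DMA, RTT, a command response).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch: a drainable ring buffer. Field names and the Sink callback are
// illustrative assumptions, not a specific library API.
struct Event final {
    std::uint32_t ticks{0U};
    std::uint16_t id{0U};
};

template <std::size_t N>
class DrainableRing final {
public:
    void push(const Event& e) noexcept {
        buf_[write_] = e;
        write_ = (write_ + 1U) % N;
        if (count_ < N) { ++count_; } else { read_ = (read_ + 1U) % N; }
    }

    // Hands up to `max` oldest records to `send_record`; returns how many
    // were drained. Called from a background/idle context, never an ISR.
    template <typename Sink>
    std::size_t drain(Sink&& send_record, std::size_t max) noexcept {
        std::size_t sent = 0U;
        while ((count_ > 0U) && (sent < max)) {
            send_record(buf_[read_]);
            read_ = (read_ + 1U) % N;
            --count_;
            ++sent;
        }
        return sent;
    }

private:
    std::array<Event, N> buf_{};
    std::size_t write_{0U};
    std::size_t read_{0U};
    std::size_t count_{0U};
};
```

The `max` argument bounds the work done per call, which keeps the drain itself out of your worst-case timing budget.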

Prefer binary on the wire, decode on the host

Human-readable ASCII is great for IoTest bring-up and a small set of status queries. But for ongoing observability, binary is usually the right default:

  • predictable size
  • lower bandwidth
  • less time spent formatting on the MCU
  • easier to version and evolve

You can still keep it debuggable by decoding on the host into human-readable form.

A practical pattern:

  • On target: emit compact records with ids and integers.
  • On host: map ids to names, apply scaling, and render rich views.
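As a sketch of the target side, here is one way to pack a record into little-endian bytes. The 14-byte layout and field order are assumptions for illustration; your real protocol defines its own.

```cpp
#include <array>
#include <cstdint>

// Sketch: explicit little-endian packing of a fixed-size record.
// Byte-by-byte shifts avoid any dependence on struct padding or
// host/target endianness.
struct Record final {
    std::uint32_t ticks;
    std::uint16_t id;
    std::int32_t  a;
    std::int32_t  b;
};

inline void put_u32_le(std::uint8_t* p, std::uint32_t v) noexcept {
    p[0] = static_cast<std::uint8_t>(v);
    p[1] = static_cast<std::uint8_t>(v >> 8U);
    p[2] = static_cast<std::uint8_t>(v >> 16U);
    p[3] = static_cast<std::uint8_t>(v >> 24U);
}

inline std::array<std::uint8_t, 14U> encode(const Record& r) noexcept {
    std::array<std::uint8_t, 14U> out{};
    put_u32_le(&out[0], r.ticks);
    out[4] = static_cast<std::uint8_t>(r.id);
    out[5] = static_cast<std::uint8_t>(r.id >> 8U);
    put_u32_le(&out[6], static_cast<std::uint32_t>(r.a));
    put_u32_le(&out[10], static_cast<std::uint32_t>(r.b));
    return out;
}
```

The host decoder mirrors this layout exactly, which is why the schema version discussed below matters: if the layout changes and the decoder does not, you get plausible-looking nonsense.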

Make “versioning” part of your protocol

Observability that cannot evolve becomes a liability.

Include:

  • firmware build id
  • protocol version
  • record schema version for trace events

This avoids silent mismatches where tooling decodes the wrong format and produces nonsense.
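One minimal shape for this, sent once per connection or at the head of every trace dump (field names and widths are illustrative assumptions):

```cpp
#include <cstdint>

// Sketch: a fixed header that lets host tooling refuse to decode
// anything it does not understand, instead of guessing.
struct StreamHeader final {
    std::uint32_t build_id;         // e.g. short commit hash or CI build number
    std::uint16_t protocol_version; // bumped on any wire-format change
    std::uint16_t trace_schema;     // bumped when the trace record layout changes
};
```

Host tooling checks this header first and fails loudly on a mismatch; a hard error beats a quietly wrong decode.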

Unsolicited advice: schema mismatch bugs waste days. Version everything.

Host-side tooling: where insight should live

If you keep the target signal clean, host tooling can be as rich as you want:

  • trace decoding into JSON or CSV
  • timeline views
  • state transition diagrams
  • slip and overflow dashboards
  • correlation with test scenarios

This is also where you can afford heavier dependencies: parsers, GUI libraries, plotting libraries, data processing pipelines.

If your workflow is built around GitLab pipelines, host tooling also becomes a first-class artifact:

  • collected traces as pipeline artifacts
  • automatic decoding jobs
  • visual reports attached to merge requests

CI support: make observability actionable, not just available

Observability data is useful only if it is used.

Good pipeline patterns:

  • HIL jobs upload trace artifacts.
  • A decode job turns traces into readable reports.
  • Thresholds fail the pipeline when they indicate regressions:
    • slip count increased beyond a limit
    • overflow counters non-zero
    • unexpected fault codes
  • Reports are retained for comparison across releases.

Unsolicited advice: treat overflow counters like failed assertions. If you see them, you are already outside your design envelope.

What not to do

Avoid these common traps:

  • Shipping verbose logging in production builds “just in case.”
  • Printing from ISRs.
  • Allocating memory to build log strings on target.
  • Using observability that depends on timing-sensitive host reads.
  • Adding “temporary debug code” that becomes permanent.

If you need deep introspection, build a separate debug or IoTest variant and keep production deterministic.

Minimal checklist

  • On target: counters, minimal snapshots, fixed-size event records, explicit drop behavior.
  • On wire: prefer compact binary for traces, decode on host.
  • Version everything: build id, protocol version, schema version.
  • On host: rich analysis, visualization, automated report generation.
  • In CI: store trace artifacts, decode automatically, and enforce regression thresholds.

Part 10 will tie everything together: a GitLab pipeline blueprint and an incremental migration checklist, including how to structure jobs, enforce quality gates, and keep “deterministic firmware discipline” from becoming optional under schedule pressure.

Need professional firmware development help? Engage with Polyrhythm

