[COFF] [TUHS] Re: To NDEBUG or not to NDEBUG, that is the question
David Barto via COFF
coff at tuhs.org
Tue Nov 11 09:17:32 AEST 2025
At a company I worked for, we caught any exception that would cause the application to exit (OOM, SIGTERM,
and SIGHUP, for example). In the exception handler we wrote out hundreds of MB of the program's state, including
stack traces for all the threads (thousands of them) along with data structures and anything else we could think of
(memory allocation traces and the queries that were running, for example). This was done with very carefully crafted
code that could not call any other functions or allocate any memory.
This was all written in a format that allowed us to load it into the same database in our office, where we could then
write queries against the data to see what happened and where the program was when it occurred. We called
the data dump an 'x-ray', and the program that loaded it into the database and supported us in examining the data
'the doctor'.
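A minimal sketch of such a crash-time dump in C, assuming POSIX signals. The names (xray_note, xray_dump, xray_install) and the tab-separated row format are hypothetical, not the original system's code. The point it illustrates is that all state lives in preallocated static storage and the fatal path uses only write(2), so it neither allocates nor calls non-async-signal-safe functions. An OOM path would call xray_dump directly rather than through a signal.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size state table, filled in during normal operation.
 * All storage is static, so the fatal path never allocates. */
#define XRAY_MAX_ROWS 64
#define XRAY_ROW_LEN  128

static char xray_rows[XRAY_MAX_ROWS][XRAY_ROW_LEN];
static volatile sig_atomic_t xray_nrows = 0;
static int xray_fd = STDERR_FILENO;  /* assumption: chosen at startup, before any crash */

/* Normal-path helper (hypothetical): record one pre-formatted, tab-separated
 * row, e.g. "thread\t42\tquery_exec\n", that a loader could import as a table row. */
static void xray_note(const char *row)
{
    if (xray_nrows >= XRAY_MAX_ROWS)
        return;
    strncpy(xray_rows[xray_nrows], row, XRAY_ROW_LEN - 1);
    xray_rows[xray_nrows][XRAY_ROW_LEN - 1] = '\0';
    xray_nrows++;
}

/* Fatal-path writer: no allocation, no stdio, only write(2). */
static void xray_dump(int fd)
{
    for (int i = 0; i < xray_nrows; i++) {
        size_t n = 0;
        while (n < XRAY_ROW_LEN && xray_rows[i][n] != '\0')  /* hand-rolled strlen */
            n++;
        (void)write(fd, xray_rows[i], n);
    }
}

/* Signal handler: dump, then re-raise. SA_RESETHAND has already restored
 * the default action and SA_NODEFER leaves the signal unblocked, so the
 * re-raised signal terminates the process as it would have originally. */
static void xray_on_fatal(int sig)
{
    xray_dump(xray_fd);
    raise(sig);
}

/* Install the handler for the fatal signals named in the text. */
static int xray_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = xray_on_fatal;
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) == -1 || sigaction(SIGHUP, &sa, NULL) == -1)
        return -1;
    return 0;
}
```

Keeping each row pre-formatted at the time it is noted is one way to make the fatal path trivial: the handler only copies bytes that already exist.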
A common thing to hear was “I’m running the doctor on an x-ray from customer <foo>”, or “the x-ray showed that
we designed the query wrong; it should have had a join <here>, which would reduce the memory footprint by N GB”.
As far as post-mortem debugging goes, it was an amazing environment and was exceptional at finding bugs in the code
without having to use a standard debugger. No core files required.[1]
It also let us 'take an x-ray' of the running system while on the phone with the customer, allowing us to examine
what was happening before they did “the next step” that would crash the system.
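Taking an x-ray of a live process can be sketched in C, assuming a user signal such as SIGUSR1 as the trigger (an assumption; the original mechanism is not described). The handler only sets a flag, and the program takes the snapshot at a safe point in its own loop, where it is free to allocate and use stdio. The names on_xray_request, xray_request_install, and xray_poll are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Set asynchronously by the signal handler, consumed by the main loop. */
static volatile sig_atomic_t xray_requested = 0;
static int xrays_taken = 0;

/* Hypothetical handler: only set a flag; the real dump runs outside it. */
static void on_xray_request(int sig)
{
    (void)sig;
    xray_requested = 1;
}

/* Install the request handler; SA_RESTART keeps interrupted syscalls going,
 * so the process continues running normally. */
static int xray_request_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_xray_request;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical hook called at a safe point of each unit of work. */
static void xray_poll(void)
{
    if (xray_requested) {
        xray_requested = 0;
        xrays_taken++;  /* stand-in for writing a full snapshot */
    }
}
```

With this in place, `kill -USR1 <pid>` from a shell would request a snapshot without stopping the process.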
David
[1] - there were several users of the system who would not let a core file leave the building because of security concerns.
> On Nov 10, 2025, at 10:08 AM, Steffen Nurpmeso via COFF <coff at tuhs.org> wrote:
>
> Bakul Shah via COFF wrote in
> <A99C3182-CEE3-49BC-AF37-3AF47E2C21AE at iitbombay.org>:
> |> On Nov 9, 2025, at 11:45 PM, Dan Cross via COFF <coff at tuhs.org> wrote:
> |> On Sun, Nov 9, 2025 at 10:22 PM <ori at eigenstate.org> wrote:
> |>> Quoth Dan Cross <crossd at gmail.com>:
> |>>> Post mortem analysis is undeniably useful. But I maintain that it is
> |>>> _mostly_ orthogonal to `assert`.
> |>>
> |>> What are you doing with the printed values of assert (or the
> |>> stack trace), other than post mortem analysis?
> |>
> |> That's reductive. Surely there is a qualitative difference between
> |> reading an error message and invoking a debugger, no? And as I said,
> |> there are instances where you `assert` and no core file (or broken
> |> process) to debug is produced.
> |>
> |> - Dan C.
> |>
> |> (And of course I must acknowledge that I did misread your earlier
> |> statement about stack traces being at times insufficient.)
> |
> |What I would like to see on assert() failure is for the system
> |to invoke a debugger, provided matching source can be found. But
> |this requires compilers/linkers to *not* throw away information[1].
> |
> |If a decent protocol is defined and appropriate access permissions
> |are obtained, in theory a failure at a customer site can invoke
> |the debugger at the developer site[2]. Then instead of an autopsy
> |one can do a biopsy and maybe even temporarily "cure" the patient!
> |
> |This can be useful when a system (or test) fails after many hours.
> |
> |[1] It would be nice to see C/C++/etc. compiled-language tools
> |catch up to Lisp systems of the last century!
> |
> |[2] Dealing with leakage of customer/personal info is a separate
> |issue but must be dealt with in any remote debugging protocol.
>
> FWIW, I totally disagree with anyone who says that asserts
> should remain in shipped code. For me there has always been
> debug-enabled developer code, and shipped code.
> The former goes down many roads the latter will never see: for
> example, the format codec validates the format string (though not
> the arguments), the getopt parser does likewise and ensures each
> long option matches its short equivalent, the memory cache
> validates pointers before access, and so on. Except for the last,
> this is all developer-only, and even the last should not matter
> in shipped builds.
> For most of that I even use preprocessor switches to spare
> users the compilation overhead.
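A minimal sketch of developer-only checking behind its own preprocessor switch, independent of NDEBUG. MYPROG_DEVEL, the option table, opt_short_for, and opts_self_check are all hypothetical; the self-check loosely mirrors the getopt validation described above, here checking that each entry has both names and that no short or long name is used twice.

```c
#include <stddef.h>
#include <string.h>

struct opt {
    char short_name;        /* e.g. 'v' */
    const char *long_name;  /* e.g. "verbose" */
};

/* Hypothetical option table. */
static const struct opt opts[] = {
    { 'v', "verbose" },
    { 'o', "output"  },
    { 'h', "help"    },
};
#define NOPTS (sizeof opts / sizeof opts[0])

/* Shipped code path: map a long option to its short equivalent (0 if unknown). */
static char opt_short_for(const char *long_name)
{
    for (size_t i = 0; i < NOPTS; i++)
        if (strcmp(opts[i].long_name, long_name) == 0)
            return opts[i].short_name;
    return 0;
}

#ifdef MYPROG_DEVEL
/* Developer-only table validation, selected by its own switch rather than
 * by NDEBUG, so a packager who merely drops -DNDEBUG does not enable it. */
static int opts_self_check(void)
{
    for (size_t i = 0; i < NOPTS; i++) {
        if (opts[i].short_name == '\0' || opts[i].long_name == NULL)
            return 0;
        for (size_t j = i + 1; j < NOPTS; j++)
            if (opts[i].short_name == opts[j].short_name ||
                strcmp(opts[i].long_name, opts[j].long_name) == 0)
                return 0;
    }
    return 1;
}
#else
/* Shipped builds: no check code is compiled in at all. */
#define opts_self_check() 1
#endif
```

A developer build would add `-DMYPROG_DEVEL` to its compiler flags; release builds simply omit it, whatever they do with NDEBUG.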
>
> What has not been mentioned at all yet is the difference in
> runtime behavior between debug builds and optimized ones.
> This is a real problem, especially in truly (let alone heavily)
> multithreaded environments. In that respect, I think the OSSL
> (OpenSSL) approach Salz mentioned, some kind of "verify" that
> panics or returns an error, is possibly best; but (I have not
> looked) even the different code layout that likely results from
> it, i.e., function call preparation, different relative jump
> offsets, different .rodata sizes, etc., could play a role.
> To me, assertions are developer-only basic preconditions, which
> should never, ever trigger in mature code. If there is even
> a slight chance they could trigger, then regular error handling
> is called for.
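One way to read the "verify" idea is a check that is compiled identically into every build and returns an error instead of aborting, so debug and release builds execute the same code and layout. The VERIFY macro and copy_checked function below are hypothetical illustrations, not OpenSSL's actual API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical VERIFY: the condition is evaluated in every build (no NDEBUG
 * dependence). On failure it reports the location and makes the enclosing
 * function return errval instead of aborting the process. */
#define VERIFY(cond, errval)                                  \
    do {                                                      \
        if (!(cond)) {                                        \
            fprintf(stderr, "%s:%d: verify failed: %s\n",     \
                    __FILE__, __LINE__, #cond);               \
            return (errval);                                  \
        }                                                     \
    } while (0)

/* Hypothetical example: copy n bytes into a caller buffer of capacity cap. */
static int copy_checked(char *dst, size_t cap, const char *src, size_t n)
{
    VERIFY(dst != NULL && src != NULL, -1);
    VERIFY(n <= cap, -1);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```

The trade-off is that the check costs something in shipped code, in exchange for the shipped binary behaving exactly like the one that was tested.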
>
> In fact, I started to differentiate my code a bit further after
> seeing that package maintainers sometimes enable debug code,
> which results in development code paths being included. (ASSERT is
> still based on -DNDEBUG, though.) One maintainer of a distribution
> that only provides binaries (I am thankful to everyone who goes
> down that road!) now even explicitly uses git checkouts that
> include the development cruft, even though the normal releases are
> stripped of it, for faster compilation, manual display, etc.
>
> That is to say, one should carefully take into account what
> might be done to the software "downstream".
> For me, all of that will surely move further behind some "devel"opment
> curtain, not just "debug", or merely -DNDEBUG. I hate bugs,
> I hate all that, and I do not want normal users to ever have to
> face such development mess. No.
>
> I mean, it is easy for OSSL, with their Perl build environment,
> and they have the standing to simply say "that is unsupported".
> For most other projects, that only works through good will.
>
> P.S.: I hate debuggers. In case of a crash, there are
> thread-specific call graphs managed in software. That takes time,
> but gives a path across hundreds of function calls or more.
> You say Potaetoe, and I say Potato. Maybe.
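A software-managed, per-thread call path can be sketched with a C11 `_Thread_local` shadow stack that is pushed on function entry and popped on exit. TRACE_ENTER, TRACE_LEAVE, trace_dump, and the demo functions are hypothetical names; every return path must call TRACE_LEAVE, and a crash handler would need an async-signal-safe writer instead of fprintf.

```c
#include <stdio.h>

#define SHADOW_DEPTH 256

/* One shadow call stack per thread (C11 _Thread_local). */
static _Thread_local const char *shadow_stack[SHADOW_DEPTH];
static _Thread_local int shadow_top = 0;

/* Hypothetical instrumentation: push __func__ on entry, pop on exit.
 * Frames deeper than SHADOW_DEPTH are counted but not recorded. */
#define TRACE_ENTER()                                   \
    do {                                                \
        if (shadow_top < SHADOW_DEPTH)                  \
            shadow_stack[shadow_top] = __func__;        \
        shadow_top++;                                   \
    } while (0)
#define TRACE_LEAVE() (shadow_top--)

/* Print the current thread's recorded path, outermost call first,
 * indented two spaces per level. */
static void trace_dump(FILE *out)
{
    int n = shadow_top < SHADOW_DEPTH ? shadow_top : SHADOW_DEPTH;
    for (int i = 0; i < n; i++)
        fprintf(out, "%*s%s\n", i * 2, "", shadow_stack[i]);
}

static int depth_seen = 0;

/* Hypothetical demo call chain: outer -> middle -> leaf. */
static void leaf(void)
{
    TRACE_ENTER();
    depth_seen = shadow_top;  /* three frames recorded at this point */
    trace_dump(stdout);
    TRACE_LEAVE();
}

static void middle(void) { TRACE_ENTER(); leaf(); TRACE_LEAVE(); }
static void outer(void)  { TRACE_ENTER(); middle(); TRACE_LEAVE(); }
```

Calling outer() prints "outer", "  middle", and "    leaf" on three lines, i.e. the path to the point of the dump, indented by depth.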
>
> --End of <A99C3182-CEE3-49BC-AF37-3AF47E2C21AE at iitbombay.org>
>
> --steffen
> |
> |Der Kragenbaer, The moon bear,
> |der holt sich munter he cheerfully and one by one
> |einen nach dem anderen runter wa.ks himself off
> |(By Robert Gernhardt)
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
David Barto
barto at kdbarto.org