Logging answers one question: what happened in the machine?
Evidence answers three: what happened, what did it mean, and can you prove it to someone who was not there?
Most organizations running AI can answer the first question. Almost none can answer the second and third. That gap is not a logging problem. You cannot close it by logging more. It is a structural gap between activity telemetry and evidentiary infrastructure, and in high-consequence AI workflows, it is where liability concentrates.
What logging was designed to do
Logs were built for operations: debugging, monitoring, incident response. They record that an event occurred, which system processed it, and when. They are optimized for retrieval, meaning for finding the entry that corresponds to the event you are investigating.
That is the right tool for the right problem. When a system fails, logs tell you where and when. When an unauthorized access occurs, logs tell you who and what.
What logs were not designed to do: establish that a decision was reasonable. Prove that a review was conducted under a specific standard. Demonstrate that an AI output was consistent with policy. Produce a verifiable record that can be examined by a regulator or opposing counsel and independently confirmed.
Those requirements belong to a different class of artifact.
What evidence requires
An evidentiary record has properties that logs do not.
It is tied to specific content, not just an event. The hash of the document reviewed, not just the timestamp of the review. The policy version that governed the decision, not just the system that made it. The structured findings that resulted, in a form that can be read and evaluated by someone outside the system.
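To make "tied to specific content" concrete, here is a minimal sketch of a record that binds a review to the exact bytes reviewed and the policy version that governed it. The names (EvidenceRecord, make_record, to_json) and the field layout are hypothetical illustrations, not a standard schema or any vendor's format.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical record shape: binds a review to exact content and policy."""
    document_sha256: str   # hash of the exact bytes that were reviewed
    policy_version: str    # identifier of the policy that governed the review
    findings: tuple        # structured findings an outside reader can evaluate
    reviewed_at: str       # ISO 8601 timestamp supplied by the caller


def make_record(document: bytes, policy_version: str,
                findings: list, reviewed_at: str) -> EvidenceRecord:
    # Hash the content itself, so the record identifies *what* was reviewed,
    # not merely *that* a review event occurred.
    digest = hashlib.sha256(document).hexdigest()
    return EvidenceRecord(digest, policy_version, tuple(findings), reviewed_at)


def to_json(record: EvidenceRecord) -> str:
    # Sorted keys give a stable serialization someone outside the system can parse.
    return json.dumps(asdict(record), sort_keys=True)


rec = make_record(b"Brief text ...", "cite-check-v2",
                  [{"citation": "Case A v. B", "status": "verified"}],
                  "2025-10-01T12:00:00Z")
print(to_json(rec))
```

Because the digest is computed over the document bytes, any later edit to the brief produces a different hash, so the record cannot quietly be reattached to a different version of the document.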
It is reproducible. Given the same inputs and the same policy version, the same findings emerge. This is not how logs work. Logs record what happened. They do not guarantee that the same process would produce the same record again.
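Reproducibility means the check behaves like a pure function of its inputs and the policy version. The sketch below assumes a hypothetical policy registry (POLICIES) in which each version pins a fixed set of known-valid citations, standing in for whatever authoritative source a real check would consult. It then fingerprints the findings so two runs can be compared byte for byte.

```python
import hashlib
import json

# Hypothetical policy registry: each version pins the exact rule set used.
# A "rule set" here is just a frozen set of citations treated as valid.
POLICIES = {
    "cite-check-v1": frozenset({"Case A v. B", "Case C v. D"}),
    "cite-check-v2": frozenset({"Case A v. B", "Case C v. D", "Case E v. F"}),
}


def check_citations(citations: list[str], policy_version: str) -> list[dict]:
    """Deterministic check: output depends only on the inputs and the policy version."""
    known = POLICIES[policy_version]
    # Deduplicate and sort so findings never depend on input order.
    return [
        {"citation": c, "status": "verified" if c in known else "not_found"}
        for c in sorted(set(citations))
    ]


def findings_digest(findings: list[dict]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal findings hash equally.
    canonical = json.dumps(findings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_1 = check_citations(["Case E v. F", "Case A v. B"], "cite-check-v1")
run_2 = check_citations(["Case A v. B", "Case E v. F"], "cite-check-v1")
print(findings_digest(run_1) == findings_digest(run_2))
```

Re-running the same citations under the same policy version yields an identical digest, while switching to cite-check-v2 changes the finding for "Case E v. F". That is exactly why the policy version has to be recorded alongside the findings.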
It is independently verifiable. A third party (a regulator, a court, opposing counsel) can examine the record and confirm that it reflects what it claims to reflect. This requires cryptographic integrity, external timestamps, and policy lineage that do not depend on the producing party's own assertions.
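One common way to provide the cryptographic-integrity piece is a hash chain. This is a swapped-in illustration, not a technique the source prescribes. Each entry commits to the hash of the one before it, so a third party holding only the final "head" hash can detect any edited, dropped, or reordered entry. The sketch covers integrity only. It assumes the head hash is anchored somewhere outside the producer's control, for example via an external timestamping service, and does not implement that anchoring or any digital signatures. The helper names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # conventional starting value for the first link


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Each link commits to the previous link's hash and to this payload.
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> list[dict]:
    # Returns a new chain; the original list is left untouched.
    prev = chain[-1]["hash"] if chain else GENESIS
    return chain + [{"prev": prev, "payload": payload,
                     "hash": entry_hash(prev, payload)}]


def verify(chain: list[dict], anchored_head: str) -> bool:
    """Recompute every link; any edited, dropped, or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    # The final hash must match the value anchored outside the producer's control.
    return prev == anchored_head


chain: list[dict] = []
chain = append(chain, {"doc_sha256": "abc", "policy": "cite-check-v2"})
chain = append(chain, {"doc_sha256": "def", "policy": "cite-check-v2"})
head = chain[-1]["hash"]
print(verify(chain, head))
```

The verifier needs no access to the producing system, only the records and the anchored head. That is the property that separates this from an ordinary log file the producer could rewrite.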
Logs have none of these properties by design. They are not meant to. The problem is that organizations are using them as if they do.
Where this breaks down in practice
The Gordon Rees AI hallucination incidents are among the clearest illustrations in the public record.
After the first incident, the firm updated its policies. It added a cite-checking step. It told the court it was profoundly embarrassed and would ensure it did not happen again. Bloomberg Law covered the admission in October 2025.
The logging infrastructure recorded every subsequent filing. The activity telemetry existed.
What did not exist was an evidentiary record of the review process. A record that tied each citation in each brief to a verification step, a policy version, and a deterministic check. A record that could show, for any given filing, exactly what was reviewed, exactly what the check found, and exactly which standard governed the determination.
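A per-citation record of the kind described above could look like the following minimal sketch. Every name here (CitationCheck, record_filing, unverified, the "citation_lookup" check) is hypothetical. The point is that each citation is bound to the exact brief version, the verification step, and the policy version, so the question "what did the check find in this filing, under which standard?" becomes a query rather than a reconstruction.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationCheck:
    """Hypothetical per-citation entry in a filing's review record."""
    filing_id: str
    brief_sha256: str     # which exact version of the brief was reviewed
    citation: str
    check_name: str       # which verification step ran
    policy_version: str   # which standard governed the determination
    result: str           # what the check found, e.g. "verified" or "not_found"


def record_filing(filing_id: str, brief: bytes, results: dict[str, str],
                  check_name: str, policy_version: str) -> list[CitationCheck]:
    # One entry per citation, each bound to the brief hash and the policy version.
    digest = hashlib.sha256(brief).hexdigest()
    return [CitationCheck(filing_id, digest, c, check_name, policy_version, r)
            for c, r in sorted(results.items())]


def unverified(entries: list[CitationCheck], filing_id: str) -> list[str]:
    # Answers: "for this filing, which citations did the check fail to confirm?"
    return [e.citation for e in entries
            if e.filing_id == filing_id and e.result != "verified"]


entries = record_filing("F-1", b"brief text",
                        {"Case A v. B": "verified", "Case Z v. Y": "not_found"},
                        "citation_lookup", "cite-check-v2")
print(unverified(entries, "F-1"))
```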
Without that record, the policy update was an assertion. It told the court what the firm intended to do. It could not prove what the firm actually did. And because it could not prove it, the next incident produced the same pattern: sanctions, embarrassment, updated policy language, no structural change.
The difference between logging and evidence is the difference between a firm that can say it has a cite-checking policy and a firm that can prove every citation in every filing was checked, when, against what standard, with what result.
One of those firms has liability exposure. The other has an evidence record.
The direction of travel
Courts are not moving toward accepting policy assertions as proof of process. They are moving toward requiring verifiable records of what the AI produced, how it was reviewed, and under which standard.
The ElixirData analysis of decision traces vs. logs captured the enforcement reality: opposing counsel may request the data evaluated by the AI, the rules applied, the alternatives considered, and the reasoning for the final outcome. In that analysis, logs require manual reconstruction to answer those requests, while decision traces are complete, immutable, and auditable.
The firms building toward decision traces today are not over-engineering. They are building for the evidentiary standard that courts are already imposing and that regulators are moving toward codifying.
Logs tell you what happened. Evidence proves it. The two are not interchangeable. And in a sanctions motion, only one of them holds up.
Sources
Bloomberg Law, Gordon Rees AI misuse (October 2025). https://news.bloomberglaw.com/bankruptcy-law/gordon-rees-admits-ai-misuse-in-hospital-bankruptcy-repays-fees
ElixirData, AI agent decision traces vs. logs, audit trail compliance. https://www.elixirdata.co/blog/ai-agent-decision-traces-vs-logs-audit-trail-compliance