Security & Abuse-Path Review Summary
Marla — civic AI assistant for Friends of Midway Bluffs
System: Marla, civic AI assistant for Friends of Midway Bluffs, embedded at midwaybluffsfriends.org
Operator: Friends of Midway Bluffs
Review by: Taezo
Review date: 2026-05-06
System version reviewed: marla-v1 worker, version cdb31a18-ee45-44e9-9d12-e6e7ed64a31d
Time-to-completion: approximately 3 hours, end to end
Why security matters here
Marla speaks for Friends of Midway Bluffs in public. Anyone who arrives at the site can ask her questions about the access disputes and receive a substantive answer in her voice. That is the value she creates, and it is also the public surface we have to treat seriously.
Three groups can be affected by how Marla behaves: visitors who interact with her directly (their privacy), the organization itself (its reputation), and people who come up in the cases without ever interacting with her (their dignity, particularly defendants and other named individuals). Security work for a system like Marla isn’t just about the system continuing to function. It’s about making sure she neither exposes nor misrepresents any of those three groups, and cannot be manipulated into harming them.
We performed a scoped security and abuse-path review. The frame is straightforward: security is not a claim — it is a tested handoff. We say what we tested, what we found, and what would change the picture.
What we tested
Five surfaces:
The public chat surface. What visitors submit, how it’s processed, what protects Marla from misuse or hostile use.
Access controls. The limits on how often any one visitor can use Marla, and the boundaries around the administrative tools Taezo uses to maintain her.
Prompt-injection resistance. The category of attack specific to AI systems, where someone tries to make Marla ignore her instructions. We reviewed existing detection patterns and looked at real attempts in production data.
Data exposure through output. The rules that govern what Marla will and won’t say — including her name-suppression posture (she refers to private individuals by role, not by name) and the boundaries around her source material.
Deployment configuration. The settings around how Marla is hosted (Cloudflare Worker), how secrets are managed, what other websites are allowed to embed her, and how she logs her conversations.
What we found
Six findings total:
| Severity | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 2 |
| Low | 3 |
| Informational | 1 |
None of these findings represent active exploitation paths under current operating conditions. None indicate Marla has been misbehaving in production. They are the kind of findings a thorough review surfaces — places where the system could be hardened further, where observability could be improved, where future scale would require additional protection.
What we fixed during the review
Two issues were addressed inline:
An over-triggering observability flag. Marla maintains internal flags that help us see when something interesting happens in a conversation — for example, when she pushes back on a hostile premise. One of those flags was firing on routine substantive questions because of a code-level matching bug. The flag still works; it now fires accurately. This was an internal diagnostic issue, not a behavior issue — Marla was responding correctly throughout. We just couldn’t see the signal clearly in the logs.
Prompt caching and cost logging. The full system prompt was being sent fresh to the AI in every conversation, so each call paid the full input cost. We enabled prompt caching, which produces roughly a tenfold cost reduction on cached calls, and added per-call cost logging so we can monitor spending continuously. This was a cost finding, not a security finding, but the fix shipped during the review and is worth naming.
What we deferred and why
Four findings were deferred to a future hardening pass. Each has a documented remediation path; none represent active risk under current operating conditions.
Telemetry on attempted prompt injections. Marla resists prompt-injection attempts well — her frame holds in production, and we have not seen successful attacks. What we don’t yet have is a counter for attempts. Adding one would let us see attempts even when they fail, which is useful for understanding what we’re up against. Scheduled for the next worker patch.
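A counter of this kind can be quite small. The sketch below is illustrative only: the patterns are hypothetical examples, not Marla’s actual detection patterns, and the names `INJECTION_PATTERNS`, `countInjectionSignals`, and `AttemptCounter` are assumptions made for this example.

```typescript
// Hypothetical patterns for illustration only; not Marla's actual detection rules.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /reveal (your|the) system prompt/i,
  /you are now\b/i,
];

// Returns how many of the patterns match the submitted question.
function countInjectionSignals(question: string): number {
  return INJECTION_PATTERNS.filter((p) => p.test(question)).length;
}

// In-memory tally of attempts. A deployed worker would more likely write
// each hit to its log or a metrics store, since memory is not durable.
class AttemptCounter {
  private attempts = 0;

  // Counts the question if it matches any pattern; returns true when counted.
  record(question: string): boolean {
    if (countInjectionSignals(question) === 0) {
      return false;
    }
    this.attempts += 1;
    return true;
  }

  // Number of questions counted so far.
  total(): number {
    return this.attempts;
  }
}
```

The point of the design is that counting is separate from blocking: an attempt is recorded whether or not it would have succeeded, so the tally reflects pressure on the system rather than failures of it.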
A maximum size on submitted questions. Existing rate limits make it hard to drive Marla’s costs through abusive usage, but a hard cap on question length would add a second layer of protection. Not urgent at current traffic levels; on the list.
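As a rough sketch of what such a cap might look like, the check below rejects empty or oversized questions before any model call is made. The limit of 2000 characters and the names `MAX_QUESTION_CHARS` and `checkQuestion` are hypothetical; the review does not specify a value.

```typescript
// Hypothetical limit; the review does not state a specific value.
const MAX_QUESTION_CHARS = 2000;

interface QuestionCheck {
  ok: boolean;
  question: string;      // trimmed question text (empty when rejected)
  reason: string | null; // why the question was rejected, or null when accepted
}

// Validates a submitted question before it reaches the model.
function checkQuestion(raw: string, maxChars: number = MAX_QUESTION_CHARS): QuestionCheck {
  const question = raw.trim();
  if (question.length === 0) {
    return { ok: false, question: "", reason: "empty question" };
  }
  if (question.length > maxChars) {
    return { ok: false, question: "", reason: `question exceeds ${maxChars} characters` };
  }
  return { ok: true, question, reason: null };
}
```

Because the check runs before the model is called, an oversized submission costs nothing beyond the request itself.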
Per-IP daily limits. The current daily request cap is global, which means a determined attacker rotating IP addresses could theoretically drain the day’s allowance. The per-minute rate limit makes this difficult; adding a per-IP daily cap would make it harder still.
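One way to layer a per-visitor cap on top of a global cap is sketched below. This is a minimal in-memory illustration, not the worker’s actual limiter: the class name `DailyLimiter`, the cap values in the usage note, and the choice to key visitors by an opaque string (for example a hashed IP) are all assumptions.

```typescript
// In-memory daily counters keyed by UTC date. This is a sketch: worker
// instances do not share memory, so a deployed version would need shared storage.
class DailyLimiter {
  private day = "";
  private globalCount = 0;
  private perVisitor = new Map<string, number>();

  constructor(private globalCap: number, private perVisitorCap: number) {}

  // Returns true and counts the request only if both caps allow it.
  allow(visitorKey: string, now: Date): boolean {
    const today = now.toISOString().slice(0, 10); // UTC date, e.g. "2026-05-06"
    if (today !== this.day) {
      // New UTC day: reset all counters.
      this.day = today;
      this.globalCount = 0;
      this.perVisitor.clear();
    }
    const used = this.perVisitor.get(visitorKey) ?? 0;
    if (this.globalCount >= this.globalCap || used >= this.perVisitorCap) {
      return false;
    }
    this.globalCount += 1;
    this.perVisitor.set(visitorKey, used + 1);
    return true;
  }
}
```

With hypothetical caps such as `new DailyLimiter(500, 25)`, no single visitor key could consume more than 25 of the day’s 500 requests, while the global cap would still bound total spend.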
A privacy-hygiene fix in the visitor-IP hashing. The value used to obscure visitor IPs is currently embedded in the source code rather than stored as a server secret. This would matter only if two separate protections failed at the same time: source-code privacy and database access. Worth fixing eventually; not urgent.
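The remediation could look something like the sketch below, which keys the hash with a value supplied at runtime instead of a constant in the source. It assumes HMAC-SHA-256 as the construction (the review does not say which one Marla uses) and assumes `node:crypto` is available in the runtime; the function name `hashVisitorIp` and the secret name `IP_HASH_SECRET` are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Hashes a visitor IP with a secret key so logs never store the raw address.
// `secret` is expected to come from the runtime's secret store, not from source code.
function hashVisitorIp(ip: string, secret: string): string {
  if (secret.length === 0) {
    // Fail loudly rather than fall back to an unkeyed or hard-coded value.
    throw new Error("IP hashing secret is not configured");
  }
  return createHmac("sha256", secret).update(ip).digest("hex");
}
```

On Cloudflare, a value like this would typically be stored with `wrangler secret put IP_HASH_SECRET` and read from the worker’s environment bindings at request time. Rotating the secret changes every hash, so old and new log entries for the same visitor would no longer match.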
What would change the picture
Marla should be reviewed again if any of the following occur:
- The knowledge base she draws from changes materially
- Her prompt architecture changes
- The hosting platform changes
- Public traffic increases significantly (current volume is roughly 13 exchanges per week; tenfold growth would change the threat model)
- New data storage or logging is added
- The administrative tools are modified
- The website hosting Marla (Wix) is replaced with something we control directly
- The underlying language model is swapped (different models have different injection-resistance characteristics)
A weekly log-triage process now reviews flagged exchanges and cost trends. This complements but does not replace point-in-time security review.
The boundary
This was a scoped review. We tested the public surface of the system, the controls and configurations, and Marla’s behavior under representative pressure. We did not actively attack the administrative tools (read-only review this pass), audit the dependency tree end-to-end, or pen-test the platforms underneath us (Cloudflare, Wix, Anthropic’s API) — those are operated by parties we don’t control.
A separate one-page Verification Statement records the scope, findings, and retest triggers in dated, declarative form. This summary sits between two companion records: the Verification Statement for formal files, and a full technical report for technical review. The summary explains the work in plain language, without exposing implementation details or exploit paths.