Persistent memory.
Opt-in. Per-address. Deletable.
Pass plumb_memory=true as a request-body flag on any chat completion, and Plumb extracts durable facts from both sides of the turn in a background worker. Each extraction is embedded with a small local model and written to a pgvector index scoped to your session address.
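To make the request-body flag concrete, here is a minimal sketch of the JSON body a client might send. The helper name build_chat_body is hypothetical, and leaving the key out when memory is off is an assumption, not documented Plumb behavior.

```python
import json


def build_chat_body(model, messages, plumb_memory=False):
    """Build a chat-completion request body (hypothetical helper).

    The plumb_memory key is only added when memory is enabled; omitting it
    otherwise is an assumption made for this sketch.
    """
    body = {"model": model, "messages": messages}
    if plumb_memory:
        body["plumb_memory"] = True
    return body


body = build_chat_body(
    "claude-sonnet-4-6",
    [{"role": "user", "content": "I'm a Go dev learning React"}],
    plumb_memory=True,
)
# json.dumps renders Python True as the JSON literal true.
print(json.dumps(body, indent=2))
```

With an OpenAI-style client, extra_body={"plumb_memory": True} is merged into the same top-level JSON object.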
On subsequent turns, the top-k relevant memories are retrieved and injected into the system prompt before the upstream call. A composed profile JSON is exposed at /memsync/profile — a denormalized view of who the user is and what they care about, derived deterministically from the memory rows.
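The retrieve-and-inject step can be sketched in plain Python. This is an illustration under stated assumptions, not Plumb's implementation: the real service queries a pgvector index, whereas this sketch ranks a small in-memory list by cosine similarity. The function names, the toy three-dimensional embeddings, the sample memory texts, and the injected prompt wording are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, rows, k=2):
    """Return the k memory rows most similar to the query embedding."""
    ranked = sorted(rows, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)
    return ranked[:k]


def inject(system_prompt, memories):
    """Append retrieved memories to the system prompt (wording is hypothetical)."""
    if not memories:
        return system_prompt
    facts = "\n".join(f"- {m['text']}" for m in memories)
    return f"{system_prompt}\n\nKnown facts about the user:\n{facts}"


# Toy memory rows; real embeddings would come from the embedding model.
rows = [
    {"id": "m1", "text": "Writes Go professionally", "embedding": [0.9, 0.1, 0.0]},
    {"id": "m2", "text": "Is learning React", "embedding": [0.7, 0.7, 0.0]},
    {"id": "m3", "text": "Lives in Lisbon", "embedding": [0.0, 0.1, 0.9]},
]
query = [0.8, 0.6, 0.0]  # stand-in embedding for "what should I focus on?"

selected = top_k(query, rows, k=2)
prompt = inject("You are a helpful assistant.", selected)
print(prompt)
```

Only the two rows closest to the query reach the prompt; the unrelated row is left out.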
Every memory is addressable by id and deletable. Because memory extraction happens asynchronously, the first completion does not block on it. If you turn memory off, nothing is written for that turn.
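The three properties in this paragraph (opt-in writes, non-blocking extraction, deletion by id) can be shown with a small in-memory sketch. Everything here is hypothetical: MemoryStore, handle_turn, and the echo reply stand in for the real service, and a single-thread executor stands in for the background worker.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class MemoryStore:
    """In-memory stand-in for the memory index (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def add(self, text):
        memory_id = str(uuid.uuid4())
        self.rows[memory_id] = text
        return memory_id

    def delete(self, memory_id):
        """Delete one memory by id; returns False if the id is unknown."""
        return self.rows.pop(memory_id, None) is not None


store = MemoryStore()
worker = ThreadPoolExecutor(max_workers=1)  # stand-in background worker


def handle_turn(user_text, plumb_memory=False):
    """Return the reply at once; schedule extraction only when opted in."""
    reply = f"echo: {user_text}"  # stand-in for the upstream completion
    pending = worker.submit(store.add, user_text) if plumb_memory else None
    return reply, pending


reply, pending = handle_turn("I'm a Go dev learning React", plumb_memory=True)
memory_id = pending.result()  # waiting here is only for the demo

_, skipped = handle_turn("no memory for this turn")  # flag off: nothing scheduled

store.delete(memory_id)
worker.shutdown()
```

The reply is built before the extraction task runs, so the caller never waits on the write unless it chooses to.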
EX.04 Enable memory on a turn
from plumb_sdk import Client

# SESSION is your session token; define it before running this example.
c = Client(base_url="https://api.plumbtech.xyz", session_token=SESSION)

# Turn 1: memory is extracted in the background.
resp = c.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "I'm a Go dev learning React"}],
    extra_body={"plumb_memory": True},
)

# Turn 2: relevant memories are retrieved automatically.
resp = c.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "what should I focus on?"}],
    extra_body={"plumb_memory": True},
)

# Inspect the composed profile at any time.
profile = c.memsync.profile()
print(profile.summary)  # e.g. "Go dev learning React; prefers ..."