Content-addressed model hub.
Uploaders own the attribution.
The Hub gives every model a permanent, cryptographic address. Upload any ONNX, GGUF, or safetensors file and Plumb hashes its bytes with keccak256. Identical files produce identical hashes, and files that differ by even one byte produce different ones, so no model can pass itself off as another. The hash is the only identifier the rest of the system needs.
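Because the identifier is just a hash of the bytes, you can recompute it before uploading and compare it with the hash the gateway returns. The following is a dependency-free Python Keccak-256, assuming the Hub hashes exactly the uploaded file bytes with no framing and renders the digest as 0x-prefixed hex. Ethereum-style keccak256 pads differently from NIST SHA3-256, so Python's hashlib.sha3_256 will not produce matching hashes. The helper names (keccak, model_hash) are illustrative and not part of any Plumb SDK.

```python
"""Pure-Python Keccak-256 for recomputing a Hub model hash locally.

Assumption: the Hub hashes exactly the uploaded bytes with Ethereum-style
Keccak-256 (pad byte 0x01), which differs from hashlib.sha3_256 (pad byte 0x06).
"""

MASK64 = (1 << 64) - 1


def _rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits (0 <= n < 64)."""
    return ((v << n) | (v >> (64 - n))) & MASK64


def _round_constants() -> list[int]:
    """Derive the 24 iota constants from the Keccak LFSR (x^8+x^6+x^5+x^4+1)."""
    constants, lfsr = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            if lfsr & 1:
                rc |= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1
        constants.append(rc)
    return constants


def _rotation_offsets() -> list[list[int]]:
    """Derive rho offsets ROT[x][y] by walking (x, y) -> (y, 2x + 3y)."""
    rot = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        rot[x][y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return rot


RC = _round_constants()
ROT = _rotation_offsets()


def _keccak_f1600(a: list[int]) -> list[int]:
    """Keccak-f[1600] permutation; lane (x, y) lives at index x + 5*y."""
    for rc in RC:
        # theta: fold each column's parity into its neighbours
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate each lane and move it to position (y, 2x + 3y)
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], ROT[x][y])
        # chi: the non-linear step along each row
        a = [
            b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
            for y in range(5)
            for x in range(5)
        ]
        # iota: break symmetry with the round constant
        a[0] ^= rc
    return a


def keccak(data: bytes, pad: int = 0x01, rate: int = 136, out_len: int = 32) -> bytes:
    """Sponge construction: pad=0x01 gives Keccak-256, pad=0x06 gives SHA3-256."""
    msg = bytearray(data) + bytes([pad])
    msg += bytes(-len(msg) % rate)  # zero-fill to a whole number of blocks
    msg[-1] |= 0x80                 # final bit of the pad10*1 rule
    state = [0] * 25
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(msg[off + 8 * i: off + 8 * i + 8], "little")
        state = _keccak_f1600(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state)[:out_len]


def model_hash(data: bytes) -> str:
    """Hub-style identifier (assumed format): 0x-prefixed hex Keccak-256 of the raw bytes."""
    return "0x" + keccak(data).hex()
```

Usage: model_hash(Path("my-classifier.onnx").read_bytes()). Pure Python is slow on large files; for multi-hundred-megabyte models a native implementation such as pycryptodome's Crypto.Hash.keccak is more practical.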
A second step publishes the (hash, uploader) tuple to HubRegistry on Base Sepolia. Once registered, the uploader's address is the canonical attribution target for any inference run against that model — ready for revenue routing or provenance queries without trusting the gateway's database.
Upload + register (two endpoints, one flow)
/hub/upload accepts raw binary with a session token and returns the keccak256 hash, a storage URL, and the size, name, and framework it recorded. /hub/models/:hash/register (admin-gated) puts the (hash, uploader) tuple on-chain. Registration can happen immediately or later; re-registering the same hash is idempotent.
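The two calls described above can also be made without curl. This is a minimal standard-library Python sketch that builds the same requests (URL paths and X-Plumb-* headers taken from the examples on this page); the function names are hypothetical and this is not the Python SDK's API.

```python
"""Stdlib sketch of the Hub upload + register calls (mirrors the curl examples)."""
import json
import urllib.request

API_BASE = "https://api.plumbtech.xyz"


def build_upload_request(path: str, name: str, framework: str,
                         session_token: str) -> urllib.request.Request:
    """POST /hub/upload with the raw file bytes and the model metadata headers."""
    with open(path, "rb") as fh:
        body = fh.read()
    return urllib.request.Request(
        f"{API_BASE}/hub/upload",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {session_token}",
            "Content-Type": "application/octet-stream",
            "X-Plumb-Model-Name": name,
            "X-Plumb-Model-Framework": framework,
        },
    )


def build_register_request(model_hash: str, admin_token: str) -> urllib.request.Request:
    """POST /hub/models/:hash/register; admin-gated, the gateway submits the tx."""
    return urllib.request.Request(
        f"{API_BASE}/hub/models/{model_hash}/register",
        data=b"",  # empty body, as in the curl example
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}"},
    )


def send_json(req: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```

A typical flow sends the upload first, then passes the returned "hash" field to build_register_request with the admin token.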
# 1. upload raw bytes
curl -X POST https://api.plumbtech.xyz/hub/upload \
-H "Authorization: Bearer $PLUMB_SESSION" \
-H "Content-Type: application/octet-stream" \
-H "X-Plumb-Model-Name: my-classifier" \
-H "X-Plumb-Model-Framework: onnx" \
--data-binary @my-classifier.onnx
# → {"hash":"0x7577…","sizeBytes":1023,"storageUrl":"plumb://0x7577…","framework":"onnx","name":"my-classifier"}
# 2. register on-chain (operator call — gateway submits the tx)
curl -X POST https://api.plumbtech.xyz/hub/models/0x7577…/register \
-H "Authorization: Bearer $PLUMB_ADMIN_TOKEN"
# → {"hash":"0x7577…","txHash":"0xaf3…","metadataURI":"plumb://0x7577…","prevVersion":"0x000…"}
Inference against a Hub model
Once a model is registered, Plumb runs it locally via onnxruntime (for ONNX) or through the provider shim (for GGUF/safetensors via llama.cpp). The /hub/inference endpoint accepts the model hash and input tensors and returns the output's dims and data, signed with the same receipt discipline as chat completions.
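Since the response carries dims and data separately, a client has to rebuild the tensor shape itself. A small Python sketch of that step, assuming (hypothetically) a JSON body with top-level "dims" and "data" fields and row-major element order, the default layout for onnxruntime outputs; check the actual response schema before relying on it.

```python
import json
from math import prod
from typing import Any


def unflatten(dims: list[int], data: list[float]) -> Any:
    """Rebuild a nested row-major list from a flat buffer and its dims."""
    if prod(dims) != len(data):
        raise ValueError(f"dims {dims} imply {prod(dims)} values, got {len(data)}")
    if not dims:
        return data[0]  # zero-rank tensor: a single scalar
    if len(dims) == 1:
        return list(data)
    step = prod(dims[1:])  # number of values in one slice along the first axis
    return [unflatten(dims[1:], data[i * step:(i + 1) * step]) for i in range(dims[0])]


def parse_inference_response(body: str) -> Any:
    """Hypothetical response shape: {"dims": [...], "data": [...], ...}."""
    payload = json.loads(body)
    return unflatten(payload["dims"], payload["data"])
```

The element-count check catches a dims/data mismatch before any downstream code indexes into a malformed tensor.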
An LRU session cache keeps the most recently used models warm in memory: a cold load takes 100 to 500 ms for small models, while repeat calls are sub-millisecond. A hard 30-second timeout per inference keeps an adversarial or stuck model from stalling the worker.
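The paragraph above pairs two mechanisms: an LRU cache of loaded sessions and a per-call deadline. This is a minimal in-process Python sketch of both, assuming a loader callable that turns a model hash into a session (for example an onnxruntime InferenceSession). SessionCache and run_with_timeout are hypothetical names, and the gateway's real cache size is not documented here. The sketch is not thread-safe, and because Python cannot kill a thread, the timeout only stops the caller from waiting; reclaiming a genuinely stuck worker would need process isolation.

```python
import concurrent.futures
from collections import OrderedDict
from typing import Any, Callable


class SessionCache:
    """Keep up to `capacity` loaded sessions; evict the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[str], Any]):
        self.capacity = capacity
        self.loader = loader  # hypothetical: model hash -> loaded session
        self._sessions: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_hash: str) -> Any:
        if model_hash in self._sessions:
            self._sessions.move_to_end(model_hash)  # mark as most recently used
            return self._sessions[model_hash]
        session = self.loader(model_hash)  # cold load: the expensive path
        self._sessions[model_hash] = session
        if len(self._sessions) > self.capacity:
            self._sessions.popitem(last=False)  # drop the least recently used
        return session


_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_with_timeout(fn: Callable[..., Any], *args: Any, timeout_s: float = 30.0) -> Any:
    """Wait at most timeout_s for fn(*args); raises concurrent.futures.TimeoutError."""
    future = _POOL.submit(fn, *args)
    # On timeout the caller gives up, but the worker thread keeps running fn.
    return future.result(timeout=timeout_s)
```

An OrderedDict gives O(1) hit, insert, and eviction, which is why it is the usual building block for an LRU.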
Versions, not tags
Unlike Docker tags, Hub versions form a Merkle DAG: every registration carries a prevVersion pointing at the prior registration's hash (or bytes32(0) for the first version). Off-chain consumers walk these links to reconstruct history; there are no mutable tags, and no version can be rewritten in place.
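Walking the chain is a simple loop from a head hash back to bytes32(0). This is a Python sketch, assuming a hypothetical prev_version lookup; in practice that would read HubRegistry or an index of its registration events. The cycle guard is defensive, for off-chain data of uncertain provenance, and hashes are compared as exact strings, so normalize hex case first if your sources differ.

```python
from typing import Callable

ZERO = "0x" + "00" * 32  # bytes32(0): the prevVersion of a first version


def version_history(head: str, prev_version: Callable[[str], str]) -> list[str]:
    """Return [head, parent, grandparent, ...] ending at the first version."""
    history: list[str] = []
    seen: set[str] = set()
    current = head
    while current != ZERO:
        if current in seen:
            raise ValueError(f"cycle in version chain at {current}")
        seen.add(current)
        history.append(current)
        current = prev_version(current)  # hypothetical lookup of prevVersion
    return history
```

Because each link names a content hash, a consumer can re-hash any historical version's bytes and confirm it matches the chain.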
Size limits and storage backends
The default upload cap is 100 MB, configurable via PLUMB_HUB_MAX_UPLOAD_BYTES. In development the gateway stores files on the local filesystem at PLUMB_HUB_STORAGE_DIR; production deployments can swap in S3-compatible storage by implementing the StorageBackend interface. Uploads are DB-first: the gateway claims the database row before writing any bytes, so a crash between the two steps never leaves an orphan file.
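The DB-first ordering can be sketched in Python with sqlite3 standing in for the gateway database. Everything here beyond the environment variable, the StorageBackend name, and the plumb:// URL form is an assumption: the put method, the models table, the pending/stored statuses, and reading "100 MB" as 100 × 1024 × 1024 bytes are all hypothetical. The caller is assumed to pass in the already-computed keccak256 hash.

```python
import os
import sqlite3
from pathlib import Path
from typing import Protocol

# Assumption: "100 MB" read as MiB; the real default may be defined differently.
DEFAULT_MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def max_upload_bytes() -> int:
    return int(os.environ.get("PLUMB_HUB_MAX_UPLOAD_BYTES", DEFAULT_MAX_UPLOAD_BYTES))


class StorageBackend(Protocol):
    """Hypothetical method shape; the real interface is not documented here."""
    def put(self, model_hash: str, data: bytes) -> str: ...


class LocalFilesystemBackend:
    def __init__(self, root: str):
        self.root = Path(root)  # e.g. the value of PLUMB_HUB_STORAGE_DIR
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, model_hash: str, data: bytes) -> str:
        final = self.root / model_hash
        tmp = final.with_suffix(".partial")
        tmp.write_bytes(data)
        os.replace(tmp, final)  # atomic rename: no reader sees half a file
        return f"plumb://{model_hash}"


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS models ("
               "hash TEXT PRIMARY KEY, size_bytes INTEGER NOT NULL, status TEXT NOT NULL)")
    return db


def store_upload(db: sqlite3.Connection, backend: StorageBackend,
                 model_hash: str, data: bytes) -> str:
    if len(data) > max_upload_bytes():
        raise ValueError("upload exceeds PLUMB_HUB_MAX_UPLOAD_BYTES")
    # 1. Claim the row first; the primary key rejects a second claim on the same hash.
    try:
        with db:
            db.execute("INSERT INTO models (hash, size_bytes, status) VALUES (?, ?, 'pending')",
                       (model_hash, len(data)))
    except sqlite3.IntegrityError:
        status = db.execute("SELECT status FROM models WHERE hash = ?",
                            (model_hash,)).fetchone()[0]
        if status == "stored":
            return f"plumb://{model_hash}"  # identical bytes are already stored
        # 'pending' means an earlier attempt died before finishing; retry the write.
    # 2. Only now write bytes, so every file on disk has a row that points at it.
    url = backend.put(model_hash, data)
    with db:
        db.execute("UPDATE models SET status = 'stored' WHERE hash = ?", (model_hash,))
    return url
```

A crash after step 1 leaves only a pending row, which the next upload of the same bytes completes; at no point can a file exist that the database does not know about.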
↳ HubRegistry addresses + ABI
↳ Python SDK hub client
↳ Hub inference receipts