RFC 0001
ZIP container format (v2.1)
Status: Draft — open for comments
Status
| Field | Value |
|---|---|
| Status | Draft — nothing here is normative yet |
| Author | LeanCTX maintainers |
| Affects | spec (new container section), conformance (new vectors) |
| Comment | standard@ctxpkg.org |
1 Summary
A second, optional container for context packages: a ZIP archive with the
extension .ctxpkgz, carrying the manifest and content as
separate entries plus an assets/ directory for binary
attachments. The existing single-file JSON container stays the default and
is unaffected. Readers claiming v2.1 conformance MUST
accept both containers; writers MAY choose either.
2 Motivation
Three pressures on the single-JSON container, observed in practice:
| Pressure | Today |
|---|---|
| Size | Large knowledge graphs produce multi-megabyte JSON documents. Parsers must hold the whole document in memory before the first integrity check can run; registries must buffer entire uploads. |
| Binary assets | Diagrams, embeddings and fixture files can only ship base64-inlined, inflating size by a third and making diffs unreadable. |
| Partial reads | A reader that only wants the manifest (for indexing or trust decisions) still downloads and parses the full content. |
ZIP solves all three with boring, universally available technology: per-entry access, native binary storage, streamable verification.
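As a rough illustration of the per-entry access point, the sketch below reads only the manifest from an archive and leaves `content.json` and the assets untouched. It assumes a local file or other seekable stream; the function name `read_manifest_only` is hypothetical, and a registry client would additionally need ranged downloads to avoid fetching the whole archive.

```python
import io
import json
import zipfile


def read_manifest_only(archive) -> dict:
    """Parse manifest.json without reading any other entry.

    `archive` is a path or a seekable binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        # zipfile locates entries through the central directory at the
        # end of the archive, then seeks straight to the requested one.
        with zf.open("manifest.json") as fh:
            return json.load(fh)


# Usage with an in-memory archive (placeholder manifest, not a real package).
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("manifest.json", json.dumps({"name": "demo"}))
    _zf.writestr("content.json", "{}")
_buf.seek(0)
print(read_manifest_only(_buf))
```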
3 Design
Proposed layout — exactly three reserved names, everything else is an asset:
package.ctxpkgz
├── manifest.json    REQUIRED: same manifest as today, one addition
├── content.json     REQUIRED: the content member, compact JSON
└── assets/          OPTIONAL: binary attachments
    ├── architecture.png
    └── embeddings.bin
Integrity extends, it does not change: content_hash and
sha256 are computed over content.json exactly as
spec §8 defines for the inline member — the bytes of the entry are the
content_json_bytes. Assets get a new manifest member:
"assets": [
{ "path": "assets/architecture.png",
"sha256": "…64 hex chars…",
"media_type": "image/png",
"byte_size": 48211 }
]
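On the writer side, one entry of the `assets` member could be derived from the attachment bytes roughly as follows. This is a sketch, not normative text: the helper name `asset_entry` is hypothetical, lowercase hex for `sha256` is an assumption (the RFC only says 64 hex characters), and guessing `media_type` from the file extension is merely one convenient option.

```python
import hashlib
import mimetypes
from typing import Optional


def asset_entry(path: str, data: bytes, media_type: Optional[str] = None) -> dict:
    """Build one element of the manifest's "assets" array."""
    guessed, _encoding = mimetypes.guess_type(path)
    return {
        "path": path,
        # Assumption: lowercase hex digest of the raw entry bytes.
        "sha256": hashlib.sha256(data).hexdigest(),
        # Assumption: fall back to a generic type when nothing is known.
        "media_type": media_type or guessed or "application/octet-stream",
        "byte_size": len(data),
    }


print(asset_entry("assets/embeddings.bin", b"\x00\x01\x02", "application/octet-stream"))
```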
Readers MUST verify every listed asset hash and
MUST reject archives containing entries not listed
in the manifest (no smuggling). The signature covers the manifest as today —
and therefore, transitively, every asset hash. Path traversal names
(../, absolute paths, drive letters) MUST
be rejected outright.
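The three reader obligations above (verify listed hashes, reject unlisted entries, reject traversal names) might look like this in a reader. It is a minimal sketch under stated assumptions: `verify_archive`, `is_unsafe_name` and `PackageRejected` are hypothetical names, directory entries such as `assets/` are ignored, and the `content_hash` and signature checks defined elsewhere in the spec are left out.

```python
import hashlib
import json
import zipfile

RESERVED = {"manifest.json", "content.json"}


class PackageRejected(Exception):
    """Raised when an archive violates one of the reader rules."""


def is_unsafe_name(name: str) -> bool:
    """True for absolute paths, drive letters and '..' segments."""
    norm = name.replace("\\", "/")
    if norm.startswith("/"):
        return True
    if len(norm) >= 2 and norm[0].isalpha() and norm[1] == ":":
        return True
    return ".." in norm.split("/")


def verify_archive(archive) -> dict:
    """Return the parsed manifest, or raise PackageRejected."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        for info in infos:
            if is_unsafe_name(info.filename):
                raise PackageRejected(f"unsafe entry name: {info.filename!r}")
        # Assumption: bare directory entries carry no data and are ignored.
        names = {i.filename for i in infos if not i.is_dir()}
        missing = RESERVED - names
        if missing:
            raise PackageRejected(f"missing required entries: {sorted(missing)}")
        manifest = json.loads(zf.read("manifest.json"))
        listed = {a["path"]: a for a in manifest.get("assets", [])}
        unlisted = names - RESERVED - set(listed)
        if unlisted:
            raise PackageRejected(f"unlisted entries: {sorted(unlisted)}")
        for path, asset in listed.items():
            if path not in names:
                raise PackageRejected(f"listed asset missing: {path!r}")
            data = zf.read(path)
            digest = hashlib.sha256(data).hexdigest()
            if len(data) != asset["byte_size"] or digest != asset["sha256"].lower():
                raise PackageRejected(f"asset does not match manifest: {path!r}")
    return manifest
```

Checking names before reading any entry keeps a hostile archive from influencing the reader beyond its central directory; a production reader would presumably also bound how many bytes it reads per entry.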
4 Compatibility
| Area | Impact |
|---|---|
| Existing packages | Untouched. The JSON container remains valid indefinitely. |
| Existing readers | A v2.0 reader sees an unknown file type and fails closed — the desired behavior. Magic bytes differ (PK vs {), so misparsing is impossible. |
| Version bump | Minor: v2.1. Additive container, no change to any existing normative rule. |
| Registries | The registry protocol gains a content-type for the new container; the download endpoint is otherwise unchanged. |
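A v2.1 reader that accepts both containers could dispatch on the leading bytes, as the magic-byte remark suggests. The sketch below is deliberately strict and makes assumptions the RFC does not state: it treats any `PK` prefix as ZIP, requires `{` as the very first byte of the JSON container (no leading whitespace or byte-order mark), and uses the hypothetical name `sniff_container`.

```python
def sniff_container(head: bytes) -> str:
    """Classify a package by its first bytes: "zip" or "json"."""
    if head[:2] == b"PK":
        return "zip"
    if head[:1] == b"{":
        return "json"
    # Fail closed on anything else, mirroring the v2.0 reader behavior.
    raise ValueError("unrecognized container")


print(sniff_container(b"PK\x03\x04"))    # zip
print(sniff_container(b'{"manifest":'))  # json
```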
5 Conformance
Acceptance requires four new golden vectors before the spec text merges:
| Vector | Expectation |
|---|---|
| valid-zip-with-assets.ctxpkgz | accept |
| invalid-zip-asset-hash.ctxpkgz | reject: asset bytes do not match manifest hash |
| invalid-zip-unlisted-entry.ctxpkgz | reject: archive contains an entry the manifest does not list |
| invalid-zip-path-traversal.ctxpkgz | reject: entry name escapes the archive root |
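To make the four vectors concrete, here is one way draft versions could be generated for local experimentation. Everything in it is illustrative: `build_vectors` is a hypothetical helper, the manifest is a placeholder carrying only the `assets` member (real vectors would need a complete, signed manifest and a valid `content_hash`), and store-only compression is chosen arbitrarily since Q1 is still open.

```python
import hashlib
import io
import json
import zipfile


def _zip(entries: dict) -> bytes:
    """Serialize {entry name: bytes} into an uncompressed ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_vectors() -> dict:
    """Return {vector file name: archive bytes} for the four cases."""
    png = b"\x89PNG placeholder bytes"  # stand-in, not a real image
    manifest = {
        "assets": [{
            "path": "assets/architecture.png",
            "sha256": hashlib.sha256(png).hexdigest(),
            "media_type": "image/png",
            "byte_size": len(png),
        }]
    }
    base = {
        "manifest.json": json.dumps(manifest).encode("utf-8"),
        "content.json": b"{}",
        "assets/architecture.png": png,
    }
    return {
        "valid-zip-with-assets.ctxpkgz": _zip(base),
        # Same length, different bytes: only the hash check can catch it.
        "invalid-zip-asset-hash.ctxpkgz":
            _zip({**base, "assets/architecture.png": png[::-1]}),
        "invalid-zip-unlisted-entry.ctxpkgz":
            _zip({**base, "assets/extra.bin": b"x"}),
        "invalid-zip-path-traversal.ctxpkgz":
            _zip({**base, "../escape.txt": b"x"}),
    }
```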
6 Drawbacks
A second container doubles the surface every conforming reader must test. ZIP itself has sharp edges (duplicate entry names, zip-slip, compression bombs) that the spec must explicitly close. §3 covers entry allowlisting and path traversal; duplicate entry names and size caps are not yet specified (see Q1 and Q3 in §8). And "two ways to do it" is a real cost for a young standard; this RFC should only be accepted once a concrete package demonstrably does not fit the JSON container.
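Two of these sharp edges, duplicate entry names and oversized entries, can be screened from the central directory before any entry is decompressed. The sketch below assumes hypothetical limits (`MAX_ENTRY_BYTES`, `MAX_TOTAL_BYTES`) and a hypothetical helper name `precheck`; the RFC fixes no numbers. Declared sizes can be forged, so a real reader would likely also cap the bytes it actually reads.

```python
import zipfile
from collections import Counter

# Hypothetical caps for illustration only; the RFC leaves these open.
MAX_ENTRY_BYTES = 64 * 1024 * 1024
MAX_TOTAL_BYTES = 256 * 1024 * 1024


def precheck(archive, max_entry: int = MAX_ENTRY_BYTES,
             max_total: int = MAX_TOTAL_BYTES) -> None:
    """Raise ValueError on duplicate names or oversized declared sizes."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
    counts = Counter(info.filename for info in infos)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate entry names: {dupes}")
    total = 0
    for info in infos:
        # file_size is the uncompressed size declared in the archive.
        if info.file_size > max_entry:
            raise ValueError(f"entry too large: {info.filename!r}")
        total += info.file_size
    if total > max_total:
        raise ValueError("archive exceeds total uncompressed size cap")
```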
7 Alternatives
| Alternative | Why not |
|---|---|
| tar + zstd | Better compression, but no per-entry random access without reading the stream — kills the partial-read motivation. |
| Asset URLs in JSON (fetch at install) | Breaks the core promise: a package is one self-contained, offline-verifiable artifact. Remote assets rot and can change after signing. |
| Bigger inline base64 | Status quo. Works, but the three pressures in §2 only grow. |
8 Open questions
| # | Question |
|---|---|
| Q1 | Compression: store-only (simplest verification) or permit DEFLATE with a decompressed-size cap? |
| Q2 | Should assets be allowed in the JSON container too (base64-inlined), keeping one logical model across both containers? |
| Q3 | Maximum archive size and per-entry caps — fixed in the spec or registry policy? |