ctxpkg.org

RFC 0001

ZIP container format (v2.1)

Status: Draft — open for comments

Status

Status: Draft (nothing here is normative yet)
Author: LeanCTX maintainers
Affects: spec (new container section), conformance (new vectors)
Comment: standard@ctxpkg.org

1 Summary

A second, optional container for context packages: a ZIP archive with the extension .ctxpkgz, carrying the manifest and content as separate entries plus an assets/ directory for binary attachments. The existing single-file JSON container stays the default and is unaffected. Readers claiming v2.1 conformance MUST accept both containers; writers MAY choose either.

2 Motivation

Three pressures on the single-JSON container, observed in practice:

Pressure | Today
Size | Large knowledge graphs produce multi-megabyte JSON documents. Parsers must hold the whole document in memory before the first integrity check can run; registries must buffer entire uploads.
Binary assets | Diagrams, embeddings, and fixture files can only ship base64-inlined, inflating size by a third and making diffs unreadable.
Partial reads | A reader that only wants the manifest (for indexing or trust decisions) still downloads and parses the full content.

ZIP solves all three with boring, universally available technology: per-entry access, native binary storage, streamable verification.
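
The partial-read case can be sketched with Python's standard zipfile module. This is a minimal illustration, not part of the proposal: the helper name read_manifest and the manifest field "name" are made up for the demo.

```python
import io
import json
import zipfile


def read_manifest(source) -> dict:
    """Parse only manifest.json; no other entry is decompressed."""
    with zipfile.ZipFile(source) as zf:
        # ZipFile loads the central directory, then seeks to this one entry.
        return json.loads(zf.read("manifest.json"))


# Demo archive built in memory; the manifest content is hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", json.dumps({"name": "demo"}))
    zf.writestr("content.json", "{}")
    zf.writestr("assets/embeddings.bin", b"\x00" * 1024)

manifest = read_manifest(buf)
```

An indexer working this way touches the central directory and one entry, regardless of how large content.json or the assets are.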

3 Design

Proposed layout — exactly three reserved names, everything else is an asset:

package.ctxpkgz
├── manifest.json          REQUIRED — same manifest as today, one addition
├── content.json           REQUIRED — the content member, compact JSON
└── assets/                OPTIONAL — binary attachments
    ├── architecture.png
    └── embeddings.bin
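
A writer for this layout might look like the sketch below. The function name write_ctxpkgz is hypothetical, store-only compression is an assumption (Q1 in §8 is still open), and the compact serialization shown here stands in for whatever spec §8 requires for the content bytes.

```python
import io
import json
import zipfile


def write_ctxpkgz(dest, manifest: dict, content: dict, assets: dict) -> None:
    """Write manifest.json, content.json, and assets/<name> entries."""
    # ZIP_STORED (no compression) is an assumption; the RFC leaves this open.
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest))
        # Compact JSON: no whitespace between tokens.
        zf.writestr("content.json", json.dumps(content, separators=(",", ":")))
        for name, data in assets.items():
            zf.writestr(f"assets/{name}", data)


# Usage with made-up data; the manifest is assumed to list its assets already.
out = io.BytesIO()
write_ctxpkgz(out, {"assets": []}, {"k": [1, 2]}, {"embeddings.bin": b"\x01\x02"})
```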

Integrity extends, it does not change: content_hash and sha256 are computed over content.json exactly as spec §8 defines for the inline member — the bytes of the entry are the content_json_bytes. Assets get a new manifest member:

"assets": [
  { "path": "assets/architecture.png",
    "sha256": "…64 hex chars…",
    "media_type": "image/png",
    "byte_size": 48211 }
]
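
One way a writer could produce such an entry is sketched here. The four field names come from the example above; the helper name asset_entry is hypothetical, and the media type is supplied by the caller rather than detected.

```python
import hashlib


def asset_entry(path: str, data: bytes, media_type: str) -> dict:
    """Build one element of the manifest's "assets" array."""
    return {
        "path": path,
        "sha256": hashlib.sha256(data).hexdigest(),  # 64 lowercase hex chars
        "media_type": media_type,
        "byte_size": len(data),
    }


# Made-up asset for illustration.
entry = asset_entry("assets/empty.bin", b"", "application/octet-stream")
```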

Readers MUST verify every listed asset hash and MUST reject archives containing entries not listed in the manifest (no smuggling). The signature covers the manifest as today — and therefore, transitively, every asset hash. Path traversal names (../, absolute paths, drive letters) MUST be rejected outright.
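
The three reader rules above can be sketched as follows. This is an illustration under assumptions, not reference code: the function names are hypothetical, directory entries are ignored when checking for unlisted files, and signature and content_hash verification are left out.

```python
import hashlib
import json
import zipfile

RESERVED = {"manifest.json", "content.json"}


def is_unsafe_name(name: str) -> bool:
    """True for '..' segments, absolute paths, and drive letters."""
    parts = name.replace("\\", "/").split("/")
    has_drive = len(name) >= 2 and name[0].isalpha() and name[1] == ":"
    return name.startswith(("/", "\\")) or ".." in parts or has_drive


def verify_archive(source) -> None:
    """Raise ValueError unless every entry is safe, listed, and hash-matched."""
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        # Rule 3: reject traversal names before reading anything else.
        for info in infos:
            if is_unsafe_name(info.filename):
                raise ValueError(f"unsafe entry name: {info.filename!r}")
        files = {i.filename for i in infos if not i.is_dir()}
        if not RESERVED <= files:
            raise ValueError("manifest.json or content.json is missing")
        manifest = json.loads(zf.read("manifest.json"))
        listed = {a["path"]: a["sha256"] for a in manifest.get("assets", [])}
        # Rule 2: every file must be reserved or listed (no smuggling).
        unlisted = files - RESERVED - set(listed)
        if unlisted:
            raise ValueError(f"unlisted entries: {sorted(unlisted)}")
        # Rule 1: every listed asset must exist and match its hash.
        for path, expected in listed.items():
            if path not in files:
                raise ValueError(f"listed asset missing: {path!r}")
            if hashlib.sha256(zf.read(path)).hexdigest() != expected:
                raise ValueError(f"asset hash mismatch: {path!r}")
```

Name checks run first so that a hostile archive is rejected before any entry body is read.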

4 Compatibility

Existing packages | Untouched. The JSON container remains valid indefinitely.
Existing readers | A v2.0 reader sees an unknown file type and fails closed, which is the desired behavior. Magic bytes differ (PK vs {), so misparsing is impossible.
Version bump | Minor: v2.1. Additive container, no change to any existing normative rule.
Registries | The registry protocol gains a content-type for the new container; the download endpoint is otherwise unchanged.
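
A v2.1 reader that accepts both containers could dispatch on the leading bytes, as in this sketch. The function name detect_container is hypothetical, and tolerating leading whitespace before the opening brace is an assumption the RFC does not state.

```python
def detect_container(head: bytes) -> str:
    """Classify a package by its first bytes: 'zip' or 'json'."""
    if head.startswith(b"PK"):
        return "zip"
    # Assumption: a JSON container may start with insignificant whitespace.
    if head.lstrip(b" \t\r\n").startswith(b"{"):
        return "json"
    raise ValueError("unknown container")


# A ZIP local file header begins with PK\x03\x04.
kind = detect_container(b"PK\x03\x04rest-of-archive")
```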

5 Conformance

Acceptance requires four new golden vectors before the spec text merges:

Vector | Expectation
valid-zip-with-assets.ctxpkgz | accept
invalid-zip-asset-hash.ctxpkgz | reject: asset bytes do not match manifest hash
invalid-zip-unlisted-entry.ctxpkgz | reject: archive contains an entry the manifest does not list
invalid-zip-path-traversal.ctxpkgz | reject: entry name escapes the archive root
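
A conformance runner over these vectors might be as small as the sketch below. It assumes a reader exposes a verify callable that raises on rejection; that interface and the names VECTORS and run_vector are hypothetical, and the vector files themselves do not exist yet.

```python
from typing import Callable

# Expected outcome per golden vector, taken from the table above.
VECTORS = {
    "valid-zip-with-assets.ctxpkgz": "accept",
    "invalid-zip-asset-hash.ctxpkgz": "reject",
    "invalid-zip-unlisted-entry.ctxpkgz": "reject",
    "invalid-zip-path-traversal.ctxpkgz": "reject",
}


def run_vector(verify: Callable[[bytes], None], data: bytes, expected: str) -> bool:
    """Return True when the reader's outcome matches the expectation."""
    try:
        verify(data)
        outcome = "accept"
    except Exception:
        # Assumption: any exception from the reader counts as a rejection.
        outcome = "reject"
    return outcome == expected
```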

6 Drawbacks

A second container doubles the surface every conforming reader must test. ZIP itself has sharp edges (duplicate entry names, zip-slip, compression bombs) that the spec must explicitly close. §3 covers entry allowlisting and path traversal; duplicate entry names and size caps are not yet specified (see Q1 and Q3 in §8). And "two ways to do it" is a real cost for a young standard; this RFC should only be accepted once a concrete package demonstrably does not fit the JSON container.
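
Two of these hazards can be checked from the central directory alone, before any entry is decompressed. The sketch below is illustrative: the function name is hypothetical and the 64 MiB cap is an invented placeholder, since the RFC has not fixed any limit.

```python
import zipfile
from collections import Counter

# Hypothetical cap for illustration only; the RFC sets no value.
MAX_ENTRY_BYTES = 64 * 1024 * 1024


def check_zip_hazards(source, max_entry_bytes: int = MAX_ENTRY_BYTES) -> None:
    """Raise ValueError on duplicate entry names or oversized entries."""
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()  # keeps duplicates, unlike a name lookup
        counts = Counter(info.filename for info in infos)
        dupes = sorted(name for name, n in counts.items() if n > 1)
        if dupes:
            raise ValueError(f"duplicate entry names: {dupes}")
        for info in infos:
            # file_size is the declared uncompressed size of the entry.
            if info.file_size > max_entry_bytes:
                raise ValueError(f"entry too large: {info.filename!r}")
```

Because file_size is a declared value, a stricter reader may also want to count bytes while streaming each entry.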

7 Alternatives

Alternative | Why not
tar + zstd | Better compression, but no per-entry random access without reading the stream, which defeats the partial-read motivation.
Asset URLs in JSON (fetch at install) | Breaks the core promise: a package is one self-contained, offline-verifiable artifact. Remote assets rot and can change after signing.
Bigger inline base64 | Status quo. Works, but the three pressures in §2 only grow.

8 Open questions

Q1. Compression: store-only (simplest verification) or permit DEFLATE with a decompressed-size cap?
Q2. Should assets be allowed in the JSON container too (base64-inlined), keeping one logical model across both containers?
Q3. Maximum archive size and per-entry caps: fixed in the spec, or left to registry policy?