docs/reference/security.md · gabriel/muse — MuseHub

505 lines 18.5 KB

7855ccd0 feat: harden, test, and document all quality-dial changes Gabriel Cardona <gabriel@tellurstori.com> 2d ago

# Security Architecture — Muse Trust Boundary Reference

Muse is designed to run at the scale of millions of agent calls per minute.
Every data path that crosses a trust boundary — user input, remote HTTP
responses, manifest keys from the object store, terminal output — is guarded
by an explicit validation primitive. This document describes each guard,
where it applies, and the attack it prevents.

---

## Table of Contents

1. [Threat Model](#threat-model)
2. [Trust Boundary Design](#trust-boundary-design)
3. [Validation Module — `muse/core/validation.py`](#validation-module)
4. [Object ID & Ref ID Validation](#object-id--ref-id-validation)
5. [Branch Name & Repo ID Validation](#branch-name--repo-id-validation)
6. [Path Containment — Zip-Slip Defence](#path-containment--zip-slip-defence)
7. [Display Sanitization — ANSI Injection Defence](#display-sanitization--ansi-injection-defence)
8. [Glob Injection Prevention](#glob-injection-prevention)
9. [Numeric Guards](#numeric-guards)
10. [XML Safety — `muse/core/xml_safe.py`](#xml-safety)
11. [HTTP Transport Hardening](#http-transport-hardening)
12. [Local File Transport Hardening](#local-file-transport-hardening)
13. [Snapshot Integrity](#snapshot-integrity)
14. [Identity Store Security](#identity-store-security)
15. [Size Caps](#size-caps)

---

## Threat Model

Muse's primary threat surface has four entry points:

| Entry point | Source of untrusted data |
|---|---|
| CLI arguments | User shell input, agent-generated commands |
| Environment variables | CI systems, compromised orchestrators |
| Remote HTTP responses | MuseHub server, MitM attacker |
| On-disk data | Tampered `.muse/` directory, crafted MIDI / MusicXML files |

At the scale of millions of agent calls per minute, even a low-probability
exploitation path becomes a near-certainty. Every function that accepts
external data must validate it before use.

---

## Trust Boundary Design

Muse uses a layered trust model:

```
External world (untrusted)
      |
      |  CLI args, env vars, HTTP responses, files
      v
CLI commands  ←────────────────  muse/cli/commands/
      |
      |  validated, typed data only
      v
Core engine   ←────────────────  muse/core/
      |
      |  content-addressed blobs
      v
Object store  ←────────────────  muse/core/object_store.py
```

Rule: data is validated at the point it crosses from the external world
into the CLI layer, or from the network into the core. Internal functions
that call each other do not re-validate data they receive from trusted callers.

The validation module — `muse/core/validation.py` — sits at the absolute
bottom of the dependency graph. It imports no other Muse module. Every layer
may import it; it imports nothing above itself.

---

## Validation Module

`muse/core/validation.py` — the single source of all trust-boundary
primitives.

```
muse/core/validation.py
├── validate_object_id(s)         → str | raises ValueError
├── validate_ref_id(s)            → str | raises ValueError
├── validate_branch_name(name)    → str | raises ValueError
├── validate_repo_id(repo_id)     → str | raises ValueError
├── validate_domain_name(domain)  → str | raises ValueError
├── contain_path(base, rel)       → pathlib.Path | raises ValueError
├── sanitize_glob_prefix(prefix)  → str (never raises)
├── sanitize_display(s)           → str (never raises)
├── clamp_int(value, lo, hi)      → int | raises ValueError
└── finite_float(value, fallback) → float (never raises)
```

The convention: functions named `validate_*` raise on bad input; functions
named `sanitize_*` strip bad bytes and always return a safe string.

---

## Object ID & Ref ID Validation

Function: `validate_object_id(s)` and `validate_ref_id(s)`
Guard: enforces exactly 64 lowercase hexadecimal characters.
Attack prevented: path traversal via crafted object or commit IDs.

### Why this matters

Object IDs are used to construct filesystem paths:

```
.muse/objects/<id[:2]>/<id[2:]>
.muse/commits/<commit_id>.json
```

A crafted ID such as `../../../etc/passwd` followed by padding would construct
a path outside `.muse/`. Enforcing the 64-char hex format closes this class
of attack completely — no character in `[0-9a-f]{64}` can form a path
separator.
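
The check described above could be sketched as follows. This is a minimal illustration, not the code in `muse/core/validation.py`: the regex constant and the `object_path` helper shown here (including its `muse_dir` parameter) are assumptions made for the example.

```python
import re
from pathlib import Path

# Assumed pattern: exactly 64 lowercase hex characters (a SHA-256 digest).
_HEX64 = re.compile(r"[0-9a-f]{64}")


def validate_object_id(s: str) -> str:
    """Return s unchanged if it is a well-formed object ID, else raise."""
    # fullmatch() rejects trailing newlines that a "$" anchor would let through.
    if not isinstance(s, str) or _HEX64.fullmatch(s) is None:
        raise ValueError(f"invalid object ID: {s!r}")
    return s


def object_path(muse_dir: Path, object_id: str) -> Path:
    """Build the shard path only after the ID has been validated."""
    oid = validate_object_id(object_id)
    return muse_dir / "objects" / oid[:2] / oid[2:]
```

Because validation runs before any path is built, a value such as `../../../etc/passwd` never reaches the filesystem layer.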

### Where applied

- `object_store.object_path()` — before constructing the shard path
- `object_store.restore_object()` — before reading a blob
- `object_store.write_object()` — verifies the provided ID is valid hex
  and checks that the written content hashes to the provided ID
  (content integrity, not just format integrity)
- `store.resolve_commit_ref()` — sanitizes user-supplied ref before prefix scan
- `store.store_pulled_commit()` — validates commit and snapshot IDs from remote
- `merge_engine.read_merge_state()` — validates IDs read from MERGE_STATE.json
- `merge_engine.apply_resolution()` — validates the resolution object ID

---

## Branch Name & Repo ID Validation

Function: `validate_branch_name(name)` and `validate_repo_id(repo_id)`
Guard: rejects backslashes, null bytes, CR/LF, leading/trailing dots,
consecutive dots, consecutive slashes, leading/trailing slashes, and names
longer than 255 characters.
Attack prevented: path traversal via branch names used in ref paths, null
byte injection, and log injection via CR/LF.

### Branch name rules

| Allowed | Rejected |
|---|---|
| `main`, `dev`, `feature/my-branch` | Backslash: `evil\branch` |
| Digits, hyphens, underscores | Null byte: `branch\x00name` |
| Forward slashes (namespacing) | CR or LF: `branch\rname` |
| Up to 255 characters | Leading dot: `.hidden` |
| | Trailing dot: `branch.` |
| | Consecutive dots: `branch..name` |
| | Consecutive slashes: `feat//branch` |
| | Leading or trailing slash |
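
The rules in the table can be expressed as a short series of checks. The sketch below is an assumption about how such a validator might look, not the project's implementation; it also rejects the empty string, which this document lists as rejected in the local file transport section.

```python
_MAX_BRANCH_LEN = 255  # limit stated in the rules above


def validate_branch_name(name: str) -> str:
    """Return name unchanged if it satisfies the documented rules, else raise."""
    if not name:
        raise ValueError("branch name is empty")
    if len(name) > _MAX_BRANCH_LEN:
        raise ValueError("branch name is longer than 255 characters")
    # Backslash, NUL, CR and LF are rejected wherever they appear.
    for ch in ("\\", "\x00", "\r", "\n"):
        if ch in name:
            raise ValueError(f"forbidden character {ch!r} in branch name")
    if ".." in name or "//" in name:
        raise ValueError("consecutive dots or slashes in branch name")
    if name[0] in "./" or name[-1] in "./":
        raise ValueError("leading or trailing dot or slash in branch name")
    return name
```

Note that `../evil` is caught twice in this sketch, once by the consecutive-dot rule and once by the leading-dot rule.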

### Where applied

- `cli/commands/init.py` — `--default-branch` and `--domain` arguments
- `cli/commands/commit.py` — HEAD branch detection (HEAD-poisoning guard)
- `cli/commands/branch.py` — creation and deletion targets
- `cli/commands/checkout.py` — new branch creation via `-b`
- `cli/commands/merge.py` — target branch name
- `cli/commands/reset.py` — branch before writing the ref file
- `store.get_head_commit_id()` — branch from the ref layer

---

## Path Containment — Zip-Slip Defence

Function: `contain_path(base: pathlib.Path, rel: str) -> pathlib.Path`
Guard: joins `base / rel`, resolves symlinks, then asserts the result is
inside `base`.
Attack prevented: zip-slip (path traversal via manifest keys or
user-supplied relative paths).

### The zip-slip attack

A malicious archive or snapshot manifest can contain a key like
`../../.ssh/authorized_keys`. If the restore loop does:

```python
dest = workdir / manifest_key
dest.write_bytes(blob)
```

…then a crafted key writes outside the working directory. `contain_path`
closes this by checking:

```python
resolved = (base / rel).resolve()
if not resolved.is_relative_to(base.resolve()):
    raise ValueError("Path traversal detected")
```

### Symlink escape

`contain_path` resolves symlinks before the containment check. A symlink
inside `state/` that points to `/etc/passwd` would resolve to a path
outside `state/`, causing `contain_path` to raise before any data is
written.
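
Both cases, plain traversal and symlink escape, can be exercised with a self-contained sketch. The `contain_path` body below follows the check shown above; the `demo` helper, the file names, and the temporary directory layout are hypothetical and exist only for illustration.

```python
import os
import tempfile
from pathlib import Path


def contain_path(base: Path, rel: str) -> Path:
    """Join base / rel, resolve symlinks, and require the result to stay inside base."""
    root = base.resolve()
    resolved = (root / rel).resolve()
    if not resolved.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"Path traversal detected: {rel!r}")
    return resolved


def demo() -> list[str]:
    """Run three manifest keys through contain_path and record each outcome."""
    outcomes = []
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "state"
        workdir.mkdir()
        # A symlink inside the working directory that points outside it.
        os.symlink(tmp, workdir / "escape")
        for key in ("tracks/bass.mid", "../../.ssh/authorized_keys", "escape/x"):
            try:
                contain_path(workdir, key)
                outcomes.append("ok")
            except ValueError:
                outcomes.append("rejected")
    return outcomes
```

Under these assumptions only the first key is accepted; the other two resolve outside `state/` and raise before anything is written.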

### Where applied

- `cli/commands/checkout.py` — `_checkout_snapshot()` for every restored file
- `cli/commands/merge.py` — `_restore_from_manifest()` for every restored file
- `cli/commands/reset.py` — `--hard` reset restore loop
- `cli/commands/revert.py` — revert restore loop
- `cli/commands/cherry_pick.py` — cherry-pick restore loop
- `cli/commands/stash.py` — `stash pop` restore loop
- All 7 semantic write commands (arpeggiate, humanize, invert, quantize,
  retrograde, velocity_normalize, midi_shard) — output file paths
- `merge_engine.read_merge_state()` — conflict path list from MERGE_STATE.json
- `merge_engine.apply_resolution()` — resolution target file path

---

## Display Sanitization — ANSI Injection Defence

Function: `sanitize_display(s: str) -> str`
Guard: strips all C0 control characters except `\t` and `\n`, plus DEL
(`\x7f`) and C1 control characters (`\x80–\x9f`).
Attack prevented: ANSI/OSC terminal escape injection via commit messages,
branch names, author fields, and other user-controlled strings echoed to the
terminal.

### The attack

A commit message like:

```
Add feature\x1b]2;Hacked terminal title\x07 (harmless-looking)
```

…would, when echoed to a terminal, silently change the terminal's title bar or
execute other OSC/CSI sequences. At millions of agent calls per minute, a
malicious agent could systematically inject escape sequences into commit
messages that other users' terminals execute.

### Characters stripped

| Code point | Name | Why stripped |
|---|---|---|
| `\x00–\x08` | C0 (NUL to BS) | Control bytes; no legitimate use in display |
| `\x0b–\x0c` | VT, FF | Not standard line breaks; terminal control |
| `\x0d` | CR | Cursor return — log injection |
| `\x0e–\x1a` | SO to SUB | Control shift codes |
| `\x1b` | ESC | ANSI escape sequence start |
| `\x1c–\x1f` | FS to US | Control separators |
| `\x7f` | DEL | Backspace-style control |
| `\x80–\x9f` | C1 | CSI (`\x9b`) and other C1 escape starters |

Preserved: `\t` (tab) and `\n` (newline) — legitimate in commit messages.
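
The ranges in the table collapse into a single character class. The following is a minimal sketch of such a filter; the constant name is an assumption, and the real function may be implemented differently.

```python
import re

# C0 controls except \t (\x09) and \n (\x0a), plus DEL and the C1 range.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]")


def sanitize_display(s: str) -> str:
    """Strip terminal control characters from s; never raises."""
    return _CONTROL_CHARS.sub("", s)
```

Applied to the commit message above, the ESC and BEL bytes are removed and the remaining `]2;Hacked terminal title` is left as inert printable text.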

### Where applied

All `typer.echo()` paths that output user-controlled strings:
`log`, `tag`, `branch`, `checkout`, `merge`, `reset`, `revert`,
`cherry_pick`, `commit`, `find_phrase`, `agent_map`.

---

## Glob Injection Prevention

Function: `sanitize_glob_prefix(prefix: str) -> str`
Guard: strips the glob metacharacters `*`, `?`, `[`, `]`, `{`, `}` from
a string before it is used in a `pathlib.Path.glob()` pattern.
Attack prevented: glob injection turning a targeted prefix lookup into an
arbitrary filesystem scan.

The function `_find_commit_by_prefix()` in `store.py` constructs:

```python
list(commits_dir.glob(f"{sanitized}*.json"))
```

Without sanitization, a crafted prefix like `*/` would match `.json` files in
every subdirectory of `.muse/commits/`, and `**/` would walk the entire
directory tree rooted there.
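
A minimal sketch of the sanitizer and the lookup it protects is shown below. The translation-table approach and the public name `find_commit_by_prefix` are assumptions for illustration; only the glob pattern itself comes from this document.

```python
from pathlib import Path

# Deletes the six metacharacters listed above; all other characters pass through.
_GLOB_META = str.maketrans("", "", "*?[]{}")


def sanitize_glob_prefix(prefix: str) -> str:
    """Remove glob metacharacters so the prefix can only match literally."""
    return prefix.translate(_GLOB_META)


def find_commit_by_prefix(commits_dir: Path, prefix: str) -> list[Path]:
    """Return commit files whose names start with the sanitized prefix."""
    sanitized = sanitize_glob_prefix(prefix)
    return sorted(commits_dir.glob(f"{sanitized}*.json"))
```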

---

## Numeric Guards

Function: `clamp_int(value, lo, hi, name)` and `finite_float(value, fallback)`
Guard: raises `ValueError` for out-of-range integers; returns `fallback`
for `Inf` / `-Inf` / `NaN` floats.
Attack prevented: resource exhaustion via large numeric arguments; NaN
propagation causing silent computation corruption.

### Where applied

| Command | Flag | Bounds |
|---|---|---|
| `muse log` | `--max-count` | ≥ 1 |
| `muse find_phrase` | `--depth` | 1–10,000 |
| `muse agent_map` | `--depth` | 1–10,000 |
| `muse find_phrase` | `--min-score` | 0.0–1.0 |
| `muse humanize` | `--timing` | ≤ 1.0 beat |
| `muse humanize` | `--velocity` | ≤ 127 |
| `muse invert` | `--pivot` | 0–127 (MIDI note range) |
| MIDI parser | `tempo` | guard against `tempo=0` (division by zero) |
| MIDI parser | `divisions` | guard against negative or zero values |
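
The two guards could look roughly like this. The sketch treats `name` as an optional label used only in error messages, which is an assumption: this document shows `clamp_int` both with and without that parameter. The type check on `value` is likewise an assumption.

```python
import math


def clamp_int(value: int, lo: int, hi: int, name: str = "value") -> int:
    """Return value if lo <= value <= hi; raise ValueError otherwise."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer, got {value!r}")
    if not lo <= value <= hi:
        raise ValueError(f"{name} must be between {lo} and {hi}, got {value}")
    return value


def finite_float(value: float, fallback: float) -> float:
    """Return value as a float if it is finite; otherwise return fallback."""
    try:
        result = float(value)
    except (TypeError, ValueError, OverflowError):
        return fallback
    return result if math.isfinite(result) else fallback
```

For example, a `--depth` flag would pass through `clamp_int(depth, 1, 10_000, "depth")` before the value reaches any loop bound.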

---

## XML Safety

Module: `muse/core/xml_safe.py`
Guard: wraps `defusedxml.ElementTree.parse()` behind a typed `SafeET`
class.
Attack prevented: Billion Laughs (entity expansion DoS), XXE (external
entity credential theft), and SSRF via XML.

### The attacks

Billion Laughs:
A DTD-defined entity that expands to another entity, repeated exponentially.
Parsing a single small file consumes gigabytes of memory.

XXE (XML External Entity):
```xml
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>&xxe;</root>
```
The parser fetches the file and embeds its contents in the parse tree. With a
`SYSTEM "http://..."` URL, it becomes an SSRF vector.

### Why a typed wrapper

`defusedxml` does not ship type stubs. Importing it directly requires a
`# type: ignore` comment, which the project's zero-ignore rule bans.
`xml_safe.py` contains the single justified crossing of the typed/untyped
boundary and re-exports all necessary stdlib `ElementTree` types with full
type information.

```python
# Instead of:
import xml.etree.ElementTree as ET  # unsafe — no XXE protection
ET.parse("score.xml")

# Use:
from muse.core.xml_safe import SafeET
SafeET.parse("score.xml")  # fully typed, XXE-safe
```

---

## HTTP Transport Hardening

Module: `muse/core/transport.py` — `HttpTransport`

### Redirect refusal

`_STRICT_OPENER` is a `urllib.request.OpenerDirector` built with a custom
`_NoRedirectHandler` that raises on any HTTP redirect. This prevents:

- Authorization header leakage — a redirect to a different host would
  carry the `Authorization: Bearer <token>` header to the attacker's server.
- Scheme downgrade — a redirect from `https://` to `http://` would
  expose the bearer token over cleartext.
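
One way to build such an opener with the standard library is sketched below. The class and constant names mirror the ones above without the leading underscore, and the error message is invented for the example; the actual handler in `transport.py` may raise a different exception type.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Turn every 3xx response into an error instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Raising here stops urllib before it builds the follow-up request,
        # so the Authorization header is never sent to newurl.
        raise urllib.error.HTTPError(
            req.full_url, code, f"redirect to {newurl!r} refused", headers, fp
        )


# build_opener() drops the default redirect handler when given a subclass of it.
STRICT_OPENER = urllib.request.build_opener(NoRedirectHandler)
```

Requests would then go through `STRICT_OPENER.open(request)` rather than `urllib.request.urlopen(request)`, which uses the default redirect-following opener.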

### HTTPS enforcement

`_build_request()` uses `urllib.parse.urlparse(url).scheme` to check for
HTTPS. A URL that uses any other scheme raises before a connection is
attempted.
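
A minimal sketch of the scheme check, assuming a standalone helper named `require_https` (hypothetical; in Muse the check lives inside `_build_request()`):

```python
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return url unchanged if its scheme is https; raise ValueError otherwise."""
    # urlparse() lowercases the scheme, so "HTTPS://..." is accepted as well.
    scheme = urlparse(url).scheme
    if scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL (scheme {scheme!r})")
    return url
```

A bare host such as `musehub.ai` parses with an empty scheme and is rejected by this sketch too.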

### Response size cap

`_execute()` reads at most `MAX_RESPONSE_BYTES` (64 MB) from any HTTP
response. If a `Content-Length` header declares a larger body, the response is
rejected before reading begins. This prevents OOM attacks via an unbounded
response body.
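
The two-stage check can be sketched against any file-like body. Everything below is an assumption made for illustration: the helper name `read_capped`, the reading of 64 MB as `64 * 1024 * 1024` bytes, and the choice to read one byte past the cap so that an oversized body without a `Content-Length` header is detected rather than silently truncated.

```python
from typing import BinaryIO, Optional

MAX_RESPONSE_BYTES = 64 * 1024 * 1024  # assumed interpretation of "64 MB"


def read_capped(body: BinaryIO, content_length: Optional[str],
                limit: int = MAX_RESPONSE_BYTES) -> bytes:
    """Read a response body, refusing anything larger than limit bytes."""
    # Stage 1: reject early when the server declares an oversized body.
    if content_length is not None and content_length.isdigit():
        if int(content_length) > limit:
            raise ValueError("declared Content-Length exceeds the response cap")
    # Stage 2: read one byte past the cap to catch bodies with no or a false header.
    data = body.read(limit + 1)
    if len(data) > limit:
        raise ValueError("response body exceeds the response cap")
    return data
```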

### Content-Type guard

`_assert_json_content(raw, endpoint)` checks that the first non-whitespace
byte of a response body is `{` or `[` before calling `json.loads()`. This
catches HTML error pages (proxy intercept pages, Cloudflare challenges) that
would otherwise produce a misleading `JSONDecodeError`.
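
A sketch of this first-byte check is shown below. The exception type, the message format, and the `parse_response` wrapper are assumptions; only the `{` or `[` rule comes from the description above.

```python
import json


def assert_json_content(raw: bytes, endpoint: str) -> None:
    """Raise ValueError unless raw looks like a JSON object or array."""
    stripped = raw.lstrip()  # bytes.lstrip() removes leading ASCII whitespace
    if stripped[:1] not in (b"{", b"["):
        preview = raw[:40].decode("utf-8", errors="replace")
        raise ValueError(f"{endpoint}: expected JSON, got {preview!r}")


def parse_response(raw: bytes, endpoint: str):
    """Check the body shape first, then decode it."""
    assert_json_content(raw, endpoint)
    return json.loads(raw)
```

The benefit is in the error message: an HTML interstitial is reported with the endpoint and a preview of the body instead of a character-offset parse error.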

---

## Local File Transport Hardening

Module: `muse/core/transport.py` — `LocalFileTransport`

`LocalFileTransport` handles `file://` URLs — direct filesystem reads and
writes between two Muse repositories on the same host (or a shared network
mount). Because all I/O is local, the threat surface shifts from network
attacks to filesystem attacks.

### Symlink canonicalisation

`_repo_root()` calls `pathlib.Path.resolve()` on the path extracted from the
URL before any filesystem operation. `resolve()` dereferences all symlinks
and normalises `..` path components.

Attack prevented: a crafted `file://` URL, or a symlink pre-placed at the
URL target, that points to a sensitive directory (one without `.muse/`).
It is rejected because the containment check is made on the canonical
resolved path, not on the symlink itself.

### Branch name validation

`push_pack()` calls `validate_branch_name(branch)` before any I/O. This
rejects:

| Input | Why rejected |
|---|---|
| `../evil` | Leading `..` traversal |
| `foo\x00bar` | Null byte injection |
| `branch\revil` | CR log injection |
| `main\\escape` | Backslash path separator |
| `foo..bar` | Consecutive dots |
| `""` (empty) | Cannot form a valid ref path |

### Ref path containment

Even after `validate_branch_name` passes, the branch name is joined onto the
`.muse/refs/heads/` base path and validated with `contain_path()`.

`contain_path()` resolves symlinks on the result path and asserts it is
relative to the base directory. This provides defence-in-depth against:

- Pre-placed symlinks — an attacker who can write to `.muse/refs/heads/`
  before a push could place a symlink named after a legitimate branch that
  points outside the directory. `contain_path()` resolves that symlink and
  rejects the write.
- Future branch-name edge cases — any branch name that somehow passes
  `validate_branch_name` but resolves outside `refs/heads/` is still caught.

### Where applied

| Guard | Function | Attack prevented |
|---|---|---|
| `resolve()` | `_repo_root()` | Symlink traversal on URL path |
| `validate_branch_name()` | `push_pack()` | Branch-as-path injection |
| `contain_path()` | `push_pack()` | Pre-placed symlink in refs/heads/ |

---

## Snapshot Integrity

Module: `muse/core/snapshot.py`

### Null-byte separators in hash computation

`compute_snapshot_id()` and `compute_commit_id()` hash a canonical
representation of the manifest. The separator between key and value is the
null byte (`\x00`) rather than a printable character like `|` or `:`.

Why this matters: if the separator is `:`, then a file named `a:b` with
object ID `c` and a file named `a` with object ID `b:c` produce the same hash
input. The null byte cannot appear in filenames on POSIX or Windows, making
this class of collision structurally impossible.
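
The ambiguity is easy to reproduce. The sketch below is not `compute_snapshot_id()`; the canonical form it hashes (sorted path and ID pairs, each field followed by the separator) is an assumption chosen only to show the effect of the separator.

```python
import hashlib


def manifest_digest(manifest: dict[str, str], sep: bytes) -> str:
    """Hash sorted (path, object ID) pairs, terminating each field with sep."""
    h = hashlib.sha256()
    for path in sorted(manifest):
        h.update(path.encode("utf-8") + sep)
        h.update(manifest[path].encode("utf-8") + sep)
    return h.hexdigest()


# The two manifests from the paragraph above.
colon_named = {"a:b": "c"}
colon_in_id = {"a": "b:c"}

# With ":" both manifests serialise to the bytes b"a:b:c:" and collide.
same_with_colon = manifest_digest(colon_named, b":") == manifest_digest(colon_in_id, b":")
# With "\x00" the field boundaries stay unambiguous and the digests differ.
same_with_nul = manifest_digest(colon_named, b"\x00") == manifest_digest(colon_in_id, b"\x00")
```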

### Symlink and hidden-file exclusion

`walk_workdir()` skips:
- Symlinks — following symlinks during snapshot could include files
  outside the working directory, leaking content.
- Hidden files and directories (names starting with `.`) — `.muse/` must
  never be snapshotted; other dotfiles (`.env`, `.git`) are excluded to prevent
  accidental credential capture.
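
A traversal with both exclusions might be written as follows. This is a sketch built on `os.walk`, under the assumption that pruning happens per directory; the real `walk_workdir()` may use a different traversal or return type.

```python
import os
from pathlib import Path
from typing import Iterator


def walk_workdir(root: Path) -> Iterator[Path]:
    """Yield regular files under root, skipping symlinks and dot-entries."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Prune hidden and symlinked directories in place so os.walk skips them.
        dirnames[:] = sorted(
            d for d in dirnames
            if not d.startswith(".") and not os.path.islink(os.path.join(dirpath, d))
        )
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if name.startswith(".") or path.is_symlink():
                continue
            yield path
```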

---

## Identity Store Security

Module: `muse/core/identity.py`

The identity store (`~/.muse/identity.toml`) holds bearer tokens. Several
layered controls protect it:

| Control | Implementation | Threat prevented |
|---|---|---|
| 0o700 directory | `os.chmod(~/.muse/, 0o700)` | Other local users cannot list or traverse the directory |
| 0o600 from byte zero | `os.open()` + `os.fchmod()` before writing | Eliminates the TOCTOU window that `write_text()` + `chmod()` creates |
| Atomic rename | Temp file + `os.replace()` | A crash or kill signal during write leaves the old file intact — never a partial file |
| Symlink guard | Check `path.is_symlink()` before write | Blocks pre-placed symlink attacks targeting a different credential file |
| Exclusive write lock | `fcntl.flock(LOCK_EX)` on `.identity.lock` | Prevents race conditions when parallel agents write simultaneously |
| Token masking | All log calls use `"Bearer ***"` | Tokens never appear in log output |
| URL normalisation | `_hostname_from_url()` strips scheme, userinfo, path | `https://admin:secret@musehub.ai/repos/x` and `musehub.ai` resolve to the same key |
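
Three of these controls (symlink guard, 0o600 from byte zero, atomic rename) combine naturally into one write routine. The sketch below is POSIX-only and illustrative: the function name, the temp-file naming scheme, and the `fsync` call are assumptions, and the `fcntl.flock` lock from the table is omitted for brevity.

```python
import os
from pathlib import Path


def write_secret_atomically(path: Path, data: bytes) -> None:
    """Write data to path with mode 0o600, replacing any old file atomically."""
    if path.is_symlink():
        raise ValueError(f"refusing to write through symlink: {path}")
    # Hypothetical temp name in the same directory, so os.replace() stays on
    # one filesystem. O_EXCL fails if the temp file already exists.
    tmp = path.with_name(f"{path.name}.{os.getpid()}.tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.fchmod(fd, 0o600)  # enforce the mode regardless of umask, before any write
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one, never a mix
    except BaseException:
        # Best-effort cleanup so a failed write leaves no stray temp file.
        if tmp.exists():
            tmp.unlink()
        raise
```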

---

## Size Caps

| Constant | Value | Where enforced |
|---|---|---|
| `MAX_FILE_BYTES` | 256 MB | `object_store.read_object()` — cap per-blob reads |
| `MAX_RESPONSE_BYTES` | 64 MB | `transport._execute()` — cap HTTP response body |
| `MAX_SYSEX_BYTES` | 64 KiB | `midi_merge._msg_to_dict()` — cap SysEx data per message |
| MIDI file size | `MAX_FILE_BYTES` | `midi_parser.parse_file()` — cap file size before parse |

---

See also:

- [`docs/reference/auth.md`](auth.md) — identity lifecycle (`muse auth`)
- [`docs/reference/hub.md`](hub.md) — hub connection management (`muse hub`)
- [`docs/reference/remotes.md`](remotes.md) — push, fetch, clone transport
- [`muse/core/validation.py`](../../muse/core/validation.py) — implementation
- [`tests/test_core_validation.py`](../../tests/test_core_validation.py) — test suite

Content Address

Object ID (SHA-256)

e5001228c4f961f2c2dd258c6f06b5224ceac29eefc2a5cfbb64a3719bc90f3c

This file is immutable and content-addressed. The same SHA always refers to the same bytes, across every clone and every time.

Ref a44ac734

Snapshot 5380f2421dc5…