Designing access for Talk recordings

A recording is not a file with a video player. It is a long-lived artifact whose access rules outlive the call, the membership, and sometimes the room itself. Here's a small framework for getting those rules right — and what Nextcloud Talk actually does today.

01 The premise

A meeting recording sits in an uncomfortable place. It looks like a file, but a file's access model — "whoever can open the path can play it" — is wrong for it. It looks like a chat message, but a chat message's access model — "anyone currently in the room" — is also wrong for it. Recordings outlive memberships. They are evidence. They are subject to legal hold. New users joining a room next year should not automatically gain access to last quarter's HR review, and old users removed from a room should not retain access to anything that was discussed after they left.

Most of the hard work in shipping a meeting recorder is therefore not capturing media. It is deciding, before any code is written, what authority controls who can later watch the result — and where that decision lives in the system. Below is a small framework for thinking about that authority, written for an engineer who knows Nextcloud Talk well enough to recognise the moving parts but is about to make architectural choices that will outlast the v1 ship date.

02 Three layers, kept separate

Before access control, language. Three concepts in Talk get confused with each other, and confusing them is where most recording bugs start.

Room: The conversation container. In Talk's model¹ a room has a token, a type (ONE_TO_ONE, GROUP, PUBLIC, plus NOTE_TO_SELF, CHANGELOG, and historical variants), an object association (breakout, event, instant-meeting, file, email), listability (LISTABLE_NONE/USERS/ALL), read-only state, lobby state, default participant permissions, password and public access, federation state (HAS_FEDERATION_*), recording state (RECORDING_NONE/VIDEO/AUDIO/STARTING/FAILED), and recording-consent settings. It is durable. It is where recording metadata should hang.
Attendee: The durable relationship between a room and an actor. Spreed's Attendee enumerates nine actor types:² users, groups, guests, emails, circles, bridged, bots, federated_users, phones. An attendee row says this actor has a standing relationship with this room; it does not mean they were on the call. It is the correct anchor for "who can come back later and replay."
Session: The live-state layer. Who is connected to the signalling server right now, who has audio enabled, who's screensharing. Useful for presence indicators and call UI. Not suitable as the basis for recording ACL. If a moderator drops their wi-fi during the call, they did not lose access to the recording.

Authorize off the attendee table or the room's policy. Never authorize off the session table. Sessions tell you who is in the building today; they cannot tell you who is allowed to come back tomorrow.
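
The separation above can be sketched in a few lines. This is a minimal illustration, not Talk's schema: the Actor and Room classes, the can_replay function, and the sample identifiers are all hypothetical names chosen for the example. The only point it makes is that the replay check reads the durable attendee set and never looks at sessions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    actor_type: str  # e.g. "users", "guests", "federated_users"
    actor_id: str


@dataclass
class Room:
    token: str
    attendees: set[Actor] = field(default_factory=set)  # durable relationships
    sessions: set[Actor] = field(default_factory=set)   # live connections only


def can_replay(room: Room, actor: Actor) -> bool:
    """Authorize from the durable attendee set; sessions are never consulted."""
    return actor in room.attendees


room = Room(token="abc123")
moderator = Actor("users", "alice")
room.attendees.add(moderator)
room.sessions.add(moderator)

# The moderator's wi-fi drops: the session disappears, the attendee row stays.
room.sessions.discard(moderator)
still_allowed = can_replay(room, moderator)

# Hypothetical case: a live connection with no attendee row behind it.
stray = Actor("guests", "guest-1")
room.sessions.add(stray)
stray_allowed = can_replay(room, stray)
```

Here still_allowed stays true after the disconnect and stray_allowed is false, which is the behavior the "building today, back tomorrow" rule asks for.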

03 What Talk does today

Before designing v2 it is worth being honest about v1, because v1 is what every reader is replacing or extending.

The recorder itself is not part of the PHP server. It's a separate HTTP daemon — nextcloud-talk-recording³ — that joins the call as a participant via the standalone signalling server (HPB).⁴ It captures the media stream out-of-process, encodes a file, and then POSTs that file back to spreed at /api/{v}/recording/{token}/store, owned by a chosen user. The file lands in that user's Nextcloud Files.⁵ A second endpoint, /api/{v}/recording/{token}/share-chat, publishes it into the conversation as a chat-attached file-share.⁵

Three consequences worth holding in mind as you design anything new:

Storage today is Files-backed, not app-private. Whatever recording you ship is a regular Nextcloud file owned by a user, subject to that user's quota, trashbin behavior, sharing controls, and direct WebDAV access.
"Publishing to the room" is a chat-attached file share. A recording is visible to room members not because the room owns it, but because a file share with a chat reference exists. The file-share's lifecycle is decoupled from the room's lifecycle — that's the source of a lot of edge-case bugs.
The recording daemon is part of the trusted computing base. It receives raw call media. It uploads files using an HPB-shared-secret-authenticated channel. If you build new authorization on top, the daemon's upload path is a peer security boundary, not application code you can ignore. (The protocol is closely related to the one used by other external apps in Nextcloud — see the AppAPI piece for how those signed-bearer channels work in general.)
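
To make the "peer security boundary" point concrete, here is a generic sketch of a shared-secret upload check: an HMAC-SHA256 over a per-request nonce plus the request body. The construction, the function names, and the idea of sending the nonce and checksum alongside the upload are assumptions for illustration; this is not a description of spreed's actual wire format or header names.

```python
import hashlib
import hmac
import secrets


def sign_upload(shared_secret: str, body: bytes) -> tuple[str, str]:
    """Daemon side: return (nonce, checksum) to send along with the upload."""
    nonce = secrets.token_hex(32)
    digest = hmac.new(shared_secret.encode(), nonce.encode() + body, hashlib.sha256)
    return nonce, digest.hexdigest()


def verify_upload(shared_secret: str, nonce: str, checksum: str, body: bytes) -> bool:
    """Server side: recompute the checksum and compare in constant time."""
    expected = hmac.new(
        shared_secret.encode(), nonce.encode() + body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, checksum)


nonce, checksum = sign_upload("example-secret", b"encoded-media-bytes")
accepted = verify_upload("example-secret", nonce, checksum, b"encoded-media-bytes")
tampered = verify_upload("example-secret", nonce, checksum, b"different-bytes")
```

Whatever the real scheme looks like, anyone holding the shared secret can produce an upload the server will accept, which is why the daemon belongs inside the audit scope.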

Any future "room-scoped recordings" model has to either keep this pipeline and reframe what the file means, or replace the pipeline and re-implement the recorder. The interesting work is the reframing.

04 The decision that matters

Most product debates about recording features go in circles because they are arguing about a UX outcome — "can Alice see the recording?" — when the underlying question is about which moment in time Alice's membership is being evaluated. There are two coherent answers. They behave differently when membership changes, and the choice is load-bearing.

Current-room ACL

Access is evaluated at view time. If the actor is in the room now, they can see all of the room's recordings.

Simple to implement; matches chat-history semantics.
Removed users lose access immediately, including to recordings of meetings they attended.
New users gain access to all historical recordings the moment they're added.
Group and circle changes propagate automatically — sometimes too automatically.

Snapshot ACL

Access is frozen at stop time. The set of authorized viewers is recorded with the recording.

Better match for compliance, HR, customer-support, and interview contexts.
Harder with groups, circles, federation, and public rooms — the expansion has to be persisted or re-resolvable.
Removed users may keep access unless the snapshot is re-resolved on disable/offboard events.
New users do not silently inherit history.

The article this one is meant to replace recommended current-room ACL as the v1 default. Don't do that. Current-room ACL is the right default for chat (which Talk already has), but it is the wrong default for recordings: recordings are the part of the conversation that outlives the conversation, and binding their visibility to today's membership silently leaks historical context every time a project room reorganises.

Default to snapshot ACL evaluated at recording stop, for authenticated room participants at that time. Offer current-room ACL as an explicit room-level opt-in, suitable for open collaboration rooms where shared history is the point. Treat "publish to room" as a separate, explicit act in either mode.

Snapshot is harder to implement: group and circle membership has to be expanded and persisted, federation has to be resolved into stable identifiers, and deleted or disabled users need explicit cleanup. That difficulty is the cost of doing the right thing, not a reason to do the wrong one.
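
The difference between the two modes is easiest to see when membership changes after the recording stops. The sketch below is a toy model under stated assumptions: AclMode, Recording, stop_recording, and can_view are hypothetical names, actors are plain strings, and group, circle, and federation expansion is skipped entirely.

```python
from dataclasses import dataclass
from enum import Enum


class AclMode(Enum):
    SNAPSHOT = "snapshot"          # default: frozen at stop time
    CURRENT_ROOM = "current_room"  # explicit room-level opt-in


@dataclass(frozen=True)
class Recording:
    room_token: str
    mode: AclMode
    snapshot: frozenset[str] = frozenset()


def stop_recording(room_token: str, members_now: set[str],
                   mode: AclMode = AclMode.SNAPSHOT) -> Recording:
    """Persist the authorized viewers together with the recording."""
    return Recording(room_token, mode, frozenset(members_now))


def can_view(recording: Recording, actor_id: str, members_now: set[str]) -> bool:
    """Snapshot mode reads the frozen set; current-room mode reads live membership."""
    if recording.mode is AclMode.SNAPSHOT:
        return actor_id in recording.snapshot
    return actor_id in members_now


at_stop = {"alice", "bob"}
snap = stop_recording("abc123", at_stop)
live = stop_recording("abc123", at_stop, AclMode.CURRENT_ROOM)

later = {"alice", "carol"}  # bob was removed, carol was added
```

With the later membership, carol can view the current-room recording but not the snapshot one, and bob is in the opposite position: he stays in the snapshot until a disable or offboard event re-resolves it.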

05 Storage is not authorization

Where the blob lives and who can read it are different decisions. The temptation is to let them collapse into each other: "we already have Files, just write a file and let Files permissions sort it out." That works for trivial cases and breaks for everything else. Decide them separately.

Files-backed: What Talk does today. Recordings are Nextcloud files owned by a user. Inherits quotas, trashbin, versioning, server-side encryption, the apps tab, and (crucially) the file-share surface. Easy to ship; easy to leak. Public-link sharing on the file silently escapes any recording-level policy you layer on top, so you have to constrain the share surface explicitly or pretend it doesn't exist.
App-private: The blob lives behind your own streaming and download endpoints; the recording metadata table is the only entrypoint. Cleaner trust boundary, harder to do well: you re-implement range requests, previews, retention, quotas, trashbin behavior, and a story for object/external storage. The right choice when recording access is materially different from "a file the user owns."

The decision affects more than tidiness. A non-exhaustive list of things storage choice changes:

End-to-end encryption. Server-side recording is fundamentally incompatible with E2EE: if the server can record the media stream, it can decrypt the media stream, and the room is not E2EE. Any future E2EE Talk mode either disables recording or moves it client-side with a key-holding participant.
Encryption at rest. Files-backed inherits whatever server-side encryption the deployment runs. App-private has to implement its own story, or be honest about not encrypting.
Quotas. Files-backed eats the owner's quota — choose the owner deliberately, because that user is the de-facto fiscal sponsor of every recording in every room they moderate. App-private bypasses user quota and forces you to design retention.
Trashbin, versions, public links, resharing. Files-backed inherits all of these for free, including the ways they can route around the recording policy. App-private inherits none, including the ways trashbin would have saved you from a fat-fingered delete.
Transcripts and summaries. Treat as separate sensitive artifacts. Whatever ACL the recording carries, the derived text should inherit. A summary leaking to a non-member is the same incident as the recording leaking.
Previews, transcodes, intermediate blobs. Every derived asset is a copy of the recording at a different fidelity. Same ACL applies, every endpoint that serves them passes through the same check.
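
One way to keep derived assets honest is to never give them an ACL of their own. In the sketch below, every transcript, summary, or preview points at its parent, and the check walks up to the root recording before deciding. The Artifact class, the index and recording_acl dictionaries, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    kind: str                        # "recording", "transcript", "summary", "preview"
    parent_id: Optional[str] = None  # None only for the recording itself


def root_recording(artifact: Artifact, index: dict[str, Artifact]) -> Artifact:
    """Follow parent links until the original recording is reached."""
    seen: set[str] = set()
    while artifact.parent_id is not None:
        if artifact.artifact_id in seen:
            raise ValueError("cycle in artifact parent links")
        seen.add(artifact.artifact_id)
        artifact = index[artifact.parent_id]
    return artifact


def can_read_artifact(artifact: Artifact, actor_id: str,
                      index: dict[str, Artifact],
                      recording_acl: dict[str, frozenset[str]]) -> bool:
    """A derived artifact is readable exactly when its root recording is."""
    root = root_recording(artifact, index)
    return actor_id in recording_acl.get(root.artifact_id, frozenset())


rec = Artifact("rec-1", "recording")
transcript = Artifact("tr-1", "transcript", parent_id="rec-1")
summary = Artifact("sum-1", "summary", parent_id="tr-1")
index = {a.artifact_id: a for a in (rec, transcript, summary)}
recording_acl = {"rec-1": frozenset({"alice"})}
```

A summary derived from a transcript derived from a recording resolves to the same viewer set as the recording, so there is no second ACL to drift out of sync.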

06 The policy matrix

A first draft of who-sees-what. These are defaults; specific deployments will want to relax or tighten them, but the defaults should be honest about what they imply.

Actor or situation	View	Why this default
Logged-in attendee, present at recording stop	yes	Authorized at the moment of capture. Loses access on offboard/disable.
Logged-in attendee, added after recording	no	Snapshot default. Joining a room does not grant retroactive replay.
Logged-in attendee, removed after recording	no	Disable/offboard events re-resolve the snapshot.
Room moderator at view time	yes	Matches Talk's moderator-centric recording controls; can also manage.
Anonymous guest in a public room	no	Replay is a stronger capability than live-call join.
User who can discover a listable room	no	Discoverability is not membership.
Holder of the room password	no	A password grants entry, not archive access. Different capability.
Federated user from a peer instance	policy	Requires cross-instance identity resolution. Default no until the federation layer can prove identity at view time.
Instance administrator (operational UI)	no	Admins can manage retention, storage, legal hold, audit. Playback requires an explicit, audited capability.
Instance administrator (legal hold export)	yes	An audited, justified capability — not silent playback.
Recording bot / daemon (write)	yes	Distinct actor; the upload path is its own auth surface.
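
The view column of the matrix translates naturally into data with a deny-by-default lookup. The situation keys, the Decision enum, and the federation_identity_proven flag below are hypothetical names for this sketch; the recording daemon row is left out because it describes a write path, not a view.

```python
from enum import Enum


class Decision(Enum):
    YES = "yes"
    NO = "no"
    POLICY = "policy"  # depends on a deployment-level condition


DEFAULT_VIEW_POLICY = {
    "attendee_present_at_stop": Decision.YES,
    "attendee_added_after": Decision.NO,
    "attendee_removed_after": Decision.NO,
    "moderator_at_view_time": Decision.YES,
    "anonymous_guest_public_room": Decision.NO,
    "listable_room_discoverer": Decision.NO,
    "room_password_holder": Decision.NO,
    "federated_user": Decision.POLICY,
    "admin_operational_ui": Decision.NO,
    "admin_legal_hold_export": Decision.YES,
}


def default_view(situation: str, federation_identity_proven: bool = False) -> bool:
    """Unknown situations are denied; 'policy' rows need an explicit condition."""
    decision = DEFAULT_VIEW_POLICY.get(situation, Decision.NO)
    if decision is Decision.POLICY:
        return federation_identity_proven
    return decision is Decision.YES
```

Keeping the defaults in one table makes a deployment's relaxations visible as diffs against it, and a situation nobody thought of falls through to "no".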

07 Where the simple model breaks

"Recordings are durable room artifacts" is a good slogan. It is also a half-truth, because some rooms are not the kind of thing that holds artifacts.

One-to-one rooms have no moderator concept distinct from participant; both parties are de facto equal. Recording controls and post-call ownership are symmetric. If one party deletes the conversation, the other's access has to survive somehow, which usually means the recording must outlive the room as a personal-files artifact.
Breakout rooms are children of a parent. The lifecycle is the parent's; the participants are a subset of the parent's. A recording made in a breakout is usually meaningful to that breakout's participants and the parent's moderators, not to every parent-room member. The simple "room-scoped" model has to learn about parent/child explicitly.
Federated rooms have attendees on remote instances. The recording file lives on the host instance, but a federated user's authentication lives on theirs. Either the host serves the recording over a cross-instance proxy, or the host pushes a copy to each peer, or federated viewers don't get access. There is no obvious right answer; the wrong answer is pretending the federation boundary doesn't exist.
Note-to-self rooms are personal. Treating recordings here as "room artifacts" is overkill — they are private user media. The model should special-case them or accept that the ACL is trivially "the user, and no one else."
Long-lived "project" rooms are where current-room ACL leaks worst. A room used for two years accumulates context that current members did not consent to having. Snapshot defaults pay rent here.
End-to-end encrypted rooms (when Talk grows them) cannot be server-side recorded at all. The bot has no keys. Any recording feature in an E2EE room must be client-side, by a participant holding the keys, with the recorded artifact opted-in explicitly.
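
Two of these cases, breakout rooms and E2EE rooms, can be sketched as explicit branches, which is the opposite of a silent fallback to "current room members." The RoomInfo fields, the kind strings, and the RecordingRefused exception are assumptions for illustration and do not mirror Talk's constants.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoomInfo:
    token: str
    kind: str                                # e.g. "group", "breakout", "note_to_self"
    participants: frozenset[str]
    moderators: frozenset[str] = frozenset()
    parent: Optional["RoomInfo"] = None      # set for breakout rooms
    e2ee: bool = False


class RecordingRefused(Exception):
    """Raised when server-side recording must not start."""


def start_recording(room: RoomInfo) -> str:
    if room.e2ee:
        # The server-side recorder holds no keys, so refuse explicitly.
        raise RecordingRefused("server-side recording is unavailable in E2EE rooms")
    return f"recording started in {room.token}"


def snapshot_viewers(room: RoomInfo) -> frozenset[str]:
    """Breakouts: own participants plus the parent's moderators, not all parent members."""
    if room.kind == "breakout" and room.parent is not None:
        return room.participants | room.parent.moderators
    return room.participants


parent = RoomInfo("parent", "group", frozenset({"alice", "bob", "carol"}),
                  moderators=frozenset({"alice"}))
breakout = RoomInfo("b1", "breakout", frozenset({"bob"}), parent=parent)
viewers = snapshot_viewers(breakout)
```

In this example carol is a member of the parent room but is not in the breakout's viewer set, and a note-to-self room falls out of the default branch as "the user, and no one else."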

08 Architecture shape

Once the policy is decided, the implementation tries to do one thing: make every read of a recording resolve to the same access check. Not duplicated in five controllers. Not different between WebDAV, the chat surface, and the recordings list. Not subtly different in the background job that builds previews. One service. Every entrypoint.

Recording metadata is its own table.
Room id, room token, owner/recorder actor, start and stop timestamps, the file reference, media type, size, the ACL mode at capture, and the resolved snapshot of authorized actor identifiers. Separate from the file. Survives file moves.
Blob storage is a separate decision, not a security model.
Files-backed or app-private — pick deliberately. Do not let path visibility imply recording visibility.
Every read passes through one policy service.
Call it RecordingAccessPolicy or whatever you like. Modeled after Talk's own ParticipantService and Nextcloud's OCP\Share\IManager: a single class with canRead(), canManage(), canPublish(), canDelete() methods. Controllers, jobs, WebDAV plugins, share-to-chat, transcripts, summaries, deletion, previews all call into it.
Talk access is reached through a thin adapter.
A RoomAccessAdapter resolves room/participant facts on behalf of the policy. Hides the direct dependency on Talk internals so the policy survives spreed refactors and federation changes.
The recording daemon is a peer, not an implementation detail.
Its upload path is a distinct authentication surface (HPB-shared-secret today). Audit it separately. A compromised recorder compromises every recording.
The UI shows room resources, not file paths.
Recordings appear in the room sidebar and in a per-user "recordings I can access" view. Hidden recordings are filtered server-side: the client never receives metadata for things the policy says it cannot see.
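
A rough shape for the policy service and its adapter, written in Python for brevity even though the real thing would live in the PHP server. Everything here is a sketch under assumptions: the method names are snake_case stand-ins for canRead() and canManage(), InMemoryRooms is a test double rather than a Talk class, and the publish and delete checks are omitted because they are distinct permissions that deserve their own rules.

```python
from dataclasses import dataclass
from typing import Protocol


class RoomAccessAdapter(Protocol):
    """Thin seam over Talk internals; the policy depends only on this."""

    def is_moderator(self, room_token: str, actor_id: str) -> bool: ...


@dataclass(frozen=True)
class RecordingMeta:
    recording_id: str
    room_token: str
    snapshot: frozenset[str]  # authorized actor ids resolved at stop time


class RecordingAccessPolicy:
    def __init__(self, rooms: RoomAccessAdapter) -> None:
        self._rooms = rooms

    def can_read(self, rec: RecordingMeta, actor_id: str) -> bool:
        return actor_id in rec.snapshot or self._rooms.is_moderator(rec.room_token, actor_id)

    def can_manage(self, rec: RecordingMeta, actor_id: str) -> bool:
        return self._rooms.is_moderator(rec.room_token, actor_id)

    def visible(self, recs: list[RecordingMeta], actor_id: str) -> list[RecordingMeta]:
        """Server-side filtering: unreadable recordings never reach the client."""
        return [r for r in recs if self.can_read(r, actor_id)]


class InMemoryRooms:
    """Test double standing in for the real adapter."""

    def __init__(self, moderators: dict[str, set[str]]) -> None:
        self._moderators = moderators

    def is_moderator(self, room_token: str, actor_id: str) -> bool:
        return actor_id in self._moderators.get(room_token, set())


policy = RecordingAccessPolicy(InMemoryRooms({"abc123": {"mona"}}))
recs = [
    RecordingMeta("r1", "abc123", frozenset({"alice"})),
    RecordingMeta("r2", "abc123", frozenset({"bob"})),
]
alice_sees = [r.recording_id for r in policy.visible(recs, "alice")]
mona_sees = [r.recording_id for r in policy.visible(recs, "mona")]
```

Because controllers, jobs, and list views would all go through the same object, swapping the adapter is the only change needed when Talk's internals move.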

09 Implementation checklist

The compact version, for PR reviews and architecture discussions:

Recording is anchored to room id and room token, not to the file path.
Recording metadata is independent of the blob; metadata survives file moves and trashbin.
Every read endpoint — list, metadata, playback, download, transcript, summary, share, delete — calls the same policy service.
The client never receives unauthorized recording ids or metadata; filtering is server-side.
Start, stop, publish, and delete are distinct permissions; read is yet another.
The snapshot of authorized actor identifiers is persisted at stop time, not re-resolved from current membership at view time.
Disable / delete / offboard of a user removes them from snapshots they're in.
Anonymous and listable-discoverer access to recordings is off until a deployment explicitly enables it.
Transcripts, summaries, previews, transcodes inherit the recording's ACL — not the file's.
Public-link shares on the underlying file are disabled or constrained for recording-owned files.
The recording daemon's upload channel is a separate, audited authentication surface.
Federated and breakout-room cases have explicit policy entries, not a fallback to "current room members."
E2EE rooms cannot start server-side recording — the start endpoint refuses with an explicit error.
Talk internals are accessed only through a RoomAccessAdapter.
Audit logs record view, publish, delete, and permission-change events at the policy layer.
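
Two of the checklist items, removing an offboarded user from snapshots and logging the permission change at the policy layer, fit in one small function. The snapshots mapping, the audit_log list, and the event field names are hypothetical; a real implementation would write to the recording metadata table and the instance's audit log.

```python
from datetime import datetime, timezone


def offboard_actor(snapshots: dict[str, set[str]], actor_id: str,
                   audit_log: list[dict]) -> list[str]:
    """Remove actor_id from every snapshot it appears in and audit each removal."""
    affected = []
    for recording_id, viewers in snapshots.items():
        if actor_id in viewers:
            viewers.discard(actor_id)
            audit_log.append({
                "event": "permission-change",
                "recording_id": recording_id,
                "actor_id": actor_id,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            affected.append(recording_id)
    return affected


snapshots = {"r1": {"alice", "bob"}, "r2": {"alice"}, "r3": {"bob"}}
audit_log: list[dict] = []
affected = offboard_actor(snapshots, "bob", audit_log)
```

After the call bob is gone from r1 and r3, r2 is untouched, and the audit log holds one entry per affected recording.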

One sentence: recordings are durable artifacts whose visibility is decided by a central recording-access policy that authorizes against a snapshot of room access at capture time, treats storage as an implementation detail, and refuses to let any client surface bypass the check. Everything else is plumbing.

Sources

1. Room states and constants in spreed/lib/Room.php: types, object types, listable, recording state, federation, read-only, mention permissions.
2. Nine actor types in spreed/lib/Model/Attendee.php: users, groups, guests, emails, circles, bridged, bots, federated_users, phones. (Plus the system pseudo-actors cli, system, sample, changelog, which are not user-facing.)
3. The recording daemon: nextcloud/nextcloud-talk-recording. README: "official recording server to be used with Nextcloud Talk."
4. The signalling server: strukturag/nextcloud-spreed-signaling (HPB). Required by the recording daemon.
5. Spreed-side recording endpoints: RecordingController.php exposes POST /api/{v}/recording/{token}/store for the daemon's upload and POST /api/{v}/recording/{token}/share-chat to publish into the chat. Storage flows through RecordingService.php into $rootFolder->getUserFolder($owner), so it is Files-backed by construction.
6. Existing precedents for the central-policy pattern: Talk's own ParticipantService centralizes room/participant permission checks; the Nextcloud platform's OCP\Share\IManager centralizes share-permission checks. A RecordingAccessPolicy is a domain-specific instance of the same idea, not a new architectural pattern.

Cross-LLM fact-check on this piece: GPT-5.5, Gemini 3.1 Pro, and Kimi K2.6 each pushed back on the original draft's "current-room ACL by default" recommendation; all three flagged the missing actor types and the HPB/daemon trust boundary. Where one model offered claims I could not verify (a fourth GUEST access level on rooms, a Podman-version detail), I read the source directly rather than averaging votes.