
Sources are able to access and delete attachments/uploads because they know the file_id #10

Open
eaon opened this issue Feb 15, 2023 · 9 comments
Labels
protocol research Issues for tracking protocol research and choices

Comments

@eaon
Contributor

eaon commented Feb 15, 2023

If I understand everything correctly, knowledge of a file_id allows anyone to delete that file on the server. The file_id is also sent by sources to journalists - right after uploading an attachment (or even when messaging, in the case of #8), a malicious source could immediately delete the file again.

If this is the case, and if we want to mitigate it (the attack scenario I can imagine is repeated large uploads with subsequent deletes, which would wear down the server's SSDs and eventually result in a hardware-failure-based DoS), the easiest solution would be to introduce a shared secret between journalists and the server. That would make journalist traffic identifiable, though, which isn't desirable.

I believe I have an idea of what a solution could look like that avoids distinguishing between sources and journalists, but want to check with y'all before I invest time in thinking through that scheme.

@eaon eaon changed the title Sources may be able to repeatedly upload and delete attachments/messages because they know file_id Sources may be able to repeatedly upload and delete attachments/messages because they know the file_id Feb 15, 2023
@smaury
Collaborator

smaury commented Feb 15, 2023

Nice catch!
I guess that as of now anyone on the internet could upload files as big as they want without being authenticated or referencing a valid ID.

https://github.com/lsd-cat/securedrop-poc/blob/main/server.py#L84-L99

@eaon eaon changed the title Sources may be able to repeatedly upload and delete attachments/messages because they know the file_id Sources are able to access and delete attachments/uploads because they know the file_id Feb 17, 2023
@eaon
Contributor Author

eaon commented Feb 17, 2023

OK, this has expanded a bit since I originally posted the issue. I've tried to make it as accessible as I could so everyone can follow along:

Present

My understanding of the present flow of attachment uploads is as such:

  1. p1 and p2 refer to either a journalist or a source, but never two sources.
  2. p1 uploads attachment
  3. Server responds with file_id
    a. file_id is known to p1 and the server
    b. attachment with file_id can now be accessed by p1 for as long as it is stored on the server
    c. attachment with file_id can now be deleted by p1
  4. p1 embeds file_id into message to p2
  5. p2 checks messages, learns about file_id
    a. file_id now known to p1 + server + p2
    b. attachment with file_id can now be accessed by p2 for as long as it is stored on the server
    c. attachment with file_id can now be deleted by p2
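The flow above can be condensed into a toy in-memory model (hypothetical names, not the PoC's actual API) that makes the problem concrete: the file_id is the only capability gating both access and deletion, so whoever learns it, including the source who uploaded, controls the file.

```python
import secrets

class Server:
    """Toy model of the present behaviour: knowing file_id is the
    only capability needed to fetch or delete an upload."""

    def __init__(self):
        self.files = {}

    def upload(self, blob: bytes) -> str:
        file_id = secrets.token_hex(16)
        self.files[file_id] = blob
        return file_id  # p1 learns file_id immediately (step 3)

    def download(self, file_id: str) -> bytes:
        return self.files[file_id]  # no check of who is asking (3b/5b)

    def delete(self, file_id: str) -> None:
        del self.files[file_id]  # a malicious source can do this too (3c)

server = Server()
fid = server.upload(b"attachment")
server.delete(fid)  # the uploader deletes their own upload at will
assert fid not in server.files
```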

If this is correct, I believe that 3b and 3c are situations we'd prefer to avoid, as we'd essentially create an anonymous command-and-control server that doesn't interfere with the functionality of the rest of the server at all, or would allow malicious actors to degrade the system as fast as rate limiting (of a non-proof-of-concept implementation) allows.

Desired functionality

In the short-hand I've used above: we want p2 to be able to download and delete uploads, but p1 only if they are a journalist. To expand it and contextualise it further:

  1. We want sources not to be able to access their uploads at any point
  2. We want sources not to be able to delete their uploads at any point
  3. Sources should be able to delete replies to them (in the case of Treat messages and metadata as attachments #8, replies are uploads)
  4. We want to have symmetry for file handling (as Add symmetry also for /file endpoint  #3 points out)
  5. We want journalist and source interaction with the server to look the same from the server's point of view

Potential solution (outdated)

Edit: this has been superseded by this comment

I spent some time thinking this through, and the only workable version I could come up with is described below. There may be better ways of arriving at the desired functionality; this is a first stab at it.

Prerequisites

  1. Introduce another pool of ephemeral public/private keys used exclusively for attachments
  2. Journalists share private keys for the attachment pool with each other (In-band or out-of-band)
  3. The public keys are used by the server to encrypt the file_id for journalists, preventing sources from knowing the file_id
  4. (Optional) The public keys may also be used by sources to encrypt the upload, therefore skipping the symmetric encryption stage

Flow

Below I'm describing prerequisites 1-3. (4 is just speculation; it would require a stage where the public key is also shared with the source, and would replace the symmetric encryption.)

  1. Like before: p1 and p2 refer to either a journalist or a source, but never two sources.
  2. Journalists upload a long-term-key signed ephemeral public-key bundle for the attachment key pool
  3. p1 uploads file
  4. Server uses a random public key from the attachment key pool to encrypt file_id, resulting in file_id_encrypted, but stores the attachment under file_id
  5. Server shares file_id_encrypted with p1
  6. p1 embeds file_id_encrypted in the message to p2
    • If p1 is a journalist
      • they also embed the private key to allow sources to know the file_id of their reply
      • they know the file_id and are able to access it and delete it
    • If p1 is a source
      • they pad the message to make it indistinguishable from a journalist reply
      • they don't know the file_id, and are unable to access or delete it
  7. p2 checks messages, learns file_id_encrypted
    • If p2 is a journalist
      • They learn file_id by using the respective key from the attachment pool
    • If p2 is a source
      • They learn file_id by using the key embedded in the message they were able to download
  8. p2 can now download and delete uploads
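A sketch of steps 3-7, using a symmetric HMAC-derived stand-in for the pool's public-key encryption (purely to show the data flow; the real scheme would use actual ephemeral key pairs, and all names here are hypothetical):

```python
import hashlib
import hmac
import secrets

def enc(key: bytes, msg: bytes) -> bytes:
    # Toy stream "cipher": XOR with an HMAC-derived keystream. XOR is its
    # own inverse, so enc() also decrypts. Stands in for real public-key
    # encryption with a key from the attachment pool; NOT secure.
    stream = hmac.new(key, b"stream", hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(msg, stream))

# Step 2: journalists provision a key pool (symmetric stand-ins here)
pool = {i: secrets.token_bytes(32) for i in range(3)}

# Steps 3-5: p1 uploads; the server stores under file_id but only ever
# reveals file_id_encrypted, made with a random key from the pool
store = {}
file_id = secrets.token_bytes(16)
store[file_id] = b"attachment"
key_id = secrets.choice(list(pool))
file_id_encrypted = enc(pool[key_id], file_id)

# Step 7: a journalist holding the pool recovers file_id;
# a source sees only file_id_encrypted and cannot address the stored file
assert enc(pool[key_id], file_id_encrypted) == file_id
```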

Addendum

This bums me out a bit as it is quite a bit of overhead, but I believe it's manageable, as the rest of the scheme so far is also quite minimal. Incidentally, this would also address #4, because the file_id is no longer an arbitrary value set by p1.

However: if we don't deem points 4 and 5 of Desired functionality important enough, the easiest "fix" would be a shared secret between server and journalists. This would imply that the server could distinguish between journalists and sources at every upload check, as uploads from sources and journalists would have to be treated differently.

Happy to discuss sync or async in this ticket.

@TheZ3ro
Collaborator

TheZ3ro commented Feb 18, 2023

I read that carefully and I like your proposal for countering "spam" and large/multiple uploads, but I think that even this way a source could always submit multiple files (send multiple submissions), even if we establish a secure journalist-server channel and encrypt file_ids.

About deletion, IMHO we are pursuing the "wrong" goal. Even before the PoC I was not really keen on exposing the file-download and file-deletion APIs. The current PoC implements those features server-side, but by definition we cannot trust the server to serve the right file or to actually delete a file (due to a bug, or if law enforcement controls the server).
Could actively deleting a file also count as "destroying evidence", and leak sensitive metadata?

So the solution is to implement the function in the only parties we trust:

  • the journalist, by deleting the symmetric key and, more importantly, the ephemeral key, so that the submission cannot be fetched even if the server still serves it;
  • the source that made the submission, by deleting the symmetric and ephemeral keys for the same reasons.

@eaon
Contributor Author

eaon commented Feb 18, 2023

Interesting points!

  • the Journalist by deleting the symmetric key and more importantly the ephemeral key so that the submission will not be fetched even if the server still serves it.
  • the source that performed submission by deleting the symmetric and ephemeral key for the same reasons.

On the journalist side I can see how this would work; on the source side I'm having a harder time imagining it playing nicely with plausible deniability and the related UX implications.

To elaborate: since the source will not have local state and will always depend on what the server provides (other than its key derived from the passphrase), not exposing a deletion API would slightly weaken one of the advantages of the current design. Generally, when there is no reply on the server, there is no way to prove that this "account" (key) has ever been seen by the server. But if there is a reply that cannot be deleted by the source, the source's existence is provable for the whole window in which the server keeps replies available, instead of that window being decided by the source. For further context: in the current SD design, sources are already encouraged to delete replies from journalists as soon as they read them, but we don't have reliable data on whether this actually happens. Source accounts, however, can only be deleted by journalists.

But speaking of not having data: we also don't know how long we'd need to keep replies around in the first place, or whether sources return regularly, sporadically, or at all. With the PoC making journalist and source interaction with the server essentially the same, I'm worried we'd then have to default to a very long retention period, effectively replicating the same issue of accounts being provable (as is the case with SD right now), despite there being no "real" accounts in the PoC.

As far as metadata leakage goes, I also wonder about this and would like more analysis of this point, but currently I'm sceptical that deletion is in any way worse than access in the first place.

Other than that, I am no lawyer but at the point where they can prove you are a whistleblower, "tampering with evidence" may be the least of your problems.

@cfm
Member

cfm commented Feb 21, 2023

Trying to find a way to avoid the extra key pool from #10 (comment); please poke holes liberally:

  1. Source submits X encrypted to journalists
    • Source learns X's ID
  2. Server "submits" X' encrypted to journalists (cost: O(2n))
    • X' "acquires lock" on X
    • Source does not learn X''s ID
  3. Journalist decrypts X and X'
    • Journalist learns both IDs
  4. Journalist requests deletion of X' and X
    • Server deletes X'
    • X' "releases lock" on X
    • Server deletes X

Leakage: If the requestor knows about X', then they are Journalist.
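The lock scheme above might be modelled as follows (a sketch under the assumption that X' is simply a second, server-generated ID that only journalists learn; all names hypothetical):

```python
import secrets

files = {}  # file_id -> blob
locks = {}  # shadow id X' -> locked file_id X

def submit(blob: bytes) -> tuple[str, str]:
    """Source submits X (step 1); server 'submits' shadow X' that
    acquires a lock on X (step 2). X' is encrypted to journalists;
    the source never learns it."""
    x = secrets.token_hex(8)
    files[x] = blob
    x_prime = secrets.token_hex(8)
    locks[x_prime] = x
    return x, x_prime

def delete(file_id: str) -> bool:
    if file_id in locks:  # deleting X' releases the lock on X (step 4)
        locks.pop(file_id)
        return True
    if file_id in locks.values():  # X is still locked: refuse
        return False
    return files.pop(file_id, None) is not None

x, x_prime = submit(b"attachment")
assert delete(x) is False        # the source, knowing only X, cannot delete
assert delete(x_prime) is True   # the journalist deletes X' first...
assert delete(x) is True         # ...and only then X
```

The leakage noted above is visible here too: only a requester who can name X' ever gets past the lock, which marks them as a journalist.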

@eaon
Contributor Author

eaon commented Feb 21, 2023

A similar approach that I considered but discarded due to concerns about symmetry and potentially easier traffic analysis:

  • For every file upload, the server generates a (random) secret
  • This secret is then sent to all the journalists (using keys from the ephemeral pool)
  • This only allows journalists to delete/access source submissions
  • As for journalist replies: because journalists always receive secrets for uploads, journalist clients can wait for that secret to arrive and then share it with a source, allowing the source to access/delete the attachment/file

Giulio also pointed out that another ephemeral pool, or using ephemeral keys for this at all, may not be necessary, as the long-term keys could be used too. The major downside of such a solution is the round-trips it would make necessary; more explicitly, journalists would need to sync messages in order to send a message …

Edit: come to think of it, we could just have a journalist-only endpoint that shares secrets and only responds to signed requests. Even easier

@eaon
Contributor Author

eaon commented Feb 22, 2023

Going through the shared-secret bit again, I realised that I had only considered a secret coming from the server, or one that is "static" and shared by server and journalists. Neither approach works with the symmetry constraint I've tried to keep in mind. But if the secret is provided by journalists and used by the server like the public keys in the scheme above, I think we'd still be good.

My instinct was to use public keys so as not to trust the server that much, but we already have to trust it to at least serve the right files, from which it follows that using cryptography here, rather than plain-text secrets, is overkill. So here's the revised scheme:

  1. Like before: p1 and p2 refer to either a journalist or a source, but never two sources.
  2. Journalists upload long-term-key-signed public-secret key-value pairs. All public-secret pairs must be shared among all journalists and the server.
  3. p1 uploads file
  4. Server saves the file and picks a random public-secret pair; the secret becomes the token required to access/delete file_id
  5. In its response, however, the server shares file_id and public, which points to the token secret only for journalists, who already have access to all public-secret pairs.
  6. p1 embeds file_id in the message to p2
    • If p1 is a journalist, they already know public-secret pairs, so using the response from the server they also embed the secret in the message to p2
    • If p1 is a source, they don't have access to secret, so instead they also embed public returned by the server
  7. p2 checks messages, learns file_id
    • If p2 is a journalist they also learn public which they can map to secret (or they might also learn secret immediately in journalist-to-journalist messages)
    • If p2 is a source they also learn secret because journalists embedded it
  8. p2 can now download and delete uploads
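A minimal sketch of the revised token scheme (hypothetical names; `pairs` stands for the shared public-secret list, and the secret gates both download and deletion):

```python
import secrets

# Step 2: journalists generate public -> secret pairs, shared with the server
pairs = {secrets.token_hex(8): secrets.token_hex(16) for _ in range(3)}

store = {}  # file_id -> (required secret, blob)

def upload(blob: bytes) -> tuple[str, str]:
    """Steps 3-5: the server binds the file to a random pair and returns
    (file_id, public); only holders of the pair list map public -> secret."""
    file_id = secrets.token_hex(8)
    public = secrets.choice(list(pairs))
    store[file_id] = (pairs[public], blob)
    return file_id, public

def delete(file_id: str, secret: str) -> bool:
    expected, _ = store.get(file_id, (None, None))
    if expected is not None and secrets.compare_digest(expected, secret):
        del store[file_id]
        return True
    return False

file_id, public = upload(b"attachment")
assert not delete(file_id, "guess")    # a source can't delete its own upload...
assert delete(file_id, pairs[public])  # ...a journalist holding the pairs can
```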

The downside is that this still requires a pool of public-secret key-value pairs, but since they don't have to be actual keys, generating them is much less resource-intensive, so we can probably generate enough at once to last us a while.

@eaon
Contributor Author

eaon commented Oct 30, 2023

Given the recent changes to the protocol as well as to the codebase, I've been revisiting this, because I never really loved the token-pair solution. Maintaining a list of token pairs is rather unnecessary; it's more straightforward overall if journalists have access to a private key that the server generates and distributes every time a journalist is onboarded/offboarded.

For every attachment, the server would generate an ephemeral key pair, and the public key would be sent back in the response to a successful upload. The attachment's address would then be the shared secret between the (server-distributed) journalists' key and the ephemeral upload key.

Bonus: when a party is offboarded, all attachment addresses can be recalculated at once, barring the offboarded party from accessing attachments. In the token-pair version, by contrast, an offboarded party could theoretically keep the private list (if the pairs weren't replaced) and try to download attachments without being found out.
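The address derivation can be sketched with a toy finite-field Diffie-Hellman group, chosen purely for readability (the actual protocol would presumably use a modern curve such as X25519; nothing about this toy group is secure, and all names are hypothetical):

```python
import hashlib
import secrets

# Toy DH group: a Mersenne prime modulus and small generator. Real code
# would use X25519 or an RFC 3526 MODP group instead -- NOT secure.
P = 2**127 - 1
G = 3

# The journalists' shared private key j; the server holds only g^j.
j = secrets.randbelow(P - 2) + 1
J_pub = pow(G, j, P)

# Per attachment, the server generates an ephemeral key e and returns
# g^e in the response to a successful upload.
e = secrets.randbelow(P - 2) + 1
E_pub = pow(G, e, P)

# Both sides derive the same attachment address H(g^(j*e)); a source
# seeing only g^e cannot, and rotating j re-addresses everything at once.
addr_server = hashlib.sha256(str(pow(J_pub, e, P)).encode()).hexdigest()
addr_journalist = hashlib.sha256(str(pow(E_pub, j, P)).encode()).hexdigest()
assert addr_server == addr_journalist
```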

@eaon
Contributor Author

eaon commented Oct 30, 2023

Devil is in the details:

Existing messages can't simply be recalculated, because shared secrets (i.e. attachment addresses) are embedded in messages to sources. So old addresses / shared secrets would have to be retained until the last message of the set saved at the time of private-key rotation has been deleted.

Some metadata to help identify which private key is to be used for a particular message would probably make this a lot easier to handle in practice.

Also: the server should probably not be the one to generate the private key and then send it out, that would make identity correlation way too easy. So probably a newsroom responsibility?

@lsd-cat lsd-cat added the protocol research Issues for tracking protocol research and choices label Dec 22, 2023