
An MPC based privacy-preserving flexible cryptographic voting scheme

Some of the big problems cryptographic voting schemes must solve are ensuring anonymity for the voters, ensuring that votes can’t be manipulated or otherwise tampered with, that you can be certain your vote has been included and counted correctly, that the full vote count is performed correctly, that the implementation is secure, that votes can’t be selectively excluded, that fake votes won’t be counted, and so on.

My suggested voting scheme below attempts to account for all these problems, as well as some more. Originally I posted about this idea on Reddit here: http://www.reddit.com/r/crypto/comments/r003r/are_others_interested_in_cryptographybased_voting/c42lo83

This voting scheme relies on the usage of public key encryption (asymmetric cryptography), Secure Multiparty Computation (MPC), Shamir’s Secret Sharing Scheme (SSSS), Zero-knowledge proofs (ZKP) and personal smartcards to implement signing and encryption of the votes.

Every voter has a personal keypair, using asymmetric cryptography, stored on a smartcard that may be embedded on, for example, state-issued ID cards. As a simple way of improving the security margin (to reduce the risk of the private key having been logged, or of the key being extracted in transit through TEMPEST-class attacks), a new keypair is generated on the card once the owner has received it and digitally signs a notification to replace the old keypair. The card-issuing entity verifies the identity of the voters and thus of the card owners, and tracks which public key is linked to each card.

Secure Multiparty Computation (MPC) can be described as a way of letting several entities create a shared “virtual machine” that nobody can manipulate or see the inside of, in order to simulate a secure trusted third party server. Thanks to advanced cryptography we can use distrust to our advantage, since strong implementations of MPC can’t be exploited unless a majority of the participants collude maliciously against the rest. A number of different organizations with conflicting interests participate in the MPC based voting process, such as the EFF, ACLU, NSA, FBI, the White House, those running the election and more. Because they each run one node following the MPC protocols, they know nothing other than what they put in and what they are supposed to get as output from it – and because they DO NOT want to work together to spy on or alter the result, it’s safe!

As a part of the initial setup process, they each create a random “seed” (a large random number) that they provide as input to the MPC. First of all, when the MPC system has the random seeds, it XORs them all together to ensure the result is random (XOR anything with a random string and the output is random – this means that only one participant needs to be honest and use a truly random number). That output is then used as the seed for generating secure keys and random numbers, including the voting system’s main MPC keypair. The MPC participants also provide a list of the eligible voters and their respective public keys. All participants must provide IDENTICAL lists, or the MPC algorithm’s logic will detect the mismatch and simply stop with an error. This means that all MPC participants have an equal chance to verify the list of voters in advance, because the list can’t be altered after they all have decided together which one to use. Something like a “vote manifest” is also included to identify the particular vote and declare the rules in use.
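To make the seed-combination step concrete, here is a minimal Python sketch of the XOR combination (in the real scheme this happens inside the MPC; the function names are just illustrative):

    # Combining the participants' random seeds with XOR: the result is
    # unpredictable as long as at least one seed is truly random.
    import secrets

    def combine_seeds(seeds: list[bytes]) -> bytes:
        """XOR equal-length seeds together; one honest random seed suffices."""
        combined = bytes(len(seeds[0]))
        for seed in seeds:
            assert len(seed) == len(combined), "all seeds must be the same length"
            combined = bytes(a ^ b for a, b in zip(combined, seed))
        return combined

    # Example: five participants each contribute a 32-byte seed.
    seeds = [secrets.token_bytes(32) for _ in range(5)]
    master_seed = combine_seeds(seeds)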

The MPC system will then use its main keypair to sign the voter list and the manifest, and then it will use Shamir’s Secret Sharing Scheme (SSSS) to split its private key into one part for each MPC participant, and provide each MPC participant with the public key, the signed manifest, the voter list and an individual share of the main keypair’s private key. SSSS is a method of splitting up data so that it can only be recovered if you have enough shares, which in the case of the voting system means all the shares of all the MPC participants (other thresholds are possible, such as 2 of 3 or 45 of 57 or anything else you need). If you have fewer shares than the threshold, you are no better off than if you had none when trying to restore the data.
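Shamir’s scheme itself is standard; a toy Python version of the split/recover threshold idea (for illustration only – a real deployment would use a vetted implementation, and here it would run inside the MPC) could look like this:

    # Toy Shamir secret sharing over a prime field.
    import random

    PRIME = 2**127 - 1  # a Mersenne prime used as the field modulus

    def split(secret: int, shares: int, threshold: int):
        """Split `secret` into `shares` points; any `threshold` of them recover it."""
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
        def f(x):
            return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        return [(x, f(x)) for x in range(1, shares + 1)]

    def recover(points):
        """Lagrange interpolation at x = 0 recovers the secret."""
        secret = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = (num * -xj) % PRIME
                    den = (den * (xi - xj)) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    shares = split(secret=123456789, shares=5, threshold=3)
    assert recover(shares[:3]) == 123456789   # any 3 of the 5 shares suffice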

Time for voting. The public MPC key is now distributed EVERYWHERE. On every advertisement about the vote, the key is there (maybe in QR code form). This ensures that everybody knows what it is, and thus we prevent man-in-the-middle (MITM) attacks against voters (which would mean somebody swapping out the MPC key to find out what people voted for).

Now the voter makes his vote. He generates a nonce (a unique number used once), makes his vote, signs it with his keypair, and encrypts this with the public MPC key (the signing and encryption are both done on the personal smartcard in one go). This vote is then sent to the voting management organization (maybe this is done on-the-spot if the voter is at a voting booth). Since the vote wasn’t encrypted with the voter’s own keypair, he CAN NOT decrypt it, which means that nobody can prove what he voted for using just the encrypted message. To know what a person votes for, you need to physically watch him vote.
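As a rough sketch of this voter-side step, assuming PyNaCl-style signing and sealed-box encryption (the scheme doesn’t prescribe specific primitives, and in practice this runs on the smartcard; the manifest identifier is made up):

    import json, secrets
    from nacl.signing import SigningKey
    from nacl.public import PrivateKey, SealedBox

    voter_signing_key = SigningKey.generate()    # lives on the voter's smartcard
    mpc_private_key = PrivateKey.generate()      # in the real scheme this only exists inside the MPC
    mpc_public_key = mpc_private_key.public_key  # ...and this is the widely published MPC key

    ballot = json.dumps({
        "manifest": "election-2014-06",    # hypothetical manifest identifier
        "nonce": secrets.token_hex(16),    # unique number used once
        "choice": "party X",
    }).encode()

    signed_ballot = voter_signing_key.sign(ballot)                        # binds the vote to the voter's key
    encrypted_vote = SealedBox(mpc_public_key).encrypt(bytes(signed_ballot))  # only the MPC key can open it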

To add a level of transparency in the vote submission process, all votes are registered on a blockchain (a series of data blocks, each linked to the previous block in the chain using cryptographic hashes, so that you can’t replace or modify a block without changing all the hashes in the chain after it), such as Bitcoin’s or Namecoin’s, and they are digitally signed by the voting management organization to prove it has seen them. This means that you can nearly instantly verify that your vote is going to be included in the count and won’t be excluded. Attempts at excluding votes from certain areas or certain users would be obvious and provable within hours. Encrypted votes can’t be modified without detection, and they can also NOT be modified in a way which would change what they count towards and remain valid – any modified votes WILL be detected by the MPC system and rejected. Fake votes will also be detected and rejected. To make sure your encrypted vote will be counted, you just need to make sure it is included unmodified. When the time to vote ends, new submissions are no longer accepted or signed by the vote management organization.
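The blockchain in question would be an existing one such as Bitcoin’s or Namecoin’s; this standalone toy chain just illustrates why a registered vote can’t be silently removed or modified afterwards:

    # Toy hash chain: every block commits to the previous block's hash.
    import hashlib, json, time

    def make_block(prev_hash: str, encrypted_votes: list[str]) -> dict:
        body = {"prev": prev_hash, "votes": encrypted_votes, "time": int(time.time())}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return body

    genesis = make_block("0" * 64, [])
    block1 = make_block(genesis["hash"], ["<encrypted vote A>", "<encrypted vote B>"])
    block2 = make_block(block1["hash"], ["<encrypted vote C>"])
    # Changing a vote in block1 changes block1's hash, which no longer matches
    # the "prev" value committed to in block2 - the tampering is detectable.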

For efficiency in the MPC counting and for transparency, the voting management organization gathers all the encrypted votes that were signed and registered in the blockchain, takes the hash of the last block and generates a Zero-knowledge proof that all votes submitted before that last block with the given hash are included in the vote list. It digitally signs this vote list and publishes it together with the Zero-knowledge proof.

Then it is time for the vote counting. The MPC participants hand the MPC their individual SSSS shares of the master keypair, the signed vote list with the blockchain hash and the Zero-knowledge proof, the manifest and list of voters, the counting rules, random seeds, and all other data it needs. The MPC keypair is reassembled inside the MPC system using SSSS. It verifies the Zero-knowledge proof that the vote list is complete, decrypts the votes, verifies all votes (checks signatures, syntax and that they follow the rules from the manifest), checks that no voter’s key is used more than once (duplicates are discarded; also, a vote of yours registered later in the blockchain can replace previous ones), and counts them according to the chosen method of vote counting. When it is done, it generates the voting statistics as output, where each vote is listed together with its nonce, specifies which blockchain hash it was given (to show it has processed all votes registered in the blockchain), references the manifest, and the MPC then signs this output. Besides the vote result itself, the statistics could also include things like the number of eligible voters (how many there were on the voting list), the number of votes, how many parties there were, how many votes each party got, etc…
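A hedged sketch of the dedupe-and-tally part of that logic (decryption and signature verification are represented here by pre-computed fields on each vote; in the real scheme all of this happens inside the MPC):

    from collections import Counter

    def count_votes(decrypted_votes, eligible_public_keys):
        latest = {}  # voter public key -> choice of their most recent valid vote
        for vote in decrypted_votes:  # votes are in blockchain (submission) order
            if vote["voter_key"] not in eligible_public_keys:
                continue              # fake vote: signer is not on the voter list
            if not vote["signature_valid"]:
                continue              # tampered or malformed vote
            latest[vote["voter_key"]] = vote["choice"]  # a later vote replaces earlier ones
        return Counter(latest.values())

    tally = count_votes(
        [{"voter_key": "pk1", "signature_valid": True, "choice": "A"},
         {"voter_key": "pk2", "signature_valid": True, "choice": "B"},
         {"voter_key": "pk1", "signature_valid": True, "choice": "B"}],  # pk1 re-votes
        eligible_public_keys={"pk1", "pk2"})
    assert tally == Counter({"B": 2})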

So now you search for your nonce in the output and check that the vote is correct. The nonce CAN NOT be tied to you; it’s just a random number. You can lie and say that it belongs to somebody else, or pretend to have another one. The number of votes can be verified. However, done in this way we’re vulnerable to something like a “birthday attack”. The thing is that if there have been 20 000 votes for political party X and their followers threaten 5 000 people, chances are that more than one coerced voter will claim the same party X nonce as theirs (roughly a 22% risk per voter). So how do we solve this? Simple: let the voter make both one real vote and several fake votes (“decoy votes”). Then the voter has several false nonces he can give, including one that says he voted for party X. Only the voter himself can know which nonce belongs to the real vote! To prevent the adversary that threatens him from figuring out whether and how many false votes the voter made, the size of the encrypted voting messages should be static, with enough margin for a number of “decoy votes” (in case there are several possible adversaries that could threaten you based on your vote). Now these guys could threaten 30 000 people, but even if there are just 20 000 voters for their party, they can’t say which 10 000 voted for somebody else or prove anybody wrong.
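For reference, the rough 22% figure above can be reproduced with a back-of-the-envelope calculation, assuming each coerced voter claims a nonce picked uniformly at random from the 20 000 party X nonces:

    # Chance that at least one of the other 4 999 coerced voters picks the
    # same nonce as you out of the 20 000 available.
    p_collision = 1 - (1 - 1 / 20_000) ** 4_999
    print(f"{p_collision:.0%}")  # prints 22%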

The best part? We can use ANY type of voting, such as preferential, approval, weighted, ranked, etc! The vote is just a piece of text with arbitrary syntax, so you can “encode” ANY kind of vote in it. You can use a simple most-number-of-votes, or a score from 1-10, etc…

In the end, you know that your vote has been counted correctly, everybody knows that no fake votes have been added and that none have been removed, it’s anonymous, and the only way to force individual voters to vote as you wish is to physically watch them vote.

If you trust that these maybe 10+ organizations won’t all conspire together against the voters (including the EFF & ACLU?), you can be pretty sure the voting has been anonymous AND secure. Altering the counting or other computational parts on the side of the voting management would require nearly full cooperation between people in ALL participating organizations that have full access to the machines running the Secure Multiparty Computation protocol – and they MUST avoid ALL suspicion while at it!

Advantages

  • If you can distribute personal keypairs securely to the voters, nobody can alter/fake votes outside the Secure Multiparty Computation system.

  • A majority of the Secure Multiparty Computation participants have to collude and be in (near) full agreement to break the security of the system. If their interests are conflicting, it just won’t happen.
  • The security of the system relies on the cryptographic security + the low risk of collusion among enough MPC participants. If you accept both of these points as strong, this system is strong enough for you.
  • It’s anonymous
  • You can verify your vote
  • You can’t be blackmailed/forced to reveal your vote, because you can fake *any* vote

Potential weaknesses

  • The public won’t fully understand it
  • The ID smartcards with the personal keypairs must be protected, and the new personal keys must be generated securely
  • We need to ensure that the MPC and Zero-knowledge proof algorithms really are as secure as we assume they are

I’ve changed the scheme a bit from the original version. It should be entirely secure against all “plausible” attacks, except for hacking all the MPC participants at once or an attacker that can watch you physically while you make the vote. The latter should not be an issue in most places and probably can’t be defended against with any cryptographic scheme, while the former is all about infrastructure security rather than cryptographic security.

Feedback is welcome. Am I missing anything? Do you have any suggestions for useful additions or modifications? Comment below.


Basic blueprint for a link encryption protocol with modular authentication

Over the last few years we have seen more and more criticism build up against one of the most commonly used link encryption protocols on the internet, SSL (Secure Socket Layer, or more precisely its current successor TLS, Transport Layer Security), for various reasons. A big part of it is the Certificate Authority model of authenticating websites, where national security agencies can easily get fake certificates issued; another big part is the complexity, which has led to numerous implementation bugs such as OpenSSL’s Heartbleed and Apple’s “goto fail” and many more, because the sheer mass of code means you end up not being able to ensure all of it is secure – the effort required would simply be far too great. Another (although relatively minor) problem is that SSL is quite focused on the server-client model, despite there being a whole lot of peer-to-peer software using it where that model doesn’t make sense, and more.

There have been requests for something simpler which can be verified as secure, something with opportunistic encryption enabled by default (to thwart passive mass surveillance and increase the cost of spying on connections), something with a better authentication model, and with more modern authenticated encryption algorithms. I’m going to give a high-level description here of a link encryption protocol blueprint with modular authentication, inspired by the low-level opportunistic encryption protocol TCPcrypt and the PGP Web of Trust based connection authentication software Monkeysphere (which currently only hooks into SSH). In essence it is about separating and simplifying the encryption and the authentication. The basic idea is quite simple, but what it enables is a huge amount of flexibility and features.

The link encryption layer is quite simple. While the protocol doesn’t really have separate defined server/client roles, I’m going to describe how the connections work with that terminology for simplicity. This will be a very high-level description; applying it to P2P models won’t be difficult. So here it goes (and to any professional cryptographers who might read this: please don’t hit me if something is wrong or flawed – tell me how and why it is bad and suggest corrections so I can try to fix it):

The short summary: A key exchange is made, an encrypted link is established and a unique session authentication token is derived from the session key.

A little longer summary: The client initiates the connection by sending a connection request to the server, where it initiates a key exchange (assuming a 3-step key exchange will be used). The server responds by continuing the key exchange and replying with its list of supported ciphers and cipher modes (prioritization supported). Then the client finishes the key exchange, generates a session key and selects a cipher from the list (if there is an acceptable option on it), and tells the server what it chose (this choice can be hidden from the network, since the client can send an HMAC or an encrypted message of its choice to the server). The server then confirms the cipher choice, and the rest of the connection is then encrypted with that session key using the chosen cipher. A session authentication token is derived from the session key, such as by hashing the session key with a predefined constant; it is the same for both the client and the server, and the token is exposed to the authentication system to be used to authenticate the connection (for this reason it is important that it is globally unique, untamperable and unpredictable). Note that to prevent cipher downgrade attacks the cipher lists must also be authenticated, which could be done by verifying the hashes of the lists together with the session auth token – if the hashes are incorrect, somebody has tampered with the cipher lists and the connection is shut down.
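A minimal sketch of that link-layer setup, assuming X25519 for the key exchange and SHA-256/HKDF for the derivation steps (the blueprint deliberately leaves the concrete primitives open):

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives import hashes

    client_priv = X25519PrivateKey.generate()
    server_priv = X25519PrivateKey.generate()

    # Both ends derive the same session key from the Diffie-Hellman shared secret...
    client_shared = client_priv.exchange(server_priv.public_key())
    server_shared = server_priv.exchange(client_priv.public_key())
    assert client_shared == server_shared

    session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                       info=b"link-encryption-session-key").derive(client_shared)

    # ...and the session auth token is derived by hashing the session key with a
    # predefined constant. It is identical on both ends and handed to the
    # authentication layer; it is never sent over the wire in the clear.
    session_auth_token = hashlib.sha256(b"session-auth-token" + session_key).digest()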

And for the modular authentication mechanism:

The short summary: Authentication is made by both cryptographically verifying that the other end is who it claims to be and verifying that both ends have the same session auth token (it must not be possible to manipulate the key exchange to control the value of the session key and thus the session auth token). It is important that the proof of knowing the session auth token and the authentication are combined and inseparable and can’t be replayed in other sessions, so the token should be used as a verifiable input in the authentication mechanism.

A little longer summary: What type of authentication is required varies between types of applications. Since the authentication is modular, both ends have to tell the other what types of authentication they support. A public server would often only care about authenticating itself to visitors and not care about authenticating the visitors themselves. A browser would usually only care about identifying the servers it connects to. Not all supported methods must be declared (for privacy/anonymity, and because listing them all is rarely needed); some can be secondary and manually activated. The particular list of authentication methods used can also be selected by the application based on several rules, including what server the user is connecting to.

There could be authentication modules hooking into DNSSEC + DANE, Namecoin, Monkeysphere, good old SSL certificates, custom corporate authentication modules, Kerberos, PAKE/SRP and other password based auth, or purely unauthenticated opportunistic encryption, and much more. The browser could use only the custom corporate authentication module (remotely managed by the corporate IT department) against intranet servers while using certificate based authentication against servers on the internet, or maybe a Google-specific authentication module against Google servers, and so on. The potential is endless, and applications are free to choose which modules to use and how. It would also be possible to use multiple authentication modules in both directions, which sometimes could be useful for multifactor authentication systems, like using a TOTP token & smartcards & PAKE towards the server with DNSSEC + DANE & custom certificates towards the client. It could also be possible for the authentication modules on both ends to require the continuous presence of a smartcard or HSM on both ends to keep the connection active, which could be useful for high-security applications where simply pulling the smartcard out of the reader would instantly kill the connection. When multiple authentication modules are used, one should be the “primary” module which in turn invokes the others (such as a dedicated multifactor auth module invoking the smartcard and TOTP token modules) to simplify the base protocol.
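One way that modularity could look in code, purely as an illustration (the module names, interface and policy rules are all invented here):

    from abc import ABC, abstractmethod

    class AuthModule(ABC):
        name: str

        @abstractmethod
        def authenticate(self, session_auth_token: bytes, peer_identity: str) -> bool:
            """Return True if the peer proved its identity for this session."""

    class DANEModule(AuthModule):
        name = "dnssec-dane"
        def authenticate(self, session_auth_token: bytes, peer_identity: str) -> bool:
            ...  # look up the TLSA record for peer_identity and verify the proof

    def pick_modules(peer_identity: str, available: list[AuthModule]) -> list[AuthModule]:
        # Application policy: e.g. intranet hosts use the corporate module,
        # everything else falls back to certificate or DANE based modules.
        if peer_identity.endswith(".corp.example"):
            return [m for m in available if m.name == "corporate"]
        return [m for m in available if m.name in ("dnssec-dane", "certificate")]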

Practically, the authentication could be done as in these examples: For SRP/PAKE and HMAC and other algorithms based on a pre-shared key (PSK), both sides generate a hash of the shared password/key, the session auth token, the cipher lists and potentially additional nonces (one from each party) as a form of additional challenge and replay-resistance. If both sides have the same data, the mutual authentication will succeed. For OpenPGP based authentication like with Monkeysphere, a signature would be generated over the session auth token, both parties’ public keys and nonces from both parties, and then that signature would be sent stand-alone to the other party (because the other party already has the input data if it is the intended recipient), potentially encrypted with the public key of the other party. For unauthenticated opportunistic encryption, you would just compare the cipher lists together with the session auth token (maybe using a simple HMAC together with challenge nonces) to make a downgrade attack expensive (it might be cheaper to manipulate the initial data packet with the cipher list on many connections, so that the ciphertext can later be decrypted if one of the algorithms is weak, than to outright pull off a full active MITM on all connections).
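For the pre-shared-key case, the binding of the session auth token, cipher lists and nonces could look roughly like this (the names and encodings are illustrative):

    import hmac, hashlib, secrets

    def auth_tag(psk: bytes, session_auth_token: bytes, cipher_lists: bytes,
                 client_nonce: bytes, server_nonce: bytes) -> bytes:
        message = session_auth_token + cipher_lists + client_nonce + server_nonce
        return hmac.new(psk, message, hashlib.sha256).digest()

    psk = b"shared secret established out of band"
    token = secrets.token_bytes(32)            # session auth token from the link layer
    ciphers = b"chacha20-poly1305,aes-256-gcm"
    n_c, n_s = secrets.token_bytes(16), secrets.token_bytes(16)

    client_tag = auth_tag(psk, token, ciphers, n_c, n_s)
    server_tag = auth_tag(psk, token, ciphers, n_c, n_s)
    assert hmac.compare_digest(client_tag, server_tag)  # same key, token and lists => authenticated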

I have also thought about how to authenticate semi-anonymously, i.e. such that neither party reveals who they are unless both parties already know each other. The only way I think this is possible is through the usage of Secure Multiparty Computation (MPC) and similar algorithms (SRP/PAKE is capable of something similar, but would need on average a total of x*y/2 comparisons of shared passwords if party A has x passwords and B has y passwords). Algorithms like MPC can be said to cryptographically mimic a trusted third party server. It could be used in this way: Both parties have a list of public keys of entities they would be willing to identify themselves to, and a list of corresponding keypairs they would use to identify themselves with. Using MPC, both parties compare those lists without revealing their contents to the other party – and if they both are found to have a matching set of keypairs that the other recognizes and is willing to authenticate towards, the MPC algorithm tells both parties which keypairs match. If there’s no match, it just tells them that instead. If you use this over an anonymizing network like Tor or I2P, you can suddenly connect to arbitrary services and prove who you are to those who already know you, while remaining anonymous towards everybody else.

It would even be possible for an application to recognize a server it is connecting to as a front-end for several services, and tell the authentication manager to authenticate towards those services separately over encrypted connections (possibly relayed by the front-end server) – in particular this allows for secure authentication towards a site that uses both outsourced cache services (like Akamai) and encryption accelerator hardware (which you no longer have to trust with sensitive private keys), making it cheaper to securely implement services like private video hosting. In this case the device performing the server-side authentication could even be a separate HSM, performing authentication towards clients on behalf of the server.

The protocol is also aware of who initiated the connection, but otherwise has no defined server/client roles – although the authentication modules are free to introduce their own roles if they want to, for example based on the knowledge of who initiated the connection and/or who the two parties of the connection are. It is also aware of the choice of cipher, and can therefore choose to provide limited access to clients who connect using ciphers that are considered to have low security, but are still secure enough to be granted access to certain services (this would mainly be important for reasons such as backwards compatibility and/or performance on embedded devices).

The authentication module could also request rekeying on the link encryption layer, which could be done either with a new key exchange, through ratcheting as in the Axolotl protocol, or simply by hashing the current session key to generate a new one and deleting the old one from RAM (to limit the room for cryptanalysis, and to limit how much of the encrypted session data can be recovered if the server is breached and the current session keys are extracted).
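The simplest of those rekeying options, hashing the current key forward as a one-way ratchet, is easy to sketch:

    import hashlib

    def ratchet(session_key: bytes) -> bytes:
        # Hash the current key to derive the next; the old key is then erased,
        # so already-sent traffic can't be decrypted if a later key leaks.
        return hashlib.sha256(b"rekey" + session_key).digest()

    key_epoch_1 = hashlib.sha256(b"initial session key material").digest()
    key_epoch_2 = ratchet(key_epoch_1)   # old key deleted from RAM after this
    key_epoch_3 = ratchet(key_epoch_2)   # knowing epoch 3 reveals nothing about 1 or 2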

But what if you already have a link encryption layer with opportunistic encryption or some other mechanism that allows you to generate a secure session auth token? You shouldn’t have to stack another layer of encryption on top of it just to be compatible, if the one you are already using is secure enough. There’s a reason the link encryption and authentication are separate here – rather than hardcoding them together, they would be combined using a standardized API. Basically, if you didn’t use the “default” link encryption protocol, you would use custom “wrapper software” that makes the link encryption you are using look like the default one to the authentication manager and provides the same set of basic features. The authentication manager is meant to rely only on the session auth token being globally unique and secure (unpredictable) to be able to authenticate the connection, so if you can achieve that for your link encryption then you’re good to go.

(More updates coming later)

References:

http://web.monkeysphere.info/

http://tcpcrypt.org/

https://gist.github.com/Natanael90/556350

https://mailman.stanford.edu/pipermail/tcpcrypt-dev/2010-August/000007.html

https://whispersystems.org/blog/advanced-ratcheting/

http://www.metzdowd.com/pipermail/cryptography/2014-May/021475.html

A decentralized hash-chained discussion system

I’ve been thinking for a while about how I want a discussion system to work. Since I’ve seen numerous forums get abandoned, old archived discussions getting lost when servers crash, discussions jumping between various forums and chat rooms and blogs, and so on, I came to the conclusion that I want a commenting system that is directly focused on the posts themselves, which is decentralized and can easily reference external discussions, and where you don’t simply lose the history of old discussions because one server went down.

So with inspiration from Git and the Bitcoin blockchain, I’ve got an idea for a discussion system based on posts encoded in JSON (or a similar format, maybe XML), where each comment is signed by its author(s), references its author’s profile/ID (ideally in a verifiable manner, such as referencing a cryptographic public key), has topic tagging, references the comments it replies to by their hashes (so that anybody reading it can verify exactly what the author was responding to), and more.
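As an illustration of what such a post could look like (the field names are invented; the idea only specifies JSON-like encoding, signatures and hash references to parent comments):

    import hashlib, json
    from nacl.signing import SigningKey

    author_key = SigningKey.generate()

    def make_post(body: str, parent_hashes: list[str], topics: list[str]) -> dict:
        content = {
            "author_pubkey": author_key.verify_key.encode().hex(),
            "in_reply_to": parent_hashes,   # hashes of the exact comments replied to
            "topics": topics,
            "body": body,
        }
        canonical = json.dumps(content, sort_keys=True).encode()
        content["hash"] = hashlib.sha256(canonical).hexdigest()
        content["signature"] = author_key.sign(canonical).signature.hex()
        return content

    root = make_post("Which ESC works best for a 1/10 scale buggy?", [], ["rc-cars"])
    reply = make_post("Depends on the motor; see below.", [root["hash"]], ["rc-cars", "electronics"])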

The default use case I’m considering is one that would work like mailing lists (email servers that relay messages sent to them to their subscribers, so you can discuss topics with a group of people by email). In this system the equivalent would be what I call a channel. A channel would be defined by an initial “channel definition post” that declares that a specific method of delivery (or several!) of the messages is to be used, what topics are allowed, moderation rules, where you can find an archive of messages, and other relevant details. You could then register with the server from your client (if registration is required on the channel), send in your messages through the defined method (uploading to the server by default – serverless distribution methods would also be possible) and it would then relay them to all subscribers. On open lists where anybody can post directly, moderation posts could be used to declare that certain messages that went through are to be considered spam or rule-breaking, so that the subscribers’ client software can delete or hide them automatically and you don’t have to see them when you open your client to read the discussions on the channel. In a similar manner, moderation posts could also flag rule updates and more in the channel.
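A hypothetical channel definition post (with invented field names and example URLs) might carry data along these lines before being signed and published:

    channel_definition = {
        "type": "channel-definition",
        "name": "rc-cars",
        "delivery": ["https://lists.example.org/rc-cars"],   # one or several delivery methods
        "allowed_topics": ["rc-cars", "electronics"],
        "moderation": {"open_posting": True, "moderators": ["<moderator pubkey>"]},
        "archive": "https://archive.example.org/rc-cars",
    }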

Since we would be defining a standard format for how to encode the comments from scratch, we could also enable full semantic tagging from the start. Dates and events can be marked as such, just like addresses, phone numbers, even nicknames, and more. Disambiguation would be made far easier when you don’t have to wonder whether you should write a long explanation, put details in parentheses, or omit them entirely hoping nobody will misunderstand you. Whenever you think a phrase, word or expression is unclear, you can just add a tag that shows what it means, which would be hidden by default and which readers can choose to display or not (and it would be possible to clarify that, for example, a word is used as a verb, or even link back to a previous or later sentence in your post).

And since whole discussions are defined simply by signed messages of a defined format that reference each other by hashes, it suddenly becomes easy to let a discussion jump as a whole to other forums when the commenters agree they want to continue it elsewhere. No longer do you have to simply cut-and-paste raw text if you want to import the discussion history; instead the posts can be reuploaded to the new place together, and the whole history can be fetched by the subscribers’ client software when they want to see which posts are referenced or quoted, in a verifiable manner (the digital signatures allow you to verify the comments haven’t been modified).

This even enables the subscribers of two or more separate channels to crosstalk easily, since you can directly quote messages from the other channels/forums and at the same time include a reference to the “channel definition post”, so that your client can see how to fetch the rest of the history of the discussion. For example, in a channel about RC cars a quote could be made of a post in an electronics channel, allowing the RC folks to look up the rest of the discussion with just a few clicks, and even to join that channel to post questions which in turn reference the initial crosspost, further allowing the commenters on both channels to follow the discussions on each side. There’s even the possibility of having a shared discussion among several channels on multiple servers, where all commenters only need to reply to the discussion on their own channel and have it automatically synchronized among all the channels on all servers.

Author identities could be verified in several ways. Posts would, as I mentioned, be digitally signed (using public key cryptography such as ECDSA), and the public key would be included in the message. This public key could then be tied to, for example, a Twitter or Facebook account, or a GPG key, or a Namecoin profile (see onename.io), or whatever else you like. Your client would then (by default) verify that the public key used to sign the message can be found on the profile in question or is signed by its keypair. Combined with the previously mentioned address book software here on my blog, your client could automatically show which posts have been verified to be made by people in your address book, and the client could automatically verify Namecoin registered profiles through the signatures, etc. This way you can verify which posts have been made by the same person, and not just by different people with the same nickname. And since your profile could also have an index of all your previous public comments, your client could trivially allow you to look up all public posts from people across all channels on all servers where they’ve participated in discussions.
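The verification side of the earlier post sketch could then look like this (same invented field names; checking the key against an external profile is left as a comment):

    import hashlib, json
    from nacl.signing import VerifyKey
    from nacl.exceptions import BadSignatureError

    def verify_post(post: dict) -> bool:
        content = {k: v for k, v in post.items() if k not in ("hash", "signature")}
        canonical = json.dumps(content, sort_keys=True).encode()
        if hashlib.sha256(canonical).hexdigest() != post["hash"]:
            return False                      # the hash others reference doesn't match the content
        try:
            VerifyKey(bytes.fromhex(post["author_pubkey"])).verify(
                canonical, bytes.fromhex(post["signature"]))
        except BadSignatureError:
            return False                      # signature doesn't match the embedded public key
        # Next step (not shown): check that author_pubkey appears on the claimed
        # external profile (Twitter bio, Namecoin record, address book entry, ...).
        return True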

(More updates coming later)
