deltadeltaandmoredeltas

SHA-1 collision by Google and CWI!

On Feb 23, 2017, Google and news agencies like WSJ published a startling find! Researchers from the CWI Institute in Amsterdam and Google were successful in generating a hash collision with SHA-1.

Collision means that two different messages when hashed separately had the same hash. Hashing is converting a document that may be like 5MB to a 40 digit number. If you’re thinking, why hasn’t a collision been found earlier? Compressing a document to only 40 digits means that eventually there would a collision. Eventually yes but, 40 digits is a huge number and thus impractical randomly. Thus being able to engineer a collision in astonishing, a huge technical feat!

What these researchers were able to do is find a collision in almost 100,000 times faster than a strict brute force attack. Yes, they leveraged Google’s tremendous cloud infrastructure to do so but the point remains is they did it. That means others will say Watch me too!. The attack required over a lot of computations (9,223,372,036,854,775,808) i.e. 9 quintillion i.e. 6,500 years of single-CPU computations or 110 years of single-GPU computations. If someone wanted to replicate it, they likely could and Google says this is likely within a month. Appropriately, Google has provided this free detection link here that one can use to better understand the attack and ways to mitigate it.

The impact of this find is that an attacker could submit a malicious document that has the same hash as a benign one. SHA-1 is not some simple hash function. Because SHA-1 has been used for certificate verification or validation of documents, having someone break this system is disturbing. SHA-1 has been used for browser security and code management. GIT and SVN do use SHA-1. From ZDNet , at least the creators of Git do not think it is a huge concern. However, they mention that Git will be sunsetting SHA-1.

For most browsers though, they key word would be “has been” as Google Chrome has been sunsetting uses of SHA-1 for several years and same with Firefox. There was a wake-up call in 2005 when researchers in China found a theoretical method to find a collision which prompted people in the industry starting to shift hash functions. Now in Feb 2017, Google and CWI Institute have proven it in the practical sense. Bitcoin uses a SHA-256 hash function. More of the concern for bitcoin would be that the source code is all in Git but again, at least the creators of Git say “Don’t Worry”.

Consensus without Identities

These are my notes of the eighth lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. While the lecture is lengthy and I had to watch it a few time to grasp the concepts, I thought it was interesting.

Questions answered in this Post:

What is bitcoin’s stance on node identity? Why?
What is a Sybil attack?
What is implicit consensus?
What are some attacks discussed that this consensus protocol protects against?
What is an orphan block?
What is a zero-confirmation transaction and how to prevent it?

One major point was that bitcoin nodes do not have persistent long-term identities. Because of this, the consensus protocol attributes deviate from what is in the distributed systems consensus protocols. So, while we know that bitcoin is different, why exactly do we care? One thought is that because of this deviation, does this mean the existing algorithms are not compatible and thus something new have to be invented? Why are identities useful for distributed consensus protocols?

Below are the specific examples provided in the lecture:
Pragmatic: some protocols need node identification (id) such as a protocol could say use the lowest ranked ID to do .
Security: assume less than 50% malicious nodes for this protocol to work.
Prevent from a Sybil Attack An adversary makes multiple nodes and uses them to break the system.

Why don’t Bitcoin nodes have identities?

Since Bitcoin is a P2P system, no central body to assign identities to nodes
Pseudo-anonymity is actually an inherent gol of bitcoin

So instead of that, Arvind suggested something weaker in that the system is able to pick a random node in the system as well as determine who is the owner thus making the all Sybils get a single token. With this procedure then something called implicit consensus can occur.

Implicit Consensus (AKA Simplified Bitcoin Consensus algorithm)

It’s a procedure of adding new blocks to the blockchain in consensus.

Random node is selected and make it
proposed the next block in the chain
Other nodes accept or reject the block by determining whether or not to build on top of it
If accept, then this block gets added. If reject, they ignore the block and build over the current chain they have.
Nodes acceptance means adding the block’s hash in the next block they create

Why does this work? or how can a malicious adversary subvert the process?

Stealing Bitcoin – not if the underlying crypto is good
Denial of Service Attack – not unless all other nodes are malicious
Double-spend attack – add the heuristic that only add to the longest block

Can Alice simple steal Bitcoins belonging to another user at a different address that she doesn’t control?
Example:
It is not Alice’s turn to propose the next block in this chain.
She can’t steal other bitcoins because she can’t forge their signatures and thus if the underlying crypto is solid, one cannot simple steal Bitcoins

If she hates a node Bob, she can choose not to accept any blocks proposed by Bob. So she’s denying service to Bob.

Thus if Bob’s block doesn’t make it into the next block Alice proposes, he will just wait another block until an honest node gets the chance to propose.

Arvind claims this is nothing more than a little annoyance. I guess in my mind it was larger. First with this little annoyance, I see it as a delay since you’re waiting for the next honest node. The next honest node could take a very long time.

The last one Arvind talks about is the double spending attack.

Alice is a customer of a merchant and Bob is the seller.
Alice goes to Bob’s website and buys for it in Bitcoins.
She creates a Bitcoin transaction from her address to Bob’s address.
Thus it gets broadcasted to the network

C_A -> B

transaction
signed by A
pay to pk_B: H()

This transaction should have existed from a previous transaction. Thus there’s a pointer to that previous block.

Let’s say Alice is now proposing a transaction, decides to ignore the bock with Bob’s stuff and create a new transaction

transaction
signed by A
pay to pk_A: H()

Will they both end up in the long term consensus chain

There is an idea that the honest node will extend the longest valid branch. From the facts of the transaction, both leafs have the same number of transactions. They have the same length.

There is a heuristic where you choose the node that you received first. Thus there is a chance that the other block would get chosen, so what happens with Bob.

The chain with Bob meant that Alice got free software and Bob did not get any payment. The remainder block is called an orphan block.

Thus this would be a successful double spend.
Zero-confirmation transaction Bob would send over the software without waiting to see if Alice’s transaction actually made it into the blockchain. It’s like giving someone a product with only seeing that they have the money in their hand and that they hadn’t actually given it to you.

However, instead Bob should be more careful. He should wait for more confirmations before sending the software. Because he knows that the longer chain gets chosen, then he’s in the clear if he sees that his transaction has more blocks

Double spend probability decreases exponentially with # of confirmation
Most common heuristic: 6 confirmations

Recap

protection against invalid transactions is cryptographic but enforced by consensus
Protection against double-spending is purely by consensus
You’re never 100% sure a transaction is in the consensus branch. Guarantee is probabilistic. 1- (1/2)^6 = .9844

What can a malicious node do?
Ignore the longest valid branch rule when proposing a new block.

Distributed Consensus

These are my notes of the seventh lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. This was a great lecture in that it started more from fundamentals to explain how decentralization has been done in bitcoin.

Questions answered in this Post:

Define consensus and distributed consensus.
What is a distributed consensys protocol?
What does consensus mean in Bitcoin
How does consensus work in bitcoin?
With all the pessimism with distributed consensus, what enables bitcoin/makes it less at risk for these shortfalls?

Before we can started talking about distributed consensus and consensus for bitcoin, what on earth is consensus? According to the Merriam-Webster dictionary…

Consensus:

general agreement about something; an idea or opinion that is shared by all people in a group

Great, now what is distributed consensus?
Well, when related to computer science, how can multiple main computer nodes agree on a single idea/piece of data if they are receiving different inputs. Then a consensus protocol is the answer to the question on “How” these nodes can come into agreement. If one is interested then please google “distributed systems protocols” to learn about these. Or just look at things like distributed hash tables or Paxos.

Distributed Consensus Protocol

Def
fixed number of nodes, n
each nodes have some input value

each nodes have some input value
protocol terminates and all correct nodes decide on the same value
this value must have been proposed by some correct node
in this scenario there are bad, faulty, and malicious nodes as well as the correct nodes.

Consensus in Bitcoin

Bitcoin is a peer to peer system. When Alice wants to pay Bob, she broadcasts the transaction to all Bitcoin nodes that comprise the peer-to-peer network. Also, Alice and Bob are not the only individuals broadcasting out transactions.

signed by Alice
pay to pk_Bob: H()
contains Hash of the coins history

Bob is not on the system he needs to be a bitcoin node to hear it

Consensus in bitcoin means if all these people are broadcasting their transactions and all these nodes are hearing them, how do you determine which transactions were broadcast, which were valid, and in what order. By the end, there should be a single, global ledger that is maintained by all. However, it also means that nodes will also have transactions they have that may not be on the block yet. These are called outstanding transactions.

How consensus could work in Bitcoin?
at any given time:

All nodes have a sequence of blocks of transactions they’ve reached consensus on
Each node has a set of outstanding transaction they’ve heard about (these have not reached consensus

Scrooge Coin: transactions were put into blocks

Why is bitcoin blockchain consensus hard?

Nodes may crash
Nodes may by malicious (put invalid transactions in blocks)
Network is imperfect
Not all pairs of nodes connected
Faults in network (poor network connectivity)
Latency since these nodes are all over the internet

With high latency, this also means that there is no notion of global time. The ordering agreed upon does not indicate time. All we know is that one node was put one the block chain prior not which transactions actually was broadcasted and heard first. Because of this constraint, many consensus protocols in literature tend to be pessimistic. You get these impossibility results.

Impossibility Result Examples

Byzantine Generals Problem which resulted in if a third or more of the generals (nodes) were traitors when it was impossible to achieve consensus.

Fischer-Lynch-Paterson (deterministic nodes)
which proved that consensus impossible with a single faulty node

Paxos – never produces inconsistent result
protocol can get stuck and never make any progress

These pessimistic models tend to shed light on the idea that bitcoin is the same problem as resolving a distributed database and thus can make concessions that the others were unable to.

Consensus in bitcoin: theory versus practice

While theory may sound pessimistic, practice seems to work better. This may be due to special features of bitcoin. Also, bitcoin contains the idea of incentives because it’s a currency. Second, bitcoin embraces randomness where as proper distributed systems hate it. Actually the consensus algorithm relies on randomness. So while you can never by 100% confident, you can be extremely confident when a block has entered the ledger. Also, the latency is fine since at this point, it’s one every ten minutes.

Bitcoin decentralization?

These are my notes of the sixth lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. This lecture seemed more like a teaser to the follow up lectures in that much of what was discussed was evaluating the phrase ” Bitcoin coin is decentralized”. What parts of bitcoin are actually decentralized and where are the limitations?

Questions answered in this Post:

Where does decentralization currently?
Why is centralization vs. decentralization more a spectrum as opposed to an “all of nothing”?
When you think “bitcoin decentralized?”, what components support this thought and what go against this notion?

At this point, the last lecture made clear why centralization did not work for bitcoin. Do we really trust Scrooge? NO! His name is Scrooge…

Arvind goes into how bitcoin works with this decentralization. Decentralization is not all-or-nothing. Email is a decentralized protocol (SMTP) but has been dominated by centralized webmail services. Some examples would be like google/hotmail/yahoo.

Questions regarding the decentralization in Bitcoin?

Who maintains the ledger?
Who has authority over which transactions are valid?
Who creates new bitcoins?
Who determines how the rules of the system change?
How do bitcoins acquire exchange value?

I’ll put in what my answers were prior to the lectures.

Some majority of people on the network. When I say network, I guess there is a group of people (IP addresses/public keys) online who have copies of the ledger that they are constantly pinging out.
This majority. Perhaps several people have to say yes this transaction is valid. Since these people are perhaps trusted in other networks then they can check the transactions.
Miners create bitcoins. I’m not sure who they are or what they are doing besides running massive computations solving math puzzles.
Same answer as number 1.
Similar to any other traded security product.

Aspects of decentralization in Bitcoin

Peer-to-peer network: open to anyone, low barrier to entry
Bitcoin Mining: open to anyone, but inevitable concentration of power in the Bitcoin mining community
Updates to the software: core developer trusted by the community have great power.

Bitcoin has a peer to peer network meaning that anyone can log on to start up a full node. Thus there is a low barrier to entry and is it open to anyone. While it is open to everyone, there is quite a bit of disc consumption required. This Bitcoin.com link from Feb 2017, indicated that one user calculated their cost to be approximately $20.00/month to run a node using AWS. Since I have never run a full Bitcoin node, I can neither confirm or deny this statement. Another reason it is decentralized in that anyone can mine bitcoin theoretically. However, if you don’t have special hardware and are not part of a mining pool, there is a low probability of success. Lastly was the point regarding updates to the software. This task falls on the core developers who are trusted and supported by the bitcoin community.

Example of a simple cryptocurrency

1.5 A Simple Cryptocurrency

These are my notes of the fifth lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. This lecture covers simpler cryptocurrencies to allow one to think about the consequences of certain properties. Thus the takeaways for this post focus on these properties that are relevant to current currencies.

Questions answered in this Post:

What happens when only one party create coins?
What is a double spend?
What does append only ledger mean and why does it help stop double spends
What is an issue with Scrooge coin?

GoofyCoin

The first coin talked about was called GoofyCoin. Here are the main rules

Only Goofy can create new coins. He owns the new coins created.
Whoever owns the coin can spend it
A coin is a string “CreateCoin [uniqueCoinID]” and Goofy’s signature
One can verify the coin by looking at Goofy’s valid signature as they walk up the chain of previous signatures of a coin.

Goofy can create new coins. When he creates new coins, he generates a uniqueCoinID that is signed by pk_Goofy (public key Goofy). All new coins are owned by Goofy. Whoever owns the coin can spend it. Spending a coin means transferring the coin from one person to another which is done by cryptographic operations.

So if Goofy wants to transfer a coin to Donald, he has to go through a several steps. First steps wold create a new statement “Pay this to Donald” where this is a has pointer that references the coin in. The word “Donald” in the previous sentence refers to Donald’s public key. Goofy also signs the string since he owns the coin, he must sign that he is spending it. This statement now indicates that Donald now owns to the coin and he can spend it if he wants in a similar fashion. Each time an action is done to the coin, it gets chained together. The chain could be thought of as a linked list of hash pointers where hash pointer contains the hash of the previous coin. The initial creating pointer only contains a Create Coin uniqueCoinID statement and signature of Goofy.

Spending a coin means

Take the coins history

Donald payment
signed by pk_Goofy
pay to pk_Donald: H(head)

head
signed by pk_Goofy
CreateCoin [uniqueCoinID]

Now Donald owns the coin and she has to present the full blockchain if required.
Now let’s say Donald pays the coin to Daffy then the chain now gets another entry.

Daffy payment
signed by pk_Donald
pay to pk_Daffy: H(Donald’s payment)

Donald’s payment
signed by pk_Goofy
pay to pk_Donald: H(head)

head
signed by pk_Goofy
CreateCoin [uniqueCoinID]

This is not a decentralized coin since only Goofy can create the coins. Also, the only people who know about the passing of coins are those present in the chain. Thus if Donald tries to give coins to Porky then there would almost be a two parallel paths that would not known about each other. Porky and Daffy could both look up the block chain they were given by Donald and verify that they had a valid coin created by Goofy. However, Donald was able to “double spend” his coin which is a severe issue.

double-spending attack – someone one tries to spend a coin more than once.

Brainstorm ways to stop it:

Have a single ledger that everyone looks at. That would be similar to a bank today in that the bank holds one’s balance.

Scrooge Coin

This coin tries to prevent what happened with Goofy Coin. However it still has the property that only Scrooge creates coins. Scrooge publishes a history of all transactions (block chain, signed by Scrooge) instead of having others pass it along.

Scrooge is publishing then an append-only ledger. An append-only ledger means that any data written to the ledger will remain forever and thus only a single history will occur even if transferring of coins occurs between different people.

1 transaction per block for simplicity

Now Daffy and Porky cannot both get paid because Scrooge would have published both transactions and thus dependent on which one came first the other would be invalid. So while one person would lose, and it is not clear since it depends on which Scrooge publishes first. Having this published history helps people detect double spending and thus Porky can reject the coin if he notices that Donald is trying to send invalid coin.

Scrooge may also have the job to look for double spends as well and thus would only accept one and reject the other.

Here’s what it will look at breaking down the transactions.

Create Coins Block
transID: ## type: CreateCoins
num | value | recipient (pk)

Pay Coins Block
transID: ** type: Paycoins
consumed coinIDs:

coins created to new recipients

Validity Rules for Scrooge

consumed coins valid
not already consumed (Within the same block?)
total value out == total value input
signed by owners of all consumed coins

This transaction is signed by all owners who are payers

Coins are immutable so they’re constantly created and destroyed when you pay people.

Coins need to have at least these properties:
Some transID(num) as its identifier
Some value
Some recipient that it belongs to

When a coin is paid, the old coin is consumed and a new coin is created

What are the issues with Scrooge Coin?

Still not decentralized and you have to trust Scrooge.
If Scrooge stops creating, validating coins, and telling the truth, then the coin should no longer be trusted. He can allow for double spends as well as not meet people’s expectations in terms of how much coin they transferred.

Quick Overview of some Ethereum Mining Choices

I researched what was ethereum mining and would like to share what I’ve learned so far.

In January 2016, there were many documents and videos showing 6 easy steps for ethereum mining.
One such video I watched was by Digital Decrypt called How to Mine Ethereum on a Windows PC — 6 Steps . Here are two other websites guides which are likely more detailed than mine:
CryptoCompares and 99bitcoins .

Mining using Geth and Ethminer

Get Geth
Run Geth (from cmd window type geth account new) Need an account to mine.
Activating Geth (cmd window type geth –rpc ) Start communicating with others
Get ethminer (CUDA or Genoil )
Configuring ethminer
Beginning to Mine (in command window type for GPU ethminer -G or for CPU etherminer)

Here is a short glossary of what you just downloaded.

Geth: Geth (implemented in the programming language Go) lets one create fully connected node to Ethereum. With Geth, you can connect to the Ethereum network which is necessary for mining.

ethMiner :(written in C++) This is the mining program. It will utilize your machines CPU/GPU to run the hashing program (EtHash)

EtHash: This is Ethereum’s proof of work algorithm.

Mist: the Go client GUI and web3 browser

From Oct 2016, I found a platform where one can mine called MinerGate. MinerGate is a mining pool which has fourteen currencies listed to mine including ethereum. Go here for the exact ones in case they change.

Mining with MinerGate:

Create an account with minergate.com
Download the MinerGate software and install it to your computer
Just go to the top tab Miner and click Eth. If you want MinerGate to do its job, stay at the top tab Smart Miner to proceed that way.
Start mining and don’t be worried when MinerGate create a DAG file onto your machine. That’s normal.

Miner with ethMiner and Genoil miner

Create an account with minergate.com
Download your miner software ie (ethMiner or Genoil miner)
Within the command window, you’ll type the below commands to mine the individual types.

For EthMiner, the pattern is below

CPU mining:

Ethminer.exe -C -F http://eth.pool.minergate.com:55751/YOUR_EMAIL --disable-submit-hashrate

For Genoil, the pattern is below. Note the difference between OpenCL and CUDA is the first parameter after ethminer.

OpenCL

setx GPU_FORCE_64BIT_PTR 0
setx GPU_MAX_HEAP_SIZE 100
setx GPU_USE_SYNC_OBJECTS 1
setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_SINGLE_ALLOC_PERCENT 100

ethminer -G -S eth.pool.minergate.com:45791 -O YOUR_EMAIL

CUDA

setx GPU_FORCE_64BIT_PTR 0 setx GPU_MAX_HEAP_SIZE 100 setx GPU_USE_SYNC_OBJECTS 1 setx GPU_MAX_ALLOC_PERCENT 100 setx GPU_SINGLE_ALLOC_PERCENT 100

ethminer -U -S eth.pool.minergate.com:45791 -O YOUR_EMAIL

Few difference between Ethereum versus Bitcoin mining

One major difference that I can see is that Ethereum blocks occur every 10-15 seconds on average while Bitcoin occurs every 10 minutes. The reward for Eth is 5 ETH while the current reward for Bitcoin is 25 BTC.

Another is that there is a different hashing algorithm to bitcoin so the ASICs developed for Bitcoin are not usable for Ethereum. Ethereum is ASIC resistant since Ethash has a memory hard algorithm. This means while GPU would help against CPU, ASICs should not work.

There seemed to have been a weird ETH ASIC scam going around some of the forums Steemit Ethereum and Ethereum.org Forum like 8 months ago (Aug 2016). Key takeaway was to watch out because it was a SCAM.

Hash Pointers

These are my notes of the fourth lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. The lecture gave me insight on some of the inner workings of blockchain which I greatly enjoyed.

Questions answered in this Post:

What is a hash pointer?
Once you have a hash pointer, what can you do with it?
What is the tamper evident property?

Hash Pointer

Simply, a hash pointer is a kind of data structure that contains two parts

pointer to where some info is stored
(cryptographic) hash of the info

One feature of a hash pointer is we can ask to get the info back and verify that the information hasn’t changed. If you’re familiar with basic C programming, the word pointer should be very familiar. Hash pointers are similar to regular pointers which provide the memory address of some information and through the pointer you can access this information when you dereference it.

key idea: build data structures with hash pointers

Linked List with hash pointers

According to CLRS, a linked list is a data structure in which the objects are arranged in a linear order. A singly linked list contains an object with an attribute key and a next pointer. The head of the linked list is a hash pointer.

Tamper Evident Log – log data structure that contains lots of data
If someone messes with data earlier in the log, we want to be able to detect it.

Why does the block chain have this property?

If someone messes with a block in the middle of the chain. If he changes the data, then the hash generated for this block which is stored in the previous node also would need to change. So even though the hash of the previous block and the data block line up, the previous previous block next pointer is based on a hash of the previous block. Because the previous block is composed of both the data and previous Hash (which has a change) these would also indicate tampering. Thus the only way to make this block well is if you change the entire blockchain but at that point then the head of the block chain is incorrect and since this is the value that the users store, the user would be able to detect the tampering.

The head of the block chain is known genesis block.

Binary tree with hash pointer (Merkle Tree)

One takes consecutive pairs of data blocks and computes the hash of them. Thinking about a tree, you combine each pair of leaves to create a code. Combine the two blocks to create two hash pointers. Then, from those two hash pointers which were created from the left and right child, you make a hash pointer of that block where you point to the data. This occurs recursively until you get to some head hash pointer which as previously is what you store.

Bonus: You only need O(logn) data since you need to know the path of the block from the head to the leaf (data).

Advantage of the Merkle Tree

Tree holds many items but only need to remember the root (I think you get that from the linked list)
Can verify membership in O(log n) time (this is the advantage over the linked list

Variant: Sorted Merkle Tree

can verify non-membership in O(log n)
(show items before, after the missing one)
One can prove that a block is not in a membership which I guess is just as important as well.

Realistically hash pointers can be used in any acyclic data structure.

Public Keys as Identities

These are my notes of the third lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. This was a fairly brief lecture. A simple idea explained: digital signatures enables the use of public keys as identities.

Questions answered in this Post:

What does “Public Keys as Identities” even mean?
How do you create an identity in this context?

Public Key == an Identity

Having a public key as an identity means that you can use the public key being used as a reference to some individual or body. Digital signatures enables this because one has the ability to verify the validity of a message based on the public key, message, and signature.

Thus, if you see signature such that verify(pk, msg, signature) [think of it as pk says, “[msg]”]

So in more concrete terms, there’s a public key named Alex and a msg “Hi”. Alex says, “Hi”
to “speak for” ok, you must know matching secure key (key)

Basically no one else can verbally speak for another person unless they’re that person or if you’re rich and famous then you have someone who tweets for you with your permission.

How to make a new identity

create a new, random key-pair (sk, pk) by just calling a function generateKeys()

pk is the public “name” (implement this with Hash(pk))
sk lets you “speak for” identity

So you control the identity, because only you know sk and since pk “looks random”, nobody knows who you are

One can use a new identity each time they create a message if they want. The downside is that it is possible at some point someone can group all these transactions together if they’re over a certain period of time.

Decentralized identity management

anybody can make a new identity
no central point of coordination

In common terms, there would be no need for a Social Security office required.
Bitcoin address -> public key/hash(public key)

Privacy?

addresses not directly connected to real-world
observer can link together an address’s activity over time, make inferences

Digital Signatures

These are my notes of the second lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017.

Questions answered in this Post:

What is a digital signature?
What are some of the requirements?
What are some practical implementations details?

Signature

only you can sign, but anyone can verify it
signature is tied to a particular document (can’t cut and paste to another doc)

If you had a digital signatures then this identity theft crime described below would not be possible:
The scenario works as follows, a person is called and is asked a question that guarantees that they’ll say the word “Yes”. The caller (criminal) will record the person answering “Yes”. Then the criminal will use the voice recording of the confirmation for different electronic bank verifiers. This concept of cut and paste in my example is using a voice recording out of context to commit a crime. With a digital signature, theoretically one cannot take the instance of the signature out of context and use it for other, potentially nefarious, purposes.

API for Digital Signatures

(sk, pk) := generateKeys(keysize)
sk: secret signing key
pk: public verification key

sig:= sign(sk, message)

isValid := verify(pk, message, sig)

These different functions hit the key idea of what is a signature. You generate the key with some size (256 bits) and get two outputs: a secret key and public verify key. Next you can now sign messages which people can verify by using the public verify key and the signature. This signature is tied to a specific document (msg) in this case as well. Thus the generateKeys and sign use randomized algorithms

Requirements for a Digital Signatures

able to verify valid signatures
unable to forge signature

“valid signatures verify” – verify(pk, msg, sign(sk, msg)) == true
“can’t forge signatures”
adversary who:
knows pk (public key)
gets to see signatures on other messages of his choice then he can’t produce a verifiable signature on another message

Game we play with an adversary

Challenger (TV judge)
Attacker claims that he can forge signatures

1. (sk, pk) generateKey
2. Judge gets the secret key, the attacker gets the public key
3. The attacker can get signatures on other document. He will see if he can use these the get the signature for this particular document.
4. Lastly the attacker thinks he has the document, then he sends a message and a signature. Thus the challenger can now run isValid to determine is it true.
5. This should not work!

Practical Mentions

1. Algorithms are randomized so you need a good source of randomness
2. Limit on message size so instead use hash(message) to keep the message less than 256 bits
3. fun trick: sign a hash pointer meaning signature “covers the whole structure”
If you sign a hash pointer than the entire data structure is protected. So the entire contents of the block chain is protected.

Bitcoin uses ECDSA standard (Elliptic Curve Digital Signature Algorithm)

relies on hairy math…
good randomness is essential because if you foul this up in generateKeys() or sign() then you probably leaked the private key so your “digital signature” is no longer secure.

Cryptographic Hash Function

These are my notes of the first lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017.

Questions answered in this Post:

What is a cryptographic hash function?
What are the security properties?
What are the applications of some of these properties?
What does SHA-256 do at a simple level?

Hash Function

takes any string as input
fixed size output (256 bit bitcoin uses)
efficiently computable (can figure out output in a reasonable amount of time)

Security Properties

collision-free
hiding
puzzle-friendly

Collision-free

Math definition – nobody can find x and y where x != y and H(x) = H(y)

in words:
nobody can find two different input strings so that if you apply the hash function to them, they have the same output.

the idea that nobody can find really means it’s really really really hard to find it….

Note that there has to be a collisions
since you’re going from a much larger input space t a fixed output space (2^256).

Hiding

Given H(x), it is infeasible to find x.

If r is chosen from a probability distribution that has min-entropy, then given H(r | x), it is infeasible to find x

High min-entropy means that the distribution is “very spread out” so that no particular value is chosen more than negligible probability.

in words: if you are given the hashed version of x then you will be unable to find x

Puzzle Friendly

for every possible output value y,
if k is chosen from a distribution with a high min entropy (super spread out) then it is infeasible to find x such that H(k|x) = y.

If someone wants to target a certain hash function, get some value of y, if part of the input is chosen in a random way, it’s difficult to find another value to target the hash function value.

Given a “puzzle ID” id ( from high min-entropy distribution) and a target set Y:
Try to find a “solution” x such that H(id|x) \in Y.

id | x means id concatenated x

Application of having something collision-free

Hash becomes a message digest meaning that you can use the has as a replacement of the msg
You can use H(x) and H(y) as smaller forms of the input string.

So if H(x) = H(y) then x = y.
Technically H(x) is only 256 bits while the x can be of some large arbitrary length

Application of puzzle friendly

Puzzle friendly property implies that no solving strategy is much better than trying random values of x

Application of the hiding property

In words, the idea is that you want to create a secret message that you can put in a public envelope.

Commitment API – to help explain public envelope example

Want to “seal a value in an envelope” and “open the envelope” later

Commit to a value, reveal it later

(com, key) := commit(msg)
match := verify(com, key, msg)

To seal msg in envelope:
(com, key) := commit(msg) — then publish com

To open envelope:
publish key, msg
anyone can use verify(com, key, msg) to check validity

Want two security properties:
Hiding: given com, infeasible to find msg;
Binding: infeasible to find msg != msg’ such that
verify(commit(msg, msg’) == true

commit(msg) := (H(key | msg), key) where key is 256 bit
verify(com, key, msg) := (H(key|msg) == com)

More about the Security Properties

key is a randomly generated 256 bit value

Hiding: Given H(key | msg) infeasible to find mg
Binding: Infeasible to find msg != msg’ such that
H(key | msg) == H(key | msg’)

Basically can’t find two different messages where you committed to one and verified with another one.

So you use the hash function in both commit and verify

What hash function will we use? SHA-256

Step by Step of what SHA-256 does

You have an input message of an arbitrary length
You also have a 256 bit value called IV (Initialization Vector (https://en.wikipedia.org/wiki/Initialization_vector)
The input string gets broken up into 512 bit blocks and we’ll reference them as block_1, block_2, block_3, and block_n will be the last block. The last block will also contain padding where the last 64 bits is the length of the input message and prior to that is a 1{0*}.
The 256 bit value and block_1 come together as inputs to a compression function (c) which basically squashes the 768 bits back into 256 bits.
Next the output of c gets mashed with block_2 and to the compression function and out pops a 256 bit value.
This continues along the entire length of the message where the last output is Voila, the hash.