D S – Page 4 – deltadeltaandmoredeltas

Hash Pointers

These are my notes of the fourth lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. The lecture gave me insight on some of the inner workings of blockchain which I greatly enjoyed.

Questions answered in this Post:

What is a hash pointer?
Once you have a hash pointer, what can you do with it?
What is the tamper evident property?

Hash Pointer

Simply, a hash pointer is a kind of data structure that contains two parts

pointer to where some info is stored
(cryptographic) hash of the info

One feature of a hash pointer is we can ask to get the info back and verify that the information hasn’t changed. If you’re familiar with basic C programming, the word pointer should be very familiar. Hash pointers are similar to regular pointers which provide the memory address of some information and through the pointer you can access this information when you dereference it.

key idea: build data structures with hash pointers

Linked List with hash pointers

According to CLRS, a linked list is a data structure in which the objects are arranged in a linear order. A singly linked list contains an object with an attribute key and a next pointer. The head of the linked list is a hash pointer.

Tamper Evident Log – log data structure that contains lots of data
If someone messes with data earlier in the log, we want to be able to detect it.

Why does the block chain have this property?

If someone messes with a block in the middle of the chain. If he changes the data, then the hash generated for this block which is stored in the previous node also would need to change. So even though the hash of the previous block and the data block line up, the previous previous block next pointer is based on a hash of the previous block. Because the previous block is composed of both the data and previous Hash (which has a change) these would also indicate tampering. Thus the only way to make this block well is if you change the entire blockchain but at that point then the head of the block chain is incorrect and since this is the value that the users store, the user would be able to detect the tampering.

The head of the block chain is known genesis block.

Binary tree with hash pointer (Merkle Tree)

One takes consecutive pairs of data blocks and computes the hash of them. Thinking about a tree, you combine each pair of leaves to create a code. Combine the two blocks to create two hash pointers. Then, from those two hash pointers which were created from the left and right child, you make a hash pointer of that block where you point to the data. This occurs recursively until you get to some head hash pointer which as previously is what you store.

Bonus: You only need O(logn) data since you need to know the path of the block from the head to the leaf (data).

Advantage of the Merkle Tree

Tree holds many items but only need to remember the root (I think you get that from the linked list)
Can verify membership in O(log n) time (this is the advantage over the linked list

Variant: Sorted Merkle Tree

can verify non-membership in O(log n)
(show items before, after the missing one)
One can prove that a block is not in a membership which I guess is just as important as well.

Realistically hash pointers can be used in any acyclic data structure.

Public Keys as Identities

These are my notes of the third lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017. This was a fairly brief lecture. A simple idea explained: digital signatures enables the use of public keys as identities.

Questions answered in this Post:

What does “Public Keys as Identities” even mean?
How do you create an identity in this context?

Public Key == an Identity

Having a public key as an identity means that you can use the public key being used as a reference to some individual or body. Digital signatures enables this because one has the ability to verify the validity of a message based on the public key, message, and signature.

Thus, if you see signature such that verify(pk, msg, signature) [think of it as pk says, “[msg]”]

So in more concrete terms, there’s a public key named Alex and a msg “Hi”. Alex says, “Hi”
to “speak for” ok, you must know matching secure key (key)

Basically no one else can verbally speak for another person unless they’re that person or if you’re rich and famous then you have someone who tweets for you with your permission.

How to make a new identity

create a new, random key-pair (sk, pk) by just calling a function generateKeys()

pk is the public “name” (implement this with Hash(pk))
sk lets you “speak for” identity

So you control the identity, because only you know sk and since pk “looks random”, nobody knows who you are

One can use a new identity each time they create a message if they want. The downside is that it is possible at some point someone can group all these transactions together if they’re over a certain period of time.

Decentralized identity management

anybody can make a new identity
no central point of coordination

In common terms, there would be no need for a Social Security office required.
Bitcoin address -> public key/hash(public key)

Privacy?

addresses not directly connected to real-world
observer can link together an address’s activity over time, make inferences

Digital Signatures

These are my notes of the second lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017.

Questions answered in this Post:

What is a digital signature?
What are some of the requirements?
What are some practical implementations details?

Signature

only you can sign, but anyone can verify it
signature is tied to a particular document (can’t cut and paste to another doc)

If you had a digital signatures then this identity theft crime described below would not be possible:
The scenario works as follows, a person is called and is asked a question that guarantees that they’ll say the word “Yes”. The caller (criminal) will record the person answering “Yes”. Then the criminal will use the voice recording of the confirmation for different electronic bank verifiers. This concept of cut and paste in my example is using a voice recording out of context to commit a crime. With a digital signature, theoretically one cannot take the instance of the signature out of context and use it for other, potentially nefarious, purposes.

API for Digital Signatures

(sk, pk) := generateKeys(keysize)
sk: secret signing key
pk: public verification key

sig:= sign(sk, message)

isValid := verify(pk, message, sig)

These different functions hit the key idea of what is a signature. You generate the key with some size (256 bits) and get two outputs: a secret key and public verify key. Next you can now sign messages which people can verify by using the public verify key and the signature. This signature is tied to a specific document (msg) in this case as well. Thus the generateKeys and sign use randomized algorithms

Requirements for a Digital Signatures

able to verify valid signatures
unable to forge signature

“valid signatures verify” – verify(pk, msg, sign(sk, msg)) == true
“can’t forge signatures”
adversary who:
knows pk (public key)
gets to see signatures on other messages of his choice then he can’t produce a verifiable signature on another message

Game we play with an adversary

Challenger (TV judge)
Attacker claims that he can forge signatures

1. (sk, pk) generateKey
2. Judge gets the secret key, the attacker gets the public key
3. The attacker can get signatures on other document. He will see if he can use these the get the signature for this particular document.
4. Lastly the attacker thinks he has the document, then he sends a message and a signature. Thus the challenger can now run isValid to determine is it true.
5. This should not work!

Practical Mentions

1. Algorithms are randomized so you need a good source of randomness
2. Limit on message size so instead use hash(message) to keep the message less than 256 bits
3. fun trick: sign a hash pointer meaning signature “covers the whole structure”
If you sign a hash pointer than the entire data structure is protected. So the entire contents of the block chain is protected.

Bitcoin uses ECDSA standard (Elliptic Curve Digital Signature Algorithm)

relies on hairy math…
good randomness is essential because if you foul this up in generateKeys() or sign() then you probably leaked the private key so your “digital signature” is no longer secure.

Cryptographic Hash Function

These are my notes of the first lecture from Coursera’s Bitcoin and Cryptocurrency Technologies during Dec 2016 – Feb 2017.

Questions answered in this Post:

What is a cryptographic hash function?
What are the security properties?
What are the applications of some of these properties?
What does SHA-256 do at a simple level?

Hash Function

takes any string as input
fixed size output (256 bit bitcoin uses)
efficiently computable (can figure out output in a reasonable amount of time)

Security Properties

collision-free
hiding
puzzle-friendly

Collision-free

Math definition – nobody can find x and y where x != y and H(x) = H(y)

in words:
nobody can find two different input strings so that if you apply the hash function to them, they have the same output.

the idea that nobody can find really means it’s really really really hard to find it….

Note that there has to be a collisions
since you’re going from a much larger input space t a fixed output space (2^256).

Hiding

Given H(x), it is infeasible to find x.

If r is chosen from a probability distribution that has min-entropy, then given H(r | x), it is infeasible to find x

High min-entropy means that the distribution is “very spread out” so that no particular value is chosen more than negligible probability.

in words: if you are given the hashed version of x then you will be unable to find x

Puzzle Friendly

for every possible output value y,
if k is chosen from a distribution with a high min entropy (super spread out) then it is infeasible to find x such that H(k|x) = y.

If someone wants to target a certain hash function, get some value of y, if part of the input is chosen in a random way, it’s difficult to find another value to target the hash function value.

Given a “puzzle ID” id ( from high min-entropy distribution) and a target set Y:
Try to find a “solution” x such that H(id|x) \in Y.

id | x means id concatenated x

Application of having something collision-free

Hash becomes a message digest meaning that you can use the has as a replacement of the msg
You can use H(x) and H(y) as smaller forms of the input string.

So if H(x) = H(y) then x = y.
Technically H(x) is only 256 bits while the x can be of some large arbitrary length

Application of puzzle friendly

Puzzle friendly property implies that no solving strategy is much better than trying random values of x

Application of the hiding property

In words, the idea is that you want to create a secret message that you can put in a public envelope.

Commitment API – to help explain public envelope example

Want to “seal a value in an envelope” and “open the envelope” later

Commit to a value, reveal it later

(com, key) := commit(msg)
match := verify(com, key, msg)

To seal msg in envelope:
(com, key) := commit(msg) — then publish com

To open envelope:
publish key, msg
anyone can use verify(com, key, msg) to check validity

Want two security properties:
Hiding: given com, infeasible to find msg;
Binding: infeasible to find msg != msg’ such that
verify(commit(msg, msg’) == true

commit(msg) := (H(key | msg), key) where key is 256 bit
verify(com, key, msg) := (H(key|msg) == com)

More about the Security Properties

key is a randomly generated 256 bit value

Hiding: Given H(key | msg) infeasible to find mg
Binding: Infeasible to find msg != msg’ such that
H(key | msg) == H(key | msg’)

Basically can’t find two different messages where you committed to one and verified with another one.

So you use the hash function in both commit and verify

What hash function will we use? SHA-256

Step by Step of what SHA-256 does

You have an input message of an arbitrary length
You also have a 256 bit value called IV (Initialization Vector (https://en.wikipedia.org/wiki/Initialization_vector)
The input string gets broken up into 512 bit blocks and we’ll reference them as block_1, block_2, block_3, and block_n will be the last block. The last block will also contain padding where the last 64 bits is the length of the input message and prior to that is a 1{0*}.
The 256 bit value and block_1 come together as inputs to a compression function (c) which basically squashes the 768 bits back into 256 bits.
Next the output of c gets mashed with block_2 and to the compression function and out pops a 256 bit value.
This continues along the entire length of the message where the last output is Voila, the hash.