bitcoin

Bitcoin

bitcoin 主要解决的问题(我相信你们在各种地方看过一万遍了):

  • forgery: 伪造
  • double spending
  • theft

我们需要考量的问题是:

  • 区块链是怎么存储的,Transaction Block 这些结构的关系是什么
  • 如何防止错误
  • 如何防止distributed system 经常出现的问题

Transaction

Bitcoin 的 idea 是把数据当成是 signed sequence of transactions

every coin has a sequence of transaction records one for each time this coin was transferred as payment a coin’s latest transaction indicates who owns it now

在论文中, txn 格式是:

1
2
3
4
pub(user1): public key of new owner
hash(prev): hash of this coin's previous transaction record
sig(user2): signature over transaction by previous owner's private key
(BitCoin is much more complex: amount (fractional), multiple in/out, ...)

可以看看他的 Transaction Example

1
2
3
Y owns a coin, previously given to it by X: 
// X->Y coin, T7: pub(Y), hash(T6) 前一次交易, sig(X) X 用 private key 签名出的唯一值
T7: pub(Y), hash(T6), sig(X)

跟着上面的

1
2
3
4
5
6
Y buys a hamburger from Z and pays with this coin
// Y 拿到 Z 的 pub key
Z sends public key to Y
// pub key 可以用于加密,Y 创建记录
Y creates a new transaction and signs it
T8: pub(Z), hash(T7), sig(Y)

验证:T8 的 sig 和 T7 的 pub 是一样的

1
2
3
4
Y sends transaction record to Z
Z verifies:
T8's sig() corresponds to T7's pub()
Z gives hamburger to Y

Z 的余额是 Z 知道的没有花费的 transaction 记录(感觉这个过程类似于从 log 中读)

1
Z "owns" a coin = Z knows private key for "new owner" public key in latest xaction

也就是说,Z 有一个币,表示 Z 知道 T8 的 pub(Z) 可以解析出来多少 (xaction == Transaction)

如何防止一个签名的 double-spending

1
2
3
4
5
what do we need?
publish log of all transactions to everyone, in same order
so Q knows about Y->Z, and will reject Y->Q
a "public ledger"
ensure Y can't un-publish a transaction

在事实上要求对方广播 下面这段介绍了和 raft 之类 consensus 的区别

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
why not publish transactions like this:
1000s of peers, run by anybody, no trust required in any one peer
peers flood new transactions over "overlay"
transaction Y->Z only acceptable if majority of peers think it is valid
i.e. they don't know of any Y->Q
hopefully majority overlap ensures double-spend is detected
how to count votes?
how to even count peers so you know what a majority is?
perhaps distinct IP addresses?
problem: "sybil attack" // 女巫攻击,制造大量节点加入系统
IP addresses are not secure -- easy to forge
attacker pretends to have 10,000 computers -- majority
when Z asks, attacker's majority says "we only know of Y->Z"
when Q asks, attacker's majority says "we only know of Y->Q"
voting is hard in "open" p2p schemes

Block Chain

是一个 agreement on transaction log to prevent double-spending

包含了 transaction 的整个记录,然后有很多 replica (论文中叫 peers, 这里指的是别的区块链系统,叫 replica 也不太合适)

  • 每个包含完整的备份(太 tm 浪费了吧)
  • new block / transaction 需要广播给其他的 peers

每个 block 包含:

  • hash(prev_block) // 注意是 block
  • 一个 transaction 的顺序集合
  • “nonce” (not quite a nonce in the usual cryptographic sense)
  • current time: 现在的时间戳

尽量每10分钟产生一个 block chain, 同时记录只有被放在 block chain 上才会被信任

在进行随机散列运算时,工作量证明机制引入了对某一个特定值的扫描工作,比方说 SHA‐256 下,随机散列值以一个或多个 0 开始。那么随着 0 的数目的上升, 找到这个解所需要的工作量 将呈指数增长,而对结果进行检验则仅需要一次随机散列运算。

block 的创建是由 mining 来做的,多个 cpu 尝试很多次,来挖出新的 block. 在这个系统中可能会有几个问题,我记得我们在 Raft 中也有类似的问题

  • 顺序提交:消息的提交要求是顺序的,这个在 Raft 中由单个 master 来处理
  • blockchain fork: 由于网络分区或者两个peer 同时算出了新的block等原因, blockchain 被分成没有共识的多个。

先介绍第二个问题,bitcoin 解决的方案是:

  • 同样长度、新 block 不同的 chain 都会被保持, 直到新的最长纪录覆盖
  • 假设新产生了 BZ/BQ 两条链,有更多被观测到的一方,有更多机器,更大概率找到新的节点,然后更长的 block-chain 会被 apply
  • 在一个系统(分区内部)会下载被 apply 的区块链

第二个问题还会有变体:Y 用同一个 hash 给不同 block chain 发送,会造成分裂;给同一个发送的时候,不会容许这种顺序出现。

激励

前文介绍了 mining, 挖矿的过程中, 新区块会附带一个由创造者拥有的电子货币。

同时,也有交易费。输出小于输入时,差额是交易费

Reclaiming Disk Space

Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space.

FAQ

是不是 signature 就足够了

答:不是,可能会有人把这个 signature copy 一份,然后重复转账交易行为

Proof-of-work 的 purpose

攻击者需要 51% 攻击

有比 pow 更节省资源的算法么

1
2
3
4
5
6
7
A: Proof-of-work is hard to fake or simulate, a nice property in a
totally open system like Bitcoin where you cannot trust anyone to
follow rules. There are some alternate schemes; search the web for
proof-of-stake or look at Algorand and Byzcoin, for example. In a
smallish closed system, in which the participants are known though
not entirely trusted, Byzantine agreement protocols could be used, as
in Hyperledger.

传播很多消息的处理

1
2
3
4
5
6
7
8
9
10
11
12
13
14
There's a small chance that two miners find blocks at the same time,
perhaps B50' containing "pay Bob" and B50'' containing "pay Charlie". At
this point there's a fork in the block chain. These two blocks will be
flooded to all the nodes. Each node will start mining a successor to one
of them (the first it hears). Again the most likely outcome is that a
single miner will finish significantly before any other miner, and flood
the successor, and most peers will switch that winning fork. The chance
of repeatedly having two miners simultaneously find blocks gets very
small as the forks get longer. So eventually all the peers will switch
to the same fork, and in fork there will be only one spend of the coin.

The possibility of accidentally having a short-lived fork is the reason
that careful clients will wait until there are a few successor blocks
before believing a transaction.

随着区块链的增长,新加入一个节点会不会耗时很久

  • 一旦下载完了就不再需要了
  • 一些 user 可以信任身边有完整的 chain 的用户(或者矿工?)

51 攻击

对于 51% 攻击:https://www.coindesk.com/51-attacks-real-threat-bitcoin 实际上出现 51% 这个系统就 GG 了

SHA-256 之外的方案

1
2
3
4
5
6
Q: Are there any ways for Bitcoin mining to do useful work, beyond simply
brute-force calculating SHA-256 hashes?

A: Maybe -- here are two attempts to do what you suggest:
https://www.cs.umd.edu/~elaine/docs/permacoin.pdf
http://primecoin.io/

阅读比特币论文有感:

  • 嗯,每个区块会存所有的信息…所有信息?什么几把?哦有个 merkle tree, 等等你又没有 snapshot 或者 compaction, 在这有鬼用啊,难道你跟 buddy 算法一样 coin 有新纪录就 mark as delete 然后把 sibling 一起删掉么
  • 嗯,处理脑裂…哦这个叫 fork… 等会儿,也就是说这个系统只有最终一致性,不能保证读到正确的数据?网络分区怎么办?哦多等几个 block 验证…什么几把
  • 嗯,一个链要保证所有的记录是有序的…fuck…这代价太高了吧