《精通比特币》英文版批注导读•第8章比特币网络 — ScalersTalk成长会 – 持续行动,刻意学习 – ScalersTalk Wonderland

《精通比特币》英文版批注导读•第8章比特币网络

成长分享 scalerstalk 浏览 0条评论

ScalersTalk 成长会 2018 年火热招募中,目前报名人数已经突破 1200 人,参见《持续行动,为三年后的自己,扎心地做点事——ScalersTalk 成长会 2018 年会员资格开放申请》

今天我们进入《精通比特币》第8章,这一章讲的比特币的网络的结构以及SPV技术。通过这一章节你可以对比特币网络的构成以及SPV的技术原理有所了解,而且你还会知道交易池到底是怎样的组织形式。

本章原文地址:

https://github.com/bitcoinbook/bitcoinbook/blob/develop/ch08.asciidoc

相关文章:

《精通比特币》英文版批注导读•第1章

《精通比特币》英文版批注导读•第2章 比特币工作原理

《精通比特币》英文版批注导读•第3-4章 比特币密钥与地址

《精通比特币》英文版批注导读•第4章(2) 比特币地址

《精通比特币》英文版批注导读•第5章 比特币钱包技术

《精通比特币》英文版批注导读•第6章(1) 比特币交易记录

《精通比特币》英文版批注导读•第6章(2) 比特币交易记录

《精通比特币》英文版批注导读•第7章(1) 比特币高级交易与脚本技术

《精通比特币》英文版批注导读•第7章(2) 比特币高级交易与脚本技术

Chapter 8 The Bitcoin Network

Peer-to-Peer Network Architecture

Bitcoin is structured as a peer-to-peer network architecture on top of the internet. The term peer-to-peer, or P2P, means that the computers that participate in the network are peers to each other, that they are all equal, that there are no “special” nodes, and that all nodes share the burden of providing network services. The network nodes interconnect in a mesh network with a “flat” topology. There is no server, no centralized service, and no hierarchy within the network. Nodes in a P2P network both provide and consume services at the same time with reciprocity acting as the incentive for participation. P2P networks are inherently resilient, decentralized, and open. A preeminent example of a P2P network architecture was the early internet itself, where nodes on the IP network were equal. Today’s internet architecture is more hierarchical, but the Internet Protocol still retains its flat-topology essence. Beyond bitcoin, the largest and most successful application of P2P technologies is file sharing, with Napster as the pioneer and BitTorrent as the most recent evolution of the architecture.

这一章主要讲P2P网络,关于本章的话题,在文章《从零开始区块链:对等网络与电子现金是什么?——比特币经典论文研读 (1)》有所讨论。

Bitcoin’s P2P network architecture is much more than a topology choice. Bitcoin is a P2P digital cash system by design, and the network architecture is both a reflection and a foundation of that core characteristic. Decentralization of control is a core design principle that can only be achieved and maintained by a flat, decentralized P2P consensus network.

The term “bitcoin network” refers to the collection of nodes running the bitcoin P2P protocol. In addition to the bitcoin P2P protocol, there are other protocols such as Stratum that are used for mining and lightweight or mobile wallets. These additional protocols are provided by gateway routing servers that access the bitcoin network using the bitcoin P2P protocol and then extend that network to nodes running other protocols. For example, Stratum servers connect Stratum mining nodes via the Stratum protocol to the main bitcoin network and bridge the Stratum protocol to the bitcoin P2P protocol. We use the term “extended bitcoin network” to refer to the overall network that includes the bitcoin P2P protocol, pool-mining protocols, the Stratum protocol, and any other related protocols connecting the components of the bitcoin system.

比特币网络指的就是运行比特币P2P协议的网络。比特币网络也通过网关支持其他的协议。

Node Types and Roles

Although nodes in the bitcoin P2Pnetwork are equal, they may take ondifferent roles depending on the functionality they are supporting. Abitcoin node is a collection of functions: routing,the blockchain database, mining, and wallet services. A full node with allfour of these functions is shown in A bitcoin network node with all fourfunctions: wallet, miner, full blockchain database, and network routing.

虽然比特币节点对等,但是也支持不同的功能:路由、区块链数据库、挖矿以及钱包服务。

Figure 1. Abitcoin network node with all four functions: wallet, miner, full blockchaindatabase, and network routing

All nodes include the routing function to participate in the network and might include other functionality. All nodes validate and propagate transactions and blocks,and discover and maintain connections to peers. In the full-node examplein A bitcoin network node with all four functions: wallet, miner, full blockchain database, and network routing, the routing function is indicated by an orange circle named “Network Routing Node” or with the letter “N.”

所有的比特币网络节点都包括了路由的功能,可以验证并传播交易区块,发现并维护节点连接。

Some nodes, called full nodes, also maintain a complete and up-to-date copy of the blockchain. Full nodes can autonomously and authoritatively verify any transaction without external reference. Some nodes maintain only a subset of the blockchain and verify transactions using a method called simplified payment verification, or SPV. These nodes are known as SPV nodes or lightweight nodes. In the full-node example in the figure, the full-node blockchain database function is indicated by a circle called “Full Blockchain” or the letter “B.” In The extended bitcoin network showing various node types, gateways, and protocols, SPV nodes are drawn without the “B” circle, showing that they do not have a full copy of the blockchain.

全节点包含了所有的区块链信息,在验证的时候不需要引用网络上的其他信息。部分节点没有所有的区块链信息,于是在验证的时候需要引用网络数据,采用的方法叫做SPV验证。

Mining nodes compete to create newblocks by running specialized hardware to solve the Proof-of-Work algorithm.Some mining nodes are also full nodes, maintaining a full copy of the blockchain, while others are lightweight nodes participating in pool mining and depending on a pool server to maintain a full node. The mining function isshown in the full node as a black circle called “Miner” or the letter “M.”

User wallets might be part of a full node, as is usually the case with desktop bitcoin clients. Increasingly, manyuser wallets, especially those running on resource-constrained devices such as smart phones, are SPV nodes. The wallet function is shown in A bitcoin network node with all fourfunctions: wallet, miner, full blockchain database, and network routing as a green circle called “Wallet” or the letter “W.”

In addition to the main node types onthe bitcoin P2P protocol, there are servers and nodes running other protocols,such as specialized mining pool protocols and lightweight client-access protocols.

Different types of nodes on the extended bitcoin network shows the most common node types on the extended bitcoin network.

The Extended Bitcoin Network

The main bitcoin network, running the bitcoin P2P protocol, consists of between 5,000 and 8,000 listening nodes running various versions of the bitcoin reference client (Bitcoin Core) and a few hundred nodes running various other implementations of the bitcoin P2P protocol, such as Bitcoin Classic, Bitcoin Unlimited, BitcoinJ, Libbitcoin, btcd, and bcoin. A small percentage of the  nodes on the bitcoin P2P network are also mining nodes, competing in the mining process, validating transactions, and creating new blocks. Various large companies interface with the bitcoin network by running full-node clients based on the Bitcoin Core client, with full copies of the blockchain and a network node, but without mining or wallet functions.These nodes act as network edge routers, allowing various other services(exchanges, wallets, block explorers, merchant payment processing) to be builton top.

网络上有5000-8000个监听节点,上面运行了各种比特币的核心客户端。各类大公司会基于比特币核心客户端接入比特币网络,拥有全量区块数据,但是没有挖矿或者钱包功能。这些接入的节点发挥了边缘服务器的作用,为其他服务例如兑换、钱包、浏览器以及支付提供承载。

The extended bitcoin network includesthe network running the bitcoin P2P protocol, described earlier, as well asnodes running specialized protocols. Attached to the main bitcoin P2P network are a number of pool servers and protocol gateways that connect nodes running other protocols. These other protocolnodes are mostly pool mining nodes (see [mining]) and lightweight wallet clients, which donot carry a full copy of the blockchain.

The extended bitcoin network showingvarious node types, gateways, and protocols shows the extended bitcoin network with the various types of nodes, gateway servers, edge routers, and wallet clients and the various protocols they use to connect to each other.

比特币扩展网络主要由矿池以及轻量级钱包构成。

Figure 2.Different types of nodes on the extended bitcoin network

Figure 3. The extended bitcoin network showing various node types, gateways, and protocols

Bitcoin Relay Networks

While the bitcoin P2P network serves the general needs of a broad variety of node types, it exhibits too high network latency for the specialized needs of bitcoin mining nodes.

Bitcoin miners are engaged in atime-sensitive competition to solve the Proof-of-Work problem and extend the blockchain (see [mining]). While participating in this competition, bitcoin miners must minimize the time between the propagation of a winning block and the beginning of the next round of competition. In mining, network latency is directly related to profit margins.

在挖矿工作中,时延直接决定了矿工赚钱与否。矿工需要最小代挖出矿的时间和开启下一轮挖矿竞争的时间。比特币中继网络就是为了在矿工中缩短传输延迟而于2015年提出的。在2016年,这个机制升级成了FIBRE网络,采用的是UDP协议进行传输。另外一个版本的中继网络是康奈尔大学的Falcon,采用的是传输部分区块而非等到完整区块到达后再传输。

Bitcoin Relay Network is a network that attempts to minimize the latency in the transmission of blocks between miners. The original Bitcoin Relay Network was created by core developer Matt Corallo in 2015 to enable fast synchronization of blocks between miners with very low latency. The network consisted of several specialized nodes hosted on the Amazon Web Services infrastructure around the world and served to connect the majority of miners and mining pools.

The original Bitcoin Relay Network was replaced in 2016 with the introduction of the Fast Internet Bitcoin Relay Engine or FIBRE, also created by core developer Matt Corallo. FIBRE is a UDP-based relay network that relays blocks within a network of nodes. FIBRE implements compact block optimizationto further reduce the amount of data transmitted and the network latency.

Another relay network (still in theproposal phase) is Falcon, based on research at Cornell University. Falcon uses “cut-through-routing”instead of “store-and-forward” to reduce latency by propagating parts of blocks as they are received rather than waiting until a complete block is received.

Relay networks are not replacements for bitcoin’s P2P network. Instead they are overlay networks that provide additional connectivity between nodes with specialized needs. Like freeways are not replacements for rural roads, but rather shortcuts between two points with heavy traffic, you still need small roads to connect to the freeways.

中继网络不是比特币网络的替代物,而是为了增加连接性使用的覆盖网络。

Network Discovery

When a new node boots up, it must discover other bitcoin nodes on the network in order to participate. To start this process, a new node must discover at least one existing node on the network and connect to it. The geographic location of other nodes is irrelevant; the bitcoin network topologyis not geographically defined. Therefore, any existing bitcoin nodes can be selected at random.

新节点启动的时候,至少需要找到一个现成的网络节点并连接。找到以后通过8333端口交换握手信息。

To connect to a known peer, node sestablish a TCP connection, usually to port 8333 (the port generally known as the one used by bitcoin), or an alternative port if one is provided. Upon establishing a connection, the node will start a “hand shake”(see The initial handshake between peers) by transmitting a version message, which contains basic identifying information,including:

(See GitHub for an example of the version network message.)

The version message is always the first message sent by any peer to another peer. The local peer receiving a version message will examine the remote peer’s reported nVersion and decide if the remote peer is compatible. If the remote peer iscompatible, the local peer will acknowledge the version message and establish aconnection by sending a verack.

其他当地节点在收到版本信息后,如果验证一致,即说明可以接受,于是发回接受包。

How does a new node find peers? Thefirst method is to query DNS using anumber of “DNS seeds,” which are DNS servers that provide a list of IP addresses of bitcoin nodes. Some of those DNS seeds provide a static list of IP addresses of stable bitcoin listening nodes. Some of the DNS seeds are custom implementations of BIND (Berkeley Internet Name Daemon) that return a random subset from a list of bitcoin node addresses collected by a crawler or a long-running bitcoin node. TheBitcoin Core client contains the names of five different DNS seeds. The diversity of ownership and diversity of implementation of the different DNS seeds offers a high level of reliability for the initial bootstrapping process. Inthe Bitcoin Core client, the option to use the DNS seeds is controlled by the option switch -dnsseed (set to 1 by default, to use the DNS seed).

新节点通过向DNS种子发起查询来获得节点信息,DNS种子来自预先配置的一些IP地址,这些地址有的来自爬虫采集,记录的是一些在比特币网络上长期运行的节点地址。由于DNS种子多样性程度较高,可以有很好的有效性保证。

Alternatively, a bootstrapping node thatknows nothing of the network must be given the IP address of at least one bitcoin node, after which it can establish connections through further introductions. The command-line argument-seed node can be used to connect to one node just for introductions using it as a seed. After the initial seed node is used to form introductions, the client will disconnect from it and usethe newly discovered peers.

另外的一种方式就是提供一个IP地址,通过该IP地址获取更多的其他节点的信息后,断开与原地址的连接,建立新连接。

Figure 4. The initial handshake between peers

Once one or more connections are established, the new node will send an addr message containingits own IP address to its neighbors. The neighbors will, in turn, forward the addr message to their neighbors, ensuring that the newly connected node becomes well known and better connected. 

Additionally, the newly connected node can send get addr to the neighbors, asking them to return a list of IP addresses of other peers.That way, a node can find peers to connect to and advertise its existence onthe network for other nodes to find it. Address propagation and discovery shows theaddress discovery protocol.

一旦连接建立,新节点给相邻的节点发送IP地址,相邻节点继续将地址发给其相邻的节点,确保新连接的节点与其他节点充分连接。

Figure 5.Address propagation and discovery

A node must connect to a few different peers in order to establish diverse paths into thebitcoin network. Paths are not reliable—nodes come and go—and so the nodemust continue to discover new nodes as it loses old connections as well as assist other nodes when they bootstrap. Only one connection is needed to bootstrap, because the first node can offer introductions to its peer nodes andthose peers can offer further introductions. It’s also unnecessary and wasteful of network resources to connect to more than a handful of nodes. After bootstrapping, a node will remember its most recent successful peer connections,so that if it is rebooted it can quickly reestablish connections with its former peer network. If none of the former peers respond to its connection request, the node can use the seed nodes to bootstrap again.

一个节点会和不同的其他节点建立连接,这样可以确保连上比特币网络。如果节点的所有连接节点都没有响应,可以重新通过种子节点引导。

On a node running the Bitcoin Coreclient, you can list the peer connections with the command get peer info:

To override the automatic management of peers and to specify a list of IP addresses, users can provide the option-connect=<IPAddress> and specify one or more IP addresses. If this optionis used, the node will only connect to the selected IP addresses, instead of discovering and maintaining the peer connections automatically.

If there is no traffic on a connection,nodes will periodically send a message to maintain the connection. If a node has not communicated on a connection for more than 90 minutes, it is assumed to be disconnected and a new peer will be sought. Thus, the network dynamically adjusts to transient nodes and network problems,and can organically grow and shrink as needed without any central control.

如果一条连接没有通信,节点定期会发送一条消息保持联络,如果90分钟内没有通信,则认为节点已经不在线。比特币网络动态调整适应节点与网络的情况。

Full Nodes

Full nodes are nodes that maintain afull blockchain with all transactions. More accurately, they probably should becalled “full blockchain nodes.” In the early years of bitcoin, all nodes were full nodes and currently the Bitcoin Core client is a full blockchain node. In the past two years, however,new forms of bitcoin clients have been introduced that do not maintain a full blockchain but run as lightweight clients. We’ll examine these in more detailin the next section.

全节点其实是全区块链节点,保存了所有的区块链信息。但是如果要运行全节点需要上百G的存储空间.

Full blockchain nodes maintain acomplete and up-to-date copy of the bitcoin blockchain with all the transactions, which they independently build and verify, starting with the veryfirst block (genesis block) and building up to the latest known block in thenetwork. A full blockchain node can independently and authoritatively verify any transaction without recourse or reliance on any other node or source ofinformation. The full blockchain node relies on the network to receive updates about new blocks of transactions, which it then verifies and incorporates intoits local copy of the blockchain.

Running a full blockchain node gives you the pure bitcoin experience: independent verification of all transactions without the need to rely on, or trust, any other systems. It’s easy to tell if you’re running a full node because it requires more than one hundred gigabytes of persistent storage (disk space) to store thefull blockchain. If you need a lot of disk and it takes two to three daysto sync to the network, you are running a full node. That is the price of complete independence and freedom from central authority.

There are a few alternative implementations of full blockchain bitcoin clients, built using different programming languages and software architectures. However, the most common implementation is the reference client Bitcoin Core, also known as the Satoshiclient. More than 75% of the nodes on the bitcoin network run various versions of Bitcoin Core. It is identified as “Satoshi” in the sub-version stringsent in the version message and shown by the command get peer info as we saw earlier; for example, /Satoshi:0.8.6/.

Exchanging “Inventory” 

The first thing a full node will do onceit connects to peers is try to construct a complete blockchain. If it is a brand-new node and has no blockchain at all, it only knows one block, the genesis block, which isstatically embedded in the client software. Starting with block #0 (thegenesis block), the new node will have to download hundreds of thousands ofblocks to synchronize with the network and reestablish the full blockchain.

如果是一个空节点,那只有软件里的创始区块没有其他的区块信息。

The process of syncing the blockchain starts with the version message, because that contains BestHeight, a node’scurrent blockchain height (number of blocks). A node will see the version messages from its peers, know how many blocks they each have, and be able to compare to how many blocks it has in its own blockchain. Peered nodes will exchange a get blocks message that contains the hash(finger print) of the top block on their local blockchain. One of the peers will be able to identify the received hash as belonging to a block that is notat the top, but rather belongs to an older block, thus deducing that its ownlocal blockchain is longer than its peer’s.

节点之间会交换块顶信息的哈希值,这样如果有哪个节点不是最新值,那就可能通过比较自有的最高块哈希和收到的哈希值知道结果。

The peer that has the longer blockchain has more blocks than the other node and can identify which blocks the othernode needs in order to “catch up.” It will identify the first 500 blocks to share and transmit their hashes using an inv (inventory) message. The node missing these blocks will then retrieve them, by issuing a series of get data messages requesting the full block data and identifying the requested blocks using the hashes from the invmessage.

节点会采用inv消息给新入节点传递区块信息,而且是传前500个区块的哈希信息。收到500个区块的哈希值以后,节点再向其他周围的节点请求信息,以平衡负载。而且还会采用字段记载哪些正在传输中,以确保不会重复传送。

Let’s assume, for example, that a nodeonly has the genesis block. It will then receive an inv message from its peerscontaining the hashes of the next 500blocks in the chain. It will start requesting blocks from all of itsconnected peers, spreading the load and ensuring that it doesn’t overwhelm any peer with requests. The node keeps track of how many blocks are “in transit” per peer connection,meaning blocks that it has requested but not received, checking that it does not exceed a limit (MAX_BLOCKS_IN_TRANSIT_PER_PEER). This way, if it needs a lot of blocks, it will only request new ones as previous requests are fulfilled, allowing the peers to control the pace of updates and not overwhelm the network. As each block is received, it is added to the blockchain, as we will see in [blockchain]. As the loca lblockchain is gradually built up, more blocks are requested and received, andthe process continues until the node catches up to the rest of the network.

这里说到的对比本地区块与网络区块的过程可以在节点加入或者退出网络的任意时间执行,所以节点可以任意加入或者退出网络。

This process of comparing the local blockchain with the peers and retrieving any missing block shappens any time a node goes offline for any period of time. Whether a nodehas been offline for a few minutes and is missing a few blocks, or a month andis missing a few thousand blocks, it starts by sending getblocks, gets an invresponse, and starts downloading the missing blocks. Node synchronizing the blockchain by retrieving blocks from a peer shows the inventory and block propagation protocol.

Figure 6. Node synchronizing the blockchain by retrieving blocks from a peer

Simplified Payment Verification (SPV) Nodes

Not all nodes have the ability to store the full blockchain. Many bitcoin clients are designed to run on space- andpower-constrained devices, such as smartphones, tablets, or embedded systems.For such devices, simplifiedpayment verification(SPV) method is used to allow them to operate without storing the full blockchain. These types of clients are called SPV clientsor lightweight clients. As bitcoin adoption surges, the SPV node is becoming the most common form of bitcoin node, especially for bitcoin wallets.

由于空间和能力限制,并不是所有的节点都能成为全节点。于是有SPV技术,帮助一些钱包类应用进行验证。SPV节点只下载区块链头,不下载交易,这样数据内容可以小1000倍,但是验证的时候就需要依靠其他节点提供更多信息。

SPV nodes download only the block headers and do not download the transactions includedin each block. The resulting chain of blocks, without transactions, is 1,000 times smaller than the full blockchain. SPV nodes cannot construct a full picture ofall the UTXOs that are available for spending because they do not know aboutall the transactions on the network. SPV nodes verify transactions using a slightly different method that relies onpeers to provide partial views of relevant parts of the blockchain on demand.

As an analogy, a full node is like atourist in a strange city, equipped with a detailed map of every street andevery address. By comparison, an SPV node is like a tourist in a strange city asking random strangers for turn-by-turn directions while knowing only one main avenue. Although both tourists can verify the existence of a street by visitingit, the tourist without a map doesn’t know what lies down any of the side streets and doesn’t know what other streets exist. Positioned in front of 23 Church Street, the tourist without a map cannot know if there are a dozen other”23 Church Street” addresses in the city and whether this is the right one. The mapless tourist’s best chance is to ask enough people and hope some of them are not trying to mug him.

SPV verifies transactions by reference to their depth in the blockchain instead of their height.Where as a full blockchain node will construct a fully verified chain of thousands of blocks and transactions reaching down the blockchain (back in time) all the way to the genesis block, an SPV node will verify the chain of all blocks (but not all transactions) and link that chain to the transaction of interest.

全节点拥有从创世区块到最新区块的所有区块,可以用块高度来衡量;但是SPV只衡量从当前节点到关联节点的深度信息,看他被后续节点埋了有多深。SPV无法验证一个UTXO的有效性,只能通过merkle路径来建立包含的关联。例如,SPV在看到某一区块300000之后增加了6个区块以后,通过其他节点的确认来确保没有双花。

For example, when examining a transactionin block 300,000, a full node links all 300,000 blocks down to the genesis block and builds a full database of UTXO, establishing the validity of the transaction by confirming that the UTXO remains unspent. An SPV node cannot validate whether the UTXO is unspent. Instead, the SPV node will establish a link between the transaction and the block that contains it, using a merkle path (see [merkle_trees]). Then, the SPV node waits until it sees the six blocks 300,001 through 300,006 piled on top of the block containing the transaction and verifies it by establishing its depthunder blocks 300,006 to 300,001. The fact that other nodes on the network accepted block 300,000 and then did the necessary work to produce six moreblocks on top of it is proof, by proxy, that the transaction was not adouble-spend.

An SPV node cannot be persuaded that a transaction exists in a block when the transaction does not in fact exist. The SPV node establishes the existence of a transactionin a block by requesting a merkle path proof and by validating the Proof-of-Work in the chain of blocks. However, a transaction’s existence can be”hidden” from an SPV node. An SPV node can definitely prove that a transaction exists but cannot verify that a transaction, such as a double-spend of the same UTXO, doesn’t exist becauseit doesn’t have a record of all transactions. This vulnerability can beused in a denial-of-service attack or for a double-spending attack against SPV nodes. To defend against this, an SPV node needs to connect randomly to several nodes, to increase the probability thatit is in contact with at least one honest node. This need to randomly connect means that SPV nodes also are vulnerable to network partitioning attacks or Sybil attacks, where they are connected to fake nodes or fake networks and do not have access to honest nodes or the real bitcoin network.

SPV不会把不存在的交易误认为已经存在,但是无法确认一笔交易是不是不存在,因为SPV没有全网所有的区块信息。这个缺陷容易导致拒绝服务攻击,应对的办法就是随机多连接一些节点,这样连接到诚实节点的概率就会更高。

For most practical purposes,well-connected SPV nodes are secure enough, striking a balance between resourceneeds, practicality, and security. For infallible security, however, nothing beats running a full blockchain node

To get the block headers, SPV nodes use a get headers message insteadof get blocks. The responding peer will send up to 2,000 block headers using a single headers message. The process is otherwise the same as that used by a full node to retrieve full blocks. SPV nodes also set a filter on the connection to peers, to filter the stream of future blocks and transactions sent by the peers. Any transactions of interest are retrieved using a get data request. The peer generates a tx message containing the transactions, inresponse. SPV node synchronizing the block headers shows the synchronization of block headers.

SPV采用get headers消息获得区块链的头信息,但是这样的坏处是,由于需要获取特定信息,所以很容易泄露钱包的真实地址,从而暴露使用者的隐私。为了规避这个问题,开发者引入了bloom 过滤器来解决这个问题。

Because SPV nodes need to retrieve specific transactions in order to selectively verify them, they also create aprivacy risk. Unlike full blockchain nodes, which collect all transactions within each block, the SPV node’s requests for specific data can inadvertently reveal the addresses in their wallet. For example, a third party monitoring a network could keep track of all the transactions requested by a wallet on an SPV node and use those to associate bitcoin addresses with the user of that wallet, destroying the user’sprivacy.

Figure 7. SPV node synchronizing the block headers

Shortly after the introduction of SPV/lightweight nodes, bitcoin developers added a feature called bloom filters to address the privacy risks ofSPV nodes. Bloom filters allow SPV nodes to receive a subset of thetransactions without revealing precisely which addresses they are interestedin, through a filtering mechanism that uses probabilities rather than fixed patterns.

Bloom Filters    

A bloom filter is a probabilistic search filter, a way to describe a desired pattern without specifying it exactly. Bloom filters offer an efficient way toexpress a search pattern while protecting privacy. They are used by SPV nodes to ask their peers for transactions matching a specific pattern, without revealing exactly which addresses, keys, or transactions they are searchingfor.

Bloom filter提供了一种高效的搜索方法,而且可以保护隐私。用户可以在查询精度与保护隐私的两个极端中取得平衡。

In our previous analogy, a tourist without a map is asking for directions to a specific address, “23 ChurchSt.” If she asks strangers for directions to this street, she inadvertently reveals her destination. A bloom filter is like asking, “Arethere any streets in this neighborhood whose name ends in R-C-H?” A questionlike that reveals slightly less about the desired destination than asking for”23 Church St.” Using this technique, a tourist could specify thedesired address in more detail such as “ending in U-R-C-H” or lessdetail as “ending in H.” By varying the precision of the search, the tourist reveals more or less information, at the expense of getting more or less specific results. If she asks a less specific pattern, she gets a lot more possible addresses and better privacy, but many of the results are irrelevant.If she asks for a very specific pattern, she gets fewer results but loses privacy.

Bloom filters serve this function byallowing an SPV node to specify a search pattern for transactions that can betuned toward precision or privacy. Amore specific bloom filter will produce accurate results, but at the expense of revealing what patterns the SPV node is interested in, thus revealing the addresses owned by the user’s wallet. A less specific bloom filter willproduce more data about more transactions, many irrelevant to the node, butwill allow the node to maintain better privacy.

How Bloom Filters Work

Bloom filtersare implemented as a variable-size array of N binary digits (a bit field) and avariable number of M hash functions. The hash functions are designed to always produce an output that is between 1 and N, corresponding to the array of binary digits. The hash functions are generated deterministically, so that any node implementing a bloom filter will always use the same hash functions and get the same results for a specific input. By choosing different length (N) bloomfilters and a different number (M) of hash functions, the bloom filter can betuned, varying the level of accuracy and therefore privacy.

BF由一个N比特的数组和M个哈希函数组成,哈希函数的输出就是1N,而且是确定性函数。

In An example of a simplistic bloom filter,with a 16-bit field and three hash functions, we use a very small array of 16 bits and a set of three hash functions to demonstrate how bloom filters work.

Figure 8. An example of a simplistic bloom filter, with a 16-bit field and three hash functions

The bloom filter is initialized so thatthe array of bits is all zeros. To add a pattern to the bloom filter, the pattern is hashed by each hash function in turn. Applying the first hash function to the input results in a number between 1 and N. The corresponding bit in the array (indexed from 1 to N) is found and set to 1, thereby recordingthe output of the hash function. Then, the next hash function is used to setanother bit and so on. Once all M hashfunctions have been applied, the search pattern will be “recorded” in the bloom filter as M bits that have been changed from 0 to 1.

BF函数要存储一个模式进数组的时候,就把M个哈希函数分别计算对应的哈希值,然后在N个比特中对应的M个位上,设置为1.

Adding a pattern “A” to oursimple bloom filter is an example of adding a pattern “A” tothe simple bloom filter shown in An example of a simplistic bloom filter,with a 16-bit field and three hash functions.

Adding a second pattern is as simple as repeating this process. The pattern is hashed by each hash function in turn and the result is recorded by setting the bits to 1. Note that as a bloom filter isfilled with more patterns, a hash function result might coincide with a bitthat is already set to 1, in which case the bit is not changed. In essence, as more patterns record on overlapping bits, the bloom filter starts to become saturated with more bits set to 1 and the accuracy of the filter decreases. This is why the filteris a probabilistic data structure—it gets less accurate as more patterns are added. The accuracy depends on the number of patterns added versus the size ofthe bit array (N) and number of hash functions (M). A larger bit array and morehash functions can record more patterns with higher accuracy. A smaller bit array or fewer hash functions will record fewer patterns and produce less accuracy.

但是当把更多的模式存储到比特串中的时候,可能会出现某个位置已经有数值的情况,这就出现了饱和,会降低存储的精度。

Figure 9. Addinga pattern “A” to our simple bloom filter

Adding a second pattern “B” toour simple bloom filter is an example of adding a second pattern”B” to the simple bloom filter.

Figure 10.Adding a second pattern “B” to our simple bloom filter

To test if a pattern is part of a bloom filter, the pattern is hashed by each hash function and the resulting bit pattern is tested against the bit array. If all the bits indexed by the hash functions are set to 1, then the patternis probably recorded in the bloom filter. Because the bits may be set because of overlap from multiple patterns, the answer is notcertain, but is rather probabilistic. In simple terms, a bloom filter positivematch is a “Maybe, Yes.”

所以如果要检查一个模式是不是存在于bloom filter中,就检查一下所有 M个哈希函数计算出的值,在对应的位置是否都为1.如果是的话,那这个模式工可能匹配的概率非常大。但是如果对应的位置是0的话,那就一定没有匹配成功。

Testing the existence of pattern”X” in the bloom filter. The result is a probabilistic positive match, meaning “Maybe.” is an example of testing the existence of pattern “X” in the simple bloom filter. The corresponding bits are set to 1, so the pattern is probably a match.

Figure 11.Testing the existence of pattern “X” in the bloom filter. The resultis a probabilistic positive match, meaning “Maybe.”

On the contrary, if a pattern is tested against the bloom filter and any one of the bits is set to 0, this proves thatthe pattern was not recorded in the bloom filter. A negative result is not a probability, it is a certainty. Insimple terms, a negative match on a bloom filter is a “Definitely Not!”

Testing the existence of pattern”Y” in the bloom filter. The result is a definitive negative match,meaning “Definitely Not!” is an example of testing the existence of pattern “Y” in the simple bloom filter. One of the corresponding bits is set to 0, so the pattern is definitely not a match.

Figure 12.Testing the existence of pattern “Y” in the bloom filter. The resultis a definitive negative match, meaning “Definitely Not!”

 How SPV Nodes Use Bloom Filters  

Bloom filters are used to filter the transactions (and blocks containing them) that an SPV node receives from itspeers, selecting only transactions of interest to the SPV node without revealing which addresses or keys it is interested in.

SPV中,为了不泄露地址与公钥信息,采用抽取特征的方式,把需要匹配的公钥哈希、脚本哈希、交易ID以及对应的UTXO存到bloom filter中,把这个过滤器发给节点。节点在收到以后,可以把自己手里已经有的记录与bloom filter进行比对,如果能匹配上,就通过网络传输给节点。这样就避免了泄漏信息。SPV节点在收到交易以后,把不符合的交易剔除,剩下的就是待验证的符合条件的交易。

An SPV node will initialize a bloom filter as “empty”; in that state the bloom filter will not match any patterns. The SPV node will then make a list of all the addresses, keys, and hashes that it is interested in. It will do this by extracting the public key hash and script hash and transaction IDsfrom any UTXO controlled by its wallet. The SPV node then adds each ofthese to the bloom filter, so that the bloom filter will “match” ifthese patterns are present in a transaction, without revealing the patterns themselves.

The SPV node will then send a filter load message to the peer, containing the bloom filter touse on the connection. On the peer, bloom filters are checked against eachincoming transaction. The full node checks several parts of the transaction against the bloom filter, looking for a match including:

  • The transaction ID

  • The data components from the locking scripts of each   of the transaction outputs (every key and hash in the script)

  • Each of the transaction inputs

  • Each of the input signature data components (or  witness scripts)

By checking against all the secomponents, bloom filters can be used to match public key hashes, scripts,OP_RETURN values, public keys in signatures, or any future component of a smartcontract or complex script.

After a filter is established, the peer will then test each transaction’s outputs against the bloom filter. Only transactions that match the filter are sent to the node.

In response to a get data message from the node, peers will send a merkle block message that contains only block headers for blocks matching the filter and a merkle path (see [merkle_trees]) for each matching transaction. The peer will then also send tx messages containing the transactions matched by the filter.

As the full node sends transactions tothe SPV node, the SPV node discards any false positives and uses the correctly matched transactions to update its UTXO set and wallet balance. As it updates its own view of the UTXO set, it also modifies the bloom filter to match any future transactions referencing the UTXOit just found. The full node then uses the new bloom filter to match new transactions and the whole process repeats.

The node setting the bloom filter can interactively add patterns to the filter by sending a filter add message. Toclear the bloom filter, the node can send a filterclear message. Because it is not possible to remove a pattern from a bloom filter, a node has to clear and resend a new bloom filter if a pattern is no longer desired.

The network protocol and bloom filter mechanism for SPV nodes is defined in BIP-37 (PeerServices).

SPV Nodes and Privacy

使用SPV会比使用全节点的方案损失更多的隐私性,如果有人长期监听,可以通过数据流量还原出钱包的地址。

Nodes that implement SPV have weaker privacy than a full node. A full node receives all transactions and therefore reveals no information about whether it is using some address in its wallet. AnSPV node receives a filtered list of transactions related to the addresses that are in its wallet. As a result, itreduces the privacy of the owner.

Bloom filters are a way to reduce the loss of privacy. Without them, an SPV node would have to explicitly list the addresses it was interested in, creating a serious breach of privacy. However,even with bloom filters, an adversary monitoring the traffic of an SPV client or connected to it directly as a node in the P2P network can collect enough information over time to learn the addresses in the wallet of the SPV client.

Encrypted and Authenticated Connections

Most new users of bitcoin assume that the network communications of a bitcoin node are encrypted. In fact, the original implementation ofbitcoin communicates entirely in the clear. While this is not a major privacy concern for full nodes, it is a big problem for SPV nodes.

比特币最原始的实现是明文传输。有两种方式提升隐私性,一种是采用Tor,一种是采用BIP-150/151方案。

As a way to increase the privacy andsecurity of the bitcoin P2P network, there are two solutions that provide encryption of the communicationsTor Transport and P2P Authentication and Encryption with BIP-150/151.

Tor Transport

Tor, which stands for The Onion Routing network, is a software project and network that offers encryptionand encapsulation of data through randomized network paths that offer anonymity, untraceability and privacy.

Bitcoin Core offers several configuration options that allow you to run a bitcoin node withits traffic transported over the Tor network. In addition, Bitcoin Core can also offer a Tor hidden service allowing other Tor nodes to connect to your node directlyover Tor.

比特币支持直接采用Tor匿名网络进行传输。

As of Bitcoin Core version 0.12, a node will offer a hidden Tor service automatically if it is able to connect to alocal Tor service. If you have Tor installed and the Bitcoin Core process runs as a user with adequate permissions to access the Tor authentication cookie, its hould work automatically. Use the debug flag to turn on Bitcoin Core’s debugging for the Tor service like this:

$ bitcoind –daemon –debug=tor

You should see “tor: ADD_ONION successful” in the logs, indicating that Bitcoin Core has added a hiddenservice to the Tor network.

You can find more instructions onrunning Bitcoin Core as a Tor hidden service in the Bitcoin Core documentation(docs/tor.md) and various online tutorials.

Peer-to-Peer Authentication and Encryption

Two Bitcoin Improvement Proposals,BIP-150 and BIP-151, add support for P2P authentication and encryption in thebitcoin P2P network. These two BIPs define optional services that may be offered by compatible bitcoin nodes. BIP-151 enables negotiated encryption for all communications between two nodes that support BIP-151. BIP-150 offers optional peer authentication that allows nodes to authenticate each other’s identity using ECDSA and private keys. BIP-150 requires that prior to authentication the two nodes have established encrypted communications as per BIP-151.

BIP-150提供了对等认证,允许节点通过ECDSA和私钥进入认证。BIP-150要求认证前两个节点已经通过BIP-151建立了加密通信。两个标准截止20171月还没有被比特币核心网络采纳,但是bcoin版的已经实现。两种协议均可以连接到可信的全节点,通过加密与认证保护SPV客户端的安全性。

As of January 2017, BIP-150 and BIP-151are not implemented in Bitcoin Core. However, the two proposals have been implemented by at least one alternative bitcoin client named bcoin.

BIP-150 andBIP-151 allow users to run SPV clients that connect to a trusted full node,using encryption and authentication to protect the privacy of the SPV client.

Additionally, authentication can be used to create networks of trusted bitcoin nodes and prevent Man-in-the-Middleattacks. Finally, P2P encryption, if deployed broadly, would strengthen there sistance of bitcoin to traffic analysis and privacy-eroding surveillance,especially in totalitarian countries where internet use is heavily controlled and monitored.

The standard is defined in BIP-150 (Peer Authentication) and BIP-151 (Peer-to-Peer CommunicationEncryption).

Transaction Pools

Almost every node on the bitcoin network maintains a temporary list of unconfirmed transactions called the memory poolmempool, or transaction pool. Nodes use this pool to keep track of transactions that are known to the network but are not yet included in the blockchain. For example, a wallet node will use the transaction pool to track incoming payments to the user’s wallet that have been received on the network but are not yet confirmed.

所有的节点都维护了一个临时的未确认的交易列表,也称作交易池。钱包维护这个交易池的目前就是能知道网络中已经收到但是没有确认的交易。

As transactions are received and verified, they are added to the transaction pool and relayed to the neighboring nodes to propagate on the network.

交易一旦收到并验证,就会放入交易池中,并传递到相邻节点继续传播。部分比特币的实现还会专门采用一个孤立交易池,存储那些没有找到引用的交易。这些交易池仅保存在内存中,并且随着交易的增减而动态变化。

Some node implementations also maintaina separate pool of orphaned transactions. If a transaction’s inputs refer to atransaction that is not yet known, such as a missing parent, the orphan transaction will be stored temporarily in the orphan pool until the parent transaction arrives.

When a transaction is added to the transaction pool, the orphan pool is checked for any orphans that reference this transaction’s outputs (its children). Any matching orphans are then validated. If valid, they are removed from the orphan pool and added to the transaction pool, completing the chain that started with the parent transaction. In light of the newly added transaction, which is no longer an orphan, the process is repeated recursively looking for any further descendants, until no more descendants are found. Through this process, the arrival of a parent transaction triggers a cascade reconstruction of an entirechain of interdependent transactions by re-uniting the orphans with their parents all the way down the chain.

Both the transaction pool and orphan pool (where implemented) are stored in local memory and are not saved on persistent storage; rather, they are dynamically populated from incoming network messages. When a node starts, both pools are empty and are gradually populated with new transactions received on the network.

还有一些比特币实现会引入 UTXO 池,把所有未花的交易记录于其中。这个就和另外两个交易池不同,UTXO一初始化,就是非空,因为需要保存自区块链创始以来的各种未花出的交易。

Some implementations of the bitcoin client also maintain an UTXO database or pool, which is the set of all unspent outputs on the blockchain. Althoughthe name “UTXO pool” sounds similar to the transaction pool, itrepresents a different set of data. Unlike the transaction and orphan pools,the UTXO pool is not initialized empty but instead contains millions of entries of unspent transaction outputs,everything that is unspent from all the way back to the genesis block. The UTXO pool may be housed in local memory or as an indexed database table on persistent storage.

Where as the transaction and orphan pools represent a single node’s local perspective and might vary significantly from node to node depending upon when the node was started or restarted, the UTXO pool represents the emergent consensus of the network and therefore will vary little between nodes. Furthermore, the transaction and orphan pools only contain unconfirmed transactions, while the UTXO pool only contains confirmed outputs.

交易池可能在每个节点中的数据都不一样,取决于节点什么时候接入网络;但是UTXO在每个节点中的样子是相同的。不过交易池存的一般是未确认的交易,但是UTXO存储的是已经确认的交易。

欢迎大家关注我的新微信公众号,“刻意学习区块链”,我会把我所有关于区块链和比特币学习解析的文章,汇总在上面便于检索,这是ScalersTalk成长持续论的一个分叉。 搜索“刻意学习区块链”或者长按扫二维码关注。 

  1. 用苹果手机,一竿子打赏给S私人红包

新书《刻意学习》热卖中

ScalersTalk成长持续论   

 ★★★★★

成长会是由Scalers发起的面向成长、实践行动且凝聚了来自全球各地各行各业从业者的社群,有意入会者请和Scalers直接联系。我和其他会员会和你直接交流关于成长行动等各方面的经验教训。

微信公众号  l  ScalersTalk成长持续论

新 浪 微 博   l  @Scalers

网          站   l   ScalersTalk.com

开 放 社 群   l  100小时训练群C 456036104

畅 销 书 籍   《刻意学习》火热销售中

 ★★★★★ 

2018年成长会申请说明

《持续行动,为三年后的自己,扎心地做点事——ScalersTalk成长会2018年会员资格开放申请(2017.12)》(请点击)

与本文相关的文章