Skip to main content

mvcc translate

· 18 min read

5.1 INTRODUCTION 对于多版本并发控制算法来说,对于每个数据实例x写操作都会生成一个X的新的副本(或者叫做版本). 数据管理器因此会保存一个包含数据管理器赋值过给X的所有版本的列表. 对于每个读操作Read(x),调度器不仅把读操作发送给数据管理器,还会告诉数据管理器他想要读x的哪一个版本. In a multiversion concurrency control algorithm, each Write on a data item x produces a new copy (or version) of X. The DM that manages x therefore keeps a list of versions of X, which is the history of values that the DM has assigned to X. For each Read(x), the scheduler not only decides when to send the Read to the DM, but it also tells the DM which one of the versions of x to read. 多版本并发控制的优点在于帮助调度器避免拒绝太晚的操作(也就是说晚一点的操作不会被拒绝).举个例子,(单版本的情况下)调度器会拒绝读他本该读但是被覆盖的数据. 在多版本的情况下,旧的值不会被覆盖,因此可以延迟去读.调度器可以通过读旧版本的值来避免(单版本下)对于读操作的拒绝. 保持多版本并对于并发控制并不会花费太多成本,因为对于故障恢复算法来说也是需要版本的信息. The benefit of multiple versions for concurrency control is to help the scheduler avoid rejecting operations that arrive too late. For example, the scheduler normally rejects a Read because the value it was supposed to read has already been overwritten. With multiversions, such old values are never overwritten and are therefore always available to tardy Reads. The scheduler can avoid rejecting the Read simply by having the Read read an old version.’ Maintaining multiple versions may not add much to the cost of concurrency control, because the versions may be needed anyway by the recovery algorithm. As we’ll see in the next chapter, many recovery algorithms have to maintain some before image information, at least of those data items that have been updated by active transactions; the recovery algorithm needs those before images in case any of the active transactions abort. The before images of a data item are exactly its list of old versions. It is a small step for the DM to make those versions explicitly available to the scheduler. 一个很明显的花费是保存多个版本需要很多存储空间,为了控制这些存储的现在,版本内容必须要定期周期性地清理或者归档. 因为某些特定的版本还会被活跃的事务(也就是没有提交或者没有回滚回滚的事务),清理版本信息需要同时兼顾活跃的事务. 清理动作也是mvcc的另外一个花费. An obvious cost of maintaining multiple versions is storage space. To control this storage requirement, versions must periodically be purged or archived. Since certain versions may be needed by active transactions, purging versions must be synchronized with respect to active transactions. This purging activity is another cost of multiversion concurrency control.

我们假设当事务被抛弃,那些那么这个事务创建的版本也会被销毁.在我们后续的讨论中,词版本描述的是已提交事务或者活跃事务的数据对应的值.因此,当调度器决定分配x的特定版本给操作Read(x),返回的值不会包含被抛弃的事务. 如果版本读产生于活跃的事务,可恢复性(也就是回滚)要求读操作对应的事务的提交必须晚于被读的活跃事务的提交.

We assume that if a transaction is aborted, any versions it created are destroyed. In our subsequent discussion, the term “version” will refer to the value of a data item produced by a transaction that’s either active or committed. Thus, when the scheduler decides to assign a particular version of x to Read(x), the value returned is not one produced by an aborted transaction. If the version read is one produced by an active transaction, recoverability requires that the reader’s commitment be delayed until the transaction that produced the version has committed. 如果被读的事务最后被抛弃了(这个事务对应的版本也无效了),该活跃的事务也需要因此而被抛弃. If that transaction actually aborts (thereby invalidating its version), the reader must also be aborted. 当前存在的多版本内容仅仅对调度器和数据管理器可见,对于用户使用事务是不可见的. The existence of multiple versions is only visible to the scheduler and DM, not to user transactions. 事务只会持有该数据比如x和y.除了数据库处理系统,用户本身看上去只有一个版本,即,在用户的角度看是最后一个会写入 Transactions still reference data items, such as x and y. Users therefore expect the DBS to behave as if there were only one version of each data item, namely, the last one that was written from that user’s perspective. 调度器会通过使用多版本来减少拒绝的操作,从而提升性能. The scheduler may use multiple versions to improve performance by rejecting operations less frequently. 不过最后的结果会和单版本的结果看上去一样. But it must not change the system’s functionality over a single version view of the database. There are many applications of databases in which users do want to explicitly access each of the multiple versions of a data item. For example, a user may wish to maintain several versions of a design database: the last design released for manufacturing, the last design checked for correctness, and the most recent working design. The user may update each version of the design independently. Since the existence of these multiple versions is not transparent to the user, such applications are not appropriate for the multiversion concurrency control algorithms described in this chapter.

Analyzing Correctness 分析mvcc的正确性 为了分析mvcc算法的正确性,我们需要扩展可序列化理论.我们需要扩展两类历史:运行在多版本数据库的多版本历史,在用户看来是单版本的单版本历史.用户会把序列化但版本历史(因为我们把事务看成只有一个版本的,我们所有的目标就是多版本执行的内容和但版本看到的一样,相类似的例子就是编译器经过ssr优化后很多顺序都变了但是看上去都一样) To analyze the correctness of multiversion concurrency control algorithms, we need to extend serializability theory. This extension requires two types of histories: multiversion (MV) histories that represent the DM’s execution of operations on a multiversion database, and single version (IV) histories that represent the interpretation of MV histories in the users’ single version view of the database. Serial 1V histories are the histories that the user regards as correct. 不过实际上系统是多版本的(只是看上去和单版本的一样).所以为了证明这个并发控制算法是正确的,我们必须证明多版本的历史的约束和单版本的是等价的.那么多版本历史单版本历史等价是什么意思?(也就是多版本历史和单版本历史等价的动作语义是什么) But the system actually produces MV histories. So, to prove that a concurrency control aIgorithm is correct, we must prove that each of the MV histories that it can produce is equivalent to a serial 1V history, What does it mean for an MV history to be equivalent to a 1V history? 我们通过拓展单版本历史之间的等价来描述多版本历史单版本历史等价.为了做这个扩展,我们需要引入一些符号. Let’s try to answer this by extending the definition of equivalence of 1V histories that we used in Chapters 2-4. To attempt this extension, we need a little notation. 对于每个数据实例x,我们用xi,xj...来表示x的版本,下标是写这个版本的事务的编号(也就是xi就是表示事务i写了一个版本x), 因此对于多版本历史,永远都是这个下标Wi[Xi],版本的下标和和事务的下标一样.多版本历史的读操作则没有那么特殊,举个例子ri[xj]. For each data item X, we denote the versions of x by xi, xj, . . . , where the subscript is the index of the transaction that wrote the version. Thus, each Write in an MV history is always of the form Wi[Xi], where the version subscript equals the transaction subscript. Reads are denoted in the usual way, such as ri[xj]. 假如我们说多版本历史单版本历史等价的(equivalence)的定义是:如果多版本的每个操作的重复和单版本的冲突都一样. 考虑多版本历史

H1 = w0[x0]c0w1[x1]c1r2[x0]w2[y2]c2

H1这个历史里面,只有w0[x0]r2[x0]是冲突的,写操作w1[x1]w0[x0]以及r2[x0]不冲突,因为他们操作的是不同版本的数据,即x1.现在我们来考虑单版本历史:

H2 = w0[x]c0w1[x]c1r2[x]w2[y]c2 

我们通过去掉操作的版本的下标(也就是去掉版本号)来构造历史H2,比如x1映射成x,x0也映射成x,y2映射成y.这种情况下(H2的单版本历史),w0[x]r2[x0]是冲突的.按照定义(如果他们冲突一样)则他们等价,但是这其实是不合理的(虽然他们都冲突).在历史H2,T2T1x.但是在历史H2,T1读的是T0(也就是说冲突是一样但是读的数据来源是不一样的).因为这两个历史(H1和H2)读的来源不一致,所以他们最后写的操作也不一致 Suppose we adopt a definition of equivalence that says an MV history HM” is equivalent to a 1V history HIV if every pair of conflicting operations in Hbp, is in the same order in HIV. Consider the MV history H, = wobol co WEA cl rz[xol w,[yzl cz. The only two operations in H, that conflict are w,[x,] and r,[x,]. The operation w,[x,] does not conflict with either w,[x,] or r,[x,], because it operates on a different We constructed H, by mapping each operation on versions x0, x,, and yz in H, into the same operation on the corresponding data items x and y. Notice that the two operations in H, that conflict, w,[x,] and r,[x,], are in the same order in H, as in H,. So, according to the definition of equivalence just given, H, is equivalent to H,. But this is not reasonable. In H,, T, reads x from T,, whereas in H,, T, reads x from T,,.’ Since T2 reads a different value of x in H, and H,, it may write a different value in y. 我们的对于(单版本和多版本之间)的冲突之所以有一点问题,是因为多版本历史单版本历史操作的是不同的对象: 一个是对版本的操作,一个是对数据的操作(可以类比一个是"打11岁的你"和"打你"是不同语义的).他们的操作是有不同的冲突属性. 举个例子,多版本情况下w1[x1]r2[x0]不冲突,(相对应的版本历史.怎么对应?当然是把下标去掉)单版本情况下w1[x]r2[x]是冲突的. 因此,如果仅仅通过冲突来定义他们是等价的,这是不精确的. This definition of equivalence based on conflicts runs into trouble because MVand 1V histories have slightly different operations - version operations versus data item operations. These operations have different conflict properties. For example, w,[x,] does not conflict with yz[xo], but their corresponding 1V operations w,[x] and TJX] do conflict. Therefore, a definition of equivalence based on conflicts is inappropriate. 因此,为了解决这个问题(读的来源不一样的问题),我们需要回到2.6节的视图等价. 回想一下,如果两个历史的读的源都一样而且最后写也一样则他们视图等价. 比较历史H1H2.在H1(多版本历史)中,T2从T1读x,但是对于H2(单版本历史),T2从T0读x,因此H1和H2视图不等价 To solve this problem, we need to return to first principles by adopting the more fundamental definition of view equivalence developed in Section 2.6. Recall that two histories are view equivalent if they have the same reads-from relationships and the same final writes. Comparing histories H, and H,, we see that T, reads x from T, in H,, but T, reads x from T, in H,. Thus, H, is not view equivalent to H2. 然后我们获得满足条件等价(单版本和多版本之间的等价,定义是视图等价,那么两者是等价的)的定义. 我们需要通过一个对单版本历史与多版本历史之间等价的方式. 其中一个方式是SG(H)是无环的,所以历史H是等价于序列化多版本历史,但是仅仅是这样是没有太大帮助.因为(无环)序列化历史不是等价于序列化单版本历史. 举个例子: H3 = w0[x0]w0[y0]c0R// todo 如果我们把版本的内容作为单独的数据实例,就会构造一下的序列化图,虽然这个序列化图是无环的,H3还是和单版本的不等价,因为他们映射的读来源不一样

Now that we have a satisfactory definition of equivalence, we need a way of showing that every MV history H produced by a given multiversion concurrency control algorithm is equivalent to a serial 1V history. One way would be to show that SG( H) is acyclic, so H is equivalent to a serial MV history, Unfortunately, this doesn’t help much, because not every serial MV history is equivalent to a serial 1V history. For example, consider the serial MV history 仅仅是多版本历史的子集,也就是l-serial MV histories才会等价于序列化单版本历史,如果所有的read-from关系,要么读自己的事务,要么读最新的事务,这样就说l-serial MV histories, 这样的serial MV histories是可以于单版本的历史等价 Only a subset of serial MV histories, called l-serial MV histories, are equivalent to serial 1V histories. Intuitively, a serial MV history is I-serial if for each reads-from relationship, say T, reads x from T,, T, is the last trdnsaction preceding T, that writes any version of x. Notice that Ii, is not l-serial because TL reads x from T,), not T,, which is the last transaction preceding T2 that writes x.

mv2pl > 1sr > 5.3 Let H be an MV history over ?: C(H) is equivalent to a serial, 1V history over T iff H is 1SR.

我整理的mvcc原理

1SR 定义

An MV history is one-copy serializable (or 1SR) if its committed projection is equivalent to a l-serial MV history.

1-series MV history:

每个事务读先于他的事务里面写版本最大的 1 多版本事务Hi 和事务Hj,如果Hi从事务Hj读x,那么满足约束: Hj的x的版本是先于Hi的所有事务里面写x的版本最新的一个

  • 一个小问题:一个事务先于另外一个事务,这是个偏序关系,这里有个疑问,怎么定义这个先于

MV History Equivalence:

两个历史的事务中,读的来源都一样则两个多版本事务视图等价

mv2pl imply mvsg mvsg imply 1sr 1sr imply 1-serial

相关阅读

mysql的select

· 4 min read

为什么想看select的代码

有一个场景,遇到一个表只有十多万,但是表大小有几十g,为什么呢?因为有个字段是longtext. 放了很多很长的文本.发现select * from table limit 1000就已经读不出来了

一个简单的select语句

select id from test wher  id < 100 ;

这个流程究竟发生了什么?

第一步其实和php差不多,先是编译原理的前端几步lex和parse 第二步就是逻辑优化和物理优化, 其实可以想成是常用的编程语言的常量折叠或者数据流图的分析,就是编译时的优化 第三步也就是语义动作了,也就是真正的执行过程也可以想做是运行时: 但是条件对于读来说是不可见的,条件是作用于索引上 , 然后返回所有行 , 再根据列筛选出来,然后再join和排序,对于这个sql来说,他唯一作用就是通过索引读出内容,内存和硬盘对于他来说是不存在的

首先是读表,这个表是从ibd文件来的.所以终究是需要调用系统调用读文件,那么linux用系统调用是pread

索引在哪保存? 保存在表空间里面 内容保存到哪里? 保存到表空间里面 关联是在哪里发生? 发生在从索引从表空间读出来 关联是整整一行关联吗? 是的. 二级索引怎么读的? 通过二级索引读一级索引,一级索引读内容.

所以实际上是有一个语义上的层是: sql -> 语义动作 作用于索引 -> 索引访问表空间(有点像交换空间或者物理内存和虚拟内存的关系一样一样, 需要的时候才从硬盘硬盘加载)

下面是相关代码

(gdb) bt
#0 srv_start (create_new_db=create_new_db@entry=false) at /home/ubuntu/mysql-8.0.23/storage/innobase/srv/srv0start.cc:1857
#1 0x00005555577275b6 in innobase_init_files (tablespaces=0x7fffea1f1380, dict_init_mode=DICT_INIT_CHECK_FILES) at /home/ubuntu/mysql-8.0.23/storage/innobase/handler/ha_innodb.cc:5042
#2 innobase_ddse_dict_init (dict_init_mode=DICT_INIT_CHECK_FILES, version=<optimized out>, tables=0x7fffea1f1360, tablespaces=0x7fffea1f1380)
at /home/ubuntu/mysql-8.0.23/storage/innobase/handler/ha_innodb.cc:12323
#3 0x00005555573d2aef in dd::bootstrap::DDSE_dict_init (thd=thd@entry=0x55555b899410, dict_init_mode=dict_init_mode@entry=DICT_INIT_CHECK_FILES, version=80023)
at /home/ubuntu/mysql-8.0.23/sql/dd/impl/bootstrap/bootstrapper.cc:737
#4 0x00005555575f92e4 in dd::upgrade_57::do_pre_checks_and_initialize_dd (thd=0x55555b899410) at /home/ubuntu/mysql-8.0.23/sql/dd/upgrade_57/upgrade.cc:911
#5 0x0000555556697ec5 in bootstrap::handle_bootstrap (arg=arg@entry=0x7fffffffda10) at /home/ubuntu/mysql-8.0.23/sql/bootstrap.cc:323
#6 0x0000555557b934a1 in pfs_spawn_thread (arg=0x55555b834c80) at /home/ubuntu/mysql-8.0.23/storage/perfschema/pfs.cc:2900
#7 0x00007ffff7bbb6db in start_thread (arg=0x7fffea1f2700) at pthread_create.c:463
#8 0x00007ffff61b571f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) info threads 
Id Target Id Frame
1 Thread 0x7ffff7fe7880 (LWP 17988) "mysqld" 0x00007ffff7bbcd2d in __GI___pthread_timedjoin_ex (threadid=140737121298176, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0,
block=block@entry=true) at pthread_join_common.c:89
* 2 Thread 0x7fffea1f2700 (LWP 18000) "mysqld" __libc_pread64 (fd=4, buf=0x7fffe81d0000, count=65536, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:29
4 Thread 0x7fffe8f49700 (LWP 18269) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42eda10)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
5 Thread 0x7fffe3b32700 (LWP 18270) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edab0)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
6 Thread 0x7fffe3331700 (LWP 18271) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edb50)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
7 Thread 0x7fffe2b30700 (LWP 18272) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edbf0)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
8 Thread 0x7fffe232f700 (LWP 18273) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edc90)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
9 Thread 0x7fffe1b2e700 (LWP 18274) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edd30)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
10 Thread 0x7fffe132d700 (LWP 18275) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42eddd0)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
11 Thread 0x7fffe0b2c700 (LWP 18276) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42ede70)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
12 Thread 0x7fffd3d5f700 (LWP 18277) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edf10)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
13 Thread 0x7fffd355e700 (LWP 18278) "mysqld" 0x00007ffff7bc1ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffe42edfb0)
at ../sysdeps/unix/sysv/linux/futex-internal.h:88
14 Thread 0x7fffd2d5d700 (LWP 18279) "mysqld" 0x00007ffff6ace280 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

相关阅读

RSA

· One min read

前提相关定义

质数的集合为$Pri$

rsa 的证明

M是需要任意整数 , $e,d$需要满足

$$
e·d\equiv 1 \pmod{\phi(n)} \qquad(1)
$$
我们要证明
$$
M^{e·d} ≡ M^{k·\phi(n)+1} \equiv M \pmod{n} \qquad(2)
$$
其中
#### 欧拉函数$\phi(x)$
$\phi(x)$定义是小于或等于n的正整数中与n互质的数的数目.
其中,如果`x`是质数,则欧拉函数的求值结果为`x-1`
$$\phi(x)=x-1 , x\in Pri $$

下面我们来证明公式$(2)$的右边部分 , 也就是:

$$
M^{k·\phi(n)+1} \equiv M \pmod{n} \qquad(3)
$$
- 当M可以被n整除的时候,也就是满足
$$
M ≡ 0 (mod p)
$$

成立

相关阅读

three value prediate and sql

· 2 min read

三值逻辑

三值的谓词: 谓词求值后有三个元素{T,F,U} ,也就是true , false , 和unknow

求值和谓词

SELECT 1=NULL

在mysql , 这个会返回一行,这行是值是null

① 然后根据规范 where casehaving 子句都只取三值逻辑真值中的true

   SELECT 1 NOT IN (1,  NULL)    ## false 所以不会被条件筛选出来
SELECT 1 NOT IN ( NULL) ## null , 因为上面的规则①所以也不会被筛选出来

所以在where子句中使用not innot int 中包含null的时候会筛选不出来

exist 规范规定exist是个二值函数,所以要映射成true或者false , mysql中则是只要返回一行真值就是true

pushdown

· One min read

谓词下推:

为什么可以谓词下推?

因为交换律结合律

怎么下推,也就是触发条件?

rule instance

语义

tcp与消息队列与paxos与顺序

· One min read

我们如何保证消息的可靠性?

前提: 每块消息都是分割成一小块

如何保证不丢消息?

每块消息映射一个id , 我们只要保证每个id都有就能保证我们的消息必然是全的(没有丢失的 , 因为id是全序的 , id映射的内容本身已是不会丢失的)

泛型

· 2 min read

对于没有泛型的情况 比如

max(a : int , b:int){
xxx
}

入参是a , b ,这两个参数的类型的约束是 int , 也就是这个函数的约束是 max(int , int) 泛型的语义就是:

max(a :<T> , b:<T>){

}

这个约束是什么?

约束变成 max(<T> ,<T>)

作用: 我们的约束变更加范化了,这个如果按照编译原理或者解空间来说,我们可选的映射更多了.

泛型的作用:和类的继承差不多,因为继承既是优点也是缺点.

泛型这个约束也是既有优点也有缺点,泛型的优点是更加接近无类型类型,所以缺点也是大家会滥用无类型的内容

就像继承一样,其实很多继承是没有必要的,或者重构的继承非常难.

说到底,如果我们写了一个通用的轮子,如果和多地方用到了这个轮子,那么如果这个轮子经常更改,就需要考虑用到这个轮子的相关代码是不是要兼容

如果这个轮子被共用的地方少,那么就不用兼容那么多

所以我们抽象就比如面临改动频繁的缺点

约束和结构

· One min read

结构和性质是同构的吗?

约束和结构是同构的吗?

我一直觉得,我们之所以很难维护好业务代码,是因为我们的约束和我们的业务不是同构的,我们一个又一个函数在传递,大部分时间都工作得很好,但是总有一些误差经过传递之后会被触发.

为什么代码庞大之后很难改? 因为经过传递之后依赖太多值了.对于一个函数来说(a , b) -> (c) , 一开始只依赖a和b

过了几天,我们改了a的依赖,a依赖于d,e 然后就这样了(d,e) -> (a)

这样一直迭代后,依赖就没人能理清了