References
Ceph code and terminology explained http://accelazh.github.io/ceph/Ceph-Code-Deep-Dive
Ceph web book http://blog.sina.com.cn/s/blog_153c9453d0102xvwi.html
ceph-deploy source code analysis (the site has other blog posts) http://www.hl10502.com/2017/06/15/ceph-deploy-cli/
How a ceph command typed in the terminal gets invoked https://blog.csdn.net/qq_36118718/article/details/79195621
Ceph monitor and paxos: http://catkang.github.io/2016/07/17/ceph-monitor-and-paxos.html
Ceph monitor implementation https://www.jianshu.com/p/60b34ba5cdf2
Ceph source code analysis https://blog.csdn.net/qq_36118718/article/details/79234737
src/ceph_mon.cc monitor startup flow, main() function: https://blog.csdn.net/kakaxi8891/article/details/10921297
ceph-monmap command handling flow https://blog.csdn.net/carny/article/details/52583560
src/mon/
Monitor leader election source code analysis https://blog.csdn.net/sy_yu/article/details/79102202
https://segmentfault.com/a/1190000010413557
Blog https://blog.csdn.net/sy_yu?t=1
Monitor leader election https://blog.csdn.net/sy_yu/article/details/79102202
Monitor election mechanism https://blog.csdn.net/scaleqiao/article/details/52315468
https://www.cnblogs.com/shanno/p/3967116.html
MonitorDB source code analysis, monmap synchronization https://blog.csdn.net/qq_36118718/article/details/79234737
Paxos source code annotations https://blog.csdn.net/fishermandong/article/details/72805660
Paxos source code, phase 1 https://blog.csdn.net/fishermandong/article/details/72805660
phase 2 https://blog.csdn.net/fishermandong/article/details/76360237
Paxos algorithm https://blog.csdn.net/qq_36118718/article/details/79134887
https://blog.csdn.net/skdkjzz/article/details/41979521
Maps https://ceph-doc.readthedocs.io/en/latest/Monitor/
PGMap, OSDMap http://ju.outofmemory.cn/entry/76367
A Ceph Monitor maintains a master copy of the cluster map. A robust Ceph cluster usually runs several monitors, which provide the cluster map to clients.
the basic framework of a monitor
A Ceph monitor consists of a K/V store, Paxos, and the PaxosService layer. The K/V store persists the monitor's data. Paxos provides consistent data access logic for the PaxosService layer. Each PaxosService represents one kind of cluster state information; it converts its data to key-value form and writes it through the Paxos layer.
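The layering described above can be illustrated with a minimal sketch. All class and method names here (`KVStore`, `Paxos.commit`, `PaxosService.update`, the key prefixes) are hypothetical stand-ins chosen for this example, not the actual Ceph classes; the consensus step itself is stubbed out.

```python
class KVStore:
    """Stand-in for the persistent key/value store (an in-memory dict here)."""

    def __init__(self):
        self._data = {}

    def apply(self, txn):
        # Apply a batch of key/value writes.
        self._data.update(txn)

    def get(self, key):
        return self._data.get(key)


class Paxos:
    """Stand-in for the consensus layer: it versions every write it stores."""

    def __init__(self, store):
        self.store = store
        self.last_committed = 0

    def commit(self, txn):
        # A real implementation would first have a quorum accept txn;
        # this sketch only bumps the version and persists the batch.
        self.last_committed += 1
        versioned = dict(txn)
        versioned["paxos/last_committed"] = self.last_committed
        self.store.apply(versioned)
        return self.last_committed


class PaxosService:
    """One kind of cluster state, encoded as key/value pairs under a prefix."""

    def __init__(self, name, paxos):
        self.name = name
        self.paxos = paxos

    def update(self, key, value):
        # Convert the change to key-value form and hand it to Paxos.
        return self.paxos.commit({f"{self.name}/{key}": value})

    def read(self, key):
        return self.paxos.store.get(f"{self.name}/{key}")


store = KVStore()
paxos = Paxos(store)
osdmap = PaxosService("osdmap", paxos)
monmap = PaxosService("monmap", paxos)
osdmap.update("epoch", 12)
monmap.update("epoch", 3)
```

Both services share one Paxos instance and one store in this sketch, so every update from either service advances the same version counter.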
the initialization and leader election of a monitor
On start or restart, the monitor connects to the other monitors according to the monmap. On first start, it builds the monmap from the Ceph configuration file and stores it in the MonitorDBStore; otherwise it reads the monmap from the MonitorDBStore. Either way, the monitor initializes the MonitorDBStore as soon as it starts. Messenger is the network thread module: the monitor initializes it and registers a callback function that runs when a request is received. Paxos and PaxosService are described in detail later. The bootstrap process is called repeatedly and plays an important role in the lifecycle of a monitor.
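The first-start versus restart decision can be sketched as follows. This is only an illustration: the store is modeled as a plain dict, and the `"monmap"` key, the `mon_hosts` config entry, and the function name are assumptions made for the example rather than Ceph's real layout.

```python
def load_or_build_monmap(db, config):
    """Return (monmap, built_now).

    db     -- dict standing in for the MonitorDBStore (hypothetical layout)
    config -- dict standing in for the parsed configuration file
    """
    stored = db.get("monmap")
    if stored is not None:
        # Not the first start: reuse the persisted monmap.
        return stored, False

    # First start: build the monmap from configuration and persist it.
    monmap = {"epoch": 0, "mons": dict(config["mon_hosts"])}
    db["monmap"] = monmap
    return monmap, True


db = {}
config = {"mon_hosts": {"a": "10.0.0.1:6789", "b": "10.0.0.2:6789"}}
first, built_first = load_or_build_monmap(db, config)
second, built_second = load_or_build_monmap(db, config)
```

The second call finds the persisted copy and does not consult the configuration again.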
After bootstrap, the monitor is in STATE_PROBING, in which it communicates and synchronizes with the other monitors. After synchronization, the cluster starts an election that decides the role of each monitor. The detailed process is as follows.
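As a rough orientation, the progression from probing through synchronization to an elected role can be written as a small state machine. Only STATE_PROBING is named in the text; the other state names and the strictly linear transitions are simplifying assumptions for this sketch.

```python
from enum import Enum, auto


class MonState(Enum):
    PROBING = auto()        # STATE_PROBING in the text
    SYNCHRONIZING = auto()  # assumed name
    ELECTING = auto()       # assumed name
    LEADER = auto()
    PEON = auto()


def bootstrap():
    # Bootstrap always puts the monitor back into the probing state.
    return MonState.PROBING


def advance(state, won_election=False):
    """Move one step along the simplified path described above."""
    if state is MonState.PROBING:
        return MonState.SYNCHRONIZING
    if state is MonState.SYNCHRONIZING:
        return MonState.ELECTING
    if state is MonState.ELECTING:
        return MonState.LEADER if won_election else MonState.PEON
    return state  # LEADER and PEON are terminal in this sketch


def run(won_election):
    """Walk from bootstrap to a final role and return the visited state names."""
    state = bootstrap()
    path = [state]
    while state not in (MonState.LEADER, MonState.PEON):
        state = advance(state, won_election)
        path.append(state)
    return [s.name for s in path]
```

Calling `run(True)` walks the path for a monitor that wins the election, and `run(False)` the path for one that becomes a peon.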
Probing and synchronizing process:
Leader election process:
Paxos : recovery and propose
The following data structures are important in Paxos. They need to be kept in the DBStore.
Variable | Description |
---|---|
last_pn | Last Proposal Number. |
accepted_pn | The last Proposal Number we have accepted. On the Leader, it will be the Proposal Number picked by the Leader itself. On the Peon, however, it will be the proposal sent by the Leader and it will only be updated if its value is higher than the one already known by the Peon. |
uncommitted_pn | Uncommitted value's Proposal Number. We use this variable to assess if the Leader should take into consideration an uncommitted value sent by a Peon. Given that the Peon will send back to the Leader the last Proposal Number it accepted, the Leader will be able to infer if this value is more recent than the one the Leader has, thus more relevant. |
first_committed | First committed value's version. |
last_committed | Last committed value's version. On both the Leader and the Peons, this is the last value's version that was accepted by a given quorum and thus committed, that this instance knows about. |
uncommitted_v | Uncommitted value's version. If we have, or end up knowing about, an uncommitted value, then its version will be kept in this variable. |
uncommitted_value | If the system fails in-between the accept replies from the Peons and the instruction to commit from the Leader, then we may end up with accepted but yet-uncommitted values. During the Leader's recovery, it will attempt to bring the whole system to the latest state, and that means committing past accepted but uncommitted values. This variable will hold an uncommitted value, which may originate either on the Leader, or learnt by the Leader from a Peon during the collect phase. |
After the leader election, the leader and peon roles are settled. Before any consistent read or write, the monitor cluster first runs phase 1 (RECOVERY) to make the PN (proposal number) consistent. The flow is as follows.
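A minimal sketch of this recovery (collect) step, using the field names from the table above. The helper names, the message shape, and the proposal-number formula (round up to the next multiple of 100, then add the monitor's rank so numbers are unique per monitor) are assumptions made for illustration and should be checked against the Ceph source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaxosState:
    rank: int
    last_pn: int = 0
    accepted_pn: int = 0
    last_committed: int = 0
    uncommitted_pn: int = 0
    uncommitted_v: int = 0
    uncommitted_value: Optional[bytes] = None


def new_proposal_number(state, floor=0):
    # Assumed scheme: next multiple of 100 above everything seen, plus rank.
    base = max(state.last_pn, floor)
    state.last_pn = (base // 100 + 1) * 100 + state.rank
    return state.last_pn


def handle_collect(peon, pn):
    """Peon side: adopt pn only if it is higher, then report local state."""
    if pn > peon.accepted_pn:
        peon.accepted_pn = pn
    return {
        "pn": peon.accepted_pn,
        "uncommitted_pn": peon.uncommitted_pn,
        "uncommitted_v": peon.uncommitted_v,
        "uncommitted_value": peon.uncommitted_value,
    }


def collect(leader, peons):
    """Leader side: return True once every peon has accepted the new pn."""
    pn = new_proposal_number(leader, leader.accepted_pn)
    leader.accepted_pn = pn
    for peon in peons:
        reply = handle_collect(peon, pn)
        if reply["pn"] > pn:
            return False  # a peon knows a higher pn; retry with a larger one
        # Learn a peon's uncommitted value if it is the next version and
        # at least as recent as the one the leader already holds.
        if (reply["uncommitted_value"] is not None
                and reply["uncommitted_v"] == leader.last_committed + 1
                and reply["uncommitted_pn"] >= leader.uncommitted_pn):
            leader.uncommitted_pn = reply["uncommitted_pn"]
            leader.uncommitted_v = reply["uncommitted_v"]
            leader.uncommitted_value = reply["uncommitted_value"]
    return True


leader = PaxosState(rank=0, last_pn=101, accepted_pn=101)
peon1 = PaxosState(rank=1, accepted_pn=101, uncommitted_pn=101,
                   uncommitted_v=1, uncommitted_value=b"osdmap-e1")
peon2 = PaxosState(rank=2, accepted_pn=101)
ok = collect(leader, [peon1, peon2])
```

In this run the leader picks a proposal number above 101, both peons adopt it, and the leader learns the uncommitted value held by `peon1` so that it can be re-proposed and committed.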
After phase 1 comes phase 2, the normal working flow of proposing, accepting, and committing.
The detailed processes are as follows.
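The propose, accept, and commit steps of phase 2 can be sketched as below. This is a generic, single-process illustration: the `Node` class and its method names are hypothetical, and committing once a majority of nodes has accepted is an assumption of this sketch, which may differ from the rule the real monitor applies to its quorum.

```python
class Node:
    """One monitor's Paxos state, reduced to what phase 2 needs."""

    def __init__(self, rank, accepted_pn=0):
        self.rank = rank
        self.accepted_pn = accepted_pn
        self.last_committed = 0
        self.uncommitted_v = 0
        self.uncommitted_value = None
        self.log = {}  # version -> committed value

    def handle_begin(self, pn, version, value):
        # Reject proposals carrying an older pn than the one already accepted.
        if pn < self.accepted_pn:
            return False
        self.accepted_pn = pn
        self.uncommitted_v = version
        self.uncommitted_value = value
        return True

    def handle_commit(self, version):
        # Promote the accepted value to committed state.
        self.log[version] = self.uncommitted_value
        self.last_committed = version
        self.uncommitted_v = 0
        self.uncommitted_value = None


def propose(leader, peons, pn, value):
    """Return the committed version, or None if no majority accepted."""
    version = leader.last_committed + 1
    nodes = [leader] + list(peons)
    accepted = [n for n in nodes if n.handle_begin(pn, version, value)]
    if len(accepted) * 2 <= len(nodes):
        return None
    for node in accepted:
        node.handle_commit(version)
    return version


leader = Node(rank=0, accepted_pn=200)
peon_a = Node(rank=1, accepted_pn=200)
peon_b = Node(rank=2, accepted_pn=300)  # has seen a newer pn, so it rejects
committed = propose(leader, [peon_a, peon_b], pn=200, value=b"monmap-e2")
```

Here two of the three nodes accept, so the value is committed as version 1 on the leader and on `peon_a`, while `peon_b` is left unchanged.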
how the client's requests are dealt with
When a client sends a request to the monitor, the monitor first dispatches it to the corresponding PaxosService. The PaxosService then calls the appropriate functions depending on whether the request is a read or a write, and decides whether the propose process should be triggered.
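The dispatch path described above might look roughly like this. The request format, the `Service` and `Monitor` classes, and the method names are all hypothetical choices for the sketch; the propose step is reduced to applying the pending change locally.

```python
class Service:
    """Stand-in for one PaxosService."""

    def __init__(self):
        self.committed = {}
        self.pending = {}
        self.proposals = 0

    def handle_read(self, req):
        # Reads are answered from committed state and never trigger a proposal.
        return self.committed.get(req["key"])

    def handle_write(self, req):
        # Stage the change; return True if there is something to propose.
        if self.committed.get(req["key"]) == req["value"]:
            return False
        self.pending[req["key"]] = req["value"]
        return True

    def propose_pending(self):
        # Stand-in for handing the pending change to the Paxos layer.
        self.committed.update(self.pending)
        self.pending.clear()
        self.proposals += 1

    def dispatch(self, req):
        if req["op"] == "read":
            return self.handle_read(req)
        if self.handle_write(req):
            self.propose_pending()
        return None


class Monitor:
    def __init__(self, services):
        self.services = services

    def handle_request(self, req):
        # Route the request to the service responsible for it.
        return self.services[req["service"]].dispatch(req)


mon = Monitor({"osdmap": Service(), "auth": Service()})
mon.handle_request({"service": "osdmap", "op": "write", "key": "epoch", "value": 7})
mon.handle_request({"service": "osdmap", "op": "write", "key": "epoch", "value": 7})
epoch = mon.handle_request({"service": "osdmap", "op": "read", "key": "epoch"})
```

The second write changes nothing, so only the first one triggers a proposal, and the read is served without involving the propose path at all.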