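Serialization must be enabled in order to send messages between ActorSystems (nodes) in the cluster. Jackson serialization is a good choice in many cases, and it is what the official Akka documentation recommends.
For specific documentation topics, see:
- When and where to use Akka Cluster
- Cluster Specification
- Cluster Membership Service
- Higher level Cluster tools
- Rolling Updates
- Operating, Managing, Observability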
1. Dependencies
val AkkaVersion = "2.6.9"
libraryDependencies += "com.typesafe.akka" %% "akka-cluster-typed" % AkkaVersion
2. Cluster API Extension
The Cluster extension gives you access to management tasks such as Joining, Leaving and Downing, and to subscription of cluster membership events such as MemberUp, MemberRemoved and UnreachableMember, which are exposed as event APIs.
It does this through these references on the Cluster extension:
- manager: An ActorRef[akka.cluster.typed.ClusterCommand], where a ClusterCommand is a command such as Join, Leave or Down
- subscriptions: An ActorRef[akka.cluster.typed.ClusterStateSubscription], where a ClusterStateSubscription is one of GetCurrentState, Subscribe or Unsubscribe for cluster events like MemberRemoved
- state: The current CurrentClusterState
All of the examples below assume the following imports:
import akka.actor.typed._
import akka.actor.typed.scaladsl._
import akka.cluster.ClusterEvent._
import akka.cluster.MemberStatus
import akka.cluster.typed._
The minimum configuration required is to set a host/port for remoting and akka.actor.provider = "cluster".
akka {
  actor {
    provider = "cluster"
  }
  remote.artery {
    canonical {
      hostname = "127.0.0.1"
      port = 2551
    }
  }
  cluster {
    seed-nodes = [
      "akka://ClusterSystem@127.0.0.1:2551",
      "akka://ClusterSystem@127.0.0.1:2552"]
    downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  }
}
Accessing the Cluster extension on each node:
val cluster = Cluster(system)
Note: The name of the cluster's ActorSystem must be the same for all members. The name is passed in when you start the ActorSystem.
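As a rough illustration of the note above, the following sketch starts one node under the shared name "ClusterSystem" (the name used in the seed-node addresses). It assumes akka-cluster-typed is on the classpath and that an application.conf like the one shown earlier is present. The object name ClusterNode and the port argument are hypothetical, introduced only for this example.

```scala
import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.typed.Cluster
import com.typesafe.config.ConfigFactory

// Hypothetical entry point; not part of the Akka API.
object ClusterNode {
  def main(args: Array[String]): Unit = {
    // Assumption: the port is passed as the first argument, defaulting to 2551.
    val port = args.headOption.getOrElse("2551")

    // Override only the port; everything else comes from application.conf.
    val config = ConfigFactory
      .parseString(s"akka.remote.artery.canonical.port = $port")
      .withFallback(ConfigFactory.load())

    // Every member must use the same system name, here "ClusterSystem".
    val system = ActorSystem[Nothing](Behaviors.empty[Nothing], "ClusterSystem", config)

    val cluster = Cluster(system)
    system.log.info(s"Started node at ${cluster.selfMember.address}")
  }
}
```

Starting the same program twice with ports 2551 and 2552 should, under these assumptions, give two nodes that find each other through the configured seed nodes.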
Joining and Leaving a Cluster
If you are not using configuration to specify seed nodes to join, joining the cluster can be done programmatically via the manager.
cluster.manager ! Join(cluster.selfMember.address)
Leaving the cluster and downing a node are similar:
cluster2.manager ! Leave(cluster2.selfMember.address)
Cluster Subscriptions
Cluster subscriptions can be used to receive messages when cluster state changes. For example, registering for all MemberEvents and then using the manager to have a node leave the cluster will result in events for the node going through the Membership Lifecycle.
This example subscribes a subscriber: ActorRef[MemberEvent]:
cluster.subscriptions ! Subscribe(subscriber, classOf[MemberEvent])
Then asking a node to leave:
cluster.manager ! Leave(anotherMemberAddress)
// subscriber will receive events MemberLeft, MemberExited and MemberRemoved
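The subscriber in the snippets above has to be an actor that accepts MemberEvent messages. A minimal sketch of such an actor might look like the following. The object name ClusterListener is hypothetical, and the log messages are only illustrative.

```scala
import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.ClusterEvent.{ MemberEvent, MemberExited, MemberLeft, MemberRemoved, MemberUp }
import akka.cluster.typed.{ Cluster, Subscribe }

// Hypothetical listener actor; the name is an assumption for this example.
object ClusterListener {
  def apply(): Behavior[MemberEvent] =
    Behaviors.setup { context =>
      // Register this actor for all MemberEvent subtypes.
      Cluster(context.system).subscriptions ! Subscribe(context.self, classOf[MemberEvent])

      Behaviors.receiveMessage { event =>
        event match {
          case MemberUp(member) =>
            context.log.info(s"Member is up: ${member.address}")
          case MemberLeft(member) =>
            context.log.info(s"Member is leaving: ${member.address}")
          case MemberExited(member) =>
            context.log.info(s"Member exited: ${member.address}")
          case MemberRemoved(member, previousStatus) =>
            context.log.info(s"Member removed: ${member.address} (was $previousStatus)")
          case other =>
            // Remaining MemberEvent subtypes, for example MemberJoined.
            context.log.debug(s"Other member event: $other")
        }
        Behaviors.same
      }
    }
}
```

Spawning it with context.spawn(ClusterListener(), "listener") from a guardian behavior should be enough to see the leave sequence logged when another node leaves.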
Cluster State
Instead of subscribing to cluster events, it can sometimes be convenient to get only the full membership state with Cluster(system).state. Note that this state is not necessarily in sync with the events published to a cluster subscription.
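For instance, a point-in-time snapshot can be filtered by member status. The sketch below assumes a running typed ActorSystem; the object and method names (ClusterSnapshot, upAddresses, unreachableAddresses) are hypothetical helpers, not part of Akka.

```scala
import akka.actor.Address
import akka.actor.typed.ActorSystem
import akka.cluster.MemberStatus
import akka.cluster.typed.Cluster

// Hypothetical helper object for this example.
object ClusterSnapshot {
  // Addresses of members that are in status Up in the current snapshot.
  def upAddresses(system: ActorSystem[_]): Set[Address] =
    Cluster(system).state.members.iterator
      .filter(_.status == MemberStatus.Up)
      .map(_.address)
      .toSet

  // Addresses of members currently flagged as unreachable in the snapshot.
  def unreachableAddresses(system: ActorSystem[_]): Set[Address] =
    Cluster(system).state.unreachable.map(_.address)
}
```

Because the snapshot may lag behind or run ahead of subscription events, it is probably best suited to diagnostics and one-off checks rather than to driving state machines.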
See Cluster Membership for more information on member events specifically. There are more types of change events; consult the API documentation of classes that extend akka.cluster.ClusterEvent.ClusterDomainEvent for details about the events.
3. Cluster Membership API
Joining
The seed nodes are initial contact points for joining a cluster, which can be done in different ways:
- automatically with Cluster Bootstrap
- with configuration of seed-nodes
- programmatically
After the joining process the seed nodes are not special and they participate in the cluster in exactly the same way as other nodes.
Joining automatically to seed nodes with Cluster Bootstrap
Automatic discovery of nodes for the joining process is available using the open source Akka Management project's module, Cluster Bootstrap. Please refer to its documentation for more details.
Joining configured seed nodes
When a new node is started it sends a message to all configured seed nodes and then sends a join command to the one that answers first. If none of the seed nodes replies (they might not be started yet), it retries this procedure until it succeeds or is shut down.
You can define the seed nodes in the configuration file (application.conf):
akka.cluster.seed-nodes = [
  "akka://ClusterSystem@host1:2552",
  "akka://ClusterSystem@host2:2552"]
This can also be defined as Java system properties when starting the JVM using the following syntax:
-Dakka.cluster.seed-nodes.0=akka://ClusterSystem@host1:2552
-Dakka.cluster.seed-nodes.1=akka://ClusterSystem@host2:2552
The seed nodes can be started in any order. It is not necessary to have all seed nodes running, but the node configured as the first element in the seed-nodes list must be started when initially starting a cluster. If it is not, the other seed nodes will not become initialized, and no other node can join the cluster. The reason for the special first seed node is to avoid forming separated islands when starting from an empty cluster. It is quickest to start all configured seed nodes at the same time (order doesn't matter); otherwise it can take up to the configured seed-node-timeout until the nodes can join.
As soon as more than two seed nodes have been started, it is no problem to shut down the first seed node. If the first seed node is restarted, it will first try to join the other seed nodes in the existing cluster. Note that if you stop all seed nodes at the same time and restart them with the same seed-nodes configuration, they will join themselves and form a new cluster instead of joining the remaining nodes of the existing cluster. That is likely not desired and can be avoided by listing several nodes as seed nodes for redundancy and by not stopping all of them at the same time.
If you are going to start the nodes on different machines, you need to specify the IP addresses or host names of the machines in application.conf instead of 127.0.0.1.
Joining programmatically to seed nodes
Joining programmatically is useful when dynamically discovering other nodes at startup through an external tool or API.
import akka.actor.Address
import akka.actor.AddressFromURIString
import akka.cluster.typed.JoinSeedNodes
val seedNodes: List[Address] =
  List("akka://ClusterSystem@127.0.0.1:2551", "akka://ClusterSystem@127.0.0.1:2552").map(AddressFromURIString.parse)
Cluster(system).manager ! JoinSeedNodes(seedNodes)
The seed node address list has the same semantics as the configured seed-nodes, and the underlying implementation of the process is the same; see Joining configured seed nodes.
When joining to seed nodes you should not include the node itself, except for the node that is supposed to be the first seed node bootstrapping the cluster. The desired initial seed node address should be placed first in the parameter to the programmatic join.
Tuning joins
Unsuccessful attempts to contact seed nodes are automatically retried after the time period defined in the configuration property seed-node-timeout. Unsuccessful attempts to join a specific seed node are automatically retried after the configured retry-unsuccessful-join-after. Retrying means that the node tries to contact all seed nodes and then joins the node that answers first. The first node in the list of seed nodes will join itself if it cannot contact any of the other seed nodes within the configured seed-node-timeout.
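The two retry settings mentioned above live under akka.cluster. A configuration fragment along these lines could be used to tune them. The values shown are, to the best of my knowledge, the defaults in Akka 2.6.x, but treat them as an assumption and check the reference.conf of your version.

```hocon
akka.cluster {
  # How long to wait for a reply from the seed nodes before retrying
  # (assumed default: 5s).
  seed-node-timeout = 5s

  # How long to wait before retrying a join that did not succeed
  # (assumed default: 10s).
  retry-unsuccessful-join-after = 10s
}
```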
By default, joining the given seed nodes is retried indefinitely until a join succeeds. That process can be aborted by configuring a timeout. When aborted, it will run Coordinated Shutdown, which terminates the ActorSystem by default. CoordinatedShutdown can also be configured to exit the JVM. If the seed-nodes are assembled dynamically, it is useful to define this timeout, and a restart with new seed-nodes should be tried after unsuccessful attempts.
akka.cluster.shutdown-after-unsuccessful-join-seed-nodes = 20s
akka.coordinated-shutdown.terminate-actor-system = on
If you don't configure seed nodes or use one of the join seed node functions, you need to join the cluster manually by using JMX or HTTP.
You can join to any node in the cluster. It does not have to be configured as a seed node. Note that you can only join to an existing cluster member, which for bootstrapping means that one node must join itself, and subsequent nodes can then join it to make up a cluster.
An actor system can only join a cluster once; additional attempts will be ignored. Once an actor system has successfully joined a cluster, it has to be restarted to join the same cluster again. It can use the same host name and port after the restart. When it comes up as a new incarnation of an existing member in the cluster and attempts to join, the existing member will be removed and its new incarnation allowed to join.
Leaving
There are a few ways to remove a member from the cluster.
- The recommended way to leave a cluster is a graceful exit, informing the cluster that a node shall leave. This is performed by Coordinated Shutdown when the ActorSystem is terminated and also when a SIGTERM is sent from the environment to stop the JVM process.
- Graceful exit can also be performed using HTTP or JMX.
- When a graceful exit is not possible, for example in case of abrupt termination of the JVM process, the node will be detected as unreachable by other nodes and removed after Downing.
Graceful leaving offers faster hand off to peer nodes during node shutdown than abrupt termination and downing.
The Coordinated Shutdown will also run when the cluster node sees itself as Exiting, i.e. leaving from another node will trigger the shutdown process on the leaving node. Tasks for graceful leaving of the cluster, including graceful shutdown of Cluster Singletons and Cluster Sharding, are added automatically when Akka Cluster is used. For example, running the shutdown process will also trigger the graceful leaving if it is not already in progress.
Normally this is handled automatically, but in case of network failures during this process it may still be necessary to set the node's status to Down in order to complete the removal; see Downing.
Downing
In many cases a member can gracefully exit from the cluster, as described in Leaving, but there are scenarios when an explicit downing decision is needed before it can be removed, for example in case of abrupt termination of the JVM process, system overload that doesn't recover, or network partitions that don't heal. In such cases the node(s) will be detected as unreachable by other nodes, but they must also be marked as Down before they are removed.
When a member is considered by the failure detector to be unreachable, the leader is not allowed to perform its duties, such as changing the status of new joining members to Up. The node must first become reachable again, or the status of the unreachable member must be changed to Down. Changing status to Down can be performed automatically or manually.
We recommend that you enable the Split Brain Resolver that is part of the Akka Cluster module. You enable it with configuration:
akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
You should also consider the different available downing strategies.
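As a hedged illustration of choosing a strategy, the fragment below selects keep-majority and sets how long the membership view must be stable before a decision is taken. The keys are under akka.cluster.split-brain-resolver. The values are believed to match the defaults shipped with the Split Brain Resolver, but that is an assumption to verify against your version's reference.conf.

```hocon
akka.cluster.split-brain-resolver {
  # Strategy used to decide which side of a partition survives.
  active-strategy = keep-majority

  # Time the cluster membership must be stable before downing is decided.
  stable-after = 20s
}
```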
If a downing provider is not configured, downing must be performed manually using HTTP or JMX.
Note that Cluster Singleton or Cluster Sharding entities that are running on a crashed (unreachable) node will not be started on another node until the previous node has been removed from the Cluster. Removal of crashed (unreachable) nodes is performed after a downing decision.
Downing can also be performed programmatically with Cluster(system).manager ! Down(address), but that is mostly useful in tests and when implementing a DowningProvider.
If a crashed node is restarted and joins the cluster again with the same hostname and port, the previous incarnation of that member will first be downed and removed. The new join attempt with the same hostname and port is used as evidence that the previous incarnation is no longer alive.
If a node is still running and sees itself as Down, it will shut down. Coordinated Shutdown will automatically run if run-coordinated-shutdown-when-down is set to on (the default); however, the node will not try to leave the cluster gracefully.
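The setting named above sits under akka.cluster. Written out explicitly, a fragment would look like this; it only restates the default described in the text.

```hocon
# Run Coordinated Shutdown when this node sees itself as Down.
akka.cluster.run-coordinated-shutdown-when-down = on
```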
Node Roles
Not all nodes of a cluster need to perform the same function. For example, there might be one subset which runs the web front-end, one which runs the data access layer and one for the number-crunching. Choosing which actors to start on each node, for example cluster-aware routers, can take node roles into account to achieve this distribution of responsibilities.
The node roles are defined in the configuration property named akka.cluster.roles and are typically defined in the start script as a system property or environment variable.
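A sketch of what such a role definition might look like, assuming a node that should act as a backend (the role name "backend" is only an example):

```hocon
# In application.conf
akka.cluster.roles = ["backend"]

# Or, following the same indexed syntax used for seed-nodes, as a JVM
# system property in the start script:
#   -Dakka.cluster.roles.0=backend
```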
The roles are part of the membership information in MemberEvent that you can subscribe to. The roles of the own node are available from the selfMember, and that can be used to conditionally start certain actors:
val selfMember = Cluster(context.system).selfMember
if (selfMember.hasRole("backend")) {
  context.spawn(Backend(), "back")
} else if (selfMember.hasRole("frontend")) {
  context.spawn(Frontend(), "front")
}
Failure Detector
The nodes in the cluster monitor each other by sending heartbeats to detect if a node is unreachable from the rest of the cluster. Please see:
- Failure Detector specification
- Phi Accrual Failure Detector implementation
- Using the Failure Detector
Using the Failure Detector
Cluster uses the akka.remote.PhiAccrualFailureDetector failure detector by default, or you can provide your own by implementing akka.remote.FailureDetector and configuring it:
akka.cluster.failure-detector.implementation-class = "com.example.CustomFailureDetector"
In the Cluster Configuration you may want to adjust these depending on your environment:
- When a phi value is considered to be a failure: akka.cluster.failure-detector.threshold
- Margin of error for sudden abnormalities: akka.cluster.failure-detector.acceptable-heartbeat-pause
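To give a feel for what the threshold means, here is a small standalone sketch of a phi calculation. It uses a logistic approximation of the normal distribution's tail, which is a common way to implement phi accrual detectors. The function names, parameters and constants are assumptions for illustration and are not taken from this document, so this should not be read as Akka's exact implementation.

```scala
// Hypothetical standalone sketch; does not depend on Akka.
object PhiSketch {
  // phi grows as the time since the last heartbeat exceeds what the
  // observed heartbeat history (mean and standard deviation) would predict.
  def phi(timeSinceLastMs: Double, meanMs: Double, stdDevMs: Double): Double = {
    val y = (timeSinceLastMs - meanMs) / stdDevMs
    // Logistic approximation of the cumulative normal distribution.
    val e = math.exp(-y * (1.5976 + 0.070566 * y * y))
    if (timeSinceLastMs > meanMs) -math.log10(e / (1.0 + e))
    else -math.log10(1.0 - 1.0 / (1.0 + e))
  }

  // A node is suspected once phi reaches the configured threshold.
  def isSuspected(phiValue: Double, threshold: Double): Boolean =
    phiValue >= threshold
}
```

With a mean heartbeat interval of 1000 ms and a standard deviation of 100 ms, a heartbeat that is exactly on time gives a phi of about 0.3, while a 2000 ms silence gives a phi far above a threshold of 8. A higher threshold therefore means fewer false suspicions but slower detection.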
Higher level Cluster tools
Cluster Singleton
For some use cases it is convenient or necessary to ensure that only one actor of a certain type is running somewhere in the cluster. This can be implemented by subscribing to member events, but there are several corner cases to consider. Therefore, this specific use case is covered by the Cluster Singleton.
See Cluster Singleton.
Cluster Sharding
Distributes actors across several nodes in the cluster and supports interaction with the actors using their logical identifier, without having to care about their physical location in the cluster.
See Cluster Sharding.
Distributed Data
Distributed Data is useful when you need to share data between nodes in an Akka Cluster. The data is accessed with an actor providing a key-value store like API.
See Distributed Data.
Distributed Publish Subscribe
Publish-subscribe messaging between actors in the cluster based on a topic, i.e. the sender does not have to know on which node the destination actor is running.
See Distributed Publish Subscribe.
Cluster aware routers
Distribute messages to actors on different nodes in the cluster with routing strategies like round-robin and consistent hashing.
See Group Routers.
Cluster across multiple data centers
Akka Cluster can be used across multiple data centers, availability zones or regions, so that one Cluster can span multiple data centers and still be tolerant to network partitions.
See Cluster Multi-DC.
Reliable Delivery
Reliable delivery and flow control of messages between actors in the Cluster.