public interface CPSubsystemManagementService
Moreover, the current CP subsystem implementation works only in memory
without persisting any state to disk. This means that a crashed CP member
cannot recover by reloading its previous state. Crashed CP members therefore
risk gradually eroding the majority of CP groups and, eventually, causing a
total loss of availability of the CP subsystem. To prevent such situations,
CPSubsystemManagementService offers APIs for dynamic management of CP members.
The CP subsystem relies on Hazelcast's failure detectors to test reachability of CP members. Before removing a CP member from the CP subsystem, please make sure that it is declared as unreachable by Hazelcast's failure detector and removed from Hazelcast's member list.
CP member additions and removals are handled internally by performing a single membership change at a time. When multiple CP members shut down concurrently, their shutdown processes are executed serially. First, the Metadata CP group creates a membership change plan for CP groups. Then, the scheduled changes are applied to the CP groups one by one. After all removals are done, the shutting-down CP member is removed from the active CP members list and its shutdown process is completed.
When a CP member is being shut down, it is replaced with another available CP member in all of its CP groups, including the Metadata group, so that the majority of those CP groups is not weakened or, more importantly, lost. If there is no available CP member to replace a shutting-down CP member in a CP group, that group's size is reduced by 1 and its majority value is recalculated.
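The replace-or-shrink rule above can be sketched with a small in-memory model. This is only an illustration, not Hazelcast code: the class `CpGroupShutdownSketch`, its methods, and the member names are hypothetical. The sketch also assumes the usual Raft majority of floor(n / 2) + 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model for illustration only; not part of the Hazelcast API.
final class CpGroupShutdownSketch {

    // Assumption: a Raft group of n members needs floor(n / 2) + 1 members to make progress.
    static int majority(int groupSize) {
        return groupSize / 2 + 1;
    }

    // Returns the member list of a group after `leaving` shuts down.
    // The first spare that is not already in the group takes the free slot;
    // if no such spare exists, the group simply shrinks by one member.
    static List<String> replaceOrShrink(List<String> group, String leaving, List<String> spares) {
        List<String> result = new ArrayList<>(group);
        if (!result.remove(leaving)) {
            return result; // the leaving member was not part of this group
        }
        for (String spare : spares) {
            if (!result.contains(spare)) {
                result.add(spare);
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("A", "B", "C");

        List<String> replaced = replaceOrShrink(group, "C", Arrays.asList("D"));
        System.out.println(replaced + " majority=" + majority(replaced.size()));

        List<String> shrunk = replaceOrShrink(group, "C", Collections.<String>emptyList());
        System.out.println(shrunk + " majority=" + majority(shrunk.size()));
    }
}
```

Note that in this model a group that shrinks from 3 to 2 members still has a majority value of 2, so it can no longer tolerate any further failure.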
A new CP member can be added to the CP subsystem either to increase the number of available CP members for new CP groups or to fill missing slots in existing CP groups. After the initial Hazelcast cluster startup is done, an existing Hazelcast member can be promoted to the CP member role. This new CP member automatically joins the CP groups that have missing members, and the majority value of these CP groups is recalculated.
A CP member may crash due to hardware problems or a defect in user code, or it may become unreachable because of connection problems, such as network partitions or network hardware failures. If a CP member is known to be alive and only has temporary communication issues, it will catch up with the other CP members and continue to operate normally once those issues are resolved. If it is known to have crashed, or its communication issues cannot be resolved in a short time, it can be preferable to remove this CP member from the CP subsystem, and hence from all of its CP groups. In this case, the unreachable CP member should be terminated to prevent any accidental communication with the rest of the CP subsystem.
When the majority of a CP group is lost for any reason, that CP group cannot
make progress anymore. Even a new CP member cannot join this CP group,
because all membership changes also go through the Raft consensus algorithm.
For this reason, the only option is to force-destroy the CP group via the
forceDestroyCPGroup(String) API. When this API is used, the CP group is
terminated non-gracefully, without the Raft algorithm mechanics. Then, all CP
data structure proxies that talk to this CP group fail with
CPGroupDestroyedException. However, if a new proxy is created afterwards,
this CP group is re-created from scratch with a new set of CP members. Losing
the majority of a CP group can be likened to the partition-loss scenario of
AP Hazelcast.
Please note that CP groups that have lost their majority must be force-destroyed immediately, because they can block the Metadata CP group from performing membership changes.
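As a rough illustration of the rule above, the following sketch picks out the groups that have lost their majority from an operator-supplied snapshot. It is not Hazelcast code: `LostMajoritySketch`, the group names, and the snapshot format (group name mapped to `{groupSize, reachableMembers}`) are hypothetical. It again assumes a majority of floor(n / 2) + 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Hazelcast API.
final class LostMajoritySketch {

    // Assumption: a group of n members needs floor(n / 2) + 1 reachable members.
    static boolean hasLostMajority(int groupSize, int reachableMembers) {
        return reachableMembers < groupSize / 2 + 1;
    }

    // Input: group name -> {groupSize, reachableMembers}.
    // Output: names of the groups that can no longer make progress, in input order.
    static List<String> groupsToForceDestroy(Map<String, int[]> groups) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : groups.entrySet()) {
            int groupSize = entry.getValue()[0];
            int reachable = entry.getValue()[1];
            if (hasLostMajority(groupSize, reachable)) {
                lost.add(entry.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        Map<String, int[]> snapshot = new LinkedHashMap<>();
        snapshot.put("locks", new int[] {3, 1});    // only 1 of 3 reachable
        snapshot.put("counters", new int[] {3, 2}); // 2 of 3 reachable
        System.out.println(groupsToForceDestroy(snapshot));
    }
}
```

Each name returned by such a check would be a candidate for forceDestroyCPGroup(String); how the reachability snapshot is obtained is left open here.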
Loss of the majority of the Metadata CP group is the doomsday scenario for the
CP subsystem. It is a fatal failure and the only solution is to reset the
whole CP subsystem state via the restart() API. To be able to reset the CP
subsystem, the initial size of the CP subsystem, which is defined by
CPSubsystemConfig.getCPMemberCount(), must be satisfied. For instance, if
CPSubsystemConfig.getCPMemberCount() is 5 and only 1 CP member is alive when
restart() is called, 4 additional regular Hazelcast members must exist in the
cluster. New Hazelcast members can be started to satisfy
CPSubsystemConfig.getCPMemberCount().
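The member-count arithmetic behind this precondition can be sketched as follows. The class and method names are hypothetical. The check mirrors the documented condition that the current member count of the cluster must not be smaller than CPSubsystemConfig.getCPMemberCount().

```java
// Hypothetical helper for illustration only; not part of the Hazelcast API.
final class CpRestartPreconditionSketch {

    // Number of Hazelcast members that still have to be started before
    // the configured CP member count can be satisfied.
    static int additionalMembersNeeded(int cpMemberCount, int currentClusterSize) {
        if (cpMemberCount < 0 || currentClusterSize < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.max(0, cpMemberCount - currentClusterSize);
    }

    // True when the cluster is large enough for a restart to be attempted.
    static boolean clusterLargeEnough(int cpMemberCount, int currentClusterSize) {
        return additionalMembersNeeded(cpMemberCount, currentClusterSize) == 0;
    }

    public static void main(String[] args) {
        // CP member count of 5 with a single surviving member: 4 more are needed.
        System.out.println(additionalMembersNeeded(5, 1));
        System.out.println(clusterLargeEnough(5, 5));
    }
}
```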
See Also: CPMember, CPSubsystemConfig
Modifier and Type | Method and Description |
---|---|
`ICompletableFuture<Void>` | `forceDestroyCPGroup(String groupName)` Unconditionally destroys the given active CP group without using the Raft algorithm mechanics. |
`ICompletableFuture<CPGroup>` | `getCPGroup(String name)` Returns the active CP group with the given name. |
`ICompletableFuture<Collection<CPGroupId>>` | `getCPGroupIds()` Returns all active CP group ids. |
`ICompletableFuture<Collection<CPMember>>` | `getCPMembers()` Returns the current list of CP members. |
`ICompletableFuture<Void>` | `promoteToCPMember()` Promotes the local Hazelcast member to a CP member. |
`ICompletableFuture<Void>` | `removeCPMember(String cpMemberUuid)` Removes the given unreachable CP member from the active CP members list and all CP groups it belongs to. |
`ICompletableFuture<Void>` | `restart()` Wipes & resets the whole CP subsystem and initializes it as if the Hazelcast cluster is starting up initially. |
ICompletableFuture<Collection<CPGroupId>> getCPGroupIds()
ICompletableFuture<CPGroup> getCPGroup(String name)
ICompletableFuture<Void> forceDestroyCPGroup(String groupName)
After a CP group is destroyed, all CP data structure proxies that talk to it fail with CPGroupDestroyedException.
Once a CP group is destroyed, it can be created again with a new set of CP members.
This method is idempotent. It has no effect if the given CP group is already destroyed.
ICompletableFuture<Collection<CPMember>> getCPMembers()
ICompletableFuture<Void> promoteToCPMember()
This method is idempotent. If the local member is already in the active CP members list, then this method will have no effect. When the current member is promoted to CP member, its member UUID is assigned as CP member UUID.
Once the returned Future object is completed, the promoted CP member has been added to the CP groups that have missing members, i.e., whose size is smaller than CPSubsystemConfig.getGroupSize().
If the local member is currently being removed from the active CP members list, then the returned Future object will throw IllegalArgumentException.
If there is an ongoing membership change in the CP subsystem when this method is invoked, then the returned Future object throws IllegalStateException.
If the CP subsystem initial discovery process has not completed when this method is invoked, then the returned Future object throws IllegalStateException.
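The failure modes listed above surface through the returned Future rather than through the method call itself. Below is a minimal sketch of how a caller might classify them. It assumes standard java.util.concurrent.Future semantics, where get() wraps the failure in an ExecutionException. `PromotionResultSketch` and its `Outcome` enum are hypothetical names, and a plain CompletableFuture stands in for the future returned by promoteToCPMember().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper for illustration only; not part of the Hazelcast API.
final class PromotionResultSketch {

    enum Outcome { PROMOTED, RETRY_LATER, BEING_REMOVED, FAILED }

    // Waits for the promotion future and maps its result to an outcome.
    // Assumption: failures arrive as the cause of an ExecutionException.
    static Outcome await(Future<Void> promotion) {
        try {
            promotion.get();
            return Outcome.PROMOTED;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IllegalStateException) {
                // Ongoing membership change, or CP discovery not completed yet.
                return Outcome.RETRY_LATER;
            }
            if (cause instanceof IllegalArgumentException) {
                // The local member is currently being removed from the CP members list.
                return Outcome.BEING_REMOVED;
            }
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        // Stand-in future that fails the way an ongoing membership change would.
        CompletableFuture<Void> busy = new CompletableFuture<>();
        busy.completeExceptionally(new IllegalStateException("membership change in progress"));
        System.out.println(await(busy));
    }
}
```

Since promoteToCPMember() is documented as idempotent, retrying after a RETRY_LATER outcome is a reasonable design choice in such a wrapper.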
IllegalArgumentException - If the local member is currently being removed from the active CP members list
IllegalStateException - If there is an ongoing membership change in the CP subsystem
ICompletableFuture<Void> removeCPMember(String cpMemberUuid)
Before removing a CP member from the CP subsystem, please make sure that it is declared as unreachable by Hazelcast's failure detector and removed from Hazelcast's member list. The behaviour is undefined when a running CP member is removed from the CP subsystem.
IllegalStateException - When another CP member is being removed from the CP subsystem
IllegalArgumentException - If the given CP member is already removed from the CP member list
ICompletableFuture<Void> restart()
After this method is called, all CP state and data will be wiped and CP members will start with empty state.
This method can be invoked only from the Hazelcast master member.
This method must not be called while there are membership changes in the cluster. Before calling this method, please make sure that there is no new member joining and all existing Hazelcast members have seen the same member list.
Use with caution: This method is NOT idempotent and multiple invocations can break the whole system! After calling this API, you must observe the system to see if the restart process is successfully completed or failed before making another call.
IllegalStateException - When this method is called on a Hazelcast member that is not the Hazelcast cluster master
IllegalStateException - If the current member count of the cluster is smaller than CPSubsystemConfig.getCPMemberCount()
Copyright © 2019 Hazelcast, Inc. All Rights Reserved.