public interface CPSubsystemManagementService
Unlike the dynamic nature of Hazelcast clusters, CP Subsystem requires manual intervention while expanding/shrinking its size, or when a CP member crashes or becomes unreachable. When a CP member becomes unreachable, it is not automatically removed from CP Subsystem because it could be still alive and partitioned away.
Moreover, by default CP Subsystem works in memory without persisting any state to disk. This means that a crashed CP member cannot recover by reloading its CP identity and state. Crashed CP members therefore create a risk of gradually losing the majority of CP groups and, eventually, the availability of CP Subsystem altogether. To prevent such situations, CPSubsystemManagementService offers a set of APIs. In addition, CP Subsystem Persistence can be enabled to make CP members persist their CP state to stable storage. Please see CPSubsystem and CPSubsystemConfig.setPersistenceEnabled(boolean) for more details about CP Subsystem Persistence.
CP Subsystem relies on Hazelcast's failure detectors to test reachability of CP members. Before removing a CP member from CP Subsystem, please make sure that it is declared as unreachable by Hazelcast's failure detector and removed from Hazelcast cluster's member list.
CP member additions and removals are handled internally by performing a single membership change at a time. When multiple CP members shut down concurrently, their shutdown processes are executed serially. When a CP membership change is triggered, the METADATA CP group creates a membership change plan for CP groups. Then, the scheduled changes are applied to the CP groups one by one. After all CP group member removals are done, the shutting-down CP member is removed from the active CP members list and its shutdown process is completed. A shut-down CP member is automatically replaced with another available CP member in all of its CP groups, including the METADATA CP group, so that these groups do not shrink or, more importantly, lose their majority. If there is no available CP member to replace a shutting-down CP member in a CP group, that group's size is reduced by 1 and its majority value is recalculated. Please note that this behaviour applies only when CP Subsystem Persistence is disabled. When CP Subsystem Persistence is enabled, shut-down CP members are not automatically removed from the active CP members list and they are still considered part of CP groups and majority calculations, because they can come back by restoring their local CP state from stable storage. If you know that a shut-down CP member will not be restarted, you need to remove that member from CP Subsystem via removeCPMember(UUID).
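The effect of shrinking a CP group on its majority can be worked through with a little arithmetic. The sketch below assumes the usual Raft rule that a majority is more than half of the group members; the class and method names are illustrative only and are not part of the Hazelcast API.

```java
/** Worked arithmetic for CP group majorities; all names here are illustrative. */
final class CpGroupMajority {

    private CpGroupMajority() {
    }

    /** Majority of a group under the assumed Raft rule: more than half of its members. */
    static int majority(int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be positive: " + groupSize);
        }
        return groupSize / 2 + 1;
    }

    /** Number of members that can be unavailable while the group can still reach a majority. */
    static int tolerableFailures(int groupSize) {
        return groupSize - majority(groupSize);
    }

    public static void main(String[] args) {
        // Walk a group from 5 members down to 1, as if members were removed
        // one at a time with no replacement available.
        for (int size : new int[] {5, 4, 3, 2, 1}) {
            System.out.println("size=" + size
                    + " majority=" + majority(size)
                    + " tolerableFailures=" + tolerableFailures(size));
        }
    }
}
```

Under this rule a 3-member group that shrinks to 2 members still needs 2 members for a majority, so it can no longer tolerate any further failure. This is one reason to fill the missing slot by promoting a new CP member.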
A new CP member can be added to CP Subsystem either to increase the number of available CP members for new CP groups or to fill the missing slots in existing CP groups. After the initial Hazelcast cluster startup is done, an existing Hazelcast member can be promoted to the CP member role. This new CP member automatically joins the CP groups that have missing members, and the majority values of these CP groups are recalculated.
See Also:
CPSubsystem, CPMember, CPSubsystemConfig
Modifier and Type | Method and Description |
---|---|
boolean | awaitUntilDiscoveryCompleted(long timeout, TimeUnit timeUnit) Blocks until the CP discovery process is completed, or the given timeout occurs, or the current thread is interrupted, whichever happens first. |
CompletionStage<Void> | forceDestroyCPGroup(String groupName) Unconditionally destroys the given active CP group without using the Raft algorithm mechanics. |
CompletionStage<CPGroup> | getCPGroup(String name) Returns the active CP group with the given name. |
CompletionStage<Collection<CPGroupId>> | getCPGroupIds() Returns all active CP group ids. |
CompletionStage<Collection<CPMember>> | getCPMembers() Returns the current list of CP members. |
CPMember | getLocalCPMember() Returns the local CP member if this Hazelcast member is part of CP Subsystem, returns null otherwise. |
boolean | isDiscoveryCompleted() Returns whether the CP discovery process is completed or not. |
CompletionStage<Void> | promoteToCPMember() Promotes the local Hazelcast member to the CP role. |
CompletionStage<Void> | removeCPMember(UUID cpMemberUuid) Removes the given unreachable CP member from the active CP members list and all CP groups it belongs to. |
CompletionStage<Void> | reset() Wipes and resets the whole CP Subsystem state and initializes it as if the Hazelcast cluster is starting up initially. |
CPMember getLocalCPMember()
This field is initialized when the local Hazelcast member is one of the first CPSubsystemConfig.getCPMemberCount() members in the cluster and the CP discovery process is completed.
Throws:
HazelcastException - if CP Subsystem is not enabled
See Also:
isDiscoveryCompleted(), awaitUntilDiscoveryCompleted(long, TimeUnit)
CompletionStage<Collection<CPGroupId>> getCPGroupIds()
CompletionStage<CPGroup> getCPGroup(String name)
CompletionStage<Void> forceDestroyCPGroup(String groupName)
CP data structure proxies created before the destroy fail with CPGroupDestroyedException. However, if a new proxy is created afterwards, then this CP group is re-created from scratch with a new set of CP members.
This method is idempotent. It has no effect if the given CP group is already destroyed.
CompletionStage<Collection<CPMember>> getCPMembers()
CompletionStage<Void> promoteToCPMember()
This method is idempotent. If the local member is already in the active CP members list, i.e., it is already a CP member, then this method has no effect. When the local member is promoted to the CP role, its member UUID is assigned as its CP member UUID.
Once the returned Future object is completed, the promoted CP member has been added to CP groups that have missing CP members, i.e., whose current size is smaller than CPSubsystemConfig.getGroupSize().
If the local member is currently being removed from the active CP members list, then the returned Future object will throw IllegalArgumentException.
If there is an ongoing membership change in CP Subsystem when this method is invoked, then the returned Future object throws IllegalStateException.
If the CP discovery process has not completed yet when this method is invoked, then the returned Future object throws IllegalStateException.
If the local member is a lite member, the returned Future object throws IllegalStateException.
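The failure cases above arrive through the returned stage rather than as direct exceptions from the call. The following is a minimal sketch of one way a caller might tell them apart. It works on any `CompletionStage<Void>`, so it uses only the JDK; the class name `PromotionOutcome` and the returned messages are hypothetical, and how the exception is wrapped in practice may depend on how the stage is consumed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;

/** Hypothetical helper that maps the result of a promotion stage to a short description. */
final class PromotionOutcome {

    private PromotionOutcome() {
    }

    /** Blocks until the stage completes and describes the outcome. */
    static String describe(CompletionStage<Void> stage) {
        try {
            stage.toCompletableFuture().get();
            return "promoted";
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            Throwable cause = unwrap(e);
            if (cause instanceof IllegalArgumentException) {
                return "rejected: local member is being removed from the active CP members list";
            }
            if (cause instanceof IllegalStateException) {
                return "rejected: membership change in progress, discovery incomplete, or lite member";
            }
            return "failed: " + cause;
        }
    }

    /** Strips the wrapper exceptions added by the java.util.concurrent machinery. */
    private static Throwable unwrap(Throwable t) {
        while ((t instanceof ExecutionException || t instanceof CompletionException)
                && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    public static void main(String[] args) {
        // Simulated stages stand in for the value returned by promoteToCPMember().
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("ongoing membership change"));
        System.out.println(describe(failed));
        System.out.println(describe(CompletableFuture.<Void>completedFuture(null)));
    }
}
```

In an application, the argument would be the stage returned by promoteToCPMember(). Since three different conditions all surface as IllegalStateException, the exception message is the only way to tell them apart.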
CompletionStage<Void> removeCPMember(UUID cpMemberUuid)
Before removing a CP member from CP Subsystem, please make sure that it is declared as unreachable by Hazelcast's failure detector and removed from Hazelcast's member list. The behavior is undefined when a running CP member is removed from CP Subsystem.
Throws:
IllegalStateException - when another CP member is currently being removed from CP Subsystem
IllegalArgumentException - if the given CP member is already removed from CP Subsystem
CompletionStage<Void> reset()
After this method is called, all CP state and data are wiped and CP members start with empty state.
This method can be invoked only from the Hazelcast master member, which is the first member in the Hazelcast cluster member list.
This method must not be called while there are membership changes in the Hazelcast cluster. Before calling this method, please make sure that there is no new member joining and all existing Hazelcast members have seen the same member list.
To be able to use this method, the initial CP member count of CP Subsystem, which is defined by CPSubsystemConfig.getCPMemberCount(), must be satisfied. For instance, if CPSubsystemConfig.getCPMemberCount() is 5 and only 1 CP member is alive when this method is called, 4 additional AP Hazelcast members should exist in the cluster, or new Hazelcast members must be started.
This method also deletes all data written by CP Subsystem Persistence.
This method triggers a new CP discovery process round. However, if the new CP discovery round fails for any reason, Hazelcast members are not terminated, because Hazelcast members are likely to contain data for AP data structures and their termination can cause data loss. Hence, you need to observe the cluster and check if the CP discovery process completes successfully.
Use with caution: This method is NOT idempotent and multiple invocations can break the whole system! After calling this API, you must observe the system to see if the reset process is successfully completed or failed before making another call.
Throws:
IllegalStateException - when this method is called on a Hazelcast member that is not the Hazelcast cluster master
IllegalStateException - if the current member count of the cluster is smaller than CPSubsystemConfig.getCPMemberCount()
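Because reset() is not idempotent, it can help to check the two documented preconditions before invoking it. The sketch below is plain arithmetic on values the caller is assumed to have gathered already (whether the local member is the master, the configured CP member count, and the current cluster size). The class `ResetPreconditions` and its methods are hypothetical and do not call Hazelcast.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check mirroring the two documented failure conditions of reset(). */
final class ResetPreconditions {

    private ResetPreconditions() {
    }

    /** How many more Hazelcast members are needed to satisfy the configured CP member count. */
    static int missingMembers(int cpMemberCount, int clusterSize) {
        return Math.max(0, cpMemberCount - clusterSize);
    }

    /** Returns a human-readable list of unmet preconditions; empty when both are met. */
    static List<String> violations(boolean localIsMaster, int cpMemberCount, int clusterSize) {
        List<String> problems = new ArrayList<>();
        if (!localIsMaster) {
            problems.add("reset() must be invoked on the Hazelcast master member");
        }
        int missing = missingMembers(cpMemberCount, clusterSize);
        if (missing > 0) {
            problems.add(missing + " more Hazelcast member(s) needed to reach CP member count "
                    + cpMemberCount);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The example from the text: CP member count 5, only 1 member in the cluster.
        System.out.println(violations(true, 5, 1));
        // Enough members and invoked on the master: nothing to report.
        System.out.println(violations(true, 5, 5));
    }
}
```

A check like this does not cover the requirement that no Hazelcast membership change is in flight; that still has to be verified by observing the cluster.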
boolean isDiscoveryCompleted()
Returns:
true if the CP discovery process completed, false otherwise
See Also:
awaitUntilDiscoveryCompleted(long, TimeUnit)
boolean awaitUntilDiscoveryCompleted(long timeout, TimeUnit timeUnit) throws InterruptedException
Parameters:
timeout - maximum time to wait
timeUnit - time unit of the timeout
Returns:
true if CP discovery completed, false otherwise
Throws:
InterruptedException - if interrupted while waiting
See Also:
isDiscoveryCompleted()
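A caller that waits for discovery has to handle three outcomes: completed, timed out, and interrupted. The following is a minimal sketch of that handling. To stay self-contained it takes the wait as a functional interface with the same shape as awaitUntilDiscoveryCompleted(long, TimeUnit); the names `DiscoveryWait` and `Awaiter` are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper around a blocking "await discovery" call. */
final class DiscoveryWait {

    private DiscoveryWait() {
    }

    /** Stand-in with the same shape as awaitUntilDiscoveryCompleted(long, TimeUnit). */
    @FunctionalInterface
    interface Awaiter {
        boolean await(long timeout, TimeUnit timeUnit) throws InterruptedException;
    }

    /**
     * Returns true only if discovery completed within the timeout.
     * On interruption, restores the thread's interrupt flag and returns false.
     */
    static boolean awaitDiscovery(Awaiter awaiter, long timeout, TimeUnit timeUnit) {
        try {
            return awaiter.await(timeout, timeUnit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // A stub that reports immediate completion stands in for the real service.
        boolean completed = awaitDiscovery((timeout, unit) -> true, 30, TimeUnit.SECONDS);
        System.out.println("discovery completed: " + completed);
    }
}
```

With a CPSubsystemManagementService instance at hand, a method reference to its awaitUntilDiscoveryCompleted method should fit the `Awaiter` shape, since the parameter types, return type, and checked exception match the signature documented above.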
Copyright © 2023 Hazelcast, Inc. All rights reserved.