Interface CPSubsystem
- All Known Implementing Classes:
CPSubsystemStubImpl
CP Subsystem is accessed via HazelcastInstance.getCPSubsystem(). Its data
structures are CP with respect to the CAP principle, i.e., they always
maintain linearizability and prefer consistency over availability during
network partitions. Besides network partitions, CP Subsystem withstands
server and client failures.
Currently, CP Subsystem contains only the implementations of Hazelcast's
concurrency APIs. Since these APIs do not maintain large states, not all
members of a Hazelcast cluster need to take part in CP Subsystem.
The number of Hazelcast members that take part in CP Subsystem is specified
in the member's configuration. Say that it is configured as N. Then, when a
Hazelcast cluster starts, the first N members form CP Subsystem. These
members are called CPMembers, and they can also contain data for other
regular (AP) Hazelcast data structures, such as IMap.
Data structures in CP Subsystem run in CPGroups. Each CP group
elects its own Raft leader and runs the Raft consensus algorithm
independently. CP Subsystem runs 2 CP groups by default. The first one is
the METADATA CP group, which is an internal CP group responsible for managing
CP members and CP groups. It is initialized during cluster startup if CP
Subsystem is enabled via configuration and the CP member count is greater
than 0. The second CP group is the DEFAULT CP group, whose name is given in
CPGroup.DEFAULT_GROUP_NAME. If a group name is not specified while
creating a CP data structure proxy, that data structure is mapped to
the DEFAULT CP group. For instance, when a CP IAtomicLong instance
is created via .getAtomicLong("myAtomicLong"), it is initialized on
the DEFAULT CP group. Besides these 2 pre-defined CP groups, custom CP
groups can be created at run-time while fetching CP data structure proxies.
For instance, if a CP IAtomicLong is created by calling
.getAtomicLong("myAtomicLong@myGroup"), first a new CP group is
created with the name "myGroup" and then "myAtomicLong" is initialized on
this custom CP group.
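The naming rule above can be illustrated with a small stand-alone sketch. It is not Hazelcast's own parsing code: the class CpProxyName is hypothetical, and the literal "default" is only an assumed stand-in for CPGroup.DEFAULT_GROUP_NAME, whose value is not stated here.

```java
import java.util.Objects;

/** Hypothetical model of splitting "object" or "object@group" proxy names. */
final class CpProxyName {
    // Assumption: placeholder for CPGroup.DEFAULT_GROUP_NAME.
    static final String DEFAULT_GROUP = "default";

    final String objectName;
    final String groupName;

    private CpProxyName(String objectName, String groupName) {
        this.objectName = objectName;
        this.groupName = groupName;
    }

    /** Returns the object name and the CP group that the given proxy name maps to. */
    static CpProxyName parse(String name) {
        Objects.requireNonNull(name, "name");
        int at = name.indexOf('@');
        if (at < 0) {
            // No group given: the data structure lands on the DEFAULT CP group.
            return new CpProxyName(name, DEFAULT_GROUP);
        }
        String object = name.substring(0, at);
        String group = name.substring(at + 1);
        if (object.isEmpty() || group.isEmpty() || group.indexOf('@') >= 0) {
            throw new IllegalArgumentException(
                    "Expected \"object\" or \"object@group\" but got: " + name);
        }
        return new CpProxyName(object, group);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"myAtomicLong", "myAtomicLong@myGroup"}) {
            CpProxyName parsed = parse(name);
            System.out.println(name + " -> object=" + parsed.objectName
                    + ", group=" + parsed.groupName);
        }
    }
}
```

Rejecting a second '@' is a choice made for this sketch only; the text above does not say how such names are treated.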
This design implies that each CP member can participate in more than one CP group. CP Subsystem runs a periodic background task to ensure that each CP member performs the Raft leadership role for a roughly equal number of CP groups. For instance, if there are 3 CP members and 3 CP groups, each CP member becomes Raft leader for only 1 CP group. If one more CP group is created, then one of the CP members gets the Raft leader role for 2 CP groups. This is done because Raft is a leader-based consensus algorithm. A Raft leader node becomes responsible for handling incoming requests from callers and replicating them to follower nodes. If a CP member gets the Raft leadership role for too many CP groups compared to other CP members, it can turn into a bottleneck.
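The balancing goal can be pictured with a simplified model. The sketch below is not the background task itself (which rebalances leadership among already running groups); it only shows, with a hypothetical LeaderBalanceSketch class, that handing each group to the member that currently leads the fewest groups yields the distribution described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: spread Raft leadership evenly over CP members. */
final class LeaderBalanceSketch {

    /** Assigns each group to the member that currently leads the fewest groups. */
    static Map<String, List<String>> assign(List<String> members, List<String> groups) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("at least one CP member is required");
        }
        Map<String, List<String>> ledGroups = new LinkedHashMap<>();
        for (String member : members) {
            ledGroups.put(member, new ArrayList<>());
        }
        for (String group : groups) {
            String least = members.get(0);
            for (String member : members) {
                if (ledGroups.get(member).size() < ledGroups.get(least).size()) {
                    least = member;
                }
            }
            ledGroups.get(least).add(group);
        }
        return ledGroups;
    }

    public static void main(String[] args) {
        List<String> members = List.of("A", "B", "C");
        // 3 members, 3 groups: every member leads exactly one group.
        System.out.println(assign(members, List.of("g1", "g2", "g3")));
        // One more group: exactly one member ends up leading two groups.
        System.out.println(assign(members, List.of("g1", "g2", "g3", "g4")));
    }
}
```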
CP Subsystem achieves horizontal scalability thanks to all of the aforementioned CP group management capabilities. You can scale out the throughput and memory capacity by distributing your CP data structures to multiple CP groups (i.e., manual partitioning / sharding) and distributing those CP groups over CP members (i.e., choosing a CP group size that is smaller than the CP member count configuration). Nevertheless, the current set of CP data structures has quite low memory overhead. Moreover, related to the Raft consensus algorithm, each CP group makes use of internal heartbeat RPCs to maintain the authority of the Raft leader and help lagging CP group members make progress. Last, the new CP lock and semaphore implementations rely on a brand new session mechanism. In a nutshell, a Hazelcast server or a client starts a new session on the corresponding CP group when it makes its very first lock or semaphore acquire request, and then periodically commits session heartbeats to this CP group in order to indicate its liveness. It means that if CP locks and semaphores are distributed to multiple CP groups, there will be a session management overhead on each CP group.
CP Subsystem runs a discovery process on cluster startup. When CP Subsystem is enabled by setting the CP member count to a positive value, say N, the first N members in the Hazelcast cluster member list initiate this discovery process. Other Hazelcast members skip this step. The CP discovery process runs out of the box on top of Hazelcast's cluster member list without requiring any custom configuration for different environments. It is completed when each of the first N Hazelcast members initializes its local CP member list and commits it to the METADATA CP group. A soon-to-be CP member terminates itself if any of the following conditions occur before the CP discovery process is completed:
- Any Hazelcast member leaves the cluster,
- The local Hazelcast member commits a CP member list which is different from other members' committed CP member lists,
- The local Hazelcast member fails to commit its discovered CP member list for any reason.
The CP data structure proxies differ from the other Hazelcast data
structure proxies in two aspects. First, an internal commit is performed on
the METADATA CP group every time you fetch a proxy from this interface.
Hence, callers should cache returned proxy objects. Second, if you call
DistributedObject.destroy() on a CP data structure proxy, that data
structure is terminated on the underlying CP group and cannot be
reinitialized until the CP group is force-destroyed via the CP management
service. For this reason, please make sure that you are completely done with
a CP data structure before destroying its proxy.
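One way to follow the caching advice is a small name-to-proxy cache. The sketch below is an illustration only: ProxyCache is a hypothetical helper, and the fetch function is a stand-in for a call such as name -> hazelcastInstance.getCPSubsystem().getAtomicLong(name), which is not executed here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper that fetches each named proxy at most once. */
final class ProxyCache<T> {
    private final Map<String, T> proxies = new ConcurrentHashMap<>();
    private final Function<String, T> fetch;

    /** @param fetch performs the real, comparatively expensive proxy lookup */
    ProxyCache(Function<String, T> fetch) {
        this.fetch = fetch;
    }

    /** Returns the cached proxy, invoking the fetch function only on the first request. */
    T get(String name) {
        return proxies.computeIfAbsent(name, fetch);
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        // Stand-in fetch function: counts invocations instead of contacting a cluster.
        ProxyCache<String> cache = new ProxyCache<>(name -> {
            fetches.incrementAndGet();
            return "proxy:" + name;
        });
        cache.get("myAtomicLong@myGroup");
        cache.get("myAtomicLong@myGroup");
        System.out.println("fetch calls: " + fetches.get());
    }
}
```

A cached proxy should be dropped from such a cache once its data structure has been destroyed, since the destroyed structure cannot be reinitialized through it.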
By default, CP Subsystem works only in memory without persisting any state to disk. This means that a crashed CP member cannot rejoin the cluster by restoring its previous state. Therefore, crashed CP members create a risk of gradually losing the majority of CP groups and eventually causing a total loss of availability of CP Subsystem. To prevent such situations, crashed CP members can be removed from CP Subsystem and replaced in CP groups with other available CP members. This flexibility provides a good degree of fault tolerance at run-time.
Moreover, there is a subtle point about concurrent shutdown of CP
members when CP Subsystem Persistence is disabled. If there are N CP members
in CP Subsystem, HazelcastInstance.shutdown()
can be called on N-2
CP members concurrently. Once these N-2 CP members complete their shutdown,
the remaining 2 CP members must be shut down serially. Even though
the shutdown API can be called concurrently on multiple CP members, since
the METADATA CP group handles shutdown requests serially, it would be simpler
to shut down CP members one by one, by calling
HazelcastInstance.shutdown()
on the next CP member once the current
CP member completes its shutdown. This rule does not apply when CP Subsystem
Persistence is enabled so you can shut down your CP members concurrently
if you enabled CP Subsystem Persistence. Please see CPSubsystem
to learn more about the shut-down behaviour of CP members. It is
enough for users to recall this rule while shutting down CP members when
CP Subsystem Persistence is disabled. Interested users can read the rest of
this paragraph to learn the reasoning behind this rule. Each shutdown
request internally requires a Raft commit to the METADATA CP group when CP
Subsystem Persistence is disabled. A CP member proceeds to shut down after it
receives a response to this commit. To be able to perform a Raft commit,
the METADATA CP group must have its majority up and running. When only 2 CP
members are left after graceful shutdowns, the majority of the METADATA CP
group becomes 2. If the last 2 CP members shut down concurrently, one of
them is likely to perform its Raft commit faster than the other one and
leave the cluster before the other CP member completes its Raft commit.
In this case, the last CP member waits for a response to its commit attempt
on the METADATA CP group, and eventually times out. This situation causes
an unnecessary delay in the shutdown process of the last CP member. On the
other hand, when the last 2 CP members shut down serially, the N-1th member
receives the response to its commit after its shutdown request is also
committed on the last CP member. Then, the last CP member checks its local
data, notices that it is the last CP member alive, and proceeds with its
shutdown without attempting a Raft commit on the METADATA CP group.
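The shutdown rule for the persistence-disabled case can be written down as a simple plan. The sketch below uses a hypothetical ShutdownPlan class and plain member names; it only splits a member list into the N-2 members that may be shut down concurrently and the last 2 that should be shut down one after the other. Which members end up in the serial tail is arbitrary here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plan for shutting down CP members when persistence is disabled. */
final class ShutdownPlan {
    /** Members whose shutdown may be requested at the same time (N-2 of them). */
    final List<String> concurrent;
    /** Remaining members (at most 2) that are shut down serially, in this order. */
    final List<String> serial;

    private ShutdownPlan(List<String> concurrent, List<String> serial) {
        this.concurrent = concurrent;
        this.serial = serial;
    }

    static ShutdownPlan of(List<String> cpMembers) {
        // Everything except the last two members goes into the concurrent batch.
        int split = Math.max(0, cpMembers.size() - 2);
        return new ShutdownPlan(
                new ArrayList<>(cpMembers.subList(0, split)),
                new ArrayList<>(cpMembers.subList(split, cpMembers.size())));
    }

    public static void main(String[] args) {
        ShutdownPlan plan = ShutdownPlan.of(List.of("m1", "m2", "m3", "m4", "m5"));
        System.out.println("concurrently: " + plan.concurrent);
        System.out.println("then serially: " + plan.serial);
    }
}
```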
CP Subsystem's fault tolerance capabilities are summarized below. For the sake of simplicity, let's assume that the CP member count and the CP group size are configured to the same value and that we use only the DEFAULT CP group. In the list below, "a permanent crash" means that a CP member either crashes while CP Subsystem Persistence is disabled, hence it cannot be recovered with its CP identity and data, or it crashes while CP Subsystem Persistence is enabled but its CP data cannot be recovered, for instance, due to a total server crash or a disk failure.
- If a CP member leaves the Hazelcast cluster, it is not automatically removed from CP Subsystem because CP Subsystem cannot determine with certainty whether that member has actually crashed or just disconnected from the cluster. Therefore, absent CP members are still considered in majority calculations and endanger the availability of CP Subsystem. If the user knows for sure that an absent CP member has crashed, they can remove that CP member from CP Subsystem via the server-side management service. This API call removes the given CP member from all CP groups and recalculates their majority values. If there is another available CP member in CP Subsystem, the removed CP member is replaced with that one, or the user can promote an AP member of the Hazelcast cluster to the CP role via the server-side management service.
- There might be a small window of unavailability after a CP member crash even if the majority of CP members are still online. For instance, if a crashed CP member is the Raft leader for some CP groups, those CP groups run a new leader election round to elect a new leader among remaining CP group members. CP Subsystem API calls that internally hit those CP groups are retried until they have new Raft leaders. If a failed CP member has the Raft follower role, it causes a very minimal disruption since Raft leaders are still able to replicate and commit operations with the majority of their CP group members.
- If a crashed CP member is restarted after it is removed from CP Subsystem, its behaviour depends on whether CP Subsystem Persistence is enabled or disabled. If CP Subsystem Persistence is enabled, a restarted CP member is not able to restore its CP data from disk because, after it rejoins the cluster, it notices that it is no longer a CP member. Because of that, it fails its startup process and prints an error message. The only thing to do in this case is to manually delete its CP persistence directory since its data is no longer useful. On the other hand, if CP Subsystem Persistence is disabled, a failed CP member cannot remember anything related to its previous CP identity, hence it restarts as a new AP member.
- A CP member can encounter a network issue and disconnect from the cluster. If the user removes it from CP Subsystem even though it is actually alive and only disconnected, this CP member should be terminated to prevent any accidental communication with the other CP members in CP Subsystem.
- If a network partition occurs, behaviour of CP Subsystem depends on how CP members are divided in different sides of the network partition and to which sides Hazelcast clients are connected. Each CP group remains available on the side that contains the majority of its CP members. If a Raft leader falls into the minority side, its CP group elects a new Raft leader on the other side and callers that are talking to the majority side continue to make successful API calls on CP Subsystem. However, callers that are talking to the minority side fail with operation timeouts. When the network problem is resolved, CP members reconnect to each other and CP groups continue their operation normally.
- CP Subsystem can tolerate failure of the minority of CP members (less than N / 2 + 1) for availability. If N / 2 + 1 or more CP members crash, CP Subsystem loses its availability. If CP Subsystem Persistence is enabled and the majority of CP members become online by successfully restarting some of the failed CP members, CP Subsystem regains its availability. Otherwise, it means that CP Subsystem has lost its majority irrevocably. In this case, the only solution is to wipe out the whole CP Subsystem state by performing a server-side force-reset.
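The majority arithmetic used throughout the list above can be checked with a few lines of integer math. The sketch below uses a hypothetical CpQuorum class. The tolerable failure count is computed as N minus the majority, so that it is also right for an even N, where losing exactly half of the members already costs the majority.

```java
/** Hypothetical helper for the majority arithmetic of a CP group with N members. */
final class CpQuorum {

    /** Number of members that must be reachable to commit: N / 2 + 1 (integer division). */
    static int majority(int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("group size must be positive");
        }
        return groupSize / 2 + 1;
    }

    /** Largest number of members that can be lost while a majority still remains. */
    static int tolerableFailures(int groupSize) {
        return groupSize - majority(groupSize);
    }

    /** A side of a network partition stays available only if it holds a majority. */
    static boolean isAvailable(int groupSize, int reachableMembers) {
        return reachableMembers >= majority(groupSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println("N=" + n
                    + " majority=" + majority(n)
                    + " tolerable failures=" + tolerableFailures(n));
        }
        // A 5-member group split 3/2 by a partition: only the 3-member side makes progress.
        System.out.println("3 of 5 available: " + isAvailable(5, 3));
        System.out.println("2 of 5 available: " + isAvailable(5, 2));
    }
}
```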
When the CP member count is greater than the CP group size, CP groups are formed by selecting a subset of CP members. In this case, each CP group can have a different set of CP members, and therefore different fault-tolerance and availability conditions. In the following list, CP Subsystem's additional fault tolerance capabilities are discussed for this configuration case.
- When the majority of a non-METADATA CP group permanently crashes, that CP group cannot make progress anymore, even though other CP groups in CP Subsystem are running fine. Even a new CP member cannot join this CP group, because membership changes also go through the Raft consensus algorithm. For this reason, the only option is to force-destroy this CP group via the server-side management service. When this API is called, the CP group is terminated non-gracefully without the Raft mechanics. After this API call, all existing CP data structure proxies that talk to this CP group fail with CPGroupDestroyedException. However, if a new proxy is created afterwards, then this CP group is re-created from scratch with a new set of CP members. Losing the majority of a non-METADATA CP group can be likened to the partition-loss scenario of AP Hazelcast. Please note that non-METADATA CP groups that have lost their majority must be force-destroyed immediately, because they can block the METADATA CP group from performing membership changes on CP Subsystem.
- If the majority of the METADATA CP group permanently crashes, unfortunately it is equivalent to the permanent crash of the majority of the CP members of the whole CP Subsystem, even though other CP groups are running fine. In fact, existing CP groups continue serving incoming requests, but since the METADATA CP group is not available anymore, no management tasks can be performed on CP Subsystem. For instance, a new CP group cannot be created. The only solution is to perform a force-reset, which wipes out the whole CP Subsystem state, via the server-side management service.
-
Field Summary
- static final String ATOMIC_LONG: Constant identifying the service for IAtomicLong, to be used with getObjectInfos(CPGroupId, String)
- static final String ATOMIC_REFERENCE: Constant identifying the service for IAtomicReference, to be used with getObjectInfos(CPGroupId, String)
- static final String COUNT_DOWN_LATCH: Constant identifying the service for ICountDownLatch, to be used with getObjectInfos(CPGroupId, String)
- static final String CP_MAP: Constant identifying the service for CPMap, to be used with getObjectInfos(CPGroupId, String)
- static final String LOCK: Constant identifying the service for FencedLock, to be used with getObjectInfos(CPGroupId, String)
- static final String SEMAPHORE: Constant identifying the service for ISemaphore, to be used with getObjectInfos(CPGroupId, String)
-
Method Summary
- addGroupAvailabilityListener(CPGroupAvailabilityListener listener): Registers a new CPGroupAvailabilityListener to listen to CP group availability changes.
- addMembershipListener(CPMembershipListener listener): Registers a new CPMembershipListener to listen to CP membership changes.
- getAtomicLong(String name): Returns a proxy for an IAtomicLong instance created on CP Subsystem.
- <E> IAtomicReference<E> getAtomicReference(String name): Returns a proxy for an IAtomicReference instance created on CP Subsystem.
- getCountDownLatch(String name): Returns a proxy for an ICountDownLatch instance created on CP Subsystem.
- getCPGroupIds(): Returns all active CP group ids.
- getLock(String name): Returns a proxy for a FencedLock instance created on CP Subsystem.
- <K, V> CPMap<K, V> getMap(String name): Returns a proxy for a CPMap.
- getObjectInfos(CPGroupId groupId, String serviceName): Returns info about all objects of the given type within the given group.
- getSemaphore(String name): Returns a proxy for an ISemaphore instance created on CP Subsystem.
- getTombstoneInfos(CPGroupId groupId, String serviceName): Returns info about all tombstones of the given type within the given group.
- boolean removeGroupAvailabilityListener: Removes CPGroupAvailabilityListener registration.
- boolean removeMembershipListener: Removes membership listener registration.
-
Field Details
-
ATOMIC_LONG
Constant identifying the service for IAtomicLong, to be used with getObjectInfos(CPGroupId, String)
-
ATOMIC_REFERENCE
Constant identifying the service for IAtomicReference, to be used with getObjectInfos(CPGroupId, String)
-
COUNT_DOWN_LATCH
Constant identifying the service for ICountDownLatch, to be used with getObjectInfos(CPGroupId, String)
-
SEMAPHORE
Constant identifying the service for ISemaphore, to be used with getObjectInfos(CPGroupId, String)
-
LOCK
Constant identifying the service for FencedLock, to be used with getObjectInfos(CPGroupId, String)
-
CP_MAP
Constant identifying the service for CPMap, to be used with getObjectInfos(CPGroupId, String)
-
Method Details
-
getAtomicLong
Returns a proxy for an IAtomicLong instance created on CP Subsystem. Hazelcast's IAtomicLong is a distributed version of java.util.concurrent.atomic.AtomicLong. If no group name is given within the "name" parameter, then the IAtomicLong instance will be created on the DEFAULT CP group. If a group name is given, like .getAtomicLong("myLong@group1"), the given group will be initialized first, if not initialized already, and then the IAtomicLong instance will be created on this group. The returned IAtomicLong instance offers linearizability and behaves as a CP register. When a network partition occurs, proxies that exist on the minority side of its CP group lose availability.
Each call of this method performs a commit to the METADATA CP group. Hence, callers should cache the returned proxy.
- Parameters:
name - name of the IAtomicLong proxy
- Returns:
IAtomicLong proxy for the given name
- Throws:
HazelcastException - if CP Subsystem is not enabled
-
getAtomicReference
Returns a proxy for an IAtomicReference instance created on CP Subsystem. Hazelcast's IAtomicReference is a distributed version of java.util.concurrent.atomic.AtomicReference. If no group name is given within the "name" parameter, then the IAtomicReference instance will be created on the DEFAULT CP group. If a group name is given, like .getAtomicReference("myRef@group1"), the given group will be initialized first, if not initialized already, and then the IAtomicReference instance will be created on this group. The returned IAtomicReference instance offers linearizability and behaves as a CP register. When a network partition occurs, proxies that exist on the minority side of its CP group lose availability.
Each call of this method performs a commit to the METADATA CP group. Hence, callers should cache the returned proxy.
- Type Parameters:
E - the type of object referred to by the reference
- Parameters:
name - name of the IAtomicReference proxy
- Returns:
IAtomicReference proxy for the given name
- Throws:
HazelcastException - if CP Subsystem is not enabled
-
getCountDownLatch
Returns a proxy for an ICountDownLatch instance created on CP Subsystem. Hazelcast's ICountDownLatch is a distributed version of java.util.concurrent.CountDownLatch. If no group name is given within the "name" parameter, then the ICountDownLatch instance will be created on the DEFAULT CP group. If a group name is given, like .getCountDownLatch("myLatch@group1"), the given group will be initialized first, if not initialized already, and then the ICountDownLatch instance will be created on this group. The returned ICountDownLatch instance offers linearizability. When a network partition occurs, proxies that exist on the minority side of its CP group lose availability.
Each call of this method performs a commit to the METADATA CP group. Hence, callers should cache the returned proxy.
- Parameters:
name - name of the ICountDownLatch proxy
- Returns:
ICountDownLatch proxy for the given name
- Throws:
HazelcastException - if CP Subsystem is not enabled
-
getLock
Returns a proxy for a FencedLock instance created on CP Subsystem. Hazelcast's FencedLock is a distributed version of java.util.concurrent.locks.Lock. If no group name is given within the "name" parameter, then the FencedLock instance will be created on the DEFAULT CP group. If a group name is given, like .getLock("myLock@group1"), the given group will be initialized first, if not initialized already, and then the FencedLock instance will be created on this group. The returned FencedLock instance offers linearizability. When a network partition occurs, proxies that exist on the minority side of its CP group lose availability.
Each call of this method performs a commit to the METADATA CP group. Hence, callers should cache the returned proxy.
- Parameters:
name - name of the FencedLock proxy
- Returns:
FencedLock proxy for the given name
- Throws:
HazelcastException - if CP Subsystem is not enabled
-
getSemaphore
Returns a proxy for an ISemaphore instance created on CP Subsystem. Hazelcast's ISemaphore is a distributed version of java.util.concurrent.Semaphore. If no group name is given within the "name" parameter, then the ISemaphore instance will be created on the DEFAULT CP group. If a group name is given, like .getSemaphore("mySemaphore@group1"), the given group will be initialized first, if not initialized already, and then the ISemaphore instance will be created on this group. The returned ISemaphore instance offers linearizability. When a network partition occurs, proxies that exist on the minority side of its CP group lose availability.
Each call of this method performs a commit to the METADATA CP group. Hence, callers should cache the returned proxy.
- Parameters:
name - name of the ISemaphore proxy
- Returns:
ISemaphore proxy for the given name
- Throws:
HazelcastException - if CP Subsystem is not enabled
-
addMembershipListener
Registers a new CPMembershipListener to listen to CP membership changes.
- Parameters:
listener - membership listener
- Returns:
id of the listener registration
- Since:
4.1
-
removeMembershipListener
Removes membership listener registration. Previously registered listener will not receive further events.
- Parameters:
id - of the registration
- Returns:
true if listener registration is removed, false otherwise
- Since:
4.1
-
addGroupAvailabilityListener
Registers a new CPGroupAvailabilityListener to listen to CP group availability changes.
- Parameters:
listener - group availability listener
- Returns:
id of the listener registration
- Since:
4.1
-
removeGroupAvailabilityListener
Removes CPGroupAvailabilityListener registration.
- Parameters:
id - of the registration
- Returns:
true if listener registration is removed, false otherwise
- Since:
4.1
-
getMap
Returns a proxy for a CPMap. Enterprise Only.
If no group name is given within the name parameter, then the CPMap instance will be created on the DEFAULT CP group. If a group name is given, like .getMap("myMap@group1"), the given group will be initialized first, if not initialized already, and then the CPMap instance will be created on this group. The returned CPMap instance offers linearizability. When a network partition occurs, proxies that exist on the minority side of its CP group lose availability.
Each call of this method performs a commit to the METADATA CP group. Hence, callers should cache the returned proxy.
- Type Parameters:
K - Key type of the map
V - Value type of the map
- Parameters:
name - Name of the map
- Returns:
Proxy for CPMap
- Since:
5.4
-
getCPGroupIds
Returns all active CP group ids.
- Since:
5.4
-
getObjectInfos
@Beta @Nonnull Iterable<CPObjectInfo> getObjectInfos(@Nonnull CPGroupId groupId, @Nonnull String serviceName)
Returns info about all objects of the given type within the given group.
- Parameters:
groupId - groupId for which to return the object infos
serviceName - service name for which the objects are returned
- Since:
5.4
-
getTombstoneInfos
@Beta @Nonnull Iterable<CPObjectInfo> getTombstoneInfos(@Nonnull CPGroupId groupId, @Nonnull String serviceName)
Returns info about all tombstones of the given type within the given group.
- Parameters:
groupId - groupId for which to return the tombstone infos
serviceName - service name for which the tombstones are returned
- Since:
5.4
-