🗓️ 18022025 1441
📎
flink_savepoints
Consistent image of the execution state of a streaming job, created via Flink’s [checkpointing mechanism](https://nightlies
Usage
- Stop and resume
- Fork / update a Flink job
Components
- Directory with (large) binary files on stable storage
Net data of job's execution state image
- Meta data file (relatively small)
Primarily contains pointers to all files on stable storage that are part of the Savepoint, in form of relative paths
To allow upgrades between programs and Flink versions, assign stable IDs to your operators (see Operator UID).
Operator UID
Used to scope the state of each operator
It is highly recommended that you specify operator IDs via the uid(String) method
If you do not specify the IDs manually, they will be generated automatically
Generated IDs depend on the structure of your program and are sensitive to program changes
You can automatically restore from the savepoint as long as these IDs do not change
You can think of a savepoint as holding a map of Operator ID -> State for each stateful operator:
Operator ID | State
------------+------------------------
source-id   | State of StatefulSource
mapper-id   | State of StatefulMapper
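The Operator ID -> State mapping can be modeled with plain Java collections. This is a minimal sketch, not Flink's restore code: `SavepointMapSketch`, `matchState`, and the string-valued state are hypothetical names chosen for illustration. It only shows that state is matched by operator ID, so an operator whose ID changed no longer finds its state. What Flink does with unmatched state is outside this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a savepoint as "operator ID -> state"; not Flink's actual classes.
class SavepointMapSketch {

    // Returns the state entries whose operator ID also exists in the restarted job.
    static Map<String, String> matchState(Map<String, String> savepoint, List<String> jobOperatorIds) {
        Map<String, String> matched = new LinkedHashMap<>();
        for (String id : jobOperatorIds) {
            if (savepoint.containsKey(id)) {
                matched.put(id, savepoint.get(id));
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, String> savepoint = new LinkedHashMap<>();
        savepoint.put("source-id", "State of StatefulSource");
        savepoint.put("mapper-id", "State of StatefulMapper");

        // Same IDs as when the savepoint was taken: both states are matched.
        System.out.println(matchState(savepoint, List.of("source-id", "mapper-id")));

        // The mapper's ID changed (for example, an auto-generated ID after a
        // program change): only the source state is matched.
        System.out.println(matchState(savepoint, List.of("source-id", "mapper-id-changed")));
    }
}
```

Setting IDs explicitly keeps the keys of this map stable across program changes, which is why the second call is the case to avoid.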
Savepoint format
Canonical Format
- :D
- Format that is unified across all state backends
- Most stable format
- Targeted at maintaining most compatibility with previous versions / schemas / modifications etc.
- Can store using one state backend and restore it using another
- D:
- Slow
Native Format
Creates a snapshot in the format specific to the state backend in use (e.g. SST files for RocksDB)
Claim mode
The Claim Mode determines who takes ownership of the files that make up a Savepoint or [externalized checkpoints](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/state/checkpoints//#resuming-from-a-retained-checkpoint) after restoring it
- Snapshots can be owned either by a user or Flink itself
- If a snapshot is owned by a user, Flink will not delete its files
- Flink cannot depend on the existence of the files from such a snapshot, as they might be deleted outside of Flink’s control
Each claim mode serves a specific purpose
Still, the Flink docs authors believe the default NO_CLAIM mode is a good tradeoff in most situations, as it provides clear ownership at a small price for the first checkpoint after the restore
NO_CLAIM (default)
- In the NO_CLAIM mode Flink will not assume ownership of the snapshot
- It will leave the files in the user’s control and never delete any of them
- In this mode you can start multiple jobs from the same snapshot
In order to make sure Flink does not depend on any of the files from that snapshot, it will force the first (successful) checkpoint to be a full checkpoint as opposed to an incremental one
Once the first full checkpoint completes, all subsequent checkpoints will be taken as usual/configured. Consequently, once a checkpoint succeeds you can manually delete the original snapshot. You cannot do this earlier, because without any completed checkpoints Flink will, upon failure, try to recover from the initial snapshot.
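The NO_CLAIM rules described here can be written down as a small state machine. This is a sketch under assumptions, not Flink code: `NoClaimSketch`, `CheckpointKind`, and the method names are hypothetical, and "as usual/configured" is reduced to a single flag saying whether incremental checkpoints are enabled.

```java
// Hypothetical model of the NO_CLAIM rules; not Flink's actual classes.
class NoClaimSketch {

    enum CheckpointKind { FULL, INCREMENTAL }

    private final boolean incrementalConfigured;
    private int completedCheckpoints = 0;

    NoClaimSketch(boolean incrementalConfigured) {
        this.incrementalConfigured = incrementalConfigured;
    }

    // Until one checkpoint has succeeded after the restore, the next one is forced to be full.
    CheckpointKind nextCheckpointKind() {
        if (completedCheckpoints == 0) {
            return CheckpointKind.FULL;
        }
        return incrementalConfigured ? CheckpointKind.INCREMENTAL : CheckpointKind.FULL;
    }

    // Call only for successful checkpoints; a failed attempt changes nothing.
    void onCheckpointCompleted() {
        completedCheckpoints++;
    }

    // The original snapshot is still the recovery point until a checkpoint has succeeded.
    boolean originalSnapshotDeletable() {
        return completedCheckpoints > 0;
    }

    public static void main(String[] args) {
        NoClaimSketch job = new NoClaimSketch(true);
        System.out.println(job.nextCheckpointKind() + ", deletable=" + job.originalSnapshotDeletable());
        job.onCheckpointCompleted();
        System.out.println(job.nextCheckpointKind() + ", deletable=" + job.originalSnapshotDeletable());
    }
}
```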
CLAIM
- Flink claims ownership of the snapshot and essentially treats it like a checkpoint
- It controls the snapshot's lifecycle
- Might delete it if it is not needed for recovery anymore
- Hence, it is not safe to manually delete the snapshot or to start two jobs from the same snapshot
- Flink keeps around a configured number of checkpoints.
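Why a claimed snapshot can disappear is easier to see with a toy retention queue. The sketch below is a simplification under assumptions, not Flink's implementation: `ClaimRetentionSketch` and its methods are hypothetical, snapshots are plain strings, and "not needed for recovery anymore" is reduced to "fell out of the window of retained checkpoints".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of CLAIM mode retention; not Flink's actual classes.
class ClaimRetentionSketch {

    private final int maxRetained;
    private final Deque<String> retained = new ArrayDeque<>();
    private final List<String> deleted = new ArrayList<>();

    // In CLAIM mode the restored snapshot enters the queue like any other checkpoint.
    ClaimRetentionSketch(int maxRetained, String claimedSnapshot) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be at least 1");
        }
        this.maxRetained = maxRetained;
        retained.addLast(claimedSnapshot);
    }

    // Each completed checkpoint may push the oldest retained snapshot out, which deletes it.
    void onCheckpointCompleted(String checkpoint) {
        retained.addLast(checkpoint);
        while (retained.size() > maxRetained) {
            deleted.add(retained.removeFirst());
        }
    }

    List<String> retained() {
        return new ArrayList<>(retained);
    }

    List<String> deleted() {
        return new ArrayList<>(deleted);
    }

    public static void main(String[] args) {
        ClaimRetentionSketch job = new ClaimRetentionSketch(2, "savepoint-A");
        job.onCheckpointCompleted("chk-1");
        job.onCheckpointCompleted("chk-2");
        System.out.println("retained: " + job.retained());
        System.out.println("deleted:  " + job.deleted());
    }
}
```

In this model the claimed snapshot is deleted as soon as enough newer checkpoints exist, which is why a second job started from the same snapshot could find its files gone.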