Reference Manual

Author: Adam Leszczyński <aleszczynski@bersler.com>, version: 1.6.1, date: 2024-06-01

This document describes configuration parameters and usage of OpenLogReplicator.

Program parameters

The OpenLogReplicator program is non-interactive. The only accepted parameters are:

  • -f|--file <config file> — configuration file name (default: "OpenLogReplicator.json"),

  • -p <process name> — process name (default: "OpenLogReplicator") displayed in the process list; useful when multiple instances are running,

  • -v|--version — print version and exit.

All parameters are defined in the OpenLogReplicator.json config file, which should be placed in the same directory. The file must be in JSON format. To get started, check the example config files in the scripts folder. Refer to the full parameter list for more details.
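For example, assuming the compiled binary is named OpenLogReplicator and the config file lives in the scripts folder (both are assumptions of this sketch, not fixed names), an invocation could look like:

    ./OpenLogReplicator -f scripts/OpenLogReplicator.json -p OpenLogReplicator-DB1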

All output messages are sent to the stderr stream. Optionally, JSON output can be sent to the stdout stream when the test parameter is set to a non-zero value.

English is the only language used in the documentation, the error messages and the program output.

Folder permissions

At regular intervals, the program writes a checkpoint file which contains information about the last processed transaction (sent to the Kafka output).

OpenLogReplicator should have read, write and execute permissions for the checkpoint directory. It creates or deletes files such as <database>-chkpt.json and <database>-chkpt-<scn>.json, where <database> is the database name defined in the OpenLogReplicator.json file and <scn> is a database SCN number.
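For example, if the name parameter is set to DB1, the checkpoint directory will contain files such as DB1-chkpt.json and DB1-chkpt-12345678.json, where 12345678 is an illustrative SCN.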

OpenLogReplicator.json file format

JSON config file main elements

The file is in JSON format and should contain a single object with the following parameters:
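A minimal illustrative sketch of such an object is shown below. The values are placeholders and the exact nesting of optional elements may differ; the example files in the scripts folder are the authoritative starting point.

    {
      "version": "1.1.0",
      "source": [
        {
          "alias": "src1",
          "name": "DB1",
          "reader": {
            "type": "online",
            "user": "replicator",
            "password": "secret",
            "server": "//dbhost:1521/DB1"
          },
          "format": {"type": "json"},
          "filter": {"table": [{"owner": "USR1", "table": "TAB%"}]}
        }
      ],
      "target": [
        {
          "alias": "out1",
          "source": "src1",
          "writer": {"type": "file"}
        }
      ]
    }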

Table 1. Global parameters
Parameter Specification Notes

source

list of source elements, mandatory

The list should contain just one source element.

target

list of target elements, mandatory

The list should contain just one target element.

version

string, max length: 256, mandatory

The value must be equal to "1.1.0".

TIP: This is a safety check that makes sure the content of the JSON configuration file is verified during a program upgrade. During an upgrade, always check the documentation for parameter changes and verify that the JSON configuration file is correct.

dump-path

string, max length: 256, default: "."

The location where the logdump files are created. The path can be relative to the current directory.

NOTE: This parameter is only valid when dump-redo-log parameter is set to non-zero value.

dump-raw-data

number, min: 0, max: 1, default: 0

Print hex dump of vector data for all dumped OP codes.

Possible values are:

  • 0 — No hex dump is added to the DUMP-<nnn>.trace file.

  • 1 — Before the logdump information for every vector, the full vector is dumped in HEX format; useful for analysis of the content.

NOTE: This parameter is only valid when dump-redo-log parameter is set to non-zero value.

dump-redo-log

number, min: 0, max: 2, default: 0

Create output similar to logdump command which can be compared as string to verify if certain parameters have been correctly decoded.

Possible values are:

  • 0 — No logdump file is created.

  • 1 — For every processed redo log file, a file <database>-<nnn>.logdump is created (<database> — database name, <nnn> — redo log sequence).

  • 2 — like 1 but additional information is printed which is not originally printed in logdump output — for example details about supplemental log groups.

CAUTION: The result doesn’t have to fully match the output of logdump; there can be some inconsistencies. Not all redo OP codes are parsed and analyzed, and there is no guarantee that the results will be exactly the same.

log-level

number, min: 0, max: 4, default: 2

Message verbosity level.

All messages are sent to stderr output stream.

Possible values are:

  • 0 — Silent — don’t print anything.

  • 1 — Error — print only error messages.

  • 2 — Warning — print error and warning messages (default setting).

  • 3 — Info — print error, warning and info messages.

  • 4 — Debug — print all messages.

trace

number, min: 0, max: 524287, default: 0

Print debug information.

The value is a sum of various trace flags; please refer to the source code for details.

CAUTION: The codes can change without prior notice.

Table 2. Source element
Parameter Specification Notes

alias

string, max length: 256, mandatory

The name of the source, referenced later in a target element.

TIP: This is just a logical name used in the config file. It doesn’t have to match the actual database SID.

format

element of format, mandatory

Configuration of output data.

name

string, max length: 256, mandatory

This name is used to identify the database connection. It is mentioned in the output and in the checkpoint files.

WARNING: After starting replication, the value shouldn’t change, otherwise the checkpoint files would not be properly read.

TIP: This is just a logical name used in the config file. It doesn’t have to match the actual database SID.

reader

element of reader, mandatory

Configuration of redo log reader.

arch

string, max length: 256, default is online for an online type; path for offline type; list for batch type

Way of getting an archive redo log file list.

Possible values are:

  • online — The archived redo log list is read directly from the database using a database connection. The connection is kept closed while the program works and is opened occasionally to read the archived redo log list.

  • online-keep — Like online, but the database connection is kept open.

  • path — The archived redo log file list is read from disk.

  • list — Like path, but the list of files is provided by the user. This is the only mode used for the batch type.

TIP: This parameter is only valid for online reader type.

arch-read-sleep-us

number, default: 10000000

Time to sleep between two attempts to read an archived redo log list.

Number in microseconds.

arch-read-tries

number, max: 1000000000, default: 10

Number of retries to read an archived redo log list before failing.

debug

element of debug

Group of options used for debugging.

filter

element of filter

Group of options used to filter the contents of the database and define which tables are replicated.

CAUTION: The filter is applied only to the data, not to the DDL operations.

IMPORTANT: During the first run, the schema is read only for tables which are selected by the filter. If the filter is changed later, the schema does not update. Startup would fail because the set of users present in the checkpoint files would not match the set of users defined in the config file. The schema updates only when the program is reset (i.e., the checkpoint files are removed and recreation is forced).

metrics

element of metrics

Group of options used for collecting metrics of OpenLogReplicator.

flags

number, min: 0, max: 524287, default: 0

A sum of flags which enable various options of the program; a worked example follows the list below.

Possible values are:

  • 0x0001 — Read-only archived redo logs. Online redo log files aren’t read at all.

CAUTION: This option causes a delay in data replication. When the redo log files are big or redo log group switches happen infrequently, a noticeable delay can occur: transactions are not read until the redo log group is switched.

  • 0x0002 — Schemaless mode. The program can operate without a schema.

NOTE: Refer to the User Manual for details.

  • 0x0004 — Adaptive schema mode. This mode is only valid when schemaless mode has been chosen.

NOTE: Refer to the User Manual for details.

  • 0x0008 — Don’t use direct read (O_DIRECT) for reading redo log files.

TIP: Direct IO bypasses the disk caching mechanism. Using this flag (i.e., disabling Direct IO) is not recommended and should be done only in special cases.

  • 0x0010 — Ignore basic errors and continue redo log processing.

CAUTION: This option is not recommended; it is useful only for debugging. In most cases when the program fails, it is better to stop the program and fix the problem. The program is not designed to continue after an error, as this can lead to schema data inconsistency, and nondeterministic data can be sent to the output.

  • 0x0020 — Show text of DDL commands in output.

  • 0x0040 — Show invisible (hidden) columns in output.

  • 0x0080 — Show guard columns in output.

  • 0x0100 — Show nested columns in output.

  • 0x0200 — Show unused columns in output.

  • 0x0400 — Include incomplete transactions in output.

TIP: Incomplete transactions are transactions that have started before replication was set up. Some starting elements of such transactions may be missing in the output. By default, such transactions are ignored.

  • 0x0800 — Include system transactions in output.

  • 0x1000 — Show checkpoint information in output.

TIP: The checkpoint records are useful to monitor the progress of replication. They’re also used to detect the last processed transaction. If the checkpoint records are hidden and there is low activity of data changes, it may be challenging to detect OpenLogReplicator failure.

  • 0x2000 — Don’t delete old checkpoint files.

TIP: The number of checkpoint files kept is defined by the keep-checkpoints parameter. This flag overrides that number and leaves all checkpoint files in place.

  • 0x4000 — Reserved for future use.

  • 0x8000 — Send column data to output in raw (hex) format.

  • 0x10000 — Decode binary XMLType data (experimental). Refer to the binary XMLType chapter for details.

  • 0x20000 — Pass JSON data values to output in binary format (experimental).

  • 0x40000 — Support UPDATE operations for NOT NULL columns with occasional NULL values (experimental).
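As a worked example (JSON accepts only decimal numbers, so the hexadecimal flags must be converted), enabling DDL text in the output (0x0020 = 32) together with checkpoint information (0x1000 = 4096) gives:

    "flags": 4128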

memory

element of memory

Configuration of memory settings.

redo-read-sleep-us

number, min: 0, default: 50000

The amount of time the program sleeps when all data from the online redo log has been read and the program is waiting for more transactions.

Number in microseconds.

IMPORTANT: The default setting is 50,000 microseconds, which is equal to 1/20 s or 50 ms. This means that OpenLogReplicator polls the disk for new changes 20 times a second (only while there is no activity; after new data appears, it is read sequentially to the end). With the default setting, in the worst case the read process notices new data after 50 ms. This is rapid and a proper setting for most cases. If this delay is too big, the value can be decreased, but this increases CPU usage.
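For example, to lower the worst-case polling latency to about 10 ms at the cost of somewhat higher CPU usage, the value could be reduced (illustrative value):

    "redo-read-sleep-us": 10000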

redo-verify-delay-us

number, min: 0, default: 0

When this parameter is set to a non-zero value, the redo log file data is read a second time for verification after the defined delay. Double-read mode applies only to online redo log files.

Number in microseconds.

IMPORTANT: Some filesystems (like ext4 or btrfs) can share the disk read cache between multiple processes. This can cause read inconsistencies when the database process is writing to the same memory buffer that the OpenLogReplicator process is reading. The checksum for disk blocks is just two bytes, so it is impossible to reliably detect whether the data is corrupted. The only way to detect this is to read the data again and compare it. This parameter defines the delay after which the redo log file data is read a second time for verification.

CAUTION: Instead of double reads, it is recommended to use Direct IO disk operations. Direct IO disables the disk read cache and guarantees that the data is read from disk. Use this parameter only as a workaround when Direct IO is not possible.

refresh-interval-us

number, min: 0, default: 10000000

During online redo log reading, a new redo log group could be created and the program would need to refresh the list of redo log groups. This parameter defines how often the list is refreshed when the old redo log file has been completely processed but no new group has appeared yet.

Number in microseconds.

Table 3. Memory element
Parameter Specification Notes

max-mb

number, min: 16, default: 1024

The maximum amount of memory the program can allocate.

Number in megabytes.

IMPORTANT: This number doesn’t include memory allocated for sending big JSON messages to Kafka; that memory is allocated on demand separately. It also doesn’t include memory used for LOB processing.

min-mb

number, min: 16, max: max-mb, default: 32

The amount of memory allocated at startup and the desired amount of allocated memory during operation. If more memory is allocated dynamically, it is released as soon as it is no longer required. See the notes for max-mb about memory for the Kafka buffer.

Number in megabytes.

read-buffer-max-mb

number, min: 1, max: max-mb, default: min(max-mb / 4, 32)

Size of memory buffer used for disk read.

Number in megabytes.

IMPORTANT: A greater buffer size increases performance but also increases memory usage. The disk buffer memory is part of the main memory (controlled by max-mb and min-mb). It is important not to allocate too much memory for the disk buffer, otherwise the program would not be able to allocate memory for other purposes. This memory is never swapped to disk, and OpenLogReplicator may suffer when not enough memory is left for other purposes.
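An illustrative memory element combining these parameters (placeholder values, not a sizing recommendation):

    "memory": {
      "min-mb": 64,
      "max-mb": 2048,
      "read-buffer-max-mb": 64
    }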

Table 4. Reader element
Parameter Specification Notes

type

string, max length: 256, default

Possible values are:

  • online — Primary mode: read online and archived redo logs and connect to a database for reading metadata. When the connection to the database is lost, the program will try to reconnect.

Example config file: OpenLogReplicator.json.example.

  • offline — Like online, but metadata is only read from a previously created checkpoint file; no connection to the database is required.

Example config file: OpenLogReplicator.json.example-offline.

  • batch — Process only redo log files provided as a list and then stop.

Example config file: OpenLogReplicator.json.example-batch.

con-id

signed number, min: -32768, max: 32767, default: -1

Defines the container ID for the database. This is used for multitenant databases.

TIP: -1 is the default value and means that the database is single-tenant.

db-timezone

string, default: database DBTIMEZONE value

Overrides the database DBTIMEZONE value.

Timezone should be in format +xx:yy or -xx:yy.

The time zone is used only as the base time zone for values of the TIMESTAMP WITH LOCAL TIME ZONE type.

disable-checks

number, min: 0, max: 15, default: 0

A sum of numbers:

  • 0x0001 — During startup, don’t check if the database user has appropriate grants to system tables.

  • 0x0002 — During startup, don’t check if listed tables contain supplemental logging for primary keys.

  • 0x0004 — Disable CRC check for read blocks.

NOTE: This field is valid only for online type.

IMPORTANT: This might increase performance a bit, but it is not recommended to use this option.

  • 0x0008 — Don’t check if JSON checkpoint and schema files and OpenLogReplicator.json configuration file contain invalid JSON tags.

NOTE: For performance reasons, the user might disable those checks. It is recommended to keep them enabled in production environments, especially because field names can change during program upgrades; referring to old, invalid field names might cause the program to fail.

host-timezone

string, default: time zone of OpenLogReplicator host

Time zone used by the host where the database is running.

Timezone should be in format +xx:yy or -xx:yy.

If OpenLogReplicator is running on a host with a different time zone, adjust this parameter to the proper value.

log-archive-format

string, max length: 4000

Format of expected archived redo log files. This parameter defines how to parse the redo log file name to read the sequence number.

When FRA is configured, the file name format is expected to be o1_mf_%t_%s_%h_.arc. When FRA is not used, the value for this parameter is read from the database configuration parameter log_archive_format.

log-timezone

string, default: time zone of OpenLogReplicator host

Time zone used for logging messages.

Timezone should be in format +xx:yy or -xx:yy.

By default, log messages are printed in the local time zone of the host where OpenLogReplicator is running. To print log messages in the UTC time zone, set the value to '+00:00'. The log time zone in use is printed on startup.

IMPORTANT: The value of this parameter can be configured by setting the environment variable OLR_LOG_TIMEZONE.

password

string, max length: 128

Password for connecting to database instance.

NOTE: This field is valid only for online type.

CAUTION: The password is stored as an unencrypted string in the configuration file.

path-mapping

list of string pairs, max length: 2048

List of path pairs [before1, after1, before2, after2, …]. Every path (of online and archived redo logs) is compared with the list. If a prefix of the path matches beforeX, it is replaced with afterX.

NOTE: This field is valid only for online and offline types.

TIP: The parameter is useful when OpenLogReplicator operates on a different host than the database server and the paths differ. For example, the database path may be /db/fra/o1_mf_1_1991_hkb9y64l_.arc, but the file is mounted using sshfs under a different path; with "path-mapping": ["/db/fra", "/opt/fast-recovery-area"], the program would look for /opt/fast-recovery-area/o1_mf_1_1991_hkb9y64l_.arc instead.

redo-copy-path

string, max length: 2048

Debugging parameter which copies the full contents of processed redo log files to the defined folder.

TIP: This parameter is useful for diagnosing disk-read related problems. When consistency errors are detected, the redo log file is copied to the defined folder. The file name has the format: path/<database>_<seq>.arc. Having a copy of the read redo log file allows easier post-mortem analysis, since the file contains exactly the same data as was processed.

redo-log

list of string, max length: 2048

List of redo log files which should be processed in batch mode. Elements can be files or folders. In the latter case, all files in the folder are processed.

NOTE: This field is valid only for batch type.

Example config file: OpenLogReplicator.json.example-batch.

server

string, max length: 4096

Connection string for connecting to the database instance. The format is //<host>:<port>/<service>.

NOTE: This field is valid only for online type.

start-scn

number, min: 0

The first SCN number to be processed. If not specified, the program will start from the current SCN.

CAUTION: Setting a very low value of starting SCN might cause problems during program startup if the schema has changed since this SCN and the schema is not available to read using database flashback. In such a case, the program will not be able to read the metadata and will stop.

IMPORTANT: Setting this parameter to some value would mean that transactions started before this SCN would not be processed.

start-seq

number, min: 0

First sequence number to be processed.

IMPORTANT: If not specified, the first sequence is determined by reading the SCN boundaries assigned to particular redo log files and matching them against the starting SCN.

start-time-rel

number, min: 0

Determines the starting SCN by relative time. The value is relative to the current time and is converted using the TIMESTAMP_TO_SCN SQL function. For example, if the value is set to 3600, the program will start from the SCN which was active 1 hour ago.

Number in seconds.

NOTE: This field is valid only for online type.

CAUTION: It is invalid to use this parameter when start-scn is specified.

start-time

string, max length: 256

Determines a starting SCN value by absolute time. The value is in the format YYYY-MM-DD HH24:MI:SS and is converted to an SCN using the TIMESTAMP_TO_SCN SQL function. For example, if the value is set to 2018-01-01 00:00:00, the program will start from the SCN which was active at the beginning of 2018.

NOTE: This field is valid only for online type.

CAUTION: It is invalid to use this parameter when start-scn or start-time-rel is specified.

state

element of state

Configuration of state settings to store checkpoint information.

user

string, max length: 128

Database user for connecting to database instance.

NOTE: This field is valid only for online type.

transaction-max-mb

number, min: 0, default: 0

An upper limit for transaction size. If the transaction size is greater than this value, the transaction is split into multiple transactions.

Number in megabytes.

CAUTION: This parameter is intended for debugging purposes only; it is not recommended in a production environment. Transaction splitting is intended to limit memory usage and assumes that the transaction is committed while the splitting is performed. If the transaction is not committed, the first part of the transaction is sent to the output anyway. If the transaction contains a large number of partially rolled back DML operations, they might appear in the output in spite of the rollback.
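An illustrative reader element for the online type, combining several of the parameters above (placeholder credentials and paths):

    "reader": {
      "type": "online",
      "user": "replicator",
      "password": "secret",
      "server": "//dbhost:1521/DB1",
      "path-mapping": ["/db/fra", "/opt/fast-recovery-area"]
    }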

Table 5. State element
Parameter Specification Notes

interval-mb

number, min: 0, default: 500

Threshold of processed redo log data after which a checkpoint file is created.

Number in megabytes.

interval-s

number, min: 0, default: 600

Time threshold of processed redo log data after which a checkpoint file is created.

Number in seconds.

IMPORTANT: The time refers not to processing time in OpenLogReplicator but to the time of the redo log data. For example, with the default setting of 600 seconds, if the last checkpoint was created after processing redo log data written at 10:40, a new checkpoint file is created when the processing reaches data written at 10:50.

keep-checkpoints

number, min: 0, default: 100

Number of checkpoint files which should be kept. The oldest checkpoint files are deleted.

TIP: Value 0 disables checkpoint files deletion.

TIP: Keeping a larger number of checkpoint files allows adjusting the starting SCN more precisely. It also provides more safety in case of filesystem corruption, when the last checkpoint file might not be available.

CAUTION: The actual number of checkpoint files may be larger than this parameter (up to keep-checkpoints + schema-force-interval). A checkpoint file can be deleted only if it is not referenced by consecutive checkpoint files (which don’t contain schema data).

path

string, max length: 2048, default: "checkpoint"

The path to store checkpoint files.

NOTE: This field is valid only for disk type.

IMPORTANT: The path should be writable by the user that runs the program.

schema-force-interval

number, min: 0, default: 20

To increase operating speed, not all checkpoint files contain the full schema of the database. If the schema didn’t change, it is not necessary to repeat it in every checkpoint file. The value determines the number of consecutive checkpoint files which may not contain the full schema.

TIP: The value of 0 means that the schema is always included in the checkpoint file.

type

string, max length: 256, default: "disk"

Only disk is supported.
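Putting the state parameters together, an illustrative state element (values close to the documented defaults):

    "state": {
      "type": "disk",
      "path": "checkpoint",
      "interval-mb": 500,
      "interval-s": 600,
      "keep-checkpoints": 100,
      "schema-force-interval": 20
    }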

Table 6. Debug element
Parameter Specification Notes

stop-log-switches

number, min: 0, default: 0

For debug purposes only. Stop program after specified number of log switches.

stop-checkpoints

number, min: 0, default: 0

For debug purposes only. Stop program after specified number of LWN checkpoints.

stop-transactions

number, min: 0, default: 0

For debug purposes only. Stop program after specified number of transactions.

owner

string, max length: 128

Owner of the debug table.

table

string, max length: 128

This is a technical parameter used primarily for running test cases; it defines the debug table name. If any DML transaction occurs for this table (insert, update or delete), the program stops. The transaction doesn’t necessarily need to be committed.
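An illustrative debug element which stops the program after 100 transactions or after a DML operation on a dedicated stop table (placeholder owner and table names):

    "debug": {
      "stop-transactions": 100,
      "owner": "USR1",
      "table": "OLR_STOP"
    }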

Table 7. Format element
Parameter Specification Notes

type

string, max length: 256, required

Possible values are:

  • json — Transactions in JSON OpenLogReplicator format.

  • protobuf — Transactions in Protocol Buffer format.

Refer to the output format chapter for details.

CAUTION: Protocol Buffer support is experimental. It is not fully tested and might not work properly. Don’t use it in production without testing.

attributes

number, min: 0, max: 7, default: 0

Transaction attributes location.

Field value is a sum of:

  • 1 — add attributes to the begin message of the transaction.

  • 2 — add attributes to every DML message of the transaction.

  • 4 — add attributes to the commit message of the transaction.

char

number, min: 0, max: 3, default: 0

Format for (n)char, (n)varchar(2) and clob column types.

By default, the value is written in Unicode format, using UTF-8 to code characters.

Field value is a sum of:

  • 0x0001 — No character set transformation is applied, the characters are copied from source "as is".

  • 0x0002 — Instead of characters, the output is in HEX format (using hex format — for example, "column":"4b4c204d").

column

numeric, min: 0, max: 2, default: 0

Column duplicate specification.

  • 0 — Default behavior: INSERT and DELETE contain only non-null values. UPDATE contains only changed columns or those which are members of the primary key.

TIP: This is the format that takes the least space. The assumption is that if a column doesn’t appear in the INSERT or DELETE statement, its value is NULL.

CAUTION: For LOB columns the before value is not available in the REDO stream. Therefore, the column is not included in the output; only the after value is included.

  • 1 — INSERT and DELETE contain all values. UPDATE contains only changed columns or those which are members of the primary key.

  • 2 — JSON output would contain all columns that appear in REDO stream, including those which didn’t change.

CAUTION: It is technically not possible to determine whether a column was actually mentioned in the UPDATE DML command. In some cases (especially for tables with many columns), UPDATE X SET A = A might produce the same redo log vector as UPDATE X SET A = A, B = B. The receiver of the output stream shouldn’t assume that the user included a column in the UPDATE operation just because it appeared in the output stream with the same before and after image.

db

number, min: 0, max: 3, default: 0

Present database name in payload.

Value is a sum of:

  • 0x0000 — Database name is not present.

  • 0x0001 — Database name is present in the db field in every DML message.

  • 0x0002 — Database name is present in the db field in every DDL message.

flush-buffer

numeric, min: 0, default: 1048576

Number of bytes after which the output buffer is flushed.

When set to 0, the buffer is flushed immediately when a new message arrives.

interval-dts

number, min: 0, max: 10, default: 0

INTERVAL DAY TO SECONDS field format.

Possible values are:

  • 0 — Value in nanoseconds — "val": 123456000000000.

  • 1 — Value in microseconds (possible data precision loss) — "val": 123456000000.

  • 2 — Value in milliseconds (possible data precision loss) — "val": 123456000.

  • 3 — Value in seconds (possible data precision loss) — "val": 123456.

  • 4 — Value in nanoseconds stored as a string — "val": "123456000000000".

  • 5 — Value in microseconds stored as a string (possible data precision loss) — "val": "123456000000".

  • 6 — Value in milliseconds stored as a string (possible data precision loss) — "val": "123456000".

  • 7 — Value in seconds stored as a string (possible data precision loss) — "val": "123456".

  • 8 — Value stored in part of ISO-8601 format stored as a string — "val": "01 06:00:00.123456789".

  • 9 — Value stored in part of ISO-8601 format stored as a string using "," as a separator between the number of days and time — "val": "01,06:00:00.123456789".

  • 10 — Value stored in part of ISO-8601 format stored as a string using "-" as a separator between the number of days and time — "val": "01-06:00:00.123456789".

interval-ytm

number, min: 0, max: 4, default: 0

INTERVAL YEAR TO MONTH field format.

Possible values are:

  • 0 — Value in months — "val": 20 (1 year, 8 months).

  • 1 — Value in months as a string — "val": "20".

  • 2 — Value in string format, number of years and months separated by " " — "val": "1 8".

  • 3 — Value in string format, number of years and months separated by "," — "val": "1,8".

  • 4 — Value in string format, number of years and months separated by "-" — "val": "1-8".

message

number, min: 0, max: 31, default: 0

Message format specification.

Value is a sum of:

  • 0x0001 — One message for the whole transaction.

TIP: By default, the transaction is split into many messages: begin, DML, DML, …, commit. Using this flag combines all messages into one. For performance reasons, this is not recommended when using Kafka with transactions that can be hundreds of megabytes in size.

  • 0x0002 — Add a num field to every message. The field contains a sequence number of the message within the transaction.

For the JSON output format only, the following additional flags are available:

  • 0x0004 — Skip begin message (when using flag 0x0001).

  • 0x0008 — Skip commit message (when using flag 0x0001).

  • 0x0010 — Add information about data offset (for debugging purposes).

rid

number, min: 0, max: 1, default: 0

Add rid field for every row in output with the Row ID.

Possible values are:

  • 0 — Don’t add rid field (default).

  • 1 — Add rid field for every row in output with the Row ID.

schema

number, min: 0, max: 7, default: 0

Schema format sent to output.

By default, the schema is not sent to output.

Example output: {"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2","obj":0},"after":{"A":100,"B":999,"C":10.22,"D":"xx2","E":"yyy","F":1564662896000}}]}

The field is a sum of values:

  • 0x0001 — Print full schema (including column descriptions), but just with the first message for every table.

TIP: This optimization is based on the fact that it is meaningless to attach the same schema definition every time if it didn’t change. It is assumed that the client would cache the schema and would not request it again. If the schema changes, the first message where new schema is used would contain the full schema.

Example output: {"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2","columns":[{"name":"A","type":"number","precision":-1,"scale":0,"nullable":1},{"name":"B","type":"number","precision":10,"scale":0,"nullable":1},{"name":"C","type":"number","precision":10,"scale":2,"nullable":1},{"name":"D","type":"char","length":10,"nullable":1},{"name":"E","type":"varchar2","length":10,"nullable":1},{"name":"F","type":"timestamp","length":11,"nullable":1},{"name":"G","type":"date","nullable":1}]},"after":{"A":100,"B":999,"C":10.22,"D":"xx2 ","E":"yyy","F":1564662896000}}]} {"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2","after":{"A":100,"B":999,"C":10.22,"D":"xx3 ","E":"yyy","F":1564662896000}}]}

  • 0x0002 — Add full schema definition (including column descriptions) to every message.

TIP: Remember to use flag 0x0001 together with flag 0x0002. The flag 0x0002 alone has no effect.

  • 0x0004 — Add objn field to schema description which contains database object ID.

Example output: {"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2"},"after":{"A":100,"B":999,"C":10.22,"D":"xx2 ","E":"yyy","F":1564662896000}}]}

scn

number, min: 0, max: 3, default: 0

SCN field format.

By default, every DML operation contains an scn field with the SCN value derived from the redo vector that contains the DML data.

Possible values are:

  • 0 — SCN is stored as a decimal number in scn field.

  • 1 — SCN values are stored as text in hexadecimal format (in "C" style, like 0xFF) in the scns field.

  • 2 — SCN values for all DML operations are copied from commit SCN record.

scn-all

number, min: 0, max: 1, default: 0

Include scn field in every payload.

Possible values are:

  • 0 — Put scn field only in the first message.

  • 1 — Put scn field in every message.

timestamp

number, min: 0, max: 15, default: 0

Format of timestamp values.

In the following description, the following timestamp is used as an example: "2022-05-01 06:00:00.123456789". Possible values are:

  • 0 — Unix with nanoseconds — "tm": 1651384800123456789.

  • 1 — Unix with a precision to the microsecond (possible data precision loss) — "tm": 1651384800123457.

  • 2 — Unix with precision to the millisecond (possible data precision loss) — "tm": 1651384800123.

  • 3 — Unix with precision to the second (possible data precision loss) — "tm": 1651384800.

  • 4 — Unix with nanoseconds precision stored as a string — "tms": "1651384800123456789".

  • 5 — Unix with microsecond precision stored as a string (possible data precision loss) — "tms": "1651384800123457".

  • 6 — Unix with millisecond precision stored as a string (possible data precision loss) — "tms": "1651384800123".

  • 7 — Unix with second precision stored as a string (possible data precision loss) — "tms": "1651384800".

  • 8 — ISO-8601 format stored with nanosecond precision — "tms": "2022-05-01T06:00:00.123456789Z".

  • 9 — ISO-8601 format stored with microsecond precision as a string — "tms": "2022-05-01T06:00:00.123456Z".

  • 10 — ISO-8601 format stored with millisecond precision as a string — "tms": "2022-05-01T06:00:00.123Z".

  • 11 — ISO-8601 format stored with second precision as a string — "tms": "2022-05-01T06:00:00Z".

  • 12 — ISO-8601 format stored with nanosecond precision as a string without "TZ" — "tms": "2022-05-01 06:00:00.123456789".

  • 13 — ISO-8601 format stored with microsecond precision as a string without "TZ" — "tms": "2022-05-01 06:00:00.123456".

  • 14 — ISO-8601 format stored with millisecond precision as a string without "TZ" — "tms": "2022-05-01 06:00:00.123".

  • 15 — ISO-8601 format stored with second precision as a string without "TZ" — "tms": "2022-05-01 06:00:00".

NOTE: This format is also used for type timestamp with local time zone since this type internally does not contain time zone data.

timestamp-tz

number, min: 0, max: 11, default: 0

Format of timestamp with time zone values.

In the following description, the following timestamp with time zone is used as an example: "2022-05-01 06:00:00.123456789 Europe/Warsaw".

Possible values are:

  • 0 — Unix with nanoseconds stored as a string with time zone after comma sign — "tms": "1651384800123456789,Europe/Warsaw".

  • 1 — Unix with microsecond precision stored as a string with time zone after comma sign (possible data precision loss) — "tms": "1651384800123457,Europe/Warsaw".

  • 2 — Unix with millisecond precision stored as a string with time zone after comma sign (possible data precision loss) — "tms": "1651384800123,Europe/Warsaw".

  • 3 — Unix with second precision stored as a string with time zone after comma sign (possible data precision loss) — "tms": "1651384800,Europe/Warsaw".

  • 4 — ISO-8601 format stored with nanosecond precision with time zone after space sign — "tms": "2022-05-01T06:00:00.123456789Z Europe/Warsaw".

  • 5 — ISO-8601 format stored with microsecond precision as a string with time zone after space sign — "tms": "2022-05-01T06:00:00.123456Z Europe/Warsaw".

  • 6 — ISO-8601 format stored with millisecond precision as a string with time zone after space sign — "tms": "2022-05-01T06:00:00.123Z Europe/Warsaw".

  • 7 — ISO-8601 format stored with second precision as a string with time zone after space sign — "tms": "2022-05-01T06:00:00Z Europe/Warsaw".

  • 8 — ISO-8601 format stored with nanosecond precision as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00.123456789 Europe/Warsaw".

  • 9 — ISO-8601 format stored with microsecond precision as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00.123456 Europe/Warsaw".

  • 10 — ISO-8601 format stored with millisecond precision as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00.123 Europe/Warsaw".

  • 11 — ISO-8601 format stored with second precision as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00 Europe/Warsaw".

timestamp-all

number, min: 0, max: 1, default: 0

Include timestamp field in every payload.

Possible values are:

  • 0 — Put timestamp field only in the first message.

  • 1 — Put timestamp field in every message.

unknown

number, min: 0, max: 1, default: 0

Unknown value reporting. For unknown values, '?' is sent to the output.

Possible values are:

  • 0 — Silently ignore unknown values.

  • 1 — Output to stderr information about decoding mismatch.

xid

number, min: 0, max: 2, default: 0

Format of the Transaction ID (XID).

Possible values are:

  • 0 — classic hex format (like: "xid":"0x0002.012.00004162").

  • 1 — decimal format (like: "xid":"2.18.16738").

  • 2 — a single 64-bit number format (like: "xidn":563027262849378).
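An illustrative format element combining several of the options above (placeholder values; most fields default to 0):

    "format": {
      "type": "json",
      "column": 0,
      "scn": 1,
      "timestamp": 8,
      "xid": 1,
      "rid": 1
    }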

Table 8. Filter element
Parameter Specification Notes

table

list of a table element

List of table regex rules which should be tracked in the redo log stream and sent to output.

A table that matches at least one of the rules is tracked, thus the rules can overlap.

Example: "table": {{"table": "owner1.table1"}, {"table": "owner2.table2", "key": "col1, col2, col3"}, {"table":"sys.%"}}.

skip-xid

list of string elements, max length: 32

List of transaction IDs which should be skipped. The format of the XID should be one of: UUUUSSSSQQQQQQQQ, UUUU.SSS.QQQQQQQQ, UUUU.SSSS.QQQQQQQQ, 0xUUUU.SSS.QQQQQQQQ, 0xUUUU.SSSS.QQQQQQQQ.

Example: "skip-xid": ["0x0002.012.00004162"]

dump-xid

list of string elements, max length: 32

Debug option to dump to stderr internals about certain XID. The format is the same as for skip-xid.
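An illustrative filter element combining the fields above, assuming the per-table owner, table and key fields described in Table 10 (placeholder names):

    "filter": {
      "table": [
        {"owner": "USR1", "table": "TAB%"},
        {"owner": "USR2", "table": "ORDERS", "key": "ORDER_ID"}
      ],
      "skip-xid": ["0x0002.012.00004162"]
    }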

Table 9. Metrics element
Parameter Specification Notes

type

string, max length: 128, mandatory

Name of the metrics module. Currently only prometheus is supported.

bind

string, max length: 128, mandatory for prometheus

Network address used to bind the metrics module for Prometheus. The format is <host>:<port>. Prometheus uses this address to connect to OpenLogReplicator.

Example: "bind": "127.0.0.1:8080"

tag-names

string, max length: 128

Define tags for dml_op metrics.

Possible values are:

  • all — Provide schema and table tags for every metric. This is equivalent to the filter and sys options combined.

  • filter — Provide schema and table tags only for tables which are defined in the filter section and thus are replicated.

  • none — Default, don’t provide schema or table tags.

  • sys — Provide schema and table tags just for system tables which are tracked for OpenLogReplicator to work properly.
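An illustrative metrics element for Prometheus (placeholder bind address):

    "metrics": {
      "type": "prometheus",
      "bind": "127.0.0.1:8080",
      "tag-names": "filter"
    }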

Table 10. Table element
Parameter Specification Notes

owner

string, max length: 128, mandatory

Regex pattern for matching owner name. The pattern is case-sensitive.

table

string, max length: 128, mandatory

Regex pattern for matching table name. The pattern is case-sensitive.

key

string, max length: 4096

A string field with a list of columns which should be used as a primary key. The columns are separated by commas. The column names are case-sensitive.

TIP: If a table doesn’t contain a primary key, a custom set of columns can be treated as a primary key.

condition

string, max length: 16384

An expression which should be evaluated for every row. The format of the field is C-like.

Example: "condition": "([op] != 'd') || ([login username] != 'USER1')"

The expression is evaluated from left to right. The following tokens can be used:

  • || — logical OR,

  • ! — logical NOT,

  • && — logical AND,

  • () — parentheses to define the order of evaluation,

  • == — equal,

  • != — not equal.

The expression can also contain the following tokens, whose names are derived from the attribute list of the transaction:

  • [audit sessionid]

  • [client id]

  • [client info]

  • [current username]

  • [login username] — the username which performed the operation;

  • [machine name]

  • [op] — type of operation: c - create (insert), u - update, d - delete, ddl - DDL operation;

  • [OS process name]

  • [OS process id]

  • [OS terminal]

  • [serial number]

  • [session number]

  • [transaction name] — the name of the transaction;

  • [version]

Table 11. Target element
Parameter Specification Notes

alias

string, max length: 256, mandatory

A logical name of the target used in JSON file for referencing.

source

string, max length: 256, mandatory

A logical name of the source which this target should be connected with.

writer

element of a writer, mandatory

Configuration of output processor.

Table 12. Writer element
Parameter Specification Notes

topic

string, max length: 256, mandatory

Name of a Kafka topic used to send transactions as JSON messages.

NOTE: This field is valid only for kafka type.

type

string, max length: 256, mandatory

Possible values are:

  • discard — No-op writer.

Performs all actions (parsing redo logs, producing messages), but the messages are discarded and not sent to any target.

TIP: This target is useful for testing purposes, to verify if redo log file parsing works correctly. This writer does not accept any parameters.

  • file — Write output messages directly to a file.

  • kafka — Connect directly to a Kafka message system and send transactions.

  • network — Stream using plain TCP/IP transmission.

This mode assumes that OpenLogReplicator acts as a server. A client connects to the server and receives the messages. If the client disconnects, the server will wait for a new client to connect and buffer transactions while no client connection is present.

  • zeromq — Stream using ZeroMQ messaging.

TIP: Technically this is the same as network but instead of using plain TCP/IP connection it uses ZeroMQ messaging.

uri

string, max length: 256, mandatory

For the network writer type: <host>:<port> — address for the network listener.

For the zeromq writer type: <protocol>://<host>:<port> — URI for the ZeroMQ connection.

NOTE: This field is valid only for network and zeromq types.

append

number, min: 0, max: 1, default: 1

If the defined output file for transactions already exists, append to it. If not, create a new file.

NOTE: This field is valid only for file type.

CAUTION: Parameter output can’t be used together with append.

max-message-mb

number, min: 1, max: 953, default: 100

Maximum size of a message sent to Kafka.

Number in megabytes.

CAUTION: Memory for this buffer is allocated independently of the memory defined by min-mb/max-mb when a big message to Kafka is being constructed. If the transaction size is close to this value, it is divided into multiple parts. Every time such a situation occurs, a warning is printed to the log.

NOTE: This field is valid only for kafka type.

max-file-size

number, min: 0, default: 0

Maximum size of the output file. The size can be defined only when the output parameter is set and uses the %i or %t placeholder.

NOTE: This field is valid only for file type.

new-line

number, min: 0, max: 2, default: 0

Put a new line after each transaction.

Possible values are:

  • 0 — no new line.

  • 1 — new line after each transaction in Unix format (\n).

  • 2 — new line after each transaction in Windows format (\r\n).

NOTE: This field is valid only for file type.

output

string, max length: 256

Format of the output file name. The format is the same as for the strftime function.

The following placeholders are supported:

  • %i — autogenerated sequence id, starting from 0.

  • %t — date and time in format defined by timestamp-format parameter.

  • %s — database sequence number.

NOTE: There should be only one placeholder in the format. When using the %i or %t placeholder, the max-file-size parameter must be set to a value greater than 0.

NOTE: This field is valid only for file type.

poll-interval-us

number, min: 100, max: 3600000000, default: 100000

Interval for polling for new messages.

Number in microseconds.

TIP: This parameter defines how often the client library checks for new messages. The smaller the value, the more often the client library checks for new messages. The larger the value, the more messages are buffered in the client library.

NOTE: This field is valid only for kafka, network and zeromq types.

properties

map of string to string

Additional properties for the Kafka producer. Refer to the librdkafka documentation for the full list of parameters. Typically used parameters are:

  • "brokers": "host1:9092, host2:9092" — list of Kafka brokers;

  • "compression.codec": "snappy" — compression codec;

  • "message.send.max.retries": "3" — number of retries for sending a message;

  • "retry.backoff.ms": "500" — delay between retries;

  • "queue.buffering.max.ms": "1000" — maximum time in milliseconds to buffer messages in memory;

  • "enable.idempotence": "true" — enable idempotence for producer;

This field also allows setting custom Kafka security-related parameters such as authentication and encryption.

CAUTION: You should not set the message.max.bytes parameter as maximum message size is defined by the max-message-mb parameter.

NOTE: This field is valid only for kafka type.

queue-size

number, min: 1, max: 1000000, default: 65536

Size of message queue.

TIP: This parameter defines how many messages can be queued for sending to the output. If the message transport offers a level of parallelism, messages can be sent in parallel; if it doesn’t, messages are sent one by one. The larger the value, the more messages can be sent in parallel.

timestamp-format

string, max length: 256, default: "%F_%T"

Format of the timestamp (defined using the placeholder %t in the output field) in the output file name. The format is the same as for the strftime function in C. Refer to the documentation of your C library for more information.

NOTE: This field is valid only for file type.
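Putting the target and writer parameters together, an illustrative target element sending messages to Kafka (placeholder alias, topic and broker addresses; only the brokers and compression.codec properties listed above are shown):

    "target": [
      {
        "alias": "kafka-out",
        "source": "src1",
        "writer": {
          "type": "kafka",
          "topic": "olr-db1",
          "max-message-mb": 100,
          "properties": {
            "brokers": "host1:9092,host2:9092",
            "compression.codec": "snappy"
          }
        }
      }
    ]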