What is needed to configure an integration?

To use AWS Redshift integrations, the connection data and credentials of an existing AWS Redshift instance must be provided. The required fields are:
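While the exact field list is platform-specific, a Redshift connection is typically described by a cluster endpoint (host), a port (5439 by default), a database name, a user, and a password. As a hedged illustration only, the sketch below opens such a connection with psycopg2, a common PostgreSQL/Redshift driver; every value shown is a placeholder, not a real credential.

```python
# Minimal sketch: connecting to a Redshift instance with psycopg2.
# All connection values below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # cluster endpoint
    port=5439,                # Redshift's default port
    dbname="analytics",       # target database
    user="integration_user",  # credentials provided for the integration
    password="...",           # kept secret in practice (e.g. env var or vault)
)
conn.close()
```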

Is there any limitation on databases and data types?

Since AWS Redshift is a hosted service, no limitations exist on the servers the integration can connect to, provided the service is active.

Supported data types

Note: Columns of other, more complex types are not supported and will be silently ignored during data synchronization.
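To illustrate what "silently ignored" means in practice, here is a hypothetical sketch that inspects a table through information_schema.columns and keeps only columns whose type appears in an assumed whitelist of simple scalar types. The whitelist is illustrative, not the platform's actual list of supported types.

```python
# Hypothetical sketch: keep only columns whose data type is in an assumed
# whitelist; columns of other types are skipped ("silently ignored").
SUPPORTED_TYPES = {  # illustrative whitelist, not the platform's real list
    "smallint", "integer", "bigint", "numeric", "real",
    "double precision", "boolean", "character", "character varying",
    "date", "timestamp without time zone",
}

def synced_columns(cursor, schema, table):
    cursor.execute(
        """
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = %s AND table_name = %s
        """,
        (schema, table),
    )
    # Columns with unsupported (more complex) types are simply dropped.
    return [name for name, dtype in cursor.fetchall()
            if dtype in SUPPORTED_TYPES]
```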

How does the stream replication work?

Two modes are supported: "Key Based" / "Incremental", and "Full Table".

"Full Table" completely wipes out any data in our platform the stream previously downloaded, and then the whole data from the stream's source table is dumped into our platform. This mode is not recommended for big tables.

"Key Based" (also called "Incremental") requires a particular field to be selected as "key". That field is meant to be treated as a sequential or "sortable" field which increases by time (i.e. records added in the future will have never descending values in this field), and will be tracked across several extraction executions from the (same) integration's stream.

On the first run, the whole table is downloaded into our platform, and the maximum value of the tracked key (i.e. the value from the last record retrieved) is stored for future runs. This run is time-consuming (as time-consuming as a Full Table replication, but performed only once) and may put heavy load on the source servers, so it should be scheduled during what the customer deems low-activity hours (e.g. nightly).

When a later extraction runs for this incremental stream, only a subset of the data is retrieved: the records whose value in the chosen key field is greater than or equal to the last stored value.
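Putting the first run and later runs together, key-based replication can be sketched as below. The `bookmarks` and `destination` helper objects are hypothetical, ordering by the key is an assumption made so the last row retrieved carries the maximum key value, and the `>=` comparison matches the behavior described above (which is why rows at the boundary value may be fetched twice).

```python
# Illustrative sketch of Key Based / Incremental replication.
# `bookmarks` and `destination` are hypothetical helper objects.
def incremental_sync(src_cursor, destination, bookmarks, table, key):
    last_value = bookmarks.get(table)       # None on the very first run
    if last_value is None:
        # First run: the whole table is downloaded (as costly as Full Table).
        src_cursor.execute(f"SELECT * FROM {table} ORDER BY {key}")
    else:
        # Later runs: only rows whose key is >= the last stored value.
        src_cursor.execute(
            f"SELECT * FROM {table} WHERE {key} >= %s ORDER BY {key}",
            (last_value,),
        )
    key_index = [col[0] for col in src_cursor.description].index(key)
    for row in src_cursor:
        destination.insert(table, row)
        last_value = row[key_index]         # rows arrive in ascending key order
    bookmarks.set(table, last_value)        # remembered for the next run
```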

As an example, suppose the source table stores "sold product units" with these fields: