What is needed to configure an integration?

To use AWS Redshift integrations, the connection data and credentials of an existing AWS Redshift instance must be provided. The required fields are:
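While the exact field list is platform-specific, a Redshift connection is typically described by a cluster endpoint (host), a port (5439 by default), a database name, a user, and a password. As a hedged illustration only, the sketch below opens such a connection with psycopg2, a common PostgreSQL/Redshift driver; every value shown is a placeholder, not a real credential.

```python
# Minimal sketch: connecting to a Redshift instance with psycopg2.
# All connection values below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # cluster endpoint
    port=5439,                # Redshift's default port
    dbname="analytics",       # target database
    user="integration_user",  # credentials provided for the integration
    password="...",           # kept secret in practice (e.g. env var or vault)
)
conn.close()
```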

Is there any limitation on databases and data types?

Since AWS Redshift is a hosted service, no limitations exist on the servers the integration can connect to, provided the service is active.

Supported data types

Note: Columns of other, more complex types are not supported and will be silently ignored during data synchronization.
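To illustrate what "silently ignored" means in practice, here is a hypothetical sketch that inspects a table through information_schema.columns and keeps only columns whose type appears in an assumed whitelist of simple scalar types. The whitelist is illustrative, not the platform's actual list of supported types.

```python
# Hypothetical sketch: keep only columns whose data type is in an assumed
# whitelist; columns of other types are skipped ("silently ignored").
SUPPORTED_TYPES = {  # illustrative whitelist, not the platform's real list
    "smallint", "integer", "bigint", "numeric", "real",
    "double precision", "boolean", "character", "character varying",
    "date", "timestamp without time zone",
}

def synced_columns(cursor, schema, table):
    cursor.execute(
        """
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = %s AND table_name = %s
        """,
        (schema, table),
    )
    # Columns with unsupported (more complex) types are simply dropped.
    return [name for name, dtype in cursor.fetchall()
            if dtype in SUPPORTED_TYPES]
```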

How does the stream replication work?

Two modes are supported: "Key Based" / "Incremental", and "Full Table".

"Full Table" completely wipes out any data in our platform the stream previously downloaded, and then the whole data from the stream's source table is dumped into our platform. This mode is not recommended for big tables.

"Key Based" (also called "Incremental") requires a particular field to be selected as "key". That field is meant to be treated as a sequential or "sortable" field which increases by time (i.e. records added in the future will have never descending values in this field), and will be tracked across several extraction executions from the (same) integration's stream.

On the first run, the whole table is downloaded into our platform, and the maximum value of the tracked key (i.e. the value from the last record retrieved) is stored for future runs. This run is time-consuming (as time-consuming as a Full Table replication, but performed only once) and may put heavy load on the source servers, so it should be scheduled during what the customer deems low-activity hours (e.g. nightly).

When a later extraction runs for this incremental stream, only a subset of the data is retrieved: the records whose value in the chosen key field is greater than or equal to the last stored value.
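Putting the first run and later runs together, key-based replication can be sketched as below. The `bookmarks` and `destination` helper objects are hypothetical, ordering by the key is an assumption made so the last row retrieved carries the maximum key value, and the `>=` comparison matches the behavior described above (which is why rows at the boundary value may be fetched twice).

```python
# Illustrative sketch of Key Based / Incremental replication.
# `bookmarks` and `destination` are hypothetical helper objects.
def incremental_sync(src_cursor, destination, bookmarks, table, key):
    last_value = bookmarks.get(table)       # None on the very first run
    if last_value is None:
        # First run: the whole table is downloaded (as costly as Full Table).
        src_cursor.execute(f"SELECT * FROM {table} ORDER BY {key}")
    else:
        # Later runs: only rows whose key is >= the last stored value.
        src_cursor.execute(
            f"SELECT * FROM {table} WHERE {key} >= %s ORDER BY {key}",
            (last_value,),
        )
    key_index = [col[0] for col in src_cursor.description].index(key)
    for row in src_cursor:
        destination.insert(table, row)
        last_value = row[key_index]         # rows arrive in ascending key order
    bookmarks.set(table, last_value)        # remembered for the next run
```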

As an example, suppose the source table stores "sold product units" with these fields: