Overview of the Greenplum-Informatica Connector
Pivotal Greenplum Database is a massively parallel processing database server designed to manage large-scale analytic data warehouses and business intelligence workloads. Informatica PowerCenter is a high-speed platform for integrating enterprise data.
The Pivotal Greenplum-Informatica Connector provides high-speed data transfer from Informatica PowerCenter to a Pivotal Greenplum Database cluster to support batch and continuous (streaming) ETL.
The Pivotal Greenplum-Informatica Connector architecture consists of the connector itself, which runs on an Informatica PowerCenter node and on Informatica client machines, and the Greenplum Stream Server (GPSS) service, which runs in the Pivotal Greenplum Database cluster. The GPSS service can run anywhere in the Greenplum Database cluster, and interacts with Greenplum Database master and segment hosts as necessary to transfer data from Informatica.
Figure: Greenplum-Informatica Connector Architecture
A typical sequence of events for performing an ETL task using the connector involves:
1. An Informatica user accesses the PowerCenter server with client tools and initiates one or more ETL load requests with the Greenplum-Informatica Connector.
2. The connector uses the gRPC protocol to transmit the load requests to the GPSS service running in the Pivotal Greenplum Database cluster.
3. The GPSS service submits each load request transaction to the Greenplum Database cluster master instance, and creates the external tables needed to store data. Each load request can configure session properties to customize the services that GPSS provides in the Greenplum Database cluster.
4. The GPSS service transfers the requested data from the PowerCenter node into segments of the Greenplum Database cluster.
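The sequence above can be pictured with a small toy model in Python. This is only an illustrative sketch, not the connector or the GPSS API: the names LoadRequest, ToyMaster, and ToyGpss, the ext_ table prefix, and the round-robin placement of rows are all assumptions made for the example. A real deployment sends requests over gRPC and lets Greenplum Database decide how rows are distributed across segments.

```python
from dataclasses import dataclass, field


@dataclass
class LoadRequest:
    """Hypothetical stand-in for one ETL load request from PowerCenter."""
    table: str
    rows: list
    session_properties: dict = field(default_factory=dict)


class ToyMaster:
    """Hypothetical stand-in for the Greenplum Database master instance."""

    def __init__(self):
        self.external_tables = []

    def open_transaction(self, request):
        # Third event: the master receives the load transaction and an
        # external table is set up for it (the name is made up here).
        name = f"ext_{request.table}"
        self.external_tables.append(name)
        return name


class ToyGpss:
    """Hypothetical stand-in for the GPSS service."""

    def __init__(self, master, segment_count):
        self.master = master
        self.segments = [[] for _ in range(segment_count)]

    def submit(self, request):
        # Second event: in the real system this call arrives over gRPC;
        # here it is a plain method call.
        ext_table = self.master.open_transaction(request)
        # Fourth event: move rows into segments. Round-robin placement is
        # a simplification chosen only for this illustration.
        for i, row in enumerate(request.rows):
            self.segments[i % len(self.segments)].append(row)
        return ext_table


# First event: a user initiates a load request.
gpss = ToyGpss(ToyMaster(), segment_count=2)
request = LoadRequest("orders", rows=[(1, "a"), (2, "b"), (3, "c")])
ext_table = gpss.submit(request)
print(ext_table, [len(s) for s in gpss.segments])
```

The model keeps the master and the GPSS service as separate objects to mirror the architecture described above, in which GPSS coordinates with the master for the transaction and then places data on the segments.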