Pivotal Greenplum®-Informatica Connector v1.0

Installing and Configuring GPSS in Greenplum Database

The Greenplum Streaming Server (GPSS) manages communication and data transfer between the Pivotal Greenplum-Informatica Connector and Greenplum Database. In addition to installing the Pivotal Greenplum-Informatica Connector components in your Informatica cluster, you must install, configure, and start GPSS in the Greenplum Database cluster.

Prerequisites

The Greenplum Streaming Server is automatically installed with Greenplum Database 5.6 or later. Install and start a compatible Greenplum Database version before you continue with the procedure.

Process

Follow these steps to install GPSS support in Greenplum Database:

Step 1: Register the GPSS Extension

You must register the Greenplum Streaming Server (GPSS) extension in each database in which you will use the Greenplum-Informatica Connector to write data to Greenplum tables.

Perform the following procedure to register the GPSS extension:

  1. Open a new terminal window, log in to the Greenplum Database master host as the gpadmin administrative user, and set up the Greenplum environment. For example:

    $ ssh gpadmin@gpmaster
    gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
    
  2. Start the psql subsystem, connecting to a database in which you want to register the GPSS extension. For example:

    gpadmin@gpmaster$ psql -d testdb
    
  3. Enter the following command to register the extension:

    testdb=# CREATE EXTENSION gpss;
    
  4. Perform steps 2 and 3 for each database in which you will use the Greenplum-Informatica Connector to write client data.

After registering the extension, you can start GPSS to receive connections from the Greenplum-Informatica Connector.
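When several databases need the extension, steps 2 and 3 can be scripted. The following is a minimal sketch, not part of GPSS: the function name register_gpss, the DRY_RUN variable, and the database names are hypothetical, and it assumes that psql is on the PATH after sourcing greenplum_path.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of GPSS): register the gpss extension in
# each database named as an argument.
# Set DRY_RUN=1 to print the psql commands instead of running them.
register_gpss() {
    for db in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'psql -d %s -c "CREATE EXTENSION gpss;"\n' "$db"
        else
            # Stop at the first database in which registration fails.
            psql -d "$db" -c "CREATE EXTENSION gpss;" || return 1
        fi
    done
}

# Preview the commands for two placeholder database names.
DRY_RUN=1 register_gpss testdb salesdb
```

Run the function without DRY_RUN set to execute the commands. It stops at the first failure so that you can correct the problem and rerun it for the remaining databases.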

Step 2: Configure and Start GPSS

Follow these steps to create a GPSS configuration file and start the GPSS process using the file:

  1. Log in to the Pivotal Greenplum Database master or segment node on which you will run the GPSS service.

  2. Create a JSON configuration file that specifies connection information for the GPSS service. The following example shows the contents of a sample configuration file named config.json:

    {
        "listenaddress": {
            "Host": "",
            "Port": 5000,
            "Encryption": {
                "CertFile": "/home/gpadmin/gpdb_bin/ext/server.crt",
                "KeyFile": "/home/gpadmin/gpdb_bin/ext/server.key",
                "CAFile": "/home/gpadmin/gpdb_bin/ext/rootCA.pem"
            }
        },
        "gpfdist": {
            "Host": "",
            "Port": 5001,
            "Encryption": {
                "CertFile": "/home/gpadmin/gpdb_bin/gpdb/server.crt",
                "KeyFile": "/home/gpadmin/gpdb_bin/gpdb/server.key",
                "CAFile": "/home/gpadmin/gpdb_bin/gpdb/rootCA.pem"
            }
        }
    }
    

    Each property of the GPSS configuration file is described in the following table.

    | Property | Description |
    |----------|-------------|
    | listenaddress | Specifies the Host, Port, and optional Encryption settings that the GPSS service uses to listen for connection requests from the Greenplum-Informatica Connector. The default listen address is 0.0.0.0:5000. |
    | gpfdist | Specifies the Host, Port, and optional Encryption settings that GPSS uses to provide the gpfdist service for loading data into Greenplum Database. The default value is ‘HOSTNAME:5001’, where “HOSTNAME” is the output of the hostname command. |
    | Encryption | Optional. If you include Encryption in the listenaddress object, GPSS uses an HTTPS connection between itself and the Greenplum-Informatica Connector, and you must specify the CertFile, KeyFile, and CAFile properties described below. If you include Encryption in the gpfdist object, GPSS uses the encrypted gpfdists protocol to transfer data between itself and Greenplum Database, and you must also specify CertFile, KeyFile, and CAFile to secure that connection. Where you omit Encryption, GPSS uses unencrypted communication: HTTP with the Connector, or the unencrypted gpfdist protocol with Greenplum Database. |
    | CertFile | Specifies the absolute path to the server certificate used for authenticating the HTTPS or gpfdists connection to GPSS. The Common Name (CN) in CertFile must exactly match the hostname of the host on which GPSS runs. For HTTPS communication between the Connector and GPSS, the CN must also match the Loader host address property that the Greenplum-Informatica Connector specifies in its connection configuration (see Creating a New Connection to Greenplum Streaming Server). |
    | KeyFile | Specifies the absolute path to the server key file used for authenticating the HTTPS or gpfdists connection to GPSS. |
    | CAFile | Specifies the absolute path to the Certificate Authority file used for authenticating the HTTPS or gpfdists connection. The CAFile must contain the entire Certificate Authority chain. For HTTPS communication between the Connector and GPSS, the Certificate Authority file specified in the Greenplum-Informatica Connector configuration must contain the same Certificate Authority (see Creating a New Connection to Greenplum Streaming Server). |

  3. Use the gpss command to start the GPSS service, specifying the JSON configuration file to use. For example:

    $ gpss config.json
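
When the configuration file includes Encryption settings, it can help to confirm that every CertFile, KeyFile, and CAFile path is readable before you start GPSS. The following is a minimal sketch of such a check, not part of GPSS: the function name check_gpss_cert_paths is hypothetical, and the scan is line-based, so it assumes one property per line as in the sample config.json rather than parsing JSON in general.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of GPSS): report every CertFile,
# KeyFile, and CAFile path in a GPSS configuration file that is not readable.
# Assumes one property per line, as in the sample config.json; this is a
# line-based scan, not a full JSON parser.
check_gpss_cert_paths() {
    config=$1
    if [ ! -r "$config" ]; then
        echo "Cannot read configuration file: $config" >&2
        return 1
    fi
    # Extract each certificate-related path and keep only the unreadable ones.
    missing=$(
        sed -nE 's/.*"(CertFile|KeyFile|CAFile)"[[:space:]]*:[[:space:]]*"([^"]*)".*/\2/p' "$config" |
        while IFS= read -r file; do
            [ -r "$file" ] || printf '%s\n' "$file"
        done
    )
    if [ -n "$missing" ]; then
        echo "Unreadable certificate files referenced by $config:" >&2
        printf '%s\n' "$missing" >&2
        return 1
    fi
    return 0
}

# Example: start GPSS only when the check passes.
# check_gpss_cert_paths config.json && gpss config.json
```

The check only confirms that the files exist and are readable by the current user. It does not verify the certificate CN or the Certificate Authority chain described in the table above.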