Installing and Configuring GPSS in Greenplum Database

The Greenplum Stream Server (GPSS) manages communication and data transfer between the Pivotal Greenplum-Informatica Connector and Greenplum Database. In addition to installing the Pivotal Greenplum-Informatica Connector components in your Informatica cluster, you must install, configure, and start GPSS in the Greenplum Database cluster.

Prerequisites

The Greenplum Stream Server is automatically installed with Greenplum Database 5.6 or later. Install and start a compatible Greenplum Database version before you continue with the procedure.

Process

Follow these steps to install GPSS support in Greenplum Database:

  1. Register the GPSS Extension as described in the Pivotal Greenplum Stream Server documentation.

  2. Starting in Greenplum Database version 5.15.x, GPSS uses a new JSON configuration file format. Depending upon the version of your Greenplum Database installation, configure and start a GPSS instance as follows:

    1. If you are using the Greenplum-Informatica Connector with Greenplum Database version 5.15.x or later, configure and start a GPSS instance as described in the Pivotal Greenplum Stream Server documentation.
    2. If you are using the Connector with Greenplum Database version 5.14.x or earlier, configure and start a GPSS instance as described in Configure and Start GPSS below.
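The version check in step 2 can be sketched as follows. This is a minimal illustration, not part of GPSS or the Connector: the function name `follows_gpss_docs` is hypothetical, and the sketch assumes the version string starts with numeric `major.minor` components. Comparing the components as integers avoids the pitfall of plain string comparison, which would order "5.9" after "5.15".

```python
def follows_gpss_docs(version):
    """Return True for Greenplum Database 5.15.x or later (hypothetical helper).

    True means: configure GPSS as described in the GPSS documentation.
    False means: use the Configure and Start GPSS procedure on this page.
    """
    # Compare (major, minor) as integers rather than as strings.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (5, 15)


for v in ("5.9.1", "5.14.2", "5.15.0", "6.1.0"):
    target = "GPSS documentation" if follows_gpss_docs(v) else "Configure and Start GPSS"
    print(v, "->", target)
```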

Configure and Start GPSS

If you are using the Greenplum-Informatica Connector to write data to Greenplum Database version 5.14.x or earlier, follow these steps to create a GPSS configuration file and start the GPSS process:

  1. Log in to the Pivotal Greenplum Database master or segment node on which you will run the GPSS service.

  2. Create a JSON configuration file that specifies connection information for the GPSS service. The following example shows the contents of a sample configuration file named config.json:

    {
        "listenaddress": {
            "Host": "",
            "Port": 5000,
            "Encryption": {
                "CertFile": "/home/gpadmin/gpdb_bin/ext/server.crt",
                "KeyFile": "/home/gpadmin/gpdb_bin/ext/server.key",
                "CAFile": "/home/gpadmin/gpdb_bin/ext/rootCA.pem"
            }
        },
        "gpfdist": {
            "Host": "",
            "Port": 5001,
            "Encryption": {
                "CertFile": "/home/gpadmin/gpdb_bin/gpdb/server.crt",
                "KeyFile": "/home/gpadmin/gpdb_bin/gpdb/server.key",
                "CAFile": "/home/gpadmin/gpdb_bin/gpdb/rootCA.pem"
            }
        }
    }
    

    Each property of the GPSS configuration file is described in the following table.

    | Property | Description |
    |---|---|
    | listenaddress | Specifies the Host, Port, and optional Encryption settings that the GPSS service uses to listen for connection requests from the Greenplum-Informatica Connector. The default listen address is 0.0.0.0:5000. |
    | gpfdist | Specifies the Host, Port, and optional Encryption settings that GPSS uses to provide the gpfdist service for loading data into Greenplum Database. The default value is ‘HOSTNAME:5001’, where “HOSTNAME” is the output of the hostname command. |
    | Encryption | Optional. If you include Encryption in the listenaddress object, GPSS uses an HTTPS connection between itself and the Greenplum-Informatica Connector, and you must specify the CertFile, KeyFile, and CAFile properties described below. If you include Encryption in the gpfdist object, GPSS uses the encrypted gpfdists protocol to transfer data between itself and Greenplum Database, and you must likewise specify CertFile, KeyFile, and CAFile. Wherever you omit Encryption, GPSS communicates unencrypted: HTTP with the Connector, or the plain gpfdist protocol with Greenplum Database. |
    | CertFile | Specifies the absolute path to the server certificate used for authenticating the HTTPS or gpfdists connection to GPSS. The Common Name (CN) in the certificate must exactly match the name of the host on which GPSS runs. For HTTPS communication between the Connector and GPSS, the CN must also match the Loader host address property that the Greenplum-Informatica Connector specifies in its connection configuration (see Creating a New Connection to Greenplum Stream Server). |
    | KeyFile | Specifies the absolute path to the server key file used for authenticating the HTTPS or gpfdists connection to GPSS. |
    | CAFile | Specifies the absolute path to the Certificate Authority file used for authenticating the HTTPS or gpfdists connection. The file must contain the entire Certificate Authority chain. For HTTPS communication between the Connector and GPSS, the Certificate Authority file specified in the Greenplum-Informatica Connector configuration must reference the same Certificate Authority (see Creating a New Connection to Greenplum Stream Server). |

  3. Use the gpss command to start the GPSS service, specifying the JSON configuration file to use. For example:

    $ gpss config.json
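Before starting GPSS, it can help to check the configuration file against the Encryption rules described in step 2. The following is a minimal sketch, not a GPSS tool: the function name `check_gpss_config` is hypothetical, and the sketch assumes the property names are spelled exactly as in the sample config.json. It only verifies that each Encryption object names all three files with absolute paths; it does not open the files or inspect the certificate CN.

```python
import json
import posixpath

# Properties that must accompany an Encryption object, per the table in step 2.
REQUIRED_TLS_KEYS = ("CertFile", "KeyFile", "CAFile")


def check_gpss_config(text):
    """Return a list of problems found in a GPSS JSON configuration string."""
    config = json.loads(text)
    problems = []
    for section in ("listenaddress", "gpfdist"):
        encryption = config.get(section, {}).get("Encryption")
        if encryption is None:
            continue  # Encryption is optional; this section stays unencrypted.
        for key in REQUIRED_TLS_KEYS:
            path = encryption.get(key)
            if not path:
                problems.append(f"{section}.Encryption.{key} is missing")
            elif not posixpath.isabs(path):
                problems.append(f"{section}.Encryption.{key} is not an absolute path")
    return problems


# Sample with a deliberately incomplete Encryption object (no CAFile).
sample = """
{
    "listenaddress": {
        "Host": "",
        "Port": 5000,
        "Encryption": {
            "CertFile": "/home/gpadmin/gpdb_bin/ext/server.crt",
            "KeyFile": "/home/gpadmin/gpdb_bin/ext/server.key"
        }
    },
    "gpfdist": {"Host": "", "Port": 5001}
}
"""
print(check_gpss_config(sample))
# prints ['listenaddress.Encryption.CAFile is missing']
```

An empty list means that every Encryption object present is complete. The gpfdist section in the sample passes because it omits Encryption entirely, which the table describes as falling back to the unencrypted gpfdist protocol.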