src/backend/cdb/endpoint/README

With the size of Greenplum cluster increasing, the performance bottleneck on
the coordinator node becomes more and more obvious.

The parallel retrieve cursor feature is designed to reduce the heavy burdens of
the coordinator node. For a query, it can redirect the results to segments
instead of gathering them to the coordinator node. When it is executed, user
can setup retrieve mode connections to segments for retrieving result data in
parallel.


Parallel Retrieve Cursor
========================

Like a normal cursor, user can declare a parallel retrieve cursor on
coordinator, then retrieve the query results on each segment directly.

Parallel retrieve cursor has similar declaration and syntax as normal cursor
does. However, some cursor operations are not supported in parallel retrieve
cursor(e.g. MOVE).

#NOTE: Orca doesn't support PARALLEL RETRIEVE CURSOR for now. Greenplum would
fall back to postgres optimizer automatically.

Endpoint
========

Once a parallel retrieve cursor has been declared on QD, a corresponding
endpoint will be created on the same segment whose QE contains the query
result. These endpoints are backed by individual backends on each QE. Then,
those endpoints can be used as the source, and results can be retrieved from
them in parallel on each QE.


Declare Parallel Retrieve Cursor
================================

Before retrieving data from endpoints in parallel, a parallel retrieve cursor
variable needs to be created alongside with the query statement.

Syntax:
DECLARE cursor_name PARALLEL RETRIEVE CURSOR FOR query;

The cursor_name needs to be unique in current transaction.

Examples:

postgres=# BEGIN;
BEGIN
postgres=# DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1;
DECLARE


List Parallel Retrieve Cursors and their endpoints
==================================================

To retrieve the query results of a parallel retrieve cursor, the related
endpoint information is needed before start a retrieve session on QEs. The UDF
gp_get_endpoints() can be used to list parallel retrieve cursors and their
endpoints information. This UDF could be run on the coordinator only.

For a superuser, it can list all endpoints information of all users', but for
non-superuser, it can only list the current user's endpoints information for
security reason.

Definition: gp_get_endpoints()

gp_get_endpoints() Columns:
|-------------+-----------+------------------------------------------|
| Column Name | Data Type | Description                              |
|-------------+-----------+------------------------------------------|
| dbid        | integer   | The QE's dbid                            |
|-------------+-----------+------------------------------------------|
| auth_token  | text      | Retrieve-session authentication token    |
|-------------+-----------+------------------------------------------|
| cursorname  | text      | Parallel retrieve cursor name            |
|-------------+-----------+------------------------------------------|
| sessionid   | integer   | The session where the cursor was created |
|-------------+-----------+------------------------------------------|
| hostname    | text      | The host to retrieve from                |
|-------------+-----------+------------------------------------------|
| port        | integer   | The port number to start the retrieve    |
|             |           | session                                  |
|-------------+-----------+------------------------------------------|
| userid      | oid       | The oid of the user                      |
|-------------+-----------+------------------------------------------|
| state       | text      | One of the following state for this      |
|             |           | endpoint:                                |
|             |           | READY:                                   |
|             |           | The endpoint is ready to be retrieved    |
|             |           | ATTACHED:                                |
|             |           | The endpoint has been attached by a      |
|             |           | a retrieve connection                    |
|             |           | RETRIEVING:                              |
|             |           | The endpoint is being retrieved          |
|             |           | FINISHED:                                |
|             |           | The endpoint has been fully retrieved    |
|             |           | RELEASED:                                |
|             |           | Due to errors, this endpoint has been    |
|             |           | released and cannot be connected anymore |
|-------------+-----------+------------------------------------------|
| endpointname| text      | The identification of this endpoint      |
|-------------+-----------+------------------------------------------|

Examples:

postgres=# select * from gp_get_endpoints();
 dbid |            auth_token            | cursorname | sessionid | hostname | port | userid | state |    endpointname
------+----------------------------------+------------+-----------+----------+------+--------+-------+--------------------
    2 | 75ebe7b49c3e09f35e017fc0181c62cf | c3         |       105 | host67   | 7002 |     10 | READY | c30000006900000005
    3 | 3ee0a64a465dde2813d4940961747c88 | c3         |       105 | host67   | 7003 |     10 | READY | c30000006900000005
    4 | 97a8eeee337798f718319c5234ea1440 | c3         |       105 | host67   | 7004 |     10 | READY | c30000006900000005
(3 rows)

There is another similar gp_get_session_endpoints() that shows the endpoint
information that belong to this session only.

Start A Retrieve Session
========================

Once a parallel retrieve cursor has been declared, retrieve sessions can be
started on each endpoint's host by using the endpoint's token as the session
authentication password.

gp_retrieve_conn=true needs to be set to start retrieve session.

Examples:

$> PGOPTIONS='-c gp_retrieve_conn=true' psql -h host67 -p 7002 -d postgres -U user1
Password: 75ebe7b49c3e09f35e017fc0181c62cf
postgres=#

Here the "75ebe7b49c3e09f35e017fc0181c62cf", "host67" and "7002" are the
endpoint "c30000006900000005"'s token, hostname and port.

Besides the user and password should match, the user also needs login
permission. Otherwise this role is not allowed to login, even with correct
password.

Multiple parallel retrieve cursors can be retrieved from the same retrieve
session as long as their token is the same.



Retrieve From Endpoint
======================

In each retrieve session, the query result on that segment can be
retrieved by using statement "RETRIEVE" and its corresponding endpoint
name.

Syntax:
RETRIEVE { ALL | count } FROM ENDPOINT endpoint_name;

Parameter "ALL" means to retrieve all the results from the endpoint.
Otherwise, a positive integer value needs to be provided as the "count" to
specify how many rows to retrieve.  An empty set will be returned if no more
tuples for the endpoint.

Examples:

postgres=# RETRIEVE ALL FROM ENDPOINT c30000006900000005;
 col1 | col2
------+------
 0    | 0
 1    | 1
(2 rows)


List Endpoints In Utility Session On Endpoint QE
================================================

It is possible to list all sessions' endpoints in the UTILITY connection to
specific endpoint (coordinator or segment node) by using UDF
gp_get_segment_endpoints(). Same as the UDF gp_get_endpoints(), a superuser can see the
endpoint information of all users, but non-superuser can see its endpoints
information only for security reason.

Definition: gp_get_segment_endpoints()

gp_get_segment_endpoints() Columns:
|--------------+-----------+------------------------------------------|
| Column Name  | Data Type | Description                              |
|--------------+-----------+------------------------------------------|
| auth_token   | text      | Retrieve session authentication token    |
|--------------+-----------+------------------------------------------|
| databaseid   | integer   | The database where the cursor is created |
|--------------+-----------+------------------------------------------|
| senderpid    | integer   | The process where the query results are  |
|              |           | sent from                                |
|--------------+-----------+------------------------------------------|
| receiverpid  | integer   | The process where the results are being  |
|              |           | received on                              |
|--------------+-----------+------------------------------------------|
| state        | text      | The state of the endpoint                |
|              |           | See gp_get_endpoints() for more details  |
|--------------+-----------+------------------------------------------|
| dbid         | integer   | The QE's dbid                            |
|--------------+-----------+------------------------------------------|
| sessionid    | integer   | The session where the cursor created     |
|--------------+-----------+------------------------------------------|
| userid       | oid       | The user who created this endpoint       |
|--------------+-----------+------------------------------------------|
| endpointname | text      | The identification of this endpoint      |
|--------------+-----------+------------------------------------------|
| cursorname   | text      | Parallel retrieve cursor name            |
|--------------+-----------+------------------------------------------|

Examples:

# Connect the segment in utility mode
$> PGOPTIONS="-c gp_role=utility" psql -h host67 -p 7002 -d postgres

postgres=# select * from gp_get_segment_endpoints();
            auth_token            | databaseid | senderpid | receiverpid | state | dbid | sessionid | userid |    endpointname    | cursorname
----------------------------------+------------+-----------+-------------+-------+------+-----------+--------+--------------------+------------
 75ebe7b49c3e09f35e017fc0181c62cf |      13361 |      3854 |          -1 | READY |    2 |       105 |     10 | c30000006900000005 | c3
(1 row)

Wait Parallel Retrieve Cursor To Be Fully Retrieved
===================================================

UDF gp_wait_parallel_retrieve_cursor() is designed to block until all the
endpoints have been fully retrieved for the given parallel retrieve cursor
until timeout happens. It will block the coordinator session until all the
relevant endpoints are fully retrieved unless timeout or error happens. When
error happens, the current transaction will be aborted, all parallel retrieve
cursors and their endpoints in current transaction will be freed. If it is
interrupted by signals, the transaction will be aborted as well. When timeout
happens, it will report whether all retrieve operations finish or not. Specify
timeout_sec as 0 to check the status immediately and specify timeout_sec as -1
to block until retrieve finishes or error happens.

Definition: gp_wait_parallel_retrieve_cursor(cursor_name, timeout_sec)

It will only return true if all endpoints fully retrieved. In any other cases,
an error message will be thrown.

Examples:

postgres=# SELECT gp_wait_parallel_retrieve_cursor('c3', -1); <waiting...>
 gp_wait_parallel_retrieve_cursor
-----------------------------------
 t
(1 row)


Error Handling In Retrieve Session
==================================

When error happens in a retrieve session, the endpoint will be removed from QE.
The other retrieve sessions can still retrieve from other endpoints which
belong to this session at this time. But if gp_wait_parallel_retrieve_cursor()
or CLOSE are executed on QD, the other retrieve sessions will be canceled since
the error is delivered to QD and treated as an error happens in the
transaction.

When the endpoint is partially retrieved and then the corresponding retrieve
session quits, the case will be treated as a retrieve session error as well.
The endpoint will be removed and an error will be reported on QD when checking
or waiting on the retrieve state. "CLOSE" will also report error in this case.


Close Parallel Retrieve Cursor
==============================

Use "CLOSE" statement to close the parallel retrieve cursor and release
its endpoints. All the related on-going retrieve statements will be
canceled.

Syntax:
CLOSE cursorname;

Examples:

postgres=# CLOSE C3
CLOSE


Walk-Through Example
====================

Assumes we have a table named "t1" with an integer column "id", and its
data distributed as following:

postgres=# select gp_segment_id, * from t1;
 gp_segment_id | id
---------------+----
             0 |  2
             0 |  3
             0 |  4
             2 |  5
             1 |  0
             1 |  1
(6 rows)

To declare a parallel retrieve cursor on coordinator in order to select all
tuples from t1:

-- Begin transaction first
postgres=# BEGIN;
BEGIN

-- Create the parallel retrieve cursor for the intent query
postgres=# DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1;
DECLARE

-- List endpoints to get the needed information to start retrieving
-- sessions on segments
postgres=# SELECT * FROM gp_get_endpoints();
 dbid |            auth_token            | cursorname | sessionid | hostname | port | userid | state |    endpointname
------+----------------------------------+------------+-----------+----------+------+--------+-------+--------------------
    2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1         |        22 | host67   | 7002 |     10 | READY | c1000000160000000a
    3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1         |        22 | host67   | 7003 |     10 | READY | c1000000160000000a
    4 | a522dc766936be8a88710fa137d52df2 | c1         |        22 | host67   | 7004 |     10 | READY | c1000000160000000a
(3 rows)

Now start a retrieve session on the first segment (i.e. host67:7002)

# gp_retrieve_conn=true is needed as the pg option.
$> PGOPTIONS="-c gp_retrieve_conn=true" psql -h host67 -p 7002 -d postgres
Password: c5c116a13e2fdb8b436cdbc8e1bc7365

-- Retrieve all rows of this endpoint
postgres=# RETRIEVE ALL FROM ENDPOINT c1000000160000000a;
 id
----
  2
  3
  4
(3 rows)

Now the state of endpoint c1000000160000000a for dbid 2 (host67:7002) should
become "FINISHED" since all results on the segment have been retrieved.

-- List endpoints on coordinator to check
postgres=# SELECT * FROM gp_get_endpoints();
 dbid |            auth_token            | cursorname | sessionid | hostname | port | userid |  state   |    endpointname
------+----------------------------------+------------+-----------+----------+------+--------+----------+--------------------
    2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1         |        22 | host67   | 7002 |     10 | FINISHED | c1000000160000000a
    3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1         |        22 | host67   | 7003 |     10 | READY    | c1000000160000000a
    4 | a522dc766936be8a88710fa137d52df2 | c1         |        22 | host67   | 7004 |     10 | READY    | c1000000160000000a
(3 rows)

Start another retrieve session on host67:7003

$> PGOPTIONS="-c gp_retrieve_conn=true" psql -h host67 -p 7003 -d postgres
Password: 5f59f7f3338212f20d3bb1527fac9fcd

-- Retrieve 1 row on this endpoint
postgres=# RETRIEVE 1 FROM ENDPOINT c1000000160000000a;
 id
----
  0
(1 row)

The state of endpoint  will firstly becomes "RETRIEVING", then becomes
"ATTACHED".  RETRIEVING means that a receiver is retrieving data from the
endpoint. If a receiver finishes retrieving a part of data (but has not
retrieved all the data), the state becomes ATTACHED, means that this endpoint
has been attached by a receiver. If the receiver has retrieved all the data
from the endpoint, the state becomes FINISHED.

-- List endpoints on coordinator to check
postgres=# SELECT * FROM gp_get_endpoints();
 dbid |            auth_token            | cursorname | sessionid | hostname | port | userid |  state   |    endpointname
------+----------------------------------+------------+-----------+----------+------+--------+----------+--------------------
    2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1         |        22 | host67   | 7002 |     10 | FINISHED | c1000000160000000a
    3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1         |        22 | host67   | 7003 |     10 | ATTACHED | c1000000160000000a
    4 | a522dc766936be8a88710fa137d52df2 | c1         |        22 | host67   | 7004 |     10 | READY    | c1000000160000000a
(3 rows)

-- To check if the retrieving finished on the coordinator connection.
postgres =# select gp_wait_parallel_retrieve_cursor('c1', 0);
 gp_wait_parallel_retrieve_cursor
-----------------------------------
 f
(1 row)

We can just close the cursor now if we decide to not retrieve the
remaining query results (on the coordinator connection).

postgres=# CLOSE c1;
CLOSE CURSOR

-- All endpoints are gone
postgres=# SELECT * FROM gp_get_endpoints();
 dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname
------+------------+------------+-----------+----------+------+--------+-------+--------------
(0 rows)

-- End the transaction
coordinator=# END;
END

Implementation internals
====================
The basic idea is that for a declared parallel retrieve cursor, we create the
distributed plan that does not gather the tuples to the coordinator as usual.
The top node in the plan might be on all segments or on partial segment or on
one segment, depending on the query itself. The cursor backend (executor)
process would output the tuples to the separate retrieve connection via the
shm_mq based dest receiver.

The communication between the cursor backend process and the retrieve
connection is the endpoint that is represented as EndpointData. There is an
array of EndpointData (sharedEndpoints[]) in shared memory for the whole
cluster use. The cursor backend process would allocate one endpoint and the
retrieve connection would try to use that endpoint thus authentication of
retrieve connection is surely required.
