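Continuing from the previous post on pgpool-II's performance shortcomings:
As noted there, in replication mode pgpool-II sends SQL statements to the DB nodes sequentially rather than fully in parallel.
In the source code, the relevant call chain is: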
SimpleQuery → pool_send_and_wait → send_simplequery_message
/*
 * Process Query('Q') message
 * Query messages include an SQL string.
 */
POOL_STATUS SimpleQuery(POOL_CONNECTION *frontend,
                        POOL_CONNECTION_POOL *backend, int len, char *contents){
    ……
    /* log query to log file if necessary */
    if (pool_config->log_statement){
        pool_log("statement: %s", contents);
    }else{
        pool_debug("statement2: %s", contents);
    }
    ……
    string = query_context->original_query;
    if (!RAW_MODE){
        ……
        /*
         * Query is not commit/rollback
         */
        if (!commit){
            char *rewrite_query;
            ……
            /*
             * Optimization effort: If there's only one session, we do
             * not need to wait for the master node's response, and
             * could execute the query concurrently.
             */
            if (pool_config->num_init_children == 1){
                /* Send query to all DB nodes at once */
                status = pool_send_and_wait(query_context, 0, 0);
                /* free_parser(); */
                return status;
            }
            /* Send the query to master node */
            if (pool_send_and_wait(query_context, 1, MASTER_NODE_ID)
                != POOL_CONTINUE) {
                free_parser();
                return POOL_END;
            }
        }
        /*
         * Send the query to other than master node.
         */
        if (pool_send_and_wait(query_context, -1, MASTER_NODE_ID)
            != POOL_CONTINUE){
            free_parser();
            return POOL_END;
        }
        ……
    }else{
        ……
    }
    return POOL_CONTINUE;
}
/*
 * Send simple query and wait for response
 * send_type:
 *  -1: do not send this node_id
 *   0: send to all nodes
 *  >0: send to this node_id
 */
POOL_STATUS pool_send_and_wait(POOL_QUERY_CONTEXT *query_context,
                               int send_type, int node_id)
{
    ……
    /* Send query */
    for (i=0;i<NUM_BACKENDS;i++){
        ……
        per_node_statement_log(backend, i, string);
        if ( send_simplequery_message(CONNECTION(backend, i),
                                      len, string, MAJOR(backend)) != POOL_CONTINUE) {
            return POOL_END;
        }
    }
    /* Wait for response */
    for (i=0;i<NUM_BACKENDS;i++){
        ……
        if (wait_for_query_response(frontend, CONNECTION(backend, i),
                                    MAJOR(backend)) != POOL_CONTINUE){
            /* Cancel current transaction */
            CancelPacket cancel_packet;
            cancel_packet.protoVersion = htonl(PROTO_CANCEL);
            cancel_packet.pid = MASTER_CONNECTION(backend)->pid;
            cancel_packet.key = MASTER_CONNECTION(backend)->key;
            cancel_request(&cancel_packet);
            return POOL_END;
        }
        ……
    }
    return POOL_CONTINUE;
}
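The send_type comment above can be read as a small per-node filter applied inside each loop. The sketch below is only one plausible reading of that comment and of the three calls in SimpleQuery (send_type 0, 1 and -1); it is not pgpool-II's actual code, and the helper name should_send is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of pgpool-II: decide whether backend
 * number i is a target, following the send_type comment:
 *   send_type < 0 : every node except node_id
 *   send_type == 0: every node
 *   send_type > 0 : node_id only
 */
static bool should_send(int send_type, int node_id, int i)
{
    assert(i >= 0);
    if (send_type < 0)
        return i != node_id;
    if (send_type > 0)
        return i == node_id;
    return true;
}
```

Under this reading, with node 0 as the master, the call with send_type 1 touches only node 0, and the following call with send_type -1 touches every node except node 0.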
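Further analysis of the code, together with some experiments, leads to the following conclusions:
Between the master node and the other nodes, SQL statements are executed serially.
Among the nodes other than the master, they are executed in parallel. More precisely:
The /* Send query */ section sends the SQL statement to each node without blocking.
The /* Wait for response */ section is also a loop, but it waits on the nodes one after another.
Fortunately, the statement is sent to all of these nodes at almost the same moment,
so once the wait loop has received the completion message from one node,
the completion message from the next node arrives almost immediately.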
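To sum up: if a batch job takes 1 hour when run against a single node, then in replication mode with multiple nodes it takes about 2 hours: roughly 1 hour on the master node, followed by roughly 1 hour on the remaining nodes, which run concurrently with each other.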
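The estimate above can be written as a small timing model. This is a rough sketch under simplifying assumptions: every node takes the same time as it would standalone, all non-master nodes start at the same instant, and network latency and lock waits are ignored. The function names are hypothetical and do not exist in pgpool-II.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Time at which a loop that waits on nodes in index order finishes,
 * assuming every node started working at t = 0. Waiting on a node that
 * has already finished costs nothing, so the result is the maximum.
 */
static double ordered_wait_elapsed(const double *durations, size_t n)
{
    double now = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (durations[i] > now)
            now = durations[i];   /* block until node i is done */
    }
    return now;
}

/*
 * Total elapsed time in replication mode under the model above:
 * the master runs first, then the other nodes run concurrently.
 */
static double replication_elapsed(double master_duration,
                                  const double *other_durations,
                                  size_t n_others)
{
    assert(master_duration >= 0.0);
    return master_duration + ordered_wait_elapsed(other_durations, n_others);
}
```

With a 1-hour master and any number of 1-hour non-master nodes, the model gives 2 hours, which matches the figure above; adding more non-master nodes does not increase the estimate.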
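As for why pgpool-II separates execution on the master node from execution on the other nodes, there may be a specific design reason; perhaps it is meant to guarantee correctness on the master node.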