Replicate PostgreSQL database using RServ and ssh

2003-08-04 Dobrica Pavlinusic <dpavlin@rot13.org>

This HOWTO is based on Setting up RServ with PostgreSQL but with replication to remote host with ssh security. It's written for Debian GNU/Linux, so if you are using something else, you might have to adjust paths to commands.

You should also note that this HOWTO uses RServ utilities from RServ improved by community which is special version improved by Nélio Alves Pereira Filho and myself. However, special care has been taken to make RServ 0.3 work with normal RServ 0.1 PostgreSQL module so that you can just install postgresql-contrib package and just perl scripts from RServ 0.3 (this makes upgrades easy). You will have to do this only on master host. Slave host doesn't need postgresql-contrib package or RServ installation.

Bare in mind that RServ in PostgreSQL contrib, is version 0.1. While I haven't had any problems using it, PostgreSQL, Inc. is actively working on eRServer with more features, including HA.

Installation, step-by-step

Let's first give a rough overview of parameters used in this HOWTO:
Database name: rep
Remote host: luna
Remote user and password for RServ replication: rserv and BATyLAaf

  1. Create ssh keys so that you are not asked for password when accessing remote host luna.
    How to do this (ssh-keygen -t dsa and friends) is left as exercise to reader.
  2. Optionally, if you want just to test replication, you can use rep-demo database.
    dpavlin@llin:~/rserv$ createdb rep
    CREATE DATABASE
    dpavlin@llin:~/rserv$ psql < misc/rep-demo.sql rep
    ERROR:  table "a_b" does not exist
    ERROR:  table "a" does not exist
    ERROR:  table "b" does not exist
    ERROR:  table "c" does not exist
    NOTICE:  CREATE TABLE will create implicit sequence 'a_id_seq' for SERIAL column 'a.id'
    CREATE TABLE
    CREATE INDEX
    NOTICE:  CREATE TABLE will create implicit sequence 'b_id_seq' for SERIAL column 'b.id'
    CREATE TABLE
    CREATE INDEX
    NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
    CREATE TABLE
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 'c_pkey' for table 'c'
    CREATE TABLE
    
  3. Dump database which is replicated
    dpavlin@llin:~/rserv$ pg_dump rep > /tmp/rep.sql
    
  4. Create rep database on remote host and fill it
    dpavlin@llin:~/rserv$ ssh luna createdb rep
    CREATE DATABASE
    dpavlin@llin:~/rserv$ ssh luna psql rep < /tmp/rep.sql
    ...
    
    You might run into problems if you try to reload pg_dump from higher PostgreSQL version to lower (e.g. from 7.3.x to 7.2.x).
  5. Init master database for RServ replication
    dpavlin@llin:~/rserv$ ./bin/MasterInit rep
    NOTICE:  CREATE TABLE will create implicit sequence '_rserv_servers__server_seq' for SERIAL column '_rserv_servers_.server'
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index '_rserv_servers__pkey' for table '_rserv_servers_'
    NOTICE:  CREATE TABLE / UNIQUE will create implicit index '_rserv_servers__host_key' for table '_rserv_servers_'
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index '_rserv_tables__pkey' for table '_rserv_tables_'
    NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
    NOTICE:  CREATE TABLE will create implicit trigger(s) for FOREIGN KEY check(s)
    
  6. Create remote user rserv for database replication with password BATyLAaf. Use something else than BATyLAaf, please!
    dpavlin@llin:~/rserv$ ssh luna "psql -c \"create user rserv with encrypted password 'BATyLAaf' nocreatedb nocreateuser\" rep"
    CREATE USER
    
  7. Allow connections on luna to user rserv from localhost with password. Make ssh connection to luna and edit /etc/postgresql/pg_hba.conf to include:
    # TYPE  DATABASE        IP_ADDRESS      MASK            AUTHTYPE  MAP
    host    rep             127.0.0.1       255.255.255     md5
    
    and then reload your database and test if it works.
    dpavlin@luna:~$ sudo /etc/init.d/postgresql reload
    postmaster successfully signaled
    dpavlin@luna:~$ psql -U rserv -W -h 127.0.0.1 rep
    Password:	enter BATyLAaf
    Welcome to psql, the PostgreSQL interactive terminal.
    ...
    rep=> \q
    
  8. Start ssh tunnel to PostgreSQL on luna. This tunnel with use ssh to tunnel port 5432 on luna to port 15432 on master host. This will provide encryption and optional compression (if you specify it using -C flag to ssh).
    dpavlin@llin:~/rserv$ ssh -N -L 15432:localhost:5432 luna &
    [1] 5197	this number if PID of ssh. it will be different.
    dpavlin@llin:~/rserv$ tunnel_pid=$!
    
  9. Optional step is to test if you can connect from master host to PostgreSQL on luna. If you don't do this and it doesn't work, you will get error message later...
    dpavlin@llin:~/rserv$ psql -U rserv -W -h 127.0.0.1 -p 15432 rep
    Password:	enter BATyLAaf
    Welcome to psql 7.3.3, the PostgreSQL interactive terminal.
    ...
    rep=> \q
    
  10. Init slave (remote) database.
    dpavlin@llin:~/rserv$ ./bin/SlaveInit --slavehost=localhost --slaveport=15432 --slaveuser=rserv --slavepassword=BATyLAaf rep rep
    
  11. Select which tables from master database will be replicated. This is known to be boring end error-some task, so let's see how we can make it easier.
    Please note that in original HOWTO is sentence:
    You can give a specific column name in the table, there is no restriction on this in this version (0.1), although the author's document says only this column will be replicated. Useful Bug !!! :-) Any column changed in the table will be properly updated in the master database.
    That is not true. This is not column name, but name of unique key. And since this data is used when using snapshots and updates I doubt that it will work correctly if you just enter anything. Also, it's quite possible that you will run into problems if you are using compound keys (that is, if your table has key which is composed of more than one field).

    Since, I'm using really simple schema for this HOWTO, I do following:

    dpavlin@llin:~/rserv$ ./bin/MasterAddTable rep a id
    NOTICE:  CreateTrigger: changing return type of function _rserv_log_() from OPAQUE to TRIGGER
    dpavlin@llin:~/rserv$ ./bin/MasterAddTable rep b id
    dpavlin@llin:~/rserv$ ./bin/MasterAddTable rep a_b a_id
    dpavlin@llin:~/rserv$ ./bin/MasterAddTable rep c foo
    
    You will notice that I'm not using unique key for table a_b. That is because you can specify just one column and unique key for that table is (a_id,b_id). You will also notice that table c has primary key id, and I'm entering column foo. That is intentional mistake so we can test what happens.
  12. Select which tables from slave database will be replicated.
    FIXME
    
  13. Start replication.
    ./Replicate --slavehost=localhost --slaveuser=rserv --slavepassword=BATyLAaf --slaveport=15432 rep rep
    
  14. Test replication.
  15. Kill ssh tunnel to luna
    $ kill $tunnel_pid
    

Warnings

Yes, this code is really based on RServ 0.1, but I consider it quite stable. However, don't try to play with RServ tables. If you change something and internal bookkeeping becomes inconsistent (remember, we are talking about two different databases, you can't have transactions between different databases -- yet!) you will have to find someone who can fix it. I must admit that authors which worked on RServ comes to mind. But, please use commands. They are designed to do hard work, not you.

Alternatives

Usogres.

Next steps

This project has public TODO list. Most importantly, next step is to create console GUI which will simplify this process to just few selects (using keyboard, not psql :-)

References