Optimizing PostgreSQL Foreign Data Wrappers for Multi-Database Integration

Master PostgreSQL Foreign Data Wrappers (FDW) for seamless multi-database integration, cross-database queries, and database federation. Complete guide with performance optimization and real-world examples.

Introduction

PostgreSQL Foreign Data Wrappers (FDW) enable seamless integration with external data sources, allowing you to query remote databases, files, and even web APIs as if they were local tables. This powerful feature enables database federation, cross-database joins, and centralized data access without complex ETL pipelines.

What are Foreign Data Wrappers?

FDW implements the SQL/MED (Management of External Data) standard, providing:

  • Transparent access: Query remote data using standard SQL
  • Virtual tables: No data duplication required
  • Real-time data: Always current, no sync delays
  • Join capability: Combine local and remote data
  • Write operations: Some FDWs support INSERT/UPDATE/DELETE

Common Use Cases

  1. Database Migration: Access legacy systems during migration
  2. Multi-tenant Architecture: Query across tenant databases
  3. Data Aggregation: Combine data from multiple sources
  4. Hybrid Cloud: Connect on-premise and cloud databases
  5. Reporting: Create centralized reporting database
  6. Microservices: Query across service boundaries

Available Foreign Data Wrappers

postgres_fdw (Built-in)

  • Connect to other PostgreSQL databases
  • Full feature support
  • Best performance

mysql_fdw

  • Connect to MySQL/MariaDB
  • Read and write support

oracle_fdw

  • Connect to Oracle databases
  • Commercial and open-source versions

mongodb_fdw

  • Access MongoDB collections
  • JSON data integration

file_fdw (Built-in)

  • Read CSV and text files
  • Log file analysis
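
As a quick illustration, file_fdw can expose a server-side CSV file as a table. A minimal sketch; the file path and column layout here are hypothetical:

```sql
CREATE EXTENSION file_fdw;

CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical CSV at /var/log/app/events.csv with a header row
CREATE FOREIGN TABLE app_events (
    event_time TIMESTAMP,
    level      TEXT,
    message    TEXT
)
SERVER file_server
OPTIONS (filename '/var/log/app/events.csv', format 'csv', header 'true');

SELECT level, COUNT(*) FROM app_events GROUP BY level;
```

The file must be readable by the PostgreSQL server process; file_fdw reads it on every query, so there is no staleness but also no indexing.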

Others

  • redis_fdw, sqlite_fdw, couchdb_fdw, etc.

Setting Up postgres_fdw

Step 1: Install Extension

-- On the local database
CREATE EXTENSION postgres_fdw;

Step 2: Create Foreign Server

CREATE SERVER remote_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (
        host 'remote-db.example.com',
        port '5432',
        dbname 'production_db',
        fetch_size '10000'
    );

Step 3: Create User Mapping

-- Map local users to remote credentials
CREATE USER MAPPING FOR postgres
    SERVER remote_db
    OPTIONS (
        user 'remote_user',
        password 'secure_password'
    );

-- Map all local users
CREATE USER MAPPING FOR PUBLIC
    SERVER remote_db
    OPTIONS (
        user 'readonly_user',
        password 'readonly_pass'
    );

Step 4: Create Foreign Tables

Manual table definition:

CREATE FOREIGN TABLE remote_users (
    id INTEGER,
    username VARCHAR(50),
    email VARCHAR(100),
    created_at TIMESTAMP
)
SERVER remote_db
OPTIONS (schema_name 'public', table_name 'users');

Import entire schema:

IMPORT FOREIGN SCHEMA public
    FROM SERVER remote_db
    INTO local_schema;

Import specific tables:

IMPORT FOREIGN SCHEMA public
    LIMIT TO (users, orders, products)
    FROM SERVER remote_db
    INTO public;

Step 5: Query Foreign Data

-- Simple query
SELECT * FROM remote_users WHERE created_at > NOW() - INTERVAL '7 days';

-- Join local and remote data
SELECT 
    l.order_id,
    l.amount,
    r.username,
    r.email
FROM local_orders l
JOIN remote_users r ON l.user_id = r.id
WHERE l.status = 'pending';

Advanced Configuration

Performance Tuning Options

CREATE SERVER fast_remote
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (
        host 'db.example.com',
        port '5432',
        dbname 'mydb',

        -- Fetch size: rows retrieved per round trip
        fetch_size '50000',

        -- Ask the remote server for cost estimates (runs EXPLAIN remotely)
        use_remote_estimate 'true',

        -- Keep connections open between queries (PostgreSQL 14+)
        keep_connections 'on',

        -- Commit/abort remote transactions in parallel (PostgreSQL 15+/16+)
        parallel_commit 'on',
        parallel_abort 'on'
    );

Table-Level Options

CREATE FOREIGN TABLE optimized_table (
    id INTEGER,
    data TEXT
)
SERVER remote_db
OPTIONS (
    schema_name 'public',
    table_name 'large_table',

    -- Override fetch size for this table
    fetch_size '100000',

    -- Use remote estimates for better query planning
    use_remote_estimate 'true'
);

Column-Level Options

CREATE FOREIGN TABLE mapped_columns (
    local_id INTEGER OPTIONS (column_name 'remote_user_id'),
    local_name TEXT OPTIONS (column_name 'remote_username')
)
SERVER remote_db
OPTIONS (table_name 'users');

Write Operations

Enable Write Operations

-- Grant local permissions (the user in the mapping also needs
-- write privileges on the remote table itself)
GRANT INSERT, UPDATE, DELETE ON remote_users TO app_user;

-- Insert data
INSERT INTO remote_users (username, email)
VALUES ('newuser', '[email protected]');

-- Update data
UPDATE remote_users
SET email = '[email protected]'
WHERE username = 'newuser';

-- Delete data
DELETE FROM remote_users WHERE username = 'newuser';

Transactions Across Databases

BEGIN;
    -- Insert into local table
    INSERT INTO local_orders (user_id, amount) VALUES (123, 99.99);

    -- Update remote table
    UPDATE remote_users SET last_order = NOW() WHERE id = 123;

    -- Both operations committed together
COMMIT;

Note: postgres_fdw does not use two-phase commit. The remote transaction is committed during the local commit, so a crash at exactly the wrong moment can leave the two databases inconsistent; treat cross-database writes accordingly and keep them idempotent where possible.

Multi-Database Federation

Connecting Multiple Databases

-- Connect to production database
CREATE SERVER prod_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'prod.example.com', dbname 'production');

CREATE USER MAPPING FOR PUBLIC SERVER prod_db
    OPTIONS (user 'readonly', password 'pass1');

-- Connect to analytics database
CREATE SERVER analytics_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'analytics.example.com', dbname 'warehouse');

CREATE USER MAPPING FOR PUBLIC SERVER analytics_db
    OPTIONS (user 'readonly', password 'pass2');

-- Connect to archive database
CREATE SERVER archive_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive.example.com', dbname 'historical');

CREATE USER MAPPING FOR PUBLIC SERVER archive_db
    OPTIONS (user 'readonly', password 'pass3');

Unified Views Across Databases

-- Import tables
IMPORT FOREIGN SCHEMA public 
    LIMIT TO (current_orders) 
    FROM SERVER prod_db INTO public;

IMPORT FOREIGN SCHEMA public 
    LIMIT TO (archived_orders) 
    FROM SERVER archive_db INTO public;

-- Create unified view
CREATE VIEW all_orders AS
SELECT *, 'current' AS source FROM current_orders
UNION ALL
SELECT *, 'archived' AS source FROM archived_orders;

-- Query across all databases
SELECT 
    source,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount
FROM all_orders
WHERE created_at > '2024-01-01'
GROUP BY source;
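
Federation can go further: since PostgreSQL 11, foreign tables can serve as partitions of a local partitioned table, giving a simple sharding setup with partition pruning. A sketch, assuming each remote server holds one year of orders (the table and server names are illustrative):

```sql
CREATE TABLE orders_federated (
    order_id   BIGINT,
    amount     NUMERIC(10,2),
    created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);

-- Recent data lives on the production server
CREATE FOREIGN TABLE orders_2024
    PARTITION OF orders_federated
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    SERVER prod_db
    OPTIONS (schema_name 'public', table_name 'orders_2024');

-- Older data lives on the archive server
CREATE FOREIGN TABLE orders_2023
    PARTITION OF orders_federated
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
    SERVER archive_db
    OPTIONS (schema_name 'public', table_name 'orders_2023');

-- Partition pruning sends this query only to the relevant server
SELECT SUM(amount)
FROM orders_federated
WHERE created_at >= '2024-06-01';
```

Unlike the UNION ALL view, the planner can prune partitions by the WHERE clause, so queries that touch one date range contact only one remote server.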

Connecting to MySQL

Install mysql_fdw

# Ubuntu/Debian
sudo apt install postgresql-15-mysql-fdw

# CentOS/RHEL
sudo yum install mysql_fdw_15

Configure MySQL Connection

CREATE EXTENSION mysql_fdw;

CREATE SERVER mysql_server
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (
        host 'mysql-db.example.com',
        port '3306'
    );

CREATE USER MAPPING FOR PUBLIC
    SERVER mysql_server
    OPTIONS (
        username 'mysql_user',
        password 'mysql_pass'
    );

CREATE FOREIGN TABLE mysql_products (
    id INTEGER,
    name VARCHAR(100),
    price DECIMAL(10,2)
)
SERVER mysql_server
OPTIONS (dbname 'ecommerce', table_name 'products');

-- Query MySQL data from PostgreSQL
SELECT * FROM mysql_products WHERE price > 100;

Performance Optimization

1. Use WHERE Clause Pushdown

PostgreSQL pushes filters to remote server:

-- Good: Filter applied remotely
SELECT * FROM remote_users WHERE country = 'US';

-- Check query plan
EXPLAIN (VERBOSE) SELECT * FROM remote_users WHERE country = 'US';
-- Look for: "Remote SQL: SELECT ... WHERE country = 'US'"

2. Limit Data Transfer

-- Bad: Fetches all columns
SELECT * FROM remote_large_table;

-- Good: Only needed columns
SELECT id, username FROM remote_large_table;

-- Good: Use LIMIT
SELECT * FROM remote_large_table LIMIT 1000;

3. Optimize Fetch Size

-- For large result sets
ALTER SERVER remote_db OPTIONS (SET fetch_size '50000');

-- For small, frequent queries
ALTER SERVER remote_db OPTIONS (SET fetch_size '1000');

4. Use Remote Estimates

ALTER SERVER remote_db OPTIONS (ADD use_remote_estimate 'true');

-- PostgreSQL will query remote EXPLAIN for accurate cost estimates

5. Index on Remote Tables

-- Ensure remote tables have appropriate indexes
-- Connect directly to remote database:

CREATE INDEX idx_users_country ON users(country);
CREATE INDEX idx_orders_created_at ON orders(created_at);

6. Materialized Views for Frequently Accessed Data

-- Cache remote data locally
CREATE MATERIALIZED VIEW cached_remote_data AS
SELECT * FROM remote_users WHERE active = true;

-- Refresh periodically
REFRESH MATERIALIZED VIEW cached_remote_data;

-- Auto-refresh with cron job or trigger
CREATE EXTENSION pg_cron;
SELECT cron.schedule('refresh-cache', '0 */6 * * *', 
    'REFRESH MATERIALIZED VIEW cached_remote_data');

Monitoring and Troubleshooting

View Foreign Servers

SELECT 
    srvname AS server_name,
    srvoptions AS options
FROM pg_foreign_server;

View Foreign Tables

SELECT 
    foreign_table_schema,
    foreign_table_name,
    foreign_server_name
FROM information_schema.foreign_tables;

Check User Mappings

SELECT 
    um.umuser::regrole AS local_user,
    fs.srvname AS foreign_server,
    um.umoptions AS options
FROM pg_user_mapping um
JOIN pg_foreign_server fs ON um.umserver = fs.oid;

Analyze Query Performance

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT * FROM remote_users WHERE created_at > NOW() - INTERVAL '1 day';

Connection Pooling Check

-- Run on the REMOTE server to see incoming FDW connections
-- (postgres_fdw reports application_name 'postgres_fdw' by default)
SELECT 
    datname,
    usename,
    application_name,
    client_addr,
    state
FROM pg_stat_activity
WHERE application_name LIKE '%fdw%';

Common Issues and Solutions

Issue 1: Connection Timeout

Error: could not connect to server

Solutions:

-- Add connection timeout
ALTER SERVER remote_db OPTIONS (ADD connect_timeout '10');

-- Check network connectivity
-- From shell:
psql -h remote-db.example.com -U user -d dbname

Issue 2: Slow Queries

Solutions:

-- Enable remote estimates
ALTER SERVER remote_db OPTIONS (ADD use_remote_estimate 'true');

-- Increase fetch size
ALTER SERVER remote_db OPTIONS (SET fetch_size '100000');

-- Check if WHERE clause is pushed down
EXPLAIN VERBOSE SELECT * FROM remote_table WHERE id > 1000;

Issue 3: Authentication Failure

Error: password authentication failed

Solutions:

-- Update user mapping
ALTER USER MAPPING FOR postgres
    SERVER remote_db
    OPTIONS (SET password 'new_password');

-- Check pg_hba.conf on remote server
-- Ensure it allows connections from local server

Issue 4: Column Type Mismatch

Error: column type mismatch

Solution:

-- Redefine the foreign table with compatible local types
CREATE FOREIGN TABLE remote_data (
    id INTEGER,
    data TEXT,
    created_at TIMESTAMP
)
SERVER remote_db
OPTIONS (table_name 'source_table');

-- Or import with correct types
DROP FOREIGN TABLE remote_data;
IMPORT FOREIGN SCHEMA public 
    LIMIT TO (source_table) 
    FROM SERVER remote_db INTO public;
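
When the types genuinely don't line up (for example, the remote column stores timestamps as strings), one pattern is to declare the column with a permissive type and cast it once in a view. The column and table names here are illustrative:

```sql
-- Declare the awkward column as TEXT so the scan always succeeds
CREATE FOREIGN TABLE remote_data_raw (
    id         INTEGER,
    data       TEXT,
    created_at TEXT   -- remote side stores this as a string
)
SERVER remote_db
OPTIONS (table_name 'source_table');

-- Cast once, in a view, so queries see the proper type
CREATE VIEW remote_data_typed AS
SELECT id, data, created_at::timestamp AS created_at
FROM remote_data_raw;
```

Note that the cast happens locally, so filters on created_at in this view will not be pushed down to the remote server.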

Security Considerations

1. Use Read-Only Accounts

-- On remote database, create limited user
CREATE ROLE fdw_readonly;
GRANT CONNECT ON DATABASE production_db TO fdw_readonly;
GRANT USAGE ON SCHEMA public TO fdw_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO fdw_readonly;

-- Use in user mapping
CREATE USER MAPPING FOR PUBLIC
    SERVER remote_db
    OPTIONS (user 'fdw_readonly', password 'safe_pass');

2. Secure Password Storage

-- Use a ~/.pgpass file instead of hardcoded passwords
-- ~/.pgpass format:
-- hostname:port:database:username:password

CREATE USER MAPPING FOR postgres
    SERVER remote_db
    OPTIONS (user 'remote_user');
    -- Password is read from .pgpass; note that mappings for
    -- non-superusers normally still require a password unless
    -- password_required 'false' is set (PostgreSQL 13+)

3. Network Security

# Use SSL connections
# In postgresql.conf on remote:
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'

-- Require SSL on the foreign server
CREATE SERVER secure_remote
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (
        host 'remote.example.com',
        dbname 'mydb',
        sslmode 'require'
    );

Best Practices

  1. Minimize data transfer: Select only needed columns
  2. Use indexes: Ensure remote tables are properly indexed
  3. Cache frequently accessed data: Use materialized views
  4. Monitor performance: Regular EXPLAIN ANALYZE
  5. Limit write operations: Prefer read-only for most FDW usage
  6. Use connection pooling: Enable keep_connections
  7. Set appropriate fetch_size: Tune based on query patterns
  8. Test failover scenarios: Plan for remote database unavailability
  9. Document foreign dependencies: Maintain clear documentation
  10. Regular security audits: Review user mappings and permissions

Conclusion

PostgreSQL Foreign Data Wrappers provide powerful database federation capabilities:

  • Seamless integration: Query remote data with standard SQL
  • Real-time access: No data duplication or sync delays
  • Multi-database support: PostgreSQL, MySQL, Oracle, and more
  • Performance: Query pushdown and optimization
  • Flexibility: Read and write operations

Start with read-only queries to remote databases, optimize performance with proper indexing and fetch sizes, then expand to more complex integrations as needed.