- Published on
How to generate fake data in Postgres with faker_fdw
- Authors
- Name
- Nico Prananta
- Follow me: @2co_p
Few months ago I wrote about how to populate a Postgres database quickly using generate_series
function. It is a function that generates a set of values, from start to stop, at the interval specified by step. But in my example, the samples have values selected from a limited set of options.
In this post, I'll show you a better way to generate fake data in Postgres using faker_fdw. You can follow the installation steps from the repository of faker_fdw to install faker_fdw. But if you want to quickly give it a try, you can simply use the following Dockerfile to create a Docker image that has Postgres and faker_fdw installed:
# Stage 1: Build environment
FROM postgres:16 AS builder
# Switch to root to install dependencies
USER root
# Install necessary build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
wget \
postgresql-server-dev-16 \
python3 python3-dev python3-setuptools python3-pip python3.11-venv
# Create a virtual environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Faker and faker_fdw
RUN pip3 install --no-cache-dir Faker
RUN pip3 install https://github.com/guedes/faker_fdw/archive/v0.2.4.zip
# Install multicorn2
RUN wget https://github.com/pgsql-io/multicorn2/archive/refs/tags/v2.5.tar.gz && \
tar -xvf v2.5.tar.gz && \
cd multicorn2-2.5 && \
make && \
make install
# Stage 2: Runtime environment
FROM postgres:16
# Copy the built artifacts from the builder stage
COPY /usr/lib/postgresql/16/lib/multicorn.so /usr/lib/postgresql/16/lib/
COPY /usr/lib/postgresql/16/lib/bitcode/multicorn /usr/lib/postgresql/16/lib/bitcode/multicorn
COPY /usr/share/postgresql/16/extension/multicorn* /usr/share/postgresql/16/extension/
COPY /opt/venv /opt/venv
# Set environment variable for the virtual environment
ENV PATH="/opt/venv/bin:$PATH"
# Install Python and required libraries
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 \
python3-dev \
libpython3.11 \
&& rm -rf /var/lib/apt/lists/*
# Set the library path to include Python libraries
ENV LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:${LD_LIBRARY_PATH}"
# Switch back to postgres user
USER postgres
# Expose the PostgreSQL port
EXPOSE 5432
# Set the default command to run when starting the container
CMD ["postgres"]
Then you can run the container with the following command:
docker run \
--name postgres-faker \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
postgres_faker_fdw
Once the container is running, run the following query in the Postgres database:
CREATE EXTENSION multicorn;
CREATE SERVER faker_srv
FOREIGN DATA WRAPPER multicorn
OPTIONS (wrapper 'faker_fdw.FakerForeignDataWrapper');
Then you can use the faker_srv
server to generate fake data. First, create a foreign table, then you can use the people
table to generate fake data:
CREATE FOREIGN TABLE "public".people_random (
ssn varchar,
name varchar,
address text,
msisdn text,
email text,
ipv4_public text
) SERVER faker_srv OPTIONS (max_results '100');
SELECT * FROM people_random;
You'll see something like this:
You can use any of the provider names from Faker as the table name here to generate the data that you want. For example, you can use credit_card_number
and credit_card_provider
as explained in the credit_card providers document to generate credit card numbers and providers.
One thing to note is that the foreign table will return different results every time you run the query. To make it deterministic, you can either set a seed option
alter foreign table fake.person options ( add seed '1234' );
or insert the data into a normal table first
create TABLE "public".people (
ssn varchar,
name varchar,
address text,
msisdn text,
email text,
ipv4_public text
);
insert into people select * from people_random;
Are you working in a team environment and your pull request process slows your team down? Then you have to grab a copy of my book, Pull Request Best Practices.