17 December 2024

Quick Benchmark: Improvements to Large Object Dumping in Postgres 17

Version 17 of PostgreSQL has been out for a while now. Among its many features is a change by Tom Lane called “Rearrange pg_dump’s handling of large objects for better efficiency”. In the past, we have repeatedly seen customers for whom a large number of large objects became a performance problem for dump/restore. The main reason is that large objects are quite unlike TOAST (The Oversized Attribute Storage Technique): while TOASTed data is completely transparent to the user, large objects are stored out-of-line in the pg_largeobject system table, and the owning table merely stores an OID that references the corresponding rows there.

Introduction To Large Objects

Here is an example of how large objects can be used:

postgres=# CREATE TABLE test(id BIGINT, blob OID);
CREATE TABLE
postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1
postgres=# SELECT * FROM test;
 id | blob
----+-------
  1 | 33280
(1 row)

postgres=# SELECT * FROM pg_largeobject;
 loid  | pageno |                    data
-------+--------+--------------------------------------------
 33280 |      0 | \x44656269616e20474e552f4c696e75782031320a
(1 row)

postgres=# SELECT lo_export(test.blob, '/tmp/foo') FROM test;
 lo_export
-----------
         1
(1 row)

postgres=# SELECT pg_read_file('/tmp/foo');
    pg_read_file
---------------------
 Debian GNU/Linux 12+

(1 row)

postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1

Now if we dump the database in custom format with both version 16 and 17 of pg_dump and then use pg_restore -l to display the table of contents (TOC), we see a difference:

$ for version in 16 17; do /usr/lib/postgresql/$version/bin/pg_dump -Fc -f lo_test_$version.dmp; \
> pg_restore -l lo_test_$version.dmp | grep -v ^\; > lo_test_$version.toc; done
$ diff -u lo_test_{16,17}.toc
--- lo_test_16.toc  2024-12-11 09:05:46.550667808 +0100
+++ lo_test_17.toc  2024-12-11 09:05:46.594670235 +0100
@@ -1,5 +1,4 @@
 215; 1259 33277 TABLE public test postgres
-3348; 2613 33280 BLOB - 33280 postgres
-3349; 2613 33281 BLOB - 33281 postgres
+3348; 2613 33280 BLOB METADATA - 33280..33281 postgres
 3347; 0 33277 TABLE DATA public test postgres
-3350; 0 0 BLOBS - BLOBS
+3349; 0 0 BLOBS - 33280..33281 postgres

The version 17 dump combines the large object metadata into a single BLOB METADATA entry covering a range of OIDs, so only one TOC entry is created for them.

Further, if we use the directory dump format, we see that pg_dump creates a file for each large object:

$ pg_dump -Fd -f lo_test.dir
$ ls lo_test.dir/
3347.dat.gz  blob_33280.dat.gz  blob_33281.dat.gz  blobs.toc  toc.dat

If there are only a few large objects, this is not a problem. But if the large object mechanism is used to create hundreds of thousands or millions of large objects, this will become a serious problem for pg_dump/pg_restore.

Finally, in order to fully remove the large objects, it does not suffice to drop the table; the large objects need to be unlinked as well:

postgres=# DROP TABLE test;
DROP TABLE
postgres=# SELECT COUNT(*) FROM pg_largeobject;
 count
-------
     2
(1 row)

postgres=# SELECT lo_unlink(loid) FROM pg_largeobject;
 lo_unlink
-----------
         1
         1
(2 rows)

postgres=# SELECT COUNT(*) FROM pg_largeobject;
 count
-------
     0
(1 row)
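In practice, you usually only want to unlink large objects that no table references anymore. A minimal sketch of such a targeted cleanup, assuming the test.blob column from above is the only place that stores large object OIDs (the contrib program vacuumlo automates this check across all OID columns in a database):

```sql
-- Unlink only those large objects that no row in test still references.
-- Assumes test.blob is the only column holding large object OIDs.
SELECT lo_unlink(m.oid)
FROM pg_largeobject_metadata AS m
WHERE NOT EXISTS (SELECT 1 FROM test WHERE test.blob = m.oid);
```

For databases with many lo-referencing columns, running vacuumlo periodically is the more robust option.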

Benchmark

We generate one million large objects in a PostgreSQL 16 instance:

lotest=# SELECT lo_create(id) FROM generate_series(1,1000000) AS id;
 lo_create
-----------
         1
         2
[...]
    999999
   1000000
(1000000 rows)

lotest=# SELECT COUNT(*) FROM pg_largeobject_metadata;
  count
---------
 1000000
(1 row)

We now dump the database with pg_dump from both versions 16 and 17, first in custom and then in directory format, using the time utility to track runtime and memory usage:

$ for version in 16 17; do echo -n "$version: "; \
> /usr/bin/time -f '%E %Mk mem' /usr/lib/postgresql/$version/bin/pg_dump \
> -Fc -f lo_test_$version.dmp lotest; done
16: 0:36.73 755692k mem
17: 0:34.69 217776k mem
$ for version in 16 17; do echo -n "$version: "; \
> /usr/bin/time -f '%E %Mk mem' /usr/lib/postgresql/$version/bin/pg_dump \
> -Fd -f lo_test_$version.dir lotest; done
16: 8:23.48 755624k mem
17: 7:51.04 217980k mem

Dumping in directory format takes much longer than in custom format, while memory usage is very similar for both formats. The runtime is slightly lower for version 17 than for version 16, but the big difference is in memory usage, which is about 3.5x lower.

The difference also becomes very clear when looking at the file size of the custom dump and at the size of the directory dump's table-of-contents file (toc.dat):

$ ls -lh lo_test_1?.dmp | awk '{print $5 " " $9}'
211M lo_test_16.dmp
29M lo_test_17.dmp
$ ls -lh lo_test_1?.dir/toc.dat | awk '{print $5 " " $9}'
185M lo_test_16.dir/toc.dat
6,9M lo_test_17.dir/toc.dat

The custom dump is roughly 7x smaller, while the toc.dat file of the directory dump is around 25x smaller. We also tested different numbers of large objects (from 50k to 1.5 million) and found only moderate variance in those ratios: the memory ratio increases from around 2x at 50k to 4x at 1.5 million, while the TOC ratio decreases from around 30x at 50k to 25x at 1.5 million.
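As a quick sanity check, the ratios quoted above can be recomputed from the measured numbers (taken directly from the benchmark output; awk is used here only for the floating-point division):

```shell
# Recompute the improvement ratios from the benchmark output above.
awk 'BEGIN{printf "memory ratio: %.1fx\n", 755692/217776}'  # 3.5x
awk 'BEGIN{printf "custom dump:  %.1fx\n", 211/29}'         # 7.3x
awk 'BEGIN{printf "toc.dat:      %.1fx\n", 185/6.9}'        # 26.8x
```

The toc.dat ratio for this run lands between the 25x and 30x observed across the different object counts.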

Conclusion

The changes to large object dumping in Postgres 17 are very welcome for users with a huge number of large objects. Memory requirements are much lower on PostgreSQL 17 than on earlier versions, for dumps in both custom and directory format.

Unfortunately, neither the number of files in a directory-format dump nor its total size changes much: each large object is still dumped into its own file, which can lead to problems if there are a lot of files:

$ for version in 16 17; do echo -n "$version: "; find lo_test_$version.dir/ | wc -l; done
16: 1000003
17: 1001002
$ du -s -h lo_test_??.dir
4,1G    lo_test_16.dir
3,9G    lo_test_17.dir

This might be an area for future improvements in Postgres 18 and beyond.

Categories: PostgreSQL®
Tags: planetpostgresql postgresql17

About the author

Michael Banck


Michael Banck has been an employee of credativ GmbH since 2009, a member of the Debian project since 2001, and is also active in other open source projects. As a member of credativ's database team, he has in recent years supported and advised numerous customers in solving problems with and in the day-to-day operation of PostgreSQL®, as well as in introducing high-availability database solutions.
