Version 17 of PostgreSQL has been out for a while now. One of its many features is a change by Tom Lane called “Rearrange pg_dump’s handling of large objects for better efficiency”. In the past, we have seen several of our customers run into performance problems with dump/restore due to a large number of large objects. The main reason for this is that large objects are quite unlike TOAST (The Oversized Attribute Storage Technique): while TOASTed data is completely transparent to the user, large objects are stored out-of-line in the pg_largeobject table, and the link to a particular row in that table is an OID stored in the user table itself.
Here is an example of how large objects can be used:
postgres=# CREATE TABLE test(id BIGINT, blob OID);
CREATE TABLE
postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1
postgres=# SELECT * FROM test;
id | blob
----+-------
1 | 33280
(1 row)
postgres=# SELECT * FROM pg_largeobject;
loid | pageno | data
-------+--------+--------------------------------------------
33280 | 0 | \x44656269616e20474e552f4c696e75782031320a
(1 row)
postgres=# SELECT lo_export(test.blob, '/tmp/foo') FROM test;
lo_export
-----------
1
(1 row)
postgres=# SELECT pg_read_file('/tmp/foo');
pg_read_file
---------------------
Debian GNU/Linux 12+
(1 row)
postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1
Now if we dump the database in custom format with both version 16 and 17 of pg_dump and then use pg_restore -l to display the table of contents (TOC), we see a difference:
$ for version in 16 17; do /usr/lib/postgresql/$version/bin/pg_dump -Fc -f lo_test_$version.dmp; \
> pg_restore -l lo_test_$version.dmp | grep -v ^\; > lo_test_$version.toc; done
$ diff -u lo_test_{16,17}.toc
--- lo_test_16.toc 2024-12-11 09:05:46.550667808 +0100
+++ lo_test_17.toc 2024-12-11 09:05:46.594670235 +0100
@@ -1,5 +1,4 @@
215; 1259 33277 TABLE public test postgres
-3348; 2613 33280 BLOB - 33280 postgres
-3349; 2613 33281 BLOB - 33281 postgres
+3348; 2613 33280 BLOB METADATA - 33280..33281 postgres
3347; 0 33277 TABLE DATA public test postgres
-3350; 0 0 BLOBS - BLOBS
+3349; 0 0 BLOBS - 33280..33281 postgres
The dump taken with version 17 combines the large object metadata into a single BLOB METADATA entry covering the whole OID range, so there is only one entry in the TOC for them.
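A quick way to see this without reading the whole diff is to count the BLOB-related TOC entries in both dumps; a minimal sketch (not part of the original test run):
$ for version in 16 17; do echo -n "$version: "; \
> pg_restore -l lo_test_$version.dmp | grep -c BLOB; done
With version 16, every large object shows up as its own BLOB entry (plus the BLOBS data entry), while with version 17 the metadata entries are collapsed into OID ranges.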
Further, if we use the directory dump format, we see that pg_dump creates a file for each large object:
$ pg_dump -Fd -f lo_test.dir
$ ls lo_test.dir/
3347.dat.gz blob_33280.dat.gz blob_33281.dat.gz blobs.toc toc.dat
If there are only a few large objects, this is not a problem. But if the large object mechanism is used to create hundreds of thousands or millions of large objects, this becomes a serious problem for pg_dump/pg_restore.
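To check whether a database is affected before dumping it, counting the rows in pg_largeobject_metadata gives the number of large objects; a quick check along these lines:
postgres=# SELECT count(*) FROM pg_largeobject_metadata;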
Finally, in order to fully remove the large objects, it does not suffice to drop the table; the large objects need to be unlinked as well:
postgres=# DROP TABLE test;
DROP TABLE
postgres=# SELECT COUNT(*) FROM pg_largeobject;
count
-------
2
(1 row)
postgres=# SELECT lo_unlink(loid) FROM pg_largeobject;
lo_unlink
-----------
1
1
(2 rows)
postgres=# SELECT COUNT(*) FROM pg_largeobject;
count
-------
0
(1 row)
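Unlinking every entry in pg_largeobject by hand like this requires ownership of (or the necessary privileges on) each large object. For large objects that are no longer referenced by any OID column, the vacuumlo utility shipped with PostgreSQL can be used instead; a sketch not part of the original example (the -n flag only reports what would be removed):
$ vacuumlo -n -v postgres
$ vacuumlo postgres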
We generate one million large objects in a PostgreSQL 16 instance:
lotest=# SELECT lo_create(id) FROM generate_series(1,1000000) AS id;
lo_create
-----------
1
2
[...]
999999
1000000
(1000000 rows)
lotest=# SELECT COUNT(*) FROM pg_largeobject_metadata;
count
---------
1000000
(1 row)
We now dump the database with pg_dump from both version 16 and 17, first as a custom and then as a directory dump, using the time utility to track runtime and memory usage:
$ for version in 16 17; do echo -n "$version: "; \
> /usr/bin/time -f '%E %Mk mem' /usr/lib/postgresql/$version/bin/pg_dump \
> -Fc -f lo_test_$version.dmp lotest; done
16: 0:36.73 755692k mem
17: 0:34.69 217776k mem
$ for version in 16 17; do echo -n "$version: "; \
> /usr/bin/time -f '%E %Mk mem' /usr/lib/postgresql/$version/bin/pg_dump \
> -Fd -f lo_test_$version.dir lotest; done
16: 8:23.48 755624k mem
17: 7:51.04 217980k mem
Dumping using the directory format takes much longer than with the custom format, while memory usage is very similar for both formats. The runtime is slightly lower for version 17 compared to version 16, but the big difference is in the memory used, which is about 3.5x lower.
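Since the directory format is the only dump format that supports parallel dumps, the longer runtime could potentially be reduced with the -j option; a sketch we did not benchmark here, and how much it helps with many large objects depends on how the TOC entries can be distributed across the workers:
$ pg_dump -Fd -j 4 -f lo_test_parallel.dir lotest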
Also, when looking at the file size of the custom dump or the size of the directory dump's table-of-contents (TOC) file, the difference becomes very clear:
$ ls -lh lo_test_1?.dmp | awk '{print $5 " " $9}'
211M lo_test_16.dmp
29M lo_test_17.dmp
$ ls -lh lo_test_1?.dir/toc.dat | awk '{print $5 " " $9}'
185M lo_test_16.dir/toc.dat
6,9M lo_test_17.dir/toc.dat
The custom dump is roughly 7x smaller while the TOC file of the directory dump is around 25x smaller. We also tested for different numbers of large objects (from 50k to 1.5 million) and found only a slight variance in those ratios: the used memory ratio increases from around 2x at 50k to 4x at 1.5 million while the TOC ratio goes down from around 30x at 50k to 25x at 1.5 million.
The changes regarding dumps of large objects in Postgres 17 are very welcome for users with a huge number of large objects. Memory requirements are much lower on PostgreSQL 17 compared to earlier versions, both for dumps in custom and directory format.
Unfortunately, neither the number of files in the directory nor the directory size changes much: each large object is still dumped as its own file, which can lead to problems if there are a lot of files:
$ for version in 16 17; do echo -n "$version: "; find lo_test_$version.dir/ | wc -l; done
16: 1000003
17: 1001002
$ du -s -h lo_test_??.dir
4,1G lo_test_16.dir
3,9G lo_test_17.dir
This might be an area for future improvements in Postgres 18 and beyond.
About the author
Michael Banck has been with credativ GmbH since 2009 and has been a member of the Debian project since 2001; he is also active in other open source projects. As a member of credativ's database team, he has supported and advised numerous customers in recent years on solving problems with and the day-to-day operation of PostgreSQL®, as well as on introducing high-availability solutions in the database area.