Marsell Kukuljevic of Joyent wrote me to say (paraphrasing):

"I thought ZFS record size is variable: by default it's 128K, but write 2KB of data (assuming nothing else writes), then only 2KB writes to disk (excluding metadata). I assume this is affected by transaction groups, so if I write 6 2K files, it'll write a 12K record, but if I write 6 32K files, it'll write two records: 128K and 64K. That causes a problem with read and write magnification in future writes though, so I'm not sure if such behaviour makes sense. Maybe recordsize only affects writes within a file? I'm asking this in context of one of the recommendations in the evil tuning guide, to use a recordsize of 8K to match with Postgres' buffer size. Fair enough, I presume this means that records written to disk are then always at most 8KB (ignoring any headers and footers), but how does compression factor into this? I've noticed that Postgres compresses quite well. Assuming a recordsize of 8K, then it'd be about 3KB written to disk for that record (again, excluding all the metadata), right?"

The recordsize parameter enforces the size of the largest block written to a ZFS file system or volume. There is an excellent blog about the ZFS recordsize here. Note that ZFS does not always read/write recordsize bytes. For instance, a write of 2K to a file will typically result in at least one 2KB write (and maybe more than one for metadata). The recordsize is the largest block that ZFS will read/write. The interested reader can verify this by using DTrace on bdev_strategy(), left as an exercise. Also note that because of the way ZFS maintains information about allocated/free space on disk (i.e., spacemaps), a smaller recordsize should not result in more space or time being used to maintain that information.

Instead of repeating the blog post, let's do some experimenting. To make things easy (i.e., we don't want to sift through tens of thousands of lines of zdb(1M) output), we'll create a small pool and work with that. I'm assuming you are on a system that supports ZFS and has zdb.

# mkfile 100m /var/tmp/poolfile
# zpool create testpool /var/tmp/poolfile
# zfs get recordsize,compression testpool
NAME      PROPERTY     VALUE  SOURCE
testpool  recordsize   128K   default
testpool  compression  off    default
#

An alternative to using files (/var/tmp/poolfile) is to create a child dataset using the zfs command, and run zdb on the child dataset. This also cuts down on the amount of data displayed by zdb.

We'll start with the simplest case:

# dd if=/dev/zero of=/testpool/foo bs=128k count=1
1+0 records in
1+0 records out
# zdb -dddddddd testpool

Note that if you're following along, and don't see the "/foo" file in your output, run sync, or wait a few seconds. Generally, it can take up to 5 seconds before the data is on disk. This implies that zdb reads from disk, bypassing ARC (which is what you want for a file system debugger).

So, as Marsell notes, block size is variable.

# rm /testpool/foo
# dd if=/dev/zero of=/testpool/foo bs=2k count=1
# zdb -dddddddd testpool
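To make the recordsize/compression interaction concrete, here is a toy model in Python. This is not ZFS code: the function name and numbers are invented for illustration, and real ZFS additionally rounds allocations to sector boundaries and writes metadata. The model captures just the two behaviours discussed above: a file's data is split into records of at most recordsize bytes (a short tail record stays at its actual size), and when compression is on, each record is compressed independently, so a record can land on disk much smaller than recordsize.

```python
import zlib

def on_disk_sizes(data, recordsize=8192, compress=False):
    """Toy model of ZFS record sizing (illustration only, not real ZFS logic).

    Split `data` into records of at most `recordsize` bytes; the final
    record may be smaller. With compression, each record is compressed
    independently, as ZFS compresses per block.
    """
    records = [data[i:i + recordsize] for i in range(0, len(data), recordsize)]
    if compress:
        return [len(zlib.compress(r)) for r in records]
    return [len(r) for r in records]

# A 2K write stays a single 2K record, well under the 8K recordsize.
print(on_disk_sizes(b"x" * 2048))    # [2048]

# A 20K write becomes two full 8K records plus a 4K tail.
print(on_disk_sizes(b"x" * 20480))   # [8192, 8192, 4096]

# Highly repetitive data compresses far below the record size,
# matching Marsell's "8K record, about 3KB on disk" scenario.
print(on_disk_sizes(b"x" * 20480, compress=True))
```

Under this model, Marsell's guess is roughly right: with an 8K recordsize and compression on, a compressible 8K record is written at its compressed size, not at 8K.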