barriers do not explain the 50% drop in 2GB IOZone read performance.
Also, the article tested with nobarriers in 2.6.33-rc4, and although it helped the TPS for the PostgreSQL test, it still had a significant performance drop.
I think there must be other important issues than just barriers.
Sure it will hurt read performance; it's a forced sync no matter what the current operation is. The default commit interval on EXT4 is 5 secs btw.
http://www.mjmwired.net/kernel/Docum...stems/ext4.txt
The big point about comparing EXT3 to EXT4 is that EXT4, with its default mount parameters, protects your data at the cost of performance. That security doesn't come free.
AFAIK, that happens because the rename (metadata) can be committed before the write (data), and if you really need the write to be committed first, you're supposed to call fsync() between the two. And, unless I'm completely misunderstanding the scenario you're describing, it's not just "after a reboot", but "after a crash/power loss/other abnormal shutdown that occurs between the rename commit and the data commit".
Last edited by Ex-Cyber; 01-19-2010 at 11:42 AM.
Except no other current file system requires that, and 99.999% of all existing software doesn't do it. And even if much of that software is 'fixed', probably 90% of the people 'fixing' it won't realise that they also need to sync the directory to ensure that it works.
And one of the common uses is in shell scripts, where you'll have to sync the entire disk. Just to safely update a two-line file.
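For reference, the write / fsync() / rename / directory-sync sequence being argued about here can be sketched roughly as follows. This is only an illustration in Python, assuming a Linux/POSIX system; the function name atomic_replace and the ".tmp-" prefix are made up for the example and do not come from any particular application.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) via a temporary file and rename.

    Hypothetical helper for illustration; assumes a POSIX system where
    rename() within one filesystem is atomic and directories can be fsynced.
    """
    dirname = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory so the rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()               # push Python's buffer to the kernel
            os.fsync(f.fileno())    # ask the kernel to commit the data
        os.rename(tmp_path, path)   # swap the new file into place
    except BaseException:
        # Something failed before the rename completed: drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the containing directory so the rename itself is committed too.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_replace("settings.conf", b"a=1\n") should then leave either the old or the new contents after a crash, at the price of two syncs for a tiny file. Note that mkstemp creates the file with mode 0600, so a real program would probably also want to copy the original file's permissions.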
True, but 99% of Linux systems crash at some point, even if only because of a power failure; and I believe that ext4 as originally implemented could delay the data write up to a couple of minutes after the metadata, so the odds of this happening on a crash were high.
Applications should be able to rely on some basic, sane behaviour from a file system (such as a 'rename a b' leaving them with either file a or file b on the disk and not an empty file which never existed in the logical filesystem), with a few exceptions like databases which provide explicit guarantees to their users. File systems which don't behave in such a manner simply won't get used for anything which requires reliable storage, because no matter how fast they are they're not performing their most basic function of storing your data.
In addition, different users and different uses have different thresholds for data reliability: for example, I might not care if I lose a data file that I saved two minutes ago so long as I still have the data file which I wrote out five minutes ago... someone else might be incensed if they lose data that they wrote out two seconds ago. That kind of decision should not have to be made on a per-application basis ('Edit/Preferences/Do you care about your data?'), it should be part of the filesystem configuration.
The only argument I've seen for this behaviour is that 'Posix doesn't require us to do anything else'. But Posix doesn't require much of anything and I suspect that at least 90% of current software would fail on a system which only implements the absolute minimum Posix requirements.
There is some interesting discussion of these issues in the comments to Ubuntu bug #317781. Particularly interesting are Theodore Ts'o's comments #45, #54, #56:
https://bugs.edge.launchpad.net/ubun...ux/+bug/317781
Also, Ted's "Don't fear the fsync" blog entry is worthwhile:
http://thunk.org/tytso/blog/2009/03/...ear-the-fsync/