dont-use-fsync real world impact

Sat Sep 7 22:19:02 UTC 2019

Jure Sah <e at juresah.si> wrote:

> The documentation clearly states that using the dont-use-fsync option is
> not recommended.
> 
> I am wondering what is the realistic impact of this? As I understand the
> kernel commits dirty pages to disk every 30 seconds by default, and this
> is configurable. Wouldn't this mean that at worst 30 seconds worth of
> leases are lost?

Yes, but that could be a rather serious loss of data for some operators. As always, there's no "one size fits all" answer; different operators will have different ideas on this.
Indeed, AIUI (from several years ago at least) the DHCP service in Windows Server massively outperformed the ISC DHCP server in benchmarks using out-of-the-box settings. The reason for this was that the MS server did NOT fsync its leases database and thus was vulnerable to exactly the issue you mention - also making it non-compliant with the relevant RFC.
However, in their defence, they have "sort of" moved that security aspect to clients by making the clients very sticky about their leases - more so than other clients in my observations. That doesn't fully prevent the problem of the server having no record of leases it has granted.
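
To make the trade-off concrete, here is a minimal sketch in Python of the difference between acknowledging a lease after a plain write and acknowledging it after an fsync. This is not how dhcpd is implemented; append_lease and leases_at_risk are hypothetical names, the record format is only illustrative, and the 30 second window is simply the figure from the question above.

```python
import os


def append_lease(path, record, use_fsync=True):
    """Append one lease record; optionally force it to stable storage.

    Hypothetical helper for illustration only, not dhcpd's actual code.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(record)
        # flush() only moves data from Python's buffer into the kernel page cache.
        f.flush()
        if use_fsync:
            # fsync() blocks until the kernel reports the data as written to the device.
            os.fsync(f.fileno())


def leases_at_risk(acks_per_second, writeback_window_seconds=30.0):
    """Rough upper bound on leases acknowledged to clients but not yet on disk.

    Assumes every ACK in the window is lost if the host dies before writeback.
    """
    return acks_per_second * writeback_window_seconds
```

With use_fsync=False the call returns as soon as the data is in the page cache, which is where the speed-up comes from and also where the exposure comes from. As a rough feel for the numbers, a server acknowledging 50 leases per second with a 30 second window could in the worst case have handed out about 1500 leases that it no longer knows about after a crash.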

> The leases file is in most cases relatively tiny (under 1 MB)

That's probably a generalisation too far. Mine (at home) is only 20k, but as Andrew Bell has already pointed out, some people do have large lease files.

> From the past correspondence from the mailing list archive I surmise
> that people usually work around this by using hardware cache that does
> not obey fsync, which simply offloads the problem from the kernel to the
> cache controller and only superficially solves the problem.

Yes, but no.
Yes, it offloads the problem; no, it's not just a superficial fix. A "proper" hardware cache will be battery backed and can survive a crash or power failure of the host. So if we assume we're talking about the hardware cache in a disk controller (e.g. a RAID controller) and the power goes off without the chance of an orderly shutdown, the battery-backed cache will hold the updates until the power comes back on again - at which point it will push the updates out to the disk(s).
There are other sorts of cache hardware. In the distant past I recall seeing (and drooling over!) a "magic box" that comprised a stack of RAM, some disks, a battery, and a controller. To the host it presented as a standard wide SCSI device (that dates it), while internally it was a big RAM disk. In the event of power failure, the battery would run the system long enough to write everything to disk.
In both cases (and others), under normal conditions it's safe to assume that if the "disk" comes back and says "yes that's written", then it's either been written or has been saved into battery backed cache that will survive problems such as host crashes or power failures. If the cache/disk subsystem fails in that promise, then that's really little different to having a normal disk fail and lose all your data.