config help - scaling problem
marccp at srttel.com
Tue Mar 2 15:40:17 UTC 2010
Thanks all for your responses! I had my back up against the wall, but now I at least have a few more things to try.
We've had a lot of problems in years past with the DHCP failover protocol, mostly "communications interrupted" states and all leases being owned by the peer, so we're currently using a warm standby with frequent backups of the .conf. As noted, we really don't care about the actual state of the leases, so we don't really need the .leases file at all except for the operation of the server.
Since all of our environment comes from the .conf and not the .leases, does anyone see any reason why we shouldn't have one or more other servers also actively serving DHCP (as was recommended)? If I were to point my relays at several DHCP servers, it seems I'd get more redundancy and failover without having to use any failover protocol, which is a benefit in my mind. Any flaws in that thinking?
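If several servers answer purely from the .conf, they would presumably all need to run identical copies of it, or a client could get different answers depending on which server replies first. A minimal sketch of a check for that, assuming the copies have already been pulled onto one machine (the file paths and function names here are hypothetical, not part of any DHCP tooling):

```python
import hashlib
from pathlib import Path


def conf_digest(path):
    """Return the SHA-256 hex digest of a config file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def configs_match(paths):
    """Return True when every listed config file has identical contents."""
    digests = {conf_digest(p) for p in paths}
    # Zero or one distinct digest means no two files differ.
    return len(digests) <= 1


if __name__ == "__main__":
    # Hypothetical local copies of each server's dhcpd.conf.
    copies = ["server-a/dhcpd.conf", "server-b/dhcpd.conf"]
    if all(Path(p).exists() for p in copies):
        print("in sync" if configs_match(copies) else "configs differ")
```

This compares bytes only, so a whitespace or comment change counts as a difference. That seems the safer default when the point is that every server behaves the same.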
For those of you using the failover protocol, have you had an outage because of it in the past? Do you consider it reliable now? Does the benefit of failover outweigh the risk of an outage for you? We've had so few hardware failures that protocol problems, albeit on versions that are very old compared to today's, accounted for five times the downtime or more in comparison. Perhaps this is one of those YMMV cases, and mine did vary?
We try to keep up to date on the software, but we also don't upgrade from a stable release unless there's something of value to us in the release notes - as such we're currently on 3.1.1. It appeared to me that the major difference in choosing between 3.1.0 and 4.x was whether or not we wanted IPv6 support. I'll definitely give 4.1.1 a try in our lab and see if I can use David's suggestions to reach a performance baseline we like. I knew we were I/O bound, but I didn't know about the feature changes that alleviate this particular problem in environments like mine. It sounds like I would want to use both the delayed fsync() and host option matching - is there a release that supports both of these? From David's reply, I'm inclined to think that 4.1.1+ has delayed fsync(), and 4.2+ has host option matching, but 4.2+ has a bug that breaks the former. Is that right, or is it just broken with failover?
I'm not a Sun or disk guru, but the V240 has an Ultra160 SCSI disk - since we're worried about I/O and not CPU, I assumed this was a pretty decent box for the task. If that isn't the case, it's definitely within scope to get more/different hardware. The V240 was a remnant of an overly expensive failed service deployment, so we just decided to depreciate it ;-) I suspect that between the 4.x features and the ramdisk (which we do also back up), the V240 should have plenty of horsepower - yeah?
Thank you David for further explaining how sub-classes work. That would be even more disastrous than the linear search that it seems we're stuck with.
Thanks again for all the replies - I'll report back after testing out the 4.x features.