Identifying subdomains and top-level domains in a URI
bortzmeyer at nic.fr
Mon Jan 31 21:27:24 UTC 2005
On Sat, Jan 29, 2005 at 01:57:23PM -0800,
stan <wanderingstan at gmail.com> wrote
a message of 17 lines which said:
> My challenge is to determine the base portion of the URI--stripped
> of subdomains but including top-level domains. E.g., for
> "http://www.google.com" I need to get "google.com", and for
> "subdomain.domain.com.au", I need to get "domain.com.au".
Your examples do not match your requirement. The top-level domain for
www.google.com is "com" and for subdomain.domain.com.au, it is "au".
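To make the terminology concrete: the top-level domain is simply the rightmost label. A minimal Python sketch, assuming a bare host name as input (the function name is illustrative):

```python
def top_level_domain(host: str) -> str:
    """Return the top-level domain, i.e. the rightmost label of a host name."""
    # A trailing dot (fully-qualified form) does not change the TLD.
    return host.rstrip(".").rsplit(".", 1)[-1].lower()

print(top_level_domain("www.google.com"))           # com
print(top_level_domain("subdomain.domain.com.au"))  # au
```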
> My current naive system just takes the last two chunks, which means
> it thinks all web pages from Australia are the same site. (They're
> all from "com.au"!)
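The "last two chunks" approach described above can be sketched like this in Python (a minimal illustration, not the poster's actual code; `naive_base` is a hypothetical name). It shows why every Australian host collapses to "com.au":

```python
from urllib.parse import urlsplit

def naive_base(uri_or_host: str) -> str:
    """Keep only the last two labels of the host name."""
    # Accept either a full URI or a bare host name.
    if "://" in uri_or_host:
        host = urlsplit(uri_or_host).hostname or ""
    else:
        host = uri_or_host
    labels = host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])

print(naive_base("http://www.google.com"))    # google.com
print(naive_base("subdomain.domain.com.au"))  # com.au
print(naive_base("www.other.com.au"))         # com.au, same "site"
```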
There is no better algorithm. Not even hardwiring the number of labels
in a registry-indexed table works, because some registries like "fr",
"dz" or "af" delegate domains at both the second and the third level.
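A Python sketch of why a per-registry table fails. The table values and the zone `sub.fr` are hypothetical, chosen only to illustrate a registry that delegates at two depths. One fixed count per TLD is right for one depth and wrong for the other:

```python
# Hypothetical registry-indexed table: how many labels form the
# "base" domain under each TLD. Values are illustrative assumptions.
LABELS_PER_TLD = {"com": 2, "au": 3, "fr": 2}

def table_base(host: str) -> str:
    """Keep the last N labels, with N looked up by TLD (default 2)."""
    labels = host.rstrip(".").lower().split(".")
    n = LABELS_PER_TLD.get(labels[-1], 2)
    return ".".join(labels[-n:])

# Works for the two examples from the question:
print(table_base("www.google.com"))           # google.com
print(table_base("subdomain.domain.com.au"))  # domain.com.au
# Assume (hypothetically) that both example.fr and example.sub.fr
# are separately delegated domains under "fr":
print(table_base("www.example.fr"))           # example.fr (right)
print(table_base("www.example.sub.fr"))       # sub.fr (wrong)
```

Changing the "fr" entry to 3 only moves the error to the other case.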
> What's the intelligent way to do this?
None. Funny question, because there has been a thread on namedroppers
(the IETF working group on DNS extensions) recently about this very
subject (in the context of the SPF protocol):
More information about the bind-users