Identifying subdomains and top-level domains in a URI

Stephane Bortzmeyer bortzmeyer at nic.fr
Mon Jan 31 21:27:24 UTC 2005


On Sat, Jan 29, 2005 at 01:57:23PM -0800,
 stan <wanderingstan at gmail.com> wrote 
 a message of 17 lines which said:

> My challenge is to determine the base portion of the URI--stripped
> of subdomains but including top-level domains.  E.g., for
> "http://www.google.com" I need to get "google.com", and for
> "subdomain.domain.com.au", I need to get "domain.com.au".

Your examples do not match your requirement. The top-level domain for
www.google.com is "com" and for subdomain.domain.com.au, it is "au".
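To make the terminology concrete, here is a minimal Python sketch, assuming the input is either a full URI or a bare host name. The helper names `host_of` and `top_level_domain` are hypothetical. The top-level domain is just the last label, which is why it does not give what the original question asks for.

```python
from urllib.parse import urlsplit


def host_of(uri: str) -> str:
    """Extract the host name from a URI such as "http://www.google.com"."""
    host = urlsplit(uri).hostname  # lower-cased; port and userinfo stripped
    if host is None:
        raise ValueError(f"no host in {uri!r}")
    return host


def top_level_domain(host: str) -> str:
    """Return the last label of a host name; a trailing root dot is ignored."""
    return host.rstrip(".").split(".")[-1].lower()


print(top_level_domain(host_of("http://www.google.com")))  # com
print(top_level_domain("subdomain.domain.com.au"))         # au
```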

> My current naive system just takes the last two chunks, which means
> it thinks all web pages from Australia are the same site.  (They're
> all from "com.au"!)

There is no better algorithm. Even hardwiring the number of labels per
registry in a table does not work, because some registries, like "fr",
"dz" or "af", delegate both second-level and third-level domains.

> What's the intelligent way to do this?  

None. It is a funny question, because there has recently been a thread on
namedroppers (the IETF working group on DNS extensions) about this very
subject (in the context of the SPF protocol):

http://ops.ietf.org/lists/namedroppers/namedroppers.2005/msg00039.html



More information about the bind-users mailing list