[BALUG-Admin] Some web stats by domains, etc.

Michael Paoli Michael.Paoli@cal.berkeley.edu
Fri Mar 19 04:33:28 UTC 2021


Some web stats by domains, etc.
I was interested, thought others might be curious/interested too, so ...
I've got the log rotation set up so it retains a bit over a year's worth of
web server logs
$ awk '{if($1=="rotate" || $1 !~ /^#/ && $1 ~ /ly$/)print}' \
  /etc/logrotate.d/apache2
         weekly
         rotate 60
so, a fair bit of data available (60 weekly rotations, about 1.15 years).
This web server runs on the balug Virtual Machine (VM) host,
which serves not only BALUG, a [Linux] User Group ([L]UG),
but others too.  Anyway, grabbed the data from the logs, and did a wee
bit of analysis.  This is from the {ssl_,}access.log* files.

So, basically analyzed domain and port, here's the information/analysis,
and counts, from the highest levels on down to that:

13945791 Total
That's the total traffic seen, all sites/domains that got logged.
That's approximately:
   232430    per week (13945791/60)
    33204    per day (13945791/60/7)
     1384    per hour (13945791/60/7/24)
       23    per minute (13945791/60/7/24/60)
        0.38 per second (13945791/60/7/24/60/60)
So about one per 2.6 seconds (1/(13945791/60/7/24/60/60))
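For anyone wanting to redo that arithmetic, here's a minimal sketch in Python. It assumes, as the figures above do, that the total spans 60 weeks of logs; the variable names are just for illustration.

```python
# Total logged requests and the number of weeks it is spread over.
# The 60-week divisor is the same approximation used in the post.
TOTAL = 13945791
WEEKS = 60

per_week = TOTAL / WEEKS
per_day = per_week / 7
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60
seconds_per_request = 1 / per_second

for label, value in [
    ("week", per_week),
    ("day", per_day),
    ("hour", per_hour),
    ("minute", per_minute),
    ("second", per_second),
]:
    print(f"{value:12.2f} per {label}")
print(f"about one per {seconds_per_request:.1f} seconds")
```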

By [L]UG or project or the like.  Note also that test
traffic isn't excluded, so, e.g. some domains that aren't in DNS or
so delegated (or not yet or no longer in DNS) also show a wee bit o'
traffic.
  9504843 BALUG
  3463857 BerkeleyLUG
   968724 SF-LUG
     8283 digitalwitness.org
       51 BAD
       33 BUUG

In a bit more detail, by relevant base domain:
  9504843 balug.org
  3457611 berkeleylug.com
   896560 sf-lug.org
    40847 sf-lug.com
    13698 sflug.org
     8283 digitalwitness.org
     7458 sflug.net
     6246 berkeleylug.org
     6084 sflug.com
     4077 sf-lug.net
       51 bad.debian.net
       33 buug.org
Note that for all of SF-LUG's various domains,
traffic drops by >20x once we leave the canonical domain ([www.]sf-lug.org)
and drop to 1st runner up in SF-LUG traffic, >65x to 2nd runner up, and
>120x for 3rd runner up.
For BerkeleyLUG, the canonical BerkeleyLUG.com has >550x the traffic of
the obscure and almost completely unknown (was never canonical nor
particularly promoted) BerkeleyLUG.org ...
however BerkeleyLUG.org no longer exists, so it was something >1/550 of
the traffic when it still existed (but even back then I recall it being
a relatively negligible portion of traffic).
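Those ratios can be checked straight from the per-domain counts. A small sketch in Python, with the counts copied from the list above; the ratio() helper is just an illustrative name.

```python
# Per-domain request counts, copied from the list in the post.
domain_counts = {
    "sf-lug.org": 896560,
    "sf-lug.com": 40847,
    "sflug.org": 13698,
    "sflug.net": 7458,
    "berkeleylug.com": 3457611,
    "berkeleylug.org": 6246,
}


def ratio(canonical, other, counts=domain_counts):
    """How many times more traffic the canonical domain got than the other."""
    return counts[canonical] / counts[other]


for other in ("sf-lug.com", "sflug.org", "sflug.net"):
    print(f"sf-lug.org vs {other}: {ratio('sf-lug.org', other):.1f}x")
print(f"berkeleylug.com vs berkeleylug.org: "
      f"{ratio('berkeleylug.com', 'berkeleylug.org'):.1f}x")
```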

So, by domain ... secure.balug.org is somewhat surprising.  It's probably
mostly from "bad" or not so well behaved bots, poking at the BALUG wiki,
trying to log in there, and getting redirected ... or maybe such bots
thinking there's something more "interesting" to go after because it's
got "secure" in the name?  Maybe ought to phase that one out, not nearly
so relevant anymore.  I think the original idea is that one would
redirect to force https, but I believe now all the domains offer https,
and as/where relevant (e.g. wiki login and after doing so with cookie
that has authenticated state) redirection can be done from http to
https with same domain - no need for a separate domain for that.
Also a bit surprising is BerkeleyLUG.com being as high as it is.
It's WordPress, and bots - legitimate, or not so, and search engines,
well, WordPress effectively "expands" to a quite huge set of unique
URLs, even if the content of each isn't all that incredibly unique.
E.g. I remember earlier trying to crawl the site when it was still
hosted by WordPress.com - it expanded into a huge amount of content
that wasn't particularly feasible to replicate in-place as-is without
WordPress - especially relative to the actual full data in/behind
the site - a much more manageable and smaller set of data.  So that
may, at least partially, explain the somewhat surprising large number
there.
Likewise, BALUG's wiki - lots of content - especially if one crawls the
archive of all older versions of all pages.
And BALUG's list - every posting can be individually crawled, so, lots of
URLs, and probably keeps search engines / bots relatively busy.
And one other that's slightly surprising ... the mx entries.
Really nothing promoting that - or even configured for that as web,
other than it existing in DNS.  So that's probably mostly "bad" bots
and/or a bit of test traffic.
Also, pi.berkeleylug.com is slightly, but not entirely redundant with
https://berkeleylug.com/Pi.BerkeleyLUG/
Notably pi.berkeleylug.com 302 redirects to the above ...
however also, there's account and sudo and dynamic DNS highly available
to pi.berkeleylug - so if/whenever they / that SIG wants it to go to
somewhere else in DNS, it's readily available for that.  But if the
DNS & web traffic hits the BALUG host's web server, it 302 ("temporary")
redirects as noted above.
  4467098 secure.balug.org
  3408433 berkeleylug.com
  2083824 www.balug.org
  1692238 www.wiki.balug.org
   998636 lists.balug.org
   875806 www.sf-lug.org
   185888 balug.org
    47829 www.berkeleylug.com
    42223 www.archive.balug.org
    23561 www.sf-lug.com
    17286 sf-lug.com
    15402 sf-lug.org
    14875 old-debian.balug.org
     8993 sflug.org
     5372 www.digitalwitness.org
     5175 wiki.balug.org
     5047 berkeleylug.org
     4705 www.sflug.org
     4295 sflug.com
     4052 sflug.net
     3841 www.ipv4.sf-lug.org
     3406 www.sflug.net
     3291 www.new.balug.org
     3291 sf-lug.net
     2911 digitalwitness.org
     2819 www.test.balug.org
     2519 www.ipv4.balug.org
     2016 www.beta.balug.org
     1789 www.sflug.com
     1225 www.php.test.balug.org
     1199 www.berkeleylug.org
     1182 ipv4.sf-lug.org
     1096 mx.balug.org
      966 ipv4.balug.org
      786 www.sf-lug.net
      753 www.pi.berkeleylug.com
      611 mx.lists.balug.org
      596 pi.berkeleylug.com
      266 www.ipv6.balug.org
      262 www.ipv6.sf-lug.org
       77 ipv6.balug.org
       67 ipv6.sf-lug.org
       51 bad.debian.net
       25 www.buug.org
        8 buug.org
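The per-domain totals earlier can be reproduced from per-host counts like these by rolling each host up to its base domain. Here's a rough sketch in Python of one way to do that, not necessarily how the numbers above were produced: BASE_DOMAINS, base_domain() and roll_up() are illustrative names, and only a subset of the hosts is included to keep it short.

```python
from collections import Counter

# Base domains to roll up to (a subset of those in the post).
BASE_DOMAINS = ["sf-lug.org", "sf-lug.com", "bad.debian.net"]

# Per-host counts copied from the list above (SF-LUG .org/.com hosts only).
host_counts = {
    "www.sf-lug.org": 875806,
    "sf-lug.org": 15402,
    "www.ipv4.sf-lug.org": 3841,
    "ipv4.sf-lug.org": 1182,
    "www.ipv6.sf-lug.org": 262,
    "ipv6.sf-lug.org": 67,
    "www.sf-lug.com": 23561,
    "sf-lug.com": 17286,
}


def base_domain(host):
    """Return the listed base domain that host equals or is a subdomain of."""
    for base in BASE_DOMAINS:
        if host == base or host.endswith("." + base):
            return base
    return host  # hosts under no listed base domain are kept as-is


def roll_up(counts):
    """Sum per-host counts into per-base-domain totals."""
    totals = Counter()
    for host, n in counts.items():
        totals[base_domain(host)] += n
    return totals


totals = roll_up(host_counts)
for domain, n in totals.most_common():
    print(f"{n:9d} {domain}")
```

Matching against an explicit list, rather than just taking the last two labels, keeps bad.debian.net as its own entry instead of folding it into debian.net.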

And, nothing horribly surprising here, given the above.  This is
essentially same again, but broken down by port.
Also, if we total up all of http (:80) and https (:443) we get:
  7263155 :80  (http)
  6682636 :443 (https)
And by domain and port:
  3570578 secure.balug.org:80
  2942325 berkeleylug.com:443
  1762326 www.balug.org:443
  1355057 www.wiki.balug.org:80
   896520 secure.balug.org:443
   834027 www.sf-lug.org:80
   616148 lists.balug.org:443
   466108 berkeleylug.com:80
   382488 lists.balug.org:80
   337181 www.wiki.balug.org:443
   321498 www.balug.org:80
   140573 balug.org:80
    45315 balug.org:443
    43816 www.berkeleylug.com:80
    41779 www.sf-lug.org:443
    29510 www.archive.balug.org:80
    22723 www.sf-lug.com:80
    15730 sf-lug.com:80
    14827 old-debian.balug.org:80
    12713 www.archive.balug.org:443
    11427 sf-lug.org:80
     6618 sflug.org:80
     4604 berkeleylug.org:80
     4420 www.digitalwitness.org:80
     4134 wiki.balug.org:80
     4013 www.berkeleylug.com:443
     3975 sf-lug.org:443
     3548 sflug.com:80
     3263 sflug.net:80
     3251 www.sflug.org:80
     2738 sf-lug.net:80
     2485 digitalwitness.org:80
     2462 www.test.balug.org:80
     2394 www.ipv4.sf-lug.org:80
     2375 sflug.org:443
     2244 www.new.balug.org:80
     1964 www.sflug.net:80
     1767 www.beta.balug.org:80
     1556 sf-lug.com:443
     1454 www.sflug.org:443
     1447 www.ipv4.sf-lug.org:443
     1442 www.sflug.net:443
     1409 www.sflug.com:80
     1392 www.ipv4.balug.org:443
     1127 www.ipv4.balug.org:80
     1047 www.new.balug.org:443
     1041 wiki.balug.org:443
      975 www.php.test.balug.org:80
      952 www.digitalwitness.org:443
      879 ipv4.sf-lug.org:80
      838 www.sf-lug.com:443
      789 sflug.net:443
      787 www.berkeleylug.org:80
      747 sflug.com:443
      736 ipv4.balug.org:80
      718 www.pi.berkeleylug.com:80
      599 mx.balug.org:80
      553 sf-lug.net:443
      497 mx.balug.org:443
      463 www.sf-lug.net:80
      443 berkeleylug.org:443
      426 digitalwitness.org:443
      412 www.berkeleylug.org:443
      381 mx.lists.balug.org:80
      380 www.sflug.com:443
      376 pi.berkeleylug.com:80
      357 www.test.balug.org:443
      323 www.sf-lug.net:443
      303 ipv4.sf-lug.org:443
      250 www.php.test.balug.org:443
      249 www.beta.balug.org:443
      230 mx.lists.balug.org:443
      230 ipv4.balug.org:443
      220 pi.berkeleylug.com:443
      194 www.ipv6.sf-lug.org:80
      173 www.ipv6.balug.org:80
       93 www.ipv6.balug.org:443
       68 www.ipv6.sf-lug.org:443
       67 ipv6.balug.org:443
       60 ipv6.sf-lug.org:443
       48 old-debian.balug.org:443
       45 bad.debian.net:80
       35 www.pi.berkeleylug.com:443
       18 www.buug.org:80
       10 ipv6.balug.org:80
        7 www.buug.org:443
        7 ipv6.sf-lug.org:80
        6 bad.debian.net:443
        4 buug.org:80
        4 buug.org:443
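For the curious, counts like the above can be had by tallying a host:port field from each log line. Below is a minimal sketch in Python. It assumes each line starts with a "vhost:port" field (as with Apache's vhost_combined style LogFormat using %v:%p); the actual log format on this host may differ, and the sample lines are made up for illustration.

```python
from collections import Counter


def tally(lines):
    """Count requests per 'host:port' and per port.

    Assumes the first whitespace-separated field of each line is
    'vhost:port'; lines that don't fit that shape are skipped.
    """
    by_host_port = Counter()
    by_port = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        host, sep, port = fields[0].rpartition(":")
        if not sep or not host or not port.isdigit():
            continue  # first field is not host:port
        by_host_port[f"{host.lower()}:{port}"] += 1
        by_port[port] += 1
    return by_host_port, by_port


# Hypothetical sample lines (documentation IP addresses, made-up requests).
SAMPLE = [
    'www.balug.org:443 192.0.2.1 - - [19/Mar/2021:04:00:00 +0000] "GET / HTTP/1.1" 200 512',
    'www.balug.org:443 192.0.2.2 - - [19/Mar/2021:04:00:01 +0000] "GET / HTTP/1.1" 200 512',
    'secure.balug.org:80 198.51.100.7 - - [19/Mar/2021:04:00:02 +0000] "GET / HTTP/1.1" 301 0',
    '',
]

by_host_port, by_port = tally(SAMPLE)
for key, count in by_host_port.most_common():
    print(f"{count:9d} {key}")
```

For real use one would feed it lines from the {ssl_,}access.log* files, opening any compressed rotations with the gzip module.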

Anyway, maybe some day I'll get some "real" web reporting in place.
It is on the todo list, but ...
$ wc todo
   6156  22307 188878 todo
$
It's also not a FIFO list, nor FILO.
It's mostly a priority interrupt driven stack that gets a lot
of reordering applied to it, and tends to grow much more than it
shrinks.  Well, at least I don't have to worry about running
out of stuff to do - have well over a lifetime's worth o' stuff on the
list.
More BALUG and [L]UG specific lists may be found on BALUG's wiki,
though they're not necessarily complete nor current.



