[BALUG-Admin] Some web stats by domains, etc.
Michael Paoli
Michael.Paoli@cal.berkeley.edu
Fri Mar 19 04:33:28 UTC 2021
Some web stats by domains, etc.
I was interested, thought others might be curious/interested too, so ...
I've got the log rotation set up so it retains a bit over a year's worth of
web server logs
$ awk '{if($1=="rotate" || $1 !~ /^#/ && $1 ~ /ly$/)print}' /etc/logrotate.d/apache2
weekly
rotate 60
so, fair bit of data available.
This web server runs on the balug Virtual Machine (VM) host,
which covers for not only the BALUG [Linux] User Group ([L]UG),
but others too. Anyway, grabbed the data from the logs, and did a wee
bit of analysis. This is from the {ssl_,}access.log* files.
So, basically analyzed domain and port, here's the information/analysis,
and counts, from the highest levels on down to that:
13945791 Total
That's the total traffic seen, all sites/domains that got logged.
That's approximately:
232430 per week (13945791/60)
33204 per day (13945791/60/7)
1384 per hour (13945791/60/7/24)
23 per minute (13945791/60/7/24/60)
0.38 per second (13945791/60/7/24/60/60)
So about one per 2.6 seconds (1/(13945791/60/7/24/60/60))
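For anyone wanting to redo that arithmetic with a different total or retention, here's a minimal sketch. It assumes exactly 60 weeks of data (from the weekly / rotate 60 settings) and ignores the current, not-yet-rotated log; the function name rates is just made up for illustration.

```shell
# Per-interval request rates from a total count and a number of weeks.
# Assumption: the total covers exactly that many full weeks.
rates() {
    awk -v total="$1" -v weeks="$2" 'BEGIN {
        per_week = total / weeks
        per_day  = per_week / 7
        per_hour = per_day / 24
        per_min  = per_hour / 60
        per_sec  = per_min / 60
        printf "%.0f per week\n",   per_week
        printf "%.0f per day\n",    per_day
        printf "%.0f per hour\n",   per_hour
        printf "%.0f per minute\n", per_min
        printf "%.2f per second\n", per_sec
        printf "one per %.1f seconds\n", 1 / per_sec
    }'
}

rates 13945791 60
```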
Next, the breakdown by [L]UG or project or whatever and the like. Note also that test
traffic isn't excluded, so, e.g. some domains that aren't in DNS or
so delegated (or not yet or no longer in DNS) also show a wee bit o'
traffic.
9504843 BALUG
3463857 BerkeleyLUG
968724 SF-LUG
8283 digitalwitness.org
51 BAD
33 BUUG
In a bit more detail, by relevant domain:
9504843 balug.org
3457611 berkeleylug.com
896560 sf-lug.org
40847 sf-lug.com
13698 sflug.org
8283 digitalwitness.org
7458 sflug.net
6246 berkeleylug.org
6084 sflug.com
4077 sf-lug.net
51 bad.debian.net
33 buug.org
Note that for all of SF-LUG's various domains,
traffic drops by >20x once we leave the canonical domain ([www.]sf-lug.org)
and drop to the 1st runner-up in SF-LUG traffic, >65x to the 2nd runner-up, and
>120x for the 3rd runner-up.
For BerkeleyLUG, the canonical BerkeleyLUG.com has >550x the traffic of
the obscure and almost completely unknown (was never canonical nor
particularly promoted) BerkeleyLUG.org ...
however BerkeleyLUG.org no longer exists, so it was something >1/550 of
the traffic when it still existed (but even back then I recall it being
a relatively negligible portion of traffic).
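Those ratios can be checked straight from the per-domain counts listed above. A quick sketch, where ratio is just a throwaway helper name and not anything from the actual analysis:

```shell
# Ratio of two request counts, to one decimal place.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

ratio 896560 40847    # sf-lug.org vs. sf-lug.com (1st runner-up)
ratio 896560 13698    # sf-lug.org vs. sflug.org  (2nd runner-up)
ratio 896560 7458     # sf-lug.org vs. sflug.net  (3rd runner-up)
ratio 3457611 6246    # berkeleylug.com vs. berkeleylug.org
```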
So, by domain ... secure.balug.org is somewhat surprising. It's probably
mostly from "bad" or not so well behaved bots, poking at the BALUG wiki,
trying to login there, and getting redirected ... or maybe such bots
thinking there's something more "interesting" to go after because it's
got "secure" in the name? Maybe ought to phase that one out, not nearly
so relevant anymore. I think the original idea was that one would
redirect to force https, but I believe now all the domains offer https,
and as/where relevant (e.g. wiki login and after doing so with cookie
that has authenticated state) redirection can be done from http to
https with same domain - no need for a separate domain for that.
Also a bit surprising is BerkeleyLUG.com being as high as it is.
It's WordPress, and for bots (legitimate or not so) and search engines,
WordPress effectively "expands" to a quite huge set of unique
URLs, even if the content of each isn't all that incredibly unique.
E.g. I remember earlier trying to crawl the site when it was still
hosted by WordPress.com - it expanded into a huge amount of content
that wasn't particularly feasible to replicate in-place as-is without
WordPress - especially relative to the actual full data in/behind
the site - a much more manageable and smaller set of data. So that
may, at least partially, explain the somewhat surprising large number
there.
Likewise, BALUG's wiki - lots of content - especially if one crawls the
archive of all older versions of all pages.
And BALUG's list - every posting can be individually crawled, so, lots of
URLs, and probably keeps search engines / bots relatively busy.
And one other that's slightly surprising ... the mx entries.
Really nothing promoting those - or even configured for them as web,
other than their existing in DNS. So that's probably mostly "bad" bots
and/or a bit of test traffic.
Also, pi.berkeleylug.com is slightly, but not entirely redundant with
https://berkeleylug.com/Pi.BerkeleyLUG/
Notably pi.berkeleylug.com 302 redirects to the above ...
however also, there's account and sudo and dynamic DNS highly available
to pi.berkeleylug - so if/whenever they / that SIG wants it to go to
somewhere else in DNS, it's readily available for that. But if the
DNS & web traffic hits the BALUG host's web server, it 302 ("temporary")
redirects as noted above.
4467098 secure.balug.org
3408433 berkeleylug.com
2083824 www.balug.org
1692238 www.wiki.balug.org
998636 lists.balug.org
875806 www.sf-lug.org
185888 balug.org
47829 www.berkeleylug.com
42223 www.archive.balug.org
23561 www.sf-lug.com
17286 sf-lug.com
15402 sf-lug.org
14875 old-debian.balug.org
8993 sflug.org
5372 www.digitalwitness.org
5175 wiki.balug.org
5047 berkeleylug.org
4705 www.sflug.org
4295 sflug.com
4052 sflug.net
3841 www.ipv4.sf-lug.org
3406 www.sflug.net
3291 www.new.balug.org
3291 sf-lug.net
2911 digitalwitness.org
2819 www.test.balug.org
2519 www.ipv4.balug.org
2016 www.beta.balug.org
1789 www.sflug.com
1225 www.php.test.balug.org
1199 www.berkeleylug.org
1182 ipv4.sf-lug.org
1096 mx.balug.org
966 ipv4.balug.org
786 www.sf-lug.net
753 www.pi.berkeleylug.com
611 mx.lists.balug.org
596 pi.berkeleylug.com
266 www.ipv6.balug.org
262 www.ipv6.sf-lug.org
77 ipv6.balug.org
67 ipv6.sf-lug.org
51 bad.debian.net
25 www.buug.org
8 buug.org
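The per-host counts above roll up to the per-domain figures given earlier. Here's a rough sketch of one way to do that rollup. It assumes "count host" input lines and a hand-supplied list of base domains; rollup is a hypothetical helper, not necessarily how the numbers in this post were produced.

```shell
# Roll per-host counts ("count host" lines) up to a base domain.
# The base-domain list is supplied by hand; a naive "last two labels"
# rule would misfile bad.debian.net under debian.net.
rollup() {
    awk -v bases="$1" '
        BEGIN { n = split(bases, base, " ") }
        {
            for (i = 1; i <= n; i++) {
                b = base[i]
                # match host equal to base, or host ending in "." base
                if ($2 == b || (length($2) > length(b) &&
                    substr($2, length($2) - length(b)) == ("." b))) {
                    sum[b] += $1
                    break
                }
            }
        }
        END { for (b in sum) print sum[b], b }' |
    sort -k1,1nr -k2,2
}

# Host counts copied from the list above.
rollup "sf-lug.org sf-lug.com" <<'EOF'
875806 www.sf-lug.org
23561 www.sf-lug.com
17286 sf-lug.com
15402 sf-lug.org
3841 www.ipv4.sf-lug.org
1182 ipv4.sf-lug.org
262 www.ipv6.sf-lug.org
67 ipv6.sf-lug.org
EOF
```

With those inputs the sums come out to 896560 for sf-lug.org and 40847 for sf-lug.com, matching the per-domain list.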
And, nothing horribly surprising here, given the above. This is
essentially the same again, but broken down by port.
Also, if we total up all of http (:80) and https (:443) we get:
7263155 :80 (http)
6682636 :443 (https)
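Port totals like those can be had by summing "count host:port" lines on whatever follows the last colon. A small sketch, with by_port being an illustrative name only:

```shell
# Sum "count host:port" lines by port.
by_port() {
    awk '{ n = split($2, part, ":"); sum[part[n]] += $1 }
         END { for (p in sum) print sum[p], ":" p }' |
    sort -k1,1nr
}

# The four busiest domain:port entries from this post, as sample input.
by_port <<'EOF'
3570578 secure.balug.org:80
2942325 berkeleylug.com:443
1762326 www.balug.org:443
1355057 www.wiki.balug.org:80
EOF
```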
And by domain and port:
3570578 secure.balug.org:80
2942325 berkeleylug.com:443
1762326 www.balug.org:443
1355057 www.wiki.balug.org:80
896520 secure.balug.org:443
834027 www.sf-lug.org:80
616148 lists.balug.org:443
466108 berkeleylug.com:80
382488 lists.balug.org:80
337181 www.wiki.balug.org:443
321498 www.balug.org:80
140573 balug.org:80
45315 balug.org:443
43816 www.berkeleylug.com:80
41779 www.sf-lug.org:443
29510 www.archive.balug.org:80
22723 www.sf-lug.com:80
15730 sf-lug.com:80
14827 old-debian.balug.org:80
12713 www.archive.balug.org:443
11427 sf-lug.org:80
6618 sflug.org:80
4604 berkeleylug.org:80
4420 www.digitalwitness.org:80
4134 wiki.balug.org:80
4013 www.berkeleylug.com:443
3975 sf-lug.org:443
3548 sflug.com:80
3263 sflug.net:80
3251 www.sflug.org:80
2738 sf-lug.net:80
2485 digitalwitness.org:80
2462 www.test.balug.org:80
2394 www.ipv4.sf-lug.org:80
2375 sflug.org:443
2244 www.new.balug.org:80
1964 www.sflug.net:80
1767 www.beta.balug.org:80
1556 sf-lug.com:443
1454 www.sflug.org:443
1447 www.ipv4.sf-lug.org:443
1442 www.sflug.net:443
1409 www.sflug.com:80
1392 www.ipv4.balug.org:443
1127 www.ipv4.balug.org:80
1047 www.new.balug.org:443
1041 wiki.balug.org:443
975 www.php.test.balug.org:80
952 www.digitalwitness.org:443
879 ipv4.sf-lug.org:80
838 www.sf-lug.com:443
789 sflug.net:443
787 www.berkeleylug.org:80
747 sflug.com:443
736 ipv4.balug.org:80
718 www.pi.berkeleylug.com:80
599 mx.balug.org:80
553 sf-lug.net:443
497 mx.balug.org:443
463 www.sf-lug.net:80
443 berkeleylug.org:443
426 digitalwitness.org:443
412 www.berkeleylug.org:443
381 mx.lists.balug.org:80
380 www.sflug.com:443
376 pi.berkeleylug.com:80
357 www.test.balug.org:443
323 www.sf-lug.net:443
303 ipv4.sf-lug.org:443
250 www.php.test.balug.org:443
249 www.beta.balug.org:443
230 mx.lists.balug.org:443
230 ipv4.balug.org:443
220 pi.berkeleylug.com:443
194 www.ipv6.sf-lug.org:80
173 www.ipv6.balug.org:80
93 www.ipv6.balug.org:443
68 www.ipv6.sf-lug.org:443
67 ipv6.balug.org:443
60 ipv6.sf-lug.org:443
48 old-debian.balug.org:443
45 bad.debian.net:80
35 www.pi.berkeleylug.com:443
18 www.buug.org:80
10 ipv6.balug.org:80
7 www.buug.org:443
7 ipv6.sf-lug.org:80
6 bad.debian.net:443
4 buug.org:80
4 buug.org:443
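For the curious, a tally like the domain and port one above can be approximated with a bit of awk. This is only a sketch: it assumes each log line starts with a vhost:port field (as in Apache's vhost_combined log format), which this post doesn't actually confirm, and tally_vhost_port plus the sample lines are made up for illustration.

```shell
# Count requests per "vhost:port", assuming (not confirmed by the post) that
# each log line starts with that field, as in Apache's vhost_combined format.
tally_vhost_port() {
    awk '{ count[$1]++ }
         END { for (k in count) print count[k], k }' |
    sort -k1,1nr -k2,2
}

# Hypothetical sample lines standing in for the real {ssl_,}access.log* data.
tally_vhost_port <<'EOF'
www.balug.org:443 192.0.2.1 - - [19/Mar/2021:04:00:00 +0000] "GET / HTTP/1.1" 200 512
www.balug.org:443 192.0.2.2 - - [19/Mar/2021:04:00:01 +0000] "GET / HTTP/1.1" 200 512
lists.balug.org:80 192.0.2.3 - - [19/Mar/2021:04:00:02 +0000] "GET / HTTP/1.1" 301 0
EOF
```

Against real rotated (and possibly gzipped) logs, one would feed it something like the output of zcat -f on the relevant access.log* files; the exact paths depend on the setup.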
Anyway, maybe some day I'll get some "real" web reporting in place.
It is on the todo list, but ...
$ wc todo
6156 22307 188878 todo
$
It's also not a FIFO list, nor FILO.
It's mostly a priority interrupt driven stack that gets a lot
of reordering applied to it, and tends to grow much more than it
shrinks. Well, at least I don't have to worry about running
out of stuff to do - have well over a lifetime's worth o' stuff on the
list.
More BALUG and [L]UG specific lists may be found on BALUG's wiki,
though they're not necessarily complete nor current.