[TUHS] TUHS: Maintenance, Succession and Funding

Kenneth Goodwin via TUHS tuhs at tuhs.org
Sun Apr 19 01:20:04 AEST 2026


Alright, on your IP scrapers problem:

I would do the following -

On your physical firewall, block the entire subnet range that they were
assigned by their ISP, using a single access control list statement with an
IP address and appropriate subnet mask. Drop all packets from this range.
It's been a while, but I believe IANA maintains a list of the IP address
ranges assigned to each internet client; other organizations might as well.
It makes your site disappear from their view. They may automatically stop
connecting once enough failed attempts are registered at their end.
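The authoritative assigned range comes from the registry (whois) data, but as a quick first approximation you can collapse the offender addresses you actually see in your logs into the smallest single network that covers them all. A minimal sketch using Python's standard `ipaddress` module (the 203.0.113.x addresses are documentation placeholders):

```python
import ipaddress

def covering_network(observed_ips):
    """Collapse a set of observed offender addresses into the smallest
    single network containing all of them.  The authoritative assigned
    range comes from the whois/RIR data; this is only an approximation
    built from your own logs."""
    nets = [ipaddress.ip_network(ip) for ip in observed_ips]
    net = nets[0]
    # Widen the prefix until every observed address fits inside it.
    while not all(net.supernet_of(n) for n in nets):
        net = net.supernet()
    return net

# Two scraper addresses seen in the logs (documentation placeholders):
print(covering_network(["203.0.113.5", "203.0.113.200"]))  # -> 203.0.113.0/24
```

The resulting prefix is what you would then confirm against the registry data before blocking it.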

Those IANA et al. databases should also contain company names and
addresses for the actual sources. Worst case, you only get their ISPs, to
whom you can send a different letter of complaint.

If you are using a server-based firewall such as iptables or one of its
successors, put the same ACL there. Instead of one rule per IP address,
it's one rule per offender, blocking everything from their range.
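The one-rule-per-offender idea can be sketched as a small helper that builds the iptables command (applying it needs root, so the default here is a dry run that just returns the command; the range is a documentation placeholder):

```python
import subprocess

def drop_range(cidr, apply_rule=False):
    """Build one firewall rule that drops an offender's entire assigned
    range: one ACL per offender rather than one per address.  Applying
    the rule requires root, so by default this is a dry run that only
    returns the command."""
    cmd = ["iptables", "-I", "INPUT", "-s", cidr, "-j", "DROP"]
    if apply_rule:
        subprocess.run(cmd, check=True)
    return cmd

# 203.0.113.0/24 is a documentation placeholder; substitute the range
# you confirmed from the registry data:
print(" ".join(drop_range("203.0.113.0/24")))
```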

Failing that, you could add subroutines to your web server code that
implement a blacklist at connection time. This works as long as you are not
NATing the true source IP to an internal firewall address before it reaches
the web server.

You can drop the connection, or send back a small static response of
well-chosen but polite words asking them to cease and desist, and then
close the connection.
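A minimal sketch of that connection-time check, assuming a flat list of offender ranges (the ranges and the refusal text here are placeholders):

```python
import ipaddress

# Hypothetical blacklisted ranges (documentation placeholders); in
# practice these would come from your flat file of offenders.
BLACKLIST = [ipaddress.ip_network(c)
             for c in ("203.0.113.0/24", "198.51.100.0/24")]

# The small static "polite words" response, sent before closing.
POLITE_REFUSAL = (
    b"HTTP/1.1 403 Forbidden\r\n"
    b"Content-Type: text/plain\r\n\r\n"
    b"Your automated clients are overloading this site. "
    b"Please cease and desist, or contact the operator.\r\n"
)

def is_blacklisted(source_ip):
    """True if the connecting address falls inside any offender range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in BLACKLIST)

# In the server's accept loop: if is_blacklisted(peer_ip), send
# POLITE_REFUSAL and close the socket instead of serving the request.
```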

If you have a bank of web servers with a load balancer in front of them, you
might be able to use that to your advantage. You have to become
uncivilized for this approach; it is basically the if-all-else-fails
solution, and costs may prohibit this type of response.

You might be able to have the load balancer or firewall redirect all
traffic from these source IPs to a separate server which is set up as a
counterattack server. This server accepts the connection, verifies the
source IP as blacklisted, and holds that connection open for a very long
time. It is essentially a form of reverse denial-of-service counterattack.
Every HTML et al. link they try to follow goes to the same underlying text
file. This text file, aka THE PACKAGE, is an enormous file of absolute
rubbish regenerated daily, so that it looks like new information, terabytes
or petabytes in overall size. Perhaps a different one is even created for
every possible link pathway into your system. The file should have a
paragraph at the top that mimics your cease and desist letter. Basically
you are tying up their network resources for as long as possible, then
sending them massive amounts of useless garbage which may tie up their disk
and CPU resources. "Do unto others as they do unto you", in order to get
noticed and cause the desired change in behavior.
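The hold-the-connection-open part is essentially a tarpit. A minimal sketch of the stream the counterattack server would feed a blacklisted client (chunk size and delay are arbitrary illustrative values):

```python
import itertools
import os
import time

def tarpit(chunk_size=1024, delay=1.0, max_chunks=None):
    """Drip endless chunks of rubbish at a blacklisted client, one chunk
    per `delay` seconds, holding the connection open and wasting the
    scraper's network, disk, and CPU resources.  `max_chunks` caps the
    stream (useful for testing); in production it would run until the
    peer gives up and disconnects."""
    counter = itertools.count() if max_chunks is None else range(max_chunks)
    for _ in counter:
        yield os.urandom(chunk_size)  # stand-in for THE PACKAGE's contents
        time.sleep(delay)

# In the counterattack server's handler:
#   for chunk in tarpit():
#       connection.sendall(chunk)
```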

Regarding the flat file of blacklisted source IP addresses: read it into
memory at program start, or when you send a signal to the program, caching
the list in memory instead of rereading the disk, for performance.
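A minimal sketch of that load-and-cache pattern, with a SIGHUP handler for the reload-on-signal part (the file path is a hypothetical placeholder):

```python
import signal

BLACKLIST_FILE = "/etc/scraper-blacklist"  # hypothetical path

_blacklist = set()

def load_blacklist(path=BLACKLIST_FILE):
    """Read the flat file of offender addresses/ranges into memory:
    one entry per line, '#' comments and blank lines ignored.  Called
    once at start, then again on demand."""
    global _blacklist
    with open(path) as f:
        _blacklist = {ln.strip() for ln in f
                      if ln.strip() and not ln.lstrip().startswith("#")}
    return _blacklist

# Reload on SIGHUP so the server picks up new offenders without a restart:
signal.signal(signal.SIGHUP, lambda signum, frame: load_blacklist())
```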

You can build this blacklist automatically by extracting IP addresses,
timestamps, etc. from the web server logs. Use a program to process those
entries and find the addresses with the highest connection counts; the
absurdly high counts point out who is doing this.
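That log-mining step can be sketched in a few lines, assuming common-log-format entries where the client address is the first field (the log path in the comment is illustrative):

```python
from collections import Counter

def top_talkers(log_lines, n=10):
    """Count hits per source address in common-log-format lines (the
    client IP is the first whitespace-separated field) and return the
    heaviest hitters; the absurdly high counts are your scrapers."""
    counts = Counter(line.split(None, 1)[0]
                     for line in log_lines if line.strip())
    return counts.most_common(n)

# Typical use:
#   with open("/var/log/apache2/access.log") as f:
#       for ip, hits in top_talkers(f):
#           print(ip, hits)
```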

You can also use this raw connection data in another way.

Use the IANA demographic information to contact the CEO, CFO, and CTO of
the offending companies, by email and certified postal mail. Inform them in
detail of the situation and its impact on your costs and on the general
internet population. Use a cease-and-desist legal format, and imply that
legal action will be necessary if this continues. Make sure they understand
that the issue is the absurdly high volume of nonsense connections, not
their occasional connection for information gathering. Encourage them to
reduce their connection frequency to something far more reasonable for what
is essentially a static site. Include the connection data that you
extracted from the log files, proving their abuse of your systems. If you
can quantify your out-of-pocket costs, include an invoice for them in your
cease-and-desist letter, or at least a mention of the actual costs.


So basically-

Phase one - to ease your current pain, block their entire assigned IP
address range with a single ACL.

Phase two - be civilized and assume that they don't know what harm they are
causing. Inform them, with evidence, of their impact and your associated
costs. Ask them politely to cease and desist. Tell them you have blocked
their access, but will gladly lift the block once reasonable changes have
been made. Suggest improvements such as scanning once a week, and/or
spreading a single scan over 7 days by looking at only 1/7 of your site
each day. Perhaps even a restriction on the time of day for their scan.
Whatever compromise solutions appeal to you.

Make sure that you communicate only with executive-level (C-suite) and
founder people, who have the most to lose. Be polite and reasonable, and
extend the olive branch of peace and friendship. Reasonable people will act
in mutual self-interest to resolve the situation.

Phase three - the counteroffensive, if all else fails. Watch out for the
applicable laws in your location. Send the cease-and-desist message in
another way: give them more data than they expect and tie up their
resources in every way possible. A non-lethal counteroffensive.

Basically, a honeypot server set up to deal with these extreme situations
if no reasonable compromise can be had.

Just suggestions; you don't have to be a victim and tolerate this impact.

On Sat, Apr 18, 2026, 5:58 AM Arrigo Triulzi via TUHS <tuhs at tuhs.org> wrote:

> On 18 Apr 2026, at 11:47, Warren Toomey via TUHS <tuhs at tuhs.org> wrote:
> >
> > On Fri, Apr 17, 2026 at 09:34:02PM -0700, Al Kossow via TUHS wrote:
> >> There should be international mirrors of the content.
> >
> > Absolutely! Here's the list of them:
> >
> > https://wiki.tuhs.org/doku.php?id=source:unix_archive
>
> Hello from unix-archive.eu!
>
> Please note that I only mirror weekly and, right now, my main problem are
> the AI scrapers which take my bandwidth usage and CPU load through the roof…
>
> I wish I had a good solution to that: they continue downloading the same
> file over and over again, at 1s intervals from multiple IPs, no concept of
> “has this file changed?” at all, just download repeatedly.
>
> Have started blocking IPs at the firewall but, as you all know, that is a
> losing battle…
>
> Before someone suggests it: no, Cloudflare is not a solution, the
> centralisation of the Internet is a huge problem, not a solution.
>
> Cheers,
>
> Arrigo
>
>

