Disclaimer. I am not affiliated with the Internet Archive or Brave, and this is my personal experience. I do not provide any support or assistance for removing your site or page. This page does not have any affiliate links or ads, so I’m not making money by sharing this with you. If you found this helpful and want to say thanks, why not send some money to buy me a coffee!
If you’re in a hurry, scroll down to the 5 easy steps to remove your website from Archive.org.
I was surprised when the Brave browser announced that it was partnering with the Wayback Machine / Archive.org to handle 404 errors. It seems at odds with Brave’s privacy-by-design principles, especially since some countries don’t have right-to-be-forgotten laws.
I get that 404 pages can be frustrating for users, but as a former employee of a major publisher and a current website developer, I know 404 pages are important for removing out-of-date or erroneous content and for signalling to people (and crawlers) following old backlinks that those links are deprecated. Plus, maintaining countless redirects on a huge site gets complex; it’s often better to have friendly 404 pages.
The onus for how a 404 is handled should really be on the publisher/developer. It’s my decision whether to redirect to the right place on a site (or not), and I disagree with Brave surfacing content that the publisher has intentionally deleted or removed. Yes, Brave isn’t doing anything people can’t do themselves manually or via plugins; it’s the automation that concerns me.
There are also safety concerns: old domains and expired pages are targets for SEO abuse, domaineering and 404 ad fraud. There’s a danger that less technically astute people could be redirected by Brave to out-of-date content and make a bad decision as a result. As a copyright holder, my needs have also changed over time, so content I was once happy to have archived is not always content I want archived now.
To be clear, I am not anti-Archive.org. I do family history research and find the site valuable. I just think content owners should be given some choice and control over how they participate. So here’s how to delete your site from the Internet Archive / Wayback Machine / Archive.org.
Steps to Delete your Site from the Internet Archive / Wayback Machine / Archive.org
Here are 5 easy, proven steps to remove your site from the Internet Archive / Wayback Machine / Archive.org:
- Update your website’s robots.txt file to block the Internet Archive / Wayback Machine / Archive.org crawler, and check your copyright notice.
- Draft a DMCA takedown notice with specific links to the sites / pages you want removed from the Internet Archive / Wayback Machine / Archive.org.
- Find an old invoice showing the earliest date you can prove ownership of the domain.
- Draft and send a polite email to the Internet Archive / Wayback Machine / Archive.org with the DMCA notice and the invoice attached.
- Wait 3-5 days.
Below I’ve provided more detail on completing each step. Honestly, my results have always been mixed, and that’s one of my frustrations with the Internet Archive. A site update has sometimes wiped out my robots.txt file, and I’ve found my site back in Archive.org again. I wish Archive.org gave publishers a way to verify domain ownership for takedowns, or a webmaster tool like those Google and Bing offer.
Step 1: Robots.txt to Block a site from the Internet Archive / Wayback Machine / Archive.org / Check Copyright Notice
If you’re super interested, you can learn more about robots.txt here.
Archive.org has a mixed attitude toward robots.txt, but it does honor robots.txt rules.
Add the following to the end of your existing robots.txt file; don’t delete any existing entries:
User-agent: ia_archiver
Disallow: /
If you’re not sure how to edit your robots.txt then talk to your hosting provider or website developer.
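If you’re comfortable with a little Python, the “append, don’t replace” rule above can be sketched as a small helper. This is a minimal sketch under my own assumptions: `append_ia_block` and `has_ia_group` are hypothetical names, and it only works on the text of the file, so reading and writing robots.txt on your server is up to you.

```python
"""Append the ia_archiver block to robots.txt text without removing other rules.

Minimal sketch: append_ia_block and has_ia_group are hypothetical helpers,
not part of any Archive.org or WordPress tool.
"""

IA_BLOCK = "User-agent: ia_archiver\nDisallow: /\n"


def has_ia_group(robots_txt: str) -> bool:
    """True if any line already names ia_archiver as its User-agent."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        # Compare case-insensitively and ignore spaces around the colon.
        if key.strip().lower() == "user-agent" and value.strip().lower() == "ia_archiver":
            return True
    return False


def append_ia_block(robots_txt: str) -> str:
    """Return robots_txt with the ia_archiver block added at the end.

    Existing entries are kept exactly as they are. If an ia_archiver group is
    already present, the text is returned unchanged (its Disallow line is NOT
    checked), so running this twice never duplicates the block.
    """
    if has_ia_group(robots_txt):
        return robots_txt
    text = robots_txt
    if text and not text.endswith("\n"):
        text += "\n"  # finish the last existing line
    if text:
        text += "\n"  # blank line separates the new group from earlier ones
    return text + IA_BLOCK
```

For example, passing in `"User-agent: *\nDisallow: /wp-admin/\n"` keeps that group and adds the ia_archiver group after a blank line; calling the helper again on the result changes nothing.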
If you use WordPress, this free Archive.org Blocker WordPress plugin does everything you need to block Archive.org from your WordPress site: install, activate, and you’re done. If you already use a robots.txt plugin, add the lines above to the end of your existing robots.txt file instead.
While you’re making these changes, it’s also a good time to check that there is a current Copyright Notice on your website. Most content management systems put this on your site automatically.
Step 2: DMCA Takedown Notice for the Internet Archive / Wayback Machine / Archive.org
The DMCA is short for the Digital Millennium Copyright Act. It’s a piece of US legislation to help copyright holders protect their intellectual property. Even if you don’t live in the US, you can use a DMCA notice to have content removed from the Internet Archive / Wayback Machine / Archive.org.
I am #NotALawyer and this is not legal advice. If you’re dealing with a serious issue with archived content, or this step makes you nervous, get your own legal counsel.
To generate a DMCA takedown notice, I used this free DMCA Generator tool from Who Is Hosting This. Alternatively, use this DMCA Takedown Notice generator from Intellectual Property HQ.
I’ll stress this again: DMCA notices are legal documents, so make sure you’re fully aware of what you’re doing.
The DMCA form is straightforward, but make sure you paste in as many Archive.org addresses as you can that fall within the dates you owned the domain and cover the content you want removed.
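Finding all those addresses by hand is tedious. One way to collect them is the Wayback Machine’s CDX server, which lists captures of a domain. The sketch below assumes the public endpoint at web.archive.org/cdx/search/cdx and its `url`, `matchType`, `from`, `to`, `output`, `fl` and `collapse` parameters as I understand them; check the CDX server documentation before relying on it, and treat the function names as hypothetical. Setting `start` to the date your ownership began keeps the list to pages you can actually ask to have removed.

```python
"""Sketch: list Wayback Machine captures of your domain for a DMCA notice.

Assumes the public Wayback CDX server (web.archive.org/cdx/search/cdx) and its
url, matchType, from, to, output, fl and collapse parameters. Function names
are hypothetical.
"""
from __future__ import annotations

import json
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, start: date, end: date) -> str:
    """CDX query URL for captures of `domain` (and subdomains) between start and end."""
    params = {
        "url": domain,
        "matchType": "domain",  # also match subdomains such as www.
        "from": start.strftime("%Y%m%d"),
        "to": end.strftime("%Y%m%d"),
        "output": "json",  # first row of the result is a header
        "fl": "timestamp,original",
        "collapse": "urlkey",  # one row per distinct URL, not per capture
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_snapshot_urls(rows: list[list[str]]) -> list[str]:
    """Turn parsed CDX JSON rows (header first) into Wayback snapshot links."""
    if len(rows) < 2:
        return []  # nothing, or a header with no captures
    header, *data = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in data]


def fetch_snapshot_urls(domain: str, start: date, end: date) -> list[str]:
    """Run the query (network call) and return snapshot links."""
    with urlopen(build_cdx_query(domain, start, end), timeout=60) as resp:
        body = resp.read().decode("utf-8").strip()
    return rows_to_snapshot_urls(json.loads(body) if body else [])
```

Something like `fetch_snapshot_urls("example.com", date(2015, 1, 1), date.today())`, with your own domain and ownership start date, returns one link per archived URL that you could paste into the notice. An empty list can mean there are no captures in that range or that the site is already excluded.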
Step 3: Demonstrating a History of Domain Ownership to the Internet Archive / Wayback Machine / Archive.org
If you’re requesting the removal of a whole domain or website, Archive.org may ask for proof of domain ownership. Archive.org provides no automated ownership verification such as a DNS record change, website code, or uploading a file. You will need to find an old invoice / receipt from your domain registrar or host proving ownership.
Most hosting providers give you access to your invoice history, so log in to your account to get these. Worst case, it might require an email to the accounts department of your hosting company.
If you’re in a hurry, you can try skipping this step and see how Archive.org responds, but be prepared for them to ask for this information. One way to try and avoid the issue is to send the request from an email address associated with the domain.
That said, I strongly encourage you to send proof of ownership as part of the request. Archive.org checks requests against public domain records, so things can get frustrating if your domain switched hosts, registrars, etc. during the period you’re asking about. If you’ve forgotten your original registrar or host, I’d do a free domain history check to jog your memory.
Step 4: Email Requesting the Internet Archive / Wayback Machine / Archive.org remove your website
The email address for Archive.org takedown requests is [email protected], but don’t email them until you have completed Steps 1-3.
It is better if your email comes from the domain you’re emailing about. For example, if you wanted Google.com removed, you would send the request from an @google.com address. In my experience, Archive.org will respond to a request sent from a different domain, but it may require additional verification steps.
Sending a request from a free email service like Gmail or Outlook.com is almost guaranteed to slow things down. This is one reason I recommend Step 3: proof of ownership gives them additional evidence when you make the request.
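A quick self-check before you hit send can catch both problems. This is only a sketch of the advice above, not a rule Archive.org publishes: the free-mail list is an illustrative assumption, and `sender_warnings` is a hypothetical helper.

```python
"""Sketch: sanity-check the sender address before emailing Archive.org.

FREE_MAIL_DOMAINS is an illustrative assumption, not an official list, and
sender_warnings is a hypothetical helper encoding the advice in this post.
"""
from __future__ import annotations

FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "hotmail.com", "yahoo.com"}


def sender_warnings(sender: str, site_domain: str) -> list[str]:
    """Return human-readable warnings about `sender` for a takedown of `site_domain`."""
    sender_domain = sender.rsplit("@", 1)[-1].strip().lower()
    site = site_domain.strip().lower()
    if site.startswith("www."):
        site = site[4:]  # treat www.example.com and example.com the same
    warnings = []
    if sender_domain in FREE_MAIL_DOMAINS:
        warnings.append("Free email service: expect extra verification steps.")
    # Accept the domain itself or any subdomain of it (e.g. mail.example.com).
    if sender_domain != site and not sender_domain.endswith("." + site):
        warnings.append(f"Sender domain {sender_domain!r} does not match {site!r}.")
    return warnings
```

An empty list means the address passes both checks; anything else is a hint to switch to an address on your own domain if you can.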
Here’s some suggested wording for an Archive.org takedown request / domain removal, where:
- [Your_Name] is your name;
- [Your_Domain] is the relevant domain name; and
- [Start_Date] is the date from which you want the domain removed and from which you can prove ownership of the domain.
I recommend sending a separate notice for each domain; don’t try to do them all at once.
Formal Request To Remove [Your_Domain] From Internet Archive Wayback Machine
Hello, I am [Your_Name], owner of [Your_Domain].
I’m officially requesting the immediate removal of the [Your_Domain] site/domain from web.archive.org and the Internet Archive Wayback Machine. The ia_archiver “Disallow: /” rule in our robots.txt file is not being followed.
The Copyright Notice on this site can be found here: [Your_Domain]
I am requesting removal of [Your_Domain] from [Start_Date] up to and including today and all days going forward.
Attached is a formal DMCA notice as well as evidence that I am the owner of [Your_Domain].
Thank you for your prompt attention.
[Your_Name]
Don’t forget to attach the DMCA notice you generated in Step 2 and the proof of ownership from Step 3!
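If you have several domains, a small script can fill in the placeholders and build one message per domain, in line with the one-notice-per-domain advice above. This sketch uses Python’s standard `email.message.EmailMessage` and sends nothing; the recipient address, attachment file names and helper names are placeholders, and every attachment is labeled as a PDF for simplicity.

```python
"""Sketch: turn the request template into one email per domain.

Uses only the standard library's email.message.EmailMessage and sends
nothing. The recipient, file names and helper names are placeholders.
"""
from __future__ import annotations

from email.message import EmailMessage

SUBJECT = "Formal Request To Remove [Your_Domain] From Internet Archive Wayback Machine"
BODY = (
    "Hello, I am [Your_Name], owner of [Your_Domain].\n\n"
    "I'm officially requesting the immediate removal of the [Your_Domain] "
    "site/domain from web.archive.org and the Internet Archive Wayback Machine. "
    "The ia_archiver \"Disallow: /\" rule in our robots.txt file is not being followed.\n\n"
    "The Copyright Notice on this site can be found here: [Your_Domain]\n\n"
    "I am requesting removal of [Your_Domain] from [Start_Date] up to and "
    "including today and all days going forward.\n\n"
    "Attached is a formal DMCA notice as well as evidence that I am the owner "
    "of [Your_Domain].\n\n"
    "Thank you for your prompt attention.\n\n"
    "[Your_Name]\n"
)


def fill(template: str, name: str, domain: str, start_date: str) -> str:
    """Replace the [Your_Name], [Your_Domain] and [Start_Date] placeholders."""
    return (template.replace("[Your_Name]", name)
                    .replace("[Your_Domain]", domain)
                    .replace("[Start_Date]", start_date))


def build_request(sender: str, to_addr: str, name: str, domain: str,
                  start_date: str, attachments: list[tuple[str, bytes]]) -> EmailMessage:
    """Build (but do not send) one takedown email for a single domain.

    attachments: (filename, bytes) pairs, e.g. the DMCA notice from Step 2
    and the ownership invoice from Step 3.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to_addr
    msg["Subject"] = fill(SUBJECT, name, domain, start_date)
    msg.set_content(fill(BODY, name, domain, start_date))
    for filename, data in attachments:
        # Labeled as PDF for simplicity; adjust maintype/subtype per file.
        msg.add_attachment(data, maintype="application", subtype="pdf", filename=filename)
    return msg
```

You can review the result with `print(msg)` and send it with your normal mail client or `smtplib`; fill in the recipient yourself from Step 4.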
Step 5: Wait and Track Archive.org
Once you send your email, you will need to wait. I’ve had responses in as little as 24 hours and others that took a few days. Archive.org will reply; just remember that it is based in California, so allow for US Pacific Time, weekends, and major US holidays. Be patient and polite, but firm. If you haven’t heard back after 3 days, a polite follow-up email may be warranted.
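To work out when a follow-up is reasonable, here’s a sketch that counts 3 weekdays from the day you sent your request. It assumes “3 days” means business days, treats dates as US Pacific calendar dates, and does not know about US public holidays: pass those in yourself via the hypothetical `holidays` parameter.

```python
"""Sketch: when to send a polite follow-up, counting only weekdays.

Assumptions: "3 days" means 3 business days; dates are US Pacific calendar
dates (convert yourself); US public holidays must be supplied by the caller.
"""
from datetime import date, timedelta


def follow_up_date(sent: date, business_days: int = 3,
                   holidays: frozenset = frozenset()) -> date:
    """Return the date `business_days` weekdays after `sent`, skipping holidays."""
    day = sent
    remaining = business_days
    while remaining > 0:
        day += timedelta(days=1)
        # weekday(): Monday is 0 ... Friday is 4, Saturday 5, Sunday 6.
        if day.weekday() < 5 and day not in holidays:
            remaining -= 1
    return day
```

A request sent on a Friday gives a follow-up date of the following Wednesday, or Thursday if that Monday is a holiday you passed in.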
In my experience, if you do everything above, you will get a response within 5 days. It then takes about a week after they respond for content to be purged from Archive.org.
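To track whether the purge has happened, you can poll the Wayback Machine’s availability endpoint. This sketch assumes the public archive.org/wayback/available endpoint and its `archived_snapshots` / `closest` JSON shape; confirm against the Wayback Machine API documentation. A False result only means that endpoint returned no snapshot for that URL, not that every capture is gone, so spot-check a few of the links from your DMCA notice as well.

```python
"""Sketch: check whether the Wayback Machine still serves a snapshot of a URL.

Assumes the public endpoint archive.org/wayback/available and its JSON shape
{"archived_snapshots": {"closest": {"available": true, ...}}}. Function names
are hypothetical.
"""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def has_snapshot(payload: dict) -> bool:
    """Interpret a parsed availability response: True if a snapshot is offered."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def still_archived(url: str) -> bool:
    """Ask the availability endpoint (network call) whether `url` has a snapshot."""
    query = f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"
    with urlopen(query, timeout=30) as resp:
        return has_snapshot(json.load(resp))
```

Calling `still_archived("example.com")` with your own domain once a day or so is enough; there’s no need to hammer the service.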
- The Internet Archive / Wayback Machine / Archive.org will only delete pages and sites dating from when you took ownership; owning the domain now is not enough on its own. This is really important: if you bought an old domain, you’re out of luck for anything older than the day your ownership began.
- I have found the people at the Internet Archive / Wayback Machine / Archive.org friendly, so please be polite. They do want to help, and anything they ask is to clarify an issue. They only respond during US business hours, so it pays to be patient (allow a 3-day wait minimum).
- There is no urgent process. There’s not much else I can say, other than that if you need this done fast, I share your frustration. If there’s a legal reason you need something removed quickly, you should really get legal counsel involved.
- I have no experience having content removed from Archive.org when you are not the domain owner, for example, if another site has breached your copyright and your content is now in the archive. I am #NotALawyer and recommend you get legal advice if your issue relates to your content but not your domain.
- This piece of advice is as much for me as it is for everyone else: if you want to keep blocking the Internet Archive / Wayback Machine / Archive.org, make sure you keep your robots.txt up to date. It’s a lot easier to keep blocking Archive.org via robots.txt than to get pages removed afterwards.
- There is value to the Internet Archive, so don’t remove your site unless you really feel it’s necessary. It may be better to just remove specific pages.
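Because a site update can silently wipe your ia_archiver rule, it’s worth checking your live robots.txt from time to time. This sketch uses Python’s standard `urllib.robotparser`; the function names are hypothetical, and how you schedule it (cron, CI, etc.) is up to you. Note that robotparser treats a missing robots.txt (HTTP 404) as “allow everything”, so a deleted file is reported as not blocked.

```python
"""Sketch: detect when a site update has wiped your ia_archiver rule.

Uses the standard library's urllib.robotparser; function names are
hypothetical.
"""
from urllib.robotparser import RobotFileParser


def blocks_ia_archiver(robots_txt: str, site: str) -> bool:
    """True if these robots.txt rules disallow ia_archiver from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", site.rstrip("/") + "/")


def check_live_site(site: str) -> bool:
    """Fetch <site>/robots.txt over the network and test the same rule."""
    root = site.rstrip("/") + "/"
    parser = RobotFileParser(root + "robots.txt")
    parser.read()  # a 404 sets allow_all, which we report as "not blocked"
    return not parser.can_fetch("ia_archiver", root)
```

If `check_live_site("https://example.com")` (with your own site) ever returns False, re-add the two lines from Step 1 before the crawler notices.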