Disclaimer. I am not affiliated with the Internet Archive or Brave, and this is my personal experience. I do not provide any support or assistance for removing your site or page. I also cannot remove your site. This page does not have any ads, so I’m not making money by sharing this with you. If you found this helpful and want to say thanks, why not buy me a coffee or buy my book on Amazon. Thanks!
If you’re in a hurry, scroll down to the 5 Easy Steps to Remove your Website from Archive.org
I was surprised when the Brave browser announced that it was partnering with the Wayback Machine / Archive.org on 404 errors. It seems at odds with Brave’s privacy-by-design principles, especially when some countries don’t have right-to-be-forgotten laws.
I get that 404 pages can be frustrating for users, but as a former employee of a major publisher and a current website developer, I know 404 pages are important for removing out-of-date or erroneous content, making sure people (and crawlers) following old back-links deprecate those links, etc. Plus, maintaining countless redirects on a huge site can get complex; it is often better to have friendly 404 pages instead.
The onus for how a 404 should be handled should really be on the publisher/developer. It’s my decision whether to redirect to the right place in a site (or not), and I disagree with Brave surfacing content that the publisher has intentionally deleted or removed. Yes, Brave isn’t doing anything people cannot do themselves manually or via plugins. It’s the automation that is of concern.
There are also safety concerns – old domains and expired pages are the target of SEO abuse, domaineering and 404 adfraud. There’s the danger that less technically astute people could find themselves redirected by Brave to out-of-date content and make a bad decision as a result. As a copyright holder, I have also had my needs change over time, so what was once OK to be archived is sometimes not currently the case.
To be clear, I am not anti-Archive.org. I do family history research and find the site valuable. I just think content owners should be given some choice and control in how they participate. So here’s how to delete your site from the Internet Archive / Wayback Machine / Archive.org.
Steps to Delete your Site from the Internet Archive / Wayback Machine / Archive.org
Please read these 5 easy and proven steps to remove your site from the Internet Archive / Wayback Machine / Archive.org.
I have provided extensive detail if you scroll down, but the key 5 steps to deleting your site from Archive.org are below:
- Update your website robots.txt file to block the Internet Archive / Wayback Machine / Archive.org Crawler / Check your Copyright Notice
- Draft a DMCA Takedown Notice with specific links to sites / pages you want to be removed from the Internet Archive / Wayback Machine / Archive.org
- Find an old invoice demonstrating the oldest date of ownership you have for the domain.
- Draft and send a polite email to the Internet Archive / Wayback Machine / Archive.org with the DMCA notice and the invoice from the two previous steps attached
- Wait 3-5 days
I have provided details below with more information to complete each easy step to remove your website from Archive.org, plus links if you need help. Honestly, my results have always been mixed and it’s one of my frustrations with the Internet Archive. A site update has sometimes resulted in my robots.txt file getting nuked, and I then find out I am in Archive.org again. I wish Archive.org would give publishers a way of verifying their domain to do a takedown, or a webmaster tool like those from Google/Bing.
Step 1: Robots.txt to Block a site from the Internet Archive / Wayback Machine / Archive.org / Check Copyright Notice
If you’re super interested, you can learn more about robots.txt here.
Archive.org has a mixed attitude to robots.txt files, but they do honor them.
Make sure you add this to the end of your existing robots.txt file, don’t delete any existing entries.
User-agent: ia_archiver
Disallow: /
If you’re not sure how to edit your robots.txt then talk to your hosting provider or website developer.
If you use WordPress, this free Archive.org Blocker WordPress plugin does everything you need to block Archive.org from WordPress. Install, activate, and you’re done. If you already use a robots.txt plugin, you can add the above code to the end of your existing robots.txt file.
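If you manage robots.txt by hand or through a deploy script, the "add to the end, don't delete anything" advice can be sketched in Python. This is a minimal sketch, not an official tool: the function name `add_ia_block` is made up for illustration, and it only checks whether an `ia_archiver` group already exists, not what that group says.

```python
IA_RULE = "User-agent: ia_archiver\nDisallow: /\n"


def add_ia_block(robots_txt: str) -> str:
    """Return robots_txt with the ia_archiver rule appended if it is missing.

    Existing entries are never removed or reordered.
    """
    for line in robots_txt.splitlines():
        # Ignore trailing comments, then split "Field: value".
        field, _, value = line.split("#", 1)[0].partition(":")
        if (field.strip().lower() == "user-agent"
                and value.strip().lower() == "ia_archiver"):
            # A group for ia_archiver already exists; leave the file alone.
            return robots_txt
    if not robots_txt.strip():
        return IA_RULE
    # Keep everything that was there and add the rule after a blank line.
    return robots_txt.rstrip("\n") + "\n\n" + IA_RULE
```

Running it twice on the same text gives the same result, so it should be safe to call on every site update.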
While you’re making these changes, it’s also a good time to check that there is a current Copyright Notice on your website. Most content management systems put this on your site automatically.
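Because a site update can silently wipe robots.txt, it may be worth checking every so often that the block is still in place. Here is a rough sketch using Python's standard `urllib.robotparser`. The helper names are hypothetical, and it assumes your site serves robots.txt over HTTPS at the usual `/robots.txt` path.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def blocks_ia_archiver(robots_txt: str, path: str = "/") -> bool:
    """True if the given robots.txt text disallows ia_archiver for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", path)


def fetch_robots_txt(domain: str) -> str:
    """Download the live robots.txt for a domain (assumes HTTPS)."""
    with urlopen(f"https://{domain}/robots.txt", timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


EXAMPLE = """User-agent: *
Disallow: /private/

User-agent: ia_archiver
Disallow: /
"""

if __name__ == "__main__":
    # Offline check against the example text above.
    print(blocks_ia_archiver(EXAMPLE))
    # For a live check, swap in your own domain:
    # print(blocks_ia_archiver(fetch_robots_txt("example.org")))
```

This only tells you what your robots.txt says. It cannot tell you whether any particular crawler actually obeys it.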
Step 2: DMCA Takedown Notice for the Internet Archive / Wayback Machine / Archive.org
The DMCA is short for the Digital Millennium Copyright Act. It’s a piece of US legislation to help copyright holders protect their intellectual property. Even if you don’t live in the US, you can use a DMCA notice to have content removed from the Internet Archive / Wayback Machine / Archive.org.
I am #NotALawyer so if you’re dealing with a serious issue with archived content, get your own legal counsel. This is also not legal advice, so if this step makes you nervous best to ask an expert. I have been told by others who have read these instructions that you can skip this DMCA step and still have success. Your mileage may vary.
To generate a DMCA takedown notice, I used this free DMCA Generator tool from Who Is Hosting This. Otherwise, use this DMCA Takedown Notice generator from the Intellectual Property HQ.
I am going to stress again, DMCA notices are legal documents so make sure you’re fully aware of what you’re doing.
The DMCA form is straightforward, but make sure you paste in as many website addresses from Archive.org that match the dates you owned the domain and the content you want to be removed.
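Collecting those archived addresses by hand gets tedious on a big site. As a sketch, assuming the Wayback Machine's public CDX server API behaves as I understand it (JSON output whose first row is a header), something like the following could list capture links from your ownership date onward. The function names are mine, and the `YYYYMMDD` start date and `collapse=urlkey` (one row per unique URL) are choices I made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str, start_date: str) -> str:
    """Build a CDX query for all captures of a domain since start_date."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains
        "from": start_date,          # e.g. "20150301" (YYYYMMDD)
        "output": "json",
        "fl": "timestamp,original",  # only the fields needed for links
        "collapse": "urlkey",        # one row per unique URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def capture_links(rows: list) -> list:
    """Turn CDX JSON rows (header row first) into Wayback capture URLs."""
    if not rows:
        return []
    header, *captures = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}"
            for row in captures]


def list_captures(domain: str, start_date: str) -> list:
    """Query the CDX API over the network and return capture links."""
    with urlopen(cdx_query_url(domain, start_date), timeout=30) as response:
        body = response.read().decode("utf-8")
    return capture_links(json.loads(body) if body.strip() else [])
```

Calling `list_captures("example.org", "20150301")` with your own domain and date would give a list you can paste into the DMCA form. Review it first so you only claim pages from the period you owned the domain.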
Step 3: Demonstrating a History of Domain Ownership to the Internet Archive / Wayback Machine / Archive.org
If you’re requesting the removal of a whole domain or website, Archive.org may ask for proof of domain ownership. Archive.org provides no automated verification of ownership such as a DNS record change, website code, or uploading of a file. You will need to find an old invoice / receipt from your domain host proving ownership.
Most hosting providers do provide access to a history of invoices, so you will need to log in to your account to get these. Worst case, it might require an email to the accounts department of your hosting company.
If you’re in a hurry, you can try and skip this step and see how Archive.org responds but be prepared for them to ask for this information. One way to try and avoid the issue is to send the request from an email address associated with the domain.
That said, I strongly encourage you to send proof of ownership as part of the request. Archive.org checks ownership against public domain records, which can be frustrating if your domain switched hosts, registrars, etc. during the period you want removed. If you have forgotten your original registrar or host, I’d do a free domain history check to jog your memory.
If you do not own the domain you will not be able to get the site deleted from the Internet Archive.
Step 4: Email Requesting the Internet Archive / Wayback Machine / Archive.org remove your website
The email address for Archive.org takedown requests is email@example.com but do not email them unless you have done Steps 1-3.
It is better if your email comes from the domain you’re emailing about. For example, if you want Google.com removed, you should have an @google.com email. In my experience, Archive.org will respond to a request from an email address other than the domain you are requesting but they may require additional verification steps.
Sending a request from a free email service like Gmail, Outlook.com, etc. is almost guaranteed to slow things down. This is one of the reasons I recommend Step 3, because it provides additional information when you make the request.
Here’s some suggested wording for an Archive.org Takedown Request / Removal of Domain where:
- [Your_Name] should be replaced with your name and
- [Your_Domain] with your relevant domain name.
- [Start_Date] with the date from which you want the domain removed and from which you can prove ownership of the domain.
I recommend sending a separate notice for each domain, don’t try and do it all at once.
Formal Request To Remove [Your_Domain] From Internet Archive Wayback Machine
Hello,

I am [Your_Name], owner of [Your_Domain]. I’m officially requesting the immediate removal of the [Your_Domain] site/domain from web.archive.org and the Internet Archive Wayback Machine.

The User-agent: ia_archiver Disallow: / code in our robots.txt file is not being followed. The Copyright Notice on this site can be found here: [Your_Domain]

I am requesting removal of [Your_Domain] from [Start_Date] up to and including today and all days going forward. Attached is a formal DMCA notice as well as evidence that I am the owner of [Your_Domain].

Thank you for your prompt attention.

[Your_Name]

Don’t forget to attach the DMCA notice you generated in Step 2 and the proof of ownership from Step 3.
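If you have several domains to deal with, filling in the placeholders by hand for each separate notice is error-prone. Here is a small sketch that does the substitution in Python. The wording follows the suggested email above; treating the heading line as the email subject, and the function name `fill_request`, are my assumptions.

```python
SUBJECT = ("Formal Request To Remove [Your_Domain] "
           "From Internet Archive Wayback Machine")

BODY = """Hello,

I am [Your_Name], owner of [Your_Domain]. I'm officially requesting the
immediate removal of the [Your_Domain] site/domain from web.archive.org
and the Internet Archive Wayback Machine.

The User-agent: ia_archiver Disallow: / code in our robots.txt file is
not being followed. The Copyright Notice on this site can be found here:
[Your_Domain]

I am requesting removal of [Your_Domain] from [Start_Date] up to and
including today and all days going forward. Attached is a formal DMCA
notice as well as evidence that I am the owner of [Your_Domain].

Thank you for your prompt attention.

[Your_Name]
"""


def fill_request(your_name: str, your_domain: str,
                 start_date: str) -> tuple[str, str]:
    """Return (subject, body) with all three placeholders replaced."""
    replacements = {
        "[Your_Name]": your_name,
        "[Your_Domain]": your_domain,
        "[Start_Date]": start_date,
    }
    subject, body = SUBJECT, BODY
    for placeholder, value in replacements.items():
        subject = subject.replace(placeholder, value)
        body = body.replace(placeholder, value)
    return subject, body


if __name__ == "__main__":
    # One notice per domain; the values here are placeholders.
    subject, body = fill_request("Jane Doe", "example.org", "1 March 2015")
    print(subject)
    print(body)
```

The script only produces the text. You still send the email yourself, with the attachments.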
Step 5: Wait and Track Archive.org
Once you send your email you will need to wait. I’ve had response times as fast as 24 hours and some that take a few days. Archive.org will reply, just remember that they are US-based (California) so make sure you allow for US Pacific Time, weekends, and major US holidays. Be patient, polite, but firm. If you don’t hear after 3 days, a polite follow-up email may be warranted.
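To work out when a follow-up is reasonable, it helps to count working days rather than calendar days. A minimal sketch, assuming a Monday to Friday week and ignoring US public holidays (which you would need to add yourself):

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date that is `days` working days (Mon-Fri) after start."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


if __name__ == "__main__":
    sent = date(2024, 3, 1)           # a Friday
    print(add_working_days(sent, 3))  # earliest sensible follow-up date
```

An email sent on a Friday lands on the following Wednesday for a three-working-day wait, which matches the "be patient" advice better than counting the weekend.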
In my experience, if you do everything above, you will get a response within 5 days. It takes about a week after they respond for content to be purged from Archive.org.
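To track whether the purge has happened, one option is the Wayback Machine's availability endpoint. This is a sketch that assumes the JSON shape I understand that endpoint to return (an `archived_snapshots` object that is empty when nothing is available). The function names are hypothetical, and a "nothing found" answer is not proof that every page is gone.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from an availability response."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_domain(domain: str) -> Optional[str]:
    """Ask the availability endpoint about a domain over the network."""
    query = urlencode({"url": domain})
    with urlopen(f"{AVAILABILITY_ENDPOINT}?{query}", timeout=30) as response:
        return closest_snapshot(json.loads(response.read().decode("utf-8")))


if __name__ == "__main__":
    # Replace with your own domain before running.
    snapshot = check_domain("example.org")
    print(snapshot if snapshot else "No snapshot reported")
```

If a snapshot URL still comes back a week or so after Archive.org has replied, that is a reasonable point to send the polite follow-up.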
- The Internet Archive / Wayback Machine / Archive.org will only delete pages and sites from when you took ownership, not just because you now have ownership. This is really important. So if you have bought an old domain, you’re out of luck for anything older than the day you commenced ownership.
- I have found the people from Internet Archive / Wayback Machine / Archive.org friendly. So please be polite. They do want to help and anything they ask is to clarify an issue. They do only respond during US business hours. So it pays to be patient (3 working days minimum).
- There is no urgent process. There’s not much else I can say other than if you need this done fast, I share your frustration. If there’s a legal reason why you need something removed quickly, you should really get legal counsel involved.
- I have no experience having content removed from Archive.org where you are not the domain owner, for example, if a domain has breached your copyright and now your content is in the archive. I am #NotALawyer and recommend you get legal advice if your issue relates to your content but not your domain.
- This last piece of advice is as much for me as it is for everyone else. If you want to always block the Internet Archive / Wayback Machine / Archive.org make sure you keep your robots.txt up to date. It’s a lot easier to keep robots.txt updated and block Archive.org than get pages removed.
- There is value to the Internet Archive, so don’t remove your site unless you really feel it’s necessary. It may be better to just remove specific pages.
- If you want to remove your data from data brokers, then you will need to use something like OneRep.