Pilots for 911 Truth site restoration

Jerry Russell

I am starting this new thread, for progress reports and discussion on the 'Pilots for 911 Truth' site restoration project.

This all started when a researcher known as 'Xander Arena' obtained an archived copy of the now-defunct 'Pilots for 911 Truth' website, and gave it to member Ruby Gray for safekeeping. The original URL, pilotsfor911truth.org, now seems to be owned & operated by persons hostile to the original message, who are undermining it by making subtle changes to the content. Also the site is blocked by Cloudflare in the US, and the forum doesn't work.

I was able to register the redirect URL 'PF911T.org', and I uploaded the public_html folder to my hosting provider, siteground.com. One problem quickly surfaced: many of the static HTML pages that make up the site contain hard-coded references to the original URL (that is, pilotsfor911truth.org), so all of those hard-coded links are broken. I fixed the links on the site homepage, so it does load correctly. As of now, that's the only page working.

The vast majority of valuable material at the old site is in the discussion forum. Unfortunately, Xander's copy of the website did not include the SQL database for the forum. The forum software keeps its data in a server-side database that lives outside the web directory, so it isn't captured by simply copying the site's files; it only gets saved in a portable form if someone specifically requests an SQL export.

Because we don't have this SQL database, all my attempts to use standard Xenforo importer software failed miserably. That was last month, and since then I've been mostly busy and/or procrastinating.

But, I did get back to the project over the last week. Unfortunately, I haven't actually made any headway. The first problem is to globally change all hard-coded references to the site URL. I experimented with using SSH keys to connect to the hosting server. That didn't work out after half an hour of experimentation, so I switched to the alternative of editing a local copy of the site on my Mac. I then experimented with the command line in the Mac Terminal, and again drew a blank after another hour of effort. I tried to use the sed editor that ships with macOS, but that turns out to be the BSD version; most of the documented syntax out there assumes GNU sed (gsed), which behaves slightly differently.

The bigger problem is the missing SQL database. All is not lost, because the forum subdirectory in the site archive includes about 70,000 HTML files which seem to be generated from the posts in the database. I don't understand why this resource exists, but fortunately it does. The data was presumably generated by the Invision forum system. A membership + license to use Invision software is expensive on a monthly basis, but they do have a free trial option. So I signed up for that, and I was able to ask a tech support person about my problem. He just said "if you don't have an SQL file, you don't have a site." I'm glad I didn't pay anything for that bit of terrible advice.

Next steps are: (1) get a copy of gsed, or maybe use vi, or otherwise figure out how to do global edits in Darwin. And, (2) use TCL or some other scripting language to extract the data from the 70,000 HTML files, and put it into a form suitable for importing into SQL. Basically I think I can get the poster name, post title, posting date and body text for each forum entry. Then all that data needs to be imported into the Xenforo SQL database for display and searching.
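For step (2), the kind of script I have in mind would look something like the Python sketch below (Python rather than TCL, just because it's quicker to sketch). The <title> extraction should be safe, but the patterns for poster name, date and body text are pure guesses at the Invision markup, and would have to be adjusted after inspecting a few of the real files:

    import csv, pathlib, re

    FORUM_DIR = pathlib.Path("forum")      # placeholder path to the archived forum files
    TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S | re.I)
    # The three patterns below are hypothetical -- the real class names in the
    # Invision-generated pages need to be checked before trusting any of this.
    AUTHOR_RE = re.compile(r'class="post_author[^>]*>\s*([^<]+)', re.S)
    DATE_RE = re.compile(r'class="post_date[^>]*>\s*([^<]+)', re.S)
    BODY_RE = re.compile(r'class="post_body[^>]*>(.*?)</div>', re.S)

    with open("posts.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "title", "author", "date", "body"])
        for path in sorted(FORUM_DIR.glob("*.html")):
            page = path.read_text(encoding="utf-8", errors="ignore")
            row = [path.name]
            for pattern in (TITLE_RE, AUTHOR_RE, DATE_RE, BODY_RE):
                match = pattern.search(page)
                row.append(match.group(1).strip() if match else "")
            writer.writerow(row)

The resulting CSV would then need to be massaged into whatever format the Xenforo importer expects.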

I had been hoping that Duck Duck Go or Google would index the 70,000 files, but that doesn't seem to be happening. With no links from the home page, the indexing crawlers don't seem to be finding the pages. Even in its present totally dysfunctional form, the site is attracting a little bit of traffic, so apparently it's not blacklisted.

Doing all this programming work would probably take me a day or two of actual work, plus a lot more time reviewing to restore my rusty programming skills. With everything else on my plate, I'm not in a position to make any promises as to when I'll get to it.

But I'm hoping for a miracle! My next strategy is to ask ChatGPT to solve this problem for me. Maybe I can get some AI-generated code with practically no effort on my part. We shall see.
 
I appreciate your hard work Jerry! I don't understand a word of all that, but I am hopeful that you are going to make a breakthrough soon.

I do have saved P4T forum pages on my dusty rusty computer. Quite a few, but unsure how many. Are they of any use to you at all?
 
Hard work? I wish!! You and BW are the ones doing the heavy lifting around here.

ChatGPT means well I'm sure, but is giving lousy advice so far. There may be no alternative to applying my own personal type of intelligence, or the lack thereof? I'll probably give the chat bot one more chance before resorting to old textbooks and online manuals.

If you have saved forum pages, I'd be curious to see them. I would try to verify that the same posts appear in the set of ~70,000 HTML files in the archive.
 
The first problem is to globally change all hard-coded references to the site URL.

This is done now. After several hours of experimentation with grep, find, vim, and xargs, the answer was to install gnu-sed from homebrew. (Just in case anyone is keeping score.)
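For anyone who wants to replicate the step without fighting the sed syntax wars, the same global substitution can also be done with a few lines of Python. This is only a sketch: it assumes a straight swap of the old domain for the new one, and the folder name is a placeholder.

    import pathlib

    SITE_ROOT = pathlib.Path("public_html")   # placeholder: local copy of the site
    OLD = "pilotsfor911truth.org"
    NEW = "pf911t.org"                        # assumption: a plain domain swap is all that's needed

    for path in SITE_ROOT.rglob("*.htm*"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if OLD in text:
            path.write_text(text.replace(OLD, NEW), encoding="utf-8")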

As a result, various pages such as 'Latest News' and 'American Pentagon (Flight 77)' are working, making more of the original information available.

Forum still missing. Some forum links mysteriously connect to a porn site hosted by Invision.
 
So it seems that you are making progress! But I could be wrong. I know nothing about all this stuff!
 
I will try to remember to take my computer records to the local online centre this week and send you at least some of those pages. What's the best way to send them?
 
So it seems that you are making progress!

Yes, slow progress. Actually my problem is that my girlfriend dumped me, so I've been dealing with some emotional chaos. But getting better, and starting to be able to focus again.

I will try to remember to take my computer records to the local online centre this week and send you at least some of those pages. What's the best way to send them?

Thanks Ruby! Probably best to send by email -- unless it's a huge amount of data. If the files are too big collectively to send by email, upload to google docs or dropbox. Or I could send you a manager password, and you could upload directly to PF911T.
 
So it seems that you are making progress!

I've made a little more progress! If you navigate to http://pf911t.org/forum/index.html, you will see a file containing about 70,000 links, each pointing to a file from the old forum. The link texts give the filename and subject of each file.
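In case anyone wants to see how that index gets built, it's nothing fancy. A sketch along these lines (not the exact code) pulls the <title> out of each forum file and writes one link per file:

    import html, pathlib, re

    FORUM_DIR = pathlib.Path("forum")                     # placeholder path
    TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S | re.I)

    lines = ["<html><body><h1>Forum file index</h1><ul>"]
    for path in sorted(FORUM_DIR.glob("*.html")):
        if path.name == "index.html":                     # don't index the index itself
            continue
        match = TITLE_RE.search(path.read_text(encoding="utf-8", errors="ignore"))
        subject = html.escape(match.group(1).strip()) if match else "(no title)"
        lines.append(f'<li><a href="{path.name}">{path.name} -- {subject}</a></li>')
    lines.append("</ul></body></html>")

    (FORUM_DIR / "index.html").write_text("\n".join(lines), encoding="utf-8")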

If you're on a slow Internet connection, this index file will take forever to load. And unfortunately, out of those 70,000 files, my guess is that about 90% of them contain nearly identical (but not exactly identical) content. So, my next step is to get rid of those files with high similarity, which will result in a much quicker loading index file.

For now, if you have a reasonably fast connection (or, are extremely patient) it's possible to load this index file and then search within it to find topics of interest.


I was wondering if anyone would ask! Will send PM.
 
my next step is to get rid of those files with high similarity, which will result in a much quicker loading index file.

This is done. Setting a threshold for similarity was a judgment call, but I used Python's SequenceMatcher routine and rejected files that were more than 95% similar. Out of the original ~70,000 links, I wound up with about 25,000.
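For anyone keeping score, SequenceMatcher comes from Python's standard difflib library. The sketch below shows the general idea rather than the exact code: it compares each file against the last one kept, which works because files from the same topic sort next to each other in the folder.

    import difflib, pathlib

    FORUM_DIR = pathlib.Path("forum")      # placeholder path
    THRESHOLD = 0.95                       # reject files more than 95% similar to the last one kept

    kept = []
    previous_text = ""
    for path in sorted(FORUM_DIR.glob("*.html")):
        if path.name == "index.html":
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if difflib.SequenceMatcher(None, previous_text, text).ratio() <= THRESHOLD:
            kept.append(path.name)         # different enough: keep it, and compare against it next
            previous_text = text

    print(f"kept {len(kept)} of the files")

Re-running at a different cutoff is just a matter of changing the THRESHOLD number.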

Reviewing the results, I'm guessing that we probably have most of the "Original Post" content, but the discussion threads are mostly lost.

Remaining task is to clean up references to 'pilotsfor911truth.org' within the forum post files. Some of the links within these posts might start working when that's taken care of. I originally had thought maybe the text and link structures within these files could be scraped and put into a new SQL database, but I'm not sure that the results would be worth the effort. If there are any gems contained within this data, it should be possible to find them by searching or crawling the index. Hopefully, Google will find the information & index it soon.

It might be worth checking with Xander to ask if there's any chance he has the SQL database file squirreled away somewhere.
 
About those 70,000 files you whittled down for similarity. Would that have been because so many people just copy the entire previous post into their response? So if you deleted similar files, that means any such responses are lost? Although they would still be in the source copy, if ever some miraculous solution presented itself for retrieving them.
 
So if you deleted similar files, that means any such responses are lost?

The vast majority of the similarities were at about 99.997 percent, meaning probably a difference of a single character or so. Some unexplained file formatting issue. But there were many more eliminated at 98% similarity, which might be enough for some sort of brief response post. I'll re-run the indexing routine at a 99% threshold, which will probably pick up another thousand or so (very brief at best) reply posts.

Also, I believe Google has an option to register one's site with 'Google Analytics', to get traffic reports from them. I suppose if we want the best indexing service, I should probably register.
 
I've been in contact with Xander on another issue, his incredulity about the Lloyde England story, but he's been very busy with work and family and hasn't replied for a while. So he might be some time replying to queries about P4911T.
 
I set the comparison threshold to 99%, and this time the routine preserved 45,403 links. So we have a huge number of files that are very similar, but not identical. I added an item to the link descriptors, giving the file size. So it's possible to select the longest one, or review several to see if there's any difference.

My index routine is about as short & simple as possible. I could add features to examine the files, get the longest one in a series of updates, and identify different threads with the same title (if such things exist in the folder). I'm not sure how useful any of that would be. For now, it's easy enough to check by hand the contents of similar files sharing the same title.

Google still isn't indexing the material. I doubt if enabling Google Analytics would make any difference. I'm trying to log into Google Search Console and see if it gives any clues, but it seems the registration process for that tool might take a day.
 
I'd like some feedback about how useful the PF911T forum resource is, in its present configuration; and whether any additional tools would be useful.

As of today, Google still hasn't crawled and indexed the new data. The search console tool is still saying "check back tomorrow". I don't know whether indexing will come around eventually, or whether the site has been blacklisted.

It's possible right now to search the index page topics using Edit->Find... in the Safari browser, or whatever the equivalent is in the browser you're using. With more effort, I could probably create a search tool that could do basic searches across the entire forum archive. Or, the data could probably be imported into Xenforo and integrated with the forum on this site.
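To give an idea, a bare-bones version of such a search tool is just a case-insensitive scan over the forum folder, run locally; something like the sketch below (a proper on-site search would need a server-side script, or the Xenforo import).

    import pathlib, sys

    FORUM_DIR = pathlib.Path("forum")      # placeholder path to the archived forum files

    def search(term):
        term = term.lower()
        for path in sorted(FORUM_DIR.glob("*.html")):
            if term in path.read_text(encoding="utf-8", errors="ignore").lower():
                print(path.name)           # print each file that mentions the search term

    if __name__ == "__main__":
        search(" ".join(sys.argv[1:]))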

Your thoughts, Ruby and BW? Can I declare this as a finished project? Wait to see what Google does? Or, ?
 
I've just spent a couple of hours there. I don't know what to think. I went to the Forum, which of course is very different from its original self. All the thread titles were listed on one unending page, which I cannot search on a tablet.

So I kept scrolling down and clicking on numerous Pentagon related titles.
In nearly all cases, the title was just posted once. Very rarely there were two versions, sometimes more. The opening post would always be there, but replies were missing.

Except in the case of multiple listings of the same title. Then, there would be maybe one reply. Or a few more perhaps. Different versions of the title would have different comments below, highlighted in yellow.

Then there was a glitch and I found myself back at the home page.
When I went back to the Forum, now the number of titles had expanded exponentially. Instead of one, or rarely a few listings of the same title, now there were dozens or even hundreds of each. Every separate version had different comment/s below.
I don't know what's happening. Maybe Google is playing with it. It's all beyond me!!
 
Ruby, thanks for looking around. If you can't search the titles with the tablet, I guess we'll have to do something.

Then there was a glitch and I found myself back at the home page.
When I went back to the Forum, now the number of titles had expanded exponentially.

I can only guess that somehow, you found your way to the version of that index from a couple days ago. That version had about twice as many titles, mostly duplicates.

Although I can't explain hundreds of copies of each title, I don't remember ever seeing that. And I can't replicate your experience of following a link to that old index, if that's what it was. Maybe your tablet kept a cache of the old version.
 
I slept on it and went back. The index appeared to have reverted to mostly single listings of topics again.
Then when I got down a bit further I was back into mega multi lists, such as this heading, which had about 65.

[Screenshot attachment: Screenshot_20240304-073135_Chrome.jpg]

So checking on some individual titles, I found that the opening post begins every page, with maybe one or more comments beneath, different for every "php-####".
The comments that are published beneath the OP are highlighted in yellow below, as here.

[Screenshot attachment: Screenshot_20240304-073352_Chrome.jpg]

So it seems that each individual listing contains a different comment or set of comments. Obviously some popular threads had hundreds of comments.
So the formatting that groups all comments onto their original page has been lost. The comments are fragmented, each below the OP.
Many of these have been lost anyway.

However, on this thread for example, I found that the formatting seems to have been preserved, with a whole page of comments, and no extra links below.
This is Craig Ranke's thread "North Approach Impact Analysis".


[Screenshot attachment: Screenshot_20240304-073842_Chrome.jpg]

Generally, links contained within posts are broken, which is probably a function of the passage of time. Many of the external hosting sites had long gone anyway, and very few images were posted within comments.

However, some GIFs still work, and so do some video links.

Most news stories are gone. So annoying that bandwidth concerns severely limited what could be included in posts back then.

But I did find a very long list of photos linked to Photobucket, which are still working.

Anyway, I think by deleting thousands of similar files, thousands of comments have been deleted.
I don't know whether there is any way of restoring the format to show full pages of comments again, although the example above says that somehow it did happen, at least on that occasion.
 
Then when I got down a bit further I was back into mega multi lists, such as this heading, which had about 65.

OK, I see. I was too lazy to scroll down that far!! The last version of the index file is the one that includes the file length reports. And that's the one you're looking at.

Anyway, I think by deleting thousands of similar files, thousands of comments have been deleted.

I'd be surprised if this is the case. I think that the deleted files were virtually identical to the ones that have been kept. But it's very easy to go back to the full index of all 70,000+ files in the 'forum' directory, so that we can review them to see what (if anything) we're missing.

After I finish this post, I'll go and recreate all those missing links to duplicate(??) files. It's about five minutes of actual work on my part, and then about a half hour of processing.

I don't know whether there is any way of restoring the format to show full pages of comments again, although the example above says that somehow it did happen, at least on that occasion.

Unfortunately, I think that what you're seeing is pretty much what we've got. I could try to write code that would identify the files that contain "full pages of comments", as opposed to the ones with the index of 'posts in this topic' at the end, which tend to contain just one post, or a few posts.
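A first pass at that sorting job might simply look for the 'posts in this topic' index near the end of each file. The sketch below assumes that exact phrase appears in the markup, which would need to be checked against the real files:

    import pathlib

    FORUM_DIR = pathlib.Path("forum")           # placeholder path
    MARKER = "posts in this topic"              # assumed marker text; verify against real files

    full_pages, fragments = [], []
    for path in sorted(FORUM_DIR.glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if MARKER in text:
            fragments.append(path.name)         # page carrying the single-post index at the end
        else:
            full_pages.append(path.name)        # candidate "full page of comments"

    print(f"{len(full_pages)} likely full-thread pages, {len(fragments)} single-post fragments")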

It's a complete mystery to me, why this resource of ~70,000 html files even exists. The data was never intended to be preserved in this format. These files seem to be some byproduct of the normal operation of the system. It seems to be very random, what has been saved and what hasn't.
 