Thursday, September 15th, 2005
9:53 am - The Corruption of Google  
So, when I checked my e-mail first this morning, there was an anonymous LJ comment in my inbox. I marked it as read (I prefer reading comments in context), and clicked over to my most recent post on LJ to read the comment. Except there were no new comments in my most recent post. I checked all my other recent posts, and there were no new comments in any of them, either.

Went back to the message in my inbox, and read it. It was for a post I'd written back in July, and the new comment was from a total stranger, mentioning that she was also a mad scientist. Huh? I sat there for a second wondering how a random stranger had come across that particular post. If she had selected "search random," she'd have gotten the most recent stuff. My head started to hurt, so I went off to take my shower.

I was in the shower when I remembered something alterjess posted at b.org yesterday. Google now has a blog search function. In my LJ settings, I've checked the little box that says "Block Robots/Spiders from indexing your journal," but there is a disclaimer that says some robots may ignore it anyway. Google never indexed my journal, having pledged to do no evil, but maybe the new function did.

After I toweled off and got dressed, I went to Google Blog Search, and entered "mad scientist" into the box. Sure enough, there was my post of July 5th, fourteenth on the list. Apparently, Google Blog Search is not as polite or ethical about its indexing as its older cousin. Pillocks...

As a side note, I tried entering "dxmachina mad scientist" into the main Google search box, and only got a handful of results, none of them from my LJ. It was an odd group of links. One was to an old Buffistas thread over on WX. A couple were about the two imposter DXMachinas on the net. I also discovered a third, who has managed to snag the domain dxmachina.com, which used to belong to a French manufacturing company. Rats. I wanted that domain, but I never figured it would come back on the market. (Note to self: see if dxmachina.org is available.)

The oddest cite was the last.

☞ landscaping - care plant take
... can spread quickly when pushed by power-mad men. ... was really cold out), I ran into dxmachina, who was ... viral-marketing game about a nonexistent scientist for the ...


Clicking on the link takes you to a links page for folks interested in landscaping. None of the text cited in the search appears on the page. I checked the cached version, and it doesn't show any of the cited text, either. The thing is, I recognized the middle sentence of the text. It came from veejane's recap of Boskone last February. The "viral-marketing" sentence comes from the same post. The line about "power-mad men" doesn't, though. Very strange. I can only conclude that Google's database is corrupt. As are their blog indexers.
 
 
 
Current Mood: paranoid
 
 
vwbug on September 15th, 2005 - 02:03 pm
Eek! That makes me very glad that I lock most of my posts.
mearagrrl on September 15th, 2005 - 02:54 pm
Be very careful--apparently, if you initially didn't lock a post, and then went back and did so, google may have already found it. As someone on LJ discovered.

And DX, mine is on there too, if you search for mearagrrl (along with a bunch of other people's mentions of me on LJ). Very freaky (I, too, have the anti-spider box checked, but I guess that means nothing to them. And like I said, too late to go lock posts now, they'd already be there. BASTARDS.)
dxmachina on September 15th, 2005 - 03:10 pm
Yeah, I read that about the non-private private posts. It looks like they are working off the RSS feed rather than using a spider, so the no-robots file is bypassed.
vwbug on September 15th, 2005 - 03:16 pm
I just did a search for myself. Eek. There are more unlocked posts than I remember. Oh, well...
jesseh on September 15th, 2005 - 02:26 pm
That google blog search is creepy. Makes me want to lock all my posts.
dxmachina on September 15th, 2005 - 03:12 pm
Locking a previously unlocked post won't remove it from the database. See meara's comment above.
jesseh on September 15th, 2005 - 03:24 pm
Well, horse, barn door, etc.

I know I've been fairly careful about identifiable things, so I'm not like freaking out or anything, but still. I said no!
cofax7 on September 15th, 2005 - 02:35 pm
There's some evidence that ignoring the no-robots setting is a bug and will be corrected, or so I recall seeing last night in one of the LJ support posts.
noumignon on September 15th, 2005 - 03:07 pm
I just wanted to mention that Google Blog Search is the fourth result when you Google Google Blog Search...
sumik on September 15th, 2005 - 03:22 pm
That is freaky.
susano on September 15th, 2005 - 03:33 pm
Me no likey.
jenlp on September 15th, 2005 - 03:39 pm
Well. That really, really sucks. Huh.
(no subject) - ww1614 on September 15th, 2005 - 03:51 pm
flyingtapes on September 15th, 2005 - 04:50 pm
You can actually remove that function so that your posts are indexed differently. Even if you have robots.txt noted in your preferences, it still looks at all your posts. Here's a link to one thing on it:

http://www.livejournal.com/users/isilya/239478.html