Drive By Resume Harvesting, Compliments of Google

My resume is on the web. Most of the time, however, I’ve asked the ‘bots to leave it out of the index. Recently, though, I’ve allowed the ‘bots to index my resume, and it’s generated a bit of harvesting action. Here’s the snippet from my web server log:

210.245.110.78 - - [17/Sep/2007:02:56:34 -0400]
"GET /MyResume.html HTTP/1.1" 200 4592 "http://www.google.com.vn/search?q=inurl:cv+%7C+
inurl:resume+%7C+inurl:vitae+%7C+intitle:cv+%7C+
intitle:resume+%7C+intitle:vitae)
+(%22Java+developer%22+%7C+%22C%2B%2B+developer%22)+
(C%2B%2B+%7C+Java+%7C+J2EE)+
(Linux+%7C+Unix)+-usa+-india+-C%23+-.Net+-PhD+-Ph.D+-CA+
-NY&hl=vi&start=80&sa=N"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)"

Which translated means:

210.245.110.78
This is the IP Address of the requestor. DNSStuff.com reports that this IP address is from Ho Chi Minh City, Ho Chi Minh (Vietnam). This is consistent with the Google site used for the search (www.google.com.vn)
[17/Sep/2007:02:56:34 -0400]
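The date and time of the request, as recorded by my web server. The -0400 is the server’s offset from UTC.
200
The web server’s HTTP response code. 200 means success.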
4592
This is the size of the document - my resume is only 4592 bytes. There’s an associated stylesheet, and that’s fetched right after the resume.
referrer field
The remaining information is the referrer field. This is how you tell what document the user was on when they clicked a link.
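
If you’d rather not pick the line apart by hand, here is a minimal Python sketch that splits an entry into the fields described above. It assumes the entry sits on a single line with plain straight quotes, in the common "combined" log layout; the names LOG_PATTERN and parse_log_line are my own, and the sample referrer is shortened for readability.

```python
import re
from datetime import datetime

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that have an obvious richer type.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# The entry from this post, with the referrer cut short.
sample = (
    '210.245.110.78 - - [17/Sep/2007:02:56:34 -0400] '
    '"GET /MyResume.html HTTP/1.1" 200 4592 '
    '"http://www.google.com.vn/search?q=inurl:cv" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)"'
)
entry = parse_log_line(sample)
print(entry["ip"], entry["status"], entry["size"])  # 210.245.110.78 200 4592
```

The regular expression is deliberately strict: a line in a different layout returns None rather than a half-filled dict.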

http://www.google.com.vn
This is the Google site in Vietnam. Looks like our visitor was from Vietnam.
/search
This is the search URL. Nothing special here.
inurl:cv
Specifies a search where the URL fetched has “cv” in it.
+%7C+
The “+” symbol represents a space in URL encoding, and %7C is a “|” (bar) symbol. This construct means the previous search term is ORed with the next search term.
inurl:resume |
Specifies a search where the URL fetched has “resume” in it.
inurl:vitae |
Specifies a search where the URL fetched has “vitae” in it.
intitle:cv |
Specifies a search where the title of the document fetched has “cv” in it.
intitle:resume |
Specifies a search where the title of the document fetched has “resume” in it.
intitle:vitae
Specifies a search where the title of the document fetched has “vitae” in it.
("Java developer" | "C++ developer")
Document text contains “Java developer” or “C++ developer”
(C++ | Java | J2EE)
Document text contains “C++” or “Java” or “J2EE”
(Linux | Unix)
Document text contains “Linux” or “Unix”
-usa -india -C# -.Net -PhD -Ph.D -CA -NY
Documents without the words usa, india, C#, .Net, PhD, Ph.D, CA and NY
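
The decoding walked through above can also be done mechanically. Here is a small sketch using Python’s standard urllib.parse module. The referrer string is the one from my log, joined back into a single line; the variable names are just for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The referrer from the log entry, rejoined into one line.
referrer = (
    "http://www.google.com.vn/search?q=inurl:cv+%7C+inurl:resume+%7C+inurl:vitae"
    "+%7C+intitle:cv+%7C+intitle:resume+%7C+intitle:vitae)"
    "+(%22Java+developer%22+%7C+%22C%2B%2B+developer%22)"
    "+(C%2B%2B+%7C+Java+%7C+J2EE)+(Linux+%7C+Unix)"
    "+-usa+-india+-C%23+-.Net+-PhD+-Ph.D+-CA+-NY&hl=vi&start=80&sa=N"
)

parts = urlsplit(referrer)

# parse_qs turns "+" into spaces and %XX escapes into characters,
# and returns a dict mapping each parameter name to a list of values.
params = parse_qs(parts.query)
query = params["q"][0]

print(parts.netloc)  # the Google site that was used
print(parts.path)    # the search path
print(query)         # the search terms as the visitor typed them
```

The same dict also holds the other parameters from the URL. As far as I can tell, hl=vi is the interface language and start=80 is the result offset, which would mean the visitor was already several pages deep in the results when they reached my resume.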

So, as you can see, the headhunters (at least the Vietnamese ones) are very proficient in Google searches. If you want to type the search into the Google search box yourself, here’s the text you’ll need:

inurl:cv | inurl:resume | inurl:vitae | intitle:cv | intitle:resume | intitle:vitae) ("Java developer" | "C++ developer") (C++ | Java | J2EE) (Linux | Unix) -usa -india -C# -.Net -PhD -Ph.D -CA -NY

Enjoy, and now you’ll recognize the resume harvesters when they drive by your site.
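
If you’d like your log-watching scripts to do the recognizing for you, here is one rough way to go about it in Python. This is only a heuristic of my own, not anything official: it flags a referrer when it comes from a Google host and the search terms use inurl: or intitle: with cv, resume or vitae. The names HARVEST_PATTERN and looks_like_resume_harvest are made up for this sketch.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical heuristic: search operators aimed at resume-like URLs or titles.
HARVEST_PATTERN = re.compile(
    r"\b(?:inurl|intitle):(?:cv|resume|vitae)\b", re.IGNORECASE
)


def looks_like_resume_harvest(referrer):
    """Return True if the referrer looks like a Google resume-hunting search."""
    parts = urlsplit(referrer)
    if "google." not in parts.netloc:
        return False
    # parse_qs decodes "+" and %XX escapes; a missing q parameter yields "".
    query = parse_qs(parts.query).get("q", [""])[0]
    return bool(HARVEST_PATTERN.search(query))


print(looks_like_resume_harvest(
    "http://www.google.com.vn/search?q=inurl:cv+%7C+intitle:resume&hl=vi"
))
print(looks_like_resume_harvest("http://www.google.com/search?q=python+tutorial"))
```

It will miss harvesters who search differently, and it only knows about Google, so treat a match as a hint rather than proof.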


