Drive By Resume Harvesting, Compliments of Google
Tuesday, September 18th, 2007
My resume is on the web. Most of the time, however, I’ve asked the ‘bots to leave it out of the index. Recently, though, I’ve allowed the ‘bots to index my resume, and it’s generated a bit of harvesting action. Here’s the snippet from the web server log:
210.245.110.78 - - [17/Sep/2007:02:56:34 -0400]
"GET /MyResume.html HTTP/1.1" 200 4592 "http://www.google.com.vn/search?q=inurl:cv+%7C+
inurl:resume+%7C+inurl:vitae+%7C+intitle:cv+%7C+
intitle:resume+%7C+intitle:vitae)
+(%22Java+developer%22+%7C+%22C%2B%2B+developer%22)+
(C%2B%2B+%7C+Java+%7C+J2EE)+
(Linux+%7C+Unix)+-usa+-india+-C%23+-.Net+-PhD+-Ph.D+-CA+
-NY&hl=vi&start=80&sa=N"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)"
Which translated means:
210.245.110.78- This is the IP Address of the requestor. DNSStuff.com reports that this IP address is from Ho Chi Minh City, Ho Chi Minh (Vietnam). This is consistent with the Google site used for the search (www.google.com.vn)
[17/Sep/2007:02:56:34 -0400]- The date and time of the request, with the server’s offset from UTC (-0400)
200- The HTTP status code returned by the webserver. 200 means success
4592- This is the size of the document - my resume is only 4592 bytes. There’s an associated stylesheet, and that’s fetched right after the resume.
The remaining information, apart from the final quoted string (the browser’s user-agent), is the referrer field. This is how you tell what document the user was on when they clicked a link.
http://www.google.com.vn- This is the Google site in Vietnam. Looks like our visitor was from Vietnam.
/search- This is the search url - nothing special here
inurl:cv- Specifies a search where the URL fetched has “cv” in it.
+%7C+- the “+” symbol represents a space in URL encoding and the %7C is a “|” (bar) symbol. This construct means the previous search term is “or’d” with the next search term.
inurl:resume |- Specifies a search where the URL fetched has “resume” in it.
inurl:vitae |- Specifies a search where the URL fetched has “vitae” in it.
intitle:cv |- Specifies a search where the title of the document fetched has “cv” in it.
intitle:resume |- Specifies a search where the title of the document fetched has “resume” in it.
intitle:vitae- Specifies a search where the title of the document fetched has “vitae” in it.
("Java developer" | "C++ developer")- Document text contains the phrase “Java developer” or “C++ developer”
(C++ | Java | J2EE)- Document text contains “C++” or “Java” or “J2EE”
(Linux | Unix)- Document text contains “Linux” or “Unix”
-usa -india -C# -.Net -PhD -Ph.D -CA -NY- A leading “-” excludes a term, so this matches documents without the words usa, india, C#, .Net, PhD, Ph.D, CA and NY
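The field-by-field breakdown above can also be done mechanically. Here is a minimal Python sketch, assuming the server writes the common "combined" log format and that the wrapped snippet is rejoined into a single line. The names `LOG_PATTERN`, `parse_log_line`, `search_terms` and `sample_line` are mine, made up for illustration, and the regular expression is a simplification that does not handle escaped quotes inside a field.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout (combined log format):
# host ident user [time] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Split one log line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def search_terms(referrer):
    """Return the decoded q= parameter of a search referrer, or None if absent."""
    values = parse_qs(urlsplit(referrer).query).get("q", [])
    return values[0] if values else None


# The log entry from this post, rejoined into one line.
sample_line = (
    '210.245.110.78 - - [17/Sep/2007:02:56:34 -0400] '
    '"GET /MyResume.html HTTP/1.1" 200 4592 '
    '"http://www.google.com.vn/search?q=inurl:cv+%7C+inurl:resume+%7C+'
    'inurl:vitae+%7C+intitle:cv+%7C+intitle:resume+%7C+intitle:vitae)'
    '+(%22Java+developer%22+%7C+%22C%2B%2B+developer%22)+'
    '(C%2B%2B+%7C+Java+%7C+J2EE)+(Linux+%7C+Unix)'
    '+-usa+-india+-C%23+-.Net+-PhD+-Ph.D+-CA+-NY&hl=vi&start=80&sa=N" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)"'
)

entry = parse_log_line(sample_line)
print(entry["host"], entry["status"], entry["size"])
print(search_terms(entry["referrer"]))
```

`parse_qs` does the URL decoding for you: each “+” becomes a space and each %XX escape becomes its character, so %7C turns back into “|” and C%2B%2B into “C++”.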
So, as you can see, the headhunters (at least the Vietnamese ones) are very proficient in Google searches. If you want to type the text into the Google search box, here’s what you’ll need (with the opening parenthesis that is missing from the logged URL restored):
(inurl:cv | inurl:resume | inurl:vitae | intitle:cv | intitle:resume | intitle:vitae) ("Java developer" | "C++ developer") (C++ | Java | J2EE) (Linux | Unix) -usa -india -C# -.Net -PhD -Ph.D -CA -NY
Enjoy, and now you’ll recognize the resume harvesters when they drive by your site.
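If you would rather have a script do the recognizing, here is a rough Python sketch. The rule it applies is my own heuristic, not anything official: a referrer counts as a harvester if it comes from a host that is `google.com` or starts with `www.google.`, and its search text uses `inurl:` or `intitle:` with cv, resume or vitae. The names `HARVEST_PATTERN` and `is_resume_harvest` are hypothetical, and other search engines or differently worded queries will slip past it.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Heuristic only (an assumption): inurl:/intitle: operators aimed at resume-like words.
HARVEST_PATTERN = re.compile(
    r'\b(?:inurl|intitle):(?:cv|resume|vitae)\b', re.IGNORECASE
)


def is_resume_harvest(referrer):
    """Return True if the referrer looks like a Google resume-harvesting search."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    if not (host == "google.com" or host.startswith("www.google.")):
        return False
    # parse_qs decodes "+" and %XX escapes in the q= parameter.
    query = " ".join(parse_qs(parts.query).get("q", []))
    return bool(HARVEST_PATTERN.search(query))


print(is_resume_harvest(
    "http://www.google.com.vn/search?q=inurl:cv+%7C+intitle:resume&hl=vi"
))
print(is_resume_harvest("http://www.google.com/search?q=java+tutorial"))
```

Feed it the referrer field of each log entry and print the lines where it returns True.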




