Twitter's robots.txt question:
Twitter's robots.txt looks as if it disallows everything, yet search engines are crawling and indexing everybody's profile pages. Why?
Twitter doesn't disallow all URLs. If they wanted to do that, they would use "Disallow: /". Their "Disallow: /*?" rule blocks every URL that contains a ?, but only for robots that recognize wildcards (Google and Yahoo only, as far as I know). Other robots treat the * like any other character, so for them the rule only matches paths that literally begin with /*?. Twitter profile pages, tweet pages, and so on don't use a ? in the URL (no HTTP GET parameters), so search engines index them.
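
To make the difference concrete, here is a rough Python sketch of the two interpretations. It is not how any particular crawler is implemented; the function names and the example paths are made up for illustration, and the wildcard handling only covers * and a trailing $.

```python
import re

def wildcard_blocks(pattern: str, url_path: str) -> bool:
    """Wildcard-aware matching (assumed behavior): '*' matches any run of
    characters, and a trailing '$' anchors the match to the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' in place of each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so the rule behaves as a prefix pattern.
    return re.match(regex, url_path) is not None

def literal_blocks(pattern: str, url_path: str) -> bool:
    """Literal matching: '*' is just another character in a plain prefix test."""
    return url_path.startswith(pattern)

RULE = "/*?"  # the path part of "Disallow: /*?"

# Hypothetical paths, chosen only to show the two outcomes.
for path in ["/someuser", "/someuser/status/123", "/search?q=robots"]:
    print(path, wildcard_blocks(RULE, path), literal_blocks(RULE, path))
```

Under these assumptions, only the path containing a ? is blocked for a wildcard-aware robot, and none of the three is blocked for a robot that reads the rule literally. By contrast, a rule of "/" blocks every path under both interpretations.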