Ruthless Automation

M-x write-something

Does Data Scale Forever?

Next time someone in your group argues that because Google is doing X, you should do the same thing, give them this test: 1. Describe Google. 2. Give two examples.

(Apologies to whoever created the joke in the first place; some say Woody Allen.) Google is Google because they do stuff no one has ever done. If your company did that, you’d be the new Google.

There was a recent email floating around at work about an article arguing that Google has so much data that theory is no longer needed to predict outcomes. With enough data, anything can be modeled. So, applying this to writing HTML, can you say “Don’t worry about semantic markup”? Can you just keep adding data to your models and extract meaning from it? If enough web pages are written about a topic, even badly, can we pull the information out of those pages regardless? See http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

Is there a point where we’ll be creating so much data that we’ll overwhelm even the biggest servers, processors, RAM, and code? Which wins out, our ability to process the data or our ability to create it? The hardware and software manufacturers, or the people manufacturing the data?

You can get by without semantics if you have enough data, storage, memory, and money. Also see http://jeremy.zawodny.com/blog/archives/010841.html. And think about data mining for terrorists: is it actually working, as the government tries to plow through every bit of communications data? As data keeps growing, will hardware and software scale to keep up?