Archive for 'Storage'

BigDataBigBuildingsThere is a huge focus on big data nowadays. Driven by ever decreasing prices and ever increasing capacity of data storage solutions, big data provides magical insights and new windows into the exploitation of the long tail and addressing micro markets and their needs.  Big data can be used to build, test and validate models and ideas Big data holds promise akin to a panacea.  It is being pushed as a universal solution to all ills.  But if you look carefully and analyze correctly what big data ultimately provides is what Marshall MacLuhan described as an accurate prediction of the present.  Big data helps us understand how we got to where we are today. It tells us what people want or need or do within a framework as it exists today.  It is bounded by today’s (and the past’s) possibilities and ideas.

But big data does not identify the next seismic innovation.  It does not necessarily even identify how to modify the current big thing to make it incrementally better

In the October 2013 issue of IEEE Spectrum, an article described the work of a company named Lex Machina. The company is a classic big data play.  They collect, scan and analyze all legal proceedings associated with patent litigation and draw up statistics identifying, for instance, the companies who are more likely to settle, law firms that are more likely to win, judges who are more favorable to defendants or the prosecution, duration and cost assessments of prosecutions in different areas.  So it is a useful tool.  But all it does is tell you about the state of things now.  It does not measure variables like outcomes of prosecution or settlements (for instance, if a company wins but goes out of business or wins and goes on to build a more dominant market share or wins and nothing happens).  It does not indicate if companies protect only specific patents that have, say, an estimated future value of, say, $X million or what metric companies might use in their internal decision making process because that is likely not visible in the data.

Marissa Meyer, the hyper-analyzed and hyper-reported-on CEO of Yahoo!, famously tests all decisions based on data.  Whether it is the shade of purple for the new Yahoo! logo, the purchase price of the next acquisition or value of any specific employee – it’s all about measurables.

But how can you measure the immeasurable?  If something truly revolutionary is developed, how can big data help you decide if it’s worth it? How even can little data help you?  How can people know what they like until they have it? If I told you that I would provide you with a service that lets you broadcast your thoughts to anyone who cares to subscribe to them, you’d probably say.  “Sounds stupid. Why would I do that and who would care what I think?”  If I then told you that I forgot one important aspect of the idea, that every shared thought is limited to 140 characters, you would have likely said, “Well, now I KNOW it’s stupid!”.  Alas, I just described Twitter.  An idea that turned into a company that is, as of this writing, trading on the NYSE for just over $42 per share with a market capitalization of about $25 billion.

Will a strong reliance on big data lead us incrementally into a big corner?  Will all this fishing about in massive data sets for patterns and correlations merely reveal the complete works of Shakespeare in big enough data sets? Is Big Data just another variant of the Infinite Monkey Theorem? Will we get the to point that with so much data to analyze we merely prove whatever it is we are looking for?

Already we are seeing that Google Flu Trends is looking for instances of the flu and finds them where they aren’t or in higher frequencies than they actually are.  In that manner, big data fails even to accurately predict the present.

It is only now that some of the issues with ‘big data’ are being considered.  For instance, even when you have a lot of data – if it is bad or incomplete, you still have garbage only just a lot more of it (that is where wearable devices, cell phones and other sophisticated but merely thinly veiled data accumulation appliances come into play – to help improve the data quality by making it more complete).  Then the data itself is only as good as the analysis you can execute on it.  The failings of Google Flu Trends are often attributed to bad search terms in the analysis but of course, there could be many other different reasons.

Maybe, in the end, big data is just big hubris.  It lulls us into a false sense of security, promising knowledge and wisdom based on getting enough data but in the end all we learn is where we are right now and its predictive powers are, at best, based merely on what we want the future to be and, at worst, are non-existent.

Tags: , , , , , ,

Back at the end of March, I attended O’Reilly‘s Web 2.0 Expo in San Francisco. As usual with the O’Reilly brand of conferences it was a slick, show-bizzy affair. The plenary sessions were fast-paced with generic techno soundtracks, theatrical lighting and spectacular attempts at buzz-generation. Despite their best efforts, the staging seems to overhwelm the Droopy Dog-like presenters who tend to be more at home coding in darkened rooms whilst gorging themselves on Red Bull and cookies. Even the audience seemed to prefer the company of their smartphones or iPads than any actual human interaction with “live tweets” being the preferred method of communication.

In any event, the conference is usually interesting and a few nuggets are typically extracted from the superficial, mostly promotional aspects of the presentations.

What was clear was that every start-up and every business plan was keyed on data collection. Data collection about YOU. The more – the better. The goal was to learn as much about you as possible so as to be able to sell you stuff. Even better – to sell you stuff that was so in tune with your desires that you would be helpless to resist purchasing it.

The trick was – how to get you to cough up that precious data? Some sites just assumed you’d be OK with spending a few days answering questions and volunteering information – apparently just for the sheer joy of it. Others believed that being up-front and admitting that you were going to be sucked into a vortex of unrelenting and irresistable consumption would be reward enough. Still others felt that they ought to offer you some valuable service in return. Most often, this service, oddly enough, was financial planning and retirement saving-based.

The other thing that was interesting (and perhaps obvious) was that data collection is usually pretty easy (at least the basic stuff). Getting details is harder and most folks do expect something in return. And, of course, the hardest part is the data mining to extract the information that would provide the most compelling sales pitch to you.

There are all sorts of ways to build the case around your apparent desires. By finding out where you live or where you are, they can suggest things “like” other things you have already that are nearby. (You sure seem to like Lady Gaga, you know there’s a meat dress shoppe around the corner…) By finding out who your friends are and what they like, they can apply peer-pressure-based recommendations (All of your friends are downloading the new Justin Beiber recording. Why aren’t you?). And by finding out about your family and demographic information they can suggest what you need or ought to be needing soon (You son’s 16th birthday is coming up soon, how about a new car for him?).

Of all the sites and ideas, it seems to me that Intuit‘s Mint is the most interesting. Mint is an on-line financial planning and management site. Sort of like Quicken but online. To “hook” you, their key idea is to offer you the tease of the most valuable analysis with the minimum of initial information. It’s almost like given your email and zip code they’ll draw up a basic profile of you and your lifestyle. Give them a bit more and they’ll make it better. And so you get sucked in but you get value for your data. They do claim to keep the data separate from you but they also do collect demographically filtered data and likely geographically filtered data.

This really isn’t news. facebook understood this years ago when their ill-fated Beacon campaign was launched. This probably would have been better accepted had it been rolled out more sensitively. But it is ultimately where everyone is stampeding right now.

The most interesting thing is that there is already a huge amount of personal data on the web. It is protected because it’s all in different places and not associated. facebook has all of your friends and acquaintances. Amazon and eBay have a lot about what you like and what you buy. Google has what you’re interested in (and if you have an Android phone – where you go). Apple has a lot about where you go and who you talk to and also through your app selection what you like and are interested in. LinkedIn has your professional associations. And, of course, twitter has when you go to the bathroom and what kind of muffins you eat.

Each of these giants is trying to expand their reservoir of data about you. Other giants are trying to figure out how to get a piece of that action (Yahoo!, Microsoft). And yet others, are trying to sell missing bits of information to these players. Credit card companies are making their vast purchasing databases available, specialty retailers are trying to cash in, cell phone service providers are muscling in as well. They each have a little piece of your puzzle to make analysis more accurate.

The expectations is that there will be acceptance of diminishing privacy and some sort of belief that the holders of these vast databases will be benevolent and secure and not require government intervention. Technologically, storage and retrieval will need to be addressed and newer, faster algorithms for analysis will need to be developed.

Looking for a job…or a powerful patent? I say look here.

Tags: , , , ,
Back to top