Tag: big data

This is the second installment in my irregular series of book reviews for O’Reilly Media. In the interests of full disclosure, I received this ebook for free in exchange for this review. I get to keep it even if I hate it, and they will publish this review on their web site even if I trash the volume completely.

The book under the microscope this time is “R Cookbook” by Paul Teetor. For those of you unfamiliar, R is a powerful, free, open source programming language and environment used for statistical programming and analysis. It features a rich graphical display language to assist in data visualization. You can think of it as a scripting language akin to Excel spreadsheets, or as a variant of MATLAB focused on statistics. The language includes a full suite of community-developed, sector-specific libraries that provide reusable functions typical of industry needs. These libraries indicate the areas in which R has found popularity: finance, genomics, statistics and data science.

The R Cookbook describes itself as a book for the user who is somewhat familiar with R but needs easy access to useful techniques and common R program building blocks. The book is arranged as a series of recipes. Each recipe describes a problem that you might be trying to solve and then offers one or more solutions to it. For instance, “You want a basic statistical summary of your data” is posed as the problem, and the text then provides you with at least one approach to solving it.

The structure of the book is such that it begins with simpler recipes and builds its way up to more complex ones. In fact, because of this structure, I would recommend this book as a great tool for a novice learning R, despite the book’s own insistence that this is not its intended use. The rationale behind this recommendation is that the beginning recipes are tasks like “How do I install R?” and similar novice tasks. It then builds slowly from there into use cases and scenarios of increasing complexity and utility.

The book includes great examples that illustrate the power of R in doing data transformation, probability and statistical analysis. It also shows how you can use R to produce meaningful graphical representations of your results. The chapter on ‘Useful Tricks’ is what seals the deal for me, providing 19 great pointers to help you improve your R analyses.


The hoi polloi are running fast towards the banner marked “Internet of Things”. They are running at full speed chanting “I-o-T, I-o-T, I-o-T” all along the way. But for the most part, they are each running towards something different. For some, it is a network of sensors; for others, it is a network of processors; for still others, it is a previously unconnected and unnetworked embedded system, now attached to a network; some say it is any of those things connected to the cloud; and there are those who say it is simply whatever they already have, renamed, with the marketing label “IoT” or “Internet of Things” added to the box.

So what is it?  Why the excitement? And what can it do?

At its simplest, the Internet of Things is a collection of endpoints, each of which has one or more sensors, a processor, some memory and some sort of wireless connectivity. The endpoints are then connected to a server – where “server” is defined in the broadest possible sense. It could be a phone, a tablet, a laptop or desktop, a remote server farm or some combination of all of those (say, a phone that then talks to a server farm). Along the transmission path, data collected from the sensors goes through successively higher levels of analysis and processing. For instance, at the endpoint itself raw data may be displayed, averaged or corrected, then delivered to the server and stored in the cloud. Once in the cloud, data can be analyzed historically, compared with other similarly collected data, or correlated with related – or even unrelated – data in a search for unexpected or heretofore unseen correlations. Fully processed data can then be delivered back to the user in some meaningful way – perhaps as a trend display or as a prescriptive suite of actions or recommendations. And, of course, the fully analyzed data and its correlations could also be sold or otherwise used to target advertising or product or service recommendations.
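To make that layering concrete, here is a minimal sketch, in Python, of the endpoint’s end of the pipeline: read a sensor, do a first level of processing locally, then hand the summary upstream. The server URL, device ID and simulated sensor are hypothetical stand-ins, not a real API.

```python
import json
import random
import time
import urllib.request

SERVER_URL = "https://example.com/ingest"  # hypothetical ingestion endpoint

def read_sensor() -> float:
    """Stand-in for a real sensor driver; simulates one raw reading."""
    return 20.0 + random.gauss(0, 0.5)

def sample_window(n: int = 10, interval_s: float = 0.1) -> list[float]:
    """Collect a short window of raw readings."""
    readings = []
    for _ in range(n):
        readings.append(read_sensor())
        time.sleep(interval_s)
    return readings

def first_level_processing(readings: list[float]) -> dict:
    """The endpoint's own processing: average and sanity-check raw data
    before shipping it upstream for deeper, historical analysis."""
    avg = sum(readings) / len(readings)
    return {
        "device_id": "endpoint-001",     # made-up identifier
        "timestamp": time.time(),
        "mean": avg,
        "samples": len(readings),
        "in_range": 0.0 <= avg <= 50.0,  # simple local correction/check
    }

def push_to_server(summary: dict) -> None:
    """Deliver the locally processed summary to the 'server', however defined."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(summary).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    push_to_server(first_level_processing(sample_window()))
```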

There is a further enhancement to this collection of endpoints and associated data analysis processes described in my basic IoT system. The ‘things’ on this Internet of Things could also use the data they collect to improve the system itself. This could include identifying missing data elements or sensor readings, bad timing assumptions or other ways to improve the capabilities of the overall system. If the endpoints are reconfigurable, either through programmable logic (like Field Programmable Gate Arrays) or through software updates, then new hardware or software images could be distributed with enhancements (or, dare I say, bug fixes) throughout the system to provide it with new functionality. This makes the IoT system both evolutionary and field upgradeable. It extends the deployment lifetime of the device and could potentially extend the time in market at both the beginning and the end of the product life cycle: you could get to market earlier with limited functionality, introduce new features and enhancements post-deployment, and continue to add innovations when the product might ordinarily have been obsoleted.
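The update half of that story might look something like the sketch below, assuming a hypothetical update service that publishes a per-device manifest; a real device would also verify checksums and signatures before rebooting into a new image.

```python
import json
import urllib.request

UPDATE_URL = "https://example.com/firmware/latest"  # hypothetical update service
CURRENT_VERSION = "1.2.0"

def check_for_update() -> dict | None:
    """Ask the update service whether a newer image exists for this device."""
    with urllib.request.urlopen(f"{UPDATE_URL}?device=endpoint-001") as resp:
        manifest = json.load(resp)
    if manifest["version"] != CURRENT_VERSION:
        return manifest  # assumed to contain version, image_url and checksum
    return None

def stage_update(manifest: dict) -> None:
    """Download and stage the new image; verification is omitted here."""
    image = urllib.request.urlopen(manifest["image_url"]).read()
    with open("staged_image.bin", "wb") as f:
        f.write(image)

if __name__ == "__main__":
    manifest = check_for_update()
    if manifest:
        stage_update(manifest)
```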

Having defined an ideal IoT system, the question becomes: how does one turn it into a business? The value of these IoT applications is based on the collection of data over time and the processing and interpretation (mining) of said data. As more data are collected over time, the value of the analysis increases, though likely asymptotically, approaching some maximal value (a toy model of this curve appears below). The data analysis could include information like:

  • Your triathlon training plan is on track; you ought to taper the swim a bit and increase the running volume to 18 miles per week.
  • The drive shaft on your car will fail in the next 1 to 6 weeks – how about I order one for you and set up an appointment at the dealership?
  • If you keep eating the kind of food you have for the past 4 days, you will gain 15 pounds by Friday.

The sample analyses above obviously come from a variety of different products or systems, but the idea is that by mining collected and historical data from you, and maybe even from people ‘like’ you, certain conclusions may be drawn.
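As for the claim that the value of the analysis grows with accumulated data but saturates, here is a toy model of that curve; the maximum value and the rate constant are made-up parameters, not measurements.

```python
import math

def analysis_value(n_observations: int, v_max: float = 100.0, k: float = 500.0) -> float:
    """Toy model: analysis value rises with accumulated data but
    saturates toward v_max. Both v_max and k are invented parameters."""
    return v_max * (1 - math.exp(-n_observations / k))

for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} observations -> value {analysis_value(n):5.1f}")
```

The early data points add most of the value; the ten-thousandth adds almost none.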

Since the analysis is continuous and the feedback is not synchronized with any specific event or time, the fees for these services would have to be subscription-based. A small charge every month would deliver the analysis and prescriptive suggestions as and when needed.

This would suggest that when you buy a car, instead of paying a lump sum upfront for an extended service contract, you pay, say, $5 per month; the IoT system on your car is enabled, and your car schedules service, with a complete list of required parts and tasks, exactly when and as needed.

Similarly, in the health services sector, your IoT system collects all of your biometric data automatically, loads your activity data to Strava, alerts you to suspicious changes in your body and vital signs, and perhaps even calls the doctor to set up your appointment.

The subscription fees should be low, because they create efficiencies in the system that benefit both the subscriber and the service provider. The car dealer orders the parts they need when they need them – reducing inventory, providing faster turnaround of cars, and obviating the need for overnight storage of cars and for paying for rentals.

Doctors see patients less often and then only when something is truly out of whack.

And on and on.

Certainly the possibility of tiered subscription levels may make sense for some businesses. There may be ‘free’ variants that provide limited but still useful information to the subscriber, at the cost of sharing their data for broader community analysis; paid subscribers who share their data the same way may get reduced subscription rates. There are obviously many possible subscription models to investigate.

The industry capabilities and directions described here, facilitated by the Internet of Things, are either Pollyannaish or visionary. It’s up to us to find out. But for now, what do you think?


There is a huge focus on big data nowadays. Driven by ever decreasing prices and ever increasing capacity of data storage solutions, big data promises magical insights: new windows into the exploitation of the long tail and into addressing micro markets and their needs. Big data can be used to build, test and validate models and ideas. Big data holds promise akin to a panacea; it is being pushed as a universal solution to all ills. But if you look carefully and analyze correctly, what big data ultimately provides is what Marshall McLuhan described as an accurate prediction of the present. Big data helps us understand how we got to where we are today. It tells us what people want or need or do within a framework as it exists today. It is bounded by today’s (and the past’s) possibilities and ideas.

But big data does not identify the next seismic innovation. It does not necessarily even identify how to modify the current big thing to make it incrementally better.

In the October 2013 issue of IEEE Spectrum, an article described the work of a company named Lex Machina. The company is a classic big data play. They collect, scan and analyze all legal proceedings associated with patent litigation and draw up statistics identifying, for instance, the companies that are more likely to settle, the law firms that are more likely to win, the judges who are more favorable to defendants or to plaintiffs, and duration and cost assessments of litigation in different areas. So it is a useful tool. But all it does is tell you about the state of things now. It does not measure variables like the outcomes of litigation or settlements (for instance, whether a company wins but goes out of business, or wins and goes on to build a more dominant market share, or wins and nothing happens). It does not indicate whether companies protect only specific patents that have, say, an estimated future value of $X million, or what metric companies might use in their internal decision-making process, because that is likely not visible in the data.

Marissa Mayer, the hyper-analyzed and hyper-reported-on CEO of Yahoo!, famously tests all decisions against data. Whether it is the shade of purple for the new Yahoo! logo, the purchase price of the next acquisition or the value of any specific employee – it’s all about measurables.

But how can you measure the immeasurable? If something truly revolutionary is developed, how can big data help you decide if it’s worth it? How can even little data help you? How can people know what they like until they have it? If I told you that I would provide you with a service that lets you broadcast your thoughts to anyone who cares to subscribe to them, you’d probably say, “Sounds stupid. Why would I do that, and who would care what I think?” If I then told you that I forgot one important aspect of the idea – that every shared thought is limited to 140 characters – you would likely say, “Well, now I KNOW it’s stupid!” Alas, I have just described Twitter: an idea that turned into a company that is, as of this writing, trading on the NYSE at just over $42 per share with a market capitalization of about $25 billion.

Will a strong reliance on big data lead us incrementally into a big corner? Will all this fishing about in massive data sets for patterns and correlations merely reveal the complete works of Shakespeare in big enough data sets? Is big data just another variant of the Infinite Monkey Theorem? Will we get to the point that, with so much data to analyze, we merely prove whatever it is we are looking for?
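The fishing-expedition worry is easy to demonstrate with a toy simulation (invented numbers, no real data set): generate one target series and a couple of thousand series of pure noise, then go hunting for “patterns”.

```python
import random

def correlation(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
n_samples, n_variables = 50, 2_000

# One target series and many completely unrelated random series.
target = [random.gauss(0, 1) for _ in range(n_samples)]
noise = [[random.gauss(0, 1) for _ in range(n_samples)]
         for _ in range(n_variables)]

# Fish for "patterns": count series that look strongly correlated
# with the target even though everything here is pure noise.
hits = sum(1 for v in noise if abs(correlation(target, v)) > 0.35)
print(f"{hits} of {n_variables} random series look 'correlated' with the target")
```

Run it and you should see on the order of a couple dozen of the two thousand noise series clear the bar: search a big enough haystack and you will always find needle-shaped straw.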

Already we are seeing that Google Flu Trends, looking for instances of the flu, finds them where they aren’t, or at higher frequencies than actually occur. In that manner, big data fails even to accurately predict the present.

It is only now that some of the issues with ‘big data’ are being considered. For instance, even when you have a lot of data, if it is bad or incomplete you still have garbage – just a lot more of it. (That is where wearable devices, cell phones and other sophisticated but thinly veiled data-accumulation appliances come into play: to help improve data quality by making the data more complete.) Then the data itself is only as good as the analysis you can execute on it. The failings of Google Flu Trends are often attributed to bad search terms in the analysis but, of course, there could be many other reasons.

Maybe, in the end, big data is just big hubris. It lulls us into a false sense of security, promising knowledge and wisdom if only we gather enough data, but in the end all we learn is where we are right now; its predictive powers are, at best, based merely on what we want the future to be and, at worst, non-existent.


There is a great imbalance in the vast internet marketplace that has yet to be addressed and is quite ripe for the picking. In fact, this imbalance is probably at the root of the astronomical stock market valuations of existing and new companies like Google, facebook, Twitter and their ilk.

It turns out that your data is valuable. Very valuable. And it also turns out that you are basically giving it away. You are giving it away – not quite for free, but pretty close. What you are getting in return is personalization. You get advertisements targeted at you, offering products you don’t need but are likely to find quite irresistible. You get recommendations for other sites that ensure you need never venture outside the bounds of your existing likes and dislikes. You get matched up with companies that provide services you might or might not need but will definitely think are valuable.

Ultimately, you are giving up your data so businesses can more efficiently extract more money from you.

If you are going to be exploited in this manner, it’s time to make that exploitation a two-way street. Newspapers, for instance, are rapidly arriving at the conclusion that there is actual monetary value in the information they provide. They are seeing that the provision of vetted, verified, thoughtful and well-written information is intrinsically worth more than nothing. They have decided that simply giving this valuable commodity away for free is giving up the keys to the kingdom. The Wall Street Journal, the New York Times, The Economist and others are seeing that people are willing to pay, and do actually subscribe.

There is a lesson in this for you – as a person. There is value in your data: your mobile movements, your surf trail, your shopping preferences. It should not be the case that you implicitly surrender this information for better personalization or even a $5 Starbucks gift card. This constant flow of data from you – your actions, movements and keystrokes – ought to result in a constant flow of money to you. When you think about it, why isn’t the ultimate personal data collection engine, Google Glass, given away for free? Because people don’t realize that personal data collection is its primary function. Clearly, the time has come for the realization of a personal paywall.

The idea is simple: if an entity wants your information, they pay you for it. Directly. They don’t go to Google or facebook and buy it – they open an account with you and pay you directly, at a rate that you set. That business can then decide whether you are worth what you think you are. You can adjust your fee up or down at any time, and you can be dropped or picked up by followers. You could provide discount tokens or free passes for friends. You could charge per click, hour, day, month or year. You might charge more for your mobile movements and less for your internet browsing trail. The data you share comes with an audit trail that ensures that if the information is passed on to others without your consent, you will be able to take action – maybe even delete it – wherever it is. Maybe your data lives for only a few days or months or years – like a contract or a note – and then disappears.
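A minimal sketch of what one of these priced, expiring, auditable data packets might look like – the fields, rates and names here are all invented for illustration, not a real protocol:

```python
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class PersonalDataPacket:
    """One unit of personal data offered through the paywall.
    Fields and rates are illustrative placeholders."""
    owner: str
    category: str        # e.g. "mobile_movements" or "browsing_trail"
    payload: dict
    rate_per_day: float  # owner-set price, adjustable at any time
    expires_at: float    # the data "dies" after this, like a note
    packet_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    audit_trail: list = field(default_factory=list)

    def record_access(self, buyer: str) -> None:
        """Append to the audit trail so onward transfers can be traced."""
        self.audit_trail.append((buyer, time.time()))

    def is_live(self) -> bool:
        return time.time() < self.expires_at

# Mobile movements priced higher than a browsing trail, as suggested above.
packet = PersonalDataPacket(
    owner="alice",
    category="mobile_movements",
    payload={"lat": 40.7, "lon": -74.0},
    rate_per_day=0.50,
    expires_at=time.time() + 30 * 24 * 3600,  # lives for 30 days
)
packet.record_access("acme-ads")  # hypothetical vetted buyer
```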

Of course, you will have to do the due diligence to ensure you are selling your information to a legitimate organization and not a Nigerian prince.  This, in turn, may result in the creation of a new class of service providers who vet these information buyers.

This data reselling capability would also provide additional income to individuals. It would not be a living wage to compensate for having lost a job, but it would be some compensation for participating in facebook or LinkedIn, or a sort of kickback for buying something at Amazon and then allowing them to target you as a consumer more effectively. It would effectively reward you for contributing the information that drives the profits of these organizations and recognize the value that you add to the system.

The implementation is challenging and would require encapsulating data in packets over which you exert some control. An architectural model with a ledger – similar in spirit to bitcoin’s – indicating where every bit of your data is at any time would be valuable and necessary. Use of the personal paywall would likely require that you run an application on your phone or use a customized browser that releases your information only to your paid-up clients. In addition, some sort of easy, frictionless mechanism through which companies or organizations could automatically decide to buy your information and perhaps negotiate (again automatically) with your paywall for a rate that suits both of you would make use of the personal paywall invisible and easy. Again, this technology would have to screen out fraudulent entities and not even bother negotiating with them.
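That automatic negotiation could be as simple as the sketch below – the vetting set, the rates and the meet-in-the-middle rule are placeholders for whatever policy the paywall owner configures:

```python
from dataclasses import dataclass

@dataclass
class Offer:
    buyer: str
    category: str
    bid_per_day: float

def negotiate(offer: Offer, asking_rate: float, floor_rate: float,
              vetted_buyers: set[str]) -> float | None:
    """Frictionless, automatic negotiation: screen out unvetted buyers,
    accept any bid at or above the ask, meet in the middle if the bid
    clears the owner's floor, otherwise decline."""
    if offer.buyer not in vetted_buyers:
        return None  # don't even bother negotiating with fraudulent entities
    if offer.bid_per_day >= asking_rate:
        return asking_rate
    if offer.bid_per_day >= floor_rate:
        return (offer.bid_per_day + asking_rate) / 2
    return None

vetted = {"acme-ads", "trusty-insurer"}  # maintained by a vetting service
deal = negotiate(Offer("acme-ads", "browsing_trail", bid_per_day=0.30),
                 asking_rate=0.50, floor_rate=0.25, vetted_buyers=vetted)
print(deal)  # 0.4 – the bid cleared the floor, so the parties met in the middle
```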

There is much more to this approach to consider and many more challenges to overcome. I think, though, that this is an idea that could change the internet landscape, make it more equitable, and ensure that the true value of the internet is realized and shared by all its participants and users.
