Archive for 'Data'

I have noticed that as people age, they become finer and finer versions of themselves. Their eccentricities become sharper and more pronounced; their opinions and ideas more pointed and immutable; their thoughts more focused. In short, I like to say that they become more perfect versions of themselves. We see it in our friends and acquaintances and in our parents and grandparents. It seems a part of natural human development.

Back in 2006, Netflix initiated the Netflix Prize to encourage improvements in the accuracy of predictions about how much someone will enjoy a movie based on their movie preferences, offering the winner $1,000,000. Contestants were given access to a set of Netflix's end-users' movie ratings and were challenged to provide recommendations of other movies to watch that bested Netflix's own recommendation engine. BellKor's Pragmatic Chaos was announced as the winner in 2009, having managed to improve on Netflix's recommendations by 10%, and walked off with the prize money.

What did they do? Basically, they algorithmically identified movies that were exceptionally similar to the ones a specific user already liked and offered those movies as recommended viewing. And they did it really well.

In essence, what the BellKor team did was build a better echo chamber. Every viewer is analyzed, their taste detailed, and then the algorithm perfects that taste and hones it to a razor-sharp edge. You become, say, an expert in light romantic comedies featuring a strong female lead who lives in a spacious apartment in Manhattan, a supporting cast heavy on dog owners, no visible children and frequent panoramic views of Central Park.
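To make that concrete, here is a minimal sketch of item-based collaborative filtering in Python. To be clear, this is my own toy illustration with invented ratings – not BellKor's actual method, which blended hundreds of models – but the core "more of what you already like" step looks roughly like this:

```python
import numpy as np

# Toy user-by-movie rating matrix; 0 means "not yet rated".
# All numbers here are invented for illustration.
ratings = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 1, 0],   # user 1
    [1, 0, 5, 4],   # user 2
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two column (movie) vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return (a @ b) / denom if denom else 0.0

def recommend(user, k=1):
    """Score each unseen movie by its similarity to the movies the
    user already rated highly -- the echo-chamber step itself."""
    liked = [m for m in range(ratings.shape[1]) if ratings[user, m] >= 4]
    unseen = [m for m in range(ratings.shape[1]) if ratings[user, m] == 0]
    scores = {m: sum(cosine_sim(ratings[:, m], ratings[:, l]) for l in liked)
              for m in unseen}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(0))  # the movie most like what user 0 already loves
```

Notice that nothing in `recommend` can ever surface a movie dissimilar to the user's existing favorites; the better the similarity model, the tighter the loop.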

Of course, therein lies the rub. A multifaceted rub at that. As recommendation engines become more accurate and discerning of individual tastes, they remove any element of chance, randomness or error that might serve to introduce new experiences, genres or even products into your life. You become a more perfect version of you. But in that perfection you are also stunted. You are shielded from experimentation and breadth of experience. You pick a single pond and overfish it.

There are many reasons why this is bad. We see it reflected, most obviously, in our political discourse, where our interactions with opposing viewpoints are limited to exchanges of taunts (as opposed to conversations) followed by a quick retreat to the comfort of our well-constructed echo chambers of choice, where our already perfected views are nurtured and reinforced.

But it also has other ramifications. If we come to know what people like to such a degree, then innovation outside safe and well-known boundaries might be discouraged. If Netflix knows that 90% of its subscribers like action/adventure films with a male hero and lots of explosions, why would it bother investing in a story about a broken family being held together by a sullen beekeeper? If retail recommendations hew toward what you are most likely to buy, how can markets of unrelated products be expanded? How can individual tastes be extended and deepened?

Extending that – why would anyone risk investment in or development of something new and radically different if the recommendation engine models cannot justify it? How can the leap be made from Zero to One – as Peter Thiel described – in a society, market or investment environment in which the recommendation data is not present and does not justify it?

There are a number of possible answers. One might be that "gut instincts" need to continue to play a role in innovation, development and investment, and that risk aversion has no place in making the giant leaps that technology builds upon and needs in order to thrive.

A more geeky answer is that big data isn't yet big enough and that recommendation engines aren't yet smart enough. A good recommendation engine will not just reinforce your prejudicial tastes; it will also challenge and extend them – and we don't yet have the modelling right to do that effectively. The data are there, but we don't yet know how to mine them correctly to broaden rather than narrow our horizons. This broadening – when properly implemented – will widen markets and opportunities and increase revenue.
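As a sketch of what that might look like, here is one crude, illustrative way to build broadening into the ranking – reserving a fraction of recommendation slots for items from outside the user's established profile. The epsilon-greedy mix and all the item names are my own assumptions, not any production system's design:

```python
import random

def recommend_with_exploration(ranked, catalog, epsilon=0.2, n=5):
    """Fill most slots from the personalized ranking, but reserve an
    epsilon share for items from outside the user's taste profile."""
    n_explore = max(1, round(epsilon * n))
    exploit = ranked[:n - n_explore]                          # the echo chamber
    pool = [item for item in catalog if item not in exploit]
    explore = random.sample(pool, min(n_explore, len(pool)))  # the broadening
    return exploit + explore

picks = recommend_with_exploration(
    ranked=["romcom-1", "romcom-2", "romcom-3"],
    catalog=["romcom-1", "romcom-2", "romcom-3", "nature-doc", "noir-1"],
)
```

A smarter engine would pick the exploratory items by modelling which unfamiliar genres a user is most likely to grow into, rather than at random – which is exactly the modelling we don't yet have right.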


This is the second installment in my irregular series of book reviews for O'Reilly Media. In the interests of full disclosure, I received this ebook for free in exchange for this review. I get to keep it even if I hate it, and they will publish this review on their web site even if I trash the volume completely.

The book under the microscope this time is "The R Cookbook" by Paul Teetor. For those of you unfamiliar, R is a powerful, free, open source programming language and environment used for statistical programming and analysis. It features a rich graphical display language to assist in data visualization. You can think of it as a scripting language akin to Excel spreadsheets or a variant of MATLAB focused on statistics. The language includes a full suite of community-developed, sector-specific libraries that provide re-usable functions typical of industry needs. These libraries indicate the areas in which R has found popularity. This includes the worlds of finance, genomics, statistics and data science.

The R Cookbook describes itself as a book for the user who is somewhat familiar with R but needs easy access to useful techniques and common R program building blocks. The book is arranged as a series of recipes. Each recipe describes a problem that you might be trying to solve and then one or more solutions to resolve the issue. For instance, "You want a basic statistical summary of your data" is posed as the problem to solve, and then the text provides you with at least one approach to solving it.

The structure of the book is such that it begins with simpler recipes and builds its way up to more complex ones. In fact, because of this structure, I would recommend this book as a great tool for a novice learning R, despite the book's own statement that this is not its intended use. The rationale behind this recommendation is that the beginning recipes are tasks like "How do I install R?" and similar novice tasks. It then builds slowly from there into use cases and scenarios of increasing complexity and utility.

The book includes great examples that illustrate the power of R in doing data transformation, probability and statistical analysis. It also shows how you can use R to provide meaningful graphical representations of your results. The chapter on 'Useful Tricks' is what seals the deal for me, providing 19 great pointers to help you improve your R analyses.


The hoi polloi are running fast towards the banner marked "Internet of Things". They are running at full speed chanting "I-o-T, I-o-T, I-o-T" all along the way. But for the most part, they are each running towards something different. For some, it is a network of sensors; for others, it is a network of processors; for still others, it is a previously unconnected and unnetworked embedded system, now attached to a network; some say it is any of those things connected to the cloud; and there are those who say it is simply renaming whatever they already have and including the descriptive marketing label "IoT" or "Internet of Things" on the box.

So what is it?  Why the excitement? And what can it do?

At its simplest, the Internet of Things is a collection of endpoints, each of which has one or more sensors, a processor, some memory and some sort of wireless connectivity. The endpoints are then connected to a server – where "server" is defined in the broadest possible sense. It could be a phone, a tablet, a laptop or desktop, a remote server farm or some combination of all of those (say, a phone that then talks to a server farm). Along the transmission path, data collected from the sensors goes through increasingly higher levels of analysis and processing. For instance, at the endpoint itself raw data may be displayed, averaged or corrected, then delivered to the server and stored in the cloud. Once in the cloud, data can be analyzed historically, compared with other similarly collected data, or correlated to related or even unrelated data in an attempt to surface unexpected or heretofore unseen correlations. Fully processed data can then be delivered back to the user in some meaningful way – perhaps as a trend display or as a prescriptive suite of actions or recommendations. And, of course, the fully analyzed data and its correlations could also be sold or otherwise used to target advertising or product or service recommendations.
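A minimal sketch of the endpoint's side of that pipeline, in Python: the device does the first level of processing (averaging) and ships a summary upstream. The server URL, device ID and payload fields are all placeholders of my own invention:

```python
import json
import random
import statistics
import urllib.request

SERVER_URL = "http://example.com/ingest"  # hypothetical ingest endpoint

def read_sensor():
    # Stand-in for a real sensor driver: a noisy temperature reading.
    return 20.0 + random.gauss(0, 0.5)

def collect_and_forward(n_samples=10):
    """First level of analysis happens on the endpoint itself: average
    and summarize the raw readings, then deliver them to the server;
    the cloud handles the historical and cross-device correlation."""
    raw = [read_sensor() for _ in range(n_samples)]
    payload = json.dumps({
        "device_id": "endpoint-01",
        "mean": statistics.mean(raw),
        "stdev": statistics.stdev(raw),
        "n": len(raw),
    }).encode()
    req = urllib.request.Request(SERVER_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)  # the server stores it; the cloud mines it
```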

There is a further enhancement to this collection of endpoints and associated data analysis processes described in my basic IoT system. The 'things' on this Internet of Things could also use the data they collect to improve themselves. This could include identifying missing data elements or sensor readings, bad timing assumptions or other ways to improve the capabilities of the overall system. If the endpoints are reconfigurable, either through programmable logic (like Field Programmable Gate Arrays) or through software updates, then new hardware or software images could be distributed with enhancements (or, dare I say, bug fixes) throughout the system to provide it with new functionality. This makes the IoT system both evolutionary and field-upgradeable. It extends the deployment lifetime of the device and could potentially extend the time in market at both the beginning and the end of the product life cycle. You could get to market earlier with limited functionality, introduce new features and enhancements post-deployment and continue to add innovations when the product might ordinarily have been obsoleted.
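The software-update half of that story might look like the following sketch, assuming a made-up manifest format (version, image_url, sha256); an FPGA bitstream update would follow the same shape, with the flashing step swapped out:

```python
import hashlib
import json
import urllib.request

MANIFEST_URL = "http://example.com/firmware/manifest.json"  # hypothetical
CURRENT_VERSION = (1, 2, 0)

def fetch_update():
    """Ask the server for a newer image and verify its integrity before
    handing it to the platform-specific flashing step -- a bad download
    must never brick a fielded device."""
    with urllib.request.urlopen(MANIFEST_URL) as resp:
        manifest = json.load(resp)
    if tuple(manifest["version"]) <= CURRENT_VERSION:
        return None  # already up to date
    with urllib.request.urlopen(manifest["image_url"]) as resp:
        image = resp.read()
    if hashlib.sha256(image).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch -- refusing to flash")
    return image
```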

Having defined an ideal IoT system, the question becomes: how does one turn it into a business? The value of these IoT applications is based on the collection of data over time and the processing and interpretation (mining) of said data. As more data are collected over time, the value of the analysis increases (though likely asymptotically, approaching some maximal value; a toy formula below makes this concrete). The data analysis could include information like:

  • Your triathlon training plan is on track, you ought to taper the swim a bit and increase the running volume to 18 miles per week.
  • The drive shaft on your car will fail in the next 1 to 6 weeks – how about I order one for you and set up an appointment at the dealership?
  • If you keep eating the kind of food you have for the past 4 days, you will gain 15 pounds by Friday.

The above sample analyses are obviously from a variety of different products or systems, but the idea is that by mining collected and historical data from you, and maybe even from people 'like' you, certain conclusions may be drawn.
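One illustrative way to write down that "asymptotically approaching some maximal value" claim – a toy model of my own, not anything measured – is a simple saturating curve:

$$ V(n) = V_{\max}\left(1 - e^{-n/k}\right) $$

where V(n) is the value of the analysis after n data points have been collected, V_max is the ceiling, and k sets how quickly additional data stops adding value.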

Since the analysis is continuous and the feedback unsynchronized to any specific event or time, the fees for these services would have to be subscription-based.  A small charge every month would deliver the analysis and prescriptive suggestions as and when needed.

This would suggest that when you buy a car, instead of an extended service contract that you pay for as a lump sum upfront, you pay, say, $5 per month; the IoT system is enabled on your car, and your car will schedule service, with a complete list of required parts and tasks, exactly when and as needed.

Similarly in the health services sector, your IoT system collects all of your biometric data automatically, loads your activity data to Strava, alerts you to suspicious bodily and vital sign changes and perhaps even calls the doctor to set up your appointment.

The subscription fees should be low because they provide for efficiencies in the system that benefit both the subscriber and the service provider.  The car dealer orders the parts they need when they need them, reducing inventory, providing faster turnaround of cars, obviating the need for overnight storage of cars and payment for rentals.

Doctors see patients less often and then only when something is truly out of whack.

And on and on.

Certainly the possibility of tiered levels of subscription may make sense for some businesses. There may be 'free' variants that provide limited but still useful information to the subscriber, at the cost of sharing their data for broader community analysis, and paid subscribers who share their data for such analysis may get reduced subscription rates. There are obviously many possible subscription models to investigate.

The industry capabilities and directions described here, facilitated by the Internet of Things, are either pollyannaish or visionary. It's up to us to find out. But for now, what do you think?


There is a huge focus on big data nowadays. Driven by ever decreasing prices and ever increasing capacity of data storage solutions, big data provides magical insights and new windows into the exploitation of the long tail and addressing micro markets and their needs. Big data can be used to build, test and validate models and ideas. Big data holds promise akin to a panacea. It is being pushed as a universal solution to all ills. But if you look carefully and analyze correctly, what big data ultimately provides is what Marshall McLuhan described as an accurate prediction of the present. Big data helps us understand how we got to where we are today. It tells us what people want or need or do within a framework as it exists today. It is bounded by today's (and the past's) possibilities and ideas.

But big data does not identify the next seismic innovation. It does not necessarily even identify how to modify the current big thing to make it incrementally better.

In the October 2013 issue of IEEE Spectrum, an article described the work of a company named Lex Machina. The company is a classic big data play. They collect, scan and analyze all legal proceedings associated with patent litigation and draw up statistics identifying, for instance, the companies that are more likely to settle, the law firms that are more likely to win, the judges who are more favorable to defendants or plaintiffs, and duration and cost assessments of litigation in different areas. So it is a useful tool. But all it does is tell you about the state of things now. It does not measure variables like the outcomes of litigation or settlements (for instance, if a company wins but goes out of business, or wins and goes on to build a more dominant market share, or wins and nothing happens). It does not indicate whether companies protect only specific patents that have an estimated future value of, say, $X million, or what metric companies might use in their internal decision-making process, because that is likely not visible in the data.

Marissa Mayer, the hyper-analyzed and hyper-reported-on CEO of Yahoo!, famously tests all decisions based on data. Whether it is the shade of purple for the new Yahoo! logo, the purchase price of the next acquisition or the value of any specific employee – it's all about measurables.

But how can you measure the immeasurable? If something truly revolutionary is developed, how can big data help you decide if it's worth it? How can even little data help you? How can people know what they like until they have it? If I told you that I would provide you with a service that lets you broadcast your thoughts to anyone who cares to subscribe to them, you'd probably say, "Sounds stupid. Why would I do that and who would care what I think?" If I then told you that I forgot one important aspect of the idea – that every shared thought is limited to 140 characters – you would likely have said, "Well, now I KNOW it's stupid!" Alas, I just described Twitter. An idea that turned into a company that is, as of this writing, trading on the NYSE at just over $42 per share with a market capitalization of about $25 billion.

Will a strong reliance on big data lead us incrementally into a big corner? Will all this fishing about in massive data sets for patterns and correlations merely reveal the complete works of Shakespeare in big enough data sets? Is big data just another variant of the Infinite Monkey Theorem? Will we get to the point that, with so much data to analyze, we merely prove whatever it is we are looking for?

Already we are seeing that Google Flu Trends, looking for instances of the flu, finds them where they aren't, or at higher frequencies than actually occur. In that manner, big data fails even to accurately predict the present.

It is only now that some of the issues with 'big data' are being considered. For instance, even when you have a lot of data, if it is bad or incomplete you still have garbage – just a lot more of it. (That is where wearable devices, cell phones and other sophisticated but thinly veiled data accumulation appliances come into play – to help improve data quality by making it more complete.) Then the data itself is only as good as the analysis you can execute on it. The failings of Google Flu Trends are often attributed to bad search terms in the analysis, but of course there could be many other reasons.

Maybe, in the end, big data is just big hubris. It lulls us into a false sense of security, promising knowledge and wisdom if only we gather enough data, but in the end all we learn is where we are right now; its predictive powers are, at best, based merely on what we want the future to be and, at worst, non-existent.


There is a great imbalance in the vast internet marketplace that has yet to be addressed and is quite ripe for the picking. In fact, this imbalance is probably at the root of the astronomical stock market valuations of existing and new companies like Google, facebook, Twitter and their ilk.

It turns out that your data is valuable. Very valuable. And it also turns out that you are basically giving it away – not quite for free, but pretty close. What you are getting in return is personalization. You get advertisements targeted at you, offering products you don't need but are likely to find quite irresistible. You get recommendations for other sites that ensure you need never venture outside the bounds of your existing likes and dislikes. You get matched up with companies that provide services you might or might not need but will definitely think are valuable.

Ultimately, you are giving up your data so businesses can more efficiently extract more money from you.

If you are going to get exploited in this manner, it's time to make that exploitation a two-way street. Newspapers, for instance, are rapidly arriving at the conclusion that there is actual monetary value in the information they provide. They are seeing that the provision of vetted, verified, thoughtful and well-written information is intrinsically worth more than nothing. They have decided that simply giving this valuable commodity away for free is giving up the keys to the kingdom. The Wall Street Journal, the New York Times, The Economist and others are seeing that people are willing to pay and do actually subscribe.

There is a lesson in this for you – as a person. There is value in your data: your mobile movements, your surf trail, your shopping preferences. It should not be the case that you implicitly surrender this information for better personalization or even a $5 Starbucks gift card. This constant flow of data from you – your actions, movements and keystrokes – ought to result in a constant flow of money to you. When you think about it, why isn't the ultimate personal data collection engine, Google Glass, given away for free? Because people don't realize that personal data collection is its primary function. Clearly, the time has come for the realization of a personal paywall.

The idea is simple: if an entity wants your information, they pay you for it. Directly. They don't go to Google or facebook and buy it – they open up an account with you and pay you directly, at a rate that you set. Then that business can decide whether you are worth what you think you are. You can adjust your fee up or down anytime, and you can be dropped or picked up by followers. You could provide discount tokens or free passes for friends. You could charge per click, hour, day, month or year. You might charge more for your mobile movements and less for your internet browsing trail. The data you share comes with an audit trail that ensures that if the information is passed on to others without your consent, you will be able to take action – maybe even delete it – wherever it is. Maybe your data lives for only a few days or months or years – like a contract or a note – and then disappears.

Of course, you will have to do the due diligence to ensure you are selling your information to a legitimate organization and not a Nigerian prince.  This, in turn, may result in the creation of a new class of service providers who vet these information buyers.

This data reselling capability would also provide additional income to individuals. It would not be a living wage to compensate for having lost a job, but it would be some compensation for participating in facebook or LinkedIn, or a sort of kickback for buying something at Amazon and then allowing them to target you as a consumer more effectively. It would effectively reward you for contributing the information that drives the profits of these organizations and recognize the value that you add to the system.

The implementation is challenging and would require encapsulating data in packets over which you exert some control. An architectural model similar to bitcoin's, with a ledger indicating where every bit of your data is at any time, would be valuable and necessary. Use of the personal paywall would likely require that you include an application on your phone or use a customized browser that releases your information only to your paid-up clients. In addition, some sort of easy, frictionless mechanism through which companies or organizations could automatically decide to buy your information, and perhaps negotiate (again automatically) with your paywall for a rate that suits both of you, would make use of the personal paywall invisible and easy. Again, this technology would have to screen out fraudulent entities and not even bother negotiating with them.
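To make the packet idea a bit more tangible, here is a minimal sketch of a self-describing, expiring, owner-signed data packet. The field names, rates and the HMAC scheme are entirely my own assumptions – and the genuinely hard part, forcing a non-compliant client to honor expiry and deletion, is not solved by any of this:

```python
import hashlib
import hmac
import json
import time

SECRET = b"owner-only-signing-key"  # in reality, per-user key material

def make_packet(payload, rate_per_day, ttl_days, buyer_id):
    """Wrap personal data in an expiring, signed envelope so the owner
    can audit who licensed it and when it is supposed to disappear."""
    envelope = {
        "payload": payload,
        "rate_per_day": rate_per_day,           # the price the owner set
        "expires_at": time.time() + ttl_days * 86400,
        "licensed_to": buyer_id,                # the audit trail starts here
    }
    body = json.dumps(envelope, sort_keys=True).encode()
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return {"envelope": envelope, "signature": sig}

def is_valid(packet):
    """A compliant client refuses expired or tampered packets."""
    body = json.dumps(packet["envelope"], sort_keys=True).encode()
    good = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(good, packet["signature"])
            and time.time() < packet["envelope"]["expires_at"])

p = make_packet({"zip": "10001", "likes": ["cycling"]},
                rate_per_day=0.05, ttl_days=30, buyer_id="acme-ads")
print(is_valid(p))  # True, until the packet expires
```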

There is much more to this approach to consider and many more challenges to overcome.  I think, though, that this is an idea that could change the internet landscape and make it more equitable and ensure the true value of the internet is realized and shared by all its participants and users.


I admit it. I got a free eBook. I signed up with O'Reilly Media as a reviewer. The terms and conditions of this position were that when I get an eBook, I agree to write a review of it. It doesn't matter if the review is good or bad (so I guess, technically, this is NOT log rolling). I just need to write a review. And if I post the review, I get to choose another eBook to review. And so on. So, here it is. The first in what will likely be an irregular series. My review.

The book under review is "The Basics of Web Hacking", subtitled "Tools and Techniques to Attack the Web", by Josh Pauli. The book was published in June 2013, so it is fairly recent. Alas, recent in calendar time is actually not quite that recent in Internet time – but more on this later.

First, a quick overview. The book provides a survey of hacking tools of the sort that might be used either for the good of mankind (to test and detect security issues in a website and application installation) or for the destruction of man and the furtherance of evil (to identify and exploit security issues in a website and application installation). The book includes a several-page disclaimer advising against the latter behavior, suggesting that the eventual outcomes of such a path may not be pleasant. I would say that the disclaimer section is written thoughtfully, with the expectation that readers will take its warnings seriously.

For the purposes of practice, the book introduces the Damn Vulnerable Web Application (DVWA).  This poorly-designed-on-purpose web application allows you to use available tools and techniques to see exactly how vulnerabilities are detected and exploits deployed. While the book describes utilizing an earlier version of the application, figuring out how to install and use the newer version that is now available is a helpful and none-too-difficult experience as well.

Using DVWA as a test bed, the book walks you through jargon, then techniques, then practical exercises in the world of hacking. It covers scanning, exploitation, vulnerability assessment and attacks suited to each vulnerability, including a decent overview of the vast array of available tools that facilitate these actions. The number of widely available, very well-built applications with easy-to-use interfaces is overwhelming and, quite frankly, quite scary. Additionally, a plethora of web sites provide repositories of information on web sites already known to be vulnerable and how they are vulnerable (in many cases these sites remain vulnerable despite having been notified).

The book covers usage of applications such as Burp Suite, Metasploit, nmap, nessus, nikto and the Social-Engineer Toolkit. Of course, you could simply download these applications and try them out, but the book marches through a variety of useful hands-on experiments that exhibit typical real-life usage scenarios. The book also describes how the various applications can be used in combination with each other, which can make investigation and exploitation easier.

In the final chapter, the book describes design methods and application development rules that can either correct or minimize most vulnerabilities as well as providing a relatively complete list of “for further study” items that includes books, groups, conferences and web sites.

All in all, this book provides a valuable primer and introduction to detecting and correcting vulnerabilities in web applications.  Since the book is not that old, changes to applications are slight enough that figuring out what the changes are and how to do what the book is describing is a great learning experience rather than simply an exercise in frustration. These slight detours actually serve to increase your understanding of the application.

I say 4.5 stars out of 5 (docked half a star because these subject areas tend to get out-of-date too quickly – but if you read it NOW, you are set to grow with the field).

See you at DEFCON!


I've had occasion to be interviewed for positions at a variety of technology companies. Sometimes the position actually exists, other times it might exist, and still other times the folks are just fishing for solutions to their problems and hoping to save a little from their consulting budget. In all cases, the goal of the interview is primarily to find out what you know and how well you know it in a 30 to 45 minute conversation. It is interesting to see how some go about doing it. My experience has been that an interview really tells you nothing but does give you a sense of whether the person is nice enough to "work well with others".

But now, finally, folks at Google have used big data to figure out something that has been patently obvious to anyone who has either interviewed for a job or interviewed someone else for one. An article published in the New York Times details a talk with Mr. Laszlo Bock, senior vice president of people operations at Google. In it, he shared that puzzle questions don't tell you anything about anyone. I maintain that they tell you whether someone has heard that particular puzzle question before. In the published interview Mr. Bock, less charitably, suggests that such questions merely serve to puff up the ego of the interviewer.

I think it's only a matter of time before big data is used again to figure out another obvious fact – that even asking simple or complex programming questions serves as no indicator of on-the-job success. Especially now, in the age of Google and open-source software. Let's say you want to write some code to sort a string of arbitrary letters and determine the computational complexity: a few quick Google searches and presto – you have the solution. You need to understand the question and the nature of the problem, but the solution itself has merely become a matter of copying from your betters and equals who shared their ideas on the Internet. Of course, such questions are always made more useless when the caveat is added – "without using the built-in sort function" – the built-in being, of course, the way you actually solve it in real life.
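To illustrate just how Google-able it is: the "without the built-in" answer is a textbook merge sort, and the complexity answer for both it and Python's built-in sorted() is O(n log n). A quick sketch:

```python
def merge_sort(s):
    """Classic merge sort over a string's letters: O(n log n)."""
    if len(s) <= 1:
        return list(s)
    mid = len(s) // 2
    left, right = merge_sort(s[:mid]), merge_sort(s[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge the sorted halves
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print("".join(merge_sort("interview")))  # eeiinrtvw
print("".join(sorted("interview")))      # how you'd do it in real life
```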

Another issue I see is the concern about experience with a specific programming language. I recall that the good people at Apple are particularly fond of Objective C, to the point where they believe that unless you have had years of direct experience with it, you could never use it to program effectively. Of course, this position is insulting to both any competent programmer and the Objective C language. The variations between these algorithmic control flow languages are sometimes subtle, usually stylistic, but always easily understood. This is true of any programming language. In reality, if you are competent at any one, you should easily be able to master any other. For instance, Python uses indentation but C uses curly braces to delineate code blocks. Certainly there are other differences, but give any competent developer a few days and they can figure it out, leveraging their existing knowledge.

But that still leaves the hard question. How do you determine competency? I don't think you can figure it out in a 45 minute interview – or a 45 hour one, for that matter – if the problems and work conditions are artificial. I think the first interview should be primarily behavioral and focus on fit; then, if that looks good, the hiring entity should pay you to come in and work for a week, solving an actual problem with the team that would be yours. This makes sense in today's world of limited, at-will employment where everyone is really just a contractor waiting to be let go. In this approach, everyone gets to see how you fit in with the team, how productive you can be, how quickly you can come up to speed on a basic issue and how you actually work a problem to a solution in the true environment. This is very different from establishing that you can minimize the number of trips a farmer takes across a river with five foxes, three hens, six bags of lentils, a sewing machine and a trapeze.

I encourage you to share some of your ideas for improving the interview process.


A spate of recent articles describes the proliferation of back doors in systems. There are so many such back doors in so many systems, they claim, that the idea of a completely secure and invulnerable system is, at best, a fallacy. These back doors may be a result of the system software or even designed into the hardware. Some back doors are designed into systems to facilitate remote update, diagnosis, debug and the like – usually never with the intention of being a security hole. Some are inserted with subterfuge and espionage in mind by foreign-controlled entities keen on gaining access to otherwise secure systems. Some may serve both purposes as well. And some are just design or specification errors. This suggests that once you connect a system to a network, someone, somehow, will be able to access it. As if to provide an extreme example, a recent break-in at the United States Chamber of Commerce was traced to an internet-connected thermostat.

That's hardware. What about software? Despite the abundance of anti-virus software and firewalls, a little social engineering is all you really need to get through to any system. I have written previously about the experiment in which USB memory sticks seeded in a parking lot were inserted into corporate laptops, without any prompting, by more than half of the employees who found them. Email written as if sent from a superior is often utilized to get employees to open attached infected applications that install themselves and open a hole in a firewall for external communications and control.

The problem is actually designed in.  The Internet was built for sharing. The sharing was originally limited to trusted sources. A network of academics. The idea that someone would try to do something awful to you – except as some sort of prank – was inconceivable.

That was then.

Now we are in a place where the Internet is omnipresent. It is used for sharing and viewing cat videos and for financial transactions. It is used for the transmission of top secret information and for buying cheese. It is connected to servers containing huge volumes of sensitive and personal customer data: social security numbers, bank account numbers, credit card numbers, addresses, health information, etc. And now, not a day goes by without reports of another breach. Sometimes attributed to Anonymous, the Chinese, organized crime or kids with more time than sense, these break-ins are relentless and everyone is susceptible.

So what to do?

There is a story, perhaps apocryphal, that at the height of the Cold War, when the United States captured a Soviet fighter jet and was examining it, investigators discovered that there were no solid-state electronics in it. The entire jet was designed using vacuum tubes. That set the investigators thinking. Were the Soviets merely backward, or did they design using tubes to guard against EMP attacks?

Backward to the future?

Are we headed to a place where the most secure organizations will go offline? Will they revert to paper documents, file folders and heavy cabinets stored in underground vaults? Of course such systems are not completely secure, as no system actually is. On the other hand, a break-in requires physical presence, and carting away tons of documents requires physical strength and effort. Paper is a material object that cannot be easily spirited away as a stream of electrons. Maybe that's the solution. But what of all the information infrastructure built up for convenience, cost effectiveness, space savings and general efficiency? Do organizations spend more money going back to paper, staples, binders and hanging folders? And then purchase vast secure spaces to stow these materials?

Will there instead be a technological fix: a parallel Internet infrastructure, redesigned from the ground up so that it incorporates authentication, encryption and verifiable sender identification? Then all secure transactions and information could move to that newer, safer Internet. Is that newer, safer Internet just a .secure domain? Won't that just be a bigger, better and more value-laden target for evil-doers? And what about back doors – even in a secure infrastructure, an open door, or even a door with a breakable window, ruins the finest advanced security. And, of course, there is always the social engineering of people, which provides access more easily than any other technique. Or spies. Or people thinking they are "doing good".

The real solution may not yet even be defined or known. Is it quantum computing (which is really just a parallel environment of a differently-developed computing infrastructure)? Or is it really nothing – in that there is no solution and we are stuck with tactical fixes? It's an interesting question, but for now it is as clear as it was some 20 years ago when Scott McNealy said it: "The future of the Internet is security".


Back at the end of March, I attended O'Reilly's Web 2.0 Expo in San Francisco. As usual with the O'Reilly brand of conferences, it was a slick, show-bizzy affair. The plenary sessions were fast-paced with generic techno soundtracks, theatrical lighting and spectacular attempts at buzz-generation. Despite their best efforts, the staging seems to overwhelm the Droopy Dog-like presenters, who tend to be more at home coding in darkened rooms whilst gorging themselves on Red Bull and cookies. Even the audience seemed to prefer the company of their smartphones or iPads to any actual human interaction, with "live tweets" being the preferred method of communication.

In any event, the conference is usually interesting and a few nuggets are typically extracted from the superficial, mostly promotional aspects of the presentations.

What was clear was that every start-up and every business plan was keyed on data collection. Data collection about YOU. The more – the better. The goal was to learn as much about you as possible so as to be able to sell you stuff. Even better – to sell you stuff that was so in tune with your desires that you would be helpless to resist purchasing it.

The trick was – how to get you to cough up that precious data? Some sites just assumed you'd be OK with spending a few days answering questions and volunteering information – apparently just for the sheer joy of it. Others believed that being up-front and admitting that you were going to be sucked into a vortex of unrelenting and irresistible consumption would be reward enough. Still others felt that they ought to offer you some valuable service in return. Most often, oddly enough, this service was based on financial planning and saving for retirement.

The other thing that was interesting (and perhaps obvious) was that data collection is usually pretty easy (at least the basic stuff). Getting details is harder and most folks do expect something in return. And, of course, the hardest part is the data mining to extract the information that would provide the most compelling sales pitch to you.

There are all sorts of ways to build the case around your apparent desires. By finding out where you live or where you are, they can suggest things "like" other things you already have that are nearby. (You sure seem to like Lady Gaga; you know there's a meat dress shoppe around the corner…) By finding out who your friends are and what they like, they can apply peer-pressure-based recommendations (All of your friends are downloading the new Justin Bieber recording. Why aren't you?). And by finding out about your family and demographic information, they can suggest what you need or ought to be needing soon (Your son's 16th birthday is coming up soon; how about a new car for him?).

Of all the sites and ideas, it seems to me that Intuit's Mint is the most interesting. Mint is an on-line financial planning and management site – sort of like Quicken, but online. To "hook" you, their key idea is to offer you the tease of the most valuable analysis with the minimum of initial information. It's almost as if, given just your email and zip code, they'll draw up a basic profile of you and your lifestyle. Give them a bit more and they'll make it better. And so you get sucked in, but you get value for your data. They do claim to keep the data separate from your identity, but they also collect demographically filtered data and likely geographically filtered data.

This really isn’t news. facebook understood this years ago when their ill-fated Beacon campaign was launched. This probably would have been better accepted had it been rolled out more sensitively. But it is ultimately where everyone is stampeding right now.

The most interesting thing is that there is already a huge amount of personal data on the web. It is protected because it’s all in different places and not associated. facebook has all of your friends and acquaintances. Amazon and eBay have a lot about what you like and what you buy. Google has what you’re interested in (and if you have an Android phone – where you go). Apple has a lot about where you go and who you talk to and also through your app selection what you like and are interested in. LinkedIn has your professional associations. And, of course, twitter has when you go to the bathroom and what kind of muffins you eat.

Each of these giants is trying to expand its reservoir of data about you. Other giants are trying to figure out how to get a piece of that action (Yahoo!, Microsoft). And yet others are trying to sell missing bits of information to these players. Credit card companies are making their vast purchasing databases available, specialty retailers are trying to cash in, and cell phone service providers are muscling in as well. They each have a little piece of your puzzle to make analysis more accurate.

The expectation is that there will be acceptance of diminishing privacy, along with some sort of belief that the holders of these vast databases will be benevolent and secure, so as not to require government intervention. Technologically, storage and retrieval will need to be addressed, and newer, faster algorithms for analysis will need to be developed.

Looking for a job…or a powerful patent? I say look here.
