
There is a huge focus on big data nowadays. Driven by ever-decreasing prices and ever-increasing capacity of data storage solutions, big data provides magical insights and new windows into the exploitation of the long tail and into addressing micro markets and their needs. Big data can be used to build, test and validate models and ideas. Big data holds promise akin to a panacea. It is being pushed as a universal solution to all ills. But if you look carefully and analyze correctly, what big data ultimately provides is what Marshall McLuhan described as an accurate prediction of the present. Big data helps us understand how we got to where we are today. It tells us what people want or need or do within a framework as it exists today. It is bounded by today’s (and the past’s) possibilities and ideas.

But big data does not identify the next seismic innovation. It does not necessarily even identify how to modify the current big thing to make it incrementally better.

In the October 2013 issue of IEEE Spectrum, an article described the work of a company named Lex Machina. The company is a classic big data play. They collect, scan and analyze all legal proceedings associated with patent litigation and draw up statistics identifying, for instance, the companies that are more likely to settle, law firms that are more likely to win, judges who are more favorable to defendants or the prosecution, and duration and cost assessments of prosecutions in different areas. So it is a useful tool. But all it does is tell you about the state of things now. It does not measure variables like outcomes of prosecution or settlements (for instance, if a company wins but goes out of business, or wins and goes on to build a more dominant market share, or wins and nothing happens). It does not indicate if companies protect only specific patents that have an estimated future value of, say, $X million, or what metric companies might use in their internal decision-making process, because that is likely not visible in the data.

Marissa Mayer, the hyper-analyzed and hyper-reported-on CEO of Yahoo!, famously tests all decisions based on data. Whether it is the shade of purple for the new Yahoo! logo, the purchase price of the next acquisition or the value of any specific employee – it’s all about measurables.

But how can you measure the immeasurable? If something truly revolutionary is developed, how can big data help you decide if it’s worth it? How can even little data help you? How can people know what they like until they have it? If I told you that I would provide you with a service that lets you broadcast your thoughts to anyone who cares to subscribe to them, you’d probably say, “Sounds stupid. Why would I do that and who would care what I think?” If I then told you that I forgot one important aspect of the idea, that every shared thought is limited to 140 characters, you would likely have said, “Well, now I KNOW it’s stupid!” Alas, I just described Twitter, an idea that turned into a company that is, as of this writing, trading on the NYSE for just over $42 per share with a market capitalization of about $25 billion.

Will a strong reliance on big data lead us incrementally into a big corner? Will all this fishing about in massive data sets for patterns and correlations merely reveal the complete works of Shakespeare in big enough data sets? Is Big Data just another variant of the Infinite Monkey Theorem? Will we get to the point that, with so much data to analyze, we merely prove whatever it is we are looking for?

Already we are seeing that Google Flu Trends looks for instances of the flu and finds them where they aren’t, or at higher frequencies than actually occur. In that manner, big data fails even to accurately predict the present.

It is only now that some of the issues with ‘big data’ are being considered. For instance, even when you have a lot of data, if it is bad or incomplete you still have garbage, only a lot more of it (that is where wearable devices, cell phones and other sophisticated but thinly veiled data accumulation appliances come into play – to help improve the data quality by making it more complete). Then the data itself is only as good as the analysis you can execute on it. The failings of Google Flu Trends are often attributed to bad search terms in the analysis but, of course, there could be many other reasons.

Maybe, in the end, big data is just big hubris. It lulls us into a false sense of security, promising knowledge and wisdom based on getting enough data. But in the end all we learn is where we are right now, and its predictive powers are, at best, based merely on what we want the future to be and, at worst, non-existent.


In the famous Aardman Animations short film “Creature Comforts”, a variety of zoo animals discuss their lives in the zoo. A Brazilian lion speaks at length about the virtue of the great outdoors (compared with a zoo), recalling that in Brazil “We have space”. While space might be a great thing for Brazilian lions, it turns out that space is a dangerous and difficult reality in path names for computer applications.

In a recent contract, one portion of the work involved running an existing Windows application under Cygwin. Cygwin, for the uninitiated, is an emulation of the bash shell and most standard Unix commands. It provides this functionality so you can experience Unix under Windows. The Windows application I was working on had been abandoned for several years and customer pressure finally reached a level at which maintenance and updates were required – nay, demanded. Cygwin support was required primarily for internal infrastructure reasons. The infrastructure was a testing framework – primarily comprising bash shell scripts – that ran successfully on Linux (for other applications). My job was to get the Windows application re-animated and running under the shell scripts on Cygwin.

It turns out that the Windows application had a variety of issues with spaces in path names. Actually, it had one big issue – it just didn’t work when the path names had spaces. The shell scripts had a variety of issues with spaces. Well, one big issue – they, too, just didn’t work when the path names had spaces. And it turns out that some applications and operations in Cygwin have issues with spaces, too. Well, one big issue – they don’t like spaces.

Now by “like”, I mean that when the path name contains spaces then even using ‘\040’ (instead of the space) or quoting the name (e.g., “Documents and Settings”) does not resolve matters and instead merely yields unusual and unhelpful error messages. The behavior was completely unpredictable, as well. For instance, quoting might get you part way through a section of code but then the same quoted name failed when used to call stat. It would then turn out that stat didn’t like spaces in any form (quoted, escaped, whatever…).

Parenthetically, I would note that the space problem is widespread. I was doing some Android work and got an odd and unhelpful error (“invalid command-line parameter”) when trying to run my application on the emulator under Eclipse. It turns out that a space in the path name to the Android SDK was the cause. Once the space was removed, all was well.

The solution to my problem turned out to be manifold. It involved a mixture of quoting, clever use of cygpath and the Windows API calls GetLongPathName and GetShortPathName.

When assigning and passing around variables in shell scripts, whether they hold a literal space-laden path or another variable containing one, the solution was easy. Just remember to use quotes:

THIS="${THAT}"
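To see why the quotes matter, here is a minimal sketch that counts how many arguments a command receives with and without them. The path and the count_args helper are made up for illustration; nothing here comes from the original scripts.

```shell
#!/usr/bin/env bash
# Hypothetical path containing two spaces (illustration only).
THAT="/cygdrive/c/Documents and Settings/me"

# Hypothetical helper: reports how many arguments it was given.
count_args() { echo "$#"; }

# Quoted expansion: the whole path arrives as a single argument.
quoted_count="$(count_args "${THAT}")"

# Unquoted expansion: the shell splits the value at each space.
unquoted_count="$(count_args ${THAT})"

printf 'quoted=%s unquoted=%s\n' "${quoted_count}" "${unquoted_count}"
# prints: quoted=1 unquoted=3
```

Any script that forwards the unquoted form hands the called program three fragments instead of one path, which is one likely source of odd error messages.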

Passing command line options that include path names with spaces tended to be more problematic. The argc/argv parsers don’t like spaces. They don’t like them quoted and don’t like them escaped. Or maybe the parser likes them but the application doesn’t. In any event, the specific workaround that I used was clever manipulation of the path using the cygpath command. The cygpath -w -s command translates a path name to the Windows version (with the drive letter and a colon at the beginning) and then shortens the name to the old-style 8.3 format, thereby removing the spaces. An additional trick is that, if you need the Cygwin-style path without spaces, you take the output of cygpath -w -s and run it through cygpath -u. Then you get a /cygdrive/ style file name with no spaces. I found no more direct way to generate a Cygwin Unix-style file name without spaces.
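The two-step conversion can be wrapped in a small function, sketched below. The name spaceless_unix_path is hypothetical, and the final check is my own addition: it assumes short names may not exist for every path (8.3 name generation can be turned off on a volume), in which case a space could survive the round trip.

```shell
#!/usr/bin/env bash
# spaceless_unix_path is a hypothetical helper name, not part of Cygwin.
# It only defines a function, so loading it does nothing until it is called.
spaceless_unix_path() {
    local short_win unix_path

    # Step 1: Windows form (-w) with 8.3 short names (-s), e.g. C:\PROGRA~1
    short_win="$(cygpath -w -s "$1")" || return 1

    # Step 2: back to the Unix form (-u), e.g. /cygdrive/c/PROGRA~1
    unix_path="$(cygpath -u "${short_win}")" || return 1

    # Assumption: if no short name was available, a space may remain.
    # Report failure instead of handing back a path that still breaks.
    case "${unix_path}" in
        *" "*) return 1 ;;
    esac

    printf '%s\n' "${unix_path}"
}
```

Under Cygwin it could be used as APP_DIR="$(spaceless_unix_path "/cygdrive/c/Program Files/My App")", where the path is again only an example. Note that the assignment itself still uses quotes.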

These manipulations allow you to get the sort of input you need to the various Windows programs you are using. It is important to note, however, that a Windows GUI application built using standard file browser widgets and the like always passes fully instantiated, space-laden path names. The browser widgets can’t even correctly parse 8.3 names. Some of the system routines, however, don’t like spaces. The trick, then, is how to manipulate the names once within the sphere of the Windows application. There are two things to keep in mind: the solutions I propose will not work with Cygwin Unix-style names, and they will not work with relative path names.

Basically, I used the two Windows API calls GetLongPathName and GetShortPathName to manipulate the path. I used GetShortPathName to generate the old-style 8.3 format name that removes all the spaces. This ensured that all system calls worked without a hitch. Then, in order to display messages that the end user would recognize, I made sure that the long paths were restored by calling GetLongPathName for all externally shared information. I need to emphasize that these Windows API calls do not appear to work with relative path names. They return an empty string as a result. So you need to watch out for that.

Any combination of all these approaches (in whole or in part) may be helpful to you in resolving any space issues you encounter.
