<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>A&#38;L Enterprises Tech Line</title>
	<atom:link href="http://anlenterprises.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://anlenterprises.com</link>
	<description>Andrew Explores Technology with you</description>
	<lastBuildDate>Mon, 10 Jun 2013 18:12:44 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<atom:link rel='hub' href='http://anlenterprises.com/?pushpress=hub'/>
		<item>
		<title>Textual Disambiguation &#8211; Final Thoughts (Part 4)</title>
		<link>http://anlenterprises.com/2013/06/10/textual-disambiguation-final-thoughts-part-4/</link>
		<comments>http://anlenterprises.com/2013/06/10/textual-disambiguation-final-thoughts-part-4/#comments</comments>
		<pubDate>Mon, 10 Jun 2013 18:12:44 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1569</guid>
		<description><![CDATA[In my last post I discussed applying techniques of Textual Disambiguation at the document &#8211; vs. the paragraph or sentence level.  Overall I&#8217;ve covered quite a few techniques that Bill Inmon shared &#8211; hopefully with meaningful examples.  Before I summarize my thoughts and compare to other techniques I wanted to share 2 more examples I [...]]]></description>
				<content:encoded><![CDATA[<p>In my last <a title="Textual  Disambiguation – working at the document level (Part 3)" href="http://anlenterprises.com/2013/06/10/textual-disambiguation-working-at-the-document-level-part-3/">post</a> I discussed applying techniques of Textual Disambiguation at the document &#8211; vs. the paragraph or sentence level.  Overall I&#8217;ve covered quite a few techniques that Bill Inmon shared &#8211; hopefully with meaningful examples.  Before I summarize my thoughts and compare to other techniques I wanted to share 2 more examples I found in my presentation notes (which he didn&#8217;t directly cover):</p>
<ol>
<li>Analyzing Customer Feedback in the Airline Environment
<ul>
<li>The example was fragments of text from customers that provided feedback about their experiences with an airline.</li>
<li>One analysis that can be done is the &#8220;tone&#8221; of the customer (&#8220;I think your airline is the <em>best</em> that I have flown on&#8221; vs. &#8220;Plane was late. Messed up my schedule. Airline <em>sucks</em>&#8221;).</li>
<li>You can extract cities (to and from) as well as names of other airlines.</li>
<li>He had examples in other languages &#8211; specifically Spanish and French.  NOTE: The &#8220;stop&#8221; words concept applies to many Latin-based languages.</li>
</ul>
</li>
<li>Automating Extraction of Data from Raw Reports
<ul>
<li>The first step that Bill Inmon recommends is to strip the metadata from the report</li>
<li>This is often labels on the report &#8211; which also exist in a hierarchy that needs to be retained</li>
<li>Based on that he creates a &#8220;Metadata Template&#8221; &#8211; so that the report can be broken down more into a data aspect</li>
<li>Then that &#8220;Metadata Template&#8221; can be applied to the report itself to generate a list of the data for a given piece of metadata (which can repeat).</li>
</ul>
</li>
</ol>
<h3>Summary</h3>
<p>Overall I found the concepts that Bill Inmon presented to be of great value and insight.  He has spent years thinking of this &#8211; as he realized that the next focus would be on the &#8220;unstructured&#8221; data within our enterprises.  I&#8217;m not sure that I agree with his assertion that most vendors aren&#8217;t thinking about this &#8211; but I do believe he has a well thought out process.   The key aspect he continued to communicate was that context is needed in order to perform detailed analysis of free-form text.</p>
<p>His techniques are clearly focused on taking free-form text and resolving it to a traditional relational format.  He does this by adding context to the meaningful elements of the text so that they can fit into that structure. By doing so traditional query logic can be applied and the understanding of the data is much richer (for example: Bob Jones, the pilot, is mentioned 6 times in the reports; or, how many times is an animal mentioned?).  I agree with his assertion that there is a lot of potential business value to be found in this unstructured data &#8211; so focus needs to be spent on it.</p>
<p>He does correctly point out that many of the tools in the Big Data space do not address this challenge &#8211; as they either assume a structure (Hive) or ignore structure (Pig, Map Reduce).  Re-creating structure from that free-form text is a different exercise that requires careful planning and effort.</p>
<h3>Other approaches</h3>
<p>I don&#8217;t agree with his assertion that other vendors are not supplying any solutions to free-form text.  IBM has a whole software solution built around <a href="http://www-01.ibm.com/software/ebusiness/jstart/textanalytics/" target="_blank">text analytics</a>.  They have created a new language called <a href="http://pic.dhe.ibm.com/infocenter/bigins/v1r2/index.jsp?topic=%2Fcom.ibm.swg.im.infosphere.biginsights.doc%2Fdoc%2Fbiginsights_aqlref_con_aql-overview.html" target="_blank">AQL</a> (Annotation Query Language) to drive their text analytics engine.  While I agree that most vendors (Greenplum, Oracle, IBM, etc.) are focused more on infrastructure than software solutions, they are addressing it.  What may confuse the issue is that many Big Data use cases actually use structured/semi-structured data &#8211; not unstructured data.</p>
<p>The other aspect to be considered is that you don&#8217;t necessarily need to deeply analyze and structure free-form text in order to gain valuable insight from it.  A simple case could be sentiment analysis of Tweets in terms of your company.  If I scan through the text of tweets looking for my company name and for each tweet found look for sentiment words (great, terrible, always, never..) that can provide a rough gauge of how a company is doing.  You can do similar things to understand how often a topic is mentioned (even using known taxonomies to increase the quality of that analysis).</p>
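<p>The tweet scan described above can be sketched in a few lines of Python. This is only a rough illustration: the word lists, the sample tweets, and the company name "Acme" are all made up, and treating "always" as positive and "never" as negative is an assumption that will not hold for every tweet.</p>

```python
import re

# Hypothetical word lists; a real analysis would use a larger, vetted lexicon.
POSITIVE = {"great", "love", "best", "always"}
NEGATIVE = {"terrible", "late", "worst", "never"}


def tweet_sentiment(tweets, company):
    """Count positive and negative words in tweets that mention the company."""
    positive = negative = 0
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        if company.lower() not in words:
            continue  # skip tweets that do not mention the company
        positive += sum(1 for w in words if w in POSITIVE)
        negative += sum(1 for w in words if w in NEGATIVE)
    return positive, negative


# Made-up sample tweets for a made-up company.
sample = [
    "Acme support is great, they always answer",
    "Acme was terrible today",
    "Lunch was great",
]
print(tweet_sentiment(sample, "Acme"))  # (2, 1)
```

<p>The third tweet is ignored because it never mentions the company, which is the whole trick: no deep structure, just a filter and a count.</p>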
<p>For geeks like me we often want a near perfect solution &#8211; one that addresses all the edge cases.  Many people in business simply don&#8217;t care &#8211; they focus on the core cases.  This is an area where we can ask ourselves &#8211; is it good enough?  Can I, without the deep structure that Textual Disambiguation provides, get my answers without the same level of effort?  I think in many cases we are learning that we can &#8211; that the rough is good enough for our purposes.</p>
<h3>Conclusion</h3>
<p>I think any company wanting to derive value from &#8220;unstructured data&#8221; should carefully consider Bill Inmon&#8217;s approach.  I think many IT professionals (especially &#8220;Data&#8221; people) need to understand this &#8211; as it is a paradigm shift from what we&#8217;re used to dealing with.   He does identify a hole in our tendencies &#8211; as we often assume that our data will be structured in some way (even when it isn&#8217;t).  He has some very intelligent ways of addressing these needs &#8211; including working software.  Overall I was very glad that I attended this talk as it gave me lots to think about and process through.</p>
<p>Finally I dare you to say &#8220;Textual Disambiguation&#8221; 5 times while rubbing your stomach and patting your head.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/06/10/textual-disambiguation-final-thoughts-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Textual  Disambiguation &#8211; working at the document level (Part 3)</title>
		<link>http://anlenterprises.com/2013/06/10/textual-disambiguation-working-at-the-document-level-part-3/</link>
		<comments>http://anlenterprises.com/2013/06/10/textual-disambiguation-working-at-the-document-level-part-3/#comments</comments>
		<pubDate>Mon, 10 Jun 2013 17:28:02 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1557</guid>
		<description><![CDATA[In our last post we went over in detail techniques to add context back into what is otherwise free-form text.  The goal is to add context to data that otherwise has none.  Another level we can operate at is the document level &#8211; as opposed to just the paragraph or sentence level. There are [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/06/466101_20161383.jpg" rel="lightbox[1557]"><img class="alignright size-thumbnail wp-image-1560" alt="466101_20161383" src="http://anlenterprises.com/wp-content/uploads/2013/06/466101_20161383-150x150.jpg" width="150" height="150" /></a>In our last <a title="Textual Disambiguation – what is it? (Part 2)" href="http://anlenterprises.com/2013/06/08/textual-disambiguation-what-is-it-part-2/">post</a> we went over in detail techniques to add context back into what is otherwise free-form text.  The goal is to add context to data that otherwise has none.  Another level we can operate at is the document level &#8211; as opposed to just the paragraph or sentence level. There are two document types I&#8217;m going to review: contracts and e-mail filtering.</p>
<h3>Contracts</h3>
<p>Below I have an example of the template of a mortgage &#8211; a very common contract.  A completed contract would have the &#8220;&#8230;.&#8221; filled in with actual data &#8211; such that it blends with the text around it.  Obviously if you knew ahead of time the format of the contract (such as the template below) you could parse out the terms using the words around them. For example, the text between &#8220;this&#8221; and &#8220;day&#8221; would hold the day.  Ideally we would store the text of the contract along with the details that are entered in &#8211; for easy usage.</p>
<p>But let&#8217;s take an example of a large volume of varying contracts &#8211; such that we can&#8217;t rely on a specific structure (even for the same lender/agent).  Therefore we need to apply some techniques to break down the text into meaningful data.  One of the first steps we can take is to remove the &#8220;Stop&#8221; Words from the documents. For example:</p>
<pre><del>This</del> mortgage <del>is</del> made <del>this</del> 14th day <del>of</del> October 1998, between <del>the</del> Mortgagor, John William Smith (herein known as the "Borrower"), and <del>the</del> Mortgagee, Countrywide Financial <del>a</del> corporation organized and existing under <del>the</del> laws <del>of</del> California, <del>whose</del> address <del>is</del> 128 W. Absalom Rd, San Diego, California (herein "Lender")</pre>
<p>The next thing we need to do is to begin to find &#8220;delimiters&#8221; of the different terms in the document.  If we had a CSV file we would use the commas to know when a field begins and ends. Similarly we need to identify beginning and ending words that indicate where to find an item of meaning.  If we look at the example above we could use &#8220;between Mortgagor,&#8221; and &#8220;(herein known&#8221; as the delimiters that surround the name of the Borrower.  I suspect this may be an iterative process as the documents can vary significantly (so you may either have many different algorithms or very smart ones).</p>
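<p>To make the delimiter idea concrete, here is a minimal Python sketch. It is my own illustration, not Bill Inmon&#8217;s software: the delimiter pairs, the field names, and the <code>extract_fields</code> helper are all assumptions, and it only works on text shaped like the stop-word-stripped example above.</p>

```python
import re

# Hypothetical delimiter pairs: (field name, text before the value, text after it).
DELIMITERS = [
    ("borrower", "Mortgagor,", "(herein"),
    ("lender", "Mortgagee,", "corporation"),
]


def extract_fields(text, delimiters=DELIMITERS):
    """Pull out the text found between each pair of delimiter phrases."""
    fields = {}
    for name, start, end in delimiters:
        # Lazy match so we stop at the first ending delimiter after the start.
        pattern = re.escape(start) + r"\s*(.*?)\s*" + re.escape(end)
        match = re.search(pattern, text, flags=re.DOTALL)
        if match:
            fields[name] = match.group(1).strip(" ,")
    return fields


# The contract text after stop words have been removed.
contract = (
    "mortgage made 14th day October 1998, between Mortgagor, John William Smith "
    '(herein known as the "Borrower"), and Mortgagee, Countrywide Financial '
    "corporation organized and existing under laws California"
)
print(extract_fields(contract))
# {'borrower': 'John William Smith', 'lender': 'Countrywide Financial'}
```

<p>A contract worded differently would simply return fewer fields, which is where the iterative tuning of delimiters comes in.</p>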
<h4>Mortgage Template Example</h4>
<pre>THIS MORTGAGE is made this ........................ day of .............................
19 ....., between the Mortgagor, ................................................................ (herein
"Borrower"), and the Mortgagee, ...................................................... a corporation
organized and existing under the laws of ......................................
......................................................................................................, whose address is
..................................................................................
.......................................................................................................(herein "Lender").

WHEREAS, Borrower is indebted to Lender in the principal sum of ...................
Dollars, which indebtedness is evidenced by
Borrower's note dated ..................... (herein
"Note"), providing for monthly installments of principal and interest, with the balance of
the indebtedness, if not sooner paid, due and payable on .......................

TO SECURE to Lender (a) the repayment of the indebtedness evidenced by the Note,
with interest thereon, the payment of all other sums, with interest thereon, advanced in
accordance herewith to protect the security of this Mortgage, and the performance of the
covenants and agreements of Borrower herein contained, and (b) the repayment of any
future advances, with interest thereon, made to Borrower by Lender pursuant to
paragraph 21 hereof (herein "Future Advances"), Borrower does hereby mortgage, grant
and convey to Lender, with power of sale, the following described property located in the
County of .................. ......................................................................... ......... , State of
Massachusetts:

which has the address of .........................................................................
[Street] [City]
.................................................................... (herein "Property Address");
[State and Zip Code]</pre>
<h3>E-mail Filtering</h3>
<p>E-mail is a good example where a filtering process may need to be applied first, before a more advanced Textual Disambiguation (still a hard word to type) process is applied. This is due to the large volume of e-mails &#8211; many of which don&#8217;t need to be looked at.  Bill Inmon described 3 categories of e-mail:</p>
<ol>
<li>SPAM &#8211; this is content generated outside the organization with little to no value (extraneous information).</li>
<li>Blather &#8211; this is internally generated content (from your users) with little to no business value. Examples would be jokes, personal e-mails, broadcast e-mails, etc.</li>
<li>Business Meaningful &#8211; the content you actually want to look at.</li>
</ol>
<p>One example scenario he had was to apply analysis to current e-mails to detect and prevent potential new liabilities for a company (so they don&#8217;t end up like Enron).  He quoted that in 2012 approximately $65 billion was spent defending lawsuits.  For this to be effective the &#8220;SPAM&#8221; and &#8220;Blather&#8221; need to be filtered out so that time and energy are spent on e-mails with potential meaning.</p>
<p>The principal process he applied to this problem was building a &#8220;relevance&#8221; taxonomy.  Basically a list of words (using known taxonomies as a guide) used to filter out e-mails that aren&#8217;t of interest (i.e. if they don&#8217;t have one of the words they are filtered out).</p>
<p>The next principle is to use &#8220;words of concern&#8221; to identify more significant e-mails.  These are words chosen by people: the words a human reviewer would want to see when looking through e-mail.    Some examples are:</p>
<ul>
<li>apologize</li>
<li>attorney</li>
<li>risk</li>
<li>ashamed</li>
<li>scandal</li>
</ul>
<p>Additionally the relevance of an e-mail can be further filtered by looking at how close these words of concern are to each other (proximity).  Typically a &#8220;proximity boundary&#8221; is selected (in bytes) to determine how close words need to be to increase the relevance of an e-mail.   Once the list of e-mails is generated there are some other techniques to narrow down the e-mails that should be looked at by a person:</p>
<ul>
<li>Use header information &#8211; such as From and To, Dates, etc.</li>
<li>Use the # of words of concern found (i.e. more words means more likely needs to be looked at)</li>
<li>&#8220;Hot&#8221; words &#8211; words that attract the attention of a human when scanning e-mails.  Use these words to increase the relevance of those e-mails</li>
</ul>
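<p>Putting the relevance taxonomy, the words of concern, and the proximity boundary together, a rough Python sketch might look like the following. The vocabularies, the boundary of 60, and the scoring rule are all assumptions of mine for illustration; the boundary here is measured in characters rather than bytes to keep the code simple.</p>

```python
import re

# Hypothetical vocabularies for illustration only.
RELEVANCE_TERMS = {"contract", "invoice", "audit", "lawsuit"}
WORDS_OF_CONCERN = {"apologize", "attorney", "risk", "ashamed", "scandal"}
PROXIMITY_BOUNDARY = 60  # assumed value, in characters


def score_email(body):
    """Return None if the e-mail is filtered out, else a rough concern score."""
    lowered = body.lower()
    tokens = [(m.group(), m.start()) for m in re.finditer(r"[a-z]+", lowered)]
    if not any(word in RELEVANCE_TERMS for word, _ in tokens):
        return None  # no relevance term at all: treat as spam or blather
    positions = [pos for word, pos in tokens if word in WORDS_OF_CONCERN]
    score = len(positions)  # one point per word of concern
    # One extra point for each neighbouring pair inside the proximity boundary.
    for first, second in zip(positions, positions[1:]):
        if PROXIMITY_BOUNDARY >= second - first:
            score += 1
    return score


print(score_email("Team lunch is at noon"))                  # None
print(score_email("The invoice is attached"))                # 0
print(score_email("Our attorney says the audit is a risk"))  # 3
```

<p>Header information and &#8220;hot&#8221; words could be layered on top as additional score adjustments, and the lists themselves would be tuned as reviewers give feedback.</p>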
<p>In reality this would likely be a very iterative process as the users provide feedback on why an e-mail is more or less relevant to their concerns.  Additionally changes in the business and/or legal environment could shift what needs to be looked for.</p>
<p>This post was about applying techniques to either filter out documents or break them apart in a systematic way.  In my next post I will summarize what I&#8217;ve learned and contrast with other techniques in the industry.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/06/10/textual-disambiguation-working-at-the-document-level-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Textual Disambiguation &#8211; Just a fancy word? (Part 1)</title>
		<link>http://anlenterprises.com/2013/06/10/textual-disambiguation-just-a-fancy-word-part-1/</link>
		<comments>http://anlenterprises.com/2013/06/10/textual-disambiguation-just-a-fancy-word-part-1/#comments</comments>
		<pubDate>Mon, 10 Jun 2013 14:03:07 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1535</guid>
		<description><![CDATA[Recently I had the privilege of attending a presentation by Bill Inmon at the local KC DAMA Day. The focus of his presentation was on gaining business data from unstructured data &#8211; which lies at the heart of much of the &#8220;Big Data&#8221; craze. A common quote in the Big Data realm is that 80% [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/06/1418158_30307462.jpg" rel="lightbox[1535]"><img class="alignright  wp-image-1536" alt="1418158_30307462" src="http://anlenterprises.com/wp-content/uploads/2013/06/1418158_30307462-300x212.jpg" width="210" height="148" /></a>Recently I had the privilege of attending a presentation by <a href="http://en.wikipedia.org/wiki/Bill_Inmon" target="_blank">Bill Inmon</a> at the local <a href="http://www.dama.org/i4a/pages/index.cfm?pageid=3764" target="_blank">KC DAMA Day</a>. The focus of his presentation was on gaining business data from unstructured data &#8211; which lies at the heart of much of the &#8220;Big Data&#8221; craze. A common quote in the Big Data realm is that 80% of the world&#8217;s data is unstructured.</p>
<p>That word &#8211; <a href="http://en.wikipedia.org/wiki/Unstructured_data" target="_blank">unstructured</a> &#8211; is one that bothers me &#8211; as much of the data we&#8217;re talking about does have a structure. Often what we&#8217;re describing with the term &#8220;unstructured&#8221; is the free-form text within that data. Tweets, Facebook messages, e-mails, texts, etc. &#8211; do have a structure. At a minimum they have some metadata wrapped around them (such as e-mail headers). In other cases, such as a Twitter message, there is a lot of potential data beyond just the &#8220;Tweet&#8221; itself. If you look at the <a href="https://gist.github.com/gnip/764239" target="_blank">format</a> of a Twitter message there is information about the user, a time-stamp, reply-to information, and possibly location information.</p>
<p>I think often what we really are saying with the term &#8220;unstructured&#8221; is the free-form text within these data sources. It&#8217;s the text of a document, the body of the e-mail, the &#8220;tweet&#8221; itself, notes and comments. We can&#8217;t use traditional techniques on this data for a variety of reasons &#8211; the most prominent being that most tools rely on a structure.</p>
<p>One of the main points Bill Inmon made was that when processing this &#8220;text&#8221; data we lack context. If you think about how we often process data we have some form of context:</p>
<ul>
<li><span style="line-height: 13px;">A relational database relies on a defined schema &#8211; so you know the column name and its characteristics</span></li>
<li>XML data is self describing (it carries its own context) [&lt;myfield&gt;abc&lt;/myfield&gt;]</li>
<li>Flat files are often either delimited or are position based &#8211; which implies a context by an external definition</li>
<li>JSON Data &#8211; like the tweet &#8211; is also self-describing ["coordinates": null,"created_at": "Thu Oct 21 16:02:46 +0000 2010",]</li>
</ul>
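<p>As a small illustration of where the context comes from, the Python sketch below reads the tweet fragment as JSON (the field names travel with the data) and a delimited line (the field names come from an external definition). The column names and the sample row are made up for this example.</p>

```python
import csv
import io
import json

# Self-describing: the keys arrive together with the values.
tweet = json.loads(
    '{"coordinates": null, "created_at": "Thu Oct 21 16:02:46 +0000 2010"}'
)
print(tweet["created_at"])

# Delimited flat file: the data alone carries no names, so the context
# comes from a definition kept outside the file (hypothetical columns here).
COLUMNS = ["name", "city", "amount"]
row = next(csv.reader(io.StringIO("Bob Jones,San Antonio,42")))
record = dict(zip(COLUMNS, row))
print(record["city"])
```

<p>Free-form text offers neither kind of help, which is exactly the gap the rest of this series is about.</p>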
<p>Context is vitally important to drawing useful conclusions &#8211; as language is often imprecise. Much of what Google does today is to try to guess what your intent is &#8211; using the context of your previous behavior (try searching for something in <a href="https://support.google.com/chrome/answer/95464?hl=en" target="_blank">Incognito mode</a> in Google Chrome vs normally searching). Amazon, Facebook, Netflix, etc. depend on delivering a solution based on the context of your past behavior in conjunction with other users&#8217; behavior.</p>
<p>Let&#8217;s take an example right from Bill [he has a website - <a href="http://forestrimtech.com/">http://forestrimtech.com/</a>- where you can learn more - including downloading white papers]: &#8220;She&#8217;s hot&#8221;. OK &#8211; so what does this mean? Does it mean she is very attractive? Does it mean she is running a fever? Does it mean she&#8217;s sweating heavily due to the heat?</p>
<p>So how could we know what &#8220;She&#8217;s hot&#8221; actually means? What if this was mentioned in San Antonio during the heat of the summer? What if it was said by a young male around other young males? What if it was spoken by a concerned parent at a doctor&#8217;s office?</p>
<p>What we are doing is adding a context around &#8220;She&#8217;s hot&#8221; in order to understand what it means. Apart from that context &#8220;She&#8217;s hot&#8221; doesn&#8217;t have clear meaning and could lead to wrong conclusions easily. [I personally remember an example years ago on a bus, probably in Junior High, where a young girl said "I'm hot". Shortly after she said it she clarified that it was temperature hot - as many people were thinking she was complimenting herself.]</p>
<p>So in one sense we have this jumbled mess of text &#8211; without any organization. But in another sense there is structure to derive from this text &#8211; which provides us the missing context. Documents have some type of structure (especially legal documents like contracts) &#8211; even if it wasn&#8217;t planned that way. Sentences have structure &#8211; anyone remember <a href="http://grammar.ccc.commnet.edu/grammar/diagrams2/one_pager1.htm">diagramming</a> a sentence?</p>
<h3>Why do we care?</h3>
<p>One simple reason we should care is that roughly 80% of corporate data is unstructured &#8211; vs. 20% that is structured.   Yet only 1-2% of corporate decisions are based on that unstructured 80% (vs. 98% on the 20% of structured data).   So this is potentially an untapped reservoir of information to make better decisions.   There are business opportunities &#8211; both internal and external &#8211; lurking in this untapped data resource.  We will have to learn and implement new techniques to utilize this data &#8211; but the clear trend is in that direction.</p>
<p>In my next post we will look in more detail at Bill Inmon&#8217;s Textual Disambiguation concept itself (which he has commercialized as <a href="http://forestrimtech.com/index.php?option=com_content&amp;view=article&amp;id=80&amp;Itemid=482">Textual ETL</a>).  In a later post we&#8217;ll look at applying this at a document level &#8211; instead of just at a sentence/paragraph level.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/06/10/textual-disambiguation-just-a-fancy-word-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Textual Disambiguation &#8211; what is it? (Part 2)</title>
		<link>http://anlenterprises.com/2013/06/08/textual-disambiguation-what-is-it-part-2/</link>
		<comments>http://anlenterprises.com/2013/06/08/textual-disambiguation-what-is-it-part-2/#comments</comments>
		<pubDate>Sat, 08 Jun 2013 19:49:28 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1548</guid>
		<description><![CDATA[In my previous post I introduced the concept of applying context to what is unstructured text. Often when we talk about Big Data and/or unstructured data we are really talking about the free-form text within it. As I previously mentioned I attended a seminar where Bill Inmon taught about &#8220;Textual Disambiguation&#8221; (quite a mouthful). His [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/06/1418158_30307462.jpg" rel="lightbox[1548]"><img class="alignright size-medium wp-image-1536" alt="1418158_30307462" src="http://anlenterprises.com/wp-content/uploads/2013/06/1418158_30307462-300x212.jpg" width="300" height="212" /></a>In my previous <a title="Textual Disambiguation – Just a fancy word? (Part 1)" href="http://anlenterprises.com/2013/06/10/textual-disambiguation-just-a-fancy-word-part-1/">post</a> I introduced the concept of applying context to what is unstructured text. Often when we talk about Big Data and/or unstructured data we are really talking about the free-form text within it.</p>
<p>As I previously mentioned I attended a seminar where Bill Inmon taught about &#8220;Textual Disambiguation&#8221; (quite a mouthful). His concept is that we need to apply context to that text in order to analyze it effectively. He has some specific principles on how to draw that context out &#8211; which he believes are still unique within the industry (he believes that many vendors are ignoring this issue).</p>
<p>So the heart of his process is to work through text in order to discover context &#8211; and therefore derive meaning. His company&#8217;s software (which he didn&#8217;t dwell on in his talk as it was NOT a sales presentation) will start with free-form text and end up with a structured analysis of the words. He was very clear that this is different than NLP (Natural Language Processing), many text mining techniques, Map Reduce, Hive, Pig, etc.</p>
<p>I don&#8217;t have a copy of his presentation of this &#8211; but I will share what I learned using my own examples. Here is the list of techniques (probably not a complete list) to add context back into unstructured text:</p>
<ul>
<li>Remove Stop Words</li>
<li>Correct Misspellings</li>
<li>Word Stemming</li>
<li>Standardize Dates</li>
<li>Document metadata</li>
<li>Taxonomy / Ontology</li>
<li>Proximity Analysis</li>
<li>Number Patterns (SSN, Phone Number)</li>
<li>Numeric Value Tagging</li>
<li>Date Naming</li>
<li>Acronyms</li>
</ul>
<p>To make things interesting I&#8217;m going to use some Enron e-mail fragments to demonstrate some of these techniques.  You can download the whole set at <a href="https://www.cs.cmu.edu/~enron/">https://www.cs.cmu.edu/~enron/</a></p>
<h3>Stop Words</h3>
<ul>
<li><span style="line-height: 13px;"><a href="http://en.wikipedia.org/wiki/Stop_words" target="_blank">Stop Words</a> are basically the words that connect the words of meaning.  These are typically words like &#8220;the&#8221;, &#8220;is&#8221;, &#8220;at&#8221;, &#8220;which&#8221;, etc.  </span></li>
<li><span style="line-height: 13px;">While they are part of the language they really don&#8217;t have any value outside of the sentence.</span></li>
<li>Below is a paragraph from an Enron e-mail where I <del>struck</del> through the stop words based on this <a href="http://www.ranks.nl/resources/stopwords.html" target="_blank">list</a> I found.</li>
<li>This technique is also used for Search Engine Indexing &#8211; as these words wouldn&#8217;t help you locate the page you want.</li>
</ul>
<pre><span style="line-height: 13px;">Mr Lay - <del>If</del> you really think <del>that</del> <del>this</del> sale creates <del>a</del> "great opportunity" <del>for</del> shareholders <del>then</del> you <del>are</del> more <del>out</del> <del>of</del> touch <del>with</del> reality <del>than</del> <del>I</del> previously thought (unless you <del>were</del> referring <del>to</del> Dynegy shareholders). Under <del>your</del> "leadership" <del>the</del> shareholders <del>have</del> <del>been</del> devastated, employees <del>have</del> lost <del>their</del> retirements, college funds <del>have</del> <del>been</del> desiminated <del>and</del> reputations <del>have</del> <del>been</del> ruined, including <del>your</del> own. </span><span style="line-height: 13px;"> 
</span></pre>
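<p>Stop-word removal is easy to sketch in Python. The set below is a tiny hand-picked subset chosen to match the example above, not the full list linked earlier, and the <code>remove_stop_words</code> helper is just my own illustration.</p>

```python
import re

# A small illustrative stop-word set; real lists are much longer.
STOP_WORDS = {
    "a", "are", "for", "i", "if", "of", "out", "than",
    "that", "the", "then", "this", "to", "with", "your",
}


def remove_stop_words(text):
    """Split text into words and drop the ones in the stop-word set."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


kept = remove_stop_words(
    "If you really think that this sale creates a great opportunity"
)
print(kept)
# ['you', 'really', 'think', 'sale', 'creates', 'great', 'opportunity']
```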
<h3>Correct Misspellings</h3>
<ul>
<li><span style="line-height: 13px;">A simple technique is to correct misspellings of words in the text</span></li>
<li>By doing so we can then match words together that otherwise we couldn&#8217;t (how many times is x mentioned?)</li>
<li>I suspect it takes some sophisticated logic to do this correctly &#8211; but I did find this <a href="http://oxforddictionaries.com/words/common-misspellings" target="_blank">list</a> of common misspellings from Oxford Dictionaries</li>
<li>I took a snippet of an e-mail and deliberately misspelled some words as an example</li>
</ul>
<pre>Jeff,</pre>
<pre>We spent quite a bit of time over the past several months discussing a possible minority <strong>investiment</strong> of about $5MM in Silicon Energy. We have broken off those discussions because (1) their proposed pre-money valuation of $150MM was, in our opinion, <strong>excesive</strong>, and (2) our people at EES, who would be the primary users of Silicon Energy, were not happy with Silicon Energy's <strong>functionalty</strong>.</pre>
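<p>One simple way to approximate this (not necessarily how a production tool does it) is to compare each word against a vocabulary and take the closest match.  The sketch below uses Python&#8217;s standard difflib module; the tiny vocabulary and the 0.8 similarity cutoff are assumptions for the example:</p>

```python
import difflib

# A tiny vocabulary assumed for this example; a real one would be a full dictionary.
VOCABULARY = ["investment", "excessive", "functionality", "valuation", "minority"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


for misspelled in ["investiment", "excesive", "functionalty"]:
    print(misspelled, "=>", correct_word(misspelled))
```

<p>Words that are not close to anything in the vocabulary (names, for example) are returned unchanged, so the cutoff controls how aggressive the correction is.</p>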
<h3>Word Stemming</h3>
<ul>
<li>Word Stemming is a concept where we take a word and reduce it to its root.</li>
<li>For example, &#8220;move&#8221;, &#8220;mover&#8221; and &#8220;moving&#8221; all share the stem &#8220;mov&#8221;</li>
<li>I found this <a href="http://xapian.org/docs/stemming.html" target="_blank">page</a> that suggests that there are algorithms that can be used to accomplish this.</li>
</ul>
<pre>Maureen Smith and Ruth Concannon have raised some issues regarding how Brooklyn Union Gas is in the <strong>books</strong>, specifically, the final contract year (11/1/03 - 10/31/04) is not in the <strong>books</strong> at all (which will produce a gain when <strong>booked</strong>), and the current deal structure is telescoped incorrectly at Transco Zones 1,3, &amp; 4 when it should be telescoped 17% at Zone 1, 25% at Zone 2, and 58% at Zone 3.  This will produce a slight loss when rebooked, but the gain from <strong>booking</strong> the final year is more than enough to offset the loss.</pre>
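<p>To show the idea, here is a deliberately naive suffix-stripping sketch.  It is not the Porter or Snowball algorithm described on the page linked above; the suffix list and the minimum stem length of 3 are my own assumptions and will get plenty of real words wrong:</p>

```python
# Suffixes are checked in this order; the list is an assumption for illustration.
SUFFIXES = ("ing", "ed", "er", "s", "e")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for word in ["move", "mover", "moving", "books", "booked", "booking"]:
    print(word, "=>", naive_stem(word))
```

<p>With this, &#8220;books&#8221;, &#8220;booked&#8221; and &#8220;booking&#8221; from the e-mail above all reduce to &#8220;book&#8221; and can be counted together.</p>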
<h3>Standardize Dates</h3>
<ul>
<li><span style="line-height: 13px;">Dates come in many formats (05/29/2013, 29/05/2013, May 29, 2013&#8230;)</span></li>
<li>For ease of comparison it is best to standardize them</li>
<li>Below are some sentence fragments out of e-mails with different formats</li>
</ul>
<pre>Date: Thursday, <strong>January 24, 2002</strong></pre>
<pre>at the time of their next board meeting on <strong>February 12, 2001</strong>.</pre>
<pre>Attached is a revised Credit Watch listing as of <strong>4/09/01</strong>. Please note that there are 12 counterparty additions/revisions to the Watchlist for this week.</pre>
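<p>A rough sketch of standardizing the formats above to ISO style (YYYY-MM-DD) with Python&#8217;s datetime module.  The list of formats is an assumption, and so is reading ambiguous numeric dates month-first (US style); month names are parsed using the default English locale:</p>

```python
from datetime import datetime

# Formats are tried in order. Month-first for numeric dates is an assumption.
KNOWN_FORMATS = ["%B %d, %Y", "%m/%d/%Y", "%m/%d/%y"]


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


for raw in ["January 24, 2002", "February 12, 2001", "4/09/01"]:
    print(raw, "=>", standardize_date(raw))
```

<p>Returning None for anything unrecognized (such as a day-first 29/05/2013) is a choice I made so that bad guesses don&#8217;t quietly end up in the data.</p>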
<h3>Document Metadata</h3>
<ul>
<li><span style="line-height: 13px;">Many documents have metadata associated with them &#8211; including word documents, e-mails, tweets, etc.</span></li>
<li>This includes dates, descriptive properties of the document and, in the case of e-mail, the from/to addresses</li>
<li>Here is an example of an e-mail header:</li>
</ul>
<pre><strong>Date: Tue, 16 Oct 2001 14:41:10 -0700 (PDT)</strong>
<strong>From: administration.enron@enron.com</strong>
<strong>To: owa.notification@enron.com</strong>
<strong>Subject: Outlook Web Access for Calgary</strong>
Mime-Version: 1.0
<strong>Content-Type: text/plain</strong>; charset=us-ascii
Content-Transfer-Encoding: 7bit
<strong>X-From: Enron Messaging Administration &lt;/O=ENRON/OU=NA/CN=RECIPIENTS/CN=NOTESADDR/CN=ENRON MESSAGING ADMINISTRATION&gt;</strong>
<strong>X-To: OWA.Notification@enron.com</strong>
X-cc: 
X-bcc:</pre>
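<p>For e-mail, the headers can be pulled out with Python&#8217;s standard email package.  This is a minimal sketch using a shortened copy of the header above; the &#8220;metadata&#8221; dictionary and its key names are just my own choice for the example:</p>

```python
from email import message_from_string

raw = """Date: Tue, 16 Oct 2001 14:41:10 -0700 (PDT)
From: administration.enron@enron.com
To: owa.notification@enron.com
Subject: Outlook Web Access for Calgary
Content-Type: text/plain; charset=us-ascii

(body omitted)
"""

msg = message_from_string(raw)

# Collect the header fields we care about into a plain dictionary.
metadata = {
    "date": msg["Date"],
    "from": msg["From"],
    "to": msg["To"],
    "subject": msg["Subject"],
    "content_type": msg.get_content_type(),
}
print(metadata)
```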
<h3>Taxonomy / Ontology</h3>
<ul>
<li><span style="line-height: 15.203125px;">A method to bring meaning between words is to build a taxonomy or ontology</span></li>
<li>This is typically a &#8220;type of&#8221; or &#8220;kind of&#8221; relationship.   For example, dogs and cats are types of animals</li>
<li>Therefore these words can be grouped together (even though they are in different parts of a document)</li>
<li>If you look at the e-mail snippet below, the words &#8220;jet skis&#8221;, &#8220;boats&#8221;, and &#8220;catamaran&#8221; are all types of boats, which are a type of vehicle</li>
</ul>
<pre>This is the info.-  Kim Hillis is making the reservations (I hope there are rooms available this late)    A couple guys here have stayed there and said it's awesome.  With a great beach, close to town and golf. There's <em><span style="color: #800000;">jet skis</span></em> and <em>boats</em> right there.  I'm up for chartering a a <em>catamaran</em> for snorkeling and cruisin' all day.</pre>
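<p>A taxonomy can be sketched as a simple child-to-parent lookup.  The entries below are made up from the examples above, and the function names are my own:</p>

```python
# Each term points to its parent ("is a type of"). Entries are illustrative only.
PARENT = {
    "jet ski": "boat",
    "catamaran": "boat",
    "boat": "vehicle",
    "dog": "animal",
    "cat": "animal",
}


def ancestors(term):
    """Walk up the taxonomy and return every broader term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def shared_categories(first, second):
    """Return the categories two terms have in common (empty set if none)."""
    first_set = {first, *ancestors(first)}
    second_set = {second, *ancestors(second)}
    return first_set.intersection(second_set)


print(ancestors("jet ski"))                       # ['boat', 'vehicle']
print(shared_categories("jet ski", "catamaran"))
```

<p>Two words that share a category can then be grouped together even if they never appear near each other in the document.</p>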
<h3>Proximity Analysis</h3>
<ul>
<li><span style="line-height: 13px;">Another technique is to look at how close in proximity words are (i.e. are they within a few words or paragraphs down)</span></li>
<li>This is a known <a href="http://en.wikipedia.org/wiki/Proximity_search_(text)" target="_blank">process</a> &#8211; used in search engines to some degree</li>
<li>There is no guarantee this will add any value &#8211; but it could be useful in some cases</li>
<li>Below is a fairly weak example, as I struggled to find a good one (I don&#8217;t have good notes on Bill Inmon&#8217;s example).  Because &#8220;snow&#8221; appears near &#8220;forecasts&#8221; we can connect the two and infer that the forecasts are about snow:</li>
</ul>
<pre> Cooper-- I have been looking at the chassis for the model we need to develop. it looks like we can expand (piggy back) the model AE uses for the <strong>Snow &amp; water</strong> supply <strong>forecasts</strong>. They only have reports for March thru to August i.e. the period of the bulk of the runoff from snow and rain. We would have to add the remainder of the year as well as the power plants. Will fax ou a sketch of my proposed basic layout.</pre>
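<p>A simple way to test proximity is to count how many words apart two terms are.  This is only a sketch; the function name and the default window of 5 words are assumptions on my part:</p>

```python
import re


def within_distance(text, first, second, max_words=5):
    """True if `first` and `second` occur within `max_words` words of each other."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(max_words >= abs(i - j)
               for i in first_positions
               for j in second_positions)


sample = "the model AE uses for the Snow & water supply forecasts"
print(within_distance(sample, "snow", "forecasts"))               # True (3 words apart)
print(within_distance(sample, "snow", "forecasts", max_words=2))  # False
```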
<h3>Number Patterns (SSN, Phone Number)</h3>
<ul>
<li>There are many common number patterns that can be identified &#8211; given how we typically format them.</li>
<li>Phone Number and SSN are probably some of the most common examples &#8211; but each use case may have its own format(s)</li>
</ul>
<pre>Pursuant to your e-mail to Jeff Skilling dated February 20, 2001, Mr. Skilling suggests that you contact Mr. Jim Fallon regarding CAIS Internet Inc. Mr. Fallon is managing director of trading at Enron Broadband Services. He can be reached at <strong>713.853.3354</strong>.</pre>
<pre>EEI member utilities wishing to have access should contact Lynn Hailes at: lhailes@eei.org or <strong>202/508-5624</strong>.</pre>
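<p>Patterns like these are a natural fit for regular expressions.  The two patterns below are assumptions that cover the US formats seen in these examples (713.853.3354, 202/508-5624, 703-993-3100) and the usual SSN layout; other formats would need their own patterns.  The SSN in the usage line is made up:</p>

```python
import re

# Three digits, a separator, three digits, a separator, four digits.
PHONE_PATTERN = re.compile(r"\b(\d{3})[./-](\d{3})[.-](\d{4})\b")
# The common 3-2-4 layout of a US Social Security Number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_phone_numbers(text):
    """Return the phone numbers found in `text`, normalized to 123-456-7890."""
    return ["-".join(match.groups()) for match in PHONE_PATTERN.finditer(text)]


print(find_phone_numbers("He can be reached at 713.853.3354."))
print(find_phone_numbers("lhailes@eei.org or 202/508-5624."))
print(SSN_PATTERN.findall("SSN on file: 123-45-6789"))
```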
<h3>Numeric Value Tagging</h3>
<ul>
<li><span style="line-height: 13px;">Numeric values in a document without context don&#8217;t provide a lot of value</span></li>
<li>Often times though there is text adjacent to the number that indicates what type of number it is.</li>
</ul>
<pre>3. Book administrator rolls showing the following new deal P&amp;L for the following new deals rolling up to the Executive DPR:</pre>
<pre><em>Deal</em> <strong>#559092.1</strong> on 3/23/01
<em>Deal</em> <strong>#514509.1</strong> on 2/6/01
<em>Deal</em> <strong>#568025.1</strong> on 4/2/01</pre>
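<p>In the simplest case the tag is just the word right before the number.  Here is a sketch that assumes the &#8220;Label #number&#8221; layout used in the e-mail above (other layouts would need other patterns):</p>

```python
import re

# A word, whitespace, then "#" followed by a number with an optional decimal part.
TAGGED_NUMBER = re.compile(r"([A-Za-z]+)\s+#(\d+(?:\.\d+)?)")


def tag_numbers(text):
    """Return (label, number) pairs for every 'Label #number' found in the text."""
    return TAGGED_NUMBER.findall(text)


lines = """Deal #559092.1 on 3/23/01
Deal #514509.1 on 2/6/01
Deal #568025.1 on 4/2/01"""
print(tag_numbers(lines))
# [('Deal', '559092.1'), ('Deal', '514509.1'), ('Deal', '568025.1')]
```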
<h3>Date Naming</h3>
<ul>
<li><span style="line-height: 13px;">While we can standardize dates easily, without any context they still don&#8217;t mean anything.</span></li>
<li>We want to look for nearby words to provide that context &#8211; the name &#8211; of a date</li>
<li>Below we have a date of February 12, 2001.  There are two nearby sets of words that can provide context:
<ul>
<li>effective ==&gt; Effective Date of February 12, 2001</li>
<li>board meeting ==&gt; Board Meeting Date of February 12, 2001</li>
</ul>
</li>
</ul>
<pre>It is my great pleasure to announce that the Board has accepted my recommendation to appoint Jeff Skilling as chief executive officer, <em>effective</em> at the time of their next <em>board meeting</em> on <strong>February 12, 2001</strong>. Jeff will also retain his duties as president and chief operating officer. I will continue as chairman of the Board and will remain at Enron, working with Jeff on the strategic direction of the company and our day-to-day global operations.</pre>
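<p>One way to sketch this: find each date, then look for known cue words in the text just before it.  The cue table and the 80-character look-back window are assumptions I picked for this example:</p>

```python
import re

DATE_PATTERN = re.compile(
    r"(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}")

# Cue words mapped to the name we give the date (illustrative only).
CUES = {"effective": "Effective Date", "board meeting": "Board Meeting Date"}


def name_dates(text, window=80):
    """Return (date, [names]) pairs using cue words found just before each date."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        context = text[max(0, match.start() - window):match.start()].lower()
        names = [name for cue, name in CUES.items() if cue in context]
        results.append((match.group(0), names))
    return results


sample = ("appoint Jeff Skilling as chief executive officer, effective at the "
          "time of their next board meeting on February 12, 2001.")
print(name_dates(sample))
# [('February 12, 2001', ['Effective Date', 'Board Meeting Date'])]
```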
<h3>Acronyms</h3>
<ul>
<li><span style="line-height: 13px;">We often use a cryptic set of letters in place of longer phrases &#8211; for ease of use.</span></li>
<li>However, the acronyms themselves don&#8217;t have any real meaning &#8211; the meaning is in what they stand for</li>
</ul>
<pre>If you are interested in the Live or Archived meetings of the Federal Communications Commission (<strong>FCC</strong>) or National Transportation Safety Board (<strong>NTSB</strong>), please contact us.</pre>
<pre>If you have any questions concerning the <strong>FERC</strong> Video Archives, please contact us at capcon@gmu.edu or at 703-993-3100.</pre>
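<p>A lookup table is the simplest way to resolve acronyms.  In this sketch the table only holds the three acronyms from the examples above, and unknown acronyms are left alone:</p>

```python
import re

# Lookup table for this example only; a real one would be domain specific.
ACRONYMS = {
    "FCC": "Federal Communications Commission",
    "NTSB": "National Transportation Safety Board",
    "FERC": "Federal Energy Regulatory Commission",
}


def expand_acronyms(text):
    """Replace known all-caps acronyms with the phrase they stand for."""
    def replace(match):
        token = match.group(0)
        return ACRONYMS.get(token, token)

    return re.sub(r"\b[A-Z]{2,}\b", replace, text)


print(expand_acronyms("questions concerning the FERC Video Archives"))
# questions concerning the Federal Energy Regulatory Commission Video Archives
```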
<h2>Summary</h2>
<p>I feel like I just gave a very brief summary of these techniques &#8211; where there is a lot more knowledge lurking out there.  By no means is this an exhaustive or complete list &#8211; but just a beginning.  Again &#8211; Bill Inmon&#8217;s company website &#8211; forestrimtech.com &#8211; has a lot more information.  I think the important part is to realize that you can discover context to wrap around what looks like otherwise meaningless text.</p>
<p>Next we&#8217;ll address this at the document level &#8211; instead of at the paragraph/sentence level.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/06/08/textual-disambiguation-what-is-it-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kindle really the cloud?</title>
		<link>http://anlenterprises.com/2013/06/08/kindle-really-the-cloud/</link>
		<comments>http://anlenterprises.com/2013/06/08/kindle-really-the-cloud/#comments</comments>
		<pubDate>Sat, 08 Jun 2013 13:01:23 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[ebook]]></category>
		<category><![CDATA[kindle]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1528</guid>
		<description><![CDATA[The other day I read a kindle book on three different devices: a kindle touch (e-ink), an Ipad and a laptop. When I originally got into kindle books I thought of them as a physical file -like a PDF. Over time I&#8217;ve realized that Kindle is something more -it&#8217;s more of a cloud application. For [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/05/kindle_cloud.jpg" rel="lightbox[1528]"><img class="alignright size-medium wp-image-1533" alt="kindle_cloud" src="http://anlenterprises.com/wp-content/uploads/2013/05/kindle_cloud-300x153.jpg" width="300" height="153" /></a>The other day I read a Kindle book on three different devices: a Kindle Touch (e-ink), an iPad and a laptop. When I originally got into Kindle books I thought of them as a physical file &#8211; like a PDF. Over time I&#8217;ve realized that Kindle is something more &#8211; it&#8217;s more of a cloud application.</p>
<p>For a Kindle book we really have 2 types of data &#8211; the text/pictures from the publisher and the data we create. We create 3 types of data:</p>
<p>- where we are in the book (Whispersync)</p>
<p>- our highlights</p>
<p>- our notes</p>
<p>These items (provided we have an Internet connection) are available on different devices. It&#8217;s only over time that I&#8217;ve realized just how useful this is. I&#8217;ve <a title="Kindle – highlights and notes" href="http://anlenterprises.com/2012/08/29/kindle-highlights-and-notes/">blogged</a> before about how I highlight and take notes for myself in ways that I never did with a physical book. I find myself taking advantage of these features more often now.</p>
<p>I wonder how much data I&#8217;m providing to Amazon with those highlights and notes. I know they look at the highlights &#8211; as they show the most popular highlights. How much profiling of me can they derive from what I highlight in a given book (in comparison to others)? Maybe that&#8217;s why there are so many free books&#8230;</p>
<p>So is it an eBook reader or something else? Am I giving up some privacy for convenience?  One advantage of this is that you can view all your notes and highlights (at <a href="http://kindle.amazon.com" target="_blank">kindle.amazon.com</a>).  I still sometimes wonder though what would happen in the future if Amazon stopped selling these books &#8211; if I could somehow offline integrate the book content with my notes again (this is where I place my hope in hackers).</p>
<p>Maybe this is the reality to be (if it isn&#8217;t already) &#8211; that we&#8217;re not purchasing a product but a service.  We don&#8217;t &#8220;own&#8221; something anymore &#8211; but have a right to use it.  I&#8217;m not sure if I like that &#8211; but to get the rich service we want we may have to deal with that.  Hopefully for ebooks in the future there will be a way to back up your books and notes yourself &#8211; so that you&#8217;re not beholden to a single company (like I am to Amazon).  Or maybe the NSA will offer that&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/06/08/kindle-really-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Book Review: Big Data Governance by Sunil Soares</title>
		<link>http://anlenterprises.com/2013/03/24/book-review-big-data-governance-by-sunil-soares/</link>
		<comments>http://anlenterprises.com/2013/03/24/book-review-big-data-governance-by-sunil-soares/#comments</comments>
		<pubDate>Mon, 25 Mar 2013 02:57:05 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1511</guid>
		<description><![CDATA[Overview I recently finished reading a book by Sunil Soares:  Big Data Governance.  The whole &#8220;Big Data&#8221; topic has been exploding &#8211; so I&#8217;ve done a lot of research into the area.  With my background in data architecture (which inherently recognizes the value of data) the concept of applying Data Governance principles to Big Data was interesting. [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.amazon.com/gp/product/B00AA33FIG/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=B00AA33FIG&amp;linkCode=as2&amp;tag=alenteli-20"><img class="alignright size-full wp-image-1512" alt="images" src="http://anlenterprises.com/wp-content/uploads/2013/03/images.jpg" width="225" height="225" /></a></p>
<h2>Overview</h2>
<p>I recently finished reading a book by <a href="http://www.amazon.com/gp/product/B00AA33FIG/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=B00AA33FIG&amp;linkCode=as2&amp;tag=alenteli-20" target="_blank">Sunil Soares:  Big Data Governance</a>.  The whole &#8220;Big Data&#8221; topic has been exploding &#8211; so I&#8217;ve done a lot of research into the area.  With my background in data architecture (which inherently recognizes the value of data) the concept of applying Data Governance principles to Big Data was interesting.  So I broke down and spent the $30+ on the Kindle book so I could better absorb this concept for possible use in my professional career.</p>
<p>Sunil Soares used to work for IBM and was one of the authors of a free eBook: the <a href="http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=IMM14074USEN&amp;appname=wwwsearch">IBM Data Governance Unified Process</a>.   I read that eBook previously and was able to absorb some of the material (I need to re-read to get more out of it).  It felt like some of the information in this book was very similar to what was written in that free eBook (so if you&#8217;re cheap like me you may want to start there).</p>
<p>As usual I&#8217;m going to talk about what themes I saw/learned from reading this book (which could have come from other literature by him and others).   The book itself has a pretty organized outline of the steps he would recommend for various data governance principles relating to Big Data.  You can get most of that by simply reading the Table of Contents of the book so I&#8217;m not going to repeat it here.</p>
<p>I believe that the &#8220;Big Data&#8221; phenomenon has clearly demonstrated the value of data itself &#8211; of what it can provide.  That said &#8211; it&#8217;s the results of insight where the real business value is.  Sunil Soares says: &#8220;its value must be truly understood and unlocked by deriving insights that are revealed through analysis and then translating those insights into information, knowledge, and ultimately action.&#8221;    Companies are finding profound ways to take data &#8211; from many sources &#8211; and derive insight they never could before (either it was impossible or too expensive).</p>
<h2>What is Big Data Governance?</h2>
<p>Sunil Soares defines Big Data Governance as follows: &#8220;Big data governance is part of a broader information governance program that formulates policy relating to the optimization, privacy, and monetization of big data by aligning the objectives.&#8221;  This definition implies that Big Data Governance sits in the context of an already existing Data Governance program.  Therefore it seems that the author is saying that most of the principles for Data Governance in general would apply to Big Data Governance.  I&#8217;m not sure that I agree with that &#8211; but we shouldn&#8217;t ignore what has been learned in traditional Data Governance programs.</p>
<h2>Why Big Data Governance?</h2>
<p>I think the first question to answer is why you would even want Data Governance as part of a Big Data program.  Some of the promise of Big Data analytics is that you don&#8217;t have to do all the traditional Data Warehouse work to get results.  The concept is that you can load the raw data into your Hadoop environment and perform some advanced analysis and voil&#224; &#8211; out come meaningful results.  You don&#8217;t need to perform rationalization, cleaning, summarizing, etc. &#8211; you just work with the data as it is.</p>
<p>While some of that is true &#8211; it&#8217;s somewhat of a misleading picture.  It&#8217;s true that one of the advantages of putting all the data into the Hadoop environment is that you don&#8217;t have to rely on a sample of the data (which may not represent the whole well).  For example: if you were provisioning bandwidth for a set of servers and used an averaged sample of the currently used bandwidth you may miss the occasional spikes in bandwidth that really drive what&#8217;s needed [Scott Kahler explains this better than I did in this keynote video: http://www.kcitp.com/2012/09/03/big-data-kansas-city-technology-events/].</p>
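<p>To put some numbers on the bandwidth example, here is a small sketch.  The readings are invented for illustration (one spike in an otherwise flat series), and &#8220;every other reading&#8221; stands in for whatever sampling you might do:</p>

```python
from statistics import mean

# Invented bandwidth readings in Mbps, with one spike.
usage_mbps = [40, 42, 38, 41, 400, 39, 43, 40, 37, 40]

average = mean(usage_mbps)   # 76: well below the spike
peak = max(usage_mbps)       # 400: what the servers actually needed

# A sample that keeps every other reading happens to skip the spike entirely.
sampled = usage_mbps[1::2]
sampled_peak = max(sampled)  # 42

print(average, peak, sampled_peak)
```

<p>Provisioning from the average (76) or from the sample (42) would fall far short of the 400 that the full data shows.</p>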
<p>The other major difference is including far more variety of sources in your analysis &#8211; including segments of unstructured text.  The technology now allows us to efficiently process through far more volumes and variety of data than we could before.  We&#8217;ve also advanced in how we can process text and other varieties of data &#8211; in terms of our algorithms and other advanced processing. The ability to combine so much data together to get a picture is fascinating.</p>
<p>That said &#8211; it&#8217;s not quite that easy or simple.  When we say unstructured data what we often mean is that there is some unstructured text within an otherwise structured container.  That structure may not be as rigid &#8211; in that maybe not everything is present or it&#8217;s more variable &#8211; but it still has a structure.  Therefore an effort needs to be made to understand that data &#8211; especially in terms of its reliability.</p>
<p>Here are a few examples of data that may not be what it seems:</p>
<ol>
<li>User names in Social Media &#8211; they&#8217;re not always a real name.  In some social media sites there is no guarantee that the user name is a person&#8217;s real name (or any real name).  This is significant as one of the goals is often to tie a master customer record to their social media data.</li>
<li>Sunil Soares mentioned the term &#8220;unique visitors&#8221; (in the context of clickstream data).  One site/source may measure the # of unique visitors a week vs. another measures it within a month.  If you directly compared this data without addressing this you would get skewed results.</li>
<li>Let&#8217;s say we have a measurement that represents the average temperature for the last hour.  If one measurement is a rolling average (taking into account a large # of previous values) while another covers only that hour, the two are not directly comparable.</li>
<li>Location data &#8211; does each data source assign the same meaning to the same value?  If you matched data solely on the values would you really be matching the same location?</li>
<li>Another of Sunil Soares&#8217;s examples was sensor and part terminology in railroads.  If we can determine that sensor event #282 typically occurs before part #339 fails does that part have the same # in different cars/engines?  Do the different sensors produce the same code for the same event?  Would we need some type of cross-reference table to map these together?</li>
<li>At a higher level consider whether the same data is being pulled into your Hadoop environment multiple times?  Is Data Source Q really the same as Data Source A?   Did we end up wasting storage space, transmission and possibly licensing cost on duplicate data?</li>
</ol>
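<p>The temperature example above is easy to show with numbers.  In this sketch the hourly readings and the 3-hour window are made up; the point is only that the two &#8220;averages&#8221; for the same hour disagree:</p>

```python
def rolling_average(values, window):
    """Average of the current value and up to `window - 1` values before it."""
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Invented hourly average temperatures; the last hour is much warmer.
hourly = [10.0, 10.0, 10.0, 22.0]

print(hourly[-1])                      # 22.0 (that hour only)
print(rolling_average(hourly, 3)[-1])  # 14.0 (smoothed over 3 hours)
```

<p>If one source reports 22.0 and another reports 14.0 for the &#8220;same&#8221; measurement, matching them without knowing how each was calculated would skew the results.</p>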
<p>There are other dimensions of concern that are not technical &#8211; but a function of the complex and inter-related environment we all live in:</p>
<ol>
<li>Privacy &#8211; despite the fact that some think privacy is dead there are serious concerns around privacy and Big Data.
<ul>
<li>Consider who really owns the data?  Is it yours or the customer?  Most social media sites will tell you that the data is the customer&#8217;s &#8211; you can&#8217;t own it (you may even have to delete it if they ask you).</li>
<li>Are you, by combining data, creating new types of sensitive data that didn&#8217;t exist before?</li>
<li>Have you built safeguards into your Big Data platform to control who has access to what (security is not part of the native Hadoop platform)?</li>
</ul>
</li>
<li>Regulatory &#8211; regulatory agencies don&#8217;t care how the sensitive data is stored (i.e. Hadoop) &#8211; they will hold you accountable regardless.
<ul>
<li>Are you in a highly regulated industry such as HealthCare?</li>
<li>Are you dealing with sensitive corporate data governed by regulation?</li>
<li>Do you have industry constraints &#8211; such as PCI (credit cards)?</li>
<li>Do you know what the regulations are in each country you operate in (they are often different)?</li>
</ul>
</li>
<li>Reputation
<ul>
<li>Even if something is legal &#8211; it may not look very good in the eyes of your customers or partners</li>
<li>You must weigh the risk of the impact to your reputation vs. the revenue potential</li>
</ul>
</li>
</ol>
<h2>What is different about Big Data Governance?</h2>
<p>So the next question is whether &#8220;Big Data Governance&#8221; is really any different than traditional governance for operational or enterprise reporting systems.   I believe it can and should be different &#8211; as it&#8217;s often for a different purpose.  Sunil Soares puts this well:</p>
<p><em>Big data needs to be “good enough” because poor data quality does not necessarily impede the analytics that are required to derive business insights.</em></p>
<p>You may have heard of <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">ETL</a> (Extract, Transform, Load) but now there is a new term:  <a href="http://it.toolbox.com/wiki/index.php/ELT">ELT</a> (Extract, Load, Transform).  At its simplest the concept is that the data is loaded in its raw form and then transformed &#8211; not the other way around.  This is possible because we can both afford to load the raw data and have the computing power to transform it in place.  Therefore data quality may be enforced on the fly &#8211; instead of before the data is at rest.  So the focus is on making a reasonable effort on the data that&#8217;s imported instead of making it pristine before it&#8217;s loaded.</p>
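<p>Here is a toy sketch of the ELT order of operations, using an in-memory SQLite database as a stand-in for the Big Data platform (that substitution, the table name and the sample rows are all assumptions for illustration).  The raw values are loaded untouched and only cleaned when they are queried:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_readings (sensor TEXT, value TEXT)")

# Load: the raw rows go in exactly as they arrived, bad values and all.
raw_rows = [("a", " 21.5 "), ("b", "n/a"), ("a", "22.5")]
conn.executemany("INSERT INTO raw_readings VALUES (?, ?)", raw_rows)

# Transform: trim, drop non-numeric values and convert, at query time.
cleaned = conn.execute(
    "SELECT sensor, CAST(TRIM(value) AS REAL) FROM raw_readings "
    "WHERE TRIM(value) GLOB '[0-9]*' ORDER BY rowid"
).fetchall()

print(cleaned)
```

<p>The raw table still holds all three rows, so a different cleaning rule can be applied later without reloading anything.</p>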
<h2>How do I implement Big Data Governance?</h2>
<p>The next question is then what&#8217;s a framework for implementing Big Data Governance.  Here are some of my thoughts (hopefully organized enough to be useful):</p>
<ol>
<li>Know your Data.
<ul>
<li>Catalog your internal and external data.  Other than for a sandbox don&#8217;t let data into your Big Data platform unless it&#8217;s cataloged</li>
<li>Understand your data &#8211; not in complete detail but the overall quality, time scale, etc.</li>
<li>Document some of the key fields within your data &#8211; ones that aren&#8217;t intuitive and that are key to using the data effectively.</li>
<li>Develop a method to document and share this metadata</li>
</ul>
</li>
<li>Know your organization and your platform.
<ul>
<li>Understand who can be involved in data quality &#8211; both at its source and while it&#8217;s in your Big Data platform</li>
<li>Understand what your platform can do &#8211; good and bad.</li>
</ul>
</li>
<li>Understand constraints, regulations, etc. &#8211; especially by region.
<ul>
<li>Understand your legal, ethical and internal constraints.</li>
<li>Evaluate these by region &#8211; as they can differ greatly</li>
<li>Understand what your organization&#8217;s commitment level is regarding platform and people resources</li>
</ul>
</li>
<li>Determine what data needs to be cleaned up and what needs to be protected.
<ul>
<li>Flag data that needs to be cleaned and why it needs to be cleaned</li>
<li>Flag data that is sensitive and needs to be protected.</li>
<li>Develop a method to document and share this metadata</li>
</ul>
</li>
<li>Determine how and when to clean and protect the data.
<ul>
<li>Will you clean your data before it hits your Big Data Platform, after it hits it or in real time?</li>
<li>Determine strategies for cleaning that data</li>
<li>Determine strategies for protecting sensitive data and overall security schemes</li>
</ul>
</li>
<li>Evaluate how you are doing on a regular basis.
<ul>
<li>Establish routine meetings (quarterly, yearly, etc.) to evaluate how things are going</li>
<li>Create the expectation that this is a process and that changes will be common</li>
</ul>
</li>
</ol>
<p>In conclusion I believe introducing Data Governance into a Big Data environment is a worthwhile choice.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/03/24/book-review-big-data-governance-by-sunil-soares/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Book Review: Steve Jobs by Walter Isaacson</title>
		<link>http://anlenterprises.com/2013/03/22/book-review-steve-jobs-by-walter-isaacson/</link>
		<comments>http://anlenterprises.com/2013/03/22/book-review-steve-jobs-by-walter-isaacson/#comments</comments>
		<pubDate>Fri, 22 Mar 2013 12:02:50 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Book Reviews]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[steve jobs]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1505</guid>
		<description><![CDATA[I recently finished reading the book &#8220;Steve Jobs&#8221; by Walter Isaacson after checking it out from the library.  I&#8217;ve been wanting to read this book for some time &#8211; as Steve Jobs is such an interesting figure.  He had a profound impact on technology and how we use it. Reading this book for me was [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/03/Steve-Jobs-by-Walter-Isaacson.jpg" rel="lightbox[1505]"><img class="alignright size-medium wp-image-1506" alt="Steve-Jobs-by-Walter-Isaacson" src="http://anlenterprises.com/wp-content/uploads/2013/03/Steve-Jobs-by-Walter-Isaacson-300x234.jpg" width="300" height="234" /></a>I recently finished reading the book &#8220;Steve Jobs&#8221; by Walter Isaacson after checking it out from the library.  I&#8217;ve been wanting to read this book for some time &#8211; as Steve Jobs is such an interesting figure.  He had a profound impact on technology and how we use it.</p>
<p>Reading this book for me was a bit of a journey into my past by remembering the technology.  I remember playing Oregon Trail on an Apple II (e I think) in Elementary School [Sidebar - I got distracted from writing this post by playing that game in an emulator.  I'm not sure I ever won at school either..]</p>
<p>I was born in 1973 so I remember a lot of the technology of early PCs &#8211; having experienced many of them. Despite the fact I grew up with this technology I still find it hard to believe how limited it was compared to what we have today.  In this day and age computers have become more of a commodity and not a special thing.  But back then, any computer was a work of art and full of amazing technology to accomplish what it did.</p>
<p>It is the character of Steve Jobs that is the most interesting part of this book.  He was a very unique individual &#8211; almost a force of nature.  He had such an impact on the technology we use today &#8211; not just our computers but on so much consumer technology.  He had a vision for what we needed before we knew it &#8211; an instinct that defied logic.  He made possible what many thought was impossible &#8211; and we all benefited from it.</p>
<p>That said &#8211; would I want to be him?  No &#8211; he&#8217;s not my role model as a person.  I don&#8217;t think I would want to work for him either &#8211; given how he treated people.  In fairness he was able to get people to do more than what they thought they could do &#8211; to perform at a high level.  I&#8217;m sure there are many people who still remember him fondly as a boss and as a person &#8211; but many others do not.  Personally I couldn&#8217;t treat people the way he did &#8211; it&#8217;s just not who I am.</p>
<p>I think &#8220;innovative&#8221; people like him are often so focused on their vision for the future that they forget there are real people around them.  In Steve&#8217;s case it was clear that he had what they call a &#8220;Reality Distortion Field&#8221; &#8211; his own view of reality.  In some cases it worked for him &#8211; making things happen that seem impossible.  In other cases reality caught up with him and with others around him &#8211; often with tragic consequences.</p>
<p>Maybe one of the things he did well was take technology and make it usable for &#8220;normal&#8221; people.  What often happens with technology is that it&#8217;s made by &#8220;geeks&#8221; &#8211; who don&#8217;t think like normal people.  Therefore what comes out is designed to work the way a &#8220;geek&#8221; would want it &#8211; which often is not what a normal person would want.  A &#8220;geek&#8221; designs for edge cases (rare but difficult cases) without putting full effort into the common cases.</p>
<p>Steve made technology elegant and usable &#8211; attractive to &#8220;normal&#8221; people.  He brought the Graphical Interface to mainstream computers.  He made a music player that transformed the music industry  - one that made others look weak (oh &#8211; and being able to buy music online).  He ushered in a mobile computing era &#8211; with the iPhone taking the world by storm.  He even finally designed a computing tablet that really worked &#8211; after so many others failed.</p>
<p>The world will miss his innovation &#8211; his focus on design and usability &#8211; not just capability.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/03/22/book-review-steve-jobs-by-walter-isaacson/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Swimming in Data</title>
		<link>http://anlenterprises.com/2013/03/03/swimming-in-data/</link>
		<comments>http://anlenterprises.com/2013/03/03/swimming-in-data/#comments</comments>
		<pubDate>Mon, 04 Mar 2013 03:07:55 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1495</guid>
		<description><![CDATA[The other day I felt like I was swimming in a sea of data &#8211; that it was all around me. I was looking at some vending machines and noticed what looked like an old-fashioned antenna on them.  From a previous conversation with a maintenance technician I knew the vending machines are wirelessly connected back to [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/02/1159613_85120857.jpg" rel="lightbox[1495]"><img class="alignright size-medium wp-image-1496" style="margin-left: 4px; margin-right: 4px;" alt="1159613_85120857" src="http://anlenterprises.com/wp-content/uploads/2013/02/1159613_85120857-212x300.jpg" width="212" height="300" /></a>The other day I felt like I was swimming in a sea of data &#8211; that it was all around me. I was looking at some vending machines and noticed what looked like an old-fashioned antenna on them.  From a previous conversation with a maintenance technician I knew the vending machines are wirelessly connected back to their office.   He told me they knew everything that happened on the machine &#8211; every keystroke and transaction that occurred.  The primary purpose of that was to prevent fraud &#8211; as there is an audit trail to reconcile the amount of money brought back to the office from the machine.</p>
<p>I realized though they could learn a lot more from the data they are gathering than how much money should be in the machine.  They could tell if an item needed to be restocked and predict what items would sell well in the machine.  But that&#8217;s just the beginning &#8211; as they can learn about us from our purchases.</p>
<p>Imagine what they could do if they aggregate that data across many machines.  What could they predict &#8211; what could they learn? Would they be able to see trends across a metro area &#8211; or possibly the country?  Would they be able to tell about the health direction of the employees of a company based on what type of items are purchased more frequently?</p>
<p>What value would that data have &#8211; would anyone want to purchase it?  Could it assist with the supply chain &#8211; would that data be valuable to their suppliers?  What other data could it be correlated with to add additional insight?</p>
<p>These days there is so much data &#8211; those magical ones and zeroes &#8211; floating through the air. These streams of data about us and others constantly surround us.  Our phones track our location, cable boxes track what we watch, websites gather and share information (ever seen ads about things you&#8217;ve searched for follow you?), and who knows what the government can now track&#8230;</p>
<p>The world is changing &#8211; in subtle and profound ways &#8211; by what data we and others have. Do you get your bills electronically or by mail?  When was the last time you &#8220;developed&#8221; a picture?  (I&#8217;m not sure my kids even understand that concept.)  Meters are going electronic &#8211; parking, electric, gas, water &#8211; with real-time feedback to the companies that manage those resources.</p>
<p>Sometimes when I stop and think it feels like there should be streams of 1s and 0s flying through the air around me.  Maybe I&#8217;ve just watched too many movies like the Matrix and Tron, or maybe it&#8217;s because through my life I&#8217;ve seen so much change in technology.  When it comes to data a simple example comes to mind: how much bandwidth is now available.  In college I remember a 56K modem being fast &#8211; now I can get 10 meg on my phone.  When my first daughter was born I had to write some of the edited video back to tape as there wasn&#8217;t room on my hard drive.  Now I&#8217;m going through and extracting the video off of all the raw DV tapes and the DVDs I created, as I have the hard drive space.</p>
<p>So maybe it&#8217;s just me &#8211; but then again maybe it&#8217;s all of us.  What do you think?  Are you swimming in data?</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/03/03/swimming-in-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Phone left me</title>
		<link>http://anlenterprises.com/2013/01/26/my-phone-left-me/</link>
		<comments>http://anlenterprises.com/2013/01/26/my-phone-left-me/#comments</comments>
		<pubDate>Sun, 27 Jan 2013 00:14:24 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[GPS]]></category>
		<category><![CDATA[lost]]></category>
		<category><![CDATA[phone]]></category>
		<category><![CDATA[remote wipe]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1488</guid>
		<description><![CDATA[The other day my phone left me &#8211; taking a trip without me.  I got to my office and realized I had the phone holster but no Galaxy S3 in it.  I looked around at work and then took a trip to my car thinking it had fallen out in the car.  It wasn&#8217;t in [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2013/01/galaxys3.jpg" rel="lightbox[1488]"><img class="alignright size-thumbnail wp-image-1489" alt="galaxys3" src="http://anlenterprises.com/wp-content/uploads/2013/01/galaxys3-150x150.jpg" width="150" height="150" /></a>The other day my phone left me &#8211; taking a trip without me.  I got to my office and realized I had the phone holster but no Galaxy S3 in it.  I looked around at work and then took a trip to my car thinking it had fallen out in the car.  It wasn&#8217;t in the car &#8211; which wasn&#8217;t a good sign.  I tried calling the phone &#8211; thinking that I may have left it at home &#8211; but no one answered. I happened to have my laptop with me at work that day so I powered it up and went to the <a href="https://www.lookout.com/" target="_blank">Lookout website</a>.  Lookout is an Android security app &#8211; that both scans apps and provides some other security features.  One of the unique features is the ability to locate your phone &#8211; to a high degree of accuracy.</p>
<p>I have used Lookout for some time &#8211; as it&#8217;s a free app that seemed to help with security.  I actually used the locate feature once on my previous phone when I thought I had lost it. I &#8220;located&#8221; my phone from my house and saw that it was near where I work in Downtown Kansas City.  That night I drove back to work and found my phone right next to where I park &#8211; a happy outcome.</p>
<p><img class="alignleft size-thumbnail wp-image-1490" alt="lookout" src="http://anlenterprises.com/wp-content/uploads/2013/01/lookout-150x150.jpg" width="150" height="150" /></p>
<p>This time around the outcome wasn&#8217;t quite so happy &#8211; as my phone wasn&#8217;t nearby.  After I clicked on locate in the website it showed me where my phone was &#8211; at around 31st and Linwood &#8211; which is over 5 miles away from where I work.  That&#8217;s when I knew this didn&#8217;t look good &#8211; as I definitely didn&#8217;t drop it around there &#8211; someone or something was moving it.  I tried both making the phone &#8220;scream&#8221; (quite annoying) and calling the phone multiple times.   Lookout has some advanced options &#8211; which require upgrading to the premium version ($).  After upgrading I decided to &#8220;lock&#8221; the phone to safeguard the data I have on the phone (it was already set up to require a password to unlock the phone normally).</p>
<p>Strangely enough, when I located the phone again it had moved &#8211; to the public library near Prospect.  I made the phone &#8220;scream&#8221; again to try to get someone&#8217;s attention. I even called the library to see if anyone had turned in a lost phone.  At this point I decided to &#8220;wipe&#8221; the phone to be on the safe side &#8211; as I was afraid whoever had it might turn it off.  After wiping it I could no longer track it so it was essentially &#8220;lost&#8221; at that point.</p>
<p>I had signed up for the insurance through Sprint as I knew this was still a very expensive phone. After starting that process I found out there was a $150 deductible I had to pay to get it replaced. The irony is that I only paid $50 for the thing &#8211; due to a Black Friday deal.  The process of replacing the phone was fairly quick and painless &#8211; though part of the paperwork had to be printed off.  During the day I was able to complete part of the process &#8211; and I faxed/uploaded the documents in the evening.</p>
<p>The phone arrived at my house the next day before I went home.  After I got home from AWANA with my kids I started setting up the phone.  It seems like it gets easier each time to get the phone back up and running &#8211; especially with Google as the key player for the phone.  I love having my contacts centralized through Gmail &#8211; as it&#8217;s the same list everywhere.  I also had Dropbox this time as a repository of the photos from the phone &#8211; which was great.</p>
<p>Since I had to buy a new case I decided to get something with a more secure holster &#8211; as I think my phone fell out of the one I had.  I bought a Seido case as it had a locking holster.  I still dislike the extra bulk of the rugged holsters &#8211; but they do have their place.</p>
<p>Today my daughter tried to get into my phone and I was notified about it.  Apparently Lookout premium will take a picture with the front-facing camera if someone enters the wrong password 3 times (she thought I was kidding when I showed her the e-mail).  The technology today to know what&#8217;s going on with the phone even when you are away from it is fascinating.  It may not help recover the phone &#8211; but it does make for an interesting story.  Hopefully you will never lose your phone &#8211; but today it can be resolved without too much difficulty.</p>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2013/01/26/my-phone-left-me/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Astounding Growth of Data in my lifetime</title>
		<link>http://anlenterprises.com/2012/11/06/astounding-growth-of-data-in-my-lifetime/</link>
		<comments>http://anlenterprises.com/2012/11/06/astounding-growth-of-data-in-my-lifetime/#comments</comments>
		<pubDate>Wed, 07 Nov 2012 00:39:08 +0000</pubDate>
		<dc:creator>andrew</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Big Data]]></category>

		<guid isPermaLink="false">http://anlenterprises.com/?p=1473</guid>
		<description><![CDATA[At the IOD conference this year the volumes of data discussed were amazing.  I&#8217;ve been thinking about how much the volumes of storage have grown in my own lifetime.  I remember my first exposures to computers &#8211; both at home and at school.  I remember my dad building a personal computer and using a tape [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://anlenterprises.com/wp-content/uploads/2012/11/571px-Floppy_disk_300_dpi.jpg" rel="lightbox[1473]"><img class="alignright size-thumbnail wp-image-1475" title="571px-Floppy_disk_300_dpi" src="http://anlenterprises.com/wp-content/uploads/2012/11/571px-Floppy_disk_300_dpi-150x150.jpg" alt="" width="150" height="150" /></a>At the IOD conference this year the volumes of data discussed were amazing.  I&#8217;ve been thinking about how much the volumes of storage have grown in my own lifetime.  I remember my first exposures to computers &#8211; both at home and at school.  I remember my dad building a personal computer and using a tape recorder (i.e. cassette tape) to store data.  The Apple IIes at school were more advanced &#8211; as they had the early 5 1/4 inch floppy drives.  One of the first IBM compatible (MS-DOS) PCs that I used for a while didn&#8217;t even have a hard drive initially. It had a 3 1/2 inch floppy drive that you booted from first &#8211; and then used a different disk to store data.  Getting my first hard drive was really cool &#8211; I can&#8217;t remember exactly how big it was, but it was likely close to 500 megabytes.</p>
<p>Today we&#8217;re starting to measure data not in megabytes or gigabytes &#8211; but in petabytes and zettabytes (and soon yottabytes).  For understanding here are some basic definitions, using decimal units where each step up is 1,000 times the previous one:</p>
<ul>
<li>kilobyte &#8211; 1,000 bytes (1,024 bytes in the binary convention)</li>
<li>megabyte &#8211; 1,000 kilobytes (1,000,000 bytes)</li>
<li>gigabyte &#8211; 1,000 megabytes (1,000,000,000 bytes)</li>
<li>terabyte &#8211; 1,000 gigabytes (1,000,000,000,000 bytes)</li>
<li>petabyte &#8211; 1,000 terabytes (1,000,000,000,000,000 bytes)</li>
<li>exabyte &#8211; 1,000 petabytes (1,000,000,000,000,000,000 bytes)</li>
<li>zettabyte &#8211; 1,000 exabytes (1,000,000,000,000,000,000,000 bytes)</li>
</ul>
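<p>As a quick check on the decimal unit ladder, here is a minimal Python sketch.  The names <code>UNITS</code> and <code>bytes_in</code> are just illustrative choices of mine, and it assumes the decimal convention where each unit is 1,000 times the one before it.</p>

```python
# Decimal (SI) storage units: each step up is 1,000 times the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit):
    """Return the number of bytes in one decimal unit, e.g. 'petabyte'."""
    return 1000 ** (UNITS.index(unit) + 1)


if __name__ == "__main__":
    for name in UNITS:
        print(f"{name:>9}: {bytes_in(name):,} bytes")
```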
<p>Another way to put this in perspective is to compare these numbers to the storage objects that I&#8217;ve encountered in my lifetime.  For example,</p>
<ul>
<li>Remember those 3 1/2 inch disks (see picture above)?  They held 1.44 megabytes &#8211; which was a good amount in the early to mid 1990s when I was in high school and college.</li>
<li>For backup I had an <a href="http://en.wikipedia.org/wiki/Zip_drive" target="_blank">Iomega Zip Drive</a> &#8211; essentially a removable hard drive.  These had a capacity of 100 megabytes &#8211; which meant a few of these disks could back up an entire hard drive at the time (which were probably around 500 megabytes).</li>
<li>CDs typically held about 700 megabytes and were in most cases read only.</li>
<li>DVDs can hold up to 4.7 gigabytes of data (which at the time seemed a lot).</li>
<li>I have an inexpensive USB flash stick &#8211; which holds about 4 gigabytes itself (these now cost less than $10).</li>
<li>You can buy a 2 terabyte hard drive now at MicroCenter for $99.</li>
</ul>
<div>So how many of these would you need to contain a petabyte of data?</div>
<ul>
<li>500 2 terabyte hard drives (1,000 terabytes / 2 terabytes = 500)</li>
<li>212,766 DVDs (1,000,000 gigabytes / 4.7 gigabytes = 212,766)</li>
<li>250,000 4 gigabyte USB sticks (1,000,000 gigabytes / 4 gigabytes = 250,000)</li>
<li>1,428,571 CDs (1,000,000,000 megabytes / 700 megabytes = 1,428,571)</li>
<li>10,000,000 Iomega 100 megabyte ZIP disks (1,000,000,000 megabytes / 100 megabytes = 10,000,000)</li>
<li>694,444,444 3 1/2 inch disks (1,000,000,000 megabytes / 1.44 megabytes = 694,444,444)</li>
</ul>
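<div>The arithmetic in the list above can be sketched in a few lines of Python.  This is only an illustration: the names <code>MEDIA_MB</code> and <code>units_needed</code> are my own, the capacities are the ones quoted in this post, and a petabyte is taken as 1,000,000,000 megabytes (decimal units).  Counts are rounded to the nearest whole item, as in the list.</div>

```python
PETABYTE_MB = 1000 * 1000 * 1000  # megabytes in one decimal petabyte

# Capacities in megabytes, taken from the figures quoted in this post.
MEDIA_MB = {
    "2 terabyte hard drive": 2_000_000,
    "DVD (4.7 gigabytes)": 4_700,
    "USB stick (4 gigabytes)": 4_000,
    "CD (700 megabytes)": 700,
    "ZIP disk (100 megabytes)": 100,
    "3 1/2 inch disk (1.44 megabytes)": 1.44,
}


def units_needed(capacity_mb):
    """Number of items holding one petabyte, rounded to the nearest item."""
    return round(PETABYTE_MB / capacity_mb)


if __name__ == "__main__":
    for name, capacity in MEDIA_MB.items():
        print(f"{units_needed(capacity):,} x {name}")
```

<div>Strictly speaking you would round up rather than to the nearest item, since a fraction of a disk still means one more disk; that changes the CD and 3 1/2 inch disk counts by one.</div>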
<div>Another way to think of it is how much video fits in a petabyte.  My Sony digital camera takes 1080 AVCHD video &#8211; which takes up 1.2 megabytes of disk space for every second of video.  So a 1 minute video (60 seconds) uses 72 megabytes of space.  A petabyte is 1,000,000,000 megabytes, so a petabyte can hold over 13,888,000 minutes of that video (about 231,000 hours, or roughly 26 years of continuous footage).</div>
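<div>Here is the same video calculation as a small Python sketch.  The function name <code>video_capacity</code> is hypothetical, the 1.2 megabytes per second figure is the one from my camera, and a petabyte is taken as 1,000,000,000 megabytes (decimal units).</div>

```python
PETABYTE_MB = 1_000_000_000  # megabytes in one decimal petabyte


def video_capacity(total_mb, mb_per_second=1.2):
    """Return (minutes, hours, years) of video that fit in total_mb."""
    mb_per_minute = mb_per_second * 60   # about 72 megabytes per minute
    minutes = total_mb / mb_per_minute
    hours = minutes / 60
    years = hours / 24 / 365             # continuous playback, 365-day years
    return minutes, hours, years


if __name__ == "__main__":
    minutes, hours, years = video_capacity(PETABYTE_MB)
    print(f"{minutes:,.0f} minutes, {hours:,.0f} hours, {years:.1f} years")
```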
<div></div>
<div>So personally I&#8217;m astounded by how much more disk space is available now than in the days I remember.  What&#8217;s also amazing is how quickly we can fill up that space (including me).  I had a 500 gigabyte hard drive in my desktop computer (which is really just used to centrally store data) that I thought was quite large when I bought it a few years ago.  Earlier this year I ended up buying a 1 terabyte hard drive to replace it &#8211; as pictures and video were starting to fill it up.  I personally think it will be an interesting race between how fast we can develop affordable storage vs. how fast we can generate data to fill it up.</div>
<div></div>
<div>For nostalgia purposes here are some pictures of the items mentioned above:</div>
<div class="flashalbum" style="width:100%;height:500px;">
<div class="flagallery_swfobject" id="sid_1856980843_div">
<div id="sid_1856980843_jq" class="flag_alternate">
		<div class="flagcatlinks"></div>
<div class="flagCatMeta">
	<h4>Data Growth</h4>
	<p></p>
</div>
<div class="flagcategory" id="gid_6_sid_1856980843">
<a class="i0 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/110.jpg" id="flag_pic_36" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_110.jpg]<span class="flag_pic_counters"><i>185</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_36"><strong></strong><br /><span></span></span></a>
<a class="i1 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/250px-tdkc60cassette.jpg" id="flag_pic_37" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_250px-tdkc60cassette.jpg]<span class="flag_pic_counters"><i>1</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_37"><strong></strong><br /><span></span></span></a>
<a class="i2 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/571px-floppy_disk_300_dpi.jpg" id="flag_pic_38" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_571px-floppy_disk_300_dpi.jpg]<span class="flag_pic_counters"><i>1</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_38"><strong></strong><br /><span></span></span></a>
<a class="i3 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/602px-zip-100a-transparent.png" id="flag_pic_40" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_602px-zip-100a-transparent.png]<span class="flag_pic_counters"><i>2</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_40"><strong></strong><br /><span></span></span></a>
<a class="i4 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/800px-commodore64_fdd1541_front_demodified.jpg" id="flag_pic_41" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_800px-commodore64_fdd1541_front_demodified.jpg]<span class="flag_pic_counters"><i>2</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_41"><strong></strong><br /><span></span></span></a>
<a class="i5 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/5_inch_1_4_floppy_disk_smal.jpg" id="flag_pic_45" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_5_inch_1_4_floppy_disk_smal.jpg]<span class="flag_pic_counters"><i>4</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_45"><strong></strong><br /><span></span></span></a>
<a class="i6 flag_pic_alt" href="http://anlenterprises.com/wp-content/flagallery/data-growth/imag0290_smaller.jpg" id="flag_pic_46" rel="gid_6_sid_1856980843" title="" rel="lightbox[1473]">[img src=http://anlenterprises.com/wp-content/flagallery/data-growth/thumbs/thumbs_imag0290_smaller.jpg]<span class="flag_pic_counters"><i>2</i><b>0</b></span><span class="flag_pic_desc" id="flag_desc_46"><strong></strong><br /><span></span></span></a>
</div>
</div>

</div></div>
]]></content:encoded>
			<wfw:commentRss>http://anlenterprises.com/2012/11/06/astounding-growth-of-data-in-my-lifetime/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
