Tag Archives: Big data

Sick Reviews

Today’s Foodie Friday Fun finds us at the intersection of food, data, and social media.

[Photo: New York Skyline (credit: CJ Isherwood)]

Yes, I know we’ve been here before, but today’s tidbit concerns an article in the NY Times the other day. The NYC Health Department conducted a pilot study using Yelp reviews to see if it could identify unreported outbreaks of food-borne illness. Despite what some may think, not everyone calls the city to let them know they got sick eating someplace. What many folks do, however, is post something on social media. Since Yelp is the go-to site for dining out, it makes sense to start there. One can easily see the effort expanding to other likely places – Twitter, TripAdvisor, etc.

So what did they find?

Using a software program developed by Columbia University, city researchers combed through 294,000 Yelp reviews for restaurants in the city over a period of nine months in 2012 and 2013, searching for words like “sick,” “vomit” and “diarrhea” along with other details. After investigating those reports, the researchers substantiated three instances when 16 people had been sickened.

Doesn’t sound like much, but it’s a start. Maybe you’re aware that Google tried something similar to help spot flu outbreaks. There is a bigger business point here: what the city is doing is growing big ears. They’re learning to use the vast amount of self-reported data to eliminate problems, in some cases before they’re reported via official channels. The restaurants in the three instances they found were open for business with no complaints on the official record, yet inspections turned up unclean conditions at all of them.
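For the curious, here’s a toy sketch in Python of what that kind of keyword screen might look like. Everything here is hypothetical and vastly simplified – the Columbia software obviously does far more than match a handful of words:

```python
# A toy keyword screen over restaurant reviews. The real system weighed
# many more signals; the core idea is simply surfacing reviews that
# mention illness-related terms for a human to investigate.
ILLNESS_TERMS = {"sick", "vomit", "diarrhea", "food poisoning"}

def mentions_illness(review_text):
    """Return True if a review mentions any illness-related term."""
    lowered = review_text.lower()
    return any(term in lowered for term in ILLNESS_TERMS)

reviews = [
    "Great pizza, friendly staff.",
    "My whole table got sick an hour after eating here.",
]
flagged = [r for r in reviews if mentions_illness(r)]
print(flagged)  # only the second review is surfaced for follow-up
```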

The real question is: how are you going to do something similar in your business? Maybe you’re watching your Facebook page for negative comments or responding to people pinging your brand account on Twitter. What are you doing to get beyond those quasi-official channels?

I wrote the other day about the need to improve data quality. Sure – in theory, a bunch of vindictive people could trigger a health department visit by writing negative posts containing the right keywords or phrases. In theory, I could win the U.S. Senior Open. Neither is likely to happen. What is likely to occur, however, is that your competition will find new ways to seek out and use information to drive their businesses forward. Will you be there with them?


Your Data Sucks

If you do any work in marketing or sales – or just about anything these days – you know you’re hit with an overwhelming amount of data each day. As it turns out, the real issue might not be the amount of data but the quality of it. The chart I’ve included today is from the Experian folks, reminding us that “Garbage In, Garbage Out” is a truism we can’t avoid. In fact, many of us are doing a really lousy job of keeping the garbage out.


[Chart: The state of data quality (source: Experian)]

I don’t think it’s a big surprise that the report finds only one-third of companies manage their data quality strategy centrally, through a single director. That, of course, means that:

66% of companies lack a coherent, centralized approach, says the report. Most have little centralization and manage data quality by individual department. For marketers to really take advantage of data insights, information needs to be accurate, consolidated and accessible in real time. A centralized organization-wide data management strategy is essential for marketing success.

I’ll give you an example. Say you have great web analytics information and fantastic sales information from another data source. If nobody took the time to establish a “key” – a field of data common to both databases – those two excellent, useful, actionable sets of information can’t be synced up. That’s why a coherent data schema is important, and too many cooks, especially unsupervised cooks, can really spoil this dish.
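To make that concrete, here’s a minimal sketch using pandas. The field names are made up for illustration – the point is simply that a shared “customer_id” makes the join trivial:

```python
import pandas as pd

# Two data sources that only become useful together if they share a key.
# "customer_id" is the hypothetical common field here.
web_analytics = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "pages_viewed": [12, 3, 27],
})
sales = pd.DataFrame({
    "customer_id": [101, 103],
    "order_total": [250.00, 980.00],
})

# With a common key, combining the two is one line; without one,
# it's guesswork and manual matching.
combined = web_analytics.merge(sales, on="customer_id", how="left")
print(combined)
```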

Even within a single data-gathering pool, poor planning can be a disaster. Let’s say you’re gathering address information. If you don’t use a drop-down menu to populate the “state” field, you’re going to end up with typos and inconsistent abbreviations (AR, ARK, and ARKANSAS might all mean Arkansas) – or someone will use an abbreviation your database thinks is another place entirely (AK is Alaska; AS is American Samoa). 91% of companies suffer from common data errors, and the main cause is human error. Experian again:

The high level of inaccurate information is brought about by a high level of human error. In many instances information entered across the organization is typed into a database at some point manually, by an employee or the customer directly. That exposes information to different levels of standardization, abbreviations and errors.
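If you can’t retrofit a drop-down and have to clean free-text input instead, the defensive version looks something like this sketch (the alias table is illustrative, not complete):

```python
# Normalize free-text state entries against a canonical list.
# Anything that doesn't map cleanly is queued for human review
# rather than silently stored as a "different" state.
STATE_ALIASES = {
    "AR": "AR", "ARK": "AR", "ARKANSAS": "AR",
    "AK": "AK", "ALASKA": "AK",
    # ...one entry per accepted spelling of each state
}

def normalize_state(raw):
    """Map a typed state value to a canonical code, or None if unknown."""
    return STATE_ALIASES.get(raw.strip().upper())

for entry in ["ark", " Arkansas ", "AS"]:
    code = normalize_state(entry)
    print(repr(entry), "->", code if code else "needs manual review")
```

Note that the ambiguous “AS” isn’t guessed at – it gets flagged, which is exactly the behavior you want from a data-quality standpoint.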

As with any part of your business, the quality of your actions depends on the quality of the information you have at hand. A little time spent on planning is worth a lot in improving that quality. You agree?


That Does Not Compute

One of the challenges any of us has in business is predicting the future.

[Image: Knuth’s version of Euclid’s algorithm… (photo credit: Wikipedia)]

The hardest part of my job – and maybe yours – is seeing over the horizon to help my clients prepare for what’s to come. That might be a change in a market or a change in technology. Whatever it is, any of us who look ahead do so by gathering data. In many cases that data is some measure of past behavior – how people bought from your website, for example. Those data points are then fed into some sort of algorithm that predicts what is to come. Increasingly, marketers and others use these models to drive their business behaviors as the amount of available data grows exponentially. While I’m not a believer that “big data means big problems,” blind reliance on these algorithmic predictions can mean just that.

Let’s take one simple form of algorithm. You probably see it every day. It’s known as collaborative filtering, and if you’re on Amazon or Netflix or any other site with a recommendation engine, you’ve used it. You may also have seen it at work in the videos YouTube offers you. These algorithms use measures of your past behavior as well as the behavior of others like you (“people who bought XYZ also bought…”). But what if you were buying a gift, and the purchase doesn’t reflect your tastes or interests at all? What if someone else used your browser to search and buy? Cookies are browser-based – they have no way to tell if the activity is from one person or six.
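Here’s a stripped-down sketch of the “people who bought XYZ also bought…” idea – plain co-occurrence counting, a toy version of what real recommendation engines do. Note how the gift problem sneaks in:

```python
from collections import Counter
from itertools import permutations

# Purchase histories: who bought what. The gift problem is baked in --
# nothing tells the algorithm that Bob's toy was for his niece.
purchases = {
    "alice": {"novel", "cookbook"},
    "bob": {"novel", "kids_toy"},   # kids_toy was a gift
    "carol": {"novel", "cookbook", "teapot"},
}

# Count how often each ordered pair of items shares a basket.
co_counts = Counter()
for items in purchases.values():
    for a, b in permutations(items, 2):
        co_counts[(a, b)] += 1

def also_bought(item, n=2):
    """Top-n items most often bought alongside `item`."""
    scores = Counter({b: c for (a, b), c in co_counts.items() if a == item})
    return [b for b, _ in scores.most_common(n)]

print(also_bought("novel"))  # the gift purchase skews these suggestions
```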

Another problem: algorithms are built by people, and those people are…well…human. They might have confirmation bias operating as they refine the formula to eliminate noise – data that’s not germane to the prediction at hand. The problem is that you don’t really know whether something is noise until after the fact. Maybe it’s a new trend that your model misses altogether.

The thing to keep in mind is that modeling can only go so far. It’s not very good at predicting the unexpected, and it tends to ignore outliers. As with all things, you need to ask questions, search for facts, and draw your own conclusions. Yes, it’s impossible to make sense of all this data without algorithmically based analysis. Just remember that while machines don’t make computational errors, it was a human who gathered the data (or installed the code that does) and wrote the formula. People often don’t compute. Make sense?
