Apple's secretive nature is legendary. Throughout its history, the
company's various projects have been highly compartmentalized, with no
one knowing what other groups were doing. Employees working on different
projects would refuse to sit together in the campus cafeteria for fear
of being accused of sharing details about their work.
And this secretiveness extends to data collection from the iPhone.
Apple, like Google, wants to improve its machine learning, particularly
as it applies to Siri, but according to a recent report from Reuters, its strict control over data collected by the iPhone is hampering its data scientists' ability to get anything done.
Machine learning experts who want unfettered access to data tend to
shy away from jobs at Apple, former employees told Reuters. Apple
retains user-centric information gathered by Siri for six months,
while information from Apple Maps expires after only 15 minutes, which
makes it difficult to gather data from iPhones using the Maps function.
This gives Google and even Microsoft's Cortana an edge in spotting
larger trends, and, to the extent this one metric is a factor, Apple's
predictions may be less precise.
In a way, Apple should be applauded. It analyzes its users' behavior
under some very strict self-imposed constraints to better protect the
data from outsiders. But that leaves Apple's data scientists with less
data, which means they can't do their jobs as well.
One Word: Trust
It's a problem that other companies may face if they don't strike a
balance between analytics and privacy. After monster breaches at Home
Depot, Target, Anthem Blue Cross, UCLA Health and Community Health
System, people are understandably edgy about the security of their
personal information.
Privacy is considered sacrosanct, but it also has its price, notes Tim M. Crawford, CIO Strategic Advisor
and president of his consultancy AVOA.
"Forget privacy for a second. If we all took our medical records and
diagnostics data where we took this pill for this symptom and what
result we got, if we took all the data and could compile it, imagine how
much further we'd be because it would be a science because of all the
data points. But we are apprehensive to do something like that because
we have things like HIPAA," he said.
However, there is a flip side to that argument, but it requires a
great deal of trust, said Frank Buytendijk, research vice president and
distinguished analyst with Gartner.
"There is a case in Denmark where the government inadvertently stored
too much health information. In their habit of being transparent, they
were open about it and said they would delete it. The general public,
trusting their government, pleaded for the opposite. They said 'Keep it!
It could help in healthcare'," he said.
This could never happen in the U.S., in part because so many past
breaches have shredded confidence in patient privacy, and in part
because many Americans are less trusting of their government, with more
than a few reasons why.
But building trust is the next big challenge, said Crawford.
"Culturally, how do we get comfortable with data and how data is used?
There is a direct relationship between that statement and trust. So if I
trust that Apple will only use this data for their purposes to make
Siri better, then that might be okay. But having Apple sell the data and
Apple benefitting financially or making it publicly available and
potentially compromising my behavior, that's where you lost trust," he
said.
Mark Thiele, executive vice president of ecosystem evangelism at data center provider SuperNAP, said trust is a core tenet of making the most of people's personal information for data mining and business intelligence.
"[Companies] need to build trust with their customer base over the
data they are custodians of, and they do that by leveraging data in
appropriate ways and not abusing it, and taking great care with how they
protect it. As soon as you violate that trust you are done. Look at the
data breaches and the results we've had," he said. Companies have to
figure out on their own how they become a good custodian of the data.
This is not something that can be rushed, either. "It has to happen
over time because trust has to play a significant role and the culture
has to change as well. More times than not the data is about individuals
and behaviors; we have to be comfortable sharing that data. Also we
have to have trust in those who are storing and leveraging that data. So
the company has a responsibility but so does the individual," said
Thiele.
More Instances?
Thiele thinks there will be more instances like Apple's challenge,
but it depends largely on the culture of the organization.
"It goes back to what the company stands for and what they hold valuable
and how they leverage it. Privacy is a core tenet for Apple. Companies
that follow along those lines will follow along what Apple does," he
said.
Buytendijk said Gartner has some stats to back this up: 59% of
respondents to Gartner's CIO Agenda Survey 2015 reported that they are
already experiencing digital ethical dilemmas, most prominently around
privacy and security.
"At the same time, from information surveys we have learned that
around 70% of people indicate that in their organizations there is no
logical moment or logical place to raise these digital ethical
dilemmas," he added.
The issue, however, is not new, nor is it related to the advent of
Big Data, where more data than ever is being collected. "Big
Data just shows that there are more sources of data, which means the
value of the data can only increase," said Crawford. "The more data
points you get, the more clarity you get on the problem you're trying to solve. I
don't think Big Data itself is a leading indicator or the reason for
this problem. Big Data and unstructured data is just another data
point."
But Buytendijk disagrees. "Even if you apply all kinds of masking,
Big Data has certainly complicated things," he said. "If you know
someone’s gender, age and zip code, this is enough already to
re-identify the vast majority of people. Big Data most often adds all
kinds of contextual information that makes it harder to be anonymous."
There is a class of technology, called dynamic data masking, that
replaces identifiable fields with meaningless but consistent codes. The
data can still be used for all types of analysis and will show the same
results; some fields are simply meaningless. When genuinely needed, the
masked fields can be mapped back to their original, meaningful values,
but only in those cases.
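To make that idea concrete, here is a minimal sketch in Python. It is an illustration only: the field names, the placeholder secret key and the in-memory reverse lookup are assumptions, not any vendor's actual implementation. Each identifiable value is replaced by a deterministic code, so joins and aggregates on the masked data still line up, while re-identification is possible only through a separately guarded mapping.

```python
import hmac
import hashlib

# Placeholder secret held by the data custodian; in practice this would
# live in a key-management system, never in source code.
SECRET_KEY = b"replace-with-a-managed-secret"

# Reverse mapping kept separately, under stricter access control, so masked
# values can be re-identified only when genuinely needed.
_reverse_lookup: dict[str, str] = {}


def mask(value: str) -> str:
    """Replace an identifiable value with a meaningless but consistent code."""
    code = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    _reverse_lookup[code] = value
    return code


def unmask(code: str) -> str:
    """Map a code back to the original value; only for authorized cases."""
    return _reverse_lookup[code]


# Two records for the same (hypothetical) user mask to the same code, so
# per-user aggregation still works without exposing the identity itself.
records = [
    {"user": "alice@example.com", "query": "weather"},
    {"user": "alice@example.com", "query": "traffic"},
    {"user": "bob@example.com", "query": "weather"},
]
masked = [{**r, "user": mask(r["user"])} for r in records]
assert masked[0]["user"] == masked[1]["user"]   # consistency preserved
assert masked[0]["user"] != records[0]["user"]  # identity hidden
```

Keeping the reverse mapping apart from the analytic data set is the point of the design: analysts work only with consistent codes, and only a tightly controlled process can turn a code back into a person.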
However, that's not an ideal solution. He cited Georgetown University Professor of Law Paul Ohm’s
maxim about privacy, which states "Every perfectly anonymous data set
is perfectly unusable." "The more personally identifiable information you
strip, the less opportunity the data gives to provide value for
individual customers or individuals," he said.
Regardless, Thiele thinks it will happen more often among companies
that view privacy as a core tenet, as Apple does. He believes Apple's
findings from Siri and Maps data may be the motivation for its strict
policies.
"The reality is when you understand how people are using data, how
you want to use it is irrelevant," said Thiele. "Once you start to
expose trends, you might start to expose data you'd rather not know. I'm
sure they have some analytics, like the most common words used with
Siri. That could be an indicator as to why they are taking such a hard
stance."