It’s time to light the match and burn your data


If you spend time reading the latest quarterly results from almost any company, you will undoubtedly find discussion of how much it invests in data and how good it is at analyzing and using information. Silicon Valley is filled with companies dedicated to creating, consuming and analyzing huge amounts of data. We have been told that data is a currency, its value increasing as ever more complex, sophisticated technologies are applied to derive insight. However, if data is not only a currency but also a debt instrument, its intrinsic value can quickly turn negative.

The value of old data: A new calculus

The value of information is obvious: it is needed across nearly all functions of an organization, from small local businesses to the largest financial services and technology companies. But calculations of information risk remain inconsistent. Information security risks have been highlighted by commentators and by high-profile breaches and ransomware attacks.

Yet, even with these well-known risks, organizations often struggle to delete, well, anything. There are three primary reasons businesses have been reluctant to delete data: (1) its potential value or use at some point in the future, (2) legal or compliance concerns about spoliation or deleting the wrong information, and (3) an incomplete view of information across the organization.

The first issue is often the most difficult to resolve. Marketing, sales, development and product teams have an insatiable appetite for data to deliver results. The idea of deleting information that is barely used today but might provide unique insights in the future is terrifying. Meanwhile, ever more sophisticated analytics make it possible to draw subtle inferences from that data without significant incremental investment.

In contrast, legal and compliance concerns are generally becoming more manageable. For a long time, the risk of spoliation in legal proceedings, or of improper or accidental deletion of corporate records, far outweighed the benefit of deleting anything. Legal and compliance teams are battle-scarred from over a decade of litigation and regulatory enforcement actions in which data issues were at the forefront. But this experience also taught these teams that holding information carries its own risk, and they can see that the calculus of keeping versus deleting data is changing. In addition, early experience with global privacy requirements such as GDPR has further validated this view of risk.

The new calculus is based on a balance of variables and a multiplying factor that is associated with sensitive information. First, all parts of an organization need to accept that possession of information represents risk, in addition to value. Second, sensitive information that may provide high levels of insight carries equal levels of potential risk. Finally, enterprises need to establish effective means to dispose of information they do not need once its value and retention obligations have passed.

The big new variable: privacy

The insurance industry is not often viewed as a driving force behind change. It is highly regulated in most jurisdictions and has developed risk models based on a long history of claims and events. These dynamics have effectively forced the industry to adapt slowly to change, require significant retrospective data analysis and maintain long data retention periods. And yet, we may see the insurance industry now quietly leading the new charge.

Long before big data, machine learning and advanced analytics ever graced the latest technology journals, actuarial science in the insurance industry had blazed a trail. However, those analyses were largely backward-looking, using similar past events to predict future risk. In recent years, the insurance industry has adopted practices that create vast amounts of information, consumed in real time, to develop its models. In the process, the industry has created new risk, which it is still trying to fully comprehend.

For example, many insurance companies now offer potential savings on automotive insurance if allowed to monitor driving habits in real time. These applications capture tremendous amounts of information about a given individual, including trip duration, distance, acceleration, speed and other attributes. This allows the companies to build risk models and adjust coverage rates based on that analysis. At the same time, they are creating vast amounts of sensitive private information.

Insurance companies also now develop insurability scores and models based on an extraordinary aggregation of publicly and privately available data. This aggregated data comprises some of the most expansive views of an individual’s habits, practices and personal information. It is updated constantly by the insurers themselves, their providers and third-party suppliers, and it feeds any number of models, systems and automated processes.

All this data creates value in developing risk models and serving customers. But it also generates a tremendous amount of highly sensitive, private information.

Actuaries on the job

The National Association of Insurance Commissioners (NAIC) is an organization that few people have likely encountered. Insurance regulation is largely state-based in the U.S., and the NAIC creates standards and model rules to be adopted as practices by insurance companies or codified in statute or regulation. The NAIC has a history of model rules dealing with information security, records retention and privacy, focused on protecting information and organizations and on making data available to regulators. However, with new privacy statutes being adopted across many U.S. states, and with experience of the EU’s General Data Protection Regulation (GDPR), which governs the use of, access to and rights associated with information, the NAIC realized a more privacy-centric model was necessary.

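Through a working group, it sought to distill obligations and lessons from GDPR, along with the California Consumer Privacy Act (CCPA), the California Privacy Rights Act (CPRA) and Virginia’s Consumer Data Protection Act (CDPA), and to provide a common set of requirements that include: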

Right to opt out of data sharing
Right to limit data sharing unless the consumer opts in
Right to correct information
Right to delete information
Right to data portability
Right to restrict the use of data

These rights are not particularly unique, but the insurance industry was among the first to realize that the sheer scale of what may confront it from a privacy perspective could overwhelm existing technologies and practices. Nearly every person in developed markets worldwide is a customer of an insurance company. What happens if even a small fraction of them exercises one of the rights listed above? The resulting volume of requests would dwarf the preservation requests handled for litigation or regulatory purposes. And what about all the sensitive information that is long past its retention requirements but was never deleted?

Burning the undergrowth: establishing the value of your data

Enterprises need to establish practices and technologies that address the full range of privacy obligations in force in the EU and emerging in the U.S. Ridding your organization of information that has limited value or is beyond its retention period is a critical first step. Many organizations have struggled with routine data deletion; now they must prepare to delete on demand, potentially at the request of many of their customers.
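As a rough illustration of that routine-deletion step, the sketch below flags records for disposal only when three conditions from the discussion above all hold: the retention period has passed, the record is not preserved for litigation or a regulatory request, and no team still relies on it. The `Record` fields and the `disposal_candidates` function are hypothetical, not any particular product or schema; a real program would draw these values from a records schedule, a legal-hold system and data owners.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    # Hypothetical fields for illustration only.
    record_id: str
    created: date
    retention_days: int  # retention period taken from a records schedule (assumed)
    on_hold: bool        # preserved for litigation or a regulatory request (assumed)
    still_in_use: bool   # a business owner still relies on this data (assumed)


def disposal_candidates(records, today):
    """Return IDs of records that are past retention, not on hold and no longer used."""
    candidates = []
    for r in records:
        expired = today > r.created + timedelta(days=r.retention_days)
        if expired and not r.on_hold and not r.still_in_use:
            candidates.append(r.record_id)
    return candidates


if __name__ == "__main__":
    sample = [
        Record("a", date(2015, 1, 1), 7 * 365, on_hold=False, still_in_use=False),
        Record("b", date(2015, 1, 1), 7 * 365, on_hold=True, still_in_use=False),
        Record("c", date(2021, 1, 1), 7 * 365, on_hold=False, still_in_use=False),
        Record("d", date(2010, 1, 1), 365, on_hold=False, still_in_use=True),
    ]
    print(disposal_candidates(sample, date(2022, 6, 1)))  # prints ['a']
```

Checking the hold flag before anything else expires is the design point that addresses the spoliation concern: an expired record that is still preserved is never offered for deletion.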

Like undergrowth in a forest, information provides value up to a point. Beyond that point, it risks burning the whole forest if not managed or removed. Organizations should start by establishing the value of their information and clearly understanding what represents undergrowth and risk. Then they should light the match and burn what they should not have or no longer need.

George Tziahans is managing director at Breakwater Solutions.
