Data classification is a method for defining and categorizing files and other critical business information. It’s mainly used in large organizations that must build security systems to meet strict compliance guidelines, but it can be used in small environments as well. Its most important use is understanding the sensitivity of your stored information so that you can build the right cybersecurity tools, access controls, and monitoring around it.

Put another way, data classification is the process of sorting data assets by the sensitivity of the information they contain. By classifying data, organizations can determine two key things:

  • Who should be authorized to access it.
  • What protection policies to apply when storing and transferring it.

Classification can also help determine what regulatory standards apply when protecting the data. Overall, data classification helps organizations better manage their data for privacy, compliance and cybersecurity.

Reasons to Perform Data Classification

Every organization should classify the data it creates, manages and stores. But it’s even more critical for large enterprise environments. That’s because large enterprises have data assets spread across many locations, including the cloud.

Administrators must track and audit this information to ensure that it has the proper authentication and access controls. With data classification, administrators can identify which locations store sensitive data and determine how it should be accessed and shared.

Classification is an essential first step toward meeting almost any data compliance mandate. HIPAA, GDPR, FERPA and other regulations call for sensitive data to be identified so that security and authentication controls can limit access to it. Labeling data helps organize and secure it. The exercise also reduces needlessly duplicated data, cuts storage costs, improves performance and keeps data trackable as it's shared.

Types of Data Classification

Any stored data can be classified into categories. To classify your data, you must ask several questions as you discover and review it. Use the following sample questions as you review each section of your data:

  • What information do you store for customers, employees, and vendors?
  • What types of data does the organization create when generating a new record?
  • How sensitive is the data on a numeric scale (e.g., 1-10, with 1 being the most sensitive)?
  • Who must access this data to continue productive operations?

Using these questions, you can loosely define categories for your data, including:

  • High sensitivity: This data must be secured and monitored to protect it from threat actors. It often falls under compliance regulations as information that requires strict access controls that also minimize the number of users who can access the data.
  • Medium sensitivity: Files and data that cannot be disclosed to the public, but whose exposure in a data breach would not pose a significant risk, could be considered medium sensitivity. This data requires access controls like high-sensitivity data, but a wider range of users can access it.
  • Low sensitivity: This data is typically public information that doesn't require much security to protect it from a data breach.
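
The numeric scale and the three categories above can be tied together in a few lines. The sketch below is only an illustration: the function name and the cut-offs (1-3 high, 4-7 medium, 8-10 low) are assumptions, not part of any standard, and your own policy should set the real thresholds.

```python
def sensitivity_tier(score: int) -> str:
    """Map a 1-10 sensitivity score (1 = most sensitive) to a category.

    The thresholds are illustrative assumptions, not a standard.
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "high"
    if score <= 7:
        return "medium"
    return "low"


if __name__ == "__main__":
    for score in (2, 5, 9):
        print(score, sensitivity_tier(score))
```

Keeping the mapping in one function means the thresholds can change later without touching the code that applies labels.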

Methods of Data Classification

Data classification works closely with other technology to better protect and govern data. Should the organization suffer a data breach, data classification helps administrators identify what data was lost and can potentially help track down the cybercriminal.

Here are technologies that rely on data classification:

  • Identity and access management (IAM): IAM tools enable administrators to determine who and what can access data. Users with similar permissions can be grouped. Groups are given authorization levels and managed as a single unit. When a user leaves, removing that user from the group revokes every permission the group granted. This type of grouping and organization makes managing permissions across the network much easier.
  • Data encryption: Certain data assets must be encrypted at rest and in motion. “At-rest” data is data being stored, usually on a hard drive but possibly on any storage device. “Data in motion” refers to data as it is being transferred across a network. Encrypting data makes it unreadable if attackers intercept it.
  • Automation: Automation works together with monitoring tools to find, classify and label data for administrative review. Some tools integrate artificial intelligence (AI) and machine learning (ML) to automatically detect, label and classify data. The technologies can also help identify threats that could be used to steal it. With labeled data, administrators can use IAM to apply permissions and stop specific threats from gaining access to stored data.
  • Data forensics: After a data breach, data forensics collects and preserves evidence for further investigation. Forensics is the process of identifying what went wrong and who breached the network. Data forensics is usually a two-part process. Automation tools first collect data. Then a human analyst identifies anomalies and investigates.
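
To make the IAM grouping idea above concrete, here is a toy in-memory model. It is a sketch under stated assumptions: the class `GroupDirectory`, its methods, and the permission strings are all hypothetical and do not correspond to any real IAM product's API.

```python
from collections import defaultdict


class GroupDirectory:
    """Toy model of group-based permissions (hypothetical, not a real IAM API)."""

    def __init__(self) -> None:
        self.group_permissions = defaultdict(set)  # group -> permissions
        self.group_members = defaultdict(set)      # group -> users

    def grant(self, group: str, permission: str) -> None:
        """Give a permission to the group as a whole."""
        self.group_permissions[group].add(permission)

    def add_user(self, group: str, user: str) -> None:
        self.group_members[group].add(user)

    def remove_user(self, group: str, user: str) -> None:
        """Removing the user drops everything that group granted them."""
        self.group_members[group].discard(user)

    def permissions_for(self, user: str) -> set:
        """Union of the permissions of every group the user belongs to."""
        perms = set()
        for group, members in self.group_members.items():
            if user in members:
                perms |= self.group_permissions[group]
        return perms


if __name__ == "__main__":
    directory = GroupDirectory()
    directory.grant("finance", "read:invoices")
    directory.add_user("finance", "alice")
    print(directory.permissions_for("alice"))
    directory.remove_user("finance", "alice")
    print(directory.permissions_for("alice"))
```

Because permissions hang off the group rather than the individual, offboarding is a single membership change instead of a hunt for scattered per-user grants.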

Data Classification Levels

Once you have answered the review questions listed earlier, you can assign each data asset a classification level. Classification is typically broken into four levels:

Public Data

This data is available to the public either locally or over the internet. Public data requires little security, and its disclosure would not result in a compliance violation.

Internal-Only Data

Memos, intellectual property, and email messages are a few examples of data that should be restricted to internal employees.

Confidential Data

The difference between internal-only data and confidential data is that confidential data requires clearance to access it. You can assign clearance to specific employees or authorized third-party vendors.

Restricted Data

Restricted data usually refers to government information that only authorized individuals can access. Disclosure of restricted data may result in irreparable damage to corporate revenue and reputation.
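
One simple way to represent the four levels in software is an ordered enumeration. The sketch below assumes a strict hierarchy in which a higher clearance covers every lower level. That is a simplification made for illustration, since real policies often grant confidential or restricted access per person or per project. The names `Level` and `can_access` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def can_access(clearance: Level, data_level: Level) -> bool:
    """Assumed rule: a clearance covers its own level and every level below it."""
    return clearance >= data_level


if __name__ == "__main__":
    print(can_access(Level.INTERNAL, Level.PUBLIC))
    print(can_access(Level.INTERNAL, Level.CONFIDENTIAL))
```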

Data Classification Process

When you decide that it’s time to classify data to meet compliance standards, the first step is to implement procedures for locating data, classifying it, and determining the proper cybersecurity controls to protect it. The execution of each procedure depends on your organization's compliance standards and the infrastructure that best secures data. The general data classification steps are:

  • Perform a risk assessment: A risk assessment determines the sensitivity of data and identifies how an attacker could breach network defenses.
  • Develop classification policies and standards: A classification policy gives you a repeatable process for any data you generate in the future, making classification easier for staff members while minimizing mistakes.
  • Categorize data: With a risk assessment and policies in place, categorize your data based on its sensitivity, who should be able to access it, and any compliance penalties that would apply should it be disclosed publicly.
  • Find the storage location of your data: Before you can deploy the right cybersecurity defenses, you need to know where data is stored. Identifying data storage locations points to the type of cybersecurity necessary to protect data.
  • Identify and classify your data: With data located, you can now classify it. Third-party software can make it easier to classify data and track it.
  • Deploy controls: The controls you put in place should authenticate and authorize every user and resource that requests access to data. Access to data should be on a “need to know” basis, meaning users should only receive access if they need to see data to perform a job function.
  • Monitor access and data: Monitoring data is a requirement for compliance and the privacy of your data. Without monitoring, an attacker could have months to exfiltrate data from the network. The proper monitoring controls detect anomalies and reduce the time necessary to detect, mitigate, and eradicate a threat from the network.
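
As a rough illustration of the "identify and classify" step, the sketch below scans text for strings that look like payment card numbers and confirms them with the Luhn checksum, which card numbers are designed to satisfy. The function names, the regular expression, and the "high" and "unclassified" labels are assumptions made for this example. Commercial classification tools rely on far broader detection rules.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def classify_text(text: str) -> str:
    """Label text 'high' if it contains a plausible card number."""
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return "high"
    return "unclassified"


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number.
    print(classify_text("Card on file: 4111 1111 1111 1111"))
    print(classify_text("Order 1234 shipped on Tuesday"))
```

The checksum step matters because a bare digit pattern also matches order numbers and tracking codes. Anything the scan labels should still go to a human reviewer.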

Streamlining the Data Classification Process

You can streamline the data classification process. Some of it can even be automated. But it still requires elements of human review and manual procedures.

Automated systems can make suggestions about labeling and classifications. But a human review must still determine whether these labels are correct. Objectives and standards must be outlined and defined, which requires human reviewers and IT staff.

Automated tools flag digital assets for human review. The resulting list displays the objects (such as data around a given customer) and the rules (such as HIPAA or PCI-DSS) that apply to each one. Some automation tools can index objects. (Indexing is a process of sorting and organizing data to enable quick and efficient searching on the network.)
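
A review list of this kind could be modeled as follows. This is a minimal sketch: the `FlaggedAsset` fields, the asset names, and the `pending_review` helper are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class FlaggedAsset:
    """One object an automated tool has flagged (hypothetical structure)."""
    name: str
    suggested_label: str
    rules: list = field(default_factory=list)  # e.g. ["HIPAA"]
    approved: bool = False                     # set by a human reviewer


def pending_review(assets: list) -> list:
    """Return the assets whose suggested label no human has confirmed yet."""
    return [asset for asset in assets if not asset.approved]


if __name__ == "__main__":
    queue = [
        FlaggedAsset("customer_records", "high", ["HIPAA"]),
        FlaggedAsset("payments_export", "high", ["PCI-DSS"], approved=True),
    ]
    for asset in pending_review(queue):
        print(asset.name, asset.suggested_label, asset.rules)
```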

Regulations also shape the data classification process. The General Data Protection Regulation (GDPR) is an EU regulation that gives consumers the right to have their data deleted. Organizations must comply when they store data from consumers in the EU. Some data classification tools index objects so that they can be quickly removed when customers ask.
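
The indexing idea can be sketched as a lookup from each customer to every object that holds their data, so that a deletion request becomes a single query. The class `SubjectIndex`, the customer IDs, and the object paths are illustrative assumptions. A real system would also have to delete the underlying objects and record the request for audit purposes.

```python
from collections import defaultdict


class SubjectIndex:
    """Maps a customer ID to the stored objects containing their data."""

    def __init__(self) -> None:
        self._by_subject = defaultdict(set)

    def add(self, subject_id: str, object_path: str) -> None:
        self._by_subject[subject_id].add(object_path)

    def erase(self, subject_id: str) -> list:
        """Drop the customer from the index and return what must be deleted."""
        return sorted(self._by_subject.pop(subject_id, set()))


if __name__ == "__main__":
    index = SubjectIndex()
    index.add("cust-001", "crm/profile.json")
    index.add("cust-001", "billing/invoice-17.pdf")
    print(index.erase("cust-001"))
    print(index.erase("cust-001"))  # already erased, so nothing is left
```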

Data Classification Examples

One of the most challenging steps in classifying data is understanding the risks. Compliance standards govern most private, sensitive data, and organizations must follow whichever regulations apply to the different data stored in their files and databases. Data classification helps secure data and ensure compliance. It’s essential for following GDPR requirements. (Organizations must index EU consumer data so that it can be deleted on request, for instance.)

GDPR also mandates extra protection for special categories of personal information, such as customers’ racial or ethnic origin, political opinions and religious beliefs. To provide it, organizations must classify this data and set the right permissions across digital assets. Only then can they avoid disclosing private consumer information and suffering costly data breaches. Classification determines who can access this data so that it’s not misused.

Three steps for classifying data under GDPR are:

  • Locate and audit data. Before classification, administrators must identify where data is stored and the rules that affect it.
  • Create a classification policy. To stay compliant, create data classification standards and procedures that define how the organization stores and transfers sensitive data.
  • Organize and prioritize data. With prioritization, the organization can determine data classification and the permissions to access it.

Here are some examples of data sensitivity that could be categorized as high, medium, and low.

  • High sensitivity: Suppose that your company collects credit card numbers as a payment method from customers buying products. This data should have strict authorization controls, auditing to detect access requests, and encryption applied when data is stored and transmitted. A data breach would likely cause harm to both the customer and the organization, so it should be classified as highly sensitive with strict cybersecurity controls.
  • Medium sensitivity: For every third-party vendor, you have a signed contract that executes an agreement. Disclosing this data would not harm customers, but it is still sensitive information describing business details. These files could be considered medium sensitivity.
  • Low sensitivity: Data for public consumption could be considered low sensitivity. For example, marketing material published on your site would not need strict controls since it’s publicly available and created for a general audience.

Importance of Data Classification

The sensitivity level of your data dictates how you process and protect it. Even if you know data is important, you must assess the risks associated with it. The data classification process helps you discover potential threats and deploy the cybersecurity solutions most beneficial for your business.

By assigning sensitivity levels and categorizing data, you understand the access rules surrounding critical data. You can better monitor data for potential data breaches and, most importantly, remain compliant. Compliance guidelines help you determine the proper cybersecurity controls, but you need to perform a risk assessment and classify data first. In many cases, organizations need a third party to help with data classification so that cybersecurity controls can be deployed more efficiently.

Data Classification Best Practices

Following data classification best practices makes policy creation, and the classification process as a whole, much more efficient. Best practices define the steps to fully index and label digital assets so that none are overlooked or mismanaged.

Organizations should follow these best practices:

  • Carefully identify where all sensitive data, including intellectual property, is located across all storage locations.
  • Define data categories so that sensitive data can be labeled and the right permissions set. Categories should be granular—so that permissions can also be granular. Categories should also allow administrators to categorize data within groups.
  • Identify the most critical and sensitive data. Automation tools can then be used to tag it with the right classification and regulatory mandates.
  • Educate employees so that they understand how to handle sensitive data. Give them the tools they need to protect sensitive data and follow cybersecurity practices.
  • Review all regulatory standards so that rules are followed and penalties avoided.
  • Build policies that allow users to identify misclassified or unclassified data and fix the issue.
