Updated Apr 08 2025

Data Classification: A Step-by-Step Guide

AUTHOR | Lindsay MacDonald

Data classification is about putting things in the right place based on how sensitive or important they are. Think of it like sorting your inbox: there’s spam, random newsletters, personal messages, and those critical project updates that require immediate attention.

In practical terms, this means creating a system where everyone in your organization understands what data they’re handling and how to treat it appropriately, with safeguards if someone accidentally tries to mishandle sensitive information. It’s the difference between knowing which documents can be shared in a public Slack channel versus which ones need encrypted storage and limited access.

Let’s walk through how to build this system step by step, using PostgreSQL examples to make it real and actionable.

Step 1: Figure Out What You’re Working With

Data classification is the process of organizing data into categories based on its sensitivity, importance, or how it should be handled. So before you start writing SQL or labeling columns, it’s important to understand what you’re working with. Ask yourself:

  • What kind of data are you storing? Is it basic info like names and emails, or serious secrets like credit card numbers, health records, and internal documents?
  • Are there any laws, regulations, or compliance frameworks you need to follow? Think GDPR if you handle personal data of people in the EU, HIPAA for health data, or SOC 2 if your security team keeps talking about it.
  • And most importantly—who really needs access to this data? Maybe it’s just your admin team, or maybe one super-paranoid person in IT who guards the database like a dragon guards gold.

To make this easier, you can organize your data into classification levels, such as:

  • Public – Totally safe to share. Think product names, blog posts, or anything already out in the open.
  • Internal Only – Meant for employees, but not a big deal if it leaks. Like meeting notes, drafts, or internal docs.
  • Confidential – A bit more sensitive. Stuff like employee records, customer contact info, or internal emails.
  • Restricted – Top-secret territory. Credit card numbers, medical records, or anything that could cause serious damage if it leaks.

These levels are the heart of any data classification system, helping you figure out what needs protection and what doesn’t. Once you’ve got a good grip on what’s in your database and what counts as sensitive, you’re ready to start digging into PostgreSQL.
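
If you want the database itself to know about these levels, one option is a PostgreSQL enum type. The sketch below is only an illustration: the type name classification_level and the lowercase labels are assumptions, and nothing in the later steps depends on them.

```sql
-- Hypothetical enum mirroring the four levels above,
-- declared from least to most sensitive.
CREATE TYPE classification_level AS ENUM (
    'public',
    'internal',
    'confidential',
    'restricted'
);

-- Enum values compare in declaration order, so "confidential or higher"
-- becomes a simple comparison. This returns true.
SELECT 'restricted'::classification_level >= 'confidential'::classification_level
       AS needs_protection;
```

Declaring the levels in order of sensitivity is a design choice: it lets later queries filter on "this level and above" without a lookup table.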

Step 2: Hunt Down the Sensitive Stuff

Now it’s time to play detective in your database. What sensitive data does it hide? Luckily, information_schema can give you a behind-the-scenes look at your tables and columns. You can use it to search for the usual suspects—columns with names like “ssn,” “email,” or “credit_card.”

Here’s a quick query to do just that:

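SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')  -- skip system schemas
  AND (column_name ILIKE '%ssn%'
    OR column_name ILIKE '%credit_card%'
    OR column_name ILIKE '%email%')
ORDER BY table_schema, table_name, column_name;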

Once you find those sensitive columns, don’t just make a mental note—go ahead and leave a sticky note for your future self with a COMMENT:

COMMENT ON COLUMN customers.ssn IS 'Sensitive: Personally Identifiable Information';
COMMENT ON COLUMN transactions.credit_card IS 'Sensitive: Payment Information';
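
Comments are only useful if you can read them back. Here is a minimal sketch that lists every column whose comment starts with the 'Sensitive:' prefix used above, relying on PostgreSQL's built-in col_description function. It assumes you stick to that prefix convention for all sensitive columns.

```sql
-- List columns whose COMMENT starts with 'Sensitive:'.
SELECT table_schema, table_name, column_name, note
FROM (
    SELECT c.table_schema,
           c.table_name,
           c.column_name,
           -- col_description(table_oid, column_number) returns the column comment
           col_description(
               format('%I.%I', c.table_schema, c.table_name)::regclass,
               c.ordinal_position::int
           ) AS note
    FROM information_schema.columns AS c
    WHERE c.table_schema NOT IN ('pg_catalog', 'information_schema')
) AS labeled
WHERE note LIKE 'Sensitive:%'
ORDER BY table_schema, table_name, column_name;
```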

Nice! With your data properly labeled like a pro, let’s keep it safe.

Step 3: Lock It Down

Just because something’s in the database doesn’t mean everyone should be able to see it. Now it’s time to set some ground rules. SQL lets you control who can see what with role-based access.

Say you’ve got an analyst who needs access to basic info but not sensitive data. You can give them a read-only role and let them peek at certain columns only:

CREATE ROLE analyst;

REVOKE ALL ON TABLE customers FROM PUBLIC;
REVOKE ALL ON TABLE customers FROM analyst;

GRANT SELECT (name, email) ON TABLE customers TO analyst;

Boom. They can see what they need and nothing more.
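
It is worth checking the grants rather than trusting them. A quick sketch using PostgreSQL's has_column_privilege function, assuming the customers table has the name, email, and ssn columns used in this guide:

```sql
-- Ask PostgreSQL what the analyst role can actually read.
SELECT has_column_privilege('analyst', 'customers', 'email', 'SELECT') AS can_read_email,
       has_column_privilege('analyst', 'customers', 'ssn',   'SELECT') AS can_read_ssn;
```

If the grants above are the only ones in effect for analyst, the first column should come back true and the second false. Anything else means the role inherits access from somewhere you did not expect.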

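Want even tighter security? Row-level security lets you control access at, yep, the row level. For example, with the policy below, users only see rows that belong to them: those whose customer_id matches the app.current_user setting your application sets for the session. Keep in mind that table owners and superusers bypass row-level security by default.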

ALTER TABLE customers ENABLE ROW LEVEL SECURITY;

CREATE POLICY customer_policy 
ON customers 
FOR SELECT 
USING (customer_id = current_setting('app.current_user')::INTEGER);
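
The policy reads a custom setting, so something has to set it. A minimal sketch of how an application might do that for one transaction, using the built-in set_config function. The customer id 42 is a made-up value, and the session is assumed to be connected as a role that is subject to the policy and allowed to read name and email.

```sql
BEGIN;

-- Third argument true = the value only lasts until the end of this transaction.
SELECT set_config('app.current_user', '42', true);

-- Through the policy, only rows with customer_id = 42 are visible here.
SELECT name, email FROM customers;

COMMIT;
```

If the setting has never been set, current_setting('app.current_user') raises an error. Passing true as a second argument, as in current_setting('app.current_user', true), returns NULL instead, which makes the policy match no rows.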

Now you’re not just organizing data—you’re securing it.

Step 4: Build a Searchable Classification Table

So far we’ve just been using comments to mark sensitive data, which works fine. But what if you want to run a report or search through all your classified data in one go?

Creating a searchable table of your data classification levels makes it way easier to manage, especially as your system grows. Let’s do that:

CREATE TABLE data_classification (
    table_name TEXT NOT NULL,
    column_name TEXT NOT NULL,
    classification TEXT NOT NULL,
    PRIMARY KEY (table_name, column_name)  -- one classification per column
);

INSERT INTO data_classification (table_name, column_name, classification) VALUES
('customers', 'ssn', 'Restricted'),
('transactions', 'credit_card', 'Restricted'),
('employees', 'salary', 'Confidential');

Now, if you want to pull up everything marked as Restricted, it’s just a single query away:

SELECT * FROM data_classification WHERE classification = 'Restricted';

Neat, right? Clean and simple.
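
The same table can also tell you what you have not classified yet. The sketch below compares it against information_schema.columns. It assumes your application tables live in the public schema, since data_classification as defined here does not record a schema name.

```sql
-- Columns that exist in the database but have no row in data_classification.
SELECT c.table_name, c.column_name, c.data_type
FROM information_schema.columns AS c
LEFT JOIN data_classification AS d
       ON d.table_name  = c.table_name::text
      AND d.column_name = c.column_name::text
WHERE c.table_schema = 'public'              -- assumption: app tables are in "public"
  AND c.table_name <> 'data_classification'  -- skip the catalog table itself
  AND d.column_name IS NULL                  -- no matching classification row
ORDER BY c.table_name, c.ordinal_position;
```

An empty result means every column has a label. Anything it returns is a to-do list.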

Step 5: Stop Problems Before They Happen

Even with all this organization, mistakes can happen. Someone might accidentally drop sensitive data in the wrong place. That’s where triggers come to the rescue—they’re like bouncers, checking every new entry and blocking anything sketchy.

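Let’s say SSNs are only allowed in the customers table. The example below assumes employees has an ssn column that must stay empty (the function reads NEW.ssn, so it only works on tables that have that column). You can create a trigger to enforce that rule: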

CREATE OR REPLACE FUNCTION prevent_ssn_in_wrong_table()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.ssn IS NOT NULL THEN
        RAISE EXCEPTION 'SSNs can only be stored in the customers table!';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ssn_check 
BEFORE INSERT OR UPDATE ON employees
FOR EACH ROW 
EXECUTE FUNCTION prevent_ssn_in_wrong_table();

Now if someone tries to sneak an SSN into the wrong table, it’s blocked on the spot!
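
Sensitive values also end up in free-text fields, where there is no ssn column to check. A rough sketch of the same trigger idea applied to text content follows. The support_tickets table, its notes column, and the function name are all hypothetical, and the pattern only catches the common 123-45-6789 layout.

```sql
-- Hypothetical table with a free-text column where an SSN could be pasted by mistake.
CREATE TABLE IF NOT EXISTS support_tickets (
    ticket_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    notes     TEXT
);

CREATE OR REPLACE FUNCTION block_ssn_like_notes()
RETURNS TRIGGER AS $$
BEGIN
    -- \m and \M are word boundaries; a NULL notes value never matches.
    IF NEW.notes ~ '\m\d{3}-\d{2}-\d{4}\M' THEN
        RAISE EXCEPTION 'notes appears to contain an SSN; SSNs belong in customers.ssn';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER support_tickets_ssn_check
BEFORE INSERT OR UPDATE ON support_tickets
FOR EACH ROW
EXECUTE FUNCTION block_ssn_like_notes();

-- This insert should be rejected by the trigger:
-- INSERT INTO support_tickets (notes) VALUES ('Customer SSN is 123-45-6789');
```

Pattern matching like this will miss unformatted numbers and can flag lookalikes, so treat it as a safety net rather than a guarantee.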

Step 6: Keep an Eye on Things with Data + AI Observability

Classifying your data isn’t a one-and-done job. Databases change. New tables pop up. Someone renames a column or accidentally changes permissions. It happens. That’s why data + AI observability is a game-changer.

With a tool like Monte Carlo’s data observability platform, you can keep tabs on your data without manually checking every little thing. It can alert you when:

  • Someone creates a new table with sensitive data and forgets to classify it.
  • Access rules are changed in ways they shouldn’t be.
  • Sensitive data shows up where it doesn’t belong.

Basically, it’s like having a smart assistant who never sleeps and is always watching your data (in a totally non-creepy way). If you’re serious about keeping your data classification tight and tidy, it’s worth checking out. You can even book a demo with just an email.
