A Practical Guide to Primary Keys, Foreign Keys, and Unique Keys for Data Analysts 

    If you’re diving into the world of SQL as a data analyst, you’ve likely come across terms like primary, foreign, and unique keys. At first glance, these concepts might seem like abstract rules for database design, but they play an important role in organizing and managing data effectively. Understanding how these keys work is a skill that helps you unlock cleaner datasets, avoid common pitfalls, write better queries, and maintain a robust relational database structure.

    This guide aims to demystify these database fundamentals. Whether you’re just starting out or brushing up on the basics, we’ll explain what these keys are, why they matter, and how to use them practically in your work.

    Quick Answer: Primary Keys vs Foreign Keys vs Unique Keys 

    Here’s a high-level overview of the differences between primary keys vs foreign keys vs unique keys:

    | Aspect | Primary Key | Foreign Key | Unique Key |
    |---|---|---|---|
    | Purpose | Uniquely identifies each row in a table | Establishes a relationship between tables | Ensures all values in a column are unique |
    | Uniqueness | Must be unique | Can have duplicate values | Must be unique |
    | Null Values | Cannot contain NULL values | Can contain NULL values | Can contain NULL values (SQL Server allows only one NULL per unique column; PostgreSQL and MySQL allow several) |
    | Relationship | None by itself, but can be referenced by foreign keys in other tables | References the primary key (or a unique key) in another table | None by itself, but can be referenced by foreign keys in other tables |
    | Number Allowed | Only one per table | Can have multiple per table | Can have multiple per table |
    | Index Creation | Creates a unique index automatically (clustered by default in SQL Server) | Depends on the database: SQL Server and PostgreSQL do not create one automatically, MySQL (InnoDB) does | Creates a unique index automatically (non-clustered by default in SQL Server) |
    | Example Use | id column in a users table | user_id in an orders table linking to users(id) | email in a users table (for uniqueness) |

    What Is a Primary Key?

    A primary key is a column or group of columns in a database table. It identifies each row in the table, prevents duplicate records, and makes every row unique. For example, in a customer database, the primary key might be a unique customer ID.

    Characteristics of a Primary Key

    • Unique Values: Every value in a primary key must be unique to identify each record.
    • No Nulls: A primary key cannot have null values since it must always reference a specific record.
    • Single or Composite: It can be a single column or a combination of multiple columns, called a composite key.
    SQL
    CREATE TABLE Category (
        Id INT PRIMARY KEY IDENTITY(1,1),
        CategoryName VARCHAR(255) NOT NULL
    );
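
    The composite case mentioned in the list above can be sketched the same way. The Enrollment table and its columns are hypothetical, invented only for illustration; the syntax is SQL Server (T-SQL), like the other examples in this guide.

    ```sql
    -- Hypothetical table: one row per student per course
    CREATE TABLE Enrollment (
        StudentId INT NOT NULL,
        CourseId INT NOT NULL,
        EnrolledOn DATE NOT NULL,
        -- The pair (StudentId, CourseId) is the primary key:
        -- each column may repeat on its own, but the combination may not
        CONSTRAINT PK_Enrollment PRIMARY KEY (StudentId, CourseId)
    );
    ```

    A student can appear in many rows and so can a course, but the same student cannot be enrolled in the same course twice.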

    What Is a Foreign Key?

    A foreign key is a column or group of columns in one table that references the primary key (or, less commonly, a unique key) in another table. This link keeps data consistent between the two tables and lets you relate information.

    Characteristics of a Foreign Key

    • Referential Integrity: Every non-null value in a foreign key must match a record in the linked table, keeping the relationship valid.
    • Duplicates Allowed: Foreign keys can have duplicate values, unlike primary keys.
    • Null Values: Foreign keys can include nulls if the relationship isn’t required.
    SQL
    CREATE TABLE Product (
        Id INT IDENTITY(1,1) PRIMARY KEY,
        ProductName NVARCHAR(50) NOT NULL,
        SupplierId INT NOT NULL,
        UnitPrice DECIMAL(12,2) NULL DEFAULT 0,
        Package NVARCHAR(30) NULL,
        IsDiscontinued BIT NOT NULL DEFAULT 0,
        CategoryId INT NOT NULL,
        CONSTRAINT FK_Product_Category FOREIGN KEY (CategoryId) REFERENCES Category(Id)
    );
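
    To see referential integrity at work, the sketch below assumes the Category and Product tables above exist and that Category has no row with Id 999 (an arbitrary value chosen for illustration). The product names and the SupplierId value are made up as well; SupplierId has no foreign key in the definition above, so any integer is accepted there.

    ```sql
    -- Expected to fail: no Category row has Id = 999,
    -- so the FK_Product_Category constraint rejects the insert
    INSERT INTO Product (ProductName, SupplierId, CategoryId)
    VALUES (N'Orphan Product', 1, 999);

    -- Create the parent row first and capture its generated Id
    DECLARE @CategoryId INT;
    INSERT INTO Category (CategoryName) VALUES ('Electronics');
    SET @CategoryId = SCOPE_IDENTITY();

    -- Succeeds: the referenced category now exists
    INSERT INTO Product (ProductName, SupplierId, CategoryId)
    VALUES (N'Laptop', 1, @CategoryId);
    ```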

    What Is a Unique Key?

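    A unique key ensures that the values in a column or group of columns are unique across a table. Unlike a primary key, it allows null values. How many it allows depends on the database: SQL Server treats NULL like any other value here and accepts only one per unique column, whereas PostgreSQL and MySQL do not treat nulls as duplicates and accept several.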

    Characteristics of a Unique Key

    • Prevents Duplicates: No two rows can have the same value in the unique key column(s).
    • Supports Nulls: Null values are allowed unless specifically restricted.
    • Supports Multiple Unique Keys: A table can have multiple unique keys, enabling distinct constraints on various columns or groups of columns.
    SQL
    CREATE TABLE Customer (
        Id INT IDENTITY(1,1) PRIMARY KEY,
        FirstName VARCHAR(40) NOT NULL,
        LastName VARCHAR(40) NOT NULL,
        CustAddress VARCHAR(40) NULL,
        Phone VARCHAR(20) UNIQUE
    );
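
    A quick way to check the constraint is to try to break it. This is a minimal sketch against the Customer table above; the names and phone number are invented sample data.

    ```sql
    -- Succeeds: the phone number is not yet in the table
    INSERT INTO Customer (FirstName, LastName, Phone)
    VALUES ('Ana', 'Lopez', '555-0100');

    -- Expected to fail: '555-0100' already belongs to another row,
    -- so the UNIQUE constraint on Phone rejects the insert
    INSERT INTO Customer (FirstName, LastName, Phone)
    VALUES ('Ben', 'Kim', '555-0100');
    ```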

    The Difference between Primary Keys vs Foreign Keys vs Unique Keys

    A primary key identifies each record in a table, while a foreign key connects records between two tables by referencing the primary key of another table. For example, in a “Categories” table, each category might have a unique “CategoryID,” such as “1” for Electronics or “2” for Clothing. In a “Products” table, the “CategoryID” would act as a foreign key to show each product’s category.

    Foreign keys don’t need to be unique because they’re used to create links between tables. This means multiple records in one table can refer to the same primary key in another. For example, a “Products” table might list a laptop, a smartphone, and a tablet, all with the same “CategoryID” of “1,” connecting them to the Electronics category. This setup allows databases to organize and connect related data while handling one-to-many relationships effectively.
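
    That one-to-many relationship is what makes summary queries like the following possible. It is a minimal sketch that assumes the Category and Product tables defined earlier in this guide (the paragraphs above call them Categories and Products).

    ```sql
    -- Count how many products point at each category
    SELECT
        c.Id AS CategoryId,
        c.CategoryName,
        COUNT(p.Id) AS ProductCount  -- counts only rows where a product matched
    FROM
        Category c
    LEFT JOIN
        Product p ON p.CategoryId = c.Id  -- LEFT JOIN keeps categories with no products
    GROUP BY
        c.Id, c.CategoryName;
    ```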

    A unique key is similar to a primary key because both are used to ensure values in a column are unique. However, a primary key does not allow null values, while a unique key can include nulls. This means a unique key is useful for optional fields as long as duplicate values are avoided.

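    A table can have only one primary key, but it can have multiple unique keys. For example, in the Customer table, ID is the primary key, uniquely identifying each customer. However, the table also has two unique keys: Phone and Email. These fields store unique contact information and can be left null if not provided. Keep in mind that in SQL Server a plain UNIQUE constraint accepts only one row with a NULL in each of these columns, so if many customers may lack a phone number or email, a filtered unique index (for example, WHERE Phone IS NOT NULL) is the safer choice.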

    SQL
    CREATE TABLE Customer (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        Phone VARCHAR(20) UNIQUE,
        Email VARCHAR(30) UNIQUE,
        FirstName VARCHAR(40) NOT NULL,
        LastName VARCHAR(40) NOT NULL,
        CustAddress VARCHAR(40) NULL
    );

    Best Practices for Designing Keys: A Simple Guide

    When building a database, planning your keys carefully is essential for keeping your data accurate, consistent, and easy to manage. This guide uses clear examples and straightforward language to explain the best practices for working with primary, foreign, and unique keys.

    Primary Key Best Practices

    A good primary key is stable, simple, and doesn’t change over time. The following primary key best practices will help you design a database that performs optimally over time:

    Opt for Surrogate Keys Over Natural Keys

    A surrogate key is a system-generated identifier, like an auto-incrementing number, that is compact, stable, and separate from business logic. A natural key, by contrast, is a meaningful attribute, such as an order number or email address, that happens to identify records uniquely. Natural keys are more prone to change, and a changed key value has to be propagated to every related table.

    SQL
    CREATE TABLE "Order" (
        Id INT IDENTITY(1,1) PRIMARY KEY,
        OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
        OrderNumber NVARCHAR(10) NULL,
        CustomerId INT NOT NULL,
        TotalAmount DECIMAL(12,2) NULL DEFAULT 0
    );

    For example, in the Order table, the Id column is a surrogate key that auto-increments for every new order. This approach is better than using a field like OrderNumber as the primary key because OrderNumber might need to be updated if business rules change. Using Id as the primary key ensures that the database remains efficient and avoids the risk of cascading updates.

    In related tables, such as an OrderItems table, the Id from the Order table can be used as a foreign key to establish relationships. This setup keeps the database structure stable and simplifies management while allowing flexibility to store additional details, such as the OrderNumber, in separate columns.
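
    As a rough sketch of that relationship, an OrderItem table might look like the following. The column list is an assumption made for illustration (this guide never defines the table in full), and it presumes the Order and Product tables shown earlier.

    ```sql
    CREATE TABLE OrderItem (
        Id INT IDENTITY(1,1) PRIMARY KEY,
        -- Child rows point at the surrogate key, not at OrderNumber
        OrderId INT NOT NULL REFERENCES "Order"(Id),
        ProductId INT NOT NULL REFERENCES Product(Id),
        Quantity INT NOT NULL DEFAULT 1,
        UnitPrice DECIMAL(12,2) NOT NULL
    );
    ```

    Because the foreign key stores only the Id, renumbering or reformatting OrderNumber later changes a single column in the Order table and nothing in OrderItem.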

    Leverage Auto-Increment and Global Identifiers Appropriately

    Auto-incrementing integers work well in single-node systems where all data is managed in one place. They create simple, sequential numbers (e.g., 1, 2, 3) for each new record, which makes them fast and easy to use without any risk of duplication. For example, a small bakery assigning order numbers 1 for Alice, 2 for Bob, and 3 for Charlie can rely on auto-incrementing IDs because everything is processed in the same system.

    In distributed systems, where data is created on multiple servers or nodes, GUIDs or UUIDs are a better choice. These long, randomly generated codes (e.g., 4f6ac0d1-45e2-4aaf-b3f9-df00c7791fda) prevent duplicate IDs, even when systems operate independently. For instance, if the bakery expands to multiple branches in different cities, each branch might generate the same order numbers. Using GUIDs solves this by giving every order a unique identifier, no matter where it’s created.

    For systems on a single server, auto-incrementing integers are simple and efficient. For systems spread across multiple locations or servers, GUIDs or UUIDs are the better option to keep IDs unique and avoid conflicts.
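
    In SQL Server, a GUID key can be sketched as below. The BranchOrder table, its columns, and the sample values are hypothetical; NEWID() is the built-in function that generates a random GUID.

    ```sql
    CREATE TABLE BranchOrder (
        Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWID(),
        BranchCode VARCHAR(10) NOT NULL,
        OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
        TotalAmount DECIMAL(12,2) NULL DEFAULT 0
    );

    -- No Id is supplied: each server generates its own value,
    -- so branches never need to coordinate numbering
    INSERT INTO BranchOrder (BranchCode, TotalAmount) VALUES ('NYC', 42.50);
    ```

    The trade-off is size and ordering: a GUID takes 16 bytes against 4 for an INT, and random values arrive in no particular order, which can fragment a clustered index. SQL Server also offers NEWSEQUENTIALID() as a column default for cases where that matters.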

    Avoid Changing Primary Keys

    Once set, primary keys should remain constant. Changing them can create cascading updates across related tables and compromise database integrity. For instance, if a school database uses RollNo-001 as the primary key and later changes it to include department codes (e.g., CS-001), this requires updates in all related tables like Grades and Attendance, increasing the risk of errors. A better approach is to use an immutable primary key, such as an auto-incremented StudentID, while storing roll numbers in a separate column.
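
    The school example could be laid out roughly as follows. The table and column definitions are assumptions made for illustration; only the StudentID and roll number idea comes from the paragraph above.

    ```sql
    CREATE TABLE Student (
        StudentId INT IDENTITY(1,1) PRIMARY KEY,  -- immutable surrogate key
        RollNo VARCHAR(20) NOT NULL UNIQUE,       -- business identifier, free to change
        FullName VARCHAR(80) NOT NULL
    );

    CREATE TABLE Grade (
        GradeId INT IDENTITY(1,1) PRIMARY KEY,
        StudentId INT NOT NULL REFERENCES Student(StudentId),
        Subject VARCHAR(40) NOT NULL,
        Score DECIMAL(5,2) NOT NULL
    );

    -- Renumbering touches one row in Student; Grade rows are unaffected
    UPDATE Student SET RollNo = 'CS-001' WHERE RollNo = 'RollNo-001';
    ```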

    Foreign Key Best Practices

    Using foreign keys correctly is crucial to maintaining data integrity, improving performance, and avoiding unnecessary complications in your database.

    Add Indexes for Faster Joins

    Indexes on foreign key columns make queries faster and more efficient, especially when joining tables. Foreign keys connect related tables, but joins can slow down as data grows if the database has to scan entire tables to match rows. Some databases, including SQL Server, do not index foreign key columns automatically, so the index has to be created explicitly. Indexing foreign key columns helps the database find matching rows quickly, reducing query time.

    For instance, idx_orders_customerid makes it faster to retrieve or filter orders in the Order table based on customer data from the Customer table. Similarly, idx_Product_SupplierId improves the performance of queries that connect the Product table to the Supplier table, making it easier to get supplier information for products. These indexes are especially useful for maintaining efficiency as the database grows.

    SQL
    -- Create indexes on foreign keys
    CREATE INDEX idx_orders_customerid ON "Order" (CustomerId);
    CREATE INDEX idx_Product_SupplierId ON Product (SupplierId);

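    The following query joins the Order and Customer tables using CustomerId and filters orders with TotalAmount > 1000. An index on CustomerId can speed up row matching in the join. The TotalAmount filter is not covered by that index and would need an index of its own to benefit.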

    SQL
    SELECT 
        o.ID AS OrderId,
        c.FirstName,
        c.LastName,
        o.OrderDate,
        o.TotalAmount
    FROM 
        "Order" o
    INNER JOIN 
        Customer c ON o.CustomerID = c.ID
    WHERE 
        o.TotalAmount > 1000;

    The next query joins the Product and Supplier tables using SupplierId and filters active products (IsDiscontinued = 0). An index on SupplierId can speed up the join. It does not help the IsDiscontinued filter, which would need its own index.

    SQL
    SELECT 
        p.ID AS ProductId,
        p.ProductName,
        p.UnitPrice,
        s.CompanyName AS SupplierName,
        s.ContactName
    FROM 
        Product p
    INNER JOIN 
        Supplier s ON p.SupplierId = s.Id
    WHERE 
        p.IsDiscontinued = 0;

    Use Cascading Actions with Caution

    Cascading actions, such as updates or deletes, automatically apply changes to related records. While helpful, they can lead to unintended data loss. For instance, deleting a class might also delete its enrollments, which makes sense. However, applying cascading deletes to a teachers table might remove all classes tied to a teacher, which may not be intended. To use cascading actions effectively:

    • Apply cascading deletes only when child data is irrelevant without the parent.
    • Avoid cascading updates by using stable primary keys.
    • Always test these rules in a controlled environment to prevent unexpected issues.
    SQL
    CREATE TABLE "Order" (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
        OrderNumber NVARCHAR(10) NULL,
        CustomerID INT NOT NULL,
        TotalAmount DECIMAL(12,2) NULL DEFAULT 0,
        CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerId)
            REFERENCES Customer(Id)
            ON DELETE CASCADE
    );
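
    Before relying on a cascading delete like this one, it can help to see how many child rows would go with the parent. This is a minimal sketch that assumes the Order and Customer tables above; the customer Id of 1 is an arbitrary example value.

    ```sql
    DECLARE @CustomerId INT = 1;  -- example value, not from real data

    -- Orders that ON DELETE CASCADE would remove along with this customer
    SELECT
        COUNT(*) AS OrdersToBeDeleted,
        SUM(TotalAmount) AS TotalAmountAffected
    FROM
        "Order"
    WHERE
        CustomerId = @CustomerId;
    ```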

    Unique Key Best Practice

    Composite unique keys are helpful when you must ensure a combination of values across multiple columns is unique. For example, in an OrderItem table, you might want to prevent duplicate entries for the same product in the same order. You can achieve this by adding a composite unique key on OrderId and ProductId.

    Here’s how it works:

    SQL
    ALTER TABLE OrderItem
    ADD CONSTRAINT UQ_OrderItem_OrderId_ProductId
    UNIQUE (OrderId, ProductId);

    This rule ensures that the combination of OrderId and ProductId is unique. It allows the same product to appear in different orders but prevents duplicate entries in the same order. If you try to insert a duplicate combination, the database will reject it. 

    Using composite unique keys keeps your data accurate and handles this kind of validation directly in the database, so you don’t need extra checks in your application code. It’s an efficient way to align your database with real-world requirements.
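
    One practical caveat: the database will refuse to add a unique constraint if the table already contains duplicate combinations. A query along these lines, assuming the OrderItem table has the OrderId and ProductId columns described above, lists any offending pairs so they can be cleaned up first.

    ```sql
    SELECT
        OrderId,
        ProductId,
        COUNT(*) AS Occurrences
    FROM
        OrderItem
    GROUP BY
        OrderId, ProductId
    HAVING
        COUNT(*) > 1;  -- keep only combinations that appear more than once
    ```

    If the query returns no rows, the ALTER TABLE statement above should succeed.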

    Common Challenges and Solutions: A Detailed Guide

    Relational databases often present challenges with keys, particularly as they grow and change over time. Below, we’ll look at some of the most common issues and practical ways to solve them effectively.

    Handling Nulls in Unique Keys

    When working with unique keys, handling null values depends on your business rules and on the database engine. In SQL Server, which the examples in this guide use, you can restrict a column to a single null, allow multiple nulls, or avoid nulls entirely by using a default value:

    • Restrict to a Single Null: In SQL Server, a plain unique constraint treats NULL like any other value, so the column accepts at most one NULL. (PostgreSQL and MySQL, by contrast, accept multiple NULLs under a unique constraint.) This approach fits cases where your logic permits only a single instance of missing data.
    SQL
    CREATE TABLE Users (
        UserId INT IDENTITY(1,1) PRIMARY KEY,
        Email NVARCHAR(255) UNIQUE, -- Unique values; SQL Server accepts only one NULL
        FirstName NVARCHAR(50) NOT NULL,
        LastName NVARCHAR(50) NOT NULL
    );
    • Allow Multiple Nulls: In place of the unique constraint, create a filtered unique index that enforces uniqueness only for non-null values. This is useful when nulls represent missing information that doesn’t require further restrictions.
    SQL
    CREATE UNIQUE INDEX idx_users_email
    ON Users (Email)
    WHERE Email IS NOT NULL;
    • Avoid Nulls: Make the column NOT NULL and set a default value (e.g., “N/A”) for missing data. A placeholder repeated across rows would itself violate a unique constraint, so enforce uniqueness with a filtered index that skips the placeholder. This ensures all records have a value while still maintaining uniqueness for actual entries.
    SQL
    CREATE TABLE Users (
        UserId INT IDENTITY(1,1) PRIMARY KEY,
        Email NVARCHAR(255) NOT NULL DEFAULT N'N/A', -- Default value for missing emails
        FirstName NVARCHAR(50) NOT NULL,
        LastName NVARCHAR(50) NOT NULL
    );
    -- Unique for real addresses only, so the 'N/A' placeholder can repeat
    CREATE UNIQUE INDEX idx_users_email_real
    ON Users (Email)
    WHERE Email <> N'N/A';

    Each method provides a different way to manage nulls while respecting the uniqueness rule, allowing you to align database behavior with specific business needs.
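
    Before choosing among these options, it is worth profiling the column. The sketch below assumes a Users table with an Email column like the ones above.

    ```sql
    SELECT
        COUNT(*) AS TotalRows,
        COUNT(Email) AS NonNullEmails,           -- COUNT(column) skips NULLs
        COUNT(*) - COUNT(Email) AS NullEmails,
        COUNT(DISTINCT Email) AS DistinctEmails  -- DISTINCT also ignores NULLs
    FROM
        Users;
    ```

    If NonNullEmails is larger than DistinctEmails, the column already holds duplicate values, and any unique constraint or index on it will fail until they are resolved.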

    Cascading Actions with Foreign Keys

    Cascading actions like ON DELETE CASCADE or ON DELETE SET NULL help maintain relationships between tables but can sometimes lead to problems, such as unexpected data loss or orphaned records. You can handle these issues using table-specific cascading actions or centralized management through stored procedures.

    Centralized Cascading Management Using Stored Procedures

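    This method uses a stored procedure to add cascading constraints for several tables and relationships in one place. It’s helpful when you need consistent rules across environments like development and production. The procedure below assumes the constraints don’t already exist; if a table was created with a foreign key of the same name, as in some of the earlier examples, drop that constraint first or the ALTER TABLE statement will fail.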

    Benefits
    • Simplifies management by centralizing cascading rules.
    • Reduces repetitive setup for cascading constraints.
    • Keeps behavior consistent across all related tables.
    SQL
    CREATE PROCEDURE AddCascadeConstraints
    AS
    BEGIN
        BEGIN TRY
            -- Add cascade constraint between "Order" and Customer
            ALTER TABLE "Order"
            ADD CONSTRAINT FK_Order_Customer
            FOREIGN KEY (CustomerId) REFERENCES Customer(Id)
            ON DELETE CASCADE
            ON UPDATE CASCADE;
    
            -- Add cascade constraint between OrderItem and "Order"
            ALTER TABLE OrderItem
            ADD CONSTRAINT FK_OrderItem_Order
            FOREIGN KEY (OrderId) REFERENCES "Order"(Id)
            ON DELETE CASCADE
            ON UPDATE CASCADE;
    
            -- Add cascade constraint between Product and Category
            ALTER TABLE Product
            ADD CONSTRAINT FK_Product_Category
            FOREIGN KEY (CategoryId) REFERENCES Category(Id)
            ON DELETE CASCADE
            ON UPDATE CASCADE;
    
            PRINT 'Foreign key constraints with cascading actions added successfully.';
        END TRY
        BEGIN CATCH
            PRINT 'An error occurred while adding constraints.';
            PRINT ERROR_MESSAGE();
        END CATCH;
    END;
    -- GO ends the batch so the EXEC below is not compiled into the procedure body
    GO

    EXEC AddCascadeConstraints;
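
    After running the procedure, it is sensible to confirm which rules are actually in place. In SQL Server, the sys.foreign_keys catalog view exposes them; the query below is a minimal sketch, and the column aliases are just illustrative names.

    ```sql
    SELECT
        fk.name AS ConstraintName,
        OBJECT_NAME(fk.parent_object_id) AS ChildTable,
        OBJECT_NAME(fk.referenced_object_id) AS ParentTable,
        fk.delete_referential_action_desc AS OnDelete,  -- e.g. CASCADE, SET_NULL, NO_ACTION
        fk.update_referential_action_desc AS OnUpdate
    FROM
        sys.foreign_keys fk
    ORDER BY
        ChildTable, ConstraintName;
    ```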

    Granular Table-Specific Cascading Actions

    This approach allows you to control how child records behave when a parent record is deleted or updated. For example, SET NULL can preserve child records by clearing the foreign key reference without removing the rows. The following steps walk through an example:

    • Create Tables with Cascading Actions
    SQL
    CREATE TABLE Customer (
        Id INT PRIMARY KEY IDENTITY(1,1),
        FirstName VARCHAR(50),
        LastName VARCHAR(50)
    );
    CREATE TABLE Orders (
        Id INT PRIMARY KEY IDENTITY(1,1),
        OrderDate DATETIME NOT NULL,
        CustomerId INT,
        TotalAmount DECIMAL(12,2),
        CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerId)
            REFERENCES Customer(Id)
            ON DELETE SET NULL
    );
    • Insert Data
    SQL
    INSERT INTO Customer (FirstName, LastName) VALUES ('John', 'Doe');
    INSERT INTO Customer (FirstName, LastName) VALUES ('Jane', 'Smith');
    
    INSERT INTO Orders (OrderDate, CustomerId, TotalAmount) VALUES ('2025-01-01', 1, 100.00);
    INSERT INTO Orders (OrderDate, CustomerId, TotalAmount) VALUES ('2025-01-02', 2, 150.00);
    • Delete Parent Row
    SQL
    DELETE FROM Customer WHERE Id = 1;

    After deleting a customer with Id = 1, the CustomerId column in the Orders table is set to NULL for related orders. This keeps the order data while removing the connection to the deleted customer.
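
    To confirm the effect, a query like this one lists the orders that no longer point at any customer. It assumes the Customer and Orders tables and the sample rows from the steps above.

    ```sql
    -- Orders whose customer reference was cleared by ON DELETE SET NULL
    SELECT
        Id,
        OrderDate,
        CustomerId,
        TotalAmount
    FROM
        Orders
    WHERE
        CustomerId IS NULL;
    ```

    Run against freshly created tables, this should return only the order that belonged to the deleted customer, while the other customer’s order keeps its CustomerId.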

    Scalability with Composite Keys

    Managing relationships and keeping data unique in a growing database can become difficult when composite keys are used. These keys rely on multiple columns, making queries harder to write and reducing performance as the database grows. The following steps show how to simplify your database design and handle scalability using surrogate keys and clear row structures.

    Replace Composite Keys with Surrogate Keys

    Composite keys, like combining category_name and subcategory_name, often make queries and indexing more complicated than necessary. Using a surrogate key, such as an auto-incrementing identifier, simplifies the database structure and makes it easier to work with.

    SQL
    CREATE TABLE Category (
        Id INT PRIMARY KEY IDENTITY(1,1),
        CategoryName VARCHAR(255) NOT NULL
    );

    The Id column is a unique, auto-generated key for each category. Using a single-column key makes queries and joins faster and simpler to manage. This design avoids the complications of multi-column keys and makes the database easier to grow over time. It also reduces query complexity when working with related tables like Product or Orders, as relationships are managed through a single key instead of a combination of columns.
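
    One thing a surrogate key does not do on its own is stop the same natural combination from being entered twice. The two can be combined, as in the sketch below. The Subcategory table is hypothetical and assumes the Category table above.

    ```sql
    CREATE TABLE Subcategory (
        Id INT IDENTITY(1,1) PRIMARY KEY,                 -- surrogate key used in joins
        CategoryId INT NOT NULL REFERENCES Category(Id),
        SubcategoryName VARCHAR(255) NOT NULL,
        -- Keeps the natural combination unique without making it the primary key
        CONSTRAINT UQ_Subcategory_Category_Name UNIQUE (CategoryId, SubcategoryName)
    );
    ```

    Related tables still join on the single Id column, while the unique constraint preserves the rule that the old composite key used to enforce.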

    Use Separate Rows for Simplicity

    Breaking data into individual rows with unique keys eliminates the need for composite keys while keeping relationships easy to follow.

    SQL
    INSERT INTO Category (CategoryName) VALUES ('Electronics');
    INSERT INTO Category (CategoryName) VALUES ('Clothing');
    INSERT INTO Category (CategoryName) VALUES ('Books');
    INSERT INTO Category (CategoryName) VALUES ('Furniture');
    INSERT INTO Category (CategoryName) VALUES ('Groceries');
    INSERT INTO Category (CategoryName) VALUES ('Toys');
    INSERT INTO Category (CategoryName) VALUES ('Beauty');
    INSERT INTO Category (CategoryName) VALUES ('Sports');
    INSERT INTO Category (CategoryName) VALUES ('Automotive');
    INSERT INTO Category (CategoryName) VALUES ('Health');

    Each category is stored in its own row with a unique Id assigned automatically. This approach removes the need for multi-column constraints and keeps data clear and organized. Adding or changing a category is straightforward because each is tied to a single row. This makes the database easier to maintain and less prone to errors as it scales.

    Conclusion

    Learning the differences between primary, foreign, and unique keys is essential for creating a functional and efficient database. Each key has a specific role: primary keys uniquely identify records, foreign keys connect tables, and unique keys prevent duplicate values. Knowing how and when to use them helps keep your database organized and reliable.

    Following best practices for these keys is just as important. Techniques like replacing composite keys with surrogate keys, indexing foreign keys, and managing null values in unique keys make your database easier to work with and improve its performance. These steps reduce complexity, prevent errors, and support scalability as your data grows.

    Optimizing your database design with these keys helps it handle large datasets efficiently and stay manageable over time. A well-structured database performs better, supports your current needs, and is ready for future growth.