MySQL DISTINCT


The DISTINCT keyword in MySQL removes duplicate rows from the result set of a query. Two rows count as duplicates when they match in every column of the select list, so each combination of selected values is returned exactly once.

Syntax

SELECT DISTINCT column1, column2, ... FROM table_name;
  • column1, column2, ...: The columns to return. DISTINCT applies to the entire select list, not just the first column: when several columns are listed, rows are deduplicated on the combination of their values.

Key Concepts

  1. Selecting Unique Rows

    To retrieve unique rows based on one or more columns:

    SELECT DISTINCT department FROM employees;

    This query returns a list of unique departments from the employees table, eliminating any duplicate department names.
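
    As a minimal sketch of the effect, the statements below build a small temporary table and run the same kind of query against it. The table name employees_demo and its rows are hypothetical and exist only for illustration.

    ```sql
    -- Hypothetical sample data; the table name and rows are illustrative only.
    CREATE TEMPORARY TABLE employees_demo (
      id         INT PRIMARY KEY,
      department VARCHAR(50)
    );

    INSERT INTO employees_demo (id, department) VALUES
      (1, 'Sales'),
      (2, 'Sales'),
      (3, 'Engineering'),
      (4, 'Engineering'),
      (5, 'Support');

    -- Five rows go in, three distinct department names come out.
    -- ORDER BY is added only to make the output order predictable.
    SELECT DISTINCT department
    FROM employees_demo
    ORDER BY department;
    -- Engineering
    -- Sales
    -- Support
    ```

    Without ORDER BY, the order of the distinct rows is not guaranteed.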

  2. Multiple Columns

    To get unique combinations of values across multiple columns:

    SELECT DISTINCT department, job_title FROM employees;

    This query returns unique pairs of department and job_title. A department or a job title can still appear in more than one row, as long as the pair as a whole is different.
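
    The following sketch shows deduplication on a pair of columns. The table staff_roles_demo and its contents are hypothetical and only serve the example.

    ```sql
    -- Hypothetical sample data; the table name and rows are illustrative only.
    CREATE TEMPORARY TABLE staff_roles_demo (
      id         INT PRIMARY KEY,
      department VARCHAR(50),
      job_title  VARCHAR(50)
    );

    INSERT INTO staff_roles_demo (id, department, job_title) VALUES
      (1, 'Sales',       'Manager'),
      (2, 'Sales',       'Rep'),
      (3, 'Sales',       'Rep'),        -- duplicate pair, removed by DISTINCT
      (4, 'Engineering', 'Manager'),
      (5, 'Engineering', 'Developer');

    -- Rows are compared on (department, job_title) together.
    SELECT DISTINCT department, job_title
    FROM staff_roles_demo
    ORDER BY department, job_title;
    -- Engineering | Developer
    -- Engineering | Manager
    -- Sales       | Manager
    -- Sales       | Rep
    ```

    'Manager' is listed twice because it is paired with two different departments; only the repeated ('Sales', 'Rep') pair is collapsed.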

  3. Using DISTINCT with Aggregate Functions

    You can use DISTINCT inside aggregate functions to count unique values:

    SELECT COUNT(DISTINCT department) AS unique_departments FROM employees;

    This query counts the number of unique departments in the employees table.
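
    To compare COUNT(DISTINCT ...) with the other forms of COUNT, the sketch below runs all three against a small temporary table. The table dept_count_demo and its rows are hypothetical.

    ```sql
    -- Hypothetical sample data; the table name and rows are illustrative only.
    CREATE TEMPORARY TABLE dept_count_demo (
      id         INT PRIMARY KEY,
      department VARCHAR(50) NULL
    );

    INSERT INTO dept_count_demo (id, department) VALUES
      (1, 'Sales'),
      (2, 'Sales'),
      (3, 'Engineering'),
      (4, NULL),
      (5, NULL);

    SELECT
      COUNT(*)                   AS total_rows,          -- every row: 5
      COUNT(department)          AS non_null_rows,       -- rows with a department: 3
      COUNT(DISTINCT department) AS unique_departments   -- distinct non-NULL values: 2
    FROM dept_count_demo;
    ```

    Note that COUNT(DISTINCT department) skips NULL values, so the two NULL rows do not add to the count.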

  4. Handling NULL Values

    For the purpose of removing duplicates, DISTINCT treats all NULL values as equal to one another. If several rows have NULL in the same column, they collapse into a single NULL in the result:

    SELECT DISTINCT department FROM employees;

    If there are multiple rows where the department column is NULL, the result will include a single NULL value.
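
    A small sketch of this behavior, using a hypothetical temporary table named null_dept_demo:

    ```sql
    -- Hypothetical sample data; the table name and rows are illustrative only.
    CREATE TEMPORARY TABLE null_dept_demo (
      id         INT PRIMARY KEY,
      department VARCHAR(50) NULL
    );

    INSERT INTO null_dept_demo (id, department) VALUES
      (1, 'Sales'),
      (2, NULL),
      (3, NULL),
      (4, 'Support'),
      (5, 'Sales');

    -- The two NULL rows collapse into one; MySQL sorts NULL first in ascending order.
    SELECT DISTINCT department
    FROM null_dept_demo
    ORDER BY department;
    -- NULL
    -- Sales
    -- Support
    ```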

  5. Performance Considerations

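    • Indexing: DISTINCT adds work because MySQL has to detect duplicate rows, which on large tables often involves a temporary table or a sort. An index that covers the DISTINCT columns can let MySQL read the unique values from the index instead.
    • Query Complexity: In queries with several joins, DISTINCT is applied to the joined rows, so every extra row a join produces has to be generated and then discarded. Where possible, restructure the query so that fewer duplicates are produced in the first place.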
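
    One way to check how MySQL handles a DISTINCT query is to inspect its execution plan before and after adding an index. The sketch below assumes the employees table from the earlier examples exists; the index name idx_employees_department is hypothetical, and the plan you see depends on your MySQL version and data.

    ```sql
    -- Inspect the plan first. In the Extra column, hints such as
    -- "Using temporary" suggest that duplicates are removed via a temporary table.
    EXPLAIN SELECT DISTINCT department FROM employees;

    -- Hypothetical index name; adds an index on the DISTINCT column.
    CREATE INDEX idx_employees_department ON employees (department);

    -- Run EXPLAIN again and compare. With a suitable index the optimizer may
    -- report something like "Using index for group-by" instead.
    EXPLAIN SELECT DISTINCT department FROM employees;

    -- This GROUP BY query returns the same set of rows as the DISTINCT query.
    SELECT department FROM employees GROUP BY department;
    ```

    Treat the plan output as a guide rather than a guarantee: the optimizer chooses a strategy based on table statistics, so measure on realistic data.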

  6. Examples

    • Single Column Distinct

      SELECT DISTINCT city FROM customers;

      This query retrieves a list of unique cities from the customers table.

    • Multiple Columns Distinct

      SELECT DISTINCT first_name, last_name FROM employees;

      This query retrieves unique combinations of first_name and last_name from the employees table.

    • Distinct with Conditions

      SELECT DISTINCT city FROM customers WHERE country = 'USA';

      This query retrieves unique cities from the customers table where the country is 'USA'.
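
      As a further sketch built on the same customers table (assumed to have the city and country columns used above), DISTINCT can be combined with sorting, or moved inside COUNT to summarize per country:

      ```sql
      -- Unique US cities in alphabetical order.
      -- WHERE filters the rows first, then duplicates are removed, then the result is sorted.
      SELECT DISTINCT city
      FROM customers
      WHERE country = 'USA'
      ORDER BY city;

      -- Number of unique cities per country.
      SELECT country, COUNT(DISTINCT city) AS unique_cities
      FROM customers
      GROUP BY country
      ORDER BY country;
      ```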

Best Practices

  • Use with Care: Use DISTINCT only when duplicates can actually occur and need to be removed. Adding it by default costs performance and can hide the real cause of duplicate rows, such as a join that matches more rows than intended.
  • Index Columns: Index columns used with DISTINCT to improve performance, particularly for large tables.
  • Combine with Other Clauses: DISTINCT works alongside WHERE, JOIN, and ORDER BY. WHERE filters rows before duplicates are removed, and ORDER BY sorts the deduplicated result.
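
When duplicates come from a join, it can help to avoid producing them at all. The sketch below is one possible rewrite, not the only one. It assumes a hypothetical orders table and a hypothetical customer_id column linking it to customers; neither appears in the examples above.

```sql
-- Assumed (hypothetical) schema:
--   customers(customer_id, city, country)
--   orders(order_id, customer_id)

-- Join version: each customer row is repeated once per order,
-- and DISTINCT then has to discard all of those repeats.
SELECT DISTINCT c.city
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id;

-- EXISTS version: each customer is considered at most once, so DISTINCT
-- only has to merge customers who share a city.
SELECT DISTINCT c.city
FROM customers AS c
WHERE EXISTS (
  SELECT 1
  FROM orders AS o
  WHERE o.customer_id = c.customer_id
);
```

Both queries return the same set of cities. Whether the second is faster depends on the optimizer and the data, so compare the two with EXPLAIN before settling on one.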