top of page
Antares Studio Games Icon_Small.png

The Antares Alien

Power BI   |  DAX

Writer's pictureBrent Jones

Power BI: How to Remove Duplicates and Keep Most Recent?

Updated: Aug 8



In this post, I'll show you how you can use Power Query within Power BI to remove duplicates while keeping the most recent item (or the max, min, etc.). This post uses Power Query Lists to solve our issue. For more details on lists, check out my blog post here.


Why is this helpful? Let's say you have a situation where new data is appended to a spreadsheet instead of overwriting it. This introduces duplicate rows in your dataset. Simply removing duplicates may inadvertently remove the new data, causing inaccurate results. This is particularly troublesome if the data source has the potential to be sorted in unexpected ways.



 

The Alien says:

The solution is to use a combination of grouping rows, extracting a value, then performing a number of checks. But don't worry! It's nothing too complicated.




 


Steps

Here's our sample dataset. We want only the products with the most recent date (highlighted below).

  • Fake hotdogs with qty of 6 on 1/5/2021 (row 5)

  • Old hamburgers with qty of 9 on 1/5/2021 (row 8)



1. Group by the product, and a new column with All Rows.

Here is what that looks like:


2. Copy the "Data" column so that you have two columns for Data:


3. Extract the latest date with List.Max.

To do this, click on the arrows and use the "Aggregate" function. There is no Max function, so just select Count of Date.



4. After that, change the Count function in the formula bar to List.Max instead. Feel free to change the name of the column as well.


5. Now expand the copied data column.


6. Add a "Custom Column" to check if the [Date] column equals the [Latest Date] column.


7. Now, just filter to True on this column:


Now you have a dataset with removed duplicates while keeping the latest record.


 

Here is the full M code if you get stuck anywhere (you'll need to create your own dataset).

let
    // Group
    #"Grouped Rows" = Table.Group(
            Source, 
            {"Product"}, 
            {
                {
                    "Data", 
                    each _, 
                    type table [Date=nullable date, 
                    Product=nullable text, Qty=nullable number]
                }
            }),
    
    // Duplicate the data
    #"Duplicated Column" = Table.DuplicateColumn(
    #"Grouped Rows", "Data", "Data - Copy"),
    
    // Extract the max date
    #"Extract Max Date" = Table.AggregateTableColumn(
    #"Duplicated Column", "Data", 
        {
            {"Date", List.Max, "Latest Date"}
        }),
    
    // Expand the data
    #"Expanded Data - Copy" = Table.ExpandTableColumn(
        #"Extract Max Date", 
        "Data - Copy", 
        {"Date", "Qty"}, 
        {"Date", "Qty"}
        ),
    
    // Add the check to see if the latest date equals the record date
    #"Added Custom" = Table.AddColumn(
        #"Expanded Data - Copy", 
        "Latest Date Check", 
        each if [Latest Date] = [Date] then true else false, Logical.Type
        ),
    
    // Filter on True records
    #"Filtered Rows" = Table.SelectRows(
        #"Added Custom", 
        each ([Latest Date Check] = true)
        )
in
    #"Filtered Rows"

Hope that helps!


 






Lists are a powerful feature of Power Query! They can help in many unique ways (in fact, lists helped us with this issue of removing duplicates). Check out this blog to see how they can help in more ways, like getting a dynamic column sort order.


 

Comentários


  • Facebook
  • Twitter
  • Instagram
bottom of page