Conditional Merge in Power Query

Doing a merge operation in Power Query is pretty simple, just click the Merge Queries button, select your tables and field(s) and away you go. But what if you want to merge Table A to a subset of Table B? You can do it, but you need to tweak your M code manually in the Advanced Editor.

In this example I will use the WideWorldImportersDW sample database Microsoft offers. You should have a copy of SQL Server installed to play with for Power BI. The Developer version of SQL Server is 100% functional and free forever. You just cannot use it in production.

I am going to use two tables for this example, Fact.Sales and Dim.Customer. I only want my Fact Sales table to have customer info for the Buying Group Tailspin Toys. That info is in the Dim Customer table. I can do this several ways.

  1. Do the merge, expand the Buying Group column from the Customer table, then filter to only show those rows.

  2. Pre-filter the Customer table for Tailspin Toys in the Buying Group column, then do the merge. It would need to be an Inner Join, otherwise you will get nulls in the Sales table when you expand, and then you have to filter those out. Which you can do, but it is more steps.

  3. You can bypass all of that and do it in one step.

I’m not going to bore you with how to do methods 1 and 2. Let’s do method 3, a conditional join!

First, create the merge as you normally would. In the Sales table, select Merge Queries from the ribbon, then select the Customer table. Select the Customer Key fields in both. Also make sure this is an inner join so it will remove all records from the Sale Table that are not Tailspin Toys. Your Merge dialog box will look like this:

Inner join dialog box

Inner join dialog box

Now click Ok. It will do the merge. Now we have to fix this so it is a conditional join. When you did the merge, Power Query wrote the M code shown below for you, which you can see in the Advanced Editor. Note that I put in the line feeds to make this more readable. I have also already expanded the Buying Group column. The “Merged Queries” step is what we want to dig into

let
    Source = Sql.Databases("localhost"),
    WideWorldImportersDW = Source{[Name="WideWorldImportersDW"]}[Data],
    Fact_Sale = WideWorldImportersDW{[Schema="Fact",Item="Sale"]}[Data],
    #"Merged Queries" = 
        Table.NestedJoin(
            Fact_Sale, 
            {"Customer Key"}, 
            #"Dimension Customer", 
            {"Customer Key"}, 
            "Dimension Customer", 
            JoinKind.Inner
        ),
    #"Expanded Dimension Customer" = Table.ExpandTableColumn(#"Merged Queries", "Dimension Customer", {"Buying Group"}, {"Buying Group"})
in
    #"Expanded Dimension Customer"

Table.NestedJoin has the following parameters per MS documentation:

  1. table1 as table,

  2. key1 as any,

  3. table2 as any,

  4. key2 as any,

  5. newColumnName as text,

  6. optional joinKind as nullable number,

  7. optional keyEqualityComparers as nullable list) as table

We want to tweak 3rd parameter - table2, which is our Customer table, We don’t want the full table2, we only want table2, filtered for Topspin Toys in the Buying Group field.

So in the code above, we need to replace the 3rd parameter which is just a reference to #”Dimension Customer” - the name of the Customer Table. (It is just Dimension Customer - the #”name here” syntax is just how Power Query references objects with spaces and special characters.)

We want to replace that with this code:

Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys")

So the full M code in the advanced editor becomes this:

let
    Source = Sql.Databases("localhost"),
    WideWorldImportersDW = Source{[Name="WideWorldImportersDW"]}[Data],
    Fact_Sale = WideWorldImportersDW{[Schema="Fact",Item="Sale"]}[Data],
    #"Merged Queries" = 
        Table.NestedJoin(
            Fact_Sale, 
            {"Customer Key"}, 
            Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys"),
            {"Customer Key"}, 
            "Dimension Customer", 
            JoinKind.Inner
        ),
    #"Expanded Dimension Customer" = Table.ExpandTableColumn(#"Merged Queries", "Dimension Customer", {"Buying Group"}, {"Buying Group"})
in
    #"Expanded Dimension Customer"

That’s it. You are done. And if you are doing this on SQL Server, you can right-click the Expanded Dimension Customer step, the final step where you expand the Buying Group column and see that View Native Query is showing. This means query folding is working. It generates the following SQL code. (I’ve removed some columns from the Sales to make the code shorter)

select [$Outer].[Sale Key] as [Sale Key],
    [$Outer].[City Key] as [City Key],
    [$Outer].[Customer Key2] as [Customer Key],
    [$Outer].[Bill To Customer Key] as [Bill To Customer Key],
    [$Outer].[Stock Item Key] as [Stock Item Key],
    [$Outer].[Invoice Date Key] as [Invoice Date Key],
    [$Outer].[Delivery Date Key] as [Delivery Date Key],
    [$Outer].[Salesperson Key] as [Salesperson Key],
    [$Outer].[WWI Invoice ID] as [WWI Invoice ID],
    [$Outer].[Description] as [Description],
    [$Inner].[Buying Group] as [Buying Group]
from 
(
    select [Sale Key] as [Sale Key],
        [City Key] as [City Key],
        [Customer Key] as [Customer Key2],
        [Bill To Customer Key] as [Bill To Customer Key],
        [Stock Item Key] as [Stock Item Key],
        [Invoice Date Key] as [Invoice Date Key],
        [Delivery Date Key] as [Delivery Date Key],
        [Salesperson Key] as [Salesperson Key],
        [WWI Invoice ID] as [WWI Invoice ID],
        [Description] as [Description]
    from [Fact].[Sale] as [$Table]
) as [$Outer]
inner join 
(
    select [_].[Customer Key],
        [_].[WWI Customer ID],
        [_].[Customer],
        [_].[Bill To Customer],
        [_].[Category],
        [_].[Buying Group],
        [_].[Primary Contact],
        [_].[Postal Code],
        [_].[Valid From],
        [_].[Valid To],
        [_].[Lineage Key]
    from [Dimension].[Customer] as [_]
    where [_].[Buying Group] = 'Tailspin Toys'
) as [$Inner] on ([$Outer].[Customer Key2] = [$Inner].[Customer Key])

Since 100% of the work is done on the SQL Server, it will be very fast compared to how it would run if the Power Query mashup engine on your PC or On-Premise Gateway would process it.

2020-09-04 16_37_41-.png

If you are using some other source, like text files or Excel files, folding won’t happen of course, but it should still perform well as this is single filter applied, not a filter applied for every row of either table.

I cannot say that this method is any faster than doing it the longer ways (methods #1 and #2 above) but it is shorter code. Plus, you can get fancier with the Table.SelectRows() function we used by filtering on multiple fields using and/or criteria. You may break folding if you get too fancy and the Power Query engine cannot figure the SQL out, so be careful. If performance is a must, methods #1 or #2 are more likely to fold with many filters in the condition, but you won’t know until you try. Happy Querying!