Case Insensitive Merges In Power Query
/Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. Some never use the shift key, and others CAPSLAP everything. It results in data that looks like this:
There are at least 2 ways to fix this before merging:
Manually fix the data at the source. {shudder}
Use one of the Power Query functions that will adjust the case - Text.Proper, Text.Lower, or Text.Upper.
The first is just unacceptable, unless you are a manager and can tell whomever entered it that way to go fix it. 😉
The second isn’t ideal because neither upper nor lower is right, and proper case will not correctly adjust names like Joe McGuinness. Text.Proper(“Joe McGuinness”) returns Joe Mcguinness.
Unfortunately, in the Table.NestedJoin() function documentation, there is no obvious way to do a join that ignores case. There is an optional parameter called keyEqualityComparers but believe it or not, that is never defined anywhere on Microsoft’s site. If you search for it, it will only show up as options for some functions, but there is no guidance on how to use it, and it may not help even if there were.
The good new is, in Power Query for both Power BI and Excel, there is a relatively new feature called “Fuzzy Merge.” It allows you to match names like Will, Bill, and William in merges. You have to tweak the sensitivity or the matches either don’t work, or return way too many matches.
But there is a way to use Fuzzy Merge to do exact matches, yet ignore case sensitivity.
You can see below if I merge the normal case table to the CAPSLAPPER table, I get no matches. It shows 0 of 10 records matched.
But if you do the following, you’ll get a perfect match on all records:
Check “Use fuzzy matching…” at the bottom of the Join Kind section
Expand the Fuzzy Matching options. You must do this. If you don’t, you will get a 10 out of 10 match in this sample data, but you will also get other matches that are not exact in a larger data set. For example, If I added a “MERLE BERBER” to the table, it would return as a match for “Merle Barber” - which we do not want.
Change the Similarity Threshold to 1. The default is 0.8, and that is why in step 2 above you’ll get more matches than you want. By setting it to 1, it becomes an exact match.
Check the “Ignore Case” box. This is where the magic happens.
Uncheck the “Match by Combining Parts” box, and ignore the remaining boxes in the Fuzzy Matching Options. They aren’t needed for this.
Now your merge will show a 10 out of 10 match, without having to change the source data or change the case within Power Query.