Case Insensitive Merges In Power Query
/Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. Some never use the shift key, and others CAPSLAP everything. It results in data that looks like this:
There are at least 2 ways to fix this before merging:
Manually fix the data at the source. {shudder}
Use one of the Power Query functions that will adjust the case - Text.Proper, Text.Lower, or Text.Upper.
The first is just unacceptable, unless you are a manager and can tell whomever entered it that way to go fix it. đ
The second isnât ideal because neither upper nor lower is right, and proper case will not correctly adjust names like Joe McGuinness. Text.Proper(âJoe McGuinnessâ) returns Joe Mcguinness.
Unfortunately, in the Table.NestedJoin() function documentation, there is no obvious way to do a join that ignores case. There is an optional parameter called keyEqualityComparers but believe it or not, that is never defined anywhere on Microsoftâs site. If you search for it, it will only show up as options for some functions, but there is no guidance on how to use it, and it may not help even if there were.
The good new is, in Power Query for both Power BI and Excel, there is a relatively new feature called âFuzzy Merge.â It allows you to match names like Will, Bill, and William in merges. You have to tweak the sensitivity or the matches either donât work, or return way too many matches.
But there is a way to use Fuzzy Merge to do exact matches, yet ignore case sensitivity.
You can see below if I merge the normal case table to the CAPSLAPPER table, I get no matches. It shows 0 of 10 records matched.
But if you do the following, youâll get a perfect match on all records:
Check âUse fuzzy matchingâŚâ at the bottom of the Join Kind section
Expand the Fuzzy Matching options. You must do this. If you donât, you will get a 10 out of 10 match in this sample data, but you will also get other matches that are not exact in a larger data set. For example, If I added a âMERLE BERBERâ to the table, it would return as a match for âMerle Barberâ - which we do not want.
Change the Similarity Threshold to 1. The default is 0.8, and that is why in step 2 above youâll get more matches than you want. By setting it to 1, it becomes an exact match.
Check the âIgnore Caseâ box. This is where the magic happens.
Uncheck the âMatch by Combining Partsâ box, and ignore the remaining boxes in the Fuzzy Matching Options. They arenât needed for this.
Now your merge will show a 10 out of 10 match, without having to change the source data or change the case within Power Query.