Working With Multiple Row Headers From Excel in Power Query

It is fairly common for users to format Excel reports with headers spread across two or more rows, rather than using a single cell with word wrap turned on. I’ve seen text files with similar issues as well. Consider the following example:

ExcelMultiRowHeader1.png

Getting this info into Power Query can sometimes be a challenge. I’m going to show you two ways to do it. The first way is mostly manual, using the Power Query user interface. The second is a custom function that will do all of the work for you. For simplicity’s sake, I’ll use Power Query in Excel since my data is in Excel already, but the same logic holds if you are importing the data into Power BI. We do not want to consolidate the headers in Excel itself - you’d just have to do it manually again the next time you get a file from someone. Remember - never pre-transform your data before you transform it in Power Query.

The Manual Method Using the Power Query Interface

If you are using Excel, just put your cursor somewhere in the data, then on the Data ribbon, select “From Sheet” if you are using Office 365. If you have an older version of Excel, it may say From Table or Range. Make sure you uncheck “My table has headers,” because otherwise Power Query will make only the top row the header, and we don’t want that. Once you click OK, your data in the Power Query window should look something like this:

Original table

If the Applied Steps pane to the right of the table has Promoted Headers and Changed Type steps, delete both of them. We just want everything to be in the data area, with columns named Column1, Column2, etc.

This example has two rows that need to be made the header. The process below though can handle any number of header rows. Just change the 2 to whatever it needs to be.

We need to break this table into two pieces - the first two rows, and everything but the first two rows. First, on the Power Query Home ribbon, select the Keep Rows button, Keep Top Rows, then type 2.

ExcelMultiRowHeader3.png

Now go to the Transform ribbon, and select Transpose. Your data should look like this:

So we’ve kept the top two rows, and rotated it 90 degrees so it is on its side. Now we just need to combine those two columns, transpose them again, then we need to go back and get the actual data. Here is how we will do that.

First, select both columns, then on the Transform ribbon, select Merge Columns, and call the new column New Header. I chose a space as my separator.

ExcelMultiRowHeader5.png

Now, to ensure there are no extra spaces, right-click on the new column and select Transform, then Trim. Then, back on the Transform ribbon, select the Transpose button again. Now we have this:

ExcelMultiRowHeader6.png

We are almost there. We just need the data we got rid of. It is still there in the Source step of our Applied Steps. This will require a little bit of modification of the M code in the formula bar.

To make the next steps easier, rename the step that is probably called “Transposed Table1” in your Applied Steps to “NewHeader”, with no spaces.

Next, right-click on NewHeader and select Insert Step After. In your formula bar it will say =NewHeader. Change that to read =Source. You’ll see we have our entire table back. Now, go to the Home ribbon, and select Remove Rows, Remove Top Rows, and type 2 - the same number of header rows our original Excel table had. Now we should just have our data with no headers at all.

I would rename this step “DataOnly” (again, no spaces) rather than the Removed Top Rows name. Here is where it gets interesting.

Go to the Home Ribbon and select Append Queries. Select the same query you are in. We are going to append this query to itself. It will show (Current) in the table name.

Right now our data is doubled up and we have no headers. Look at the formula bar. It should show this:

=Table.Combine({DataOnly, DataOnly})

Change the first DataOnly parameter to NewHeader. Now it looks more reasonable, and we have one header row at the top.

Now, on the Home ribbon, select Use First Row as Headers and change your data types. This is the final result:

At this point, you would probably get rid of the Total Change column and unpivot the Population columns to get this into a nice fact table for Power BI’s data model, but that is another article. You can use this method for headers of any number of rows. I’ve seen some ERP systems export data with 2, 3, or 4 header rows. Just change the number of rows you keep or skip above to correspond to that.
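For reference, here is a rough sketch of what the finished manual query can look like in the Advanced Editor. The step names follow the renames above; your Source step will differ depending on how you connected to the workbook, so treat this as illustrative rather than copy-paste ready.

let
    // Your Source step will differ (Excel.CurrentWorkbook, Excel.Workbook, etc.);
    // the headers must not be promoted yet
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    KeptTopRows = Table.FirstN(Source, 2),
    Transposed = Table.Transpose(KeptTopRows),
    // After a transpose, the columns are always named Column1, Column2, ...
    ToText = Table.TransformColumnTypes(Transposed, {{"Column1", type text}, {"Column2", type text}}),
    MergedColumns = Table.CombineColumns(ToText, {"Column1", "Column2"}, Combiner.CombineTextByDelimiter(" "), "New Header"),
    TrimmedText = Table.TransformColumns(MergedColumns, {{"New Header", Text.Trim, type text}}),
    NewHeader = Table.Transpose(TrimmedText),
    DataOnly = Table.Skip(Source, 2),
    Appended = Table.Combine({NewHeader, DataOnly}),
    PromotedHeaders = Table.PromoteHeaders(Appended, [PromoteAllScalars=true])
in
    PromotedHeaders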

The Custom Function Method

Now that you understand the logic, here is a custom function you can use to do all of that work for you. I am not going to walk through the code, but it is commented. I wrote it quite differently so it is fully dynamic: it handles any number of header rows, any number of columns, and mixed data types in the column headers. See below on how to put this in your model.

(OriginalTable as table, HeaderRows as number, optional Delimiter as text) =>
let
    DelimiterToUse = if Delimiter = null then " " else Delimiter,
    /* Keep just the header rows. */
    HeaderRowsOnly = Table.FirstN(OriginalTable, HeaderRows),
    /* Convert the header rows to a list of lists. Each row is a full list, with the number
       of items in the list being the original number of columns. */
    ConvertedToRows = Table.ToRows(HeaderRowsOnly),
    /* Counter used by List.TransformMany to iterate over the lists (row data) in the list. */
    ListCounter = {0..(HeaderRows - 1)},
    /* For each list (row of headers), iterate through it and convert everything to text.
       This can be important for Excel data where it is pulled in from an Excel table and is
       kept as the Any data type. You cannot later combine numerical and text data using
       Text.Combine. */
    Transformation =
        List.TransformMany(
            ListCounter,
            each {ConvertedToRows{_}},
            (Counter, EachList) => List.Transform(EachList, Text.From)
        ),
    /* Convert the list of lists (rows) to a list of lists (columns) - you'll have however many
       lists you originally had columns, and each list will have the same number of elements as
       the number of header rows you gave it in the 2nd parameter. */
    ZipHeaders = List.Zip(Transformation),
    /* Combine each of those lists back to a single value. Now there is just a list of the actual
       column headers, each combined as one value using a space, or the chosen delimiter. */
    CombineHeaders =
        List.Transform(
            ZipHeaders,
            each Text.Trim(Text.Combine(_, DelimiterToUse))
        ),
    /* Convert this list back to a single-column table. */
    BackToTable = Table.FromList(CombineHeaders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    /* Transpose this table back to a single row. */
    TransposeToRow = Table.Transpose(BackToTable),
    /* Append the original data from the source table, minus its header rows. */
    NewTable =
        Table.Combine(
            {
                TransposeToRow,
                Table.Skip(OriginalTable, HeaderRows)
            }
        ),
    /* Promote the new first row to the header. */
    PromoteHeaders = Table.PromoteHeaders(NewTable, [PromoteAllScalars=true])
in
    PromoteHeaders

To add this to Power Query do the following:

  1. Create a new blank query. In Excel’s Power Query it is New Source, Other, Blank Query. In Power BI it is New Source, Blank Query.

  2. Right-click on that new query and select Advanced Editor

  3. Remove all of the code there so it is 100% empty.

  4. Copy the code above - everything from the first row with “(OriginalTable” to the last row that says “PromoteHeaders” - and paste it into the Advanced Editor.

  5. Press Done to save it

  6. Rename it. It is probably called something useless like Query1. I usually use “fn” as a prefix to custom functions, and you want no spaces in it. So fnConsolidateHeaders as an example.

It should look like this in the Power Query window.

Now to use this. It is very simple. You want to start at the point your data in Power Query looks like the image above called “Original Table,” way at the top. Yes, that is correct - it is immediately after getting your Excel grid of data into Power Query, and it should be either the Source step or possibly the Navigation step, depending on how your data is coming in. Please note: if you have a Changed Type or Promoted Headers step after the Source row, I recommend you delete them.

Now, right-click on that last step and select Insert Step After. Whatever is showing in the formula bar is the name of the table you need to modify. I will assume it is Source for this. In the formula bar, wrap that with this new function.

Here is the syntax of the function:

fnConsolidateHeaders(Table as table, Number of Header Rows as number, optional Delimiter as text)
  • Table - this is the table name - so Source, or whatever the name of the immediate step above is. Again, right-click that step and select Insert Step After, and the table name will be there in the formula bar for you.

  • Number of Header Rows - type in an integer of how many header rows there are in the data. 2, 3, whatever.

  • optional Delimiter - this is what you want between the header rows when they are concatenated. This is optional - the function will use a space if you put nothing there, but you can use any character(s) you want. You can even have no delimiter if you type two quotes with nothing in between - "". This is a text parameter, so you must enclose your delimiter in quotes.
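For example, if the step above is called Source and your data has two header rows that you want joined with a hyphen, the formula bar entry would look like this:

= fnConsolidateHeaders(Source, 2, " - ")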

That’s it. It should return this:

Now you have one final step. Change all of the columns to the correct data types.

Hopefully this will help you when dealing with Excel reports, or even some CSV/Text files, where there are multiple rows used for the header.
Finally - here is the Excel file with the sample data and full code for doing this manually or with the custom function.

Use Data Profiling in Power Query

I often give demos or training in how to use Power BI, and when I am in Power Query, my screen usually looks something like this:

2021-05-08_5-06-11.png

One of the first questions I get asked is “How did you get those bars and graphs across the top?” Those are called data profiling tools and they can be incredibly helpful when you are cleaning your data. Let’s take a look.

By default, you usually just have the thin bar shown below that gives you a little info on your data.

2021-05-08_5-16-47.png

A solid green bar is usually best. It means there are no issues in that column, as shown in the Discounts field. On the far right in the COGS field, there is a tiny bit of gray. That means there is no data, or a null, in one or more records. The bigger the gray bar, the higher percentage of nulls there are. In the middle though we have a problem. Anything in red is an error and must be fixed. These may cause more errors in further transformations downstream, and will usually cause refresh errors.

Before we address this, let’s get a bit more information about our data. Go to the View ribbon, and there are 3 settings we want to look at.

2021-05-08_5-24-07.png
  1. Column Quality - this will give us some additional statistics about the column

  2. Column Distribution - this will give us the bar graph with info on how many distinct and unique values there are

  3. Column Profile - this will open up a large pane that will have a ton of stats about the column, like the min and max values, average value, value distribution, and more.

Let’s look at them one at a time. I’m going to go ahead and check both Column Quality and Column Distribution for now.

2021-05-08_5-32-45.png

For the Discounts column we can see now that the data is 100% valid, meaning no errors, and no empty or null records. There are 515 distinct values, 384 of which are unique. If you are working with a field that will be the 1 side of a 1 to many relationship in the DAX model, you want to make sure your data profile looks like this:

2021-05-08_5-36-32.png

You want the distinct and unique values to be the same, and there can be no empty values. This is a small dataset with only 700 records. When you are working with more data, it is likely it will tell you you have 1,000 distinct and unique values. Data profiling by default only works with the first 1,000 records to avoid a performance hit. More on that later.

For the Sales column with an error, you’ll need to figure out what the error is and fix it. The fastest way to do this is to hover over the thin bar with the red in it and you’ll get this popup:

2021-05-08_5-42-10.png

Click on the ellipses menu and you can take action:

2021-05-08_5-42-29.png

Unless I already know what the issue is, I will almost always select “Keep Errors” and investigate to see if I can safely delete that record. Then I’ll decide if “Remove Errors” is appropriate, or see if I need to fix an issue with the source data. Note that Remove Errors will remove the entire row. You might want to use the “Replace Errors” feature and replace with null or 0. To see exactly what the error is, click to the right of the word “Error” - not on the word itself - in the cell and Power Query will bring up a window at the bottom of your screen that will show you what the error is for that specific cell.

2021-05-08_5-45-27.png

In this case, there was some text in the Sales field. So whether I replace it with 0 or null, remove the row entirely, or fix the source data, it has to be fixed.
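If you go the replace route, the step the Replace Errors command writes for you uses the Table.ReplaceErrorValues function. A sketch of replacing errors in the Sales column with null - assuming your previous step was called #"Changed Type" - looks like this:

= Table.ReplaceErrorValues(#"Changed Type", {{"Sales", null}})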

As for the tiny gray bar for the COGS column, that means there are some empty, or null, values. Those may or may not be OK in your data. Some fields, like “Date Completed,” may have a lot of nulls for items that aren’t finished, but missing data in fields like Sales or COGS would seem to indicate an error of some sort - perhaps someone hadn’t set up the standard cost or sales price correctly in the source system. You have to determine how to proceed here. Power Query is just helping to surface the issue.

The final checkbox in the View ribbon is the Column Profile. I generally leave this off and turn it on when I need it. Here it is on, and I’ve selected the Sales column:

2021-05-08_5-56-37.png

This gives a tremendous amount of information, but it also takes up half of the screen. This is why I turn it on and off as needed. You can see there are 700 records, 6 of which are distinct. There are no unique values, and the prices are distributed as shown in the graph to the right.

Remember though that no matter how large your data source is, this will only profile the first 1,000 records. If you want to see the stats for your entire dataset, you can. In the lower left corner of your screen, there is a small text box that says “Column profiling based on top 1000 rows.” Click it and you’ll see this:

2021-05-08_6-01-26.png

You can change it to look at your entire dataset, but be forewarned, this can be a severe performance hit. If your data has a few million records, every time you make a change to your data, Power Query will rescan the entire dataset and generate the data profile statistics. That can take several minutes with each click. For me, it is slow enough that I would just rather load what I have into the data model and use DAX to return the stats I need.

One more thing. Take a look at this column:

2021-05-08_6-05-55.png

You’ll notice there are no bars analyzing the data. This is because there is no data type for Power Query to work with. The data type is set to Any, or ABC/123. You should never let an Any data type column make it to the DAX model, and this is a good indicator that a column’s data type is not properly set.

Hopefully that helps. Every time I get a fresh install of Power BI or Excel, I always go to the View ribbon and make sure that Column Quality and Column Distribution are on. And while you are there, go ahead and make sure “Formula Bar” is checked as well. I believe fresh installations in early 2021 and later have this on by default, but if you installed your version of Power BI or Excel before that, the Formula Bar will be off, and it will not get turned on automatically by monthly updates.

Change The Day Your Week Starts In Power Query

By default, Power BI starts its week on Sunday, but that doesn’t work for every situation. Some companies prefer their week to start on Monday, and I recently helped someone that needed their week to start on Thursday. Fortunately, by tweaking your date table in Power Query, you can make this change with no calculations at all. Here is what the first few days of 2021 look like by default:

DayOfWeek.png

You can see that Sunday is the start of the week, with a day-of-week value of 0. The formula to return the day of the week is:

= Date.DayOfWeek([Date])

The Date functions in Power Query that use day and week calculations have an optional parameter at the end. Simply add the desired day as the second parameter of the Date.DayOfWeek function, so it would be the following to change it to Monday:

= Date.DayOfWeek([Date], Day.Monday)

Now our Day of Week column looks like this:

DayOfWeek2.png

To change it to start on Thursday, simply use Day.Thursday and it will adjust accordingly. Day.Sunday, Day.Monday, etc. are just easier ways to tell Power Query when to start. Under the hood they are just numbers - Day.Sunday equates to 0 - so you could use a number in your formula, but that is not as easy to read. With Date.DayOfWeek([Date], 3), for example, it isn’t immediately obvious that the week is starting on Wednesday.

This is only meaningful in functions that are Week based. Having your first day of the week start on Monday has no bearing on monthly, quarterly, or yearly functions. But functions like Date.DayOfWeek, Date.StartOfWeek, Date.EndOfWeek, and Date.WeekOfMonth can use this first day of week setting.
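For example, the same optional parameter works in the other week-based functions, so the start and end of a Monday-based week would be:

= Date.StartOfWeek([Date], Day.Monday)
= Date.EndOfWeek([Date], Day.Monday)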

In the Power Query date table I linked to above, I have a variable called varFirstDayOfWeek that is preset to Day.Sunday. If you change that line, every relevant column in the Date table will adjust to your preferred day of the week.

DayOfWeek3.png

This will make your DAX calculations easier as well, since you can just use the week number, day number, start-of-week date, etc., from the Date table rather than having to remember to add or subtract days to adjust the calendar to your needs.

Working with Unix Epoch Time In Power BI

You may need to write a Power BI report that works with Unix Epoch Time, converting either to or from it. Epoch Time is simply the number of seconds that have passed since January 1, 1970, at 12:00 am UTC. So at 1 am of that date, Epoch Time would be 3600, since 3,600 seconds have transpired. Today Epoch Time is around 1610644465, or 1,610,644,465 to make it easier to read, though you will never see the commas in a database.

Converting to and from this time format is deceptively simple, as long as you understand a few functions in Power Query and the concept of Duration. To convert a date to Epoch Time you need to use one of the Duration functions. We want to:

  1. Find the difference between a normal datetime value and Jan 1, 1970, which is #datetime(1970,1,1,0,0,0) in Power Query.

  2. Return that difference in seconds.

This is the result:

2021-01-14 09_18_37-Untitled - Power Query Editor.png

#datetime(1970,1,1,0,0,0) is Jan 1, 1970 at 12am. In other words, this is #datetime(Year, Month, Day, Hour, Minute, Seconds)

Without the Duration.TotalSeconds() function, you will get 18641.09:05:00, which is technically correct. That is 18,641 days, 9 hours and 5 minutes based on my sample time stamp shown in the image. Duration.TotalSeconds() converts that to seconds, which gives you a valid Epoch Time.
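As a worked sketch, adding an Epoch Time column to a table might look like the step below - [MyDateTime] is a hypothetical datetime column name, and #"Changed Type" stands in for whatever your previous step is called:

= Table.AddColumn(#"Changed Type", "Epoch Time", each Number.RoundDown(Duration.TotalSeconds([MyDateTime] - #datetime(1970,1,1,0,0,0))), Int64.Type)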

The interesting thing is your results in the desktop may differ from what is in the service. If you use something like DateTime.LocalNow() to return the current time, then convert that to Epoch Time using the following formula:

= Duration.TotalSeconds(DateTime.LocalNow() - #datetime(1970,1,1,0,0,0))

Your PC will have your local time with proper offsets from UTC. However, once you publish to the service, DateTime.LocalNow() will calculate at UTC +0, so that is actually correct. If you want to adjust for the time zone, simply add or subtract 3,600 from the result for each hour you are offset from UTC. So if you are in the Pacific time zone, which is UTC -8 (I’m ignoring DST adjustments - that is a different post entirely!), you could subtract 28,800 seconds. In this case you should not do that, as Epoch Time and DateTime.LocalNow() are both UTC +0 in the service. But if you are converting dates from a data source that are fixed to your time zone, you may need to adjust them to get back to UTC.

Let’s go the other direction now, from Epoch Time to a normal time stamp. Use the following formula:

= #datetime(1970,1,1,0,0,0) + #duration(0,0,0,[TimeStamp])

This takes the date/time of Jan 1, 1970 at 12am and the total number of seconds from the [TimeStamp] field, which is the Epoch Time.

2021-01-14 09_39_41-.png

The #duration() part returns a duration here, which was 18641.17:04:58 in this case, so it added 18,641 days, 17 hours, 4 minutes, and 58 seconds to Jan 1, 1970 12am. That becomes Jan 14, 2021 at just after 5pm. That is UTC time. Here, I would need to convert this back to my local time, so I’d use the more complete formula below.

= DateTimeZone.SwitchZone(
     #datetimezone(1970,1,1,0,0,0,0,0) + #duration(0,0,0,[TimeStamp]),
     -8
)

I added two more zeros in the #datetimezone() function, which represent the hours and minutes offset from UTC - I want no offset initially - and then in DateTimeZone.SwitchZone I adjust the final calculation by -8 hours to get to the Pacific time zone.

Remember, when dealing with time zones, Epoch Time is always UTC+0. The only conversions you need to do are to get from a local time zone to UTC+0 before converting to Epoch Time, or after you’ve converted Epoch Time to a regular time, and you want that regular time to be a local time zone. In many cases, you never need to do any conversion. Talk to your DBA about the time zone data is stored in. Many systems use UTC+0 for all times to keep conversions to a minimum.

2021-01-14 09_50_22-Untitled - Power Query Editor.png

In my case though, I wanted the time local, so my DateTimeZone.SwitchZone() function adjusted the time back 8 hours to just after 9am here.

One amusing side note: Epoch Time is a 32 bit integer. That means on Jan 19, 2038, the time will grow larger than a 32 bit integer can store. Don’t worry. Power Query uses a 64 bit integer for this calculation. So Unix systems all over the world may crash and burn, but your Power BI report will continue to refresh without error. 😉

Renaming A Column In Power Query Based On Position

The easiest way to rename a column in Power Query is to do it the same way you do in Excel - just double-click it and rename it. Sometimes though the column you are renaming can have different names with each refresh, so this method won’t work. We can use a List in Power Query to make this dynamic. Consider this very simple example:

2020-11-09 16_03_18-Dynamic Column Renaming Data - Excel.png

You receive this file every day, and the column name starting in Row 6, Column B, constantly changes. We are going to extract that date and make it a column, as it should be, and then rename both columns regardless of what they are called.

The first thing we do after importing the Excel file into Power Query is to remove the first 5 rows. Use the Remove Rows button on the Home ribbon, and remove the top 5 rows. If your data has rows that move around a bit because someone inserts or deletes rows above it, see this blog post on how to dynamically find that first row and remove the appropriate number of rows. Now we have something that looks like this:

2020-11-09 16_09_43-Untitled - Power Query Editor.png

Ok, let’s get to work. NOTE: If you had any “Changed Type” step created by Power Query, delete it. That will cause problems in the refresh. We’ll add it back at the end when our column names are predictable.

1) We need that date in a new Date column so we can relate it to a Date table later. If you have a date in your data model, you always need a good date table.

2) On the Add Column ribbon click Add Custom Column. Name the column “Date” and use this formula:

try 
    Date.FromText(
        Text.BeforeDelimiter([Column2], " ")
        )
otherwise null

You don’t have to have the line breaks and indents, it just makes it easier to read. Let’s break this down from the inside out:

  • Text.BeforeDelimiter([Column2], “ “) - in the first row of data, this will return the “11/20/2020” text. In all other rows it will return errors, because there are no spaces in the other rows, and those are numbers, so there is no text to extract. That isn’t a problem, and you’ll see why in a minute.

  • Date.FromText() will take the date in the first row and convert “11/20/2020” to a real date. In all other rows, it returns errors.

  • The try/otherwise construct says “Try this formula. If it works, give me the result. Otherwise, return null.” It is like IFERROR() in Excel. This is why the errors don’t matter. The try/otherwise construct takes care of it.

Now our data looks like this:

3) Right-click on the Date column and select the Fill menu, then Down. Now the date will fill the column.

4) On the Home ribbon, click Use First Row as Headers. Again, remove any automatically added “Changed Type” step if Power Query added one. Now we have this:

Both the 2nd and 3rd columns need to be renamed, and we cannot double-click to rename them, otherwise the refresh will fail tomorrow when the date changes to Nov 21, 2020. If you did double-click though, this is the formula Power Query would generate:

= Table.RenameColumns(
    #"Promoted Headers",
    {
        {"11/20/2020 Production Quantity", "Production Quantity"},
        {"11/20/2020", "Date"}
    }
)

We need to replace the first part of the renaming list (the list is in the curly brackets). We can use a function called Table.ColumnNames() for this. To see how it works, remove your manual renaming step, and type in this formula, and you will get the list shown below. It is each column name in the table from the Promoted Headers step.

= Table.ColumnNames(#"Promoted Headers")

To use this we need to get the column names of the 2nd and 3rd columns. To get items from a list, you can refer to the index of an item in the list using curly brackets again, so change your formula to this, and you’ll get the column name shown below.

= Table.ColumnNames(#"Promoted Headers"){1}
2020-11-09 16_29_14-Untitled - Power Query Editor.png

You’ll notice I used {1} to get the second item in the list. Power Query indexes at zero, so {0} would return the first item. Now you see where this is going?

5) Delete any steps you created after the Promoted Headers step. Once again, manually rename the headers. This is so Power Query will generate the Table.RenameColumns() step for us, and we will make minor tweaks. It should be this again:

= Table.RenameColumns(
    #"Promoted Headers",
    {
        {"11/20/2020 Production Quantity", "Production Quantity"},
        {"11/20/2020", "Date"}
    }
)

6) Replace the “11/20/2020 Production Quantity" with Table.ColumnNames(#"Promoted Headers"){1} and replace “11/20/2020” with Table.ColumnNames(#"Promoted Headers"){2}. It should now look like this:

= Table.RenameColumns(
    #"Promoted Headers",
    {
      {Table.ColumnNames(#"Promoted Headers"){1}, "Production Quantity"},
      {Table.ColumnNames(#"Promoted Headers"){2}, "Date"}
     }
)

It is dynamically finding the column names of the 2nd and 3rd columns and replacing them with specific text!

2020-11-09 16_42_54-Untitled - Power Query Editor.png

As always, set the data types for all columns at this point too now that we know what our column names will be.
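As a sketch, that final typing step could look like the line below, assuming Power Query named the renaming step #"Renamed Columns" and the quantity is a whole number:

= Table.TransformColumnTypes(#"Renamed Columns", {{"Date", type date}, {"Production Quantity", Int64.Type}})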

Use List.PositionOf() To Find The First Row of Data In An Excel File

Excel can be one of the worst data sources for Power BI. Its greatest strength for users is also its greatest weakness for Power BI developers getting data out of it via Power Query - it is almost unlimited in its flexibility. So end users will do things that cause your refreshes to fail. Things like inserting columns, rows, merging cells, etc. I’m going to show you how to find data in a spreadsheet when users add or remove rows above the table you need. Consider the spreadsheet data below:

2020-11-06 07_52_45-Using List PositionOf - Excel.png

This is a simple grid of data. Ideally this would be in an Excel table, but often a Power BI developer cannot control how the data comes in, and most users still do not use Excel tables. So I call this a grid of data. And the problem with a grid of data is when it moves and you use Power Query to get the data, it will be missing information. For example, when you bring this Excel file into Power Query, it looks like this:

2020-11-06 08_02_13-Untitled - Power Query Editor.png

Fixing this is simple:

  1. On the Home ribbon in Power Query, select Remove Rows, then Remove Top Rows, then type in the number 5.

  2. On the Home ribbon, select “Use first row as headers”

Now it looks like this:

Perfect! We have good data. Next week, however, the user accumulating the production data adds a few rows at the top of the file to include some notes about one of the facilities being down for maintenance one of the days.

2020-11-06 08_08_10-Using List PositionOf - Excel.png

Now the grid of data starts on row 8, not row 6 like before. When Power Query does the refresh, you get this:

2020-11-06 08_09_05-Untitled - Power Query Editor.png

Now you have a mess. The column names will load into Power BI as Column1, Column2, and Column3, if they load at all. More likely you’ll just get a refresh error. If you had changed the data types (as you should), it would have broken right off the bat when it tried to change the “Material” column to text. The Material column no longer exists! It is just Column1. The Production Date column is worse: if you tried to change it to a Date data type, that would generate errors because there is now text in that column. In other words, this is a total mess. You could edit your Power Query code to skip the first 7 rows instead of the first 5, but that just becomes a cat and mouse game when the user adds or removes rows again in the future.

The better solution is to dynamically find the first row of data. Power Query has a function that will help with this. It is called List.PositionOf(). Basically, it takes a list and tells you what position your search item occupies in the list. Let’s walk through it with our new production data with the extra rows.

First, delete the Promoted Headers and Removed Top rows steps you created earlier so we are back to the below image. If there are any Changed Type steps, get rid of those as well. You should only have the Source and Navigation steps in the Applied Steps at the right side of the Power Query window.

2020-11-06 08_16_54-Untitled - Power Query Editor.png

We want to find the word “Material” here. Right-click on the Navigation step in the Applied Steps pane and click Insert Step After. It will create a Custom1 step that will simply have this in the formula bar: = Sheet1_Sheet - which is just pointing to the Navigation step. Change that to this:

= Sheet1_Sheet[Column1]

We told it to only show us [Column1] from the Sheet1_Sheet step. Sheet1_Sheet is just the name of the table. Confusingly, Power Query hides this in the Navigation step. Here we can clearly see it. Note: If you are using this technique at any time after the Navigation step, the table name is just the name of the applied step. For example, if you had selected the first three columns and then selected Remove Other Columns from the ribbon, your step would be “Remove Other Columns” and the table name is #”Remove Other Columns” - the # and quotes are necessary if any table name has spaces in it.

Most Applied Steps in Power Query are just tables. But this one is not. You’ll notice that after adding [Column1] - the name of the first column - it is now a list.

2020-11-06 08_29_56-Untitled - Power Query Editor.png

Now we are getting somewhere. Time to use List.PositionOf. Edit the formula above to this, and you’ll get the following screen.

= List.PositionOf(Sheet1_Sheet[Column1], "Material")
2020-11-06 08_32_00-Untitled - Power Query Editor.png

It found the word “Material” on row 7. Wait a minute… on the image above with the list, it shows Material on row 8. Power Query indexes at 0, so the first row isn’t 1, but 0. So Material is in position 7 in the list. This actually helps us with the next step.

Right-click on the Custom1 step and rename it something useful, like PositionOfMaterial. I recommend you use no spaces for this step. It makes referencing it later easier.

Right-click on PositionOfMaterial again and Insert Step After. Now, we need to get our original table and skip the first 7 rows, right? When you use the Remove Top Rows feature, Power Query creates a step using the Table.Skip function. We are going to do that manually. In the formula bar, you probably see =PositionOfMaterial, and it will still show the value. Get rid of that and type this:

= Table.Skip(Sheet1_Sheet, PositionOfMaterial)

Remember, our original table was Sheet1_Sheet. Table.Skip has this syntax: Table.Skip(TableName, RowsToSkip). TableName is Sheet1_Sheet in this case, and instead of hardcoding the RowsToSkip, we are using a variable, our PositionOfMaterial step. Now our data looks like this:

2020-11-06 08_40_18-Untitled - Power Query Editor.png

Now you can promote the first row as headers, change your data types, and continue on. Next week when our production supervisor turns in the report, it won’t matter if Material is on row 1 or row 100 - List.PositionOf() will find it for us and dynamically set the top of the table. Even if everything above the grid is deleted and Material is in the very first row, List.PositionOf() will return 0 for the position of Material, and Table.Skip will effectively do nothing, skipping 0 rows. So it will still work for us.
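Putting it all together, the whole pattern is only a few steps. Here is a condensed sketch - the file path and sheet name are placeholders, but the last three steps mirror what we just built:

let
    // Placeholder connection; your Source and Navigation steps will differ
    Source = Excel.Workbook(File.Contents("C:\Data\Production.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    PositionOfMaterial = List.PositionOf(Sheet1_Sheet[Column1], "Material"),
    SkippedRows = Table.Skip(Sheet1_Sheet, PositionOfMaterial),
    #"Promoted Headers" = Table.PromoteHeaders(SkippedRows, [PromoteAllScalars=true])
in
    #"Promoted Headers"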

This method will work with any kind of data, but Excel is the most common use for it. Most other data sources, like databases, SharePoint Lists, true Excel tables, and even CSV files, don’t have data moving up and down on you.

Group By In Power Query When Data In Column Changes

If you’ve used Power Query long enough, you’ve needed to group data and provide some sort of summary. For example, grouping by customer and summarizing sales. The normal way the Table.Group function works is to group the entire table by the field(s) you are grouping by, then providing whatever aggregations you need.

However, you can alter that behavior and group data by a column, and have it provide a separate group at each change in the data. Let me illustrate using some data a user posted in the Power BI forum:

2020-10-13 08_03_05-Untitled - Power Query Editor.png

They wanted to find the start and end date of each section in the OnOff column. So in the first group, the start and end dates would be 1/1/2020 and 1/11/2020. The first On period would run 1/12/2020 through 1/27/2020. Then the next blank section would start at 1/28/2020.

A normal grouping would look like this in Power Query:

  1. Select the OnOff column and press Group By in the ribbon.

  2. Add a new column called AllRows and use the “All Rows” operation.

  3. Group By will return the table shown. OnOff has two values, blank and “On” and then two nested tables.

  4. If you click on the small white area next to the yellow word “Table” you can see what is in that nested table in the bottom view. It has all records, or all rows, for the blank value. As you can see, there is no way here to separate the first period that ends 1/11/2020 from the start of the next period on 1/28/2020.

2020-10-13 08_07_36-Untitled - Power Query Editor.png

The formula that is generated by Power Query uses Table.Group, and it looks like this:

= Table.Group(#"Changed Type", {"OnOff"}, {{"AllRows", each _, type table [Start of Month=nullable date, EmpID=nullable text, Fleet=nullable text, OnOff=nullable text, Rotation Date=nullable date]}})

If you look at the documentation though, for Table.Group you will see the 4th parameter is an optional setting called groupkind. It then proceeds to give you no info on what that means. If you search around a bit, you will find two settings that might work. One is GroupKind.Global, and the other is GroupKind.Local. Looking at the documentation for these two provides absolutely no help.

2020-10-13 08_20_38-GroupKind.Local - PowerQuery M _ Microsoft Docs and 59 more pages - Work - Micro.png

Well, that’s why I am writing this. I learned about this several months ago but forgot it. If you look at the Power BI Community thread above, you’ll notice I did the Excel thing:

  • Bring in one of the columns into the table but offset by one row

  • Do some if/then analysis to determine when data changes

  • Do a grouping

  • Get the start/end dates.

But Jimmy801 posted the more elegant solution, reminding me of this optional and rarely used parameter, and that is the main reason for this blog post. To remind me of this parameter. 😁

By default, Table.Group uses GroupKind.Global as the 4th parameter. It scans the whole table and groups by each distinct value in the grouping columns - the OnOff field in our example. However, add GroupKind.Local to the function as shown below (I’ve added line breaks for readability) and you’ll see a very different table:

Table.Group(
    #"Changed Type",
    {"OnOff"},
    {{"AllRows", each _, type table [Start of Month=nullable date, EmpID=nullable text, Fleet=nullable text, OnOff=nullable text, Rotation Date=nullable date]}},
    GroupKind.Local
    )
2020-10-13 08_26_14-Untitled - Power Query Editor.png

Now, every time the OnOff field changes, it does a grouping. So in the sample data, there are two sections that are blank, and two that are “On” and you can see that in the first blank section, it correctly shows 1/1/2020 through 1/11/2020. You can then use the Table.Min() and Table.Max() functions to get the start and end dates from each local grouping, and finally re-expand the necessary columns from the AllRows field.

2020-10-13 08_32_50-Untitled - Power Query Editor.png

The only caveat here is once you add the GroupKind.Local parameter, the little gear icon next to the Grouped Rows step goes away and you can no longer alter the grouping using the Group By dialog box. You would either need to edit the M code manually, or remove GroupKind.Local, then edit the group settings, then add GroupKind.Local back to the formula manually.

2020-10-13 08_36_08-Untitled - Power Query Editor.png

You can see the full PBIX file here, or paste the full M code below into a blank query of your own to play with.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdOxDYMwGEThXVwj2b7f4KTNAhkAsQL7l1iCQ1TJkyy78Nc93bqmmmtWUUlTij6P+9NbG88499c2/XGCLqBr0M3QLdB16F7QvaGrhUJQ5LufFEQxBV1MQRpTUMcUBDIFjUxBJlNQ6qICrUx5LfFa4rXEa4nXEq8luijRSYluKnineHTSL6rMpbAMLJ1zOwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Start of Month" = _t, EmpID = _t, Fleet = _t, OnOff = _t, #"Rotation Date" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start of Month", type date}, {"Rotation Date", type date}}),
    #"Grouped Rows" = 
        Table.Group(
            #"Changed Type",
            {"OnOff"},
            {{"AllRows", each _, type table [Start of Month=nullable date, EmpID=nullable text, Fleet=nullable text, OnOff=nullable text, Rotation Date=nullable date]}},
            GroupKind.Local
            ),
    #"Added Start Date" = Table.AddColumn(#"Grouped Rows", "Start Date", each Table.Min([AllRows], "Rotation Date")[Rotation Date], type date),
    #"Added End Date" = Table.AddColumn(#"Added Start Date", "End Date", each Table.Max([AllRows], "Rotation Date")[Rotation Date], type date),
    #"Expanded AllRows" = Table.ExpandTableColumn(#"Added End Date", "AllRows", {"Start of Month", "EmpID", "Fleet", "Rotation Date"}, {"Start of Month", "EmpID", "Fleet", "Rotation Date"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded AllRows",{"Start of Month", "EmpID", "Fleet", "OnOff", "Rotation Date", "Start Date", "End Date"})
in
    #"Reordered Columns"

Conditional Merge in Power Query

Doing a merge operation in Power Query is pretty simple, just click the Merge Queries button, select your tables and field(s) and away you go. But what if you want to merge Table A to a subset of Table B? You can do it, but you need to tweak your M code manually in the Advanced Editor.

In this example I will use the WideWorldImportersDW sample database Microsoft offers. You should have a copy of SQL Server installed to play with for Power BI. The Developer version of SQL Server is 100% functional and free forever. You just cannot use it in production.

I am going to use two tables for this example, Fact.Sales and Dim.Customer. I only want my Fact Sales table to have customer info for the Buying Group Tailspin Toys. That info is in the Dim Customer table. I can do this several ways.

  1. Do the merge, expand the Buying Group column from the Customer table, then filter to only show those rows.

  2. Pre-filter the Customer table for Tailspin Toys in the Buying Group column, then do the merge. It would need to be an Inner Join, otherwise you will get nulls in the Sales table when you expand, and then you have to filter those out. Which you can do, but it is more steps.

  3. You can bypass all of that and do it in one step.

I’m not going to bore you with how to do methods 1 and 2. Let’s do method 3, a conditional join!

First, create the merge as you normally would. In the Sales table, select Merge Queries from the ribbon, then select the Customer table. Select the Customer Key fields in both. Also make sure this is an inner join so it will remove all records from the Sale Table that are not Tailspin Toys. Your Merge dialog box will look like this:

Inner join dialog box

Now click OK. It will do the merge. Now we have to fix this so it is a conditional join. When you did the merge, Power Query wrote the M code shown below for you, which you can see in the Advanced Editor. Note that I put in the line feeds to make this more readable. I have also already expanded the Buying Group column. The “Merged Queries” step is what we want to dig into.

let
    Source = Sql.Databases("localhost"),
    WideWorldImportersDW = Source{[Name="WideWorldImportersDW"]}[Data],
    Fact_Sale = WideWorldImportersDW{[Schema="Fact",Item="Sale"]}[Data],
    #"Merged Queries" = 
        Table.NestedJoin(
            Fact_Sale, 
            {"Customer Key"}, 
            #"Dimension Customer", 
            {"Customer Key"}, 
            "Dimension Customer", 
            JoinKind.Inner
        ),
    #"Expanded Dimension Customer" = Table.ExpandTableColumn(#"Merged Queries", "Dimension Customer", {"Buying Group"}, {"Buying Group"})
in
    #"Expanded Dimension Customer"

Table.NestedJoin has the following parameters per MS documentation:

  1. table1 as table,

  2. key1 as any,

  3. table2 as any,

  4. key2 as any,

  5. newColumnName as text,

  6. optional joinKind as nullable number,

  7. optional keyEqualityComparers as nullable list

We want to tweak the 3rd parameter - table2, which is our Customer table. We don’t want the full table2; we only want table2 filtered for Tailspin Toys in the Buying Group field.

So in the code above, we need to replace the 3rd parameter which is just a reference to #”Dimension Customer” - the name of the Customer Table. (It is just Dimension Customer - the #”name here” syntax is just how Power Query references objects with spaces and special characters.)

We want to replace that with this code:

Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys")

So the full M code in the advanced editor becomes this:

let
    Source = Sql.Databases("localhost"),
    WideWorldImportersDW = Source{[Name="WideWorldImportersDW"]}[Data],
    Fact_Sale = WideWorldImportersDW{[Schema="Fact",Item="Sale"]}[Data],
    #"Merged Queries" = 
        Table.NestedJoin(
            Fact_Sale, 
            {"Customer Key"}, 
            Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys"),
            {"Customer Key"}, 
            "Dimension Customer", 
            JoinKind.Inner
        ),
    #"Expanded Dimension Customer" = Table.ExpandTableColumn(#"Merged Queries", "Dimension Customer", {"Buying Group"}, {"Buying Group"})
in
    #"Expanded Dimension Customer"

That’s it. You are done. And if you are doing this on SQL Server, you can right-click the Expanded Dimension Customer step - the final step where you expand the Buying Group column - and see that View Native Query is available. This means query folding is working. It generates the following SQL code. (I’ve removed some columns from the Sales table to make the code shorter.)

select [$Outer].[Sale Key] as [Sale Key],
    [$Outer].[City Key] as [City Key],
    [$Outer].[Customer Key2] as [Customer Key],
    [$Outer].[Bill To Customer Key] as [Bill To Customer Key],
    [$Outer].[Stock Item Key] as [Stock Item Key],
    [$Outer].[Invoice Date Key] as [Invoice Date Key],
    [$Outer].[Delivery Date Key] as [Delivery Date Key],
    [$Outer].[Salesperson Key] as [Salesperson Key],
    [$Outer].[WWI Invoice ID] as [WWI Invoice ID],
    [$Outer].[Description] as [Description],
    [$Inner].[Buying Group] as [Buying Group]
from 
(
    select [Sale Key] as [Sale Key],
        [City Key] as [City Key],
        [Customer Key] as [Customer Key2],
        [Bill To Customer Key] as [Bill To Customer Key],
        [Stock Item Key] as [Stock Item Key],
        [Invoice Date Key] as [Invoice Date Key],
        [Delivery Date Key] as [Delivery Date Key],
        [Salesperson Key] as [Salesperson Key],
        [WWI Invoice ID] as [WWI Invoice ID],
        [Description] as [Description]
    from [Fact].[Sale] as [$Table]
) as [$Outer]
inner join 
(
    select [_].[Customer Key],
        [_].[WWI Customer ID],
        [_].[Customer],
        [_].[Bill To Customer],
        [_].[Category],
        [_].[Buying Group],
        [_].[Primary Contact],
        [_].[Postal Code],
        [_].[Valid From],
        [_].[Valid To],
        [_].[Lineage Key]
    from [Dimension].[Customer] as [_]
    where [_].[Buying Group] = 'Tailspin Toys'
) as [$Inner] on ([$Outer].[Customer Key2] = [$Inner].[Customer Key])

Since 100% of the work is done on the SQL Server, it will be very fast compared to how it would run if the Power Query mashup engine on your PC or On-Premise Gateway would process it.

2020-09-04 16_37_41-.png

If you are using some other source, like text files or Excel files, folding won’t happen of course, but it should still perform well, as this is a single filter applied once, not a filter applied for every row of either table.

I cannot say that this method is any faster than doing it the longer ways (methods #1 and #2 above), but it is shorter code. Plus, you can get fancier with the Table.SelectRows() function we used by filtering on multiple fields using and/or criteria. You may break folding if you get too fancy and the Power Query engine cannot figure the SQL out, so be careful. If performance is a must, methods #1 or #2 are more likely to fold with many filters in the condition, but you won’t know until you try.
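For example, a two-condition filter might look like the line below. Category is another real column in the Dimension Customer table, though the value I’m testing for is illustrative:

Table.SelectRows(#"Dimension Customer", each [Buying Group] = "Tailspin Toys" and [Category] = "Novelty Shop")

Happy Querying!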

Add a Refresh Time Stamp To Your Power BI Reports

It is often useful to tell your users when the report refreshed so they have some idea of how current, or stale, the data is. Unfortunately, you cannot just add a TODAY() formula to your report as that will change each time they open the report. This method will give you the refresh time stamp and compensate for Daylight Savings Time.

First, let’s get the time and date in Power Query. This time will only change on the report refresh, which is what you want.

  1. Open Power Query. You can do this using Get Data on the main Power BI menu, and we want a Blank Query.

  2. Click on Advanced Editor on the Home ribbon of Power Query.

  3. Remove the default text.

  4. Replace it with this code

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VZDLCQAhDAV78Sxonv9axP7bWLNPMN4mDMYhc7oUBAFRmvNOJBTy8r8RnTpNJh8TdRo0iUzT94BIIeTzRBdAaBr5GF0A0FTyMZqGdNM2mwDkG7CZZuhQKEA2ZdWI+pQ1U9ae/7v5v9vTYNzTYNiyFG/Z5rU+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dtDSTStart = _t, dtDSTEnd = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"dtDSTStart", type date}, {"dtDSTEnd", type date}}, "en-US"),
    varCurrentDate = DateTime.Date(DateTimeZone.SwitchZone(DateTimeZone.LocalNow(),-8)),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each [dtDSTStart] < varCurrentDate and [dtDSTEnd] > varCurrentDate),
    varDSTOffset = Table.RowCount(#"Filtered Rows"),
    #"Last Refresh Date" = #table(
        type table
            [
                #"RefreshDate"=datetimezone
            ],
        {
            {DateTimeZone.SwitchZone(DateTimeZone.LocalNow(),-8 + varDSTOffset,0)}
        }
        )
in
    #"Last Refresh Date"

Wow. What is all of that? Just to get a refresh timestamp? Yes. Let’s look at the steps.

Source: This is the Daylight Savings Time Start and End dates for the US. You will need to change this for your region as applicable. This goes through 2030 right now, and is subject to change at the whims of politicians.

You can manually edit this table by clicking on the gear icon in the Source step.

You can also get rid of all of that and point the source to a table you have somewhere. A table on your SQL server, a SharePoint List, Excel File, whatever. Swapping out the code for that is beyond the scope of this article. I wanted to provide a complete solution in one query without having to rely on external data sources. I know there are websites with this data you can link to, but I don’t like relying on websites for my reports, and don’t want to add those sites to my on-prem gateway just for a DST table.

Changed Type: This just changes the data keyed in the Source line to dates. It is always text by default. Those are in the US Date format and I told Power Query to use the en-US culture setting to do the type conversion.

varCurrentDate: This gets today’s date, with an offset for my UTC timezone, which is -8 - Pacific Time. You’d need to use Google for yours, or use this handy article on Wikipedia. You can use a decimal number here, so if you are in the Marquesas Islands, your offset is -9:30, which means you would use -9.5 here.

Filtered Rows: This will return 1 or 0 rows when the varCurrentDate variable is greater than the Start Date and less than the End Date for when DST occurs.

varDSTOffset: This is a variable that counts the rows returned in the Filtered Rows step. It is either 1 or 0. This will add 1 or 0 hours to the time calculated next, adjusting for DST.

Last Refresh Date: This uses the formula below to get the current date and time of the refresh, adjusting for the varDSTOffset calculated above. It then puts it into a table and changes the type to “datetimezone” so it can actually be loaded into the DAX data model. Note, again, you’ll need to adjust that -8 to your UTC offset.

=DateTimeZone.SwitchZone(DateTimeZone.LocalNow(),-8 + varDSTOffset,0)

The final result is a date with a timezone offset that looks like this. Remember, I am in the -8 Pacific Timezone, and DST is in effect, so it added 1, making the net UTC Offset -7.

Note: If you do not observe DST in your area, just remove the +varDSTOffset part of the formula. You can also remove about 90% of this code, but I’ll leave that to you. Leaving the DST code in and removing varDSTOffset from your formula will not impact performance at all.
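If you do strip the DST logic out, a minimal version with a hard-coded offset (still -8 here) could be as short as this:

let
    #"Last Refresh Date" = #table(
        type table [RefreshDate = datetimezone],
        {{DateTimeZone.SwitchZone(DateTimeZone.LocalNow(), -8, 0)}}
        )
in
    #"Last Refresh Date"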

The DAX model will understand that and will adjust the time shown to the user. Finally, rename the query to “Refresh Time Stamp” and make sure it is set to load. Close Power Query and apply changes.

In DAX, create a new measure:

Last Update = 
     "Last Update on " 
     & UNICHAR(10) 
     & FORMAT(
         MAX('Refresh Time Stamp'[RefreshDate]),
         "MMM DD, YYYY H:MM AM/PM"
     )

Now drop that measure into a card. It will look something like this:

Resize and position the card as desired. If you do not like the forced line feed before the date, get rid of the UNICHAR(10) function. You can also customize the format of the time shown using the codes from this article by Microsoft.

Finally, you can right-click on the Refresh Time Stamp table in Power BI and hide it. The measure will still work, but the table won’t clutter up your list of measures, tables, and fields.

One final bit of minutia. The varCurrentDate variable itself does not adjust itself for DST, so if your report runs between midnight and 3am on the day DST switches, it may be off one hour for that refresh. It will adjust though if it is run after the 3am switch.

Return Row Based on Max Value From One Column when Grouping

When you group data in Power Query, you are able to do a number of aggregations, like count, sum, min, max, etc. The problem is each of those aggregations only operate on that column. Let me show you what I mean. Suppose you have this data:

The Data

You need to return a table to the DAX model (or an Excel table) that only has one record per product, and you want the latest date, and the sales person associated with that date. So for product A, it is June 18, 2020 and Suresh, for B it is June 14, 2020 and Ana, and so on.

The Goal

If you do a Group By in Power Query and group on the product, then select the Max aggregation for the date and sales person, with the Group By dialog box configured as shown below, you get the following table:

20200715 - Incorrect Grouping.png

The first row is correct, but that is just a coincidence. The max value for that row for product A is “Suresh” because that is alphabetically the highest name, starting with the letter S. The second row though you can see is wrong. It picked up Sally, the last name alphabetically in the Sales Person column. Sally is not associated with the June 14 record though, Ana is.

So, how do we fix this? There are two ways. Let’s do the long way first. Change the Group By dialog box by removing the Max aggregation for the name and replacing it with the All Rows aggregation as shown. It will return a new column called “All Rows” that has a nested table.

20200715 - Correct Grouping.png

Looking at the image below, if you click to the right of the word Table you can see at the bottom of your screen the contents of that field. Do not click directly on the word Table itself as that will cause the table to expand. If you do that, just delete the last step Power Query generated to get back to this view. You can see below it has all 3 sales records for product B, and we want to return Ana.

20200715 - All Rows Table Contents.png

You need to add a new column at this point, so go to the Add Column tab and add a custom column. Enter this formula:

= Table.Max([All Rows], "Sale Date")

Table.Max will take the table you give it, which is [All Rows] in this case, and return the record that has the maximum value for the field you specify, which is “Sale Date”. You now have a new column that will have the word Record in yellow. You can see the formula at location 1 below. If you click to the right of it on the Product B row (2), at the bottom of the screen you will see it returned the June 14 record and has Ana’s name (3). We are getting there!

20200715 - the Table Max Result.png

Now just click the Expansion icon in the upper right of the “Max Record” field and select Sales Person.

20200715 - Table Max almost final.png

Now each product record has the correct sales person that corresponds to the latest sales date. At this point, just remove the All Rows column and change the data type of Sales Person to text.

But that is 6 steps. Why can’t we do this in a single grouping step and radically cut down all of the extra steps? We can! But that requires a bit of manual M code. Let’s see how that works. First, let’s take a look at the entire M code for what we just did through the expansion of the final record:

20200715 - M Code Before Tweaking.png

You can ignore the Source line. That is just how Power Query stores data when you use the Enter Data function. I pasted the data in from Excel vs linking directly to Excel so I could share the PBIX file at the bottom of this post.

The second step is “Grouped Rows” and I’ve circled that in Red. I added a lot of line feeds and indentation to make it easier to read. Remember, it isn’t M code unless it is formatted. What we are interested in is what starts on line 9 that is marked “All Rows.” This is how that nested table gets built. We don’t want the nested table though. We just need someone’s name. So, that little bit of code is as shown below:

{"All Rows", each _, type table [Product=nullable text, Sale Date=nullable text, Sales Person=nullable text]}

What this is doing is saying: for each group, return “_”, which is the group’s table. What does that “each _” part mean though? If you want to do a deep dive, you can read more about it at Ken Puls’ site. For our purposes, the _ character is a temporary variable that stores the entire table for the current grouping being passed by the Table.Group function, which is at the Product field level. It is the same as:

{"All Rows", each _}

The rest of the statement just declares the types of the table’s fields. If you want to go down that rabbit hole, be my guest. You will see though that we don’t need that table type information. What we want to do is take that temporary _ variable with our table in it, and return the right sales person for the maximum date. All we have to do is wrap the variable with Table.Max, as follows:

{"All Rows", each Table.Max(_, "Sale Date"), type table [Product=nullable text, Sale Date=nullable text, Sales Person=nullable text]}

Now, instead of a nested table, we’ve jumped straight to the single record we want within the Table.Group function, as shown below. Now we just need to get the sales person’s name.

20200715 - M Code 02 with Table Max Record Showing.png

To get a single value from a field in a record, just append the field name in square brackets, as follows:

{"All Rows", each Table.Max(_, "Sale Date")[Sales Person], type table [Product=nullable text, Sale Date=nullable text, Sales Person=nullable text]}

Our final tweak will be to rename the “All Rows” field to “Sales Person” and get rid of the table type code. That leaves you with the very short statement below. You’ll notice that instead of the type table with the field definitions, I told it to simply set this value to text.

{"Sales Person", each Table.Max(_, "Sale Date")[Sales Person], type text}

20200715 - Final Result - Just Needs Data Type.png

That’s it. The table above is what is returned, and you’ll see that Ana was properly returned for product B. Just set the Sales Person data type to text, and you are done. The full M code is below. You’ll notice that there are only two steps here, the Source step, and the Table.Group step.

20200715 - Final M Code.png
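
If you prefer text to a screenshot, the full query would look something like this sketch. SalesData is a stand-in for the Enter Data source step, whose compressed string I've omitted, and the type date annotation assumes Sale Date is typed as a date:

let
    Source = SalesData,
    #"Grouped Rows" = Table.Group(
        Source,
        {"Product"},
        {
            {"Sale Date", each List.Max([Sale Date]), type date},
            {"Sales Person", each Table.Max(_, "Sale Date")[Sales Person], type text}
        }
    )
in
    #"Grouped Rows"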

The beauty of this is, let’s say you have four columns you need to pull out this way, perhaps including the sales person’s last name, the region, and their commission rate. With the long method above, you have to add a new column with the Table.Max function, then extract the field you need, then expand the record for each one. You’d be well over 20 steps for the entire query.

With this method, you just keep adding columns to the Table.Group function, like this:

{"Last Name", each Table.Max(_, "Sale Date")[Last Name] type text},
{"Region", each Table.Max(_, "Sale Date")[Sales Region] type text},
{"Commission Rate", each Table.Max(_, "Sale Date")[Commission Rate], Percentage.Type}

That is still just two steps! I have confirmed this runs much faster than the multiple step process. I came up with it when a 20K record table was taking far too long to refresh. This updated method took a fraction of the time; it just required a bit of work on my part.

I hope that helps you. The entire example can be found in this PBIX file.

Using List.Contains To Filter Dimension Tables

In order to make a clean user experience for Power BI report consumers, I will limit the data in my Dimension tables to only include data that is in the Fact tables. For example, a customer dimension table might have 1,000 customers in it, but the sales fact table may only have 400 customers in it. This can happen because the sales data is limited to recent years, or specific regions, for a given report.

I don’t like loading up a slicer with dozens or hundreds of items that have no corresponding records. The same would apply if there was no slicer, but the consumer wanted to filter using the Filter pane. So I’ll filter the customer table so it only includes what I would call “active customers” that are shown in the sales table.

The most straightforward way to do this is with an Inner Join between the tables, but there is another way, using the powerful List.Contains() function in Power Query. What makes it so powerful is not just its utility, but that when you run it against data in SQL Server or a similar server, Power Query will fold the statement.

Let me walk you through both methods so it is clear. I’ll be using the WideWorldImportersDW database. If you are a Power BI developer and don’t have a local copy of SQL Server and some test databases on your machine, you are doing yourself a disservice. See this video for quick instructions on getting SQL Server and some test databases up and running. All of the products are free.

There is a Power BI file attached below, but you can easily follow along in a new PBIX file for these examples. For both examples, you’ll need to connect to the Fact Sales and Dimension Stock Item tables. There are 627 items in the Stock Item table, but only 227 unique items in the Fact Sales table. So any slicers or filters would have 400 unnecessary items. Those are what we want to get rid of. First, let’s do the Inner Join method.

Inner Join to Limit Records in the Dimension Stock Item Table

  • First, we need a unique list of items from the Fact Sales table to work with. To do this, right-click on the Fact Sales table, and create a reference. Right-click on that newly created query and make sure “Enable Load” is not checked. This table will not be loaded to the data model.

  • Right-click on the Stock Item Key field in this query and select Remove Other Columns. This step isn’t necessary, but it makes the query logic a bit easier to understand.

  • Right-click on the Stock Item Key and select Remove Duplicates. Power Query will probably tell you in the lower left you have 227 rows.

  • From the Dimension Stock Item Table, click on the Merge Queries button on the Home Ribbon.

  • Select the Stock Item Key column in the window.

  • Select the Fact Sale (2) table in the dropdown. This is the one we created a few steps ago. Then select the Stock Item Key field.

  • Select “inner” for the Join Kind. Your screen should look like this. You’ll notice it found 227 matches out of the total 627 items in the Dimension Stock Item table.

2020-05-20 08_01_56-.png
  • Press Ok, then expand the field.

  • Right-click on the newly expanded Stock Item Key.1 field and remove it. If you were to delete this field before you expanded it, it would have broken query folding, meaning all of the data would have to be brought back to your Power Query engine and processed locally, or in the on-premises gateway if this were refreshing in the service. By expanding and then deleting the field, you let SQL Server do all of the work for you.

Now your Dimension Item Table has 227 records, which is the same number of unique items sold in the Fact Sales table. And it all folds.
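
For reference, those merge, expand, and remove steps look roughly like this in M (a sketch using this example's query and column names; your generated step names may differ):

#"Merged Queries" = Table.NestedJoin(#"Dimension_Stock Item", {"Stock Item Key"}, #"Fact Sale (2)", {"Stock Item Key"}, "Fact Sale (2)", JoinKind.Inner),
#"Expanded Fact Sale (2)" = Table.ExpandTableColumn(#"Merged Queries", "Fact Sale (2)", {"Stock Item Key"}, {"Stock Item Key.1"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Fact Sale (2)", {"Stock Item Key.1"})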

There is another way to do this though, and it has some advantages.

List.Contains() to Limit Records in the Dimension Stock Item Table

  • The first few steps are the same:

    • Right-click on the Fact Sales table, and create a reference. Make sure it is not set to load to the data model. I’m going to rename this query ActiveItemList to make things easier, because we’ll be manually typing it in later.

    • Remove all but the Stock Item Key fields.

    • Remove duplicates. You should have 227 records.

  • Now we need to convert this to a list. This is actually simple. Right-click on the Stock Item Key field, and select Drill Down. This is necessary because List.Contains operates on lists as its name implies, not tables, fields, etc.

2020-05-20 08_10_28-.png
  • Now you have the 227 items in a list, not a table. The Drill Down command just added the field name in brackets to the table name, so you will see something like this in the formula bar:

= #"Removed Duplicates"[Stock Item Key]
  • Now we are ready to filter our Dimension Stock Item table. Go back to that table. You should delete the Inner Join merge and expansion steps from the previous example.

  • Filter the Stock Item Key by unchecking any item. Doesn’t matter which one. We just want Power Query to get the formula started for us. You will have something like this in the formula bar:

= Table.SelectRows(#"Dimension_Stock Item", each ([Stock Item Key] <> 2))
  • We are going to remove the ([Stock Item Key] <> 2) part, and make the entire formula look like this:

= Table.SelectRows(#"Dimension_Stock Item", each List.Contains(ActiveItemList, [Stock Item Key]))

List.Contains() takes our list named ActiveItemList and compares it to each value of the [Stock Item Key] field in the Dimension Stock Item table, keeping only the rows that return true, i.e., where a match was found.

You may think this method is not very efficient. And you would be right, except for what Power Query does with this statement when it folds to a SQL Server. If you used this logic on a few hundred thousand rows from a CSV file, SharePoint list, or another source that doesn’t fold, it would take forever. The Inner Join method is much faster in those scenarios. But the Inner Join method has more steps, takes a bit more effort to set up, and the logic of merging in a table only to delete what you merged isn’t immediately clear when your future self is reviewing and modifying the code.

Power Query takes the List.Contains() logic and turns it into a SQL IN operator and lets the server do the work. If you right-click on this step in the Query Properties list, and select View Native Query, you’ll see something like this, which is the SQL statement sent to the server.

2020-05-20 08_28_46-.png

So which is better? It depends. If you have a relatively small data set, say data from Excel, a CSV file, or a smaller SharePoint list, then List.Contains() will do fine. Anything under 10K records, or perhaps a bit more, would be OK. A dataset of any size will work fine with a source that folds the statement, like SQL Server.

There are two reasons I like List.Contains() though and prefer it over an Inner Join when the data set allows it:

  1. The logic is cleaner. I am just filtering a column by a list. Not merging another table then deleting the merge. It is just one step in the Query Properties, a Filtered Rows operation. It is immediately clear what this step is doing.

  2. I can easily create a manual list to filter from if I just want a few items. For example, if I only wanted a few products in the example above, my ActiveItemList could be one of the manual lists below rather than a list generated from another table. The first is a few manually chosen numbers; the second is all items from 200 through 250.

= {200, 201, 202, 210, 254}

= {200..250}

So there you have it. Another way to filter one table from another set of data. See my Power BI file here.

Edit Oct 27, 2020 - to make this run much faster, do the following:

  1. In your list, right-click on the last step of the list and select “Insert Step After”

  2. Whatever is in the formula bar, wrap it with List.Buffer().

This causes the list to be buffered in memory, so Power Query creates the native query almost instantly. The server will run the query at the same speed either way; the savings is in the time it takes Power Query to generate the SQL statement, especially with larger lists.
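
For example, wrapping the drill-down expression from the ActiveItemList query above would look like this:

= List.Buffer(#"Removed Duplicates"[Stock Item Key])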

Thanks to Nathan Watkins on Twitter for this suggestion.

Case Insensitive Merges In Power Query

Power Query is notorious for being case sensitive. Even its language is case sensitive. Often though you get data from users where they are using different cases for the same data. Some never use the shift key, and others CAPSLAP everything. It results in data that looks like this:

2020-04-26 09_49_43-Book1 - Excel.png

There are at least 2 ways to fix this before merging:

  1. Manually fix the data at the source. {shudder}

  2. Use one of the Power Query functions that will adjust the case - Text.Proper, Text.Lower, or Text.Upper.

The first is just unacceptable, unless you are a manager and can tell whoever entered it that way to go fix it. 😉

The second isn’t ideal because neither upper nor lower case is right, and proper case will not correctly adjust names like Joe McGuinness. Text.Proper("Joe McGuinness") returns Joe Mcguinness.

Unfortunately, the Table.NestedJoin() function documentation offers no obvious way to do a join that ignores case. There is an optional parameter called keyEqualityComparers, but believe it or not, it is never defined anywhere on Microsoft’s site. If you search for it, it only shows up in the option lists of some functions; there is no guidance on how to use it, and it might not help even if there were.

The good news is, in Power Query for both Power BI and Excel, there is a relatively new feature called “Fuzzy Merge.” It allows you to match names like Will, Bill, and William in merges. You have to tweak the sensitivity, or the matches either don’t work or return far too many results.

But there is a way to use Fuzzy Merge to do exact matches, yet ignore case sensitivity.

You can see below if I merge the normal case table to the CAPSLAPPER table, I get no matches. It shows 0 of 10 records matched.

2020-04-26 10_02_52-.png

But if you do the following, you’ll get a perfect match on all records:

  1. Check “Use fuzzy matching…” at the bottom of the Join Kind section

  2. Expand the Fuzzy Matching options. You must do this. If you don’t, you will get a 10 out of 10 match in this sample data, but you will also get other matches that are not exact in a larger data set. For example, if I added a “MERLE BERBER” to the table, it would return as a match for “Merle Barber” - which we do not want.

  3. Change the Similarity Threshold to 1. The default is 0.8, and that is why in step 2 above you’ll get more matches than you want. By setting it to 1, it becomes an exact match.

  4. Check the “Ignore Case” box. This is where the magic happens.

  5. Uncheck the “Match by Combining Parts” box, and ignore the remaining boxes in the Fuzzy Matching Options. They aren’t needed for this.

2020-04-26 10_09_20-.png

Now your merge will show a 10 out of 10 match, without having to change the source data or change the case within Power Query.
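
In M, the merge step ends up looking roughly like the sketch below. The table and column names are placeholders I made up for this illustration; the options record at the end is where the Ignore Case and Similarity Threshold settings land:

= Table.FuzzyNestedJoin(
    #"Normal Case", {"Name"},
    CAPSLAPPER, {"Name"},
    "Matches",
    JoinKind.LeftOuter,
    [IgnoreCase = true, Threshold = 1]
)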

Use Power BI's Enter Data Feature In Excel

EDIT: This article is no longer necessary, as Excel’s version of Power Query now has an Enter Data feature built in. But the article below will show you how the Enter Data feature works, so it still has some value. You do need Excel 2021 or Office 365. If you have an earlier version of Excel, you may still need these steps.

One of the cool things about Power BI is you have a nice “Enter Data” button on the home ribbon where you can quickly add data to your model. The table isn’t dynamic, but sometimes you just need a small table to add descriptions or some other bit of data to finish your model, without creating another table in a source that has to be connected to. Enter Data is perfect for this.

2020-02-23 18_04_10-Query1 - Power Query Editor.png

It looks like the image above. You just start typing the column names, add some data, and name your table. Then merge it with other queries or bring it directly into the data model.

Except that isn’t Power BI. I created that image in Excel! Notice how it is green instead of Power BI’s yellow? How did I do that? With just a little bit of trickery.

First, I created a table using Enter Data in Power BI with just a single column and record. Then I copied the M Code Power BI created. It looks like this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlSKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}})
in
    #"Changed Type"

Now create a new Blank Query in Excel’s Power Query, copy the Power BI code above, and paste it into the Advanced Editor. Then, in that query, select the little gear icon next to the Source step, just like you would in Power BI to edit the table.

2020-02-23 18_08_42-Query1 - Power Query Editor.png

That’s it. The Create Table dialog shows up and you can add and change data and records however you like. I keep that code snippet in Evernote so I can do this in Excel anytime I need to, without firing up Power BI to get the code. Power Query is just storing this data as a compressed binary string of characters, and can usually store up to 3,000 cells.
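
If you are curious how that compressed string gets built, the pattern is reversible using the same standard library functions. Here is a minimal sketch, assuming you wanted to encode two rows of one column each; feeding the result back through Binary.FromText, Binary.Decompress, and Json.Document reproduces the rows:

= Binary.ToText(
    Binary.Compress(Json.FromValue({{"A"}, {"B"}}), Compression.Deflate),
    BinaryEncoding.Base64
)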

Why would you do this though? Unlike Power BI, Excel has a place for this kind of data, in a Table in the worksheet you import into Power Query. That is probably why the Excel team didn’t bother to expose this functionality anywhere in Excel. The underlying Power Query tool is very similar to the one in Power BI though, so all of that Create Table code is still there.

Maybe you have some data you want hidden, or at least difficult to find or modify, and don’t want to fool with hiding sheets, locking workbook properties, or setting a worksheet’s hidden property in VBA to xlVeryHidden. All of those work well, but they also make the data harder for you to get at and edit in the future. This method is easy because it stays in the Power Query UI, where you are comfortable working.

Avoid Using Excel XLS Files As A Data Source

Power Query in both Power BI and Excel can use all Excel file formats as a data source, including the old 2003 format that ends in XLS. Most of the time, this works fine, but I recently hit an issue that consistently returned incorrect results.

Microsoft replaced the XLS format over a decade ago when Office 2007 launched with the new .XLSX format, but there are still companies that use the older XLS format. I frequently get data exported from a national payroll service that offers the XLS file format as the only spreadsheet based format.

It turns out that if data is formatted a certain way in the XLS file, Power Query will return the wrong value. Look at the data below.

2020-02-15 14_51_55-Negative Number Format Error  -  Compatibility Mode - Excel.png

Column A has integers 1-4, and then a negative 5. Column B is simply a list of formulas pointing to column A, but it is formatted with the number format 0.00;[Red]0.00, which is a normal format in the Number section. It just shows the number in red with no negative sign. I’m not sure why anyone would format a number this way. A negative number needs to either have a negative sign or be in parentheses, but I’ve seen a lot of people use this, and this national payroll provider’s systems are programmed to use it for all numerical data.

2020-02-15 14_55_16-Negative Number Format Error  -  Compatibility Mode - Excel.png

If the file is a 2007 or later format, this is no problem. But oddly enough, Power Query doesn’t see it that way in the XLS format. Power Query sees the below data when you connect to the file. Here is what it looks like when you start the import process:

In the preview window

In the editor

You can see this is not just a visual issue. I added a column called “Addition” that just adds the values in [Regular Number] to [Special Format]. The positive numbers 1-4 added correctly, but Power Query read the specially formatted -5 as 5, so the addition returned 0. It should be -10.

There is no fix for this. I’ve contacted Microsoft’s Power BI team and they confirmed it, but understandably weren’t going to commit a lot of effort to fixing a file format that was discontinued years before Power BI was first released.

Those of you who are a bit more savvy in working with XLS files might be thinking, “Well, of course. To fully support XLS files you need to have the 2010 Microsoft Access Database Engine files installed.” If you’ve ever worked with Microsoft Access files in Power BI, you have these files. And yes, the ACE engine does have some additional support for XLS beyond what Power Query supports natively, but it does not fix this issue. There are at least two workarounds:

  1. Save the file as an XLSX file or move the tab with the data you need into an existing XLSX file.

  2. Change the format, or better yet, remove all formatting. As long as the negative sign is visible to the human eye, then Power Query will see it as well.

It took me about an hour to figure out why a report I recently wrote was returning bad data. As you can imagine, the number format was the last thing I considered as the cause. I don’t know if this is the only issue with XLS files, but going forward, I am putting filters into my queries that remove XLS files before any File/Combine operations are done with Excel files, and fully documenting processes to save all files as XLSX before storing them in network, SharePoint, or Azure Data Lake folders.
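
If you are pulling files with the folder connector, the file list includes an Extension column, so the filter can be a single step, something like this sketch (your step names will vary):

= Table.SelectRows(Source, each Text.Lower([Extension]) <> ".xls")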

If you have XLS files in your workflow, you should consider doing the same. You never know what data formatting landmine lurks in that ancient file format that Power Query will read incorrectly.

Create A Dynamic Date Table In Power Query

Most Power BI and Power Pivot (Excel) reports require a date table so time intelligence functions can calculate correctly. I’m going to show you how to quickly set up a date table in Power Query, one that will be dynamic. The starting and ending dates will move with your data over time. And because it is in Power Query, it will be an imported table, which is what Power BI loves most; it is more efficient than a table created in DAX with calculated columns.

You can download the Excel sample data and Power BI file at the bottom of this article if you want to follow along step by step.

EDIT: I have a big block of M code you can use to create your own date table at this link - it is an Evernote page. The contents of this blog post are still important though, as they show you how to make that block of code dynamic.

First of all, we’ll create a static table to get the query set up, then modify it so it is dynamic. The fastest way I’ve seen to create a dates table in Power Query is to start with the one line statement shown below, which we’ll put in a Blank Query. In Power Query in Power BI, it is on the home menu, New Source, Blank Query. In Excel, it depends on the version you have.

Office 365 versions and Excel 2019:

  1. Data tab in Excel

  2. Get & Transform Data section of the ribbon

  3. Get Data dropdown menu

  4. From Other Sources

  5. Blank Query

In Excel 2016 and earlier, you will need to open the Power Query window first. This also works in Office 365 and Excel 2019 if you are already in Power Query.

  1. On the Home tab, select New Source

  2. Other Sources

  3. Blank Query

20190317 - Static Date Table.png

Now you have a blank query called Query1, with a Source in the applied steps (#1 in the above image), but nothing is there. In the formula bar, type the following (#2) and press enter:

={Number.From(#date(2018,1,1))..Number.From(#date(2019,12,31))}

Let’s break that down:

  • Lists are defined in Power Query by enclosing them in what I call squiggly brackets, but are also known as braces or curly brackets. { }

  • When used with numerical data, if you have two periods between the first and second number, it will generate a list of all numbers between those numbers. So {1..10} would generate a list of numbers from 1 to 10.

  • My first number is Number.From(#date(2018,1,1)). This returns 43101. That is the same thing you’d get in Excel if you typed 1/1/2018 in a cell, then formatted it as a number.

  • My second number is 43830, the numerical value of December 31, 2019.

  • The total list has 730 numbers, 43101 - 43830, which is two years of data. Perfect. No skipped dates.

Now we need this to be a table of dates, not a list of numbers. Let’s fix that.

  1. Click the “To Table” icon (#4) in the tool bar. In the To Table dialog box, leave the defaults of “None” for the delimiter and “Show as errors” for extra columns. There are no delimiters or extra columns here, so there won’t be any issues.

  2. For the data type where it says ABC/123, click that and change it to Date. This is the tiny box below with the arrow pointing to it in the image below.

  3. Rename the column to Date.

  4. Rename the query to Date.

  5. Right Click on the query and make sure Enable Load is checked.

20190317 - Date Column.png

At this point, it is a valid date table, but quite spartan, and it isn’t dynamic. First let’s make it dynamic so the date ranges move with your data. I’ve created some sample data that I’ll use for this. I want my date table to encompass the earliest data from my Sales table so all of my sales history is covered, and I want the latest dates in the table to handle dates in the Open Purchase Orders table, which has expected receipts of merchandise well into the future.

A couple of rules about Date tables first:

  • The dates must be contiguous. There can be no gaps. Even if your business doesn’t operate on the weekends or holidays, those dates cannot be missing from your table. The way the data was created using a List ensures no dates are skipped.

  • You should almost always include the full year for any date in your table. So if your data has Jan 1, 2019 in it, you should have all 365 days of 2019 in your date table. I am working with calendar years in this example. If you have a fiscal year, that is fine, but your data should still encompass a full fiscal year. So if your year is July 1 through June 30, if you have July 1, 2019 in your dates table, you should have all dates through June 30, 2020, a full fiscal year.

So how do we make this dynamic? We need to figure out the earliest date in the Sales table, since that table will have the oldest sales history, and the latest date, which will be based on the last date in the OpenPOs table.

20190317 - Earliest Date.png
  1. Right-click on the Sales table and make sure “Enable Load” is not checked. You’ll see why in a minute.

  2. Right-click again on the Sales table and select “Reference”. We now have a second query that is identical to the Sales table, because it is simply pointing to the Sales table. Any changes to the Sales table will affect this new query, which is probably called Sales (2) at this point.

  3. Rename Sales (2) to pmEarliestDate. I recommend you put no spaces in the name as that complicates things later on.

  4. Right-Click and ensure “Enable Load” is not checked. Not only is there no need to load this, it will create an error as lists cannot be loaded into Power BI or Power Pivot.

  5. Select the Date column (#1 in the above image)

  6. On the Transform tab, select Date (#2), then Earliest (#3).

Now we have the earliest date as a list, a single item list. In my sample data, it is Jan 2, 2015. Jan 1 was a holiday, so I didn’t expect any data there, but remember, I need the table to cover the full year, so let’s tweak the formula that Power Query generated.

  1. Power Query used the List.Min() function to get the earliest date: List.Min(Source[Date])

  2. We need to extract the year first, so wrap that in Date.Year(). Now it is just 2015.

  3. Now convert it back to a date, starting with January 1. Wrap step 2 with #date(year,1,1). The final formula is:

= #date(Date.Year(List.Min(Source[Date])),1,1)

If you were on a fiscal year, you’d need to do a bit more math to get this to July 1, 2014, for example, but that is still pretty easy to do. You’ve probably figured out by now that #date() is the same thing as DATE() in Excel, taking year, month, and day for the arguments.
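
For example, here is a minimal sketch of that fiscal-year math, assuming a July 1 fiscal year start. With the Jan 2, 2015 earliest date above, it returns July 1, 2014:

= if Date.Month(List.Min(Source[Date])) >= 7
    then #date(Date.Year(List.Min(Source[Date])), 7, 1)
    else #date(Date.Year(List.Min(Source[Date])) - 1, 7, 1)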

Now, we need the last date. I’ll repeat the steps above on my OpenPOs table’s date column, but instead of the Earliest date, I want the Latest date. My final formula is:

= #date(Date.Year(List.Max(Source[Expected Receipt Date])),12,31)

This will give me a date of Dec 31, 2020, since the latest date in that table is Mar 31, 2020. Note that this time Power Query used the List.Max function to get the latest date in the table. Call this query pmLatestDate.

Now I know my first and last dates, and both are calculated dynamically. If the dates in my source tables change, so do these two parameters.

So how do we make the Date table we created use these dates? Remember our original Date table started with this Source line:

={Number.From(#date(2018,1,1))..Number.From(#date(2019,12,31))}

We need to edit that formula a bit. We just get rid of the hard-coded dates and replace them with our dynamically calculated dates. It becomes this:

={Number.From(pmEarliestDate)..Number.From(pmLatestDate)}

Now we have a date table with 2,192 consecutive dates from Jan 1, 2015 to Dec 31, 2020, with leap years accounted for automatically for us.

So why did I create a reference from both the Sales and OpenPOs tables, and ensure those original tables are not loaded? Because now I can tweak the date ranges in those original tables, and the dynamic dates will account for it. So if my actual sales table has history going back to 1995, I can filter that in the original Sales table to 2015 and later if I want, and my actual data and Date table will reflect that.

Same thing with my OpenPOs table. I’ve seen companies put fake data, say Dec 31, 2999, for things like blanket POs, or some future forecast, or whatever. I don’t want a date table or actual data going out that far, so I limit it to realistic date ranges in the OpenPOs table with a filter on the Date column.

To load relevant data, I need to create two more references:

  1. Create a reference from Sales, and call it Sales Data.

  2. Create a reference from OpenPOs, and call it Open Purchase Orders.

  3. Make sure “Enable Load” is checked for both of these.

  4. At this point, in both tables, you can do any further transformations, such as removing columns, grouping data, or whatever you want. Because these transformations are on references to the original Sales and OpenPOs tables, the original tables are unaffected, and the Date table will be unaffected, which is what you want.

It can help to see how the queries are related to each other. Click on the View tab in Power Query, then Query Dependencies. You should see something similar to this:

20190317 - Query Dependencies.png

You can see how the calculation of the earliest date, pmEarliestDate, and the Sales Data query both come from the original Sales query, but changes in the Sales Data query will not affect the date calculation. You can also see from here which queries are loaded into Power BI, and which have the load disabled.

Finally, I need to enhance the Date table. Right now, it is just dates. But to have rich reporting capabilities, we need things like month names and numbers, quarters, years, etc. I’ll create a few as a place to get started:

  1. Select the Date column in the Date table, then on the Add Columns tab, select Date, Month, Month. This gives us the month number.

  2. Select the Date column, Add Columns, Date, Year, Year.

  3. Select the Date column, Add Columns, Date, Month, Name of Month

You get the idea. Now let’s add a few things that you cannot do through the tool bar. We are going to add a column that will give us the Month and Year in the format of MMM-YY, so Jan-15, Feb-15, Mar-15, etc. This is handy for a number of visuals.

  1. Add Column

  2. Custom Column

  3. Name the column MMM-YY

  4. Type in the following formula:

=Text.Start([Month Name],3) & "-" & Text.End(Text.From([Year]),2)

Make sure to set the type to Text so it isn’t the Any data type (the ABC/123 type).

Now, if you know anything about creating charts or tables in Power BI and Excel, that column will sort alphabetically, which is useless, unless you want April to be before January. Of course you don’t. So we need to create a sorting column for this. Add a new column, call it “MMM-YY Sort” and add this formula:

=[Year]*100 + [Month]

This will create a column with 201501, 201502, 201503, etc. to correspond to Jan-15, Feb-15, Mar-15, etc. You can use this column to sort your MMM-YY column. Lastly, change this column to a Whole Number format.

At this point you can add dozens of more columns depending on your date needs. Quarters, weeks, ISO 8601 formatted dates, etc. But in our simple example, this is a really short query, and is fully dynamic. In the Advanced Editor, it would look something like this:

let
    Source = {Number.From(pmEarliestDate)..Number.From(pmLatestDate)},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type date}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Date"}}),
    #"Inserted Month" = Table.AddColumn(#"Renamed Columns", "Month", each Date.Month([Date]), Int64.Type),
    #"Inserted Year" = Table.AddColumn(#"Inserted Month", "Year", each Date.Year([Date]), Int64.Type),
    #"Inserted Month Name" = Table.AddColumn(#"Inserted Year", "Month Name", each Date.MonthName([Date]), type text),
    #"Added MMM-YY" = Table.AddColumn(#"Inserted Month Name", "MMM-YY", each Text.Start([Month Name],3) & "-" & Text.End(Text.From([Year]),2)),
    #"Added MMM-YY Sort" = Table.AddColumn(#"Added MMM-YY", "MMM-YY Sort", each [Year]*100 + [Month]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added MMM-YY Sort",{{"MMM-YY Sort", Int64.Type}, {"MMM-YY", type text}})
in
    #"Changed Type1"

The final step is once this is loaded, close Power Query and set this as your date table for Power BI:

  1. Make sure automatic date logic is off in Power BI. Select File, Options and Settings, Options, Data Load. Uncheck “Auto Date/Time”. Leaving this checked will create all sorts of unnecessary hidden tables and columns. You’ve created a perfect Date table. Don’t let Power BI’s AI mess this up.

  2. In the Visual view, right-click on the Date table, and select Mark as Date Table, then select the Date column.

  3. It will validate it by ensuring you have no skipped or duplicate dates. Press Ok.

Now you can start creating your visuals, measures, etc. in Power BI, or pivot tables in Excel. As data is updated from your source system, your Date table will dynamically expand. This is handy as you would never have to edit the date range in the table and republish the report. The Power BI service will refresh all of the data, including calculating the ever-expanding date range as sales and purchase orders continue to happen into the future.

Below are the two files I used in my example, the PBIX file and the Excel XLSX file with the sample source data.

Power BI Dynamic Date Table

Excel Sample Source Data

Quickly Pad Columns in Power Query with Text

When working with data in Power Query, you often will need to pad either the beginning or ending of a column with specific text, especially if the final result will be data you export to a CSV file for uploading into a different system. Common examples include:

  • Some sort of “number” that needs to be padded with leading zeros.

  • Labels that need to be padded with trailing spaces to ensure the cell contents are a specific length.

This can even be useful in Power BI if a report is designed to be exported by the end user to upload or otherwise use for input into a different system.

In Excel, you can do this in a worksheet cell with a formula. This will create a string 15 characters wide with trailing periods, assuming the text you want to convert to a 15 character string is in cell A1.

=LEFT(A1&REPT(".",15),15)

You can do the same in Power Query with the following formula, but this is not the easiest way.

Text.Start([Labels] & Text.Repeat(".", 15), 15)

The Text.Start() function is equivalent to LEFT() in Excel, and Text.Repeat() corresponds to REPT().

But there is a faster way to do this and the Power Query function name tells you exactly what it is doing.

=Text.PadEnd([Labels],15,".")

Text.PadEnd() will cause the text value, [Labels] in this example, to be padded until it is 15 characters long. I used a period so you could see the results, though you’ll most often use a space. You can see the results are the same for Excel and Power Query.

20190224 - TextPad Start.png

I changed the font in the Power Query window to monospace so you could see the text is the same length with both methods. You can change to monospace on the View tab of Power Query.

Text.PadEnd() has 3 arguments:

  1. The text you are manipulating. This can be a single field, or a formula that generates text.

  2. The length you want your field to be. Note that if the text is longer than the padded length, Text.PadEnd() will not truncate your data to your desired length. It will return the full text with no padding, since the string is already longer than the length you set. (See the sketch after this list for a way to force an exact length.)

  3. The text to pad with, which is optional. If you leave this off, it will use spaces, ASCII code 32. If you use two or more characters, it will result in an error. It must be a single character.
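
If you do need the result to be exactly 15 characters even when the input is longer, one option (my own addition, not something Text.PadEnd() does for you) is to wrap the padded value in Text.Start():

= Text.Start(Text.PadEnd([Labels], 15, "."), 15)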

You can use Text.PadStart() to pad at the beginning. It works exactly the same way as its sister function Text.PadEnd().

=Text.PadStart([Labels],15,"0")

As you can see, you only need to change Text.PadEnd() to Text.PadStart(). Using the Excel-style method, you not only have to switch from LEFT() to RIGHT() (Text.Start() to Text.End() in Power Query), but also change the order of the concatenation, making sure not to mess up the Text.Repeat() function. The Power Query version of that approach would be:

=Text.End(Text.Repeat("0", 15) & [Labels], 15)

20190224 - TextPad Start.png

In this way, Power Query is very much like Excel in that there is usually more than one way to do the same thing. I think using the Text.Pad* functions makes your code more readable, and easier to edit without having to nest multiple functions.

Intellisense in Power BI's Power Query Formula Bar

Late in 2018, the Power BI team added Intellisense to the Advanced Editor for queries in Power Query, only in Power BI. This has not yet come to Power Query in Excel.

In the February 2019 version of Power BI, that Intellisense now comes to the formula bar.

20190214 - Intellisense In M Language.png

To see this you have to enable two things:

  1. Turn on the Formula Bar in the Layout section of the View ribbon in Power Query. That should always be on, by the way. Working without it is like disabling the formula bar in Excel so you cannot see the cell contents.

  2. In Power BI, go to File|Options & Settings|Options, and scroll down to the Preview Features. Check the “M Intellisense” checkbox. You’ll need to restart Power BI for this to take effect.

It still doesn’t have Intellisense in the Custom Column formula box, but we are getting closer. I suspect this feature came as a result of the recently announced “Improved Python & R Script Editor” in Power BI this month.