Avoid Using Excel XLS Files As A Data Source
/Power Query in both Power BI and Excel can use all Excel file formats as a data source, including the old 2003 format that ends in XLS. Most of the time, this works fine, but I recently hit an issue that consistently returned incorrect results.
Microsoft replaced the XLS format over a decade ago when Office 2007 launched with the new .XLSX format, but there are still companies that use the older XLS format. I frequently get data exported from a national payroll service that offers the XLS file format as the only spreadsheet based format.
It turns out that if data is formatted a certain way in the XLS file, Power Query will return the wrong value. Look at the data below.
Column A has integers 1-4, and then a negative 5. Column B is simply a list of formulas pointing to column A, but it is formatted with the number format 0.00;[Red]0.00 which is a normal format in the Number section. It just shows the number in red with no negative sign. I’m not sure why anyone would format a number this way. A negative number needs to either have a negative sign or be in parenthesis, but I’ve seen a lot of people use this, and this national payroll provider’s systems are programmed to use for all numerical data.
If the file is a 2007 or later format, this is no problem. But oddly enough, Power Query doesn’t see it that way in the XLS format. Power Query sees the below data when you connect to the file. Here is what it looks like when you start the import process:
You can see this is not just a visual issue. I added a column called “Addition” that just adds the values in [Regular Number] to [Special Format]. The positive numbers 1-4 added correctly, but -5 was added to 5 (instead of -5) returning the incorrect result of 0. It should be -10.
There is no fix for this. I’ve contacted Microsoft’s Power BI team and they confirmed it but understandably weren’t going to commit to spending a lot effort to fix a file format that was discontinued years before Power BI was first released.
Those of you that are a bit more savvy in working with XLS files might be thinking “Well, of course. To fully support XLS files you need to have the 2010 MIcrosoft Access Data Engine files installed.” If you’ve ever worked with Microsoft Access files in Power BI you have these files. And yes, the ACE engine does have some addition support for XLS beyond what Power Query supports natively, but it does not fix this issue. There are at least two workarounds for this:
Save the file as an XLSX file or move the tab with the data you need into an existing XLSX file.
Change the format, or better yet, remove all formatting. As long as the negative sign is visible to the human eye, then Power Query will see it as well.
It took me about an hour to figure out why a report I recently wrote was returning bad data. As you can imagine, the number format was the last thing I would consider to be an issue. I don’t know if this is the only issue with XLS files, but going forward, I am putting filters into my queries that will filter out XLS files before any File/Combine operations are done with Excel files, and fully documenting processes to save all files as XLSX before storing them in network, SharePoint, or Azure DataLake folders.
If you have XLS files in your workflow, you should consider doing the same. You never know what data formatting landmine lurks in that ancient file format that Power Query will read incorrectly.