Excel has a special place in my heart. Excel is kind of a mongrel piece of software…it has accumulated many, many features over time and there are obviously benefits to the hybridization. At the same time, there are ways that Excel just plain works differently than more “purebred” software like Tableau and that can create confusion for new Tableau users. This post is about one of those places that works differently - data types and display formats.
How Excel Sees Data
Data types specify how data is actually stored on disk in memory. Excel basically recognizes three data types: numbers, text, and boolean (TRUE/FALSE) values. We don’t see these in Excel's user interface, we can only evaluate them using TYPE() function.
Display formats specify how the data type is displayed. This is what we see in the number formatting section of the Excel Home ribbon:
So Excel (mostly) hides the details of data types while letting us manipulate the display format as we want to, which in some cases will actually change the datat type. Here are a few examples:
Numbers We can type in a number and then format it as Text, and this isn’t just a formatting change, Excel actually changes the data type to text, while Excel also warns us that we’re storing a number as text.
Zip Codes can be stored as numbers in Special format that will preserve the leading zip codes, for example my office’s is in 04005. We can also store zip codes as text. However, in Excel if we are opening a text file with five-digit zip codes just stored as numbers and don’t explicitly tell Excel to treat those zip codes as text then Excel will trim the leading zeroes. Argh!
Percents In Excel we can just start typing in a number like 40% and Excel will store it as 0.4 and apply percent formatting to it.
Dates We can type in a date like 2/1/17 and Excel displays it as a date with General format. However it is stored as a number data type with the value 42767 that is the number of days since 12/31/1899 (skipping the 2/29/1900 leap year because of either a conscious decision or unconscious mistake made by the Lotus 1-2-3 developers). If I change the date to a Number format then I’ll see the 42767. If I type in dates in a field that is formatted as Text then Excel will display the entered text and actually change the data type to text.
Like many pieces of software Excel tries to hide complexity such as the underlying distinction between data types, however that can lead to problems when we're using the data that's in Excel for another purpose such as building a Tableau visualization.
How Tableau Sees Data
Tableau treats data much more like a database does and in this case there are two things that databases do that are different from Excel:
- Databases require that all records in a given column have the same data type. This is for reasons of performance and economizing data storage sizes.
- Databases explicitly separate data types from display formats. Again, this is done for performance reasons and also because database designers explicitly recognize that formatting is something that is dependent on what we’re querying the data for at a given time.
In Excel we can mix and match data types and display formats however we want…so one cell in a column might be 2/1/17 and be stored as a number and formatted as a date, and the next cell in that column might be 2/1 /17 (a mistyped value) and be stored as text. Or we’ve got a column of numbers formatted in different ways.
How Tableau Reads Excel Files
The data types that Tableau recognizes are decimal numbers, whole numbers, text (also called string), boolean, dates, datetimes, and ( coming in Tableau 10.2 ) a new spatial data type that is called geography.
When Tableau connects to an Excel file using the default (new as of Tableau v8.2) connector it uses a combination of Excel’s data type (number, text, or boolean) and an analysis of first 1000 rows of actual values in a column and hints from the display formatting to determine the default data type for that column. For example, I might have set a column to format as number in Excel but have actually mixed both numbers and text in the values of the column so Tableau will treat it as a text field.
The reason why Tableau keeps it as text is that standard behavior for databases is when a value that fails to properly cast (convert) from one data type to another the new value is set to NULL. So if we cast a column of numbers & text to a number data type then the text values would turn into NULLs; therefore Tableau plays it safe and keeps that column as text.
Another key point in the above paragraph is that Tableau does not automatically set the display format to be Excel’s format, in fact it ignores it. Here are a few examples:
Percent If I’ve defined the percent number format in Excel to be something like 42.7% then in Tableau I’ll see 0.427. In that case we’ll need to set up Tableau’s built-in number formatting to re-apply the desired number format.
Decimal number formatted to whole in Excel If I’ve got a column with decimal numbers but I’ve formatted the column to only show whole numbers Tableau imports the raw decimal numbers. If I want to just show whole numbers in Tableau I’ll again need to set up number formatting.
Date formatted as text in Excel In this case Tableau will read in the date as text, and we can change the data type to Date or DateTime thenTableau will likely recognize it if we’re using a common way of representing the date. Tableau v10.2 gets even better with the new Automatic Dateparse feature.
(FYI Tableau’s legacy connector that uses the Microsoft JET driver works differently in recognizing data types and formatting, see http://onlinehelp.tableau.com/current/pro/desktop/en-us/upgrading_connection.html for details on that.)
So the three things I’ll first do when connecting to an Excel file are:
(1) Verify that the data I’m seeing in Tableau matches what is stored in Excel. This can be a little tricky in cases where I’ve used Excel’s number formatting to hide details, I may need to do some reformatting in Excel to see what’s really going on. The way I’ll generally do this is using the data pane in the the Data Source window.
(2) Verify that the data types I have in Tableau are the ones that I ultimately want. Besides the data pane we can use the Manage metadata button in the Data Source window to quickly scroll down and review this:
For example if I’ve got dates stored as text in Tableau and I want to work with them as a date data type and take advantage of Tableau’s built-in date hierarchies then I’ll need to convert them to a date data type. When I do this I’ll need to go through a verification step to make sure everything converts, again using the data pane.
(3) Set the default number or date formatting for each column. We can’t do this from the data source window, once we’re in the main workspace we can right-click on a dimension or measure and choose Default Properties->Number Format... or Default Properties->Date Format… to set the default formatting.
For more info on the kinds of formatting we can do Robert Mundigl at Clearly and Simply has a great post on Custom Number Formats .
That’s an introduction to the differences between data types and display formats between Excel and Tableau. If you have questions on this please ask below, or you can get one on one help from us!