Pragmatic Works Blog

Tableau Webinar Follow Up

Written by Pragmatic Works | Oct 18, 2016

Hi everyone! About a week ago we held a bonus webinar on Tableau presented by Jonathan Drummey from DataBlick. It was a great session and we had tons of questions. Below are answers to many questions we did not get to in the presentation. 


Power BI has a free environment for Developers to learn on. Does Tableau have an equivalent environment?

Great question and one I didn’t get to! Tableau has a free two-week trial download of the full product. They also offer Tableau Public Desktop (http://public.tableau.com/), which can connect to Excel, text files, MS Access, and a number of web data sources and provides the full authoring environment. With Tableau Public Desktop, visualizations with up to 1M rows can be saved to Tableau Public, which is the largest Tableau Server installation in the world.

Besides being browsed over the web, the visualizations on Tableau Public can be downloaded to Tableau Public Desktop or Tableau Desktop. To me this is a huge part of why Tableau has been so successful: just as we can download and view the HTML and JavaScript for a web page to see how it was made, we can download a Tableau visualization that we like, open it up, and see exactly how it was built. It’s an incredible learning tool. For example, two Tableau users (Andy Cotgreave from Tableau and Andy Kriebel from The Information Lab) have started a weekly #MakeoverMonday where on Sunday afternoon an existing public visualization with data is made available and users can remix it and create their own new visualizations; see http://www.makeovermonday.co.uk for more info.

Finally, if you have Tableau Server in-house, you can give users access to web authoring without them needing a Tableau Desktop license. In version 10.0 the web authoring tool has (by my estimate) 80%+ of the functionality of Tableau Desktop, and every new release gets closer to parity with Tableau Desktop.

Power BI or Tableau?  Looks like Power BI can do the same thing as Tableau and is less expensive? 

On the surface they can look the same, but underneath they are different. I suggest you take some time to get to know them both and find out which one is best for your needs. I’ve never found two pieces of software that have 100% overlap in capabilities. FYI, Ken Black is doing an in-depth blog series comparing Power BI to Tableau & Alteryx: https://3danim8.wordpress.com/2016/10/06/impressions-from-our-first-month-of-using-powerbi/. (Full disclosure: I asked Ken to do this as someone I trust to give an honest comparison of the good and not-so-good parts in each tool).

I’d also like to point out a key difference between the companies building the tools: Microsoft has a time-tested (and generally successful) strategy of combining a “mostly good enough” stack of software with aggressive pricing and Power BI is very much about being part of the MS stack. If that works for you, that’s great! On the other hand, Tableau has their eggs in one basket — visual analytics — and has demonstrated an incredible focus on empowering data analysts, and the much larger population of people who do data analysis but don’t have “analyst” in their job title, to do more with their data. So in the long run, I personally trust Tableau to be better at meeting the needs of the people who work with data every day. 

And no matter what, the market for visual analytics is big enough that there’s room for both tools and more, and Microsoft’s entrance into visual analytics will only help everyone (including Tableau) improve their game to the benefit of us as customers and users.

Is Tableau better than QlikView? What if you are dealing with big data? Such as couple TB? Which one performs better?

“Better” is relative; as I noted above about Power BI, different tools will meet different needs. From my perspective the biggest difference between QlikView and Tableau is that QlikView is firstly (though not only) a tool for dashboard construction designed for use by trained developers, while Tableau is firstly (though not only) a tool for interactive analysis of data designed for everyone. QlikView does have more data prep capability built in than Tableau; however, Tableau continues to close that gap with each new release. QlikView’s in-memory model also enables some forms of interactivity to be faster than Tableau’s. The learning curve on QlikView is steeper than for Tableau, and in numerous reports from friends in the business who conduct shootouts for prospective clients, Tableau is quicker to results than QlikView.

Performance is a large topic; I address it in a question below.

Can Tableau be used to replace SSRS for reporting (mainly for redefined reports)?

I’m not sure what is meant by “redefined reports”, so I’ll have to skip that part. Depending on your use of SSRS, Tableau may be able to entirely replace it; however, Tableau is not designed to burst out huge batches of printer-perfect reports. It’s instead designed for interactive analysis via screens (although we can output to PDF, PNG, and CSV). So if you need large bursting capabilities (batches of thousands to tens of thousands of reports or more), then Tableau may not be right for you, although there are third-party tools such as Metric Insights http://metricinsights.com/ and VizAlerts https://github.com/tableau/VizAlerts (full disclosure: I’m one of the contributors to VizAlerts) that might meet your needs. Tableau is also actively adding new subscription & delivery functionality in each new release.

In a broader view, what will you suggest are the advantages of using Tableau over Excel or SSRS and tabular model?

I’ll answer this with a few points:

1) Visual analytics answers questions faster than tabular text because it takes advantage of human biology, in particular how we visually perceive and interpret the world around us. Watch https://tc14.tableau.com/schedule/content/1172 from Tableau for an example; an earlier (static) version of the same presentation is at http://www.slideshare.net/TableauSoftware/visual-analytics-best-practices

2) Tableau has been designed from the ground up to support interactive visual analytics. Besides automatically generating appropriate views based on the science of visual analysis, Tableau’s fundamental user interface model is that we directly manipulate the elements (fields) of our data by dragging and dropping pills in the interface, whereas Excel and SSRS work at least one step removed from that, so we can move faster from question to answer. In addition, tasks that require custom queries, complex MDX or DAX expressions, or nested pivot tables in Excel and SSRS, and interactions such as brushing, highlighting, and filtering that require custom VB coding there, can often be done with a few clicks in Tableau.

100% of the Excel and SSRS users that I’ve shown Tableau to have acknowledged that they could get more done faster in Tableau. (That said, there are still things that Excel and SSRS can do that are not suitable for Tableau).

3) SSRS requires us to know SQL; Tableau generally doesn’t (although knowing SQL can help in some situations). Tableau’s VizQL technology turns the layout of the view (based on dragging and dropping pills and their configuration) into a specification that generates SQL on the fly, so Tableau has advantages in ease of use.

When working with warehouse-size data in a RDBMS with some normalization in the schema, and particularly when considering interactive dashboard visualizations on Tableau Server (so aggregations/summations exist and response times count), how would you recommend using data extracts and/or live connections to balance storage efficiency vs. performance?  Should we be looking at live connections to underlying data models such as star-schemas or denormalized/flat views, or just have Tableau self-manage a whole lot of extracts with joins between them?

That’s a big question; additional factors include the complexity of the visualizations, the desired interactivity, the number of interactors, frequency of interaction, frequency of data updates, network latency, hardware sizing, etc. Also please note that the ability to join separate Tableau data extracts is brand new in Tableau v10.0 and is not available everywhere (for example, we can’t yet join across Tableau Server Published Data Sources that could be data extracts). For those of you new to Tableau: we’ve always been able to join tables within a single connection to a database (or worksheets in an Excel file), and for a long time now we’ve been able to use Tableau’s data blending to join across separate databases/Excel files/web data sources/etc., though data blending has performance limitations (which is part of why Tableau introduced cross-data-source joins in v10.0).

Balancing all these factors is something of an art, so I’m going to have to give you some general principles:

First of all, if a Tableau visualization isn’t fast in Tableau Desktop, it’s very unlikely to be faster on Tableau Server, so tuning your visualizations in Desktop is a good first step.

See http://www.tableau.com/learn/whitepapers/designing-efficient-workbooks for the best set of tips available. This white paper is updated for each major new release of Tableau. Some of the principles I bring up below are discussed in that paper.

Secondly, I avoid trying to build a single “mother of all data sources” to answer every business question. Very often dashboards start out with summary views that might need only thousands of rows of summarized data, and then, once a user has made some selections, a Filter Action in Tableau can bring up a detail view that goes against a much larger data store. The summary view might use an aggregated Tableau data extract while the detail view comes from a live connection to the warehouse.

Finally, where I always attempt to start is with a live connection to the raw data using joins in Tableau. If/when that starts running into performance issues, then I’ll look in the following directions in roughly this order:

- Performance tuning the source of the live connection. We can do Performance Recordings in Tableau Desktop and Tableau Server to identify problematic parts of visualizations and even see the database queries. The queries are also visible in the Tableau log files. If you’ve got a DBA, give those queries to your DBA and put them to work; otherwise you can put yourself to work. :) I’ve gotten 10x-100x performance improvements just by adding some indexes. Having referential integrity set up (or telling Tableau to assume that referential integrity exists) can also speed up views.

- Creating a Tableau data extract (which can be aggregated if need be).

- Creating a database view or stored procedure.

- Using an ETL tool such as Alteryx, Trifacta, Lavastorm, etc. (these can write out Tableau Data Extracts and some tools can even publish them to Tableau Server. If your tool of choice doesn’t do this, Tableau does offer an API for doing this).

- Restructuring the data to support the desired analyses.

- Finally, investing in a high-performance data store that supports analytic queries - this is where the “big data” vendors come into play.
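The indexing point in the first option above is often the cheapest win. As a hedged sketch (plain SQLite standing in for a warehouse, and a made-up orders table), here is how adding an index on a filtered column changes the query plan for the kind of filtered aggregate query Tableau generates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, sales REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "East" if i % 2 else "West", i * 1.5) for i in range(1000)],
)

# The sort of query you might find in a Tableau performance recording or log:
query = "SELECT region, SUM(sales) FROM orders WHERE region = 'East' GROUP BY region"

# Without an index, the planner scans the whole table.
plan_before = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Add an index on the filtered column, as a DBA might after reviewing the logs.
cur.execute("CREATE INDEX idx_orders_region ON orders (region)")
plan_after = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)  # typically a full table scan of orders
print(plan_after)   # now searches using idx_orders_region
```

The same diagnostic loop (capture query, inspect plan, add index, re-check plan) applies to any RDBMS behind a live connection; only the tooling differs.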

Notice that I’m explicitly leaving cube data sources out of this picture of what to do for performance; the basic reason is that cubes are predefined structures that eliminate data we often need for analysis. It’s very rare for organizations to have cubes that are up to date with the needs of the business, particularly when Tableau enters the picture. Tableau’s fundamental design goal is to enable users to quickly ask questions, get answers, and repeat. And since cubes require up-front decisions about what questions can be asked of the data, Tableau users will very often run into the limitations of those earlier decisions and need columns and/or record-level details that were not built into the cube.

Another possible performance issue is the question of row-level security. If row-level security is a requirement, then no matter what you’re using, that’s going to impose an overhead on performance that has to be accounted for. There are cases where we’ve been able to avoid that overhead by building aggregated views in Tableau and then disallowing access to the underlying data and downloads of the visualization (also in Tableau).

Tableau has an open source tool called TabJolt for doing load testing on Tableau Server.  Here’s the link: http://www.tableau.com/about/blog/2015/4/introducing-tabjolt-point-and-run-load-testing-solution-tableau-server-38604

How well does Tableau perform when the tables from the data warehouse are several million records even when aggregated?

Fundamentally, Tableau isn’t going to be any faster querying the data warehouse than directly running the query on the data warehouse, so there’s no substituting for good data warehouse design. Beyond that, see my comments above on performance.

If you have a large dataset, does it take a long time to load?

This question came from someone also asking about QlikView, so I’ll do a comparison as best I can. Tableau only queries the source when necessary for the data needed for the current view (or dashboard). So there’s no time spent loading data into memory before the data is accessible (which is QlikView’s historical model; I’m not sure where QlikView is today). Tableau also tries to combine queries where feasible, and uses several levels of caching to ensure that queries to the data source are minimized.

Also keep in mind that Tableau is generally issuing aggregate queries to the source. So if I’m querying a 1B-row source to get data for 100 products * 4 quarters, that’s only 400 records in the query results; most data sources will return that extremely fast.
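A small sketch of why that matters, with plain Python standing in for the data source (the row counts are invented; the point is the ratio between rows scanned and rows returned):

```python
from collections import defaultdict
from random import randint, seed

seed(42)
# Simulate a row-level source: 100,000 order lines spread across
# 100 products and 4 quarters (a small stand-in for the 1B-row case).
rows = [(f"P{randint(0, 99)}", f"Q{randint(1, 4)}", randint(1, 500))
        for _ in range(100_000)]

# Tableau would issue roughly:
#   SELECT product, quarter, SUM(sales) ... GROUP BY product, quarter
# i.e. the aggregation happens at the source, not on the client.
totals = defaultdict(int)
for product, quarter, sales in rows:
    totals[(product, quarter)] += sales

print(len(rows))    # 100000 rows scanned at the source
print(len(totals))  # only 400 aggregated rows come back to Tableau
```

However large the source grows, the result set stays bounded by the number of product-by-quarter combinations in the view.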

With various clients, I have seen a strong correlation between Hadoop, R, Vertica, and Tableau.  Have you historically seen/encountered this technology set also?

I’ve seen it, and many other combinations as well. I’m a big fan of finding the right tool for the job and the staff (and not reinventing the wheel).

Using SQL to link to Tableau -  In order to get the reports I need, I have to layer a lot of tables...is this a good opportunity to use a piece of software like Trifacta or Alteryx?

I’m not sure what you mean by “layer a lot of tables”. I’m guessing that you are pre-processing and transforming data via scripts or stored procedures from one table into another and ultimately ending up with a Tableau-ready data set. In this case, I’d first suggest looking into Tableau’s aggregation and calculation features such as conditional Sets, Level of Detail expressions, and/or table calculations. I’ve seen cases where clients were pre-processing data whose calculations could have been done at run time in Tableau with more flexibility. If those features don’t meet your needs, then tools like Alteryx and Trifacta are purpose-built for this kind of requirement.

Can Tableau support Write Function? What if you are comparing Actual Sales and Budget, and the user wants to edit Budget figures to see "what if" scenarios?

Tableau does not presently have native support for writing out to data sources, however, Tableau does support run-time user entry of parameters. So if you have some controls that you want to give to users to try out different scenarios, you can combine parameters and filters. Workarounds for write-back exist in a few different forms including using URL Actions from Tableau and Tableau’s Javascript API, among others.
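The parameter approach can be sketched in plain Python: the user never edits the Budget data itself, they just supply a value that a calculated field applies at run time. The field and parameter names below are made up for illustration:

```python
# Row-level data: actual sales and original budget per region (invented).
data = [
    {"region": "East", "actual": 120.0, "budget": 100.0},
    {"region": "West", "actual": 80.0,  "budget": 110.0},
]

def what_if_budget(rows, budget_adjustment):
    """Mimics a Tableau calculated field like
    [Budget] * [Budget Adjustment], where [Budget Adjustment]
    is a user-facing parameter; the source data is untouched."""
    return [
        {**row,
         "adjusted_budget": row["budget"] * budget_adjustment,
         "variance": row["actual"] - row["budget"] * budget_adjustment}
        for row in rows
    ]

# The user drags the parameter control to +10%:
scenario = what_if_budget(data, 1.10)
for row in scenario:
    print(row["region"], round(row["variance"], 1))
```

This is exactly the shape of a Tableau what-if dashboard: the scenario lives in the calculation layer, so different users can explore different adjustments simultaneously without touching the warehouse.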

Can you do something like: Find out lowest 4 Average? 

Yes, we can. We can do top & bottom filters in Tableau in several different ways. In addition, Tableau has a feature called “Sets” where we can build cohorts on the fly and combine them with point and click. In this workbook on Tableau Public I set up a couple of bottom 4 filters (one using a Level of Detail expression since the data is at the level of order item and I wanted to get the average of the sum of sales per order), turned them into Sets, and created a combined Set. Other than a tiny bit of typing (Tableau does tab-completion on calculations) for the calculations and typing for the labels, this was all point and click in Tableau.

Here’s a link to the Bottom 4 by Average Sales view on Tableau Public; you can click through the tabs to see other views: https://public.tableau.com/views/pragmaticworksq-n-a20161006/Bottom4byAvgSales?:embed=y&:display_count=yes
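The logic of that Level of Detail step (average of the sum of sales per order, then bottom 4) can be checked outside Tableau too. A sketch in plain Python with invented order-item data:

```python
from collections import defaultdict

# Order-item level rows: (customer, order_id, sale amount), invented data.
items = [
    ("Alice", 1, 50), ("Alice", 1, 30), ("Alice", 2, 20),
    ("Bob",   3, 10), ("Bob",   4, 10),
    ("Cara",  5, 90),
    ("Dan",   6, 5),  ("Dan",   6, 5),
    ("Eve",   7, 200),
]

# Step 1, like a {FIXED [Order ID] : SUM([Sales])} LOD expression:
# roll item rows up to one total per order.
order_totals = defaultdict(float)
order_owner = {}
for customer, order_id, sale in items:
    order_totals[order_id] += sale
    order_owner[order_id] = customer

# Step 2: average those order totals per customer.
per_customer = defaultdict(list)
for order_id, total in order_totals.items():
    per_customer[order_owner[order_id]].append(total)
avg_per_customer = {c: sum(v) / len(v) for c, v in per_customer.items()}

# Step 3: bottom 4 customers by average order value.
bottom4 = sorted(avg_per_customer, key=avg_per_customer.get)[:4]
print(bottom4)  # the four lowest averages; Eve (200) is excluded
```

The reason the LOD expression is needed is visible in step 1: averaging the raw item rows directly would mix order sizes together, whereas the question asks for the average of per-order sums.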

Can you combine Tree map and Heat map?

I’m not sure what is meant by this since tree maps have a specific layout and heat maps will use the mark size and/or color to indicate heat. If you mean coloring a treemap by another measure, then that’s one click from building a treemap, see this worksheet on Tableau Public: https://public.tableau.com/views/pragmaticworksq-n-a20161006/TreemapwithColor?:embed=y&:display_count=yes

How do you do a grouped/side-by-side bar chart?

There are a few different ways to do this depending on whether the grouped values are made by values of a field (i.e. different records) or different columns, and based on the kind of layout you want. See the Grouped Bars tabs starting with https://public.tableau.com/views/pragmaticworksq-n-a20161006/GroupedBarsdiscrete?:embed=y&:display_count=yes

Can you show what the different end user delivery options are with Tableau and what the end users will see/be able to manipulate?

There are several delivery options:

- Via web browser or Tableau’s iOS/Android apps, where the visualizations are served up by Tableau Server, Tableau Online (which is Tableau available as SaaS), or even Tableau Public. Also note that besides interacting over the web, you can append .pdf, .csv, or .png to any Tableau Server or Tableau Online URL to get a static view of the Tableau viz.

- Email via Tableau’s built-in subscriptions or a third-party add-on.

- Via Tableau Reader, which is a free interactive read-only tool for opening Tableau workbooks similar to what Adobe Acrobat Reader does for PDFs. Users get essentially the same interactivity as they would get over the web.

- Via Tableau Desktop, which can download visualizations from Tableau Server or Tableau Public.

- Embedded into a separate portal or application (see below for a more detailed answer on that).

In terms of what users can see and manipulate, I suggest you check out http://public.tableau.com and look at the visualizations there, plus http://www.tableau.com/stories for many examples from your industry.

Can you embed Tableau reports in your own web app?

There are a couple of ways you can embed Tableau views and dashboards. The simplest (once you have authentication worked out) is to use the fact that Tableau views are accessed via URLs (and we can include URL parameters to change the values of parameters and filters in Tableau) and simply put an iframe or div in your web app that calls the appropriate URL. For more advanced interactivity, you can use the Tableau Javascript API.
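As a hedged sketch of the URL-parameter approach (the server, workbook, and view names below are made up; real filter keys in the query string match your Tableau field names):

```python
from urllib.parse import urlencode

def tableau_embed_url(server, view_path, filters=None, params=None):
    """Build a Tableau Server view URL whose query string sets filters
    and parameter values, suitable for use as an iframe src."""
    query = {":embed": "y"}       # render the view without the portal chrome
    query.update(filters or {})   # e.g. {"Region": "East"} filters that field
    query.update(params or {})    # Tableau parameters are passed the same way
    return f"https://{server}/views/{view_path}?{urlencode(query, safe=':')}"

# Hypothetical workbook/view names:
src = tableau_embed_url("tableau.example.com", "SalesWB/Overview",
                        filters={"Region": "East"})
print(src)
# The web app then renders: <iframe src="{src}" ...></iframe>
```

For anything beyond static filtering (listening to selections, driving the viz from your own controls), the Tableau JavaScript API mentioned above is the right layer.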

What is the best format for dates when using Tableau?  Can it handle SAS dates (i.e. number of days since Jan 1 1960)?

There are a variety of ways to connect to SAS (via SAS ODBC, using a SAS .sas7bdat file, SAS writing out a CSV, etc.). I’ve only connected to SAS data sets that were exported to CSV (so an agreed-upon format was used, such as my personal favorite, ISO 8601 dates: https://en.wikipedia.org/wiki/ISO_8601), so I’m not sure how easy the other options are to use. I am 100% confident that SAS dates can be handled; for example, if a SAS date comes across as an integer, then a Tableau calculated field using a formula such as DATEADD('day', [SASDate], #1960-01-01#) would convert the SASDate field to a Tableau date data type.
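That DATEADD formula is ordinary date arithmetic, so you can sanity-check the logic outside Tableau. A Python equivalent (the SASDate integer is the same day count the formula above consumes):

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS day 0

def sas_to_date(sas_days):
    """Equivalent of Tableau's DATEADD('day', [SASDate], #1960-01-01#)."""
    return SAS_EPOCH + timedelta(days=sas_days)

print(sas_to_date(0))      # 1960-01-01
print(sas_to_date(20745))  # 2016-10-18, the date of this post as it happens
```

The same epoch-plus-offset pattern covers SAS datetime values too; those are counted in seconds since the epoch rather than days, so the offset unit changes but the idea is identical.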

An anecdote: The Tableau developers once did an analysis of tens of thousands of workbooks published to Tableau Public and found over 200 different date & date time formats. If Tableau doesn’t natively recognize the dates as a date datatype, there are several conversion functions such as DATEPARSE(), DATE(), DATETIME(), MAKEDATE(), MAKEDATETIME(), and (referenced above) DATEADD() that we can use.

Is it possible to switch from SQL authentication to Windows authentication for the Tableau web reports?

Yes. There are two levels of authentication to think about when delivering Tableau visualizations over the web: the data source(s) and Tableau itself. Tableau can be configured to use Windows Active Directory or other authentication mechanisms, and there are several options for access to the data source(s). One is pass-through authentication: once a user is authenticated in Windows AD, they can connect to Tableau and seamlessly interact with the visualizations. Another is to prompt the user for data source credentials, a third is to embed credentials, and finally, for some data sources, Tableau can be configured to use its own authentication instead of the user’s.