Category: SAS
Connecting Tableau to SAS Datasets via the SAS OLE DB Provider
One of the topics which generated a lot of interest during my presentation on SAS/Tableau Integration at the recent ‘TCC13’ Tableau Conference, was the idea of connecting to SAS datasets via the SAS OLE-DB Provider. While SAS themselves refused to allow such a presentation at their own customer conference…, it was unexpected as much as rewarding to find an audience of around 150 people at a Tableau conference interested in SAS connectivity!
Currently, Tibco Spotfire is the only 3rd party Data Discovery tool I’m aware of, which natively supports OLE-DB connectivity to SAS datasets. The benefit of OLE-DB is that it does not require access to SAS Server in order to connect to a SAS dataset. This is important, because often SAS datasets are stored locally or on a network share. SAS ODBC requires SAS Server connectivity, which in turn requires SAS client/server connectivity (eg SAS Integration Technologies)
A workaround is to connect to SAS datasets via OLE-DB using Excel as a ‘gateway’. Since the OLE-DB connection definition dialog is fully exposed within Excel, the connection details to the SAS dataset can be set up and tested. Then Tableau can be pointed to the Excel file, through which the SAS data can be retrieved.
Since the OLE-DB connection provides a way to automate the refresh of the SAS data from within Excel, this method can help ensure that the Tableau workbook is kept up-to-date as the underlying SAS data changes.
To follow are the steps to set up the Ole-DB connectivity.
When the Tableau workbook is saved, the SAS data can be refreshed by periodic refresh of the Excel connection. This can be scheduled to ensure that Tableau is retrieving the most up-to-date information from SAS.
Here is a video demonstrating the process described above (sorry, no audio):
Setting up SAS dates for input into Tableau Desktop
Some techniques for setting up integration between the SAS and Tableau solutions, specifically with respect to date formats.
SAS and Tableau are ‘best of breed’ in their own areas – SAS in the area of Analytics and ‘Analytical Data Preparation’, Tableau in the area of data visualization and interactive dashboarding in an intuitive, drag & drop environment. Consequently, it makes sense to find ways to integrate these technologies to provide an Integrated Information Framework which leverages the strengths of both solutions.
Leveraging SAS capabilities in this way provides a way to ‘rapid prototype’ business reporting requirements, without the costs and delays typically seen when attempting to model emerging business requirements in the Data Warehouse via traditional ETL methods.
In addition, this approach suggests a way to protect the investment in analytical reporting as developed by the SAS team, by providing a platform to publish those reports for easy consumption, plus easy re-formatting and ‘slice & dice’ of these reports in the Tableau environment
Example
Typically, the easiest way to prepare SAS data for consumption in Tableau is to set up an export process in SAS Enterprise Guide:
Using the SAS Enterprise Guide Export wizard, a SAS dataset can be exported as CSV, as a step in the process. Dataset is exported to a network location. SAS process looks like this, and can be set up to run on a schedule eg monthly.
In this example, we have a number of dates in our SAS dataset:
Often, the default date format as set up in SAS is adequate to be imported into, and correctly interpreted by, Tableau.
Where this is not the case, the SAS date format ‘DDMMYYS8.’ can be used:
Which would produce SAS code similar to the following:
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_POLICY1 AS
SELECT t1.trandate FORMAT=DDMMYYS8.,
t1.polexpdt FORMAT=DDMMYYS8.,
t1.commdate FORMAT=DDMMYYS8.
FROM WORK.QUERY_FOR_POLICY t1
QUIT;
On importing the dataset into Tableau, the dates are correctly interpreted, automatically providing the ability to drill from Year to Quarter to Month:
Note: While it is generally easier to prepare the data in the most appropriate format for Tableau using the extensive SAS’ data functions and formatting options, there are also date functions within Tableau. For example, to convert a text (string) field containing dates in the format YYYYMMDD into a date within Tableau, the following could be set up as a calculated field:
Date(right([PERIOD],2) + “/” + mid([PERIOD],5,2) + “/” + left([PERIOD],4))
Summary
In addition to the approach described above, another way to ‘prepare’ SAS data for consumption in Tableau involves using the SAS ‘PROC SQL’ method to output SAS results directly to a relational table. With the SAS/ACCESS interface, database objects can be referenced directly in a DATA step or SAS procedure using the SAS LIBNAME statement. PROC SQL can be used to update, delete or insert data into a relational table, for example via Bulk Load.
A/NZ Business Intelligence Survey
Recently, I attended the announcement of the Longhaus Pulse Research on BI and Analytics for 2011. Longhaus Pulse is the most comprehensive and regionally focused assessment of the Business Intelligence and Analytics market in Australia.
One area I found interesting was Longhaus Managing Director Peter Carr‘s observations around the evolution of the Business Intelligence market. Having worked in the BI industry for almost 20 years, this is something I can definitely relate to. Peter compared the current exciting trends around “Next Generation BI” to the prevalent trends of the 1980’s, 1990’s and 2000’s. Right now, Peter observed, trends around “Cloud” and Social Media are “putting the power back into the hands of the knowledge workers”. It’s certainly interesting that the “Next Generation BI” trends are all around Data Visualization, End User Self-Service, and Mobility (Mobile access to information). In other words, empowering the business user. (Funny, I thought we already had that in the 90’s…). Unfortunately, the 2000s became about “BI Standardization”, Portals and the like which were all heavily IT-centric. In the process, BI vendors perhaps lost sight of what BI was all about. Enabling users to get access to the Right Information, at the Right Time, to make Better Decisions. Longhaus talk about the “Digital Divide”, with BI technology outpacing end-user capability/need (just how many ‘Studios’ do I actually need to get access to my information???)
The consequence? Spreadsheets. And more spreadsheets. Spreadsheets everywhere.
The problem with this? Disparate data silos. Poor data integrity. Lack of an audit trail. Lost productivity. Inability to make timely, fact-based decisions.
Which brings me to my recent survey of BI trends in A/NZ. Covering all industries and geographies across A/NZ, I received 230 responses (7% response rate). What became clear from the results was that BI, although assumed to be mature and pervasive, had not solved the Reporting problem in most organisations. For example:
- Disparate Systems and Poor Data Quality are still views as the biggest challenged faced when relying on data to drive performance.
- 90% of respondents state that spreadsheets are still being used as part of the query & reporting process
- More than 1 in 4 spend 5 or more days each month preparing regular reports (ie one week of every month)
- Over 60% rely on others or have no ability to create their own ad-hoc reports
Most surprising of all, 40% are currently evaluating (or re-evaluating) their BI vendors, with a further 26% planning to do so within the next 12 months.
We need to heed the message of frustrated users, and focus on empowering the Knowledge Workers to solve complex business problems. Intuitively. And fast.
For anyone interested in my full survey results report, I can be contacted via email.
Your mailbox is almost full
Less than 2 months after I joined SAS Institute, I was somewhat surprised to see the following email in my inbox:
Your mailbox is almost full.
Your mailbox is approaching the allocated quota limit or has already reached it! Failure to act now may prevent you from sending, and in some cases, receiving any mail.
Conversely, my free Gmail account which I’d had for around 5 years was telling me:
“You are currently using 1229 MB (16%) of your 7491 MB”
In fact, since I started using email back in the mid-90s, I can’t recall ever having such a small email quota as 195Mb. So this got me to thinking…
195 megabytes = 0.1904296875 gigabytes
= 0.0001859664 terabytes
I think SAS has around 11,000 employees, so:
11,000 X 195 megabytes = 2,145,000 megabytes
= 2,094.7265625 gigabytes
= 2.0456314086 terabytes
So, if we were to see an increase in the allocated mailbox quota per employee to a more typical/common-sense 1Gb, this would give:
11,000 X 1 gigabytes = 11,000 gigabytes
= 10.7421875 terabytes
Now, I recently purchased a Seagate Blackarmor 6TB RAID NAS drive for my home network (cost around USD$730 on eBay). This has 4 hot-swappable 1.5TB Seagate drives, which in a RAID-5 configuration gives me around 6TB of useable storage. So, for purposes of storing SAS employee mailboxes, a 16TB NAS drive configured for RAID5 should give at least 12TB effective available storage. For example, I found the following on Amazon:
QNAP 16-Terabyte (16TB) Turbo NAS TS-809U-RP 8-Bay High Performance RAID 0/1/5/6/JBOD Network Attached Storage Server with iSCSI & Dual Redundant Power for Business – Powered by Western Digital WD20EADS 2TB 32MB Cache 7200RPM SATA/300 Hard Drive
This was listed at USD$4,595, which works out to 42c per employee..
Contrast that to the current scenario:
If we estimate a conservative 30 minutes per week wasted archiving emails @ an average employee cost to SAS of $50 per hour per employee = $275,000 per week lost productivity.
Let’s say we replace the network storage drive every year (call that preventative maintenance..) so:
As they say, “Do the Math”…
Footnote:
SAS Institute have subsequently increased the mailbox size limits for employees
Connect
Connect with us on the following social media platforms.