Wow! The TC16 sessions are already available on tclive.tableau.com. Be sure to check them out, especially the sessions from DataBlick.
For this year’s conference I undertook a project with Keith Helfrich to harvest tweets tagged with #data16. We collected the tweets regularly throughout the week and updated a view of high-level summaries and detailed network visualizations. This post details some of the highs and lows we came across and provides access to the workbook so you can do your own analysis and review as well. Please also be sure to check out Keith’s post on the same subject here.
Overall stats obtained from Twitter (as of this post):
- Tweets harvested: 22,081
- Mentions harvested: 32,227
- Tweeters harvested: 3,800
- Topics included: about 28
Note: we harvested a number of times to test how the search API (and our Python code) was responding to our queries. We still have some open-ended questions about that. For now we are publishing the data we could access, with the caveat that the dataset is likely incomplete.
Ronald Sujithan was the main contributor, as he authored the Python code. With the caveat above about the completeness of data gathered through Twitter’s free, public API, we had a pretty solid process for harvesting tweets. Keith was able to run it during sessions throughout the day while still mostly paying attention to the conference.
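Because we harvested several times, successive runs return overlapping tweets that need to be merged before counting. Below is a minimal sketch of that merge, assuming each run is a list of tweet dicts carrying the `id_str` field found in Twitter's v1.1 tweet JSON. It is not Ronald's actual code, and `merge_harvests` is a hypothetical name.

```python
def merge_harvests(runs):
    """Merge several harvest runs, keeping the first copy of each tweet by its ID.

    Assumes each tweet is a dict with a string "id_str" key (as in Twitter v1.1 JSON).
    """
    seen = {}
    for run in runs:
        for tweet in run:
            # setdefault keeps the first occurrence and ignores later duplicates
            seen.setdefault(tweet["id_str"], tweet)
    # dicts preserve insertion order, so tweets come back in first-seen order
    return list(seen.values())


# Hypothetical example: two overlapping harvest runs
run1 = [{"id_str": "1", "text": "a"}, {"id_str": "2", "text": "b"}]
run2 = [{"id_str": "2", "text": "b"}, {"id_str": "3", "text": "c"}]
merged = merge_harvests([run1, run2])
print(len(merged))  # 3
```

Keying on the tweet ID rather than the text keeps distinct tweets that happen to share identical wording.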
Pipeline & Data Prep:
Keith and I met a couple of times before the conference to put our work together and automate the pipeline as much as possible. The result was semi-automated: we ran it a few times during the first few days and then again before this post to produce a final version of the workbook.
This end-to-end process involved four analytics tool sets: Python > Alteryx > R > Tableau. This worked pretty well, though we did encounter a few errors we hadn’t seen before. Troubleshooting required some attention, but we addressed the small issues during the conference without too much stress.
At this point we believe the data set is representative of the TC16 tweet activity, though perhaps occasionally incomplete. As such, it should still provide insight into the conference.
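A Python step like ours typically hands the downstream tools a flat table. Below is a minimal sketch of one such hand-off, assuming tweets in Twitter's v1.1 JSON shape (`user.screen_name`, `entities.user_mentions`, `id_str`). It flattens each @mention into one source/target row of a CSV edge list, the kind of table a network graph is built from. The function names are hypothetical; the post does not say which columns our Alteryx and R steps actually consumed.

```python
import csv
import io


def mention_edges(tweets):
    """Yield (tweeter, mentioned_account, tweet_id) rows, one per @mention."""
    for tweet in tweets:
        source = tweet["user"]["screen_name"]
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            yield source, mention["screen_name"], tweet["id_str"]


def write_edges_csv(tweets, out):
    """Write the mention edge list as CSV with a header row to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["source", "target", "tweet_id"])
    writer.writerows(mention_edges(tweets))


# Hypothetical sample tweets
sample = [
    {"id_str": "1", "user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "tableau"}, {"screen_name": "bob"}]}},
    {"id_str": "2", "user": {"screen_name": "bob"},
     "entities": {"user_mentions": []}},
]
buf = io.StringIO()
write_edges_csv(sample, buf)
print(buf.getvalue())
```

A tweet with no mentions contributes no rows, which is why the mention count (32,227) and the tweet count (22,081) can differ.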
A few insights:
As expected, there was a ton of tweet activity, with spikes during key events, and @tableau was included in about ¼ of the tweets. The network of over 22k tweets is extremely dense, and parsing meaning out of it requires time and interactivity.
With some adjustments to opacity and overall size/layout, we can see the spread of connections with @tableau (the dense, dark band starting from the white Vendor node at the top).
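The "about ¼" figure is a simple ratio: tweets that include @tableau divided by all tweets harvested. Against the 22,081 tweets above, that works out to roughly 5,500 tweets. Below is a minimal sketch of the calculation, assuming "included" means the handle appears in a tweet's v1.1 `entities.user_mentions`. `share_mentioning` is a hypothetical helper, not part of our workbook.

```python
def share_mentioning(tweets, screen_name):
    """Return the fraction of tweets whose user_mentions include screen_name.

    Matching is case-insensitive and ignores a leading "@".
    """
    if not tweets:
        return 0.0
    target = screen_name.lstrip("@").lower()
    hits = sum(
        1
        for t in tweets
        if any(m["screen_name"].lower() == target
               for m in t.get("entities", {}).get("user_mentions", []))
    )
    return hits / len(tweets)


# Hypothetical sample: one of four tweets mentions @tableau
tweets = [
    {"entities": {"user_mentions": [{"screen_name": "tableau"}]}},
    {"entities": {"user_mentions": [{"screen_name": "databrit"}]}},
    {"entities": {"user_mentions": []}},
    {},
]
print(share_mentioning(tweets, "@Tableau"))  # 0.25
```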
#datapluswomen is one of my favorite topics. It is both awesome and unsurprising to see how many people @databrit, @datachloe and several others were able to reach on this topic during the conference. Of course, these Twitter relationships are a subset of the people attending, discussing and supporting #datapluswomen at #data16. I’m thrilled to see this much activity and hope for even more next year.
We can see the same thing happening here with the two Andys and #makeovermonday. The word cloud also shows that the live session’s topic was related to restaurants in Austin.
Francois Ajenstat tweeted about this once, and it got a ton of activity (e.g., retweets). We also see some activity related to the TabPy session in this view. The dense areas in the Hive Plot correspond to the central nodes in the clusters of the network graph.
If you didn’t attend #data16, you may not know that Domo tried to crash the party and brought Flo Rida and Snoop Dogg along to help. We can see this, and we can also see that Domo received minimal engagement from the community (on Twitter at least), especially among vendors/partners and Tableau Zen Masters/Social Ambassadors. Is this an indication that people were loyal to Tableau? You tell me…
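"Central" nodes are, loosely, the accounts that touch the most mention edges. Below is a minimal sketch of one common way to find them: count each account's total degree (mentions sent plus mentions received) over a source/target edge list. This is an illustrative stand-in; the post does not say which centrality measure or layout our R and Tableau steps used, and `degree_counts` is a hypothetical name.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many mention edges touch each account (out-degree + in-degree)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


# Hypothetical edge list of (tweeter, mentioned_account) pairs
edges = [("a", "tableau"), ("b", "tableau"), ("tableau", "c"), ("a", "b")]
degree = degree_counts(edges)
print(degree.most_common(1))  # [('tableau', 3)]
```

Accounts with the highest counts are the ones that pull clusters together in the network graph and stack up edges on their axis in a hive plot.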
Crazy and fun how much traction this got this year!
As expected, Keith and I are central to tweets about our little project; this is the result of our efforts to get it out there and in front of people. Most of the activity appears to be related to that effort, and there is not a ton of discussion going on within the wider community itself. The hive plot shows this: there is little to no activity on the blue node axis, and the dense sections come from the outermost orange nodes (Keith and me). Having said that, we did get a small bump from my mention of the project during my #jedicharts talk with Adam McCann on Wednesday afternoon.
We hope that you enjoy digging around in the data and looking at your own network graphs in the view embedded below.