2. Plot Types
2.1. More Plotting Examples
Alright, let’s now take a shot at creating some of the plots we just learned about.
I’m going to be using the street-trees.csv file as a data source, available in the Google Drive here.
2.1.1. Pie Chart
Let’s say we want to see the proportion of trees planted with root barriers to trees planted without root barriers.
First, make a new worksheet.
Step by Step Instructions
1. Drag the Root Barrier field to the Color icon.
2. Drag the Tree Id field to the Area icon.
3. Convert the Tree Id field to a Count Measure by right-clicking it and selecting from the dropdown menu.
4. Click on the Show Me menu on the top right side of the workspace.
5. Select the pie chart icon.
And voilà! We baked (well, made) a pie chart 🥧!
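If you want to check the numbers behind the pie chart outside Tableau, here is a minimal Python sketch of the same aggregation: count the trees for each Root Barrier value, then turn the counts into proportions. The rows, the `tree_id` and `root_barrier` names, and the Y/N coding are assumptions made up for illustration, not values read from street-trees.csv.

```python
from collections import Counter

# Hypothetical sample rows; the real file has many more trees.
# The "Y"/"N" coding of Root Barrier is an assumption for illustration.
trees = [
    {"tree_id": 1, "root_barrier": "N"},
    {"tree_id": 2, "root_barrier": "N"},
    {"tree_id": 3, "root_barrier": "Y"},
    {"tree_id": 4, "root_barrier": "N"},
]


def barrier_proportions(rows):
    """Count trees per Root Barrier value and convert counts to proportions."""
    counts = Counter(row["root_barrier"] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


proportions = barrier_proportions(trees)
for value, share in sorted(proportions.items()):
    print(f"Root Barrier = {value}: {share:.0%} of trees")
```

Each slice of the pie corresponds to one of these proportions, which is why the slices always add up to the whole.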
2.1.2. Stacked Bars
A pie chart may not be the best visualization to see the proportion of trees planted with root barriers to trees planted without root barriers.
Let’s see what it looks like as a stacked bar chart.
Make a new worksheet or clear the current sheet you are on.
Step by Step Instructions
1. Drag the Tree Id field to the Rows shelf.
2. Convert this field to a Count Measure by right-clicking and selecting the appropriate measure.
3. We can now add the stacking part of this bar chart by dragging Root Barrier to the Color icon in the Marks card.
4. Let’s transpose this graph so it’s a little clearer by clicking the Swap Rows and Columns icon in the toolbar.
Great! Here is our stacked bar chart!
2.1.3. Side-by-Side Bars
This still may not be the right plot for this question. Let’s try a bar plot with the categories side by side.
Step by Step Instructions
1. Drag the Root Barrier field to the Columns shelf.
2. Drag the Tree Id field to the Rows shelf again.
You may have to indicate that you want to Add All Members.
3. Convert the Tree Id field to a Count Measure by right-clicking and selecting the appropriate measure.
4. Let’s add a little bit of colour to this plot. This isn’t a necessary step; we are doing it so this chart is consistent with the previous two.
Now that we’ve done that, which of the three plots (pie, stacked bars, or side-by-side bars) do you prefer?
2.1.4. Scatter Plot
With this particular data source, we don’t really have 2 good continuous numeric columns. To demonstrate how to make a scatter plot, we are going to use what we have and make the best of it.
Let’s plot and see if there is a relationship between the diameter of the trees’ trunks and their height.
Step by Step Instructions
1. First, let’s drag the Height Range Id column to the Columns shelf.
2. Let’s make sure this is a continuous field by right-clicking it and selecting Continuous from the dropdown.
3. Next, drag the Diameter Measure to the Rows shelf.
4. We need to make sure this Diameter field becomes a continuous Dimension as well, which we can do by right-clicking and selecting it from the dropdown.
Great!
2.1.5. Line Graph
We are now interested in answering the question: “How many trees were planted over the years?”
Before you start, let’s make a new worksheet.
Step by Step Instructions
1. Drag the Date Planted field to the Columns shelf and the Tree Id field to the Rows shelf.
2. We are again interested in the number of trees planted at selected dates, so once again we want to transform the Tree Id field to a Count Measure.
3. Since Date Planted is a continuous variable, it’s a good idea to right-click and transform this field into a Continuous Dimension.
This automatically generates the number of trees planted each year (but there are null values!).
4. We can change the YEAR(Date Planted) field to:
MONTH(Date Planted) (top month choice when right-clicking), which aggregates months together across all years.
MONTH(Date Planted) (bottom month choice when right-clicking), which makes a sequential plot.
We are going to stick with the year dimension though!
5. We can add a circle for clarity at each year as part of our line graph by dragging a second Tree Id field to the Rows shelf.
Warning
You may get a popup warning when you do this. I choose Add All Members here, since we are converting the field to a COUNT measure right after.
6. We need to make sure we also convert it to a Count measure.
At first, we should get 2 graphs on top of each other.
7. We can right-click one of them and select “Dual Axis”.
This will superimpose one on another with a left and a right axis title.
8. We can hide the one on the right by right-clicking the axis and unticking the “Show Header” option.
9. In the Marks card, select CNT(Tree Id) (2) and, from the dropdown, select Circle.
Now we have a line plot with points!
10. To change the colour of the line and the points, we need to make sure we change the colour of both measures by selecting the “All” tab under the “Marks” card on the right.
11. Don’t forget to give it a title and edit the y-axis label as we did before!
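To make the difference between the date choices in step 4 concrete, here is a small Python sketch of what each aggregation counts. The dates are invented for illustration, and `None` is only a stand-in for a null Date Planted; this is not how Tableau computes things internally, just the same idea written out.

```python
from collections import Counter
from datetime import date

# Hypothetical planting dates; None stands in for a null Date Planted.
date_planted = [
    date(1999, 4, 12),
    date(1999, 11, 3),
    None,
    date(2004, 4, 20),
    date(2010, 1, 8),
]

known = [d for d in date_planted if d is not None]
null_count = len(date_planted) - len(known)

# YEAR(Date Planted): one count per calendar year.
per_year = sorted(Counter(d.year for d in known).items())

# Top MONTH choice: month number only, so April 1999 and April 2004 are pooled.
per_month_pooled = sorted(Counter(d.month for d in known).items())

# Bottom MONTH choice: (year, month) pairs, which keeps the months in sequence.
per_month_sequential = sorted(Counter((d.year, d.month) for d in known).items())

print("Trees per year:", per_year)
print("Trees per month (pooled):", per_month_pooled)
print("Trees per month (sequential):", per_month_sequential)
print("Rows with no planting date:", null_count)
```

The pooled version answers “which months are popular for planting?”, while the sequential version (like the yearly one) answers “how did planting change over time?”.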
2.1.6. Histograms
Let’s now start practicing making distributions. Tableau doesn’t easily facilitate density plots, so we are going to stick with learning how to make histograms.
Perhaps we are interested in the distribution of tree trunk diameters. Remember, histograms are used to visualize the distribution of a continuous numeric variable.
Step by Step Instructions
1. First, drag the Diameter Measure to the Columns shelf.
2. You can then go to the Show Me menu and click on the Histogram option. Tableau will then assign the correct measures to the shelves and cards.
3. And there you have a histogram! Now, this already seems a little problematic because we didn’t choose the bin size and it’s clear that our distribution is skewed.
It might also be helpful to see this distribution shape without the outliers on the far right and with different bin size.
The majority of the data looks like it’s between 0-50 so let’s make the bin size 2 and limit the axis to 0-50.
4. You’ll notice that Tableau’s been kind and has made us a new continuous dimension named Diameter (bin). Right-click on this new field and click on Edit from the dropdown menu.
This is where we are going to change the bin size.
5. This will result in a popup window where we can change the size of the bins. Let’s go ahead and change it to 2. Remember bin size can cause bias in your plots so be careful when choosing this value. Click OK.
6. Now we can see that our bars are a lot thinner (If only exercising was this easy).
7. Let’s fix the axis range now. You’ll not have to do this often but for this particular problem and question, removing the outliers could give us a bit of a clearer distribution shape.
Right-click the axis we want to limit and from the dropdown click Edit Axis….
8. From the popup, select a Fixed Range and Fixed end at 50 for this plot.
Great!
9. We are going to go one step further and change the tick mark intervals too. Click on the Tick Marks option at the top of the popup window.
10. We can decrease the tick interval to 2 to help make our bar values easier to identify.
And we did it!
We can now see that the majority of trees in Vancouver have a diameter between 2 and 3 cm. We also see that the distribution is strongly skewed to the right.
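The binning in the steps above can be sketched in a few lines of Python. The diameters are invented for illustration, and the `upper=50` cut-off imitates the fixed axis range by leaving larger values out of the counts; in Tableau the fixed range changes what is displayed rather than the data itself.

```python
from collections import Counter

# Hypothetical trunk diameters; 75.0 plays the role of a far-right outlier.
diameters = [1.5, 2.0, 3.0, 3.5, 6.0, 7.9, 12.0, 48.0, 75.0]


def bin_counts(values, bin_size=2, upper=50):
    """Count values per bin, labelling each bin by its lower edge."""
    counts = Counter()
    for value in values:
        if 0 <= value < upper:  # mimic limiting the axis to 0-50
            lower_edge = int(value // bin_size) * bin_size
            counts[lower_edge] += 1
    return dict(sorted(counts.items()))


histogram = bin_counts(diameters)
for lower_edge, count in histogram.items():
    print(f"{lower_edge:>2} to {lower_edge + 2}: {'#' * count}")
```

Try calling `bin_counts(diameters, bin_size=10)` to see how a wider bin hides the detail among the small diameters, which is the bias the bin-size warning in step 5 is about.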
2.1.7. Boxplot
Although there is an option to make boxplots using the Show Me menu, I find that it can often plot things differently than how I want them to. These are the steps I generally take.
Suppose that we want to see the difference between the distributions of trunk diameter for trees planted with root barriers and trees planted without them.
Step by Step Instructions
1. Begin by dragging Root Barrier to the Columns shelf.
2. Next, you’ll want to drag the Diameter field to the Rows shelf. You’ll have a beautiful bar plot now, measuring the sum of all the trees’ diameters for each barrier type.
3. Since we want individual observations for each tree (somewhat), we need to convert the Diameter column to a dimension.
4. Let’s change the mark. Convert the mark from Automatic to Circle.
This will produce a circle for each tree now.
5. This is where we make the box part of our boxplot! Right-click on the axis with the continuous variable (in our case, that’s Diameter). Select the Add Reference Line option.
6. When we select this option, a popup with many different option tabs displays. We want the Boxplot tab!
7. Here we want to “Hide the underlying marks (except outliers)”. The reason we are hiding them, in this case, is because we have THOUSANDS of observations! If our dataset was smaller, it might be a good idea to show all the underlying marks.
8. We can also change the colour of the box Fill to a green palette which goes nicely with our tree theme.
We can now leave this popup screen by clicking OK.
9. Ok, so our outlying observations are rather large right now. Let’s decrease the size.
Ahh, that’s a bit cleaner.
10. We can also change the points to a green colour to go with the rest of the plot. This can be done by clicking the Color icon.
11. This is a completed boxplot! One thing you can do to get a better idea of the distributions is to transpose them.
Ahh, beautiful!
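If you are curious what the box and the outlying points represent, here is a rough Python sketch for one group of trees. The diameters are made up, the 1.5 × IQR rule for flagging outliers is a common convention that I am assuming here, and `statistics.quantiles` may compute quartiles slightly differently from Tableau, so treat the numbers as illustrative only.

```python
from statistics import quantiles

# Hypothetical trunk diameters for one Root Barrier group.
diameters = [2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 40]


def box_summary(values):
    """Return the quartiles and the points outside the 1.5 * IQR fences."""
    q1, q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed whisker rule
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}


summary = box_summary(diameters)
print(summary)
```

The box spans `q1` to `q3`, the line inside it is the median, and only the values in `outliers` would remain visible once the underlying marks are hidden.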
Tip!
When you have multiple boxplots and you want to sort them in some order, using the sorting buttons in the toolbar won’t quite sort them properly or may not sort them how you are intending them to.
The best way to sort your boxplots to some criteria is as follows:
1. Click on the dimension field (here it’s our Root Barrier column) and select Sort… from the dropdown.
2. This will produce a popup window where we select the Nested option to sort our data by.
3. We can select whether we want the field sorted in Ascending or Descending order, choose a field name (Diameter for us), and then choose an Aggregation. We are going to select Median, which is the center line of the boxes in our boxplot.
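The sorting described in this tip amounts to computing one median per category and ordering the categories by it. A minimal Python sketch, using invented diameters and an assumed Y/N coding for Root Barrier:

```python
from statistics import median

# Hypothetical diameters grouped by Root Barrier value.
diameter_by_barrier = {
    "N": [3.0, 5.0, 8.0, 12.0, 30.0],
    "Y": [2.0, 3.0, 3.5, 4.0],
}


def sort_groups_by_median(groups, descending=False):
    """Order the group names by the median of their values."""
    medians = {name: median(values) for name, values in groups.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=descending)


ordered = sort_groups_by_median(diameter_by_barrier)
for name, group_median in ordered:
    print(f"Root Barrier = {name}: median diameter {group_median}")
```

With only two categories the order is easy to see by eye, but the same approach helps once there are many boxes side by side.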
2.1.8. Heatmap
Let’s see what the joint distribution is for the presence of a curb and if the tree has root barriers or not.
This will need a heat map or a heat map with a size channel. Let’s explore the former first.
Step by Step Instructions
1. Drag the first discrete dimension to the Columns shelf. Here we will drag the Root Barrier column.
2. We can then drag our second discrete dimension to the Rows shelf. We will drag the Curb column.
3. To add a count field, we will drag the Tree Id field to the Detail icon in the Marks card. As we have done before, we “Add all members” when prompted by the popup.
4. We now transform this field to a Count Measure by right-clicking and selecting it from the dropdown.
5. Although we already have square marks, let’s solidify this by converting the Automatic mark to a Square mark. This is to make sure nothing is transformed when we add additional fields to our graph.
6. We can include a value in each quadrant by dragging the Tree Id field to the Label icon.
7. We then must convert it to a Count Measure by right-clicking and selecting it accordingly.
Nice!
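The value in each quadrant is simply the number of trees with that combination of Root Barrier and Curb. Here is a small Python sketch of that joint count; the pairs and the Y/N coding are assumptions for illustration, not rows from street-trees.csv.

```python
from collections import Counter

# Hypothetical (root_barrier, curb) pairs, one per tree.
trees = [("Y", "Y"), ("N", "Y"), ("N", "Y"), ("N", "N"), ("Y", "Y")]


def joint_counts(pairs):
    """Count how many trees fall into each (root_barrier, curb) combination."""
    return Counter(pairs)


counts = joint_counts(trees)

# Print the grid with Curb on the rows and Root Barrier on the columns.
for curb in ("N", "Y"):
    row = [counts[(barrier, curb)] for barrier in ("N", "Y")]
    print(f"Curb = {curb}: {row}")
```

A combination that never occurs simply counts as zero, which in the heatmap shows up as an empty or very pale square.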
2.1.9. Heatmap with Size Channel
If we also want to include an area channel in the plot, we can continue from the steps of the heatmap.
1. Here we will add all the counts of the trees by dragging the Tree Id field to the Size icon.
2. As we have seen many times before, we transform the dimension to a Count Measure by right-clicking and selecting from the dropdown menu.
3. The labels seem to ruin the esthetics of this plot, so let’s remove the label field by right-clicking it and selecting Remove.
That’s better. Nice job!
2.2. Quick Quiz
True or False: Sorting a boxplot can be done by using the sort buttons on the toolbar.
True or False: Histograms can be made with a click from the Show Me window.
What column type are the fields used in the Columns and Rows shelves for scatter plots: Continuous or Discrete?
Which of the following fields acts as a hierarchy by default: Row Id, Date Issued, Gender, or Latitude?
What mark shape is needed for a heatmap?
Solutions!
False
True
Continuous
Date Issued
Square