Storytelling with Charts. Part 4 (I): Do you want to show… | by Darío Weitz | Jun, 2023


Part 4 (I): Do you want to show composition?

Darío Weitz
Towards Data Science
Photo by Hiral Parikh on Unsplash

This is the fourth article in a series aimed at helping readers decide which type of chart to use according to the message they want to convey to their particular audience.

The previous three articles focused on the following messages: Article 1, displaying the distribution of a single numerical variable; Article 2, showing the magnitude of a series of numbers; Article 3, comparing items.

The purpose of this article is to describe the charts most commonly used to show Composition. Remember that Composition refers to a Whole that can be divided into individual Parts, and to how each Part relates (in absolute or relative terms) to that Whole. The analysis can be Static (composition at a single moment in time) or Dynamic (changes in composition over time).

Charts frequently used for displaying composition are as follows:

· Pie Charts

· Stacked Bar Charts

· Stacked Area Charts

· Waterfall Charts

· Mekko Charts

· Treemaps

In this article, we concentrate on three of these chart types: Pie Charts, Stacked Bar Charts, and Treemaps. The next article will describe the remaining three.

Pie Charts (PCs) (Figure 1) are circular diagrams divided into wedge-like sectors, used to display the Parts of a Whole for mutually exclusive, non-overlapping categories. The full circle represents the Whole, while the wedges (slices, sectors, segments) represent the Parts. Hence, the full circle must represent the sum of all the data and must always add up to 100%. Numerical data included in one slice must not be included in another because, as noted above, sectors must be mutually exclusive and must not overlap. Conceptually, a pie chart indicates a simple share of the Whole.

Fig.1: a pie chart made by the author with Plotly Express.

PCs encode numerical values through two visual markers: 1) the area of each sector; 2) the arc length of each sector along the perimeter of the circle. Unlike most other charts, the axis and scale of a pie chart are not linear.
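To make the two visual markers concrete, the short sketch below computes, for each slice, its share of the Whole, its central angle, its arc length along the perimeter, and its area. It is plain Python with no plotting library; the function name `pie_sectors` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def pie_sectors(values, radius=1.0):
    """Return (share, angle_deg, arc_length, area) for each slice of a pie chart."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("a pie chart needs non-negative values with a positive total")
    sectors = []
    for v in values:
        share = v / total                  # fraction of the Whole (all shares sum to 1)
        angle = 2 * math.pi * share        # central angle in radians
        arc = radius * angle               # length along the perimeter of the circle
        area = 0.5 * radius ** 2 * angle   # area of the circular sector
        sectors.append((share, math.degrees(angle), arc, area))
    return sectors


# Hypothetical data: four mutually exclusive categories.
for share, deg, arc, area in pie_sectors([45, 30, 15, 10]):
    print(f"{share:6.1%}  {deg:6.1f} deg  arc={arc:.3f}  area={area:.3f}")
```

Both the arc length and the area are proportional to the share, so the encoding itself is exact; the difficulty lies in how accurately people perceive those quantities.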

It is not easy for human beings to visually estimate areas or distances along a curved perimeter. This is the main objection to this type of chart and the origin of an endless controversy: pie charts are very simple to make, and audiences are accustomed to them, but they are very difficult to interpret without annotations and percentages that clarify the context.

Sometimes, the message delivered by PCs can be enhanced using the following alternatives: A1) Donut Charts; A2) Segment Separation.

A1: Donut Charts (Figure 2) are conceptually equivalent to pie charts but differ from them in having a blank space (a hole) in the center of the diagram, where additional information can be displayed to enhance the storytelling.

Fig.2: a donut chart with an annotation made by the author with plotly.graph_objects.

The blank space in the center prevents any comparison of areas, so donut charts have only one visual marker: the numerical value of each sector is encoded solely by its arc length along the perimeter of the circle.

A2: Segment Separation (Figure 3): the message can be enhanced by pulling out, or separating, one segment (or a few) from a standard pie chart or a donut chart.

Fig.3: a donut chart with a segment pulled out. Made by the author with plotly.graph_objects.

Of course, there must be a well-founded reason to justify such a separation because, inevitably, the audience’s attention will be focused on that sector. In addition, the separation introduces a visual distortion that makes direct comparisons with the other sectors difficult.
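Geometrically, pulling a segment out amounts to translating the wedge away from the center along its bisector. In plotly.graph_objects, `go.Pie` exposes this as the `pull` parameter (a fraction of the radius, settable per slice), alongside `hole` for donut charts. The sketch below only computes the translation; the function name, the clockwise drawing order and the 12 o'clock starting point are assumptions made for illustration, not a description of Plotly's internals.

```python
import math


def pulled_offset(values, index, pull=0.1, radius=1.0, start_deg=90.0):
    """Return the (dx, dy) translation applied to slice `index` when it is pulled out.

    Assumptions (illustrative only): slices are drawn clockwise starting at
    `start_deg` (90 degrees = 12 o'clock), and the pulled slice moves along
    its bisector by `pull` * `radius`.
    """
    total = sum(values)
    # Angle at which the selected slice starts, after the slices drawn before it.
    start = start_deg - 360.0 * sum(values[:index]) / total
    sweep = 360.0 * values[index] / total
    bisector = math.radians(start - sweep / 2)  # direction through the middle of the wedge
    distance = pull * radius
    return distance * math.cos(bisector), distance * math.sin(bisector)


# Hypothetical data: pull out the first of four equal slices.
dx, dy = pulled_offset([25, 25, 25, 25], index=0, pull=0.1)
print(f"dx={dx:.4f}, dy={dy:.4f}")
```

Because only the selected wedge moves, its edges no longer line up with its neighbours, which is the source of the visual distortion described above.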

Finally, Pie Charts only show composition at a moment in time (Static Composition). More details about PCs can be found in my previous article.

Stacked Bar Charts (SBCs) (Figure 4) consist of rectangular bars that can be oriented vertically (horizontally). They have two axes: one shows categories, and the other shows numerical values on its corresponding scale. Each bar represents a principal category and is divided into rectangular segments representing the subcategories of a second categorical variable. The numerical value of each subcategory is shown by the height (length) of its segment, and the segments are stacked end to end vertically (horizontally). The total height (length) of each principal bar indicates the total amount for that category (except in 100 Percent stacked bar charts).

Fig. 4: a simple stacked bar made by the author with Matplotlib.
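Behind a Simple SB, each segment starts where the previous one ends. The sketch below computes the bottom and top of every segment from hypothetical data; with Matplotlib, the bottoms of each layer are what you would pass to the `bottom` argument of `Axes.bar`. The function name `stack_segments` and the data are illustrative assumptions.

```python
def stack_segments(data):
    """Compute the (bottom, top) span of every stacked segment.

    `data` maps each principal category to a dict of {subcategory: value};
    segments are stacked end to end in the dict's insertion order.
    """
    layout = {}
    for category, parts in data.items():
        bottom = 0.0
        segments = {}
        for sub, value in parts.items():
            segments[sub] = (bottom, bottom + value)  # this segment spans bottom..top
            bottom += value                           # the next segment starts here
        layout[category] = segments
    return layout


# Hypothetical sales by region (principal category) and product (subcategory).
sales = {
    "North": {"A": 10, "B": 5, "C": 3},
    "South": {"A": 4, "B": 8, "C": 9},
}
for region, segments in stack_segments(sales).items():
    print(region, segments)
```

The top of the last segment equals the category total (18 for North, 21 for South), which is why the height of each principal bar in a Simple SB reads as that total.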

There are two particular types of SBCs: 1) Simple Stacked Bars (Figure 4); 2) 100 Percent Stacked Bars (Figure 5).

Simple SBs stack the absolute value of each subcategory on top of (after) the previous one, whilst 100 Percent SBs stack the percentage of each subcategory. Principal bars in Simple SBs usually have different heights (lengths), whereas all principal bars in 100 Percent SBs have the same height (100%). Use 100 Percent SBs when only relative differences matter, and Simple SBs when both relative and absolute differences matter.

Fig. 5: a 100 Percent stacked bar made by the author with Matplotlib.
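A 100 Percent SB simply rescales each principal bar so that its segments add up to 100. The hypothetical `to_percent` helper below does this, and the sample data (again invented for illustration) show what is lost in the process.

```python
def to_percent(data):
    """Rescale each principal bar so that its subcategories add up to 100 (%)."""
    result = {}
    for category, parts in data.items():
        total = sum(parts.values())
        if total == 0:
            raise ValueError(f"category {category!r} has a zero total")
        result[category] = {sub: 100.0 * value / total for sub, value in parts.items()}
    return result


# Hypothetical data: South's total (40) is twice North's (20).
sales = {
    "North": {"A": 10, "B": 5, "C": 5},
    "South": {"A": 20, "B": 10, "C": 10},
}
shares = to_percent(sales)
print(shares["North"])  # {'A': 50.0, 'B': 25.0, 'C': 25.0}
print(shares["South"])  # {'A': 50.0, 'B': 25.0, 'C': 25.0}
```

Both bars come out identical even though South's total is twice North's: the absolute difference disappears, which is why Simple SBs are preferable when absolute differences matter.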

SBCs excel at showing changes in composition over time (Dynamic Composition). For this type of dynamic analysis, it is essential to use vertically oriented stacked bars, with the time-related variable (days, months, years, temporal ranges) always on the horizontal axis (Figure 6).

Fig. 6: a stacked bar chart made by the author with Matplotlib.

Caution should be exercised regarding the number of stacked sectors and when charting long periods of time. It is advisable not to stack more than four or five sectors on each principal bar. The audience may also get confused when there are too many principal bars, or more than three sectors over very long periods. In such situations, we recommend using stacked area charts when you need to display a lot of temporal data and/or four or more sectors per principal bar.

More details can be found in my previous article.

Treemaps were invented by Ben Shneiderman, professor of Computer Science at the University of Maryland, when he was looking for “a compact visualization of directory tree structures” (#2).

In my own words: “A Treemap is a rectangle-based visualization that allows you to represent a hierarchically-ordered (tree-structured) set of data. The conceptual idea is to compare quantities and show patterns of some hierarchical structure in a physically restricted space. For that purpose, rectangles of different sizes and colors are used to display the dataset from different perspectives. The goal is not to indicate the exact numerical values but to ‘break’ the dataset into its constituent parts and quickly identify its larger and smaller components” (#3).

Fig.7: a Treemap made by the author with Plotly Express.

It was later found that treemaps could serve as an alternative to pie charts for showing a Part-to-Whole relationship. Since the area of every rectangle is directly proportional to the numerical value it represents, they began to be used to indicate relative proportions and differences between Parts. The area of the full rectangle must represent the sum of all the data. Like pie charts, treemaps show composition only at a single moment in time (Static Composition).
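To illustrate how a treemap keeps every area proportional to its value, the sketch below implements the simple slice-and-dice layout: each node's rectangle is cut into strips proportional to its children's totals, alternating vertical and horizontal cuts at each level of the hierarchy. Many tools use other tiling algorithms (for example, squarified layouts that avoid long, thin rectangles), so treat this as an illustration of the principle rather than of how any particular library works. The function names and the nested sample data are hypothetical.

```python
def node_total(node):
    """Value of a leaf, or the sum of all leaves below an internal node."""
    if isinstance(node, dict):
        return sum(node_total(child) for child in node.values())
    return node


def slice_and_dice(node, x=0.0, y=0.0, w=1.0, h=1.0, depth=0, path=()):
    """Return [(path, x, y, w, h)] for every leaf of a nested dict.

    Each child gets a slice of its parent's rectangle proportional to its
    total; the slicing direction alternates with depth (width, then height).
    """
    rects = []
    total = node_total(node)
    offset = 0.0
    for name, child in node.items():
        frac = node_total(child) / total
        if depth % 2 == 0:  # even depth: cut the rectangle into vertical strips
            cx, cy, cw, ch = x + offset * w, y, frac * w, h
        else:               # odd depth: cut it into horizontal strips
            cx, cy, cw, ch = x, y + offset * h, w, frac * h
        offset += frac
        if isinstance(child, dict):
            rects += slice_and_dice(child, cx, cy, cw, ch, depth + 1, path + (name,))
        else:
            rects.append((path + (name,), cx, cy, cw, ch))
    return rects


# Hypothetical hierarchy: regions (Parts) split into countries (sub-Parts).
sales = {
    "Europe": {"France": 30, "Germany": 20},
    "Asia": {"Japan": 25, "India": 15},
    "America": {"USA": 10},
}
for path, x, y, w, h in slice_and_dice(sales, w=10.0, h=10.0):
    print("/".join(path), f"area={w * h:.1f}")
```

On a 10-by-10 canvas whose leaves add up to 100, each leaf's area equals its value, and together the leaves fill the full rectangle.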

Treemaps have two principal advantages over pie charts: 1) they can fit tens or even thousands of Parts into a scheme of nested rectangles in a relatively small space; 2) they encode numerical values as areas, a better visual attribute than arc lengths along the perimeter of a circle.

You must always indicate numerical values with proper annotations, because the absence of a common baseline seriously hinders comparison between the rectangles that make up the Parts.

Fig.8: a Treemap with annotations made by the author with Plotly Express.

More details can be found in my previous article.

We often need to show Composition to our audience, and this Part-to-Whole analysis is not always easy for a particular audience to decode. Therefore, we must first analyze which methods are available and what their advantages and disadvantages are with respect to our data and our message.

As previously indicated, six different types of charts can be used to show composition: Pie Charts; Stacked Bar Charts; Treemaps; Stacked Area Charts; Mekko Charts; Waterfall Charts. Here, we described three of them: their characteristics, their advantages, and some precautions to take into account.

Stay tuned for the following article describing the remaining charts.

References

#1: https://serialmentor.com/dataviz/visualizing-proportions.html

#2: Ben Shneiderman (1992). “Tree visualization with tree-maps: 2-d space-filling approach”. ACM Transactions on Graphics. 11: 92–99. doi:10.1145/102377.115768.

#3: https://medium.com/towards-data-science/treemaps-why-and-how-cfb1e1c863e8

If you found this article of interest, please read any of my 55 previous articles: https://medium.com/@dar.wtz
