Seaborn for data visualization in Python

by Alex
Seaborn is a library for making statistical plots in Python. It is built on top of matplotlib and tightly integrates with pandas data structures. Seaborn helps you explore and understand the data. Its graphing functions work with datasets and perform all the necessary transformations to create informative graphs. Its dataset-oriented syntax allows you to focus on the graphs rather than the details of their construction. The official documentation is in English: https://seaborn.pydata.org/index.html.

Installing seaborn

Official seaborn releases can be installed from PyPI:

pip install seaborn

The library is also part of the Anaconda distribution:

conda install seaborn

The library works with Python version 3.6+. If not already present, these libraries will be installed along with seaborn: numpy, scipy, pandas, matplotlib. Once seaborn is installed, you can load one of the built-in datasets and build a test chart:

import seaborn as sns
df = sns.load_dataset("penguins")
sns.pairplot(df, hue="species")

If you run this code in Jupyter Notebook, the chart appears right below the cell. If you are not working with Jupyter, you may need to call matplotlib.pyplot.show() explicitly:

import matplotlib.pyplot as plt
plt.show()
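
If a chart does not appear, it can help to first confirm that the libraries import cleanly. The snippet below is only a quick sanity check, not part of the original walkthrough; it prints whichever versions happen to be installed in your environment.

```python
import matplotlib
import seaborn as sns

# Print the installed versions to confirm that both libraries import correctly
print("seaborn:", sns.__version__)
print("matplotlib:", matplotlib.__version__)
```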

Let’s take a closer look at building popular types of graphs. All further code will be executed in Jupyter Notebook.

Drawing a Bar Plot in Seaborn

Bar plots show numeric values on one axis and categories on the other. They let you compare the value of a parameter across categories. Bar plots can be used to visualize time series as well as purely categorical data.

Drawing a bar plot

To draw a bar plot in Seaborn, call the barplot() function and pass it the categorical and numeric variables you want to visualize, as in the example:

import matplotlib.pyplot as plt
import seaborn as sns
x = ['A', 'B', 'C']
y = [10, 50, 30]
sns.barplot(x=x, y=y);

In this case, we have several categories in one list: A, B and C. We also have numeric values in another list: 10, 50 and 30. The relationship between the two is visualized as a bar plot by passing both lists to sns.barplot(). The result is a clear and simple bar plot:

[Figure: drawing bar plots in Seaborn]

More often than not, you will be working with datasets that contain much more data than this example. Sometimes you need to sort these datasets, or count how many times a value is repeated. Real data also tends to contain several observations per category. Seaborn handles this for us: barplot() aggregates the values in each category, by default by taking their mean, and draws an error bar around each estimate. Let’s import the classic Titanic dataset and draw a bar plot from it:

# Import data
titanic_dataset = sns.load_dataset("titanic")
# Plotting the bar graph
sns.barplot(x="sex", y="survived", data=titanic_dataset);

In this case, we have assigned the "sex" and "survived" columns to the X and Y axes instead of hard-coded lists. If we display the first rows of the dataset with titanic_dataset.head(), we see a table like this:

   survived  pclass     sex   age  sibsp  parch     fare  ...
0         0       3    male  22.0      1      0   7.2500  ...
1         1       1  female  38.0      1      0  71.2833  ...
2         1       3  female  26.0      0      0   7.9250  ...
3         1       1  female  35.0      1      0  53.1000  ...
4         0       3    male  35.0      0      0   8.0500  ...

Make sure that the column names match the ones you assigned to the x and y arguments. Finally, we pass the dataset itself through the data argument. And we get this result:

[Figure: the Titanic dataset as a bar plot]
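
To make the aggregation step concrete, here is a minimal sketch on made-up numbers (the toy DataFrame and its column names are hypothetical, not taken from the Titanic data). It compares the per-group mean computed with pandas to the bar heights that barplot() draws. The Agg backend line is only there so the sketch also runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data: several observations per group
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "score": [1, 2, 3, 4, 6],
})

# Mean per group, computed directly with pandas
means = toy.groupby("group")["score"].mean()
print(means)

# barplot() performs the same aggregation before drawing the bars
ax = sns.barplot(x="group", y="score", data=toy)
heights = sorted(bar.get_height() for bar in ax.patches)
print(heights)
```

The bar heights should match the pandas means, which is why a Titanic bar for "survived" can be read as a survival rate.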

Drawing a Horizontal Bar Plot

To draw a horizontal bar plot instead of a vertical one, just swap the variables passed to x and y. The categorical variable is then displayed along the y-axis, which produces a horizontal graph:

x = ['A', 'B', 'C']
y = [10, 50, 30]
sns.barplot(x=y, y=x);

The graph will look like this:

[Figure: drawing a horizontal bar plot]

How to change the color in barplot()

Changing the color of the bars is easy. Set the color parameter of the barplot() function, and all the bars will be drawn in the color you specify. Let’s change it to blue:

x = ['A', 'B', 'C']
y = [10, 50, 30]
sns.barplot(x=x, y=y, color='blue');

Then the graph will look like this:

[Figure: changing the color in barplot()]

Or, better yet, set the palette argument, which can take a whole set of colors. A fairly common value for this parameter is hls:

sns.barplot(
    x="embark_town",
    y="survived",
    palette="hls",
    data=titanic_dataset
);

This leads to the following result:

[Figure: changing the color in a bar plot with a palette]
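
To see what a palette name such as hls actually expands to, you can ask Seaborn for the colors directly. This is a small sketch using sns.color_palette(); the choice of three colors is arbitrary and only for illustration.

```python
import seaborn as sns

# Ask for three colors from the "hls" palette
palette = sns.color_palette("hls", 3)

# Each entry is an (r, g, b) tuple with components between 0 and 1
for color in palette:
    print(tuple(round(component, 2) for component in color))
```

A list like this can also be passed to the palette argument in place of the palette name.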

Bar Plot Grouping in Seaborn

Often you want to group the bars in a plot by a single attribute. Let’s say you want to compare some general data, such as passenger survival rates, and group it according to given criteria. We may want to visualize the survival rate by class (first, second, and third), but also take into account the town where the passengers embarked. All this information can be displayed on one bar plot. To group the bars together, we use the hue argument. This argument groups the corresponding data and tells Seaborn how to color the bars. Let’s look at the example we just discussed:

sns.barplot(x="class", y="survived", hue="embark_town", data=titanic_dataset);

We get a graph like this:

[Figure: bar plot grouping in Seaborn]

Configuring the order in which the groups of bars are displayed

You can change the default order of the bars. This is done with the order argument, which takes a list of category values in the order in which they should be placed. So far, the classes have been ordered first through third. What if we want to do the opposite?

sns.barplot(
    x="class",
    y="survived",
    hue="embark_town",
    order=["Third", "Second", "First"],
    data=titanic_dataset
);

We would get a graph like this:

[Figure: configuring the order of the bar groups]

Changing the confidence interval in barplot()

You can also experiment with the confidence interval by specifying the ci argument. For example, you can disable the error bars by setting ci to None, show the standard deviation instead of a confidence interval by setting it to "sd", or add caps to the ends of the error bars by setting capsize. Note that seaborn 0.12 and later deprecate ci in favor of the errorbar argument (for example, errorbar=None or errorbar="sd"); the examples here use ci. Let’s experiment a bit with the confidence interval:

sns.barplot(
    x="class",
    y="survived",
    hue="embark_town",
    ci=None,
    data=titanic_dataset
);

We get this result:

[Figure: changing the confidence interval in barplot()]

Or we can use the standard deviation:

sns.barplot(
    x="class",
    y="survived",
    hue="who",
    ci="sd",
    capsize=0.1,
    data=titanic_dataset
);

[Figure: using the standard deviation]

We have looked at several ways to build a bar plot in Seaborn using examples. Now let’s move on to heatmaps.

Building Heatmaps in Seaborn

Let’s see how we can use Seaborn to create a basic correlation heatmap. For our purposes, we will use the Ames housing dataset available at Kaggle.com. It contains over 30 metrics that can potentially affect property values. Since Seaborn is built on top of the Matplotlib data visualization library, the two are fairly easy to use together. Therefore, in addition to the standard modules, we are also going to import matplotlib.pyplot.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

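The following code computes the correlation matrix between all of the studied indicators, including our variable y (real estate value). Here dataframe is the housing dataset loaded into a pandas DataFrame. With pandas 2.0 or later, corr() no longer drops non-numeric columns automatically, so pass numeric_only=True if the dataframe contains any.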

dataframe.corr()

[Figure: correlation matrix]

This is a correlation matrix with only 13 variables. That’s not to say it’s unreadable. But why not make life easier with visualization?

A simple heat map in Seaborn

sns.heatmap(dataframe.corr());

[Figure: a simple heatmap in Seaborn]

About as beautiful as it is useless. Seaborn is easy to use, but rather difficult to navigate. The library comes with many built-in features and extensive documentation. It can be hard to figure out which arguments to use if you don’t need every possible trick. Let’s make a basic heatmap more useful with a minimum of effort. Take a look at the list of heatmap arguments:

seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)
  • vmin, vmax – sets the range of values that is the basis of a color map (colormap).
  • cmap – defines the particular colormap we want to use (see the seaborn documentation for the full range of color palettes).
  • center – the value at which to center the colormap when plotting diverging data; if cmap is not specified, setting center changes the default colormap.
  • annot – if set to True, numeric correlation values are displayed inside the cells.
  • cbar – if set to False, the color bar (serving as a legend) disappears.

# Increase the size of the heatmap
plt.figure(figsize=(16, 6))
# Save the heatmap object in a variable so you can access it easily,
# when you want to include additional features (like a header display).
# Set the range of values to display on the color map to -1 to 1 and set the annotation (annot) to True,
# to display the numeric correlation values on the heatmap.
heatmap = sns.heatmap(dataframe.corr(), vmin=-1, vmax=1, annot=True)
# Give the heatmap a name. The padding parameter defines the distance of the title from the top of the heatmap.
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize':12}, pad=12);

[Figure: a simple heatmap in Seaborn]

The best way to work with a correlation heatmap is to use a diverging color palette. It has two very different dark (saturated) colors at the two ends of the value range, with a pale, almost colorless midpoint. Let’s illustrate this and deal with one more small detail: how to save the heatmap to a png file with all the necessary x and y labels (xticklabels and yticklabels).

plt.figure(figsize=(16, 6))
heatmap = sns.heatmap(dataframe.corr(), vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize':18}, pad=12);
# Save the map as a png file
# The dpi parameter sets the resolution of the saved image in dots per inch
# bbox_inches, when set to 'tight', does not allow labels to be cropped
plt.savefig('heatmap.png', dpi=300, bbox_inches='tight')

[Figure: heatmap]

Stronger correlations at either end of the spectrum appear as dark (saturated) cells, weaker ones as light cells.
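
Since the housing file is not bundled with this article, the following sketch reproduces the same steps on made-up data so that it can be run as is. The toy DataFrame, its column names, and the random values are all assumptions for illustration; only the heatmap() arguments mirror the ones discussed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: "price" depends strongly on "area", "noise" is unrelated
rng = np.random.default_rng(0)
area = rng.normal(100, 20, 200)
toy = pd.DataFrame({
    "area": area,
    "price": 3 * area + rng.normal(0, 10, 200),
    "noise": rng.normal(0, 1, 200),
})

corr = toy.corr()

plt.figure(figsize=(6, 4))
# Diverging palette centered on 0, with the values annotated in each cell
ax = sns.heatmap(corr, vmin=-1, vmax=1, center=0, annot=True, cmap="BrBG")
ax.set_title("Correlation Heatmap (toy data)")
```

With a diverging palette, the strongly correlated area/price cell should come out saturated, while the cells involving the unrelated column stay pale.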

Triangle correlation heatmap

Take a look at any of the heatmaps above. If you discard one of the halves on either side of the diagonal of ones, you won’t lose any information. So let’s reduce the heatmap, leaving only the lower triangle. The mask argument of heatmap() comes in handy for hiding part of the heatmap. It takes an array of boolean values or a DataFrame. If it is provided, heatmap cells for which the mask value is True are not displayed. Let’s use the np.triu() function from numpy to isolate the upper triangle of a matrix: it keeps the upper triangle and turns all values in the lower triangle into 0. np.tril() does the same for the lower triangle. In turn, np.ones_like() creates an array of ones with the same shape as the correlation matrix, so all the kept values are 1.

np.triu(np.ones_like(dataframe.corr()))

When we convert the data type to boolean, all 1s turn into True and all 0s into False.

plt.figure(figsize=(16, 6))
# Define a mask to set the values in the top triangle to True
# (the np.bool alias was removed in NumPy 1.24; the built-in bool works everywhere)
mask = np.triu(np.ones_like(dataframe.corr(), dtype=bool))
heatmap = sns.heatmap(dataframe.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize':18}, pad=16);

[Figure: triangle correlation heatmap]
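
To see exactly which cells the mask hides, it helps to print it for a small matrix. The 3x3 correlation values below are made up for illustration. The second mask is an optional variation, not used in the article: passing k=1 to np.triu() starts the mask one step above the diagonal, so the diagonal itself stays visible.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix
corr = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

# True marks the cells that heatmap() will hide: the upper triangle and the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)

# Variation: hide only the cells strictly above the diagonal
mask_above_diagonal = np.triu(np.ones_like(corr, dtype=bool), k=1)
print(mask_above_diagonal)
```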

Correlation of Independent Variables with Dependent Variables

Quite often we want to create a colored map that shows the strength of the relationship between each independent variable included in our model and the dependent variable. The following code returns the correlation of each feature with the sale price, the only dependent variable, in descending order.

dataframe.corr()[['Sale Price']].sort_values(by='Sale Price', ascending=False)

Let’s use the resulting list as data to display on the heat map.

plt.figure(figsize=(8, 12))
heatmap = sns.heatmap(dataframe.corr()[['Sale Price']].sort_values(by='Sale Price', ascending=False), vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Features Correlating with Sales Price', fontdict={'fontsize':18}, pad=16);

[Figure: correlation of the independent variables with the dependent variable (features correlating with the sale price)]

These examples demonstrate the basic functionality of the heatmap in Seaborn. Now let’s move on to scatter plots.

Building a Scatter Plot in Seaborn

Let’s look at the process of creating a Scatter Plot in Seaborn. We will build simple and three-dimensional scatter plots, as well as group plots based on FacetGrid.

Importing Data

We will use a dataset based on world happiness. Comparing its index to other indicators will reflect the factors that influence the level of happiness in the world.

Drawing a scatter plot

Let’s plot the relationship between the happiness score and the country’s economy (GDP per capita):


dataframe = pd.read_csv('2016.csv')
sns.scatterplot(data=dataframe, x="Economy (GDP per Capita)", y="Happiness Score");

With Seaborn, it is very easy to make simple graphs like scatter plots. We don’t have to create a Figure object and Axes instances or customize anything. Here we passed the dataframe as the data argument, and the names of the columns we want to visualize as x and y. By default, the axes are labeled with the column names, which correspond to the headers in the loaded file. Below we look at how to change this. After executing the code, we get the following:

[Figure: drawing a scatter plot in Seaborn]

The result shows a positive correlation between GDP per capita and the estimated level of happiness of the inhabitants of a particular country or region.

Constructing a group of scatterplots using FacetGrid

If you want to compare many variables with each other, for example, average life expectancy along with happiness scores and the state of the economy, there is no need to build a 3D graph. Although there are two-dimensional charts that can visualize the relationship between several variables, not all of them are easy to use. With the FacetGrid object, Seaborn lets you split the data and build a group of related graphs from it. Let’s take a look at the following example:

grid = sns.FacetGrid(dataframe, col="Region", hue="Region", col_wrap=5)
grid.map(sns.scatterplot, "Economy (GDP per Capita)", "Health (Life Expectancy)")
grid.add_legend();

[Figure: building a group of scatterplots with FacetGrid]

In this example, we created an instance of the FacetGrid object with the dataframe as the data. When we pass "Region" to the col argument, the library groups the dataset by region and builds a scatterplot for each group. The hue parameter gives each region its own color. Finally, the col_wrap argument limits the width of the figure to 5 charts. Once this limit is reached, the next graphs are placed on a new row. We use the map() method to draw a plot on each facet of the grid. The plot type is passed as the first argument, here sns.scatterplot, followed by the columns that serve as the x and y axes. The result is 10 plots, one for each region, with their corresponding axes. Finally, we call the method that adds a legend explaining the colors.

Drawing a 3D scatter plot

Unfortunately, Seaborn lacks its own 3D engine. Being only an add-on to Matplotlib, it relies on the graphical capabilities of the main library. Nevertheless, we can still apply the Seaborn style to a 3D diagram. Let’s see how it would look with sampling by levels of happiness, economy, and health:

%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
# Apply a Seaborn style so the Matplotlib 3D figure picks up its look
sns.set_style("whitegrid")
df = pd.read_csv('Downloads/2016.csv')
fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')
x = df['Happiness Score']
y = df['Economy (GDP per Capita)']
z = df['Health (Life Expectancy)']
ax.set_xlabel('Happiness')
ax.set_ylabel('Economy')
ax.set_zlabel("Health")
ax.scatter(x, y, z)
plt.show()

Running the code will produce an interactive 3D visualization that can be rotated and scaled in three dimensions.

Customizing the Scatter Plot

With Seaborn you can easily customize the various elements of the created diagrams. For example, it is possible to change the color and size of each point on the chart. Let’s try to set some parameters and see how its appearance changes:

sns.scatterplot(
    data=dataframe,
    x="Economy (GDP per Capita)",
    y="Happiness Score",
    hue="Region",
    size="Freedom"
);

Here we have mapped hue to the regions, which means that the data for each region is drawn in a different color. In addition, using the size argument we have made the size of each point depend on the level of freedom. The larger the value, the larger the dot on the diagram:

[Figure: customizing the scatter plot]

Or you can simply set the same color and size for all points:

sns.scatterplot(
    data=dataframe,
    x="Economy (GDP per Capita)",
    y="Happiness Score",
    color="red",
    s=5  # one marker size for all points; sizes= only takes effect together with size=
);

Great, you’ve learned a few ways to build a scatter plot in Seaborn. Let’s move on to another popular plot.

Drawing a Box Plot in Seaborn

A box plot is also known as:

  • a box-and-whisker plot,
  • a box chart,
  • a spread chart,
  • or a whisker box, after its appearance.

Box plots are used to visualize summary statistics of a dataset. They display distribution attributes such as the range of the data and how the data is distributed within that range (the box, the “whiskers”, and the median).
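
The numbers behind a box plot are easy to compute by hand. The sketch below uses a made-up list of values and assumes the common convention, which is also Seaborn's default (whis=1.5), that whiskers reach at most 1.5 interquartile ranges beyond the box and anything further out is drawn as an outlier.

```python
import numpy as np

# Hypothetical sample with one obviously extreme value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# The box spans the first to the third quartile; the line inside is the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers extend to the furthest points that still lie within these fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are plotted individually as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)            # 3.25 5.5 7.75
print(lower_fence, upper_fence)  # -3.5 14.5
print(outliers)                  # [50]
```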

Data Import

You need continuous numeric data to create a box plot, because such a plot displays summary statistics: median, range, and outliers. As an example, let’s use the forestfires.csv dataset (information about forest litter moisture indices, precipitation, temperature, wind, etc.). We import pandas to load and analyze the dataset, plus seaborn and the pyplot module from matplotlib for visualization:

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

Let’s use pandas to read the CSV file into a dataframe and output the first 5 rows. We also check whether the dataset contains missing values (Null, NaN):

# specify your path to the file
dataframe = pd.read_csv("Downloads/forestfires.csv")
print(dataframe.isnull().values.any())
dataframe.head()

The code will return False and the top of the table.

   X  Y month  day  FFMC   DMC     DC  ISI  temp  RH  wind  rain  area
0  7  5   mar  fri  86.2  26.2   94.3  5.1   8.2  51   6.7   0.0   0.0
1  7  4   oct  tue  90.6  35.4  669.1  6.7  18.0  33   0.9   0.0   0.0
2  7  4   oct  sat  90.6  43.7  686.9  6.7  14.6  33   1.3   0.0   0.0
3  8  6   mar  fri  91.7  33.3   77.5  9.0   8.3  97   4.0   0.2   0.0
4  8  6   mar  sun  89.3  51.3  102.2  9.6  11.4  99   1.8   0.0   0.0

print() returned False, so there are no missing values. If there were, we would have to handle them first. After checking the data, we need to select the features that will be visualized. For convenience, we will store them in variables with the same names.

FFMC = dataframe["FFMC"]
DMC = dataframe["DMC"]
DC = dataframe["DC"]
RH = dataframe["RH"]
ISI = dataframe["ISI"]
temp = dataframe["temp"]

These are the columns that contain continuous numeric data.

The box plot

We will use the boxplot function in Seaborn to create a diagram, passing the variables for the visualization as arguments:

sns.boxplot(x=DMC);

To visualize the distribution of only one attribute, we pass it to the x argument. In this case the plot contains a single horizontal box, and the y-axis carries no data, as seen in the following image.

[Figure: drawing a box plot]

If you want a particular distribution segmented by type, you can pass a categorical variable to x and a continuous variable to y as arguments for the boxplot function.

sns.boxplot(x=dataframe["day"], y=DMC);

Now we have a box plot for each day of the week.

[Figure: drawing a categorical box plot]

If you want to visualize several columns at the same time, the x and y arguments alone will not be enough. For this purpose, use the data argument, which takes a dataset containing the required variables and their values. Create a new dataset containing only the data we want to visualize, then apply the melt() function to it. The resulting dataset is passed to the data argument. The default column names produced by melt() (variable and value) are passed to the x and y arguments in this case:

df = pd.DataFrame(data=dataframe, columns=["FFMC", "DMC", "DC", "ISI"])
sns.boxplot(x="variable", y="value", data=pd.melt(df))

[Figure: visualizing several columns at once]
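
If it is not obvious what melt() does to the table, the following sketch shows it on two columns and two rows, using the DMC and DC values from the first rows of the table above. The variable names wide and long are only illustrative.

```python
import pandas as pd

# Wide form: one column per indicator
wide = pd.DataFrame({"DMC": [26.2, 35.4], "DC": [94.3, 669.1]})

# Long form: one row per (indicator, value) pair,
# with the default column names "variable" and "value"
long = pd.melt(wide)
print(long)
```

Each original column name ends up in the variable column, which is why it can serve as the categorical x axis of the box plot.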

Changing the color of the boxplot

Seaborn automatically assigns different colors to different variables so that you can easily distinguish them visually. You can change the color of the charts by providing your color list. After defining a list of colors as HEX values or names of available Matplotlib colors, you can pass them to boxplot() as a palette argument:

colors = ['#78C850', '#F08030', '#6890F0', '#F8D030', '#F85888', '#705898', '#98D8D8']
sns.boxplot(x=DMC, y=dataframe["day"], palette=colors);

[Figure: changing the color of the boxplot]

Customizing Axis Labels

With Seaborn, you can easily customize the X- and Y-axis labels. For example, you can change the text or the font size, or rotate the labels to make them easier to read.

df = pd.DataFrame(data=dataframe, columns=['FFMC', 'DMC', 'DC', 'ISI'])
boxplot = sns.boxplot(x="variable", y="value", data=pd.melt(df))
boxplot.axes.set_title("Forest fire indicator distribution", fontsize=16)
boxplot.set_xlabel("Indicators", fontsize=14)
boxplot.set_ylabel("Values", fontsize=14);

[Figure: customizing axis labels]

Changing the order in which boxes are displayed

To display the box plots in a certain order, use the order argument, which takes a list of column names in the order in which you want them to be arranged:

df = pd.DataFrame(data=dataframe, columns=["FFMC", "DMC", "DC", "ISI"])
boxplot = sns.boxplot(x="variable", y="value", data=pd.melt(df), order=["DC", "DMC", "FFMC", "ISI"])
boxplot.axes.set_title("Forest fire distribution", fontsize=16)
boxplot.set_xlabel("Indicators", fontsize=14)
boxplot.set_ylabel("Values", fontsize=14);

[Figure: changing the display order of the boxes]

Creating subplots with Matplotlib

If you want to split a common box plot into several plots for individual features, you can do so. Define the figure (fig) and the desired number of axes (axes) using the subplots() function from Matplotlib. Each area of the axes object can be accessed by its index. The boxplot() function takes an ax argument, which tells it on which of these areas to draw:

fig, axes = plt.subplots(1, 2)
sns.boxplot(x=dataframe["day"], y=DMC, orient='v', ax=axes[0])
sns.boxplot(x=dataframe["day"], y=DC, orient='v', ax=axes[1]);

[Figure: creating subplots with Matplotlib]
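
For a version that runs without the forest fires file, here is a sketch on a small made-up DataFrame (the columns day, a and b are hypothetical). It also gives each panel its own title, which the original example leaves out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two numeric columns observed on two days
toy = pd.DataFrame({
    "day": ["mon", "mon", "mon", "tue", "tue", "tue"],
    "a": [1.0, 2.0, 4.0, 2.0, 3.0, 5.0],
    "b": [10.0, 12.0, 11.0, 20.0, 18.0, 25.0],
})

# One row, two columns of axes
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Draw each box plot on its own Axes via the ax argument
sns.boxplot(x="day", y="a", data=toy, ax=axes[0])
sns.boxplot(x="day", y="b", data=toy, ax=axes[1])

axes[0].set_title("a by day")
axes[1].set_title("b by day")
fig.tight_layout()
```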

Box Plot with Scatter Plot

For a clearer view of the distribution, you can overlay a strip plot (a scatter plot for categorical data) on the box plot. To this end, we create two diagrams one after the other. The diagram created by stripplot() will be superimposed on top of the box plot, since they are drawn in the same area:

df = pd.DataFrame(data=dataframe, columns=["DC", "DMC"])
boxplot = sns.boxplot(x="variable", y="value", data=pd.melt(df), order=["DC", "DMC"])
boxplot = sns.stripplot(x="variable", y="value", data=pd.melt(df), marker="o", alpha=0.3, color="black", order=["DC", "DMC"])
boxplot.axes.set_title("Forest fire distribution", fontsize=16)
boxplot.set_xlabel("Indicators", fontsize=14)
boxplot.set_ylabel("Values", fontsize=14);

[Figure: box plot with a scatter plot]

We looked at several ways to build a Box Plot using Seaborn and Python. We also learned how to set up colors, axis labels, and the order of the boxes, how to overlay scatter plots, and how to split the plot into separate plots for individual features. The last type of chart worth mentioning is the Violin Plot.

Building a Violin Plot in Seaborn

Violin plots are used to visualize the distribution of data by displaying the data range, the median, and the shape of the distribution. Like box plots, they show summary statistics. In addition, they include a density plot, which is what gives each violin its shape.

Importing Data

For our example, let’s use the Gapminder dataset, which contains information on population, life expectancy, and other data by country and year since 1952. We import pandas, seaborn, and the pyplot module from matplotlib:


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Next, we load the dataset and see what it consists of.


dataframe = pd.read_csv(
    "Downloads/gapminder_full.csv",
    on_bad_lines="skip",  # replaces error_bad_lines=False, which was removed in pandas 2.0
    encoding="ISO-8859-1"
)
dataframe.head()

The result is:

       country  year  population continent  life_exp     gdp_cap
0  Afghanistan  1952     8425333      Asia    28.801  779.445314
1  Afghanistan  1957     9240934      Asia    30.332  820.853030
2  Afghanistan  1962    10267083      Asia    31.997  853.100710
3  Afghanistan  1967    11537966      Asia    34.020  836.197138
4  Afghanistan  1972    13079460      Asia    36.088  739.981106

Let’s define the features that we are going to visualize. For convenience, we store them in variables with the same names.


country = dataframe.country
continent = dataframe.continent
population = dataframe.population
life_exp = dataframe.life_exp
gdp_cap = dataframe.gdp_cap

Drawing a simple violin diagram

Now that we have loaded the data and chosen the values we want to visualize, we can create a violin plot. Let’s use the violinplot() function, to which we pass the variable to visualize as the x argument. The shape of the violin (the density estimate) is calculated automatically.


sns.violinplot(x=life_exp);

[Figure: drawing a simple violin plot]

Note that you don’t have to select the column beforehand and save it in the life_exp variable. If you pass the dataset to the data argument and the column name "life_exp" to the x argument, you get exactly the same result.


sns.violinplot(x="life_exp", data=dataframe);

Note that in this image, Seaborn plots the distribution of life expectancy for all countries at once, because only the life_exp variable was used. In most cases, such a variable is analyzed in relation to other variables, such as country or continent in our case.

Construction of the Violin Plot with the X and Y axes

In order to get a visualization of data distribution segmented by type, you must use a categorical variable for x and a continuous variable for y as function arguments. There are many countries in this data set. If you plotted all the countries, there would be too many to consider. You could, of course, select a subset of the dataset and just plot, say, 10 countries. Instead, let’s build a violinplot for the continents.


sns.violinplot(x=continent, y=life_exp, data=dataframe);

[Figure: drawing a violin plot with X and Y axes]

Changing the Chart Title and Axis Labels

Suppose we need to change the title and axis labels of our chart to make it easier to analyze. Although Seaborn labels the X and Y axes automatically, you can change the labels after creating the axes object. To set a title, just pass the text to set_title(). To label the axes, use the set() function with the xlabel and ylabel arguments, or the wrapper functions set_xlabel()/set_ylabel():


ax = sns.violinplot(x=continent, y=life_exp)
ax.set_title("Life expectancy by continent")
ax.set_ylabel("Life expectancy")
ax.set_xlabel("Continent");

[Figure: changing the chart title and axis labels]
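
The set() variant mentioned above can be sketched as follows. The toy DataFrame is made up for illustration (its values are loosely based on the rows shown earlier), and set() is the standard Matplotlib Axes method that accepts several properties in one call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data with two groups
toy = pd.DataFrame({
    "continent": ["Asia"] * 4 + ["Europe"] * 4,
    "life_exp": [28.8, 30.3, 32.0, 34.0, 55.2, 59.3, 64.8, 66.2],
})

ax = sns.violinplot(x="continent", y="life_exp", data=toy)

# Set the title and both axis labels in a single call
ax.set(
    title="Life expectancy by continent",
    xlabel="Continent",
    ylabel="Life expectancy",
)
```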

Changing the color of a chart

To change the color of the violins, you can create a list of pre-selected colors and pass this list to violinplot() with the palette parameter:


colors_list = [
    '#78C850', '#F08030', '#6890F0',
    '#A8B820', '#F8D030', '#E0C068',
    '#C03028', '#F85888', '#98D8D8'
]
ax = sns.violinplot(x=continent, y=life_exp, palette=colors_list)
ax.set_title("Life expectancy by continent")
ax.set_ylabel("Life expectancy")
ax.set_xlabel("Continent");

[Figure: changing the color of the violinplot]

Violin Plot with Scatter Diagram

A scatter plot can be overlaid on a violin plot to show where the points that make up the distribution lie. Here we use swarmplot(), a scatter plot for categorical data in which the points do not overlap. To do this, simply create one drawing area, and then create the two diagrams in it in sequence.


colors_list = [
    '#78C850', '#F08030', '#6890F0',
    '#A8B820', '#F8D030', '#E0C068',
    '#C03028', '#F85888', '#98D8D8'
]
plt.figure(figsize=(16, 8))
sns.violinplot(x=continent, y=life_exp, palette=colors_list)
sns.swarmplot(x=continent, y=life_exp, color="k", alpha=0.8)
plt.title("Life expectancy by continent")
plt.ylabel("Life expectancy")
plt.xlabel("Continent");

[Figure: violin plot with a scatter plot]

Changing the style of the violin diagram

You can easily change the style and color of our diagram using set_style() and set_palette(), respectively. Seaborn supports several different options for changing the style and color palette of charts:


plt.figure(figsize=(16,8))
sns.set_palette("RdBu")
sns.set_style("darkgrid")
sns.violinplot(x=continent, y=life_exp)
sns.swarmplot(x=continent, y=life_exp, color="k", alpha=0.8)
plt.title("Life expectancy by continent")
plt.ylabel("Life expectancy")
plt.xlabel("Continent");

[Figure: changing the style of the violin plot]

Constructing a Violin Plot for Different Features

If you want to separate the visualization of the columns from the dataset into their own plots, you can do so. Create a figure and a grid whose cells will hold the plots. The add_subplot() function, which is passed the address of a cell, creates the axes for the corresponding cell. Then create the diagram as usual; it is drawn on the axes that were added last. You can use y=variable or data=variable.


fig = plt.figure(figsize=(6, 4))
gs = fig.add_gridspec(1, 3)
ax = fig.add_subplot(gs[0, 0])
sns.violinplot(data=population)
ax.set_xlabel("Population")
ax = fig.add_subplot(gs[0, 1])
sns.violinplot(data=life_exp)
ax.set_xlabel("Life span")
ax = fig.add_subplot(gs[0, 2])
sns.violinplot(data=gdp_cap)
ax.set_xlabel("GDP volume")
fig.tight_layout()


Grouping violinplots by categorical variable

A really useful feature of violinplot is grouping by the values of a categorical variable. For example, if there is a categorical variable that has two values (usually True/False), you can group the charts by those values. Suppose there is a population employment dataset with an employment column whose values are employed and unemployed. Then you can group the graphs by type of employment. Since the Gapminder dataset has no column suitable for such grouping, we can create one by comparing life expectancy with its average for a certain subset of countries, for example, European countries. We assign a Yes/No value to the new above_average_life_exp column for each record. If the life expectancy is higher than the dataset average, the value is Yes, otherwise No:


# Separate the European countries from the original dataset
# (copy() gives an independent frame, so adding a column to it is safe)
europe = dataframe.loc[dataframe["continent"] == "Europe"].copy()
# Calculate the average value of the "life_exp" variable over the whole dataset
avg_life_exp = dataframe["life_exp"].mean()
# Add the new column and map its True/False values to "Yes"/"No"
europe["above_average_life_exp"] = (
    europe["life_exp"] > avg_life_exp
).map({True: "Yes", False: "No"})

Now, if we output our dataset, we get the following:

    country  year  population continent  life_exp      gdp_cap above_average_life_exp
12  Albania  1952     1282697    Europe     55.23  1601.056136                     No
13  Albania  1957     1476505    Europe     59.28  1942.284244                     No
14  Albania  1962     1728137    Europe     64.82  2312.888958                    Yes
15  Albania  1967     1984060    Europe     66.22  2760.196931                    Yes
16  Albania  1972     2263554    Europe     67.69  3313.422188                    Yes

We can now plot the violin plots, grouped by the new column we inserted. Given that there are many European countries, let’s select the last 50 rows using europe.tail() for easy visualization:


europe = europe.tail(50)
ax = sns.violinplot(x=europe.country, y=europe.life_exp, hue=europe.above_average_life_exp)
ax.set_title("Life expectancy by country")
ax.set_ylabel("Life expectancy")
ax.set_xlabel("Countries");

The result will be:

[Figure: grouping violin plots by a categorical variable]

Now the records with below-average life expectancy are drawn in a different color from those with above-average life expectancy.

Separating violinplots by categorical variable

If you use the hue argument for a categorical variable that has two values, then by applying the split argument in violinplot() and setting it to True, you can split violinplots in half with the hue value. In our case, one side of the violin (the left side) will represent records with above-average life expectancy, while the right side will be used to plot below-average life expectancy:


europe = europe.tail(50)
ax = sns.violinplot(
    x=europe.country,
    y=europe.life_exp,
    hue=europe.above_average_life_exp,
    split=True
)
ax.set_title("Life expectancy by country")
ax.set_ylabel("Life expectancy")
ax.set_xlabel("Countries");

We have looked at several ways to construct a Violin Plot in Seaborn. This is the last type of plot covered here. In this article, we looked at examples of plotting:

  • Bar Plot
  • Scatter Plot
  • Box Plot
  • Heatmap
  • Violin Plot
