Before you start working on a data project, you need to look at the data set. Exploratory Data Analysis (EDA) is an important step, because raw data can be confusing and a lot can go wrong downstream. The Pandas library offers several functions for this task, and `value_counts` is one of them. It returns an object containing the unique values of a Pandas Series together with their counts, in sorted order. However, many people only ever use its default parameters. In this piece, let's see how to get the most out of `value_counts` by changing them.


## What is value_counts()?

The `value_counts()` function returns a Series containing the unique values and their counts. The result is sorted in descending order, so the first item is the most frequent one. NA values are excluded by default.

**Syntax:** `df['your_column'].value_counts()` – returns the count of each unique value in a particular column.

It is important to note that `value_counts` works on a Series, not on a DataFrame. So you need single square brackets, `df['your_column']`, rather than a pair, `df[['your_column']]`.

**Parameters**

- `normalize` (bool, default `False`) – if `True`, return relative frequencies instead of counts.
- `sort` (bool, default `True`) – sort the result by frequency.
- `ascending` (bool, default `False`) – sort in ascending order.
- `bins` (int) – group the values into that many equal-width intervals; only works with numeric data.
- `dropna` (bool, default `True`) – do not include NaN in the counts.
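As a quick sketch of how these parameters behave (using a tiny made-up Series rather than the course data):

```python
import pandas as pd
import numpy as np

s = pd.Series(["a", "b", "a", "a", np.nan])

# Default: counts of unique values, most frequent first, NaN excluded
print(s.value_counts())

# normalize=True: relative frequencies instead of raw counts
print(s.value_counts(normalize=True))

# ascending=True: least frequent value first
print(s.value_counts(ascending=True))

# dropna=False: NaN gets its own row in the result
print(s.value_counts(dropna=False))
```
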

## Load data for demonstration

Let's see how this method works on real data, taking a dataset of Coursera courses from Kaggle as an example. First, we import the required library and load the data, then explore it in a Jupyter notebook.

```
# library import
import pandas as pd
# Loading data
df = pd.read_csv('Downloads/coursea_data.csv', index_col=0)
# check csv data
df.head(10)
```

Check how many records are in the dataset and whether there are any gaps.

```
df.info()
```

The result shows that there are 891 records in the dataset and no missing values.

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 891 entries, 134 to 163
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   course_title              891 non-null    object
 1   course_organization       891 non-null    object
 2   course_Certificate_type   891 non-null    object
 3   course_rating             891 non-null    float64
 4   course_difficulty         891 non-null    object
 5   course_students_enrolled  891 non-null    object
dtypes: float64(1), object(5)
memory usage: 48.7+ KB
```

## 1. value_counts with default parameters

Let's start with the basic use of the function. **Syntax:** `df['your_column'].value_counts()`. Get the count of each value in the "course_difficulty" column. `value_counts` returns the number of occurrences of every unique value, with no gaps. You can see that most courses have the "Beginner" difficulty level, followed by "Intermediate" and "Mixed", with "Advanced" in last place.

```
df['course_difficulty'].value_counts()
---------------------------------------------------
Beginner        487
Intermediate    198
Mixed           187
Advanced         19
Name: course_difficulty, dtype: int64
```

Now it’s time to work with the parameters.

## 2. Sorting in ascending order

By default, `value_counts()` returns data in descending order. You can change this behavior by setting the `ascending` parameter to `True`. **Syntax:** `df['your_column'].value_counts(ascending=True)`.

```
df['course_difficulty'].value_counts(ascending=True)
---------------------------------------------------
Advanced         19
Mixed           187
Intermediate    198
Beginner        487
Name: course_difficulty, dtype: int64
```

## 3. Alphabetical sorting

In certain cases, you may need to sort records alphabetically. This is done by appending `sort_index(ascending=True)` after `value_counts()`. By default, the function sorts "course_difficulty" by the number of matches; with `sort_index` it sorts by the index (the unique values themselves):

```
df['course_difficulty'].value_counts().sort_index(ascending=True)
---------------------------------------------------
Advanced         19
Beginner        487
Intermediate    198
Mixed           187
Name: course_difficulty, dtype: int64
```

If, however, you want to display `value_counts()` in reverse alphabetical order, reverse the sort direction: `.sort_index(ascending=False)`.

## 4. Sort by value and then alphabetically

We use a different dataset for this example.

```
df_fruit = pd.DataFrame({
    'fruit':
        ['persimmons'] * 5 + ['apples'] * 5 + ['bananas'] * 3 +
        ['peaches'] * 3 + ['carrots'] * 3 + ['apricots'] + ['mango'] * 2
})
```

So, we need the output sorted first by the number of matches and then alphabetically. This can be done by chaining `sort_index(ascending=False)` and then `sort_values(ascending=False, kind='mergesort')` after `value_counts()`. The second sort must use a stable algorithm such as 'mergesort'; with the default 'quicksort', the order of values with equal counts is undefined.

```
df_fruit['fruit'].value_counts()
    .sort_index(ascending=False)
    .sort_values(ascending=False, kind='mergesort')
-------------------------------------------------
apples        5
persimmons    5
bananas       3
carrots       3
peaches       3
mango         2
apricots      1
Name: fruit, dtype: int64
```
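Because tie order depends on sort stability, a deterministic alternative is to sort on both keys at once (a sketch; the counts column name `n` is arbitrary):

```python
import pandas as pd

df_fruit = pd.DataFrame({
    'fruit':
        ['persimmons'] * 5 + ['apples'] * 5 + ['bananas'] * 3 +
        ['peaches'] * 3 + ['carrots'] * 3 + ['apricots'] + ['mango'] * 2
})

# Turn the counts into a two-column frame and sort on both keys at once:
# count descending, then fruit name ascending. This sidesteps any
# question of sort-algorithm stability.
ordered = (df_fruit['fruit'].value_counts()
           .rename_axis('fruit')
           .reset_index(name='n')
           .sort_values(['n', 'fruit'], ascending=[False, True]))
print(ordered)
```
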

## 5. Relative frequency of unique values

Sometimes you need relative frequencies, not raw counts. With the parameter `normalize=True`, the returned object contains the relative frequency of each unique value. The default value of this parameter is `False`. **Syntax:** `df['your_column'].value_counts(normalize=True)`.

```
df['course_difficulty'].value_counts(normalize=True)
-------------------------------------------------
Beginner        0.546577
Intermediate    0.222222
Mixed           0.209877
Advanced        0.021324
Name: course_difficulty, dtype: float64
```

## 6. value_counts() to break the data into discrete intervals

Another trick that often gets overlooked: `value_counts()` can break data up into discrete intervals using the `bins` parameter. This only works with numeric data; the principle is similar to `pd.cut`. Let's see how it works using the "course_rating" column as an example, grouping the values into 4 bins. **Syntax:** `df['your_column'].value_counts(bins=number_of_bins)`.

```
df['course_rating'].value_counts(bins=4)
-------------------------------------------------
(4.575, 5.0]      745
(4.15, 4.575]     139
(3.725, 4.15]       5
(3.297, 3.725]      2
Name: course_rating, dtype: int64
```

Binning makes it easy to get insights. You can see that most courses fall into the top rating bin (4.575–5.0], and only a few courses are rated below 4.15.
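The relationship to `pd.cut` can be sketched on made-up numbers (not the course data):

```python
import pandas as pd

ratings = pd.Series([3.3, 3.9, 4.2, 4.4, 4.6, 4.8, 4.9, 5.0])

# value_counts(bins=...) splits the full value range into equal-width
# intervals and counts how many values land in each one
binned = ratings.value_counts(bins=4).sort_index()
print(binned)

# The same counts can be produced explicitly with pd.cut
via_cut = pd.cut(ratings, bins=4).value_counts().sort_index()
print(via_cut)
```
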

## 7. value_counts() with missing values

By default, NaN values are not counted. You can change this behavior by setting the `dropna` parameter to `False`. Since there are no missing values in this dataset, the parameter changes nothing in this example, but it is worth remembering. **Syntax:** `df['your_column'].value_counts(dropna=False)`.
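A quick self-contained sketch of the effect (made-up values, not the course data):

```python
import pandas as pd
import numpy as np

s = pd.Series(['Beginner', 'Beginner', 'Advanced', np.nan, np.nan, np.nan])

# Default: the three NaN rows are silently dropped
print(s.value_counts())

# dropna=False: NaN gets its own row with its count
print(s.value_counts(dropna=False))
```
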

## 8. value_counts() as dataframe

As noted, `value_counts()` returns a Series, not a DataFrame. If, however, you want the result in the latter form, you can append `.to_frame()` after `.value_counts()`. **Syntax:** `df['your_column'].value_counts().to_frame()`.

If you want to name a column or rename an existing one, this conversion can be done in a different way.

```
value_counts = df['course_difficulty'].value_counts()
# conversion to df and giving new names to the columns
df_value_counts = pd.DataFrame(value_counts)
df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts for course_difficulty']
df_value_counts
```
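An equivalent, slightly more compact chain (a sketch on made-up data; the column names are arbitrary) does the same renaming without touching `.columns` directly:

```python
import pandas as pd

df = pd.DataFrame({'course_difficulty': ['Beginner', 'Beginner', 'Mixed', 'Advanced']})

# rename_axis names the index (the unique values), and reset_index(name=...)
# names the counts column, all in one chain
out = (df['course_difficulty'].value_counts()
       .rename_axis('unique_values')
       .reset_index(name='counts'))
print(out)
```
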

## Groupby and value_counts

Groupby is a very popular method in Pandas. It groups rows by one column so you can count the values in another. **Syntax:** `df.groupby('your_column_1')['your_column_2'].value_counts()`. So, with `groupby` and `value_counts` you can count the certificate types for each level of course difficulty.

```
df.groupby('course_difficulty')['course_Certificate_type'].value_counts()
-------------------------------------------------
course_difficulty  course_Certificate_type
Advanced           SPECIALIZATION               10
                   COURSE                        9
Beginner           COURSE                      282
                   SPECIALIZATION              196
                   PROFESSIONAL CERTIFICATE      9
Intermediate       COURSE                      104
                   SPECIALIZATION               91
                   PROFESSIONAL CERTIFICATE      3
Mixed              COURSE                      187
Name: course_Certificate_type, dtype: int64
```

The result has a multi-index, which allows multiple levels of indexing in one object. Here, course difficulty is index level zero and certificate type is index level one.
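Working with that multi-index can be sketched on a small made-up frame (the column names `difficulty` and `cert` are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({
    'difficulty': ['Advanced', 'Advanced', 'Beginner', 'Beginner', 'Beginner'],
    'cert':       ['COURSE', 'SPECIALIZATION', 'COURSE', 'COURSE', 'SPECIALIZATION'],
})

counts = df.groupby('difficulty')['cert'].value_counts()

# Select one level-0 label to get the inner breakdown for that group...
print(counts.loc['Beginner'])

# ...or pivot level 1 into columns with unstack()
print(counts.unstack(fill_value=0))
```
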

## Filtering values by minimum and maximum

Working with a dataset, you may want to keep only the values whose count passes a threshold. This can be done with `value_counts()`. **Syntax:** `df['your_column'].value_counts().loc[lambda x : x > 1]`. This filters the counts and keeps only the unique values that occur more than once. For the example, let's keep only the ratings that occur more than four times.

```
df['course_rating'].value_counts().loc[lambda x: x > 4]
-------------------------------------------------
4.8    256
4.7    251
4.6    168
4.5     80
4.9     68
4.4     34
4.3     15
4.2     10
Name: course_rating, dtype: int64
```
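The same filtering pattern on a tiny made-up Series:

```python
import pandas as pd

s = pd.Series([4.8, 4.8, 4.8, 4.7, 4.7, 3.9])

counts = s.value_counts()

# .loc with a lambda filters on the counts themselves:
# keep only the values that occur more than twice
frequent = counts.loc[lambda x: x > 2]
print(frequent)
```
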

`value_counts()` is a handy tool that enables convenient one-line analysis.