How get_dummies works in Python

A dataset may contain various types of values, and some of them are categorical. To use categorical values efficiently in a program, we create dummy variables. A dummy variable is a binary variable that indicates whether a separate categorical variable takes on a specific value.
Explanation:
For example, a temperature attribute with three categorical values yields three dummy variables, one per value. In Python, dummy variables can be created with the pandas get_dummies() function.
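A minimal sketch of that idea follows. The category names (hot, cold, warm) are assumptions chosen for illustration, since the original values are not shown. Passing dtype=int forces 0/1 output, because the default output type differs between pandas versions.

```python
import pandas as pd

# Hypothetical data: a "temperature" attribute with three categories.
df = pd.DataFrame({"temperature": ["hot", "cold", "warm", "cold"]})

# One indicator column is created per distinct category.
# dtype=int requests 0/1 integers instead of the version-dependent default.
dummies = pd.get_dummies(df["temperature"], dtype=int)

print(dummies.columns.tolist())  # ['cold', 'hot', 'warm']
print(dummies.values.tolist())   # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
```

Each row contains exactly one 1, in the column that matches its original category.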
Pandas: Data Manipulation — get_dummies() function
The get_dummies() function converts categorical variables into dummy/indicator variables.
Syntax:
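As a rough illustration of the call signature, the sketch below writes out every optional argument with its default value. The input Series is a placeholder, and the argument list reflects recent pandas releases, so older versions may differ.

```python
import pandas as pd

data = pd.Series(["a", "b", "a"])  # placeholder input, for illustration only

# Every optional argument spelled out with its default value.
explicit = pd.get_dummies(
    data,
    prefix=None,
    prefix_sep="_",
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None,
)

# Relying on the defaults gives the same result.
implicit = pd.get_dummies(data)
print(explicit.equals(implicit))
```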
Parameters:
| Name | Description | Type | Default Value | Required / Optional |
|---|---|---|---|---|
| data | Data of which to get dummy indicators. | array-like, Series, or DataFrame | | Required |
| prefix | String to prepend to the dummy column names. | str, list of str, or dict of str | None | Optional |
| prefix_sep | Separator/delimiter to use between the prefix and the category value. Or pass a list or dictionary as with prefix. | str | '_' | Optional |
| dummy_na | Add a column to indicate NaNs. If False, NaNs are ignored. | bool | False | Optional |
| columns | Column names in the DataFrame to be encoded. If columns is None, all columns with object or category dtype are converted. | list-like | None | Optional |
| sparse | Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). | bool | False | Optional |
| drop_first | Whether to get k-1 dummies out of k categorical levels by removing the first level. | bool | False | Optional |
| dtype | Data type for new columns. Only a single dtype is allowed. | dtype | np.uint8 (bool since pandas 2.0) | Optional |
Returns: DataFrame — Dummy-coded data.
Example:
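The original example is not available, so the following is a small sketch under assumed data. The column names (color, size, price) and the prefix "c" are hypothetical. It shows how columns, prefix and prefix_sep interact when a DataFrame is passed.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [10, 12, 9],
})

# Encode only the "color" column; "size" and "price" are left untouched.
encoded = pd.get_dummies(
    df, columns=["color"], prefix="c", prefix_sep="-", dtype=int
)

print(encoded.columns.tolist())  # ['size', 'price', 'c-blue', 'c-red']
```

The untouched columns come first, followed by the new dummy columns named prefix, separator, category.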
Get Dummy Variables for a column in Pandas: pandas.get_dummies()

Do you want to convert a categorical variable into dummy variables? If so, this post is for you. Here you will learn how to get dummy variables for a column in pandas using the get_dummies() function.
Syntax of Pandas get_dummies method
Before going to the demonstration part let’s learn the syntax of the method.
The explanation of the most used parameters is below.
data: Your input dataframe or a column of it.
prefix: String to prepend to the names of the generated dummy columns.
prefix_sep: The separator placed between the prefix and the category value. The default value is "_".
dummy_na: Whether to add a column that indicates NaN values. The default value is False, which ignores NaN.
columns: The columns you want to encode. If it is None, all columns with object or category dtype are encoded.
sparse: Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
drop_first: Use it to get k-1 dummies out of k categorical levels by removing the first level.
dtype: The data type of the new columns. Only a single dtype is allowed.
In the next section, you will know the steps to implement pandas get_dummies() method
Steps to implement Pandas get_dummies method
Step 1: Import the necessary libraries.
Here I am using two Python modules: pandas for creating the dataframe and NumPy for creating NaN values. So let’s import them.
Step 2: Create a Sample Dataframe.
Let’s create a dataframe to implement the pandas get_dummies() function in python. You can use your own dataset but for the sake of simplicity, I am creating a very simple dataframe. Use the below code to create it.
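The original dataframe is not shown, so the sketch below uses assumed values. Only the column name col1 appears in the text; col2 and all cell values are hypothetical.

```python
import numpy as np  # needed later, when NaN values are added
import pandas as pd

# Hypothetical sample data; the article's actual values are not available.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

print(df)
```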

Output Sample Dataframe for implementing the get_dummies method
Step 3: Get Dummy Variables for Dataframe using pandas get_dummies()
Now let’s apply the get_dummies() method and convert categorical values into dummy variables. The examples below cover the common cases one by one.
Example 1: Finding Dummy Variables For Whole Dataframe.
To create dummy variables for the whole dataframe, just pass the dataframe to get_dummies().
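A minimal sketch, assuming the same hypothetical two-column dataframe (col2 and the cell values are not from the original):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# Every object-typed column is encoded; each new column is named
# <original column>_<category>.
all_dummies = pd.get_dummies(df, dtype=int)

print(all_dummies.columns.tolist())
# ['col1_A', 'col1_B', 'col1_C', 'col2_X', 'col2_Y', 'col2_Z']
```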
Output

Finding Dummy Variables For the Whole Dataframe
You can see each categorical value has been converted to a dummy variable.
Example 2: Finding Dummy Variables For a Single Column
Suppose you want to create dummy variables for a single column. To do so, pass that column as the argument. For example, to create dummy variables for col1, execute the following code.
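One way this could look, again with assumed cell values:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# Passing a single column (a Series) encodes only that column.
col1_dummies = pd.get_dummies(df["col1"], dtype=int)

print(col1_dummies.columns.tolist())  # ['A', 'B', 'C']
```

When a Series is passed, no prefix is added by default, so the column names are the bare category values.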

Output Finding Dummy Variables For a Single Column
Example 3: Dummy variables with NaN value
In this example, I will explain how to either include or ignore NaN values when creating dummy variables. But before that, let’s add a NaN value to the dataframe.
Run the below code.
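A possible version of this step is sketched here. Which cell receives the NaN is an assumption, as are the other values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# Overwrite one value in col1 with NaN (the row chosen here is arbitrary).
df.loc[1, "col1"] = np.nan

print(df)
```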

Output Sample Dataframe with NaN value
Now let’s create dummy variables for col1 with the additional parameter dummy_na=True. NaN is then treated as a category of its own and gets its own indicator column.
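A short sketch with assumed values for col1:

```python
import numpy as np
import pandas as pd

# Hypothetical col1 with one missing value.
col1 = pd.Series(["A", np.nan, "C", "A"], name="col1")

# dummy_na=True appends an extra indicator column for NaN.
with_nan = pd.get_dummies(col1, dummy_na=True, dtype=int)

print(with_nan)
```

The result has three columns: A, C, and a final column labelled NaN that is 1 only in the row where the value was missing.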
Output

get_dummies() implementation of Dataframe with NaN
If you want to ignore NaN, use dummy_na=False, which is also the default.
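The same assumed column, this time with NaN ignored:

```python
import numpy as np
import pandas as pd

# Hypothetical col1 with one missing value.
col1 = pd.Series(["A", np.nan, "C", "A"], name="col1")

# dummy_na=False: no NaN column is created.
without_nan = pd.get_dummies(col1, dummy_na=False, dtype=int)

print(without_nan.columns.tolist())   # ['A', 'C']
print(without_nan.iloc[1].tolist())   # [0, 0]
```

The row that held NaN is kept, but it is all zeros because it belongs to none of the categories.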

Output get_dummies() implementation of Dataframe with dummy_na= False
Example 4: Dropping the First Categorical Variable
Suppose you want to leave out the first category of each encoded column. To do so, pass drop_first=True as an additional argument. It removes the dummy column for the first category and encodes the dataframe using the remaining categories, giving k-1 columns for k categories.
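A sketch on the assumed sample dataframe:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# The first category of each column ("A" and "X") gets no dummy column.
dropped = pd.get_dummies(df, drop_first=True, dtype=int)

print(dropped.columns.tolist())
# ['col1_B', 'col1_C', 'col2_Y', 'col2_Z']
```

A row whose values were the dropped categories is encoded as all zeros, so no information is lost. This is commonly done to avoid perfectly collinear columns in linear models.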
Run the code and see the output.

Output Dropping the First Categorical Variable
Conclusion
The pandas get_dummies() function converts categorical variables into dummy variables. This is also known as one-hot encoding, and it is a common preprocessing step for machine learning models, which generally require numeric input. These are the examples I have compiled to help you understand it. If you have any questions, you can contact us.
Mastering Pandas get_dummies(): A Guide for Python Users

Data analytics has come a long way in quite a short time. As computation and automation keep advancing, new techniques have emerged that make data analysis more efficient. This article focuses on one such function from the pandas library of Python: the get_dummies( ) function. So, let us get started by importing this library.
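The import is a single line. The alias pd is the usual convention rather than a requirement.

```python
import pandas as pd

# get_dummies is exposed at the top level of the package.
print(callable(pd.get_dummies))
```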
Thereafter, we shall explore further the get_dummies( ) function through each of the following sections.
- Why use a dummy variable?
- Syntax of the get_dummies( ) function
- Use cases for the get_dummies( ) function
Why use a dummy variable?
Those familiar with machine learning know how numerical things can get. Numbers are far easier to analyze than case-sensitive text, and special characters only make matters worse. So, dummy variables might be a savior in that case.
They work like a charm with machine learning algorithms such as regression, which deal strictly with numbers. Not convinced? Try feeding some textual data into your linear regression and witness the errors being thrown the very moment the code is run!
Syntax of the get_dummies() function
Dummy variables ease the treacherous task of data cleaning by assigning numerical values to the categorical data of the given dataframe. The parameters of the get_dummies( ) function are as follows.
- data – The categorical data (array-like, Series or dataframe) that is to be converted into dummy variables
- prefix – An optional component set to ‘None’ by default and used to prepend a string to the column names of the dummy variable dataframe
- prefix_sep – An optional component set to ‘_’ by default and used to separate the prefix from the categorical entry in the column names of the dummy variable dataframe
- dummy_na – An optional component set to ‘False’ by default and used to add a column that indicates the positions where the input value is missing (NaN)
- columns – An optional component set to ‘None’ by default and used to name the columns of the input dataframe that are to be converted into dummy variables
- sparse – An optional component set to ‘False’ by default and set to ‘True’ if the dummy encoded columns are to be backed by a sparse array rather than a numpy array
- drop_first – An optional component set to ‘False’ by default and set to ‘True’ if the first level of the input categorical data is to be removed while converting to dummy variables
- dtype – An optional component set to ‘None’ by default and used to specify the data type for the new columns of dummy variables
Use cases for the get_dummies() function
In this section, we shall demonstrate the use of a handful of components within the get_dummies( ) function with the following dataframe.

Input Dataframe
We shall use only the Region column from the above dataframe for conversion into dummy variables.

Values For Region
Once done, let us try running it through the get_dummies( ) function with its default setting.
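The article's input dataframe is not reproduced here, so the sketch below uses a hypothetical stand-in. Only the Region column name and the Africa entry come from the text. The other regions, the Sales column and the missing value are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the input dataframe.
df = pd.DataFrame({
    "Region": ["Africa", "Asia", np.nan, "Europe", "Asia"],
    "Sales": [120, 340, 95, 210, 180],
})

# Use only the Region column.
region = df["Region"]

# Default settings: one column per region, missing values ignored.
default_dummies = pd.get_dummies(region)

print(default_dummies.columns.tolist())  # ['Africa', 'Asia', 'Europe']
```

With the defaults, the row holding the missing value is not flagged in any column.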

Dummy Variable Dataframe With Default Settings
Now let us deploy some of the components within the get_dummies( ) function to do the following:
- Assign a prefix ‘option’ with ‘-’ as a separator
- Create an additional column to indicate the locations where values are not available
- Remove the first level of categorical data
- Return all dummy variables as ‘float’ data type
Translated into code, all the above-listed requirements become a single call to the get_dummies( ) function.
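A sketch of such a call, using an assumed Region column (only Africa is named in the text, the other values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical Region values, including one missing entry.
region = pd.Series(["Africa", "Asia", np.nan, "Europe", "Asia"], name="Region")

custom_dummies = pd.get_dummies(
    region,
    prefix="option",   # prefix for every dummy column
    prefix_sep="-",    # separator between prefix and category
    dummy_na=True,     # extra column marking missing values
    drop_first=True,   # drop the first level (Africa)
    dtype=float,       # return the dummies as floats
)

print(custom_dummies.columns.tolist())
# ['option-Asia', 'option-Europe', 'option-nan']
```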

Dummy Variable Dataframe After Custom Settings
Since the first level of the categorical data is removed, the column for Africa has vanished into thin air, and rows that contained Africa now show zeros in all the remaining columns. The rest of the changes are presumed to be self-explanatory.
Conclusion
Now that we have reached the end of this article, we hope it has shown how to use the get_dummies( ) function from the pandas library. A related article details the usage of the from_dummies( ) function from the pandas library in Python.