How get_dummies works in Python

A dataset may contain various types of values, including categorical ones. To use categorical values efficiently in computation, we create dummy variables. A dummy variable is a binary variable that indicates whether a categorical variable takes on a specific value.

Explanation:

Three dummy variables are created for the three categorical values of the temperature attribute, one per value. In Python, dummy variables can be created with the pandas get_dummies() function.
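The table this explanation refers to is not reproduced here. As a minimal sketch, assume a Temperature column with three hypothetical values (Hot, Cold, Warm); the data is illustrative only and is not the original table.

```python
import pandas as pd

# Hypothetical data: the original table is not available, so these values are assumed.
df = pd.DataFrame({"Temperature": ["Hot", "Cold", "Warm", "Cold"]})

# One indicator column is created per distinct category.
dummies = pd.get_dummies(df["Temperature"])

# Cast to int so the result prints as 0/1 whatever the default dtype
# of the installed pandas version is.
print(dummies.astype(int))
```

Each row has exactly one indicator set, in the column that matches its original value.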

Pandas: Data Manipulation — get_dummies() function

The get_dummies() function is used to convert a categorical variable into dummy/indicator variables.

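Syntax:

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)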

Parameters:

Name | Description | Type | Default | Required / Optional
data | Data of which to get dummy indicators. | array-like, Series, or DataFrame | (none) | Required
prefix | String to prepend to the dummy column names. | str, list of str, or dict of str | None | Optional
prefix_sep | Separator/delimiter to use when a prefix is added. A list or dictionary can be passed, as with prefix. | str | ‘_’ | Optional
dummy_na | Add a column to indicate NaNs; if False, NaNs are ignored. | bool | False | Optional
columns | Column names in the DataFrame to be encoded. If columns is None, all columns with object, string, or category dtype are converted. | list-like | None | Optional
sparse | Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). | bool | False | Optional
drop_first | Whether to get k-1 dummies out of k categorical levels by removing the first level. | bool | False | Optional
dtype | Data type for new columns. Only a single dtype is allowed. | dtype | None (bool in pandas 2.0 and later; np.uint8 in earlier versions) | Optional

Returns: DataFrame — Dummy-coded data.

Example:
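The original example is not included in this copy. The sketch below is a stand-in that exercises the columns, prefix, and prefix_sep parameters; the frame and its column names (color, size) are hypothetical.

```python
import pandas as pd

# Hypothetical frame: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})

# Encode only the "color" column; "size" is passed through unchanged.
# The new columns are named <prefix><prefix_sep><category>.
encoded = pd.get_dummies(df, columns=["color"], prefix="c", prefix_sep="-")

print(encoded.columns.tolist())
```

The columns that are not encoded come first in the result, followed by the generated indicator columns.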


Get Dummy Variables for a column in Pandas: pandas.get_dummies()

Do you want to convert a categorical variable into dummy variables? If so, this post is for you. It shows how to get dummy variables for a column in pandas using the get_dummies method.

Syntax of Pandas get_dummies method

Before going to the demonstration part let’s learn the syntax of the method.

The explanation of the most used parameters is below.

data: Your input dataframe or a column of it.

prefix: String to prepend to the names of the generated columns.

prefix_sep: Separator placed between the prefix and the category value. The default value is “_”.

dummy_na: Whether to add a column that flags NaN values. The default value is False, which ignores NaN.

columns: The columns you want to encode. If it is None, the encoding is done on all columns with object, string, or category dtype.

sparse: Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).

drop_first: Use it to get k-1 dummies out of k categorical levels by removing the first level.

dtype: The data type of the new columns. Only a single dtype is allowed.

The next section walks through the steps to apply the pandas get_dummies() method.

Steps to implement Pandas get_dummies method

Step 1: Import the necessary libraries.

Here I am using two Python modules: pandas for creating the DataFrame and NumPy for creating NaN values. So let’s import them.

Step 2: Create a Sample Dataframe.

Let’s create a dataframe to implement the pandas get_dummies() function in python. You can use your own dataset but for the sake of simplicity, I am creating a very simple dataframe. Use the below code to create it.

Sample DataFrame for implementing the get_dummies method
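The code for Steps 1 and 2 is missing from this copy. A minimal sketch follows; only the column name col1 appears in the text, so col2 and all the values are assumptions chosen for illustration.

```python
import numpy as np  # used in a later example to create NaN values
import pandas as pd

# Hypothetical sample data: only the name "col1" comes from the article.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

print(df)
```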

Step 3: Get Dummy Variables for Dataframe using pandas get_dummies()

Now let’s apply the get_dummies() method and convert categorical values into dummy variables. Each case is covered in an example below.

Example 1: Finding Dummy Variables For Whole Dataframe.

To create dummy variables for the whole DataFrame, just pass the DataFrame to get_dummies().

Output

Finding Dummy Variables For the Whole DataFrame

You can see each categorical value has been converted to a dummy variable.
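Example 1 can be sketched as follows, using the same hypothetical sample frame (col2 and all values are assumed, not taken from the article).

```python
import pandas as pd

# Hypothetical sample data (assumed; the article's frame is not shown).
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# Passing the whole frame encodes every object/string/category column.
# Each new column is named <original column>_<category>.
dummies = pd.get_dummies(df)

print(dummies.columns.tolist())
```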

Example 2: Finding Dummy Variables For a Single Column

Suppose you want to create dummy variables for a single column. To do so, pass that column as the argument. For example, to create dummy variables for col1, run the following code.

Finding Dummy Variables For a Single Column
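A sketch of Example 2, with hypothetical values for col1:

```python
import pandas as pd

# Hypothetical sample data (values are assumed).
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# Passing a single column (a Series) encodes only that column.
# No prefix is added, so the new columns are named after the categories.
col1_dummies = pd.get_dummies(df["col1"])

print(col1_dummies.astype(int))
```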

Example 3: Dummy variables with NaN value

This example shows how to either include or ignore NaN values when creating dummy variables. But before that, let’s add a NaN value to the data.

Run the below code.

Sample DataFrame with NaN value
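One way to add the NaN value is sketched below. Which cell the article overwrote is not known, so the position (row 1 of col1) and the data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (values are assumed).
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# Overwrite one entry of col1 with a missing value (assumed position).
df.loc[1, "col1"] = np.nan

print(df)
```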

Now let’s create the dummy variables on col1 with the additional parameter dummy_na=True. NaN is then treated as a category of its own.

Output

get_dummies() implementation of DataFrame with NaN
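A sketch of the dummy_na=True case, assuming a col1 that contains one NaN (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical col1 with one missing value.
df = pd.DataFrame({"col1": ["A", np.nan, "C", "A"]})

# dummy_na=True appends an extra indicator column for NaN,
# placed after the columns of the regular categories.
with_nan = pd.get_dummies(df["col1"], dummy_na=True)

print(with_nan.astype(int))
```

The row that held NaN is flagged only in the extra column, and every other row has 0 there.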

If you want to ignore NaN, use dummy_na=False, which is also the default.

get_dummies() implementation of DataFrame with dummy_na=False
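The dummy_na=False case can be sketched the same way, again with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical col1 with one missing value.
df = pd.DataFrame({"col1": ["A", np.nan, "C", "A"]})

# With dummy_na=False no NaN column is created;
# the row holding NaN is encoded as zeros in every category column.
no_nan = pd.get_dummies(df["col1"], dummy_na=False)

print(no_nan.astype(int))
```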

Example 4: Dropping the First Categorical Variable

To drop the first category of each encoded column, pass drop_first=True as an additional argument. The first category level is removed and each column is encoded with the remaining k-1 levels.

Run the code and see the output.

Dropping the First Categorical Variable
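A sketch of Example 4 on the hypothetical sample frame (col2 and all values are assumed):

```python
import pandas as pd

# Hypothetical sample data (values are assumed).
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A"],
    "col2": ["X", "Y", "X", "Z"],
})

# drop_first=True removes the first level of each encoded column
# ("A" for col1 and "X" for col2), leaving k-1 indicators per column.
reduced = pd.get_dummies(df, drop_first=True)

print(reduced.columns.tolist())
```

A row that belonged to a dropped level is still identifiable: all indicators for that original column are 0.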

Conclusion

The pandas get_dummies() method allows you to convert a categorical variable into dummy variables. This is also known as one-hot encoding, and it is a common preprocessing step when building machine learning models. These are the examples I have compiled for you for a deeper understanding. If you have any queries, you can contact us.


Mastering Pandas get_dummies(): A Guide for Python Users


Data analytics has come a long way in a short time. As computation and automation have advanced, new techniques have emerged to make data analysis more efficient. This article focuses on one such tool from the pandas library of Python: the get_dummies() function. So, let us get started by importing this library.

Thereafter, we shall explore further the get_dummies( ) function through each of the following sections.

  • Why use a dummy variable?
  • Syntax of the get_dummies() function
  • Use cases for the get_dummies() function

Why use a dummy variable?

Those familiar with machine learning know how numerical things can get. Numbers are easier to analyze than case-sensitive text; bring in the tildes and it all goes swoosh! So, dummy variables can be a savior in that case.

They work like a charm for machine learning algorithms such as regression, which deal strictly with numbers. Don’t believe it? Try feeding some textual data into your linear regression and watch the montage of errors thrown the moment the code is run!

Syntax of the get_dummies() function

Dummy variables ease the treacherous task of data cleaning by assigning a numerical value to the categorical data of the given dataframe. Following is the syntax of the get_dummies( ) function detailing the fundamental constituents required for its proper functioning.

  • data – The data to be converted into dummy variables (array-like, Series, or DataFrame)
  • prefix – An optional component set to ‘None’ by default; a string (or a list or dict of strings) prepended to the names of the dummy columns
  • prefix_sep – An optional component set to ‘_’ by default; the separator placed between the prefix and the category value in each dummy column name
  • dummy_na – An optional component set to ‘False’ by default; when set to ‘True’, it adds a column that indicates the positions where the input values are NaN
  • columns – An optional component set to ‘None’ by default; the names of the DataFrame columns to encode. When it is ‘None’, all columns with object, string, or category dtype are encoded
  • sparse – An optional component set to ‘False’ by default and is set to ‘True’ if the dummy encoded columns are to be backed by a sparse array rather than a numpy array
  • drop_first – An optional component set to ‘False’ by default and is set to ‘True’ if the first level of the input categorical data is to be removed while converting to dummy variables
  • dtype – An optional component set to ‘None’ by default (which yields bool columns in pandas 2.0 and later); specifies the data type for the new columns of dummy variables
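The last two parameters in the list, sparse and dtype, only change how the result is stored, not which columns are produced. The sketch below illustrates this on a small hypothetical Series; the variable names are assumptions.

```python
import pandas as pd

# Hypothetical input values.
s = pd.Series(["a", "b", "a"])

# dtype controls the element type of the indicator columns.
as_float = pd.get_dummies(s, dtype=float)

# sparse=True backs each indicator column with a sparse array.
as_sparse = pd.get_dummies(s, sparse=True)

print(as_float.dtypes)
print(as_sparse.dtypes)
```

Both results have the same column labels; only the underlying storage differs.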

Use cases for the get_dummies() function

In this section, we shall demonstrate the use of a handful of components within the get_dummies( ) function with the following dataframe.

Input DataFrame

We shall use only the Region column from the above dataframe for conversion into dummy variables.

Values For Region
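The article's input data is not reproduced in this copy. As a stand-in, the sketch below builds a small hypothetical frame with a Region column that includes Africa and one missing value; the Country column and every value are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the column name "Region" and the value "Africa"
# come from the article; everything else is assumed.
df = pd.DataFrame({
    "Country": ["Kenya", "Japan", "France", "Unknown", "Egypt"],
    "Region": ["Africa", "Asia", "Europe", np.nan, "Africa"],
})

# Keep only the Region column for conversion into dummy variables.
region = df["Region"]

print(region)
```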

Once done, let us try running it through the get_dummies( ) function with its default setting.

Dummy Variable DataFrame With Default Settings
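Running with the default settings can be sketched as follows, on hypothetical Region values (assumed, since the article's data is not shown):

```python
import numpy as np
import pandas as pd

# Hypothetical Region values, including one missing entry.
region = pd.Series(["Africa", "Asia", "Europe", np.nan, "Africa"], name="Region")

# Defaults: no prefix, NaN ignored, all levels kept.
default_dummies = pd.get_dummies(region)

print(default_dummies.astype(int))
```

With the defaults, the missing entry gets no column of its own and its row is all zeros.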

Now let us deploy some of the components within the get_dummies() function to do the following:

  • Assign a prefix ‘option’ with ‘-‘ as a separator
  • Create an additional column to indicate the locations where values are not available
  • Remove the first level of categorical data
  • Return all dummy variables as ‘float’ data type

Translated into code, the above requirements become the following.

Dummy Variable DataFrame After Custom Settings

Since the first level of the categorical data is removed, there is no column for Africa; rows whose Region is Africa are instead represented by zeros in every remaining column. The rest of the changes are presumed to be self-explanatory.
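The four custom settings listed above map onto the prefix, prefix_sep, dummy_na, drop_first, and dtype arguments. A sketch on hypothetical Region values (the data is assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical Region values, including one missing entry.
region = pd.Series(["Africa", "Asia", "Europe", np.nan, "Africa"], name="Region")

custom = pd.get_dummies(
    region,
    prefix="option",   # prefix for every dummy column
    prefix_sep="-",    # separator between the prefix and the category
    dummy_na=True,     # add an indicator column for missing values
    drop_first=True,   # drop the first level (Africa)
    dtype=float,       # return the indicators as floats
)

print(custom)
```

The result has no Africa column, and the last column is the missing-value indicator.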

Conclusion

Now that we have reached the end of this article, we hope it has shown how to use the get_dummies() function from the pandas library. Another article details the usage of the from_dummies() function from the pandas library in Python. There are numerous other enjoyable and equally informative articles in AskPython that might be of great help to those who are looking to level up in Python. Ciao!
