How to Compare Two DataFrames in Python for Matches

pandas.DataFrame.compare

Compare to another DataFrame and show the differences.

New in version 1.1.0.

Parameters

other DataFrame

Object to compare with.

align_axis {0 or ‘index’, 1 or ‘columns’}, default 1

Determine which axis to align the comparison on.

    0, or ‘index’: Resulting differences are stacked vertically with rows drawn alternately from self and other.

    1, or ‘columns’: Resulting differences are aligned horizontally with columns drawn alternately from self and other.

keep_shape bool, default False

If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.

keep_equal bool, default False

If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.

result_names tuple, default (‘self’, ‘other’)

Set the dataframes names in the comparison.

New in version 1.5.0.

Returns

DataFrame

DataFrame that shows the differences stacked side by side.

The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.
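The parameters described above can be illustrated with a minimal sketch. The two frames and their values are made up for the example and do not come from the pandas documentation.

```python
import pandas as pd

# Illustrative data: df2 is a copy of df1 with one changed cell.
df1 = pd.DataFrame({"col1": ["a", "b", "c"], "col2": [1.0, 2.0, 3.0]})
df2 = df1.copy()
df2.loc[1, "col2"] = 20.0

# Default align_axis=1: differing cells appear side by side
# under the 'self' and 'other' column labels.
side_by_side = df1.compare(df2)

# align_axis=0: the same differences are stacked vertically instead.
stacked = df1.compare(df2, align_axis=0)

# keep_shape=True keeps every row and column; equal cells become NaN.
full_shape = df1.compare(df2, keep_shape=True)

print(side_by_side)
print(stacked)
print(full_shape)
```

Only row 1 and column col2 differ here, so the default result has a single row, while keep_shape=True returns all three rows.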

Compare Two DataFrames for Equality in Pandas

When working with pandas dataframes, you may need to check whether two dataframes are the same. In this tutorial, we’ll look at how to compare two pandas dataframes for equality, along with some examples.

The pandas dataframe equals() function

The pandas dataframe function equals() is used to compare two dataframes for equality. It returns True if the two dataframes have the same shape and elements. For two dataframes to be equal, the elements should have the same dtype. The column headers, however, do not need to have the same dtype. The following is the syntax:
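A minimal sketch of the call, using two small frames whose contents are made up for illustration:

```python
import pandas as pd

# Illustrative frames; any two dataframes can be used here.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# equals() is called on one dataframe and takes the other as its argument.
result = df1.equals(df2)
print(result)
```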

Here, df1 and df2 are the two dataframes you want to compare. Note that NaNs in the same location are considered equal.

Examples

Let’s look at some examples of how the equals() function works and what to expect when using it to compare two dataframes.

1. Compare two identical dataframes
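A minimal sketch of this case; the column names and values are illustrative assumptions:

```python
import pandas as pd

# Two dataframes built from the same data, so values, dtypes,
# column labels and index all match.
df1 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
df2 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})

result = df1.equals(df2)
print(result)
```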

In the above example, two dataframes df1 and df2 are compared for equality using the equals() method. Since the dataframes are identical (1. the values and datatypes of the elements are the same, and 2. the values and datatypes of the row and column labels are the same), True is returned.

2. Compare two identical dataframes with NaNs
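A minimal sketch of this case with illustrative data. It assumes numeric columns, where pandas stores None as NaN when the dataframe is constructed:

```python
import numpy as np
import pandas as pd

# df1 uses np.nan and df2 uses None in the same position.
# In a numeric column both end up as NaN in a float64 column.
df1 = pd.DataFrame({"A": [1, np.nan], "B": [3.0, 4.0]})
df2 = pd.DataFrame({"A": [1, None], "B": [3.0, 4.0]})

result = df1.equals(df2)
print(df1.dtypes)
print(df2.dtypes)
print(result)
```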

In the above example, you can see that NaNs and None are considered equal if they occur at the same location.

3. Compare two dataframes with equal values but different dtypes
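A minimal sketch of this case; the data is illustrative, with column A stored as integers in one frame and as floats in the other:

```python
import pandas as pd

# Column A holds the same numbers, but as int64 in df1 and float64 in df2.
df1 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
df2 = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})

result = df1.equals(df2)
print(df1["A"].dtype, df2["A"].dtype)
print(result)
```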

In the above example, column A has equal values but different dtypes in dataframes df1 and df2, hence we get False. For the dataframes to be equal, the elements should have the same values and the same dtypes.

4. Compare dataframes whose column names have different dtypes

Will the dataframes be equal if the column names are equal but have different dtypes given that the elements are the same?
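A minimal sketch of this case, with illustrative data. The column names are integers in one frame and floats with the same values in the other:

```python
import pandas as pd

# Same elements; column labels are 1 and 2 in df1, 1.0 and 2.0 in df2.
df1 = pd.DataFrame({1: [10, 30], 2: [20, 40]})
df2 = pd.DataFrame({1.0: [10, 30], 2.0: [20, 40]})

result = df1.equals(df2)
print(df1.columns.dtype, df2.columns.dtype)
print(result)
```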

In the above example, we find that the dtypes of the column names do not matter so long as the names themselves are equal.

5. Compare dataframes with same elements but different column names

What will the equals() function return if two dataframes have the same elements but different column names?
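A minimal sketch of this case; the column names B and C are illustrative:

```python
import pandas as pd

# Same elements, but the second column is named B in df1 and C in df2.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [1, 2], "C": [3, 4]})

result = df1.equals(df2)
print(result)
```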

In the above example, we see that the elements of the dataframes df1 and df2 are the same, but since the column names are different, the two dataframes are not considered equal.

6. Compare dataframes with same elements but different index
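A minimal sketch of this case; the string index labels are illustrative:

```python
import pandas as pd

# Same elements and column names; df1 has the default integer index,
# df2 has string labels as its index.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])

result = df1.equals(df2)
print(result)
```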

In the above example, we can see that as was the case with column names, dataframes having different indices cannot be said to be equal even if they have the same elements. If you want to compare two dataframes with different index schemes, first reset the index and then check for equality.
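Resetting the index before comparing could look like the following sketch. The data is illustrative, and drop=True is used so that the old index is discarded rather than added as a column:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])

# Replace both indexes with the default 0..n-1 index, then compare.
df1_reset = df1.reset_index(drop=True)
df2_reset = df2.reset_index(drop=True)

result = df1_reset.equals(df2_reset)
print(result)
```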

For more on the pandas dataframe equals() function, refer to its official documentation.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel, pandas version 1.0.5 and numpy version 1.18.5.


How to Compare Two DataFrames in Pandas

Often you may want to compare the values in two pandas DataFrames to determine their similarities and differences.

This tutorial explains how to do so.

Example: Comparing Two DataFrames in Pandas

Suppose we have the following two pandas DataFrames, each containing data on four basketball players:
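A minimal sketch of two such DataFrames follows. The specific numbers are an assumption, chosen so that the differences between the frames agree with the interpretation given later in this tutorial:

```python
import pandas as pd

# Points and assists for players A-D in each DataFrame (illustrative values).
df1 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 15, 17, 24],
                    "assists": [4, 6, 7, 8]})

df2 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 24, 26, 29],
                    "assists": [7, 8, 10, 13]})

print(df1)
print(df2)
```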

Example 1: Find out whether the two DataFrames are identical.

First, we can find out whether the two DataFrames are identical by using the DataFrame.equals() function:
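A minimal sketch of this check. The player data is an assumption, with values chosen to agree with the differences discussed in this tutorial:

```python
import pandas as pd

df1 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 15, 17, 24],
                    "assists": [4, 6, 7, 8]})

df2 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 24, 26, 29],
                    "assists": [7, 8, 10, 13]})

# True only if shape, labels, dtypes and every value match.
identical = df1.equals(df2)
print(identical)
```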

The two DataFrames do not contain the same values, so this function correctly returns False.

Example 2: Find the differences in player stats between the two DataFrames.

We can find the difference in assists and points for each player by using the pandas subtract() function:
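A minimal sketch of the subtraction. The player data is an assumption, and setting player as the index is one way to make sure that only the numeric columns are subtracted and that rows are matched by player:

```python
import pandas as pd

df1 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 15, 17, 24],
                    "assists": [4, 6, 7, 8]})

df2 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 24, 26, 29],
                    "assists": [7, 8, 10, 13]})

# DataFrame 2 minus DataFrame 1, aligned on the player index.
diff = df2.set_index("player").subtract(df1.set_index("player"))
print(diff)
```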

This can be interpreted as follows:

  • Player A had the same number of points in both DataFrames, but had 3 more assists in DataFrame 2.
  • Player B had 9 more points and 2 more assists in DataFrame 2 compared to DataFrame 1.
  • Player C had 9 more points and 3 more assists in DataFrame 2 compared to DataFrame 1.
  • Player D had 5 more points and 5 more assists in DataFrame 2 compared to DataFrame 1.

Example 3: Find all rows that exist in only one DataFrame.

We can use the following code to obtain a complete list of the rows that appear in only one DataFrame:
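One way to do this is an outer merge with an indicator column, sketched below. The player data is an assumption, and the indicator column name Exists is taken from the description in the text:

```python
import pandas as pd

df1 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 15, 17, 24],
                    "assists": [4, 6, 7, 8]})

df2 = pd.DataFrame({"player": ["A", "B", "C", "D"],
                    "points": [12, 24, 26, 29],
                    "assists": [7, 8, 10, 13]})

# Outer merge on all shared columns; the indicator column records
# whether each row came from the left frame, the right frame, or both.
merged = pd.merge(df1, df2, how="outer", indicator="Exists")

# Keep only the rows that are not present in both DataFrames.
only_one = merged.loc[merged["Exists"] != "both"]
print(only_one)
```

In the Exists column, left_only marks rows found only in df1 and right_only marks rows found only in df2.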

In this case, the two DataFrames have no rows in common, so a total of 8 rows appear in only one of the DataFrames.

The column named "Exists" conveniently tells us in which DataFrame each row uniquely appears.

pandas.DataFrame.equals

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.

The row/column index do not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.

Parameters

other Series or DataFrame

The other Series or DataFrame to be compared with the first.

Returns

bool

True if all elements are the same in both objects, False otherwise.

See also

Series.eq

Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.

DataFrame.eq

Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.

testing.assert_series_equal

Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.

testing.assert_frame_equal

Like assert_series_equal, but targets DataFrames.

numpy.array_equal

Return True if two arrays have the same shape and elements, False otherwise.
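How these related functions differ from equals() can be seen in a short sketch. The frames are illustrative; check_dtype=False is an existing option of assert_frame_equal that ignores dtype differences.

```python
import numpy as np
import pandas as pd

# Same numbers stored as integers in one frame and floats in the other.
int_frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
float_frame = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})

# equals() is strict about dtypes, so this comparison fails.
strict = int_frame.equals(float_frame)

# eq() compares element by element and returns a boolean DataFrame.
elementwise = int_frame.eq(float_frame)

# assert_frame_equal raises AssertionError on a mismatch;
# with check_dtype=False the int/float difference is ignored.
pd.testing.assert_frame_equal(int_frame, float_frame, check_dtype=False)

# numpy.array_equal compares the underlying values and shapes.
arrays_match = np.array_equal(int_frame.to_numpy(), float_frame.to_numpy())

print(strict)
print(elementwise)
print(arrays_match)
```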

DataFrames df and exactly_equal have the same types and values for their elements and column labels, which will return True.

DataFrames df and different_column_type have the same element types and values, but have different types for the column labels, which will still return True.

DataFrames df and different_data_type have different types for the same values for their elements, and will return False even though their column labels are the same values and types.
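The three comparisons described above can be sketched as follows. The frame names come from the text, while the single-row values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({1: [10], 2: [20]})

# Same element types and values, same column labels.
exactly_equal = pd.DataFrame({1: [10], 2: [20]})

# Same elements, but the column labels are floats instead of integers.
different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})

# Same column labels, but the elements are floats instead of integers.
different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})

same = df.equals(exactly_equal)
same_despite_labels = df.equals(different_column_type)
same_despite_dtype = df.equals(different_data_type)

print(same, same_despite_labels, same_despite_dtype)
```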
