How to clean a CSV file in Python

python: How to clean a CSV file

I am new to Python and want to clean a CSV file for analysis. However, I have run into a problem with my code.

With the code above, the output looks like this:

I would like to get the following result:

I specified the delimiter as "," but ";" is still being used. I don’t know why this happens. Also, I tried to replace blanks with "NaN", but every space still ends up replaced with "NaN".

I would be very grateful for any advice.

Eventually, I would like to analyze each column, for example the percentage of "NaN" values.

4 Answers

You can get the desired result by:

specifying ';' as the delimiter when constructing the reader object
passing each row through a function that converts empty cells to 'NaN' (or another value of your choice)

Here is a code example:
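A minimal sketch of these two steps, assuming the standard csv module. The sample data and the helper name fill_empty are hypothetical, and io.StringIO stands in for the real file:

```python
import csv
import io

# Hypothetical sample data; in practice this would come from open("data.csv", newline="").
sample = "name;age;city\nAnna;;Berlin\nBob;31;\n"


def fill_empty(row, placeholder="NaN"):
    """Return a copy of the row with empty or whitespace-only cells replaced."""
    return [cell if cell.strip() else placeholder for cell in row]


# Step 1: tell the reader that fields are separated by semicolons.
reader = csv.reader(io.StringIO(sample), delimiter=";")

# Step 2: pass every row through the helper.
cleaned = [fill_empty(row) for row in reader]

for row in cleaned:
    print(row)
```

Only cells that are empty or contain nothing but whitespace are replaced; spaces inside a non-empty cell are left alone.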

You need to specify the correct delimiter:

Your CSV file uses semicolons, not commas, so you need to tell reader() which delimiter to use.
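To illustrate the difference, here is a small sketch that parses the same hypothetical text twice, once with the default delimiter and once with delimiter=";":

```python
import csv
import io

# Hypothetical semicolon-separated content.
text = "id;price;qty\n1;9.5;3\n"

# Default delimiter is a comma, so each line comes back as a single field.
with_default = list(csv.reader(io.StringIO(text)))

# With the delimiter set explicitly, each line is split into three fields.
with_semicolon = list(csv.reader(io.StringIO(text), delimiter=";"))

print(with_default)
print(with_semicolon)
```

With the default comma delimiter the semicolons stay inside the single field, which is why they still appear in the output.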

Tip:

Don’t reassign a variable to something of a different kind. Before this line executes, filename is a string with the name of the file to open. After the assignment, filename is a list of rows from the file. Use a different variable name instead:

Now the two variables are distinct, and their meaning is clear from their names. Of course, feel free to use a name other than rows, as long as it is not filename.
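A short sketch of the naming advice. The temporary file and its contents are assumptions made only so the example runs on its own:

```python
import csv
import os
import tempfile

# Create a small hypothetical file so the example is self-contained.
fd, filename = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("a;b\n1;2\n")

# filename stays a string; the parsed data gets its own name.
with open(filename, newline="") as f:
    rows = list(csv.reader(f, delimiter=";"))

print(type(filename).__name__)  # str
print(rows)

os.remove(filename)  # clean up the temporary file
```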

python clear csv file

How can I completely clear a CSV file with Python? Most forum entries that cover deleting rows or columns basically say to write the data you want to keep into a new file. I need to completely clear a file. How can I do that?

3 Answers

Basically you want to truncate the file, and this works for any file. In this case it’s a CSV file, so:
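One common way to do this, sketched here with a temporary file standing in for the real CSV, is to open the file in "w" mode, which truncates it on open:

```python
import os
import tempfile

# Hypothetical CSV file with some content.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a,b\n1,2\n")
size_before = os.path.getsize(path)

# Opening in "w" mode empties the file; nothing needs to be written.
with open(path, "w"):
    pass
size_after = os.path.getsize(path)

print(size_before, size_after)
os.remove(path)  # clean up the temporary file
```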

Your question is rather strange, but I’ll interpret it literally. Clearing a file is not the same as deleting it.

You want to open a file object to the CSV file, and then truncate the file, bringing it to zero length.
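A sketch of that approach, assuming a temporary file in place of the real one. The file is opened for updating ("r+") so that opening alone does not clear it, and truncate(0) then cuts it to zero length:

```python
import os
import tempfile

# Hypothetical CSV file with some content.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("a,b\n1,2\n")

# Open without truncating, then explicitly truncate to zero bytes.
with open(path, "r+") as f:
    f.truncate(0)

size_after_truncate = os.path.getsize(path)
still_exists = os.path.exists(path)
print(size_after_truncate, still_exists)

os.remove(path)  # clean up the temporary file
```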

If you want to delete it instead, that’s just an os filesystem call:
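For example, os.remove() deletes a file by path. The temporary file below is only a stand-in:

```python
import os
import tempfile

# Hypothetical file to delete.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
existed_before = os.path.exists(path)

# Remove the file from the filesystem entirely.
os.remove(path)
exists_after = os.path.exists(path)

print(existed_before, exists_after)
```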

The Python csv module is only for reading and writing whole CSV files, not for modifying them in place. If you need to filter data from a file, you have to read it, create a new CSV file, and write the filtered rows to that new file.
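A minimal sketch of that read, filter, and write pattern. The file names, the columns, and the "score >= 50" condition are all hypothetical:

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.csv")      # hypothetical source file
dst = os.path.join(workdir, "filtered.csv")   # hypothetical output file

# Create sample input so the example runs on its own.
with open(src, "w", newline="") as f:
    csv.writer(f).writerows(
        [["name", "score"], ["a", "10"], ["b", "55"], ["c", "70"]]
    )

# Read the source, keep only the rows that pass the filter, write a new file.
with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        if int(row[1]) >= 50:
            writer.writerow(row)

with open(dst, newline="") as f:
    kept = list(csv.reader(f))
print(kept)
```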


How to clean CSV data in Python?

Cleaning Dataset Using Python

Whether we are implementing machine learning algorithms or doing data analysis on a dataset, we need clean data.

There’s a well-known saying about numerical modeling with data: “Garbage in, garbage out.” We can’t expect decent results when our data isn’t clean.

In this article, we’ll explore common techniques we can use to clean CSV data using the Python pandas library.

CSV Data Cleaning Checks

We’ll clean data based on the following:

Missing Values
Outliers
Duplicate Values

1. Cleaning Missing Values in CSV File

In pandas, a missing value is usually denoted by NaN. Because pandas is built on the NumPy package, this is NumPy’s special floating-point NaN value.

You can find the dataset used in this article here.

Finding Missing Values

Let’s first see how we can find if there’s a missing value in our data.

#Approach 1: visually

Figure: Missing values shown using a heatmap

The isnull() method returns boolean values indicating if there’s a missing value in the data.

However, visual inspection is practical only for small to medium-sized datasets.
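A small sketch of what isnull() returns, using a hypothetical three-row data frame rather than the article’s dataset. The resulting boolean frame is the kind of input a heatmap function such as seaborn.heatmap can plot:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Votes": [120, np.nan, 87],
        "Rating": [4.1, 3.9, np.nan],
    }
)

# True marks a missing value, False marks a present one.
mask = df.isnull()
print(mask)
```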

#Approach 2

We can call the .sum() method after applying .isnull(); this returns the number of missing values in each column of the data frame.

Figure: Sum of missing values in each column
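As a sketch on hypothetical data, the per-column count comes from isnull().sum(), and taking the mean of the same boolean frame gives the share of missing values, shown here as a percentage:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Votes": [120, np.nan, 87, 45],
        "Rating": [4.1, 3.9, np.nan, np.nan],
    }
)

# Number of missing values per column.
missing_counts = df.isnull().sum()

# Percentage of missing values per column (mean of True/False, times 100).
missing_percent = df.isnull().mean() * 100

print(missing_counts)
print(missing_percent)
```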

Cleaning Missing Values from Data

We found that our dataset does have some missing values. What should we do next to get clean data?

We can either drop the rows and columns that contain missing values or replace the missing values with an appropriate value such as the mean, median, or mode.

Dropping Missing Values:

The above code will drop the rows from the dataframe having missing values.

Let’s look at the .dropna() method in detail:

df.dropna() – Drop all rows that have any NaN values
df.dropna(how='all') – Drop a row only if all of its columns are NaN
df.dropna(thresh=2) – Drop a row if it does not have at least two non-NaN values
df.dropna(subset=[1]) – Drop a row only if it has NaN in the specified column
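The four variants can be compared on a small hypothetical data frame. The column labels "a", "b", and "c" are assumptions, so subset uses a label instead of the integer 1 shown above:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 has one value,
# row 2 is entirely NaN, row 3 has two values.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [1.0, np.nan, np.nan, np.nan],
        "c": [1.0, 2.0, np.nan, 4.0],
    }
)

any_nan = df.dropna()                # keeps only fully populated rows
all_nan = df.dropna(how="all")       # drops only the all-NaN row
at_least_two = df.dropna(thresh=2)   # keeps rows with >= 2 non-NaN values
by_column = df.dropna(subset=["a"])  # drops rows where column "a" is NaN

print(any_nan.index.tolist())
print(all_nan.index.tolist())
print(at_least_two.index.tolist())
print(by_column.index.tolist())
```

None of these calls modifies df itself; each returns a new data frame.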

One must be careful when considering dropping the missing values as it might affect the quality of the dataset.

Replacing Missing Values:

Figure: Before and after filling null values

pandas provides the .fillna() method, which accepts the value to substitute for NaN values. Here, the mean of the column is calculated and passed as the input argument to the fillna() method.
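A minimal sketch of mean imputation on a hypothetical Votes column (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"Votes": [100.0, np.nan, 300.0, 200.0]})

# mean() skips NaN by default, so this averages the three known values.
mean_votes = df["Votes"].mean()

# Replace the missing value with the column mean.
df["Votes"] = df["Votes"].fillna(mean_votes)

print(mean_votes)
print(df["Votes"].tolist())
```

The same pattern works with .median() or .mode()[0] in place of .mean().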

2. Dealing with Outliers

Outliers can distort an entire set of predictions, so it is essential to detect and remove them.

Using Z-Score

Let’s detect outliers in the Votes column in our dataset and filter the outliers using a z-score.

The idea behind this method is that values lying more than 3 standard deviations away from the mean are treated as outliers.

The column on which this method is applied should be a numerical variable and not categorical.
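A sketch of z-score filtering on a hypothetical Votes column. The z-score is computed by hand here with pandas, whose .std() uses the sample standard deviation by default; a helper from another library may use a slightly different convention:

```python
import pandas as pd

# Hypothetical data: 19 ordinary values and one extreme value.
df = pd.DataFrame({"Votes": list(range(10, 29)) + [1000]})

# z-score: distance from the mean in units of standard deviation.
z_scores = (df["Votes"] - df["Votes"].mean()) / df["Votes"].std()

# Rows more than 3 standard deviations from the mean are treated as outliers.
outliers = df[z_scores.abs() > 3]
filtered = df[z_scores.abs() <= 3]

print(outliers["Votes"].tolist())
print(len(filtered))
```

Note that on very small samples no value can reach a z-score of 3, so this rule needs a reasonable number of rows to be useful.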

Using Quantiles

With this method, values falling below the 0.01 quantile or above the 0.99 quantile of the series are filtered out.
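A sketch of quantile-based filtering, assuming a hypothetical Votes column holding the integers 1 to 100:

```python
import pandas as pd

# Hypothetical data: the integers 1..100.
df = pd.DataFrame({"Votes": list(range(1, 101))})

# Cut-off values at the 1st and 99th percentiles.
low = df["Votes"].quantile(0.01)
high = df["Votes"].quantile(0.99)

# Keep only the rows that lie between the two cut-offs (inclusive).
filtered = df[df["Votes"].between(low, high)]

print(low, high)
print(filtered["Votes"].min(), filtered["Votes"].max(), len(filtered))
```

Unlike the z-score rule, this approach always removes a fixed share of the data, whether or not those values are truly unusual.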

3. Dealing with Duplicate entries

We can check for any duplicates in a DataFrame using .duplicated() method. This returns a Pandas Series and not a DataFrame.

To check for duplicate values in a specific column, we can pass the column name as an input argument to the .duplicated() method.

Let’s see this in action.

Luckily, we have no duplicate values in our data frame, so we will append some rows from the data frame itself to create duplicates.

Now the .drop_duplicates() method is used to drop the duplicate rows from the data frame.
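The whole sequence can be sketched on a small hypothetical data frame: create duplicates by appending existing rows, detect them, then drop them.

```python
import pandas as pd

# Hypothetical data without duplicates.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Votes": [10, 20, 30]})

# Append copies of the first two rows to create duplicates.
df_dup = pd.concat([df, df.iloc[:2]], ignore_index=True)

# Boolean Series: True for rows that repeat an earlier row.
dup_mask = df_dup.duplicated()

# The same check restricted to a single column.
dup_in_name = df_dup.duplicated("Name")

# Keep the first occurrence of each row and drop the repeats.
deduped = df_dup.drop_duplicates()

print(dup_mask.tolist())
print(len(df_dup), len(deduped))
```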

Summary

CSV data cleaning in Python is straightforward with pandas and NumPy. Always clean the data before running an analysis on it to make sure the analysis is correct.