How to read a CSV file in Python

How to Read and Write to CSV Files in Python

A CSV file is a plain text file that contains rows and columns. The rows in a CSV file are separated by a newline, and the columns are separated by commas.

CSVs provide a simple way of storing data and are commonly used to export tabular data on many websites.

A simple CSV file

Let’s write a CSV file that contains students’ data, as shown below.

Python CSV Library

Python provides several ways to read and write CSV files. In this tutorial, we will use the built-in csv module and the pandas library to read and write data to CSV files.

Reading a CSV file

Let's start by creating two files:

students.txt
students.py

Next, enter the following data into the students.txt file.

Now open students.py and start by importing the csv module.

The built-in open() function opens a file and supports several modes, including these three:

Read mode: 'r'
Write mode: 'w'
Append mode: 'a'

csv.reader reads the file's contents by iterating over each line in the CSV file. We then get each row and print its contents. The delimiter argument specifies which character separates the fields; in our case, it's a comma.
The result will be:
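As a minimal sketch of the reading step, the snippet below writes a few hypothetical student records (stand-ins for the tutorial's students.txt, whose contents are not shown here) to a temporary file and reads them back with csv.reader.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical records standing in for the tutorial's students.txt.
sample = "name,age,grade\nAnna,15,A\nBoris,16,B\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "students.txt"
    path.write_text(sample, encoding="utf-8")

    # Open in read mode; newline="" lets the csv module handle line endings.
    with open(path, "r", newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=",")
        rows = [row for row in reader]

for row in rows:
    print(row)
# ['name', 'age', 'grade']
# ['Anna', '15', 'A']
# ['Boris', '16', 'B']
```

Every field comes back as a string; csv.reader does not convert numbers by default.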

Writing CSV files with CSV

We have learned how to read from a CSV file, but suppose we want to write to one. How would we do that?
The syntax for opening the file stays the same, but this time we use write mode.

Let's write a copy of our students' data to another file

When writing to a CSV, we use the writer function and the write mode. writerow will write a row of data to a new line. Here is our new data.
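A minimal sketch of the writing step, assuming the same hypothetical student records and a hypothetical output file name, students_copy.csv. The file is created in a temporary directory so the example leaves nothing behind.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows; the first one acts as the header.
students = [
    ["name", "age", "grade"],
    ["Anna", "15", "A"],
    ["Boris", "16", "B"],
]

with tempfile.TemporaryDirectory() as tmp:
    copy_path = Path(tmp) / "students_copy.csv"

    # Write mode creates the file (or truncates an existing one).
    with open(copy_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=",")
        for row in students:
            writer.writerow(row)  # one row per call

    # Read the raw text back to see exactly what was written.
    with open(copy_path, "r", newline="", encoding="utf-8") as f:
        written = f.read()

print(repr(written))
```

By default the writer ends each row with '\r\n', which is why the file is opened with newline="".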

Read CSV File using Pandas

pandas is a library for working with tabular data, such as data stored in Excel and CSV files. To use pandas, we first need to install it with pip.

When you have data in a CSV file, you can read it with pandas using the read_csv() function.

In the code above, you call the read_csv() function and pass the file as an argument. The result will be:
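A minimal sketch of that call, assuming pandas is installed. To stay self-contained it reads hypothetical student data from an in-memory buffer; read_csv() accepts a file path in the same position.

```python
import io

import pandas as pd

# Hypothetical data; io.StringIO stands in for a file on disk.
csv_text = "name,age,grade\nAnna,15,A\nBoris,16,B\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Unlike csv.reader, pandas infers column types, so the age column is read as integers.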

As you can see using pandas provides an easy way to read CSV files.

Write to CSV File using Pandas

To write data to a CSV file using pandas, we use DataFrames. A DataFrame is an object that stores data as rows and columns.
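A minimal sketch, assuming pandas is installed and using hypothetical data. Calling to_csv() without a path returns the CSV text as a string; passing a file name would write it to disk instead.

```python
import pandas as pd

# Hypothetical data stored column by column.
df = pd.DataFrame({"name": ["Anna", "Boris"], "grade": ["A", "B"]})

# index=False leaves out the DataFrame's integer row index.
csv_text = df.to_csv(index=False)
print(csv_text)
```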

Conclusion

This tutorial has covered the concepts required to begin manipulating tabular data. The pandas library is a powerful tool used in data science. It’s also easy to use, hence saving time and resources.


How to Read and Write CSV Files in Python

CSV is the most commonly used import and export format for databases and spreadsheets. This tutorial explains CSV in detail, along with the modules and classes available for reading and writing data to CSV files. It also walks through a working example that shows how to read and write data to a CSV file in Python.

What Is a CSV File?

A CSV (comma-separated values) file stores data in a tabular structure and uses the .csv extension. CSV files are widely used in e-commerce applications because they are very easy to process. Areas where they are used include:

importing and exporting customer data
importing and exporting products
exporting orders
exporting e-commerce analytics reports

Modules for Reading and Writing

The csv module provides several functions and classes for reading and writing CSV files, including:

the csv.reader function
the csv.writer function
the csv.DictWriter class
the csv.DictReader class

csv.reader

The csv.reader function takes the following parameters:

csvfile: any object that supports the iterator protocol and returns a string each time its __next__() method is called.
dialect='excel': an optional parameter that defines a set of parameters specific to a particular CSV dialect.
fmtparams: optional keyword arguments that override individual formatting parameters of the dialect.

Here is an example of how to use csv.reader.
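A minimal sketch of the three parameters together, using hypothetical semicolon-separated data. An io.StringIO object stands in for an opened file, the dialect is the built-in 'excel', and delimiter is passed as a formatting parameter that overrides the dialect's comma.

```python
import csv
import io

# Hypothetical data; io.StringIO behaves like an opened text file.
csvfile = io.StringIO("name;grade\nAnna;A\nBoris;B\n")

# Start from the 'excel' dialect and override its delimiter.
reader = csv.reader(csvfile, dialect="excel", delimiter=";")
rows = list(reader)
print(rows)
# [['name', 'grade'], ['Anna', 'A'], ['Boris', 'B']]
```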

csv.writer

This function is the counterpart of csv.reader and is used to write data to a CSV file. It takes three parameters:

csvfile: any object with a write() method.
dialect='excel': an optional parameter that defines a set of parameters specific to a particular CSV dialect.
fmtparams: optional keyword arguments that override individual formatting parameters of the dialect.

The DictReader and DictWriter Classes

DictReader and DictWriter are classes in Python's csv module for reading and writing CSV. They are similar to the reader and writer functions, but they use dictionary objects to read from and write to CSV files.

DictReader

It creates an object that maps the information read into a dictionary whose keys are given by the fieldnames parameter. This parameter is optional; if it is omitted, the values in the first row of the file become the dictionary keys.

DictWriter

This class is similar to DictReader and does the opposite: it writes data to a CSV file. The class is defined as csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

The fieldnames parameter is a sequence of keys that determines the order in which the values in the dictionary are written to the CSV file. Unlike in DictReader, this parameter is not optional: it must be supplied so that the writer knows which columns to write and in what order.
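The restval and extrasaction parameters in the signature above can be sketched as follows, using hypothetical records and an in-memory buffer in place of a file.

```python
import csv
import io

fieldnames = ["name", "grade"]  # hypothetical column names

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=fieldnames,
    restval="n/a",            # written when a dictionary lacks a key
    extrasaction="ignore",    # silently drop keys not in fieldnames
)
writer.writeheader()
writer.writerow({"name": "Anna", "grade": "A", "age": 15})  # 'age' is dropped
writer.writerow({"name": "Boris"})                          # 'grade' becomes 'n/a'
output = buffer.getvalue()

# With the default extrasaction='raise', an unexpected key is an error.
strict_writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
error = None
try:
    strict_writer.writerow({"name": "Anna", "age": 15})
except ValueError as exc:
    error = exc

print(output)
print("error:", error)
```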

Dialects and Formatting

A dialect is a helper class used to define the parameters for a specific reader or writer instance. Dialects and formatting parameters are specified when the reader or writer is created.

A dialect supports the following attributes:

delimiter: the string used to separate fields. Defaults to ','.
doublequote: controls how quotechar instances that appear inside a field are themselves quoted. Can be True or False.
escapechar: the string used by the writer to escape the delimiter when quoting is set to QUOTE_NONE.
lineterminator: the string used to terminate lines produced by the writer. Defaults to '\r\n'.
quotechar: the string used to quote fields that contain special characters. Defaults to '"'.
skipinitialspace: if True, whitespace immediately following the delimiter is ignored.
strict: if True, an Error is raised on bad CSV input.
quoting: controls when quotes are generated by the writer and recognised by the reader.
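A few of these attributes can be sketched in action, assuming hypothetical input that uses semicolons followed by a space. The reader is configured with delimiter and skipinitialspace; the writer re-emits the rows with a tab delimiter, a plain '\n' line terminator, and quotes around every field.

```python
import csv
import io

# Hypothetical input: semicolon-separated, with a space after each delimiter.
raw = "name; score\nAnna; 91\nBoris; 78\n"

reader = csv.reader(io.StringIO(raw), delimiter=";", skipinitialspace=True)
rows = list(reader)
print(rows)
# [['name', 'score'], ['Anna', '91'], ['Boris', '78']]

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n", quoting=csv.QUOTE_ALL)
writer.writerows(rows)
text = out.getvalue()
print(text)
```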

Reading a CSV File

Let's see how to read a CSV file using the helpers discussed above.

Create your CSV file and save it as example.csv. Make sure it has the .csv extension, and fill in some data. Here we have a CSV file that contains the names of students and their grades.


Below is the code for reading the data in our CSV file using the csv.reader function and the csv.DictReader class.

Reading a CSV File with csv.reader

The code for this step imports the csv module and then opens our CSV file as File. We then define the reader object and use csv.reader to extract the data into it. Finally, we iterate over the reader object and retrieve each row of our data.

We display the data by printing it to the console. We also specify optional parameters such as the delimiter, the quote character, and the quoting mode.

Output

Reading a CSV File with DictReader

As mentioned above, DictReader lets us read a CSV file by mapping the data to a dictionary instead of to lists of strings, as csv.reader does. Although fieldnames is an optional parameter, it is important to always label your columns for readability.

Here is how to read a CSV file using the DictReader class.

First we import the csv module and initialize an empty list, results, which we use to store the retrieved data. We then define the reader object and use csv.DictReader to extract the data into it. We iterate over the reader object and retrieve each row of our data.

Finally, we append each row to the results list and print its contents to the console.
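The steps just described can be sketched like this. The data is hypothetical (a stand-in for example.csv), and an in-memory buffer replaces the opened file to keep the snippet self-contained.

```python
import csv
import io

# Hypothetical contents standing in for example.csv.
data = "name,grade\nAnna,A\nBoris,B\n"

results = []
reader = csv.DictReader(io.StringIO(data))
for row in reader:
    # dict(row) gives a plain dictionary regardless of Python version.
    results.append(dict(row))

print(results)
# [{'name': 'Anna', 'grade': 'A'}, {'name': 'Boris', 'grade': 'B'}]
```

Because fieldnames is not passed, the keys are taken from the first row.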

Output

As you can see above, the DictReader class is often the better choice because it returns our data as dictionaries, which are easier to work with.

Writing to a CSV File

Let's now see how to write data to a CSV file using the csv.writer function and the csv.DictWriter class discussed at the start of this tutorial.

Writing to a CSV File with csv.writer

The code below writes the defined data to the file example2.csv.

First we import the csv module, and the writer() function creates an object suitable for writing. To write all the rows in a single call, we use the writerows() method.

Here is our CSV file with the data we wrote to it.


Writing to a CSV File with DictWriter

Let's write the following data to a CSV file.

The code is shown below.

First we define fieldnames, which represent the headers of each column in the CSV file. The writerow() method writes one row at a time. If you want to write all the data at once, use the writerows() method.

Here is how to write all the rows at once.
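A minimal sketch of both approaches, with hypothetical records and an in-memory buffer instead of a file: writeheader() emits the column names, writerow() writes a single dictionary, and writerows() writes the remaining ones in one call.

```python
import csv
import io

fieldnames = ["first_name", "last_name", "grade"]  # hypothetical headers
records = [
    {"first_name": "Anna", "last_name": "Ivanova", "grade": "A"},
    {"first_name": "Boris", "last_name": "Petrov", "grade": "B"},
    {"first_name": "Clara", "last_name": "Sidorova", "grade": "C"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)

writer.writeheader()            # first_name,last_name,grade
writer.writerow(records[0])     # one row at a time
writer.writerows(records[1:])   # all remaining rows at once

output = buffer.getvalue()
print(output)
```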

Conclusion

This tutorial covers most of what you need to read from and write to CSV files using the functions and classes that Python provides. CSV files are widely used in applications because they are easy to read and manage, and their small size makes them relatively fast to process and transfer.


14.1. csv: CSV File Reading and Writing

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

The csv module’s reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.

See also: PEP 305, CSV File API. The Python Enhancement Proposal which proposed this addition to Python.

14.1.1. Module Contents

The csv module defines the following functions:

csv.reader(csvfile, dialect='excel', **fmtparams)

Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called; file objects and list objects are both suitable. If csvfile is a file object, it should be opened with newline=''. [1] An optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the Dialect class or one of the strings returned by the list_dialects() function. The other optional fmtparams keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters.

Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed unless the QUOTE_NONNUMERIC format option is specified (in which case unquoted fields are transformed into floats).

A short usage example:
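As a small sketch with hypothetical data, the reader below is given a list of strings rather than a file, first with default settings and then with quoting=csv.QUOTE_NONNUMERIC to show the float conversion described above.

```python
import csv

# A list of strings is a valid csvfile argument.
lines = ['"spam",1,2.5', '"eggs",3,4']

plain = list(csv.reader(lines))
print(plain)
# [['spam', '1', '2.5'], ['eggs', '3', '4']]

converted = list(csv.reader(lines, quoting=csv.QUOTE_NONNUMERIC))
print(converted)
# [['spam', 1.0, 2.5], ['eggs', 3.0, 4.0]]
```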

csv.writer(csvfile, dialect='excel', **fmtparams)

Return a writer object responsible for converting the user's data into delimited strings on the given file-like object. csvfile can be any object with a write() method. If csvfile is a file object, it should be opened with newline='' [1]. An optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the Dialect class or one of the strings returned by the list_dialects() function. The other optional fmtparams keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters. To make it as easy as possible to interface with modules which implement the DB API, the value None is written as the empty string. While this isn't a reversible transformation, it makes it easier to dump SQL NULL data values to CSV files without preprocessing the data returned from a cursor.fetch* call. All other non-string data are stringified with str() before being written.

A short usage example:

csv.register_dialect(name[, dialect[, **fmtparams]])

Associate dialect with name. name must be a string. The dialect can be specified either by passing a sub-class of Dialect, or by fmtparams keyword arguments, or both, with keyword arguments overriding parameters of the dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters.

csv.unregister_dialect(name)

Delete the dialect associated with name from the dialect registry. An Error is raised if name is not a registered dialect name.

csv.get_dialect(name)

Return the dialect associated with name. An Error is raised if name is not a registered dialect name. This function returns an immutable Dialect.

csv.list_dialects()

Return the names of all registered dialects.
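The four registry functions can be sketched together. The dialect name "pipes" and the sample row are hypothetical, chosen only for illustration.

```python
import csv

# Register a dialect purely from formatting parameters.
csv.register_dialect("pipes", delimiter="|", quoting=csv.QUOTE_NONE)

registered = "pipes" in csv.list_dialects()
dialect = csv.get_dialect("pipes")

# The registered name can be passed wherever a dialect is expected.
rows = list(csv.reader(["a|b|c"], dialect="pipes"))

csv.unregister_dialect("pipes")
removed = False
try:
    csv.get_dialect("pipes")
except csv.Error:
    removed = True  # the name is no longer in the registry

print(registered, dialect.delimiter, rows, removed)
```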

csv.field_size_limit([new_limit])

Returns the current maximum field size allowed by the parser. If new_limit is given, this becomes the new limit.

The csv module defines the following classes:

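class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

Create an object that operates like a regular reader but maps the information in each row to an OrderedDict whose keys are given by the optional fieldnames parameter.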

The fieldnames parameter is a sequence . If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames. Regardless of how the fieldnames are determined, the ordered dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None ). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with None .

All other optional or keyword arguments are passed to the underlying reader instance.

Changed in version 3.6: Returned rows are now of type OrderedDict .

A short usage example:
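A minimal sketch with hypothetical rows, one too long and one too short, showing where restkey and restval come into play. The key name "overflow" and the filler "missing" are arbitrary choices for the example.

```python
import csv

lines = [
    "name,grade",            # header row supplies the fieldnames
    "Anna,A,extra1,extra2",  # more fields than fieldnames
    "Boris",                 # fewer fields than fieldnames
]

reader = csv.DictReader(lines, restkey="overflow", restval="missing")
rows = [dict(row) for row in reader]

for row in rows:
    print(row)
# {'name': 'Anna', 'grade': 'A', 'overflow': ['extra1', 'extra2']}
# {'name': 'Boris', 'grade': 'missing'}
```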

class csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

Create an object which operates like a regular writer but maps dictionaries onto output rows. The fieldnames parameter is a sequence of keys that identify the order in which values in the dictionary passed to the writerow() method are written to file f. The optional restval parameter specifies the value to be written if the dictionary is missing a key in fieldnames. If the dictionary passed to the writerow() method contains a key not found in fieldnames, the optional extrasaction parameter indicates what action to take. If it is set to 'raise', the default value, a ValueError is raised. If it is set to 'ignore', extra values in the dictionary are ignored. Any other optional or keyword arguments are passed to the underlying writer instance.

Note that unlike the DictReader class, the fieldnames parameter of the DictWriter is not optional. Since Python’s dict objects are not ordered, there is not enough information available to deduce the order in which the row should be written to file f.

A short usage example:

class csv.Dialect

The Dialect class is a container class relied on primarily for its attributes, which are used to define the parameters for a specific reader or writer instance.

class csv.excel

The excel class defines the usual properties of an Excel-generated CSV file. It is registered with the dialect name 'excel'.

class csv.excel_tab

The excel_tab class defines the usual properties of an Excel-generated TAB-delimited file. It is registered with the dialect name 'excel-tab'.

class csv.unix_dialect

The unix_dialect class defines the usual properties of a CSV file generated on UNIX systems, i.e. using '\n' as line terminator and quoting all fields. It is registered with the dialect name 'unix'.

New in version 3.2.

class csv.Sniffer

The Sniffer class is used to deduce the format of a CSV file.

The Sniffer class provides two methods:

sniff(sample, delimiters=None)

Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.

has_header(sample)

Analyze the sample text (presumed to be in CSV format) and return True if the first row appears to be a series of column headers.

An example for Sniffer use:
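A minimal sketch on a hypothetical semicolon-separated sample. Both methods are heuristic, so results on real files may differ; here the sample is small and regular enough for the delimiter and the header row to be detected.

```python
import csv
import io

# Hypothetical sample: a header row followed by three records.
sample = "name;city;age\nAnna;Riga;31\nBoris;Oslo;45\nClara;Madrid;28\n"

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample, delimiters=";,")  # only consider ';' and ','
has_header = sniffer.has_header(sample)

# The detected dialect can be passed straight to a reader.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(dialect.delimiter, has_header)
print(rows[1])
```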

The csv module defines the following constants:

csv.QUOTE_ALL

Instructs writer objects to quote all fields.

csv.QUOTE_MINIMAL

Instructs writer objects to only quote those fields which contain special characters such as delimiter, quotechar or any of the characters in lineterminator.

csv.QUOTE_NONNUMERIC

Instructs writer objects to quote all non-numeric fields.

Instructs the reader to convert all non-quoted fields to type float.

csv.QUOTE_NONE

Instructs writer objects to never quote fields. When the current delimiter occurs in output data it is preceded by the current escapechar character. If escapechar is not set, the writer will raise Error if any characters that require escaping are encountered.

Instructs reader to perform no special processing of quote characters.
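The effect of each quoting constant on a writer can be sketched by rendering the same hypothetical row four times. The helper render() is defined only for this illustration, and lineterminator is set to '\n' to keep the output short.

```python
import csv
import io

row = ["Anna Smith", 42, "likes, commas"]  # hypothetical row


def render(quoting, **fmtparams):
    """Write `row` once with the given quoting mode and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n", **fmtparams)
    writer.writerow(row)
    return buffer.getvalue()


minimal = render(csv.QUOTE_MINIMAL)        # Anna Smith,42,"likes, commas"
quote_all = render(csv.QUOTE_ALL)          # "Anna Smith","42","likes, commas"
nonnumeric = render(csv.QUOTE_NONNUMERIC)  # "Anna Smith",42,"likes, commas"
no_quotes = render(csv.QUOTE_NONE, escapechar="\\")  # Anna Smith,42,likes\, commas

# QUOTE_NONE without an escapechar fails on a field containing the delimiter.
needs_escape = False
try:
    render(csv.QUOTE_NONE)
except csv.Error:
    needs_escape = True

print(minimal, quote_all, nonnumeric, no_quotes, sep="")
```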

The csv module defines the following exception:

exception csv.Error

Raised by any of the functions when an error is detected.

14.1.2. Dialects and Formatting Parameters

To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into dialects. A dialect is a subclass of the Dialect class having a set of specific methods and a single validate() method. When creating reader or writer objects, the programmer can specify a string or a subclass of the Dialect class as the dialect parameter. In addition to, or instead of, the dialect parameter, the programmer can also specify individual formatting parameters, which have the same names as the attributes defined below for the Dialect class.

Dialects support the following attributes:

Dialect.delimiter

A one-character string used to separate fields. It defaults to ','.

Dialect.doublequote

Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.

On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field.

Dialect.escapechar

A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character. It defaults to None, which disables escaping.

Dialect.lineterminator

The string used to terminate lines produced by the writer. It defaults to '\r\n'.

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.

Dialect.quotechar

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'.

Dialect.quoting

Controls when quotes should be generated by the writer and recognised by the reader. It can take on any of the QUOTE_* constants (see section Module Contents) and defaults to QUOTE_MINIMAL.

Dialect.skipinitialspace

When True, whitespace immediately following the delimiter is ignored. The default is False.

Dialect.strict

When True, raise exception Error on bad CSV input. The default is False.

14.1.3. Reader Objects

Reader objects (DictReader instances and objects returned by the reader() function) have the following public methods:

csvreader.__next__()

Return the next row of the reader's iterable object as a list (if the object was returned from reader()) or a dict (if it is a DictReader instance), parsed according to the current dialect. Usually you should call this as next(reader).

Reader objects have the following public attributes:

csvreader.dialect

A read-only description of the dialect in use by the parser.

csvreader.line_num

The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.
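The difference between lines and records can be sketched with hypothetical input in which one quoted field contains a line break, so four source lines yield only three records.

```python
import csv

# Four source lines; the second record spans lines 2 and 3.
lines = [
    "id,comment\n",
    '1,"first line\n',
    'second line"\n',
    "2,short\n",
]

reader = csv.reader(lines)
records = []
for row in reader:
    # line_num is the number of source lines consumed so far.
    records.append((reader.line_num, row))

for line_num, row in records:
    print(line_num, row)
# 1 ['id', 'comment']
# 3 ['1', 'first line\nsecond line']
# 4 ['2', 'short']
```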

DictReader objects have the following public attribute:

csvreader.fieldnames

If not passed as a parameter when creating the object, this attribute is initialized upon first access or when the first record is read from the file.

14.1.4. Writer Objects

Writer objects ( DictWriter instances and objects returned by the writer() function) have the following public methods. A row must be an iterable of strings or numbers for Writer objects and a dictionary mapping fieldnames to strings or numbers (by passing them through str() first) for DictWriter objects. Note that complex numbers are written out surrounded by parens. This may cause some problems for other programs which read CSV files (assuming they support complex numbers at all).

csvwriter.writerow(row)

Write the row parameter to the writer's file object, formatted according to the current dialect.

Changed in version 3.5: Added support of arbitrary iterables.

csvwriter.writerows(rows)

Write all the rows parameters (a list of row objects as described above) to the writer's file object, formatted according to the current dialect.

Writer objects have the following public attribute:

csvwriter.dialect

A read-only description of the dialect in use by the writer.

DictWriter objects have the following public method:

DictWriter.writeheader()

Write a row with the field names (as specified in the constructor).

New in version 3.2.

14.1.5. Examples

The simplest example of reading a CSV file:

Reading a file with an alternate format:

The corresponding simplest possible writing example is:

Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding() ). To decode a file using a different encoding, use the encoding argument of open:

The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file.

Registering a new dialect:

A slightly more advanced use of the reader — catching and reporting errors:
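A minimal sketch of the pattern, assuming hypothetical input whose second line has stray text after a closing quote. With strict=True the reader raises csv.Error on that line, and reader.line_num tells us where the problem is.

```python
import csv

# Hypothetical input; the second line is malformed.
lines = ["name,note", 'Anna,"quoted"tail']

reader = csv.reader(lines, strict=True)
good_rows = []
error_message = None
try:
    for row in reader:
        good_rows.append(row)
except csv.Error as exc:
    # Report the position of the failure instead of crashing.
    error_message = f"line {reader.line_num}: {exc}"

print(good_rows)
print(error_message)
```

In a script that processes a real file, the same except branch could include the file name and exit with that message.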

And while the module doesn’t directly support parsing strings, it can easily be done:
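One way to do this, sketched with a hypothetical string, is to wrap the string in a list: the reader accepts any iterable of strings, so a one-element list is parsed as a single row.

```python
import csv

line = 'one,"two, with comma",three'  # hypothetical CSV-formatted string

# A list containing the string is a valid csvfile argument.
parsed = next(csv.reader([line]))
print(parsed)
# ['one', 'two, with comma', 'three']
```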

Reading and Writing CSV Files in Python

Let’s face it: you need to get information into and out of your programs through more than just the keyboard and console. Exchanging information through text files is a common way to share info between programs. One of the most popular formats for exchanging data is the CSV format. But how do you use it?

Let’s get one thing clear: you don’t have to (and you won’t) build your own CSV parser from scratch. There are several perfectly acceptable libraries you can use. The Python csv library will work for most cases. If your work requires lots of data or numerical analysis, the pandas library has CSV parsing capabilities as well, which should handle the rest.

In this article, you’ll learn how to read, process, and parse CSV from text files using Python. You’ll see how CSV files work, learn the all-important csv library built into Python, and see how CSV parsing works using the pandas library.

So let’s get started!



What Is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. Here’s what that structure looks like:

Notice how each piece of data is separated by a comma. Normally, the first line identifies each piece of data—in other words, the name of a data column. Every subsequent line after that is actual data and is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab ( \t ), colon ( : ) and semi-colon ( ; ) characters. Properly parsing a CSV file requires us to know which delimiter is being used.

Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.

Parsing CSV Files With Python’s Built-in CSV Library

The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. The csv library contains objects and other code to read, write, and process data from and to CSV files.

Reading CSV Files With csv

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader , which does the heavy lifting.

Here’s the employee_birthday.txt file:

Here’s code to read it:

This results in the following output:

Each row returned by the reader is a list of String elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.

Reading CSV Files Into a Dictionary With csv

Rather than deal with a list of individual String elements, you can read CSV data directly into a dictionary (technically, an Ordered Dictionary) as well.

Again, our input file, employee_birthday.txt is as follows:

Here’s the code to read it in as a dictionary this time:

This results in the same output as before:

Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don’t have these in your CSV file, you should specify your own keys by setting the fieldnames optional parameter to a list containing them.

Optional Python CSV reader Parameters

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

delimiter specifies the character used to separate each field. The default is the comma (',').

quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote ('"').

escapechar specifies the character used to escape the delimiter character, in case quotes aren't used. The default is no escape character.

These parameters deserve some more explanation. Suppose you’re working with the following employee_addresses.txt file:

This CSV file contains three fields: name , address , and date joined , which are delimited by commas. The problem is that the data for the address field also contains a comma to signify the zip code.

There are three different ways to handle this situation:

Use a different delimiter
That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.

Wrap the data in quotes
The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.

Escape the delimiter characters in the data
Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.
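The three options can be sketched side by side. The record below is hypothetical (the contents of employee_addresses.txt are not reproduced here), and each variant encodes the same address, which contains a comma.

```python
import csv

expected = ["john smith", "1132 Anywhere Lane Hoboken NJ, 07030", "Jan 4"]

# 1. Use a different delimiter, so the comma is ordinary data.
line_pipe = "john smith|1132 Anywhere Lane Hoboken NJ, 07030|Jan 4"
by_delimiter = next(csv.reader([line_pipe], delimiter="|"))

# 2. Wrap the field in quotes; delimiters inside quotes are ignored.
line_quoted = 'john smith,"1132 Anywhere Lane Hoboken NJ, 07030",Jan 4'
by_quotes = next(csv.reader([line_quoted], quotechar='"'))

# 3. Escape the delimiter inside the data with a backslash.
line_escaped = "john smith,1132 Anywhere Lane Hoboken NJ\\, 07030,Jan 4"
by_escape = next(csv.reader([line_escaped], escapechar="\\"))

print(by_delimiter == by_quotes == by_escape == expected)
# True
```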

Writing CSV Files With csv

You can also write to a CSV file using a writer object and the .writerow() method:

The quotechar optional parameter tells the writer which character to use to quote fields when writing. Whether quoting is used or not, however, is determined by the quoting optional parameter:

If quoting is set to csv.QUOTE_MINIMAL , then .writerow() will quote fields only if they contain the delimiter or the quotechar . This is the default case.
If quoting is set to csv.QUOTE_ALL , then .writerow() will quote all fields.
If quoting is set to csv.QUOTE_NONNUMERIC , then .writerow() will quote all fields containing text data and leave numeric fields unquoted. When such a file is read back with the same setting, the reader converts all unquoted fields to the float data type.
If quoting is set to csv.QUOTE_NONE , then .writerow() will escape delimiters instead of quoting them. In this case, you also must provide a value for the escapechar optional parameter.

Reading the file back in plain text shows that the file is created as follows:

Writing CSV File From a Dictionary With csv

Since you can read our data into a dictionary, it’s only fair that you should be able to write it out from a dictionary as well:

Unlike DictReader , the fieldnames parameter is required when writing a dictionary. This makes sense, when you think about it: without a list of fieldnames , the DictWriter can’t know which keys to use to retrieve values from your dictionaries. It also uses the keys in fieldnames to write out the first row as column names.

The code above generates the following output file:

Parsing CSV Files With the pandas Library

Of course, the Python CSV library isn’t the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high performance data analysis tools and easy to use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Installing pandas and its dependencies in Anaconda is easily done with conda install pandas.

For other Python installations, use pip install pandas or pipenv install pandas.

We won’t delve into the specifics of how pandas works or how to use it. For an in-depth treatment of using pandas to read and analyze large data sets, check out Shantnu Tiwari’s superb article on working with large Excel files in pandas.

Reading CSV Files With pandas

To show some of the power of pandas CSV capabilities, I’ve created a slightly more complicated file to read, called hrdata.csv. It contains data on company employees:

Reading the CSV into a pandas DataFrame is quick and straightforward:

That’s it: three lines of code, and only one of them is doing the actual work. pandas.read_csv() opens, analyzes, and reads the CSV file provided, and stores the data in a DataFrame. Printing the DataFrame results in the following output:

Here are a few points worth noting:

First, pandas recognized that the first line of the CSV contained column names, and used them automatically. I call this Goodness.
However, pandas is also using zero-based integer indices in the DataFrame. That’s because we didn’t tell it what our index should be.

Further, if you look at the data types of our columns, you’ll see pandas has properly converted the Salary and Sick Days remaining columns to numbers, but the Hire Date column is still a string. This is easily confirmed in interactive mode:
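A minimal sketch of that check follows. The rows are hypothetical stand-ins for hrdata.csv (only the column names come from the text above), the ISO date format is an assumption, and io.StringIO is used so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for hrdata.csv; only the column names come from the text.
SAMPLE_CSV = """\
Name,Hire Date,Salary,Sick Days remaining
Ada Example,2014-03-15,50000.00,10
Ben Sample,2015-06-01,65000.00,8
Cy Placeholder,2013-11-01,70000.00,3
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.dtypes)                        # one dtype per column
print(type(df["Hire Date"].iloc[0]))    # a plain str, not a timestamp
print(list(df.index))                   # default zero-based integer index
```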

Let’s tackle these issues one at a time. To use a different column as the DataFrame index, add the index_col optional parameter:
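A minimal sketch of index_col, assuming hypothetical sample rows in place of hrdata.csv and an in-memory buffer in place of the file:

```python
import io

import pandas as pd

# Hypothetical stand-in for hrdata.csv; only the column names come from the text.
SAMPLE_CSV = """\
Name,Hire Date,Salary,Sick Days remaining
Ada Example,2014-03-15,50000.00,10
Ben Sample,2015-06-01,65000.00,8
Cy Placeholder,2013-11-01,70000.00,3
"""

# index_col names the column that becomes the row index.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="Name")

print(df.index.name)                   # the index is now labeled "Name"
print(df.loc["Ben Sample", "Salary"])  # rows can be looked up by employee name
```

Once a column is promoted to the index it no longer appears in df.columns, so label-based lookups go through df.loc.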

Now the Name field is our DataFrame index:

Next, let’s fix the data type of the Hire Date field. You can force pandas to read data as a date with the parse_dates optional parameter, which is defined as a list of column names to treat as dates:
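A minimal sketch of parse_dates, again with hypothetical sample rows standing in for hrdata.csv. The ISO date format is an assumption made here so that parsing is unambiguous.

```python
import io

import pandas as pd

# Hypothetical stand-in for hrdata.csv; only the column names come from the text.
SAMPLE_CSV = """\
Name,Hire Date,Salary,Sick Days remaining
Ada Example,2014-03-15,50000.00,10
Ben Sample,2015-06-01,65000.00,8
Cy Placeholder,2013-11-01,70000.00,3
"""

df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    index_col="Name",
    parse_dates=["Hire Date"],  # list of columns to convert to datetimes
)

hired = df.loc["Ada Example", "Hire Date"]
print(type(hired))   # a pandas Timestamp rather than a str
print(hired.year)    # date components are now directly accessible
```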

Notice the difference in the output:

The date is now parsed as a datetime value, which is easily confirmed in interactive mode:

If your CSV file doesn’t have column names in the first line, you can use the names optional parameter to provide a list of column names. You can also use this if you want to override the column names provided in the first line. In this case, you must also tell pandas.read_csv() to ignore existing column names using the header=0 optional parameter:
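A minimal sketch of overriding the header row. The replacement column names and the sample rows are hypothetical, chosen only for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for hrdata.csv; only the column names come from the text.
SAMPLE_CSV = """\
Name,Hire Date,Salary,Sick Days remaining
Ada Example,2014-03-15,50000.00,10
Ben Sample,2015-06-01,65000.00,8
Cy Placeholder,2013-11-01,70000.00,3
"""

df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    header=0,  # the first line holds the old names, which are discarded
    names=["Employee", "Hired", "Salary", "Sick Days"],  # illustrative new names
    index_col="Employee",    # must refer to the new name
    parse_dates=["Hired"],   # must refer to the new name
)

print(df.index.name)
print(list(df.columns))
```

Without header=0, pandas would treat the original header line as a data row once names is supplied.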

Notice that, since the column names changed, the columns specified in the index_col and parse_dates optional parameters must also be changed. This now results in the following output:

Writing CSV Files With pandas

Of course, if you can’t get your data out of pandas again, it doesn’t do you much good. Writing a DataFrame to a CSV file is just as easy as reading one in. Let’s write the data with the new column names to a new CSV file:
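A minimal sketch of the round trip. The sample rows, the renamed columns, and the output file name hrdata_modified.csv are all hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for hrdata.csv; only the column names come from the text.
SAMPLE_CSV = """\
Name,Hire Date,Salary,Sick Days remaining
Ada Example,2014-03-15,50000.00,10
Ben Sample,2015-06-01,65000.00,8
Cy Placeholder,2013-11-01,70000.00,3
"""

df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    header=0,
    names=["Employee", "Hired", "Salary", "Sick Days"],  # illustrative new names
    index_col="Employee",
    parse_dates=["Hired"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "hrdata_modified.csv"  # hypothetical file name
    df.to_csv(out_path)             # the index is written as the first column
    written = out_path.read_text()  # read back as plain text to inspect

print(written)
```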

The only difference between this code and the reading code above is that the print(df) call was replaced with df.to_csv(), providing the file name. The new CSV file looks like this:

Conclusion

If you understand the basics of reading CSV files, then you won’t ever be caught flat-footed when you need to deal with importing data. Most CSV reading, processing, and writing tasks can be easily handled by the basic csv Python library. If you have a lot of data to read and process, the pandas library provides quick and easy CSV handling capabilities as well.

Are there other ways to parse text files? Of course! Libraries like ANTLR, PLY, and PlyPlus can all handle heavy-duty parsing, and if simple string manipulation won’t work, there are always regular expressions.

But those are topics for other articles…


About Jon Fincher

Jon taught Python and Java in two high schools in Washington State. Previously, he was a Program Manager at Microsoft.
