CSV in Python

by Alex
CSV in Python

Programmers are often faced with the task of handling large amounts of structured data. Python has a built-in CSV library with which the programmer can work with special CSV files. They are a kind of spreadsheet.

What are CSV files?

A CSV file is a special kind of file that allows you to structure large amounts of data. It is essentially an ordinary text file, but each new element is separated from the previous one by a comma or other separator. Typically, each entry starts on a new line. CSV data can easily be exported to spreadsheets or databases. The programmer can extend the CSV file by adding new lines. Example of a CSV file that uses a comma as a separator:

Name,Profession,Year of Birth
Viktor, Turner, 1995
Sergey, Welder, 1983
As you can see from the example, the first line usually specifies what information will be in each column. In addition, after the last element of the line there is no comma, the interpreter determines the end of the line by the hyphenation character.

Any other delimiter can be used instead of a comma, so when reading a CSV file, you should know in advance which character is used. It is important to remember that CSV is a plain text file and does not support characters in encodings other than ASCII or Unicode.

The CSV library

This is the main library for working with CSV files in Python. The csv library is built-in, so you don’t need to download it, just use regular imports:

import csv

Reading from files (parsing)

In order to read data from a file, the programmer must create an object reader:

reader_object = csv.reader(file, delimiter = ",")

reader has a __next__() method, that is, iterable object, so reading from the file is as follows:

import csv
with open("classmates.csv", encoding='utf-8') as r_file:
    # We create a reader object, specify the delimiter character ","
    file_reader = csv.reader(r_file, delimiter = ",")
    # Counter for counting number of lines and outputting column titles
    count = 0
    # Read data from CSV file
    for row in file_reader:
        if count == 0:
            # Output the row containing the headers for the columns
            print(f'The file contains columns: {", ".join(row)}')
        else:
            # Output rows
            print(f' {row[0]} is {row[1]} and he was born in {row[2]} year.')
        count += 1
    print(print(f'There are {count} lines in the file.')

Suppose we have a CSV file that contains the following information:

Name,Grades,Year of Birth
Sasha,honor student,200
Masha,good,1999
Petya,Failing,2000

Then, if you open this file in our program, you will get the following results:

The file contains columns: Name, Grades, Year of Birth
    Sasha is an excellent student and he was born in year 200.
    Masha is a good student and he was born in 1999.
    Petya is a "C" student and he was born in 2000.
In total there are 4 lines in the file.

The use of the with…as construct allows the programmer to be sure that the file will be closed, even if some error occurs during code execution.

Note that when you open it, you must specify the correct encoding in which the data is saved. In this case encoding=’utf-8′. If you do not, the default encoding will be used. For Windows it is cp1251.

Library CSV allows you to work with files as with dictionaries, for this you need to create DictReader object. You can address to items by column names, not by indexes. For the original program to make similar output it must be changed as follows:

import csv
with open("classmates.csv", encoding='utf-8') as r_file:
    # Create a DictReader object, specify the delimiter character ","
    file_reader = csv.DictReader(r_file, delimiter = ",")
    # Counter for counting number of lines and outputting column titles
    count = 0
    # Read data from CSV file
    for row in file_reader:
        if count == 0:
            # Output the row containing the headers for the columns
            print(f'The file contains columns: {", ".join(row)}')
        # show the rows
        print(f' {row['Name']} - {row['Grades']}', end='')
        print(f' and he was born in {row['Year of birth']} year.')
        count += 1
    print(print(f'There are {count + 1} lines in the file.')
It’s more convenient to refer to items by column name, besides, it simplifies code understanding.

Note that the first iteration of the for loop will write the first row, not the table’s header, into the row. Therefore, when the number of rows is output, the variable count is incremented by 1.

Additional parameters of the DictReader object

DictReader has parameters:

  • dialect – A set of parameters for formatting information. More about them below.
  • line_num – Sets the number of lines that can be read.
  • fieldnames – Defines column headers, if this attribute is not defined, it will contain elements from the first read line of the file. Headers are needed to make it easy to understand what information is or should be contained in a column.

For example, if classmates.csv did not have the first row with headings, you could open it as follows:

fieldnames = ['Name', 'Grades', 'Year Born']
file_reader = csv.DictReader(r_file, fieldnames = fieldnames)

You can also use the __next__() method to get the next line. This method makes the reader object iterable. That is, it is called at each iteration and returns the next line. This method is used at each iteration of the for loop to get the next line.

Recording into files

To write information to a CSV file you must create a writer object:

file_writer = csv.writer(w_file, delimiter = "t")

The writerow() method, which has the following syntax, is used to write the data to the file:

writecol("first name", "last name", "middle name")

The program code for writing the CSV file looks like this

import csv
with open("classmates.csv", mode="w", encoding='utf-8') as w_file:
    file_writer = csv.writer(w_file, delimiter = ",", lineterminator="r")
    file_writer.writerow(["Name", "Class", "Age"])
    file_writer.writerow(["Zhenya", "3", "10"])
    file_writer.writerow(["Sasha", "5", "12"])
    file_writer.writerow(["Masha", "11", "18"])
Note that the writer used, lineterminator="r". This is the delimiter between table rows, by default it is "rn"

. After executing the program the CSV file will have the following text:

Name,Class,Age
Zhenya,3,10
Sasha,5,12
Masha,11,18

The writerow() method takes as a parameter the list whose elements will be written to the string through the delimiter character. Writing to a file can also be done using the DictWriter object. It is important to remember that it requires an explicit fieldnames parameter. A dictionary is used as an argument to the writerow method. The program code looks like this:

import csv
with open("classmates.csv", mode="w", encoding='utf-8') as w_file:
    names = ["Name", "Age"]
    file_writer = csv.DictWriter(w_file, delimiter = ",", 
                                 lineterminator="r", fieldnames=names)
    file_writer.writeheader()
    file_writer.writerow({"Name": "Sasha", "Age": "6"})
    file_writer.writerow({"Name": "Masha", "Age": "15"})
    file_writer.writerow({"Name": "Vova", "Age": "14"})

The output to the file will be as follows:

Name,Age
Sasha,6
Masha,15
Vova, 14

Additional DictWriter parameters

The writer object also has an attribute dialect, which determines how the data will be formatted when writing to the file, it will be described below. In addition,writer has methods:

  • writerows(rows) – Writes all elements of rows.
  • writeheader() – Outputs headers for columns. Headers must be passed to the writer object as a list, as a fieldnames attribute.

writerheader was used in the previous example. Consider the use of writerows:

file_writer.writerows([{"Name": "Sasha", "Age": "6"},
    {"Name": "Masha", "Age": "15"},
    {"Name": "Vova", "Age": "14"}])

Dialects

To avoid specifying the format of input and output data each time, certain formatting parameters are grouped into dialects. A programmer can specify the required dialect when creating a reader or writer object. In addition, some dialect parameters can be overridden manually by also specifying them when creating the object. To create a dialect, use the command:

register_dialect("name", delimiter = "t", ...)

The Dialect class allows you to define the following formatting attributes:

Attribute Value
delimiter Sets the character by which elements in the file are delimited. The default setting is comma.
doublequote If True, the quotechar character is doubled, if False, then ecsapechar is added to the qutechar character as a prefix.
escapechar A single character string which is used to escapechar the separator character.
lineterminator Defines the delimiter for strings, the default is “rn”
quotechar Specifies the character that is used to surround the delimiter character. The default is double quotes, i.e. quotechar = ‘ ‘ ‘.
quoting Specifies the character that is used to surround the separator character (if quotation marks are not used).
skipinitialspace If set to True all spaces after the delimiter character are ignored.
strict If set to True an Error exception will be thrown if CSV is not entered correctly.

Example usage:

import csv
csv.register_dialect('my_dialect', delimiter=':', lineterminator="r")
with open('classmates.csv', mode='w', encoding='utf-8') as w_file:
    file_writer = csv.writer(w_file, 'my_dialect')
    file_writer.writerow(['Name', 'Class', 'Age'])
    file_writer.writerow(['Zhenya', '3', '10'])
    file_writer.writerow(["Sasha", "5", "12"])
    file_writer.writerow(["Masha", "11", "18"])

Theresult will be:

Name:Class:Age
Zhenya:3:10
Sasha:5:12
Masha:11:18

Related Posts

LEAVE A COMMENT