Programmers are often faced with the task of handling large amounts of structured data. Python has a built-in CSV library with which the programmer can work with special CSV files. They are a kind of spreadsheet.
Table of Contents
What are CSV files?
A CSV file is a special kind of file that allows you to structure large amounts of data. It is essentially an ordinary text file, but each new element is separated from the previous one by a comma or other separator. Typically, each entry starts on a new line. CSV data can easily be exported to spreadsheets or databases. The programmer can extend the CSV file by adding new lines. Example of a CSV file that uses a comma as a separator:
Name,Profession,Year of Birth Viktor, Turner, 1995 Sergey, Welder, 1983
Any other delimiter can be used instead of a comma, so when reading a CSV file, you should know in advance which character is used. It is important to remember that CSV is a plain text file and does not support characters in encodings other than ASCII or Unicode.
The CSV library
This is the main library for working with CSV files in Python. The csv library is built-in, so you don’t need to download it, just use regular imports:
import csv
Reading from files (parsing)
In order to read data from a file, the programmer must create an object reader:
reader_object = csv.reader(file, delimiter = ",")
reader has a __next__()
method, that is, iterable object, so reading from the file is as follows:
import csv with open("classmates.csv", encoding='utf-8') as r_file: # We create a reader object, specify the delimiter character "," file_reader = csv.reader(r_file, delimiter = ",") # Counter for counting number of lines and outputting column titles count = 0 # Read data from CSV file for row in file_reader: if count == 0: # Output the row containing the headers for the columns print(f'The file contains columns: {", ".join(row)}') else: # Output rows print(f' {row[0]} is {row[1]} and he was born in {row[2]} year.') count += 1 print(print(f'There are {count} lines in the file.')
Suppose we have a CSV file that contains the following information:
Name,Grades,Year of Birth Sasha,honor student,200 Masha,good,1999 Petya,Failing,2000
Then, if you open this file in our program, you will get the following results:
The file contains columns: Name, Grades, Year of Birth Sasha is an excellent student and he was born in year 200. Masha is a good student and he was born in 1999. Petya is a "C" student and he was born in 2000. In total there are 4 lines in the file.
The use of the with…as construct allows the programmer to be sure that the file will be closed, even if some error occurs during code execution.
Library CSV allows you to work with files as with dictionaries, for this you need to create DictReader object. You can address to items by column names, not by indexes. For the original program to make similar output it must be changed as follows:
import csv with open("classmates.csv", encoding='utf-8') as r_file: # Create a DictReader object, specify the delimiter character "," file_reader = csv.DictReader(r_file, delimiter = ",") # Counter for counting number of lines and outputting column titles count = 0 # Read data from CSV file for row in file_reader: if count == 0: # Output the row containing the headers for the columns print(f'The file contains columns: {", ".join(row)}') # show the rows print(f' {row['Name']} - {row['Grades']}', end='') print(f' and he was born in {row['Year of birth']} year.') count += 1 print(print(f'There are {count + 1} lines in the file.')
Note that the first iteration of the for
loop will write the first row, not the table’s header, into the row. Therefore, when the number of rows is output, the variable count
is incremented by 1.
Additional parameters of the DictReader object
DictReader has parameters:
dialect
– A set of parameters for formatting information. More about them below.line_num
– Sets the number of lines that can be read.fieldnames
– Defines column headers, if this attribute is not defined, it will contain elements from the first read line of the file. Headers are needed to make it easy to understand what information is or should be contained in a column.
For example, if classmates.csv did not have the first row with headings, you could open it as follows:
fieldnames = ['Name', 'Grades', 'Year Born'] file_reader = csv.DictReader(r_file, fieldnames = fieldnames)
You can also use the __next__()
method to get the next line. This method makes the reader object iterable. That is, it is called at each iteration and returns the next line. This method is used at each iteration of the for
loop to get the next line.
Recording into files
To write information to a CSV file you must create a writer object:
file_writer = csv.writer(w_file, delimiter = "t")
The writerow() method, which has the following syntax, is used to write the data to the file:
writecol("first name", "last name", "middle name")
The program code for writing the CSV file looks like this
import csv with open("classmates.csv", mode="w", encoding='utf-8') as w_file: file_writer = csv.writer(w_file, delimiter = ",", lineterminator="r") file_writer.writerow(["Name", "Class", "Age"]) file_writer.writerow(["Zhenya", "3", "10"]) file_writer.writerow(["Sasha", "5", "12"]) file_writer.writerow(["Masha", "11", "18"])
lineterminator="r"
. This is the delimiter between table rows, by default it is "rn"
. After executing the program the CSV file will have the following text:
Name,Class,Age Zhenya,3,10 Sasha,5,12 Masha,11,18
The writerow()
method takes as a parameter the list whose elements will be written to the string through the delimiter character. Writing to a file can also be done using the DictWriter object. It is important to remember that it requires an explicit fieldnames parameter. A dictionary is used as an argument to the writerow method. The program code looks like this:
import csv with open("classmates.csv", mode="w", encoding='utf-8') as w_file: names = ["Name", "Age"] file_writer = csv.DictWriter(w_file, delimiter = ",", lineterminator="r", fieldnames=names) file_writer.writeheader() file_writer.writerow({"Name": "Sasha", "Age": "6"}) file_writer.writerow({"Name": "Masha", "Age": "15"}) file_writer.writerow({"Name": "Vova", "Age": "14"})
The output to the file will be as follows:
Name,Age Sasha,6 Masha,15 Vova, 14
Additional DictWriter parameters
The writer object also has an attribute dialect, which determines how the data will be formatted when writing to the file, it will be described below. In addition,writer has methods:
- writerows(rows) – Writes all elements of rows.
- writeheader() – Outputs headers for columns. Headers must be passed to the writer object as a list, as a fieldnames attribute.
writerheader was used in the previous example. Consider the use of writerows
:
file_writer.writerows([{"Name": "Sasha", "Age": "6"}, {"Name": "Masha", "Age": "15"}, {"Name": "Vova", "Age": "14"}])
Dialects
To avoid specifying the format of input and output data each time, certain formatting parameters are grouped into dialects. A programmer can specify the required dialect when creating a reader or writer object. In addition, some dialect parameters can be overridden manually by also specifying them when creating the object. To create a dialect, use the command:
register_dialect("name", delimiter = "t", ...)
The Dialect class allows you to define the following formatting attributes:
Attribute | Value |
delimiter | Sets the character by which elements in the file are delimited. The default setting is comma. |
doublequote | If True, the quotechar character is doubled, if False, then ecsapechar is added to the qutechar character as a prefix. |
escapechar | A single character string which is used to escapechar the separator character. |
lineterminator | Defines the delimiter for strings, the default is “rn” |
quotechar | Specifies the character that is used to surround the delimiter character. The default is double quotes, i.e. quotechar = ‘ ‘ ‘. |
quoting | Specifies the character that is used to surround the separator character (if quotation marks are not used). |
skipinitialspace | If set to True all spaces after the delimiter character are ignored. |
strict | If set to True an Error exception will be thrown if CSV is not entered correctly. |
Example usage:
import csv csv.register_dialect('my_dialect', delimiter=':', lineterminator="r") with open('classmates.csv', mode='w', encoding='utf-8') as w_file: file_writer = csv.writer(w_file, 'my_dialect') file_writer.writerow(['Name', 'Class', 'Age']) file_writer.writerow(['Zhenya', '3', '10']) file_writer.writerow(["Sasha", "5", "12"]) file_writer.writerow(["Masha", "11", "18"])
Theresult will be:
Name:Class:Age Zhenya:3:10 Sasha:5:12 Masha:11:18