Week 4 - File Handling
This week we will learn how to read and write files in Python. We will also learn how to use the `with` statement to open files.
Files
Files are named locations on disk that store related information. They are used to store data permanently in non-volatile storage (e.g. a hard disk).
Because Random Access Memory (RAM) is volatile and loses its contents when the computer is turned off, we store data in files so that it can be used later.
When we want to read from or write to a file, we need to open it first. When we are done, it needs to be closed so that the resources tied to the file are freed.
Hence, in Python, a file operation takes place in the following order:
1. Open a file
2. Read or write (perform operation)
3. Close the file
Opening Files in Python
Python has a built-in `open()` function to open a file. This function returns a file object, also called a handle, which is used to read or modify the file.
The `open()` function is most often called with two arguments: the filename and the mode.
We can specify the mode while opening a file. In the mode, we specify whether we want to read (`r`), write (`w`), or append (`a`) to the file. We also specify whether we want to open the file in text mode (`t`, the default) or binary mode (`b`).
The default is reading in text mode. In this mode, we get strings when reading from the file.
On the other hand, binary mode returns bytes and this is the mode to be used when dealing with non-text files like images or executable files.
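As a small illustration of the difference between the two modes, the sketch below reads the same file once in text mode and once in binary mode. The file name `sample.txt` and the temporary directory are assumptions made only to keep the example self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file name
    path.write_bytes(b"hello\n")     # create a small file to read back

    with open(path, "r", encoding="utf-8") as f:  # text mode
        text = f.read()

    with open(path, "rb") as f:                   # binary mode
        raw = f.read()

print(type(text).__name__, repr(text))  # str 'hello\n'
print(type(raw).__name__, repr(raw))    # bytes b'hello\n'
```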
The default mode is `r`, which opens the file for reading in text mode. The available mode characters and some common combinations are:
- `r`: open for reading (default)
- `w`: open for writing, truncating the file first
- `x`: open for exclusive creation, failing if the file already exists
- `a`: open for writing, appending to the end of the file if it exists
- `b`: binary mode
- `t`: text mode (default)
- `+`: open for updating (reading and writing)
- `w+`: open for reading and writing, truncating the file first
- `r+`: open for reading and writing, starting at the beginning of the file
- `a+`: open for reading and writing, appending to the end of the file if it exists
- `x+`: open for exclusive creation and updating, failing if the file already exists
- `rb+`: open for updating in binary mode (`b` and `+` are modifiers and must be combined with one of `r`, `w`, `a`, or `x`)
- `wb`: open for writing in binary mode
- `rb`: open for reading in binary mode
- `ab`: open for appending in binary mode
- `rt`: open for reading in text mode (default)
- `wt`: open for writing in text mode
- `at`: open for appending in text mode
- `xt`: open for exclusive creation in text mode, failing if the file already exists
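The behavior of the `w`, `a`, and `x` modes can be sketched as follows. The file name `modes.txt` and the temporary directory are assumptions used only for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "modes.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:  # creates the file
        f.write("first\n")
    with open(path, "w", encoding="utf-8") as f:  # truncates, so "first" is lost
        f.write("second\n")
    with open(path, "a", encoding="utf-8") as f:  # appends to the end
        f.write("third\n")

    try:
        open(path, "x", encoding="utf-8")         # file already exists
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(repr(content))     # 'second\nthird\n'
print(exclusive_failed)  # True
```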
Unlike in some other languages, a Python text string is a sequence of Unicode characters rather than bytes: the character 'a' corresponds to the number 97 only once it is encoded using an encoding scheme like ASCII or UTF-8.
Moreover, the default encoding is platform dependent. On Windows it is often cp1252, while on Linux it is usually UTF-8.
So, we should not rely on the default encoding, or else our code may behave differently on different platforms.
Hence, when working with files in text mode, it is highly recommended to specify the encoding type.
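A minimal sketch of passing `encoding` explicitly when writing and reading text. The file name and the sample word are made up for illustration; the word contains one non-ASCII character so that the character count and byte count differ.

```python
import tempfile
from pathlib import Path

word = "caf\u00e9"  # four characters; the last one is not ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "greeting.txt"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write(word)

    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    raw = path.read_bytes()  # the encoded bytes actually stored on disk

print(len(text))               # 4 characters
print(len(raw))                # 5 bytes in UTF-8
print("a".encode("ascii")[0])  # 97
```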
Closing Files in Python
When we are done performing operations on the file, we need to close it properly.
Closing a file will free up the resources that were tied with the file. While Python's garbage collector might eventually close unreferenced files, explicitly closing files is crucial.
The `close()` method is used for this:
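A minimal sketch of opening, reading, and explicitly closing a file. The temporary directory and the sample content are assumptions that keep the example self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    path.write_text("hello\n", encoding="utf-8")  # sample content

    f = open(path, encoding="utf-8")
    data = f.read()   # perform file operations
    f.close()         # release the resources tied to the file

print(f.closed)  # True
```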
However, if an exception occurs during file operations, the `f.close()` line might be skipped. A `try...finally` block can guarantee closure:
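The sketch below simulates a failure in the middle of the file operations and shows that the file is still closed. The sample content and the deliberate `int()` conversion error are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    path.write_text("hello\n", encoding="utf-8")  # sample content

    f = open(path, encoding="utf-8")
    error = None
    try:
        try:
            first_line = f.readline()
            int(first_line)  # raises ValueError: "hello" is not a number
        finally:
            f.close()        # runs whether or not an exception occurred
    except ValueError as exc:
        error = exc

print(f.closed)              # True
print(type(error).__name__)  # ValueError
```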
The recommended and most Pythonic way to handle files is using the `with` statement. This ensures that the file is automatically closed when the block inside the `with` statement is exited, even if errors occur.
This approach is cleaner and safer.
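A minimal sketch of the `with` form, again using an assumed temporary file with sample content:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    path.write_text("hello\n", encoding="utf-8")  # sample content

    with open(path, encoding="utf-8") as f:
        data = f.read()
    # No explicit close() call is needed here.
    closed_after_block = f.closed

print(repr(data))          # 'hello\n'
print(closed_after_block)  # True
```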
Writing to Files in Python
In order to write to a file in Python, we need to open it in write (`w`), append (`a`), or exclusive creation (`x`) mode.
We need to be careful with the `w` mode, as it will overwrite the file if it already exists, erasing all previous data. If the file does not exist, it creates a new one.
Writing a string (in text mode) or a sequence of bytes (in binary mode) is done using the `write()` method.
In text mode, `write()` returns the number of characters written. In binary mode, it returns the number of bytes written.
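A minimal sketch of writing three lines with `write()`. The line contents are made up for illustration, and the file is placed in a temporary directory so that nothing on disk is overwritten; passing the bare name `"test.txt"` to `open()` would instead create the file in the current directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"

    with open(path, "w", encoding="utf-8") as f:
        count = f.write("first line\n")  # returns the number of characters
        f.write("second line\n")
        f.write("third line\n")

    with open(path, "r", encoding="utf-8") as f:
        content = f.read()

print(count)  # 11
print(content, end="")
```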
This program will create a new file named `test.txt` in the current directory if it does not exist. If it does exist, it is overwritten with the new content.
We must include newline characters (`\n`) ourselves to distinguish different lines in text files.
Reading Files in Python
To read a file in Python, we must open it in read (`r`) mode.
There are various methods available for this purpose. We can use the `read(size)` method to read up to `size` characters (or bytes in binary mode). If the `size` parameter is not specified, it reads and returns everything up to the end of the file.
Assuming `test.txt` was created with the content:
We can read it in the following ways:
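The sketch below shows `read()` with and without a size. The three-line sample content is an assumption made for this example (the original file content is not shown here), and the file lives in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    # Assumed sample content, not the original file
    path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

    with open(path, "r", encoding="utf-8") as f:
        first_five = f.read(5)  # read the first 5 characters
        next_five = f.read(5)   # read the next 5 characters
        rest = f.read()         # read everything up to the end of the file
        at_eof = f.read()       # further reads return an empty string

print(repr(first_five))  # 'first'
print(repr(next_five))   # ' line'
print(repr(rest))        # '\nsecond line\nthird line\n'
print(repr(at_eof))      # ''
```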
We can see that the `read()` method returns a newline as `'\n'`. Once the end of the file is reached, we get an empty string on further reading.
We can change our current file cursor (position) using the `seek()` method. Similarly, the `tell()` method returns our current position (a byte offset in binary mode; in text mode, an opaque number that is only meant to be passed back to `seek()`).
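A minimal sketch of `tell()` and `seek()`, assuming the same made-up three-line ASCII content as a stand-in for the file. With plain ASCII text encoded as UTF-8, the position reported after reading five characters matches the byte offset.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    # Assumed sample content, written as bytes so offsets are the same everywhere
    path.write_bytes(b"first line\nsecond line\nthird line\n")

    with open(path, "r", encoding="utf-8") as f:
        f.read(5)                    # advance the cursor by 5 characters
        pos_after_read = f.tell()
        f.seek(0)                    # move the cursor back to the start
        pos_after_seek = f.tell()
        first_line = f.readline()    # reading starts from the beginning again

print(pos_after_read)    # 5
print(pos_after_seek)    # 0
print(repr(first_line))  # 'first line\n'
```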
Output for `seek()`/`tell()` example (assuming `test.txt` content from above):
We can read a file line by line using a `for` loop. This is both efficient and memory-friendly for large files.
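A minimal sketch of iterating over a file object, using assumed three-line sample content in a temporary file:

```python
import tempfile
from pathlib import Path

lines_seen = []

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    # Assumed sample content
    path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

    with open(path, "r", encoding="utf-8") as f:
        for line in f:           # one line is read per iteration
            lines_seen.append(line)
            print(line, end="")  # each line already ends with '\n'
```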
Output:
In this program, the lines in the file itself include a newline character (`\n`). So, we use the `end` parameter of the `print()` function to avoid two newlines when printing.
Alternatively, we can use the `readline()` method to read individual lines of a file. This method reads a file up to and including the next newline character.
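A short sketch of repeated `readline()` calls, with assumed two-line sample content:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    path.write_text("first line\nsecond line\n", encoding="utf-8")  # assumed content

    with open(path, "r", encoding="utf-8") as f:
        line1 = f.readline()
        line2 = f.readline()
        line3 = f.readline()  # nothing left to read

print(repr(line1))  # 'first line\n'
print(repr(line2))  # 'second line\n'
print(repr(line3))  # ''
```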
Lastly, the `readlines()` method reads all remaining lines from the file and returns them as a list of strings. Each string in the list includes the newline character.
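To illustrate "remaining lines", the sketch below consumes one line first and then calls `readlines()`. The sample content is assumed.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    # Assumed sample content
    path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

    with open(path, "r", encoding="utf-8") as f:
        f.readline()               # consume the first line
        remaining = f.readlines()  # the rest of the file as a list

print(remaining)  # ['second line\n', 'third line\n']
```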
Python File Methods
There are various methods available with the file object. Some of them have been used in the above examples.
Here is the complete list of methods in text mode with a brief description:
- `close()`: Closes an opened file. It has no effect if the file is already closed.
- `detach()`: Separates the underlying binary buffer from the `TextIOBase` and returns it.
- `fileno()`: Returns the integer file descriptor of the file.
- `flush()`: Flushes the write buffer of the file stream.
- `isatty()`: Returns `True` if the file stream is interactive.
- `read(n)`: Reads at most `n` characters from the file. Reads until the end of the file if `n` is negative or `None`.
- `readable()`: Returns `True` if the file stream can be read from.
- `readline(n=-1)`: Reads and returns one line from the file. Reads at most `n` characters if specified.
- `readlines(n=-1)`: Reads and returns a list of lines from the file. If `n` is specified, no more lines are read once the total size (in bytes/characters) of the lines read so far exceeds `n`.
- `seek(offset, whence=SEEK_SET)`: Changes the file position to `offset`, interpreted relative to `whence` (start, current position, or end).
- `seekable()`: Returns `True` if the file stream supports random access.
- `tell()`: Returns an integer that represents the current file position.
- `truncate(size=None)`: Resizes the file stream to `size` bytes. If `size` is not specified, resizes to the current position.
- `writable()`: Returns `True` if the file stream can be written to.
- `write(s)`: Writes the string `s` to the file and returns the number of characters written.
- `writelines(lines)`: Writes a list of lines to the file. Line separators are not added automatically.
Directory and File Path Management
Python provides robust tools for managing directories and file paths. While the `os` module has traditionally been used, the `pathlib` module (introduced in Python 3.4) offers an object-oriented and more readable approach.
Using pathlib.Path (Recommended)
The `pathlib` module provides `Path` objects to represent file system paths.
Creating a Directory:
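A minimal sketch using `Path.mkdir()`. The directory name `reports` is hypothetical, and it is created inside a temporary directory to keep the example self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp) / "reports"  # hypothetical directory name
    new_dir.mkdir()                  # raises FileExistsError if it already exists
    created = new_dir.is_dir()

    # exist_ok=True makes the call a no-op when the directory is already there
    new_dir.mkdir(exist_ok=True)

print(created)  # True
```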
Creating Nested Directories:
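For nested directories, `Path.mkdir()` accepts `parents=True` to create any missing intermediate directories. The path `data/2024/raw` below is a made-up example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "data" / "2024" / "raw"  # hypothetical nested path
    nested.mkdir(parents=True, exist_ok=True)     # creates data/ and data/2024/ too
    nested_created = nested.is_dir()
    parent_created = nested.parent.is_dir()

print(nested_created, parent_created)  # True True
```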
Deleting a Directory:
Note: These methods typically only work if the directory is empty.
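A minimal sketch with `Path.rmdir()`, which refuses to delete a directory that still has contents. The directory and file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "old_reports"  # hypothetical directory name
    target.mkdir()
    note = target / "note.txt"
    note.write_text("temporary", encoding="utf-8")

    refused = False
    try:
        target.rmdir()   # fails because the directory is not empty
    except OSError:
        refused = True

    note.unlink()        # delete the file first
    target.rmdir()       # now the empty directory can be removed
    exists_after = target.exists()

print(refused)       # True
print(exists_after)  # False
```

For removing a directory together with its contents, the standard library's `shutil.rmtree()` is a common alternative.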
Renaming a File or Directory:
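A minimal sketch using `Path.rename()`. The file names `draft.txt` and `final.txt` are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old_path = Path(tmp) / "draft.txt"         # hypothetical file name
    old_path.write_text("content", encoding="utf-8")

    new_path = old_path.with_name("final.txt")  # same folder, new name
    old_path.rename(new_path)

    old_exists = old_path.exists()
    new_content = new_path.read_text(encoding="utf-8")

print(old_exists)   # False
print(new_content)  # content
```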
Moving a File or Directory:
Moving is often achieved by renaming to a different path.
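A sketch of moving a file by renaming it to a path in another directory. The `inbox` and `archive` directory names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"      # hypothetical source directory
    archive = Path(tmp) / "archive"  # hypothetical destination directory
    inbox.mkdir()
    archive.mkdir()

    source = inbox / "report.txt"
    source.write_text("quarterly numbers", encoding="utf-8")

    destination = archive / source.name
    source.rename(destination)       # same file, new location

    moved = destination.exists() and not source.exists()

print(moved)  # True
```

Renaming may fail when the source and destination are on different file systems; `shutil.move()` is one option in that case.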
`pathlib` offers many more features for path manipulation, checking existence, iterating over directory contents, etc., making it a powerful tool for modern Python development.
Error Handling for File Operations
When working with files, various errors can occur. Common ones include `FileNotFoundError` if a file doesn't exist when trying to read it, or `PermissionError` if the script doesn't have the necessary permissions. It's good practice to handle these exceptions using `try-except` blocks.
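One possible way to structure this is sketched below. The helper name `read_text_or_none` and the decision to return `None` on failure are assumptions made for illustration, not a prescribed pattern.

```python
import tempfile
from pathlib import Path


def read_text_or_none(path):
    """Hypothetical helper: return the file's text, or None if it cannot be read."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"Permission denied: {path}")
    return None


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "notes.txt"
    existing.write_text("hello", encoding="utf-8")
    missing = Path(tmp) / "does_not_exist.txt"

    found = read_text_or_none(existing)
    not_found = read_text_or_none(missing)  # prints a "File not found" message

print(found)      # hello
print(not_found)  # None
```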
Working with Common File Formats (CSV & JSON)
Python's standard library includes modules for easily working with common structured data formats like CSV and JSON.
CSV (Comma-Separated Values)
CSV files are simple text files where values are typically separated by commas. The `csv` module helps manage the nuances of CSV formatting.
Writing to a CSV file:
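A minimal sketch using `csv.writer`. The file name `people.csv` and the rows are made-up sample data.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    ["name", "age"],  # header row (sample data)
    ["Alice", 30],
    ["Bob", 25],
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people.csv"  # hypothetical file name

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)       # one CSV line per inner list

    raw = path.read_bytes()

print(raw)  # b'name,age\r\nAlice,30\r\nBob,25\r\n'
```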
Note: `newline=''` is recommended when opening CSV files for writing, to prevent blank rows between records on platforms such as Windows.
Reading from a CSV file:
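A minimal sketch using `csv.reader`. The file is created first with made-up sample data so that the example runs on its own; note that every value is read back as a string.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people.csv"  # hypothetical file name
    path.write_text("name,age\nAlice,30\nBob,25\n", encoding="utf-8")

    with open(path, "r", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)        # first row holds the column names
        records = [row for row in reader]

print(header)   # ['name', 'age']
print(records)  # [['Alice', '30'], ['Bob', '25']]
```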
JSON (JavaScript Object Notation)
JSON is a lightweight data-interchange format that is easily readable by humans and parsable by machines. The `json` module is used to work with JSON data.
Writing a Python dictionary to a JSON file:
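A minimal sketch using `json.dump()`. The dictionary contents and the file name `profile.json` are made up for illustration; `indent=2` is optional and only makes the file easier to read.

```python
import json
import tempfile
from pathlib import Path

profile = {  # sample data
    "name": "Alice",
    "age": 30,
    "languages": ["Python", "Go"],
    "active": True,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "profile.json"  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        json.dump(profile, f, indent=2)

    text = path.read_text(encoding="utf-8")

print(text)  # Python's True is written as JSON true
```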
Reading from a JSON file into a Python dictionary:
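A minimal sketch using `json.load()`. The JSON text is written first so the example is self-contained; its contents are made up.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "profile.json"  # hypothetical file name
    path.write_text(
        '{"name": "Alice", "age": 30, "active": true, "nickname": null}',
        encoding="utf-8",
    )

    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)  # JSON object -> dict, true -> True, null -> None

print(data["name"])      # Alice
print(data["active"])    # True
print(data["nickname"])  # None
```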
These modules simplify handling the specifics of these formats, such as escaping special characters in CSV or converting between Python types and JSON types.