4.15 Updating a Random-Access File
Credit: Luther Blissett
4.15.1 Problem
You want to read a binary record from somewhere inside a large file
of fixed-length records, change the values, and write the record back.
4.15.2 Solution
Read the record, unpack it, perform whatever computations you need
for the update, pack the fields back into the record, seek to the
start of the record again, and write it back. Phew. Faster to code
than to say:
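import struct
# Example values (assumptions for illustration; substitute your own):
format_string = '8l'    # record layout: eight signed long integers
record_number = 2       # zero-based index of the record to update
thefile = open('somebinfile', 'r+b')
record_size = struct.calcsize(format_string)
# Seek to the start of the record and read exactly one record
thefile.seek(record_size * record_number)
buffer = thefile.read(record_size)
fields = list(struct.unpack(format_string, buffer))
# Perform computations, suitably modifying fields, then:
buffer = struct.pack(format_string, *fields)
# Seek back to the start of the same record and overwrite it
thefile.seek(record_size * record_number)
thefile.write(buffer)
thefile.close()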
4.15.3 Discussion
This approach works only on files (generally binary ones) defined in
terms of records that are all the same, fixed size; it
doesn't work on normal text files. Furthermore, the
size of each record must be that defined by a
struct's format string, as shown
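in the recipe's code. A typical format string, for
example, might be "8l", to specify that each
record is made up of eight signed integers, each unpacked into a
Python int. Keep in mind that in struct's default native mode the size
of l is that of a C long on the current platform (four bytes on some
systems, eight on others); to get four-byte integers everywhere, use a
standard-size prefix, as in "<8l" or "=8l".
In this case, the fields variable in the recipe
would be bound to a list of eight ints. Note that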
struct.unpack returns a tuple. Because tuples are
immutable, the computation would have to rebind the entire
fields variable. A list is not immutable, so each
field can be rebound as needed. Thus, for convenience, we explicitly
ask for a list when we bind fields. Make sure,
however, not to alter the length of the list. In this case, it needs
to remain composed of exactly eight integers, or the
struct.pack call will raise an exception when we
call it with a format_string that is still
"8l". Also note that this recipe is not suitable
for working with records that are not all of the same, unchanging
length.
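The points above about record size, list mutability, and the fixed field count can be checked in a few lines. This is only a sketch: the format "<8l" (little-endian, standard sizes) and the sample values are assumptions chosen for illustration, not part of the recipe.

```python
import struct

# Assumption: "<8l" stands in for the recipe's format_string. The "<"
# prefix selects standard sizes, so each l is exactly four bytes.
format_string = "<8l"
record_size = struct.calcsize(format_string)  # 8 fields * 4 bytes

# Build one sample record in memory and unpack it into a mutable list.
raw = struct.pack(format_string, *range(8))
fields = list(struct.unpack(format_string, raw))

# A list lets us rebind one field without rebuilding the whole record.
fields[3] += 100
repacked = struct.pack(format_string, *fields)

# Changing the number of fields makes struct.pack raise struct.error.
try:
    struct.pack(format_string, *(fields + [0]))
    length_error = None
except struct.error as exc:
    length_error = exc
```

The packed buffer keeps the same length as long as the list still holds exactly eight integers, which is what lets it be written back over the original record.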
To seek back to the start of the record, instead of using the
record_size*record_number offset again, you may
choose to do a relative seek:
thefile.seek(-record_size, 1)
The second argument to the seek method
(1) tells the file object to seek relative to the
current position (here, so many bytes back, because we used a
negative number as the first argument).
seek's default is to seek to an
absolute offset within the file (i.e., from the start of the file).
You can also explicitly request this default behavior by calling
seek with a second argument of
0.
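As a rough illustration of the relative seek, the following sketch builds a small scratch file, updates one record, and steps back with a negative offset. The file name, the two-field format, and the values are made up for the example; os.SEEK_CUR is the standard library's named constant for the value 1.

```python
import os
import struct
import tempfile

# Assumptions for illustration: a two-field record and a scratch file.
format_string = "<2l"
record_size = struct.calcsize(format_string)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "records.bin")
    with open(path, "wb") as f:
        for i in range(3):
            f.write(struct.pack(format_string, i, i * 10))

    with open(path, "r+b") as thefile:
        thefile.seek(record_size * 1)      # absolute seek to record 1
        start = thefile.tell()
        fields = list(struct.unpack(format_string, thefile.read(record_size)))
        fields[1] = 999
        # Relative seek: step back over the record we just read.
        thefile.seek(-record_size, os.SEEK_CUR)
        back_at_start = thefile.tell() == start
        thefile.write(struct.pack(format_string, *fields))

    with open(path, "rb") as f:
        data = f.read()

# Decode all three records to confirm only record 1 changed.
records = [struct.unpack_from(format_string, data, i * record_size)
           for i in range(3)]
```

The relative form avoids recomputing the offset, but it is only correct if nothing else has moved the file position between the read and the seek.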
Of course, you don't need to open the file just
before you do the first seek or close it right
after the write. Once you have a file object that
is correctly opened (i.e., for update, and as a binary rather than a
text file), you can perform as many updates on the file as you want
before closing the file again. These calls are shown here to
emphasize the proper technique for opening a file for random-access
updates and the importance of closing a file when you are done with
it.
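One way to perform several updates on a single open file is to wrap the read-modify-write steps in a helper. The function name update_record, its modify callback, and the sample data below are hypothetical, introduced only for this sketch.

```python
import os
import struct
import tempfile

def update_record(thefile, format_string, record_number, modify):
    """Hypothetical helper: read one record, let modify() change the
    list of fields in place, and write the record back."""
    record_size = struct.calcsize(format_string)
    offset = record_size * record_number
    thefile.seek(offset)
    buffer = thefile.read(record_size)
    if len(buffer) != record_size:
        raise IndexError("no complete record at index %d" % record_number)
    fields = list(struct.unpack(format_string, buffer))
    modify(fields)
    thefile.seek(offset)
    thefile.write(struct.pack(format_string, *fields))

def double_first(fields):
    fields[0] *= 2

format_string = "<2l"  # assumption: two 4-byte signed integers per record
record_size = struct.calcsize(format_string)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "records.bin")
    with open(path, "wb") as f:
        for i in range(4):
            f.write(struct.pack(format_string, i, i * 10))

    # Open once for update, perform several updates, then close.
    with open(path, "r+b") as thefile:
        for number in (1, 3):
            update_record(thefile, format_string, number, double_first)

    with open(path, "rb") as f:
        data = f.read()

records = [struct.unpack_from(format_string, data, i * record_size)
           for i in range(4)]
```

The with statement closes the file even if an update raises, which covers the point about closing the file when you are done with it.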
The file needs to be opened for updating (i.e., to allow both reading
and writing). That's what the
'r+b' argument to
open means: open for reading and writing,
but do not implicitly perform any transformations on the
file's contents, because the file is a binary one
(in Python 3 the 'b' part is required on every platform, because
struct works with bytes objects and a file opened in text mode returns
str; in Python 2 it was optional on Unix and Unix-like systems but
crucial on Windows). If you're creating the
binary file from scratch but you still want to be able to reread and
update some records without closing and reopening the file, you can
use a second argument of
'w+b' instead. However, I have never
witnessed this strange combination of requirements; binary files are
normally first created (by opening them with 'wb',
writing data, and closing the file) and later opened for update with
'r+b'.
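For completeness, here is a minimal sketch of the less common 'w+b' case: creating the file, then rereading and updating a record without closing it. The record layout and values are assumptions for the example.

```python
import os
import struct
import tempfile

format_string = "<2l"  # assumption: two 4-byte signed integers per record
record_size = struct.calcsize(format_string)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fresh.bin")
    # 'w+b' creates (or truncates) the file and allows reading as well.
    with open(path, "w+b") as thefile:
        for i in range(3):
            thefile.write(struct.pack(format_string, i, 0))

        # Reread record 2 from the file we are still writing.
        thefile.seek(record_size * 2)
        fields = list(struct.unpack(format_string, thefile.read(record_size)))
        fields[1] = 42
        thefile.seek(record_size * 2)
        thefile.write(struct.pack(format_string, *fields))

        # Read everything back to inspect the result.
        thefile.seek(0)
        contents = thefile.read()

values = struct.unpack("<6l", contents)
```

Note that 'w+b' truncates an existing file, so it is not a substitute for 'r+b' when the data is already on disk.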
4.15.4 See Also
The sections of the Library Reference on file
objects and the struct module; Perl
Cookbook Recipe 8.13.