
The open function explained

This is a guest post by Philipp.

open opens a file. Pretty simple, eh? Most of the time, we see it being used like this:

f = open('photo.jpg', 'r+')
jpgdata = f.read()
f.close()

The reason I am writing this article is that this is how I see open used most of the time, yet there are three errors in the above code. Can you spot them all? If not, read on. By the end of this article, you’ll know what’s wrong in the above code and, more importantly, be able to avoid these mistakes in your own code.

Let’s start with the basics: the return value of open is a file handle, given out by the operating system to your Python application. You will want to return this file handle once you’re finished with the file, if only so that your application won’t reach the limit on the number of open file handles it can have at once.
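To make the idea of a handle a little more concrete, here is a minimal Python 3 sketch. It creates a throwaway file with the standard tempfile module (the file is an assumption of the example, not part of the original text), then shows the integer descriptor the operating system hands out via fileno() and the file object's closed attribute before and after close().

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on photo.jpg existing.
fd, path = tempfile.mkstemp()
os.close(fd)  # mkstemp also returns an open descriptor; give it back right away

f = open(path, 'rb')
print(f.fileno())  # the integer descriptor handed out by the operating system
print(f.closed)    # False: we are still holding the handle
f.close()          # return the handle to the operating system
print(f.closed)    # True: the handle has been given back

os.remove(path)
```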

Explicitly calling close closes the file handle, but only if the program actually gets that far. If anything raises an exception after f = open(...), for example the read itself, f.close() will not be called (depending on the Python interpreter, the file handle may still be returned eventually, but that’s another story). To make sure that the file gets closed whether an exception occurs or not, put it in a with statement:

with open('photo.jpg', 'r+') as f:
    jpgdata = f.read()
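For readers wondering what the with statement buys us: it behaves roughly like the try/finally sketch below (simplified; the real mechanism calls the file object's __enter__ and __exit__ methods). To keep it runnable, the sketch assumes Python 3, writes a small hypothetical stand-in for photo.jpg into a temporary file, and opens it with 'rb' rather than 'r+', since a text-mode read of JPEG bytes would typically fail to decode.

```python
import os
import tempfile

# Hypothetical stand-in for photo.jpg: a tiny file beginning with the JPEG marker.
fd, path = tempfile.mkstemp(suffix='.jpg')
os.write(fd, b'\xff\xd8 not really an image')
os.close(fd)

f = open(path, 'rb')
try:
    jpgdata = f.read()
finally:
    # Runs whether read() succeeded or raised, so the handle is always returned.
    f.close()

os.remove(path)
```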

The first argument of open is the filename. The second one (the *mode*) determines *how* the file gets opened.

  • If you want to read the file, pass in r
  • If you want to read and write the file, pass in r+
  • If you want to overwrite the file, pass in w
  • If you want to append to the file, pass in a

While there are a couple of other valid mode strings, chances are you won’t ever use them. The mode matters not only because it changes the behavior, but also because it may result in permission errors. For example, if the jpg file were write-protected, open(.., 'r+') would fail with a permission error, even though we only want to read the file; open(.., 'r') would succeed.
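The four modes can be compared side by side in a short Python 3 sketch. It works on a hypothetical notes.txt inside a temporary directory, so nothing real is touched, and passes encoding='utf-8' explicitly so the result does not depend on the platform default.

```python
import os
import shutil
import tempfile

# Hypothetical scratch file inside a temporary directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'notes.txt')

with open(path, 'w', encoding='utf-8') as f:   # 'w': create, or truncate existing content
    f.write('first line\n')

with open(path, 'a', encoding='utf-8') as f:   # 'a': every write goes to the end
    f.write('second line\n')

with open(path, 'r+', encoding='utf-8') as f:  # 'r+': read and write, no truncation
    content = f.read()
    f.seek(0)
    f.write('FIRST')                           # overwrites the first five characters

with open(path, 'r', encoding='utf-8') as f:   # 'r': read only
    final = f.read()

print(repr(content))  # 'first line\nsecond line\n'
print(repr(final))    # 'FIRST line\nsecond line\n'

# Unlike 'w' and 'a', the modes 'r' and 'r+' never create a missing file.
try:
    open(os.path.join(tmpdir, 'missing.txt'), 'r+')
except FileNotFoundError:
    print("'r+' refused to create missing.txt")

shutil.rmtree(tmpdir)
```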

The mode can contain one further character: we can open the file in binary mode (you’ll get a string of bytes) or text mode (a string of characters). In general, if the format is written by humans, it tends to be text mode. jpg image files are not generally written by humans (and are indeed not human-readable), so you should open them in binary mode by adding a b to the mode string (if you’re following the opening example, the correct mode would be rb).
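A sketch of the difference, assuming Python 3: binary mode hands back the raw bytes object, while text mode has to decode the bytes first, and JPEG bytes are not valid UTF-8. The four-byte stand-in file is hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in file whose first bytes are the JPEG marker FF D8.
fd, path = tempfile.mkstemp(suffix='.jpg')
os.write(fd, b'\xff\xd8\xff\xe0')
os.close(fd)

with open(path, 'rb') as f:   # binary mode: the raw bytes, untouched
    data = f.read()
print(type(data), data)       # <class 'bytes'> b'\xff\xd8\xff\xe0'

decode_failed = False
try:
    with open(path, 'r', encoding='utf-8') as f:  # text mode must decode first
        f.read()
except UnicodeDecodeError:
    decode_failed = True      # the byte 0xFF never appears in valid UTF-8
print('text mode failed to decode:', decode_failed)

os.remove(path)
```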

If you open something in text mode (i.e. add a t, or nothing apart from r/r+/w/a), you must also know which encoding to use: to a computer, all files are just bytes, not characters. Unfortunately, the built-in open does not accept an encoding in Python 2.x. However, the function io.open is available in both Python 2.x (since 2.6) and 3.x (where it is an alias of open), and does the right thing.
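To see that text really is stored as bytes, the sketch below (Python 3, using io.open) writes the four characters of 'café' in text mode and reads the file back in binary mode. The temporary file stands in for any real file.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\xe9')      # four characters: c, a, f and an accented e

with io.open(path, 'rb') as f:
    raw = f.read()

print(raw)                        # b'caf\xc3\xa9': five bytes, the accented e took two
print(len(u'caf\xe9'), len(raw))  # 4 5

os.remove(path)
```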

You can pass in the encoding with the encoding keyword. If you don’t pass in any encoding, a system- (and Python-) specific default will be picked. You may be tempted to rely on these defaults, but the defaults are often wrong, or the default encoding cannot actually express all characters (this will happen on Python 2.x and/or Windows). So go ahead and pick an encoding. 'utf-8' is a terrific one.
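A minimal sketch of picking the encoding explicitly, assuming Python 3: a string mixing German and Chinese characters survives a UTF-8 round trip unchanged, while a narrower legacy encoding such as cp1252 (a common Windows default, used here purely as an illustration) cannot represent the Chinese characters at all.

```python
import io
import os
import tempfile

# 'Grüße, 世界' written with escapes: German and Chinese characters mixed together.
text = u'Gr\xfc\xdfe, \u4e16\u754c'

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Naming the encoding explicitly gives the same bytes on every machine.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)
with io.open(path, 'r', encoding='utf-8') as f:
    roundtrip = f.read()
print(roundtrip == text)  # True

# A narrower encoding, here cp1252, cannot express the Chinese characters.
encode_failed = False
try:
    text.encode('cp1252')
except UnicodeEncodeError as exc:
    encode_failed = True
    print('cp1252 gave up at position', exc.start)

os.remove(path)
```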

When you write a file, you can just pick the encoding to your liking (or the liking of the program that will eventually read your file). How do you find out which encoding a file you read has? Unfortunately, there is no foolproof way to detect the encoding: the same bytes can represent different, but equally valid, characters in different encodings. Therefore, you must rely on metadata (for example, in HTTP headers) to know the encoding. Increasingly, formats simply define the encoding to be UTF-8.
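The ambiguity is easy to demonstrate in Python 3: the same five bytes decode without complaint under two different encodings, but to different text. Neither call raises an error, which is why a successful decode is no proof that you guessed the encoding right.

```python
data = b'caf\xc3\xa9'  # the same five bytes for both decodes

as_utf8 = data.decode('utf-8')      # 'café'
as_latin1 = data.decode('latin-1')  # 'cafÃ©'

print(as_utf8, as_latin1)
print(len(as_utf8), len(as_latin1))  # 4 5
```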

Armed with this knowledge, let’s write a program that reads a file, determines whether it’s a JPG (hint: these files start with the bytes FF D8), and writes a text file that describes the input file.

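import io

# Binary mode: JPEG data is bytes, not text.
with open('photo.jpg', 'rb') as inf:
    jpgdata = inf.read()

# JPEG files start with the two bytes FF D8.
if jpgdata.startswith(b'\xff\xd8'):
    text = u'This is a jpeg file (%d bytes long)\n'
else:
    text = u'This is a random file (%d bytes long)\n'

# io.open accepts an explicit encoding on both Python 2.x and 3.x.
with io.open('summary.txt', 'w', encoding='utf-8') as outf:
    outf.write(text % len(jpgdata))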
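A possible variant, offered as a sketch rather than a correction: since only the first two bytes decide the answer, the file can be read with inf.read(2) and its length taken from os.path.getsize instead of loading the whole file into memory. The function name describe and the temporary sample file are hypothetical, and the sketch assumes Python 3.

```python
import io
import os
import shutil
import tempfile


def describe(path, summary_path):
    """Write a one-line UTF-8 summary of the file at path to summary_path."""
    with open(path, 'rb') as inf:
        header = inf.read(2)       # only the first two bytes decide the answer
    size = os.path.getsize(path)   # ask the file system for the length instead

    if header == b'\xff\xd8':
        text = u'This is a jpeg file (%d bytes long)\n'
    else:
        text = u'This is a random file (%d bytes long)\n'

    with io.open(summary_path, 'w', encoding='utf-8') as outf:
        outf.write(text % size)


# Usage with a hypothetical 12-byte sample in a temporary directory.
tmpdir = tempfile.mkdtemp()
sample = os.path.join(tmpdir, 'photo.jpg')
with open(sample, 'wb') as f:
    f.write(b'\xff\xd8' + b'\x00' * 10)

summary = os.path.join(tmpdir, 'summary.txt')
describe(sample, summary)
with io.open(summary, 'r', encoding='utf-8') as f:
    result = f.read()
print(result)  # This is a jpeg file (12 bytes long)

shutil.rmtree(tmpdir)
```

One trade-off: the header and the size now come from two separate operations, so if another program modifies the file in between, they could disagree. For a one-off summary this is usually acceptable.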


✍️ Comments

gunther

thanks, very clear and instructive explanation!!!

Jim Mooney

So what if you open without a handle? I often see this: print(open('data.txt').read()) I assume it’s always closed automatically, but I hate to assume ;')

galo

ditto, that was a good question from Jim Mooney…. as of 2017 february… can someone plz share feedback?

Ernesto89

CONTENT = r"C:\Users\CHARLES\Desktop\test.txt"
D = open(CONTENT,'r').read()
print(type(D))
open(CONTENT,'r').close()

If you see the second line without the “.read()”, the result will feed the object D, so that you can use D with .close() or .read(). However, I am putting .read() right next to the open() because it will catch the information that would be saved in D as an object, and instead of having D as an object I am turning this info into a string. You can test it with the following: print(type(D))

scoutchorton

JIM MOONEY, more than likely not. If it closed automatically, then why would there be a close function. The open() can be assigned to a variable, and that can be closed, read, written to, etc., so no it shouldn’t close automatically.
