How to handle file uploads with Flask

In this article, we'll look at how to let your users send files to your Flask app. There are simple solutions and more complex ones, depending on your needs!

Today let’s talk about two solutions: one suitable for small files, and one suitable for large files.

When a Flask worker is receiving a file upload, it can't do anything else until all the file data has arrived. If the files are very large and the app runs with only one or a few workers, this can block the app and prevent other users from doing anything.

For small file uploads, we will accept this limitation. For large files, we will find a way around it so that other users of the app aren’t locked out of it while someone uploads a large file.

Easily handle small file uploads with Flask

There are multiple Flask extensions to help us with file uploads. The most actively developed one I found is Flask-Reuploaded. This is a fork of a widely used library, Flask-Uploads, which has been deprecated.

To install the library in your virtual environment:

pip install flask-reuploaded

Then, in your Flask app, you must create an UploadSet. To create this, we need a name for it and an iterable of allowed file extensions. The name is important, as we’ll use it later.

import secrets  # built-in module

from flask import Flask, flash, render_template, request, redirect, url_for
from flask_uploads import IMAGES, UploadSet, configure_uploads

app = Flask(__name__)
photos = UploadSet("photos", IMAGES)

Then, let’s add a secret key to our application and configure the folder where photos will be saved.

The name of app.config["UPLOADED_PHOTOS_DEST"] contains PHOTOS, which is the uppercased name of our UploadSet. So if you created your UploadSet with UploadSet("invoices", ["pdf"]), then the configuration should be UPLOADED_INVOICES_DEST. More info in the documentation.

Calling configure_uploads(app, photos) stores the configuration of each UploadSet in the app, so Flask can access it.

app.config["UPLOADED_PHOTOS_DEST"] = "static/img"
app.config["SECRET_KEY"] = str(secrets.SystemRandom().getrandbits(128))
configure_uploads(app, photos)
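The naming rule for the destination setting can be sketched as a tiny helper. Note that dest_config_key is a hypothetical function written only for illustration; it is not part of Flask-Reuploaded and simply mirrors the UPLOADED_<NAME>_DEST pattern described above.

```python
def dest_config_key(upload_set_name: str) -> str:
    """Build the config key that holds the destination folder for an UploadSet name.

    Hypothetical helper: it only reproduces the UPLOADED_<NAME>_DEST pattern.
    """
    return f"UPLOADED_{upload_set_name.upper()}_DEST"


print(dest_config_key("photos"))    # UPLOADED_PHOTOS_DEST
print(dest_config_key("invoices"))  # UPLOADED_INVOICES_DEST
```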

Finally, we can create a view function that accepts a file coming in request.files, and use the UploadSet instance, stored in the photos variable, to save it:

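@app.post("/upload")
def upload():
    if "photo" in request.files:
        photos.save(request.files["photo"])
        flash("Photo saved successfully.")
    else:
        # Without this branch the view would return None when no file is sent.
        flash("No photo was included in the request.")
    return redirect(url_for("index"))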

My index route just renders a template which displays a file upload form:

@app.get("/")
def index():
    return render_template("upload.html")

And this is the upload.html template:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Flask-Reuploaded Example</title>
</head>
<body>
    {% with messages = get_flashed_messages() %}
        {% if messages %}
        <ul class="flashes">
        {% for message in messages %}
            <li>{{ message }}</li>
        {% endfor %}
        </ul>
        {% endif %}
    {% endwith %}
    <form method="POST" enctype="multipart/form-data" action="{{ url_for('upload') }}">
        <input type="file" name="photo">
        <button type="submit">Submit</button>
    </form>
</body>
</html>

How to handle large file uploads with Flask

We’ve briefly discussed how we can’t just send large files the same way as small files, because it could take a long time to send the file and that can block the Flask app.

Instead, we can use a client-side library to split the file into chunks and then upload them one at a time. That way the Flask app can serve other requests in between each chunk.

For this, the client-side library Dropzone.js is fantastic, and easy to work with!

When we upload a file using Dropzone.js, it splits the file into chunks and uploads them one at a time. With each chunk, some information is also included. At this point, massive props to Chris Griffith for his initial post on using this library with Flask. This section is heavily based on his original post.

This is what request.form contains on each chunk:

  • dzuuid: a UUID which identifies the file.
  • dztotalfilesize: the final file size of the entire file.
  • dzchunkindex: starts at 0 and increases by 1 per chunk (e.g. if there are 10 chunks, goes from 0 to 9).
  • dztotalchunkcount: the number of chunks to expect (e.g. 10).
  • dzchunksize: the size of each chunk, although the last chunk may be smaller than this.
  • dzchunkbyteoffset: the chunk size * chunk index, or how far into the file to start writing data.

In addition to this, request.files will contain the chunk data and upload filename.
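To make the relationship between these fields concrete, here is a small sketch that computes which chunks to expect for a given file size and chunk size. The names ChunkInfo and plan_chunks are hypothetical, and the arithmetic is inferred from the field descriptions above rather than taken from Dropzone.js itself.

```python
from typing import NamedTuple


class ChunkInfo(NamedTuple):
    index: int        # corresponds to dzchunkindex
    byte_offset: int  # corresponds to dzchunkbyteoffset
    length: int       # bytes actually carried by this chunk


def plan_chunks(total_size: int, chunk_size: int) -> list[ChunkInfo]:
    """List the chunks expected for a file of total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    # Ceiling division; corresponds to dztotalchunkcount.
    count = (total_size + chunk_size - 1) // chunk_size
    return [
        ChunkInfo(i, i * chunk_size, min(chunk_size, total_size - i * chunk_size))
        for i in range(count)
    ]


# A 2.5 MB file sent in 1 MB chunks: two full chunks and a smaller last one.
for chunk in plan_chunks(total_size=2_500_000, chunk_size=1_000_000):
    print(chunk)
```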

Let’s start with making a Flask app to serve the template (which will include Dropzone.js) and to process the chunks:

import os
from pathlib import Path

from flask import Flask, render_template, request
from werkzeug.utils import secure_filename

app = Flask(__name__)


@app.get("/")
def index():
    return render_template("index.html")


@app.post("/upload")
def upload_chunk():
    pass

Here’s index.html. It:

  • Includes the Dropzone.js JavaScript and CSS files.
  • Defines an HTML form with an id property.
  • Has a JavaScript script that initializes Dropzone.js on that form, and makes sure it uses chunking.
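As a rough sketch of a template with those three parts, the script below writes one possible index.html into a templates folder. Everything specific in it is an assumption to adapt: the unpkg CDN URLs pinned to Dropzone 5, the form id "dropper", and the option values. The helper write_index_template is hypothetical and exists only to keep the example in Python.

```python
from pathlib import Path

# Hypothetical template contents; the CDN URLs, the "dropper" id, and the
# option values are assumptions to adapt to your own project.
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Large file upload example</title>
    <link rel="stylesheet" href="https://unpkg.com/dropzone@5/dist/min/dropzone.min.css">
    <script src="https://unpkg.com/dropzone@5/dist/min/dropzone.min.js"></script>
</head>
<body>
    <form action="/upload" class="dropzone" id="dropper"></form>
    <script>
        // Dropzone reads options for the form with id "dropper" from this object.
        Dropzone.options.dropper = {
            paramName: "file",   // key used in request.files
            chunking: true,      // split uploads into chunks
            forceChunking: true, // chunk even files smaller than chunkSize
            chunkSize: 1000000,  // bytes per chunk
            maxFilesize: 1024    // in MB
        };
    </script>
</body>
</html>
"""


def write_index_template(templates_dir: Path) -> Path:
    """Write the sketch above to templates_dir/index.html and return its path."""
    templates_dir.mkdir(parents=True, exist_ok=True)
    target = templates_dir / "index.html"
    target.write_text(INDEX_HTML, encoding="utf-8")
    return target
```

Calling write_index_template(Path("templates")) once, from the folder that contains the Flask app, would create the file that render_template("index.html") looks for.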

Make sure to read the Dropzone.js documentation if you want to do more advanced things with it, because there are a lot of configuration options available.

So now that we’ve got this, we can simply upload a file and it will start sending chunks to our /upload endpoint. Let’s work on that next!

First let’s decide on where to save the incoming image. Because the image data will potentially arrive spread across multiple requests, Flask needs to use the same filename for every request. We could use the upload filename, which is the filename on the user’s computer, but that can lead to problems. It’s very common for users to try to upload photo.jpg, for example, and then we would need to implement error handling so we don’t overwrite existing images.

This is where the dzuuid field comes in handy. We can prepend a few characters from it to the upload filename to come up with a unique filename:

@app.post("/upload")
def upload_chunk():
    file = request.files["file"]
    file_uuid = request.form["dzuuid"]
    # Generate a unique filename to avoid overwriting using 8 chars of uuid before filename.
    filename = f"{file_uuid[:8]}_{secure_filename(file.filename)}"

With that, we can now come up with a path to save the image to. Let’s say we want to save them to static/img/FILENAME. Create the static/img folder if you haven’t already:

@app.post("/upload")
def upload_chunk():
    file = request.files["file"]
    file_uuid = request.form["dzuuid"]
    # Generate a unique filename to avoid overwriting using 8 chars of uuid before filename.
    filename = f"{file_uuid[:8]}_{secure_filename(file.filename)}"
    save_path = Path("static", "img", filename)

Finally, let’s start saving chunks! All we have to do is open the file at save_path, go to the end of it, and write the chunk that we’ve received.

To go to the end of the file we’ll use dzchunkbyteoffset. Because the chunks arrive in order, that offset should match the current end of the file:

@app.post("/upload")
def upload_chunk():
    file = request.files["file"]
    file_uuid = request.form["dzuuid"]
    # Use the upload's dzuuid (the same for every chunk) so all chunks land in the same file.
    filename = f"{file_uuid[:8]}_{secure_filename(file.filename)}"
    save_path = Path("static", "img", filename)

    with open(save_path, "ab") as f:
        f.seek(int(request.form["dzchunkbyteoffset"]))
        f.write(file.stream.read())

    return "Chunk upload successful.", 200

Voilà! You can test this out, and your file uploads should work!
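One detail worth knowing: Python's documentation for open() notes that in append mode ("ab") some systems write at the end of the file regardless of the seek position. That is harmless while chunks arrive strictly in order, but a chunk that is retried or arrives out of order would land in the wrong place. As a hedged alternative (not the approach used in this article), the sketch below opens the file in "r+b" mode so the offset is honored. The helper write_chunk is a hypothetical name.

```python
import tempfile
from pathlib import Path


def write_chunk(save_path: Path, byte_offset: int, data: bytes) -> None:
    """Write data at byte_offset, creating the file first if it does not exist."""
    save_path.touch(exist_ok=True)  # "r+b" requires the file to exist
    with open(save_path, "r+b") as f:
        f.seek(byte_offset)
        f.write(data)


# Chunks written out of order still end up in the right place.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.bin"
    write_chunk(path, 3, b"def")
    write_chunk(path, 0, b"abc")
    print(path.read_bytes())  # b'abcdef'
```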

But, not so fast… What about error handling?

Well, error handling doesn’t do well in blog posts! It’s much longer and more complicated than the actual working code!

Nonetheless, there are a few things we can do.

First, when we’re done saving chunks, we can check that the final filesize is equal to the filesize the user tried to upload. We can do this by comparing the size of the file we’ve saved to request.form["dztotalfilesize"]. We’ll also need the current chunk and the total number of chunks.

If there is an error, we’ll respond with a short message and a status code of 500. This will then show up in the Dropzone form.

@app.post("/upload")
def upload_chunk():
    file = request.files["file"]
    file_uuid = request.form["dzuuid"]
    # Generate a unique filename to avoid overwriting using 8 chars of uuid before filename.
    filename = f"{file_uuid[:8]}_{secure_filename(file.filename)}"
    save_path = Path("static", "img", filename)
    current_chunk = int(request.form["dzchunkindex"])

    with open(save_path, "ab") as f:
        f.seek(int(request.form["dzchunkbyteoffset"]))
        f.write(file.stream.read())

    total_chunks = int(request.form["dztotalchunkcount"])

    # Add 1 since current_chunk is zero-indexed
    if current_chunk + 1 == total_chunks:
        # This was the last chunk, the file should be complete and the size we expect
        if os.path.getsize(save_path) != int(request.form["dztotalfilesize"]):
            return "Size mismatch.", 500

    return "Chunk upload successful.", 200

Okay, that’s not so bad! Let’s do some more. What if there’s an error writing to the file? For example, the directory we want to save the file to might not exist, or the disk might be full. Let’s handle that when we open the file:

@app.post("/upload")
def upload_chunk():
    file = request.files["file"]
    file_uuid = request.form["dzuuid"]
    # Generate a unique filename to avoid overwriting using 8 chars of uuid before filename.
    filename = f"{file_uuid[:8]}_{secure_filename(file.filename)}"
    save_path = Path("static", "img", filename)
    current_chunk = int(request.form["dzchunkindex"])

    try:
        with open(save_path, "ab") as f:
            f.seek(int(request.form["dzchunkbyteoffset"]))
            f.write(file.stream.read())
    except OSError:
        return "Error saving file.", 500

    total_chunks = int(request.form["dztotalchunkcount"])

    if current_chunk + 1 == total_chunks:
        # This was the last chunk, the file should be complete and the size we expect
        if os.path.getsize(save_path) != int(request.form["dztotalfilesize"]):
            return "Size mismatch.", 500

    return "Chunk upload successful.", 200
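Another thing that can go wrong is the form data itself: a field might be missing or might not be a number. One possible way to guard against that, sketched here with hypothetical names (ChunkFields, parse_chunk_fields), is to read and sanity-check all the Dropzone fields in one place before touching the filesystem.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ChunkFields:
    uuid: str
    chunk_index: int
    total_chunks: int
    byte_offset: int
    total_size: int


def parse_chunk_fields(form: Mapping[str, str]) -> ChunkFields:
    """Read and sanity-check the Dropzone fields; raise ValueError if they look wrong."""
    try:
        # int() raises ValueError by itself when a value is not a number.
        fields = ChunkFields(
            uuid=form["dzuuid"],
            chunk_index=int(form["dzchunkindex"]),
            total_chunks=int(form["dztotalchunkcount"]),
            byte_offset=int(form["dzchunkbyteoffset"]),
            total_size=int(form["dztotalfilesize"]),
        )
    except KeyError as exc:
        raise ValueError(f"Missing form field: {exc}") from exc
    if not 0 <= fields.chunk_index < fields.total_chunks:
        raise ValueError("Chunk index out of range.")
    if not 0 <= fields.byte_offset <= fields.total_size:
        raise ValueError("Byte offset out of range.")
    return fields
```

In the view, you could call parse_chunk_fields(request.form) inside a try block and respond with a short message and a 400 status code when it raises ValueError.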

Something else you can do, which Chris does in his original post, is add logging to the various stages of upload. Most people forget about logging, so it’s a good idea to check out his article and see how he handles it!
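As a minimal illustration of what such logging could look like (this is not Chris's code; the logger name "uploads" and the helper log_chunk are assumptions), you could log every saved chunk at debug level and the end of each upload at info level using the standard logging module:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("uploads")  # hypothetical logger name


def log_chunk(filename: str, chunk_index: int, total_chunks: int) -> None:
    """Log progress for one chunk; chunk_index is zero-based like dzchunkindex."""
    logger.debug("Saved chunk %d of %d for %s", chunk_index + 1, total_chunks, filename)
    if chunk_index + 1 == total_chunks:
        logger.info("Upload of %s finished", filename)
```

Calling log_chunk(filename, current_chunk, total_chunks) right after a chunk is written would give you a trail to follow when an upload fails halfway through.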

Note: Chris Griffith has, since publishing his post, published a new and improved article at https://codecalamity.com/upload-large-files-fast-with-dropzone-js/. This touches on multithreading, which adds another bunch of complexity to the issue!

Where should you store file uploads with Flask?

Storing files is a tough problem! Files are extremely large, at least when compared to text, so you usually don’t want to store files in a database. It will slow the database down, and it will make it more difficult to scale.

Instead, your options are:

  • Store the files in the server that runs your app.
  • Or upload the files to a third-party file-storage service like Amazon S3 or Backblaze B2.

Storing files in the filesystem

If your service is simple, and you use a single server, then storing files in the server’s filesystem could work. This has the benefit that, at least using Python, all you have to do is what we’ve done in the code examples above.

Write to a file, and you’re done.

Then when you want your app to use the file, you can just read it from disk.

Simple!

But there are numerous drawbacks…

  • If you ever want to scale to 2+ servers, you’ll need to find a different solution. Why? Because each server will have different files, and they won’t be able to access each other’s files easily.
  • You have to deal with backups yourself, so that you don’t lose everything in the server by accident.
  • Servers usually don’t have much storage because most applications don’t need a lot of storage. You may end up running out of disk space.
  • You could find that the hard drive becomes a bottleneck if you deal with a lot of uploads and downloads of files.

So instead, it’s usually better to use a third-party service for file uploads. I’ve been using Backblaze B2, and it’s quite easy to use. Its free tier also includes 10 GB of storage.

If you decide to use a third party file storage service, then your Flask app will have to accept the incoming file from the client, save it to the filesystem in a temporary folder, and upload it to the third party service. It’s a bit more work, but then you don’t have to store anything in the server. We’ve written an article on how to upload files to Backblaze B2 using Python.

This means that if you want to spin up more servers to deal with increased traffic to your app, you can do that without worry. Also, file storage services handle durability and redundancy for you, and they are designed to keep up with a very high volume of requests.
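The accept, save temporarily, then upload flow can be sketched with the standard library alone. In this sketch, store_remotely and its upload_to_storage parameter are hypothetical: upload_to_storage stands in for whatever function your storage provider's client gives you, and filename is assumed to have been sanitized already (for example with secure_filename).

```python
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable


def store_remotely(
    stream: BinaryIO,
    filename: str,
    upload_to_storage: Callable[[Path, str], None],
) -> None:
    """Save stream to a temporary file, hand it to the uploader, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir) / filename
        with open(tmp_path, "wb") as f:
            shutil.copyfileobj(stream, f)
        # Hypothetical callable: replace with your storage client's upload call.
        upload_to_storage(tmp_path, filename)
    # The temporary directory and the file in it are deleted at this point.
```

Because the temporary folder is removed as soon as the upload call returns, nothing accumulates on the server's disk.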

Of course, if you need more than the free plans of these file storage services offer, you have to pay! That’s the main drawback!

