Sync a Folder to Any Cloud with Pulumi

Lately I've been hacking away on a handful of Pulumi templates to make it easier to deploy common cloud architectures --- static websites, serverless apps, containers, virtual machines, and the like. The idea with templates is that with a single run of pulumi new, you should be able to kick out a finished program, in your language of choice, that works out of the box and can handle the most common scenarios, while still leaving you plenty of room for expansion and customization. Templates, I think, are one of Pulumi's most unsung (and unique) features; unlike components, they're open rather than closed, so they tend to be great not just for bootstrapping new projects, but also as tools for learning, as they're pretty much made to be tinkered with.

My first task was to build out a set of templates for static websites, which, as you probably know, are essentially just a bunch of HTML, CSS, and JavaScript files stuffed into a folder somewhere. This site, for example, is a static website built with Next.js, one of the more popular static-site generators these days, and as of today, it clocks in at a little over a thousand pages. At work, we build pulumi.com with another popular static-site generator called Hugo, and all up, that site comes in at a little over 20,000 pages. Static websites vary dramatically in terms of size and complexity, but ultimately they all have one thing in common: they're all just a bunch of files in a folder on a computer connected to the internet --- a computer that likely belongs either to Amazon Web Services, Microsoft Azure, or Google Cloud Platform.

But while building out the first set of these templates, I ran into a problem: getting files into the cloud isn't always as easy as you'd think, at least not with declarative infrastructure-as-code tools like Pulumi.

Here's an example. Imagine you had a bunch of files in a folder on your computer that you wanted to get into the cloud. With Pulumi, the way to do that would be to begin by writing a program declaring some cloud storage, which, if you were using AWS, might look something like this:

import * as aws from "@pulumi/aws";

// Make a bucket.
const bucket = new aws.s3.Bucket("my-bucket");

Easy enough, bucket created. Now, let's upload some files!

Here's where things get tricky, though. Most of the examples you'll find online (including several I've written myself) handle this by using the programming environment to fetch a list of files from the filesystem, then declare them, one by one, as individual cloud resources, like so:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as fs from "fs";
import * as mime from "mime";
import * as glob from "glob";

// The local folder containing the files to upload.
const myFolder = "./my-folder";

// Make a bucket.
const bucket = new aws.s3.Bucket("my-bucket");

// List the files in `myFolder`, then for each one, declare an `s3.BucketObject`.
glob.sync(`${myFolder}/**/*`).forEach((path: string) => {
    if (!fs.lstatSync(path).isDirectory()) {
        new aws.s3.BucketObject(path.replace(myFolder, ""), {
            bucket: bucket,
            source: new pulumi.asset.FileAsset(path),
            contentType: mime.getType(path) || "text/plain",
        });
    }
});

And this works, to be sure. With Python, Go, C#, and Java, things look a bit different, of course, but in general, the approach is the same: make a bucket, then use the facilities of your chosen programming language to get the files into the bucket.

But while being able to break out of the declarative box and write a little code when you need to is awesome --- indeed, it's one of the other things that makes Pulumi so special --- there's something about having to write code like this that I also find a bit irritating. All I want to do is push a folder into the cloud. Shouldn't I be able to just say that? Why should I have to dig into the filesystem, generate and manipulate a bunch of file paths, programmatically deduce MIME types, and so on, just to move a bunch of stuff verbatim from one folder to another? I mean, it's great that you can do that, sure. But it'd also be great if you didn't have to.

And sometimes, you just can't. Take YAML, for instance. I love our YAML support. It's so tidy:

resources:

  # Make a bucket.
  my-bucket:
    type: aws:s3:Bucket

But there's one little problem with YAML: it isn't a programming language. Unlike with TypeScript and the other general-purpose languages, you can't break out of the box and reach into the filesystem with YAML --- and that's by design. We built Pulumi YAML to give our users the tersest, most statically declarative option we could, and we did: you literally can't write code with Pulumi YAML, just expressions. Of course, you can always convert your program to a different language if you find YAML too limiting, and that's great, too --- but it's also a pretty big thing to have to consider when all you need to do is push a few files into the cloud.
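
Conversion itself, for what it's worth, is just a CLI command. The exact flags may vary a bit by Pulumi CLI version, but it looks roughly like this:

$ pulumi convert --language typescript --out ./my-typescript-program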

So I took a few days to figure this out, for all languages and all three major clouds. The result is a little Pulumi package called Synced Folder.

Introducing Synced Folder

Synced Folder is a multi-language Pulumi component that you can install and use with any Pulumi-supported language. It's deliberately simple and focused, and it works --- or at least it's my hope that it works --- exactly as you'd imagine: make a bucket, sync a folder to that bucket, and be done. Here's how it looks in TypeScript, for example:

import * as aws from "@pulumi/aws";
import * as synced from "@pulumi/synced-folder";

// Make a bucket.
const bucket = new aws.s3.Bucket("my-bucket", {
    acl: aws.s3.CannedAcl.PublicRead,
});

// Sync the contents of `my-folder` to the bucket.
const folder = new synced.S3BucketFolder("synced-folder", {
    path: "./my-folder",
    bucketName: bucket.bucket,
    acl: aws.s3.CannedAcl.PublicRead,
});

Here's an example in Python:

from pulumi_aws import s3
import pulumi_synced_folder

# Make a bucket.
bucket = s3.Bucket(
    "my-bucket",
    acl=s3.CannedAcl.PUBLIC_READ,
)

# Sync the contents of `my-folder` to the bucket.
folder = pulumi_synced_folder.S3BucketFolder(
    "synced-folder",
    path="./my-folder",
    bucket_name=bucket.bucket,
    acl=s3.CannedAcl.PUBLIC_READ,
)

And in Go:

package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v5/go/aws/s3"
	synced "github.com/pulumi/pulumi-synced-folder/sdk/go/synced-folder"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {

        // Make a bucket.
		bucket, err := s3.NewBucket(ctx, "my-bucket", &s3.BucketArgs{
			Acl: s3.CannedAclPublicRead,
		})
		if err != nil {
			return err
		}

        // Sync the contents of `my-folder` to the bucket.
		_, err = synced.NewS3BucketFolder(ctx, "synced-folder", &synced.S3BucketFolderArgs{
			Path:       pulumi.String("./my-folder"),
			BucketName: bucket.Bucket,
			Acl:        s3.CannedAclPublicRead,
		})
		if err != nil {
			return err
		}

		return nil
	})
}

In C#:

using Pulumi;
using Pulumi.Aws.S3;
using Pulumi.SyncedFolder;

return await Deployment.RunAsync(() =>
{
    // Make a bucket.
    var bucket = new Bucket("my-bucket", new BucketArgs {
        Acl = CannedAcl.PublicRead,
    });

    // Sync the contents of `my-folder` to the bucket.
    var folder = new S3BucketFolder("synced-folder", new S3BucketFolderArgs {
        Path = "./my-folder",
        BucketName = bucket.BucketName,
        Acl = (string)CannedAcl.PublicRead,
    });
});

And finally, in YAML:

resources:

  # Make a bucket.
  my-bucket:
    type: aws:s3:Bucket
    properties:
      acl: public-read

  # Sync the contents of `my-folder` to the bucket.
  synced-folder:
    type: synced-folder:index:S3BucketFolder
    properties:
      path: ./my-folder
      bucketName: ${my-bucket.bucket}
      acl: public-read

And if you're targeting Microsoft Azure or Google Cloud, it's just the same: make an Azure Blob Storage container or a Google Cloud Storage bucket, then drop in a reference to Synced Folder and call it a day. See the docs for examples with various languages and clouds.
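
For a taste of what that looks like, here's a Google Cloud variant in TypeScript --- a minimal sketch that assumes the component's GoogleCloudFolder resource (the Google Cloud counterpart to S3BucketFolder) and a default-configured Google Cloud provider:

import * as gcp from "@pulumi/gcp";
import * as synced from "@pulumi/synced-folder";

// Make a Google Cloud Storage bucket.
const bucket = new gcp.storage.Bucket("my-bucket", {
    location: "US",
});

// Sync the contents of `my-folder` to the bucket.
const folder = new synced.GoogleCloudFolder("synced-folder", {
    path: "./my-folder",
    bucketName: bucket.name,
});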

To deploy the contents of the folder, just run pulumi up in the usual way:

$ pulumi up

When you make local changes to the files in your synced folder --- add new files, delete old ones, change existing ones, etc. --- simply deploy the changes again in the same way. (Changes, by the way, are synced one way only, folder to cloud, so deleting a file in the cloud will have no effect on the files on your local machine.)

To remove the cloud bucket and all of its contents, run pulumi destroy:

$ pulumi destroy

Pretty neat, IMHO --- and again, hopefully works just like you'd expect.

Bonus feature: Unmanaged file objects

One thing you'll notice about the component's default behavior is that it manages your files as individual cloud resources --- e.g., as discrete Amazon s3.BucketObjects or Azure storage.Blobs --- and not just as an abstract "folder" resource. For folders that contain only a few files, that's probably fine, and indeed it's sometimes nice to have Pulumi track changes to those files individually as well. But if your folder were to contain, say, a thousand files or more, you could end up waiting a long time for Pulumi to examine and reconcile the state of every last one of those file objects. Moreover, if you happen to be using Pulumi on a commercial plan, you might not love the idea of each one of those objects contributing to your monthly bill.

So to address these two unpleasantries, I added an option to instruct Pulumi to ignore individual files and instead delegate the work of synchronization to the cloud provider's official CLI. By setting the optional property managedObjects to false, you can have Pulumi invoke the AWS, Azure, or Google Cloud CLI during an update to sync your files that way instead:

import * as aws from "@pulumi/aws";
import * as synced from "@pulumi/synced-folder";

const bucket = new aws.s3.Bucket("my-bucket", {
    acl: aws.s3.CannedAcl.PublicRead,
});

const folder = new synced.S3BucketFolder("synced-folder", {
    path: "./my-folder",
    bucketName: bucket.bucket,
    acl: aws.s3.CannedAcl.PublicRead,

    // 👇 Set this property to false to manage files with `aws s3 sync`.
    managedObjects: false,
});

Synchronization-wise, the effect is the same: files that exist locally but not remotely are uploaded, and those that exist remotely but not locally are deleted. But the upshot is that it's much faster and less expensive than managing each file as its own Pulumi resource, so if you're okay with Pulumi shelling out to the aws, az, or gcloud command-line tools, you might want to give it a try. Again, see the docs for details and examples.
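
If you're wondering what that handoff looks like for AWS, it's conceptually equivalent to running something like the following yourself --- a rough sketch; the exact arguments the component passes may differ:

$ aws s3 sync ./my-folder s3://my-bucket --acl public-read --delete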

Enjoy! And happy Pulumifying.