Behind "The Herder": When Rain Has a Job

Jaime Yao

Mar 2026

It started on a rainy day in Los Angeles—the kind that makes the city feel briefly unfamiliar, like the volume has been turned down. I was indoors watching short films and letting the weather fill the room. That's when I ran into La Luna and El Empleo. Two completely different tones, but they hit the same thought: what if the things we take for granted aren't "just there"—what if they're held up by someone's labor?

El Empleo especially stuck with me. It quietly flips everyday convenience into a question: what's behind the smooth surface we call "normal"? I kept thinking about perspective—how we look straight ahead at a functioning world and assume it's automatic. But "automatic" might just mean "someone else is doing the work where we don't look."

Outside, the rain started to feel like part of the same idea. Rain is one of those facts you don't negotiate with. It arrives, it keeps going, it sets the street's rhythm. And then a simple "what if" landed: what if rain doesn't simply happen? What if there's a job up there—ordinary, tiring, unromantic—that makes the rain stop when it's time?

That was the first spark of The Herder: not a hero, not a wizard, just a worker whose craft becomes invisible the moment it works. What I needed next was a short script.

Image

For the visual language of The Herder, I kept coming back to miniature macro photography—the tilt-shift feeling where the world is sharp in one slice and melts away everywhere else. It's perfect for this story because it makes labor feel physical. Wet wood grain, rope fibers, water beading on metal—those tiny textures convince you someone is really working up there. The shallow depth of field also does something magical: it simplifies the background into soft shapes, so the audience doesn't need a huge explanation of the world. You're guided straight to the job—hands, tools, ladder, clouds. And the "miniature" vibe gives the film a quiet fable quality: real enough to believe, stylized enough to accept a cloud-trimming profession without over-explaining it. Practically, it also plays nicely with an AI pipeline—small inconsistencies get absorbed into bokeh and haze, while the important things stay readable and controlled.

Before anything else, I had to pick an era. I landed on "modern Europe" (roughly 1800s–early 1900s) almost by instinct. Maybe because it sits in a sweet spot: close enough to us that coats, tools, and street life feel practical, but far enough that a strange job can hide in plain sight. It's a world where craft still matters—leather straps, iron hardware, heavy wool, umbrellas that look like real working gear—not props. That texture helps the Cloud Herder feel like a tradesman, not a fantasy character.

To keep him consistent from front to back (and shot to shot), I treated him like a real production asset. I drew multiple turnaround sheets—front/side/back—in different "states": with hat, without hat, with umbrella, hands free, wet coat, gear shifting. The goal wasn't just design; it was continuity. When the film cuts fast, his silhouette and proportions stay readable, and the audience never has to wonder if it's a different guy.

Image

After the early concept decisions, the workflow becomes pretty "factory mode," and that's where most of my AI tips actually come from. I usually start in Midjourney to get a solid base frame—just enough character description to lock the vibe and lighting. I'm not trying to finalize the protagonist there. The real continuity work happens later: once I have a few usable frames, I use Nano Banana to swap in my designed character based on the turnaround sheets, so the front/side/back silhouette stays stable across shots.

One habit that changed everything for me is thinking in first frame / last frame pairs. If I want clean control over motion and story direction, I'll often generate the "end state" first, then build the "start state" by editing that same frame—or I'll do it in reverse. For example, here I treated the version with the protagonist as the last frame, then generated an empty plate as the first frame, and asked the video model to bridge them so he "emerges" out of the snow/cloud.

And honestly, Flick.art became the glue for this whole mess. When you're juggling base images, variants, character swaps, and multiple in-betweens, having a clear visual board that keeps everything organized makes the process feel readable again.

Image

This is a good example of an image being "wrong" in content, but strangely right in mood. The very first version was nothing like the shot I had in my head. It was just a simple cotton ground—flat, foggy, quiet—and a ladder stuck into it. No dramatic cloud "ceiling," no clear destination, more like an awkward object in an empty field. Conceptually it missed the point. But the atmosphere was perfect: the soft whiteout, the scale, the silence, that feeling of being alone inside weather.

That's why I kept it. Instead of throwing it away, I treated it as raw material. I flipped the frame in Photoshop to test a new read, and suddenly the same elements started to behave like an ascent, as if the world finally had an up and a down. From there, the work in Nano Banana became a process of careful iteration: lock the framing, refine the ladder, control the fog, protect the negative space, and let the figure's movement stay minimal and believable.

In the end, the shot became my favorite not because it was planned perfectly, but because it was discovered: a wrong image with the right feeling, slowly turned into the moment where he reaches the edge of the clouds. The feeling of the shot is quiet, but tense—like watching someone approach a boundary they're not supposed to cross. The scale is almost absurd: one human, one ladder, and an endless blank.

Image

If I had to sum up The Herder, it's this: I'm someone who thinks in images first. I usually start with a feeling—rain that won't stop, a job you never notice, a person climbing toward a blank white edge—and only later try to wrap a story around it. Writing a perfectly "complete" narrative isn't my strongest skill, but framing and mood are. I know when a shot is honest, when the scale feels right, and when the silence is doing the work. The workflow wasn't one big magic button—it was a lot of small decisions.