


The AI Avatar Video Workflow Mini-Guide

From script to clone to published video, without sounding like a robot.

AI avatar video sounds like a shortcut. It is, but only if you set it up well. Most people skip the writing layer, rush the clone, and end up with content that feels cold and obviously synthetic. This guide walks you through the exact workflow that produces videos that feel human, go out on a consistent schedule, and scale without you recording every time.

Section 1

Pick Your Platform Before You Build Anything

1.1

Choose your tool based on what you need

The three tools most businesses land on are HeyGen, Synthesia, and D-ID. HeyGen is the strongest for realistic motion and lip sync quality. Synthesia is the easiest to use for teams who need volume without technical setup. D-ID is the cheapest entry point for solo founders testing the format. Do not mix platforms mid-workflow. Pick one, build your avatar there, and stay inside that ecosystem until you have a clear reason to switch. The mistake most people make is choosing based on demo videos on the vendor's website. Those are best-case examples with professional lighting, proper source footage, and careful script pacing. Your clone will only be as good as what you feed in. The platform matters less than your preparation.

Section 2

Film Your Clone Footage Once, Get It Right

2.1

The source footage determines everything downstream

You need at least two minutes of continuous talking footage for a functional clone, though five to ten minutes produces noticeably better results. Record in a quiet room with consistent natural light on your face, no strong shadows, and no backlight. A plain or softly blurred background works best.

Speak naturally at your real pace, not slower or more enunciated than usual. The model learns your patterns, so performing differently in the clone session than you do in real life creates an uncanny result.

Wear what you would normally wear on video. Avoid busy patterns, small stripes, or anything that creates visual noise. You will likely reuse this clone for months, so choose an outfit that feels neutral and professional for your positioning.

Record multiple takes of the same content in slightly different emotional registers: calm, warmer, and slightly more animated. Some platforms let you blend tone across takes, which helps vary the output later.

Section 3

Write Scripts That Sound Like You Talking, Not You Writing

3.1

The script is where most avatar videos fall apart

Avatar lip sync is only as convincing as the script is natural. If you write the way you write emails or proposals, the avatar will sound stiff. Script for speech. Use short sentences. Cut subordinate clauses. Replace every 'however' with 'but' and every 'in order to' with 'to'.

Read the script aloud before you submit it. If you stumble on a phrase, rewrite it. If a sentence takes more than one breath to say, split it.

Aim for 120 to 150 words per minute for a conversational pace. Written content is usually far denser than that. A one-minute video needs around 130 words, which feels extremely short on the page but lands naturally on screen.

Write the hook first: the opening two sentences must give the viewer a reason to stay. Do not start with your name or your business. Start with the problem they are trying to solve.
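If you write scripts in bulk, a small script can do a first pass of this read-aloud check for you. The sketch below is a minimal Python example, not a feature of any avatar platform, and the function names are illustrative. It estimates spoken length at 130 words per minute, the midpoint of the range above, and flags sentences longer than 20 words. That 20-word limit is an assumption, a rough stand-in for "more than one breath", so tune it to your own delivery. It also flags the two written-style phrases named above. Reading the script aloud is still the real test.

```python
import re
from typing import List

# Assumption: ~130 words per minute, the midpoint of a conversational pace.
WORDS_PER_MINUTE = 130
# Assumption: 20 words as a rough proxy for "more than one breath".
MAX_WORDS_PER_SENTENCE = 20
# Written-style phrases and their spoken replacements.
SPOKEN_SWAPS = {"however": "but", "in order to": "to"}


def estimated_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration in seconds from the word count."""
    return len(script.split()) / wpm * 60


def review_script(script: str) -> List[str]:
    """Return notes on sentences that may sound stiff when read aloud."""
    notes = []
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    for i, sentence in enumerate(sentences, start=1):
        n_words = len(sentence.split())
        if n_words > MAX_WORDS_PER_SENTENCE:
            notes.append(f"Sentence {i} has {n_words} words: consider splitting it.")
        lowered = sentence.lower()
        for written, spoken in SPOKEN_SWAPS.items():
            # Whole-word match, so "however" inside another word is ignored.
            if re.search(rf"\b{re.escape(written)}\b", lowered):
                notes.append(f"Sentence {i}: replace '{written}' with '{spoken}'.")
    return notes


if __name__ == "__main__":
    draft = "However, most teams skip this step. We built a checklist in order to fix it."
    print(f"About {estimated_seconds(draft):.0f} seconds")  # About 7 seconds
    for note in review_script(draft):
        print(note)
```

The word-count estimate is deliberately crude. Numbers, acronyms, and pauses you add for pacing all change real delivery time, so treat it as a sanity check on length, not a timing tool.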

3.2

Use a repeatable script structure for every video

A four-part structure works for most short marketing videos. Open with the problem the viewer is experiencing. Name why it keeps happening. Deliver the shift: the specific insight or action that changes the outcome. Close with a single call to action. That is it. Every deviation from this structure adds length without adding value. For a 60 to 90 second video, you have room for one problem, one insight, and one action. For a two to three minute video, you can add a brief context section between the problem and the shift. Do not add more than that. Avatar video fatigue hits faster than real camera video because the subtle absence of full physical presence means attention drops sooner. Keep it tight.

Section 4

Add a Human Layer Before You Export

4.1

The human layer is what separates good avatar content from synthetic content

After you generate the raw video, watch it once at full speed before you do anything else. You are checking for two things: unnatural pauses between phrases, and words where the lip sync drifts. Most platforms let you adjust pacing in the script itself, by adding commas or ellipses to slow delivery, or by removing pauses in the editor. Fix those two things first.

Then add b-roll. A talking head, even a good avatar, benefits from cutaways to screen recordings, product screenshots, data visualisations, or relevant footage. The b-roll does not need to be elaborate. It needs to appear at the moments where you are making a specific point. When the avatar says 'here is what that looks like in practice', cut to what it looks like in practice. The viewer's brain registers the combination as more credible than either element alone.

4.2

Captions are non-negotiable

Most short video on LinkedIn, Instagram, and Facebook is watched with the sound off. Without captions, you are publishing silent content that the viewer has to work to understand. Add captions as a baseline. Use a tool like Captions, Descript, or the native captioning inside CapCut. Clean up any transcription errors before you publish, particularly on technical terms and your name. Do not use auto-generated captions and leave them unchecked. AI transcription typically achieves around 90 to 95 percent accuracy, which at a speaking pace of around 130 words per minute means roughly six to thirteen wrong words in every minute of content. Those errors break credibility, especially on specialised content. Read every caption against the audio before the video goes anywhere.
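The error-rate arithmetic is worth checking for yourself. Assuming accuracy is measured per word, the expected number of wrong words per minute is (1 - accuracy) x words per minute. A short Python sketch, using the 130 words per minute pace from Section 3; `expected_caption_errors` is an illustrative name, not part of any captioning tool:

```python
def expected_caption_errors(accuracy: float, words_per_minute: int = 130) -> float:
    """Expected mis-transcribed words per minute of speech.

    accuracy is word-level accuracy as a fraction, e.g. 0.95 for 95 percent.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * words_per_minute


if __name__ == "__main__":
    for accuracy in (0.90, 0.95):
        errors = expected_caption_errors(accuracy)
        print(f"{accuracy:.0%} accurate: about {errors:.1f} wrong words per minute")
    # 90% accurate: about 13.0 wrong words per minute
    # 95% accurate: about 6.5 wrong words per minute
```

Even at the top of that range, a 90-second video can carry around ten caption mistakes, which is why every caption gets read against the audio.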

Section 5

Build a Production System, Not a One-Off Process

5.1

Batch your scripts to create volume efficiently

The real leverage in avatar video is batching. Write four scripts in a single sitting. Generate all four videos in one session. Edit captions and add b-roll in one sitting. This workflow produces four videos in roughly the same time it takes to produce one video if you approach each individually. Build a simple script library: a folder of finished scripts categorised by topic and funnel stage. Before you write a new script, check the library. You will often find a script that needs only minor adjustments for a new use case rather than a full rewrite. Over six months, this library becomes a significant asset. You can repurpose scripts for different platforms, different audiences, or as the basis for written content.
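A script library does not need special software. As one possible layout, assumed here rather than prescribed, keep each finished script as a text file under library/<topic>/<funnel stage>/. The hypothetical helpers below save a script into that structure and list what already exists for a topic, so the "check the library first" step takes seconds. Topic and stage names are whatever labels you already use.

```python
from pathlib import Path
from typing import List, Optional


def save_script(library: Path, topic: str, stage: str, name: str, text: str) -> Path:
    """Save a finished script as library/<topic>/<stage>/<name>.txt."""
    folder = library / topic / stage
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.txt"
    path.write_text(text, encoding="utf-8")
    return path


def find_scripts(library: Path, topic: str, stage: Optional[str] = None) -> List[Path]:
    """List existing scripts for a topic, optionally limited to one funnel stage."""
    topic_dir = library / topic
    if not topic_dir.is_dir():
        return []
    # One folder level per stage; match every stage when none is given.
    pattern = f"{stage}/*.txt" if stage else "*/*.txt"
    return sorted(topic_dir.glob(pattern))


if __name__ == "__main__":
    library = Path("script-library")
    save_script(library, "lead-generation", "awareness", "why-leads-stall", "Draft text")
    for path in find_scripts(library, "lead-generation"):
        print(path)
```

Plain files keep the library portable: you can open the same folder in any editor, sync it to shared storage, or paste a script straight into your avatar tool.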

5.2

Use a short quality checklist before every publish

Before any avatar video goes live, run through five checks. First: does the hook land in under five seconds? Second: are all captions accurate with no transcription errors? Third: is there at least one b-roll cutaway? Fourth: does the video end with a single, specific call to action? Fifth: does the audio sound natural at the pacing used? If all five pass, it publishes. If any fail, fix only the failing item. Do not use the checklist as a reason to re-examine choices that already passed. The checklist exists to catch genuine problems, not to invite additional rounds of refinement.
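If you track videos in a spreadsheet or a small internal tool, the five checks translate directly into code. This is a sketch with hypothetical field names. It deliberately reports only the failing checks, matching the rule of fixing only what failed. Judgement calls, such as whether the audio sounds natural or the call to action is specific, stay human inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoCheck:
    hook_seconds: float         # seconds from the start until the hook lands
    captions_verified: bool     # every caption read against the audio
    broll_cutaways: int         # number of b-roll cutaways in the edit
    calls_to_action: int        # number of distinct calls to action
    audio_sounds_natural: bool  # your judgement after one full-speed watch


def failing_checks(video: VideoCheck) -> List[str]:
    """Return only the checks that fail; an empty list means it can publish."""
    checks = {
        "Hook lands in under five seconds": video.hook_seconds < 5,
        "Captions accurate": video.captions_verified,
        "At least one b-roll cutaway": video.broll_cutaways >= 1,
        "Single call to action": video.calls_to_action == 1,
        "Audio sounds natural": video.audio_sounds_natural,
    }
    return [name for name, passed in checks.items() if not passed]


if __name__ == "__main__":
    draft = VideoCheck(hook_seconds=6.0, captions_verified=True,
                       broll_cutaways=0, calls_to_action=1,
                       audio_sounds_natural=True)
    for item in failing_checks(draft):
        print("Fix:", item)
```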

Section 6

Publish Smart Across Platforms

6.1

One video, two formats, every platform

Export every avatar video in 16:9 landscape for LinkedIn and YouTube, and 9:16 portrait for Instagram Reels, TikTok, and Facebook Reels. Most platforms let you reframe from landscape to portrait inside their native tools, but for avatar video specifically, reframe before export so the avatar stays centred in the portrait crop. A talking head that is cut off at the forehead or chin reads as unpolished regardless of how good the underlying content is. You do not need a different script for each format. The same 60 to 90 second script works across both exports. The difference is the crop, and occasionally the addition of a platform-specific caption line. Do not create unique scripts per platform until you have volume and data showing which content type performs on which platform for your specific audience.
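"Centred" has a concrete meaning for a crop. For a full-height 9:16 crop of a landscape frame, the crop width is the frame height x 9/16, and the horizontal offset is half the leftover width. The Python sketch below assumes the avatar sits in the horizontal centre of the landscape frame; `centred_portrait_crop` is a hypothetical helper, and its output maps onto ffmpeg's `crop=w:h:x:y` filter if you ever reframe outside your avatar tool.

```python
from typing import Tuple


def centred_portrait_crop(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (crop_width, crop_height, x, y) for a centred full-height 9:16 crop.

    Assumes the subject is horizontally centred in the landscape frame.
    The crop width is rounded down to an even number, which many encoders require.
    """
    crop_h = height
    crop_w = int(height * 9 / 16) // 2 * 2
    if crop_w > width:
        raise ValueError("frame is too narrow for a full-height 9:16 crop")
    x = (width - crop_w) // 2  # equal margins left and right
    return crop_w, crop_h, x, 0


if __name__ == "__main__":
    w, h, x, y = centred_portrait_crop(1920, 1080)
    print(f"crop={w}:{h}:{x}:{y}")  # crop=606:1080:657:0
```

For a 1920 x 1080 frame this keeps a 606-pixel-wide slice. That is a small part of the image, so if your avatar tool can render the portrait version directly, that is usually preferable to cropping after export.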

6.2

Publish to a schedule, not a mood

Consistency matters more than frequency for avatar video. Two videos per week published on a fixed schedule outperforms five videos in week one and nothing in week two. LinkedIn rewards consistency in its algorithm. Instagram Reels and Facebook both surface content from accounts that publish regularly over accounts that publish in bursts. Choose a cadence you can hold for three months with no support. If that is one video per week, that is your cadence. Use your batched production sessions to stay two to three weeks ahead of the publish schedule. The moment you are producing and publishing on the same day, your quality will drop and your schedule will slip.
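Staying two to three weeks ahead is easier when you know how far ahead you actually are. A minimal sketch, with hypothetical function names: your buffer in weeks is the number of finished but unpublished videos divided by your weekly cadence, and the next batch is whatever tops it back up to your target.

```python
def weeks_of_buffer(videos_ready: int, videos_per_week: int) -> float:
    """Weeks of publishing covered by finished but unpublished videos."""
    if videos_per_week <= 0:
        raise ValueError("videos_per_week must be positive")
    return videos_ready / videos_per_week


def batch_needed(videos_ready: int, videos_per_week: int, target_weeks: int = 2) -> int:
    """Videos to produce now to get back to target_weeks of buffer (0 if already there)."""
    if videos_per_week <= 0:
        raise ValueError("videos_per_week must be positive")
    return max(0, target_weeks * videos_per_week - videos_ready)


if __name__ == "__main__":
    ready, cadence = 1, 2
    print(f"Buffer: {weeks_of_buffer(ready, cadence):.1f} weeks")
    print(f"Next batch: {batch_needed(ready, cadence)} videos")
```

At two videos per week with one finished video in hand, the buffer is half a week and the next batch is three videos, which fits neatly into a four-script batching session.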

Want this built for you?

You do not have to do this yourself.

This resource hands you the volume. The strategy, the judgement, and the bit where it all connects is the work I do for clients: lead generation, ads, SEO, workflow automation, HubSpot, and the systems that make them compound. Done for you, consulting, coaching, or training.


Lilach Bullock has spent 21 years in marketing. Forbes Top 20 (twice), Oracle Social Influencer of Europe, and ranked the number one digital marketing influencer in the UK. She now builds AI-powered marketing systems for entrepreneurs, service businesses, and founders. The Sunday newsletter goes to 15,000 readers at a 70%+ open rate.

lilachbullock.com