What these tools actually do

HeyGen is an AI video generation platform. You provide a script and a digital avatar (either a stock avatar or a custom one trained on your likeness), and it generates a video of the avatar delivering the script. Output: MP4, ready to upload natively to LinkedIn, YouTube, or wherever.

ElevenLabs is an AI voice synthesis and cloning platform. It can generate hyper-realistic speech from text, and it can train a voice model on a few minutes of your own speech so generated audio sounds like you.

Used together, they let you produce a short explanatory video of yourself delivering a script in under an hour — at a fraction of the cost and time of filming, lighting, and editing real footage.

These tools don't replace a good on-camera presence or a genuine personal brand. They lower the production barrier enough that "I don't have time to film" is no longer a valid excuse for not producing video content.

HeyGen — setup and workflow

There are two ways to use HeyGen: with a stock avatar (quick, no setup, less personal) or with a custom avatar trained on your likeness (more setup, more authentic, significantly better results for personal brand content).

Setting up a custom avatar

The requirements: 2–5 minutes of clean video footage of you talking directly to camera. Flat background, natural light (or a ring light), no background noise. You don't need to say anything specific — just speak naturally in varied sentences, including some with natural pauses, pitch changes, and emphasis.

Upload the footage to HeyGen → request avatar training (takes a few hours to process) → receive a notification when the avatar is ready. The first render is typically 80–90% convincing. The main tells at first: unusual mouth movements on certain phonemes and occasional blink timing that's slightly off.

The production workflow

Total time per 60-second video with this workflow: 45–60 minutes, including script writing.

ElevenLabs — voice cloning

ElevenLabs produces more natural voice output than HeyGen's built-in voices — particularly for conversational content and longer-form narration. I use it in two ways:

As a voice layer for HeyGen

Generate the audio in ElevenLabs → download the MP3 → import into HeyGen as a custom voice track instead of using the platform's synthesis. This gives more natural delivery at the cost of an extra production step.
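The generate-and-download step above is described as a manual one. If you would rather script it for a batch, here is a minimal sketch using only the Python standard library. It assumes ElevenLabs' REST text-to-speech endpoint (`POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` with an `xi-api-key` header) returns MP3 bytes; the `model_id` default is an assumption, and `build_tts_request` / `save_tts_mp3` are hypothetical helper names. Check the current ElevenLabs API reference before relying on any of it.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a POST request asking for MP3 audio of `text`.

    `voice_id` is the ID of your cloned voice; `model_id` is an assumed default.
    """
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=body,
        headers={
            "xi-api-key": api_key,              # account API key
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",             # ask for MP3 bytes back
        },
        method="POST",
    )


def save_tts_mp3(req: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Building the request separately from sending it keeps the network call in one place, so you can inspect or log what will be sent. A call such as `save_tts_mp3(build_tts_request(script, voice_id, api_key), "video_01.mp3")` would produce the file you then import into HeyGen.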

For audio-only content

Short audio clips for social posts, podcast-style narration for blog content, or voiceover for screen recordings. ElevenLabs produces natural-sounding narration from text that can replace or supplement real recordings for most non-personal content.

Training the voice clone

Upload 3–5 minutes of clean speech audio: varied sentence structures, natural delivery, no background noise. Within a day or two of training, the resulting model handles new scripts with accurate inflection. Edge cases where it still struggles: unusual proper nouns, very long compound sentences, and acronyms it hasn't heard pronounced.
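One generic workaround for the proper-noun and acronym edge cases (not something described above, and not an ElevenLabs feature) is to rewrite the script text before synthesis: respell terms the clone mispronounces, and spell out remaining all-caps acronyms letter by letter. A minimal sketch; the `RESPELLINGS` entries and the `prepare_for_tts` name are hypothetical examples you would replace with your own list.

```python
import re

# Hypothetical respelling dictionary: terms your clone mispronounces, mapped to
# spellings that read the way you actually say them. Maintain this per voice.
RESPELLINGS = {
    "B2B": "B to B",
    "SaaS": "sass",
}

# All-caps tokens of 2 to 5 letters (e.g. "ROI", "CRM") are treated as acronyms.
ACRONYM = re.compile(r"\b[A-Z]{2,5}\b")


def prepare_for_tts(script: str, respellings: dict[str, str] = RESPELLINGS) -> str:
    """Rewrite a script so a TTS voice is more likely to pronounce it as intended."""
    # 1. Apply explicit respellings first, matching whole words only.
    for term, spoken in respellings.items():
        script = re.sub(rf"\b{re.escape(term)}\b", lambda _m: spoken, script)
    # 2. Spell out any remaining acronyms letter by letter: "ROI" -> "R O I".
    return ACRONYM.sub(lambda m: " ".join(m.group(0)), script)


print(prepare_for_tts("Our B2B SaaS clients track ROI in a CRM."))
```

Run on the sample sentence, this prints `Our B to B sass clients track R O I in a C R M.` Acronyms that are spoken as words need their own lowercase respelling entry, otherwise step 2 will spell them out.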

The quality ceiling (and when it matters)

AI-generated video is visibly AI-generated if you look closely — particularly in close-up shots, emotional content, and anything that requires natural micro-expressions. The technology is good enough to be convincing in most B2B contexts; it is not good enough to be indistinguishable from filmed footage.

When the quality ceiling matters:

When the quality ceiling doesn't matter:

Best B2B use cases

The formats where AI video has the highest ROI in B2B marketing:

Common mistakes

My production workflow

Here is the exact workflow I use to produce a batch of 4 videos in a single session (roughly 3 hours total):

  1. Write all 4 scripts in one sitting. 100–150 words each. Store in Notion.
  2. Paste all scripts into HeyGen at once. Generate all 4 simultaneously (HeyGen processes in the background).
  3. While generating: build a Canva thumbnail for each video from a template (about 5 minutes each).
  4. Download all 4 MP4s when ready.
  5. Import into Descript in batch — auto-generate captions for all, clean up any errors, export.
  6. Schedule all 4 in Typefully with the thumbnails and LinkedIn post copy.
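Step 1's 100–150-word target is easy to check before spending render time on the batch. A minimal sketch, assuming a delivery pace of about 150 words per minute (an assumption, not a HeyGen figure; time one of your own renders and adjust `wpm`). `check_batch`, `ScriptCheck`, and `print_report` are hypothetical names, and the scripts are passed in as a plain dict rather than pulled from Notion.

```python
from dataclasses import dataclass

MIN_WORDS, MAX_WORDS = 100, 150   # target range from step 1
WORDS_PER_MINUTE = 150            # assumed pace; measure your own avatar/voice


@dataclass
class ScriptCheck:
    name: str
    words: int
    est_seconds: float
    in_range: bool


def check_batch(scripts: dict[str, str], wpm: int = WORDS_PER_MINUTE) -> list[ScriptCheck]:
    """Count words in each script and estimate its spoken length before rendering."""
    results = []
    for name, text in scripts.items():
        words = len(text.split())
        results.append(ScriptCheck(
            name=name,
            words=words,
            est_seconds=round(words * 60 / wpm, 1),   # words / (words per second)
            in_range=MIN_WORDS <= words <= MAX_WORDS,
        ))
    return results


def print_report(results: list[ScriptCheck]) -> None:
    """Print one line per script, flagging any outside the word-count range."""
    for r in results:
        flag = "ok" if r.in_range else "REVISE"
        print(f"{r.name:<12} {r.words:>4} words  ~{r.est_seconds:>5.1f}s  {flag}")
```

Running `print_report(check_batch(scripts))` on the four scripts before pasting them into HeyGen flags any that are too short or too long, so you fix them in the writing pass rather than after a render.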

Batch production cuts the per-video overhead significantly. Context switching is the real time cost — batching eliminates it.