The spec for X (Twitter)
If your goal is a short-form video that earns attention in the X feed, build to these constraints and norms:
- Aspect ratio - Vertical 9:16 performs best in mobile feed. Square 1:1 is viable for cross-posts. Horizontal 16:9 is fine for demos but reduces vertical real estate.
- Duration - The universal cap is 2 minutes 20 seconds, though Premium tiers allow longer uploads. For short-form, target 20 to 45 seconds. Keep trim versions under 15 seconds for replies and retweets.
- Autoplay behavior - Videos autoplay muted as users scroll. Assume sound-off by default. Hook visually in the first 2 seconds and always include captions.
- Captions - Burned-in captions are the safest since many viewers never tap sound. Include a separate SRT if available for accessibility.
- Encoding - MP4 (H.264) with AAC audio. 1080x1920 for 9:16, 1080x1080 for square. 30 fps for talking heads, up to 60 fps for fast motion. Target 8 to 12 Mbps for 1080p vertical to avoid muddy text.
- File weight - Keep under 500 MB for speedy upload and cross-posting. Smaller is better when quality holds.
The structure that works
Short-form on X wins by front loading value, using visual pattern interrupts, and respecting skim-speed attention. Here is a beat map that consistently performs for 20 to 45 second videos:
0:00 - 0:02 - Pattern interrupt
- A quick motion frame, a bold on-screen claim, or a surprising B-roll cut.
- Visually show the problem before saying a word - for example, a laggy UI micro clip if you are selling speed.
0:02 - 0:06 - Hook
- One sentence promise that is ultra-specific. Put the most important words on screen. Do not waste syllables on intros.
- Example: Shave 120 ms off your API calls with one header.
0:06 - 0:12 - Context + payoff preview
- Show the before and after in 2 to 3 cuts or a side-by-side.
- On-screen text: the outcome in numbers, not adjectives.
0:12 - 0:30 - Steps or proof
- Three beats maximum. Each beat gets a single line caption and a supporting visual. Cut relentlessly every 2 to 3 seconds.
- Examples:
- Step 1 - Turn on Brotli, show the config snippet.
- Step 2 - Add the header, show the before and after waterfall.
- Step 3 - Result, show the metric delta.
0:30 - 0:38 - CTA that matches X behaviors
- Make the ask fit the platform: Reply "config" for the snippet, link in bio for the full post, or follow for part 2.
- Keep the CTA on screen for at least 2 seconds. Do not stack more than one CTA.
If you need a longer cut for premium accounts, extend the proof block to 60-75 seconds, but preserve the first 12 seconds as-is. The open must still deliver the core value quickly.
Hooks that earn attention
Hooks on X pull in with specificity and tension. Use formulas that telegraph a result fast and make the scroller curious enough to watch 6 more seconds.
Problem-to-payoff
- Formula: Struggling with X? Here is the 10-second fix that does Y.
- Examples:
- Paying for cold starts? Here is the 10-second fix that cuts p95 by 28%.
- PRs keep stalling? One label gets reviews in under 2 hours.
Data-first reveal
- Formula: [Metric] changed by [number]. Here is the exact change.
- Examples:
- Signup conversion jumped 14.7%. Here is the one line we moved.
- p50 TTFB dropped 120 ms. Here is the header you are missing.
Myth-buster
- Formula: You do not need [common practice]. Do this instead.
- Examples:
- You do not need yet another cache layer. Set s-maxage correctly.
- Stop A/B testing button colors. Test copy that names the value.
Countdown micro-guide
- Formula: 3 steps to [result] in 30 seconds.
- Examples:
- 3 steps to faster Lighthouse scores in 30 seconds.
- 3 checks to cut render blocking by half.
Show the punchline first
- Formula: [Outcome], rewind to see how.
- Examples:
- CR at 4.1%, rewind to see the microcopy change.
- Cold emails booking at 8%, here is the first sentence.
Brand + voice
A single viral clip rarely builds trust. A recognizable brand system does. Consistent color, motion, type, lower thirds, and tone make your videos self-identifying within the first frames. That repetition compounds recall and lifts conversion over a campaign, not just a post.
With HyperVids' per-project brand kit, you define the reusable pieces once, then every cut stays on-brand without manual fiddling. Set your primary and secondary colors, upload your type stack, choose lower third styles, define transition rules, and lock your CTA format and end slate. The result is consistent and fast to iterate.
- Visual consistency - Prebuilt styles apply across hooks, steps, and CTAs. Your logo, safety margins, and color contrast are enforced.
- Voice consistency - Define tone descriptors like direct, technical, and concise. The script stays tight and aligned with your brand lexicon.
- Motion rules - Establish cut cadence, easing, and intro length. Your videos feel like a series instead of one-offs.
The workflow is optimized for developers. You can keep prompts short, lean on reusable brand context, and let the compositor assemble the layers. The /hyperframes skill coordinates framing, b-roll inserts, and on-screen text timing using your existing Claude CLI subscription so you do not spend hours in a timeline editor.
Captions + accessibility
X is a sound-off-first environment, so captions are not optional. Besides reach, accessible captions serve viewers who are deaf or hard of hearing, and they help second language audiences follow along quickly.
- Always-on captions - Burn in open captions so value is clear even muted. If the upload workflow allows, include an SRT for screen reader support.
- Readability rules
- Max 2 lines, 28 to 34 characters per line for mobile legibility.
- Keep 1 to 1.2 seconds minimum on screen per line, target 160 to 180 words per minute reading speed.
- Use high-contrast styling. White text with 2 to 3 px black outline or a 60% black box behind text meets 4.5:1 contrast ratio in most scenes.
- Safe margins: at least 5% from edges so UI chrome and player controls do not overlap.
- Avoid all caps for long sentences. Use sentence case with bold for emphasis only.
- Placement - Keep captions low but above the very bottom 12% to avoid being covered by UI. Nudge up when lower thirds appear.
- Speaker IDs - Only if multiple voices are present. Otherwise keep it clean.
- Color use - Reserve color in captions for CLI commands, code literals, or metrics. Do not rainbow-strobe your text.
A sample HyperVids prompt
Here is a realistic one-liner that fits X, assuming your brand kit already defines color, fonts, lower thirds, and transitions:
One-line prompt: "In 35 seconds, show how to cut p95 API latency by 120 ms using Brotli and the Accept-Encoding header, with a before and after waterfall, 3 quick steps, and a single CTA to reply 'config' for the snippet."
What happens next:
- The brand kit auto-applies your opener, caption style, and end slate. No manual keyframing.
- The /hyperframes skill proposes the beat map, drops the pattern interrupt clip up front, and times captions to 160 to 170 words per minute.
- Claude CLI supplies the tight script, code-safe caption wording, and concise hook text that fits 28 to 34 characters per line.
- You get a 9:16 master at 1080x1920 with a 30 fps talking-head layer, overlay screenshots for the before and after waterfall, and baked-in captions. A square 1:1 version is rendered for replies.
Common failure modes
Here is what reliably sinks short-form videos on X and how to avoid it:
- Soft open - Starting with your logo, a long intro, or a slow pan kills retention. Start with movement or the core claim in the first 2 seconds.
- Audio-dependent storytelling - If captions are not sufficient to convey the value, many viewers will never get it. Assume muted playback.
- Wall-of-text overlays - Long paragraphs on screen lead to skip. Force line length discipline and two-line maximum captions.
- Wrong aspect ratio - Posting 16:9 only in a vertical feed wastes pixels. Always render a 9:16 cut.
- Slow pacing - Shots that linger more than 3 seconds in a short clip feel sluggish. Cut faster, use punch-ins on key details.
- Overstuffed CTAs - Asking for a like, follow, click, and signup in one breath produces no action. Pick one.
- Illegible UI captures - Tiny code or dashboards shrink badly on mobile. Crop in, highlight the changed line, and use callouts.
- Low bitrate exports - Blurry type erodes trust. Export at 8 to 12 Mbps for 1080p vertical and avoid heavy compression on upload.
- Off-brand visuals - Inconsistent fonts and colors make your series forgettable. Lock styles once, reuse them across episodes.
- Hashtag spam - Let the content carry discovery. One or two precise tags are plenty if you use any at all.
Conclusion
Short-form videos that work on X respect the feed, get to value in seconds, and make the message clear without sound. Anchor on vertical 9:16, keep the open crisp, and use captions that anyone can read at a glance. A consistent brand system multiplies the effect over time, because viewers learn to recognize you before they even read the handle. Production does not need to be heavy if you set the structure, captions, and brand rules once, then iterate weekly on topics and hooks.
FAQ
What is the ideal length for X?
20 to 45 seconds. You have room for 2 minutes 20 seconds in most accounts, but attention patterns reward brevity. Keep the open under 6 seconds, body under 24 seconds, and a 2 to 4 second CTA.
Should I post square or vertical?
Vertical 9:16 first. Square 1:1 is a good secondary export for replies and cross-posts, but the main feed is vertically oriented on mobile, which is where most views happen.
Do I need captions if I have a strong voiceover?
Yes. X autoplays muted. Burned-in captions increase comprehension and watch time, and they make your content accessible. Keep lines short, high contrast, and time them to the voiceover for a polished result.