Marking the first anniversary of the Chinese video generation tool Kling AI, its parent company, Kuaishou, has launched its most advanced model yet: Kling 2.1. After the success of Kling 1.6 and 2.0, users and creators have been waiting for the release of Kling AI’s next big thing, and it’s finally here. With advanced video generation capabilities and better coherence and rendering, Kling 2.1 stands as a formidable contender in the AI video generation space against proprietary models such as Google’s Veo 3 and OpenAI’s Sora. In this article, we’ll explore the features and video generation capabilities of Kling 2.1 and see how well it performs against Veo 3.
What Is Kling 2.1?
Kling 2.1 is an advanced AI-powered video generation model developed by Kuaishou. It transforms reference images and text prompts into high-definition, cinematic videos, leveraging sophisticated technologies like 3D spatiotemporal attention mechanisms and diffusion transformer architectures. Designed to simulate real-world physics and complex motion dynamics, Kling 2.1 aims to deliver videos that are both visually stunning and contextually coherent. Building upon its predecessor, Kling 2.0, this latest iteration introduces enhancements that cater to both beginners and seasoned professionals.
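To give a rough feel for what "3D spatiotemporal attention" means in general, here is a toy sketch in plain Python. It is not Kling's architecture, which is not public in this article. The function name `joint_spacetime_attention` is hypothetical, and the sketch skips the learned query/key/value projections a real model would use. The only idea it illustrates is that every patch in every frame attends to every other patch across both space and time in a single pass.

```python
import math
from typing import List

Vector = List[float]
Video = List[List[List[Vector]]]  # indexed as video[t][y][x] -> feature vector


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_spacetime_attention(video: Video) -> Video:
    """Toy self-attention over all (time, height, width) positions at once.

    Simplifying assumption: queries, keys and values are the raw features
    themselves (no learned projections, single head).
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    # Flatten the T x H x W grid into one sequence of tokens.
    tokens = [vec for frame in video for row in frame for vec in row]
    dim = len(tokens[0])
    scale = math.sqrt(dim)

    mixed: List[Vector] = []
    for query in tokens:
        # Scaled dot-product score against every token in every frame.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted average of all tokens.
        mixed.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )

    # Restore the original T x H x W layout.
    flat = iter(mixed)
    return [
        [[next(flat) for _ in range(width)] for _ in range(height)]
        for _ in range(frames)
    ]
```

Because the tokens from all frames sit in one sequence, information about an object in one frame can influence how the same region is represented in every other frame, which is the intuition behind using such a mechanism for temporally coherent motion.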
Features of Kling 2.1
Here are some of the key features of Kling 2.1:
- Frame-based Video Generation: Unlike most video generation models, which focus on text-to-video generation, Kling 2.1 generates videos based on input images used as reference frames.
- Realistic Motion and Physics Simulation: Using a 3D spatiotemporal joint attention mechanism, Kling 2.1 accurately models complex movements, ensuring that generated videos adhere to the laws of physics and exhibit natural motion.
- Dynamic Facial Expressions: The model excels at generating life-like facial expressions and accurate movements, enhancing the realism of characters and making them more engaging.
- Multiple Video Options: Kling 2.1 can create multiple videos from the same prompt, giving users more freedom and choice without the need for multiple iterations.
- AI-powered Prompting: For those who find it difficult to write detailed and accurate prompts for video generation, the model offers a DeepSeek-powered AI tool for generating prompts.
How to Access Kling 2.1
Kling 2.1 and its Master version are both available on the Kling AI website and app. Users around the world can sign up with just an email ID and try the models directly, using the free credits given at sign-up. Note that, as of now, these models can only be used for image-to-video generation.
How to Use Kling 2.1
Here’s how you can generate videos from images using Kling 2.1 and Kling 2.1 Master:
- Select the Model on Kling AI
Once you open the website, select Kling 2.1 (or Kling 2.1 Master) from the model selection drop-down menu at the top.
- Add Reference Images
Under the image-to-video tab, select ‘Frames’ and add a reference image to be used as the start frame or end frame of the generated video. Please note that the Elements feature is currently not supported by Kling 2.1.
- Add a Prompt
You have the option of adding a prompt to describe the video, or a negative prompt explaining what you don’t want in the video. You can even use DeepSeek to generate detailed prompts based on your description, theme, or idea.
- Configure the Settings
Once you have the reference image and the (optional) prompts in place, choose whether you want a standard or professional (for VIP users) video. Then decide on the length of the video (5 or 10 seconds) and the number of outputs you wish to generate (up to 4). Please note that only VIP users have the option of generating multiple videos from a single image/prompt.
- Generate the Video
Now that you’re all set, simply click on ‘Generate’ and wait in line for the model to generate your video. In the free version, this might take up to 120 minutes.
- Generate Sound (optional)
Once the video is generated, Kling gives you the option of adding sound to it using its sound generation tool. You can add your prompt here and generate 4 different sounds and dialogues to match the scene. However, please note that the tool only generates audio in Chinese for now and doesn’t automatically lip sync with the video.
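The limits mentioned in the steps above can be summarised as a small checklist in code. The sketch below is purely illustrative: Kling 2.1 is used here through its web interface, and every name in the snippet (`GenerationSettings`, `validate`, the field names) is hypothetical rather than part of any Kling API. It only encodes the constraints stated in this walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits taken from the steps above; all identifiers are hypothetical.
ALLOWED_MODELS = ("Kling 2.1", "Kling 2.1 Master")
ALLOWED_MODES = ("standard", "professional")
ALLOWED_DURATIONS_S = (5, 10)
MAX_OUTPUTS = 4


@dataclass
class GenerationSettings:
    model: str
    reference_image: str                   # used as the start or end frame
    prompt: Optional[str] = None           # optional description of the video
    negative_prompt: Optional[str] = None  # optional list of things to avoid
    mode: str = "standard"
    duration_s: int = 5
    num_outputs: int = 1
    is_vip: bool = False


def validate(settings: GenerationSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look fine."""
    problems: List[str] = []
    if settings.model not in ALLOWED_MODELS:
        problems.append(f"unknown model: {settings.model!r}")
    if not settings.reference_image:
        problems.append("a reference image is required")
    if settings.mode not in ALLOWED_MODES:
        problems.append(f"unknown mode: {settings.mode!r}")
    elif settings.mode == "professional" and not settings.is_vip:
        problems.append("professional mode is for VIP users")
    if settings.duration_s not in ALLOWED_DURATIONS_S:
        problems.append("duration must be 5 or 10 seconds")
    if not 1 <= settings.num_outputs <= MAX_OUTPUTS:
        problems.append("number of outputs must be between 1 and 4")
    elif settings.num_outputs > 1 and not settings.is_vip:
        problems.append("multiple outputs are for VIP users")
    return problems


if __name__ == "__main__":
    free_user = GenerationSettings(
        model="Kling 2.1", reference_image="frame.png", num_outputs=4
    )
    print(validate(free_user))
```

Running a quick check like this before queueing a job is mostly useful on the free tier, where a single generation can take a long time and a rejected setting is costly to discover late.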
Video Generation Capabilities of Kling 2.1
Users have taken to social media to praise Kling 2.1’s ability to produce videos with realistic motion and expressive characters. Let’s look at a few videos generated by Kling 2.1 from different image prompts to see how good this tool really is.
1. Hyper-realistic Human Video
Input Image:

Prompt: “A woman is dancing to fast-paced music.”
Output:
Source: Kling AI Library
2. Animated Gaming Video
Input Image:

Description: “car in the city racing, 4K ultra realistic high-octane chase. Smooth movement, photorealistic, high quality.”
DeepSeek-generated Prompt: “A sleek hover-car weaving between towering holographic billboards, blue plasma thrusters igniting, cityscape reflecting off its chrome body, 4K ultra realistic, dynamic motion”
Output:
Source: Kling AI Library
3. Dynamic Action Video
Input Image:

Prompt: “Cinematic action shot in the style of an action movie with a drone racing through a forest woodland at noon, navigating between trees. Sunlight streaking through leaves, close front follow angle, dynamic movement, high contrast, intense atmosphere, detailed composition.”
Negative Prompt: “morphing, erratic fluctuation in motion, noisy, bad quality, distorted, poorly drawn, blurry, grainy, low resolution, oversaturated, lack of detail, inconsistent lighting. Wrong anatomy, unnatural facial expressions, unnatural movements, blur, warp, distortion, disfigurement, pixelation, noisy, grainy, overly vibrant colors, harsh shadows, oversaturated colors, erratic fluctuation, artefacts, glitch, low quality, bad face, transition, morphing, titles, texts, logos, Cartoonish features.”
Output:
Source: Kling AI Library
Kling 2.1 vs Veo 3 vs Sora: Features Comparison
Speaking of advanced video generation, we must find out how good this free tool is compared to proprietary models like Google’s Veo 3 and OpenAI’s Sora. Here’s a comparison of the features of all three video generation models.
| Feature | Kling 2.1 | Veo 3 | Sora |
| --- | --- | --- | --- |
| Max Video Length | 3 minutes | 1 minute | 1 minute |
| Resolution | 1080p | 1080p | 1080p |
| Lip-Sync Capability | No | Yes | No |
| Physics Simulation | Yes | Yes | No |
| Aspect Ratio Flexibility | Low | Moderate | Low |
| Editing Tools | Basic | Basic | Basic |
| Access Availability | Global (Beta) | Limited (US only) | Limited |
Kling 2.1 vs Veo 3: Performance Comparison
Now, let’s compare the performance of the two models we currently have access to: Kling 2.1 and Veo 3.
Here’s a video I found online, which was generated using Veo 3.
I’ll use a screenshot of this video as the first-frame reference image, add a prompt describing the scene, and see what Kling 2.1 does with it.
Input Image:

Prompt: “An American man wearing a blue t-shirt is at the boarding counter at the airport with his pet penguin. The airline staff, a woman dressed in blue, doesn’t let him take the penguin on board. He’s frustrated as she tries to explain the situation to him.”
Video Generated by Kling 2.1
Now let’s use Kling 2.1 to add audio to the generated video.
Comparative Analysis
Veo 3 generated a very realistic video with great detailing, appropriate expressions, and very well lip-synced audio. Even the flow of the action and the clarity and tone of the dialogue were top-notch. On the whole, this is one of the best AI tools I’ve ever come across for video generation.
Kling 2.1 is exceptionally good at recreating videos from reference frames, as seen above. It generated quite realistic people and animals with accurate expressions and details. As a free tool, it does a better job than most others. However, when it comes to generating audio and syncing it, Kling 2.1 is rather disappointing. Be it the tone or the timing, the audio simply doesn’t align with the video. That’s something I think the tool still needs to work on.
Conclusion
Kling 2.1 proves to be a promising model in the AI-powered video generation landscape. Its easy-to-use interface, its ability to create coherent videos, and its option to add audio make it one of the best free-to-use AI video generators out there. Its capabilities in realistic motion simulation, facial expression rendering, and creative artistry take it a step ahead of most of its contemporaries. That being said, the model still has room for improvement when it comes to generating audio and accurate lip syncing. So, here’s looking forward to Kling AI’s next version, which will probably fix these issues as well.