Music has always been one of the most human of arts: a delicate dance of melody, harmony, rhythm, and emotion that seemed impossible for machines to truly understand. Yet AI music generation began quietly decades ago. In 1957, Lejaren Hiller and Leonard Isaacson programmed the ILLIAC I computer at the University of Illinois to compose the Illiac Suite, a string quartet generally regarded as the first score composed by an electronic computer. That same year, Max Mathews at Bell Labs wrote MUSIC, an early program for digital sound synthesis, and computer-generated music kept evolving through the 1960s and 1970s.
The journey from still images to moving pictures has always fascinated humanity. For decades, creating video meant expensive cameras, complex editing software, and hours of manual work. Then AI began to change that. In 2022, a wave of widely available text-to-image generators (DALL-E 2, Midjourney, Stable Diffusion) proved that machines could create striking visuals from a text prompt. The natural next question was: what about video?
🥇 llava — 13.9M pulls — 👁️ Best vision pioneer with video support
The OG multimodal model on Ollama. LLaVA (Large Language and Vision Assistant) combines a vision encoder with Vicuna for general-purpose visual understanding. Now at version 1.6, it is available in 7B, 13B, and 34B sizes. LLaVA takes still images as input and is not explicitly designed for video, but you can extract frames from a video and feed them to it one at a time for frame-by-frame analysis.
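To make the frame-by-frame idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes an Ollama server is running locally on its default port (11434) with the llava model already pulled, and that OpenCV (the opencv-python package) is installed for frame extraction. The helper names (sample_frame_indices, build_payload, describe_frame, describe_video) and the two-second sampling interval are illustrative choices, not anything defined by Ollama or LLaVA.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float = 2.0) -> list[int]:
    """Pick one frame index for every `every_seconds` of video."""
    if total_frames <= 0 or fps <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def build_payload(jpeg_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for one frame; the image is sent base64-encoded."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }


def describe_frame(jpeg_bytes: bytes, prompt: str) -> str:
    """Send one frame to the local Ollama server and return the model's text."""
    body = json.dumps(build_payload(jpeg_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def describe_video(path: str, prompt: str = "Describe this frame.", every_seconds: float = 2.0):
    """Yield (timestamp_in_seconds, description) for each sampled frame."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    capture = cv2.VideoCapture(path)
    try:
        fps = capture.get(cv2.CAP_PROP_FPS)
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        for index in sample_frame_indices(total, fps, every_seconds):
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)
            ok, frame = capture.read()
            if not ok:
                break
            ok, encoded = cv2.imencode(".jpg", frame)
            if ok:
                yield index / fps, describe_frame(encoded.tobytes(), prompt)
    finally:
        capture.release()
```

With the server running, something like `for t, text in describe_video("clip.mp4"): print(f"{t:.1f}s: {text}")` would print one description per sampled frame (clip.mp4 is a placeholder path). Each frame is analyzed on its own, so the model has no memory of earlier frames; sampling more densely gives finer coverage at the cost of more requests.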