Marvin's turning 2.0!
Your favorite AI toolkit just got simpler, more powerful, and fully multi-modal
Today we’re introducing Marvin 2.0, a new, streamlined version of our lightweight AI engineering toolkit. Marvin lets you build natural language interfaces that are reliable, scalable, and easy to trust.
You can upgrade right now:
pip install marvin -U
Marvin consists of a variety of useful tools, all designed to be used independently. Each one represents a common LLM use case, and packages that power into a simple, self-documenting interface. With 2.0, we’ve introduced dedicated functionality for the most frequent tasks and gone multi-modal!
Here’s an overview of all the new functionality:
General
🦾 Write custom AI-powered functions without source code
Text
🏷️ Classify text into categories
🔍 Extract structured entities from text
🪄 Transform text into structured data
✨ Generate synthetic data from a schema
Images
🖼️ Create images from text or functions
📝 Describe images with natural language
🏷️ Classify images into categories
🔍 Extract structured entities from images
🪄 Transform images into structured data
Audio
🎙️ Generate speech from text or functions
Interaction
🤖 Chat with assistants and use custom tools
🧭 Build applications that manage persistent state
Marvin is focused on making cutting-edge LLM technology feel just like any other function. Instead of building heavy pipelines, Marvin lets you add just the right amount of AI Magic™️ to your traditional software. Marvin 2.0 introduces a number of new, dedicated interfaces for the most common LLM use cases.
Classification
To give you a taste of what it’s like to use Marvin, here’s how to perform sentiment analysis by classifying text:
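The snippet itself is not reproduced in this text, so the following is a minimal sketch of what the call could look like. It assumes Marvin 2.0's `marvin.classify(data, labels=...)` signature and a configured OpenAI API key. The label names and the sample sentence are illustrative, and `marvin` is imported inside the function only so the module loads where the package is not installed.

```python
# Labels are an illustrative assumption; any list of strings should work.
SENTIMENT_LABELS = ["positive", "negative"]


def classify_sentiment(text: str) -> str:
    """Return whichever label in SENTIMENT_LABELS the LLM judges the best fit."""
    # Lazy import: keeps this module importable without marvin installed.
    import marvin

    # Assumes marvin.classify(data, labels=...) returns one of the given labels.
    return marvin.classify(text, labels=SENTIMENT_LABELS)


# Example call (needs `pip install marvin` and an OpenAI API key):
# classify_sentiment("Marvin 2.0 is incredible!")
```

In everyday code the import would normally sit at the top of the file, and the call is short enough to use inline.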
Entity Extraction
One of Marvin’s core use cases has always been structuring text into Python types, including Pydantic models. Here, we use the new extract function to automatically pull a list of people out of the text and create a “Person” model for each one:
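As a rough sketch of that pattern, assuming `marvin.extract(data, target=...)` returns a list of `target` instances: the `Person` fields and the sample sentence below are hypothetical, since the original snippet is not included in this text. The model is defined inside the function only so that the lazy imports keep the module loadable without Marvin or Pydantic.

```python
def extract_people(text: str) -> list:
    """Return one Person object for each person mentioned in `text`."""
    # Lazy imports: keep this module importable without marvin or pydantic.
    import marvin
    from pydantic import BaseModel

    class Person(BaseModel):
        # Field names are illustrative assumptions, not taken from the post.
        first_name: str
        last_name: str

    # Assumes marvin.extract(data, target=...) returns a list of `target` instances.
    return marvin.extract(text, target=Person)


# Example call (needs marvin, pydantic, and an OpenAI API key):
# extract_people("Ada Lovelace wrote to Charles Babbage about the engine.")
```

In a real project `Person` would usually live at module level, so other code can import and type-check against it.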
Transforming images into data
Marvin 2.0 is fully multi-modal, with support for both images and audio! Here’s an example of transforming an image into a list of strings to see if we forgot any items on our shopping list. This also shows how every Marvin tool can be given natural language instructions for precise control:
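A sketch of the shopping-list check, under two assumptions: that the image tools live in the beta namespace as `marvin.beta.Image` and `marvin.beta.cast` (this may differ in later releases), and that the image URL is supplied by the caller. The helper `find_missing` and the instruction string are hypothetical names and wording chosen for illustration.

```python
def find_missing(shopping_list: list[str], groceries: list[str]) -> list[str]:
    """Return shopping-list items that do not appear among the detected groceries."""
    seen = {item.strip().lower() for item in groceries}
    return [item for item in shopping_list if item.strip().lower() not in seen]


def groceries_in_image(image_url: str) -> list[str]:
    """Ask the LLM to list the food visible in the image at `image_url`."""
    # Lazy import: keeps this module importable without marvin installed.
    import marvin

    # Assumes the beta image API: marvin.beta.Image wraps the URL and
    # marvin.beta.cast converts the image into the requested target type.
    return marvin.beta.cast(
        marvin.beta.Image(image_url),
        target=list[str],
        instructions="Identify all food in the image",
    )


# Example call (needs marvin and an OpenAI API key; the URL is a placeholder):
# found = groceries_in_image("https://example.com/groceries.jpg")
# find_missing(["bagels", "cabbage", "eggs"], found)
```

The comparison is a case-insensitive exact match, so "bagel" and "bagels" count as different items; adding the expected spelling to the instructions is one way to tighten that.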
Please note that image support will be beta until OpenAI finalizes the vision API.
AI functions
When the standard tools aren’t enough, Marvin’s AI functions let you combine any inputs, instructions, and output types to create custom AI-powered behaviors, all without source code. These functions look and feel like normal Python functions, but when they are called, the result is generated on demand by an LLM. This is done safely, without generating or executing source code, by using the LLM as a sort of “runtime” to predict the function output. Marvin functions can handle complex use cases and behaviors that would be difficult or impossible to express as code. Here’s one that can generate structured recipe objects from a variety of inputs:
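The original recipe function is not reproduced in this text, so this is a sketch of the idea, assuming the `@marvin.fn` decorator. The `Recipe` schema, the parameter names, and the docstring wording are all hypothetical. The decorated function has no body: its signature, docstring, and arguments are what the LLM sees.

```python
from typing import Optional


def build_recipe_fn():
    """Create the AI function; call the result like any other Python function."""
    # Lazy imports: keep this module importable without marvin or pydantic.
    import marvin
    from pydantic import BaseModel

    class Recipe(BaseModel):
        # Illustrative schema; adjust the fields to your needs.
        name: str
        cook_time_minutes: int
        ingredients: list[str]
        steps: list[str]

    @marvin.fn
    def recipe(
        ingredients: list[str],
        max_cook_time: int = 15,
        cuisine: Optional[str] = None,
    ) -> Recipe:
        """
        Returns a complete recipe that uses all of the `ingredients` and takes
        no more than `max_cook_time` minutes to prepare. If `cuisine` is
        given, the recipe follows that style.
        """
        # No body: the LLM predicts the return value at call time.

    return recipe


# Example call (needs marvin, pydantic, and an OpenAI API key):
# recipe = build_recipe_fn()
# recipe(["eggs", "spinach", "feta"], max_cook_time=10, cuisine="Greek")
```

The factory wrapper exists only to defer the imports; with Marvin installed, the decorator and model would typically be written at module level.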
Assistants
OpenAI’s new assistants API handles many of the most complex aspects of building agents or chatbots, including memory, threading, code execution, and more. Marvin makes it simple to create and interact with assistants programmatically, including OpenAI tools like the code interpreter and any custom functions you give the assistant to call.
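To make the custom-tool idea concrete, here is a hedged sketch. It assumes the beta interface exposes `Assistant` and `Thread` in `marvin.beta.assistants`, that an assistant can be used as a context manager, and that a thread offers `add`, `run`, and `get_messages`; because this API is in beta, those names may change. The `roll_dice` tool, the assistant name, and the instructions are hypothetical.

```python
import random


def roll_dice(n_dice: int) -> list[int]:
    """Roll `n_dice` six-sided dice and return each result."""
    return [random.randint(1, 6) for _ in range(n_dice)]


def chat_once(message: str):
    """Send one message to an assistant that can call roll_dice; return the thread's messages."""
    # Lazy import: keeps this module importable without marvin installed.
    # Assumed beta API; check the current docs before relying on these names.
    from marvin.beta.assistants import Assistant, Thread

    with Assistant(
        name="Dice Roller",
        instructions="Use the roll_dice tool whenever the user asks for a roll.",
        tools=[roll_dice],
    ) as ai:
        thread = Thread()
        thread.add(message)  # add the user's message to the conversation
        thread.run(ai)  # let the assistant respond, calling tools if it chooses
        return thread.get_messages()


# Example call (needs marvin and an OpenAI API key):
# chat_once("Please roll three dice and tell me the total.")
```

The tool is an ordinary Python function, so it can be exercised on its own before an assistant is involved.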
Please note that assistants support will be beta until OpenAI finalizes the assistants API.
Generating images
For all of Marvin’s advanced functionality, sometimes the most fun thing to do is to make simple calls to generate speech or images. Here’s how to generate the image at the top of this post:
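The prompt used for the post's header image is not included in this text, so the sketch below takes the prompt as an argument and uses a placeholder in the example call. It assumes `marvin.paint(instructions)` sends the prompt to an image model and returns the API's response object, which is passed back unchanged rather than unpacked.

```python
def paint_image(prompt: str):
    """Generate an image from a text prompt and return the raw response object."""
    # Lazy import: keeps this module importable without marvin installed.
    import marvin

    # Assumes marvin.paint(instructions) returns the image API's response;
    # inspect the returned object to find the image URL or data.
    return marvin.paint(prompt)


# Example call (needs marvin and an OpenAI API key; the prompt is a placeholder):
# paint_image("A friendly robot celebrating a birthday with a cake")
```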
To learn more about Marvin, check out the docs, star us on GitHub, join us in Discord, or …x?… us on X.
Happy engineering!