🎨 Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generation¶
📚 What you'll learn¶
This notebook demonstrates how to provide images as context to generate text descriptions using vision-language models.
The same multi_modal_context field can also carry audio or video context when the selected model supports those modalities.
- ✨ Visual Document Processing: Converting images to chat-ready format for model consumption
- 🔍 Vision-Language Generation: Using vision models to generate detailed summaries from images
- 🧩 Media Context Pattern: Understanding how
ImageContext,AudioContext, andVideoContextfit into the same configuration field
If this is your first time using Data Designer, we recommend starting with the first notebook in this tutorial series.
📦 Import Data Designer¶
data_designer.configprovides access to the configuration API.DataDesigneris the main interface for data generation.
# Standard library imports
import base64
import io
import uuid
# Third-party imports
import pandas as pd
import rich
from datasets import load_dataset
from IPython.display import display
from rich.panel import Panel
# Data Designer imports
import data_designer.config as dd
from data_designer.interface import DataDesigner
⚙️ Initialize the Data Designer interface¶
DataDesigneris the main object responsible for managing the data generation process.When initialized without arguments, the default model providers are used.
data_designer = DataDesigner()
🏗️ Initialize the Data Designer Config Builder¶
The Data Designer config defines the dataset schema and generation process.
The config builder provides an intuitive interface for building this configuration.
When initialized without arguments, the default model configurations are used.
config_builder = dd.DataDesignerConfigBuilder()
🌱 Seed Dataset Creation¶
In this section, we'll prepare our visual documents as a seed dataset for summarization:
- Loading Visual Documents: We use a small pets image dataset containing labeled images
- Image Processing: Convert images to base64 format for vision model consumption
- Metadata Extraction: Preserve relevant image information (label, etc.)
The seed dataset will be used to generate detailed text descriptions of each image.
# Dataset processing configuration
IMG_COUNT = 512 # Number of images to process
BASE64_IMAGE_HEIGHT = 512 # Standardized height for model input
# Load the pets dataset (train split, ~23 MB total)
img_dataset_cfg = {"path": "rokmr/pets", "split": "train"}
def resize_image(image, height: int):
"""
Resize image while maintaining aspect ratio.
Args:
image: PIL Image object
height: Target height in pixels
Returns:
Resized PIL Image object
"""
original_width, original_height = image.size
width = int(original_width * (height / original_height))
return image.resize((width, height))
def convert_image_to_chat_format(record, height: int) -> dict:
"""
Convert PIL image to base64 format for chat template usage.
Args:
record: Dataset record containing image and metadata
height: Target height for image resizing
Returns:
Updated record with base64_image and uuid fields
"""
image = resize_image(record["image"], height)
img_buffer = io.BytesIO()
image.save(img_buffer, format="PNG")
byte_data = img_buffer.getvalue()
base64_encoded_data = base64.b64encode(byte_data)
base64_string = base64_encoded_data.decode("utf-8")
return record | {"base64_image": base64_string, "uuid": str(uuid.uuid4())}
# Load and process the image dataset
print("📥 Loading and processing images...")
img_dataset = load_dataset(**img_dataset_cfg).map(
convert_image_to_chat_format, fn_kwargs={"height": BASE64_IMAGE_HEIGHT}
)
img_dataset = pd.DataFrame(img_dataset[:IMG_COUNT])
print(f"✅ Loaded {len(img_dataset)} images with columns: {list(img_dataset.columns)}")
img_dataset.head()
# Add the seed dataset containing our processed images
df_seed = pd.DataFrame(img_dataset)[["uuid", "label", "base64_image"]]
config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))
🧩 Media context and model capabilities¶
multi_modal_context accepts media context descriptors such as ImageContext, AudioContext, and VideoContext. Data Designer reads the referenced seed columns and serializes them for the model request, but the selected model still determines which modalities are valid.
This notebook uses image context only because image-capable VLMs are broadly available. Before combining image, audio, and video in one column, choose a model alias backed by an omni or otherwise modality-compatible model, and check that the provider accepts every context type you send.
For base64 seed columns, store the raw base64 payload without a data:<media-type>;base64, prefix and specify the media format on the context object:
media_context = [
dd.ImageContext(
column_name="image_base64",
data_type=dd.ModalityDataType.BASE64,
image_format=dd.ImageFormat.PNG,
),
dd.AudioContext(
column_name="audio_base64",
data_type=dd.ModalityDataType.BASE64,
audio_format=dd.AudioFormat.MP3,
),
dd.VideoContext(
column_name="video_base64",
data_type=dd.ModalityDataType.BASE64,
video_format=dd.VideoFormat.MP4,
),
]
URL-backed media can use data_type=dd.ModalityDataType.URL, subject to the provider's URL support and file-size limits. Local audio/video paths require explicit URL mode and require the model endpoint to have filesystem access to the same paths, typically a colocated vLLM server configured for local media access.
# Add a column to generate detailed image descriptions
config_builder.add_column(
dd.LLMTextColumnConfig(
name="description",
model_alias="nvidia-vision",
prompt=(
"Provide a detailed description of the content in this image in Markdown format. "
"Describe the main subject, background, colors, and any notable details."
),
multi_modal_context=[dd.ImageContext(column_name="base64_image")],
)
)
data_designer.validate(config_builder)
🔁 Iteration is key – preview the dataset!¶
Use the
previewmethod to generate a sample of records quickly.Inspect the results for quality and format issues.
Adjust column configurations, prompts, or parameters as needed.
Re-run the preview until satisfied.
preview = data_designer.preview(config_builder, num_records=2)
# Run this cell multiple times to cycle through the 2 preview records.
preview.display_sample_record()
# The preview dataset is available as a pandas DataFrame.
preview.dataset
📊 Analyze the generated data¶
Data Designer automatically generates a basic statistical analysis of the generated data.
This analysis is available via the
analysisproperty of generation result objects.
# Print the analysis as a table.
preview.analysis.to_report()
🔎 Visual Inspection¶
Let's compare the original image with the generated description to validate quality:
# Compare original image with generated description
index = 0 # Change this to view different examples
# Merge preview data with original images for comparison
comparison_dataset = preview.dataset.merge(pd.DataFrame(img_dataset)[["uuid", "image"]], how="left", on="uuid")
# Extract the record for display
record = comparison_dataset.iloc[index]
print("📄 Original Image:")
display(resize_image(record.image, BASE64_IMAGE_HEIGHT))
print("\n📝 Generated Description:")
rich.print(Panel(record.description, title="Image Description", title_align="left"))
🆙 Scale up!¶
Happy with your preview data?
Use the
createmethod to submit larger Data Designer generation jobs.
results = data_designer.create(config_builder, num_records=10, dataset_name="tutorial-4")
# Load the generated dataset as a pandas DataFrame.
dataset = results.load_dataset()
dataset.head()
# Load the analysis results into memory.
analysis = results.load_analysis()
analysis.to_report()
⏭️ Next Steps¶
Now that you've learned how to use visual context for image summarization in Data Designer, explore more:
Experiment with different vision models for specific image types
Try different prompt variations to generate specialized descriptions (e.g., technical details, key findings)
Combine image, audio, or video context with other column types after confirming your selected model supports those modalities
Apply this pattern to other vision tasks like image captioning, OCR validation, or visual question answering
Generating images with Data Designer