You can find the code for this on GitHub: https://github.com/davearch/chatgpt_alt_text
This is a fairly straightforward process but ChatGPT is still quite entertaining even after 2 years. And it is still cool what you can do now with only an API key, and pretty cheap as well! Plus I wanted to intertwine my own work with the buzziest of buzz words (artificial intelligence/machine learning). Always neat these days.
First things first, make an account at OpenAI and create a project API Key.
On your local machine you can spin up a Drupal site (I use DDEV, long live GoLang) which is well documented in other places so I won't go over it here. What you want to do is require and install the AI module. At first I wrote my own module and service which was basically a wrapper around the OpenAI API but alas - it is best to go along with the community. Plus the AI module is way more robust and even includes other providers besides OpenAI. So that is my recommendation. Although writing everything from scratch is also fun.
Require the module with Composer like so:
ddev composer require drupal/ai
Then install:
ddev drush en ai
This will automatically download and install the key module as well. Now you can set up your API Key. I chose to do it with an external file. So if you want to do it that way, just copy your API key and place it in a file, wherever you want to place it. I chose to name mine "openai_key.txt" (clever, I know) and place it in the root directory of my project. Please be reminded that this is just a local setup that I am going to throw away and never use again. Definitely don't do this on a production website and absolutely don't commit something like this to git.
Anyway, my key config ended up looking something like this:
Now you can enable a provider. In this tutorial we are going with OpenAI:
ddev drush en provider_openai
Then head to /admin/config/ai/providers/openai where you can select the key you just made from the dropdown. Easy peasy. And now you are ready to make some dangerous ChatGPT calls!
Let's talk about the migration scenario. Let's assume you have an image file migration already written and working as a YAML config file and you are writing a media entity migration. The media migration will set the reference for its image field based on the image file migration by using migration_lookup. The problem is that there is no alt text! Perfect opportunity to generate text with spooky ChatGPT hocus pocus.
Here is a short excerpt from the relevant "process" portion of our media migration YAML file. First we make a pseudo field with the migration_lookup plugin, which returns the ID of the entity that was generated in the previous migration:
pseudo_target_id:
  plugin: migration_lookup
  migration: images
  source: unique_id
The image field is an entity reference to the file, so we set it to the returned ID. If there isn't one found we can skip it. Probably not the best practice if we zoom out a little and realize that there shouldn't be a missing entity at this point. But that's not what this blog post is about so let's face the music later!
field_media_image/target_id:
  -
    plugin: skip_on_empty
    source: '@pseudo_target_id'
    method: row
Finally, the alt text can use a process plugin that we write ourselves, taking the entity ID as input.
field_media_image/alt:
  plugin: generate_alt_text
  source: '@pseudo_target_id'
Now we can make our process plugin class. Start by making the class in the src/Plugin/migrate/process/ directory of your module and name it GenerateAltText.php. Its plugin ID needs to be generate_alt_text to match the YAML above. We use dependency injection to grab the services we want from the container, which means the class has to implement ContainerFactoryPluginInterface so that create() gets called. Namely, we want the EntityTypeManager to load the file using the ID, the ai.settings config (ImmutableConfig), and the AiProviderPluginManager for getting the correct AI provider and model (OpenAI and gpt-4o).
Here is our create function:
public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
  return new static(
    $configuration,
    $plugin_id,
    $plugin_definition,
    $container->get('entity_type.manager'),
    $container->get('config.factory')->get('ai.settings'),
    $container->get('ai.provider')
  );
}
And here is our constructor:
public function __construct(array $configuration, $plugin_id, $plugin_definition, EntityTypeManagerInterface $entity_type_manager, ImmutableConfig $ai_config, AiProviderPluginManager $ai_provider) {
  parent::__construct($configuration, $plugin_id, $plugin_definition);
  $this->entityTypeManager = $entity_type_manager;
  $this->aiConfig = $ai_config;
  $this->aiProvider = $ai_provider;
}
The bread and butter comes down to the transform method. Here we need a prompt, the file (which we load from the entity ID), the provider, and the model. The AiProviderPluginManager comes with a helpful method, getDefaultProviderForOperationType, which, if we pass in our operation type "chat_with_image_vision", gives us the correct provider and model in an associative array. We could also have hardcoded the values for our simple example: "openai" and "gpt-4o".
Keep in mind this is of course the quick and dirty way of doing things. There is another module called "ai_image_alt_text", which I borrowed from and which is more robust. It includes config schema and default settings (including a better prompt) and does some error checking in its generation function. However, it is not a process plugin but a literal HTML button placed in the UI so users can generate alt text on the website. I wanted to go a more programmatic route. So here I am forgoing all of that useful stuff to just focus on the meat, so to speak.
public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) : string {
  $prompt = "You are the world's foremost expert in alternative texts for images for accessibility. You want to generate the best possible alternative text for the given image. Please keep it 100 characters or less.";
  // $value is the file ID returned by migration_lookup.
  $file = $this->entityTypeManager->getStorage('file')->load($value);
  if (!$file) {
    // No file entity to describe, so fall back to an empty alt text.
    return '';
  }
  // Wrap the file so it can be attached to the chat message.
  $image = new ImageFile();
  $image->setFileFromFile($file);
  $images = [$image];
  $input = new ChatInput([
    new ChatMessage('user',
      $prompt,
      $images
    ),
  ]);
  // Look up the site's default provider and model for image-aware chat.
  // In this setup: provider_id is 'openai' and model_id is 'gpt-4o'.
  $default_provider = $this->aiProvider->getDefaultProviderForOperationType('chat_with_image_vision');
  $provider = $this->aiProvider->createInstance($default_provider['provider_id']);
  $model = $default_provider['model_id'];
  $output = $provider->chat($input, $model);
  return $output->getNormalized()->getText();
}
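One caveat with the prompt above: asking for "100 characters or less" is a request, not a guarantee, and the model may return something longer or wrap its answer in quotes. Here is a minimal sketch of a post-processing helper. The name clampAltText is hypothetical (it is my own, not part of the AI module), and it assumes the mbstring extension is available.

```php
<?php

/**
 * Hypothetical helper, not part of the AI module.
 *
 * Strips surrounding whitespace and quotes, and if the text is still longer
 * than $max characters, cuts it and backs up to the last space inside the
 * limit so no word is split. Assumes the mbstring extension.
 */
function clampAltText(string $text, int $max = 100): string {
  // Remove whitespace and any quotes the model wrapped around its answer.
  $text = trim($text, " \t\n\r\0\x0B\"'");
  if (mb_strlen($text) <= $max) {
    return $text;
  }
  // Hard cut first, then back up to the last space within the cut.
  $cut = mb_substr($text, 0, $max);
  $last_space = mb_strrpos($cut, ' ');
  if ($last_space !== FALSE && $last_space > 0) {
    $cut = mb_substr($cut, 0, $last_space);
  }
  // Drop dangling punctuation left at the cut point.
  return rtrim($cut, " ,;:");
}
```

In transform() you could then wrap the return value, for example return clampAltText($output->getNormalized()->getText());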
Now, assuming that our API key was set up correctly (make sure to check your usage and cost!), we can run the migration:
ddev drush mim media_entities
If you get an error about exceeding your quota, make sure that you have added a payment option in your OpenAI account. Now, everything works at the moment, but if I re-ran the migration (and anyone who has done a migration will tell you that you will) I would be charged again for every image, and my usage would increase unnecessarily. So this makes the process plugin a bit costly. What else can we do? Maybe we can run it as a Drush command that takes the IDs of all the files we want alt text for, and then writes each ID and its alt text to a file in CSV format so we can migrate based on that. That way we can generate them once and save some precious money. Here is a rough example:
public function generateAltText(int $file_id): void {
  $prompt = 'Please provide alt text with this amazing prompt.';
  $provider_plugin_manager = \Drupal::service('ai.provider');
  // load() returns a single file entity, or NULL if the ID is unknown.
  $file = \Drupal::entityTypeManager()->getStorage('file')->load($file_id);
  if (!$file) {
    return;
  }
  $image = new ImageFile();
  $image->setFileFromFile($file);
  $input = new ChatInput([
    new ChatMessage('user',
      $prompt,
      [$image]
    ),
  ]);
  // Provider and model are hardcoded here to keep the example short.
  $provider = $provider_plugin_manager->createInstance('openai');
  $output = $provider->chat($input, 'gpt-4o');
  // Persist the result so it only has to be generated once.
  $this->writeToFile($file_id, $output->getNormalized()->getText());
}
The writeToFile function can append to a CSV, which we can use as the basis for the media migration. And you could loop over some IDs to do this, though I think an even better idea would be to utilize the batch API to collect some of these images together and save some API calls. But maybe that's something I can do in a future tutorial. Thanks!
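Since writeToFile isn't shown above, here is a minimal sketch of what it could look like as plain PHP functions. The function names, the two-column layout (file ID, alt text) and the file path are all assumptions for illustration, not the code from the repository. The second helper reads the file back into a lookup map, which is one way to skip IDs that were already generated so they aren't paid for twice.

```php
<?php

/**
 * Appends one "file_id,alt_text" row to a CSV file.
 *
 * Hypothetical stand-in for writeToFile(); path and columns are assumptions.
 */
function appendAltTextRow(string $path, int $file_id, string $alt_text): void {
  // Mode 'a' creates the file if needed and always writes at the end.
  $handle = fopen($path, 'a');
  if ($handle === FALSE) {
    throw new RuntimeException("Could not open $path for writing.");
  }
  // fputcsv() quotes commas, double quotes and newlines in the alt text.
  // The empty escape argument (PHP 7.4+) gives plain doubled-quote escaping.
  fputcsv($handle, [$file_id, $alt_text], ',', '"', '');
  fclose($handle);
}

/**
 * Reads the CSV back into a [file_id => alt_text] map.
 */
function loadAltTextMap(string $path): array {
  $map = [];
  if (!is_file($path)) {
    return $map;
  }
  $handle = fopen($path, 'r');
  if ($handle === FALSE) {
    return $map;
  }
  while (($row = fgetcsv($handle, NULL, ',', '"', '')) !== FALSE) {
    // Blank or malformed lines have fewer than two columns; skip them.
    if (count($row) >= 2) {
      $map[(int) $row[0]] = $row[1];
    }
  }
  fclose($handle);
  return $map;
}
```

Before calling the provider for a given ID, you could load the map once and check isset($map[$file_id]), only making the API request when the ID is missing.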