Google tackles AI's spelling problem in new image generation model

As confident as artificial intelligence assistants can sound in chat responses, if you ask them to generate an image containing several text phrases, chances are the resulting imagery will contain some typos or distorted fonts.

Some models have gotten better at it over time, but they’re not consistently reliable – which has limited their potential as a design tool for professionals.

On Thursday, Alphabet Inc’s Google announced a new image-generation and editing model that it says addresses the issue. It’s hoping to persuade consumers and advertisers alike to use its latest tools for accurately generating complex graphics and diagrams.

The new image model, Nano Banana Pro, can produce better visuals with more precise and legible text in multiple languages, Google said in a blog post. Those improvements were made possible by Gemini 3, the latest version of the company’s AI model released on Tuesday, which the company says represents a “massive jump” in reasoning and coding ability. The update was met with a warm reception from investors, who sent Alphabet shares to a record high on Wednesday.

Thursday’s announcement marks the search giant’s latest attempt to monetize its AI technology. Google said users of its free Gemini product around the world will be able to use the new Nano Banana Pro model up to a quota, after which they will revert to an older model. Subscribers to paid AI plans will have a higher limit. The model is also integrated with some popular design tools, including Canva, Figma and Adobe Inc’s Firefly and Photoshop.

A Google spokesperson said the Nano Banana Pro model is better at planning text placement, font characteristics and the text’s spatial relationship to other image elements, all before rendering the final image. For example, the technology can help recast the text of a recipe as an illustrated flow chart, or visualize real-time information like weather or sports, the company said in the blog post.

For brands that want to incorporate their own designs when brainstorming new marketing campaigns, the model can take in up to 14 reference images from users and arrange them in new scenarios they describe in the text prompt, while retaining the characteristics of the input materials, the company said.

Users can further refine the image by specifying in the prompt any preferred camera angles, depth of field, color grading and aspect ratios, as if they were capturing the image with a camera.

As part of Thursday’s announcements, Google also said users can upload an image to the Gemini app and ask if it was generated by Google AI. It plans to expand that capability soon to include audio and video, it added. Google currently embeds an imperceptible digital watermark in all media created with its AI tools, as well as a visible one in images created by free or Pro tier users. That visible watermark is removed for people who subscribe to the most expensive Ultra plan. – Bloomberg

