Google tackles AI's spelling problem in new image generation model


A Google spokesperson said the Nano Banana Pro model is better at planning the text placement, its font characteristics and spatial relationship to other image elements, all before rendering the final image. — AP

As confident as artificial intelligence assistants can sound in chat responses, if you ask them to generate an image containing several text phrases, chances are the resulting imagery will contain some typos or distorted fonts.

Some models have gotten better at it over time, but they’re not consistently reliable – which has limited their potential as a design tool for professionals.

On Thursday, Alphabet Inc’s Google announced a new image-generation and editing model that it says addresses the issue. It’s hoping to persuade consumers and advertisers alike to use its latest tools for accurately generating complex graphics and diagrams.

The new image model, Nano Banana Pro, can produce better visuals with more precise and legible text in multiple languages, Google said in a blog post. Those improvements were made possible by Gemini 3, the latest version of the company’s AI model released on Tuesday, which the company says represents a “massive jump” in reasoning and coding ability. The update was met with a warm reception from investors, who sent Alphabet shares to a record high on Wednesday.

Thursday’s announcement marks the search giant’s latest attempt to monetize its AI technology. Google said users of its free Gemini product around the world will be able to use the new Nano Banana Pro model up to a usage quota, after which they will revert to an older model. Members of paid AI plans will have a higher limit. The model is also integrated with some popular design tools, including Canva, Figma and Adobe Inc’s Firefly and Photoshop.

A Google spokesperson said the Nano Banana Pro model is better at planning the text placement, its font characteristics and spatial relationship to other image elements, all before rendering the final image. For example, the technology can help recast the text of a recipe as an illustrated flow chart, or visualize real-time information like weather or sports, the company said in the blog post.

For brands that want to incorporate their own designs when brainstorming new marketing campaigns, the model can take in up to 14 reference images from users and arrange them in new scenarios they describe in the text prompt, while retaining the characteristics of the input materials, the company said.

Users can further refine the image by specifying in the prompt any preferred camera angles, depth of field, color grading and aspect ratios, as if they were capturing the image with a camera.

As part of Thursday’s announcements, Google also said users can upload an image to the Gemini app and ask if it was generated by Google AI. It plans to expand that capability soon to include audio and video, it added. Google currently embeds an imperceptible digital watermark for all media created with its AI tools, as well as a visible one for images created by free or Pro tier users. That visible watermark is removed for people who subscribe to the most expensive Ultra plan. – Bloomberg 
