How D-ID is merging avatars with conversational AI for enterprise use cases

Image Credit: Liu Zishan/Shutterstock

Generating digital humans, or avatars, increasingly relies on artificial intelligence (AI).

Now the power of generative AI is coming to avatars. This could have wide-ranging implications for enterprises, including for customer support and customer experience.

Today, Israeli startup D-ID announced the launch of chat.d-id, which melds its widely used digital human platform with large language models (LLMs) for conversational AI. D-ID’s eponymous platform has been used to generate more than 100 million lifelike digital humans over the last two years. The core D-ID platform enables anyone to upload a new image or choose from an existing inventory of pre-built avatars that can vocalize text-to-speech in different voices and languages.

The integration of generative AI adds real-time streaming, which makes the avatars conversational. Instead of just a one-way vocalization of text-to-speech, D-ID avatars can now converse with real humans and answer their questions. D-ID technology is also being extended with an application programming interface (API) that will enable developers to build customized conversational AI avatar experiences for enterprise use cases.

“This is an evolution of the digital person from just presenting one-way communication,” Gil Perry, CEO and cofounder of D-ID, told VentureBeat. “The streaming capability enables our partners and developers to build products that enable you to converse with the avatar in real time.”

Putting a (digital) human face to conversational AI for enterprise

Chatbots are perhaps one of the most common use cases of conversational AI today.

With a chatbot, a customer can interact with a vendor’s support service. In 2023, an emerging trend has been the integration of LLM-powered chatbots, with ChatGPT being perhaps the most notable. One thing most chatbots have in common is that they are text-based, with some also using audio. But Perry’s goal is to provide a more personalized experience with a lifelike digital human avatar.

The goal with chat.d-id isn’t to just integrate with an existing LLM, but to help enterprises customize a generative AI model for a specific business and its operations. The chat.d-id approach isn’t just about providing answers, but also about automation, said Perry. It has the ability to execute operations such as updating a customer’s account or changing a service level.

“So instead of trying to understand how to operate your new computer, app or website, you just speak to it (as you would) speak with a person, because you don’t want to speak with text, as it’s harder to understand,” said Perry. “We humans are wired to communicate with humans.”

Extending avatars for enterprise with API

The ability to programmatically integrate with an existing enterprise application workflow is critical to enabling adoption, said Perry. That’s where the API fits in.

With the API, Perry said, developers will have full access to the capabilities of the chat.d-id platform, enabling an enterprise to customize an avatar extensively and integrate it into an existing user experience workflow. He said he also expects enterprise developers to build entirely new support workflows around the API that help improve user experience.

Perry said that D-ID has a session about its API at the upcoming Nvidia GTC conference. The company will go into detail on how the API works and how developers can implement it.

“The vision here is to disrupt how humans interface with anything digital,” said Perry.

Sean Michael Kerner