A prominent Stanford University computer scientist is reportedly creating a start-up that uses three-dimensional visual observations to improve the reasoning capabilities of artificial intelligence (AI) models, in what could be a significant step forward for the popular technology.
Fei-Fei Li recently raised funds for the company in a seed funding round whose investors included Silicon Valley venture firm Andreessen Horowitz, as well as Radical Ventures, a Canadian firm where Li has worked as a scientific partner since last year, Reuters reported, citing unnamed sources.
The start-up is reportedly based on concepts Li spoke of at the TED conference in Vancouver last month, which she termed “spatial intelligence”, involving AI that can extrapolate from observations of three-dimensional environments and act on those extrapolations.
During the talk, Li showed a picture of a cat pushing a glass toward the edge of a table and spoke of how the human mind could assess what was happening and take action to prevent the glass from falling.
‘Spatial intelligence’
“Nature has created this virtuous cycle of seeing and doing, powered by spatial intelligence,” Li said at the time.
Li co-directs Stanford’s Human-Centered AI Institute, which she said at the TED talk was trying to teach computers “how to act in the 3D world”, for instance by using a large language model (LLM) to enable a robotic arm to perform tasks such as opening a door or making a sandwich in response to verbal instructions.
Such technology may be significant for AI, which has come to prominence over the past year and a half following the launch of OpenAI’s ChatGPT, which initiated a [rush of user interest and corporate investment](https://www.silicon.co.uk/e-management/lay-off/ai-poses-jobs-apocalypse-warns-report-5564780) into “generative AI” technologies.
Such technologies, generally powered by LLMs, have limitations including a tendency to generate text that mixes false and accurate information, a phenomenon known as “hallucination”.
Limits of language
“Spatial intelligence” could be a way to temper such tendencies by grounding AI models in three-dimensional visual information.
Li spoke of the issue most recently at the Team ’24 tech conference in Las Vegas last week, citing the limits of language, which she said is “the most lossy representation of our world”.
“We open our eyes, we hug the people we love, we take care of the patch of garden we love,” she said at a panel discussion.
“Our interaction, our relationship with this world is so much more profound than just the syllabus of words. There’s so much more AI technology can do to help us to interact with the world, interact with each other, beyond the form of language, and that’s what I’m excited about.”
Li added that she hopes such applications will transform “especially healthcare and education”.