Meet 3D-GPT: An Artificial Intelligence Framework for Instruction-Driven 3D Modelling that Makes Use of Large Language Models (LLMs)


In the metaverse age, 3D content production built on meticulously detailed models redefines multimedia experiences across the gaming, virtual reality, and film industries. However, designers often face a time-consuming 3D modeling process: starting from fundamental forms (such as cubes, spheres, or cylinders) and using tools like Blender for precise contouring, detailing, and texturing. Rendering and post-processing bring this labor-intensive pipeline to a close and yield the polished final model. Procedural generation, with its adjustable parameters and rule-based systems, is effective at automating content creation, but it demands a thorough understanding of generation rules, algorithmic frameworks, and individual parameters.
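To make the idea of "adjustable parameters and rule-based systems" concrete, here is a minimal, hypothetical sketch of a procedural generator: a few parameters fully determine the output primitives, and rules expand them into geometry descriptions. The function name and parameter schema are illustrative, not taken from the paper.

```python
import random

def generate_tree(trunk_height=2.0, branch_count=4, branch_angle=35.0, seed=0):
    """Toy rule-based generator: a handful of adjustable parameters
    fully determine the list of primitive descriptions produced."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    parts = [{"shape": "cylinder", "role": "trunk", "height": trunk_height}]
    for _ in range(branch_count):
        parts.append({
            "shape": "cylinder",
            "role": "branch",
            # each rule derives branch geometry from the trunk parameters
            "height": trunk_height * rng.uniform(0.3, 0.6),
            "angle": branch_angle + rng.uniform(-5.0, 5.0),
        })
    return parts

scene = generate_tree(trunk_height=3.0, branch_count=6)
```

Changing `branch_count` or `trunk_height` regenerates the whole structure consistently, with no manual remodeling; the burden this shifts onto the user is knowing which parameters exist and what values suit the intent.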

A further layer of complexity is added when these procedures must be coordinated with customers’ creative aspirations through efficient communication. This underscores the importance of streamlining the conventional 3D modeling approach to empower creators in the metaverse age. LLMs have demonstrated remarkable planning, tool-use, and language-understanding abilities. In addition, LLMs show exceptional skill at characterizing object qualities such as structure and texture, which enables them to enrich basic descriptions with detail. They also excel at understanding complex code functions and parsing brief textual material while effortlessly supporting effective user interaction. The researchers explored new uses of these exceptional skills in procedural 3D modeling.

Their main goal is to use LLMs to their full potential to control 3D creative software in line with customer demands. To realize this goal, researchers from the Australian National University, the University of Oxford, and the Beijing Academy of Artificial Intelligence introduce 3D-GPT, a framework designed to facilitate instruction-driven 3D content synthesis. By dividing the 3D modeling process into smaller, more manageable segments and deciding when, where, and how to complete each one, 3D-GPT empowers LLMs to act as problem-solving agents. The conceptualization agent, the 3D modeling agent, and the task dispatch agent are the three main agents that make up 3D-GPT. By adjusting the 3D generation functions, the first two agents work in unison to carry out 3D conceptualization and 3D modeling.

The third agent then steers the system by accepting the initial text input, managing subsequent instructions, and facilitating efficient communication between the first two agents. In doing so, the framework advances two important goals. First, it enriches initial scene descriptions, steering them toward more in-depth and contextually relevant forms, and then modifies the textual input based on further directions. Second, it uses procedural generation, interacting with 3D software through adjustable parameters and rule-based systems rather than directly creating every component of the 3D content. 3D-GPT can derive relevant parameter values from the enhanced text and comprehend the procedural generation functions. Guided by users’ written descriptions, 3D-GPT delivers accurate and customizable 3D creation.
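The division of labor among the three agents can be pictured as a small pipeline: dispatch selects which procedural functions a request needs, conceptualization enriches the description, and modeling infers parameter values. This is a hedged, toy sketch; the agent and function names are invented for illustration, and in the real system each step is an LLM call rather than the string and keyword operations used here.

```python
def conceptualization_agent(instruction):
    """Enrich a terse scene description (an LLM call in the real system;
    string concatenation stands in for it here)."""
    return instruction + ", with detailed shape, texture, and layout attributes"

def task_dispatch_agent(instruction, functions):
    """Decide which procedural generation functions the request needs
    (keyword matching stands in for the LLM's reasoning)."""
    return [f for f in functions if f["keyword"] in instruction.lower()]

def modeling_agent(enriched_text, selected):
    """Infer parameter values for each selected function from the
    enriched text (defaults stand in for inferred values here)."""
    return [{"function": f["name"], "params": dict(f["defaults"])} for f in selected]

# a toy library of procedural generation functions (names are illustrative)
FUNCTIONS = [
    {"name": "add_tree", "keyword": "tree", "defaults": {"height": 2.0}},
    {"name": "add_sky", "keyword": "sky", "defaults": {"cloudiness": 0.5}},
]

request = "A tree under a cloudy sky"
selected = task_dispatch_agent(request, FUNCTIONS)
enriched = conceptualization_agent(request)
calls = modeling_agent(enriched, selected)
```

The point of the decomposition is that no single agent has to solve the whole problem: dispatch narrows the function set, so conceptualization and modeling only reason about the functions that matter for the request.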

In complicated scenes with many different elements, 3D-GPT lessens the effort of manually specifying each controllable parameter in procedural creation. It also improves user participation, streamlining the creative process and putting the user first. Additionally, 3D-GPT integrates smoothly with Blender, giving users access to various manipulation tools, including mesh editing, physical-motion simulation, object animation, material changes, and primitive additions. Based on their tests, the researchers claim that LLMs can process more complex visual information.

The following is a summary of their contributions: 

• Presenting 3D-GPT, a training-free framework for 3D scene generation. Their method uses the LLMs’ built-in multimodal reasoning skills to increase the productivity of end users’ procedural 3D modeling. 

• Exploration of an alternative approach to text-to-3D generation, wherein 3D-GPT generates Python programs to operate 3D software, potentially enabling additional flexibility for real-world applications. 

• Empirical studies showing that LLMs have great potential in their ability to reason, plan, and use tools when creating 3D content.
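The second contribution above, generating Python programs to operate 3D software, can be sketched as a function that turns inferred parameters into a standalone Blender (`bpy`) script. The parameter schema is invented for illustration; the actual system drives full procedural generators, not just primitives, but the two `bpy.ops.mesh` calls emitted here are real Blender operators.

```python
def emit_blender_script(objects):
    """Emit a standalone Blender Python script from a list of object
    parameter dicts (illustrative schema, not the paper's)."""
    lines = ["import bpy"]
    for obj in objects:
        if obj["type"] == "cube":
            lines.append(
                "bpy.ops.mesh.primitive_cube_add(size=%s, location=%s)"
                % (obj["size"], tuple(obj["location"]))
            )
        elif obj["type"] == "sphere":
            lines.append(
                "bpy.ops.mesh.primitive_uv_sphere_add(radius=%s, location=%s)"
                % (obj["radius"], tuple(obj["location"]))
            )
    return "\n".join(lines)

script = emit_blender_script([
    {"type": "cube", "size": 2.0, "location": [0, 0, 1]},
    {"type": "sphere", "radius": 0.5, "location": [1, 2, 0]},
])
# the resulting text can be saved and executed inside Blender,
# e.g. `blender --python scene.py`
```

Emitting code rather than geometry is what gives the approach its flexibility: the generated script remains a first-class Blender artifact that a user can inspect, edit, and re-run.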


Check out the Paper. All credit for this research goes to the researchers on this project.



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology(IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing and is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.



