Google and NVIDIA are both developing text-to-CAD generation technology. How should it be optimized?
Written by: Reggie Raye
Source: The Gradient
The dust has not yet settled on AI-driven text-to-image generation. But one result is already clear: a flood of bad images. Sure, there are some high-quality images, but not enough to outweigh the loss in signal-to-noise ratio – for every artist who benefits from a Midjourney-generated album cover, fifty other people are deceived by generated deepfake images. In a world where a degraded signal-to-noise ratio is the root of many ills (think scientific research, journalism, government accountability), that’s not a good thing.
It is now necessary to take every image with a grain of salt. (Granted, this has long been the case, but as deepfake incidents multiply, people’s vigilance must rise in step – which, besides being unpleasant, is cognitively taxing.) Constant suspicion – or frequent deception – seems a high price to pay for a digital novelty that few asked for and that has so far delivered little benefit. Hopefully – or, more fittingly, prayerfully – the cost-to-benefit ratio will soon reach a saner state.
Meanwhile, we should pay attention to a newer phenomenon in generative AI: text-to-CAD generation. The premise is similar to that of a text-to-image program, except that instead of an image, the program returns a 3D CAD model.
Ask the AI for “Mona Lisa, but wearing Balenciaga,” and it returns a 3D model rather than a flat image
Here are some definitions. Computer-aided design (CAD) refers to software tools for creating digital models of physical objects such as cups, cars, and bridges. (Models in the CAD context have nothing to do with deep learning models; a Toyota Camry ≠ a recurrent neural network.) And CAD matters: try to recall the last time you saw a manufactured object that wasn’t designed in CAD.
With definitions out of the way, let’s look at the big players entering the text-to-CAD arena: Autodesk (CLIP-Forge), Google (DreamFusion), OpenAI (Point-E), and NVIDIA (Magic3D). Here are examples from each:
The major players haven’t stopped startups from emerging at a rate of nearly one per month as of early 2023, with CSM and Sloyd perhaps the most promising.
Additionally, there are some fantastic tools that might be called 2.5D, because their output sits somewhere between 2D and 3D. The idea is that a user uploads an image, and the AI infers how that image would extend into three-dimensional space.
This Greedy Cup uses AI to turn an image of SBF (Sam Bankman-Fried, depicted as a wolf in sheep’s clothing and a Pied Piper) into a relief. (Photo credit: Reggie Raye/TOMO)
The open-source animation and modeling platform Blender is arguably the leader in this area. The CAD modeling software Rhino also now has plug-ins, such as SurfaceRelief and the Ambrosinus Toolkit, that do a fine job of generating 3D depth maps from ordinary images.
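To make the 2.5D idea concrete, here is a minimal sketch of single-image depth estimation using the openly available MiDaS model via PyTorch Hub. This is a stand-in for what tools like the plug-ins above do under the hood, not their actual implementation, and “input.jpg” is a placeholder path:

```python
# Minimal 2.5D sketch: estimate a depth map from a single photo with MiDaS.
# Assumes torch and opencv-python are installed; "input.jpg" is a placeholder.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")  # small, CPU-friendly variant
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)  # resize + normalize to the model's expected input

with torch.no_grad():
    prediction = midas(batch)  # (1, H', W') inverse-depth map
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().numpy()

# Normalize to 0-255 and save as a grayscale depth image; a relief tool
# would instead displace a mesh surface by these depth values.
d_min, d_max = depth.min(), depth.max()
depth8 = (255 * (depth - d_min) / (d_max - d_min + 1e-8)).astype("uint8")
cv2.imwrite("depth.png", depth8)
```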
It should be said at the outset that all of this is exciting. As a CAD designer, I eagerly anticipate these potential benefits. Engineers, 3D printing enthusiasts, and video game designers are among many others who will benefit as well.
However, text-to-CAD has real downsides, some of them serious. A brief list:
Opening the door to the mass production of weapons, racist objects, and other objectionable material
Triggering a wave of junk models that pollutes model libraries
Infringing on the rights of copyright holders
In any case, text-to-CAD is coming whether we want it or not. Thankfully, there are steps technologists can take to improve these programs’ output and mitigate their harms. We have identified three key areas for improvement: dataset curation, usability pattern languages, and filtering.
To the best of our knowledge, these areas have been largely unexplored in the context of text-to-CAD. The idea of a usability pattern language will receive special attention because it has the potential to significantly improve output. Notably, this potential is not limited to CAD; it could improve results in most areas of generative AI, such as text and images.
Dataset Curation
Passive Collection
While not all text-to-CAD methods rely on a training set of 3D models (Google’s DreamFusion is an exception), curated model datasets remain the most common approach. Needless to say, the key here is to curate a good set of models to train on.
Achieving this is two-fold. First, technologists should avoid the obvious sources of models – Thingiverse, Cults3D, MyMiniFactory. While these host some high-quality models, the vast majority are junk. (The Reddit thread “Why is Thingiverse so bad?” illustrates the problem.) Instead, seek out ultra-high-quality model libraries. (Scan the World is probably the best out there.)
Second, model sources can be weighted according to quality. Master of Fine Arts (MFA) students would likely jump at the chance to do such annotation work – and given the unfairness of their labor market, they could be hired for very little.
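As a sketch of what quality weighting might look like in a training pipeline – the source names and weights below are invented for illustration, not recommendations – examples can be drawn from each library in proportion to its annotated quality score:

```python
# Hypothetical sketch: sample training models from sources in proportion
# to human-annotated quality scores. Names and weights are illustrative.
import random

SOURCE_QUALITY = {
    "scan_the_world": 0.9,   # museum-grade scans
    "design_archive": 0.7,
    "thingiverse": 0.1,      # mostly junk; keep a trickle for diversity
}

def sample_source(rng: random.Random) -> str:
    """Pick a source with probability proportional to its quality weight."""
    sources, weights = zip(*SOURCE_QUALITY.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(42)
print([sample_source(rng) for _ in range(8)])
# Mostly 'scan_the_world'; 'thingiverse' appears only occasionally.
```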
Active Curation
Curation can and should take a more active form. Many museums, private collections, and design firms would be happy to have their industrial design collections 3D-scanned. And beyond producing a rich corpus, scanning creates a durable record of our fragile material culture.
France’s ability to rebuild Notre-Dame Cathedral after the fire owes much to 3D scans made by an American scholar. Photo credit: Andrew Tallon/Vassar College
Rich Data
In creating a high-quality corpus, technologists must think carefully about what they want the data to do. At first glance, the primary use case might seem to be to “empower managers at hardware companies to move a few sliders, output the desired product blueprint, and proceed to production.” However, if the history of mass-customization failures is any indication, that approach is likely to flop.
We believe a more effective use case is to “empower domain experts – such as industrial designers at a product design firm – to prompt-engineer their way to a suitable output, then fine-tune and finalize it.”
A use case like this has requirements that might not be obvious at first glance. For example, domain experts need to be able to upload images of reference products, as in Midjourney, and then tag them with target attributes – style, materials, dynamics, and so on. Here it might be tempting to take a faceted approach, where experts select style type, material type, etc. from drop-down menus. But experience suggests that enriching the dataset with hand-built attribute buckets is unwise: the music streaming service Pandora took this manual approach and was ultimately beaten by Spotify, which relied on neural networks.
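For contrast, here is a minimal sketch of the learned alternative: embed reference images with an off-the-shelf CLIP model, so “similar style” becomes proximity in embedding space rather than a drop-down facet. The model name is real; the image paths are placeholders:

```python
# Sketch: learned style similarity with CLIP embeddings instead of manual facets.
# Assumes transformers, torch, and Pillow are installed; paths are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ["ref_laptop.jpg", "ref_stand.jpg"]]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    emb = model.get_image_features(**inputs)   # (2, 512) image embeddings
emb = emb / emb.norm(dim=-1, keepdim=True)     # unit-normalize

similarity = emb[0] @ emb[1]                   # cosine similarity in [-1, 1]
print(f"style similarity: {similarity.item():.3f}")
```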
Payoff
Little rigorous work has been done on dataset curation (with a few exceptions), so there is much to gain here. It should be the first priority for companies and entrepreneurs seeking a competitive advantage in the text-to-CAD wars: a large, rich dataset is hard to create and hard to imitate. It is the best kind of “moat.”
From a less corporate perspective, thoughtful dataset curation is an ideal lever for driving the creation of beautiful products. To date, generative AI tools have reflected the priorities of their creators but have had little to do with taste. We should take a stand for the importance of beauty. We should care whether what we bring into the world will delight users and stand the test of time. We should resist adding to the wave of mediocre products.
If beauty as an end in itself doesn’t persuade, perhaps two bottom lines will: sustainability and profit.
The most iconic products of the past century – the Eames chair, the Leica camera, the Vespa scooter – are treasured by their owners. Devoted enthusiasts restore them, resell them, and keep using them. Perhaps their elaborate designs meant 20% more manufacturing emissions than their competitors at the time. It doesn’t matter: their lifespans are measured in quarter-centuries rather than years, so their amortized consumption and emissions are actually lower.
A 1963 Vespa GS 160, selling for $13,000 in 2023
As for profit, it’s no secret that beautiful products command a premium. iPhone specs have never outclassed Samsung’s, yet Apple charges roughly 25% more. The cute Fiat 500 subcompact is far less practical than an F-150 – no matter; Fiat bet right, and yuppies will pay an extra $5,000 for cuteness.
Usability Pattern Language
Overview
Pattern languages were pioneered by the polymath Christopher Alexander in the 1970s. A pattern language is a set of mutually reinforcing patterns, each describing a design problem and its solution. Although Alexander’s first pattern language was aimed at architecture, the idea has been applied successfully in many fields (most notably programming) and is at least as useful in generative design.
In text-to-CAD, the pattern language would consist of a hierarchy of patterns: for example, one pattern for moving parts, one for hinges (a subset of moving parts, and therefore one level down in abstraction), and one for friction hinges (one level down again). Consider the format of a friction hinge pattern.
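A minimal sketch of such a pattern as a structured record might look like the following; the field names and values are our illustrative assumptions, loosely following Alexander’s name–context–problem–solution format, not a specification from the article:

```python
# Illustrative sketch of a "friction hinge" pattern record, loosely following
# Alexander's name/context/problem/solution format. All fields are assumptions.
from dataclasses import dataclass

@dataclass
class Pattern:
    name: str
    parent: str | None        # position in the hierarchy
    context: str              # when the pattern applies
    problem: str              # the recurring design problem
    solution: dict[str, str]  # named design parameters and guidance

friction_hinge = Pattern(
    name="friction hinge",
    parent="hinge",           # hinge -> moving part -> ...
    context="Lids or screens that must hold position at any angle.",
    problem="A free-swinging hinge lets the screen flop under its own weight.",
    solution={
        "torque": "exceed the torque exerted by the screen's weight",
        "cycle_life": "survive the product's rated open/close cycles",
        "mounting": "fasten to structural bosses, not cosmetic surfaces",
    },
)
```

Because each pattern carries an explicit solution field, a generator built on it can expose those entries directly as tunable parameters – exactly the behavior the walkthrough below relies on.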
Like a natural language, a pattern language has a vocabulary (the set of design solutions), syntax (where each solution sits within the language), and grammar (the rules by which patterns combine to solve a problem). Note that the “friction hinge” pattern above is one node in a hierarchical network, which can be visualized as a directed graph.
These patterns embody the fundamentals of design – best practices in human factors, functionality, aesthetics, and more. Output generated through them is therefore more usable, easier to understand (sidestepping the black-box problem), and easier to fine-tune.
The bottom line: unless a text-to-CAD program takes the fundamentals of design into account, its output will be garbage. A generated laptop whose screen can’t stay upright is worse than no laptop at all.
Of these fundamentals, human factors design is perhaps the most important and the hardest to encode. The human factors that go into a useful product are nearly endless: the AI must identify and design around pinch points, finger traps, misplaced sharp edges, ergonomic proportions, and more.
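To suggest what machine-checkable human factors might look like, here is a minimal linting sketch; the feature representation and thresholds are illustrative placeholders we made up, not published ergonomics standards:

```python
# Sketch: lint a generated CAD model against simple human-factors rules.
# Thresholds are illustrative placeholders, not published ergonomic standards.
from dataclasses import dataclass

@dataclass
class Feature:
    kind: str        # e.g. "gap", "edge"
    value_mm: float  # gap width, or edge radius

def human_factors_lint(features: list[Feature]) -> list[str]:
    warnings = []
    for f in features:
        # Gaps between 5 and 25 mm can trap a finger (placeholder range).
        if f.kind == "gap" and 5.0 <= f.value_mm <= 25.0:
            warnings.append(f"possible finger trap: {f.value_mm} mm gap")
        # Exposed edges should carry a minimum radius (placeholder: 0.5 mm).
        if f.kind == "edge" and f.value_mm < 0.5:
            warnings.append(f"sharp edge: radius {f.value_mm} mm")
    return warnings

print(human_factors_lint([Feature("gap", 12.0), Feature("edge", 0.2)]))
```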
Practice
Let’s look at a practical example. Suppose Jane is an industrial designer at Studio ABC, which has been commissioned to design a futuristic gaming laptop. With current technology, Jane could open a CAD program like Fusion 360, enter its generative design workspace, and spend a week (or a month) working with her team to specify all the relevant constraints: loads, conditions, targets, material properties, and so on.
But no matter how powerful Fusion’s generative design workspace is, it can’t get around one key fact: the user must have considerable domain expertise, CAD skill, and time.
A more pleasant user experience would be to simply prompt the text-to-CAD program until its output meets the user’s requirements. A pattern-language-centric workflow might look like this (a code sketch of the interaction loop follows the walkthrough):
Jane prompts her text-to-CAD program: “Show me some examples of futuristic gaming laptops, inspired by the shape of the TOMO laptop stand and the surface texture of a king cobra.”
Fully realized text-to-CAD would close the loop from images to manufacturable products.
The program outputs six concept images, each built from patterns such as “keyboard layout,” “hinge structure,” and “consumer electronics port layout.”
Jane replies: “Give me some variations of image 2. Make the screen more recessed and the keyboard more textured.”
Jane: “I like the third one – what are its parameters?”
The system lists 20 parameters – length, width, screen height, key density, and so on – drawn from the “solution” field of the pattern it deems most relevant.
Jane notices that hinge type is not specified and enters: “Add a hinge type parameter to the list and export the CAD model.”
She opens the model in Fusion 360 and is pleased to see that an appropriate friction hinge has been added. Alongside the hinge parameterization, she increases the width parameter, because she knows Studio ABC’s client wants the screen to withstand heavy use.
Jane continues to make adjustments until she is fully satisfied with the form and function. She then hands the model to her colleague Joe, a mechanical engineer, who checks it over and determines which custom parts can be replaced with stock versions.
Finally, Studio ABC’s management is happy: the laptop design process has been shortened from an average of six months to one. Better still, thanks to the parametric model, any modifications the client requests can be accommodated quickly, without a redesign.
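Here is the promised sketch of that interaction loop. Every function, class, and parameter name below is hypothetical, invented to illustrate the shape of a pattern-language API; a real system would back these calls with a generative model and a CAD kernel:

```python
# Hypothetical, self-contained mock of the interaction loop above. The API
# (generate_concepts, vary, parameters) is invented for illustration only.
from dataclasses import dataclass, field

@dataclass
class Concept:
    description: str
    parameters: dict[str, float] = field(default_factory=dict)

    def rebuild(self) -> "Concept":
        # A real implementation would regenerate geometry from parameters.
        return self

def generate_concepts(prompt: str, n: int = 6) -> list[Concept]:
    # Mock: a real system would run the text-to-CAD model here, seeding
    # parameters from the matched patterns' "solution" fields.
    base = {"length_mm": 350.0, "width_mm": 250.0, "screen_height_mm": 220.0}
    return [Concept(f"{prompt} (variant {i})", dict(base)) for i in range(n)]

def vary(concept: Concept, instruction: str, n: int = 3) -> list[Concept]:
    return [Concept(f"{concept.description}; {instruction} ({i})",
                    dict(concept.parameters)) for i in range(n)]

concepts = generate_concepts("futuristic gaming laptop, TOMO-stand shape, "
                             "king cobra surface texture")
variants = vary(concepts[1], "screen more recessed, keyboard more textured")

chosen = variants[2]
chosen.parameters["hinge_torque_Nmm"] = 80.0  # the parameter Jane added
chosen.parameters["width_mm"] += 4            # sturdier hinge for heavy use
print(chosen.rebuild().parameters)
```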
Thorough Filtering
As AI ethicist Irene Solaiman pointed out in a recent interview, generative AI urgently needs robust safeguards. Even a pattern-language approach cannot, by itself, prevent bad output. This is where guardrails come in.
We need to be able to detect and reject prompts for weapons, gore, child sexual abuse material (CSAM), and other objectionable content. Technologists who fear lawsuits might add copyrighted products to that list. And if experience is any guide, objectionable prompts will account for a significant share of queries.
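As a first-line sketch of such a guardrail – the keyword patterns and threshold logic are illustrative, and real guardrails need trained classifiers, red-teaming, and human review on top of keyword checks – a prompt filter might start like this:

```python
# Sketch: first-line prompt guardrail. The blocked terms are illustrative
# placeholders; production systems layer trained classifiers and human
# review on top of simple keyword matching.
import re

BLOCKED_PATTERNS = [
    r"\b(receiver|suppressor|sear)\b",  # weapon components (illustrative)
    r"\bglock\b",
]

def is_allowed(prompt: str) -> bool:
    lowered = prompt.lower()
    return not any(re.search(p, lowered) for p in BLOCKED_PATTERNS)

for prompt in ["futuristic gaming laptop hinge", "ar-15 lower receiver"]:
    print(prompt, "->", "ok" if is_allowed(prompt) else "rejected")
```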
Many of these safeguards will be moot once a text-to-CAD model is open-sourced or leaked. (If the Defense Distributed saga has taught us anything, it’s that the genie never goes back in the bottle: thanks to a recent ruling in Texas, Americans can now legally download an AR-15 design, 3D print it, and – should they feel threatened – use it to shoot someone.)
Additionally, we need widely shared performance benchmarks similar to those emerging around LLMs. After all, if you can’t measure it, you can’t improve it.
____
In summary, AI-driven text-to-CAD generation brings both risks and opportunities, and the ratio between the two remains uncertain. The proliferation of low-quality CAD models and of toxic content are just two of the issues requiring immediate attention.
Technologists can also profitably attend to some neglected areas. Dataset curation is crucial: we need to source high-quality models from high-quality libraries and explore complementary methods, such as scanning industrial design collections. A usability pattern language offers a powerful framework both for incorporating best design practices and for generating CAD model parameters that can be fine-tuned until the model meets the requirements of its use. Finally, thorough filtering must be developed to prevent the generation of dangerous content.
We hope that the ideas presented in this article will help technologists avoid the pitfalls that have plagued generative AI to date and improve text-to-CAD capabilities to deliver good models that will benefit the many people who will use them.