Now, I AM optimistic about algorithms solving for useful relationships between pre-defined objects, like routing conduit through a building without collisions or optimizing a lumber cut-list. Finch and Hypar are interesting small companies in this space.
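To make the cut-list case concrete, here is a minimal sketch using the first-fit-decreasing heuristic. The function name, the 3 mm kerf and the board lengths are my own assumptions for illustration. It is a heuristic, not a guaranteed optimum, and has nothing to do with how Finch or Hypar work.

```python
from typing import List

def plan_cuts(pieces_mm: List[int], stock_mm: int, kerf_mm: int = 3) -> List[List[int]]:
    """Assign pieces to stock boards with the first-fit-decreasing heuristic."""
    boards: List[List[int]] = []   # pieces placed on each board
    used: List[int] = []           # length consumed per board, kerf included
    for piece in sorted(pieces_mm, reverse=True):
        if piece > stock_mm:
            raise ValueError(f"piece {piece} mm is longer than the stock")
        for i in range(len(boards)):
            # Every piece after the first on a board costs one saw kerf.
            if used[i] + kerf_mm + piece <= stock_mm:
                boards[i].append(piece)
                used[i] += kerf_mm + piece
                break
        else:
            # No existing board has room, so start a new one.
            boards.append([piece])
            used.append(piece)
    return boards

cuts = plan_cuts([1200, 800, 800, 600, 400, 400, 300], stock_mm=2400)
for number, board in enumerate(cuts, start=1):
    print(number, board)
```

The objects (pieces, boards) are fixed up front and the algorithm only solves for the assignment, which is the kind of problem where this approach tends to do well.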
Rendering was added in r12.
I’m better with CAD-style workflows than with freeform modelling, but I’m not experienced enough with any particular CAD program to avoid spending ages trying to work out where to edit the right step in the chain of operations that built up a particular fillet, chamfer, or sweep. Every program is a little different, just different enough to throw me off, since I don’t use any of them often enough to get over that learning threshold. For instance, I’ve used Onshape once in the last 12 months: I spent an hour trying to work something out, and then 5 minutes actually making my edit.
If I could have opened up the design in 2D blueprint form and put in some kind of multimodal query to the effect of “these are my blueprints, can you convert them to (CAD program of choice), and while you are converting them, increase cross section CS1 vertically from 8mm to 10mm, keeping everything else fixed”, it would save me a lot of time making adjustments to people’s 3D models for printing. It can be really annoying to fuck around as much as is sometimes necessary, due to individual CAD workflow preferences, just to make conceptually simple edits like “this hanging pot, but stretched 50% taller, and with all the holes/cutouts kept the same size please”.
The ML models might not be able to do a perfect curve for a gorgeous arcing buttress or a complex bolt hole pattern for attaching multiple elements to a primary support structure… but they should be able to do a lot of the sorts of work that people use OpenSCAD for.
One conclusion we came to is that large language models do not understand geometry, in a pretty fundamental way! They do understand pairwise relationships that are more topological in character. But that’s not quite the same thing.
The blocker on the text to concept art to 3D model to CAD route is that the meshes and geometry you generate won’t have the easy adjustment and parameterization you can take for granted in human-authored CAD!
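For what it's worth, the "hanging pot, stretched 50% taller, holes kept the same size" edit is trivial once the model is parametric. A rough sketch of that idea, emitting OpenSCAD source from Python. The `Pot` class, its fields and all dimensions are made up for illustration; the hard part described above (getting from someone else's mesh or blueprint to parameters like these) is not solved here.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Pot:
    height: float                     # mm
    radius: float                     # mm
    wall: float                       # mm
    hole_diameter: float              # mm, must survive edits unchanged
    hole_heights: Tuple[float, ...]   # z position of each hanging hole

def stretch(pot: Pot, factor: float) -> Pot:
    """Scale the pot vertically; holes move with the wall but keep their size."""
    return replace(
        pot,
        height=pot.height * factor,
        hole_heights=tuple(z * factor for z in pot.hole_heights),
    )

def to_openscad(pot: Pot) -> str:
    """Emit OpenSCAD source: outer cylinder minus the cavity minus the holes."""
    holes = "\n".join(
        f"    translate([0, 0, {z:g}]) rotate([0, 90, 0]) "
        f"cylinder(h={2 * pot.radius + 2:g}, d={pot.hole_diameter:g}, center=true);"
        for z in pot.hole_heights
    )
    return (
        "difference() {\n"
        f"    cylinder(h={pot.height:g}, r={pot.radius:g});\n"
        f"    translate([0, 0, {pot.wall:g}]) "
        f"cylinder(h={pot.height:g}, r={pot.radius - pot.wall:g});\n"
        f"{holes}\n"
        "}\n"
    )

pot = Pot(height=100, radius=40, wall=2, hole_diameter=6, hole_heights=(80.0,))
taller = stretch(pot, 1.5)
print(to_openscad(taller))
```

A plain non-uniform scale of the mesh would have turned the round holes into ovals, which is exactly the annoyance with non-parametric models.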
That’s also ignoring developing a robust geometry kernel that would live underneath all this!
Not anticipating any great source 3D model training data will be made available anytime soon.
Solidworks, Creo, AutoCAD, Fusion, etc., can all take their bug ridden unoptimized single threaded rent-seeking monstrosities and stick em where the sun don't shine.
Seriously - if anyone wants to create an absolutely world-changing piece of software, start working on a new CAD kernel that takes the last 50 years of computer science advances into account, because none of the entrenched industry standards have done so. Don't worry about having to provide customer service, because none of the entrenched industry standards worry about that either.
And no - while openCascade and solvespace are impressive, they aren't fully capable, nor do they start from a modern foundation.
That's just the start of a single feature type. Now you need a bunch more feature types, and they all need to interact well with each other. The kernel also needs some way of solving the topological naming problem to be useful (FreeCAD might get a basic version of this after a decade(?) of work).
It's probably tantamount to writing a modern-day browser in terms of complexity.
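For anyone who hasn't hit the topological naming problem, here is a toy version of it. This is only a sketch: the face labels are invented, and a real kernel has no such convenient labels to fall back on, which is what makes the problem hard.

```python
from typing import List

def box_faces(with_chamfer: bool) -> List[str]:
    """Return faces in the order a toy kernel happens to generate them."""
    faces = ["bottom", "front", "right", "back", "left", "top"]
    if with_chamfer:
        # An upstream edit adds one face; everything after it shifts by one.
        faces.insert(2, "chamfer")
    return faces

# A downstream feature (say, a hole) was attached to "the face at index 5".
HOLE_FACE_INDEX = 5

before = box_faces(with_chamfer=False)
after = box_faces(with_chamfer=True)

print(before[HOLE_FACE_INDEX])  # top
print(after[HOLE_FACE_INDEX])   # left: the hole silently jumps to another face

def find_face(faces: List[str], name: str) -> int:
    """Look a face up by a persistent name instead of by position."""
    return faces.index(name)

print(after[find_face(after, "top")])  # top
```

The fix looks obvious here only because the toy hands out stable names for free. Deriving names that survive arbitrary upstream edits is the actual work.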
Imagine an AI helping an architect ensure their drawings are compliant with local code ordinances, and it can produce some of the paperwork itself.
Imagine an AI that understands the manufacturing processes well enough to guide you or give useful advice about how to manufacture or modify your parts so they become easier and cheaper to build.
Imagine an AI that can know more about an electronic or mechanical project and can get outputs from various simulation tools to advise you on your design's compliance with regulation, or that would recognise weak design choices by pulling from a knowledge base of part failures or other real-world constraints.
It could propel computer-aided design in ways we can't imagine today, but this integration will probably be hard and not be just text-based.
Maybe the idea of a "kernel" is the problem here. A kernel the size of a browser is not a kernel.
I think what's really needed is a full-blown integration with a theorem proving system (which has an easier to define kernel of its own).
I am an amateur woodworker and wanted easier ways to quickly prototype ideas.
The sweet spot for me is more accurate measurements and better drawings than my pen and paper but without the overhead of firing up Fusion 360 and trying to lay out the 2D then 3D process.
Neither of the above is great for iteratively exploring designs either.
My last project was a custom drill press workbench, and I did the 3D in sketchup and Fusion to get a feel for both tools popular with woodworker hobbyists.
These types of designs are often sold for a few bucks with the project assembly videos posted on YouTube.
I did my initial testing of this using iterative prompts to OpenAI models asking them to refine the design of an outdoor wooden bench with dimensions appropriate for a toddler.
I had some live edge donor wood and wanted it to comply with the thickness of the materials as input.
I was able to prove to myself it could be done with generated scripts for the blender-API.
I set my aim at a single page that can record spoken audio, perform STT, process it into valid Blender Python, export a .glb and display it on the same page.
Making a great demo is a lot of integration work and a lot of LLM programming, pre and post processing, system context refinement etc.
But it’s pretty awesome.
In my experience generating for Fusion is dicier than blender, but I suspect with specialized model training and a bunch of dark art LLM incantations this could become a prosumer tool, and possibly speed along professional work as described in the blog.
So far this is not stuff w complex mechanics or fancy hinges, so it might not meet the threshold of CAD for some. But there are a lot of folks who want “cad like” experiences without having to muck w the tools.
I’d love feedback if anyone is interested in checking it out. Demo day is toward end of this month, I can be reached at the email in my HN profile.
Triangular meshes are conceptually simple, but require many faces to approximate curved surfaces with high precision (you may be able to use subdivision surfaces in some cases, but intersection/union is more challenging there). Also, for more complicated models, floating point errors really add up, and you either have to use an exact representation (which is really slow) or try other approaches that are robust w.r.t. errors (e.g. https://github.com/elalish/manifold, but it is really hard to get right). Another disadvantage compared with BREP is the lack of constraint solving, which I will write about below.
SDF is nice for mathematically defined objects. SDFs are computationally intensive, so some SDF libraries use the GPU to speed up the computation. There are approaches that can speed up the evaluation, but they don't work well if the function is not really the distance (https://github.com/curv3d/curv/blob/master/docs/shapes/Shape...).
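To put a number on "many faces": for a circle of radius r approximated by n straight segments, the largest gap between chord and arc is r(1 - cos(π/n)). A small sketch that inverts this for a given tolerance (the function name and the sample values are just for illustration):

```python
import math

def segments_for_tolerance(radius: float, tol: float) -> int:
    """Smallest segment count whose chords stay within tol of the true circle."""
    if not 0 < tol < radius:
        raise ValueError("need 0 < tol < radius")
    # Solve radius * (1 - cos(pi / n)) <= tol for n.
    return max(3, math.ceil(math.pi / math.acos(1 - tol / radius)))

for tol in (0.1, 0.01, 0.001):
    n = segments_for_tolerance(10.0, tol)
    error = 10.0 * (1 - math.cos(math.pi / n))
    print(f"tol={tol}: {n} segments, worst-case error {error:.5f}")
```

The count grows roughly with 1/sqrt(tol) per curve, and for a doubly curved surface like a sphere the face count grows with about the square of that.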
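A minimal SDF sketch showing what "not really the distance" means. The helper names are my own and this is not the API of curv or any other library. A sphere is an exact distance field, but a non-uniform scale keeps the surface in the right place while the returned values stop being true distances.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]
SDF = Callable[[Point], float]

def sphere(radius: float) -> SDF:
    return lambda p: math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def translate(shape: SDF, offset: Point) -> SDF:
    return lambda p: shape((p[0] - offset[0], p[1] - offset[1], p[2] - offset[2]))

def union(a: SDF, b: SDF) -> SDF:
    # The union of two shapes is the pointwise minimum of their fields.
    return lambda p: min(a(p), b(p))

def scale_z(shape: SDF, k: float) -> SDF:
    """Non-uniform scale: correct zero set, but values are no longer distances."""
    return lambda p: shape((p[0], p[1], p[2] / k))

ball = sphere(1.0)
print(ball((3.0, 0.0, 0.0)))      # 2.0, the exact distance to the surface

squashed = scale_z(ball, 0.5)     # surface now crosses the z axis at z = 0.5
print(squashed((0.0, 0.0, 2.0)))  # 3.0, although the surface is only 1.5 away
```

An evaluator that trusts the 3.0 and skips that far (sphere tracing, or pruning empty cells) can step straight past the surface, which is why the accelerated approaches want a true distance or at least a lower bound.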
-----
Constraint solving: This is a big problem with mesh-based CAD. Traditional CAD usually allows you to have under-defined constraints, and users can iteratively add constraints until the model is fully defined. There is no such thing (yet) with mesh-based CAD. Also, we don't really have nice ways to represent constraints relative to curved surfaces because there is no curved surface in our mesh...
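As a rough illustration of "under-defined until fully defined", here is the naive degrees-of-freedom bookkeeping a 2D sketcher can show the user. The constraint names and their costs are an illustrative subset I made up, and simple counting like this cannot detect redundant or conflicting constraints, which real solvers have to handle.

```python
from typing import Dict, List

# Degrees of freedom removed by each constraint kind in a 2D sketch.
CONSTRAINT_DOF: Dict[str, int] = {
    "fix_point": 2,
    "horizontal": 1,
    "vertical": 1,
    "distance": 1,
}

def remaining_dof(num_points: int, constraints: List[str]) -> int:
    """Naive count: 2 DoF per point minus what each constraint removes."""
    return 2 * num_points - sum(CONSTRAINT_DOF[c] for c in constraints)

# Four corner points with the edges constrained to be axis-aligned.
rectangle = ["horizontal", "vertical", "horizontal", "vertical"]

print(remaining_dof(4, rectangle))                                          # 4
print(remaining_dof(4, rectangle + ["fix_point"]))                          # 2
print(remaining_dof(4, rectangle + ["fix_point", "distance", "distance"]))  # 0
```

Each step leaves a valid, still-editable sketch, and only the last one is fully defined. A bare triangle mesh has no equivalent notion of "what is still free".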
Also, one particular challenge with text-based (or code-based) CAD is how to select the surfaces with an ergonomic API. GUI can solve this problem but writing a good GUI is a complicated task (that I am not willing to touch).
[1] https://github.com/tpaviot/pythonocc-core [2] odico.dk/ [3] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=pyth... [4] https://www.cgal.org/
[1] https://github.com/elalish/manifold/discussions/549#discussi...
Sure it may help others produce lots of low quality content you don't enjoy, but it will also help you generate content you really enjoy. Especially things you can then manifest in the physical world.
If anything I think it will help elevate more obscure styles, as it creates an effectively infinite number of artists that can produce such styles.
Anything else, we are talking about computational design + AI for interpretability.
I haven't checked all the examples in the link posted, but I know AutodeskAI Lab has some seriously impressive papers out. (code too)
Here's a LEGO interlocking block brick in cadquery: https://cadquery.readthedocs.io/en/latest/examples.html#lego... .
awesome-cadquery: https://github.com/CadQuery/awesome-cadquery
cadquery and thus also jupyter-cadquery now have support for build123d.
gumyr/build123d https://github.com/gumyr/build123d :
> Build123d is a python-based, parametric, boundary representation (BREP) modeling framework for 2D and 3D CAD. It's built on the Open Cascade geometric kernel and allows for the creation of complex models using a simple and intuitive python syntax. Build123d can be used to create models for 3D printing, CNC machining, laser cutting, and other manufacturing processes. Models can be exported to a wide variety of popular CAD tools such as FreeCAD and SolidWorks.
> Build123d could be considered as an evolution of CadQuery where the somewhat restrictive Fluent API (method chaining) is replaced with stateful context managers* - e.g. with blocks - thus enabling the full python toolbox: for loops, references to objects, object sorting and filtering, etc.*
"Build123d: A Python CAD programming library" (2023) >>37576296
build123d docs > Tips & Best Practices: https://build123d.readthedocs.io/en/latest/tips.html
BREP: Boundary representation: https://en.wikipedia.org/wiki/Boundary_representation
Manim, Blender, ipyblender, PhysX, o3de, [FEM, CFD, [thermal, fluidic,] engineering]: https://github.com/ManimCommunity/manim/issues/3362
NURBS: Non-Uniform Rational B-Splines: https://en.wikipedia.org/wiki/Non-uniform_rational_B-spline
NURBS for COMPAS: test_curve.py, test_surface.py: https://github.com/gramaziokohler/compas_nurbs :
> This package is inspired by the NURBS-Python package, however uses a NumPy-based backend for better performance.
> Curve, and Surface are non-uniform non-rational B-Spline geometries (NUBS), RationalCurve, and RationalSurface are non-uniform rational B-Spline Geometries (NURBS). They all built upon the class BSpline. Coordinates have to be in 3D space (x, y, z)
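A small sketch of why the rational/non-rational distinction quoted above matters, in plain Python rather than the compas_nurbs API (the function below is my own). A single rational quadratic segment with weights 1, √2/2, 1 traces an exact quarter circle; the same control points with equal weights give a parabola that only approximates it.

```python
import math
from typing import Sequence, Tuple

Point2 = Tuple[float, float]

def rational_quadratic(ctrl: Sequence[Point2], weights: Sequence[float], t: float) -> Point2:
    """Evaluate a rational quadratic Bezier segment (a one-span NURBS curve)."""
    basis = ((1 - t) ** 2, 2 * t * (1 - t), t ** 2)   # Bernstein polynomials
    denom = sum(b * w for b, w in zip(basis, weights))
    x = sum(b * w * p[0] for b, w, p in zip(basis, weights, ctrl)) / denom
    y = sum(b * w * p[1] for b, w, p in zip(basis, weights, ctrl)) / denom
    return (x, y)

ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
nurbs_w = [1.0, math.sqrt(2) / 2, 1.0]   # weights for an exact 90 degree arc
nubs_w = [1.0, 1.0, 1.0]                 # equal weights: plain polynomial curve

for t in (0.25, 0.5, 0.75):
    rational_radius = math.hypot(*rational_quadratic(ctrl, nurbs_w, t))
    polynomial_radius = math.hypot(*rational_quadratic(ctrl, nubs_w, t))
    print(t, rational_radius, polynomial_radius)
```

The rational radius stays at 1 (up to rounding) for every t, while the polynomial version bulges outward in the middle.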
compas_rhino, compas_blender,
- [ ] compas_o3de
Blender docs > Modeling Surfaces; NURBs implementation, limits, challenges: https://docs.blender.org/manual/en/latest/modeling/surfaces/...
/? "NURBS" opencascade https://www.google.com/search?q=%22nurbs%22+%22opencascade%2...
OCCT (OCC) Open Cascade Technology: https://en.wikipedia.org/wiki/Open_Cascade_Technology
OCCT > Standard_Transient > MMgt_TShared > Geom_Geometry > Geom_Curve > Geom_BoundedCurve > Geom_BSplineCurve: https://dev.opencascade.org/doc/occt-6.9.1/refman/html/class...
OCCT > Standard_Transient > MMgt_TShared > Geom_Geometry > Geom_Surface > Geom_BoundedSurface > Geom_BSplineSurface: https://dev.opencascade.org/doc/occt-6.9.1/refman/html/class...
Cadquery.Shape.toSplines(degree: int = 3, tolerance: float = 0.001, nurbs: bool = False)→ T https://cadquery.readthedocs.io/en/latest/classreference.htm...
digdugdirk has the right idea, and AFAIR, there is some work on that front (https://www.fornjot.app/).
Also the Fiat 500 goes 100km on about 4l of gas, while the Ford F150 uses 7l. No clue where the author gets the idea that the Fiat would get worse mileage, perhaps he's dividing by weight?
The rest, I don't even.
(yeah I hate it that the Star Trek universe is not signing a deal with Lego)(I would much rather have an Enterprise than any Star Wars item)(https://xkcd.com/1563/)
Truck[1] and Fornjot[2] are recent attempts in the Rust space, both are WIP.
But both seem to be going the traditional way. I.e. B-Rep that can be converted to (trimmed) NURBS.
I think if one wanted to incorporate the last 50 years of computer science, particularly computer graphics, one would need to broaden the feature set considerably.
You need support for precision subdivision surface modeling with variable radius creases (either via reverse subdivision, where you make sure the limit surface passes through given constraints, or using an interpolating subdivision scheme that has the same perks as e.g. Catmull-Clark).
Then you need to have SDF modeling ofc.
Possibly point based representations. If only as inputs.
And traditional B-Rep.
Finally, the kernel should be able to go back and forth lossless between these representations wherever possible.
And everything must be node-based, like e.g. Houdini. Completely non-destructive.
1. create an object, e.g. a cube
2. perform lots of random transformations
3. perform the inverse of the above transformations
4. subtract the object from the original object
5. end up with exactly nothing
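A quick way to see why step 5 fails with ordinary floats. This sketch only tracks how far the cube's corners drift after the round trip; it does not perform the Boolean subtraction, and the rotation count and seed are arbitrary.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float, float]

def rotate_z(p: Vec, angle: float) -> Vec:
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

def round_trip(points: List[Vec], angles: List[float]) -> List[Vec]:
    """Apply every rotation, then undo them all in reverse order."""
    for a in angles:
        points = [rotate_z(p, a) for p in points]
    for a in reversed(angles):
        points = [rotate_z(p, -a) for p in points]
    return points

cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
rng = random.Random(0)
angles = [rng.uniform(0, 2 * math.pi) for _ in range(1000)]
result = round_trip(cube, angles)

drift = max(math.dist(p, q) for p, q in zip(cube, result))
print("exactly equal:", result == cube)
print("max corner drift:", drift)
```

The drift is tiny, but a Boolean that compares the two cubes exactly will see thin slivers instead of nothing, so a kernel needs tolerances, exact arithmetic, or a representation that stores the transform chain symbolically.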
How are modern approaches solving this (robustness), if at all?
My broader point was that there is a need to start from a new paradigm that leverages the possibilities of modern, highly parallel computing hardware. The hardware requirements for performant and reliable CAD software are incredibly high, and their reliance on high clock speed single core processors is quickly being left behind by modern processing hardware.
Guns and other mechanical devices don't exist alone. A gun must interface with a bullet, a part of an aircraft must interface with other parts. So CAD AI must be able to understand the geometric context of the parts it is making.
That being said, I think AI will soon be capable of making mechanical devices. There has been some improvement in physical reasoning benchmarks like PHYRE[1]. Understanding physical reasoning and how multiple objects move with respect to each other is important in the synthesis of new mechanical devices.
SketchIT[0] demonstrated that by making a reduced 2DoF description of how pairs of objects in a device may move with respect to each other, it's possible to synthesize a new device which performs the same function.
Solving PHYRE problems requires reasoning with larger degrees of freedom. The first example on the homepage has something like 5 objects which each have 3 positional DoF (translation and rotation). Even reasoning with 3DoF is quite difficult for approaches like those used in SketchIT.
Given that approaches like slotformer[2] already do somewhat well at solving these huge DoF problems, I don't think we're very far from AI being able to design complicated mechanical devices.
[0]https://dspace.mit.edu/bitstream/handle/1721.1/6773/AITR-157...
https://kittycad.io/modeling-app https://kittycad.io/geometry-engine
While we are not fully capable yet, that's the end goal by next year. Years from now we cannot still be using these single-threaded nightmares from 30 years ago.
I'd expect this to be an LLM application that's unusually suited to automatic-feedback reinforcement learning.
Especially because much of the interactive proving task/process is quite straightforward nondeterministic (Turing) machine stuff: You try (with enough randomization/temperature to not get stuck in non-creative stupidity) a proof attempt/step/hint, and get feedback after a moment of calculation from the proof assistant. Then you try further, until eventually hopefully getting "success" as the assistant's feedback.
Once you got it to succeed once, or seeded with originally human-made step sequences to teach some basic sense into the model, you know an upper bound on the _required_ step count and assistant calculation time to prove the theorem at hand. Thus you can let the LLM auto-play/train with a step limit and computation timeout close to the known upper bound, rewarding for expected total runtime/effort to combine "spamming cheap proof tactics and hope something sticks" and "elaborate careful proof process likely to succeed but always expensive/(semi-exhaustive) to go through".
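The loop described above, as a toy sketch. Everything here is a stand-in: the "proof assistant" is a trivial integer-rewriting oracle, the "LLM" is a uniform random sampler, and the names `TACTICS`, `attempt`, `search` and `check` are invented. The point is only the shape: sample a step, get oracle feedback, and tighten the step limit once any successful script is known.

```python
import random
from typing import Callable, Dict, List, Optional

# Toy oracle: the goal is an integer, a tactic rewrites it (or is rejected),
# and the goal counts as proved once it reaches 1.
TACTICS: Dict[str, Callable[[int], Optional[int]]] = {
    "halve": lambda n: n // 2 if n % 2 == 0 else None,   # even goals only
    "decrement": lambda n: n - 1 if n > 1 else None,
}

def attempt(goal: int, step_limit: int, rng: random.Random) -> Optional[List[str]]:
    """Sample tactics at random; the oracle accepts or rejects each one."""
    script: List[str] = []
    for _ in range(step_limit):
        if goal == 1:
            return script
        name = rng.choice(sorted(TACTICS))
        new_goal = TACTICS[name](goal)
        if new_goal is None:
            continue   # rejected step: no progress, but it still costs budget
        goal = new_goal
        script.append(name)
    return script if goal == 1 else None

def search(goal: int, step_limit: int, attempts: int, seed: int = 0) -> Optional[List[str]]:
    """Retry, shrinking the step limit to the best known script length."""
    rng = random.Random(seed)
    best: Optional[List[str]] = None
    for _ in range(attempts):
        script = attempt(goal, step_limit, rng)
        if script is not None and (best is None or len(script) < len(best)):
            best = script
            step_limit = len(script)   # known upper bound on required steps
    return best

def check(goal: int, script: List[str]) -> bool:
    """Replay a finished script through the oracle from scratch."""
    for name in script:
        result = TACTICS[name](goal)
        if result is None:
            return False
        goal = result
    return goal == 1

proof = search(40, step_limit=64, attempts=200)
print(proof, check(40, proof or []))
```

In a real setup the sampler would be the model being trained, and the script length (or oracle compute time) would feed the reward.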
Perhaps even with a GPT-4 like multi-agent LLM to specialize into the various approaches and have some way of rating/predicting each agent's expected efficiency/cost each "chat message":
Turns out, interactive theorem proving is literally a (beyond-)NP heuristics-guiding sampler (traditionally a human with trained gut feeling based problem-solving brainstorming creativity) chatting with a non-creative algorithmic oracle. At the start, if initiated not by the "human", the oracle would info-dump the theorem and appropriate context along with a description of what "today's" task is:
This may be: 1) "I expect the theorem to be False because: [reason for what caused the expectation]." 2) "I expect the theorem to be True because: [reason for what caused the expectation]."
3) "I expect a weaker (in its implications, so less general) form of that theorem to be sufficient in the proof for this situation here. It could make proving them for data types we'd like to use with this implementation/specialization cheaper than the general API's demands, potentially (though not the reason for today's task!) allowing weaker data types to be used here than in the general API. In particular, there (maybe) are cheaper (to implement) yet weaker (in effects) variants of the functions we call on the supplied data type, that still suffice in our algorithm (like weaker demands on a comparison function when only stable sorting is used; dropping distinctiveness demands for either hash function or comparison function if only the other is relied on for Set/Map data structure efficiency). Confidence of it to be True is at X (confidence measure/score the LLM can feel natively) and here's what certain outcomes would be worth (either enumerated weaker forms or a scoring function (in a form the LLM agent comparer can utilize to plan what to aim for, when to pivot, and when to declare defeat))."
4) "I expect a weaker (in its demands, so more general) form of that theorem to be sufficient in the proof for this situation here. It would allow us to use this implementation/specialization with more data types than the general API promises. Confidence of it to be True is at X (confidence measure/score the LLM can feel natively) and here's what certain outcomes would be worth (either enumerated weaker requirements for the theorem or a scoring function (in a form the LLM agent comparer can utilize to plan what to aim for, when to pivot, and when to declare defeat))."
5) "Find (and codify!) sufficient invariants demanded from (functions on) data types so we can uphold our API's promised invariants. [Optionally, aim for something that matches this natural language description of what (parts/aspects of) the data type's function's purpose[s]/semantics are supposed to be.]"
6) "Prove this is constant-time/constant-memory-access-pattern w.r.t. that part of data (even in ways where the data in question may be in chunks behind some memcpy-reducing indirection), or e.g. that this key here affects nothing persistent other than that ciphertext/plaintext/signature/hash/success-flag."
7) "Prove time/memory complexity of this implementation. Here's what certain bounds are worth: split between various kinds of bounds, e.g. lower bounds, upper bounds, average (w.r.t. that data) complexity, best/worst case (w.r.t. that data), handle those parameters by enumerating over those and proving with their values fixed (because a general equation may be too complicated/weak, or even just too hard to derive)."
8) A classic: "Prove these two implementations produce identical (under that comparison/test-sampling method) results."
9) "Find bounds on when (conditions) and/or how much (some appropriate supplied measure) these two implementations differ."
10) "Find a faster implementation along with proven limits on its inaccuracy. Combining the two (+) dimensions of candidate quality from the obvious pareto frontier into a single number score is according to this: [formula in useful format for directing search/exploration]."
11) "Cough up an implementation limited to those numerical primitives there, along with proof of it complying with these accuracy requirements, for this implementation that uses (inherently computationally-unsuitable) real numbers. Speed/memory performance importance: [scoring function suitable for directing where to aim, when to stop, and when to give up]."
And afterwards, the "human" would ask/explore the oracle about context, suggest/try proof tactics, and in some cases write/transform code in both LLM-style and by commanding ("textbook"/library/archive, or even freshly written) rewrite rules.
That process could then be trained with reinforcement learning, even if intermediate states have no useful score function defined, as the presence of certain results after certain amounts of expended effort is directly useful as a score for/of the solver/agent itself. The multi-agent suggestions applicability/efficiency predictor/arbiter should be amenable to more normal (stochastic) backpropagation at a completed-chat granularity, as the final efficiency/score will be known, and if it were a perfect predictor, it'd have predicted that exact score for the entire time along the path that was taken. The intermediate predictions on how much effort the chosen agent's suggestion actually took to complete is also easily recorded for training the per-step cost predictions as a more fine-grained aspect of the final-score-when-taking-this-branching-path-now machinery.
Super cool work either way! I'll be wishing you luck.