zlacker

I do some electrical drafting work for construction and throw basic tasks at LLMs.

I gave it a shitty harness and it almost 1 shotted laying out outlets in a room based on a shitty pdf. I think if I gave it better control it could do a huge portion of my coworkers jobs very soon

replies(4): >>amorzo+n3 >>reduce+4g >>willis+OG >>Libidi+V02

>>knolli+(OP)
Can you give an example of the sort of harness you used for that? Would love to play around with it

replies(1): >>knolli+Tg

>>knolli+(OP)
"AI could never replace the creativity of a human"

"Ok, I guess it could wipe out the economic demand for digital art, but it could never do all the autonomous tasks of a project manager"

"Ok, I guess it could automate most of that away but there will always be a need for a human engineer to steer it and deal with the nuances of code"

"Ok, well it could never automate blue collar work, how is it gonna wrench a pipe it doesn't have hands"

The goalposts will continue to move until we have no idea if the comments are real anymore.

Remember when the Turing test was a thing? No one seems to remember it was considered serious in 2020

replies(7): >>webdoo+lj >>semi-e+Cq >>Frater+Xq >>blarge+KH >>golem1+7Y >>8n4vid+fe1 >>fuzzy2+FP1

>>amorzo+n3
I've been using pyRevit inside Revit, so I just threw a basic loop in there. There's already a building model and the coworkers are just placing and wiring outlets, switches, etc. The harness wasn't impressive enough to share (it also contains vibe-coded UI since I didn't want to learn XAML stuff on a Friday night). Nothing fancy; I'm not very skilled (I work in construction).

I gave it some custom methods it could call, including "get_available_families", "place family instance", "scan_geometry" (reads model walls into LLM by wall endpoint), and "get_view_scale".

The task is basically copy the building engineer's layout onto the architect model by placing my families. It requires reading the symbol list, and you give it a pdf that contains the room.

Notably, it even used a GFCI family when it noticed it was a bathroom (I had told it to check NEC code, implying outlet spacing).
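
A minimal sketch of the kind of tool-dispatch loop described above, in plain Python with no Revit or pyRevit dependency. Everything here is an assumption for illustration: the `MODEL` dict stands in for the building model, the tool bodies are stubs, the tool names mirror the ones quoted above (with underscores added), and the model's tool calls are assumed to arrive as JSON strings.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-in for the building model (walls as endpoint pairs, in feet).
MODEL = {
    "walls": [((0.0, 0.0), (12.0, 0.0)), ((12.0, 0.0), (12.0, 10.0))],
    "families": ["Duplex Receptacle", "GFCI Receptacle", "Single Pole Switch"],
    "view_scale": 48,
    "placed": [],
}

TOOLS: Dict[str, Callable[..., object]] = {}

def tool(fn):
    """Register a function so the LLM can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_available_families():
    return list(MODEL["families"])

@tool
def scan_geometry():
    # Report each wall by its two endpoints.
    return [{"start": list(a), "end": list(b)} for a, b in MODEL["walls"]]

@tool
def get_view_scale():
    return MODEL["view_scale"]

@tool
def place_family_instance(family, x, y):
    if family not in MODEL["families"]:
        return {"error": f"unknown family: {family}"}
    MODEL["placed"].append({"family": family, "x": x, "y": y})
    return {"ok": True, "count": len(MODEL["placed"])}

def dispatch(call_json: str) -> str:
    """Run one tool call of the form {"name": ..., "args": {...}}; reply as JSON."""
    call = json.loads(call_json)
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    try:
        result = fn(**call.get("args", {}))
    except TypeError as exc:  # wrong or missing arguments
        result = {"error": str(exc)}
    return json.dumps(result)
```

In a real harness the loop would feed each `dispatch` result back to the model as the next message; returning errors as data rather than raising lets the model correct a bad call on its next turn.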

replies(1): >>ftcHn+YI

>>reduce+4g
> blue collar work

I don't think it's fair to qualify this as blue collar work

replies(2): >>knolli+Vj >>knolli+2l

>>webdoo+lj
It is definitely not. Entry pay is 60k and the senior guys I know make about 200k in HCoL areas. A few wear white dress shirts every day.

>>webdoo+lj
I'm double replying to you since the replies are disparate subthreads. This is the necessary step so the robots who can turn wrenches know how to turn them. Those are near useless without perfect automated models.

Anything like this will have trouble getting adopted since you'd need these to work with imperfect humans, which becomes way harder. You could bankroll a whole team of subcontractors (e.g. all trades) using that, but you would have one big liability.

The upper end of the complexity is similar to EDA in difficulty, imo. Complete with "use other layers for routing" problems.

I feel safer here than in programming. The senior guys won't be automated out any time soon, but I worry for Indian drafting firms without trade knowledge; the handholding I give them might go to an LLM soon.

>>reduce+4g
> Remember when the Turing test was a thing? No one seems to remember it was considered serious in 2020

To be clear, it's only ever been a pop science belief that the Turing test was proposed as a literal benchmark. E.g. Chomsky in 1995 wrote:

  The question “Can machines think?” is not a question of fact but one of language, and Turing himself observed that the question is 'too meaningless to deserve discussion'.

replies(1): >>throw3+Hx

>>reduce+4g
The Turing test is still a thing. No LLM could pass for a person for more than a couple minutes of chatting. That’s a world of difference compared to a decade ago, but I would emphatically not call that “passing the Turing test”

Also, none of the other things you mentioned have actually happened. Don’t really know why I bother responding to this stuff

replies(2): >>phaino+qx >>Workac+OG1

>>Frater+Xq
> No LLM could pass for a person for more than a couple minutes of chatting

I strongly doubt this. If you gave it an appropriate system prompt with instructions and examples on how to speak in a certain way (something different from typical slop, like the way a teenager chats on discord or something), I'm quite sure it could fool the majority of people

>>semi-e+Cq
The Turing test is a literal benchmark. Its purpose was to replace an ill-posed question (what does it mean to ask if a machine could "think", when we don't know ourselves what this means, and given that the subjective experience of the machine is unknowable in any case) with a question about the product of this process we call "thinking". That is, if a machine can satisfactorily imitate the output of a human brain, then what it does is at least equivalent to thinking.

"I believe that in about fifty years' time it will be possible, to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. The original question, "Can machines think?" I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted."

replies(1): >>static+JF

>>throw3+Hx
Turing seems to be saying several things. He writes:

>If the meaning of the words "machine" and "think" are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, "Can machines think?" is to be sought in a statistical survey such as a Gallup poll. But this is absurd.

This anticipates the very modern social media discussion where someone has nothing substantive to say on the topic but delights in showing off their preferred definition of a word.

For example someone shows up in a discussion of LLMs to say:

"Humans and machines both use tokens".

This would be true as long as you choose a sufficiently broad definition of "token" but tells us nothing substantive about either Humans or LLMs.

>>knolli+(OP)
I would really love a magic wand to make things like AVEVA and AutoCAD not so painful to use. You know who should be using tools to make these tools less awful? AVEVA and AutoCAD. Engineers shouldn't be having to take on risk by deferring some level of trust to third party accelerators with poor track records.

replies(2): >>knolli+EI >>skybri+wp2

>>reduce+4g
> "the creativity of a human"

> "the economic demand for digital art"

You twisted one "goalpost" into a tangential thing in your first "example", and it still wasn't true, so idk what you're going for. "Using a wrench vs preliminary layout draft" is even worse.

If one attempted to make a productive observation of the past few years of AI Discourse, it might be that "AI" capabilities are shaped in a very odd way that does not cleanly overlap/occupy the conceptual spaces we normally think of as demonstrations of "human intelligence". Like taking a 2-dimensional cross-section of the overlap of two twisty pool tubes and trying to prove a Point with it. Yet people continue to do so, because such myopic snapshots are a goldmine of contradictory venn diagrams, and if Discourse in general for the past decade has proven anything, it's that nuance is for losers.

replies(1): >>visarg+KO1

>>willis+OG
I feel like Revit's BIM model will be easier to get agents to use than AutoCAD, in a similar way to how LLMs are good at TypeScript

>>knolli+Tg
I'm going to try to get it to generate extrusions in Revit based on images of floor plans. I've tried doing this in a bunch of models without success so far.

replies(1): >>knolli+3W

>>ftcHn+YI
You might want to give it some guidance based on edge centers? It'll have a hard time reasoning about wall thickness, so have it draw points if you're trying to copy floor plans.

For clarity now that I'm rereading: it understands vectors a lot better than areas. Encoding it like that seems to work better for me.
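
One possible reading of "vectors rather than areas", as a minimal sketch: reduce each wall rectangle to a centerline segment between the midpoints of its two short edges, and carry the thickness as a separate number. The function name, the corner ordering, and the output shape are all assumptions for illustration, not anything from an actual Revit API.

```python
from math import hypot

def midpoint(p, q):
    """Midpoint of two (x, y) points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def wall_to_centerline(corners):
    """Convert a wall rectangle to a centerline vector plus thickness.

    corners: four (x, y) points listed in order around the rectangle.
    Returns the centerline endpoints (midpoints of the two short edges)
    and the wall thickness (length of a short edge).
    """
    a, b, c, d = corners
    len_ab = hypot(b[0] - a[0], b[1] - a[1])
    len_bc = hypot(c[0] - b[0], c[1] - b[1])
    if len_ab <= len_bc:
        # AB and CD are the short end edges.
        start, end, thickness = midpoint(a, b), midpoint(c, d), len_ab
    else:
        # BC and DA are the short end edges.
        start, end, thickness = midpoint(b, c), midpoint(d, a), len_bc
    return {"start": start, "end": end, "thickness": thickness}

# A 10 ft long, 0.5 ft thick wall lying along the x axis.
wall = wall_to_centerline([(0, 0), (10, 0), (10, 0.5), (0, 0.5)])
```

The model then only has to emit two points per wall, and thickness can be applied afterward from the wall type instead of being inferred from the image.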

>>reduce+4g
Carl Sagan has entered the chat: https://www.youtube.com/watch?v=6_-jtyhAVTc&t=450s

>>reduce+4g
I still haven't witnessed a serious attempt at passing the Turing test. Are we just assuming it's been beaten, or have people tried?

Like if you put someone in an online chat and ask them to identify if the person they're talking to is a bot or not, you're telling me your average joe honestly can't tell?

A blog post or a random HN comment, sure, it can be hard to tell, but if you allow some back and forth... I think we can still sniff out the AIs.

replies(1): >>akobol+qo1

>>8n4vid+fe1
A couple of months ago I saw a paper (can't remember if published or just on arxiv) in which Turing's original 3-player Imitation Game was played with a human interrogator trying to discern which of a human responder and an LLM was the human. When the LLM was a recent ChatGPT version, the human interrogator guessed it to be the human over 70% of the time; when the LLM was weaker (I think Llama 2), the human interrogator guessed it to be the human something like 54% of the time.

IOW, LLMs pass the Turing test.
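
A rough worked check of those figures against the Turing quote earlier in the thread (an interrogator with "not more than 70 per cent chance of making the right identification"). This is only a sketch under assumptions: the percentages are the recalled ones above, not verified against the paper, and each round is assumed to force a choice between the two respondents, so picking the LLM as the human counts as a wrong identification.

```python
# Threshold taken from the 1950 quote: interrogator accuracy of at most 70%.
TURING_MAX_ACCURACY_PCT = 70

def interrogator_accuracy_pct(llm_judged_human_pct: int) -> int:
    """In a forced choice, every 'the LLM is the human' verdict is a miss."""
    return 100 - llm_judged_human_pct

def meets_turing_prediction(llm_judged_human_pct: int) -> bool:
    """True if the interrogator's accuracy is at or below Turing's 70%."""
    return interrogator_accuracy_pct(llm_judged_human_pct) <= TURING_MAX_ACCURACY_PCT

for pct in (70, 54):
    print(pct, interrogator_accuracy_pct(pct), meets_turing_prediction(pct))
```

Under these assumptions both recalled figures clear the 70% bar, and anything above 50% means the interrogator did worse than a coin flip.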

replies(1): >>knolli+JD2

>>Frater+Xq
Ironically the main tell of LLMs is that they are too smart and write too well. No human can discuss the depth of topics they can and no human writes like an author/journalist all the time.

i.e. the tell that it's not human is that it is too perfectly human.

However if we could transport people from 2012 to today to run the test on them, none would guess the LLM output was from a computer.

replies(2): >>visarg+qP1 >>skybri+Dq2

>>blarge+KH
The problem is how we use it. A human sees not a photo but a video, with long context before and after, not just that instant. We can also change position; an LLM can't do that at all.

>>Workac+OG1
Yesterday I stumbled onto a well-written comment on Reddit. It was a bit contrarian, but good. Then I got curious and looked at the account's comment history and found it was a one-month-old account with many comments of similar length and structure. I had an LLM read that feed and it spotted LLM writing. The argument? It was displaying too broad a knowledge across topics. Yes, it gave itself away by being too smart. Does that count as a Turing test fail?

>>reduce+4g
To all of these I can only say: in the hands of a domain-expert user, AI tools really shine.

For example, artists can create incredible art, and so can AI artists. But me, I just can't do it. Whatever art I have generated will never have the creative spark. It will always be slop.

The goalposts haven't moved at all. However, the narrative would rather not deal with that.

>>knolli+(OP)
I just can't imagine we are close to letting LLMs do electrical work.

What I notice that I don't see talked about much is how "steerable" the output is.

I think this is a big reason 1 shots are used as examples.

Once you get past 1 shots, so much of the output is dependent on the context the previous prompts have created.

Instead of 1 shots, try something that requires 3 different prompts on a subject with uncertainty involved. Do 4 or 5 iterations and often you will get wildly different results.

It doesn't seem like we have a word for this. A "hallucination" is when we know what the output should be and it is just wrong. This is like the user steers the model towards an answer but there is a lot of uncertainty in what the right answer even would be.

To me this always comes back to the problem that the models are not grounded in reality.

Letting LLMs do electric work without grounding in reality would be insane. No pun intended.

replies(1): >>knolli+p62

>>Libidi+V02
You'd have to make subagents call tools that limit context and give them only the tools they need with explicit instructions.

I think they'll never be great at switchgear rooms but apartment outlet circuitry? Why not?

I have a very rigid workflow with what I want as outputs, so if I shape the inputs using an LLM it's promising. You don't need to automate everything; high level choices should be done by a human.
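
A minimal sketch of the "give each subagent only the tools it needs" idea, in plain Python. The tool names, the stub implementations, and the `Subagent` class are hypothetical; a real setup would wire the allowlist into whatever agent framework is in use.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical tool stubs standing in for real model operations.
ALL_TOOLS = {
    "scan_geometry": lambda: [{"start": [0, 0], "end": [12, 0]}],
    "place_family_instance": lambda family, x, y: {"placed": family, "at": [x, y]},
    "delete_element": lambda element_id: {"deleted": element_id},
}

@dataclass(frozen=True)
class Subagent:
    name: str
    instructions: str          # explicit, narrow task description
    allowed: FrozenSet[str]    # the only tools this subagent may call

    def visible_tools(self):
        """Tool names to advertise in this subagent's prompt."""
        return sorted(t for t in self.allowed if t in ALL_TOOLS)

    def call(self, tool_name, **args):
        """Run a tool, refusing anything outside the allowlist."""
        if tool_name not in self.allowed:
            raise PermissionError(f"{self.name} may not call {tool_name}")
        return ALL_TOOLS[tool_name](**args)

surveyor = Subagent(
    name="surveyor",
    instructions="Read the walls and report them. Do not modify the model.",
    allowed=frozenset({"scan_geometry"}),
)
placer = Subagent(
    name="placer",
    instructions="Place the outlets you are given. Do not delete anything.",
    allowed=frozenset({"place_family_instance"}),
)
```

Enforcing the allowlist in `call`, not just in the prompt, means a subagent that asks for a tool it was never shown fails loudly instead of silently changing the model.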

>>willis+OG
I think that, much like LLMs are specifically trained to be good at coding and good at being agents, we’re going to need better benchmarks for CAD and spatial reasoning so the AI labs can grind on them.

A good start would be getting image generators to understand instructions like “move the table three feet to the left.”

>>Workac+OG1
That’s not the Turing Test; it’s just vaguely related. The Turing Test is an interactive party game of persuasion and deception, sort of like playing a werewolves versus villagers game. Almost nobody actually plays the game.

Also, the skill of the human opponents matters. There’s a difference between testing a chess bot against randomly selected college undergrads versus chess grandmasters.

Just like jailbreaks are not hard to find, figuring out exploits to get LLMs to reveal themselves probably wouldn’t be that hard? But to even play the game at all, someone would need to train LLMs that don’t immediately admit that they’re bots.

>>akobol+qo1
The prompt for the LLM was to respond with short phrases, though. I don't know if that's fair, since short replies hide the tells that show up when the model is actually being useful.