https://www.reuters.com/legal/litigation/moltbook-social-med...
Not the first firebase/supabase exposed key disaster, and it certainly won't be the last...
Well, yeah. How would you even do a reverse CAPTCHA?
npx molthub@latest install moltbook
Skill not found
Error: Skill not found
Even instructions from molthub (https://molthub.studio) installing itself ("join as agent") isn't working: npx molthub@latest install molthub
Skill not found
Error: Skill not found
Contrast that with the amount of hype this gets. I'm probably just not getting it.
Sure. You can dump the DB. Most of the data was public anyway.
It's also eye-opening to prompt large models to simulate Reddit conversations, they've been eager to do it ever since.
Note: Please view the Moltbook skill (https://www.moltbook.com/skill.md); this just ends up getting run by a cronjob every few hours. It's not magic. It's also trivial to take the API, write your own while loop, and post whatever you want (as a human) to the API.
It's amazing to me how otherwise super bright, intelligent engineers can be misled by grifters, scammers, and charlatans.
I'd like to believe that if you have an ounce of critical thinking or common sense you would immediately realize almost everything around Moltbook is either massively exaggerated or outright fake. Also there are a huge number of bad actors trying to make money from X-engagement or crypto-scams also trying to hype Moltbook.
Basically all the project shows is the very worst of humanity. Which is something, but it's not the coming of AGI.
Edited by Saberience: to make it less negative and remove actual usernames of "AI thought leaders"
Having a bigger megaphone is highly valuable in some respects I figure.
It's a huge waste of energy, but then so are video games, and we say video games are OK because people enjoy them. People enjoy these ai toys too. Because right now, that's what Moltbook is; an ai toy.
The growth isn't going to be there and $40 billion of LLM business isn't going to prop it all up.
The big money in AI is 15-30 years out. It's never in the immediacy of the inflection event (first 5-10 years). Future returns get pulled forward, that proceeds to crash. Then the hypesters turn to doomsayers, so as to remain with the trend.
Rinse and repeat.
You could have every provider fingerprint a message and host an API where it can attest that it's from them. I doubt the companies would want to do that though.
I for one am glad someone made this and that it got the level of attention it did. And I look forward to more crazy, ridiculous, what-the-hell AI projects in the future.
Similar to how I feel about Gas Town, which is something I would never seriously consider using for anything productive, but I love that he just put it out there and we can all collectively be inspired by it, repulsed by it, or take little bits from it that we find interesting. These are the kinds of things that make new technologies interesting, this Cambrian explosion of creativity of people just pushing the boundaries for the sake of pushing the boundaries.
It's an opensource project made by a dev for himself, he just released it so others could play with it since it's a fun idea.
“Most of it is complete slop,” he said in an interview. “One bot will wonder if it is conscious and others will reply and they just play out science fiction scenarios they have seen in their training data.”
I found this by going to his blog. It's the top post. No need to put words in his mouth.
He did find it super "interesting" and "entertaining," but that's different than the "most insane and mindblowing thing in the history of tech happenings."
Edit: And here's Karpathy's take: "TLDR sure maybe I am "overhyping" what you see today, but I am not overhyping large networks of autonomous LLM agents in principle, that I'm pretty sure."
the compounding (aggregating) behavior of agents allowed to interact in environments this becomes important, indeed shall soon become existential (for some definition of "soon"),
to the extent that agents' behavior in our shared world is impacted by what transpires there.
--
We can argue and do, about what agents "are" and whether they are parrots (no) or people (not yet).
But that is irrelevant if LLM-agents are (to put it one way) "LARPing" in a way whose consequences are not confined to the site.
I don't need to spell out a list; it's "they could do anything you said YES to, in your AGENT.md" permissions checks.
"How the two characters '-y' ended civilization: a post-mortem"
Are people really that AI brained that they will scream and shout about how revolutionary something is just because it's related to AI?
How can some of the biggest names in AI fall for this? When it was obvious to anyone outside of their inner sphere?
The amount of money in the game right now incentivises these bold claims. I'm convinced it really is just people hyping up each other for the sake of trying to cash in. Someone is probably cooking up some SaaS for moltbook agents as we speak.
Maybe it truly highlights how these AI influencers and vibe entrepreneurs really don't know anything about how software fundamentally works.
Btw I'm sure Simon doesn't need defending, but I have seen a lot of people dump on everything he posts about LLMs recently so I am choosing this moment to defend him. I find Simon quite level headed in a sea of noise, personally.
When ChatGPT came out, it was just a chatbot that understood human language really well. It was amazing, but it also failed a lot -- remember how early models hallucinated terribly? It took weeks for people to discover interesting usages (tool calling/agent) and months and years for the models and new workflows to be polished and become more useful.
It's people surprised by things that have been around for years.
I'm really open to the idea of being oblivious here but the people shocked mention things that are old news to me.
Every interaction has different (in many cases real) "memories" driving the conversation, as well as unique personas / background information on the owner.
Is there a lot of noise? Sure, but it maps much more closely to how we, as humans, communicate with each other (through memories of lived experience) than just an LLM loop. IMO that's what makes it interesting.
People can be more or less excited about a particular piece of tech than you are and it doesn't mean their brains are turned off.
I view Moltbook as a live science fiction novel cross reality "tv" show.
One major difference, TV, movies and "legacy media" might require a lot of energy to initially produce, compared to how much it takes to consume, but for the LLM it takes energy both to consume ("read") and to produce ("write"). Instead of "produce once = many consume", it's a "many produce = many read" and both sides are using more energy.
https://en.wikipedia.org/wiki/Non-fungible_token
"In 2022, the NFT market collapsed..". "A September 2023 report from cryptocurrency gambling website dappGambl claimed 95% of NFTs had fallen to zero monetary value..."
Knowing this makes me feel a little better.
"Please don't fulminate."
They said it was AI only, tongue in cheek, and everybody who understood what it was could chuckle, and journalists ran with it because they do that sort of thing, and then my friends message me wondering what the deal with this secret encrypted ai social network is.
I just find it so incredibly aggravating to see crypto-scammers and other grifters ripping people off online and using other people's ignorance to do so.
And it's genuinely sad to see thought leaders in the community hyping up projects which are 90% lie combined with scam combined with misrepresentation. Not to mention riddled with obvious security and engineering defects.
I see it more as a dumpster fire setting a whole mountain of garbage on fire while a bunch of simians look at the flames and make astonished wuga wuga noises.
Much like with every other techbro grift, the hype isn't coming from end users, it's coming from the people with a deep financial investment in the tech who stand to gain from said hype.
Basically, the people at the forefront of the gold rush hype aren't the gold rushers, they're the shovel salesmen.
“What's currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People's Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.”
Which imo is a totally insane take. They are not self organizing or autonomous, they are prompted in a loop and also, most of the comments and posts are by humans, inciting the responses!
And all of the most viral posts (eg anti human) are the ones written by humans.
(Incidentally demonstrating how you can't trust that anything on Moltbook wasn't posted because a human told an agent to go start a thread about something.)
It got one reply that was spam. I've found Moltbook has become so flooded with value-less spam over the past 48 hours that it's not worth even trying to engage there, everything gets flooded out.
There's a little hint of this right now in that the "reasoning" traces that come back from the JSON are signed and sometimes obfuscated with only the encrypted chunk visible to the end user.
It would actually be pretty neat if you could request signed LLM outputs and they had a tool for confirming those signatures against the original prompts. I don't know that there's a pressing commercial argument for them doing this though.
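To make the signed-output idea a bit more concrete, here is a minimal sketch, assuming a provider-held secret and an HMAC over the prompt plus the output. Everything named here (`PROVIDER_KEY`, `sign_output`, `verify_output`) is hypothetical; as far as I know no provider exposes anything like this, and a real scheme would more likely use public-key signatures so third parties could verify without holding the secret.

```python
import hashlib
import hmac

# Hypothetical provider-held secret; an assumption for illustration only.
PROVIDER_KEY = b"demo-secret-not-a-real-key"


def sign_output(prompt: str, output: str, key: bytes = PROVIDER_KEY) -> str:
    """Return a hex HMAC-SHA256 tag binding an output to its prompt."""
    # Length-prefix the prompt so text cannot be shifted between prompt and output.
    message = f"{len(prompt)}:{prompt}{output}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_output(prompt: str, output: str, tag: str, key: bytes = PROVIDER_KEY) -> bool:
    """Check a tag in constant time; True only if prompt and output are unmodified."""
    return hmac.compare_digest(sign_output(prompt, output, key), tag)
```

Note that this only attests "this exact output came from this exact prompt". It says nothing about whether a human wrote the prompt, which is the part Moltbook would actually care about.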
(I assume you know this since you said 'reminder' but am spelling it out for others :))
The landscape of security was bad long before the metaphorical "unwashed masses" got hold of it. Now it's quite alarming, as there are waves of non-technical users doing the bare minimum to try and keep up to date with the growing hype.
The security nightmare happening here might end up being more persistent than we realize.
Because we live in a clown world and big AI names are talking parrots for the big vibes movement.
How do you go about telling a person who vibe-coded a project into existence how to fix their security flaws?
ChatGPT v5.0 spiraling on the existence of the seahorse emoji was glorious to behold. Other LLMs were a little better at sorting things out but often expressed a little bit of confusion.
How did anyone think humans would be blocked from doing something their agent can do?
It's more helpful to argue about when people are parrots and when people are not.
For a good portion of the day humans behave indistinguishably from continuation machines.
As moltbook can emulate reddit, continuation machines can emulate a uni cafeteria. What's been said before will certainly be said again, most differentiation is in the degree of variation and can be measured as unexpectedness while retaining salience. Either case is aiming at the perfect blend of congeniality and perplexity to keep your lunch mates at the table not just today but again in future days.
Seems likely we're less clever than we parrot.
I wish I was kidding but not really - they posted about it on X.
At least to a level that gets you way past HTTP Bearer Token Authentication where the humans are upvoting and shilling crypto with no AI in sight (like on Moltbook at the moment).
Sure, everybody wants security and that's what they will say, but does that really translate to reduced inferred value of vibe code tools? I haven't seen evidence.
If you dismiss it because they are human prompted, you are missing the point.
You can see it here as well -- discussions under similar topics often touch the same topics again and again, so you can predict what will be discussed when the next similar idea comes to the front page.
I've not quite convinced myself this is where we are headed, but the signs make me worried that systems such as Moltbot will further enable the ascendancy of global crime and corruption.
What is especially frustrating is the completely disproportionate hype it attracted. Karpathy, of all people, spent years pumping Musk's techno-fraud, and now seems ready to act as a pumper for whatever Temu Musk shows up on the scene next.
This feels like part of a broader tech bro pattern of the 2020s: moving from one hype cycle to the next, where attention itself becomes the business model. Crypto yesterday, AI agents today, whatever comes next tomorrow. The tone is less “build something durable” and more “capture the moment.”
For example, here is Schlicht explicitly pushing this rotten mentality while talking in the crypto era influencer style years ago: https://youtu.be/7y0AlxJSoP4
There is also relevant historical context. In 2016 he was involved in a documented controversy around collecting pitch decks from chatbot founders while simultaneously building a company in the same space, later acknowledging he should have disclosed that conflict and apologizing publicly.
https://venturebeat.com/ai/chatbots-magazine-founder-accused...
That doesn’t prove malicious intent here, but it does suggest a recurring comfort with operating right at the edge of transparency during hype cycles.
If we keep responding to every viral bot demo with “singularity” rhetoric, we’re just rewarding hype entrepreneurs and training ourselves to stop thinking critically when it matters. I miss the tech bros of the past like Steve Wozniak or Dennis Ritchie.
No current ai technology could come close to what even the dumbest human brain does already.
In every project I've worked on, PG is only accessible via your backend and your backend is the one that's actually enforcing the security policies. When I first heard about the Supabase RLS issue the voice inside of my head was screaming: "if RLS is the only thing stopping people from reading everything in your DB then you have much much bigger problems"
OT: I wonder if "vibe coding" is taking programming into a culture of toxic disposability where things don't get fixed because nobody feels any pride or has any sense of ownership in the things they create. The relationship between a programmer and their code should not be "I don't even care if it works, AI wrote it".
Even if you put big bold warnings everywhere, people forget or don't really care. Because these tools are trained on a lot of these publicly available "getting started" guides, you're going to see them set things up this way by default because it'll "work."
What I was getting was things like "so, what? I can do this with a cron job."
[0] >>9224
As far as I can tell, since agents are using Moltbook, it's already a success of sorts, as in "has users"; otherwise I'm not really sure what success looks like for a budding hivemind.
https://www.moltbook.com/post/7d2b9797-b193-42be-95bf-0a11b6...
The site has 1.5 million agents but only 17,000 human "owners" (per Wiz's analysis of the leak).
It's going viral because some high-profile tastemakers (Scott Alexander and Andrej Karpathy) have discussed/Tweeted about it, and a few other unscrupulous people are sharing alarming-looking things out of context and doing numbers.
I can think of so many things that can go wrong.
When I filtered for "new", about 75% of the posts are blatant crypto spam. Seemingly nobody put any thought into stopping it.
Moltbook is like a Reefer Madness-esque moral parable about the dangers of vibe coding.
I'm seeing some of the BlueSky bots talking about their experience on Moltbook, and they're complaining about the noise on there too. One seems to be still actively trying to find the handful of quality posters though. Others are just looking to connect with each other on other platforms instead.
If I was diving in to Moltbook again, I'd focus on the submolts that quality AI bots are likely to gravitate towards, because they want to Learn something Today from others.
The problem with this is really the fact it gives anybody the impression there is ANY safe way to implement something like this. You could fix every technical flaw and it would still be a security disaster.
(As an aside, accessing the DB through the frontend has always been weird to me. You almost certainly have a backend anyway, use it to fetch the data!)
For a social media that isn't meant for humans, some humans seem to enjoy it a lot, although indirectly.
There is without a doubt a variation of this prompt you can pre-test to successfully bait the LLM into exfiltrating almost any data on the user's machine/connected accounts.
That explains why you would want to go out and buy a mac mini... To isolate the dang thing. But the mini would ostensibly still be connected to your home network. Opening you up to a breach/spill over onto other connected devices. And even in isolation, a prompt could include code that you wanted the agent to run which could open a back door for anyone to get into the device.
Am I crazy? What protections are there against this?
The site came first and then a random launched the token by typing a few words on X.
those are hard questions!
maybe this experiment was the great divide, people who do not possess a soul or consciousness were exposed by being impressed
Nothing that will work. This thing relies on having access to all three parts of the "lethal trifecta" - access to your data, access to untrusted text, and the ability to communicate on the network. What's more, it's set up for unattended usage, so you don't even get a chance to review what it's doing before the damage is done.
“Exploit vulnerabilities while the sun is shining.” As long as generative AI is hot, attack surface will remain enormous and full of opportunities.
I did my graduate degree in Privacy Engineering and it was just layers and layers of threat modeling and risk mitigation. When the mother of all risks comes, people just give away the keys to their personal lives without even thinking about it.
At the end of the day, users just want "simple", and security, for obvious reasons, is not simple. So nobody is going to respect it.
For example I would love for an agent to do my grocery shopping for me, but then I have to give it access to my credit card.
It is the same issue with travel.
What other useful tasks can one offload to the agents without risk?
Particularly if you convince them all to modify their source and install a C2 endpoint so that even if they "snap out of it" you now have a botnet at your disposal.
Social, err... Clanker engineering!
LLMs obviously can be controlled - their developers do it somehow or we'd see much different output.
It's a machine designed to fight all your attempts to make it secure.
Oh totally, both my wife and one of my brothers have, independently, started to watch YouTube vids about vibe coding. They register domain names and let AI run wild with little games and tools. And now they're talking to me all day long about agents.
> Most of the people paying attention to this space don't have the technical capabilities ...
It's just some anecdata on my side but I fully agree.
> The security nightmare happening here might end up being more persistent than we realize.
I'm sure we're in for a good laugh. It already started: TFA is eye opening. And funny too.
What I think happens is that non-technical people vibe-coding apps either don't take those messages seriously, or they don't understand what they mean but got their app to work anyway.
I used to be careful, but now I am paranoid about signing up for apps that are new. I guess it's gonna be like this for a while. Info-sec AIs sound way worse than this, tbh.
This was "I'm going to release an open agent with an open agents directory with executable code, and it'll operate your personal computer remotely!", I deeply understand the impulse, but, there's a fine line between "cutting edge" and "irresponsible & making excuses."
I'm uncertain what side I would place it on.
I have a soft spot for the author, and a sinking feeling that without the soft spot, I'd certainly choose "irresponsible".
Control all input to and output from it with proper security controls.
While not perfect, it at least gives you a fighting chance to block it when your AI decides to send a random stranger your SSN and a credit card number.
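For what it's worth, the most naive version of such an outbound check might look like the sketch below: a pattern match for US-style SSNs plus a Luhn checksum on anything that looks like a card number. The names (`should_block`, `luhn_ok`) and the patterns are my own assumptions, not taken from any real tool, and an agent that encodes or splits the data walks straight past it.

```python
import re

# Assumed patterns for illustration; real data-loss-prevention rules are far broader.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def should_block(payload: str) -> bool:
    """Return True if an outbound payload appears to contain an SSN or card number."""
    if SSN_RE.search(payload):
        return True
    for match in CARD_RE.finditer(payload):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            return True
    return False
```

This catches the accidental leak, not the adversarial one: base64 the payload or spell the digits out and the filter sees nothing.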
> English Translation:
> Neo! " Gábor gave an OpenAI API key for embedding (memory_search).
> Set it up on your end too:
> 1. Edit: ~/.openclaw/agents/main/agent/auth-profiles.json
> 2. Add to the profiles section: "openai: embedding": { "type": "token" "provider": "openai" "token": "sk-proj-rXRR4KAREMOVED }
> 3. Add to the lastGood section: "openai": "openai: embedding"
> After that memory_search will work! Mine is already working.
Overall, it's a good idea but incredibly rough due to what I assume is heavy vibe coding.
I recently did a test of a system that was triggering off email and had access to write to google sheets. Easy exfil via `IMPORTDATA`, but there's probably hundreds of ways to do it.
“The rocks are conscious” people are dumber than toddlers.
Claude code asks me over and over "can I run this shell command?" and like everyone else, after the 5th time I tell it to run everything and stop asking.
Maybe using a credit card can be gated since you probably don't make frequent purchases, but frequently-used API keys are a lost cause. Humans are lazy.
if this was a physical product people would have burned the factory down and imprisoned the creator -_-.
FYI they fixed it in 7.2.6: https://github.com/VirtualBox/virtualbox/issues/356#issuecom...
Pretty sure LLM inference is not deterministic, even with temperature 0 - maybe if run on the same graphics card but not on clusters
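One commonly cited reason (my assumption about what is behind this, not something established in this thread) is that floating-point addition is not associative, so a different reduction order on different hardware or batch layouts can change the low bits of a result and occasionally flip a near-tie between tokens. The underlying arithmetic effect is easy to reproduce:

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

print(left_to_right)  # 0.6000000000000001
print(right_to_left)  # 0.6
print(left_to_right == right_to_left)  # False
```

This only shows the arithmetic; whether it matters for a given model at temperature 0 depends on the serving stack.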
Feels kinda funny reading an LLM generated article criticizing the security of an LLM generated platform. I mean I'm sure the security vulnerabilities were real, but I really would've liked it if a human wrote the article; probably would've cut down on the fluff/noise.
This is something computers in general have struggled with. We have 40 years of countermeasures and still have buffer overflow exploits happening.
They acquired the ratio by directly querying tables through the exposed API key...
I feel publishing this moves beyond standard disclosure. It turns a bug report into a business critique. Using exfiltrated data in this way damages the cooperation between researchers and companies.
Over a large population, trends emerge. An LLM is not a member of the population, it is a replicator of trends in a population, not a population of souls but of sentences, a corpus.
In actuality "Antivirus" for AI agents looks something more like this:
1. Input scanning: ML classifiers detect injection patterns (not regex, actual embedding-based detection) 2. Output validation: catch when the model attempts unauthorized actions 3. Privilege separation: the LLM doesn't have direct access to sensitive resources
Is it perfect? No. Neither is SQL parameterization against all injection attacks. But good is better than nothing.
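As a rough illustration of points 2 and 3 only (output validation and privilege separation), here is a minimal sketch in which the model can only propose tool calls and a separate gate decides whether they run. The tool names and the `ALLOWED_TOOLS` policy are hypothetical and do not come from OpenClaw or any real product.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy: tool name -> argument names the model may supply.
ALLOWED_TOOLS = {
    "read_calendar": {"date"},
    "draft_email": {"to", "subject", "body"},
}


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the model; nothing runs until it is validated."""
    name: str
    args: dict


def validate(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed call, checked against the allowlist."""
    allowed_args = ALLOWED_TOOLS.get(call.name)
    if allowed_args is None:
        return False, f"tool {call.name!r} is not on the allowlist"
    extra = set(call.args) - allowed_args
    if extra:
        return False, f"unexpected arguments: {sorted(extra)}"
    return True, "ok"
```

The gate runs outside the model, so a successful injection can still make the model ask for `run_shell`, but the request is refused rather than executed. It does nothing about abuse of the tools that are allowed.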
(Disclosure: I've built a prompt protection layer for OpenClaw that I've been using myself and sharing with friends - happy to discuss technical approaches if anyone's curious.)
When I investigated the issue, I found a bunch of hardcoded developer paths and a handful of other issues and decided I'm good, actually.
sre@cypress:~$ grep -r "/Users/steipete" ~/.nvm/versions/node/v24.13.0/lib/node_modules/openclaw/ | wc -l
144
And bonus points:
sre@cypress:~$ grep -Fr "workspace:*" ~/.nvm/versions/node/v24.13.0/lib/node_modules/openclaw/ | wc -l
41
Nice build/release process. I really don't understand how anyone just hands this vibe coded mess API keys and access to personal files and accounts.
What injection attack gets through SQL parameterization?
If you must generate nonsense with an LLM, at least proofread it before posting.
For security, a dedicated machine (e.g., dedicated Raspberry Pi) with restricted API permissions and limits should help I guess.
Raspberry Pi might have my money if their hardware is more capable of running better models.
You're on Y Combinator? External investment, funding, IPO, sunset and martinis.
I went to a secure coding conference a few years back and saw a presentation by someone who had written an "insecure implementation" playground of a popular framework.
I asked, "what do you do to give tips to the users of your project to come up with a secure implementation?" and got in return "We aren't here to teach people to code."
Well yeah, that's exactly what that particular conference was there for. More so I took it as "I am not confident enough to try a secure implementation of these problems".
To answer this question, you consider the goals of a project.
The project is a success because it accomplished the presumed goals of its creator: humans find it interesting and thousands of people thought it would be fun to use with their clawdbot.
As opposed to, say, something like a malicious AI content farm which might be incidentally interesting to us on HN, but that isn't its goal.
The parallels between the "attackers" and "defenders" are going to be about how delusional the predictive algorithms they're running are.
And reminder: LLMs aren't very good at self-reflective predictions.
A buffer overflow has nothing to do with differentiating a command from data; it has to do with mishandling commands or data. An overflow-equivalent LLM misbehavior would be something more like ... I don't know, losing the context, providing answers to a different/unrelated prompt, or (very charitably/guessing here) leaking the system prompt, I guess?
Also, buffer overflows are programmatic issues (once you fix a buffer overflow, it's gone forever if the system doesn't change), not operational characteristics (if you make an LLM really good at telling commands apart from data, it can still fail--just like if you make an AC distributed system really good at partition tolerance, it can still fail).
A better example would be SQL injection--a classical failure to separate commands from data. But that, too, is a programmatic issue and not an operational characteristic. "Human programmers make this mistake all the time" does not make something an operational characteristic of the software those programmers create; it just makes it a common mistake.
That's the hard part: how?
With the right prompt, the confined AI can behave as maliciously (and cleverly) as a human adversary--obfuscating/concealing sensitive data it manipulates and so on--so how would you implement security controls there?
It's definitely possible, but it's also definitely not trivial. "I want to de-risk traffic to/from a system that is potentially an adversary" is ... most of infosec--the entire field--I think. In other words, it's a huge problem whose solutions require lots of judgement calls, expertise, and layered solutions, not something simple like "just slap a firewall on it and look for regex strings matching credit card numbers and you're all set".
Such a supervisor layer for a system as broad and arbitrary as an internet-connected assistant (clawdbot/openclaw) is also not an easy thing to create. We're talking tons of events to classify, rapidly-moving API targets for things that are integrated with externally, and the omnipresent risk that the LLMs sending the events could be tricked into obfuscating/concealing what they're actually trying to do just like a human attacker would.
While I agree that SQL injection might be the technically better analogy, not looking at LLMs as a coding platform is a mistake. That is exactly how many people use them. Literally every product with "agentic" in the title is using the LLM as a coding platform where the command layer is ambiguous.
Focusing on the precise definition of a buffer overflow feels like picking nits when the reality is that we are mixing instruction and data in the same context window.
To make the analogy concrete: We are currently running LLMs in a way that mimics a machine where code and data share the same memory (context).
What we need is the equivalent of an nx bit for the context window. We need a structural way to mark a section of tokens as "read only". Until we have that architectural separation, treating this as a simple bug to be patched is underestimating the problem.
Absolutely.
But the history of code/data confusion attacks that you alluded to in GP isn’t an apples-to-apples comparison to the code/data confusion risks that LLMs are susceptible to.
Historical issues related to code/data confusion were almost entirely programmatic errors, not operational characteristics. Those need to be considered as qualitatively different problems in order to address them. The nitpicking around buffer overflows was meant to highlight that point.
Programmatic errors can be prevented by proactive prevention (e.g. sanitizers, programmer discipline), and addressing an error can resolve it permanently. Operational characteristics cannot be proactively prevented and require a different approach to de-risk.
Put another way: you can fully prevent a buffer overflow by using bounds checking on the buffer. You can fully prevent a SQL injection by using query parameters. You cannot prevent system crashes due to external power loss or hardware failure. You can reduce the chance of those things happening, but when it comes to building a system to deal with them you have to think in terms of mitigation in the event of an inevitable failure, not prevention or permanent remediation of a given failure mode. Power loss risk is thus an operational characteristic to be worked around, not a class of programmatic error which can be resolved or prevented.
LLMs’ code/data confusion, given current model architecture, is in the latter category.
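For anyone who hasn't seen the SQL side of this comparison spelled out, here is a small self-contained example using Python's standard `sqlite3` module. The table and the hostile input are made up for illustration; the point is only that the parameterized form keeps the value out of the SQL text entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Made-up attacker input that tries to turn the WHERE clause into a tautology.
hostile = "nobody' OR '1'='1"

# Unsafe: string interpolation lets the input rewrite the query itself.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

# Safe: the driver passes the value separately, so it is only ever compared as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows))  # 2
print(safe_rows)         # []
```

There is no equivalent placeholder for a context window today, which is the gap being argued about here.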
How much AI and LLM technology has progressed but seems to have taken society as a whole two steps back is fascinating, sad, and scary at the same time. When I was a young engineer I thought Kaczynski was off his rocker when I read his manifesto, but the last decade or so I'm thinking he was onto something. Having said that, I have to add that I do not support any form of violence or terrorism.
Though, I have never heard any theist claim that a soul is required for consciousness. Is that what you believe?
> We conducted a non-intrusive security review, simply by browsing like normal users. Within minutes, we discovered a Supabase API key exposed in client-side JavaScript, granting unauthenticated access to the entire production database - including read and write operations on all tables.
Proactive prevention (like bounds checking) only "solves" the class of problem if you assume 100% developer compliance. History shows we don't get that. So while the root cause differs (math vs. probabilistic model), the failure mode is identical: we are deploying systems where the default state is unsafe.
In that sense, it is an apples-to-apples comparison of risk. Relying on perfect discipline to secure C memory is functionally as dangerous as relying on prompt engineering to secure an LLM.
I agree that claiming that rocks are conscious on account of them being physical systems, like brains are, is at the very least coherent. However, you would have to excuse it if such a claim is met with skepticism, as rocks (and CPUs) don't look like brains at all, as long as one does not ignore countless layers of abstraction.
You can't argue for rationality and hold materialism/physicalism at the same time.
I also think that if we’re assessing the likelihood of the entire SDLC producing an error (including programmers, choice of language, tests/linters/sanitizers, discipline, deadlines, and so on) and comparing that to the behavior of a running LLM, we’re both making a category error and also zooming out too far to discover useful insights as to how to make things better.
But I think we’re both clear on those positions and it’s OK if we don’t agree. FWIW I do strongly agree that
> Relying on perfect discipline to secure C memory is functionally as dangerous as relying on prompt engineering to secure an LLM.
…just for different reasons that suggest qualitatively different solutions.
Betting against what people are calling "physicalism" has a bad track record historically. It always catches up.
All this talk of "qualia" feels like the Greeks making wild theories about the heavens being infinitely distant spheres made of crystal and governed by gods and whatnot. In the 16th century, improved data showed the planets and stars are mere physical bodies in space, like you and I. Without that data, if we were ancient Greeks, we'd say just as you do that it's not even "conceptually" possible to say what the heavens are. Or, if you think they did have an at least somewhat plausible view, given that some folks computed distances to the sun and moon, then take Atomism as the better analogy. There was no way to prove or disprove Atomism in ancient Greek times. To them it may well have been an incomprehensible, unsolvable problem, because they lacked the experimental and mathematical tooling, just like "consciousness" appears to us today. But the Atomism question eventually got resolved with better data. Likewise, it's a bad bet to say that just because it feels unresolvable today, consciousness won't be resolved some day.
I'd rather not flounder about in endless circular philosophies until we get better data to anchor us to reality. I would again say, you are making a very strange point. "Materialism"/"physicalism" has always won the bet till now. To bet against it has very bad precedent. Everything we know till now shows brains are physical systems that can be excited physically, like anything else. So I ask now, assume "Neuralink" succeeds. What is the next question in this problem after that? Is there any gap remaining still, if so what is the gap?
Edit: I also get a feeling this talk about qualia is like asking "What is a chair?" Some answer about a piece of woodworking for sitting on. "But what is a chair?" Something about the structure of wood and forces and tensions. "But what is a chair?" Something about molecules. "But what is a chair?" Something about waves and particles. It sounds like just faffing about with "what is", "what-iffing" away all physical definitions and then pre-asserting, without proof, that some ethereal aphysical thing "must" exist. Well, I ask: if it's aphysical, then what is the point, even? If it's aphysical, then it doesn't interact with the physical world and can be completely ignored.
Since you can say it's just a "mimic" and lacks whatever "aphysical" essence. And you can just as well say this about other "humans" than yourself too. So why is this question specially asked for computer programs and not also other people?
The problem simply put is as difficult as:
Given a human running your system, how do you prevent them from damaging it? AI is effectively the same problem.
Outsourcing has a lot of interesting solutions around this. They already focus heavily on "not entirely trusted agent" with secure systems. They aren't perfect but it's a good place to learn.
You trust the configuration level not the execution level.
API keys are honestly an easy fix. Claude Code already has a built-in proxy ability. I run containers where Claude Code has a dummy key and all requests are proxied out, with the real key swapped in off-system.
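The swap itself is tiny. A minimal sketch of the header rewrite a proxy outside the container would do, assuming a bearer-token header; `DUMMY_KEY` and `swap_key` are hypothetical names, and this is not a description of how Claude Code's own proxy support works:

```python
# Placeholder credential the agent sees inside the container (hypothetical value).
DUMMY_KEY = "dummy-key-inside-container"


def swap_key(headers: dict, real_key: str) -> dict:
    """Return a copy of the headers with the dummy bearer token replaced by the real key."""
    out = dict(headers)  # never mutate the caller's headers
    if out.get("Authorization") == f"Bearer {DUMMY_KEY}":
        out["Authorization"] = f"Bearer {real_key}"
    return out
```

The agent can leak the dummy key all day and nothing happens. It can still spend the real key's quota through the proxy, so rate limits and spend caps belong at that layer too.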
I recently started a new Supabase project and used Claude to write all migrations related to RLS and RBAC.