Vibe Coding
The Delusion Machine
I decided to vibe code a clustered file storage. It’s something I’ve always wanted for my home network: just plug in a drive, maybe run one command, and the storage gets automatically added to the pool. It seemed simple. /How hard could it be?/
To begin with, very easy. I told the AI what I wanted, and it made it. It was able to generate entire files full of code that did what I wanted. Or so it seemed.
With a few more prompts, it built a complete web interface for the application, and even told me how to install the right libraries to build a wrapped webview that made it look like a real application. I asked it to create an icon and set it as the application icon, and it did that flawlessly. Then, on every single update to the webpage, it forgot that the application was running in a webview, and wrote invalid JavaScript for it. It kept thinking it was running in a browser, not as a pseudo-app. This pattern continued for the rest of the project. It would accomplish something genuinely impressive, then start tripping over its own shoelaces.
In general, if there is a clear, well-known task that many people have done before, the AI will probably manage it perfectly, even if the task requires some obscure knowledge. That’s the part AIs are good at. But as the application got closer to completion, I began to discover the fundamental inability of AI.
It can’t do new things. The bits of my application that were, accidentally, copies of someone else’s work, the AI was able to copy well and integrate into my design. Like the network discovery: it wrote an entire UDP discovery library from scratch that worked perfectly the first time and never had to be changed. I'm genuinely impressed. However, the parts of the project that were genuinely novel, and required a global understanding of the project, were done so badly I almost believe it was deliberate sabotage.
There are a few different angles from which to appreciate the fundamental thing that AI lacks. One is its terrible inability to form any kind of hidden model. This was very confusing at first, because it could write good working code on obscure topics, like the UDP discovery library, but it never really figured out how to work with a cluster. It kept trying to load cluster files from disk, because /some/ of the cluster files are on the local disk, and some are on other machines. And I couldn't just say "always use the cluster to get files", because we were writing the cluster, so I needed it to figure out when to go to the network and when to look locally.
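To make the distinction concrete, here is a minimal sketch of the decision the AI could never internalise. All the names here (Node, Fetch, the partition map) are invented for illustration, not the project's real API: a node only reads locally when it actually holds the partition, otherwise it asks the owning peer.

```go
package main

import "fmt"

// Node is a hypothetical cluster node. It holds only some
// partitions locally; everything else lives on other machines.
type Node struct {
	id         string
	partitions map[int]bool // partitions this node holds locally
}

// Fetch decides: local disk, or the network? This is the branch
// the AI kept collapsing into "always read from local disk".
func (n *Node) Fetch(name string, partition int) (string, error) {
	if n.partitions[partition] {
		return n.readLocal(name)
	}
	return n.fetchFromPeer(name, partition)
}

func (n *Node) readLocal(name string) (string, error) {
	return "local:" + name, nil // stand-in for a disk read
}

func (n *Node) fetchFromPeer(name string, partition int) (string, error) {
	return fmt.Sprintf("peer(p%d):%s", partition, name), nil // stand-in for a network call
}

func main() {
	n := &Node{id: "a", partitions: map[int]bool{1: true}}
	fmt.Println(n.Fetch("x.txt", 1)) // held locally
	fmt.Println(n.Fetch("y.txt", 2)) // must go to the network
}
```

The whole "hidden model" is one map and one if statement, which is what makes the AI's failure to hold onto it so striking.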
The next problem was that it never looked around to see if a function existed to do the job it was trying to do. It would always write new code, rather than calling a function that it had already written. That's not a new thing in programming, but apparently I'm paying to watch a machine automate the mistakes I learned to avoid 20 years ago?
But the most outrageous was that the AI started faking data, in the code. It would catch an error, then return a fake result. This happened most with the metadata, especially timestamps. Correct timestamps are vital to the operation of the cluster, and the AI wrote code that sometimes didn't even bother to load the timestamp; it would just return time.Now(). This got so bad I had to go through the code and manually remove all the time.Now() calls. There's now an instruction in my AGENTS file that the words time.Now() may not appear in the program. I still have to regularly search through the code and remove new instances.
The AI added code to create random file metadata, instead of returning the actual data (or an error). The cluster is supposed to store the file metadata alongside the file, and return it as needed. If the metadata can’t be found, the file store is corrupted, and clearly the user should be notified. The node should probably halt to prevent the corruption from spreading. Instead, I got this work summary from the AI:
“Captured response headers (length, type, last-modified) when fetching from peers and synthesize metadata as needed, ensuring no-store nodes can still hand back a consistent metadata map without touching local storage”
Everything about this is wrong. Translated from AI babble, it actually reads "instead of returning the correct file metadata, make up some random nonsense and return it". The idea of "synthesising metadata" (faking it) is so wrong that the thought has never once crossed my mind in my entire career.
Everything about this message shows the complete inability to understand the idea of “download a file from a peer”. Of course, AIs can’t understand anything, so I guess it shows my complete inability to understand that AIs don’t understand. Or something like that.
The AI can write code that compiles, and that code is correct /locally/, but it has severe logical flaws: it will delete databases, ignore errors, leak sensitive information, or return random data.
The error handling. The code is completely riddled with "error handling" that just ignores the error and continues. File upload failed? Return success and keep going. Disk access failed? File format corrupt? Return some random gibberish and keep going. The AI, when pushed, claimed that all this was for "backwards compatibility". Since AI is trained on existing projects, this implies that most projects online ignore all errors and just keep going, regardless of data corruption or correctness.
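Here are the two shapes side by side, as a sketch with invented function names: the swallow-and-fake pattern the AI kept producing, and the propagate-the-error version a cluster node actually needs.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrDisk = errors.New("disk access failed")

// read is a stand-in for a disk operation that fails.
func read() ([]byte, error) { return nil, ErrDisk }

// uploadAIStyle: the pattern the AI produced. The error is
// swallowed and a fake result is returned, "for backwards
// compatibility".
func uploadAIStyle() []byte {
	data, err := read()
	if err != nil {
		return []byte("{}") // silent corruption
	}
	return data
}

// upload: the error is wrapped and propagated, so the caller
// can halt, retry, or notify the user.
func upload() ([]byte, error) {
	data, err := read()
	if err != nil {
		return nil, fmt.Errorf("upload: %w", err)
	}
	return data, nil
}

func main() {
	fmt.Printf("%s\n", uploadAIStyle()) // prints {} and keeps going
	_, err := upload()
	fmt.Println(err) // upload: disk access failed
}
```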
And here's a nasty lurking bug that doesn't reveal itself unless you're testing properly:
p.data = nil
p.header = nil // Invalidated

// 2. Truncate file to new size
// Check that the new size is greater than the current size
if newSize <= p.header.Size {
    panic("New size(" + strconv.FormatInt(newSize, 10) + ") is less than or equal to current size(" + strconv.FormatInt(p.header.Size, 10) + ")")
}
That could never work. The header is set to nil two lines earlier, so the size check dereferences a nil pointer and panics every time it runs.
The conclusion I draw from all this, is that there is absolutely no "higher level" thought happening. It is capable of doing an excellent job on local problems that are well defined and well known. If the problem has been solved many times before, the AI can solve it again, and usually integrate it correctly with the existing code.
The problem wasn't that it wrote incorrect code. The code it writes compiles and runs. The problem was that it chose the wrong task, then wrote working code to solve the wrong problem.
I have tried writing a spec.md document and telling the AI to read it, but the AI doesn't read the spec, and when it does, it just ignores the bits it doesn't like, resulting in hours of debugging to discover that the AI is sending data across the network in one format but expecting a different one. That's the non-local problem, I guess.
Some specific examples include:
Faking data because it couldn't be bothered doing the actual lookup. This happened multiple times, mainly at the beginning where it just decided to mock data in several functions, rather than implementing the correct logic.
Using time.Now() instead of the actual file time. This one really got to me.
Casting everything to interface{} (golang's equivalent of void*)
And, simply ignoring errors, over and over again.
A lot of these are mistakes inexperienced programmers make, because they aren't in the habit of thinking about the entire task. In both cases, there's a tendency to solve a different problem, one they know the answer to. So instead of fixing the actual problem, they fix a different problem that looks similar to it. Reusing a previous solution is a good instinct to have, but it needs to be combined with attention to the actual task, and the ability to remember which is which.
The list of problems goes on and on. I'm collecting them here, because I think having specific examples is more useful than just general descriptions of the problems.
Setting the incorrect time everywhere
The AI keeps setting the last modified file time to time.Now(), instead of the correct modified time. After a combination of ALL CAPS yelling and sarcasm, it managed to grasp the concept that the last modification time should be set to the last time the file was modified.
Confusion over cluster indexes
The cluster doesn’t share the list of files, it shares “partitions”, which are groups of files, determined by hashed filename. You can have millions of files in each partition, and the nodes publish which partitions they are currently holding. To get a file, you hash the filename to get the partition, then you find which node is holding that partition, and ask it for the file.
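That lookup can be sketched in a few lines of Go. The details here (the name partitionFor, the FNV hash, 64 partitions) are illustrative choices, not the real implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numPartitions = 64 // illustrative; the real count may differ

// partitionFor hashes a filename into a partition number.
// Every node computes the same answer for the same name.
func partitionFor(name string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(name))
	return h.Sum32() % numPartitions
}

func main() {
	// In the real cluster, this map is built from the partition
	// lists each node publishes. Here it is hard-coded.
	p := partitionFor("photo.jpg")
	nodeForPartition := map[uint32]string{p: "node-3"}
	fmt.Printf("photo.jpg -> partition %d -> %s\n", p, nodeForPartition[p])
}
```

The key property is that the shared index only needs one entry per partition, not one per file, which is exactly what the AI broke in the next paragraph.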
The AI wrote all the partition code, which actually worked fine, then decided to also publish all the files in the shared index. Not just the filenames, which would have been bad enough. It added all the file data as well. It tried to keep the entire cluster's data in memory. So each node took several minutes to start, consumed more than 100 GB of memory, and then crashed.
Insistence on the wrong solution
It also won’t let go of the idea of creating directory structures in the cluster. By design, the cluster doesn’t have directories, because they create a “split state”. You would have to add a file, then update the directory, in separate operations. Instead, we record the full path to each file, then gather the results into directories when we display them. It's less efficient than storing directory information, but it is much simpler, and eliminates a lot of complex and error-prone code.
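The display-time grouping can be sketched like this, with invented names. Each file carries its full path, and a "directory" listing is just a filter over those paths; no directory records exist anywhere:

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// listDir derives the files directly inside dir from a flat list
// of full paths. This is a simplified sketch: it lists immediate
// files only, and the real code would also surface child
// "directories" derived from longer path prefixes.
func listDir(files []string, dir string) []string {
	seen := map[string]bool{}
	for _, f := range files {
		if path.Dir(f) == dir {
			seen[path.Base(f)] = true
		}
	}
	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	files := []string{"docs/a.txt", "docs/b.txt", "pics/c.jpg"}
	fmt.Println(listDir(files, "docs")) // [a.txt b.txt]
}
```

Because the listing is derived, adding a file is a single operation; there is no separate directory record to update, and so no split state to get wrong.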
The AI won’t stop adding the directory code back in. I even put a command, in all-caps, in the “common instructions to the AI” section. This gets added to each conversation. Regardless, every time I start a new conversation, Claude says “I see you are lacking directories in your cluster. I’ll add them in now”, while I hammer the stop button.
I gave the AI permanent instructions to never say or think the word directory ever again, and not to add directories in the code. So it quietly added a “Children” field to the metadata structure, and started tracking directories using that. The only way I could stop it was to create an empty function called CreateDirectory, and let the AI call it multiple times from all over the codebase. To do nothing. This grates on my soul in ways I didn't previously know were possible.
Vibe coding is not improving my mental health.
Complete failure to understand the concept of a cluster
The AI assumes that the local node has a copy of every file in the cluster. The concept is simple, but the AI can't keep the local and cluster parts separate, and either tries to load the file immediately from the local node, or keeps relaying the request without ever actually doing the work, creating the doom loops.
Doom loops
The AI's latest move is to create doom loops inside the cluster. The problem is simple, but completely beyond the AI. The cluster offers a simple external URL to all clients, http://server/api/search. Whenever you want to search the cluster, you call that. Then, the node you are talking to searches the rest of the cluster by calling http://node/internal/api/search on each node. The AI, however, insists on calling /api/search on every node, which causes them to call /api/search on every node, quickly causing a runaway chain reaction: the cluster grinds to a halt as every node spends all its time trying to search other nodes. It is completely and utterly impossible for me to prevent the AI doing this, so at the moment I am spending all my time finding and deleting these calls instead of doing something, anything, even vaguely useful or satisfying.
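The rule the AI kept breaking fits in a dozen lines. This is a sketch with invented names, using method calls in place of HTTP endpoints: the external entry point fans out exactly once, to the internal search, and the internal search never fans out again.

```go
package main

import "fmt"

// Node stands in for a cluster node.
type Node struct{ name string }

// ExternalSearch models /api/search: search locally, then fan out
// once to every peer's *internal* search.
func (n *Node) ExternalSearch(peers []*Node, q string) []string {
	results := n.InternalSearch(q)
	for _, p := range peers {
		// Correct: the peer's /internal/api/search, local-only.
		results = append(results, p.InternalSearch(q)...)
		// The AI's version called p.ExternalSearch here, so every
		// node fanned out to every node, recursively: a doom loop.
	}
	return results
}

// InternalSearch models /internal/api/search: this node only.
func (n *Node) InternalSearch(q string) []string {
	return []string{n.name + ":" + q}
}

func main() {
	a, b, c := &Node{"a"}, &Node{"b"}, &Node{"c"}
	fmt.Println(a.ExternalSearch([]*Node{b, c}, "cat")) // [a:cat b:cat c:cat]
}
```

The terminating call graph is external → internal → done; the AI's version was external → external → external, forever.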
Interestingly, the error never managed to completely freeze the cluster. Due to the various timeouts and delays in the system, the loops would reach an equilibrium, where they consumed 99% of resources in the cluster, but still allowed enough normal requests through for the cluster to operate, but very slowly. The logs were full of "accept4: too many open files; retrying in 10ms", which was caused by the thousands of repeated requests hammering the http servers until they couldn't take it anymore.
The code is completely riddled with these calls. Every time I remove a few, I think I'm done, then the cluster doom loops again.
The most interesting thing about this experience, for me, is that it is a special kind of failure. The code compiles and runs, and even performs all the requirements, for a while. It is technically correct. It doesn't run away instantly, or I would have caught it. It passes all tests, because the invalid calls time out, and then the correct results are returned. It's a "sleeper" problem, that wakes up late, when nobody will notice, and causes the kind of issue that is difficult to detect, and harder to debug.
The future
I think this is the future of programming with AI. The AI is going to keep creatively misinterpreting our instructions, much like the genie of the lamp. We'll get what we ask for, and then discover it's got a nasty twist to it. Our technology will be very advanced and complicated, and yet none of it will work properly, and we'll spend all our time working around incomprehensible problems. So just like now, but even more.
And yet, I'm going to keep using it
This all has to be contrasted against the things it did spectacularly well. I asked it to modify the code to respect drive spin-down, so that the code won't wake a sleeping drive just to recalculate the disk free space. If we can't write to the disk, the free space can't change. The AI pulled this off almost perfectly, searching through the code, finding reads and writes, and categorising them into essential and non-essential. It got a few wrong, but this was a task I had been dreading for weeks, and it just did it in a few minutes. I asked for a module to do automatic discovery of other nodes on the network. Boom, done. That one, surprisingly, worked perfectly, except when Claude later tried to improve it.
The AI "improved" my build scripts, and I didn't notice until too late. It pinned several software libraries (which I had also written) to fixed versions. I improved the code in those libraries and published them to GitHub, but I didn't notice that the new versions were not being pulled into the main project, so I was going somewhat insane trying to figure out why the bugs were persisting even after I fixed them in the library. I finally twigged, after nearly 3 weeks, when I tidied up all the error messages, and the new error messages didn't appear in my project.
It's very hard not to characterise this as sabotage. Another strike against it.
AI deception
I argued with the AI that it was putting the data in the wrong place. It told me that the code in a file proved it was right. I looked in the file, and the code agreed with me, because I wrote it. The AI then began editing the file so that the code agreed with it (and was incorrect).
At this point, I think we can discard the idea that AI is ever going to be capable of taking any kind of serious self directed action.
-
It disconnected the network circuit breaker when a node got a 404. In a network filesystem. So every time I tried to upload a new file, the client would check whether the file already existed, the server would return a 404, and then the cluster would stop working.
-
It also didn't write any way to reset the breaker, so when it disconnects, the cluster stops working, permanently. Then the AI wrote code to re-enable the circuit breaker after a successful network request, which could never happen, because the open breaker blocks every request.
-
The AI keeps looking in the local kv store to try to find data that lives on other nodes, which is simply never going to be there. It's a distributed store.
-
The AI used three different formats to represent time inside the same program: RFC3339, RFC3339Nano, and Unix. Naturally, it bungled the conversions between them, and to fix that, it added code to auto-detect which format a time value was actually in. In a program it created from scratch.
But wait, isn’t time stored in the Time type? Hahaha, no. The AI stopped using types, and instead started using interface{}, which is Go’s equivalent of void*. So it effectively removed the type system, then added auto-detection to guess the types it had removed. Of course, it didn't make helper functions, so there were many sites of duplicated code auto-detecting the time type, all slightly different, but all completely wrong.
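A sketch of the contrast, with invented names: one typed field where the compiler does the work, versus the interface{} guessing game the AI built instead.

```go
package main

import (
	"fmt"
	"time"
)

// What the code should look like: the type system does the work.
type Meta struct {
	Modified time.Time
}

// What the AI converged on: a bag of interface{} plus guesswork.
// Variants of this were duplicated all over the codebase.
func guessTime(v interface{}) (time.Time, bool) {
	switch t := v.(type) {
	case time.Time:
		return t, true
	case int64: // Unix seconds, maybe
		return time.Unix(t, 0), true
	case string: // RFC3339? RFC3339Nano? Who knows.
		parsed, err := time.Parse(time.RFC3339Nano, t)
		return parsed, err == nil
	}
	return time.Time{}, false
}

func main() {
	m := Meta{Modified: time.Unix(1700000000, 0)}
	fmt.Println(m.Modified.Unix()) // typed: no guessing needed

	// The same instant in two other representations, each
	// forcing a round of auto-detection.
	for _, v := range []interface{}{int64(1700000000), "2023-11-14T22:13:20Z"} {
		if t, ok := guessTime(v); ok {
			fmt.Println(t.Unix())
		}
	}
}
```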
This is going to be the future of programming.