diff --git a/src/posts/2024-06-25-ai-code.md b/src/posts/2024-06-25-ai-code.md
new file mode 100644
index 0000000..b142105
--- /dev/null
+++ b/src/posts/2024-06-25-ai-code.md
@@ -0,0 +1,120 @@
+---
+author: "Youwen Wu"
+authorTwitter: "@youwen"
+desc: "tldr: ai is terrible"
+keywords: "ai, llm, chatgpt, copilot"
+lang: "en"
+title: "the case against AI generated code"
+updated: "2024-05-25T12:00:00Z"
+---
+
+Many software developers believe that the coming "AI revolution" will end the
+software industry and that ChatGPT will replace all software developers. While
+I cannot predict the future, I am convinced that every developer who genuinely
+believes the current state of generative AI can meaningfully supplant the work
+of the average software developer has only worked on toy projects. See:
+projects like [Devin](https://www.codemotion.com/magazine/ai-ml/is-devin-fake/)
+(recently shown to have been faked).
+
+No, I am not worried about AI taking my job, and I am not trying to discredit
+it in hopes of saving the industry. I'm not even surprised by generative AI. I
+have been following GPT-2 since 2020 and tested OpenAI's first "Instruct"
+models in their playground many months before ChatGPT was released. It's true
+that these tools have made a considerable jump in progress in the past 2-3
+years, but such is the nature of pretty much everything in tech. They did not
+just suddenly become insanely good. They are the culmination of decades of
+research on rudimentary models, and of us finally having the sheer compute
+power necessary to train them. The technology has simply become usable, and
+ChatGPT made it available to "normies" by making it accessible without having
+to call an API. There is no evidence to suggest that generative models will
+continue to make jumps like the one from GPT-2 to GPT-3.
+
+Of course, generative AI could prove to be a useful tool for developers. Many
+developers already say it is. But I recently removed Copilot autocomplete from
+my editor because I found it caused more trouble than it was worth, and I'd
+like to reflect on why.
+
+## the "LLM software engineer" grift
+
+The issue with people's perception of AI is that LLMs are fairly capable of
+doing a simple task acceptably. People will see ChatGPT solve a Leetcode
+problem in Python with the optimal solution and claim programmers have become
+obsolete. True, ChatGPT is most likely better than the majority of programmers
+at Leetcode as of now. The problem is conflating Leetcode with building real
+software (also a mistake made by many software recruiters). Being able to
+complete multiple simple, well-defined tasks does not translate to being able
+to construct real production software. Real code requires knowing how to
+design things so they don't suck and how to fit everything together, which can
+only be done by something that genuinely understands what it is doing, and
+LLMs currently do not.
+
+If this changes in the future, then perhaps developers really are doomed. As
+of now, however, anyone who has tried to make ChatGPT build upon its own code
+or iteratively develop a larger project from simple instructions knows how
+utterly stupid and useless it is at that. The key issue is that its impressive
+abilities simply do not scale. The amount of "logical understanding" an LLM
+can simulate in a short conversation really is impressive. But while a human
+engineer exhibits _actual_ logical understanding, the LLM pretends.
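+
+To make "simple, well-defined task" concrete, here is a rough sketch (my own
+illustration, not actual model output) of the kind of self-contained Leetcode
+answer an LLM will reliably produce: one function, one precise spec, and no
+surrounding system to reason about.
+
+```python
+# Classic "two sum": return the indices of the two numbers summing to target.
+# One function, one precise spec, no surrounding codebase -- exactly the kind
+# of task current LLMs handle well.
+def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
+    seen: dict[int, int] = {}  # value -> index where we first saw it
+    for i, n in enumerate(nums):
+        complement = target - n
+        if complement in seen:
+            return seen[complement], i
+        seen[n] = i
+    return None
+
+print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
+```
+
+That it nails this kind of exercise is exactly what makes the "pretend"
+understanding so convincing in short bursts.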
+
+You may say that there is no difference between "true" logic and "pretend"
+logic if they achieve the same result. Sure. While I think there's still a
+philosophical distinction to be made, practically speaking, "pretend logic" is
+probably just as useful as "real logic". The problem is that LLMs are not good
+enough at pretending. A real engineer can build part of their codebase, test
+it out, debug it, and iterate on it. They (hopefully) retain an understanding
+of the code they previously wrote and can improve, extend, and build upon it.
+Meanwhile, tell an LLM to build a simple API and it might do it correctly, but
+tell it to build a frontend that interfaces with that API properly and chances
+are it will completely fail. You might be able to coax results out of it with
+some prompt engineering, but tell it to extend the API and frontend into a
+non-trivial app and it will completely break down, spitting out fake syntax
+and subtly yet catastrophically wrong results.
+
+There's a reason why Devin had to fake its results.
+
+## but what about copilot?
+
+Plugins like Copilot or Codeium simply provide advanced autocomplete
+suggestions and help complete localized tasks. Didn't I say that AI is good at
+completing simple, well-defined tasks?
+
+True, and I definitely think tools like Copilot are much more useful than the
+"ChatGPT full stack software engineer" pipe dream. But these tools present a
+different set of issues. First of all, their usefulness is inversely
+proportional to your actual programming skill and mastery of a language. They
+are really good at suggesting obvious solutions and idiomatic syntax that you
+might not know when you are first getting to know a programming language. Type
+the beginning of a common code snippet or function call and they will infer
+and fill in the rest. But this isn't really that useful once you're actually
+familiar with your language and your tools. At best, it provides a marginal
+speed increase.
+
+## the code just sucks
+
+The biggest downside of these tools is that _they simply write terrible
+code._ You've seen them write optimal Leetcode solutions, but as I said,
+Leetcode solutions are both readily available and nothing like real software.
+Think about it: the average software developer is TypeScript Tommy, who
+dropped out of Udemy to pursue his dreams of becoming a React boilerplate
+developer. This is the code that Copilot and every other LLM trains on. At the
+end of the day, LLMs are simply extremely advanced probability machines, and
+the majority of the code available for them to train on is just _awful_.
+
+When you're writing code, do you want a mid-tier developer constantly placing
+distracting suggestions in front of you? Not only does this break your train
+of thought as you come up with your own (likely better) code, you also have to
+review what basically amounts to amateur-level code sprayed across your
+editor, code that often contains subtle errors, so you end up spending as much
+time reviewing documentation or Googling as implementing it yourself would
+have taken. This might work for other amateur devs, but shouldn't the goal be
+to master your craft? If you're at a skill level where you need to rely on LLM
+suggestions to be productive and they're often better than your own code, you
+should probably avoid these tools and focus on improving your own skills
+first.
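+
+For illustration, here is a contrived sketch of the kind of "subtly wrong"
+suggestion I mean (hypothetical code written for this post, not a captured
+Copilot completion):
+
+```python
+# You type the signature and docstring; the suggested body below looks
+# idiomatic and runs without complaint.
+def paginate(items: list, page: int, per_page: int) -> list:
+    """Return the given page of items, where page 1 is the first page."""
+    start = page * per_page  # subtle bug: treats pages as 0-indexed,
+    return items[start:start + per_page]  # so page 1 silently skips the first batch
+
+# What the docstring actually asks for:
+def paginate_fixed(items: list, page: int, per_page: int) -> list:
+    """Return the given page of items, where page 1 is the first page."""
+    start = (page - 1) * per_page
+    return items[start:start + per_page]
+```
+
+Nothing here crashes; the bug only shows up later, when someone notices the
+first page of results never appears. Catching that in a stream of suggestions
+is exactly the review overhead I'm complaining about.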
+
+## it's not all bad?
+
+One concession I will make is that LLM code is great for rapid prototyping,
+where you don't care at all about code quality and just need something that
+holds up for one use and doesn't immediately error out. If you're good at
+prompting, you should be able to get an LLM to create shitty prototype code
+much faster than a human can. I keep Copilot chat around in my editor for this
+exact reason. This is pretty much only useful for toy projects, rudimentary
+demos, or simple bash scripting, though. Again, bring LLMs to anything that
+wouldn't fit in a 20-minute YouTube tutorial, and they completely fail.
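+
+A sketch of the genre, so it's clear how low the bar is (my own throwaway
+example, not LLM output, with the paths purely illustrative): a one-shot
+script that dumps the first line of every post into an index file, with no
+error handling and no intention of ever running it twice.
+
+```python
+from pathlib import Path
+
+# One-use script: collect the first line of each markdown post into index.txt.
+# No error handling, no structure -- it only has to work once.
+lines = []
+for post in sorted(Path("src/posts").glob("*.md")):
+    first_line = post.read_text().splitlines()[0]
+    lines.append(f"{post.name}: {first_line}")
+
+Path("index.txt").write_text("\n".join(lines) + "\n")
+print(f"wrote {len(lines)} entries to index.txt")
+```
+
+If an LLM writes that for me in ten seconds, great. Anything bigger, and we
+are back to the problems above.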