mirror of https://github.com/youwen5/blog.git
content: first draft of AI post
src/posts/2024-06-25-ai-code.md (new file)

---
author: "Youwen Wu"
authorTwitter: "@youwen"
desc: "tldr: ai is terrible"
keywords: "ai, llm, chatgpt, copilot"
lang: "en"
title: "the case against AI generated code"
updated: "2024-05-25T12:00:00Z"
---

Many software developers believe that the coming "AI revolution" will end the
software industry and ChatGPT will replace all software developers. While I
cannot predict the future, I am convinced that every developer who genuinely
believes the current state of generative AI can meaningfully supplant the work
of the average software developer has only worked on toy projects. See: projects
like [Devin](https://www.codemotion.com/magazine/ai-ml/is-devin-fake/) (recently
shown to be faked).

No, I am not worried about AI taking my job, and I'm not trying to discredit it
in hopes of saving the industry. I'm not even surprised by generative AI. I had
been following GPT-2 since 2020 and tested OpenAI's first "Instruct" models in
their playground many months before ChatGPT was released. It's true that these
tools have made a considerable jump in progress in the past 2-3 years, but such
is the nature of pretty much everything in tech. They did not just suddenly
become insanely good. Today's models are the culmination of decades of research
and rudimentary predecessors, and the result of us finally having the sheer
compute power necessary to train them. It's just that the technology has
finally become usable, and ChatGPT made it available to "normies" by making it
accessible without calling an API. There is no evidence to suggest that
generative models will continue to make jumps like the one from GPT-2 to GPT-3.

Of course, generative AI could prove to be a useful tool for developers. Many
developers already say it is. But I recently removed Copilot autocomplete from
my editor because I found it caused more trouble than it was worth, and I'd
like to reflect on why.

## the "LLM software engineer" grift
|
||||||
|
|
||||||
|
The issue with people's perception of AI is that LLMs are fairly capable of
|
||||||
|
doing a simple task acceptably. People will see ChatGPT solve a Leetcode problem
|
||||||
|
in Python with the optimal solution and claim programmers have become obsolete.
|
||||||
|
True, ChatGPT is most likely better than the majority of programmers at Leetcode
|
||||||
|
as of now. The problem is conflating Leetcode with building real software (also
|
||||||
|
a mistake made by many software recruiters). Being able to complete multiple
|
||||||
|
simple well-defined tasks does not translate to being able to construct real
|
||||||
|
production software. Real code requires knowledge of how to design things to not
|
||||||
|
suck, and put everything together, which can only be done by something which
|
||||||
|
actually logically understands what they're doing - which currently does not
|
||||||
|
include LLMs. If this changes in the future, then perhaps developers really are
|
||||||
|
doomed. As of now, however, anyone who's tried to make ChatGPT build upon its
|
||||||
|
own code or iteratively develop larger projects with simple instructions knows
|
||||||
|
how utterly stupid and useless it is. The key issue lies with how their powerful
|
||||||
|
abilities simply cannot scale. It really is quite impressive the amount of
|
||||||
|
"logical understanding" an LLM can simulate in a short conversation. But while a
|
||||||
|
human engineer exhibits _actual_ logical understanding, the LLM pretends.
|
||||||
|
|
||||||
|
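To make "simple, well-defined task" concrete: this is roughly the kind of
Leetcode answer an LLM will happily get right. A standard hash-map two-sum,
written out by hand here in Python (my snippet, not actual model output):

```python
def two_sum(nums: list[int], target: int) -> list[int]:
    """Return indices of the two numbers in nums that sum to target."""
    seen: dict[int, int] = {}  # value -> index of a number already visited
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []  # no pair found


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

It's optimal (one pass, O(n)), it appears thousands of times in the training
data, and it tells you nothing about whether the model can build a system
around it.
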
You may say that there is no difference between "true" logic and "pretend"
logic if they achieve the same result. Sure. While I think there's still a
philosophical distinction to be made, practically speaking, "pretend logic" is
probably just as useful as "real logic". The problem is that LLMs are not good
enough at pretending. A real engineer can build part of their codebase, test it
out, debug, and iterate. They (hopefully) retain an understanding of the code
they previously wrote and can improve, extend, and build upon it. Meanwhile,
tell an LLM to build a simple API and it might do so correctly, but then tell
it to build a frontend that properly interfaces with the API, and chances are
it will completely fail. You might be able to coax results out with some prompt
engineering, but tell it to extend the API and frontend into a non-trivial app
and it will completely break down, spitting out fake syntax and subtly yet
catastrophically wrong results.

There's a reason why Devin had to fake its results.

## but what about copilot?

Plugins like Copilot or Codeium simply provide advanced autocomplete
suggestions and help complete localized tasks. Didn't I say that AI is good at
completing simple and well-defined tasks?

True, I definitely think tools like Copilot are much more useful than the
"ChatGPT full stack software engineer" pipe dream. But these tools present a
different set of issues. First of all, their usefulness is inversely
proportional to your actual programming skill and mastery of a language.
They're really good at suggesting obvious solutions and idiomatic syntax that
you might not know when first picking up a programming language. Type the
beginning of a common code snippet or function call and it'll fill in and
infer the rest. But this isn't all that useful if you're actually familiar
with your language and your tools. At best, it provides a marginal speed
increase.

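For a sense of what that looks like in practice: you type a signature and a
docstring, and the tool infers a plausible body. The function below is a
hand-written reconstruction of a typical suggestion, not actual Copilot output:

```python
import re


# You type the signature and docstring...
def slugify(title: str) -> str:
    """Convert a post title into a URL-friendly slug."""
    # ...and the autocomplete offers an idiomatic body like this:
    slug = title.lower().strip()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse runs of punctuation
    return slug.strip("-")


print(slugify("The Case Against AI Generated Code"))
# the-case-against-ai-generated-code
```

Handy if you didn't know about `re.sub`; a marginal save if you did.
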
## the code just sucks

The biggest downside of these tools is _they simply write terrible code._
You've seen them write optimal Leetcode solutions, but as I said, Leetcode
solutions are both readily available and nothing like real software. Think
about it. The average software developer is TypeScript Tommy, who dropped out
of Udemy to pursue his dreams of becoming a React boilerplate developer. This
is the code that Copilot and every other LLM trains on. At the end of the day,
LLMs are simply extremely advanced probability machines, and the majority of
the code available on the internet for them to train on is just _awful_. When
you're writing code, do you want a mid-tier developer constantly placing
distracting suggestions in front of you? Not only does this terminate your flow
of thought as you come up with your own (likely better) code, but you also have
to review what basically amounts to amateur-level code sprayed across your
editor, code that often contains subtle errors and leads you to spend just as
much time reviewing documentation or Googling as implementing it yourself would
have taken. This might work for other amateur devs, but shouldn't the goal be
to master your craft? If you're at a skill level where you rely on LLM
suggestions to be productive and they're often better than your own code, you
should probably avoid these tools and focus on improving your own skills first.

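To put "subtle errors" in concrete terms, here's the kind of plausible-looking
snippet an assistant will offer (hand-written to illustrate the genre, not
actual Copilot output), with a classic Python footgun a reviewer has to catch:

```python
# Reads fine, runs fine once -- but the default list is shared across calls.
def append_tag(tag: str, tags: list = []) -> list:  # bug: mutable default
    tags.append(tag)
    return tags


print(append_tag("ai"))   # ['ai']
print(append_tag("llm"))  # ['ai', 'llm'] -- state leaked from the first call


# The version a careful reviewer has to rewrite it into:
def append_tag_fixed(tag: str, tags: list | None = None) -> list:
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

The suggestion passes a quick eyeball test and quietly shares state between
calls; catching that costs more attention than writing the function yourself.
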
## it's not all bad?

One concession I will make is that LLM code is great for rapid prototyping,
where you don't care at all about code quality and just need something that
holds up for one use and doesn't immediately error out. If you're good at
prompting, you should be able to get an LLM to create shitty prototype code
much faster than a human can. I keep Copilot chat around in my editor for this
exact reason. This is pretty much only useful for toy projects, rudimentary
demos, or simple bash scripts, though. Again, bring LLMs to anything that
wouldn't fit in a 20-minute YouTube tutorial, and they completely fail.

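For a sense of the ceiling, this is about the complexity level I'd trust an
LLM to one-shot: a disposable, run-once script (a made-up example of the
genre, not model output):

```python
import collections
import pathlib
import sys

# One-shot script: print the ten most common words in a file, then get
# deleted. (The default filename "post.md" is purely for illustration.)
path = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else "post.md")
words = path.read_text().lower().split()
for word, count in collections.Counter(words).most_common(10):
    print(f"{count:5d}  {word}")
```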