---
author: "Youwen Wu"
authorTwitter: "@youwen"
desc: "tldr: ai is terrible"
keywords: "ai, llm, chatgpt, copilot"
lang: "en"
title: "the case against AI generated code"
updated: "2024-05-25T12:00:00Z"
---

Many software developers believe that the coming "AI revolution" will end the
software industry and ChatGPT will replace all software developers. While I
cannot predict the future, I am convinced that every developer who genuinely
believes the current state of generative AI can meaningfully supplant the work
of the average software developer has only worked on toy projects. See: projects
like [Devin](https://www.codemotion.com/magazine/ai-ml/is-devin-fake/) (recently
shown to be faked).

No, I am not worried about AI taking my job and trying to discredit it in hopes
of saving the industry. I'm not even surprised by generative AI. I had been following
GPT-2 since 2020 and tested the first "Instruct" models by OpenAI in their
playground many months before ChatGPT was released. It's true that these tools
have made a considerable jump in progress in the past 2-3 years, but such is the
nature of pretty much everything in tech. They did not just suddenly become
insanely good. Today's models are the culmination of decades of research into
rudimentary predecessors, plus us finally having the sheer compute power
necessary to train them. It's just that the technology has finally become
usable, and ChatGPT put it in front of "normies" by making it accessible without
calling an API. There is no evidence to suggest that generative models will continue to
make jumps like they did from GPT-2 to GPT-3.

Of course, generative AI could prove to be a useful tool to developers. Many
developers already say it is. But I recently removed Copilot autocomplete from
my editor because I found it caused more trouble than it was worth, and I'd like
to reflect on why.

## the "LLM software engineer" grift

The issue with people's perception of AI is that LLMs are fairly capable of
doing a simple task acceptably. People will see ChatGPT solve a Leetcode problem
in Python with the optimal solution and claim programmers have become obsolete.
True, ChatGPT is most likely better than the majority of programmers at Leetcode
as of now. The problem is conflating Leetcode with building real software (also
a mistake made by many software recruiters). Being able to complete many simple,
well-defined tasks does not translate to being able to construct real
production software. Real code requires knowing how to design things so they
don't suck and how to fit everything together, which can only be done by
something that actually, logically understands what it's doing - and that
currently does not include LLMs. If this changes in the future, then perhaps
developers really are doomed. As of now, however, anyone who's tried to make
ChatGPT build upon its own code or iteratively develop larger projects with
simple instructions knows how utterly stupid and useless it is. The key issue is
that this apparent ability simply does not scale. It really is quite impressive
how much "logical understanding" an LLM can simulate in a short conversation.
But while a human engineer exhibits _actual_ logical understanding, the LLM
pretends.

You may say that there is no difference between "true" logic and "pretend" logic
if they achieve the same result. Sure. While I think there's still a
philosophical distinction to be made, practically speaking, "pretend logic" is
probably just as useful as "real logic". The problem is LLMs are not good enough
at pretending. A real engineer can build part of their codebase, test it out,
debug, and iterate on it. They (hopefully) retain understanding of the code they
previously wrote and can improve, extend, and build upon it. Meanwhile, tell an
LLM to build a simple API and it might do it correctly, but then tell it to
build a frontend that interfaces with that API properly, and chances are it
will completely fail. You might be able to coax results out with some prompt
engineering, but then tell it to begin extending the API and frontend into a
non-trivial app and it will completely break down, spitting out fake syntax and
subtly yet catastrophically wrong results.
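Concretely, the failure mode looks something like the sketch below - written by
me for illustration, not actual LLM output, with a made-up endpoint and made-up
field names - where the generated backend and the generated frontend each look
fine in isolation, but the contract between them only ever existed in the prompt.

```typescript
// Hypothetical backend the LLM produced in one prompt:
// GET /api/users responds with { users: [{ id: number; name: string }] }

// Hypothetical frontend it produced in a later prompt. It compiles and
// renders, but `res.json()` is typed as `any`, so the type checker never
// notices that the response has no `items` field.
async function loadUserNames(): Promise<string[]> {
  const res = await fetch("/api/users");
  const data = await res.json();
  // Bug: the backend sends `data.users`, not `data.items`, so this throws
  // "Cannot read properties of undefined" the first time it actually runs.
  return data.items.map((u: { name: string }) => u.name);
}
```

Each half is plausible on its own; the bug lives in the seam, which is exactly
the part the model never actually understood.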
There's a reason why Devin had to fake its results.

## but what about copilot?

Plugins like Copilot or Codeium simply provide advanced autocomplete suggestions
and help complete localized tasks. Didn't I say that AI is good at completing
simple and well-defined tasks?

True, I definitely think tools like Copilot are much more useful than the
"ChatGPT full stack software engineer" pipe dream. But these tools present a
different set of issues. First of all, their usefulness is inversely
proportional to your actual programming skill and mastery of a language. They're
really good at suggesting obvious solutions and idiomatic syntax that you might
not know when you're first learning a language. Type the beginning of a common
code snippet or function call and it'll fill in and infer the rest.
But this isn't really that useful if you're actually familiar with your language
and your tools. At best, it provides a marginal speed increase.
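To illustrate the useful case, the sketch below is representative of the kind of
completion you get - my own recreation of the pattern, not a real Copilot
capture: type the signature of a well-worn utility and the suggestion fills in
the textbook idiom.

```typescript
// You type the signature...
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  delayMs: number
) {
  // ...and the suggestion fills in the standard idiom you half-remembered.
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}
```

Handy when you don't have the idiom memorized; a marginal convenience once you do.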
## the code just sucks

The biggest downside of these tools is _they simply write terrible code._ You've
seen them write optimal Leetcode solutions, but as I said, Leetcode solutions
are both readily available and nothing like real software. Think about it. The
average software developer is TypeScript Tommy, who dropped out of Udemy to
pursue his dreams of becoming a React boilerplate developer. This is the code
that Copilot and every other LLM trains on. At the end of the day, LLMs are
simply extremely advanced probability machines, and the majority of the code
available on the internet for them to train on is just _awful_. When you're
writing code, do you want a mid-tier developer constantly placing distracting
suggestions in front of you? Not only does this terminate your flow of thought
as you come up with your own (likely better) code, but you also have to review
what basically amounts to amateur-level code sprayed across your editor, code
that often contains subtle errors that lead you to spend just as much time
reading documentation or Googling as implementing it yourself would have taken.
This might work for other amateur devs, but shouldn't the goal be to master your
craft? If you're at a skill level where LLM suggestions are often better than
your own code and you need them to be productive, you should probably put these
tools down and focus on improving your own skills first.
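To pin down what I mean by "subtle errors": they're rarely dramatic. They're the
kind that pass a lazy glance and one happy-path test. Here's a contrived example
of the genre, written by me rather than captured from Copilot:

```typescript
// Looks reasonable, compiles, and works on the first array you try...
function median(values: number[]): number {
  const sorted = [...values].sort(); // bug: the default sort compares strings
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

median([1, 2, 3]);  // 2 -- looks right
median([9, 10, 2]); // 2 -- wrong: [9, 10, 2].sort() gives [10, 2, 9]; the real median is 9
```

Catching that means a detour through the docs for `Array.prototype.sort`, which
is exactly the "just as much time reading documentation or Googling" problem.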
## it's not all bad?

One concession I will make is that LLM code is great for rapid prototyping where
you don't care at all about code quality and just need something that holds up
for one use and doesn't immediately error out. If you're good at prompting, you
should be able to get an LLM to create shitty prototype code much faster than a
human can. I keep Copilot chat around in my editor for this exact reason. This
is pretty much only useful for toy projects, rudimentary demos, or simple bash
scripting though. Again, bring LLMs to anything that wouldn't fit in a 20-minute
YouTube tutorial, and they completely fail.
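For the record, the kind of throwaway code I'm glad to let an LLM spit out is on
the order of the sketch below - hypothetical URL, hypothetical response shape,
zero error handling, meant to run once and then get deleted.

```typescript
// prototype.ts -- dump a checklist from a toy API, then throw this file away.
// Endpoint and field names are made up; there is deliberately no validation.
async function main() {
  const res = await fetch("https://example.com/api/todos");
  const todos: { title: string; done: boolean }[] = await res.json();
  for (const todo of todos) {
    console.log(`[${todo.done ? "x" : " "}] ${todo.title}`);
  }
}

main();
```

If it breaks, you've lost five minutes. The calculus flips the moment the code
has to survive a second use.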