mirror of https://github.com/youwen5/blog.git
content: first draft of AI post
src/posts/2024-06-25-ai-code.md (new file)

---
author: "Youwen Wu"
authorTwitter: "@youwen"
desc: "tldr: ai is terrible"
keywords: "ai, llm, chatgpt, copilot"
lang: "en"
title: "the case against AI generated code"
updated: "2024-05-25T12:00:00Z"
---

Many software developers believe that the coming "AI revolution" will end the
software industry and ChatGPT will replace all software developers. While I
cannot predict the future, I am convinced that every developer who genuinely
believes the current state of generative AI can meaningfully supplant the work
of the average software developer has only worked on toy projects. See: projects
like [Devin](https://www.codemotion.com/magazine/ai-ml/is-devin-fake/) (recently
shown to be faked).

No, I am not worried about AI taking my job, and I'm not trying to discredit it
in hopes of saving the industry. I'm not even surprised by generative AI. I had
been following GPT-2 since 2020 and tested OpenAI's first "Instruct" models in
their playground many months before ChatGPT was released. It's true that these
tools have made a considerable jump in progress in the past 2-3 years, but such
is the nature of pretty much everything in tech. They did not just suddenly
become insanely good. Today's models are the culmination of decades of research
and rudimentary predecessors, and the result of us finally having the sheer
compute power necessary to train them. It's just that the technology has
finally become usable, and ChatGPT made it available to "normies" by making it
accessible without calling an API. There is no evidence to suggest that
generative models will continue to make jumps like the one from GPT-2 to GPT-3.

Of course, generative AI could prove to be a useful tool for developers. Many
developers already say it is. But I recently removed Copilot autocomplete from
my editor because I found it caused more trouble than it was worth, and I'd
like to reflect on why.

## the "LLM software engineer" grift
|
||||||
|
|
||||||
|
The issue with people's perception of AI is that LLMs are fairly capable of
|
||||||
|
doing a simple task acceptably. People will see ChatGPT solve a Leetcode problem
|
||||||
|
in Python with the optimal solution and claim programmers have become obsolete.
|
||||||
|
True, ChatGPT is most likely better than the majority of programmers at Leetcode
|
||||||
|
as of now. The problem is conflating Leetcode with building real software (also
|
||||||
|
a mistake made by many software recruiters). Being able to complete multiple
|
||||||
|
simple well-defined tasks does not translate to being able to construct real
|
||||||
|
production software. Real code requires knowledge of how to design things to not
|
||||||
|
suck, and put everything together, which can only be done by something which
|
||||||
|
actually logically understands what they're doing - which currently does not
|
||||||
|
include LLMs. If this changes in the future, then perhaps developers really are
|
||||||
|
doomed. As of now, however, anyone who's tried to make ChatGPT build upon its
|
||||||
|
own code or iteratively develop larger projects with simple instructions knows
|
||||||
|
how utterly stupid and useless it is. The key issue lies with how their powerful
|
||||||
|
abilities simply cannot scale. It really is quite impressive the amount of
|
||||||
|
"logical understanding" an LLM can simulate in a short conversation. But while a
|
||||||
|
human engineer exhibits _actual_ logical understanding, the LLM pretends.
|
||||||
|
|
||||||
|
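To make "simple, well-defined task" concrete: this is roughly the kind of
Leetcode answer an LLM will happily get right. A standard hash-map two-sum,
written out by hand here in Python (my snippet, not actual model output):

```python
def two_sum(nums: list[int], target: int) -> list[int]:
    """Return indices of the two numbers in nums that sum to target."""
    seen: dict[int, int] = {}  # value -> index of a number already visited
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []  # no pair found


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

It's optimal (one pass, O(n)), it appears thousands of times in the training
data, and it tells you nothing about whether the model can build a system
around it.
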
You may say that there is no difference between "true" logic and "pretend"
logic if they achieve the same result. Sure. While I think there's still a
philosophical distinction to be made, practically speaking, "pretend logic" is
probably just as useful as "real logic". The problem is that LLMs are not good
enough at pretending. A real engineer can build part of their codebase, test it
out, debug, and iterate. They (hopefully) retain an understanding of the code
they previously wrote and can improve, extend, and build upon it. Meanwhile,
tell an LLM to build a simple API and it might do so correctly, but then tell
it to build a frontend that properly interfaces with the API, and chances are
it will completely fail. You might be able to coax results out with some prompt
engineering, but tell it to extend the API and frontend into a non-trivial app
and it will completely break down, spitting out fake syntax and subtly yet
catastrophically wrong results.

There's a reason why Devin had to fake its results.

## but what about copilot?

Plugins like Copilot or Codeium simply provide advanced autocomplete
suggestions and help complete localized tasks. Didn't I say that AI is good at
completing simple and well-defined tasks?

True, I definitely think tools like Copilot are much more useful than the
"ChatGPT full stack software engineer" pipe dream. But these tools present a
different set of issues. First of all, their usefulness is inversely
proportional to your actual programming skill and mastery of a language.
They're really good at suggesting obvious solutions and idiomatic syntax that
you might not know when first picking up a programming language. Type the
beginning of a common code snippet or function call and it'll fill in and
infer the rest. But this isn't all that useful if you're actually familiar
with your language and your tools. At best, it provides a marginal speed
increase.

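For a sense of what that looks like in practice: you type a signature and a
docstring, and the tool infers a plausible body. The function below is a
hand-written reconstruction of a typical suggestion, not actual Copilot output:

```python
import re


# You type the signature and docstring...
def slugify(title: str) -> str:
    """Convert a post title into a URL-friendly slug."""
    # ...and the autocomplete offers an idiomatic body like this:
    slug = title.lower().strip()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse runs of punctuation
    return slug.strip("-")


print(slugify("The Case Against AI Generated Code"))
# the-case-against-ai-generated-code
```

Handy if you didn't know about `re.sub`; a marginal save if you did.
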
## the code just sucks

The biggest downside of these tools is _they simply write terrible code._
You've seen them write optimal Leetcode solutions, but as I said, Leetcode
solutions are both readily available and nothing like real software. Think
about it. The average software developer is TypeScript Tommy, who dropped out
of Udemy to pursue his dreams of becoming a React boilerplate developer. This
is the code that Copilot and every other LLM trains on. At the end of the day,
LLMs are simply extremely advanced probability machines, and the majority of
the code available on the internet for them to train on is just _awful_. When
you're writing code, do you want a mid-tier developer constantly placing
distracting suggestions in front of you? Not only does this terminate your flow
of thought as you come up with your own (likely better) code, but you also have
to review what basically amounts to amateur-level code sprayed across your
editor, code that often contains subtle errors and leads you to spend just as
much time reviewing documentation or Googling as implementing it yourself would
have taken. This might work for other amateur devs, but shouldn't the goal be
to master your craft? If you're at a skill level where you rely on LLM
suggestions to be productive and they're often better than your own code, you
should probably avoid these tools and focus on improving your own skills first.

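To put "subtle errors" in concrete terms, here's the kind of plausible-looking
snippet an assistant will offer (hand-written to illustrate the genre, not
actual Copilot output), with a classic Python footgun a reviewer has to catch:

```python
# Reads fine, runs fine once -- but the default list is shared across calls.
def append_tag(tag: str, tags: list = []) -> list:  # bug: mutable default
    tags.append(tag)
    return tags


print(append_tag("ai"))   # ['ai']
print(append_tag("llm"))  # ['ai', 'llm'] -- state leaked from the first call


# The version a careful reviewer has to rewrite it into:
def append_tag_fixed(tag: str, tags: list | None = None) -> list:
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

The suggestion passes a quick eyeball test and quietly shares state between
calls; catching that costs more attention than writing the function yourself.
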
## it's not all bad?

One concession I will make is that LLM code is great for rapid prototyping,
where you don't care at all about code quality and just need something that
holds up for one use and doesn't immediately error out. If you're good at
prompting, you should be able to get an LLM to create shitty prototype code
much faster than a human can. I keep Copilot chat around in my editor for this
exact reason. This is pretty much only useful for toy projects, rudimentary
demos, or simple bash scripts, though. Again, bring LLMs to anything that
wouldn't fit in a 20-minute YouTube tutorial, and they completely fail.

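For a sense of the ceiling, this is about the complexity level I'd trust an
LLM to one-shot: a disposable, run-once script (a made-up example of the
genre, not model output):

```python
import collections
import pathlib
import sys

# One-shot script: print the ten most common words in a file, then get
# deleted. (The default filename "post.md" is purely for illustration.)
path = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else "post.md")
words = path.read_text().lower().split()
for word, count in collections.Counter(words).most_common(10):
    print(f"{count:5d}  {word}")
```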