The only AI product I intentionally use is Copilot, which I adopted about a year ago following a friend's enthusiastic endorsement.[1] For my non-programmer readers: GitHub Copilot is an LLM-based code autocompletion tool integrated into software used for programming (IDEs), not to be confused with Microsoft's general Copilot AI (Microsoft owns GitHub). It automatically suggests code based on your existing code, and you can also prompt it the same way you'd use ChatGPT and friends (enemies?). Under the hood, Copilot uses OpenAI's GPT-3 and GPT-4. Disclaimer: I get Copilot for free because I'm a student.
Some context for this reflection: the majority of my programming these days is small models in Python, basically just moving matrices around (many such cases). It's also, euphemistically, "research code", so it's not exactly what code should aspire to be, and I'm not writing robust tests and infrastructure. On the flip side, research means I often need to write things that are likely uncommon in LLM training data, e.g. a specific complex-valued matrix operation, which gives me a chance to see how the model performs on unseen tasks (spoiler: not well). I don't know how it performs for other purposes and languages, though it's supposed to be best at Python.[2]
Some pros:
Very well integrated into VSCode (my IDE): if you stop typing for a few seconds, it automatically suggests the rest of the line, or the next line/chunk. There's also Copilot Chat, which lets you interact with your codebase and generate more code via prompting. I haven't tried Chat yet, but it seems like it could be useful.
Often good at doing a simple task when you write a comment (prompt) saying what you want to do, like `# get the min and max values of a`, or `# order b using the descending sorted order of a` (roughly what these yield is sketched after this list)
Generally good at completion if you know exactly what you want to do with some common package (e.g. matplotlib) but don't know the exact function call. You could Google it easily, but this is often faster, e.g. plotting a joint histogram of two variables (see the matplotlib sketch after this list)
Good at writing documentation/comments for a function you have already written
Good at writing functions similar to what you have written before (that Copilot has indexed)
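For the curious, here's roughly the kind of completion those comment prompts yield. This is a minimal NumPy sketch I wrote by hand, not Copilot's actual output, and the arrays are just illustrative:

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])
b = np.array([10, 20, 30])

# get the min and max values of a
a_min, a_max = a.min(), a.max()

# order b using the descending sorted order of a
order = np.argsort(a)[::-1]  # indices that sort a, reversed for descending
b_ordered = b[order]         # -> array([10, 30, 20])
```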
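And the joint-histogram case: the exact call is easy to forget but quick for the model to fill in. A hand-written sketch using `plt.hist2d` (one common way to do it; seaborn's `jointplot` is another), with made-up data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + rng.normal(scale=0.5, size=1000)

# joint histogram of two variables
plt.hist2d(x, y, bins=40)
plt.colorbar(label="counts")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```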
Cons:
Bad at mathematical operations beyond basic linear algebra, or more generally at anything that hasn't been done before
Will sometimes make up functions that don’t exist
For more specific/complex tasks, it takes much more involved prompting (breaking the task into small pieces, explaining every step), even with a well-known package, e.g. plotting a list of images in a grid formatted in a particular way (a hand-written version is sketched after this list). In these cases, doing it all myself would probably take a similar amount of time. I suspect I could get better at prompting, but I'd rather just write it myself?
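For reference, the hand-written version of the image-grid task looks something like this. A minimal sketch; the grid shape, figure size, and random placeholder images are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

images = [np.random.rand(28, 28) for _ in range(8)]  # placeholder images
ncols = 4
nrows = -(-len(images) // ncols)  # ceiling division

fig, axes = plt.subplots(nrows, ncols, figsize=(2 * ncols, 2 * nrows), squeeze=False)
for ax, img in zip(axes.flat, images):
    ax.imshow(img, cmap="gray")
for ax in axes.flat:
    ax.axis("off")  # also hides any unused cells
plt.tight_layout()
plt.show()
```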
Overall, I like Copilot, and I think it's a net positive for my work. I have a bad memory, and can never remember whether I should use `dim` or `axis` in NumPy. Copilot can just do stuff like that for me! Even if it only saves me a few seconds, it reduces friction by offloading trivial tasks without context-switching. But it's still just a tool: it's not magic, it doesn't read my mind, and it can't do anything I don't already know how to do, e.g. come up with a cleverer/faster way of writing something. Like other tools, it won't be helpful if you don't already have a good understanding of the task, and using it well takes practice. Nothing verifies correctness for you; using it naively can waste your time instead of saving it.
Although it's clear these tools cannot generate novel ideas, they are great pattern recognizers, which is still useful! It seems like there is potential for them to help research in other ways. Terence Tao thinks AI will help mathematicians formalize proofs in the future.[3] Jordan Ellenberg argues certain types of AI proofs will not actually help our mathematical understanding, but thinks program search via LLMs might offer insight.[4] Tons of people claim Anthropic's new Claude 3.5 Sonnet is really good at code generation. I am curious to see where these tools go in the next few years.
The big question remains: in this year of code completion, have I become reliant on Copilot and gotten worse at programming? The answer, I think, is yes? Well, I have gotten lazier for sure. But maybe that’s okay, for these unambiguous menial tasks? I don’t know the answer exactly, but I’ll just say I am glad LLMs weren’t around when I was learning how to code and write.
Lastly, if it weren’t free, I wouldn’t pay for it. Microsoft doesn’t care about me. But if most people also feel this way, then LLM tools are hard to monetize, and it becomes clear why Big Tech is incentivized to hype up AI.
I didn’t intend this blog to be just AI rants, but somehow almost everything has turned into that. Don’t you worry, dear reader: upcoming posts will be about SCIENCE.
Bonus Copilot comments (colored is mine, gray is generated): [screenshots not reproduced here]
Footnotes:
1. Balint: "It's the best thing ever!"
Me: "But won't you start relying on it and then you'll get worse at programming?"
Balint: "Oh absolutely!!"
2. OpenAI Codex, based on GPT-3, is trained on a lot of Python.
3. "…in the near term, AI will automate the boring, trivial stuff first."
4. From the FunSearch paper: "Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pretrained LLM with a systematic evaluator… In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is."