LLM Foundation for Engineering Managers
It's overwhelming, I know. Every week there is something new in the news. Another big LLM announcement. Is it relevant to me? Does it even matter?
Every week there is another new Large Language Model (LLM). Another breakthrough. Another barrier broken.
And in the meantime, you are in the trenches. You support the engineers on your team, hold 1:1s, set direction, track progress, align with your partners, write promotion documents, update a Performance Improvement Plan (PIP), make a case for additional headcount, interview for an open position. And that’s just your Monday.
You don’t have time to stay up to date on the latest news.
And why should you?
Like it or not, LLMs are changing how your software engineers write code. They are changing how your Product Manager experiments. They are changing how your CEO thinks about the business.
And it can change how you lead.
You must stay up to date.
In this series, you will learn everything that an engineering manager must know about LLMs. You will learn by doing.
Amazon’s CTO, Dr. Werner Vogels, took the time to learn. He took the time to scratch an itch and build something. Because building is the only way to learn.
If he has the time, you can make the time.
What You Will Build
In this series, you will build your own management personal assistant and coach. It will help you:
Prepare for upcoming 1:1 meetings
Coach you ahead of hard conversations
Keep you organized throughout your day
Wait, Gilad, aren’t you a management coach yourself? Aren’t you afraid of going out of a job?
No, I’m not worried. And I hope that by the end of this series, you’ll understand why.
In this week’s tutorial, you will:
Set up your local environment
Learn what LLMs really are and key concepts to know
Build your first tool
Setup
You need a local setup.
You do not want to send queries with sensitive information over the wire.
It doesn’t really matter what a given cloud provider’s policy is. Your company’s IT department can access these queries, and I can easily imagine an internal research team wanting to see what kinds of queries your company runs.
You don’t want them to see that you were thinking about firing Clark or that Lewis has issues at home.
So, let’s get your local environment ready.
Here is an overview of the process:
Install ollama
Download a model
Install a user-interface
Install Ollama
Ollama makes it easy to download, manage, and run LLMs on your machine.
This step is very easy. Just head over to https://ollama.com/download, choose your operating system (Mac, Linux, or Windows) and follow the instructions.
Download a Model
Before we can run an LLM, we need a local copy.
Head over to https://ollama.com/search for a list of available models.
Make sure the list is sorted by popularity (it’s the default)
Look at the small models. That’s the <number>b tag below each model. Most models come in several variants. For example, you can download Google’s gemma3 as either a 1-billion-parameter model, a 4-billion, etc.
As of late May 2025, I recommend starting with Google’s gemma3 or Alibaba’s Qwen3. If you have a beefier machine with 24GB of RAM, you can also try out Microsoft’s phi4 model.
The only downside to downloading several models is disk space. Each model will take up several GBs of storage.
To download the model, open up a terminal and run:
ollama run gemma3
This will download the model (if it’s not already downloaded) and open a new chat session. Type `/bye` to end the session.
Some useful commands to know:
ollama help – shows the list of available commands.
ollama rm model_name – deletes a model and frees up its disk space.
ollama show model_name – shows more information about a model.
By the end of this series, you will understand what all of these parameters mean.
Install a User-Friendly Interface
As an optional step, you can install a program that makes it easier to interact with the LLM. Ollama by default spins up a local server, which makes it easy to integrate with.
I’m on a Mac, so I use Enchanted: https://github.com/gluonfield/enchanted
A powerful, cross-platform alternative is Open WebUI: https://github.com/open-webui/open-webui
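To see what that local server buys you, here is a minimal sketch that sends a prompt to Ollama’s /api/generate endpoint using only Python’s standard library. It assumes Ollama is running locally (it listens on port 11434 by default) and that you have already pulled the model you name; the helper names are mine.

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> dict:
    # JSON body for Ollama's /api/generate endpoint.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON response instead of a token stream
    }

def generate(model: str, prompt: str) -> str:
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with the model pulled):
#   print(generate("gemma3", "In one sentence: what does an engineering manager do?"))
```

Any tool you install in this step, Enchanted included, is talking to that same local server under the hood.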
Everything You Need to Know About LLMs (for now)
Today we will talk about three important properties of LLMs. Each has a surprising and important consequence. We will use all of these properties in the next section when we build the first version of our tool.
They guess the next “word”
They have broad context
They work one “word” at a time
For now, you can ignore why “word” is in quotes. To sidestep the quotes, I will use “token” in the rest of this post. In next week’s post we will cover the difference and why it matters.
They Are Guessing Machines
Large Language Models are guessing machines. They try to guess the next token in the sentence.
And guess here means rolling the dice. It means occasionally going off on weird tangents. That’s what creativity is all about, after all.
You can control this by changing a parameter called “temperature.” A lower value means less creativity. A common default value of 0.7 typically strikes the right balance between creative outputs and complete garbage.
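To build an intuition for what temperature does, here is a toy sketch (not how any particular model is implemented, but the standard mechanism): the model’s raw scores for the candidate next tokens are divided by the temperature before being turned into probabilities, so a low temperature concentrates almost all probability on the top token, while a high temperature spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale the raw scores by 1/temperature, then normalize (softmax).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three candidate next tokens: "dog", "cat", "sofa".
logits = [4.0, 2.0, 1.0]

cold = softmax_with_temperature(logits, 0.2)  # "dog" gets nearly all the probability
warm = softmax_with_temperature(logits, 1.5)  # "cat" and "sofa" get a real chance
```

Sampling from the cold distribution is close to always picking the most likely token; sampling from the warm one is where the weird tangents come from.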
They Have Broad Context
Try to complete this sentence:
“The quick brown fox jumps over the lazy …”
There is a reasonable chance that you said “dog.” It depends on how much experience you have with typography, where this sentence is a famous pangram. A pangram is a sentence that contains every letter of the English alphabet.
LLMs know this sentence because they were trained on a large share of human writing. And because the sentence is so distinctive, the “typography enthusiast” persona is the one answering your question: a fusion of all the blog posts that reference it.
This also means that when you ask a broad question, you don’t know which persona will provide the answer.
When you ask: “I need more storage solutions. What should I do?” you don’t know which persona will answer you. Is it an interior designer? Or a DIY (do-it-yourself) enthusiast? Or an expert carpenter? Maybe a self-storage company owner? Or all of them combined?
You can get a targeted persona by telling the LLM what persona to use. Start your prompt with “You are an X” and you will get a higher quality response.
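In a chat window, you do this by literally opening with “You are an X.” If you are talking to Ollama’s API instead, the conventional place for the persona is a “system” message ahead of your question. A sketch (the helper name is mine):

```python
def with_persona(persona: str, question: str) -> list:
    # The "system" message pins the persona; the "user" message carries the ask.
    return [
        {"role": "system", "content": f"You are {persona}."},
        {"role": "user", "content": question},
    ]

# The same question, two very different answerers:
designer = with_persona("an interior designer",
                        "I need more storage solutions. What should I do?")
carpenter = with_persona("an expert carpenter",
                         "I need more storage solutions. What should I do?")
```

Send either list to Ollama’s /api/chat endpoint and compare the answers you get back.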
They Work One “Word” at a Time
An LLM is guessing the next token, one token at a time.
This means that after the first token, the LLM cannot distinguish between your original prompt and its own answer.
And LLMs are trained to be helpful. They want to do what you ask of them.
What happens when they get extra creative midway through their answer? They have no way of knowing if that creativity was part of your prompt or not. In other words, they think that it was you who came up with that creativity. And they will try to be helpful and do what they think you said.
This is one of the reasons that LLMs hallucinate.
You can minimize this risk by asking for shorter responses.
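If you are calling the API, you can enforce this with a hard cap rather than just asking politely. Ollama’s request options include num_predict, which limits how many tokens the model will generate; the specific numbers and the prompt below are mine.

```python
# Request body for Ollama's /api/generate endpoint with a hard cap on
# response length. Shorter answers give the model fewer chances to wander
# off on a tangent and then build on that tangent as if you asked for it.
payload = {
    "model": "gemma3",
    "prompt": "Give me three talking points for a tough feedback conversation.",
    "stream": False,
    "options": {
        "num_predict": 200,   # stop after at most 200 tokens
        "temperature": 0.7,
    },
}
```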
Build Your First Tool
Let’s scratch our own itch.
You have a tough conversation that you need to have.
The more prepared you feel, the more likely you are to have it.
Now that you have a local LLM, you can ask it to help you prepare.
To keep this first lesson short(-ish), we will have a simple prompt:
“You are an engineering management coach. I am an engineering manager. Help me prepare for an upcoming conversation with an employee that is not meeting expectations. Ask me questions to gather information as needed.”
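If you would rather script this than paste the prompt into a chat window, here is a minimal sketch of a chat loop against Ollama’s /api/chat endpoint. It assumes a local Ollama server and a pulled gemma3 model; appending each turn to the messages list is what lets the coach remember the conversation.

```python
import json
import urllib.request

COACH_PROMPT = (
    "You are an engineering management coach. I am an engineering manager. "
    "Help me prepare for an upcoming conversation with an employee that is "
    "not meeting expectations. Ask me questions to gather information as needed."
)

def chat_once(messages):
    # Send the whole conversation so far and return the model's next reply.
    body = json.dumps(
        {"model": "gemma3", "messages": messages, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

def coach():
    messages = [{"role": "system", "content": COACH_PROMPT}]
    while True:
        user = input("> ")
        if user.strip() == "/bye":  # mirror Ollama's own exit command
            break
        messages.append({"role": "user", "content": user})
        reply = chat_once(messages)
        messages.append({"role": "assistant", "content": reply})
        print(reply)

# Run coach() from a terminal; type /bye to end the session.
```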
How did it go? Hit reply and let me know!
Until next week,
--Gilad.