Code Your First Tech-Manager LLM
You learn best by doing. I'll show you how to roll up your sleeves and start building.
This is the second post in the Tech-Manager LLM series. You can find the first post here:
In today's issue you will roll up your sleeves and start hacking on an actual tool.
Here are the guiding principles for today:
80/20. You will hack together tools that will work for you. There is no attempt to build a production-ready system.
Scratch a real itch. You will create a tool that will make your life as an engineering manager easier.
Learn about AI. Along the way, you will learn a thing or two about how large language models work.
In light of these principles, we'll build a short Python script. There are many great packages that abstract away much of what happens behind the scenes. That won't work for us: we want to learn what's going on.
To set everything up, we'll create a virtual environment (feel free to swap this for conda, uv, or whatever your favorite process is):
python3 -m venv venv
source venv/bin/activate
pip install ollama
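If you haven't used llama3.2 locally before, pull it once up front so your first request doesn't stall on a download (this assumes the Ollama server itself is already installed and running):
ollama pull llama3.2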
Now we are ready to get started.
1. First Attempt
To start off, we will create a simple command-line script that captures our input and sends it to llama3.2. Feel free to choose a different model. The benefit of llama3.2 is that it comes in a small and lightweight version. Google's Gemma 3 is an excellent and newer alternative.
This script is as basic as it gets. We call the ollama.chat method to send a message to the ollama server, and return the response.
The interesting tidbit here is that we set the role parameter to user next to our prompt. Remember it for the next iteration.
import ollama
def chat(prompt: str) -> str:
"""
Send a prompt to Ollama's Llama 3.2 model and return the response.
Args:
prompt (str): The user's input prompt
Returns:
str: The model's response
"""
response = ollama.chat(model='llama3.2', messages=[
{
'role': 'user',
'content': prompt
}
])
return response['message']['content']
def main():
print("Type 'quit' to exit.")
while True:
user_input = input("\nYou: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chat(user_input)
print(f"\nAssistant: {response}")
if __name__ == '__main__':
main()
Here is an example of running this script:
You: What is the capital of France?
Assistant: The capital of France is Paris.
You: In what continent is it?
Assistant: I don't have enough information to determine which continent you are referring to. Could you please provide more context or clarify which specific topic or location you are asking about? I'll do my best to assist you.
You can already see the first problem here, which is that the model cannot remember anything that we told it. Let's fix it now.
2. Adding Memory
How do we capture history? Well, the LLM itself doesn't have any history that we can change. You can think about the model as having only ROM, or Read Only Memory. Any memory that we want to add has to be external to it.
Since remembering chat history is a common need, all chat-based models can accept an array of prompts and responses. You'll see how this actually happens behind the scenes next week, but for now you can accept it as an existing abstraction.
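To make this concrete, here is roughly what that array looks like for our earlier example after one round trip. The exact wording is illustrative; what matters is the shape, a list of role/content dictionaries:
import ollama

messages = [
    {'role': 'user', 'content': 'What is the capital of France?'},
    {'role': 'assistant', 'content': 'The capital of France is Paris.'},
    {'role': 'user', 'content': 'In what continent is it?'},
]

# Sending the whole list gives the model the context it needs
# to resolve "it" in the last question.
response = ollama.chat(model='llama3.2', messages=messages)
print(response['message']['content'])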
Create a ChatBot class to store the conversation history so that we can send all of it to the LLM with each prompt.
import ollama
from typing import List, Dict
class ChatBot:
def __init__(self, model: str = 'llama3.2'):
"""
Initialize the ChatBot with a specific model.
Args:
model (str): The name of the model to use (default: 'llama3.2')
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
def chat(self, prompt: str) -> str:
"""
Send a prompt to the model and return the response.
Also stores both the prompt and response in the conversation history.
Args:
prompt (str): The user's input prompt
Returns:
str: The model's response
"""
# Add user message to history
self.conversation_history.append({
'role': 'user',
'content': prompt
})
# Get response from model
response = ollama.chat(
model=self.model,
messages=self.conversation_history
)
# Extract response content
response_content = response['message']['content']
# Add assistant response to history
self.conversation_history.append({
'role': 'assistant',
'content': response_content
})
return response_content
def main():
print("Type 'quit' to exit.")
chatbot = ChatBot()
while True:
user_input = input("\nYou: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chatbot.chat(user_input)
print(f"\nAssistant: {response}")
if __name__ == '__main__':
main()
Let's try it out:
You: What is the capital of France?
Assistant: The capital of France is Paris.
You: In what continent is it?
Assistant: Paris, the capital of France, is located in the continent of Europe.
It works!
You can now see that there are two separate roles that we set: user for our queries and assistant for the LLM's previous responses.
To build specialized tools, we need to introduce the third role.
3. Giving Instructions
When we write a prompt, we can write it using two separate roles: user and system. Both will be combined and sent to the model together as a single prompt. So why does it matter?
Because the LLM was trained to give the system prompt special consideration. There is nothing inherently magical about the system prompt. It's all in how the foundational model was trained.
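Mechanically, the system prompt is just one more entry in the same messages array, placed at the front with its role set to system. A minimal sketch (the wording here is only an example):
import ollama

messages = [
    # The system message comes first and stays fixed for the whole session.
    {'role': 'system', 'content': 'You are role-playing as David, a software engineer on my team.'},
    # Everything we type and everything the model answers gets appended after it.
    {'role': 'user', 'content': 'Who are you?'},
]

response = ollama.chat(model='llama3.2', messages=messages)
print(response['message']['content'])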
Let's start making a specialized tool to help us prepare for an upcoming conversation with one of our directs, David.
import ollama
from typing import List, Dict, Optional
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None):
"""
Initialize the ChatBot with a specific model and optional system prompt.
Args:
model (str): The name of the model to use (default: 'llama3.2')
system_prompt (Optional[str]): Initial system prompt to guide the model's behavior
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
# Add system prompt if provided
if system_prompt:
self.conversation_history.append({
'role': 'system',
'content': system_prompt
})
def chat(self, prompt: str) -> str:
# ...
def main():
employee = 'David'
me = 'Gilad'
system_prompt = f"""You are role-playing as {employee}.
David is a software engineer on my team.
My name is {me} and I am the team manager.
Do not deviate from role-playing as {employee}.
"""
print("Type 'quit' to exit.")
chatbot = ChatBot(system_prompt=system_prompt)
while True:
user_input = input(f"\n{me}: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chatbot.chat(user_input)
print(f"\n{employee}: {response}")
if __name__ == '__main__':
main()
The only change in the ChatBot class is the addition of an optional system_prompt parameter. If included, it is added to the very start of the conversation history that is sent to the LLM with each prompt. Its role is set to system and it remains constant throughout our interaction with the LLM.
In the system prompt for our tool, we instruct the LLM to role-play as an employee.
This is also our opportunity to "teach" the model relevant information: the name of the employee it should be role-playing, and our name.
So, what does it look like in practice?
Gilad: Who are you?
David: I'm David, one of your team members here. I work primarily with front-end development, focusing on building user interfaces for our applications. How's it going, Gilad?
Gilad: It's going well. What is your biggest challenge this week?
David: To be honest, I've been stuck on a tricky CSS layout issue that's been driving me crazy all week. We're building a new feature for the dashboard and I'm trying to get the navigation menu to stack properly on smaller screens, while maintaining its responsiveness on larger ones. It's been a challenge, but I think I'm close to finding a solution.
It's a nice start. It does have a glaring problem. The LLM completely made up the CSS layout issue! Let's make it better.
4. Adding Context
The LLM made stuff up. This is great! We want it to improvise when simulating a conversation with us. This is also terrible! It shouldn't make up irrelevant situations.
The problem here is on us, not the LLM. We asked it to role-play as David, but we told it very little about David. The LLM understood the relationship between us and that David is a software engineer. It did the best it could with the limited information that it had.
To fix it, let's give it additional information. Since this is background information that should be sent once, we will add it to the system prompt that is at the start of our conversation history.
Where should you get this information? The easiest source is your 1:1 notes document. In the script below, we load a markdown file with our 1:1 notes. Since this is a hacky tool that you are building just for you, you can 100% rely on file naming conventions, at least for now.
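For reference, here is the layout I'm assuming in the script below; the directory and script names are arbitrary, only the david.md convention matters:
one-on-one-bot/
    chatbot.py    # the script below
    david.md      # your exported 1:1 notes for David
With that file in place, here is the updated script: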
import ollama
from typing import List, Dict, Optional
from pathlib import Path
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None, context_file: Optional[str] = None):
"""
Initialize the ChatBot with a specific model, optional system prompt, and optional context file.
Args:
model (str): The name of the model to use (default: 'llama3.2')
system_prompt (Optional[str]): Initial system prompt to guide the model's behavior
context_file (Optional[str]): Path to a markdown file containing additional context
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
# Load context from file if provided
context = ""
if context_file:
try:
context_path = Path(context_file)
if context_path.exists():
context = f"\n<context>\n{context_path.read_text()}\n</context>"
else:
print(f"Warning: Context file {context_file} not found")
except Exception as e:
print(f"Warning: Failed to load context file: {e}")
# Combine system prompt and context
full_prompt = system_prompt + context if system_prompt else context
# Add to conversation history if we have any content
if full_prompt:
self.conversation_history.append({
'role': 'system',
'content': full_prompt
})
def chat(self, prompt: str) -> str:
# ...
def main():
employee = 'David'
me = 'Gilad'
# Get the directory where this script is located
script_dir = Path(__file__).parent
context_file = script_dir / f"{employee.lower()}.md"
system_prompt = f"""You are role-playing as {employee}.
David is a software engineer on my team.
My name is {me} and I am the team manager.
Do not deviate from role-playing as {employee}.
"""
print("Type 'quit' to exit.")
chatbot = ChatBot(
system_prompt=system_prompt,
context_file=str(context_file)
)
while True:
user_input = input(f"\n{me}: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chatbot.chat(user_input)
print(f"\n{employee}: {response}")
if __name__ == '__main__':
main()
Here we added a second optional parameter pointing to a markdown file with the same name as the employee. If it exists, we append the file's contents to the end of the system prompt.
We enclose the context in a <context> XML block. This is 100% not required. Feel free to play around with how you share the context and how that affects how the LLM responds.
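For reference, the system message that now sits at the start of the conversation history looks roughly like this (the notes themselves are elided):
You are role-playing as David.
David is a software engineer on my team.
My name is Gilad and I am the team manager.
Do not deviate from role-playing as David.

<context>
...the full text of david.md goes here...
</context>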
What does our session look like now?
Gilad: Good morning David. How are you doing today?
David: Good morning Gilad! I'm doing well, thanks for asking. Been putting in some extra hours on my current project, trying to meet the deadline. But overall, feeling pretty good about where I am with my tasks and how I can improve as a team lead. What's on your mind today?
Gilad: What is the biggest challenge that you have right now?
David: To be honest, Gilad, it's been a bit of an identity crisis for me lately. As a software engineer, I'm used to focusing on the technical aspects of my work, but as a team lead, I need to take on more leadership and management responsibilities. It's not always easy for me to balance those two roles. I feel like I'm still figuring out how to be effective in both areas, and it's hard to know what skills to prioritize. Do I focus on being a great technical engineer, or do I focus on being a great team lead? It's a bit of an internal conflict, you know? But I think that's a good problem to have, actually. It means I'm growing and developing as a person, which is something I appreciate about this role.
Considering that the david.md file that I used contains months of coaching David how to delegate more, it's a great start!
5. Giving Us Visibility
Before we move any further, let's remember that our goal is to learn more about what's going on behind the scenes. To create more visibility into what you're actually sending to the LLM, let's add some logging.
(There is more happening behind the scenes on the server side. We'll talk about that next week when we enable the LLM to use tools.)
import ollama
from typing import List, Dict, Optional
from pathlib import Path
import logging
from datetime import datetime
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None, context_file: Optional[str] = None):
"""
Initialize the ChatBot with a specific model, optional system prompt, and optional context file.
Args:
model (str): The name of the model to use (default: 'llama3.2')
system_prompt (Optional[str]): Initial system prompt to guide the model's behavior
context_file (Optional[str]): Path to a markdown file containing additional context
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
# Set up logging
self._setup_logging()
self.logger = logging.getLogger('ChatBot')
# Load context from file if provided
context = ""
if context_file:
try:
context_path = Path(context_file)
if context_path.exists():
context = f"\n<context>\n{context_path.read_text()}\n</context>"
self.logger.info(f"Loaded context from file: {context_file}")
else:
self.logger.warning(f"Context file {context_file} not found")
except Exception as e:
self.logger.error(f"Failed to load context file: {e}")
# Combine system prompt and context
full_prompt = system_prompt + context if system_prompt else context
# Add to conversation history if we have any content
if full_prompt:
self.conversation_history.append({
'role': 'system',
'content': full_prompt
})
self.logger.info("System prompt and context loaded into conversation history")
self.logger.debug(f"Full system prompt:\n{full_prompt}")
def _setup_logging(self):
"""Set up logging configuration"""
# Create logs directory if it doesn't exist
log_dir = Path(__file__).parent / "logs"
log_dir.mkdir(exist_ok=True)
# Create a unique log file for this session
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = log_dir / f"chatbot_{timestamp}.log"
# Configure logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_file),
]
)
def chat(self, prompt: str) -> str:
"""
Send a prompt to the model and return the response.
Also stores both the prompt and response in the conversation history.
Args:
prompt (str): The user's input prompt
Returns:
str: The model's response
"""
self.logger.info(f"Received user prompt: {prompt}")
# Add user message to history
self.conversation_history.append({
'role': 'user',
'content': prompt
})
# Get response from model
self.logger.debug("Sending request to Ollama...")
response = ollama.chat(
model=self.model,
messages=self.conversation_history
)
# Extract response content
response_content = response['message']['content']
self.logger.info(f"Received response from model: {response_content[:100]}...")
# Add assistant response to history
self.conversation_history.append({
'role': 'assistant',
'content': response_content
})
return response_content
def main():
# ...
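The script above doesn't do this, but if you want the log to show exactly what crosses the wire, you can dump the full message list as JSON right before each request. A small sketch, assuming the ChatBot class from this section:
import json
import logging

def log_payload(logger: logging.Logger, messages: list) -> None:
    """Log the exact message list that is about to be sent to the Ollama server."""
    logger.debug("Request payload:\n%s", json.dumps(messages, indent=2, ensure_ascii=False))

# Inside ChatBot.chat, right before the ollama.chat call:
#     log_payload(self.logger, self.conversation_history)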
6. Making it Fast
There's one last quality of life improvement for today.
Right now the LLM's response is slow.
At least, it feels slow.
The reason for this is that we only get the response once it is completely done. LLMs work token by token. At each invocation, they go over all of the possible tokens they could emit next. All of them. And for each one, they compute the probability that it should be the next token.
(There are several common parameters to control this part. The temperature parameter controls how likely the LLM is to choose tokens with lower probabilities. The top_k and top_p parameters limit which tokens the LLM will even consider.)
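If you want to experiment with these knobs, the Ollama Python client accepts them per request through an options dictionary. A minimal sketch; the values are just starting points to play with:
import ollama

response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
    options={
        'temperature': 0.7,  # higher values make lower-probability tokens more likely to be picked
        'top_k': 40,         # only consider the 40 most likely tokens at each step
        'top_p': 0.9,        # or the smallest set of tokens whose probabilities sum to 0.9
    },
)
print(response['message']['content'])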
To ensure that the LLM eventually shuts up, one of these tokens is special and signifies the EOM, or End of Message. Right now, we don't get a response over the wire until the LLM outputs this special token.
Let's fix this by instructing the ollama server to return a response token by token.
# Finally, let's add streaming for faster responses
import ollama
from typing import List, Dict, Optional, Generator
from pathlib import Path
import logging
from datetime import datetime
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None, context_file: Optional[str] = None):
# ...
def chat(self, prompt: str) -> Generator[str, None, None]:
"""
Send a prompt to the model and stream the response.
Also stores both the prompt and response in the conversation history.
Args:
prompt (str): The user's input prompt
Yields:
str: Chunks of the model's response as they are generated
"""
self.logger.info(f"Received user prompt: {prompt}")
# Add user message to history
self.conversation_history.append({
'role': 'user',
'content': prompt
})
# Get streaming response from model
self.logger.debug("Sending streaming request to Ollama...")
full_response = ""
for chunk in ollama.chat(
model=self.model,
messages=self.conversation_history,
stream=True
):
if 'message' in chunk and 'content' in chunk['message']:
content = chunk['message']['content']
full_response += content
yield content
# Add complete assistant response to history
self.conversation_history.append({
'role': 'assistant',
'content': full_response
})
self.logger.info(f"Completed streaming response from model")
def main():
employee = 'David'
me = 'Gilad'
# Get the directory where this script is located
script_dir = Path(__file__).parent
context_file = script_dir / f"{employee.lower()}.md"
system_prompt = f"""You are role-playing as {employee}.
David is a software engineer on my team.
My name is {me} and I am the team manager.
Do not deviate from role-playing as {employee}.
"""
print("Type 'quit' to exit.")
chatbot = ChatBot(
system_prompt=system_prompt,
context_file=str(context_file)
)
while True:
user_input = input(f"\n{me}: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
print(f"\n{employee}: ", end="", flush=True)
for chunk in chatbot.chat(user_input):
print(chunk, end="", flush=True)
print()
if __name__ == '__main__':
main()
All we did here was add the optional stream=True when calling ollama. We also used a Python generator and yielded each chunk inside the method.
Try it out, it will feel much faster now.
Summary
You should now have a short and simple Python script that lets you have a conversation with a virtual persona of one of your employees. It gathers information about the employee from your 1:1 notes. Finally, you can look at the logs to understand what is actually being sent to the LLM.
Next week we will learn more about how the roles are actually used on the backend. This information will help you integrate tools with the LLM. This is one of the most powerful features, while also being one of the most misunderstood.
Until next week, hit Reply and let me know what you think so far. What do you like? What do you want me to cover? What is not working for you?