Code Your First Tech-Manager LLM
You learn best by doing. I'll show you how to roll up your sleeves and start building.
This is the second post in the Tech-Manager LLM series. You can find the first post here:
In today's issue you will roll up your sleeves and start hacking on an actual tool.
Here are the guiding principles for today:
80/20. You will hack together tools that will work for you. There is no attempt to build a production-ready system.
Scratch a real itch. You will create a tool that will make your life as an engineering manager easier.
Learn about AI. Along the way, you will learn a thing or two about how large language models work.
In light of these principles, we'll build a short Python script. There are many great packages that abstract away much of what happens behind the scenes. That won't work for us: we want to learn what's going on.
To set everything up, we'll create a virtual environment (feel free to swap this for conda, uv, or whatever your favorite process is):
python3 -m venv venv
source venv/bin/activate
pip install ollama
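If you haven't used llama3.2 locally before, pull it once up front so your first request doesn't stall on a download (this assumes the Ollama server itself is already installed and running):
ollama pull llama3.2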
Now we are ready to get started.
1. First Attempt
To start off, we will create a simple command-line script that captures our input and sends it to llama3.2. Feel free to choose a different model. The benefit of llama3.2 is that it comes in a small and lightweight version. Google's Gemma 3 is an excellent and newer alternative.
This script is as basic as it gets. We call the ollama.chat method to send a message to the ollama server, and return the response.
The interesting tidbit here is that we set the role parameter to user next to our prompt. Remember it for the next iteration.
import ollama
def chat(prompt: str) -> str:
"""
Send a prompt to Ollama's Llama 3.2 model and return the response.
Args:
prompt (str): The user's input prompt
Returns:
str: The model's response
"""
response = ollama.chat(model='llama3.2', messages=[
{
'role': 'user',
'content': prompt
}
])
return response['message']['content']
def main():
print("Type 'quit' to exit.")
while True:
user_input = input("\nYou: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chat(user_input)
print(f"\nAssistant: {response}")
if __name__ == '__main__':
main()
Here is an example of running this script:
You: What is the capital of France?
Assistant: The capital of France is Paris.
You: In what continent is it?
Assistant: I don't have enough information to determine which continent you are referring to. Could you please provide more context or clarify which specific topic or location you are asking about? I'll do my best to assist you.
You can already see the first problem here, which is that the model cannot remember anything that we told it. Let's fix it now.
2. Adding Memory
How do we capture history? Well, the LLM itself doesn't have any history that we can change. You can think about the model as having only ROM, or Read Only Memory. Any memory that we want to add has to be external to it.
Since remembering chat history is a common need, all chat-based models can accept an array of prompts and responses. You'll see how this actually happens behind the scenes next week, but for now you can accept it as an existing abstraction.
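To make this concrete, here is roughly what that array looks like for our earlier example after one round trip. The exact wording is illustrative; what matters is the shape, a list of role/content dictionaries:
import ollama

messages = [
    {'role': 'user', 'content': 'What is the capital of France?'},
    {'role': 'assistant', 'content': 'The capital of France is Paris.'},
    {'role': 'user', 'content': 'In what continent is it?'},
]

# Sending the whole list gives the model the context it needs
# to resolve "it" in the last question.
response = ollama.chat(model='llama3.2', messages=messages)
print(response['message']['content'])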
Create a ChatBot class to store the conversation history so that we can send all of it to the LLM with each prompt.
import ollama
from typing import List, Dict
class ChatBot:
def __init__(self, model: str = 'llama3.2'):
"""
Initialize the ChatBot with a specific model.
Args:
model (str): The name of the model to use (default: 'llama3.2')
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
def chat(self, prompt: str) -> str:
"""
Send a prompt to the model and return the response.
Also stores both the prompt and response in the conversation history.
Args:
prompt (str): The user's input prompt
Returns:
str: The model's response
"""
# Add user message to history
self.conversation_history.append({
'role': 'user',
'content': prompt
})
# Get response from model
response = ollama.chat(
model=self.model,
messages=self.conversation_history
)
# Extract response content
response_content = response['message']['content']
# Add assistant response to history
self.conversation_history.append({
'role': 'assistant',
'content': response_content
})
return response_content
def main():
print("Type 'quit' to exit.")
chatbot = ChatBot()
while True:
user_input = input("\nYou: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chatbot.chat(user_input)
print(f"\nAssistant: {response}")
if __name__ == '__main__':
main()
Let's try it out:
You: What is the capital of France?
Assistant: The capital of France is Paris.
You: In what continent is it?
Assistant: Paris, the capital of France, is located in the continent of Europe.
It works!
You can now see that there are two separate roles that we set: user for our queries and assistant for the LLM's previous responses.
To build specialized tools, we need to introduce the third role.
3. Giving Instructions
When we write a prompt, we can write it using two separate roles: user and system. Both will be combined and sent to the model together as a single prompt. So why does it matter?
Because the LLM was trained to give the system prompt special consideration. There is nothing inherently magical about the system prompt. It's all in how the foundational model was trained.
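Mechanically, the system prompt is just one more entry in the same messages array, placed at the front with its role set to system. A minimal sketch (the wording here is only an example):
import ollama

messages = [
    # The system message comes first and stays fixed for the whole session.
    {'role': 'system', 'content': 'You are role-playing as David, a software engineer on my team.'},
    # Everything we type and everything the model answers gets appended after it.
    {'role': 'user', 'content': 'Who are you?'},
]

response = ollama.chat(model='llama3.2', messages=messages)
print(response['message']['content'])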
Let's start making a specialized tool to help us prepare for an upcoming conversation with one of our directs, David.
import ollama
from typing import List, Dict, Optional
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None):
"""
Initialize the ChatBot with a specific model and optional system prompt.
Args:
model (str): The name of the model to use (default: 'llama3.2')
system_prompt (Optional[str]): Initial system prompt to guide the model's behavior
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
# Add system prompt if provided
if system_prompt:
self.conversation_history.append({
'role': 'system',
'content': system_prompt
})
def chat(self, prompt: str) -> str:
# ...
def main():
employee = 'David'
me = 'Gilad'
system_prompt = f"""You are role-playing as {employee}.
David is a software engineer on my team.
My name is {me} and I am the team manager.
Do not deviate from role-playing as {employee}.
"""
print("Type 'quit' to exit.")
chatbot = ChatBot(system_prompt=system_prompt)
while True:
user_input = input(f"\n{me}: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chatbot.chat(user_input)
print(f"\n{employee}: {response}")
if __name__ == '__main__':
main()
The only change in the ChatBot class is the addition of an optional system_prompt parameter. If included, it is added to the very start of the conversation history that is sent to the LLM with each prompt. Its role is set to system and it remains constant throughout our interaction with the LLM.
In the system prompt for our tool, we instruct the LLM to role-play as an employee.
This is also our opportunity to "teach" the model relevant information: the name of the employee it should be role-playing, and our name.
So, what does it look like in practice?
Gilad: Who are you?
David: I'm David, one of your team members here. I work primarily with front-end development, focusing on building user interfaces for our applications. How's it going, Gilad?
Gilad: It's going well. What is your biggest challenge this week?
David: To be honest, I've been stuck on a tricky CSS layout issue that's been driving me crazy all week. We're building a new feature for the dashboard and I'm trying to get the navigation menu to stack properly on smaller screens, while maintaining its responsiveness on larger ones. It's been a challenge, but I think I'm close to finding a solution.
It's a nice start. It does have a glaring problem. The LLM completely made up the CSS layout issue! Let's make it better.
4. Adding Context
The LLM made stuff up. This is great! We want it to improvise when simulating a conversation with us. This is also terrible! It shouldn't make up irrelevant situations.
The problem here is on us, not the LLM. We asked it to role-play as David, but we told it very little about David. The LLM understood the relationship between us and that David is a software engineer. It did the best it could with the limited information that it had.
To fix it, let's give it additional information. Since this is background information that should be sent once, we will add it to the system prompt that is at the start of our conversation history.
Where should you get this information? The easiest source is your 1:1 notes document. In the script below, we load a markdown file with our 1:1 notes. Since this is a hacky tool that you are building just for you, you can 100% rely on file naming conventions, at least for now.
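For reference, here is the layout I'm assuming in the script below; the directory and script names are arbitrary, only the david.md convention matters:
one-on-one-bot/
    chatbot.py    # the script below
    david.md      # your exported 1:1 notes for David
With that file in place, here is the updated script: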
import ollama
from typing import List, Dict, Optional
from pathlib import Path
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None, context_file: Optional[str] = None):
"""
Initialize the ChatBot with a specific model, optional system prompt, and optional context file.
Args:
model (str): The name of the model to use (default: 'llama3.2')
system_prompt (Optional[str]): Initial system prompt to guide the model's behavior
context_file (Optional[str]): Path to a markdown file containing additional context
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
# Load context from file if provided
context = ""
if context_file:
try:
context_path = Path(context_file)
if context_path.exists():
context = f"\n<context>\n{context_path.read_text()}\n</context>"
else:
print(f"Warning: Context file {context_file} not found")
except Exception as e:
print(f"Warning: Failed to load context file: {e}")
# Combine system prompt and context
full_prompt = system_prompt + context if system_prompt else context
# Add to conversation history if we have any content
if full_prompt:
self.conversation_history.append({
'role': 'system',
'content': full_prompt
})
def chat(self, prompt: str) -> str:
# ...
def main():
employee = 'David'
me = 'Gilad'
# Get the directory where this script is located
script_dir = Path(__file__).parent
context_file = script_dir / f"{employee.lower()}.md"
system_prompt = f"""You are role-playing as {employee}.
David is a software engineer on my team.
My name is {me} and I am the team manager.
Do not deviate from role-playing as {employee}.
"""
print("Type 'quit' to exit.")
chatbot = ChatBot(
system_prompt=system_prompt,
context_file=str(context_file)
)
while True:
user_input = input(f"\n{me}: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
response = chatbot.chat(user_input)
print(f"\n{employee}: {response}")
if __name__ == '__main__':
main()
Here we added a second optional parameter pointing to a markdown file with the same name as the employee. If it exists, we append the file's contents to the end of the system prompt.
We enclose the context in a <context> XML block. This is 100% not required. Feel free to play around with how you share the context and how that affects how the LLM responds.
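For reference, the system message that now sits at the start of the conversation history looks roughly like this (the notes themselves are elided):
You are role-playing as David.
David is a software engineer on my team.
My name is Gilad and I am the team manager.
Do not deviate from role-playing as David.

<context>
...the full text of david.md goes here...
</context>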
What does our session look like now?
Gilad: Good morning David. How are you doing today?
David: Good morning Gilad! I'm doing well, thanks for asking. Been putting in some extra hours on my current project, trying to meet the deadline. But overall, feeling pretty good about where I am with my tasks and how I can improve as a team lead. What's on your mind today?
Gilad: What is the biggest challenge that you have right now?
David: To be honest, Gilad, it's been a bit of an identity crisis for me lately. As a software engineer, I'm used to focusing on the technical aspects of my work, but as a team lead, I need to take on more leadership and management responsibilities. It's not always easy for me to balance those two roles. I feel like I'm still figuring out how to be effective in both areas, and it's hard to know what skills to prioritize. Do I focus on being a great technical engineer, or do I focus on being a great team lead? It's a bit of an internal conflict, you know? But I think that's a good problem to have, actually. It means I'm growing and developing as a person, which is something I appreciate about this role.
Considering that the david.md file that I used contains months of coaching David how to delegate more, it's a great start!
5. Giving Us Visibility
Before we move any further, let's remember that our goal is to learn more about what's going on behind the scenes. To create more visibility into what you're actually sending to the LLM, let's add some logging.
(There is more happening behind the scenes on the server side. We'll talk about that next week when we enable the LLM to use tools.)
import ollama
from typing import List, Dict, Optional
from pathlib import Path
import logging
from datetime import datetime
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None, context_file: Optional[str] = None):
"""
Initialize the ChatBot with a specific model, optional system prompt, and optional context file.
Args:
model (str): The name of the model to use (default: 'llama3.2')
system_prompt (Optional[str]): Initial system prompt to guide the model's behavior
context_file (Optional[str]): Path to a markdown file containing additional context
"""
self.model = model
self.conversation_history: List[Dict[str, str]] = []
# Set up logging
self._setup_logging()
self.logger = logging.getLogger('ChatBot')
# Load context from file if provided
context = ""
if context_file:
try:
context_path = Path(context_file)
if context_path.exists():
context = f"\n<context>\n{context_path.read_text()}\n</context>"
self.logger.info(f"Loaded context from file: {context_file}")
else:
self.logger.warning(f"Context file {context_file} not found")
except Exception as e:
self.logger.error(f"Failed to load context file: {e}")
# Combine system prompt and context
full_prompt = system_prompt + context if system_prompt else context
# Add to conversation history if we have any content
if full_prompt:
self.conversation_history.append({
'role': 'system',
'content': full_prompt
})
self.logger.info("System prompt and context loaded into conversation history")
self.logger.debug(f"Full system prompt:\n{full_prompt}")
def _setup_logging(self):
"""Set up logging configuration"""
# Create logs directory if it doesn't exist
log_dir = Path(__file__).parent / "logs"
log_dir.mkdir(exist_ok=True)
# Create a unique log file for this session
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
log_file = log_dir / f"chatbot_{timestamp}.log"
# Configure logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_file),
]
)
def chat(self, prompt: str) -> str:
"""
Send a prompt to the model and return the response.
Also stores both the prompt and response in the conversation history.
Args:
prompt (str): The user's input prompt
Returns:
str: The model's response
"""
self.logger.info(f"Received user prompt: {prompt}")
# Add user message to history
self.conversation_history.append({
'role': 'user',
'content': prompt
})
# Get response from model
self.logger.debug("Sending request to Ollama...")
response = ollama.chat(
model=self.model,
messages=self.conversation_history
)
# Extract response content
response_content = response['message']['content']
self.logger.info(f"Received response from model: {response_content[:100]}...")
# Add assistant response to history
self.conversation_history.append({
'role': 'assistant',
'content': response_content
})
return response_content
def main():
# ...
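The script above doesn't do this, but if you want the log to show exactly what crosses the wire, you can dump the full message list as JSON right before each request. A small sketch, assuming the ChatBot class from this section:
import json
import logging

def log_payload(logger: logging.Logger, messages: list) -> None:
    """Log the exact message list that is about to be sent to the Ollama server."""
    logger.debug("Request payload:\n%s", json.dumps(messages, indent=2, ensure_ascii=False))

# Inside ChatBot.chat, right before the ollama.chat call:
#     log_payload(self.logger, self.conversation_history)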
6. Making it Fast
There's one last quality of life improvement for today.
Right now the LLM's response is slow.
At least, it feels slow.
The reason for this is that we only get the response once it is completely done. LLMs work token by token. At each invocation, they go over all of the possible tokens they could emit next. All of them. And for each one, they compute the probability that it should be the next token.
(There are several common parameters to control this part. The temperature parameter controls how likely the LLM is to choose tokens with lower probabilities. The top_k and top_p parameters limit which tokens the LLM will even consider.)
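If you want to experiment with these knobs, the Ollama Python client accepts them per request through an options dictionary. A minimal sketch; the values are just starting points to play with:
import ollama

response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
    options={
        'temperature': 0.7,  # higher values make lower-probability tokens more likely to be picked
        'top_k': 40,         # only consider the 40 most likely tokens at each step
        'top_p': 0.9,        # or the smallest set of tokens whose probabilities sum to 0.9
    },
)
print(response['message']['content'])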
To ensure that the LLM eventually shuts up, one of these tokens is special and signifies the EOM, or End of Message. Right now, we don't get a response over the wire until the LLM outputs this special token.
Let's fix this by instructing the ollama server to return a response token by token.
# Finally, let's add streaming for faster responses
import ollama
from typing import List, Dict, Optional, Generator
from pathlib import Path
import logging
from datetime import datetime
class ChatBot:
def __init__(self, model: str = 'llama3.2', system_prompt: Optional[str] = None, context_file: Optional[str] = None):
# ...
def chat(self, prompt: str) -> Generator[str, None, None]:
"""
Send a prompt to the model and stream the response.
Also stores both the prompt and response in the conversation history.
Args:
prompt (str): The user's input prompt
Yields:
str: Chunks of the model's response as they are generated
"""
self.logger.info(f"Received user prompt: {prompt}")
# Add user message to history
self.conversation_history.append({
'role': 'user',
'content': prompt
})
# Get streaming response from model
self.logger.debug("Sending streaming request to Ollama...")
full_response = ""
for chunk in ollama.chat(
model=self.model,
messages=self.conversation_history,
stream=True
):
if 'message' in chunk and 'content' in chunk['message']:
content = chunk['message']['content']
full_response += content
yield content
# Add complete assistant response to history
self.conversation_history.append({
'role': 'assistant',
'content': full_response
})
self.logger.info(f"Completed streaming response from model")
def main():
employee = 'David'
me = 'Gilad'
# Get the directory where this script is located
script_dir = Path(__file__).parent
context_file = script_dir / f"{employee.lower()}.md"
system_prompt = f"""You are role-playing as {employee}.
David is a software engineer on my team.
My name is {me} and I am the team manager.
Do not deviate from role-playing as {employee}.
"""
print("Type 'quit' to exit.")
chatbot = ChatBot(
system_prompt=system_prompt,
context_file=str(context_file)
)
while True:
user_input = input(f"\n{me}: ")
if user_input.lower() == 'quit':
print("Goodbye!")
break
print(f"\n{employee}: ", end="", flush=True)
for chunk in chatbot.chat(user_input):
print(chunk, end="", flush=True)
print()
if __name__ == '__main__':
main()
All we did here was add the optional stream=True when calling ollama. We also used a Python generator and yielded each chunk inside the method.
Try it out, it will feel much faster now.
Summary
You should now have a short and simple Python script that lets you have a conversation with a virtual persona of one of your employees. It gathers information about the employee from your 1:1 notes. Finally, you can look at the logs to understand what is actually being sent to the LLM.
Next week we will learn more about how the roles are actually used on the backend. This information will help you integrate tools with the LLM. This is one of the most powerful features, while also being one of the most misunderstood.
Until next week, hit Reply and let me know what you think so far. What do you like? What do you want me to cover? What is not working for you?