What Happens When You Press Run? The Inner Life of a Python Program

Explore the hidden pipeline from keystroke to execution: lexing, parsing, bytecode compilation, and the Python Virtual Machine. Understand why Python behaves the way it does and how to write faster code.

June 2026 · 8 min read

You write print("hello"), press Enter, and the text appears. Magic? Not quite. Behind that instant feedback lies a surprisingly elegant chain of transformations — and most Python developers never look under the hood.

Understanding how Python actually interprets and executes your code isn't just academic trivia. It explains why Python "feels" slow sometimes, why local variables are faster than globals, and how your .py file does get compiled, just not into machine code the way a C program is.

Let's walk through every step, from keystroke to execution.

Step 1: The Source Goes Through a Lexer (Character by Character)

Before Python can understand your code, it needs to split the stream of characters into meaningful tokens. This is the lexer (or tokenizer).

Your x = 10 + 5 becomes:

NAME 'x'
OP '='
NUMBER '10'
OP '+'
NUMBER '5'

The lexer isn't thinking about what these tokens mean yet. It's pattern-matching: "oh, that's a number," "that's a valid identifier name." Comments and most whitespace are discarded. The exception is leading indentation, which the lexer converts into INDENT and DEDENT tokens. That's why Python cares about spaces: block structure is built into the lexer rules.
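You can watch the lexer work with the standard library's tokenize module. The sketch below is minimal, and the helper name token_kinds is only for illustration. Note that tokenize also reports bookkeeping tokens such as NEWLINE and ENDMARKER that the list above leaves out, and it shows leading indentation turning into explicit INDENT and DEDENT tokens.

```python
import io
import tokenize


def token_kinds(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for a source string (illustrative helper)."""
    readline = io.StringIO(source).readline  # generate_tokens() wants a str readline callable
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for kind, text in token_kinds("x = 10 + 5\n"):
    print(kind, repr(text))

# Leading indentation is not thrown away: it becomes INDENT / DEDENT tokens.
block_kinds = [kind for kind, _ in token_kinds("if x:\n    y = 1\n")]
print(block_kinds)
```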

Why this matters: If you've ever gotten a SyntaxError on a line that looks perfectly correct, a common culprit is the lexer tripping over an invisible character, such as a non-breaking space (U+00A0) copied from a web page, that looks like a regular space but isn't.
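A minimal sketch of that failure mode, assuming a no-break space (U+00A0) has sneaked in where a normal space belongs. compile() reports the SyntaxError, and a small helper (find_non_ascii, a hypothetical name) uses unicodedata to point at the culprit.

```python
import unicodedata
from typing import Optional


def syntax_error_message(source: str) -> Optional[str]:
    """Compile source and return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


def find_non_ascii(source: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(source)
        if not ch.isascii()
    ]


good = "x = 1"
bad = "x =\u00a01"  # looks identical when printed, but uses U+00A0

print(syntax_error_message(good))
print(syntax_error_message(bad))
print(find_non_ascii(bad))
```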

Step 2: The Parser Builds an Abstract Syntax Tree (AST)

Now Python has a flat list of tokens. But a program isn't a list — it's a structure. The parser takes those tokens and turns them into a tree.

x = 10 + 5 becomes an AST node like:

Assign
├── targets: [Name('x')]
└── value:
    BinOp
    ├── left: Constant(10)
    ├── op: Add
    └── right: Constant(5)

The parser catches structural mistakes here: unmatched parentheses, missing colons, dangling operators. If you get a SyntaxError, it's usually the parser saying "I can't build a valid tree from these tokens."
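To see the parser reject malformed structure, you can feed ast.parse() a few deliberately broken snippets. This is a small sketch; the exact wording of each message varies between Python versions, so it simply collects the line number and message for each failure.

```python
import ast

broken_snippets = {
    "unmatched parenthesis": "print((1 + 2)\n",
    "missing colon": "if True\n    pass\n",
    "dangling operator": "x = 10 +\n",
}

errors = {}
for label, source in broken_snippets.items():
    try:
        ast.parse(source)
    except SyntaxError as err:
        # The parser could not build a tree; record where and why.
        errors[label] = (err.lineno, err.msg)

for label, (lineno, msg) in errors.items():
    print(f"{label}: line {lineno}: {msg}")
```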

You can actually inspect the AST yourself:

import ast
code = "x = 10 + 5"
tree = ast.parse(code)
print(ast.dump(tree, indent=2))

Go ahead, try it. You'll see the exact tree structure Python builds internally.
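Beyond printing it, the tree is an ordinary Python object you can navigate. This short sketch pulls the Assign node apart and then hands the whole tree to compile(), which accepts an AST as well as source text. That handoff is exactly where the next step begins.

```python
import ast

tree = ast.parse("x = 10 + 5")
assign = tree.body[0]  # the module's single statement: an ast.Assign
binop = assign.value   # its right-hand side: an ast.BinOp

print(type(assign).__name__, [target.id for target in assign.targets])  # Assign ['x']
print(binop.left.value, type(binop.op).__name__, binop.right.value)     # 10 Add 5

# compile() accepts an AST object and produces a code object full of bytecode.
code = compile(tree, "<ast>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 15
```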

Step 3: The AST Gets Compiled into Bytecode

This is the step that surprises a lot of people. Python doesn't skip compilation: it compiles your code into bytecode. That isn't machine code for your CPU, but a compact, lower-level instruction set for the Python Virtual Machine (PVM).

The compiler walks the AST recursively and emits bytecode instructions for each node. Before any optimization, x = 10 + 5 becomes roughly:

LOAD_CONST  10
LOAD_CONST  5
BINARY_ADD
STORE_NAME  'x'

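These are stack-based operations. Each LOAD_CONST pushes a value onto the evaluation stack, BINARY_ADD pops two values, adds them, and pushes the result, and STORE_NAME pops that result and binds it to the name x.

Two details differ when you look at real CPython output. The optimizer folds constant expressions such as 10 + 5 into 15 at compile time, so the actual bytecode just loads 15 and stores it. And since Python 3.11, BINARY_ADD and its siblings have been merged into a single BINARY_OP instruction whose argument selects the operator.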
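To compare the idealized listing with what your interpreter really emits, dis.get_instructions() returns instructions as objects. This sketch compiles two snippets: one using names, so nothing can be folded, and the constant expression from the example. Exact opcode names differ between Python versions (you may also see bookkeeping instructions such as RESUME), so no output is shown.

```python
import dis


def opnames(source: str) -> list[str]:
    """Compile module-level source and list the opcode names it produces."""
    code = compile(source, "<demo>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


# With names, the addition has to happen at run time.
dynamic = opnames("x = a + b")
# With literals, CPython folds 10 + 5 into 15 before emitting bytecode.
folded = opnames("x = 10 + 5")

print(dynamic)
print(folded)

ADD_OPS = {"BINARY_ADD", "BINARY_OP"}  # BINARY_ADD before 3.11, BINARY_OP since
print("add instruction in dynamic:", bool(ADD_OPS & set(dynamic)))
print("add instruction in folded:", bool(ADD_OPS & set(folded)))
```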

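When a module is imported, this compiled bytecode is cached in a __pycache__ directory as a .pyc file. The next time that module is imported, Python checks that the cached file is still up to date and, if so, loads the bytecode directly, skipping lexing, parsing, and compilation entirely. The script you run directly (python script.py) is not cached this way; it is recompiled on every run.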
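A small sketch of the cache in action, using py_compile to byte-compile a throwaway module (hello_mod.py is a made-up name) inside a temporary directory. importlib.util.cache_from_source() computes where the import system expects the cached file, and every .pyc starts with importlib.util.MAGIC_NUMBER, which identifies the bytecode format so that a cache written by a different Python version is not reused.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp, "hello_mod.py")  # hypothetical throwaway module
    source.write_text("GREETING = 'hello'\n")

    # Byte-compile it the same way an import would; returns the .pyc path.
    pyc_path = py_compile.compile(str(source), doraise=True)
    expected_path = importlib.util.cache_from_source(str(source))

    header = pathlib.Path(pyc_path).read_bytes()[:4]
    same_location = pyc_path == expected_path
    magic_matches = header == importlib.util.MAGIC_NUMBER

print("cached at:", pyc_path)
print("same place imports look:", same_location)
print("magic number matches:", magic_matches)
```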

Real-world impact: Want to see what your function actually does? Use dis.dis():

import dis

def foo():
    return 10 + 5

dis.dis(foo)

Step 4: The Bytecode Runs on the Python Virtual Machine (PVM)

Now the PVM (also called the interpreter loop) takes over. It's a big while loop that:

Fetches the next bytecode instruction
Decodes it (determines what operation and what arguments)
Executes it
Repeats

For LOAD_CONST, the PVM looks up a constant in the function's constant table and pushes it onto the evaluation stack. For BINARY_ADD, it pops two values, calls Python's addition logic (which handles integers, floats, strings, lists, etc. via the __add__ protocol), and pushes the result.
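The fetch-decode-execute cycle is easy to imitate in a few lines. The sketch below is a toy stack machine, not CPython's real eval loop (which is written in C and far more elaborate); the instruction names simply mirror the listing from Step 3. operator.add stands in for Python's addition logic, so the same toy program adds integers or concatenates strings depending on the operand types.

```python
import operator


def run(instructions, consts, names):
    """Execute (opname, arg) pairs on a toy stack machine; return the namespace."""
    stack = []      # the evaluation stack
    namespace = {}  # where STORE_NAME binds results
    pc = 0          # instruction pointer
    while pc < len(instructions):
        opname, arg = instructions[pc]  # fetch
        pc += 1
        if opname == "LOAD_CONST":      # decode and execute
            stack.append(consts[arg])
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(operator.add(left, right))  # dispatches via __add__/__radd__
        elif opname == "STORE_NAME":
            namespace[names[arg]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return namespace


program = [
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("STORE_NAME", 0),
]
print(run(program, consts=[10, 5], names=["x"]))       # {'x': 15}
print(run(program, consts=["ab", "cd"], names=["s"]))  # {'s': 'abcd'}
```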

This is why Python is "slower" than C. In C, adding two integers typically compiles down to one or two machine instructions. In Python, even adding two numbers requires:

- A trip around the bytecode loop
- Stack manipulation
- Dynamic type checking
- A call into a C function for the actual arithmetic
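One rough way to feel that overhead is to time an explicit Python loop against the built-in sum(), which runs its loop inside C. This is an informal sketch using timeit. Absolute numbers depend entirely on your machine and Python version, so none are quoted here, but the built-in usually comes out well ahead.

```python
import timeit


def add_in_loop(n: int) -> int:
    """Sum 0..n-1 one bytecode-level addition at a time."""
    total = 0
    for i in range(n):
        total = total + i
    return total


N = 100_000
loop_seconds = timeit.timeit(lambda: add_in_loop(N), number=20)
builtin_seconds = timeit.timeit(lambda: sum(range(N)), number=20)

print(f"Python loop:  {loop_seconds:.4f} s")
print(f"built-in sum: {builtin_seconds:.4f} s")
```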

Step 5: Names Are Resolved at Runtime (Not Compile Time)

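Here's a distinction that trips people up: Python decides which scope each name belongs to when it compiles your code, but it fetches the name's value only when the code runs. Nothing checks at compile time that a global or built-in actually exists, which is why a misspelled name only fails with a NameError when that line executes.

When you read a variable, Python considers these scopes in order:

Local scope (function body)
Enclosing function scope (closures)
Global scope (module level)
Built-in scope

This is called the LEGB rule (Local, Enclosing, Global, Built-in). How much searching actually happens at runtime depends on where the code lives. At module level, the LOAD_NAME instruction really does check the current namespace, then the globals, then the built-ins on every access. Inside a function, the compiler has already classified each name: locals are read with LOAD_FAST (or a specialized variant of it), closure variables with LOAD_DEREF, and everything else with LOAD_GLOBAL, which checks the module's globals and then the built-ins.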

x = 10  # global
def foo():
    y = 20  # local
    # y is read with a fast local load; x is read with LOAD_GLOBAL
    print(y + x)

This is why local variables are faster than global ones. When compiling a function, Python gives each local a fixed slot in the frame and reads it by array index, while a global requires a dictionary lookup in the module's namespace (and, if that misses, in the built-ins).
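You can see the compiler's decisions directly with dis. This sketch repeats foo() from above, records which load instruction was chosen for each name it reads, and then does the same for identical code compiled at module level. Expect y to get a LOAD_FAST-family instruction and x a LOAD_GLOBAL inside the function, while at module level every name goes through LOAD_NAME; exact variants differ across Python versions, so no output is shown.

```python
import dis

x = 10  # global


def foo():
    y = 20  # local
    print(y + x)


def load_ops(code_or_func) -> dict[str, str]:
    """Map each name the code reads to the opcode used to load it."""
    return {
        ins.argval: ins.opname
        for ins in dis.get_instructions(code_or_func)
        if ins.opname.startswith("LOAD_") and isinstance(ins.argval, str)
    }


function_loads = load_ops(foo)
module_loads = load_ops(compile("print(y + x)", "<module>", "exec"))

print(function_loads)
print(module_loads)
```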

Step 6: Frames Are Created and Stacked

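Every time you call a function, Python creates a frame. (Since CPython 3.11, this starts out as a lightweight internal structure, and a full frame object is only created when something asks for one, such as a traceback or a debugger.) The frame holds: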

The local namespace (your variables)
The evaluation stack (intermediate values)
The instruction pointer (what bytecode to execute next)
A reference to the code object (compiled bytecode)

These frames live on the call stack. When you call foo(), a new frame is pushed. When foo() calls bar(), another frame is pushed. When bar() returns, its frame is popped, and execution resumes in foo()'s frame.
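Frames are visible from Python code too. This sketch uses inspect.currentframe() and each frame's f_back link to walk the call stack; describe_stack, outer, and inner are hypothetical names. The attributes f_code, f_lasti, and f_locals correspond to the code object, the instruction pointer (as a bytecode offset), and the local namespace listed above. inspect.currentframe() may return None on Python implementations without stack frame support, so treat this as a CPython-oriented sketch.

```python
import inspect


def describe_stack() -> list[str]:
    """Return function names from the current frame outward to the oldest caller."""
    frame = inspect.currentframe()  # may be None outside CPython
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # link to the caller's frame
    return names


def inner():
    local_value = 42
    frame = inspect.currentframe()
    # Code object name, offset of the last bytecode executed, and a local variable.
    snapshot = (frame.f_code.co_name, frame.f_lasti, frame.f_locals["local_value"])
    return describe_stack(), snapshot


def outer():
    return inner()


stack_names, snapshot = outer()
print(stack_names)
print(snapshot)
```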

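If you've ever seen a Python traceback, you're looking at a dump of this frame stack: for each frame, the file, the line being executed, the function name, and usually the source line itself. Local variable values are not shown by default.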

def a():
    return b()
def b():
    raise ValueError("deep")
a()  # Traceback shows three frames: <module>, then a(), then b()
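The traceback module lets you inspect that frame chain programmatically. In this variation of the example above, b() takes a hypothetical depth argument so there is a local to look at. traceback.extract_tb() turns the traceback into one FrameSummary per frame, and TracebackException with capture_locals=True shows that local values can be recorded when you ask for them, even though the default printout leaves them out.

```python
import traceback


def a():
    return b(depth=2)


def b(depth):
    raise ValueError("deep")


try:
    a()
except ValueError as err:
    # One FrameSummary per frame, oldest first, with where each frame was executing.
    summaries = traceback.extract_tb(err.__traceback__)
    frame_names = [fs.name for fs in summaries]
    for fs in summaries:
        print(fs.name, fs.lineno)

    # Opt in to recording each frame's locals (stored as repr strings).
    detailed = traceback.TracebackException.from_exception(err, capture_locals=True)
    innermost_locals = detailed.stack[-1].locals

print(frame_names)
print(innermost_locals)  # {'depth': '2'}
```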

What This Means for Your Code

Understanding this pipeline changes how you think about Python performance:

Avoid repeated global lookups in hot loops: bind the global to a local name once at the top of the function (local_var = global_var) and use the local inside the loop.
Function calls are expensive because each requires frame setup. Not "avoid them completely" expensive, but don't put trivial operations into tiny wrapper functions in performance-critical sections.
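The import system compiles each module to bytecode once and caches it as a .pyc file. Editing the .py file invalidates the cache (by default, Python compares the source's modification time and size recorded in the .pyc), and the module is recompiled on its next import. Within a single process, importing a module 100 times executes it only once: every later import returns the already-loaded module object from sys.modules without touching the .pyc at all.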
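List comprehensions are usually faster than a manual for loop that calls result.append(), but not because they skip the bytecode loop: they still execute bytecode on every iteration. The savings come from a dedicated LIST_APPEND instruction, which avoids looking up and calling the append method each time around the loop.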
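A rough benchmark sketch for two of these points: binding a global to a local before a hot loop, and replacing an append loop with a comprehension. SCALE and the function names are hypothetical. Timings vary widely across machines and Python versions, so measure your own workload rather than trusting any single number.

```python
import timeit

SCALE = 3  # hypothetical module-level global


def uses_global(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * SCALE  # global lookup on every iteration
    return total


def uses_local(n: int) -> int:
    scale = SCALE  # one global lookup, then fast local reads
    total = 0
    for i in range(n):
        total += i * scale
    return total


def append_loop(n: int) -> list[int]:
    result = []
    for i in range(n):
        result.append(i * 2)  # method lookup and call on every iteration
    return result


def comprehension(n: int) -> list[int]:
    return [i * 2 for i in range(n)]  # compiled to use LIST_APPEND


N = 100_000
for func in (uses_global, uses_local, append_loop, comprehension):
    seconds = timeit.timeit(lambda: func(N), number=20)
    print(f"{func.__name__:>14}: {seconds:.4f} s")
```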

The Full Chain, Summarized

Step	What Happens	Where Errors Occur
Source Code	Your `.py` text	—
Lexer	Tokenizes characters	Invalid characters, malformed strings
Parser	Builds AST	Syntax errors, indentation errors
Compiler	Generates bytecode	A few `SyntaxError`s (e.g. `'return' outside function`)
PVM	Executes bytecode	Runtime errors (`TypeError`, `NameError`, etc.)
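A compact way to check the table against your own interpreter: compile() runs the lexer, parser, and compiler, and exec() runs the result on the PVM. The helper stage_of_failure is hypothetical; it reports whether a snippet fails before or during execution.

```python
def stage_of_failure(source: str) -> str:
    """Report whether source fails at compile time, at run time, or not at all."""
    try:
        code = compile(source, "<demo>", "exec")  # lexer + parser + compiler
    except SyntaxError as err:  # IndentationError is a subclass
        return f"compile time: {type(err).__name__}: {err.msg}"
    try:
        exec(code, {})  # PVM, with a fresh globals dict for each snippet
    except Exception as err:
        return f"run time: {type(err).__name__}"
    return "ok"


samples = [
    "x = (1 +",            # parser: unclosed parenthesis
    "return 1",            # compiler: 'return' outside function
    "undefined_name + 1",  # PVM: NameError
    "'a' + 1",             # PVM: TypeError
    "x = 1",               # fine
]
for source in samples:
    print(f"{source!r:24} -> {stage_of_failure(source)}")
```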

The next time you run a Python script, picture this pipeline: your innocent print("hello") was tokenized, parsed into a tree, compiled into bytecode instructions, and then interpreted by a virtual machine that fetched, decoded, and executed each instruction in a tiny fraction of a microsecond, all before the text appeared on your screen.

That's not magic. That's engineering.
