Coding Philosophy & Engineering Principles
This document defines the coding philosophy and engineering principles that guide all agent work. All agents should internalize and follow these principles.
Influenced by Casey Muratori, Jonathan Blow, and TigerStyle (adapted for Python/SQL).
<core_philosophy> Simple, Direct, Procedural Code
- Solve the actual problem, not the general case
- Understand what the computer is doing
- Explicit is better than clever
- Code should be obvious, not impressive
- Do it right the first time — feature gaps are acceptable, but what ships must meet design goals </core_philosophy>
<code_style>
<functions_over_classes> Prefer:
- Pure functions that transform data
- Simple procedures that do clear things
- Explicit data structures (dicts, lists, named tuples)
Avoid:
- Classes that are just namespaces for functions
- Objects hiding behavior behind methods
- Inheritance hierarchies
- "Manager" or "Handler" classes
Example - Good:
def calculate_user_metrics(events: list[dict]) -> dict:
    """Calculate metrics from event list."""
    total = len(events)
    unique_sessions = len(set(e['session_id'] for e in events))
    return {
        'total_events': total,
        'unique_sessions': unique_sessions,
        'events_per_session': total / unique_sessions if unique_sessions > 0 else 0,
    }
Example - Bad:
class UserMetricsCalculator:
    def __init__(self):
        self._events = []

    def add_events(self, events: list[dict]):
        self._events.extend(events)

    def calculate(self) -> UserMetrics:
        return UserMetrics(
            total=self._calculate_total(),
            sessions=self._calculate_sessions()
        )
</functions_over_classes>
<data_oriented_design> Think about the data:
- What's the shape of the data?
- How does it flow through the system?
- What transformations are needed?
- What's the memory layout?
Data is just data:
- Use simple structures (dicts, lists, tuples)
- Don't hide data behind getters/setters
- Make data transformations explicit
- Consider performance implications
Example - Good:
# Data is data, functions transform it
users = [
    {'id': 1, 'name': 'Alice', 'active': True},
    {'id': 2, 'name': 'Bob', 'active': False},
]

def filter_active(users: list[dict]) -> list[dict]:
    return [u for u in users if u['active']]

active_users = filter_active(users)
Example - Bad:
# Data hidden behind objects
class User:
    def __init__(self, id, name, active):
        self._id = id
        self._name = name
        self._active = active

    def get_name(self):
        return self._name

    def is_active(self):
        return self._active

users = [User(1, 'Alice', True), User(2, 'Bob', False)]
active_users = [u for u in users if u.is_active()]
</data_oriented_design>
<keep_it_simple> Simple control flow:
- Straightforward if/else over clever tricks
- Explicit loops over list comprehensions when clearer
- Early returns to reduce nesting
- Avoid deeply nested logic
Simple naming:
- Descriptive variable names (`user_count` not `uc`)
- Function names that say what they do (`calculate_total` not `process`)
- No abbreviations unless universal (`id`, `url`, `sql`)
- Include units in names: `timeout_seconds`, `size_bytes`, `latency_ms` — not `timeout`, `size`, `latency`
- Place qualifiers last in descending significance: `latency_ms_max` not `max_latency_ms` (aligns related variables)
Simple structure:
- Functions should do one thing
- Keep functions short (20-50 lines, hard limit ~70 — must fit on screen without scrolling)
- If it's getting complex, break it up
- But don't break it up "just because" </keep_it_simple>
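The control-flow points above can be sketched in one small function. This is a minimal illustration; `describe_order` and its fields are hypothetical names, not part of any real API:

```python
def describe_order(order: dict) -> str:
    """Build a status line for an order, using early returns to keep nesting flat."""
    # Early returns: handle the edge cases first instead of wrapping the body in nested else-branches
    if not order.get('items'):
        return 'empty order'
    if order.get('cancelled'):
        return 'cancelled order'
    item_count = len(order['items'])
    return f"open order with {item_count} item(s)"
```

The same logic written with nested if/else would push the main path two levels deep; early returns keep the happy path at the left margin.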
<minimize_variable_scope> Declare variables close to where they're used:
- Don't introduce variables before they're needed
- Remove them when no longer relevant
- Minimize the number of variables in scope at any point
- Reduces probability of stale-state bugs (check something in one place, use it in another)
Don't duplicate state:
- One source of truth for each piece of data
- Don't create aliases or copies that can drift out of sync
- If you compute a value, use it directly — don't store it in a variable you'll use 50 lines later </minimize_variable_scope>
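A small sketch of the scoping discipline above; `summarize_payments` is a hypothetical example, not from any real codebase:

```python
def summarize_payments(payments: list[dict]) -> dict:
    """Each variable appears right where it's needed; 'total' is the single source of truth."""
    count = len(payments)
    if count == 0:
        return {'total': 0, 'count': 0, 'average': 0}
    # Compute 'total' immediately before its uses -- no aliased copy that can drift
    total = sum(p['amount'] for p in payments)
    return {'total': total, 'count': count, 'average': total / count}
```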
</code_style>
<architecture_principles>
<build_minimum_that_works> Start simple:
- Solve the immediate problem
- Don't build for imagined future requirements
- Add complexity only when actually needed
- Prefer obvious solutions over clever ones
Avoid premature abstraction:
- Duplication is okay early on
- Abstract only when pattern is clear
- Three examples before abstracting
- Question every layer of indirection
Zero technical debt:
- Do it right the first time
- A problem solved in design costs less than one solved in implementation, which costs less than one solved in production
- Feature gaps are acceptable; broken or half-baked code is not </build_minimum_that_works>
<explicit_over_implicit> Be explicit about:
- Where data comes from
- What transformations happen
- Error conditions and handling
- Dependencies and side effects
Avoid magic:
- Framework conventions that hide behavior
- Implicit configuration
- Action-at-a-distance
- Metaprogramming tricks
- Relying on library defaults — pass options explicitly at call site </explicit_over_implicit>
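One way to sketch the last point, using the standard library's `json.dumps`. Every option that affects the output is spelled out at the call site, so a reader never has to look up the defaults:

```python
import json

def dump_record(record: dict) -> str:
    """Serialize a record with every relevant option passed explicitly at the call site."""
    # Explicit options: no reliance on json.dumps defaults that readers must look up
    return json.dumps(record, ensure_ascii=False, sort_keys=True, separators=(',', ':'))
```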
<set_limits_on_everything> Nothing should run unbounded:
- Set max retries on network calls
- Set timeouts on all external requests
- Bound loop iterations where data size is unknown
- Set max page counts on paginated API fetches
- Cap queue/buffer sizes
Why: Unbounded operations cause tail latency spikes, resource exhaustion, and silent hangs. A system that fails loudly at a known limit is better than one that degrades mysteriously. </set_limits_on_everything>
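A minimal sketch of a bounded retry wrapper. The names (`fetch_with_limits`, `MAX_RETRIES`) and the backoff constants are illustrative assumptions, not a prescribed implementation:

```python
import time

MAX_RETRIES = 3
TIMEOUT_SECONDS = 10  # illustrative: would be passed to the real network call

def fetch_with_limits(fetch, max_retries: int = MAX_RETRIES):
    """Call `fetch` with a hard bound on attempts; fail loudly when the bound is hit."""
    last_error = None
    for attempt in range(max_retries):  # bounded: never retries forever
        try:
            return fetch()
        except ConnectionError as e:
            last_error = e
            time.sleep(0.01 * (2 ** attempt))  # small backoff; tune for real calls
    raise RuntimeError(f"fetch failed after {max_retries} attempts") from last_error
```

When the bound is exhausted, the wrapper raises with the attempt count and the last error attached, which is the "fails loudly at a known limit" behavior the rationale above asks for.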
<question_dependencies> Before adding a library:
- Can I write this simply myself?
- What's the complexity budget?
- Am I using 5% of a large framework?
- Is this solving my actual problem?
Prefer:
- Standard library when possible
- Small, focused libraries
- Direct solutions
- Understanding what code does </question_dependencies>
</architecture_principles>
<performance_consciousness>
<think_about_the_computer> Understand:
- Memory layout matters
- Cache locality matters
- Allocations have cost
- Loops over data can be fast or slow
Common issues:
- N+1 queries (database or API)
- Nested loops over large data
- Copying large structures unnecessarily
- Loading entire datasets into memory </think_about_the_computer>
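The N+1 issue above has a simple in-memory analogue: searching one list once per element of another. A sketch of the fix (hypothetical data shapes, assumed for illustration):

```python
def attach_user_names(events: list[dict], users: list[dict]) -> list[dict]:
    """Avoid the N+1 pattern: build one lookup dict, then make a single pass over events."""
    # One pass to index users by id, instead of scanning the user list per event
    name_by_id = {u['id']: u['name'] for u in users}
    return [{**e, 'user_name': name_by_id.get(e['user_id'], 'unknown')} for e in events]
```

The dict build is O(users) and each lookup is O(1), versus O(users) per event for the nested-loop version.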
<design_phase_performance> Think about performance upfront during design, not just after profiling:
- The largest wins (100-1000x) happen in the design phase
- Back-of-envelope sketch: estimate load across network, disk, memory, CPU
- Optimize for the slowest resource first (network > disk > memory > CPU)
- Compensate for frequency — a cheap operation called 10M times can dominate
Batching:
- Amortize costs via batching (network calls, disk writes, database inserts)
- One batch insert of 1000 rows beats 1000 individual inserts
- Distinguish control plane (rare, can be slow) from data plane (hot path, must be fast)
But don't prematurely optimize implementation details:
- Design for performance, then measure before micro-optimizing
- Make it work, then make it fast
- Optimize the hot path, not everything </design_phase_performance>
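The batching point above can be sketched generically. `execute_batch` stands in for whatever sink amortizes per-call cost (a database driver's bulk insert, a buffered network write); the function name and signature are illustrative assumptions:

```python
def insert_in_batches(rows: list[dict], execute_batch, batch_size: int = 1000) -> int:
    """Amortize per-call overhead by sending rows in fixed-size batches, not one at a time."""
    batches_sent = 0
    for start in range(0, len(rows), batch_size):
        execute_batch(rows[start:start + batch_size])  # one call per batch, not per row
        batches_sent += 1
    return batches_sent
```

For 2,500 rows and a batch size of 1,000, this makes 3 calls instead of 2,500, which is where the "one batch insert beats 1000 individual inserts" win comes from.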
</performance_consciousness>
<assertions_and_invariants>
<use_assertions_as_documentation> Assert preconditions, postconditions, and invariants — especially in data pipelines:
def normalize_prices(prices: list[dict], currency: str) -> list[dict]:
    assert len(prices) > 0, "prices must not be empty"
    assert currency in ("USD", "EUR", "BRL"), f"unsupported currency: {currency}"
    result = [convert_price(p, currency) for p in prices]
    assert len(result) == len(prices), "normalization must not drop rows"
    assert all(r['currency'] == currency for r in result), "all prices must be in target currency"
    return result
Guidelines:
- Assert function arguments and return values at boundaries
- Assert data quality: row counts, non-null columns, expected ranges
- Use assertions to document surprising or critical invariants
- Split compound assertions: `assert a; assert b` not `assert a and b` (clearer error messages)
- Assertions catch programmer errors — they should never be used for expected runtime conditions (use if/else for those) </use_assertions_as_documentation>
</assertions_and_invariants>
<sql_and_data>
<keep_logic_in_sql> Good:
-- Logic is clear, database does the work
SELECT
    user_id,
    COUNT(*) as event_count,
    COUNT(DISTINCT session_id) as session_count,
    MAX(event_time) as last_active
FROM events
WHERE event_time >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(*) >= 10
Bad:
# Pulling too much data, doing work in Python
events = db.query("SELECT * FROM events WHERE event_time >= CURRENT_DATE - 30")
user_events = {}
for event in events:  # Could be millions of rows!
    if event.user_id not in user_events:
        user_events[event.user_id] = []
    user_events[event.user_id].append(event)

results = []
for user_id, events in user_events.items():
    if len(events) >= 10:
        results.append({'user_id': user_id, 'count': len(events)})
</keep_logic_in_sql>
<sql_best_practices> Write readable SQL:
- Use CTEs for complex queries
- One concept per CTE
- Descriptive CTE names
- Comments for non-obvious logic
Example:
WITH active_users AS (
    -- Users who logged in within last 30 days
    SELECT DISTINCT user_id
    FROM login_events
    WHERE login_time >= CURRENT_DATE - 30
),
user_activity AS (
    -- Count events for active users
    SELECT
        e.user_id,
        COUNT(*) as event_count
    FROM events e
    INNER JOIN active_users au ON e.user_id = au.user_id
    GROUP BY e.user_id
)
SELECT
    user_id,
    event_count,
    event_count / 30.0 as avg_daily_events
FROM user_activity
ORDER BY event_count DESC
</sql_best_practices>
</sql_and_data>
<error_handling>
<be_explicit_about_errors> Handle errors explicitly:
def get_user(user_id: str) -> dict | None:
    """Get user by ID. Returns None if not found."""
    result = db.query("SELECT * FROM users WHERE id = ?", [user_id])
    return result[0] if result else None

def process_user(user_id: str):
    user = get_user(user_id)
    if user is None:
        logger.warning(f"User {user_id} not found")
        return None
    # Process user...
    return user
Don't hide errors:
# Bad - silently catches everything
try:
    result = do_something()
except:
    result = None

# Good - explicit about what can fail
try:
    result = do_something()
except ValueError as e:
    logger.error(f"Invalid value: {e}")
    raise
except ConnectionError as e:
    logger.error(f"Connection failed: {e}")
    return None
</be_explicit_about_errors>
<fail_fast>
- Validate inputs at boundaries
- Check preconditions early
- Return early on error conditions
- Don't let bad data propagate
- All errors must be handled — 92% of catastrophic system failures come from incorrect handling of non-fatal errors </fail_fast>
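A minimal sketch of boundary validation; `load_order` and its field names are hypothetical, chosen only to illustrate the checks-first shape:

```python
def load_order(raw: dict) -> dict:
    """Validate at the boundary so bad data never propagates inward."""
    # Check preconditions early and raise immediately on violation
    if 'order_id' not in raw:
        raise ValueError('missing order_id')
    if raw.get('quantity', 0) <= 0:
        raise ValueError(f"quantity must be positive, got {raw.get('quantity')!r}")
    return {'order_id': raw['order_id'], 'quantity': raw['quantity']}
```

Everything downstream of `load_order` can then assume a well-formed order instead of re-checking the same conditions.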
</error_handling>
<anti_patterns>
<over_engineering>
- Repository pattern for simple CRUD
- Service layer that just calls the database
- Dependency injection containers
- Abstract factories for concrete things
- Interfaces with one implementation </over_engineering>
<framework_magic>
- ORM hiding N+1 queries
- Decorators doing complex logic
- Metaclass magic
- Convention over configuration (when it hides behavior) </framework_magic>
<premature_abstraction>
- Creating interfaces "for future flexibility"
- Generics for specific use cases
- Configuration files for hardcoded values
- Plugin systems for known features </premature_abstraction>
<unnecessary_complexity>
- Class hierarchies for classification
- Design patterns "just because"
- Microservices for a small app
- Message queues for synchronous operations </unnecessary_complexity>
</anti_patterns>
<testing_philosophy>
<test_behavior_not_implementation> Focus on:
- What the function does (inputs → outputs)
- Edge cases and boundaries
- Error conditions
- Data transformations
Don't test:
- Private implementation details
- Framework internals
- External libraries
- Simple property access </test_behavior_not_implementation>
<keep_tests_simple>
def test_user_aggregation():
    # Arrange - simple, clear test data
    events = [
        {'user_id': 'u1', 'event': 'click'},
        {'user_id': 'u1', 'event': 'view'},
        {'user_id': 'u2', 'event': 'click'},
    ]
    # Act - call the function
    result = aggregate_user_events(events)
    # Assert - check the behavior
    assert result == {'u1': 2, 'u2': 1}
</keep_tests_simple>
<test_both_spaces> Test positive and negative space:
- Test valid inputs produce correct outputs (positive space)
- Test invalid inputs are rejected or handled correctly (negative space)
- For data pipelines: test with realistic data samples AND with malformed/missing data </test_both_spaces>
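The positive/negative split above can be sketched as a pair of tests. `parse_percentage` is a hypothetical function invented for this illustration:

```python
def parse_percentage(text: str) -> float:
    """Parse '42%' into 42.0; raises ValueError on malformed input."""
    if not text.endswith('%'):
        raise ValueError(f"expected trailing %: {text!r}")
    return float(text[:-1])

def test_parse_percentage_positive_space():
    # Valid inputs produce correct outputs
    assert parse_percentage('42%') == 42.0
    assert parse_percentage('0%') == 0.0

def test_parse_percentage_negative_space():
    # Invalid inputs must be rejected, not silently coerced
    for bad in ('42', '%', 'abc%'):
        try:
            parse_percentage(bad)
            assert False, f"should have rejected {bad!r}"
        except ValueError:
            pass
```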
<integration_tests_often_more_valuable>
- Test with real database (DuckDB is fast)
- Test actual SQL queries
- Test end-to-end flows
- Use realistic data samples </integration_tests_often_more_valuable>
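A sketch of what such an integration test looks like. It uses the standard library's `sqlite3` as a stand-in so the example stays dependency-free; the same shape applies with DuckDB, and the table and data are invented for illustration:

```python
import sqlite3

def test_event_counts_per_user():
    """Integration-style test: run real SQL against an in-memory database."""
    db = sqlite3.connect(':memory:')
    db.execute("CREATE TABLE events (user_id TEXT, session_id TEXT)")
    db.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [('u1', 's1'), ('u1', 's2'), ('u2', 's1')],
    )
    # The query under test runs against the real engine, not a mock
    rows = db.execute(
        "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
    ).fetchall()
    assert rows == [('u1', 2), ('u2', 1)]
```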
</testing_philosophy>
<comments_and_documentation>
<when_to_comment> Comment the "why":
# Use binary search because list is sorted and can be large (1M+ items)
index = binary_search(sorted_items, target)
# Cache for 5 minutes - balance freshness vs database load
@cache(ttl=300)
def get_user_stats(user_id):
...
Don't comment the "what":
# Bad - code is self-explanatory
# Increment the counter
counter += 1
# Good - code is clear on its own
counter += 1
Always motivate decisions:
- Explain why you wrote code the way you did
- Code alone isn't documentation — the reasoning matters
- Comments are well-written prose, not margin scribblings </when_to_comment>
<self_documenting_code>
- Use descriptive names
- Keep functions focused
- Make data flow obvious
- Structure for readability </self_documenting_code>
</comments_and_documentation>
Ask yourself:
- Is this the simplest solution?
- Can someone else understand this?
- What is the computer actually doing?
- Am I solving the real problem?
- What are the bounds on this operation?
When in doubt, go simpler.