beanflows/coding_philosophy.md
2025-11-02 00:26:01 +01:00

Coding Philosophy & Engineering Principles

This document defines the coding philosophy and engineering principles that guide all agent work. All agents should internalize and follow them.

<core_philosophy> Simple, Direct, Procedural Code

We follow the Casey Muratori / Jonathan Blow school of thought:

  • Solve the actual problem, not the general case
  • Understand what the computer is doing
  • Explicit is better than clever
  • Code should be obvious, not impressive </core_philosophy>

<code_style>

<functions_over_classes> Prefer:

  • Pure functions that transform data
  • Simple procedures that do clear things
  • Explicit data structures (dicts, lists, named tuples)

Avoid:

  • Classes that are just namespaces for functions
  • Objects hiding behavior behind methods
  • Inheritance hierarchies
  • "Manager" or "Handler" classes

Example - Good:

def calculate_user_metrics(events: list[dict]) -> dict:
    """Calculate metrics from event list."""
    total = len(events)
    unique_sessions = len(set(e['session_id'] for e in events))
    
    return {
        'total_events': total,
        'unique_sessions': unique_sessions,
        'events_per_session': total / unique_sessions if unique_sessions > 0 else 0
    }

Example - Bad:

class UserMetricsCalculator:
    def __init__(self):
        self._events = []
    
    def add_events(self, events: list[dict]):
        self._events.extend(events)
    
    def calculate(self) -> UserMetrics:
        return UserMetrics(
            total=self._calculate_total(),
            sessions=self._calculate_sessions()
        )

</functions_over_classes>

<data_oriented_design> Think about the data:

  • What's the shape of the data?
  • How does it flow through the system?
  • What transformations are needed?
  • What's the memory layout?

Data is just data:

  • Use simple structures (dicts, lists, tuples)
  • Don't hide data behind getters/setters
  • Make data transformations explicit
  • Consider performance implications

Example - Good:

# Data is data, functions transform it
users = [
    {'id': 1, 'name': 'Alice', 'active': True},
    {'id': 2, 'name': 'Bob', 'active': False},
]

def filter_active(users: list[dict]) -> list[dict]:
    return [u for u in users if u['active']]

active_users = filter_active(users)

Example - Bad:

# Data hidden behind objects
class User:
    def __init__(self, id, name, active):
        self._id = id
        self._name = name
        self._active = active
    
    def get_name(self):
        return self._name
    
    def is_active(self):
        return self._active

users = [User(1, 'Alice', True), User(2, 'Bob', False)]
active_users = [u for u in users if u.is_active()]

</data_oriented_design>

<keep_it_simple> Simple control flow:

  • Straightforward if/else over clever tricks
  • Explicit loops over list comprehensions when clearer
  • Early returns to reduce nesting
  • Avoid deeply nested logic

Simple naming:

  • Descriptive variable names (user_count not uc)
  • Function names that say what they do (calculate_total not process)
  • No abbreviations unless universal (id, url, sql)

Simple structure:

  • Functions should do one thing
  • Keep functions short (20-50 lines usually)
  • If it's getting complex, break it up
  • But don't break it up "just because" </keep_it_simple>
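
The control-flow advice above can be sketched as follows. The function name `shipping_cost`, the order fields (`items`, `country`, `total`), and the pricing rules are all hypothetical and exist only to illustrate early returns.

```python
def shipping_cost(order: dict) -> float:
    """Return the shipping cost for an order (hypothetical rules)."""
    # Early returns handle the special cases first, so the main
    # path stays at a single level of indentation.
    if not order['items']:
        return 0.0
    if order['country'] != 'DE':
        return 15.0
    if order['total'] >= 50:
        return 0.0
    return 4.9
```

Each condition reads top to bottom without an `else` ladder, and the default case is the last line.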

</code_style>

<architecture_principles>

<build_minimum_that_works> Start simple:

  • Solve the immediate problem
  • Don't build for imagined future requirements
  • Add complexity only when actually needed
  • Prefer obvious solutions over clever ones

Avoid premature abstraction:

  • Duplication is okay early on
  • Abstract only when the pattern is clear
  • Three examples before abstracting
  • Question every layer of indirection </build_minimum_that_works>
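
One way to read the "three examples" rule, as a sketch: the order dicts and the function names below are hypothetical. With two near-duplicates the shared shape is still a guess, so they stay duplicated; a third case would make the pattern clear enough to justify one small parameterized function.

```python
# Two similar functions. With only two cases, duplication is acceptable.
def total_revenue(orders: list[dict]) -> float:
    return sum(o['amount'] for o in orders if o['status'] == 'paid')

def total_refunds(orders: list[dict]) -> float:
    return sum(o['amount'] for o in orders if o['status'] == 'refunded')

# Once a third status needs the same sum, the pattern is clear
# and a single function can replace the copies.
def total_by_status(orders: list[dict], status: str) -> float:
    return sum(o['amount'] for o in orders if o['status'] == status)
```

The abstraction here is one extra parameter, not a new layer of indirection.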

<explicit_over_implicit> Be explicit about:

  • Where data comes from
  • What transformations happen
  • Error conditions and handling
  • Dependencies and side effects

Avoid magic:

  • Framework conventions that hide behavior
  • Implicit configuration
  • Action-at-a-distance
  • Metaprogramming tricks </explicit_over_implicit>
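
A small sketch of explicit dependencies, assuming a hypothetical session-timeout check: the current time and the timeout arrive as arguments instead of being read from a global clock or hidden configuration.

```python
from datetime import datetime, timedelta

def is_session_expired(last_seen: datetime, now: datetime, timeout: timedelta) -> bool:
    """Pure check: every input it depends on is visible in the signature."""
    return now - last_seen > timeout

# The caller decides where "now" and the timeout come from.
expired = is_session_expired(
    last_seen=datetime(2025, 1, 1, 12, 0),
    now=datetime(2025, 1, 1, 12, 45),
    timeout=timedelta(minutes=30),
)
```

Because nothing is implicit, the function is trivial to test with fixed timestamps.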

<question_dependencies> Before adding a library:

  • Can I write this simply myself?
  • What's the complexity budget?
  • Am I using 5% of a large framework?
  • Is this solving my actual problem?

Prefer:

  • Standard library when possible
  • Small, focused libraries
  • Direct solutions
  • Understanding what code does </question_dependencies>
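
As an illustration of "standard library when possible", the sketch below counts CSV rows per event type with `csv` and `collections.Counter` only. The column name `event_type` and the function name are assumptions made for the example.

```python
import csv
import io
from collections import Counter

def count_events_by_type(csv_text: str) -> dict[str, int]:
    """Count rows per `event_type` column using only the standard library."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return dict(Counter(row['event_type'] for row in reader))
```

For a task this small, a data-frame library would be a large dependency used for a few percent of its features.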

</architecture_principles>

<performance_consciousness>

<think_about_the_computer> Understand:

  • Memory layout matters
  • Cache locality matters
  • Allocations have cost
  • Loops over data can be fast or slow

Common issues:

  • N+1 queries (database or API)
  • Nested loops over large data
  • Copying large structures unnecessarily
  • Loading entire datasets into memory

But don't prematurely optimize:

  • Profile first, optimize second
  • Make it work, then make it fast
  • Measure actual performance
  • Optimize the hot path, not everything </think_about_the_computer>
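
The "nested loops over large data" issue can be sketched like this. The event and user dicts are hypothetical, and the two functions are equivalent only under the assumption that user ids are unique. Whether the rewrite is worth it in a given code path should still be decided by measurement.

```python
def attach_user_names_slow(events: list[dict], users: list[dict]) -> list[dict]:
    """O(len(events) * len(users)): scans the user list for every event."""
    result = []
    for event in events:
        for user in users:
            if user['id'] == event['user_id']:
                result.append({**event, 'user_name': user['name']})
                break
    return result

def attach_user_names_fast(events: list[dict], users: list[dict]) -> list[dict]:
    """O(len(events) + len(users)): build a lookup dict once, then one pass."""
    name_by_id = {user['id']: user['name'] for user in users}
    return [
        {**event, 'user_name': name_by_id[event['user_id']]}
        for event in events
        if event['user_id'] in name_by_id
    ]
```

Both versions drop events whose user is unknown; only the amount of work per event differs.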

</performance_consciousness>

<sql_and_data>

<keep_logic_in_sql> Good:

-- Logic is clear, database does the work
SELECT
    user_id,
    COUNT(*) as event_count,
    COUNT(DISTINCT session_id) as session_count,
    MAX(event_time) as last_active
FROM events
WHERE event_time >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(*) >= 10

Bad:

# Pulling too much data, doing work in Python
events = db.query("SELECT * FROM events WHERE event_time >= CURRENT_DATE - 30")
user_events = {}
for event in events:  # Could be millions of rows!
    if event.user_id not in user_events:
        user_events[event.user_id] = []
    user_events[event.user_id].append(event)

results = []
for user_id, events in user_events.items():
    if len(events) >= 10:
        results.append({'user_id': user_id, 'count': len(events)})

</keep_logic_in_sql>
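
A runnable sketch of the same idea from the Python side, using the standard library's `sqlite3` with an in-memory database as a stand-in for whatever database the project uses. The table layout, the function name, and the explicit `since` cutoff parameter are assumptions made for the example.

```python
import sqlite3

def active_user_counts(conn: sqlite3.Connection, since: str, min_events: int) -> list[tuple]:
    """Return (user_id, event_count, session_count) rows, aggregated by the database."""
    query = """
        SELECT
            user_id,
            COUNT(*) AS event_count,
            COUNT(DISTINCT session_id) AS session_count
        FROM events
        WHERE event_time >= ?
        GROUP BY user_id
        HAVING COUNT(*) >= ?
        ORDER BY user_id
    """
    return conn.execute(query, (since, min_events)).fetchall()

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE events (user_id TEXT, session_id TEXT, event_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ('u1', 's1', '2025-01-10'),
        ('u1', 's2', '2025-01-11'),
        ('u2', 's3', '2025-01-11'),
        ('u1', 's2', '2024-06-01'),  # too old, removed by the WHERE clause
    ],
)
rows = active_user_counts(conn, since='2025-01-01', min_events=2)
# rows == [('u1', 2, 2)]
```

Only the aggregated rows cross the database boundary; Python never sees the raw events.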

<sql_best_practices> Write readable SQL:

  • Use CTEs for complex queries
  • One concept per CTE
  • Descriptive CTE names
  • Comments for non-obvious logic

Example:

WITH active_users AS (
    -- Users who logged in within last 30 days
    SELECT DISTINCT user_id
    FROM login_events
    WHERE login_time >= CURRENT_DATE - 30
),

user_activity AS (
    -- Count events for active users
    SELECT
        e.user_id,
        COUNT(*) as event_count
    FROM events e
    INNER JOIN active_users au ON e.user_id = au.user_id
    GROUP BY e.user_id
)

SELECT
    user_id,
    event_count,
    event_count / 30.0 as avg_daily_events
FROM user_activity
ORDER BY event_count DESC

</sql_best_practices>

</sql_and_data>

<error_handling>

<be_explicit_about_errors> Handle errors explicitly:

def get_user(user_id: str) -> dict | None:
    """Get user by ID. Returns None if not found."""
    result = db.query("SELECT * FROM users WHERE id = ?", [user_id])
    return result[0] if result else None

def process_user(user_id: str) -> dict | None:
    user = get_user(user_id)
    if user is None:
        logger.warning(f"User {user_id} not found")
        return None

    # Process user... (placeholder: the real processing replaces this line)
    result = user
    return result

Don't hide errors:

# Bad - silently catches everything
try:
    result = do_something()
except:
    result = None

# Good - explicit about what can fail
try:
    result = do_something()
except ValueError as e:
    logger.error(f"Invalid value: {e}")
    raise
except ConnectionError as e:
    logger.error(f"Connection failed: {e}")
    result = None

</be_explicit_about_errors>

<fail_fast>

  • Validate inputs at boundaries
  • Check preconditions early
  • Return early on error conditions
  • Don't let bad data propagate </fail_fast>
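
A minimal sketch of validating at the boundary, assuming a hypothetical raw event with `user_id` and `event` fields and an allowed set of event types invented for the example.

```python
def parse_event(raw: dict) -> dict:
    """Validate one raw event at the system boundary and return a clean dict."""
    user_id = raw.get('user_id')
    if not isinstance(user_id, str) or not user_id:
        raise ValueError(f"user_id must be a non-empty string, got {user_id!r}")

    event = raw.get('event')
    if event not in ('click', 'view'):
        raise ValueError(f"unknown event type: {event!r}")

    # Only validated fields are passed on; unexpected keys are dropped here.
    return {'user_id': user_id, 'event': event}
```

Code downstream of `parse_event` can then assume well-formed events and does not need to re-check them.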

</error_handling>

<anti_patterns>

<over_engineering>

  • Repository pattern for simple CRUD
  • Service layer that just calls the database
  • Dependency injection containers
  • Abstract factories for concrete things
  • Interfaces with one implementation </over_engineering>

<framework_magic>

  • ORM hiding N+1 queries
  • Decorators doing complex logic
  • Metaclass magic
  • Convention over configuration (when it hides behavior) </framework_magic>

<premature_abstraction>

  • Creating interfaces "for future flexibility"
  • Generics for specific use cases
  • Configuration files for hardcoded values
  • Plugin systems for known features </premature_abstraction>

<unnecessary_complexity>

  • Class hierarchies for classification
  • Design patterns "just because"
  • Microservices for a small app
  • Message queues for synchronous operations </unnecessary_complexity>

</anti_patterns>

<testing_philosophy>

<test_behavior_not_implementation> Focus on:

  • What the function does (inputs → outputs)
  • Edge cases and boundaries
  • Error conditions
  • Data transformations

Don't test:

  • Private implementation details
  • Framework internals
  • External libraries
  • Simple property access </test_behavior_not_implementation>

<keep_tests_simple>

def test_user_aggregation():
    # Arrange - simple, clear test data
    events = [
        {'user_id': 'u1', 'event': 'click'},
        {'user_id': 'u1', 'event': 'view'},
        {'user_id': 'u2', 'event': 'click'},
    ]
    
    # Act - call the function
    result = aggregate_user_events(events)
    
    # Assert - check the behavior
    assert result == {'u1': 2, 'u2': 1}

</keep_tests_simple>
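
The test above calls `aggregate_user_events`, which this document does not define. A minimal sketch that would satisfy it, assuming the function simply counts events per `user_id`:

```python
def aggregate_user_events(events: list[dict]) -> dict[str, int]:
    """Count events per user_id."""
    counts: dict[str, int] = {}
    for event in events:
        user_id = event['user_id']
        counts[user_id] = counts.get(user_id, 0) + 1
    return counts
```

The test only checks inputs against outputs, so this loop could later be replaced by any other implementation without touching the test.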

<integration_tests_often_more_valuable>

  • Test with real database (DuckDB is fast)
  • Test actual SQL queries
  • Test end-to-end flows
  • Use realistic data samples </integration_tests_often_more_valuable>
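
A sketch of an integration test that runs actual SQL against a real database engine. It uses the standard library's `sqlite3` in-memory database purely as a stand-in so the example is self-contained; the same shape should carry over to DuckDB, but that is not shown here. The table layout and function names are hypothetical.

```python
import sqlite3

def count_events_per_user(conn: sqlite3.Connection) -> dict[str, int]:
    """Run the real query and return {user_id: event_count}."""
    rows = conn.execute(
        "SELECT user_id, COUNT(*) FROM events GROUP BY user_id"
    ).fetchall()
    return dict(rows)

def test_count_events_per_user():
    # Arrange - a real (in-memory) database with a few rows
    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [('u1', 'click'), ('u1', 'view'), ('u2', 'click')],
    )

    # Act - execute the actual SQL, not a mock
    result = count_events_per_user(conn)

    # Assert - check the behavior
    assert result == {'u1': 2, 'u2': 1}
    conn.close()
```

Because the query really executes, a typo in the SQL or a wrong `GROUP BY` fails the test, which a mocked database layer would not catch.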

</testing_philosophy>

<comments_and_documentation>

<when_to_comment> Comment the "why":

# Use binary search because list is sorted and can be large (1M+ items)
index = binary_search(sorted_items, target)

# Cache for 5 minutes - balance freshness vs database load
@cache(ttl=300)
def get_user_stats(user_id):
    ...

Don't comment the "what":

# Bad - code is self-explanatory
# Increment the counter
counter += 1

# Good - code is clear on its own
counter += 1

</when_to_comment>

<self_documenting_code>

  • Use descriptive names
  • Keep functions focused
  • Make data flow obvious
  • Structure for readability </self_documenting_code>

</comments_and_documentation>

**Key Principles:**

1. **Simple, direct, procedural** - functions over classes
2. **Data-oriented** - understand the data and its flow
3. **Explicit over implicit** - no magic, no hiding
4. **Build minimum that works** - solve actual problems
5. **Performance conscious** - but measure, don't guess
6. **Keep logic in SQL** - let the database do the work
7. **Handle errors explicitly** - no silent failures
8. **Question abstractions** - every layer needs justification

Ask yourself:

  • Is this the simplest solution?
  • Can someone else understand this?
  • What is the computer actually doing?
  • Am I solving the real problem?

When in doubt, go simpler.