# Coding Philosophy & Engineering Principles This document defines the coding philosophy and engineering principles that guide all agent work. All agents should internalize and follow these principles. **Simple, Direct, Procedural Code** We follow the Casey Muratori / Jonathan Blow school of thought: - Solve the actual problem, not the general case - Understand what the computer is doing - Explicit is better than clever - Code should be obvious, not impressive **Prefer:** - Pure functions that transform data - Simple procedures that do clear things - Explicit data structures (dicts, lists, named tuples) **Avoid:** - Classes that are just namespaces for functions - Objects hiding behavior behind methods - Inheritance hierarchies - "Manager" or "Handler" classes **Example - Good:** ```python def calculate_user_metrics(events: list[dict]) -> dict: """Calculate metrics from event list.""" total = len(events) unique_sessions = len(set(e['session_id'] for e in events)) return { 'total_events': total, 'unique_sessions': unique_sessions, 'events_per_session': total / unique_sessions if unique_sessions > 0 else 0 } ``` **Example - Bad:** ```python class UserMetricsCalculator: def __init__(self): self._events = [] def add_events(self, events: list[dict]): self._events.extend(events) def calculate(self) -> UserMetrics: return UserMetrics( total=self._calculate_total(), sessions=self._calculate_sessions() ) ``` **Think about the data:** - What's the shape of the data? - How does it flow through the system? - What transformations are needed? - What's the memory layout? **Data is just data:** - Use simple structures (dicts, lists, tuples) - Don't hide data behind getters/setters - Make data transformations explicit - Consider performance implications **Example - Good:** ```python # Data is data, functions transform it users = [ {'id': 1, 'name': 'Alice', 'active': True}, {'id': 2, 'name': 'Bob', 'active': False}, ] def filter_active(users: list[dict]) -> list[dict]: return [u for u in users if u['active']] active_users = filter_active(users) ``` **Example - Bad:** ```python # Data hidden behind objects class User: def __init__(self, id, name, active): self._id = id self._name = name self._active = active def get_name(self): return self._name def is_active(self): return self._active users = [User(1, 'Alice', True), User(2, 'Bob', False)] active_users = [u for u in users if u.is_active()] ``` **Simple control flow:** - Straightforward if/else over clever tricks - Explicit loops over list comprehensions when clearer - Early returns to reduce nesting - Avoid deeply nested logic **Simple naming:** - Descriptive variable names (`user_count` not `uc`) - Function names that say what they do (`calculate_total` not `process`) - No abbreviations unless universal (`id`, `url`, `sql`) **Simple structure:** - Functions should do one thing - Keep functions short (20-50 lines usually) - If it's getting complex, break it up - But don't break it up "just because" **Start simple:** - Solve the immediate problem - Don't build for imagined future requirements - Add complexity only when actually needed - Prefer obvious solutions over clever ones **Avoid premature abstraction:** - Duplication is okay early on - Abstract only when pattern is clear - Three examples before abstracting - Question every layer of indirection **Be explicit about:** - Where data comes from - What transformations happen - Error conditions and handling - Dependencies and side effects **Avoid magic:** - Framework conventions that hide behavior - Implicit configuration - Action-at-a-distance - Metaprogramming tricks **Before adding a library:** - Can I write this simply myself? - What's the complexity budget? - Am I using 5% of a large framework? - Is this solving my actual problem? **Prefer:** - Standard library when possible - Small, focused libraries - Direct solutions - Understanding what code does **Understand:** - Memory layout matters - Cache locality matters - Allocations have cost - Loops over data can be fast or slow **Common issues:** - N+1 queries (database or API) - Nested loops over large data - Copying large structures unnecessarily - Loading entire datasets into memory **But don't prematurely optimize:** - Profile first, optimize second - Make it work, then make it fast - Measure actual performance - Optimize the hot path, not everything **Good:** ```sql -- Logic is clear, database does the work SELECT user_id, COUNT(*) as event_count, COUNT(DISTINCT session_id) as session_count, MAX(event_time) as last_active FROM events WHERE event_time >= CURRENT_DATE - 30 GROUP BY user_id HAVING COUNT(*) >= 10 ``` **Bad:** ```python # Pulling too much data, doing work in Python events = db.query("SELECT * FROM events WHERE event_time >= CURRENT_DATE - 30") user_events = {} for event in events: # Could be millions of rows! if event.user_id not in user_events: user_events[event.user_id] = [] user_events[event.user_id].append(event) results = [] for user_id, events in user_events.items(): if len(events) >= 10: results.append({'user_id': user_id, 'count': len(events)}) ``` **Write readable SQL:** - Use CTEs for complex queries - One concept per CTE - Descriptive CTE names - Comments for non-obvious logic **Example:** ```sql WITH active_users AS ( -- Users who logged in within last 30 days SELECT DISTINCT user_id FROM login_events WHERE login_time >= CURRENT_DATE - 30 ), user_activity AS ( -- Count events for active users SELECT e.user_id, COUNT(*) as event_count FROM events e INNER JOIN active_users au ON e.user_id = au.user_id GROUP BY e.user_id ) SELECT user_id, event_count, event_count / 30.0 as avg_daily_events FROM user_activity ORDER BY event_count DESC ``` **Handle errors explicitly:** ```python def get_user(user_id: str) -> dict | None: """Get user by ID. Returns None if not found.""" result = db.query("SELECT * FROM users WHERE id = ?", [user_id]) return result[0] if result else None def process_user(user_id: str): user = get_user(user_id) if user is None: logger.warning(f"User {user_id} not found") return None # Process user... return result ``` **Don't hide errors:** ```python # Bad - silently catches everything try: result = do_something() except: result = None # Good - explicit about what can fail try: result = do_something() except ValueError as e: logger.error(f"Invalid value: {e}") raise except ConnectionError as e: logger.error(f"Connection failed: {e}") return None ``` - Validate inputs at boundaries - Check preconditions early - Return early on error conditions - Don't let bad data propagate ❌ Repository pattern for simple CRUD ❌ Service layer that just calls the database ❌ Dependency injection containers ❌ Abstract factories for concrete things ❌ Interfaces with one implementation ❌ ORM hiding N+1 queries ❌ Decorators doing complex logic ❌ Metaclass magic ❌ Convention over configuration (when it hides behavior) ❌ Creating interfaces "for future flexibility" ❌ Generics for specific use cases ❌ Configuration files for hardcoded values ❌ Plugins systems for known features ❌ Class hierarchies for classification ❌ Design patterns "just because" ❌ Microservices for a small app ❌ Message queues for synchronous operations **Focus on:** - What the function does (inputs → outputs) - Edge cases and boundaries - Error conditions - Data transformations **Don't test:** - Private implementation details - Framework internals - External libraries - Simple property access ```python def test_user_aggregation(): # Arrange - simple, clear test data events = [ {'user_id': 'u1', 'event': 'click'}, {'user_id': 'u1', 'event': 'view'}, {'user_id': 'u2', 'event': 'click'}, ] # Act - call the function result = aggregate_user_events(events) # Assert - check the behavior assert result == {'u1': 2, 'u2': 1} ``` - Test with real database (DuckDB is fast) - Test actual SQL queries - Test end-to-end flows - Use realistic data samples **Comment the "why":** ```python # Use binary search because list is sorted and can be large (1M+ items) index = binary_search(sorted_items, target) # Cache for 5 minutes - balance freshness vs database load @cache(ttl=300) def get_user_stats(user_id): ... ``` **Don't comment the "what":** ```python # Bad - code is self-explanatory # Increment the counter counter += 1 # Good - code is clear on its own counter += 1 ``` - Use descriptive names - Keep functions focused - Make data flow obvious - Structure for readability **Key Principles:** 1. **Simple, direct, procedural** - functions over classes 2. **Data-oriented** - understand the data and its flow 3. **Explicit over implicit** - no magic, no hiding 4. **Build minimum that works** - solve actual problems 5. **Performance conscious** - but measure, don't guess 6. **Keep logic in SQL** - let the database do the work 7. **Handle errors explicitly** - no silent failures 8. **Question abstractions** - every layer needs justification **Ask yourself:** - Is this the simplest solution? - Can someone else understand this? - What is the computer actually doing? - Am I solving the real problem? When in doubt, go simpler.