Coding Philosophy & Engineering Principles
This document defines the coding philosophy and engineering principles that guide all agent work. Every agent should internalize and follow them.
<core_philosophy> Simple, Direct, Procedural Code
We follow the Casey Muratori / Jonathan Blow school of thought:
- Solve the actual problem, not the general case
- Understand what the computer is doing
- Explicit is better than clever
- Code should be obvious, not impressive </core_philosophy>
<code_style>
<functions_over_classes> Prefer:
- Pure functions that transform data
- Simple procedures that do clear things
- Explicit data structures (dicts, lists, named tuples)
Avoid:
- Classes that are just namespaces for functions
- Objects hiding behavior behind methods
- Inheritance hierarchies
- "Manager" or "Handler" classes
Example - Good:
def calculate_user_metrics(events: list[dict]) -> dict:
    """Calculate metrics from event list."""
    total = len(events)
    unique_sessions = len(set(e['session_id'] for e in events))
    return {
        'total_events': total,
        'unique_sessions': unique_sessions,
        'events_per_session': total / unique_sessions if unique_sessions > 0 else 0
    }
Example - Bad:
class UserMetricsCalculator:
    def __init__(self):
        self._events = []

    def add_events(self, events: list[dict]):
        self._events.extend(events)

    def calculate(self) -> UserMetrics:
        return UserMetrics(
            total=self._calculate_total(),
            sessions=self._calculate_sessions()
        )
</functions_over_classes>
<data_oriented_design> Think about the data:
- What's the shape of the data?
- How does it flow through the system?
- What transformations are needed?
- What's the memory layout?
Data is just data:
- Use simple structures (dicts, lists, tuples)
- Don't hide data behind getters/setters
- Make data transformations explicit
- Consider performance implications
Example - Good:
# Data is data, functions transform it
users = [
    {'id': 1, 'name': 'Alice', 'active': True},
    {'id': 2, 'name': 'Bob', 'active': False},
]

def filter_active(users: list[dict]) -> list[dict]:
    return [u for u in users if u['active']]

active_users = filter_active(users)
Example - Bad:
# Data hidden behind objects
class User:
    def __init__(self, id, name, active):
        self._id = id
        self._name = name
        self._active = active

    def get_name(self):
        return self._name

    def is_active(self):
        return self._active

users = [User(1, 'Alice', True), User(2, 'Bob', False)]
active_users = [u for u in users if u.is_active()]
</data_oriented_design>
<keep_it_simple> Simple control flow:
- Straightforward if/else over clever tricks
- Explicit loops over list comprehensions when clearer
- Early returns to reduce nesting
- Avoid deeply nested logic
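A minimal sketch of early returns, assuming a hypothetical order dict with `paid` and `items` fields (the function name `order_total` and the field names are made up for illustration). Each guard handles one failure case and exits, so the main logic stays at the lowest nesting level:

```python
from typing import Optional


def order_total(order: Optional[dict]) -> float:
    """Sum item prices for a paid order; return 0.0 for anything else."""
    # Guard clauses: handle each "nothing to do" case and leave.
    if order is None:
        return 0.0
    if not order.get('paid'):
        return 0.0
    items = order.get('items', [])
    if not items:
        return 0.0

    # Happy path, not wrapped in three levels of if/else.
    total = 0.0
    for item in items:
        total += item['price'] * item['quantity']
    return total
```

The nested alternative would wrap the loop in `if order is not None: if order.get('paid'): if items:`; the behavior is the same, but the reader has to track three open conditions to reach the loop.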
Simple naming:
- Descriptive variable names (user_count, not uc)
- Function names that say what they do (calculate_total, not process)
- No abbreviations unless universal (id, url, sql)
Simple structure:
- Functions should do one thing
- Keep functions short (20-50 lines usually)
- If it's getting complex, break it up
- But don't break it up "just because" </keep_it_simple>
</code_style>
<architecture_principles>
<build_minimum_that_works> Start simple:
- Solve the immediate problem
- Don't build for imagined future requirements
- Add complexity only when actually needed
- Prefer obvious solutions over clever ones
Avoid premature abstraction:
- Duplication is okay early on
- Abstract only when the pattern is clear
- Wait for three concrete examples before abstracting
- Question every layer of indirection </build_minimum_that_works>
<explicit_over_implicit> Be explicit about:
- Where data comes from
- What transformations happen
- Error conditions and handling
- Dependencies and side effects
Avoid magic:
- Framework conventions that hide behavior
- Implicit configuration
- Action-at-a-distance
- Metaprogramming tricks </explicit_over_implicit>
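One way to read "be explicit about dependencies" is to pass the clock and the configuration in as arguments instead of reading them inside the function. The sketch below is illustrative only; `is_session_expired` and its parameters are hypothetical names:

```python
from datetime import datetime, timedelta, timezone


def is_session_expired(last_seen: datetime, now: datetime, timeout: timedelta) -> bool:
    """Return True when the session has been idle longer than timeout."""
    # No hidden datetime.now() call and no global settings lookup:
    # everything the result depends on is in the signature.
    return now - last_seen > timeout


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = now - timedelta(minutes=45)
expired = is_session_expired(last_seen, now, timeout=timedelta(minutes=30))
```

Because the inputs are explicit, the function is trivially testable with fixed timestamps and the caller can see exactly where the timeout comes from.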
<question_dependencies> Before adding a library:
- Can I write this simply myself?
- What's the complexity budget?
- Am I using 5% of a large framework?
- Is this solving my actual problem?
Prefer:
- Standard library when possible
- Small, focused libraries
- Direct solutions
- Understanding what code does </question_dependencies>
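As an example of "can I write this simply myself?", splitting an iterable into batches needs only the standard library. This is a sketch, with `chunked` as a hypothetical helper name; on Python 3.12+ `itertools.batched` covers a similar need (it yields tuples rather than lists):

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls the next `size` items without copying the rest.
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


batches = list(chunked(range(7), 3))
```

A dozen lines that anyone can read is often a better trade than a new dependency used for one function.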
</architecture_principles>
<performance_consciousness>
<think_about_the_computer> Understand:
- Memory layout matters
- Cache locality matters
- Allocations have cost
- Loops over data can be fast or slow
Common issues:
- N+1 queries (database or API)
- Nested loops over large data
- Copying large structures unnecessarily
- Loading entire datasets into memory
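A hidden nested loop is a common case of the second issue: `x in some_list` scans the list on every check. The sketch below uses hypothetical order and user data to show the same filter with a list and with a set built once:

```python
def find_active_orders_slow(orders: list[dict], active_user_ids: list[int]) -> list[dict]:
    # `in` on a list is a linear scan, so this is
    # O(len(orders) * len(active_user_ids)).
    return [o for o in orders if o['user_id'] in active_user_ids]


def find_active_orders(orders: list[dict], active_user_ids: list[int]) -> list[dict]:
    # Build the set once; each membership check is then O(1) on average.
    active = set(active_user_ids)
    return [o for o in orders if o['user_id'] in active]


orders = [
    {'order_id': 1, 'user_id': 10},
    {'order_id': 2, 'user_id': 11},
    {'order_id': 3, 'user_id': 10},
]
active_orders = find_active_orders(orders, [10, 12])
```

Both functions return the same rows; only the cost differs, and it only matters once the inputs are large.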
But don't prematurely optimize:
- Profile first, optimize second
- Make it work, then make it fast
- Measure actual performance
- Optimize the hot path, not everything </think_about_the_computer>
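"Profile first" can be as small as wrapping the suspect call in the standard library's `cProfile`. The workload below (`build_report`) is a made-up stand-in; substitute the real entry point:

```python
import cProfile
import io
import pstats


def build_report(n: int) -> int:
    """Hypothetical workload: sum of squares with a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = build_report(100_000)
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)
report_text = buffer.getvalue()
```

Printing `report_text` shows where the time actually went, which is the input you need before deciding what to optimize.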
</performance_consciousness>
<sql_and_data>
<keep_logic_in_sql> Good:
-- Logic is clear, database does the work
SELECT
    user_id,
    COUNT(*) AS event_count,
    COUNT(DISTINCT session_id) AS session_count,
    MAX(event_time) AS last_active
FROM events
WHERE event_time >= CURRENT_DATE - 30
GROUP BY user_id
HAVING COUNT(*) >= 10
Bad:
# Pulling too much data, doing work in Python
events = db.query("SELECT * FROM events WHERE event_time >= CURRENT_DATE - 30")

user_events = {}
for event in events:  # Could be millions of rows!
    if event.user_id not in user_events:
        user_events[event.user_id] = []
    user_events[event.user_id].append(event)

results = []
for user_id, events in user_events.items():
    if len(events) >= 10:
        results.append({'user_id': user_id, 'count': len(events)})
</keep_logic_in_sql>
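To make the "Good" query concrete, here is a runnable sketch that swaps in Python's built-in `sqlite3` with an in-memory database. The sample rows, the fixed cutoff date, and the lowered threshold of 2 events are assumptions so the example stays small; the date arithmetic is replaced by a bound parameter because `CURRENT_DATE - 30` is dialect-specific:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (user_id TEXT, session_id TEXT, event_time TEXT)')
conn.executemany(
    'INSERT INTO events VALUES (?, ?, ?)',
    [
        ('u1', 's1', '2024-01-10'),
        ('u1', 's1', '2024-01-11'),
        ('u1', 's2', '2024-01-12'),
        ('u2', 's3', '2024-01-12'),
        ('u2', 's3', '2023-06-01'),  # older than the cutoff
    ],
)

# Filtering, grouping, and thresholding all happen in the database;
# Python only receives the already-aggregated rows.
query = """
    SELECT
        user_id,
        COUNT(*) AS event_count,
        COUNT(DISTINCT session_id) AS session_count,
        MAX(event_time) AS last_active
    FROM events
    WHERE event_time >= ?
    GROUP BY user_id
    HAVING COUNT(*) >= ?
    ORDER BY user_id
"""
active = conn.execute(query, ('2024-01-01', 2)).fetchall()
conn.close()
```

Only one row per qualifying user crosses the database boundary, instead of every raw event.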
<sql_best_practices> Write readable SQL:
- Use CTEs for complex queries
- One concept per CTE
- Descriptive CTE names
- Comments for non-obvious logic
Example:
WITH active_users AS (
    -- Users who logged in within last 30 days
    SELECT DISTINCT user_id
    FROM login_events
    WHERE login_time >= CURRENT_DATE - 30
),
user_activity AS (
    -- Count events for active users
    SELECT
        e.user_id,
        COUNT(*) AS event_count
    FROM events e
    INNER JOIN active_users au ON e.user_id = au.user_id
    GROUP BY e.user_id
)
SELECT
    user_id,
    event_count,
    event_count / 30.0 AS avg_daily_events
FROM user_activity
ORDER BY event_count DESC
</sql_best_practices>
</sql_and_data>
<error_handling>
<be_explicit_about_errors> Handle errors explicitly:
def get_user(user_id: str) -> dict | None:
    """Get user by ID. Returns None if not found."""
    result = db.query("SELECT * FROM users WHERE id = ?", [user_id])
    return result[0] if result else None

def process_user(user_id: str):
    user = get_user(user_id)
    if user is None:
        logger.warning(f"User {user_id} not found")
        return None
    # Process user...
    return result
Don't hide errors:
# Bad - silently catches everything
try:
    result = do_something()
except:
    result = None

# Good - explicit about what can fail
try:
    result = do_something()
except ValueError as e:
    logger.error(f"Invalid value: {e}")
    raise
except ConnectionError as e:
    logger.error(f"Connection failed: {e}")
    return None
</be_explicit_about_errors>
<fail_fast>
- Validate inputs at boundaries
- Check preconditions early
- Return early on error conditions
- Don't let bad data propagate </fail_fast>
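A minimal sketch of validating at the boundary, assuming raw events arrive as dicts with `user_id`, `session_id`, and `event_time` keys (the function name `parse_event` and the exact checks are illustrative, not a fixed schema):

```python
def parse_event(raw: dict) -> dict:
    """Validate a raw event at the boundary and return a clean dict."""
    # Check preconditions first and fail with a specific message.
    for field in ('user_id', 'session_id', 'event_time'):
        if field not in raw:
            raise ValueError(f"missing required field: {field}")
    if not isinstance(raw['user_id'], str) or not raw['user_id']:
        raise ValueError("user_id must be a non-empty string")

    # Past this point the rest of the code can trust the shape.
    return {
        'user_id': raw['user_id'],
        'session_id': raw['session_id'],
        'event_time': raw['event_time'],
    }
```

Everything downstream of `parse_event` can then index the dict directly without re-checking, because bad input never gets past the entry point.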
</error_handling>
<anti_patterns>
<over_engineering>
❌ Repository pattern for simple CRUD
❌ Service layer that just calls the database
❌ Dependency injection containers
❌ Abstract factories for concrete things
❌ Interfaces with one implementation </over_engineering>
<framework_magic>
❌ ORM hiding N+1 queries
❌ Decorators doing complex logic
❌ Metaclass magic
❌ Convention over configuration (when it hides behavior) </framework_magic>
<premature_abstraction>
❌ Creating interfaces "for future flexibility"
❌ Generics for specific use cases
❌ Configuration files for hardcoded values
❌ Plugin systems for known features </premature_abstraction>
<unnecessary_complexity>
❌ Class hierarchies for classification
❌ Design patterns "just because"
❌ Microservices for a small app
❌ Message queues for synchronous operations </unnecessary_complexity>
</anti_patterns>
<testing_philosophy>
<test_behavior_not_implementation> Focus on:
- What the function does (inputs → outputs)
- Edge cases and boundaries
- Error conditions
- Data transformations
Don't test:
- Private implementation details
- Framework internals
- External libraries
- Simple property access </test_behavior_not_implementation>
<keep_tests_simple>
def test_user_aggregation():
    # Arrange - simple, clear test data
    events = [
        {'user_id': 'u1', 'event': 'click'},
        {'user_id': 'u1', 'event': 'view'},
        {'user_id': 'u2', 'event': 'click'},
    ]

    # Act - call the function
    result = aggregate_user_events(events)

    # Assert - check the behavior
    assert result == {'u1': 2, 'u2': 1}
</keep_tests_simple>
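The test above calls `aggregate_user_events` without showing it. One plausible implementation, inferred only from the expected result (a count of events per `user_id`), might look like this:

```python
def aggregate_user_events(events: list[dict]) -> dict:
    """Count events per user_id (assumed behavior, inferred from the test)."""
    counts = {}
    for event in events:
        user_id = event['user_id']
        counts[user_id] = counts.get(user_id, 0) + 1
    return counts
```

The test only checks inputs and outputs, so this could later be rewritten with `collections.Counter` or moved into SQL without touching the test.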
<integration_tests_often_more_valuable>
- Test with real database (DuckDB is fast)
- Test actual SQL queries
- Test end-to-end flows
- Use realistic data samples </integration_tests_often_more_valuable>
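A sketch of an integration test that runs the actual SQL against a real, in-memory database. It uses the standard library's `sqlite3` as a stand-in so it runs without extra installs; with DuckDB the shape would be similar, though the connection calls differ. The table layout and `count_events_per_user` are hypothetical:

```python
import sqlite3


def count_events_per_user(conn: sqlite3.Connection) -> dict:
    """Run the real query and return {user_id: event_count}."""
    rows = conn.execute(
        'SELECT user_id, COUNT(*) FROM events GROUP BY user_id'
    ).fetchall()
    return dict(rows)


def test_count_events_per_user():
    # Arrange - a real database with a small, realistic sample
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE events (user_id TEXT, event TEXT)')
    conn.executemany(
        'INSERT INTO events VALUES (?, ?)',
        [('u1', 'click'), ('u1', 'view'), ('u2', 'click')],
    )

    # Act and assert - the SQL itself is what gets exercised
    assert count_events_per_user(conn) == {'u1': 2, 'u2': 1}
    conn.close()


test_count_events_per_user()
```

Nothing is mocked, so a typo in the query or a wrong GROUP BY fails the test directly.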
</testing_philosophy>
<comments_and_documentation>
<when_to_comment> Comment the "why":
# Use binary search because list is sorted and can be large (1M+ items)
index = binary_search(sorted_items, target)

# Cache for 5 minutes - balance freshness vs database load
@cache(ttl=300)
def get_user_stats(user_id):
    ...

Don't comment the "what":
# Bad - code is self-explanatory
# Increment the counter
counter += 1

# Good - code is clear on its own
counter += 1
</when_to_comment>
<self_documenting_code>
- Use descriptive names
- Keep functions focused
- Make data flow obvious
- Structure for readability </self_documenting_code>
</comments_and_documentation>
Ask yourself:
- Is this the simplest solution?
- Can someone else understand this?
- What is the computer actually doing?
- Am I solving the real problem?
When in doubt, go simpler.