cleanup and prefect service setup

2026-02-04 22:24:55 +01:00
parent fc27d5f887
commit 6d4377ccf9
41 changed files with 15888 additions and 2591 deletions
--- a/coding_philosophy.md
+++ b/coding_philosophy.md
@@ -0,0 +1,446 @@
+# Coding Philosophy & Engineering Principles
+
+This document defines the coding philosophy and engineering principles that guide all agent work. Every agent should internalize and follow them.
+
+<core_philosophy>
+**Simple, Direct, Procedural Code**
+
+We follow the Casey Muratori / Jonathan Blow school of thought:
+- Solve the actual problem, not the general case
+- Understand what the computer is doing
+- Explicit is better than clever
+- Code should be obvious, not impressive
+</core_philosophy>
+
+<code_style>
+
+<functions_over_classes>
+**Prefer:**
+- Pure functions that transform data
+- Simple procedures that do clear things
+- Explicit data structures (dicts, lists, named tuples)
+
+**Avoid:**
+- Classes that are just namespaces for functions
+- Objects hiding behavior behind methods
+- Inheritance hierarchies
+- "Manager" or "Handler" classes
+
+**Example - Good:**
+```python
+def calculate_user_metrics(events: list[dict]) -> dict:
+    """Calculate metrics from event list."""
+    total = len(events)
+    unique_sessions = len(set(e['session_id'] for e in events))
+    
+    return {
+        'total_events': total,
+        'unique_sessions': unique_sessions,
+        'events_per_session': total / unique_sessions if unique_sessions > 0 else 0
+    }
+```
+
+**Example - Bad:**
+```python
+class UserMetricsCalculator:
+    def __init__(self):
+        self._events = []
+    
+    def add_events(self, events: list[dict]):
+        self._events.extend(events)
+    
+    def calculate(self) -> UserMetrics:
+        return UserMetrics(
+            total=self._calculate_total(),
+            sessions=self._calculate_sessions()
+        )
+```
+</functions_over_classes>
+
+<data_oriented_design>
+**Think about the data:**
+- What's the shape of the data?
+- How does it flow through the system?
+- What transformations are needed?
+- What's the memory layout?
+
+**Data is just data:**
+- Use simple structures (dicts, lists, tuples)
+- Don't hide data behind getters/setters
+- Make data transformations explicit
+- Consider performance implications
+
+**Example - Good:**
+```python
+# Data is data, functions transform it
+users = [
+    {'id': 1, 'name': 'Alice', 'active': True},
+    {'id': 2, 'name': 'Bob', 'active': False},
+]
+
+def filter_active(users: list[dict]) -> list[dict]:
+    return [u for u in users if u['active']]
+
+active_users = filter_active(users)
+```
+
+**Example - Bad:**
+```python
+# Data hidden behind objects
+class User:
+    def __init__(self, id, name, active):
+        self._id = id
+        self._name = name
+        self._active = active
+    
+    def get_name(self):
+        return self._name
+    
+    def is_active(self):
+        return self._active
+
+users = [User(1, 'Alice', True), User(2, 'Bob', False)]
+active_users = [u for u in users if u.is_active()]
+```
+</data_oriented_design>
+
+<keep_it_simple>
+**Simple control flow:**
+- Straightforward if/else over clever tricks
+- Explicit loops over list comprehensions when clearer
+- Early returns to reduce nesting
+- Avoid deeply nested logic
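+
+A minimal sketch of the early-return bullet. The function name, order fields, and rates below are invented for illustration; the point is only that each guard clause exits immediately, so the main path is never nested:
+
+```python
+def shipping_cost(order: dict) -> float:
+    """Return the shipping cost for an order (illustrative rules only)."""
+    # Guard clauses: handle the special cases first and leave.
+    if not order['items']:
+        return 0.0
+    if order['total'] >= 100:
+        return 0.0  # free-shipping threshold (made-up value)
+    if order['country'] != 'DE':
+        return 15.0  # flat international rate (made-up value)
+
+    # Main path is not nested inside any of the checks above.
+    return 5.0
+```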
+
+**Simple naming:**
+- Descriptive variable names (`user_count` not `uc`)
+- Function names that say what they do (`calculate_total` not `process`)
+- No abbreviations unless universal (`id`, `url`, `sql`)
+
+**Simple structure:**
+- Functions should do one thing
+- Keep functions short (20-50 lines usually)
+- If it's getting complex, break it up
+- But don't break it up "just because"
+</keep_it_simple>
+
+</code_style>
+
+<architecture_principles>
+
+<build_minimum_that_works>
+**Start simple:**
+- Solve the immediate problem
+- Don't build for imagined future requirements
+- Add complexity only when actually needed
+- Prefer obvious solutions over clever ones
+
+**Avoid premature abstraction:**
+- Duplication is okay early on
+- Abstract only when the pattern is clear
+- Wait for three examples before abstracting
+- Question every layer of indirection
+</build_minimum_that_works>
+
+<explicit_over_implicit>
+**Be explicit about:**
+- Where data comes from
+- What transformations happen
+- Error conditions and handling
+- Dependencies and side effects
+
+**Avoid magic:**
+- Framework conventions that hide behavior
+- Implicit configuration
+- Action-at-a-distance
+- Metaprogramming tricks
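+
+One way to make a dependency explicit, sketched with a hypothetical `is_recent` helper (the name and parameters are assumptions, not part of any existing codebase): the current time is passed in as an argument instead of being read from the clock inside the function.
+
+```python
+from datetime import datetime, timedelta
+
+
+def is_recent(event_time: datetime, now: datetime, max_age: timedelta) -> bool:
+    """Return True if event_time is within max_age of now."""
+    # `now` is an argument rather than a hidden call to datetime.now(),
+    # so the dependency on the current time is visible in the signature.
+    return now - event_time <= max_age
+
+
+now = datetime(2026, 2, 4, 12, 0, 0)
+print(is_recent(datetime(2026, 2, 4, 11, 30, 0), now, timedelta(hours=1)))  # True
+print(is_recent(datetime(2026, 2, 3, 11, 30, 0), now, timedelta(hours=1)))  # False
+```
+
+The caller decides where `now` comes from, which also makes the function trivial to test with fixed timestamps.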
+</explicit_over_implicit>
+
+<question_dependencies>
+**Before adding a library:**
+- Can I write this simply myself?
+- What's the complexity budget?
+- Am I using 5% of a large framework?
+- Is this solving my actual problem?
+
+**Prefer:**
+- Standard library when possible
+- Small, focused libraries
+- Direct solutions
+- Understanding what code does
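+
+As an illustration of "standard library when possible", here is a small count-by-column job done with `csv`, `io`, and `collections.Counter` only. The function name and the sample data are hypothetical:
+
+```python
+import csv
+import io
+from collections import Counter
+
+
+def count_by_column(csv_text: str, column: str) -> dict[str, int]:
+    """Count rows per distinct value of `column` in CSV text."""
+    reader = csv.DictReader(io.StringIO(csv_text))
+    return dict(Counter(row[column] for row in reader))
+
+
+sample = "user_id,event\nu1,click\nu1,view\nu2,click\n"
+print(count_by_column(sample, 'event'))  # {'click': 2, 'view': 1}
+```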
+</question_dependencies>
+
+</architecture_principles>
+
+<performance_consciousness>
+
+<think_about_the_computer>
+**Understand:**
+- Memory layout matters
+- Cache locality matters
+- Allocations have cost
+- Loops over data can be fast or slow
+
+**Common issues:**
+- N+1 queries (database or API)
+- Nested loops over large data
+- Copying large structures unnecessarily
+- Loading entire datasets into memory
+
+**But don't prematurely optimize:**
+- Profile first, optimize second
+- Make it work, then make it fast
+- Measure actual performance
+- Optimize the hot path, not everything
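+
+A sketch of the "nested loops over large data" issue, using hypothetical `orders` and `users` lists and assuming user ids are unique. The first version rescans `users` for every order; the second builds a lookup once:
+
+```python
+def attach_user_names_slow(orders: list[dict], users: list[dict]) -> list[dict]:
+    """O(len(orders) * len(users)): rescans users for every order."""
+    result = []
+    for order in orders:
+        for user in users:
+            if user['id'] == order['user_id']:
+                result.append({**order, 'user_name': user['name']})
+                break
+    return result
+
+
+def attach_user_names_fast(orders: list[dict], users: list[dict]) -> list[dict]:
+    """O(len(orders) + len(users)): build the lookup once, then index it."""
+    name_by_id = {user['id']: user['name'] for user in users}
+    return [
+        {**order, 'user_name': name_by_id[order['user_id']]}
+        for order in orders
+        if order['user_id'] in name_by_id
+    ]
+```
+
+Whether the difference matters depends on the data size, so measure with `timeit` or `cProfile` before rewriting.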
+</think_about_the_computer>
+
+</performance_consciousness>
+
+<sql_and_data>
+
+<keep_logic_in_sql>
+**Good:**
+```sql
+-- Logic is clear, database does the work
+SELECT
+    user_id,
+    COUNT(*) as event_count,
+    COUNT(DISTINCT session_id) as session_count,
+    MAX(event_time) as last_active
+FROM events
+WHERE event_time >= CURRENT_DATE - 30
+GROUP BY user_id
+HAVING COUNT(*) >= 10
+```
+
+**Bad:**
+```python
+# Pulling too much data, doing work in Python
+events = db.query("SELECT * FROM events WHERE event_time >= CURRENT_DATE - 30")
+user_events = {}
+for event in events:  # Could be millions of rows!
+    if event.user_id not in user_events:
+        user_events[event.user_id] = []
+    user_events[event.user_id].append(event)
+
+results = []
+for user_id, grouped_events in user_events.items():
+    if len(grouped_events) >= 10:
+        results.append({'user_id': user_id, 'count': len(grouped_events)})
+```
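+
+For comparison, a runnable sketch of the SQL-first approach. It uses the standard-library `sqlite3` module purely so the example is self-contained; the table is reduced to two columns, the date filter is omitted, and `active_user_counts` is a hypothetical helper name:
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(':memory:')
+conn.execute('CREATE TABLE events (user_id TEXT, session_id TEXT)')
+conn.executemany(
+    'INSERT INTO events VALUES (?, ?)',
+    [('u1', 's1'), ('u1', 's1'), ('u1', 's2'), ('u2', 's3')],
+)
+
+
+def active_user_counts(conn: sqlite3.Connection, min_events: int) -> list[tuple]:
+    """Return (user_id, event_count, session_count) for users with enough events."""
+    # Grouping and filtering happen in the database; Python only receives
+    # one small row per qualifying user.
+    return conn.execute(
+        """
+        SELECT
+            user_id,
+            COUNT(*) AS event_count,
+            COUNT(DISTINCT session_id) AS session_count
+        FROM events
+        GROUP BY user_id
+        HAVING COUNT(*) >= ?
+        ORDER BY user_id
+        """,
+        (min_events,),
+    ).fetchall()
+
+
+print(active_user_counts(conn, 2))  # [('u1', 3, 2)]
+```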
+</keep_logic_in_sql>
+
+<sql_best_practices>
+**Write readable SQL:**
+- Use CTEs for complex queries
+- One concept per CTE
+- Descriptive CTE names
+- Comments for non-obvious logic
+
+**Example:**
+```sql
+WITH active_users AS (
+    -- Users who logged in within last 30 days
+    SELECT DISTINCT user_id
+    FROM login_events
+    WHERE login_time >= CURRENT_DATE - 30
+),
+
+user_activity AS (
+    -- Count events for active users
+    SELECT
+        e.user_id,
+        COUNT(*) as event_count
+    FROM events e
+    INNER JOIN active_users au ON e.user_id = au.user_id
+    GROUP BY e.user_id
+)
+
+SELECT
+    user_id,
+    event_count,
+    event_count / 30.0 as avg_daily_events
+FROM user_activity
+ORDER BY event_count DESC
+```
+</sql_best_practices>
+
+</sql_and_data>
+
+<error_handling>
+
+<be_explicit_about_errors>
+**Handle errors explicitly:**
+```python
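+import logging
+
+logger = logging.getLogger(__name__)
+
+def get_user(user_id: str) -> dict | None:
+    """Get user by ID. Returns None if not found."""
+    # `db` is assumed to be an existing database handle with a query() method
+    result = db.query("SELECT * FROM users WHERE id = ?", [user_id])
+    return result[0] if result else None
+
+def process_user(user_id: str) -> dict | None:
+    user = get_user(user_id)
+    if user is None:
+        logger.warning(f"User {user_id} not found")
+        return None
+
+    # Process user...
+    result = user  # placeholder for the processed value
+    return result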
+```
+
+**Don't hide errors:**
+```python
+# Bad - silently catches everything
+try:
+    result = do_something()
+except:
+    result = None
+
+# Good - explicit about what can fail
+try:
+    result = do_something()
+except ValueError as e:
+    logger.error(f"Invalid value: {e}")
+    raise
+except ConnectionError as e:
+    logger.error(f"Connection failed: {e}")
+    result = None
+```
+</be_explicit_about_errors>
+
+<fail_fast>
+- Validate inputs at boundaries
+- Check preconditions early
+- Return early on error conditions
+- Don't let bad data propagate
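+
+A minimal sketch of validating at a boundary, assuming a hypothetical `parse_page_size` helper that receives an untrusted string (for example, a query parameter). Bad input is rejected here, so nothing downstream has to re-check it:
+
+```python
+def parse_page_size(raw: str, max_size: int = 100) -> int:
+    """Parse a page-size value from untrusted input, or raise ValueError."""
+    try:
+        size = int(raw)
+    except ValueError:
+        raise ValueError(f"page size must be an integer, got {raw!r}") from None
+    if size < 1 or size > max_size:
+        raise ValueError(f"page size must be between 1 and {max_size}, got {size}")
+    return size
+```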
+</fail_fast>
+
+</error_handling>
+
+<anti_patterns>
+
+<over_engineering>
+❌ Repository pattern for simple CRUD
+❌ Service layer that just calls the database
+❌ Dependency injection containers
+❌ Abstract factories for concrete things
+❌ Interfaces with one implementation
+</over_engineering>
+
+<framework_magic>
+❌ ORM hiding N+1 queries
+❌ Decorators doing complex logic
+❌ Metaclass magic
+❌ Convention over configuration (when it hides behavior)
+</framework_magic>
+
+<premature_abstraction>
+❌ Creating interfaces "for future flexibility"
+❌ Generics for specific use cases
+❌ Configuration files for hardcoded values
+❌ Plugin systems for known features
+</premature_abstraction>
+
+<unnecessary_complexity>
+❌ Class hierarchies for classification
+❌ Design patterns "just because"
+❌ Microservices for a small app
+❌ Message queues for synchronous operations
+</unnecessary_complexity>
+
+</anti_patterns>
+
+<testing_philosophy>
+
+<test_behavior_not_implementation>
+**Focus on:**
+- What the function does (inputs → outputs)
+- Edge cases and boundaries
+- Error conditions
+- Data transformations
+
+**Don't test:**
+- Private implementation details
+- Framework internals
+- External libraries
+- Simple property access
+</test_behavior_not_implementation>
+
+<keep_tests_simple>
+```python
+def test_user_aggregation():
+    # Arrange - simple, clear test data
+    events = [
+        {'user_id': 'u1', 'event': 'click'},
+        {'user_id': 'u1', 'event': 'view'},
+        {'user_id': 'u2', 'event': 'click'},
+    ]
+    
+    # Act - call the function
+    result = aggregate_user_events(events)
+    
+    # Assert - check the behavior
+    assert result == {'u1': 2, 'u2': 1}
+```
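+
+The test above calls `aggregate_user_events`, which is not defined in this document. One possible implementation that would satisfy it, offered as a sketch rather than the actual function:
+
+```python
+from collections import Counter
+
+
+def aggregate_user_events(events: list[dict]) -> dict[str, int]:
+    """Count events per user_id."""
+    return dict(Counter(event['user_id'] for event in events))
+```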
+</keep_tests_simple>
+
+<integration_tests_often_more_valuable>
+- Test with a real database (DuckDB is fast)
+- Test actual SQL queries
+- Test end-to-end flows
+- Use realistic data samples
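+
+A sketch of such a test. It uses an in-memory `sqlite3` database from the standard library as a stand-in so the example runs without extra dependencies; `count_events_per_user` and the table layout are hypothetical:
+
+```python
+import sqlite3
+
+
+def count_events_per_user(conn: sqlite3.Connection) -> dict[str, int]:
+    """Run the real aggregation query and return {user_id: event_count}."""
+    rows = conn.execute(
+        'SELECT user_id, COUNT(*) FROM events GROUP BY user_id'
+    ).fetchall()
+    return dict(rows)
+
+
+def test_count_events_per_user():
+    # Arrange - a real (in-memory) database with a small sample
+    conn = sqlite3.connect(':memory:')
+    conn.execute('CREATE TABLE events (user_id TEXT, event TEXT)')
+    conn.executemany(
+        'INSERT INTO events VALUES (?, ?)',
+        [('u1', 'click'), ('u1', 'view'), ('u2', 'click')],
+    )
+
+    # Act - execute the actual SQL, not a mock
+    result = count_events_per_user(conn)
+
+    # Assert - check the behavior
+    assert result == {'u1': 2, 'u2': 1}
+    conn.close()
+```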
+</integration_tests_often_more_valuable>
+
+</testing_philosophy>
+
+<comments_and_documentation>
+
+<when_to_comment>
+**Comment the "why":**
+```python
+# Use binary search because list is sorted and can be large (1M+ items)
+index = binary_search(sorted_items, target)
+
+# Cache for 5 minutes - balance freshness vs database load
+# (`cache` is assumed to be a project-level TTL decorator, not `functools.cache`)
+@cache(ttl=300)
+def get_user_stats(user_id):
+    ...
+```
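+
+The `binary_search` call above is not defined in this document. A possible sketch built on the standard-library `bisect` module, where returning `-1` for a missing item is an assumption:
+
+```python
+from bisect import bisect_left
+
+
+def binary_search(sorted_items: list, target) -> int:
+    """Return the index of target in sorted_items, or -1 if absent."""
+    index = bisect_left(sorted_items, target)
+    if index < len(sorted_items) and sorted_items[index] == target:
+        return index
+    return -1
+```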
+
+**Don't comment the "what":**
+```python
+# Bad - the comment repeats what the code already says
+# Increment the counter
+counter += 1
+
+# Good - code is clear on its own
+counter += 1
+```
+</when_to_comment>
+
+<self_documenting_code>
+- Use descriptive names
+- Keep functions focused
+- Make data flow obvious
+- Structure for readability
+</self_documenting_code>
+
+</comments_and_documentation>
+
+<summary>
+**Key Principles:**
+1. **Simple, direct, procedural** - functions over classes
+2. **Data-oriented** - understand the data and its flow
+3. **Explicit over implicit** - no magic, no hiding
+4. **Build minimum that works** - solve actual problems
+5. **Performance conscious** - but measure, don't guess
+6. **Keep logic in SQL** - let the database do the work
+7. **Handle errors explicitly** - no silent failures
+8. **Question abstractions** - every layer needs justification
+
+**Ask yourself:**
+- Is this the simplest solution?
+- Can someone else understand this?
+- What is the computer actually doing?
+- Am I solving the real problem?
+
+When in doubt, go simpler.
+</summary>