update cicd & philosophy

Deeman
2026-02-18 16:11:56 +01:00
parent 2748c606e9
commit 910424c956
2 changed files with 193 additions and 41 deletions


@@ -2,14 +2,16 @@
This document defines the coding philosophy and engineering principles that guide all agent work. All agents should internalize and follow these principles.
Influenced by Casey Muratori, Jonathan Blow, and [TigerStyle](https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TIGER_STYLE.md) (adapted for Python/SQL).
<core_philosophy>
**Simple, Direct, Procedural Code**
We follow the Casey Muratori / Jonathan Blow school of thought:
- Solve the actual problem, not the general case
- Understand what the computer is doing
- Explicit is better than clever
- Code should be obvious, not impressive
- Do it right the first time — feature gaps are acceptable, but what ships must meet design goals
</core_philosophy>
<code_style>
@@ -32,7 +34,7 @@ def calculate_user_metrics(events: list[dict]) -> dict:
    """Calculate metrics from event list."""
    total = len(events)
    unique_sessions = len(set(e['session_id'] for e in events))

    return {
        'total_events': total,
        'unique_sessions': unique_sessions,
@@ -45,10 +47,10 @@ def calculate_user_metrics(events: list[dict]) -> dict:
class UserMetricsCalculator:
    def __init__(self):
        self._events = []

    def add_events(self, events: list[dict]):
        self._events.extend(events)

    def calculate(self) -> UserMetrics:
        return UserMetrics(
            total=self._calculate_total(),
@@ -92,10 +94,10 @@ class User:
        self._id = id
        self._name = name
        self._active = active

    def get_name(self):
        return self._name

    def is_active(self):
        return self._active
@@ -115,14 +117,29 @@ active_users = [u for u in users if u.is_active()]
- Descriptive variable names (`user_count` not `uc`)
- Function names that say what they do (`calculate_total` not `process`)
- No abbreviations unless universal (`id`, `url`, `sql`)
- Include units in names: `timeout_seconds`, `size_bytes`, `latency_ms` — not `timeout`, `size`, `latency`
- Place qualifiers last, in descending order of significance: `latency_ms_max`, not `max_latency_ms`, so related names line up (`latency_ms_min`, `latency_ms_max`)
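A minimal sketch of these naming rules. `summarize_latencies` and its input are hypothetical. Every name carries its unit, and the qualifier comes last, so the three `latency_ms_*` variables sort and align together.

```python
def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize request latencies; the unit (ms) and qualifier (min/max/mean) are in every name."""
    assert len(latencies_ms) > 0, "latencies_ms must not be empty"
    latency_ms_min = min(latencies_ms)
    latency_ms_max = max(latencies_ms)
    latency_ms_mean = sum(latencies_ms) / len(latencies_ms)
    return {
        'latency_ms_min': latency_ms_min,
        'latency_ms_max': latency_ms_max,
        'latency_ms_mean': latency_ms_mean,
    }


timeout_seconds = 30  # not `timeout`: a reader should never have to guess the unit
```

For example, `summarize_latencies([12.0, 30.0, 18.0])` gives a minimum of 12.0, a maximum of 30.0 and a mean of 20.0, all in milliseconds.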
**Simple structure:**
- Functions should do one thing
- Keep functions short (20-50 lines, hard limit ~70 — must fit on screen without scrolling)
- If it's getting complex, break it up
- But don't break it up "just because"
</keep_it_simple>
<minimize_variable_scope>
**Declare variables close to where they're used:**
- Don't introduce variables before they're needed
- Remove them when no longer relevant
- Minimize the number of variables in scope at any point
- Reduces the probability of stale-state bugs (checking a value in one place, then using it in another after it has changed)
**Don't duplicate state:**
- One source of truth for each piece of data
- Don't create aliases or copies that can drift out of sync
- If you compute a value, use it directly — don't store it in a variable you'll use 50 lines later
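The following sketch shows the stale-state bug described above. The queue and job names are hypothetical. The bad version caches a capacity check before the loop and keeps trusting it after the queue has changed. The good version recomputes the condition at the point of use, from the queue itself as the single source of truth.

```python
def enqueue_jobs_bad(queue: list[str], jobs: list[str], queue_size_max: int) -> None:
    # Bad: `has_capacity` is computed once, far from where it is used.
    has_capacity = len(queue) < queue_size_max
    for job in jobs:
        if has_capacity:  # stale after the first append: the queue can overflow
            queue.append(job)


def enqueue_jobs(queue: list[str], jobs: list[str], queue_size_max: int) -> None:
    # Good: the check sits right where it matters and reads the live state.
    for job in jobs:
        if len(queue) >= queue_size_max:
            break
        queue.append(job)
```

With `queue_size_max=2` and three jobs, the bad version lets all three jobs in. The good version stops at two.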
</minimize_variable_scope>
</code_style>
<architecture_principles>
@@ -139,6 +156,11 @@ active_users = [u for u in users if u.is_active()]
- Abstract only when pattern is clear
- Three examples before abstracting
- Question every layer of indirection
**Zero technical debt:**
- Do it right the first time
- A problem solved in design costs less than one solved in implementation, which costs less than one solved in production
- Feature gaps are acceptable; broken or half-baked code is not
</build_minimum_that_works>
<explicit_over_implicit>
@@ -153,8 +175,20 @@ active_users = [u for u in users if u.is_active()]
- Implicit configuration
- Action-at-a-distance
- Metaprogramming tricks
- Relying on library defaults — pass options explicitly at call site
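As an illustration of passing options explicitly, here is a sketch that reads a CSV file. `read_price_rows` is a hypothetical helper. `open()` without `encoding` falls back to the locale's preferred encoding, which can differ between a laptop and a server. The `csv` module documentation asks for `newline=""` on files passed to its readers.

```python
import csv
from pathlib import Path


def read_price_rows(path: Path) -> list[dict[str, str]]:
    # encoding: stated explicitly so the result does not depend on the machine's locale.
    # newline="": what the csv module expects, so quoted newlines inside fields parse correctly.
    with open(path, mode="r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f, delimiter=",")  # the default delimiter, stated anyway
        return list(reader)
```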
</explicit_over_implicit>
<set_limits_on_everything>
**Nothing should run unbounded:**
- Set max retries on network calls
- Set timeouts on all external requests
- Bound loop iterations where data size is unknown
- Set max page counts on paginated API fetches
- Cap queue/buffer sizes
**Why:** Unbounded operations cause tail latency spikes, resource exhaustion, and silent hangs. A system that fails loudly at a known limit is better than one that degrades mysteriously.
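A sketch of a paginated fetch with explicit limits on both pages and attempts. The `fetch_page(page_number)` callable is hypothetical. It is assumed to perform one request with its own timeout and to raise `ConnectionError` on a transient failure. Following the explicit-options rule, every limit is passed as a required keyword argument rather than hidden behind a default.

```python
import time
from typing import Callable

FetchPage = Callable[[int], list[dict]]


def fetch_page_with_retries(
    fetch_page: FetchPage, page_number: int, *, attempt_count_max: int, retry_delay_seconds: float
) -> list[dict]:
    """Call fetch_page, making at most attempt_count_max attempts in total."""
    assert attempt_count_max >= 1, "attempt_count_max must allow at least one attempt"
    for attempt in range(1, attempt_count_max + 1):
        try:
            return fetch_page(page_number)
        except ConnectionError:
            if attempt == attempt_count_max:
                raise  # limit reached: fail loudly instead of retrying forever
            time.sleep(retry_delay_seconds * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable: the loop always returns or raises")


def fetch_all_pages(
    fetch_page: FetchPage, *, page_count_max: int, attempt_count_max: int, retry_delay_seconds: float
) -> list[dict]:
    """Fetch pages 0, 1, 2, ... until an empty page, never more than page_count_max pages."""
    rows: list[dict] = []
    for page_number in range(page_count_max):
        page = fetch_page_with_retries(
            fetch_page, page_number,
            attempt_count_max=attempt_count_max, retry_delay_seconds=retry_delay_seconds,
        )
        if not page:
            return rows
        rows.extend(page)
    raise RuntimeError(f"pagination did not finish within {page_count_max} pages")
```

When an API paginates forever, it surfaces as a `RuntimeError` at a known limit. A permanently failing endpoint re-raises its `ConnectionError` after the last attempt. Neither case ends in a silent hang.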
</set_limits_on_everything>
<question_dependencies>
**Before adding a library:**
- Can I write this simply myself?
@@ -185,16 +219,55 @@ active_users = [u for u in users if u.is_active()]
- Nested loops over large data
- Copying large structures unnecessarily
- Loading entire datasets into memory
**But don't prematurely optimize:**
- Profile first, optimize second
- Make it work, then make it fast
- Measure actual performance
- Optimize the hot path, not everything
</think_about_the_computer>
<design_phase_performance>
**Think about performance upfront during design, not just after profiling:**
- The largest wins (100-1000x) happen in the design phase
- Back-of-envelope sketch: estimate load across network, disk, memory, CPU
- Optimize for the slowest resource first (network > disk > memory > CPU)
- Weight each cost by how often it runs: a cheap operation called 10M times can dominate
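Here is what such a back-of-envelope sketch might look like when written down as arithmetic. The two per-operation costs are round, illustrative assumptions rather than measurements. The point is the orders of magnitude, weighted by how often each operation runs.

```python
# Illustrative cost assumptions (orders of magnitude only, not measurements).
NETWORK_ROUND_TRIP_SECONDS = 0.001  # assume ~1 ms per request to a nearby database
CHEAP_CALL_SECONDS = 0.000_000_1  # assume ~100 ns for a cheap in-process operation


def estimate_seconds(operation_count: int, cost_seconds: float) -> float:
    """Total time = how often it runs x what it costs each time."""
    return operation_count * cost_seconds


# 10,000 rows sent one request each, vs 10 requests of 1,000 rows.
# (Ignores payload transfer time, which batching does not remove.)
row_by_row_seconds = estimate_seconds(10_000, NETWORK_ROUND_TRIP_SECONDS)  # ~10 s
batched_seconds = estimate_seconds(10, NETWORK_ROUND_TRIP_SECONDS)  # ~0.01 s

# A "cheap" operation on a hot path, called 10M times, still costs about a second.
hot_path_seconds = estimate_seconds(10_000_000, CHEAP_CALL_SECONDS)  # ~1 s
```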
**Batching:**
- Amortize costs via batching (network calls, disk writes, database inserts)
- One batch insert of 1000 rows beats 1000 individual inserts
- Distinguish control plane (rare, can be slow) from data plane (hot path, must be fast)
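A sketch of the batching point, using the standard-library `sqlite3` module as a stand-in for the project's database. The `events` table and both function names are hypothetical. The row-by-row version pays one transaction per row. On a file-backed database, each commit typically forces a sync to disk. The batched version uses one `executemany` call inside a single transaction.

```python
import sqlite3

INSERT_EVENT_SQL = "INSERT INTO events (user_id, event) VALUES (?, ?)"


def insert_events_row_by_row(connection: sqlite3.Connection, events: list[tuple[str, str]]) -> None:
    # Slow pattern: one statement and one commit per row.
    for event in events:
        with connection:  # the context manager commits on exit, so every row is its own transaction
            connection.execute(INSERT_EVENT_SQL, event)


def insert_events_batched(connection: sqlite3.Connection, events: list[tuple[str, str]]) -> None:
    # Batched pattern: one executemany call inside a single transaction.
    with connection:  # commits once on success, rolls back every row on error
        connection.executemany(INSERT_EVENT_SQL, events)
```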
**But don't prematurely optimize implementation details:**
- Design for performance, then measure before micro-optimizing
- Make it work, then make it fast
- Optimize the hot path, not everything
</design_phase_performance>
</performance_consciousness>
<assertions_and_invariants>
<use_assertions_as_documentation>
**Assert preconditions, postconditions, and invariants — especially in data pipelines:**
```python
def normalize_prices(prices: list[dict], currency: str) -> list[dict]:
    assert len(prices) > 0, "prices must not be empty"
    assert currency in ("USD", "EUR", "BRL"), f"unsupported currency: {currency}"
    # convert_price is a helper assumed to be defined elsewhere
    result = [convert_price(p, currency) for p in prices]
    assert len(result) == len(prices), "normalization must not drop rows"
    assert all(r['currency'] == currency for r in result), "all prices must be in target currency"
    return result
```
**Guidelines:**
- Assert function arguments and return values at boundaries
- Assert data quality: row counts, non-null columns, expected ranges
- Use assertions to document surprising or critical invariants
- Split compound assertions: `assert a; assert b` not `assert a and b` (clearer error messages)
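- Assertions catch programmer errors; never use them for expected runtime conditions (use if/else for those). Python also removes `assert` statements when run with `python -O`, so they must never be the only check on external input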
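A sketch that separates the two cases. `parse_quantities` is hypothetical. Malformed strings arriving from outside are expected, so they are handled with if/else and collected. The invariants that only a bug could break are asserted, one condition per assertion.

```python
def parse_quantities(raw_values: list[str]) -> tuple[list[int], list[str]]:
    """Parse non-negative integer quantities; return (parsed, rejected)."""
    quantities: list[int] = []
    rejected: list[str] = []
    for raw_value in raw_values:
        # Expected runtime condition (bad external input): if/else, not assert.
        # isdecimal() matches exactly the digit characters int() accepts.
        if raw_value.strip().isdecimal():
            quantities.append(int(raw_value))
        else:
            rejected.append(raw_value)
    # Programmer-error invariants, split so a failure names the exact broken condition.
    assert len(quantities) + len(rejected) == len(raw_values), "every value must be accounted for"
    assert all(q >= 0 for q in quantities), "quantities must be non-negative"
    return quantities, rejected
```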
</use_assertions_as_documentation>
</assertions_and_invariants>
<sql_and_data>
<keep_logic_in_sql>
@@ -281,7 +354,7 @@ def process_user(user_id: str):
    if user is None:
        logger.warning(f"User {user_id} not found")
        return None

    # Process user...
    return result
```
@@ -311,6 +384,7 @@ except ConnectionError as e:
- Check preconditions early
- Return early on error conditions
- Don't let bad data propagate
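- All errors must be handled. In a study of production failures in distributed data-intensive systems (Yuan et al., OSDI 2014, cited by TigerStyle), 92% of catastrophic failures came from incorrect handling of non-fatal errors explicitly signaled in software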
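A sketch of handling every error path on purpose, using a hypothetical JSON cache file. A missing cache is expected and non-fatal. A corrupt cache is non-fatal but is logged rather than swallowed. A cache with the wrong shape points to a bug and fails fast. Anything else, such as `PermissionError`, is left to propagate.

```python
import json
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def read_cached_rows(cache_path: Path) -> Optional[list[dict]]:
    """Return cached rows, or None when the cache must be rebuilt."""
    try:
        text = cache_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None  # expected and non-fatal: no cache has been written yet
    try:
        rows = json.loads(text)
    except json.JSONDecodeError as e:
        # Non-fatal but not silent: record why the cache is being rebuilt.
        logger.warning("corrupt cache %s: %s", cache_path, e)
        return None
    if not isinstance(rows, list):
        # We never write anything but a list, so this is a bug: fail fast.
        raise TypeError(f"cache {cache_path} holds {type(rows).__name__}, expected list")
    return rows
```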
</fail_fast>
</error_handling>
@@ -318,32 +392,32 @@ except ConnectionError as e:
<anti_patterns>
<over_engineering>
- Repository pattern for simple CRUD
- Service layer that just calls the database
- Dependency injection containers
- Abstract factories for concrete things
- Interfaces with one implementation
</over_engineering>
<framework_magic>
- ORM hiding N+1 queries
- Decorators doing complex logic
- Metaclass magic
- Convention over configuration (when it hides behavior)
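To make the N+1 problem concrete, here is a plain `sqlite3` sketch of the query pattern that a lazily loading ORM can generate behind an innocent-looking loop, next to the single explicit query. The `users` and `orders` tables are hypothetical. The first function issues one query per user. The second lets the database join and aggregate in one round trip.

```python
import sqlite3


def order_totals_n_plus_one(connection: sqlite3.Connection) -> dict[str, float]:
    # 1 query for the users, then 1 more query per user: N+1 round trips.
    totals = {}
    for (user_id,) in connection.execute("SELECT id FROM users").fetchall():
        (total,) = connection.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?", (user_id,)
        ).fetchone()
        totals[user_id] = total
    return totals


def order_totals(connection: sqlite3.Connection) -> dict[str, float]:
    # One explicit query: the join and the aggregation happen in the database.
    rows = connection.execute(
        """
        SELECT u.id, COALESCE(SUM(o.amount), 0)
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.id
        GROUP BY u.id
        """
    ).fetchall()
    return {user_id: total for user_id, total in rows}
```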
</framework_magic>
<premature_abstraction>
- Creating interfaces "for future flexibility"
- Generics for specific use cases
- Configuration files for hardcoded values
- Plugin systems for known features
</premature_abstraction>
<unnecessary_complexity>
- Class hierarchies for classification
- Design patterns "just because"
- Microservices for a small app
- Message queues for synchronous operations
</unnecessary_complexity>
</anti_patterns>
@@ -373,15 +447,22 @@ def test_user_aggregation():
        {'user_id': 'u1', 'event': 'view'},
        {'user_id': 'u2', 'event': 'click'},
    ]

    # Act - call the function
    result = aggregate_user_events(events)

    # Assert - check the behavior
    assert result == {'u1': 2, 'u2': 1}
```
</keep_tests_simple>
<test_both_spaces>
**Test positive and negative space:**
- Test valid inputs produce correct outputs (positive space)
- Test invalid inputs are rejected or handled correctly (negative space)
- For data pipelines: test with realistic data samples AND with malformed/missing data
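A sketch of testing both spaces, in the same plain-function style as the test above. `parse_discount_percent` is a hypothetical function. The negative-space test loops over malformed and out-of-range inputs and demands a `ValueError` for each. It uses try/except to stay dependency-free; with pytest, `pytest.raises` expresses the same check.

```python
def parse_discount_percent(raw_value: str) -> float:
    """Parse a discount such as '15' or '12.5'; reject anything outside 0..100."""
    discount_percent = float(raw_value)  # raises ValueError on non-numeric input
    if not 0.0 <= discount_percent <= 100.0:  # also rejects NaN, which fails every comparison
        raise ValueError(f"discount out of range: {raw_value}")
    return discount_percent


def test_parse_discount_percent_positive_space():
    assert parse_discount_percent("15") == 15.0
    assert parse_discount_percent("0") == 0.0
    assert parse_discount_percent("100") == 100.0


def test_parse_discount_percent_negative_space():
    for raw_value in ["", "abc", "-5", "100.1", "nan"]:
        try:
            parse_discount_percent(raw_value)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {raw_value!r}")
```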
</test_both_spaces>
<integration_tests_often_more_valuable>
- Test with real database (DuckDB is fast)
- Test actual SQL queries
@@ -414,6 +495,11 @@ counter += 1
# Good - code is clear on its own
counter += 1
```
**Always motivate decisions:**
- Explain why you wrote code the way you did
- Code alone isn't documentation — the reasoning matters
- Comments are well-written prose, not margin scribblings
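A sketch of a comment that motivates a decision instead of restating the code. The deduplication function and the out-of-order delivery it guards against are hypothetical. The comment records why the sort exists, which the code alone cannot say.

```python
def dedupe_events(events: list[dict]) -> list[dict]:
    # Why sort first: the upstream source may deliver a retried event before the
    # original, and we want the most recently received version of each event to win.
    # Sorting by received_at ascending means later writes to latest_by_id overwrite
    # earlier ones, which makes "last write wins" explicit.
    events_sorted = sorted(events, key=lambda e: e['received_at'])
    latest_by_id = {}
    for event in events_sorted:
        latest_by_id[event['event_id']] = event
    return list(latest_by_id.values())
```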
</when_to_comment>
<self_documenting_code>
@@ -427,20 +513,23 @@ counter += 1
<summary>
**Key Principles:**
1. **Simple, direct, procedural** — functions over classes
2. **Data-oriented** — understand the data and its flow
3. **Explicit over implicit** — no magic, no hiding
4. **Build minimum that works** — solve actual problems, zero technical debt
5. **Performance conscious** — design for performance, then measure before micro-optimizing
6. **Keep logic in SQL** — let the database do the work
7. **Handle errors explicitly** — no silent failures, all errors handled
8. **Assert invariants** — use assertions to document and enforce correctness
9. **Set limits on everything** — nothing runs unbounded
10. **Question abstractions** — every layer needs justification
**Ask yourself:**
- Is this the simplest solution?
- Can someone else understand this?
- What is the computer actually doing?
- Am I solving the real problem?
- What are the bounds on this operation?
When in doubt, go simpler.
</summary>