Mirror of https://github.com/trailofbits/algo.git, synced 2025-09-05 19:43:22 +02:00
Document debugging lessons learned in CLAUDE.md
Added comprehensive debugging guidance based on our troubleshooting session:

- VPN connectivity troubleshooting order (DNS first!)
- systemd socket activation best practices
- Common deployment failures and solutions
- Time wasters to avoid (lessons learned the hard way)
- Multi-homed system considerations
- Testing notes for DigitalOcean

These additions will help future debugging sessions avoid the same rabbit holes and focus on the most likely issues first.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
parent 767d615788
commit 9fb0cd1031
1 changed file with 58 additions and 4 deletions
CLAUDE.md (62 lines changed)
@@ -176,19 +176,38 @@ This practice ensures:
 - Too many tasks to fix immediately (113+)
 - Focus on new code having proper names

+### 2. dnscrypt-proxy Service Failures
+**Problem:** "Unit dnscrypt-proxy.socket is masked" or service won't start
+- The service has `Requires=dnscrypt-proxy.socket` dependency
+- Masking the socket prevents the service from starting
+- **Solution:** Configure socket properly instead of fighting it (see systemd section above)
+
-### 3. Jinja2 Template Complexity
+### 3. DNS Not Accessible to VPN Clients
+**Symptoms:** VPN connects but no internet access
+- First check: `sudo ss -ulnp | grep :53` on the server
+- If only showing 127.0.0.53 or 127.0.2.1, socket activation is misconfigured
+- Check firewall allows VPN subnets: `-A INPUT -s {{ subnets }} -d {{ local_service_ip }}`
+- **Never** allow DNS from all sources (0.0.0.0/0) - security risk!
+
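The "first check" in the DNS section above can be automated. The following is a minimal sketch, not part of the commit: the helper names `port53_listeners` and `only_loopback` are hypothetical, and the sample `ss` output is illustrative rather than captured from a real server. It reports whether every port 53 listener is bound to a loopback address, which is the misconfiguration the section describes.

```python
import ipaddress

# Illustrative text in the general shape of `ss -ulnp` output (not a real capture).
SAMPLE_SS_OUTPUT = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
UNCONN 0      0      127.0.0.53%lo:53   0.0.0.0:*         users:(("systemd-resolve",pid=512,fd=13))
UNCONN 0      0      127.0.2.1:53       0.0.0.0:*         users:(("systemd",pid=1,fd=44))
"""


def port53_listeners(ss_output):
    """Return the local addresses bound to port 53, one per matching line."""
    addresses = []
    for line in ss_output.splitlines():
        for token in line.split():
            if token.endswith(":53"):
                host = token[: -len(":53")]
                # Drop an interface suffix such as "%lo" and IPv6 brackets.
                host = host.split("%", 1)[0].strip("[]")
                addresses.append(host)
                break  # the local address column comes before the peer column
    return addresses


def only_loopback(addresses):
    """True when there are listeners and all of them are loopback addresses."""

    def is_loopback(host):
        if host == "*":  # wildcard bind, reachable from any interface
            return False
        return ipaddress.ip_address(host).is_loopback

    return bool(addresses) and all(is_loopback(host) for host in addresses)
```

If `only_loopback(port53_listeners(output))` is true, VPN clients have no address on which to reach the resolver, so socket activation is the first thing to inspect.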
+### 4. Multi-homed Systems and NAT
+**DigitalOcean and other providers with multiple IPs:**
+- Servers may have both public and private IPs on same interface
+- MASQUERADE needs output interface: `-o {{ ansible_default_ipv4['interface'] }}`
+- Don't overengineer with SNAT - MASQUERADE with interface works fine
+- Use `alternative_ingress_ip` option only when truly needed
+
+### 5. Jinja2 Template Complexity
 - Many templates use Ansible-specific filters
 - Test templates with `tests/unit/test_template_rendering.py`
 - Mock Ansible filters when testing

-### 4. OpenSSL Version Compatibility
+### 6. OpenSSL Version Compatibility
 ```yaml
 # Check version and use appropriate flags
 {{ (openssl_version is version('3', '>=')) | ternary('-legacy', '') }}
 ```

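For readers less familiar with Ansible's `version` test and `ternary` filter, the expression in the OpenSSL section amounts to "emit `-legacy` on OpenSSL 3 or newer, otherwise nothing". A rough Python equivalent, assuming a dotted version string and comparing only the major component (a simplification of Ansible's full version comparison), might look like this. The function name `legacy_flag` is hypothetical.

```python
def legacy_flag(openssl_version):
    """Return '-legacy' for OpenSSL 3 or newer, otherwise an empty string.

    Simplification: only the major component is compared, which is enough
    to tell the 1.x and 3.x release lines apart.
    """
    major = int(openssl_version.split(".", 1)[0])
    return "-legacy" if major >= 3 else ""
```

A test that mocks Ansible filters could use a helper like this in place of the Jinja2 expression.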
-### 5. IPv6 Endpoint Formatting
+### 7. IPv6 Endpoint Formatting
 - WireGuard configs must bracket IPv6 addresses
 - Template logic: `{% if ':' in IP %}[{{ IP }}]:{{ port }}{% else %}{{ IP }}:{{ port }}{% endif %}`

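The bracketing rule in the IPv6 section is small enough to mirror in plain Python, for instance when writing a unit test for the rendered endpoint. This is a sketch of the same rule, not the project's code: `format_endpoint` is a hypothetical name, and the addresses in the usage note come from the documentation ranges.

```python
def format_endpoint(ip, port):
    """Bracket the address when it contains a colon, as the template does."""
    if ":" in ip:
        return f"[{ip}]:{port}"
    return f"{ip}:{port}"
```

Under this rule, `format_endpoint("2001:db8::1", 51820)` gives `[2001:db8::1]:51820`, while `format_endpoint("203.0.113.10", 51820)` gives `203.0.113.10:51820`.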
@@ -223,9 +242,11 @@ This practice ensures:
 Each has specific requirements:
 - **AWS**: Requires boto3, specific AMI IDs
 - **Azure**: Complex networking setup
-- **DigitalOcean**: Simple API, good for testing
+- **DigitalOcean**: Simple API, good for testing (watch for multiple IPs on eth0)
 - **Local**: KVM/Docker for development

+**Testing Note:** DigitalOcean droplets often have both public and private IPs on the same interface, making them excellent test cases for multi-IP scenarios and NAT issues.
+
 ### Architecture Considerations
 - Support both x86_64 and ARM64
 - Some providers have limited ARM support
@@ -265,6 +286,15 @@ Each has specific requirements:
 - Linter compliance
 - Conservative approach

+### Time Wasters to Avoid (Lessons Learned)
+**Don't spend time on these unless absolutely necessary:**
+1. **Converting MASQUERADE to SNAT** - MASQUERADE works fine for Algo's use case
+2. **Fighting systemd socket activation** - Configure it properly instead
+3. **Debugging NAT before checking DNS** - Most "routing" issues are DNS issues
+4. **Complex IPsec policy matching** - Keep NAT rules simple
+5. **Testing on existing servers** - Always test on fresh deployments
+6. **Adding `-m policy --pol none`** - This breaks NAT, don't use it
+
 ## Working with Algo

 ### Local Development Setup
@@ -297,6 +327,30 @@ ansible-playbook users.yml -e "server=SERVER_NAME"
 3. Check firewall rules
 4. Review generated configs in `configs/`

+### Troubleshooting VPN Connectivity
+
+#### "VPN connects but can't route traffic" - Check in this order:
+1. **DNS first** - `sudo ss -ulnp | grep :53` - Is dnscrypt-proxy listening on VPN IPs?
+2. **Packet counters** - `sudo iptables -L FORWARD -v -n | grep -E '10.49|10.48'` - Are packets reaching the firewall?
+3. **NAT counters** - `sudo iptables -t nat -L POSTROUTING -v -n` - Is NAT happening?
+4. **Service status** - `sudo systemctl status dnscrypt-proxy` - Is the DNS service running?
+
+**Important:** Most "routing" issues are actually DNS issues. Always check DNS first.
+
+#### systemd and dnscrypt-proxy
+- Ubuntu's dnscrypt-proxy package uses socket activation by default
+- The default socket listens on 127.0.2.1:53, NOT the VPN service IPs
+- Work WITH systemd, not against it:
+```yaml
+# Create socket override at /etc/systemd/system/dnscrypt-proxy.socket.d/override.conf
+[Socket]
+ListenStream= # Clear defaults
+ListenStream=172.x.x.x:53 # Add VPN IP
+```
+- Use empty `listen_addresses = []` in dnscrypt-proxy.toml when using socket activation
+- **Never** use `TriggeredBy=` in systemd units (it's not a valid directive)
+- Don't mask sockets that services depend on - just disable them
+
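The socket override described in the systemd section can be generated rather than written by hand. The sketch below is one possible way to render such a drop-in and is not taken from the commit: `render_socket_override` is a hypothetical helper, `172.16.0.1` in the usage note is a placeholder address, and the `ListenDatagram=` lines assume the packaged socket unit defines UDP listeners as well as TCP ones. Comments sit on their own lines because systemd unit files do not treat a trailing `#` as a comment.

```python
def render_socket_override(listen_addresses, port=53):
    """Render a [Socket] drop-in that replaces the packaged listeners.

    Assumes IPv4 addresses; an IPv6 address would need brackets around it.
    """
    lines = [
        "[Socket]",
        "# An empty assignment clears the listeners inherited from the packaged unit",
        "ListenStream=",
        "ListenDatagram=",
    ]
    for address in listen_addresses:
        lines.append(f"# Listen on {address} for VPN clients")
        lines.append(f"ListenStream={address}:{port}")
        lines.append(f"ListenDatagram={address}:{port}")
    return "\n".join(lines) + "\n"
```

Calling `render_socket_override(["172.16.0.1"])` returns text suitable for `/etc/systemd/system/dnscrypt-proxy.socket.d/override.conf`. After writing it, the usual follow-up is `systemctl daemon-reload` and a restart of the socket unit.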
 ## Important Context for LLMs

 ### What Makes Algo Special