algo/CLAUDE.md
Dan Guido f668af22d0
Fix VPN routing on multi-homed systems by specifying output interface (#14826)
* Fix VPN routing by adding output interface to NAT rules

The NAT rules were missing the output interface specification (-o eth0),
which caused routing failures on multi-homed systems (servers with multiple
network interfaces). Without specifying the output interface, packets might
not be NAT'd correctly.

Changes:
- Added -o {{ ansible_default_ipv4['interface'] }} to all NAT rules
- Updated both IPv4 and IPv6 templates
- Updated tests to verify output interface is present
- Added ansible_default_ipv4/ipv6 to test fixtures

This fixes the issue where VPN clients could connect but not route traffic
to the internet on servers with multiple network interfaces (like DigitalOcean
droplets with private networking enabled).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix VPN routing by adding output interface to NAT rules

On multi-homed systems (servers with multiple network interfaces or multiple IPs
on one interface), MASQUERADE rules need to specify which interface to use for
NAT. Without the output interface specification, packets may not be routed correctly.

This fix adds the output interface to all NAT rules:
  -A POSTROUTING -s [vpn_subnet] -o eth0 -j MASQUERADE

Changes:
- Modified roles/common/templates/rules.v4.j2 to include output interface
- Modified roles/common/templates/rules.v6.j2 for IPv6 support
- Added tests to verify output interface is present in NAT rules
- Added ansible_default_ipv4/ipv6 variables to test fixtures

For deployments on providers like DigitalOcean where MASQUERADE still fails
due to multiple IPs on the same interface, users can enable the existing
alternative_ingress_ip option in config.cfg to use explicit SNAT.

Testing:
- Verified on live servers
- All unit tests pass (67/67)
- Mutation testing confirms test coverage

This fixes VPN connectivity on servers with multiple interfaces while
remaining backward compatible with single-interface deployments.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy not listening on VPN service IPs

Problem: dnscrypt-proxy on Ubuntu uses systemd socket activation by default,
which overrides the configured listen_addresses in dnscrypt-proxy.toml.
The socket only listens on 127.0.2.1:53, preventing VPN clients from
resolving DNS queries through the configured service IPs.

Solution: Disable and mask the dnscrypt-proxy.socket unit to allow
dnscrypt-proxy to bind directly to the VPN service IPs specified in
its configuration file.

This fixes DNS resolution for VPN clients on Ubuntu 20.04+ systems.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Apply Python linting and formatting

- Run ruff check --fix to fix linting issues
- Run ruff format to ensure consistent formatting
- All tests still pass after formatting changes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Restrict DNS access to VPN clients only

Security fix: The firewall rule for DNS was accepting traffic from any
source (0.0.0.0/0) to the local DNS resolver. While the service IP is
on the loopback interface (which normally isn't routable externally),
this could be a security risk if misconfigured.

Changed firewall rules to only accept DNS traffic from VPN subnets:
- INPUT rule now includes -s {{ subnets }} to restrict source IPs
- Applied to both IPv4 and IPv6 rules
- Added test to verify DNS is properly restricted

This ensures the DNS resolver is only accessible to connected VPN
clients, not the entire internet.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy service startup with masked socket

Problem: dnscrypt-proxy.service has a dependency on dnscrypt-proxy.socket
through the TriggeredBy directive. When we mask the socket before starting
the service, systemd fails with "Unit dnscrypt-proxy.socket is masked."

Solution:
1. Override the service to remove socket dependency (TriggeredBy=)
2. Reload systemd daemon immediately after override changes
3. Start the service (which now doesn't require the socket)
4. Only then disable and mask the socket

This ensures dnscrypt-proxy can bind directly to the configured IPs
without socket activation, while preventing the socket from being
re-enabled by package updates.

Changes:
- Added TriggeredBy= override to remove socket dependency
- Added explicit daemon reload after service overrides
- Moved socket masking to after service start in main.yml
- Fixed YAML formatting issues

Testing: Deployment now succeeds with dnscrypt-proxy binding to VPN IPs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy by not masking the socket

Problem: Masking dnscrypt-proxy.socket prevents the service from starting
because the service has Requires=dnscrypt-proxy.socket dependency.

Solution: Simply stop and disable the socket without masking it. This
prevents socket activation while allowing the service to start and bind
directly to the configured IPs.

Changes:
- Removed socket masking (just disable it)
- Moved socket disabling before service start
- Removed invalid systemd directives from override

Testing: Confirmed dnscrypt-proxy now listens on VPN service IPs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Use systemd socket activation properly for dnscrypt-proxy

Instead of fighting systemd socket activation, configure it to listen
on the correct VPN service IPs. This is more systemd-native and reliable.

Changes:
- Create socket override to listen on VPN IPs instead of localhost
- Clear default listeners and add VPN service IPs
- Use empty listen_addresses in dnscrypt-proxy.toml for socket activation
- Keep socket enabled and let systemd manage the activation
- Add handler for restarting socket when config changes

Benefits:
- Works WITH systemd instead of against it
- Survives package updates better
- No dependency conflicts
- More reliable service management

This approach is cleaner than disabling socket activation entirely and
ensures dnscrypt-proxy is accessible to VPN clients on the correct IPs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Document debugging lessons learned in CLAUDE.md

Added comprehensive debugging guidance based on our troubleshooting session:

- VPN connectivity troubleshooting order (DNS first!)
- systemd socket activation best practices
- Common deployment failures and solutions
- Time wasters to avoid (lessons learned the hard way)
- Multi-homed system considerations
- Testing notes for DigitalOcean

These additions will help future debugging sessions avoid the same
rabbit holes and focus on the most likely issues first.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix DNS resolution for VPN clients by enabling route_localnet

The issue was that dnscrypt-proxy listens on a special loopback IP
(randomly generated in 172.16.0.0/12 range) which wasn't accessible
from VPN clients. This fix:

1. Enables net.ipv4.conf.all.route_localnet sysctl to allow routing
   to loopback IPs from other interfaces
2. Ensures dnscrypt-proxy socket is properly restarted when its
   configuration changes
3. Adds proper handler flushing after socket configuration updates

This allows VPN clients to reach the DNS resolver at the local_service_ip
address configured on the loopback interface.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Improve security by using interface-specific route_localnet

Instead of enabling route_localnet globally (net.ipv4.conf.all.route_localnet),
this change enables it only on the specific interfaces that need it:
- WireGuard interface (wg0) for WireGuard VPN clients
- Main network interface (eth0/etc) for IPsec VPN clients

This minimizes the security impact by restricting loopback routing to only
the VPN interfaces, preventing other interfaces from being able to route
to loopback addresses.

The interface-specific approach provides the same functionality (allowing
VPN clients to reach the DNS resolver on the local_service_ip) while
reducing the potential attack surface.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert to global route_localnet to fix deployment failure

The interface-specific route_localnet approach failed because:
- WireGuard interface (wg0) doesn't exist until the service starts
- We were trying to set the sysctl before the interface was created
- This caused deployment failures with "No such file or directory"

Reverting to the global setting (net.ipv4.conf.all.route_localnet=1) because:
- It always works regardless of interface creation timing
- VPN users are trusted (they have our credentials)
- Firewall rules still restrict access to only port 53
- The security benefit of interface-specific settings is minimal
- The added complexity isn't worth the marginal security improvement

This ensures reliable deployments while maintaining the DNS resolution fix.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy socket restart and remove problematic BPF hardening

Two important fixes:

1. Fix dnscrypt-proxy socket not restarting with new configuration
   - The socket wasn't properly restarting when its override config changed
   - This caused DNS to listen on wrong IP (127.0.2.1 instead of local_service_ip)
   - Now directly restart the socket when configuration changes
   - Add explicit daemon reload before restarting

2. Remove BPF JIT hardening that causes deployment errors
   - The net.core.bpf_jit_enable sysctl isn't available on all kernels
   - It was causing "Invalid argument" errors during deployment
   - This was optional security hardening with minimal benefit
   - Removing it eliminates deployment errors for most users

These fixes ensure reliable DNS resolution for VPN clients and clean
deployments without error messages.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update CLAUDE.md with comprehensive debugging lessons learned

Based on our extensive debugging session, this update adds critical documentation:

## DNS Architecture and Troubleshooting
- Explained the local_service_ip design and why it requires route_localnet
- Added detailed DNS debugging methodology with exact steps in order
- Documented systemd socket activation complexities and common mistakes
- Added specific commands to verify DNS is working correctly

## Architectural Decisions
- Added new section explaining trade-offs in Algo's design choices
- Documented why local_service_ip uses loopback instead of alternatives
- Explained iptables-legacy vs iptables-nft backend choice

## Enhanced Debugging Guidance
- Expanded troubleshooting with exact commands and expected outputs
- Added warnings about configuration changes that need restarts
- Documented socket activation override requirements in detail
- Added common pitfalls like interface-specific sysctls

## Time Wasters Section
- Added new lessons learned from this debugging session
- Interface-specific route_localnet (fails before interface exists)
- DNAT for loopback addresses (doesn't work)
- BPF JIT hardening (causes errors on many kernels)

This documentation will help future maintainers avoid the same debugging
rabbit holes and understand why things are designed the way they are.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-17 22:12:23 -04:00

18 KiB

CLAUDE.md - LLM Guidance for Algo VPN

This document provides essential context and guidance for LLMs working on the Algo VPN codebase. It captures important learnings, patterns, and best practices discovered through extensive work with this project.

Project Overview

Algo is an Ansible-based tool that sets up a personal VPN in the cloud. It's designed to be:

  • Security-focused: Creates hardened VPN servers with minimal attack surface
  • Easy to use: Automated deployment with sensible defaults
  • Multi-platform: Supports various cloud providers and operating systems
  • Privacy-preserving: No logging, minimal data retention

Core Technologies

  • VPN Protocols: WireGuard (preferred) and IPsec/IKEv2
  • Configuration Management: Ansible (currently v9.x)
  • Languages: Python, YAML, Shell, Jinja2 templates
  • Supported Providers: AWS, Azure, DigitalOcean, GCP, Vultr, Hetzner, local deployment

Architecture and Structure

Directory Layout

algo/
├── main.yml                 # Primary playbook
├── users.yml               # User management playbook
├── server.yml              # Server-specific tasks
├── config.cfg              # Main configuration file
├── pyproject.toml          # Python project configuration and dependencies
├── uv.lock                 # Exact dependency versions lockfile
├── requirements.yml        # Ansible collections
├── roles/                  # Ansible roles
│   ├── common/            # Base system configuration
│   ├── wireguard/         # WireGuard VPN setup
│   ├── strongswan/        # IPsec/IKEv2 setup
│   ├── dns/               # DNS configuration (dnsmasq, dnscrypt)
│   ├── ssh_tunneling/     # SSH tunnel setup
│   └── cloud-*/           # Cloud provider specific roles
├── library/               # Custom Ansible modules
├── playbooks/             # Supporting playbooks
└── tests/                 # Test suite
    └── unit/             # Python unit tests

Key Roles

  • common: Firewall rules, system hardening, package management
  • wireguard: WireGuard server/client configuration
  • strongswan: IPsec server setup with certificate generation
  • dns: DNS encryption and ad blocking
  • cloud-*: Provider-specific instance creation

Critical Dependencies and Version Management

Current Versions (MUST maintain compatibility)

ansible==11.8.0     # Stay current to get latest security, performance and bugfixes
jinja2~=3.1.6      # Security fix for CVE-2025-27516
netaddr==1.3.0     # Network address manipulation

Version Update Guidelines

  1. Be Conservative: Prefer minor version bumps over major ones
  2. Security First: Always prioritize security updates (CVEs)
  3. Test Thoroughly: Run all tests before updating
  4. Document Changes: Explain why each update is necessary

Ansible Collections

Currently unpinned in requirements.yml, but key ones include:

  • community.general
  • ansible.posix
  • openstack.cloud

Development Practices

Code Style and Linting

Python (ruff)

# pyproject.toml configuration
[tool.ruff]
target-version = "py311"
line-length = 120

[tool.ruff.lint]
select = ["E", "W", "F", "I", "B", "C4", "UP"]

YAML (yamllint)

  • Document start markers (---) required
  • No trailing spaces
  • Newline at end of file
  • Quote 'on': in GitHub workflows (truthy value)

Shell Scripts (shellcheck)

  • Quote all variables: "${var}"
  • Use set -euo pipefail for safety

PowerShell Scripts (PSScriptAnalyzer)

  • Use approved verbs (Get-, Set-, New-, etc.)
  • Avoid positional parameters in functions
  • Use proper error handling with try/catch
  • Note: Algo's PowerShell script is a WSL wrapper since Ansible doesn't run natively on Windows

Ansible (ansible-lint)

  • Many warnings are suppressed in .ansible-lint
  • Focus on errors, not warnings
  • Common suppressions: name[missing], risky-file-permissions

Documentation Style

  • Avoid excessive header nesting (prefer 2-3 levels maximum)
  • Don't overuse bold formatting in lists - use sparingly for emphasis only
  • Write flowing paragraphs instead of choppy bullet-heavy sections
  • Keep formatting clean and readable - prefer natural text over visual noise
  • Use numbered lists for procedures, simple bullets for feature lists
  • Example: "Navigate to Network → Interfaces" not "Navigate to NetworkInterfaces"

Git Workflow

  1. Create feature branches from master
  2. Make atomic commits with clear messages
  3. Run all linters before pushing
  4. Update PR description with test results
  5. Squash commits if requested

Testing Requirements

Before pushing any changes:

# Python tests
pytest tests/unit/ -v

# Ansible syntax
ansible-playbook main.yml --syntax-check
ansible-playbook users.yml --syntax-check

# Linters
ansible-lint
yamllint .
ruff check .
shellcheck *.sh

# PowerShell (if available)
pwsh -Command "Invoke-ScriptAnalyzer -Path ./algo.ps1"

Writing Effective Tests - Mutation Testing Approach

When writing tests, always verify that your test actually detects the failure case. This is a form of lightweight mutation testing that ensures tests add real value:

  1. Write the test for the bug/issue you're preventing
  2. Temporarily introduce the bug to verify the test fails
  3. Fix the bug and verify the test passes
  4. Document what specific issue the test prevents

Example from our codebase:

def test_regression_openssl_inline_comments():
    """Tests that we detect inline comments in Jinja2 expressions."""
    # This pattern SHOULD fail (has inline comments)
    problematic = "{{ ['DNS:' + id,  # comment ] }}"
    assert not validate(problematic), "Should detect inline comments"
    
    # This pattern SHOULD pass (no inline comments)  
    fixed = "{{ ['DNS:' + id] }}"
    assert validate(fixed), "Should pass without comments"

This practice ensures:

  • Tests aren't just checking happy paths
  • Tests will actually catch regressions
  • The test's purpose is clear to future maintainers
  • We avoid false confidence from tests that always pass

Common Issues and Solutions

1. Ansible-lint "name[missing]" Warnings

  • Added to skip_list in .ansible-lint
  • Too many tasks to fix immediately (113+)
  • Focus on new code having proper names

2. DNS Architecture and Common Issues

Understanding local_service_ip

  • Algo uses a randomly generated IP in the 172.16.0.0/12 range on the loopback interface
  • This IP (local_service_ip) is where dnscrypt-proxy should listen
  • Requires net.ipv4.conf.all.route_localnet=1 sysctl for VPN clients to reach loopback IPs
  • This is by design for consistency across VPN types (WireGuard + IPsec)

dnscrypt-proxy Service Failures

Problem: "Unit dnscrypt-proxy.socket is masked" or service won't start

  • The service has Requires=dnscrypt-proxy.socket dependency
  • Masking the socket prevents the service from starting
  • Solution: Configure socket properly instead of fighting it

DNS Not Accessible to VPN Clients

Symptoms: VPN connects but no internet/DNS access

  1. First check what's listening: sudo ss -ulnp | grep :53
    • Should show local_service_ip:53 (e.g., 172.24.117.23:53)
    • If showing only 127.0.2.1:53, socket override didn't apply
  2. Check socket status: systemctl status dnscrypt-proxy.socket
    • Look for "configuration has changed while running" - needs restart
  3. Verify route_localnet: sysctl net.ipv4.conf.all.route_localnet
    • Must be 1 for VPN clients to reach loopback IPs
  4. Check firewall: Ensure allows VPN subnets: -A INPUT -s {{ subnets }} -d {{ local_service_ip }}
    • Never allow DNS from all sources (0.0.0.0/0) - security risk!

3. Multi-homed Systems and NAT

DigitalOcean and other providers with multiple IPs:

  • Servers may have both public and private IPs on same interface
  • MASQUERADE needs output interface: -o {{ ansible_default_ipv4['interface'] }}
  • Don't overengineer with SNAT - MASQUERADE with interface works fine
  • Use alternative_ingress_ip option only when truly needed

4. iptables Backend Changes (nft vs legacy)

Critical: Switching between iptables-nft and iptables-legacy can break subtle behaviors

  • Ubuntu 22.04+ defaults to iptables-nft which may have implicit NAT behaviors
  • Algo forces iptables-legacy for consistent rule ordering
  • This switch can break DNS routing that "just worked" before
  • Always test thoroughly after backend changes

5. systemd Socket Activation Gotchas

  • Interface-specific sysctls (e.g., net.ipv4.conf.wg0.route_localnet) fail if interface doesn't exist yet
  • WireGuard interface only created when service starts
  • Use global sysctls or apply settings after service start
  • Socket configuration changes require explicit restart (not just reload)

6. Jinja2 Template Complexity

  • Many templates use Ansible-specific filters
  • Test templates with tests/unit/test_template_rendering.py
  • Mock Ansible filters when testing

7. OpenSSL Version Compatibility

# Check version and use appropriate flags
{{ (openssl_version is version('3', '>=')) | ternary('-legacy', '') }}

8. IPv6 Endpoint Formatting

  • WireGuard configs must bracket IPv6 addresses
  • Template logic: {% if ':' in IP %}[{{ IP }}]:{{ port }}{% else %}{{ IP }}:{{ port }}{% endif %}

Security Considerations

Always Priority One

  • Never expose secrets: No passwords/keys in commits
  • CVE Response: Update immediately when security issues found
  • Least Privilege: Minimal permissions, dropped capabilities
  • Secure Defaults: Strong crypto, no logging, firewall rules

Certificate Management

  • Elliptic curve cryptography (secp384r1)
  • Proper CA password handling
  • Certificate revocation support
  • Secure storage in /etc/ipsec.d/

Network Security

  • Strict firewall rules (iptables/ip6tables)
  • No IP forwarding except for VPN
  • DNS leak protection
  • Kill switch implementation

Platform Support

Operating Systems

  • Primary: Ubuntu 20.04/22.04 LTS
  • Secondary: Debian 11/12
  • Clients: Windows, macOS, iOS, Android, Linux

Cloud Providers

Each has specific requirements:

  • AWS: Requires boto3, specific AMI IDs
  • Azure: Complex networking setup
  • DigitalOcean: Simple API, good for testing (watch for multiple IPs on eth0)
  • Local: KVM/Docker for development

Testing Note: DigitalOcean droplets often have both public and private IPs on the same interface, making them excellent test cases for multi-IP scenarios and NAT issues.

Architecture Considerations

  • Support both x86_64 and ARM64
  • Some providers have limited ARM support
  • Performance varies by instance type

CI/CD Pipeline

GitHub Actions Workflows

  1. lint.yml: Runs ansible-lint on all pushes
  2. main.yml: Tests cloud provider configurations
  3. smart-tests.yml: Selective test running based on changes
  4. integration-tests.yml: Full deployment tests (currently disabled)

Test Categories

  • Unit Tests: Python-based, test logic and templates
  • Syntax Checks: Ansible playbook validation
  • Linting: Code quality enforcement
  • Integration: Full deployment testing (needs work)

Maintenance Guidelines

Dependency Updates

  1. Check for security vulnerabilities monthly
  2. Update conservatively (minor versions)
  3. Test on multiple platforms
  4. Document in PR why updates are needed

Issue Triage

  • Security issues: Priority 1
  • Broken functionality: Priority 2
  • Feature requests: Priority 3
  • Check issues for duplicates

Pull Request Standards

  • Clear description of changes
  • Test results included
  • Linter compliance
  • Conservative approach

Time Wasters to Avoid (Lessons Learned)

Don't spend time on these unless absolutely necessary:

  1. Converting MASQUERADE to SNAT - MASQUERADE works fine for Algo's use case
  2. Fighting systemd socket activation - Configure it properly instead of trying to disable it
  3. Debugging NAT before checking DNS - Most "routing" issues are DNS issues
  4. Complex IPsec policy matching - Keep NAT rules simple, avoid -m policy --pol none
  5. Testing on existing servers - Always test on fresh deployments
  6. Interface-specific route_localnet - WireGuard interface doesn't exist until service starts
  7. DNAT for loopback addresses - Packets to local IPs don't traverse PREROUTING
  8. Removing BPF JIT hardening - It's optional and causes errors on many kernels

Working with Algo

Local Development Setup

# Install dependencies
uv sync
uv run ansible-galaxy install -r requirements.yml

# Run local deployment
ansible-playbook main.yml -e "provider=local"

Common Tasks

Adding a New User

ansible-playbook users.yml -e "server=SERVER_NAME"

Updating Dependencies

  1. Create a new branch
  2. Update pyproject.toml conservatively
  3. Run uv lock to update lockfile
  4. Run all tests
  5. Document security fixes

Debugging Deployment Issues

  1. Check ansible-playbook -vvv output
  2. Verify cloud provider credentials
  3. Check firewall rules
  4. Review generated configs in configs/

Troubleshooting VPN Connectivity

Debugging Methodology

When VPN connects but traffic doesn't work, follow this exact order (learned from painful experience):

  1. Check DNS listening addresses first

    ss -lnup | grep :53
    # Should show local_service_ip:53 (e.g., 172.24.117.23:53)
    # If showing 127.0.2.1:53, socket override didn't apply
    
  2. Check both socket AND service status

    systemctl status dnscrypt-proxy.socket dnscrypt-proxy.service
    # Look for "configuration has changed while running" warnings
    
  3. Verify route_localnet is enabled

    sysctl net.ipv4.conf.all.route_localnet
    # Must be 1 for VPN clients to reach loopback IPs
    
  4. Test DNS resolution from server

    dig @172.24.117.23 google.com  # Use actual local_service_ip
    # Should return results if DNS is working
    
  5. Check firewall counters

    iptables -L INPUT -v -n | grep -E '172.24|10.49|10.48'
    # Look for increasing packet counts
    
  6. Verify NAT is happening

    iptables -t nat -L POSTROUTING -v -n
    # Check for MASQUERADE rules with packet counts
    

Key insight: 90% of "routing" issues are actually DNS issues. Always check DNS first!

systemd and dnscrypt-proxy (Critical for Ubuntu/Debian)

Background: Ubuntu's dnscrypt-proxy package uses systemd socket activation which completely overrides the listen_addresses setting in the config file.

How it works:

  1. Default socket listens on 127.0.2.1:53 (hardcoded in package)
  2. Socket activation means systemd opens the port, not dnscrypt-proxy
  3. Config file listen_addresses is ignored when socket activation is used
  4. Must configure the socket, not just the service

Correct approach:

# Create socket override at /etc/systemd/system/dnscrypt-proxy.socket.d/10-algo-override.conf
[Socket]
ListenStream=              # Clear ALL defaults first
ListenDatagram=            # Clear UDP defaults too
ListenStream=172.x.x.x:53  # Add TCP on VPN IP
ListenDatagram=172.x.x.x:53 # Add UDP on VPN IP

Config requirements:

  • Use empty listen_addresses = [] in dnscrypt-proxy.toml for socket activation
  • Socket must be restarted (not just reloaded) after config changes
  • Check with: systemctl status dnscrypt-proxy.socket for warnings
  • Verify with: ss -lnup | grep :53 to see actual listening addresses

Common mistakes:

  • Trying to disable/mask the socket (breaks service with Requires= dependency)
  • Only setting ListenStream (need ListenDatagram for UDP)
  • Forgetting to clear defaults first (results in listening on both IPs)
  • Not restarting socket after configuration changes

Architectural Decisions and Trade-offs

DNS Service IP Design

Algo uses a randomly generated IP in the 172.16.0.0/12 range on the loopback interface for DNS (local_service_ip). This design has trade-offs:

Why it's done this way:

  • Provides a consistent DNS IP across both WireGuard and IPsec
  • Avoids binding to VPN gateway IPs which differ between protocols
  • Survives interface changes and restarts
  • Works the same way across all cloud providers

The cost:

  • Requires route_localnet=1 sysctl (minor security consideration)
  • Adds complexity with systemd socket activation
  • Can be confusing to debug

Alternatives considered but rejected:

  • Binding to VPN gateway IPs directly (breaks unified configuration)
  • Using dummy interface instead of loopback (non-standard, more complex)
  • DNAT redirects (doesn't work with loopback destinations)

iptables Backend Choice

Algo forces iptables-legacy instead of iptables-nft on Ubuntu 22.04+ because:

  • nft reorders rules unpredictably, breaking VPN traffic
  • Legacy backend provides consistent, predictable behavior
  • Trade-off: Lost some implicit NAT behaviors that nft provided

Important Context for LLMs

What Makes Algo Special

  • Simplicity: One command to deploy
  • Security: Hardened by default
  • No Bloat: Minimal dependencies
  • Privacy: No telemetry or logging

User Expectations

  • It should "just work"
  • Security is non-negotiable
  • Backwards compatibility matters
  • Clear error messages

Common User Profiles

  1. Privacy Advocates: Want secure communications
  2. Travelers: Need reliable VPN access
  3. Small Teams: Shared VPN for remote work
  4. Developers: Testing and development

Maintenance Philosophy

  • Stability over features
  • Security over convenience
  • Clarity over cleverness
  • Test everything

Final Notes

When working on Algo:

  1. Think Security First: Every change should maintain or improve security
  2. Test Thoroughly: Multiple platforms, both VPN types
  3. Document Clearly: Users may not be technical
  4. Be Conservative: This is critical infrastructure
  5. Respect Privacy: No tracking, minimal logging

Remember: People trust Algo with their privacy and security. Every line of code matters.