wallenstein/algo

mirror of https://github.com/trailofbits/algo.git synced 2025-09-07 12:33:02 +02:00

Author	SHA1	Message	Date
Dan Guido	f668af22d0	Fix VPN routing on multi-homed systems by specifying output interface (#14826)

* Fix VPN routing by adding output interface to NAT rules

The NAT rules were missing the output interface specification (-o eth0), which caused routing failures on multi-homed systems (servers with multiple network interfaces). Without specifying the output interface, packets might not be NAT'd correctly.

Changes:
- Added -o {{ ansible_default_ipv4['interface'] }} to all NAT rules
- Updated both IPv4 and IPv6 templates
- Updated tests to verify output interface is present
- Added ansible_default_ipv4/ipv6 to test fixtures

This fixes the issue where VPN clients could connect but not route traffic to the internet on servers with multiple network interfaces (like DigitalOcean droplets with private networking enabled).

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix VPN routing by adding output interface to NAT rules

On multi-homed systems (servers with multiple network interfaces or multiple IPs on one interface), MASQUERADE rules need to specify which interface to use for NAT. Without the output interface specification, packets may not be routed correctly.

This fix adds the output interface to all NAT rules:

    -A POSTROUTING -s [vpn_subnet] -o eth0 -j MASQUERADE

Changes:
- Modified roles/common/templates/rules.v4.j2 to include output interface
- Modified roles/common/templates/rules.v6.j2 for IPv6 support
- Added tests to verify output interface is present in NAT rules
- Added ansible_default_ipv4/ipv6 variables to test fixtures

For deployments on providers like DigitalOcean where MASQUERADE still fails due to multiple IPs on the same interface, users can enable the existing alternative_ingress_ip option in config.cfg to use explicit SNAT.
Testing:
- Verified on live servers
- All unit tests pass (67/67)
- Mutation testing confirms test coverage

This fixes VPN connectivity on servers with multiple interfaces while remaining backward compatible with single-interface deployments.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy not listening on VPN service IPs

Problem: dnscrypt-proxy on Ubuntu uses systemd socket activation by default, which overrides the configured listen_addresses in dnscrypt-proxy.toml. The socket only listens on 127.0.2.1:53, preventing VPN clients from resolving DNS queries through the configured service IPs.

Solution: Disable and mask the dnscrypt-proxy.socket unit to allow dnscrypt-proxy to bind directly to the VPN service IPs specified in its configuration file.

This fixes DNS resolution for VPN clients on Ubuntu 20.04+ systems.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Apply Python linting and formatting

- Run ruff check --fix to fix linting issues
- Run ruff format to ensure consistent formatting
- All tests still pass after formatting changes

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Restrict DNS access to VPN clients only

Security fix: The firewall rule for DNS was accepting traffic from any source (0.0.0.0/0) to the local DNS resolver. While the service IP is on the loopback interface (which normally isn't routable externally), this could be a security risk if misconfigured.

Changed firewall rules to only accept DNS traffic from VPN subnets:
- INPUT rule now includes -s {{ subnets }} to restrict source IPs
- Applied to both IPv4 and IPv6 rules
- Added test to verify DNS is properly restricted

This ensures the DNS resolver is only accessible to connected VPN clients, not the entire internet.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy service startup with masked socket

Problem: dnscrypt-proxy.service has a dependency on dnscrypt-proxy.socket through the TriggeredBy directive. When we mask the socket before starting the service, systemd fails with "Unit dnscrypt-proxy.socket is masked."

Solution:
1. Override the service to remove socket dependency (TriggeredBy=)
2. Reload systemd daemon immediately after override changes
3. Start the service (which now doesn't require the socket)
4. Only then disable and mask the socket

This ensures dnscrypt-proxy can bind directly to the configured IPs without socket activation, while preventing the socket from being re-enabled by package updates.

Changes:
- Added TriggeredBy= override to remove socket dependency
- Added explicit daemon reload after service overrides
- Moved socket masking to after service start in main.yml
- Fixed YAML formatting issues

Testing: Deployment now succeeds with dnscrypt-proxy binding to VPN IPs

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy by not masking the socket

Problem: Masking dnscrypt-proxy.socket prevents the service from starting because the service has Requires=dnscrypt-proxy.socket dependency.

Solution: Simply stop and disable the socket without masking it. This prevents socket activation while allowing the service to start and bind directly to the configured IPs.
Changes:
- Removed socket masking (just disable it)
- Moved socket disabling before service start
- Removed invalid systemd directives from override

Testing: Confirmed dnscrypt-proxy now listens on VPN service IPs

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Use systemd socket activation properly for dnscrypt-proxy

Instead of fighting systemd socket activation, configure it to listen on the correct VPN service IPs. This is more systemd-native and reliable.

Changes:
- Create socket override to listen on VPN IPs instead of localhost
- Clear default listeners and add VPN service IPs
- Use empty listen_addresses in dnscrypt-proxy.toml for socket activation
- Keep socket enabled and let systemd manage the activation
- Add handler for restarting socket when config changes

Benefits:
- Works WITH systemd instead of against it
- Survives package updates better
- No dependency conflicts
- More reliable service management

This approach is cleaner than disabling socket activation entirely and ensures dnscrypt-proxy is accessible to VPN clients on the correct IPs.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Document debugging lessons learned in CLAUDE.md

Added comprehensive debugging guidance based on our troubleshooting session:
- VPN connectivity troubleshooting order (DNS first!)
- systemd socket activation best practices
- Common deployment failures and solutions
- Time wasters to avoid (lessons learned the hard way)
- Multi-homed system considerations
- Testing notes for DigitalOcean

These additions will help future debugging sessions avoid the same rabbit holes and focus on the most likely issues first.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix DNS resolution for VPN clients by enabling route_localnet

The issue was that dnscrypt-proxy listens on a special loopback IP (randomly generated in the 172.16.0.0/12 range) which wasn't accessible from VPN clients. This fix:
1. Enables net.ipv4.conf.all.route_localnet sysctl to allow routing to loopback IPs from other interfaces
2. Ensures dnscrypt-proxy socket is properly restarted when its configuration changes
3. Adds proper handler flushing after socket configuration updates

This allows VPN clients to reach the DNS resolver at the local_service_ip address configured on the loopback interface.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Improve security by using interface-specific route_localnet

Instead of enabling route_localnet globally (net.ipv4.conf.all.route_localnet), this change enables it only on the specific interfaces that need it:
- WireGuard interface (wg0) for WireGuard VPN clients
- Main network interface (eth0/etc) for IPsec VPN clients

This minimizes the security impact by restricting loopback routing to only the VPN interfaces, preventing other interfaces from being able to route to loopback addresses.

The interface-specific approach provides the same functionality (allowing VPN clients to reach the DNS resolver on the local_service_ip) while reducing the potential attack surface.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Revert to global route_localnet to fix deployment failure

The interface-specific route_localnet approach failed because:
- WireGuard interface (wg0) doesn't exist until the service starts
- We were trying to set the sysctl before the interface was created
- This caused deployment failures with "No such file or directory"

Reverting to the global setting (net.ipv4.conf.all.route_localnet=1) because:
- It always works regardless of interface creation timing
- VPN users are trusted (they have our credentials)
- Firewall rules still restrict access to only port 53
- The security benefit of interface-specific settings is minimal
- The added complexity isn't worth the marginal security improvement

This ensures reliable deployments while maintaining the DNS resolution fix.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix dnscrypt-proxy socket restart and remove problematic BPF hardening

Two important fixes:

1. Fix dnscrypt-proxy socket not restarting with new configuration
   - The socket wasn't properly restarting when its override config changed
   - This caused DNS to listen on the wrong IP (127.0.2.1 instead of local_service_ip)
   - Now directly restart the socket when configuration changes
   - Add explicit daemon reload before restarting
2. Remove BPF JIT hardening that causes deployment errors
   - The net.core.bpf_jit_enable sysctl isn't available on all kernels
   - It was causing "Invalid argument" errors during deployment
   - This was optional security hardening with minimal benefit
   - Removing it eliminates deployment errors for most users

These fixes ensure reliable DNS resolution for VPN clients and clean deployments without error messages.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Update CLAUDE.md with comprehensive debugging lessons learned

Based on our extensive debugging session, this update adds critical documentation:

## DNS Architecture and Troubleshooting
- Explained the local_service_ip design and why it requires route_localnet
- Added detailed DNS debugging methodology with exact steps in order
- Documented systemd socket activation complexities and common mistakes
- Added specific commands to verify DNS is working correctly

## Architectural Decisions
- Added new section explaining trade-offs in Algo's design choices
- Documented why local_service_ip uses loopback instead of alternatives
- Explained iptables-legacy vs iptables-nft backend choice

## Enhanced Debugging Guidance
- Expanded troubleshooting with exact commands and expected outputs
- Added warnings about configuration changes that need restarts
- Documented socket activation override requirements in detail
- Added common pitfalls like interface-specific sysctls

## Time Wasters Section
- Added new lessons learned from this debugging session
  - Interface-specific route_localnet (fails before interface exists)
  - DNAT for loopback addresses (doesn't work)
  - BPF JIT hardening (causes errors on many kernels)

This documentation will help future maintainers avoid the same debugging rabbit holes and understand why things are designed the way they are.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>	2025-08-17 22:12:23 -04:00
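The NAT change in this commit amounts to pinning every MASQUERADE rule to the egress interface. The following is a minimal Python sketch, not Algo's template code: `nat_rules` and `has_output_interface` are hypothetical helpers, the interface name stands in for `ansible_default_ipv4['interface']`, and `10.49.0.0/16` is only an example subnet. It shows the rule shape quoted above and the kind of "is `-o` present?" check the commit's tests describe.

```python
import ipaddress


def nat_rules(subnets, out_iface):
    """Render one POSTROUTING MASQUERADE rule per VPN subnet.

    Hypothetical helper mirroring the rule shape
    "-A POSTROUTING -s [vpn_subnet] -o eth0 -j MASQUERADE".
    IPv6 subnets would go into the separate v6 rules file in practice.
    """
    rules = []
    for subnet in subnets:
        # Normalise host bits away, e.g. 10.49.0.1/16 -> 10.49.0.0/16.
        net = ipaddress.ip_network(subnet, strict=False)
        rules.append(f"-A POSTROUTING -s {net} -o {out_iface} -j MASQUERADE")
    return rules


def has_output_interface(rule, out_iface):
    """Return True if the rule pins egress to out_iface via '-o'."""
    tokens = rule.split()
    if "-o" not in tokens:
        return False
    idx = tokens.index("-o")
    return idx + 1 < len(tokens) and tokens[idx + 1] == out_iface


if __name__ == "__main__":
    for rule in nat_rules(["10.49.0.0/16"], "eth0"):
        print(rule, has_output_interface(rule, "eth0"))
```

Checking the token after `-o`, rather than just searching for the substring, keeps a rule that lacks the flag from passing by accident. That is the same "does the test actually fail on the old rule?" concern the mutation-testing note raises.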
Dan Guido	172fc348ef	Add test to detect inline comments in Jinja2 expressions within YAML files (#14817)

* Add test to detect inline comments in Jinja2 expressions within YAML files

This test would have caught the bug reported where inline comments (#) within Jinja2 expressions in YAML task files caused Ansible template errors.

The test:
- Extracts and validates all Jinja2 expressions from YAML files
- Specifically detects inline comments within {{ }} and {% %} blocks
- Includes regression test for the exact reported bug pattern
- Avoids false positives (# in strings, escaped #, comments outside expressions)
- Focuses on the critical inline comment issue

The original bug was in roles/strongswan/tasks/openssl.yml where comments like "# Per-deployment UUID..." were placed inside a Jinja2 expression, causing "unexpected char '#'" errors during playbook execution.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Refactor test to use pytest framework and add comprehensive edge cases

- Converted standalone script to proper pytest test functions
- Replaced main() with individual test functions using pytest assertions
- Added comprehensive edge case tests for inline comment detection:
  * Hash symbols in strings (should pass)
  * Escaped hashes (should pass)
  * Comments in control blocks (should fail)
  * Multi-line expressions with comments (should fail)
  * URL fragments and hex colors (should pass)
- Test functions now properly integrate with pytest:
  * test_regression_openssl_inline_comments() - regression test
  * test_edge_cases_inline_comments() - comprehensive edge cases
  * test_yaml_files_no_inline_comments() - scan all YAML files
  * test_openssl_file_specifically() - test the originally buggy file

This addresses the review feedback about pytest integration and adds the suggested test cases for better coverage.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix linter issues in test_yaml_jinja2_expressions.py

- Fixed trailing whitespace issues (W293)
- Applied ruff formatting for consistent code style
- All tests still pass after formatting changes

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Add mutation testing guidance to CLAUDE.md

Added a section on writing effective tests that emphasizes the importance of verifying that tests actually detect failure cases. This lightweight mutation testing approach ensures:
- Tests catch the specific bugs they're designed to prevent
- We avoid false confidence from tests that always pass
- Test purposes are clear and documented
- Both success and failure cases are validated

The guidance includes a concrete example from our recent inline comment detection test, showing how to verify both the problematic pattern (should fail) and the fixed pattern (should pass).

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>	2025-08-07 11:12:23 -07:00
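The detection idea in this commit can be sketched as follows: pull out each `{{ }}` / `{% %}` block and flag a `#` that sits outside a quoted string and is not backslash-escaped. This is a minimal illustration built on a simple regex scan, not the project's actual test code. `find_inline_comments` is a hypothetical name, and corner cases such as a literal `}}` inside a string are deliberately ignored.

```python
import re

# {{ ... }} expressions and {% ... %} statements; DOTALL lets a block span lines.
# Jinja2's own {# ... #} comment syntax is intentionally not matched.
JINJA_BLOCK = re.compile(r"\{\{(.*?)\}\}|\{%(.*?)%\}", re.DOTALL)


def _has_bare_hash(body):
    """True if '#' appears outside a quoted string and is not backslash-escaped."""
    quote = None  # quote character of the string literal we are inside, if any
    prev = ""
    for ch in body:
        if quote:
            if ch == quote and prev != "\\":
                quote = None  # string literal closed
        elif ch in ("'", '"'):
            quote = ch  # string literal opened
        elif ch == "#" and prev != "\\":
            return True  # bare '#': an inline comment inside the expression
        prev = ch
    return False


def find_inline_comments(text):
    """Return every Jinja2 block in text that contains an inline '#' comment."""
    offenders = []
    for match in JINJA_BLOCK.finditer(text):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        if _has_bare_hash(body):
            offenders.append(match.group(0))
    return offenders
```

Following the mutation-testing advice above, a test built on this sketch should assert both directions. A block such as `{{ lookup('x')  # note }}` must be reported, while `{{ '#ff0000' }}`, a URL fragment inside quotes, and a YAML comment after the closing `}}` must not be.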
Dan Guido	2ab57c3f6a	Implement self-bootstrapping uv setup to resolve issue #14776 (#14814)

* Implement self-bootstrapping uv setup to resolve issue #14776

This major simplification addresses the Python setup complexity that has been a barrier for non-developer users deploying Algo VPN.

## Revolutionary User Experience Change

Before (complex):
```bash
python3 -m virtualenv --python="$(command -v python3)" .env &&
  source .env/bin/activate &&
  python3 -m pip install -U pip virtualenv &&
  python3 -m pip install -r requirements.txt
./algo
```

After (simple):
```bash
./algo
```

## Key Technical Changes

### Core Implementation
- algo script: Complete rewrite with automatic uv installation
  - Detects missing uv and installs automatically via curl
  - Cross-platform support (macOS, Linux, Windows)
  - Preserves exact same command interface
  - Uses `uv run ansible-playbook` instead of virtualenv activation

### Documentation Overhaul
- README.md: Reduced installation from 4 complex steps to 1 command
- Platform docs: Simplified macOS, Windows, Linux, Cloud Shell guides
- Removed Python installation complexity from all user-facing docs

### CI/CD Infrastructure Updates
- 5 GitHub Actions workflows converted from pip to uv
- Docker builds updated to use uv instead of virtualenv
- Legacy test scripts (3 files) updated for uv compatibility

### Repository Cleanup
- install.sh: Updated for cloud-init/bootstrap scenarios
- algo-showenv.sh: Updated environment detection for uv
- pyproject.toml: Added all dependencies with proper versioning
- test scripts: Removed .env references, updated paths

## Benefits Achieved

✅ Zero-step dependency installation - uv installs automatically on first run
✅ Cross-platform consistency - identical experience on all operating systems
✅ Automatic Python version management - uv handles Python 3.11+ requirement
✅ Familiar interface preserved - existing `./algo` and `./algo update-users` unchanged
✅ No breaking changes - existing users see same commands, same functionality
✅ Resolves macOS Python compatibility - works with system Python 3.9 via uv's Python management

## Files Changed (18 total)

Core Scripts (3):
- algo (complete rewrite with self-bootstrapping)
- algo-showenv.sh (uv environment detection)
- install.sh (cloud-init script updated)

Documentation (4):
- README.md (revolutionary simplification)
- docs/deploy-from-macos.md (removed Python complexity)
- docs/deploy-from-windows.md (simplified WSL setup)
- docs/deploy-from-cloudshell.md (updated for uv)

CI/CD (5):
- .github/workflows/main.yml (pip → uv conversion)
- .github/workflows/smart-tests.yml (pip → uv conversion)
- .github/workflows/lint.yml (pip → uv conversion)
- .github/workflows/integration-tests.yml (pip → uv + Docker fix)
- Dockerfile (virtualenv → uv conversion)

Tests (4):
- tests/legacy-lxd/local-deploy.sh (virtualenv → uv in Docker)
- tests/legacy-lxd/update-users.sh (virtualenv → uv in Docker)
- tests/legacy-lxd/ca-password-fix.sh (virtualenv → uv in Docker)
- tests/unit/test_template_rendering.py (removed .env path reference)

Dependencies (2):
- pyproject.toml (added full dependency specification)
- uv.lock (new uv lockfile for reproducible builds)

This implementation makes Algo VPN accessible to non-technical users while maintaining all power and flexibility for advanced users.
Closes #14776

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix CI/CD workflow inconsistencies and resolve Claude's code review issues

- Fix inconsistent dependency management across all CI workflows
  - Replace 'uv add' with 'uv sync' for reproducible builds
  - Use 'uv run --with' for temporary tool installations
  - Standardize on locked dependencies from pyproject.toml
- Fix ineffective linting by removing '|| true' from ruff check in lint.yml
  - Ensures linting errors actually fail the build
  - Maintains consistency with other linter configurations
- Update yamllint configuration to exclude .venv/ directory
  - Prevents scanning Python package templates with Ansible-specific filters
  - Fixes trailing spaces in workflow files
- Improve shell script quality by fixing shellcheck warnings
  - Quote $(pwd) expansions in Docker test scripts
  - Address critical word-splitting vulnerabilities
- Update test infrastructure for uv compatibility
  - Exclude .env/.venv directories from template scanning
  - Ensure local tests exactly match CI workflow commands

All linters and tests now pass locally and match CI requirements exactly.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Remove test configuration file

* Remove obsolete venvs directory and update .gitignore for uv

- Remove venvs/ directory which was only used as a placeholder for virtualenv
- Update .gitignore to use explicit .env/ and .venv/ patterns instead of env
- Modernize ignore patterns for uv-based dependency management

🤖 Generated with [Claude Code](https://claude.ai/code)

Implement secure uv installation addressing Claude's security concerns

Security improvements:
- Package managers first: Try brew, apt, dnf, pacman, zypper, winget, scoop
- User consent required: Clear security warning before script download
- Manual installation guidance: Provide fallback instructions with checksums
- Versioned installers: Use uv 0.8.5 specific URLs for consistency across CI/local

Benefits:
- ✅ Most users get uv via secure package managers (no download needed)
- ✅ Clear security disclosure for script downloads with opt-out
- ✅ Transparent about security tradeoffs vs usability
- ✅ Maintains "just works" experience while respecting security concerns
- ✅ CI and local installations now use identical versioned scripts

This addresses the unverified download security vulnerability while preserving the user experience improvements from the self-bootstrapping approach.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Major improvements: modernize Python tooling, fix CI, enhance security

This commit implements comprehensive improvements across multiple areas:

## 🚀 Python Tooling Modernization
- Eliminate requirements.txt: Move to pyproject.toml as single source of truth
- Add pytest integration: Replace individual test file execution with pytest discovery
- Add dev dependencies: Include pytest and pytest-xdist for parallel testing
- Update documentation: Modernize CLAUDE.md with uv-based workflows

## 🔒 Security Enhancements (zizmor fixes)
- Fix credential persistence: Add persist-credentials: false to checkout steps
- Fix template injection: Move GitHub context variables to environment variables
- Pin action versions: Use commit hash for astral-sh/setup-uv@v6 (1ddb97e5078301c0bec13b38151f8664ed04edc8)

## ⚡ CI/CD Optimization
- Create composite action: Centralize uv setup (.github/actions/setup-uv)
- Eliminate workflow duplication: Replace 13 duplicate uv setup blocks with reusable action
- Fix path filters: Update smart-tests.yml to watch pyproject.toml instead of requirements.txt
- Remove pip caching: Clean up obsolete cache: 'pip' configurations
- Standardize test execution: Use pytest across all workflows

## 🐳 Docker Improvements
- Secure uv installation: Use official distroless image instead of curl
- Remove requirements.txt: Update COPY directive for new dependency structure

## 📈 Impact Summary
- Security: Resolved 12/14 zizmor issues (86% improvement)
- Maintainability: 92% reduction in workflow duplication
- Performance: Better caching and parallel test execution
- Standards: Aligned with 2025 Python packaging best practices

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Complete backward compatibility cleanup and Windows improvements

- Fix main.yml requirements.txt lookup with pyproject.toml parsing
- Update test_docker_localhost_deployment.py to check pyproject.toml
- Fix Vagrantfile pip args with hard-coded dependency versions
- Enhance Windows OS detection for WSL, Git Bash, and MINGW variants
- Implement versioned Windows PowerShell installer (0.8.5)
- Update documentation references in troubleshooting.md and tests/README.md

All linters and tests pass: ruff ✅ yamllint ✅ pytest 48/48 ✅ ansible syntax ✅

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix Python version requirement consistency

Update test to require Python 3.11+ to match pyproject.toml requires-python setting. Previously test accepted 3.10+ while pyproject.toml required 3.11+.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix pyproject.toml version parsing to not require community.general collection

Replace community.general.toml lookup with regex_search on file lookup. This fixes "lookup plugin (community.general.toml) not found" error on macOS where the collection may not be available during early bootstrap.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix ansible version detection for uv-managed environments

Replace pip_package_info lookup with uv pip list command to detect ansible version. This fixes "'dict object' has no attribute 'ansible'" error on macOS where ansible is installed via uv instead of system pip.

The fix extracts the ansible package version (e.g. 11.8.0) from uv pip list output instead of trying to access non-existent pip package registry.
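The two version lookups described above can be sketched in Python: a regex over pyproject.toml in place of the community.general.toml lookup, and parsing of `uv pip list` output in place of pip_package_info. This is an illustration only, not the playbook's Ansible code. The helper names are hypothetical, the sample inputs are made up, and the `uv pip list` layout (a header, a dashed separator, then one "name version" row per package) is an assumption.

```python
import re

# The leading quote keeps a pin such as "ansible-core==..." from matching.
PIN = re.compile(r'"ansible==([0-9][0-9.]*)"')


def required_ansible_version(pyproject_text):
    """Hypothetical: read the pinned version from an entry like "ansible==11.8.0"."""
    match = PIN.search(pyproject_text)
    return match.group(1) if match else None


def installed_ansible_version(pip_list_output):
    """Hypothetical: return the version of the row named exactly 'ansible'.

    Assumes `uv pip list`-style output; rows such as 'ansible-core'
    are skipped because the first column must equal 'ansible'.
    """
    for line in pip_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ansible":
            return fields[1]
    return None


if __name__ == "__main__":
    # Made-up sample inputs for illustration only.
    pyproject = 'dependencies = [\n    "ansible==11.8.0",\n]\n'
    listing = "Package Version\n------- -------\nansible 11.8.0\n"
    print(required_ansible_version(pyproject) == installed_ansible_version(listing))
```

Neither helper needs an extra collection or pip's package registry, which is the property the two fixes above were after.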
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Add Ubuntu-specific uv installation alternatives

Enhance the algo bootstrapping script with Ubuntu-specific trusted installation methods when system package managers don't provide uv:
- pipx option (official PyPI, ~9 packages vs 58 for python3-pip)
- snap option (community-maintained by a Canonical employee)
- Links to source repo for transparency (github.com/lengau/uv-snap)
- Interactive menu with clear explanations
- Robust error handling with fallbacks

Addresses common Ubuntu 24.04+ deployment scenario where uv is not available via apt, providing secure alternatives to script downloads.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix shellcheck warning in Ubuntu uv installation menu

Add -r flag to read command to prevent backslash mangling as required by shellcheck SC2162. This ensures proper handling of user input in the interactive installation method selection.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Major packaging improvements for AlgoVPN 2.0 beta

Remove outdated development files and modernize packaging:
- Remove PERFORMANCE.md (optimizations are now defaults)
- Remove Makefile (limited Docker-only utility)
- Remove Vagrantfile (over-engineered for edge case)

Modernize Docker support:
- Fix .dockerignore: 872MB -> 840KB build context (99.9% reduction)
- Update Dockerfile: Python 3.12, uv:latest, better security
- Add multi-arch support and health checks
- Simplified package dependencies

Improve dependency management:
- Pin Ansible collections to exact versions (prevent breakage)
- Update version to 2.0.0-beta for upcoming release
- Align with uv's exact dependency philosophy

This reduces maintenance burden while focusing on Algo's core cloud deployment use case.
Created GitHub issue #14816 for lazy cloud provider loading in future releases.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Update community health files for AlgoVPN 2.0

Remove outdated CHANGELOG.md:
- Contained severely outdated information (v1.2, Ubuntu 20.04, Makefile intro)
- Conflicted with current 2.0.0-beta version and recent changes
- 136 lines of misleading content requiring complete rewrite
- GitHub releases provide better, auto-generated changelogs

Modernize CONTRIBUTING.md:
- Update client support: macOS 12+, iOS 15+, Windows 11+, Ubuntu 22.04+
- Expand cloud provider list: Add Vultr, Hetzner, Linode, OpenStack, CloudStack
- Replace manual dependency setup with uv auto-installation
- Add modern development practices: exact dependency pinning, lint.sh usage
- Include development setup section with current workflow

Fix PULL_REQUEST_TEMPLATE.md:
- Fix broken checkboxes: `- []` → `- [ ]` (missing space)
- Add linter compliance requirement: `./scripts/lint.sh`
- Add dependency pinning check for exact versions
- Reorder checklist for logical flow

Community health files now accurately reflect AlgoVPN 2.0 capabilities and guide contributors toward modern best practices.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Complete legacy pip module elimination for uv migration

Fixes critical macOS installation failure due to PEP 668 externally-managed-environment restrictions.
Key changes:
- Add missing pyopenssl and segno dependencies to pyproject.toml
- Add optional cloud provider dependencies with exact versions
- Replace all cloud provider pip module tasks with uv-based installation
- Implement dynamic cloud provider dependency installation in cloud-pre.yml
- Modernize OpenStack dependency (openstacksdk replaces deprecated shade)

This completes the migration from legacy pip to modern uv dependency management, ensuring consistent behavior across all platforms and eliminating the root cause of macOS installation failures.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Update lockfile with cloud provider dependencies and correct version

Regenerates uv.lock to include all optional cloud provider dependencies and ensures version consistency between pyproject.toml and lockfile.

Added dependencies for all cloud providers:
- AWS: boto3, boto, botocore, s3transfer
- Azure: azure-identity, azure-mgmt-, msrestazure
- GCP: google-auth, requests
- Hetzner: hcloud
- Linode: linode-api4
- OpenStack: openstacksdk, keystoneauth1
- CloudStack: cs, sshpubkeys

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

Modernize and simplify README installation instructions

- Remove obsolete step 3 (dependency installation) since uv handles this automatically
- Streamline installation from 5 to 4 steps
- Make device section headers consistent (Apple, Android, Windows, Linux)
- Combine Linux WireGuard and IPsec sections for clarity
- Improve "please see this page" links with clear descriptions
- Move PKI preservation note to user management section where it's relevant
- Enhance adding/removing users section with better flow
- Add context to Other Devices section for manual configuration
- Fix grammar inconsistencies (setup → set up, missing commas)
- Update Ubuntu deployment docs to specify 22.04 LTS requirement
- Simplify road warrior setup instructions
- Remove outdated macOS WireGuard complexity notes

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Comprehensive documentation modernization and cleanup

- Remove all FreeBSD support (roles, documentation, references)
- Modernize troubleshooting guide by removing ~200 lines of obsolete content
- Rewrite OpenWrt router documentation with cleaner formatting
- Update Amazon EC2 documentation with current information
- Rewrite unsupported cloud provider documentation
- Remove obsolete linting documentation
- Update all version references to Ubuntu 22.04 LTS and Python 3.11+
- Add documentation style guidelines to CLAUDE.md
- Clean up compilation and legacy Python compatibility issues
- Update client documentation for current requirements

All documentation now reflects the uv-based modernization and current supported platforms, eliminating references to obsolete tooling and unsupported operating systems.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix linting and syntax errors caused by FreeBSD removal

- Restore missing newline in roles/dns/handlers/main.yml (broken during FreeBSD cleanup)
- Add FQCN for community.crypto modules in cloud-pre.yml
- Exclude playbooks/ directory from ansible-lint (these are task files, not standalone playbooks)

The FreeBSD removal accidentally removed a trailing newline causing YAML format errors. The playbook syntax errors were false positives - these files contain tasks for import_tasks/include_tasks, not standalone plays.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

* Fix CI test failure: use uv-managed ansible in test script

The test script was calling ansible-playbook directly instead of 'uv run ansible-playbook', which caused it to use the system-installed ansible that doesn't have access to the netaddr dependency required by the ansible.utils.ipmath filter.
This fixes the CI error: 'Failed to import the required Python library (netaddr)' 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Clean up test config warnings - Remove duplicate ipsec_enabled key (was defined twice) - Remove reserved variable name 'no_log' This eliminates YAML parsing warnings in the test script while maintaining the same test functionality. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Add native Windows support with PowerShell script - Create algo.ps1 for native Windows deployment - Auto-install uv via winget/scoop with download fallback - Support update-users command like Unix version - Add PowerShell linting to CI pipeline with PSScriptAnalyzer - Update documentation with Windows-specific instructions - Streamline deploy-from-windows.md with clearer options 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix PowerShell script for Windows Ansible limitations - Fix syntax issues: remove emoji chars, add winget acceptance flags - Address core issue: Ansible doesn't run natively on Windows - Convert PowerShell script to intelligent WSL wrapper - Auto-detect WSL environment and use appropriate approach - Provide clear error messages and WSL installation guidance - Update documentation to reflect WSL requirement - Maintain backward compatibility for existing WSL users 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Greatly improve PowerShell script error messages and WSL detection - Fix WSL detection: only detect when actually running inside WSL - Add comprehensive error messages with step-by-step WSL installation - Provide clear troubleshooting guidance for common scenarios - Add colored output for better visibility (Red/Yellow/Green/Cyan) - Improve WSL execution with better error handling and path validation - Clarify Ubuntu 
22.04 LTS recommendation for WSL stability - Add fallback suggestions when things go wrong Resolves the confusing "bash not recognized" error by properly detecting Windows vs WSL environments and providing actionable guidance. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Address code review feedback - Add documentation about PATH export scope in algo script - Optimize Dockerfile layers by combining dependency operations The PATH export comment clarifies that changes only affect the current shell session. The Dockerfile change reduces layers by copying and installing dependencies in a more efficient order. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove unused uv installation code from PowerShell script The PowerShell script is purely a WSL wrapper - it doesn't need to install uv since it just passes execution to WSL/bash where the Unix algo script handles dependency management. Removing dead code that was never called in the execution flow. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Improve uv installation feedback and Docker dependency locking - Track and display which installation method succeeded for uv - Add --locked flag to Docker uv sync for stricter dependency enforcement - Users now see "uv installed successfully via Homebrew!" etc. This addresses code review feedback about installation transparency and dependency management strictness. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Docker build: use --locked without --frozen The --frozen and --locked flags are mutually exclusive in uv. Using --locked alone provides the stricter enforcement we want - it asserts the lockfile won't change and errors if it would.
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix setuptools package discovery error during cloud provider dependency installation The issue occurred when uv tried to install optional dependencies (e.g., [digitalocean]) because setuptools was auto-discovering directories like 'roles', 'library', etc. as Python packages. Since Algo is an Ansible project, not a Python package, this caused builds to fail. Added explicit build-system configuration to pyproject.toml with py-modules = [] to disable package discovery entirely. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix Jinja2 template syntax error in OpenSSL certificate generation Removed inline comments from within Jinja2 expressions in the name_constraints_permitted and name_constraints_excluded fields. Jinja2 doesn't support comments within expressions using the # character, which was causing template rendering to fail. Moved explanatory comments outside the Jinja2 expressions to maintain documentation while fixing the syntax error. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Enhance Jinja2 template testing infrastructure Added comprehensive Jinja2 template testing to catch syntax errors early: 1. Created validate_jinja2_templates.py: - Validates all Jinja2 templates for syntax errors - Detects inline comments in Jinja2 expressions (the bug we just fixed) - Checks for common anti-patterns - Provides warnings for style issues - Skips templates requiring Ansible runtime context 2. Created test_strongswan_templates.py: - Tests all StrongSwan templates with multiple scenarios - Tests with IPv4-only, IPv6, DNS hostnames, and legacy OpenSSL - Validates template output correctness - Skips mobileconfig test that requires complex Ansible runtime 3. 
Updated .ansible-lint: - Enabled jinja[invalid] and jinja[spacing] rules - These will catch template errors during linting 4. Added scripts/test-templates.sh: - Comprehensive test script that runs all template tests - Can be used in CI and locally for validation - All tests pass cleanly without false failures - Treats spacing issues as warnings, not failures This testing would have caught the inline comment issue in the OpenSSL template before it reached production. All tests now pass cleanly. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix StrongSwan CRL reread handler race condition The ipsec rereadcrls command was failing with exit code 7 when the IPsec daemon wasn't fully started yet. This is a timing issue that can occur during initial setup. Added retry logic to: 1. Wait up to 10 seconds for the IPsec daemon to be ready 2. Check daemon status before attempting CRL operations 3. Gracefully handle the case where daemon isn't ready Also fixed Python linting issues (whitespace) in test files caught by ruff. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix StrongSwan CRL handler properly without ignoring errors Instead of ignoring errors (anti-pattern), this fix properly handles the race condition when StrongSwan restarts: 1. After restarting StrongSwan, wait for port 500 (IKE) to be listening - This ensures the daemon is fully ready before proceeding - Waits up to 30 seconds with proper timeout handling 2. When reloading CRLs, use Ansible's retry mechanism - Retries up to 3 times with 2-second delays - Handles transient failures during startup 3. Separated rereadcrls and purgecrls into distinct tasks - Better error reporting and debugging - Cleaner task organization This approach ensures the installation works reliably on fresh installs without hiding potential real errors. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix StrongSwan handlers - handlers cannot be blocks Ansible handlers cannot be blocks. Fixed by: 1. Making each handler a separate task that can notify the next handler 2. restart strongswan -> notifies -> wait for strongswan 3. rereadcrls -> notifies -> purgecrls This maintains the proper execution order while conforming to Ansible's handler constraints. The wait and retry logic is preserved. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix StrongSwan CRL handler for fresh installs The root cause: rereadcrls handler is notified when copying CRL files during certificate generation, which happens BEFORE StrongSwan is installed and started on fresh installs. The fix: 1. Check if StrongSwan service is actually running before attempting CRL reload 2. If not running, skip reload (not needed - StrongSwan will load CRLs on start) 3. If running, attempt reload with retries This handles both scenarios: - Fresh install: StrongSwan not yet running, skip reload - Updates: StrongSwan running, reload CRLs properly Also removed the wait_for port 500 which was failing because StrongSwan doesn't bind to localhost. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-06 22:10:56 -07:00
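Commit f668af22d0 moves cloud-provider libraries into optional extras installed on demand, runs the CI playbook through `uv run ansible-playbook`, and uses `uv sync --locked` (never together with `--frozen`) in Docker. The following Python sketch shows how such uv commands could be assembled. It is not Algo's cloud-pre.yml logic: the extra names (only `[digitalocean]` appears verbatim in the commit) and both helper functions are hypothetical.

```python
import shlex

# Hypothetical extra names for the providers listed in the commit; only
# "digitalocean" is confirmed by the commit text as an extra name.
KNOWN_EXTRAS = {"aws", "azure", "gcp", "hetzner", "linode",
                "openstack", "cloudstack", "digitalocean"}


def uv_sync_command(extras=(), locked=False, frozen=False):
    """Build a `uv sync` argv list for the chosen optional-dependency groups."""
    if locked and frozen:
        raise ValueError("--locked and --frozen are mutually exclusive")
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    cmd = ["uv", "sync"]
    if locked:
        cmd.append("--locked")  # fail instead of rewriting an out-of-date uv.lock
    if frozen:
        cmd.append("--frozen")  # use uv.lock as-is without checking it is current
    for extra in extras:
        cmd += ["--extra", extra]  # also install this optional-dependency group
    return cmd


def uv_run(command):
    """Run a tool inside the uv-managed environment, not the system Python."""
    return ["uv", "run", *command]


print(shlex.join(uv_sync_command(["digitalocean"], locked=True)))
# → uv sync --locked --extra digitalocean
print(shlex.join(uv_run(["ansible-playbook", "main.yml"])))
# → uv run ansible-playbook main.yml
```

Prefixing with `uv run` is what lets `ansible.utils.ipmath` find `netaddr`: the playbook runs against the project environment rather than a system-wide Ansible.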
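The test-config cleanup in this commit removed an `ipsec_enabled` key that was defined twice. Many YAML loaders silently keep the last value instead of failing, so a small check can help. The sketch below is naive and line-based: it only looks at unindented `key:` lines. It is not the check Algo uses, and the function name is hypothetical.

```python
import re

# An unindented identifier followed by ":" is treated as a top-level key.
KEY_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")


def duplicate_top_level_keys(text):
    """Return top-level keys that appear more than once, in first-repeat order."""
    seen = set()
    dupes = []
    for line in text.splitlines():
        match = KEY_RE.match(line)  # indented lines and comments never match
        if not match:
            continue
        key = match.group(1)
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes


config = (
    "ipsec_enabled: true\n"
    "wireguard_enabled: true\n"
    "users:\n"
    "  - alice\n"
    "ipsec_enabled: false\n"
)
print(duplicate_top_level_keys(config))  # → ['ipsec_enabled']
```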
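Commit f668af22d0 turns algo.ps1 into a wrapper that hands execution to WSL and says it now detects WSL "only when actually running inside WSL". The real script is PowerShell. This Python sketch only illustrates two common heuristics, and neither is taken from algo.ps1: the `WSL_DISTRO_NAME` environment variable, and the string "microsoft" in `/proc/version`.

```python
import os


def running_in_wsl(environ=None, proc_version=None):
    """Guess whether this process runs inside WSL (heuristic, not authoritative)."""
    environ = os.environ if environ is None else environ
    if "WSL_DISTRO_NAME" in environ:  # set by WSL for processes in a distro
        return True
    if proc_version is None:
        try:
            with open("/proc/version", encoding="utf-8") as fh:
                proc_version = fh.read()
        except OSError:
            return False  # no /proc/version: native Windows, macOS, etc.
    # WSL kernels report a Microsoft-branded version string.
    return "microsoft" in proc_version.lower()
```

Both inputs can be injected, so the detection logic can be tested on any OS without a real WSL installation.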
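This commit fixes an OpenSSL template that put `#` comments inside a Jinja2 expression, and it adds validate_jinja2_templates.py to detect that pattern. The script's actual logic is not shown here. The sketch below is a hypothetical regex-based version. It strips quoted strings before looking for `#`, so a literal such as `'#1'` is not flagged. A real Jinja2 parser is stricter; this only flags the specific anti-pattern.

```python
import re

# Bodies of {{ ... }} expressions and {% ... %} statements (may span lines).
EXPR_RE = re.compile(r"\{\{(.*?)\}\}|\{%(.*?)%\}", re.DOTALL)
# Single- or double-quoted string literals, removed before checking for '#'.
STRING_RE = re.compile(r"'[^']*'|\"[^\"]*\"")


def inline_comment_lines(template):
    """Return 1-based line numbers where an expression contains a bare '#'."""
    hits = []
    for match in EXPR_RE.finditer(template):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        if "#" in STRING_RE.sub("", body):
            hits.append(template.count("\n", 0, match.start()) + 1)
    return hits


template = (
    "permitted: {{ ['IP:' + server_ip] }}\n"
    "excluded: {{ excluded_list  # public domains }}\n"
    "label: {{ '#1 server' }}\n"
)
print(inline_comment_lines(template))  # → [2]
```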
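The final StrongSwan fix in this commit skips the CRL reload when the service is not running, because a fresh install loads CRLs at start. When the service is running, it retries `ipsec rereadcrls`, which had failed with exit code 7 during startup, up to 3 times with 2-second delays. Algo implements this as Ansible handlers. The Python function below only models the same control flow. Its name and its injected callables are hypothetical, and it counts `attempts` as total tries.

```python
import time
from collections.abc import Callable


def reload_crls(is_running: Callable[[], bool],
                reread: Callable[[], int],
                attempts: int = 3,
                delay: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Skip when StrongSwan is down; otherwise retry the reread until exit code 0."""
    if not is_running():
        return "skipped"  # StrongSwan will read the CRLs itself when it starts
    for attempt in range(1, attempts + 1):
        if reread() == 0:  # e.g. exit status of `ipsec rereadcrls`
            return "reloaded"
        if attempt < attempts:
            sleep(delay)  # give the daemon time to finish starting up
    raise RuntimeError(f"rereadcrls failed after {attempts} attempts")


exit_codes = iter([7, 7, 0])  # daemon not ready twice, then succeeds
print(reload_crls(lambda: True, lambda: next(exit_codes), sleep=lambda s: None))
# → reloaded
```

Surfacing a hard failure after the final attempt, rather than ignoring errors, matches the commit's stated goal of not hiding real errors.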
Jack Ivanov	4289db043a	Refactor StrongSwan PKI tasks to use Ansible crypto modules and remove legacy OpenSSL scripts (#14809 ) * Refactor StrongSwan PKI automation with Ansible crypto modules - Replace shell-based OpenSSL commands with community.crypto modules - Remove custom OpenSSL config template and manual file management - Upgrade Ansible to 11.8.0 in requirements.txt - Improve idempotency, maintainability, and security of certificate and CRL handling * Enhance nameConstraints with comprehensive exclusions - Add email domain exclusions (.com, .org, .net, .gov, .edu, .mil, .int) - Include private IPv4 network exclusions - Add IPv6 null route exclusion - Preserve all security constraints from original openssl.cnf.j2 - Note: Complex IPv6 conditional logic simplified for Ansible compatibility Security: Maintains defense-in-depth certificate scope restrictions * Refactor StrongSwan PKI with comprehensive security enhancements and hybrid testing ## StrongSwan PKI Modernization - Migrated from shell-based OpenSSL commands to Ansible community.crypto modules - Simplified complex Jinja2 templates while preserving all security properties - Added clear, concise comments explaining security rationale and Apple compatibility ## Enhanced Security Implementation (Issues #75, #153) - Name constraints: CA certificates restricted to specific IP/email domains - EKU role separation: Server certs (serverAuth only) vs client certs (clientAuth only) - Domain exclusions: Blocks public domains (.com, .org, etc.) 
and private IP ranges - Apple compatibility: SAN extensions and PKCS#12 compatibility2022 encryption - Certificate revocation: Automated CRL generation for removed users ## Comprehensive Test Suite - Hybrid testing: Validates real certificates when available, config validation for CI - Security validation: Verifies name constraints, EKU restrictions, role separation - Apple compatibility: Tests SAN extensions and PKCS#12 format compliance - Certificate chain: Validates CA signing and certificate validity periods - CI-compatible: No deployment required, tests Ansible configuration directly ## Configuration Updates - Updated CLAUDE.md: Ansible version rationale (stay current for security/performance) - Streamlined comments: Removed duplicative explanations while preserving technical context - Maintained all Issue #75/#153 security enhancements with modern Ansible approach 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix linting issues across the codebase ## Python Code Quality (ruff) - Fixed import organization and removed unused imports in test files - Replaced `== True` comparisons with direct boolean checks - Added noqa comments for intentional imports in test modules ## YAML Formatting (yamllint) - Removed trailing spaces in openssl.yml comments - All YAML files now pass yamllint validation (except one pre-existing long regex line) ## Code Consistency - Maintained proper import ordering in test files - Ensured all code follows project linting standards - Ready for CI pipeline validation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Replace magic number with configurable certificate validity period ## Maintainability Improvement - Replaced hardcoded `+3650d` (10 years) with configurable variable - Added `certificate_validity_days: 3650` in vars section with clear documentation - Applied consistently to both server and client certificate signing ## 
Benefits - Single location to modify certificate validity period - Supports compliance requirements for shorter certificate lifespans - Improves code readability and maintainability - Eliminates magic number duplication ## Backwards Compatibility - Default remains 10 years (3650 days) - no behavior change - Organizations can now easily customize certificate validity as needed 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Update test to validate configurable certificate validity period ## Test Update - Fixed test failure after replacing magic number with configurable variable - Now validates both variable definition and usage patterns: - `certificate_validity_days: 3650` (configurable parameter) - `ownca_not_after: "+{{ certificate_validity_days }}d"` (variable usage) ## Improved Test Coverage - Better validation: checks that validity is configurable, not hardcoded - Maintains backwards compatibility verification (10-year default) - Ensures proper Ansible variable templating is used ## Verified - Config validation mode: All 6 tests pass ✓ - Validates the maintainability improvement from previous commit 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Update to Python 3.11 minimum and fix IPv6 constraint format - Update Python requirement from 3.10 to 3.11 to align with Ansible 11 - Pin Ansible collections in requirements.yml for stability - Fix invalid IPv6 constraint format causing deployment failure - Update ruff target-version to py311 for consistency 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix x509_crl mode parameter and auto-fix Python linting - Remove deprecated 'mode' parameter from x509_crl task - Add separate file task to set CRL permissions (0644) - Auto-fix Python datetime import (use datetime.UTC alias) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude 
<noreply@anthropic.com> * Fix final IPv6 constraint format in defaults template - Update nameConstraints template in defaults/main.yml - Change malformed IP:0:0:0:0:0:0:0:0/0:0:0:0:0:0:0:0 to correct IP:::/0 - This ensures both Ansible crypto modules and OpenSSL template use consistent IPv6 format * Fix critical certificate generation issues for macOS/iOS VPN compatibility This commit addresses multiple certificate generation bugs in the Ansible crypto module implementation that were causing VPN authentication failures on Apple devices. Fixes implemented: 1. Basic Constraints Extension: Added missing `CA:FALSE` constraints to both server and client certificate CSRs. This was causing certificate chain validation errors on macOS/iOS devices. 2. Subject Key Identifier: Added `create_subject_key_identifier: true` to CA certificate generation to enable proper Authority Key Identifier creation in signed certificates. 3. Complete Name Constraints: Fixed missing DNS and IPv6 constraints in CA certificate that were causing size differences compared to legacy shell-based generation. Now includes: - DNS constraints for the deployment-specific domain - IPv6 permitted addresses when IPv6 support is enabled - Complete IPv6 exclusion ranges (fc00::/7, fe80::/10, 2001:db8::/32) These changes bring the certificate format much closer to the working shell-based implementation and should resolve most macOS/iOS VPN connectivity issues. Outstanding Issue: Authority Key Identifier still incomplete - missing DirName and serial components. The community.crypto module limitation may require additional investigation or alternative approaches. Certificate size improvements: Server certificates increased from ~750 to ~775 bytes, CA certificates from ~1070 to ~1250 bytes, bringing them closer to the expected ~3000 byte target size. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix certificate generation and improve version parsing This commit addresses multiple issues found during macOS certificate validation: Certificate Generation Fixes: - Add Basic Constraints (CA:FALSE) to server and client certificates - Generate Subject Key Identifier for proper AKI creation - Improve Name Constraints implementation for security - Update community.crypto to version 3.0.3 for latest fixes Code Quality Improvements: - Clean up certificate comments and remove obsolete references - Fix server certificate identification in tests - Update datetime comparisons for cryptography library compatibility - Fix Ansible version parsing in main.yml with proper regex handling Testing: - All certificate validation tests pass - Ansible syntax checks pass - Python linting (ruff) clean - YAML linting (yamllint) clean These changes restore macOS/iOS certificate compatibility while maintaining security best practices and improving code maintainability. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Enhance security documentation with comprehensive inline comments Add detailed technical explanations for critical PKI security features: - Name Constraints: Defense-in-depth rationale and attack prevention - Public domain/network exclusions: Impersonation attack prevention - RFC 1918 private IP blocking: Lateral movement prevention - IPv6 constraint strategy: ULA/link-local/documentation range handling - Role separation enforcement: Server vs client EKU restrictions - CA delegation prevention: pathlen:0 security implications - Cross-deployment isolation: UUID-based certificate scope limiting These comments provide essential context for maintainers to understand the security importance of each configuration without referencing external issue numbers, ensuring long-term maintainability. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix CI test failures in PKI certificate validation Resolve Smart Test Selection workflow failures by fixing test validation logic: Certificate Configuration Fixes: - Remove unnecessary serverAuth/clientAuth EKUs from CA certificate - CA now only has IPsec End Entity EKU for VPN-specific certificate issuance - Maintains proper role separation between server and client certificates Test Validation Improvements: - Fix domain exclusion detection to handle both single and double quotes in YAML - Improve EKU validation to check actual configuration lines, not comments - Server/client certificate tests now correctly parse YAML structure - Tests pass in both CI mode (config validation) and local mode (real certificates) Root Cause: The CI failures were caused by overly broad test assertions that: 1. Expected double-quoted strings but found single-quoted YAML 2. Detected EKU keywords in comments rather than actual configuration 3. Failed to properly parse YAML list structures All security constraints remain intact - no actual security issues were present. The certificate generation produces properly constrained certificates for VPN use. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Fix trailing space in openssl.yml for yamllint compliance --------- Co-authored-by: Dan Guido <dan@trailofbits.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-08-05 05:40:28 -07:00
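The name constraints in commit 4289db043a exclude RFC 1918 ranges, `fc00::/7`, `fe80::/10` and `2001:db8::/32`, plus email domains under .com, .org, .net, .gov, .edu, .mil and .int. The sketch below models only that exclusion list with the standard `ipaddress` module. A certificate verifier also enforces the permitted subtrees, such as the deployment's own IP and domain, which this sketch ignores. The helper names are hypothetical.

```python
import ipaddress

EXCLUDED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918 private IPv4
    "fc00::/7",        # IPv6 unique local addresses
    "fe80::/10",       # IPv6 link-local
    "2001:db8::/32",   # IPv6 documentation range
)]
EXCLUDED_EMAIL_SUFFIXES = (".com", ".org", ".net", ".gov", ".edu", ".mil", ".int")


def ip_excluded(address):
    """True if the address falls in any excluded subtree (v4/v6 mismatch never matches)."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in EXCLUDED_NETWORKS)


def email_excluded(address):
    """True if the email's domain ends with an excluded public suffix."""
    domain = address.rsplit("@", 1)[-1].lower()
    return domain.endswith(EXCLUDED_EMAIL_SUFFIXES)
```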
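This commit replaces the hardcoded `+3650d` with `certificate_validity_days: 3650` and `ownca_not_after: "+{{ certificate_validity_days }}d"`. The sketch below renders the same relative timespec and computes the resulting expiry date. The helper names and the example issue date are illustrative.

```python
from datetime import date, timedelta

certificate_validity_days = 3650  # default from the commit (nominally 10 years)


def relative_not_after(days):
    """Render the relative timespec passed as ownca_not_after, e.g. '+3650d'."""
    return f"+{days}d"


def expiry_date(issued, days):
    """Date on which a certificate issued on `issued` stops being valid."""
    return issued + timedelta(days=days)


print(relative_not_after(certificate_validity_days))  # → +3650d
print(expiry_date(date(2025, 8, 5), certificate_validity_days))  # → 2035-08-03
```

Because 2028 and 2032 are leap years, 3650 days falls two days short of ten calendar years from this issue date.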
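The commit describes EKU role separation: server certificates carry serverAuth and no clientAuth, and client certificates carry clientAuth and no serverAuth. It also notes that the CA keeps only the IPsec End Entity EKU. The check below encodes just the server/client rule as described and deliberately tolerates other EKUs. The rule table and function name are illustrative, not Algo's test code.

```python
# role -> (EKU that must be present, EKU that must be absent)
ROLE_RULES = {
    "server": ("serverAuth", "clientAuth"),
    "client": ("clientAuth", "serverAuth"),
}


def eku_violations(role, ekus):
    """List role-separation problems for a certificate with the given EKU names."""
    required, forbidden = ROLE_RULES[role]
    problems = []
    if required not in ekus:
        problems.append(f"{role} certificate lacks {required}")
    if forbidden in ekus:
        problems.append(f"{role} certificate must not have {forbidden}")
    return problems


print(eku_violations("client", {"clientAuth", "serverAuth"}))
# → ['client certificate must not have serverAuth']
```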
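The CI fix in this commit came from test assertions that expected double-quoted strings and matched EKU keywords inside comments. The sketch below shows one way to make such config checks comment-aware and quote-agnostic. It is deliberately naive: it drops everything after the first `#` on a line, even inside quotes. The YAML snippet and helper names are hypothetical.

```python
import re


def config_lines(yaml_text):
    """Yield each line with full-line and trailing '#' comments removed (naive)."""
    for line in yaml_text.splitlines():
        yield line.split("#", 1)[0]


def has_quoted_value(yaml_text, value):
    """True if `value` appears in single or double quotes outside comments."""
    pattern = re.compile(r"(['\"])" + re.escape(value) + r"\1")
    return any(pattern.search(line) for line in config_lines(yaml_text))


def mentions_keyword(yaml_text, word):
    """True if `word` appears in configuration text rather than in a comment."""
    return any(word in line for line in config_lines(yaml_text))


snippet = (
    "# serverAuth is deliberately not listed for clients\n"
    "extended_key_usage:\n"
    "  - clientAuth\n"
    "name_constraints_excluded:\n"
    "  - 'DNS:.com'\n"
    '  - "DNS:.org"\n'
)
```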
Dan Guido	d961f1d7e0	Add Claude Code GitHub Workflow (#14798 ) * "Claude PR Assistant workflow" * "Claude Code Review workflow" * docs: Add CLAUDE.md for LLM guidance This comprehensive guide captures important context and learnings for LLMs working on the Algo VPN codebase, including: - Project architecture and structure - Critical dependencies and version management - Development practices and code style - Testing requirements and CI/CD pipeline - Common issues and solutions - Security considerations - Platform support details - Maintenance guidelines The guide emphasizes Algo's core values: security, simplicity, and privacy. It provides practical guidance based on extensive experience working with the codebase, helping future contributors maintain high standards while avoiding common pitfalls. * feat: Configure Claude GitHub Actions with Algo-specific settings - Add allowed_tools for running Ansible, Python, and shell linters - Enable use_sticky_comment for cleaner PR discussions - Add custom_instructions to follow Algo's security-first principles - Reference CLAUDE.md for project-specific guidance	2025-08-03 08:01:41 -04:00

5 commits