65 commits

Author SHA1 Message Date
Dan Guido
c495307027
Fix DigitalOcean cloud-init compatibility and deprecation warnings (#14801)
* Fix DigitalOcean cloud-init compatibility issue causing SSH timeout on port 4160

This commit addresses the issue described in GitHub issue #14800 where DigitalOcean
deployments fail during the "Wait until SSH becomes ready..." step due to cloud-init
not processing the write_files directive correctly.

## Problem
- DigitalOcean's cloud-init shows "Unhandled non-multipart (text/x-not-multipart) userdata" warning
- write_files module gets skipped, leaving SSH on default port 22 instead of port 4160
- Algo deployment times out when trying to connect to port 4160

## Solution
Added proactive detection and remediation to the DigitalOcean role:
1. Check if SSH is listening on the expected port (4160) after droplet creation
2. If not, automatically apply the SSH configuration over SSH on port 22
3. Verify SSH is now listening on the correct port before proceeding

## Changes
- Added SSH port check with 30-second timeout
- Added fallback remediation block that:
  - Connects via SSH on port 22 to apply Algo's SSH configuration
  - Backs up the original sshd_config
  - Applies the correct SSH settings (port 4160, security hardening)
  - Restarts the SSH service
  - Verifies the fix worked
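
The check-then-fall-back flow listed above could be sketched roughly as follows. This is an illustrative Python sketch, not the Ansible tasks from the commit; the function names `port_is_listening` and `choose_ssh_port` and the injectable `probe` parameter are hypothetical.

```python
import socket
from typing import Callable


def port_is_listening(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_ssh_port(
    host: str,
    expected: int = 4160,
    fallback: int = 22,
    probe: Callable[[str, int], bool] = port_is_listening,
) -> int:
    """Return the expected port if it answers, otherwise the fallback port.

    Raises RuntimeError when neither port accepts connections.
    """
    if probe(host, expected):
        return expected
    if probe(host, fallback):
        # cloud-init did not apply the config; remediation would connect here.
        return fallback
    raise RuntimeError(f"SSH is not reachable on port {expected} or {fallback}")
```

Passing the probe as a parameter keeps the decision logic testable without opening real network connections.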

This ensures DigitalOcean deployments succeed even when cloud-init fails to process
the user_data correctly, maintaining backward compatibility and reliability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Implement cleaner fix for DigitalOcean cloud-init encoding issue

This replaces the previous workaround with two targeted fixes that address
the root cause of the "Unhandled non-multipart (text/x-not-multipart) userdata"
issue that prevents write_files from being processed.

## Root Cause
Cloud-init receives user_data as binary/bytes instead of UTF-8 string,
causing it to fail parsing and skip the write_files directive that
configures SSH on port 4160.

## Cleaner Solutions Implemented

### Fix 1: String Encoding (user_data | string)
- Added explicit string conversion to user_data template lookup
- Ensures DigitalOcean API receives proper UTF-8 string, not bytes
- Minimal change with maximum compatibility

### Fix 2: Use runcmd Instead of write_files
- Replaced write_files approach with runcmd shell commands
- Bypasses the cloud-init parsing issue entirely
- More reliable as it executes direct shell commands
- Includes automatic SSH config backup for safety

## Changes Made
- `roles/cloud-digitalocean/tasks/main.yml`: Added | string filter to user_data
- `files/cloud-init/base.yml`: Replaced write_files with runcmd approach
- Removed complex SSH detection/remediation workaround (no longer needed)

## Benefits
- Fixes root cause instead of working around symptoms
- Much simpler and more maintainable code
- Backward compatible - no API changes required
- Handles both potential failure modes (encoding + parsing)
- All tests pass, linters clean

This should resolve DigitalOcean SSH timeout issues while being much
cleaner than the previous workaround approach.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix cloud-init header format for DigitalOcean compatibility

The space in '# cloud-config' (introduced in PR #14775) breaks cloud-init
YAML parsing on DigitalOcean, causing SSH configuration to be skipped.

Cloud-init documentation requires '#cloud-config' without a space.

Fixes #14800

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert to write_files approach for SSH configuration

Using write_files is more maintainable and Ansible-native than runcmd.
The root cause was the cloud-config header format, not write_files itself.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix Ansible deprecation and variable warnings

- Replace deprecated network filters with ansible.utils equivalents:
  - ipaddr → ansible.utils.ipaddr
  - ipmath → ansible.utils.ipmath
  - ipv4 → ansible.utils.ipv4
  - ipv6 → ansible.utils.ipv6
  - next_nth_usable → ansible.utils.next_nth_usable

- Fix reserved variable name: no_log → algo_no_log

- Fix SSH user groups warning by explicitly specifying groups parameter

Addresses deprecation warnings that would become errors after 2024-01-01.
All linter checks pass with only cosmetic warnings remaining.
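
As a rough illustration of the rename listed above, the mapping can be applied mechanically to a template string. This is a hypothetical Python helper (`qualify_filters` is not part of the repository) and it assumes filters are written in the usual `| name` form.

```python
import re

# Short filter names from the list above mapped to their ansible.utils names.
FILTER_RENAMES = {
    "ipaddr": "ansible.utils.ipaddr",
    "ipmath": "ansible.utils.ipmath",
    "ipv4": "ansible.utils.ipv4",
    "ipv6": "ansible.utils.ipv6",
    "next_nth_usable": "ansible.utils.next_nth_usable",
}

# Match "| name" where name is one of the short filter names as a whole word.
_PATTERN = re.compile(
    r"\|\s*(" + "|".join(map(re.escape, FILTER_RENAMES)) + r")\b"
)


def qualify_filters(template: str) -> str:
    """Rewrite short network filter names to their fully qualified form."""
    return _PATTERN.sub(lambda m: "| " + FILTER_RENAMES[m.group(1)], template)


print(qualify_filters("{{ subnet | ipaddr('net') }}"))
```

Already qualified filters are left untouched, because the text after the pipe then starts with `ansible` rather than one of the short names.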

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add comprehensive protection for cloud-config header format

- Add inline documentation explaining critical #cloud-config format requirement
- Exclude files/cloud-init/ from yamllint and ansible-lint to prevent automatic 'fixes'
- Create detailed README.md documenting the issue and protection measures
- Reference GitHub issue #14800 for future maintainers

This prevents regression of the critical cloud-init header format that
causes deployment failures when changed from '#cloud-config' to '# cloud-config'.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add test for cloud-init header format to prevent regression

This test ensures the cloud-init header remains exactly '#cloud-config'
without a space. The regression in PR #14775 that added a space broke
DigitalOcean deployments by causing cloud-init YAML parsing to fail,
resulting in SSH timeouts on port 4160.
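
A check of this kind might look like the following minimal sketch. It is not the test added by the commit; the function name `has_valid_cloud_config_header` is hypothetical, and the sketch takes the user data as an already loaded string.

```python
def has_valid_cloud_config_header(user_data: str) -> bool:
    """Return True when the first line is exactly '#cloud-config'."""
    first_line = user_data.split("\n", 1)[0]
    return first_line == "#cloud-config"


good = "#cloud-config\nusers: []\n"
bad = "# cloud-config\nusers: []\n"  # space after '#', as in the regression

print(has_valid_cloud_config_header(good))
print(has_valid_cloud_config_header(bad))
```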

Co-authored-by: Dan Guido <dguido@users.noreply.github.com>

* Refactor SSH config template and fix MOTD task permissions

- Use dedicated sshd_config template instead of inline content
- Add explicit become: true to MOTD task to fix permissions warning

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix no_log variable references after renaming to algo_no_log

Update all remaining references from old 'no_log' variable to 'algo_no_log'
in WireGuard, SSH tunneling, and StrongSwan roles. This fixes deployment
failures caused by undefined variable references.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Correct YAML indentation in cloud-init template for DigitalOcean

The indent filter was not indenting the first line of the sshd_config content,
causing invalid YAML structure that cloud-init couldn't parse. This resulted
in SSH timeouts during deployment as the port was never changed from 22 to 4160.

- Add first=True parameter to indent filter to ensure all lines are indented
- Remove extra indentation in base template to prevent double-indentation
- Add comprehensive test suite to validate template rendering and prevent regressions
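
The effect of `first=True` can be illustrated with a plain-Python imitation of Jinja2's `indent` filter. This is a sketch only: the `indent` and `render_write_files` functions and the sample sshd_config content are hypothetical and do not reproduce the actual template.

```python
def indent(text: str, width: int = 4, first: bool = False) -> str:
    """Imitate Jinja2's indent filter: pad non-empty lines, first line optional."""
    lines = text.splitlines()
    if not lines:
        return text
    pad = " " * width
    head, rest = lines[0], lines[1:]
    body = [pad + line if line else line for line in rest]
    if first:
        head = pad + head
    return "\n".join([head] + body)


def render_write_files(content: str, first: bool) -> str:
    """Place content under a YAML block scalar; the insertion point is column 0."""
    header = "write_files:\n  - path: /etc/ssh/sshd_config\n    content: |\n"
    return header + indent(content, 6, first)


sample = "Port 4160\nPermitRootLogin no"  # hypothetical config content
print(render_write_files(sample, first=False))  # 'Port 4160' lands at column 0
print(render_write_files(sample, first=True))   # every content line is indented
```

With `first=False` the first content line sits outside the block scalar, which is the kind of structure a YAML parser rejects.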

Fixes deployment failures where cloud-init would show:
"Invalid format at line X: expected <block end>, but found '<scalar>'"

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Dan Guido <dguido@users.noreply.github.com>
2025-08-03 14:25:47 -04:00
Dan Guido
640249ae59
fix: Fix shellcheck POSIX sh issue and make ansible-lint stricter (#14789)
* fix: Remove POSIX-incompatible 'local' keyword from install.sh

The install.sh script uses #!/usr/bin/env sh (POSIX shell) but was using
the 'local' keyword in the tryGetMetadata function, which is a bash-specific
feature. This caused shellcheck to fail with SC3043 warnings in CI.

Fixed by removing 'local' keywords from variable declarations in the
tryGetMetadata function. Without 'local' the variables are global in POSIX sh,
but each one is assigned at the beginning of the function, so every call
starts from fresh values.

This resolves the CI failure introduced in PR #14788 (run #919).

* ci: Make ansible-lint stricter and fix basic issues

- Remove || true from ansible-lint CI job to enforce linting
- Enable name[play] rule - all plays should be named
- Enable yaml[new-line-at-end-of-file] rule
- Move name[missing] from skip_list to warn_list (first step)
- Add names to plays in main.yml and users.yml
- Document future linting improvements in comments

This makes the CI stricter while fixing the easy issues first.
More comprehensive fixes for the 113 name[missing] warnings can
be addressed in future PRs.

* fix: Add name[missing] to skip_list temporarily

The ansible-lint CI is failing because name[missing] was not properly
added to skip_list. This causes 113 name[missing] errors to fail the CI.

Adding it to skip_list for now to fix the CI. The rule can be moved to
warn_list and eventually enabled once all tasks are properly named in
future PRs.

* fix: Fix ansible-lint critical errors

- Fix schema[tasks] error in roles/local/tasks/prompts.yml by removing with_items loop
- Add missing newline at end of requirements.yml
- Replace ignore_errors with failed_when in reboot task
- Add pipefail to shell command with pipes in strongswan openssl task

These fixes address all critical ansible-lint errors that were causing CI failures.
2025-08-03 07:04:04 -04:00
dasmart
17881b2d2a
make sure cron is installed on ubuntu. #14568 (#14640) 2023-09-27 17:56:28 +03:00
Jack Ivanov
347f864abb
Ansible upgrade 6.1 (#14500)
* linting

* update ansible

* linters
2022-07-30 15:01:24 +03:00
Christian Clauss
571daf4464
Fix typos discovered by codespell (#14325) 2021-12-14 00:30:09 +03:00
dependabot[bot]
4e739b518f
Bump ansible from 2.9.20 to 4.4.0 (#14272)
* Bump ansible from 2.9.20 to 4.4.0

Bumps [ansible](https://github.com/ansible/ansible) from 2.9.20 to 4.4.0.
- [Release notes](https://github.com/ansible/ansible/releases)
- [Commits](https://github.com/ansible/ansible/commits)

---
updated-dependencies:
- dependency-name: ansible
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* ansible core

* add vagrant and fix jinja

* bool variable fix

* ec2 task deprecation

* bool fix

* azure requirements fix

* cloudscale fix

* scaleway fix

* openstack fixes

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jack Ivanov <e601809@gmail.com>
Co-authored-by: Jack Ivanov <17044561+jackivanov@users.noreply.github.com>
2021-10-31 12:58:35 +03:00
Jack Ivanov
ebec20ed36
Multiple Azure fixes (#1908)
* Multiple Azure fixes

* back to azure daily
2020-10-31 22:40:09 +03:00
Saravanan Palanisamy
02fe2f7dd5
use ca_password from variable(--extra-vars) - non-interactive installation using ansible playbook (#1774)
* use ca_password from variable

* add tests to cover the changes

* update tests - PR #1774
2020-04-25 19:32:16 +03:00
Jack Ivanov
28d95eace2
Update main.yml (#1727) 2020-02-18 16:20:27 +01:00
Jack Ivanov
dcfed41ae8 Apply netplan for digitalocean only (#1723) 2020-02-10 11:01:20 +01:00
Jack Ivanov
2abbf22196
Alternative Ingress IP (#1605)
* Separate ingress IP draft

* task name fix

* placeholder
2020-01-31 11:24:29 +01:00
Jack Ivanov
8bdd99c05d Refactor to support Ansible 2.8 (#1549)
* bump ansible to 2.8.3

* DigitalOcean: move to the latest modules

* Add Hetzner Cloud

* Scaleway and Lightsail fixes

* lint missing roles

* Update roles/cloud-hetzner/tasks/main.yml

Add api_token

Co-Authored-By: phaer <phaer@phaer.org>

* Update roles/cloud-hetzner/tasks/main.yml

Add api_token

Co-Authored-By: phaer <phaer@phaer.org>

* Try to run apt until succeeded

* Scaleway modules upgrade

* GCP: Refactoring, remove deprecated modules

* Doc updates (#1552)

* Update README.md

Adding links and mentions of Exoscale aka CloudStack and Hetzner Cloud.

* Update index.md

Add the Hetzner Cloud to the docs index

* Remove link to Win 10 IPsec instructions

* Delete client-windows.md

Unnecessary since the deprecation of IPsec for Win10.

* Update deploy-from-ansible.md

Added sections and required variables for CloudStack and Hetzner Cloud.

* Update deploy-from-ansible.md

Added sections for CloudStack and Hetzner, added req variables and examples, mentioned environment variables, and added links to the provider role section.

* Update deploy-from-ansible.md

Cosmetic changes to links, fix typo.

* Update GCE variables

* Update deploy-from-script-or-cloud-init-to-localhost.md

Fix a finer point, and make variables list more readable.

* update azure requirements

* Python3 draft

* set LANG=c to the p12 password generation task

* Update README

* Install cloud requirements to the existing venv

* FreeBSD fix

* env->.env fixes

* lightsail_region_facts fix

* yaml syntax fix

* Update README for Python 3 (#1564)

* Update README for Python 3

* Remove tabs and tweak instructions

* Remove cosmetic command indentation

* Update README.md

* Update README for Python 3 (#1565)

* DO fix for "found unpermitted parameters: id"

* Verify Python version

* Remove ubuntu 16.04 from readme

* Revert back DigitalOcean module

* Update deploy-from-script-or-cloud-init-to-localhost.md

* env to .env
2019-09-28 08:10:20 +08:00
Squirrel
1ca8ee5554 Generates a password by native module (#1576)
* use password module to generate password

* fix variable reference

* reduce character set to meet original design

* CA and p12 password changes

- Move the CA_password generation task to the native lookup plugin
- Get rid of unneeded tasks
2019-09-06 10:55:57 +02:00
TC1977
8462f0fb6c Unattended upgrade fixes (#1485)
* Keep custom dnscrypt-proxy conffile when upgrading

* Unattended upgrade tuning
- Upgrade the 50unattended-upgrades file with latest options
- Keep the common unattended upgrade options in one file
- Enable removing of unused kernels and dependencies to save some space
2019-06-24 10:23:34 +02:00
Anton Strogonoff
368ebc8625 fix: Use wait_for_connection to avoid failure (#1381)
With the preexisting wait_for implementation, deployment to Ubuntu on Lightsail failed with a connection reset error on this task. Ansible's wait_for_connection appears to be the recommended way. I got past this task successfully after this change, but I'd appreciate more eyes on this.
2019-05-17 16:04:13 +02:00
Jack Ivanov
5904546a48
Randomly generated IP address for the local dns resolver (#1429)
* generate service IPs dynamically

* update cloud-init tests

* exclude ipsec and wireguard ranges from the random service ip

* Update docs

* @davidemyers: update wireguard docs for linux

* Move to netaddr filter

* AllowedIPs fix

* WireGuard IPs fix
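
Picking a random service IP while avoiding the VPN ranges might be sketched as below. The function name `random_service_ip` and the networks in the example are hypothetical; the real pool and excluded ranges are not stated in this commit message.

```python
import ipaddress
import random
from typing import Iterable, Optional


def random_service_ip(
    pool: str,
    excluded: Iterable[str],
    rng: Optional[random.Random] = None,
    max_tries: int = 1000,
) -> ipaddress.IPv4Address:
    """Pick a random address from pool that lies outside every excluded network."""
    rng = rng or random.Random()
    pool_net = ipaddress.ip_network(pool)
    excluded_nets = [ipaddress.ip_network(net) for net in excluded]
    for _ in range(max_tries):
        # Index into the network to get a uniformly random address.
        candidate = pool_net[rng.randrange(pool_net.num_addresses)]
        if not any(candidate in net for net in excluded_nets):
            return candidate
    raise RuntimeError("no free address found outside the excluded ranges")


# Hypothetical ranges, for illustration only.
print(random_service_ip("172.16.0.0/12", ["172.16.0.0/13"]))
```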
2019-05-17 14:49:29 +02:00
Jack Ivanov
25513cf925 Refactoring, Linting and additional tests (#1397)
* Refactoring, Linting and additional tests

* Vultr: Undefined variable and deprecation notes fix

* Travis-CI enable linters

* Azure: Update python requirements

* Update main.yml

* Update install.sh

* Add missing roles to ansible-lint

* Linting for skipped roles

* add .ansible-lint config
2019-04-26 11:48:28 -04:00
Jack Ivanov
c4ea88000b Refactoring to support roles inclusion (#1365) 2019-04-08 16:20:34 -04:00
Jack Ivanov
84bbc0e22c
Update ubuntu.yml (#1383) 2019-04-02 13:21:45 +03:00
Jack Ivanov
273c7665d3 Refactoring (#1334)

## Description
Renames the vpn role to strongswan, and splits up the variables to support 2 separate VPNs. Closes #1330 and closes #1162
Configures Ansible to use python3 on the server side. Closes #1024 
Removes unneeded playbooks, reorganises a lot of variables
Reorganises the `config` folder. Closes #1330
<details><summary>Here is what the config directory looks like now</summary>
<p>

```
configs/X.X.X.X/
|-- ipsec
|   |-- apple
|   |   |-- desktop.mobileconfig
|   |   |-- laptop.mobileconfig
|   |   `-- phone.mobileconfig
|   |-- manual
|   |   |-- cacert.pem
|   |   |-- desktop.p12
|   |   |-- desktop.ssh.pem
|   |   |-- ipsec_desktop.conf
|   |   |-- ipsec_desktop.secrets
|   |   |-- ipsec_laptop.conf
|   |   |-- ipsec_laptop.secrets
|   |   |-- ipsec_phone.conf
|   |   |-- ipsec_phone.secrets
|   |   |-- laptop.p12
|   |   |-- laptop.ssh.pem
|   |   |-- phone.p12
|   |   `-- phone.ssh.pem
|   `-- windows
|       |-- desktop.ps1
|       |-- laptop.ps1
|       `-- phone.ps1
|-- ssh-tunnel
|   |-- desktop.pem
|   |-- desktop.pub
|   |-- laptop.pem
|   |-- laptop.pub
|   |-- phone.pem
|   |-- phone.pub
|   `-- ssh_config
`-- wireguard
    |-- desktop.conf
    |-- desktop.png
    |-- laptop.conf
    |-- laptop.png
    |-- phone.conf
    `-- phone.png
```

![finder](https://i.imgur.com/FtOmKO0.png)

</p>
</details>

## Motivation and Context
This refactoring is aimed at the 1.0 release

## How Has This Been Tested?
Deployed to several cloud providers with various options enabled and disabled

## Types of changes
- [x] Refactoring

## Checklist:
- [x] I have read the **CONTRIBUTING** document.
- [x] My code follows the code style of this project.
- [x] My change requires a change to the documentation.
- [x] I have updated the documentation accordingly.
- [x] All new and existing tests passed.
2019-03-10 13:16:34 -04:00
Demian
5e5424df69 fix OS is undefined error (#1335) 2019-02-26 12:19:34 +01:00
Luvpreet Singh
6233642c66 fix(update-users): changed generate p12 password task (#1289)
Changed task's module to generic python format for python2 and python3.
2019-01-25 16:36:44 -05:00
Jack Ivanov
7a6daff1ff IPv6 fix (#1302) 2019-01-18 23:39:08 -05:00
David Myers
5981bb9cad Replace 'max_mss' with 'reduce_mtu' (#1253) 2018-12-20 09:21:04 -05:00
Jack Ivanov
955a986c21
IPv6 forwarding fixes (#1256) 2018-12-18 13:59:25 +01:00
Federico G. Schwindt
a4f2c97fd2 Fix ipv4 address missing on reboot (#1245) 2018-12-10 06:57:15 +01:00
Jack Ivanov
45b00ee994
BSD StrongSwan fixes (#1207) 2018-11-20 19:20:24 +01:00
Jack Ivanov
dbd68aa97d WireGuard BSD (#1083)
* WireGuard BSD

* Remove unneeded config option

* Enable PersistentKeepalive for NAT and Firewall Traversal Persistence

* Install dnscrypt-proxy from repositories
2018-09-27 04:18:12 -04:00
Jack Ivanov
eb2224cde1
install generic linux headers (#1124) 2018-09-21 20:05:11 +03:00
David Myers
d95df710a5 Add an unattended reboot option (#1082) 2018-09-02 15:26:06 -04:00
Jack Ivanov
e8947f318b Large refactor to support Ansible 2.5 (#976)
* Refactoring, booleans declaration and update users fix

* Make server_name more FQDN compatible

* Rename variables

* Define the default value for store_cakey

* Skip a prompt about the SSH user if deploying to localhost

* Disable reboot for non-cloud deployments

* Enable EC2 volume encryption by default

* Add default server value (localhost) for the local installation

Delete empty files

* Add default region to aws_region_facts

* Update docs

* EC2 credentials fix

* Warnings fix

* Update deploy-from-ansible.md

* Fix a typo

* Remove lightsail from the docs

* Disable EC2 encryption by default

* rename droplet to server

* Disable dependencies

* Disable tls_cipher_suite

* Convert wifi-exclude to a string. Update-users fix

* SSH access congrats fix

* 16.04 > 18.04

* Don't ask for the credentials if specified in the environment vars

* GCE server name fix
2018-08-27 10:05:45 -04:00
Jack Ivanov
b061df6631
Move DNSCrypt proxy fallback_resolver to systemd resolved (#1011) 2018-06-26 13:11:09 +03:00
Jack Ivanov
aee043977f explicit installation of linux headers (#975) 2018-05-29 21:43:06 -07:00
Jack Ivanov
d56f50180b Extra line and better DNS configuration for WireGuard (#968)
- Adds an extra line after the if statement. Jinja2 trims such blocks by default in Ansible. Fixes #965
- More appropriate way to configure DNS servers
- Removes `DNS` option from the wireguard server config
- Fixes dnscrypt-proxy restart
2018-05-25 10:37:13 -07:00
Jack Ivanov
3488e660ad Add WireGuard support for Android (#910)
* WireGuard Implementation

* Update client-android.md

* Update README.md

* WireGuard unattended upgrades

* Update README.md

* reload-module-on-update and syntax fix

* SaveConfig to true

* Azure firewall. Fixes #962

* Update README.md

* Update client-android.md
2018-05-24 08:15:27 -07:00
Jack Ivanov
d27b849f24 Ubuntu1804 (#925)
- Fixes #897 #944 #956

Work in progress. Lightsail is not ready for Ubuntu 18.04 yet

- [x] DigitalOcean
~~- [ ] Amazon Lightsail~~
- [x] Amazon EC2
- [x] Microsoft Azure
- [x] Google Compute Engine
- [x] Scaleway
- [x] OpenStack (DreamCompute optimised)
2018-05-24 07:08:14 -07:00
Jack Ivanov
c82bd8c5ff DNS-over-HTTPS (#875) 2018-04-25 12:27:58 -07:00
Jack Ivanov
02427910de Ansible 2.4, Lightsail, Scaleway, DreamCompute (OpenStack) integration (#804)
* Move to ansible-2.4.3

* Add Lightsail support #623

* Fixing the EC2 deployment

* Scaleway integration #623

* OpenStack cloud provider (DreamCompute optimised) #623

* Remove the security role

* Enable unattended-upgrades for clouds

* New requirements to make Azure and GCE work
2018-03-02 07:55:54 -05:00
Jack Ivanov
4da752b603 Ubuntu 17.10 support (#811) 2018-02-24 14:17:34 +01:00
Jack Ivanov
a844870b7a Sendmail should not be installed (#738) 2017-11-22 09:15:43 -05:00
Jack Ivanov
bd348af9c2 Implementing blocks and additional fail hints #487 (#497)
change the troubleshooting url
2017-04-29 10:48:25 -04:00
Jack Ivanov
6e61a51aca rewrite the sysctl task 2017-04-04 17:02:11 +02:00
Jack Ivanov
c0f4b5fa41 Enable default values if the role is skipped #313 2017-04-04 16:57:39 +02:00
Jack Ivanov
6facb6cb4f FreeBSD / HardenedBSD (#262)
* FreeBSD draft

ifconfig fix

Pre-tasks fixes

fix hardcoded IP

some refactoring

disable system-based tags

disable freebsd tags

FreeBSD vpn role

add defaults

ssh role freebsd

default fix

dns_adblocking freebsd

ubuntu dict fix

* HardenedBSD

update-users BSD

* Rebuild the kernel

docs changing
2017-03-18 12:22:07 +03:00
Jack Ivanov
2798f84d3f ensure that apparmor is supported by the kernel #215 2017-01-16 00:19:57 +03:00
Jack Ivanov
a50a396b94 additional fixes 2017-01-11 20:55:44 +03:00
Jack Ivanov
03c805cb87 reorganize the wait_for functions #159 2016-12-13 21:58:45 +03:00
Kevin Cernekee
433389c0ab Use /var/run/reboot-required to determine if a restart is needed
The current check only looks to see if a new kernel was installed.
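
The flag-file check described here can be sketched in a few lines of Python. This is an illustration rather than the playbook task itself, and the function name `reboot_required` is hypothetical.

```python
import os


def reboot_required(flag_path: str = "/var/run/reboot-required") -> bool:
    """Return True when the system has left a reboot-required flag file."""
    return os.path.exists(flag_path)


if reboot_required():
    print("restart needed")
else:
    print("no restart needed")
```

Checking the flag file covers any update that asks for a restart, not only a newly installed kernel.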
2016-11-06 09:45:39 -08:00
Kevin Cernekee
09bbc4058c Add missing tags in common playbook
If the common playbook is invoked with the "cloud" tag, non-cloud
tasks will be skipped.  On GCE this causes "Install tools" to be skipped,
apparmor-utils is not installed, and then the "Enforcing ipsec with
apparmor" step fails.
2016-11-06 09:45:34 -08:00
Jack Ivanov
d052cb8e77 skip-tags added. Fixed #121 2016-10-28 21:00:11 +03:00