15 Commits

Author SHA1 Message Date
sam
fd71d5b82c Remove stuck portainer_agent container before redeploying
If the container exists but is in a Restarting state (e.g. due to a
stale AGENT_HOST env var), remove it so the deploy task creates a
fresh container with the correct config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:30:11 -07:00
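The cleanup this commit describes can be sketched outside Ansible. The Python below is a minimal illustration that assumes the Docker CLI is on PATH. It reads the container state with `docker inspect` and removes the container only when Docker reports it as restarting. The playbook presumably does this with Ansible tasks, and the function names here are hypothetical.

```python
import json
import subprocess

CONTAINER = "portainer_agent"


def is_stuck(inspect_output: str) -> bool:
    """Return True if `docker inspect` JSON shows the container restarting.

    `docker inspect` prints a JSON array with one object per container;
    State.Restarting is true while Docker keeps cycling a crashing container.
    """
    containers = json.loads(inspect_output)
    if not containers:
        return False
    state = containers[0].get("State", {})
    return bool(state.get("Restarting")) or state.get("Status") == "restarting"


def remove_if_stuck(name: str = CONTAINER) -> bool:
    """Remove the named container if it is stuck; return True if removed."""
    result = subprocess.run(["docker", "inspect", name], capture_output=True, text=True)
    if result.returncode != 0:
        # Container does not exist, so there is nothing to clean up.
        return False
    if not is_stuck(result.stdout):
        return False
    # -f stops and removes in one step, so the deploy task can recreate it.
    subprocess.run(["docker", "rm", "-f", name], check=True)
    return True


if __name__ == "__main__":
    print("removed" if remove_if_stuck() else "left in place")
```

A healthy running container is left alone, which keeps the redeploy idempotent on hosts where the agent already works.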
sam
9fa819a10d Remove AGENT_HOST: bind to 0.0.0.0, not the host IP
Setting AGENT_HOST to the host's real IP (e.g. 10.40.40.3) causes the agent
to try binding to that specific address inside the container, which fails with
'cannot assign requested address' because the container only has a Docker
bridge interface.

Without AGENT_HOST the agent binds to 0.0.0.0:9001 and Docker's port mapping
(-p 9001:9001) forwards traffic correctly. The TLSSkipVerify on the Portainer
registration already handles the bridge-IP cert mismatch.

Fixes: portainer_agent restart loop on snap-based Docker hosts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:26:33 -07:00
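The bind failure is ordinary socket behaviour rather than anything Portainer-specific. A process can bind only to an address assigned to an interface in its own network namespace, or to the wildcard 0.0.0.0. The minimal Python sketch below shows the difference. It uses port 0 (kernel-chosen) so it cannot collide with a real agent on 9001. The 10.40.40.3 lookup only reproduces the error inside a namespace that lacks that address, such as a container on the Docker bridge.

```python
import errno
import socket


def try_bind(host: str, port: int = 0) -> str:
    """Attempt a TCP bind and report the outcome as a short string."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # The errno behind 'cannot assign requested address'.
            return "cannot assign requested address"
        return f"error: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Wildcard bind: works in any network namespace.
    print("0.0.0.0    ->", try_bind("0.0.0.0"))
    # Host IP that exists only outside the container's bridge namespace.
    print("10.40.40.3 ->", try_bind("10.40.40.3"))
```

With `-p 9001:9001`, Docker forwards traffic arriving at the host's port 9001 to the container's bridge address, and the wildcard bind already covers that address.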
sam
30c28fd200 Fix diagnose_agent: avoid Jinja2 template collision in shell arg 2026-03-01 01:24:18 -07:00
sam
5c43952de4 Add agent diagnostic script 2026-03-01 01:23:41 -07:00
sam
dc1efc5ae0 Skip agent deployment on Portainer host (ubuntu-server-01)
ubuntu-server-01 (10.40.40.2) runs Portainer itself and is already
managed via local Docker socket (Portainer endpoint ID=3). Deploying
a Portainer Agent there is redundant and port 9001 binding fails.

Add portainer_skip_agent: true flag to the inventory and check it in
both Play 2 (deploy agent) and Play 3 (register endpoint) to exclude
the host from agent-based enrollment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:16:52 -07:00
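The skip flag amounts to a per-host filter that both Play 2 and Play 3 apply. Below is a minimal Python sketch of that selection logic. It assumes a hypothetical inventory dict shaped like Ansible's `hostvars`, and only ubuntu-server-01 and its address come from the commit. In the playbook this is presumably a `when:` condition that treats a missing flag as false.

```python
from typing import Dict, List


def agent_targets(hostvars: Dict[str, dict]) -> List[str]:
    """Return hosts that should get a Portainer Agent.

    A host is skipped only when portainer_skip_agent is explicitly true;
    hosts without the flag default to receiving an agent.
    """
    return sorted(
        host
        for host, hv in hostvars.items()
        if not hv.get("portainer_skip_agent", False)
    )


# Hypothetical inventory fragment; "docker-host-a" is illustrative only.
HOSTVARS = {
    "ubuntu-server-01": {"ansible_host": "10.40.40.2", "portainer_skip_agent": True},
    "docker-host-a": {"ansible_host": "10.40.40.3"},
}

if __name__ == "__main__":
    print(agent_targets(HOSTVARS))  # ['docker-host-a']
```

Defaulting the flag to false means only the Portainer host needs an explicit entry, and every other inventory host is enrolled unchanged.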
sam
d2cf626bee Fix TLS cert mismatch and constrained-Docker volume failures
- Remove /var/lib/docker/volumes mount (fails on nested Docker hosts)
- Add AGENT_HOST env var so agent cert is valid for host's real IP
- Add TLSSkipVerify/TLSSkipClientVerify to Portainer endpoint registration
  to handle existing agents with bridge-IP certs
- Remove final delegate_to: localhost (wait_for now runs on remote host)
- Add ignore_errors: true to agent deploy and enrollment tasks
- Guard existing_endpoints.json with | default([]) for failed API calls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:12:27 -07:00
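The `| default([])` guard matters because a failed `uri` call can leave no `json` key on the registered result, so a later lookup would error instead of simply finding nothing. The Python sketch below shows the same defensive pattern. It assumes a hypothetical registered-result dict and endpoint objects carrying a `Name` field, as Portainer's endpoint listing appears to return. The helper names are illustrative only.

```python
from typing import Any, Dict, List


def existing_names(registered: Dict[str, Any]) -> List[str]:
    """Names of endpoints already in Portainer, or [] if the call failed.

    Like `existing_endpoints.json | default([])`, a missing `json` key
    yields an empty list; `or []` additionally covers an explicit null.
    """
    endpoints = registered.get("json") or []
    return [ep.get("Name", "") for ep in endpoints]


def needs_registration(host: str, registered: Dict[str, Any]) -> bool:
    """True if no endpoint with this host's name exists yet."""
    return host not in existing_names(registered)
```

For example, `existing_names({"failed": True})` returns `[]` rather than raising, so the registration step proceeds and reports its own error if Portainer is unreachable.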
sam
6db20117fd Eliminate localhost tasks to fix sudo issue on Semaphore runner
- Play 3: Run Portainer API calls from remote hosts directly (no
  delegate_to: localhost). Add validate_certs: false for self-signed cert.
- Play 4: Replace localhost file report with debug output using run_once.
  No filesystem writes = no privilege escalation needed on the runner.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:06:17 -07:00
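If you reproduce the Play 3 API calls outside Ansible, `validate_certs: false` corresponds roughly to an unverified TLS context in Python. This is a minimal sketch. The URL is a placeholder that assumes Portainer's usual HTTPS port 9443, and the `/api/status` path is an assumption to check against your Portainer version.

```python
import json
import ssl
import urllib.request

# Hypothetical Portainer URL; replace with your own instance.
PORTAINER_URL = "https://portainer.example:9443"


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate and hostname checks.

    Similar in spirit to Ansible's `validate_certs: false`: tolerable for a
    self-signed cert on a trusted LAN, unsafe anywhere else.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def portainer_status(base_url: str = PORTAINER_URL) -> dict:
    """GET /api/status (assumed endpoint) and return the decoded JSON."""
    with urllib.request.urlopen(
        f"{base_url}/api/status", context=insecure_context(), timeout=10
    ) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(portainer_status())
```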
sam
b8dde7f2ca Remove ansible_become from inventory to fix delegate_to: localhost inheritance
ansible_become: true in host inventory vars leaks into delegate_to: localhost
tasks in Ansible 2.18, causing those tasks to try sudo on the Semaphore
runner (which has no sudo). Instead, become: true is set at the play level
in the playbook where needed, which does NOT propagate to delegated tasks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:02:27 -07:00
sam
9c56789951 Add diagnostic script to check Semaphore runner ansible config 2026-03-01 00:59:52 -07:00
sam
1029cccc11 Add host_vars/localhost to fix sudo on Semaphore runner
delegate_to: localhost tasks inherit the Semaphore host's system ansible.cfg
which has become=True. An explicit localhost inventory entry with
ansible_become: false overrides this at inventory precedence level.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 00:53:23 -07:00
sam
af320f2341 Fix become passwords, stale children groups, and localhost sudo
- Add ansible_become_pass to all hosts (sudo uses same password as SSH)
- Remove truenas-scale and vyos from children groups (no connection info)
- Add ansible.cfg: host_key_checking=False, become=False as default
- Add become: false to wait_for_connection to avoid sudo during SSH test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 00:47:27 -07:00
sam
00c3288452 Fix become on localhost tasks and update inventory path comment
- Add become: false to Play 4 (report) to prevent sudo on Semaphore host
- Add become: false to all delegate_to: localhost tasks in Plays 2 & 3
- Update usage comment to reflect correct inventory path (inventory/hosts.yml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 00:44:21 -07:00
sam
2a507cec7d Add targeted 6-host inventory for first Docker scan 2026-03-01 00:36:03 -07:00
sam
24f220c6ad Add per-host credential support and refresh inventory
- host_credentials.yml.example: template for per-device SSH creds,
  matched by IP, subnet CIDR, or global default (actual file is gitignored)
- inventory/hosts.yml: refreshed with 162 hosts (31 NetBox + 135 UniFi)
- .gitignore: exclude host_credentials.yml and run reports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 22:34:31 -07:00
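The lookup order described for `host_credentials.yml` is exact IP, then subnet CIDR, then a global default. It can be sketched with Python's standard `ipaddress` module. The data layout below is hypothetical because the example file's actual keys are not shown here. When several subnets match, this sketch picks the most specific (longest prefix), which is an assumption rather than documented behaviour.

```python
import ipaddress
from typing import Dict, Optional


def lookup_credentials(ip: str, creds: Dict[str, dict]) -> Optional[dict]:
    """Resolve SSH credentials for `ip`: exact IP, then CIDR, then default.

    `creds` is a hypothetical structure with optional "hosts" (IP -> creds),
    "subnets" (CIDR -> creds) and "default" (creds) keys.
    """
    addr = ipaddress.ip_address(ip)

    # 1. An exact IP match wins outright.
    if ip in creds.get("hosts", {}):
        return creds["hosts"][ip]

    # 2. Subnet match; prefer the longest prefix if several contain the IP.
    best = None
    for cidr, entry in creds.get("subnets", {}).items():
        net = ipaddress.ip_network(cidr, strict=False)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    if best is not None:
        return best[1]

    # 3. Fall back to the global default, if any.
    return creds.get("default")


# Hypothetical credentials; usernames and subnets are illustrative only.
CREDS = {
    "hosts": {"10.40.40.2": {"user": "admin"}},
    "subnets": {"10.40.40.0/24": {"user": "ops"}, "10.40.0.0/16": {"user": "net"}},
    "default": {"user": "ubuntu"},
}
```

Here 10.40.40.9 falls in both subnets and resolves to the /24 entry, 10.40.20.5 resolves to the /16 entry, and an address outside both, such as 192.168.1.5, gets the default.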
sam
017a3a00ee Initial commit: playbooks and inventory for Semaphore automation
- find_docker_enroll_portainer.yml: discover Docker hosts across all VLANs,
  deploy Portainer Agent, register in Portainer, write discovery report
- inventory/hosts.yml: auto-generated from NetBox (31 hosts) + UniFi clients
  (135 unmanaged hosts not in NetBox) across vlan1/vlan40/vlan20

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 22:27:58 -07:00