Error Handling and Debugging

Real-world automation must handle failures gracefully. Ansible provides several mechanisms for error handling, custom failure conditions, and debugging tools to help you troubleshoot issues.

Default Error Behaviour

By default, when a task fails on a host:

Ansible stops executing further tasks on that host
Other hosts continue (unless any_errors_fatal: true is set)
Handlers that were notified are not run for the failed host

ignore_errors

Continue execution even when a task fails:

tasks:
  - name: Check if legacy app exists
    command: which legacy-app
    register: legacy_check
    ignore_errors: true

  - name: Report legacy app status
    debug:
      msg: "Legacy app {{ 'found' if legacy_check.rc == 0 else 'not found' }}"

Warning: Use ignore_errors sparingly --- it suppresses all errors, which can mask real problems. Prefer failed_when for fine-grained control.

ignore_unreachable

Continue when a host is unreachable (network issues, SSH failures):

tasks:
  - name: Attempt to gather info
    command: hostname
    ignore_unreachable: true

  - name: This still runs even if host was unreachable
    debug:
      msg: "Continuing..."

failed_when

Define custom failure conditions:

tasks:
  - name: Run database migration
    command: /opt/app/migrate.sh
    register: migration_result
    failed_when:
      - migration_result.rc != 0
      - "'already up to date' not in migration_result.stdout"

  - name: Check disk space
    command: df -h /data
    register: disk_check
    failed_when: "'100%' in disk_check.stdout"

This lets you treat a non-zero return code as a success when the output contains an expected message.

changed_when

Control when Ansible reports a task as "changed":

tasks:
  - name: Check application version
    command: /opt/app/version.sh
    register: version_check
    changed_when: false     # This task never changes anything

  - name: Run database migration
    command: /opt/app/migrate.sh
    register: migration
    changed_when: "'migrated' in migration.stdout"

This is important for accurate reporting and for triggering (or not triggering) handlers.

block / rescue / always

Ansible's equivalent of try/catch/finally:

tasks:
  - name: Handle deployment with error recovery
    block:
      - name: Deploy new version
        command: /opt/app/deploy.sh

      - name: Run smoke tests
        command: /opt/app/smoke-test.sh

    rescue:
      - name: Rollback to previous version
        command: /opt/app/rollback.sh

      - name: Send failure notification
        mail:
          to: ops@example.com
          subject: "Deployment failed on {{ inventory_hostname }}"
          body: "Automatic rollback was executed."

    always:
      - name: Clean up temporary files
        file:
          path: /tmp/deploy-artifacts
          state: absent

      - name: Log deployment attempt
        lineinfile:
          path: /var/log/deployments.log
          line: "{{ ansible_date_time.iso8601 }} - Deployment attempted on {{ inventory_hostname }}"
          create: true

Section	When It Runs
block	Always runs first (the main tasks)
rescue	Only runs if a task in `block` fails
always	Always runs, regardless of success or failure

any_errors_fatal

Stop the entire play immediately when any host fails:

---
- name: Critical deployment
  hosts: webservers
  any_errors_fatal: true

Error Handling and Debugging

Error Handling and Debugging

Default Error Behaviour

ignore_errors

ignore_unreachable

failed_when

changed_when

block / rescue / always

any_errors_fatal

More in DevOps