Part 6: Preventing Future Occurrences

The "ASM Health Checker found 1 new failures" message is often a symptom of deeper operational drift. The causes below are the ones most worth guarding against:

2. Permission or Ownership Mismatch

ASM disks must be owned by grid:asmadmin (or oracle:asmadmin in older setups) with 660 permissions. If a change to the udev rules or the ASMLib configuration alters that ownership or mode, the checker flags a failure.
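
A minimal sketch of how the ownership and mode rule above could be verified from a script. The function name check_asm_device and its defaults are illustrative assumptions, not part of any Oracle tool; pass the owner, group, and device paths that apply to your installation.

```python
import grp
import os
import pwd
import stat


def owner_name(uid):
    """Resolve a numeric uid to a user name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def group_name(gid):
    """Resolve a numeric gid to a group name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def check_asm_device(path, owner="grid", group="asmadmin", mode=0o660):
    """Return a list of problems (empty when the device matches the rule)."""
    st = os.stat(path)
    problems = []
    actual_owner = owner_name(st.st_uid)
    actual_group = group_name(st.st_gid)
    actual_mode = stat.S_IMODE(st.st_mode)  # permission bits only
    if actual_owner != owner:
        problems.append(f"owner is {actual_owner}, expected {owner}")
    if actual_group != group:
        problems.append(f"group is {actual_group}, expected {group}")
    if actual_mode != mode:
        problems.append(f"mode is {actual_mode:o}, expected {mode:o}")
    return problems
```

Running a check like this from cron or a configuration-management tool after every udev or ASMLib change is one way to notice drift before the health checker does.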

4. Mismatched Disk Group Compatibility

If compatible.asm, compatible.rdbms, or compatible.advm values are set incorrectly relative to the GI version, the health checker will report advisories as failures.
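
The relationship between these values can be sketched as a simple version comparison. This assumes the commonly documented ordering (compatible.rdbms and compatible.advm no higher than compatible.asm, and compatible.asm no higher than the GI version); confirm the rule against the documentation for your release. The function names are illustrative.

```python
def parse_version(text, width=5):
    """Turn '19.0.0.0' into (19, 0, 0, 0, 0) so versions compare numerically."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def compatibility_problems(gi_version, asm, rdbms, advm=None):
    """Return a list of ordering violations among the compatibility settings."""
    problems = []
    gi_v = parse_version(gi_version)
    asm_v = parse_version(asm)
    if asm_v > gi_v:
        problems.append("compatible.asm is higher than the GI version")
    if parse_version(rdbms) > asm_v:
        problems.append("compatible.rdbms is higher than compatible.asm")
    if advm is not None and parse_version(advm) > asm_v:
        problems.append("compatible.advm is higher than compatible.asm")
    return problems
```

Padding the parsed versions to a fixed width keeps "12.1" and "12.1.0.0" equal, which a plain string or unpadded tuple comparison would not.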

Short checklist to include in runbooks

Check name & ID

Exact error message and timestamp

Service/process status

Dependency connectivity

Resource usage snapshot

Recent deploys/config changes

Reproduce and re-run check

Log excerpts attached to ticket

Mitigation steps and verification


[ASM Health Checker] 🚨 1 new failure detected • Failure: TLS certificate expiry < 7 days • Component: asm-gateway • First seen: 2026-04-12 10:15 UTC • Severity: HIGH • Previous status: pass → current: fail

Suggested action: Run asm health fix --check tls_expiry
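
The "pass → fail" transition in the alert above could be detected by comparing two snapshots of check results. This is a hedged sketch, not the checker's actual logic; the dictionary layout and the function name new_failures are assumptions made for illustration.

```python
def new_failures(previous, current):
    """Return the checks that fail now but were not failing before."""
    return sorted(
        name
        for name, status in current.items()
        if status == "fail" and previous.get(name) != "fail"
    )
```

A check that was already failing in the previous snapshot is deliberately not counted, which is what makes the reported failure "new".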


ASM Health Checker Found 1 New Failure: What It Means and How to Resolve It

The Automatic Storage Management (ASM) health checker is a crucial tool in Oracle databases that monitors the health and integrity of the storage infrastructure. When the ASM health checker reports a new failure, it's essential to understand the implications and take corrective actions to prevent data loss or system downtime. In this blog post, we'll discuss what an ASM health checker failure means, how to investigate the issue, and steps to resolve it.

What does an ASM health checker failure mean?

When the ASM health checker detects a problem, it logs an error message indicating that a failure has been detected. The message may look like this:

"ASM health checker found 1 new failure"

This message indicates that the ASM health checker has detected a single failure in the storage system. The failure could be related to various issues, such as:

Disk errors or corruption
Connectivity problems between the database server and storage
Insufficient disk space or quota issues
ASM configuration errors

Investigating the ASM health checker failure

To investigate the failure, follow these steps:

Check the ASM alert log: The ASM alert log provides detailed information about the failure, including the error message, timestamp, and affected disk group. You can find the alert log in the $ORACLE_BASE/diag/asm/+asm/<instance_name>/trace directory.
List the disk groups with asmcmd: The asmcmd command-line tool provides a view of the ASM configuration and status. Run its lsdg command to list the disk groups and their state: asmcmd lsdg
Check the affected disk group: Pass the disk group name to lsdg to limit the output to that group: asmcmd lsdg <disk_group_name>
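
The first step above can be partly automated. The sketch below scans alert log lines for the health checker message and collects the ORA- codes logged shortly before it. The 20-line lookback and the function name are assumptions; the exact layout of a real alert log varies by version, so treat this as a starting point.

```python
import re

ORA_CODE = re.compile(r"ORA-\d{5}")
MARKER = "asm health checker found"


def errors_before_marker(lines, lookback=20):
    """Return (line_number, [ORA codes]) for each health checker message."""
    results = []
    for index, line in enumerate(lines):
        if MARKER in line.lower():
            window = lines[max(0, index - lookback):index]
            codes = []
            for earlier in window:
                for code in ORA_CODE.findall(earlier):
                    if code not in codes:  # keep first-seen order, no repeats
                        codes.append(code)
            results.append((index + 1, codes))
    return results
```

To use it on a real file, read the log into a list first, for example with open(path).read().splitlines().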

Resolving the ASM health checker failure

Once you've identified the root cause of the failure, take corrective actions to resolve the issue:

Replace a failed disk: If the failure is due to a disk error, replace the disk and re-add it to the ASM disk group.
Check and correct connectivity: Verify that the storage connections are stable and functioning correctly.
Free up disk space: If the failure is due to insufficient disk space, free up space by deleting unnecessary files or expanding the disk group.
Reconfigure ASM: If the failure is due to an ASM configuration error, reconfigure ASM with the correct settings.

Best practices to prevent ASM health checker failures

To minimize the likelihood of ASM health checker failures:

Monitor ASM alerts: Check the ASM alert log on a regular schedule and respond promptly to any errors or warnings.
Perform routine maintenance: Check disk space and replace failed disks as part of scheduled maintenance.
Test and validate ASM configurations: Test and validate ASM configurations to ensure they are correct and optimal.

By understanding the causes of ASM health checker failures and taking proactive steps to prevent them, you can ensure the reliability and performance of your Oracle database storage infrastructure.

The message "ASM Health Checker found 1 new failures" typically appears in environments using Oracle Automatic Storage Management (ASM) when an automated health check tool (like Oracle ORAchk or Oracle EXAchk) identifies a configuration issue or a hardware fault that doesn't match the established "best practices" or previous healthy state.

What This Usually Means

When this alert is triggered, it indicates that a recent scan has detected a deviation in your ASM environment. Common causes for a single new failure include:

Disk Path Issues: A single disk path has become unavailable, even if the disk is still accessible via a redundant path.
Disk Group Redundancy: One of the disks in a "Normal Redundancy" disk group has failed, putting the group in a "degraded" state.
Parameter Mismatches: An ASM instance parameter (like ASM_POWER_LIMIT) has been changed and no longer aligns with recommended settings.
Offline Disks: A disk has been taken offline due to I/O errors but has not yet been dropped from the disk group.

Recommended Steps to Investigate

Check the Health Check Report: The tool that generated this message (likely ORAchk or EXAchk) will have created an HTML report. Locate this report to see the specific failed check and its description.
Verify ASM Disk Status: Use the asmcmd utility to check the status of your disks and disk groups:

asmcmd lsdsk -t
asmcmd lsdg

Look for disks whose status indicates a problem.
Inspect the ASM Alert Log: Review the ASM alert log file (usually found in the ADR home) for specific ORA- errors or messages about disk evictions.
Validate Path Visibility: Ensure the OS can still see all physical devices associated with the ASM disks.

For more detailed troubleshooting, you can refer to the Oracle Automatic Storage Management documentation or check for tool-specific errors on the Oracle Support portal.

Subject: [ALERT] ASM Health Checker Detected 1 New Failure - Immediate Investigation Required

Next Steps

Please acknowledge this alert in the monitoring dashboard. If the issue is resolved, update the ticket with the root cause analysis (RCA).

Note: If "ASM" in your context refers to Oracle Automatic Storage Management, the focus of this write-up should shift immediately to Disk Group redundancy, ASM instance connectivity, and I/O latency checks.

Troubleshooting Guide: ASM Health Checker Found 1 New Failure

If you are managing an Oracle database environment and receive the alert "ASM Health Checker found 1 new failure," it’s time to pay attention. While Oracle Automatic Storage Management (ASM) is robust, this specific notification indicates that the internal diagnostic framework has detected an issue that could potentially impact disk group availability or performance.

Here is a comprehensive breakdown of what this error means, how to diagnose it, and the steps to resolve it.

1. Understanding the ASM Health Checker (CHMA)

The ASM Health Checker is part of the Oracle Check Framework. It runs periodic checks on the ASM instance, disk groups, and metadata to ensure everything is operating within healthy parameters.

When it reports a "new failure," it means a specific "check" (such as disk connectivity, metadata consistency, or space usage) has moved from a PASS to a FAIL state.

2. Immediate Step: Identify the Failure

The alert itself is generic. To find out what actually failed, you need to query the ASM instance. Run this SQL command in your ASM instance:

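SELECT check_name, failure_pri, status, repair_script FROM v$asm_healthcheck_status WHERE status = 'FAILED';

This query assumes your release exposes a view with that name and those columns. If it does not, the Health Monitor views V$HM_RUN and V$HM_FINDING, or the ADRCI command show hm_run, hold the recorded findings.

Common culprits include: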

Disk Offline: One or more disks in a disk group are no longer accessible.

Metadata Corruption: Inconsistencies in the ASM metadata (e.g., File Directory or Disk Directory).

Space Issues: A disk group is nearing 100% capacity, risking an instance crash.

Stale Quorum: Issues with voting files in a CRS/Grid Infrastructure environment.

3. Deep Dive into the Logs

To get the granular details, look at the ASM Alert Log. You can usually find this in your Oracle Base directory: $ORACLE_BASE/diag/asm/+asm/+asm1/trace/alert_+asm1.log

Search for the timestamp of the alert. You will often see a corresponding ORA- error code (like ORA-15078 or ORA-15032) that provides the exact technical reason for the health check failure.

4. How to Resolve the Failure

Scenario A: Disk Connectivity Issues

If the health checker found a disk failure, check the OS-level connectivity. Command: lsdsk (within ASMCMD) or fdisk -l (Linux).

Fix: If a disk is "OFFLINE," try to bring it online using: ALTER DISKGROUP <diskgroup_name> ONLINE DISK <disk_name>;

Scenario B: Metadata Inconsistency

If the health check indicates metadata issues, you may need to run a manual check on the disk group.

Action: Execute the CHECK command: ALTER DISKGROUP <diskgroup_name> CHECK ALL;
Note: This checks for consistency but does not fix errors. If errors are found, you may need to involve Oracle Support.

Scenario C: Space Pressure

If the failure is related to "Insufficient Space," rebalance the disk group or add new disks immediately.

Action: Check free space: SELECT name, free_mb, total_mb, usable_file_mb FROM v$asm_diskgroup;

5. Clearing the Alert

Once you have fixed the underlying physical or logical issue, the Health Checker should automatically update during its next run. However, if the status remains "Failed" in the views, you can manually trigger a re-run of the health check or use ADRCI to purge the alert.

Summary Checklist

Query v$asm_healthcheck_status to identify the specific check.

Review the ASM Alert Log for specific ORA- error codes.

Verify Physical Disks at the OS level to ensure no hardware failure.

Check Disk Group Capacity to ensure you haven't hit a "disk full" state.
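
The capacity item in the checklist can be sketched as a small filter over the rows returned by the v$asm_diskgroup query shown earlier. The 90% threshold, the row layout (one dictionary per disk group), and the function name are assumptions chosen for illustration; a negative usable_file_mb is flagged because it means the group could not restore redundancy after a disk failure.

```python
def capacity_warnings(rows, max_used_pct=90.0):
    """Return warning strings for disk groups that are short on space."""
    warnings = []
    for row in rows:
        total = row["total_mb"]
        if total <= 0:
            continue  # skip rows that carry no size information
        used_pct = 100.0 * (total - row["free_mb"]) / total
        if used_pct >= max_used_pct:
            warnings.append(f"{row['name']}: {used_pct:.1f}% used")
        if row["usable_file_mb"] < 0:
            warnings.append(f"{row['name']}: negative usable_file_mb")
    return warnings
```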

By catching these "1 new failures" early, you prevent minor disk hiccups from turning into major database outages.

The alert "ASM Health Checker found 1 new failures" is a critical notification typically found in Oracle Automatic Storage Management (ASM) alert logs. It indicates that the GMON (Group Monitor) process has detected an issue, often a disk failure or a forced dismount, that requires immediate attention.

What This Alert Means

This message usually appears alongside other ORA- errors and signals that ASM has identified a problem with the storage layer. Common triggers include:

Disk Failures: A physical disk or a storage path (LUN) has become inaccessible.
Forced Dismounts: The diskgroup has been forced offline because it can no longer maintain its required redundancy (e.g., a disk failure in an EXTERNAL REDUNDANCY diskgroup).
Metadata Corruption: Corruption in the ASM metadata blocks, which can happen during intensive operations like rebalancing.
Configuration Issues: Problems during the addition of new disks or voting file refreshes.

Immediate Troubleshooting Steps

Check the ASM Alert Log: Locate the alert log for your ASM instance (often in /u01/app/oracle/diag/asm/.../trace/alert_+ASM.log). Look for the ORA- errors immediately preceding the "1 new failures" message to identify the specific disk or group affected.
Verify Disk Status: Run the following query in your ASM instance to check for offline or missing disks:

SELECT name, group_number, path, state, header_status FROM v$asm_disk;

Investigate the Incident: Oracle's Fault Diagnosability Infrastructure often generates an incident report when this occurs. Use the ADRCI tool to view the incident details with show incident and show tracefile (for the specific process, like +ASM_rbal_xxxx.trc).
Monitor Rebalance/Repair: If a disk is just offline and you have redundancy, check the REPAIR_TIMER column in v$asm_disk (governed by the disk group's DISK_REPAIR_TIME attribute) to see how long you have to fix the issue before ASM automatically drops the disk.

When to Take Urgent Action

External Redundancy: If your diskgroup uses external redundancy and a disk fails, the group will likely dismount immediately, potentially crashing your database.
Intermediate States: If your Clusterware (Grid Infrastructure) resources show an INTERMEDIATE state after this alert, the diskgroup may be partially available but failing to fully mount.

When the ASM Health Checker reports "found 1 new failures," it usually indicates a critical disruption to the storage layer, often leading to a forced dismount of a disk group to prevent data corruption. This message is a summary alert that appears in the ASM Alert Log after a specific storage-related error has already occurred.

Common Causes

Missing or Inaccessible Disks: The most frequent cause is that one or more disks in a group are no longer reachable due to hardware failure, storage connectivity issues, or OS-level changes.

Metadata Corruption: If ASM detects invalid block headers or internal inconsistencies in the metadata, it may trigger a failure and dismount the group.

Insufficient Quorum: In diskgroups with redundancy (Normal or High), if too many disks or a required "voting" disk (PST) become unavailable, the group cannot maintain a read quorum and will fail.

I/O Errors: Significant write failures or heartbeat timeouts to the PST (Partnership and Status Table) will prompt the health checker to record a new failure.

Immediate Troubleshooting Steps

ASM Health Checker Found 1 New Failure: What It Means and How to Resolve It

If you're a database administrator or a system administrator working with Oracle databases, you're likely familiar with the Automatic Storage Management (ASM) system. ASM is a storage management system that provides a simple and efficient way to manage storage for Oracle databases. One of the tools used to monitor and maintain ASM is the ASM Health Checker, which periodically checks the health of the ASM infrastructure and reports any issues or failures.

Recently, you may have encountered an alert or message indicating that the "ASM health checker found 1 new failure." This message can be concerning, especially if you're not familiar with what it means or how to resolve it. In this article, we'll explore what this message means, the possible causes, and step-by-step instructions on how to resolve the issue.

What Does the ASM Health Checker Do?

The ASM Health Checker is a background process that periodically checks the health of the ASM infrastructure. It monitors various aspects of ASM, including:

Disk availability and performance
Disk group configuration and status
ASM instance status and performance
I/O operations and errors

The ASM Health Checker runs automatically and reports any issues or failures it detects.

What Does "ASM Health Checker Found 1 New Failure" Mean?

When the ASM Health Checker detects a new failure, it reports the issue and provides information about the failure. The message "ASM health checker found 1 new failure" indicates that the checker has detected a problem with the ASM infrastructure that requires attention.

The failure can be related to various aspects of ASM, such as:

A disk failure or error
A disk group configuration issue
An ASM instance failure or performance issue
An I/O error or performance problem

Possible Causes of the Failure

There are several possible causes for the ASM Health Checker to report a new failure. Some common causes include:

Disk failure or error: A disk failure or error can occur due to hardware issues, such as a disk crash or a cable problem.
Disk group configuration issue: A disk group configuration issue can occur if there are problems with the disk group configuration, such as a missing disk or an incorrect disk group name.
ASM instance failure or performance issue: An ASM instance failure or performance issue can occur due to problems with the ASM instance, such as a lack of resources or a configuration issue.
I/O error or performance problem: An I/O error or performance problem can occur due to issues with the storage subsystem, such as a slow disk or a network problem.

How to Resolve the Issue

To resolve the issue, follow these step-by-step instructions:

Check the ASM alert log: The ASM alert log provides detailed information about the failure, including the error message and the time it occurred. You can find the ASM alert log in the $ORACLE_BASE/diag/asm/+asm/<instance_name>/trace directory.
Investigate the failure: Use the information from the ASM alert log to investigate the failure. Check the ASM disk groups, disks, and instances to identify any issues.
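Review the Health Checker run: The findings behind the message are recorded by Health Monitor. Assuming the run was written to the ASM instance's ADR home, you can list the recorded runs and print the report for one of them with ADRCI (substitute the instance name and run name from your system):

adrci> set homepath diag/asm/+asm/<instance_name>
adrci> show hm_run
adrci> show report hm_run <run_name>

The report provides more detailed information about the failure.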

Check the disk groups and disks: Check the disk groups and disks to ensure they are configured correctly and are online.

SELECT * FROM V$ASM_DISKGROUP;
SELECT * FROM V$ASM_DISK;

Check the ASM instance: Check the ASM instance to ensure it is running and configured correctly.

SELECT instance_name, status FROM V$INSTANCE;

Perform corrective actions: Based on the investigation, perform corrective actions to resolve the issue. This may include:
- Replacing a failed disk
- Reconfiguring a disk group
- Restarting the ASM instance
- Correcting an I/O error or performance problem
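
The disk check in the steps above can be narrowed to the rows that need attention. The sketch below assumes each row from V$ASM_DISK has been fetched into a dictionary keyed by lower-case column name; the function name and the choice of conditions (OFFLINE mode, MISSING mount status, or a grouped disk whose header is not MEMBER) are illustrative, not an official rule set.

```python
def suspicious_disks(rows):
    """Return (disk identifier, [reasons]) for disks that look unhealthy."""
    flagged = []
    for row in rows:
        reasons = []
        if row.get("mode_status") == "OFFLINE":
            reasons.append("offline")
        if row.get("mount_status") == "MISSING":
            reasons.append("missing")
        in_group = row.get("group_number", 0) != 0
        if in_group and row.get("header_status") != "MEMBER":
            reasons.append(f"header_status={row.get('header_status')}")
        if reasons:
            # A missing disk often has an empty path, so fall back to its name.
            flagged.append((row.get("path") or row.get("name"), reasons))
    return flagged
```

Disks with group_number 0 are not part of any disk group, so a header status other than MEMBER is expected for them and is not flagged.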

Best Practices to Avoid Future Failures

To avoid future failures and ensure the health of your ASM infrastructure, follow these best practices:

Regularly monitor the ASM alert log: Regularly monitoring the ASM alert log can help you detect issues before they become major problems.
Run the ASM Health Checker regularly: Run the ASM Health Checker regularly to identify potential issues before they occur.
Configure disk groups and disks correctly: Ensure disk groups and disks are configured correctly and are online.
Monitor ASM instance performance: Monitor ASM instance performance to ensure it is running optimally.

By following these best practices and resolving the issue reported by the ASM Health Checker, you can ensure the health and performance of your ASM infrastructure and prevent future failures.

Troubleshooting Oracle ASM Health Checker Failures

The message "ASM Health Checker found 1 new failures" is a critical alert in Oracle Automatic Storage Management (ASM). It typically appears in the ASM alert log when the background health monitoring process detects a problem that could threaten disk group availability.

Immediate Impact

When this error is triggered, it often coincides with other critical events:

Disk Group Dismounting: ASM may force a dismount of a disk group (e.g., ORA-15130) to prevent data corruption.
Instance Reconfiguration: A "Dirty detach reconfiguration" may start as the cluster tries to handle the failure.
Database Downtime: If the affected disk group contains critical files like the OCR, Voting files, or database data files, the associated Oracle instance or Clusterware may crash.

Common Root Causes

Lost Storage Connectivity: One or more LUNs/disks became inaccessible due to hardware, cable, or storage controller issues.
Write I/O Errors: ASM takes disks offline if it cannot complete a write operation, which can lead to a disk group failure if redundancy is lost.
Insufficient Redundancy: In "External Redundancy" disk groups, the failure of even a single disk causes the entire group to fail.
Disk Header Corruption: Physical corruption of the disk header can prevent ASM from identifying the disk as a "MEMBER" of a group.

Investigative Steps

To identify and resolve the specific failure, follow these steps:

The error message "ASM Health Checker found 1 new failures" typically appears in the Oracle ASM alert logs when the system detects an issue with a disk or disk group. This message indicates that a failure has been logged in the Automatic Storage Management (ASM) health check framework, often related to disk group dismounts, header corruption, or voting file issues.

Oracle ASM Health Check Failure Report

Alert Message: ASM Health Checker found 1 new failures
System Component: Oracle Automatic Storage Management (ASM)
Detection Source: ASM Alert Log (typically located at diag/asm/+asm//trace/alert_+asm.log)
Incident Status: (Requires immediate investigation to prevent data loss or service disruption)

Potential Causes & Findings

Disk Group Dismount: A disk group may have been forced to dismount due to lost connectivity or multiple disk failures in a failure group.
Disk Header Corruption: The metadata (headers) on one or more ASM disks may be corrupted or in a "FORMER" or "PROVISIONED" status instead of "MEMBER".
Voting File Issues: If the ASM disk group hosts the Oracle Cluster Registry (OCR) or Voting Disks, a failure can cause node evictions or cluster instability.
Storage Latency/I/O Timeouts: The health checker may trigger a failure if it waits too long (e.g., >15 seconds) for I/O operations to complete on a specific disk.

Recommended Troubleshooting Steps

Subject: ASM Health Check Report – New Failures Detected

To: Database Administration Team / System Health Monitoring Group
Date: [Insert Date]
Priority: Medium

Immediate actions (first 10 minutes)

Don't panic; gather context first.
Check the Health Checker details/console immediately for:
- The specific check name and ID.
- Timestamp of the failure.
- Failure severity (critical/warning/info).
- Any short description or error code provided.
Look at recent system events (last 30–60 minutes):
- Service restarts
- Deployments/patches
- Scheduled jobs or cron tasks
- OS reboots or kernel messages
Check logs related to the failed check:
- Health checker logs
- Application server logs
- System logs (syslog/journalctl)
- Network or load balancer logs if relevant
Confirm whether the failure is still present by re-running the single failed check (if your health checker supports manual re-run).
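
The "recent system events" step above amounts to filtering a timeline by a window that ends at the failure timestamp. A minimal sketch follows, assuming events have already been collected as (timestamp, description) pairs; the function name and the 60-minute default are illustrative.

```python
from datetime import datetime, timedelta


def events_before(events, failure_time, minutes=60):
    """Return events that occurred within `minutes` before the failure."""
    start = failure_time - timedelta(minutes=minutes)
    return [event for event in events if start <= event[0] <= failure_time]
```

Keeping the window short at first and widening it only if nothing turns up helps avoid chasing unrelated changes.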