Fault Diagnosability Infrastructure Overview
The fault diagnosability infrastructure aids in preventing, detecting, diagnosing, and resolving problems. The problems that are targeted in particular are critical errors such as those caused by code bugs, metadata corruption, and customer data corruption. When a critical error occurs, it is assigned an incident number, and diagnostic data for the error (such as trace files) are immediately captured and tagged with this number. The data is then stored in the Automatic Diagnostic Repository (ADR)--a file-based repository outside the database--where it can later be retrieved by incident number and analyzed.
About Incidents and Problems
A problem is a critical error in a database instance, Oracle Automatic Storage Management (Oracle ASM) instance, or other Oracle product or component. Critical errors manifest as internal errors, such as ORA- 00600, or other severe errors, such as ORA-07445 (operating system exception) or ORA-04031 (out of memory in the shared pool). Problems are tracked in the ADR. Each problem has a problem key, which is a text string that describes the problem. It includes an error code (such as ORA 600) and in some cases, one or more error parameters. An incident is a single occurrence of a problem. When a problem (critical error) occurs multiple times, an incident is created for each occurrence. Incidents are timestamped and tracked in the Automatic Diagnostic Repository (ADR). Each incident is identified by a numeric incident ID, which is unique within the ADR.
When an incident occurs, the database Makes an entry in the alert log.
Sends an incident alert to Oracle Enterprise Manager (Enterprise Manager).
Gathers first-failure diagnostic data about the incident in the form of dump files (incident dumps).
Tags the incident dumps with the incident ID.
Stores the incident dumps in an ADR subdirectory created for that incident.