Troubleshoot API request issues

API metrics provide insight into API consumption, so you can monitor, manage, and optimize API performance and usage. They also help you diagnose errors when you troubleshoot your APIs.

Use API metrics to troubleshoot your APIs. Start by narrowing down the point where the error occurs, and determine whether usage or execution abnormalities cause the issue. Then, correlate API metrics with API logs. To pinpoint the exact error, determine whether it's related to a pipeline error, a policy error, or a Snaplex issue. For deeper analysis, investigate pipeline runtime issues, Snap configuration issues, and Snaplex metrics.

Troubleshooting flow

API metrics

The API metrics dashboard displays charts that help you visualize various metrics for the APIs consumed. As an API developer, you can track and troubleshoot errors using these metrics. Correlate the metrics listed below to identify the exact error location and troubleshoot effectively.

Correlate the Request, Queue Size, and Error Percentage charts: To identify the error, start by narrowing down the timeframe using the Request chart. Then, use the Error Percentage chart to understand the issue, which could be due to ineffective API design, poor documentation, or malicious actors. If Ultra Tasks are involved and there's a spike in request wait time, analyze the Queue Size chart. Correlating these metrics provides a high-level view of the error flow, which you can refine with other metrics.
Correlate request errors, target errors, and latency spikes: Request errors in the 4xx to 5xx range typically result from API policy violations or authentication failures, causing requests to be canceled. Target errors in the 4xx to 5xx range are often related to pipeline state or runtime conditions. By correlating latency spikes with target errors, you can determine whether the pipeline error occurs before or after execution. Similarly, correlating latency spikes with request errors helps clarify whether the issue is related to policy configuration, post-policy execution, or authentication.
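
The distinction between request errors and target errors can be sketched in code. The following is a minimal Python illustration, assuming each log entry is available as a dictionary that carries the request_status_code and target_status_code attributes. The function names and sample values are hypothetical, and the ordering of the checks is a heuristic, not a documented product rule.

```python
def is_error(code):
    """Return True for HTTP status codes in the 4xx to 5xx range."""
    return code is not None and 400 <= int(code) <= 599


def classify_entry(entry):
    """Roughly attribute a failed request to the pipeline or to policy/authentication.

    Heuristic (an assumption for illustration): a failing target status points
    at the pipeline; a failing request status without a failing target status
    points at policy or authentication.
    """
    if is_error(entry.get("target_status_code")):
        return "target_error"   # pipeline state or runtime condition
    if is_error(entry.get("request_status_code")):
        return "request_error"  # policy violation or authentication failure
    return "ok"


# Hypothetical sample entries; real log entries carry many more attributes.
sample_entries = [
    {"request_status_code": 200, "target_status_code": 200},
    {"request_status_code": 401, "target_status_code": None},
    {"request_status_code": 500, "target_status_code": 500},
]

for entry in sample_entries:
    print(entry, "->", classify_entry(entry))
```

Checking the target status first is a design choice: a pipeline failure usually surfaces in the request status as well, so testing the request status first would hide it.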

Note: A one-hour time range is recommended for API metric analysis to help you quickly locate and troubleshoot errors. This interval allows you to analyze the aggregated number of requests processed per minute, providing a clearer view of spikes in API consumption.
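
To illustrate what per-minute aggregation over a one-hour range looks like, the following minimal Python sketch counts requests per minute and flags unusually busy minutes. It assumes you already have request timestamps as datetime objects; the function names and the spike threshold (a multiple of the mean) are hypothetical choices, not how the dashboard computes its charts.

```python
from collections import Counter
from datetime import datetime, timedelta


def requests_per_minute(timestamps, window_end, window=timedelta(hours=1)):
    """Count requests per minute inside the window that ends at window_end."""
    window_start = window_end - window
    counts = Counter()
    for ts in timestamps:
        if window_start <= ts < window_end:
            # Truncate to the start of the minute to form the bucket key.
            counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


def spikes(counts, factor=2.0):
    """Return the minutes whose count exceeds factor times the mean count.

    The mean is taken over minutes that received at least one request.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(minute for minute, n in counts.items() if n > factor * mean)


# Hypothetical data: light traffic with a burst at 12:40.
end = datetime(2024, 1, 1, 13, 0)
stamps = [datetime(2024, 1, 1, 12, m, 5) for m in (10, 20, 30)]
stamps += [datetime(2024, 1, 1, 12, 40, s) for s in range(10)]
stamps.append(datetime(2024, 1, 1, 11, 30))  # outside the one-hour window

per_minute = requests_per_minute(stamps, end)
print(spikes(per_minute))
```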

API logs

With Retrieve API Management Logs, you can further narrow down errors related to the pipeline or policy by examining response body attributes such as request_status_code, target_status_code, and auth_type, which help identify error types or authentication issues. The cc_id and invoker_snode attributes provide information about the Snaplex node, allowing you to identify the Triggered Task or Ultra Task that generated the log entry.
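
As a rough sketch of how these attributes can be used once the log entries are retrieved, the following Python example groups failing entries by invoker_snode. It assumes the entries have already been parsed into dictionaries; the function name and all sample values are hypothetical, and fetching the logs is out of scope here.

```python
from collections import defaultdict


def errors_by_node(entries):
    """Group failing log entries by the Snaplex node recorded in invoker_snode."""
    grouped = defaultdict(list)
    for entry in entries:
        codes = (entry.get("request_status_code"), entry.get("target_status_code"))
        # Treat any status code of 400 or above as a failure.
        if any(code is not None and int(code) >= 400 for code in codes):
            grouped[entry.get("invoker_snode", "unknown")].append(entry)
    return dict(grouped)


# Hypothetical entries; node names and auth_type values are placeholders.
log_entries = [
    {"request_status_code": 200, "target_status_code": 200,
     "auth_type": "example-auth", "invoker_snode": "node-a"},
    {"request_status_code": 401, "target_status_code": None,
     "auth_type": "example-auth", "invoker_snode": "node-a"},
    {"request_status_code": 500, "target_status_code": 500,
     "auth_type": "example-auth", "invoker_snode": "node-b"},
]

for node, failed in errors_by_node(log_entries).items():
    print(node, len(failed))
```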

To locate the time of the error, check the processing_time, request_time, and request_processing_end_time attributes. For Ultra Tasks, you can gather additional time-frame details using the recv_time, start_time, and reply_time attributes from the response body.
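
To narrow down when errors occurred, you can compute the span between the first and last failing request. The following minimal Python sketch assumes request_time is an ISO 8601 string, which is an assumption made for illustration; adapt the parsing to the format your log entries actually use. The function name and sample values are hypothetical.

```python
from datetime import datetime


def error_window(entries):
    """Return (first, last) request_time among failing entries, or None."""
    times = [
        # Assumption: request_time is an ISO 8601 string.
        datetime.fromisoformat(entry["request_time"])
        for entry in entries
        if int(entry.get("request_status_code") or 0) >= 400
    ]
    if not times:
        return None
    return min(times), max(times)


# Hypothetical entries for illustration only.
timed_entries = [
    {"request_status_code": 200, "request_time": "2024-01-01T12:39:00"},
    {"request_status_code": 500, "request_time": "2024-01-01T12:41:30"},
    {"request_status_code": 401, "request_time": "2024-01-01T12:40:05"},
]

print(error_window(timed_entries))
```

The resulting window can then be used to set the time range on the metrics charts.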

Pipeline issues

Pipeline errors can occur for various reasons and at different stages of pipeline execution. Pipeline issues commonly fall into two categories: runtime errors and Snap configuration errors.

Runtime error: Runtime errors occur during pipeline execution. These errors may result from incorrect pipeline configuration, invalid input data, resource or permission restrictions, timeouts, memory limits, or version conflicts. You can gain more insights into pipeline execution by referring to Retrieve info about a task.
Snap configuration error: An invalid Snap configuration results in errors. Snap configuration issues may arise due to invalid input data, permission restrictions, missing required fields, or inconsistent data schemas. External JAR files or library version mismatches can also cause execution failures. To troubleshoot these errors, refer to Pipeline issues on the Troubleshooting page.

Policy issues

Policy errors can occur due to incorrect policy configuration, authentication failures, request transformation issues, or request validation failures. These errors may arise at the API, API version, Project, or Org level. To troubleshoot policy issues effectively, consider the following categories:

Note: To avoid authentication errors with OAuth 2.0 policies, follow the best practices for implementing OpenID Connect (OIDC).

Policy configuration error: These errors occur when policy settings are incorrect or improperly defined. To troubleshoot these errors, follow these steps:
- Check Permissions or Access Control:
  - Ensure that the user or application has been granted the necessary roles or permissions.
  - Review the policy settings to verify that they allow the required actions.
  - Avoid overly restrictive access that may block necessary operations.
- Review policy changes:
  - Ensure that any recent updates or changes to policies are correctly propagated across your system.
  - Ensure policy hierarchy or priority is enforced correctly. One policy may allow an action while another denies it, so confirm that policies are aligned and do not conflict.
Policy authentication error: These errors occur when there’s an issue with verifying the identity of a user or service, often related to authentication mechanisms.
- Validate API Keys, Tokens, or Credentials:
  - If using OAuth, check whether the token is expired or invalid and refresh the token if necessary.
  - Ensure the correct authentication method is used for the specific API or API ecosystem.
- Debug the Authentication flow:
  - Verify whether the user or application has the appropriate roles and permissions assigned for the operation.
  - Review any status code in the response body. For example, a 401 Unauthorized status indicates that authentication failed or credentials are missing, and a 403 Forbidden status indicates that the caller is authenticated but not permitted to perform the operation.
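
The authentication checks above can be sketched as follows. This minimal Python example maps the two status codes to a troubleshooting hint and checks whether a token has expired. It assumes the OAuth access token is a JWT with an exp claim; opaque tokens cannot be inspected this way. The function names are hypothetical, and the signature is deliberately not verified, so use this for debugging only, never for access decisions.

```python
import base64
import json
import time


def auth_hint(status_code):
    """Map an HTTP status code to a first troubleshooting step."""
    if status_code == 401:
        return "authentication failed: check that the token or credentials are present, valid, and not expired"
    if status_code == 403:
        return "authenticated but not permitted: check roles, permissions, and policy rules"
    return "not an authentication or authorization status"


def jwt_is_expired(token, now=None):
    """Check the exp claim of a JWT without verifying its signature.

    Returns False when the token carries no exp claim.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")  # seconds since the Unix epoch
    if exp is None:
        return False
    now = time.time() if now is None else now
    return exp <= now


print(auth_hint(401))
print(auth_hint(403))
```

If jwt_is_expired returns True for the token that produced a 401 response, refreshing the token is the likely fix.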

Snaplex issues

For Snaplex issues, check the Monitor > Infrastructure > System Overview dashboard. This dashboard provides details such as CPU usage, disk usage, pipeline executions, alerts, and activity logs. To investigate further, you can request Snaplex logs from your Customer Success Manager (CSM) to identify the root cause of the error.

Learn more about the Infrastructure dashboard