OpenTelemetry for Swarms Python FastAPI Server for AWS Integration #695

Open
jmikedupont2 opened this issue Dec 20, 2024 · 1 comment

jmikedupont2 commented Dec 20, 2024

Feature Request: Add OpenTelemetry for Swarms Python FastAPI Server for AWS Integration

Task

  1. Read the instructions here:
    https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Signals-Enable-EC2.html
  2. Follow the Ceviche AI protocol here: https://github.com/meta-introspector/meta-meme/wiki/Ceviche-cevaidi-CEVADI-framework
  3. Look for existing GitHub projects, Python libraries, and methods for implementing OpenTelemetry.
  4. Deliver code changes to swarms, tested locally.
  5. Deliver Terraform code to create any AWS resources needed, based on the latest branch you can find (feature/call_swarms) of https://github.com/jmikedupont2/swarms-terraform/
  6. Deliver shell scripts to set up any environment changes needed in the user data, like in this patch here.
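Item 6 mostly comes down to exporting standard OpenTelemetry SDK environment variables before the server starts. A minimal Python sketch that renders them as `export` lines for a user-data script follows; `render_otel_env` is a hypothetical helper, and the default endpoint `http://localhost:4317` assumes an OTLP collector running on the same instance.

```python
import shlex
from typing import Dict, Optional


def render_otel_env(service_name: str,
                    endpoint: str = "http://localhost:4317",
                    resource_attributes: Optional[Dict[str, str]] = None) -> str:
    """Render OpenTelemetry SDK environment variables as shell `export` lines.

    Hypothetical helper for a user-data script; the default endpoint is an
    assumption (an OTLP collector listening on the same host).
    """
    env = {
        # Standard variables defined by the OpenTelemetry SDK configuration spec
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_METRICS_EXPORTER": "otlp",
    }
    if resource_attributes:
        # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs
        env["OTEL_RESOURCE_ATTRIBUTES"] = ",".join(
            f"{key}={value}" for key, value in sorted(resource_attributes.items())
        )
    # shlex.quote protects values containing spaces or shell metacharacters
    return "\n".join(f"export {name}={shlex.quote(value)}" for name, value in env.items())


if __name__ == "__main__":
    print(render_otel_env("swarm-fastapi", resource_attributes={"deployment.environment": "dev"}))
```

Appending the printed lines to the user data (or to the service's environment file) should let the `opentelemetry-instrument` launcher pick them up without further code changes.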

Bounty

100 $MCS tokens from my wallet here https://solscan.io/account/HMEKzpgzJEfyYyqoob5uGHR9P3LF6248zbm8tWgaApim

Is your feature request related to a problem? Please describe.

Currently, our Swarms Python FastAPI server lacks visibility into its performance and behavior in AWS environments. This makes it difficult for our development team to diagnose issues, optimize the application, and ensure seamless integration with AWS services. We are constantly struggling to:

  • Monitor and troubleshoot serverless functions, APIs, and distributed systems
  • Get insights into latency, throughput, and error rates
  • Identify bottlenecks and areas for optimization
  • Integrate with AWS services like AWS X-Ray, CloudWatch, and Lambda

Describe the solution you'd like

We would like to integrate OpenTelemetry, an open-source observability framework, into our Swarms Python FastAPI server to enable seamless monitoring, tracing, and logging in AWS environments. This would allow us to:

  • Automatically instrument our code with OpenTelemetry's Python agent
  • Collect telemetry data (including from Lambda functions) and export it to AWS services such as X-Ray and CloudWatch
  • Visualize and analyze performance metrics, latency, and errors in a centralized dashboard
  • Leverage OpenTelemetry's extensibility to integrate with other monitoring tools and services

Describe alternatives you've considered

Before opting for OpenTelemetry, we considered the following alternatives:

  • AWS X-Ray SDK for Python: While this provides some basic tracing capabilities, it lacks the extensibility and customization offered by OpenTelemetry.
  • New Relic, Datadog, or other commercial monitoring tools: These solutions are expensive and may require significant instrumentation changes to our codebase.
  • Custom-built monitoring solutions: Developing an in-house monitoring system would require significant resources and expertise.

Additional context

Our Swarms Python FastAPI server is a critical component of our AWS-based infrastructure, handling thousands of requests per minute. By integrating OpenTelemetry, we aim to improve the reliability, scalability, and performance of our application while reducing the mean time to detect (MTTD) and mean time to resolve (MTTR) issues.

We have attached a high-level architecture diagram of our proposed OpenTelemetry integration, which includes the following components:

  • OpenTelemetry Python agent
  • FastAPI instrumentation
  • AWS X-Ray, CloudWatch, and Lambda exporters
  • Centralized monitoring dashboard (e.g., Grafana, Prometheus)

We believe that adding OpenTelemetry to our Swarms Python FastAPI server will significantly enhance our ability to monitor, troubleshoot, and optimize our application in AWS environments.

Task instructions

Review these untested AI-generated instructions:

Plan: Integrating OpenTelemetry into the Swarms Python FastAPI Server for AWS Integration

Objective

Integrate OpenTelemetry into the Swarms Python FastAPI server to enable enhanced observability and seamless integration with AWS services, specifically AWS X-Ray, CloudWatch, and Lambda.


Steps to Implement

  1. Project Setup and Preliminary Configuration

Objective: Prepare the development environment.

Tasks:

  1. Ensure the FastAPI server is running and accessible.

  2. Verify AWS account access and permissions for required services (X-Ray, CloudWatch, Lambda).

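  3. Install the OpenTelemetry SDK for Python, the FastAPI instrumentation, the AWS SDK extension (which provides the X-Ray ID generator), and the OTLP exporter:

pip install opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-fastapi opentelemetry-sdk-extension-aws opentelemetry-exporter-otlp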

  4. Verify compatibility of the current server and AWS services with OpenTelemetry.
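For the compatibility check, a quick first step is confirming which OpenTelemetry distributions are actually installed. This stdlib-only sketch reports versions via `importlib.metadata`; the package list is an assumption to adjust to the final install command, and the mismatch check relies on `opentelemetry-api` and `opentelemetry-sdk` normally being released with matching version numbers.

```python
from importlib import metadata
from typing import Dict, Optional

# Assumed package list for this plan; adjust to the final install command.
REQUIRED_PACKAGES = (
    "opentelemetry-api",
    "opentelemetry-sdk",
    "opentelemetry-instrumentation-fastapi",
    "opentelemetry-sdk-extension-aws",
    "opentelemetry-exporter-otlp",
)


def installed_versions(packages=REQUIRED_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def api_sdk_mismatch(versions: Dict[str, Optional[str]]) -> bool:
    """True if opentelemetry-api and opentelemetry-sdk are both installed at different versions."""
    api = versions.get("opentelemetry-api")
    sdk = versions.get("opentelemetry-sdk")
    return api is not None and sdk is not None and api != sdk


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name:45} {version or 'MISSING'}")
    if api_sdk_mismatch(found):
        print("WARNING: opentelemetry-api and opentelemetry-sdk versions differ")
```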

  2. Instrumentation of FastAPI with OpenTelemetry

Objective: Integrate OpenTelemetry with FastAPI for automatic instrumentation.

Tasks:

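  1. Enable OpenTelemetry for the application using FastAPI instrumentation. The OpenTelemetry Python packages do not appear to include an AwsXRaySpanExporter, so this sketch exports spans over OTLP to a collector running alongside the server (for example the AWS Distro for OpenTelemetry Collector), which forwards them to X-Ray:

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

app = FastAPI()

# Resource configuration: the service name shown in X-Ray and CloudWatch
resource = Resource.create({"service.name": "swarm-fastapi"})

# Tracer configuration: X-Ray-compatible trace IDs (start time in the first 32 bits)
tracer_provider = TracerProvider(resource=resource, id_generator=AwsXRayIdGenerator())

# Assumed collector address: default OTLP/gRPC port on the same host
trace_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)

span_processor = BatchSpanProcessor(trace_exporter)
tracer_provider.add_span_processor(span_processor)
trace.set_tracer_provider(tracer_provider)

# Instrument FastAPI: one server span per incoming HTTP request
FastAPIInstrumentor.instrument_app(app, tracer_provider=tracer_provider)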

  2. Verify that spans are generated for incoming HTTP requests.
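The `AwsXRayIdGenerator` matters because X-Ray expects the first 32 bits of a trace ID to be the request's start time in epoch seconds, and it displays IDs as `1-<8 hex digits of time>-<24 hex digits>`. OpenTelemetry Python exposes trace IDs as 128-bit integers (`span.get_span_context().trace_id`), so when checking that a span seen locally reached X-Ray it helps to convert between the two forms. The helpers below are illustrative, stdlib-only sketches, not SDK functions.

```python
import os
import time
from typing import Optional


def xray_style_trace_id(now: Optional[float] = None) -> int:
    """Build a 128-bit trace ID whose high 32 bits are epoch seconds (X-Ray layout)."""
    seconds = int(time.time() if now is None else now)
    random_96 = int.from_bytes(os.urandom(12), "big")  # remaining 96 random bits
    return (seconds << 96) | random_96


def to_xray_trace_id(otel_trace_id: int) -> str:
    """Format a 128-bit OpenTelemetry trace ID as an X-Ray trace ID string."""
    hex_id = f"{otel_trace_id:032x}"  # 32 lowercase hex digits, zero-padded
    return f"1-{hex_id[:8]}-{hex_id[8:]}"


def trace_start_time(otel_trace_id: int) -> int:
    """Recover the epoch seconds stored in the high 32 bits."""
    return otel_trace_id >> 96


if __name__ == "__main__":
    print(to_xray_trace_id(xray_style_trace_id()))
```

Searching the X-Ray console for the string returned by `to_xray_trace_id` is a simple way to confirm that a specific request's span was exported.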

  3. Exporting Telemetry Data to AWS Services

Objective: Route telemetry data to AWS X-Ray and CloudWatch for visualization and analysis.

Tasks:

  1. Configure AWS X-Ray as the destination for spans (as shown above).

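  2. Use the OpenTelemetry metrics API to capture metrics (e.g., latency, throughput). The current Python SDK has no MeterProvider.start_pipeline() and does not appear to ship a CloudWatch metrics exporter, so this sketch attaches a periodic OTLP reader and leaves publishing to CloudWatch (e.g. under the SwarmFastAPI namespace) to the collector:

from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Export every 60 s to the same (assumed) local collector used for traces
metrics_exporter = OTLPMetricExporter(endpoint="http://localhost:4317", insecure=True)
reader = PeriodicExportingMetricReader(metrics_exporter, export_interval_millis=60000)
meter_provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(meter_provider)
meter = meter_provider.get_meter("swarm-metrics")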

  3. Configure log export using AWS Lambda or CloudWatch directly.
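When logs go to CloudWatch directly (for example from a Lambda function), CloudWatch Embedded Metric Format (EMF) is one option: a structured JSON log line from which CloudWatch extracts metrics. A minimal sketch for the `SwarmFastAPI` namespace follows; the `Service` dimension, the metric names, and counting 5xx responses as errors are all assumptions. In Lambda, printing the line to stdout is sufficient; on EC2 the CloudWatch agent must be configured to accept EMF.

```python
import json
import time
from typing import Optional


def emf_record(latency_ms: float, status_code: int,
               service: str = "swarm-fastapi",
               namespace: str = "SwarmFastAPI",
               timestamp_ms: Optional[int] = None) -> str:
    """Return one CloudWatch EMF log line carrying latency and error metrics."""
    record = {
        "_aws": {
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set: metrics are aggregated per service
                "Dimensions": [["Service"]],
                "Metrics": [
                    {"Name": "Latency", "Unit": "Milliseconds"},
                    {"Name": "Errors", "Unit": "Count"},
                ],
            }],
        },
        # Top-level members supply the dimension and metric values
        "Service": service,
        "Latency": latency_ms,
        "Errors": 1 if status_code >= 500 else 0,  # assumption: only 5xx counts as an error
        "StatusCode": status_code,  # extra property: searchable in logs, not a metric
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(emf_record(123.4, 200))
```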

  4. Centralized Monitoring Dashboard

Objective: Enable visualization and analysis of telemetry data.

Tasks:

  1. Configure AWS CloudWatch for log and metrics visualization.

  2. Optionally integrate with Grafana or Prometheus for advanced dashboards.

  3. Verify dashboard configurations and test queries for latency and error rates.
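To test dashboard queries for latency and error rates, it helps to compute the expected numbers independently from a known batch of requests. This sketch uses the nearest-rank definition of a percentile; dashboards may interpolate or approximate percentiles, so small differences on small samples are expected. `RequestSample` and `summarize` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RequestSample:
    latency_ms: float
    status_code: int


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: List[RequestSample]) -> dict:
    """Expected latency percentiles and 5xx error rate for a batch of requests."""
    latencies = [s.latency_ms for s in samples]
    errors = sum(1 for s in samples if s.status_code >= 500)
    return {
        "count": len(samples),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(samples),
    }
```

Comparing `summarize(...)` for a test run against the dashboard's p95 and error-rate widgets over the same time window is a quick sanity check on the queries.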


  5. Testing and Validation

Objective: Ensure the integration works as expected.

Tasks:

  1. Simulate requests to the FastAPI server and verify traces, metrics, and logs appear in AWS services.

  2. Test different scenarios, including errors, high traffic, and API latency.

  3. Validate integration with AWS X-Ray for distributed tracing.
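For simulating requests, errors, and heavier traffic, a small stdlib load generator is often enough to confirm that traces and error metrics show up. This sketch assumes nothing about the Swarms endpoints (the URLs are whatever the caller passes in), and `timed_get` and `simulate_traffic` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def timed_get(url: str, timeout: float = 10.0) -> Tuple[int, float]:
    """GET url and return (status_code, latency_ms).

    4xx/5xx responses raise HTTPError in urllib but are still valid samples;
    connection failures (URLError) propagate to the caller.
    """
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
        err.close()
    return status, (time.perf_counter() - start) * 1000


def simulate_traffic(urls: List[str], concurrency: int = 8) -> List[Tuple[int, float]]:
    """Send all requests through a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_get, urls))
```

For example, with the server running under uvicorn on port 8000 (an assumption), `simulate_traffic(["http://localhost:8000/"] * 200 + ["http://localhost:8000/missing"] * 20)` returns one `(status, latency_ms)` pair per request; the 404 responses provide a simple error scenario to look for in X-Ray and CloudWatch.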


  6. Documentation and Handover

Objective: Provide clear documentation for the implemented solution.

Tasks:

  1. Create a README file detailing setup, configuration, and usage instructions.

  2. Document common troubleshooting steps and FAQs.

  3. Conduct a knowledge transfer session with the team.


Deliverables

  1. Integrated OpenTelemetry in the Python FastAPI server with AWS X-Ray, CloudWatch, and Lambda.

  2. Centralized monitoring dashboard (AWS CloudWatch or Grafana).

  3. Documentation and testing reports.


Timeline


Budget and Bounty

Bounty: 100 $MCS tokens
