
Observability

Vali-Blob instruments every storage operation with traces and metrics through .NET's built-in System.Diagnostics APIs (ActivitySource and Meter). The OpenTelemetry SDK for .NET consumes these APIs directly, so the telemetry works with any OpenTelemetry-compatible backend, including Jaeger, Tempo, Zipkin, Prometheus, Grafana, Datadog, and Azure Monitor.


Telemetry Sources

| Source type | Name | Purpose |
|---|---|---|
| ActivitySource | "Vali-Blob.Storage" | Distributed tracing spans per operation |
| Meter | "Vali-Blob.Storage" | Counters and histograms for metrics |
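Both sources are plain .NET System.Diagnostics objects, so you can observe them in-process without any exporter, for example in an integration test. The sketch below, assuming .NET 6 or later, subscribes to the "Vali-Blob.Storage" ActivitySource by name with an ActivityListener, which is the same mechanism the OpenTelemetry SDK relies on for AddSource. The ActivitySource it creates is only a stand-in for the one Vali-Blob owns; in a real test you would trigger an actual storage call instead.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Operation names of spans completed by the "Vali-Blob.Storage" source.
var captured = new List<string>();

using var listener = new ActivityListener
{
    // Subscribe by source name, just as AddSource("Vali-Blob.Storage") does.
    ShouldListenTo = source => source.Name == "Vali-Blob.Storage",
    // Record every span so Activities are created even without an SDK sampler.
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => captured.Add(activity.OperationName)
};
ActivitySource.AddActivityListener(listener);

// Stand-in for the library's source; in a real app Vali-Blob creates and owns it.
using var source = new ActivitySource("Vali-Blob.Storage");
using (source.StartActivity("storage.upload"))
{
    // The upload would run here; the span stops when this block exits.
}

Console.WriteLine(string.Join(", ", captured)); // storage.upload
```

Because the listener filters on the source name, spans from other libraries are ignored unless you widen ShouldListenTo.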

OpenTelemetry Setup

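using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
    .WithTracing(t => t
        .AddSource("Vali-Blob.Storage")
        .AddOtlpExporter())         // OTLP to Jaeger, Tempo, etc. (defaults to localhost:4317 over gRPC)
    .WithMetrics(m => m
        .AddMeter("Vali-Blob.Storage")
        .AddPrometheusExporter());  // registers the Prometheus exporter

var app = builder.Build();

// Serves the /metrics endpoint that Prometheus scrapes
// (OpenTelemetry.Exporter.Prometheus.AspNetCore package).
app.MapPrometheusScrapingEndpoint();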

Local Development with Jaeger

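docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  jaegertracing/all-in-one:latest

using OpenTelemetry.Exporter;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("my-app")) // hypothetical service name; Jaeger lists spans under it
    .WithTracing(t => t
        .AddSource("Vali-Blob.Storage")
        .AddOtlpExporter(o =>
        {
            o.Endpoint = new Uri("http://localhost:4317");
            o.Protocol = OtlpExportProtocol.Grpc;
        }));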

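Open the Jaeger UI at http://localhost:16686, select your application's service, and look for operations such as storage.upload. Jaeger groups spans by the OpenTelemetry service.name resource attribute (set with ConfigureResource(r => r.AddService(...))), not by ActivitySource name; if you do not set one, the SDK reports unknown_service:<process name>.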


Traced Operations (Spans)

Every storage operation produces an Activity (OpenTelemetry span):

| Activity name | Triggered by |
|---|---|
| storage.upload | IStorageProvider.UploadAsync |
| storage.download | IStorageProvider.DownloadAsync |
| storage.delete | IStorageProvider.DeleteAsync |
| storage.exists | IStorageProvider.ExistsAsync |
| storage.copy | IStorageProvider.CopyAsync |
| storage.list | IStorageProvider.ListFilesAsync / ListFoldersAsync |
| storage.get_metadata | IStorageProvider.GetMetadataAsync |
| storage.set_metadata | IStorageProvider.SetMetadataAsync |
| storage.presign_upload | IPresignedUrlProvider.GetPresignedUploadUrlAsync |
| storage.presign_download | IPresignedUrlProvider.GetPresignedDownloadUrlAsync |
| storage.resumable.start | IResumableUploadProvider.StartResumableUploadAsync |
| storage.resumable.chunk | IResumableUploadProvider.UploadChunkAsync |
| storage.resumable.complete | IResumableUploadProvider.CompleteResumableUploadAsync |
| storage.resumable.abort | IResumableUploadProvider.AbortResumableUploadAsync |

Span Tags

Each span is enriched with contextual tags:

| Tag | Example value | Description |
|---|---|---|
| provider.name | "s3", "azure" | Registered provider name |
| provider.type | "AWSS3Provider" | Provider implementation class |
| file.path | "uploads/avatar.jpg" | Storage object path |
| file.size_bytes | 102400 | File size in bytes |
| file.content_type | "image/jpeg" | MIME type |
| resumable.upload_id | "abc123" | Session ID for resumable operations |
| resumable.chunk_offset | 5242880 | Byte offset of the chunk being uploaded |
| error.type | "HttpRequestException" | Exception type on failure |
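Failure tags are easiest to understand from the consumer side. This sketch, assuming .NET 6 or later, uses a hypothetical stand-in operation (not Vali-Blob's code) that records provider.name, file.path, and error.type, marks the span as failed with Activity.SetStatus, and then reads the tags back from the stopped Activity the way a test assertion or custom processor might. Whether Vali-Blob also sets the span status on failure is an assumption here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var stopped = new List<Activity>();

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Vali-Blob.Storage",
    Sample = (ref ActivityCreationOptions<ActivityContext> o) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = a => stopped.Add(a)
};
ActivitySource.AddActivityListener(listener);

// Hypothetical stand-in for a failing provider call; not Vali-Blob's implementation.
using var source = new ActivitySource("Vali-Blob.Storage");
using (var activity = source.StartActivity("storage.download"))
{
    activity?.SetTag("provider.name", "s3");
    activity?.SetTag("file.path", "uploads/avatar.jpg");
    try
    {
        throw new TimeoutException("simulated provider timeout");
    }
    catch (Exception ex)
    {
        // Record the failure using the error.type tag from the table above.
        activity?.SetTag("error.type", ex.GetType().Name);
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    }
}

// Read the tags back from the completed span.
var span = stopped[0];
Console.WriteLine(
    $"{span.OperationName} {span.GetTagItem("provider.name")} {span.GetTagItem("error.type")} {span.Status}");
```

Running it prints `storage.download s3 TimeoutException Error`.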

Example Trace

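A single upload with thumbnail generation produces a trace like the following (durations are illustrative):

storage.upload [85 ms]
├── pipeline.validation [2 ms]
├── pipeline.content_detection [1 ms]
├── pipeline.image_processing [20 ms]
│   ├── image.resize [12 ms]
│   └── image.encode [8 ms]
├── s3.put_object [55 ms]  ← AWS SDK span
└── storage.upload.thumbnail [30 ms]
    ├── image.resize [6 ms]
    └── s3.put_object [20 ms]

The s3.put_object spans are emitted by the AWS SDK, not by Vali-Blob, so they appear only if AWS SDK tracing is also enabled in your tracer provider; AddSource("Vali-Blob.Storage") alone does not export them.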
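The nesting in this tree comes from Activity.Current: any span started while another span is current, including spans from other libraries such as the AWS SDK, becomes its child and shares its trace ID. A minimal sketch with a stand-in source, assuming .NET 6 or later and the default W3C ID format:

```csharp
using System;
using System.Diagnostics;

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Vali-Blob.Storage",
    Sample = (ref ActivityCreationOptions<ActivityContext> o) =>
        ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

// Stand-in for the library's source.
using var source = new ActivitySource("Vali-Blob.Storage");

// storage.upload becomes Activity.Current, so the next span started is its child.
using var upload = source.StartActivity("storage.upload");
using var validation = source.StartActivity("pipeline.validation");

Console.WriteLine(validation!.ParentSpanId == upload!.SpanId); // True
Console.WriteLine(validation.TraceId == upload.TraceId);       // True
```

The `using var` declarations dispose the child before the parent, so the spans stop in the same order they nest.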

Metrics

Vali-Blob emits the following metrics via System.Diagnostics.Metrics.Meter:

Counters

| Metric name | Unit | Description |
|---|---|---|
| storage.upload.count | {operation} | Total upload operations |
| storage.download.count | {operation} | Total download operations |
| storage.delete.count | {operation} | Total delete operations |
| storage.error.count | {error} | Total failed operations |
| storage.resumable.session.count | {session} | Resumable upload sessions started |
| storage.resumable.chunk.count | {chunk} | Total chunks uploaded |

Histograms

| Metric name | Unit | Description |
|---|---|---|
| storage.upload.bytes | By (bytes) | Bytes uploaded per operation |
| storage.download.bytes | By (bytes) | Bytes downloaded per operation |
| storage.upload.duration | ms | Upload operation duration |
| storage.download.duration | ms | Download operation duration |
| storage.resumable.chunk.bytes | By (bytes) | Bytes per chunk upload |

All metrics include a provider_name dimension for per-provider segmentation.
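To inspect these metrics without Prometheus, you can attach a MeterListener from System.Diagnostics.Metrics. The sketch below totals storage.upload.count per provider_name. It assumes the counter records long values (a MeterListener callback fires only for the measurement type it is registered with, so change the type argument if Vali-Blob uses int), and the Meter it creates is a stand-in for the library's own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Running totals of storage.upload.count, keyed by the provider_name tag.
var uploadsByProvider = new Dictionary<string, long>();

using var meterListener = new MeterListener();
meterListener.InstrumentPublished = (instrument, listener) =>
{
    // Enable only the upload counter on the Vali-Blob meter.
    if (instrument.Meter.Name == "Vali-Blob.Storage" && instrument.Name == "storage.upload.count")
        listener.EnableMeasurementEvents(instrument);
};
meterListener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    string provider = "unknown";
    foreach (var tag in tags)
        if (tag.Key == "provider_name") provider = tag.Value?.ToString() ?? provider;
    uploadsByProvider[provider] = uploadsByProvider.GetValueOrDefault(provider) + value;
});
meterListener.Start();

// Stand-in for the library's meter and counter.
using var meter = new Meter("Vali-Blob.Storage");
var uploads = meter.CreateCounter<long>("storage.upload.count", "{operation}");
uploads.Add(1, new KeyValuePair<string, object?>("provider_name", "s3"));
uploads.Add(1, new KeyValuePair<string, object?>("provider_name", "s3"));
uploads.Add(1, new KeyValuePair<string, object?>("provider_name", "azure"));

foreach (var (name, count) in uploadsByProvider)
    Console.WriteLine($"{name}: {count}");
```

The counter is created after Start(), so the listener sees it through InstrumentPublished; instruments that already exist when Start() runs are published at that point instead.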


Prometheus Scraping

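With AddPrometheusExporter() registered and the scraping endpoint mapped (app.MapPrometheusScrapingEndpoint() in ASP.NET Core), metrics are exposed at /metrics. Treat the output below as illustrative: the exporter derives series names from instrument names by replacing dots with underscores, depending on its version it may also append _total to counters and a unit suffix to other instruments, and histograms are exposed as _bucket, _sum, and _count series. Check the names in your own /metrics output before building queries on them.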

# HELP storage_upload_count Total number of upload operations
# TYPE storage_upload_count counter
storage_upload_count{provider_name="s3"} 1423

# HELP storage_upload_bytes_total Bytes uploaded total
# TYPE storage_upload_bytes_total counter
storage_upload_bytes_total{provider_name="s3"} 1073741824

# HELP storage_upload_duration_ms Upload duration
# TYPE storage_upload_duration_ms histogram
storage_upload_duration_ms_bucket{provider_name="s3",le="100"} 1200
storage_upload_duration_ms_bucket{provider_name="s3",le="500"} 1380
storage_upload_duration_ms_bucket{provider_name="s3",le="1000"} 1420
storage_upload_duration_ms_bucket{provider_name="s3",le="+Inf"} 1423

# HELP storage_error_count Total failed operations
# TYPE storage_error_count counter
storage_error_count{provider_name="s3",error_type="TimeoutException"} 3

Configure Prometheus to scrape:

# prometheus.yml
scrape_configs:
  - job_name: my-app
    static_configs:
      - targets: ['myapp:8080']
    metrics_path: /metrics
    scrape_interval: 15s

Grafana Dashboard Queries

Example PromQL for a Vali-Blob Grafana dashboard:

# Upload throughput (MiB/s)
rate(storage_upload_bytes_total[5m]) / 1024 / 1024

# Download throughput (MiB/s)
rate(storage_download_bytes_total[5m]) / 1024 / 1024

# Error rate per minute
rate(storage_error_count[1m]) * 60

# P50 / P95 / P99 upload latency
histogram_quantile(0.50, rate(storage_upload_duration_ms_bucket[5m]))
histogram_quantile(0.95, rate(storage_upload_duration_ms_bucket[5m]))
histogram_quantile(0.99, rate(storage_upload_duration_ms_bucket[5m]))

# Error rate by provider
sum by (provider_name) (rate(storage_error_count[5m]))
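histogram_quantile estimates a percentile from cumulative bucket counts: it finds the bucket that contains the target rank and interpolates linearly inside it. The sketch below reproduces that calculation, simplified from Prometheus' documented behavior, on the bucket counts in the sample /metrics output above. PromQL applies the same interpolation to per-second rates; the raw cumulative counts used here give the all-time estimate.

```csharp
using System;

// Cumulative bucket counts from the sample output (le = bucket upper bound in ms).
double[] upperBounds = { 100, 500, 1000, double.PositiveInfinity };
double[] cumulative = { 1200, 1380, 1420, 1423 };

// Simplified histogram_quantile: locate the bucket holding the target rank,
// then interpolate linearly inside it. The first bucket is assumed to start at 0.
double Quantile(double q)
{
    double rank = q * cumulative[^1]; // the +Inf bucket holds the total count
    for (int i = 0; i < upperBounds.Length; i++)
    {
        if (cumulative[i] < rank) continue;
        // If the rank falls in +Inf, fall back to the largest finite bound.
        if (double.IsPositiveInfinity(upperBounds[i])) return upperBounds[i - 1];
        double lower = i == 0 ? 0 : upperBounds[i - 1];
        double countBelow = i == 0 ? 0 : cumulative[i - 1];
        double inBucket = cumulative[i] - countBelow;
        return lower + (upperBounds[i] - lower) * (rank - countBelow) / inBucket;
    }
    return double.NaN;
}

foreach (var q in new[] { 0.50, 0.95, 0.99 })
    Console.WriteLine($"p{q * 100:0}: {Quantile(q):F1} ms");
```

For these buckets it prints p50: 59.3 ms, p95: 437.4 ms, and p99: 859.6 ms. The estimate assumes observations are spread evenly within each bucket, so wide buckets such as 500 to 1000 ms limit how precise the p99 can be.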

Structured Logging

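Vali-Blob emits structured log messages through ILogger<T>. Configure the log level in appsettings.json. Each key under LogLevel is matched as a prefix of the logger category, which for ILogger<T> is the full type name of T, so the key must match the namespace of Vali-Blob's logger types: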

{
  "Logging": {
    "LogLevel": {
      "Vali-Blob": "Information"
    }
  }
}

| Log level | What is logged |
|---|---|
| Information | Upload/download start and completion, provider initialization |
| Warning | Retry attempts, circuit breaker state changes |
| Error | Operation failures after all retries are exhausted |
| Debug | Per-chunk details for resumable uploads, pipeline step execution |

Custom Activity Enrichment

Attach additional tags to all Vali-Blob spans by implementing IStorageTelemetryEnricher:

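using System.Diagnostics;
using System.Security.Claims;
using Microsoft.AspNetCore.Http;
// plus the Vali-Blob namespace that declares IStorageTelemetryEnricher and StorageOperationContext

public class TenantTelemetryEnricher : IStorageTelemetryEnricher
{
    private readonly IHttpContextAccessor _http;

    public TenantTelemetryEnricher(IHttpContextAccessor http) => _http = http;

    public void Enrich(Activity activity, StorageOperationContext context)
    {
        // Tag the span with the caller's tenant when the current request has one.
        var tenantId = _http.HttpContext?.User.FindFirstValue("tenant_id");
        if (tenantId is not null)
        {
            activity.SetTag("tenant.id", tenantId);
            activity.SetTag("tenant.provider", context.ProviderName);
        }
    }
}

// Registration: IHttpContextAccessor is not registered by default.
builder.Services.AddHttpContextAccessor();
builder.Services.AddSingleton<IStorageTelemetryEnricher, TenantTelemetryEnricher>();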

Azure Monitor / Application Insights

using Azure.Monitor.OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

var connectionString = builder.Configuration["ApplicationInsights:ConnectionString"]!;

builder.Services.AddOpenTelemetry()
    .WithTracing(t => t
        .AddSource("Vali-Blob.Storage")
        .AddAzureMonitorTraceExporter(o => o.ConnectionString = connectionString))
    .WithMetrics(m => m
        .AddMeter("Vali-Blob.Storage")
        .AddAzureMonitorMetricExporter(o => o.ConnectionString = connectionString));

Vali-Blob traces appear in Application Insights under Investigate > Performance, and metrics appear in Metrics Explorer under the namespace Vali-Blob.Storage.


Datadog

using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Address of the Datadog Agent's OTLP gRPC receiver.
var agentEndpoint = new Uri("http://datadog-agent:4317");

builder.Services.AddOpenTelemetry()
    .WithTracing(t => t
        .AddSource("Vali-Blob.Storage")
        .AddOtlpExporter(o =>
        {
            o.Endpoint = agentEndpoint;
            o.Protocol = OtlpExportProtocol.Grpc;
        }))
    .WithMetrics(m => m
        .AddMeter("Vali-Blob.Storage")
        .AddOtlpExporter(o =>
        {
            o.Endpoint = agentEndpoint;
            o.Protocol = OtlpExportProtocol.Grpc;
        }));

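The Datadog Agent must have OTLP ingestion enabled for this endpoint to accept data. When data arrives over OTLP, Datadog derives its service, env, and version tags from the OpenTelemetry resource attributes service.name, deployment.environment, and service.version, so set those (for example with the OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES environment variables, or with ConfigureResource) to let Datadog APM correlate traces across services. DD_ENV, DD_SERVICE, and DD_VERSION are read by Datadog's own tracing library, not by the OpenTelemetry SDK.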


  • Resilience — Retry, circuit breaker, and timeout configuration
  • Health Checks — ASP.NET Core health check integration
  • Events — Application-level storage event handling