Designing High-Availability Systems with .NET and Redis

Modern production systems cannot afford downtime. Whether you are running an e-commerce platform, a financial service, or a SaaS application, your users expect 99.9% or better uptime. Designing high-availability systems with .NET and Redis is one of the most practical approaches to achieving this goal. Redis provides lightning-fast, in-memory data access, distributed locking, and pub/sub capabilities — all of which are essential building blocks for resilient architectures. In this article, we will walk through the core patterns and code required to design a genuinely fault-tolerant .NET backend powered by Redis.
Table of Contents
- Why High Availability Requires More Than Just a Good Server
- Setting Up Redis with StackExchange.Redis in .NET
- Implementing Distributed Caching with Fallback
- Distributed Locking to Prevent Race Conditions
- Integrating Health Checks for Redis
- Session Affinity vs. Centralized Session with Redis
- Event-Driven Resilience with Redis Pub/Sub
- Applying the Circuit Breaker Pattern
- Designing Scalable Multi-Tenant Caching Strategies
- Monitoring and Cost Awareness in HA Redis Deployments
- Conclusion
Why High Availability Requires More Than Just a Good Server
High availability (HA) is not about buying the biggest server — it is about designing your system so that no single component failure causes the entire application to go down. This means eliminating single points of failure at every layer: application servers, databases, caches, and message brokers. Teams building high-availability systems with .NET and Redis typically focus on three pillars: redundancy, fast failover, and graceful degradation.
If you are working on scaling up a large backend, you may also find value in understanding lessons from handling millions of users with .NET, which covers real-world scaling decisions that complement the caching strategies we will discuss here.
The Role of Redis in High-Availability Architecture
Redis is not just a cache — in HA systems, it serves as a distributed coordination layer. It handles distributed locks (preventing race conditions across multiple app instances), session storage (so any app node can serve any user), rate limiting, pub/sub messaging, and leader election. When configured with Redis Sentinel or Redis Cluster, it also eliminates the cache tier as a single point of failure. According to the official Redis Sentinel documentation, Sentinel provides automatic failover, monitoring, and notification capabilities that make Redis suitable for production HA deployments.
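As a small taste of that coordination role, a fixed-window rate limiter takes only a few lines on top of Redis's atomic INCR and EXPIRE commands. The class below is a minimal sketch, not a library API; the key prefix, limit, and window are illustrative choices you would tune for your own endpoints.
// Fixed-window rate limiter: allow at most 'limit' calls per 'window' per client.
public class RedisRateLimiter
{
    private readonly IDatabase _redis;

    public RedisRateLimiter(IDatabase redis) => _redis = redis;

    public async Task<bool> IsAllowedAsync(string clientId, int limit, TimeSpan window)
    {
        var key = $"ratelimit:{clientId}";

        // INCR is atomic across all app instances sharing this Redis.
        var count = await _redis.StringIncrementAsync(key);

        if (count == 1)
        {
            // First hit in this window; start the expiry clock.
            await _redis.KeyExpireAsync(key, window);
        }

        return count <= limit;
    }
}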
Setting Up Redis with StackExchange.Redis in .NET
The most widely used Redis client for .NET is StackExchange.Redis. It supports connection multiplexing, reconnection logic, and Redis Cluster natively. Here is how to register a properly configured Redis connection in an ASP.NET Core application:
// Program.cs
builder.Services.AddSingleton<IConnectionMultiplexer>(sp =>
{
    var config = ConfigurationOptions.Parse(
        builder.Configuration.GetConnectionString("Redis")!);
    config.AbortOnConnectFail = false; // Don't throw on startup if Redis is down
    config.ConnectRetry = 5;
    config.ReconnectRetryPolicy = new ExponentialRetry(5000);
    config.SyncTimeout = 5000;
    config.AsyncTimeout = 5000;
    return ConnectionMultiplexer.Connect(config);
});

builder.Services.AddScoped<IDatabase>(sp =>
    sp.GetRequiredService<IConnectionMultiplexer>().GetDatabase());
Setting AbortOnConnectFail = false is critical in HA deployments — your application should start even if Redis is momentarily unavailable, and attempt reconnection in the background. Using ExponentialRetry prevents thundering herd issues when Redis becomes available again after an outage.
Redis Connection String for Sentinel
When connecting to a Redis Sentinel cluster, your connection string needs to reference the Sentinel endpoints, not the Redis master directly. This way, StackExchange.Redis will automatically discover the current master and fail over to a replica if needed:
{
  "ConnectionStrings": {
    "Redis": "sentinel1:26379,sentinel2:26379,sentinel3:26379,serviceName=mymaster,password=yourpassword"
  }
}
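If you prefer to keep secrets out of long connection strings, the same settings can be expressed in code through ConfigurationOptions. This is a minimal sketch; the Sentinel endpoints, the serviceName value, and the "Redis:Password" configuration key are placeholders for your own topology.
// Program.cs — programmatic Sentinel configuration (sketch)
var config = new ConfigurationOptions
{
    ServiceName = "mymaster",        // Name of the Sentinel-monitored master
    AbortOnConnectFail = false,
    Password = builder.Configuration["Redis:Password"] // Hypothetical config key
};
config.EndPoints.Add("sentinel1:26379");
config.EndPoints.Add("sentinel2:26379");
config.EndPoints.Add("sentinel3:26379");

builder.Services.AddSingleton<IConnectionMultiplexer>(
    _ => ConnectionMultiplexer.Connect(config));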
Implementing Distributed Caching with Fallback
One of the most important patterns in high-availability systems with .NET and Redis is the cache-aside pattern combined with a fallback to the primary data source. If Redis becomes unavailable, your system should degrade gracefully rather than throwing errors to users.
public class ResilientCacheService
{
    private readonly IDatabase _redis;
    private readonly ILogger<ResilientCacheService> _logger;
    private static readonly TimeSpan DefaultTtl = TimeSpan.FromMinutes(10);

    public ResilientCacheService(IDatabase redis, ILogger<ResilientCacheService> logger)
    {
        _redis = redis;
        _logger = logger;
    }

    public async Task<T?> GetOrSetAsync<T>(
        string key,
        Func<Task<T>> factory,
        TimeSpan? ttl = null) where T : class
    {
        try
        {
            var cached = await _redis.StringGetAsync(key);
            if (cached.HasValue)
                return JsonSerializer.Deserialize<T>(cached!);
        }
        catch (RedisException ex)
        {
            _logger.LogWarning(ex, "Redis unavailable for key {Key}. Falling back to data source.", key);
        }

        // Cache miss or Redis unavailable — hit the source
        var result = await factory();

        try
        {
            if (result != null)
            {
                await _redis.StringSetAsync(
                    key,
                    JsonSerializer.Serialize(result),
                    ttl ?? DefaultTtl);
            }
        }
        catch (RedisException ex)
        {
            _logger.LogWarning(ex, "Failed to write key {Key} to Redis.", key);
        }

        return result;
    }
}
Wrapping each Redis operation in a try/catch for RedisException ensures that a cache failure never surfaces as an unhandled exception to your users. This pattern is at the heart of resilient, high-availability systems with .NET and Redis.
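A typical call site then looks like the sketch below; _productRepository and GetProductAsync are hypothetical names used only to show the shape of the call.
// Cache a product lookup for 5 minutes, falling back to the database on a miss.
var product = await _cacheService.GetOrSetAsync(
    $"product:{productId}",
    () => _productRepository.GetProductAsync(productId),
    TimeSpan.FromMinutes(5));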
Distributed Locking to Prevent Race Conditions
When multiple instances of your .NET application run in parallel (as they should in any HA deployment), race conditions become a real risk. Distributed locking with Redis ensures that only one instance performs a critical operation at a time — such as processing a payment, regenerating a large report, or seeding a cache.
public class RedisDistributedLock
{
    private readonly IDatabase _redis;

    public RedisDistributedLock(IDatabase redis)
    {
        _redis = redis;
    }

    public async Task<bool> AcquireLockAsync(
        string resource,
        string lockToken,
        TimeSpan expiry)
    {
        // SET NX EX atomically — only set if not exists
        return await _redis.StringSetAsync(
            $"lock:{resource}",
            lockToken,
            expiry,
            When.NotExists);
    }

    public async Task ReleaseLockAsync(string resource, string lockToken)
    {
        // Only release if we own the lock (Lua script for atomicity)
        const string script = @"
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            else
                return 0
            end";

        await _redis.ScriptEvaluateAsync(
            script,
            new RedisKey[] { $"lock:{resource}" },
            new RedisValue[] { lockToken });
    }
}
The Lua script for releasing the lock is critical: it ensures a lock is released only by the process that acquired it. Without that check, a process whose lock has already expired could delete a lock now held by another instance. For production use, consider the RedLock.net library, which implements the Redlock algorithm across multiple independent Redis nodes for stronger guarantees.
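In practice the token should be unique per caller (a Guid works well), and the release should sit in a finally block so the lock is freed even when the protected work throws. A minimal usage sketch, with ProcessPaymentAsync as a hypothetical critical section:
var lockToken = Guid.NewGuid().ToString();
var acquired = await _distributedLock.AcquireLockAsync(
    "payment:12345", lockToken, TimeSpan.FromSeconds(30));

if (!acquired)
    return; // Another instance is already processing this payment.

try
{
    await ProcessPaymentAsync(); // Hypothetical critical section
}
finally
{
    await _distributedLock.ReleaseLockAsync("payment:12345", lockToken);
}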
Integrating Health Checks for Redis
No high-availability system is complete without proper health checks. ASP.NET Core’s health check middleware can monitor your Redis connection and report its status to load balancers and orchestration platforms like Kubernetes. This ties directly into health checks in ASP.NET Core and load balancer integration, ensuring that unhealthy instances are automatically removed from rotation.
// Program.cs
// Requires the AspNetCore.HealthChecks.Redis package
// (and HealthChecks.UI.Client for UIResponseWriter)
builder.Services.AddHealthChecks()
    .AddRedis(
        builder.Configuration.GetConnectionString("Redis")!,
        name: "redis",
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "cache", "infrastructure" });

// In app pipeline:
app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("infrastructure")
});
Using HealthStatus.Degraded instead of Unhealthy for Redis means the app continues to serve traffic even when Redis is down — reflecting our graceful degradation strategy. The load balancer only removes an instance from rotation when it reports Unhealthy.
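If you want that behaviour to be explicit rather than relying on the framework defaults, the readiness endpoint can map each health status to a specific HTTP status code. A sketch extending the /health/ready mapping from above:
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("infrastructure"),
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        [HealthStatus.Degraded] = StatusCodes.Status200OK,  // Keep serving traffic
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});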
Session Affinity vs. Centralized Session with Redis
In single-server deployments, ASP.NET Core stores sessions in-process. In HA deployments with multiple nodes behind a load balancer, you have two choices: sticky sessions (session affinity) or centralized session storage. Sticky sessions couple a user to a specific server — if that server goes down, the session is lost. Centralized session storage using Redis decouples sessions from any individual node, making it the correct choice for genuine high availability.
// Program.cs
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("Redis");
    options.InstanceName = "MyApp_Session_";
});

builder.Services.AddSession(options =>
{
    options.IdleTimeout = TimeSpan.FromMinutes(30);
    options.Cookie.HttpOnly = true;
    options.Cookie.IsEssential = true;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});

// Add middleware
app.UseSession();
With this configuration, any node in your cluster can serve any user’s request without needing to know which node that user connected to previously. This is a foundational requirement for high-availability systems with .NET and Redis.
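Reading and writing session data then works exactly as it does on a single node. The sketch below assumes a hypothetical Cart type and a "cart" session key, used purely for illustration:
// In a controller or minimal API endpoint
HttpContext.Session.SetString("cart", JsonSerializer.Serialize(cart));

var cartJson = HttpContext.Session.GetString("cart");
var restoredCart = cartJson is null
    ? new Cart()                                // Hypothetical type
    : JsonSerializer.Deserialize<Cart>(cartJson);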
Event-Driven Resilience with Redis Pub/Sub
For real-time cache invalidation and event broadcasting across nodes, Redis pub/sub is a lightweight solution that avoids the operational overhead of a full message broker for internal communication. When a record is updated in your database, you can publish an invalidation event so all nodes drop that cached entry simultaneously.
// Publisher (on data update)
public async Task InvalidateCacheAsync(string entityKey)
{
    var subscriber = _redis.Multiplexer.GetSubscriber();
    await subscriber.PublishAsync("cache:invalidate", entityKey);
}

// Subscriber setup (on startup)
var subscriber = connectionMultiplexer.GetSubscriber();
await subscriber.SubscribeAsync("cache:invalidate", async (channel, key) =>
{
    await _redis.KeyDeleteAsync((string)key!);
    _logger.LogInformation("Cache key {Key} invalidated via pub/sub.", key);
});
This pattern complements more complex event-driven architectures. If you are building a system that requires durable messaging beyond what Redis pub/sub provides, consider reading about event-driven architecture with .NET and Azure Service Bus for guaranteed delivery semantics.
Applying the Circuit Breaker Pattern
Even with graceful fallbacks at the code level, you need a circuit breaker to avoid cascading failures when Redis is consistently unavailable. The Polly library is the standard way to implement this in .NET, and it works natively with your Redis service calls.
// Register a circuit breaker for Redis operations
builder.Services.AddSingleton<IAsyncPolicy>(sp =>
    Policy
        .Handle<RedisException>()
        .Or<RedisTimeoutException>()
        .CircuitBreakerAsync(
            exceptionsAllowedBeforeBreaking: 3,
            durationOfBreak: TimeSpan.FromSeconds(30),
            onBreak: (ex, breakDuration) =>
            {
                var logger = sp.GetRequiredService<ILogger<Program>>();
                logger.LogError(ex,
                    "Redis circuit breaker OPEN for {Duration}s",
                    breakDuration.TotalSeconds);
            },
            onReset: () =>
            {
                var logger = sp.GetRequiredService<ILogger<Program>>();
                logger.LogInformation("Redis circuit breaker CLOSED — connection restored.");
            }));
When the circuit breaker is open, calls to Redis are short-circuited immediately without waiting for a timeout — this prevents thread exhaustion and keeps your application responsive. After 30 seconds, the circuit moves to half-open and tests if Redis is available again.
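Using the policy is then a matter of wrapping each Redis call in ExecuteAsync and treating a BrokenCircuitException like any other cache miss. The snippet below is a sketch of how the read path of the earlier ResilientCacheService could look, assuming the registered IAsyncPolicy is injected as _circuitBreaker:
try
{
    var cached = await _circuitBreaker.ExecuteAsync(
        () => _redis.StringGetAsync(key));

    if (cached.HasValue)
        return JsonSerializer.Deserialize<T>(cached!);
}
catch (BrokenCircuitException)
{
    // Circuit is open: skip Redis entirely and go straight to the data source.
}
catch (RedisException)
{
    // Individual call failed: fall through to the data source.
}

return await factory();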
Designing Scalable Multi-Tenant Caching Strategies
If you are building a multi-tenant SaaS product, your caching strategy must account for tenant isolation. Each tenant’s cached data should be namespaced to prevent data leakage and allow per-tenant cache invalidation. Building on patterns for scalable multi-tenant SaaS in .NET will help you design the broader application structure that this caching layer sits within.
public class TenantCacheService
{
    private readonly IDatabase _redis;
    private readonly IHttpContextAccessor _httpContextAccessor;

    public TenantCacheService(IDatabase redis, IHttpContextAccessor httpContextAccessor)
    {
        _redis = redis;
        _httpContextAccessor = httpContextAccessor;
    }

    private string TenantKey(string key)
    {
        var tenantId = _httpContextAccessor.HttpContext?
            .User.FindFirst("tenant_id")?.Value ?? "global";
        return $"tenant:{tenantId}:{key}";
    }

    public async Task SetAsync<T>(string key, T value, TimeSpan ttl)
        where T : class
    {
        var fullKey = TenantKey(key);
        await _redis.StringSetAsync(fullKey, JsonSerializer.Serialize(value), ttl);
    }

    public async Task InvalidateTenantCacheAsync(string tenantId)
    {
        // Use SCAN to find all keys for the tenant and delete them
        var server = _redis.Multiplexer.GetServer(
            _redis.Multiplexer.GetEndPoints().First());
        var keys = server.Keys(pattern: $"tenant:{tenantId}:*").ToArray();

        if (keys.Length > 0)
            await _redis.KeyDeleteAsync(keys);
    }
}
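For the IHttpContextAccessor injection to resolve, remember to register the accessor alongside the service itself:
// Program.cs
builder.Services.AddHttpContextAccessor();
builder.Services.AddScoped<TenantCacheService>();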
Monitoring and Cost Awareness in HA Redis Deployments
High-availability Redis deployments — especially on managed services like Azure Cache for Redis or AWS ElastiCache — carry real cost implications. Memory usage, the number of connections, and replication bandwidth all contribute to your monthly bill. Setting appropriate TTLs, choosing a suitable eviction policy (volatile-lru or allkeys-lru), and evicting stale data aggressively can significantly reduce memory consumption without sacrificing the benefits of caching. For more context on balancing performance and cost in .NET cloud deployments, see reducing cloud costs for .NET apps without sacrificing performance.
Always configure maxmemory and an eviction policy in your Redis instance to prevent out-of-memory crashes in production:
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
save ""
appendonly no
Disabling persistence (save "" and appendonly no) is recommended when using Redis purely as a cache in an HA setup — persistence adds latency and is unnecessary if your application can rebuild the cache from the primary data source.
Conclusion
Designing high-availability systems with .NET and Redis requires deliberate decisions at every layer. Here are the key takeaways from this article:
- Configure StackExchange.Redis with AbortOnConnectFail = false and ExponentialRetry to survive transient Redis outages.
- Always wrap Redis calls in try/catch for RedisException and fall back to your primary data source.
- Use Redis Sentinel or Redis Cluster to eliminate the cache tier as a single point of failure.
- Implement distributed locking using atomic SET NX EX and Lua scripts for safe lock release.
- Centralize sessions in Redis to enable true stateless, horizontally scalable app nodes.
- Add Redis health checks so load balancers can route traffic away from unhealthy instances.
- Apply circuit breakers via Polly to prevent cascading failures when Redis is consistently unavailable.
The WireFuture team specialises in building resilient, enterprise-grade .NET backends. If you are evaluating your architecture or need expert hands-on implementation, explore our .NET and ASP.NET development services or reach out via cloud and DevOps consulting to discuss your high-availability requirements.

