In a cloud environment, you will sooner or later run into situations where a service becomes temporarily unavailable, because your application depends heavily on network connections and external services.
These are usually small glitches that often heal themselves. Still, you have to handle them gracefully so that they do not ruin the user experience.
In an on-premises environment, your web server, database server and other parts of your application may be directly connected to each other. In a cloud environment they typically communicate through load balancers, which can introduce latency or interruptions.
In addition, a service you consume may deliberately throttle you or refuse connections, to prevent you from adversely affecting other tenants of the service.
What to do then?
Usually, if something goes wrong, you throw an exception and display an error message to your customer. This is an easy solution that adds no complexity to your application architecture. But is it good enough?
Instead, you could recognize the errors that are typically transient and automatically retry the operation a few more times, leaving a small time gap between attempts. The operation will often succeed on the second or third attempt, and you recover from the error without the customer noticing.
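To make the idea concrete, here is a minimal hand-rolled sketch of a retry loop in plain C#. The names `RetryDemo` and `ExecuteWithRetry`, and the choice of `TimeoutException` as the "transient" error, are assumptions made for illustration only; they are not part of any library.

```csharp
using System;
using System.Threading;

public static class RetryDemo
{
    // Runs the operation and retries it while the failure looks transient
    // and the retry budget is not exhausted. Other failures propagate.
    public static T ExecuteWithRetry<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxRetries,
        TimeSpan delay)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxRetries && isTransient(ex))
            {
                // Leave a small gap before the next attempt.
                Thread.Sleep(delay);
            }
        }
    }

    public static void Main()
    {
        int calls = 0;

        // Simulated service: fails twice with a timeout, then succeeds.
        int result = ExecuteWithRetry(
            () =>
            {
                calls++;
                if (calls < 3) throw new TimeoutException("simulated glitch");
                return 42;
            },
            ex => ex is TimeoutException,
            maxRetries: 3,
            delay: TimeSpan.FromMilliseconds(100));

        Console.WriteLine($"Result {result} after {calls} calls");
        // prints "Result 42 after 3 calls"
    }
}
```

The caller never sees the first two failures. Writing this by hand for every service quickly becomes repetitive, which is the gap a reusable library is meant to fill.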
What is Transient Fault Handling Application Block?
The Transient Fault Handling Application Block for Windows Azure (“Topaz”) provides a set of reusable and testable components for adding retry logic to Windows Azure applications that use the following services:
- Windows Azure SQL Database
- Windows Azure storage
- Service Bus
- Caching Service
Its purpose is to make your application more robust by providing the logic for handling transient faults. It does so in two ways:
- It includes detection strategies for a number of common cloud-based services. These contain built-in knowledge for identifying whether a particular exception is likely to be caused by a transient fault condition.
- It lets you define your own retry strategies, so that you can follow a consistent approach to handling transient faults in your applications.
Combining the two, the block applies retry policies to the operations your application performs against services that may exhibit transient faults.
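The notion of a detection strategy can be sketched in a few lines of plain C#. The interface `ITransientDetector` and the class `NetworkGlitchDetector` below are hypothetical stand-ins written for this illustration; they only mimic the idea of classifying an exception as transient or not, and the set of exception types treated as transient is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a detection strategy: it only answers
// the question "could this failure be temporary?".
public interface ITransientDetector
{
    bool IsTransient(Exception ex);
}

// Treats timeouts and I/O failures as transient, even when they are
// wrapped inside another exception.
public sealed class NetworkGlitchDetector : ITransientDetector
{
    public bool IsTransient(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current is TimeoutException || current is IOException)
            {
                return true;
            }
        }
        return false;
    }
}

public static class DetectionDemo
{
    public static void Main()
    {
        ITransientDetector detector = new NetworkGlitchDetector();
        var wrapped = new InvalidOperationException("query failed", new TimeoutException());

        Console.WriteLine(detector.IsTransient(wrapped));                 // True
        Console.WriteLine(detector.IsTransient(new ArgumentException())); // False
    }
}
```

Keeping detection separate from the retry schedule means the same "what counts as transient" knowledge can be reused with different retry strategies.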
Built-in Strategies
“Topaz” provides a set of built-in strategies to handle transient faults:
- Fixed interval
  Retry four times at one-second intervals.
- Incremental interval
  Retry four times, waiting one second before the first retry, then two seconds before the second retry, then three seconds before the third retry, and four seconds before the fourth retry.
- Exponential back off
  Retry four times, waiting two seconds before the first retry, then four seconds before the second retry, then eight seconds before the third retry, and sixteen seconds before the fourth retry.
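The three schedules above boil down to simple arithmetic. The following sketch reproduces only the numbers quoted in the list; `BackoffSchedules` and its methods are hypothetical helpers, not the library's classes, and a real implementation may add randomization or upper bounds to the exponential case.

```csharp
using System;
using System.Linq;

public static class BackoffSchedules
{
    // Same wait before every retry.
    public static TimeSpan[] Fixed(int retries, TimeSpan interval) =>
        Enumerable.Range(0, retries).Select(_ => interval).ToArray();

    // Wait grows by a constant step: initial, initial + step, initial + 2 * step, ...
    public static TimeSpan[] Incremental(int retries, TimeSpan initial, TimeSpan step) =>
        Enumerable.Range(0, retries)
            .Select(i => initial + TimeSpan.FromTicks(step.Ticks * i))
            .ToArray();

    // Wait doubles before each retry: first, 2 * first, 4 * first, ...
    public static TimeSpan[] Exponential(int retries, TimeSpan first) =>
        Enumerable.Range(0, retries)
            .Select(i => TimeSpan.FromTicks(first.Ticks << i))
            .ToArray();

    private static string Format(TimeSpan[] delays) =>
        string.Join(", ", delays.Select(d => d.TotalSeconds));

    public static void Main()
    {
        var oneSecond = TimeSpan.FromSeconds(1);
        var twoSeconds = TimeSpan.FromSeconds(2);

        Console.WriteLine(Format(Fixed(4, oneSecond)));                  // 1, 1, 1, 1
        Console.WriteLine(Format(Incremental(4, oneSecond, oneSecond))); // 1, 2, 3, 4
        Console.WriteLine(Format(Exponential(4, twoSeconds)));           // 2, 4, 8, 16
    }
}
```

Growing delays give a struggling or throttling service more room to recover than a fixed interval does, at the cost of a longer worst-case wait.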
Apart from the built-in strategies, you can extend and modify the Transient Fault Handling Application Block by implementing:
Installation
To install this library, search for the keyword `topaz` on nuget.org and install the package labeled “Enterprise Library – Transient Fault Handling Application Block”,
or simply install it from the NuGet Package Manager console:
Install-Package EnterpriseLibrary.TransientFaultHandling.Data
Transient Fault Handling for SQL Azure database
Using ADO.NET
If you are using ADO.NET, the Transient Fault Handling Application Block provides the retry logic for you.
- Add a using statement for TransientFaultHandling:

      using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

- Create the default `RetryManager`:

      const string defaultRetryStrategyName = "standard";
      const int retryCount = 10;
      var retryInterval = TimeSpan.FromSeconds(3);

      // Register a fixed-interval strategy as the application-wide default.
      var strategy = new FixedInterval(defaultRetryStrategyName, retryCount, retryInterval);
      var strategies = new List<RetryStrategy> { strategy };
      var manager = new RetryManager(strategies, defaultRetryStrategyName);
      RetryManager.SetDefault(manager);

- Create a `ReliableSqlConnection` that respects the retry settings you have specified in `RetryPolicy`:

      // Perform your queries with retries
      var connStr = "your connection string";
      var policy = new RetryPolicy<SqlDatabaseTransientErrorDetectionStrategy>(3, TimeSpan.FromSeconds(5));

      using (var conn = new ReliableSqlConnection(connStr, policy))
      {
          conn.Open();

          var cmd = conn.CreateCommand();
          cmd.CommandText = "SELECT COUNT(*) FROM MyTable"; // MyTable is a placeholder table name

          // Retry the command itself, not only the opening of the connection.
          var result = cmd.ExecuteScalarWithRetry(policy);
      }
Using EF 6+
If you are using EF 6+, retry logic for transient faults is built into the framework, which provides four execution strategies:
- DefaultExecutionStrategy: this execution strategy does not retry any operations. It is the default strategy for databases other than SQL Server.
- DefaultSqlExecutionStrategy: this is an internal execution strategy that is used by default for SQL Server. It does not retry any operations at all, but it wraps any exceptions that could be transient to inform users that they might want to enable connection resiliency.
- DbExecutionStrategy: this class is suitable as a base class for other execution strategies, including your own custom ones. It implements an exponential retry policy, where the initial retry happens with zero delay and the delay increases exponentially until the maximum retry count is hit. It has an abstract ShouldRetryOn method that derived execution strategies implement to control which exceptions should be retried.
- SqlAzureExecutionStrategy: this execution strategy inherits from DbExecutionStrategy and retries on exceptions that are known to be possibly transient when working with SQL Azure.
Enabling an Execution Strategy
Once your project contains an EF 6 model, create a new class that derives from DbConfiguration and sets the execution strategy in its constructor.
EF 6 will look for classes that derive from DbConfiguration in your project and use them to provide resiliency.

    using System;
    using System.Data.Entity;
    using System.Data.Entity.SqlServer;

    public class EFConfiguration : DbConfiguration
    {
        public EFConfiguration()
        {
            var maxRetryCount = 10;
            // Upper bound on the delay between retries, not a fixed interval.
            var maxDelay = TimeSpan.FromSeconds(3);

            SetExecutionStrategy(
                "System.Data.SqlClient",
                () => new SqlAzureExecutionStrategy(maxRetryCount, maxDelay));
        }
    }

The `SqlAzureExecutionStrategy` retries the operation instantly the first time a transient failure occurs, then waits longer before each subsequent retry until the maximum retry count (in this case, 10) is exceeded. The second constructor argument caps the delay between retries (in this case, 3 seconds).
Known limitations of Retrying Execution Strategies can be found here.
Transient Fault Handling for Azure Storage
As of Azure Storage Client Library 2.0, retries are built in, and the library applies sensible defaults without any special steps.
When you use the client library to access blobs, tables or queues, for example, you can control the back-off strategy, the delay and the number of retries for each operation.

    using System;
    using System.Linq;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Auth;
    using Microsoft.WindowsAzure.Storage.Blob;
    using Microsoft.WindowsAzure.Storage.RetryPolicies;

    // Authenticate the storage account
    string accountName = "<account-name>";
    string accountKey = "<account-key>";
    var storageAccount = new CloudStorageAccount(new StorageCredentials(accountName, accountKey), true);

    // Set the RetryPolicy for your blobClient: exponential back-off, up to 10 attempts
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    blobClient.DefaultRequestOptions.RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(3), 10);

    // Access the container and work on something...
    CloudBlobContainer blobContainer = blobClient.GetContainerReference("<container-name>");
    if (blobContainer.Exists())
    {
        var blobs = blobContainer.ListBlobs().ToList();
        // .....
    }

For further details on configuring and using retry policies for Azure Storage, you can read this excellent article here.