Anisbert Suárez
4 min read · Apr 14, 2021

Are our performance tests realistic?

The performance tester is challenged to mitigate as many performance risks as possible and thus avoid very expensive incidents. For this reason, they design performance tests that simulate the most realistic operational and load profiles that the budget for infrastructure, effort and time allows. This time, my cost assessments will be mainly in terms of technological infrastructure.

In the following image, I summarize what, based on my experience, I consider relevant for the implementation and execution of realistic tests. There may be more influencing factors, depending on the context of the application to be tested.

When I speak of an operational profile, I mean the model of how users exercise a system: specifically, the probabilities of occurrence of function calls and the distributions of parameter values. It can range from the simplest case, a single service (where other services are understood as users or clients), to a complete flow of transactions where the users are specific roles. It is important to select the profiles to simulate based on an analysis of their execution frequency and their impact on, or representativeness of, the application's performance. Representativeness matters because we sometimes confuse the scope of these tests with that of functional tests, and they are not the same. Although they overlap in some ways, implementing every profile, every scenario, every basic or alternate flow, and every valid and invalid equivalence class is not the objective.
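The probabilities of occurrence mentioned above can be simulated with a simple weighted choice. The sketch below is illustrative only: the transaction names and weights are hypothetical, not taken from any real system, and real tools (JMeter, Gatling, etc.) offer their own throughput controllers for this.

```python
import random

# Hypothetical operational profile: each entry maps a transaction
# to its observed probability of occurrence in production.
OPERATIONAL_PROFILE = {
    "search_product": 0.55,
    "view_details": 0.30,
    "add_to_cart": 0.10,
    "checkout": 0.05,
}

def next_transaction(rng=random):
    """Pick the next simulated user action according to the profile."""
    names = list(OPERATIONAL_PROFILE)
    weights = list(OPERATIONAL_PROFILE.values())
    return rng.choices(names, weights=weights, k=1)[0]

# A virtual user's session is then a stream of weighted choices:
session = [next_transaction() for _ in range(10)]
```

Over a long enough run, the mix of executed transactions converges to the production frequencies, which is exactly the representativeness the profile is meant to capture.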

Although the test data sets are part of the operational profile, I prefer to highlight them and analyze them individually because of their relevance and particularities. We must ensure that we have a representative data set in terms of quantity, distribution and consumption. For this, it is essential to analyze production behavior exhaustively, if the application is already available to users, or to be clear about the business flows, if it is a new application. In either case, it is necessary to identify the data distribution in the period taken as reference, understand which operational profiles could be influencing the 95th and 99th percentiles, and understand the particularities of the data layer in terms of architecture, design and behavior in the reference period.
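Extracting those 95th and 99th percentiles from reference-period measurements can be done with the standard library alone. The response-time values below are invented for illustration; a real analysis would parse production access logs for the chosen period.

```python
import statistics

# Hypothetical response times (ms) extracted from production logs
# for the reference period; the values are illustrative only.
response_times_ms = [120, 95, 110, 480, 105, 130, 900, 115, 125, 100,
                     140, 135, 98, 102, 1500, 108, 112, 118, 122, 128]

# quantiles(n=100) yields the 1st..99th percentile cut points.
cuts = statistics.quantiles(response_times_ms, n=100)
p95, p99 = cuts[94], cuts[98]
```

Correlating which transactions (and which input data) produced the samples above the p95 cut is what links the tail latencies back to specific operational profiles.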

Think time represents the user’s reflection time when naturally executing the flow of actions in the application. It is important to include it in performance tests, because it brings us closer to the actual behavior in production with respect to concurrency of function execution and active sessions. We often ignore this aspect and drift away from reality by generating more concurrent user sessions than are actually processed in the reference time period. There are tools, such as JMeter, that allow you to record the sequence of actions including the think time, and to define a range of values between which the times oscillate.
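A think time drawn uniformly from a configured range, similar in spirit to JMeter's Uniform Random Timer, can be sketched as below. The bounds are hypothetical; in practice they come from the recorded sessions of real users.

```python
import random
import time

def think_time(min_s=2.0, max_s=8.0, rng=random, sleep=time.sleep):
    """Pause the virtual user for a random interval, mimicking the
    reflection time a real user takes between actions.
    Returns the pause actually applied, in seconds."""
    pause = rng.uniform(min_s, max_s)
    sleep(pause)
    return pause
```

A virtual user script would call `think_time()` between consecutive actions; injecting `sleep` makes the function testable without actually waiting.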

Ramp up must be consistent with the throughput curve to be simulated. To implement a correct ramp up, virtual users should be incorporated gradually until the desired concurrency is achieved. Then, a stabilization time should be defined, to keep the load at the desired level and be able to enter the measurement period. At the end of the test comes the closing period, where the number of users is gradually decreased to keep the system stable.
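The three phases described above, linear ramp-up, a stabilization plateau, and a gradual ramp-down, can be expressed as a simple load-shape function. All durations and the peak user count below are illustrative assumptions.

```python
def active_users(t, ramp_up=300, steady=1200, ramp_down=120, peak=200):
    """Number of concurrent virtual users at second t of the test:
    linear ramp-up, a stabilization/measurement plateau, then a
    gradual ramp-down. All durations are in seconds."""
    if t < ramp_up:
        return round(peak * t / ramp_up)
    if t < ramp_up + steady:
        return peak
    if t < ramp_up + steady + ramp_down:
        remaining = ramp_up + steady + ramp_down - t
        return round(peak * remaining / ramp_down)
    return 0
```

Measurements should only be taken from the plateau, after the stabilization time, so that results are not distorted by the transient ramp phases.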

To generate load with an adequate throughput, the time period to simulate must be selected taking into account both the average and the maximum throughput experienced. Without being absolute, I would say that the duration of the test is directly proportional to the magnitude of the target throughput. We must increase the load at levels that do not destabilize the system, so generating more load requires more ramp-up time.
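One common way to translate a target throughput into a number of virtual users is Little's Law, N = X · (R + Z), where X is the throughput, R the average response time and Z the think time; the ramp-up can then be sized from the user count. The step sizes below are illustrative assumptions, not a prescription.

```python
import math

def required_virtual_users(target_tps, avg_response_s, avg_think_s):
    """Estimate concurrent virtual users from a target throughput
    using Little's Law: N = X * (R + Z)."""
    return target_tps * (avg_response_s + avg_think_s)

def ramp_up_seconds(users, users_per_step=10, step_interval_s=15):
    """Spread user arrival across small steps so the load increase
    does not destabilize the system; more users => longer ramp-up."""
    steps = math.ceil(users / users_per_step)
    return steps * step_interval_s
```

For example, sustaining 50 transactions per second with a 0.5 s response time and 5.5 s of think time requires about 300 concurrent users, which in turn dictates how long the gradual ramp-up must last.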

Applications with an international scope usually evaluate the cost vs. benefit of implementing performance tests with geographically distributed load. Undoubtedly, this factor takes us a step closer to realism, but it generates relevant costs that require evaluating the probability and impact of the associated risks. Many companies find it more profitable to use, on demand, cloud-based load generation tools, which generally allow load to be generated from several locations. Otherwise, the cost of infrastructure dedicated to performance testing can grow considerably.

Up to this point, we have focused on aspects of the test strategy, but the system under test is just as relevant, or more so. Everything would be in vain if the environment we test against is not in the same conditions as the live environment, or if the differences are not controlled and the results are incorrectly interpreted. Among the conditions to consider are the architecture, the infrastructure and the population of the database, to mention just a few that may have an influence. Practices vary as to which environment is used: some companies maintain an environment dedicated to performance tests, others test against staging, and several run tests, in a controlled manner, against the production environment itself. The decision comes down to what the organization can afford.

Senior Performance Engineer. Master’s Degree in Software Quality. ISTQB® Certified Tester Performance Testing