
Create or copy test data?

29 August 2022 | 6 August 2024

Software testing is a core component of any software development process. It is a complex and varied field, with different types of testing suited to different requirements, environments, and audiences. However, the need for test data is constant across every testing type.

Testers have two options for obtaining test data: creating new data or using sanitised production data. Each option has its advantages and disadvantages in different test scenarios. In this post, let’s dig a bit deeper to identify the better approach for obtaining test data.

What is creating test data?

Creating test data is the process of generating a data set tailored for the specific test case or scenario. The testers themselves make this data according to the needs of the test. Its complexity and variety will change depending on the exact requirements.
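
As a rough illustration of what a tailored data set can look like, here is a minimal Python sketch that generates synthetic customer records. The `Customer` fields, the name pools, and the `make_customers` helper are all hypothetical and chosen only for this example; a real data set would follow the schema of the system under test.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Customer:
    customer_id: int
    name: str
    email: str
    date_of_birth: date


# Hypothetical name pools; none of these values come from real users.
FIRST_NAMES = ["Alex", "Sam", "Priya", "Chen", "Maria"]
LAST_NAMES = ["Smith", "Patel", "Nguyen", "Garcia", "Brown"]


def make_customers(count: int, seed: int = 42) -> list[Customer]:
    """Generate `count` synthetic customers; the same seed gives the same data."""
    rng = random.Random(seed)  # local generator, so results are repeatable
    start = date(1950, 1, 1)
    customers = []
    for i in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # Pick a birth date within roughly 50 years of the start date.
        dob = start + timedelta(days=rng.randrange(0, 365 * 50))
        customers.append(
            Customer(
                customer_id=i,
                name=f"{first} {last}",
                email=f"{first}.{last}{i}@example.test".lower(),
                date_of_birth=dob,
            )
        )
    return customers
```

Seeding the generator is a design choice rather than a requirement: it keeps the data identical between runs, which makes a failing test easier to reproduce.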

Advantages of creating test data:

  • It can produce data sets that exactly match the testing requirements.
  • Creating test data is a relatively straightforward process, regardless of the complexity of the requirements.
  • It can be easily modified and extended when needed.
  • There are no security or compliance risks as this data does not contain any sensitive information.
  • You do not need to implement a data sanitisation process before using this data for testing.
  • Test data created in-house reduces test implementation time.

What is copying test data?

This approach uses existing data as the core data set for tests. Typically the data comes from a production environment, so using it as is risks breaching privacy legislation because Personally Identifiable Information (PII) is not hidden or masked. PII, such as personal details, addresses, and dates of birth, must be anonymised before the data is used for testing. This process is referred to as sanitisation.
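
To make the idea of sanitisation more concrete, the sketch below masks the personal fields of a single record in Python. The field names (`name`, `email`, `address`, `date_of_birth`), the `pseudonym` and `sanitise_record` helpers, and the choice of a salted SHA-256 hash are assumptions made for illustration. Whether this level of masking satisfies a given privacy regulation depends on the data and the jurisdiction, so treat it as a starting point only.

```python
import hashlib
from datetime import date


def pseudonym(value: str, salt: str) -> str:
    """Return a stable token derived from a sensitive value and a secret salt."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def sanitise_record(record: dict, salt: str) -> dict:
    """Return a copy of `record` with the personal fields masked."""
    clean = dict(record)  # copy, so the original record is left untouched
    token = pseudonym(record["email"], salt)
    clean["name"] = f"user_{token}"
    clean["email"] = f"{token}@example.test"
    clean["address"] = "REDACTED"
    # Keep only the birth year so age-based logic can still be tested.
    clean["date_of_birth"] = date(record["date_of_birth"].year, 1, 1)
    return clean
```

Because the token is derived from the original value, the same person maps to the same masked identity on every run, which keeps relationships between records intact without exposing the underlying details.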

Advantages of copying test data:

  • It is real-world production data that can simulate exact production use cases.
  • This data can identify production bugs and be invaluable in optimisation and stress testing.
  • The production data is reusable across multiple test cases.
  • Production data can provide valuable insights into application behaviour beyond the test scope.

Creating data vs. copying data?

Now that we have a basic idea about each method for obtaining test data, let’s compare them to figure out the better approach. This comparison comes down to two primary factors: the use cases of the data and the ease of obtaining it.

1. Utilising test data

Creating test data is the more straightforward approach and is the only option if you need to test a new feature. Newly created data caters to the specific test case. It gives testers more flexibility, as they can produce not only data that matches the requirement but also deliberately invalid data to test the resilience of the software. This way, a tester can implement numerous test cases covering various use cases.
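
A small Python sketch of this idea follows. It assumes a hypothetical rule for a new feature (an age field that must be a whole number from 18 to 120) and a hypothetical `validate_age` function standing in for the code under test. The point is the hand-made data: one list of values that meet the requirement and one list of deliberately invalid values.

```python
from typing import Optional


# Hypothetical rule for a new feature: age must be a whole number from 18 to 120.
def validate_age(raw: str) -> Optional[int]:
    """Return the age as an int when `raw` is valid, otherwise None."""
    try:
        age = int(raw.strip())
    except ValueError:
        return None  # not a whole number at all
    return age if 18 <= age <= 120 else None


# Hand-made test data: values that meet the requirement...
VALID_AGES = ["18", "42", "120", " 30 "]
# ...and deliberately invalid values to probe the software's resilience.
INVALID_AGES = ["17", "121", "-5", "", "abc", "4.5"]


def run_checks() -> None:
    """Raise AssertionError if any value is handled incorrectly."""
    for raw in VALID_AGES:
        assert validate_age(raw) is not None, f"expected {raw!r} to be accepted"
    for raw in INVALID_AGES:
        assert validate_age(raw) is None, f"expected {raw!r} to be rejected"
```

The invalid list covers boundary values (17 and 121), a negative number, an empty string, and non-integer input. Production data would be unlikely to contain all of these cases for a feature that has not shipped yet.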

It is improbable that any production data will match the requirements for testing a new feature. By contrast, production data is invaluable if the test cases involve improvements or extensions to existing functionality.

Production data is the way to go when the tester wants to understand production behaviour better or needs to retest a production bug. On the other hand, copied production data can be more restrictive because it is not tailored to the test cases. Where the copied data is not a good match, it must be modified before it can be used.

2. Obtaining test data

Again, creating test data from scratch is the more straightforward method, especially if there are no constraints on the data needed. Meanwhile, copying data is a more involved process. It requires users to implement a proper method to copy data safely and efficiently from a production environment to a test environment. Furthermore, they need to sanitise the data so that no user-identifying or system-identifying information is exposed, even to internal users.

Creating data will be faster than copying data for some tests, as copying can be a complex and relatively lengthy process. The complexity and time involved can be mitigated by automating the copying and sanitisation process. For example, an organisation can maintain a replica of a production database in a test environment that automatically copies and sanitises data on a predetermined schedule. This gives users an up-to-date test data set aligned with the production data.
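
The copy-and-sanitise step of such a pipeline could look something like the Python sketch below, which uses the standard-library `sqlite3` module. The `users` table, its columns, and the `mask` and `refresh_test_db` helpers are assumptions made for this example. Scheduling is left out: in practice a job scheduler or CI pipeline would call the refresh function at the chosen interval.

```python
import hashlib
import sqlite3


def mask(value: str, salt: str) -> str:
    """Replace a sensitive value with a short token derived from a salted hash."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:10]


def refresh_test_db(
    prod: sqlite3.Connection, test: sqlite3.Connection, salt: str
) -> int:
    """Rebuild the test `users` table from production, masking personal fields.

    Returns the number of rows copied.
    """
    # Start from a clean table so repeated runs do not accumulate stale rows.
    test.execute("DROP TABLE IF EXISTS users")
    test.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, plan TEXT)"
    )
    rows = prod.execute("SELECT id, name, email, plan FROM users").fetchall()
    # Mask name and email; keep non-personal columns such as `plan` unchanged.
    sanitised = [
        (uid, f"user_{mask(email, salt)}", f"{mask(email, salt)}@example.test", plan)
        for uid, _name, email, plan in rows
    ]
    test.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", sanitised)
    test.commit()
    return len(sanitised)
```

Sanitising the rows before they are written means unmasked values never reach the test database. A real pipeline would also need to handle larger volumes, multiple tables, and foreign keys, none of which this sketch attempts.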

Automation significantly reduces the time to obtain (copy) data. As the data is readily available, testers can directly modify the test data to suit their specific needs or use it as is. That process will significantly reduce the workload of the testers in the long term, even if it requires a higher initial investment to implement. It can even reduce or eliminate the need to create test data in most use cases as the data is directly available in the test environment.

Conclusion: Which is better?

Both approaches bring tangible benefits to the testing process and are invaluable in different use cases. So, the better strategy is not selecting one over the other but using a combination of both approaches – it will lead to more significant benefits in the delivery pipeline.

You can create new test data for a new feature or test vector while using copied test data for bug fixes, improvements to existing features, etc. It enables users to combine the best aspects of both approaches, such as the flexibility of creating data and the ability to simulate real-world conditions using copied production data.

The choice of test data must be re-evaluated in every phase of a multi-phase product development cycle. So, keep an eye on it, and reach out to the luvo Testing team via [email protected] for any support incorporating data into your software testing.
