ETL Performance

Performance of any application is affected by the environment in which it runs, variables such as host server CPU performance, Operating System performance, amount of available main memory, other applications running on the same host etc. The following is an indication of the ETL performance and should not be construed as a guarantee.

The architecture of the ETL is divided into three components, The Source System Adaptor ( Extractor ), The Mapping and Transformer Engine and The Target System Adaptor ( Loader ), each of these three components run in separate processing threads. In addition to this, each component can be configured via the ETL Configuration Client to operate as multiple threads to improve throughput, in the test cases outlined below, all components are running in single thread mode. The reasoning behind this is that multiple threads can be a blessing or a curse, specifying too many can result in a degradation in performance, specifying too few can result in a loss of throughput. The ETL Configuration Client provides for this to be tailored on an individual basis to fit the client's setup as required.

The available memory can also affect the performance of an application, the ETL Configuration Client provides the mechanism for this to be specified on an individual basis allowing the memory allocation to be tailored to provide the best configuration for each setup.

Host Server Configuration:

The host server is a dual core 3GHz Intel I3 processor running the Linux Operating System with 4 Gbytes of main memory. The host also operates as a database server running Oracle 10g Enterprise database.

Test Cases:

  • Database to Fixed Field Width File: This is a simple SQL query extracting data from the database and transforming this data into Fixed Field Width data file. This case takes 1 minute and 20 seconds to process 1,000,000 records.
  • Fixed Field Width data file to Database table: This reads the data file and loads the data into a database table. This case takes 40 seconds to load 500,000 records.
  • Delimited data file ( CSV ) to Database table: This reads the data file and loads the data into a database table. This case takes 30 seconds to load 500,000 records.
  • Spreadsheet data file ( XLS ) to Database table: This reads the data file and loads the data into a database table. This case takes 30 seconds to load 60,000 records.
  • Database to Database: This is a simple SQL query extracting data from the database, each record then has a Transformer class applied which generates 24 records for each data record read before the resulting data is loaded into a database table. An example of the use of this transformer could be where the source data relates to a customer loan and the recipient data requires a data record for each repayment. This case takes 100,000 loan records from the source database and loads 2,400,000 repayment records into the recipient database table in 2 minutes and 30 seconds.
DEX Software Limited 2006