DTS vs SSIS A basic overview



In today’s blog post, we will give a basic introduction of SSIS

(SQL Server Integration Services) and provide a comparison of some of

those features with DTS (Data Transformation Services), it’s

predecessor.

SSIS is an ETL tool provided by Microsoft for extracting the data

from various sources, transforming the data according to the business

requirements and loading the data into the destination. The source and

the destination could be anything that you can talk to using a

connection. One important thing to understand is that SQL Server

Integration Services (SSIS) is a component of Microsoft SQL Server 2005

and is not a separate install. Integration Services includes graphical

tools and wizards for building and debugging packages, tasks for

performing workflow functions such as FTP operations, for executing SQL

statements, or for sending e-mail messages, data sources and

destinations for extracting and loading data, transformations for

cleaning, aggregating, merging, and copying data.

In SQL Server 2000, DTS packages are created in Enterprise Manager.

The DTS packages are created in the DTS designer and you have the option

of saving them in SQL Server or saving them to the filesystem (by

specifying the location as “Structured Storage File”). In Integration

Services, the packages are created using Business Intelligence

Development Studio (BIDS). BIDS is nothing but another version of

Microsoft Visual Studio. It is a modified version of Visual

Studio.  When we create Integration Services packages using BIDS,

they are saved on the client machine (local machine). We have to

explicitly deploy the Integration Services Packages onto the destination

server.

The following steps show how you can launch BIDS:

Go to: Start –> All Programs –> Microsoft Sql Server 2005 –> Business Intelligence Development Studio

ssis_1.jpg

The basic organization concept in DTS as well as SSIS is the concept

of a package. A package is essentially a collection of SSIS objects

which includes:

a) Connections to the data sources,

b) Data Flows,

c) Control Flows, and

d) Event Handlers

We will discuss these shortly. Let’s first look at the DTS Package Designer and BIDS to get an idea of the IDE layout.

Package Design Pane:

The following images show the DTS and SSIS design panes respectively.

ssis_2.jpg

ssis_3.jpg

The striking difference when one sees both the designers is the way the designer is designed.

In DTS, the designer consists of a single pane. And to the right we

see two options: the Connections and the Workflow tasks. The Connections

are used to make the connections to a wide variety of data sources and

destinations like Excel, Flat Files, Sql Server and Access essentially

anything that we have an ODBC, OLEDB, Jet driver etc. connectivity to.

The Work Flow tasks add the functionality to the package. The thick line

or what we call the data transformation task between the source and the

destination in the above example transforms the data and loads into the

destination. One can write custom code for the transformations or

simply use the inbuilt capabilities of the data transformation task.

In SSIS, the designer is split into 4 design panes:

a) Control Flow,

b) Data Flow,

c) Event Handlers, and

d) Package Explorer.

The control flow pane is used to design the workflow inside

the package. There is also a great improvement in the number of tasks

and additional functionality has been added to some of the existing

tasks. The control flow is like a program flow. All the workflow tasks

in the DTS come under the Control Flow tasks in SSIS except the Data

Transformation Task.

The Data Transformation Task in DTS has been converted into Data Flow

task. As the name suggests, the Data Flow task handles the flow of

data. Microsoft has introduced a wide variety of data flow tasks in SSIS

for building an entire ETL. All the data transformation from the

source to the destination takes place in this Data Flow task. Examples

of Data Flow Task are Multicast, OLEDB Command, Sort transformation,

Pivot transformation, Unpivot transformation etc.

Event Handlers is something new to SSIS. Event handlers

respond to various events raised in the package like on error, on

warning, on pre execute and on post execute. When an event is raised

then the control goes to the Event Handler Pane, and the package does

whatever is instructed in the Event Handler Pane. Again Event Handlers

Contain the Control Flow Tasks because Control Flow determines the

workflow of the package.

Package Explorer is also a new pane in SSIS. It gives the overall view of the package.

ssis_4.jpg

The image shown above shows the package explorer. As said earlier,

the package explorer gives the over all gist of the entire package like

Executables, Precedence Constraints, Connection Managers, Event Handlers

and the Variables. Precedence Constraints and Connection Managers are

described later in the post.

The Solution Explorer:

The Solution Explorer is also a new concept in SSIS. The solution

explorer consists of the SSIS packages, connections and Data Source

Views (DSV). The solution contains a series of projects. Each project

contains a series of objects used in the projects. In DTS we were

allowed to create only independent packages. There was no possibility to

store the related packages of the module in a single place. The

Solution Explorer allows us to do that. We can have multiple packages

relation to one module in a single Solution.

Data Source View (DSV)

This is also new to SSIS. A data source view is a document that

describes the schema of an underlying data source. A data source view

provides a named, browseable, persisted selection of database objects

that can be used to define sources, destinations, and lookup tables for

SQL Server 2005 Integration Services (SSIS) tasks, transformations, data

sources, and destinations. Example: In some ERPs like SAP or even in

some databases, the columns might be named like A0012, A0013, AABE etc.

Naming the columns this way is typically done by some designers for

obfuscation (especially in packaged applications) but does not help the

SSIS developers. As far as the SSIS developer is concerned it is just

another column on which data transformations have to be applied. In this

case we can create a data source view over this data source and can

rename the columns in some meaningful way by not renaming the columns in

the data source. In other words we are overlapping the table with

another transparent table with change in the column names. So we see the

column names of the transparent table and the data of the original

table.

Connection Managers:

Connection Manager is also something new in SSIS. To understand this let us consider the following screen shot of DTS.

ssis_5.jpg

In this package we see two connections. The first one is to the

database on the left and the other one is to the text file on the right.

What happens when we move this package from one environment to another

for example from development to QA? The connections to the databases and

the file locations change. Therefore we have to open up each package

and then each connection to configure it to point to the intended

environment. Typically each DTS package might contain up to 5-10

connections. And when we are moving a large number of packages from one

environment to another it becomes cumbersome. Please do note that this

can easily be overcome by using dynamic properties and disconnected

edits in DTS and we can either pick up the connection information from a

file at run-time or even from a configuration table from a database but

one needed to code for it or the configuration wasn’t that straight

forward.

Now, let’s see how the connection manager helps in making this

process easy in SSIS. Let us understand this with the help of an SSIS

example shown in the image below:

ssis_6.jpg

In the above package we see two connections and one database

connection. In SSIS, we have a centralized place where we can configure

the connection to a particular data source or data destination called

connection managers. All the connections inside the package make use of

these connection managers to make the connection to the data sources and

data destinations. As in DTS, SSIS also provides a wide variety of

connection managers to make connections to a variety of sources and

destinations. So when moving from one environment to another, we should

open up the SSIS package and just make the change only in one place,

which are the connection managers as opposed to multiple places as in

DTS. This saves us time and also is less cumbersome. In a little while

we will see how to dynamically alter the connections even without

opening up the packages. This is done with Configurations.

Variables:

Variables are a powerful piece of the SSIS architecture; they allow

you to dynamically control the package at runtime. There are two types

of variables: system and user. System variables are ones that are built

into SSIS, whereas user variables are created by the SSIS developer.

Variables can also have varying scope, with the default scope being the

entire package. They can also be set to be in scope of a container,

task, or event handler inside the package. The addition of scope to

variables is the main differentiating factor between SSIS variables and

DTS global variables.

Configurations:

SSIS provides us with a way of altering the connection information

and also the variable values at run time. This is done by using the

Configuration Files. The following screen shot shows us how to open up

the configuration editor in SSIS.

ssis_7.jpg

ssis_8.jpg

Right click on the control flow pane and click on the Configurations

menu. This will open up the configuration wizard. This is also

something which did not exist in the DTS. We will explore more of

package configurations and how to create configuration files in the

upcoming posts.

Transactions:

All Microsoft SQL Server 2005 Integration Services (SSIS) container

types—packages, the For Loop, For each Loop, and Sequence containers,

and the task hosts that encapsulate each task—can be configured to use

transactions. Integration Services provides three options for

configuring transactions: NotSupported, Supported, and Required.

Required indicates that the container starts a

transaction, unless one is already started by its parent container. If a

transaction already exists, the container joins the transaction

Supported indicates that the container does not start a

transaction, but joins any transaction started by its parent container.

For example, if a package with four Execute SQL tasks starts a

transaction and all four tasks use the Supported option, the database

updates performed by the Execute SQL tasks are rolled back if any task

fails. If the package does not start a transaction, the four Execute SQL

tasks are not bound by a transaction, and no database updates except

the ones performed by the failed task are rolled back.

NotSupported indicates that the container does not start a

transaction or join an existing transaction. A transaction started by a

parent container does not affect child containers that have been

configured to not support transactions. For example, if a package is

configured to start a transaction and a For Loop container in the

package uses the NotSupported option, none of the tasks in the For Loop

can roll back if they fail.

In this post, we covered a brief introduction to SSIS, its important

features and how SSIS differs from its predecessor: DTS. In future

posts, we will discuss each task in detail and will also cover the best

practices as well as go into the usage of SSIS in building up business

intelligence applications like explaining how to handle slowly changing

dimensions etc..