Docker Containers for SQL Server shared database development


Whilst most application development environments utilise local databases running on individual developer machines, some projects, such as ETL and data warehousing, may require working with shared databases, where the databases run on a shared server and are used by many developers working on multiple projects.

In these circumstances the developers have to work very carefully and in close co-ordination to ensure that they don't inadvertently undo each other's work. If a developer has to work in isolation, they will often make a temporary copy of the database on the shared server and work on that separately. This can lead to a lot of unwanted branch databases lying around the server, which become highly out of sync with the trunk version.

Now that Microsoft has made Docker images available for SQL Server 2016 and 2017 Developer Edition, there is a much better solution for managing shared database development.

I am documenting here the process I went through to set up such an environment with a set of sample databases. For simplicity's sake, I have only created a very basic database structure and not used any complex models.

The Docker engine for Windows can run native containers on Windows Server 2016, and on Windows 10 Pro or Enterprise editions with Hyper-V enabled.

For the purpose of this documentation work, I am running Docker on my Windows 10 Pro machine with Hyper-V enabled, and my development environment is based on SQL Server 2016 Developer Edition.

So first I downloaded the Docker Image for SQL Server 2016 Developer from Docker Hub as follows:
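The pull command itself is not shown above; it would look something like the following, although the exact repository name and tag for the Windows-based SQL Server 2016 Developer image should be checked on Docker Hub.

docker pull microsoft/mssql-server-windows-developer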



The database development environment consists of a database that represents the Trunk branch (the production version) of the sample database application, which I call APP1DB_Trunk.

In order to create a Docker image for this database I took copies of the .MDF and .LDF files of the existing APP1DB database and stored them locally in a folder named D:\PalProjects\Docker\sqldb1. Then, in the same folder, I created a text file named Dockerfile where the Docker image instructions are placed as follows:
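The Dockerfile contents are not reproduced above, so the following is a minimal sketch based on the description in the next paragraph. The base image name and tag, the .MDF/.LDF file names and the SA password are assumptions and should be replaced with the values for your own environment.

# Sketch of the Dockerfile; adjust the base image tag and file names to suit your environment
FROM microsoft/mssql-server-windows-developer

# Create a folder inside the image and copy the database files into it
RUN powershell -Command New-Item -ItemType Directory -Path C:/SQLServer
COPY APP1DB.mdf C:/SQLServer/
COPY APP1DB_log.ldf C:/SQLServer/

# SA password and end user licence acceptance
ENV sa_password=<YourStrongPassword>
ENV ACCEPT_EULA=Y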



The FROM command instructs Docker to use the SQL Server 2016 image, which includes its dependent Windows Server Core base. It then runs a PowerShell command to create a new folder named SQLServer on the container's C: drive, to which it copies the .MDF and .LDF database files from the local folder. It then sets two environment variables for the SA password and the end user licence acceptance indicator. At present I only intend to use SQL authentication with the SA user to connect to the SQL Server container.

Next, I changed directory (cd) to the folder where I had placed the Dockerfile and the database files and ran the following to create the Docker image of the database Trunk branch.
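The build command is not shown above; using the tag described below, it would look like this:

cd D:\PalProjects\Docker\sqldb1
docker build -t clt1/app1db_trunk:1.1-2016 .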



The -t parameter specifies the tag (name) given to the image. I have used a naming convention as follows:

<client id>/<database name>_<branch>:<database version>-<sql server version>

As such, my client ID is 1 (clt1), the database name is APP1DB, the branch is Trunk, the database version is 1.1 and the SQL Server version is 2016. This is the naming convention I have used in this sample, but you may follow your own convention. Also note that Docker image names must be all in lower case.

When this completes, I check the Docker image list with the following command:
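The command itself is not shown above; it is the standard Docker image listing command, which is referred to again later in this article:

docker image ls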



This lists all the images present in my repository as depicted below:




The instructions in the Dockerfile were processed when the image was built: the database files were copied from the build context directory (D:\PalProjects\Docker\sqldb1) into the C:\SQLServer directory, which is created inside the image by the PowerShell mkdir command. When a container is launched from this image, an instance of SQL Server 2016 should be running inside the container. However, the Dockerfile does not have any instructions to actually attach the .mdf and .ldf files to a database. This is given as a parameter in the following command, which launches the container from the image:
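The run command itself is not reproduced here; the following PowerShell sketch matches the parameters described in the next paragraph. The host port and the database file names inside attach_dbs are assumptions for illustration.

docker run -d --name APP1DB_dev -p 1433:1433 `
    -e attach_dbs="[{'dbName':'APP1DB','dbFiles':['C:\\SQLServer\\APP1DB.mdf','C:\\SQLServer\\APP1DB_log.ldf']}]" `
    clt1/app1db_trunk:1.1-2016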



This command launches the container from the clt1/app1db_trunk:1.1-2016 image given in the last parameter above. The --name parameter assigns the name APP1DB_dev to the container. -d launches the container in detached mode. -p maps a port number on the host (left side of the colon) to a port inside the container (right side of the colon). As mentioned earlier, the Dockerfile did not have any instructions for attaching the .mdf and .ldf database files to create a database within the container. This is achieved by the -e parameter, which sets the environment variable attach_dbs. The parameter value is a JSON string containing key/value pairs for the database name (dbName) and the database file and log file names (dbFiles) in a two-element array.

Any number of containers can now be launched from this single image. In addition to the container named APP1DB_dev, which I launched for development, I also launched a second container for testing with the same command, only with the --name parameter changed to APP1DB_uat.

To check which containers are currently running, issue the following command:
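That is the standard container listing command:

docker ps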



The currently running containers are listed as depicted below:





The container's configuration can be examined by using the docker inspect command:
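For example, to inspect the development container launched above:

docker inspect APP1DB_dev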



This will show the entire configuration as a JSON object. However, for the purpose of connecting to the SQL Server instance running inside the container, I only want to know the IP address that Docker has assigned to the container. The following parameter can be used to filter out the specific JSON item within the configuration:
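A sketch of the filtered inspect command; on Windows containers the network is typically named nat, but the network name used in the format string may differ in your environment.

docker inspect -f "{{ .NetworkSettings.Networks.nat.IPAddress }}" APP1DB_dev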



The above will show the IP address Docker has assigned to the container.




Note:- An important point to note is that the IP address displayed above is only visible within the host environment. This means it cannot be accessed from outside the host over the LAN. However, it is possible to configure the container's network to allow Docker to use the LAN's IP address range when assigning an address to the container.

For this documentation, I have not gone into this network configuration and have limited the activity to the host environment only. Visit the Docker reference documentation at https://docs.docker.com/engine/reference/run/ for detailed information on this topic. I shall update this document at a later date with how to configure the container network to use the LAN's IP address range.

So with that said, to connect to the SQL Server instance within the above containers, launch SQL Server Management Studio and connect as follows:
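In SSMS the server name is the container's IP address obtained above, and the authentication is SQL Server authentication with the SA user. As an alternative quick check from the host, something like the following sqlcmd call should also connect (the IP address shown here is purely hypothetical, and the password is the SA password set earlier):

sqlcmd -S 172.28.112.101 -U sa -P <YourStrongPassword>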

















After successfully connecting, it will be possible to see the database that was attached within the Object Explorer:

























Having connected to the database, I then added some data to TABLE2 like so:
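The screenshot of the insert statements is not reproduced here; the idea is simply a few INSERT statements against TABLE2. The column names below are hypothetical, since the sample table's schema is not shown in this article.

-- Hypothetical columns, for illustration only
INSERT INTO dbo.TABLE2 (Col1, Col2)
VALUES (1, 'Test row A'),
       (2, 'Test row B');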


















Now I have kept the best part of Docker containers for last. As I said at the beginning, when developing and testing in a shared database environment, there is often a need to have the database in a specific state to conduct a specific test, perhaps to run a lengthy ETL flow that could take several hours or days. This can be very tricky to co-ordinate with other developers and testers. You can always launch a new container from the trunk image, but that defeats the point of having the database in a state appropriate for the test (mostly because the test data would be lost). This is where committing a container's current state to a new image comes in.

Before being able to commit a running container, it must be stopped. It appears that this is only a limitation on Windows and does not apply to Unix platforms; hopefully this is something that will be addressed in the future. To stop the running container, issue the following command:
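For the development container launched earlier, that would be:

docker stop APP1DB_dev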



With the container stopped, its current state can be committed to a new image with the following command:
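The commit command is not shown above; given the image name discussed in the next paragraph, it would look like this:

docker commit APP1DB_dev clt1/app1db_dev_test2:1.2-2016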



This creates a new image by the name clt1/app1db_dev_test2:1.2-2016. As per my naming convention, this refers to a development branch of the APP1DB database used specifically for test 2. Since I could also have added new objects such as tables, views or stored procedures, I have increased the version to 1.2. The -2016 still refers to the SQL Server 2016 version.

The new image can be seen in the repository with the 'docker image ls' command as used previously. This image now contains all the changes that were made so far in the development branch under the APP1DB_dev container. Now a new container can be launched from the newly committed image with the following command:
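A sketch of the run command, assuming the environment variables committed with the image (including attach_dbs) carry over; if they do not, pass the -e attach_dbs parameter again as before.

docker run -d --name APP1DB_dev_test2 clt1/app1db_dev_test2:1.2-2016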



This launches the new container with the name APP1DB_dev_test2. I then examined its IP address in order to connect to the SQL Server instance, as follows:
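Using the same inspect filter as before (the nat network name is again an assumption):

docker inspect -f "{{ .NetworkSettings.Networks.nat.IPAddress }}" APP1DB_dev_test2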




I can now connect to this IP address with SQL Server Management Studio and run the required tests in isolation. The following shows the new rows that I added to the development branch container before committing its image.




































Those rows can now be seen in the new container, and I can make any changes to this container, APP1DB_dev_test2, without affecting the other development branch container, APP1DB_dev.



SSIS unit testing with SSISUnit framework in Visual Studio

In this article, I will present a simple ETL application scenario that will be used to demonstrate the SSISUnit unit testing framework in action.

SSISUnit is an open source unit testing framework for SQL Server Integration Services, developed and maintained by Pragmatic Works. You can find more details about it at CodePlex.

There are several ways to use SSISUnit:
a) via the GUI Windows Application
b) via the Command Line Application
c) via the .Net API

Options a) and b) use XML files to store the test definitions, whilst option c) offers the most flexibility, using the API directly or in combination with XML definition files. In addition, option c) is the method that allows us to seamlessly integrate SSISUnit with the Visual Studio unit testing framework.

The Visual Studio unit testing framework offers many features for developer focused unit testing all within one single Integrated Development Environment. For more information about the Visual Studio unit testing framework see this MSDN site.

I am going to show you how I have integrated SSISUnit with the Visual Studio unit testing framework via option c) along with XML test definitions.

Development environment

First of all let me give you the details of the software I use and my development environment.

a) SQL Server 2012 Developer with Service Pack 3 running on a Windows 10 desktop (on a LAN)
b) Visual Studio 2012 Ultimate with update 5 running on a Windows 7 laptop (on a LAN)
c) SQL Server Data Tools 2012 (version 11.1) for Visual Studio
d) SQL Server Data Tools with Business Intelligence (version 11.0) for Visual Studio 2012

I have only developed this test application on the above specification, so I can't guarantee it will work with other specifications. However, SSISUnit is also available from CodePlex for the following environments:

* Visual Studio 2010 targeting SQL Server 2008
* Visual Studio 2013 targeting SQL Server 2014

Setup SSISUnit

Download the source code from CodePlex. The download is a zip file containing several solutions for the following versions of Visual Studio.

SSISUnit_SQL2008.sln :- Visual Studio 2010 targeting SQL Server 2008
SSISUnit_SQL2012.sln :- Visual Studio 2012 targeting SQL Server 2012
SSISUnit_SQL2014.sln :- Visual Studio 2013 targeting SQL Server 2014

The solution I have used for this work is SSISUnit_SQL2012.sln. The solution contains several projects, including ones for the GUI application, the console application and the API libraries. The project containing the API is SSISUnit_SQL2012.csproj. Note that the target framework for this project is .Net Framework 4, so ensure that this framework is installed on the development environment first. Building this solution successfully will produce the .dll files required to use the SSISUnit API. Please note the following .dll files and their locations, which will be referenced by the test application later.

<extract folder>\MAIN\SSISUnit\bin\Release\SSISUnit2012.dll
<extract folder>\MAIN\SSISUnit.Base\bin\Release\SSISUnitBase.dll

<extract folder> is the folder where you extracted the downloaded zip file.

Mixed mode assemblies

The SSISUnit framework uses several COM components which were developed with older versions (.Net 2.0 and earlier) of the .Net Framework. Prior to .Net Framework 4, these components would not work as mixed mode assemblies and needed to be recompiled with the same version of .Net as the hosting application. However, starting with .Net Framework 4, it is possible to leave them as they are and let them work in mixed mode, provided a setting is added to the Visual Studio testing framework configuration file as follows.

Edit the file named:
vstest.executionengine.x86.exe.config

This file is usually located in the following folder:
<local drive>\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\CommonExtensions\Microsoft\TestWindow\

Add the following setting:
<startup useLegacyV2RuntimeActivationPolicy="true">  </startup>

SSISUnit XML schema definition

With SSISUnit, we can create test definitions purely in XML. Although not strictly required, having a schema definition file (.xsd) helps a great deal in improving productivity and avoiding errors through its IntelliSense and validation features. To enable this in Visual Studio, copy the SSISUnit.xsd file located in the <extract folder>\MAIN\SSISUnit\ folder to the Visual Studio XML schema definitions folder located at <local drive>\Program Files (x86)\Microsoft Visual Studio 11.0\xml\Schemas\. Once this file is copied to this location, Visual Studio should automatically detect the schema.

That's all the setup that is required. Now we will walk through building a simple SSIS project and then add a unit test project using the Visual Studio testing framework.

Application scenario

Before proceeding further, a brief explanation of the sample application scenario may be worthwhile. So this is a simple ETL process which may be called by a master package; however, for the sake of simplicity, we will work with a single package. The scenario here is the loading of contacts data from a regular text file which is tab delimited and has the following column format.

Number :- A unique number for each contact of numeric type
Name :- The full name of the contact
Telephone :- A comma separated list of telephone numbers
Address :- The address of the contact
AddressDate :- The date when the contact was registered at the above address in yyyy-mm-dd format

The file will be named contacts.txt and delivered (perhaps by a master package) to a specific folder prior to executing the load package. Once successfully loaded this file could be removed, moved or renamed. For all intents and purposes, we are only interested in testing the SSIS package for loading the contacts from the source file to a destination database and not the file delivery and moving mechanism.

The file may contain duplicate records for the same contact, in which case the load process should de-duplicate them based on the latest date value in the AddressDate column. Where there are duplicates with identical values in all columns, it should take a single record with distinct values. The processed records should then be checked against the database using the contact number business key that is held in the Number column. Contact numbers that do not exist in the database are inserted as new records. Where they already exist, those records need to be updated as type 1 changes (the existing values are overwritten). The Telephone column may contain blank values, and this needs to be accommodated in the database.

We will limit our ETL process to these few simple business rules, even though we know that in reality they can become much more complex. These simple rules will give us more room to focus on understanding the SSISUnit framework and how to integrate it with the Visual Studio unit testing framework.

SQL database

Now it's time to create the SQL Server database and the tables within it. For this purpose, you can either create a new database or use an existing database.

Run the following set of scripts using SQL Server Management Studio or Visual Studio SQL Server Object Explorer.
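The scripts themselves are not reproduced above. The following is a sketch of what they might look like, based on the column and field names used later in this article (the stg_ prefixed staging columns, key_Contact and id_Number); the exact data types and sizes are assumptions.

-- Staging table; column names other than stg_Number are assumed from the naming pattern
CREATE SCHEMA staging;
GO
CREATE TABLE staging.Contacts (
    stg_Number       INT            NOT NULL,
    stg_Name         NVARCHAR(100)  NOT NULL,
    stg_Telephone    NVARCHAR(200)  NULL,
    stg_Address      NVARCHAR(400)  NOT NULL,
    stg_AddressDate  DATE           NOT NULL
);
GO
-- Destination dimension table
CREATE TABLE dbo.dim_Contacts (
    key_Contact  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    id_Number    INT            NOT NULL,
    Name         NVARCHAR(100)  NOT NULL,
    Telephone    NVARCHAR(200)  NULL,
    Address      NVARCHAR(400)  NOT NULL,
    AddressDate  DATE           NOT NULL
);
GO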


Once the database and the set of tables are created, we can proceed to develop the SSIS package.

Visual Studio solution

In Visual Studio create a new blank solution. I always prefer to start with a blank solution and add the required projects; this is only my preference and not a strict requirement. I have named my solution SampleSSISUnit.

To the solution add a new SSIS project and name the project SSISPackages or any other appropriate name. Rename the default package to LoadContacts.dtsx.

To the same solution add a new project; expand the Visual C# folder and select Test, then from the listed templates select Unit Test Project. Name the project SSISUnitTests. This will add a new Visual Studio unit test project.

To the SSISUnitTests project add a new C# class file named UnitTest_LoadContacts.cs. This is the file where we will write the code to drive the unit tests. We will also need to add a new .xml file to this project where we define the unit tests. Name this file UnitTest_LoadContacts.ssisUnit.

At this point, in order to use the SSISUnit API, we need to reference the .dll files we built earlier. Inside the SSISUnitTests project, right click on References and select the Add Reference option. Then browse to the location where the .dll files were built and add the following 2 files:

<extract folder>\MAIN\SSISUnit\bin\Release\SSISUnit2012.dll
<extract folder>\MAIN\SSISUnit.Base\bin\Release\SSISUnitBase.dll

In addition to the above .dll files, we also need to add a reference to the Microsoft.SQLServer.ManagedDTS library version 11.0.0.0 (for SQL Server 2012). This library can be found under Assemblies, Extensions in the Add Reference dialog. This is the library that allows the SSISUnit framework to execute SSIS packages from within the SSISUnitTests project.

The following is a snapshot of the Solution Explorer with the required projects and files added.



SSIS package

So now we can build this minimal SSIS package to first stage and then load the contacts.txt file into the SQL database.

Open the LoadContacts.dtsx package in Visual Studio and add the following 2 package-level variables:

pkDBConnectionString :- This string type variable will hold the OLEDB connection string to the destination SQL database.
pkSourceFilePath :- This string type variable will hold the file system path to the location where the contacts.txt file will be delivered.

Although not strictly required, holding this information in user variables allows a greater degree of flexibility, enabling us, when required, to inject different sets of source data from different locations and to point the destination database at different servers via the test definition .xml file, without having to change any code, as you will see later on.

Now we need to add the following 2 connection managers:

cnFlatFile_Contacts :- This is a Flat File connection to the source data file. It uses an expression that concatenates the pkSourceFilePath variable with the string "contacts.txt" to form the fully qualified file name, as follows: @[User::pkSourceFilePath] + "contacts.txt"

cnOLEDB_ContactsDB :- This is an OLEDB connection to the destination SQL database. It uses the pkDBConnectionString variable through an expression to construct the connection string.

We first clear the staging.Contacts table prior to staging the contacts source data. Add an Execute SQL task to the control flow, set the connection of this task to the cnOLEDB_ContactsDB connection manager and code the following SQL directly into it.
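This is the same truncate statement that is referenced again later in the unit test setup:

TRUNCATE TABLE staging.Contacts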



Then add a Data Flow task consisting of a Flat File source component and an OLEDB destination component. Set the Flat File source component to the cnFlatFile_Contacts connection manager, set the OLEDB destination component to the cnOLEDB_ContactsDB connection manager and select the staging.Contacts table from the drop-down. Make sure the source to destination field level mappings are correct.

Now add another Data Flow task to load the data from staging to the final table. Construct this data flow task as follows:


We start with an OLEDB Source component that is set to the cnOLEDB_ContactsDB connection manager. Code the following SQL command into this component:
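The command is not reproduced above; based on the description that follows, it would be along these lines (the staging column names other than stg_Number are assumptions):

SELECT
    stg_Number,
    stg_Name,
    stg_Telephone,
    stg_Address,
    stg_AddressDate,
    ROW_NUMBER() OVER (PARTITION BY stg_Number ORDER BY stg_AddressDate DESC) AS rno
FROM staging.Contacts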



The above code fetches the staged data from the staging.Contacts table with an additional field named rno, which holds a row number based on the most recent AddressDate value for each contact number represented by the Number column. This field is used later on down the pipeline for the purposes of de-duplication.

Next we add a Conditional Split component which uses the expression rno == 1 to pass through de-duplicated records. All records with rno > 1 are considered duplicates and are excluded from the pipeline.

Add a Lookup component which will be used to check for the existence of contacts in the destination database. Set the component to use the cnOLEDB_ContactsDB connection manager and code the following SQL command to fetch the existing contact records from the database into the lookup.
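A sketch of the lookup query, using the destination field names referenced elsewhere in this article:

SELECT key_Contact, id_Number, Name, Telephone, Address, AddressDate
FROM dbo.dim_Contacts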



Map the stg_Number from the source data to the id_Number field in the database table and select all other fields to be returned to the pipeline when a record is found.

Add another Conditional Split component and connect the 'Match' branch from the Lookup component to this split component. Here we will do a further check, using an expression, to see if any of the fields have changed between the staged source record and the destination record. Insert the following expression into the split condition:
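The exact expression is not shown above; assuming the staging and lookup column names used in this article, it would be something along these lines (additional NULL handling may be needed for the nullable Telephone column):

stg_Name != Name || stg_Telephone != Telephone || stg_Address != Address || stg_AddressDate != AddressDate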



Below the second split component, add an OLEDB Command component and connect the split component to it. Set the cnOLEDB_ContactsDB connection manager on it and add the following SQL command.
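A sketch of the update command, which takes the five field values as parameters in the order described below:

UPDATE dbo.dim_Contacts
SET Name = ?,
    Telephone = ?,
    Address = ?,
    AddressDate = ?
WHERE id_Number = ?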



The above command will update the matched record where a change has taken place. As can be seen, the command takes in the values for each of the fields as parameters. These parameters get default names of param_0, param_1, param_2, param_3 and param_4 in the order they appear in the command. Map each of these parameters to the staged source data fields as follows:


Please note that the above technique is not the most efficient way of updating records in a production environment, especially if the data set is large. However for the sake of simplicity we will use this method for this sample application.

For the 'No Match' branch of the Lookup component, we will add an OLEDB Destination component and connect that branch to it. All new contact records with no matching numbers will travel via this branch to get inserted into the destination table. Set the cnOLEDB_ContactsDB connection manager on the destination component and select the destination table dim_Contacts from the drop-down. Map all the fields from the staged source data set to the destination table as follows, ignoring the key_Contact field since this is an incremental auto-identity field.



Unit test project

Now that the SSIS package has been built, we will see how to develop some basic unit tests that will cover the most important business rules of this ETL process. 

As mentioned earlier, we will be using the SSISUnit .Net API with C# via Visual Studio's own built-in unit testing framework. The main benefit you get with this method over the others is that you can perform the unit tests as part of the solution build process. This can also be easily incorporated into a continuous integration process that you may be carrying out on a regular basis. All the code relating to the application and the tests is located together within the same solution, which saves you the trouble of maintaining additional testing scripts in other locations.

SSISUnit offers the ability to perform unit tests at the individual task level or at the entire package level. The level at which you conduct your tests depends entirely on the complexity of the application. For our sample project, we will be conducting one test at the task level and the others at the package level, so you get to experience both approaches.

All tests will follow the traditional xUnit structure of Setup, Test and Tear-down.

Test 1 :- Staging contacts data

In this test we will test the individual data flow task within the package that loads the data from the source file to the staging.Contacts table. For this we will use the following data set in a tab delimited text file. We will save this file as contacts_batch1.txt inside a sub folder named SourceFiles within the unit test project folder. As we shall see later, the test setup will copy this file to the contacts.txt file which is used by the package.


Number | Name | Telephone | Address | AddressDate
1 | James | 787,387,423,568,938,000,000,000,000,000,000 | 24 Harper Ave, Middlesex, MH2 4BU | 22/04/2003
1 | James | 787,387,423,568,938,000,000,000,000,000,000 | 154 Arvin Drive, Middlesex, MH5 7GD | 14/12/2005
2 | Charls | 7,656,928,671,776,520,000,000 | 1A Park Street, Bedford, BD7 9OP | 16/01/1987
3 | Carol | | 30 Church Road, Brighton, Essex, BR8 7UT | 02/09/2000
4 | Sarah | 9,897,876,521,276,760,000,000 | 38 Kings Lane, Wimbledon, London, SW7 5TR | 07/12/2006
4 | Sarah | 9,897,876,521,276,760,000,000 | 38 Kings Lane, Wimbledon, London, SW7 5TR | 07/12/2006

Now we need to create a test definition for this in xml. To do this open the UnitTest_LoadContacts.ssisUnit file that we added to the unit test project earlier and insert the following xml code.
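The full XML is not reproduced here; the following is a sketch of what the test definition might look like, based on the element, attribute and command names described in the rest of this section. The connection string, the file paths, the command names not mentioned in the text and the exact attribute spellings of the Connection, Package, VariableCommand and FileCommand elements are assumptions, so validate the finished file against the SSISUnit.xsd schema.

<TestSuite>
    <ConnectionList>
        <Connection name="ContactsDB" connection="<connection string to the destination database>" />
    </ConnectionList>
    <PackageList>
        <Package name="LoadContacts" packagePath="<path to the SSIS project>\LoadContacts.dtsx" />
    </PackageList>
    <Tests>
        <Test name="TestBatch1Staging" package="LoadContacts" task="cfDFT_StageContacts" taskResult="Success">
            <TestSetup>
                <VariableCommand name="pkSourceFilePath" operation="Set" value="<path to the SourceFiles folder>\" />
                <VariableCommand name="pkDBConnectionString" operation="Set" value="<connection string to the destination database>" />
                <FileCommand operation="Copy" sourcePath="<path to the SourceFiles folder>\contacts_batch1.txt" targetPath="<path to the SourceFiles folder>\contacts.txt" />
                <SQLCommand name="CmdClearStaging" connectionRef="ContactsDB" returnsValue="false">TRUNCATE TABLE staging.Contacts</SQLCommand>
            </TestSetup>
            <Assert name="AssertStagingContacts" expectedResult="6" testBefore="false" expression="false">
                <SQLCommand name="CmdCountStagingContacts" connectionRef="ContactsDB" returnsValue="true">SELECT COUNT(*) FROM staging.Contacts</SQLCommand>
            </Assert>
            <TestTearDown />
        </Test>
    </Tests>
</TestSuite>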



If you take a closer look at the above xml, you will see some basic elements that are organized as follows.

<TestSuite>
<!-- A Test Suite is the main container for the entire test definition. A Test Suite contains one or more
       individual unit tests -->

    <ConnectionList>
    <!-- A Connection List contains one or more connection strings to specific databases. These
           connections are then referenced from elsewhere in the definition to issue SQL commands to
           those databases --></ConnectionList>

    <PackageList>
    <!-- A Package List contains one or more fully qualified package paths pointing to individual SSIS
           packages. These are then referenced from elsewhere in the definition --> </PackageList>

    <TestSuiteSetup>
    <!-- Any setup that is required and common to all tests within the test suite is defined within this
           element. Setup can contain multiple commands, which can perform different types of
           operations. The supported command types are SQLCommand, ProcessCommand and
           VariableCommand. Note that this setup only runs once for the entire test suite -->
           </TestSuiteSetup>

    <Setup>
    <!-- Similar to the previous element, this element may also contain multiple commands. However,
            unlike the previous setup, this setup runs once for each individual test defined within the test
            suite. Hence these commands are intended to perform any common operations that are
            needed prior to running individual tests. --></Setup>

    <Tests>
    <!-- This element acts as the container that holds one or more individual tests defined for
           the test suite. -->

        <Test>
        <!-- This is the container element for an individual test. Each test will follow the Setup, Test,
               Tear-down structure. -->

            <TestSetup>
            <!-- Within this element, any setup needed for this specific test is defined. Similar to
                    the other setups, the command types used here can be SQLCommand,
                    ProcessCommand or VariableCommand. --></TestSetup>

            <Assert>
            <!-- This is the container element for the assertion definition of this individual test. Here the
                    expected results and the type of this assertion are established. Inside this element there can
                    be one or more commands defined that check for the actual results of the test. These
                    commands are of type SQLCommand, ProcessCommand and VariableCommand. -->
                   </Assert>

            <TestTearDown>
            <!-- Within this element there can be multiple commands that perform any operations needed
                    to clean up or reinstate data after the test has completed --></TestTearDown>

        </Test>
    </Tests>

    <TearDown>
    <!-- This element may contain one or more commands that perform any common operations
            needed to clean up or reinstate data after each individual test within the test suite has
            completed. --></TearDown>

    <TestSuiteTearDown>
    <!-- This element may contain one or more commands that perform any operations needed to clean
            up or reinstate data after the entire test suite has completed. Unlike the previous tear down
            element, the commands contained within this element will only run once for the entire test
            suite. --></TestSuiteTearDown>

</TestSuite>

Given the above information, let's take some time to examine the XML code for the test definition. We start by looking at the <Test> element, which declares the unit test. We set its name attribute to "TestBatch1Staging". The package attribute carries a reference to the SSIS package listed in the <PackageList> element at the top of the definition file. The task attribute specifies which specific task within the package to execute to conduct the test. In this case we are testing the staging Data Flow named cfDFT_StageContacts (this is a naming convention that I follow). If we were testing the entire package, the task attribute would carry the name of the package itself. The taskResult attribute indicates the expected result of the task or package, not the expected result of the unit test itself.

Next we look at the <TestSetup> element. This is where the commands that perform the test setup are specified. For this test, we have used three types of commands.

First we have a VariableCommand, which can perform Set and Get operations on SSIS package variables. This is a very useful feature which allows you to inject different values into the package for different test conditions or environments. Here we use it to set the file path to the contacts.txt source data file in the pkSourceFilePath variable. We also set the connection string for the destination database in the pkDBConnectionString variable.

Then we use a FileCommand, which can perform basic file system operations. We use this here to copy our Batch 1 data set, stored in the file contacts_batch1.txt, to the contacts.txt file. This method gives us the flexibility to inject different data sets for different tests.

Finally, a SQLCommand is used to clear the staging table before running the test. This is performed by issuing the TRUNCATE TABLE staging.Contacts command on the database referenced by the connectionRef attribute.

Please note that all the folder paths, database names and connection strings specified in the code are referring to my own development environment. So you will need to replace them with the appropriate values for your environment.

Now look at the <Assert> element. Here we define its name as "AssertStagingContacts". The expectedResult attribute is "6", which is the number of records in the contacts_batch1.txt file and what we also expect to find in the staging table after the staging data flow has run. The testBefore attribute defines whether the assertion should run before the task has run; in this case we want it to run after the task has run, so we set it to false. The expression attribute is a boolean indicating whether the assertion is an expression, which in this case it is not, so we set it to false.

Now we set the assertion command. You can use a number of different types of commands; for this test we use a SQLCommand named "CmdCountStagingContacts". Its connectionRef is set to "ContactsDB", which refers to the connection listed under the <ConnectionList> element at the very top of the test suite definition, and its returnsValue attribute is set to "true" to indicate that the command returns a value. The SQLCommand issued here is a simple SELECT COUNT(*) FROM staging.Contacts, which returns a scalar value that will be compared to the value set in the expectedResult attribute to determine the pass or failure of the test. So if the SQL command returns a record count of 6 the test will pass, otherwise it will fail.

That concludes the definition of the test suite and the individual test within it.

Next, we need to write the unit test code in C# using the SSISUnit .Net API to execute the test. This is done in the UnitTest_LoadContacts.cs file we added earlier to the project. So let's examine the code to execute the above test.



As can be seen above, the SSISUnit .Net API has been referenced by using the SsisUnit and SsisUnitBase namespaces. In addition, a reference to the System.Xml namespace is also added to allow SSISUnit to serialize and de-serialize XML.

When you create a Unit Test Project in Visual Studio, it adds a class with the attribute [TestClass] containing a public Initialize method with the attribute [TestInitialize]. The Initialize method is intended to be used for setting configurations and allocating resources needed by all the tests in the class. In SSISUnit this initialization work is carried out by the API based on the definitions specified within the Setup elements, so we do not need to code anything directly in the Initialize method.

Within the test class, we will create the following private member variables which will be used by all the tests.

private SsisTestSuite testSuite; /* This will hold a reference to the test suite object used by all tests  */
private TestResult testResult; /* This will hold a reference to the test result object used by all tests */
private Test test; /* This will hold a reference to an individual test object */
private Context context; /* This holds a reference to a context object that is used by the test */
private bool isTestPassed;  /* This is a boolean that indicates if the test has passed */

SSISUnit exposes several events that get fired at various stages in the test execution cycle. The one that we are interested in for this project is the AssertCompleted event. This gets fired when SSISUnit has completed an Assert operation of a test. We will subscribe the following public method as the event handler for this event to handle the assertions for all the tests.

public void TestSuiteAssertCompleted(object sender, AssertCompletedEventArgs e)
{
    if (e.AssertName != null)
    {
        testResult = e.TestExecResult;
        isTestPassed = e.TestExecResult.TestPassed;
    }
}

The e parameter is an object that contains all the information pertaining to the assertion. In the most simplistic use of this method, we simply check the e.TestExecResult.TestPassed property, which indicates whether the test passed or not, and assign its boolean value to the isTestPassed member variable.

We now need to add a method to conduct the test. This should be a public method with the attribute [TestMethod] added at the top. It's best to give this method the same name that was used for its definition in the XML file, which in this case is TestBatch1Staging, as shown below.

[TestMethod]
public void TestBatch1Staging()
{
    testSuite = new SsisTestSuite(@"<path to the xml file>\UnitTest_LoadContacts.ssisUnit");
    test = testSuite.Tests["TestBatch1Staging"];
    testSuite.AssertCompleted += TestSuiteAssertCompleted;
    isTestPassed = false;
    context = testSuite.CreateContext();
    bool rs = test.Execute(context);
    Assert.AreEqual<bool>(true, isTestPassed);
    testSuite.AssertCompleted -= TestSuiteAssertCompleted;
}

The first line creates a new SsisTestSuite object by passing in the test definition xml file as a parameter. The resulting object will contain all the information about the test suite.

Next we access the specific test from the Tests collection property by specifying the test name as an index.

Then we subscribe our TestSuiteAssertCompleted method to the AssertCompleted event. The isTestPassed member variable is set to false before running the test. This is so that the default result of the test will always be false until the test result is assigned to it.

A new SSISUnit test context object is created which contains any references to system file paths and database connections.

Then we execute the test by calling the test.Execute method, passing the context to it as a parameter. This is the method that actually runs the SSIS package or task that was specified in the test definition. The method returns a boolean that indicates whether the package or task ran successfully. Note that this is not an indication of the pass or failure of the unit test itself; that happens next with the assertion.

Now we do the assertion with the Assert.AreEqual method, which is a method of the Visual Studio unit test framework. This method takes in 2 boolean values as parameters, the first being the expected value and the second being the actual value, which in this case is the value held in the isTestPassed member variable. If the two values match then the test is considered passed, otherwise it's a fail.

Finally we unsubscribe our assertion event handler so that another test can subscribe to it.

Test 2 :- Updating contacts with de-duplication

In this unit test we want to test the entire package execution to see if it correctly updates existing data by first de-duplicating any duplicate contact records that may exist in the source data file. In order to achieve this we will use a second batch of data in the file contacts_batch2.txt, as given below:

Number | Name | Telephone | Address | AddressDate
5 | Paul | 676,256,354,316,735,000,000,000,000,000,000 | 201A Jasper Street, Croydon, London, CR4 9KT | 13/06/2003
2 | Charls | 76569286717,76520989012,87293752635 | 57 Tinpan Alley, Bedford, BD4 3GH | 10/10/2000
2 | Charls | 76569286717,76520989012,87293752635 | 78 Station Road, Bedford, BD3 6TU | 15/04/2001

As you can see in the above data set, contact number 2 has duplicate records. As stated earlier in the Application Scenario, when inserting new records and updating old records the ETL process should only pick the record with the most recent date, which in this case is the third record with the AddressDate of 15/04/2001. We will define this unit test to assert both the updating of an existing record and the correct de-duplication, by issuing a SQL statement that checks for this specific record with these specific values.

Now let's define this test by coding the following XML into the UnitTest_LoadContacts.ssisUnit file.



Note that the above code should be inserted between the <Tests> and </Tests> elements. The test has been given the name "TestBatch2OldDedupContact". This is the naming convention I follow; however, you can choose any meaningful name and also include spaces to make it more readable. You will notice that, unlike the previous test Test 1, here the package and task attributes carry the same value, which is the name of the package. This is because for this test the unit we are testing is the whole package. The <TestSetup> element is similar to Test 1 except that now the sourcePath attribute of the FileCommand element refers to the contacts_batch2.txt file, which will be copied over the existing contacts.txt file if it already exists. Also notice that the final SQLCommand which truncates the staging table has been removed. The reason behind this will become more apparent when we examine the .Net C# code that drives this test.

Look at the <Assert> element. We have set its name attribute to "AssertOldDedupContact". Again, you may choose a more meaningful name if you wish, as this naming convention is just the one I have been using. The expectedResult is set to "1"; the reason for this expected value will become apparent when we see the assertion SQL command. The other attributes for this element remain the same as in the previous test.

For the assertion command we again use a SQLCommand, with the name attribute set to "CmdGetContactValues" and the rest of the attributes set exactly the same as in the previous test. The SQL command that is issued to assert the test is as follows:

SELECT
    CASE WHEN
        Name = 'Charls' AND
        Telephone = '76569286717,76520989012,87293752635' AND
        Address = '78 Station Road, Bedford, BD3 6TU' AND
        AddressDate = '2001-04-15'
    THEN 1 ELSE 0 END
FROM dim_Contacts
WHERE id_Number = 2

This command simply filters contact number 2 and checks for the specific field values that we know are the correct values for the test to pass. If they all match then the CASE statement will return 1, otherwise 0. The return value is compared to the value set in the expectedResult attribute, and if they are both 1 the test will pass, otherwise it will fail.

Now let's look at the .Net code in C# in the unit test project and see how this test is conducted via the SSISUnit .Net API. To do this we add a new method to the UnitTest_LoadContacts.cs class file in Visual Studio as follows:
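The method body is not reproduced above; the following is a sketch of what it might look like, based on the description below. The resetting of isTestPassed between the two executions is my own assumption about how the first (unasserted) run is kept from influencing the result.

[TestMethod]
public void TestBatch2OldDedupContact()
{
    testSuite = new SsisTestSuite(@"<path to the xml file>\UnitTest_LoadContacts.ssisUnit");
    testSuite.AssertCompleted += TestSuiteAssertCompleted;
    context = testSuite.CreateContext();

    // First run the package for batch 1 to seed dim_Contacts with the initial
    // contact records. No Assert.AreEqual is made on this run.
    test = testSuite.Tests["TestBatch1NewContacts"];
    test.Execute(context);

    // Now run the package for batch 2 and assert that contact number 2 was
    // de-duplicated and updated correctly.
    isTestPassed = false;
    test = testSuite.Tests["TestBatch2OldDedupContact"];
    test.Execute(context);
    Assert.AreEqual<bool>(true, isTestPassed);

    testSuite.AssertCompleted -= TestSuiteAssertCompleted;
}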


As in the previous test, we give the method the same name as the unit test definition. As the code for this test is very similar to the previous test, you can refer to the explanation given earlier for an understanding of the API. Where the code for this test differs from the previous one is that here we are executing two tests. The first one, "TestBatch1NewContacts", is executed first, but no assertion is done. This is a separate test definition created to test that new records from the contacts_batch1.txt file are correctly inserted. Here we are hijacking that test in order to initialize our destination dim_Contacts table with data that we can then update via the second test and perform the required test assertion.

So, as you can see in the above code, only one Assert.AreEqual method is called, to assert the second test.

Other tests

Apart from the two tests mentioned above, I have created a few more tests that test several application scenarios, all of which are too much to document in this article. However, you can download the entire sample with all the tests from my GitHub repository.

Running the test in Visual Studio Test Explorer

Before being able to run the tests, the unit test project needs to be built. When built, all the methods with the attribute [TestMethod] attached to them will appear in the Test Explorer. You will then have a number of options for how to run them. You can also set all the tests to run automatically whenever the solution is built, which may be useful for Continuous Integration builds. For more information about how to use the Visual Studio Test Explorer please refer to this MSDN link.

To view the Test Explorer, in Visual Studio simply select the Test menu, then Windows, then Test Explorer. You should see a window similar to the one below:


To run an individual test, simply right click on a test and select 'Run Selected Test'. The test will execute and, if passed, will be indicated with a green tick. If failed, it will be indicated with a red exclamation mark.




A naming convention for MS data warehousing projects

SQL Server

Database Objects

Object identifiers within relational database systems are typically not case sensitive and therefore the capitalization styles adopted here are purely for the sake of maintaining consistency. An alternative capitalization style may be adopted as long as it is consistent. The convention adopted here is the use of the underscore ( _ ) as a means of separating object identification prefixes, suffixes and key words where necessary.

Databases
Generally use pascal case and use meaningful names. Use underscores to separate prefixes, suffixes and any other key words where necessary.

Schemas
Generally use simple case. Use underscores to separate prefixes, suffixes and any other key words where necessary. Schemas should ideally be named with nouns.

Tables
Generally use pascal case without any prefix, or if a prefix is a must then use camel case. Use underscores to separate prefixes, suffixes and any other key words where necessary. Tables are collections and should always be pluralized. Attempt to use nouns as much as possible.
e.g. Customers, dim_Customers, Product_Categories, dim_Customer_Categories

Fields within tables
For field names, generally use pascal case without any prefix, or if a prefix is a must then use camel case with underscores to separate the prefix from the field name, and pascal case for the field name itself. To keep field names as short as possible, you may omit underscores within the field name; this is an exception made only for field names, for the sake of aesthetics.
e.g. CustomerName,
old_CustomerName or old_Customer_Name

Note: Contrary to the above example, try to avoid using the entity name as part of the field name, such as CustomerName, CustomerAddress, CustomerTelephone, where Customer is in fact the name of the table entity. Instead simply use Name, Address, Telephone, etc. The above example is only to illustrate the capitalization rules used for this naming convention.

In multi-dimensional data models, prefix any surrogate key fields with key_, both for the primary key field and any foreign key fields that reference the primary key. These are typically auto-generated integer based identity fields. Prefix natural keys (business or logical keys) with id_ or any other suitable prefix, as long as it is used consistently.

Prefix any bit (boolean) fields with is_ or has_ or any other suitable prefix which conveys a true or false value.
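As an illustration of the field prefixes above, a dimension table following this convention might look like the following sketch (the table and columns are made up for the example):

CREATE TABLE dbo.dim_Customers (
    key_Customer   INT IDENTITY(1,1) NOT NULL,  -- surrogate key
    id_CustomerNo  NVARCHAR(20)      NOT NULL,  -- natural (business) key
    Name           NVARCHAR(100)     NOT NULL,
    is_Active      BIT               NOT NULL   -- boolean flag
);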

Views
Always prefix views with v_ and generally use pascal case. Use underscores to separate prefixes, suffixes and any other key words where necessary. Views are collections and should always be pluralized. Attempt to use nouns as much as possible.
e.g. v_Customer_Sales

For field names, generally use pascal case without any prefix, or if a prefix is a must then use camel case. Use underscores to separate prefixes, suffixes and any other key words where necessary.

Procedures
Always prefix with p_ for SQL stored procedures and pc_ for CLR stored procedures, and generally use pascal case. Use underscores to separate prefixes, suffixes and any other key words where necessary. Procedures perform actions, so attempt to use verbs as much as possible.
e.g. p_Get_Customer_Sales, pc_Get_Product_Promotions, p_Validate_Customer_Email, p_Load_dim_Customers

Functions
Always prefix with fn_ for scalar functions, tf_ for table-valued functions, if_ for in-line table-valued functions, fs_ for CLR scalar functions, ft_ for CLR table-valued functions and af_ for aggregate CLR functions. Use underscores to separate prefixes, suffixes and any other key words where necessary. Functions perform actions, so attempt to use verbs as much as possible.

Triggers
Always prefix with tr_ for SQL triggers and ta_ for CLR triggers. Use underscores to separate prefixes, suffixes and any other key words where necessary. Triggers perform actions, so attempt to use verbs as much as possible.

Indexes
Always prefix with ci_ for clustered indexes and ni_ for non-clustered indexes. For unique indexes, always prefix with uci_ for unique clustered indexes and uni_ for unique non-clustered indexes. The index name should ideally contain the full name of the table referenced, including the schema (except perhaps the dbo schema), and the fields used for the index, all separated by underscores. Try to maintain the same case as the referenced schema, table and field objects in the name of the index.

e.g. ci_audit_Validation_Log_LogId :-
This is a clustered index on the Validation_Log table located in the audit schema, using the LogId field as the primary key.
e.g. uni_audit_Rejects_Log_Component_Name_Key_Field_Key_Field_Value :-
This is a unique non-clustered index on the Rejects_Log table located in the audit schema, using the composite fields Component_Name, Key_Field and Key_Field_Value.

Constraints
Always prefix with pk_ for primary key constraints, uq_ for unique key constraints, f_ for foreign key constraints, c_ for check constraints and d_ for default constraints. The constraint name should ideally contain the full name of the table referenced, including the schema (except perhaps the dbo schema), and the fields used for the constraint, all separated by underscores. Try to maintain the same case as the referenced schema, table and field objects in the name of the constraint.

Note that creating primary key and unique key constraints will also automatically create supporting indexes with the same name as the constraint.

e.g. pk_person_Address_Address_Id :-
This is a primary key on the Address table located in the person schema using the Address_Id field.

Full naming of foreign key constraints can become very lengthy. In order to shorten this, avoid specifying the referencing field and only include the referenced primary key field from the referenced table, as these two fields would typically be the same if the naming convention stated in this document is followed.

e.g. f_person_Address_person_State_Province_State_Province_Id :-
This is a foreign key on the Address table located in the person schema referencing the primary key field State_Province_Id on the State_Province table located in the person schema. Notice that only the primary key field of the referenced table is included in the name since the referencing field by implication should also be the same. 

SQL Statements

Since SQL is a declarative language, it does not inherently have code blocks as you would see in C# or Java. Hence always use upper case for all SQL key words, including built-in functions, and the applicable case for all referenced objects. This method provides an alternative visual code block separation based on SQL key words.
e.g.
SELECT
    *
FROM audit.Rejects_Log rl
JOIN audit.Execution_Log el 
    ON rl.Log_Id = el.Log_Id
LEFT JOIN audit.Execution_Log pel
    ON el.Parent_Log_Id = pel.Log_Id

Code all control flow statements in simple case. The following are the most commonly used control flow statements:

if .. else
while
begin .. end

You can use round brackets, where available, to block code. When using round brackets as code blocks, do not write anything to the right of the opening bracket ( and do not write anything to the left of the closing bracket ). The following is a correct format:

SELECT
     fld1
    ,fld2
FROM (
    SELECT
        fld1
       ,fld2
    FROM Source_Table
) AS t

SQL Script files

SQL script files are used to perform builds and releases. The key is to make them re-runnable so that they can be applied to new or existing releases. SQL scripts can be categorized as follows:
* DDL scripts
* Ad-hoc DDL and DML combined scripts
A release may contain one or more of the above types of script files.

Ideally each release should have a separate folder in which all scripts relating to that release are placed, without having to prefix or suffix version numbers to the script files themselves. This way the script file name can remain the same from one version to another. Thus for an individual version release you can run the scripts contained in the folder for that version, and for a cumulative version release you can run all the scripts in all the folders in order of version number or date.

DDL script file names can generally follow the same naming convention as the objects themselves, as stated above, with the addition of the schema prefix and the file name extension .sql. When referring to objects within the script file name, do not consider the object identification prefixes to be part of the object name.

Schemas
Always prefix schema DDL scripts with sc_. Use underscores to separate any other prefixes, suffixes and key words where necessary.

Tables
Always prefix table DDL scripts with u_ and include the full object name including the schema. Use underscores to separate prefixes, suffixes and any other key words where necessary. Tables are collections and should always be pluralized. Attempt to use nouns as much as possible.
e.g. u_production_Product_Category.sql :-
This script is the DDL script to create the table Product_Category located in the production schema

Views
Always prefix view DDL scripts with v_ and include the full object name including the schema. Use underscores to separate prefixes, suffixes and any other key words where necessary. Views are collections and should always be pluralized. Attempt to use nouns as much as possible.
e.g. v_sales_Store_With_Contacts.sql :-
This script is the DDL script to create the view Store_With_Contacts located in the sales schema

Procedures
Always prefix stored procedure DDL scripts with p_ for SQL stored procedures and pc_ for CLR stored procedures, and include the full object name including the schema. Use underscores ( _ ) to separate prefixes, suffixes and any other key words where necessary. Procedures perform actions, so attempt to use verbs as much as possible.
e.g. p_sales_Get_Customer_Sales, pc_production_Get_Product_Promotions, p_customer_Validate_Customer_Email, p_customer_Load_dim_Customers

Functions
Always prefix function DDL scripts with fn_ for scalar functions, tf_ for table-valued functions, if_ for in-line table-valued functions, fs_ for CLR scalar functions, ft_ for CLR table-valued functions and af_ for aggregate CLR functions, and include the full object name including the schema. Use underscores to separate prefixes, suffixes and any other key words where necessary. Functions perform actions, so attempt to use verbs as much as possible.
e.g. if_audit_Get_Execution_Log_Tree.sql :-
This is the DDL script for the in-line table valued function Get_Execution_Log_Tree located in the audit schema. 

Triggers
Always prefix trigger DDL scripts with tr_ for SQL triggers and ta_ for CLR triggers, and include the full object name including the schema. Use underscores to separate prefixes, suffixes and any other key words where necessary. Triggers perform actions, so attempt to use verbs as much as possible.

Ad-hoc DDL and DML scripts
Apart from the regular DDL scripts relating to the database objects, there may be other scripts that are release and version specific, such as when a new table is introduced and it is required to pre-populate that table with some initial data, or when a field within an existing table is split into two fields in such a way that the data for both fields needs to be replaced and re-populated. These scripts may touch one or more objects within the database and typically perform some type of DML, but may also contain DDL. Hence their names must reflect the specific operation. For these scripts, generally use pascal case with underscores to separate prefixes, suffixes and any other key words where necessary. Since they usually perform actions, attempt to use verbs as much as possible.

Where the names of any database objects become part of the file name, make sure to use the full object name including the schema.

e.g. u_dim_Lookup_Codes.sql :- This is the DDL script to create the dim_Lookup_Codes table in the default dbo schema (don't use a prefix to represent the default dbo schema).
Initialise_dim_Lookup_Codes.sql :- This is a DML script, specific only to the release that introduced the dim_Lookup_Codes table, which pre-populates the table with initial data.

SSIS

Unlike SQL Server, SSIS objects are case sensitive, so it may be more appropriate to use camel or pascal case where it is more relevant. However, using underscores to separate camel case prefixes can still be useful to more easily identify component types and scopes.

Packages and package files
Packages perform actions and, as such, can in principle follow the same naming conventions as SQL procedures and functions. The package file can share the same name as the package object, with the addition of the .dtsx file extension. Unlike procedures or functions, however, they do not have a specific object type prefix. Instead they can start with a verb in pascal case, with underscores to separate any prefixes, suffixes and key words. Where the names of any database objects become part of a name, make sure to use the full object name including the schema (except for the default dbo schema).

e.g. Load_dim_Item.dtsx :-
This is the package file that performs the data load for the dim_Item table located in the default dbo schema. The name of the package object is also the same without the .dtsx file extension.
e.g. Process_sales_Facts.dtsx :-
This package is a driver package (master package) that is responsible for controlling the execution of all packages which loads the fact tables within the sales schema.

Connections
Prefix all connections with cn<Type>_<ConnectionName>, where <Type> in camel or capital case is the abbreviated name of the connection manager. The following is a list of some of the more frequently used connection managers:
OLEDB - OLEDB connection manager
FlatFile - Flat file connection manager
ADONet - ADO.Net connection manager
For the complete list of connection managers see https://msdn.microsoft.com/en-us/library/ms140203(v=sql.110).aspx

Substitute <ConnectionName> with the appropriate name of the connection in pascal case. In case the <ConnectionName> refers to a database object such as a table, or a system object such as a flat file, make sure to include its full name and maintain the same case as used in the referred object.

e.g. cnOLEDB_AdventureWorks2012, 
cnFlatFile_Daily_Sales.csv :- This is a flat file connection manager to the Daily_Sales.csv comma separated value file.

Variables
When referring to variables, it is good practice to always use the fully qualified name. In order to easily identify the scope of variables declared at data flow level, it is best to prefix them with df<Name>_ in camel case, where <Name> is the name of the data flow. For package-scoped user variables, prefix them with pk<Name>, where <Name> is the name of the variable. Other variables explicitly scoped at package and project levels need not have a prefix, as the fully qualified name will readily reveal their scope.

@[User::dfDimItem_LoadStatus] - This is a user variable within the local scope of the dfDimItem data flow
@[$Package::FileArchivePath] - This is a package scoped variable
@[$Project::DatabaseServerName] - This is a project scoped variable

Components
Prefix control flow components with cf<Type>_<TaskName> and data flow components with df<Type>_<TaskName>, where <Type> in camel or capital case is an abbreviated name of the component type.

Substitute <TaskName> with the appropriate name of the control flow or data flow task in pascal case. In case the <TaskName> refers to a database object such as a table, or a system object such as a flat file, make sure to include its full name and maintain the same case as used in the referred object.

e.g. Control flow tasks:
cfSEQC_Staging :- This is a sequence container within the control flow
cfDFT_StageDimItem :- This is a data flow task within the control flow
cfSQL_TruncateStaging :- This is an execute SQL task within the control flow

The following is the complete list of control flow component types to be substituted with <Type>:

For Loop Container - FLC
Foreach Loop Container - FELC
Sequence Container - SEQC
ActiveX Script - AXS
Analysis Services Execute DDL - ASE
Analysis Services Processing - ASP
Bulk Insert - BLK
Data Flow - DFT
Data Mining Query - DMQ
Execute DTS 2000 Package - EDPT
Execute Package - EPT
Execute Process - EPR
Execute SQL - SQL
File System - FSYS
FTP - FTP
Message Queue - MSMQ
Script - SCR
Send Mail - SMT
Transfer Database - TDB
Transfer Error Messages - TEM
Transfer Jobs - TJT
Transfer Logins - TLT
Transfer Master Stored Procedures - TSP
Transfer SQL Server Objects - TSO
Web Service - WST
WMI Data Reader - WMID
WMI Event Watcher - WMIE
XML - XML
Expression - EXPR

e.g. Data flow components:
dfFFSRC_Daily_Sales.csv :- This is the flat file source component within the data flow that reads the Daily_Sales.csv file.

The following is the complete list of data flow component types to be substituted with <Type>:

DataReader Source - DRSRC
Excel Source - EXSRC
Flat File Source - FFSRC
OLE DB Source - OLESRC
Raw File Source - RFSRC
XML Source - XMLSRC
Aggregate - AGG
Audit - AUD
Character Map - CHM
Conditional Split - CSPL
Copy Column - CPYC
Data Conversion - DCNV
Data Mining Query - DMQ
Derived Column - DER
Export Column - EXPC
Fuzzy Grouping - FZG
Fuzzy Lookup - FZL
Import Column - IMPC
Lookup - LKP
Merge - MRG
Merge Join - MRGJ
Multicast - MLT
OLE DB Command - CMD
Percentage Sampling - PSMP
Pivot - PVT
Row Count - CNT
Row Sampling - RSMP
Script Component - SCR
Slowly Changing Dimension - SCD
Sort - SRT
Term Extraction - TEX
Term Lookup - TEL
Union All - ALL
Unpivot - UPVT
Data Mining Model Training - DMMTDST
DataReader Destination - DRDST
Dimension Processing - DPDST
Excel Destination - EXDST
Flat File Destination - FFDST
OLE DB Destination - OLEDST
Partition Processing - PPDST
Raw File Destination - RFDST
Recordset Destination - RSDST
SQL Server Destination - SSDST
SQL Server Mobile Destination - SSMDST