A naming conversion for MS datawarehousing projects

SQL Server

Database Objects

Object identifiers within relational database systems are typically not case sensitive and therefore the capitalisation styles adopted here are purely for the sake of maintaining consistency. An alternative capitalisation style may be adopted as long as it is consistent. The convention adopted here is the use of the underscore ( _ ) as a means of separating object identification prefixes, suffixes and key words where necessary.

Databases
Generally use pascal case and use meaning full name. Use underscore to separate prefixes, suffixes and other any key words where necessary. 

Schema's
Generally use simple case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Schemas should ideally be named with nouns.

Tables
Generally use pascal case without any prefix or if a prefix is a must then use camel case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Although tables could be thought of as being collections of data, from a data model point of view it is considered more appropriate to name them in singular. Attempt to use nouns as much as possible.
e.g. Customers, dim_Customer, Product_Category, dim_Customer_Category

Fields within tables
For field names, generally use pascal case without any prefix or if a prefix is a must then use camel with underscores to separate the prefix from the field name. Use pascal case for the field name itself. To keep field names as short as possible, you may omit underscores. This is an exception made for the only field names for the sake of aesthetics.  
e,g, CustomerName,
old_CustomerName or old_Customer_Name

Note: Contrary to the above example, try to avoid using the entity name as part of the field name, such as CustomerName, CustomerAddress, CustomerTelephone where Customer is in fact the name of the table entity. Instead simply use Name, Address, Telephone, ext... The above example is only to illustrate the capitalisation rules used for this naming convention.

In multi-dimensional data models, prefix any surrogate key fields with key_ both for the primary key field  and any foreign key fields that reference the primary key. These are typically auto-generated integer based identity fields. Prefix natural keys (business or logical keys) with id_ or any other suitable prefix if available as long as they are consistent.

Prefix any bit (boolean) fields with is_ or has_ or any other suitable prefix available which convey a true or false value.

Views
Always prefix views with v_ and generally use pascal case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Since view are typically based on queries, its more appropriate to pluralise their names. This will also help to differentiate between tables and views.
e.g. v_Customer_Sales

For field names, generally use pascal case without any prefix or if a prefix is a must then use camel case. Use underscore to separate prefixes, suffixes and other any key words where necessary.

Procedures
Always prefix with p_ for SQL stored procedure and pc_ for CLR stored procedures and generally use pascal case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Procedures perform actions, do attempt to use verbs as much as possible.
e.g. p_Get_Customer_Sales, pc_Get_Product_Promotions, p_Validate_Customer_Email, p_Load_dim_Customers

Functions
Always prefix with fn_ for scalar functions, tf_ for table values functions, if_ for in-line table valued functions, fs_ for CLR scalar functions, ft_ for CLR table valued functions and af_ for aggregate CLR functions. Use underscore to separate prefixes, suffixes and other any key words where necessary. Functions perform actions, so attempt to use verbs as much as possible.

Triggers
Always prefix with tr_ for SQL triggers and ta_ for CLR triggers. Use underscore to separate prefixes, suffixes and other any key words where necessary. Triggers perform actions, so attempt to use verbs as much as possible.

Indexes
Always prefix ci_ for clustered indexes and ni_ for non clustered indexes. For unique indexes always prefix uci_ for unique clustered indexes and uni_ for unique non clustered indexes. The index name should ideally contain the full name of the table referenced including the schema (except perhaps the dbo schema) and the fields used for the index all separated by underscores. Try to maintain the same case of the referenced schema, table and field objects in the name of the index.

e.g. ci_audit_Validation_Log_LogId :- 
This is a clustered index on the Validation_Log table located in the audit schema using the LogId field as the primary key
e.g. uni_audit_Rejects_Log_Component_Name_Key_Field_Key_Field_Value :-
This is a unique non clustered index on the Rejects_Log table located in the audit schema using the composite fields Component_Name, Key_Field and Key_Field_Value

Constraints
Always prefix with pk_ for primary key constraint, uq_ for unique key constraints, f_ for foreign key constraints, c_ for check constraints and  d_ for default constraints. The constraint name should ideally contain the full name of the table referenced including the schema (except perhaps the dbo schema) and the fields used for the constraint all separated by underscores. Try to maintain the same case of the referenced schema, table and field objects in the name of the constraint.

Note that creating primary key and unique key constraints will also automatically create supporting indexes with the same name as the constraint.

e.g. pk_person_Address_Address_Id :-
This is a primary key on the Address table located in the person schema using the Address_Id field.

Full naming of foreign key constraints can become very lengthy. In order to shorten this avoid specifying the referencing field and only include the referenced primary key field from the referenced table as these two fields would typically be the same if the naming convention stated in this document is followed.

e.g. f_person_Address_person_State_Province_State_Province_Id :-
This is a foreign key on the Address table located in the person schema referencing the primary key field State_Province_Id on the State_Province table located in the person schema. Notice that only the primary key field of the referenced table is included in the name since the referencing field by implication should also be the same. 

SQL Statements

Since SQL is a declarative language, it does not inherently bring code blocks as you would see in C# or Java. Hence always use capital case for all SQL statements including built-in functions and the applicable case for all referenced objects. This method will provide an alternative visual code block separation based on SQL key words.
e.g.
SELECT
    *
FROM audit.Rejects_Log rl
JOIN audit.Execution_Log el 
    ON rl.Log_Id = el.Log_Id
LEFT JOIN audit.Execution_Log pel
    ON el.Parent_Log_Id = pel.Log_Id

Code all control flow statements in simple case. The following are the most commonly used control flow statements:

if .. else
while
begin .. end

You can use round brackets where available to block code. When using round brackets as code blocks, do not write anything on the right side of the left bracket ( and do not write anything on the left side of the right bracket ). The below is a correct format:

select (
     fld1
    ,fld2
) from (
    select
        fld1
       ,fld2
    from table
) as t

SQL Script files

SQL script files are used to perform builds and releases. The key is to make them rerun-able so that they can be applied on to new or existing releases. SQL scripts can be categorized as follows:
* DDL scripts
* Ad-hoc DDL and DML combined scripts
A release may contain one or more of the above types of script files.

Ideally each release should have a separate folder in which all scripts relating to that release is placed without having to prefix or suffix the version numbers to the script file itself. This way the script file name can remain the same from one version to another. Thus for an individual version release you can run the scripts contain in the folder for that version. And for a cumulative version release, you can run all the scripts in all the folders in order of version number or date.

DDL script file names can generally follow the same naming convention as the objects themselves as stated above with the addition of the schema prefix and file name extension of .sql. When referring to objects within the script file name do not consider the object identification prefixes to be part of the object name.

Schemas
Always prefix schema DDL scripts with sc_. Use underscore to separate any other prefixes, suffixes and key words where necessary.

Tables
Always prefix table DDL scripts with u_ and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Tables are collections and should always be pluralized. Attempt to use nouns and mush as possible.
e.g. u_production_Product_Category.sql :-
This script is the DDL script to create the table Product_Category located in the production schema

Views
Always prefix view DDL scripts with v_ and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Tables are collections and should always be pluralized. Attempt to use nouns and mush as possible.
e.g. v_sales_Store_With_Contacts.sql :-
This script is the DDL script to create the view Store_With_Contacts located in the sales schema

Procedures
Always prefix stored procedure DDL scripts with p_ for SQL stored procedure and pc_ for CLR stored procedures and include the full object name including the schema. Use underscore ( _ ) to separate prefixes, suffixes and other any key words where necessary. Procedures perform actions, do attempt to use verbs as much as possible.
e.g. p_sales_Get_Customer_Sales, pc_production_Get_Product_Promotions, p_customer_Validate_Customer_Email, p_customer_Load_dim_Customers

Functions
Always prefix function DDL scripts with fn_ for scalar functions, tf_ for table values functions, if_ for in-line table valued functions, fs_ for CLR scalar functions, ft_ for CLR table valued functions and af_ for aggregate CLR functions and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Functions perform actions, so attempt to use verbs as much as possible.
e.g. if_audit_Get_Execution_Log_Tree.sql :-
This is the DDL script for the in-line table valued function Get_Execution_Log_Tree located in the audit schema. 

Triggers
Always prefix trigger DDL scripts with tr_ for SQL triggers and ta_ for CLR triggers and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Triggers perform actions, so attempt to use verbs as much as possible.

Ad-hoc DDL and DML scripts
Apart from the regular DDL scripts relating to the database objects, there may be other scripts that are release and version specific. Such as when a new table is introduced and its required to pre-populate that table with some initial data, or when a field within an existing table is split into two fields in such a way that the data for both fields needs to be replaced and re-populated. These scripts may touch one or more objects within the database and typically perform some type of DML but may also contain DDL. Hence their names must reflect the specific operation. For these scripts generally use pascal case with underscore to separate prefixes, suffixes and other any key words where necessary. Since they usually perform actions attempt to use verbs as much as possible.

Where the names of any database objects become part of the file name, make sure to use the full object name including the schema.

e.g. u_dim_Lookup_Codes.sql :- This is the DDL script to create the dim_Lookup_Codes table in the default dbo schema (don't use a prefix to represent the default dbo schema)
Initialise_dim_Lookup_Codes.sql :- This is a DML script only specific to the release that introduced the dim_Lookup_Codes table which pre-populated the table with initial data.

SSIS

Unlike SQL Server, SSIS objects are case sensitive, so it maybe more appropriate to use camel or pascal case where its more relevant. However using underscores to separate camel case prefixes can still be useful to more easily identity component types and scopes. 

Packages and package files
Packages perform actions and as such, can in principle follow the same naming conventions as SQL procedures and functions. The package file can share the same name as the package objects with the addition of the .dtsx file extension. Unlike procedures or functions however they do not have a specific object type prefix. Instead they can start with a verb in pascal case with underscores to separate any prefixes, suffixes and key words. Where the names of any database objects become part of a name, make sure to use the full object name including the schema (except for the default dbo schema). 

e.g. Load_dim_Item.dtsx :-
This is the package file that performs the data load for the dim_Item table located in the default dbo schema. The name of the package object is also the same without the .dtsx file extension.
e.g. Process_sales_Facts.dtsx :-
This package is a driver package (master package) that is responsible for controlling the execution of all packages which loads the fact tables within the sales schema.

Connections
Prefix all connections with cn<Type>_<ConnectionName> where <Type> in camel or capital case is the abbreviated name of the connection manager. The following is a list of some of the more frequently used connection managers :
OLEDB - OLEDB connection manager
FlatFile - Flat file connection manager
ADONet - ADO.Net connection manager
for the complete list of connection managers see here https://msdn.microsoft.com/en-us/library/ms140203(v=sql.110).aspx

Substitute <ConnectionName> with the appropriate name of the connection in pascal case. In case the <ConnectionName> refer to a database object such as a table or system object such as a flat file, make sure to include its full name and maintain the same case as used in the referred object.

e.g. cnOLEDB_AdventureWorks2012, 
cnFlatFile_Daily_Sales.csv :- This is a flat file connection manager to the Daily_Sales.csv comma separated value file.

Variables
When referring to variables its good practice to always use the fully qualified name. In order to easily identify the scope of variables declared at data flow level, its best to prefix them df<Name>_ in camel case for data flow variables where <Name> is the name of the data flow. For packaged scoped user variables, prefix then with pk<Name> where <Name> is the name of the variable. Other variables explicitly scoped at package and project levels need not have a prefix as the fully qualified name will readily reveal its scope.

@[User::dfDimItem_LoadStatus] - This is a user variable within the local scope of the dfDimItem data flow
@[$Package::FileArchivePath] - This is a package scoped variable
@[$Project::DatabaseServerName] - This is a project scoped variable

Components
Prefix control flow components with cf<Type>_<TaskName> and data flow components with df<Type>_<TaskName> where <Type> in camel or capital case is a abbreviated name of the component type.

Substitute the <TaskName> with the appropriate name of the control flow or data flow task in pascal case. In case the <TaskName> refer to a database object such as a table or system object such as a flat file, make sure to include its full name and maintain the same case as used in the referred object.

e.g. Control flow tasks :
cfSEQC_Staging :- This is a sequence container within the control flow
cfDFT_StageDimITem :- This is a data flow task within the control flow
cfSQL_TruncateStaging :- This is a execute SQL task within the control flow

The following is the complete list of control flow component types to be substituted with <Type>
Task
Prefix
For Loop Container
FLC
Foreach Loop Container
FELC
Sequence Container
SEQC
ActiveX Script
AXS
Analysis Services Execute DDL
ASE
Analysis Services Processing
ASP
Bulk Insert
BLK
Data Flow
DFT
Data Mining Query
DMQ
Execute DTS 2000 Package
EDPT
Execute Package
EPT
Execute Process
EPR
Execute SQL
SQL
File System
FSYS
FTP
FTP
Message Queue
MSMQ
Script
SCR
Send Mail
SMT
Transfer Database
TDB
Transfer Error Messages
TEM
Transfer Jobs
TJT
Transfer Logins
TLT
Transfer Master Stored Procedures
TSP
Transfer SQL Server Objects
TSO
Web Service
WST
WMI Data Reader
WMID
WMI Event Watcher
WMIE
XML
XML
Expression
EXPR

e.g. Data flow tasks :
dfFFSRC_Daily_Sales.csv :- This is the data flow task responsible for loading the Daily_Sales.csv file.

The following is the complete list of data flow component types to be substituted with <Type>
Component
Prefix
DataReader Source
DRSRC
Excel Source
EXSRC
Flat File Source
FFSRC
OLE DB Source
OLESRC
Raw File Source
RFSRC
XML Source
XMLSRC
Aggregate
AGG
Audit
AUD
Character Map
CHM
Conditional Split
CSPL
Copy Column
CPYC
Data Conversion
DCNV
Data Mining Query
DMQ
Derived Column
DER
Export Column
EXPC
Fuzzy Grouping
FZG
Fuzzy Lookup
FZL
Import Column
IMPC
Lookup
LKP
Merge
MRG
Merge Join
MRGJ
Multicast
MLT
OLE DB Command
CMD
Percentage Sampling
PSMP
Pivot
PVT
Row Count
CNT
Row Sampling
RSMP
Script Component
SCR
Slowly Changing Dimension
SCD
Sort
SRT
Term Extraction
TEX
Term Lookup
TEL
Union All
ALL
Unpivot
UPVT
Data Mining Model Training
DMMTDST
DataReader Destination
DRDST
Dimension Processing
DPDST
Excel Destination
EXDST
Flat File Destination
FFDST
OLE DB Destination
OLEDST
Partition Processing
PPDST
Raw File Destination
RFDST
Recordset Destination
RSDST
SQL Server Destination
SSDST
SQL Server Mobile Destination
SSMDST

0 comments:

Post a Comment