A naming conversion for MS datawarehousing projects ~ Ravi Palihena Tek pages

SQL Server

Database Objects

Object identifiers within relational database systems are typically not case sensitive and therefore the capitalisation styles adopted here are purely for the sake of maintaining consistency. An alternative capitalisation style may be adopted as long as it is consistent. The convention adopted here is the use of the underscore ( _ ) as a means of separating object identification prefixes, suffixes and key words where necessary.

Databases

Generally use pascal case and use meaning full name. Use underscore to separate prefixes, suffixes and other any key words where necessary.

Schema's

Generally use simple case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Schemas should ideally be named with nouns.

Tables

Generally use pascal case without any prefix or if a prefix is a must then use camel case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Although tables could be thought of as being collections of data, from a data model point of view it is considered more appropriate to name them in singular. Attempt to use nouns as much as possible.

e.g. Customers, dim_Customer, Product_Category, dim_Customer_Category

Fields within tables

For field names, generally use pascal case without any prefix or if a prefix is a must then use camel with underscores to separate the prefix from the field name. Use pascal case for the field name itself. To keep field names as short as possible, you may omit underscores. This is an exception made for the only field names for the sake of aesthetics.

e,g, CustomerName,
old_CustomerName or old_Customer_Name

Note: Contrary to the above example, try to avoid using the entity name as part of the field name, such as CustomerName, CustomerAddress, CustomerTelephone where Customer is in fact the name of the table entity. Instead simply use Name, Address, Telephone, ext... The above example is only to illustrate the capitalisation rules used for this naming convention.

In multi-dimensional data models, prefix any surrogate key fields with key_ both for the primary key field and any foreign key fields that reference the primary key. These are typically auto-generated integer based identity fields. Prefix natural keys (business or logical keys) with id_ or any other suitable prefix if available as long as they are consistent.

Prefix any bit (boolean) fields with is_ or has_ or any other suitable prefix available which convey a true or false value.

Views

Always prefix views with v_ and generally use pascal case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Since view are typically based on queries, its more appropriate to pluralise their names. This will also help to differentiate between tables and views.

e.g. v_Customer_Sales

For field names, generally use pascal case without any prefix or if a prefix is a must then use camel case. Use underscore to separate prefixes, suffixes and other any key words where necessary.

Procedures

Always prefix with p_ for SQL stored procedure and pc_ for CLR stored procedures and generally use pascal case. Use underscore to separate prefixes, suffixes and other any key words where necessary. Procedures perform actions, do attempt to use verbs as much as possible.

e.g. p_Get_Customer_Sales, pc_Get_Product_Promotions, p_Validate_Customer_Email, p_Load_dim_Customers

Functions

Always prefix with fn_ for scalar functions, tf_ for table values functions, if_ for in-line table valued functions, fs_ for CLR scalar functions, ft_ for CLR table valued functions and af_ for aggregate CLR functions. Use underscore to separate prefixes, suffixes and other any key words where necessary. Functions perform actions, so attempt to use verbs as much as possible.

Triggers

Always prefix with tr_ for SQL triggers and ta_ for CLR triggers. Use underscore to separate prefixes, suffixes and other any key words where necessary. Triggers perform actions, so attempt to use verbs as much as possible.

Indexes

Always prefix ci_ for clustered indexes and ni_ for non clustered indexes. For unique indexes always prefix uci_ for unique clustered indexes and uni_ for unique non clustered indexes. The index name should ideally contain the full name of the table referenced including the schema (except perhaps the dbo schema) and the fields used for the index all separated by underscores. Try to maintain the same case of the referenced schema, table and field objects in the name of the index.

e.g. ci_audit_Validation_Log_LogId :-

This is a clustered index on the Validation_Log table located in the audit schema using the LogId field as the primary key

e.g. uni_audit_Rejects_Log_Component_Name_Key_Field_Key_Field_Value :-

This is a unique non clustered index on the Rejects_Log table located in the audit schema using the composite fields Component_Name, Key_Field and Key_Field_Value

Constraints

Always prefix with pk_ for primary key constraint, uq_ for unique key constraints, f_ for foreign key constraints, c_ for check constraints and d_ for default constraints. The constraint name should ideally contain the full name of the table referenced including the schema (except perhaps the dbo schema) and the fields used for the constraint all separated by underscores. Try to maintain the same case of the referenced schema, table and field objects in the name of the constraint.

Note that creating primary key and unique key constraints will also automatically create supporting indexes with the same name as the constraint.

e.g. pk_person_Address_Address_Id :-

This is a primary key on the Address table located in the person schema using the Address_Id field.

Full naming of foreign key constraints can become very lengthy. In order to shorten this avoid specifying the referencing field and only include the referenced primary key field from the referenced table as these two fields would typically be the same if the naming convention stated in this document is followed.

e.g. f_person_Address_person_State_Province_State_Province_Id :-

This is a foreign key on the Address table located in the person schema referencing the primary key field State_Province_Id on the State_Province table located in the person schema. Notice that only the primary key field of the referenced table is included in the name since the referencing field by implication should also be the same.

SQL Statements

Since SQL is a declarative language, it does not inherently bring code blocks as you would see in C# or Java. Hence always use capital case for all SQL statements including built-in functions and the applicable case for all referenced objects. This method will provide an alternative visual code block separation based on SQL key words.

e.g.

SELECT

FROM audit.Rejects_Log rl

JOIN audit.Execution_Log el

ON rl.Log_Id = el.Log_Id

LEFT JOIN audit.Execution_Log pel

ON el.Parent_Log_Id = pel.Log_Id

Code all control flow statements in simple case. The following are the most commonly used control flow statements:

if .. else
while
begin .. end

You can use round brackets where available to block code. When using round brackets as code blocks, do not write anything on the right side of the left bracket ( and do not write anything on the left side of the right bracket ). The below is a correct format:

select (
fld1
,fld2
) from (
select
fld1
,fld2
from table
) as t

SQL Script files

SQL script files are used to perform builds and releases. The key is to make them rerun-able so that they can be applied on to new or existing releases. SQL scripts can be categorized as follows:

* DDL scripts

* Ad-hoc DDL and DML combined scripts

A release may contain one or more of the above types of script files.

Ideally each release should have a separate folder in which all scripts relating to that release is placed without having to prefix or suffix the version numbers to the script file itself. This way the script file name can remain the same from one version to another. Thus for an individual version release you can run the scripts contain in the folder for that version. And for a cumulative version release, you can run all the scripts in all the folders in order of version number or date.

DDL script file names can generally follow the same naming convention as the objects themselves as stated above with the addition of the schema prefix and file name extension of .sql. When referring to objects within the script file name do not consider the object identification prefixes to be part of the object name.

Schemas
Always prefix schema DDL scripts with sc_. Use underscore to separate any other prefixes, suffixes and key words where necessary.

Tables

Always prefix table DDL scripts with u_ and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Tables are collections and should always be pluralized. Attempt to use nouns and mush as possible.

e.g. u_production_Product_Category.sql :-

This script is the DDL script to create the table Product_Category located in the production schema

Views

Always prefix view DDL scripts with v_ and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Tables are collections and should always be pluralized. Attempt to use nouns and mush as possible.

e.g. v_sales_Store_With_Contacts.sql :-

This script is the DDL script to create the view Store_With_Contacts located in the sales schema

Procedures

Always prefix stored procedure DDL scripts with p_ for SQL stored procedure and pc_ for CLR stored procedures and include the full object name including the schema. Use underscore ( _ ) to separate prefixes, suffixes and other any key words where necessary. Procedures perform actions, do attempt to use verbs as much as possible.

e.g. p_sales_Get_Customer_Sales, pc_production_Get_Product_Promotions, p_customer_Validate_Customer_Email, p_customer_Load_dim_Customers

Functions

Always prefix function DDL scripts with fn_ for scalar functions, tf_ for table values functions, if_ for in-line table valued functions, fs_ for CLR scalar functions, ft_ for CLR table valued functions and af_ for aggregate CLR functions and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Functions perform actions, so attempt to use verbs as much as possible.

e.g. if_audit_Get_Execution_Log_Tree.sql :-

This is the DDL script for the in-line table valued function Get_Execution_Log_Tree located in the audit schema.

Triggers

Always prefix trigger DDL scripts with tr_ for SQL triggers and ta_ for CLR triggers and include the full object name including the schema. Use underscore to separate prefixes, suffixes and other any key words where necessary. Triggers perform actions, so attempt to use verbs as much as possible.

Ad-hoc DDL and DML scripts

Apart from the regular DDL scripts relating to the database objects, there may be other scripts that are release and version specific. Such as when a new table is introduced and its required to pre-populate that table with some initial data, or when a field within an existing table is split into two fields in such a way that the data for both fields needs to be replaced and re-populated. These scripts may touch one or more objects within the database and typically perform some type of DML but may also contain DDL. Hence their names must reflect the specific operation. For these scripts generally use pascal case with underscore to separate prefixes, suffixes and other any key words where necessary. Since they usually perform actions attempt to use verbs as much as possible.

Where the names of any database objects become part of the file name, make sure to use the full object name including the schema.

e.g. u_dim_Lookup_Codes.sql :- This is the DDL script to create the dim_Lookup_Codes table in the default dbo schema (don't use a prefix to represent the default dbo schema)

Initialise_dim_Lookup_Codes.sql :- This is a DML script only specific to the release that introduced the dim_Lookup_Codes table which pre-populated the table with initial data.

SSIS

Unlike SQL Server, SSIS objects are case sensitive, so it maybe more appropriate to use camel or pascal case where its more relevant. However using underscores to separate camel case prefixes can still be useful to more easily identity component types and scopes.

Packages and package files

Packages perform actions and as such, can in principle follow the same naming conventions as SQL procedures and functions. The package file can share the same name as the package objects with the addition of the .dtsx file extension. Unlike procedures or functions however they do not have a specific object type prefix. Instead they can start with a verb in pascal case with underscores to separate any prefixes, suffixes and key words. Where the names of any database objects become part of a name, make sure to use the full object name including the schema (except for the default dbo schema).

e.g. Load_dim_Item.dtsx :-

This is the package file that performs the data load for the dim_Item table located in the default dbo schema. The name of the package object is also the same without the .dtsx file extension.

e.g. Process_sales_Facts.dtsx :-

This package is a driver package (master package) that is responsible for controlling the execution of all packages which loads the fact tables within the sales schema.

Connections

Prefix all connections with cn<Type>_<ConnectionName> where <Type> in camel or capital case is the abbreviated name of the connection manager. The following is a list of some of the more frequently used connection managers :

OLEDB - OLEDB connection manager

FlatFile - Flat file connection manager

ADONet - ADO.Net connection manager

for the complete list of connection managers see here https://msdn.microsoft.com/en-us/library/ms140203(v=sql.110).aspx

Substitute <ConnectionName> with the appropriate name of the connection in pascal case. In case the <ConnectionName> refer to a database object such as a table or system object such as a flat file, make sure to include its full name and maintain the same case as used in the referred object.

e.g. cnOLEDB_AdventureWorks2012,

cnFlatFile_Daily_Sales.csv :- This is a flat file connection manager to the Daily_Sales.csv comma separated value file.

Variables

When referring to variables its good practice to always use the fully qualified name. In order to easily identify the scope of variables declared at data flow level, its best to prefix them df<Name>_ in camel case for data flow variables where <Name> is the name of the data flow. For packaged scoped user variables, prefix then with pk<Name> where <Name> is the name of the variable. Other variables explicitly scoped at package and project levels need not have a prefix as the fully qualified name will readily reveal its scope.

@[User::dfDimItem_LoadStatus] - This is a user variable within the local scope of the dfDimItem data flow

@[$Package::FileArchivePath] - This is a package scoped variable

@[$Project::DatabaseServerName] - This is a project scoped variable

Components

Prefix control flow components with cf<Type>_<TaskName> and data flow components with df<Type>_<TaskName> where <Type> in camel or capital case is a abbreviated name of the component type.

Substitute the <TaskName> with the appropriate name of the control flow or data flow task in pascal case. In case the <TaskName> refer to a database object such as a table or system object such as a flat file, make sure to include its full name and maintain the same case as used in the referred object.

e.g. Control flow tasks :

cfSEQC_Staging :- This is a sequence container within the control flow

cfDFT_StageDimITem :- This is a data flow task within the control flow

cfSQL_TruncateStaging :- This is a execute SQL task within the control flow

The following is the complete list of control flow component types to be substituted with <Type>

Task	Prefix
For Loop Container	FLC
Foreach Loop Container	FELC
Sequence Container	SEQC
ActiveX Script	AXS
Analysis Services Execute DDL	ASE
Analysis Services Processing	ASP
Bulk Insert	BLK
Data Flow	DFT
Data Mining Query	DMQ
Execute DTS 2000 Package	EDPT
Execute Package	EPT
Execute Process	EPR
Execute SQL	SQL
File System	FSYS
FTP	FTP
Message Queue	MSMQ
Script	SCR
Send Mail	SMT
Transfer Database	TDB
Transfer Error Messages	TEM
Transfer Jobs	TJT
Transfer Logins	TLT
Transfer Master Stored Procedures	TSP
Transfer SQL Server Objects	TSO
Web Service	WST
WMI Data Reader	WMID
WMI Event Watcher	WMIE
XML	XML
Expression	EXPR

e.g. Data flow tasks :

dfFFSRC_Daily_Sales.csv :- This is the data flow task responsible for loading the Daily_Sales.csv file.

The following is the complete list of data flow component types to be substituted with <Type>

Component	Prefix
DataReader Source	DRSRC
Excel Source	EXSRC
Flat File Source	FFSRC
OLE DB Source	OLESRC
Raw File Source	RFSRC
XML Source	XMLSRC
Aggregate	AGG
Audit	AUD
Character Map	CHM
Conditional Split	CSPL
Copy Column	CPYC
Data Conversion	DCNV
Data Mining Query	DMQ
Derived Column	DER
Export Column	EXPC
Fuzzy Grouping	FZG
Fuzzy Lookup	FZL
Import Column	IMPC
Lookup	LKP
Merge	MRG
Merge Join	MRGJ
Multicast	MLT
OLE DB Command	CMD
Percentage Sampling	PSMP
Pivot	PVT
Row Count	CNT
Row Sampling	RSMP
Script Component	SCR
Slowly Changing Dimension	SCD
Sort	SRT
Term Extraction	TEX
Term Lookup	TEL
Union All	ALL
Unpivot	UPVT
Data Mining Model Training	DMMTDST
DataReader Destination	DRDST
Dimension Processing	DPDST
Excel Destination	EXDST
Flat File Destination	FFDST
OLE DB Destination	OLEDST
Partition Processing	PPDST
Raw File Destination	RFDST
Recordset Destination	RSDST
SQL Server Destination	SSDST
SQL Server Mobile Destination	SSMDST

A naming conversion for MS datawarehousing projects

SQL Server

Database Objects

SQL Statements

SQL Script files

SSIS

0 comments:

Post a Comment

Categories

Blog Archive