Dinesh's Blog :::: Being Compiled ::::

Sunday, August 25, 2013

Using GUID column as Clustered Index Key – SS SLUG Aug 2013 – Demo II

GUIDs are commonly used in distributed applications that require “uniqueness” across the entire world. Unfortunately, I have also seen GUIDs used as clustered keys in non-distributed applications, where global uniqueness is not required. My presentation discussed how useful GUIDs are as clustered keys and how they fragment the index. This post contains the demo code from that discussion.

Before going through the code, we must understand that GUIDs are not as bad as we think if they are managed well. You can make a uniqueidentifier column the PRIMARY KEY and the clustered key. Although it is not as efficient as the int data type, it gives moderate efficiency. Let’s look at how it fragments the clustered index and how that can be avoided.

USE tempdb

GO

-- create a table with uniqueidentifier

-- and make it as the clustered key

IF OBJECT_ID('dbo.GUID_Table') IS NOT NULL

    DROP TABLE dbo.GUID_Table

GO

CREATE TABLE dbo.GUID_Table

(

    Id uniqueidentifier PRIMARY KEY

    , name char(2000)

)

GO

-- insert 100 records with default values

INSERT INTO dbo.GUID_Table

VALUES

    (NEWID(), 'a')

GO 100

SELECT * FROM sys.dm_db_index_physical_stats (

    DB_ID(), OBJECT_ID('dbo.GUID_Table'), NULL, NULL, 'DETAILED')

Once the last statement is run, you will see how fragmented the table is. For more information on fragmentation, please refer to: http://dinesql.blogspot.com/2013/08/understanding-index-fragmentation-ss.html.
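SELECT * returns many columns. A narrower query, sketched below, keeps only the ones that matter for this demo. It assumes the dbo.GUID_Table created above still exists in tempdb; the column aliases are my own and not part of the DMF.

```sql
-- Key fragmentation columns only; index_level 0 is the leaf level
SELECT index_level
    , page_count
    , record_count
    , avg_fragmentation_in_percent AS ExternalFragmentation   -- illustrative alias
    , avg_page_space_used_in_percent AS PageSpaceUsed         -- illustrative alias
FROM sys.dm_db_index_physical_stats (
    DB_ID(), OBJECT_ID('dbo.GUID_Table'), NULL, NULL, 'DETAILED')
ORDER BY index_level
```

Since each record here is roughly 2 KB, only a few rows fit on a page, so even 100 inserts are enough to produce visible splits.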

Both external fragmentation (97%) and internal fragmentation (54%) are very high. The reason is page splits and record movement during insertions. Since GUIDs from NEWID are not generated sequentially, record placement is always an issue for SQL Server. When a GUID is inserted as the key (with the entire record, in this case), SQL Server looks for the page where the record must be placed. If that page has no space, it splits the page, moving about 50% of its records to a new page, which breaks the physical order of pages that were ordered by key. Run the code below to see the linkage between pages.

DBCC IND (tempdb, [GUID_Table], -1)

As you can see, SQL Server has to do many “read-backs” to read the data sequentially, which slows down queries. The way to avoid this with GUIDs is to use NEWSEQUENTIALID instead of NEWID. It generates GUIDs that are sequential to the last generated GUID. If rows are inserted using NEWSEQUENTIALID, external fragmentation will be lower because the values are generated in sequential order. Re-create the table and run the INSERT statement as below;

-- create table again

DROP TABLE dbo.GUID_Table

GO

CREATE TABLE dbo.GUID_Table

(

    Id uniqueidentifier PRIMARY KEY DEFAULT (NEWSEQUENTIALID())

    , name char(2000)

)

GO

INSERT INTO dbo.GUID_Table

VALUES

    (DEFAULT, 'a')

GO 100

SELECT * FROM sys.dm_db_index_physical_stats (

    DB_ID('tempdb'), OBJECT_ID('GUID_Table'), NULL, NULL, 'DETAILED')

DBCC IND (tempdb, [GUID_Table], -1)

If you analyze PagePID and NextPagePID in the DBCC IND result set now, you will see that the pages are in order and no “read-back” is needed. The SELECT statement also shows that no fragmentation has happened. This shows that with NEWSEQUENTIALID, splits are not required because each generated value is greater than the existing values. There are two key things to remember about it;

NEWSEQUENTIALID generates a value greater than any it previously generated on the same server since Windows was started. After a restart, generation can resume from a lower range.
Uniqueness is guaranteed only on the server that generated the values. Values are unique across servers only if the generating server has a network card, so duplicates are possible when values come from more than one server without one.
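The first point can be observed with a small scratch table. This is only a sketch: #SeqDemo is a hypothetical temp table of my own, and the Seq column exists purely to record insertion order.

```sql
-- Hypothetical scratch table; NEWSEQUENTIALID can only be used in a DEFAULT constraint
CREATE TABLE #SeqDemo
(
    Seq int IDENTITY(1,1) NOT NULL
    , Id uniqueidentifier NOT NULL DEFAULT (NEWSEQUENTIALID())
)
GO

-- Five single-row inserts, each taking both column values from their defaults
INSERT INTO #SeqDemo DEFAULT VALUES
GO 5

-- If the GUIDs were generated in increasing order, sorting by Id
-- returns the rows in the same order as Seq
SELECT Seq, Id FROM #SeqDemo ORDER BY Id
GO

DROP TABLE #SeqDemo
```

Swapping the default for NEWID() in the same sketch should show the two orderings diverging.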

Saturday, August 24, 2013

Understanding Index Fragmentation – SS SLUG Aug 2013 – Demo I

Index fragmentation is something we need to monitor frequently, specifically on databases that are heavily indexed. If indexes are fragmented, you will not get the expected performance from them. This was one of the areas discussed at the last user group meeting; here is the sample code that shows how indexes get fragmented.

Run the code below to create a sample table with values.

USE tempdb

GO

-- creating test table

IF OBJECT_ID(N'dbo.TestTable', 'U') IS NOT NULL

BEGIN

DROP TABLE dbo.TestTable

END

GO

CREATE TABLE TestTable (Id int IDENTITY(1,1) PRIMARY KEY, [Value] varchar(900))

GO

-- Inserting sample values

DECLARE @a int, @b int

SET @a = 65

WHILE (@a < 91)

BEGIN

    SET @b = 0

    WHILE (@b < 20)

    BEGIN

        INSERT INTO TestTable ([Value])

        SELECT REPLICATE(CHAR(@a), 445) + CONVERT(VARCHAR(10), @b)

        SET @b = @b + 1

    END

    SET @a = @a + 2

END

GO

-- See the values inserted, [Value] column contains

-- values like AAAA..., CCCC..., EEEE..., etc

SELECT * FROM dbo.TestTable ORDER BY Id

-- making an index on [Value] column

CREATE INDEX IX_TestTable ON dbo.TestTable([Value])

GO

-- checking for internal fragmentation

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'), 2, NULL, 'DETAILED')

The last statement uses a DMF (dynamic management function) to check index fragmentation. Two columns are used for checking fragmentation: avg_fragmentation_in_percent for external fragmentation and avg_page_space_used_in_percent for internal fragmentation.

Here are a few points discussed regarding fragmentation;

Internal fragmentation (avg_page_space_used_in_percent)
Inefficient use of pages within an index because the amount of data stored within each page is less than the data page can contain.
- <= 75% and >= 60% reorganize index
- < 60% rebuild index
External fragmentation (Logical and Extent) (avg_fragmentation_in_percent)
Inefficient use of pages within an index because the logical order of the pages is wrong.
- <= 30% reorganize index
- > 30% rebuild index

As per the result, we do not need to worry much about fragmentation. SQL Server uses 16 pages to hold 260 records (on average, 16 records per page). For testing purposes, let’s fragment the index. Here is the code;

-- reducing the size of the index key for some records

UPDATE dbo.TestTable

SET [Value] = LEFT([Value], 1)

WHERE Id % 2 = 0

-- checking for internal fragmentation

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'), 2, NULL, 'DETAILED')

And the result is;

As you can see, since we removed some characters from the Value column, the pages now have free space; only 47% of the space is used. You do not really need 16 pages to hold 260 records now, but SQL Server still holds the records in 16 pages, which is not efficient: it causes more IO and uses more memory for your queries, decreasing their performance. Rebuilding the index sorts this out;

-- removing the fragmentation

ALTER INDEX IX_TestTable ON dbo.TestTable REBUILD

What you have to remember is that updates like the one above fragment your indexes, so try to avoid them. If such updates are performed, make sure you check for fragmentation and defragment if required.
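REBUILD is not the only way to defragment; the points listed earlier also mention reorganizing. A minimal sketch of that lighter alternative, assuming the same IX_TestTable index in its fragmented state (that is, run instead of the REBUILD above):

```sql
-- REORGANIZE works on the leaf level: it compacts pages and puts them back in logical order
ALTER INDEX IX_TestTable ON dbo.TestTable REORGANIZE
GO

-- Re-check page usage for the nonclustered index (index id 2)
SELECT index_level
    , page_count
    , avg_page_space_used_in_percent
    , avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'), 2, NULL, 'DETAILED')
```

Comparing page_count before and after shows how much space each approach recovers on your own instance.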

Let’s start from the beginning to understand external fragmentation. It is all about breaking the order of the records. Run the code below to see the current order. Make sure you run the first code segment above (table creation, inserting, and index creation) again before running this.

-- checking for external fragmentation

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'),2, NULL, 'DETAILED')

-- run this undocumented command to see the NextPageID

DBCC IND (tempdb, 'TestTable', 2)

Here are the results;

Look at the first result: it says that external fragmentation of the IX_TestTable index (index id 2) is 18.5%, which is okay. The next result is from the DBCC IND command (for more info on DBCC IND, refer to: http://dinesql.blogspot.com/2013/08/dbcc-ind.html) and shows how pages are linked. Focus on the PagePID, NextPagePID, and PageType columns. Page type 2 represents index pages. Index level 0 represents leaf pages. This says that the next page of 312 is 314 and the next page of 314 is 316, which is a perfect order.

Let’s make the table externally fragmented. Run the code below;

-- inserting new values such as

-- Bs, Ds, Fs, etc....

-- Making page-splits

DECLARE @a INT

SET @a = 66

WHILE (@a < 86)

BEGIN

    INSERT INTO TestTable ([Value])

    SELECT REPLICATE(CHAR(@a), 445) + CONVERT(VARCHAR(10), @a)

    SET @a = @a + 2

END

GO

When records are inserted between existing values (B has to be inserted between A and C), SQL Server needs to check whether the correct page has space available. If no space is available, a page split occurs and about 50% of the records are moved to a new page. This breaks the order of pages. Run sys.dm_db_index_physical_stats and DBCC IND again and see;

We can clearly see what has happened to the order of pages: it is broken, and the index is fragmented. When you query data from this table (for example, data from pages 312, 341, 50699, 316), SQL Server has to read forward (which is fine) and read back (which is very costly), causing more IO and slowing down the generation of the result set. Rebuilding the index fixes this; however, what you have to remember is not to fragment your indexes externally through your own actions, because that makes query performance very poor. Here are some points you need to consider when dealing with index keys. These reduce external fragmentation.

Select non-volatile columns for the keys.
Make sure column values are ever-increasing.

Presentation related to this demo is available at: SQL Server Universe.

DBCC IND

The DBCC IND command is one of the important commands used when analyzing indexes. Specifically, it can be used to see the linkage between pages associated with a given table. It requires three parameters;

DBCC IND ( {'db_name' | 'db_id'}, {'table_name' | 'table_id'}, {'index_name' | 'index_id' | -1} )

db_name | db_id – requires the database name or id. If 0 or '' is passed, the current database will be used.
table_name | table_id – requires the table name or object id of the table.
index_name | index_id | -1 – requires the name or id of an index on the table. If -1 is used, the result is generated for all the indexes.

Here is a brief description of the output of the IND command.

PageFID	File number where the page is located
PagePID	Page number for the page
IAMFID	File ID where the IAM page is located
IAMPID	Page ID of the IAM page in the data file
ObjectID	Object ID for the associated table
IndexID	Index ID associated with the heap or index
PartitionNumber	Partition number for the heap or index
PartitionID	Partition ID for the heap or index
iam_chain_type	The type of IAM chain the extent is used for. Values can be in-row data, LOB data, and overflow data.
PageType	Number identifying the page type; 1 - Data page; 2 - Index page; 3 - Large object page; 4 - Large object page; 8 - Global Allocation Map page; 9 - Shared Global Allocation Map page; 10 - Index Allocation Map page; 11 - Page Free Space page; 13 - Boot page; 15 - File header page; 16 - Differential Changed Map page; 17 - Bulk Changed Map page
IndexLevel	Level at which the page exists in the page organizational structure. The levels are organized from 0 to N, where 0 is the lowest level of the index and N is the index root
NextPageFID	File number where the next page at the index level is located
NextPagePID	Page number for the next page at the index level
PrevPageFID	File number where the previous page at the index level is located
PrevPagePID	Page number for the previous page at the index level

Here is an example for running the command;

DBCC IND (tempdb, 'TestTable', 2)

Sunday, June 30, 2013

SQL Server 2012 Posters

These posters were published a few months back; if you have not downloaded them yet, here are the links;

	SQL Server 2012 System Views Map http://www.microsoft.com/en-us/download/details.aspx?id=39083 The Microsoft SQL Server 2012 System Views Map shows the key system views included in SQL Server 2012, and the relationships between them.

	Microsoft Business Intelligence at a Glance Poster http://www.microsoft.com/en-us/download/details.aspx?id=35586 Provides an overview of Microsoft's Business Intelligence technologies in Office, SQL Server, and BI services in Windows Azure.

Tuesday, June 25, 2013

SQL Server 2014 CTP1 available for downloading

It may seem too early, but this is how the Microsoft SQL Server team works: yes, SQL Server 2014 CTP1 is available for downloading.

Here is the path: http://technet.microsoft.com/en-US/evalcenter/dn205290.aspx.

A few things you should know before installing;

Two versions are available: General SQL Server and Cloud-based.
This does not support upgrade or side-by-side installation. Install it on a new, clean machine.
This is only available for the 64-bit architecture.
Three types of downloads are available: ISO DVD image, CAB file, or Azure version.

Key features;

“Hekaton” in-memory capabilities that give significant performance gains for database applications.
xVelocity ColumnStore provides in-memory capabilities for data warehousing workloads that result in dramatic improvements in query performance.
Seamless and transparent SSD support for the SQL Server buffer pool.
Enhanced high availability.

The product guide is available at: http://www.microsoft.com/en-us/download/details.aspx?id=39269

Sunday, June 16, 2013

Have you ever been lost in Business Intelligence?

Everybody says that they have business intelligence applications. Everybody says that they develop business intelligence applications, including me. But are all BI applications truly delivering BI? What do they provide? An indicator formed with a traffic light? A speedometer showing the performance or actual value of a KPI?

I have been designing and developing many modules related to business intelligence for years. Yes, we tried our best to convert data into information, spending considerable, costly time on cleansing, validating, and structuring formed/unformed data. I assumed that I had truly implemented BI. Although success was seen in many cases, sometimes the result was just a centralized repository. A stranded repository, a stranded treasure. That is where I felt that I was lost in BI.

There can be many reasons for such an unsatisfactory finale. I looked back and figured out a few. They look very simple, but they were ignored, sometimes partially and sometimes fully, though not purposely. You may make the same mistakes or you may not; either way, let me share them so you can treat them as precautionary steps.

Neglect the champion (not purposely)
This is the most common mistake we make. Despite being an experienced IT person, I still think that I am not qualified enough to structure the data warehouse in terms of the business type or domain. I am not talking about setting up ragged hierarchies, setting up slowly changing dimensions, or partitioning an OLAP DW for real-time BI. This is all about structuring information for the domain, for the business, using their own business language. For that you indeed need a business user, a champion. I once faced this;

“Dinesh, I see a discrepancy in financial figures, have you taken “control accounts” into consideration when calculating revised budgets for projects?”

That was something I was unaware of. He started seeing inaccurate information in my BI solution and lost interest in it. This simply shows that no matter how hard you struggle to implement the solution, if business users see it as useless to them, you fail. The lesson I learnt from this was: never implement a BI solution without a champion.

Visualization is the key

Here are two quotes from two different business users;

“I like these graphical representations, I love this slicing-dicing feature that truly shows the insights”

“This dashboard makes me confused, too many gadgets, can I see the revenue of my companies in a simple manner?”

Many are attracted by interactive widgets, but some are not. This is what we need to understand; this is what I realized. When you build a solution, make sure that the right visualization is available for every level of business user, because we cannot insist on or force them to use what we have implemented without a proper business case. One could be looking for a simple scorecard that shows green-amber-red traffic lights; another could be looking for a trend chart combined with what-if analysis. Requirements differ from company to company and user to user, but having a fully fledged solution will allow you to win the heart of any sort of audience. Therefore, do not design the data warehouse without thinking about the visualization of the output and the profiles of its consumers.

Analysis cannot be performed
This is what I have witnessed with most BI solutions: they provide a standard way of reporting, either production or analytical, or some dashboards. It can be a set of reports that have static columns with a collapse-and-expand feature. It can be a dashboard with pre-defined, in other words static, graphical widgets. What if a user says that they cannot perform the analysis they have been doing? Here is an example of what I was asked;

“Why can’t I add sales and distributor vehicle unavailability in to same chart and do a comparison? I have been doing it manually with Excel.”

The information on vehicle unavailability was not programmatically maintained; it was a manual, irregular recording performed on demand. But he had been doing it with his own Excel sheet. Yes, we can argue that the information is not electronically available and hence the request cannot be fulfilled with the current BI solution, but what if I simply allowed him to upload his manual work and combine it with facts in the DW to support his analysis? If I had known all the sorts of analysis performed by business users, I would have made sure that everything was covered, at least with workarounds.

Though there are more, I think these three are the key reasons BI solutions fail, leaving DW/BI solutions handicapped. If your solution is being built, make sure they have been considered.

Sunday, September 16, 2012

Bookmark: Working with lengthy codes

Have you ever thought of using bookmarks in Management Studio when working with lengthy code? If yes, you are smart, but what I have seen is very low usage of them.

Bookmarks allow you to tag code segments with marks and jump to them when you want. It can be done with a few keystrokes; here is the way of doing it;

Make sure the cursor is placed where you want the bookmark.
Press CTRL+K, CTRL+K.
You will see the bookmark as below;
Follow the second step for bookmarking wherever required.
Done!
Use CTRL+K, CTRL+N for jumping to the next bookmark.
Use CTRL+K, CTRL+P for jumping to the previous bookmark.
Use CTRL+K, CTRL+K again on a bookmarked line for removing that bookmark.
Use CTRL+K, CTRL+L for clearing all bookmarks.

Have fun!

Thursday, August 30, 2012

SQL Server 2012 Cumulative Update #3 is available

Cumulative update #3 for SQL Server 2012 is now available for download at: http://support.microsoft.com/kb/2723749. Make sure you apply it to a test environment first and verify it before applying it to the live environment.

Sunday, January 22, 2012

TABLE Type and MERGE Statement

As I get many offline questions regarding the “Reducing RoundTrips” presentation I did, I thought I would summarize the content we discussed as a post. The session covered how to reduce multiple database calls from a client application when an updated data set (e.g. a data table bound to a grid) needs to be sent from the client to the database. In addition, we discussed how to handle all three operations, INSERT, UPDATE, and DELETE, using a single statement. This is what the session focused on;

MERGE
The MERGE statement is used to insert data that does not exist and to update it if it does exist. It is an extension of UPSERT, a term known in other database management systems for performing both UPDATE and INSERT with a single technique. The main benefit is the ability to manipulate many records in a single table from a source as an atomic operation. This eliminates individual statements for INSERT, UPDATE, and DELETE operations; one statement, MERGE, handles all three. The MERGE statement has the following elements;

Target: This is the table or view (destination) which needs to be updated.
Source: This contains updated data (modified, newly added) and will be used for updating the target. This can be a table, view, derived table, CTE, or table-valued function.
WHEN MATCHED [AND] ...... THEN: This is where the action to perform when the row in the source is found in the target should be written. Two WHEN MATCHED clauses are allowed, provided one specifies UPDATE as its action and the other DELETE.
WHEN NOT MATCHED [BY TARGET] ...... THEN: The action to perform when the row in the source is not found in the target is written here. The only action allowed here is INSERT.
WHEN NOT MATCHED BY SOURCE ...... THEN: This performs an action when a row in the target is not supplied by the source. Usually it is DELETE.
OUTPUT clause and $Action: The standard inserted and deleted virtual tables are available with MERGE. In addition, $Action provides the operation performed: INSERT, UPDATE, or DELETE. This is normally used for logging/auditing.

In order to use MERGE, a source must be defined and filled. Usually the source is defined and filled by the client, but because of various limitations and a lack of facilities, we used to iterate over the filled source at the client side itself and send one row at a time to the database, which makes many calls to the database to update all the records in the source.

TABLE Type
This can be overcome using the TABLE data type. A TABLE type can be created as a user-defined type and used as either a variable or a parameter. The following are the key elements of creating and using a type;

CREATE TYPE: This is used for creating the user-defined type. It has to be created AS TABLE and its structure has to be defined.
READONLY keyword: If the type is used as a parameter, the READONLY keyword has to be used. Consequently, it cannot be used as an OUTPUT parameter.
SqlDbType.Structured: When a data table is sent from a .NET application, the parameter data type must be declared as System.Data.SqlDbType.Structured.

Here is sample code for all of these;

This code creates a table called Customer and inserts 4 rows. It also creates a log table for recording operations.

USE tempdb

GO

-- create a table for customer

CREATE TABLE dbo.Customer

(

    CustomerId int identity(1,1) not null

    , CustomerCode char(4) not null

    , FirstName varchar(25) null

    , LastName varchar(25) not null

    , CreditLimit money not null

    , IsLocal bit not null

    , LastModified smalldatetime not null

)

GO

-- insert four records

INSERT INTO Customer

    (CustomerCode, FirstName, LastName, CreditLimit, IsLocal, LastModified)

VALUES

    ('C001', 'Dinesh', 'Priyankara', 10000, 1, getdate())

    , ('C002', 'Jane', 'K', 10000, 1, getdate())

    , ('C003', 'Martinie', 'Ku', 10000, 0, getdate())

    , ('C004', 'Joe', 'Anderson', 10000, 0, getdate())

GO

CREATE TABLE dbo.CustomerLog

(

    CustomerCode char(4) not null

    , DateModified datetime not null

    , Action varchar(15) not null

)

GO
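The stored procedure that follows takes a parameter of type dbo.CustomerType, but the definition of that type is not shown in this post. A minimal sketch of what it could look like, assuming its columns mirror the DataTable built in the .NET sample (same order, with data types borrowed from dbo.Customer):

```sql
-- Assumed definition: the column list and types are inferred, not taken from the session
CREATE TYPE dbo.CustomerType AS TABLE
(
    CustomerCode char(4) NOT NULL
    , FirstName varchar(25) NULL
    , LastName varchar(25) NOT NULL
    , CreditLimit money NOT NULL
    , IsLocal bit NOT NULL
)
GO
```

Column order matters here, since the columns of a DataTable passed as a structured parameter are matched to the type by position.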

Here is the stored procedure for accepting an updated data set using TABLE type and updating Customer using MERGE.

-- create a stored procedure for accepting created TABLE type as a parameter

-- note that READONLY has to be used

CREATE PROC dbo.InsertCustomer @CustomerType dbo.CustomerType READONLY

AS

BEGIN

    INSERT INTO dbo.CustomerLog

    (CustomerCode, DateModified, [Action])

    SELECT o.CustomerCode, getdate(), o.[Action]

    FROM

        (MERGE INTO dbo.Customer AS c

        USING @CustomerType AS t

            ON c.CustomerCode = t.CustomerCode

        WHEN MATCHED AND c.IsLocal = 0 THEN

            UPDATE SET c.FirstName = t.FirstName

                    , c.LastName = t.LastName

                    , c.CreditLimit = t.CreditLimit

                    , c.LastModified = getdate()

        WHEN NOT MATCHED THEN

            INSERT (CustomerCode, FirstName, LastName, CreditLimit, IsLocal, LastModified)

                VALUES

                    (t.CustomerCode, t.FirstName, t.LastName, t.CreditLimit, t.IsLocal, getdate())

        WHEN NOT MATCHED BY SOURCE THEN

            DELETE

        OUTPUT isnull(inserted.CustomerCode, deleted.CustomerCode) as CustomerCode, $ACTION AS Action) AS o;

END

GO
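Before involving a client application, the procedure can be exercised from T-SQL with a variable of the table type. This is a sketch only: it assumes a dbo.CustomerType type with the five columns named below already exists, and the rows simply mirror the ones used in the .NET sample.

```sql
-- Assumes dbo.CustomerType, dbo.Customer, dbo.CustomerLog and dbo.InsertCustomer exist in tempdb
DECLARE @changes dbo.CustomerType

INSERT INTO @changes (CustomerCode, FirstName, LastName, CreditLimit, IsLocal)
VALUES
    ('C001', 'Dinesh', 'Priyankara', 10000, 1)
    , ('C002', 'Jane', 'Kani', 25000, 1)
    , ('C004', 'Joe', 'Andrew', 35000, 1)
    , ('C005', 'Kate', 'Neo', 35000, 1)

-- One call sends the whole set; MERGE decides per row what to do
EXEC dbo.InsertCustomer @CustomerType = @changes

-- Inspect the outcome and the actions that were logged
SELECT * FROM dbo.Customer ORDER BY CustomerCode
SELECT * FROM dbo.CustomerLog ORDER BY CustomerCode
```

C003 is deliberately absent from the set, so it falls under the WHEN NOT MATCHED BY SOURCE branch.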

Everything is ready from database level. Here is a sample code for .NET application.

SqlConnection connection = new SqlConnection(@"Server=.\SQL2008R2;Integrated Security=SSPI;Database=tempdb");

DataTable customer = new DataTable("Customer");

customer.Columns.Add("CustomerCode", typeof(string));

customer.Columns.Add("FirstName", typeof(string));

customer.Columns.Add("LastName", typeof(string));

customer.Columns.Add("CreditLimit", typeof(decimal));

customer.Columns.Add("IsLocal", typeof(bool));

customer.Rows.Add("C001", "Dinesh", "Priyankara", 10000, true); // no change and no update

customer.Rows.Add("C002", "Jane", "Kani", 25000, true); // should not update

customer.Rows.Add("C004", "Joe", "Andrew", 35000, true); // update all

customer.Rows.Add("C005", "Kate", "Neo", 35000, true); // new record

SqlCommand command = new SqlCommand("InsertCustomer", connection);

command.CommandType = CommandType.StoredProcedure;

SqlParameter parameter = new SqlParameter("@CustomerType", System.Data.SqlDbType.Structured);

parameter.Value = customer;

command.Parameters.Add(parameter);

connection.Open();

command.ExecuteNonQuery();

connection.Close();

Enjoy!

Tuesday, November 1, 2011

Is SQL Server @@IDENTITY a Global Function?

@@IDENTITY is a system function that returns the last generated identity value. But the question is, is it the last identity generated by my code? Last generated by my session? Or last generated by the system for any user?

Understanding what exactly it returns helps you to code your database modules accurately. Let’s try this with a small example.

CREATE TABLE T1 (Id int IDENTITY(1,1), Value varchar(20))

CREATE TABLE T2 (Id int IDENTITY(100,1), Value varchar(20))

GO

CREATE PROC InsertT2

AS

BEGIN

    INSERT INTO T2 (Value) VALUES ('Test')

END

GO

CREATE PROC InsertT1

AS

BEGIN

    INSERT INTO T1 (Value) VALUES ('Test')

    EXEC InsertT2

    SELECT @@IDENTITY

    SELECT SCOPE_IDENTITY()

END

GO

EXEC InsertT1

The result of the InsertT1 execution will be 100 and 1. As you can see, @@IDENTITY returns the last identity value generated in my session, regardless of the scope I am in. If you use this function to get the identity value generated by your own insert, the result may be wrong whenever other code in the session (for example, a called procedure or a trigger) inserts a record into another table with the IDENTITY property. Note that SCOPE_IDENTITY returns the identity value for the current scope; hence, if the requirement is to get the value generated in the current scope, use it.

There is another function that can be used for getting a generated identity value, called IDENT_CURRENT. This function accepts the table name as a parameter and returns the last identity value generated for the given table, by any scope, any session, any connection.
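The three functions can be compared side by side. A small sketch, assuming T1 and T2 from the example above exist and that it is run in the same session right after EXEC InsertT1; the column aliases are my own.

```sql
SELECT @@IDENTITY AS SessionLastIdentity          -- last identity in this session, any scope
    , SCOPE_IDENTITY() AS ScopeLastIdentity       -- last identity in the current scope; NULL if this scope inserted nothing
    , IDENT_CURRENT('T1') AS LastIdentityForT1    -- last identity for T1, any session
    , IDENT_CURRENT('T2') AS LastIdentityForT2    -- last identity for T2, any session
```

Because the inserts happened inside the procedures and not in the calling batch, SCOPE_IDENTITY() is the one to watch here.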

Monday, September 19, 2011

Reporting Services Parameters: Adding “All” item.

There are instances where we have to create parameters with multiple items, including an item representing all items in the list, usually called “All”. There are various ways of doing it. If the list is dynamic and the data source is OLAP, you get this item automatically. If it is not OLAP and the list is loaded from an OLTP database, this item has to be manually added and programmatically handled.

Here is an easy way to handle it. Add this additional item with a new query and use UNION to combine it with the main query.

SELECT  'All Years' AS OrderYear

    , 0 AS OrderYearValue

UNION

SELECT DISTINCT

    CONVERT(varchar(100), YEAR(OrderDate)) AS OrderYear

    , YEAR(OrderDate)  as OrderYearValue

FROM Sales.SalesOrderHeader

ORDER BY 2

Now the parameter query returns the item. To handle it with the query parameter, follow the code below;

SELECT {your columns}

FROM Sales.SalesOrderHeader

WHERE YEAR(OrderDate) = @Year

    OR @Year = 0
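The predicate can be tried outside the report by declaring the parameter as a local variable. A sketch, assuming the AdventureWorks Sales.SalesOrderHeader table; the aggregate columns are placeholders for whatever the report actually selects.

```sql
DECLARE @Year int
SET @Year = 0   -- 0 is the value bound to 'All Years' in the parameter query

SELECT YEAR(OrderDate) AS OrderYear
    , COUNT(*) AS OrderCount
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = @Year
    OR @Year = 0
GROUP BY YEAR(OrderDate)
ORDER BY OrderYear
```

Setting @Year to a specific year from the list should narrow the result to that single year.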

Happy Coding.

Wednesday, August 31, 2011

PIVOT and UNPIVOT: Integration Services

As you all know, pivoting and unpivoting are supported in SSIS, but implementing PIVOT is a bit difficult compared with UNPIVOT. This post explains the implementation of both the Pivot and Unpivot transformations that come with SSIS.

Pivot allows us to convert a normalized dataset into a less normalized format, transferring (pivoting) row values into columns. In simple terms, it identifies distinct values in a designated column and makes columns based on the identified values (note that it is not fully dynamic and automated). Here is an example.

The example shows how PIVOT works: the values (months) in the OrderMonth column have been converted into columns, combined with the OrderAmount column. Now let’s see how we can do this with SSIS. The code below is the source for pivoting; it returns orders placed by customers by month and year.

SELECT

    YEAR(OrderDate) OrderYear

    , p.LastName Customer

    , DATENAME(mm, OrderDate) OrderMonth

    , SUM(SubTotal) OrderAmount

FROM Sales.SalesOrderHeader h

    INNER JOIN Sales.Customer c

        ON c.CustomerID = h.CustomerID

    INNER JOIN Person.Person p

        ON p.BusinessEntityID = c.PersonID

GROUP BY

    YEAR(OrderDate)

    , p.LastName

    , DATENAME(mm, OrderDate)

ORDER BY 1, 2
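For comparison, the same reshaping can be written with the T-SQL PIVOT operator. This is a sketch, not part of the SSIS package: it assumes the same AdventureWorks tables and an English session language, so that DATENAME returns month names matching the column list.

```sql
SELECT OrderYear, Customer
    , [January], [February], [March], [April], [May], [June]
    , [July], [August], [September], [October], [November], [December]
FROM
(
    -- Un-aggregated source rows; PIVOT groups by the remaining columns
    SELECT YEAR(h.OrderDate) AS OrderYear
        , p.LastName AS Customer
        , DATENAME(mm, h.OrderDate) AS OrderMonth
        , h.SubTotal AS OrderAmount
    FROM Sales.SalesOrderHeader h
        INNER JOIN Sales.Customer c
            ON c.CustomerID = h.CustomerID
        INNER JOIN Person.Person p
            ON p.BusinessEntityID = c.PersonID
) AS src
PIVOT
(
    SUM(OrderAmount)
    FOR OrderMonth IN ([January], [February], [March], [April], [May], [June]
        , [July], [August], [September], [October], [November], [December])
) AS pvt
ORDER BY OrderYear, Customer
```

As with the SSIS transformation, the month columns have to be listed by hand, which is why neither approach is fully dynamic.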

In order to do this in SSIS, create a new project and add a Data Flow Task to the Control Flow. Place an OLE DB Source in the Data Flow Task and set the above query, making the connection to the AdventureWorks2008R2 database. Then, drag a Pivot transformation onto the Data Flow and connect the two.

Pivot configuration is a little bit tricky, not as easy as T-SQL PIVOT. When you open its editor, you get three tabs. The second tab allows you to mark columns for pivoting, selecting them from the input flow.

The next tab, Input and Output Properties, is for instructing SSIS how to use the input columns for the pivot operation. The Pivot Default Input node in the left pane allows us to set the usage of input columns for pivoting. The Pivot Default Output node allows us to form the output columns based on the input columns and their usage. Let’s first have a look at the Pivot Default Input node.

If you expand the next level of Pivot Default Input, which is Input Columns, you can see all the columns available for the operation. Selecting a column loads its properties into the right pane, which contains a property called PivotUsage that accepts a numeric value from 0 to 3. Here is how to decide the value for the PivotUsage property.

Value	Meaning
0	The column does not participate in pivoting; it is passed through unchanged to the output. If pivoting finds multiple values for an output row, only the first of them is used. In our example, we do not have any column to be marked as 0.
1	The column is the key (set key) for generating the pivot output rows. Values in this column are used to identify unique rows. We can mark one or more columns as 1, but we need to make sure that their combination makes each record unique, without duplicates. In our example, we have two columns to mark as 1: OrderYear and Customer.
2	Values in this column (Pivot Column) will be used for creating columns in the pivot output. In our example, OrderMonth is the column which should be marked as 2.
3	Values in this column will be used to generate the values for columns created from pivot column. In our example, OrderAmount should be set as 3.

Here is the way of setting it;

The next step is configuring Pivot Default Output. If you expand the node, the Output Columns node appears. SSIS does not generate the output columns for us, so we have to create them ourselves, including the columns produced from the pivot column. Make sure you have selected the Output Columns node and click Add Column to create 14 columns. Once the columns are created, use the Name property to name them OrderYear, Customer, January, February, March, April, May, June, July, August, September, October, November, and December.

Next, link the output columns with the input columns. This is done through a property called SourceColumn on each output column. The value of SourceColumn has to be taken from the LineageID property of the relevant input column. Find the LineageID of the OrderYear input column (in my case it is 17, see above image) and set it as the SourceColumn of the OrderYear output column. Do the same for the Customer column too.

The remaining columns are the month columns. The SourceColumn property of every month column has to be set to the LineageID of OrderAmount. In addition to that, a property called PivotKeyValue must be filled in with the matching value coming from the pivot column (OrderMonth). For example, the PivotKeyValue of the January column should be “January”, and the PivotKeyValue of the February column should be “February”.

The package is done. Let’s send the output to an Excel file for testing purposes. Here is the package execution and its output.
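For comparison, the same reshaping can be sketched in T-SQL with the PIVOT operator. This is only an illustration, not part of the package: the table name dbo.CustomerOrders is hypothetical, OrderMonth is assumed to hold month names such as 'January', and SUM is used because T-SQL PIVOT requires an aggregate, whereas the SSIS Pivot transformation does not aggregate.

```sql
-- Hypothetical source table: dbo.CustomerOrders
--   (OrderYear, Customer, OrderMonth, OrderAmount)
-- OrderYear + Customer play the role of the set key (PivotUsage = 1),
-- OrderMonth is the pivot column (PivotUsage = 2),
-- OrderAmount supplies the values (PivotUsage = 3).
SELECT OrderYear, Customer
    , [January], [February], [March], [April], [May], [June]
    , [July], [August], [September], [October], [November], [December]
FROM (
    SELECT OrderYear, Customer, OrderMonth, OrderAmount
    FROM dbo.CustomerOrders
) AS src
PIVOT (
    SUM(OrderAmount)
    FOR OrderMonth IN (
        [January], [February], [March], [April], [May], [June]
        , [July], [August], [September], [October], [November], [December])
) AS pvt
ORDER BY OrderYear, Customer;
```

The column list in the IN clause corresponds to the PivotKeyValue settings of the twelve month output columns.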

SSIS UNPIVOT is not as tricky as PIVOT. Let’s see how unpivoting can be done with SSIS. The easiest way to understand it is to reverse the process and unpivot the Excel sheet created in the previous example. Add a Data Flow Task with an Excel Source that connects to the Excel file we created. Then add an Unpivot transformation and connect it to the source.

Let’s configure the Unpivot transformation. Open the Unpivot Transformation Editor and configure it as follows:

Select the Pass Through checkbox only for OrderYear and Customer.
Select all month columns. This loads all months into the grid.
Give a destination column name (e.g. OrderAmount) for the month columns. Make sure that the same name is set for all of them. This column will hold the values related to the months.
Make sure Pivot Key Value has been properly set for each month column. This value becomes the row value that identifies which column was unpivoted.
Finally, give a name for the Pivot Key Value column (OrderMonth).

Done. Add a destination of your choice and see the result.
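The reverse operation can also be sketched in T-SQL with the UNPIVOT operator. Again this is just an illustration under assumptions: dbo.CustomerOrdersPivoted is a hypothetical table holding the pivoted output, and all month columns are assumed to share one data type, which UNPIVOT requires.

```sql
-- Hypothetical source table: dbo.CustomerOrdersPivoted
--   (OrderYear, Customer, January ... December)
-- OrderYear and Customer are passed through,
-- OrderAmount is the destination column,
-- OrderMonth is the pivot key value column.
SELECT OrderYear, Customer, OrderMonth, OrderAmount
FROM dbo.CustomerOrdersPivoted
UNPIVOT (
    OrderAmount FOR OrderMonth IN (
        [January], [February], [March], [April], [May], [June]
        , [July], [August], [September], [October], [November], [December])
) AS unpvt
ORDER BY OrderYear, Customer;
```

Note that T-SQL UNPIVOT leaves out rows where the month value is NULL, so a customer with no orders in a month simply has no row for that month.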

Sunday, July 10, 2011

Default Logging: Microsoft SQL Server Integration Services

Everybody knows how important logging is when it comes to troubleshooting. This applies to SSIS packages too. Unfortunately, most developers do not pay much attention to it, and later regret not having any records of package executions. If you have not implemented any logging mechanism and need to know how often a package is executed, or whether it was started at all, you are not stranded. Integration Services has done the needful for you.

Integration Services has a default logging behavior. It captures the starting and ending events of a package, and adds two log entries to Windows Event Viewer. If you run the package with Business Intelligence Development Studio (BIDS) while it is being designed, you should see two entries in Windows Event Viewer as follows;

This behavior cannot be turned off. It logs not only when the package is run with BIDS, but with DTEXEC too. In my experience this is not enough for troubleshooting a complex package, so make sure you use the other out-of-the-box logging features for recording your package processes.
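As one example of those out-of-the-box features, the SSIS log provider for SQL Server writes events to a table that can be queried afterwards. The sketch below assumes that provider has been enabled on the package, that the instance is SQL Server 2008 or later (where the table is named dbo.sysssislog; in SQL Server 2005 it is dbo.sysdtslog90), and that the query runs in the database the log connection points to.

```sql
-- Assumes the "SSIS log provider for SQL Server" is enabled for the package
-- and that the PackageStart, PackageEnd and OnError events are being logged.
SELECT source, event, starttime, endtime, message
FROM dbo.sysssislog
WHERE event IN ('PackageStart', 'PackageEnd', 'OnError')
ORDER BY starttime DESC;
```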

Tuesday, July 5, 2011

How was the language “SQL” born?

A simple thing that you may not know...

Structured Query Language was born in the early 1970s, in support of Edgar F. Codd’s relational database model, which he published in 1970. It was invented at IBM by Donald D. Chamberlin and Raymond F. Boyce (who co-introduced Boyce-Codd Normal Form, sometimes called 3.5NF) for IBM’s RDBMS called System R. Its initial name was Structured English Query Language (SEQUEL), but it was later changed to SQL because SEQUEL was a trademark of another company.

The first commercially available implementation of SQL was released by Relational Software Inc. (now known as Oracle Corporation). It was released in June 1979, as Oracle V2. Relational Software Inc. had started developing its own RDBMS based on Codd’s theories in the 1970s.

SQL was not alone; there were other RDBMSs and related query languages. In the 1970s, the University of California, Berkeley created an RDBMS named Ingres (later known as an open source RDBMS), and QUEL was the language created for managing its data. Postgres, a successor project to Ingres at Berkeley, later evolved into PostgreSQL.

IBM continued with System R and its SQL implementation, turning it into a commercial product named System/38 in 1979. It has since evolved into DB2, which was released in 1983.

SQL has many extensions now. Some of them are;

Oracle – PL/SQL
IBM – SQL PL
Microsoft – T-SQL

SQL was standardized by the American National Standards Institute (ANSI) in 1986 as SQL-86. In 1987, it was standardized by the International Organization for Standardization (ISO) too. It has been revised many times, from SQL-86 to SQL:2008.

Sunday, July 3, 2011

Script Task cannot be debugged: Integration Services

If you run in a 64-bit environment and try to debug a Script Task after placing a breakpoint, it is not going to work as you expect. The reason is the 64-bit environment itself. A few things are not supported in 64-bit mode, but you can still work with them in the 32-bit SSIS designer, Business Intelligence Development Studio. If you experience problems while running in 64-bit mode, such as connecting to Excel via the Microsoft OLE DB Provider for Jet or debugging a Script Task, all you have to do is switch the SSIS runtime to 32-bit. This is done by setting the project property called Run64BitRuntime, on the Debugging page, to False.

Note that this property is used and applied only at design time. When you execute the package on a production server, the environment is determined by the installed dtexec utility. The dtexec utility is available in both 32-bit and 64-bit versions, and when you install 64-bit Integration Services, it installs both. If you execute the package in a 64-bit environment, the 64-bit dtexec is automatically selected and run. If you need the package to be executed in 32-bit mode, run it from the command prompt. As the command prompt uses the PATH environment variable for finding directories, and the 32-bit version path (C:\Program Files(x86)\....) appears before the 64-bit version path, it uses the 32-bit dtexec for executing packages.

If the package needs to be scheduled with SQL Agent and run in 32-bit mode, that has to be specifically instructed, as SQL Agent uses the 64-bit version without consulting the PATH environment variable. It can be done by selecting Use 32 bit runtime on the Execution Options tab of the job step.
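The same setting can be scripted. When Use 32 bit runtime is selected, SQL Agent adds the /X86 option to the dtexec command line of the job step, and that option can also be supplied directly through sp_add_jobstep. A minimal sketch follows; the job name and package path are hypothetical, and the job is assumed to exist already.

```sql
USE msdb;
GO

-- Hypothetical job name and package path, used only for illustration.
-- /X86 asks SQL Agent to run the package with the 32-bit runtime.
EXEC dbo.sp_add_jobstep
      @job_name  = N'Run Sample Package'
    , @step_name = N'Execute package in 32-bit mode'
    , @subsystem = N'SSIS'
    , @command   = N'/FILE "C:\Packages\Sample.dtsx" /X86';
GO
```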

Sunday, May 15, 2011

How was the term “Business Intelligence” born?

This is not about the processes or usage of Business Intelligence solutions; it is about the origin of the term itself. Unfortunately, it looks like this is unknown to many. (Not knowing it is understandable, but I can remember that when a picture of Charles Babbage was shown at a session, many did not recognize him or know of him.)

The term “Business Intelligence” first appeared in 1958. It was used by an IBM researcher, Hans Peter Luhn, in one of his articles. His definition of it was;

“The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”

Although Luhn introduced (or used) this term for processes related to business intelligence, it was not widespread until the late 1990s. Until then, it was the era of Decision Support Systems (DSS). In 1989, Howard Dresner proposed “Business Intelligence” as an umbrella term, but it looks like the real usage of the term started in 1996. Dresner later became an analyst for Gartner Group, and it is Gartner Group (Gartner, Inc.) that applied this label to the collective technology used for DSSs. This is how it was published;

“Data analysis, reporting, and query tools can help business users wade through a sea of data to synthesize valuable information from it—today these tools collectively fall into a category called ‘Business Intelligence.’”

Thursday, May 12, 2011

PerformancePoint web-parts cannot be connected (linked)

A few days back, I faced a strange issue with PerformancePoint web-parts. It was a simple setup: all I had was a web page created with SharePoint and a couple of web-parts. Once the page was open in design mode and the web-parts were placed, I tried to link two web-parts (a report web-part and a filter) through a connection as below;

The funny thing was, though it allowed me to create the connection (link) and save it, the connection did not get saved permanently. When I opened the connection window again, the created connection was missing.

I could not find the cause immediately, and it was a frustrating experience. Thanks to my colleagues, the cause was found: it is Internet Explorer. If you try to do this with Internet Explorer 8.x, it is not going to work. For some of my colleagues, older versions of IE worked. For some, an older version of Firefox worked. Finally, what I did was install Firefox 3.6, and I got it done. Note that it is not going to work with Firefox 4.x either.

Has anyone else faced this issue? Why does SharePoint have a problem with the latest browsers? If you know anything about this, please share it with me.

Sunday, May 1, 2011

Object Name Resolution – SQL Server

Understanding how SQL Server resolves object names is an effective remedy against some performance issues. Name resolution happens when securables are not fully qualified. A statement like SELECT SalesOrderID FROM SalesOrderHeader is a candidate for it. As you see, SalesOrderHeader is not written as a fully qualified name, hence SQL Server first tries to resolve it by prefixing the default schema of the connected user. If SQL Server cannot find an object like ConnectedUser’sDefaultSchema.SalesOrderHeader, it then tries to find an object called dbo.SalesOrderHeader as the second step. If that is not successful either, an error is thrown. Here is an example of it;

    USE AdventureWorks2008

    GO

    -- creating a new table in the dbo schema

    SELECT TOP (1) *

    INTO dbo.SalesOrderHeader

    FROM Sales.SalesOrderHeader

    -- Create two logins for testing

    CREATE LOGIN HR_Manager WITH PASSWORD = '123'

        , CHECK_POLICY = OFF;

    CREATE LOGIN Sales_Manager WITH PASSWORD = '123'

        , CHECK_POLICY = OFF;

    GO

    -- Adding them as users with default schemas

    CREATE USER HR_Manager WITH DEFAULT_SCHEMA = HumanResources

    CREATE USER Sales_Manager WITH DEFAULT_SCHEMA = Sales

    GO

    -- Give the users permission to read data
    -- (EXEC is required when the procedure call is not the first statement in the batch)

    EXEC sp_addrolemember 'db_datareader', 'HR_Manager';

    EXEC sp_addrolemember 'db_datareader', 'Sales_Manager';

    -- Execute as HR_Manager and see

    EXECUTE AS USER = 'HR_Manager'

    GO

    -- This shows records from newly created

    -- table, not from original table

    SELECT * FROM SalesOrderHeader

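    -- This throws an error, which ends the current batch

    SELECT * FROM SalesOrderDetail

    GO

    -- REVERT runs in its own batch so that the context is always restored

    REVERT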

    GO

    -- Execute as Sales_Manager

    EXECUTE AS USER = 'Sales_Manager'

    GO

    -- Both statements will work

    SELECT * FROM SalesOrderHeader

    SELECT * FROM SalesOrderDetail

    REVERT

    GO

As you see in the code, when HR_Manager executes the SELECT statements, the names are first resolved as HumanResources.SalesOrderHeader and HumanResources.SalesOrderDetail. SQL Server does not find either object and moves to the next step, trying dbo.SalesOrderHeader and dbo.SalesOrderDetail. A match is found for the first one, as we have created a table with that name in the dbo schema, but not for the second. Hence the second statement throws an error.

When Sales_Manager executes the statements, SQL Server resolves the names as Sales.SalesOrderHeader and Sales.SalesOrderDetail. In that case, both statements are executed successfully.

Even though Sales_Manager executes the statements without any error, it is always better to write securables as fully qualified names, as that lets SQL Server execute the code without performing the additional resolution step.
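As a small sketch of that recommendation, the statements below use two-part names, so they return data from the same tables no matter which user runs them, provided the user has permission to read them. The last query uses SCHEMA_NAME() without an argument to show the default schema that would otherwise be tried first.

```sql
USE AdventureWorks2008;
GO

-- Schema-qualified names: no default-schema lookup is needed
SELECT SalesOrderID FROM Sales.SalesOrderHeader;

SELECT SalesOrderID FROM Sales.SalesOrderDetail;

-- Default schema of the current execution context
SELECT SCHEMA_NAME() AS DefaultSchema;
GO
```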