# Monday, March 19, 2012

After a successful TechDays Hong Kong the week before last, I am off to Bangalore today to speak at Tech Ed India 2012! Besides the usual running around and talking with customers, partners, attendees, and MVPs, I’ll be doing three breakout sessions:

Wednesday @ 12:15: Beyond Scrum: Kanban and Software Kaizen

This is a slight modification of my Introduction to Kanban talk; here are the slides for that one:

On Wednesday afternoon at 2:15, I’ll be doing a session on Big Data Processing with SQL Server 2012 and Hadoop. I don’t really have any slides for this one besides a few from MS DPE; I plan on using all my time in demos. I’ll be talking about Hadoop on Azure, columnstore indexes, data warehouse improvements, and other things that will help you deal with large amounts of data, like table partitioning (I know, I know, “Big Data” does not always mean “big” data). This will be a fun session; come see me screw up some live demos.

Lastly, on Thursday at 4:30, I’ll be doing a session on Agile Estimation. I’ve done this one in India a few times before, but this will be my first time doing it at TechEd India. Here are the slides:

If you can’t make that session, I did it last year at TechEd North America and it was live streamed, so the recording is here:

See you all in Bangalore!

posted on Monday, March 19, 2012 2:52:59 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Tuesday, January 25, 2011

When I was at TechEd Europe in Berlin last November, I sat down with my old friends from DotnetMania in Spain and chatted about Telerik, the cloud (I pimp SQL Azure as well as Windows Azure!), developer tools, and also what it means to be a writer (first magazine articles, then blogs) and to speak at conferences. I also explain what a Chief Strategy Officer is (I get that a lot), and we have a great conversation about R&D and entrepreneurship.

Pay attention: I tell how and when I made the move from VB to C#. It is a bit geeky. I also touch on the HTML5 vs. Silverlight debate (which was a hot topic back in November). We finish up with a walk down memory lane to dispel the hype that we will “write once, run many” with HTML5.

 

Interview with Stephen Forte (CSO of Telerik) from Netalia on Vimeo.

posted on Tuesday, January 25, 2011 10:40:53 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Monday, December 06, 2010

I recently wrote a piece in DevProConnections on the recent advances in ORM technologies. While I am still not a complete convert to ORMs, I think that they can play an important role in your application development toolkit. As time progresses and we abstract the database even more, ORMs will become more and more common. You can read the piece here.

posted on Monday, December 06, 2010 4:03:28 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Friday, September 03, 2010

See also:

In Part I we looked at the advantages of building a data warehouse independent of cubes/a BI system and in Part II we looked at how to architect a data warehouse’s table schema. In Part III, we looked at where to put the data warehouse tables, and in Part IV we looked at how to populate those tables and keep them in sync with your OLTP system. Today, in the last part of this series, we will take a quick look at the benefits of building the data warehouse before we need it for cubes and BI by exploring our reporting and other options.

As I said in Part I, you should plan on building your data warehouse when you architect your system up front. Doing so gives you a platform for building reports, or even applications such as web sites, off the aggregated data. As I mentioned in Part II, it is much easier to build a query and a report against the rolled-up tables than against the OLTP tables.

To demonstrate, I will make a quick Pivot Table using SQL Server 2008 R2 PowerPivot for Excel (or just PowerPivot for short!). I have shown how to use PowerPivot before on this blog; however, I was usually going against a SQL Server table, a SQL Azure table, or an OData feed. Today we will use a SQL Server table, but rather than build a PowerPivot against the OLTP data of Northwind, we will use our new rolled-up Fact table.

To get started, I will open up PowerPivot and import data from the data warehouse I created in Part II. I will pull in the Time, Employee, and Product dimension tables as well as the Fact table.

clip_image001

Once the data is loaded into PowerPivot, I am going to launch a new PivotTable.

clip_image002

PowerPivot understands the relationships between the dimension and fact tables and places the tables in the designer shown below. I am going to drag some fields into the boxes on the PowerPivot designer to build a powerful and interactive Pivot Table. For rows I will choose the category and product hierarchy and sum on the total sales. I’ll make the columns (or pivot on this field) the month from the Time dimension to get a sum of sales by category/product by month. I will also drag Year and Quarter into my vertical and horizontal slicers for interactive filtering. Lastly, I will place the Employee field in the Report Filter pane, giving the user the ability to filter by employee.

clip_image003

The results look like this; I am dynamically filtering by 1997, the third quarter, and employee Janet Leverling.

clip_image005

This is a pretty powerful interactive report built in PowerPivot using the four data warehouse tables. If there were no data warehouse, this Pivot Table would have been very hard for an end user to build. Either they or a developer would have had to perform joins to get the category and product hierarchy, as well as more joins to get the order details and the sum of the sales. In addition, the breakout and dynamic filtering by Year and Quarter, and the display by month, are only possible because of the DimTime table, so if there were no data warehouse tables, the user would have had to parse out those DateParts. Just about the only thing the end user could have done without assistance from a developer or a sophisticated query is the employee filter (and even that would have taken some PowerPivot magic to display the employee name, unless the user did a join).
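To make the comparison concrete, here is a rough sketch of the kind of rolled-up query the pivot table is effectively issuing against the warehouse tables from Part II (PowerPivot generates its own queries, so this is purely illustrative):

-- Sum of sales by category/product by month, straight off the star schema.
SELECT  p.CategoryName,
        p.ProductName,
        t.[Year],
        t.[Quarter],
        t.[Month],
        SUM(f.[Total Sales]) AS TotalSales
FROM dwh.FactOrder f
JOIN dwh.DimProduct p ON p.ProductID = f.ProductID
JOIN dwh.DimTime    t ON t.TimeKey   = f.TimeKey
GROUP BY p.CategoryName, p.ProductName, t.[Year], t.[Quarter], t.[Month];

Two joins and a GROUP BY; the equivalent query against the Northwind OLTP tables would need joins across orders, order details, products, categories, and employees, plus DATEPART work to get the year, quarter, and month.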

Of course, Pivot Tables are not the only thing you can create from the data warehouse tables: you can also create reports, ad hoc query builders, web pages, and even an Amazon-style browse application. (Amazon uses its data warehouse to display inventory and OLTP to take your order.)

I hope you have enjoyed this series, enjoy your data warehousing.

posted on Friday, September 03, 2010 4:14:33 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Thursday, September 02, 2010

See also:

In Part I we looked at the advantages of building a data warehouse independent of cubes/a BI system and in Part II we looked at how to architect a data warehouse’s table schema. In Part III, we looked at where to put the data warehouse tables. Today we are going to look at how to populate those tables and keep them in sync with your OLTP system.

No matter where your data warehouse is located, the biggest challenge with a data warehouse, especially one you are going to do real-time reporting off of, is that the data is published from the transactional (OLTP) system. By definition, the data warehouse is out of date compared to the OLTP system.

Usually when you ask the boss, “How much latency can you accept between the OLTP and data warehouse systems?” the boss will reply: none. While that is impossible, the more time and money you have to develop said system, the closer to real time you can get. Always bring up the American Express example from Part I as proof that your system can accept at least some latency.

If you remember the code examples from Part II, most of the time you have to query OLTP data, aggregate it, and then load it into your new schema. This is the process of extracting, transforming, and loading (ETL) data from the OLTP system into the data warehouse. The workflow usually goes like this: at a pre-set time (at every change, hourly, nightly, or weekly), query the OLTP data (extraction), then aggregate and flatten it out (transformation), and then copy the transformed data to the data warehouse (star or snowflake) tables (load).

You have several options, ranging from near real-time ETL (very complex and expensive) to nightly data dumps (least complex and least expensive). You can also perform full ETL (wipe out the data warehouse tables completely and refill them on each operation) or partial ETL (only add incremental data as per a time series). The actual load can be done via anything from database triggers on an add/update/delete (for near real time) to simple scheduled SQL batches that wipe data and run your load scripts. Most likely you will want to take a hybrid approach, depending on the maturity of your system. You have three basic options, ranging from least complex to most complex:

  • Direct database dump
  • ETL tool
  • Database triggers

When you have a long time series to publish your data, say nightly or weekly, you can do a direct database dump. The process is pretty straightforward: at a regular interval (or manually), a process starts that queries the OLTP database, performs all of the aggregations, and loads the results into a staging data warehouse database, then wipes out the production data warehouse and loads the data in.
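A minimal sketch of such a nightly full reload, assuming the dwh.FactOrder and dwh.DimTime tables from Part II and a Northwind-style OLTP source (the column mappings are illustrative, and DimTime is assumed to already hold one row per order date):

-- Nightly full reload sketch: wipe the warehouse fact table and rebuild it
-- from the OLTP tables in one scheduled batch.
BEGIN TRANSACTION;

TRUNCATE TABLE dwh.FactOrder;

INSERT INTO dwh.FactOrder
       (PostalCode, ProductID, EmployeeId, ShipperId,
        [Total Sales], Discount, [Unit Sales], TimeKey)
SELECT  o.ShipPostalCode,
        od.ProductID,
        o.EmployeeID,
        o.ShipVia,
        SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)),
        AVG(od.Discount),
        SUM(od.Quantity),
        dt.TimeKey
FROM dbo.Orders o
JOIN dbo.[Order Details] od ON od.OrderID = o.OrderID
JOIN dwh.DimTime dt         ON dt.OrderDate = o.OrderDate
GROUP BY o.ShipPostalCode, od.ProductID, o.EmployeeID, o.ShipVia, dt.TimeKey;

COMMIT TRANSACTION;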

Another option is to use an ETL tool. A good example is SQL Server Integration Services (SSIS) if you are using Microsoft SQL Server. (Actually, SSIS will work with multiple databases; you just need a SQL Server host.) A modern ETL tool will give you the ability to segment the work into logical groups, have a control flow based on the success or failure of a condition, and allow rollbacks.

clip_image002

A typical workflow with an ETL tool is that the ETL will run on a schedule (or based on a condition, such as a message arriving from a queue or a record written to an admin table) and have one or more parameters passed to it. This parameter is usually a time series, and the ETL will perform all of the extraction on data from the OLTP database filtered by that parameter. An ETL tool is the most likely solution you will employ, especially if you have to make frequent updates from your OLTP system to your data warehouse.

Another option is to use database triggers. For those of you that don’t know a lot about triggers, well, they can lead to evil. ;) That said, they are events that fire when data changes, and you can write SQL code, even ETL code, to run when that happens. Triggers are hard to debug and difficult to maintain, so I would only suggest that you use a trigger when you need “real time” updates to your data warehouse, and even then, the trigger should only write a record into an admin table that your ETL process polls to get started.
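If you do go the trigger route, keep the trigger itself trivial. Here is a hedged sketch; the admin table and trigger names are made up for illustration:

-- Hypothetical admin table the ETL process polls.
CREATE TABLE dbo.EtlChangeLog (
    ChangeID   int IDENTITY(1,1) PRIMARY KEY,
    TableName  sysname  NOT NULL,
    ChangedAt  datetime NOT NULL DEFAULT (GETDATE()),
    Processed  bit      NOT NULL DEFAULT (0)
);
GO

-- The trigger does no ETL work; it only records that something changed.
CREATE TRIGGER trg_Orders_Changed
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.EtlChangeLog (TableName) VALUES ('Orders');
END;
GO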

A common design pattern with a sophisticated ETL is to do all of the ETL to a staging data warehouse and allow a check mechanism to verify that the ETL was successful. Once this check is performed (either by computer or by human, depending on the system), you can then push the data from the staging tables to the production tables. There is a SQL statement specifically for this process: MERGE. MERGE joins two tables, compares the data, and performs the inserts, updates, and deletes needed to keep the two tables in sync. Here is how we would do that with a MERGE statement and our fact table from Part II.

--MERGE the FACT Local to FACT Remote
MERGE dbo.FactOrder as t--target
  USING Northwind.dwh.FactOrder as s--source
  ON t.ProductID = s.ProductID AND
  t.EmployeeID = s.EmployeeID AND
  t.ShipperID = s.ShipperID AND
  t.TimeKey = s.TimeKey --joining keys

--record exists in target, but data different
WHEN MATCHED AND (s.OrderDate != t.OrderDate OR 
   s.PostalCode != t.PostalCode OR
   s.[Total Sales] != t.[Total Sales] OR
   s.Discount != t.Discount OR
   s.[Unit Sales] != t.[Unit Sales]) THEN
       UPDATE SET t.OrderDate = s.OrderDate, 
       t.PostalCode = s.PostalCode,
       t.[Total Sales] = s.[Total Sales],
       t.Discount = s.Discount,
       t.[Unit Sales] = s.[Unit Sales]

WHEN NOT MATCHED BY SOURCE THEN
   --only in target, get rid of it
   Delete

WHEN NOT MATCHED BY TARGET THEN 
   --record only exists in source
   INSERT VALUES (s.OrderDate, s.PostalCode, s.ProductID,s.EmployeeID,
   s.ShipperID, s.[Total Sales], s.Discount, s.[Unit Sales], s.[TimeKey]);--required!!

Tomorrow we will wrap up with the application development options.

posted on Thursday, September 02, 2010 5:58:03 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Wednesday, September 01, 2010

See also:

In Part I we looked at the advantages of building a data warehouse independent of cubes/a BI system and in Part II we looked at how to architect a data warehouse’s table schema. Today we are going to look at where to put your data warehouse tables.

Let’s look at the location of your data warehouse. Usually as your system matures, it follows this pattern:

  • Segmenting your data warehouse tables into their own isolated schema inside of the OLTP database
  • Moving the data warehouse tables to their own physical database
  • Moving the data warehouse database to its own hardware

When you bring a new system online, or start a new BI effort, to keep things simple you can put your data warehouse tables inside of your OLTP database, just segregated from the other tables. You can do this in a variety of ways, most easily by using a database schema (like dbo); I usually use dwh as the schema. This way it is easy for your application to access these tables as well as fill them and keep them in sync. The advantage of this is that your data warehouse and OLTP system are self-contained, and it is easy to keep the systems in sync.
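For example, setting up the segregated schema and limiting reporting users to it takes only a couple of statements. A quick sketch (the role name is just an example):

-- Put the warehouse tables in their own schema inside the OLTP database.
CREATE SCHEMA dwh AUTHORIZATION dbo;
GO

-- Reporting users can be limited to the warehouse schema only.
CREATE ROLE ReportingUsers;
GRANT SELECT ON SCHEMA::dwh TO ReportingUsers;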

As your data warehouse grows, you may want to isolate your data warehouse further and move it to its own database. This will add a small amount of complexity to the load and synchronization; however, moving the data warehouse tables to their own database brings some benefits that make the move worth it. The benefits include implementing a separate security scheme. This is also very helpful if your OLTP database locks down all of the tables and does not allow SELECT access, and you don’t want to create new users and roles just for the data warehouse. In addition, you can implement a separate backup and maintenance plan, so your data warehouse tables, which tend to be larger, do not slow down your OLTP backup (and potentially restore!). If you only load data at night, you can even make the data warehouse database read-only. Lastly, while minor, you will have less table clutter, making it easier to work with.
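A couple of the options mentioned above become one-liners once the warehouse lives in its own database. A sketch, assuming a hypothetical NorthwindDW database:

-- Separate, less frequent backup plan for the warehouse database.
BACKUP DATABASE NorthwindDW
    TO DISK = N'D:\Backups\NorthwindDW.bak'
    WITH INIT;

-- If you only load at night, flip the warehouse to read-only during the day...
ALTER DATABASE NorthwindDW SET READ_ONLY WITH ROLLBACK IMMEDIATE;

-- ...and back to read-write for the nightly load window.
ALTER DATABASE NorthwindDW SET READ_WRITE WITH ROLLBACK IMMEDIATE;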

Once your system grows even further, you can isolate the data warehouse onto its own hardware. The benefits of this are huge: you have less I/O contention on the database server with the OLTP system, and depending on your network topology, you can reduce network traffic. You can also load up on more RAM and CPUs. In addition, you can consider different RAID array techniques for the OLTP and data warehouse servers (OLTP would be better with RAID 5, the data warehouse with RAID 1).

Once you move your data warehouse to its own database or its own database server, you can also start to replicate the data warehouse. For example, let’s say that you have an OLTP system that works worldwide, but you have management in offices in different parts of the world. You can reduce network traffic by having all reporting (and what else do managers do??) run on a local network against a local data warehouse. This only works if you don’t have to update the data warehouse more than a few times a day.

Where you put your data warehouse is important; I suggest that you start small and work your way up as the needs dictate.

posted on Wednesday, September 01, 2010 5:39:54 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, August 31, 2010

See also:

In Part I we looked at when you should build your data warehouse and concluded that you should build it sooner rather than later to take advantage of reporting and view optimization. Today we will look at your options to build your data warehouse schema.

When architecting a data warehouse, you have two basic options: build a flat “reporting” table for each operation you are performing, or build with BI/cubes in mind and implement a “star” or “snowflake” schema. Let’s take a quick look at the first option and then we will take a look at the star and snowflake schemas.

Whenever the business requests a complex report, developers usually slow down the system with a complex SQL statement or operation. For example, pretend that in our order entry (OLTP) system the business wants a report that says: show me the top ten customers in each market, including their overall rank. You would usually have to perform a query that does the following:

  1. Complex joins for unique customer
  2. Rollup the sales
  3. Ranking functions to determine overall rank
  4. Partition functions to segment the rank by country
  5. Standard aggregates to get the sales
  6. Dump all of this to a work table in order to pull out the top 10 (if you don’t do this, you will lose the overall rank)

A typical SQL statement to do steps 1-5 would look like this:

With CTETerritory
As
(
   Select cr.Name as CountryName, CustomerID, 
                Sum(TotalDue) As TotalAmt
   From Sales.SalesOrderHeader soh 
   inner join Sales.SalesTerritory ter
   on soh.TerritoryID=ter.TerritoryID
   inner join Person.CountryRegion cr 
   on cr.CountryRegionCode=ter.CountryRegionCode
   Group By cr.Name, CustomerID
)
Select *, Rank() Over (Order by TotalAmt DESC) as OverallRank,
Rank() Over
     (Partition By CountryName Order By TotalAmt DESC,
            CustomerID DESC) As NationalRank
From CTETerritory

Argh! No wonder developers hate SQL and want to use ORMs! (I challenge the best ORM to make this query!)

Instead you can create a table, let’s call it SalesRankByRegion, with the fields CountryName, CustomerID, TotalSales, OverallRank, and NationalRank, and use the above SQL as part of a synchronization/load script to fill your reporting table on a regular basis. Then your SQL statement for the above query looks like this:

SELECT * FROM SalesRankByRegion
WHERE NationalRank Between 1 and 10
ORDER BY CountryName, NationalRank

The results look like:

clip_image001

That is more like it! A simple select statement is easier for the developer to write, the ORM to map, and the system to execute.

The SalesRankByRegion table is a vast improvement over having to query all of the OLTP tables (by my count there are three tables plus the temp table). While this approach has its appeal, your reporting tables will very quickly start to proliferate.
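For completeness, here is a sketch of what the SalesRankByRegion table and its load script might look like, reusing the CTE above (the column types are illustrative):

-- Illustrative reporting table for the "top customers by market" report.
CREATE TABLE dbo.SalesRankByRegion (
    CountryName  nvarchar(50) NOT NULL,
    CustomerID   int          NOT NULL,
    TotalSales   money        NOT NULL,
    OverallRank  int          NOT NULL,
    NationalRank int          NOT NULL
);
GO

-- Load script run on a schedule: wipe and refill from the OLTP tables.
TRUNCATE TABLE dbo.SalesRankByRegion;

WITH CTETerritory AS
(
   Select cr.Name as CountryName, CustomerID, Sum(TotalDue) As TotalAmt
   From Sales.SalesOrderHeader soh
   inner join Sales.SalesTerritory ter on soh.TerritoryID = ter.TerritoryID
   inner join Person.CountryRegion cr on cr.CountryRegionCode = ter.CountryRegionCode
   Group By cr.Name, CustomerID
)
INSERT INTO dbo.SalesRankByRegion (CountryName, CustomerID, TotalSales, OverallRank, NationalRank)
SELECT CountryName, CustomerID, TotalAmt,
       Rank() Over (Order by TotalAmt DESC),
       Rank() Over (Partition By CountryName Order By TotalAmt DESC, CustomerID DESC)
FROM CTETerritory;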

Your best option is to follow one of the two industry standards for data warehouse tables: a “star” or a “snowflake” schema. Using a schema like this gives you a few advantages. These tables are more generic than SalesRankByRegion, which was a table built for one query/report, giving you the ability to run many different reports off each table. Another advantage is that you will be able to build cubes very easily off of a star or snowflake schema, as opposed to off a bunch of SalesRankByRegion tables.

The design pattern for building true data warehouse tables is to build a “fact” table, or a table that contains detail-level (or aggregated) “facts” about something in the real world, like an order or a customer for example. Inside of the fact table you will also have “measures,” or numeric values that represent a “fact.” To support your fact table you will have “dimension” tables. A dimension is a structure that categorizes your data, usually in the form of a hierarchy. A dimension table, for example, could be “time” with a hierarchy of OrderYear, OrderQuarter, OrderMonth, OrderDate, OrderTime.

There are tons of tutorials on the internet that show you how to build a star or snowflake schema and the difference between them, so I will not repeat them here. (You may want to start here.) I’ll just give you the high-level view of a simple star schema.

Let’s say we have an order entry system, such as Northwind (the Microsoft SQL Server sample database). You can have a fact table that revolves around an order. You can then have three (or more) dimension tables that focus on time, product, and salesperson. The time dimension would roll up the order date by year, quarter, month, and date. The product dimension would roll up the product by product and category. (In most systems you would have a much deeper hierarchy for products.) The salesperson dimension would be a roll-up of the employee, the employee’s manager, and the department they work in. The key of each of these dimension tables then becomes a foreign key in the fact table, which also holds the measures (the numerical data describing the fact).

There is an example similar to this in Programming SQL Server 2008, a book where I am a co-author. Here is a modified version of that demo:

Dimension tables:

CREATE TABLE [dwh].[DimTime] (
[TimeKey] [int] IDENTITY (1, 1) NOT NULL Primary Key,
[OrderDate] [datetime] NULL ,
[Year] [int] NULL ,
[Quarter] [int] NULL ,
[Month] [int] NULL 
) 

CREATE TABLE [dwh].[DimProduct] (
[ProductID] [int] not null Primary Key,
[ProductName] nvarchar(40) not null,
[UnitPrice] [money] not null,
[CategoryID] [int] not null,
[CategoryName] nvarchar(15) not null
) 

CREATE TABLE [dwh].[DimEmployee] (
EmployeeID int not null Primary Key,
EmployeeName nvarchar(30) not null,
EmployeeTitle nvarchar(30),
ManagerName nvarchar(30)
)

Fact table:
CREATE TABLE [dwh].FactOrder (
[PostalCode] [nvarchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[ProductID] [int] NOT NULL ,
[EmployeeId] [int] NOT NULL ,
[ShipperId] [int] NOT NULL ,
[Total Sales] [money] NULL ,
[Discount] [float] NULL ,
[Unit Sales] [int] NULL ,
[TimeKey] [int] NOT NULL 
)
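The demo leaves the star relationships implicit. If you want them enforced (and easily discoverable by reporting tools), you could add foreign keys from the fact table to the dimension tables; a sketch:

-- Optional: make the star relationships explicit with foreign keys.
ALTER TABLE [dwh].[FactOrder]
    ADD CONSTRAINT FK_FactOrder_DimTime
        FOREIGN KEY (TimeKey) REFERENCES [dwh].[DimTime] (TimeKey);

ALTER TABLE [dwh].[FactOrder]
    ADD CONSTRAINT FK_FactOrder_DimProduct
        FOREIGN KEY (ProductID) REFERENCES [dwh].[DimProduct] (ProductID);

ALTER TABLE [dwh].[FactOrder]
    ADD CONSTRAINT FK_FactOrder_DimEmployee
        FOREIGN KEY (EmployeeId) REFERENCES [dwh].[DimEmployee] (EmployeeID);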

We have the basis of a star schema. Now we have to fill those tables and keep them up to date. That is a topic for Part III.

posted on Tuesday, August 31, 2010 7:30:42 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [2] Trackback
# Monday, August 30, 2010

Most developers are scared of “Business Intelligence” or BI. Most think that BI consists of cubes, pivot/drill down apps, and analytical decision support systems. While those are very typical outcomes of a BI effort, many people forget about the first step, the data warehouse.

Typically this is what happens with a BI effort. A system is built, usually a system that deals with transactions. We call this an OLTP, or on-line transaction processing, system. Some time passes, reports are bolted on, and some business analysts build some pivot tables from “raw dumps” of data. As the system grows, reports start to slow, since the system is optimized to deal with one record at a time. Someone, usually a CTO type, says: “we need a BI system.” A development effort is then spent to build a data warehouse and cubes, and some kind of analytical system on top of those cubes.

I make the argument that developers and project planners should embrace the data warehouse up front. When you design your OLTP system, also design the supporting data warehouse, even if you have no intention of building a full-fledged BI application with cubes and the like. This gives you two distinct advantages. First, you have a separate system that is optimized for reporting; this system will allow the rapid creation of many new reports as well as take the load off the OLTP system. Second, when you do decide to build a BI system based on cubes, you will already have the hard part done: building the data warehouse and the supporting ETL.

Since a data warehouse uses a flatter data model (more on this in Part II), you can even design your application to use both the OLTP system and the data warehouse as data sources. For example, when you have highly normalized, 3rd normal form transactional tables to support transactions, it is never easy to use those tables for reporting and displaying information. Those tables are optimized and indexed to support retrieving and editing (or adding/deleting) one record at a time. When you try to do things in aggregate, you start to stress your system, since it was designed to deal with one record at a time.

This design pattern is already in use today in many places. Consider your credit card company, for example. I use American Express, and I never see my transactions show up online for at least 24 hours. If I go buy something and phone American Express and ask “What was my last transaction?” they will tell me right away. If I look online, I will not see that transaction until the next business day. Why? When you call the customer service representative, they are looking at the OLTP system, pulling up one record at a time. When you are looking online, you are looking at the data warehouse, a system optimized for viewing lots of data in a reporting environment.

You can take this to an extreme: if you run an e-commerce site, you can power the product catalog view portion of the site with the data warehouse and the purchase (inventory) system with the OLTP model. Optimize the site for browsing (database reads) and at the same time have super-fast e-commerce (database writes). Of course you have to keep the purchasing/inventory (OLTP) and product display (data warehouse) databases in sync; I’ll talk about that in Part IV. Next, I will take a look at how to build the data warehouse.

posted on Monday, August 30, 2010 7:33:02 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Wednesday, August 11, 2010

Microsoft recently released a CTP of the cloud-based SQL Azure management tool, code-named “Houston”. Houston was announced last year at the PDC and is a web-based version of SQL Management Studio (written in Silverlight 4.0). If you are using SQL Management Studio, there really is no reason to use Houston; however, having the ability to do web-based management is great. You can manage your database from Starbucks without the need for SQL Management Studio. Ok, that may not be a best practice, but hey, we’ve all done it. :)

You can get to Houston here. It will ask you for your credentials; log in using your standard SQL Azure credentials, but for “Login” you have to use the username@server format.

image

I logged in via Firefox and had no problem at all. I was presented with a cube control that allowed me to see a snapshot of the settings and usage statistics of my database. I browsed that for a minute and then went straight to the database objects. Houston gives you the ability to work with SQL Azure objects (Tables, Views, and Stored Procedures) and the ability to create, drop, and modify them.

image

I played around with my tables’ DDL and all worked fine. I then decided to play around with the data. I was surprised that you can open a .SQL file off your local disk inside of Houston!

image

I opened up some complex queries that I wrote for Northwind on a local version of SQL Server 2008 R2 and tested them out. The script and code all worked fine; however, there was no code formatting that I could figure out (hey, that is ok).

I wanted to test whether Houston supported the ability to select a piece of TSQL and execute only that piece of SQL. I was sure it would not work, so I tested it with two select statements, highlighted only one, and got back one result. (I tried rearranging the statements, highlighted only the second one, and it still worked!) Just to be sure, I put in a select and a delete statement, highlighted only the select statement, and only that piece of TSQL executed.

image

I then tried two SQL statements and got back two results, so the team clearly anticipated this scenario!

image

All in all I am quite happy with the CTP of Houston.  Take it for a spin yourself.

posted on Wednesday, August 11, 2010 6:16:33 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Monday, May 10, 2010

SQL Server 2008 R2 was RTMed a few weeks ago, and late last week it became available for download on MSDN. You can also download the Express and evaluation versions from the Microsoft web site.

I was trying to install R2 over the weekend in a Win7 VPC and hit a snag. I downloaded the ISO from MSDN and mounted the ISO as my VPC’s DVD drive. Halfway through the installation it bombed. Argh. I got the very descriptive error message 2337. Great. After intense googling I reached out to an email listserver I am on. A colleague of mine, Bob Beauchemin, suggested that I burn a DVD and then run the install from there. He thought that it was a problem with the VPC’s ISO reader. I tried this, but it did not work, because my Win7 VPC did not allow me to mount my physical DVD drive. So I mapped a network drive on the VPC to the physical DVD drive (using the integration features), copied the files over to the VPC desktop, and ran my install from there. Worked like a charm.

Bob suspected that there was a problem with the VPC’s ISO reader and blogged about it here. He was right. Thanks to Bob, and hopefully nobody else will bang their head against the wall like I did. :)

posted on Monday, May 10, 2010 4:16:16 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Monday, March 22, 2010

Tomorrow I will be presenting a “What's new for SQL Server 2008 R2” session at the IT Efficiency Event in Hong Kong, put on by Microsoft. Even though it is an overview session, I’ll be doing a few extensive demos. The demos are on:

  • New TSQL constructs and other goodies like that
  • Bing map integration (if I get internet access in the session room!)
  • Data-Tier Applications (fun for both developers and DBAs)
  • PowerPivot

I will be the only English-speaking speaker; should be fun. :)

posted on Monday, March 22, 2010 4:30:19 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Monday, December 28, 2009

About a week ago, I showed a simple demo on my blog of SQL Server PowerPivot for Excel, or just PowerPivot for short. I used SQL Azure as my backend and just did a mapping to a few Northwind tables and then built a pivot table using the Order Details table.

Today I will take that example one small step further: write a custom TSQL query and build a more advanced pivot table against that data source. In a future post I will build a PowerPivot example against a true OLAP cube, but that will have to wait for the new year. :)

Here is a TSQL query that shows an Order and joins to get the Product Name and Category as well as the Employee and Customer names. We also use DATEPART() to get the order month and year off the Order Date. In addition, we calculate the Total Price. This will allow us to do many different sorts and groupings in a pivot table. The TSQL is here:

Select od.OrderID, o.OrderDate,
    datepart(MONTH, o.OrderDate) as OrderMonth,
    datepart(YEAR, o.OrderDate) As OrderYear,
    emp.FirstName + ' ' + emp.LastName as EmployeeName,
    cust.CompanyName, cust.City, cust.Country,
    cat.CategoryName, p.ProductName,
    ((od.UnitPrice * od.Quantity) * (1 - Discount)) As TotalPrice
from dbo.[Order Details] od
    inner join dbo.Products p on od.ProductID = p.ProductID
    inner join dbo.Categories cat on cat.CategoryID = p.CategoryID
    inner join dbo.Orders o on od.OrderID = o.OrderID
    inner join dbo.Employees emp on emp.EmployeeID = o.EmployeeID
    inner join dbo.Customers cust on cust.CustomerID = o.CustomerID

Now we have to connect to our SQL Azure database in PowerPivot. To do this, follow the instructions I placed on my blog last week. After you connect, instead of choosing tables to connect to, choose the query option. Put in the TSQL from above and give your query a friendly name like Steve1 (or rather NorthwindOrderDetails, since we will be using it in our pivot table). Now you can use this query and its data to build a pivot table.

image

Building the PivotTable is pretty easy. Let’s use TotalPrice as our main value and Category Name and Product Name as our row labels. That means that we will see Category and be able to drill down into Product and get a sum for that category or product for each City (the field we chose for our column). In addition, we added a filter for Country that will allow us to see a drop-down of each Country, and we can filter for as many countries as we want. Lastly, we can add a “vertical slice” of Order Year that will add an additional filter in a graphical style. This filter is applied after your main filter (Country).

image

I did this part in under a minute. This gives the user an extremely powerful pivot table and the ability to do some pretty sophisticated drill-downs and filters. In addition, it is pretty easy to add or remove even more data (Customers and Employees, for example) in this pivot table.

image

Enjoy!

posted on Monday, December 28, 2009 5:33:56 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Monday, December 21, 2009

At the PDC, Microsoft released the Sync Framework Power Pack for SQL Azure tool. It is a tool built on top of the Sync Framework 2.0 that provides a wizard to keep an on-premises database and a SQL Azure database in sync, as well as a Visual Studio template that lets you take a SQL Azure database offline using SQL Compact (or whatever they call it these days). Let’s take a look at the data sync wizard today and at the Visual Studio template soon.

I started the wizard via Start|Microsoft Sync Framework|SQL Azure Data Sync Tool for SQL Server. The wizard works by taking a database that is on-premises and creating a new one in SQL Azure and creating the SQL Agent jobs to keep them in sync. I could not find out how to sync two identical databases that already exist. To get started you need to provide your SQL Azure credentials and then the name of the new database that you are going to create in SQL Azure.

image

After you provide your credentials for your local server and select a database, you have to select a database to sync with. The next screen shown here allows you to select the tables you want to keep in sync. You also get to choose who wins in a conflict, the local server or SQL Azure.

image

Next you get to choose the order of your sync. This is a very smart feature: you get to specify the sync order to avoid the classic order-detail-missing-its-order-ID problem.

image

After you finish the wizard, it will go ahead and create the new database in SQL Azure for you, as well as the tables and the associated tracking tables. It will also create the SQL Server Agent jobs required to keep the data in sync. I noticed that on my local server I did not have SQL Server Agent running, so no sync was performed and after the wizard ran I had no data. So I had to go into SQL Server Agent, enable it, and change the default schedule of my job (which was daily at midnight).

image

You may want to get more creative with the SQL Server Agent schedule depending on your environment; however, the Data Sync wizard is a handy tool to get you started.

Enjoy!

posted on Monday, December 21, 2009 5:59:41 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Thursday, November 19, 2009

Today at the PDC, Microsoft announced a new SQL Azure developer tool that is still pre-alpha: Code Name Houston.  Houston is a web based developer tool for SQL Azure databases. Built in Silverlight and hooked into the SQL Azure developer portal, Houston allows you to rapidly create tables, views, procedures, add data, delete data, etc. It kinda reminds me of Microsoft Access, but in a good way. This tool is not for admin stuff like adding users, just rapid database development in the cloud.

Houston is not available yet, but was demoed at PDC. Building a table was done very fast. It was not demoed, but I did see a button for import and export of data. When asked about general availability, no dates were given but calendar 2010 was indicated as the target. Can’t wait…

posted on Thursday, November 19, 2009 6:24:28 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Friday, November 13, 2009

Earlier this week, Microsoft released SQL Server 2008 R2 November CTP (Microsoft needs to hire some people from Apple to do something about the naming of their products.) I learned while at TechEd Europe 2009 that the version of SQL Server Management Studio (SSMS) that ships with R2 CTP3 has support for SQL Azure. So I downloaded the CTP (I had to do it in the speaker room, the speaker hotel blocks just about everything on its “free” wifi) and got rolling.

After installing, I decided to play around with the SQL Azure support. I was able to put my login credentials right into the SQL Server connection dialog and did not need to specify a database in the “Options” menu like I had to do before.

image

I was able to play around with SSMS and Azure and create databases, tables, views, etc. (via TSQL, not dialogs). Most importantly, SSMS does support the Object Explorer for SQL Azure! We can’t change the database as we can in other community tools; however, this is still a beta and it has come a long way so far. SSMS supports most of the functionality that you will need to get started with SQL Azure.

image

Enjoy!

posted on Friday, November 13, 2009 5:02:34 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Friday, September 04, 2009

SQL Data Services underwent a massive change with the last CTP, eliminating the entity bag container (or whatever they called that thingie) and moving to a fully cloud-based relational model. Another major change was that it was given a new and more appropriate name: SQL Azure. You can get the CTP and have access to what I call SQL Server lite in the cloud. Since SQL Azure supports the full relational model along with stored procedures and views, you can connect to SQL Azure with a regular old ADO.NET connection string like the following one, allowing you to code against SQL Azure with .NET the same way you did with plain old SQL Server.

Server=tcp:tpzlfbclx1.ctp.database.windows.net;Database=Northwind_Lite;User ID=Stevef;Password=myPassword;Trusted_Connection=False;

Once you are all signed up for the CTP you can go into the web based admin tools and create a database. I created a database called Northwind and another one called Northwind_Lite for testing.

image

To be honest, I am not sure what else you can do in the web interface, so you have to connect via SQL Management Studio to create your database schema. And there is the first problem: SQL Azure does not support the Object Explorer view that you get in SQL Management Studio, so you will have to hack a little bit.

Connecting to SQL Azure with SQL Server Management Studio

This is not as easy as it sounds. :) Since you can’t connect through the object explorer, you will have to open a new TSQL Query window.

image

In the login dialog, enter the server name from the CTP’s connection string and the user name and password that you chose to administer the CTP.

image

SQL Azure does not support the “Use” statement, or the ability to change databases on your connection. So you have to cheat and use some of the advanced options when logging in. You can do this by selecting the “Options >>” button on the log in dialog and then selecting “Connection Properties”. Under the Connect to database option, you have to select the database that you want to work with, since the default will be the Master database and most likely you will not be building any applications using the Master database.

image

After you connect you will get an error about the inability to apply connection settings, which you can ignore.

image

You will notice right away that there is nothing in your database as the following SQL statement will show:

select * from sys.objects

We now have to migrate some database objects from our SQL Server database to SQL Azure.

Migrating Existing SQL Server Objects to a SQL Azure Database

It would be cool if there were some easy way to migrate your databases to SQL Azure in this CTP. There is not. I suspect that in future CTPs this will not be a problem. But for now, you have to get creative. Some hacks and shortcuts are in order.

To get started, let’s just copy over one table. To do this, open your local SQL Server in the object explorer. Drill down to the Northwind database and drill down into the Customers table. Right click and select Script Table as|CREATE To|Clipboard and you will have a nice CREATE TABLE statement on your clipboard.

 

image

Then paste the TSQL into the Query Window that is connected to your SQL Azure database. Here is what my generated TSQL looks like:

   1:  USE [Northwind]
   2:  GO
   3:   
   4:  /****** Object:  Table [dbo].[Customers]    Script Date: 09/04/2009 03:35:38 ******/
   5:  SET ANSI_NULLS ON
   6:  GO
   7:   
   8:  SET QUOTED_IDENTIFIER ON
   9:  GO
  10:   
  11:  CREATE TABLE [dbo].[Customers](
  12:      [CustomerID] [nchar](5) NOT NULL,
  13:      [CompanyName] [nvarchar](40) NOT NULL,
  14:      [ContactName] [nvarchar](30) NULL,
  15:      [ContactTitle] [nvarchar](30) NULL,
  16:      [Address] [nvarchar](60) NULL,
  17:      [City] [nvarchar](15) NULL,
  18:      [Region] [nvarchar](15) NULL,
  19:      [PostalCode] [nvarchar](10) NULL,
  20:      [Country] [nvarchar](15) NULL,
  21:      [Phone] [nvarchar](24) NULL,
  22:      [Fax] [nvarchar](24) NULL,
  23:   CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED 
  24:  (
  25:      [CustomerID] ASC
  26:  )
  27:  WITH
  28:   (
  29:  PAD_INDEX  = OFF, 
  30:  STATISTICS_NORECOMPUTE  = OFF, 
  31:  IGNORE_DUP_KEY = OFF, 
  32:  ALLOW_ROW_LOCKS  = ON, 
  33:  ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
  34:  ) 
  35:  ON [PRIMARY]
  36:   
  37:  GO
  38:   

We already know that SQL Azure does not support USE, so eliminate lines 1&2 and press F5. You will see that line 5 also is not supported, so eliminate that and keep going by pressing F5 again. You will see that ANSI_NULLs, PAD_INDEX, ALLOW_ROW_LOCKS, ALLOW_PAGE_LOCKS, and ON [PRIMARY] are not supported, so you will have to eliminate them as well. Your new trimmed down SQL Azure SQL script looks like this:

   1:  SET QUOTED_IDENTIFIER ON
   2:  GO
   3:  CREATE TABLE [dbo].[Customers](
   4:      [CustomerID] [nchar](5) NOT NULL,
   5:      [CompanyName] [nvarchar](40) NOT NULL,
   6:      [ContactName] [nvarchar](30) NULL,
   7:      [ContactTitle] [nvarchar](30) NULL,
   8:      [Address] [nvarchar](60) NULL,
   9:      [City] [nvarchar](15) NULL,
  10:      [Region] [nvarchar](15) NULL,
  11:      [PostalCode] [nvarchar](10) NULL,
  12:      [Country] [nvarchar](15) NULL,
  13:      [Phone] [nvarchar](24) NULL,
  14:      [Fax] [nvarchar](24) NULL,
  15:   CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED 
  16:  (
  17:      [CustomerID] ASC
  18:  )WITH 
  19:      (STATISTICS_NORECOMPUTE  = OFF, 
  20:          IGNORE_DUP_KEY = OFF) 
  21:  ) 
  22:  GO
  23:   

Run this and you will have a new Customers table! Unfortunately there is no data in there, but we will get to that soon.

image

If you are moving a lot of tables, foreign key constraints, etc., you should use the SQL Azure Migration Wizard developed by George Huey. This tool, available on CodePlex, will assist you in migrating your SQL Server schemas over to SQL Azure. Wade Wegner blogged about it here, including an instructional video.

Unfortunately there is no such tool for migrating data that I know of. Time for the next hack.

Migrating Data from SQL Server to SQL Azure

I thought that maybe I could cheat the same way I altered the connection settings and use SSIS to migrate the data. I chose the ADO.NET option and entered in all of the data, but it bombed. Then I tried my old reliable tool, Red Gate’s SQL Data Compare. No go. But it was worth a try, since it got me thinking. I created a new database locally called “Azure_Staging” and ran the same CREATE TABLE script there, creating a blank Customers table. I then ran SQL Data Compare using the full Customers table in Northwind as my source and my newly created blank Customers table in Azure_Staging as the destination.

Of course SQL Data Compare found 91 missing rows and I launched the Synchronization Wizard.

image

Click through it and on the 3rd page, click on the “View SQL Script…” button and copy and paste the generated SQL.

image

Copy and paste just the 91 INSERT INTO statements into your SQL Azure Query Window and run it. Now we have data in SQL Azure!

image

Unfortunately this is not the best situation, having to manually create some TSQL scripts, but this is an early CTP. I am sure that future CTPs will make this much easier.

posted on Friday, September 04, 2009 4:19:51 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Thursday, April 23, 2009

As we all know by now, I hate twitter. Why, why, why do you think I care about when you are picking your nose? Here is a really funny (and accurate!) list of reasons why twitter sucks.

With that backdrop, my good friend Mary Chipman is trying to convince me that some people are twittering useful stuff. She is correct; she twitters here, mostly on SQL Server and other data-related stuff. That said, I still won’t use twitter since there is still way too much noise.

Mary and I are doing a session at TechEd next month called “Access and SQL Server: Solve problems without spending money.” In this session we look at a few use cases where it makes sense, from both a technological and a business perspective, to use the wiz-bang features of Access to augment your .NET and SQL Server solutions. We are not advocating using Access as a development platform, just as an augment to your solution. You do this with Excel all the time; why not use a relational engine with built-in reporting as well?

Let me give you an example. Let’s say you have an enterprise application written in .NET with a SQL Server back end. You are a PR firm, and the system takes inputs of press and PR items from some other systems. Then the data is transformed, put into data warehouse tables, and viewed on the web via an ASP.NET app and SQL Server Reporting Services.

Now the boss tells you that you have to track twitter. You protest! But the boss insists. The problem is that twitter has so much junk in it that you can’t accept a raw feed into your enterprise application like you do for press releases, etc. You ask your developers to build an app that will pull in the twitter feeds via its RESTful API and store the tweets locally, giving you the ability to rate the tweets as relevant or irrelevant and then upload them to the enterprise database to flow into the data warehouse and the .NET app. They say sure, but it will take a little while to build the app, and they are busy on higher-priority stuff, so they can’t get started.

So why not, as a stopgap, just build a simple little Access app that uses VBA to call the twitter API and lets you download the tweets into a local Access table; then you can scroll through the data and mark a “relevant” field as true/false. You can build this mini-solution in about 15 minutes. We’ll show you how.

Now, just to be uber geeks, we also want to get that data back into the enterprise system. The enterprise system has a locked-down table structure (good!), so the only way to get data in is via a stored procedure. This stored procedure will only accept a table-valued parameter. Based on Mary’s MSDN white paper, we’ll show you how to do that too.

Hope to see you there, Tuesday May 12th after lunch. We have a few other scenarios to show you: some with SharePoint (which are really cool), some with agile prototyping, and some using Access reporting for solutions for annoying power users.

posted on Thursday, April 23, 2009 10:27:03 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Wednesday, April 08, 2009

Get it here.

posted on Wednesday, April 08, 2009 9:13:54 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Thursday, January 29, 2009

Mary Chipman and I are doing a talk together at TechEd in Los Angeles this May on building solutions “without spending any money.” One of the tricks we will show is using an Access front end that utilizes TVPs on the back-end SQL Server. She posted about it on the Access team’s blog yesterday. Check it out here.

posted on Thursday, January 29, 2009 1:37:56 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, January 20, 2009

I will be speaking at the SQL Server User Group on Thursday at 6pm.

You must register to attend: http://www.clicktoattend.com/?id=134822

Location:  Microsoft , 1290 Avenue of the Americas (the AXA building - bet. 51st/52nd Sts.) , 6th floor

Directions: B/D/F/V to 47th-50th Sts./Rockefeller Ctr 1 to 50th St./Bway N/R/W to 49th St./7th Ave.

Session Info:
Applications today are expected to expose their data and consume data-centric services via REST. In this session we discuss ADO.NET Data Services, or “Project Astoria,” and see how we can REST-enable your data. Then you will learn how to leverage existing skills related to LINQ and data access to customize the behavior, control flow, security model, and experience of your data service. We will then see how to enable data binding to traditional ASP.NET controls as well as Silverlight. Then, switching gears, we will look quickly at consuming REST services from any platform (including Ruby on Rails) using Visual Studio and the WCF REST Starter Kit. We will conclude with a discussion on developing offline applications with the ability to sync back to the online data service. This is a very demo-intensive session.

posted on Tuesday, January 20, 2009 11:42:37 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, November 25, 2008

They filmed my talk at TechEd last week and made it available on the internet. You can watch it here. Unfortunately, the video of the code (about 85% of the session) is not as clear as I would like, but you should be able to see it.

posted on Tuesday, November 25, 2008 9:56:58 AM (Eastern Standard Time, UTC-05:00)  #    Comments [1] Trackback
# Saturday, November 08, 2008

Here is a cool TSQL tip I will be showing off next week at TechEd Europe at one of my sessions (Rock Star Common Table Expressions).

Let’s say you have a product table that looks like this:

image

As you can see in the image, we have some duplicate data. You can use a Common Table Expression in SQL Server 2005 or 2008 to look up the “parents” using an aggregate HAVING clause. Then you join the CTE to the Products table on the Product_Name field as well as the Product_ID field, but using the > operator, so we only return the “children,” or products with a higher product ID. Here is the CTE and code:

--Find the Dupes with a CTE
WITH CTEMinProductRecords
AS
(
 SELECT MIN(Product_ID) AS Product_ID, Product_Name
 FROM Products
 GROUP BY Product_Name
 HAVING COUNT(*) > 1
)
SELECT cte.Product_ID as DupeProductID,
cte.Product_Name as DupeProduct,
p.Product_ID as ParentID, p.Product_Name as ParentProduct,
p.Price as ParentPrice
FROM Products p
JOIN CTEMinProductRecords cte ON
 p.Product_Name = cte.Product_Name
 AND p.Product_ID > cte.Product_ID

Here are the results:

image

Let’s say you want to automatically delete the children. While the business case for this may exist (my old business did have a rule: the higher product ID was the dupe), you will want to update all of the “many” tables to the lower product ID first (there is a sketch of that update after the delete statement below). To then do the delete using the CTE, all you have to do is convert the above select statement to a delete statement:

WITH CTEMinProductRecords
AS
(
 SELECT MIN(Product_ID) AS Product_ID, Product_Name
 FROM Products
 GROUP BY Product_Name
 HAVING COUNT(*) > 1
)
DELETE Products
FROM Products p
JOIN CTEMinProductRecords cte ON
 p.Product_Name = cte.Product_Name
 AND p.Product_ID > cte.Product_ID
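The “update all of the many tables first” step mentioned above might look something like this sketch, which assumes a hypothetical OrderDetails table that references Product_ID:

-- Repoint child rows (hypothetical OrderDetails table) from the dupe
-- product IDs to the surviving (lowest) product ID before deleting.
WITH CTEMinProductRecords
AS
(
 SELECT MIN(Product_ID) AS Product_ID, Product_Name
 FROM Products
 GROUP BY Product_Name
 HAVING COUNT(*) > 1
)
UPDATE od
SET od.Product_ID = cte.Product_ID
FROM OrderDetails od
JOIN Products p ON p.Product_ID = od.Product_ID
JOIN CTEMinProductRecords cte ON
 p.Product_Name = cte.Product_Name
 AND p.Product_ID > cte.Product_ID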

Enjoy!
posted on Saturday, November 08, 2008 4:30:25 PM (Eastern Standard Time, UTC-05:00)  #    Comments [2] Trackback
# Wednesday, November 05, 2008

Yesterday I showed some new language constructs in TSQL that will be very familiar to C# developers. I will be doing three talks on TSQL next week at TechEd Developers in Barcelona, and the talks are designed for application developers. (But database geeks will enjoy these sessions as well!) If you write any TSQL code in your applications today, or interact with it, these sessions will give you some advice and help change the way you approach particular problems.

Today I want to show a new feature: using a Table-Valued Parameter (TVP) in a stored procedure and then calling it from an ASP.NET page or any other .NET client using ADO.NET. TVPs are new in SQL Server 2008: you can now declare a type of TABLE, insert data into it, and then pass it to a stored procedure. Have you ever passed a CSV string (or even an XML data type) to a DAL or stored procedure, then parsed it and executed some SQL? TVPs eliminate that. Here is an example:

First create a new table for us to play with:

Create Table TVP_TEST
(Field_ID int primary key, Field_DESC varchar(10))

Next, create the TVP by declaring a new type. Notice that I have created the type to be exactly the same shape as the underlying table. This is not a requirement; however, it is important to do so if you are going to be doing inserts as we do below.

 

CREATE TYPE TVP_TESTType AS TABLE(
     Field_ID int,
     Field_DESC varchar(10))

 

Next, create a variable of type TVP_TESTType and insert some data into it; the insert is done the same way we would add data to a normal table. Notice that this only puts data into the variable, NOT the underlying table:

--put data into the type
DECLARE @MyTable as TVP_TESTType

INSERT INTO @MyTable VALUES
     (1, 'Test'), (2, 'Next Test')

 

Now you can insert data into a table very easily (remember @MyTable has to be in scope):

INSERT INTO TVP_TEST
     SELECT * FROM @Mytable

 

To get really useful, let’s create a stored procedure that accepts the TVP_TESTType data type as a parameter and inserts data into the TVP_TEST table using that parameter. Notice there is no parsing of CSV strings, etc.:

CREATE PROCEDURE usp_ins_MyTable
(@MyTVP as TVP_TESTType READONLY)
As
INSERT INTO TVP_TEST
     SELECT * FROM @MyTVP

You can call this stored procedure like so, as long as @MyTable is in scope and is filled with data:

 

exec usp_ins_MyTable @MyTVP=@MyTable

But what is really exciting is calling this from a .NET client. If you have an ADO.NET DataSet that is the same shape as the TVP, you can pass it in from the client to the server, greatly enhancing data access and batch operations. You pass the DataSet’s table in as a parameter using SqlDbType.Structured as its data type.

 

Here is some code in C#; just remember that ds is our active DataSet, filled with data and shaped the same way as TVP_TESTType:

 

SqlCommand cmd = new SqlCommand("usp_ins_MyTable", conn);
cmd.CommandType = CommandType.StoredProcedure;
//add the first table of the ds here as a tvp
SqlParameter sp = cmd.Parameters.AddWithValue("@MyTVP", ds.Tables[0]);
//notice structured
sp.SqlDbType = SqlDbType.Structured;
cmd.ExecuteNonQuery();

Pretty easy! Now you can pass data sets in as parameters to a stored procedure, eliminating some funky parsing on either the client or server.

posted on Wednesday, November 05, 2008 10:59:27 AM (Eastern Standard Time, UTC-05:00)  #    Comments [1] Trackback
# Tuesday, November 04, 2008

I am doing a talk next week at TechEd Developers in Barcelona about new TSQL enhancements for developers. The session is:

DAT313 T-SQL Enhancements in SQL Server 2008: Getting Down and Dirty. It is Wednesday at 1:30 in room 115.

The goal of this talk is to show you some cool new stuff in TSQL but also to show C# and VB developers that TSQL is not all that bad. I get the impression that developers hate TSQL. :)

So the TSQL designers decided to add a few “delighters” to the syntax to make C# and VB developers feel more at home.

The first is variable assignment at declaration time. For example:

DECLARE @MSFT char(4) = 'MSFT'
Print 'Using: ' + @MSFT

The next is using shortcut syntax like += or -=, etc.:

UPDATE Portfolio
     SET Qty += 10
     WHERE Symbol='MSFT'

The addition of Table-Valued Parameters (TVPs), which allow you to pass datasets via ADO.NET in C#/VB as a parameter to a stored procedure, will really make C# developers’ lives easier (demo soon on the blog). Also, the new geospatial data types, and cool integration with Google Maps or Virtual Earth, will make developing apps with location much easier.
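To give a small taste of the spatial support mentioned above, here is a minimal sketch (not from the session demos) using the new geography type:

-- Two points (latitude, longitude, SRID 4326) and the distance between them.
DECLARE @Seattle   geography = geography::Point(47.6062, -122.3321, 4326);
DECLARE @Barcelona geography = geography::Point(41.3874, 2.1686, 4326);

-- STDistance returns meters, so divide to get kilometers.
SELECT @Seattle.STDistance(@Barcelona) / 1000.0 AS DistanceInKm;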

There are lots of other cool new features in there for developers, I will post some more on the blog soon.

posted on Tuesday, November 04, 2008 10:20:51 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Wednesday, October 29, 2008

We take this break from the PDC to let you know that our book is out today. Enjoy!

 


posted on Wednesday, October 29, 2008 10:51:10 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Monday, June 30, 2008

Yesterday was my last day of MBA classes and needless to say I was out partying hard last night. I woke up early for some strange reason and decided to get a jump on my book chapters for the upcoming Programming SQL Server 2008 from Microsoft Press.

My VPC died on me so I decided to give SQL Server 2008 RC0 an install on a spare machine (what better to do on a Sunday morning with a hangover). I remember reading that if you have SQL Server 2005 Express edition installed you will have trouble installing RC0. So I went in and uninstalled everything on a spare machine, too hungover to reformat. I figured I am an MBA now, I should make myself more efficient.

Well, even with nothing left of SQL Server 2005, I still got a very mean error message (well after I entered in all the account, mixed mode, information):

"The SQL Server 2005 Express Tools are installed.  To continue, remove the SQL Server 2005 Express Tools."

I was not happy since I had already removed it. So I started removing Visual Studio 2005 too. That did not work. So I decided the next thing to do was Google. Someone else had to have had this pain already. Found this post from Jason Follas. He said that if you have Red Gate SQL Prompt software installed, as I did, it leaves a bad registry key and showed how to fix it. Read the blog post here.

I can't live without Red Gate SQL Compare and SQL Data Compare so I did not mind losing some time. They save me days and days of time over the year. The good news is that Red Gate left a note on Jason's blog that the issue has been resolved.

I guess the MBA was lost on me. Oh well, at least I can write TSQL and google for install problems...

posted on Monday, June 30, 2008 12:01:05 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Friday, June 27, 2008

Unless you have been living under a rock, you should already know about the controversy over the nHibernate mafia's Vote of No Confidence in the Microsoft Entity Framework. The manifesto says that the upcoming Entity Framework (to be released soon via .NET 3.5 SP1) is just not usable. As far as ORMs go, there are some valid points working against EF; in order of (my) importance:

  • Lack of Lazy Loading (The EF team said that they had "explicit" lazy loading, but that is just playing with words and the team has since retracted from that.)
  • Lack of persistence ignorance (at best this is inelegant design, at worst it will cripple the performance of your application and make it a burden to maintain)
  • Too many merge conflicts when using a model in a source control environment
  • Bad design for TDD development (and in my opinion, CI as well)

The manifesto also brings up some Domain Driven Design issues that I feel are less important than the above four bullets. To their credit the normally reserved EF team has responded quickly and nicely to the manifesto. On the heels of last week's transparent-design process announcement, Tim Mallalieu gave a detailed reply which has started a good discussion and wiki. They are listening and that is good.  Since the Entity Framework is starting work this week on v.next, they have also put together an advisory council including Eric Evans, Martin Fowler, Paul Hruby, Jimmy Nilsson, and myself.

To some degree the vote of no confidence worked; they got Tim's attention. I think that the manifesto has some very valid points, but I also think it lacks professionalism. Anyone can say something sucks; it is more important to give valid feedback, suggest solutions, and engage in a dialog. (The vote of no confidence was just too in-your-face, and a dialog was started only because of the professionalism of the EF team.) In addition there are some mafia style attacks on anyone who does not agree with them, most recently against the always honest and open Julie Lerman.

So this blog post sounds like an endorsement of the Entity Framework and ORMs in general, right?

Wrong.

My first problem with ORMs in general is that they force you into an "objects first" box: design your application, then click a button, and magically all the data modeling and data access code will work itself out. This is wrong because it makes you very application centric, and a lot of times a database model is going to support far more than your application. In addition, an SOA environment will also conflict with ORM.

I prefer to build the application's object model and the data model at about the same time, with a "whiteboarding" approach that outlines the flows of data and functionality across the business process and problem set. Maybe it is the MBA talking but I tend to be "business and customers first" when I design a system. (Those of you that know me know that I have designed some very large and scalable systems in my day.) I usually like to "follow the money" as Ron Rose of Priceline taught me 10 years ago.  In addition, I prefer a very agile design process where you are constantly making changes to your design in the early sprints, the database and object model are both part of this process.

My second major problem with ORMs is that they are a solution to a problem that should not be solved. Developers who write object oriented and procedural code like C# and Java have trouble learning the set-based mathematical theory that governs the SQL language. Developers are just plain old lazy and don't want to code SQL since it is too "hard." That is why you see bad T-SQL: developers try to solve it their way, not in a set-based way.

The premise of EF, LINQ, nHibernate, and Luca Bolongese's original ObjectSpaces is that set-based theory causes an "impedance mismatch" between data access and all the other (more procedural) coding we do. And it's ORMs to the rescue to resolve the impedance mismatch.

So ORMs are trying to solve the issue of data access in a way that C# and VB developers can understand: objects, procedural, etc.  That is why they are doomed to fail. The further you abstract the developer from thinking in a set-based way and have them write in a procedural way and have the computer (ORM) convert it to a set-based way, the worse we will be off over time.

What I am saying (and have been saying for a long time) is that we should accept, no, embrace the impedance mismatch!  While others are saying we should eradicate it, I say embrace it.

ORM tools should evolve to get closer to the database, not further away.

One of the biggest hassles I see with LINQ to SQL is the typical many-to-many problem. If I have tables of ocean liners, vessels, and ports, I'll typically have a relational linking table to connect the vessels and ports via a sailing. (Can you tell I am working with ocean freight at the moment?) The last thing I want at the object layer is three tables! (And then another table to look up the ocean liner that operates the vessel.) Unfortunately, this is what most tools give me. Actually, I don't even want one table; I want to hook object functionality to underlying stored procedures. I really want a port object with a vessel collection that also contains the ocean liner information. At least the Entity Framework does this; however, I have major concerns about the performance of the code it produces.
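To make the shape of the problem concrete, here is a rough sketch of that relational model with my own table and column names (not the actual schema we use):

CREATE TABLE OceanLiners (OceanLiner_ID int PRIMARY KEY, OceanLiner_NM varchar(50))
CREATE TABLE Vessels (Vessel_ID int PRIMARY KEY, Vessel_NM varchar(50),
     OceanLiner_ID int REFERENCES OceanLiners(OceanLiner_ID))
CREATE TABLE Ports (Port_ID int PRIMARY KEY, Port_NM varchar(50))
--the linking table: one row per sailing of a vessel calling at a port
CREATE TABLE Sailings (Vessel_ID int REFERENCES Vessels(Vessel_ID),
     Port_ID int REFERENCES Ports(Port_ID),
     SailingDate datetime,
     PRIMARY KEY (Vessel_ID, Port_ID, SailingDate))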

The ironic thing I’m now seeing is developers who are lazy and don't want to learn SQL using tools that will produce SQL for them. The SQL is bad, and now those same anti-SQL lazy developers are struggling to read pages of generated and near-unreadable SQL trying to solve performance problems. They’re dealing with SQL that’s more verbose and orders of magnitude harder to understand than what was needed in the first place!

So where do we go from here? We can't just keep pretending that this mismatch problem can be solved and keep throwing good money after bad. As Ted Neward said, that is the Vietnam of Computer Science.

I say that developers should start embracing the impedance mismatch. I also say that Microsoft and other ORM vendors need to realize that embracing the mismatch is a good thing and design ORM tools that allow this. (This is the advice I will give at the EF council.) I am not anti-ORM (even tho I did trash LINQ on stage at TechEd last year), but I am pro using the right tool/technology for the right job. ORM is real good for CRUD and real bad at other things. Let's start from there. To quote Ted, "Developers [should] simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an O/R-M would create problems."

Hopefully "cooler heads will prevail."

 

Update 6/29/08

My fellow RD Greg Low brought to my attention that my discussion on Linq to SQL above is very similar to an email he sent not too long ago. At Triton Works (the ocean containers) we used his example in our own entity model debate, and I realized after he pointed it out to me that the words above are just too close to his, so I am giving him the credit above. (Though he did not call programmers lazy and anti-SQL, I did.)

FYI, we went with POAN or Plain Old ADO.NET, ocean containers are old school.

The inspiration for this post came from not only the public debate on the EF vote of no confidence but also a private email discussion on nHybernate, EF and Linq to SQL between:  Greg, Andrew Brust, Rocky Lhotka, Sten Sundbland, Joel Semeniuk, Barry Gervin, and myself. I won't tell you who was on what side. If you want to know, you will have to ask them. Hint, hint, Canadians are lazy and anti-SQL. :)

posted on Friday, June 27, 2008 9:04:49 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [4] Trackback
# Friday, February 29, 2008

Yesterday we discussed the changes to XQuery by allowing LET statements. Today the last XML enhancement is an "enhancement" to XML DML.

XML Data Manipulation Language (DML) is specific to MS SQL Server 2005 and 2008. It is a very useful feature that allows you to insert nodes into an existing XML instance, delete an element, or replace the value of a node (sort of an edit). It works like this:

First let's create a table with an XML data type and insert a piece of XML into it:

Create table XMLTest1
    (
    Speaker_ID int primary key,
    SpeakerXML xml not null
    )

GO

Insert into XMLTest1
Values
(
    1,
    '
    <classes>
        <class name="Writing Secure Code for ASP .NET " />
        <class name="Using XQuery in SQL Server 2008" />
        <class name="SQL Server and Oracle Working Together" />
    </classes>
    '
    )

Now, inserting an element into this XML is pretty easy: just use the modify() method of the XML data type. Since our XML instance is in a SQL Server table, we have to use the standard SQL UPDATE and SET syntax with a WHERE clause to get to the piece of XML in our table we are concerned with. You can see how easy it is to add an element here:

Update XMLTest1
Set SpeakerXML.modify(
'insert
<class name="Using Linq to SQL" />
into /classes[1]'
)
Where Speaker_ID=1

If we select * from XMLTest1 WHERE Speaker_ID=1 the XML column looks like this now:

<classes>
  <class name="Writing Secure Code for ASP .NET " />
  <class name="Using XQuery in SQL Server 2008" />
  <class name="SQL Server and Oracle Working Together" />
  <class name="Using Linq to SQL" />
</classes>

SQL Server 2008 has an enhancement to this. You can substitute the XML element with a variable. For example let's create a variable here:

DECLARE @newElement xml
Set @newElement='<class name="ASP.NET Scalability"/>'

Now let's do the same insert as before, however, we will use the @newElement variable. This will enable you to select XML from other areas and stick it into a variable and then insert it into another XML instance. Here is the final piece:

Update XMLTest1
Set SpeakerXML.modify(
'insert
sql:variable("@newElement") 
into /classes[1]'
)
Where Speaker_ID=1

If we select * from XMLTest1 WHERE Speaker_ID=1 the XML column looks like this now:

<classes>
  <class name="Writing Secure Code for ASP .NET " />
  <class name="Using XQuery in SQL Server 2008" />
  <class name="SQL Server and Oracle Working Together" />
  <class name="Using Linq to SQL" />
  <class name="ASP.NET Scalability" />
</classes>

Not the greatest of new features, but a handy dandy feature to say the least.
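For completeness, the delete and replace value of keywords mentioned at the top work the same way; here is a quick sketch of my own against the same table (removing the element we just inserted and renaming the first class):

--remove the element we inserted from the variable
Update XMLTest1
Set SpeakerXML.modify(
'delete /classes[1]/class[@name="ASP.NET Scalability"]'
)
Where Speaker_ID=1

--change the value of an existing attribute in place
Update XMLTest1
Set SpeakerXML.modify(
'replace value of (/classes[1]/class[1]/@name)[1]
with "Writing Secure Code for ASP .NET 3.5"'
)
Where Speaker_ID=1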

posted on Friday, February 29, 2008 12:00:54 PM (Eastern Standard Time, UTC-05:00)  #    Comments [1] Trackback
# Thursday, February 28, 2008

Continuing our discussion of XML in SQL Server 2008: SQL Server 2005 allowed us to use XQuery to query XML data in an XML data type. You can use the FLWOR statements of for, where, order by, and return, but not LET. SQL Server 2008 now has support for LET, which is used to assign values to variables in an XQuery expression. For example, we are going to take the XML below and count the number of sessions per speaker using the count XQuery expression and assign it to a variable via LET.

declare @xml xml
set @xml=
'
<Speakers>
    <Speaker name="Stephen Forte">
        <classes>
            <class name="Writing Secure Code for ASP .NET "/>
            <class name="Using XQuery in SQL Server 2008"/>
            <class name="SQL Server and Oracle Working Together"/>
        </classes>
    </Speaker>
    <Speaker name="Richard Campbell">
        <classes>
            <class name="SQL Server Profiler"/>
            <class name="Advanced SQL Querying Techniques"/>
            <class name="SQL Server and Oracle Working Together"/>
            <class name="From 1 Server to 2"/>
        </classes>
    </Speaker>
</Speakers>
'
SELECT @xml.query(
'<Speakers>
{
for $Speaker in /Speakers/Speaker
let $count :=count($Speaker/classes/class)
order by $count descending
return
<Speaker>
{$Speaker/@name}
<SessionCount>{$count}</SessionCount>
</Speaker>
}
</Speakers>')

The result is shown here; notice that Richard sorts on top since we used descending order in our order by clause, ordering by the expression we created via the LET statement. This is very helpful if you are manipulating simple XML and need to do a mini-transform via XQuery. I used to use the RETURN statement all the time to make slight manipulations to my XML, so I was waiting for LET. While this is the only addition to XQuery, it is a much needed and welcome one.

<Speakers>
  <Speaker name="Richard Campbell">
    <SessionCount>4</SessionCount>
  </Speaker>
  <Speaker name="Stephen Forte">
    <SessionCount>3</SessionCount>
  </Speaker>
</Speakers>
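As a small follow-on sketch of my own (assuming the same @xml variable is still in scope), you can also combine LET with a where clause, for example to return only the speakers with more than three classes:

SELECT @xml.query(
'<BusySpeakers>
{
for $Speaker in /Speakers/Speaker
let $count :=count($Speaker/classes/class)
where $count > 3
return
<Speaker name="{$Speaker/@name}" SessionCount="{$count}" />
}
</BusySpeakers>')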

posted on Thursday, February 28, 2008 12:00:35 PM (Eastern Standard Time, UTC-05:00)  #    Comments [1] Trackback
# Wednesday, February 27, 2008

Continuing our discussion from yesterday on SQL Server 2008 XML enhancements, let's take a look at the XSD enhancement of Union and List types. In SQL Server 2005 you could define a simple type of xsd:list with an enumeration to restrict the items allowed in a list for any element that inherits that type. For example, our XSD below will allow FedEx, DHL, and UPS for a shipping type.

create XML SCHEMA COLLECTION dbo.order_xsd
AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="orderAmt" >
     <xsd:restriction base="xsd:int" >
       <xsd:maxInclusive value="5000" />
       <xsd:minExclusive value="1" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:simpleType name="shiptypeList">
    <xsd:list>
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="FexEx"/>
            <xsd:enumeration value="DHL"/>
            <xsd:enumeration value="UPS"/>
          </xsd:restriction>
        </xsd:simpleType>
    </xsd:list>
    </xsd:simpleType>
   <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerName" type="xsd:string" />
        <xsd:element name="OrderDate" type="xsd:dateTime"/>
        <xsd:element name="OrderAmt" type="orderAmt"/>
        <xsd:element name="ShipType" type="shiptypeList"/>   
      </xsd:sequence>
    </xsd:complexType>
   </xsd:element> 
</xsd:schema>'
GO

DECLARE @xml XML(dbo.order_xsd)
SET @xml =
'<Order>  
    <CustomerName>Bill Gates</CustomerName>
    <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmt>100</OrderAmt>
    <ShipType>DHL</ShipType>
</Order>'
GO

So this is cool; it gives you some database lookup functionality, kind of third normal form Clemens Vasters style. With the new support for union of lists in SQL Server 2008 via xsd:union, you can combine multiple lists into one simple type. For example, in our XSD below we allow the same strings (FedEx, etc.), but also numerical values.

My example below is pretty basic, but this is useful if you have more than one way to describe something and need two lists to do so. Things that come to mind are units of measurement (metric and English) or shoe sizes (US and Italian), etc. This is very useful when you need to restrict items that are driven from a database.

Take a look at the demo for SQL 2008 below; notice that I changed the ShipType data to 1 (from DHL in the earlier example):

create XML SCHEMA COLLECTION dbo.order_xsd
AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="orderAmt" >
     <xsd:restriction base="xsd:int" >
       <xsd:maxInclusive value="5000" />
       <xsd:minExclusive value="1" />
     </xsd:restriction>
   </xsd:simpleType>
    <xsd:simpleType name="shiptypeList">
        <xsd:union>
            <xsd:simpleType>
              <xsd:list>
                <xsd:simpleType>
                  <xsd:restriction base="xsd:integer">
                    <xsd:enumeration value="1"/>
                    <xsd:enumeration value="2"/>
                    <xsd:enumeration value="3"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:list>
            </xsd:simpleType>
            <xsd:simpleType>
              <xsd:list>
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:enumeration value="FedEx"/>
                    <xsd:enumeration value="DHL"/>
                    <xsd:enumeration value="UPS"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:list>
            </xsd:simpleType>
        </xsd:union>
    </xsd:simpleType>

   <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerName" type="xsd:string" />
        <xsd:element name="OrderDate" type="xsd:dateTime"/>
        <xsd:element name="OrderAmt" type="orderAmt"/>   
        <xsd:element name="ShipType" type="shiptypeList"/>
        <xsd:any namespace="##other" processContents="lax"/>   
      </xsd:sequence>
    </xsd:complexType>
   </xsd:element> 
</xsd:schema>'
GO

DECLARE @xml XML(dbo.order_xsd)
SET @xml =
'<Order>  
    <CustomerName>Bill Gates</CustomerName>
    <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmt>100</OrderAmt>
    <ShipType>1</ShipType>
    <Notes xmlns="sf">Steve Test 1</Notes>
</Order>'
GO
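And as a quick negative test of my own (assuming the schema collection above is in place), a ShipType value that is in neither list should be rejected by the schema validation:

DECLARE @badxml XML(dbo.order_xsd)
SET @badxml =
'<Order>
    <CustomerName>Bill Gates</CustomerName>
    <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmt>100</OrderAmt>
    <ShipType>TNT</ShipType>
    <Notes xmlns="sf">Steve Test 1</Notes>
</Order>'
GO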

posted on Wednesday, February 27, 2008 12:00:16 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, February 26, 2008

The SQL Server 2008 February CTP is now installed. I did it on my daily machine to force me to use it in a more realistic environment. It installed a-ok with my default SQL 2005 instance. As you can see below you can use the SQL 2008 IDE to manage and work with both SQL Server instances on your computer at the same time.

[screenshot: SQL Server 2008 Management Studio managing both the 2008 and 2005 instances]

I am playing with the new XML features of SQL Server. For those of you that know my old company, Corzen: we used the XML data type to do some really cool stuff. I'm now investigating the XML features for the update to the Programming SQL Server 2008 book from MS Press.

I guess that since XLINQ is pretty cool and well accepted, Microsoft did not change all that much in the XML world in SQL Server 2008. The new XML features of SQL Server 2008 are:

  • Lax validation support in XSD-giving you some more flexibility with your required areas of the XSD
  • Full support of the XSD: xs:dateTime, basically the preservation of time zones
  • Union and List types in XSDs (I have not played with this yet so can't speak to how useful it will be yet, give me a day)
  • Support of the LET FLOWR expression in XQuery (yay!)
  • XML DML "enhancements." I put that in quotes for two reasons. First, there is only one true enhancement, allowing you to insert data from a SQL column or a SQL variable as part of your insert, and it is a very minor one (I was hoping for a new syntax for specifying position, etc.). Second, XML DML is SQL Server specific, so it is hard to get very excited about this feature.

Now let's take a look at some of the XSD enhancements. SQL Server 2005 added the ability to put XML data into an intrinsic XML data type. This was a vast improvement over SQL Server 2000, where we had to put the data in text fields. In addition, SQL Server 2005 allowed us to restrict this column with an XSD. Consider this XSD:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="orderAmt" >
     <xsd:restriction base="xsd:int" >
       <xsd:maxInclusive value="5000" />
       <xsd:minExclusive value="1" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerName" type="xsd:string" />
        <xsd:element name="OrderDate" type="xsd:dateTime"/>
        <xsd:element name="OrderAmt" type="orderAmt"/>
      </xsd:sequence>
    </xsd:complexType>
   </xsd:element> 
</xsd:schema>

We have defined a few elements, given them a sequence and data types. We also say that you can only have an order amount between 1 and 5000 (work with me, I like simple demos) since the OrderAmt element inherits the simple type orderAmt. This works well, for example if we create an XSD and a table with an XML column bound to that XSD in SQL 2005:

--Valid SQL Server 2005 Schema
CREATE XML SCHEMA COLLECTION dbo.order_xsd
AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="orderAmt" >
     <xsd:restriction base="xsd:int" >
       <xsd:maxInclusive value="5000" />
       <xsd:minExclusive value="1" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerName" type="xsd:string" />
        <xsd:element name="OrderDate" type="xsd:dateTime"/>
        <xsd:element name="OrderAmt" type="orderAmt"/>
      </xsd:sequence>
    </xsd:complexType>
   </xsd:element> 
</xsd:schema>'
GO

CREATE TABLE xml_schematest (
   Order_ID int primary key,
   Order_XML XML(order_xsd) --XML Schema Name
)

Now let's insert some XML in there, the XML will be like this:

Insert into xml_schematest
VALUES
(1,
'<Order>  
    <CustomerName>Bill Gates</CustomerName>
    <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmt>100</OrderAmt>
</Order>
')

Here is the result, notice the Z for UTC as well as the new time (more on that below):

<Order>
  <CustomerName>Bill Gates</CustomerName>
  <OrderDate>2008-10-10T19:22:27.250Z</OrderDate>
  <OrderAmt>100</OrderAmt>
</Order>

This XML will fail since it has an order over 5000.

Insert into xml_schematest
VALUES
(2,
'<Order>  
    <CustomerName>Bill Gates</CustomerName>
    <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmt>10000</OrderAmt>
</Order>
')

So life is good. We have a nice XSD and it restricts our content. SQL Server 2008 has added two new XSD features that we will look at today. The first is something called lax validation. Let's say that you wanted to add an additional element after <OrderAmt> that is not part of the same schema. In SQL Server 2005, schemas can use processContents values of skip and strict for any and anyAttribute wildcard declarations. If it is set to skip, SQL will completely skip validation of the additional element; if it is set to strict, SQL will require that the element or its namespace is defined in the current schema. SQL Server 2008 adds support for an additional validation option: lax. By setting the processContents attribute for this wildcard section to lax, you can enforce validation for any elements that have a schema associated with them, but ignore any elements that are not defined in the schema. Pretty cool, no?

Also, the xsd:dateTime XML data type now preserves time zone information. In SQL Server 2005 you had to provide a time zone for dateTime, time, and date data (did you really ever do that? If you needed the flexibility, trust me, this was a pain). What was not cool was that SQL Server 2005 did not preserve the time zone information of your dateTime or time data; it normalized it to UTC (so, for example, 2008-10-10T08:00:00.000-05:00 was normalized to 2008-10-10T13:00:00.000Z. Notice the Z for UTC. Quite annoying.) In SQL Server 2008, this normalization has been removed! Take a look at this code:

CREATE XML SCHEMA COLLECTION dbo.order_xsd
AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="orderAmt" >
     <xsd:restriction base="xsd:int" >
       <xsd:maxInclusive value="5000" />
       <xsd:minExclusive value="1" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CustomerName" type="xsd:string" />
        <xsd:element name="OrderDate" type="xsd:dateTime"/>
        <xsd:element name="OrderAmt" type="orderAmt"/>   
        <xsd:any namespace="##other" processContents="lax"/>   
      </xsd:sequence>
    </xsd:complexType>
   </xsd:element> 
</xsd:schema>'
GO
CREATE TABLE xml_schematest (
   Order_ID int primary key,
   Order_XML XML(order_xsd) --XML Schema Name
)
GO

Now this will work and even preserve the time zone of -05:00 (New York); also notice the extra <Notes> element allowed due to lax validation. Here is how to insert the XML:

Insert into xml_schematest
VALUES
(1,
'<Order>  
    <CustomerName>Bill Gates</CustomerName>
    <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
    <OrderAmt>100</OrderAmt>
    <Notes xmlns="sf">Steve Test 1</Notes>
</Order>
')

Here are the results in the database with the original time zone:

<Order>
  <CustomerName>Bill Gates</CustomerName>
  <OrderDate>2008-10-10T14:22:27.25-05:00</OrderDate>
  <OrderAmt>100</OrderAmt>
  <Notes xmlns="sf">Steve Test 1</Notes>
</Order>

Tomorrow I will look at Union and List types.

posted on Tuesday, February 26, 2008 12:00:36 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Wednesday, May 16, 2007

Take a look at Andrew's and my book in French, Polish and Italian!

Enjoy!

posted on Wednesday, May 16, 2007 1:39:02 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, February 06, 2007

The MDC is here!

The code and slides for the SQL CLR In Action session are here. The code for Advanced TSQL is here.

posted on Tuesday, February 06, 2007 4:43:13 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Thursday, January 25, 2007

Code for the session "T-SQL kullanımını kolaylaştıran ve hızlandıran, bilinmeyen teknikler" (lesser-known techniques that make using T-SQL easier and faster) is here. Enjoy!

posted on Thursday, January 25, 2007 9:16:48 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, December 12, 2006

We are hiring a DBA at my company Corzen and instead of looking at resumes/CVs we are looking at how people solve particular SQL problems - not just the solution but the approach they take. (Do you use temp tables, (nolock), CTEs, table aliases, etc.?)

Here are the five test questions we ask. (If you want to come work here, just email me the answers.)

Here are five TSQL problems that can judge how you approach a SQL problem. There are no real wrong answers, but the more efficient and bulletproof the query is, the better. No cursors! Don't cheat and don't google the answers (we'll know.)

 

Step 1: run SetupTSQLProblems.sql in any database.

 

Problem 1.

Finding Duplicate Products in the Product table. Write a query to list only the duplicated products in the Products table. Assume that the dupe is the product with the highest Product_ID. The results of your SQL statement should be:

 

Dupe_Product_ID  Dupe_Product_Name  Dupe_Price  Parent_Product_ID  Parent_Product_Name
---------------  -----------------  ----------  -----------------  -------------------
7                Widgets            88          1                  Widgets

 

 

Problem 2.

Assign rooms to classes based on capacity using the Classes and Rooms tables. Rules: each class should have a room (and NULL if a room is not available). No class can be in a room where there are more students than capacity. No room can be used twice.

 

The results of your SQL statement should be:

class                 students  room  capacity
--------------------  --------  ----  --------
Advanced SQL Queries  65        225   70
Introduction to C#    52        224   60
XQuery Deep Dive      35        222   40

 

 

Problem 3.

Make the following static pivot dynamic in AdventureWorks. Assume that orderyear can increment to 2005 and 2006, and that order years can skip.

 

SELECT *
FROM (SELECT CustomerID, YEAR(OrderDate) AS orderyear, TotalDue
      FROM Sales.SalesOrderHeader) as Header
PIVOT(SUM(TotalDue) FOR orderyear IN([2002],[2003],[2004])) as Piv

 

Problem 4.

You have a piece of XML:

<office id="NYC">

  <Employees>

    <employee num="1">Steveemployee>

    <employee num="2">Alemployee>

    <employee num="3">Mikeemployee>

    <employee num="4">Malishaemployee>

  Employees>

office>

<office id="Cairo">

  <Employees>

    <employee num="1">Remonemployee>

    <employee num="2">Lameesemployee>

    <employee num="3">Bassmaemployee>

  Employees>

office>

<office id="Pune">

  <Employees>

    <employee num="1">Vinayemployee>

    <employee num="2">Pradeepemployee>

    <employee num="3">Rashmiemployee>

    <employee num="4">Sujataemployee>

    <employee num="5">Ajitemployee>

  Employees>

office>

<office id="Karachi">

  <Employees>

    <employee num="1">Amratemployee>

  Employees>

office>

 

A. Using the XML datatype, write an XQuery to return the 3rd employee in the NYC office. Your results should look like:

 

EmployeeName
------------
Mike

 

B. Using the XML datatype, write an XQuery to return the 3rd employee in each office. Your results should look like:

EmployeeName
------------
Mike
Bassma
Rashmi

 

Problem 5.

Using adventureworks, rank each salesperson by total sales in 2004 with the rank starting over for each country. Do not use temp tables, self joins, or cursors.

 

CountryName     SalesPersonID  TotalSales   SalesRank
--------------  -------------  -----------  ---------
Australia       290            955311.5851  1
Australia       288            35006.4385   2
Canada          285            2411105.152  1
Canada          278            755593.2997  2
Canada          268            15655.2241   3
France          286            1832662.467  1
France          284            2278.3269    2
Germany         289            1121896.781  1
Germany         284            74363.4408   2
United Kingdom  282            1648152.541  1
United Kingdom  284            54036.9765   2
United States   276            2534645.3    1
United States   275            2053782.757  2
United States   277            1824701.882  3
United States   279            1446386.797  4
United States   281            1429353.893  5
United States   287            1114278.877  6
United States   283            901142.2512  7
United States   280            871513.6294  8
United States   268            318200.2683  9

 

Enjoy!

posted on Tuesday, December 12, 2006 7:08:14 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, November 07, 2006

Microsoft is releasing the SQL Server 2005 Service Pack 2 Community Technology Preview (November 2006). The CTP release is scheduled for Tuesday, November 7th.

Some bullets:

  • Support for the upcoming Windows Vista.
  • Data Mining Add-Ins for Office 2007, which enables data mining functionality from SSAS to be used directly within Excel 2007 and Visio 2007.
  • SSRS integration with MOSS 2007, which allows integration with the Report Center in SharePoint, providing seamless consumption and management of SSRS reports within SharePoint.
  • SSAS improvements for Excel 2007 and Excel Services relating to performance and functionality.

Heterogeneous Environments (Interoperability):

  • Oracle support in Report Builder. Users will now be able to use Report Builder on top of Oracle data sources.
  • Hyperion support with SSRS. Organizations will now be able to use SSRS to build reports on top of Hyperion Essbase cubes.

Performance/Enterprise:

  • Data compression (varDecimal), which is important for data warehouse scenarios and is specifically important for SAP BW scenarios. This requires less disk storage of decimal data, which increases overall performance.
  • Manageability enhancements. Based on customer feedback, SQL Server provides enhanced management capabilities for DBAs such as improvements in database maintenance plans, enhanced management reports, and a new copy database wizard.

SQL Server Express Edition:

  • Management reports have been added to SQL Server Express Edition, enabling customers to get insights into the performance of their Express Edition and SQL Server Compact Edition databases.
  • SQL Server Management Studio Express Edition now enables management of SQL Server Compact Edition databases.

SQL Server 2005 SP2 will ship shortly after the launch of Windows Vista and Office.  You can find more details about the features in SP2 at http://go.microsoft.com/fwlink/?LinkId=71711.

posted on Tuesday, November 07, 2006 12:07:06 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, October 24, 2006

I will be speaking at the NY Metro SQL Server Users Group on Thursday at 6pm on XQuery in SQL Server. Hope to see you all there.

Topic: Using XQuery to Retrieve and Manipulate XML Data with SQL Server 2005
Speaker: Stephen Forte, Microsoft Regional Director
Date: 6:00 PM on October 26, 2006
Place: Microsoft Office in Manhattan, The Axa Financial Building, 1290 6th Avenue, NY, NY

Due to new security guidelines at the building, you will have an easier time getting in if you confirm your attendance via email to joelax@dbdirections.com. Otherwise you'll have to wait till someone comes downstairs to sign you in. Also remember to have a photo id with you.


Blogs, Web Services and general interoperability have proliferated the use of XML in recent years. With all of that XML out there, there needs to be an easy way to incorporate XML data with SQL Server relational data.

This session will look at how to use XQuery to retrieve and manipulate XML data inside the database. We'll start with a look at the new XML datatype in SQL Server 2005, then the ability to validate with XML Schema (XSD) and then creating XML indexes for use with XQuery statements. After a brief look at the W3C XQuery specification we quickly move to SQL Server’s implementation of XQuery 1.0. We'll incorporate XQuery in SELECT and WHERE clauses to retrieve information as well as see how to manipulate XML data with XQuery DML.

Pizza and refreshments will be served at the meeting, and there will be a drawing for several giveaways.

posted on Tuesday, October 24, 2006 8:29:58 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Sunday, May 28, 2006

Finally, after 4 years of pain and suffering, Programming SQL Server 2005 has shipped to the printer. It should be out in stores in about a month or less. Wow.

posted on Sunday, May 28, 2006 1:19:18 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Wednesday, May 24, 2006

Red Gate has released SQL Prompt 2.0, a pretty cool auto-complete and Intellisense add-in for SQL Server Management Studio and Query Analyzer. The best part: it is FREE! Download it here.

 

When you load it up, it will ask you to connect to a database.

Now when you are typing code in SQL you have real auto-complete and Intellisense!

posted on Wednesday, May 24, 2006 11:15:44 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, October 11, 2005

When I installed the latest beta of SQL Server 2005 and ran the Merge Replication wizards to create a Publication, I got a strange new error not present in previous betas. The error said that the “XP Agents are not enabled and you must run sp_configure to enable them.”

I figured that MS did the right thing and turned this feature off by default. I ran the new SQL Server Surface Area Configuration Tool and could not find anything for enabling the XP Agent stored procedures, etc. I did find a lot of good stuff disabled by default that will make SQL Server more secure. This is good: out of the box, an installation of SQL Server leaves some of the advanced yet powerful tools disabled, so a hacker can't count on them being turned on.

So in my case, MS turned it (XP Agents) off but did not give a way in the tool to enable it. After some trial and error, and some help from the help files, I got the following to make it all work, and my publication agent ran smoothly:

You can't just run sp_configure; you first have to turn on advanced options so SQL Server knows the option exists:

use master
go
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Agent XPs', 1;
go
RECONFIGURE
GO

It is good that this stuff is turned off by default; it will just mean a little more time for us developers to set up and deploy. Worth the tradeoff.
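If you want to double check that the setting took, a quick query against sys.configurations (my own addition, not part of the original fix) should show value_in_use = 1:

select name, value, value_in_use
from sys.configurations
where name = 'Agent XPs'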

posted on Tuesday, October 11, 2005 8:15:40 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Saturday, October 08, 2005

SQL Server 2005 allows you to store XML in a native data type. That is cool, however, having XML in the database is almost useless unless you can query the elements and attributes of the XML data natively. XQuery becomes very useful when you can use it to search based on the values of a particular element or attribute. The xml.exist() function accepts an XQuery as input and returns 0, 1, or NULL, depending on the result of the query; 0 is returned if no elements match, 1 is returned if there is a match, and NULL is returned if there is no data to query on. For example we will verify if a node exists in this particular XML string of classes.

DECLARE @XML xml
Set @XML='
<classes>
   <class name="SQL Server Index"/>
   <class name="SQL Precon"/>
</classes>
'
Select @XML.exist('/classes')

 

The code above returns a 1 since the "classes" element exists in the XML variable. If you change the XQuery expression to search for an XML node that does not exist, like Select @XML.exist('/dogs'), then it will return 0.

You can see this in action as part of a CHECK CONSTRAINT. SQL Server will not allow you to use xml.exist() directly in a CHECK CONSTRAINT, so you have to first create a user defined function (UDF) to perform the action. This UDF accepts an XML field and returns the value of an xml.exist() method looking for an instance of <Orders>:

USE AdventureWorks
GO
CREATE FUNCTION dbo.DoesOrderXMLDataExist
(@XML XML)
RETURNS bit
AS
BEGIN
RETURN @XML.exist('/Orders')
END;
GO

 

To use this UDF as a CHECK CONSTRAINT, just create a table and pass the column you want the constraint applied to into the UDF you just created.

--create the table using the function
CREATE TABLE OrdersXMLCheck
   (OrderDocID INT PRIMARY KEY,
   xOrders XML NOT NULL Default '<Orders/>'
 CONSTRAINT xml_orderconstraint
  CHECK(dbo.DoesOrderXMLDataExist(xOrders)=1))

 
It's that simple; now you have a rule enforced on that column making sure that an <Orders> element is present in any XML added to this table.
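Here is a quick sanity check of my own against that table; the first insert satisfies the UDF-based constraint and the second should fail with a CHECK constraint violation on xml_orderconstraint:

--this one passes the check
Insert Into OrdersXMLCheck (OrderDocID, xOrders)
Values (1, '<Orders><Order id="1"/></Orders>')

--this one violates xml_orderconstraint and is rejected
Insert Into OrdersXMLCheck (OrderDocID, xOrders)
Values (2, '<NotAnOrder/>')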
posted on Saturday, October 08, 2005 3:56:06 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [2] Trackback
# Tuesday, August 09, 2005

SQL Server 2005 Notification Services is what you would traditionally call “middle ware” or part of an application server. SQL Server is traditionally a “back end” or an infrastructure component to your architecture. SSNS is middleware provided by Microsoft as part of the SQL Server 2005 package. Microsoft has not traditionally provided a lot of middleware, but ever since the success of “Viper” or Microsoft Transaction Server (MTS), Microsoft has been providing more and more reliable and scalable middleware. Since SSNS is middleware and not part of the core database engine, SSNS is a separate setup option that is turned off by default when you initially set up your SQL Server 2005 server. You will need to select SSNS to install it at initial setup. 

If you choose to install SQL Server 2005 Notification Services, all of its components and support files install in a separate subdirectory (typically the \Notification Services subdirectory of your default SQL Server installation.). If you are like me and need to know what everything is, you can see by inspecting the subdirectory that SSNS is made up of a few components, the important ones are explained here:

  • microsoft.sqlserver.notificationservices.dll: this is the actual guts of Notification Services. A managed .NET assembly that contains the core code for SSNS and the built-in providers you use to retrieve data and create notifications and subscriptions.
  • NSService.exe: the executable shell of microsoft.sqlserver.notificationservices.dll used for running as a Windows Service. Each instance of NSService.exe runs as a Windows Service and can be managed by SQL Management Studio. Instances of NSService.exe are independent of SQL Server 2005 instances.
  • Providers: the extensible framework that SSNS is built around: event providers, formatters, and delivery protocol providers.
  • Nscontrol.exe: a utility program that is used when you compile your application; it generates the SQL Server database and objects that the Windows Service application uses to retrieve data and create notifications.
  • XML Schemas: when you generate a SSNS application, you use configuration files that are XML based. The schemas provided as part of the framework validate those documents.
  • Sample applications: more than most other tools, the community has used the sample applications to fully show the power of Notification Services. We will explore them as well.

This is good stuff. More later..

posted on Tuesday, August 09, 2005 4:03:48 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, June 07, 2005

SQL Server 2005 will ship on November 7th. It has been a long time, but worth the wait. Go get the current beta, released today: http://msdn.microsoft.com/

posted on Tuesday, June 07, 2005 10:36:53 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, May 31, 2005

 

So last night during the geek night session at the SDC, the Dutch, inspired by Richard Campbell, called me out on my SMO Backup and Restore GUI that had a progress meter. They thought I was faking it, not actually providing a true representation of the progress of the backup. Here is the progress meter in action; as the database backup makes progress, we update the progress meter:

 

 

 

To do a backup programmatically you can use SMO (see yesterday). Begin by setting up the variables to get started:

 

Server svr = new Server();//assuming local server
Backup bkp = new Backup();

Cursor = Cursors.WaitCursor;

Then you have to set the device to back up to and which database to back up. Notice in the comments the code for the progress meter:

try
{
      string strFileName = txtFileName.Text.ToString();
      string strDatabaseName = txtDatabase.Text.ToString();

      bkp.Action = BackupActionType.Database;
      bkp.Database = strDatabaseName;

      //set the device: File, Tape, etc
      bkp.Devices.AddDevice(strFileName, DeviceType.File);
      //set this when you want to do Incremental
      bkp.Incremental = chkIncremental.Checked;

      //progress meter stuff
      progressBar1.Value = 0;
      progressBar1.Maximum = 100;
      progressBar1.Value = 10;

      //this gives us the % complete by handling the event
      //provided by SMO on the percent complete; we will
      //update the progress meter in the event handler

      //raise the PercentComplete event at every 10% of progress
      bkp.PercentCompleteNotification = 10;
      //wire up the event handler that increments the progress meter
      bkp.PercentComplete += new PercentCompleteEventHandler(ProgressEventHandler);

      //this does the backup
      bkp.SqlBackup(svr);
      //alert the user when it is all done
      MessageBox.Show("Database Backed Up To: " + strFileName, "SMO Demos");
}
catch (SmoException exSMO)
{
      MessageBox.Show(exSMO.ToString());
}
catch (Exception ex)
{
      MessageBox.Show(ex.ToString());
}
finally
{
      Cursor = Cursors.Default;
      progressBar1.Value = 0;
}

Here is the ProgressEventHandler; notice that I made it generic enough that I can call it from both the backup and restore methods!

public void ProgressEventHandler(object sender, PercentCompleteEventArgs e)
{
      //move the progress bar up to the percent complete reported by SMO
      progressBar1.Value = e.Percent;
}

 

posted on Tuesday, May 31, 2005 6:28:53 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Monday, May 30, 2005

The Software Developers Conference in the Netherlands has begun. Today I show how to prevent SQL Injection Attacks in ASP .NET as well as other cool tricks. Lots of RDs here and lots of happy attendees. As usual at SDC I will show something very new and cool. I will show off some of the new SQL Server Management Objects (SMO) in the keynote tonight. Ill give you a preview here.

The SMO object model is a logical continuation of the work done in SQL-DMO. SMO is feature-compatible with SQL-DMO, containing many of the same objects. To achieve maximum data definition language (DDL) and administrative coverage for SQL Server 2005, SMO adds more than 150 new classes. The primary advantages of SMO are in its performance and scalability. SMO has a cached object model, which allows you to change several properties of an object before effecting the changes to SQL Server. As a result, SMO makes fewer round trips to the server, and makes its objects more flexible. SMO also has optimized instantiation, meaning that you can partially or fully instantiate objects. You can load many objects quickly by not instantiating all the properties of the objects. To get started you have to set a reference to it and pull in the namespace:

using Microsoft.SqlServer.Management.Smo;

Now I will show you how to programmatically do a database restore. You start by getting the SMO objects: Server and Restore.

Server svr = new Server();
Restore res = new Restore();

Now take a look at how easily you can do a restore, just a few lines of code:

res.Database = "AdventureWorks";
res.Action = RestoreActionType.Database;
res.Devices.AddDevice(@"c:\mybackup.bak", DeviceType.File);
res.ReplaceDatabase = true;
res.SqlRestore(svr);

There is a lot more that you can do with SMO, but this shows you how easy it is to manage your server from code. A very cool thing to do is put some of the server monitoring stuff into an ASP .NET page for viewing your server stats from a remote location.

More on SMO to come...

 

posted on Monday, May 30, 2005 7:32:15 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Saturday, May 28, 2005

That is the title of Richard and my Monday night keynote at the Software Developers Conference in the Netherlands. Well, I think it is now: "Estaban, splain dis Jukon to me!" where we talk all about SQL Server 2005 in bad Spanish accents. To tell you the truth, we have no idea what we are going to talk about; we do know that it will be fun and contain beer and technical content. Conferences in the Netherlands rock.

Off to the Red Lights...

posted on Saturday, May 28, 2005 7:33:16 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Friday, April 29, 2005
DBA304  Advanced Querying Techniques, Tips & Tricks Using Transact-SQL
Speaker(s): Richard Campbell, Stephen Forte
Session Type(s): Breakout
Track(s): Database Administration
Day/Time: Monday, June 6 3:15 PM - 4:30 PM Room: S 310 A
Take your querying to the next level! This session gets away from the fundamentals of SQL queries and into the hard stuff. See two experts in SQL Server compare and contrast querying techniques between SQL Server 2000 and SQL Server 2005. This session has a series of real world examples to show how creative SQL queries can generate solutions in record time. Some techniques you'll learn include how to do crosstab queries that take seconds to execute instead of hours, exploiting sub-queries and taking advantage of self-joining. Along the way, get some insight into how SQL servers work, as well as how SQL Server 2005 is going to make advanced querying even easier.
 
posted on Friday, April 29, 2005 8:13:37 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [1] Trackback
# Friday, April 15, 2005

Back from India, boy was it hot. Sanjay (RD Bombay) and I had a blast. Off to speak at the North Africa Developers Conference in Algeria this weekend. Seven RDs (including the ones who speak French) will be there in full force. I plan on making Clemens see the light while I am there. I am doing these 4 sessions:

Writing Secure Code for ASP .NET

Data Controls and Advanced Cache Techniques with ASP .NET

Ranking and Windowing Functions in SQL Server 2005

Using XQuery to Query and Manipulate Data in SQL Server 2005

And: The New York Mets and the New York Yankees have the exact same record. I am enjoying it while it lasts. It seems that the Mets are 1-5 when I am in the country and 3-0 when I am not.

posted on Friday, April 15, 2005 10:47:43 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, April 05, 2005

Microsoft has announced the launch of the SQL Server 2005 Virtual Hands on labs. They are pretty cool.

In these labs, you will get to experience many of the new features in SQL Server 2005 including CLR integration, XML support (my favorite) and deep business intelligence integration.

Registration link:
http://msdn.demoservers.com/login.aspx?group=sql2005

posted on Tuesday, April 05, 2005 10:39:11 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, March 29, 2005

 Using Ranking and Windowing Functions in SQL Server 2005. See it here.

posted on Tuesday, March 29, 2005 11:45:56 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Monday, February 14, 2005

I know this. (Ask me if I care.) Anyway, all of those folks who dislike me usually say "is there a way to shut him up." Well there is a way!!!! Have me train for a marathon. Ran 18 miles on Saturday and then a 9.3 mile race on Sunday inside the “gates“ in Central Park. I was very quiet all weekend. (Marathon in 12 days)

 

Another thing people thought was impossible was referencing a RANK()ing function in a WHERE clause in SQL Server 2005, or using an aggregate with the Ranking and Windowing functions. A simple solution that I did not see in any of the docs: use a Common Table Expression. Take this simple aggregate, rounding the sales figures from AdventureWorks to the nearest 100:

 

Select CustomerID,round(convert(int, sum(totaldue))/100,8) *100 as totalamt
From Sales.SalesOrderHeader
Group by CustomerID

Gives you results like:

CustomerID  totalamt
----------- -----------
22814       0
11407       0
28387       600
15675       7900
18546       0

(and so on)

 

What if you want to rank them? Easy, make the aggregate a CTE and rank over the new field:

--rank by totaldue, summed and rounded (nearest 100)
--need a CTE to do a sum & rounding
--so this example will have a
--customerID summed with all of
--their orders
With CustomerSum
As
(
Select CustomerID,round(convert(int, sum(totaldue))/100,8) *100 as totalamt
From Sales.SalesOrderHeader
Group by CustomerID
)
Select *,
            Rank() Over (Order By totalamt Desc) as Rank
From CustomerSum

Results:

CustomerID  totalamt    Rank
----------- ----------- --------------------
678         1179800     1
697         1179400     2
170         1134700     3
328         1084400     4
514         1074100     5

(and so on)

 

Ditto if you want to filter a query by the results of a ranking function. Just move the ranking function up to the CTE:

 

--use a common table expression if you want
--to filter by one of the rows that contain a
--ranking function since ranking functions
--are not allowed in where or having clauses
With NumberRows
As
(
Select SalesOrderID, CustomerID,
            Row_Number() Over (Order By SalesOrderID) as RowNumber
From Sales.SalesOrderHeader
)
Select * from NumberRows
where RowNumber between 100 and 200
Order By SalesOrderID

Results:

SalesOrderID CustomerID  RowNumber
------------ ----------- --------------------
43758        27646       100
43759        13257       101
43760        16352       102
43761        16493       103
43762        27578       104

(and so on)

 

 

 

 

posted on Monday, February 14, 2005 5:49:05 PM (Eastern Standard Time, UTC-05:00)  #    Comments [4] Trackback
# Monday, January 17, 2005

Yukon has enhanced the SQL Server 2000 FOR XML function. If you want to create element based XML (yay!), FOR XML PATH allows you to specify column aliases that contain valid XPath expressions that will shape your XML output.

--XML FOR PATH
Select
ContactID as [@Contact_ID],
FirstName as [ContactName/First],
LastName as [ContactName/Last],
Phone as [ContactPhone/Phone1]
From Person.Contact For XML PATH

Produces output like:

<Customer>
  <CustomerID>1</CustomerID>
  <OrderDetail>
    <SalesOrderID>43860</SalesOrderID>
    <OrderDate>2001-08-01T00:00:00</OrderDate>
  </OrderDetail>
  <OrderDetail>
    <SalesOrderID>44501</SalesOrderID>
    <OrderDate>2001-11-01T00:00:00</OrderDate>
  </OrderDetail>
</Customer>
<Customer>
  <CustomerID>2</CustomerID>
  <OrderDetail>
    <SalesOrderID>46976</SalesOrderID>
    <OrderDate>2002-08-01T00:00:00</OrderDate>
  </OrderDetail>
  <OrderDetail>
    <SalesOrderID>49054</SalesOrderID>
    <OrderDate>2003-02-01T00:00:00</OrderDate>
  </OrderDetail>
</Customer>

If you are familiar and comfortable with XPath, there are some additional features to XML PATH you may like. You can use the following XPath node test functions to further control the shape of your XML output (a quick sketch follows the list):

  • node()
  • text()
  • data()
  • comment()
  • processing-instruction()
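Here is a small sketch of my own (not from the original post) using two of these node tests; text() writes a value as the element's text node and comment() emits an XML comment (the literal comment string is just an example):

Select
ContactID as [@Contact_ID],
'Exported from AdventureWorks' as [comment()],
FirstName as [ContactName/text()]
From Person.Contact For XML PATH('Contact')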

 

posted on Monday, January 17, 2005 1:31:13 PM (Eastern Standard Time, UTC-05:00)  #    Comments [6] Trackback
# Tuesday, January 11, 2005

The XML Data Type can, just like other data types, conform to Nullability, Defaults and Constraints. If you want to make the field required (NOT NULL) and provide a default value on your XML column, just specify like you would any other column.

CREATE TABLE OrdersXML
   (OrderDocID INT PRIMARY KEY,
   xOrders XML NOT NULL Default '<Orders/>')

This insert will work because it is relying on the default:

Insert Into OrdersXML (OrderDocID, xOrders) Values (2, DEFAULT)

The following insert will also work, even though we are adding non-<Orders> content, because we have not declared a constraint on the column:

Insert Into OrdersXML (OrderDocID, xOrders) Values (3, 'steve')

SQL Server 2005 gives you the opportunity to add a constraint. An interesting feature of an XML column is that the CHECK constraint is based on the XML Infoset model using the methods of the XML column. You can use the exist method, which is specific to the XML type, as part of our CHECK constraint to force a particular type of XML element to exist.  In the following example we will drop and recreate the table from scratch (you can use ALTER TABLE as well) adding a default and CHECK CONSTRAINT.

Drop Table OrdersXML

--create the table with the XML Datatype
--using a default value
--also has the check constraint
CREATE TABLE OrdersXML
   (OrderDocID INT PRIMARY KEY,
   xOrders XML NOT NULL Default '<Orders/>'
 CONSTRAINT xml_orderconstraint
  CHECK(xOrders.exist('/Orders')=1))

 

Now this example from before will fail.

Insert Into OrdersXML (OrderDocID, xOrders) Values (3, 'steve')

When you attempt to insert data that will violate the xml_orderconstraint you get the following error message from SQL Server:

Msg 547, Level 16, State 0, Line 1

INSERT statement conflicted with CHECK constraint

'xml_orderconstraint'. The conflict occurred in database

'AdventureWorks', table 'OrdersXML', column 'xOrders'.

The statement has been terminated.
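For completeness, an insert whose document contains an <Orders> element (the order content below is made up) satisfies the constraint and succeeds:

Insert Into OrdersXML (OrderDocID, xOrders) Values (4, '<Orders><Order OrderID="1" /></Orders>')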

 

posted on Tuesday, January 11, 2005 10:25:06 AM (Eastern Standard Time, UTC-05:00)  #    Comments [23] Trackback
# Wednesday, December 22, 2004

SQL Server 2005 introduces an enhancement to the OPENROWSET function: the new BULK rowset OLE DB provider. Via OPENROWSET, it lets you access data in files in a relational fashion. In the past, you used BULK INSERT to pull the data from the file into a temp table and then ran your relational operations; with SQL Server 2005, you can use the data contained in the file directly as part of your SQL statement's rowset results. Keep in mind that you must specify a format file, which is the same format file you use with bcp.exe or the BULK INSERT statement. The following code shows how to access a file named c:\bulk.txt using the format file c:\bulk.fmt:

SELECT customerid, customername, totalorders

FROM OPENROWSET(BULK 'c:\bulk.txt',

       FORMATFILE = 'c:\bulk.fmt') AS bulkrows
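For contrast, the pre-2005 approach mentioned above would look something like this (a rough sketch; the staging table and its column types are hypothetical):

--old way: stage the file with BULK INSERT, then query the staging table
CREATE TABLE #staging (customerid INT, customername NVARCHAR(50), totalorders INT)

BULK INSERT #staging FROM 'c:\bulk.txt' WITH (FORMATFILE = 'c:\bulk.fmt')

SELECT customerid, customername, totalorders FROM #staging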

posted on Wednesday, December 22, 2004 8:27:10 PM (Eastern Standard Time, UTC-05:00)  #    Comments [16] Trackback
# Tuesday, December 21, 2004

SQL Server 2005 adds a new feature called Common Table Expressions (CTE). The true power of CTEs emerges when you use them recursively to perform hierarchical queries on tree-structured data. In fact, besides ANSI SQL compliance, this was the main reason Microsoft built CTEs. A recursive CTE is constructed from a minimum of two queries: the first, or anchor member (AM), is a nonrecursive query, and the second, or recursive member (RM), is the recursive query. Within your CTE’s parentheses (after the AS clause), you define queries that are either independent or refer back to the same CTE. The AM and RM are separated by a UNION ALL statement. Anchor members are invoked only once; recursive members are invoked repeatedly until the query returns no rows. Multiple AMs can be appended to each other using either a UNION or a UNION ALL operator, depending on whether you want to eliminate duplicates. (You must append recursive members using a UNION ALL operator.) Here is the syntax:

With SimpleRecursive( field names)

As

(

     <Select Statement for the Anchor Member>

 

     Union All

    

     <Select Statement for the Recursive Member>

)

 

Select * From SimpleRecursive

 

To demonstrate this feature, I will create an example here. We create a table of employees with a self-referencing field back to Employee_ID called ReportsTo (I call this a domestic key, in lieu of a foreign key). We are going to write a query that returns all the employees who report to Stephen (Employee_ID = 2) and all the employees who report to Stephen’s subordinates:

--create a table with tree data

--Reportsto is a "domestic key" back to Employee_id

create table Employee_Tree (Employee_NM nvarchar(50), Employee_ID int Primary Key, ReportsTo int)

--insert some data, build a reporting tree

insert into employee_tree values('Richard', 1, NULL)

insert into employee_tree values('Stephen', 2, 1)

insert into employee_tree values('Clemens', 3, 2)

insert into employee_tree values('Malek', 4, 2)

insert into employee_tree values('Goksin', 5, 4)

insert into employee_tree values('Kimberly', 6, 1)

insert into employee_tree values('Ramesh', 7, 5)

 

Our table looks like this:

Employee_NM  Employee_ID  ReportsTo
-----------  -----------  ---------
Richard      1            NULL
Stephen      2            1
Clemens      3            2
Malek        4            2
Goksin       5            4
Kimberly     6            1
Ramesh       7            5

 

Now the recursive query to determine all the employees who will report to Stephen:

 

--Recursive Query
WITH SimpleRecursive(Employee_NM, Employee_ID, ReportsTo)
            AS
(SELECT Employee_NM, Employee_ID, ReportsTo
  FROM Employee_Tree WHERE Employee_ID = 2
UNION ALL
SELECT p.Employee_NM, p.Employee_ID, p.ReportsTo
 FROM Employee_Tree P INNER JOIN
 SimpleRecursive A ON A.Employee_ID = P.ReportsTo
)
SELECT Employee_NM FROM SimpleRecursive

 

Employee_NM

--------------------------------------------------

Stephen

Clemens

Malek

Goksin

Ramesh

 

(5 row(s) affected)

This recursion starts where Employee_ID = 2 (the anchor member, the first SELECT). It picks up that record and then, via the recursive member (the SELECT after the UNION ALL), picks up all of the records that report to Stephen and those records’ children (Goksin reports to Malek and Malek reports to Stephen). Each subsequent recursion tries to find more children that have as parents the employees found by the previous recursion. Eventually the recursion finds no more children and stops. Kimberly is not returned because she reports to Richard, not to Stephen or to anyone under him.
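One safeguard worth knowing about (not part of the example above): if the tree is deeper than you expect, or if bad data introduces a cycle, you can cap the recursion with the MAXRECURSION query hint on the outer statement (the default limit is 100 levels). A quick sketch:

WITH SimpleRecursive(Employee_NM, Employee_ID, ReportsTo)
            AS
(SELECT Employee_NM, Employee_ID, ReportsTo
  FROM Employee_Tree WHERE Employee_ID = 2
UNION ALL
SELECT p.Employee_NM, p.Employee_ID, p.ReportsTo
 FROM Employee_Tree P INNER JOIN
 SimpleRecursive A ON A.Employee_ID = P.ReportsTo
)
SELECT Employee_NM FROM SimpleRecursive
OPTION (MAXRECURSION 10) --raise an error if we recurse more than 10 levels deep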

posted on Tuesday, December 21, 2004 8:51:18 AM (Eastern Standard Time, UTC-05:00)  #    Comments [1] Trackback
# Monday, December 13, 2004

Come one, come all: this Thursday at the NYC .NET Developers Group, Andrew, Bill and I will give a real in-depth sneak peek at SQL Server 2005, including a tools overview, TSQL, XQuery, XML, Service Broker, OLAP and the Unified Dimensional Model. Too bad Clemens will not be there; he is arriving on Friday and spending the weekend at my place for some year-end partying. Maybe we will rewrite dasBlog to use SQL Server 2005. (Most likely we will just drink a lot and fall down.)

Did another race this weekend, 10K in Central Park. 8:39 pace, no way I can hold that up for an entire marathon.


 
Last Name  First Name  Sex/Age  Bib   Team  City      State  Net Time  Pace/Mile
GURBISZ    KATHLEEN    F27      5484        NY        NY     51:48     8:21
FORTE      STEPHEN     M32      5448        NEW YORK  NY     53:40     8:39

posted on Monday, December 13, 2004 10:41:30 AM (Eastern Standard Time, UTC-05:00)  #    Comments [9] Trackback
# Friday, December 10, 2004

 

The ranking functions can also be combined with windowing functions. A windowing function divides a resultset into partitions based on the values of the column(s) in your PARTITION BY clause, used in conjunction with the OVER clause of your ranking function. It is like applying a GROUP BY to your ranking function: you get a separate ranking for each partition. The example below uses ROW_NUMBER() OVER (PARTITION BY SalesPersonID ORDER BY OrderDate) to number each salesperson's orders by order date, restarting the count for each salesperson. This can be used with any of the four ranking functions.

Select SalesOrderID, SalesPersonID, OrderDate,

Row_NUMBER() Over (Partition By SalesPersonID Order By OrderDate) as OrderRank

From Sales.SalesOrderHeader

Where SalesPersonID is not null

 

SalesOrderID SalesPersonID OrderDate               OrderRank

------------ ------------- ----------------------- ---

43659        279           2001-07-01 00:00:00.000 1

43660        279           2001-07-01 00:00:00.000 2

43681        279           2001-07-01 00:00:00.000 3

43684        279           2001-07-01 00:00:00.000 4

43685        279           2001-07-01 00:00:00.000 5

43694        279           2001-07-01 00:00:00.000 6

43695        279           2001-07-01 00:00:00.000 7

43696        279           2001-07-01 00:00:00.000 8

43845        279           2001-08-01 00:00:00.000 9

43861        279           2001-08-01 00:00:00.000 10

. . . More

48079        287           2002-11-01 00:00:00.000 1

48064        287           2002-11-01 00:00:00.000 2

48057        287           2002-11-01 00:00:00.000 3

47998        287           2002-11-01 00:00:00.000 4

48001        287           2002-11-01 00:00:00.000 5

48014        287           2002-11-01 00:00:00.000 6

47982        287           2002-11-01 00:00:00.000 7

47992        287           2002-11-01 00:00:00.000 8

48390        287           2002-12-01 00:00:00.000 9

48308        287           2002-12-01 00:00:00.000 10

. . . More

 

The OVER (PARTITION BY ...) clause also works with the standard SQL Server aggregate functions, including MIN, MAX, SUM and COUNT.
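For example, here is a quick sketch (against AdventureWorks) that puts each salesperson's total order count and most recent order date on every detail row, without collapsing the rows the way GROUP BY would:

Select SalesOrderID, SalesPersonID, OrderDate,
            COUNT(*) Over (Partition By SalesPersonID) as OrdersForSalesPerson,
            MAX(OrderDate) Over (Partition By SalesPersonID) as LastOrderDate
From Sales.SalesOrderHeader
Where SalesPersonID is not null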

 

posted on Friday, December 10, 2004 8:47:01 AM (Eastern Standard Time, UTC-05:00)  #    Comments [9] Trackback
# Thursday, December 09, 2004

 

DENSE_RANK() works exactly like RANK() except that it does not skip numbers after a tie.

 

Select SalesOrderID, CustomerID,

            DENSE_RANK() Over (Order By CustomerID) as RunningCount

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By CustomerID

 

SalesOrderID CustomerID  RunningCount

------------ ----------- --------------------

43860        1           1

44501        1           1

45283        1           1

46042        1           1

46976        2           2

47997        2           2

49054        2           2

50216        2           2

51728        2           2

57044        2           2

63198        2           2

69488        2           2

44124        3           3

. . . More

 

NTILE(n) divides the results into n groups of approximately even size and assigns every row within a group the same number in the resultset. A perfect example is scoring out of 100 (like for an examination at university) or placing runners into percentiles in a road race.

 

Select SalesOrderID, CustomerID,

            NTILE(10000) Over (Order By CustomerID) as RunningCount

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By CustomerID

 

SalesOrderID CustomerID  RunningCount

------------ ----------- --------------------

43860        1           1

44501        1           1

45283        1           1

46042        1           1

46976        2           2

47997        2           2

49054        2           2

50216        2           2

51728        2           3

57044        2           3

63198        2           3

69488        2           3

44124        3           4

. . . More
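If the percentile idea is what you are after, a smaller tile count is easier to read. A quick sketch that splits the same rows into quartiles:

Select SalesOrderID, CustomerID,
            NTILE(4) Over (Order By CustomerID) as Quartile
From Sales.SalesOrderHeader
Where SalesOrderID>10000
Order By CustomerID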

 

One last example will bring these all together in one SQL Statement and show the difference between all four ranking functions.

 

--Ranking All

use adventureworks

Select SalesOrderID as OrderID, CustomerID,

            Row_Number() Over (Order By CustomerID) as RowNum,

            RANK() Over (Order By CustomerID) as Rank,

            DENSE_RANK() Over (Order By CustomerID) as DRank,

            NTILE(10000) Over (Order By CustomerID) as NTile

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By CustomerID

 

 

OrderID     CustomerID  RowNum               Rank                 DRank             NTile  

----------- ----------- -------------------- -------------------- -------------------- --------------------

43860       1           1                    1                    1                    1

44501       1           2                    1                    1                    1

45283       1           3                    1                    1                    1

46042       1           4                    1                    1                    1

46976       2           5                    5                    2                    2

47997       2           6                    5                    2                    2

49054       2           7                    5                    2                    2

50216       2           8                    5                    2                    2

51728       2           9                    5                    2                    3

57044       2           10                   5                    2                    3

63198       2           11                   5                    2                    3

69488       2           12                   5                    2                    3

44124       3           13                   13                   3                    4

44791       3           14                   13                   3                    4

 

. . . More

posted on Thursday, December 09, 2004 1:34:28 PM (Eastern Standard Time, UTC-05:00)  #    Comments [2] Trackback
# Wednesday, December 08, 2004

RANK() works a lot like ROW_NUMBER() except that it does not break ties: tied rows all get the same rank, and the rank after a tie skips ahead by the number of tied rows (which is why the results below jump from 1 to 5).

Select SalesOrderID, CustomerID,

            RANK() Over (Order By CustomerID) as RunningCount

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By CustomerID

 

SalesOrderID CustomerID  RunningCount

------------ ----------- --------------------

43860        1           1

44501        1           1

45283        1           1

46042        1           1

46976        2           5

47997        2           5

49054        2           5

50216        2           5

51728        2           5

57044        2           5

63198        2           5

69488        2           5

44124        3           13

. . . More

Dense_Rank tomorrow...

posted on Wednesday, December 08, 2004 12:39:51 AM (Eastern Standard Time, UTC-05:00)  #    Comments [12] Trackback
# Tuesday, December 07, 2004

SQL Server 2005 adds ranking functions: expressions you can add to your resultset that apply a ranking algorithm to a column you specify. This will come in handy in .NET applications for paging and sorting in a grid, as well as many other scenarios.

The most basic new ranking function is ROW_NUMBER(). ROW_NUMBER() returns a column as an expression that contains the row’s number in the result set. This is only a number used in the context of the resultset; if the result changes, the ROW_NUMBER() will change. The ROW_NUMBER() expression takes an OVER clause with an ORDER BY on the column you want to use for the row count, as shown here:

Select SalesOrderID, CustomerID,

            Row_Number() Over (Order By SalesOrderID) as RunningCount

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By SalesOrderID

 

Results are shown here:

 

SalesOrderID CustomerID  RunningCount

------------ ----------- --------------------

43659        676         1

43660        117         2

43661        442         3

43662        227         4

43663        510         5

43664        397         6

43665        146         7

43666        511         8

43667        646         9

...More

Alternatively, you can have an ORDER BY clause on your result set that is different from the ORDER BY in your ROW_NUMBER() expression; the running count still follows the ordering inside ROW_NUMBER():

--Row_Number using a unique value, different order by

Select SalesOrderID, CustomerID,

            Row_Number() Over (Order By SalesOrderID) as RunningCount

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By CustomerID --Different ORDER BY than in Row_NUMBER

 

The result is shown here:

 

SalesOrderID CustomerID  RunningCount

------------ ----------- --------------------

43860        1           202

44501        1           843

45283        1           1625

46042        1           2384

46976        2           3318

47997        2           4339

49054        2           5396

...More

If you run the ROW_NUMBER() function against a non-unique column, it will break ties and still produce a running count, so no two rows will have the same number. For example, CustomerID repeats in this example, so there are several ties; SQL Server just produces a monotonically increasing number, which means nothing other than the row's position in the result set, as shown here:

Select SalesOrderID, CustomerID,

            Row_Number() Over (Order By CustomerID) as RunningCount

From Sales.SalesOrderHeader

Where SalesOrderID>10000

Order By CustomerID

The results are shown here:

SalesOrderID CustomerID  RunningCount

------------ ----------- --------------------

43860        1           1

44501        1           2

45283        1           3

46042        1           4

46976        2           5

47997        2           6

49054        2           7

50216        2           8

51728        2           9

57044        2           10

63198        2           11

69488        2           12

44124        3           13

. . . More

Tomorrow RANK()...

posted on Tuesday, December 07, 2004 4:19:45 PM (Eastern Standard Time, UTC-05:00)  #    Comments [10] Trackback
# Thursday, August 26, 2004

Microsoft has gathered 300 of its top internal and third-party (RD) evangelists in one place and has given us a brain dump on their 7 major categories for the next few years. I am sitting in on the Yukon/SQL Server category.

 

Many RDs are here and we of course are causing lots of mischief. Lots of info overload here today…

 

posted on Thursday, August 26, 2004 4:47:01 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [23] Trackback
# Friday, July 09, 2004

My buddy Clemens Vasters wrote a few days ago that you don't need inheritance or full data encapsulation for business objects. Some other dude wrote a passionate response to him saying that he was all wrong.

You see, out here in the real world, in the coding trenches, “full data encapsulation” in your business objects just doesn’t hold any water. Who reuses a customer object between applications? At my company we don’t even have a customer object! But we do have customers. :) The data logic should all be handled by the database. Way too much effort, code and BS to deal with when you do it otherwise.

Academic conference speakers who don't write code in the real world always argue for over-complexifying the application. I'm sorry, but anyone paid to write code in a production environment has no desire to throw all this logic into the objects and have lots of levels of inheritance so they are “reusing code”. Sure, sure, we all read the Gang of Four's book and spend a year using all the design patterns, but then we come back after that year using only the few that make sense for our application. Academic conference speakers who don't write production code should take a year off, write some production code, and then come back and see what they have to say.

Lastly, at the PDC Clemens said publicly that rows and columns suck and all you need in a database is a PK field and an XML field. Looks like he has changed his tune, at least on storage; we all can learn, buddy. As he now puts it: “What you end up with are elements and attributes (infoset) for the data that flows across, services that deal with the data that flows, and rows and columns that efficiently store data and let you retrieve it flexibly and quickly.”

 

 

posted on Friday, July 09, 2004 12:02:25 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [10] Trackback
# Friday, May 14, 2004

Another successful CTTM (or, as it is now called, SDC) in the Netherlands. What a great show, and 7 RDs were speaking. Richard and I did a dry run of From Interoperability to Migration: SQL Server and Linux Databases Working Together and it went very well. Hopefully we can get Oracle to work and play well on the VPC image before our TechEd session in 13 days. This session is quite cool; hope to see you all there.

DATC02  From Interoperability to Migration: SQL Server and Linux Databases Working Together
Monday, May 24 1:30 PM- 2:45 PM, Cabana 08
Speaker(s): Richard Campbell, Stephen Forte
Track(s): Data Management
"They" say it can be done, now see it in action! This session demonstrates how SQL Server can acts as the gateway to interoperability with Linux databases such as DB2 and Oracle! You'll see a fully functioning Linux-based web application using Red Hat Linux, Apache, PHP and Oracle sharing data with an identically implemented ASP.NET application using SQL Server. This session shows not only how to interoperate, but to use these interoperate capabilities to facilitate a seamless migration from the Linux based system to SQL Server and Windows . This is how migration was meant to be!


 

posted on Friday, May 14, 2004 2:30:52 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [18] Trackback
# Monday, May 10, 2004

Today I gave software developers in the Netherlands a preview of the XML features of SQL Server 2005, better known by its code name “Yukon” (the US national park, not the Canadian frozen land). I show off the very cool and advanced XML-to-relational mapping. (Clemens is a big fan.) I then talk all about the deep XML integration via the XML data type. This is a native SQL Server type, so we have the option to store XML in its native format. You can do just about anything a standard datatype can do, excluding using it in foreign keys and primary keys. Here is the XML data type at work in a variable:

 

DECLARE @xmlData AS XML

SET @xmlData = (SELECT * From Customers where CustomerID='ALFKI'  FOR XML AUTO, TYPE)

SELECT @xmlData

 

The results look like this:

<Customers CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" ContactName="Maria Anders" ContactTitle="Sales Representative" Address="Obere Str. 57" City="Berlin" PostalCode="12209" Country="Germany" Phone="030-0074321" Fax="030-0076545"/>

 

(1 row(s) affected)

 

Then it is fun to create tables with XML fields in them and do things like create an index on them, etc.

--create the table with the XML Datatype

CREATE TABLE OrdersXML

     (OrderDocID INT PRIMARY KEY, xOrders XML NOT NULL)

 

Then you can insert some XML into the table; you can do this manually, from a FOR XML statement, or via an XML file.

--Get some XML Data, can also use a file

DECLARE @xmlData AS XML

SET @xmlData = (SELECT * From Orders FOR XML AUTO, TYPE)

 

--insert into the table

Insert Into OrdersXML (OrderDocID, xOrders) Values (1, @xmlData)

 

Then let’s add an index on that column:

CREATE XML INDEX idx_1 ON OrdersXML (xOrders)

 

It is even more fun to create an XML Schema for data validation. This is my favorite thing about the XML integration; it gives you the best of both worlds. It shows off the deep integration of the XML model and the relational model: the XML Schema works just like a constraint, allowing us to validate the XML that we put into our XML data type fields in the database.

 

--xml schema as validation

use northwind

CREATE XMLSCHEMA

'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

            xmlns:tns="http://corzen.com/customer"

            targetNamespace="http://corzen.com/customer" >

   <xsd:simpleType name="personAge" >

     <xsd:restriction base="xsd:float" >

       <xsd:maxInclusive value="65" />

       <xsd:minExclusive value="18" />

     </xsd:restriction>

   </xsd:simpleType>

   <xsd:element name="age" type="tns:personAge" />

</xsd:schema>'

GO

 

-- create table with xml column and use schema as a 'constraint'

CREATE TABLE xml_schematest (

   the_id INTEGER,

   xml_col XML('http://corzen.com/customer')

)

GO

 

-- works

INSERT INTO xml_schematest VALUES(1,

 '<p:age xmlns:p="http://corzen.com/customer">55</p:age>')

GO

 

-- fails, age > 65

INSERT INTO xml_schematest VALUES(2,

 '<p:age xmlns:p="http://corzen.com/customer">100</p:age>')

GO

 

-- fails, column is validated on update also

UPDATE xml_schematest

  SET xml_col = '<p:age xmlns:p="http://corzen.com/customer">100</p:age>'

  WHERE the_id = 1

 

DROP TABLE xml_schematest

GO

 

DROP XMLSCHEMA NAMESPACE 'http://corzen.com/customer'

GO

 

 

But wait, there’s more! I show off a little bit of XQuery. What is cool is that XQuery is integrated fully with TSQL.

 

--XQuery

use adventureworks

--take a look at an XML column

select * from ProductModel where productmodelid=19

 

--query out a piece of data from the xml field

SELECT Name, CatalogDescription::query('

namespace PD="http://www.adventure-works.com/schemas/products/description"

<Product ProductModelID="{ /PD:ProductDescription[1]/@ProductModelID }" />

') as Result

 

FROM ProductModel

 

where Name like 'Mo%' and CatalogDescription::exist('

namespace PD="http://www.adventure-works.com/schemas/products/description"

namespace wm="http://www.adventure-works.com/schemas/products/WarrAndMain"

     /PD:ProductDescription/PD:Features/wm:Warranty ') = 1

 

 

posted on Monday, May 10, 2004 3:45:37 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Friday, March 26, 2004

Got up early this morning and did an 8-mile run over the Golden Gate Bridge and back to the center of town with tri pal Andy Catlin. It put me in a good mood to see the VSLive keynote on Yukon Business Intelligence by Microsoft PM Bill Baker. Besides enhancements to Reporting Services and DTS (DTS will be renamed), Yukon will help bring BI to the masses with the UDM:

 

There will be “Visual Studio controls for Reporting Services” in Visual Studio 2005, so you can embed reports into ASP.NET pages and Windows Forms much more easily. There are navigation, ad hoc query and other cool controls to play with.

 

DTS is completely rewritten. It is totally event driven and based on the CLR.

 

The Unified Dimensional Model is new and great. The UDM basically combines the OLAP and relational worlds into one programming model that will truly bring OLAP to you and me.

 

Can't wait. :)

 

I give three talks today: SQL Server Notification Services, XQuery in Yukon and ADO.NET Best Practices. I am a busy kid today.

 

Trivia: Yukon is named after the national park in Alaska, not the Canadian province (or territory, who can keep track!). :)

posted on Friday, March 26, 2004 1:55:42 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [12] Trackback
# Tuesday, March 23, 2004

On Saturday I will be doing a post-con with Andrew Brust on .NET Data Access at VSLive in San Francisco. Hope to see some of you there!

.NET Data Access Soup to Nuts
Andrew Brust and Stephen Forte
Saturday, March 27


In this workshop, we'll cover the basics and fine points of ADO .NET, seen from both Windows Forms and ASP.NET vantage points. After a brief introduction, we'll cover connected and disconnected data access, ADO .NET data binding, strongly-typed DataSets, and the XML features of ADO.NET. We'll then take a close look at using ADO .NET and SQL Server together, including development of stored procedures, triggers, and functions; advanced T-SQL techniques; and working with SQL Server and COM+ transactions. Attendees of this workshop will also get a high-level look at the forthcoming features of "Yukon," the watershed next release of SQL Server.
 

posted on Tuesday, March 23, 2004 1:11:31 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [11] Trackback
# Thursday, December 11, 2003

An Access Killer?

 

The reporting engine in Access is almost 10 years old without major modification. It is that good. Now something better has come around. SQL Server Reporting Services, due out any week now, could be an Access killer. After demoing it to the London Database Forum last night, we all realized that it is pretty cool. An Access report killer? Yes, it is (but Access will still be better for local disconnected reporting). Attention Access developers: SSRS is based on the old Access report designer, so you have a leg up; go learn it. If you don’t know about SQL Server Reporting Services, go download it from Microsoft now!

 

And Microsoft: It's about time!

posted on Thursday, December 11, 2003 9:29:01 AM (Eastern Standard Time, UTC-05:00)  #    Comments [11] Trackback
# Thursday, December 04, 2003

CLR v TSQL Stored Procedures

 

I am here in Redmond chatting with a few fellow .NET gurus about SQL Server and Yukon’s ability to create stored procedures in any CLR language. We have decided: TSQL for everything, except when you (1) need something from the .NET Framework, like encryption or RegEx, or (2) are doing something that is very CPU intensive.

 

So why bother with the CLR inside of Yukon? I think that the convergence is a good thing, but TSQL will be around for a long time (as it should be). It will not be an everyday occurrence that you use the CLR (think of it as a replacement for extended stored procedures), but when you need it, you will be very thankful.

 

So I am going to make a prediction that a lot of people will try to create their whole app around CLR stored procedures when they fire up Yukon for the first time. Please don’t fall in this trap!

posted on Thursday, December 04, 2003 10:18:30 PM (Eastern Standard Time, UTC-05:00)  #    Comments [2] Trackback
# Wednesday, November 19, 2003

Rob Howard Started a War

On Stored Procedures v. Inline SQL (heavy business logic in the middle tier) in his blog on Monday.

I almost always use Stored Procedures. There is almost never a reason to use Dynamic SQL, but I am sure that there are times. My highlights:

  1. SPs are more secure. Most DBAs do not allow select permissions on any base tables, for obvious reasons. Using SPs gives you a nice abstraction layer: don’t want someone accidentally deleting everything from a table? Don’t create an SP that allows it. (See the sketch after this list.)
  2. SPs can save your butt. What if a developer creates a dynamic SQL statement that looks OK but will not use an index? A few months later your dynamic SQL is super slow because it is restricting on a billion-row table without an index. Your procedure will never allow that if you say so.
  3. SPs are way easier to maintain in your code.
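Here is a minimal sketch of that abstraction-layer point, against Northwind (the procedure name and the WebAppRole role are made up for illustration): the application gets EXECUTE on the procedure and no rights at all on the base table.

CREATE PROCEDURE usp_GetCustomer
    @CustomerID nchar(5)
AS
    SELECT CustomerID, CompanyName
    FROM Customers
    WHERE CustomerID = @CustomerID
GO

--callers can run the procedure but cannot SELECT (or DELETE) from Customers directly
GRANT EXECUTE ON usp_GetCustomer TO WebAppRole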

So what does Yukon, with its ability to create SPs in C# or VB.NET, do to all of this? Nothing. Microsoft still recommends TSQL for your data access/CRUD code.

posted on Wednesday, November 19, 2003 9:09:53 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Tuesday, November 11, 2003
SubQueries Rock in a Where Clause

I have been using subqueries in WHERE clauses since I learned TSQL. The SQL below gives me all of the data in one table that is not in another table. Clemens, try doing that with XML! <g>

select location_id
from rpt_rcep11
where location_ID not in
    (select location_ID from dbo.tblExtrnl_Location_Mapping group by location_ID)
group by location_id
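One caveat worth noting: if location_ID can be NULL in the mapping table, NOT IN will return no rows at all. A NOT EXISTS version (same tables, just rewritten) avoids that surprise:

select location_id
from rpt_rcep11 r
where not exists
    (select 1 from dbo.tblExtrnl_Location_Mapping m
     where m.location_ID = r.location_ID)
group by location_id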

posted on Tuesday, November 11, 2003 9:33:48 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] Trackback
# Wednesday, October 29, 2003

Yukon XML Features

 

Don’t get me wrong, I like XML. Yukon has so many awesome XML features I can’t even absorb it all. I have been focusing on XQuery since beta 1, but there are many more things to work with.

 

For starters, there is a native XML data type. You can also use full-text indexing on top of the XML datatype and use a full-text query as a filter alongside an XQuery statement.

 

The coolest thing that I saw was the XML Schema validation of the XML Datatype, so if you try to insert data into the XML field and it violates the XSD, it will bomb.

 

Good stuff, stay tuned for more as we write the book….

posted on Wednesday, October 29, 2003 11:04:00 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, October 14, 2003

The Case for Case

 

OK, I have gotten a lot of press because I love the Rozenshtein Method for creating crosstab queries. I even traveled the world and spoke about it all summer at TechEds.

 

I have gotten flak, especially from annoying Australians, about how “complex” the Rozenshtein Method is. So today I found myself writing a crosstab and used the CASE method.

 

It was an ad-hoc query that I have to run a few times over the next few weeks. It never has to run in another database like Access or Oracle. It needed to be quick and dirty. So I decided NOT to use the Boolean aggregates and to use a CASE statement instead. The basic structure of a CASE statement is as follows:

 

CASE  FieldName WHEN ValueYouAreChecking THEN TrueExpression ELSE FalseExpression END

 

Pretty easy, no? Here is a sample:

 

SELECT tlkpWeekEnding.WeekEnding_DT as Weekending, DiceCat.DiceCat_NM as 'Job Category', Sum(tblData.TotalListings) AS 'Total Listings', SUM(CASE  Service_ID WHEN 1 THEN TotalListings ELSE 0 END) AS HotJobsTotal,

SUM(CASE  Service_ID WHEN 2 THEN TotalListings ELSE 0 END) AS MonsterTotal,

SUM(CASE Service_ID WHEN 3 THEN TotalListings ELSE 0 END) AS CareerBuilderTotal

FROM DiceCat INNER JOIN ((tlkpWeekEnding INNER JOIN (trelServiceURL INNER JOIN tblData ON trelServiceURL.URL_ID = tblData.URL_ID) ON tlkpWeekEnding.WeekEnding_ID = tblData.WeekEnding_ID) INNER JOIN DiceCatDetail ON trelServiceURL.JobCategory_ID = DiceCatDetail.CorzenJobCatID) ON DiceCat.DiceCat_ID = DiceCatDetail.DiceCat_ID

WHERE tlkpWeekEnding.WeekEnding_ID=75

GROUP BY tlkpWeekEnding.WeekEnding_DT, DiceCat.DiceCat_NM

Order by  DiceCat.DiceCat_NM

 

Here are the results; we have moved rows into columns:


Weekending  Job Category                 Total Listings  HotJobsTotal  MonsterTotal  CareerBuilderTotal
10/12/2003  Accounting/Auditing/Finance  37226           10021         9694          17511
10/12/2003  Banking/Mortgage             10657           2026          2886          5745
10/12/2003  Biotech/Pharmaceutical       7569            2290          2644          2635
10/12/2003  Engineering                  20549           3513          5800          11236
10/12/2003  Healthcare                   44080           4415          8028          31637
10/12/2003  Information Technology       34309           10637         8924          14748
10/12/2003  Insurance                    8364            1900          2718          3746
10/12/2003  Legal                        7962            1976          2498          3488
10/12/2003  Science                      4636            0             1074          3562
10/12/2003  Telecommunications           3650            1005          1137          1508

 

 

So if this is so easy, why oh why do I insist on using the Rozenshtein Method? Well, I don’t insist on using it all the time. I like it because it is super fast and makes sense to me (I was into math as a kid, sorry). I think that you should know both and use the one that you think is most effective for the job at hand.

 

Happy Crosstabbing!

posted on Tuesday, October 14, 2003 8:40:23 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [0] Trackback
# Tuesday, October 07, 2003

Why didn’t they do this the first time?

 

I love DataReaders. You know this; I am a DataReader snob. I feel like a superior coder when I use a DataReader. Maybe it is my hatred of the false hype behind web services and XML that makes me shun the DataSet. Or just my rather strange way of living my life. Who knows. Like most things you love in life, nothing is perfect. I have been using .NET 1.1 for over a year now, since my company was an early adopter of Visual Studio 2003, and I have gotten used to the HasRows property. For example, if you want to test for an empty DataReader after you open it, you can use code like this. (This code is more useful when you are NOT looping, but opening a DataReader for a single record.)

 

             if (dr.HasRows)   //HasRows is a property, not a method
             {
                 dr.Read();    //still need to advance to the first row before reading it
                 txt1.Text = dr["MyField"].ToString();
             }
             else
             {
                 txt1.Text = "No Data!";
             }

 

But what I forgot is that in .NET 1.0 (Visual Studio 2002) there is no HasRows property. You have to call the Read method, which returns false when there are no rows. Here is the code. (Once again, if you are in a loop, you would just use while (dr.Read()) and be done with it.)

 

             if (dr.Read())    //Read advances to the first row and returns false if there isn't one
             {
                 txt1.Text = dr["MyField"].ToString();
             }
             else
             {
                 txt1.Text = "No Data!";
             }

 

 

So this works, but my ultimate question about HasRows remains: why didn’t they do this the first time?

 

posted on Tuesday, October 07, 2003 3:30:52 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [13] Trackback
# Tuesday, September 30, 2003

Debugging DTS Packages

 

You haven’t really experienced sheer hell until you have to debug a DTS package. SQL Server development is something completely different from traditional 4GL languages like VB and C#; you have to think in rows, columns and joins. A lot of 4GL programmers are forced into, or simply want to get into, writing T-SQL stored procedures and realize that it is a whole new ballgame, but one that is easy to master if you put the time in. So a lot of beginning SQL Server programmers who came over from the 4GL side often ask me at conferences, “How do I debug a DTS package?” The answer is “You don’t.” (Oh, we are all so spoiled by setting a breakpoint in Visual Studio.)

 

Microsoft has publicly announced some of the new DTS features of Yukon at TechEd in Barcelona and will announce more at the PDC in LA later this month, so I won’t go there; besides, Yukon will ship sometime over the rainbow. So let’s stick to SQL Server 2000.

 

First off, proper design of your package will make debugging much easier. Use only stored procedures (with or without parameters), and if you have to use SQL dynamically, utilize views. The more your steps depend on “real” database objects, the easier it will be to track down your problem. Avoid ActiveX scripts as much as you can; consider an extended stored procedure that calls a DLL you wrote yourself (or call the DLL with CreateObject in your script if you must).

For the actual debugging itself, my advice is to take everything in steps. You can run each step of a DTS package individually just by right-clicking on it in the designer and selecting “Execute Step” from the pop-up menu. That is the first part. Then you can deconstruct the step manually and run those pieces in Query Analyzer. (In theory you can debug your stored procedure in Visual Studio too.) From there it gets easier: small bits and pieces of your step may or may not be working, so start looking at your select statements in QA. Before you know it, you will be in DTS debugging heck instead of hell.

posted on Tuesday, September 30, 2003 1:18:06 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [10] Trackback
# Thursday, August 07, 2003

I have to admit that I am totally hooked on the The Rozenshtein Method. My buddy, Richard Campbell showed it to me a year or two ago and I have been hooked ever since. I recently demoed it at TechED in Dallas, a recent WebCast, VSLive in New York and will be showing it off at TechEd in Malaysia next week. I have gotten lots of email and positive feedback so I decided to blog it here.

 

Here is how it works. You need a crosstab query: you have to move rows into columns, and you need ANSI-92 SQL that will run in any database. Well, there are several ways to do this, but the most generic and one of the most powerful is the Rozenshtein Method, developed by the Russian mathematician David Rozenshtein. This technique is taken from his book Optimizing Transact-SQL: Advanced Programming Techniques.

 

First let’s look at the desired results. We want to take the orders data from Northwind and pivot the sales date (aggregated by month) as columns with the sum of the total sales in the row grouped by customer. It would look something like this:

 

CompanyName  TotalAmount  Jan  Feb  Mar ...(etc)
Company1     100          25   33   10
Company2     467          76   62   87
(etc)

 

Here is the TSQL query to do it; go ahead and run it against Northwind in SQL Server:

 

SELECT  CompanyName, SUM((UnitPrice*Quantity)) As TotalAmt,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-1)))) AS Jan,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-2)))) AS Feb,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-3)))) AS Mar,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-4)))) AS Apr,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-5)))) AS May,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-6)))) AS Jun,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-7)))) AS Jul,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-8)))) AS Aug,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-9)))) AS Sep,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-10)))) AS Oct,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-11)))) AS Nov,

  SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-12)))) AS Dec

FROM         Customers INNER JOIN

                      Orders ON Customers.CustomerID = Orders.CustomerID INNER JOIN

                      [Order Details] ON Orders.OrderID = [Order Details].OrderID

Group By Customers.CompanyName

 

So how does this work?

 

This method uses Boolean aggregates: each column has a numeric expression that resolves each row to a zero or a one, and that value (0 or 1) is multiplied by your numeric expression (like TotalSales, or UnitPrice*Quantity). That is all there is to it, quite simple. But wait, there’s more to explain:

 

We want to create a column for each month in our data. To find the month, use DatePart. Then we subtract the month number we are looking for (1 for Jan, 2 for Feb, etc.) from the DatePart value (1-12), as shown here for January:

DatePart(mm,OrderDate)-1

 

So true = zero, and false is > 0 or < 0. For example, if the month you were looking for was January and the DatePart was 1, subtracting 1 gives you 0, which is true. If you were looking for March instead (with the DatePart still 1), you would get -2, and that would be false.

 

Next you have to compute the sign of the expression and get the absolute value like so:

ABS(SIGN(DatePart(mm,OrderDate)-1))

 

This will give us a positive value. Remember, 0 is still true. Now subtract the computed value from 1 in order to get a 0 or 1 from the value of your expression (the Boolean aggregate). The code is:

(1-ABS(SIGN(DatePart(mm,OrderDate)-1)))

 

For example, if a March row returns 3 from the DatePart, then 3-1=2, SIGN(2)=1, ABS(1)=1, and 1-1=0. A January row gives 1-1=0, SIGN(0)=0, ABS(0)=0, and 1-0=1. The expression always returns 0 or 1: if the inner expression was zero (the month matched), the value is now one; if it was nonzero, the value is zero.

 

Last step. Taking the SUM of the Boolean values alone would give you a count of the rows that qualify, so you could find out how many sales you made in Jan, Feb, etc. Instead, we multiply the Boolean value by the price and quantity; remember, one = true. Take a look here:

 

SUM((UnitPrice*Quantity)*(1-ABS(SIGN(DatePart(mm,OrderDate)-1)))) AS Jan

 

If it’s zero, nothing gets added; if it’s one, you get the value of the sale. The sum of the whole expression is the total of sales for the month. If the Boolean part evaluates to 0, then (UnitPrice*Quantity)*0 is 0 and that row adds nothing to the SUM. If the month matches, the expression resolves to 1 and (UnitPrice*Quantity)*1 is the value of the sale.

 

How easy!

 

But wait, there’s more! Suppose you want to combine two conditions? Compute each one down to zero or one separately. Then you can AND them by multiplying, or OR them by adding (and reduce the result back to 1 or 0 using SIGN), like this:
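A quick sketch of the AND case against Northwind (the “quantity over 10” condition is made up just to show the pattern): January sales where the line item's quantity is greater than 10.

SELECT CompanyName,
  SUM((UnitPrice*Quantity)
      * (1-ABS(SIGN(DatePart(mm,OrderDate)-1)))   --1 only for January rows
      * (1-ABS(SIGN(SIGN(Quantity-10)-1)))        --1 only when Quantity > 10
     ) AS JanBigOrderSales
FROM Customers INNER JOIN
     Orders ON Customers.CustomerID = Orders.CustomerID INNER JOIN
     [Order Details] ON Orders.OrderID = [Order Details].OrderID
GROUP BY Customers.CompanyName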

 

Ok, have fun!!!

posted on Thursday, August 07, 2003 8:38:37 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [7] Trackback
# Tuesday, August 05, 2003

Yukon is in LA...

Up to my ears in coding and writing my MSDN Magazine article on Yukon. While NDAs are still in effect, I can let you know that Microsoft has made public the notion of writing a stored procedure in a CLR .NET language. After hacking around most of the afternoon, I am marking this day down as the day I got my first CLR stored procedure working. Stay tuned for my MSDN Magazine feature story on Yukon, which will be on the stands at the PDC.

posted on Tuesday, August 05, 2003 3:56:02 PM (Eastern Daylight Time, UTC-04:00)  #    Comments [16] Trackback
# Thursday, July 31, 2003
VSLive Workshop
posted on Thursday, July 31, 2003 7:11:53 AM (Eastern Daylight Time, UTC-04:00)  #    Comments [23] Trackback