Sunday, February 19, 2012

how to choose a Primary Key

Hi All,

I have a dilemn:
On one side, I have a column C1 which could be a primary key because it is never null, the value is unique and identify the record. The problem is its a char type and its lenght can be close to 30.
Then, I've planned to add another column C2 of int type as PK. But then I need to add a unique constraint index on C1. Does it improve performance anyway?

ThanksOnce you've got a primary key, adding a Unique constraint doesn't help performance. You should still add the constraint, it just won't help your performance.

-PatP|||I beg to differ, because unique constraint implicitly creates a unique index, thus - possibility of improving performance where the field(-s) is/are involved.|||when all around you are losing their heads.....

to answer your question generally
SQL Server 2000 Requires the primary key to be:
Unique
Not Null
16 columns or less for a composite key.

i cant help you with your di-lemon because the pk is such a personal thing. realistically your pk is your own and shouldnt be fondled by any other so i will go out on a limb here and say that what you have is fine.

However, {Opinion} i am a big fan of the monotonic key. simple, to the point, and oh, so very integer...mmmmn i can see my key right now in a dimly lit room with nothing on but an identity of 1,1 and sumptuous nonclustered index fill-factored just right.

i need a smoke..|||i need a smoke..My first guess would be that you need a bit more than a smoke, but you'll probably need to visit another web site for that!

I beg to differ, because unique constraint implicitly creates a unique index, thus - possibility of improving performance where the field(-s) is/are involved.On a slightly more serious note, I see a declared Primary Key (PK) as being somewhere between very important and critical for a table. Very little relational algebra is possible without a PK.

An Alternate Key (AK) or the equivalent unique constraint is a different matter. The AK should be declared in order to allow the database engine to do its job and ensure uniqueness of the AK. However, there is a cost assosciated with the AK, in that CPU and I/O time and disk space need to be used to enforce it. The benefits of a declared (and enforced) AK are important, but they aren't nearly as important in my opinion as the PK is. I've been willing to forego AK definitions if the potential benefits didn't outweigh the time/disk needed to acheive them.

While rdjabarov is right, and the AK might be useful, there are times that it can do more harm than good. From the pure relational standpoint of managing the table, I see the AK as optional. Of course, if you need to ensure uniqueness or if you have queries that will use the index, and the cost of maintaining it is less than the expected benefit, then I'm in favor of it. In fact, I'll even say that I'm generally in favor of AK definitions, but that you still need to use some judgement instead of just jumping in and declaring them everywhere they could be used.

-PatP|||While ... the AK might be useful, there are times that it can do more harm than good.bumph and nonsense

let's do an informal poll

of all the different threads we've seen in database forums (and some of us visit more than this one), what's the most common question?

that's right, "how do i remove my duplicates?"

i would say that, on balance, you will do good for 99 people by insisting on the unique constraint on the alternate key, and harm for 1 person, although i am hard pressed to think of the circumstances where this might occur

besides, if you (not you, pat, the reader) are really worried about the alleged "harm" of the unique constraint, then the solution is simple -- drop the stupid identity column and make the alternate key the real primary key

:) :) :)|||However, {Opinion} i am a big fan of the monotonic key. simple, to the point, and oh, so very integer

if you can revive some imperical statistics on the PK as Intelligent key vs PK as Monotonic key that would be excellent. but remember, this is a preference situation. sometimes the candidate is easily qualified to be a key and sometimes the addition of a surrogate is just as qualified.

so what is it that makes this an issue.
the inclusion of an additional column into the table that is taking up space in the dbf? or is it the suspected performance derived from SARGs based on monotonic integers?

basically this whole crappy argument comes down to who can write their name in the snow the quickest.

but for the sake of argument i will side with trotsky. :p|||I guess I've just run into too many cases that push the limits of available hardware.

One example of this is when gathering digital data from medical equipment. You often get data at rates that stress available hardware to the limit, and that data comes from well defined data sources. You often can't afford anything other than a PK on the collection table.

Some of the columns are 200+ bytes wide, and there are six of those columns that form the Alternate Key. This practically doubles the size of the row for each AK index you use. The PK being only four bytes is negligable in comparison, and is a lot more valuable to me.

While a bigger machine with a lot more disk would solve this particular problem, it would add a lot of cost and almost no benefit to the process.

There are lots of cases where data is coming in at high speed to a single centralized server. Basically a web farm being serviced by an app farm being serviced by a SQL Server. In this case, one SQL Server might be effectively servicing 50000 simultaneous users as well as a dozen or so analysts. Even though you could create a natural key, and in this case you might even choose to declare it as an alternate, the hardware won't support using that natural key as a foreign key.

Don't get me wrong... As I said in my previoius post I'm generally in favor of declaring AK even when I can't afford to use them as a PK, as long as it makes sense in the real world. Lots of things are really lovely in the gedankenexperiment that make a mess where the rubber actually meets the road!

-PatP|||Thank you all for your helpful replies :)

But I think I wasn't clear enough: what is better :
- to keep my column of type varchar(30) as PK
- to add another column of type int for example as PK and then add a unique constraint on my column of type varchar(30)

Once again, thank you for your support|||rcomaz

please provide a bit of background on your db
what kind of data is being stored there?
what is a basic business model summary?
what kind of data si being stored in the TIC(table in question)?
and how large is the TIC?
what are the related columns in other tables that reference the TIC?|||But I think I wasn't clear enough: what is better :
- to keep my column of type varchar(30) as PK
- to add another column of type int for example as PK and then add a unique constraint on my column of type varchar(30) Geez! Interrupting all of our lovely philosophical debates just to get the original question answered! ;)

On a more serious note, it doesn't matter all that much as long as your database is both small (say under 20 Gb) and low traffic (under 1000 SQL statements processed per minute), especially if you rarely use foreign keys. As your size/traffic/complexity grow, I expect that you'll want to go toward the simple INT column. A four byte INT key takes less space in indices, foreign keys, etc than a thirty byte character key, so it takes corresponding less time/disk/etc. to process.

-PatP|||hh;lkjhlkjolkhjlkjhn|||It's a no-brainer, just to second what Pat said, - it bottles down to how many values of a 4-byte size you can fit on a 8K page. This results in a number of logical reads when the optimizer scans through/seeks the index pages.|||It can be important how the PK is indexed if you had the PK as varchar then there had to be a reason as to why. If you will be storing character data in the PK then use varchar and do a clustered index for better searching.

Clustered Index = "When the DB searches data much like you would if you were trying to find a persons name in the phone book" Puts all the A's, B's, C's..... together for better performance.

Index = "Is like when you are looking up the actual number" SQL will put the rows in numeric order|||PK, even if not clustered, has to yield a unique row per value, which fits your definition for a nonclustered index, which in tern fits your clustered index definition...what books are you reading? Just a hint, - read from left to right, and once you reach the end of line, - do a carriage-return+line-fielf (vbCRLF, char(13)+char(10)) in your mind, so that you can continue reading the right way without confusing terms with definitions...But on a serious note, - come on, if you try to explain something to somebody, make sure you know it yourself, otherwise, - you're gonna run into a lot of trouble here ;)|||...be nice to the noobies...|||I really appreciate all your comments, thanks :D

By the way, from the business point of view, the varchar(30) field is the unique identifiant of the item (which is a media matrix - like an iso image), thus it could be a good candidate for PK... I'm just afraid of the response time of queries, and this even with a clustered index.|||How many records are you talking about? How big is this database going to get? Is response time your biggest concern?|||I must appologize I stand corrected clustered indexes store data in sequential order. It was 3am when i wrote the reply. I meant to explain clustered and non clustered indexes. I also like to explain things in "normal terms" not "geek". I could have went into leaf pages, why there is a unique index, creation of a unique key column (although you should always create one you dont have to) SQL Server can handle duplicate rows by adding unique idenifiers in the background, how SQL stores data, but like i said it was 3am. I am a geek so dont get all flustered by my comment. God knows Rdjabaroy might come back at me with vb or vba code, "but on a serious note":

OleDbConnection con = new OleDbConnection(strConnect);
con.Open();
string strInsert = "INSERT INTO tblRdjabaroy VALUES('Is great at VB!!!')";

OleDbCommand cmdInsert = new OleDbCommand(strInsert, con);

do while (Rdjabaroy != 'cool')
{
cmdInsert.ExecuteNonQuery();
}

con.Close();

// =)

No seriously I know what you mean. People, including myself, come here for answers. I dont want the wrong answers either. So I do apologize|||First, it's RDJABAROV, not RDJABAROY...but I was called worse things ;) (consult with blindman, he used to be good at it LOL)

if object_id('dbo.tblRdjabaroy') is not null
exec sp_rename 'dbo.tblRdjabaroy', 'dbo.tblRDjabarov', 'object'
else
create table dbo.tblRDjabarov ([jayblaze2's_definition_of_rdjabarov] varchar(8000) not null )
go
create trigger dbo.trig_blaze_jay_twice
on dbo.tblRDjabarov
for insert
as
exec master.dbo.xp_cmdshell 'net send jayblaze2 "...Hmmmm...another one ;)"'
rollback tran
go

No comments:

Post a Comment