Need to sum columns: Need to tune a table for performance gains

Hi :

I have a TableA with around 10 columns with varchar and numeric
datatypes
It has 500 million records and its size is 999999999 KB. i believe it
is kb
i got this data after running sp_spaceused on it. The index_size was
also pretty big in 6 digits.

On looking at the tableA
it didnot have any pks and hence no clustered index.
It had other indices
IX_1 on ColA
IX_2 on ColB
IX_3 on ColC
IX_4 on ColA, ColB and ColC put together.

Queries performed in this table are very slow. I have been asked to
tune up this table.
I know as much info as you.

Data prior to 2004 can be archived into another table. I need to run a
query to find out how many records that is.

I am thinking the following, but dont know if i am correct ?
I need to add a new PK column (which will increase the size of the
tableA) which will add a clustered index.
Right now there are no clustered indices

2. I would like help in understanding should i remove IX_1, IX_2, IX_3
as they are all used in IX_4 anyway .

3. I forget what the textbox is called on the index page. it is set to
0 and can be set from 0 to 100. what would be a good value for it ?

thank you.
RSOn May 1, 6:29 pm, rshivara...@.gmail.com wrote:

Quote:

Originally Posted by

Hi :
>
I have a TableA with around 10 columns with varchar and numeric
datatypes
It has 500 million records and its size is 999999999 KB. i believe it
is kb
i got this data after running sp_spaceused on it. The index_size was
also pretty big in 6 digits.
>
On looking at the tableA
it didnot have any pks and hence no clustered index.
It had other indices
IX_1 on ColA
IX_2 on ColB
IX_3 on ColC
IX_4 on ColA, ColB and ColC put together.
>
Queries performed in this table are very slow. I have been asked to
tune up this table.
I know as much info as you.
>
Data prior to 2004 can be archived into another table. I need to run a
query to find out how many records that is.
>
I am thinking the following, but dont know if i am correct ?
I need to add a new PK column (which will increase the size of the
tableA) which will add a clustered index.
Right now there are no clustered indices
>
2. I would like help in understanding should i remove IX_1, IX_2, IX_3
as they are all used in IX_4 anyway .
>
3. I forget what the textbox is called on the index page. it is set to
0 and can be set from 0 to 100. what would be a good value for it ?
>
thank you.
RS

I'm afraid I might not be able to offer all the hints you are looking
for, but here are some things I've found helpful:

1) LOOK AT YOUR QUERIES!! We've obtained huge performance boosts
just by looking at the queries and stored procedures and optimizing
them. Usually if a query is going extremely slow it is because of
poor programming techniques.
2) LOOK AT YOUR QUERIES!! Are they sometimes filtering only on
ColB? If so, IX_4 will be useless if you remove IX_2. IX_4 will kick
in when you are filtering on several columns. But are there other
columns that are being filtered on?
3) Are you dynamically parcing or calculating columns? For instance,
if you have a column with EventDateTime of 01-01-2007 12:34PM but you
are filtering on dates (ie WHERE CONVERT(varchar, EventDateTime, 101)
= '01-01-07') then that is going to slow down your query. Create a
new column of EventDate and calculate the event date from the event
date and time and then stick an index on it. That will add some
boosts as the sampled WHERE clause won't reference the index.
4) Do you look at the query execution plan? That can tell you if you
are doing full table scans or hitting an index.
5) Have you used the query index wizard?
6) Breaking the table up will help a lot. At one company I consult
for we have a similar table (a half million records added per day)
with data going back to 2003. We create two (what we call) _Mini
tables. TableName_Mini_Last45Days, TableName_Mini_Last6Months, and
TableName_Mini_Last1Year. This does take up space but since 80% of
the queries only want data from the last month the first mini table
makes sense to hit. If they need to go further back further then they
hit the appropriate table. The main table (TableName) is rarely
queries except after its morning data load (off hours) to create the
_Mini tables so if somebody does need information from it, it isn't
bogged down by heavy usage.
7) I've never done this personally so I cannot attest to its success
(though I should test it). I have heard you can recreate/rebuild an
index and that should get rid of some of the fragmentation. I think
I'll give it a test, but you might want to do that as well. But since
it is re-indexing half a billion records you may want to do that off-
hours.

I hope one or more of these helps you out!|||On May 2, 10:39 am, Utahduck <Utahd...@.hotmail.comwrote:

Quote:

Originally Posted by

On May 1, 6:29 pm, rshivara...@.gmail.com wrote:
>
>
>
>
>

Quote:

Originally Posted by

Hi :

Quote:

Originally Posted by

Quote:

Originally Posted by

On looking at the tableA
it didnot have any pks and hence no clustered index.
It had other indices
IX_1 on ColA
IX_2 on ColB
IX_3 on ColC
IX_4 on ColA, ColB and ColC put together.

Quote:

Originally Posted by

Queries performed in this table are very slow. I have been asked to
tune up this table.
I know as much info as you.

Quote:

Originally Posted by

Data prior to 2004 can be archived into another table. I need to run a
query to find out how many records that is.

Quote:

Originally Posted by

Quote:

Originally Posted by

2. I would like help in understanding should i remove IX_1, IX_2, IX_3
as they are all used in IX_4 anyway .

Quote:

Originally Posted by

3. I forget what the textbox is called on the index page. it is set to
0 and can be set from 0 to 100. what would be a good value for it ?

Quote:

Originally Posted by

thank you.
RS

>
I'm afraid I might not be able to offer all the hints you are looking
for, but here are some things I've found helpful:
>
1) LOOK AT YOUR QUERIES!! We've obtained huge performance boosts
just by looking at the queries and stored procedures and optimizing
them. Usually if a query is going extremely slow it is because of
poor programming techniques.
2) LOOK AT YOUR QUERIES!! Are they sometimes filtering only on
ColB? If so, IX_4 will be useless if you remove IX_2. IX_4 will kick
in when you are filtering on several columns. But are there other
columns that are being filtered on?
3) Are you dynamically parcing or calculating columns? For instance,
if you have a column with EventDateTime of 01-01-2007 12:34PM but you
are filtering on dates (ie WHERE CONVERT(varchar, EventDateTime, 101)
= '01-01-07') then that is going to slow down your query. Create a
new column of EventDate and calculate the event date from the event
date and time and then stick an index on it. That will add some
boosts as the sampled WHERE clause won't reference the index.
4) Do you look at the query execution plan? That can tell you if you
are doing full table scans or hitting an index.
5) Have you used the query index wizard?
6) Breaking the table up will help a lot. At one company I consult
for we have a similar table (a half million records added per day)
with data going back to 2003. We create two (what we call) _Mini
tables. TableName_Mini_Last45Days, TableName_Mini_Last6Months, and
TableName_Mini_Last1Year. This does take up space but since 80% of
the queries only want data from the last month the first mini table
makes sense to hit. If they need to go further back further then they
hit the appropriate table. The main table (TableName) is rarely
queries except after its morning data load (off hours) to create the
_Mini tables so if somebody does need information from it, it isn't
bogged down by heavy usage.
7) I've never done this personally so I cannot attest to its success
(though I should test it). I have heard you can recreate/rebuild an
index and that should get rid of some of the fragmentation. I think
I'll give it a test, but you might want to do that as well. But since
it is re-indexing half a billion records you may want to do that off-
hours.
>
I hope one or more of these helps you out!- Hide quoted text -
>
- Show quoted text -

Thank you Utahduck, your mail was helpful. Also am reading up on
indexes and that is helping me.
You idea of laying out mini tables by the year is a great suggestion i
hadnt thought of. I will be definitely using that
Thank you
RS|||Think of IX_4 like a phone book.

City, Last Name, First Name, Phone Number.

That's great, as long as you know what city a person is in. If you
don't know the city then it takes you a lot longer to look up the
perosns name because you have to loop through each city to find if the
person is in there.

That's where IX_1, Ix_2, and IX_3 come in. You could easily get rid of
IX_1, because it will be in the same order as IX_4.

Cheers,
Jason Lepack

On May 2, 1:07 pm, rshivara...@.gmail.com wrote:

Quote:

Originally Posted by

On May 2, 10:39 am, Utahduck <Utahd...@.hotmail.comwrote:
>
>
>
>
>

Quote:

Originally Posted by

On May 1, 6:29 pm, rshivara...@.gmail.com wrote:

Quote:

Originally Posted by

Quote:

Originally Posted by

Hi :

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

On looking at the tableA
it didnot have any pks and hence no clustered index.
It had other indices
IX_1 on ColA
IX_2 on ColB
IX_3 on ColC
IX_4 on ColA, ColB and ColC put together.

Quote:

Originally Posted by

Quote:

Originally Posted by

Queries performed in this table are very slow. I have been asked to
tune up this table.
I know as much info as you.

Quote:

Originally Posted by

Quote:

Originally Posted by

Data prior to 2004 can be archived into another table. I need to run a
query to find out how many records that is.

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

Quote:

Originally Posted by

2. I would like help in understanding should i remove IX_1, IX_2, IX_3
as they are all used in IX_4 anyway .

Quote:

Originally Posted by

Quote:

Originally Posted by

3. I forget what the textbox is called on the index page. it is set to
0 and can be set from 0 to 100. what would be a good value for it ?

Quote:

Originally Posted by

Quote:

Originally Posted by

thank you.
RS

Quote:

Originally Posted by

I'm afraid I might not be able to offer all the hints you are looking
for, but here are some things I've found helpful:

Quote:

Originally Posted by

Quote:

Originally Posted by

I hope one or more of these helps you out!- Hide quoted text -

Quote:

Originally Posted by

- Show quoted text -

>
Thank you Utahduck, your mail was helpful. Also am reading up on
indexes and that is helping me.
You idea of laying out mini tables by the year is a great suggestion i
hadnt thought of. I will be definitely using that
Thank you
RS- Hide quoted text -
>
- Show quoted text -

|||(rshivaraman@.gmail.com) writes:

Quote:

Originally Posted by

I have a TableA with around 10 columns with varchar and numeric
datatypes
It has 500 million records and its size is 999999999 KB. i believe it
is kb i got this data after running sp_spaceused on it. The index_size was
also pretty big in 6 digits.

Could you post the exact output from sp_spaceused and also from
DBCC SHOWCONTIG on the table? (The latter will take some time to
run on a table this size.)

Quote:

Originally Posted by

On looking at the tableA
it didnot have any pks and hence no clustered index.

Maybe the table has neither, however there is no connection between
the two. A table could have a clustered index, but no primary key or
vice versa. Most or rather all tables should really have a PK. And
the very most tables should have a clustered index. But far from all
tables should have their clustered index on their primary key.

Tables without a clustered index are known as heaps. Heaps are more
prone to fragmentation for various reasons, and given the size of your
table, I suspect yours is victim to that.

Quote:

Originally Posted by

It had other indices
IX_1 on ColA
IX_2 on ColB
IX_3 on ColC
IX_4 on ColA, ColB and ColC put together.
>
Queries performed in this table are very slow. I have been asked to
tune up this table.
I know as much info as you.

There are all reasons to investigate whether the queries align with
the index. Utahduck made a very good point about queries hiding the
column in an expression.

Quote:

Originally Posted by

Data prior to 2004 can be archived into another table. I need to run a
query to find out how many records that is.

Since we don't know the table definiton, we cannot help you with that.

Quote:

Originally Posted by

I am thinking the following, but dont know if i am correct ?
I need to add a new PK column (which will increase the size of the
tableA) which will add a clustered index.

Better investigate if there are any columns for which there are
typically range queries, like a datetime column.

You should also investigate if there are columns in the table that in
fact constitute a key, but someone has forgotten to define that key.
After all, adding an IDENTITY column as a PK, makes in practice very
little difference to not having a key at all.

Quote:

Originally Posted by

2. I would like help in understanding should i remove IX_1, IX_2, IX_3
as they are all used in IX_4 anyway .

That is impossible to say without knowing the queries. But a query
like

SELECT ... FROM tbl WHERE colB = @.value

will not be helped much by the index on (ColA, ColB, ColC). Possibly,
the index on ColA is redudant, but there are queries that will run
faster with this index in place, for instance:

SELECT ColA, COUNT(*) FROM tbl GROUP BY ColA

Quote:

Originally Posted by

3. I forget what the textbox is called on the index page. it is set to
0 and can be set from 0 to 100. what would be a good value for it ?

That is the fillfactor. We would need to know more about the table to
be able to make recommendations about it.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Thank you All .
Erland, i will post answers to your queries shortly.
I was looking at a datatype of a column and it a numerica 9(15,0),
occupies 9 bytes but allowed a max lenght of 15.
This can also be a varchar field. So a varchar(15) compared to a
numeric 9(15,0)
Which occupies more space? or they are the same ?|||exec sp_spaceused TableA
TableA1 16 KB8 KB8 KB0 KB

exec sp_spaceused Tableb
TableB1 16 KB8 KB8 KB0 KB

Hi : I created two tables.
TableA has a varchar field of size 18
TableB has a numeric field of size 9(15,0)

I put data of 18 1's in TableA and 15 1's in TableB

Why is there an index_size whey i have not created any pks and ixs.
Looks like they are the same storage size ? Is that strange ?
Right now,there are only one rows in each, so if i had more rows of
same data, will the size differ ?|||DBCC SHOWCONTIG (TableA)-- 18:39 minutes

DBCC SHOWCONTIG scanning 'TableA' table...
Table: 'TableA' (549576996); index ID: 0, database ID: 15
TABLE level scan performed.
- Pages Scanned........................: 6007257
- Extents Scanned.......................: 750994
- Extent Switches.......................: 750993
- Avg. Pages per Extent..................: 8.0
- Scan Density [Best Count:Actual Count]......: 99.99%
[750908:750994]
- Extent Scan Fragmentation ...............: 21.20%
- Avg. Bytes Free per Page................: 368.0
- Avg. Page Density (full)................: 95.45%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.

exec SP_SPACEUSED TableA

namerowsreserved data
index_size unused
tbl06521AA504609530 125679832 KB48058056 KB77594288 KB27488 KB|||(rshivaraman@.gmail.com) writes:

Quote:

Originally Posted by

Erland, i will post answers to your queries shortly.
I was looking at a datatype of a column and it a numerica 9(15,0),
occupies 9 bytes but allowed a max lenght of 15.
This can also be a varchar field. So a varchar(15) compared to a
numeric 9(15,0)
Which occupies more space? or they are the same ?

They are not the same. Each numeric(15,0) value occupies nine bytes,
including NULL values. (Unless you are on SQL 2005 SP2 and use the
new vardecimal feature.)

A varchar(15) values can occupy anything from 2 to 17 bytes. That is, two
bytes for the length and then as many bytes as needed for the value.

I don't really understand why you are making these considerations, but
if you are choosing between the two for a key value, I would recommend
a numeric type (and rather bigint over numeric(15,0)), since varchar
is subject to more complex sorting and comparison rules (unless you
pick a binary collation.)

I looked at your SHOWCONTIG and spaceused data. The table has some
fragmentation, but it is not frightening. And the table certainly
calls for respect with its 125 GB.

I think you should examine exactly what queries that are run against
this table. With a table this size, you really to have indexes to
support all queries.

I guess that since the table has so low rate of fragmentation, that
data is only inserted, but never updated or deleted. Therefore it
may not be a pressing issue to add a clustered index - an operation
given the size of the table that will take some time.

Monday, February 20, 2012

Need to tune a table for performance gains

No comments:

Post a Comment

Need to sum columns

Blog Archive

About Me