Need to sum columns: Needing 1st Non Null Value

I have a table that contains about 20 columns. However, each row may only have values in one or two columns while the rest are null. Basically, a new row is added each time a parameter is updated thus having a value for one column but not the others.

Here's an example of the data.

PatientID BloodPressure Pulse Temperature Location UpdateTime
1 120/80 NULL NULL NULL 3/8/07 11:33:00
1 NULL NULL 98.5 NULL 3/8/07 11:45:30
1 NULL 82 NULL NULL 3/9/07 6:20:15
1 NULL NULL NULL RM101 3/9/07 6:24:18
1 125/85 NULL NULL NULL 3/9/07 8:15:00
2 115/75 NULL NULL NULL 3/8/07 12:32:02

2 NULL NULL 99.5 NULL 3/8/07 12:45:30

2 NULL 88 NULL NULL 3/9/07 7:20:19

2 NULL NULL NULL RM108 3/9/07 6:24:18

2 119/85 NULL NULL NULL 3/9/07 8:16:00

I need an efficient query that will return the most recent non-null value for each patientID based on the UpdateTime. Efficiency is of utmost importance because there will be hundreds of patients with potentially thousands of records.

The result set should be:
PatientID BloodPressure Pulse Temperature Location
1 125/85 82 98.5 RM101
2 119/85 88 99.5 RM108

Any help will be greatly appreciated.

If you are using SQL Server 2005, this should work although I'm not sure how well it will perform

with cte as
(select PatientID,
BloodPressure,
row_number() over(partition by PatientID
order by case when BloodPressure is null then null else UpdateTime end desc) as BloodPressureRN,
Pulse,
row_number() over(partition by PatientID
order by case when Pulse is null then null else UpdateTime end desc) as PulseRN,
Temperature,
row_number() over(partition by PatientID
order by case when Temperature is null then null else UpdateTime end desc) as TemperatureRN,
Location,
row_number() over(partition by PatientID
order by case when Location is null then null else UpdateTime end desc) as LocationRN,
UpdateTime
from mytable)
select PatientID,
max(case when BloodPressureRN=1 then BloodPressure end) as BloodPressure,
max(case when PulseRN=1 then Pulse end) as Pulse,
max(case when TemperatureRN=1 then Temperature end) as Temperature,
max(case when LocationRN=1 then Location end) as Location
from cte
group by PatientID

|||

An alternative that works on SQL Server 2000 is:

declare @.basicStats table
( patientId integer not null,
bloodPressure varchar(8),
pulse smallint,
temperature numeric (4,1),
location varchar(12),
UpdateTime datetime
)
insert into @.basicStats values (1, '120/80', NULL, NULL, NULL, '3/8/07 11:33:00')
insert into @.basicStats values (1, NULL, NULL, 98.5, NULL, '3/8/07 11:45:30')
insert into @.basicStats values (1, NULL, 82, NULL, NULL, '3/9/07 6:20:15')
insert into @.basicStats values (1, NULL, NULL, NULL, 'RM101', '3/9/07 6:24:18')
insert into @.basicStats values (1, '125/85', NULL, NULL, NULL, '3/9/07 8:15:00')
insert into @.basicStats values (2, '115/75', NULL, NULL, NULL, '3/8/07 12:32:02')
insert into @.basicStats values (2, NULL, NULL, 99.5, NULL, '3/8/07 12:45:30')
insert into @.basicStats values (2, NULL, 88, NULL, NULL, '3/9/07 7:20:19')
insert into @.basicStats values (2, NULL, NULL, NULL, 'RM108', '3/9/07 6:24:18')
insert into @.basicStats values (2, '119/85', NULL, NULL, NULL, '3/9/07 8:16:00')
--select * from @.basicStats
select patientId,
( select top 1 BloodPressure from @.basicStats b
where b.patientId = a.patientId
and b.BloodPressure is not null
order by b.updateTime desc
) as BloodPressure,
( select top 1 pulse from @.basicStats c
where c.patientId = a.patientId
and c.pulse is not null
order by c.updateTime desc
) as pulse,
( select top 1 temperature from @.basicStats d
where d.patientId = a.patientId
and d.temperature is not null
order by d.updateTime desc
) as temperature,
( select top 1 location from @.basicStats e
where e.patientId = a.patientId
and e.location is not null
order by e.updateTime desc
) as location
from @.basicStats a
group by patientId
-- patientId BloodPressure pulse temperature location
-- -- - --
-- 1 125/85 82 98.5 RM101
-- 2 119/85 88 99.5 RM108

|||

And a third solution (sorry Kent, I pinched your table declaration and inserts for this one)...

This solution appears to be more expensive than Kent's solution, according to the execution plan on my machine. You'll probably experience the same, or you might not - it's just another option for you to try out on your hardware.

Chris

declare @.basicStats table

(

patientId integer not null,
bloodPressure varchar(8),
pulse smallint,
temperature numeric (4,1),
location varchar(12),
UpdateTime datetime

)

insert into @.basicStats values (1, '120/80', NULL, NULL, NULL, '3/8/07 11:33:00')

insert into @.basicStats values (1, NULL, NULL, 98.5, NULL, '3/8/07 11:45:30')

insert into @.basicStats values (1, NULL, 82, NULL, NULL, '3/9/07 6:20:15')

insert into @.basicStats values (1, NULL, NULL, NULL, 'RM101', '3/9/07 6:24:18')

insert into @.basicStats values (1, '125/85', NULL, NULL, NULL, '3/9/07 8:15:00')

insert into @.basicStats values (2, '115/75', NULL, NULL, NULL, '3/8/07 12:32:02')

insert into @.basicStats values (2, NULL, NULL, 99.5, NULL, '3/8/07 12:45:30')

insert into @.basicStats values (2, NULL, 88, NULL, NULL, '3/9/07 7:20:19')

insert into @.basicStats values (2, NULL, NULL, NULL, 'RM108', '3/9/07 6:24:18')

insert into @.basicStats values (2, '119/85', NULL, NULL, NULL, '3/9/07 8:16:00')

SELECT DISTINCT

p.PatientID,
pbp.BloodPressure,
ppu.Pulse,
pt.Temperature,
pl.Location

FROM @.basicStats p

INNER JOIN (SELECT PatientID, MAX(UpdateTime) AS UpdateTime FROM @.basicStats WHERE BloodPressure IS NOT NULL GROUP BY PatientID) bp ON bp.PatientID = p.PatientID
INNER JOIN @.basicStats pbp ON pbp.PatientID = bp.PatientID AND pbp.UpdateTime = bp.UpdateTime
INNER JOIN (SELECT PatientID, MAX(UpdateTime) AS UpdateTime FROM @.basicStats WHERE Pulse IS NOT NULL GROUP BY PatientID) pu ON pu.PatientID = p.PatientID
INNER JOIN @.basicStats ppu ON ppu.PatientID = bp.PatientID AND ppu.UpdateTime = pu.UpdateTime
INNER JOIN (SELECT PatientID, MAX(UpdateTime) AS UpdateTime FROM @.basicStats WHERE Temperature IS NOT NULL GROUP BY PatientID) t ON t.PatientID = p.PatientID
INNER JOIN @.basicStats pt ON pt.PatientID = bp.PatientID AND pt.UpdateTime = t.UpdateTime
INNER JOIN (SELECT PatientID, MAX(UpdateTime) AS UpdateTime FROM @.basicStats WHERE Location IS NOT NULL GROUP BY PatientID) l ON l.PatientID = p.PatientID
INNER JOIN @.basicStats pl ON pl.PatientID = bp.PatientID AND pl.UpdateTime = l.UpdateTime

|||

Gary:

Please forgive me, but this is a kind of query that I have struggled with. Since you are interested in optimum performance I think it woul be best if you could get Umachandar or someone similar to comment. I am just too afraid that I will get this wrong.

Kent

|||

Mark's solution will work fine except that you will end up sorting the entire table 4 times (once each to compute row number for bloodpressure, pulse, temperature and location). If you have a large table this will be problematic. The other solutions also suffer from the same problem. SQL Server unfortunately doesn't have a concept of filtered indexes which will help in this case. Database systems like PostgreSQL can do better job because you can create filtered index (on non-null value of each attribute you are interested). Similarly, in Oracle you can make use of index compression to reduce I/O costs due to repeated values (NULL) or use function-based indexes. DB2 also has some features that you can take advantage of for this type of schema.

In SQL Server you can use indexed views to achieve the filtered indexes functionality. So define one indexed view per property like:

create view basicStats_bloodPressure
with schemabinding
as
select patientId, bloodPressure, UpdateTime
from dbo.basicStats
where bloodPressure is not null
go
create unique clustered index idx_basicStats_bloodPressure on basicStats_bloodPressure(patientId, UpdateTime desc)
go

create view basicStats_pulse
with schemabinding
as
select patientId, pulse, UpdateTime
from dbo.basicStats
where pulse is not null
go
create unique clustered index idx_basicStats_pulse on basicStats_pulse(patientId, UpdateTime desc)
go

create view basicStats_temperature
with schemabinding
as
select patientId, temperature, UpdateTime
from dbo.basicStats
where temperature is not null
go
create unique clustered index idx_basicStats_temperature on basicStats_temperature(patientId, UpdateTime desc)
go

create view basicStats_location
with schemabinding
as
select patientId, location, UpdateTime
from dbo.basicStats
where location is not null
go
create unique clustered index idx_basicStats_location on basicStats_location(patientId, UpdateTime desc)
go

And create an index on the table too like:

create clustered index idx_basicstats_patientid on basicStats( patientId )

Now, your query is as simple as what Kent had and SQL Server will automatically use the indexed views.

select patientId,
( select top 1 BloodPressure from basicStats b
where b.patientId = a.patientId
and b.BloodPressure is not null
order by b.updateTime desc
) as BloodPressure,
( select top 1 pulse from basicStats c
where c.patientId = a.patientId
and c.pulse is not null
order by c.updateTime desc
) as pulse,
( select top 1 temperature from basicStats d
where d.patientId = a.patientId
and d.temperature is not null
order by d.updateTime desc
) as temperature,
( select top 1 location from basicStats e
where e.patientId = a.patientId
and e.location is not null
order by e.updateTime desc
) as location
from basicStats a
group by patientId;

-- or below referencing indexed views directly

select patientId
, (select top 1 t1.bloodPressure from basicStats_bloodPressure as t1 where t1.patientId = b.patientId order by t1.UpdateTime desc) as bloodPressure
, (select top 1 t1.pulse from basicStats_pulse as t1 where t1.patientId = b.patientId order by t1.UpdateTime desc) as pulse
, (select top 1 t1.temperature from basicStats_temperature as t1 where t1.patientId = b.patientId order by t1.UpdateTime desc) as temperature
, (select top 1 t1.location from basicStats_location as t1 where t1.patientId = b.patientId order by t1.UpdateTime desc) as location
from (
select distinct patientId
from basicStats
) as b

These indexed views will need to be maintained during DML operations. But this is probably the best way to get the performance for this query given the nature of your data. You do incur some additional storage costs.

Alternatively, you should simply partition the table by property and use my last query except each indexed view is a physical table and the main table only contains the patientId columns & others that are not partitioned:

With this approach, you will save some storage also. DML operations are slightly complex and you need to do additional work instead of SQL Server with the indexed views approach. For example, if you insert any row then you need to first check if there is a row in the main table for that patientId first and then insert into the separate property table. This might work out better than the indexed views approach also. So test both and see. Indexed views approach however is straight forward and doesn't require much changes to your existing code.

|||( I knew it would be worth it; thanks, Umachandar. WOA! )|||Wow thanks guys for the advice. I'll try some of these and give some feedback.

Thanks again!

Gary B.

Wednesday, March 7, 2012

Needing 1st Non Null Value

No comments:

Post a Comment

Need to sum columns

Blog Archive

About Me