PIVOT SQL data in SQL Server 2008

I need to pivot data in SQL Server 2008. Can someone please give me some pointers?
My raw data looks like this:
create table #tbl (
ServiceDesc_2 varchar(20), ListCode_2 varchar(10), LongestWaitingDays_2 int, AvgWaitingDays_2 int, TotalPatientsWaiting_2 int);
insert #tbl
select 'XYZ - Left Side', 'Booked', 67, 16, 38
union all
select 'XYZ - Left Side', 'UnBooked', 23, 6, 53
union all
select 'XYZ - Right Side', 'Booked', 14, 8, 2
union all
select 'XYZ - Right Side', 'UnBooked', 4, 3, 2
I am trying to achieve the result below:

You can prepare your data with cross apply to multiply the rows in order to replicate the columns.
Then you can perform a conditional aggregation to obtain the desired results.
Here is a sample query that should work:
;with c as
(
select ServiceDesc_2,col,val as measures,ListCode_2,ord
from #tbl
cross apply
(
values
('LongestWaitingDays_2' ,LongestWaitingDays_2 , 1)
,('AvgWaitingDays_2' ,AvgWaitingDays_2 , 2)
,('TotalPatientsWaiting_2',TotalPatientsWaiting_2, 3)
)
CS (col,val,ord)
)
select
ServiceDesc_2
,col as Measures
,MAX(case when ListCode_2='UnBooked' then measures else null end) as UnBooked
,MAX(case when ListCode_2='Booked' then measures else null end) as Booked
from c
group by ServiceDesc_2, col
order by ServiceDesc_2, max(ord)
Output:
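To check the shape of that result without a SQL Server instance, here is a minimal sketch that runs the same conditional aggregation in SQLite through Python's sqlite3 module. Two things are assumptions of the sketch, not part of the answer: the temp table is renamed to a plain table called tbl, and the CROSS APPLY (VALUES ...) unpivot step is swapped for UNION ALL, because SQLite has no CROSS APPLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (ServiceDesc_2 TEXT, ListCode_2 TEXT,
                  LongestWaitingDays_2 INT, AvgWaitingDays_2 INT,
                  TotalPatientsWaiting_2 INT);
INSERT INTO tbl VALUES
  ('XYZ - Left Side',  'Booked',   67, 16, 38),
  ('XYZ - Left Side',  'UnBooked', 23,  6, 53),
  ('XYZ - Right Side', 'Booked',   14,  8,  2),
  ('XYZ - Right Side', 'UnBooked',  4,  3,  2);
""")

# Step 1 (CTE c): unpivot the three measure columns into (col, val, ord) rows.
# Step 2: pivot ListCode_2 back into columns with conditional aggregation.
pivot_sql = """
WITH c AS (
  SELECT ServiceDesc_2, ListCode_2, 'LongestWaitingDays_2' AS col,
         LongestWaitingDays_2 AS val, 1 AS ord FROM tbl
  UNION ALL
  SELECT ServiceDesc_2, ListCode_2, 'AvgWaitingDays_2',
         AvgWaitingDays_2, 2 FROM tbl
  UNION ALL
  SELECT ServiceDesc_2, ListCode_2, 'TotalPatientsWaiting_2',
         TotalPatientsWaiting_2, 3 FROM tbl
)
SELECT ServiceDesc_2,
       col AS Measures,
       MAX(CASE WHEN ListCode_2 = 'UnBooked' THEN val END) AS UnBooked,
       MAX(CASE WHEN ListCode_2 = 'Booked'   THEN val END) AS Booked
FROM c
GROUP BY ServiceDesc_2, col
ORDER BY ServiceDesc_2, MAX(ord)
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Each service ends up with three rows, one per measure, with the UnBooked and Booked values side by side.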


Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use GROUP BY with a HAVING clause like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
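As a quick check of the filtering idea, the sketch below runs the same WHERE ... OR NOT EXISTS logic against the sample rows in SQLite via Python's sqlite3 module. SQLite is only a stand-in for SQL Server here, and the plain table name tempx1 (without the #) is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tempx1 (Id INT, Foo TEXT, OtherKeyId INT);
INSERT INTO tempx1 VALUES
  (1, 'A', NULL), (2, 'B', NULL), (3, 'B', 1),
  (4, 'C', NULL), (5, 'C', 1),    (6, 'C', 2);
""")

# Keep every non-NULL row; keep a NULL row only when its Foo
# has no non-NULL OtherKeyId at all.
kept = conn.execute("""
SELECT t.Foo, t.OtherKeyId
FROM tempx1 t
WHERE t.OtherKeyId IS NOT NULL
   OR NOT EXISTS (SELECT 1
                  FROM tempx1 t2
                  WHERE t2.Foo = t.Foo
                    AND t2.OtherKeyId IS NOT NULL)
ORDER BY t.Foo, t.OtherKeyId
""").fetchall()
print(kept)
```

On the sample data this keeps A with NULL, B with 1, and C with 1 and 2, which matches the output requested in the question.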
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution to count the number of items, which results in two inner queries on the data set with two different GROUP BY clauses. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Using recursive sql query not for parent-child

I'm not new to SQL and T-SQL, but in the past I've never used a recursive query; all problems were solved with WHILE or CURSOR. I have just one question: how do I organize a recursive query for the following problem? I want to work with the last row of data in a certain partition, and I can't understand how to stop my recursion at the last level of the partition.
CREATE TABLE #temp
(i int
, s int
, v int);
INSERT INTO #temp
SELECT 1, 1, 10
UNION
SELECT 1, 2, 20
UNION
SELECT 2, 1, 5
UNION
SELECT 2, 2, 5
UNION
SELECT 2, 3, 2;
WITH CTE AS
(
SELECT i
, s
, v
FROM #temp
WHERE s=1
UNION ALL
SELECT t.i
, t.s
, t.v + cte.v as new_v
FROM #temp t
INNER JOIN cte
ON (cte.i=t.i)
WHERE t.s>1
)
SELECT *
FROM cte
OPTION(MAXRECURSION 0)
I want to get 5 rows as result:
I know that it could be solved with OUTER APPLY, JOINs, WHILE or CURSOR methods. Could you please share any pointers to help me understand how to get the same result with a recursive CTE query? The SUM there is just an example; for my real problem a recursive query is the best way, because I will use many scalar functions in a big CASE that uses the value from the last row in the partition and the value of the current row.
Thanks.
Sorry for my bad english level.
Would it be clearer if I restate the problem with the following example? I guess I need to state precisely in which order the recursive query should process the data. The code below should help you understand what I want to solve:
CREATE TABLE #temp
(i_key int
, step int
, step_h int
, value int);
INSERT INTO #temp
SELECT 1, 1, NULL, 20
UNION
SELECT 1, 2, 1, 20
UNION
SELECT 2, 1, NULL, 10
UNION
SELECT 2, 2, 1, 10
UNION
SELECT 2, 3, 2, 5;
WITH CTE AS
(
SELECT i_key
, step
, value
FROM #temp
WHERE step=1
--AND i_key=2
UNION ALL
SELECT t.i_key
, t.step
, CASE
WHEN cte.value - t.value <=0 THEN 0
ELSE cte.value - t.value
END as value
FROM #temp t
INNER JOIN cte
ON (cte.i_key=t.i_key
AND cte.step=t.step_h)
--WHERE t.step>1
)
SELECT *
FROM CTE
OPTION(MAXRECURSION 0)
Is a parent-child structure always needed to solve this kind of problem?
I guess it could also be done with a different join condition, without a parent-child column:
AND cte.step=t.step-1
For your particular example, recursion is unnecessary. All you need is SQL Server 2012 or a later version:
select t.*,
sum(t.v) over(partition by t.i order by t.s) as [RT]
from #temp t
order by t.i, t.s;
If you need to access the previous or next row, use the lag() / lead() analytic functions, which were introduced in the same version of SQL Server.
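The windowed SUM and LAG can be tried without SQL Server. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the table name samples and the column aliases rt and prev_v are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (i INT, s INT, v INT);
INSERT INTO samples VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5), (2, 2, 5), (2, 3, 2);
""")

# rt is the running total of v within each i, ordered by s.
# prev_v is the value of v from the previous row of the same partition.
running = conn.execute("""
SELECT i, s, v,
       SUM(v) OVER (PARTITION BY i ORDER BY s) AS rt,
       LAG(v) OVER (PARTITION BY i ORDER BY s) AS prev_v
FROM samples
ORDER BY i, s
""").fetchall()
for row in running:
    print(row)
```

The first row of each partition has no predecessor, so prev_v comes back as NULL (None in Python) there.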
EDIT: Ah, I see. You simply want to know how to write recursive CTEs properly. Here is (seemingly) correct code for your second example:
with cte as (
select t.i_key, t.step, t.value
from #temp t
where t.step_h is null
union all
select c.i_key, t.step, case
when c.value < t.value then 0
else c.value - t.value
end as [Value]
from #temp t
inner join cte c on c.step = t.step_h
and c.i_key = t.i_key
)
select *
from cte c
order by c.i_key, c.step;
In the end, it stops by itself when an iteration does not produce any new rows.
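To see that termination behaviour in action, here is a minimal sketch of the same recursive CTE run in SQLite via Python's sqlite3 module. SQLite stands in for SQL Server, so the syntax is WITH RECURSIVE rather than plain WITH, and the table name steps is an assumption of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE steps (i_key INT, step INT, step_h INT, value INT);
INSERT INTO steps VALUES
  (1, 1, NULL, 20), (1, 2, 1, 20),
  (2, 1, NULL, 10), (2, 2, 1, 10), (2, 3, 2, 5);
""")

# Anchor member: rows with no predecessor (step_h IS NULL).
# Recursive member: each step whose step_h points at a row already in the CTE;
# it carries the running value forward, clamped at 0.
rows = conn.execute("""
WITH RECURSIVE cte(i_key, step, value) AS (
  SELECT i_key, step, value
  FROM steps
  WHERE step_h IS NULL
  UNION ALL
  SELECT c.i_key, t.step,
         CASE WHEN c.value < t.value THEN 0 ELSE c.value - t.value END
  FROM steps t
  JOIN cte c ON c.step = t.step_h AND c.i_key = t.i_key
)
SELECT * FROM cte ORDER BY i_key, step
""").fetchall()
for row in rows:
    print(row)
```

Because every step has at most one successor, the recursive member finds nothing to join once the last step of each i_key has been produced, and the query returns exactly five rows.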

T-SQL query - row iteration without cursor

I have a table
T (variable_name, start_no, end_no)
that holds values like:
(x, 10, 20)
(x, 30, 50)
(x, 60, 70)
(y, 1, 3)
(y, 7, 8)
All intervals are guaranteed to be disjoint.
I want to write a query in T-SQL that computes the intervals where a variable is not searched:
(x, 21, 29)
(x, 51, 59)
(y, 4, 6)
Can I do this without a cursor?
I was thinking of partitioning by variable_name and then ordering by start_no. But how to proceed next? Given the current row in the rowset, how to access the "next" one?
Since you didn't specify which version of SQL Server you use, I have multiple solutions. If you are still on SQL Server 2005, Giorgi's answer uses CROSS APPLY quite nicely.
Note: For both solutions, I use the WHERE clause to filter out improper values, so even if the data is bad and the rows overlap, it will ignore those values.
My Version of Your Table
DECLARE @T TABLE (variable_name CHAR, start_no INT, end_no INT)
INSERT INTO @T
VALUES ('x', 10, 20),
('x', 30, 50),
('x', 60, 70),
('y', 1, 3),
('y', 7, 8);
Solution for SQL Server 2012 and Above
SELECT *
FROM
(
SELECT variable_name,
LAG(end_no,1) OVER (PARTITION BY variable_name ORDER BY start_no) + 1 AS start_range,
start_no - 1 AS end_range
FROM @T
) A
WHERE end_range >= start_range -- >= keeps gaps that are a single number wide
Solution for SQL 2008 and Above
WITH CTE
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY variable_name ORDER BY start_no) row_num,
*
FROM @T
)
SELECT A.variable_name,
B.end_no + 1 AS start_range,
A.start_no - 1 AS end_range
FROM CTE AS A
INNER JOIN CTE AS B
ON A.variable_name = B.variable_name
AND A.row_num = B.row_num + 1
WHERE A.start_no - 1 /*end_range*/ >= B.end_no + 1 /*start_range*/
Here is another version with cross apply:
DECLARE @t TABLE ( v CHAR(1), sn INT, en INT )
INSERT INTO @t
VALUES ( 'x', 10, 20 ),
( 'x', 30, 50 ),
( 'x', 60, 70 ),
( 'y', 1, 3 ),
( 'y', 7, 8 );
SELECT t.v, t.en + 1, c.sn - 1 FROM @t t
CROSS APPLY(SELECT TOP 1 * FROM @t WHERE v = t.v AND sn > t.sn ORDER BY sn)c
WHERE t.en + 1 < c.sn
Fiddle http://sqlfiddle.com/#!3/d6458/3
For each end_no, find the nearest start_no > end_no, then exclude the rows that have no such start_no (the last row for each variable_name).
WITH A AS
(
SELECT variable_name, end_no+1 as x1,
(SELECT MIN(start_no)-1 FROM t
WHERE t.variable_name = t1.variable_name
AND t.start_no>t1.end_no) as x2
FROM t as t1 )
SELECT * FROM A WHERE x2 IS NOT NULL
ORDER BY variable_name,x1
Also here is my old answer to the similar question:
Allen's Interval Algebra operations in SQL
Here's a non-CTE version that seems to work: http://sqlfiddle.com/#!9/4fdb4/1
Given the guaranteed disjoint ranges, I just joined T to itself, computed the next range as the increment/decrement of the adjoining range, then ensured the new range didn't overlap any existing ranges.
select t1.variable_name, t1.end_no+1, t2.start_no-1
from t t1
join t t2
on t1.variable_name=t2.variable_name
where t1.start_no < t2.start_no
and t1.end_no < t2.end_no
and not exists (select *
from t
where ((t2.start_no-1< t.end_no
and t1.end_no+1 > t.start_no) or
(t1.end_no + 1 < t.end_no and
t2.start_no-1 > t.end_no))
and t.variable_name=t1.variable_name)
This is very portable, as it doesn't require CTEs or analytic functions. It could also easily be rewritten without the derived table if that were ever necessary.
select * from (
select
variable_name,
end_no + 1 as start_no,
(
select min(start_no) - 1
from T as t2
where t2.variable_name = t1.variable_name and t2.start_no > t1.end_no
) as end_no
from T as t1
) as intervals
where start_no <= end_no
The number of complementary intervals will be at most one fewer than the number you start with. (Some will be eliminated if two ranges are actually consecutive.) So it's easy to take each interval and calculate the gap just to its right (or to its left, if you reverse some of the logic).
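The correlated-subquery approach is portable enough to try directly in SQLite. The sketch below runs it through Python's sqlite3 module on the sample intervals; the aliases gap_start and gap_end are renamed for the illustration (an assumption of the sketch) so they do not shadow the base columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (variable_name TEXT, start_no INT, end_no INT);
INSERT INTO T VALUES
  ('x', 10, 20), ('x', 30, 50), ('x', 60, 70),
  ('y', 1, 3),   ('y', 7, 8);
""")

# For each interval, the gap starts right after its end and stops right
# before the nearest later start. The last interval of each variable has no
# later start, so gap_end is NULL and the row drops out of the WHERE filter.
gaps = conn.execute("""
SELECT *
FROM (
  SELECT variable_name,
         end_no + 1 AS gap_start,
         (SELECT MIN(start_no) - 1
          FROM T AS t2
          WHERE t2.variable_name = t1.variable_name
            AND t2.start_no > t1.end_no) AS gap_end
  FROM T AS t1
) AS intervals
WHERE gap_start <= gap_end
ORDER BY variable_name, gap_start
""").fetchall()
print(gaps)
```

The <= comparison matters: it keeps a gap that is a single number wide, while two directly adjacent intervals produce gap_start = gap_end + 1 and are filtered out.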

Joining a list of values with table rows in SQL

Suppose I have a list of values, such as 1, 2, 3, 4, 5 and a table where some of those values exist in some column. Here is an example:
id name
1 Alice
3 Cindy
5 Elmore
6 Felix
I want to create a SELECT statement that will include all of the values from my list as well as the information from those rows that match the values, i.e., perform a LEFT OUTER JOIN between my list and the table, so the result would be like follows:
id name
1 Alice
2 (null)
3 Cindy
4 (null)
5 Elmore
How do I do that without creating a temp table or using multiple UNION operators?
On Microsoft SQL Server 2008 or later, you can use a table value constructor:
Select v.valueId, m.name
From (values (1), (2), (3), (4), (5)) v(valueId)
left Join otherTable m
on m.id = v.valueId
Postgres also has this construct, VALUES lists:
SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter)
Also note the possible Common Table Expression syntax which can be handy to make joins:
WITH my_values(num, str) AS (
VALUES (1, 'one'), (2, 'two'), (3, 'three')
)
SELECT num, str FROM my_values
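The CTE form also works for the left join asked about in the question. Here is a minimal sketch in SQLite via Python's sqlite3 module; the table name people and the CTE name wanted are hypothetical, chosen only for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INT, name TEXT);
INSERT INTO people VALUES (1, 'Alice'), (3, 'Cindy'), (5, 'Elmore'), (6, 'Felix');
""")

# The VALUES list plays the role of the left-hand table, so every listed id
# appears in the result even when no row in people matches it.
joined = conn.execute("""
WITH wanted(id) AS (VALUES (1), (2), (3), (4), (5))
SELECT w.id, p.name
FROM wanted w
LEFT JOIN people p ON p.id = w.id
ORDER BY w.id
""").fetchall()
print(joined)
```

Ids 2 and 4 come back with a NULL name, and Felix (id 6) is absent because 6 is not in the list.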
With Oracle it's possible, though heavier. From ASK TOM:
with id_list as (
select 10 id from dual union all
select 20 id from dual union all
select 25 id from dual union all
select 70 id from dual union all
select 90 id from dual
)
select * from id_list;
The following solution for Oracle is adapted from this source. The basic idea is to exploit Oracle's hierarchical queries. You have to specify a maximum length for the list (100 in the sample query below).
select d.lstid
, t.name
from (
select substr(
csv
, instr(csv,',',1,lev) + 1
, instr(csv,',',1,lev+1 )-instr(csv,',',1,lev)-1
) lstid
from (select ','||'1,2,3,4,5'||',' csv from dual)
, (select level lev from dual connect by level <= 100)
where lev <= length(csv)-length(replace(csv,','))-1
) d
left join test t on ( d.lstid = t.id )
;
check out this sql fiddle to see it work.
Bit late on this, but for Oracle you could do something like this to get a table of values:
SELECT rownum + 5 /*start*/ - 1 as myval
FROM dual
CONNECT BY LEVEL <= 100 /*end*/ - 5 /*start*/ + 1
... And then join that to your table:
SELECT *
FROM
(SELECT rownum + 1 /*start*/ - 1 myval
FROM dual
CONNECT BY LEVEL <= 5 /*end*/ - 1 /*start*/ + 1) mypseudotable
left outer join myothertable
on mypseudotable.myval = myothertable.correspondingval
Assuming myTable is the name of your table, following code should work.
;with x as
(
select top (select max(id) from [myTable]) number from [master]..spt_values
),
y as
(select row_number() over (order by x.number) as id
from x)
select y.id, t.name
from y left join myTable as t
on y.id = t.id;
Caution: This is SQL Server implementation.
To generate the sequential numbers needed for the output without typing each value, you can fill a table in a loop:
-- assumes ##table already exists with a single int column
declare @site as int
set @site = 1
while @site<=200
begin
insert into ##table
values (@site)
set @site=@site+1
end
Final output[post above step]:
select * from ##table
select v.id,m.name from ##table as v
left outer join [source_table] m
on m.id=v.id
Suppose the table holding the values 1, 2, 3, 4, 5 is named list_of_values, and the table that contains some of those values along with the name column is named some_values. Then you can do:
SELECT B.id,A.name
FROM [list_of_values] AS B
LEFT JOIN [some_values] AS A
ON B.ID = A.ID

How can I use PIVOT to show average and count simultaneously in its cells?

Looking at the syntax I get the strong impression, that PIVOT doesn't support anything beyond a single aggregate function to be calculated for a cell.
From statistical view showing just some averages without giving the number of cases an average refers to is very unsatisfying ( that is the polite version ).
Is there some nice pattern to evaluate pivots based on avg and pivots based on count and mix them together to give a nice result?
Yes you need to use the old style cross tab for this. The PIVOT is just syntactic sugar that resolves to pretty much the same approach.
SELECT AVG(CASE WHEN col='foo' THEN col END) AS AvgFoo,
COUNT(CASE WHEN col='foo' THEN col END) AS CountFoo,...
If you have many aggregates you could always use a CTE
WITH cte As
(
SELECT CASE WHEN col='foo' THEN col END AS Foo...
)
SELECT MAX(Foo),MIN(Foo), COUNT(Foo), STDEV(Foo)
FROM cte
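As a concrete instance of the old-style cross tab, the sketch below computes both the average and the count per pivoted column with conditional aggregation. It assumes the test_data rows used in the later answers of this thread and runs in SQLite through Python's sqlite3 module as a stand-in for SQL Server; the column aliases are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_data (module TEXT, modus TEXT, duration INT);
INSERT INTO test_data VALUES
  ('M1', 'A', 5), ('M1', 'A', 5), ('M1', 'B', 3),
  ('M2', 'A', 1), ('M2', 'A', 4);
""")

# One CASE expression per pivoted column and per aggregate.
# Rows that do not match the CASE yield NULL, which AVG and COUNT both skip.
crosstab = conn.execute("""
SELECT module,
       AVG(CASE WHEN modus = 'A' THEN duration END)   AS a_avg,
       COUNT(CASE WHEN modus = 'A' THEN duration END) AS a_count,
       AVG(CASE WHEN modus = 'B' THEN duration END)   AS b_avg,
       COUNT(CASE WHEN modus = 'B' THEN duration END) AS b_count
FROM test_data
GROUP BY module
ORDER BY module
""").fetchall()
for row in crosstab:
    print(row)
```

Note that SQLite's AVG always returns a floating-point value, whereas SQL Server's AVG over an integer column performs integer averaging unless the input is cast first.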
Simultaneous.. in its cells. So you mean within the same cell, therefore as a varchar?
You could calc the avg and count values in an aggregate query before using the pivot, and concatenate them together as text.
The role of the PIVOT operator here would only be to transform rows to columns, and some aggregate function (e.g. MAX/MIN) would be used only because it is required by the syntax - your pre-calculated aggregate query would only have one value per pivoted column.
EDIT
Following bernd_k's oracle/mssql solution, I would like to point out another way to do this in SQL Server. It requires streamlining the multiple columns into a single column.
SELECT MODULE,
modus + '_' + case which when 1 then 'AVG' else 'COUNT' end AS modus,
case which when 1 then AVG(duration) else COUNT(duration) end AS value
FROM test_data, (select 1 as which union all select 2) x
GROUP BY MODULE, modus, which
SELECT *
FROM (
SELECT MODULE,
modus + '_' + case which when 1 then 'AVG' else 'COUNT' end AS modus,
case which when 1 then CAST(AVG(1.0*duration) AS NUMERIC(10,2)) else COUNT(duration) end AS value
FROM test_data, (select 1 as which union all select 2) x
GROUP BY MODULE, modus, which
) P
PIVOT (MAX(value) FOR modus in ([A_AVG], [A_COUNT], [B_AVG], [B_COUNT])
) AS pvt
ORDER BY pvt.MODULE
In the example above, AVG and COUNT are compatible (count - int => numeric). If they are not, convert both explicitly to a compatible type.
Note - The first query shows AVG for M2/A as 2, due to integer averaging. The 2nd (pivoted) query shows the actual average taking into account decimals.
Solution for Oracle 11g + :
create table test_data (
module varchar2(30),
modus varchar2(30),
duration Number(10)
);
insert into test_data values ('M1', 'A', 5);
insert into test_data values ('M1', 'A', 5);
insert into test_data values ('M1', 'B', 3);
insert into test_data values ('M2', 'A', 1);
insert into test_data values ('M2', 'A', 4);
select *
FROM (
select *
from test_data
)
PIVOT (
AVG(duration) avg , count(duration) count
FOR modus in ( 'A', 'B')
) pvt
ORDER BY pvt.module;
I do not like the column names containing apostrophes, but the result contains what I want:
MODULE  'A'_AVG  'A'_COUNT  'B'_AVG  'B'_COUNT
------  -------  ---------  -------  ---------
M1            5          2        3          1
M2          2.5          2                   0
I really wonder what the Microsoft boys were thinking when they allowed only one aggregate function within PIVOT. I call reporting averages without the accompanying counts statistical lies.
SQL-Server 2005 + (based on Cyberwiki):
CREATE TABLE test_data (
MODULE VARCHAR(30),
modus VARCHAR(30),
duration INTEGER
);
INSERT INTO test_data VALUES ('M1', 'A', 5);
INSERT INTO test_data VALUES ('M1', 'A', 5);
INSERT INTO test_data VALUES ('M1', 'B', 3);
INSERT INTO test_data VALUES ('M2', 'A', 1);
INSERT INTO test_data VALUES ('M2', 'A', 4);
SELECT MODULE, modus, ISNULL(LTRIM(STR(AVG(duration))), '') + '|' + ISNULL(LTRIM(STR(COUNT(duration))), '') RESULT
FROM test_data
GROUP BY MODULE, modus;
SELECT *
FROM (
SELECT MODULE, modus, ISNULL(LTRIM(STR(AVG(duration))), '') + '|' + ISNULL(LTRIM(STR(COUNT(duration))), '') RESULT
FROM test_data
GROUP BY MODULE, modus
) T
PIVOT (
MAX(RESULT)
FOR modus in ( [A], [B])
) AS pvt
ORDER BY pvt.MODULE
result:
MODULE  A    B
------  ---  ----
M1      5|2  3|1
M2      2|2  NULL