Combining multiple rows into one row - SQL

I have the following table:
Index   BookNumber
2       51
2       52
2       53
1       41
1       42
1       43
I am trying to come up with the following output:
Index   BookNumber1   BookNumber2   BookNumber3
-----------------------------------------------
1       41            42            43
2       51            52            53
I was able to come up with the following query; however, the output is unexpected:
SELECT DISTINCT
index,
CASE WHEN index = 1 THEN Booknumber END AS BookNumber1,
CASE WHEN index = 2 THEN Booknumber END AS BookNumber2,
CASE WHEN index = 3 THEN Booknumber END AS BookNumber3
FROM Mytable;
I get the following output:
Index   BN1    BN2    BN3
-------------------------
1       41     null   null
1       null   42     null
1       null   null   43
2       51     null   null
2       null   52     null
2       null   null   53
Is there a way to compress this to only 2 rows?

I am not quite sure how the index in your query matches the index column in your data. But the query that you want is:
SELECT index,
       max(CASE WHEN index = 1 THEN Booknumber END) AS BookNumber1,
       max(CASE WHEN index = 2 THEN Booknumber END) AS BookNumber2,
       max(CASE WHEN index = 3 THEN Booknumber END) AS BookNumber3
FROM Mytable
GROUP BY index;
Given your data, the query seems more like:
SELECT index,
       max(CASE WHEN ind = 1 THEN Booknumber END) AS BookNumber1,
       max(CASE WHEN ind = 2 THEN Booknumber END) AS BookNumber2,
       max(CASE WHEN ind = 3 THEN Booknumber END) AS BookNumber3
FROM (select mt.*,
             row_number() over (partition by index order by BookNumber) as ind
      from Mytable mt
     ) mt
GROUP BY index;
By the way, "index" is a reserved word, so I assume that it is just a placeholder for another column name. Otherwise, you need to escape it with double quotes or square brackets.
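For instance, a minimal sketch of the escaping, using the column names from the question:
-- standard SQL double quotes (Oracle, Postgres, ...)
SELECT "index", max(BookNumber) FROM Mytable GROUP BY "index";
-- SQL Server square brackets
SELECT [index], max(BookNumber) FROM Mytable GROUP BY [index];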

Assuming there are always 3 or fewer book numbers for each index, you could use:
with data as
     (select idx,
             booknumber as bn1,
             lag(booknumber, 1) over(partition by idx order by idx, booknumber) as bn2,
             lag(booknumber, 2) over(partition by idx order by idx, booknumber) as bn3
      from books)
select *
from data
where data.bn1 = (select max(x.bn1) from data x where x.idx = data.idx)
sqlfiddle demo is here: http://sqlfiddle.com/#!6/8dc82/5/0

Don't forget that index is a reserved word. Personally I prefer not to use reserved words as column names, but you can compensate by using square brackets like in my example.
This will work in SQL Server 2008+:
declare @t table([Index] int, BookNumber int)
insert @t values
(2,51),(2,52),(2,53),(1,41),(1,42),(1,43)
;with cte as
(
    select [Index], BookNumber,
           row_number() over (partition by [Index] order by BookNumber) rn
    from @t
)
select [Index], [1] as Booknumber1, [2] as Booknumber2, [3] as Booknumber3
from cte
pivot (max([BookNumber]) FOR [rn] IN ([1],[2],[3])) AS pvt
Result:
Index   Booknumber1   Booknumber2   Booknumber3
1       41            42            43
2       51            52            53

COUNT() OVER possible using DISTINCT and WINDOWING IN HIVE

I want to calculate the number of distinct port numbers that exist between the current row and the X previous rows (a sliding window), where X can be any integer.
For instance, if the input is:
ID   PORT
1    21
2    22
3    23
4    25
5    25
6    21
The output should be:
ID   PORT   COUNT
1    21     1
2    22     2
3    23     3
4    25     4
5    25     4
6    21     4
I am using Hive (through RapidMiner), and I have tried the following:
select id, port,
count (*) over (partition by srcport order by id rows between 5 preceding and current row)
This must work for big data and when X is a large integer.
Any feedback would be appreciated.
I don't think there is an easy way. One method uses lag():
select ( (case when port_5 is not null then 1 else 0 end) +
         (case when port_4 is not null and port_4 not in (port_5) then 1 else 0 end) +
         (case when port_3 is not null and port_3 not in (port_5, port_4) then 1 else 0 end) +
         (case when port_2 is not null and port_2 not in (port_5, port_4, port_3) then 1 else 0 end) +
         (case when port_1 is not null and port_1 not in (port_5, port_4, port_3, port_2) then 1 else 0 end) +
         (case when port is not null and port not in (port_5, port_4, port_3, port_2, port_1) then 1 else 0 end)
       ) as cumulative_distinct_count
from (select t.*,
             lag(port, 5) over (partition by srcport order by id) as port_5,
             lag(port, 4) over (partition by srcport order by id) as port_4,
             lag(port, 3) over (partition by srcport order by id) as port_3,
             lag(port, 2) over (partition by srcport order by id) as port_2,
             lag(port, 1) over (partition by srcport order by id) as port_1
      from t
     ) t
This is a complicated query, but the performance should be ok.
Note: I assume port and srcport are the same thing; the srcport name is borrowed from your query.
One way to do it is with a self join, because DISTINCT isn't supported in window functions.
select t1.id,count(distinct t2.port) as cnt
from tbl t1
join tbl t2 on t1.id-t2.id>=0 and t1.id-t2.id<=5 --change this number per requirements
group by t1.id
order by t1.id
This assumes the ids are in sequential order with no gaps.
If not, first get the row numbers and then use the logic from above. It would look like:
with rownums as (select id,port,row_number() over(order by id) as rnum
from tbl)
select r1.id,count(distinct r2.port)
from rownums r1
join rownums r2 on r1.rnum-r2.rnum>=0 and r1.rnum-r2.rnum<=5
group by r1.id

Use a single standard SQL statement to group data (SQL Server)

Raw data with 2 columns:
0 33
2 null
0 44
2 null
2 null
2 null
0 55
2 null
2 null
.....
Results I want:
2 33
2 44
2 44
2 44
2 55
2 55
....
Can I use a SQL statement to accomplish this? (Return only the rows with 2, but fill them with the value from the previous row that has 0.) There could be many '2 null' rows between the 0 rows.
This way
with s as (
select *
from
(values
(1,0,33 ),
(2,2,null),
(3,0,44 ),
(4,2,null),
(5,2,null),
(6,2,null),
(7,0,55 ),
(8,2,null),
(9,2,null)
) T(id,a,b)
)
select s1.a, t.b
from s s1
cross apply (
select top(1) s2.b
from s s2
where s2.id < s1.id and s2.b is not null and s2.a = 0
order by s2.id desc ) t
where s1.a = 2
order by s1.id;
I use CROSS APPLY so the query may be easily extended to get other columns from the relevant '0' row.
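For example, a minimal sketch of such an extension (reusing the s CTE defined above; the src_id alias is just illustrative), which also returns the id of the matched '0' row:
select s1.a, t.b, t.src_id
from s s1
cross apply (
    select top(1) s2.b, s2.id as src_id
    from s s2
    where s2.id < s1.id and s2.b is not null and s2.a = 0
    order by s2.id desc ) t
where s1.a = 2
order by s1.id;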
First of all, for every row, find the id of the latest earlier row whose col2 is not null:
SELECT (SELECT MAX(t.ID) FROM your_tbl t WHERE t.ID < tbl.ID AND t.col2 IS NOT NULL)
FROM your_tbl tbl;
Then use that subquery in a condition for your table:
SELECT col1,
       (SELECT col2 FROM your_tbl
        WHERE id = (SELECT MAX(t.ID) FROM your_tbl t
                    WHERE t.ID < tbl.ID AND t.col2 IS NOT NULL))
FROM your_tbl tbl
WHERE col1 <> 0;

SQL: alternatives and substitutions for GROUPING SETS and PIVOT

I've got code like this:
SELECT id, YEAR(datek) AS YEAR, COUNT(*) AS NUM
FROM Orders
GROUP BY GROUPING SETS
(
(id, YEAR(datek)),
id,
YEAR(datek),
()
);
It gives me this output:
1      NULL   4
2      NULL   11
3      NULL   6
NULL   NULL   21
1      2006   36
2      2006   56
3      2006   51
NULL   2006   143
1      2007   130
2      2007   143
3      2007   125
NULL   2007   398
1      2008   79
2      2008   116
3      2008   73
NULL   2008   268
NULL   NULL   830
1      NULL   249
2      NULL   326
3      NULL   255
What I need to do is write it without GROUPING SETS (and without CUBE or ROLLUP) but with the same result. I thought about writing three different queries and combining them with UNION. I tried something like NULL in the GROUP BY list, but it does not work.
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, YEAR(datek)
UNION
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, null
order by id, YEAR(datek)
I also have a question about PIVOT: what kind of syntax can replace a query that uses PIVOT?
Thanks for your time and all the answers!
You are right in that you need separate queries, although you actually need 4, and rather than GROUP BY NULL, just group by the columns in the corresponding grouping set, and replace the column in the SELECT with NULL:
SELECT id, YEAR(datek) AS rok, COUNT(*) AS NUM
FROM Orders
GROUP BY id, YEAR(datek)
UNION ALL
SELECT id, NULL, COUNT(*) AS NUM
FROM Orders
GROUP BY id
UNION ALL
SELECT NULL, YEAR(datek), COUNT(*) AS NUM
FROM Orders
GROUP BY YEAR(datek)
UNION ALL
SELECT NULL, NULL, COUNT(*) AS NUM
FROM Orders
ORDER BY ID, Rok
With regard to a replacement for PIVOT I think the best alternative is to use a conditional aggregate, e.g. instead of:
SELECT pvt.SomeGroup,
pvt.[A],
pvt.[B],
pvt.[C]
FROM T
PIVOT (SUM(Val) FOR Col IN ([A], [B], [C])) AS pvt;
You would use:
SELECT T.SomeGroup,
[A] = SUM(CASE WHEN T.Col = 'A' THEN T.Val ELSE 0 END),
[B] = SUM(CASE WHEN T.Col = 'B' THEN T.Val ELSE 0 END),
[C] = SUM(CASE WHEN T.Col = 'C' THEN T.Val ELSE 0 END)
FROM T
GROUP BY T.SomeGroup;

Counting the number of positive values in a query

I'm working on the following query and table
SELECT dd.actual_date, dd.week_number_overall, sf.branch_id, AVG(sf.overtarget_qnt) AS targetreach
FROM sales_fact sf, date_dim dd
WHERE dd.date_id = sf.date_id
AND dd.week_number_overall BETWEEN 88-2 AND 88
AND sf.branch_id = 1
GROUP BY dd.actual_date, branch_id, dd.week_number_overall
ORDER BY dd.actual_date ASC;
ACTUAL_DATE  WEEK_NUMBER_OVERALL  BRANCH_ID  TARGETREACH
-----------  -------------------  ---------  -----------
13/08/14     86                   1          -11
14/08/14     86                   1          12
15/08/14     86                   1          11.8
16/08/14     86                   1          1.4
17/08/14     86                   1          -0.2
19/08/14     86                   1          7.2
20/08/14     87                   1          16.6
21/08/14     87                   1          -1.4
22/08/14     87                   1          14.4
23/08/14     87                   1          2.8
24/08/14     87                   1          18
26/08/14     87                   1          13.4
27/08/14     88                   1          -1.8
28/08/14     88                   1          10.6
29/08/14     88                   1          7.2
30/08/14     88                   1          14
31/08/14     88                   1          9.6
02/09/14     88                   1          -3.2
the "TargetReach" column shows whether target has been reach or not.
A negative value means target wasn't reached on that day.
How can I get calculate the number of ROW with positive value for this query?
that will show something like:
TOTAL_POSITIVE_TARGET_REACH WEEK_NUMBER_OVERALL
--------------------------- ------------------
13 88
I have tried to use CASE but still not working right.
Thanks a lot.
You want to use conditional aggregation:
with t as (
<your query here>
)
select week_number_overall, sum(case when targetreach > 0 then 1 else 0 end)
from t
group by week_number_overall;
However, I would rewrite your original query to use proper join syntax. Then the query would look like:
SELECT week_number_overall,
SUM(CASE WHEN targetreach > 0 THEN 1 ELSE 0 END)
FROM (SELECT dd.actual_date, dd.week_number_overall, sf.branch_id, AVG(sf.overtarget_qnt) AS targetreach
FROM sales_fact sf JOIN
date_dim dd
ON dd.date_id = sf.date_id
WHERE dd.week_number_overall BETWEEN 88-2 AND 88 AND sf.branch_id = 1
GROUP BY dd.actual_date, branch_id, dd.week_number_overall
) t
GROUP BY week_number_overall
ORDER BY week_number_overall;
The difference between a CTE (the first solution) and a subquery is (in this case) just a matter of preference.
SELECT WEEK_NUMBER_OVERALL, COUNT(*) TOTAL_POSITIVE_TARGET_REACH
FROM (your original query)
WHERE TARGETREACH >= 0
GROUP BY WEEK_NUMBER_OVERALL
select sum( decode( sign( TARGETREACH ) , -1 , 0 , 0 , 0 , 1 , 1 ) )
from ( "your query here" );
Use a HAVING clause:
SELECT dd.actual_date, dd.week_number_overall, sf.branch_id, AVG(sf.overtarget_qnt) AS targetreach
FROM sales_fact sf, date_dim dd
WHERE dd.date_id = sf.date_id
AND dd.week_number_overall BETWEEN 88-2 AND 88
AND sf.branch_id = 1
GROUP BY dd.actual_date, branch_id, dd.week_number_overall
HAVING AVG(sf.overtarget_qnt)>0
ORDER BY dd.actual_date ASC;
Using decode() and sign(), you can get both the positive and the negative count.
drop table test;
create table test (
key number(5),
value number(5));
insert into test values ( 1, -9 );
insert into test values ( 2, -8 );
insert into test values ( 3, 10 );
insert into test values ( 4, 12 );
insert into test values ( 5, -9 );
insert into test values ( 6, 8 );
insert into test values ( 7, 51 );
commit;
select sig, count(sig)
from (
    select key, decode(sign(value), -1, '-ve', 0, 'zero', 1, '+ve') sig
    from test
)
group by sig

SIG    COUNT(SIG)
----   ----------
+ve    4
-ve    3

Aggregate within a group of unchanged values

I have sample data:
RowId   TypeId   Value
1       1        34
2       1        53
3       1        34
4       2        43
5       2        65
6       16       54
7       16       34
8       1        45
9       6        43
10      6        34
11      16       64
12      16       63
I want to count the rows for each type (the Value does not matter to me), but only within runs of neighboring rows that share the same TypeId:
TypeId   Count
1        3
2        2
16       2
1        1
6        2
16       2
How can I achieve this result?
This should give you COUNT of rows within a group of unchanged values:
SELECT TypeId, grp, COUNT(*) FROM (
SELECT RowId, TypeId , Value, gap, SUM(gap) over (ORDER BY RowId ) grp
FROM (SELECT RowId, TypeId , Value,
CASE WHEN TypeId = lag(TypeId) over (ORDER BY RowId )
THEN 0
ELSE 1
END gap
FROM dummy
) t
) tt
GROUP BY TypeId, grp;
If you prefer WITH over endlessly nested subqueries:
WITH dummy_with_groups AS (
SELECT RowId, TypeId , Value, SUM(gap) OVER (ORDER BY RowId) grp
FROM (SELECT RowId, TypeId , Value,
CASE WHEN TypeId = lag(TypeId) OVER (ORDER BY RowId)
THEN 0 ELSE 1 END gap
FROM dummy) t
)
SELECT TypeId, COUNT(*) as Result
FROM dummy_with_groups
GROUP BY TypeId, grp;
http://www.sqlfiddle.com/#!6/f16e9/34
Check this fiddle demo. I have renamed your columns a little.
WITH myCTE AS
(SELECT row_id,
type_id,
ROW_NUMBER () OVER (PARTITION BY type_id ORDER BY row_id)
AS cnt,
CASE LEAD (type_id) OVER (ORDER BY row_id)
WHEN type_id THEN 0
ELSE 1
END
AS show
FROM dummy),
innerQuery AS
(SELECT row_id, type_id, cnt
FROM myCTE
WHERE show = 1)
SELECT iq1.type_id, iq1.cnt - ISNULL (iq2.cnt, 0) CNT
FROM innerQuery iq1
LEFT OUTER JOIN innerQuery iq2
ON iq1.type_id = iq2.type_id
AND EXISTS
(SELECT 1
FROM innerQuery iq3
WHERE iq3.type_id = iq1.type_id
AND iq3.row_id < iq1.row_id
HAVING MAX (iq3.row_id) = iq2.row_id)
The output is exactly as expected.