SQL DB2 Split result of group by based on count - sql

I would like to split the result of a group by in several rows based on a count, but I don't know if it's possible. For instance, if I have a query like this :
SELECT doc.client, doc.template, COUNT(*) FROM document doc GROUP BY doc.client, doc.template
and a table document with the following data :
ID | name | client | template
1 | doc_a | a | temp_a
2 | doc_b | a | temp_a
3 | doc_c | a | temp_a
4 | doc_d | a | temp_b
The result for the query would be :
client | template | count
a | temp_a | 3
a | temp_b | 1
But I would like to split a row of the result in two or more if the count is higher than 2 :
client | template | count
a | temp_a | 2
a | temp_a | 1
a | temp_b | 1
Is there a way to do this in SQL ?

You can use a recursive common table expression (RCTE), as below. Run this statement as is first, experimenting with different values in the last column. The maximum batch size here is 1000.
WITH
GRP_RESULT (client, template, count) AS
(
-- Place your SELECT ... GROUP BY here
-- instead of VALUES
VALUES
('a', 'temp_a', 4500)
, ('a', 'temp_b', 3001)
)
, T (client, template, count, max_batch_size) AS
(
SELECT client, template, count, 1000
FROM GRP_RESULT
UNION ALL
SELECT client, template, count - max_batch_size, max_batch_size
FROM T
WHERE count > max_batch_size
)
SELECT client, template, CASE WHEN count > max_batch_size THEN max_batch_size ELSE count END count
FROM T
ORDER BY client, template, count DESC
The result is:
|CLIENT|TEMPLATE|COUNT |
|------|--------|-----------|
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |500 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1 |
You may place your SELECT ... GROUP BY statement as specified above afterwards to achieve your goal.
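As a minimal sketch of the same recursive idea applied to the question's sample data, the following uses Python's built-in sqlite3 module as a stand-in for DB2 (an assumption made only so that the example is self-contained). The `MAX_BATCH_SIZE` name and the `:size` parameter are illustrative, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER, name TEXT, client TEXT, template TEXT)")
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [(1, "doc_a", "a", "temp_a"), (2, "doc_b", "a", "temp_a"),
     (3, "doc_c", "a", "temp_a"), (4, "doc_d", "a", "temp_b")],
)

MAX_BATCH_SIZE = 2  # hypothetical parameter: largest count allowed per output row

split_rows = conn.execute(
    """
    WITH RECURSIVE
    grp_result (client, template, cnt) AS (
        SELECT client, template, COUNT(*) FROM document GROUP BY client, template
    ),
    t (client, template, cnt) AS (
        -- Anchor: one row per group with its full count.
        SELECT client, template, cnt FROM grp_result
        UNION ALL
        -- Recursive step: peel off one batch while more than a batch remains.
        SELECT client, template, cnt - :size FROM t WHERE cnt > :size
    )
    SELECT client, template, MIN(cnt, :size) AS batch_count
    FROM t
    ORDER BY client, template, batch_count DESC
    """,
    {"size": MAX_BATCH_SIZE},
).fetchall()

for row in split_rows:
    print(row)
```

With a batch size of 2, the group of three temp_a documents comes back as one row of 2 and one row of 1, which is the shape asked for in the question.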

You can use window functions and then aggregate:
SELECT d.client, d.template, COUNT(*)
FROM (SELECT doc.client, doc.template,
ROW_NUMBER() OVER (PARTITION BY doc.client, doc.template ORDER BY doc.client) - 1 as seqnum
FROM document doc
) d
GROUP BY d.client, d.template, FLOOR(seqnum / 2)
The subquery numbers the rows within each (client, template) group, starting at 0. The outer query then splits each group into chunks of at most two rows by grouping on FLOOR(seqnum / 2).
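A small runnable sketch of the row-numbering approach, assuming chunks of at most two rows and using Python's sqlite3 module in place of DB2 (window functions need SQLite 3.25 or later). Here the chunk key is the integer division of a zero-based row number by 2; the `chunks` variable name is illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER, name TEXT, client TEXT, template TEXT)")
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [(1, "doc_a", "a", "temp_a"), (2, "doc_b", "a", "temp_a"),
     (3, "doc_c", "a", "temp_a"), (4, "doc_d", "a", "temp_b")],
)

chunks = conn.execute(
    """
    SELECT client, template, COUNT(*) AS cnt
    FROM (
        SELECT client, template,
               -- Zero-based position of each row inside its (client, template) group.
               ROW_NUMBER() OVER (PARTITION BY client, template ORDER BY id) - 1 AS seqnum
        FROM document
    ) d
    -- Integer division maps positions 0,1 to chunk 0, positions 2,3 to chunk 1, and so on.
    GROUP BY client, template, seqnum / 2
    ORDER BY client, template, cnt DESC
    """
).fetchall()

for row in chunks:
    print(row)
```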

Related

Sql: Join separately ordered tables

Let's assume I have two sets of events:
Foo
Bar
where I would always expect Bar to follow Foo: Foo -> Bar. I have a table of Foo values:
|----|---------------|------|
| id | ordering_foo  | other|
|----|---------------|------|
|1 |1 |X |
|1 |2 |Y |
|----|---------------|------|
|2 |1 |X |
|----|---------------|------|
|3 |2 |X |
|----|---------------|------|
|4 |1 |X |
|4 |2 |Y |
|----|---------------|------|
the ordering field indicates the order at which the Foo events happened per id.
I also have a set of Bar events:
|----|---------------|-------|
| id | ordering_bar | other |
|----|---------------|-------|
|1 |A |XX |
|1 |B |YY |
|----|---------------|-------|
|3 |B |XX |
|----|---------------|-------|
|4 |A |XX |
|----|---------------|-------|
Note that:
while Foo and Bar are both ordered, they don't share the same ordering, so we can't simply join them on those ordering values. Here I have simplified them to numbers vs. strings. In the problem that inspired this question, they are the timestamps of each Foo/Bar event, which have the property foo.ordering < bar.ordering for a Foo -> Bar sequence of events, but that's probably not massively helpful here.
The ordering isn't necessarily contiguous, i.e. just because we have an order entry of 2 (B) doesn't mean we'd necessarily have a 1 (A) entry; see the entries for id 3.
It's possible for us to have a record for Foo but not the subsequent Bar; see the entries for ids 2 and 4.
I want to end up with:
|----|----------|-----------|-----------|
| id | ordering | other-foo | other-bar |
| 1 | 1 | X | XX |
| 1 | 2 | Y | YY |
|----|----------|-----------|-----------|
| 2 | 1 | X | null |
|----|----------|-----------|-----------|
| 3 | 2 | X | XX |
|----|----------|-----------|-----------|
| 4 | 1 | X | XX |
| 4 | 2 | Y | null |
|----|----------|-----------|-----------|
How can I get there? In my special case of this problem I only ever have two possible events per event type, per id, i.e. the ordering values can only ever be 1, 2 / A, B. I played around with things like:
case
when count(*) over (partition by foo.id) = 1 and count(*) over (partition by bar.id) = 1 then foo.ordering_foo
when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 1 then 1
when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and max(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 2
when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and min(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 1
else -1
end as ordering,
ie, I treat each case of:
1 foo, 1 bar
2 foo, 1 bar
2 foo, 2 bar
separately to come up with a composite order. This is likely error-prone and, most importantly, I realise it is:
horrible to read/maintain
not flexible enough.
hard to use to get other fields.
So I'm curious if you could solve this more elegantly in the generic case.
You may join the tables using ROW_NUMBER as follows:
SELECT T.id ,T.ordering_foo, T.other other_foo, D.other other_bar
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) foo_rn
FROM foo
) T
LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) bar_rn
FROM bar
) D
ON T.ID=D.ID AND T.foo_rn=D.bar_rn
ORDER BY T.id ,T.ordering_foo
See a demo on SQL Server.
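The positional join can be tried on the question's sample data with a short sketch like the one below. It uses Python's sqlite3 module rather than SQL Server (an assumption for portability; window functions need SQLite 3.25 or later), and the aliases `f`, `b` and `rn` are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, ordering_foo INTEGER, other TEXT)")
conn.execute("CREATE TABLE bar (id INTEGER, ordering_bar TEXT, other TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, "X"), (1, 2, "Y"), (2, 1, "X"), (3, 2, "X"), (4, 1, "X"), (4, 2, "Y")],
)
conn.executemany(
    "INSERT INTO bar VALUES (?, ?, ?)",
    [(1, "A", "XX"), (1, "B", "YY"), (3, "B", "XX"), (4, "A", "XX")],
)

joined = conn.execute(
    """
    SELECT f.id, f.ordering_foo, f.other AS other_foo, b.other AS other_bar
    FROM (
        -- Position of each Foo event within its id, by Foo's own ordering.
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) AS rn
        FROM foo
    ) f
    LEFT JOIN (
        -- Position of each Bar event within its id, by Bar's own ordering.
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) AS rn
        FROM bar
    ) b ON f.id = b.id AND f.rn = b.rn
    ORDER BY f.id, f.ordering_foo
    """
).fetchall()

for row in joined:
    print(row)
```

Because both sides are numbered independently, the first Foo of an id pairs with the first Bar of that id even when the raw ordering values differ (as for id 3), and a Foo without a matching Bar keeps a NULL (None in Python) in the last column.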

increment a variable foreach record

I have this query :
create table #tmp_table( n_progressive int, name char(10), id_numeric numeric(11,0) )
declare @i int = 0
declare @c int
declare @n_progressive int = 0
declare @var_table table ( name char(10), id_numeric numeric(11,0) )
insert into @var_table( name, id_numeric ) select name, id_number from MainTable
select @c = count(*) from @var_table
while (@i < @c) begin
set @n_progressive = @n_progressive + 1
insert into #tmp_table( n_progressive, name, id_numeric ) select @n_progressive, name, id_numeric from @var_table
end
There are 4 records in var_table, and for each record I want n_progressive to be incremented by 1.
The result of the query above is this :
+--------------+----------+------------+
|n_progressive | name | numeric_id |
+--------------+----------+------------+
|1 | RM1 | 1 |
|1 | RM2 | 2 |
|1 | RM3 | 3 |
|1 | RM4 | 4 |
|2 | RM1 | 1 |
|2 | RM2 | 2 |
|2 | RM3 | 3 |
|2 | RM4 | 4 |
|3 | RM1 | 1 |
|3 | RM2 | 2 |
|3 | RM3 | 3 |
|3 | RM4 | 4 |
|4 | RM1 | 1 |
|4 | RM2 | 2 |
|4 | RM3 | 3 |
|4 | RM4 | 4 |
+--------------+----------+------------+
What I want is this :
+---------------+----------+-------------+
|n_progressive | name | numeric_id |
+---------------+----------+-------------+
|1 | RM1 | 1 |
|2 | RM2 | 2 |
|3 | RM3 | 3 |
|4 | RM4 | 4 |
+---------------+----------+-------------+
I don't want to use Cursors.
You are selecting all the records from @var_table in each iteration of the loop; that's why you get every record 4 times (once per record in @var_table).
However, you don't need a loop at all, and you should strive to avoid loops any time you are using SQL anyway, since SQL works best with a set based approach and not a procedural approach (For more information, read RBAR: ‘Row By Agonizing Row’ and What is RBAR? How can I avoid it?)
Instead of a loop, you can simply use the row_number() window function to get the n_progressive value:
insert into #tmp_table( n_progressive, name, id_numeric )
select row_number() over(order by name), name, id_numeric
from @var_table
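A minimal set-based sketch of this numbering, using Python's sqlite3 module instead of SQL Server (an assumption; SQLite has no table variables or # temp tables, so ordinary tables named `var_table` and `tmp_table` stand in for them, and window functions need SQLite 3.25 or later):

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE var_table (name TEXT, id_numeric INTEGER)")
conn.execute("CREATE TABLE tmp_table (n_progressive INTEGER, name TEXT, id_numeric INTEGER)")
conn.executemany(
    "INSERT INTO var_table VALUES (?, ?)",
    [("RM1", 1), ("RM2", 2), ("RM3", 3), ("RM4", 4)],
)

# One INSERT ... SELECT numbers every source row exactly once; no loop is needed.
conn.execute(
    """
    INSERT INTO tmp_table (n_progressive, name, id_numeric)
    SELECT ROW_NUMBER() OVER (ORDER BY name), name, id_numeric
    FROM var_table
    """
)

numbered = conn.execute(
    "SELECT n_progressive, name, id_numeric FROM tmp_table ORDER BY n_progressive"
).fetchall()

for row in numbered:
    print(row)
```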
You are not restricting the INSERT to read one row from your source table, so you're copying the whole table multiple times. To directly fix what you are trying to do, you should do something like this...
while (@i < @c) begin
set @n_progressive = @n_progressive + 1
insert into
#tmp_table( n_progressive, name, id_numeric )
select
@n_progressive, name, id_numeric
from
@var_table
where
id_numeric = @i + 1 -- Only one row (assumes the ids run from 1 to @c without gaps)
set @i = @i + 1 -- Move to the next row
end
A better idea could be to use ROW_NUMBER(), avoiding the need for the loop and much of the other boilerplate code.
insert into
#tmp_table( n_progressive, name, id_numeric )
select
ROW_NUMBER() OVER (ORDER BY id_numeric),
name,
id_numeric
from
@var_table
A better idea still could be to use an identity column, and let the table do the number allocations.
create table
#tmp_table(
n_progressive int IDENTITY(1,1),
name char(10),
id_numeric numeric(11,0)
)
insert into #tmp_table( name, id_numeric )
select name, id_numeric
from MainTable
ORDER BY id_numeric
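The same idea can be sketched outside SQL Server. The example below swaps in SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` for `IDENTITY(1,1)` (a different mechanism, used here only because it ships with Python's sqlite3 module), and `main_table` is a hypothetical stand-in for MainTable, deliberately filled out of order.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (name TEXT, id_numeric INTEGER)")
conn.execute(
    """
    CREATE TABLE tmp_table (
        -- The table allocates this number itself, like an identity column.
        n_progressive INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        id_numeric INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?)",
    [("RM3", 3), ("RM1", 1), ("RM4", 4), ("RM2", 2)],
)

# n_progressive is omitted from the column list; rows arrive sorted by id_numeric.
conn.execute(
    """
    INSERT INTO tmp_table (name, id_numeric)
    SELECT name, id_numeric FROM main_table ORDER BY id_numeric
    """
)

auto_numbered = conn.execute(
    "SELECT n_progressive, name, id_numeric FROM tmp_table ORDER BY n_progressive"
).fetchall()

for row in auto_numbered:
    print(row)
```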
Does this do what you want?
declare @n_limit int = 4; -- number of rows to generate
with n as (
select 1 as n
union all
select n + 1
from n
where n < @n_limit
)
select n.n as n_progressive, 'RM' + cast(n.n as varchar(255)) as name, n.n as numeric_id
from n
option (maxrecursion 0);

2 column with same ID to 1 row

I have a table with only 2 columns, as follows:
|ID | Date |
===================
|1 | 03/04/2017 |
|1 | 09/07/1997 |
|2 | 04/04/2014 |
I want to achieve an end result as follows:
|ID | Date 1 |Date 2 |
================================
|1 | 03/04/2017 | 09/07/1997 |
|2 | 04/04/2014 | NULL |
I'm currently reading up on the PIVOT function, but I'm not sure whether I'm on the right track. I'm still new to SQL.
A simple pivot query should work here, with a twist. For your ID 2 data, there is only one row, but in this case you want to report a first date and a NULL second date. We can use a CASE expression to handle this case.
SELECT
ID,
MAX(Date) AS date_1,
CASE WHEN COUNT(*) = 2 THEN MIN(Date) ELSE NULL END AS date_2
FROM yourTable
GROUP BY ID
Demo here:
Rextester
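A small sketch of this conditional aggregation, run through Python's sqlite3 module as a stand-in database. Two assumptions are made for illustration: the dates are stored as ISO strings (reading the sample as dd/mm/yyyy) so that MAX and MIN compare chronologically, and the table and column are named `your_table` and `dt`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    # ISO yyyy-mm-dd strings sort in date order, so MIN/MAX behave as on real dates.
    [(1, "2017-04-03"), (1, "1997-07-09"), (2, "2014-04-04")],
)

pivoted = conn.execute(
    """
    SELECT id,
           MAX(dt) AS date_1,
           -- Only report a second date when the id really has two rows.
           CASE WHEN COUNT(*) = 2 THEN MIN(dt) END AS date_2
    FROM your_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in pivoted:
    print(row)
```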
This can be done easily using the MIN/MAX aggregate functions:
select Id,min(Date),
case when min(Date)<>max(Date) then max(Date) end
From yourtable
Group by Id
If this does not work with your real data, please update the sample data and expected result.

Better way of writing my SQL query with conditional group by

Here's my data
|vendorname |total|
---------------------
|Najla |10 |
|Disney |20 |
|Disney |10 |
|ToysRus |5 |
|ToysRus |1 |
|Gap |1 |
|Gap |2 |
|Gap |3 |
|Najla |2 |
Here's the resultset I want
|vendorname |grandtotal|
---------------------
|Disney |30 |
|Gap |6 |
|ToysRus |6 |
|Najla |2 |
|Najla |10 |
If the vendorname = 'Najla' I want individual rows with their respective total otherwise I would like to group them and return a sum of their totals.
This is my query--
select *
from
(
select vendorname, sum(total) grandtotal
from vendor
where vendorname<>'Najla'
group by vendorname
union all
select vendorname, total grandtotal
from vendor
where vendorname='Najla'
) A
I was wondering if there's a better way to write this query than repeating it twice and performing a union. Is there a condensed way to group some rows "conditionally"?
Honestly, I think the union all version is going to be the best-performing and easiest-to-read option, provided the table has appropriate indexes.
You could, however, do something like this (assuming you have a unique id on your table):
select vendorname, sum(total) grandtotal
from t
group by
vendorname
, case when vendorname = 'Najla' then id else null end
rextester demo: http://rextester.com/OGZQ33364
returns
+------------+------------+
| vendorname | grandtotal |
+------------+------------+
| Disney | 30 |
| Gap | 6 |
| ToysRus | 6 |
| Najla | 10 |
| Najla | 2 |
+------------+------------+
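The conditional grouping key can be tried with a sketch like the following, which uses Python's sqlite3 module as a stand-in database. The `id` column is the unique id the answer assumes; its values here are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (id INTEGER PRIMARY KEY, vendorname TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO vendor (vendorname, total) VALUES (?, ?)",
    [("Najla", 10), ("Disney", 20), ("Disney", 10), ("ToysRus", 5), ("ToysRus", 1),
     ("Gap", 1), ("Gap", 2), ("Gap", 3), ("Najla", 2)],
)

totals = conn.execute(
    """
    SELECT vendorname, SUM(total) AS grandtotal
    FROM vendor
    GROUP BY vendorname,
             -- Najla rows each get their own group (keyed by id);
             -- every other vendor shares the single NULL key.
             CASE WHEN vendorname = 'Najla' THEN id END
    ORDER BY vendorname, grandtotal
    """
).fetchall()

for row in totals:
    print(row)
```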

SQL Insert Query For Multiple Max IDs

Table w:
|ID|Comment|SeqID|
|1 |bajg | 1 |
|1 |2423 | 2 |
|2 |ref | 1 |
|2 |comment| 2 |
|2 |juk | 3 |
|3 |efef | 1 |
|4 | hy | 1 |
|4 | 6u | 2 |
How do I insert a standard new comment for each ID with a new SeqID (that ID's highest SeqID increased by 1)?
The query below returns only the row with the overall highest SeqID:
Select *
From w
Where SEQID =
(select max(seqid)
from w)
Table w:
|2 |juk | 3 |
Expected Result
Table w:
|ID|Comment|SeqID|
|1 |sqc | 3 |
|2 |sqc | 4 |
|3 |sqc | 2 |
|4 |sqc | 3 |
Will I have to go through and insert all the values (new comment as sqc) I want into the table using the below, or is there a faster way?
INSERT INTO table_name
VALUES (value1,value2,value3,...);
Try this:
INSERT INTO mytable (ID, Comment, SeqID)
SELECT ID, 'sqc', MAX(SeqID) + 1
FROM mytable
GROUP BY ID
Demo here
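A runnable sketch of this INSERT ... SELECT on the question's table `w`, using Python's sqlite3 module as a stand-in database (an assumption; the variable name `new_rows` is illustrative):

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (ID INTEGER, Comment TEXT, SeqID INTEGER)")
conn.executemany(
    "INSERT INTO w VALUES (?, ?, ?)",
    [(1, "bajg", 1), (1, "2423", 2), (2, "ref", 1), (2, "comment", 2),
     (2, "juk", 3), (3, "efef", 1), (4, "hy", 1), (4, "6u", 2)],
)

# One statement adds a row per ID, using that ID's own highest SeqID plus 1.
conn.execute(
    """
    INSERT INTO w (ID, Comment, SeqID)
    SELECT ID, 'sqc', MAX(SeqID) + 1
    FROM w
    GROUP BY ID
    """
)

new_rows = conn.execute(
    "SELECT ID, Comment, SeqID FROM w WHERE Comment = 'sqc' ORDER BY ID"
).fetchall()

for row in new_rows:
    print(row)
```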
You are probably better off just calculating the value when you query. Define an identity column on the table, say CommentId, and run a query like:
select id, comment,
row_number() over (partition by id order by CommentId) as SeqId
from t;
What is nice about this approach is that the sequence numbers are always sequential, there is no opportunity for duplicates, the table does not have to be locked when inserting, and the sequence numbers stay correct even after updates and deletes.
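To illustrate computing the sequence number at query time, here is a sketch using Python's sqlite3 module, with an auto-incrementing `comment_id` standing in for the identity column (an assumption; the table name `comments` is hypothetical, and window functions need SQLite 3.25 or later). It numbers rows per id and shows that the numbering stays gap-free after a delete.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of CommentId
        id INTEGER,
        comment TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO comments (id, comment) VALUES (?, ?)",
    [(1, "bajg"), (1, "2423"), (2, "ref"), (2, "comment"),
     (2, "juk"), (3, "efef"), (4, "hy"), (4, "6u")],
)

# Remove a row from the middle of id 2's comments; no stored SeqID needs fixing.
conn.execute("DELETE FROM comments WHERE id = 2 AND comment = 'comment'")

sequenced = conn.execute(
    """
    SELECT id, comment,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY comment_id) AS seq_id
    FROM comments
    ORDER BY id, comment_id
    """
).fetchall()

for row in sequenced:
    print(row)
```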