SQL: Join separately ordered tables

Let's assume I have two sets of events, Foo and Bar, where I would always expect Bar to follow Foo: Foo -> Bar. I have a table of Foo values:
|----|--------------|-------|
| id | ordering_foo | other |
|----|--------------|-------|
| 1  | 1            | X     |
| 1  | 2            | Y     |
|----|--------------|-------|
| 2  | 1            | X     |
|----|--------------|-------|
| 3  | 2            | X     |
|----|--------------|-------|
| 4  | 1            | X     |
| 4  | 2            | Y     |
|----|--------------|-------|
The ordering_foo field indicates the order in which the Foo events happened for each id.
I also have a set of Bar events:
|----|--------------|-------|
| id | ordering_bar | other |
|----|--------------|-------|
| 1  | A            | XX    |
| 1  | B            | YY    |
|----|--------------|-------|
| 3  | B            | XX    |
|----|--------------|-------|
| 4  | A            | XX    |
|----|--------------|-------|
Note that:
While Foo and Bar are both ordered, they don't share the same ordering, so we can't simply join them on those ordering values. Here I have simplified them to numbers vs. strings. In the problem that inspired this question, these are the timestamps of each Foo/Bar event, which have the property foo.ordering < bar.ordering for a Foo -> Bar sequence of events, but that's probably not massively helpful here.
The ordering can have gaps, i.e. just because we have an entry with ordering 2 (B) doesn't mean we necessarily have a 1 (A) entry; see the entries for id 3.
It's possible to have a record for Foo but not the subsequent Bar; see the entries for ids 2 and 4.
I want to end up with:
|----|----------|-----------|-----------|
| id | ordering | other_foo | other_bar |
|----|----------|-----------|-----------|
| 1  | 1        | X         | XX        |
| 1  | 2        | Y         | YY        |
|----|----------|-----------|-----------|
| 2  | 1        | X         | null      |
|----|----------|-----------|-----------|
| 3  | 2        | X         | XX        |
|----|----------|-----------|-----------|
| 4  | 1        | X         | XX        |
| 4  | 2        | Y         | null      |
|----|----------|-----------|-----------|
How can I get there? In my particular case I only ever have two possible events per event type, per id, i.e. the ordering values can only ever be 1, 2 / A, B. I played around with things like:
case
  when count(*) over (partition by foo.id) = 1 and count(*) over (partition by bar.id) = 1 then foo.ordering_foo
  when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 1 then 1
  when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and max(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 2
  when count(*) over (partition by foo.id) = 2 and count(*) over (partition by bar.id) = 2 and min(bar.ordering_bar) over (partition by bar.id) = bar.ordering_bar then 1
  else -1
end as ordering,
I.e., I treat each of the cases
1 foo, 1 bar
2 foo, 1 bar
2 foo, 2 bar
separately to come up with a composite order. It is likely error-prone, though, and most importantly I realise this is:
horrible to read/maintain,
not flexible enough, and
hard to use to get other fields.
So I'm curious whether this can be solved more elegantly in the generic case.

You may join the tables on per-id ROW_NUMBER values, as follows:
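SELECT T.id, T.ordering_foo, T.other AS other_foo, D.other AS other_bar
FROM
(
  -- number each id's Foo events 1, 2, ... in ordering_foo order
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) AS foo_rn
  FROM foo
) T
LEFT JOIN
(
  -- number each id's Bar events 1, 2, ... in ordering_bar order
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) AS bar_rn
  FROM bar
) D
-- pair the n-th Foo with the n-th Bar; the LEFT JOIN keeps Foo rows without a Bar
ON T.id = D.id AND T.foo_rn = D.bar_rn
ORDER BY T.id, T.ordering_foo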
See a demo on SQL Server.
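As a rough, self-contained check, the same ROW_NUMBER() join can be run against the question's sample data using Python's built-in sqlite3 module. This sketch assumes SQLite 3.25 or newer (needed for window functions) in place of SQL Server. Table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER, ordering_foo INTEGER, other TEXT);
    CREATE TABLE bar (id INTEGER, ordering_bar TEXT, other TEXT);
    INSERT INTO foo VALUES (1, 1, 'X'), (1, 2, 'Y'), (2, 1, 'X'), (3, 2, 'X'),
                           (4, 1, 'X'), (4, 2, 'Y');
    INSERT INTO bar VALUES (1, 'A', 'XX'), (1, 'B', 'YY'), (3, 'B', 'XX'),
                           (4, 'A', 'XX');
""")

# Number each side per id, then pair the n-th Foo with the n-th Bar;
# the LEFT JOIN keeps Foo events that have no matching Bar (other_bar is NULL).
QUERY = """
    SELECT t.id, t.ordering_foo, t.other AS other_foo, d.other AS other_bar
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_foo) AS foo_rn
          FROM foo) AS t
    LEFT JOIN
         (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_bar) AS bar_rn
          FROM bar) AS d
      ON t.id = d.id AND t.foo_rn = d.bar_rn
    ORDER BY t.id, t.ordering_foo
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each side is numbered independently, so the join never compares ordering_foo with ordering_bar directly. That is why it does not matter that one ordering uses numbers and the other uses strings.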

Related

SQL DB2 Split result of group by based on count

I would like to split the result of a GROUP BY into several rows based on a count, but I don't know if it's possible. For instance, if I have a query like this:
SELECT doc.client, doc.template, COUNT(*) FROM document doc GROUP BY doc.client, doc.template
and a table document with the following data:
ID | name | client | template
1 | doc_a | a | temp_a
2 | doc_b | a | temp_a
3 | doc_c | a | temp_a
4 | doc_d | a | temp_b
The result for the query would be:
client | template | count
a | temp_a | 3
a | temp_b | 1
But I would like to split a row of the result into two or more rows if the count is higher than 2:
client | template | count
a | temp_a | 2
a | temp_a | 1
a | temp_b | 1
Is there a way to do this in SQL?
You can use a recursive common table expression (RCTE) like the one below. Run this statement as is first, playing with different values in the last column. The maximum batch size here is 1000.
WITH
GRP_RESULT (client, template, count) AS
(
-- Place your SELECT ... GROUP BY here
-- instead of VALUES
VALUES
('a', 'temp_a', 4500)
, ('a', 'temp_b', 3001)
)
, T (client, template, count, max_batch_size) AS
(
SELECT client, template, count, 1000
FROM GRP_RESULT
UNION ALL
SELECT client, template, count - max_batch_size, max_batch_size
FROM T
WHERE count > max_batch_size
)
SELECT client, template, CASE WHEN count > max_batch_size THEN max_batch_size ELSE count END count
FROM T
ORDER BY client, template, count DESC
The result is:
|CLIENT|TEMPLATE|COUNT |
|------|--------|-----------|
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |1000 |
|a |temp_a |500 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1000 |
|a |temp_b |1 |
Afterwards, replace the VALUES clause with your SELECT ... GROUP BY statement, as indicated in the comments, to achieve your goal.
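Here is a minimal port of this recursive CTE to SQLite, run through Python's sqlite3 module on the question's own four documents. The port makes several assumptions. It uses SQLite's WITH RECURSIVE syntax, computes the grouped counts in place of the hard-coded VALUES list, renames the count column to cnt, and binds the batch size as a parameter set to 2 to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER, name TEXT, client TEXT, template TEXT);
    INSERT INTO document VALUES
        (1, 'doc_a', 'a', 'temp_a'), (2, 'doc_b', 'a', 'temp_a'),
        (3, 'doc_c', 'a', 'temp_a'), (4, 'doc_d', 'a', 'temp_b');
""")

MAX_BATCH_SIZE = 2  # the question's limit; the DB2 answer uses 1000

QUERY = """
    WITH RECURSIVE
    grp_result (client, template, cnt) AS (
        -- the grouped counts replace the hard-coded VALUES list
        SELECT client, template, COUNT(*) FROM document GROUP BY client, template
    ),
    t (client, template, cnt, max_batch_size) AS (
        -- seed: one row per group carrying its full count
        SELECT client, template, cnt, ? FROM grp_result
        UNION ALL
        -- peel off one full batch while more than a batch remains
        SELECT client, template, cnt - max_batch_size, max_batch_size
        FROM t
        WHERE cnt > max_batch_size
    )
    SELECT client, template,
           CASE WHEN cnt > max_batch_size THEN max_batch_size ELSE cnt END AS batch
    FROM t
    ORDER BY client, template, batch DESC
"""
rows = conn.execute(QUERY, (MAX_BATCH_SIZE,)).fetchall()
print(rows)
```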
You can use window functions and then aggregate:
SELECT client, template, COUNT(*)
FROM (SELECT doc.client, doc.template,
             ROW_NUMBER() OVER (PARTITION BY doc.client, doc.template ORDER BY doc.client) - 1 AS seqnum
      FROM document doc
     ) d
GROUP BY client, template, FLOOR(seqnum / 2)
The subquery numbers the rows of each (client, template) group starting at 0. The outer query then splits each group into chunks of at most two rows by grouping on FLOOR(seqnum / 2).
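The same window-then-aggregate idea can be sketched in SQLite through Python's sqlite3 module. This assumes SQLite 3.25+ for ROW_NUMBER(). Here the rows are numbered by id, which is an arbitrary but deterministic choice, and SQLite's integer division does the chunking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER, name TEXT, client TEXT, template TEXT);
    INSERT INTO document VALUES
        (1, 'doc_a', 'a', 'temp_a'), (2, 'doc_b', 'a', 'temp_a'),
        (3, 'doc_c', 'a', 'temp_a'), (4, 'doc_d', 'a', 'temp_b');
""")

QUERY = """
    SELECT client, template, COUNT(*) AS cnt
    FROM (SELECT client, template,
                 -- 0-based position of each document inside its (client, template) group
                 ROW_NUMBER() OVER (PARTITION BY client, template ORDER BY id) - 1 AS seqnum
          FROM document) AS d
    -- integer division: positions 0,1 -> chunk 0, positions 2,3 -> chunk 1, ...
    GROUP BY client, template, seqnum / 2
    ORDER BY client, template, cnt DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```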

Update a column value within a SELECT query

I have a complicated SQL question.
Can we update a column within a SELECT query? Example:
Consider this table:
|ID |SeenAt |
----------------
|1 |20 |
|1 |21 |
|1 |22 |
|2 |70 |
|2 |80 |
I want a SELECT query that gives, for each ID, when it was first seen and when it was seen 'again':
|ID |Start |End |
---------------------
|1 |20 |21 |
|1 |20 |22 |
|1 |20 |22 |
|2 |70 |80 |
|2 |70 |80 |
First, both columns Start and End would have the same value, but when a second row with the same ID is seen we need to update its predecessor to give End the new SeenAt value.
I managed to create the Start column by giving each row the minimum SeenAt value for its ID, but I can't find a way to update the End column every time.
Don't mind the duplicates; I have other columns that change in every new row.
Also, I am working in Impala but I can use Oracle.
I hope that I have been clear enough. Thank you.
You could use lead() and nvl():
select id, min(seenat) over (partition by id) seen_start,
nvl(lead(seenat) over (partition by id order by seenat), seenat) seen_end
from t
demo
Start is easy: it's just the MIN of the group.
For End, you need to find the smallest SeenAt after the current one and, if there isn't one, use the current SeenAt.
SQL DEMO
SELECT "ID",
(SELECT MIN("SeenAt")
FROM Table1 t2
WHERE t1."ID" = t2."ID") as "Start",
COALESCE(
(SELECT MIN("SeenAt")
FROM Table1 t2
WHERE t1."ID" = t2."ID"
AND t1."SeenAt" < t2."SeenAt")
, t1."SeenAt"
) as End
FROM Table1 t1
OUTPUT
| ID | START | END |
|----|-------|-----|
| 1 | 20 | 21 |
| 1 | 20 | 22 |
| 1 | 20 | 22 |
| 2 | 70 | 80 |
| 2 | 70 | 80 |
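Both the LEAD() answer and the correlated-subquery answer can be checked side by side on the question's data. This sketch uses Python's sqlite3 module and assumes SQLite 3.25+ for window functions. Because SQLite does not provide NVL, COALESCE stands in for it. The table is named t, with lowercase column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, seenat INTEGER);
    INSERT INTO t VALUES (1, 20), (1, 21), (1, 22), (2, 70), (2, 80);
""")

# Approach 1: window functions; COALESCE plays the role of NVL
LEAD_QUERY = """
    SELECT id,
           MIN(seenat) OVER (PARTITION BY id) AS seen_start,
           COALESCE(LEAD(seenat) OVER (PARTITION BY id ORDER BY seenat), seenat) AS seen_end
    FROM t
    ORDER BY id, seenat
"""

# Approach 2: correlated subqueries; End is the smallest later SeenAt for the
# same id, falling back to the row's own SeenAt when there is none
SUBQUERY_QUERY = """
    SELECT t1.id,
           (SELECT MIN(t2.seenat) FROM t AS t2 WHERE t2.id = t1.id) AS seen_start,
           COALESCE((SELECT MIN(t2.seenat) FROM t AS t2
                     WHERE t2.id = t1.id AND t2.seenat > t1.seenat),
                    t1.seenat) AS seen_end
    FROM t AS t1
    ORDER BY t1.id, t1.seenat
"""

lead_rows = conn.execute(LEAD_QUERY).fetchall()
subquery_rows = conn.execute(SUBQUERY_QUERY).fetchall()
print(lead_rows)
```

Both queries produce the expected Start/End table. The window version reads the table once, whereas the correlated version runs its subqueries once per row.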
You seem to need the MIN() analytic function with a self-join:
select distinct t1.ID,
min(t1.SeenAt) over (partition by t1.ID order by t1.ID) as "Start",
t2.SeenAt as "End"
from tab t1
join tab t2 on t1.ID=t2.ID and t1.SeenAt<=t2.SeenAt
order by t2.SeenAt;
Demo

Max value from joined table

I have two tables:
Operations (op_id,super,name,last)
Orders (or_id,number)
Operations:
+-------+-------+-------------+------+
| op_id | super | name        | last |
+-------+-------+-------------+------+
| 1     | 1     | OperationXX | 1    |
| 2     | 1     | OperationXY | 2    |
| 3     | 1     | OperationXC | 4    |
| 4     | 1     | OperationXZ | 3    |
| 5     | 2     | OperationXX | 1    |
| 6     | 3     | OperationXY | 2    |
| 7     | 4     | OperationXC | 1    |
| 8     | 4     | OperationXZ | 2    |
+-------+-------+-------------+------+
Orders:
+-------+--------+
| or_id | number |
+-------+--------+
| 1     | 2UY    |
| 2     | 23X    |
| 3     | xx2    |
| 4     | 121    |
+-------+--------+
I need a query that returns this table:
+-------+--------+-----------+-------------+
| or_id | number | max(last) | name        |
+-------+--------+-----------+-------------+
| 1     | 2UY    | 4         | OperationXC |
| 2     | 23X    | 1         | OperationXX |
| 3     | xx2    | 2         | OperationXY |
| 4     | 121    | 2         | OperationXZ |
+-------+--------+-----------+-------------+
Use a correlated subquery and a join:
select o.*, a.last, a.name
from
(
  select t.super, t.name, t.last
  from operations t
  where t.last = (select max(t2.last) from operations t2 where t2.super = t.super)
) a
join orders o on a.super = o.or_id
You can use ROW_NUMBER() as well:
with cte as
(
select * from
(
select * , row_number() over(partition by super order by last desc) rn
from operations
) tt where rn=1
) select o.*,cte.last,cte.name from Orders o join cte on o.or_id=cte.super
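Here is a small sketch that runs a correlated-subquery version and the ROW_NUMBER() version on the question's data, assuming Python's sqlite3 module with SQLite 3.25+. The column last is double-quoted defensively, since LAST is also an SQL keyword (as in NULLS LAST).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operations (op_id INTEGER, super INTEGER, name TEXT, "last" INTEGER);
    CREATE TABLE orders (or_id INTEGER, number TEXT);
    INSERT INTO operations VALUES
        (1, 1, 'OperationXX', 1), (2, 1, 'OperationXY', 2), (3, 1, 'OperationXC', 4),
        (4, 1, 'OperationXZ', 3), (5, 2, 'OperationXX', 1), (6, 3, 'OperationXY', 2),
        (7, 4, 'OperationXC', 1), (8, 4, 'OperationXZ', 2);
    INSERT INTO orders VALUES (1, '2UY'), (2, '23X'), (3, 'xx2'), (4, '121');
""")

# Keep only operations whose "last" equals the maximum for their super
CORRELATED = """
    SELECT o.or_id, o.number, a."last", a.name
    FROM (SELECT t.super, t.name, t."last"
          FROM operations AS t
          WHERE t."last" = (SELECT MAX(t2."last") FROM operations AS t2
                            WHERE t2.super = t.super)) AS a
    JOIN orders AS o ON a.super = o.or_id
    ORDER BY o.or_id
"""

# Rank operations per super by "last" descending and keep rank 1
RANKED = """
    WITH cte AS (
        SELECT super, name, "last",
               ROW_NUMBER() OVER (PARTITION BY super ORDER BY "last" DESC) AS rn
        FROM operations
    )
    SELECT o.or_id, o.number, cte."last", cte.name
    FROM orders AS o
    JOIN cte ON o.or_id = cte.super
    WHERE cte.rn = 1
    ORDER BY o.or_id
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
ranked_rows = conn.execute(RANKED).fetchall()
print(ranked_rows)
```

The two approaches differ only on ties. If two operations of the same super share the highest last, the correlated subquery returns all of them, while ROW_NUMBER() keeps just one.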
SELECT Orders.or_id, Orders.number, Operations.name, Operations.last AS max
FROM Orders
INNER JOIN Operations on Operations.super = Orders.or_id
GROUP BY Orders.or_id, Orders.number, Operations.name;
I don't have a way of testing this right now, but I think this is it.
Also, you didn't specify the foreign key, so the join might be wrong.

Better way of writing my SQL query with conditional group by

Here's my data
|vendorname |total|
---------------------
|Najla |10 |
|Disney |20 |
|Disney |10 |
|ToysRus |5 |
|ToysRus |1 |
|Gap |1 |
|Gap |2 |
|Gap |3 |
|Najla |2 |
Here's the resultset I want
|vendorname |grandtotal|
---------------------
|Disney |30 |
|Gap |6 |
|ToysRus |6 |
|Najla |2 |
|Najla |10 |
If the vendorname = 'Najla' I want individual rows with their respective total otherwise I would like to group them and return a sum of their totals.
This is my query:
select *
from
(
select vendorname, sum(total) grandtotal
from vendor
where vendorname<>'Najla'
group by vendorname
union all
select vendorname, total grandtotal
from vendor
where vendorname='Najla'
) A
I was wondering if there's a better way to write this query instead of repeating it and performing a union. Is there a condensed way to group some rows "conditionally"?
Honestly, I think the union all version is going to be the best performing and easiest to read option if it has appropriate indexes.
You could, however, do something like this (assuming you have a unique id on your table):
select vendorname, sum(total) grandtotal
from t
group by
vendorname
, case when vendorname = 'Najla' then id else null end
rextester demo: http://rextester.com/OGZQ33364
returns
+------------+------------+
| vendorname | grandtotal |
+------------+------------+
| Disney | 30 |
| Gap | 6 |
| ToysRus | 6 |
| Najla | 10 |
| Najla | 2 |
+------------+------------+
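The conditional GROUP BY can be reproduced with Python's sqlite3 module. As the answer assumes, the table needs a unique id; here that is an INTEGER PRIMARY KEY, and the table is named vendor as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor (id INTEGER PRIMARY KEY, vendorname TEXT, total INTEGER);
    INSERT INTO vendor (vendorname, total) VALUES
        ('Najla', 10), ('Disney', 20), ('Disney', 10), ('ToysRus', 5), ('ToysRus', 1),
        ('Gap', 1), ('Gap', 2), ('Gap', 3), ('Najla', 2);
""")

# Najla rows get their own id as an extra grouping key, so each stays separate;
# every other row gets NULL there and is grouped by vendorname alone
QUERY = """
    SELECT vendorname, SUM(total) AS grandtotal
    FROM vendor
    GROUP BY vendorname,
             CASE WHEN vendorname = 'Najla' THEN id ELSE NULL END
"""
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```

This works because GROUP BY places all NULL keys in a single group, so for non-Najla vendors the CASE expression adds nothing to the grouping.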

SQL Insert Query For Multiple Max IDs

Table w:
|ID|Comment|SeqID|
|1 |bajg | 1 |
|1 |2423 | 2 |
|2 |ref | 1 |
|2 |comment| 2 |
|2 |juk | 3 |
|3 |efef | 1 |
|4 | hy | 1 |
|4 | 6u | 2 |
How do I insert a standard new comment for each ID with a new SeqID (that ID's highest SeqID plus 1)?
The query below returns the row with the highest SeqID:
Select *
From w
Where SEQID =
(select max(seqid)
from w)
Table w:
|2 |juk | 3 |
Expected Result
Table w:
|ID|Comment|SeqID|
|1 |sqc | 3 |
|2 |sqc | 4 |
|3 |sqc | 2 |
|4 |sqc | 3 |
Will I have to go through and insert each of the values I want (with sqc as the new comment) into the table using the statement below, or is there a faster way?
INSERT INTO table_name
VALUES (value1,value2,value3,...);
Try this:
INSERT INTO mytable (ID, Comment, SeqID)
SELECT ID, 'sqc', MAX(SeqID) + 1
FROM mytable
GROUP BY ID
Demo here
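The INSERT ... SELECT answer can be exercised on the question's table w using Python's sqlite3 module. SQLite is an assumption here, standing in for whatever database the question targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE w (ID INTEGER, Comment TEXT, SeqID INTEGER);
    INSERT INTO w VALUES
        (1, 'bajg', 1), (1, '2423', 2),
        (2, 'ref', 1), (2, 'comment', 2), (2, 'juk', 3),
        (3, 'efef', 1),
        (4, 'hy', 1), (4, '6u', 2);
""")

# One new row per ID, numbered one past that ID's current highest SeqID
conn.execute("""
    INSERT INTO w (ID, Comment, SeqID)
    SELECT ID, 'sqc', MAX(SeqID) + 1
    FROM w
    GROUP BY ID
""")
conn.commit()

new_rows = conn.execute(
    "SELECT ID, Comment, SeqID FROM w WHERE Comment = 'sqc' ORDER BY ID"
).fetchall()
print(new_rows)
```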
You are probably better off just calculating the value when you query. Define an identity column on the table, say CommentId, and run a query like:
select id, comment,
row_number() over (partition by id order by CommentId) as SeqId
from t;
What is nice about this approach is that the ids are always sequential, there is no opportunity for duplicates, the table does not have to be locked when inserting, and the sequential ids work even for updates and deletes.
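To illustrate the query-time alternative, here is a sketch in SQLite via Python's sqlite3 module. SQLite has no IDENTITY keyword, so an INTEGER PRIMARY KEY column stands in for the identity column. The table name t is hypothetical. The sketch deletes one comment to show that the computed SeqId stays gapless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- INTEGER PRIMARY KEY is auto-assigned on insert, playing the identity role
    CREATE TABLE t (CommentId INTEGER PRIMARY KEY, id INTEGER, comment TEXT);
    INSERT INTO t (id, comment) VALUES
        (1, 'bajg'), (1, '2423'), (2, 'ref'), (2, 'comment'), (2, 'juk'),
        (3, 'efef'), (4, 'hy'), (4, '6u');
    -- delete a middle comment: CommentId now has a gap for id 2
    DELETE FROM t WHERE comment = 'comment';
""")

# SeqId is derived at query time, so it stays 1, 2, 3, ... within each id
QUERY = """
    SELECT id, comment,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY CommentId) AS SeqId
    FROM t
    ORDER BY id, SeqId
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```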