Partition by multiple columns written in rows from another table - sql

Let's say I have this table:
+------+-------+-------+-------+-------+--------+
| User | Value | Rule1 | Rule2 | Rule3 | Rule4 |
+------+-------+-------+-------+-------+--------+
| 1 | 10 | 20 | 14 | 15 | 22 |
| 2 | 5 | 20 | 7 | 8 | 25 |
+------+-------+-------+-------+-------+--------+
I want to do a partition by query, like that:
SELECT sum(Value) OVER (PARTITION BY Rule1, Rule2)
FROM MY_TABLE
but I don't want to hard-code "Rule1, Rule2". I want to read the column names from a table like this:
+----+-------+
| ID | Rules |
+----+-------+
| 1 | Rule1 |
| 1 | Rule2 |
| 2 | Rule1 |
| 2 | Rule2 |
| 2 | Rule3 |
| 3 | Rule2 |
| 3 | Rule3 |
| 3 | Rule4 |
+----+-------+
so that I can write something like this:
SELECT sum(Value) OVER (PARTITION BY "all rules separated by comma where ID = 1")
FROM MY_TABLE
Is this possible? Does anyone have a better alternative?
Thanks in advance!

Is this what you want?
select t.*,
       sum(value) over (
           partition by
               -- a rule that is not listed for id = 1 yields NULL for every row,
               -- so it does not split the partition
               (case when exists (select 1 from tablelikethis tlt where tlt.id = 1 and tlt.rules = 'Rule1') then t.rule1 end),
               (case when exists (select 1 from tablelikethis tlt where tlt.id = 1 and tlt.rules = 'Rule2') then t.rule2 end),
               (case when exists (select 1 from tablelikethis tlt where tlt.id = 1 and tlt.rules = 'Rule3') then t.rule3 end),
               (case when exists (select 1 from tablelikethis tlt where tlt.id = 1 and tlt.rules = 'Rule4') then t.rule4 end)
       ) as value_sum
from my_table t
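If dynamic SQL is acceptable, the column list can also be assembled by the calling program. The following is only a sketch under several assumptions: it uses Python's sqlite3 module as a stand-in for the real database (SQLite 3.25 or later is needed for window functions), the function name sums_for_rule_set is made up, and rule set ID 4 (Rule1 only) is a hypothetical row added for illustration. Column names read from the table are checked against a whitelist before they are spliced into the statement.

```python
import sqlite3

# Columns that may appear in PARTITION BY; anything else is rejected so that
# table content can never inject arbitrary SQL into the statement.
ALLOWED_RULES = ("Rule1", "Rule2", "Rule3", "Rule4")


def sums_for_rule_set(conn, rule_set_id):
    """Return (User, Value, partition sum) rows for one rule set ID."""
    rules = [row[0] for row in conn.execute(
        "SELECT Rules FROM tablelikethis WHERE ID = ? ORDER BY Rules",
        (rule_set_id,))]
    if not rules or any(rule not in ALLOWED_RULES for rule in rules):
        raise ValueError(f"unknown or empty rule set: {rule_set_id}")
    partition_cols = ", ".join(rules)
    sql = ('SELECT "User", Value, '
           f'SUM(Value) OVER (PARTITION BY {partition_cols}) AS value_sum '
           'FROM MY_TABLE ORDER BY "User"')
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MY_TABLE ("User" INTEGER, Value INTEGER,
                           Rule1 INTEGER, Rule2 INTEGER,
                           Rule3 INTEGER, Rule4 INTEGER);
    INSERT INTO MY_TABLE VALUES (1, 10, 20, 14, 15, 22), (2, 5, 20, 7, 8, 25);
    CREATE TABLE tablelikethis (ID INTEGER, Rules TEXT);
    -- ID 1 comes from the question; ID 4 is a hypothetical extra rule set
    INSERT INTO tablelikethis VALUES (1, 'Rule1'), (1, 'Rule2'), (4, 'Rule1');
""")

by_rule1_and_rule2 = sums_for_rule_set(conn, 1)  # rows differ on Rule2
by_rule1_only = sums_for_rule_set(conn, 4)       # both rows share Rule1 = 20
```

With the sample data, partitioning by Rule1 and Rule2 keeps the two users apart, while partitioning by Rule1 alone puts them in the same partition.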

Related

Each rows to column values

I'm trying to create a view that shows the first table's columns plus the first 3 records of the second table, sorted by date, in a single row.
I tried selecting specific rows from the sub table with OFFSET and joining them to the main table. The joined result is ordered by date, but without a
WHERE tblMain_id = ..
clause in the joined SQL it returns the wrong record.
Here is sqlfiddle example: sqlfiddle demo
tblMain
| id | fname | lname | salary |
+----+-------+-------+--------+
| 1 | John | Doe | 1000 |
| 2 | Bob | Ross | 5000 |
| 3 | Carl | Sagan | 2000 |
| 4 | Daryl | Dixon | 3000 |
tblSub
| id | email | emaildate | tblmain_id |
+----+-----------------+------------+------------+
| 1 | John#Doe1.com | 2019-01-01 | 1 |
| 2 | John#Doe2.com | 2019-01-02 | 1 |
| 3 | John#Doe3.com | 2019-01-03 | 1 |
| 4 | Bob#Ross1.com | 2019-02-01 | 2 |
| 5 | Bob#Ross2.com | 2018-12-01 | 2 |
| 6 | Carl#Sagan.com | 2019-10-01 | 3 |
| 7 | Daryl#Dixon.com | 2019-11-01 | 4 |
View I am trying to achieve:
| id | fname | lname | salary | email_1 | emaildate_1 | email_2 | emaildate_2 | email_3 | emaildate_3 |
+----+-------+-------+--------+---------------+-------------+---------------+-------------+---------------+-------------+
| 1 | John | Doe | 1000 | John#Doe1.com | 2019-01-01 | John#Doe2.com | 2019-01-02 | John#Doe3.com | 2019-01-03 |
View I have created
| id | fname | lname | salary | email_1 | emaildate_1 | email_2 | emaildate_2 | email_3 | emaildate_3 |
+----+-------+-------+--------+---------+-------------+---------------+-------------+---------------+-------------+
| 1 | John | Doe | 1000 | (null) | (null) | John#Doe1.com | 2019-01-01 | John#Doe2.com | 2019-01-02 |
You can use conditional aggregation:
select m.id, m.fname, m.lname, m.salary,
       max(s.email) filter (where seqnum = 1) as email_1,
       max(s.emailDate) filter (where seqnum = 1) as emailDate_1,
       max(s.email) filter (where seqnum = 2) as email_2,
       max(s.emailDate) filter (where seqnum = 2) as emailDate_2,
       max(s.email) filter (where seqnum = 3) as email_3,
       max(s.emailDate) filter (where seqnum = 3) as emailDate_3
from tblMain m left join
     (select s.*,
             -- desc numbers the newest e-mail first; drop it to list the oldest first
             row_number() over (partition by tblMain_id order by emailDate desc) as seqnum
      from tblsub s
     ) s
     on s.tblMain_id = m.id
where m.id = 1
group by m.id, m.fname, m.lname, m.salary;
Here is a SQL Fiddle.
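For readers who want to try the pattern locally, here is a small sketch that runs the same idea (number the e-mails per main row, then pivot with conditional aggregation) against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in chosen for convenience and needs version 3.25 or later for ROW_NUMBER(). The sketch uses CASE expressions instead of FILTER, orders ascending so the oldest e-mail comes first, and loads just two of the main rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblMain (id INTEGER, fname TEXT, lname TEXT, salary INTEGER);
    INSERT INTO tblMain VALUES (1, 'John', 'Doe', 1000), (2, 'Bob', 'Ross', 5000);
    CREATE TABLE tblSub (id INTEGER, email TEXT, emaildate TEXT, tblmain_id INTEGER);
    INSERT INTO tblSub VALUES
        (1, 'John#Doe1.com', '2019-01-01', 1),
        (2, 'John#Doe2.com', '2019-01-02', 1),
        (3, 'John#Doe3.com', '2019-01-03', 1),
        (4, 'Bob#Ross1.com', '2019-02-01', 2),
        (5, 'Bob#Ross2.com', '2018-12-01', 2);
""")

pivot_sql = """
SELECT m.id, m.fname,
       -- one output column per sequence number; missing e-mails stay NULL
       MAX(CASE WHEN s.seqnum = 1 THEN s.email END) AS email_1,
       MAX(CASE WHEN s.seqnum = 2 THEN s.email END) AS email_2,
       MAX(CASE WHEN s.seqnum = 3 THEN s.email END) AS email_3
FROM tblMain m
LEFT JOIN (SELECT e.*,
                  ROW_NUMBER() OVER (PARTITION BY e.tblmain_id
                                     ORDER BY e.emaildate) AS seqnum
           FROM tblSub e) s
       ON s.tblmain_id = m.id
GROUP BY m.id, m.fname
ORDER BY m.id
"""
rows = conn.execute(pivot_sql).fetchall()
```

Because the numbering restarts for every tblmain_id, no WHERE clause on a single id is needed to keep each person's e-mails in the right row.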
Here is a solution that should get you what you expect.
This works by first ranking the records within each table and joining them together. Then, the outer query uses aggregation to generate the expected output.
This solution will work even if the first record in the main table does not have id 1. Also, the filtering occurs within the JOINs, so this should be quite efficient.
SELECT
    m.id,
    m.fname,
    m.lname,
    m.salary,
    MAX(CASE WHEN s.rn = 1 THEN s.email END) email_1,
    MAX(CASE WHEN s.rn = 1 THEN s.emaildate END) email_date1,
    MAX(CASE WHEN s.rn = 2 THEN s.email END) email_2,
    MAX(CASE WHEN s.rn = 2 THEN s.emaildate END) email_date2,
    MAX(CASE WHEN s.rn = 3 THEN s.email END) email_3,
    MAX(CASE WHEN s.rn = 3 THEN s.emaildate END) email_date3
FROM
    (
        -- rank the main rows so that the first one can be picked without knowing its id
        SELECT t.*, ROW_NUMBER() OVER(ORDER BY id) rn
        FROM tblMain t
    ) m
    INNER JOIN (
        SELECT
            tblmain_id,
            email,
            emaildate,
            -- number the e-mails per main row, oldest first
            ROW_NUMBER() OVER(PARTITION BY tblmain_id ORDER BY emaildate) rn
        FROM tblSub
    ) s
        ON m.id = s.tblmain_id
        AND m.rn = 1
        AND s.rn <= 3
GROUP BY
    m.id,
    m.fname,
    m.lname,
    m.salary

How to get max sequence number?

This is a continuation of my previous post : here
I have this query:
SELECT INVOICE_NUMBER, INVOICE_SEQ_NUMBER, FILE_NUMBER
FROM (SELECT A.INVOICE_NUMBER, A.INVOICE_SEQ_NUMBER, B.FILE_NUMBER,
DENSE_RANK() OVER (ORDER BY A.INVOICE_NUMBER) as seqnum
FROM TABLE1 A JOIN
TABLE2 B
ON A.INVOICE_NUMBER = B.INVOICE_NUMBER AND
A.INVOICE_SEQ_NUMBER = B.INVOICE_SEQ_NUMBER
) t
WHERE seqnum <= 3;
And this result:
-----------------------------------------------------
| INVOICE_NUMBER | INVOICE_SEQ_NUMBER | FILE_NUMBER |
------------------------------------------------------
|1111111111-1 | 1 | P4324324525 |
-----------------------------------------------------
|1111111111-1 | 2 | P4565674574 |
-----------------------------------------------------
|1111111111-1 | 3 | V4324552557 |
-----------------------------------------------------
|1111111111-1 | 4 | V4324552525 |
-----------------------------------------------------
|2222222222-2 | 1 | S4563636574 |
-----------------------------------------------------
|3333333333-3 | 1 | Q4324325675 |
-----------------------------------------------------
|3333333333-3 | 2 | Q4565674574 |
-----------------------------------------------------
So the new requirement is how do I get the maximum invoice sequence number for the same invoice number? The result should be like this:
------------------------------------------------------------------------
| INVOICE_NUMBER | INVOICE_SEQ_NUMBER | FILE_NUMBER |MAX_INV_SEQ_NUMBER|
------------------------------------------------------------------------
|1111111111-1 | 1 | P4324324525 | 4 |
------------------------------------------------------------------------
|1111111111-1 | 2 | P4565674574 | 4 |
------------------------------------------------------------------------
|1111111111-1 | 3 | V4324552557 | 4 |
------------------------------------------------------------------------
|1111111111-1 | 4 | V4324552525 | 4 |
------------------------------------------------------------------------
|2222222222-2 | 1 | S4563636574 | 1 |
------------------------------------------------------------------------
|3333333333-3 | 1 | Q4324325675 | 2 |
------------------------------------------------------------------------
|3333333333-3 | 2 | Q4565674574 | 2 |
------------------------------------------------------------------------
SELECT INVOICE_NUMBER, INVOICE_SEQ_NUMBER, FILE_NUMBER, MAX(INVOICE_SEQ_NUMBER) OVER (PARTITION BY INVOICE_NUMBER) AS MAX_INV_SEQ_NUMBER
FROM (SELECT A.INVOICE_NUMBER, A.INVOICE_SEQ_NUMBER, B.FILE_NUMBER,
DENSE_RANK() OVER (ORDER BY A.INVOICE_NUMBER) as seqnum
FROM TABLE1 A JOIN
TABLE2 B
ON A.INVOICE_NUMBER = B.INVOICE_NUMBER AND
A.INVOICE_SEQ_NUMBER = B.INVOICE_SEQ_NUMBER
) t
WHERE seqnum <= 3;
Essentially, you just need this in your select statement:
MAX(INVOICE_SEQ_NUMBER) OVER (PARTITION BY INVOICE_NUMBER)
Add the following expression to the select list:
, max(INVOICE_SEQ_NUMBER) over (partition by INVOICE_NUMBER) as MAX_INV_SEQ_NUMBER
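To see the window expression in action, the sketch below loads the seven result rows from the question into a single SQLite table and adds the extra column. The table name joined_invoices is hypothetical (it stands in for the TABLE1/TABLE2 join), and SQLite via Python's sqlite3 module is just a convenient stand-in that needs version 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE joined_invoices (
        INVOICE_NUMBER TEXT, INVOICE_SEQ_NUMBER INTEGER, FILE_NUMBER TEXT)
""")
conn.executemany(
    "INSERT INTO joined_invoices VALUES (?, ?, ?)",
    [
        ("1111111111-1", 1, "P4324324525"),
        ("1111111111-1", 2, "P4565674574"),
        ("1111111111-1", 3, "V4324552557"),
        ("1111111111-1", 4, "V4324552525"),
        ("2222222222-2", 1, "S4563636574"),
        ("3333333333-3", 1, "Q4324325675"),
        ("3333333333-3", 2, "Q4565674574"),
    ],
)

rows = conn.execute("""
    SELECT INVOICE_NUMBER, INVOICE_SEQ_NUMBER, FILE_NUMBER,
           -- the aggregate is evaluated per invoice but every row is kept
           MAX(INVOICE_SEQ_NUMBER) OVER (PARTITION BY INVOICE_NUMBER)
               AS MAX_INV_SEQ_NUMBER
    FROM joined_invoices
    ORDER BY INVOICE_NUMBER, INVOICE_SEQ_NUMBER
""").fetchall()

max_per_row = [row[3] for row in rows]
```

Unlike GROUP BY, the window form does not collapse the rows, which is why each invoice line still appears with the maximum repeated next to it. Counting the rows per invoice would give the same numbers only as long as the sequence numbers have no gaps.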
Add an extra column to the select list, as below:
SELECT
    INVOICE_NUMBER,
    INVOICE_SEQ_NUMBER,
    FILE_NUMBER,
    (
        -- MAX rather than COUNT(*): a count matches the maximum only when the sequence has no gaps
        SELECT MAX(A.INVOICE_SEQ_NUMBER)
        FROM TABLE1 A
        JOIN TABLE2 B
            ON A.INVOICE_NUMBER = B.INVOICE_NUMBER
            AND A.INVOICE_SEQ_NUMBER = B.INVOICE_SEQ_NUMBER
            AND A.INVOICE_NUMBER = t.INVOICE_NUMBER
    ) MAX_INV_SEQ_NUMBER
FROM ........

Oracle - Select row where desired column contains only one specific type of data

I have two tables.
Table 1
+--------+--------+
| LC | STATUS |
+--------+--------+
| 010051 | 6 |
+--------+--------+
| 010071 | 2 |
+--------+--------+
| 010048 | 2 |
+--------+--------+
| 010113 | 2 |
+--------+--------+
| 010125 | 2 |
+--------+--------+
Table 2
+--------+-------------+-----------+------------+--------+
| LC | BILL | LAST_BILL | PAYMENT_BY | STATUS |
+--------+-------------+-----------+------------+--------+
| 010125 | BILL/17/001 | 0 | C | 6 |
+--------+-------------+-----------+------------+--------+
| 010125 | BILL/17/002 | 0 | I | 1 |
+--------+-------------+-----------+------------+--------+
| 010125 | BILL/17/003 | 0 | F | 1 |
+--------+-------------+-----------+------------+--------+
| 010125 | BILL/17/004 | 0 | C | 6 |
+--------+-------------+-----------+------------+--------+
| 010113 | BILL/17/005 | 0 | C | 6 |
+--------+-------------+-----------+------------+--------+
| 010113 | BILL/17/006 | 0 | I | 1 |
+--------+-------------+-----------+------------+--------+
| 010048 | BILL/17/007 | 0 | C | 6 |
+--------+-------------+-----------+------------+--------+
| 010071 | BILL/17/008 | 0 | C | 6 |
+--------+-------------+-----------+------------+--------+
I only want the LCs whose PAYMENT_BY is always 'C'. If an LC has both 'C' and non-'C' values, I don't want that LC.
I've tried the following query, but I think an expert could do it in a better or more efficient way.
SELECT LC
FROM (SELECT T1.LC
FROM TABLE1 T1, TABLE2 T2
WHERE T1.STATUS = 2
AND T1.LC = T2.LC
AND T2.PAYMENT_BY = 'C'
AND LAST_BILL = 0
AND T2.STATUS = 6
MINUS
SELECT T1.LC
FROM TABLE1 T1, TABLE2 T2
WHERE T1.STATUS = 2
AND T1.LC = T2.LC
AND T2.PAYMENT_BY = 'I'
AND LAST_BILL = 0)
Query/Expected Result:
+--------+
| LC |
+--------+
| 010048 |
+--------+
| 010071 |
+--------+
You can do it with NOT EXISTS:
select t2.lc from table2 t2
where
t2.payment_by = 'C'
and
not exists (
select lc from table2
where lc = t2.lc and payment_by <> 'C'
)
If you want all the columns of table2, then:
select t2.* from table2 t2
..........................
select t.lc,
count(case when t.payment_by = 'C' THEN 1 else NULL end ) as count_c,
count(case when t.payment_by <> 'C' THEN 1 else NULL end ) as count_not_c
from table2 t
group by t.lc
having count(case when t.payment_by <> 'C' THEN 1 else NULL end ) < 1
demo
If I understand correctly, I think group by and having is the simplest query:
select t2.lc
from table2 t2
group by t2.lc
having min(t2.payment_by) = 'C' and max(t2.payment_by) = 'C';
This also has the advantage of returning each lc exactly once.
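The difference between the NOT EXISTS and the GROUP BY/HAVING variants can be checked with a small experiment. This is a sketch only: it uses SQLite through Python's sqlite3 module instead of Oracle, ignores the Table 1 status filter, and adds a hypothetical LC 010200 with two 'C' bills (not in the question's data) to make the duplicate behaviour visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (lc TEXT, bill TEXT, payment_by TEXT)")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [
        ("010125", "BILL/17/001", "C"), ("010125", "BILL/17/002", "I"),
        ("010125", "BILL/17/003", "F"), ("010125", "BILL/17/004", "C"),
        ("010113", "BILL/17/005", "C"), ("010113", "BILL/17/006", "I"),
        ("010048", "BILL/17/007", "C"), ("010071", "BILL/17/008", "C"),
        # hypothetical rows: an LC with two bills, both paid by 'C'
        ("010200", "BILL/17/009", "C"), ("010200", "BILL/17/010", "C"),
    ],
)

# Row-by-row filter: returns one row per qualifying bill.
not_exists_lcs = [row[0] for row in conn.execute("""
    SELECT t2.lc FROM table2 t2
    WHERE t2.payment_by = 'C'
      AND NOT EXISTS (SELECT 1 FROM table2 x
                      WHERE x.lc = t2.lc AND x.payment_by <> 'C')
    ORDER BY t2.lc
""")]

# Aggregated filter: returns one row per qualifying LC.
group_by_lcs = [row[0] for row in conn.execute("""
    SELECT t2.lc FROM table2 t2
    GROUP BY t2.lc
    HAVING MIN(t2.payment_by) = 'C' AND MAX(t2.payment_by) = 'C'
    ORDER BY t2.lc
""")]
```

Both queries agree on which LCs qualify. The NOT EXISTS form lists 010200 once per bill, so it would need DISTINCT to match the grouped result.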

Best Hive SQL query for this

I have 2 tables, something like this. I'm running a Hive query, and window functions seem pretty limited in Hive.
Table dept
id | name |
1 | a |
2 | b |
3 | c |
4 | d |
Table time (built by a heavy query, so the process becomes very slow if I need to join against another newly created copy of time)
id | date | first | last |
1 | 1992-01-01 | 1 | 1 |
2 | 1993-02-02 | 1 | 2 |
2 | 1993-03-03 | 2 | 1 |
3 | 1993-01-01 | 1 | 3 |
3 | 1994-01-01 | 2 | 2 |
3 | 1995-01-01 | 3 | 1 |
I need to retrieve something like this:
SELECT d.id,d.name,
t.date AS firstdate,
td.date AS lastdate
FROM dbo.dept d LEFT JOIN dbo.time t ON d.id=t.id AND t.first=1
LEFT JOIN time td ON d.id=td.id AND td.last=1
What is the most optimized way to do this?
Use a GROUP BY operation, which will be done in a single map-reduce job:
select id
,max(name) as name
,max(case when first = 1 then `date` end) as firstdate
,max(case when last = 1 then `date` end) as lastdate
from (select id
,null as name
,`date`
,first
,last
from time
where first = 1
or last = 1
union all
select id
,name
,null as `date`
,null as first
,null as last
from dept
) t
group by id
;
+----+------+------------+------------+
| id | name | firstdate | lastdate |
+----+------+------------+------------+
| 1 | a | 1992-01-01 | 1992-01-01 |
| 2 | b | 1993-02-02 | 1993-03-03 |
| 3 | c | 1993-01-01 | 1995-01-01 |
| 4 | d | (null) | (null) |
+----+------+------------+------------+
select d.id
,max(d.name) as name
,max(case when t.first = 1 then t.`date` end) as firstdate
,max(case when t.last = 1 then t.`date` end) as lastdate
from dept d left join
time t on d.id = t.id
and (t.first = 1 or t.last = 1)
group by d.id
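One detail worth checking with a LEFT JOIN like this is where the condition on the right-hand table goes. The sketch below compares both placements on the question's data. It runs on SQLite through Python's sqlite3 module rather than on Hive, so it only illustrates the join semantics, not Hive's execution plan, and the variable names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    INSERT INTO dept VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    CREATE TABLE "time" (id INTEGER, "date" TEXT, "first" INTEGER, "last" INTEGER);
    INSERT INTO "time" VALUES
        (1, '1992-01-01', 1, 1),
        (2, '1993-02-02', 1, 2), (2, '1993-03-03', 2, 1),
        (3, '1993-01-01', 1, 3), (3, '1994-01-01', 2, 2), (3, '1995-01-01', 3, 1);
""")

select_list = """
    SELECT d.id, MAX(d.name),
           MAX(CASE WHEN t."first" = 1 THEN t."date" END),
           MAX(CASE WHEN t."last" = 1 THEN t."date" END)
    FROM dept d
"""

# Condition in WHERE: rows without a match have NULLs, fail the test and vanish.
filtered_in_where = conn.execute(select_list + """
    LEFT JOIN "time" t ON d.id = t.id
    WHERE t."first" = 1 OR t."last" = 1
    GROUP BY d.id ORDER BY d.id
""").fetchall()

# Condition in ON: departments without a match survive with NULL dates.
filtered_in_on = conn.execute(select_list + """
    LEFT JOIN "time" t ON d.id = t.id AND (t."first" = 1 OR t."last" = 1)
    GROUP BY d.id ORDER BY d.id
""").fetchall()
```

With the condition in WHERE, department 4 disappears, because its joined columns are NULL and the predicate is not true for it. With the condition in ON, it is kept with empty dates, which matches the LEFT JOIN result the question asks for.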

Merging multiple rows according to an order

Suppose there are the following rows
| Id | MachineName | WorkerName | MachineState |
|----------------------------------------------|
| 1 | Alpha | Young | RUNNING |
| 1 | Beta | | STOPPED |
| 1 | Gamma | Foo | READY |
| 1 | Zeta | Zatta | |
| 2 | Guu | Niim | RUNNING |
| 2 | Yuu | Jaam | STOPPED |
| 2 | Nuu | | READY |
| 2 | Faah | Siim | |
| 3 | Iem | | RUNNING |
| 3 | Nyt | Fish | READY |
| 3 | Qwe | Siim | |
We want to merge these rows according to following priority :
STOPPED > RUNNING > READY > (null or empty)
If the row with the highest priority has a value for a column, that value should be used (only if it is not null). If it is null, a value from any other row should be used. The rows should be grouped by Id.
The correct output for the above input is :
| Id | MachineName | WorkerName | MachineState |
|----------------------------------------------|
| 1 | Beta | Foo | STOPPED |
| 2 | Yuu | Jaam | STOPPED |
| 3 | Iem | Fish | RUNNING |
What would be a good sql query to accomplish this? I tried using joins, but it did not work out.
You can view this as a case of the group-wise maximum problem, provided you can obtain a suitable ordering over your MachineState column—e.g. by using a CASE expression:
SELECT a.Id,
COALESCE(a.MachineName, t.MachineName) MachineName,
COALESCE(a.WorkerName , t.WorkerName ) WorkerName,
a.MachineState
FROM myTable a JOIN (
SELECT Id,
MIN(MachineName) AS MachineName,
MIN(WorkerName ) AS WorkerName,
MAX(CASE MachineState
WHEN 'READY' THEN 1
WHEN 'RUNNING' THEN 2
WHEN 'STOPPED' THEN 3
END) AS MachineState
FROM myTable
GROUP BY Id
) t ON t.Id = a.Id AND t.MachineState = CASE a.MachineState
WHEN 'READY' THEN 1
WHEN 'RUNNING' THEN 2
WHEN 'STOPPED' THEN 3
END
See it on sqlfiddle:
| id | machinename | workername | machinestate |
|----|-------------|------------|--------------|
| 1 | Beta | Foo | STOPPED |
| 2 | Yuu | Jaam | STOPPED |
| 3 | Iem | Fish | RUNNING |
You could save yourself the pain of using CASE if MachineState was an ENUM type column (defined in the appropriate order). It so happens in this case that a simple lexicographic ordering over the string value will yield the same result, but that's a coincidence on which you really shouldn't rely as it's bound to slip under the radar when someone tries to maintain this code in the future.
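On databases that support window functions, the same prioritisation can be written without the self-join. This is a swapped-in variant, not the query above: it ranks the rows of each Id with ROW_NUMBER() over the CASE ordering and fills gaps from MIN() computed over the whole group. The sketch runs on SQLite (3.25 or later) through Python's sqlite3 module purely for convenience, and stores the empty cells as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE myTable (Id INTEGER, MachineName TEXT,
                          WorkerName TEXT, MachineState TEXT)
""")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "Alpha", "Young", "RUNNING"), (1, "Beta", None, "STOPPED"),
        (1, "Gamma", "Foo", "READY"), (1, "Zeta", "Zatta", None),
        (2, "Guu", "Niim", "RUNNING"), (2, "Yuu", "Jaam", "STOPPED"),
        (2, "Nuu", None, "READY"), (2, "Faah", "Siim", None),
        (3, "Iem", None, "RUNNING"), (3, "Nyt", "Fish", "READY"),
        (3, "Qwe", "Siim", None),
    ],
)

merged = conn.execute("""
    SELECT Id,
           COALESCE(MachineName, any_machine) AS MachineName,
           COALESCE(WorkerName, any_worker) AS WorkerName,
           MachineState
    FROM (
        SELECT t.*,
               -- fallback values taken from any row of the same Id
               MIN(MachineName) OVER (PARTITION BY Id) AS any_machine,
               MIN(WorkerName) OVER (PARTITION BY Id) AS any_worker,
               -- rn = 1 marks the highest-priority state within each Id
               ROW_NUMBER() OVER (
                   PARTITION BY Id
                   ORDER BY CASE MachineState
                                WHEN 'STOPPED' THEN 3
                                WHEN 'RUNNING' THEN 2
                                WHEN 'READY' THEN 1
                                ELSE 0
                            END DESC) AS rn
        FROM myTable t
    ) ranked
    WHERE rn = 1
    ORDER BY Id
""").fetchall()
```

If two rows of the same Id share the top state, ROW_NUMBER() picks one of them arbitrarily, so an extra tie-breaking column in the ORDER BY would be needed for deterministic output.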
This is a prioritization query. One method uses variables. Another uses union all, which works if the states are not repeated for a given id:
select t.*
from myTable t
where machinestate = 'STOPPED'
union all
select t.*
from myTable t
where machinestate = 'RUNNING' and
      not exists (select 1 from myTable t2 where t2.id = t.id and t2.machinestate in ('STOPPED'))
union all
select t.*
from myTable t
where machinestate = 'READY' and
      not exists (select 1 from myTable t2 where t2.id = t.id and t2.machinestate in ('STOPPED', 'RUNNING'));
Change MachineState to an enum:
`MachineState` enum('READY','RUNNING','STOPPED') DEFAULT NULL
and then the SQL is simple:
select t.id, state.machinename, state.workername, t.mstate
from state,
     (select id, max(MachineState) mstate from state group by Id) t
where t.mstate = state.machinestate
  and t.id = state.id;