SQL: Self join a table to satisfy a particular condition?

I have the following table:
mysql> SELECT * FROM temp;
+----+------+
| id | a |
+----+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+----+------+
I am trying to get the following output:
+----+------+------+
| id | a | a |
+----+------+------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 4 |
+----+------+------+
but I am having a small problem. I wrote the following query:
mysql> SELECT A.id, A.a, B.a FROM temp A, temp B WHERE B.a>A.a;
but my output is the following:
+----+------+------+
| id | a | a |
+----+------+------+
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 4 |
| 3 | 3 | 4 |
+----+------+------+
Can someone tell me how to turn this into the desired output? I only want pairs of consecutive values: given that 2 is greater than 1 and 3 is greater than 2, I do not want the row that pairs 1 with 3.

Option 1: "Triangular Join" - Quadratic Complexity
SELECT A.id, A.a, MIN(B.a) AS a
FROM temp A
JOIN temp B ON B.a > A.a
GROUP BY A.id, A.a;
Option 2: "Pseudo Row_Number()" - Linear Complexity
select a_numbered.id, a_numbered.a, b_numbered.a
from
(
    select id,
           a,
           @rownum := @rownum + 1 as rn
    from temp
    join (select @rownum := 0) r
    order by id
) a_numbered join (
    select id,
           a,
           @rownum2 := @rownum2 + 1 as rn
    from temp
    join (select @rownum2 := 0) r
    order by id
) b_numbered
on b_numbered.rn = a_numbered.rn + 1
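Option 1 can be checked against the sample data. The sketch below is only an illustration: it runs the same query through Python's sqlite3 module on an in-memory database rather than MySQL, and the column alias next_a is made up for the example.

```python
import sqlite3

# In-memory stand-in for the temp table from the question (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp (id INTEGER, a INTEGER)")
conn.executemany("INSERT INTO temp VALUES (?, ?)", [(1, 1), (2, 2), (3, 3), (4, 4)])

# Option 1: join every row to all rows with a larger a, then keep only the
# smallest of those larger values per row. next_a is a hypothetical alias.
next_value_sql = """
    SELECT A.id, A.a, MIN(B.a) AS next_a
    FROM temp A
    JOIN temp B ON B.a > A.a
    GROUP BY A.id, A.a
    ORDER BY A.id
"""
pairs = conn.execute(next_value_sql).fetchall()
print(pairs)
```

The row with id 4 drops out because the inner join finds no larger value for it, which matches the desired output.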

Related

SQL: Get row number which increases every time a value changes

I have the following table in Vertica:
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 1 |
| b | 2 | 1 |
| b | 3 | 1 |
| c | 1 | 1 |
| c | 2 | 1 |
| c | 3 | 1 |
| c | 1 | 2 |
| c | 2 | 2 |
| c | 3 | 2 |
+----------+----------+----------+
The table is ordered by column_1 and column_3.
I would like to add a row number, which increases every time when column_1 or column_3 change their value. It would look something like this:
+----------+----------+----------+------------+
| column_1 | column_2 | column_3 | row_number |
+----------+----------+----------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 1 | 1 | 2 |
| b | 2 | 1 | 2 |
| b | 3 | 1 | 2 |
| c | 1 | 1 | 3 |
| c | 2 | 1 | 3 |
| c | 3 | 1 | 3 |
| c | 1 | 2 | 4 |
| c | 2 | 2 | 4 |
| c | 3 | 2 | 4 |
+----------+----------+----------+------------+
I tried using partition over but I can't find the right syntax.
Vertica has the CONDITIONAL_CHANGE_EVENT() analytic function.
It starts at 0 and increments by 1 every time the expression passed as its first argument changes from one row to the next.
Like so:
WITH
indata(column_1,column_2,column_3,rn) AS (
SELECT 'a',1,1,1
UNION ALL SELECT 'a',2,1,1
UNION ALL SELECT 'a',3,1,1
UNION ALL SELECT 'b',1,1,2
UNION ALL SELECT 'b',2,1,2
UNION ALL SELECT 'b',3,1,2
UNION ALL SELECT 'c',1,1,3
UNION ALL SELECT 'c',2,1,3
UNION ALL SELECT 'c',3,1,3
UNION ALL SELECT 'c',1,2,4
UNION ALL SELECT 'c',2,2,4
UNION ALL SELECT 'c',3,2,4
)
SELECT
*
, CONDITIONAL_CHANGE_EVENT(
column_1||column_3::VARCHAR
) OVER w + 1 AS rownum
FROM indata
WINDOW w AS (ORDER BY column_1,column_3,column_2)
;
-- out column_1 | column_2 | column_3 | rn | rownum
-- out ----------+----------+----------+----+--------
-- out a | 1 | 1 | 1 | 1
-- out a | 2 | 1 | 1 | 1
-- out a | 3 | 1 | 1 | 1
-- out b | 1 | 1 | 2 | 2
-- out b | 2 | 1 | 2 | 2
-- out b | 3 | 1 | 2 | 2
-- out c | 1 | 1 | 3 | 3
-- out c | 2 | 1 | 3 | 3
-- out c | 3 | 1 | 3 | 3
-- out c | 1 | 2 | 4 | 4
-- out c | 2 | 2 | 4 | 4
-- out c | 3 | 2 | 4 | 4
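For engines without CONDITIONAL_CHANGE_EVENT(), the same counting idea can be approximated with LAG() and a running SUM(). This is a hedged sketch of that emulation, not Vertica's implementation: it runs on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions), and the names flagged and changed are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indata (column_1 TEXT, column_2 INTEGER, column_3 INTEGER)")
# Same twelve rows as the question: groups (a,1), (b,1), (c,1), (c,2).
sample = [(c1, c2, c3)
          for c1, c3 in [("a", 1), ("b", 1), ("c", 1), ("c", 2)]
          for c2 in (1, 2, 3)]
conn.executemany("INSERT INTO indata VALUES (?, ?, ?)", sample)

# Flag each row whose key differs from the previous row's key, then take a
# running total of the flags. The first row has no predecessor (LAG is NULL),
# so it falls into the ELSE branch and the counter starts at 1.
change_counter_sql = """
    WITH flagged AS (
        SELECT column_1, column_2, column_3,
               CASE WHEN (column_1 || column_3) =
                         LAG(column_1 || column_3)
                             OVER (ORDER BY column_1, column_3, column_2)
                    THEN 0 ELSE 1 END AS changed
        FROM indata
    )
    SELECT column_1, column_2, column_3,
           SUM(changed) OVER (ORDER BY column_1, column_3, column_2) AS rownum
    FROM flagged
    ORDER BY column_1, column_3, column_2
"""
result = conn.execute(change_counter_sql).fetchall()
for row in result:
    print(row)
```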
In the absence of an ORDER BY, SQL data sets are unordered. To establish the order in your example therefore, I've assumed the dataset can be sorted with ORDER BY column_1, column_3, column_2
If that assumption doesn't work, you MUST add additional columns that the data can be deterministically sorted by.
That gives the following query...
SELECT
yourTable.*,
DENSE_RANK() OVER (ORDER BY column_1, column_3) AS row_number
FROM
yourTable
ORDER BY
column_1, column_3, column_2
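The DENSE_RANK() query can be tried on the sample rows. The sketch below is an assumption-laden illustration: it uses SQLite via Python's sqlite3 module instead of Vertica (window functions need SQLite 3.25 or later), and it names the output column group_no, a made-up alias, to avoid reusing a function name as a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (column_1 TEXT, column_2 INTEGER, column_3 INTEGER)")
sample = [(c1, c2, c3)
          for c1, c3 in [("a", 1), ("b", 1), ("c", 1), ("c", 2)]
          for c2 in (1, 2, 3)]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", sample)

# DENSE_RANK gives every distinct (column_1, column_3) pair the next
# integer without gaps, so all rows of a pair share one number.
dense_rank_sql = """
    SELECT column_1, column_2, column_3,
           DENSE_RANK() OVER (ORDER BY column_1, column_3) AS group_no
    FROM yourTable
    ORDER BY column_1, column_3, column_2
"""
ranked = conn.execute(dense_rank_sql).fetchall()
for row in ranked:
    print(row)
```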
This would also work and doesn't require sorting the table:
1. Find the distinct (column_1, column_3) pairs and give each one a new index.
2. Join that result back to the original table on column_1 and column_3.
select t1.*, t2.row_number
from
your_table t1
join
(select column_1, column_3, row_number() over (order by column_1, column_3) as row_number from (select distinct column_1, column_3 from your_table) foo) t2
on
t1.column_1=t2.column_1 and t1.column_3=t2.column_3;

Cte within Cte in SQL

I have encountered a situation where I need to apply a WHERE and GROUP BY condition to the result of one CTE inside another CTE.
Table 1 is as follows:
+---+---+---+---+
| x | y | z | w |
+---+---+---+---+
| 1 | 2 | 3 | 1 |
| 2 | 3 | 4 | 2 |
| 3 | 2 | 5 | 3 |
| 1 | 2 | 6 | 2 |
+---+---+---+---+
Table 2 is as follows:
+---+---+-----+---+
| a | b | c | d |
+---+---+-----+---+
| 1 | m | 100 | 1 |
| 2 | n | 23 | 2 |
| 4 | o | 34 | 4 |
| 1 | m | 23 | 2 |
+---+---+-----+---+
Assume the result of the following SQL query is available as a table called TAB:
with cte as (
select x,y,z,w from table1),
cte1 as (select a,b,c,d from table2)
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c from cte left join cte1 on cte.x=cte1.a and cte.w=cte1.d
Result of above CTE would be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 2 | 3 | 4 | 2 | n | 23 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+
I would like to query the following from the table TAB
select * from TAB where (X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123)
I'm trying to formulate the SQL query as follows, but it does not work as I expected:
select * from (
with cte as (
select x,y,z,w from table1),
cte1 as (select a,b,c,d from table2)
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c from cte left join cte1 on cte.x=cte1.a) as TAB
where ((X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123))
The final result must be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+
I don't think DB2 allows CTEs in subqueries or to be nested. Why not just write this using another CTE?
with cte as (
    select x, y, z, w
    from table1
),
cte1 as (
    select a, b, c, d
    from table2
),
tab as (
    select cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
    from cte left join
         cte1
         on cte.x = cte1.a and cte.w = cte1.d
)
select *
from tab
-- group by the same expression that is selected, so the subquery is valid
where (x||b) in (select (x||b) from tab group by (x||b) having sum(c) = 123);
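To see the chained-CTE shape run end to end, here is a small sketch on the two sample tables. It makes several assumptions: it uses SQLite through Python's sqlite3 module rather than DB2, it selects w and d in the first two CTEs so the join condition can reference them, and it groups by x || b so that the subquery is valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y INTEGER, z INTEGER, w INTEGER);
    CREATE TABLE table2 (a INTEGER, b TEXT, c INTEGER, d INTEGER);
    INSERT INTO table1 VALUES (1,2,3,1), (2,3,4,2), (3,2,5,3), (1,2,6,2);
    INSERT INTO table2 VALUES (1,'m',100,1), (2,'n',23,2), (4,'o',34,4), (1,'m',23,2);
""")

# tab is a third CTE built on the first two, so the final SELECT can
# reference it twice: once as the outer source, once inside the IN subquery.
chained_cte_sql = """
    WITH cte AS (SELECT x, y, z, w FROM table1),
         cte1 AS (SELECT a, b, c, d FROM table2),
         tab AS (
             SELECT cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
             FROM cte
             LEFT JOIN cte1 ON cte.x = cte1.a AND cte.w = cte1.d
         )
    SELECT *
    FROM tab
    WHERE (x || b) IN (
        SELECT x || b FROM tab GROUP BY x || b HAVING SUM(c) = 123
    )
    ORDER BY z
"""
matching_rows = conn.execute(chained_cte_sql).fetchall()
print(matching_rows)
```

Only the two rows with x = 1 and b = 'm' survive, since their c values (100 and 23) are the only group summing to 123.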

TSQL - Referencing a changed value from previous row

I am trying to do a row-by-row calculation in which the largest value seen so far is carried forward to subsequent rows until a larger value appears. I am doing this by comparing the current value to the previous row using the LAG() function.
Code
DECLARE @TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO @TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
SELECT id
,d1
,d2 = CASE WHEN id <> (LAG(id,1,0) OVER (ORDER BY id,d1)) THEN d2
WHEN d2 < (LAG(d2,1,0) OVER (ORDER BY id,d1)) THEN (LAG(d2,1,0) OVER (ORDER BY id,d1))
ELSE d2 END
FROM @TAB
Output (added column od2, the original d2, for clarity)
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 4 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 3 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
As you can see from the output, the LAG function references the original value of the previous row rather than the newly computed value. Is there any way to achieve this?
Desired Output
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 5 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 4 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
Try this:
SELECT id
,d1
,d2 AS od2
,MAX(d2) OVER (PARTITION BY id ORDER BY d1) AS d2
FROM @TAB
The idea is to use the MAX to get the max value from the beginning to the current row for each partition.
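The running-maximum idea can be checked against the desired output. The following is a minimal sketch under stated assumptions: it uses SQLite via Python's sqlite3 module instead of SQL Server (window functions need SQLite 3.25 or later), a regular table named tab in place of the table variable, and a hypothetical alias running_max.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id TEXT, d1 INTEGER, d2 INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [("A", 0, 5), ("A", 1, 2), ("A", 2, 4), ("A", 3, 6),
     ("B", 0, 4), ("B", 2, 3), ("B", 3, 2), ("B", 4, 5)],
)

# With ORDER BY inside OVER, MAX covers the rows from the start of the
# partition up to the current row, so it acts as a running maximum per id.
running_max_sql = """
    SELECT id, d1, d2,
           MAX(d2) OVER (PARTITION BY id ORDER BY d1) AS running_max
    FROM tab
    ORDER BY id, d1
"""
carried = conn.execute(running_max_sql).fetchall()
for row in carried:
    print(row)
```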
Thanks for providing the DDL scripts and the DML.
One way of doing it is with a recursive CTE, as follows.
1. First number the records within each id, ordered by d1 and d2 (the "cte" block).
2. Seed the recursive CTE with the first record of each id (rnk=1).
3. For each following record, "compared_val" takes the larger of the previous record's compared_val and the current d2, so the larger value is carried forward.
DECLARE @TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO @TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
;with cte
as (select row_number() over(partition by id order by d1,d2) as rnk
,id,d1,d2
from @TAB
)
,data(rnk,id,d1,d2,compared_val)
as (select rnk,id,d1,d2,d2 as compared_val
from cte
where rnk=1
union all
select a.rnk,a.id,a.d1,a.d2,case when b.compared_val > a.d2 then
b.compared_val
else a.d2
end
from cte a
join data b
on a.id=b.id
and a.rnk=b.rnk+1
)
select * from data order by id,d1,d2

group by top two results based on order

I have been trying to get this to work with some row_number, group by, top, sort of things, but I am missing some fundamental concept. I have a table like so:
+-------+-------+-------+
| name | ord | f_id |
+-------+-------+-------+
| a | 1 | 2 |
| b | 5 | 2 |
| c | 6 | 2 |
| d | 2 | 1 |
| e | 4 | 1 |
| a | 2 | 3 |
| c | 50 | 4 |
+-------+-------+-------+
And my desired output would be:
+-------+---------+--------+-------+
| f_id | ord_n | ord | name |
+-------+---------+--------+-------+
| 2 | 1 | 1 | a |
| 2 | 2 | 5 | b |
| 1 | 1 | 2 | d |
| 1 | 2 | 4 | e |
| 3 | 1 | 2 | a |
| 4 | 1 | 50 | c |
+-------+---------+--------+-------+
Where data is ordered by the ord value, and only up to two results per f_id. Should I be working on a Stored Procedure for this or can I just do it with SQL? I have experimented with some select TOP subqueries, but nothing has even come close..
Here are some statements to create the test table:
create table help(name varchar(255),ord tinyint,f_id tinyint);
insert into help values
('a',1,2),
('b',5,2),
('c',6,2),
('d',2,1),
('e',4,1),
('a',2,3),
('c',50,4);
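You can use the RANK or DENSE_RANK functions. Note that if two rows in the same f_id tie on ord, filtering on RANK can return more than two rows for that f_id, whereas filtering on ord_n (ROW_NUMBER) returns at most two.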
select A.name, A.ord_n, A.ord , A.f_id from
(
select
RANK() OVER (partition by f_id ORDER BY ord asc) AS "Rank",
ROW_NUMBER() OVER (partition by f_id ORDER BY ord asc) AS "ord_n",
help.*
from help
) A where A.rank <= 2
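For a strict limit of two rows per f_id, the filter can be applied to the ROW_NUMBER() column instead of the rank. This is a sketch only, run on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later) rather than on the asker's database; the subquery alias numbered is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE help (name TEXT, ord INTEGER, f_id INTEGER)")
conn.executemany(
    "INSERT INTO help VALUES (?, ?, ?)",
    [("a", 1, 2), ("b", 5, 2), ("c", 6, 2), ("d", 2, 1),
     ("e", 4, 1), ("a", 2, 3), ("c", 50, 4)],
)

# Number the rows inside each f_id by ascending ord, then keep the first two.
top_two_sql = """
    SELECT f_id, ord_n, ord, name
    FROM (
        SELECT f_id, ord, name,
               ROW_NUMBER() OVER (PARTITION BY f_id ORDER BY ord) AS ord_n
        FROM help
    ) AS numbered
    WHERE ord_n <= 2
    ORDER BY f_id, ord_n
"""
top_two = conn.execute(top_two_sql).fetchall()
for row in top_two:
    print(row)
```

The rows come back sorted by f_id here, so the groups appear in a different order than in the question's desired output, but the rows themselves are the same.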

Sql: Aggregation First() After Order by and Group by

id | name | value | time |
--------------------------
1 | A | 1 | 1 |
2 | B | 2 | 2 |
3 | C | 2 | 3 |
4 | A | 3 | 3 |
5 | A | 4 | 2 |
and I expected the result as below:
name | value |
--------------
A | 3 |
B | 2 |
C | 2 |
The result should show, for each name, the value from the row with the latest time, with no duplicate names.
I tried this query:
SELECT name,First(value)
FROM
(SELECT name,value,time
FROM test
ORDER BY time DESC
)
GROUP BY name;
But I got this result:
name | value |
--------------
A | 1 |
B | 2 |
C | 2 |
I don't understand why the value for A isn't 3, because the subselect returns A's values in the order 3, 4, 1.
Query:
SELECT t.name,
(SELECT t1.value
FROM test t1
WHERE t1.name = t.name
ORDER BY t1.time DESC
LIMIT 1) AS value
FROM test t
GROUP BY t.name
Result:
| NAME | VALUE |
----------------
| A | 3 |
| B | 2 |
| C | 2 |
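You can also use ROW_NUMBER() with PARTITION BY, numbering the rows of each name from the latest time backwards and keeping the first one:
;with cte as (
select id, row_number() over (partition by name order by time desc) rn
from test
)
select test.* from test
join cte on test.id = cte.id and rn = 1
Just choose whichever one is faster.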
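The window-function variant can be verified on the sample rows. The sketch below is illustrative only and assumes SQLite 3.25 or later, driven through Python's sqlite3 module; the CTE name ranked is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, value INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 1), (2, "B", 2, 2), (3, "C", 2, 3), (4, "A", 3, 3), (5, "A", 4, 2)],
)

# rn = 1 marks the row with the latest time inside each name.
latest_sql = """
    WITH ranked AS (
        SELECT name, value,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY time DESC) AS rn
        FROM test
    )
    SELECT name, value
    FROM ranked
    WHERE rn = 1
    ORDER BY name
"""
latest = conn.execute(latest_sql).fetchall()
print(latest)
```

Unlike ordering a subquery and then grouping, the ordering here is part of the window definition, so the result does not depend on the order in which the engine happens to read the rows.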