I have to populate a Teradata table from another source, which can be simplified as follows:
+------+------+------------+------------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------------+------------+
| 1234 | 0 | 01/01/2009 | 01/04/2019 |
| 1234 | 3 | 01/01/2010 | 01/05/2020 |
| 2345 | 1 | 20/02/2013 | 01/04/2019 |
| 2345 | 0 | 20/02/2013 | 01/04/2018 |
| 2345 | 2 | 31/01/2009 | 01/04/2017 |
| 3456 | 0 | 01/01/2009 | 01/04/2019 |
| 3456 | 1 | 01/01/2015 | 01/04/2019 |
| 3456 | 1 | 01/01/2015 | 01/05/2017 |
| 3456 | 3 | 01/01/2015 | 01/04/2019 |
+------+------+------------+------------+
Col1 contains duplicates in the source, so we have rules for selecting the right row (Col1 must be unique in the final result).
For each value in Col1:
If the value is duplicated, select the row with the most recent date in Col3.
If (and only if) it is still duplicated, select the row with Col2 = 1.
If it is still duplicated, select the row with the most recent date in Col4.
Given the previous table, we should get the following result:
+------+------+------------+------------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------------+------------+
| 1234 | 3 | 01/01/2010 | 01/05/2020 |
| 2345 | 1 | 20/02/2013 | 01/04/2019 |
| 3456 | 1 | 01/01/2015 | 01/04/2019 |
+------+------+------------+------------+
I started using PARTITION BY to group the occurrences of each value in Col3, but I have no idea how to apply the conditions to each partition in an SQL query.
Thank you for your help
You can use QUALIFY in Teradata to simplify the syntax:
SELECT col1, col2, col3, col4
FROM mytable
QUALIFY ROW_NUMBER() OVER(
PARTITION BY col1 -- Group rows by "col1" values
ORDER BY col3 DESC, CASE WHEN col2 = 1 THEN 1 ELSE 2 END, col4 DESC -- Order rows
) = 1 -- Get "first" row in each group
Otherwise, this is the same as the ROW_NUMBER() subquery approach.
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by col1
order by col3 desc,
(case when col2 = 1 then 1 else 2 end),
col4 desc
) as seqnum
from t
) t
where seqnum = 1;
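The three rules can be checked end to end by running the ROW_NUMBER() query against the sample data. This is a minimal sketch with some assumptions. Python's built-in sqlite3 module stands in for Teradata, and SQLite 3.25 or later is needed for window functions. QUALIFY is replaced by the subquery form, because SQLite has no QUALIFY clause. The DD/MM/YYYY dates are stored as ISO YYYY-MM-DD text so that they sort chronologically. The table name `t` and the helper `dedupe` are illustrative.

```python
import sqlite3

# Sample rows from the question. Dates are converted from DD/MM/YYYY to
# ISO YYYY-MM-DD text so that SQLite's string ordering is chronological.
ROWS = [
    (1234, 0, "2009-01-01", "2019-04-01"),
    (1234, 3, "2010-01-01", "2020-05-01"),
    (2345, 1, "2013-02-20", "2019-04-01"),
    (2345, 0, "2013-02-20", "2018-04-01"),
    (2345, 2, "2009-01-31", "2017-04-01"),
    (3456, 0, "2009-01-01", "2019-04-01"),
    (3456, 1, "2015-01-01", "2019-04-01"),
    (3456, 1, "2015-01-01", "2017-05-01"),
    (3456, 3, "2015-01-01", "2019-04-01"),
]

# Subquery form of the QUALIFY query (SQLite has no QUALIFY clause).
QUERY = """
SELECT col1, col2, col3, col4
FROM (SELECT t.*,
             ROW_NUMBER() OVER (
                 PARTITION BY col1
                 ORDER BY col3 DESC,                            -- rule 1: latest col3
                          CASE WHEN col2 = 1 THEN 1 ELSE 2 END, -- rule 2: prefer col2 = 1
                          col4 DESC                             -- rule 3: latest col4
             ) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY col1
"""


def dedupe(rows):
    """Return one row per col1 value, chosen by the three rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in dedupe(ROWS):
    print(row)
```

Each partition keeps exactly one row. Col3 DESC decides first. On a tie, the CASE moves Col2 = 1 ahead of other values. Col4 DESC breaks any remaining tie.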
I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
What it's essentially doing is taking two columns of input, grouping by the values in col1, ordering by values in col2, and then splitting the data for each partition into two halves, and flagging the first row of each half as a true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows of entries for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2, 4, 6, ...) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.
Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
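The alternating pattern is easy to see by running the % version in SQLite through Python's sqlite3 module. That engine is an assumption here. SQLite does not provide MOD() unless it is built with its math functions, so only the % form is shown. The table name `mytable` follows the question; `alternate_flags` is a hypothetical helper.

```python
import sqlite3


def alternate_flags(rows):
    """Flag odd-numbered rows (1, 3, ...) within each col1 group with 1, others with 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT col1, col2,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 AS col3
        FROM mytable
        ORDER BY col1, col2
        """
    ).fetchall()
    conn.close()
    return result


# First example from the question: four rows per col1 value.
sample = [(g, v) for g in (1, 2) for v in (1, 2, 3, 4)]
for row in alternate_flags(sample):
    print(row)
```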
EDIT:
You don't want alternating rows; you want to flag exactly two rows per group. For that, you can still use modulo arithmetic, but with somewhat different logic:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;
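I am just extending Gordon's answer, because his query on its own does not return 1/0 flags. Subtracting 1 from the row number also covers groups of exactly two rows. In that case the divisor is 1, so every remainder is 0, and comparing it with 1 would flag nothing:
SELECT col1, col2,
(CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 0 THEN 1 ELSE 0 END
) as col3
FROM mytable;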
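The halving logic can be checked across several group sizes with a small sketch. It assumes SQLite via Python's sqlite3 module (3.25 or later for window functions). In SQLite, dividing two integers already truncates, so FLOOR() is dropped. The sketch uses a (row number - 1) variant of the CASE expression. That variant flags row 1 and row n/2 + 1 for any even group size n, including groups of exactly two rows. The helper name `half_flags` is illustrative.

```python
import sqlite3

# Flag row 1 and row n/2 + 1 of every col1 group of even size n.
# In SQLite, COUNT(*) / 2 is integer division, so no FLOOR() is needed.
QUERY = """
SELECT col1, col2,
       CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1)
                 % (COUNT(*) OVER (PARTITION BY col1) / 2) = 0
            THEN 1 ELSE 0 END AS col3
FROM mytable
ORDER BY col1, col2
"""


def half_flags(rows):
    """Return (col1, col2, col3) with col3 = 1 on the first row of each half."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Groups of 2, 4 and 8 rows (col1 = 1, 2, 3 respectively).
sample = (
    [(1, v) for v in range(1, 3)]
    + [(2, v) for v in range(1, 5)]
    + [(3, v) for v in range(1, 9)]
)
for col1 in (1, 2, 3):
    print(col1, [r[2] for r in half_flags(sample) if r[0] == col1])
```

The 8-row group reproduces the expected output from the question's edit (1, 0, 0, 0, 1, 0, 0, 0). The 2-row group gets both rows flagged, because each half holds a single row.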
In my query, I am doing multiple types of ranking. For one of the ranking types, I want to rank a row only if a certain column is not null; otherwise, I don't want the row ranked.
For example here's a sample table:
+------+------------+------------+--------+--------+
| col1 | col2 | col3 | rank 1 | rank 2 |
+------+------------+------------+--------+--------+
| a | 2018-01-20 | 2018-03-04 | 2 | 2 |
| a | 2018-01-24 | 2018-04-04 | 1 | 1 |
| b | 2018-01-02 | 2018-05-03 | 1 | 1 |
| c | 2017-01-02 | 2017-05-08 | 3 | 2 |
| d | 2016-05-24 | null | 1 | null |
| c | 2018-02-05 | 2018-05-03 | 2 | 1 |
| c | 2018-07-28 | null | 1 | null |
+------+------------+------------+--------+--------+
rank1 is calculated correctly with PARTITION BY col1 ORDER BY col2 DESC.
rank2 should be calculated the same way, but only when col3 is not null; otherwise, it should be null.
How can I achieve both ranks in a single query? I tried to use a CASE statement for rank2, but it skips the ranking when col3 is null.
If I understand correctly, you can use CASE WHEN with the SUM window function.
The CASE WHEN checks whether col3 is not null: if so, it accumulates a running count; otherwise, it displays NULL.
CREATE TABLE T(
col1 VARCHAR(5),
col2 DATE,
col3 DATE
);
INSERT INTO T VALUES ( 'a' , to_date('2018-01-20','YYYY-MM-DD') , to_date('2018-03-04','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'a' , to_date('2018-01-24','YYYY-MM-DD') , to_date('2018-04-04','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'b' , to_date('2018-01-02','YYYY-MM-DD') , to_date('2018-05-03','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'c' , to_date('2017-01-02','YYYY-MM-DD') , to_date('2017-05-08','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'd' , TO_DATE('2016-05-24','YYYY-MM-DD') , null);
INSERT INTO T VALUES ( 'c' , TO_DATE('2018-02-05','YYYY-MM-DD') , to_date('2018-05-03','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'c' , TO_DATE('2018-07-28','YYYY-MM-DD') , null);
Query 1:
select t1.*,
rank() OVER(partition by col1 order by col2 desc) rank1,
(CASE WHEN COL3 IS NOT NULL THEN
SUM(CASE WHEN COL3 IS NOT NULL THEN 1 ELSE 0 END) OVER(partition by col1 order by col2 desc)
ELSE
NULL
END) rank2
FROM T t1
Results:
| COL1 | COL2 | COL3 | RANK1 | RANK2 |
|------|----------------------|----------------------|-------|--------|
| a | 2018-01-24T00:00:00Z | 2018-04-04T00:00:00Z | 1 | 1 |
| a | 2018-01-20T00:00:00Z | 2018-03-04T00:00:00Z | 2 | 2 |
| b | 2018-01-02T00:00:00Z | 2018-05-03T00:00:00Z | 1 | 1 |
| c | 2018-07-28T00:00:00Z | (null) | 1 | (null) |
| c | 2018-02-05T00:00:00Z | 2018-05-03T00:00:00Z | 2 | 1 |
| c | 2017-01-02T00:00:00Z | 2017-05-08T00:00:00Z | 3 | 2 |
| d | 2016-05-24T00:00:00Z | (null) | 1 | (null) |
I think you might want:
select t.*, (case when col3 is not null then count(col3) over (partition by col1 order by col2 desc) end) as rank2
from t;
Note that this is equivalent to row_number() rather than rank(). For your data, the two give the same result.
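The running-count idea can be checked on the sample data. This sketch uses SQLite through Python's sqlite3 module instead of the Oracle-style TO_DATE setup above, with dates stored as ISO text. It wraps the running COUNT(col3) in CASE so that rows with a null col3 get NULL, as the question requires. The table `t` and the `conditional_rank` helper are illustrative.

```python
import sqlite3

# Sample data from the question; None stands for a NULL col3.
ROWS = [
    ("a", "2018-01-20", "2018-03-04"),
    ("a", "2018-01-24", "2018-04-04"),
    ("b", "2018-01-02", "2018-05-03"),
    ("c", "2017-01-02", "2017-05-08"),
    ("d", "2016-05-24", None),
    ("c", "2018-02-05", "2018-05-03"),
    ("c", "2018-07-28", None),
]

QUERY = """
SELECT col1, col2, col3,
       RANK() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS rank1,
       -- running count of non-null col3 values; NULL when col3 itself is NULL
       CASE WHEN col3 IS NOT NULL
            THEN COUNT(col3) OVER (PARTITION BY col1 ORDER BY col2 DESC)
       END AS rank2
FROM t
ORDER BY col1, col2 DESC
"""


def conditional_rank(rows):
    """Return (col1, col2, col3, rank1, rank2) for every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in conditional_rank(ROWS):
    print(row)
```

COUNT(col3) skips NULLs. For group c, the null row dated 2018-07-28 therefore sorts first without consuming a rank, and the next two rows get 1 and 2.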
I am stuck in a similar situation to this one.
I have multiple columns with different types of data, and I want to select all columns but group by only one column.
My Table:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 1 | 1 | abcd | 100 | www.google.com |
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 4 | 3 | asda3 | 78 | www.imdb.com |
| 5 | 4 | zsdvf4 | 65 | www.youtube.com |
| 6 | 5 | sdf4 | 101 | www.ymail.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
| 8 | 1 | zxcgdf4 | 200 | www.club.com |
| 9 | 6 | yujhgj | 202 | www.thunderbird.com |
+--------+----------+----------+-------+-----------------------+
After reading the solution provided there, what I understood is to use an aggregate function, so my query looks like this:
select MIN(b_group),id,col2,col3,col4 from myTable where col3='200' group by id,col2,col3,col4;
But this does not work in my case; it returns all the records where col3 = 200.
My desired Output:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
+--------+----------+----------+-------+-----------------------+
I don't care which record is picked, and the order doesn't matter.
I just want to select all columns while grouping by only one.
By applying a group by clause, you get a result row per unique combination of all the columns in it (in this case, per unique combination of id, col2, col3, and col4). Instead, you could use the row_number window function to number rows per b_group, and then select just the (arbitrary) first of each group:
SELECT id, b_group, col2, col3, col4
FROM (SELECT id, b_group, col2, col3, col4,
ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY id) AS rn -- any order works; id makes the pick repeatable
FROM mytable
WHERE col3 = 200) t
WHERE rn = 1;
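Here is a sketch that runs the query on the question's table. It assumes SQLite via Python's sqlite3 module. It orders by id inside the window so that the picked row is repeatable, although any ordering satisfies "I don't care which record is picked". The helper `first_per_group` is hypothetical.

```python
import sqlite3

# The question's table.
ROWS = [
    (1, 1, "abcd", 100, "www.google.com"),
    (2, 1, "xyz", 200, "www.yahoo.com"),
    (3, 2, "dfs", 200, "www.stackoverflow.com"),
    (4, 3, "asda3", 78, "www.imdb.com"),
    (5, 4, "zsdvf4", 65, "www.youtube.com"),
    (6, 5, "sdf4", 101, "www.ymail.com"),
    (7, 5, "ssdfsd", 200, "www.gmail.com"),
    (8, 1, "zxcgdf4", 200, "www.club.com"),
    (9, 6, "yujhgj", 202, "www.thunderbird.com"),
]

# Number the col3 = 200 rows within each b_group and keep the first one.
QUERY = """
SELECT id, b_group, col2, col3, col4
FROM (SELECT id, b_group, col2, col3, col4,
             ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY id) AS rn
      FROM mytable
      WHERE col3 = 200) AS t
WHERE rn = 1
ORDER BY b_group
"""


def first_per_group(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, b_group INTEGER, col2 TEXT, col3 INTEGER, col4 TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in first_per_group(ROWS):
    print(row)
```

With the table as given, the only b_group 5 row with col3 = 200 is id 7. For b_group 1, ids 2 and 8 both match, and ordering by id keeps id 2.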
I have a table like this:
| ID | COL1 | COL2 |
| 1 | 1 | w |
| 1 | 2 | x |
| 2 | 1 | y |
| 2 | 2 | z |
When I query it, I'd like to get
| ID | COL2:1 | COL2:2 | <--- (when COL1=1 and COL1 =2)
| 1 | w | x |
| 2 | y | z |
I've tried GROUP BY and a JOIN of the table with itself, but I get duplicates rather than grouped data. I need some pointers on how to get the results I'm expecting.
You can use MAX() and a CASE statement for this:
SELECT ID
,MAX(CASE WHEN Col1 = 1 THEN Col2 END) AS Col2_1
,MAX(CASE WHEN Col1 = 2 THEN Col2 END) AS Col2_2
FROM YourTable
GROUP BY ID
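Here is a minimal sketch of the MAX(CASE ...) pivot, assuming SQLite through Python's sqlite3 module. For non-matching rows the CASE produces NULL, and MAX() ignores those NULLs, so each output column keeps the single matching Col2 value. The `pivot` helper name is illustrative.

```python
import sqlite3

# Conditional aggregation: one output column per Col1 value.
QUERY = """
SELECT ID,
       MAX(CASE WHEN Col1 = 1 THEN Col2 END) AS Col2_1,
       MAX(CASE WHEN Col1 = 2 THEN Col2 END) AS Col2_2
FROM YourTable
GROUP BY ID
ORDER BY ID
"""


def pivot(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (ID INTEGER, Col1 INTEGER, Col2 TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


sample = [(1, 1, "w"), (1, 2, "x"), (2, 1, "y"), (2, 2, "z")]
print(pivot(sample))
```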
I have a table like this:
| Col1 | Col2 | col3 |
|:-----------|------------:|:------------:|
| type1 | 1 | aaaa |
| type3 | 101 | bbbb |
| type2 | 21 | cccc |
| type1 | 2 | aaa |
| type2 | 22 | bbb |
| type3 | 102 | ccc |
| type1 | 3 | aaax |
| type2 | 23 | bbbx |
| type3 | 103 | cccx |
I need the output in the following way:
| Col1 | Col2 | col3 |
|:-----------|------------:|:------------:|
| type1 | 1 | aaaa |
| type1 | 2 | aaa |
| type1 | 3 | aaax |
|
| type2 | 21 | cccc |
| type2 | 22 | bbb |
| type2 | 23 | bbbx |
|
| type3 | 101 | bbbb |
| type3 | 102 | ccc |
| type3 | 103 | cccx |
Please suggest a way to get this kind of output.
I have a lot of records in this table, but I need only the top 5 of each type, in the same order.
Try:
SELECT Col1,
Col2,
Col3
FROM(
SELECT
Col1,
Col2,
Col3,
ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2, Col3) AS RNum
FROM YourTable
) X
WHERE RNum <= 5
ORDER BY Col1, RNum;
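Here is a sketch of the top-N-per-group filter, assuming SQLite via Python's sqlite3 module. The limit is passed as a query parameter. The question asks for 5, but the sample has only three rows per type, so the usage below passes 2 to show rows being cut. An outer ORDER BY is included because without one the database may return rows in any order. `top_n_per_group` is a hypothetical helper.

```python
import sqlite3

# The question's sample rows, in their original (unsorted) order.
SAMPLE = [
    ("type1", 1, "aaaa"), ("type3", 101, "bbbb"), ("type2", 21, "cccc"),
    ("type1", 2, "aaa"), ("type2", 22, "bbb"), ("type3", 102, "ccc"),
    ("type1", 3, "aaax"), ("type2", 23, "bbbx"), ("type3", 103, "cccx"),
]

# Number rows within each Col1 group, keep the first n, then sort for display.
QUERY = """
SELECT Col1, Col2, Col3
FROM (SELECT Col1, Col2, Col3,
             ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2, Col3) AS RNum
      FROM YourTable) AS X
WHERE RNum <= ?
ORDER BY Col1, RNum
"""


def top_n_per_group(rows, n):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Col1 TEXT, Col2 INTEGER, Col3 TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


for row in top_n_per_group(SAMPLE, 2):
    print(row)
```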
Use the ORDER BY clause:
SELECT COL1, COL2, COL3
FROM MyTable ORDER BY COL1, COL2, COL3
You don't need to group this data, because GROUP BY is for aggregating columns. Instead, use only an ORDER BY clause:
SELECT *
FROM tableName
ORDER BY Col1 ASC, Col2 ASC, Col3 ASC