I have a table like
| ID | COL1 | COL2 |
| -- | ---- | ---- |
| 1  | 1    | w    |
| 1  | 2    | x    |
| 2  | 1    | y    |
| 2  | 2    | z    |
When I query it, I'd like to get
| ID | COL2:1 | COL2:2 | <--- (when COL1 = 1 and COL1 = 2)
| -- | ------ | ------ |
| 1  | w      | x      |
| 2  | y      | z      |
I've tried GROUP BY and a JOIN on the same table, but I get duplicates rather than grouped data. I need some pointers on how to get the results I'm expecting.
You can use MAX() with a CASE expression for this:
SELECT ID
,MAX(CASE WHEN Col1 = 1 THEN Col2 END) AS Col2_1
,MAX(CASE WHEN Col1 = 2 THEN Col2 END) AS Col2_2
FROM YourTable
GROUP BY ID
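For a quick, self-contained demo (YourTable is the placeholder name from the query above; the column types are assumed):

-- Sample data matching the question
CREATE TABLE YourTable (ID INT, Col1 INT, Col2 VARCHAR(10));
INSERT INTO YourTable (ID, Col1, Col2)
VALUES (1, 1, 'w'), (1, 2, 'x'), (2, 1, 'y'), (2, 2, 'z');

-- Pivot via conditional aggregation
SELECT ID
      ,MAX(CASE WHEN Col1 = 1 THEN Col2 END) AS Col2_1
      ,MAX(CASE WHEN Col1 = 2 THEN Col2 END) AS Col2_2
FROM YourTable
GROUP BY ID;

-- ID | Col2_1 | Col2_2
-- 1  | w      | x
-- 2  | y      | z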
Using Snowflake, is it possible to pivot a table as follows?
I.e. given table A:
+----+-----+-------+
| id | key | value |
+----+-----+-------+
| 1 | k1 | 11 |
| 1 | k2 | 12 |
| 2 | k1 | 21 |
| 2 | k2 | 22 |
| 3 | k2 | 3 |
+----+-----+-------+
returns:
+----+------+----+
| id | k1 | k2 |
+----+------+----+
| 1 | 11 | 12 |
| 2 | 21 | 22 |
| 3 | null | 3 |
+----+------+----+
I suspect the query looks like the following but I am not sure how to aggregate:
select id, distinct(key)
from table_a
pivot table_a on value for value in distinct(key)
as p
order by id;
Just use conditional aggregation. It is more flexible and lacks idiosyncrasies:
select id,
max(case when key = 'k1' then value end) as k1,
max(case when key = 'k2' then value end) as k2
from t
group by id;
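A minimal sketch against the sample data (the table name table_a comes from the question; the column types are assumptions):

create or replace temporary table table_a (id int, key varchar, value int);
insert into table_a values
    (1, 'k1', 11), (1, 'k2', 12), (2, 'k1', 21), (2, 'k2', 22), (3, 'k2', 3);

select id,
       max(case when key = 'k1' then value end) as k1,
       max(case when key = 'k2' then value end) as k2
from table_a
group by id
order by id;

-- id | k1   | k2
-- 1  | 11   | 12
-- 2  | 21   | 22
-- 3  | null | 3

The missing k1 row for id 3 simply comes back as NULL, which matches the desired output.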
I have the following table in a MySQL database:
| id | col | val |
| -- | --- | --- |
| 1 | 1 | y |
| 2 | 1 | y |
| 3 | 1 | y |
| 4 | 1 | n |
| 5 | 2 | n |
| 6 | 3 | n |
| 7 | 3 | n |
| 8 | 4 | y |
| 9 | 5 | y |
| 10 | 5 | y |
Now I want to distinctly select the col values where every row with the same col has val equal to 'y'. I tried both of the following queries:
SELECT DISTINCT `col` FROM `tbl` WHERE `val` = 'y'
SELECT `col` FROM `tbl` GROUP BY `col` HAVING (`val` = 'y')
But it's not working out as I expect. I want the result to look like this:
| col |
| --- |
| 4 |
| 5 |
But 1 is also being included in the results with my queries. Can anybody help me build the correct query? As far as I understand, I may need to create a derived table, but I can't quite figure out the right path.
You are close, with the second query. Instead, compare the min and max values:
SELECT `col`
FROM `tbl`
GROUP BY `col`
HAVING MIN(val) = MAX(val) AND MIN(`val`) = 'y';
Check that 'y' is the minimum value:
HAVING MIN(val) = 'y'
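Spelled out as a full statement (this assumes val only ever holds 'y' or 'n', as in the sample data):

SELECT `col`
FROM `tbl`
GROUP BY `col`
HAVING MIN(`val`) = 'y';

-- Returns 4 and 5 for the sample data: a single 'n' in a group makes MIN(val) = 'n'.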
Let's say I had the following table:
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| a | z | 1 |
| b | y | 2 |
| c | x | 3 |
| d | w | 0 |
| e | v | 4 |
| f | u | 5 |
| g | t | 0 |
| h | s | 6 |
| i | r | 0 |
+------+------+--------+
So I would like to go through all of the records. Every time I find the value 0 in NumCol, I want to select that record and every record that came before it, back to the previous occurrence of the value 0. So I should return something like this (if looped through the whole table):
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| a | z | 1 |
| b | y | 2 |
| c | x | 3 |
| d | w | 0 |
+------+------+--------+
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| e | v | 4 |
| f | u | 5 |
| g | t | 0 |
+------+------+--------+
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| h | s | 6 |
| i | r | 0 |
+------+------+--------+
What I would recommend is to use a cursor if you are using Microsoft SQL Server.
Using a cursor you can loop through the records one by one and cut a group off once NumCol reaches zero.
You probably want to create a table separate from the one you have listed, which you can feed from the cursor, as it will speed things up.
If you try to do all of this in memory it may struggle.
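A rough T-SQL sketch of that approach (SourceTable and GroupedRows are placeholder names assumed here, and the rows are assumed to come back in Col1 order):

DECLARE @Col1 VARCHAR(10), @Col2 VARCHAR(10), @NumCol INT, @Grp INT = 1;

DECLARE row_cursor CURSOR FOR
    SELECT Col1, Col2, NumCol FROM SourceTable ORDER BY Col1;

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @Col1, @Col2, @NumCol;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Feed each row into the separate table together with its group number.
    INSERT INTO GroupedRows (Col1, Col2, NumCol, Grp)
    VALUES (@Col1, @Col2, @NumCol, @Grp);

    -- A zero closes the current group; the next row starts a new one.
    IF @NumCol = 0
        SET @Grp = @Grp + 1;

    FETCH NEXT FROM row_cursor INTO @Col1, @Col2, @NumCol;
END

CLOSE row_cursor;
DEALLOCATE row_cursor;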
First, SQL tables represent unordered sets. I am going to assume that the first two columns specify the ordering.
You can enumerate the groups using a cumulative sum -- counting the number of zeros on or after each row. Then, to get a group number that counts up from one, you can subtract that running count from the total number of zeros plus one:
select t.*
from (select t.*,
             (sum(case when numcol = 0 then 1 else 0 end) over () + 1 -
              sum(case when numcol = 0 then 1 else 0 end) over (order by col1 desc, col2 desc)
             ) as grp
      from t
     ) t;
You can now select groups of rows by just using where grp = N.
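For example, to pull back only the first group (rows a through d in the sample data):

select col1, col2, numcol
from (select t.*,
             (sum(case when numcol = 0 then 1 else 0 end) over () + 1 -
              sum(case when numcol = 0 then 1 else 0 end) over (order by col1 desc, col2 desc)
             ) as grp
      from t
     ) t
where grp = 1
order by col1, col2;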
I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
What it's essentially doing is taking two columns of input, grouping by the values in col1, ordering by values in col2, and then splitting the data for each partition into two halves, and flagging the first row of each half as a true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows of entries for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2,4,6..) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.
Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
EDIT:
You don't want alternating rows; you simply want two flagged rows per partition. For that, you can still use modulo arithmetic, but with somewhat different logic:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;
I am just extending Gordon's answer, as his answer will not give you the correct result:
SELECT col1, col2,
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 1 THEN 1 ELSE 0 END
) as col3
FROM mytable;
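As a quick check against the 8-row example (mytable and the integer column types are assumed; % works in SQL Server and MySQL 8+, use MOD() elsewhere):

CREATE TABLE mytable (col1 INT, col2 INT);
INSERT INTO mytable (col1, col2)
VALUES (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),
       (2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(2,7),(2,8);

SELECT col1, col2,
       (CASE WHEN ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
                  FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 1 THEN 1 ELSE 0 END
       ) AS col3
FROM mytable
ORDER BY col1, col2;

-- col3 comes out 1 on rows 1 and 5 of each partition and 0 everywhere else, as expected.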
In my query, I am doing multiple types of ranking, and for one of the ranking types I want to rank the row only if a certain column is not null; otherwise I don't want the ranking to happen.
For example here's a sample table:
+------+------------+------------+--------+--------+
| col1 | col2 | col3 | rank 1 | rank 2 |
+------+------------+------------+--------+--------+
| a | 2018-01-20 | 2018-03-04 | 2 | 2 |
| a | 2018-01-24 | 2018-04-04 | 1 | 1 |
| b | 2018-01-02 | 2018-05-03 | 1 | 1 |
| c | 2017-01-02 | 2017-05-08 | 3 | 2 |
| d | 2016-05-24 | null | 1 | null |
| c | 2018-02-05 | 2018-05-03 | 2 | 1 |
| c | 2018-07-28 | null | 1 | null |
+------+------------+------------+--------+--------+
rank1 is calculated fine, based on partition by col1 order by col2 desc.
rank2 should be calculated the same way, but only when col3 is not null; otherwise it should be null.
How can I achieve both ranks in a single query? I tried to use a CASE statement for rank2, but it skips the ranking when col3 is null.
If I understand correctly, you can try to use CASE WHEN with the SUM window function:
when col3 isn't null, accumulate; otherwise display NULL.
CREATE TABLE T(
col1 VARCHAR(5),
col2 DATE,
col3 DATE
);
INSERT INTO T VALUES ( 'a' , to_date('2018-01-20','YYYY-MM-DD') , to_date('2018-03-04','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'a' , to_date('2018-01-24','YYYY-MM-DD') , to_date('2018-04-04','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'b' , to_date('2018-01-02','YYYY-MM-DD') , to_date('2018-05-03','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'c' , to_date('2017-01-02','YYYY-MM-DD') , to_date('2017-05-08','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'd' , TO_DATE('2016-05-24','YYYY-MM-DD') , null);
INSERT INTO T VALUES ( 'c' , TO_DATE('2018-02-05','YYYY-MM-DD') , to_date('2018-05-03','YYYY-MM-DD'));
INSERT INTO T VALUES ( 'c' , TO_DATE('2018-07-28','YYYY-MM-DD') , null);
Query 1:
select t1.*,
rank() OVER(partition by col1 order by col2 desc) rank1,
(CASE WHEN COL3 IS NOT NULL THEN
SUM(CASE WHEN COL3 IS NOT NULL THEN 1 ELSE 0 END) OVER(partition by col1 order by col2 desc)
ELSE
NULL
END) rank2
FROM T t1
Results:
| COL1 | COL2 | COL3 | RANK1 | RANK2 |
|------|----------------------|----------------------|-------|--------|
| a | 2018-01-24T00:00:00Z | 2018-04-04T00:00:00Z | 1 | 1 |
| a | 2018-01-20T00:00:00Z | 2018-03-04T00:00:00Z | 2 | 2 |
| b | 2018-01-02T00:00:00Z | 2018-05-03T00:00:00Z | 1 | 1 |
| c | 2018-07-28T00:00:00Z | (null) | 1 | (null) |
| c | 2018-02-05T00:00:00Z | 2018-05-03T00:00:00Z | 2 | 1 |
| c | 2017-01-02T00:00:00Z | 2017-05-08T00:00:00Z | 3 | 2 |
| d | 2016-05-24T00:00:00Z | (null) | 1 | (null) |
I think you might want:
select count(col3) over (partition by col1 order by col2 desc)
Note that this behaves like row_number() rather than rank(); for your data the two give the same result.
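Put back into the full query, a minimal sketch (reusing the T table created above and keeping the CASE so rows with a null col3 still show null):

select t1.*,
       rank() over (partition by col1 order by col2 desc) as rank1,
       (case when col3 is not null
             then count(col3) over (partition by col1 order by col2 desc)
        end) as rank2
from T t1;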