Adding conditional statements to a SQL window function - sql

I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
What it's essentially doing is taking two columns of input, grouping by the values in col1, ordering by values in col2, and then splitting the data for each partition into two halves, and flagging the first row of each half as a true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows of entries for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2,4,6..) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.

Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
EDIT:
You don't want to alternate rows; you want exactly two flagged rows per partition, one at the start of each half. For that, you can still use modulo arithmetic, but with the divisor set to half the partition size:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;

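I am just extending Gordon's answer, since as written it returns the raw remainder rather than a 1/0 flag. Comparing the zero-based row number modulo half the partition size against 0 also covers partitions with only two rows, where both rows should be flagged:
SELECT col1, col2,
(CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 0 THEN 1 ELSE 0 END
) as col3
FROM mytable;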
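To check the half-partition logic against the constraints in the question (even partition sizes, minimum of 2), here is a minimal sketch that runs a variant of the query in an in-memory SQLite database through Python's sqlite3 module. The partition sizes 2, 4 and 8 are made-up test data, and the condition `(rn - 1) % half = 0` is one possible formulation, not necessarily what either answerer intended. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical test data: partitions of 2, 4 and 8 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")
sizes = {1: 2, 2: 4, 3: 8}
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(g, n) for g, size in sizes.items() for n in range(1, size + 1)],
)

# Flag a row when its zero-based position is a multiple of half the partition size.
# COUNT(*) / 2 is integer division in SQLite, so FLOOR is not needed here.
query = """
SELECT col1, col2,
       CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1)
                 % (COUNT(*) OVER (PARTITION BY col1) / 2) = 0
            THEN 1 ELSE 0 END AS col3
FROM mytable
ORDER BY col1, col2
"""

# Collect the flags per partition so they are easy to compare by eye.
flags = {}
for col1, col2, col3 in conn.execute(query):
    flags.setdefault(col1, []).append(col3)

for col1, values in flags.items():
    print(col1, values)
```

Each partition ends up with exactly two flagged rows, at the first row of each half, including the two-row partition where both rows are flagged.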

Related

How would I positionally select records based on a desired value (using SparkSQL)?

Let's say I had the following table:
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| a | z | 1 |
| b | y | 2 |
| c | x | 3 |
| d | w | 0 |
| e | v | 4 |
| f | u | 5 |
| g | t | 0 |
| h | s | 6 |
| i | r | 0 |
+------+------+--------+
So I would like to go through all of the records. Every time I find the value 0 in NumCol, I want to select that record and every record that came before it, back to the previous occurrence of the value 0. So I should get something like this (if I looped through the whole table):
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| a | z | 1 |
| b | y | 2 |
| c | x | 3 |
| d | w | 0 |
+------+------+--------+
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| e | v | 4 |
| f | u | 5 |
| g | t | 0 |
+------+------+--------+
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| h | s | 6 |
| i | r | 0 |
+------+------+--------+
What I would recommend is to use a cursor, if you are using Microsoft SQL Server.
Using a cursor, you can loop through the records one by one and cut off each group once NumCol reaches zero.
You probably want to create a table separate from the one you have listed and fill it from the cursor, as that will speed things up.
If you try to do all of this in memory, it may struggle.
First, SQL tables represent unordered sets. I am going to assume that the first two columns specify the ordering.
You can enumerate the groups using a cumulative sum, by counting the number of zeros on or after each row. Then, to get a group number that starts at 1 for the first group, subtract that count from the total number of zeros plus one:
select t.*
from (select t.*,
(sum(case when NumCol = 0 then 1 else 0 end) over () + 1 -
sum(case when NumCol = 0 then 1 else 0 end) over (order by col1 desc, col2 desc)
) as grp
from t
) t;
You can now select groups of rows by just using where grp = N.
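As a rough check of the cumulative-sum idea, the sketch below loads the sample table from the question into an in-memory SQLite database via Python's sqlite3 module and computes grp for every row. Ordering by Col1 alone is an assumption (it is already unique in the sample data), and SQLite stands in for SparkSQL here, so treat it as an illustration of the technique rather than a drop-in Spark query.

```python
import sqlite3

# Sample rows from the question; Col1 is assumed to define the row order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Col1 TEXT, Col2 TEXT, NumCol INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "z", 1), ("b", "y", 2), ("c", "x", 3), ("d", "w", 0), ("e", "v", 4),
     ("f", "u", 5), ("g", "t", 0), ("h", "s", 6), ("i", "r", 0)],
)

# grp = (total zeros + 1) - (zeros on or after this row), so the first group is 1.
query = """
SELECT Col1, NumCol,
       SUM(CASE WHEN NumCol = 0 THEN 1 ELSE 0 END) OVER () + 1
     - SUM(CASE WHEN NumCol = 0 THEN 1 ELSE 0 END) OVER (ORDER BY Col1 DESC) AS grp
FROM t
ORDER BY Col1
"""
rows = conn.execute(query).fetchall()
groups = [grp for _, _, grp in rows]

for row in rows:
    print(row)
```

Rows a through d land in group 1, e through g in group 2, and h and i in group 3, which matches the three result sets shown in the question.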

Skip row that results in zero

Is there a way to skip all rows where the division involves a zero divisor? For example:
+------+------+
| Col1 | Col2 |
+------+------+
| 5 | 5 |
| 3 | 0 |
| 12 | 6 |
+------+------+
Then col3 = col1 / col2, giving:
+------+------+------+
| Col1 | Col2 | col3 |
+------+------+------+
| 5 | 5 | 1 |
| 12 | 6 | 2 |
+------+------+------+
You can filter out the rows where col2 is zero before dividing:
select col1, col2, col1/col2 as col3
from tablename
where col2 <> 0;

Semi-transposing a table in Oracle

I am having trouble semi-transposing the table below based on the 'LENGTH' column. I am using an Oracle database, sample data:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | LENGTH | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 4 | 1 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I would like to lengthen this table based on the LENGTH column; basically duplicating each row once for every value from 1 up to its LENGTH.
See the desired output table below:
+-----------+-----------+--------+------+
| PERSON_ID | PERIOD_ID | NUMBER | FLAG |
+-----------+-----------+--------+------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 1 |
| 1 | 1 | 3 | 1 |
| 1 | 1 | 4 | 1 |
| 1 | 2 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 1 | 2 | 3 | 0 |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 2 | 1 | 3 | 1 |
| 2 | 1 | 4 | 1 |
+-----------+-----------+--------+------+
I typically work in Postgres, so Oracle is new to me.
I've found some solutions using the CONNECT BY clause, but they seem overly complicated, particularly when compared to the simple generate_series() function from Postgres.
A recursive CTE subtracting 1 from length until 1 is reached should work. (In Postgres too, BTW, should you need something working cross platform; there you have to write WITH RECURSIVE instead of plain WITH.)
WITH cte (person_id,
period_id,
number_,
flag)
AS
(
SELECT person_id,
period_id,
length number_,
flag
FROM elbat
UNION ALL
SELECT person_id,
period_id,
number_ - 1 number_,
flag
FROM cte
WHERE number_ > 1
)
SELECT *
FROM cte
ORDER BY person_id,
period_id,
number_;
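To see the recursion expand each row, here is a minimal sketch that runs the same CTE in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Oracle here and, like Postgres, wants the RECURSIVE keyword; the table name elbat and the sample rows are taken from the question and the answer above.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elbat (person_id INTEGER, period_id INTEGER, length INTEGER, flag INTEGER)"
)
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?, ?)",
    [(1, 1, 4, 1), (1, 2, 3, 0), (2, 1, 4, 1)],
)

# Anchor member: start each row at its length.
# Recursive member: emit a copy with number_ - 1 until number_ reaches 1.
query = """
WITH RECURSIVE cte (person_id, period_id, number_, flag) AS (
    SELECT person_id, period_id, length, flag
    FROM elbat
    UNION ALL
    SELECT person_id, period_id, number_ - 1, flag
    FROM cte
    WHERE number_ > 1
)
SELECT person_id, period_id, number_, flag
FROM cte
ORDER BY person_id, period_id, number_
"""
expanded = conn.execute(query).fetchall()

for row in expanded:
    print(row)
```

The three input rows expand to 4 + 3 + 4 = 11 output rows, numbered from 1 within each (person_id, period_id) pair.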

Oracle group by only ONE column - update

I am stuck in a situation similar to this one.
I have multiple columns with different types of data, and I want to select all columns but group by only one of them.
My Table:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 1 | 1 | abcd | 100 | www.google.com |
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 4 | 3 | asda3 | 78 | www.imdb.com |
| 5 | 4 | zsdvf4 | 65 | www.youtube.com |
| 6 | 5 | sdf4 | 101 | www.ymail.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
| 8 | 1 | zxcgdf4 | 200 | www.club.com |
| 9 | 6 | yujhgj | 202 | www.thunderbird.com |
+--------+----------+----------+-------+-----------------------+
After reading the solution provided there, I understood that I should use an aggregate function, so my query looks like this:
select MIN(b_group),id,col2,col3,col4 from myTable where col3='200' group by id,col2,col3,col4;
But this is not working in my case; it returns all the records where col3 = 200.
My desired Output:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
+--------+----------+----------+-------+-----------------------+
I don't care which record is picked for each b_group, and the order doesn't matter.
I just want to select all columns while grouping by only one.
By applying a group by clause, you get a result row per unique combination of all the columns in it (in this case, per unique combination of id, col2, col3, and col4). Instead, you could use the row_number window function to number rows per b_group, and then select just the (arbitrary) first of each group:
SELECT id, b_group, col2, col3, col4
FROM (SELECT id, b_group, col2, col3, col4,
ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY 1) AS rn
FROM mytable
WHERE col3 = 200)
WHERE rn = 1
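The following sketch runs this pick-one-row-per-group pattern against the sample table in an in-memory SQLite database through Python's sqlite3 module. It orders the window by id instead of a constant, which is an assumption made only so the result is deterministic; any row per b_group would satisfy the question. SQLite stands in for Oracle and needs window-function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, b_group INTEGER, col2 TEXT, col3 INTEGER, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "abcd", 100, "www.google.com"),
        (2, 1, "xyz", 200, "www.yahoo.com"),
        (3, 2, "dfs", 200, "www.stackoverflow.com"),
        (4, 3, "asda3", 78, "www.imdb.com"),
        (5, 4, "zsdvf4", 65, "www.youtube.com"),
        (6, 5, "sdf4", 101, "www.ymail.com"),
        (7, 5, "ssdfsd", 200, "www.gmail.com"),
        (8, 1, "zxcgdf4", 200, "www.club.com"),
        (9, 6, "yujhgj", 202, "www.thunderbird.com"),
    ],
)

# Number the filtered rows within each b_group, then keep only the first one.
query = """
SELECT id, b_group, col2, col3, col4
FROM (SELECT id, b_group, col2, col3, col4,
             ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY id) AS rn
      FROM mytable
      WHERE col3 = 200) AS numbered
WHERE rn = 1
ORDER BY b_group
"""
picked = conn.execute(query).fetchall()

for row in picked:
    print(row)
```

With this ordering the query keeps ids 2, 3 and 7, one row for each of the b_group values 1, 2 and 5 that have a col3 of 200.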

SQL Increment number in select statement

I have an issue where I need to group a set of values and increase the group number whenever the variance between 2 columns is greater than or equal to 4; please see below.
UPDATE: I added a date column so you can see the order, but I need the group to update based on the variance, not the date.
+--------+-------+-------+----------+--------------+
| Date | Col 1 | Col 2 | Variance | Group Number |
+--------+-------+-------+----------+--------------+
| 1-Jun | 2 | 1 | 1 | 1 |
| 2-Jun | 1 | 1 | 0 | 1 |
| 3-Jun | 3 | 2 | 1 | 1 |
| 4-Jun | 4 | 1 | 3 | 1 |
| 5-Jun | 5 | 1 | 4 | 2 |
| 6-Jun | 1 | 1 | 0 | 2 |
| 7-Jun | 23 | 12 | 11 | 3 |
| 8-Jun | 12 | 11 | 1 | 3 |
| 9-Jun | 2 | 1 | 1 | 3 |
| 10-Jun | 13 | 4 | 9 | 4 |
| 11-Jun | 2 | 1 | 1 | 4 |
+--------+-------+-------+----------+--------------+
The group number is one plus the number of times a variance of 4 or greater has appeared up to and including the current row. You can get this using a correlated subquery (mytable is a placeholder name):
select t.*,
(select 1 + count(*)
from mytable t2
where t2.date <= t.date and t2.variance >= 4
) as GroupNumber
from mytable t;
In SQL Server 2012+, you can also do this using a cumulative sum that includes the current row (mytable is a placeholder name):
select t.*,
1 + sum(case when variance >= 4 then 1 else 0 end) over
(order by date) as GroupNumber
from mytable t;
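To compare a running-sum query with the Group Number column in the question, here is a minimal sketch using an in-memory SQLite database through Python's sqlite3 module. The table name readings and the integer day column (1 to 11, standing in for 1-Jun to 11-Jun) are assumptions for illustration. The sum is inclusive of the current row and offset by one, since that is what reproduces the expected output, where the row with variance 4 already belongs to group 2.

```python
import sqlite3

# Variance values from the question, keyed by a hypothetical integer day column.
variances = [1, 0, 1, 3, 4, 0, 11, 1, 1, 9, 1]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, variance INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    list(enumerate(variances, start=1)),
)

# 1 + running count of rows with variance >= 4, current row included.
query = """
SELECT day, variance,
       1 + SUM(CASE WHEN variance >= 4 THEN 1 ELSE 0 END) OVER (ORDER BY day) AS group_number
FROM readings
ORDER BY day
"""
result = conn.execute(query).fetchall()
group_numbers = [g for _, _, g in result]

for row in result:
    print(row)
```

The computed group numbers are 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, the same sequence as the Group Number column in the question's table.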