How would I positionally select records based on a desired value (using SparkSQL)?

Let's say I had the following table:
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| a | z | 1 |
| b | y | 2 |
| c | x | 3 |
| d | w | 0 |
| e | v | 4 |
| f | u | 5 |
| g | t | 0 |
| h | s | 6 |
| i | r | 0 |
+------+------+--------+
I would like to go through all of the records. Every time I find the value 0 in NumCol, I want to select that record and every record that came before it, back to (but not including) the previous occurrence of the value 0. So, looping through the whole table, I should get something like this:
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| a | z | 1 |
| b | y | 2 |
| c | x | 3 |
| d | w | 0 |
+------+------+--------+
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| e | v | 4 |
| f | u | 5 |
| g | t | 0 |
+------+------+--------+
+------+------+--------+
| Col1 | Col2 | NumCol |
+------+------+--------+
| h | s | 6 |
| i | r | 0 |
+------+------+--------+

If you are using Microsoft SQL Server, I would recommend a cursor.
With a cursor you can loop through the records one by one and close off the current group each time NumCol reaches zero.
You will probably want to write the cursor's output to a separate table rather than the one you have listed, as that should speed things up.
If you try to do all of this in memory, it may struggle.

First, SQL tables represent unordered sets. I am going to assume that the first two columns specify the ordering.
You can enumerate the groups using a cumulative sum: count the number of 0 values on or after each row, then subtract that count from the total number of zeros plus one, so that the groups are numbered from 1 in table order:
select t.*
from (select t.*,
             (sum(case when numcol = 0 then 1 else 0 end) over () + 1 -
              sum(case when numcol = 0 then 1 else 0 end) over (order by col1 desc, col2 desc)
             ) as grp
      from t
     ) t;
You can now select groups of rows by just using where grp = N.
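To sanity-check the grouping on the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module instead of SparkSQL. It assumes SQLite 3.25 or later (for window functions) and that col1 defines the row order; the table name t and the lowercase column names are just for illustration.

```python
import sqlite3

# Sample rows from the question; col1 is assumed to define the row order.
rows = [("a", "z", 1), ("b", "y", 2), ("c", "x", 3), ("d", "w", 0),
        ("e", "v", 4), ("f", "u", 5), ("g", "t", 0), ("h", "s", 6),
        ("i", "r", 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, numcol INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# grp = (total zeros + 1) - (zeros on or after this row)
query = """
SELECT col1,
       SUM(CASE WHEN numcol = 0 THEN 1 ELSE 0 END) OVER () + 1
       - SUM(CASE WHEN numcol = 0 THEN 1 ELSE 0 END)
             OVER (ORDER BY col1 DESC, col2 DESC) AS grp
FROM t
ORDER BY col1, col2
"""

# Collect the col1 values of each group, keyed by group number.
groups = {}
for col1, grp in conn.execute(query):
    groups.setdefault(grp, []).append(col1)

for grp, members in sorted(groups.items()):
    print(grp, members)
```

Each group ends with its zero row, so filtering on a single grp value returns one of the blocks shown in the question.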

Related

How to select distinct records based on a given condition?

I have the following table in the MySQL database:
| id | col | val |
| -- | --- | --- |
| 1 | 1 | y |
| 2 | 1 | y |
| 3 | 1 | y |
| 4 | 1 | n |
| 5 | 2 | n |
| 6 | 3 | n |
| 7 | 3 | n |
| 8 | 4 | y |
| 9 | 5 | y |
| 10 | 5 | y |
Now I want to distinctly select the records where all the values of similar col are equal to y. I tried both the following queries:
SELECT DISTINCT `col` FROM `tbl` WHERE `val` = 'y'
SELECT `col` FROM `tbl` GROUP BY `col` HAVING (`val` = 'y')
But it's not working out as per my expectation. I want the result to look like this:
| col |
| --- |
| 4 |
| 5 |
But 1 is also being included in the results with my queries. Can anybody help me building the correct query? As far as I understand, I may need to create a derived table, but can't quite figure out the right path.
You are close with the second query. Instead, compare the min and max values:
SELECT `col`
FROM `tbl`
GROUP BY `col`
HAVING MIN(`val`) = MAX(`val`) AND MIN(`val`) = 'y';
Alternatively, if `val` only ever holds 'y' or 'n', it is enough to check that 'y' is the minimum value, because 'n' sorts before 'y':
HAVING MIN(`val`) = 'y'
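For anyone who wants to try both HAVING variants against the sample data, here is a small sketch using Python's sqlite3 module rather than MySQL. The helper name cols is made up for this example, and it assumes val only contains 'y' or 'n'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, col INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1, "y"), (2, 1, "y"), (3, 1, "y"), (4, 1, "n"), (5, 2, "n"),
     (6, 3, "n"), (7, 3, "n"), (8, 4, "y"), (9, 5, "y"), (10, 5, "y")],
)

def cols(having):
    """Return the col values whose group satisfies the HAVING condition."""
    sql = f"SELECT col FROM tbl GROUP BY col HAVING {having} ORDER BY col"
    return [row[0] for row in conn.execute(sql)]

# The first attempt from the question: col 1 slips through because
# some (not all) of its rows are 'y'.
naive = [row[0] for row in conn.execute(
    "SELECT DISTINCT col FROM tbl WHERE val = 'y' ORDER BY col")]

min_max = cols("MIN(val) = MAX(val) AND MIN(val) = 'y'")
min_only = cols("MIN(val) = 'y'")

print(naive, min_max, min_only)
```

The aggregate conditions look at every row in a group, which is why they exclude col 1 while the WHERE filter does not.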

Oracle SQL unpivot and keep rows with null values [duplicate]

This question already has an answer here:
oracle - querying NULL values in unpivot query
I'm currently doing an unpivot for an Oracle data source (v12.2) like this:
SELECT *
FROM some_table
UNPIVOT (
(X,Y,Val)
FOR SITE
IN (
(SITE1_X, SITE1_Y, SITE1_VAL) AS '1',
(SITE2_X, SITE2_Y, SITE2_VAL) AS '2',
(SITE3_X, SITE3_Y, SITE3_VAL) AS '3'
))
This works fine so far, with one exception: I have another column, let's say extend_info. If this column has the value y, there will be only one row and all of its site columns will be null. Nevertheless, I would like to keep this row and not drop it.
I'm not really sure how to do this or what would be a nice way to do this. Any recommendations?
Example:
Original Table:
ID | SITE1_X | SITE1_Y |SITE1_VAL | SITE2_X | SITE2_Y | SITE2_VAL | ... | extend_info
-------
1 | 0 | 0 | 5 | 1 | 1 | 10 | ... | n
2 | 0 | 0 | 3 | null | null | null | ... | n
3 | null | null | null | null | null | null | ... | y
current output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
desired output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
4 | | | | | y
I don't really care what is in SITE|X|Y|VAL in that case, can be 0 for everything or null.
Bonus question:
If extend_info is y I would like to join another table with this ID. The other table looks like this:
ID | F_ID | X | Y | VAL
-----
1 | 4 | 1 | 1 | 8
2 | 4 | 2 | 2 | 9
and in that case my final output table should look like:
ID | SITE | X | Y | VAL | X_OTHER_TABLE | Y_OTHER_TABLE
-------
1 | 1 | 0 | 0 | 5 |
2 | 1 | 1 | 1 | 10 |
3 | 2 | 0 | 0 | 3 |
4 | | | | 8 | 1 | 1
5 | | | | 9 | 2 | 2
I know... the database structure is super ugly but that is what a vendor provides us and we are trying to create a View to make it easier to perform some data analysis tasks on it.
It doesn't have to look exactly like my final example, but I hope my intention is clear: I want one single table/view with all the information in a single format.
Thanks for any help!
I would recommend a lateral join:
SELECT s.id, u.*
FROM some_table s CROSS JOIN LATERAL
(SELECT s.SITE1_X as SITE_X, s.SITE1_Y as SITE_Y, s.SITE1_VAL as SITE_VAL FROM DUAL UNION ALL
SELECT s.SITE2_X, s.SITE2_Y, s.SITE2_VAL FROM DUAL UNION ALL
SELECT s.SITE3_X, s.SITE3_Y, s.SITE3_VAL FROM DUAL
) u;
You can just join additional tables to this as you like.
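As a rough illustration of the same row-per-site idea outside Oracle, here is a sketch that emulates the unpivot with plain UNION ALL in SQLite (via Python's sqlite3), since SQLite has neither UNPIVOT nor LATERAL. It swaps in a different technique from the lateral join above: each branch drops a site only when all three of its values are null, and a third branch keeps one placeholder row for extend_info = 'y'. Only two sites are modelled, and the filter conditions are assumptions based on the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE some_table (
        id INTEGER,
        site1_x INTEGER, site1_y INTEGER, site1_val INTEGER,
        site2_x INTEGER, site2_y INTEGER, site2_val INTEGER,
        extend_info TEXT
    )
""")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 0, 0, 5, 1, 1, 10, "n"),
     (2, 0, 0, 3, None, None, None, "n"),
     (3, None, None, None, None, None, None, "y")],
)

# One SELECT per site, plus one branch that keeps the extend_info = 'y' row.
query = """
SELECT id, '1' AS site, site1_x AS x, site1_y AS y, site1_val AS val, extend_info
FROM some_table
WHERE site1_x IS NOT NULL OR site1_y IS NOT NULL OR site1_val IS NOT NULL
UNION ALL
SELECT id, '2', site2_x, site2_y, site2_val, extend_info
FROM some_table
WHERE site2_x IS NOT NULL OR site2_y IS NOT NULL OR site2_val IS NOT NULL
UNION ALL
SELECT id, NULL, NULL, NULL, NULL, extend_info
FROM some_table
WHERE extend_info = 'y'
ORDER BY id, site
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The placeholder row carries NULL in the site columns, which matches the "can be 0 for everything or null" requirement from the question.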

Using LAG function with higher offset

Suppose we have the following input table
cat | value | position
------------------------
1 | A | 1
1 | B | 2
1 | C | 3
1 | D | 4
2 | C | 1
2 | B | 2
2 | A | 3
2 | D | 4
As you can see, the values A, B, C, D change position in each category. I want to track this change by adding a column change next to each value. The output should look like this:
cat | value | position | change
---------------------------------
1 | A | 1 | NULL
1 | B | 2 | NULL
1 | C | 3 | NULL
1 | D | 4 | NULL
2 | C | 1 | 2
2 | B | 2 | 0
2 | A | 3 | -2
2 | D | 4 | 0
For example, C was in position 3 in category 1 and moved to position 1 in category 2, and therefore has a change of 2. I tried implementing this using the LAG() function with an offset of 4, but failed. How can I write this query?
Use lag() - with the proper partition by clause:
select
t.*,
lag(position) over(partition by value order by cat) - position change
from mytable t
You can use lag() and then order by to keep the original order:
select
*,
lag(position) over (partition by value order by cat) - position as change
from yourTable
order by
cat, position
output:
| cat | value | position | change |
| --- | ----- | -------- | ------ |
| 1 | A | 1 | null |
| 1 | B | 2 | null |
| 1 | C | 3 | null |
| 1 | D | 4 | null |
| 2 | C | 1 | 2 |
| 2 | B | 2 | 0 |
| 2 | A | 3 | -2 |
| 2 | D | 4 | 0 |
I think you just want lag() with the right partition by:
select t.*,
(lag(position) over (partition by value order by cat) - position) as change
from t;
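If you want to reproduce the result locally, the following sketch runs the lag() query from the answers in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later for window functions; the table name mytable is taken from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (cat INTEGER, value TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 1), (1, "B", 2), (1, "C", 3), (1, "D", 4),
     (2, "C", 1), (2, "B", 2), (2, "A", 3), (2, "D", 4)],
)

# Partitioning by value means lag() looks at the same letter in the
# previous category, so no fixed offset of 4 is needed.
query = """
SELECT cat, value, position,
       LAG(position) OVER (PARTITION BY value ORDER BY cat) - position AS change
FROM mytable
ORDER BY cat, position
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Rows in the first category have no previous row in their partition, so lag() returns NULL and the subtraction stays NULL (None in Python).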

Adding conditional statements to a SQL window function

I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
What it's essentially doing is taking two columns of input, grouping by the values in col1, ordering by values in col2, and then splitting the data for each partition into two halves, and flagging the first row of each half as a true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows of entries for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2,4,6..) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.
Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
EDIT:
You don't want to alternate rows. You simply want two rows. For that, you can still use modulo arithmetic but with somewhat different logic:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;
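I am just extending Gordon's answer, as it does not give the correct result for partitions of more than 4 rows. Turn the remainder into a flag, and use row_number - 1 so that partitions of exactly 2 rows (where the half size is 1) also work:
SELECT col1, col2,
       (CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1) %
                  FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 0 THEN 1 ELSE 0 END
       ) as col3
FROM mytable;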
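To check the "flag the first row of each half" logic against partitions of different sizes, here is a sketch in SQLite via Python's sqlite3 module. It flags a row when (row_number - 1) modulo half the partition size is 0, which is one way to also cover 2-row partitions. The sample partitions of 4, 8 and 2 rows are made up for the test, and SQLite 3.25 or later is assumed; FLOOR is omitted because SQLite's integer division already truncates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")

# Hypothetical partitions: col1 = 1 has 4 rows, 2 has 8 rows, 3 has 2 rows.
rows = ([(1, n) for n in range(1, 5)]
        + [(2, n) for n in range(1, 9)]
        + [(3, n) for n in range(1, 3)])
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

query = """
SELECT col1, col2,
       CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1)
                 % (COUNT(*) OVER (PARTITION BY col1) / 2) = 0
            THEN 1 ELSE 0 END AS col3
FROM mytable
ORDER BY col1, col2
"""

# Collect the flags per partition, in col2 order.
flags = {}
for col1, col2, col3 in conn.execute(query):
    flags.setdefault(col1, []).append(col3)

for col1, values in sorted(flags.items()):
    print(col1, values)
```

With 2 rows the half size is 1, so every row starts a half and both rows are flagged.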

Query to give count of unique results in column A but then split by values in column B

I'm trying to create a SQL query to count the number of occurrences of a dynamic value in one column, but then have this count split by what is in another column.
If the table I'm getting the data from looks something like:
id | Team | Period |
____________________
1 | A | 1 |
2 | B | 1 |
3 | B | 1 |
4 | A | 1 |
5 | E | 2 |
6 | D | 3 |
7 | A | 3 |
8 | A | 3 |
9 | B | 4 |
10 | A | 4 |
11 | C | 4 |
etc...
I want to get a list of how many times each team appears per period like so.
team | period | count |
_________________________
A | 1 | 2 |
B | 1 | 2 |
C | 1 | 0 |
D | 1 | 0 |
E | 1 | 0 |
A | 2 | 0 |
B | 2 | 0 |
C | 2 | 0 |
D | 2 | 0 |
E | 2 | 1 |
A | 3 | 2 |
This will be used in a PHP page to then make an assoc array and print out the data for reporting purposes.
I have previously used things like
SELECT sum(case when somecolumn = 'blah' then 1 else 0 end) as blah_count
But I can't do that here, because in future the names of the teams may change to a currently unknown value, so I can't use the names in the query. (and no, I won't be told about this so I can change the query.) So I need a query where it both gives count of any occurrence in the team column and splits them by period. Period will always be a number from 1 to 13.
You could cross join all distinct values of teams and periods, and then left join the original table and aggregate.
select
te.team,
pe.period,
count(ta.team) cnt
from
(select distinct team from mytable) te
cross join (select distinct period from mytable) pe
left join mytable ta
on ta.team = te.team
and ta.period = pe.period
group by te.team, pe.period
order by pe.period, te.team
Demo on DB Fiddle:
team | period | cnt
:--- | -----: | --:
A | 1 | 2
B | 1 | 2
C | 1 | 0
D | 1 | 0
E | 1 | 0
A | 2 | 0
B | 2 | 0
C | 2 | 0
D | 2 | 0
E | 2 | 1
A | 3 | 2
B | 3 | 0
C | 3 | 0
D | 3 | 1
E | 3 | 0
A | 4 | 1
B | 4 | 1
C | 4 | 1
D | 4 | 0
E | 4 | 0
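The cross join approach can be tried end to end with the sketch below, which runs the query in SQLite through Python's sqlite3 module and collects the result into a dictionary keyed by (team, period), roughly what the PHP associative array would hold. The table name mytable comes from the answer; the dictionary shape is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, team TEXT, period INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "B", 1), (3, "B", 1), (4, "A", 1), (5, "E", 2),
     (6, "D", 3), (7, "A", 3), (8, "A", 3), (9, "B", 4), (10, "A", 4),
     (11, "C", 4)],
)

# Every team paired with every period, then counted via a left join so
# that combinations with no rows still appear with a count of 0.
query = """
SELECT te.team, pe.period, COUNT(ta.team) AS cnt
FROM (SELECT DISTINCT team FROM mytable) te
CROSS JOIN (SELECT DISTINCT period FROM mytable) pe
LEFT JOIN mytable ta
       ON ta.team = te.team
      AND ta.period = pe.period
GROUP BY te.team, pe.period
ORDER BY pe.period, te.team
"""

counts = {(team, period): cnt for team, period, cnt in conn.execute(query)}
for key, cnt in counts.items():
    print(key, cnt)
```

Note that a period with no rows at all would not appear, because the period list is derived from the data; if periods 1 to 13 must always be present, the distinct-period subquery would need to be replaced by a fixed list.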
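SELECT team, period, count(*) as count
from TABLENAME group by team, period
Note that this only returns team/period combinations that actually occur in the table; combinations with a count of 0 are omitted.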
Why not pivot the periods?
select team,
sum( period = 1 ) as period_1,
sum( period = 2 ) as period_2,
sum( period = 3 ) as period_3,
sum( period = 4 ) as period_4
from t
group by team;
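This assumes the periods are known in advance. The sample data shows four; since the question says periods run from 1 to 13, you would extend the list up to period_13. Note that sum( period = 1 ) relies on the database treating a boolean expression as 0 or 1, as MySQL does.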