Oracle GROUP BY using RANK - SQL

TableOne
+-----+------------+------+-----------+
| Id1 | Level      | Type | Survey Nr |
+-----+------------+------+-----------+
| 1   | Level 1    | A    | 1         |
| 2   | Level 2    | A    | 1         |
| 3   | Level 3    | A    | 1         |
| 4   | All Levels | A    | 1         |
+-----+------------+------+-----------+
| 5   | Level 1    | B    | 1         |
| 6   | Level 2    | B    | 1         |
| 7   | Level 4    | B    | 1         |
+-----+------------+------+-----------+
| 8   | Level 1    | A    | 2         |
| 9   | Level 2    | A    | 2         |
| 10  | Level 3    | A    | 2         |
| 11  | All Levels | A    | 2         |
+-----+------------+------+-----------+
I want to group my data by Type and Survey Nr, and I want the output of my query to be:
1. All Levels |A |1
2. Level 1    |B |1
3. Level 2    |B |1
4. Level 4    |B |1
5. All Levels |A |2
So when a Type/Survey Nr subgroup contains the level "All Levels", I want to display only that record (as in cases A-1 and A-2); otherwise I want to display all records (as in case B-1).

You can do this with the RANK() function and a CASE expression:
WITH cte AS (
  SELECT "Id1", "Lev", "Type", "Survey_Nr",
         RANK() OVER (PARTITION BY "Type", "Survey_Nr"
                      ORDER BY CASE WHEN "Lev" = 'All Levels' THEN 0 ELSE 1 END) AS RN
  FROM Table1
)
SELECT *
FROM cte
WHERE RN = 1
ORDER BY "Id1"
RANK() assigns a rank to each row within the partitions defined by the PARTITION BY clause, and the CASE expression in the ORDER BY sorts every "Lev" value into one of two buckets, giving preference to the 'All Levels' rows. Since RANK() gives tied rows the same rank, every row in a partition without an 'All Levels' row ties at rank 1, which is why all of B/1 survives the WHERE RN = 1 filter. Running the query without the WHERE clause will help you see how the RANK() function is working; a sketch follows.
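As a hedged illustration of that diagnostic query (same table and quoted column names as above; the RN values in the comment are inferred from the sample data):
SELECT "Id1", "Lev", "Type", "Survey_Nr",
       RANK() OVER (PARTITION BY "Type", "Survey_Nr"
                    ORDER BY CASE WHEN "Lev" = 'All Levels' THEN 0 ELSE 1 END) AS RN
FROM Table1
ORDER BY "Id1"
-- Expected: RN = 1 for the 'All Levels' rows in A/1 and A/2 and RN = 2 for their
-- siblings; all three B/1 rows tie at RN = 1 because the CASE gives them the
-- same sort key.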

I'd just UNION ALL the data where you have the string 'All Levels' to the data where it doesn't exist within that group. I've effectively assumed that your table is unique on LEVEL, TYPE and SURVEY_NR.
with base_data as (
  select level, type, survey_nr
  from the_table
)
select level, type, survey_nr
from base_data
where level = 'All Levels'
union all
select level, type, survey_nr
from base_data x
where not exists ( select 1
                   from base_data
                   where level = 'All Levels'
                     and type = x.type
                     and survey_nr = x.survey_nr
                 )
Please note that LEVEL is normally an invalid name for a column (it's a reserved word in Oracle, used in hierarchical queries); it's worth changing it.


Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify id that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids that are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
   SELECT id
        , max(treatment) AS ever_treated
        , count(*) FILTER (WHERE outcome = 1) AS outcome_ct
   FROM   t
   GROUP  BY 1
   ) sub
GROUP BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
Would be simpler and faster with proper boolean values instead of integer.
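A hedged sketch of that boolean variant, assuming treatment and outcome were converted to boolean columns:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
   SELECT id
        , bool_or(treatment) AS ever_treated          -- true if any row was treated
        , count(*) FILTER (WHERE outcome) AS outcome_ct
   FROM   t
   GROUP  BY 1
   ) sub
GROUP BY 1;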
(Answer to updated question)
Here is an easy-to-follow subquery approach that works with the integer columns:
select subq.ever_treated, sum(subq.count) as count
from (select id, max(treatment) as ever_treated, count(*) as count
      from t
      where outcome = 1
      group by id) as subq
group by subq.ever_treated;
Note that this filters to outcome = 1 rows before computing max(treatment), so an id whose treatment = 1 rows all have outcome = 0 would be misclassified as never treated; it happens to work for the sample data.

SQL count distinct values for each row

I have a table that looks like this:
+-----+---------+
|Group|Value |
+-----+---------+
|A |1 |
+-----+---------+
|B |2 |
+-----+---------+
|C |1 |
+-----+---------+
|D |3 |
+-----+---------+
And I would like to add a column to my select statement that counts how many groups share each value, looking like this:
+-----+---------+---------+
|Group|Value | COUNT |
+-----+---------+---------+
|A |1 |2 |
+-----+---------+---------+
|B |2 |1 |
+-----+---------+---------+
|C |1 |2 |
+-----+---------+---------+
|D |3 |1 |
+-----+---------+---------+
Value 1 has the two groups A and C; each of the other values has just one group in this example.
Additionally, is it possible to consider all values of VALUE and GROUP even if a WHERE clause filters some of them out of the select query?
You want a window function:
select t.*, count(*) over (partition by value) as count
from t;
You have a problem if the query has a where clause: the where is applied before the window function is computed, so the count would only see the rows that survive the filter. You then need a subquery for the count:
select t.*
from (select t.*, count(*) over (partition by value) as count
from t
) t
where . . .;
Or a correlated subquery might be convenient under some circumstances:
select t.*,
(select count(*) from t t2 where t2.value = t.value) as count
from t
where . . .;
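For instance, a hedged sketch on the sample table (the filter is hypothetical, chosen only for illustration, and the column names are quoted because GROUP is a reserved word):
select t.*
from (select t.*, count(*) over (partition by "Value") as count
      from t
     ) t
where "Group" <> 'D';
-- Returns A/1/2, B/2/1, C/1/2: row D is filtered from the output, but it was
-- still visible to the count computed in the subquery.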

Reset index in dense_rank or row_number when the variable partitioned over changes

I'm using DB2 SQL. I have the following:
select * from mytable order by Var,Varseq
ID Var Varseq
-- --- ------
1 A 1
1 A 2
1 B 1
1 A 3
2 A 1
2 C 1
but would like to get:
ID Var Varseq NewSeq
-- --- ------ ------
1 A 1 1
1 A 2 2
1 B 1 1
1 A 3 1
2 A 1 1
2 C 1 1
However, dense_rank produces the same as the original result. I hope you can see the difference in the desired output: in the 4th line, when ID=1 returns to Var=A, I want the index reset to 1 instead of carrying on to 3; i.e., I would like the index to be reset every time Var changes for a given ID.
for ref here was my query:
SELECT *, DENSE_RANK() OVER (PARTITION BY ID, VAR ORDER BY VARSEQ) FROM MYTABLE
This is an example of a gaps-and-islands problem. However, SQL tables represent unordered sets. Without a column that specifies the overall ordering, your question does not make sense.
In this case, the difference of row numbers will do what you want. But you need an overall ordering column:
select t.*,
       row_number() over (partition by id, var, seqnum - seqnum2
                          order by <ordering col>) as newseq
from (select t.*,
             row_number() over (partition by id order by <ordering col>) as seqnum,
             row_number() over (partition by id, var order by <ordering col>) as seqnum2
      from t
     ) t
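To see why the difference of row numbers isolates the islands, here is a hedged trace on the sample data, using a hypothetical column ord to pin down the row order shown in the question:
WITH t (ord, id, var, varseq) AS (
  VALUES (1, 1, 'A', 1), (2, 1, 'A', 2), (3, 1, 'B', 1),
         (4, 1, 'A', 3), (5, 2, 'A', 1), (6, 2, 'C', 1)
)
SELECT t.*,
       row_number() OVER (PARTITION BY id, var, seqnum - seqnum2
                          ORDER BY ord) AS newseq
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY id ORDER BY ord) AS seqnum,
             row_number() OVER (PARTITION BY id, var ORDER BY ord) AS seqnum2
      FROM t
     ) t;
-- For id = 1 the rows get seqnum = 1,2,3,4 and seqnum2 = 1,2,1,3, so
-- seqnum - seqnum2 is 0,0,2,1: the A rows split into islands {A1,A2} and {A3},
-- yielding newseq = 1,2,1,1 as in the desired output.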
Not an answer yet, but just to have better formatting.
WITH TAB (ID, Var, Varseq) AS
(
VALUES
(1, 'A', 1)
, (1, 'A', 2)
, (1, 'A', 3)
, (1, 'B', 1)
, (2, 'A', 1)
, (2, 'C', 1)
)
SELECT *
FROM TAB
ORDER BY ID, <order keys>;
You specified Var, Varseq as <order keys> in the query above.
The result is:
|ID |VAR|VARSEQ |
|-----------|---|-----------|
|1 |A |1 |
|1 |A |2 |
|1 |A |3 |
|1 |B |1 |
|2 |A |1 |
|2 |C |1 |
But you need the following according to your question:
|ID |VAR|VARSEQ |
|-----------|---|-----------|
|1 |A |1 |
|1 |A |2 |
|1 |B |1 |
|1 |A |3 |
|2 |A |1 |
|2 |C |1 |
So please edit your question to specify an <order keys> clause that produces the result you need. And please run a query producing such an order on your system before posting here...

How to explode an Array and create a view in Hive?

I have the following data where id is an Integer and vectors is an array:
id, vectors
1, [1,2,3]
2, [2,3,4]
3, [3,4,5]
I would like to explode the vectors column with its index position, so that it looks like this:
+---+-----+------+
|id |index|vector|
+---+-----+------+
|1 |0 |1 |
|1 |1 |2 |
|1 |2 |3 |
|2 |0 |2 |
|2 |1 |3 |
|2 |2 |4 |
|3 |0 |3 |
|3 |1 |4 |
|3 |2 |5 |
+---+-----+------+
I figured out that I can do this in Spark Scala using selectExpr:
df.selectExpr("*", "posexplode(vectors) as (index, vector)")
However, this is a relatively simple task, and I would like to avoid writing ETL scripts; I was wondering whether the expression can be used to create a view for easy access through Presto.
This is easy to do in Presto using standard SQL syntax with UNNEST:
WITH data(id, vector) AS (
    VALUES
        (1, array[1,2,3]),
        (2, array[2,3,4]),
        (3, array[3,4,5])
)
SELECT id, index - 1 AS index, value
FROM data, UNNEST(vector) WITH ORDINALITY AS t(value, index)
Note that the index produced by WITH ORDINALITY is 1-based, so I subtracted 1 from it to produce the output you included in your question.
You can use Hive's LATERAL VIEW to explode the array data.
Try the query below:
select id,
       (row_number() over (partition by id order by col)) - 1 as `index`,
       col as vector
from (
      select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
      select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
      select 3 as id, array(3,4,5) as vectors from (select '1') t3
     ) t
LATERAL VIEW explode(vectors) v as col;
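Note that the row_number trick orders by the exploded value itself, so it reproduces the array index only when each array happens to be sorted. As a hedged alternative sketch, Hive's posexplode returns the position directly:
select id, pos as `index`, val as vector
from (
      select 1 as id, array(1,2,3) as vectors from (select '1') t1 union all
      select 2 as id, array(2,3,4) as vectors from (select '1') t2 union all
      select 3 as id, array(3,4,5) as vectors from (select '1') t3
     ) t
LATERAL VIEW posexplode(vectors) v as pos, val;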

Update each row with incremental value Postgres

I want to update every row in a table in Postgres and set each row to a different value; this value will be an incrementing value starting from a given start value.
For instance, suppose I have table tab_a with the following data:
|attr_a|attr_b|
|1 |null |
|2 |null |
|3 |null |
|4 |null |
The output I want is:
|attr_a|attr_b|
|1 |5 |
|2 |6 |
|3 |7 |
|4 |8 |
Here is my script:
UPDATE tab_a
SET attr_b = gen.id
FROM generate_series(5,8) AS gen(id);
However, it is not working as expected...
You could do
UPDATE tab_a upd
SET attr_b = a.rn + 4 -- or something like a.rn + (select max(attr_a) from tab_a)
FROM (
      SELECT attr_a, row_number() over (order by attr_a) AS rn
      FROM tab_a
     ) a
WHERE upd.attr_a = a.attr_a;
Putting the ordering inside the window (rather than an ORDER BY on the subquery) makes the numbering deterministic.
Do something like this
UPDATE pr_conf_item upd
SET item_order = a.rn
FROM (
      SELECT id, row_number() over (order by id) AS rn
      FROM pr_conf_item
     ) a
WHERE upd.id = a.id;
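If you want to keep the generate_series idea from the original script, here is a hedged sketch (Postgres 9.4+) that pairs each row with exactly one series value via WITH ORDINALITY; the original UPDATE cross-joined every row with all four series values, so each row got an arbitrary one of them:
UPDATE tab_a upd
SET attr_b = gen.id
FROM (SELECT attr_a, row_number() OVER (ORDER BY attr_a) AS rn
      FROM tab_a
     ) a
JOIN generate_series(5, 8) WITH ORDINALITY AS gen(id, rn) ON gen.rn = a.rn
WHERE upd.attr_a = a.attr_a;
-- Joining on the row number assigns series values 5..8 to rows in attr_a order.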