Combine row aggregate data with individual rows - sql

I have a table that looks like this:
base_data
session_id | event_type | player_guess | correct_answer
1 | guess | 'python' | NULL
1 | guess | 'javascript' | NULL
1 | guess | 'scala' | NULL
1 | all_answered | NULL | ['python','javascript','hadoop']
2 | guess | 'triangle' | NULL
2 | guess | 'square' | NULL
2 | all_answered | NULL | ['triangle','square']
I am trying to add a new column called was_guess_correct, defined as follows:
For each session_id, match the player_guess values against the data in correct_answer. The correct answer for a session_id is available on the row where event_type = 'all_answered'.
The result would look like this:
session_id | event_type | player_guess | correct_answer | was_guess_correct
1 | guess | 'python' | NULL | 1
1 | guess | 'javascript' | NULL | 1
1 | guess | 'scala' | NULL | 0
1 | all_answered | NULL | ['python','javascript','hadoop'] | 1
2 | guess | 'triangle' | NULL | 1
2 | guess | 'square' | NULL | 1
2 | all_answered | NULL | ['triangle','square'] | 1
The values in the all_answered row are unique and sorted (the order can be used, or a simple IN check might also work).
For rows with event_type all_answered, the value of was_guess_correct does not matter. It can be 1 or 0, whichever makes the query easier.
How can I compute this column in SQL / Presto?
I would like to see how to compute it using JOIN/UNNEST, and also inline (without a JOIN) if possible.

You can use window functions to get the correct answers on each row. How you then check the guess depends on the type of the column. If it is a string, you can just use LIKE:
select t.*,
       (case when event_type = 'all_answered' or
                  max(correct_answer) over (partition by session_id) like '%''' || player_guess || '''%'
             then 1 else 0
        end) as was_guess_correct
from t;
Note that correct_answer is NULL in the "guess" rows, so max() works (assuming there is one correct answer row per session).
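To try the window-function approach without a Presto cluster, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in (it needs version 3.25 or later for window functions), and storing correct_answer as plain text is an assumption, not something the question confirms. The names BASE_DATA, QUERY, and flag_guesses are made up for this example.

```python
import sqlite3

# Sample rows from the question. correct_answer is stored as text here,
# which is an assumption about the real column type.
BASE_DATA = [
    (1, "guess", "python", None),
    (1, "guess", "javascript", None),
    (1, "guess", "scala", None),
    (1, "all_answered", None, "['python','javascript','hadoop']"),
    (2, "guess", "triangle", None),
    (2, "guess", "square", None),
    (2, "all_answered", None, "['triangle','square']"),
]

# Same shape as the answer's query: MAX() OVER copies the single non-NULL
# correct_answer of a session onto every row of that session.
QUERY = """
SELECT session_id, event_type, player_guess,
       CASE WHEN event_type = 'all_answered'
              OR MAX(correct_answer) OVER (PARTITION BY session_id)
                 LIKE '%''' || player_guess || '''%'
            THEN 1 ELSE 0
       END AS was_guess_correct
FROM base_data
"""


def flag_guesses(rows):
    """Load rows into an in-memory table and return the flagged result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE base_data ("
            "session_id INTEGER, event_type TEXT, "
            "player_guess TEXT, correct_answer TEXT)"
        )
        conn.executemany("INSERT INTO base_data VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Index the result by (session_id, event_type, player_guess) for easy lookup.
flags = {(s, e, g): ok for s, e, g, ok in flag_guesses(BASE_DATA)}
```

Wrapping the guess in quotes inside the pattern keeps a guess such as 'java' from matching 'javascript'. Guesses that themselves contain % or _ would still act as wildcards, so this is only safe for simple values.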

Related

SQL literal value that is alternative to NULL

Are there other special literal values besides NULL in SQL / PostgreSQL?
NULL is nice in that we can interpret NULL as the concept of "nothing" (i.e. missing, not available, not asked, not answered, etc.), and data columns of any type can have NULL values.
I would like another value that I can interpret as representing another concept (here the idea of "everything"), in the same result set.
Is there another special value that I can return in a query, which like NULL doesn't type conflict?
Basically anything that doesn't throw ERROR: For 'UNION', types varchar and numeric are inconsistent in this toy query:
select 1 as numeral, 'one' as name UNION ALL
select 2 as numeral, 'two' as name UNION ALL
select NULL as numeral, NULL as name UNION ALL
select -999 as numeral, -999 as name UNION ALL -- type conflict
select '?' as numeral, 'x' as name -- type conflict
Here,
-999 doesn't work as its type conflicts with varchar columns
'?' doesn't work as its type conflicts with numeric columns
NULL doesn't work as it needs
More specifically, here's my actual case: counting combinations of values and also including "Overall" rows in the same query. Generally I won't know or control the types of columns A, B, C in advance. And A, B, or C might also have NULL values, which I would still want to count separately.
SELECT A, COUNT(*) FROM table GROUP BY 1
UNION ALL
SELECT ?, COUNT(*) FROM table GROUP BY 1
and get a result set like:
A | COUNT
NULL | 2
1 | 3
2 | 5
3 | 10
(all) | 20
SELECT B, COUNT(*) FROM table GROUP BY 1
UNION ALL
SELECT ?, COUNT(*) FROM table GROUP BY 1
and get a result set like:
B | COUNT
NULL | 2
'Circle' | 3
'Line' | 5
'Triangle' | 10
(all) | 20
You can use the CAST function to convert the column to VARCHAR, so that every value is treated as a string.
NOTE: Thanks to the comments above, I should completely rephrase this question as "How to COUNT/GROUP BY with ROLLUP using multiple columns of mixed/arbitrary/unknown types, and differentiate true NULL values from ROLLUP placeholders?"
The correct answer, I believe, is the one provided by @a_horse_with_no_name: use ROLLUP with GROUPING.
Below is just me drafting that more completely with a revised example.
This toy example has an integer column and a string column:
WITH table AS (
select 1 as numeral, 'one' as name UNION ALL
select 2 as numeral, 'two' as name UNION ALL
select 2 as numeral, 'two' as name UNION ALL
select NULL as numeral, NULL as name UNION ALL
select NULL as numeral, NULL as name UNION ALL
select NULL as numeral, NULL as name
)
select numeral, name, COUNT(*), GROUPING_ID()
FROM table
GROUP BY ROLLUP(1,2)
ORDER BY GROUPING_ID, name, numeral ;
It returns the following result:
numeral | name | count | grouping_id | note
NULL | NULL | 3 | 0 | both are true NULLs as grouping is 0
1 | one | 1 | 0 |
2 | two | 2 | 0 |
NULL | NULL | 3 | 1 | first is a true NULL, second is a ROLLUP
1 | NULL | 1 | 1 |
2 | NULL | 2 | 1 |
NULL | NULL | 6 | 3 | both NULLs are ROLLUPs
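For anyone on an engine without ROLLUP, the same result can be approximated by hand. The sketch below is not the ROLLUP/GROUPING approach itself: it swaps in a UNION ALL of three GROUP BY levels with a hard-coded grouping_id column, run through Python's sqlite3 module (SQLite has no ROLLUP). The table name toy and the helper rollup_counts are invented for this example, and the hard-coded ids 0, 1, 3 simply mirror the values in the table above.

```python
import sqlite3

# The six toy rows from the example: (numeral, name).
ROWS = [
    (1, "one"),
    (2, "two"),
    (2, "two"),
    (None, None),
    (None, None),
    (None, None),
]

# Each branch is one grouping level. The literal in the last column plays
# the role of the grouping id, so a placeholder NULL (id > 0) can be told
# apart from a true NULL in the data (id = 0).
QUERY = """
SELECT numeral, name, COUNT(*) AS cnt, 0 AS grouping_id
FROM toy GROUP BY numeral, name
UNION ALL
SELECT numeral, NULL, COUNT(*), 1
FROM toy GROUP BY numeral
UNION ALL
SELECT NULL, NULL, COUNT(*), 3
FROM toy
"""


def rollup_counts(rows):
    """Return (numeral, name, cnt, grouping_id) tuples for all three levels."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE toy (numeral INTEGER, name TEXT)")
        conn.executemany("INSERT INTO toy VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = set(rollup_counts(ROWS))
```

The explicit flag column is the part that answers the original question: the value in numeral or name can stay NULL for both cases, and the flag says which kind of NULL it is.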

Subtract 2 case statements

I am trying to subtract the 2 case statements like this:
CASE
WHEN fct.measure IN ('A')
THEN fct.month_value
ELSE NULL
END
- CASE
WHEN fct.measure IN ('B')
THEN fct.month_value
ELSE NULL
END AS discounts
This query doesn't throw a syntax error, but it returns all NULL.
The month_value corresponding to A is 3173.100000 and the month value corresponding to B is 8043.000000.
Any suggestions on how this could return the correct result instead of all NULL?
I presume that you need some kind of conditional aggregation approach here:
SELECT
col,
MAX(CASE WHEN measure = 'A' THEN month_value END) -
MAX(CASE WHEN measure = 'B' THEN month_value END) AS discounts
FROM yourTable
GROUP BY col;
This assumes that your table structure looks something like the following:
col | measure | month_value
1 | A | 3173.10
1 | B | 8043.00
We aggregate by each col value, and then use conditional aggregation to isolate the various month values based on the value of the measure column.
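The reason the original expression returns NULL everywhere is that A and B sit on different rows, so on any single row one of the two CASE results is NULL, and anything minus NULL is NULL. A small sketch using Python's sqlite3 module shows both behaviours side by side. The table name fct comes from the question, while the col column and the helper run_query are assumptions carried over from the answer's guessed table structure.

```python
import sqlite3

# Two rows matching the values quoted in the question.
FCT_ROWS = [(1, "A", 3173.10), (1, "B", 8043.00)]

# Row-by-row subtraction: one operand is always NULL, so the result is NULL.
ROW_WISE = """
SELECT CASE WHEN measure IN ('A') THEN month_value END
     - CASE WHEN measure IN ('B') THEN month_value END AS discounts
FROM fct
"""

# Conditional aggregation: collapse each group to one row first, then subtract.
AGGREGATED = """
SELECT col,
       MAX(CASE WHEN measure = 'A' THEN month_value END)
     - MAX(CASE WHEN measure = 'B' THEN month_value END) AS discounts
FROM fct
GROUP BY col
"""


def run_query(sql):
    """Run one query against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE fct (col INTEGER, measure TEXT, month_value REAL)"
        )
        conn.executemany("INSERT INTO fct VALUES (?, ?, ?)", FCT_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


row_wise = run_query(ROW_WISE)      # one NULL per input row
aggregated = run_query(AGGREGATED)  # one row per col with the difference
```

If a group can hold several A or B rows, SUM may be a better fit than MAX. Which one is right depends on the data, which the question does not show.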

Trying to create a flag based on ranking values in two separate columns

I have a query that returns duplicate rows for certain IDs, with two different columns created by window functions, and I need to create a flag for each row based on a certain order. I've provided an example below of what the data looks like.
I have a RANK_ONE column that returns a ranked value or null and not every ID has a value, but if it has a one, I need it to return a 1 for the row that contains 1.
The RANK_TWO column is basically the same, but I need to flag the 1 row IF there isn't already a 1 flagged from RANK_ONE for the same ID.
The PRIMARY column is my desired outcome. Any thoughts?
I don't know if I've just been in the query too long and can't see a simple solution right in front of me, but it's driving me nuts trying to figure it out.
You seem to want rank_one if it is ever 1 for the id, and otherwise rank_two:
select t.*,
(case when max(case when rank_one = 1 then 1 else 0 end) over (partition by id) = 1 and rank_one = 1 then 1
when max(case when rank_one = 1 then 1 else 0 end) over (partition by id) = 1 then 0
when rank_two = 1 then 1
else 0
end) as primary
from t;
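Since the example data is not shown, here is a sketch of that query against made-up rows, run through Python's sqlite3 module (SQLite 3.25 or later for window functions). Everything in RANK_ROWS is hypothetical, the row_label column exists only to identify rows in the output, and the flag is aliased is_primary because PRIMARY is a keyword in SQLite.

```python
import sqlite3

# Hypothetical rows: (row_label, id, rank_one, rank_two).
RANK_ROWS = [
    ("a", 1, 1, 2),        # id 1 has a rank_one = 1 row, so this row wins
    ("b", 1, 2, 1),        # rank_two = 1 is ignored because id 1 has rank_one = 1
    ("c", 2, None, 1),     # id 2 has no rank_one = 1, so rank_two decides
    ("d", 2, None, 2),
    ("e", 3, None, None),  # nothing ranked first, so no flag
]

# Same CASE ladder as the answer, with the alias renamed to is_primary.
QUERY = """
SELECT row_label,
       (CASE WHEN MAX(CASE WHEN rank_one = 1 THEN 1 ELSE 0 END)
                  OVER (PARTITION BY id) = 1 AND rank_one = 1 THEN 1
             WHEN MAX(CASE WHEN rank_one = 1 THEN 1 ELSE 0 END)
                  OVER (PARTITION BY id) = 1 THEN 0
             WHEN rank_two = 1 THEN 1
             ELSE 0
        END) AS is_primary
FROM t
"""


def flag_primary(rows):
    """Return a {row_label: is_primary} mapping for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t ("
            "row_label TEXT, id INTEGER, rank_one INTEGER, rank_two INTEGER)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


primary_flags = flag_primary(RANK_ROWS)
```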

Impala SQL, return value if a string exists within a subset of values

I have a table where the id field (not a primary key) contains either 1 or null. Over the past several years, any given part could have been entered multiple times with one, or both of these possible options.
I'm trying to write a statement that will return some value if there is ever a 1 associated with the select statement. There are lots of semi-duplicate rows, some with 1 and some with null, but if there is ever a 1, I want to return true, and if there are only null values, I want to return false. I'm not sure how to code this though.
If this is the output of my statement SELECT part, id FROM table WHERE part = "ABC1234":
part id
ABC1234 1
ABC1234 null
ABC1234 null
ABC1234 null
ABC1234 1
I want to write a statement that returns true, because 1 exists in at least one of these rows.
The closest I've come to this is by using a CASE statement, but I'm not quite there yet:
SELECT
a1.part part,
CASE WHEN a2.id is not null
THEN
'true'
ELSE
'false'
END AS id
from table.parts a1, table.ids a2 where a1.part = "ABC1234" and a1.key = a2.key;
I also tried the following case:
CASE WHEN exists
(SELECT id from table.ids where id = 1)
THEN
but I got the error subqueries are not supported in the select list
For the above SELECT statement, how do I return 1 single line that reads:
part id
ABC1234 true
You can use conditional aggregation to check whether a part has at least one row with id = 1.
SELECT part,'True' id
from parts
group by part
having count(case when id = 1 then 1 end) >= 1
To return false when the ids are all NULL, use
select part, case when id_true>=1 then 'True'
when id_false>=1 and id_true=0 then 'False' end id
from (
SELECT part,
count(case when id = 1 then 1 end) id_true,
count(case when id is null then 1 end) id_false
from parts
group by part) t
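A compact variant of the same idea, sketched with Python's sqlite3 module rather than Impala: count the rows where id = 1 per part and turn that count into 'True' or 'False' in a single pass. The second part XYZ0001, the alias has_id, and the helper part_flags are made up for illustration. Only the ABC1234 rows come from the question.

```python
import sqlite3

# ABC1234 rows are from the question; XYZ0001 is a hypothetical all-NULL part.
PARTS = [
    ("ABC1234", 1),
    ("ABC1234", None),
    ("ABC1234", None),
    ("ABC1234", None),
    ("ABC1234", 1),
    ("XYZ0001", None),
    ("XYZ0001", None),
]

# COUNT ignores NULLs, and the inner CASE yields NULL unless id = 1,
# so the count is the number of rows with id = 1 for each part.
QUERY = """
SELECT part,
       CASE WHEN COUNT(CASE WHEN id = 1 THEN 1 END) >= 1
            THEN 'True' ELSE 'False'
       END AS has_id
FROM parts
GROUP BY part
"""


def part_flags(rows):
    """Return a {part: 'True' | 'False'} mapping, one entry per part."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE parts (part TEXT, id INTEGER)")
        conn.executemany("INSERT INTO parts VALUES (?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


flags_by_part = part_flags(PARTS)
```

Because the check lives inside an aggregate rather than a subquery in the select list, it sidesteps the "subqueries are not supported in the select list" error mentioned in the question.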

Can I issue a 'select exists' that checks for ALL specified fields?

Say that I have a database:
TITLE | RUNTIME | EPISODES
-------------------------------------------
The X-Files 42 202
Fringe NULL 100
Seinfeld 21 NULL
I want to issue a statement like SELECT EXISTS(SELECT title,runtime,episodes FROM shows); that will return 1 if all three of those fields are present (as for The X-Files) but 0 if any of them are empty/null (as with Fringe and Seinfeld).
Is this possible using SQL alone?
I would suggest just doing:
select t.*,
(case when title is not null and runtime is not null and episodes is not null
then 1 else 0 end) as HasAllThree
from shows t;
EXISTS checks whether rows exist, not whether columns are populated. You can add a WHERE clause to the subquery and combine EXISTS with a CASE to meet your business objectives.
SELECT
CASE WHEN EXISTS(
SELECT * FROM shows
WHERE title IS NOT NULL
AND runtime IS NOT NULL
AND episodes IS NOT NULL
) THEN 1 ELSE 0 END
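The two answers return different things, which is easy to miss: the first gives one flag per row, the second gives a single flag for the whole table. A rough sketch with Python's sqlite3 module, using the three shows from the question, makes the difference visible. The helper run_shows_query and the constant names are invented for this example.

```python
import sqlite3

# The three rows from the question: (title, runtime, episodes).
SHOWS = [
    ("The X-Files", 42, 202),
    ("Fringe", None, 100),
    ("Seinfeld", 21, None),
]

# One flag per row: 1 only when all three columns are populated.
PER_ROW = """
SELECT title,
       CASE WHEN title IS NOT NULL
             AND runtime IS NOT NULL
             AND episodes IS NOT NULL
            THEN 1 ELSE 0
       END AS has_all_three
FROM shows
"""

# One flag for the table: 1 when at least one fully populated row exists.
ANY_COMPLETE_ROW = """
SELECT CASE WHEN EXISTS (
           SELECT * FROM shows
           WHERE title IS NOT NULL
             AND runtime IS NOT NULL
             AND episodes IS NOT NULL
       ) THEN 1 ELSE 0 END
"""


def run_shows_query(sql):
    """Run one query against a fresh in-memory copy of the shows table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shows (title TEXT, runtime INTEGER, episodes INTEGER)"
        )
        conn.executemany("INSERT INTO shows VALUES (?, ?, ?)", SHOWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


per_row = dict(run_shows_query(PER_ROW))
any_complete = run_shows_query(ANY_COMPLETE_ROW)[0][0]
```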