I have the statement:
SELECT col_a, col_b, col_c from database.table sample 50;
I want my output to look like this:
id col_a col_b col_c
1 data goes here
2 data goes here
3 data goes here
4 data goes here
5 data goes here
6 data goes here
Basically - I need to create a column for id that starts at 1 and auto-increments with each row that's retrieved.
How do I do this?
If you actually want the sampled rows numbered:
SELECT ROW_NUMBER() OVER (ORDER BY some_field) AS id,
dt.*
FROM
(
SELECT col_a, col_b, col_c
FROM database.table
SAMPLE 50
) AS dt
Instead of SAMPLE you might also use TOP, but then it's not random anymore.
You can use row_number():
select row_number() over (order by ??) as id, col_a, col_b, col_c
from t;
The ?? represents the column/expression that gives the ordering for the id.
When I put a query into a CTE and then use the CTE name with IN in another query, I get an error.
WITH id_list AS (
SELECT DISTINCT tracking_id
FROM table_a
)
SELECT col_a, col_b, col_c
FROM table_b
WHERE tracking_id IN (id_list)
At the same time, if I put the same query in a subquery with IN, it works correctly.
SELECT col_a, col_b, col_c
FROM table_b
WHERE tracking_id IN (
SELECT DISTINCT tracking_id
FROM table_a
)
I want to know why this happens. Why can't we use a CTE directly with IN?
Thanks.
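A likely explanation, offered as a sketch rather than a definitive answer: IN expects either a parenthesized list of values or a subquery. A CTE name is a table reference, like a table or view name, so IN (id_list) is read as a reference to a column called id_list, which does not exist. Selecting from the CTE inside the parentheses should work:

```sql
WITH id_list AS (
    SELECT DISTINCT tracking_id
    FROM table_a
)
SELECT col_a, col_b, col_c
FROM table_b
-- The CTE is queried like any other table; its name alone is not a value list.
WHERE tracking_id IN (SELECT tracking_id FROM id_list);
```

The same rule applies to ordinary tables: IN (table_a) would fail too, while IN (SELECT tracking_id FROM table_a) is fine.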
I have two tables something like this:
TABLE_1:
COL_A (int), COL_B (float), COL_C (float)
and
TABLE_2:
COL_A (int), COL_B (varchar), COL_C (varchar)
My query uses a UNION to get only COL_A (int) from table 2, like this:
SELECT COL_A, COL_B, COL_C FROM table1 UNION
SELECT COL_A FROM table2
It's throwing an error. How do we get the results?
All subquery members of a UNION must have the same number and types of columns. In your case the first subquery has three columns, but the second one has only one.
Solution: pad the second subquery with nulls.
For example:
select COL_A, COL_B, COL_C from table1
union
select COL_A, null, null from table2
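If the database cannot work out a type for the bare nulls, a variant with explicit casts may help. This is a sketch that assumes FLOAT is a valid type name in your dialect, chosen to match COL_B and COL_C of table1:

```sql
SELECT COL_A, COL_B, COL_C
FROM table1
UNION
-- Pad the missing columns with typed NULLs so both branches line up.
SELECT COL_A, CAST(NULL AS FLOAT), CAST(NULL AS FLOAT)
FROM table2;
```

Note that UNION removes duplicate rows; use UNION ALL if every row from both tables should be kept.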
Data example:
Date      event_id  user_id  col_A  col_B  col_C
1/1/2021  a_1       1234     Bad    Green  In
1/1/2021  a_2       1234     Good   Blue   In
1/1/2021  a_3       1234     Good   Red    Out
2/7/2021  a_4       555      Good   Green  Out
2/7/2021  a_5       555      Good   Blue   None
Each user has multiple events that occur on the same day. Per user, I want to loop through all of their events (there is no HH:MM:SS but the day will always be the same) to check specific combinations of col_A, col_B, and col_C.
For example:
SELECT
ROW_NUMBER() OVER
(PARTITION BY user_id,event_id ORDER BY event_id ASC) AS ROW_NUM,
user_id,
event_id,
date,
col_A,
col_B,
col_C,
--conditional flag
case when col_C = 'None' then 'Priority 1'
when col_B = 'Green' and col_A = 'Good' then 'Priority 2'
when col_A = 'Bad' then 'Priority 3'
else 'Priority 4'
end as conditional_flag
FROM sample_Table
The conditional flag above is what I'm trying to apply across the partition per user_id. I am unsure how to make this logic check all rows per user_id. Applied at the row level (as above), it only checks each row on its own, not all rows per user as an aggregate. What I want is a priority such that, for example, if col_C = 'None' for any row in a user's partition, that value is set for every row of the user_id.
EDIT: With the sample query, "Priority 1" is only set at the row level for user_id=555. My target output is for every row of that user to show "Priority 1", based on checking all rows in the partition.
It looks like you want conditional counts over the whole partition:
SELECT
ROW_NUMBER() OVER
(PARTITION BY user_id, event_id ORDER BY event_id ASC) AS ROW_NUM,
user_id,
event_id,
date,
col_A,
col_B,
col_C,
--conditional flag
case when count(case when col_C = 'None' then 1 end)
over (partition by user_id) > 0
then 'Priority 1'
when count(case when col_B = 'Green' and col_A = 'Good' then 1 end)
over (partition by user_id) > 0
then 'Priority 2'
when count(case when col_A = 'Bad' then 1 end)
over (partition by user_id) > 0
then 'Priority 3'
else 'Priority 4'
end as conditional_flag
FROM sample_Table;
When I aggregate data in Hive, how does the GROUP BY clause treat NULL values in the grouping column?
Say I launch the following query
select col_a, count(1) from mytable group by col_a ;
and that col_a contains 0, 1 and NULL values. Will the result have 2 rows (0 and 1) or 3 (0, 1 and NULL)?
Hive treats NULL as a group of its own in GROUP BY, so you will get 3 rows (0, 1 and NULL). Also please modify your query to this:
Select col_a, count(1) from mytable group by col_a ;
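A small sketch of the two behaviours, assuming col_a holds 0, 1 and NULL as in the question. The second query is an addition for the case where the NULL group is not wanted:

```sql
-- One row per distinct col_a value, including one row for NULL.
SELECT col_a, COUNT(1) AS n
FROM mytable
GROUP BY col_a;

-- Filter before grouping to drop the NULL group.
SELECT col_a, COUNT(1) AS n
FROM mytable
WHERE col_a IS NOT NULL
GROUP BY col_a;
```

Keep in mind that COUNT(col_a) skips NULLs, so it would report 0 for the NULL group, while COUNT(1) counts every row in it.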
As the title describes I'm looking for a way to exclude the entire row from my select clause when the value in column B of this row is present anywhere in column C of the same table.
SELECT
col_a,
col_b,
col_c
FROM table
WHERE col_b NOT IN (SELECT col_c FROM table)
Should do it.
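One caveat worth checking: if col_c contains any NULL, the NOT IN predicate is never true and no rows come back. A sketch of a variant using NOT EXISTS, which is not affected by this. It assumes the table is called my_table (a placeholder name, since table itself is a reserved word in most dialects):

```sql
SELECT t.col_a, t.col_b, t.col_c
FROM my_table t
WHERE NOT EXISTS (
    -- Look for any row whose col_c equals this row's col_b.
    SELECT 1
    FROM my_table x
    WHERE x.col_c = t.col_b
);
```

Rows whose col_b is NULL are kept by this version, because NULL never equals anything in col_c. Add AND t.col_b IS NOT NULL if they should be dropped as well.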