Getting Unique ID for a maximum amount grouped by days in BigQuery - google-bigquery

I have this query in BigQuery:
SELECT
ID,
max(amount) as money,
STRFTIME_UTC_USEC(TIMESTAMP(time), '%j') as day
FROM table
GROUP BY day
The console shows an error asking for ID to be added to the GROUP BY clause, but if I add ID to the GROUP BY, I get many IDs for a specific day.
I want to print a unique ID with the maximum amount on a specific day.
For example:
ID: 1 Money:123 Day:365

It's not clear from the question, but it looks like you already have only one entry per id for a particular day. Assuming this, the query below does what you need:
SELECT id, amount, day
FROM (
SELECT
id, amount, day,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY amount DESC) AS win
FROM dataset.table
)
WHERE win = 1
 
In case the above assumption is wrong (you have multiple entries for the same id on the same day), use the query below instead:
SELECT id, amount, day
FROM (
SELECT id, amount, day,
ROW_NUMBER() OVER(PARTITION BY day ORDER BY amount DESC) AS win
FROM (
SELECT id, SUM(amount) AS amount, day
FROM dataset.table
GROUP BY id, day
)
)
WHERE win = 1
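Both variants rely on the same greatest-per-group pattern. A minimal runnable sketch of the first query using SQLite instead of BigQuery (the table name and sample values are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, amount INTEGER, day INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 123, 365),  # highest amount on day 365
    (2, 50, 365),
    (3, 70, 364),   # only entry on day 364
])
# Number rows per day by descending amount, then keep only the top row per day
rows = con.execute("""
    SELECT id, amount, day
    FROM (
      SELECT id, amount, day,
             ROW_NUMBER() OVER (PARTITION BY day ORDER BY amount DESC) AS win
      FROM t
    )
    WHERE win = 1
    ORDER BY day
""").fetchall()
print(rows)  # one (id, amount, day) row per day
```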
 

Related

Get last record by month/year and id

I need to get the last record of each month/year for each id.
My table captures, daily and for each id, an order value which is cumulative. So, in the end I only need the last record of the month for each id.
I believe there must be something simple for this, but with the examples I found I could not replicate it for my case.
Here is an example of my input data and the expected result: db_fiddle.
My attempt doesn't include grouping by month and year:
select ar.id, ar.value, ar.aquisition_date
from table_views ar
inner join (
select id, max(aquisition_date) as last_aquisition_date_month
from table_views
group by id
) ld
on ar.id = ld.id and ar.aquisition_date = ld.last_aquisition_date_month
You could do this:
with tn as (
select
*,
row_number() over (partition by id, date_trunc('month', aquisition_date) order by aquisition_date desc) as rn
from table_views
)
select * from tn where rn = 1
The tn CTE adds a row number that counts incrementally in descending order of date, for each month/id. Then you take only the rows with rn = 1, which correspond to the last aquisition_date of any given month, for each id.
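SQLite has no date_trunc, but the same idea works by partitioning on a year-month string via strftime; a runnable sketch with made-up sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_views (id INTEGER, value INTEGER, aquisition_date TEXT)")
con.executemany("INSERT INTO table_views VALUES (?, ?, ?)", [
    (1, 10, "2020-01-05"),
    (1, 12, "2020-01-28"),  # last January row for id 1
    (1, 15, "2020-02-03"),  # last February row for id 1
    (2, 7,  "2020-01-31"),  # only row for id 2
])
# strftime('%Y-%m', ...) stands in for date_trunc('month', ...)
rows = con.execute("""
    WITH tn AS (
      SELECT *,
             ROW_NUMBER() OVER (
               PARTITION BY id, strftime('%Y-%m', aquisition_date)
               ORDER BY aquisition_date DESC
             ) AS rn
      FROM table_views
    )
    SELECT id, value, aquisition_date FROM tn WHERE rn = 1
    ORDER BY id, aquisition_date
""").fetchall()
print(rows)  # last row of each month, per id
```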

How to write a BigQuery query for the below table

The output should be the item(s) sold the most times on each date.
This is bigquery table:
item,date
apple,1-1-2020
apple,1-1-2020
pear,1-1-2020
pear,1-1-2020
pear,1-2-2020
pear,1-2-2020
pear,1-2-2020
orange,1-2-2020
Expected output:
item,date
apple,1-1-2020
pear,1-1-2020
pear,1-2-2020
Consider below approach
select item, date, count(1) sales
from `project.dataset.table`
group by item, date
qualify rank() over(partition by date order by sales desc) = 1
When applied to the sample data in your question, this returns the expected output plus a sales column.
If for some reason, you don't want to have sales column in your output - use below
select item, date
from `project.dataset.table`
group by item, date
qualify rank() over(partition by date order by count(1) desc) = 1
Applied to the sample data in your question, this returns exactly the expected output.
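QUALIFY is BigQuery-specific; in engines without it, the same filter can be written as a wrapping subquery. A runnable sketch in SQLite over the sample data from the question (the table name sales is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (item TEXT, date TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("apple", "1-1-2020"), ("apple", "1-1-2020"),
    ("pear", "1-1-2020"), ("pear", "1-1-2020"),
    ("pear", "1-2-2020"), ("pear", "1-2-2020"),
    ("pear", "1-2-2020"), ("orange", "1-2-2020"),
])
# Count per (item, date), rank counts within each date, keep rank 1.
# RANK() keeps ties, so both apple and pear survive for 1-1-2020.
rows = con.execute("""
    SELECT item, date
    FROM (
      SELECT item, date,
             RANK() OVER (PARTITION BY date ORDER BY n DESC) AS rnk
      FROM (SELECT item, date, COUNT(*) AS n FROM sales GROUP BY item, date)
    )
    WHERE rnk = 1
    ORDER BY date, item
""").fetchall()
print(rows)
```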
The following query should do it:
SELECT
item,
sale_date
FROM (
SELECT
sample.*,
COUNT(item) AS item_count
FROM
sample
GROUP BY
sample.sale_date,
item )
# Here you need to use a WHERE (or HAVING, or GROUP BY) in order to be able to use QUALIFY
WHERE sale_date IS NOT NULL
QUALIFY RANK() OVER(PARTITION BY sale_date ORDER BY item_count DESC) = 1

Hive SQL: Find the last time a user had an entry

I am a bit stuck! I have a users table. The users get a score, but it doesn't come every day.
I need a way to show the score for the user for the last date that they got a score. It could be 1 month ago, and I have 50M rows per day, so I can't just ingest all the partitions.
Any idea how I can do this?
select userid, score from user_table where dt = 20201206
Get the most recent record as below:
select userid, score
from
(select userid, score, row_number() over (partition by userid order by dt desc) as rn
from user_table) t
where rn = 1
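The same pattern can be checked locally; a minimal sketch in SQLite (the sample rows are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_table (userid INTEGER, score INTEGER, dt INTEGER)")
con.executemany("INSERT INTO user_table VALUES (?, ?, ?)", [
    (1, 80, 20201101),
    (1, 85, 20201206),  # latest row for user 1
    (2, 60, 20201015),  # only row for user 2
])
# Number each user's rows newest-first, then keep the newest (rn = 1)
rows = con.execute("""
    SELECT userid, score
    FROM (SELECT userid, score,
                 ROW_NUMBER() OVER (PARTITION BY userid ORDER BY dt DESC) AS rn
          FROM user_table) t
    WHERE rn = 1
    ORDER BY userid
""").fetchall()
print(rows)  # most recent score per user
```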

Sum having a condition

I have a table that has this information:
And I need to get the following information:
If the country for the same person name (in this case Artur) differs across rows, then I need to sum the two quantity values from the max date (in this case 04/10) and return both the person (Artur) and the qty (15k).
If the country for the same person name (in this case Joseph) is the same, then I only need the first row of the max date available.
I'm really struggling, as I'm not sure how to implement the logic in my code:
Select
table.person,
table.quantity
From
(
Select
table.date,
table.person,
table.country,
table.quantity,
ROW_NUMBER () over (
PARTITION by table.code, table.person
ORDER by table.date DESC
) AS rn
FROM
table
WHERE table.date >= DATE '{2020-04-10}' -5
) a
WHERE a.RN IN (1,2)
Is it possible to create a rule to sum rows 1 and 2 when country is different (Artur case) and only return row number 1 when the country is the same for a name (Joseph case)?
Use dense_rank() or max() as a window function:
select person, sum(quantity)
from (select t.*,
max(date) over (partition by person) as max_date
from t
) t
where date = max_date
group by person;
EDIT:
Hmmm . . . I think you might want one row per country per person on the max date. If so:
select person, sum(quantity)
from (select t.*,
row_number() over (partition by person, country order by date desc) as seqnum_pc,
rank() over (partition by person order by date desc) as seqnum_p
from t
) t
where seqnum_p = 1 and seqnum_pc = 1
group by person;
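To see how the two window columns interact, here is a runnable sketch of the second query in SQLite, with made-up rows reproducing the Artur (two countries) and Joseph (one country) cases:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (person TEXT, country TEXT, quantity INTEGER, date TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("Artur",  "BR", 10000, "2020-04-10"),
    ("Artur",  "PT",  5000, "2020-04-10"),  # second country, same max date
    ("Artur",  "BR",  9000, "2020-04-09"),
    ("Joseph", "US",  7000, "2020-04-10"),
    ("Joseph", "US",  6000, "2020-04-09"),
])
# seqnum_p = 1 keeps only rows on each person's max date (RANK keeps ties);
# seqnum_pc = 1 keeps one row per (person, country), so Joseph isn't double-counted
rows = con.execute("""
    SELECT person, SUM(quantity)
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY person, country ORDER BY date DESC) AS seqnum_pc,
                 RANK()       OVER (PARTITION BY person ORDER BY date DESC) AS seqnum_p
          FROM t) t
    WHERE seqnum_p = 1 AND seqnum_pc = 1
    GROUP BY person
    ORDER BY person
""").fetchall()
print(rows)  # Artur summed across countries, Joseph's single max-date row
```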

Amazon Redshift rank group

I'm trying to rank the first 10, and then group the remaining into buckets of 1000 (based on volume).
Below are the desired results; what's the easiest way to do this?
desired results
I can get the ranking on volume all the way down using the following, but would like to group anything beyond a ranking of 10:
DENSE_RANK() over (PARTITION BY date ORDER BY count (DISTINCT volume_key) DESC)as rnk_loc_Vol
You can rank with a regular window function first (raw rank), then attach the tenth record's volume to every row, and produce another column (final rank) that is either the raw rank (1 to 10) or 11 plus the integer division by 1000 of the delta between the tenth record's volume and the row's volume.
with
ranked_entries as (
select *
,dense_rank() over (partition by date order by volume desc) as raw_rnk
from tbl
)
,tenth_entry as (
select *
,min(case when raw_rnk<11 then volume end) over (partition by date) as tenth_record_volume
from ranked_entries
)
select *
,case
when raw_rnk<11 then raw_rnk
else 11+(tenth_record_volume-volume)/1000
end as final_rnk
from tenth_entry
(haven't tested though)
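A runnable sketch of the bucketing arithmetic in SQLite (note that tenth_entry must read from ranked_entries so raw_rnk is visible; the single date and volumes are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (date TEXT, volume INTEGER)")
# ten distinct top volumes (10000 down to 9100), then two more that fall
# into 1000-wide buckets below the tenth volume
vols = [10000 - 100 * i for i in range(10)] + [9000, 8000]
con.executemany("INSERT INTO tbl VALUES (?, ?)", [("2021-01-01", v) for v in vols])
rows = con.execute("""
    WITH ranked_entries AS (
      SELECT *,
             DENSE_RANK() OVER (PARTITION BY date ORDER BY volume DESC) AS raw_rnk
      FROM tbl
    ),
    tenth_entry AS (
      SELECT *,
             MIN(CASE WHEN raw_rnk < 11 THEN volume END) OVER (PARTITION BY date)
               AS tenth_record_volume
      FROM ranked_entries
    )
    SELECT volume,
           CASE WHEN raw_rnk < 11 THEN raw_rnk
                ELSE 11 + (tenth_record_volume - volume) / 1000
           END AS final_rnk
    FROM tenth_entry
""").fetchall()
final = dict(rows)  # volume -> final rank
print(final)
```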