equating an entry to an aggregated version of itself - google-bigquery

I am trying to find out whether an entry's value is the max of its group. The check is meant to sit inside a larger IF expression.
I'd expect it to look something like this:
SELECT
t.id as t_id,
sum(if(t.value = max(t.value), 1, 0)) AS is_max_value
FROM dataset.table AS t
GROUP BY t_id
The response is:
Error: Expression 't.value' is not present in the GROUP BY list
How should my code look to do this?

You first need to compute the max value in a subquery, then join that value back to the table.
Here is an example using a public dataset:
SELECT
t.word,
t.word_count,
t.corpus_date
FROM
[publicdata:samples.shakespeare] t
JOIN (
SELECT
corpus_date,
MAX(word_count) word_count,
FROM
[publicdata:samples.shakespeare]
GROUP BY
1 ) d
ON
d.corpus_date=t.corpus_date
AND t.word_count=d.word_count
LIMIT
25
Results:
+-----+--------+--------------+---------------+
| Row | t_word | t_word_count | t_corpus_date |
+-----+--------+--------------+---------------+
| 1 | the | 762 | 1597 |
| 2 | the | 894 | 1598 |
| 3 | the | 841 | 1590 |
| 4 | the | 680 | 1606 |
| 5 | the | 942 | 1607 |
| 6 | the | 779 | 1609 |
| 7 | the | 995 | 1600 |
| 8 | the | 937 | 1599 |
| 9 | the | 738 | 1612 |
| 10 | the | 612 | 1595 |
| 11 | the | 848 | 1592 |
| 12 | the | 753 | 1594 |
| 13 | the | 740 | 1596 |
| 14 | I | 828 | 1603 |
| 15 | the | 525 | 1608 |
| 16 | the | 363 | 0 |
| 17 | I | 629 | 1593 |
| 18 | I | 447 | 1611 |
| 19 | the | 715 | 1602 |
| 20 | the | 717 | 1610 |
+-----+--------+--------------+---------------+
You can see that this retains the words that have the maximum word_count in the partition defined by corpus_date.
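The join-back pattern can be tried locally. Below is a minimal sketch that assumes Python's built-in sqlite3 module rather than BigQuery; the table name `words` and its four rows are made up for illustration and are not the real sample data.

```python
import sqlite3

# Hypothetical miniature stand-in for the shakespeare sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, word_count INTEGER, corpus_date INTEGER)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [("the", 762, 1597), ("and", 500, 1597), ("I", 828, 1603), ("the", 700, 1603)],
)

# Subquery d computes the max word_count per corpus_date;
# joining back on both columns keeps only the rows that hold that max.
top_words = conn.execute(
    """
    SELECT t.word, t.word_count, t.corpus_date
    FROM words AS t
    JOIN (
        SELECT corpus_date, MAX(word_count) AS word_count
        FROM words
        GROUP BY corpus_date
    ) AS d
      ON d.corpus_date = t.corpus_date
     AND d.word_count = t.word_count
    ORDER BY t.corpus_date
    """
).fetchall()

print(top_words)
```

If several rows tie for the max within one corpus_date, the join returns all of them.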

Use a window function to "spread" the max value over all relevant records.
This way you can avoid the join.
SELECT
*
FROM (
SELECT
corpus,
corpus_date,
word,
word_count,
MAX(word_count) OVER (PARTITION BY corpus) AS Max_Word_Count
FROM
[publicdata:samples.shakespeare] )
WHERE
word_count=Max_Word_Count

select
id,
value,
integer(value = max_value) as is_max_value
from (
select id, value, max(value) over(partition by id) as max_value
from dataset.table
)
Explanation:
Inner select: for each row, calculates the max of value among all rows with the same id.
Outer select: for each row, compares the row's value with the max value for its group, then converts true or false into 1 or 0 respectively (as the question expects).
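The two-step logic described above can be sketched outside BigQuery as well. The example below assumes Python's built-in sqlite3 module (with SQLite 3.25 or later for window functions), a made-up table `t`, and a CASE expression in place of BigQuery's integer() cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
# Hypothetical rows: id 2 has a tie for the max value.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 30), (2, 5), (2, 5), (2, 3)],
)

# Inner select attaches the per-id max to every row;
# outer select turns the comparison into a 1/0 flag.
flags = conn.execute(
    """
    SELECT id, value,
           CASE WHEN value = max_value THEN 1 ELSE 0 END AS is_max_value
    FROM (
        SELECT id, value, MAX(value) OVER (PARTITION BY id) AS max_value
        FROM t
    )
    ORDER BY id, value
    """
).fetchall()

for row in flags:
    print(row)
```

Because the flag is computed per row, it can be dropped into a larger conditional without any GROUP BY.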

Related

i need to return column number from sorting selected row

I have a table like this:
| uid | date |
+-----+------------+
| 032 | 16-04-2022 |
| 453 | 15-04-2022 |
| 425 | 13-04-2022 |
| 563 | 14-04-2022 |
I need to sort the rows and return them with a new column, like this:
| uid | date | num |
+-----+------------+-----+
| 425 | 13-04-2022 | 1 |
| 563 | 14-04-2022 | 2 |
| 453 | 15-04-2022 | 3 |
| 032 | 16-04-2022 | 4 |
WITH CTE(UID,DATED)AS
(
SELECT '032',TO_DATE('16-04-2022','DD-MM-YYYY')UNION ALL
SELECT '453',TO_DATE('15-04-2022','DD-MM-YYYY')UNION ALL
SELECT '425',TO_DATE('13-04-2022','DD-MM-YYYY')UNION ALL
SELECT '563',TO_DATE('14-04-2022','DD-MM-YYYY')
)
SELECT C.UID,C.DATED,
ROW_NUMBER()OVER(ORDER BY C.DATED ASC)NUM
FROM CTE AS C
You can use the ROW_NUMBER() function. The CTE stands in for your table's data.
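A minimal sketch of the same numbering, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) and a hypothetical table named `visits`. The dates are stored as ISO strings (YYYY-MM-DD) here, an assumption made so that plain text ordering matches chronological ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (uid TEXT, dated TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("032", "2022-04-16"),
        ("453", "2022-04-15"),
        ("425", "2022-04-13"),
        ("563", "2022-04-14"),
    ],
)

# ROW_NUMBER() assigns 1, 2, 3, ... in the order given inside OVER().
numbered = conn.execute(
    """
    SELECT uid, dated, ROW_NUMBER() OVER (ORDER BY dated ASC) AS num
    FROM visits
    ORDER BY num
    """
).fetchall()

for row in numbered:
    print(row)
```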

In database my query getting first row When i grouped

SELECT
EVENT_ID, COUNT(*), SEQUENCE_NBR
FROM
ALERTS
WHERE
ACKNOWLEDGED = 0
AND SRC_EXT = '7878'
GROUP BY
EVENT_ID
ORDER BY
COUNT(*) DESC;
I run this query to get the id, count and sequence number. When I group by event id, I get the sequence number of the first row in each group.
+----------+--------------+
| EVENT_ID | SEQUENCE_NBR |
+----------+--------------+
| 150 | 9752 |
| 150 | 9764 |
| 150 | 9775 |
| 170 | 9755 |
| 170 | 9763 |
| 170 | 9774 |
| 217 | 9748 |
| 217 | 9759 |
| 217 | 9770 |
| 218 | 9751 |
| 218 | 9762 |
| 218 | 9773 |
| 273 | 9749 |
| 273 | 9760 |
| 273 | 9771 |
| 285 | 9750 |
| 285 | 9761 |
| 285 | 9772 |
+----------+--------------+
That is my data in the database. Using the query above, I get:
+----------+----------+--------------+
| EVENT_ID | COUNT(*) | SEQUENCE_NBR |
+----------+----------+--------------+
| 150 | 3 | 9752 |
| 170 | 3 | 9755 |
| 217 | 3 | 9748 |
| 218 | 3 | 9751 |
| 273 | 3 | 9749 |
| 285 | 3 | 9750 |
+----------+----------+--------------+
I need the data in the same format, but the sequence number should be the last one for each event rather than the first, for example:
150 | 3 | 9775
Your query is malformed. You have SEQUENCE_NBR in the SELECT, but it is not in the GROUP BY. In most databases (including the more recent versions of MySQL), this generates an error. Happily that is so.
If you want the maximum SEQUENCE_NBR, then use the MAX() function:
SELECT EVENT_ID, COUNT(*), MAX(SEQUENCE_NBR) as SEQUENCE_NBR
FROM ALERTS
WHERE ACKNOWLEDGED = 0 AND
SRC_EXT = '7878'
GROUP BY EVENT_ID
ORDER BY COUNT(*) DESC;
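The MAX() version can be checked against a slice of the question's data. This is a sketch assuming Python's built-in sqlite3 module. Only events 150 and 170 are loaded, and an extra EVENT_ID tiebreaker is added to the ORDER BY (not part of the answer's query) so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ALERTS (EVENT_ID INTEGER, SEQUENCE_NBR INTEGER, "
    "ACKNOWLEDGED INTEGER, SRC_EXT TEXT)"
)
conn.executemany(
    "INSERT INTO ALERTS VALUES (?, ?, 0, '7878')",
    [(150, 9752), (150, 9764), (150, 9775), (170, 9755), (170, 9763), (170, 9774)],
)

# Every selected column is either grouped (EVENT_ID) or aggregated,
# so the result no longer depends on which row the engine happens to pick.
summary = conn.execute(
    """
    SELECT EVENT_ID, COUNT(*), MAX(SEQUENCE_NBR) AS SEQUENCE_NBR
    FROM ALERTS
    WHERE ACKNOWLEDGED = 0 AND SRC_EXT = '7878'
    GROUP BY EVENT_ID
    ORDER BY COUNT(*) DESC, EVENT_ID
    """
).fetchall()

print(summary)
```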
You can try the query below:
SELECT
EVENT_ID, COUNT(*), SEQUENCE_NBR
FROM
ALERTS
WHERE
ACKNOWLEDGED = 0
AND SRC_EXT = '7878'
GROUP BY
EVENT_ID
ORDER BY
COUNT(*) DESC,SEQUENCE_NBR desc;

Getting two columns one containing and one not containing a grouped value

My data looks like this -
+-----------+-----------+-----------+----------+
| FLIGHT_NO | FL_DATE | SERIAL_NO | PILOT_NO |
+-----------+-----------+-----------+----------+
| 501 | 15-OCT-19 | 456710 | 345 |
| 521 | 16-OCT-19 | 562911 | 345 |
| 534 | 17-OCT-19 | 877694 | 345 |
| 577 | 17-OCT-19 | 338157 | 345 |
| 501 | 14-OCT-19 | 921225 | 346 |
| 534 | 15-OCT-19 | 877694 | 346 |
| 534 | 14-OCT-19 | 338157 | 347 |
| 590 | 16-OCT-19 | 650012 | 347 |
| 531 | 14-OCT-19 | 562911 | 348 |
| 531 | 15-OCT-19 | 562911 | 348 |
| 501 | 16-OCT-19 | 220989 | 349 |
| 521 | 18-OCT-19 | 650012 | 349 |
| 590 | 14-OCT-19 | 562911 | 351 |
| 577 | 18-OCT-19 | 877694 | 351 |
| 590 | 18-OCT-19 | 456710 | 346 |
+-----------+-----------+-----------+----------+
My aim is to return the total number of flights flying and not flying on 18-oct-19.
I'm doing it with dual but that doesn't seem to be the correct/best method.
Can anyone help me do it the correct way?
SELECT
(SELECT COUNT(FLIGHT_NO) NO_FLY FROM schd_flight WHERE fl_date = '18-OCT-19') AS FLY,
(SELECT COUNT(FLIGHT_NO) NO_FLY FROM schd_flight WHERE fl_date <> '18-OCT-19') AS NO_FLY
FROM dual;
My output -
+-----+--------+
| fly | no_fly |
+-----+--------+
| 3 | 12 |
+-----+--------+
Simply use sum with case statement
Select
sum(case when fl_date = '18-OCT-19' then 1 end) fly,
sum(case when fl_date <> '18-OCT-19' then 1 end) no_fly
From schd_flight;
Cheers!!
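The conditional aggregation above can be reproduced on the question's rows. The sketch below assumes Python's built-in sqlite3 module, stores fl_date as plain text, and uses ELSE 0 in each CASE (a small variation on the answer) so a count with no matching rows comes back as 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schd_flight (flight_no INTEGER, fl_date TEXT)")
# Flight numbers and dates copied from the question's table.
conn.executemany(
    "INSERT INTO schd_flight VALUES (?, ?)",
    [
        (501, "15-OCT-19"), (521, "16-OCT-19"), (534, "17-OCT-19"),
        (577, "17-OCT-19"), (501, "14-OCT-19"), (534, "15-OCT-19"),
        (534, "14-OCT-19"), (590, "16-OCT-19"), (531, "14-OCT-19"),
        (531, "15-OCT-19"), (501, "16-OCT-19"), (521, "18-OCT-19"),
        (590, "14-OCT-19"), (577, "18-OCT-19"), (590, "18-OCT-19"),
    ],
)

# One pass over the table: each CASE contributes 1 to exactly one of the sums.
fly, no_fly = conn.execute(
    """
    SELECT SUM(CASE WHEN fl_date = '18-OCT-19' THEN 1 ELSE 0 END) AS fly,
           SUM(CASE WHEN fl_date <> '18-OCT-19' THEN 1 ELSE 0 END) AS no_fly
    FROM schd_flight
    """
).fetchone()

print(fly, no_fly)
```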
I think the second subquery is not necessary, since no_fly = total - fly.
So I came up with this solution, which may improve the query time:
SELECT sub.FLY as FLY, (SELECT count(*) from schd_flight) - sub.FLY as NO_FLY
FROM (
SELECT COUNT(CASE when fl_date = '18-OCT-19' then 1 end) AS FLY
from schd_flight
) sub;
Not tested yet though.

How to calculate running total in SQL

My dataset is in the format below.
It is month-level data with the salary for each month.
I need to calculate the cumulative salary as of each month end. How can I do this?
+----------+-------+--------+---------------+
| Account | Month | Salary | Running Total |
+----------+-------+--------+---------------+
| a | 1 | 586 | 586 |
| a | 2 | 928 | 1514 |
| a | 3 | 726 | 2240 |
| a | 4 | 538 | 538 |
| b | 1 | 956 | 1494 |
| b | 3 | 667 | 2161 |
| b | 4 | 841 | 3002 |
| c | 1 | 826 | 826 |
| c | 2 | 558 | 1384 |
| c | 3 | 558 | 1972 |
| c | 4 | 735 | 2707 |
| c | 5 | 691 | 3398 |
| d | 1 | 670 | 670 |
| d | 4 | 838 | 1508 |
| d | 5 | 1000 | 2508 |
+----------+-------+--------+---------------+
I need to calculate the Running Total column, which is cumulative. How can I do this efficiently in SQL?
You can use SUM with ORDER BY clause inside the OVER clause:
SELECT Account, Month, Salary,
SUM(Salary) OVER (PARTITION BY Account ORDER BY Month) AS RunningTotal
FROM mytable
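A quick way to try the windowed SUM is the sketch below, which assumes Python's built-in sqlite3 module (SQLite 3.25 or later). It loads only the rows for accounts a (months 1 to 3) and d from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Account TEXT, Month INTEGER, Salary INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a", 1, 586), ("a", 2, 928), ("a", 3, 726),
        ("d", 1, 670), ("d", 4, 838), ("d", 5, 1000),
    ],
)

# PARTITION BY restarts the sum for each account;
# ORDER BY inside OVER() makes the sum cumulative up to the current month.
running = conn.execute(
    """
    SELECT Account, Month, Salary,
           SUM(Salary) OVER (PARTITION BY Account ORDER BY Month) AS RunningTotal
    FROM mytable
    ORDER BY Account, Month
    """
).fetchall()

for row in running:
    print(row)
```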

Select rows with greatest value

I have an MS Access query called qryA380 that uses multiple INNER JOINs to join a couple of tables.
Running the query will show the results like this:
+----+-----------+----------+------------+
| ID | Aircraft | Route.ID | Passengers |
+----+-----------+----------+------------+
| 23 | A-380 | 1 | 556 |
| 2 | A-380 | 2 | 652 |
| 54 | A-380 | 2 | 489 |
| 16 | A-380 | 1 | 598 |
| 39 | A-380 | 1 | 627 |
| 45 | A-380 | 3 | 392 |
| 74 | A-380 | 3 | 726 |
+----+-----------+----------+------------+
My plan is to select only the rows with the smallest Route.ID (in this case 1), so the final result should be:
+----+-----------+----------+------------+
| ID | Aircraft | MinRoute | Passengers |
+----+-----------+----------+------------+
| 23 | A-380 | 1 | 556 |
| 16 | A-380 | 1 | 598 |
| 39 | A-380 | 1 | 627 |
+----+-----------+----------+------------+
I thought this would be straightforward and simple. To save some time, I created a second query to do this work:
SELECT [qryA380].ID, [qryA380].Aircraft, MIN([qryA380].Route.ID) AS MinRoute, [qryA380].Passengers
FROM [qryA380]
GROUP BY [qryA380].ID, [qryA380].Aircraft, [qryA380].Passengers
But I kept getting a table identical to the one generated by qryA380, with all the Route.ID values in the results.
The Passengers and ID columns should be excluded, since they have unique values. By using a subquery, I'm now able to generate the desired results:
SELECT [qryA380].*
FROM (
SELECT MIN([qryA380].Route.ID) AS MinRoute
FROM [qryA380]
) tblMinRoute
INNER JOIN [qryA380]
ON [qryA380].Route.ID = tblMinRoute.MinRoute
Try this
SELECT [qryA380].*
FROM [qryA380]
WHERE [qryA380].Route.ID = (
SELECT min(Route.ID)
FROM [qryA380]
)
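The scalar-subquery filter can be sketched on the question's rows. The example assumes Python's built-in sqlite3 module instead of MS Access, a plain table named `qryA380` standing in for the saved query, and a column named `route_id` in place of Route.ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qryA380 (id INTEGER, aircraft TEXT, route_id INTEGER, passengers INTEGER)"
)
conn.executemany(
    "INSERT INTO qryA380 VALUES (?, ?, ?, ?)",
    [
        (23, "A-380", 1, 556), (2, "A-380", 2, 652), (54, "A-380", 2, 489),
        (16, "A-380", 1, 598), (39, "A-380", 1, 627), (45, "A-380", 3, 392),
        (74, "A-380", 3, 726),
    ],
)

# The subquery returns a single value (the smallest route_id);
# the outer WHERE keeps every row that matches it.
min_route_rows = conn.execute(
    """
    SELECT *
    FROM qryA380
    WHERE route_id = (SELECT MIN(route_id) FROM qryA380)
    ORDER BY id
    """
).fetchall()

for row in min_route_rows:
    print(row)
```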