BigQuery: sum Col1 based on unique entries in Col2 and on a specific date

I am trying to sum column1 (invoice_value) in BigQuery for a specific date, but I want to avoid counting the duplicates in column2 (invoice_no).
So far I can sum column1, but the total I get includes several duplicates in column2 (invoice_no):
SELECT SUM(invoice_value) as INVOICES FROM my_data
WHERE invoice_value IS NOT NULL
AND timestamp >='2021-03-01'
AND timestamp < '2021-03-02'
Help will be greatly appreciated.

You can try the following query to remove duplicate records:
SELECT SUM(invoice_value) AS INVOICES FROM
(SELECT DISTINCT invoice_value, invoice_no, timestamp FROM my_data)
WHERE invoice_value IS NOT NULL
AND CAST(timestamp AS TIMESTAMP) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND CAST(timestamp AS TIMESTAMP) < CURRENT_TIMESTAMP()
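If the duplicated rows for one invoice_no can differ in their timestamp, DISTINCT over all three columns will not collapse them. A minimal sketch of an alternative, assuming BigQuery Standard SQL and assuming every duplicate of an invoice carries the same invoice_value: keep one row per invoice_no inside the date window, then sum. The rows in the WITH clause are invented sample data standing in for my_data.

```sql
-- Invented sample rows; in practice, drop this CTE and query my_data directly.
WITH my_data AS (
  SELECT 'A1' AS invoice_no, 100 AS invoice_value,
         TIMESTAMP '2021-03-01 09:00:00' AS timestamp UNION ALL
  SELECT 'A1', 100, TIMESTAMP '2021-03-01 09:05:00' UNION ALL  -- duplicate invoice
  SELECT 'B2', 50,  TIMESTAMP '2021-03-01 10:00:00' UNION ALL
  SELECT 'C3', 70,  TIMESTAMP '2021-03-02 08:00:00'            -- outside the window
)
SELECT SUM(invoice_value) AS invoices
FROM (
  -- One row per invoice number within the requested day.
  -- ANY_VALUE is safe only under the assumption that duplicates share a value.
  SELECT invoice_no, ANY_VALUE(invoice_value) AS invoice_value
  FROM my_data
  WHERE invoice_value IS NOT NULL
    AND timestamp >= TIMESTAMP '2021-03-01'
    AND timestamp <  TIMESTAMP '2021-03-02'
  GROUP BY invoice_no
);
-- With the sample rows above: A1 counts once (100) plus B2 (50), so invoices = 150.
```

If duplicates of an invoice can carry different values, ANY_VALUE would pick one arbitrarily, so a rule such as MAX or "latest timestamp wins" would need to be chosen instead.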

Related

Big query, how to split TIMESTAMP data type column

I am trying to split a column of TIMESTAMP data type into two separate columns, DATE and TIME, because I want a WHERE clause that keeps only rows where the time is more than 2 minutes. In my case, WHERE ride_length > to_timestamp'00:02:00' is not working.
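You can use EXTRACT to get the date, the time, and the minute from a timestamp. Note that EXTRACT(MINUTE FROM dt) returns the minute-of-the-hour component (0 to 59), not an elapsed duration.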
Example:
WITH table1 AS
(
SELECT TIMESTAMP("2022-06-27 10:00:00") AS dt
UNION ALL
SELECT TIMESTAMP("2022-06-27 12:03:00") AS dt
)
SELECT
EXTRACT(DATE FROM dt) AS date,
EXTRACT(TIME FROM dt) AS time
FROM table1
WHERE EXTRACT(MINUTE FROM dt) > 2
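The filter above keeps the 12:03 row because its minute field is 3, not because anything lasted longer than two minutes. If the goal is rides longer than two minutes, one possible sketch in BigQuery Standard SQL uses TIMESTAMP_DIFF. The started_at and ended_at columns and the sample rows are assumptions, since the original table layout is not shown.

```sql
-- Hypothetical ride table with a start and an end timestamp per ride.
WITH rides AS (
  SELECT TIMESTAMP '2022-06-27 10:00:00' AS started_at,
         TIMESTAMP '2022-06-27 10:01:30' AS ended_at   -- 90 seconds
  UNION ALL
  SELECT TIMESTAMP '2022-06-27 12:03:00',
         TIMESTAMP '2022-06-27 12:10:00'               -- 420 seconds
)
SELECT
  EXTRACT(DATE FROM started_at) AS ride_date,
  EXTRACT(TIME FROM started_at) AS ride_time,
  TIMESTAMP_DIFF(ended_at, started_at, SECOND) AS ride_seconds
FROM rides
-- Keep only rides that lasted more than 2 minutes (120 seconds).
WHERE TIMESTAMP_DIFF(ended_at, started_at, SECOND) > 120;
-- With the sample rows above, only the 12:03 ride (420 seconds) is returned.
```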

Getting sum of a column that needs a distinct value from other column

I have this table where I wanted to get the sum of the balance column but each item should have a unique value from the date column.
I'm trying to find all the rows in the balance column that are the same and have the same date, and then find the sum of the balance column.
sample data with unique dates:
balance date
700 2021-07-03
700 2021-09-03
300 2021-09-04
500 2021-09-05
query used goes like:
select distinct a.balance, a.date from table a where a.date between (some date) and (some other date)
I have tried:
select sum(a.balance), a.date from table a where a.date between (some date) and (some other date) group by a.date
but the balance column shows the sum of all of the values in the column but shows distinct dates as shown below.
balance date
893938 2021-07-03
858585 2021-09-03
728366 2021-09-04
665322 2021-09-05
I guess this is a job for a subquery. So let's take your problem step by step.
I'm trying to find all the rows in the balance column that are the same and have the same date,
This subquery gets you that, I believe. It gives the same result as SELECT DISTINCT, but it also counts the duplicated rows.
SELECT COUNT(*) num_same_rows, balance, date
FROM `table`
WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
GROUP BY date, balance
and then find the sum of the balance column.
Nest the subquery like this.
SELECT SUM(balance) summed_balance, date
FROM (
SELECT COUNT(*) num_same_rows, balance, date
FROM `table`
WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
GROUP BY date, balance
) subquery
GROUP BY date
If you only want to consider rows that actually have duplicates, change your subquery to
SELECT COUNT(*) num_same_rows, balance, date
FROM `table`
WHERE date BETWEEN '2021-01-01' AND '2021-09-01'
GROUP BY date, balance
HAVING COUNT(*) > 1
Be careful here, though. You didn't tell us what you want to do, only how you want to do it. The way you described your problem calls for discarding duplicated data before doing the sums. Is that right? Do you want to discard data?
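To make the effect of discarding duplicates concrete, here is a small sketch with invented rows. The balances CTE and the balance_date column name are hypothetical (chosen to avoid naming a column date), and the syntax assumes an engine that allows SELECT without FROM, such as BigQuery or PostgreSQL. It removes identical (balance, balance_date) pairs first and then sums per date.

```sql
-- Invented sample rows: the pair (700, 2021-07-03) appears twice.
WITH balances AS (
  SELECT 700 AS balance, DATE '2021-07-03' AS balance_date UNION ALL
  SELECT 700, DATE '2021-07-03' UNION ALL
  SELECT 300, DATE '2021-07-03' UNION ALL
  SELECT 500, DATE '2021-09-05'
)
SELECT balance_date, SUM(balance) AS summed_balance
FROM (
  -- Step 1: collapse rows that share both balance and date.
  SELECT DISTINCT balance, balance_date
  FROM balances
  WHERE balance_date BETWEEN DATE '2021-01-01' AND DATE '2021-12-31'
) AS deduplicated
-- Step 2: sum what is left, per date.
GROUP BY balance_date
ORDER BY balance_date;
-- With the sample rows: 2021-07-03 gives 1000 (700 + 300), 2021-09-05 gives 500.
-- Without the DISTINCT step, 2021-07-03 would give 1700.
```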
2nd query you posted looks OK - sort of.
However, I think the problem is that the date column contains not only the date but also a time component (as the DATE datatype in Oracle does). Therefore, I'd say it is TRUNC you need. Something like this:
SELECT TRUNC (a.datum) datum,
SUM (a.balance) sum_balance
FROM table_a a
WHERE a.datum BETWEEN DATE '2021-01-01' AND DATE '2021-09-01'
GROUP BY TRUNC (a.datum)

ORACLE SQL: Hourly Date to be group by day time and sum of the amount

I have the following situation:
ID DATE_TIME AMOUNT
23 14-MAY-2021 10:47:01 5
23 14-MAY-2021 11:49:52 3
23 14-MAY-2021 12:03:18 4
How can I get the sum of the amount and group the DATE by day instead of by hour?
Example:
ID DATE_TIME TOTAL
23 20210514 12
I tried it this way, but I got an error:
SELECT DISTINCT ID, TO_CHAR(DATE_TIME, 'YYYYMMDD'), SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY TOTAL, DATE_TIME
You don't need DISTINCT if you use GROUP BY: anything that is grouped must be distinct, unless it is joined to something else later on that causes it to repeat again.
You were almost there, too:
SELECT ID, TO_CHAR(DATE_TIME, 'YYYYMMDD') AS DATE_TIME, SUM(AMOUNT) AS TOTAL
FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY ID, TO_CHAR(DATE_TIME, 'YYYYMMDD')
You need to group by the output of the function, not the input. Not every database can GROUP BY aliases used in the SELECT (technically the SELECT hasn't been done by the time the GROUP BY is done, so the aliases don't exist yet), and you wouldn't group by the total because that's an aggregate (the result of summing up every value in the group).
If you need to do further work with that date, don't convert it to a string. Cut the time off using TRUNC:
SELECT ID, TRUNC(DATE_TIME) as DATE_TIME, SUM(AMOUNT) AS TOTAL
FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY ID, TRUNC(DATE_TIME)
TRUNC can cut a date down to other parts; for example, TRUNC(DATE_TIME, 'HH24') will remove the minutes and seconds but leave the hours.
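As a sketch of the hourly granularity mentioned above, the query below groups the three sample rows from the question by hour. It assumes Oracle, builds the rows inline from dual instead of reading MY_TABLE, and uses a numeric ID for brevity.

```sql
-- The three sample rows from the question, built inline (assumption: Oracle).
WITH my_table AS (
  SELECT 23 AS id,
         TO_DATE('14-05-2021 10:47:01', 'DD-MM-YYYY HH24:MI:SS') AS date_time,
         5 AS amount
  FROM dual
  UNION ALL
  SELECT 23, TO_DATE('14-05-2021 11:49:52', 'DD-MM-YYYY HH24:MI:SS'), 3 FROM dual
  UNION ALL
  SELECT 23, TO_DATE('14-05-2021 12:03:18', 'DD-MM-YYYY HH24:MI:SS'), 4 FROM dual
)
SELECT id,
       TRUNC(date_time, 'HH24') AS hour_start,  -- minutes and seconds set to zero
       SUM(amount) AS total
FROM my_table
GROUP BY id, TRUNC(date_time, 'HH24')
ORDER BY hour_start;
-- Each sample row falls in a different hour, so this returns three rows
-- with totals 5, 3 and 4. Grouping by TRUNC(date_time) instead gives one row with 12.
```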
Convert the DATE column to a string with the required accuracy and then group on that:
SELECT ID,
TO_CHAR("DATE", 'YYYY-MM-DD'),
SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23'
AND "DATE" > SYSDATE-1
GROUP BY ID, TO_CHAR("DATE", 'YYYY-MM-DD')
or truncate the value so that the time component is set to midnight for each date:
SELECT ID,
TRUNC("DATE"),
SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23'
AND "DATE" > SYSDATE-1
GROUP BY ID, TRUNC("DATE")
(Note: DATE is a keyword and cannot be used as an identifier unless you use a quoted identifier; you would need to use the quotes, and the exact case, every time you refer to the column. It would be better to rename the column to something that is not a keyword.)

Appending the result query in bigquery

I am writing a query in BigQuery whose result should append the data from previous dates.
So the result for today will be higher than yesterday's, because the data accumulates day by day.
So far I have only managed to get the data per day (the number of IDs declines and is not carried over from the previous day).
What should I add to the query so that each day's result includes the data from the previous days in BigQuery?
code:
WITH
table1 AS (
SELECT
ID,
...
FROM t
WHERE DATE_SUB('2020-01-31', INTERVAL 31 DAY) and '2020-01-31'
),
table2 AS (
SELECT
ID,
COUNTIF(rating < 7) as bad,
COUNTIF(rating >= 7 AND SAFE_CAST(NPS_Rating as INT64) < 9) as intermediate,
COUNTIF(rating >= 9) as good
FROM
t
WHERE DATE_SUB('2020-01-31', INTERVAL 31 DAY) and '2020-01-31'
)
SELECT
DATE_SUB('2020-01-31', INTERVAL 31 DAY) as date,
*
FROM table1
FULL OUTER JOIN table2 USING (ID)
If you have counts that you want to accumulate, then you want a cumulative sum. The query would look something like this:
select datecol, count(*), sum(count(*)) over (order by datecol)
from t
group by datecol
order by datecol;
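A small self-contained sketch of the same cumulative-sum idea, assuming BigQuery Standard SQL. The inline rows and the id column are invented for illustration; datecol matches the name used in the query above.

```sql
-- Invented sample rows: two IDs on Jan 1, one on Jan 2, two on Jan 3.
WITH t AS (
  SELECT DATE '2020-01-01' AS datecol, 1 AS id UNION ALL
  SELECT DATE '2020-01-01', 2 UNION ALL
  SELECT DATE '2020-01-02', 3 UNION ALL
  SELECT DATE '2020-01-03', 4 UNION ALL
  SELECT DATE '2020-01-03', 5
)
SELECT
  datecol,
  COUNT(*) AS daily_count,
  -- The window function runs after GROUP BY, so it accumulates the daily counts.
  SUM(COUNT(*)) OVER (ORDER BY datecol) AS running_count
FROM t
GROUP BY datecol
ORDER BY datecol;
-- With the sample rows: daily_count is 2, 1, 2 and running_count is 2, 3, 5.
```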

adding all columns from multiple tables

I have a simple question.
I need to count all records from multiple tables with day and hour and add all of them together in a single final table.
So the query for each table is something like this:
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_1
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_2
select timestamp_trunc(timestamp,day) date, timestamp_trunc(timestamp,hour) hour, count(*) from table_3
and so on and so forth.
I would like to combine all the results showing number of total records for each day and hour from these tables.
Expected results will be like this
date, hour, number of records of table 1, number of records of table 2, number of records of table 3 ........
What would be the most optimal SQL query for this?
Probably the simplest way is to union them together and aggregate:
select timestamp_trunc(timestamp, hour) as hh,
countif(which = 1) as num_1,
countif(which = 2) as num_2
from ((select timestamp, 1 as which
from table_1
) union all
(select timestamp, 2 as which
from table_2
) union all
. . .
) t
group by hh
order by hh;
You are using timestamp_trunc(). It returns a timestamp truncated to the hour -- there is no need to also include the date.
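A runnable sketch of the union-and-aggregate approach with two tables, assuming BigQuery Standard SQL. The inline table_1 and table_2 rows are invented sample data; with real tables, the WITH clause would be dropped.

```sql
-- Invented sample rows standing in for the real tables.
WITH table_1 AS (
  SELECT TIMESTAMP '2021-03-01 10:05:00' AS timestamp UNION ALL
  SELECT TIMESTAMP '2021-03-01 10:45:00' UNION ALL
  SELECT TIMESTAMP '2021-03-01 11:10:00'
),
table_2 AS (
  SELECT TIMESTAMP '2021-03-01 10:30:00' AS timestamp
)
SELECT
  TIMESTAMP_TRUNC(timestamp, HOUR) AS hh,  -- carries both the day and the hour
  COUNTIF(which = 1) AS num_1,             -- rows that came from table_1
  COUNTIF(which = 2) AS num_2              -- rows that came from table_2
FROM (
  -- Tag each row with its source table before stacking the tables.
  SELECT timestamp, 1 AS which FROM table_1
  UNION ALL
  SELECT timestamp, 2 AS which FROM table_2
) AS t
GROUP BY hh
ORDER BY hh;
-- With the sample rows: the 10:00 hour has num_1 = 2 and num_2 = 1,
-- and the 11:00 hour has num_1 = 1 and num_2 = 0.
```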
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
TIMESTAMP_TRUNC(TIMESTAMP, DAY) day,
EXTRACT(HOUR FROM TIMESTAMP) hour,
COUNT(*) cnt,
_TABLE_SUFFIX AS table
FROM `project.dataset.table_*`
GROUP BY day, hour, table