How to write an SQL aggregate function/query - sql

I have a query that displays the total value (sum of amount) for each day.
The query:
SELECT CAST(date AS DATE), SUM(amount) AS total_amount FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
The CAST is to abbreviate the datetime format to just a date.
Now I want to select only the day which has the highest sum with the max function.
To do this I tried writing the following aggregate query:
SELECT s.date, s.total_amount
FROM (SELECT CAST(date AS DATE), SUM(amount) AS total_amount FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)) s
WHERE s.total_amount = (SELECT MAX(s.total_amount) FROM table)
This does not work. I know the problem is with the final WHERE clause, but I need help with making it work.

Use ORDER BY with LIMIT:
SELECT CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
ORDER BY total_amount DESC
LIMIT 1;
If you are working with SQL Server, use TOP instead:
SELECT TOP (1) CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
ORDER BY total_amount DESC;
If you want to return every day that ties for the highest total, use a window function:
SELECT t.*
FROM (SELECT CAST(date AS DATE), SUM(amount) AS total_amount,
RANK() OVER (ORDER BY SUM(amount) DESC) as Seq
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
) t
WHERE seq = 1;
You can also use a CTE:
WITH CTE AS (
SELECT CAST(date AS DATE) AS dt, SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
)
SELECT c.*
FROM CTE C
WHERE C.total_amount = (SELECT MAX(total_amount) FROM CTE);
Note: If your DBMS doesn't support CTEs, you need to repeat the SELECT statement in a subquery:
SELECT CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
HAVING SUM(amount) = (SELECT MAX(total_amount)
FROM (SELECT CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
) t
);
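The difference between the "top one" approach and the MAX-based approach shows up when two days tie. Below is a minimal sketch, assuming SQLite through Python's sqlite3 module rather than the asker's DBMS; the table name payments, the column created_at and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the asker's "table"
conn.execute("CREATE TABLE payments (created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("2019-03-01 09:15:00", 40),
        ("2019-03-01 17:30:00", 60),   # 2019-03-01 totals 100
        ("2019-03-02 11:00:00", 100),  # 2019-03-02 also totals 100 (a tie)
        ("2019-03-03 08:00:00", 30),
    ],
)

# Daily totals; the half-open range keeps every row of 31 December in scope
DAILY = """
    SELECT date(created_at) AS dt, SUM(amount) AS total_amount
    FROM payments
    WHERE created_at >= '2019-01-01' AND created_at < '2020-01-01'
    GROUP BY date(created_at)
"""

# ORDER BY + LIMIT 1 returns exactly one row, even when days tie
top_one = conn.execute(DAILY + " ORDER BY total_amount DESC, dt LIMIT 1").fetchall()

# Comparing against MAX(total_amount) returns every tied day
with_ties = conn.execute(
    f"WITH daily AS ({DAILY}) "
    "SELECT dt, total_amount FROM daily "
    "WHERE total_amount = (SELECT MAX(total_amount) FROM daily) "
    "ORDER BY dt"
).fetchall()

print(top_one)
print(with_ties)
```

With this sample data the LIMIT query yields one day and the MAX query yields both tied days, so the choice depends on whether ties matter.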

If you are using SQL Server, you can use TOP:
SELECT TOP 1 CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
ORDER BY total_amount DESC
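Use the window function row_number(); this should work with MySQL 8.0, PostgreSQL, Oracle and SQL Server. Note that row_number() returns a single row even when several days tie for the highest total.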
select
date,
total_amount
from
(
SELECT
CAST(date AS DATE) as date,
SUM(amount) AS total_amount,
row_number() over (order by SUM(amount) desc) as rnk
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
) val
where rnk = 1

SELECT s.dt, s.total_amount
FROM (SELECT CAST(date AS DATE) as dt, SUM(amount) AS total_amount
FROM table
WHERE CAST(date as date) BETWEEN '2019-01-01' AND '2019-12-31'
GROUP BY CAST(date AS DATE)) s
WHERE s.total_amount = (Select max(total_amount)
FROM (SELECT CAST(date AS DATE) as dt, SUM(amount) AS total_amount
FROM table
WHERE CAST(date as date) BETWEEN '2019-01-01' AND '2019-12-31'
GROUP BY CAST(date AS DATE)) ss )
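One detail worth checking in the date filter: BETWEEN ... AND '2019-12-31 00:00:00' stops at midnight, so rows later on 31 December fall outside the range. The sketch below illustrates this, assuming SQLite via Python's sqlite3 with timestamps stored as text; the payments table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a timestamp column stored as ISO-8601 text
conn.execute("CREATE TABLE payments (created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [
        ("2019-12-30 10:00:00", 10.0),
        ("2019-12-31 00:00:00", 1.0),
        ("2019-12-31 15:00:00", 500.0),  # afternoon of the last day
    ],
)

# Upper bound at midnight: the 15:00 row is not included
between_total = conn.execute(
    "SELECT SUM(amount) FROM payments "
    "WHERE created_at BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'"
).fetchone()[0]

# Half-open range: everything before 1 January of the next year is included
half_open_total = conn.execute(
    "SELECT SUM(amount) FROM payments "
    "WHERE created_at >= '2019-01-01' AND created_at < '2020-01-01'"
).fetchone()[0]

print(between_total, half_open_total)
```

A half-open range (>= start, < next period) also leaves the column untouched, unlike casting the column before comparing.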

Related

How to merge SQL Select queries?

I have three queries that are executed one after another:
SELECT TOP 1 max(value) FROM tableA
where site = 18
and (CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE)
order by CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE) DESC;
SELECT TOP 1 max(value) FROM tableA
where site = 3
and (CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE)
order by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) DESC;
SELECT TOP 1 max(value) FROM tableA
where site = 4
and (CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE)
order by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) DESC;
I want to combine these three queries into one and query sites 18, 3 and 4 with a single SELECT, but I don't see how. Please advise how to merge these three queries into one.
Any help will be appreciated!
You seem to want the maximum value for three different sites on the last day in February that has their data.
If so, this is simpler:
select site_id, max(value)
from (select t.*,
dense_rank() over (partition by site_id order by tstamp / (1000 * 24 * 60 * 60) desc) as seqnum
from t
where tstamp >= cast(datediff(second, '1970-01-01', '2020-02-01') as bigint) * 1000 and
tstamp < cast(datediff(second, '1970-01-01', '2020-02-29') as bigint) * 1000 and
site_id in (18, 3, 4)
) t
where seqnum = 1
group by site_id;
Actually, February in 2020 has 29 days. Perhaps you want the entire month; if so, then use '2020-03-01' for the second comparison.
Note that the manipulations on the date/time values are only on the "constant" side. This allows the query to use an index on tstamp if an appropriate index is available.
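As a rough illustration of keeping the arithmetic on the constant side, the sketch below computes the millisecond bounds once and compares the raw column against them. It assumes SQLite via Python's sqlite3 and UTC timestamps; the readings table, its columns and the sample rows are hypothetical, and a join on the latest day stands in for dense_rank().

```python
import sqlite3
from datetime import datetime, timezone


def epoch_ms(year, month, day):
    """Milliseconds since 1970-01-01 UTC for midnight of the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


lower = epoch_ms(2017, 2, 1)   # inclusive
upper = epoch_ms(2017, 3, 1)   # exclusive, so all of February is covered

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site INTEGER, tstamp INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_readings_tstamp ON readings (tstamp)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (18, epoch_ms(2017, 2, 27) + 3_600_000, 5.0),
        (18, epoch_ms(2017, 2, 28) + 3_600_000, 7.0),
        (18, epoch_ms(2017, 2, 28) + 7_200_000, 9.0),
        (3, epoch_ms(2017, 2, 10), 4.0),
        (4, epoch_ms(2017, 3, 1), 99.0),  # outside the range
    ],
)

# The WHERE clause compares the raw tstamp column with precomputed constants
QUERY = """
    WITH in_range AS (
        SELECT site, tstamp / 86400000 AS day_no, value
        FROM readings
        WHERE tstamp >= ? AND tstamp < ? AND site IN (18, 3, 4)
    )
    SELECT r.site, MAX(r.value)
    FROM in_range r
    JOIN (SELECT site, MAX(day_no) AS last_day FROM in_range GROUP BY site) l
      ON l.site = r.site AND l.last_day = r.day_no
    GROUP BY r.site
    ORDER BY r.site
"""
latest_max = conn.execute(QUERY, (lower, upper)).fetchall()
print(latest_max)
```

Site 4 has no row inside the range here, so it does not appear in the result.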
You can use the analytic function row_number in your existing query as follows:
Select * from
(SELECT max(value) as max_value, site,
Row_number() over (partition by site order by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) desc) as rn FROM tableA
where site in (4,18,3)
and (CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE), site) t
Where rn = 1

BigQuery error: Failed to parse input string

I'm running the Standard SQL query below, but I get the error from the title (Failed to parse input string).
#standardSQL
SELECT
date,
EXTRACT(DAY FROM date) AS day_of_week,
EXTRACT(WEEK FROM date) AS week_of_year,
FORMAT_DATE("%y-%m", date) AS yyyymm
FROM(
SELECT PARSE_DATE('%y%m%d', date) date, campaign
FROM `tech-team-staging-2019.DFW_GA_Data_v1_05122019.DFW_G_Analytics_Predicted_data_v1_05122019`
GROUP BY 1,2
)
Below is for BigQuery Standard SQL.
Note: it is not clear what your date field looks like, so here are the options.
If your date field is a string in YYYY-MM-DD format, use:
#standardSQL
SELECT
date,
EXTRACT(DAY FROM date) AS day_of_week,
EXTRACT(WEEK FROM date) AS week_of_year,
FORMAT_DATE("%Y-%m", date) AS yyyymm
FROM(
SELECT PARSE_DATE('%Y-%m-%d', date) date, campaign
FROM `tech-team-staging-2019.DFW_GA_Data_v1_05122019.DFW_G_Analytics_Predicted_data_v1_05122019`
GROUP BY 1,2
)
If it is YY-MM-DD:
#standardSQL
SELECT
date,
EXTRACT(DAY FROM date) AS day_of_week,
EXTRACT(WEEK FROM date) AS week_of_year,
FORMAT_DATE("%Y-%m", date) AS yyyymm
FROM(
SELECT PARSE_DATE('%y-%m-%d', date) date, campaign
FROM `tech-team-staging-2019.DFW_GA_Data_v1_05122019.DFW_G_Analytics_Predicted_data_v1_05122019`
GROUP BY 1,2
)
If it is YYMMDD:
#standardSQL
SELECT
date,
EXTRACT(DAY FROM date) AS day_of_week,
EXTRACT(WEEK FROM date) AS week_of_year,
FORMAT_DATE("%Y-%m", date) AS yyyymm
FROM(
SELECT PARSE_DATE('%y%m%d', date) date, campaign
FROM `tech-team-staging-2019.DFW_GA_Data_v1_05122019.DFW_G_Analytics_Predicted_data_v1_05122019`
GROUP BY 1,2
)
And one more option, YYYYMMDD:
#standardSQL
SELECT
date,
EXTRACT(DAY FROM date) AS day_of_week,
EXTRACT(WEEK FROM date) AS week_of_year,
FORMAT_DATE("%Y-%m", date) AS yyyymm
FROM(
SELECT PARSE_DATE('%Y%m%d', date) date, campaign
FROM `tech-team-staging-2019.DFW_GA_Data_v1_05122019.DFW_G_Analytics_Predicted_data_v1_05122019`
GROUP BY 1,2
)
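To see why the format string has to match the stored text, here is a small sketch using Python's datetime.strptime, which uses the same %Y (four-digit year) and %y (two-digit year) directives. This is only an analogy and an assumption on my part: BigQuery's PARSE_DATE is a separate implementation and may differ in edge cases.

```python
from datetime import datetime


def try_parse(text, fmt):
    """Return the parsed date, or None when the string does not match the format."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


# Each sample string paired with the format that describes it
samples = {
    "2019-05-12": "%Y-%m-%d",
    "19-05-12": "%y-%m-%d",
    "190512": "%y%m%d",
    "20190512": "%Y%m%d",
}
for text, fmt in samples.items():
    print(text, fmt, try_parse(text, fmt))

# The format from the question, '%y%m%d', only fits YYMMDD input
print(try_parse("2019-05-12", "%y%m%d"))
print(try_parse("20190512", "%y%m%d"))
```

The last two calls fail to parse, which is the same kind of mismatch that the error in the question points to.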

How to get count for each day between certain times

I'm trying to get a total count for each day between 07:00 and 19:00 for the last 7 days. The below query only displays the count for the date 7 days back and not each individual day. Any help would be greatly appreciated. Thanks!
DECLARE @Date AS DATETIME = DATEADD(HOUR, 7, CAST(CAST(DATEADD(DAY, -7, GETDATE()) AS DATE) AS DATETIME))
DECLARE @Date2 AS DATETIME = DATEADD(HOUR, 19, CAST(CAST(DATEADD(DAY, -7, GETDATE()) AS DATE) AS DATETIME))
SELECT CONVERT(NVARCHAR(20), DATE, 120) AS Report_Date, COUNT(DISTINCT GUID) AS ROW_COUNT
FROM TABLE WITH (NOLOCK)
WHERE DATEADD(MINUTE, +270, DATE) >= @Date
AND DATEADD(MINUTE, +270, DATE) < @Date2
GROUP BY CONVERT(NVARCHAR(20), DATE, 120)
Since you need the past 7 days, use getdate() - 7:
SELECT CAST(DATE as DATE) AS Report_Date,
COUNT(DISTINCT GUID) AS ROW_COUNT
FROM t
WHERE DATEPART(HOUR, DATE) >= 7 AND
DATEPART(HOUR, DATE) < 19
and CAST(DATE as DATE)>=getdate()-7 and CAST(DATE as DATE)<=getdate()
GROUP BY CAST(DATE as DATE)
ORDER BY CAST(DATE as DATE)
Don't convert date columns to strings. Use date functions. I don't understand why you are adding 270 minutes to the date.
I would go for a more direct answer to your question:
SELECT CAST(DATE as DATE) AS Report_Date,
COUNT(DISTINCT GUID) AS ROW_COUNT
FROM TABLE
WHERE DATEPART(HOUR, DATE) >= 7 AND
DATEPART(HOUR, DATE) < 19
GROUP BY CAST(DATE as DATE)
ORDER BY CAST(DATE as DATE)
;With T AS
(
SELECT CAST(DATE as DATE) AS Report_Date, COUNT(DISTINCT GUID) AS Row_Count
FROM tbl
WHERE
DATEPART(HOUR, DATE) >= 7 AND DATEPART(HOUR, DATE) < 19
GROUP BY CAST(DATE as DATE)
)
SELECT Report_Date, Row_Count FROM T
ORDER BY Report_Date
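The original query returned a single day because both bounds were anchored to the same date seven days back. One way to combine a seven-day date range with an hour-of-day filter is sketched below, assuming SQLite via Python's sqlite3; the events table, its columns, the sample rows and the fixed "now" value are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (guid TEXT, event_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("a", "2024-05-06 06:59:00"),  # before 07:00, excluded
        ("b", "2024-05-06 07:00:00"),
        ("c", "2024-05-06 18:59:59"),
        ("c", "2024-05-06 12:00:00"),  # same guid, counted once
        ("d", "2024-05-06 19:00:00"),  # 19:00 itself is excluded
        ("e", "2024-05-08 10:30:00"),
        ("f", "2024-04-20 10:30:00"),  # older than 7 days
    ],
)

# Fixed reference time so the example is reproducible
NOW = "2024-05-10 09:00:00"

QUERY = """
    SELECT date(event_time) AS report_date,
           COUNT(DISTINCT guid) AS row_count
    FROM events
    WHERE event_time >= date(:now, '-7 days')   -- date range: last 7 days
      AND event_time <  date(:now, '+1 day')
      AND CAST(strftime('%H', event_time) AS INTEGER) >= 7   -- hour window
      AND CAST(strftime('%H', event_time) AS INTEGER) < 19
    GROUP BY date(event_time)
    ORDER BY report_date
"""
counts = conn.execute(QUERY, {"now": NOW}).fetchall()
print(counts)
```

The date range and the hour window are separate conditions, so each day in the range gets its own row.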

SQL - How to find missing activity days using start_date and end_date

I have a few fields in a database that look like this:
trip_id
start_date
end_date
start_station_name
end_station_name
I need to write a query that shows all the stations with no activity on a particular day in the year 2015. I wrote the following query but it's not giving the right output:
select
start_station_name,
extract(date from start_date) as dt,
count(*)
from
trips_table
where
(
start_date >= timestamp('2015-01-01')
and
start_date < timestamp('2016-01-01')
)
group by
start_station_name,
dt
order by
count(*)
Can someone help come up with the right query? Thanks in advance!
Below is for BigQuery Standard SQL
It assumes start_date and end_date are of DATE type
It also assumes that every day between start_date and end_date counts as activity for the station in the start_station_name field. This is most likely not what is expected, but the question is missing details here, hence the assumption.
#standardSQL
WITH days AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2015-01-01', '2015-12-31')) AS day
),
stations AS (
SELECT DISTINCT start_station_name AS station
FROM `trips_table`
)
SELECT s.*
FROM (SELECT * FROM stations CROSS JOIN days) AS s
LEFT JOIN (SELECT * FROM `trips_table`,
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day) AS a
ON s.day = a.day AND s.station = a.start_station_name
WHERE a.day IS NULL
You can test it with the simple dummy data below:
#standardSQL
WITH `trips_table` AS (
SELECT 1 AS trip_id, DATE '2015-01-01' AS start_date, DATE '2015-12-01' AS end_date, '111' AS start_station_name UNION ALL
SELECT 2, DATE '2015-12-10', DATE '2015-12-31', '111'
),
days AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2015-01-01', '2015-12-31')) AS day
),
stations AS (
SELECT DISTINCT start_station_name AS station
FROM `trips_table`
)
SELECT s.*
FROM (SELECT * FROM stations CROSS JOIN days) AS s
LEFT JOIN (SELECT * FROM `trips_table`,
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day) AS a
ON s.day = a.day AND s.station = a.start_station_name
WHERE a.day IS NULL
ORDER BY station, day
The output looks like this:
station day
111 2015-12-02
111 2015-12-03
111 2015-12-04
111 2015-12-05
111 2015-12-06
111 2015-12-07
111 2015-12-08
111 2015-12-09
You can use a recursive CTE for this. Try this in SQL Server:
WITH sample AS (
SELECT CAST('2015-01-01' AS DATETIME) AS dt
UNION ALL
SELECT DATEADD(dd, 1, dt)
FROM sample s
WHERE DATEADD(dd, 1, dt) < CAST('2016-01-01' AS DATETIME)
)
SELECT * FROM sample
Where CAST(sample.dt as date) NOT IN (
SELECT CAST(start_date as date)
FROM tablename
WHERE start_date >= '2015-01-01 00:00:00'
AND start_date < '2016-01-01 00:00:00'
)
Option(maxrecursion 0)
If you want the station data as well, you can use a left join:
WITH sample AS (
SELECT CAST('2015-01-01' AS DATETIME) AS dt
UNION ALL
SELECT DATEADD(dd, 1, dt)
FROM sample s
WHERE DATEADD(dd, 1, dt) < CAST('2016-01-01' AS DATETIME)
)
SELECT * FROM sample
left join tablename
on CAST(sample.dt as date) = CAST(tablename.start_date as date)
where sample.dt >= '2015-01-01 00:00:00' and sample.dt < '2016-01-01 00:00:00'
Option(maxrecursion 0)
For MySQL, see this fiddle, which I think will help: SQL Fiddle Demo
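The calendar-plus-stations idea from the answers above can be sketched end to end as follows. This assumes SQLite via Python's sqlite3, treats only start_date and start_station_name as "activity", and uses a three-day range and made-up trips to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down version of the trips table
conn.execute(
    "CREATE TABLE trips (trip_id INTEGER, start_date TEXT, start_station_name TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        (1, "2015-01-01 08:00:00", "A"),
        (2, "2015-01-03 09:30:00", "A"),
        (3, "2015-01-02 10:00:00", "B"),
    ],
)

QUERY = """
    WITH RECURSIVE days(day) AS (
        -- calendar: one row per day from :first_day to :last_day
        SELECT :first_day
        UNION ALL
        SELECT date(day, '+1 day') FROM days WHERE day < :last_day
    ),
    stations AS (
        SELECT DISTINCT start_station_name AS station FROM trips
    )
    SELECT s.station, d.day
    FROM stations s
    CROSS JOIN days d                       -- every station on every day
    LEFT JOIN trips t
           ON t.start_station_name = s.station
          AND date(t.start_date) = d.day
    WHERE t.trip_id IS NULL                 -- keep pairs with no trip
    ORDER BY s.station, d.day
"""
missing = conn.execute(
    QUERY, {"first_day": "2015-01-01", "last_day": "2015-01-03"}
).fetchall()
print(missing)
```

For the full year, pass '2015-01-01' and '2015-12-31' as the bounds; whether end stations should also count as activity is left open by the question.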

Summing a column by all transactions in a day

I'm trying to sum up all transactions for each day in my database.
SELECT DISTINCT
SUM(Balance) OVER (partition by Date) AS account_total,
Date
FROM tbl_FundData
ORDER BY Date;
The problem with the output is if a transaction is completed at a different time it becomes its own unique sum instead of rolling into the one day. I'm not sure how to modify the query to fix this.
I'm using SQL Server 2008 (I think)
It seems you use DateTime as the column data type, so cast it to DATE:
SELECT DISTINCT SUM(Balance) OVER (partition by CAST([Date] AS DATE)) AS account_total, CAST([Date] AS DATE)
FROM tbl_FundData
ORDER BY CAST([Date] AS DATE);
A GROUP BY is the better choice in this case:
SELECT SUM(Balance) AS account_total, CAST([Date] AS DATE)
FROM tbl_FundData
GROUP BY CAST([Date] AS DATE);
SELECT DISTINCT
SUM(Balance) OVER (partition by convert(varchar, Date, 103)) AS account_total,
convert(varchar, Date, 103) Date
FROM tbl_FundData
ORDER BY convert(varchar,Date,103)
SELECT SUM(Balance) account_total
,CAST(FLOOR(CAST(IssueDate AS FLOAT)) AS DATETIME) IssueDate
FROM tbl_FundData
GROUP BY
CAST(FLOOR(CAST(IssueDate AS FLOAT)) AS DATETIME)
ORDER BY
CAST(FLOOR(CAST(IssueDate AS FLOAT)) AS DATETIME)
I guess your date column also contains a time component, which is why each row gets its own sum. Use this:
SELECT sum(balance)
FROM tbl_FundData
GROUP BY convert(date ,date, 106)
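106 is a style code. The style only affects conversions to and from strings, so it makes no difference when converting to the date type, and any value (or none) will do.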
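The effect of grouping by the full timestamp versus the calendar day can be reproduced with a few rows. A minimal sketch, assuming SQLite via Python's sqlite3 instead of SQL Server; the fund_data table, the txn_time column and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fund_data (txn_time TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO fund_data VALUES (?, ?)",
    [
        ("2020-01-01 09:00:00", 100.0),
        ("2020-01-01 15:45:00", 50.0),   # same day, different time
        ("2020-01-02 10:00:00", 25.0),
    ],
)

# Grouping by the raw timestamp: each distinct time is its own group
by_timestamp = conn.execute(
    "SELECT txn_time, SUM(balance) FROM fund_data "
    "GROUP BY txn_time ORDER BY txn_time"
).fetchall()

# Grouping by the date part: both transactions on 1 January roll up together
by_day = conn.execute(
    "SELECT date(txn_time) AS day, SUM(balance) AS account_total "
    "FROM fund_data GROUP BY date(txn_time) ORDER BY day"
).fetchall()

print(by_timestamp)
print(by_day)
```

Here date() plays the role that CAST([Date] AS DATE) plays in the SQL Server answers.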