How to merge SQL Select queries?

I have three queries that I run one after another:
SELECT TOP 1 max(value) FROM tableA
where site = 18
and (CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE)
order by CAST(DATEADD(s,t_stamp/1000,'1970-01-01 00:00:00') as DATE) DESC;
SELECT TOP 1 max(value) FROM tableA
where site = 3
and (CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE)
order by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) DESC;
SELECT TOP 1 max(value) FROM tableA
where site = 4
and (CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE)
order by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) DESC;
I want to combine these three queries into one and query sites 18, 3 and 4 with a single SELECT, but I don't see how. Please advise how to merge these three queries into one.
Any help will be appreciated!

You seem to want the maximum value for each of the three sites on the last day in February for which that site has data.
If so, this is simpler:
select site, max(value) as max_value
from (select a.*,
dense_rank() over (partition by site order by stamp / (1000 * 24 * 60 * 60) desc) as seqnum
from tableA a
where stamp >= cast(datediff(second, '1970-01-01', '2017-02-01') as bigint) * 1000 and
stamp < cast(datediff(second, '1970-01-01', '2017-03-01') as bigint) * 1000 and
site in (18, 3, 4)
) t
where seqnum = 1
group by site;
February 2017 has 28 days, so using '2017-03-01' as an exclusive upper bound covers the entire month, the same as the <= '2017-2-28' date filter in the question. The cast to bigint is needed because datediff() returns an int, and epoch values in milliseconds are too large for an int.
Note that the manipulations on the date/time values are only on the "constant" side. This allows the query to use an index on stamp if an appropriate index is available.
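To try the DENSE_RANK idea without a SQL Server instance, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The table and column names follow the question; the sample rows and the epoch_ms helper are made up for illustration, and the epoch bounds are computed in Python instead of with datediff().

```python
import sqlite3
from datetime import datetime, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # integer division by this yields a day number


def epoch_ms(year, month, day, hour=0):
    """Milliseconds since 1970-01-01 UTC for the given date and hour."""
    moment = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return int(moment.timestamp()) * 1000


def max_on_latest_day(conn, sites, start_ms, end_ms):
    """Max value per site, using only each site's latest day in [start_ms, end_ms)."""
    placeholders = ", ".join("?" for _ in sites)
    sql = f"""
        SELECT site, MAX(value)
        FROM (SELECT a.*,
                     DENSE_RANK() OVER (PARTITION BY site
                                        ORDER BY stamp / {MS_PER_DAY} DESC) AS seqnum
              FROM tableA a
              WHERE stamp >= ? AND stamp < ? AND site IN ({placeholders})) t
        WHERE seqnum = 1
        GROUP BY site
        ORDER BY site
    """
    return conn.execute(sql, (start_ms, end_ms, *sites)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (site INTEGER, stamp INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        (18, epoch_ms(2017, 2, 27), 5.0),
        (18, epoch_ms(2017, 2, 28, 1), 7.0),
        (18, epoch_ms(2017, 2, 28, 2), 6.0),
        (3, epoch_ms(2017, 2, 10), 12.0),
        (3, epoch_ms(2017, 2, 20), 9.0),   # site 3's latest day in February
        (4, epoch_ms(2017, 2, 15), 2.0),
        (4, epoch_ms(2017, 3, 1), 99.0),   # excluded by the half-open range
    ],
)
result = max_on_latest_day(conn, [18, 3, 4], epoch_ms(2017, 2, 1), epoch_ms(2017, 3, 1))
print(result)  # [(3, 9.0), (4, 2.0), (18, 7.0)]
```

Site 3 returns 9.0 rather than 12.0 because only its latest day with data (20 February) is considered.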

You can use the analytic function ROW_NUMBER in your existing query as follows (SQL Server requires a name for every column of a derived table and an alias for the derived table itself):
SELECT * FROM
(SELECT max(value) AS max_value, site,
Row_number() over (partition by site order by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) desc) as rn FROM tableA
where site in (4,18,3)
and (CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) >= '2017-2-1'
and CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE) <= '2017-2-28')
Group by CAST(DATEADD(s,stamp/1000,'1970-01-01 00:00:00') as DATE), site) AS x
Where rn = 1
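The ROW_NUMBER variant can be checked the same way. This sketch assumes SQLite (via Python's sqlite3) as a stand-in engine, where date(stamp / 1000, 'unixepoch') plays the role of the CAST(DATEADD(...) AS DATE) expression; the rows and the epoch_ms helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def epoch_ms(year, month, day, hour=0):
    """Milliseconds since 1970-01-01 UTC for the given date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()) * 1000


# Group by (site, day) first, then number each site's days from newest to oldest.
QUERY = """
SELECT site, day, max_value
FROM (SELECT site,
             date(stamp / 1000, 'unixepoch') AS day,
             MAX(value) AS max_value,
             ROW_NUMBER() OVER (PARTITION BY site
                                ORDER BY date(stamp / 1000, 'unixepoch') DESC) AS rn
      FROM tableA
      WHERE site IN (4, 18, 3)
        AND date(stamp / 1000, 'unixepoch') BETWEEN '2017-02-01' AND '2017-02-28'
      GROUP BY site, date(stamp / 1000, 'unixepoch')) AS x
WHERE rn = 1
ORDER BY site
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (site INTEGER, stamp INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        (18, epoch_ms(2017, 2, 27), 5.0),
        (18, epoch_ms(2017, 2, 28, 1), 7.0),
        (18, epoch_ms(2017, 2, 28, 2), 6.0),
        (3, epoch_ms(2017, 2, 10), 12.0),
        (3, epoch_ms(2017, 2, 20), 9.0),
        (4, epoch_ms(2017, 2, 15), 2.0),
        (4, epoch_ms(2017, 3, 1), 99.0),  # March: filtered out
    ],
)
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(3, '2017-02-20', 9.0), (4, '2017-02-15', 2.0), (18, '2017-02-28', 7.0)]
```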

Related

How to get the percentage change from same time 7 days ago?

I have a big PostgreSQL database with time series data.
I query the data with a resample to one hour. What I want is to compare the mean value from the last hour to the value 7 days ago at the same time, and I don't know how to do it.
This is what I use to get the latest value.
SELECT DATE_TRUNC('hour', datetime) AS time, AVG(value) as value, id FROM database
WHERE datetime > now() - '01:00:00'::interval
GROUP BY id, time
You can use a CTE to calculate last week's average in the same time period, then join on id and hour.
with last_week as
(
SELECT
id,
extract(hour from datetime) as time,
avg(value) as avg_value
FROM my_table
where DATE_TRUNC('hour', datetime) =
(date_trunc('hour', now() - interval '7 DAYS'))
group by 1,2
)
select n.id,
DATE_TRUNC('hour', n.datetime) AS time_now,
avg(n.value) as avg_now,
t.avg_value as avg_last_week
from my_table n
left join last_week t
on t.id = n.id
and t.time = extract(hour from n.datetime)
where datetime > now()- '01:00:00'::interval
group by 1,2,4
order by 1
I'm making a few assumptions about how your data look.
Edit: I just noticed you asked for the percent change. This version shows the change as a decimal fraction:
select id,
extract(hour from time_now) as hour_now,
avg_now,
avg_last_week,
coalesce((avg_now - avg_last_week) / nullif(avg_last_week, 0), 0) AS CHANGE
from (
with last_week as
(
SELECT
id,
extract(hour from datetime) as time,
avg(value) as avg_value
FROM my_table
where DATE_TRUNC('hour', datetime) =
(date_trunc('hour', now() - interval '7 DAYS'))
group by 1,2
)
select n.id,
DATE_TRUNC('hour', n.datetime) AS time_now,
avg(n.value) as avg_now,
t.avg_value as avg_last_week
from my_table n
left join last_week t
on t.id = n.id
and t.time = extract(hour from n.datetime)
where datetime > now()- '01:00:00'::interval
group by 1,2,4
)z
group by 1,2,3,4
order by 1,2
db-fiddle found here: https://www.db-fiddle.com/f/rWJATypGzHPZ8sG2vXAGXC/4
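As a rough sketch of the whole calculation, the following runs against SQLite from Python. It deliberately swaps in a slightly different comparison: the one-hour window ending at a fixed :now versus the same window shifted back seven days, instead of matching on extract(hour ...). Passing :now as a parameter rather than calling now() keeps the example deterministic. NULLIF leaves the change as NULL when last week's average is zero or missing; wrap it in coalesce(..., 0) if you prefer the zero the answer reports. Table and column names follow the answer; the data is invented.

```python
import sqlite3

QUERY = """
WITH recent AS (
    SELECT id, AVG(value) AS avg_now
    FROM my_table
    WHERE datetime >= datetime(:now, '-1 hour') AND datetime < :now
    GROUP BY id
),
last_week AS (
    SELECT id, AVG(value) AS avg_last_week
    FROM my_table
    WHERE datetime >= datetime(:now, '-7 days', '-1 hour')
      AND datetime < datetime(:now, '-7 days')
    GROUP BY id
)
SELECT r.id,
       r.avg_now,
       w.avg_last_week,
       -- NULLIF yields NULL for a zero baseline instead of dividing by zero
       (r.avg_now - w.avg_last_week) / NULLIF(w.avg_last_week, 0) AS change
FROM recent r
LEFT JOIN last_week w ON w.id = r.id
ORDER BY r.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, datetime TEXT, value REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2024-03-08 11:15:00", 110),
        (1, "2024-03-08 11:45:00", 130),
        (1, "2024-03-08 10:59:59", 1000),  # just before the last hour: ignored
        (1, "2024-03-01 11:30:00", 100),   # same window, seven days earlier
        (2, "2024-03-08 11:30:00", 50),
        (2, "2024-03-01 11:30:00", 0),     # zero baseline
        (3, "2024-03-08 11:05:00", 10),    # no data a week ago
    ],
)
rows = conn.execute(QUERY, {"now": "2024-03-08 12:00:00"}).fetchall()
print(rows)  # [(1, 120.0, 100.0, 0.2), (2, 50.0, 0.0, None), (3, 10.0, None, None)]
```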

How to write an SQL aggregate function/query

I have a query that displays the total value (sum of amount) for each day.
The query:
SELECT CAST(date AS DATE), SUM(amount) AS total_amount FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
The CAST truncates the datetime to just the date.
Now I want to select only the day with the highest sum, using the MAX function.
To do this I tried writing the following aggregate query:
SELECT s.date, s.total_amount
FROM (SELECT CAST(date AS DATE), SUM(amount) AS total_amount FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)) s
WHERE s.total_amount = (SELECT MAX(s.total_amount) FROM table)
This does not work. I know the problem is with the final WHERE clause, but I need help with making it work.
Use ORDER BY with LIMIT:
SELECT CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
ORDER BY total_amount DESC
LIMIT 1;
If you are working with SQL Server, you can use TOP:
SELECT TOP (1) CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
ORDER BY total_amount DESC;
If you want to keep ties, you can use a window function:
SELECT t.*
FROM (SELECT CAST(date AS DATE) AS dt, SUM(amount) AS total_amount,
RANK() OVER (ORDER BY SUM(amount) DESC) as Seq
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
) t
WHERE seq = 1;
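Here is a small SQLite sketch, run from Python, contrasting LIMIT 1 with the RANK approach when two days tie. It assumes a hypothetical sales table with a text date column and an amount column; SQLite's date() stands in for CAST(date AS DATE), and the filter uses a half-open range (< '2020-01-01') so that all of December 31 is included.

```python
import sqlite3

# One row per calendar day; date(date) stands in for CAST(date AS DATE).
DAILY_TOTALS = """
SELECT date(date) AS day, SUM(amount) AS total_amount
FROM sales
WHERE date >= '2019-01-01' AND date < '2020-01-01'
GROUP BY date(date)
"""

# Exactly one row, even when several days share the highest total.
TOP_ONE = DAILY_TOTALS + "ORDER BY total_amount DESC LIMIT 1"

# RANK() gives tied days the same rank, so every top day has seq = 1.
TOP_WITH_TIES = f"""
SELECT day, total_amount
FROM (SELECT d.*, RANK() OVER (ORDER BY total_amount DESC) AS seq
      FROM ({DAILY_TOTALS}) d) t
WHERE seq = 1
ORDER BY day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2018-12-31 10:00:00", 500),  # outside 2019: filtered out
        ("2019-03-01 10:00:00", 40),
        ("2019-03-01 15:00:00", 60),   # 2019-03-01 totals 100
        ("2019-05-05 12:00:00", 30),
        ("2019-07-04 09:00:00", 100),  # ties with 2019-03-01
    ],
)
top_one = conn.execute(TOP_ONE).fetchall()
ties = conn.execute(TOP_WITH_TIES).fetchall()
print(ties)  # [('2019-03-01', 100.0), ('2019-07-04', 100.0)]
```

LIMIT 1 silently returns just one of the tied days, while RANK keeps both.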
You can use a CTE:
WITH CTE AS (
SELECT CAST(date AS DATE) AS dt, SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
)
SELECT c.*
FROM CTE C
WHERE C.total_amount = (SELECT MAX(total_amount) FROM CTE);
Note: If your DBMS doesn't support CTEs, you need to repeat the SELECT statement in a subquery:
SELECT CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
HAVING SUM(amount) = (SELECT MAX(total_amount)
FROM (SELECT CAST(date AS DATE) AS dt, SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
) t
);
If you are using SQL Server, you can use TOP:
SELECT TOP 1 CAST(date AS DATE), SUM(amount) AS total_amount
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
ORDER BY total_amount DESC
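The CTE approach, which is also the fix for the final WHERE clause in the question (take MAX over the aggregated daily rows, not over the base table), can be sketched in SQLite from Python as well. The sales table and its rows are hypothetical. The biggest day here is December 31, which the question's BETWEEN ... '2019-12-31 00:00:00' filter would have excluded, since every row after 00:00:00 on that day falls outside the range; this sketch uses a half-open range instead.

```python
import sqlite3

QUERY = """
WITH daily AS (
    SELECT date(date) AS day, SUM(amount) AS total_amount
    FROM sales
    WHERE date >= '2019-01-01' AND date < '2020-01-01'
    GROUP BY date(date)
)
SELECT c.day, c.total_amount
FROM daily c
-- MAX runs over the aggregated daily rows, not over the base table
WHERE c.total_amount = (SELECT MAX(total_amount) FROM daily)
ORDER BY c.day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2019-02-10 08:00:00", 25),
        ("2019-02-10 20:00:00", 30),   # 2019-02-10 totals 55
        ("2019-06-01 12:00:00", 50),
        ("2019-12-31 18:00:00", 70),   # evening of the last day still counts
        ("2020-01-01 00:00:00", 999),  # next year: filtered out
    ],
)
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('2019-12-31', 70.0)]
```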
Use the window function row_number(); this should work with MySQL 8.0, PostgreSQL, Oracle and SQL Server. If several days tie, it returns only one of them.
select
date,
total_amount
from
(
SELECT
CAST(date AS DATE) as date,
SUM(amount) AS total_amount,
row_number() over (order by SUM(amount) desc) as rnk
FROM table
WHERE date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 00:00:00'
GROUP BY CAST(date AS DATE)
) val
where rnk = 1
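You can also fix your original query: give the cast column a name in the subquery, and take the MAX over the same aggregated rows instead of over the base table. Casting date to a date in the WHERE clause also includes all of December 31, which the original upper bound of '2019-12-31 00:00:00' cuts off.
SELECT s.dt, s.total_amount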
FROM (SELECT CAST(date AS DATE) as dt, SUM(amount) AS total_amount
FROM table
WHERE CAST(date as date) BETWEEN '2019-01-01' AND '2019-12-31'
GROUP BY CAST(date AS DATE)) s
WHERE s.total_amount = (Select max(total_amount)
FROM (SELECT CAST(date AS DATE) as dt, SUM(amount) AS total_amount
FROM table
WHERE CAST(date as date) BETWEEN '2019-01-01' AND '2019-12-31'
GROUP BY CAST(date AS DATE)) ss )

SQL Server, filter for max date and max date minus 7 days

I'm trying to design a view and apply several conditions on my timestamp (datetime): last date and last date minus 7 days.
This works fine for the last date:
SELECT *
FROM table
WHERE timestamp = (SELECT MAX(timestamp) FROM table)
So far I couldn't figure out how to add the "minus 7 days" condition.
I tried, for instance
SELECT *
FROM table
WHERE (timestamp = (SELECT MAX(timestamp) FROM table)) OR (timestamp = (SELECT DATEADD(DAY, -7, MAX(timestamp)) FROM table))
I also tried some other variations, including GETDATE() instead of MAX, but I keep getting execution timeout errors.
Please let me know what logic I should follow in this case.
(The sample data from the original question is not reproduced here; the real table has many more rows.)
So I want to get data only for rows dated 29/11/2019 and 22/11/2019. I also need to filter on some factor columns, but that part is simple.
If you care about whole calendar days rather than exact timestamps, then perhaps you want:
select t.*
from t cross join
(select max(timestamp) as max_timestamp from t) tt
where (t.timestamp >= convert(date, max_timestamp) and
t.timestamp < dateadd(day, 1, convert(date, max_timestamp))
) or
(t.timestamp >= dateadd(day, -7, convert(date, max_timestamp)) and
t.timestamp < dateadd(day, -6, convert(date, max_timestamp))
);
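A minimal SQLite version of the same "latest day plus the day one week earlier" filter, run from Python, is below. It assumes a hypothetical readings table whose ts column holds 'YYYY-MM-DD HH:MM:SS' text, so plain string comparison orders timestamps correctly; date(x, '+1 day') and date(x, '-7 days') replace dateadd().

```python
import sqlite3

QUERY = """
SELECT t.ts
FROM readings t
CROSS JOIN (SELECT date(MAX(ts)) AS max_day FROM readings) tt
-- the whole latest day, plus the whole day exactly one week earlier
WHERE (t.ts >= tt.max_day AND t.ts < date(tt.max_day, '+1 day'))
   OR (t.ts >= date(tt.max_day, '-7 days') AND t.ts < date(tt.max_day, '-6 days'))
ORDER BY t.ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2019-11-21 23:59:59", 1),  # day before the week-ago day
        ("2019-11-22 12:00:00", 2),
        ("2019-11-23 00:00:00", 3),  # day after the week-ago day
        ("2019-11-28 09:00:00", 4),
        ("2019-11-29 08:00:00", 5),
        ("2019-11-29 17:30:00", 6),  # latest timestamp
    ],
)
rows = [ts for (ts,) in conn.execute(QUERY)]
print(rows)  # ['2019-11-22 12:00:00', '2019-11-29 08:00:00', '2019-11-29 17:30:00']
```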
So I ended up with the following code. I put parentheses around the two date ranges: AND binds more tightly than OR, so without them the Factor filter applies only to the second range.
SELECT *
FROM table
WHERE ((timestamp >= DATEADD(DAY, -1, GETDATE()) AND timestamp < GETDATE()) OR
(timestamp >= DATEADD(DAY, -8, GETDATE()) AND timestamp < DATEADD(DAY, -7, GETDATE())))
AND (Factor1 = 'Criteria1' OR Factor2 = 'Criteria2')
Not sure if it's the best or the most elegant solution, but it works for me.
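The effect of the grouping parentheses can be checked directly. This sketch uses SQLite via Python with a hypothetical table t, made-up rows, and fixed bounds in place of GETDATE(), and runs the filter with and without parentheses around the two date ranges.

```python
import sqlite3

BOUNDS = {
    "a1": "2019-11-28 00:00:00", "a2": "2019-11-29 00:00:00",  # recent day
    "b1": "2019-11-21 00:00:00", "b2": "2019-11-22 00:00:00",  # one week earlier
}

# AND binds tighter than OR: the factor test attaches only to the second range.
LOOSE = """
SELECT ts FROM t
WHERE ts >= :a1 AND ts < :a2 OR
      ts >= :b1 AND ts < :b2 AND (factor1 = 'Criteria1' OR factor2 = 'Criteria2')
ORDER BY ts
"""

# Parenthesizing the date ranges applies the factor test to both of them.
STRICT = """
SELECT ts FROM t
WHERE ((ts >= :a1 AND ts < :a2) OR (ts >= :b1 AND ts < :b2))
  AND (factor1 = 'Criteria1' OR factor2 = 'Criteria2')
ORDER BY ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, factor1 TEXT, factor2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2019-11-28 10:00:00", "Criteria1", "x"),
        ("2019-11-28 11:00:00", "other", "other"),
        ("2019-11-21 10:00:00", "other", "Criteria2"),
        ("2019-11-21 11:00:00", "other", "other"),
    ],
)
loose = [ts for (ts,) in conn.execute(LOOSE, BOUNDS)]
strict = [ts for (ts,) in conn.execute(STRICT, BOUNDS)]
print(loose)   # ['2019-11-21 10:00:00', '2019-11-28 10:00:00', '2019-11-28 11:00:00']
print(strict)  # ['2019-11-21 10:00:00', '2019-11-28 10:00:00']
```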

SQL - How to find missing activity days using start_date and end_date

I have a few fields in a database that look like this:
trip_id
start_date
end_date
start_station_name
end_station_name
I need to write a query that shows all the stations with no activity on a particular day in the year 2015. I wrote the following query but it's not giving the right output:
select
start_station_name,
extract(date from start_date) as dt,
count(*)
from
trips_table
where
(
start_date >= timestamp('2015-01-01')
and
start_date < timestamp('2016-01-01')
)
group by
start_station_name,
dt
order by
count(*)
Can someone help come up with the right query? Thanks in advance!
Below is for BigQuery Standard SQL.
It assumes start_date and end_date are of DATE type.
It also assumes that every day between start_date and end_date is "dedicated" to the station in the start_station_name field. That is most likely not what is expected, but the question is missing details, hence the assumption.
#standardSQL
WITH days AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2015-01-01', '2015-12-31')) AS day
),
stations AS (
SELECT DISTINCT start_station_name AS station
FROM `trips_table`
)
SELECT s.*
FROM (SELECT * FROM stations CROSS JOIN days) AS s
LEFT JOIN (SELECT * FROM `trips_table`,
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day) AS a
ON s.day = a.day AND s.station = a.start_station_name
WHERE a.day IS NULL
You can test it with the simple dummy data below:
#standardSQL
WITH `trips_table` AS (
SELECT 1 AS trip_id, DATE '2015-01-01' AS start_date, DATE '2015-12-01' AS end_date, '111' AS start_station_name UNION ALL
SELECT 2, DATE '2015-12-10', DATE '2015-12-31', '111'
),
days AS (
SELECT day
FROM UNNEST(GENERATE_DATE_ARRAY('2015-01-01', '2015-12-31')) AS day
),
stations AS (
SELECT DISTINCT start_station_name AS station
FROM `trips_table`
)
SELECT s.*
FROM (SELECT * FROM stations CROSS JOIN days) AS s
LEFT JOIN (SELECT * FROM `trips_table`,
UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day) AS a
ON s.day = a.day AND s.station = a.start_station_name
WHERE a.day IS NULL
ORDER BY station, day
The output looks like this:
station day
111 2015-12-02
111 2015-12-03
111 2015-12-04
111 2015-12-05
111 2015-12-06
111 2015-12-07
111 2015-12-08
111 2015-12-09
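SQLite has no GENERATE_DATE_ARRAY, so this sketch (run from Python) swaps in recursive CTEs to build the calendar and to expand each trip into one row per day, keeping the same cross join plus anti-join structure. It uses the dummy data above and the same assumption that all days between start_date and end_date belong to the start station.

```python
import sqlite3

QUERY = """
WITH RECURSIVE
days(day) AS (                          -- every day of 2015
    SELECT '2015-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2015-12-31'
),
stations AS (
    SELECT DISTINCT start_station_name AS station FROM trips_table
),
trip_days(station, day, end_date) AS (  -- one row per day covered by each trip
    SELECT start_station_name, start_date, end_date FROM trips_table
    UNION ALL
    SELECT station, date(day, '+1 day'), end_date FROM trip_days WHERE day < end_date
)
SELECT s.station, d.day
FROM stations s
CROSS JOIN days d
LEFT JOIN trip_days a ON a.day = d.day AND a.station = s.station
WHERE a.day IS NULL                     -- no trip covers this station/day pair
ORDER BY s.station, d.day
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips_table (trip_id INTEGER, start_date TEXT,"
    " end_date TEXT, start_station_name TEXT)"
)
conn.executemany(
    "INSERT INTO trips_table VALUES (?, ?, ?, ?)",
    [
        (1, "2015-01-01", "2015-12-01", "111"),
        (2, "2015-12-10", "2015-12-31", "111"),
    ],
)
rows = conn.execute(QUERY).fetchall()
print(rows[0], rows[-1], len(rows))  # ('111', '2015-12-02') ('111', '2015-12-09') 8
```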
You can use a recursive CTE for this. Try this in SQL Server:
WITH sample AS (
SELECT CAST('2015-01-01' AS DATETIME) AS dt
UNION ALL
SELECT DATEADD(dd, 1, dt)
FROM sample s
WHERE DATEADD(dd, 1, dt) < CAST('2016-01-01' AS DATETIME)
)
SELECT * FROM sample
Where CAST(sample.dt as date) NOT IN (
SELECT CAST(start_date as date)
FROM tablename
WHERE start_date >= '2015-01-01 00:00:00'
AND start_date < '2016-01-01 00:00:00'
)
Option(maxrecursion 0)
If you want the station data as well, you can use a LEFT JOIN:
WITH sample AS (
SELECT CAST('2015-01-01' AS DATETIME) AS dt
UNION ALL
SELECT DATEADD(dd, 1, dt)
FROM sample s
WHERE DATEADD(dd, 1, dt) < CAST('2016-01-01' AS DATETIME)
)
SELECT * FROM sample
left join tablename
on CAST(sample.dt as date) = CAST(tablename.start_date as date)
where sample.dt >= '2015-01-01 00:00:00' and sample.dt < '2016-01-01 00:00:00'
Option(maxrecursion 0)
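The recursive calendar idea carries over to SQLite almost unchanged, with WITH RECURSIVE introducing the recursive CTE. This sketch, run from Python, uses a hypothetical tablename with a few made-up start_date values and a shortened five-day calendar so the output is easy to check.

```python
import sqlite3

QUERY = """
WITH RECURSIVE sample(dt) AS (
    SELECT :first
    UNION ALL
    SELECT date(dt, '+1 day') FROM sample WHERE date(dt, '+1 day') < :stop
)
SELECT dt FROM sample
WHERE dt NOT IN (                       -- calendar days on which no trip started
    SELECT date(start_date)
    FROM tablename
    WHERE start_date >= :first AND start_date < :stop
)
ORDER BY dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (start_date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?)",
    [
        ("2015-01-01 08:00:00",),
        ("2015-01-03 09:30:00",),
        ("2015-01-03 18:00:00",),
        ("2015-01-07 10:00:00",),  # outside the calendar range
    ],
)
params = {"first": "2015-01-01", "stop": "2015-01-06"}
rows = [dt for (dt,) in conn.execute(QUERY, params)]
print(rows)  # ['2015-01-02', '2015-01-04', '2015-01-05']
```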
For MySQL, see this fiddle; I think it will help you.
SQL Fiddle Demo

How can I group by an arbitrary time period with SQL?

This is similar to, but not the same as, my previous question.
That one was about how to summarize log items per day.
I use this SQL.
SELECT
[DateLog] = CONVERT(DATE, LogDate),
[Sum] = COUNT(*)
FROM PerfRow
GROUP BY CONVERT(DATE, LogDate)
ORDER BY [DateLog];
Now I want to improve that to summarize over an arbitrary time period.
So instead of a sum per day, a sum per hour or per 5 minutes.
Is this possible?
I use SQL Server 2008 R2
You can round LogDate using DATEADD and DATEPART and then group by that.
Example (groups by five-second intervals):
SELECT
[DateLog] = DATEADD(ms,((DATEPART(ss, LogDate)/5)*5000)-(DATEPART(ss, LogDate)*1000)-DATEPART(ms, LogDate), LogDate),
[Sum] = COUNT(*)
FROM
(
SELECT LogDate = '2013-01-01 00:00:00' UNION ALL
SELECT LogDate = '2013-01-01 00:00:04' UNION ALL
SELECT LogDate = '2013-01-01 00:00:06' UNION ALL
SELECT LogDate = '2013-01-01 00:00:08' UNION ALL
SELECT LogDate = '2013-01-01 00:00:10'
) a
GROUP BY DATEADD(ms,((DATEPART(ss, LogDate)/5)*5000)-(DATEPART(ss, LogDate)*1000)-DATEPART(ms, LogDate), LogDate)
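A more general way to get arbitrary buckets is integer division of the Unix epoch seconds: (seconds / N) * N is the start of each N-second bucket. Below is a minimal sketch of that swapped-in technique using SQLite from Python; the PerfRow table and LogDate column follow the question, the rows mirror the sample data above, and log_count is a hypothetical name replacing the [Sum] alias. In T-SQL the same idea is usually written with DATEDIFF and DATEADD from a fixed origin date.

```python
import sqlite3

# (epoch_seconds / N) * N with integer division is the start of the N-second bucket.
QUERY = """
SELECT datetime((CAST(strftime('%s', LogDate) AS INTEGER) / :seconds) * :seconds,
                'unixepoch') AS DateLog,
       COUNT(*) AS log_count
FROM PerfRow
GROUP BY DateLog
ORDER BY DateLog
"""


def counts_per_bucket(conn, seconds):
    """Row counts per bucket of the given width in seconds (e.g. 300 for 5 minutes)."""
    return conn.execute(QUERY, {"seconds": seconds}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PerfRow (LogDate TEXT)")
conn.executemany(
    "INSERT INTO PerfRow VALUES (?)",
    [(f"2013-01-01 00:00:{s:02d}",) for s in (0, 4, 6, 8, 10)],
)
print(counts_per_bucket(conn, 5))
# [('2013-01-01 00:00:00', 2), ('2013-01-01 00:00:05', 2), ('2013-01-01 00:00:10', 1)]
print(counts_per_bucket(conn, 300))  # [('2013-01-01 00:00:00', 5)]
```

Changing the seconds argument (3600 for hourly, 300 for five minutes) is all it takes to switch bucket sizes.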