Fetch max value from a sort-of incomplete dataset - SQL

A number of devices each report a value. Only when the value changes is it stored in a table:
Device  Value  Date
B       5      2017-07-01
C       2      2017-07-01
A       3      2017-07-02
C       1      2017-07-04
A       6      2017-07-04
Values may enter the table at any date (i.e. the date does not increase continuously). Several devices may store their value on the same date.
Note that, even though there are usually only a few devices for each date in the table, all devices actually have a value at that date: it's the latest one stored until then. For example, on 2017-07-02 only device A stored a value. The values for B and C on that date are the ones stored on 2017-07-01; these are still valid on -02, they just did not change.
To retrieve the values for all devices on a given date, e.g. 2017-07-04, I'm using this:
select data.device, data.value
from data inner join
     (select device, max(date) as date
      from data
      where date <= '2017-07-04'
      group by device
     ) latestdate
     on data.device = latestdate.device and data.date = latestdate.date
Device Value
A 6
B 5
C 1
Question: I'd like to read the max value of all devices on all dates in a given range. The result set would be like this:
Date max(value)
2017-07-01 5
2017-07-02 5
2017-07-04 6
.. and I have no clue whether that's even possible using only SQL. So far, all my attempts got lost in an unwieldy pile of joins and groupings.
(Database is sqlite3. Generic SQL would be nice, but I'd still be happy to hear about solutions specific to other databases, especially PostgreSQL or MariaDB.)
Extra bonus: include the missing date -03 as well; to be exact, return values for given dates, not just the dates appearing in the table.
Date max(value)
2017-07-01 5
2017-07-02 5
2017-07-03 5
2017-07-04 6

I think the most generic way to approach this is to use a separate query for each date. There are definitely simpler methods, depending on the database, but a query that works on SQLite, MariaDB, and Postgres alike cannot rely on any sophisticated functionality:
select '2017-07-01' as date, max(data.value)
from data inner join
(select device, max(date) as date
from data
where date <= '2017-07-01' group by device
) latestdate
on data.device = latestdate.device and data.date = latestdate.date
union all
select '2017-07-02' as date, max(data.value)
from data inner join
(select device, max(date) as date
from data
where date <= '2017-07-02' group by device
) latestdate
on data.device = latestdate.device and data.date = latestdate.date
union all
select '2017-07-03' as date, max(data.value)
from data inner join
(select device, max(date) as date
from data
where date <= '2017-07-03' group by device
) latestdate
on data.device = latestdate.device and data.date = latestdate.date
union all
select '2017-07-04' as date, max(data.value)
from data inner join
(select device, max(date) as date
from data
where date <= '2017-07-04' group by device
) latestdate
on data.device = latestdate.device and data.date = latestdate.date;
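For the record, the approach above can be checked end to end in SQLite from Python (the stdlib sqlite3 module); table and column names follow the question, and the per-date branch is parameterized rather than pasted four times:

```python
import sqlite3

# In-memory copy of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (device TEXT, value INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("B", 5, "2017-07-01"), ("C", 2, "2017-07-01"),
     ("A", 3, "2017-07-02"), ("C", 1, "2017-07-04"),
     ("A", 6, "2017-07-04")],
)

# One branch per requested date, glued together with UNION ALL.
branch = """
select ? as date, max(data.value)
from data inner join
     (select device, max(date) as date
      from data
      where date <= ? group by device
     ) latestdate
     on data.device = latestdate.device and data.date = latestdate.date
"""
dates = ["2017-07-01", "2017-07-02", "2017-07-03", "2017-07-04"]
sql = " union all ".join([branch] * len(dates))
params = [d for d in dates for _ in range(2)]  # each branch uses its date twice
rows = sorted(conn.execute(sql, params).fetchall())
print(rows)
```

This returns 5, 5, 5, 6 for the four dates, including the bonus -03 that never appears in the table.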

This should be a solution for your problem.
It should be cross-database, since the OVER clause is supported by most databases.
You need to create a table with all the dates ("ALL_DATE" in the query); otherwise, every database has its own specific way to generate the dates without a table.
WITH GROUPED_BY_DATE_DEVICE AS (
  SELECT DATE, DEVICE, SUM(VALUE) AS VALUE
  FROM DEVICE_INFO
  GROUP BY DATE, DEVICE
), GROUPED_BY_DATE AS (
  SELECT A.DATE, MAX(VALUE) AS VALUE
  FROM ALL_DATE A
  LEFT JOIN GROUPED_BY_DATE_DEVICE B
    ON A.DATE = B.DATE
  GROUP BY A.DATE
)
SELECT DATE, MAX(VALUE) OVER (ORDER BY DATE) AS MAX_VALUE
FROM GROUPED_BY_DATE
ORDER BY DATE;
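As a sanity check, the CTE above also runs on SQLite 3.25+ (which added window functions); here ALL_DATE and DEVICE_INFO are filled with the question's sample data. Note that the running MAX can never decrease, so this matches the carry-forward semantics only as long as the true per-day maximum never drops:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEVICE_INFO (DATE TEXT, DEVICE TEXT, VALUE INTEGER)")
conn.execute("CREATE TABLE ALL_DATE (DATE TEXT)")
conn.executemany("INSERT INTO DEVICE_INFO VALUES (?, ?, ?)",
                 [("2017-07-01", "B", 5), ("2017-07-01", "C", 2),
                  ("2017-07-02", "A", 3), ("2017-07-04", "C", 1),
                  ("2017-07-04", "A", 6)])
conn.executemany("INSERT INTO ALL_DATE VALUES (?)",
                 [("2017-07-01",), ("2017-07-02",),
                  ("2017-07-03",), ("2017-07-04",)])

rows = conn.execute("""
WITH GROUPED_BY_DATE_DEVICE AS (
  SELECT DATE, DEVICE, SUM(VALUE) AS VALUE
  FROM DEVICE_INFO
  GROUP BY DATE, DEVICE
), GROUPED_BY_DATE AS (
  SELECT A.DATE, MAX(VALUE) AS VALUE
  FROM ALL_DATE A
  LEFT JOIN GROUPED_BY_DATE_DEVICE B ON A.DATE = B.DATE
  GROUP BY A.DATE
)
SELECT DATE, MAX(VALUE) OVER (ORDER BY DATE) AS MAX_VALUE
FROM GROUPED_BY_DATE
ORDER BY DATE
""").fetchall()
print(rows)
```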

Related

Show all results in date range replacing null records with zero

I am querying the counts of logs that appear on particular days. However on some days, no log records I'm searching for are recorded. How can I set count to 0 for these days and return a result with the full set of dates in a date range?
SELECT r.LogCreateDate, r.Docs
FROM(
SELECT SUM(TO_NUMBER(REPLACE(ld.log_detail, 'Total Documents:' , ''))) AS Docs, to_char(l.log_create_date,'YYYY-MM-DD') AS LogCreateDate
FROM iwbe_log l
LEFT JOIN iwbe_log_detail ld ON ld.log_id = l.log_id
HAVING to_char(l.log_create_date , 'YYYY-MM-DD') BETWEEN '2020-01-01' AND '2020-01-07'
GROUP BY to_char(l.log_create_date,'YYYY-MM-DD')
ORDER BY to_char(l.log_create_date,'YYYY-MM-DD') DESC
) r
ORDER BY r.logcreatedate
Current result. I'd like to include the 01, 04 and 05 with 0 docs:
LOGCREATEDATE  Docs
2020-01-02     7
2020-01-03     3
2020-01-06     6
2020-01-07     1
You need a full list of dates first, then outer join the log data to that. There are several ways to generate the list of dates, but common table expressions (CTEs) are now an ANSI-standard way to do it, like so:
with cte (dt) as (
  select to_date('2020-01-01','yyyy-mm-dd') as dt from dual -- start date here
  union all
  select dt + 1 from cte
  where dt + 1 < to_date('2020-02-01','yyyy-mm-dd') -- finish (the day before) date here
)
select to_char(cte.dt,'yyyy-mm-dd') as LogCreateDate
     , r.Docs
from cte
left join (
    SELECT SUM(TO_NUMBER(REPLACE(ld.log_detail, 'Total Documents:', ''))) AS Docs
         , trunc(l.log_create_date) AS LogCreateDate
    FROM iwbe_log l
    LEFT JOIN iwbe_log_detail ld ON ld.log_id = l.log_id
    WHERE trunc(l.log_create_date) BETWEEN to_date('2020-01-01','yyyy-mm-dd') AND to_date('2020-01-07','yyyy-mm-dd')
    GROUP BY trunc(l.log_create_date)
  ) r on cte.dt = r.LogCreateDate
order by cte.dt
Also, when dealing with dates I prefer not to convert them to strings until the final output; that preserves proper date ordering and keeps the query efficient.
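The same calendar-plus-left-join pattern ports to SQLite almost verbatim; here is a small sketch with a made-up `logs` table standing in for the Oracle tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log table: one row per day that actually has documents.
conn.execute("CREATE TABLE logs (log_create_date TEXT, docs INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?)",
                 [("2020-01-02", 7), ("2020-01-03", 3),
                  ("2020-01-06", 6), ("2020-01-07", 1)])

rows = conn.execute("""
WITH RECURSIVE cte(dt) AS (
  SELECT '2020-01-01'                 -- start date here
  UNION ALL
  SELECT date(dt, '+1 day') FROM cte
  WHERE dt < '2020-01-07'             -- finish date here
)
SELECT cte.dt, COALESCE(SUM(l.docs), 0) AS docs
FROM cte
LEFT JOIN logs l ON l.log_create_date = cte.dt
GROUP BY cte.dt
ORDER BY cte.dt
""").fetchall()
print(rows)
```

The missing days (01, 04, 05) come back with a 0 count thanks to the COALESCE around the outer-joined sum.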

Count devices per day in a given date range

I have a table which has devices with 3 statuses, Pass, Fail and Warning.
Device   Status   Date
Device1  Pass     12/1/2020
Device2  Fail     12/1/2020
Device3  Warning  12/1/2020
Device1  Fail     12/2/2020
Device2  Warning  12/2/2020
Device3  Pass     12/2/2020
I want to generate a trend graph of count of devices based on the daily status. The count is on all the devices for each day. The table above will have device data repeated for multiple dates.
Example:
I want to generate a stacked bar graph, which will show count of devices which are pass, fail or warning. Need to get a query which I can use to get the response back with DateTime, count of failed devices, count of devices passed, count of devices having warning over a range of dates.
select (select count(*) from status_table where overall_status = 'Fail' and startDate > "" and endDate < "") as failedCount,
       (select count(*) from status_table where overall_status = 'Warning' and startDate > "" and endDate < "") as WarningCount,
       (select count(*) from status_table where overall_status = 'Pass' and startDate > "" and endDate < "") as passCount
from status_table
Is there a better solution?
You can use the aggregate FILTER clause to do it in a single query.
This gets three counts (fail, pass, warn) for every selected device on every day in the selected date range. A count of NULL for days without any appearance. 0 if the device appeared, but not with this status:
SELECT date, device_name
, fail_count, warning_count, pass_count
FROM (SELECT DISTINCT device_name FROM status_table) d -- all devices ①
CROSS JOIN (
SELECT generate_series(timestamp '2020-12-01'
, timestamp '2020-12-31'
, interval '1 day')::date
) t(date) -- all dates
LEFT JOIN (
SELECT date, device_name
, count(*) FILTER (WHERE overall_status = 'Fail') AS fail_count
, count(*) FILTER (WHERE overall_status = 'Warning') AS warning_count
, count(*) FILTER (WHERE overall_status = 'Pass') AS pass_count
FROM status_table
WHERE date >= '2020-12-01' -- same date range as above
AND date <= '2020-12-31'
GROUP BY 1, 2
) s USING (date, device_name)
ORDER BY 1, 2;
Basically, you CROSS JOIN all devices to all dates (Cartesian product), then append data where data can be found with a LEFT JOIN.
① Since you don't seem to have a device table (which you probably should), generate the full list on the fly. The above query with DISTINCT is good for few rows per device. Else, there are (much) faster techniques like:
WITH RECURSIVE cte AS (
(SELECT device_name FROM status_table ORDER BY 1 LIMIT 1)
UNION ALL
SELECT (SELECT device_name FROM status_table
WHERE device_name > t.device_name ORDER BY 1 LIMIT 1)
FROM cte
WHERE device_name IS NOT NULL
)
SELECT * FROM cte
WHERE device_name IS NOT NULL;
See:
https://wiki.postgresql.org/wiki/Loose_indexscan
The subquery s aggregates only rows from the given date range. It's strictly optional. You can also left-join to the underlying table directly, and then aggregate all. But this approach is typically (much) faster.
You can convert NULL to zero or vice versa with COALESCE / NULLIF.
Related:
PostgreSQL: running count of rows for a query 'by minute'
Aggregate columns with additional (distinct) filters
For more flags, a crosstab() query might be faster. See:
PostgreSQL Crosstab Query
About generating a date range:
Generating time series between two dates in PostgreSQL
Be aware that dates are defined by your current time zone setting if you operate with timestamp with time zone. See:
Ignoring time zones altogether in Rails and PostgreSQL
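SQLite (3.30+) also supports the aggregate FILTER clause, so the per-day counts can be sketched there as well; this minimal version skips the calendar cross join and just aggregates the question's sample rows per day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_table (date TEXT, device_name TEXT, overall_status TEXT)")
conn.executemany("INSERT INTO status_table VALUES (?, ?, ?)",
                 [("2020-12-01", "Device1", "Pass"), ("2020-12-01", "Device2", "Fail"),
                  ("2020-12-01", "Device3", "Warning"), ("2020-12-02", "Device1", "Fail"),
                  ("2020-12-02", "Device2", "Warning"), ("2020-12-02", "Device3", "Pass")])

# One row per day, one count per status, each filtered independently.
rows = conn.execute("""
SELECT date
     , count(*) FILTER (WHERE overall_status = 'Fail')    AS fail_count
     , count(*) FILTER (WHERE overall_status = 'Warning') AS warning_count
     , count(*) FILTER (WHERE overall_status = 'Pass')    AS pass_count
FROM status_table
GROUP BY date
ORDER BY date
""").fetchall()
print(rows)
```

With the sample data each status appears exactly once per day, so every count is 1; real data would show the pass/fail/warning split per day that the stacked bar graph needs.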

Calculating business days in Teradata

I need help in business days calculation.
I've two tables
1) One table ACTUAL_TABLE containing order date and contact date with timestamp datatypes.
2) The second table BUSINESS_DATES has each of the calendar dates listed and has a flag to indicate weekend days.
using these two tables, I need to ensure business days and not calendar days (which is the current logic) is calculated between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However, this only works when I use a specific order number; I can't bring all order numbers into a subquery.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
) TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT, you'd better assign a business day number to each date (in the best case this is calculated only once and added as a column to your calendar table). Then it's two equi-joins and no aggregation is needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each business day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
       Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days,
       b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
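The business-day-number idea is easy to try outside Teradata, too. Here is a sketch in SQLite (window functions, 3.25+); ORDER# is renamed to order_nbr since # would need quoting, and julianday() stands in for Teradata's date subtraction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_dates (table_date TEXT, holiday_wknd_flag INTEGER)")
# One week, Mon 2021-03-01 .. Sun 2021-03-07; Sat/Sun flagged as non-business days.
conn.executemany("INSERT INTO business_dates VALUES (?, ?)",
                 [("2021-03-01", 0), ("2021-03-02", 0), ("2021-03-03", 0),
                  ("2021-03-04", 0), ("2021-03-05", 0),
                  ("2021-03-06", 1), ("2021-03-07", 1)])
conn.execute("CREATE TABLE actual_table (order_nbr TEXT, order_date TEXT, contact_date TEXT)")
conn.execute("INSERT INTO actual_table VALUES ('100', '2021-03-01', '2021-03-07')")

rows = conn.execute("""
WITH cte AS (
  SELECT table_date,
         -- running count of business days; stays flat across weekends/holidays
         SUM(CASE WHEN holiday_wknd_flag = 1 THEN 0 ELSE 1 END)
           OVER (ORDER BY table_date ROWS UNBOUNDED PRECEDING) AS business_day_nbr
  FROM business_dates
)
SELECT t.order_nbr,
       julianday(t.contact_date) - julianday(t.order_date) AS n_days,
       b2.business_day_nbr - b1.business_day_nbr           AS n_business_days
FROM actual_table t
JOIN cte b1 ON t.order_date   = b1.table_date
JOIN cte b2 ON t.contact_date = b2.table_date
""").fetchall()
print(rows)
```

For the sample order the span is 6 calendar days but only 4 business days, because the flagged Saturday and Sunday do not advance the counter.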
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates_table
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a

How to select all dates in SQL query

SELECT oi.created_at, count(oi.id_order_item)
FROM order_item oi
The result is the following:
2016-05-05 1562
2016-05-06 3865
2016-05-09 1
...etc
The problem is that I need information for all days even if there were no id_order_item for this date.
Expected result:
Date Quantity
2016-05-05 1562
2016-05-06 3865
2016-05-07 0
2016-05-08 0
2016-05-09 1
You can't count something that is not in the database. So you need to generate the missing dates in order to be able to "count" them.
SELECT d.dt, count(oi.id_order_item)
FROM (
select dt::date
from generate_series(
(select min(created_at) from order_item),
(select max(created_at) from order_item), interval '1' day) as x (dt)
) d
left join order_item oi on oi.created_at = d.dt
group by d.dt
order by d.dt;
The query gets the minimum and maximum date from the existing order items.
If you want the count for a specific date range you can remove the sub-selects:
SELECT d.dt, count(oi.id_order_item)
FROM (
select dt::date
from generate_series(date '2016-05-01', date '2016-05-31', interval '1' day) as x (dt)
) d
left join order_item oi on oi.created_at = d.dt
group by d.dt
order by d.dt;
SQLFiddle: http://sqlfiddle.com/#!15/49024/5
Friend, the PostgreSQL count function ignores NULL values; it literally does not consider NULLs in the column you are counting. For this reason you need to include oi.created_at in a GROUP BY clause.
PostgreSQL evaluates rows sequentially. Because an integral part of your query is count, and count skips NULLs, your dates with NULL id_order_item are being ignored. If you group by oi.created_at, that column drives the grouping and returns 0 values for you.
SELECT oi.created_at, count(oi.id_order_item)
FROM order_item oi
GROUP BY oi.created_at
From TechontheNet (my most trusted source of information):
Because you have listed one column in your SELECT statement that is not encapsulated in the count function, you must use a GROUP BY clause. The department field must, therefore, be listed in the GROUP BY section.
Some info on Count in PostgreSql
http://www.postgresqltutorial.com/postgresql-count-function/
http://www.techonthenet.com/postgresql/functions/count.php
Solution #1: You need a date table where all dates are stored. Then do a left join depending on the period.
Solution #2
WITH DateTable AS
(
  SELECT DATEADD(dd, 1, CONVERT(DATETIME, GETDATE())) AS CreateDateTime, 1 AS Cnter
  UNION ALL
  SELECT DATEADD(dd, -1, CreateDateTime), DateTable.Cnter + 1
  FROM DateTable
  WHERE DateTable.Cnter + 1 <= 5
)
Generate Temporary table based on your input and then do a left Join.

Smoothing out a result set by date

Using SQL I need to return a smooth set of results (i.e. one per day) from a dataset that contains 0-N records per day.
The result per day should be the most recent previous value even if that is not from the same day. For example:
Starting data:
Date       Time   Value
19/3/2014  10:01  5
19/3/2014  11:08  3
19/3/2014  17:19  6
20/3/2014  09:11  4
22/3/2014  14:01  5
Required output:
Date       Value
19/3/2014  6
20/3/2014  4
21/3/2014  4
22/3/2014  5
First you need to complete the date range and fill in the missing dates (21/3/2014 in your example). This can be done either by joining a calendar table, if you have one, or by using a recursive common table expression to generate the complete sequence on the fly.
Once you have the complete sequence of dates, finding the max value for each date, taken from the latest date that has rows, becomes easy. In this query I use a correlated subquery to do it.
with cte as (
  select min(date) date, max(date) max_date from your_table
  union all
  select dateadd(day, 1, date) date, max_date
  from cte
  where date < max_date
)
select
  c.date,
  (select top 1 max(value)
   from your_table
   where date <= c.date
   group by date
   order by date desc
  ) value
from cte c
order by c.date;
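A SQLite adaptation of the same query is easy to verify (LIMIT in place of TOP, date(..., '+1 day') in place of dateadd, and ISO dates so string comparison sorts chronologically):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, time TEXT, value INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)",
                 [("2014-03-19", "10:01", 5), ("2014-03-19", "11:08", 3),
                  ("2014-03-19", "17:19", 6), ("2014-03-20", "09:11", 4),
                  ("2014-03-22", "14:01", 5)])

rows = conn.execute("""
WITH RECURSIVE cte AS (
  SELECT min(date) AS date, max(date) AS max_date FROM your_table
  UNION ALL
  SELECT date(date, '+1 day'), max_date FROM cte
  WHERE date < max_date
)
SELECT c.date,
       -- max value of the latest date (<= c.date) that has any rows
       (SELECT max(value) FROM your_table
        WHERE date <= c.date
        GROUP BY date
        ORDER BY date DESC
        LIMIT 1) AS value
FROM cte c
ORDER BY c.date
""").fetchall()
print(rows)
```

The gap day 21/3 correctly picks up 4, the value carried forward from 20/3.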
Maybe this works, but try it and let me know:
select date, value from test where (time,date) in (select max(time),date from test group by date);