SQL to Excel: Sum column with relative datetime condition - sql

I'm trying to build a shift-based overview of production output by pulling data from a SQL database into an Excel sheet. This data is to be displayed on a screen for production status tracking.
SQL server: Microsoft SQL Server 2012 (SP1) 11.0.3128.0 (x64)
My datatable looks like:
Product
| BatchID | EndedTime | Prodline | Status |
| xxxx | yyyy-mm-dd hh:mm:ss | x | x |
| xxxx | yyyy-mm-dd hh:mm:ss | x | x |
| xxxx | yyyy-mm-dd hh:mm:ss | x | x |
| xxxx | yyyy-mm-dd hh:mm:ss | x | x |
| xxxx | yyyy-mm-dd hh:mm:ss | x | x |
Shifts are divided into day and night, 08:00-20:00 and 20:00-08:00 respectively. I want to have an overview counting the output for the current and previous shift like the following:
|----------------------------------------------|
| | Previous shift | Current shift |
| Production |---------------------------------|
| line | OK | BAD | OK | BAD |
|------------|------|---------|------|---------|
| 1 | ## | # | ## | # |
| 2 | ## | # | ## | # |
|----------------------------------------------|
I'm fairly inexperienced when it comes to programming but have managed to make a basic query for pulling data that works:
SELECT
SUM(CASE WHEN Product.Status = 'OK' AND Product.Prodline = 1 THEN 1 ELSE 0 END) AS Line1_OK,
SUM(CASE WHEN Product.Status = 'BAD' AND Product.Prodline = 1 THEN 1 ELSE 0 END) AS Line1_BAD,
SUM(CASE WHEN Product.Status = 'OK' AND Product.Prodline = 2 THEN 1 ELSE 0 END) AS Line2_OK,
SUM(CASE WHEN Product.Status = 'BAD' AND Product.Prodline = 2 THEN 1 ELSE 0 END) AS Line2_BAD
FROM Product
WHERE Product.EndedTime IS NOT NULL
The (for me) tricky part is how to split the data for the two time intervals.
Based on the current time, I need one interval that runs from the beginning of the current shift until now, and another that covers the previous shift. I know how to do this in Excel if I simply pull out a list and use COUNTIFS statements, but I would like to have this in the SQL query directly.
I reckon I would need to make 8 columns in my query and insert some time interval conditions in the WHEN statements. I just don't know how to implement these conditions. I have looked around threads here on the forum, but have not been able to get anything working.
For current:
if GETDATE() is earlier than 08:00:00 it should count from yesterday 20:00:00 until GETDATE().
if GETDATE() is between 08:00:00 and 19:59:59, it should count from 08:00:00 until GETDATE().
if GETDATE() is 20:00:00 or later it should count from 20:00:00 until GETDATE()
For previous:
Not sure how to make this in the most efficient way, but thinking it maybe could be the same method as for the current count with an extra 12 hours backwards included and then subtract the current count?
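The interval rules above can be sketched outside SQL first, which makes the boundaries easy to check. This is a minimal illustration only: the function name `shift_windows` is hypothetical, and it assumes the shift times given in the question (08:00 and 20:00). It also shows that the previous shift is simply the 12 hours before the current shift's start, so no subtraction of counts is needed.

```python
from datetime import datetime, time, timedelta

DAY_START = time(8, 0)     # day shift runs 08:00-20:00 (from the question)
NIGHT_START = time(20, 0)  # night shift runs 20:00-08:00 (from the question)


def shift_windows(now):
    """Return ((prev_start, prev_end), (cur_start, now)) for a given moment.

    Hypothetical helper: the names are illustrative, not part of any library.
    """
    midnight = datetime.combine(now.date(), time(0, 0))
    if now.time() < DAY_START:
        # Before 08:00: the current shift started yesterday at 20:00.
        cur_start = midnight - timedelta(hours=4)
    elif now.time() < NIGHT_START:
        # 08:00 up to 19:59:59: the current shift started today at 08:00.
        cur_start = midnight + timedelta(hours=8)
    else:
        # 20:00 or later: the current shift started today at 20:00.
        cur_start = midnight + timedelta(hours=20)
    # The previous shift is the 12 hours immediately before the current one.
    prev_start = cur_start - timedelta(hours=12)
    return (prev_start, cur_start), (cur_start, now)
```

The two start values could then be used as bounds on EndTime inside the CASE conditions, treating each window as start-inclusive and end-exclusive so that a batch ending exactly at a shift change is counted once.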

Related

how to join tables on cases where none of function(a) in b

Say in MonetDB (specifically, the embedded version from the "MonetDBLite" R package) I have a table "events" containing entity ID codes and event start and end dates, of the format:
| id | start_date | end_date |
| 1 | 2010-01-01 | 2010-03-30 |
| 1 | 2010-04-01 | 2010-06-30 |
| 2 | 2018-04-01 | 2018-06-30 |
| ... | ... | ... |
The table is approximately 80 million rows of events, attributable to approximately 2.5 million unique entities (ID values). The dates appear to align nicely with calendar quarters, but I haven't thoroughly checked them so assume they can be arbitrary. However, I have at least sense-checked them for end_date > start_date.
I want to produce a table "nonevent_qtrs" listing calendar quarters where an ID has no event recorded, e.g.:
| id | last_doq |
| 1 | 2010-09-30 |
| 1 | 2010-12-31 |
| ... | ... |
| 1 | 2018-06-30 |
| 2 | 2010-03-30 |
| ... | ... |
(doq = day of quarter)
If the extent of an event spans any days of the quarter (including the first and last dates), then I wish for it to count as having occurred in that quarter.
To help with this, I have produced a "calendar table"; a table of quarters "qtrs", covering the entire span of dates present in "events", and of the format:
| first_doq | last_doq |
| 2010-01-01 | 2010-03-30 |
| 2010-04-01 | 2010-06-30 |
| ... | ... |
And tried using a non-equi merge like so:
create table nonevents
as select
id,
last_doq
from
events
full outer join
qtrs
on
start_date > last_doq or
end_date < first_doq
group by
id,
last_doq
But this is a) terribly inefficient and b) certainly wrong, since most IDs are listed as being non-eventful for all quarters.
How can I produce the table "nonevent_qtrs" I described, which contains a list of quarters for which each ID had no events?
If it's relevant, the ultimate use-case is to calculate runs of non-events to look at time-till-event analysis and prediction. Feels like run length encoding will be required. If there's a more direct approach than what I've described above then I'm all ears. The only reason I'm focusing on non-event runs to begin with is to try to limit the size of the cross-product. I've also considered producing something like:
| id | last_doq | event |
| 1 | 2010-01-31 | 1 |
| ... | ... | ... |
| 1 | 2018-06-30 | 0 |
| ... | ... | ... |
But although more useful this may not be feasible due to the size of the data involved. A wide format:
| id | 2010-01-31 | ... | 2018-06-30 |
| 1 | 1 | ... | 0 |
| 2 | 0 | ... | 1 |
| ... | ... | ... | ... |
would also be handy, but since MonetDB is column-store I'm not sure whether this is more or less efficient.
Let me assume that you have a table of quarters, with the start date of a quarter and the end date. You really need this if you want the quarters that don't exist. After all, how far back in time or forward in time do you want to go?
Then, you can generate all id/quarter combinations and filter out the ones that exist:
select i.id, q.*
from (select distinct id from events) i cross join
quarters q left join
events e
on e.id = i.id and
e.start_date <= q.quarter_end and
e.end_date >= q.quarter_start
where e.id is null;
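To see the anti-join above on a small data set, here is a sketch that runs the same query shape in an in-memory SQLite database through Python. SQLite stands in for MonetDB purely for illustration; the function name `nonevent_quarters` and the sample rows are assumptions, and the table and column names follow the answer's query rather than the question's "qtrs" table.

```python
import sqlite3


def nonevent_quarters(events, quarters):
    """Return (id, quarter_end) pairs for quarters in which an id has no event.

    events:   iterable of (id, start_date, end_date) with ISO date strings
    quarters: iterable of (quarter_start, quarter_end) with ISO date strings
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table events (id integer, start_date text, end_date text)")
    con.execute("create table quarters (quarter_start text, quarter_end text)")
    con.executemany("insert into events values (?, ?, ?)", events)
    con.executemany("insert into quarters values (?, ?)", quarters)
    # All id/quarter combinations, minus those overlapped by at least one event.
    rows = con.execute(
        """
        select i.id, q.quarter_end
        from (select distinct id from events) i
        cross join quarters q
        left join events e
          on e.id = i.id
         and e.start_date <= q.quarter_end
         and e.end_date >= q.quarter_start
        where e.id is null
        order by i.id, q.quarter_end
        """
    ).fetchall()
    con.close()
    return rows
```

The overlap test (event starts on or before the quarter's end and ends on or after the quarter's start) is what makes an event that touches any day of a quarter count for that quarter.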

Group based on time difference between two date values

I've searched around, but haven't been able to find anyone else with this same question.
I'm working with SQL Server (2008 R2).
Let's say I have the following three rows of data coming back from my query. What I need to do is group the first two rows into one (in either SQL Server or SSRS) based on the difference in minutes between the Start Time and the End Time (the Duration). How much time elapses between one row's End Time and the next row's Start Time is of no concern; I'm only looking at Duration.
Current result set:
+---------+------------+------------+----------+
| Vehicle | Start Time | End Time | Duration |
+---------+------------+------------+----------+
| 12 | 1:56:30 AM | 2:07:47 AM | 11 |
+---------+------------+------------+----------+
| 12 | 2:07:57 AM | 6:46:08 AM | 279 |
+---------+------------+------------+----------+
| 19 | 2:55:02 PM | 3:45:59 PM | 53 |
+---------+------------+------------+----------+
Desired result set:
+---------+------------+------------+----------+
| Vehicle | Start Time | End Time | Duration |
+---------+------------+------------+----------+
| 12 | 1:56:30 AM | 6:46:08 AM | 290 |
+---------+------------+------------+----------+
| 19 | 2:55:02 PM | 3:45:59 PM | 53 |
+---------+------------+------------+----------+
I feel like it has to be a matter of grouping, but I'm not sure how to group based on whether or not the start and end times are less than 15 minutes apart.
How can this be accomplished?
Unless I misunderstood your question, try this
Select Vehicle
,StartTime = min(StartTime)
,EndTime = max(EndTime)
,Duration = sum(Duration)
From YourTable
Group By Vehicle

Select row with timestamp nearest to, but not earlier than, now

Using Postgres 9.4, I am trying to select a single row from a table that contains data nearest to, but not before, the current system time. The datetime column is a timestamp without time zone data type, and the data is in the same timezone as the server. The table structure is:
uid | datetime | date | day | time | predictionft | predictioncm | highlow
-----+---------------------+------------+-----+----------+--------------+--------------+---------
1 | 2015-12-31 03:21:00 | 2015/12/31 | Thu | 03:21 AM | 5.3 | 162 | H
2 | 2015-12-31 09:24:00 | 2015/12/31 | Thu | 09:24 AM | 2.4 | 73 | L
3 | 2015-12-31 14:33:00 | 2015/12/31 | Thu | 02:33 PM | 4.4 | 134 | H
4 | 2015-12-31 21:04:00 | 2015/12/31 | Thu | 09:04 PM | 1.1 | 34 | L
Query speed is not a worry since the table contains ~1500 rows.
For clarity, if the current server time was 2015-12-31 14:00:00, the row returned should be 3 rather than 2.
EDIT:
The solution, based on the accepted answer below, was:
select *
from myTable
where datetime =
(select min(datetime)
from myTable
where datetime > now());
EDIT 2: Clarified question.
You can also use this. It will be faster, but it won't make much difference if you have only a few rows.
select * from table1
where datetime >= current_timestamp
order by datetime
limit 1
The general idea follows. You can adjust it for postgresql.
select fields
from yourTable
where datetimeField =
(select min(datetimeField)
from yourTable
where datetimeField > current_timestamp)
Another approach other than the answers given is to use a window function first_value
select id, first_value(dt) over (order by dt)
from test
where dt >= current_timestamp
order by dt
limit 1
See it working here: http://sqlfiddle.com/#!15/0031c/12
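The "filter, order, take one" approach from the answers can be tried against the sample rows from the question. The sketch below uses an in-memory SQLite database through Python instead of Postgres, and passes the reference time as a parameter so the result is reproducible; the function name `next_row`, the table name `tides`, and the column name `dt` are assumptions made for the example.

```python
import sqlite3


def next_row(rows, now):
    """Return the row whose timestamp is closest to, but not before, `now`.

    rows: iterable of (uid, dt, highlow) with 'YYYY-MM-DD HH:MM:SS' strings
    now:  reference time in the same string format
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table tides (uid integer, dt text, highlow text)")
    con.executemany("insert into tides values (?, ?, ?)", rows)
    # Keep only rows at or after `now`, then take the earliest of them.
    row = con.execute(
        "select uid, dt, highlow from tides "
        "where dt >= ? order by dt limit 1",
        (now,),
    ).fetchone()
    con.close()
    return row
```

With the four rows from the question and a reference time of 2015-12-31 14:00:00, this returns row 3, matching the expected behaviour described there. When no later row exists, it returns None.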

Get latest child record without given order

Simplified, I have the following situation. I've got two tables. One migration has multiple checks through checks.migration_id. The column checks.old describes a type of check. Now I want to get, for each migration, the check with the latest time where old is true (query 1) and where old is false (query 2).
There are about 30,000 migrations, and each has around 1000 checks where old=true and 1000 checks where old=false. The checks table will grow very large. The order of the checks is not given and could be totally mixed up.
I want to get the latest check for a maximum of 150 migrations at once.
SQL Fiddle: http://sqlfiddle.com/#!15/282ce/15
I'm using PostgreSQL 9.3 and Rails 3.2 (shouldn't matter)
What's the most efficient way to get the latest subrecord where old = true?
Table Migrations:
| ID |
|----|
| 1 |
| 2 |
Table Checks:
| ID | MIGRATION_ID | OLD | OK | TIME |
|----|--------------|-----|----|----------------------------------|
| 1 | 1 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 2 | 1 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 3 | 2 | 1 | 1 | September, 22 2014 12:00:01+0000 |
| 4 | 2 | 0 | 1 | September, 22 2014 12:00:02+0000 |
| 5 | 1 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 6 | 1 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 7 | 2 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 8 | 2 | 0 | 1 | September, 22 2014 12:00:04+0000 |
Query 1 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 5 | 1 | 1 | September, 22 2014 12:00:03+0000 |
| 2 | 7 | 1 | 1 | September, 22 2014 12:00:03+0000 |
Query 2 should return the following result:
| Migration.id | Check_ID | OLD | OK | TIME |
|--------------|----------|-----|----|----------------------------------|
| 1 | 6 | 0 | 1 | September, 22 2014 12:00:04+0000 |
| 2 | 8 | 0 | 1 | September, 22 2014 12:00:04+0000 |
I tried to solve it with a max in a subquery, but then I lose the information about checks.ok and check.time.
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 't') AS latest FROM migrations eq;
SELECT eq.id, (SELECT max(checks.id) FROM checks WHERE checks.migration_id = eq.id and checks.old = 'f') AS latest FROM migrations eq;
(I know that I get max(id) instead of max(time).)
In Rails I tried to fetch the latest record for each migration, which resulted in the 1+n problem. I'm not able to include all checks because there are far too many of them.
A simple solution with the Postgres specific DISTINCT ON:
Query 1 ("for each migration the check with the biggest time where old is true"):
SELECT DISTINCT ON (migration_id)
migration_id, id AS check_id, old, ok, time
FROM checks
WHERE old
ORDER BY migration_id, time DESC;
Invert the WHERE condition for Query 2:
...
WHERE NOT old
...
Details:
Select first row in each GROUP BY group?
But if you want better read performance with big tables, use JOIN LATERAL (Postgres 9.3+, standard SQL), building on a multicolumn index like:
CREATE INDEX checks_special_idx ON checks(old, migration_id, time DESC);
Query 1:
SELECT m.id AS migration_id
, c.id AS check_id, c.old, c.ok, c.time
FROM migrations m
-- FROM (SELECT id FROM migrations LIMIT 150) m
JOIN LATERAL (
SELECT id, old, ok, time
FROM checks
WHERE migration_id = m.id
AND old
ORDER BY time DESC
LIMIT 1
) c ON TRUE;
Switch the condition on old again for query 2.
For an unspecified "maximum of 150 migrations", use the commented alternative line.
Details:
Optimize GROUP BY query to retrieve latest record per user
Aside: don't use "time" as identifier. It's a reserved word in standard SQL and a basic type name in Postgres.
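For readers without a Postgres instance at hand, the "latest check per migration" result can be reproduced on the question's sample data with a portable correlated subquery. This sketch runs in an in-memory SQLite database through Python and is not the DISTINCT ON or LATERAL technique shown above; the function name `latest_checks` is hypothetical, and the time column is named `checked_at` here (an assumption) in line with the aside about avoiding "time" as an identifier.

```python
import sqlite3


def latest_checks(checks, old):
    """Return the latest check per migration for the given value of `old`.

    checks: iterable of (id, migration_id, old, ok, checked_at)
    old:    1 for old checks, 0 for the others
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table checks "
        "(id integer, migration_id integer, old integer, ok integer, checked_at text)"
    )
    con.executemany("insert into checks values (?, ?, ?, ?, ?)", checks)
    # Keep a row only if no check of the same migration and type is newer.
    rows = con.execute(
        """
        select c.migration_id, c.id, c.old, c.ok, c.checked_at
        from checks c
        where c.old = ?
          and c.checked_at = (select max(c2.checked_at)
                              from checks c2
                              where c2.migration_id = c.migration_id
                                and c2.old = c.old)
        order by c.migration_id
        """,
        (old,),
    ).fetchall()
    con.close()
    return rows
```

Unlike DISTINCT ON or LIMIT 1, this form returns more than one row for a migration if two checks share the same latest timestamp, so a tie-breaker would be needed in that case.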

SQL -- Derive Date Difference Column

+------+-------------------------+
| proc | endTime |
+------+-------------------------+
| A | 2010/01/01 12:10:00.000 |
| B | 2010/01/01 12:08:00.000 |
| C | 2010/01/01 12:05:00.000 |
| D | 2010/01/01 12:02:00.000 |
| ...| ... |
So basically the data I pull from the database will look something like the above, with the first column being the name of a process, and the second column the time it finished running. I want to add a THIRD column, where it displays the running time of the process.
Basically, I want the data pulled to look like this instead:
+------+-------------------------+--------------+
| proc | endTime | runningTime |
+------+-------------------------+--------------+
| A | 2010/01/01 12:10:00.000 | | (process a is not done running)
| B | 2010/01/01 12:08:00.000 | 00:03:00.000 |
| C | 2010/01/01 12:05:00.000 | 00:03:00.000 |
| D | 2010/01/01 12:02:00.000 | 00:02:00.000 | (assume 12:00 start time)
| ...| ... | ... |
And I know it would be easier to add a startTime column and determine runningTime from that, but I don't have access to change the table, and in any case the old data would not have a startTime to work with.
The first process's start time is arbitrary, but you see what I'm getting at. We know the run time of proc C based on when proc D ended and when proc C ended (subtract the first from the second).
How do I compute that third column based on the difference between "Row X Col B" and "Row X-1 Col B"?
I don't think you can add it as a "calculated column". You can calculate it in a view pretty easily like this (all code for MSSQL. Your convert function may vary):
select
e1.RowID,
e2.EndTime as StartTime,
e1.EndTime, runningtime=convert(varchar(20), e1.EndTime - e2.EndTime, 114)
from endtimetest e1
left join endtimetest e2 on e2.endtime =
(Select max(endtime)
from endtimetest
where endtime < e1.Endtime)
Or, you could calculate it in a trigger with something similar.
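The idea behind the self-join, pairing each end time with the latest earlier end time and subtracting, can be illustrated in a few lines of Python. This is a sketch of the calculation only, not of the view or trigger; the function name `running_times` is hypothetical, and the earliest row gets no running time because its start is unknown.

```python
from datetime import datetime


def running_times(rows):
    """Attach a running time to each (proc, end_time) pair.

    The running time is the gap between a process's end time and the end
    time of the process that finished just before it. The earliest row has
    no predecessor, so its running time is None.
    """
    ordered = sorted(rows, key=lambda r: r[1])  # oldest end time first
    result = []
    prev_end = None
    for proc, end in ordered:
        duration = None if prev_end is None else end - prev_end
        result.append((proc, end, duration))
        prev_end = end
    return result
```

Applied to the four sample rows from the question, C and B each come out at three minutes, A at two minutes, and D has no value since nothing precedes it.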