SQL - Delete all records as a result of subquery

I'm really struggling with an implementation solution here.
SELECT
mach_id,
value1,
CASE
WHEN value1 = 0 THEN lead(created_on) OVER (ORDER BY mach_id)
END,
created_on
FROM MyTable
WHERE
field_name='someValue' and
CAST(created_on AS DATE) = CAST(GETDATE() AS DATE)
I need to get the created_on date when value1 is 0, and then get the created_on date of the lead (next) record. I then need to take those two dates and delete all records in another table where created_on falls between them, per mach_id.
I'm really at a loss for a solution here. Any suggestions?

I finally came up with a solution; I'm posting it in the hope that it helps someone else. Thanks, everyone, for the comments and suggestions.
Delete fe from MySecondTable fe join
(Select * from
(
Select mach_id, station_id,
lag(value1) over (partition by mach_id order by created_on) as shiftEndValue,
CASE WHEN value1=0 THEN lag(created_on) over (partition by mach_id order by created_on)
end as shiftEndTime,
value1, created_on
FROM MyFirstTable
where field_name='cur_trgt_cnt' and CAST(created_on AS DATE) = CAST(GETDATE() AS DATE)
)a
where shiftEndTime is not null
)b
on fe.mach_id= b.mach_id and fe.station_id=b.station_id
where fe.created_on between b.shiftEndTime and b.created_on
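The idea in the statement above can be tried on a small scale. This is only a sketch under stated assumptions: SQLite (via Python's sqlite3 module, SQLite 3.25 or later for window functions) stands in for SQL Server, the tables first_t and second_t and their rows are made up, and because SQLite has no DELETE ... JOIN the join is written as an EXISTS subquery.

```python
import sqlite3

# Hypothetical miniature versions of the two tables (names and rows are invented).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE first_t  (mach_id INT, value1 INT, created_on TEXT);
CREATE TABLE second_t (mach_id INT, created_on TEXT);
INSERT INTO first_t VALUES
  (1, 5, '2024-01-01 08:00'),
  (1, 9, '2024-01-01 12:00'),
  (1, 0, '2024-01-01 12:30'),
  (1, 3, '2024-01-01 13:00');
INSERT INTO second_t VALUES
  (1, '2024-01-01 11:00'),
  (1, '2024-01-01 12:10'),
  (1, '2024-01-01 12:45');
""")

# For each row where value1 = 0, lag() fetches the previous reading's timestamp
# for the same machine; rows of second_t inside that window are deleted.
con.execute("""
DELETE FROM second_t
WHERE EXISTS (
  SELECT 1
  FROM (SELECT mach_id, value1, created_on,
               lag(created_on) OVER (PARTITION BY mach_id
                                     ORDER BY created_on) AS shift_end
        FROM first_t) AS w
  WHERE w.value1 = 0
    AND w.shift_end IS NOT NULL
    AND w.mach_id = second_t.mach_id
    AND second_t.created_on BETWEEN w.shift_end AND w.created_on
)
""")

remaining = [r[0] for r in con.execute(
    "SELECT created_on FROM second_t ORDER BY created_on")]
print(remaining)
```

Only the 12:10 row falls between the previous reading (12:00) and the reset row (12:30), so it is the only one removed.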

Related

Query needed to show data in a different way

After making a query, I get this data in this format:
I need to set up a query that pairs dates related to same id and counts the difference in days.
the result should be like this:
I'm using postgresql. Could you please help me to set up the query to get the desired output ?
Thanks in advance
Hmmm . . . I'm thinking lead() and some additional filtering and arithmetic:
select id, date as date_start, next_date as date_end,
(next_date - date) as days
from (select t.*, lead(date) over (partition by id order by date) as next_date
from t
) t
where event = 0;
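A quick way to check the lead() pairing is to run it against a few made-up rows. The sketch below assumes SQLite (through Python's sqlite3, version 3.25 or later) instead of PostgreSQL, so the date subtraction is done with julianday(); the table contents are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INT, date TEXT, event INT);
INSERT INTO t VALUES
  (1, '2020-01-01', 0), (1, '2020-01-05', 1),
  (1, '2020-02-01', 0), (1, '2020-02-03', 1),
  (2, '2020-03-10', 0), (2, '2020-03-20', 1);
""")

# lead() attaches the next date for the same id to every row; keeping only the
# event = 0 rows leaves one (start, end) pair per interval.
rows = con.execute("""
SELECT id, date AS date_start, next_date AS date_end,
       CAST(julianday(next_date) - julianday(date) AS INTEGER) AS days
FROM (SELECT t.*, lead(date) OVER (PARTITION BY id ORDER BY date) AS next_date
      FROM t) AS s
WHERE event = 0
ORDER BY id, date_start
""").fetchall()

for row in rows:
    print(row)
```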
The most robust and maintainable answer is that by @GordonLinoff.
Another method to solve this can be using CTEs. Here I am assuming that event = 0 and event = 1 are paired (which is true in the example shown):
WITH t1 AS
(SELECT id, date,
rank() OVER (
ORDER BY id, date) AS id_date_rank
FROM t
WHERE event = 0),
t2 AS
(SELECT id, date,
rank() OVER (
ORDER BY id, date) AS id_date_rank
FROM t
WHERE event = 1)
SELECT t1.id,
t1.date AS date_start,
t2.date AS date_end,
DATE_PART('day', (t2.date - t1.date)) AS no_days
FROM t1
INNER JOIN t2 ON (t1.id_date_rank = t2.id_date_rank)
ORDER BY t1.id,
t1.date;

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1@gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
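The row_number() approach can be checked against a handful of rows. This sketch assumes SQLite via Python's sqlite3 (3.25 or later for window functions) and invented sample data in which 1@gmail.com appears on two different dates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (email TEXT, date TEXT);
INSERT INTO t VALUES
  ('1@gmail.com', '2020-01-01'),
  ('2@gmail.com', '2020-01-01'),
  ('1@gmail.com', '2020-01-02'),
  ('3@gmail.com', '2020-01-02'),
  ('4@gmail.com', '2020-01-03');
""")

# seqnum = 1 marks the first time an email is seen, so summing that flag per day
# and then running-summing over days gives the cumulative distinct count.
rows = con.execute("""
SELECT date,
       count(*) AS day_total,
       sum(count(*)) OVER (ORDER BY date) AS running_total,
       sum(sum(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END))
         OVER (ORDER BY date) AS running_distinct
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY email ORDER BY date) AS seqnum
      FROM t) AS s
GROUP BY date
ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```

On the second day the running total grows by two while the distinct count grows by one, because 1@gmail.com was already counted.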
Getting the total count and cumulative count is straightforward. To get the cumulative distinct count, use lag to check if the email had a row with a previous date, and set the flag to 0 so it would be ignored during a running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Here is a solution that uses neither sum() over nor lag(), and still produces the correct results.
It may therefore be simpler to read and maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1

SQL how to write a query that return missing date ranges?

I am trying to figure out how to write a query that looks at certain records and finds missing date ranges between today and 9999-12-31.
My data looks like below:
ID |start_dt |end_dt |prc_or_disc_1
10412 |2018-07-17 00:00:00.000 |2018-07-20 00:00:00.000 |1050.000000
10413 |2018-07-23 00:00:00.000 |2018-07-26 00:00:00.000 |1040.000000
So for this data I would want my query to return:
2018-07-10 | 2018-07-16
2018-07-21 | 2018-07-22
2018-07-27 | 9999-12-31
I'm not really sure where to start. Is this possible?
You can do that using the lag() function in MS SQL (available starting with SQL Server 2012).
with myData as
(
select *,
lag(end_dt,1) over (order by start_dt) as lagEnd
from myTable),
myMax as
(
select Max(end_dt) as maxDate from myTable
)
select dateadd(d,1,lagEnd) as StartDate, dateadd(d, -1, start_dt) as EndDate
from myData
where lagEnd is not null and dateadd(d,1,lagEnd) < start_dt
union all
select dateAdd(d,1,maxDate) as StartDate, cast('99991231' as Datetime) as EndDate
from myMax
where maxDate < '99991231';
lag() is not available in MS SQL 2008; there you can mimic it with row_number() and a self-join.
select
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then end_dt +1 END as F1,
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then ISNULL(LEAD(start_dt) over (order by ID) - 1, '99991231') END as F2
from t
FOR 2008 VERSION
SELECT
X.end_dt + 1 as F1,
ISNULL(Y.start_dt-1, '99991231') as F2
FROM t X
LEFT JOIN (
SELECT
*
, (SELECT MAX(ID) FROM t WHERE ID < A.ID) as ID2
FROM t A) Y ON X.ID = Y.ID2
WHERE DATEDIFF(day, X.end_dt, ISNULL(Y.start_dt, '99991231')) > 1
This should work in 2008, it assumes that ranges in your table do not overlap. It will also eliminate rows where the end_date of the current row is a day before the start date of the next row.
with dtRanges as (
select start_dt, end_dt, row_number() over (order by start_dt) as rownum
from table1
)
select t2.end_dt + 1, coalesce(start_dt_next -1,'99991231')
FROM
( select dr1.start_dt, dr1.end_dt,dr2.start_dt as start_dt_next
from dtRanges dr1
left join dtRanges dr2 on dr2.rownum = dr1.rownum + 1
) t2
where
t2.end_dt + 1 <> coalesce(start_dt_next,'99991231')
http://sqlfiddle.com/#!18/65238/1
SELECT
*
FROM
(
SELECT
end_dt+1 AS start_dt,
LEAD(start_dt-1, 1, '9999-12-31')
OVER (ORDER BY start_dt)
AS end_dt
FROM
yourTable
)
gaps
WHERE
gaps.end_dt >= gaps.start_dt
I would, however, strongly urge you to use end dates that are "exclusive". That is, the range is everything up to but excluding the end_dt.
That way, a range of one day becomes '2018-07-09', '2018-07-10'.
It's really clear that my range is one day long, if you subtract one from the other you get a day.
Also, if you ever change to needing hour granularity or minute granularity you don't need to change your data. It just works. Always. Reliably. Intuitively.
If you search the web you'll find plenty of documentation on why inclusive-start and exclusive-end is a very good idea from a software perspective. (Then, in the query above, you can remove the wonky +1 and -1.)
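To illustrate the exclusive-end suggestion, here is a minimal sketch of the same gap query over half-open ranges. It assumes SQLite through Python's sqlite3 (3.25 or later) rather than SQL Server, a made-up table named ranges, and sample rows whose end_dt has been shifted by one day to make it exclusive.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ranges (start_dt TEXT, end_dt TEXT);  -- end_dt is exclusive
INSERT INTO ranges VALUES
  ('2018-07-09', '2018-07-13'),
  ('2018-07-17', '2018-07-21'),
  ('2018-07-23', '2018-07-27');
""")

# With half-open ranges a gap starts exactly at one range's end_dt and ends
# exactly at the next range's start_dt, so no +1 / -1 adjustment is needed.
gaps = con.execute("""
SELECT gap_start, gap_end
FROM (SELECT end_dt AS gap_start,
             lead(start_dt, 1, '9999-12-31')
               OVER (ORDER BY start_dt) AS gap_end
      FROM ranges) AS g
WHERE gap_end > gap_start
ORDER BY gap_start
""").fetchall()

for gap in gaps:
    print(gap)
```

The returned gaps are themselves half-open, so they can be stored back in the same table without conversion.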
This solves your case, but provide some sample data if there will ever be overlaps, fringe cases, etc.
Take one day after your end date and 1 day before the next line's start date.
DECLARE @ TABLE (ID int, start_dt DATETIME, end_dt DATETIME, prc VARCHAR(100))
INSERT INTO @ (id, start_dt, end_dt, prc)
VALUES
(10410, '2018-07-09 00:00:00.00','2018-07-12 00:00:00.000','1025.000000'),
(10412, '2018-07-17 00:00:00.00','2018-07-20 00:00:00.000','1050.000000'),
(10413, '2018-07-23 00:00:00.00','2018-07-26 00:00:00.000','1040.000000')
SELECT DATEADD(DAY, 1, end_dt)
, DATEADD(DAY, -1, LEAD(start_dt, 1, '9999-12-31') OVER(ORDER BY id) )
FROM @
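The "one day after the end date, one day before the next start date" rule can be sketched with the same three sample rows. Assumptions: SQLite via Python's sqlite3 (3.25 or later) stands in for SQL Server, date(x, '+1 day') replaces DATEADD, the table name ranges is made up, and the default is applied after the one-day shift so the open-ended gap ends on 9999-12-31.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ranges (id INT, start_dt TEXT, end_dt TEXT);
INSERT INTO ranges VALUES
  (10410, '2018-07-09', '2018-07-12'),
  (10412, '2018-07-17', '2018-07-20'),
  (10413, '2018-07-23', '2018-07-26');
""")

# Gap start = day after this range's end; gap end = day before the next start.
# The filter drops "gaps" of zero length between adjacent ranges.
gaps = con.execute("""
SELECT gap_start, gap_end
FROM (SELECT date(end_dt, '+1 day') AS gap_start,
             lead(date(start_dt, '-1 day'), 1, '9999-12-31')
               OVER (ORDER BY start_dt) AS gap_end
      FROM ranges) AS g
WHERE gap_end >= gap_start
ORDER BY gap_start
""").fetchall()

for gap in gaps:
    print(gap)
```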
You may want to take a look at this:
http://sqlfiddle.com/#!18/3a224/1
You just have to edit the begin range to today and the end range to 9999-12-31.

Alternative to Datediff over()?

I have data of this form:
user_id  event    started  ended  date
1        started  1        0      3/1/2018
1        ended    0        1      3/2/2018
2        started  1        0      3/5/2018
2        ended    0        1      3/22/2018
3        started  1        0      3/25/2018
There are other events and columns for 0/1 but they are irrelevant.
I am trying to get how long it takes each user to get from started to ended.
I tried datediff(day, case when started=1 then date end, case when ended=1 then date end), but since the two dates are on different rows it doesn't work. Something along the lines of datediff over() could work, but that is obviously not a valid function.
Thanks in advance!
Assuming that you can't end before you started, you simply need MIN & MAX as Windowed Aggregates:
select user_id,
datediff(day,
min(date) over (partition by user_id),
max(date) over (partition by user_id))
from myTable
where event in ('started', 'ended')
Using this you can add any additional columns, too.
If one result row is also ok, you can do simple aggregation:
select user_id,
min(date) as started,
max(date) as ended,
datediff(day,
min(date),
max(date)) as duration
from myTable
where event in ('started', 'ended')
group by user_id
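The grouped MIN/MAX version can be tried on the question's sample rows. This is a sketch only: SQLite via Python's sqlite3 replaces SQL Server, so julianday() arithmetic stands in for datediff(), the dates are rewritten in ISO format, and the table name events is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (user_id INT, event TEXT, date TEXT);
INSERT INTO events VALUES
  (1, 'started', '2018-03-01'), (1, 'ended', '2018-03-02'),
  (2, 'started', '2018-03-05'), (2, 'ended', '2018-03-22'),
  (3, 'started', '2018-03-25');
""")

# One row per user: earliest date is the start, latest date is the end.
durations = con.execute("""
SELECT user_id,
       min(date) AS started,
       max(date) AS ended,
       CAST(julianday(max(date)) - julianday(min(date)) AS INTEGER) AS duration
FROM events
WHERE event IN ('started', 'ended')
GROUP BY user_id
ORDER BY user_id
""").fetchall()

for row in durations:
    print(row)
```

Note that user 3, who has started but not ended, comes back with a duration of 0 because MIN and MAX are the same row; a HAVING count(*) = 2 clause would be one way to exclude such users if that is not wanted.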
You could inner join the table on itself using the user_id column:
SELECT a.[user_id]
, a.[date] AS StartDate
, b.EndDate
, DATEDIFF(DAY, a.[date], b.EndDate) AS DateDifference
FROM dbo.TableNameHere AS a
INNER JOIN
(
SELECT [user_id]
, [date] AS EndDate
FROM dbo.TableNameHere
WHERE [ended] = 1
) AS b
ON a.[user_id] = b.[user_id]
WHERE a.[started] = 1
In my example above, you don't really need any of the columns in the first SELECT besides the DateDifference, I just had them for visibility in my testing.

Oracle - select rows with minimal value in a subset

I have the following table of dates:
dateID INT (PK),
personID INT (FK),
date DATE,
starttime VARCHAR, --Always in a format of 'HH:MM'
I want to pull the row (all columns, including the PK) with the lowest date (primary condition) and starttime (secondary condition) for every person. For example, if we have
row1(date = '2013-04-01' and starttime = '14:00')
and
row2(date = '2013-04-02' and starttime = '08:00')
row1 will be retrieved, along with all other columns.
So far I have come up with gradually filtering the table, but it's quite a mess. Is there a more efficient way of doing this?
Here is what I made so far:
SELECT
D.id
, D.personid
, D.date
, D.starttime
FROM table D
JOIN (
SELECT --Select lowest time from the subset of lowest dates
A.personid,
B.startdate,
MIN(A.starttime) AS starttime
FROM table A
JOIN (
SELECT --Select lowest date for every person to exclude them from outer table
personid
, MIN(date) AS startdate
FROM table
GROUP BY personid
) B
ON A.personid = B.personid
AND A.date = B.startdate
GROUP BY
A.personid,
B.startdate
) C
ON C.personid = D.personid
AND C.startdate = D.date
AND C.starttime = D.starttime
It works, but I think there is a cleaner, more efficient way to do this. Any ideas?
EDIT: Let me expand a question - I also need to extract maximum date (only date, without time) for each person.
The result should look like this:
id
personid
max(date) for each person
min(date) for each person
min(starttime) for min(date) for each person
It is part of a much larger query (the resulting table is joined with it), and the resulting table must be lightweight enough that the query won't execute for too long. With a single join to this table (just using min and max for each field I wanted) the query took about 3 seconds, and I would like the resulting query not to take longer than 2-3 times that.
You should be able to do this like:
select a.dateID, a.personID, a.date, a.max_date, a.starttime
from (select t.*,
max(t.date) over (partition by t.personID) max_date,
row_number() over (partition by t.personID
order by t.date, t.starttime) rn
from table t) a
where a.rn = 1;
sample data added to fiddle: http://sqlfiddle.com/#!4/63c45/1
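The row_number() pattern from this answer can be sketched outside Oracle as well. The example below assumes SQLite via Python's sqlite3 (3.25 or later), a table named dates in place of the placeholder name table, and invented rows in which person 1 has two entries on the earliest date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dates (dateID INTEGER PRIMARY KEY, personID INT,
                    date TEXT, starttime TEXT);
INSERT INTO dates VALUES
  (1, 1, '2013-04-01', '14:00'),
  (2, 1, '2013-04-02', '08:00'),
  (3, 1, '2013-04-01', '09:30'),
  (4, 2, '2013-05-10', '10:00');
""")

# rn = 1 is the earliest (date, starttime) row per person; max_date is computed
# over the whole partition, so it survives the rn = 1 filter.
rows = con.execute("""
SELECT dateID, personID, date, max_date, starttime
FROM (SELECT d.*,
             max(d.date) OVER (PARTITION BY d.personID) AS max_date,
             row_number() OVER (PARTITION BY d.personID
                                ORDER BY d.date, d.starttime) AS rn
      FROM dates d) AS a
WHERE rn = 1
ORDER BY personID
""").fetchall()

for row in rows:
    print(row)
```

The table is read once, which is the main attraction over the nested GROUP BY joins in the question.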
This is a query you can use on its own, with no need to incorporate it into your existing query. You can also use @Dazzal's query stand-alone.
SELECT ID, PERSONID, DATE, STARTTIME
FROM (
SELECT ID, PERSONID, DATE, STARTTIME, ROW_NUMBER() OVER(PARTITION BY PERSONID ORDER BY DATE, STARTTIME) AS RN
FROM TABLE
) A
WHERE
RN = 1
select a.id,a.accomp, a.accomp_name, a.start_year,a.end_year, a.company
from (select t.*,
min(t.start_year) over (partition by t.company) min_date,
max(t.end_year) over (partition by t.company) max_date,
row_number() over (partition by t.company
order by t.end_year desc) rn
from temp_123 t) a
where a.rn = 1;