SQL query / data version check

I'm stuck on an SQL query that checks whether the DATETO of one row is later than the DATEFROM of the next row within the same ID. It is probably easiest to show with an example:
ID     DATEFROM  DATETO    PK
1234   20150512  20150518  1
1234   20150514  20150520  2
1234   20150519  null      3
2313   20150512  20150518  4
44341  20150512  null      5
Now, within id 1234 flow is:
1.2015-05-12 -> 2015-05-18
2.2015-05-13 -> 2015-05-20 WRONG
3.2015-05-19 -> null
In this example the end date of the first row is later than the start date of the second row.
As output I would like to display the PK (in this example: 2).
Could anyone give me some hints on how to approach this? I don't want the full SQL, just hints. Thanks in advance.

You can use a join or correlated subquery for this. I suspect that you want a full overlap comparison:
select t.*
from t
where exists (select 1
              from t t2
              where t.id = t2.id and
                    t2.start_date < t.start_date and
                    -- parentheses are needed here: AND binds tighter than OR
                    (t2.end_date > t.start_date or t2.end_date is null)
             );
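To try the overlap check against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name versions is made up for illustration, the column names follow the question (datefrom, dateto, pk), and the dates are stored as yyyymmdd text so that string comparison matches date order.

```python
import sqlite3

# Sample rows from the question: (id, datefrom, dateto, pk); None stands for NULL.
ROWS = [
    (1234, "20150512", "20150518", 1),
    (1234, "20150514", "20150520", 2),
    (1234, "20150519", None, 3),
    (2313, "20150512", "20150518", 4),
    (44341, "20150512", None, 5),
]

# A row is flagged when an earlier-starting row of the same id is still
# open (later dateto, or no dateto at all) at this row's datefrom.
OVERLAP_SQL = """
SELECT t.pk
FROM versions AS t
WHERE EXISTS (
    SELECT 1
    FROM versions AS t2
    WHERE t2.id = t.id
      AND t2.datefrom < t.datefrom
      AND (t2.dateto > t.datefrom OR t2.dateto IS NULL)
)
ORDER BY t.pk
"""


def find_overlapping_pks(rows):
    """Load the rows into an in-memory table (hypothetical name) and run the check."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE versions (id INTEGER, datefrom TEXT, dateto TEXT, pk INTEGER PRIMARY KEY)"
        )
        conn.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", rows)
        return [pk for (pk,) in conn.execute(OVERLAP_SQL)]
    finally:
        conn.close()


print(find_overlapping_pks(ROWS))
```

On the table as posted this flags PK 2 and also PK 3, because row 2 ends on 20150520, after row 3 starts on 20150519. If only adjacent rows should be compared, the condition would need to be narrowed accordingly.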

Related

deleting specific duplicate and original entries in a table based on date

I have a table called "main" which has 4 columns: ID, name, DateID and Sign.
I want to create a query that deletes entries from this table if the same ID appears twice within a certain DateID range.
My where clause searches the previous 3 weeks:
where DateID =((SELECT MAX( DateID)
WHERE DateID < ( SELECT MAX( DateID )-3))
An example of the dataset I'm working with:
id     name   DateID  sign
12345  Paul   1915    Up
23658  Danny  1915    Down
37868  Jake   1916    Up
37542  Elle   1917    Up
12345  Paul   1917    Down
87456  John   1918    Up
78563  Luke   1919    Up
23658  Danny  1920    Up
In the case above, both entries for ID 12345 would need to be removed.
However, the entries for ID 23658 would need to be kept, as their DateID values are more than 3 apart.
How would this be possible?

You can use window functions for this.
It's not quite clear, but it seems LAG and conditional COUNT should fit what you need.
DELETE t
FROM (
    SELECT *,
        CountWithinDate = COUNT(CASE WHEN t.PrevDate >= t.DateId - 3 THEN 1 END) OVER (PARTITION BY t.id)
    FROM (
        SELECT *,
            PrevDate = LAG(t.DateID) OVER (PARTITION BY t.id ORDER BY t.DateID)
        FROM YourTable t
    ) t
) t
WHERE CountWithinDate > 0;
Note that you do not need to re-join the table, you can delete directly from the t derived table.

Hope this works:
DELETE FROM test_tbl
WHERE id IN (
    SELECT T1.id
    FROM test_tbl T1
    WHERE EXISTS (
        SELECT 1
        FROM test_tbl T2
        WHERE T1.id = T2.id
          AND ABS(T2.dateid - T1.dateid) < 3
          AND T1.dateid <> T2.dateid
    )
)
If you need more logic for the data processing, I would suggest using a stored procedure.
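For anyone who wants to check the behaviour on the sample data, here is a rough sketch with Python's sqlite3 module. It is a variation on the self-join idea, not either query above verbatim: it deletes by SQLite's rowid so that only the rows that actually have a close neighbour are removed. The table name main_table and the window of 3 (inclusive) are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (id, name, DateID, sign).
ROWS = [
    (12345, "Paul", 1915, "Up"),
    (23658, "Danny", 1915, "Down"),
    (37868, "Jake", 1916, "Up"),
    (37542, "Elle", 1917, "Up"),
    (12345, "Paul", 1917, "Down"),
    (87456, "John", 1918, "Up"),
    (78563, "Luke", 1919, "Up"),
    (23658, "Danny", 1920, "Up"),
]

# Delete every row that has another row with the same id whose DateID lies
# within :window of its own. The IN subquery is not correlated, so the set
# of rowids to delete is computed before any row is removed.
DELETE_SQL = """
DELETE FROM main_table
WHERE rowid IN (
    SELECT m1.rowid
    FROM main_table AS m1
    JOIN main_table AS m2
      ON m2.id = m1.id
     AND m2.rowid <> m1.rowid
     AND ABS(m2.DateID - m1.DateID) <= :window
)
"""


def delete_close_duplicates(rows, window=3):
    """Return the (id, DateID) pairs that survive the delete."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE main_table (id INTEGER, name TEXT, DateID INTEGER, sign TEXT)")
        conn.executemany("INSERT INTO main_table VALUES (?, ?, ?, ?)", rows)
        conn.execute(DELETE_SQL, {"window": window})
        return conn.execute("SELECT id, DateID FROM main_table ORDER BY DateID, id").fetchall()
    finally:
        conn.close()


remaining = delete_close_duplicates(ROWS)
print(remaining)
```

With the sample data both rows for 12345 (DateID 1915 and 1917) disappear, while the two rows for 23658 (1915 and 1920) stay because they are 5 apart.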

Count rows in a table based on each row value

I'm struggling with the following problem. I want to write a query in Spark that, for every row of an existing table, runs a count based on that row's column values.
Table can be simplified like this:
job_id  start_date  end_date
1       1-1-2000    2-1-2000
2       1-1-2000    3-1-2000
3       2-1-2000    4-1-2000
4       5-1-2000    7-1-2000
I want to create a query that adds another column counting how many jobs have already started (and not yet ended) at each row's start date.
The output for this table should look as follows:
job_id  start_date  end_date  jobs_active_at_start
1       1-1-2000    2-1-2000  2 (active jobs id - 1,2)
2       1-1-2000    3-1-2000  2 (active jobs id - 1,2)
3       2-1-2000    4-1-2000  3 (active jobs id - 1,2,3)
4       5-1-2000    7-1-2000  1 (only job 4 is active)
I tried a correlated subquery:
%sql
SELECT
    t1.id,
    (SELECT COUNT(*) FROM table t2 WHERE t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date)
FROM table t1
But Databricks returned an error:
AnalysisException: Correlated column is not allowed in predicate
I suspect this method isn't the most efficient either.
What is the best approach to tackle this problem?

You can just join the table to itself on the dates.
select
    t1.job_id,
    t1.start_date,
    t1.end_date,
    count(t2.job_id) as jobs_active_at_start
from
    Table1 t1
    inner join Table1 t2
        on t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date
group by
    t1.job_id,
    t1.start_date,
    t1.end_date;
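As a quick sanity check of the self-join idea, here is a small sketch using Python's sqlite3 module instead of Spark. The table name jobs is made up, and the dates from the question are rewritten in ISO form (assuming "2-1-2000" means 2 January 2000) so that text comparison matches date order.

```python
import sqlite3

# Sample rows from the question: (job_id, start_date, end_date), ISO dates assumed.
JOBS = [
    (1, "2000-01-01", "2000-01-02"),
    (2, "2000-01-01", "2000-01-03"),
    (3, "2000-01-02", "2000-01-04"),
    (4, "2000-01-05", "2000-01-07"),
]

# Pair each job (t1) with every job (t2) that covers t1's start date,
# then count the pairs per t1 row.
COUNT_SQL = """
SELECT t1.job_id, COUNT(t2.job_id) AS jobs_active_at_start
FROM jobs AS t1
JOIN jobs AS t2
  ON t2.start_date <= t1.start_date
 AND t2.end_date >= t1.start_date
GROUP BY t1.job_id
ORDER BY t1.job_id
"""


def jobs_active_at_start(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE jobs (job_id INTEGER, start_date TEXT, end_date TEXT)")
        conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", rows)
        return conn.execute(COUNT_SQL).fetchall()
    finally:
        conn.close()


print(jobs_active_at_start(JOBS))
```

Every job matches itself in the join, so the count is always at least 1, which agrees with the expected output in the question.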

Adding in missing dates from results in SQL

I have a database that currently looks like this
Date     | valid_entry | profile
1/6/2015 | 1           | 1
3/6/2015 | 2           | 1
3/6/2015 | 2           | 2
5/6/2015 | 4           | 4
I am trying to grab the dates, but I need the query to also display dates that do not exist in the list, such as 2/6/2015.
This is a sample of what i need it to be:
Date     | valid_entry
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query only displays the dates that exist in the table. Is there a way to populate the results with the dates that do not exist in there?

You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
    select dt::date
    from generate_series(
        (select min(date) from some_table),
        (select max(date) from some_table),
        interval '1' day
    ) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry, 0))
from all_dates ad
    left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?

You have to use a calendar table for this purpose. In this case you can create an inline table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
    SELECT '2015-06-01' AS d UNION ALL SELECT '2015-06-02' UNION ALL SELECT '2015-06-03' UNION ALL
    SELECT '2015-06-04' UNION ALL SELECT '2015-06-05' UNION ALL SELECT '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: The predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead, the LEFT JOIN essentially becomes an INNER JOIN.
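To see the ON versus WHERE difference in practice, here is a minimal sketch with Python's sqlite3 module. SQLite has no built-in generate_series() in a default build, so the calendar is generated with a recursive CTE instead. The table name entries and the column name entry_date are invented for the example, and the sample dates are rewritten as ISO text (assuming d/m/yyyy in the question).

```python
import sqlite3

# Sample rows from the question: (entry_date, valid_entry, profile).
ENTRIES = [
    ("2015-06-01", 1, 1),
    ("2015-06-03", 2, 1),
    ("2015-06-03", 2, 2),
    ("2015-06-05", 4, 4),
]

# The recursive CTE yields one row per day from :first_day to :last_day.
# {profile_filter} is filled with either an extra ON condition or a WHERE clause.
CALENDAR_SQL = """
WITH RECURSIVE all_dates(d) AS (
    SELECT :first_day
    UNION ALL
    SELECT date(d, '+1 day') FROM all_dates WHERE d < :last_day
)
SELECT ad.d, COUNT(e.valid_entry)
FROM all_dates AS ad
LEFT JOIN entries AS e ON e.entry_date = ad.d {profile_filter}
GROUP BY ad.d
ORDER BY ad.d
"""


def counts_per_day(rows, filter_in_on=True):
    """Count profile-1 entries per calendar day, filtering in ON or in WHERE."""
    profile_filter = "AND e.profile = 1" if filter_in_on else "WHERE e.profile = 1"
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE entries (entry_date TEXT, valid_entry INTEGER, profile INTEGER)")
        conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
        first_day, last_day = conn.execute(
            "SELECT MIN(entry_date), MAX(entry_date) FROM entries"
        ).fetchone()
        sql = CALENDAR_SQL.format(profile_filter=profile_filter)
        return conn.execute(sql, {"first_day": first_day, "last_day": last_day}).fetchall()
    finally:
        conn.close()


print(counts_per_day(ENTRIES, filter_in_on=True))   # every day in the range appears
print(counts_per_day(ENTRIES, filter_in_on=False))  # days without a profile-1 row vanish
```

With the filter in the ON clause all five days from 2015-06-01 to 2015-06-05 come back, with a count of 0 for the days without a profile-1 row. With the filter in the WHERE clause only 2015-06-01 and 2015-06-03 remain.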

SQL query where clause for reporting

Let's say I have this table with the following data:
Service_ID Cust_ID Service_Date Next_Service_Date
-----------------------------------------------------
1 15 2016-01-1 2016-01-31
2 21 2016-01-1 2016-01-31
3 15 2016-01-31 2016-03-1
I need a condition that checks whether Next_Service_Date is found in Service_Date for each customer, not across the whole table.
For example, for the customer with id = 15, the Service_Date of Service_ID = 3 is 2016-01-31,
which is the same as the Next_Service_Date of Service_ID = 1.
So the output of the query should be
Service_ID Cust_ID Service_Date Next_Service_Date
------------------------------------------------------
2 21 2016-01-1 2016-01-31
I hope I made everything clear.
Note: I want to show record #2 because that customer has no Service_Date record that matches the date in Next_Service_Date.

If I understand correctly, you want not exists:
select bt.*
from belowtable bt
where not exists (select 1
                  from belowtable bt2
                  where bt2.cust_id = bt.cust_id and
                        (bt2.next_service_date = bt.service_date or
                         bt2.service_date = bt.next_service_date
                        )
                 );
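The NOT EXISTS condition can be checked against the three sample rows with a short sqlite3 sketch. The table name services is hypothetical, and the dates are written in full ISO form (2016-01-01 rather than 2016-01-1) so that equality on text works.

```python
import sqlite3

# Sample rows: (service_id, cust_id, service_date, next_service_date).
SERVICES = [
    (1, 15, "2016-01-01", "2016-01-31"),
    (2, 21, "2016-01-01", "2016-01-31"),
    (3, 15, "2016-01-31", "2016-03-01"),
]

# Keep a row only if no row of the same customer links to it in either
# direction (its service date is someone's next date, or vice versa).
UNLINKED_SQL = """
SELECT bt.service_id
FROM services AS bt
WHERE NOT EXISTS (
    SELECT 1
    FROM services AS bt2
    WHERE bt2.cust_id = bt.cust_id
      AND (bt2.next_service_date = bt.service_date
           OR bt2.service_date = bt.next_service_date)
)
ORDER BY bt.service_id
"""


def unlinked_service_ids(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE services (service_id INTEGER, cust_id INTEGER, "
            "service_date TEXT, next_service_date TEXT)"
        )
        conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?)", rows)
        return [sid for (sid,) in conn.execute(UNLINKED_SQL)]
    finally:
        conn.close()


print(unlinked_service_ids(SERVICES))
```

Only Service_ID 2 survives: rows 1 and 3 belong to customer 15 and point at each other through the date 2016-01-31.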

Edited -- I think I understand your question now -- you want to find all records that should have a next service date scheduled but do not? If so, a combination semi-join and anti-join would do it.
If I'm off the mark, please let me know where I misunderstood.
select
    t1.*
from
    MyTable t1
where exists (
    select null
    from MyTable t2
    where
        t1.Next_Service_Date = t2.Service_Date
)
and not exists (
    select null
    from MyTable t2
    where
        t1.Cust_Id = t2.Cust_Id and
        t1.Next_Service_Date = t2.Service_Date
)

Complex SQL Query (at least for me)

I'm trying to develop a SQL query that will return a list of serial numbers. The table is set up so that whenever a serial number reaches a step, the date and time are entered. When it completes the step, another date and time are entered. I want a query that gives me the list of serial numbers that have entered the step but not exited it. They may enter more than once, so I'm only looking for serial numbers that don't have an exit after an enter.
Ex. (for ease of use, call the table "Table1")
Serial | Step  | Date
1      | enter | 10/1
1      | exit  | 10/2
1      | enter | 10/4
2      | enter | 10/4
3      | enter | 10/5
3      | exit  | 10/6
For the above table, serial numbers 1 and 2 should be retrieved, but 3 should not.
Can this be done in a single query with subqueries?

select Serial from Table1
group by Serial
having count(*) % 2 = 1
This works when there cannot be two 'enter' rows in a row, i.e. each enter is followed by an exit before the next enter (as in the example provided).

Personally I think this is something best done through a change in the way the data is stored. The current method cannot be efficient or effective. Yes, you can mess around and find a way to get the data out. However, what happens when you have multiple entered steps with no exit for the same serial number? It shouldn't happen, but sooner or later it will unless you have code written to prevent it (code which could get complicated to write). It would be cleaner to have a table that stores both the enter and the exit in the same record. Then it becomes trivial (and much faster) to query for those entered but not exited.

This will give you all 'enter' records that don't have an ending 'exit'. If you only want a list of serial numbers you should then also group by serial number and select only that column.
SELECT t1.*
FROM Table1 t1
LEFT JOIN Table1 t2 ON t2.Serial = t1.Serial
    AND t2.Step = 'exit' AND t2.[Date] >= t1.[Date]
WHERE t1.Step = 'enter' AND t2.Serial IS NULL
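The LEFT JOIN ... IS NULL pattern above is an anti-join. Below is a small sketch that runs it on the sample data with Python's sqlite3 module. The table name steps and column name step_date are invented, and since the question gives no year, 2023 is assumed purely so the dates can be stored as sortable ISO text.

```python
import sqlite3

# Sample rows from the question: (serial, step, step_date); the year is an assumption.
STEPS = [
    (1, "enter", "2023-10-01"),
    (1, "exit", "2023-10-02"),
    (1, "enter", "2023-10-04"),
    (2, "enter", "2023-10-04"),
    (3, "enter", "2023-10-05"),
    (3, "exit", "2023-10-06"),
]

# For each 'enter' row, look for an 'exit' of the same serial on or after it.
# Rows where the join found nothing (t2.serial IS NULL) are still open.
OPEN_SQL = """
SELECT DISTINCT t1.serial
FROM steps AS t1
LEFT JOIN steps AS t2
  ON t2.serial = t1.serial
 AND t2.step = 'exit'
 AND t2.step_date >= t1.step_date
WHERE t1.step = 'enter'
  AND t2.serial IS NULL
ORDER BY t1.serial
"""


def open_serials(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE steps (serial INTEGER, step TEXT, step_date TEXT)")
        conn.executemany("INSERT INTO steps VALUES (?, ?, ?)", rows)
        return [serial for (serial,) in conn.execute(OPEN_SQL)]
    finally:
        conn.close()


print(open_serials(STEPS))
```

Serial 1 is returned because of its second enter on 10/4, serial 2 because it never exits, and serial 3 is left out because its only enter is followed by an exit.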

I tested this in MySQL.
SELECT Serial,
COUNT(NULLIF(Step,'enter')) AS exits,
COUNT(NULLIF(Step,'exit')) AS enters
FROM Table1
WHERE Step IN ('enter','exit')
GROUP BY Serial
HAVING enters <> exits
I wasn't sure what the importance of Date was here, but the above could easily be modified to incorporate intraday or across-days requirements.

SELECT DISTINCT Serial
FROM Table1 t
WHERE (SELECT COUNT(*) FROM Table1 t2 WHERE t.Serial = t2.Serial AND t2.Step = 'exit') <
      (SELECT COUNT(*) FROM Table1 t2 WHERE t.Serial = t2.Serial AND t2.Step = 'enter')

SELECT * FROM Table1 T1
WHERE T1.Step = 'enter'
AND NOT EXISTS (
    SELECT * FROM Table1 T2
    WHERE T2.Serial = T1.Serial
    AND T2.Step = 'exit'
    AND T2.Date > T1.Date
)

If you're sure that you've got matching enter and exit values for the ones you don't want, you could look for all the serial values where the count of "enter" is not equal to the count of "exit".

If you're using MS SQL 2005 or 2008, you could use a CTE to get the results you're looking for...
WITH ExitCTE
AS
(SELECT Serial, StepDate
FROM #Table1
WHERE Step = 'exit')
SELECT A.*
FROM #Table1 A LEFT JOIN ExitCTE B ON A.Serial = B.Serial AND B.StepDate > A.StepDate
WHERE A.Step = 'enter'
AND B.Serial IS NULL
If you're not using those, I'd try a subquery instead...
SELECT A.*
FROM #Table1 A LEFT JOIN (SELECT Serial, StepDate
FROM #Table1
WHERE Step = 'exit') B
ON A.Serial = B.Serial AND B.StepDate > A.StepDate
WHERE A.Step = 'enter'
AND B.Serial IS NULL

In Oracle:
SELECT *
FROM (
    SELECT serial,
           CASE
               WHEN MIN(so) < 0 THEN 'Stack overflow'
               WHEN SUM(delta) > 0 THEN 'In'
               ELSE 'Out'
           END AS stack
    FROM (
        -- delta: +1 for enter, -1 for exit; so: running balance per serial
        -- DATE is a reserved word in Oracle, so the column name is quoted here
        SELECT serial,
               DECODE(step, 'enter', 1, 'exit', -1) AS delta,
               SUM(DECODE(step, 'enter', 1, 'exit', -1)) OVER (PARTITION BY serial ORDER BY "Date") AS so
        FROM Table1
    )
    GROUP BY serial
)
WHERE stack = 'In'
This will select what you want AND filter out exits that happened without enters.

Several people have suggested rearranging your data, but I don't see any examples, so I'll take a crack at it. This is a partially-denormalized variant of the same table you've described. It should work well with a limited number of "steps" (this example only takes into account "enter" and "exit", but it could be easily expanded), but its greatest weakness is that adding additional steps after populating the table (say, enter/process/exit) is expensive — you have to ALTER TABLE to do so.
serial enter_date exit_date
------ ---------- ---------
1 10/1 10/2
1 10/4 NULL
2 10/4 NULL
3 10/5 10/6
Your query then becomes quite simple:
SELECT serial,enter_date FROM table1 WHERE exit_date IS NULL;
serial enter_date
------ ----------
1 10/4
2 10/4

Here's a simple query that should work with your scenario
SELECT Serial FROM Table1 t1
WHERE Step='enter'
AND (SELECT Max(Date) FROM Table1 t2 WHERE t2.Serial = t1.Serial) = t1.Date
I've tested this one and this will give you the rows with Serial numbers of 1 & 2
