Complex SQL Query (at least for me) - sql

I'm trying to develop a SQL query that will return a list of serial numbers. The table is set up so that whenever a serial number reaches a step, the date and time are entered. When it completes the step, another date and time are entered. I want a query that gives me the list of serial numbers that have entered the step but not exited it. They may enter more than once, so I'm only looking for serial numbers that have an enter with no exit after it.
Ex. (for ease of use, call the table "Table1")
Serial | Step | Date
1 | enter | 10/1
1 | exit | 10/2
1 | enter | 10/4
2 | enter | 10/4
3 | enter | 10/5
3 | exit | 10/6
For the above table, serial numbers 1 and 2 should be retrieved, but 3 should not.
Can this be done in a single query with subqueries?

select Serial from Table1
group by Serial
having count(*) % 2 = 1
This works only when a serial can never have two 'enter' rows in a row, i.e. each enter is followed by an exit before the next enter (as in the example provided).

Personally I think this is something best done through a change in the way the data is stored. The current method cannot be efficient or effective. Yes, you can mess around and find a way to get the data out. However, what happens when you have multiple entered steps with no exit for the same serial number? It shouldn't happen, but sooner or later it will unless you have code written to prevent it (code which could get complicated to write). It would be cleaner to have a table that stores both the enter and the exit in the same record. Then it becomes trivial (and much faster) to query for the serial numbers that have entered but not exited.

This will give you all 'enter' records that don't have an ending 'exit'. If you only want a list of serial numbers you should then also group by serial number and select only that column.
SELECT t1.*
FROM Table1 t1
LEFT JOIN Table1 t2 ON t2.Serial=t1.Serial
AND t2.Step='Exit' AND t2.[Date] >= t1.[Date]
WHERE t1.Step='Enter' AND t2.Serial IS NULL
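
To check the anti-join against the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for the asker's unspecified database, and the column name step_date and the ISO dates with year 2023 are assumptions made for illustration (the question uses Date and values like 10/1). The query also adds DISTINCT so that only serial numbers come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# step_date is a hypothetical column name; ISO strings sort chronologically
conn.execute("CREATE TABLE Table1 (Serial INTEGER, Step TEXT, step_date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "enter", "2023-10-01"),
        (1, "exit", "2023-10-02"),
        (1, "enter", "2023-10-04"),
        (2, "enter", "2023-10-04"),
        (3, "enter", "2023-10-05"),
        (3, "exit", "2023-10-06"),
    ],
)

# Keep each 'enter' row that has no 'exit' on or after it for the same serial
open_serials = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT t1.Serial
        FROM Table1 t1
        LEFT JOIN Table1 t2
               ON t2.Serial = t1.Serial
              AND t2.Step = 'exit'
              AND t2.step_date >= t1.step_date
        WHERE t1.Step = 'enter'
          AND t2.Serial IS NULL
        ORDER BY t1.Serial
        """
    )
]
print(open_serials)
```

With the sample data this yields serials 1 and 2, which is what the question asks for.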

I tested this in MySQL.
SELECT Serial,
COUNT(NULLIF(Step,'enter')) AS exits,
COUNT(NULLIF(Step,'exit')) AS enters
FROM Table1
WHERE Step IN ('enter','exit')
GROUP BY Serial
HAVING enters <> exits
I wasn't sure what the importance of Date was here, but the above could easily be modified to incorporate intraday or across-days requirements.

SELECT DISTINCT Serial
FROM Table1 t
WHERE (SELECT COUNT(*) FROM Table1 t2 WHERE t.Serial = t2.Serial AND t2.Step = 'exit') <
(SELECT COUNT(*) FROM Table1 t2 WHERE t.Serial = t2.Serial AND t2.Step = 'enter')

SELECT * FROM Table1 T1
WHERE T1.Step = 'enter'
AND NOT EXISTS (
SELECT * FROM Table1 T2
WHERE T2.Serial = T1.Serial
AND T2.Step = 'exit'
AND T2.Date > T1.Date
)

If you're sure that you've got matching enter and exit values for the ones you don't want, you could look for all the serial values where the count of "enter" is not equal to the count of "exit".
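
That count comparison could be sketched as follows. This is an illustration run through Python's sqlite3 module against the question's sample rows, not a query for any specific database; the Date column is left out because the comparison does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Serial INTEGER, Step TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "enter"), (1, "exit"), (1, "enter"),
     (2, "enter"), (3, "enter"), (3, "exit")],
)

# Count enters and exits per serial and keep the serials where they differ
unbalanced = [
    row[0]
    for row in conn.execute(
        """
        SELECT Serial
        FROM Table1
        GROUP BY Serial
        HAVING SUM(CASE WHEN Step = 'enter' THEN 1 ELSE 0 END)
            <> SUM(CASE WHEN Step = 'exit' THEN 1 ELSE 0 END)
        ORDER BY Serial
        """
    )
]
print(unbalanced)
```

Because it ignores the order of events, this only answers the question when enters and exits really do alternate.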

If you're using MS SQL 2005 or 2008, you could use a CTE to get the results you're looking for...
WITH ExitCTE
AS
(SELECT Serial, StepDate
FROM #Table1
WHERE Step = 'exit')
SELECT A.*
FROM #Table1 A LEFT JOIN ExitCTE B ON A.Serial = B.Serial AND B.StepDate > A.StepDate
WHERE A.Step = 'enter'
AND B.Serial IS NULL
If you're not using those, I'd try a subquery instead...
SELECT A.*
FROM #Table1 A LEFT JOIN (SELECT Serial, StepDate
FROM #Table1
WHERE Step = 'exit') B
ON A.Serial = B.Serial AND B.StepDate > A.StepDate
WHERE A.Step = 'enter'
AND B.Serial IS NULL

In Oracle:
SELECT serial, stack
FROM (
SELECT serial,
CASE
WHEN MIN(running) < 0 THEN 'Stack overflow'
WHEN SUM(delta) > 0 THEN 'In'
ELSE 'Out'
END AS stack
FROM (
-- DATE is a reserved word in Oracle, so the column name has to be quoted
SELECT serial,
DECODE(step, 'enter', 1, 'exit', -1) AS delta,
SUM(DECODE(step, 'enter', 1, 'exit', -1)) OVER (PARTITION BY serial ORDER BY "Date") AS running
FROM Table1
)
GROUP BY serial
)
WHERE stack = 'In'
This will select what you want AND filter out exits that happened without enters

Several people have suggested rearranging your data, but I don't see any examples, so I'll take a crack at it. This is a partially-denormalized variant of the same table you've described. It should work well with a limited number of "steps" (this example only takes into account "enter" and "exit", but it could be easily expanded), but its greatest weakness is that adding additional steps after populating the table (say, enter/process/exit) is expensive — you have to ALTER TABLE to do so.
serial enter_date exit_date
------ ---------- ---------
1 10/1 10/2
1 10/4 NULL
2 10/4 NULL
3 10/5 10/6
Your query then becomes quite simple:
SELECT serial,enter_date FROM table1 WHERE exit_date IS NULL;
serial enter_date
------ ----------
1 10/4
2 10/4
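
If the log table already exists, the paired layout can be derived from it. The sketch below is one possible way to do that, run through Python's sqlite3 module: each enter row is paired with the earliest later exit for the same serial. The column name step_date and the ISO dates with year 2023 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# step_date is a hypothetical column name; ISO strings sort chronologically
conn.execute("CREATE TABLE Table1 (Serial INTEGER, Step TEXT, step_date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "enter", "2023-10-01"),
        (1, "exit", "2023-10-02"),
        (1, "enter", "2023-10-04"),
        (2, "enter", "2023-10-04"),
        (3, "enter", "2023-10-05"),
        (3, "exit", "2023-10-06"),
    ],
)

# For every enter row, look up the earliest exit that comes after it
pairs = conn.execute(
    """
    SELECT e.Serial,
           e.step_date AS enter_date,
           (SELECT MIN(x.step_date)
              FROM Table1 x
             WHERE x.Serial = e.Serial
               AND x.Step = 'exit'
               AND x.step_date > e.step_date) AS exit_date
    FROM Table1 e
    WHERE e.Step = 'enter'
    ORDER BY e.Serial, e.step_date
    """
).fetchall()

# Rows whose exit_date is NULL are the ones still inside the step
still_open = [(serial, entered) for serial, entered, exited in pairs if exited is None]
print(pairs)
print(still_open)
```

One caveat of this pairing: if a serial enters twice before exiting once, both enter rows are matched to the same exit.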

Here's a simple query that should work with your scenario
SELECT Serial FROM Table1 t1
WHERE Step='enter'
AND (SELECT Max(Date) FROM Table1 t2 WHERE t2.Serial = t1.Serial) = t1.Date
I've tested this one and this will give you the rows with Serial numbers of 1 & 2

Related

BQ/SQL join two tables in a way that one column fills up with all distinct values from the other table while remaining columns get a null

Hello everyone, this is my first question here. I have been browsing through the questions but couldn't quite find the answer to my problem:
I have a couple of tables which I need to join. The key I join on is non-unique (in this case it's a date). This works fine, but now I also need to group the results based on another column without getting cross-join-like results (meaning each value of this column should appear only once, although the column can hold different values in each table).
Here is an example of what I have and what I would like to get:
Table1
Date/Key | Group Column | Example Value1
01-01-2022 | a | 1
01-01-2022 | d | 2
01-01-2022 | e | 3
01-01-2022 | f | 4
Table 2
Date/Key | Group Column | Example Value 2
01-01-2022 | a | 1
01-01-2022 | b | 2
01-01-2022 | c | 3
01-01-2022 | d | 4
Wanted Result:
Table Result
Date/Key | Group Column | Example Value1 | Example Value2
01-01-2022 | a | 1 | 1
01-01-2022 | b | NULL | 2
01-01-2022 | c | NULL | 3
01-01-2022 | d | 2 | 4
01-01-2022 | e | 3 | NULL
01-01-2022 | f | 4 | NULL
I have tried a couple of approaches, but I always get results where values in the group column appear multiple times. I am under the impression that full joining and then grouping over the group column should work, but apparently I am missing something. I also figured I could brute-force the result by left joining everything with the ON condition set to table1.date = table2.date AND table1.Groupcolumn = table2.Groupcolumn etc., and then doing UNIONs of all permutations (so each table is on "the left" once), but this is not only tedious, BigQuery also doesn't like it since it contains too many subqueries.
I feel kinda bad that my first question is something that I should actually know, but I hope someone can help me out!
I do not need a full code solution; a hint at the correct approach would suffice (also, in case I missed it: if this was already answered, I would appreciate just a link to it!)
Edit:
So one solution I came up with, which appears to work, was to select the group column of each table, union them in a WITH clause, and then join this "list" onto the first table like
list as(Select t1.GroupColumn FROM Table_1 t1 WHERE CONDITION1
UNION DISTINCT Select t1.GroupColumn FROM Table_1 t1 WHERE CONDITION2 ... etc.)
result as (
SELECT s.GroupColumn, t1.Example_Value1, t2.Example_Value2
FROM Table_1 t1
LEFT JOIN( SELECT * FROM list) s
ON S.GroupColumn = t1.GroupColumn
LEFT JOIN Table_2 t2
on S.GroupColumn = t2.GroupColumn
and t1.key = t2.key
...
)
SELECT * FROM result
I think what you are looking for is a FULL OUTER JOIN and then you can coalesce the date and group columns. It doesn't exactly look like you need to group anything based on the example data you posted:
SELECT
coalesce(table1.date_key, table2.date_key) AS date_key,
coalesce(table1.group_column, table2.group_column) AS group_column,
table1.example_value_1,
table2.example_value_2
FROM
table1
FULL OUTER JOIN
table2
USING
(date_key,
group_column)
ORDER BY
date_key,
group_column;
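
To see what such a join returns for the question's sample data, here is a small sketch using Python's sqlite3 module. Older SQLite builds do not support FULL OUTER JOIN, so the sketch swaps in an emulation: a LEFT JOIN from table1, plus the table2 rows that found no partner, combined with UNION ALL. The snake_case column names follow the answer's query and are assumptions about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date_key TEXT, group_column TEXT, example_value_1 INTEGER)")
conn.execute("CREATE TABLE table2 (date_key TEXT, group_column TEXT, example_value_2 INTEGER)")
day = "01-01-2022"
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(day, "a", 1), (day, "d", 2), (day, "e", 3), (day, "f", 4)])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(day, "a", 1), (day, "b", 2), (day, "c", 3), (day, "d", 4)])

# Emulated full outer join on (date_key, group_column)
result = conn.execute(
    """
    SELECT t1.date_key, t1.group_column, t1.example_value_1, t2.example_value_2
    FROM table1 t1
    LEFT JOIN table2 t2
           ON t2.date_key = t1.date_key
          AND t2.group_column = t1.group_column
    UNION ALL
    SELECT t2.date_key, t2.group_column, NULL, t2.example_value_2
    FROM table2 t2
    LEFT JOIN table1 t1
           ON t1.date_key = t2.date_key
          AND t1.group_column = t2.group_column
    WHERE t1.group_column IS NULL
    ORDER BY 1, 2
    """
).fetchall()

for row in result:
    print(row)
```

Each group value appears exactly once because the join key includes the group column, not just the date.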
Consider below simple approach
select * from (
select *, 'example_value1' type from table1 union all
select *, 'example_value2' type from table2
)
pivot (
any_value(example_value1)
for type in ('example_value1', 'example_value2')
)
If applied to the sample data in your question, the output matches the wanted result table.

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
The second table, which comes from a second system, contains all users, their sensitive data, and when they were registered in our system.
Simplified Structure:
Email | InsertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows, and I want to find the row in the 2nd table that is nearest in time, because sometimes the difference may be a few minutes (a delay, or the opposite of a delay) and sometimes it can be a few days.
So for email x I have this row in the 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows, and the latest is the one I want to compute the difference against
Email InsertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote this query, but I have no idea how to match the nearest row in the 2nd table
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120)) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data, and since I don't know which RDBMS and version this is, the suspect areas are the casting of dates and the date math. Consider the following "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the dates; the rows with an RN of 1 win.
WITH CTE AS (
SELECT A.*, B.*, row_number() over (PARTITION BY A.e_mail
ORDER BY abs(datediff(second, cast(Tran_dateTime as Datetime2), cast(InsertDate as Datetime2))) asc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1

Optimize SQL query: how to check if an Id is assigned to more than other Id (it should not)

I have a simple table like
client_id , company_id
100 1
101 1
102 1
200 2
200 2
201 2
For each client_id I should have just one company_id. I need to make a query to check this.
What I was doing is:
SELECT
client_id,
count(DISTINCT company_id) as count
FROM table GROUP BY
client_id
HAVING count > 1;
If it's not empty, it should trigger an alert.
However, I was wondering if I could optimize it a little, because I don't really need to get every row, I just need to know if this query results in AT LEAST one row.
Is it possible?
EXISTS might be faster:
select distinct client_id
from t
where exists (select 1
from t t2
where t2.client_id = t.client_id and t2.company_id <> t.company_id
);
I would expect a performance improvement under two conditions:
There is an index on (client_id, company_id).
Most clients have only one company.
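
Since the question only needs to know whether at least one offending row exists, the EXISTS query can be cut off after its first hit. The sketch below, run through Python's sqlite3 module, is one way to do that; the LIMIT 1, the index name, the helper has_conflict, and the extra conflicting row (200, 3) are additions for illustration and not part of the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (client_id INTEGER, company_id INTEGER)")
# The index the answer recommends for the correlated lookup
conn.execute("CREATE INDEX idx_client_company ON t (client_id, company_id)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(100, 1), (101, 1), (102, 1), (200, 2), (200, 2), (201, 2)],
)

CHECK = """
    SELECT client_id
    FROM t
    WHERE EXISTS (SELECT 1
                  FROM t AS t2
                  WHERE t2.client_id = t.client_id
                    AND t2.company_id <> t.company_id)
    LIMIT 1
"""

def has_conflict(connection):
    """Return True if any client_id is linked to more than one company_id."""
    return connection.execute(CHECK).fetchone() is not None

clean = has_conflict(conn)                      # sample data has no conflict
conn.execute("INSERT INTO t VALUES (200, 3)")   # hypothetical conflicting row
dirty = has_conflict(conn)
print(clean, dirty)
```

The alert would then fire whenever has_conflict returns True.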

How to query records that don't follow expected pattern with SQL

I have a table that may look like this:
ID | ACTION
1 | 'start'
2 | 'stop'
3 | 'start'
4 | 'stop'
5 | 'start'
...| ...
My question is: how can I detect with an SQL query whether the start/stop pattern breaks, such as two stops without a start in between, or vice versa?
I want a query to show all records that break the pattern. You could say that when there are two consecutive start or stop actions, you cannot be sure which of the two records is to blame, so I would like both records to be included in the result.
I know how to do this with VBA, but I do not see a way to use VBA with a query.
Thank you.
You want any row where the next row is the same type or the previous row is the same type.
Assuming the ids have no gaps, you can get the information you need using joins:
select t.*
from (t as t left join
t as tnext
on tnext.id = t.id + 1
) left join
t as tprev
on tprev.id = t.id - 1
where t.action in (tprev.action, tnext.action);
If you have no gaps in your ids:
select * from tablename as t
where
action in (select action from tablename where id in (t.id - 1, t.id + 1))
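
To see which rows the neighbour check flags, here is a minimal sketch using Python's sqlite3 module. The six sample rows, with a double stop at ids 2 and 3 and a double start at ids 4 and 5, are invented for illustration, and the ids are assumed to have no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, action TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, "start"), (2, "stop"), (3, "stop"),
     (4, "start"), (5, "start"), (6, "stop")],
)

# A row breaks the pattern if the row before or after it has the same action
broken = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.id
        FROM tablename AS t
        WHERE t.action IN (SELECT n.action
                           FROM tablename AS n
                           WHERE n.id IN (t.id - 1, t.id + 1))
        ORDER BY t.id
        """
    )
]
print(broken)
```

Both members of each offending pair are reported, which matches the request to list both records.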

SQL - for each entry in a table - check for associated row

I have a log table which logs a start row, and a finish row for a particular event.
Each event should have a start row, and if everything goes ok it should have an end row.
But if something goes wrong then the end row may not be created.
I want to SELECT everything in the table that has a start row but not an associated end row.
For example, consider the table like this:
id event_id event_status
1 123 1
2 123 2
3 234 1
4 234 2
5 456 1
6 678 1
7 678 2
Notice that the row with id 5 has a start row but no end row. Start is an event_status of 1, end is an event_status of 2.
How can I pull back all the event_ids which have a start row but not an end row?
This is for mssql.
You could use a not exists subquery to demand that no other row exists that ends the event:
select *
from YourTable t1
where t1.event_status = 1
and not exists
(
select *
from YourTable t2
where t2.event_id = t1.event_id
and t2.event_status = 2
)
You can try with left self join as below:
select y1.event_id from #yourevents y1 left join #yourevents y2
on y1.event_id = y2.event_id
and y1.event_status = 1
and y2.event_status = 2
where y2.event_id is null
and y1.event_status = 1
In this particular case you could use one of 3 solutions:
Solution 1. The classic
Check if there is no end status
SELECT *
FROM myTable t1
WHERE NOT EXISTS (
SELECT *
FROM myTable t2
WHERE t1.event_id = t2.event_id AND t2.event_status=2
)
Solution 2. Make it pretty. Don't do subqueries with so many parentheses
The same check, but in a more concise and pretty manner
SELECT t1.*
FROM myTable t1
LEFT JOIN myTable t2 ON t1.event_id = t2.event_id AND t2.event_status=2
-- Doesn't exist
WHERE t2.event_id IS NULL
Solution 3. Look for the last status for each event
More flexibility in case the status logic becomes more complicated
WITH last_status AS (
SELECT
id,
event_id,
event_status,
-- The ROWS BETWEEN ..yadda yadda ... FOLLOWING might be unnecessary. Try, check.
last_value(event_status) OVER (PARTITION BY event_id ORDER BY event_status ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_status
FROM myTable
)
SELECT
id,
event_id,
event_status
FROM last_status
WHERE last_status<>2
There are more, with min/max queries and others. Pick what best suits your need for cleanliness, readability and versatility.
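
As a quick check of the NOT EXISTS variant against the question's sample rows, here is a minimal sketch using Python's sqlite3 module. The question targets MSSQL; SQLite is used here only because it ships with Python, and the table name events is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# events is a hypothetical name for the log table
conn.execute("CREATE TABLE events (id INTEGER, event_id INTEGER, event_status INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 123, 1), (2, 123, 2), (3, 234, 1), (4, 234, 2),
     (5, 456, 1), (6, 678, 1), (7, 678, 2)],
)

# Start rows (status 1) for which no end row (status 2) exists
unfinished = [
    row[0]
    for row in conn.execute(
        """
        SELECT t1.event_id
        FROM events t1
        WHERE t1.event_status = 1
          AND NOT EXISTS (SELECT 1
                          FROM events t2
                          WHERE t2.event_id = t1.event_id
                            AND t2.event_status = 2)
        ORDER BY t1.event_id
        """
    )
]
print(unfinished)
```

Only event 456 comes back, the one whose end row is missing in the example.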