Compare table to itself based on datetime and boolean fields - sql

I have a table in which I am trying to find items scheduled for the same start and end time; two Boolean fields indicate a schedule collision. Here is roughly what the table looks like, without the excess columns:
id | RecordNo | Starttime | Endtime | Description | Bool1 | Bool2
Now, these records have different RecordNo, but if two records share the same Description, Starttime, and Endtime, and one record has Bool1 as FALSE while the other has Bool2 as TRUE (or vice versa), that is a schedule collision.
Can someone help me with this query?

For collisions with the exact same starttime and endtime:
with records as (
    select starttime, endtime
    from table
    group by starttime, endtime
    having count(*) > 1
)
select recordno
from table t
inner join records r on t.starttime = r.starttime and t.endtime = r.endtime
But I think you may want overlapping collisions too:
select t1.recordno
from table t1
inner join table t2
on (t1.starttime between t2.starttime and t2.endtime)
or
(t1.endtime between t2.starttime and t2.endtime)
This is a little dangerous, though, because it joins every record in the table to every record in the table, producing rows² intermediate rows. If you have 10 rows in the table it creates a 100-row set before narrowing to the results; for 100 rows it creates 10,000.
Based on your last comment, maybe you want the second approach restricted to matching descriptions, exact times, and differing Booleans, in which case you get back the duplicate transactions:
select t1.recordno
from table t1
inner join table t2
on t1.starttime=t2.starttime
and t1.endtime=t2.endtime
and t1.description=t2.description
and t1.Bool1 != t2.Bool1
and t1.Bool2 != t2.Bool2
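A minimal, runnable sketch of this exact-match collision query, using Python's sqlite3 with made-up sample rows (the table name `schedule`, the 0/1 boolean encoding, and the `t1.RecordNo < t2.RecordNo` pair-deduplication condition are assumptions, not part of the original):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE schedule (
    id INTEGER PRIMARY KEY, RecordNo INTEGER, Starttime TEXT, Endtime TEXT,
    Description TEXT, Bool1 INTEGER, Bool2 INTEGER)""")
con.executemany(
    "INSERT INTO schedule VALUES (?,?,?,?,?,?,?)",
    [
        (1, 100, "2013-01-01 09:00", "2013-01-01 10:00", "Checkup", 0, 0),
        (2, 101, "2013-01-01 09:00", "2013-01-01 10:00", "Checkup", 1, 1),  # collides with id 1
        (3, 102, "2013-01-01 11:00", "2013-01-01 12:00", "Checkup", 0, 0),  # unique slot
    ],
)
rows = con.execute("""
    SELECT t1.RecordNo, t2.RecordNo
    FROM schedule t1
    JOIN schedule t2
      ON  t1.Starttime   = t2.Starttime
      AND t1.Endtime     = t2.Endtime
      AND t1.Description = t2.Description
      AND t1.Bool1 <> t2.Bool1
      AND t1.Bool2 <> t2.Bool2
      AND t1.RecordNo < t2.RecordNo   -- report each colliding pair once
""").fetchall()
print(rows)  # [(100, 101)]
```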

Related

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
and the second table, which comes from a second system, contains all users, their sensitive data, and when they were registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (the first table) and the rows with the users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the row in the 2nd table that is nearest in time, because sometimes the difference is a few minutes (a delay in either direction) and sometimes it is a few days.
So for x email I have row in 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows, and the latest is the one against which I want to count the difference:
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote this query, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
     a.[E_MAIL]
    ,a.[NAME]
    ,a.[TRAN_DATETIME]
    ,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
    ,ABS(CONVERT(INT, CONVERT(datetime, LEFT(a.[TRAN_DATETIME], 10), 120))
       - CONVERT(INT, CONVERT(datetime, LEFT(b.[INSTERTDATE], 10), 120))) AS 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b ON a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and to know the database. The area of suspicion is the casting of dates and the date math; since I don't know what RDBMS and version this is, consider the following "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the dates; the rows with an RN of 1 win.
WITH CTE AS (
    SELECT a.*, b.*,
           ROW_NUMBER() OVER (PARTITION BY a.e_mail
                              ORDER BY ABS(DATEDIFF(second,
                                                    CAST(Tran_dateTime AS datetime),
                                                    CAST(InstertDate AS datetime)))) AS RN
    FROM [crm].[SalesSampleTable] a
    LEFT JOIN [crm].[SensitiveTable] b
        ON a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
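The same nearest-row idea can be sketched in SQLite (via Python's sqlite3), which supports ROW_NUMBER but has no DATEDIFF, so the absolute difference is computed with julianday instead. The table names, the email value, and the sample rows are made up; the original question targets SQL Server:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (email TEXT, name TEXT, tran_datetime TEXT)")
con.execute("CREATE TABLE sensitive (email TEXT, insertdate TEXT)")
con.execute("INSERT INTO sales VALUES ('p@x.eu', 'xxx xxx', '2021-10-04 00:03:09')")
con.executemany("INSERT INTO sensitive VALUES (?,?)", [
    ("p@x.eu", "2021-05-20 19:12:07"),
    ("p@x.eu", "2021-05-20 19:18:48"),
    ("p@x.eu", "2021-10-03 18:32:30"),   # nearest to the transaction
])
rows = con.execute("""
    WITH ranked AS (
        SELECT s.email, s.tran_datetime, b.insertdate,
               ROW_NUMBER() OVER (
                   PARTITION BY s.email
                   ORDER BY ABS(julianday(s.tran_datetime) - julianday(b.insertdate))
               ) AS rn
        FROM sales s
        LEFT JOIN sensitive b ON s.email = b.email
    )
    SELECT email, insertdate FROM ranked WHERE rn = 1
""").fetchall()
print(rows)  # [('p@x.eu', '2021-10-03 18:32:30')]
```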

SQL Server : getting sum of values in "calendar" table without joining

Is it possible to get the sum of value from the calendar_table into the main_table without a join like the one below?
select
date, sum(value)
from
main_table
inner join
calendar_table on start_date <= date and end_date >= date
group by
date
I am trying to avoid a join like this because main_table is a very large table whose rows span very wide start-to-end date ranges, and it is absolutely killing my performance. I have already indexed both tables.
Sample desired results:
+-----------+-------+
| date | total |
+-----------+-------+
| 7-24-2010 | 11 |
+-----------+-------+
Sample tables
calendar_table:
+-----------+-------+
| date | value |
+-----------+-------+
| 7-24-2010 | 5 |
| 7-25-2010 | 6 |
| ... | ... |
| 7-23-2020 | 2 |
| 7-24-2020 | 10 |
+-----------+-------+
main_table:
+------------+-----------+
| start_date | end_date |
+------------+-----------+
| 7-24-2010 | 7-25-2010 |
| 8-1-2011 | 8-5-2011 |
+------------+-----------+
You want the sum in the calendar table. So, I would recommend an "incremental" approach. This starts by unpivoting the data and putting the value as an increment and decrement in the results:
select c.date, c.value as inc
from main_table m join
     calendar_table c
     on m.start_date = c.date
union all
select dateadd(day, 1, c.date), - c.value as inc
from main_table m join
     calendar_table c
     on m.end_date = c.date;
The final step is to aggregate and do a cumulative sum:
select date, sum(inc) as value_on_date,
       sum(sum(inc)) over (order by date) as net_value
from ((select c.date, c.value as inc
       from main_table m join
            calendar_table c
            on m.start_date = c.date
      ) union all
      (select dateadd(day, 1, c.date), - c.value as inc
       from main_table m join
            calendar_table c
            on m.end_date = c.date
      )
     ) c
group by date
order by date;
This processes two rows of data for each row in the main table. Assuming your time spans are typically longer than two days per main row, the resulting data processed should be much smaller, and smaller data implies a faster query.
Here's a cross-apply example to possibly work from.
select main_table.start_date
     , main_table.end_date
     , CalendarTable.ValueSum
from main_table
CROSS APPLY (
    SELECT SUM(value) AS ValueSum
    FROM calendar_table
    WHERE calendar_table.date >= main_table.start_date
      AND calendar_table.date <= main_table.end_date
) AS CalendarTable
You could try something like this ... but be aware, it is still technically 'joined' to the main table. If you look at an execution plan, you will see that there is a join operation of some kind going on.
select
    m.start_date,
    m.end_date,
    (select sum(value) from calendar_table t where m.start_date <= t.date and m.end_date >= t.date) as total
from
    main_table m
The thing about that query is that main_table is not grouped as part of the results. You could do that grouping outside the select, but I don't know what you are trying to achieve; if you are grouping just to get the SUM, then keeping main_table in the group is perhaps superfluous.
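A runnable sketch of the correlated-subquery idea against the question's sample data, using Python's sqlite3. Since main_table has no date column of its own, this returns the sum per start/end range, which is an assumption about the intended output; note the 11 for the first range matches the desired result in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calendar_table (date TEXT, value INTEGER)")
con.execute("CREATE TABLE main_table (start_date TEXT, end_date TEXT)")
con.executemany("INSERT INTO calendar_table VALUES (?,?)", [
    ("2010-07-24", 5), ("2010-07-25", 6), ("2020-07-23", 2), ("2020-07-24", 10),
])
con.executemany("INSERT INTO main_table VALUES (?,?)", [
    ("2010-07-24", "2010-07-25"), ("2011-08-01", "2011-08-05"),
])
rows = con.execute("""
    SELECT m.start_date, m.end_date,
           (SELECT SUM(t.value)
            FROM calendar_table t
            WHERE t.date BETWEEN m.start_date AND m.end_date) AS total
    FROM main_table m
""").fetchall()
print(rows)  # second range has no calendar rows, so its total is NULL/None
```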
As already mentioned, you must perform a join of some sort in order to get data from more than one table in a query.
You did not provide details of the indexes, which are important for performance. I suggest the following indexes to optimize query performance.
For calendar_table, make sure you have a unique clustered index (or primary key) on date. Alternatively, a unique nonclustered index on date with the value column included.
A composite index on the main_table start_date and end_date columns may also be beneficial.
Even with optimal indexes, the query will still take some time against a 500M row table (e.g. a couple of minutes) with no additional filter criteria. If you need results in milliseconds, create an indexed view to materialize the join and aggregation results. Be aware the indexed view will add overhead for inserts/deletes on both tables as well as for updates to the value column in order to keep the index consistent with the underlying data.
Below is an indexed view DDL example.
CREATE VIEW dbo.vw_example
WITH SCHEMABINDING
AS
SELECT
date, sum(value) AS value, COUNT_BIG(*) AS countbig
from
dbo.main_table
inner join
dbo.calendar_table on start_date <= date and end_date >= date
group by
date;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.vw_example(date);
GO
Depending on your SQL Server edition, the optimizer may be able to use the indexed view automatically so your original query can use the view index without changes. Otherwise, query the view directly and specify a NOEXPAND hint:
SELECT date, value AS total
FROM dbo.vw_example WITH (NOEXPAND);
EDIT:
With the query improvement #GordonLinoff suggested, a non-clustered index on the main_table end_date column will help optimize that query.

Simple WHERE clause but keep extracted rows and fill them with null values

I have a table which basically looks like this one:
Date | Criteria
12-04-2016 123
12-05-2016 1234
...
Now I want to select the rows whose 'Criteria' value lies within a given range, but I also want to keep the remaining rows, with their 'Criteria' set to null. So, for example, if I want to select the row with Criteria = 123, my result should look like this:
Date | Criteria
12-04-2016 123
12-05-2016 null
Currently I am using this query to get the result:
SELECT b.date, a.criteria
FROM (SELECT id, date, criteria FROM ABC WHERE criteria > 100 and criteria < 200) a
FULL OUTER JOIN ABC b ON a.id = b.id ORDER BY a.criteria
Someone told me that full outer joins perform very badly. Plus, my table has about 400,000 records and the query is used pretty often. So does anyone have an idea how to speed up my query? By the way, I am using the Oracle 11g database.
Do you just want a case expression?
SELECT date,
(case when criteria > 100 and criteria < 200 then criteria end) as criteria
FROM ABC;
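A quick runnable check of the CASE-expression answer, using Python's sqlite3 with the question's two sample rows (the date column is renamed dt here purely to sidestep SQLite's date() function; that rename is my own choice, not the original schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ABC (dt TEXT, criteria INTEGER)")
con.executemany("INSERT INTO ABC VALUES (?,?)",
                [("2016-04-12", 123), ("2016-05-12", 1234)])
rows = con.execute("""
    SELECT dt,
           CASE WHEN criteria > 100 AND criteria < 200 THEN criteria END AS criteria
    FROM ABC
""").fetchall()
print(rows)  # [('2016-04-12', 123), ('2016-05-12', None)]
```

Because a CASE expression with no ELSE yields NULL, the out-of-range row is kept with a null criteria, with no join at all.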

Troubleshooting SQL Query

I have a Patient activity table that records every activity of the patient right from the time the patient got admitted to the hospital till the patient got discharged. Here is the table command
Create table activity
( activityid int PRIMARY KEY NOT NULL,
calendarid int,
admissionID int,
activitydescription varchar(100),
admitTime datetime,
dischargetime datetime,
foreign key (admissionID) references admission(admissionID)
)
The data looks like this:
activityID calendarid admissionID activitydescription admitTime dischargeTime
1 100 10 Patient Admitted 1/1/2013 10:15 -1
2 100 10 Activity 1 -1 -1
3 100 10 Activity 2 -1 -1
4 100 10 Patient Discharged -1 1/4/2013 13:15
For every calendarID defined, the set of admissionID values repeats. For a given calendarid, the admissionID(s) are unique. For my analysis, I want to write a query that displays admissionid, calendarid, admitTime, and dischargetime.
select admissionId, calendarid, admitTime=
(select distinct admitTime
from activity a1
where a1.admisionID=a.admissionID and a1.calendarID=a.calendarid),
dischargeTime=
(select distinct dischargeTime
from activity a1
where a1.admisionID=a.admissionID and a1.calendarID=a.calendarid)
from activity a
where calendarid=100
When I assign the values individually, it works; otherwise it fails with this message:
Subquery returned more than 1 value.
What am I doing wrong?
DISTINCT does not return 1 row; it returns all distinct rows for the columns you provided in the select clause. That's why you're getting more than one value back from the subquery.
What are you looking for out of the sub-query? If you use TOP 1 instead of DISTINCT, that should work, but it might not be what you're looking for.
Your error message tells a lot. Obviously, one (or both) of your projection subqueries (the SELECT DISTINCT queries) returns more than one value, so the result cannot be assigned to the admitTime or dischargeTime column.
One possibility would be to limit your subqueries to 1 row. However, this error might also indicate a structural problem in your DB design.
Try:
select top 1 admitTime
from activity a1
where a1.admisionID=a.admissionID and a1.calendarID=a.calendarid
or
select admitTime
from activity a1
where a1.admisionID=a.admissionID and a1.calendarID=a.calendarid
limit 1
This should get you what you want, with a bit less of a performance hit than subqueries:
select a1.admissionId
      ,a1.calendarid
      ,a2.admitTime
      ,a3.dischargeTime
from activity a1
left join activity a2
       on a1.calendarid = a2.calendarid
      and a1.admissionID = a2.admissionID
      and a2.admitTime <> -1
left join activity a3
       on a1.calendarid = a3.calendarid
      and a1.admissionID = a3.admissionID
      and a3.dischargeTime <> -1
where a1.calendarid = 100
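A runnable sketch of this self-join against the question's sample rows, using Python's sqlite3. Three adjustments here are my own assumptions: the -1 sentinels are stored as text, the joins also match on admissionID (since admission IDs repeat across calendars), and DISTINCT collapses the repeated a1 rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE activity (
    activityid INTEGER PRIMARY KEY, calendarid INTEGER, admissionID INTEGER,
    activitydescription TEXT, admitTime TEXT, dischargetime TEXT)""")
con.executemany("INSERT INTO activity VALUES (?,?,?,?,?,?)", [
    (1, 100, 10, "Patient Admitted",   "2013-01-01 10:15", "-1"),
    (2, 100, 10, "Activity 1",         "-1",               "-1"),
    (3, 100, 10, "Activity 2",         "-1",               "-1"),
    (4, 100, 10, "Patient Discharged", "-1",               "2013-01-04 13:15"),
])
rows = con.execute("""
    SELECT DISTINCT a1.admissionID, a1.calendarid, a2.admitTime, a3.dischargetime
    FROM activity a1
    LEFT JOIN activity a2
           ON a1.calendarid = a2.calendarid
          AND a1.admissionID = a2.admissionID
          AND a2.admitTime <> '-1'
    LEFT JOIN activity a3
           ON a1.calendarid = a3.calendarid
          AND a1.admissionID = a3.admissionID
          AND a3.dischargetime <> '-1'
    WHERE a1.calendarid = 100
""").fetchall()
print(rows)  # [(10, 100, '2013-01-01 10:15', '2013-01-04 13:15')]
```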
Try this !
select admissionId, calendarid, admitTime=
(select top(1) admitTime
from activity a1
where a1.admisionID=a.admissionID and a1.calendarID=a.calendarid),
dischargeTime=
(select top(1) dischargeTime
from activity a1
where a1.admisionID=a.admissionID and a1.calendarID=a.calendarid)
from activity a
where calendarid=100

mysql query trying to search by alias involving CASES and aggregate functions

I have two tables, left joined. The query is grouped by the left table's ID column. The right table has a date column called close_date. The problem: if any right-table records have not been closed (i.e. have a close_date of 0000-00-00), then I do not want any of the left-table records shown; and if there are no right-table records with a close_date of 0000-00-00, I want only the right-table record with the MAX close_date returned.
So for simplicity sake, let's say the tables look like this:
Table1
id
1
2
Table2
table1_id | close_date
1 | 0000-00-00
1 | 2010-01-01
2 | 2010-01-01
2 | 2010-01-02
I would like the query to only return this:
Table1.id | Table2.close_date
2 | 2010-01-02
I tried to come up with an answer using aliased CASEs and aggregate functions, but I could not search by the result, and I was trying to avoid a three-mile-long query. I looked through a few related posts on here, but none seems to meet the criteria of this particular case.
Use:
SELECT t1.id,
       MAX(t2.close_date)
  FROM TABLE1 t1
  JOIN TABLE2 t2 ON t2.table1_id = t1.id
 WHERE NOT EXISTS(SELECT NULL
                    FROM TABLE2 t
                   WHERE t.table1_id = t1.id
                     AND t.close_date = '0000-00-00')
 GROUP BY t1.id
The '0000-00-00' should be implicitly converted by MySQL to a DATETIME. If not, cast the value to DATETIME.
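A runnable check of the NOT EXISTS approach against the question's sample data, using Python's sqlite3 as a stand-in for MySQL (dates stored as text here; the GROUP BY t1.id, which MySQL's ONLY_FULL_GROUP_BY mode would also require, is an addition to the answer as posted):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (id INTEGER)")
con.execute("CREATE TABLE Table2 (table1_id INTEGER, close_date TEXT)")
con.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO Table2 VALUES (?,?)", [
    (1, "0000-00-00"),   # id 1 has an unclosed record, so it is excluded
    (1, "2010-01-01"),
    (2, "2010-01-01"),
    (2, "2010-01-02"),
])
rows = con.execute("""
    SELECT t1.id, MAX(t2.close_date)
      FROM Table1 t1
      JOIN Table2 t2 ON t2.table1_id = t1.id
     WHERE NOT EXISTS (SELECT 1 FROM Table2 t
                        WHERE t.table1_id = t1.id
                          AND t.close_date = '0000-00-00')
     GROUP BY t1.id
""").fetchall()
print(rows)  # [(2, '2010-01-02')]
```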
Try:
select table1_id, close_date from table2
where close_date = (select max(close_date) from table2) or close_date = '0000-00-00'