Troubleshooting SQL Query - sql

I have a Patient activity table that records every activity of the patient right from the time the patient got admitted to the hospital till the patient got discharged. Here is the table command
Create table activity
( activityid int PRIMARY KEY NOT NULL,
calendarid int,
admissionID int,
activitydescription varchar(100),
admitTime datetime,
dischargetime datetime,
foreign key (admissionID) references admission(admissionID)
)
The data looks like this:
activityID calendarid admissionID activitydescription admitTime dischargeTime
1 100 10 Patient Admitted 1/1/2013 10:15 -1
2 100 10 Activity 1 -1 -1
3 100 10 Activity 2 -1 -1
4 100 10 Patient Discharged -1 1/4/2013 13:15
For every calendarID defined, the set of admissionID values repeats. For a given calendarid, the admissionID values are unique. For my analysis, I want to write a query that displays admissionID, calendarid, admitTime and dischargeTime.
select admissionId, calendarid, admitTime=
(select distinct admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid),
dischargeTime=
(select distinct dischargeTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid)
from activity a
where calendarid=100
When I hard-code the ID values individually, it works; otherwise it comes up with this message:
Subquery returned more than 1 value.
What am I doing wrong?

DISTINCT does not return 1 row, it returns all distinct rows given the columns you provided in the select clause. That's why you're getting more than one value back from the subquery.
What are you looking for out of the sub-query? If you use TOP 1 instead of DISTINCT, that should work, but it might not be what you're looking for.
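To see why DISTINCT does not help here, this is a minimal sketch that loads the four sample rows from the question into an in-memory SQLite database (an assumption made only so the example is runnable; the question appears to use SQL Server) and runs the inner query on its own. The times are stored as text and -1 is kept as the "no value" marker.

```python
import sqlite3

# In-memory copy of the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity (
        activityid INTEGER PRIMARY KEY,
        calendarid INTEGER,
        admissionID INTEGER,
        activitydescription TEXT,
        admitTime TEXT,
        dischargeTime TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 10, "Patient Admitted", "1/1/2013 10:15", "-1"),
        (2, 100, 10, "Activity 1", "-1", "-1"),
        (3, 100, 10, "Activity 2", "-1", "-1"),
        (4, 100, 10, "Patient Discharged", "-1", "1/4/2013 13:15"),
    ],
)


def distinct_admit_times(admission_id, calendar_id):
    """Run the inner SELECT DISTINCT for one admission/calendar pair."""
    rows = conn.execute(
        "SELECT DISTINCT admitTime FROM activity "
        "WHERE admissionID = ? AND calendarid = ?",
        (admission_id, calendar_id),
    ).fetchall()
    return sorted(r[0] for r in rows)


values = distinct_admit_times(10, 100)
print(values)  # two distinct values: the -1 marker and the real admit time
```

Because the subquery yields two distinct values for the same admission, it cannot be used where a single value is expected.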

Your error message tells you a lot. One (or both) of your projection subqueries (the SELECT DISTINCT queries) returns more than one value, so it cannot be assigned to the single-valued columns admitTime and dischargeTime.
One possibility would be to limit your subqueries to 1 row. However, this error might also indicate a structural problem in your DB design.
Try (SQL Server syntax):
select top 1 admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid
or (MySQL/PostgreSQL syntax):
select admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid
limit 1

This should get you what you want, with a bit less of a performance hit than subqueries:
select distinct a1.admissionId
,a1.calendarid
,a2.admitTime
,a3.dischargeTime
from activity a1
left join activity a2
on a1.calendarid = a2.calendarid
and a1.admissionID = a2.admissionID
and a2.admitTime <> -1
left join activity a3
on a1.calendarid = a3.calendarid
and a1.admissionID = a3.admissionID
and a3.dischargeTime <> -1
where a1.calendarid=100

Try this:
select admissionId, calendarid, admitTime=
(select top(1) admitTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid),
dischargeTime=
(select top(1) dischargeTime
from activity a1
where a1.admissionID=a.admissionID and a1.calendarID=a.calendarid)
from activity a
where calendarid=100
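One caveat with limiting a subquery to a single row: without a filter or an ORDER BY, the row that comes back may be one of the -1 placeholders. The sketch below is a hedged variant in SQLite syntax (LIMIT 1 instead of TOP 1, times stored as text). The extra `<> '-1'` filter and the outer DISTINCT are my additions, not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (activityid INTEGER PRIMARY KEY, calendarid INTEGER, "
    "admissionID INTEGER, activitydescription TEXT, admitTime TEXT, dischargeTime TEXT)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 10, "Patient Admitted", "1/1/2013 10:15", "-1"),
        (2, 100, 10, "Activity 1", "-1", "-1"),
        (3, 100, 10, "Activity 2", "-1", "-1"),
        (4, 100, 10, "Patient Discharged", "-1", "1/4/2013 13:15"),
    ],
)

# Each scalar subquery skips the -1 placeholders and returns at most one row.
# DISTINCT collapses the four activity rows into one row per admission.
QUERY = """
SELECT DISTINCT a.admissionID, a.calendarid,
       (SELECT a1.admitTime FROM activity a1
         WHERE a1.admissionID = a.admissionID
           AND a1.calendarid = a.calendarid
           AND a1.admitTime <> '-1'
         LIMIT 1) AS admitTime,
       (SELECT a1.dischargeTime FROM activity a1
         WHERE a1.admissionID = a.admissionID
           AND a1.calendarid = a.calendarid
           AND a1.dischargeTime <> '-1'
         LIMIT 1) AS dischargeTime
FROM activity a
WHERE a.calendarid = ?
"""

rows = conn.execute(QUERY, (100,)).fetchall()
print(rows)
```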

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have the customer email, first and last name, and the date of the transaction.
The second table comes from a second system and contains all users, their sensitive data, and the time they were registered in our system.
Simplified structure:
Email | InsertDate (varchar)
My task is to compute the difference in minutes between the rows inserted from sales (first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows per user, and I want to find the row whose insert time is nearest to the sale, because sometimes the difference is a few minutes (a delay, or the opposite of a delay) and sometimes it is a few days.
So for a given email I have this row in the first table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the second table, and the latest is the one I want to compute the difference against:
Email InsertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote the following query, but I have no idea how to match the nearest row in the second table:
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120)) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and a database to try it on. The suspect areas are the casting of the dates and the date math, since I don't know which RDBMS and version this is, so treat the following as pseudocode.
We assign a row number ordered by the absolute difference in seconds between the two dates; the rows with RN = 1 are the nearest matches.
WITH CTE AS (
SELECT A.*, B.*, row_number() over (PARTITION BY A.e_mail, A.Tran_dateTime
ORDER BY abs(datediff(second, cast(A.Tran_dateTime as Datetime), cast(B.InsertDate as DateTime))) asc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
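Since the answer is untested, here is a small runnable sketch of the same row-numbering idea against an in-memory SQLite database. The table names `sales` and `sensitive`, the e-mail addresses, and the second (unmatched) sale are hypothetical stand-ins. SQLite has no datediff, so the difference in seconds is computed with strftime('%s', ...), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (e_mail TEXT, name TEXT, tran_datetime TEXT);
    CREATE TABLE sensitive (email TEXT, insertdate TEXT);
    INSERT INTO sales VALUES
        ('p@example.eu', 'xxx xxx', '2021-10-04 00:03:09'),
        ('q@example.eu', 'yyy yyy', '2021-10-05 12:00:00');
    INSERT INTO sensitive VALUES
        ('p@example.eu', '2021-05-20 19:12:07'),
        ('p@example.eu', '2021-05-20 19:18:48'),
        ('p@example.eu', '2021-10-03 18:32:30');
    """
)

# Step 1 (paired): every sale joined to every candidate row, with the gap in seconds.
# Step 2 (ranked): number the candidates per sale, smallest gap first.
# Step 3: keep only the nearest candidate (rn = 1).
NEAREST = """
WITH paired AS (
    SELECT s.e_mail, s.tran_datetime, u.insertdate,
           ABS(CAST(strftime('%s', s.tran_datetime) AS INTEGER)
             - CAST(strftime('%s', u.insertdate) AS INTEGER)) AS diff_seconds
    FROM sales s
    LEFT JOIN sensitive u ON u.email = s.e_mail
),
ranked AS (
    SELECT paired.*,
           ROW_NUMBER() OVER (PARTITION BY e_mail, tran_datetime
                              ORDER BY diff_seconds) AS rn
    FROM paired
)
SELECT e_mail, tran_datetime, insertdate, diff_seconds
FROM ranked
WHERE rn = 1
ORDER BY e_mail
"""

nearest = conn.execute(NEAREST).fetchall()
for row in nearest:
    print(row)
```

The sale without any matching user row is still returned, with NULL for the insert date and the difference, because of the LEFT JOIN.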

Eliminate NULL records in distinct select statement

In SQL SERVER 2008
Relation : Employee
empid clock-in clock-out date Cmpid
1 10 11 17-06-2015 001
1 11 12 17-06-2015 NULL
1 12 1 NULL 001
2 10 11 NULL 002
2 11 12 NULL 002
I need to populate table temp :
insert into temp
select distinct empid,date from employee
This gives all 3 records since they are distinct, but what I need is:
empid date CMPID
1 17-06-2015 001
2 NULL 002
Depending on the size and scope of your table, it might just be more prudent to add
WHERE columnName is not null AND columnName2 is not null to the end of your query.
NULL is different from any other date value. If you want to exclude the NULL records, you have to add an AND condition like table.field is not null.
It sounds like what you want is a result table containing a row or tuple (relational databases don't have records) for every employee, with a date column showing the date on which they worked, or null if they didn't work. Right?
Something like this should do you:
select master.empid,
detail.date
from ( select distinct
empid
from employee
) master
left join employee detail on detail.empid = master.empid
and detail.date is not null
The master virtual table gives you the set of distinct employees; detail gives you the employee rows with non-null dates on which they worked. The left join gives you everything from master with any matches from detail blended in.
Rows in master with no matching rows in detail are returned once, with the columns contributed by detail set to null. Rows in master with matching rows in detail are repeated once for each such match, with the detail columns reflecting the matching row's values.
This will give you the lowest date or null for each empid
SELECT empid,
MIN(date) date,
MIN(cmpid) cmpid
FROM employee
GROUP BY empid
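A quick way to check the GROUP BY version is to run it on the sample rows. The sketch below uses an in-memory SQLite database rather than SQL Server 2008 (an assumption made only so it is runnable), with the columns renamed to clock_in and clock_out and the dates and company IDs stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid INTEGER, clock_in INTEGER, clock_out INTEGER, "
    "date TEXT, cmpid TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 11, "17-06-2015", "001"),
        (1, 11, 12, "17-06-2015", None),
        (1, 12, 1, None, "001"),
        (2, 10, 11, None, "002"),
        (2, 11, 12, None, "002"),
    ],
)

# MIN() skips NULLs, and yields NULL only when every value in the group is NULL.
per_employee = conn.execute(
    """
    SELECT empid, MIN(date) AS date, MIN(cmpid) AS cmpid
    FROM employee
    GROUP BY empid
    ORDER BY empid
    """
).fetchall()
print(per_employee)
```

This gives one row per empid, which matches the result the question asks for.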
Try this:
select distinct empid,date from employee where date is not null

Trouble performing Postgres group by non-ID column to get ID containing max value

I'm attempting to perform a GROUP BY on a join table. The join table essentially looks like:
CREATE TABLE user_foos (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL,
foo_id INT NOT NULL,
effective_at TIMESTAMP NOT NULL
);
ALTER TABLE user_foos
ADD CONSTRAINT user_foos_uniqueness
UNIQUE (user_id, foo_id, effective_at);
I'd like to query this table to find all records where the effective_at is the max value for any pair of user_id, foo_id given. I've tried the following:
SELECT "user_foos"."id",
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
Unfortunately, this results in the error:
column "user_foos.id" must appear in the GROUP BY clause or be used in an aggregate function
I understand that the problem relates to "id" not being used in an aggregate function and that the DB doesn't know what to do if it finds multiple records with differing IDs, but I know this could never happen because of my unique constraint across those three columns (user_id, foo_id, and effective_at).
To work around this, I also tried a number of other variants such as using the first_value window function on the id:
SELECT first_value("user_foos"."id"),
"user_foos"."user_id",
"user_foos"."foo_id",
max("user_foos"."effective_at")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id";
and:
SELECT first_value("user_foos"."id")
FROM "user_foos"
GROUP BY "user_foos"."user_id", "user_foos"."foo_id"
HAVING "user_foos"."effective_at" = max("user_foos"."effective_at")
Unfortunately, these both result in a different error:
window function call requires an OVER clause
Ideally, my goal is to fetch ALL matching id's so that I can use it in a subquery to fetch the legitimate full row data from this table for matching records. Can anyone provide insight on how I can get this working?
Postgres has a very nice feature called distinct on, which can be used in this case:
SELECT DISTINCT ON (uf."user_id", uf."foo_id") uf.*
FROM "user_foos" uf
ORDER BY uf."user_id", uf."foo_id", uf."effective_at" DESC;
It returns the first row in a group, based on the values in parentheses. The order by clause needs to include these values as well as a third column for determining which is the first row in the group.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() OVER( partition by user_id, foo_id ORDER BY effective_at DESC ) x
FROM user_foos t
) ranked
WHERE x = 1
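The row_number() approach is portable beyond Postgres, so it can be tried quickly without a server. This is a minimal sketch against an in-memory SQLite database (3.25 or newer for window functions). The five sample rows and the ISO date strings are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_foos (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "foo_id INTEGER NOT NULL, effective_at TEXT NOT NULL, "
    "UNIQUE (user_id, foo_id, effective_at))"
)
conn.executemany(
    "INSERT INTO user_foos VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2001-01-01"),
        (2, 1, 1, "2002-01-01"),
        (3, 1, 1, "2003-01-01"),
        (4, 1, 2, "2001-01-01"),
        (5, 2, 1, "2001-01-01"),
    ],
)

# Number the rows inside each (user_id, foo_id) group, newest first,
# then keep only the first row of every group.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY user_id, foo_id
                                      ORDER BY effective_at DESC) AS x
            FROM user_foos t
        ) ranked
        WHERE x = 1
        ORDER BY id
        """
    )
]
print(latest_ids)
```

Selecting only `id` in the outer query gives the list of matching IDs that the question wants to feed into another query.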
If you don't want to use a subquery that joins on a composite of all three keys, you need a "dense rank" window function column that ranks the rows within each (user_id, foo_id) group by effective date, descending. Then wrap that in a subquery and keep the records where rank_order = 1. Since the ranking is by effective date, you get all fields of the record with the highest effective date for each foo and user.
DATASET
id user_id foo_id effective_at
1 1 1 01/01/2001
2 1 1 01/01/2002
3 1 1 01/01/2003
4 1 2 01/01/2001
5 2 1 01/01/2001
DATASET WITH RANK ORDER PARTITIONED BY FOO_ID, USER_ID ORDERED BY DATE DESC
id rank_order user_id foo_id effective_at
1 3 1 1 01/01/2001
2 2 1 1 01/01/2002
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001
SELECT * FROM QUERY ABOVE WHERE RANK_ORDER=1
id rank_order user_id foo_id effective_at
3 1 1 1 01/01/2003
4 1 1 2 01/01/2001
5 1 2 1 01/01/2001

Select rows in one table, adding column where MAX(Date) of rows in other, related table

I have a table containing a set of tasks to perform:
Task
ID Name
1 Washing Up
2 Hoovering
3 Dusting
The user can add one or more Notes to a Note table. Each note is associated with a task:
Note
ID ID_Task Completed(%) Date
11 1 25 05/07/2013 14:00
12 1 50 05/07/2013 14:30
13 1 75 05/07/2013 15:00
14 3 20 05/07/2013 16:00
15 3 60 05/07/2013 17:30
I want a query that will select the Task ID, Name and its % complete, which should be zero if there aren't any notes for it. The query should return:
ID Name Completed (%)
1 Washing Up 75
2 Hoovering 0
3 Dusting 60
I've really been struggling with the query for this, which I've read is a "greatest n per group" type problem, of which there are many examples on SO, none of which I can apply to my case (or at least fully understand). My intuition was to start by finding the MAX(Date) for each task in the note table:
SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task
Annoyingly, I can't just add "Completed (%)" to the above query unless it's contained in the GROUP BY clause. Argh! I'm not sure how to jump through this hoop in order to somehow get the task table rows with the column appended to them. Here is my pathetic attempt, which fails because it only returns tasks with notes and then duplicates the task records at that (one for each note, so it's a complete fail).
SELECT Task.ID,
Task.Name,
Note.Complete
FROM
Task
JOIN
(SELECT ID_Task,
MAX(Date) AS Date
FROM
Note
GROUP BY
ID_Task) AS InnerNote
ON
Task.ID = InnerNote.ID_Task
JOIN
Note
ON
Task.ID = Note.ID_Task
Can anyone help me please?
If we assume that tasks only become more complete, you can do this with a left outer join and aggregation:
select t.ID, t.Name, coalesce(max(n.complete), 0) as complete
from Task t left outer join
Note n
on t.id = n.id_task
group by t.id, t.name
If tasks can become "less complete" then you want the one with the last date. For this, you can use row_number():
select t.ID, t.Name, coalesce(n.complete, 0) as complete
from Task t left outer join
(select n.*, row_number() over (partition by id_task order by date desc) as seqnum
from Note n
) n
on t.id = n.id_task and n.seqnum = 1;
In this case, you don't need a group by, because the seqnum = 1 performs the same role.
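The row_number() variant can be checked against the sample data from the question. The sketch below assumes an in-memory SQLite database (3.25 or newer), a column named Completed, and dates rewritten as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Task (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Note (ID INTEGER PRIMARY KEY, ID_Task INTEGER,
                       Completed INTEGER, Date TEXT);
    INSERT INTO Task VALUES (1, 'Washing Up'), (2, 'Hoovering'), (3, 'Dusting');
    INSERT INTO Note VALUES
        (11, 1, 25, '2013-07-05 14:00'),
        (12, 1, 50, '2013-07-05 14:30'),
        (13, 1, 75, '2013-07-05 15:00'),
        (14, 3, 20, '2013-07-05 16:00'),
        (15, 3, 60, '2013-07-05 17:30');
    """
)

# seqnum = 1 marks the most recent note of each task; tasks without
# notes survive the LEFT JOIN and fall back to 0 via COALESCE.
progress = conn.execute(
    """
    SELECT t.ID, t.Name, COALESCE(latest.Completed, 0) AS Completed
    FROM Task t
    LEFT JOIN (
        SELECT n.*,
               ROW_NUMBER() OVER (PARTITION BY ID_Task ORDER BY Date DESC) AS seqnum
        FROM Note n
    ) latest
      ON t.ID = latest.ID_Task AND latest.seqnum = 1
    ORDER BY t.ID
    """
).fetchall()
print(progress)
```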
How about this: just take the max of completed and group by the task ID:
SELECT t.ID, t.`name`, MAX(n.completed) AS completed
FROM `task` t LEFT JOIN `note` n ON ( n.ID_Task = t.ID )
GROUP BY t.ID, t.`name`
Or, to return 0 instead of NULL for tasks without notes:
SELECT t.ID, t.`name`,
(CASE WHEN MAX(n.completed) IS NULL THEN 0 ELSE MAX(n.completed) END) AS completed
FROM `task` t LEFT JOIN `note` n ON ( n.ID_Task = t.ID )
GROUP BY t.ID, t.`name`
select a.ID,
a.Name,
isnull((select completed
from Note
where ID_Task = b.ID_Task
and Date = b.date),0)
from Task a
LEFT OUTER JOIN (select ID_Task,
max(date) date
from Note
group by ID_Task) b
ON a.ID = b.ID_Task;

Complex SQL Query (at least for me)

I'm trying to develop a SQL query that will return a list of serial numbers. The table is set up so that whenever a serial number reaches a step, the date and time are entered. When it completes the step, another date and time are entered. I want a query that gives me the list of serial numbers that have entered the step but not exited it. They may enter more than once, so I'm only looking for serial numbers that don't have an exit after an enter.
Example (for ease of use, call the table "Table1"):
Serial | Step | Date
1 | enter | 10/1
1 | exit | 10/2
1 | enter | 10/4
2 | enter | 10/4
3 | enter | 10/5
3 | exit | 10/6
For the above table, serial numbers 1 and 2 should be retrieved, but 3 should not.
Can this be done in a single query with subqueries?
select Serial from Table1
group by Serial
having count(*) % 2 = 1
This works when there cannot be two 'enter' rows in a row for a serial, i.e. each enter is followed by an exit before the next enter (as in the example provided).
Personally I think this is something best done through a change in the way the data is stored. The current method cannot be efficient or effective. Yes, you can mess around and find a way to get the data out. However, what happens when you have multiple entered steps with no exit for the same serial number? Yeah, it shouldn't happen, but sooner or later it will unless you have code written to prevent it (code which could get complicated to write). It would be cleaner to have a table that stores both the enter and the exit in the same record. Then it becomes trivial to query (and much faster) in order to find those entered but not exited.
This will give you all 'enter' records that don't have an ending 'exit'. If you only want a list of serial numbers you should then also group by serial number and select only that column.
SELECT t1.*
FROM Table1 t1
LEFT JOIN Table1 t2 ON t2.Serial=t1.Serial
AND t2.Step='Exit' AND t2.[Date] >= t1.[Date]
WHERE t1.Step='Enter' AND t2.Serial IS NULL
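To make the anti-join concrete, this sketch runs it on the sample rows in an in-memory SQLite database. The year 2023 in the dates is an assumption (the question only gives month/day), and the step values are compared in lowercase because SQLite string comparison is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Serial INTEGER, Step TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "enter", "2023-10-01"),
        (1, "exit", "2023-10-02"),
        (1, "enter", "2023-10-04"),
        (2, "enter", "2023-10-04"),
        (3, "enter", "2023-10-05"),
        (3, "exit", "2023-10-06"),
    ],
)

# For every 'enter' row, look for an 'exit' of the same serial on or after it.
# Rows where the LEFT JOIN found nothing (t2.Serial IS NULL) are still open.
open_entries = conn.execute(
    """
    SELECT t1.Serial, t1.Date
    FROM Table1 t1
    LEFT JOIN Table1 t2
           ON t2.Serial = t1.Serial
          AND t2.Step = 'exit'
          AND t2.Date >= t1.Date
    WHERE t1.Step = 'enter' AND t2.Serial IS NULL
    ORDER BY t1.Serial
    """
).fetchall()
print(open_entries)
```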
I tested this in MySQL.
SELECT Serial,
COUNT(NULLIF(Step,'enter')) AS exits,
COUNT(NULLIF(Step,'exit')) AS enters
FROM Table1
WHERE Step IN ('enter','exit')
GROUP BY Serial
HAVING enters <> exits
I wasn't sure what the importance of Date was here, but the above could easily be modified to incorporate intraday or across-days requirements.
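The counting trick relies on COUNT ignoring NULLs: NULLIF(Step,'enter') is NULL for enter rows, so counting it counts the exits, and the other way round. Here is a small sketch of that on the sample rows, assuming an in-memory SQLite database and a made-up year for the dates. The HAVING clause repeats the expressions instead of using the aliases, to stay portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Serial INTEGER, Step TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "enter", "2023-10-01"),
        (1, "exit", "2023-10-02"),
        (1, "enter", "2023-10-04"),
        (2, "enter", "2023-10-04"),
        (3, "enter", "2023-10-05"),
        (3, "exit", "2023-10-06"),
    ],
)

# Result columns: Serial, number of exits, number of enters.
unbalanced = conn.execute(
    """
    SELECT Serial,
           COUNT(NULLIF(Step, 'enter')) AS exits,
           COUNT(NULLIF(Step, 'exit'))  AS enters
    FROM Table1
    WHERE Step IN ('enter', 'exit')
    GROUP BY Serial
    HAVING COUNT(NULLIF(Step, 'exit')) <> COUNT(NULLIF(Step, 'enter'))
    ORDER BY Serial
    """
).fetchall()
print(unbalanced)
```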
SELECT DISTINCT Serial
FROM Table1 t
WHERE (SELECT COUNT(*) FROM Table1 t2 WHERE t.Serial = t2.Serial AND t2.Step = 'exit') <
(SELECT COUNT(*) FROM Table1 t2 WHERE t.Serial = t2.Serial AND t2.Step = 'enter')
SELECT * FROM Table1 T1
WHERE T1.Step = 'enter'
AND NOT EXISTS (
SELECT * FROM Table1 T2
WHERE T2.Serial = T1.Serial
AND T2.Step = 'exit'
AND T2.Date > T1.Date
)
If you're sure that you've got matching enter and exit values for the ones you don't want, you could look for all the serial values where the count of "enter" is not equal to the count of "exit".
If you're using MS SQL 2005 or 2008, you could use a CTE to get the results you're looking for...
WITH ExitCTE
AS
(SELECT Serial, StepDate
FROM #Table1
WHERE Step = 'exit')
SELECT A.*
FROM #Table1 A LEFT JOIN ExitCTE B ON A.Serial = B.Serial AND B.StepDate > A.StepDate
WHERE A.Step = 'enter'
AND B.Serial IS NULL
If you're not using those, I'd try a subquery instead...
SELECT A.*
FROM #Table1 A LEFT JOIN (SELECT Serial, StepDate
FROM #Table1
WHERE Step = 'exit') B
ON A.Serial = B.Serial AND B.StepDate > A.StepDate
WHERE A.Step = 'enter'
AND B.Serial IS NULL
In Oracle:
SELECT *
FROM (
SELECT serial,
CASE
WHEN MIN(so) < 0 THEN 'Stack overflow'
WHEN SUM(delta) > 0 THEN 'In'
ELSE 'Out'
END AS stack
FROM (
-- "Date" is quoted because DATE is a reserved word in Oracle
SELECT serial, DECODE(step, 'enter', 1, 'exit', -1) AS delta, SUM(DECODE(step, 'enter', 1, 'exit', -1)) OVER (PARTITION BY serial ORDER BY "Date") AS so
FROM Table1
)
GROUP BY serial
)
WHERE stack = 'In'
This will select what you want AND filter out exits that happened without enters
Several people have suggested rearranging your data, but I don't see any examples, so I'll take a crack at it. This is a partially-denormalized variant of the same table you've described. It should work well with a limited number of "steps" (this example only takes into account "enter" and "exit", but it could be easily expanded), but its greatest weakness is that adding additional steps after populating the table (say, enter/process/exit) is expensive — you have to ALTER TABLE to do so.
serial enter_date exit_date
------ ---------- ---------
1 10/1 10/2
1 10/4 NULL
2 10/4 NULL
3 10/5 10/6
Your query then becomes quite simple:
SELECT serial,enter_date FROM table1 WHERE exit_date IS NULL;
serial enter_date
------ ----------
1 10/4
2 10/4
Here's a simple query that should work with your scenario
SELECT Serial FROM Table1 t1
WHERE Step='enter'
AND (SELECT Max(Date) FROM Table1 t2 WHERE t2.Serial = t1.Serial) = t1.Date
I've tested this one and this will give you the rows with Serial numbers of 1 & 2