Optimizing a troublesome query - sql

I'm generating a PDF that contains a table, via PHP, from two MySQL tables. On larger tables the script eats up a lot of memory, and this is starting to become a problem.
My first table contains "inspections." There are many rows per day. This has a many to one relationship with the user table.
Table "inspections"
id (int)
area (varchar) - one of 8 "areas", e.g. Concrete, Soils, Earthwork
inspection_date (int) - Unix timestamp
inspection_agent_1 (int) - a user id
inspection_agent_2 (int) - a user id
inspection_agent_3 (int) - a user id
The second table holds the users' info. All I need from it is the name, joined to each of the inspection_agent_x columns.
id
name
The final table, which is going to be in the PDF, needs to organize the data:
by day
by user, find every "area" that the user "inspected" on that day
Concrete
Soils
Earthwork
1/18/2011
Jon Doe
X
Jane Doe
X
X
And so on for each day. Right now I'm just doing a simple join on the names and then organizing everything on the code end. I know I'm leaving a lot on the table as far as the queries go; I just can't think of a way to do it.
Thanks for any and all help.

Select U.name
, user_inspections.inspection_date
, Min( Case When user_inspections.area = 'Concrete' Then 'X' End ) As Concrete
, Min( Case When user_inspections.area = 'Soils' Then 'X' End ) As Soils
, Min( Case When user_inspections.area = 'Earthwork' Then 'X' End ) As Earthwork
From users As U
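Join (
-- One row per (inspection, agent); column names follow the schema in the question.
-- inspection_date is a Unix timestamp there, so wrap it in Date(From_UnixTime(...)) to group by calendar day.
Select area, inspection_date, inspection_agent_1 As user_id
From inspections
Union All
Select area, inspection_date, inspection_agent_2 As user_id
From inspections
Union All
Select area, inspection_date, inspection_agent_3 As user_id
From inspections
) As user_inspections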
On user_inspections.user_id = U.id
Group By U.name, user_inspections.inspection_date
This is effectively a static crosstab. It means that you will need to know all areas that should be outputted in the query at design time.
One of the reasons this query is problematic is that your schema is not normalized. Your inspection table should look like:
Create Table inspections
(
id int...
, area varchar...
, inspection_date date ...
, inspection_agent int References Users ( Id )
)
That would avoid the inner Union All query to get the output you want.
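As a rough illustration of that point, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The sample rows, the ISO date strings, and the assignment of areas to users are made up for the demo; only the table layout and the Min(Case ...) crosstab follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- Assumed normalized layout: one row per (inspection, agent).
CREATE TABLE inspections (
    id INTEGER PRIMARY KEY,
    area TEXT,
    inspection_date TEXT,          -- ISO date string, an assumption for the demo
    inspection_agent INTEGER REFERENCES users (id)
);
INSERT INTO users VALUES (1, 'Jon Doe'), (2, 'Jane Doe');
-- Made-up sample data.
INSERT INTO inspections (area, inspection_date, inspection_agent) VALUES
    ('Concrete',  '2011-01-18', 1),
    ('Soils',     '2011-01-18', 2),
    ('Earthwork', '2011-01-18', 2),
    ('Soils',     '2011-01-18', 2);
""")

# Static crosstab: no UNION ALL is needed because each row has a single agent.
crosstab = conn.execute("""
SELECT i.inspection_date,
       u.name,
       MIN(CASE WHEN i.area = 'Concrete'  THEN 'X' END) AS concrete,
       MIN(CASE WHEN i.area = 'Soils'     THEN 'X' END) AS soils,
       MIN(CASE WHEN i.area = 'Earthwork' THEN 'X' END) AS earthwork
FROM inspections AS i
JOIN users AS u ON u.id = i.inspection_agent
GROUP BY i.inspection_date, u.id, u.name
ORDER BY i.inspection_date, u.id
""").fetchall()

for row in crosstab:
    print(row)
```

Each result row is already one line of the PDF table, so the PHP side only has to print a date header whenever inspection_date changes.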

I would go like this:
select i.*, u1.name as agent_1_name, u2.name as agent_2_name, u3.name as agent_3_name
from inspections i
left join users u1 on (i.inspection_agent_1 = u1.id)
left join users u2 on (i.inspection_agent_2 = u2.id)
left join users u3 on (i.inspection_agent_3 = u3.id)
order by i.inspection_date asc;
Then select the distinct area names and keep them in memory, or fetch them from an area table if you have one:
select distinct area from inspections;
Then it's just foreach:
$day = null;
foreach ($inspections as $inspection)
{
if ($inspection["inspection_date"] !== $day)
{
// start a new row with the date here
$day = $inspection["inspection_date"];
}
// start a standard row with the user name
}
It isn't clear whether you have to display all users each time (even if some of them did not do an inspection that day). If you do, fetch the users once, loop over $users, and look for each user in the $inspection row.

Related

SQL COUNT By month

I have the following table that registers the days students went to school. I need to count the days they are PRESENT, but also need to count the total of school days for each month. (When ASISTENCIA is either 0 or 1)
This is what I have so far, but it doesn't count the total.
BEGIN
SELECT
u.user_id user_id,
u.user_first_name as names,
u.user_last_name_01 as lastname1,
u.user_last_name_02 as lastname2,
MONTH(a.FECHA_ASISTENCIA) month,
COUNT(*) as absent_days,
p.PHONE as phone,
p.CITY as city,
#EDUCATION_LEVEL_ID
FROM
users u
inner join asistencia a ON u.user_id = a.USER_ID
inner join profile p ON u.rut_SF = p.RUT_SF
WHERE
a.ASISTENCIA = 0 -- NOT PRESENT
AND a.EDUCATION_LEVEL_ID = #EDUCATION_LEVEL_ID
AND YEAR(a.FECHA_ASISTENCIA) = #EDUCATION_LEVEL_YEAR
GROUP BY
u.user_id,
u.user_first_name,
u.user_last_name_01,
u.user_last_name_02,
MONTH(a.FECHA_ASISTENCIA),
p.PHONE,
p.CITY
ORDER BY month
END
ATTENDANCE
USER_ID | DATE       | ATTENDANCE | EDUCATION_LEVEL_ID
--------+------------+------------+-------------------
123     | 2021-04-13 | 0          | 1
123     | 2021-04-14 | 1          | 1
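DESIRED OUTPUT
names | lastname1 | lastname2 | month | absent_days | total_class_days | city
------+-----------+-----------+-------+-------------+------------------+-----
JOHN  | SMITH     | SMITH     | 3     | 10          | 24               | CITY
JOHN  | SMITH     | SMITH     | 4     | 8           | 24               | CITY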
Without examples of what these tables look like, it is hard to give you a solid answer. However, it appears as though your biggest challenge is that ABSTENTIA does not exist.
This is a common problem for analysis - you need to create rows that do not exist (when the user was absent).
The general approach is to:
1. Create a list of unique users.
2. Create a list of the unique dates you care about.
3. Cross join (Cartesian join) these to create every possible combination of user and date.
4. Outer join the result of step 3 to your attendance data so you can populate a PRESENT flag; now you can see both who was present and who was not.
5. Filter out rows which don't apply (for instance, if a user joined on 3/4/2021, ignore the blank rows before this date).
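The steps above can be sketched on a small scale with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the presence table, its column names, and its rows are hypothetical, and it assumes absences are stored as missing rows. Step 5 (filtering) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one row per day a user was present (absences are missing rows).
CREATE TABLE presence (user_id INTEGER, day TEXT);
INSERT INTO presence VALUES
    (123, '2021-04-13'), (123, '2021-04-15'),
    (456, '2021-04-13'), (456, '2021-04-14'), (456, '2021-04-15');
""")

rows = conn.execute("""
WITH all_users AS (SELECT DISTINCT user_id FROM presence),    -- step 1
     all_days  AS (SELECT DISTINCT day FROM presence),        -- step 2
     spine     AS (SELECT user_id, day
                   FROM all_users CROSS JOIN all_days)        -- step 3
SELECT s.user_id,
       strftime('%m', s.day)   AS month,
       SUM(p.user_id IS NULL)  AS absent_days,       -- spine rows with no match
       COUNT(*)                AS total_class_days   -- every spine row
FROM spine AS s
LEFT JOIN presence AS p                                       -- step 4
       ON p.user_id = s.user_id AND p.day = s.day
GROUP BY s.user_id, month
ORDER BY s.user_id
""").fetchall()

for row in rows:
    print(row)
```

If the table already stores absences as explicit rows with a 0 flag, as the sample in the question suggests, the spine may be unnecessary and a conditional SUM over the flag alongside COUNT(*) could be enough.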
You can accomplish this with some SQL that looks like this:
WITH GLOBAL_SPINE AS (
SELECT
ROW_NUMBER() OVER (ORDER BY NULL) as INTERVAL_ID,
DATEADD('DAY', (INTERVAL_ID - 1), '2021-01-01'::timestamp_ntz) as SPINE_START,
DATEADD('DAY', INTERVAL_ID, '2021-01-01'::timestamp_ntz) as SPINE_END -- one day after SPINE_START
FROM TABLE (GENERATOR(ROWCOUNT => 365))
),
GROUPS AS (
SELECT
USERID,
MIN(DESIRED_INTERVAL) AS LOCAL_START,
MAX(DESIRED_INTERVAL) AS LOCAL_END
FROM RASGO.PUBLIC.RASGO_SDK__OP4__AGGREGATE_TRANSFORM__8AB1FEDF90
GROUP BY
USERID
),
GROUP_SPINE AS (
SELECT
USERID,
SPINE_START AS GROUP_START,
SPINE_END AS GROUP_END
FROM GROUPS G
CROSS JOIN LATERAL (
SELECT
SPINE_START, SPINE_END
FROM GLOBAL_SPINE S
WHERE S.SPINE_START BETWEEN G.LOCAL_START AND G.LOCAL_END
)
)
SELECT
G.USERID AS GROUP_BY_USERID,
GROUP_START,
GROUP_END,
T.*
FROM GROUP_SPINE G
LEFT JOIN {{ your_table }} T
ON DESIRED_INTERVAL >= G.GROUP_START
AND DESIRED_INTERVAL < G.GROUP_END
AND G.USERID = T.USERID;
The above script works on Snowflake, but the syntax might be slightly different depending on your RDBMS. There are also other tweaks you can make regarding when the blank rows are inserted. I chose 'local', which means rows are inserted for each user starting on that user's very first day. You could change this to 'global' if you wanted to populate data for every single day between 1/1/2021 and 1/1/2022.
I used Rasgo to generate this SQL

Select last appointment for a given specialist and a referral leading to it

For the sake of simplicity, let’s assume that the table in question is called app and it has only three fields:
Person_id | employee_id | appointment_time
----------+-------------+-----------------
int | int | date
The table holds details of all medical appointments, past and future, for all clients (person_id) and specialists (employee_id).
What I am trying to figure out is how to create a list of appointments for a given specialist (let's say with an id of 235) and their corresponding "referrals" (if any): the previous appointment for a given person_id, with an earlier date and serviced by another specialist (id <> 235).
SELECT
qLast.person_id,
qLast.employee_id,
qLast.LastDate,
qPrevious.employee_id,
qPrevious.PreviousDate
FROM
(
SELECT
app.person_id,
app.employee_id,
Max(app.appointment_time) AS LastDate
FROM
app
GROUP BY
app.person_id,
app.employee_id
HAVING
app.person_id <> 0
AND app.employee_id = 235
) qLast
LEFT JOIN (
SELECT
qSub.person_id,
app.employee_id,
qSub.MaxOfappointment_time AS PreviousDate
FROM
(
SELECT
app.person_id,
Max(app.appointment_time) AS MaxOfappointment_time
FROM
app
GROUP BY
app.person_id,
app.employee_id
HAVING
app.person_id <> 0
AND app.employee_id <> 235
) qSub
INNER JOIN app ON (
qSub.MaxOfappointment_time = app.appointment_time
)
AND (qSub.person_id = app.person_id)
) qPrevious ON qLast.person_id = qPrevious.person_id;
My mangled attempt almost works, but sadly falls on its confused face when there is an appointment for a specialist with id <> 235 with a later date than the last appointment for id = 235. For now I run another query on the results of this one to filter out the unwanted records, but it is a rather ugly kludge. I'm sure there is a better and more elegant way of solving it. Help please!
I think you basically want lag(), but that is not available in SQL Server 2008 (time to upgrade to supported software!).
You can use apply instead:
select a.*, a2.*
from app a cross apply
(select top (1) a2.*
from app a2
where a2.person_id = a.person_id and
a2.employee_id <> a.employee_id and
a2.appointment_time < a.appointment_time
order by a2.appointment_time desc
) a2
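For readers without APPLY, a similar idea can be sketched with a correlated subquery. The example below uses Python's built-in sqlite3 module and made-up sample rows; it restricts the outer side to the last appointment with specialist 235 and uses a LEFT JOIN so that people with no earlier appointment elsewhere still appear (the "if any" in the question). Treat it as one possible reading of the requirement, not as the answer's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app (person_id INTEGER, employee_id INTEGER, appointment_time TEXT);
-- Made-up sample data.
INSERT INTO app VALUES
    (1, 100, '2021-01-05'),   -- earlier visit to another specialist
    (1, 235, '2021-02-01'),   -- last visit to specialist 235
    (1, 300, '2021-03-01'),   -- later visit elsewhere: must not count as a referral
    (2, 235, '2021-02-10');   -- no earlier visit, so no referral
""")

referrals = conn.execute("""
WITH last_visit AS (
    SELECT person_id, employee_id, MAX(appointment_time) AS last_time
    FROM app
    WHERE employee_id = 235
    GROUP BY person_id, employee_id
)
SELECT lv.person_id, lv.last_time, prev.employee_id, prev.appointment_time
FROM last_visit AS lv
LEFT JOIN app AS prev
       ON prev.person_id = lv.person_id
      AND prev.employee_id <> lv.employee_id
      AND prev.appointment_time = (
            -- latest appointment with someone else that precedes the last visit
            SELECT MAX(a2.appointment_time)
            FROM app AS a2
            WHERE a2.person_id = lv.person_id
              AND a2.employee_id <> lv.employee_id
              AND a2.appointment_time < lv.last_time)
ORDER BY lv.person_id
""").fetchall()

for row in referrals:
    print(row)
```

If two other specialists saw the person at exactly the same earlier time, this sketch returns both rows; a tie-breaker would be needed to pick one.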

High performance TSQL to retrieve data

I have two tables with below structure
Person(ID, Name, ...)
Action(ID, FirstPersonId, SecondPersonId, Date)
For each person, I want to retrieve the number of actions in which that person is the second person, counting only actions after the last action in which that person was the first person.
Current query
Select Result.Id ,
(Select Count(*)
From Action
Where SecondPersonId = Result.Id
AND Date > Result.LastAction)
From
(Select ID ,
(
Select Top 1 Date
From Action
Where Action.FirstPersonId = Person.Id
) as LastAction
From Person ) As Result
This query performs badly and I need a much faster one.
with lastActionPerson as -- last action for every first person
(select FirstPersonId , max([Date]) as LastActionDate
from Action
group by FirstPersonId -- required: one row per first person
)
select a.SecondPersonId ,count(*)
from lastActionPerson lap
join Action a
on a.SecondPersonId = lap.FirstPersonId -- be on second person
and a.[Date] > lap.lastActionDate
-- you could continue to right join person table to show the person without actions
group by a.SecondPersonId

Find incorrect records by Id

I am trying to find records where the personID is associated to the incorrect SoundFile(String). I am trying to search for incorrect records among all personID's, not just one specific one. Here are my example tables:
TASKS-
PersonID SoundFile(String)
123 D10285.18001231234.mp3
123 D10236.18001231234.mp3
123 D10237.18001231234.mp3
123 D10212.18001231234.mp3
123 D12415.18001231234.mp3
**126 D19542.18001231234.mp3
126 D10235.18001234567.mp3
126 D19955.18001234567.mp3
RECORDINGS-
PhoneNumber(Distinct Records)
18001231234
18001234567
So in this example, I am trying to find all records like the one that I indented. The majority of the soundfiles like '%18001231234%' are associated to PersonID 123, but this one record is PersonID 126. I need to find all records where for all distinct numbers from the Recordings table, the PersonID(s) is not the majority.
Let me know if you need more information!
Thanks in advance!!
; WITH distinctRecordings AS (
SELECT DISTINCT PhoneNumber
FROM Recordings
),
PersonCounts as (
SELECT t.PersonID, dr.PhoneNumber, COUNT(*) AS num
FROM
Tasks t
JOIN distinctRecordings dr
ON t.SoundFile LIKE '%' + dr.PhoneNumber + '%'
GROUP BY t.PersonID, dr.PhoneNumber
)
SELECT t.PersonID, t.SoundFile
FROM PersonCounts pc1
JOIN PersonCounts pc2
ON pc2.PhoneNumber = pc1.PhoneNumber
AND pc2.PersonID <> pc1.PersonID
AND pc2.Num < pc1.Num
JOIN Tasks t
ON t.PersonID = pc2.PersonID
AND t.SoundFile LIKE '%' + pc2.PhoneNumber + '%'
To summarize what this does... the first CTE, distinctRecordings, is just a distinct list of the Phone Numbers in Recordings.
Next, PersonCounts is a count of phone numbers associated with the records in Tasks for each PersonID.
This is then joined to itself to find any duplicates, and selects whichever duplicate has the smaller count... this is then joined back to Tasks to get the offending soundFile for that person / phone number.
(If your schema had some minor improvements made to it, this query would have been much simpler...)
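To see the self-join at work, here is a small sketch using Python's built-in sqlite3 module with a subset of the sample rows from the question. SQLite concatenates strings with || instead of +, and the aliases minority and majority are names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tasks (PersonID INTEGER, SoundFile TEXT);
CREATE TABLE Recordings (PhoneNumber TEXT);
INSERT INTO Tasks VALUES
    (123, 'D10285.18001231234.mp3'),
    (123, 'D10236.18001231234.mp3'),
    (123, 'D10237.18001231234.mp3'),
    (126, 'D19542.18001231234.mp3'),
    (126, 'D10235.18001234567.mp3'),
    (126, 'D19955.18001234567.mp3');
INSERT INTO Recordings VALUES ('18001231234'), ('18001234567');
""")

suspects = conn.execute("""
WITH PersonCounts AS (
    -- how many sound files each person has per phone number
    SELECT t.PersonID, r.PhoneNumber, COUNT(*) AS num
    FROM Tasks AS t
    JOIN (SELECT DISTINCT PhoneNumber FROM Recordings) AS r
      ON t.SoundFile LIKE '%' || r.PhoneNumber || '%'
    GROUP BY t.PersonID, r.PhoneNumber
)
SELECT DISTINCT t.PersonID, t.SoundFile
FROM PersonCounts AS minority
JOIN PersonCounts AS majority
  ON majority.PhoneNumber = minority.PhoneNumber
 AND majority.num > minority.num          -- someone else has more files for this number
JOIN Tasks AS t
  ON t.PersonID = minority.PersonID
 AND t.SoundFile LIKE '%' || minority.PhoneNumber || '%'
""").fetchall()

print(suspects)
```

Only the single file that person 126 holds for the number 18001231234 comes back, which is the row flagged in the question.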
Here you go: this returns all pairs (PersonID, PhoneNumber) where the person has fewer entries with the given phone number than the person with the most entries. Note that the query doesn't cater for multiple persons tied for the maximum within a group.
select agg.pid
, agg.PhoneNumber
from (
select MAX(c) KEEP ( DENSE_RANK FIRST ORDER BY c DESC ) OVER ( PARTITION BY rt.PhoneNumber ) cmax
, rt.PhoneNumber
, rt.PersonID pid
, rt.c
from (
select r.PhoneNumber
, t.PersonID
, count(*) c
from recordings r
inner join tasks t on ( r.PhoneNumber = regexp_replace(t.SoundFile, '^[^.]+\.([^.]+)\.[^.]+$', '\1' ) )
group by r.PhoneNumber
, t.PersonID
) rt
) agg
where agg.c < agg.cmax
;
caveat: the solution is in oracle syntax though the operations should be in the current sql standard (possibly apart from regexp_replace, which might not matter too much since your sound file data seems to follow a fixed-position structure ).

Query for exact match of users in a conversation in SQL Server

I have a conversation table, and a user conversation table.
CONVERSATION
Id, Subject, Type
USERCONVERSATION
Id, UserId, ConversationId
I need to do a SQL Query based on a list of UserIds. So, if I have three UserIds for the same ConversationId, I need to perform a query where if I provide the same three userIds, it will return the ConversationId where they match exactly.
Assuming the same user can't be in a UserConversation twice:
SELECT ConversationID
FROM UserConversation
GROUP BY ConversationID
HAVING
Count(UserID) = 3 -- excludes conversations that also contain other users
AND Sum(CASE WHEN UserID IN (1, 2, 3) THEN 1 ELSE 0 END) = 3
This also works:
SELECT ConversationID
FROM
UserConversation UC
LEFT JOIN (
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
) U (UserID) ON UC.UserID = U.UserID
GROUP BY ConversationID
HAVING
Count(U.UserID) = 3
AND Count(UC.UserID) = 3
If you find that performance is poor with either of these queries then a two-step method could help: First find all conversations containing at least the desired parties, then from that set exclude those that contain any others. Indexes of course will make a big difference.
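The two-step method could look roughly like the sketch below, written with Python's built-in sqlite3 module. The sample rows and the names candidates, wanted, and marks are made up for the illustration, and it has not been benchmarked against the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserConversation (UserId INTEGER, ConversationId INTEGER);
-- Made-up sample data.
INSERT INTO UserConversation VALUES
    (1, 10), (2, 10), (3, 10),            -- exactly users 1, 2, 3
    (1, 20), (2, 20), (3, 20), (4, 20),   -- has an extra user
    (1, 30), (2, 30);                     -- missing user 3
""")

wanted = [1, 2, 3]
marks = ", ".join("?" for _ in wanted)    # one placeholder per user id

exact = conn.execute(f"""
WITH candidates AS (           -- step 1: conversations containing all wanted users
    SELECT ConversationId
    FROM UserConversation
    WHERE UserId IN ({marks})
    GROUP BY ConversationId
    HAVING COUNT(DISTINCT UserId) = ?
)
SELECT c.ConversationId        -- step 2: exclude those that contain anyone else
FROM candidates AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM UserConversation AS uc
    WHERE uc.ConversationId = c.ConversationId
      AND uc.UserId NOT IN ({marks}))
""", [*wanted, len(wanted), *wanted]).fetchall()

print(exact)
```

Step 1 can be served by an index on UserId, so step 2 only has to inspect the few conversations that survive it.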
Getting rid of the ID column from UserConversation will improve performance by getting more rows per page, thus more data per read (about 50% more!). If your Id column is not only the PK but also the clustered index, then immediately go change the clustered index to ConversationId, UserId (or vice versa, depending on the most common usage)!
If you need help with performance post a comment and I'll try to help you.
P.S. Here's another wild idea but it may not perform as well (though things can surprise you sometimes):
SELECT
Coalesce(C.ConversationID, UC.ConversationID) ConversationID
-- Or could be Min(C.ConversationID)
FROM
Conversation C
CROSS JOIN (
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
) U (UserID)
FULL JOIN UserConversation UC
ON C.ConversationID = UC.ConversationID
AND U.UserID = UC.UserID
GROUP BY Coalesce(C.ConversationID, UC.ConversationID)
HAVING Count(*) = Count(U.UserID)
My solution was wrong, unfortunately.
I strongly suggest using one of Erik's solutions.
Regards