Getting a subquery to run N times - sql

I'm trying to write a query that scans a table with multiple status entries for each date, for each test, for each area in a system. The goal is to get the newest status for each date, for each test, in ONE given area. This would give me a broad overview of a system, so I can determine where the majority of tests are failing.
Below is the basic table structure, but I've created this SQLFiddle for ease of use.
CREATE TABLE area (
    area_id   integer NOT NULL,
    area_name character varying(100)
);
CREATE TABLE test (
    test_id        integer NOT NULL,
    test_name      character varying(100) NOT NULL,
    area_id        integer NOT NULL,
    test_isvisible boolean DEFAULT true
);
CREATE TABLE status (
    status_date      bigint NOT NULL,
    test_id          integer NOT NULL,
    process_state_id integer NOT NULL,
    process_step_id  integer NOT NULL,
    status_iteration integer DEFAULT 1 NOT NULL,
    status_time      bigint NOT NULL
);
CREATE TABLE process_state (
    process_state_id   integer NOT NULL,
    process_state_name character varying(100)
);
CREATE TABLE process_step (
    process_step_id   integer NOT NULL,
    process_step_name character varying(100)
);
The query I currently have gets the furthest point of test processing for one single test for every date that is available. I would like to figure out a way to get that same type of information but instead pass the id of a given area, so that I can get that same data for each test in that area.
E.g. in the SQLFiddle, where I have information for the dates July 2 - 10 for test1, I would like the query to also return the same set of information for test2, thus returning 18 rows instead of 9.
The main problem I'm having is that when I try to just join the area table and get all of the tests that way, I end up getting only 9 days of data like I did with one test, but just a mix-and-match of data from different tests.
Let me know if you need any more information, and I will post back here if I manage to figure it out before someone else does.
EDIT
As was pointed out in the comments, this trial data does not have keys (primary or foreign), simply because they weren't necessary for the problem at hand and omitting them saved time. It is important to note, though, that these keys are 100% necessary in a real-world application: the larger the dataset becomes, the more unruly and time-consuming it is to run queries against unkeyed tables.
Lesson: Don't do drugs, do keys.

After a couple more hours, I found a different way to think about it, and finally got the data I was looking for.
I realized that my main problem with my previous attempts had been the use of GROUP BY, since I would have to group every selected column if I grouped any of them. So I first wrote a query that just got me the test_id/test_name along with each date that there was data for, since I knew I could group all of these no problem:
SELECT t.test_name AS test_name,
to_char( to_timestamp(s.status_date)::TIMESTAMP, 'MM/DD/YYYY' ) AS event_date,
s.status_date
FROM status s
INNER JOIN test t ON t.test_id = s.test_id
INNER JOIN area a ON a.area_id = t.area_id
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE a.area_id = 12
GROUP BY t.test_id, s.status_date, t.test_name;
This didn't give me any information about how far each test made it (completed, failed, running). So I then wrote a separate query that simply got the test status when given a test_id and a status_date:
SELECT
CASE WHEN state.process_state_name = 'FAILURE' OR state.process_state_name = 'WAITING' OR state.process_state_name = 'VOLUME' THEN state.process_state_name
WHEN step.process_step_name = 'COMPLETE' AND (state.process_state_name = 'SUCCESS' OR state.process_state_name = 'APPROVED') THEN 'Complete'
ELSE 'Running'
END AS process_state
FROM status s
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE s.test_id = 290
AND s.status_date = 1404273600
AND s.status_iteration = (SELECT MAX(s.status_iteration)
FROM status s
WHERE s.test_id = 290
AND s.status_date = 1404273600)
ORDER BY s.status_time DESC, s.process_step_id DESC
LIMIT 1;
So this query worked for a single test and date, and I recognized it would work perfectly as a subquery in my original query, since it would bypass the GROUP BY logic. With that in mind, I merged the two queries into this one final query:
SELECT t.test_name AS test_name,
to_char( to_timestamp(status.status_date)::TIMESTAMP, 'MM/DD/YYYY' ) AS event_date,
(
SELECT
CASE WHEN state.process_state_name = 'FAILURE' OR state.process_state_name = 'WAITING' OR state.process_state_name = 'VOLUME' THEN state.process_state_name
WHEN step.process_step_name = 'COMPLETE' AND (state.process_state_name = 'SUCCESS' OR state.process_state_name = 'APPROVED') THEN 'Complete'
ELSE 'Running'
END AS process_state
FROM status s
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE s.test_id = t.test_id
AND s.status_date = status.status_date
AND s.status_iteration = (SELECT MAX(s.status_iteration)
FROM status s
WHERE s.test_id = t.test_id
AND s.status_date = status.status_date)
ORDER BY s.status_time DESC, s.process_step_id DESC
LIMIT 1
) AS process_status
FROM status status
INNER JOIN test t ON t.test_id = status.test_id
INNER JOIN area a ON a.area_id = t.area_id
WHERE a.area_id = 12
GROUP BY t.test_id, status.status_date, t.test_name
ORDER BY 1, 2;
And all of this can be seen in action in my revised SQLFiddle.
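For the record, PostgreSQL's DISTINCT ON could likely collapse the subquery-per-row approach into a single pass: it keeps the first row per (test_id, status_date) group according to the trailing ORDER BY keys. A sketch, untested against the fiddle data but mirroring the CASE logic above:
SELECT DISTINCT ON (t.test_id, s.status_date)
    t.test_name,
    to_char( to_timestamp(s.status_date)::TIMESTAMP, 'MM/DD/YYYY' ) AS event_date,
    CASE WHEN state.process_state_name IN ('FAILURE', 'WAITING', 'VOLUME') THEN state.process_state_name
         WHEN step.process_step_name = 'COMPLETE' AND state.process_state_name IN ('SUCCESS', 'APPROVED') THEN 'Complete'
         ELSE 'Running'
    END AS process_status
FROM status s
INNER JOIN test t ON t.test_id = s.test_id
INNER JOIN process_step step ON s.process_step_id = step.process_step_id
INNER JOIN process_state state ON s.process_state_id = state.process_state_id
WHERE t.area_id = 12
ORDER BY t.test_id, s.status_date, s.status_iteration DESC, s.status_time DESC, s.process_step_id DESC;
Wrap it in an outer query if you need a different output order, since DISTINCT ON dictates the leading ORDER BY keys.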
Let me know if you have questions about what I did; hopefully this helps future developers.

Related

How to check if time period is fully covered by smaller periods in SQL?

Having tables like the following:
create table originalPeriods (
    [Id]    INT PRIMARY KEY,
    [Start] DATETIME NOT NULL,
    [End]   DATETIME NOT NULL,
    [Flag1] INT NOT NULL,
    [Flag2] INT NOT NULL,
    CONSTRAINT UC_UniueFlags UNIQUE (Flag1,Flag2)
)
go
create table convertedPeriods(
    [Id]    INT PRIMARY KEY,
    [Start] DATETIME NOT NULL,
    [End]   DATETIME NOT NULL,
    [Flag1] INT NOT NULL,
    [Flag2] INT NOT NULL
)
go
I want to check whether every period from the first table is represented by a set of periods from the second table with matching Flags.
I want converted periods (from the second table) to fill the whole original period (from the first table) with no empty spaces, no overlapping and no extensions! Converted periods should fit the original period exactly.
The perfect outcome would be a list of original period Ids, each with a flag indicating whether it is fully covered by converted periods.
Try this, let me know if it works:
select
op.Id
,[Flag: Converted Period matches Original Period] = case when cp.Id is not null then 'Found' else 'Not Found' end
from originalPeriods as op
left join convertedPeriods as cp on cp.[Start] = op.[Start] and cp.[End] = op.[End]
I guess you are looking for something like this:
with RECURSIVE periodsNet as (
select o.id, o.dtstart, o.dtend,
c.dtend as netPoint,
exists(select * from convertedPeriods c2
where (c2.dtstart > c.dtstart and c2.dtstart < c.dtend)
or (c2.dtend > c.dtstart and c2.dtend < c.dtend)
) as hasOverlap
from originalPeriods o
inner join convertedPeriods c
on o.dtstart = c.dtstart
union all
select o.id, o.dtstart as ostart, o.dtend as oend,
c.dtend as netPoint,
exists(select * from convertedPeriods c2
where (c2.dtstart > c.dtstart and c2.dtstart < c.dtend)
or (c2.dtend > c.dtstart and c2.dtend < c.dtend)
) as hasOverlap
from periodsNet o
inner join convertedPeriods c
on o.netPoint = c.dtstart
),
periodsFilled as (
select id, dtstart, dtend,
case when dtend = max(netPoint) then true else false end filled
from periodsNet
group by id, dtstart, dtend
)
select *,
exists(select * from periodsNet n where n.id = p.id and n.hasOverlap) as hasOverlap
from periodsFilled p
See fiddle: https://www.db-fiddle.com/f/jpezmztvj7uFg1PvixaBsh/0
Thank you for your answers, but I'm afraid they were not effective enough.
For anyone having a similar problem in the future - I ended up applying two checks.
The first checks that for each original period there is:
a period starting at the same time
a period ending at the same time
and that each converted period (apart from the ending one, as in 2.) has a following period.
SELECT
op.Id
,(SELECT COUNT(*) FROM convertedPeriods cp WHERE op.[Start]=cp.[Start] AND op.Flag1=cp.Flag1 AND op.Flag2=cp.Flag2)
,(SELECT COUNT(*) FROM convertedPeriods cp WHERE op.[End]=cp.[End] AND op.Flag1=cp.Flag1 AND op.Flag2=cp.Flag2)
,(SELECT COUNT(cp.Id) FROM convertedPeriods cp WHERE NOT EXISTS(SELECT 1 FROM convertedPeriods cp2 WHERE cp2.[Start]=cp.[End]) AND cp.[End] <> op.[End])
FROM
originalPeriods op
While there can be false positives with this method, there are no false negatives - meaning every correct period representation must pass this test.
The second check was simply to generate a random set of timestamps and compare whether each one is covered by the originals the same way it is covered by the converted periods.
Those methods have proven themselves to be a successful check of period coverage against huge amounts of data.
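To make the second check concrete, here is a minimal T-SQL sketch of a single random probe; the probe window dates are hypothetical, and in practice you would repeat this for many probes:
declare @rangeStart datetime = '2020-01-01';  -- hypothetical probe window
declare @rangeEnd   datetime = '2021-01-01';
-- pick one random instant inside the window
declare @probe datetime = dateadd(second,
    abs(checksum(newid()) % datediff(second, @rangeStart, @rangeEnd)),
    @rangeStart);

-- original periods covering the probe that no converted period covers
select op.Id
from originalPeriods op
where @probe between op.[Start] and op.[End]
  and not exists (
      select 1
      from convertedPeriods cp
      where @probe between cp.[Start] and cp.[End]
        and cp.Flag1 = op.Flag1
        and cp.Flag2 = op.Flag2
  );
Any Id returned is an original period that covers the probe instant while its converted periods do not - a coverage violation at that instant.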
The best solution to check whether a period is covered is this one, from the [Ask Tom thread][1]:
select * from (
select nmi,
max(invoice_end_date) over(
partition by nmi order by invoice_start_date
) + 1 start_gap,
lead(invoice_start_date) over(
partition by nmi order by invoice_start_date
) - 1 end_gap
from icr_tmp
)
where start_gap <= end_gap;
It works like a charm.
[1]: https://asktom.oracle.com/pls/apex/asktom.search?tag=sql-to-find-gaps-in-date-ranges
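The Ask Tom query is Oracle; a rough adaptation to the convertedPeriods table above might look like the following. This is a sketch only - it assumes SQL Server 2012+ for the window functions, uses (Flag1, Flag2) in place of nmi, and uses one-second granularity where the Oracle original adds and subtracts one day:
select Flag1, Flag2, start_gap, end_gap
from (
    select Flag1, Flag2,
           dateadd(second, 1,
               max([End]) over (partition by Flag1, Flag2 order by [Start])) as start_gap,
           dateadd(second, -1,
               lead([Start]) over (partition by Flag1, Flag2 order by [Start])) as end_gap
    from convertedPeriods
) g
where start_gap <= end_gap;
Any row returned marks a gap between consecutive converted periods for that flag pair.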

Firebird - Calculate time difference between two rows

Overview: I have tables SHIFT_LOG, SHIFT_LOG_DET & SHIFT_LOG_ENTRY having Parent-Child-GrandChild relationships (one-to-many). So,
LOG table contains shift details.
LOG_DET contains operators in a particular shift &
LOG_ENTRY table logs different entry types and timestamp for a user in a shift like (ADDED, STARTED, ON-BREAK, JOINED, ENDED).
Problem: For a given shift I can get all operators and their entries using the query below. What I can't do is find the duration an operator spent on a particular entry type, i.e. the difference between two rows' ENTRY_TIME.
SELECT
ent.ID as ENT_ID,
det.ID as DET_ID,
usr.CODE as USR_ID,
ent.SHIFT_LOG_DET_ID,
ent.ENTRY_TYPE,
IIF(ent.ENTRY_TYPE = 0 , 'ADDED',
IIF(ent.ENTRY_TYPE = 1 , 'STARTED',
IIF(ent.ENTRY_TYPE = 2 , 'ON-BREAK',
IIF(ent.ENTRY_TYPE = 3 , 'JOINED',
IIF(ent.ENTRY_TYPE = 4 , 'ENDED', 'UNKNOWN ENTRY'))))) as ENTRY_TYPE_VALUE,
ent.ENTRY_TIME+cast('31.12.1899' as timestamp) as ENTRY_TIME
FROM SHIFT_LOG_ENTRY ent
LEFT JOIN SHIFT_LOG_DET det on det.ID = ent.SHIFT_LOG_DET_ID
LEFT JOIN SHIFT_LOG log on log.ID = det.SHIFT_LOG_ID
LEFT JOIN USERS usr on usr.USERID = det.OPERATOR_ID
WHERE log.ID = 1
GROUP BY
usr.CODE,
ent.SHIFT_LOG_DET_ID,
det.ID,
ent.ID,
ENTRY_TYPE_VALUE,
ent.ENTRY_TIME,
ent.ENTRY_TYPE
Result Set:
So Interval is the time spent, in seconds, on a particular ENTRY_TYPE. I.e.
ROW(1).Interval = ( Row(2).EntryTime - Row(1).EntryTime )
Entry type ENDED has no interval as there is no other entry for the user after the shift has ended.
Firebird version is 2.5.3
Here is a different, "pro-active" approach. Whether it fits your workflow is for you to decide. It is based upon adding a special extra column just to link adjacent rows together.
Since LOG_ENTRY is a log of events, the events come from the same source, and the events are rather long (15 seconds is a lot for a computer), I would assume that:
Data is only added to the table; it is very rarely, or never, edited or deleted.
Data is added in an ordered manner, that is, when any event is being inserted it is the LAST event in its batch (in your case a batch seems to mean: for the given operator and the given shift).
If those assumptions hold, I'd add one more (indexed!) column to the table: batch_internal_id. It will start at zero on your selected row #1, will be 1 on the next row, 2 on row #3, and so forth. It will be reset back to zero when the batch changes (on row #8 in your screenshot).
After that, the calculation of time elapsed becomes a simple continuous self-join, which should usually be faster than having many sub-selects, one per row.
Something like that:
SELECT
ent.ID as ENT_ID,
ent.SHIFT_LOG_DET_ID,
ent.ENTRY_TYPE,
DECODE(ent.ENTRY_TYPE, 0 , 'ADDED', 1 , 'STARTED', 2 , 'ON-BREAK',
3 , 'JOINED', 4 , 'ENDED', 'UNKNOWN ENTRY')
as ENTRY_TYPE_VALUE, -- better make it an extra table to join!
ent.ENTRY_TIME+cast('31.12.1899' as timestamp) as ENTRY_TIME,
ent_next.ENTRY_TIME - ent.ENTRY_TIME as time_elapsed
FROM SHIFT_LOG_ENTRY ent
LEFT JOIN SHIFT_LOG_ENTRY ent_next ON
(ent.SHIFT_LOG_DET_ID = ent_next.SHIFT_LOG_DET_ID) and
(ent.batch_internal_id + 1 = ent_next.batch_internal_id)
ORDER BY ent.SHIFT_LOG_DET_ID, ent.batch_internal_id
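For completeness, the new column and its supporting index might be declared like this (a sketch - the index name is arbitrary, and existing rows would need the column backfilled before relying on it):
ALTER TABLE SHIFT_LOG_ENTRY ADD batch_internal_id INTEGER DEFAULT 0;
CREATE INDEX IDX_SHIFT_LOG_ENTRY_BATCH ON SHIFT_LOG_ENTRY (SHIFT_LOG_DET_ID, batch_internal_id);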
The trick then would be to ensure correct filling of batch_internal_id within every batch and at the same time isolated from other batches.
Here is where the assumptions above become important.
You can easily auto-fill the new internal (batch-relative) ID field from a SQL trigger, provided you keep the guarantee that the event being inserted is always the last one in its batch.
Something like this:
CREATE TRIGGER SHIFT_LOG_DET_LINK_EVENTS
BEFORE UPDATE OR INSERT
ON SHIFT_LOG_ENTRY
AS
BEGIN
NEW.batch_internal_id = 0;
SELECT FIRST(1) -- we only need one last row per same batch
prev.batch_internal_id + 1 -- next value
FROM SHIFT_LOG_ENTRY prev
WHERE prev.SHIFT_LOG_DET_ID = NEW.SHIFT_LOG_DET_ID -- batch definition
ORDER BY prev.ENTRY_TIME DESCENDING
INTO NEW.batch_internal_id;
END
Such a trigger would initialize the relative ID to zero when a new batch is started, and to the incremented last ID if there already were other rows for the batch.
It is, however, critically dependent upon always being called in order, when all of the batch's previous rows have already been inserted and none of the next rows has been inserted yet.
One can also write the trigger a bit more laconically, though it is maybe harder to read:
.......
AS
BEGIN
NEW.batch_internal_id =
COALESCE( (
SELECT FIRST(1) -- we only need one last row per same batch
prev.batch_internal_id + 1 -- next value
FROM SHIFT_LOG_ENTRY prev
WHERE prev.SHIFT_LOG_DET_ID = NEW.SHIFT_LOG_DET_ID -- batch definition
ORDER BY prev.ENTRY_TIME DESCENDING
) , 0);
END
You will need to select the next date from the relevant entries. You can do this using something like:
select
SHIFT_LOG_DET_ID,
ENTRY_TIME,
datediff(minute from ENTRY_TIME to NEXT_ENTRY_TIME) as DURATION
from (
select
a.SHIFT_LOG_DET_ID,
a.ENTRY_TIME,
(select min(ENTRY_TIME)
from SHIFT_LOG_ENTRY
where SHIFT_LOG_DET_ID = a.SHIFT_LOG_DET_ID
and ENTRY_TIME > a.ENTRY_TIME) as NEXT_ENTRY_TIME
from SHIFT_LOG_ENTRY a
) b
See also this fiddle.
In Firebird 3, you can use the window function LEAD to achieve this:
select
SHIFT_LOG_DET_ID,
ENTRY_TIME,
datediff(minute from ENTRY_TIME
to lead(ENTRY_TIME) over (partition by SHIFT_LOG_DET_ID order by ENTRY_TIME)) as DURATION
from SHIFT_LOG_ENTRY
Full solution
This solution was contributed by AlphaTry
select
ENT_ID,
DET_ID,
USR_CODE,
SHIFT_LOG_DET_ID,
ENTRY_TYPE,
ENTRY_TYPE_VALUE,
ENTRY_TIME,
datediff(second from ENTRY_TIME to NEXT_ENTRY_TIME) as DURATION
from (
SELECT
ent.ID as ENT_ID,
det.ID as DET_ID,
usr.CODE as USR_CODE,
ent.SHIFT_LOG_DET_ID,
ent.ENTRY_TYPE as ENTRY_TYPE,
case (ent.ENTRY_TYPE)
when '0' then 'ADDED'
when '1' then 'STARTED'
when '2' then 'ON-BREAK'
when '3' then 'JOINED'
when '4' then 'ENDED'
else 'UNKNOWN ENTRY'
end as ENTRY_TYPE_VALUE,
ent.ENTRY_TIME+cast('31.12.1899' as timestamp) as ENTRY_TIME,
(
select min(ENTRY_TIME)
from SHIFT_LOG_ENTRY
where SHIFT_LOG_DET_ID = ent.SHIFT_LOG_DET_ID
and ENTRY_TIME > ent.ENTRY_TIME
)+cast('31.12.1899' as timestamp) as NEXT_ENTRY_TIME
FROM SHIFT_LOG_ENTRY ent
LEFT JOIN SHIFT_LOG_DET det on det.ID = ent.SHIFT_LOG_DET_ID
LEFT JOIN SHIFT_LOG log on log.ID = det.SHIFT_LOG_ID
LEFT JOIN USERS usr on usr.USERID = det.OPERATOR_ID
WHERE log.ID = 1
GROUP BY
usr.CODE,
ent.SHIFT_LOG_DET_ID,
det.ID,
ent.ID,
ENTRY_TYPE_VALUE,
ent.ENTRY_TIME,
ent.ENTRY_TYPE
) b
Result

SQL Server 2008 R2: update one occurrence of a group's NULL value and delete the rest

I have a table of orders which has multiple rows of orders missing a Type and I'm struggling to get the queries right. I'm pretty new to SQL so please bear with me.
I've illustrated an example in the picture below. I need help creating the query that will take the table on the left and UPDATE it to look like the table on the right.
The orders are sorted by group. Each group should have one instance of type OK (IF A NULL OR OK ALREADY EXISTS), and no instances of NULL. I would like to achieve this by updating one of the groups' orders with type NULL to have type OK and delete the rest of the respective group's NULL rows.
I've managed to get the rows that I want to keep by
Create a temporary table where I insert the orders and replace NULL types with EMPTY
From the temporary table, get the existing OK orders for groups that already have one OK order, else an EMPTY order that should be changed to OK.
I've done this with the following:
SELECT * FROM Orders
SELECT *
INTO #modified
FROM
(SELECT
Id, IdGroup,
CASE WHEN Type IS NULL
THEN 'EMPTY'
ELSE Type
END Type
FROM
Orders) AS XXX
SELECT MIN(x.Id) Id, x.IdGroup, x.Type
FROM #modified x
JOIN
(SELECT
IdGroup, MIN (Type) AS min_Type
FROM #modified a
WHERE Type = 'OK' OR Type = 'EMPTY'
GROUP BY IdGroup) y ON y.IdGroup = x.IdGroup AND y.min_Type = x.Type
GROUP BY x.IdGroup, x.Type
DROP TABLE #modified
The rest of the EMPTY orders should be deleted after this step, but I don't know how to proceed from here. Maybe this is a poor approach from the beginning, and maybe it could be done more easily?
Well done for writing a question that shows some effort and clearly explains what you're after. That's a rare thing unfortunately!
This is how I would do it:
First backup the table (I like to put them into a different schema to keep things neat)
CREATE SCHEMA bak;
SELECT * INTO bak.Orders FROM dbo.Orders;
Now you can do a trial run on the bak table if you like.
Anyway...
Set all the NULL types to OK
UPDATE Orders SET Type = 'OK' WHERE Type IS NULL;
Now repeatedly delete redundant records. Find records with more than one OK and delete them:
DELETE Orders WHERE ID In
(
SELECT MIN(Id) Id
FROM Orders
WHERE Type = 'OK'
GROUP BY idGroup
HAVING COUNT(*) > 1
);
You'll need to run that one a few times until it affects zero records.
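If the repeated runs bother you, a single pass is possible with a deletable CTE; a sketch (it keeps the highest Id per group, which is what the repeated MIN(Id) deletes leave behind):
WITH dupes AS
(
    SELECT Id,
           ROW_NUMBER() OVER (PARTITION BY IdGroup ORDER BY Id DESC) AS rn
    FROM Orders
    WHERE Type = 'OK'
)
DELETE FROM dupes WHERE rn > 1;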
Assuming there are no multiple OKs and each group has at least one OK or NULL value, you can do:
select t.id, t.idGroup, t.Type
from lefttable t
where t.Type is not null and t.Type <> 'OK'
union all
select t.id, t.idGroup, 'OK'
from (select t.*, row_number() over (partition by idGroup order by coalesce(t.Type, 'ZZZ')) as seqnum
from lefttable t
where t.Type is null or t.Type = 'OK'
) t
where seqnum = 1;
Actually, this will work even if you do have multiple OKs, but it will keep only one of the rows.
The first subquery selects all rows that are not OK or NULL. The second chooses exactly one row per group and assigns it the type OK.
If you want to keep any OK ones in preference to a NULL, this will work. It creates a temp table with everything we need to work on (OK and NULL), and numbers the rows starting from one within each group, ordered so that OK records are listed before NULL ones. Then it makes sure all the first records are OK, and deletes the rest.
Create table #work (Id int, RowNo int)
--Get a list of all the rows we need to work on, and number them for each group
--(order by type desc puts OK before nulls)
Insert into #work (Id, RowNo)
Select Id, ROW_NUMBER() over (partition by IdGroup order by type desc) as RowNo
From Orders O
where (type is null OR type = 'OK');
-- Make sure the one we keep is OK, not null
Update O set type = 'OK'
from #Work W
inner join Orders O on O.Id = W.Id
Where W.RowNo = 1 and O.type IS NULL;
--Delete the remaining ones (any rowno > 1)
Delete O
from #Work W
inner join Orders O on O.Id = W.Id
Where W.RowNo > 1;
drop table #work;
Can't you just delete the rows where Type is NULL?
DELETE FROM Orders WHERE Type IS NULL

Join on id or null and get first result

I have created the query below:
select * from store str
left join(
select * from schedule sdl
where day = 3
order by
case when sdl.store_id is null then (
case when sdl.strong is true then 0 else 2 end
) else 1 end, sdl.schedule_id desc
) ovr on (ovr.store_id = str.store_id OR ovr.store_id IS NULL)
Sample data:
STORE
[store_id] [title]
20010 Shoes-Shop
20330 Candy-Shop
[SCHEDULE]
[schedule_id] [store_id] [day] [strong] [some_other_data]
1 20330 3 f 10% Discount
2 NULL 3 t 0% Discount
What I want to get from the LEFT JOIN is either data for NULL store_id (global schedule entry - affects all store entries) OR the actual data for the given store_id.
Joining the query like this returns results in the correct order, but for both NULL and store_id matches - which makes sense, given the OR in the join clause.
Expected results:
[store_id] [title] [some_other_data]
20010 Shoes-Shop 0% Discount
20330 Candy-Shop 0% Discount
Current Results:
[store_id] [title] [some_other_data]
20010 Shoes-Shop 0% Discount
20330 Candy-Shop 0% Discount
20330 Candy-Shop 10% Discount
If there is a more elegant approach on the subject I would be glad to follow it.
DISTINCT ON should work just fine, as soon as you get ORDER BY right. Basically, matches with strong = TRUE in schedule have priority, then matches with store_id IS NOT NULL:
SELECT DISTINCT ON (st.store_id)
st.store_id, st.title, sl.some_other_data
FROM store st
LEFT JOIN schedule sl ON sl.day = 3
AND (sl.store_id = st.store_id OR sl.store_id IS NULL)
ORDER BY st.store_id, NOT strong, sl.store_id IS NULL;
This works because:
Sorting null values after all others, except special
Basics for DISTINCT ON:
Select first row in each GROUP BY group?
Alternative with a LATERAL join (Postgres 9.3+):
SELECT *
FROM store st
LEFT JOIN LATERAL (
SELECT some_other_data
FROM schedule
WHERE day = 3
AND (store_id = st.store_id OR store_id IS NULL)
ORDER BY NOT strong
, store_id IS NULL
LIMIT 1
) sl ON true;
About LATERAL joins:
What is the difference between LATERAL and a subquery in PostgreSQL?
I think the easiest way to do what you want is to use distinct on. The question is then how you order it:
select distinct on (str.store_id) *
from store str left join
schedule sdl
on (sdl.store_id = str.store_id or sdl.store_id is null) and sdl.day = 3
order by str.store_id,
(case when sdl.store_id is null then 2 else 1 end)
This will return the store record if available, otherwise the schedule record that has a value of NULL. Note: your query has this notion of strength, but the question doesn't explain how to use it. This can be readily modified to include multiple levels of priorities.
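Given the expected output above (the strong global row wins over the store-specific one), the strength would likely need to be folded into this ORDER BY as an extra sort key; a sketch, assuming the same tables:
select distinct on (str.store_id) *
from store str left join
     schedule sdl
     on (sdl.store_id = str.store_id or sdl.store_id is null) and sdl.day = 3
order by str.store_id,
         (case when sdl.strong then 1 else 2 end),  -- strong rows first
         (case when sdl.store_id is null then 2 else 1 end);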

Optimizing a troublesome query

I'm generating a PDF, via PHP, from 2 MySQL tables; the PDF contains a table. On larger tables the script is eating up a lot of memory and is starting to become a problem.
My first table contains "inspections." There are many rows per day. This has a many to one relationship with the user table.
Table "inspections"
id
area
inpsection_date
inpsection_agent_1
inpsection_agent_2
inpsection_agent_3
id (int)
area (varchar) - is one of 8 "areas" ie: Concrete, Soils, Earthwork
inspection_date (int) - unix timestamp
inspection_agent_1 (int) - a user id
inspection_agent_2 (int) - a user id
inspection_agent_3 (int) - a user id
The second table is the users' info. All I need is to join the name to the "inspection_agent_x" columns:
id
name
The final table, that is going to be in the PDF, needs to organize the data:
by day
by user, finding every "area" that the user "inspected" on that day
For example:
            Concrete   Soils   Earthwork
1/18/2011
Jon Doe        X
Jane Doe                  X         X
And so on for each day. Right now I'm just doing a simple join on the names and then organizing everything on the code end. I know I'm leaving a lot on the table as far as the queries go, I just can't think of a way to do it.
Thanks for any and all help.
Select U.name
, user_inspections.inspection_date
, Min( Case When user_inspections.area = 'Concrete' Then 'X' End ) As Concrete
, Min( Case When user_inspections.area = 'Soils' Then 'X' End ) As Soils
, Min( Case When user_inspections.area = 'Earthwork' Then 'X' End ) As Earthwork
From users As U
Join (
Select area, inspection_date, inspection_agent_1 As user_id
From inspections
Union All
Select area, inspection_date, inspection_agent_2 As user_id
From inspections
Union All
Select area, inspection_date, inspection_agent_3 As user_id
From inspections
) As user_inspections
On user_inspections.user_id = U.id
Group By U.name, user_inspections.inspection_date
This is effectively a static crosstab. It means that you will need to know, at design time, all the areas that the query should output.
One of the reasons this query is problematic is that your schema is not normalized. Your inspection table should look like:
Create Table inspections
(
id int...
, area varchar...
, inspection_date date ...
, inspection_agent int References Users ( Id )
)
That would avoid the inner Union All query to get the output you want.
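With the normalized layout, the crosstab no longer needs the inner Union All; a sketch of how the query above would collapse (assuming the normalized inspections table is populated):
Select U.name
     , I.inspection_date
     , Min( Case When I.area = 'Concrete' Then 'X' End ) As Concrete
     , Min( Case When I.area = 'Soils' Then 'X' End ) As Soils
     , Min( Case When I.area = 'Earthwork' Then 'X' End ) As Earthwork
From users As U
Join inspections As I
  On I.inspection_agent = U.id
Group By U.name, I.inspection_date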
I would go like this:
select i.*, u1.name, u2.name, u3.name
from inspections i
left join users u1 on (i.inspection_agent_1 = u1.id)
left join users u2 on (i.inspection_agent_2 = u2.id)
left join users u3 on (i.inspection_agent_3 = u3.id)
order by i.inspection_date asc;
Then select the distinct area names and remember them, or fetch them from an area table if you have one:
select distinct area from inspections;
Then it's just foreach:
$day = "";
foreach($inspection in $inspections)
{
if($day == "" || $inspection["inspection_date"] != $day)
{
//start new row with date here
}
//start standard row with user name
}
It isn't clear whether you have to display all users each time (even if some of them did no inspections that day). If you do, you should fetch the users once, loop over $users, and search for each user in the $inspection row.