I am not sure how this can be done. If anyone has a solution, please help.
I have two tables, course and schedule. I want to show all rows from course and, where the schedule table has matching rows, show that data as well.
This is the table named schedule:
--------------------------------------------------------
| course_id | room | day | time |
--------------------------------------------
| 2 | 401 |saturday| 9:00 - 13:00 |
--------------------------------------------
| 3 | 401 | sunday | 9:00 - 13:00 |
--------------------------------------------
| 2 | 402 | monday | 9:00 - 13:00 |
--------------------------------------------
| 3 | 403 | tuesday| 14:00 - 17:00|
--------------------------------------------
| 4 | 401 | tuesday| 9:00 - 13:00 |
--------------------------------------------
| 2 | 402 |wednesday|14:00 - 17:00|
--------------------------------------------
And this is the other table, named course:
----------------------------------
| id | course_code | course_name |
----------------------------------
| 2 | cse1 | cse |
---------------------------------
| 3 | eee | eee |
---------------------------------
| 4 | ct1 | ct |
---------------------------------
| 5 | ct2 | ct |
----------------------------------
| 6 | cse2 | ct |
----------------------------------
How can I get output like this?
----------------------------------
| course code | course name | info |
|----------------------------------
|cse1|cse|401,saturday-9:00-13:00; 402,monday-9:00-13:00; 402,wednesday-14:00-17:00|
|------------------------
|cse2|cse|not assigned|
|----|---|-----------------
|eee |eee|401,sunday-9:00-13:00;403,tuesday-14:00-17:00 |
|----|---|-----------------
|ct1 |ct |401,tuesday-9:00-13:00|
|----|---|----------------
|ct2 |ct | not assigned|
|----|---|----------------
First you need to join the two tables together, then aggregate your "schedules" to a single string value grouped on the courses.
Disclaimer: the code is not tested, as I don't have a MySQL database anywhere near!
I am used to Oracle, so I did not know which MySQL function aggregates strings. A quick search first turned up CONCAT_WS (https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat-ws), but that is a scalar function that only joins its own arguments within one row; the aggregate that concatenates values across the rows of a group is GROUP_CONCAT.
SELECT c.course_code,
c.course_name,
GROUP_CONCAT(CONCAT(s.room, ',', s.day, '-', s.time) SEPARATOR '; ') AS info
FROM course c
LEFT JOIN schedule s ON c.id = s.course_id
GROUP BY c.course_code, c.course_name
It should yield a result close to what you are expecting, except for the "not assigned" part: for a course without schedules every joined value is NULL, so GROUP_CONCAT returns NULL rather than a string.
Explanation:
GROUP_CONCAT is an aggregate function, much like MAX or SUM, except that it concatenates the values of each group into one string, with an optional SEPARATOR (a comma by default).
If you want to display extra columns (such as course code and name) when querying aggregates, you need to specify them in your GROUP BY clause.
Also, try replacing LEFT JOIN with INNER JOIN and notice the difference: courses without schedules won't be returned.
The group_concat function is your friend here (combined with coalesce).
select c.course_code,
c.course_name,
coalesce(group_concat(s.room, ', ', s.day, '-', s.time separator ';'), 'not assigned') info
from course c
left outer join schedule s on c.id = s.course_id
group by c.course_code, c.course_name;
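For anyone without a MySQL server at hand, the same LEFT JOIN + group_concat + coalesce idea can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for MySQL, so the pieces are glued with || and the separator is passed as the second argument of group_concat instead of MySQL's SEPARATOR keyword. The function names and column types are my own assumptions; the rows are the ones from the question.

```python
import sqlite3


def build_course_db():
    """Create an in-memory SQLite database holding the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE course (id INTEGER PRIMARY KEY, course_code TEXT, course_name TEXT);
        CREATE TABLE schedule (course_id INTEGER, room INTEGER, day TEXT, time TEXT);
        INSERT INTO course VALUES
            (2, 'cse1', 'cse'), (3, 'eee', 'eee'), (4, 'ct1', 'ct'),
            (5, 'ct2', 'ct'), (6, 'cse2', 'ct');
        INSERT INTO schedule VALUES
            (2, 401, 'saturday', '9:00-13:00'),
            (3, 401, 'sunday', '9:00-13:00'),
            (2, 402, 'monday', '9:00-13:00'),
            (3, 403, 'tuesday', '14:00-17:00'),
            (4, 401, 'tuesday', '9:00-13:00'),
            (2, 402, 'wednesday', '14:00-17:00');
        """
    )
    return conn


def course_info(conn):
    """Return (course_code, course_name, info) for every course, scheduled or not."""
    sql = """
        SELECT c.course_code,
               c.course_name,
               -- group_concat yields NULL when a course has no schedule rows,
               -- and COALESCE turns that NULL into 'not assigned'.
               COALESCE(
                   group_concat(s.room || ',' || s.day || '-' || s.time, '; '),
                   'not assigned'
               ) AS info
        FROM course AS c
        LEFT JOIN schedule AS s ON s.course_id = c.id
        GROUP BY c.id, c.course_code, c.course_name
        ORDER BY c.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in course_info(build_course_db()):
        print(row)
```

SQLite does not promise any particular order for the pieces inside info. In MySQL, GROUP_CONCAT accepts an ORDER BY inside the call if the order matters.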
In SQL Server (not MySQL), one way to do this is the FOR XML PATH method:
SELECT room + ' ' + day + ' ' + time + ', ' AS 'data()'
FROM schedule
FOR XML PATH('')
I want to join tables in MS Access in such a way that it fetches only the latest record from one of the tables. I've looked at the other solutions available on the site, but discovered that they only work for other versions of SQL. Here is a simplified version of my data:
PatientInfo Table:
+-----+------+
| ID | Name |
+-----+------+
| 1 | John |
| 2 | Tom |
| 3 | Anna |
+-----+------+
Appointments Table
+----+-----------+
| ID | Date |
+----+-----------+
| 1 | 5/5/2001 |
| 1 | 10/5/2012 |
| 1 | 4/20/2018 |
| 2 | 4/5/1999 |
| 2 | 8/8/2010 |
| 2 | 4/9/1982 |
| 3 | 7/3/1997 |
| 3 | 6/4/2015 |
| 3 | 3/4/2017 |
+----+-----------+
And here is a simplified version of the results that I need after the join:
+----+------+------------+
| ID | Name | Date |
+----+------+------------+
| 1 | John | 4/20/2018 |
| 2 | Tom | 8/8/2010 |
| 3 | Anna | 3/4/2017 |
+----+------+------------+
Thanks in advance for reading and for your help.
You can use aggregation and JOIN:
select pi.id, pi.name, max(a.date)
from appointments as a inner join
patientinfo as pi
on a.id = pi.id
group by pi.id, pi.name;
Something like this:
select P.ID, P.name, max(A.Date) as Dt
from PatientInfo P inner join Appointments A
on P.ID=A.ID
group by P.ID, P.name
Both Bing's and Gordon's answers work if your summary only needs one field from the joined table (the Max(Date)), but it gets trickier if you also want to report other fields from that table, since you would need to either aggregate them or group by them as well.
E.g., if you want your summary to also include the assessment the patient was given at the last appointment, GROUP BY is not the way to go.
A more versatile structure may be something like
SELECT Patient.ID, Patient.Name, Appointment.Date, Appointment.Assessment
FROM Patient INNER JOIN Appointment ON Patient.ID=Appointment.ID
WHERE Appointment.Date = (SELECT Max(Appointment.Date) FROM Appointment WHERE Appointment.ID = Patient.ID)
;
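To try the correlated-subquery shape without Access, here is a minimal sketch using Python's sqlite3 module. SQLite is a stand-in for Access here, the dates are rewritten as ISO strings (so that MAX compares them correctly as text), and the Assessment values are made-up placeholders, since the question does not show that column.

```python
import sqlite3


def build_patient_db():
    """In-memory SQLite copy of the question's rows (Assessment values are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Patient (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Appointment (ID INTEGER, Date TEXT, Assessment TEXT);
        INSERT INTO Patient VALUES (1, 'John'), (2, 'Tom'), (3, 'Anna');
        INSERT INTO Appointment VALUES
            (1, '2001-05-05', 'A'), (1, '2012-10-05', 'B'), (1, '2018-04-20', 'C'),
            (2, '1999-04-05', 'D'), (2, '2010-08-08', 'E'), (2, '1982-04-09', 'F'),
            (3, '1997-07-03', 'G'), (3, '2015-06-04', 'H'), (3, '2017-03-04', 'I');
        """
    )
    return conn


def latest_appointments(conn):
    """Return each patient joined to the appointment row carrying the latest date."""
    sql = """
        SELECT p.ID, p.Name, a.Date, a.Assessment
        FROM Patient AS p
        INNER JOIN Appointment AS a ON a.ID = p.ID
        -- the subquery is re-evaluated per patient and picks that patient's max date
        WHERE a.Date = (SELECT MAX(a2.Date)
                        FROM Appointment AS a2
                        WHERE a2.ID = p.ID)
        ORDER BY p.ID
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in latest_appointments(build_patient_db()):
        print(row)
```

Note that if a patient has two appointments on the same latest date, this shape returns both rows.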
As an aside, you may want to think about whether you should use a field named 'ID' to refer to the ID of another table (in this case, the Appointment.ID field refers to the Patient.ID). You may make your db more readable if you leave the 'ID' field as an identifier specific to that table and refer to that field in other tables as OtherTableID or similar, i.e. PatientID in this case. Or go all the way and include the name of the actual table in its own ID field.
Edited after comment:
Not quite sure why it would crash. I just ran an equivalent query on 2 tables I have which are about 10,000 records each and it was pretty instantaneous. Are your ID fields (i) unique numbers and (ii) indexed?
Another structure which should do the same thing (adapted for your field names and assuming that there is an ID field in Appointments which is unique) would be something like:
SELECT PatientInfo.UID, PatientInfo.Name, Appointments.StartDateTime, Appointments.Assessment
FROM PatientInfo INNER JOIN Appointments ON PatientInfo.UID = Appointments.PatientFID
WHERE Appointments.ID = (SELECT TOP 1 ID FROM Appointments WHERE Appointments.PatientFID = PatientInfo.UID ORDER BY StartDateTime DESC)
;
But that is starting to look a bit contrived. On my data they both produce the same result (as they should!) and are both almost instantaneous.
Always difficult to troubleshoot Access when it crashes - I guess you see no error codes or similar? Is this against a native .accdb database or another server?
I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
If you want the score matched at record level instead, change the WHERE clause inside the APPLY accordingly.
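OUTER APPLY is SQL Server syntax. The same "newest standing row not later than the production date" lookup can be sketched with a correlated scalar subquery, shown here against SQLite through Python's sqlite3 module. This is a swapped-in technique, not the APPLY query itself. The dates are ISO strings so that <= compares them correctly, records 3 and 4 use 2020-02-28 and 2020-02-29 as made-up substitutes for the question's dates, and record 5 is a hypothetical row that predates every standing entry.

```python
import sqlite3


def build_production_db():
    """In-memory SQLite tables modelled on the question (dates partly invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ProductionTable (RecordID INTEGER PRIMARY KEY, Date TEXT,
                                      EmpID INTEGER, Reference INTEGER, TaskID INTEGER);
        CREATE TABLE StandingDataTable (RecordID INTEGER PRIMARY KEY, TaskID INTEGER,
                                        DateActiveFrom TEXT, Score REAL);
        INSERT INTO ProductionTable VALUES
            (1, '2020-02-27', 1, 123, 1),
            (2, '2020-02-27', 1, 123, 1),
            (3, '2020-02-28', 1, 123, 1),
            (4, '2020-02-29', 1, 123, 1),
            (5, '2020-01-15', 1, 123, 1);  -- hypothetical: before any score is valid
        INSERT INTO StandingDataTable VALUES
            (1, 1, '2020-02-01', 1.5),
            (2, 1, '2020-02-28', 2.0);
        """
    )
    return conn


def scores_by_date(conn):
    """Return (RecordID, Date, Score) with exactly one score per production row."""
    sql = """
        SELECT p.RecordID,
               p.Date,
               -- newest standing row for the task that is not later than p.Date;
               -- the subquery yields NULL when no such row exists
               (SELECT s.Score
                FROM StandingDataTable AS s
                WHERE s.TaskID = p.TaskID
                  AND s.DateActiveFrom <= p.Date
                ORDER BY s.DateActiveFrom DESC
                LIMIT 1) AS Score
        FROM ProductionTable AS p
        ORDER BY p.RecordID
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in scores_by_date(build_production_db()):
        print(row)
```

Because the subquery returns at most one value, each production row appears once, which avoids the duplication the plain LEFT JOIN produced.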
I have a situation which is a little hard to describe. I'll try to explain with an example and the result which I want.
I have three tables like so
Employee
| id | Name |
|----+-------|
| 1 | Alice |
| 2 | Bob |
| 3 | Jane |
| 4 | Jack |
Task
| id | employee_id | description |
|----+-------------+---------------------|
| 1 | 1 | Fix bug |
| 2 | 1 | Implement feature |
| 3 | 1 | Deploy master |
| 4 | 2 | Integrate feature |
| 5 | 2 | Fix cosmetic issues |
Status
| id | task_id | time | details | Terminal |
|----+---------+-------+-----------+----------|
| 1 | 1 | 12:00 | Assigned | false |
| 2 | 1 | 12:30 | Started | false |
| 3 | 1 | 13:00 | Completed | true |
| 4 | 2 | 12:10 | Assigned | false |
| 5 | 2 | 14:00 | Started | false |
| 6 | 3 | 12:15 | Assigned | false |
| 7 | 4 | 12:20 | Assigned | false |
| 8 | 5 | 12:25 | Assigned | false |
| 9 | 4 | 12:30 | Started | false |
(I have also put these into a sqlfiddle page at http://sqlfiddle.com/#!9/728c85/1)
The basic idea is that I have some employees and tasks. The tasks can be assigned to employees and as they work on them, they keep adding "status" rows.
Each status entry has a field "terminal" which can either be true or false.
If the last status entry for a task has terminal set to true, then that task is over and there's nothing more to be done on it.
If all tasks assigned to an employee are over, then the employee is considered free.
I need to get a list of free employees. This would basically mean, given an employee, a list of all his or her tasks with statuses. So, something like this for Alice
| Task | Completed |
|-------------------+-----------|
| Fix bug | true |
| Implement feature | false |
| Deploy master | false |
From which I know that she's not free right now since there are 'false' entries in completed.
How would I do this? If my tables are not constructed properly for this kind of query, I'd very much like some advice on that too.
(I titled the question like this since I want to order the statuses of each task per user and then limit them to the last row).
Update
It was suggested to me that the status field should really go inside the tasks table and the Status table should simply be a log table.
I would go for the idea to have the status in the tasks table. (Please see my comment on your request on this.) However, here are two queries to select free employees:
If tasks cannot be re-opened, it is simple: Get all incomplete tasks by checking whether a record with terminal = true exists. Free employees are all those that have no incomplete task.
select *
from employee
where id not in
(
select employee_id
from task
where id not in (select task_id from status where terminal = true)
);
If tasks can be re-opened, however, then you do the same but must find the last status. This can be done with PostgreSQL's DISTINCT ON, for instance.
select *
from employee
where id not in
(
select employee_id
from task
where not
(
select distinct on (task_id) terminal
from status
where task_id = task.id
order by task_id, id desc
)
);
(I am using the ID to find the last entry per task, as the time without a date seems inappropriate. You could use the time column instead only if a task always runs within a single day.)
SQL fiddles:
http://sqlfiddle.com/#!15/f0ea8/2
http://sqlfiddle.com/#!15/f0ea8/1
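DISTINCT ON only exists in PostgreSQL. For other engines, the "last status per task" step can be sketched with MAX(id) per task instead. The version below runs on SQLite via Python's sqlite3 module and uses the question's rows, with terminal stored as 0/1. Like the query above, it assumes a higher status id means a later entry, and it ignores tasks that have no status row at all; the function names are my own.

```python
import sqlite3


def build_task_db():
    """In-memory SQLite copy of the question's Employee/Task/Status rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE task (id INTEGER PRIMARY KEY, employee_id INTEGER, description TEXT);
        CREATE TABLE status (id INTEGER PRIMARY KEY, task_id INTEGER, time TEXT,
                             details TEXT, terminal INTEGER);
        INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Jane'), (4, 'Jack');
        INSERT INTO task VALUES
            (1, 1, 'Fix bug'), (2, 1, 'Implement feature'), (3, 1, 'Deploy master'),
            (4, 2, 'Integrate feature'), (5, 2, 'Fix cosmetic issues');
        INSERT INTO status VALUES
            (1, 1, '12:00', 'Assigned', 0), (2, 1, '12:30', 'Started', 0),
            (3, 1, '13:00', 'Completed', 1), (4, 2, '12:10', 'Assigned', 0),
            (5, 2, '14:00', 'Started', 0), (6, 3, '12:15', 'Assigned', 0),
            (7, 4, '12:20', 'Assigned', 0), (8, 5, '12:25', 'Assigned', 0),
            (9, 4, '12:30', 'Started', 0);
        """
    )
    return conn


def free_employees(conn):
    """Return (id, name) of employees none of whose tasks ends in a non-terminal status."""
    sql = """
        SELECT e.id, e.name
        FROM employee AS e
        WHERE e.id NOT IN (
            SELECT t.employee_id
            FROM task AS t
            JOIN status AS s ON s.task_id = t.id
            -- keep only the newest status row of each task ...
            WHERE s.id = (SELECT MAX(s2.id) FROM status AS s2 WHERE s2.task_id = t.id)
            -- ... and treat the task as open when that row is not terminal
              AND s.terminal = 0
        )
        ORDER BY e.id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(free_employees(build_task_db()))
```

With the sample rows only Jane and Jack come back, since Alice and Bob each still have at least one task whose latest status is not terminal.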
You have to group all the statuses together and you can then use MAX() to find if one of them is true, like this:
SELECT t.description, MAX(s.terminal)
FROM Employee e
INNER JOIN task t ON t.employee_id = e.id
INNER JOIN status s ON s.task_id = t.id
GROUP BY t.id;
If you want this for just one employee, add a filter such as WHERE e.id = 1.
Hope this helps
select T.employee_id, T.description, S.Terminal
from Employee E
INNER JOIN Task T ON E.id=T.employee_id
INNER JOIN (Select task_id, max(id) as status_id FROM Status GROUP BY task_id) as ST on T.id=ST.task_id
INNER JOIN Status S on S.id=ST.status_id
I hope this helps:
select E.Name,T.id as[Task Id],T.description,S.Terminal from Employee E
inner join Task T on E.id=T.employee_id
inner join Status S on S.task_id=T.id
where e.id not in (select employee_id from Task where id in (select task_id from Status where Terminal='true' and details='Completed') )
I'm working with a SQLite database that receives large data dumps on a regular basis from several sources. Unfortunately, those sources aren't intelligent about what they dump, and I end up with a lot of repeated records from one time to the next. I'm looking for a way to remove these repeated records without affecting the records that have legitimately changed from the past dump to this one.
Here's the general structure of the data (_id is the primary key):
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | NULL | Fred | Online | USA |
| 2 | 2016-05-01 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-08 | 2016-05-08 | NULL | Fred | Offline| USA |
| 4 | 2016-05-08 | 2016-05-08 | NULL | Jim | Online | USA |
| 5 | 2016-05-15 | 2016-05-15 | NULL | Fred | Offline| USA |
| 6 | 2016-05-15 | 2016-05-15 | NULL | Jim | Online | USA |
I'd like to be able to reduce this data to something like this:
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | 2016-05-07 | Fred | Online | USA |
| 2 | 2016-05-15 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-15 | 2016-05-08 | NULL | Fred | Offline| USA |
The idea here is that rows 4, 5, and 6 exactly duplicate rows 2 and 3 except for the timestamps (I'd need to compare by all three fields - name, status, location). However, row 3 does not duplicate row 1 (status changed from Online to Offline), so the _dateExpired field is set in row 1, and row 3 becomes the most recent record.
I'm querying this table with something like this:
SELECT * FROM Data WHERE
date(_dateEffective) <= date("now")
AND (_dateExpired IS NULL OR date(_dateExpired) > date("now"))
Is this sort of reduction possible in SQLite?
I am still a beginner to SQL and database design in general, so it's possible that I haven't structured the database in the best way. I'm open to suggestions there as well...I'm going for the ability to query data at a given point in time - for example, "what was Jim's status around 2016-05-06?"
Thanks in advance!
Consider using a staging table where the dump file goes into a DumpTable (regularly cleaned out before each dump) and then an INSERT...SELECT query migrates to your final table.
Now the SELECT portion maintains a correlated subquery (to calculate new [_dateExpired] for needed rows) and derived table subquery (to filter out non-dups according to your criteria). Finally, the LEFT JOIN...NULL with FinalTable is to ensure no duplicate records are appended, assuming [_id] is a unique identifier. Below is the routine:
Clean Out DumpTable
DELETE FROM DumpTable;
Run Dump Routine to be appended into DumpTable
Append Records to FinalTable
INSERT INTO FinalTable ([_id], [_dateUpdated], [_dateEffective], [_dateExpired],
[name], status, location)
SELECT d.[_id], d.[_dateUpdated], d.[_dateEffective],
(SELECT Min(date(sub.[_dateEffective], '-1 day'))
FROM DumpTable sub
WHERE sub.[name] = d.[name]
AND sub.[_dateEffective] > d.[_dateEffective]
AND sub.status <> d.status) As calcExpired,
d.name, d.status, d.location
FROM DumpTable d
INNER JOIN
(SELECT Min(DumpTable.[_id]) AS min_id,
DumpTable.name, DumpTable.status
FROM DumpTable
GROUP BY DumpTable.name, DumpTable.status) AS c
ON (c.name = d.name)
AND (c.min_id = d.[_id])
AND (c.status = d.status)
LEFT JOIN FinalTable f
ON d.[_id] = f.[_id]
WHERE f.[_id] IS NULL;
-- INSERTED RECORDS:
-- _id _dateUpdated _dateEffective _dateExpired name status location
-- 1 2016-05-01 2016-05-01 2016-05-07 Fred Online USA
-- 2 2016-05-01 2016-05-01 Jim Online USA
-- 3 2016-05-08 2016-05-08 Fred Offline USA
Is this sort of reduction possible in SQLite?
The answer to any "reduction" question in SQL is always Yes. The trick is to find what axes you're reducing along.
Here's a partial solution to illustrate; it gives the first Online date for each name & location.
select min(_dateEffective) as start_date
, name
, location
from Data
where status = 'Online'
group by
name
, location
With an outer join back to the table (on name & location) where the status is 'Offline' and the _dateEffective is greater than start_date, you get your _dateExpired.
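To make the "reduce, then look back for the next change" idea concrete, here is a rough sketch that runs on SQLite through Python's sqlite3 module. It groups on name, status and location, and uses a correlated subquery in place of the outer join mentioned above. It assumes that a given name/status/location combination never comes back after it has changed; if Fred went Online, Offline and Online again, the two Online periods would be merged. The function names are my own.

```python
import sqlite3


def build_dump_db():
    """In-memory SQLite table holding the six sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Data (_id INTEGER PRIMARY KEY, _dateUpdated TEXT,
                           _dateEffective TEXT, _dateExpired TEXT,
                           name TEXT, status TEXT, location TEXT);
        INSERT INTO Data VALUES
            (1, '2016-05-01', '2016-05-01', NULL, 'Fred', 'Online',  'USA'),
            (2, '2016-05-01', '2016-05-01', NULL, 'Jim',  'Online',  'USA'),
            (3, '2016-05-08', '2016-05-08', NULL, 'Fred', 'Offline', 'USA'),
            (4, '2016-05-08', '2016-05-08', NULL, 'Jim',  'Online',  'USA'),
            (5, '2016-05-15', '2016-05-15', NULL, 'Fred', 'Offline', 'USA'),
            (6, '2016-05-15', '2016-05-15', NULL, 'Jim',  'Online',  'USA');
        """
    )
    return conn


def reduced_rows(conn):
    """Collapse repeated dumps into one row per (name, status, location)."""
    sql = """
        WITH grouped AS (
            SELECT MIN(_id)            AS _id,
                   MAX(_dateUpdated)   AS _dateUpdated,
                   MIN(_dateEffective) AS _dateEffective,
                   name, status, location
            FROM Data
            GROUP BY name, status, location
        )
        SELECT g._id, g._dateUpdated, g._dateEffective,
               -- day before the next different state of the same name starts,
               -- or NULL when this state is still the current one
               (SELECT date(MIN(n._dateEffective), '-1 day')
                FROM grouped AS n
                WHERE n.name = g.name
                  AND n._dateEffective > g._dateEffective) AS _dateExpired,
               g.name, g.status, g.location
        FROM grouped AS g
        ORDER BY g._id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in reduced_rows(build_dump_db()):
        print(row)
```

This only selects the reduced rows. Writing them back (for example into a fresh table that then replaces Data) is left out of the sketch.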
_id is the primary key
There is a commonly held misunderstanding that every table needs some kind of sequential "ID" number as a primary key. The key you really care about is known as a natural key, 1 or more columns in the data that uniquely identify the data. In your case, it looks to me like that's _dateEffective, name, status, and location. At the very least, declare them unique to prevent accidental duplication.
I have a table test with an int-array column called ekw, holding values like {1000,4000,6000}, {1000} or {1000,4000}.
These values map to description strings in another table:
tab: test
id | name | ekw
-----------------
1 | One | {1000}
2 | Two | {1000,4000}
3 | Three | {1000,4000,6000}
tab: ekwdesc
id | value | desc
-----------------
1 | 1000 | Max
2 | 2000 | Tim
3 | 3000 | Rita
5 | 4000 | Sven
6 | 5000 | Tom
7 | 6000 | Bob
Is it possible to select these columns and print the strings?
Something like:
select name, ekw from test, ekwdesc
I would like to see this result:
id | name | ekwdesc
-----------------
1 | One | Max
2 | Two | Max, Sven
3 | Three | Max, Sven, Bob
I tried with IN and ANY but couldn't get it to work.
You had the right idea to use the any operator for the join. Once the join is complete, all that's left is to use string_agg to transform the result to the format you want:
SELECT name, STRING_AGG(description, ', ')
FROM test
JOIN ekwdesc ON ekwdesc.value = ANY(test.ekw)
GROUP BY name
See the attached SQLFiddle for an executable example.
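If it helps to see what the join plus string_agg computes, here is the same logic spelled out in plain Python. It only illustrates the semantics and is not PostgreSQL code; the function name and the tuple layout are my own assumptions. One difference worth knowing: this sketch keeps the order of the array elements, which string_agg does not promise unless an ORDER BY is added inside the aggregate call.

```python
TEST_ROWS = [
    (1, "One", [1000]),
    (2, "Two", [1000, 4000]),
    (3, "Three", [1000, 4000, 6000]),
]

EKWDESC_ROWS = [
    (1, 1000, "Max"), (2, 2000, "Tim"), (3, 3000, "Rita"),
    (5, 4000, "Sven"), (6, 5000, "Tom"), (7, 6000, "Bob"),
]


def describe_ekw(test_rows, ekwdesc_rows):
    """For each (id, name, ekw) row, join the descriptions of the codes found in ekw."""
    desc_by_value = {value: desc for _id, value, desc in ekwdesc_rows}
    result = []
    for row_id, name, ekw in test_rows:
        # Equivalent of "ekwdesc.value = ANY(test.ekw)": keep matching codes only.
        labels = [desc_by_value[v] for v in ekw if v in desc_by_value]
        # An inner join drops rows without any match, so do the same here.
        if labels:
            result.append((row_id, name, ", ".join(labels)))
    return result


if __name__ == "__main__":
    for row in describe_ekw(TEST_ROWS, EKWDESC_ROWS):
        print(row)
```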