Duplicate rows when joining three tables - sql

I'm using SQL Server 2014, and I've got a problem with a query. I've got three tables. A Report consists of ten each of ClothingObservation and HygieneObservation. The way I do this is by referencing the ReportId of Report in ten rows each of the two types of observations, for 20 observations per report in total. I want to select all the rows of one report. When I try to do this, I get 100 rows. My goal is to get 10 rows, or 20 rows with NULL values. This is for testing purposes at the moment, so Report contains just 1 row, and ClothingObservation and HygieneObservation contain 10 rows each, all referencing the ReportId of the one existing report.
My tables, details omitted for clarity:
CREATE TABLE HygieneObservation
(
HygieneObservationId int PRIMARY KEY IDENTITY NOT NULL,
...
ReportId int NOT NULL
)
CREATE TABLE ClothingObservation
(
ClothingObservationId int PRIMARY KEY IDENTITY NOT NULL,
...
ReportId int NOT NULL
)
CREATE TABLE Report
(
ReportId int PRIMARY KEY IDENTITY NOT NULL,
Period Date NOT NULL,
Reporter nvarchar(8) NOT NULL,
DepartmentId int NOT NULL
)
My query:
SELECT
Report.ReportId,
Report.Period,
Report.Reporter,
Report.DepartmentId,
ClothingObservation.ClothingObservationId,
HygieneObservation.HygieneObservationId
FROM Report
LEFT JOIN ClothingObservation ON
(ClothingObservation.ReportId = Report.ReportId)
LEFT JOIN HygieneObservation ON
(HygieneObservation.ReportId = Report.ReportId)
GROUP BY
Report.ReportId,
Period,
Reporter,
DepartmentId,
ClothingObservation.ClothingObservationId,
HygieneObservation.HygieneObservationId
This gives me 100 rows, which I understand is because each row in ClothingObservation is matched to each row in HygieneObservation. I thought that using GROUP BY would cause duplicates to be removed, but I'm obviously doing something wrong. Any hints?
Edit: Here's my data right now (details omitted).
Report:
ReportId Period Reporter DepartmentId
----------- ---------- -------- ------------
1 2016-05-01 username 1
ClothingObservation:
ClothingObservationId ... ReportId
--------------------- ... -----------
1 ... 1
2 ... 1
3 ... 1
4 ... 1
5 ... 1
6 ... 1
7 ... 1
8 ... 1
9 ... 1
10 ... 1
HygieneObservation:
HygieneObservationId ... ReportId
-------------------- ... -----------
3 ... 1
4 ... 1
5 ... 1
6 ... 1
7 ... 1
8 ... 1
9 ... 1
10 ... 1
12 ... 1
13 ... 1
Edit 2: If I run these two queries, I get my desired output (again, irrelevant details omitted from result):
SELECT * FROM Report
LEFT JOIN ClothingObservation ON
(ClothingObservation.ReportId = Report.ReportId)
SELECT * FROM Report
LEFT JOIN HygieneObservation ON
(HygieneObservation.ReportId = Report.ReportId)
ReportId Period Reporter DepartmentId ClothingObservationId ... ReportId
----------- ---------- -------- ------------ --------------------- ...- -----------
1 2016-05-01 username 1 1 ... 1
1 2016-05-01 username 1 2 ... 1
1 2016-05-01 username 1 3 ... 1
1 2016-05-01 username 1 4 ... 1
1 2016-05-01 username 1 5 ... 1
1 2016-05-01 username 1 6 ... 1
1 2016-05-01 username 1 7 ... 1
1 2016-05-01 username 1 8 ... 1
1 2016-05-01 username 1 9 ... 1
1 2016-05-01 username 1 10 ... 1
ReportId Period Reporter DepartmentId HygieneObservationId ... ReportId
----------- ---------- -------- ------------ -------------------- ... -----------
1 2016-05-01 username 1 3 ... 1
1 2016-05-01 username 1 4 ... 1
1 2016-05-01 username 1 5 ... 1
1 2016-05-01 username 1 6 ... 1
1 2016-05-01 username 1 7 ... 1
1 2016-05-01 username 1 8 ... 1
1 2016-05-01 username 1 9 ... 1
1 2016-05-01 username 1 10 ... 1
1 2016-05-01 username 1 12 ... 1
1 2016-05-01 username 1 13 ... 1
My goal is to get this output (or something like it) with one query.

Joining Report (1 row) to ClothingObservation (10 rows) produces 10 rows (1 x 10). Joining that result to HygieneObservation (10 rows) then gives 100 rows, because after the first join there are 10 rows with the same ReportId, and the second join matches each of them to all 10 rows in HygieneObservation. GROUP BY does not help here, since every (ClothingObservationId, HygieneObservationId) pair is distinct.
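To see where the 100 comes from, a small diagnostic along these lines can help. It is only a sketch against the tables from the question; the aliases ClothingRows, HygieneRows and JoinedRows are made up for illustration, and the product only equals the joined row count for reports that have at least one row in each table.

```sql
-- Rows per report in each child table, and the row count the double join produces.
SELECT r.ReportId,
       c.ClothingRows,
       h.HygieneRows,
       c.ClothingRows * h.HygieneRows AS JoinedRows   -- each clothing row pairs with each hygiene row
FROM Report r
JOIN (SELECT ReportId, COUNT(*) AS ClothingRows
      FROM ClothingObservation
      GROUP BY ReportId) c ON c.ReportId = r.ReportId
JOIN (SELECT ReportId, COUNT(*) AS HygieneRows
      FROM HygieneObservation
      GROUP BY ReportId) h ON h.ReportId = r.ReportId;
```

With the test data described in the question (10 rows in each observation table) this should report 10, 10 and 100 for report 1.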
The solution for "20 rows with NULL values":
SELECT
Report.ReportId,
Report.Period,
Report.Reporter,
Report.DepartmentId,
ClothingObservation.ClothingObservationId,
NULL AS HygieneObservationId
FROM Report
LEFT JOIN ClothingObservation ON
(ClothingObservation.ReportId = Report.ReportId)
UNION ALL
SELECT
Report.ReportId,
Report.Period,
Report.Reporter,
Report.DepartmentId,
NULL AS ClothingObservationId,
HygieneObservation.HygieneObservationId
FROM Report
LEFT JOIN HygieneObservation ON
(HygieneObservation.ReportId = Report.ReportId)
How it works:
You essentially write two separate queries: one that joins Report to ClothingObservation and another that joins Report to HygieneObservation. You then combine the two queries with UNION ALL.
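If the NULL placeholders are awkward to consume, one possible variation is to stack both observation types into a single id column with a discriminator. This is only a sketch: ObservationType and ObservationId are names I made up, and because it uses inner joins, a report with no observations at all would not appear.

```sql
-- One row per observation, tagged with the table it came from.
SELECT r.ReportId,
       r.Period,
       r.Reporter,
       r.DepartmentId,
       'Clothing' AS ObservationType,               -- hypothetical discriminator column
       c.ClothingObservationId AS ObservationId     -- hypothetical shared id column
FROM Report r
JOIN ClothingObservation c ON c.ReportId = r.ReportId
UNION ALL
SELECT r.ReportId,
       r.Period,
       r.Reporter,
       r.DepartmentId,
       'Hygiene',
       h.HygieneObservationId
FROM Report r
JOIN HygieneObservation h ON h.ReportId = r.ReportId
ORDER BY ReportId, ObservationType, ObservationId;
```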
The solution for "get 10 rows"
This is complex as it involves what I call "vertical merging" or "Merge Join". Below is the query (Update: I have tested it).
SELECT
Report.ReportId,
Report.Period,
Report.Reporter,
Report.DepartmentId,
MergedObservations.ClothingObservationId,
MergedObservations.HygieneObservationId
FROM Report
LEFT JOIN
( SELECT COALESCE( ClothingObservation.ReportID, HygieneObservation.ReportID ) AS ReportID,
HygieneObservationID, ClothingObservationID -- Add appropriate columns
FROM
( SELECT ROW_NUMBER() OVER( PARTITION BY ReportID ORDER BY ClothingObservationID ) AS ResultID, ReportID, ClothingObservationID
FROM ClothingObservation ) AS ClothingObservation
FULL OUTER JOIN
( SELECT ROW_NUMBER() OVER( PARTITION BY ReportID ORDER BY HygieneObservationID ) AS ResultID, ReportID, HygieneObservationID
FROM HygieneObservation ) AS HygieneObservation
ON ClothingObservation.ReportID = HygieneObservation.ReportID
AND ClothingObservation.ResultID = HygieneObservation.ResultID
) AS MergedObservations
ON Report.ReportID = MergedObservations.ReportID
How it works:
Because ClothingObservation and HygieneObservation are not directly related to each other and can have differing numbers of rows per ReportID, I use the ROW_NUMBER() function to generate a join key. I then do a "Merge Join" on ReportID and the output of the ROW_NUMBER() function, so the first clothing row of a report is paired with its first hygiene row, the second with the second, and so on.
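To inspect the generated join key on its own, something like the following can be run first. It is a sketch against the same tables; SourceTable and ObservationId are illustrative names, not columns from the question.

```sql
-- Number each table's rows within a report; ResultID is the synthetic pairing key.
SELECT 'Clothing' AS SourceTable,
       ReportId,
       ROW_NUMBER() OVER (PARTITION BY ReportId ORDER BY ClothingObservationId) AS ResultID,
       ClothingObservationId AS ObservationId
FROM ClothingObservation
UNION ALL
SELECT 'Hygiene',
       ReportId,
       ROW_NUMBER() OVER (PARTITION BY ReportId ORDER BY HygieneObservationId),
       HygieneObservationId
FROM HygieneObservation
ORDER BY ReportId, ResultID, SourceTable;
```

Rows that share the same ReportId and ResultID are the ones the FULL OUTER JOIN places side by side; a ResultID that exists in only one table ends up with NULL on the other side.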
Sample Data
I have converted your sample data into tables to test the above queries.
CREATE TABLE Report( ReportId INT, Period DATETIME, Reporter VARCHAR( 20 ), DepartmentId INT )
CREATE TABLE ClothingObservation( ClothingObservationID INT, ReportId INT )
CREATE TABLE HygieneObservation( HygieneObservationID INT, ReportId INT )
INSERT INTO Report
VALUES( 1, '2016-05-01', 'username', 1 )
INSERT INTO ClothingObservation
VALUES
( 1, 1 ), ( 2, 1 ), ( 3, 1 ), ( 4, 1 ), ( 5, 1 ), ( 6, 1 ), ( 7, 1 ), ( 8, 1 ), ( 9, 1 ), ( 10, 1 )
INSERT INTO HygieneObservation
VALUES
( 3, 1 ), ( 4, 1 ), ( 5, 1 ), ( 6, 1 ), ( 7, 1 ), ( 8, 1 ), ( 9, 1 ), ( 10, 1 ), ( 11, 1 ), ( 12, 1 ), ( 13, 1 )

You can also try to use the query below:
SELECT
ReportId = ISNULL(v1.ReportId, v2.ReportId),
Period = ISNULL(v1.Period, v2.Period),
Reporter = ISNULL(v1.Reporter, v2.Reporter),
DepartmentId = ISNULL(v1.DepartmentId, v2.DepartmentId),
v1.ClothingObservationId,
v2.HygieneObservationId
FROM
(
SELECT
RowNumber = ROW_NUMBER() OVER(Partition BY r.ReportId ORDER BY c.ClothingObservationId),
r.ReportId,
r.Period,
r.Reporter,
r.DepartmentId,
c.ClothingObservationId
FROM
Report r
LEFT JOIN ClothingObservation c ON c.ReportId = r.ReportId) v1
FULL JOIN
(
SELECT
RowNumber = ROW_NUMBER() OVER(Partition BY r.ReportId ORDER BY h.HygieneObservationId),
r.ReportId,
r.Period,
r.Reporter,
r.DepartmentId,
h.HygieneObservationId
FROM Report r
LEFT JOIN HygieneObservation h ON h.ReportId = r.ReportId) v2 ON v1.RowNumber = v2.RowNumber AND v1.ReportId = v2.ReportId
ORDER BY ReportId

Related

SQL check if group contains certain values of given column (ORACLE)

I have table audit_log with these records:
log_id | request_id | status_id
1 | 2 | 5
2 | 2 | 10
3 | 2 | 20
4 | 3 | 10
5 | 3 | 20
I would like to know if there exists request_ids having status_id 5 and 10 at the same time. So this query should return request_id = 2 as its column status_id has values 5 and 10 (request_id 3 is omitted because status_id column has only value of 10 without 5).
How could I do this with SQL?
I think I should use group by request_id, but I don't know how to check if group has status_id with values 5 and 10?
Thanks,
mismas
This could be a way:
/* input data */
with yourTable(log_id , request_id , status_id) as (
select 1 , 2 , 5 from dual union all
select 2 , 2 , 10 from dual union all
select 3 , 2 , 20 from dual union all
select 4 , 3 , 10 from dual union all
select 5 , 3 , 20 from dual
)
/* query */
select request_id
from yourTable
group by request_id
having count( distinct case when status_id in (5,10) then status_id end) = 2
How it works:
select request_id,
case when status_id in (5,10) then status_id end as checkColumn
from yourTable
gives
REQUEST_ID CHECKCOLUMN
---------- -----------
2 5
2 10
2
3 10
3
So the condition count (distinct ...) = 2 does the work
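To make the counting step visible, the aggregate can be selected instead of filtered on. This sketch assumes the audit_log table from the question; matched_statuses is just an illustrative alias.

```sql
-- How many of the wanted statuses (5 and 10) each request actually has.
select request_id,
       count(distinct case when status_id in (5, 10) then status_id end) as matched_statuses
from audit_log
group by request_id
order by request_id;
```

For the sample rows this gives 2 for request_id 2 and 1 for request_id 3, which is why only request 2 passes the "= 2" test.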
SELECT request_id
FROM table_name
GROUP BY request_id
HAVING COUNT( CASE status_id WHEN 5 THEN 1 END ) > 0
AND COUNT( CASE status_id WHEN 10 THEN 1 END ) > 0
To check if both values exists (without regard to additional values) you can filter before aggregation:
select request_id
from yourTable
where status_id in (5,10)
group by request_id
having count(*) = 2 -- status_id is unique
-- or
having count(distinct status_id) = 2 -- status_id exists multiple times
This should do it:
select
log5.*, log10.status_id
from
audit_log log5
join audit_log log10 on log10.request_id = log5.request_id
where
log5.status_id = 5
and log10.status_id = 10
order by
log5.request_id
;
Here's the output:
+ ----------- + --------------- + -------------- + -------------- +
| log_id | request_id | status_id | status_id |
+ ----------- + --------------- + -------------- + -------------- +
| 1 | 2 | 5 | 10 |
+ ----------- + --------------- + -------------- + -------------- +
1 rows
And here's the sql to set up the example:
create table audit_log (
log_id int,
request_id int,
status_id int
);
insert into audit_log values (1,2,5);
insert into audit_log values (2,2,10);
insert into audit_log values (3,2,20);
insert into audit_log values (4,3,10);
insert into audit_log values (5,3,20);

sql select parent child recursive in one field

I do not know how to write a recursive SELECT query.
id idparent jobNO
--------------------------------
1 0 1
2 1 2
3 1 3
4 0 4
5 4 5
6 5 6
How can I get results like this with SQL Server?
id idparent jobNO ListJob
----------------------------------------
1 0 1 1
2 1 2 1/2
3 1 3 1/3
4 0 4 4
5 4 5 4/5
6 5 6 4/5/6
You need to use a Recursive Common Table Expression.
There are many useful articles online.
Useful Links
Simple Talk: SQL Server CTE Basics
blog.sqlauthority: Recursive CTE
Here is a solution to your question:
CREATE TABLE #TEST
(
id int not null,
idparent int not null,
jobno int not null
);
INSERT INTO #Test VALUES
(1,0,1),
(2,1,2),
(3,1,3),
(4,0,4),
(5,4,5),
(6,5,6);
WITH CTE AS (
-- This is the anchor member (where the recursion starts): select items with no parent
SELECT id, idparent, jobno, CONVERT(VARCHAR(MAX),jobno) AS ListJob
FROM #Test
WHERE idParent = 0
UNION ALL
-- This is the recursive part: It joins to CTE
SELECT t.id, t.idparent, t.jobno, c.ListJob + '/' + CONVERT(VARCHAR(MAX),t.jobno) AS ListJob
FROM #Test t
INNER JOIN CTE c ON t.idParent = c.id
)
SELECT * FROM CTE
ORDER BY id;
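If the nesting level is also useful, the same CTE can carry a counter. This is a sketch on top of the #Test table above (it has to run in the same session that created it); the Depth column is my own addition, and the MAXRECURSION hint only states SQL Server's default limit of 100 explicitly.

```sql
WITH CTE AS (
    -- Anchor member: root rows start at depth 0
    SELECT id, idparent, jobno,
           CONVERT(VARCHAR(MAX), jobno) AS ListJob,
           0 AS Depth
    FROM #Test
    WHERE idparent = 0
    UNION ALL
    -- Recursive member: extend the parent's path and add one level
    SELECT t.id, t.idparent, t.jobno,
           c.ListJob + '/' + CONVERT(VARCHAR(MAX), t.jobno),
           c.Depth + 1
    FROM #Test t
    INNER JOIN CTE c ON t.idparent = c.id
)
SELECT id, idparent, jobno, ListJob, Depth
FROM CTE
ORDER BY id
OPTION (MAXRECURSION 100);
```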

In Oracle SQL, how do I UPDATE columns specified by a priority list?

So, I've had this question answered already, but now I need the Oracle SQL solution. (See: An update with multiple conditions. SQL 2008)
But to run through it again..
Below is the current table "table1".
ProjectID UserID RoleID
101 1 10
101 2 10
102 2 10
102 3 10
103 1 10
Currently there is only one type of Role, role '10', but I want to add a new role, role '11', which will act as a lead. So any project that has a user with the role of '10' should have a lead. The user chosen to be lead will be based on a priority list; in this example we'll say the order is 1, 2, 3.
Expected result...
ProjectID UserID RoleID
101 1 11
101 2 10
102 2 11
102 3 10
103 1 11
I was unable to get the WITH clause, from the previous solution, to work properly, as from what I have learned, Oracle does not take a FROM in a WITH clause.
Here is the working query that I essentially need to use in an UPDATE, and update roleid to 11 where PriorityForLead is = 1.
select t.*, row_number() over (partition by projectid
order by (case when userid = 1 then 1
when userid = 2 then 2
when userid = 3 then 3
else 4
end )
) as PriorityForLead
from table1 t
update table1 t1
set roleid = 11
where roleid = 10 and
(case when userid = 1 then 1 when userid = 2 then 2 when userid = 3 then 3 else 4 end) =
(select min(case when userid = 1 then 1 when userid = 2 then 2 when userid = 3 then 3 else 4 end)
from table1
where projectid = t1.projectid);
EDIT:
SQL> create table table1 (projectid number, userid number, roleid number);
Table created.
SQL> insert into table1 values (101, 1, 10);
1 row created.
SQL> insert into table1 values (101, 2, 10);
1 row created.
SQL> insert into table1 values (102, 2, 10);
1 row created.
SQL> insert into table1 values (102, 3, 10);
1 row created.
SQL> insert into table1 values (103, 1, 10);
1 row created.
SQL> select * from table1;
PROJECTID USERID ROLEID
---------- ---------- ----------
101 1 10
101 2 10
102 2 10
102 3 10
103 1 10
SQL> update table1 t1
2 set roleid = 11
3 where roleid = 10 and
4 (case when userid = 1 then 1 when userid = 2 then 2 when userid = 3 then 3 else 4 end) =
5 (select min(case when userid = 1 then 1 when userid = 2 then 2 when userid = 3 then 3 else 4 end)
6 from table1
7 where projectid = t1.projectid);
3 rows updated.
SQL> select * from table1;
PROJECTID USERID ROLEID
---------- ---------- ----------
101 1 11
101 2 10
102 2 11
102 3 10
103 1 11
Assuming that the 3 columns provided are the compound primary key for this table, the query below should be a correct conversion to Oracle.
update table1 t
set RoleId = 11
WHERE EXISTS (SELECT 1 FROM
(select t2.*,
row_number() over (partition by projectid
order by (case when userid = 1 then 1
when userid = 2 then 2
when userid = 3 then 3
else 4
end
)
) as PriorityForLead
from table1 t2) toupdate
WHERE toupdate.PriorityForLead = 1
AND t.ProjectID = toupdate.ProjectID
AND t.UserID = toupdate.UserID
AND t.RoleID = toupdate.RoleId);
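Another way to reuse the ROW_NUMBER() query from the question in Oracle is a MERGE keyed on rowid. Treat this as an untested sketch: the aliases rid and priority_for_lead are made up, and unlike the MIN()-based UPDATE it promotes exactly one row per project even when two users tie on priority.

```sql
merge into table1 t
using (
    -- Rank the role-10 users of each project by the priority list
    select rowid as rid,
           row_number() over (
               partition by projectid
               order by case userid when 1 then 1
                                    when 2 then 2
                                    when 3 then 3
                                    else 4
                        end
           ) as priority_for_lead
    from table1
    where roleid = 10
) s
on (t.rowid = s.rid)
when matched then
    update set t.roleid = 11
    where s.priority_for_lead = 1;   -- only the top-ranked user becomes lead
```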

sql query to return number of rows dynamically

I have a table of form
CREATE TABLE [dbo].[table1](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[clientid] [int] NULL,
[startdate] [int] NULL,
[copyid] [int] NULL
)
data in the table is of form:
id clientid startdate copyid
1 4 11 1
2 4 12 1
3 4 44 2
3 5 123 1
4 5 15 1
5 5 12 2
6 5 12 2
7 5 12 2
the copyid is subset of clientid
My question is that can i form a select query which returns a table with N number of rows
and is a copy of clientid and copyid combination with copyid incremented.
For e.g. it should if clientid is taken as 4 and copyid as 1 and N as 6 it should return 6 rows like
clientid startdate copyid
4 11 3
4 12 3
4 11 4
4 12 4
4 11 5
4 12 5
N will always be a multiple of client and copy combination
I know how to do this using loops. But is it possible using a single select query?
This can be accomplished using a simple cursor.
Using the sample data you gave in the question I created the following solution:
DECLARE @ClientID INT = 4
DECLARE @CopyID INT = 1
DECLARE @N INT = 6
;WITH DATA
AS (SELECT *,
Row_number ()
OVER (
ORDER BY ID) RN,
Count(*)
OVER (
PARTITION BY CLIENTID) CID
FROM (SELECT *,
Max(COPYID)
OVER (
PARTITION BY CLIENTID) MaxID,
0 AS root
FROM TABLE1)T
WHERE CLIENTID = @ClientID
AND COPYID = @CopyID),
CTE
AS (SELECT *
FROM DATA
UNION ALL
SELECT t2.[ID],
t2.[CLIENTID],
t2.[STARTDATE],
t2.[COPYID],
t2.MAXID,
t2.ROOT + 1,
t2.RN + T2.CID RN,
T2.CID
FROM DATA t1
INNER JOIN CTE t2
ON t1.ID = t2.ID
WHERE t2.RN < @N - 1)
SELECT CLIENTID,
STARTDATE,
MAXID + ROOT + 1 COPYID
FROM CTE
WHERE RN <= @N
ORDER BY COPYID
A working example can be found on SQL Fiddle.
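A non-recursive variation is to cross the source rows with a small generated list of numbers. This is only a sketch under a few assumptions: the CTE names src, cnt, mx and nums are mine, sys.all_objects is used merely as a convenient row source with at least @N rows, and @N is a multiple of the number of source rows as stated in the question.

```sql
DECLARE @ClientID INT = 4, @CopyID INT = 1, @N INT = 6;

WITH src AS (   -- the rows to be copied
    SELECT id, clientid, startdate
    FROM table1
    WHERE clientid = @ClientID AND copyid = @CopyID
),
cnt AS (        -- how many rows one copy contains
    SELECT COUNT(*) AS c FROM src
),
mx AS (         -- highest copyid already used by this client
    SELECT MAX(copyid) AS m FROM table1 WHERE clientid = @ClientID
),
nums AS (       -- 1, 2, 3, ... up to @N
    SELECT TOP (@N) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
SELECT s.clientid,
       s.startdate,
       mx.m + nums.n AS copyid
FROM src AS s
CROSS JOIN cnt
CROSS JOIN mx
INNER JOIN nums ON nums.n <= @N / NULLIF(cnt.c, 0)   -- number of copies needed
ORDER BY copyid, s.id;
```

For client 4, copy 1 and N = 6 this should yield the two source rows three times, with copyid 3, 4 and 5, as in the expected output in the question.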

What is the SQL for 'next' and 'previous' in a table?

I have a table of items, each of which has a date associated with it. If I have the date associated with one item, how do I query the database with SQL to get the 'previous' and 'subsequent' items in the table?
It is not possible to simply add (or subtract) a value, as the dates do not have a regular gap between them.
One possible application would be 'previous/next' links in a photo album or blog web application, where the underlying data is in a SQL table.
I think there are two possible cases:
Firstly where each date is unique:
Sample data:
1,3,8,19,67,45
What query (or queries) would give 3 and 19 when supplied 8 as the parameter? (or the rows 3,8,19). Note that there are not always three rows to be returned - at the ends of the sequence one would be missing.
Secondly, if there is a separate unique key to order the elements by, what is the query to return the set 'surrounding' a date? The order expected is by date then key.
Sample data:
(key:date) 1:1,2:3,3:8,4:8,5:19,10:19,11:67,15:45,16:8
What query for '8' returns the set:
2:3,3:8,4:8,16:8,5:19
or what query generates the table:
key date prev-key next-key
1 1 null 2
2 3 1 3
3 8 2 4
4 8 3 16
5 19 16 10
10 19 5 11
11 67 10 15
15 45 11 null
16 8 4 5
The table order is not important - just the next-key and prev-key fields.
Both TheSoftwareJedi and Cade Roux have solutions that work for the data sets I posted last night. For the second question, however, both seem to fail for the dataset and expected results shown above, where the expected order is by date and then key.
Select max(element) From Data Where Element < 8
Union
Select min(element) From Data Where Element > 8
But generally it is more useful to think of SQL in terms of set-oriented operations rather than iterative ones.
Self-joins.
For the table:
/*
CREATE TABLE [dbo].[stackoverflow_203302](
[val] [int] NOT NULL
) ON [PRIMARY]
*/
With parameter @val
SELECT cur.val, MAX(prv.val) AS prv_val, MIN(nxt.val) AS nxt_val
FROM stackoverflow_203302 AS cur
LEFT JOIN stackoverflow_203302 AS prv
ON cur.val > prv.val
LEFT JOIN stackoverflow_203302 AS nxt
ON cur.val < nxt.val
WHERE cur.val = @val
GROUP BY cur.val
You could make this a stored procedure with output parameters or just join this as a correlated subquery to the data you are pulling.
Without the parameter, for your data the result would be:
val prv_val nxt_val
----------- ----------- -----------
1 NULL 3
3 1 8
8 3 19
19 8 45
45 19 67
67 45 NULL
For the modified example, you use this as a correlated subquery:
/*
CREATE TABLE [dbo].[stackoverflow_203302](
[ky] [int] NOT NULL,
[val] [int] NOT NULL,
CONSTRAINT [PK_stackoverflow_203302] PRIMARY KEY CLUSTERED (
[ky] ASC
)
)
*/
SELECT cur.ky AS cur_ky
,cur.val AS cur_val
,prv.ky AS prv_ky
,prv.val AS prv_val
,nxt.ky AS nxt_ky
,nxt.val as nxt_val
FROM (
SELECT cur.ky, MAX(prv.ky) AS prv_ky, MIN(nxt.ky) AS nxt_ky
FROM stackoverflow_203302 AS cur
LEFT JOIN stackoverflow_203302 AS prv
ON cur.ky > prv.ky
LEFT JOIN stackoverflow_203302 AS nxt
ON cur.ky < nxt.ky
GROUP BY cur.ky
) AS ordering
INNER JOIN stackoverflow_203302 as cur
ON cur.ky = ordering.ky
LEFT JOIN stackoverflow_203302 as prv
ON prv.ky = ordering.prv_ky
LEFT JOIN stackoverflow_203302 as nxt
ON nxt.ky = ordering.nxt_ky
With the output as expected:
cur_ky cur_val prv_ky prv_val nxt_ky nxt_val
----------- ----------- ----------- ----------- ----------- -----------
1 1 NULL NULL 2 3
2 3 1 1 3 8
3 8 2 3 4 19
4 19 3 8 5 67
5 67 4 19 6 45
6 45 5 67 NULL NULL
In SQL Server, I prefer to make the subquery a Common Table Expression. This makes the code more linear, less nested and easier to follow when there are many levels of nesting (also, less repetition is required on some re-joins).
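As a rough illustration of that preference, the derived table from the query above can be moved into a CTE like this. It is the same logic against the same stackoverflow_203302 table, only rearranged.

```sql
WITH ordering AS (
    -- For every key, find the nearest smaller and nearest larger key
    SELECT cur.ky, MAX(prv.ky) AS prv_ky, MIN(nxt.ky) AS nxt_ky
    FROM stackoverflow_203302 AS cur
    LEFT JOIN stackoverflow_203302 AS prv ON cur.ky > prv.ky
    LEFT JOIN stackoverflow_203302 AS nxt ON cur.ky < nxt.ky
    GROUP BY cur.ky
)
SELECT cur.ky  AS cur_ky,
       cur.val AS cur_val,
       prv.ky  AS prv_ky,
       prv.val AS prv_val,
       nxt.ky  AS nxt_ky,
       nxt.val AS nxt_val
FROM ordering
INNER JOIN stackoverflow_203302 AS cur ON cur.ky = ordering.ky
LEFT JOIN stackoverflow_203302 AS prv ON prv.ky = ordering.prv_ky
LEFT JOIN stackoverflow_203302 AS nxt ON nxt.ky = ordering.nxt_ky;
```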
Firstly, this should work:
select min(a)
from theTable
where a > 8
select max(a)
from theTable
where a < 8
For the second question that I begged you to ask...:
select *
from theTable
where date = 8
union all
select *
from theTable
where key = (select min(key)
from theTable
where key > (select max(key)
from theTable
where date = 8)
)
union all
select *
from theTable
where key = (select max(key)
from theTable
where key < (select min(key)
from theTable
where date = 8)
)
order by key
SELECT 'next' AS direction, MIN(date_field) AS date_key
FROM table_name
WHERE date_field > current_date
GROUP BY 1 -- necessity for group by varies from DBMS to DBMS in this context
UNION
SELECT 'prev' AS direction, MAX(date_field) AS date_key
FROM table_name
WHERE date_field < current_date
GROUP BY 1
ORDER BY 1 DESC;
Produces:
direction date_key
--------- --------
prev 3
next 19
My own attempt at the set solution, based on TheSoftwareJedi.
First question:
select date from test where date = 8
union all
select max(date) from test where date < 8
union all
select min(date) from test where date > 8
order by date;
Second question:
While debugging this, I used the data set:
(key:date) 1:1,2:3,3:8,4:8,5:19,10:19,11:67,15:45,16:8,17:3,18:1
to give this result:
select * from test2 where date = 8
union all
select * from (select * from test2
where date = (select max(date) from test2
where date < 8))
where key = (select max(key) from test2
where date = (select max(date) from test2
where date < 8))
union all
select * from (select * from test2
where date = (select min(date) from test2
where date > 8))
where key = (select min(key) from test2
where date = (select min(date) from test2
where date > 8))
order by date,key;
In both cases the final order by clause is strictly speaking optional.
If your RDBMS supports LAG and LEAD, this is straightforward (Oracle, PostgreSQL, SQL Server 2012 and later).
These functions let you read the row on either side of any given row in a single query.
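A minimal sketch of that approach, assuming a hypothetical table items(item_key, item_date) holding the (key:date) pairs from the question and the "date then key" ordering:

```sql
-- Previous and next key for every row, ordered by date and then key.
SELECT item_key,
       item_date,
       LAG(item_key)  OVER (ORDER BY item_date, item_key) AS prev_key,
       LEAD(item_key) OVER (ORDER BY item_date, item_key) AS next_key
FROM items
ORDER BY item_date, item_key;

-- Neighbours of a single row: compute the window first, then filter.
SELECT item_key, item_date, prev_key, next_key
FROM (
    SELECT item_key,
           item_date,
           LAG(item_key)  OVER (ORDER BY item_date, item_key) AS prev_key,
           LEAD(item_key) OVER (ORDER BY item_date, item_key) AS next_key
    FROM items
) numbered
WHERE item_key = 4;
```

The filter has to sit outside the window calculation; applying WHERE item_key = 4 in the inner query would leave LAG and LEAD with only one row to look at. For the second sample data set, key 4 (date 8) should come back with prev_key 3 and next_key 16.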
Try this...
SELECT TOP 3 * FROM YourTable
WHERE Col >= (SELECT MAX(Col) FROM YourTable b WHERE Col < @Parameter)
ORDER BY Col