join equal or max - sql

I want to join two tables by 'EmployeeId' and 'CalendarMonthId', but problem is that I want to connect record with the bigest CalednarMonthId if not exist equal. I use Azure Synapse SQL pool.
Example data and code
create table emp_contr (
EmployeeId int,
CalendarMonthId int,
IsDeleted int
--login varchar(20)
)
create table contr (
ContractId int,
EmployeeId int,
CalendarMonthId int,
value int
)
insert into emp_contr values (1, 202201, 0)
insert into emp_contr values (1, 202202, 0)
insert into emp_contr values (1, 202205, 0)
insert into emp_contr values (1, 202206, 0)
insert into emp_contr values (2, 202202, 0)
insert into emp_contr values (2, 202203, 0)
insert into contr values (1, 1, 202201, 5)
insert into contr values (2, 1, 202202, 2)
insert into contr values (40, 2, 202202, 2)
insert into contr values (50, 2, 202203, 0)
Base on this data I have problem with connect table: emp_contr, row: EmployeeId:1, CalendarMonthId:202205 with row from table contr from the same user and maximum CalendarMonthId:202202. I have query like this, but it doesn't work
select *
from emp_contr ec
join contr c
on ec.EmployeeId = c.EmployeeId
and (ec.CalendarMonthId = c.CalendarMonthId or ec.CalendarMonthId > max(c.CalendarMonthId))
order by ec.EmployeeId
I expect result like this:
EmployeeId
Emp_Contr_CalendarMonthId
Contr_CalendarMonthId
Value
ContractId
1
202206
202202
2
2
1
202205
202202
2
2
1
202202
202202
2
2
1
202201
202201
5
1
2
202203
202203
0
50
2
202202
202202
2
40
I don't know what I should do. Maybe should I add row in temporary table based on contr table with the same values like bigest one but different CalendarMonthId

TL;DR:
select ec.EmployeeId, ec.CalendarMonthId Emp_Contr_CalendarMonthId,
coalesce(c.CalendarMonthId,c2.CalendarMonthId) Contr_CalendarMonthId,
coalesce(c.value,c2.value) value,
coalesce(c.ContractId,c2.ContractId) ContractId,
-- additional columns for debugging
c.CalendarMonthId, c.value, c.ContractId,
c2.CalendarMonthId, c2.value, c2.ContractId
from emp_contr ec
left join contr c
on ec.EmployeeId = c.EmployeeId
and ec.CalendarMonthId = c.CalendarMonthId
join (
select *
from (
select EmployeeId,
CalendarMonthId,
value,
ContractId,
row_number() over (partition by EmployeeId order by CalendarMonthId desc) rn
from contr
) x
where rn = 1
) c2 on ec.EmployeeId = c2.EmployeeId
order by ec.EmployeeId, ec.CalendarMonthId desc
SQL Fiddle
The coalesce assumes all relevant columns are NOT NULL. If this is not the case you'll have to replace them with CASE-expressions in which you check if c.CalendarMonthId is null.
Explanation
The statement joins contr twice.
First using an outer join to obtain the matching results when present, without loosing rows without a match.
The second join is an inner join.
This assumes there is at least one contr row per emp_contr
The join does join an inline view which
uses the window function row_number() to filter only the last row per employee when ordered by CalendarMonthId
The last step is to combine the results, and pick the one from the first join if present and the one from the second join if not.

Related

Select TOP columns from table1, join table2 with their names

I have a TABLE1 with these two columns, storing departure and arrival identifiers from flights:
dep_id arr_id
1 2
6 2
6 2
6 2
6 2
3 2
3 2
3 2
3 4
3 4
3 6
3 6
and a TABLE2 with the respective IDs containing their ICAO codes:
id icao
1 LPPT
2 LPFR
3 LPMA
4 LPPR
5 LLGB
6 LEPA
7 LEMD
How can i select the top count of TABLE1 (most used departure id and most used arrival id) and group it with the respective ICAO code from TABLE2, so i can get from the provided example data:
most_arrivals most_departures
LPFR LPMA
It's simple to get ONE of them, but mixing two or more columns doesn't seem to work for me no matter what i try.
You can do it like this.
Create and populate tables.
CREATE TABLE dbo.Icao
(
id int NOT NULL PRIMARY KEY,
icao nchar(4) NOT NULL
);
CREATE TABLE dbo.Flight
(
dep_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id),
arr_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id)
);
INSERT INTO dbo.Icao (id, icao)
VALUES
(1, N'LPPT'),
(2, N'LPFR'),
(3, N'LPMA'),
(4, N'LPPR'),
(5, N'LLGB'),
(6, N'LEPA'),
(7, N'LEMD');
INSERT INTO dbo.Flight (dep_id, arr_id)
VALUES
(1, 2),
(6, 2),
(6, 2),
(6, 2),
(6, 2),
(3, 2),
(3, 2),
(3, 2),
(3, 4),
(3, 4),
(3, 6),
(3, 6);
Then do a SELECT using two subqueries.
SELECT
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.arr_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_arrivals',
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.dep_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_departures';
Click this button on the toolbar to include the actual execution plan, when you execute the query.
And this is the graphical execution plan for the query. Each icon represents an operation that will be performed by the SQL Server engine. The arrows represent data flows. The direction of flow is from right to left, so the result is the leftmost icon.
try this one:
select
(select name
from table2 where id = (
select top 1 arr_id
from table1
group by arr_id
order by count(*) desc)
) as most_arrivals,
(select name
from table2 where id = (
select top 1 dep_id
from table1
group by dep_id
order by count(*) desc)
) as most_departures

Check duplicates in sql table and replace the duplicates ID in another table

I have a table with duplicate entries (I forgot to make NAME column unique)
So I now have this Duplicate entry table called 'table 1'
ID NAME
1 John F Smith
2 Sam G Davies
3 Tom W Mack
4 Bob W E Jone
5 Tom W Mack
IE ID 3 and 5 are duplicates
Table 2
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 5 item34
NAMEID are ID from table 1. Table 2 ID 4 and 5 I want to have NAMEID of 3 (Tom W Mack's Orders) like so
Table 2 (correct version)
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 3 item34
Is there an easy way to find and update the duplicates NAMEID in table 2 then remove the duplicates from table 1
In this case what you can do is.
You can find how many duplicate records you have.
In Order to find duplicate records you can use.
SELECT ID, NAME,COUNT(1) as CNT FROM TABLE1 GROUP BY ID, NAME
This is will give you the count and you find all the duplicate records
and delete them manually.
Don't forget to alter your table after removing all the duplicate records.
Here's how you can do it:
-- set up the environment
create table #t (ID int, NAME varchar(50))
insert #t values
(1, 'John F Smith'),
(2, 'Sam G Davies'),
(3, 'Tom W Mack'),
(4, 'Bob W E Jone'),
(5, 'Tom W Mack')
create table #t2 (ID int, NAMEID int, ORDERS varchar(10))
insert #t2 values
(1, 2, 'item4'),
(2, 1, 'item5'),
(3, 4, 'item6'),
(4, 3, 'item23'),
(5, 5, 'item34')
go
-- update the referencing table first
;with x as (
select id,
first_value(id) over(partition by name order by id) replace_with
from #t
),
y as (
select #t2.nameid, x.replace_with
FROM #t2
join x on #t2.nameid = x.id
where #t2.nameid <> x.replace_with
)
update y set nameid = replace_with
-- delete duplicates from referenced table
;with x as (
select *, row_number() over(partition by name order by id) rn
from #t
)
delete x where rn > 1
select * from #t
select * from #t2
Pls, test first for performance and validity.
Let's use the example data
INSERT INTO TableA
(`ID`, `NAME`)
VALUES
(1, 'NameA'),
(2, 'NameB'),
(3, 'NameA'),
(4, 'NameC'),
(5, 'NameB'),
(6, 'NameD')
and
INSERT INTO TableB
(`ID`, `NAMEID`, `ORDERS`)
VALUES
(1, 2, 'itemB1'),
(2, 1, 'itemA1'),
(3, 4, 'itemC1'),
(4, 3, 'itemA2'),
(5, 5, 'itemB2'),
(5, 6, 'itemD1')
(makes it a bit easier to spot the duplicates and check the result)
Let's start with a simple query to get the smallest ID for a given NAME
SELECT
NAME, min(ID)
FROM
tableA
GROUP BY
NAME
And the result is [NameA,1], [NameB,2], [NameC,4], [NameD,6]
Now if you use that as an uncorrelated subquery for a JOIN with the base table like
SELECT
keep.kid, dup.id
FROM
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
It finds all duplicates that have the same name as in the result of the subquery but a different id + it also gives you the id of the "original", i.e. the smallest id for that name.
For the example it's [1,3], [2,5]
Now you can use that in an UPDATE query like
UPDATE
TableB as b
JOIN
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
SET
b.NAMEID=keep.kid
WHERE
b.NAMEID=dup.id
And the result is
ID,NAMEID,ORDERS
1, 2, itemB1
2, 1, itemA1
3, 4, itemC1
4, 1, itemA2 <- now has NAMEID=1
5, 2, itemB2 <- now has NAMEID=2
5, 6, itemD1
To eleminate the duplicates from tableA you can use the first query again.

Access 97 Outer join issue

I have two tables I want to join.
Table A has one column, named "Week", and contains 52 rows: 1,2,3,4,5,6 etc.
Table 2 has three columns, named "Name", "Week", and "Total", and contains 10 rows:
'Bob', 1, 1
'Bob', 3, 1
'Joe', 4, 1
'Bob', 6, 1
I want to join these together so that my data looks like:
NAME|WEEK|TOTAL
'Bob', 1, 1
'Bob', 2, 0
'Bob', 3, 1
'Bob', 4, 0
'Bob', 5, 0
'Bob', 6, 1
As you can see, a simple outer join. However, when I try to do this, I'm not getting the expected result, no matter what kind of join I use.
My query below:
SELECT a.WEEK, b.Total
FROM Weeks a LEFT JOIN Totals b ON (a.Week = b.Week and b.Name ='Bob')
The result of this query is
NAME|WEEK|TOTAL
'Bob', 1, 1
'Bob', 3, 1
'Bob', 6, 1
Thanks in advance for the help!
I know its access but your join is incorrect. Here we go in sql server..same concept just look at the join condition:
--dont worry about this code im just creating some temp tables
--table to store one column (mainly week number 1,2..52)
CREATE TABLE #Weeks
(
weeknumber int
)
--insert some test data
--week numbers...I'll insert some for you
INSERT INTO #Weeks(weeknumber) VALUES(1)
INSERT INTO #Weeks(weeknumber) VALUES(2)
INSERT INTO #Weeks(weeknumber) VALUES(3)
INSERT INTO #Weeks(weeknumber) VALUES(4)
INSERT INTO #Weeks(weeknumber) VALUES(5)
INSERT INTO #Weeks(weeknumber) VALUES(6)
--create another table with two columns storing the week # and a total for that week
CREATE TABLE #Table2
(
weeknumber int,
total int
)
--insert some data
INSERT INTO #Table2(weeknumber, total) VALUES(1, 100)
--notice i skipped week 2 on purpose to show you the results
INSERT INTO #Table2(weeknumber, total) VALUES(3, 100)
--here's the magic
SELECT t1.weeknumber as weeknumber, ISNULL(t2.total,0) as total FROM
#Weeks t1 LEFT JOIN #Table2 t2 ON t1.weeknumber=t2.weeknumber
--get rid of the temp tables
DROP TABLE #table2
DROP TABLE #Weeks
Results:
1 100
2 0
3 100
4 0
5 0
6 0
Take your week number table (the table that has one column:
SELECT t1.weeknumber as weeknumber
Add to it a null check to replace the null value with a 0. I think there is something in access like ISNULL:
ISNULL(t2.total, 0) as total
And start your join from your first table and left join to your second table on the weeknumber field. The result is simple:
SELECT t1.weeknumber as weeknumber, ISNULL(t2.total,0) as total FROM
#Weeks t1 LEFT JOIN #Table2 t2 ON t1.weeknumber=t2.weeknumber
Do not pay attention to all the other code I have posted, that is only there to create temp tables and insert values into the tables.
SELECT b.Name, b.Week, b.Total
FROM Totals AS b
WHERE b.Name ='Bob'
UNION
SELECT 'Bob' AS Name, a.Week, 0 AS Total
FROM Weeks AS a
WHERE NOT EXISTS ( SELECT *
FROM Totals AS b
WHERE a.Week = b.Week
AND b.Name ='Bob' );
You were on the right track, but just needed to use a left join. Also the NZ function will put a 0 if total is null.
SELECT Totals.Person, Weeks.WeekNo, Nz(Totals.total, 0) as TotalAmount
FROM Weeks LEFT JOIN Totals
ON (Weeks.WeekNo = Totals.weekno and Totals.Person = 'Bob');
EDIT: The query you now have won't even give the results you've shown because you left out the Name field (Which is a bad name for a field because it is a reserved word.). You're still not providing all the information. This query works.
*Another Approach: * Create a separate query on the Totals table having a where clause: Name = 'Bob'
Select Name, WeekNo, Total From Totals Where Name = 'Bob';
and substitute that query for the Totals table in this query.
Select b.Name, w.WeekNo, b.total
from Weeks as w
LEFT JOIN qryJustBob as b
on .WeekNo = b.WeekNo;

Find rows with same ID and have a particular set of names

EDIT:
I have a table with 3 rows like so.
ID NAME REV
1 A 0
1 B 0
1 C 0
2 A 1
2 B 0
2 C 0
3 A 1
3 B 1
I want to find the ID wich has a particular set of Names and the REV is same
example:
Edit2: GBN's solution would have worked perfectly, but since i do not have the access to create new tables. The added constraint is that no new tables can be created.
if input = A,B then output is 3
if input = A ,B,C then output is 1 and not 1,2 since the rev level differs in 2.
The simplest way is to compare a COUNT per ID with the number of elements in your list:
SELECT
ID
FROM
MyTable
WHERE
NAME IN ('A', 'B', 'C')
GROUP BY
ID
HAVING
COUNT(*) = 3;
Note: ORDER BY isn't needed and goes after the HAVING if needed
Edit, with question update. In MySQL, it's easier to use a separate table for search terms
DROP TABLE IF EXISTS gbn;
CREATE TABLE gbn (ID INT, `name` VARCHAR(100), REV INT);
INSERT gbn VALUES (1, 'A', 0);
INSERT gbn VALUES (1, 'B', 0);
INSERT gbn VALUES (1, 'C', 0);
INSERT gbn VALUES (2, 'A', 1);
INSERT gbn VALUES (2, 'B', 0);
INSERT gbn VALUES (2, 'C', 0);
INSERT gbn VALUES (3, 'A', 0);
INSERT gbn VALUES (3, 'B', 0);
DROP TABLE IF EXISTS gbn1;
CREATE TABLE gbn1 ( `name` VARCHAR(100));
INSERT gbn1 VALUES ('A');
INSERT gbn1 VALUES ('B');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
INSERT gbn1 VALUES ('C');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
Edit 2, without extra table, use a derived (inline) table:
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
(SELECT 'A' AS `name`
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
) gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = 3 -- matches number of elements in gbn1 derived table
AND MIN(gbn.REV) = MAX(gbn.REV);
Similar to gbn, but allowing for the possibility of duplicate ID/Name combinations:
SELECT ID
FROM MyTable
WHERE NAME IN ('A', 'B', 'C')
GROUP BY ID
HAVING COUNT(DISTINCT NAME) = 3;
OKAY!... I solved my problem ! I modified GBN's logic to do it without a search table using the IN clause
1 flaw with doing MAX(rev) = MIN(REV) is: if i have a data like so .
ID NAME REV
1 A 0
1 B 1
1 A 1
then when I use a query like
Select ID from TABLE
where NAME in {A,B}
groupby ID
having count(*) = 2
and MIN(REV) = MAX(REV)
it will not show me the ID 1 as the min and max are different and the count is 3.
So i simply add another column to the groupby
so the final query is
Select ID from TABLE
where NAME in {A,B}
groupby ID,REV
having count(*) = 2
and MIN(REV) = MAX(REV)
Thanks,to all that helped. !

SQL Group By Problem

I have a table that has 3 cols namely points, project_id and creation_date. every time points are assigned a new record has been made, for example.
points = 20 project_id = 441 creation_date = 04/02/2011 -> Is one record
points = 10 project_id = 600 creation_date = 04/02/2011 -> Is another record
points = 5 project_id = 441 creation_dae = 06/02/2011 -> Is final record
(creation_date is the date on which record is entered and it is achieved by setting the default value to GETDATE())
now the problem is I want to get MAX points grouped by project_id but I also want creation_date to appear with it so I can use it for another purpose, if creation date is repeating its ok and I cannot group by creation_date because if I do so it will skip the points of project with id 600 and its wrong because id 600 is a different project and its only max points are 10 so it should be listed and its only possible if I do the grouping using project_id but then how should I also list creation_date
So far I am using this query to get MAX points of each project
SELECT MAX(points) AS points, project_id
FROM LogiCpsLogs AS LCL
WHERE (writer_id = #writer_id) AND (DATENAME(mm, GETDATE()) = DATENAME(mm, creation_date)) AND (points <> 0)
GROUP BY project_id
writer_id is the ID of writer whose points I want to see, like writer_id = 1, 2 or 3.
This query brings the result of current month only but I would like to list creation_date as well. Please help.
The subquery way
SELECT P.Project_ID, P.Creation_Date, T.Max_Points
FROM Projects P INNER JOIN
(
SELECT Project_ID, MAX(Points) AS Max_Points
FROM Projects
GROUP BY Project_ID
) T
ON P.Project_ID = T.Project_ID
AND P.Points = T.Max_Points
Please see comment: this will give you ALL days where max-points was achieved. If you only just want one, the query will be more complex.
Edits:
Misread requirements. Added additional constraint.
I'll give you sample..
SELECT MAX(POINTS),
PROJECT_ID,
CREATION_DATE
FROM yourtable
GROUP by CREATION_DATE,PROJECT_ID;
This should be what you want, you don't even need a group by or aggravate functions:
SELECT points, project_id, created_date
FROM #T AS LCL
WHERE writer_id = #writer_id AND points <> 0
AND NOT EXISTS (
SELECT TOP 1 1
FROM #T AS T2
WHERE T2.writer_id = #writer_id
AND T2.project_id = LCL.project_id
AND T2.points > LCL.points)
Where #T is your table, also if you want to only show the records where they were the total in general and not the total for just this given #writer_id then remove the restriction T2.writer_id = #writer_id from the inner query
And my code that I used to test:
DECLARE #T TABLE
(
writer_id int,
points int,
project_id int,
created_date datetime
)
INSERT INTO #T VALUES(1, 20, 441, CAST('20110204' AS DATETIME))
INSERT INTO #T VALUES(1, 10, 600, CAST('20110204' AS DATETIME))
INSERT INTO #T VALUES(1, 5, 441, CAST('20110202' AS DATETIME))
INSERT INTO #T VALUES(1, 15, 241, GETDATE())
INSERT INTO #T VALUES(1, 12, 241, GETDATE())
INSERT INTO #T VALUES(2, 12, 241, GETDATE())
SELECT * FROM #T
DECLARE #writer_id int = 1
My results:
Result Set (3 items)
points | project_id | created_date
20 | 441 | 04/02/2011 00:00:00
10 | 600 | 04/02/2011 00:00:00
15 | 241 | 21/09/2011 18:59:31
My solution use CROSS APPLY sub-queries.
For optimal performance I have created an index on project_id (ASC) & points (DESC sorting order) fields.
If you want to see all creation_date values that have maximum points then you can use WITH TIES:
CREATE TABLE dbo.Project
(
project_id INT PRIMARY KEY
,name NVARCHAR(100) NOT NULL
);
CREATE TABLE dbo.ProjectActivity
(
project_activity INT IDENTITY(1,1) PRIMARY KEY
,project_id INT NOT NULL REFERENCES dbo.Project(project_id)
,points INT NOT NULL
,creation_date DATE NOT NULL
);
CREATE INDEX IX_ProjectActivity_project_id_points_creation_date
ON dbo.ProjectActivity(project_id ASC, points DESC)
INCLUDE (creation_date);
GO
INSERT dbo.Project
VALUES (1, 'A'), (2, 'BB'), (3, 'CCC');
INSERT dbo.ProjectActivity (project_id, points, creation_date)
VALUES (1,100,'2011-01-01'), (1,110,'2011-02-02'), (1, 111, '2011-03-03'), (1, 111, '2011-04-04')
,(2, 20, '2011-02-02'), (2, 22, '2011-03-03')
,(3, 2, '2011-03-03');
SELECT p.*, ca.*
FROM dbo.Project p
CROSS APPLY
(
SELECT TOP(1) WITH TIES
pa.points, pa.creation_date
FROM dbo.ProjectActivity pa
WHERE pa.project_id = p.project_id
ORDER BY pa.points DESC
) ca;
DROP TABLE dbo.ProjectActivity;
DROP TABLE dbo.Project;