I have two SQL Server tables:
TableA TableB
+------+--------+ +-----+------------+
| aid | Name | | aid | Activity |
+------+--------+ +-----+------------+
| 1 | Jim | | 1 | Skiing |
| 2 | Jon | | 1 | Surfing |
| 3 | Stu | | 1 | Riding |
| 4 | Sam | | 3 | Biking |
| 5 | Kat | | 3 | Flying |
+------+--------+ +-----+------------+
I'm trying to get the following result, where the related activities are in a comma-separated list:
+------+--------+------------------------------+
| aid | Name | Activity |
+------+--------+------------------------------+
| 1 | Jim | Skiing, Surfing, Riding |
| 2 | Jon | NULL |
| 3 | Stu | Biking, Flying |
| 4 | Sam | NULL |
| 5 | Kat | NULL |
+------+--------+------------------------------+
I tried:
SELECT aid, Name, STRING_AGG([Activity], ',') AS Activity
FROM TableA
INNER JOIN TableB
ON TableA.aid = TableB.aid
GROUP BY aid, Name
Can someone help me with this SQL query? Thank you.
If you're using SQL Server 2017 or later, where STRING_AGG is available, you could use OUTER APPLY to aggregate the string:
drop table if exists #TableA;
go
create table #TableA (
aid int not null,
[Name] varchar(10) not null);
insert #TableA(aid, [Name]) values
(1, 'Jim'),
(2, 'Jon'),
(3, 'Stu'),
(4, 'Sam'),
(5, 'Kat');
drop table if exists #TableB;
go
create table #TableB (
aid int not null,
[Activity] varchar(10) not null);
insert #TableB(aid, [Activity]) values
(1, 'Skiing'),
(1, 'Surfing'),
(1, 'Riding'),
(3, 'Biking'),
(3, 'Flying');
select a.aid, a.[Name], oa.sa
from #TableA a
outer apply (select string_agg(b.Activity, ', ') sa
from #TableB b
where a.aid=b.aid) oa;
aid  Name  sa
1    Jim   Skiing, Surfing, Riding
2    Jon   NULL
3    Stu   Biking, Flying
4    Sam   NULL
5    Kat   NULL
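For comparison, the query from the question can probably stay close to its original shape by switching INNER JOIN to LEFT JOIN and qualifying the column names. A minimal sketch, assuming the #TableA and #TableB temp tables created above:

```sql
-- LEFT JOIN keeps people with no activities; the table aliases
-- resolve the ambiguous "aid" column from the original attempt.
SELECT a.aid,
       a.[Name],
       STRING_AGG(b.Activity, ', ') AS Activity
FROM #TableA a
LEFT JOIN #TableB b
    ON b.aid = a.aid
GROUP BY a.aid, a.[Name]
ORDER BY a.aid;
```

STRING_AGG ignores NULL inputs, so people without any activity get NULL rather than an empty string. The order of the items inside each list is not guaranteed unless you add WITHIN GROUP (ORDER BY ...).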
I have a logs table with the following definition:
Column | Type | Collation | Nullable | Default
------------------+-----------------------+-----------+----------+---------
id | integer | | not null |
work_location_id | uuid | | not null |
hard_disk_id | integer | | not null |
and a works table with the following definition:
Column | Type | Collation | Nullable | Default
-------------+-----------------------+-----------+----------+---------
id | integer | | not null |
location_id | uuid | | not null |
f_index | integer | | not null |
f_name | character varying(40) | | not null |
f_value | character varying(40) | | not null |
The logs table has data such as:
id | work_location_id | hard_disk_id
----+--------------------------------------+--------------
1 | 40e6215d-b5c6-4896-987c-f30f3678f608 | 1
2 | 3f333df6-90a4-4fda-8dd3-9485d27cee36 | 2
3 | c17bed94-3a9c-4c21-be49-dc77f96d49dc | 3
4 | 6ecd8c99-4036-403d-bf84-cf8400f67836 | 4
5 | 6ecd8c99-4036-403d-bf84-cf8400f67836 | 5
And the works table has data such as:
id | location_id | f_index | f_name | f_value
----+--------------------------------------+---------+-------------+------------
1 | 40e6215d-b5c6-4896-987c-f30f3678f608 | 1 | plot_crop | pears
2 | 3f333df6-90a4-4fda-8dd3-9485d27cee36 | 1 | plot_crop | pears
3 | c17bed94-3a9c-4c21-be49-dc77f96d49dc | 1 | plot_crop | pears
4 | 1cdc7c05-0acd-46cb-b48a-4d3e240a4548 | 1 | plot_crop | pears
5 | dae1eee7-508f-4a76-8906-8ff7b8bfab26 | 1 | plot_crop | pears
6 | 6ecd8c99-4036-403d-bf84-cf8400f67836 | 1 | plot_id | 137
7 | 6ecd8c99-4036-403d-bf84-cf8400f67836 | 2 | farmer_name | John Smith
Desired Output
I want to be able to query the two tables and get the following output
location_id | plot_id | farmer_name
---------------------------------------+---------+-------------
40e6215d-b5c6-4896-987c-f30f3678f608 | None | None
3f333df6-90a4-4fda-8dd3-9485d27cee36 | None | None
c17bed94-3a9c-4c21-be49-dc77f96d49dc | None | None
6ecd8c99-4036-403d-bf84-cf8400f67836 | 137 | John Smith
Notice how for location_id = 6ecd8c99-4036-403d-bf84-cf8400f67836, both values now show in one row. I tried to use GROUP BY location_id, but that didn't work: I was still getting duplicates.
I have also created a db-fiddle.
This looks like conditional aggregation:
select location_id,
max(f_value) filter (where f_name = 'plot_id') as plot_id,
max(f_value) filter (where f_name = 'farmer_name') as farmer_name
from t
group by location_id;
In other databases, you would just use:
max(case when f_name = 'plot_id' then f_value end) as plot_id
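Applied to the tables in the question, this might look like the sketch below. It assumes the works and logs tables as defined in the question, and it uses EXISTS rather than a join so that a location logged more than once is not duplicated before aggregation. Attributes with no matching row come back as NULL.

```sql
-- One row per location that appears in logs; FILTER picks the
-- f_value belonging to each attribute name.
select w.location_id,
       max(w.f_value) filter (where w.f_name = 'plot_id')     as plot_id,
       max(w.f_value) filter (where w.f_name = 'farmer_name') as farmer_name
from works w
where exists (select 1
              from logs l
              where l.work_location_id = w.location_id)
group by w.location_id;
```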
Since you want None as text rather than NULL, you can wrap each aggregate in COALESCE, as in the query below.
Schema (PostgreSQL v13)
-- create table
create table logs (
id integer not null,
work_location_id uuid not null,
hard_disk_id integer not null
);
create table works (
id integer not null,
location_id uuid not null,
f_index integer not null,
f_name varchar(40) not null,
f_value varchar(40) not null
);
-- insert data into table
insert into logs (id, work_location_id, hard_disk_id) values
(1, '40e6215d-b5c6-4896-987c-f30f3678f608', 1),
(2, '3f333df6-90a4-4fda-8dd3-9485d27cee36', 2),
(3, 'c17bed94-3a9c-4c21-be49-dc77f96d49dc', 3),
(4, '6ecd8c99-4036-403d-bf84-cf8400f67836', 4),
(5, '6ecd8c99-4036-403d-bf84-cf8400f67836', 5);
insert into works (id, location_id, f_index, f_name, f_value) values
(1, '40e6215d-b5c6-4896-987c-f30f3678f608', 1, 'plot_crop', 'pears'),
(2, '3f333df6-90a4-4fda-8dd3-9485d27cee36', 1, 'plot_crop', 'pears'),
(3, 'c17bed94-3a9c-4c21-be49-dc77f96d49dc', 1, 'plot_crop', 'pears'),
(4, '1cdc7c05-0acd-46cb-b48a-4d3e240a4548', 1, 'plot_crop', 'pears'),
(5, 'dae1eee7-508f-4a76-8906-8ff7b8bfab26', 1, 'plot_crop', 'pears'),
(6, '6ecd8c99-4036-403d-bf84-cf8400f67836', 1, 'plot_id', '137'),
(7, '6ecd8c99-4036-403d-bf84-cf8400f67836', 2, 'farmer_name', 'John Smith');
Query #1
select w.location_id,
COALESCE(MAX(case
when w.f_name = 'plot_id' then w.f_value
else NULL
end),'None') as "plot_id",
COALESCE(MAX(case
when w.f_name = 'farmer_name' then w.f_value
else NULL
end),'None') as "farmer_name"
from logs l
inner join works w on w.location_id = l.work_location_id
GROUP BY location_id;
 location_id                          | plot_id | farmer_name
--------------------------------------+---------+-------------
 3f333df6-90a4-4fda-8dd3-9485d27cee36 | None    | None
 40e6215d-b5c6-4896-987c-f30f3678f608 | None    | None
 6ecd8c99-4036-403d-bf84-cf8400f67836 | 137     | John Smith
 c17bed94-3a9c-4c21-be49-dc77f96d49dc | None    | None
I have a schema as below
Test
--------------------
| Id | Name |
--------------------
| 1 | A001 |
| 2 | B001 |
| 3 | C001 |
--------------------
RelatedTest
---------------------------------
| Id | Name | TestId |
---------------------------------
| 1 | Jack | NULL |
| 2 | Joe | 2 |
| 3 | Jane | 3 |
| 4 | Julia | 3 |
---------------------------------
To briefly explain this schema: RelatedTest has a nullable FK to Test, and each FK value can appear 0, 1 or 2 times, but never more than 2 times.
I am after a T-SQL query that reports the data in Test in the following format:
TestReport
---------------------------------------------------------------------------
| TestId | TestName | RelatedTestName1 | RelatedTestName2 |
---------------------------------------------------------------------------
| 1 | A001 | NULL | NULL |
| 2 | B001 | Joe | NULL |
| 3 | C001 | Jane | Julia |
I can safely assume that TestReport will not need any more than two columns for RelatedTestName.
The schema is beyond my control and I am just looking to query it for some reporting.
I've been trying to use the PIVOT function, but I'm not entirely sure how to use it so that RelatedTestName1 and RelatedTestName2 can be NULL when there are no RelatedTest records. Also, since RelatedTestName is a varchar, I'm not sure how to apply an appropriate aggregate, if that's what is needed.
Preparing Data:
DROP TABLE IF EXISTS Test
GO
CREATE TABLE Test (Id INT PRIMARY KEY, Name VARCHAR(10)) ON [PRIMARY]
GO
INSERT INTO Test Values
(1, 'A001')
,(2, 'B001')
,(3, 'C001')
GO
DROP TABLE IF EXISTS RelatedTest
GO
CREATE TABLE RelatedTest (
Id INT,
Name VARCHAR(10),
TestId INT FOREIGN KEY REFERENCES Test (Id)
) ON [PRIMARY]
GO
INSERT INTO RelatedTest Values
(1, 'Jack', NULL)
,(2, 'Joe', 2)
,(3, 'Jane', 3)
,(4, 'Julia', 3)
GO
Query:
;WITH CTE AS
(
SELECT TestId = T.Id
,TestName = T.Name
,RelatedTestName = RT.Name
,RN = ROW_NUMBER() OVER(PARTITION BY T.Id ORDER BY RT.Id ASC)
FROM Test T
LEFT JOIN RelatedTest RT
ON T.Id = RT.TestId
)
SELECT DISTINCT
C.TestId
,C.TestName
,RelatedTestName1 = (SELECT RelatedTestName FROM CTE A WHERE A.TestId = C.TestId AND A.RN = 1)
,RelatedTestName2 = (SELECT RelatedTestName FROM CTE A WHERE A.TestId = C.TestId AND A.RN = 2)
FROM CTE C;
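The same row numbering can also feed a conditional aggregation, which avoids the two correlated subqueries and the DISTINCT. A sketch under the same assumptions (the Test and RelatedTest tables prepared above, at most two related rows per test):

```sql
;WITH Numbered AS
(
    SELECT TestId = T.Id
          ,TestName = T.Name
          ,RelatedTestName = RT.Name
          ,RN = ROW_NUMBER() OVER(PARTITION BY T.Id ORDER BY RT.Id ASC)
    FROM Test T
    LEFT JOIN RelatedTest RT
        ON T.Id = RT.TestId
)
SELECT TestId
      ,TestName
      -- MAX works on varchar; each CASE yields at most one non-NULL value per test
      ,RelatedTestName1 = MAX(CASE WHEN RN = 1 THEN RelatedTestName END)
      ,RelatedTestName2 = MAX(CASE WHEN RN = 2 THEN RelatedTestName END)
FROM Numbered
GROUP BY TestId, TestName
ORDER BY TestId;
```

MAX is only there to collapse the group to one row, which also answers the question of which aggregate to use on a varchar column.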
I'm trying to find examples of sqlcmd script files that run a select statement, keep the returned values inside the script, and place them in a variable. I then want to iterate over those returned values, run some if statements on them, and then run some SQL insert statements. I'm using SQL Server Management Studio, so I thought I could run some scripts in the SQLCMD mode of the Query Editor. Maybe there's a better way to do it, but that seemed like a good solution.
I've looked on the Microsoft website for sqlcmd and T-SQL examples that might help. I've also done general searches of the web, but all the examples that come up are too simplistic and weren't helpful. Any help would be appreciated.
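One way to get the "select, iterate, test, insert" flow is plain T-SQL with local variables and a cursor, which also runs in a normal query window. The sketch below is only an illustration: #Source, #Target, their columns and the thresholds are hypothetical names invented for the example, not anything from your database.

```sql
-- Hypothetical tables, used only for illustration.
CREATE TABLE #Source (Id int, Amount int);
CREATE TABLE #Target (Id int, Note varchar(20));
INSERT INTO #Source (Id, Amount) VALUES (1, 5), (2, 50), (3, 500);

DECLARE @Id int, @Amount int;

-- The cursor walks the result of the SELECT one row at a time.
DECLARE src CURSOR LOCAL FAST_FORWARD FOR
    SELECT Id, Amount FROM #Source ORDER BY Id;

OPEN src;
FETCH NEXT FROM src INTO @Id, @Amount;

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @Amount >= 100
        INSERT INTO #Target (Id, Note) VALUES (@Id, 'large');
    ELSE IF @Amount >= 10
        INSERT INTO #Target (Id, Note) VALUES (@Id, 'medium');
    -- rows with Amount below 10 are skipped

    FETCH NEXT FROM src INTO @Id, @Amount;
END;

CLOSE src;
DEALLOCATE src;

SELECT * FROM #Target;
```

If the if statements can be written as CASE expressions or WHERE clauses, a single INSERT ... SELECT is usually simpler than a loop.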
Here is how I understand your starting position:
create table #data
(
id int,
column1 varchar(100),
column2 varchar(100),
newcolumn int
)
create table #lookup
(
id int,
column1 varchar(100),
column2 varchar(100)
)
insert into #data
values
(1, 'black', 'duck', NULL),
(2, 'white', 'panda', NULL),
(3, 'yellow', 'dog', NULL),
(4, 'orange', 'cat', NULL),
(5, 'blue', 'lemur', NULL)
insert into #lookup
values
(1, 'white', 'panda'),
(2, 'orange', 'cat'),
(3, 'black', 'duck'),
(4, 'blue', 'lemur'),
(5, 'yellow', 'dog')
select * from #data
select * from #lookup
Output:
select * from #data
/------------------------------------\
| id | column1 | column2 | newcolumn |
|----|---------|---------|-----------|
| 1 | black | duck | NULL |
| 2 | white | panda | NULL |
| 3 | yellow | dog | NULL |
| 4 | orange | cat | NULL |
| 5 | blue | lemur | NULL |
\------------------------------------/
select * from #lookup
/------------------------\
| id | column1 | column2 |
|----|---------|---------|
| 1 | white | panda |
| 2 | orange | cat |
| 3 | black | duck |
| 4 | blue | lemur |
| 5 | yellow | dog |
\------------------------/
From this starting point, you can achieve what you are asking for as follows:
update d set d.newcolumn = l.id
from #data d
left join #lookup l on d.column1 = l.column1 and d.column2 = l.column2
alter table #data
drop column column1, column2
This will leave the tables in the desired state, with the varchar values moved out into the lookup table:
select * from #data
/----------------\
| id | newcolumn |
|----|-----------|
| 1 | 3 |
| 2 | 1 |
| 3 | 5 |
| 4 | 2 |
| 5 | 4 |
\----------------/
select * from #lookup
/------------------------\
| id | column1 | column2 |
|----|---------|---------|
| 1 | white | panda |
| 2 | orange | cat |
| 3 | black | duck |
| 4 | blue | lemur |
| 5 | yellow | dog |
\------------------------/
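One check that may be worth running between the UPDATE and the ALTER TABLE: because the update uses a left join, any row without a match in #lookup keeps newcolumn = NULL, and dropping the columns would then lose its values. A small sketch against the same temp tables:

```sql
-- Run after the UPDATE and before the ALTER TABLE.
-- Any row returned here has no matching entry in #lookup.
select d.id, d.column1, d.column2
from #data d
where d.newcolumn is null;
```

With the sample data this returns no rows, since every combination exists in #lookup.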
I have two tables, ADDRESSES and CONTACTS. Each contact has a SUPERID, which is the ID of the address it belongs to.
I want to identify duplicates (same last name, first name and birthday) in the ADDRESSES table and merge the contacts of these duplicates onto the latest address (latest DATECREATE or highest ID).
Afterwards, the other duplicates should be deleted.
My approach for merging the contacts does not work, though. Deleting the duplicates works.
This is my approach. I would be grateful for help with what is wrong here.
Thank you!
UPDATE dbo.CONTACTS
SET SUPERID = ADDRESSES.ID FROM dbo.ADDRESSES
inner join CONTACTS on ADDRESSES.ID = CONTACTS.SUPERID
WHERE ADDRESSES.id in (
SELECT id FROM dbo.ADDRESSES
WHERE EXISTS(
SELECT NULL FROM ADDRESSES AS tmpcomment
WHERE dbo.ADDRESSES.FIRSTNAME0 = tmpcomment.FIRSTNAME0
AND dbo.ADDRESSES.LASTNAME0 = tmpcomment.LASTNAME0
and dbo.ADDRESSES.BIRTHDAY1 = tmpcomment.BIRTHDAY1
HAVING dbo.ADDRESSES.id > MIN(tmpcomment.id)
))
DELETE FROM ADDRESSES
WHERE id in (
SELECT id FROM dbo.ADDRESSES
WHERE EXISTS(
SELECT NULL FROM ADDRESSES AS tmpcomment
WHERE dbo.ADDRESSES.FIRSTNAME0 = tmpcomment.FIRSTNAME0
AND dbo.ADDRESSES.LASTNAME0 = tmpcomment.LASTNAME0
and dbo.ADDRESSES.BIRTHDAY1 = tmpcomment.BIRTHDAY1
HAVING dbo.ADDRESSES.id > MIN(tmpcomment.id)
)
)
Here is a sample for understanding the issue.
ADDRESSES
| ID | DATECREATE | LASTNAME0 | FIRSTNAME0 | BIRTHDAY1 |
|:-----------|------------:|:------------:|------------:|:------------:|
| 1 | 19.07.2011 | Arthur | James | 05.05.1980 |
| 2 | 23.08.2012 | Arthur | James | 05.05.1980 |
| 3 | 11.12.2015 | Arthur | James | 05.05.1980 |
| 4 | 22.10.2016 | Arthur | James | 05.05.1980 |
| 6 | 20.12.2014 | Doyle | Peter | 01.01.1950 |
| 7 | 09.01.2016 | Doyle | Peter | 01.01.1950 |
|:-----------|------------:|:------------:|------------:|:------------:|
CONTACTS
| ID | SUPERID |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 4 |
| 7 | 4 |
| 8 | 6 |
| 9 | 6 |
| 10 | 6 |
| 11 | 7 |
The result should look like this:
ADDRESSES
| ID | DATECREATE | LASTNAME0 | FIRSTNAME0 | BIRTHDAY1 |
|:-----------|------------:|:------------:|------------:|:------------:|
| 4 | 22.10.2016 | Arthur | James | 05.05.1980 |
| 7 | 09.01.2016 | Doyle | Peter | 01.01.1950 |
CONTACTS
| ID | SUPERID |
| 1 | 4 |
| 2 | 4 |
| 3 | 4 |
| 4 | 4 |
| 5 | 4 |
| 6 | 4 |
| 7 | 4 |
| 8 | 7 |
| 9 | 7 |
| 10 | 7 |
| 11 | 7 |
My approach would use a temporary table:
/*
CREATE TABLE addresses
([ID] int, [DATECREATE] varchar(10), [LASTNAME0] varchar(6), [FIRSTNAME0] varchar(5), [BIRTHDAY1] datetime);
INSERT INTO addresses
([ID], [DATECREATE], [LASTNAME0], [FIRSTNAME0], [BIRTHDAY1])
VALUES
(1, '19.07.2011', 'Arthur', 'James', '1980-05-05 00:00:00'),
(2, '23.08.2012', 'Arthur', 'James', '1980-05-05 00:00:00'),
(3, '11.12.2015', 'Arthur', 'James', '1980-05-05 00:00:00'),
(4, '22.10.2016', 'Arthur', 'James', '1980-05-05 00:00:00'),
(6, '20.12.2014', 'Doyle', 'Peter', '1950-01-01 00:00:00'),
(7, '09.01.2016', 'Doyle', 'Peter', '1950-01-01 00:00:00');
CREATE TABLE contacts
([ID] int, [SUPERID] int);
INSERT INTO contacts
([ID], [SUPERID])
VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 4),
(7, 4),
(8, 6),
(9, 6),
(10, 6),
(11, 7);
*/
DROP TABLE IF EXISTS #t; --SQL Server 2016+ only; on older versions use IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
SELECT id as oldid, MAX(id) OVER(PARTITION BY lastname0, firstname0, birthday1) as newid INTO #t
FROM
addresses;
/*now #t contains data like
1, 4
2, 4
3, 4
4, 4
6, 7
7, 7*/
--remove the ones we don't need to change
DELETE FROM #t WHERE oldid = newid;
BEGIN TRANSACTION;
SELECT * FROM addresses;
SELECT * FROM contacts;
--now #t is the list of contact changes we need to make, so make those changes
UPDATE contacts
SET contacts.superid = #t.newid
FROM
contacts INNER JOIN #t ON contacts.superid = #t.oldid;
--now scrub the old addresses with no contact records. This catches all such records, not just those in #t
DELETE FROM addresses WHERE id NOT IN (SELECT DISTINCT superid FROM contacts);
--alternative: clean up only the records we affected in this operation (either DELETE is enough on its own)
DELETE FROM addresses WHERE id IN (SELECT oldid FROM #t);
SELECT * FROM addresses;
SELECT * FROM contacts;
ROLLBACK TRANSACTION;
Please note: I have tested this and it produces the results you want, but I'd advise caution before copying an update/delete query off the internet and running it. I've wrapped it in a transaction that selects the data before and after and then rolls back, so nothing gets wrecked. Run it on a test db first, though!
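If "latest" should mean latest DATECREATE rather than highest ID, the same idea can be sketched without a temp table. This is an untested variation, not the method above: it assumes DATECREATE holds dd.mm.yyyy text as in the sample (CONVERT style 104), and the names keepers, ranked and keepid are made up for the example. It is wrapped in the same rollback pattern.

```sql
BEGIN TRANSACTION;

-- Repoint every contact to the newest address in its duplicate group.
WITH keepers AS (
    SELECT id AS oldid,
           FIRST_VALUE(id) OVER (
               PARTITION BY lastname0, firstname0, birthday1
               ORDER BY CONVERT(date, datecreate, 104) DESC, id DESC
           ) AS keepid
    FROM addresses
)
UPDATE c
SET c.superid = k.keepid
FROM contacts c
INNER JOIN keepers k ON c.superid = k.oldid
WHERE k.oldid <> k.keepid;

-- Delete every address that is not the newest in its group.
WITH ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY lastname0, firstname0, birthday1
               ORDER BY CONVERT(date, datecreate, 104) DESC, id DESC
           ) AS rn
    FROM addresses
)
DELETE FROM ranked WHERE rn > 1;

SELECT * FROM addresses;
SELECT * FROM contacts;

ROLLBACK TRANSACTION;
```

The id DESC tie-breaker keeps the highest ID when two duplicates share the same DATECREATE.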
I know this is a frequently asked question, and I've had a look through what's already available, but I believe my case is slightly unique (and if it's not, please point me in the right direction).
I am trying to find the latest occurrence of a row associated with a user, across two tables and several columns.
table: statusUpdate
+-------+-----------+-----------+-------------------+
| id | name | status | date_change |
+-------+-----------+-----------+-------------------+
| 1 | Matt | 0 | 01-01-2001 |
| 2 | Jeff | 1 | 01-01-2001 |
| 3 | Jeff | 2 | 01-01-2002 |
| 4 | Bill | 2 | 01-01-2001 |
| 5 | Bill | 3 | 01-01-2004 |
+-------+-----------+-----------+-------------------+
table: relationship
+-------+-----------+--------------+
| id | userID |statusUpdateID|
+-------+-----------+--------------+
| 1 | 22 | 1 |
| 2 | 33 | 2 |
| 3 | 33 | 3 |
| 4 | 44 | 4 |
| 5 | 44 | 5 |
+-------+-----------+--------------+
There is a third table which links userID to its own table, but these sample tables should be good enough to get my question across.
I am looking to get the latest status change by date. The problem currently is that it returns all instances of a status change.
Current results:
+-------+---------+-----------+-------------------+
|userID |statusID | status | date_change |
+-------+---------+-----------+-------------------+
| 33 | 2 | 1 | 01-01-2001 |
| 33 | 3 | 2 | 01-01-2002 |
| 44 | 4 | 2 | 01-01-2001 |
| 44 | 5 | 3 | 01-01-2004 |
+-------+---------+-----------+-------------------+
Expected results:
+-------+-----------+-----------+-------------------+
|userID |statusID | status | date_change |
+-------+-----------+-----------+-------------------+
| 33 | 3 | 2 | 01-01-2002 |
| 44 | 5 | 3 | 01-01-2004 |
+-------+-----------+-----------+-------------------+
I hope this all makes sense; please ask if you need more information.
Just to reiterate, I only want to return the latest instance of a user's status change by date.
Sample code of one of my attempts:
select
st.ID, st.status, st.date_change, r.userID
from statusUpdate st
inner join Relationship r on st.ID = r.statusUpdateID
inner join (select ID, max(date_change) as recent from statusUpdate
group by ID) as y on r.statusUpdateID = y.ID and st.date_change =
y.recent
Hope someone can point me in the right direction.
Use row_number() to get the latest row per user:
select *
from
(
select st.ID, st.status, st.date_change, r.userID,
rn = row_number() over (partition by r.userID order by st.date_change desc)
from statusUpdate st
inner join Relationship r on st.ID = r.statusUpdateID
) as d
where rn = 1
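I added MAX to your query. Note that this takes the maximum of each column independently, so it only returns the latest row when id and status both increase with date_change, as they do in the sample data: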
CREATE TABLE #Table1
([id] int, [name] varchar(4), [status] int, [date_change] datetime)
;
INSERT INTO #Table1
([id], [name], [status], [date_change])
VALUES
(1, 'Matt', 0, '2001-01-01 00:00:00'),
(2, 'Jeff', 1, '2001-01-01 00:00:00'),
(3, 'Jeff', 2, '2002-01-01 00:00:00'),
(4, 'Bill', 2, '2001-01-01 00:00:00'),
(5, 'Bill', 3, '2004-01-01 00:00:00')
;
CREATE TABLE #Table2
([id] int, [userID] int, [stautsUpdateID] int)
;
INSERT INTO #Table2
([id], [userID], [stautsUpdateID])
VALUES
(1, 22, 1),
(2, 33, 2),
(3, 33, 3),
(4, 44, 4),
(5, 44, 5)
select
max(st.ID) id , max(st.status) status , max(st.date_change) date_change, r.userID
from #Table1 st
inner join #Table2 r on st.ID = r.stautsUpdateID
inner join (select ID, max(date_change) as recent from #Table1
group by ID) as y on r.stautsUpdateID = y.ID and st.date_change =
y.recent
group by r.userID
output
id status date_change userID
1 0 2001-01-01 00:00:00.000 22
3 2 2002-01-01 00:00:00.000 33
5 3 2004-01-01 00:00:00.000 44