I'm practising subqueries in Snowflake (it supports common SQL queries). I'm stuck and cannot get the result I want.
Data
INSERT INTO members (memid, surname, firstname, address, zipcode, telephone, recommendedby, joindate) VALUES
(12, 'Baker', 'Anne', '55 Powdery Street, Boston', 80743, '844-076-5141', 9, '2012-08-10 14:23:22'),
(21, 'Mackenzie', 'Anna', '64 Perkington Lane, Reading', 64577, '(822) 661-2898', 1, '2012-08-26 09:32:05'),
(6, 'Tracy', 'Burton', '3 Tunisia Drive, Boston', 45678, '(822) 354-9973', NULL, '2012-07-15 08:52:55');
I want to get each member's name, member id, recommender's name and recommender's id.
My code
with recommender as (
select distinct concat(t1.firstname, ' ', t1.surname) recommender
, memid recommender_id
from "EXERCISES"."CD"."MEMBERS" t1
where exists (select surname from "EXERCISES"."CD"."MEMBERS" t2
where t1.memid = t2.recommendedby)
)
, member as (
select
distinct concat(firstname, ' ', surname) as member,
memid,
recommender,
recommender_id
from recommender
left join "EXERCISES"."CD"."MEMBERS" t3 on recommender.recommender_id = t3.recommendedby
) select * from member
order by member;
My output
I noticed that Burton Tracy is missing from the output because she doesn't have a recommender. I want to keep her row in the output. How should I rewrite my code?
Thank you
I'm not quite sure why you are using CTEs for this...? Or subqueries, for that matter.
Getting the person who recommended a member takes nothing more than a LEFT JOIN:
select
concat(m.firstname, ' ', m.surname) as member,
m.memid member_id,
concat(r.firstname, ' ', r.surname) as recommender,
r.memid recommender_id
from
members m
left join members r on r.memid = m.recommendedby
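To see the self-join keep a member who has no recommender, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Snowflake. The trimmed-down table and the row for member 9 ('Stephen Baker') are assumptions made up for illustration; the rows for members 6 and 12 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (memid INTEGER PRIMARY KEY, surname TEXT, "
    "firstname TEXT, recommendedby INTEGER)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        (6, "Tracy", "Burton", None),   # from the question: no recommender
        (9, "Baker", "Stephen", 6),     # hypothetical row, not in the question
        (12, "Baker", "Anne", 9),       # from the question
    ],
)

# Self-join: m is the member, r is the member who recommended them.
# LEFT JOIN keeps every m row even when no r row matches.
rows = conn.execute(
    """
    SELECT m.firstname || ' ' || m.surname AS member,
           m.memid AS member_id,
           r.firstname || ' ' || r.surname AS recommender,
           r.memid AS recommender_id
    FROM members m
    LEFT JOIN members r ON r.memid = m.recommendedby
    ORDER BY member
    """
).fetchall()

for row in rows:
    print(row)
```

Burton Tracy comes back with NULL (None in Python) in both recommender columns instead of disappearing from the result.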
I have a query where I'm trying to select some customer information: name, address, city, state, and zip. I'd like to pull all the information but only pull one of the records if there is a dupe.
Example of data:
Invoice_Date First Last Addr City State Zip
11/11/14 Jim Jones 12 Cedar alkdjf TN 29430
11/11/15 Ralph Jones 12 Cedar alkdjf TN 29430
11/11/14 Robert Smith 15 block slkjdd TX 10932
What I want to return:
Invoice_Date First Last Addr City State Zip
11/11/15 Ralph Jones 12 Cedar alkdjf TN 29430 (newest Record)
11/11/14 Robert Smith 15 block slkjdd TX 10932
This is my query that is able to pull ALL customers for the specified dates:
SELECT
Invoice_Tb.Invoice_Date, Invoice_Tb.Customer_First_Name,
Invoice_Tb.Customer_Last_Name,
Invoice_Tb.Customer_Address, Invoice_Tb.City,
Invoice_Tb.Customer_State, Invoice_Tb.ZIP_Code
FROM
Invoice_Tb
LEFT OUTER JOIN
Invoice_Detail_Tb ON Invoice_Tb.Store_Number = Invoice_Detail_Tb.Store_Number
AND Invoice_Tb.Invoice_Number = Invoice_Detail_Tb.Invoice_Number
AND Invoice_Tb.Invoice_Date = Invoice_Detail_Tb.Invoice_Date
WHERE
(Invoice_Tb.Invoice_Date IN ('11/11/14', '11/11/15'))
AND (Invoice_Detail_Tb.Invoice_Detail_Code = 'FSV')
AND (LEN(Invoice_Tb.Customer_Address) > 4)
ORDER BY
Invoice_Tb.Customer_Address
Now, obviously I can't use Row_Number here, because it's not an option in SQL Server 2000, so that's out.
I've tried Select Distinct - but I'm in need of the other information (first name, last name, etc), and when using select Distinct, it also returns distinct records for First and Last name. I only want 1 record, per address.
How can I return 1 row for each distinct address while including the first name and last name of the MOST recent visit (in this case, 11/11/15)?
create table #SomeTest (Invoice_Number Int, InvoiceDt DateTime, FName Varchar(24), LName Varchar(24), Addr Varchar(24), City Varchar(24), St Varchar(2), Zip Varchar(12) )
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (1, '11/11/14', 'Jim','Jones', '12 Cedar', 'alkdjf', 'TN', '29430')
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (2, '11/11/15', 'Ralph','Jones', '12 Cedar', 'alkdjf', 'TN', '29430')
insert into #SomeTest (Invoice_Number, InvoiceDt, FName, LName, Addr, City, St, Zip) values (3, '11/11/14', 'Robert','Smith', '15 block', 'slkjdd', 'TX', '10932')
select * from #SomeTest
where Invoice_Number in
(
select Invoice_Number from
(select Invoice_Number = max(Invoice_Number), SupperAddy = Addr + '#' + City + '#' + St + '#' + Zip from #SomeTest
group by Addr + '#' + City + '#' + St + '#' + Zip) X
)
If you don't want to use any analytic function (like row_number) then you could do something like this:
--This gives you the most recent date for an address
Select max(invoice_Date) as Invoice_date
, Invoice_Tb.Customer_Address
, Invoice_Tb.City
, Invoice_Tb.Customer_State
, Invoice_Tb.ZIP_Code
into #tmp1
from Invoice_Tb
group by Invoice_Tb.Customer_Address
, Invoice_Tb.City
, Invoice_Tb.Customer_State
, Invoice_Tb.ZIP_Code
GO
--link back to the name for the most recent address
Select a.Invoice_date
,b.Customer_First_Name as [First]
,b.Customer_Last_Name as [Last]
, a.Customer_Address as [Addr]
, a.City
, a.Customer_State as [State]
, a.ZIP_Code as [Zip]
from #tmp1 a
left join Invoice_Tb b on
a.Invoice_date = b.Invoice_Date
and a.Customer_Address = b.Customer_Address
and a.City = b.City
and a.Customer_State = b.Customer_State
and a.ZIP_Code = b.ZIP_Code
GO
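A runnable sketch of the same "latest date per address, then join back for the names" idea, using Python's sqlite3 module rather than SQL Server 2000. The table and column names are made up for the sketch, and the dates are assumed to be mm/dd/yy and are stored as ISO strings so that MAX() sorts them correctly; the three rows are the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE invoices (
           invoice_number INTEGER, invoice_date TEXT,
           first_name TEXT, last_name TEXT,
           addr TEXT, city TEXT, st TEXT, zip TEXT)"""
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2014-11-11", "Jim", "Jones", "12 Cedar", "alkdjf", "TN", "29430"),
        (2, "2015-11-11", "Ralph", "Jones", "12 Cedar", "alkdjf", "TN", "29430"),
        (3, "2014-11-11", "Robert", "Smith", "15 block", "slkjdd", "TX", "10932"),
    ],
)

# Step 1 (derived table): newest invoice date for each address.
# Step 2 (outer query): join back to pick up the names on that date.
latest = conn.execute(
    """
    SELECT i.invoice_date, i.first_name, i.last_name,
           i.addr, i.city, i.st, i.zip
    FROM invoices i
    JOIN (SELECT addr, city, st, zip, MAX(invoice_date) AS max_date
          FROM invoices
          GROUP BY addr, city, st, zip) newest
      ON newest.addr = i.addr
     AND newest.city = i.city
     AND newest.st = i.st
     AND newest.zip = i.zip
     AND newest.max_date = i.invoice_date
    ORDER BY i.addr
    """
).fetchall()

for row in latest:
    print(row)
```

If two invoices for the same address share the newest date, this returns both rows, so a tie-breaker (for example the highest invoice number) would still be needed.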
Here's the "standard" way I always did this in my MsSql 2000 days. In your case, a subquery in the WHERE clause would also work, but I will use the INNER JOIN version of this solution. I employed a little bit of pseudo-code to cut down on my typing. You should be able to figure it out:
SELECT
t1.Invoice_Date, t1.Customer_First_Name,
t1.Customer_Last_Name,
t1.Customer_Address, t1.City,
t1.Customer_State, t1.ZIP_Code
FROM
Invoice_Tb t1
INNER JOIN Invoice_Tb t2
ON t1.Invoice_Number = (
SELECT TOP 1 t2.Invoice_Number
WHERE t2.Customer_Address=t1.Customer_Address
AND {t2.City,State,Zip=t1.City,State,Zip} --pseudocode
ORDER BY t2.Invoice_Date DESC
)
LEFT OUTER JOIN
Invoice_Detail_Tb ON ...
WHERE
...
ORDER BY
t1.Customer_Address
Also note that if any of the "address" fields can be NULL, you will have to handle that possibility when comparing them in the subquery.
You can do something like this:
SELECT
i.last,
MAX(invoice_date) AS InvoiceDAte,
MAX(first) AS First,
MAX(addr) AS addr,
MAX(zip) AS zip
FROM invoice i
INNER JOIN (SELECT
last,
MAX(invoice_date) AS MaxInvoiceDate
FROM invoice
GROUP BY last) a
ON a.last = i.last
AND a.MaxInvoiceDate = i.Invoice_date
GROUP BY i.last
Note that more than one row can share the maximum invoice_date; in that case you need to take the TOP 1 of those rows.
I have a problem when trying to add rows from a temp table to a table. The rows from the TempDealer table are added even if they are already in the Dealership table (notice that the WHERE clause specifies td.supplier_ref NOT IN (SELECT supplier_ref FROM #dealerStatus)). Every time I run the stored procedure, it adds all the rows from TempDealer to the Dealership table again, when it should only add them once. Any ideas? Thanks in advance.
INSERT INTO #dealerStatus (dealerId, supplier_ref, [add], [timestamp])
SELECT NULL, td.supplier_ref, 1, GETDATE()
FROM TempDealer td
WHERE td.supplier_ref NOT IN (SELECT supplier_ref FROM #dealerStatus)
INSERT INTO Dealership(
dealership_name,
telephone,
fax,
sales_email,
support_email,
service_mask,
address1,
address2,
town,
county,
postcode,
website,
date_modified,
supplier_ref,
dealer_type,
county_id,
town_id,
area_id,
district_id,
longitude,
latitude
)
SELECT DISTINCT
[updateSource].leasing_broker_name,
[updateSource].telephone,
[updateSource].fax_number,
[updateSource].email,
[updateSource].support_email,
[updateSource].service_mask,
[updateSource].address1,
[updateSource].address2,
[updateSource].town,
[updateSource].county,
[updateSource].post_code,
[updateSource].web_address,
GETDATE(),
[updateSource].supplier_ref,
1,
[updateSource].county_id,
[updateSource].town_id,
[updateSource].region,
[updateSource].district,
[updateSource].longitude,
[updateSource].latitude
FROM
#dealerStatus dealerUpdateStatus INNER JOIN
TempDealer [updateSource] ON dealerUpdateStatus.supplier_ref = updateSource.supplier_ref
WHERE
dealerUpdateStatus.[add] = 1
I sorted it out this way:
INSERT INTO #dealerStatus (dealerId, supplier_ref, [add], [timestamp])
SELECT NULL, td.supplier_ref, 1, GETDATE()
FROM TempDealer td
WHERE td.supplier_ref NOT IN (SELECT supplier_ref FROM Dealership WHERE dealership.supplier_ref IS NOT NULL and dealership.dealer_type = 1)
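The reason this works is that the NOT IN check now looks at the target table, which still holds the rows after the procedure ends, instead of a temp table that starts out empty on every run. Below is a small sketch of that behaviour with Python's sqlite3 module; the two-column tables and the function name sync_dealers are simplified assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE temp_dealer (supplier_ref TEXT, name TEXT);
    CREATE TABLE dealership (supplier_ref TEXT, name TEXT);
    INSERT INTO temp_dealer VALUES ('A1', 'Alpha'), ('B2', 'Beta');
    -- An existing row without a supplier_ref, to show why the
    -- IS NOT NULL filter in the subquery matters.
    INSERT INTO dealership VALUES (NULL, 'Legacy dealer');
    """
)


def sync_dealers(conn):
    """Copy dealers that are not yet in the target table (hypothetical helper)."""
    conn.execute(
        """
        INSERT INTO dealership (supplier_ref, name)
        SELECT td.supplier_ref, td.name
        FROM temp_dealer td
        WHERE td.supplier_ref NOT IN (
            SELECT supplier_ref FROM dealership
            WHERE supplier_ref IS NOT NULL)
        """
    )


sync_dealers(conn)   # first run inserts A1 and B2
sync_dealers(conn)   # second run finds them already present and inserts nothing
count = conn.execute("SELECT COUNT(*) FROM dealership").fetchone()[0]
print(count)
```

The IS NOT NULL filter is worth keeping: if the subquery returns even one NULL, NOT IN can never evaluate to true, and no rows would be inserted at all.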
I have a query to get the list of members in a group, as below:
;WITH CTE (GroupName, GroupMember, isMemberGroup)
AS (SELECT ag.name,
agm.Member,
agm.MemberIsGroup
FROM tb1 ag
LEFT JOIN tb2 agm
ON ag.ID = agm.ID
WHERE ag.name = 'somegroupame')
SELECT *
FROM CTE
Group Name Group Member IsMemberGroup
Admin John 0
Admin Sam 0
Admin GDBA 1
xyz Dan 0
xyz GXy 1
I want to write a query to get members of the sub group as well if IsMemberGroup is 1. Please guide me how to achieve that.
The expected result is a list of members, including members of sub groups, for a given group. The recursion should also cover the sub groups' own sub groups, if any:
Resultant: Admin
Group Group Member
Admin John
Admin Sam
Admin(GDBA) Mike
Admin(GDBA) June
Admin(GDBA/Bcksdmin) Mark
You could solve this problem using a recursive CTE to generate your group member hierarchy.
This example uses the following data:
/* Let's create some sample data to experiment with.
*/
CREATE TABLE #Sample
(
GroupName VARCHAR(50),
GroupMember VARCHAR(50),
IsMemberGroup BIT
)
;
INSERT INTO #Sample
(
GroupName,
GroupMember,
IsMemberGroup
)
VALUES
('Admin', 'John', 0),
('Admin', 'Sam', 0),
('Admin', 'GDBA', 1),
('GDBA', 'Mike', 0),
('GDBA', 'June', 0),
('GDBA', 'Bcksdmin', 1),
('Bcksdmin', 'Mark', 0)
;
The query uses the CTE to calculate the group hierarchy. This is then joined back to the original data, like so:
/* Using a recursive CTE we can build the group name
* hierarchy.
*/
WITH [Group] AS
(
/* Anchor query returns all top level teams, ie
* those that do not appear in the group member
* field.
*/
SELECT
s1.GroupName,
CAST(s1.GroupName AS VARCHAR(MAX)) AS Breadcrumb
FROM
#Sample AS s1
LEFT OUTER JOIN #Sample AS s2 ON s2.GroupMember = s1.GroupName
WHERE
s2.GroupName IS NULL
GROUP BY
s1.GroupName
UNION ALL
/* Using recursion, find all children.
* (Sub groups are identified by the IsMemberGroup flag).
*/
SELECT
s.GroupMember AS GroupName,
g.Breadcrumb + '/' + s.GroupMember
FROM
[Group] AS g
INNER JOIN #Sample AS s ON s.GroupName = g.GroupName
WHERE
s.IsMemberGroup = 1
)
SELECT
g.Breadcrumb,
s.GroupMember
FROM
#Sample AS s
INNER JOIN [Group] AS g ON g.GroupName = s.GroupName
WHERE
s.IsMemberGroup = 0
;
Although this solution works, I would suggest you revisit the table design, if possible. The column GroupMember is pulling double duty by storing both groups and group members. Splitting this into Group (with a parent lookup column) and GroupMember tables would simplify the task. Each new table would only need to describe one real-world object, making it clearer where new columns should be created.
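A rough sketch of what the split design could look like, run through Python's sqlite3 module (which supports recursive CTEs) instead of SQL Server. The table names user_groups and group_members and their columns are assumptions chosen for the sketch; the data mirrors the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per group, with a lookup to its parent group.
    CREATE TABLE user_groups (
        name   TEXT PRIMARY KEY,
        parent TEXT REFERENCES user_groups(name)
    );
    -- One row per person in a group; no sub groups stored here.
    CREATE TABLE group_members (
        group_name TEXT REFERENCES user_groups(name),
        member     TEXT
    );
    INSERT INTO user_groups VALUES
        ('Admin', NULL), ('GDBA', 'Admin'), ('Bcksdmin', 'GDBA');
    INSERT INTO group_members VALUES
        ('Admin', 'John'), ('Admin', 'Sam'),
        ('GDBA', 'Mike'), ('GDBA', 'June'),
        ('Bcksdmin', 'Mark');
    """
)

# Walk down from the requested group, building the breadcrumb as we go,
# then attach the people that belong to each group found.
rows = conn.execute(
    """
    WITH RECURSIVE tree (name, breadcrumb) AS (
        SELECT name, name FROM user_groups WHERE name = ?
        UNION ALL
        SELECT g.name, t.breadcrumb || '/' || g.name
        FROM user_groups g
        JOIN tree t ON g.parent = t.name
    )
    SELECT t.breadcrumb, m.member
    FROM tree t
    JOIN group_members m ON m.group_name = t.name
    ORDER BY t.breadcrumb, m.member
    """,
    ("Admin",),
).fetchall()

for row in rows:
    print(row)
```

With this layout the IsMemberGroup flag is no longer needed, because a row in group_members always describes a person and the hierarchy lives only in the parent column.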
I have a query against a large number of big tables (rows and columns) with a number of joins, however one of tables has some duplicate rows of data causing issues for my query. Since this is a read only realtime feed from another department I can't fix that data, however I am trying to prevent issues in my query from it.
Given that, I need to add this crap data as a left join to my good query. The data set looks like:
IDNo FirstName LastName ...
-------------------------------------------
uqx bob smith
abc john willis
ABC john willis
aBc john willis
WTF jeff bridges
sss bill doe
ere sally abby
wtf jeff bridges
...
(about 2 dozen columns, and 100K rows)
My first instinct was to perform a DISTINCT, which gave me about 80K rows:
SELECT DISTINCT P.IDNo
FROM people P
But when I try the following, I get all the rows back:
SELECT DISTINCT P.*
FROM people P
OR
SELECT
DISTINCT(P.IDNo) AS IDNoUnq
,P.FirstName
,P.LastName
...etc.
FROM people P
I then thought I would do a FIRST() aggregate function on all the columns, however that feels wrong too. Syntactically am I doing something wrong here?
Update:
Just wanted to note: these records are duplicates based on the non-key, non-indexed ID field listed above. The ID is a text field; the duplicates have the same value but in a different letter case, which is what causes the issue.
distinct is not a function. It always operates on all columns of the select list.
Your problem is a typical "greatest N per group" problem which can easily be solved using a window function:
select ...
from (
select IDNo,
FirstName,
LastName,
....,
row_number() over (partition by lower(idno) order by firstname) as rn
from people
) t
where rn = 1;
Using the order by clause you can select which of the duplicates you want to pick.
The above can be used in a left join, see below:
select ...
from x
left join (
select IDNo,
FirstName,
LastName,
....,
row_number() over (partition by lower(idno) order by firstname) as rn
from people
) p on p.idno = x.idno and p.rn = 1
where ...
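Here is a small runnable version of the left-join variant, using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table x with its note column is a hypothetical stand-in for the "good" query, and ordering by entry rather than firstname is an assumption made to get a deterministic pick; the people rows are a subset of the play data posted in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (entry INTEGER, IDNo TEXT, FirstName TEXT, LastName TEXT);
    INSERT INTO people VALUES
        (1, 'uqx', 'bob', 'smith'),
        (2, 'abc', 'john', 'willis'),
        (3, 'ABC', 'john', 'willis'),
        (4, 'aBc', 'john', 'willis'),
        (5, 'WTF', 'jeff', 'bridges'),
        (10, 'wtf', 'jeff', 'bridges');
    -- Hypothetical "good" table that the feed is joined to.
    CREATE TABLE x (idno TEXT, note TEXT);
    INSERT INTO x VALUES ('abc', 'first'), ('wtf', 'second'), ('zzz', 'no match');
    """
)

# Number the rows inside each case-insensitive IDNo group and keep only
# the first one (rn = 1), so each x row matches at most one people row.
rows = conn.execute(
    """
    SELECT x.idno, x.note, p.entry, p.FirstName
    FROM x
    LEFT JOIN (
        SELECT entry, IDNo, FirstName, LastName,
               ROW_NUMBER() OVER (PARTITION BY lower(IDNo)
                                  ORDER BY entry) AS rn
        FROM people
    ) p ON lower(p.IDNo) = lower(x.idno) AND p.rn = 1
    ORDER BY x.idno
    """
).fetchall()

for row in rows:
    print(row)
```

The join compares lower-cased values on both sides because SQLite compares text case-sensitively by default; on a case-insensitive SQL Server collation the plain comparison would behave the same way.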
Add an identity column (PeopleID) and then use a correlated subquery to return the first value for each value.
SELECT *
FROM People p
WHERE PeopleID = (
SELECT MIN(PeopleID)
FROM People
WHERE IDNo = p.IDNo
)
After careful consideration, this dilemma has a few different solutions:
Aggregate Everything
Use an aggregate on each column to get the biggest or smallest field value. This is what I am doing since it takes 2 partially filled out records and "merges" the data.
http://sqlfiddle.com/#!3/59cde/1
SELECT
UPPER(IDNo) AS user_id
, MAX(FirstName) AS name_first
, MAX(LastName) AS name_last
, MAX(entry) AS row_num
FROM people P
GROUP BY
IDNo
Get First (or Last record)
http://sqlfiddle.com/#!3/59cde/23
-- ------------------------------------------------------
-- Notes
-- entry: Auto-number primary key; some sort of unique PK is required for this method
-- IDNo: Should be the primary key in the feed, but is not; we are making an upper-case version
-- This gets the first entry; to get the last entry, change MIN() to MAX()
-- ------------------------------------------------------
SELECT
PC.user_id
,PData.FirstName
,PData.LastName
,PData.entry
FROM (
SELECT
P2.user_id
,MIN(P2.entry) AS rownum
FROM (
SELECT
UPPER(P.IDNo) AS user_id
, P.entry
FROM people P
) AS P2
GROUP BY
P2.user_id
) AS PC
LEFT JOIN people PData
ON PData.entry = PC.rownum
ORDER BY
PData.entry
Use Cross Apply or Outer Apply, this way you can limit the amount of data to be joined from the table with the duplicates to the first hit.
Select
x.*,
c.*
from
x
Cross Apply
(
Select
Top (1)
IDNo,
FirstName,
LastName,
....
from
people As p
where
p.idno = x.idno
Order By
p.idno -- unnecessary if you don't need a specific match based on order
) As c
Cross Apply behaves like an inner join, Outer Apply like a left join
SQL Server CROSS APPLY and OUTER APPLY
Turns out I was doing it wrong, I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of 'unique' data from corrupting my good data. The following appears to have resolved the issue... but I will try on the full dataset later.
SELECT DISTINCT P2.*
FROM (
SELECT
IDNo
, FirstName
, LastName
FROM people P
) P2
Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3
CREATE TABLE people
(
[entry] int
, [IDNo] varchar(3)
, [FirstName] varchar(5)
, [LastName] varchar(7)
);
INSERT INTO people
(entry,[IDNo], [FirstName], [LastName])
VALUES
(1,'uqx', 'bob', 'smith'),
(2,'abc', 'john', 'willis'),
(3,'ABC', 'john', 'willis'),
(4,'aBc', 'john', 'willis'),
(5,'WTF', 'jeff', 'bridges'),
(6,'Sss', 'bill', 'doe'),
(7,'sSs', 'bill', 'doe'),
(8,'ssS', 'bill', 'doe'),
(9,'ere', 'sally', 'abby'),
(10,'wtf', 'jeff', 'bridges')
;
Try this
SELECT *
FROM people P
where P.IDNo in (SELECT DISTINCT IDNo
FROM people)
Depending on the nature of the duplicate rows, it looks like all you want is case-insensitive comparison on those columns. Setting a case-insensitive (CI) collation on these columns should be what you're after:
SELECT DISTINCT p.IDNO COLLATE SQL_Latin1_General_CP1_CI_AS, p.FirstName COLLATE SQL_Latin1_General_CP1_CI_AS, p.LastName COLLATE SQL_Latin1_General_CP1_CI_AS
FROM people P
http://msdn.microsoft.com/en-us/library/ms184391.aspx
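The same effect can be tried out locally with Python's sqlite3 module. This is only an analogy: SQLite's built-in NOCASE collation stands in for SQL Server's SQL_Latin1_General_CP1_CI_AS, and the rows are a subset of the play data from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (IDNo TEXT, FirstName TEXT, LastName TEXT);
    INSERT INTO people VALUES
        ('abc', 'john', 'willis'),
        ('ABC', 'john', 'willis'),
        ('aBc', 'john', 'willis'),
        ('WTF', 'jeff', 'bridges'),
        ('wtf', 'jeff', 'bridges');
    """
)

# Default (binary) comparison: every spelling of IDNo counts as different.
plain = conn.execute(
    "SELECT DISTINCT IDNo, FirstName, LastName FROM people"
).fetchall()

# Case-insensitive collation on IDNo: the spellings collapse into one row each.
folded = conn.execute(
    "SELECT DISTINCT IDNo COLLATE NOCASE, FirstName, LastName FROM people"
).fetchall()

print(len(plain), len(folded))
```

Which spelling of each IDNo survives is up to the engine, so this only helps when any one of the duplicates is acceptable.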
I'm looking for a good way to use the CONTAINSTABLE feature of SQL Server 2005 effectively. Currently I have, e.g., an Employee and an Address table.
-Employee
Id
Name
-Address
Id
Street
City
EmployeeId
Now the user can enter search terms in only one textbox, and I want these terms to be split and searched with an "AND" operator. FREETEXTTABLE seems to work with "OR" automatically.
Now lets say the user entered "John Hamburg". This means he wants to find John in Hamburg.
So this is "John AND Hamburg".
So the following will contain no results since CONTAINSTABLE checks every column for "John AND Hamburg".
So my question is: What is the best way to perform a fulltext search with AND operators across multiple columns/tables?
SELECT *
FROM Employee emp
INNER JOIN
CONTAINSTABLE(Employee, *, '(JOHN AND Hamburg)', 1000) AS keyTblSp
ON emp.Id = keyTblSp.[KEY]
LEFT OUTER JOIN [Address] addr ON addr.EmployeeId = emp.Id
UNION ALL
SELECT *
FROM Employee emp
LEFT OUTER JOIN [Address] addr ON addr.EmployeeId = emp.Id
INNER JOIN
CONTAINSTABLE([Address], *, '(JOHN AND Hamburg)', 1000) AS keyTblAddr
ON addr.Id = keyTblAddr.[KEY]
...
This is more of a syntax problem. How do you divine the user's intent with just one input box?
Are they looking for "John Hamburg" the person?
Are they looking for "John Hamburg Street"?
Are they looking for "John" who lives on "Hamburg Street" in Springfield?
Are they looking for "John" who lives in the city of "Hamburg"?
Without knowing the user's intent, the best you can hope for is to OR the terms, and take the highest ranking hits.
Otherwise, you need to program in a ton of logic, depending on the number of words passed in:
2 words:
Search Employee data for term 1, Search Employee data for term 2, Search Address data for term 1, Search address data for term 2. Merge results by term, order by most hits.
3 words:
Search Employee data for term 1, Search Employee data for term 2, Search employee data for term 3, Search Address data for term 1, Search address data for term 2, Search address data for term 3. Merge results by term, order by most hits.
etc...
I guess I would redesign the GUI to separate the input into Name and Address, at a minimum. If that is not possible, enforce a syntax rule to the effect "First words will be considered a name until a comma appears, any words after that will be considered addresses"
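The comma rule could be applied before the text ever reaches SQL. A minimal sketch in Python follows; the function names split_search_input and to_contains_query are made up for illustration, and the double-quoted "term" AND "term" form is what CONTAINS-style search conditions accept.

```python
def split_search_input(text):
    """Split raw input into (name_terms, address_terms) at the first comma.

    Words before the first comma are treated as name terms, words after it
    as address terms. Without a comma, everything is treated as a name.
    """
    name_part, _, address_part = text.partition(",")
    return name_part.split(), address_part.split()


def to_contains_query(terms):
    """Join terms into a '"a" AND "b"' search condition string."""
    # Double quotes inside a term are dropped so the quoting stays balanced.
    return " AND ".join('"{}"'.format(t.replace('"', "")) for t in terms)


name_terms, address_terms = split_search_input("John Smith, Hamburg")
print(to_contains_query(name_terms))     # condition for the Employee search
print(to_contains_query(address_terms))  # condition for the Address search
```

Each resulting string would then be passed as a parameter to its own full-text query, one against Employee and one against Address, and the two result sets joined on the employee id.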
EDIT:
Your best bet is still OR the terms, and take the highest ranking hits. Here's an example of that, and an example why this is not ideal without some pre-processing of the input to divine the user's intent:
insert into Employee (id, [name]) values (1, 'John Hamburg')
insert into Employee (id, [name]) values (2, 'John Smith')
insert into Employee (id, [name]) values (3, 'Bob Hamburg')
insert into Employee (id, [name]) values (4, 'Bob Smith')
insert into Employee (id, [name]) values (5, 'John Doe')
insert into Address (id, street, city, employeeid) values (1, 'Main St.', 'Springville', 1)
insert into Address (id, street, city, employeeid) values (2, 'Hamburg St.', 'Springville', 2)
insert into Address (id, street, city, employeeid) values (3, 'St. John Ave.', 'Springville', 3)
insert into Address (id, street, city, employeeid) values (4, '5th Ave.', 'Hamburg', 4)
insert into Address (id, street, city, employeeid) values (5, 'Oak Lane', 'Hamburg', 5)
Now since we don't know what keywords will apply to what table, we have to assume they could apply to either table, so we have to OR the terms against each table, UNION the results, Aggregate them, and compute the highest rank.
SELECT Id, [Name], Street, City, SUM([Rank])
FROM
(
SELECT emp.Id, [Name], Street, City, [Rank]
FROM Employee emp
JOIN [Address] addr ON emp.Id = addr.EmployeeId
JOIN CONTAINSTABLE(Employee, *, 'JOHN OR Hamburg') AS keyTblEmp ON emp.Id = keyTblEmp.[KEY]
UNION ALL
SELECT emp.Id, [Name], Street, City, [Rank]
FROM Employee emp
JOIN [Address] addr ON emp.Id = addr.EmployeeId
JOIN CONTAINSTABLE([Address], *, 'JOHN OR Hamburg') AS keyTblAdd ON addr.Id = keyTblAdd.[KEY]
) as tmp
GROUP BY Id, [Name], Street, City
ORDER BY SUM([Rank]) DESC
This is less than ideal, here's what you get for the example (in your case, you would have wanted John Doe from Hamburg to show up first):
Id Name Street City Rank
2 John Smith Hamburg St. Springville 112
3 Bob Hamburg St. John Ave. Springville 112
5 John Doe Oak Lane Hamburg 96
1 John Hamburg Main St. Springville 48
4 Bob Smith 5th Ave. Hamburg 48
But that is the best you can do without parsing the input before submitting it to SQL to make a "best guess" at what the user wants.
I had the same problem. Here is my solution, which worked for my case:
I created a view that returns the columns that I want. I added another extra column which aggregates all the columns I want to search among. So, in this case the view would be like
SELECT emp.*, addr.*, ISNULL(emp.Name,'') + ' ' + ISNULL(addr.City, '') AS SearchResult
FROM Employee emp
LEFT OUTER JOIN [Address] addr ON addr.EmployeeId = emp.EmployeeId
After this I created a full-text index on SearchResult column. Then, I search on this column
SELECT *
FROM vEmpAddr ea
INNER JOIN CONTAINSTABLE(vEmpAddr, *, 'John AND Hamburg') a ON ea.ID = a.[Key]
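To illustrate why the combined column makes "John AND Hamburg" work across both tables, here is a sketch with Python's sqlite3 module. It is only an approximation: LIKE substring matching stands in for the full-text index, the view lists explicit columns, and the helper name search_all_terms is made up. The rows are the sample data from the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Address (Id INTEGER PRIMARY KEY, Street TEXT, City TEXT,
                          EmployeeId INTEGER);
    INSERT INTO Employee VALUES
        (1, 'John Hamburg'), (2, 'John Smith'), (3, 'Bob Hamburg'),
        (4, 'Bob Smith'), (5, 'John Doe');
    INSERT INTO Address VALUES
        (1, 'Main St.', 'Springville', 1),
        (2, 'Hamburg St.', 'Springville', 2),
        (3, 'St. John Ave.', 'Springville', 3),
        (4, '5th Ave.', 'Hamburg', 4),
        (5, 'Oak Lane', 'Hamburg', 5);
    -- Name and city are concatenated into one searchable column.
    CREATE VIEW vEmpAddr AS
        SELECT emp.Id AS Id, emp.Name AS Name, addr.City AS City,
               IFNULL(emp.Name, '') || ' ' || IFNULL(addr.City, '') AS SearchResult
        FROM Employee emp
        LEFT OUTER JOIN Address addr ON addr.EmployeeId = emp.Id;
    """
)


def search_all_terms(conn, text):
    """Return rows whose combined column contains every term (AND semantics)."""
    terms = text.split()
    where = " AND ".join("SearchResult LIKE ?" for _ in terms) or "1 = 1"
    sql = "SELECT Id, Name, City FROM vEmpAddr WHERE " + where + " ORDER BY Id"
    return conn.execute(sql, ["%" + t + "%" for t in terms]).fetchall()


matches = search_all_terms(conn, "John Hamburg")
for row in matches:
    print(row)
```

Because both terms are checked against the same concatenated value, John Doe from Hamburg matches even though "John" and "Hamburg" live in different tables. Street is not part of the combined column here, so John Smith on Hamburg St. is not returned.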