SQL Select unique values in 1 column - sql

I've got a problem (a small one, I suppose) and I hope you can help me.
I use Sybase Anywhere and here's my code:
SELECT TOP 4 Person.Id_person, Person.Name, Person.Surname, Visit.Date, Visit.Place
From Person, Visit
WHERE Visit.Id_person = Person.Id_person
ORDER BY Visit.DATE DESC
and here's the result:
3 | Paul | McDonald | 2010-01-19 | Ohio
3 | Paul | McDonald | 2010-01-18 | New York
19 | Ted | Malicky | 2009-12-24 | Tokyo
12 | Meg | Newton | 2009-10-13 | Warsaw
I don't want Paul McDonald to appear twice; I want only his most recent visit (by date). I'd like a result like this:
3 | Paul | McDonald | 2010-01-19 | Ohio
19 | Ted | Malicky | 2009-12-24 | Tokyo
12 | Meg | Newton | 2009-10-13 | Warsaw
....
What should I do? Could you help me? :(

Here's a different way to do it using the ROW_NUMBER function to ensure that if someone has two meetings on the same day it still works:
SELECT TOP 4
Person.Id_person,
Person.Name,
Person.Surname,
T1.Date,
T1.Place
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id_person ORDER BY Date DESC) AS rn
FROM Visit) AS T1
JOIN Person
ON T1.Id_person = Person.Id_person
WHERE rn = 1
ORDER BY Date DESC
Here's the result I get:
Id_person Name Surname Date Place
3 Paul McDonald 2010-01-19 Ohio
19 Ted Malicky 2009-12-24 Tokyo
12 Meg Newton 2009-10-13 Warsaw
1 Foo Bar 2009-06-03 Someplace
Here's the test data I used:
CREATE TABLE Person (Id_person INT NOT NULL, Name NVARCHAR(100) NOT NULL, Surname NVARCHAR(100) NOT NULL);
INSERT INTO Person (Id_person, Name, Surname) VALUES
(3, 'Paul', 'McDonald'),
(19, 'Ted', 'Malicky'),
(12, 'Meg', 'Newton'),
(1, 'Foo', 'Bar'),
(2, 'Baz', 'Qux');
CREATE TABLE Visit (Id_person INT NOT NULL, Date DATE NOT NULL, Place NVARCHAR(100) NOT NULL);
INSERT INTO Visit (Id_person, Date, Place) VALUES
(3, '2010-01-19', 'Ohio'),
(3, '2010-01-18', 'New York'),
(19, '2009-12-24', 'Tokyo'),
(12, '2009-10-13', 'Warsaw'),
(1, '2009-06-03', 'Someplace'),
(12, '2009-10-13', 'Anotherplace'),
(2, '2009-05-04', 'Somewhere');
Tested on SQL Server 2008, but I believe the syntax for Sybase is similar.
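If you want to try the ROW_NUMBER approach without a Sybase or SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. That harness is my assumption, not part of the answer above. It needs SQLite 3.25 or newer for window functions, uses LIMIT where the answer uses TOP, and adds Place as a tie-breaker (my addition) so that two visits on the same day give a repeatable row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id_person INTEGER NOT NULL, Name TEXT NOT NULL, Surname TEXT NOT NULL);
INSERT INTO Person VALUES
  (3, 'Paul', 'McDonald'), (19, 'Ted', 'Malicky'), (12, 'Meg', 'Newton'),
  (1, 'Foo', 'Bar'), (2, 'Baz', 'Qux');
CREATE TABLE Visit (Id_person INTEGER NOT NULL, Date TEXT NOT NULL, Place TEXT NOT NULL);
INSERT INTO Visit VALUES
  (3, '2010-01-19', 'Ohio'), (3, '2010-01-18', 'New York'),
  (19, '2009-12-24', 'Tokyo'), (12, '2009-10-13', 'Warsaw'),
  (1, '2009-06-03', 'Someplace'), (12, '2009-10-13', 'Anotherplace'),
  (2, '2009-05-04', 'Somewhere');
""")

latest_visits = conn.execute("""
SELECT p.Id_person, p.Name, p.Surname, t.Date, t.Place
FROM (
    SELECT Id_person, Date, Place,
           -- Place is an added tie-breaker for visits on the same day
           ROW_NUMBER() OVER (PARTITION BY Id_person
                              ORDER BY Date DESC, Place) AS rn
    FROM Visit
) AS t
JOIN Person AS p ON p.Id_person = t.Id_person
WHERE t.rn = 1
ORDER BY t.Date DESC
LIMIT 4   -- SQLite's counterpart of TOP 4
""").fetchall()

for row in latest_visits:
    print(row)
```

With the alphabetical tie-breaker, Meg's row shows Anotherplace rather than Warsaw; without any tie-breaker the database is free to pick either one.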

There is an easier way and it'll show you the most recent trip for each person as well:
SELECT TOP 4 Person.Id_person, Person.Name, Person.Surname, Visit.Date, Visit.Place
From Person, Visit
WHERE Visit.Id_person = Person.Id_person
AND (Visit.[Date] = (Select Max([Date])
From Visit Where (Person.Id_person=Visit.Id_Person)))
ORDER BY Visit.DATE DESC
I use a variant of this quite often in my work. The only caveat is that the "Date" field in the visit table is a DateTime (and, of course, that someone can't be in two places at the same time).
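To make that caveat concrete, here is a small sketch (Python with the built-in sqlite3 module, which is my assumption; the query itself is the same MAX pattern with explicit aliases). When a person has two visits with the same latest date, both rows match the subquery and the person shows up twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id_person INTEGER, Name TEXT);
INSERT INTO Person VALUES (3, 'Paul'), (12, 'Meg');
CREATE TABLE Visit (Id_person INTEGER, Date TEXT, Place TEXT);
INSERT INTO Visit VALUES
  (3, '2010-01-19', 'Ohio'), (3, '2010-01-18', 'New York'),
  (12, '2009-10-13', 'Warsaw'), (12, '2009-10-13', 'Anotherplace');
""")

rows = conn.execute("""
SELECT p.Name, v.Date, v.Place
FROM Person AS p
JOIN Visit AS v ON v.Id_person = p.Id_person
WHERE v.Date = (SELECT MAX(v2.Date) FROM Visit AS v2
                WHERE v2.Id_person = p.Id_person)
ORDER BY v.Date DESC, v.Place
""").fetchall()

# Meg has two visits on her latest date, so she is listed twice.
meg_rows = [r for r in rows if r[0] == 'Meg']
print(rows)
```

If the column really holds a full timestamp, such ties are unlikely, which is why the approach works well in that case.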

You can add a where not exists clause to filter out earlier visits:
SELECT TOP 4 p1.Id_person, p1.Name, p1.Surname, v1.Date, v1.Place
FROM Person p1, Visit v1
WHERE p1.Id_person = v1.Id_person
AND NOT EXISTS (
SELECT *
From Person p2, Visit v2
WHERE v2.Id_person = p2.Id_person
AND p1.Id_person = p2.Id_person
AND v2.Date > v1.Date
)
ORDER BY v1.DATE DESC
To improve readability, consider rewriting the double from as a join. For example, change:
FROM Person p1, Visit v1
WHERE v1.Id_person = p1.Id_person
into:
FROM Person p1
INNER JOIN Visit v1 ON v1.Id_person = p1.Id_person
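Putting the join rewrite and the NOT EXISTS filter together, the whole query could look like the sketch below. It runs on Python's built-in sqlite3 module (my assumption, with LIMIT in place of TOP). I also dropped the second Person reference from the subquery, which should be equivalent as long as every visit belongs to an existing person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Id_person INTEGER, Name TEXT, Surname TEXT);
INSERT INTO Person VALUES
  (3, 'Paul', 'McDonald'), (19, 'Ted', 'Malicky'), (12, 'Meg', 'Newton');
CREATE TABLE Visit (Id_person INTEGER, Date TEXT, Place TEXT);
INSERT INTO Visit VALUES
  (3, '2010-01-19', 'Ohio'), (3, '2010-01-18', 'New York'),
  (19, '2009-12-24', 'Tokyo'), (12, '2009-10-13', 'Warsaw');
""")

newest = conn.execute("""
SELECT p1.Id_person, p1.Name, p1.Surname, v1.Date, v1.Place
FROM Person AS p1
INNER JOIN Visit AS v1 ON v1.Id_person = p1.Id_person
WHERE NOT EXISTS (
    -- keep v1 only if the same person has no later visit
    SELECT 1 FROM Visit AS v2
    WHERE v2.Id_person = v1.Id_person
      AND v2.Date > v1.Date
)
ORDER BY v1.Date DESC
LIMIT 4
""").fetchall()

for row in newest:
    print(row)
```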

Related

ActiveRecord - selecting the ID of a user's most recent appointment

I have a User table and a list of all their appointments in an Appointment table. A user can book multiple appointments, so their most recent appointment might not be the same as the highest ID.
e.g. Appointment 1 is booked for July 16. Before Appointment 1 occurs, the user decides they'd also like one sooner and books Appointment 2 for July 15.
I can get the ids on a per-user basis using a loop and then combine them, but out of curiosity, I was wondering how this could be done in one single query.
Something along the lines of:
User.joins(:appointments).group(:id).pluck("MAX(appointments.date)")
This only gets the date though, and not the id of the appointment that has that date. While my question is for ActiveRecord, if anyone has a solution in something like SQL I'm sure I could find an analogous function.
I don't know anything about ActiveRecord I'm afraid, but in SQL, I would use a self join with a less than condition to achieve this. I set up some temp tables to demonstrate:
create table #User
(
id int,
fullName varchar(50)
)
create table #Appointment
(
id int,
userId int,
apptDate date
)
insert into #User
values (1, 'John Smith'), (2, 'Jane Doe'), (3, 'Robert White'), (4, 'Sharon Black')
insert into #Appointment
values
(1, 3, '2019-08-01'),
(2, 2, '2019-10-21'),
(3, 1, '2019-07-16'), --John Smith Appointment 1, booked for July 16th
(4, 4, '2019-09-28'),
(5, 1, '2019-07-15') --John Smith Appointment 2, booked for July 15th
You can then run the following query to return each User and their earliest dated appointment, along with its id and any other fields you want:
select
u.fullName,
a.id as EarliestAppId,
a.apptDate as EarliestAppDate
from #User u
left join #Appointment a on u.id = a.userId
left join #Appointment earlier on u.id = earlier.userId and earlier.apptDate < a.apptDate
where earlier.id is null
This returns the following results, correctly identifying the earlier of John Smith's 2 appointments:
/------------------------------------------------\
| fullName | EarliestAppId | EarliestAppDate |
|--------------|---------------|-----------------|
| John Smith | 5 | 2019-07-15 |
| Jane Doe | 2 | 2019-10-21 |
| Robert White | 1 | 2019-08-01 |
| Sharon Black | 4 | 2019-09-28 |
\------------------------------------------------/
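The question asks for the most recent appointment rather than the earliest, and the same self-join should cover that if the comparison is flipped. Below is a sketch under a few assumptions of mine: it runs on Python's built-in sqlite3 module, and the table names users and appointments are hypothetical stand-ins for the temp tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, fullName TEXT);
CREATE TABLE appointments (id INTEGER, userId INTEGER, apptDate TEXT);
INSERT INTO users VALUES
  (1, 'John Smith'), (2, 'Jane Doe'), (3, 'Robert White'), (4, 'Sharon Black');
INSERT INTO appointments VALUES
  (1, 3, '2019-08-01'), (2, 2, '2019-10-21'), (3, 1, '2019-07-16'),
  (4, 4, '2019-09-28'), (5, 1, '2019-07-15');
""")

latest = conn.execute("""
SELECT u.fullName, a.id AS LatestAppId, a.apptDate AS LatestAppDate
FROM users AS u
LEFT JOIN appointments AS a ON a.userId = u.id
-- "later" finds any appointment of the same user after a.apptDate
LEFT JOIN appointments AS later
       ON later.userId = u.id AND later.apptDate > a.apptDate
WHERE later.id IS NULL   -- no later appointment exists, so a is the latest
ORDER BY u.id
""").fetchall()

for row in latest:
    print(row)
```

For John Smith this picks appointment 3 (July 16), even though appointment 5 has the higher id.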

SQL Server Find the date in joining order

I am using MS SQL Server and have two tables:
membership
+---+-----------------+---------------------+----------------
| | membershipName | createddate | price |
+---+-----------------+---------------------+----------------
| 1 | Swimming | 2010-01-01 | 30 |
| 2 | Swimming | 2010-05-01 | 32 |
| 3 | Swimming | 2011-01-01 | 35 |
| 4 | Swimming | 2012-01-01 | 40 |
+---+-----------------+---------------------+----------------
member
+---+-----------------+---------------------+-----------------
| | memberName | membership | joiningDate |
+---+-----------------+---------------------+-----------------
| 0 | Andy | Swimming | 2008-02-02 |
| 1 | John | Swimming | 2010-02-02 |
| 2 | Andy | Swimming | 2011-02-02 |
| 3 | Alice | Swimming | 2015-02-02 |
+---+-----------------+---------------------+----------------
I want to find each member's membership price for the period in which they joined,
e.g
Andy return NULL
John return 30
Alice return 40
The logic I have in mind:
if the joiningDate falls between two start dates, choose the earlier of the two
otherwise:
if the joining date is before the earliest start date, use the earliest date
if the joining date is after the latest start date, use the latest date
I am a Java programmer, so doing this in SQL is quite tricky for me. Any hint would be nice!
edit 1: sorry I forgot to consider month
edit 2: added desirable result
I hope I understood you correctly. try this out:
SELECT TOP 1 ms.Price
FROM membership ms
LEFT JOIN member m
ON m.joiningdate > ms.createddate
WHERE m.id = 3
ORDER BY price DESC
I hope I got this correctly. You might try it like this:
Declared table variable to mock-up a test scenario:
DECLARE @membership TABLE(id INT, membershipName VARCHAR(100),createddate DATETIME,price DECIMAL(10,4));
INSERT INTO @membership VALUES
(1,'Swimming',{d'2010-01-01'},30)
,(2,'Swimming',{d'2010-05-01'},32)
,(3,'Swimming',{d'2011-01-01'},35)
,(4,'Swimming',{d'2012-01-01'},40);
DECLARE @member TABLE(id INT,memberName VARCHAR(100),membership VARCHAR(100),joiningDate DATETIME);
INSERT INTO @member VALUES
(0,'Andy','Swimming',{d'2008-02-02'})
,(1,'John','Swimming',{d'2010-02-02'})
,(2,'Andy','Swimming',{d'2011-02-02'})
,(3,'Alice','Swimming',{d'2015-02-02'});
As you are on SQL-Server 2012 you are lucky. You can use LEAD:
The CTE "Intervalls" returns the membership table as is, plus one column holding the moment one second before the next row's createddate. LEAD lets you read a value from a later row. First I subtract one second, then I substitute a very high date where LEAD returns NULL (the last row):
WITH Intervalls AS
(
SELECT *
,ISNULL(DATEADD(SECOND ,-1,LEAD(createddate) OVER(ORDER BY createddate)),{d'2100-01-01'}) AS EndOfIntervall
FROM @membership AS ms
)
--The SELECT reads all members and joins them to the membership where their date is in the range according to "Intervalls". Only the case earlier than the first must be treated specially:
SELECT m.*
,ISNULL(i.price, CASE WHEN m.joiningDate<(SELECT MIN(x.createddate) FROM @membership as x)
THEN (SELECT TOP 1 x.price FROM @membership AS x ORDER BY x.createddate ASC) END) AS price
FROM @member AS m
LEFT JOIN Intervalls AS i ON m.joiningDate BETWEEN i.createddate AND i.EndOfIntervall
UPDATE Better approach (thx to Paparis)
SELECT m.*
,ISNULL(Corresponding.price, (SELECT TOP 1 x.price FROM @membership AS x ORDER BY x.createddate ASC)) AS price
FROM @member AS m
OUTER APPLY
(
SELECT TOP 1 ms.price
FROM @membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
) AS Corresponding
UPDATE 2: Even simpler!
SELECT m.*
,ISNULL
(
(
SELECT TOP 1 ms.price
FROM @membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
),
(
SELECT TOP 1 x.price FROM @membership AS x ORDER BY x.createddate ASC
)
) AS price
FROM @member AS m
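For readers without SQL Server at hand, the last variant can be tried roughly like this. This is a sketch on Python's built-in sqlite3 module (my assumption), with COALESCE and LIMIT 1 standing in for ISNULL and TOP 1, plain tables instead of table variables, and ISO date strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE membership (id INTEGER, membershipName TEXT, createddate TEXT, price INTEGER);
INSERT INTO membership VALUES
  (1, 'Swimming', '2010-01-01', 30), (2, 'Swimming', '2010-05-01', 32),
  (3, 'Swimming', '2011-01-01', 35), (4, 'Swimming', '2012-01-01', 40);
CREATE TABLE member (id INTEGER, memberName TEXT, membership TEXT, joiningDate TEXT);
INSERT INTO member VALUES
  (0, 'Andy', 'Swimming', '2008-02-02'), (1, 'John', 'Swimming', '2010-02-02'),
  (2, 'Andy', 'Swimming', '2011-02-02'), (3, 'Alice', 'Swimming', '2015-02-02');
""")

prices = conn.execute("""
SELECT m.id, m.memberName,
       COALESCE(
         -- latest price that was already in force on the joining date
         (SELECT ms.price FROM membership AS ms
          WHERE ms.createddate <= m.joiningDate
          ORDER BY ms.createddate DESC LIMIT 1),
         -- joined before the first price existed: fall back to the earliest
         (SELECT x.price FROM membership AS x
          ORDER BY x.createddate ASC LIMIT 1)
       ) AS price
FROM member AS m
ORDER BY m.id
""").fetchall()

for row in prices:
    print(row)
```

Like the queries above, this ignores membershipName; with more than one membership type you would probably want to match it in both subqueries.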

Select Top User over a list of Pages

I have a table containing records of Users' internet history. The table's structure contains the User_ID, the Page Accessed, and the Date Accessed of the page. For Example:
+==========================================+
|User_ID | Page_Accessed | Date_Accessed |
+==========================================+
|Johh.Doe | Google | 1/1/2015 |
|Johh.Doe | Google | 1/1/2015 |
|Suzy.Lue | Google | 7/11/2015 |
|Suzy.Lue | Wikipedia | 4/23/2015 |
|Babe Ruth| StackOverflow | 9/1/2015 |
+==========================================+
I am currently trying to use a SQL query that uses:
RANK() OVER (PARTITION BY [Page Accessed] ORDER BY Count(DateAcc))
Then I use a PIVOT() by the various sites. However, after selecting the records WHERE (Num = 1) from the PIVOT() and applying a GROUP BY [Rank], I end up with a result similar to:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| NULL | NULL |
| 1 | NULL | Suzy Lue | NULL |
| 1 | NULL | NULL | Babe Ruth |
+=================================================+
Instead I need to reformat my output as:
+=================================================+
|Rank | Google | Wikipedia | StackOverflow |
+=================================================+
| 1 | John Doe| Suzy Lue | Babe Ruth |
+=================================================+
My Current Query:
SELECT Rank, Google, Wikipedia, StackOverflow
FROM(
SELECT TOP (100) PERCENT User_ID, Page_Accessed, COUNT(Date_Accessed) AS Views,
RANK() OVER (PARTITION BY Page_Accessed ORDER BY Count(Date_Accessed) DESC) AS Rank
FROM Record_Table
GROUP BY dbo.location_key.subSite, dbo.user_info_list_parse.Name
ORDER BY Views DESC) AS tb
PIVOT (
max(tb.User_ID) FOR
Page_Accessed IN ( Google, Wikipedia, StackOverflow)
) pvt
WHERE (Num = 1)
Are there any creative solutions to obtain this result?
I think you've already found a solution, but for your information and for others reading this, let me remove the noise from this query. There is no need for ORDER BY or TOP (100) PERCENT, and the Views column is redundant. I would simplify the query as follows:
CREATE TABLE InternetHistory
(
[User_ID] varchar(20),
[Page_Accessed] varchar(20),
[Date_Accessed] datetime
)
INSERT InternetHistory VALUES
('Johh.Doe', 'Google', '2015-01-01'),
('Johh.Doe', 'Google', '2015-01-01'),
('Suzy.Lue', 'Google', '2015-07-11'),
('Suzy.Lue', 'Wikipedia', '2015-04-23'),
('Babe Ruth', 'StackOverflow', '2015-01-09')
SELECT * FROM
(
SELECT [User_ID], [Page_Accessed], RANK() OVER (PARTITION BY [Page_Accessed] ORDER BY COUNT(*) DESC) Ranking
FROM InternetHistory
GROUP BY [User_ID], [Page_Accessed]
) AS Src
PIVOT
(
MAX([User_Id]) FOR [Page_Accessed] IN ([Google], [Wikipedia], [StackOverflow])
) AS Pvt
WHERE Ranking = 1
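If PIVOT is not available in your database, the same one-row result can usually be produced with conditional aggregation. Here is a sketch on Python's built-in sqlite3 module (my assumption; window functions need SQLite 3.25 or newer). It counts views per user and page first, ranks the counts, and then folds the rank 1 rows into one row with MAX(CASE ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InternetHistory (User_ID TEXT, Page_Accessed TEXT, Date_Accessed TEXT);
INSERT INTO InternetHistory VALUES
  ('Johh.Doe', 'Google', '2015-01-01'), ('Johh.Doe', 'Google', '2015-01-01'),
  ('Suzy.Lue', 'Google', '2015-07-11'), ('Suzy.Lue', 'Wikipedia', '2015-04-23'),
  ('Babe Ruth', 'StackOverflow', '2015-01-09');
""")

top_users = conn.execute("""
SELECT Ranking,
       MAX(CASE WHEN Page_Accessed = 'Google'        THEN User_ID END) AS Google,
       MAX(CASE WHEN Page_Accessed = 'Wikipedia'     THEN User_ID END) AS Wikipedia,
       MAX(CASE WHEN Page_Accessed = 'StackOverflow' THEN User_ID END) AS StackOverflow
FROM (
    SELECT User_ID, Page_Accessed,
           RANK() OVER (PARTITION BY Page_Accessed ORDER BY Views DESC) AS Ranking
    FROM (
        -- one row per user and page with the number of visits
        SELECT User_ID, Page_Accessed, COUNT(*) AS Views
        FROM InternetHistory
        GROUP BY User_ID, Page_Accessed
    ) AS Counts
) AS Ranked
WHERE Ranking = 1
GROUP BY Ranking
""").fetchall()

print(top_users)
```

If two users tie for first place on a page, MAX keeps only the alphabetically last name, which is the same behaviour as MAX([User_Id]) inside PIVOT.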

How to combine certain data from different rows based on a certain column?

I have a table that looks like this:
--------------------------------
| name | email | friend |
--------------------------------
1 | bob | bobs email | kate |
--------------------------------
2 | bob | bobs email | joe |
--------------------------------
3 | tim | tims email | eddie |
How can I create new columns (friend1, friend2, etc.) and move friends there, on the condition that name and email are the same (there might be two bobs, for instance, bob and bob with a different email).
My desired table looks like this:
-----------------------------------------------------
| name | email | friend1 | friend2 | friend3 |
-----------------------------------------------------
1 | bob | bobs email | kate | joe | |
-----------------------------------------------------
2 | tim | tims email | eddie | | |
This can't be achieved as the query you need has no static metadata (i.e. you don't know the columns) as it might change over time if a friend is added. But if you mean that you need only just three columns for friends, you can use the PIVOT command. You can use the below link as an example:
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx
Another solution (which is unfortunately not easily available in SQL Server) is to aggregate the friends, i.e. you will have only one column containing all friends, however many there are, separated by commas. This can be achieved using a CLR function (Example: http://www.mssqltips.com/sqlservertip/2022/concat-aggregates-sql-server-clr-function/), a CTE (Example: Optimal way to concatenate/aggregate strings) or FOR XML (Example: Does T-SQL have an aggregate function to concatenate strings?).
Hope this helps...
Having these sample data:
DECLARE @DataSource TABLE
(
[name] VARCHAR(12)
,[email] VARCHAR(24)
,[friend] VARCHAR(12)
)
INSERT INTO @DataSource ([name], [email], [friend])
VALUES ('bob', 'bobs email', 'kate')
,('bob', 'bobs email', 'joe')
,('tim', 'tim email', 'edie')
The following query:
SELECT DD.[name]
,DD.[email]
,Friends.[friend]
,ROW_NUMBER() OVER (PARTITION BY DD.[name], DD.[email] ORDER BY Friends.[friend]) AS [FriendNumber]
FROM
(
SELECT DISTINCT [name]
,[email]
FROM @DataSource
) DD -- Distinct Data
CROSS APPLY
(
SELECT [friend]
FROM @DataSource DS
WHERE DS.[name] = DD.[name]
AND DS.[email] = DD.[email]
) Friends
will give you one row per friend, numbered within each name/email pair by [FriendNumber].
So, you can now build what you want using PIVOT, but note that you need to know the maximum number of friends a person can have:
SELECT *
FROM
(
SELECT DD.[name]
,DD.[email]
,Friends.[friend]
,'friend' + CAST(ROW_NUMBER() OVER (PARTITION BY DD.[name], DD.[email] ORDER BY Friends.[friend]) AS VARCHAR(2)) AS [FriendNumber]
FROM
(
SELECT DISTINCT [name]
,[email]
FROM @DataSource
) DD -- Distinct Data
CROSS APPLY
(
SELECT [friend]
FROM @DataSource DS
WHERE DS.[name] = DD.[name]
AND DS.[email] = DD.[email]
) Friends
) DS
PIVOT
(
MAX([friend]) FOR [FriendNumber] IN ([friend1], [friend2], [friend3])
) PVT
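If the maximum number of friends is not known in advance, one option is to do the final reshaping in application code instead of SQL. The sketch below is only an illustration under my own assumptions: Python with the built-in sqlite3 module, and a plain table named DataSource in place of the table variable. It reads the rows in order, groups them by name and email, and pads every row to the widest friend list.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataSource (name TEXT, email TEXT, friend TEXT);
INSERT INTO DataSource VALUES
  ('bob', 'bobs email', 'kate'), ('bob', 'bobs email', 'joe'),
  ('tim', 'tim email', 'edie');
""")

rows = conn.execute(
    "SELECT name, email, friend FROM DataSource ORDER BY name, email, friend"
).fetchall()

# groupby needs sorted input, which the ORDER BY above provides
grouped = [
    (key, [r[2] for r in group])
    for key, group in groupby(rows, key=lambda r: (r[0], r[1]))
]

# the number of friend columns is decided by the data, not hard-coded
width = max(len(friends) for _, friends in grouped)
header = ["name", "email"] + [f"friend{i}" for i in range(1, width + 1)]
table = [
    list(key) + friends + [None] * (width - len(friends))
    for key, friends in grouped
]

print(header)
for line in table:
    print(line)
```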

Creating a query to find matching objects in a "join" table

I am trying to find an efficient query to find all matching objects in a "join" table.
Given an object Adopter that has many Pets, and Pets that have many Adopters through a AdopterPets join table. How could I find all of the Adopters that have the same Pets?
The schema is fairly normalized and looks like this.
TABLE Adopter
INTEGER id
TABLE AdopterPets
INTEGER adopter_id
INTEGER pet_id
TABLE Pets
INTEGER id
Right now the solution I am using loops through all Adopters and asks for their pets; any time there is a match, I store it so it can be used later. I am sure there has to be a better way using SQL.
One SQL solution I looked at was GROUP BY but it did not seem to be the right trick for this problem.
EDIT
To explain a little more of what I am looking for I will try to give an example.
+---------+ +------------------+ +------+
| Adptors | | AdptorsPets | | Pets |
|---------| +----------+-------+ |------|
| 1 | |adptor_id | pet_id| | 1 |
| 2 | +------------------+ | 2 |
| 3 | |1 | 1 | | 3 |
+---------+ |2 | 1 | +------+
|1 | 2 |
|3 | 1 |
|3 | 2 |
|2 | 3 |
+------------------+
When you asked the Adopter with the id of 1 for any other Adopters that have the same Pets, you would be returned id 3.
If you asked the same question for the Adopter with the id of 3, you would get id 1.
If you asked the same question of the Adopter with id 2, you would be returned nothing.
I hope this helps clear things up!
Thank you all for the help, I used a combination of a few things:
SELECT adopter_id
FROM (
SELECT adopter_id, array_agg(pet_id ORDER BY pet_id)
AS pets
FROM adopters_pets
GROUP BY adopter_id
) AS grouped_pets
WHERE pets = array[1,2,3] -- array must be ordered
AND adopter_id <> current_adopter_id;
In the subquery I get pet_ids grouped by their adopter. The ordering of the pet_ids is key so that the results in the main query will not be order dependent.
In the main query I compare the results of the subquery to the pet ids of the adopter I am looking to match. For the purpose of this answer the pet_ids of the particular adopter are represented by [1,2,3]. I then make sure that the adopter I am comparing to is not included in the results.
Let me know if anyone sees any optimizations or if there is a way to compare arrays where order does not matter.
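On the order question: if the comparison ends up in application code anyway, sets sidestep the ordering problem entirely. Here is a rough sketch in Python (my choice of language, using the join-table rows from the example in the question); same_pets is a hypothetical helper name.

```python
from collections import defaultdict

# (adopter_id, pet_id) rows from the example join table in the question
adopter_pets = [(1, 1), (2, 1), (1, 2), (3, 1), (3, 2), (2, 3)]

# collect each adopter's pets as a set, so order never matters
pets_by_adopter = defaultdict(set)
for adopter_id, pet_id in adopter_pets:
    pets_by_adopter[adopter_id].add(pet_id)

# index adopters by their exact pet set
adopters_by_petset = defaultdict(list)
for adopter_id, pets in pets_by_adopter.items():
    adopters_by_petset[frozenset(pets)].append(adopter_id)


def same_pets(adopter_id):
    """Return the other adopters whose pet set equals this adopter's."""
    pets = frozenset(pets_by_adopter.get(adopter_id, ()))
    return sorted(a for a in adopters_by_petset.get(pets, []) if a != adopter_id)


print(same_pets(1), same_pets(3), same_pets(2))
```

This makes one pass over the join table instead of one query per adopter.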
I'm not sure if this is exactly what you're looking for but this might give you some ideas.
First I created some sample data:
create table adopter (id serial not null primary key, name varchar );
insert into adopter (name) values ('Bob'), ('Sally'), ('John');
create table pets (id serial not null primary key, kind varchar);
insert into pets (kind) values ('Dog'), ('Cat'), ('Rabbit'), ('Snake');
create table adopterpets (adopter_id integer, pet_id integer);
insert into adopterpets values (1, 1), (1, 2), (2, 1), (2,3), (2,4), (3, 1), (3,3);
Next I ran this query:
SELECT p.kind, array_agg(a.name) AS adopters
FROM pets p
JOIN adopterpets ap ON ap.pet_id = p.id
JOIN adopter a ON a.id = ap.adopter_id
GROUP BY p.kind
HAVING count(*) > 1
ORDER BY kind;
kind | adopters
--------+------------------
Dog | {Bob,Sally,John}
Rabbit | {Sally,John}
(2 rows)
In this example, for each pet I'm creating an array of all owners. The HAVING count(*) > 1 clause ensures we only show pets with shared owners (more than 1). If we leave this out we'll include pets that don't share owners.
UPDATE
@scommette: Glad you've got it working! I've refactored your working example a little bit below:
used the @> operator. This checks whether one array contains the other, which avoids the need to explicitly set the order. Note that containment is not equality: it also matches adopters who have additional pets.
moved the grouped_pets subquery to a CTE. This isn't the only solution, but it neatly allows you to both filter out the current_adopter_id and get the pets for that id
You might find it helpful to wrap this in a function.
WITH grouped_pets AS (
SELECT adopter_id, array_agg(pet_id ORDER BY pet_id) AS pets
FROM adopters_pets
GROUP BY adopter_id
)
SELECT * FROM grouped_pets
WHERE adopter_id <> 3
AND pets @> (
SELECT pets FROM grouped_pets WHERE adopter_id = 3
);
If you're using Oracle then wm_concat could be useful here
select pet_id, wm_concat(adopter_id) adopters
from AdopterPets
group by pet_id ;
--
-- Relational division 1.0
-- Show all people who own *exactly* the same (non-empty) set
-- of animals as I do.
--
-- Test data
CREATE TABLE adopter (id INTEGER NOT NULL primary key, fname varchar );
INSERT INTO adopter (id,fname) VALUES (1,'Bob'), (2,'Alice'), (3,'Chris');
CREATE TABLE pets (id INTEGER NOT NULL primary key, kind varchar);
INSERT INTO pets (id,kind) VALUES (1,'Dog'), (2,'Cat'), (3,'Pig');
CREATE TABLE adopterpets (adopter_id integer REFERENCES adopter(id)
, pet_id integer REFERENCES pets(id)
);
INSERT INTO adopterpets (adopter_id,pet_id) VALUES (1, 1), (1, 2), (2, 1), (2,3), (3,1), (3,2);
-- Show it to the world
SELECT ap.adopter_id, ap.pet_id
, a.fname, p.kind
FROM adopterpets ap
JOIN adopter a ON a.id = ap.adopter_id
JOIN pets p ON p.id = ap.pet_id
ORDER BY ap.adopter_id,ap.pet_id;
SELECT DISTINCT other.fname AS same_as_me
FROM adopter other
-- moi has *at least* one same kind of animal as toi
WHERE EXISTS (
SELECT * FROM adopterpets moi
JOIN adopterpets toi ON moi.pet_id = toi.pet_id
WHERE toi.adopter_id = other.id
AND moi.adopter_id <> toi.adopter_id
-- C'est moi!
AND moi.adopter_id = 1 -- 'Bob'
-- But moi should not own an animal that toi doesn't have
AND NOT EXISTS (
SELECT * FROM adopterpets lnx
WHERE lnx.adopter_id = moi.adopter_id
AND NOT EXISTS (
SELECT *
FROM adopterpets lnx2
WHERE lnx2.adopter_id = toi.adopter_id
AND lnx2.pet_id = lnx.pet_id
)
)
-- ... And toi should not own an animal that moi doesn't have
AND NOT EXISTS (
SELECT * FROM adopterpets rnx
WHERE rnx.adopter_id = toi.adopter_id
AND NOT EXISTS (
SELECT *
FROM adopterpets rnx2
WHERE rnx2.adopter_id = moi.adopter_id
AND rnx2.pet_id = rnx.pet_id
)
)
)
;
Result:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "adopter_pkey" for table "adopter"
CREATE TABLE
INSERT 0 3
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "pets_pkey" for table "pets"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 6
adopter_id | pet_id | fname | kind
------------+--------+-------+------
1 | 1 | Bob | Dog
1 | 2 | Bob | Cat
2 | 1 | Alice | Dog
2 | 3 | Alice | Pig
3 | 1 | Chris | Dog
3 | 2 | Chris | Cat
(6 rows)
same_as_me
------------
Chris
(1 row)
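A shorter way to express the same "exactly the same set" test is to count: another adopter matches if they own as many pets as I do and every one of their pets is also mine. This is a sketch of that counting variant, not the query above. It runs on Python's built-in sqlite3 module (my assumption) and relies on each (adopter_id, pet_id) pair being unique, which I enforce here with a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adopterpets (
    adopter_id INTEGER NOT NULL,
    pet_id     INTEGER NOT NULL,
    PRIMARY KEY (adopter_id, pet_id)  -- assumption: no duplicate pairs
);
INSERT INTO adopterpets VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2);
""")

SAME_PETS_SQL = """
SELECT o.adopter_id
FROM adopterpets AS o
WHERE o.adopter_id <> :me
GROUP BY o.adopter_id
-- same number of pets as me ...
HAVING COUNT(*) = (SELECT COUNT(*) FROM adopterpets WHERE adopter_id = :me)
-- ... and every one of their pets is also one of mine
   AND COUNT(*) = SUM(CASE WHEN o.pet_id IN
                        (SELECT pet_id FROM adopterpets WHERE adopter_id = :me)
                      THEN 1 ELSE 0 END)
ORDER BY o.adopter_id
"""


def same_pets(me):
    """Return ids of adopters owning exactly the same pets as `me`."""
    return [row[0] for row in conn.execute(SAME_PETS_SQL, {"me": me})]


print(same_pets(1))  # Bob (1) and Chris (3) both own pets 1 and 2
```

An adopter with no pets at all never matches anyone here, in line with the non-empty restriction stated in the comment at the top of the answer.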