Retrieve a variable number of subsets with different conditions - sql

I have the following database:
CREATE TABLE IF NOT EXISTS city (
id serial primary key,
name character varying UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS inhabitants (
id serial primary key,
fullname character varying UNIQUE NOT NULL,
home integer REFERENCES city
);
INSERT INTO city (name) VALUES
('michigan'),
('washington'),
('new york'),
('london'),
('los angeles')
ON CONFLICT DO NOTHING;
INSERT INTO inhabitants (fullname, home) VALUES
('flannigan, amy', 1),
('hannigan, leon', 1),
('shennanigan, frank', 1),
('catcher, floyd', 2),
('rice, amy', 2),
('black, joe', 2),
('higgins, simon', 3),
('stewart, rick', 3),
('white, frank', 3),
('henson, ben', 5),
('hedge, tim', 5),
('wilson, bill', 5),
('moriarty, doc', 4),
('fletcher, dolores', 4),
('fletcher, hank', 4),
('williamson, ann', 1),
('stewart, mary', 3)
ON CONFLICT DO NOTHING;
I want to extract a varying number of subsets with a varying number of inhabitants for each subset. Currently I am using a query for each subset, e.g., if I need two subsets I may use these two queries:
select fullname, home from inhabitants i
where home = (SELECT id FROM city WHERE name = 'michigan')
ORDER BY random() LIMIT 2;
and
select fullname, home from inhabitants i
where home = (SELECT id FROM city WHERE name = 'london')
ORDER BY random() LIMIT 1;
The results may look like this:
fullname | home
-----------------+------
hannigan, leon | 1
williamson, ann | 1
(2 rows)
and
fullname | home
-------------------+------
fletcher, dolores | 4
(1 row)
I join those two results in Bash, so they look like what I actually want:
fullname | home
-------------------+------
hannigan, leon | 1
williamson, ann | 1
fletcher, dolores | 4
(3 rows)
I would like to minimize the number of database calls.
Is there a way to do this with one query (or function) or at least a better way than what I am doing currently?

Use window functions:
select fullname, home
from (select i.*,
row_number() over (partition by home order by random()) as seqnum
from inhabitants i
) i join
city c
on c.id = i.home
where (name = 'michigan' and seqnum <= 2) or
(name = 'london' and seqnum <= 1)
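Since the number of subsets varies, the WHERE clause can be generated from a mapping of city name to sample size. Here is a minimal sketch of that idea, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for PostgreSQL. The helper name sample_by_city and the wanted mapping are made up for illustration, and only part of the sample data is loaded:

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE inhabitants (
    id INTEGER PRIMARY KEY,
    fullname TEXT UNIQUE NOT NULL,
    home INTEGER REFERENCES city
);
INSERT INTO city (name) VALUES
    ('michigan'), ('washington'), ('new york'), ('london'), ('los angeles');
-- Only part of the question's sample rows, to keep the sketch short.
INSERT INTO inhabitants (fullname, home) VALUES
    ('flannigan, amy', 1), ('hannigan, leon', 1), ('shennanigan, frank', 1),
    ('catcher, floyd', 2), ('rice, amy', 2), ('black, joe', 2),
    ('moriarty, doc', 4), ('fletcher, dolores', 4), ('fletcher, hank', 4),
    ('williamson, ann', 1);
""")


def sample_by_city(conn, wanted):
    """wanted maps a city name to how many random inhabitants to return."""
    if not wanted:
        return []
    # One "(c.name = ? AND i.seqnum <= ?)" term per requested subset.
    terms = " OR ".join("(c.name = ? AND i.seqnum <= ?)" for _ in wanted)
    params = [value for pair in wanted.items() for value in pair]
    sql = f"""
        SELECT i.fullname, i.home
        FROM (SELECT inh.*,
                     row_number() OVER (PARTITION BY home
                                        ORDER BY random()) AS seqnum
              FROM inhabitants inh) AS i
        JOIN city c ON c.id = i.home
        WHERE {terms}
    """
    return conn.execute(sql, params).fetchall()


rows = sample_by_city(conn, {"michigan": 2, "london": 1})
print(rows)
```

Only the placeholders are generated; the city names and limits are passed as bound parameters, so the whole request is still a single database call.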

Related

SQL: rows that share several values on a specific column

I have a table Visited with 2 columns:
ID | City
ID is an integer, City is a string.
Note that none of the columns is a key by itself - we can have the same ID visiting several cities, and several different IDs in the same city.
Given a specific ID, I want to return all the IDs in the table that visited at least half of the places that the input ID did (not including the input ID itself).
edit: We only count places that are the same.
so if
ID 1 visited cities a,b,c.
ID 2 visited b,c,d.
ID 3 visited c,d,e.
then for ID=1 we return only [2], because out of the three cities ID 1 visited, ID 3 visited only one.
Inner join the visited table with the list of cities visited by the specific id, then select ids with at least half of the number of rows when grouped by id.
with u as
(select city as visitedBySpecificId from visited where id = *specificId*),
v as
(select * from visited inner join u on city = visitedBySpecificId where id <> *specificId*)
(select id from v group by id having count(*) >= (select count(*) from u)/2.0)
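To check the approach against the a/b/c example from the question, here is a small sketch using Python's built-in sqlite3 module. The function name ids_sharing_half is made up, and the DISTINCT keywords are my own assumption to guard against duplicate (id, city) rows, which the question does not rule out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visited (id INTEGER, city TEXT);
INSERT INTO visited VALUES
    (1, 'a'), (1, 'b'), (1, 'c'),
    (2, 'b'), (2, 'c'), (2, 'd'),
    (3, 'c'), (3, 'd'), (3, 'e');
""")


def ids_sharing_half(conn, specific_id):
    """Return ids that visited at least half of specific_id's cities."""
    sql = """
        WITH u AS (
            -- Cities visited by the input id.
            SELECT DISTINCT city FROM visited WHERE id = :sid
        ),
        v AS (
            -- Other ids' visits, restricted to those cities.
            SELECT DISTINCT visited.id, visited.city
            FROM visited
            JOIN u ON visited.city = u.city
            WHERE visited.id <> :sid
        )
        SELECT id FROM v
        GROUP BY id
        -- Dividing by 2.0 avoids integer truncation for odd counts.
        HAVING count(*) >= (SELECT count(*) FROM u) / 2.0
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, {"sid": specific_id})]


result = ids_sharing_half(conn, 1)
print(result)
```

For ID 1 the threshold is 1.5 shared cities, so ID 2 (two shared cities) qualifies and ID 3 (one shared city) does not.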
Join them and compare the counts.
create table suspect_tracking (id int, city varchar(30));
insert into suspect_tracking values
(1, 'Brussels'), (1,'London'), (1,'Paris')
, (1,'New York'), (1,'Bangkok'), (1, 'Hong Kong')
, (1,'Dubai'), (1,'Singapoor'), (1,'Rome')
, (1,'Macau'), (1, 'Istanbul'), (1,'Kuala Lumpur')
, (1,'Dehli'), (1,'Tokyo'), (1,'Moscow')
, (2,'New York'), (2,'Bangkok'), (2, 'Hong Kong')
, (2,'Dubai'), (2,'Singapoor'), (2,'Rome')
, (2,'Macau'), (2, 'Istanbul'), (2,'Kuala Lumpur')
, (3,'Macau'), (3, 'Istanbul'), (3,'Kuala Lumpur')
, (3,'Dehli'), (3,'Tokyo'), (3,'Moscow');
with cte_suspects as (
select id, city
from suspect_tracking
group by id, city
)
, cte_prime_suspect as (
select distinct id, city
from suspect_tracking
where id = 1
)
, cte_prime_total as (
select id, count(city) as cities
from cte_prime_suspect
group by id
)
select sus.id
from cte_prime_suspect prime
join cte_prime_total primetot
on primetot.id = prime.id
join cte_suspects sus
on sus.city = prime.city and sus.id <> prime.id
group by prime.id, sus.id, primetot.cities
having count(sus.city) >= primetot.cities/2.0
| id |
| -: |
| 2 |

How to select rows that have min value from a column but has an if statement in SQL based on another Column?

CREATE TABLE Persons
(
PersonID int,
Diagnosis varchar(255),
ConsultantID varchar(255),
EpisodeNumber varchar(255)
);
INSERT INTO Persons (PersonID, Diagnosis,ConsultantID, EpisodeNumber)
VALUES (1, 'Headache','001', 1),
(1, 'Headache','001', 2),
(1, 'Stomachache','002', 1),
(1, 'Bone Fracture','002', 2),
(2, 'Headache', '003',1),
(2, 'Headache', '003',2),
(3, 'Hand','004', 1),
(3, 'Headache','003', 1);
I have created the table above as an example and the table would look like this:
I would like to select rows based on the PersonID, the ConsultantID, and the minimum EpisodeNumber, unless there is a different Diagnosis for the PersonID. For example, the desired output would be as below:
select * from (
select * , row_number() over (partition by PersonID,Diagnosis,ConsultantID order by EpisodeNumber) rn
from Persons
) t where rn = 1
PersonID | Diagnosis     | ConsultantID | EpisodeNumber
---------+---------------+--------------+---------------
1        | Bone Fracture | 002          | 2
1        | Headache      | 001          | 1
1        | Stomachache   | 002          | 1
2        | Headache      | 003          | 1
3        | Hand          | 004          | 1
3        | Headache      | 003          | 1
It sounds like all you need is:
select personid, diagnosis, consultantid, min(episodenumber) as episodenumber
from Persons
group by personid, diagnosis, consultantid;
That will generate the result you posted. Either it is this simple, or your example is not sufficient to explain what the problem is.
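A quick way to see that the GROUP BY form and the row_number() form agree on this sample data is to run both and compare. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of the original database. One caveat: EpisodeNumber is declared as varchar, so the minimum is taken on text, and a value like '10' would sort before '2'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (
    PersonID int,
    Diagnosis varchar(255),
    ConsultantID varchar(255),
    EpisodeNumber varchar(255)
);
INSERT INTO Persons (PersonID, Diagnosis, ConsultantID, EpisodeNumber)
VALUES (1, 'Headache', '001', 1),
       (1, 'Headache', '001', 2),
       (1, 'Stomachache', '002', 1),
       (1, 'Bone Fracture', '002', 2),
       (2, 'Headache', '003', 1),
       (2, 'Headache', '003', 2),
       (3, 'Hand', '004', 1),
       (3, 'Headache', '003', 1);
""")

# Plain aggregation: one row per (person, diagnosis, consultant).
group_by_rows = conn.execute("""
    SELECT PersonID, Diagnosis, ConsultantID,
           min(EpisodeNumber) AS EpisodeNumber
    FROM Persons
    GROUP BY PersonID, Diagnosis, ConsultantID
    ORDER BY PersonID, Diagnosis, ConsultantID
""").fetchall()

# Window-function form: keep the first row of each partition.
window_rows = conn.execute("""
    SELECT PersonID, Diagnosis, ConsultantID, EpisodeNumber
    FROM (SELECT *,
                 row_number() OVER (
                     PARTITION BY PersonID, Diagnosis, ConsultantID
                     ORDER BY EpisodeNumber) AS rn
          FROM Persons) AS t
    WHERE rn = 1
    ORDER BY PersonID, Diagnosis, ConsultantID
""").fetchall()

print(group_by_rows == window_rows)
```

The window form is mainly worth the extra nesting when you need columns that are not part of the grouping key; for these four columns the aggregate is enough.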

How do I pivot a PostgreSQL table?

I have a table company_representatives, created and populated like this:
CREATE TABLE IF NOT EXISTS company_representatives (
_id integer NOT NULL,
name varchar(50) NOT NULL,
surname varchar(100) NOT NULL,
date_of_join date NOT NULL,
role varchar(250) NOT NULL,
company_id integer NOT NULL,
CONSTRAINT PK_company_representatives PRIMARY KEY ( _id ),
CONSTRAINT FK_144 FOREIGN KEY ( company_id ) REFERENCES companies ( _id )
);
INSERT INTO company_representatives VALUES
(1,'random name','random surname', '2001-01-23', 'CEO', 1),
(2,'next random name','next random surname', '2001-01-23', 'Co-founder', 1),
(3,'John','Doe', '2003-02-12', 'HR', 1),
(4,'Bread','Pitt', '2001-01-23', 'Security officer', 1),
(5,'Toast','Malone', '1997-11-05', 'CEO', 2),
...
I need to pivot this table so that its columns look like this:
company_id | CEO | Co-Founder | HR | Security Officer
1 1 2 3 4 "_id of company's representatives"
2 5 6 7 8
3 9 10 11 12
You can simply use FILTER directly in the SELECT clause:
SELECT
company_id,
count(*) FILTER (WHERE role = 'CEO') AS CEO,
count(*) FILTER (WHERE role = 'Co-founder') AS "Co-Founder",
count(*) FILTER (WHERE role = 'HR') AS HR,
count(*) FILTER (WHERE role = 'Security officer') AS "Security Officer"
FROM company_representatives
GROUP BY company_id;
In the question, it is not clear what the values attached to the roles actually mean, so I assumed you just want to count them. If not, just change count(*) to another aggregate function.
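If the cells should hold the representative's _id rather than a count, swapping the aggregate for max(_id) does it. The sketch below uses Python's built-in sqlite3 module and writes the condition as max(CASE WHEN ... END), which is the portable equivalent of FILTER and not the exact syntax above. It loads only the five rows shown in the question, so company 2 has just a CEO:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_representatives (
    _id integer PRIMARY KEY,
    name text NOT NULL,
    surname text NOT NULL,
    date_of_join date NOT NULL,
    role text NOT NULL,
    company_id integer NOT NULL
);
INSERT INTO company_representatives VALUES
    (1, 'random name', 'random surname', '2001-01-23', 'CEO', 1),
    (2, 'next random name', 'next random surname', '2001-01-23', 'Co-founder', 1),
    (3, 'John', 'Doe', '2003-02-12', 'HR', 1),
    (4, 'Bread', 'Pitt', '2001-01-23', 'Security officer', 1),
    (5, 'Toast', 'Malone', '1997-11-05', 'CEO', 2);
""")

# One output row per company; each role column picks the matching _id.
# A role with no representative yields NULL (None in Python).
pivot = conn.execute("""
    SELECT company_id,
           max(CASE WHEN role = 'CEO' THEN _id END) AS ceo,
           max(CASE WHEN role = 'Co-founder' THEN _id END) AS co_founder,
           max(CASE WHEN role = 'HR' THEN _id END) AS hr,
           max(CASE WHEN role = 'Security officer' THEN _id END) AS security_officer
    FROM company_representatives
    GROUP BY company_id
    ORDER BY company_id
""").fetchall()

print(pivot)  # [(1, 1, 2, 3, 4), (2, 5, None, None, None)]
```

If a company can have several people in the same role, max() keeps only the highest _id, so a different aggregate (or a different layout) would be needed in that case.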
EDIT (see comments): pivot table using crosstab, assuming there is one record for each role in all companies:
SELECT *
FROM crosstab(
'SELECT company_id, _id, name
FROM company_representatives ORDER BY company_id,role'
) AS ct(company_id integer,ceo text,co_founder text,hr text,security_officer text);

Select TOP columns from table1, join table2 with their names

I have a TABLE1 with these two columns, storing departure and arrival identifiers from flights:
dep_id arr_id
1 2
6 2
6 2
6 2
6 2
3 2
3 2
3 2
3 4
3 4
3 6
3 6
and a TABLE2 with the respective IDs containing their ICAO codes:
id icao
1 LPPT
2 LPFR
3 LPMA
4 LPPR
5 LLGB
6 LEPA
7 LEMD
How can I select the top count of TABLE1 (most used departure id and most used arrival id) and combine it with the respective ICAO code from TABLE2, so that I get this from the provided example data:
most_arrivals most_departures
LPFR LPMA
It's simple to get ONE of them, but mixing two or more columns doesn't seem to work for me no matter what I try.
You can do it like this.
Create and populate tables.
CREATE TABLE dbo.Icao
(
id int NOT NULL PRIMARY KEY,
icao nchar(4) NOT NULL
);
CREATE TABLE dbo.Flight
(
dep_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id),
arr_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id)
);
INSERT INTO dbo.Icao (id, icao)
VALUES
(1, N'LPPT'),
(2, N'LPFR'),
(3, N'LPMA'),
(4, N'LPPR'),
(5, N'LLGB'),
(6, N'LEPA'),
(7, N'LEMD');
INSERT INTO dbo.Flight (dep_id, arr_id)
VALUES
(1, 2),
(6, 2),
(6, 2),
(6, 2),
(6, 2),
(3, 2),
(3, 2),
(3, 2),
(3, 4),
(3, 4),
(3, 6),
(3, 6);
Then do a SELECT using two subqueries.
SELECT
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.arr_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_arrivals',
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.dep_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_departures';
Click this button on the toolbar to include the actual execution plan, when you execute the query.
And this is the graphical execution plan for the query. Each icon represents an operation that will be performed by the SQL Server engine. The arrows represent data flows. The direction of flow is from right to left, so the result is the leftmost icon.
Try this one:
select
(select icao
from table2 where id = (
select top 1 arr_id
from table1
group by arr_id
order by count(*) desc)
) as most_arrivals,
(select icao
from table2 where id = (
select top 1 dep_id
from table1
group by dep_id
order by count(*) desc)
) as most_departures
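To sanity-check the two-subquery idea against the sample data, here is a sketch using Python's built-in sqlite3 module. SQLite has no TOP, so LIMIT 1 is used instead; that is a dialect substitution on my part, and the rest follows the table and column names from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (id INTEGER PRIMARY KEY, icao TEXT NOT NULL);
CREATE TABLE table1 (
    dep_id INTEGER NOT NULL REFERENCES table2 (id),
    arr_id INTEGER NOT NULL REFERENCES table2 (id)
);
INSERT INTO table2 (id, icao) VALUES
    (1, 'LPPT'), (2, 'LPFR'), (3, 'LPMA'), (4, 'LPPR'),
    (5, 'LLGB'), (6, 'LEPA'), (7, 'LEMD');
INSERT INTO table1 (dep_id, arr_id) VALUES
    (1, 2), (6, 2), (6, 2), (6, 2), (6, 2), (3, 2),
    (3, 2), (3, 2), (3, 4), (3, 4), (3, 6), (3, 6);
""")

# Each scalar subquery finds the most frequent id, then looks up its code.
row = conn.execute("""
    SELECT
        (SELECT icao FROM table2 WHERE id = (
            SELECT arr_id FROM table1
            GROUP BY arr_id
            ORDER BY count(*) DESC
            LIMIT 1)) AS most_arrivals,
        (SELECT icao FROM table2 WHERE id = (
            SELECT dep_id FROM table1
            GROUP BY dep_id
            ORDER BY count(*) DESC
            LIMIT 1)) AS most_departures
""").fetchone()

print(row)  # ('LPFR', 'LPMA')
```

Note that if two ids are tied for the top count, which one comes back is not defined unless you add a tie-breaker to the ORDER BY.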

Am I using GROUP_CONCAT properly?

I'm selecting properties and joining them to mapping tables where they get mapped to filters such as location, destination, and property type.
My goal is to grab all the properties and then LEFT JOIN them to the tables, and then basically get data that shows all the locations, destinations a property is attached to and the property type itself.
Here's my query:
SELECT p.slug AS property_slug,
p.name AS property_name,
p.founder AS founder,
IF (p.display_city != '', display_city, city) AS city,
d.name AS state,
type
GROUP_CONCAT( CONVERT(subcategories_id, CHAR(8)) ) AS foo,
GROUP_CONCAT( CONVERT(categories_id, CHAR(8)) ) AS bah
FROM properties AS p
LEFT JOIN destinations AS d ON d.id = p.state
LEFT JOIN regions AS r ON d.region_id = r.id
LEFT JOIN properties_subcategories AS sc ON p.id = sc.properties_id
LEFT JOIN categories_subcategories AS c ON c.subcategory_id = sc.subcategories_id
WHERE 1 = 1
AND p.is_active = 1
GROUP BY p.id
Before I do the GROUP BY and GROUP_CONCAT my data looks like this:
id name type category_id subcategory_id state
--------------------------------------------------------------------------
1 The Hilton Hotel 1 1 2 7
1 The Hilton Hotel 1 1 3 7
1 The BlaBla Resort 2 2 5 7
After the GROUP BY and GROUP_CONCAT it becomes...
id name type category_id subcategory_id state
--------------------------------------------------------------------------
1 The Hilton Hotel 1 1, 1 2, 3 7
1 The BlaBla Resort 2 1 3 7
Is this the preferred way of grabbing all the possible mappings for the property in one go, with GROUP_CONCAT into a CSV like this?
Using this data, I can render something like...
<div class="property" categories="1" subcategories="2,3">
<h2>{property_name}</h2>
<span>{property_location}</span>
</div>
Then I would use JavaScript to show/hide properties: if the user clicks on an anchor that has, say, a subcategory="2" attribute, it would hide each .property that doesn't have 2 in its subcategories attribute value.
I believe you want something like this:
CREATE TABLE property (id INT NOT NULL PRIMARY KEY, name TEXT);
INSERT
INTO property
VALUES
(1, 'Hilton'),
(2, 'Astoria');
CREATE TABLE category (id INT NOT NULL PRIMARY KEY, property INT NOT NULL);
INSERT
INTO category
VALUES
(1, 1),
(2, 1),
(3, 2);
CREATE TABLE subcategory (id INT NOT NULL PRIMARY KEY, category INT NOT NULL);
INSERT
INTO subcategory
VALUES
(1, 1),
(2, 1),
(3, 2),
(5, 3),
(6, 3),
(7, 3);
SELECT id, name,
CONCAT(
'{',
(
SELECT GROUP_CONCAT(
'"', c.id, '": '
'[',
(
SELECT GROUP_CONCAT(sc.id ORDER BY sc.id SEPARATOR ', ' )
FROM subcategory sc
WHERE sc.category = c.id
),
']' ORDER BY c.id SEPARATOR ', ')
FROM category c
WHERE c.property = p.id
), '}')
FROM property p;
which would output this:
1 Hilton {"1": [1, 2], "2": [3]}
2 Astoria {"3": [5, 6, 7]}
The last field is properly formed JSON that maps category ids to arrays of subcategory ids.
You should add DISTINCT, and possibly ORDER BY:
GROUP_CONCAT(DISTINCT CONVERT(subcategories_id, CHAR(8))
ORDER BY subcategories_id) AS foo,
GROUP_CONCAT(DISTINCT CONVERT(categories_id, CHAR(8))
ORDER BY categories_id) AS bah
It's "de-normalized", if you want to call it that. Whether that's the best representation for rendering is another question, but I think it's fine. Some may say it's a hack, but I guess it's not too bad.
By the way, a comma seems to be missing after the "type".
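To illustrate why DISTINCT matters here, this sketch feeds the two pre-grouping Hilton rows from the question through group_concat with and without it. It assumes Python's built-in sqlite3 module rather than MySQL; the table name joined_rows is made up and stands in for the result of the LEFT JOINs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the joined result before GROUP BY (hypothetical table).
CREATE TABLE joined_rows (
    id INTEGER, name TEXT, categories_id INTEGER, subcategories_id INTEGER
);
INSERT INTO joined_rows VALUES
    (1, 'The Hilton Hotel', 1, 2),
    (1, 'The Hilton Hotel', 1, 3);
""")

plain_cats, distinct_cats, distinct_subs = conn.execute("""
    SELECT group_concat(categories_id),
           group_concat(DISTINCT categories_id),
           group_concat(DISTINCT subcategories_id)
    FROM joined_rows
    GROUP BY id
""").fetchone()

print(plain_cats)     # 1,1  (the category repeats once per joined row)
print(distinct_cats)  # 1
print(distinct_subs)  # contains 2 and 3; order is not guaranteed here
```

Without an ORDER BY inside the aggregate, the order of the concatenated values is up to the engine, which is why the MySQL version above adds one.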