Select columns based on count of many-to-many association - sql

I have a Postgres database with 3 tables that looks a little something like this:
table categories
id
type
table games
id
table game_category
id
game_id
category_id
I want to select all games which have more than x categories where type is something
I have gotten this far:
SELECT * FROM games WHERE id IN (
SELECT game_id FROM game_category GROUP BY game_id HAVING COUNT(*) >= 5
)
This works to select all games with at least 5 categories, but it doesn't narrow the categories down by their type. How could I expand on this to add the additional check for the type?

You have to join your categories table inside the subquery. Then you can add a WHERE clause for the type. Replace '?' with your actual type, of course.
SELECT * FROM games WHERE id IN (
SELECT game_id FROM game_category
INNER JOIN categories ON (categories.id=game_category.category_id)
WHERE categories.type='?'
GROUP BY game_id HAVING COUNT(*) >= 5
)
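To check the behaviour on a small data set, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module against an in-memory database. The sample rows, the type value 'genre', and the threshold of 2 (instead of 5) are assumptions chosen to keep the example short; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE games (id INTEGER PRIMARY KEY);
CREATE TABLE game_category (id INTEGER PRIMARY KEY, game_id INTEGER, category_id INTEGER);
INSERT INTO categories (id, type) VALUES (1, 'genre'), (2, 'genre'), (3, 'platform');
INSERT INTO games (id) VALUES (1), (2);
-- game 1 has two 'genre' categories; game 2 has one 'genre' and one 'platform'
INSERT INTO game_category (game_id, category_id) VALUES (1, 1), (1, 2), (2, 1), (2, 3);
""")

MIN_CATEGORIES = 2  # assumed threshold for this small sample

rows = conn.execute("""
SELECT * FROM games WHERE id IN (
  SELECT game_id FROM game_category
  INNER JOIN categories ON categories.id = game_category.category_id
  WHERE categories.type = ?
  GROUP BY game_id HAVING COUNT(*) >= ?
) ORDER BY id
""", ("genre", MIN_CATEGORIES)).fetchall()

matching_game_ids = [row[0] for row in rows]
print(matching_game_ids)
```

Game 2 has two categories in total but only one of type 'genre', so only game 1 passes the filter.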

Considering query response time, you can avoid the IN clause. If you only need the game ids, Mitchel's answer can be written as follows:
SELECT game_id
FROM game_category gc
inner join categories c on c.id = gc.category_id
WHERE type = 'X'
GROUP BY game_id
HAVING COUNT(game_id) >= 5
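Notice that I used COUNT(game_id) instead of COUNT(*). Both give the same count for any non-NULL game_id; in PostgreSQL, COUNT(*) is generally not slower than COUNT(column), so treat this as a style choice rather than an optimization.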

Related

How to use INTERSECT together with COUNT in SQLite?

I have a table called customer_transactions and a table called blacklist.
The customer_transactions table has a column called atm_name.
Both tables share a unique key called id.
How can I intersect the two tables in such a way that the query shows me
customers that appear on both tables.
a corresponding column that displays the times that they had used a
certain atm alongside the atm's name
(for instance: id_1 -- bank of america -- 2; id_1 -- citibank -- 3;
id_2 -- bank of america -- 1; id_2 -- citibank -- 4, etcetera).
I have something like this
SELECT id,
atm_name,
count(atm_name) as atm_count
FROM customer_transactions
GROUP BY id, atm_name
How can I INTERSECT this table with the blacklist table and maintain what I currently have as output?
Thanks in advance.
You seem to want a join. Assuming that column id relates the two tables, and that it is a unique key in blacklist, you can do:
select ct.id, ct.atm_name, count(*) as atm_count
from customer_transactions ct
inner join blacklist b on b.id = ct.id
group by ct.id, ct.atm_name
You can also express this logic with exists and a correlated subquery:
select ct.id, ct.atm_name, count(*) as atm_count
from customer_transactions ct
where exists (select 1 from blacklist b where b.id = ct.id)
group by ct.id, ct.atm_name
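As a quick check that the two forms above agree, here is a small sketch using Python's sqlite3 module. The sample rows are made up, and blacklist.id is declared as a primary key to match the uniqueness assumption stated in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_transactions (id TEXT, atm_name TEXT);
CREATE TABLE blacklist (id TEXT PRIMARY KEY);
INSERT INTO customer_transactions VALUES
  ('id_1', 'bank of america'), ('id_1', 'bank of america'),
  ('id_1', 'citibank'),
  ('id_2', 'citibank'),
  ('id_3', 'citibank');
-- id_3 is not blacklisted and must not appear in the output
INSERT INTO blacklist VALUES ('id_1'), ('id_2');
""")

join_rows = conn.execute("""
SELECT ct.id, ct.atm_name, COUNT(*) AS atm_count
FROM customer_transactions ct
INNER JOIN blacklist b ON b.id = ct.id
GROUP BY ct.id, ct.atm_name
ORDER BY ct.id, ct.atm_name
""").fetchall()

exists_rows = conn.execute("""
SELECT ct.id, ct.atm_name, COUNT(*) AS atm_count
FROM customer_transactions ct
WHERE EXISTS (SELECT 1 FROM blacklist b WHERE b.id = ct.id)
GROUP BY ct.id, ct.atm_name
ORDER BY ct.id, ct.atm_name
""").fetchall()

print(join_rows)
```

If blacklist.id were not unique, the join version would inflate the counts while the EXISTS version would not, which is the main reason to prefer EXISTS when uniqueness is not guaranteed.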

INNER JOIN of pageviews, contacts and companies - duplicated entries

In short: 3 table inner join duplicates records
I have data in BigQuery in 3 tables:
Pageviews with columns:
timestamp
user_id
title
path
Contacts with columns:
website_user_id
email
company_id
Companies with columns:
id
name
I want to display all recorded pageviews and, if user and/or company is known, display this data next to pageview.
First, I join contact and pageviews data (SQL is generated by Metabase business intelligence tool):
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
ORDER BY `timestamp` DESC
It works as expected and I can see pageviews attributed to known contacts.
Next, I'd like to show pageviews of contacts with known company and which company is this:
SELECT
`analytics.pageviews`.`timestamp` AS `timestamp`,
`analytics.pageviews`.`title` AS `title`,
`analytics.pageviews`.`path` AS `path`,
`Contacts`.`email` AS `email`,
`Companies`.`name` AS `name`
FROM `analytics.pageviews`
INNER JOIN `analytics.contacts` `Contacts` ON `analytics.pageviews`.`user_id` = `Contacts`.`website_user_id`
INNER JOIN `analytics.companies` `Companies` ON `Contacts`.`company_id` = `Companies`.`id`
ORDER BY `timestamp` DESC
With this query I would expect to see only pageviews where associated contact AND company are known (just another column for company name). The problem is, I get duplicate rows for every pageview (sometimes 5, sometimes 20 identical rows).
I want to avoid selecting DISTINCT timestamps because it can lead to excluding valid pageviews from different users but with identical timestamp.
How to approach this?
Your description sounds like you have duplicates in companies. This is easy to test for:
select c.id, count(*)
from `analytics.companies` c
group by c.id
having count(*) >= 2;
You can get the details using window functions:
select c.*
from (select c.*, count(*) over (partition by c.id) as cnt
from `analytics.companies` c
) c
where cnt >= 2
order by cnt desc, id;
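To illustrate why duplicate company ids multiply the pageview rows, here is a small sketch with Python's sqlite3 module instead of BigQuery. The table contents are invented, and the tables are trimmed to the columns needed for the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pageviews (user_id INTEGER, path TEXT);
CREATE TABLE contacts (website_user_id INTEGER, email TEXT, company_id INTEGER);
CREATE TABLE companies (id INTEGER, name TEXT);  -- no uniqueness constraint on id
INSERT INTO pageviews VALUES (1, '/pricing');
INSERT INTO contacts VALUES (1, 'a@example.com', 10);
-- the same company was loaded three times
INSERT INTO companies VALUES (10, 'Acme'), (10, 'Acme'), (10, 'Acme');
""")

joined = conn.execute("""
SELECT p.path, ct.email, co.name
FROM pageviews p
INNER JOIN contacts ct ON p.user_id = ct.website_user_id
INNER JOIN companies co ON ct.company_id = co.id
""").fetchall()

duplicate_ids = conn.execute("""
SELECT id, COUNT(*) FROM companies GROUP BY id HAVING COUNT(*) >= 2
""").fetchall()

print(len(joined), duplicate_ids)
```

One pageview turns into three identical rows because the join matches every copy of company 10; the GROUP BY / HAVING check pinpoints the offending id.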

Subquery and normal query come out with different results

I'm a beginner with Oracle. I'm working on a question using both a subquery (without JOIN) and a normal query (with JOIN), but the two queries return different results.
I can't figure out why. Does anyone know?
The question asks to list the details of dog owners who have booked at least twice on this platform.
SELECT PET_OWNER.Owner_id,Oname,OAdd,COUNT(*) AS BOOKING
FROM PET_OWNER
WHERE Owner_id IN(
SELECT Owner_id
FROM PET
WHERE PType = 'DOG' AND Pet_id IN(SELECT Pet_id FROM BOOKING))
GROUP BY PET_OWNER.Owner_id,Oname,OAdd
HAVING COUNT(*) >=2
ORDER BY PET_OWNER.Owner_id;
This subquery version returns no rows.
SELECT PET_OWNER.Owner_id,Oname,OAdd,COUNT(*) AS BOOKING
FROM PET_OWNER,PET,BOOKING
WHERE PET_OWNER.Owner_id = PET.Owner_id AND
PET.Pet_id = BOOKING.Pet_id AND
PType = 'DOG'
GROUP BY PET_OWNER.Owner_id,Oname,OAdd
HAVING COUNT(*) >=2
ORDER BY PET_OWNER.Owner_id;
This query returns 10 records, which is the correct answer for this question.
I expected the two queries to return the same result, but they don't.
Does anyone know what is wrong?
Can anyone show me how to write this with a subquery?
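Because a duplicated join key causes duplication in the result.
In your case, Owner_id is presumably non-unique in the PET table, and each pet can have several rows in BOOKING, so the join version produces one row per booking before grouping. The IN version reads only PET_OWNER, so every group contains exactly one row and HAVING COUNT(*) >= 2 filters everything out.
It is still possible to get the correct answer by using a join. Since Owner_id in the subquery t is unique, the execution plan should be the same as for the subquery version.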
select p.* from Pet_Owner p
join (
select PET.Owner_id
from PET
inner join Booking on Booking.Pet_id = PET.Pet_id
where pType = 'DOG'
group by PET.Owner_id
having count(1) >= 2) t
on t.Owner_id = p.Owner_id
order by p.Owner_id
By the way, your SQL uses the old-school ANSI-89 comma join syntax, while the explicit JOIN syntax has been available since ANSI-92. I know many teachers still love the old style; I hope you can read both, but write only the ANSI-92 style.
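To make the difference between the IN version and the join version concrete, here is a small sketch run with Python's sqlite3 module rather than Oracle. The sample rows and the Booking_id column are assumptions; the other names follow the question, with the address column left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PET_OWNER (Owner_id INTEGER PRIMARY KEY, Oname TEXT);
CREATE TABLE PET (Pet_id INTEGER PRIMARY KEY, Owner_id INTEGER, PType TEXT);
CREATE TABLE BOOKING (Booking_id INTEGER PRIMARY KEY, Pet_id INTEGER);
INSERT INTO PET_OWNER VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO PET VALUES (100, 1, 'DOG'), (200, 2, 'DOG');
-- Ann's dog has two bookings, Bob's dog has one
INSERT INTO BOOKING (Pet_id) VALUES (100), (100), (200);
""")

# IN version: only PET_OWNER rows are counted, so every group has one row.
in_counts = conn.execute("""
SELECT Owner_id, COUNT(*) FROM PET_OWNER
WHERE Owner_id IN (
  SELECT Owner_id FROM PET
  WHERE PType = 'DOG' AND Pet_id IN (SELECT Pet_id FROM BOOKING))
GROUP BY Owner_id ORDER BY Owner_id
""").fetchall()

# Join version: one row per booking, so the count reflects bookings.
join_counts = conn.execute("""
SELECT o.Owner_id, COUNT(*) FROM PET_OWNER o
JOIN PET p ON p.Owner_id = o.Owner_id
JOIN BOOKING b ON b.Pet_id = p.Pet_id
WHERE p.PType = 'DOG'
GROUP BY o.Owner_id ORDER BY o.Owner_id
""").fetchall()

print(in_counts, join_counts)
```

With HAVING COUNT(*) >= 2 added, the IN version would return nothing, while the join version would return owner 1.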
What happens is that grouping by PET_OWNER.Owner_id, Oname, OAdd gives you one row per distinct owner, so the count is always 1. What we need is to group by Owner_id in a subquery first.
Here's your query. First get the Owner_id values with count() >= 2 in a subquery:
select * from Pet_Owner where Owner_id in (
select t1.Owner_id from PET_OWNER t1
inner join PET t2 on t1.Owner_id = t2.Owner_id
inner join Booking t3 on t3.Pet_id = t2.Pet_id
where pType = 'DOG'
group by t1.Owner_id
having count(1) >= 2)
order by Owner_id
Without joins, nested subqueries are the only option. The count has to be taken over Booking rows, because Owner_id is unique in Pet_Owner:
select * from Pet_Owner o where (
select count(*) from Booking where Pet_id in
(select Pet_id from Pet where Owner_id = o.Owner_id and PType='DOG')
) >= 2
order by Owner_id
If you are trying to count the number of dogs per owner:
select * from Pet_Owner where Owner_id in (
select Owner_id from Pet where Pet_id in
(select Pet_id from Booking) and PType='DOG'
group by owner_id
having count(1) >= 2
) order by Owner_id

Union Three or more tables with conditions

I need help solving a problem.
I have a table levelAsignment with columns level_id, store_id and user_id. For each user_id I can write a query to get their level_ids and store_ids.
I also have a table stores.
For each store, I need to get its levels and count the users for each level and store.
That would be easy, but the problem is how the data is stored: in the levelAsignment table, a user can be assigned to all stores for some operator level.
It looks like this:
level_id | store_Id | user_id
4 1 5
1 5 5
6 1
A store_id of 1 refers to a row in the stores table that means all stores, so I need to show all stores except 1.
select * from stores where id != 1;
So I need advice on how to organize that.
I found different ways to solve the problem, but they involved many unions and conditions.
This depends on how you are able to join the stores table.
I think you should join the level_assignment rows where store_id = 1 with all rows in the stores table, excluding the store_id = 1 placeholder itself from the result. You may have to create a join column in temporary tables for the stores data. Then union the result with the level_assignment rows where store_id != 1.
Example:
WITH get_all_stores_for_store_id_1 AS (
SELECT
a.level_id,
b.id AS all_store_id,
a.user_id
FROM level_assignment a
LEFT JOIN stores b ON a.join_column = b.join_column
WHERE a.store_id = 1)
SELECT
level_id,
all_store_id AS store_id,
user_id
FROM get_all_stores_for_store_id_1
UNION
SELECT
level_id,
store_id,
user_id
FROM level_assignment
WHERE store_id != 1
Does that make sense?
Thinking about how to join the data, we could do something like this:
Take the stores table and add a column that contains 1 in every row, so that all stores can be joined to the level_assignment rows with store_id = 1:
WITH set_1_column_in_stores_table AS (
SELECT
1 AS join_id,
id AS store_id
FROM stores
-- Skip the "all stores" placeholder row itself
WHERE id != 1),
all_store_rows_get_all_stores AS (
SELECT
a.level_id,
b.store_id,
a.user_id
FROM level_assignment a
LEFT JOIN set_1_column_in_stores_table b ON a.store_id = b.join_id
-- The above will join all stores where store_id = 1 in level_assignment
WHERE a.store_id = 1)
SELECT
level_id,
store_id,
user_id
FROM all_store_rows_get_all_stores
UNION
SELECT
level_id,
store_id,
user_id
FROM level_assignment
WHERE store_id != 1
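The same expansion can be tried out on a toy data set. The sketch below uses Python's sqlite3 module and, as a simplification, joins on the condition s.id != 1 instead of building a constant join column; that swap is my own choice, not part of the answer above. The sample rows are invented, and the column names level_id, store_id, user_id and stores.id are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (id INTEGER PRIMARY KEY);
CREATE TABLE level_assignment (level_id INTEGER, store_id INTEGER, user_id INTEGER);
-- id 1 is the "all stores" placeholder
INSERT INTO stores VALUES (1), (2), (3);
-- user 5 has level 4 in all stores; user 6 has level 1 in store 2 only
INSERT INTO level_assignment VALUES (4, 1, 5), (1, 2, 6);
""")

rows = conn.execute("""
SELECT a.level_id, s.id AS store_id, a.user_id
FROM level_assignment a
JOIN stores s ON s.id != 1          -- expand to every real store
WHERE a.store_id = 1
UNION
SELECT level_id, store_id, user_id
FROM level_assignment
WHERE store_id != 1
ORDER BY level_id, store_id
""").fetchall()

print(rows)
```

The "all stores" assignment for user 5 is expanded into one row per real store, after which counting users per store and level is a plain GROUP BY over this result.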

Getting SQL records sharing same keywords

I have table for article keywords:
id INT
keyword VARCHAR
And I have an article id, let's say 13. This article has 4 keywords in this table.
I'm trying to get other articles where they share 2 or more keywords.
I can get a list of articles having same keywords with my original article with this query:
SELECT id FROM table WHERE keyword IN (SELECT keyword FROM table WHERE id=13)
But this only gives me a list of all articles sharing at least one keyword... But I need articles sharing 2 or more keywords, preferably ordered descending by the most occurrences...
How do I achieve this?
DECLARE @original_id int = 13
SELECT
id,
COUNT(*) c
FROM keywords k1
INNER JOIN (
SELECT keyword
FROM keywords
WHERE id = @original_id
) k2 ON (k1.keyword = k2.keyword)
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY c DESC, id
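For a runnable variant of the self-join above, here is a sketch using Python's sqlite3 module with a bound parameter in place of the variable. The sample keywords are invented, and the extra condition that excludes article 13 itself from the result is my own addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (id INTEGER, keyword TEXT);
INSERT INTO keywords VALUES
  (13, 'sql'), (13, 'join'), (13, 'count'), (13, 'having'),
  (20, 'sql'), (20, 'join'), (20, 'count'),
  (21, 'sql'), (21, 'having'),
  (22, 'sql');
""")

ORIGINAL_ID = 13

rows = conn.execute("""
SELECT k1.id, COUNT(*) AS shared
FROM keywords k1
INNER JOIN keywords k2 ON k1.keyword = k2.keyword
WHERE k2.id = ? AND k1.id != ?      -- skip the original article itself
GROUP BY k1.id
HAVING COUNT(*) >= 2
ORDER BY shared DESC, k1.id
""", (ORIGINAL_ID, ORIGINAL_ID)).fetchall()

print(rows)
```

Article 22 shares only one keyword with article 13 and is filtered out by the HAVING clause. This assumes each (id, keyword) pair appears at most once; duplicates would inflate the counts.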
SELECT id
, Count(*) As number_of_keywords
FROM articles
INNER
JOIN keywords
ON keywords.keyword = articles.keyword
GROUP
BY id
HAVING Count(*) >= 2