Counting different occurrences - sql

I am studying SQL using PostgreSQL and I have a question about counting the number of different values of one column with respect to another.
I suspect this is not the typical COUNT and GROUP BY problem, because I cannot find any help or reference for it, so I will explain what I would like to do (if possible) with a short example.
Suppose I have the following table:
CREATE TABLE MYTABLE
(
id INTEGER NOT NULL,
genre VARCHAR(20) NOT NULL,
country VARCHAR(20) NOT NULL,
CONSTRAINT PK_MOVIE PRIMARY KEY (id)
);
INSERT INTO MYTABLE VALUES (1, 'Horror', 'EEUU');
INSERT INTO MYTABLE VALUES (2, 'Drama', 'EEEU');
INSERT INTO MYTABLE VALUES (3, 'Drama', 'Italy');
INSERT INTO MYTABLE VALUES (4, 'Horror', 'UK');
INSERT INTO MYTABLE VALUES (5, 'Drama', 'EEEU');
INSERT INTO MYTABLE VALUES (6, 'Drama', 'EEEU');
So MYTABLE looks like this:
id | genre | country
----+--------+---------
1 | Horror | EEUU
2 | Drama | EEEU
3 | Drama | Italy
4 | Horror | UK
5 | Drama | EEEU
6 | Drama | EEEU
I can now count how many times the value of country is repeated for each value of genre with the following query:
select distinct count(*), m.genre, m.country
FROM MYTABLE m
GROUP BY m.genre, m.country;
which returns:
count | genre | country
-------+--------+---------
3 | Drama | EEEU
1 | Horror | EEUU
1 | Horror | UK
1 | Drama | Italy
(4 rows)
But how can I obtain the number of different country values for each genre? In other words, I would like to obtain a table like this:
genre | different_countries
--------+------------------
Horror | 2
Drama | 2
Does such a query exist?

You want count(distinct):
select m.genre, count(distinct m.country)
from MYTABLE m
group by m.genre;
As for your query, you almost never need SELECT DISTINCT together with GROUP BY, and you don't need it here: GROUP BY already returns one row per distinct combination of the GROUP BY keys.
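A self-contained sketch of the COUNT(DISTINCT ...) answer using the question's own sample data. The mixed 'EEUU'/'EEEU' spellings are kept exactly as in the question, so they count as different countries. The ORDER BY is an addition only to make the output order predictable; the script should run on PostgreSQL and on SQLite.

```sql
-- Sample data from the question; 'EEUU' and 'EEEU' are kept as-is and count as different values.
CREATE TABLE MYTABLE
(
    id      INTEGER     NOT NULL,
    genre   VARCHAR(20) NOT NULL,
    country VARCHAR(20) NOT NULL,
    CONSTRAINT PK_MOVIE PRIMARY KEY (id)
);

INSERT INTO MYTABLE VALUES
    (1, 'Horror', 'EEUU'),
    (2, 'Drama',  'EEEU'),
    (3, 'Drama',  'Italy'),
    (4, 'Horror', 'UK'),
    (5, 'Drama',  'EEEU'),
    (6, 'Drama',  'EEEU');

-- One output row per genre; COUNT(DISTINCT ...) counts each country once per group.
SELECT m.genre,
       COUNT(DISTINCT m.country) AS different_countries
FROM MYTABLE m
GROUP BY m.genre
ORDER BY m.genre;
-- Drama  | 2   (EEEU, Italy)
-- Horror | 2   (EEUU, UK)
```

Without DISTINCT inside COUNT, Drama would report 4 (one per row) instead of 2.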

You may also use a subquery:
select count(1), t1.genre
from (select distinct country, genre
      from MYTABLE) as t1
group by t1.genre;

Related

ORDER BY value in join table not grouped before aggregation

I am trying to order a Postgres result set based on an array_agg aggregate.
I have the following query that works great:
select a.id, a.name, array_agg(f.name)
from actors a
join actor_films af on a.id = actor_id
join films f on film_id = f.id
group by a.id
order by a.id;
This gives me the following results, for example:
id | name | array_agg
----+--------+---------------------------------
1 | bob | {"delta force"}
2 | joe | {"delta force","the funny one"}
3 | fred | {"bad movie",AARRR}
4 | sally | {"the funny one"}
5 | suzzy | {"bad movie","delta force"}
6 | jill | {AARRR}
7 | victor | {"the funny one"}
I want the results sorted alphabetically by film name. For example, the final order should be:
id | name | array_agg
----+--------+---------------------------------
3 | fred | {"bad movie",AARRR}
6 | jill | {AARRR}
5 | suzzy | {"bad movie","delta force"}
1 | bob | {"delta force"}
2 | joe | {"delta force","the funny one"}
4 | sally | {"the funny one"}
7 | victor | {"the funny one"}
This order is based on the alphabetical order of the movies each actor is in. When I add ORDER BY f.name, I get the following error:
ERROR: column "f.name" must appear in the GROUP BY clause or be used in an aggregate function
I cannot add it to the GROUP BY, because I need it aggregated in the array; I want to sort before aggregation so that I get the order shown above. Is this possible?
If you would like to reproduce this example, here is the setup code:
create table actors(id serial primary key, name text);
create table films(id serial primary key, name text);
create table actor_films(actor_id int references actors (id), film_id int references films (id));
insert into actors (name) values('bob'), ('joe'), ('fred'), ('sally'), ('suzzy'), ('jill'), ('victor');
insert into films (name) values('AARRR'), ('the funny one'), ('bad movie'), ('delta force');
insert into actor_films(actor_id, film_id) values (2, 2), (7, 2), (4,2), (2, 4), (1, 4), (5, 4), (6, 1), (3, 1), (3, 3), (5, 3);
And the final query with the error:
select a.id, a.name, array_agg(f.name)
from actors a
join actor_films af on a.id = actor_id
join films f on film_id = f.id
group by a.id
order by f.name, a.id;
You can order by an aggregate function:
order by min(f.name), a.id
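A self-contained sketch of the MIN() approach with the question's data. Two assumptions differ from the question's setup: the ids are inserted explicitly instead of relying on serial, so the same script also runs on SQLite, and the array_agg column is replaced by MIN(f.name) because array_agg is PostgreSQL-specific. In PostgreSQL you can put array_agg(f.name ORDER BY f.name) back into the select list if you also want the names inside each array sorted.

```sql
-- Setup from the question. Assumptions: explicit ids instead of serial (so SQLite works too),
-- and actor_films references films rather than film.
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE films  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actor_films (
    actor_id INT REFERENCES actors (id),
    film_id  INT REFERENCES films (id)
);
INSERT INTO actors (id, name) VALUES
    (1, 'bob'), (2, 'joe'), (3, 'fred'), (4, 'sally'),
    (5, 'suzzy'), (6, 'jill'), (7, 'victor');
INSERT INTO films (id, name) VALUES
    (1, 'AARRR'), (2, 'the funny one'), (3, 'bad movie'), (4, 'delta force');
INSERT INTO actor_films (actor_id, film_id) VALUES
    (2, 2), (7, 2), (4, 2), (2, 4), (1, 4), (5, 4), (6, 1), (3, 1), (3, 3), (5, 3);

-- Sort actors by the alphabetically first film they appear in; a.id breaks ties.
SELECT a.id,
       a.name,
       MIN(f.name) AS first_film
FROM actors a
JOIN actor_films af ON a.id = af.actor_id
JOIN films f        ON af.film_id = f.id
GROUP BY a.id, a.name
ORDER BY MIN(f.name), a.id;
-- id order: 3 (fred), 6 (jill), 5 (suzzy), 1 (bob), 2 (joe), 4 (sally), 7 (victor)
```

This reproduces the order the question asks for, because each actor is placed by the earliest film name in their own group.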

SQLite query - filter name where each associated id is contained within a set of ids

I'm trying to work out a query that finds all the distinct Names whose LocationIDs are in a given set of ids. The catch is that if any of the LocationIDs associated with a Name are not in the set, then that Name should not be in the results.
Say I have the following table:
ID | LocationID | ... | Name
-----------------------------
1 | 1 | ... | A
2 | 1 | ... | B
3 | 2 | ... | B
I need a query similar to
SELECT DISTINCT Name FROM table WHERE LocationID IN (1, 2);
The problem with the query above is that it only checks whether LocationID is 1 OR 2, so it returns:
A
B
But what I need it to return is
B
Since B is the only Name where both of its LocationIDs are in the set (1, 2)
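You can try writing two subqueries:
one that counts, for each Name, how many distinct LocationIDs of your set it has;
one that counts, for your condition, how many distinct LocationIDs of the set exist in the table;
then join them on the count, so that only Names whose count matches the full count of your condition are returned.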
Schema (SQLite v3.17)
CREATE TABLE T(
ID int,
LocationID int,
Name varchar(5)
);
INSERT INTO T VALUES (1, 1,'A');
INSERT INTO T VALUES (2, 1,'B');
INSERT INTO T VALUES (3, 2,'B');
Query #1
SELECT t2.Name
FROM
(
SELECT COUNT(DISTINCT LocationID) cnt
FROM T
WHERE LocationID IN (1, 2)
) t1
JOIN
(
SELECT COUNT(DISTINCT LocationID) cnt,Name
FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
) t2 on t1.cnt = t2.cnt;
| Name |
| ---- |
| B |
You can just use aggregation. Assuming no duplicates in your table:
SELECT Name
FROM table
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(*) = 2;
If Name/LocationID pairs can be duplicated, use HAVING COUNT(DISTINCT LocationID) = 2.
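A runnable sketch of the HAVING approach on the question's schema. The extra name C (LocationIDs 1, 2 and 3) is hypothetical test data, added only to show how the two queries differ: the first requires every location of the set to be present, while the second, a sketch of one way to also honor the question's rule that a Name with any location outside the set must be excluded, additionally rejects names with another LocationID.

```sql
CREATE TABLE T (ID INT, LocationID INT, Name VARCHAR(5));
INSERT INTO T VALUES (1, 1, 'A');
INSERT INTO T VALUES (2, 1, 'B');
INSERT INTO T VALUES (3, 2, 'B');
-- Hypothetical extra rows: C has both 1 and 2, but also location 3 outside the set.
INSERT INTO T VALUES (4, 1, 'C');
INSERT INTO T VALUES (5, 2, 'C');
INSERT INTO T VALUES (6, 3, 'C');

-- Answer as given (DISTINCT variant): names having every location of the set -> B, C
SELECT Name
FROM T
WHERE LocationID IN (1, 2)
GROUP BY Name
HAVING COUNT(DISTINCT LocationID) = 2;

-- Stricter sketch: all set locations present AND no row outside the set -> B
SELECT Name
FROM T
GROUP BY Name
HAVING COUNT(DISTINCT CASE WHEN LocationID IN (1, 2) THEN LocationID END) = 2
   AND SUM(CASE WHEN LocationID IN (1, 2) THEN 0 ELSE 1 END) = 0;
```

The second query drops the WHERE clause on purpose: filtering first would hide the out-of-set rows that the SUM needs to see.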

PostgreSQL 3 Table Join Multiplying

I have 3 tables. The first has the records I want. The other two have categories to be applied to the first table. If the lookup value from table3 is found in the description, I want to return that category; otherwise, return the category from table2. I think my logic is correct, but the results are being multiplied. How can I limit the results to just the table1 records I want, but still apply the correct category?
Here is my query with an example schema. It should only return the first 6 rows in table1 with whichever category is correct, but it returns 10. http://sqlfiddle.com/#!15/fc6fa/49/0
SELECT table1.product_code, table1.date_signed, table1.description,
CASE
WHEN lower(table1.description) LIKE ('%' || lower(table3.lookup_value) || '%')
THEN table3.category
ELSE table2.category
END
FROM table1
LEFT JOIN table2 ON table2.psc_code = table1.product_code
LEFT JOIN table3 ON table3.psc_code = table1.product_code
WHERE date_signed = '2017-02-01';
create table table1 (
product_code int,
date_signed timestamp,
description varchar(20)
);
insert into table1
(product_code, date_signed, description)
values
(1, '2017-02-01', 'i have a RED car'),
(2, '2017-02-01', 'i have a blue boat'),
(3, '2017-02-01', 'i have a dark cat'),
(1, '2017-02-01', 'i have a green truck'),
(2, '2017-02-01', 'i have a blue rug'),
(3, '2017-02-01', 'i have a dark dog'),
(1, '2017-02-02', 'i REd NO SHOW'),
(2, '2017-02-02', 'i blue NO SHOW'),
(3, '2017-02-02', 'i dark NO SHOW');
create table table2 (
psc_code int,
category varchar(20)
);
insert into table2
(psc_code, category)
values
(1, 'vehicle'),
(2, 'vehicle');
create table table3 (
psc_code int,
lookup_value varchar(20),
category varchar(20)
);
insert into table3
(psc_code, lookup_value, category)
values
(1, 'fox', 'animal'),
(1, 'red', 'color'),
(1, 'box', 'shipping'),
(2, 'cat', 'animal');
You're joining one-to-many, but you only want one value per table1 row.
SELECT table1.product_code, table1.date_signed, table1.description,
       CASE
         WHEN EXISTS (select 1 from table3
                      where table3.psc_code = table1.product_code
                        and lower(table1.description) LIKE ('%' || lower(table3.lookup_value) || '%'))
         THEN (select table3.category from table3
               where table3.psc_code = table1.product_code
                 and lower(table1.description) LIKE ('%' || lower(table3.lookup_value) || '%')
               limit 1)
         ELSE (select table2.category
               from table2
               where table2.psc_code = table1.product_code
               limit 1)
       END AS category
FROM table1
WHERE date_signed = '2017-02-01';
http://rextester.com/TQIY93378
+--------------+---------------------+----------------------+----------+
| product_code | date_signed | description | category |
+--------------+---------------------+----------------------+----------+
| 1 | 01.02.2017 00:00:00 | i have a RED car | color |
| 2 | 01.02.2017 00:00:00 | i have a blue boat | vehicle |
| 3 | 01.02.2017 00:00:00 | i have a dark cat | NULL |
| 1 | 01.02.2017 00:00:00 | i have a green truck | vehicle |
| 2 | 01.02.2017 00:00:00 | i have a blue rug | vehicle |
| 3 | 01.02.2017 00:00:00 | i have a dark dog | NULL |
+--------------+---------------------+----------------------+----------+
Yes, you will get a Cartesian product with this one.
Your problem is that table1 has multiple rows for each product_code, and table3 has several rows for product_code 1. So when you join table3, the two table1 rows with product_code 1 each match three table3 rows, and you get six rows with product_code 1. The other joins never produce multiple matches on both sides, which is how you end up with six product_code 1 rows, two product_code 2 rows and two product_code 3 rows.
The solution is to join only in ways where the foreign key targets a unique row in the target table.
This ought to be a useful example of why normalization and key awareness matter: where you break the basic rules of functional dependencies, problems tend to multiply.
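A self-contained sketch with the question's data. The first query counts the rows the original LEFT JOINs produce per product_code, which confirms the 6/2/2 multiplication explained above. The second is a hedged variant of the first answer that swaps the CASE/EXISTS pair for COALESCE over two scalar subqueries; with this data it gives the same result, but unlike the original it would also fall back to table2 if a matching table3 row had a NULL category. The ORDER BY inside the first subquery is an added assumption to make the pick deterministic when several lookup values match.

```sql
-- Setup from the question.
CREATE TABLE table1 (product_code INT, date_signed TIMESTAMP, description VARCHAR(20));
INSERT INTO table1 (product_code, date_signed, description) VALUES
    (1, '2017-02-01', 'i have a RED car'),
    (2, '2017-02-01', 'i have a blue boat'),
    (3, '2017-02-01', 'i have a dark cat'),
    (1, '2017-02-01', 'i have a green truck'),
    (2, '2017-02-01', 'i have a blue rug'),
    (3, '2017-02-01', 'i have a dark dog'),
    (1, '2017-02-02', 'i REd NO SHOW'),
    (2, '2017-02-02', 'i blue NO SHOW'),
    (3, '2017-02-02', 'i dark NO SHOW');
CREATE TABLE table2 (psc_code INT, category VARCHAR(20));
INSERT INTO table2 (psc_code, category) VALUES (1, 'vehicle'), (2, 'vehicle');
CREATE TABLE table3 (psc_code INT, lookup_value VARCHAR(20), category VARCHAR(20));
INSERT INTO table3 (psc_code, lookup_value, category) VALUES
    (1, 'fox', 'animal'), (1, 'red', 'color'), (1, 'box', 'shipping'), (2, 'cat', 'animal');

-- 1) Rows produced by the original LEFT JOINs, per product_code.
SELECT t1.product_code, COUNT(*) AS joined_rows
FROM table1 t1
LEFT JOIN table2 ON table2.psc_code = t1.product_code
LEFT JOIN table3 ON table3.psc_code = t1.product_code
WHERE t1.date_signed = '2017-02-01'
GROUP BY t1.product_code
ORDER BY t1.product_code;
-- 1 | 6   (2 table1 rows x 3 table3 rows)
-- 2 | 2
-- 3 | 2   (no match in table2/table3; LEFT JOIN keeps each row once)

-- 2) One row per table1 row: prefer a matching table3 category, else table2's.
SELECT t1.product_code, t1.date_signed, t1.description,
       COALESCE(
           (SELECT t3.category
            FROM table3 t3
            WHERE t3.psc_code = t1.product_code
              AND LOWER(t1.description) LIKE ('%' || LOWER(t3.lookup_value) || '%')
            ORDER BY t3.lookup_value   -- assumption: deterministic pick if several match
            LIMIT 1),
           (SELECT t2.category
            FROM table2 t2
            WHERE t2.psc_code = t1.product_code
            LIMIT 1)
       ) AS category
FROM table1 t1
WHERE t1.date_signed = '2017-02-01';
```

Scalar subqueries never add rows to the outer query, which is why the second query returns exactly the six 2017-02-01 rows.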

How to select all attributes (*) with distinct values in a particular column(s)?

Here is a link to the W3Schools database for learners:
W3Schools Database
If we execute the following query:
SELECT DISTINCT city FROM Customers
it returns a list of the distinct City values in the table.
What should we do if we want whole rows, as returned by SELECT * FROM Customers, but with a unique City value in each row?
When DISTINCT is used with multiple columns, it is applied to all the columns together. So the combination of values of all columns is considered, not just one column.
If you want to have distinct values, then concatenate all the columns, which will make it distinct.
Or, you could group the rows using GROUP BY.
You need to select all values from the customers table where city is unique. So, logically, I came up with this query:
SELECT * FROM `customers` WHERE `city` in (SELECT DISTINCT `city` FROM `customers`)
I think you want something like this:
(replace PK with your Customers table's primary key, or another unique column such as Id)
In SQL Server (and standard SQL)
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY City ORDER BY PK) AS rn
    FROM Customers
) Dt
WHERE rn = 1
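In MySQL (useful before 8.0, which added window functions):
SELECT *
FROM (
    SELECT a.City, a.PK, COUNT(*) AS rn
    FROM Customers a
    JOIN Customers b ON a.City = b.City AND a.PK >= b.PK
    GROUP BY a.City, a.PK
) AS DT
WHERE rn = 1
These queries should, I hope, return each City once. The SQL Server version also returns the other columns; the MySQL version returns only City, PK and rn, so join DT back to Customers on PK to get the rest.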
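A runnable sketch of the ROW_NUMBER() query from the answer above. The three sample rows are borrowed from the contact table in the next answer, and the column names (customer_id standing in for PK, customer_name, city) are assumptions, not the real W3Schools schema. It needs window-function support (PostgreSQL, SQL Server, MySQL 8.0+, SQLite 3.25+).

```sql
-- Hypothetical stand-in for the Customers table; customer_id plays the role of PK.
CREATE TABLE Customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(50),
    city          VARCHAR(50)
);
INSERT INTO Customers VALUES
    (1, 'ABC', 'Chennai'),
    (2, 'PQR', 'Chennai'),
    (3, 'XYZ', 'Mumbai');

-- Number the rows inside each city by id and keep the first one,
-- so every column of that row is returned.
SELECT customer_id, customer_name, city
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY c.city ORDER BY c.customer_id) AS rn
    FROM Customers c
) dt
WHERE rn = 1
ORDER BY customer_id;
-- 1 | ABC | Chennai
-- 3 | XYZ | Mumbai
```

Changing the ORDER BY inside OVER (...) decides which row represents each city, for example the newest instead of the oldest.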
You can use a GROUP BY clause to get distinct values in a particular column. Consider the following table, 'contact':
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 2 | PQR | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
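To select all columns with distinct values in the City attribute, use the following query. Note that it only works where the database accepts non-aggregated columns that are not in the GROUP BY (MySQL with ONLY_FULL_GROUP_BY disabled, or SQLite); PostgreSQL and MySQL 5.7.5+ in its default mode reject it, and which id and name are returned for each city is not guaranteed: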
SELECT *
FROM contact
GROUP BY city;
This will give you the output as follows:
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
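Since the id and name chosen by SELECT * ... GROUP BY city are not guaranteed, a deterministic alternative (a different technique from this answer, shown as a sketch) is to keep the row with the smallest id per city through a subquery. It uses only a plain GROUP BY, so it should also work in MySQL with ONLY_FULL_GROUP_BY enabled, in versions without window functions, and in PostgreSQL.

```sql
CREATE TABLE contact (id INT PRIMARY KEY, name VARCHAR(20), city VARCHAR(20));
INSERT INTO contact VALUES
    (1, 'ABC', 'Chennai'),
    (2, 'PQR', 'Chennai'),
    (3, 'XYZ', 'Mumbai');

-- For each city pick the smallest id, then return the complete row for that id.
SELECT c.*
FROM contact c
WHERE c.id IN (SELECT MIN(id) FROM contact GROUP BY city)
ORDER BY c.id;
-- 1 | ABC | Chennai
-- 3 | XYZ | Mumbai
```

This assumes id is unique; with duplicate ids a city could return more than one row.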

Insert records in table based on 'master' record in same table

I have one table, e.g. pricerules, that stores special prices per article for customers. Now I want to sync the price rules based on another user. Suppose I have this dataset:
+---------+---------+-------+
| user_id | prod_id | price |
+---------+---------+-------+
|      10 |       1 |     1 |
|      10 |       2 |     5 |
|      10 |       3 |     7 |
|      20 |       2 |     5 |
|      30 |       2 |     5 |
|      30 |       3 |     7 |
+---------+---------+-------+
Now I would like to update/insert the prices for several other users based on the prices of user 10. I have already written the delete and update queries, but I'm stuck on the insert query that adds the rules the other users don't have yet.
So effectively this would do the following inserts:
INSERT INTO pricerules
(user_id, prod_id, price)
VALUES
(20, 1, 1),
(20, 3, 7),
(30, 1, 1);
Is there a way to do this in one query? I have looked at MINUS to select the records that are not present for user 20, but then I would have to run a query for each user.
I thought maybe I could use MERGE.
I am using Oracle 10.1.
You were right. Merge is the way to go. Please try the following.
merge into pricerules p
using (select t1.user_id, t2.prod_id, t2.price
       -- cross join: every user except 10 x every rule of user 10
       from (select distinct user_id
             from pricerules
             where user_id <> 10) t1,
            (select distinct prod_id, price
             from pricerules
             where user_id = 10) t2
      ) t
on (p.user_id = t.user_id
    and p.prod_id = t.prod_id
    and p.price = t.price)
when not matched then
  insert (user_id, prod_id, price) values (t.user_id, t.prod_id, t.price);
I haven't used Oracle in a long while, so my syntax may be slightly off, but the general idea is:
INSERT INTO pricerules
(user_id, prod_id, price)
select 20 as user_id, 1 as prod_id, 1 as price from dual
union all
select 20, 3, 7 from dual
union all
select 30, 1, 1 from dual
I typed this together quickly, so I'm not sure it's correct, but the idea is to select all product IDs of user 10 that the new user does not already have.
What's missing from your question is where the user_ids come from. You may want to join this with a user table so you can run it for all users.
-- &new_user_id is a SQL*Plus substitution variable; use &&new_user_id to be prompted only once
insert into pricerules
  (user_id, prod_id, price)
select &new_user_id
      ,prod_id
      ,price
from pricerules p
where user_id = 10
  and not exists (select 1
                  from pricerules p2
                  where p2.user_id = &new_user_id
                    and p2.prod_id = p.prod_id);
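To cover all users in one statement, the NOT EXISTS idea from the last answer can be combined with the cross join of users and user 10's rules used in the MERGE answer. This is a minimal sketch, assuming 10 is the master user and that a rule counts as missing when the (user_id, prod_id) pair is absent; prices of existing rules are left to the separate update query. The setup uses multi-row VALUES, which Oracle 10 does not accept, so it is meant for trying the statement in PostgreSQL or SQLite; the INSERT ... SELECT itself uses only standard constructs.

```sql
-- Sample data from the question (multi-row VALUES: PostgreSQL/SQLite syntax, not Oracle 10).
CREATE TABLE pricerules (user_id INT, prod_id INT, price INT);
INSERT INTO pricerules (user_id, prod_id, price) VALUES
    (10, 1, 1), (10, 2, 5), (10, 3, 7),
    (20, 2, 5),
    (30, 2, 5), (30, 3, 7);

-- Every other user x every rule of master user 10 (assumed master id),
-- minus the products that user already has a rule for.
INSERT INTO pricerules (user_id, prod_id, price)
SELECT u.user_id, m.prod_id, m.price
FROM (SELECT DISTINCT user_id FROM pricerules WHERE user_id <> 10) u
CROSS JOIN (SELECT prod_id, price FROM pricerules WHERE user_id = 10) m
WHERE NOT EXISTS (
    SELECT 1
    FROM pricerules p
    WHERE p.user_id = u.user_id
      AND p.prod_id = m.prod_id
);
-- Inserts (20, 1, 1), (20, 3, 7) and (30, 1, 1), matching the inserts listed in the question.
```

Users are taken from pricerules itself here; as noted above, joining a users table instead would also cover users with no rules yet.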