SQL: reduce duplication in a UNION clause

let's say for example I have the following query:
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;
As you can see, the WHERE Country='Germany' is repeated in both targets of the UNION. Is there any way to reduce this to a query without the repetition? I don't like my queries being too long.
I'm currently working on Oracle.

Why not include the WHERE only once, like
SELECT * FROM
(
  SELECT City, Country FROM Customers
  UNION
  SELECT City, Country FROM Suppliers
) tab
WHERE Country='Germany'
ORDER BY City;
Or do a JOIN, like
SELECT c.City AS CustomerCity, c.Country AS CustomerCountry,
       s.City AS SupplierCity, s.Country AS SupplierCountry
FROM Customers c
LEFT JOIN Suppliers s ON c.Country = s.Country
WHERE c.Country='Germany'
ORDER BY c.City;

select distinct city, country
from
(
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
) x
order by city
You can't really get around the need for a UNION if you really want both sets of rows. I've used a UNION ALL inside the subquery and a DISTINCT outside to remove the duplicates (assuming you want them removed) without an extra sort operation.

The code that you already have is pretty compact and clear while not sacrificing performance.
The suggestions to use a subquery can eliminate the duplicated WHERE clause, but they can require more I/O activity than the simple UNION you originally wrote.
With a UNION inside a subquery and a WHERE outside of it, you are asking the SQL engine to build an intermediate result containing all rows from the Customers table plus all rows from the Suppliers table, and then to filter that result, throwing out the rows where the country is not Germany. If your tables only have a couple of hundred rows and you are running the query locally, you will likely not see much of a performance difference; but if you have thousands of rows, or the tables live on different servers across the network, performance could be orders of magnitude slower.
If performance is a consideration, you can keep the original structure and still make the query a bit simpler and more maintainable by using a bind variable for the country, like so:
VAR country varchar2(64);
EXEC :country := 'Germany';
SELECT City, Country FROM Customers
WHERE Country = :country
UNION
SELECT City, Country FROM Suppliers
WHERE Country= :country
ORDER BY City;
This does not make for shorter code, but it is somewhat cleaner, easier to modify, and it still retrieves only the rows you are interested in, which gives better performance.

SELECT * FROM
(
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers
) t
WHERE t.Country='Germany'
ORDER BY t.City;

You could use a common table expression (CTE):
WITH CTE AS (
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers)
SELECT City, Country
FROM CTE
WHERE CTE.Country='Germany'
ORDER BY CTE.City;
I find it easier to read than nested sub-queries.

Agree with Patrick. Query performance is much more important than query length. All of the alternative solutions mentioned perform more slowly than your original one.

Related

GROUP BY clause: do I have to list all the columns that I use in SELECT?

Do I need to put all of the column names that I have in the SELECT into the GROUP BY as well?
For example, in this simple query:
Select
CustomerID,
CompanyName,
ContactName,
ContactTitle,
City,
Country
From
Customers
Group By
Country,
CompanyName,
ContactName,
ContactTitle,
City,
CustomerID
Do I always have to list the same columns in the GROUP BY as I use in the SELECT?
If you're just selecting columns and you want the returned records to discard exact duplicate rows, then there are two methods:
1) GROUP BY
2) DISTINCT
Your query doesn't use any aggregate functions such as COUNT, MIN, MAX, or SUM, so it could use a DISTINCT instead of a GROUP BY.
select DISTINCT
CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
But if CustomerID is a primary key, then CustomerID would already make the result unique.
So then this query doesn't need a GROUP BY or a DISTINCT to only get unique records.
select CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
Note that you could have both DISTINCT and GROUP BY in the same query, but that's pointless: the GROUP BY already enforces uniqueness, so adding a DISTINCT on top of it would only make the query slower for no benefit.
As for why all of the columns in the SELECT also have to be listed in the GROUP BY: some databases, e.g. MySQL, can be more tolerant and not force you to group on all of them, but it's a rule from the SQL standard, so most databases enforce it. The rule exists to avoid potentially misleading results.
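As a quick illustration of that rule, using the same Customers table:
-- Rejected by most engines: City is neither grouped nor aggregated,
-- so there is no single City value to return per Country.
SELECT City, Country
FROM Customers
GROUP BY Country;

-- Accepted: every selected column is either grouped or aggregated.
SELECT Country, COUNT(*) AS CustomersPerCountry
FROM Customers
GROUP BY Country;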
GROUP BY x, y means you want one result row per combination of x and y. So if you have a table with bills, you could group by year and month, for instance, and thus get the number of bills (count(*)) and the total (sum(amount)) per month.
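A sketch of that bills example, assuming a hypothetical bills(bill_date, amount) table and a database that supports EXTRACT (e.g. MySQL, PostgreSQL, Oracle):
-- One result row per year and month, with the bill count and the total amount.
SELECT EXTRACT(YEAR FROM bill_date) AS bill_year,
       EXTRACT(MONTH FROM bill_date) AS bill_month,
       COUNT(*) AS number_of_bills,
       SUM(amount) AS total_amount
FROM bills
GROUP BY EXTRACT(YEAR FROM bill_date), EXTRACT(MONTH FROM bill_date)
ORDER BY bill_year, bill_month;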
So the question is which rows you want to see. A row per company, maybe? A row per city, with the number of customers in it? The GROUP BY clause contains exactly those columns.
As written, your GROUP BY clause does effectively nothing: you select from Customers and group by CustomerID (which should be the table's primary key), so every group contains exactly one row.

Count a value in three separate columns - Rails/SQL

I have a table roasts with three columns: country, country2, country3. The columns indicate which countries a blend of coffee comes from, so a country name could appear in any one of the three columns. While I can count on any given column, I'd like to count how many times a value appears across all three country columns.
I'd like to do this in Rails/ActiveRecord. I've got as far as the SQL below, but the output isn't right:
SELECT country, country2, country3, COUNT(*) AS CountryCount FROM roasts GROUP BY country, country2, country3;
I suspect it's how I'm grouping.
You should have a table called something like CoffeeCountries that has one row per country for each blend. I would strongly recommend that you change the data structure.
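For illustration, that normalized structure might look roughly like this (table and column names are hypothetical; roast_id would point at a row in roasts):
CREATE TABLE coffee_countries (
  roast_id integer NOT NULL,     -- the blend this row belongs to
  country  varchar(64) NOT NULL  -- one country of origin for that blend
);

-- With one row per (roast, country), the count becomes a plain GROUP BY:
SELECT country, COUNT(*) AS roast_count
FROM coffee_countries
GROUP BY country
ORDER BY roast_count DESC;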
With your data structure, you need to unpivot the data. Because your data is small and you are probably not familiar with lateral joins, I'm going to use the union all approach:
select country, count(*)
from ((select country from roasts) union all
      (select country2 as country from roasts) union all
      (select country3 as country from roasts)
     ) c
where country is not null
group by country
order by count(*) desc;
If you had a large table, I would recommend the lateral join instead. This version scans the table three times (once for each country column).
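For reference, the lateral-join version might look something like this (a sketch assuming PostgreSQL; it scans roasts only once and unpivots each row into up to three country values):
select c.country, count(*) as country_count
from roasts r
cross join lateral (values (r.country), (r.country2), (r.country3)) as c(country)
where c.country is not null
group by c.country
order by country_count desc;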

PostgreSQL: get the min of a column with its associated city

I have been at this for the past two hours and have tried many different approaches with subqueries and joins. Here's the exact question: "Get the name and city of customers who live in the city where the least number of products are made."
Here is a snapshot of the database tables
I know how to get the min
select min(quantity)
from products
but this returns just the min without the city attached to it, so I can't search for the city in the customers table.
I have also tried GROUP BY and found that it gave me three mins (one for each group of cities), which I believe may help me:
select city,min(quantity)
from products
group by city
Putting everything together I got something that looks like
SELECT
c.name,c.city
FROM
customers c
INNER JOIN
(
SELECT
city,
MIN(quantity) AS min_quantity
FROM
products
GROUP BY
city
) AS SQ ON
SQ.city = c.city
But this returns multiple customers, which isn't correct. Looking at the database, the city where the lowest number of products are made seems to be Newark, and there are no customers who reside in Newark, so I assume this query should return 0 rows. Thank you for your time.
Example
Here is an example "Get the pids of products ordered through any agent who makes at least one order for a customer in Kyoto"
and the answer I provided is
select pid
from orders
inner join agents
on orders.aid = agents.aid
inner join customers
on customers.cid = orders.cid
where customers.city = 'Kyoto'
In PostgreSQL you have sophisticated tools, namely window functions and CTEs.
WITH
find_least_sumq AS
(SELECT city, RANK() OVER ( ORDER BY SUM(quantity) ) AS r
 FROM products
 GROUP BY city)
SELECT name, city
FROM customers NATURAL JOIN find_least_sumq /* joins ON city */
WHERE r=1; /* rank 1 is the smallest summed quantity, including ties */
Drew's answer zeroes in on the cities where the smallest quantity of any particular item is made. I interpret the question as wanting the sum of the items made in each city.
I guess it would be something around this idea:
select customers.name, city.city, city.total_quantity
from customers
join (
  select city, sum(quantity) as total_quantity
  from products
  group by city
  --keep only the cities whose total equals the minimum total
  having sum(quantity) = (
    --get the minimum summed quantity across all cities
    select min(city_total)
    from (
      select sum(quantity) as city_total
      from products
      group by city
    ) totals
  )
) city on customers.city = city.city
This can be made so much simpler. Just sort the output by the field you want to get the minimum of.
SELECT city, quantity FROM products ORDER BY quantity LIMIT 1;
I have just figured out my own answer. I guess taking a break and coming back to it was all I needed. For future readers, this answer uses a subquery to help you get the min of a column and compare a different column (of that same row) to a column in a different table.
This example gets the city where the least number of products are made (the quantity column in the products table), compares that city to the city column in the customers table, and then prints the names and cities of those customers. (To help clarify, use the link in the original question to look at the structure of the database I am talking about.) The first step is to sum the product quantities per city, then take the min of that, and then find the customers in that city. Here was my solution:
with citySum as (
  select city, sum(quantity) as sum
  from products
  group by city)
select name, city
from customers
where city in (
  select city
  from citySum
  where sum = (
    select min(sum)
    from citySum))
Here is another solution I found today that works as well, using only subqueries:
select c.name, c.city
from customers c
where c.city in (
  select city
  from (
    select p.city, sum(p.quantity) as lowestSum
    from products p
    group by p.city) summedCityQuantities
  order by lowestSum asc
  limit 1)

How to optimize an SQLite3 query

I'm learning SQLite3 by means of a book ("Using SQLite") and the Northwind database. I have written the following code to order the customers by the number of customers in their city, then alphabetically by their name.
SELECT ContactName, Phone, City as originalCity
FROM Customers
ORDER BY (
SELECT count(*)
FROM Customers
WHERE city=originalCity)
DESC, ContactName ASC
It takes about 50-100ms to run. Is there a standard procedure to follow to optimize this query, or more generally, queries of its type?
In the most general case, query optimization starts with reading the query optimizer's execution plan. In SQLite, you just use
EXPLAIN QUERY PLAN statement
In your case,
EXPLAIN QUERY PLAN
SELECT ContactName, Phone, City as originalCity
FROM Customers
ORDER BY (
SELECT count(*)
FROM Customers
WHERE city=originalCity)
DESC, ContactName ASC
You might also need to read the output of
EXPLAIN statement
which goes into more low-level detail.
In general (not only in SQLite), it's better to compute the count for all values (cities) at once and use a join to construct the query:
SELECT ContactName, Phone, Customers.City as originalCity
FROM Customers
JOIN (SELECT city, count(*) cnt
FROM Customers
GROUP BY city) Customers_City_Count
ON Customers.city = Customers_City_Count.city
ORDER BY Customers_City_Count.cnt DESC, ContactName ASC
This prevents the count from being computed many times over for the same value (city), as happens with the correlated subquery in your version.

Count number of users from a certain country

I have a table of users, and in this table I have a country field telling where these people are from (i.e. "Sweden", "Italy", ...). How can I do a SQL query to get something like:
Country Number
Sweden 10
Italy 50
... ...
Users select their country from a list I give them, but the list is really huge, so it would be great to have a SQL query that avoids that list and instead looks in the DB and returns only the countries that actually appear there. For example, I have nobody from Barbados, even though that option exists in the country select field of the signup form :)
Thanks in advance!
If the name of the country is in the Users table, try something like this:
SELECT Country, COUNT (*) AS Number
FROM Users
GROUP BY Country
ORDER BY Country
If the name of the country is in a separate Countries table, then you will have to join:
SELECT Countries.CountryName, COUNT(*) AS Number
FROM Users
INNER JOIN Countries
ON Users.CountryId = Countries.CountryId
GROUP BY Countries.CountryName
ORDER BY Countries.CountryName
This will give you what you want, but you might want to cache the result of the query (one possible approach is sketched after it below); with a lot of users it's quite a heavy query.
SELECT
country,
COUNT(*)
FROM
users
GROUP BY
country
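If you go the caching route, one option on the database side is a summary table (a sketch assuming MySQL or PostgreSQL syntax; the table name is made up, and application-level caching works just as well):
-- Materialize the counts once and refresh this table on a schedule
-- instead of re-counting on every request.
CREATE TABLE country_counts AS
SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country;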
Perhaps a better idea is (assuming you don't need the counts) to do it like this:
SELECT
DISTINCT country
FROM
users
Sounds like you want something like this...?
SELECT Country, COUNT(*) AS Number
FROM Users
GROUP BY Country
This is pretty straightforward:
SELECT
Country, COUNT(*) AS 'Number'
FROM
YourTable
GROUP BY
Country
ORDER BY
Country
You just group your data by country and count the entries for each country.
Or if you want them sorted by the number of users, use a different ORDER BY clause:
SELECT
Country, COUNT(*) AS 'Number'
FROM
YourTable
GROUP BY
Country
ORDER BY
COUNT(*) DESC
If you want the count per country:
select country, count(*) from users group by country;
If you just want the possible values:
select distinct country from users;
SELECT BillingCountry, COUNT(*)as Invoices
FROM Invoice
GROUP BY BillingCountry
ORDER BY Invoices DESC