Count a value across three separate columns - Rails/SQL

I have a table roasts with three columns: country, country2 and country3. The columns indicate which countries a blend of coffee comes from, so a country name could appear in any one of the three columns. Whilst I can do a count on any single column, I'd like to count how many times a value appears across all three country columns.
I'd like to do this in Rails/ActiveRecord. I've got as far as the SQL below, but its output isn't right:
SELECT country, country2, country3, COUNT(*) AS CountryCount FROM roasts GROUP BY country, country2, country3;
I suspect it's how I'm grouping.

You should have a table called something like CoffeeCountries that has one row per country for each blend. I would strongly recommend that you change the data structure.
With your data structure, you need to unpivot the data. Because your data is small and you are probably not familiar with lateral joins, I'm going to use the union all approach:
select country, count(*)
from ((select country as country from roasts) union all
(select country2 as country from roasts) union all
(select country3 as country from roasts)
) c
where country is not null
group by country
order by count(*) desc;
If you had a large table, I would recommend the lateral join. This version scans the table three times (once for each country).
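A minimal sketch of the UNION ALL unpivot, assuming SQLite through Python's built-in sqlite3 module as a stand-in for the real database (the answer does not name one). The `name` column and the sample blends are made up for illustration. SQLite rejects parentheses around individual branches of a compound SELECT, so the sketch drops them.

```python
import sqlite3

# Unpivot country/country2/country3 with UNION ALL, then count per country.
QUERY = """
    SELECT country, COUNT(*) AS cnt
    FROM (SELECT country AS country FROM roasts
          UNION ALL
          SELECT country2 FROM roasts
          UNION ALL
          SELECT country3 FROM roasts) c
    WHERE country IS NOT NULL  -- blends with fewer than three origins leave NULLs
    GROUP BY country
    ORDER BY cnt DESC, country
"""

def country_counts(rows):
    """Load (name, country, country2, country3) tuples into roasts and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: only the three country columns come from the question.
        conn.execute(
            "CREATE TABLE roasts (name TEXT, country TEXT, country2 TEXT, country3 TEXT)"
        )
        conn.executemany("INSERT INTO roasts VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

# Hypothetical sample blends; None marks an unused country column.
sample = [
    ("House Blend", "Brazil", "Colombia", None),
    ("Espresso", "Brazil", "Ethiopia", "Colombia"),
    ("Breakfast", "Brazil", None, None),
]
print(country_counts(sample))  # [('Brazil', 3), ('Colombia', 2), ('Ethiopia', 1)]
```

From Rails, one untested option is to pass the same SQL string to `ActiveRecord::Base.connection.select_rows`, which returns the rows as arrays.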

Related

How do I order countries by name in SQL?

I have a problem with the following task from the platform Codesignal:
After some investigation, you've created a database containing a foreignCompetitors table, which has the following structure:
competitor: the name of the competitor;
country: the country in which the competitor is operating.
In your report, you need to include the number of competitors per country and an additional row at the bottom that contains a summary: ("Total:", total_number_of_competitors)
Given the foreignCompetitors table, compose the resulting table with two columns: country and competitors. The first column should contain the country name, and the second column should contain the number of competitors in this country. The table should be sorted by the country names in ascending order. In addition, it should have an extra row at the bottom with the summary, as described above.
Example
For the following table foreignCompetitors
my solution:
CREATE PROCEDURE solution()
BEGIN
(SELECT country, COUNT(*) AS competitors
FROM foreignCompetitors
GROUP BY country
ORDER BY country)
UNION
SELECT 'Total:', COUNT(*) FROM foreignCompetitors;
END
But my output is:
The countries in the result are not sorted by name.
I cannot understand why, even though I sort them with ORDER BY.
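An ORDER BY inside one parenthesized branch of a UNION does not control the order of the combined result (MySQL discards it unless that branch also has a LIMIT), so the sort has to apply to the whole result. You want a GROUP BY WITH ROLLUP here: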
SELECT COALESCE(country, 'Total:') AS country, COUNT(*) AS competitors
FROM foreignCompetitors
GROUP BY country WITH ROLLUP
ORDER BY country;
If you want to stick with your union approach, then you need to introduce a computed column into the union query which places the total row at the bottom of the result set. Consider:
SELECT country, competitors
FROM
(
SELECT country, COUNT(*) AS competitors, 1 AS pos
FROM foreignCompetitors
GROUP BY country
UNION ALL
SELECT 'Total:', COUNT(*), 2
FROM foreignCompetitors
) t
ORDER BY pos, country;
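A minimal sketch of the `pos` technique, assuming SQLite via Python's sqlite3 module in place of MySQL (SQLite has no WITH ROLLUP, but the UNION ALL version runs unchanged). The competitor rows are invented. Because 'USA' sorts after 'Total:' alphabetically, the sample shows that `pos`, not the country name, is what keeps the total on the last row.

```python
import sqlite3

# Per-country counts plus a "Total:" row; the helper column pos forces the total last.
QUERY = """
    SELECT country, competitors
    FROM (
        SELECT country, COUNT(*) AS competitors, 1 AS pos
        FROM foreignCompetitors
        GROUP BY country
        UNION ALL
        SELECT 'Total:', COUNT(*), 2
        FROM foreignCompetitors
    ) t
    ORDER BY pos, country
"""

def competitors_report(rows):
    """Load (competitor, country) tuples and return the report rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE foreignCompetitors (competitor TEXT, country TEXT)")
        conn.executemany("INSERT INTO foreignCompetitors VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

# Hypothetical sample data.
sample = [("Acme", "USA"), ("Globex", "Germany"), ("Initech", "USA"), ("Umbrella", "France")]
print(competitors_report(sample))
# [('France', 1), ('Germany', 1), ('USA', 2), ('Total:', 4)]
```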

Get count summed across two columns in SQL?

I am working in Postgres and have the following accounts table:
account_number | integer
country1 | character varying(1000)
country2 | character varying(1000)
I want to get a count of accounts in each country, regardless of whether the country is country1 or country2.
So if the content of the table was:
account_number,country1,country2
123,France,Germany
124,Switzerland,France
125,Germany
Then the desired output from the query would be:
France,2
Germany,2
Switzerland,1
I know how to do this for one country at a time (select country1, count(*) from accounts group by country1) but not for both countries simultaneously.
You can try the following:
with cte as
(
select account_number, country1 as country
from accounts
union all
select account_number, country2
from accounts
)
select country, count(*) as cnt
from cte
where country is not null
group by country
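A minimal sketch that runs a UNION ALL CTE like the one above against the sample data from the question, assuming SQLite via Python's sqlite3 module as a stand-in for Postgres. It includes a `WHERE country IS NOT NULL` filter: account 125 has no country2, and without the filter that missing value would be counted as a separate NULL group.

```python
import sqlite3

QUERY = """
    WITH cte AS (
        SELECT account_number, country1 AS country FROM accounts
        UNION ALL
        SELECT account_number, country2 FROM accounts
    )
    SELECT country, COUNT(*) AS cnt
    FROM cte
    WHERE country IS NOT NULL  -- account 125 has no country2
    GROUP BY country
    ORDER BY country
"""

def count_accounts_per_country(rows):
    """Load (account_number, country1, country2) tuples and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accounts (account_number INTEGER, country1 TEXT, country2 TEXT)"
        )
        conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

# Sample rows from the question; None stands for the missing country2.
rows = [(123, "France", "Germany"), (124, "Switzerland", "France"), (125, "Germany", None)]
print(count_accounts_per_country(rows))
# [('France', 2), ('Germany', 2), ('Switzerland', 1)]
```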
I recommend unpivoting the data using a lateral join and then aggregating:
select country, count(*)
from accounts t cross join lateral
(values (country1), (country2)
) v(country)
where v.country is not null
group by country;
In addition to being a more concise way to write the query, it should be faster because the table is scanned only once. This could be a very big win if the "table" is really a view or subquery.
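SQLite, used here through Python's sqlite3 module so the sketch needs only the standard library, does not support LATERAL. This sketch therefore swaps in a different single-pass unpivot: a cross join of accounts against a two-row helper set, with CASE choosing country1 or country2. It is not the lateral join itself, only an older technique with the same goal of reading the table once (SQLite keeps the left table of a CROSS JOIN as the outer loop).

```python
import sqlite3

# Pair every account row with n = 1 and n = 2; CASE picks the matching column.
QUERY = """
    SELECT country, COUNT(*) AS cnt
    FROM (
        SELECT CASE k.n WHEN 1 THEN a.country1 ELSE a.country2 END AS country
        FROM accounts a
        CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) k
    ) v
    WHERE country IS NOT NULL
    GROUP BY country
    ORDER BY country
"""

def count_with_cross_join(rows):
    """Load (account_number, country1, country2) tuples and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accounts (account_number INTEGER, country1 TEXT, country2 TEXT)"
        )
        conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

rows = [(123, "France", "Germany"), (124, "Switzerland", "France"), (125, "Germany", None)]
print(count_with_cross_join(rows))
# [('France', 2), ('Germany', 2), ('Switzerland', 1)]
```

On Postgres itself, the lateral `values` version above is the more direct way to write this.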

Group By clause: do I have to list all the columns I use in Select?

Do I need to put all the column names that I have in the SELECT into the GROUP BY?
For example, in this simple query:
Select
CustomerID,
CompanyName,
ContactName,
ContactTitle,
City,
Country
From
Customers
Group By
Country,
CompanyName,
ContactName,
ContactTitle,
City,
Country,
CustomerID
Do I always have to list the same columns in GROUP BY that I used in SELECT?
If you're just selecting columns and want the result to discard exact duplicate rows, there are 2 methods:
1) group by
2) distinct
Your query doesn't use any aggregate functions, e.g. COUNT, MIN, MAX, SUM, ...
So your query could use DISTINCT instead of a GROUP BY.
select DISTINCT
CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
But if CustomerID is a primary key, then CustomerID alone already makes each row unique.
So this query doesn't need a GROUP BY or a DISTINCT to get only unique records.
select CustomerID, CompanyName, ContactName, ContactTitle, City, Country
from Customers
Note that one could have both DISTINCT and GROUP BY in the same query. But that's just pointless. A GROUP BY already enforces the uniqueness, so adding a DISTINCT to make them unique would just make the query slower for no reason.
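A minimal sketch comparing DISTINCT with a GROUP BY on every selected column, assuming SQLite via Python's sqlite3 module. The table is a hypothetical Customers table without CustomerID or any primary key, which is the only reason an exact duplicate row can exist in it.

```python
import sqlite3

def distinct_vs_group_by(rows):
    """Return the rows produced by DISTINCT and by GROUP BY on all columns."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table: no primary key, so duplicates are possible.
        conn.execute("CREATE TABLE Customers (CompanyName TEXT, City TEXT, Country TEXT)")
        conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)
        # DISTINCT: remove exact duplicate rows.
        with_distinct = conn.execute(
            "SELECT DISTINCT CompanyName, City, Country FROM Customers "
            "ORDER BY CompanyName, City, Country"
        ).fetchall()
        # GROUP BY on every selected column, with no aggregates: same effect.
        with_group_by = conn.execute(
            "SELECT CompanyName, City, Country FROM Customers "
            "GROUP BY CompanyName, City, Country "
            "ORDER BY CompanyName, City, Country"
        ).fetchall()
        return with_distinct, with_group_by
    finally:
        conn.close()

# Hypothetical sample rows.
rows = [
    ("Alfreds Futterkiste", "Berlin", "Germany"),
    ("Alfreds Futterkiste", "Berlin", "Germany"),  # exact duplicate
    ("Bolido Comidas", "Madrid", "Spain"),
]
with_distinct, with_group_by = distinct_vs_group_by(rows)
print(with_distinct == with_group_by)  # True
print(len(with_distinct))  # 2
```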
As for why all the columns in the SELECT also have to be listed in the GROUP BY: some databases, e.g. MySQL, can be more tolerant about not grouping on all columns. But it's a rule from one of the SQL standards, so most databases enforce it. It's there to avoid potentially misleading results: if a selected column is neither grouped nor aggregated, the database would have to pick an arbitrary value from each group.
GROUP BY x, y means you want one result row per x and y. So if you have a table with bills, you could group by year and month for instance and thus get the number of bills (count(*)) and the total (sum(amount)) per month.
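The bills example can be sketched as follows, assuming SQLite via Python's sqlite3 module. The `bills` table, its `bill_date` and `amount` columns, and the rows are hypothetical; `strftime` is SQLite's way of extracting the year and month from an ISO date string.

```python
import sqlite3

# GROUP BY year, month: one output row per (year, month) pair.
QUERY = """
    SELECT strftime('%Y', bill_date) AS year,
           strftime('%m', bill_date) AS month,
           COUNT(*)    AS bills,
           SUM(amount) AS total
    FROM bills
    GROUP BY strftime('%Y', bill_date), strftime('%m', bill_date)
    ORDER BY year, month
"""

def bills_per_month(rows):
    """Load (bill_date, amount) tuples and return count and total per month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE bills (bill_date TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO bills VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

# Hypothetical bills: two in January, one in February.
rows = [("2023-01-05", 100), ("2023-01-20", 50), ("2023-02-03", 70)]
print(bills_per_month(rows))
# [('2023', '01', 2, 150), ('2023', '02', 1, 70)]
```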
So the question is which rows you want to see. One row per company (with the number of its customers), maybe? One row per city? The GROUP BY clause contains exactly the columns that define such a row.
Your GROUP BY clause does exactly nothing, as you select customers and group by customer ID (which should be the customer table's primary key).

SQL query to count number of rows with same value

I have this table data:
I want to perform an SQL query that will give me the total number of distinct loan applications per city.
So for example, I would expect this output:
City,Loans
Wexford,1
Waterford,1
Galway,3
Any idea what kind of query I need to perform to get the count of distinct loans for each city?
Probably a COUNT(DISTINCT ...) with GROUP BY. Something like this:
SELECT city, COUNT(DISTINCT LoanApplicationID) as Loans
FROM tableName
GROUP BY city
Here is another approach. I am adding it because COUNT(DISTINCT ...) can cause performance issues in some cases, especially on large tables, so it is good to keep in mind.
select city,count(LoanApplicationID) as Loans
from (
select LoanApplicationID, city
from tablename
group by LoanApplicationID, city
) t
group by city
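A minimal sketch checking that both answers return the same counts, assuming SQLite via Python's sqlite3 module. The rows are invented to match the expected output in the question, with application 3 appearing twice so that plain COUNT(*) would overcount Galway. Which query is faster depends on the engine and data; the sketch only compares results.

```python
import sqlite3

COUNT_DISTINCT = """
    SELECT city, COUNT(DISTINCT LoanApplicationID) AS Loans
    FROM tableName
    GROUP BY city
    ORDER BY city
"""

# Deduplicate (application, city) pairs first, then count plain rows.
GROUP_THEN_COUNT = """
    SELECT city, COUNT(LoanApplicationID) AS Loans
    FROM (
        SELECT LoanApplicationID, city
        FROM tableName
        GROUP BY LoanApplicationID, city
    ) t
    GROUP BY city
    ORDER BY city
"""

def loans_per_city(rows):
    """Return the results of both queries for (LoanApplicationID, city) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tableName (LoanApplicationID INTEGER, city TEXT)")
        conn.executemany("INSERT INTO tableName VALUES (?, ?)", rows)
        return (conn.execute(COUNT_DISTINCT).fetchall(),
                conn.execute(GROUP_THEN_COUNT).fetchall())
    finally:
        conn.close()

# Hypothetical rows; application 3 appears twice.
rows = [(1, "Wexford"), (2, "Waterford"), (3, "Galway"), (3, "Galway"),
        (4, "Galway"), (5, "Galway")]
a, b = loans_per_city(rows)
print(a)  # [('Galway', 3), ('Waterford', 1), ('Wexford', 1)]
print(a == b)  # True
```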

SQL reduce duplicates in union clause

Let's say, for example, I have the following query:
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
ORDER BY City;
As you can see, WHERE Country='Germany' is repeated in both branches of the union. Is there any way to write this query without the repetition? I don't like my queries being too long.
I'm currently working on Oracle.
Why not include the WHERE only once, like this:
SELECT * FROM
(
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers
) tab
WHERE Country='Germany'
ORDER BY City
Or do a JOIN like this:
SELECT c.City as CustomerCity, c.Country as customerCountry,
s.City as suppliercity, s.Country as suppliercountry
FROM Customers c
LEFT JOIN Suppliers s ON c.Country = s.Country
WHERE c.Country='Germany'
ORDER BY c.City;
select distinct city, country
from
(
SELECT City, Country FROM Customers
WHERE Country='Germany'
UNION ALL
SELECT City, Country FROM Suppliers
WHERE Country='Germany'
) x
order by city
You can't really get around the need for a UNION if you really want both sets of rows. I've used UNION ALL inside the main query and a DISTINCT outside it, so duplicates are still removed only once, with no extra sort compared to a plain UNION (assuming you want duplicates removed).
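A minimal sketch confirming that UNION ALL plus an outer DISTINCT returns the same rows as a plain UNION, assuming SQLite via Python's sqlite3 module. The two-column Customers and Suppliers tables and their rows are hypothetical; Berlin appears in both tables so the duplicate removal actually matters.

```python
import sqlite3

UNION_QUERY = """
    SELECT City, Country FROM Customers WHERE Country = 'Germany'
    UNION
    SELECT City, Country FROM Suppliers WHERE Country = 'Germany'
    ORDER BY City
"""

# UNION ALL keeps duplicates; the outer DISTINCT removes them once.
DISTINCT_QUERY = """
    SELECT DISTINCT City, Country
    FROM (
        SELECT City, Country FROM Customers WHERE Country = 'Germany'
        UNION ALL
        SELECT City, Country FROM Suppliers WHERE Country = 'Germany'
    ) x
    ORDER BY City
"""

def make_db():
    """Create hypothetical Customers and Suppliers tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (City TEXT, Country TEXT)")
    conn.execute("CREATE TABLE Suppliers (City TEXT, Country TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                     [("Berlin", "Germany"), ("Munich", "Germany"), ("Paris", "France")])
    conn.executemany("INSERT INTO Suppliers VALUES (?, ?)",
                     [("Berlin", "Germany"), ("Hamburg", "Germany")])
    return conn

conn = make_db()
union_rows = conn.execute(UNION_QUERY).fetchall()
distinct_rows = conn.execute(DISTINCT_QUERY).fetchall()
conn.close()
print(union_rows == distinct_rows)  # True
print(union_rows)  # [('Berlin', 'Germany'), ('Hamburg', 'Germany'), ('Munich', 'Germany')]
```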
The code that you already have is pretty compact and clear while not sacrificing performance.
The suggestions to use a subquery can eliminate the duplicate WHERE clauses, but they may require more I/O than the simple union you originally provided.
When there is a union inside the subquery and a WHERE outside it, then, unless the optimizer pushes the filter down into each branch, the SQL engine builds a temporary result containing all rows of the Customers table plus all rows of the Suppliers table, and only then throws out the rows where Country is not Germany. If your tables only have a couple hundred rows and you are running the query locally, you will likely not see much performance difference, but if you have thousands of rows or the tables are on different servers across the network, performance could be orders of magnitude slower.
If performance is a consideration, you could make the query a bit simpler and more maintainable by using a variable for the country like so:
VAR country varchar2(64);
EXEC :country := 'Germany';
SELECT City, Country FROM Customers
WHERE Country= :country
UNION
SELECT City, Country FROM Suppliers
WHERE Country= :country
ORDER BY City;
This clearly does not make for shorter code, but it is somewhat cleaner and would be easier to modify and only retrieves the rows that you are interested in which will give better performance.
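The same idea, a single named value bound into both branches, can be sketched outside SQL*Plus. This assumes SQLite via Python's sqlite3 module, whose `:name` placeholders take values from a dict; the tables, rows and the `cities_in` helper are hypothetical.

```python
import sqlite3

# One named parameter, :country, used in both branches of the UNION.
QUERY = """
    SELECT City, Country FROM Customers WHERE Country = :country
    UNION
    SELECT City, Country FROM Suppliers WHERE Country = :country
    ORDER BY City
"""

def cities_in(conn, country):
    """Run QUERY with :country bound to the given value."""
    return conn.execute(QUERY, {"country": country}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT, Country TEXT)")
conn.execute("CREATE TABLE Suppliers (City TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Berlin", "Germany"), ("Munich", "Germany"), ("Paris", "France")])
conn.executemany("INSERT INTO Suppliers VALUES (?, ?)",
                 [("Berlin", "Germany"), ("Lyon", "France")])
print(cities_in(conn, "Germany"))  # [('Berlin', 'Germany'), ('Munich', 'Germany')]
print(cities_in(conn, "France"))   # [('Lyon', 'France'), ('Paris', 'France')]
```

Changing the country then means changing one bound value rather than editing two string literals.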
SELECT * FROM
(
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers
) t
WHERE t.Country='Germany'
ORDER BY t.City;
You could use a common table expression (CTE):
WITH CTE AS (
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers)
SELECT City, Country
FROM CTE
WHERE CTE.Country='Germany'
ORDER BY CTE.City;
I find it easier to read than nested sub-queries.
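A minimal sketch checking that filtering once after the UNION (in the CTE) returns the same rows as filtering in each branch, assuming SQLite via Python's sqlite3 module with hypothetical tables and rows. It only compares results; whether the engine pushes the filter back into each branch is optimizer-specific, so check the execution plan (for example, EXPLAIN PLAN in Oracle) if performance matters.

```python
import sqlite3

# Filter applied once, after the UNION, via a CTE.
CTE_QUERY = """
    WITH CTE AS (
        SELECT City, Country FROM Customers
        UNION
        SELECT City, Country FROM Suppliers
    )
    SELECT City, Country
    FROM CTE
    WHERE CTE.Country = 'Germany'
    ORDER BY CTE.City
"""

# Original form: the filter repeated in each branch.
PER_BRANCH_QUERY = """
    SELECT City, Country FROM Customers WHERE Country = 'Germany'
    UNION
    SELECT City, Country FROM Suppliers WHERE Country = 'Germany'
    ORDER BY City
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT, Country TEXT)")
conn.execute("CREATE TABLE Suppliers (City TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Berlin", "Germany"), ("Munich", "Germany"), ("Paris", "France")])
conn.executemany("INSERT INTO Suppliers VALUES (?, ?)",
                 [("Berlin", "Germany"), ("Hamburg", "Germany")])
cte_rows = conn.execute(CTE_QUERY).fetchall()
per_branch_rows = conn.execute(PER_BRANCH_QUERY).fetchall()
conn.close()
print(cte_rows == per_branch_rows)  # True
```

Moving the filter outside is only equivalent because Country appears in the select list of both branches.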
Agree with Patrick. Query performance is much more important than query length. All the alternative solutions mentioned can perform worse than your original one.