Select the countries with fewest number of tuples - sql

At http://www.dofactory.com/sql/sandbox I'm experimenting with submitting my own SQL queries against their sample database to become better at SQL. What I want to do is to select all countries from Customer that have exactly the fewest number of tuples. Here is my query attempt:
SELECT a.Country
FROM [Customer] a, (SELECT COUNT(*) AS Tot
FROM [Customer]
GROUP BY Country) b
GROUP BY a.Country
HAVING COUNT(*) = MIN(b.Tot)
However, the website returns an empty table instead of the correct result which is (Ireland, Norway, Poland). The correct result is easily realized by grouping the table by country and using COUNT(*), and then looking at the countries that have the smallest COUNT(*) value out of all COUNT(*) values. I would like some advice on how to generate the correct result without any assumptions about the table's data.

I would do this using SELECT TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES c.Country
FROM Customer c
GROUP BY c.Country
ORDER BY COUNT(*) ASC;
Two notes:
When using table aliases, make them abbreviations for the tables. This makes the query much easier to follow.
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax.

Learned somtihing new(WITH TIES) from Gordon Linoff, again...
Here my solution without it...
Select a.Country from [Customer] a
group by a.Country
having count(*) = (select min(b.Tot) from (SELECT COUNT(*) AS Tot FROM [Customer] GROUP BY Country) b)

If you are not using sql 2012 then,
declare #Fewer int=2
;With CTE as
(
select c.*
,ROW_NUMBER()over(partition by countryid order by customerid)rn
from dbo.Customers C
)
select * from cte
where rn<=#Fewer

Related

SQL ORACLE UNPIVOT

I can not deal with this unpivot.
select *
from (select c.country_id, r.region_id, count(*) liczba
from countries c join regions r on r.region_id = c.region_id
group by c.country_id, r.region_id)
UnPivot(
count(liczba) for r.region_id in (any))
order by c.country_id
this is code, and still will not work :(
select r.region_name, count(*)
from countries c join regions r on r.region_id = c.region_id
group by r.region_name
this one works properly.
"invalid identifier"
Seconde imagine!
Your query does not make sense as:
UNPIVOT is used to transpose columns into rows; you only have 3 columns country_id, region_id and liczba.
The UNPIVOT syntax is:
UNPIVOT ( value FOR key IN ( column1 AS 'alias1', column2 AS 'alias2' ) )
You cannot have a aggregation expression as the value.
You need to specify which columns you are UNPIVOTing to rows; you cannot use any unless you have defined a column called any.
You appear to want to use PIVOT rather than UNPIVOT.
I would like to see, how many people work in each job_id
Use GROUP BY and COUNT:
SELECT job_id,
COUNT(DISTINCT EMPLOYEE_ID) AS number_of_employees
FROM employees
GROUP BY job_id
using unpivot
That's not what UNPIVOT is for; UNPIVOT converts columns to rows whereas you want to aggregate rows together.

Can we use order by in subquery? If not why sometime could use top(n) order by?

I'm an entry level trying to learn more about SQL,
I have a question "can we use order by in subquery?" I did look for some article says no we could not use.
But on the other hand, I saw examples using top(n) with order by in subquery:
select c.CustomerId,
c.OrderId
from CustomerOrder c
inner join (
select top 2
with TIES CustomerId,
COUNT(distinct OrderId) as Count
from CustomerOrder
group by CustomerId
order by Count desc
) b on c.CustomerId = b.CustomerId
So now I'm bit confused.
Could anyone advise?
Thank you very much.
Yes, you are right we cannot use order by in a inner query. Because it is acting as a table. A table in itself needs to be sorted when queried for different purposes.
In your query itself the inner query is select some records using Top 2. Eventhough these are top 2 records only, they form a table with 2 records which is enough for it to recognized as a table and join it with another table
The right query will be:-
SELECT * FROM
(
SELECT c.CustomerId, c.OrderId, DENSE_RANK() OVER(ORDER BY b.count DESC) AS RANK
FROM CustomerOrder c
INNER JOIN
(SELECT CustomerId, COUNT(distinct OrderId) as Count
FROM CustomerOrder GROUP BY CustomerId) b
ON c.CustomerId = b.CustomerId
) a
WHERE RANK IN (1,2);
Hope I have answered your question.
Yes we can use order by clause in sub query, for example i have a table named as product (check the screen shot of table http://prntscr.com/f15j3z). Chek this query on your side and revert me in case of any doubt.
select p1.* from product as p1 where product_id = (select p2.product_id from product as p2 order by product_id limit 0,1)
yes we can use order by in subquery,but it is pointless to use it.
It is better to use it in the outer query.There is no use of ordering the result of subquery, because result of inner query will become the input for outer query and it does not have to do any thing with the order of the result of subquery.

SQL Sub Query not working

Any idea why this doesn't work then (based on your examples on forum it should!) Thanks.
select result.low
from (
select top 1 country as low, count()
from customers c, orders o
where c.customerid=o.customerid
group by country
order by count()
) as result
as has been stated there are several issues
select result.low
from (
select top 1 country as low, count() -- count() needs an argument and needs to be aliased
from customers c, orders o -- old style join
where c.customerid=o.customerid
group by country -- column `country` may need qualifying
order by count() -- count() needs an argument
) as result
so the query should probably be
select top 1 country as low
from customers c
join orders o
on c.customerid=o.customerid
group by c.country
order by count(*)

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want to get rows distinct by pageid ,so in the end I will not have rows with same pageid more then once(duplicate)
any Ideas
Thanks
Baaroz
Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ... along with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
In the case that simply adding distinct does not cover you, you need to also remove the columns that are different from row to row, or use aggregate functions to get aggregate values per cartlines.
Example - total quantity per distinct pageId:
select distinct cartlines.id,cartlines.pageId, sum(cartlines.quantity)
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If this is still not what you wish, you need to give us data and specify better what it is you want.

How to reference a custom field in SQL

I am using mssql and am having trouble using a subquery. The real query is quite complicated, but it has the same structure as this:
select
customerName,
customerId,
(
select count(*)
from Purchases
where Purchases.customerId=customerData.customerId
) as numberTransactions
from customerData
And what I want to do is order the table by the number of transactions, but when I use
order by numberTransactions
It tells me there is no such field. Is it possible to do this? Should I be using some sort of special keyword, such as this, or self?
use the field number, in this case:
order by 3
Sometimes you have to wrestle with SQL's syntax (expected scope of clauses)
SELECT *
FROM
(
select
customerName,
customerId,
(
select count(*)
from Purchases
where Purchases.customerId=customerData.customerId
) as numberTransactions
from customerData
) as sub
order by sub.numberTransactions
Also, a solution using JOIN is correct. Look at the query plan, SQL Server should give identical plans for both solutions.
Do an inner join. It's much easier and more readable.
select
customerName,
customerID,
count(*) as numberTransactions
from
customerdata c inner join purchases p on c.customerID = p.customerID
group by customerName,customerID
order by numberTransactions
EDIT: Hey Nathan,
You realize you can inner join this whole table as a sub right?
Select T.*, T2.*
From T inner join
(select
customerName,
customerID,
count(*) as numberTransactions
from
customerdata c inner join purchases p on c.customerID = p.customerID
group by customerName,customerID
) T2 on T.CustomerID = T2.CustomerID
order by T2.numberTransactions
Or if that's no good you can construct your queries using temporary tables (#T1 etc)
There are better ways to get your result but just from your example query this will work on SQL2000 or better.
If you wrap your alias in single ticks 'numberTransactions' and then call ORDER BY 'numberTransactions'
select
customerName,
customerId,
(
select count(*)
from Purchases
where Purchases.customerId=customerData.customerId
) as 'numberTransactions'
from customerData
ORDER BY 'numberTransactions'
The same thing could be achieved by using GROUP BY and a JOIN, and you'll be rid of the subquery. This might be faster too.
I think you can do this in SQL2005, but not SQL2000.
You need to duplicate your logic. SQL Server isn't very smart at columns that you've named but aren't part of the dataset in your FROM statement.
So use
select
customerName,
customerId,
(
select count(*)
from Purchases p
where p.customerId = c.customerId
) as numberTransactions
from customerData c
order by (select count(*) from purchases p where p.customerID = c.customerid)
Also, use aliases, they make your code easier to read and maintain. ;)