Hive SQL: find out how many customers are common to every country - sql

I have a table called custtable with 3 columns: custid, country, date.
There are 5 countries in country: 'CH', 'US', 'UK', 'FR' and 'GE'.
I am hoping for an elegant query to find out how many unique [custid] values appear in all 5 countries.
Currently I can use subqueries and temporary tables to find the overlapping set, but I would welcome suggestions for a simpler way.
Here is my way of finding the overlap for a subset of the countries, after which I need another subquery:
with t1 AS
(SELECT DISTINCT [custid]
FROM custtable
   where date>20140101
and country='CH'),
t2 as
(SELECT DISTINCT [custid]
FROM custtable
   where date>20140101
and country='FR'),
t3 AS
(SELECT DISTINCT [custid]
FROM custtable
   where date>20140101
and country='US'),
t4 as
(SELECT DISTINCT [custid]
FROM custtable
   where date>20140101
and country='UK')
select count (distinct t1.custid)
from t1
inner join t3
on (t1.custid=t3.custid)
inner join t2
on (t1.custid=t2.custid)
inner join t4
on (t1.custid=t4.custid)
     
Thank you for any input.

I think a better way is to count how many distinct countries each custid has and keep those with a count >= 5, e.g.,
with count_table as (
select custid, count(distinct country) as cnt
from custtable
where date>20140101
group by custid
)
select custid, cnt
from count_table
where cnt >= 5
Then count the custid values that remain.

SELECT COUNTRY
, COUNT(DISTINCT CUSTID) AS CNT
FROM CUSTTABLE
GROUP BY COUNTRY

If you want customers in all five countries:
select custid
from custtable
where date > 20140101
group by custid
having count(distinct country) = 5;
If you want those particular five countries (as your query suggests):
select custid
from custtable
where date > 20140101 and
country in ('CH','US', 'UK','FR', 'GE')
group by custid
having count(distinct country) = 5;
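The GROUP BY / HAVING approach above can be tried on a small in-memory data set. This is only a sketch: SQLite stands in for Hive, the sample rows are invented, and the helper name count_common_customers is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite is used here in place of Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custtable (custid INTEGER, country TEXT, date INTEGER)")
rows = [(1, c, 20140201) for c in ("CH", "US", "UK", "FR", "GE")]  # customer 1: all five
rows += [(2, c, 20140301) for c in ("CH", "US", "UK")]             # customer 2: only three
rows += [(3, "CH", 20140105), (3, "CH", 20140106)]                 # customer 3: one country, twice
conn.executemany("INSERT INTO custtable VALUES (?, ?, ?)", rows)


def count_common_customers(conn, countries, min_date):
    """Count customers that appear in every one of `countries` after `min_date`."""
    placeholders = ",".join("?" for _ in countries)
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT custid
            FROM custtable
            WHERE date > ? AND country IN ({placeholders})
            GROUP BY custid
            HAVING COUNT(DISTINCT country) = ?
        ) AS common
    """
    params = [min_date, *countries, len(set(countries))]
    return conn.execute(sql, params).fetchone()[0]


common = count_common_customers(conn, ["CH", "US", "UK", "FR", "GE"], 20140101)
print(common)
```

Wrapping the grouped query in an outer COUNT(*) gives the single number the question asks for, rather than the list of custid values.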

Related

Find the sailors that have been on EVERY boat

I don't know how to explain the problem in a generic way, so I'll post the specific case.
I have 3 tables:
Sailors:
S(ids, names, rating, age)
Boats:
B(idb, nameb, color)
Bookings:
Bo(ids, idb, date)
I have to write a query that finds all the sailors who have booked EVERY boat.
Even though I posted a specific case, I'd like a generic answer that can be applied to every problem of the same kind.
Thank you in advance.
You can get the ids of the sailors who have booked every boat with this query:
select ids
from bookings
group by ids
having count(distinct idb) = (select count(*) from boats)
So use it either with the operator IN:
select * from sailors
where ids in (
select ids
from bookings
group by ids
having count(distinct idb) = (select count(*) from boats)
)
or join it to sailors:
select s.*
from sailors s
inner join (
select ids
from bookings
group by ids
having count(distinct idb) = (select count(*) from boats)
) t on t.ids = s.ids
You can use SUM with IN (this relies on a boolean expression evaluating to 0 or 1, as in MySQL):
select * from sailors s1
group by ids
having (select sum(idb in (select b2.idb from bookings b2 where b2.ids = s1.ids)) from boats) = (select count(*) from boats)
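As a small check of the "count the distinct boats and compare with the total" idea, here is a sketch run against SQLite. The sailor and boat rows are invented, and the tables carry only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sailors  (ids INTEGER PRIMARY KEY, names TEXT);
    CREATE TABLE boats    (idb INTEGER PRIMARY KEY, nameb TEXT);
    CREATE TABLE bookings (ids INTEGER, idb INTEGER, date TEXT);
    -- Hypothetical data: Ann books both boats, Bob books one boat twice, Cy books nothing.
    INSERT INTO sailors  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO boats    VALUES (10, 'Alpha'), (11, 'Beta');
    INSERT INTO bookings VALUES
        (1, 10, '2020-01-01'), (1, 11, '2020-01-02'),
        (2, 10, '2020-01-03'), (2, 10, '2020-01-04');
""")

# A sailor qualifies when the number of distinct boats booked equals the number of boats.
every_boat = conn.execute("""
    SELECT s.names
    FROM sailors s
    WHERE s.ids IN (
        SELECT ids
        FROM bookings
        GROUP BY ids
        HAVING COUNT(DISTINCT idb) = (SELECT COUNT(*) FROM boats)
    )
    ORDER BY s.names
""").fetchall()
print(every_boat)
```

COUNT(DISTINCT idb) matters here: Bob has two bookings, but both are for the same boat, so he is not returned.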

Selecting rows with the most repeated values at specific column

The problem in general terms: I need to select the value from one table that is referenced by the most rows in another table.
Tables have this structure:
screenshot
screenshot2
The task is to find the country that has the most results from the sportsmen related to it.
First, I INNER JOIN the tables to relate each result to a country:
SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id);
Then I count how many times each country appears:
SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id))
GROUP BY country
;
And got this screenshot3
Now it feels like I'm one step away from the solution.
I guess it's possible with one more SELECT FROM (SELECT ...) and MAX(), but I can't wrap it up.
P.S.
I did it by duplicating the query like this, but it feels very inefficient if there are millions of rows:
SELECT country
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
)
WHERE highest_participation = (SELECT MAX(highest_participation)
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
))
I also did it with a view:
CREATE VIEW temp AS
SELECT country as country_with_most_participations, COUNT(country) as country_participate_in_#_comp
FROM(
SELECT country, competition_id FROM result
INNER JOIN sportsman USING(sportsman_id)
)
GROUP BY country;
SELECT country_with_most_participations FROM temp
WHERE country_participate_in_#_comp = (SELECT MAX(country_participate_in_#_comp) FROM temp);
But I'm not sure whether that is the easiest way.
If I understand this correctly you want to rank the countries per competition count and show the highest ranking country (or countries) with their count. I suggest you use RANK for the ranking.
select country, competition_count
from
(
select
s.country,
count(*) as competition_count,
rank() over (order by count(*) desc) as rn
from sportsman s
inner join result r using (sportsman_id)
group by s.country
) ranked_by_count
where rn = 1
order by country;
If the order of the result rows doesn't matter, you can shorten this to:
select s.country, count(*) as competition_count
from sportsman s
inner join result r using (sportsman_id)
group by s.country
order by count(*) desc
fetch first rows with ties;
You seem to be overcomplicating this. Starting from your existing join query, you can aggregate, order the results and keep the top row(s) only.
select s.country, count(*) cnt
from sportsman s
inner join result r using (sportsman_id)
group by s.country
order by cnt desc
fetch first 1 row with ties
Note that this allows top ties, if any.
SELECT country
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
order by 2 desc
)
where rownum=1
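For engines without FETCH FIRST ... WITH TIES, one portable variation (not taken from any of the answers above) is to aggregate once in a CTE and compare each count with the maximum. The sketch below runs it on SQLite with invented rows; the CTE name per_country is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sportsman (sportsman_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE result    (competition_id INTEGER, sportsman_id INTEGER);
    -- Hypothetical data: FR and US tie with two results each, DE has one.
    INSERT INTO sportsman VALUES (1, 'FR'), (2, 'FR'), (3, 'US'), (4, 'DE');
    INSERT INTO result    VALUES (100, 1), (101, 2), (100, 3), (101, 3), (100, 4);
""")

# Aggregate once, then keep every country whose count equals the maximum.
top_countries = conn.execute("""
    WITH per_country AS (
        SELECT s.country, COUNT(*) AS cnt
        FROM sportsman s
        INNER JOIN result r USING (sportsman_id)
        GROUP BY s.country
    )
    SELECT country, cnt
    FROM per_country
    WHERE cnt = (SELECT MAX(cnt) FROM per_country)
    ORDER BY country
""").fetchall()
print(top_countries)
```

The join and GROUP BY are written only once, which addresses the worry about duplicating the whole query, and tied countries are all returned.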

Select Most Recent Date with Inner Join

Running into a wall when trying to pull info from tables similar to those below. Not sure how to approach this.
The results should have the most recent TRANSAMT for each ACCNUM along with NAME and address.
Select A.ACCNUM, MAX(B.TRANSAMT) as BAMT, B.ADDRESS from
From TableA A inner join TableB on A.ACCNUM = B.ACCNUM
This is what I have so far. Any help would be appreciated.
TableA
ACCNUM  NAME      ADDRESS
00001   R. GRANT  Miami, FL
00002   B. PAUL   Dallas, TX
TableB
ACCNUM  TRANSAMT  TRANSDATE
00001   150       1/1/2015
00001   200       13/2/2015
00002   100       2/1/205
00003   50        18/2/2015
You can use the ANSI standard row_number() function in most databases. This lets you number each account's transactions from newest to oldest and keep only the first:
select a.accnum, a.name, b.transamt, a.address
from tableA a left join
(select b.*, row_number() over (partition by accnum order by transdate desc) as seqnum
from tableB b
) b
on a.accnum = b.accnum and b.seqnum = 1;
Note: I changed the join to a left join. This will keep all records in tableA, even those with no matches. I am not sure if that is the intention of your query.
You can use row_number to order the rows for each account number by the most recent first.
select accnum, bamt, name, address
from (
select A.ACCNUM, B.TRANSAMT as BAMT, A.ADDRESS, A.NAME,
row_number() over(partition by A.ACCNUM order by B.TRANSDATE desc) as rn
From TableA A
inner join TableB B on A.ACCNUM = B.ACCNUM
) t
where rn = 1;
Please note this will not work on MySQL versions before 8.0, which lack window functions.
This one uses no ROW_NUMBER():
with find_max as (
select ACCNUM, max(TRANSDATE) as TRANSDATE from TableB group by ACCNUM)
select A.ACCNUM, B.TRANSAMT,
find_max.TRANSDATE, A.ADDRESS, A.NAME
from TableA A
join find_max on find_max.ACCNUM = A.ACCNUM
join TableB B on B.ACCNUM = find_max.ACCNUM and B.TRANSDATE = find_max.TRANSDATE
First find the max date for each ACCNUM, then join both tables to it.
This will work on most databases.
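The "max date per account, then join back" idea can be sketched on SQLite as follows. Two assumptions are made here: the dates are stored as ISO yyyy-mm-dd text so that MAX() orders them chronologically, and the sample row whose date is incomplete in the question is taken to be 2015-01-02.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ACCNUM TEXT, NAME TEXT, ADDRESS TEXT);
    CREATE TABLE TableB (ACCNUM TEXT, TRANSAMT INTEGER, TRANSDATE TEXT);
    INSERT INTO TableA VALUES
        ('00001', 'R. GRANT', 'Miami, FL'),
        ('00002', 'B. PAUL',  'Dallas, TX');
    -- Dates rewritten as ISO text (assumption); '2015-01-02' is a guess for the incomplete date.
    INSERT INTO TableB VALUES
        ('00001', 150, '2015-01-01'), ('00001', 200, '2015-02-13'),
        ('00002', 100, '2015-01-02'), ('00003', 50,  '2015-02-18');
""")

# Step 1 (subquery m): latest TRANSDATE per account.
# Step 2: join back to TableB on account AND date to pick up that row's amount.
latest = conn.execute("""
    SELECT a.ACCNUM, a.NAME, a.ADDRESS, b.TRANSAMT, b.TRANSDATE
    FROM TableA a
    JOIN (SELECT ACCNUM, MAX(TRANSDATE) AS TRANSDATE
          FROM TableB
          GROUP BY ACCNUM) m ON m.ACCNUM = a.ACCNUM
    JOIN TableB b ON b.ACCNUM = m.ACCNUM AND b.TRANSDATE = m.TRANSDATE
    ORDER BY a.ACCNUM
""").fetchall()
for row in latest:
    print(row)
```

Account 00003 has no row in TableA, so the inner joins drop it. If two transactions for one account share the latest date, this form returns both of them.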

Row value from another table

I have a table of 3 rows, 2 of which are duplicates, so I used the query below to find the duplicated value in the column:
SELECT CustNo, COUNT(*) TotalCount
FROM Rental
GROUP BY CustNo
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
Once I have the repeated value, I need to look up that duplicated CustNo in the Customer table. How do I take this value and use it in a select statement, all in the same query?
I also have the select statement prepared like this.
Select * from Customer where CustNo = 'T0002';
Thanks.
Select * from Customer
where CustNo IN
(
SELECT CustNo
FROM Rental
GROUP BY CustNo
HAVING COUNT(*) > 1
)
You can use join:
SELECT c.*
FROM (SELECT CustNo, COUNT(*) TotalCount
FROM Rental
GROUP BY CustNo
HAVING COUNT(*) > 1
) cc JOIN
Customer c
on cc.CustNo = c.CustNo;
Select C.* from Customer C RIGHT JOIN (
SELECT CustNo
FROM Rental
GROUP BY CustNo
HAVING COUNT(*) > 1) D
ON C.CustNo = D.CustNo
You can also try this:
With tblDups as (
select CustNo, count(CustNo) as TotalCount from Rental
Group by CustNo
Having count(CustNo) > 1)
select c.* from Customer c
inner join tblDups d on d.CustNo = c.CustNo
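To see that the IN form and the JOIN form select the same customers, here is a sketch on SQLite. The Name and RentalNo columns and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustNo TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Rental   (RentalNo INTEGER, CustNo TEXT);
    -- Hypothetical data: T0002 appears twice in Rental, T0001 once.
    INSERT INTO Customer VALUES ('T0001', 'Ann'), ('T0002', 'Bob');
    INSERT INTO Rental   VALUES (1, 'T0001'), (2, 'T0002'), (3, 'T0002');
""")

# Subquery form: filter Customer by the set of duplicated CustNo values.
in_rows = conn.execute("""
    SELECT * FROM Customer
    WHERE CustNo IN (
        SELECT CustNo FROM Rental
        GROUP BY CustNo
        HAVING COUNT(*) > 1
    )
""").fetchall()

# Join form: the grouped subquery yields one row per CustNo, so no customer is repeated.
join_rows = conn.execute("""
    SELECT c.*
    FROM Customer c
    JOIN (SELECT CustNo FROM Rental
          GROUP BY CustNo
          HAVING COUNT(*) > 1) d ON d.CustNo = c.CustNo
""").fetchall()
print(in_rows, join_rows)
```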

Help with SQL QUERY OF JOIN+COUNT+MAX

I need help constructing an SQL query for a MySQL database. There are 2 tables, as follows:
tblcities (id,name)
tblmembers(id,name,city_id)
Now I want to retrieve the details of the 'city' that has the maximum number of 'members'.
Regards
SELECT tblcities.id, tblcities.name, COUNT(tblmembers.id) AS member_count
FROM tblcities
LEFT JOIN tblmembers ON tblcities.id = tblmembers.city_id
GROUP BY tblcities.id
ORDER BY member_count DESC
LIMIT 1
Basically: retrieve all cities and count how many members each has, sort by that member count in descending order, making the highest count first - then show only that first city.
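The steps just described (count members per city, sort descending, keep the first row) can be sketched on SQLite, which also supports LIMIT. The city and member rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblcities  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblmembers (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    -- Hypothetical data: Lima has three members, Oslo one, Pune none.
    INSERT INTO tblcities  VALUES (1, 'Oslo'), (2, 'Lima'), (3, 'Pune');
    INSERT INTO tblmembers VALUES (1, 'a', 1), (2, 'b', 2), (3, 'c', 2), (4, 'd', 2);
""")

query = """
    SELECT c.id, c.name, COUNT(m.id) AS member_count
    FROM tblcities c
    LEFT JOIN tblmembers m ON c.id = m.city_id
    GROUP BY c.id, c.name
    ORDER BY member_count DESC
"""

all_counts = conn.execute(query).fetchall()            # every city, highest count first
top_city = conn.execute(query + " LIMIT 1").fetchone()  # only the first row
print(all_counts)
print(top_city)
```

Because of the LEFT JOIN and COUNT(m.id), a city with no members is still listed with a count of 0. LIMIT 1 returns a single row even when several cities tie for the maximum.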
Terrible, but that's a way of doing it:
SELECT * FROM tblcities WHERE id IN (
SELECT city_id
FROM tblMembers
GROUP BY city_id
HAVING COUNT(*) = (
SELECT MAX(TOTAL)
FROM (
SELECT COUNT(*) AS TOTAL
FROM tblMembers
GROUP BY city_id
) AS AUX
)
)
That way, if there is a tie, still you'll get all cities with the maximum number of members...
Select ...
From tblCities As C
Join (
Select city_id, Count(*) As MemberCount
From tblMembers
Group By city_id
Order By Count(*) Desc
Limit 1
) As MostMembers
On MostMembers.city_id = C.id
select c.id, c.name, count(*)
from tblcities c, tblmembers m
where c.id = m.city_id
group by c.id, c.name
order by count(*) desc
limit 1