Count results that have different column value related to same ID - sql

I'm new to SQL and looking for help on how to best do this.
I have 2 tables with the following columns:
Investors: Round ID, Investor Name, Investor City, Investor Country
Rounds: Round ID, Company Name, Company City, Company Country
I joined them to get this result
Round ID | Investor Country | Company Country
---------+------------------+----------------
1        | US               | Spain
1        | UK               | Spain
1        | Spain            | Spain
2        | France           | Germany
2        | UK               | Germany
3        | UK               | Italy
3        | Italy            | Italy
I need the number of investors per Round ID whose country differs from the Company Country. So for Round 1 I would get 2, for Round 2 it's 0, and for Round 3 it's 1.
How could I do this?
Thank you for your help!

Just use conditional aggregation:
select round_id,
sum(case when investor_country <> company_country then 1 else 0 end) as cnt
from t
group by round_id;

Looking at your expected output, I think you want the count to be 0 when a round has no record where the investor country equals the company country, and otherwise the count of all records where the two differ.
You can use conditions as follows:
select round_id,
case when count(case when investor_country = company_country then 1 end) = 0
then 0
else count(case when investor_country <> company_country then 1 end)
end as cnt
from your_table t
group by round_id;
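To see how the two readings of the question differ, here is a minimal sketch that runs both queries against the sample rows using Python's built-in sqlite3 module. The table name `joined` and the snake_case column names are assumptions made for illustration; they are not the asker's real schema.

```python
import sqlite3

# Rows copied from the joined result in the question.
# The table name `joined` is hypothetical.
rows = [
    (1, "US", "Spain"), (1, "UK", "Spain"), (1, "Spain", "Spain"),
    (2, "France", "Germany"), (2, "UK", "Germany"),
    (3, "UK", "Italy"), (3, "Italy", "Italy"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined (round_id INTEGER, investor_country TEXT, company_country TEXT)"
)
conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)

# Plain conditional aggregation: count every investor whose country differs.
plain = conn.execute("""
    SELECT round_id,
           SUM(CASE WHEN investor_country <> company_country THEN 1 ELSE 0 END) AS cnt
    FROM joined
    GROUP BY round_id
    ORDER BY round_id
""").fetchall()

# Guarded variant: report 0 when the round has no investor from the company's country.
guarded = conn.execute("""
    SELECT round_id,
           CASE WHEN COUNT(CASE WHEN investor_country = company_country THEN 1 END) = 0
                THEN 0
                ELSE COUNT(CASE WHEN investor_country <> company_country THEN 1 END)
           END AS cnt
    FROM joined
    GROUP BY round_id
    ORDER BY round_id
""").fetchall()

print(plain)    # [(1, 2), (2, 2), (3, 1)]
print(guarded)  # [(1, 2), (2, 0), (3, 1)]
```

Only the guarded variant gives 0 for Round 2, which is what the question's expected output shows.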

If you need the count of rows where the countries differ:
SELECT
RoundId,
SUM(IIF(InvestorCountry != CompanyCountry,1,0)) AS Count
FROM
YOUR_TABLE_OR_VIEW
GROUP BY
RoundId
If you need the count of differing rows, but want zero when every row in a round differs:
SELECT
t.RoundId,
IIF(t.Count = t.DifferentCount,0,t.DifferentCount) 'Count'
FROM
(
SELECT
RoundId,
SUM(1) AS 'Count',
SUM(IIF(InvestorCountry != CompanyCountry,1,0)) AS 'DifferentCount'
FROM
YOUR_TABLE_OR_VIEW
GROUP BY
RoundId
)t

Related

How do I calculate the percent of the total per year when COUNT and WHERE is used?

In SQL Server, I have a query
SELECT season, COUNT(DISTINCT player_name) AS 'No. of Foreign Players'
FROM nbastats
WHERE country <> 'USA'
GROUP BY season
It returns these results
id | season  | No. of Foreign Players
---+---------+-----------------------
1  | 1996-97 | 9
2  | 1997-98 | 14
3  | 1998-99 | 22
4  | 1999-00 | 24
5  | 2000-01 | 40
6  | 2001-02 | 51
7  | 2002-03 | 62
What I'm trying to do is to instead get the percentage of foreign players (over total players) each season. The database only provides "country" so I assume I can only use
WHERE country <> 'USA'
and perhaps divide the total but I am unsure how to with WHERE in the way. Any help would be greatly appreciated!
I think you want a ratio of a conditional:
SELECT season,
COUNT(DISTINCT CASE WHEN country <> 'USA' THEN player_name END) * 1.0 / COUNT(DISTINCT player_name) AS foreign_ratio
FROM nbastats
GROUP BY season
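A small runnable sketch of the same idea, using Python's sqlite3 module with made-up rows (the player names and rows are invented for illustration, and SQLite stands in for SQL Server here). Moving the condition from WHERE into the CASE keeps all players available for the denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nbastats (season TEXT, player_name TEXT, country TEXT)")
# Invented sample rows; player A appears twice in 1996-97 to show why DISTINCT matters.
conn.executemany("INSERT INTO nbastats VALUES (?, ?, ?)", [
    ("1996-97", "A", "USA"), ("1996-97", "A", "USA"),
    ("1996-97", "B", "USA"), ("1996-97", "C", "USA"),
    ("1996-97", "D", "France"),
    ("1997-98", "A", "USA"), ("1997-98", "D", "France"),
])

# The CASE yields NULL for US players, and COUNT(DISTINCT ...) ignores NULLs,
# so the numerator counts only distinct foreign players.
ratios = conn.execute("""
    SELECT season,
           COUNT(DISTINCT CASE WHEN country <> 'USA' THEN player_name END) * 1.0
               / COUNT(DISTINCT player_name) AS foreign_ratio
    FROM nbastats
    GROUP BY season
    ORDER BY season
""").fetchall()

print(ratios)  # [('1996-97', 0.25), ('1997-98', 0.5)]
```

Multiply the ratio by 100 if a percentage is wanted instead of a fraction.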

How to count the number of times a specific text string appears and group it by other columns

I have a table population_table that contains columns with a user_id, provider_name, and city. I want to count the number of times a user appears in each city, per provider. So for instance, I would want the output to look something like this:
provider_name | Users | Atlanta | Chicago | New York
--------------+-------+---------+---------+---------
Alpha         | 100   | 50      | 25      | 25
Beta          | 200   | 100     | 75      | 25
Kappa         | 500   | 300     | 100     | 100
I tried using:
select provider_name, count (distinct user_id) AS Users, count(city) AS City
from population_table
group by provider_name
How can I write this query to get the breakdown of the users per provider per city?
I think you want conditional aggregation. It is not clear from your description that count(distinct) is necessary. So I would try this first:
select provider_name, count(*) AS Users,
sum(case when city = 'Atlanta' then 1 else 0 end) as Atlanta,
sum(case when city = 'Chicago' then 1 else 0 end) as Chicago,
sum(case when city = 'New York' then 1 else 0 end) as New_York
from population_table
group by provider_name;
If count(distinct) is necessary:
select provider_name, count(distinct user_id) AS Users,
count(distinct case when city = 'Atlanta' then user_id end) as Atlanta,
count(distinct case when city = 'Chicago' then user_id end) as Chicago,
count(distinct case when city = 'New York' then user_id end) as New_York
from population_table
group by provider_name
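The difference between the two queries above only shows up when a user has more than one row. A minimal sketch with invented rows, run through Python's sqlite3 module (SQLite is used here only as a convenient stand-in engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population_table (user_id INTEGER, provider_name TEXT, city TEXT)"
)
# Invented rows; user 5 appears twice for Beta in Chicago.
conn.executemany("INSERT INTO population_table VALUES (?, ?, ?)", [
    (1, "Alpha", "Atlanta"), (2, "Alpha", "Atlanta"),
    (3, "Alpha", "Chicago"), (4, "Alpha", "New York"),
    (5, "Beta", "Chicago"), (5, "Beta", "Chicago"),
    (6, "Beta", "New York"),
])

# Counts rows: a user with two rows is counted twice.
by_rows = conn.execute("""
    SELECT provider_name, COUNT(*) AS users,
           SUM(CASE WHEN city = 'Atlanta'  THEN 1 ELSE 0 END) AS atlanta,
           SUM(CASE WHEN city = 'Chicago'  THEN 1 ELSE 0 END) AS chicago,
           SUM(CASE WHEN city = 'New York' THEN 1 ELSE 0 END) AS new_york
    FROM population_table
    GROUP BY provider_name
    ORDER BY provider_name
""").fetchall()

# Counts distinct users: repeated rows for the same user collapse to one.
by_users = conn.execute("""
    SELECT provider_name, COUNT(DISTINCT user_id) AS users,
           COUNT(DISTINCT CASE WHEN city = 'Atlanta'  THEN user_id END) AS atlanta,
           COUNT(DISTINCT CASE WHEN city = 'Chicago'  THEN user_id END) AS chicago,
           COUNT(DISTINCT CASE WHEN city = 'New York' THEN user_id END) AS new_york
    FROM population_table
    GROUP BY provider_name
    ORDER BY provider_name
""").fetchall()

print(by_rows)   # [('Alpha', 4, 2, 1, 1), ('Beta', 3, 0, 2, 1)]
print(by_users)  # [('Alpha', 4, 2, 1, 1), ('Beta', 2, 0, 1, 1)]
```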
If you have a variable number of cities, I do not know how to supply the list in SparkSQL. But using pyspark, you could create output table from input like this:
counts = input.groupBy('provider_name', 'city').count().cache()
countsPerProvider = counts.groupBy('provider_name').sum('count').withColumnRenamed("sum(count)", "users")
pivoted = counts.groupBy("provider_name").pivot("city").sum('count')
table = pivoted.join(countsPerProvider, pivoted["provider_name"] == countsPerProvider["provider_name"]).select(pivoted["*"], countsPerProvider["users"])

Challenge with Not Equal to operator

I have a problem to solve. I would like to get the countries that have no rows with gender Female from the following table, using only the where clause. I don't want to use a subquery like: select country from table where country not in (select country from table where gender='Female')
Any ideas ?
ID Name Gender Country
1 Jhon Male USA
2 Katie Female USA
3 Steave Male UK
4 Gerry Female UK
5 Brad Male AUS
Regards,
Chandra.
Use not exists
select t.*
from table t
where not exists (select 1
from table
where Country = t.Country and
Gender = 'Female'
);
You can also use group by like this:
select Country
from table t
group by Country
having sum(case when Gender = 'Female' then 1 else 0 end) = 0;
You could avoid subquery and get full rows by using:
SELECT TOP 1 WITH TIES *
FROM tab
ORDER BY SUM(CASE WHEN Gender='Female' THEN 1 ELSE 0 END)
OVER(PARTITION BY Country);
DBFiddle Demo - SQL Server
You can do:
select country
from t
except
select country
from t
where gender = 'Female';
As a set operator, except removes duplicates.
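For anyone who wants to try these against the sample data, here is a minimal sketch using Python's sqlite3 module. The table name `people` is an assumption (the question just calls it `table`, which is a reserved word), and SQLite stands in for whatever engine the asker uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `people` is a hypothetical table name; rows are copied from the question.
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, gender TEXT, country TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", [
    (1, "Jhon", "Male", "USA"),
    (2, "Katie", "Female", "USA"),
    (3, "Steave", "Male", "UK"),
    (4, "Gerry", "Female", "UK"),
    (5, "Brad", "Male", "AUS"),
])

# NOT EXISTS: full rows from countries that have no Female row.
not_exists = conn.execute("""
    SELECT p.* FROM people p
    WHERE NOT EXISTS (SELECT 1 FROM people
                      WHERE country = p.country AND gender = 'Female')
""").fetchall()

# GROUP BY / HAVING: countries whose Female count is zero.
having = conn.execute("""
    SELECT country FROM people
    GROUP BY country
    HAVING SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) = 0
""").fetchall()

# EXCEPT: all countries minus those with at least one Female row.
except_rows = conn.execute("""
    SELECT country FROM people
    EXCEPT
    SELECT country FROM people WHERE gender = 'Female'
""").fetchall()

print(not_exists)   # [(5, 'Brad', 'Male', 'AUS')]
print(having)       # [('AUS',)]
print(except_rows)  # [('AUS',)]
```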
Maybe I misunderstood your question, but why don't you use:
SELECT country FROM table WHERE gender NOT IN('Female')
Or is it a sub query?

SQLite percentages with small values

So I have this table of subscribers and the country they are in.
UserID | Name | Country
-------+-------------------+------------
1 | Zaphod Beeblebrox | UK
2 | Arthur Dent | UK
3 | Gene Kelly | USA
4 | Nat King Cole | USA
I need to produce a list of all the users by percentage from each of the countries. I also need all the smaller member countries (under 1%) to be collapsed into an "OTHERS" category.
I can accomplish a simple "top x" of members trivially with a
SELECT COUNTRY, COUNT(*) AS POPULATION FROM SUBSCRIBERS GROUP BY COUNTRY ORDER BY POPULATION DESC LIMIT 10
and can generate the percentages by PHP server side code, but I don't quite know how to:
Do all of it in SQL including percentage calculations directly in the result
Club all under 1% members into a single OTHERS category.
So I need something like this:
Country | Population
--------+-----------
USA | 25.4%
Brazil | 12%
UK | 5%
OTHERS | 65%
Appreciate the help!
Here is a query for this. I used a subquery to count the total number of rows and then used that to get the percentage value for each country. The 'OTHERS' category is generated in a separate query. Rows are sorted by descending population with the OTHERS row last.
SELECT * FROM
(SELECT country , ROUND((100.0*COUNT(*)/count_all),1) ||'%' AS population
FROM (SELECT count(*) count_all FROM subscribers) AS sq,
subscribers s
WHERE (SELECT 100.0*count(*)/count_all
FROM subscribers s2
WHERE s2.country = s.country) >= 1
GROUP BY country
ORDER BY COUNT(*) DESC)
UNION ALL
SELECT 'OTHERS', IFNULL(ROUND(100.0*COUNT(*)/count_all,1),0.0) ||'%' AS population
FROM (SELECT count(*) count_all FROM subscribers) AS sq,
subscribers s
WHERE (SELECT 100.0*count(*)/count_all
FROM subscribers s2
WHERE s2.country = s.country) < 1
Ok I think I might have found a way to do this that's a hell of a lot quicker on execution speed:
SELECT territory,
Round(Sum(percentage), 3) AS Population
FROM (SELECT
Round((Count(*)*100.0)/(SELECT Count(*) FROM subscribers),3) AS Percentage,
CASE
WHEN ((Count(*)*100.0)/(SELECT Count(*) FROM subscribers)) > 2 THEN
country
ELSE 'Other'
END AS Territory
FROM subscribers
GROUP BY country
ORDER BY percentage DESC)
GROUP BY territory
ORDER BY population DESC;
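A runnable sketch of the same two-level grouping, using the 1% cutoff from the question and keeping the collapsed row last. The subscriber counts below are invented, and the label `OTHERS` and the alias `pct` are choices made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscribers (user_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
# Invented data: 200 subscribers, two of them in countries below 1%.
countries = ["USA"] * 120 + ["UK"] * 78 + ["Fiji", "Malta"]
conn.executemany(
    "INSERT INTO subscribers (country) VALUES (?)", [(c,) for c in countries]
)

# Inner query: per-country percentage, relabelled OTHERS when under 1%.
# Outer query: sum the percentages per label, real countries first, OTHERS last.
result = conn.execute("""
    SELECT territory, ROUND(SUM(pct), 1) AS population
    FROM (
        SELECT CASE
                   WHEN COUNT(*) * 100.0 / (SELECT COUNT(*) FROM subscribers) >= 1
                   THEN country
                   ELSE 'OTHERS'
               END AS territory,
               COUNT(*) * 100.0 / (SELECT COUNT(*) FROM subscribers) AS pct
        FROM subscribers
        GROUP BY country
    )
    GROUP BY territory
    ORDER BY territory = 'OTHERS', population DESC
""").fetchall()

print(result)  # [('USA', 60.0), ('UK', 39.0), ('OTHERS', 1.0)]
```

Multiplying by 100.0 rather than 100 keeps the division in floating point, so a country at 1.5% is not truncated to 1.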

Hive sql: count and avg

I've recently started learning Hive and I have a problem with a SQL query.
I have a JSON file with some information. I want to get each country's share of the total number of records. An example explains it better:
country times
USA 1
USA 1
USA 1
ES 1
ES 1
ENG 1
FR 1
Then with the following query:
select country, count(*) from data group by country;
I obtain:
country times
USA 3
ES 2
ENG 1
FR 1
Then I should get the following output:
country avg
USA 0,42 (3/7)
ES 0,28 (2/7)
ENG 0,14 (1/7)
FR 0,14 (1/7)
I don't know how I can obtain this output from the first table.
I tried:
select t1.country, avg(t1.tm)
from (
select country,count(*)as tm from data where not country is null group by country
) t1
group by t1.country;
but my output is wrong.
Thanks for help!! BR.
Divide each group's count by the total count to get the result. Use a subquery to find the total number of records in your table.
Try this:
select country, count(*)/IFNULL((select cast(count(*) as float) from data),0)
from data
group by country;
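The divide-by-total idea can be checked on the sample data with a minimal sketch. It runs on SQLite through Python's sqlite3 module purely because that is easy to try locally; this is an assumption of the example, and Hive's function names and subquery support may differ by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (country TEXT, times INTEGER)")
# Rows copied from the question: USA x3, ES x2, ENG x1, FR x1.
conn.executemany(
    "INSERT INTO data VALUES (?, 1)",
    [("USA",)] * 3 + [("ES",)] * 2 + [("ENG",), ("FR",)],
)

# Each group's row count divided by the table's total row count.
# Multiplying by 1.0 forces floating-point division.
shares = conn.execute("""
    SELECT country,
           ROUND(COUNT(*) * 1.0 / (SELECT COUNT(*) FROM data), 2) AS share
    FROM data
    WHERE country IS NOT NULL
    GROUP BY country
    ORDER BY share DESC, country
""").fetchall()

for country, share in shares:
    print(country, share)
```

USA comes out first with 3/7 of the rows, followed by ES with 2/7, then ENG and FR with 1/7 each.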