Get count summed across two columns in SQL? - sql

I am working in Postgres and have the following accounts table:
account_number | integer
country1 | character varying(1000)
country2 | character varying(1000)
I want to get a count of accounts in each country, regardless of whether the country is country1 or country2.
So if the content of the table was:
account_number,country1,country2
123,France,Germany
124,Switzerland,France
125,Germany
Then the desired output from the query would be:
France,2
Germany,2
Switzerland,1
I know how to do this for one country at a time (select country1, count(*) from accounts group by country1) but not for both countries simultaneously.

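You can try the query below. It stacks country1 and country2 into a single column with UNION ALL and then counts; the WHERE clause skips accounts that have no second country:
with cte as
(
select account_number, country1 as country
from accounts
union all
select account_number, country2
from accounts
)
select country, count(*) as cnt
from cte
where country is not null
group by country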

I recommend unpivoting the data using a lateral join and then aggregating:
select country, count(*)
from accounts cross join lateral
(values (country1), (country2)
) v(country)
where v.country is not null
group by country;
In addition to being a more concise way to write the query, it should be faster because the table is scanned only once. This could be a very big win if the "table" is really a view or subquery.
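As a quick check of the expected output, here is a minimal sketch that runs the UNION ALL variant against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres (an assumption made only so the snippet is self-contained); SQLite has no LATERAL, so the lateral form is not shown here.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_number INTEGER, country1 TEXT, country2 TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (123, "France", "Germany"),
        (124, "Switzerland", "France"),
        (125, "Germany", None),  # this account has no second country
    ],
)

query = """
WITH cte AS (
    SELECT country1 AS country FROM accounts
    UNION ALL
    SELECT country2 FROM accounts
)
SELECT country, COUNT(*) AS cnt
FROM cte
WHERE country IS NOT NULL   -- drop the missing country2 of account 125
GROUP BY country
ORDER BY country
"""
country_counts = conn.execute(query).fetchall()
print(country_counts)
```

Without the IS NOT NULL filter, the result would contain one extra group for the missing country2 value.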

Related

HADOOP HIVE QUERY - SQL

I wrote a query in Hive, but it is not working.
Query:
hive> select country, max(total_count) from (select country, count(airlineid) from airport group by country) t2;
It reports that the group by expression 'country' is missing.
There are multiple problems. If you format the query, you should be able to see them.
select
country, -- needs a group by on country after t2
max(total_count) -- the inner query must alias a column as total_count
from
(
select
country,
count(airlineid)
from
airport
group by
country
) t2;
Fixed SQL -
select
country,
max(total_count) max_total_count
from
(
select
country,
count(airlineid) as total_count
from
airport
group by
country
) t2
group by country
;
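The question does not say what result was wanted. If the goal was the highest per-country count, or the single country that has it (both are assumptions), a sketch along these lines may be closer to the intent. It uses Python's sqlite3 module rather than Hive, and the rows in airport are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (airlineid INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO airport VALUES (?, ?)",
    # Hypothetical rows: 3 for US, 2 for India, 1 for France.
    [(1, "US"), (2, "US"), (3, "US"), (4, "India"), (5, "India"), (6, "France")],
)

# Highest per-country count. The inner aggregate needs the alias
# total_count so that the outer query can refer to it.
max_count = conn.execute("""
SELECT MAX(total_count)
FROM (
    SELECT country, COUNT(airlineid) AS total_count
    FROM airport
    GROUP BY country
) t2
""").fetchone()[0]

# The country that has that count: sort the groups and keep the first one.
top_country = conn.execute("""
SELECT country, COUNT(airlineid) AS total_count
FROM airport
GROUP BY country
ORDER BY total_count DESC
LIMIT 1
""").fetchone()

print(max_count, top_country)
```

With ties for first place, LIMIT 1 returns only one of the tied countries.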

How to aggregate different CTEs in outer query SQL

I am trying to join two CTEs to get the difference between planned and actual performance, grouped by campaign id. Here is my example.
Every campaign can run in several countries, so how can I group at the end to get one row per campaign id?
CTE 1: (planned)
select
country
, campaign_id
, sum(sales) as planned_sales
from table x
group by 1,2
CTE 2: (Actual)
select
country
, campaign_id
, sum(sales) as actual_sales
from table y
group by 1,2
outer select
select
country,
planned_sales,
actual_sales
planned - actual as diff
from cte1
join cte2
on campaign_id = campaign_id
This should do it:
select
cte1.campaign_id,
sum(cte1.planned_sales),
sum(cte2.actual_sales),
sum(cte1.planned_sales) - sum(cte2.actual_sales) as diff
from cte1
join cte2
on cte1.campaign_id = cte2.campaign_id and cte1.country = cte2.country
group by 1
I would suggest using a full join, so that rows from both CTEs are kept, not just those that have a match in the other one. Your query is basically correct, but it needs a group by.
select campaign_id,
sum(cte1.planned_sales) as planned_sales,
sum(cte2.actual_sales) as actual_sales,
(coalesce(sum(cte1.planned_sales), 0) -
coalesce(sum(cte2.actual_sales), 0)
) as diff
from cte1 full join
cte2
using (campaign_id, country)
group by campaign_id;
That said, there is no reason why the CTEs should aggregate by both campaign and country. They could just aggregate by campaign id -- simplifying the query and improving performance.
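To illustrate aggregating by campaign only, here is a minimal sketch in Python's sqlite3 module. The table names planned and actual and all rows are hypothetical. Because older SQLite versions lack FULL JOIN, the sketch swaps in a different technique: it stacks both tables with UNION ALL and sums per campaign, which likewise keeps campaigns that appear in only one of the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for "table x" (planned) and "table y" (actual).
conn.execute("CREATE TABLE planned (country TEXT, campaign_id INTEGER, sales INTEGER)")
conn.execute("CREATE TABLE actual (country TEXT, campaign_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO planned VALUES (?, ?, ?)",
    [("DE", 1, 100), ("FR", 1, 50), ("DE", 2, 80)],
)
conn.executemany(
    "INSERT INTO actual VALUES (?, ?, ?)",
    [("DE", 1, 90), ("FR", 1, 70), ("US", 3, 40)],
)

query = """
WITH combined AS (
    -- planned rows contribute only to planned_sales
    SELECT campaign_id, sales AS planned_sales, 0 AS actual_sales FROM planned
    UNION ALL
    -- actual rows contribute only to actual_sales
    SELECT campaign_id, 0, sales FROM actual
)
SELECT campaign_id,
       SUM(planned_sales) AS planned_sales,
       SUM(actual_sales) AS actual_sales,
       SUM(planned_sales) - SUM(actual_sales) AS diff
FROM combined
GROUP BY campaign_id
ORDER BY campaign_id
"""
per_campaign = conn.execute(query).fetchall()
print(per_campaign)
```

Country never appears in the grouping, so each campaign yields exactly one row.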

how can I select rows that column does NOT have more than 1 value?

I am very new to SQL and I am wondering how to solve this issue. For example, my table looks as follows:
As you can see in the table, item_id 1 appears in both city_id 1 and 2, and so does item_id 4, but I want to get all the items that appear in only one city_id.
In this example, these would be item_id 2 (appearing only in city_id 2) and item_id 3 (appearing in city_id 1).
Use aggregation on item_id and count distinct values of city_id. The having clause can be used to filter on aggregates.
select item_id from mytable group by item_id having count(distinct city_id) = 1
You can use the following query:
SELECT item_id
FROM table_name
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
In case you want to see the city_id too, you can use this query:
SELECT item_id, MIN(city_id) AS city_id
FROM example
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
Since there is only one city_id you can use MIN or MAX to get the id.
You want all the item ids that have only one distinct city:
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
It works by counting the different values that city_id has for the same item_id. For item ids that repeat a lot but always have the same city_id, the count of unique city_id values is 1, and we can look for these using a HAVING clause. HAVING is like a WHERE clause that runs after the GROUP BY operation is completed. It is the conceptual equivalent of this:
SELECT item_id
FROM
(
SELECT item_id, count(distinct city_id) as cdci
FROM table
GROUP BY item_id
) x
WHERE cdci = 1
If you want the city id too you can either get the MAX city (because in this case there is only one city so it's safe to do):
SELECT item_id, MAX(city_id) as city_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
or you could join this query back to the item table as a subquery:
SELECT t.*
FROM
(
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
) x
INNER JOIN
table t
ON x.item_id = t.item_id
This technique is the more general process for performing a group by that finds some particular set of rows and then brings in the rest of the data from those rows. You can't always stick every other column you want in a MAX because it will mix row data up, and you can't put the extra columns in your group by because that will subdivide what you're grouping on, giving the wrong results. Doing the group as a subquery and joining it back is a typical way to get all the row data when you have to group it to find which rows are interesting.
In your case this form of query will bring back all the duplicated rows (whereas the group by/max won't). If you don't want the duplicate rows, you can make the top line SELECT DISTINCT t.*, but don't make a habit of slapping distinct in to get rid of duplicated rows. If your tables don't have duplicates to start with but you suddenly get duplicated rows after writing a JOIN, google what a Cartesian product is in database queries and how to prevent it.
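The difference between the plain HAVING query and the join-back form can be seen on a small data set. This is a sketch using Python's sqlite3 module. The table name item_city and its rows are hypothetical, chosen to match the description in the question (items 1 and 4 in two cities, items 2 and 3 in one), plus one repeated row for item 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_city (item_id INTEGER, city_id INTEGER)")
conn.executemany(
    "INSERT INTO item_city VALUES (?, ?)",
    # Hypothetical rows; (3, 1) is stored twice on purpose.
    [(1, 1), (1, 2), (2, 2), (3, 1), (3, 1), (4, 1), (4, 2)],
)

# One row per item that appears in exactly one distinct city.
single_city_items = conn.execute("""
SELECT item_id, MIN(city_id) AS city_id
FROM item_city
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
ORDER BY item_id
""").fetchall()

# Join-back form: returns every original row of those items,
# so the repeated (3, 1) row shows up twice.
joined_back = conn.execute("""
SELECT t.item_id, t.city_id
FROM (
    SELECT item_id
    FROM item_city
    GROUP BY item_id
    HAVING COUNT(DISTINCT city_id) = 1
) x
INNER JOIN item_city t ON x.item_id = t.item_id
ORDER BY t.item_id
""").fetchall()

print(single_city_items)
print(joined_back)
```

Item 3 still qualifies even though it has two rows, because COUNT(DISTINCT city_id) counts the city only once.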
You just need a group by on item id with a having clause:
Select item_id from table
group by item_id
having count(distinct city_id) = 1
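Also, if you want to keep individual rows instead of collapsing them, you can rank the rows per item. The rank has to be computed in a subquery, because the alias rn cannot be referenced in the where clause of the same query level:
Select item_id, city_id from
(Select item_id, city_id, rank()
over(partition by item_id order by city_id)
rn
From table) x where rn=1;
Note that rn=1 keeps the rows with the lowest city_id for every item, so by itself it does not exclude items that appear in several cities.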

How to create an additional column with the percentages related to a count distinct statement

I'm trying to query each distinct medical speciality (e.g. oncologist, pediatrician, etc.) in a table and then count the number of times a claim (claim_id) is linked to it, which I've done using this:
select distinct specialization, count(distinct claim_id) AS Claim_Totals
from table1
group by specialization
order by Claim_Totals DESC
However, I also want to include an additional column which lists the % that each speciality makes up in the table (based on the number of claim_id related to it). So for instance, if there were 100 total claims and "cardiologist" had 25 claim_id records related to it, "oncologist" had 15, "general surgeon" had 10, and so forth, I want the output to look like this:
specialization | Claims_Totals | PERCENTAGE
___________________________________________
cardiologist 25 25%
oncologist 15 15%
general surgeon 10 10%
Could you do this? I'm not familiar with Barbaros's syntax; if that works, it's more concise and better.
select specialization, count(distinct claim_id) AS Claim_Totals, count(distinct claim_id)/total_claims AS percentage_calc
from table1
INNER JOIN ( SELECT COUNT(DISTINCT claim_id)*1.0000 AS total_claims
FROM table1 ) TMP
ON 1 = 1
group by specialization, total_claims
order by Claim_Totals DESC
Or, with a scalar subquery instead of the join:
select specialization,
count(distinct claim_id) AS claim_by_spec,
count(distinct claim_id)/
( SELECT COUNT(DISTINCT claim_id)*1.0000
FROM table1 ) AS percentage_calc
from table1
group by specialization
order by claim_by_spec DESC
You can use sum(count(distinct)) over() to get the overall claims and use it in the denominator to get the percentage.
select specialization
,count(distinct claim_id) AS Claim_Totals
,round(100.0*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3) as percentage
from table1
group by specialization
You can append either
,concat_ws('',count(distinct claim_id),'%') as percentage
or
,concat(count(distinct claim_id),'%') as percentage
to the end of the select list.
By the way, the distinct before specialization in the select list is redundant, since specialization is already in the group by list.
Because you are using count(distinct), window functions are less useful. You can try:
select t1.specialization,
count(distinct t1.claim_id) AS Claim_Totals,
100.0 * count(distinct t1.claim_id) / tt1.num_claims as percentage
from table1 t1 cross join
(select count(distinct claim_id) as num_claims
from table1
) tt1
group by t1.specialization, tt1.num_claims
order by Claim_Totals DESC
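For a concrete feel of the numbers, here is a small sketch of the percentage calculation using Python's sqlite3 module. The rows in table1 are invented for illustration, and the total is taken from a scalar subquery; multiplying by 100.0 forces decimal division so the result is not truncated to an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (specialization TEXT, claim_id INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        # Hypothetical rows; claim 1 is listed twice to show that
        # COUNT(DISTINCT ...) counts it once.
        ("cardiologist", 1),
        ("cardiologist", 1),
        ("cardiologist", 2),
        ("oncologist", 3),
        ("general surgeon", 4),
    ],
)

query = """
SELECT specialization,
       COUNT(DISTINCT claim_id) AS claim_totals,
       100.0 * COUNT(DISTINCT claim_id)
             / (SELECT COUNT(DISTINCT claim_id) FROM table1) AS percentage
FROM table1
GROUP BY specialization
ORDER BY claim_totals DESC, specialization
"""
claim_shares = conn.execute(query).fetchall()
print(claim_shares)
```

There are four distinct claims in total, so the specialization with two of them gets 50 percent and the others 25 percent each.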

How to retrieve information from table in one statement when the result has different numbers of rows?

I want to retrieve different pieces of information from the same table in one statement, but the results have different numbers of rows.
The first select returns five rows and the second select returns three rows because some prices are null. I thought that if I could put zero instead of null, both would return the same number of rows, but I don't know how to do that. Or is there another solution?
select count(ID), Land
from Film_ha2911
group by Land
union
select count(ID)
from Film_ha2911
where Price is not null
group by Land;
UNION requires that the number and types of the columns in each select correspond,
so in your case you should use null for the column that is not selected:
select count(ID), Land
from Film_ha2911
group by Land
union
select count(ID), null
from Film_ha2911
where Price is not null
group by Land;
But in this case it seems you need a left join between two subqueries on land:
select t1.count1, t1.land , t2.count2
from (
select count(ID) count1, Land
from Film_ha2911
group by Land
) t1
left join (
select count(ID) count2, land
from Film_ha2911
where Price is not null
group by Land
) t2 on t1.land = t2.land
The desired result can be achieved with a single SELECT, without UNION.
An extra column, PriceNotNull, indicates whether the Price value is filled in or not:
SELECT
Land,
CASE WHEN Price IS NOT NULL THEN 'True' ELSE 'False' END PriceNotNull,
COUNT(ID) AS Count_ID
FROM Film_ha2911
GROUP BY Land, CASE WHEN Price IS NOT NULL THEN 'True' ELSE 'False' END
You can just use count():
select Land, count(*) as total_rows,
count(price) as total_with_price
from Film_ha2911
group by Land;
count() counts the number of non-NULL values, so no special logic is needed to count non-NULL values. By count(id) I assume you want to count all the rows. count(*) is more explicit -- as would count(1) which some people prefer.
If you actually want this on separate rows, I would add an indicator for what the count means:
select Land, 'total rows' as which, count(*) as total_rows
from Film_ha2911
group by Land
union all
select Land, 'with price', count(price)
from Film_ha2911
group by Land;
However, I think the first version with two separate columns is more useful.
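The behaviour of count(*) versus count(price) is easy to verify on a few rows. The sketch below uses Python's sqlite3 module; the column types and the rows in Film_ha2911 are assumptions made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Film_ha2911 (ID INTEGER, Land TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Film_ha2911 VALUES (?, ?, ?)",
    [
        # Hypothetical rows; None becomes a NULL price.
        (1, "DE", 9.99),
        (2, "DE", None),
        (3, "FR", None),
        (4, "US", 5.0),
        (5, "US", 7.5),
    ],
)

query = """
SELECT Land,
       COUNT(*) AS total_rows,           -- every row in the group
       COUNT(Price) AS total_with_price  -- only rows where Price is not NULL
FROM Film_ha2911
GROUP BY Land
ORDER BY Land
"""
counts_per_land = conn.execute(query).fetchall()
print(counts_per_land)
```

A land whose films all lack a price still gets a row, with 0 in the second count, which is what the union of two separately filtered queries could not provide.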