Combine SQL Queries in Same Table - sql

I am trying to combine the following queries:
SELECT country_name,
avg(value) as Entrance_Age
FROM `bigquery-public-data.world_bank_intl_education.international_education`
WHERE indicator_code = 'UIS.THAGE.0'
GROUP BY country_name
ORDER BY avg(value) DESC LIMIT 10
SELECT avg(value) as Illiterate
FROM `bigquery-public-data.world_bank_intl_education.international_education`
WHERE indicator_code = 'UIS.ILLPOP.AG25T64'
GROUP BY country_name
ORDER BY avg(value) DESC LIMIT 10
The output from the first query is shown here: https://i.stack.imgur.com/jsxx7.png
The goal is to add another column named "Illiterate" next to Entrance_Age, showing the illiteracy rate for each of these 10 countries. All the data is in the same table; each value is a statistic identified by its indicator_code.
I've tried multiple joins but can't seem to get one that works.
If there is anything I am missing from my question, please let me know.

You can use conditional aggregation:
SELECT country_name,
avg(case when indicator_code = 'UIS.THAGE.0' then value end) as Entrance_Age,
avg(case when indicator_code = 'UIS.ILLPOP.AG25T64' then value end) as illiterate
FROM `bigquery-public-data.world_bank_intl_education.international_education`
GROUP BY country_name
ORDER BY Entrance_Age DESC LIMIT 10
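As a minimal sketch of how conditional aggregation behaves, the snippet below runs the same pattern against an in-memory SQLite database from Python. The table name education and all values are made up for illustration; only the two indicator codes come from the question.

```python
import sqlite3

# Hypothetical miniature of the education table; the values are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE education (country_name TEXT, indicator_code TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO education VALUES (?, ?, ?)",
    [
        ("A", "UIS.THAGE.0", 18.0),
        ("A", "UIS.THAGE.0", 20.0),
        ("A", "UIS.ILLPOP.AG25T64", 100.0),
        ("B", "UIS.THAGE.0", 17.0),
        ("B", "UIS.ILLPOP.AG25T64", 300.0),
        ("B", "UIS.ILLPOP.AG25T64", 500.0),
    ],
)

# Each CASE yields NULL for rows of the other indicator, and AVG ignores NULLs,
# so every column averages only the rows that belong to its own indicator.
rows = conn.execute(
    """
    SELECT country_name,
           AVG(CASE WHEN indicator_code = 'UIS.THAGE.0' THEN value END) AS entrance_age,
           AVG(CASE WHEN indicator_code = 'UIS.ILLPOP.AG25T64' THEN value END) AS illiterate
    FROM education
    GROUP BY country_name
    ORDER BY entrance_age DESC
    LIMIT 10
    """
).fetchall()
print(rows)
```

Because both averages come from a single pass over one GROUP BY, no join between the two original queries is needed.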

Related

T-SQL query to find the required output

I am new to SQL queries. I have some data and am trying to produce the result shown below.
In my sample data, a customer ID repeats multiple times because of multiple locations. I want a query that produces the output shown in the output image, following these rules:
If the customer exists only once, I take that row.
If the customer exists more than once, I check the country; if Country = 'US', I take that row and discard the others.
If the customer exists more than once and no country is US, I pick the first row.
PLEASE NOTE: I have 35 columns and I don't want to change the row order, because I have to select the first row when a customer exists more than once and the country is not 'US'.
What I have tried: I tried the rank function but was unsuccessful. I'm not sure my approach is right. Could anyone share a T-SQL query for this problem?
Regards,
Rahul
Sample data:
Output required :
I have created a (short) dbfiddle
Short explanation (to just repeat the code here on SO):
Step1:
-- select everything, ranking 'US' rows first within each customer
SELECT
cust_id,
country,
sales,
CASE WHEN country='US' THEN 0 ELSE 1 END X,
ROW_NUMBER() OVER (PARTITION BY cust_id
ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END;
Step2:
-- filter only rows which are first row...
SELECT *
FROM (
SELECT
cust_id,
country,
sales,
CASE WHEN country='US' THEN 0 ELSE 1 END X,
ROW_NUMBER() OVER (PARTITION BY cust_id
ORDER BY (CASE WHEN country='US' THEN 0 ELSE 1 END)) R
FROM table1
-- ORDER BY cust_id, CASE WHEN country='US' THEN 0 ELSE 1 END
) x
WHERE x.R=1
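The two steps can be tried out with the sketch below, which uses Python's sqlite3 module (window functions need SQLite 3.25 or later) rather than SQL Server. One assumption is added: a hypothetical row_id column that records the original row order. A table has no inherent order, so without such a tiebreaker the "first" non-US row is not guaranteed. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_id is a hypothetical column that stands in for the original row order.
conn.execute(
    "CREATE TABLE table1 (row_id INTEGER, cust_id INTEGER, country TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "UK", 10),  # customer 1 appears once
        (2, 2, "FR", 20),  # customer 2 has a US row
        (3, 2, "US", 30),
        (4, 3, "DE", 40),  # customer 3 has no US row
        (5, 3, "CA", 50),
    ],
)

# US rows sort first inside each customer; row_id breaks ties among the rest.
rows = conn.execute(
    """
    SELECT cust_id, country, sales
    FROM (
        SELECT cust_id, country, sales,
               ROW_NUMBER() OVER (
                   PARTITION BY cust_id
                   ORDER BY CASE WHEN country = 'US' THEN 0 ELSE 1 END, row_id
               ) AS r
        FROM table1
    ) x
    WHERE x.r = 1
    ORDER BY cust_id
    """
).fetchall()
print(rows)
```

Customer 2 keeps its US row, while customers 1 and 3 keep their first row by row_id.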
I can't vouch for performance but it should work on SQL Server 2005. Assuming your table is named CustomerData try this:
select cust_id, country, Name, Sales, [Group]
from CustomerData
where country = 'US'
union
select c.* from CustomerData c
join (
select cust_id, min(country) country
from CustomerData
where cust_id not in (
select cust_id
from CustomerData
where country = 'US'
)
group by cust_id
) a on a.cust_id = c.cust_id and a.country = c.country
It works by selecting every row whose country is US and then unioning that with, for each customer that has no US row, the row whose country is alphabetically first (min(country)). If min() isn't picking the country you want, you'll need a different aggregate function that selects it.
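The behaviour of the union query, including the min() caveat, can be checked with a small sketch in Python's sqlite3 module. The rows are invented and only three of the columns are used; customer 3 is inserted with DE before CA on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerData (cust_id INTEGER, country TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO CustomerData VALUES (?, ?, ?)",
    [
        (1, "UK", 10),
        (2, "FR", 20),
        (2, "US", 30),
        (3, "DE", 40),  # inserted first for customer 3
        (3, "CA", 50),  # but 'CA' is the alphabetical minimum
    ],
)

rows = conn.execute(
    """
    SELECT cust_id, country, sales
    FROM CustomerData
    WHERE country = 'US'
    UNION
    SELECT c.cust_id, c.country, c.sales
    FROM CustomerData c
    JOIN (
        SELECT cust_id, MIN(country) AS country
        FROM CustomerData
        WHERE cust_id NOT IN (SELECT cust_id FROM CustomerData WHERE country = 'US')
        GROUP BY cust_id
    ) a ON a.cust_id = c.cust_id AND a.country = c.country
    ORDER BY 1
    """
).fetchall()
print(rows)
```

For customer 3 the query returns the CA row, not the DE row that was inserted first, which is the situation where a different selection rule would be needed.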

HIVE - Getting ALL columns of the table with COUNT(*) with DISTINCT values

I have the table below called Current_Table
I want the output to meet these requirements:
The column personalemailtrim must be DISTINCT
The column Occurrences must be greater than 1
The result must be ordered by the column personalemailtrim
The query I have built so far is wrong on many levels: GROUP BY can't be combined with DISTINCT, and using COUNT(*) with GROUP BY doesn't give me any results:
SELECT id,
personalemailtrim,
personworksatnumberofbsbs,
region,
district,
branch,
num,
countofapptsatbsb,
COUNT(personalemailtrim) occurrences
FROM Current_table
GROUP BY id,
personalemailtrim,
personworksatnumberofbsbs,
region,
district,
branch,
num,
countofapptsatbsb
HAVING COUNT(*) > 1
ORDER BY personalemailtrim
Any help is really appreciated. I tried breaking the code down in several ways, but I am stuck.
To elaborate further, the expected output should look like the result below.
As you can see:
Occurrences are > 1
personalemailtrim is now DISTINCT
I think you want:
select t.*
from (select t.*,
row_number() over (partition by personalemailtrim order by id) as seqnum
from Current_table t
) t
where seqnum = 1 and occurrences > 1;
This assumes that occurrences is the same for each personalemailtrim, which is consistent with your data and with your question.
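A minimal sketch of this de-duplication, run through Python's sqlite3 module instead of Hive (window functions need SQLite 3.25 or later). It assumes, as the answer does, that occurrences is already a column and is constant per email; the table is cut down to three columns and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_table (id INTEGER, personalemailtrim TEXT, occurrences INTEGER)"
)
conn.executemany(
    "INSERT INTO current_table VALUES (?, ?, ?)",
    [
        (1, "a@x.com", 2),
        (2, "a@x.com", 2),
        (3, "b@x.com", 1),  # dropped: occurrences is not > 1
        (4, "c@x.com", 3),
        (5, "c@x.com", 3),
        (6, "c@x.com", 3),
    ],
)

# seqnum = 1 keeps one row per email (the lowest id), which makes the email distinct.
rows = conn.execute(
    """
    SELECT id, personalemailtrim, occurrences
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY personalemailtrim ORDER BY id) AS seqnum
        FROM current_table t
    ) t
    WHERE seqnum = 1 AND occurrences > 1
    ORDER BY personalemailtrim
    """
).fetchall()
print(rows)
```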

How to work with problems correlated subqueries that reference other tables, without using Join

I am working with the BigQuery public dataset bigquery-public-data.austin_crime.crime. My goal is an output with three columns: the description (of the crime), the count of crimes with that description, and the top district for that description.
I am able to get the first two columns with this query:
select
a.description,
count(*) as district_count
from `bigquery-public-data.austin_crime.crime` a
group by description order by district_count desc
I was hoping to do this in a single query, so I tried to add a third column showing the top district for each description (crime) with the subquery below:
select
a.description,
count(*) as district_count,
(
select district from
( select
district, rank() over(order by COUNT(*) desc) as rank
FROM `bigquery-public-data.austin_crime.crime`
where description = a.description
group by district
) where rank = 1
) as top_District
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
The error I am getting is: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
I think I could do this with joins. Does anyone have a better solution, possibly one that does not use a join?
Below is for BigQuery Standard SQL
#standardSQL
SELECT description,
ANY_VALUE(district_count) AS district_count,
STRING_AGG(district ORDER BY cnt DESC LIMIT 1) AS top_district
FROM (
SELECT description, district,
COUNT(1) OVER(PARTITION BY description) AS district_count,
COUNT(1) OVER(PARTITION BY description, district) AS cnt
FROM `bigquery-public-data.austin_crime.crime`
)
GROUP BY description
-- ORDER BY district_count DESC
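The same idea (no join, no correlated subquery) can be tried locally with the sketch below. It is not the BigQuery query itself: SQLite has no STRING_AGG(... ORDER BY ... LIMIT 1), so this variant swaps in a GROUP BY per (description, district) followed by SUM() and ROW_NUMBER() window functions (SQLite 3.25 or later). The crime rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crime (description TEXT, district TEXT)")
conn.executemany(
    "INSERT INTO crime VALUES (?, ?)",
    [("THEFT", "A")] * 3
    + [("THEFT", "B")] * 2
    + [("BURGLARY", "C")] * 1
    + [("BURGLARY", "D")] * 2,
)

rows = conn.execute(
    """
    SELECT description, district_count, district AS top_district
    FROM (
        SELECT description, district,
               -- total number of crimes per description
               SUM(cnt) OVER (PARTITION BY description) AS district_count,
               -- rank districts inside each description by their crime count
               ROW_NUMBER() OVER (PARTITION BY description ORDER BY cnt DESC) AS rn
        FROM (
            SELECT description, district, COUNT(*) AS cnt
            FROM crime
            GROUP BY description, district
        )
    )
    WHERE rn = 1
    ORDER BY district_count DESC
    """
).fetchall()
print(rows)
```

If two districts tie for the top count, ROW_NUMBER() picks one of them arbitrarily; RANK() would return both.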

SQL PIVOT group by 2 columns

I have an attendance table as shown below. I want to group the rows by time and section; a NULL status means that the employee is absent:
Any idea how to generate output like the one below?
My current code:
SELECT TIME,COUNT(SECTION) AS SECTION,COUNT(STATUS) AS COUNT
FROM attendance_record
GROUP BY TIME,SECTION
ORDER BY TIME
If I understand your question, just use conditional aggregation:
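SELECT TIME, SECTION, COUNT(*) AS TOTAL,
COUNT(STATUS) AS PRESENT, -- counts non-NULL statuses; IN is a reserved word, so a different alias is used
(COUNT(*) - COUNT(STATUS)) AS ABSENT -- rows whose STATUS is NULL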
FROM attendance_record
GROUP BY TIME, SECTION
ORDER BY TIME
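The counting trick relies on COUNT(column) skipping NULLs while COUNT(*) counts every row. A minimal sketch with Python's sqlite3 module follows; the rows are invented, and the alias present is an assumption chosen here because IN is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance_record (time TEXT, section TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO attendance_record VALUES (?, ?, ?)",
    [
        ("08:00", "A", "IN"),
        ("08:00", "A", None),  # NULL status means absent
        ("08:00", "B", "IN"),
        ("09:00", "A", None),
    ],
)

rows = conn.execute(
    """
    SELECT time, section,
           COUNT(*) AS total,                  -- every row in the group
           COUNT(status) AS present,           -- only rows with a non-NULL status
           COUNT(*) - COUNT(status) AS absent  -- rows whose status is NULL
    FROM attendance_record
    GROUP BY time, section
    ORDER BY time, section
    """
).fetchall()
print(rows)
```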

How to do a Postgresql group aggregation: 2 fields using one to select the other

I have a table - Data - of rows, simplified, like so:
Name,Amount,Last,Date
A,16,31,1-Jan-2014
A,27,38,1-Feb-2014
A,12,34,1-Mar-2014
B,8,37,1-Jan-2014
B,3,38,1-Feb-2014
B,17,39,1-Mar-2014
I wish to group them similar to:
select Name,sum(Amount),aggr(Last),max(Date) from Data group by Name
For aggr(Last) I want the value of 'Last' from the row that contains max(Date)
So the result I want would be 2 rows
Name,Amount,Last,Date
A,55,34,1-Mar-2014
B,28,39,1-Mar-2014
i.e. in both cases, the value of Last is the one from the row that contained 1-Mar-2014
The query I'm actually running is basically the same, but with many more sum() fields and millions of rows, so I'm guessing an aggregate function could avoid extra queries for each group of incoming rows.
You can use row_number() and conditional aggregation:
select Name, sum(Amount),
max(case when seqnum = 1 then Last end) as Last,
max(date)
from (select d.*, row_number() over (partition by name order by date desc) as seqnum
from data d
) d
group by Name;
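Here is a minimal sketch of this query on the sample rows from the question, run with Python's sqlite3 module instead of PostgreSQL (window functions need SQLite 3.25 or later). Two assumptions are made for the sketch: the columns Last and Date are renamed to last_val and dt, and the dates are stored as ISO strings so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# last_val and dt are hypothetical names for the Last and Date columns.
conn.execute("CREATE TABLE data (name TEXT, amount INTEGER, last_val INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("A", 16, 31, "2014-01-01"),
        ("A", 27, 38, "2014-02-01"),
        ("A", 12, 34, "2014-03-01"),
        ("B", 8, 37, "2014-01-01"),
        ("B", 3, 38, "2014-02-01"),
        ("B", 17, 39, "2014-03-01"),
    ],
)

# seqnum = 1 marks the latest row per name; the CASE passes only that row's
# last_val to MAX, so the aggregate returns the value from the max-date row.
rows = conn.execute(
    """
    SELECT name,
           SUM(amount),
           MAX(CASE WHEN seqnum = 1 THEN last_val END) AS last_val,
           MAX(dt)
    FROM (
        SELECT d.*,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY dt DESC) AS seqnum
        FROM data d
    ) d
    GROUP BY name
    ORDER BY name
    """
).fetchall()
print(rows)
```

The result matches the two rows the question asks for: 55 and 34 for A, 28 and 39 for B, both dated 1 March 2014.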