SQL Case When Repeating Conditions - sql

I have a table called country that has 2 columns: orderid, country_code.
I need to do the following CASE WHEN:
SELECT
CASE
WHEN country_code IN ('PT','IT','ES','PL','AD') THEN 'SWE'
WHEN country_code IN ('MD','BG','BA','SI','HR','ME','RO','RS') THEN 'SEE'
WHEN country_code IN ('CI','GH','MA','NG','UG','KE','TN') THEN 'AFRICA'
WHEN country_code IN ('CI','GH','NG','UG','KE','TN') THEN 'SSA'
WHEN country_code IN ('UA','BY','GE','KZ','KG','AM') THEN 'ECA'
END AS region,
COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
GROUP BY 1
However, when I run the code, region SSA doesn't appear because the preceding condition (AFRICA) already matches the SSA countries (SSA is the same as AFRICA but without MA).
How can I get the complete number of orders for both AFRICA and SSA?
rdbms: Amazon Redshift
EDIT:
This is my table right now:
SWE 200
SEE 500
AFRICA 350
SSA 0 <--- (it doesn't appear because the conditions were met by AFRICA region)
ECA 200
And I need the following:
SWE 200
SEE 500
AFRICA 350 --> (MA represent 150 orders)
SSA 200 --> (sames as AFRICA, but without MA)
ECA 200

You could define a common table expression (CTE) that contains all country codes and their corresponding region, then join it with your data.
WITH regions AS
(
SELECT CAST('PT,IT,ES,PL,AD' AS VARCHAR) AS country_codes, 'SWE' AS region_code
UNION ALL SELECT CAST('MD,BG,BA,SI,HR,ME,RO,RS' AS VARCHAR) AS country_codes, 'SEE' AS region_code
UNION ALL SELECT CAST('CI,GH,MA,NG,UG,KE,TN' AS VARCHAR) AS country_codes, 'AFRICA' AS region_code
UNION ALL SELECT CAST('CI,GH,NG,UG,KE,TN' AS VARCHAR) AS country_codes, 'SSA' AS region_code
UNION ALL SELECT CAST('UA,BY,GE,KZ,KG,AM' AS VARCHAR) AS country_codes, 'ECA' AS region_code
)
SELECT
regions.region_code AS region,
COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
INNER JOIN regions ON STRPOS(regions.country_codes, country.country_code) > 0
GROUP BY regions.region_code;
Demo: http://sqlfiddle.com/#!17/9eecb/101807
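A variation on the same idea, sketched here with Python's built-in sqlite3 module rather than Redshift: store one row per (region, country) pair in a mapping table and join on equality instead of STRPOS. The table name region_map and the sample orders are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (orderid INTEGER, country_code TEXT);
    INSERT INTO country VALUES (1,'MA'), (2,'MA'), (3,'NG'), (4,'KE'), (5,'PT');

    -- Hypothetical mapping table: one row per (region, country) pair,
    -- so a country can belong to several regions at once.
    CREATE TABLE region_map (region_code TEXT, country_code TEXT);
    INSERT INTO region_map VALUES
        ('AFRICA','MA'), ('AFRICA','NG'), ('AFRICA','KE'),
        ('SSA','NG'), ('SSA','KE'),
        ('SWE','PT');
""")

rows = conn.execute("""
    SELECT m.region_code AS region,
           COUNT(DISTINCT c.orderid) AS amount_of_orders
    FROM country c
    INNER JOIN region_map m ON m.country_code = c.country_code
    GROUP BY m.region_code
    ORDER BY m.region_code
""").fetchall()

# Orders from NG and KE are counted under both AFRICA and SSA.
region_counts = dict(rows)
print(region_counts)
```

Because the join fans each order out to every region its country belongs to, overlapping regions need no special handling.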

You want to concatenate strings. For lack of a CONCAT_WS function that skips nulls, just prefix every region with a comma and trim the resulting string at the end.
SELECT
TRIM(',' FROM
CASE WHEN country_code IN ('PT','IT','ES','PL','AD') THEN ',SWE' ELSE '' END ||
CASE WHEN country_code IN ('MD','BG','BA','SI','HR','ME','RO','RS') THEN ',SEE' ELSE '' END ||
CASE WHEN country_code IN ('CI','GH','MA','NG','UG','KE','TN') THEN ',AFRICA' ELSE '' END ||
CASE WHEN country_code IN ('CI','GH','NG','UG','KE','TN') THEN ',SSA' ELSE '' END ||
CASE WHEN country_code IN ('UA','BY','GE','KZ','KG','AM') THEN ',ECA' ELSE '' END
) AS regions,
COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
GROUP BY regions
ORDER BY regions;
UPDATE
You have completely changed your request. You answered that you want a column with a region list (e.g. 'AFRICA,SSA'), but now you show that you want one row per single region. Please don't make such fundamental changes to your requests. In any case, what you want now is a union of single-region queries:
SELECT 'SWE' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
WHERE country_code IN ('PT','IT','ES','PL','AD')
UNION ALL
SELECT 'SEE' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
WHERE country_code IN ('MD','BG','BA','SI','HR','ME','RO','RS')
UNION ALL
SELECT 'AFRICA' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
WHERE country_code IN ('CI','GH','MA','NG','UG','KE','TN')
UNION ALL
SELECT 'SSA' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
WHERE country_code IN ('CI','GH','NG','UG','KE','TN')
UNION ALL
SELECT 'ECA' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
FROM country
WHERE country_code IN ('UA','BY','GE','KZ','KG','AM');
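To see why the original CASE expression loses SSA while the union does not, here is a small sketch using Python's sqlite3 module (not Redshift; the three sample orders are invented). CASE returns the first branch that matches, so a row can land in only one group, whereas each UNION ALL branch filters the table independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (orderid INTEGER, country_code TEXT);
    INSERT INTO country VALUES (1,'MA'), (2,'NG'), (3,'KE');
""")

# CASE stops at the first matching branch, so the SSA branch is never reached.
case_rows = conn.execute("""
    SELECT CASE
             WHEN country_code IN ('MA','NG','KE') THEN 'AFRICA'
             WHEN country_code IN ('NG','KE')      THEN 'SSA'
           END AS region,
           COUNT(DISTINCT orderid) AS amount_of_orders
    FROM country
    GROUP BY region
""").fetchall()

# Each UNION ALL branch scans the table on its own, so rows can be counted twice.
union_rows = conn.execute("""
    SELECT 'AFRICA' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
    FROM country WHERE country_code IN ('MA','NG','KE')
    UNION ALL
    SELECT 'SSA' AS region, COUNT(DISTINCT orderid) AS amount_of_orders
    FROM country WHERE country_code IN ('NG','KE')
""").fetchall()

case_counts = dict(case_rows)
union_counts = dict(union_rows)
print(case_counts, union_counts)
```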

Related

How to write a BigQuery query that produces the count of the unique transactions and the combination of column names populated

I’m trying to write a query in BigQuery that produces the count of the unique transactions and the combination of column names populated.
I have a table:
TRAN CODE
Full Name
Given Name
Surname
DOB
Phone
The result set I’m after is:
TRAN CODE
UNIQUE TRANSACTIONS
NAME OF POPULATED COLUMNS
A
3
Full Name
A
4
Full Name,Phone
B
5
Given Name,Surname
B
10
Given Name,Surname,DOB,Phone
The result set shows that for TRAN CODE A
3 distinct customers provided Full Name
4 distinct customers provided Full Name and Phone #
For TRAN CODE B
5 distinct customers provided Given Name and Surname
10 distinct customers provided Given Name, Surname, DOB, Phone #
Currently to produce my results I’m doing it manually.
I tried using ARRAY_AGG but couldn’t get it working.
Any advice would be appreciated.
Thank you.
I think you want something like this:
select tran_code,
array_to_string(array[case when full_name is not null then 'full_name' end,
case when given_name is not null then 'given_name' end,
case when surname is not null then 'surname' end,
case when dob is not null then 'dob' end,
case when phone is not null then 'phone' end
], ','),
count(*)
from t
group by 1, 2
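The grouping logic of that query can be illustrated outside BigQuery with a few lines of plain Python. This is only a sketch: the sample rows and the make_row helper are made up, and it counts rows (like count(*)) rather than distinct customers.

```python
from collections import Counter

COLUMNS = ["full_name", "given_name", "surname", "dob", "phone"]

def make_row(tran_code, **values):
    """Hypothetical helper: build a row where unspecified columns are NULL (None)."""
    row = {col: None for col in COLUMNS}
    row.update(values)
    row["tran_code"] = tran_code
    return row

rows = [
    make_row("A", full_name="Ann Lee"),
    make_row("A", full_name="Bob Ray"),
    make_row("A", full_name="Cy Poe", phone="555-0100"),
    make_row("B", given_name="Di", surname="Fox"),
]

def populated(row):
    # Same role as the array of CASE expressions: keep only non-NULL column names.
    return ",".join(col for col in COLUMNS if row[col] is not None)

# Equivalent of GROUP BY tran_code, <populated column list> with COUNT(*).
counts = Counter((row["tran_code"], populated(row)) for row in rows)
for (tran_code, cols), n in sorted(counts.items()):
    print(tran_code, n, cols)
```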
Consider the approach below. It has no dependency on column names other than TRAN_CODE, so it is quite generic.
select TRAN_CODE,
count(distinct POPULATED_VALUES) as UNIQUE_TRANSACTIONS,
POPULATED_COLUMNS
from (
select TRAN_CODE,
( select as struct
string_agg(col, ', ' order by offset) POPULATED_COLUMNS,
string_agg(val order by offset) POPULATED_VALUES,
string_agg(cast(offset as string) order by offset) pos
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+?)":')) col with offset
join unnest(regexp_extract_all(to_json_string(t), r'"[^"]+?":("[^"]+?"|null)')) val with offset
using(offset)
where val != 'null'
and col != 'TRAN_CODE'
).*
from `project.dataset.table` t
)
group by TRAN_CODE, POPULATED_COLUMNS
order by TRAN_CODE, any_value(pos)
Gordon Linoff's solution is the best, but an alternative would be to do the following:
SELECT
TRAN_CODE,
COUNT(TRAN_ROW) AS unique_transactions,
populated_columns
FROM (
SELECT
TRAN_CODE,
TRAN_ROW,
# COUNT(value) AS unique_transactions,
STRING_AGG(field, ",") AS populated_columns
FROM (
SELECT
* EXCEPT(DOB),
CAST(DOB AS STRING ) AS DOB,
ROW_NUMBER() OVER () AS TRAN_ROW
FROM
sample) UNPIVOT(value FOR field IN (Full_name,
Given_name,
Surname,
DOB,
Phone))
GROUP BY
TRAN_CODE,
TRAN_ROW )
GROUP BY
TRAN_CODE,
populated_columns
But this should be more expensive...

Select columns maximum and minimum value for all records

I have a table as below. For every record, I want to get the names of the columns holding the maximum and minimum values, excluding the population column (which of course always has the maximum value).
State Population age_below_18 age_18_to_50 age_50_above
1 1000 250 600 150
2 4200 400 300 3500
Result :
State Population Maximum_group Minimum_group Max_value Min_value
1 1000 age_18_to_50 age_50_above 600 150
2 4200 age_50_above age_18_to_50 3500 300
Assuming none of the values are NULL, you can use greatest() and least():
select state, population,
(case when age_below_18 = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_below_18'
when age_18_to_50 = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_18_to_50'
when age_50_above = greatest(age_below_18, age_18_to_50, age_50_above)
then 'age_50_above'
end) as maximum_group,
(case when age_below_18 = least(age_below_18, age_18_to_50, age_50_above)
then 'age_below_18'
when age_18_to_50 = least(age_below_18, age_18_to_50, age_50_above)
then 'age_18_to_50'
when age_50_above = least(age_below_18, age_18_to_50, age_50_above)
then 'age_50_above'
end) as minimum_group,
greatest(age_below_18, age_18_to_50, age_50_above) as maximum_value,
least(age_below_18, age_18_to_50, age_50_above) as minimum_value
from t;
If your result set is actually being generated by a query, there is likely a better approach.
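The greatest()/least() comparison can be tried locally with Python's sqlite3 module. This is a sketch under one substitution: SQLite has no greatest()/least(), so its multi-argument scalar max()/min() functions stand in for them. Each CASE branch compares a different column against the row-wise extreme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (state INTEGER, population INTEGER,
                    age_below_18 INTEGER, age_18_to_50 INTEGER, age_50_above INTEGER);
    INSERT INTO t VALUES (1, 1000, 250, 600, 150), (2, 4200, 400, 300, 3500);
""")

# max(a, b, c) / min(a, b, c) with several arguments are row-wise in SQLite,
# playing the role of Oracle's greatest() / least().
result = conn.execute("""
    SELECT state,
           CASE WHEN age_below_18 = max(age_below_18, age_18_to_50, age_50_above)
                THEN 'age_below_18'
                WHEN age_18_to_50 = max(age_below_18, age_18_to_50, age_50_above)
                THEN 'age_18_to_50'
                ELSE 'age_50_above'
           END AS maximum_group,
           CASE WHEN age_below_18 = min(age_below_18, age_18_to_50, age_50_above)
                THEN 'age_below_18'
                WHEN age_18_to_50 = min(age_below_18, age_18_to_50, age_50_above)
                THEN 'age_18_to_50'
                ELSE 'age_50_above'
           END AS minimum_group,
           max(age_below_18, age_18_to_50, age_50_above) AS maximum_value,
           min(age_below_18, age_18_to_50, age_50_above) AS minimum_value
    FROM t
    ORDER BY state
""").fetchall()

for row in result:
    print(row)
```

As in the answer, this assumes no NULLs; on ties the first matching branch wins.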
An alternative method "unpivots" the data and then reaggregates:
select state, population,
max(which) keep (dense_rank first order by val desc) as maximum_group,
max(which) keep (dense_rank first order by val asc) as minimum_group,
max(val) as maximum_value,
min(val) as minimum_value
from ((select state, population, 'age_below_18' as which, age_below_18 as val
from t
) union all
(select state, population, 'age_18_to_50' as which, age_18_to_50 as val
from t
) union all
(select state, population, 'age_50_above' as which, age_50_above as val
from t
)
) t
group by state, population;
This approach would likely perform worse than the first, although it is perhaps easier to implement as the number of values increases. However, Oracle 12C supports lateral joins, where a similar approach would have competitive performance.
with CTE as (
select T.*
--step2: rank the values
,RANK() OVER (PARTITION BY "State", "Population" order by "value") "rk"
from (
--step1: use union all to merge the three columns into one column
select
"State", "Population",
'age_below_18' as GroupName,
"age_below_18" as "value"
from TestTable
union all
select
"State", "Population",
'age_18_to_50' as GroupName,
"age_18_to_50" as "value"
from TestTable
union all
select
"State", "Population",
'age_50_above' as GroupName,
"age_50_above" as "value"
from TestTable
) T
)
select T1."State", T1."Population"
,T3.GroupName Maximum_group
,T4.GroupName Minimum_group
,T3."value" Max_value
,T4."value" Min_value
--step3: the max rank gives the max value, the min rank gives the min value
from (select "State", "Population",max( "rk") as Max_rank from CTE group by "State", "Population") T1
left join (select "State", "Population",min( "rk") as Min_rank from CTE group by "State", "Population") T2
on T1."State" = T2."State" and T1."Population" = T2."Population"
left join CTE T3 on T3."State" = T1."State" and T3."Population" = T1."Population" and T1.Max_rank = T3."rk"
left join CTE T4 on T4."State" = T2."State" and T4."Population" = T2."Population" and T2.Min_rank = T4."rk"
Hope it helps :)
Another option: use a combination of UNPIVOT, which "rotates columns into rows", and analytic functions, which "compute an aggregate value based on a group of rows", e.g.
Test data
select * from T ;
STATE POPULATION YOUNGERTHAN18 BETWEEN18AND50 OVER50
1 1000 250 600 150
2 4200 400 300 3500
UNPIVOT
select *
from T
unpivot (
quantity for agegroup in (
youngerthan18 as 'youngest'
, between18and50 as 'middleaged'
, over50 as 'oldest'
)
);
-- result
STATE POPULATION AGEGROUP QUANTITY
1 1000 youngest 250
1 1000 middleaged 600
1 1000 oldest 150
2 4200 youngest 400
2 4200 middleaged 300
2 4200 oldest 3500
Include Analytic Functions
select distinct
state
, population
, max( quantity ) over ( partition by state ) maxq
, min( quantity ) over ( partition by state ) minq
, first_value ( agegroup ) over ( partition by state order by quantity desc ) biggest_group
, first_value ( agegroup ) over ( partition by state order by quantity ) smallest_group
from T
unpivot (
quantity for agegroup in (
youngerthan18 as 'youngest'
, between18and50 as 'middleaged'
, over50 as 'oldest'
)
)
;
-- result
STATE POPULATION MAXQ MINQ BIGGEST_GROUP SMALLEST_GROUP
1 1000 600 150 middleaged oldest
2 4200 3500 300 oldest middleaged
Example tested w/ Oracle 11g (see dbfiddle) and Oracle 12c.
Caution: {1} column (headings) need adjusting (according to your requirements). {2} If there are NULLs in your original table, you should adjust the query eg by using NVL().
An advantage of the described approach is: the code will remain rather clear, even if more 'categories' are used. Eg when working with 11 age groups, the query may look something like ...
select distinct
state
, population
, max( quantity ) over ( partition by state ) maxq
, min( quantity ) over ( partition by state ) minq
, first_value ( agegroup ) over ( partition by state order by quantity desc ) biggest_group
, first_value ( agegroup ) over ( partition by state order by quantity ) smallest_group
from T
unpivot (
quantity for agegroup in (
y10 as 'youngerthan10'
, b10_20 as 'between10and20'
, b20_30 as 'between20and30'
, b30_40 as 'between30and40'
, b40_50 as 'between40and50'
, b50_60 as 'between50and60'
, b60_70 as 'between60and70'
, b70_80 as 'between70and80'
, b80_90 as 'between80and90'
, b90_100 as 'between90and100'
, o100 as 'over100'
)
)
order by state
;
See dbfiddle.
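For engines without UNPIVOT, the same unpivot-then-analyze pattern can be sketched with UNION ALL. The example below uses Python's sqlite3 module and assumes SQLite 3.25 or later (needed for window functions); the column names follow the original question rather than the renamed test data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (state INTEGER, population INTEGER,
                    age_below_18 INTEGER, age_18_to_50 INTEGER, age_50_above INTEGER);
    INSERT INTO t VALUES (1, 1000, 250, 600, 150), (2, 4200, 400, 300, 3500);
""")

rows = conn.execute("""
    WITH unpivoted AS (
        -- UNION ALL stands in for UNPIVOT: one row per (state, age group)
        SELECT state, population, 'age_below_18' AS agegroup, age_below_18 AS quantity FROM t
        UNION ALL
        SELECT state, population, 'age_18_to_50', age_18_to_50 FROM t
        UNION ALL
        SELECT state, population, 'age_50_above', age_50_above FROM t
    )
    SELECT DISTINCT
           state,
           population,
           MAX(quantity) OVER (PARTITION BY state) AS maxq,
           MIN(quantity) OVER (PARTITION BY state) AS minq,
           FIRST_VALUE(agegroup) OVER (PARTITION BY state ORDER BY quantity DESC) AS biggest_group,
           FIRST_VALUE(agegroup) OVER (PARTITION BY state ORDER BY quantity) AS smallest_group
    FROM unpivoted
    ORDER BY state
""").fetchall()

for row in rows:
    print(row)
```

Adding another age group means adding one more UNION ALL branch; the outer query stays unchanged.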

Oracle conditional select based on address type

I'm working on a query to get student contact mailing addresses, and I'm at a point where I'm a bit stuck. I have managed to get a list of all students and their contacts, but now that I try to join the contacts to their addresses, I'm not exactly sure how to get the correct address.
The address table can hold multiple kinds of addresses (Home, Mailing, Business, Pickup, Dropoff), and basically what I need to do is bring back only one address per contact.
Normally this would be the home address, unless there is a mailing address.
So my question is: how do I write some type of conditional statement to only get entries WHERE ADDRESS_TYPE_NAME = 'Home' unless there is also an entry WHERE ADDRESS_TYPE_NAME = 'Mailing' for the same PERSON_ID?
Thanks
with CTE as
(
select Person_id,
Address_Type_Name,
Address_Info -- replace with your real column names
from Address_Table
where Address_Type_Name in ('Home','Mailing')
)
select Person_id, Address_info
from CTE a1
where Address_Type_Name = 'Home'
and not exists (select 1
from CTE a2
where a2.Address_Type_Name = 'Mailing'
and a2.Person_id = a1.Person_id)
union
select Person_id, Address_info
from CTE a1
where Address_Type_Name = 'Mailing'
You can prioritize the address types and get the highest-priority type per person with
select Person_id,
case min(case Address_Type_Name
when 'Mailing' then 1
when 'Home' then 2
-- more
end)
when 1 then 'Mailing'
when 2 then 'Home'
-- more
end Best_Address_Type_Name
from Address_Table
group by Person_id;
Then join the result to your data as needed
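One way the join-back step might look, sketched with Python's sqlite3 module instead of Oracle. The sample rows and the CTE name ranked are assumptions; the priority numbers follow the query above (1 = Mailing, 2 = Home).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_table (person_id INTEGER, address_type_name TEXT, address_info TEXT);
    INSERT INTO address_table VALUES
        (1, 'Home',     '1 Home St'),
        (1, 'Mailing',  'PO Box 1'),
        (2, 'Home',     '2 Home St'),
        (2, 'Business', '2 Office Rd');
""")

best = conn.execute("""
    WITH ranked AS (
        -- Lowest number wins; other address types map to NULL and are ignored by MIN.
        SELECT person_id,
               MIN(CASE address_type_name
                       WHEN 'Mailing' THEN 1
                       WHEN 'Home'    THEN 2
                   END) AS best_priority
        FROM address_table
        GROUP BY person_id
    )
    SELECT a.person_id, a.address_type_name, a.address_info
    FROM address_table a
    JOIN ranked r
      ON r.person_id = a.person_id
     AND r.best_priority = CASE a.address_type_name
                               WHEN 'Mailing' THEN 1
                               WHEN 'Home'    THEN 2
                           END
    ORDER BY a.person_id
""").fetchall()

for row in best:
    print(row)
```

Person 1 gets the mailing address, person 2 falls back to the home address, and the business address never qualifies.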
Here is one way to do it, using the row_number() analytic function and not requiring any joins, explicit or implicit. It also handles various special cases: a student who has neither mailing nor home address (but still needs to be shown in the output), and another student with two mailing addresses (in which case a random one is chosen; if there are criteria to prefer one to the other, the query can be easily adapted to accommodate that).
with
students ( id, name, address_type, address ) as (
select 11, 'Andy', 'home' , '123 X street' from dual union all
select 11, 'Andy', 'office' , 'somewhere else' from dual union all
select 15, 'Eva' , 'mailing', 'post office' from dual union all
select 18, 'Jim' , 'office' , '1 building' from dual union all
select 30, 'Mary', 'mailing', 'mail addr 1' from dual union all
select 30, 'Mary', 'office' , '1 building' from dual union all
select 30, 'Mary', 'home' , 'her home' from dual union all
select 30, 'Mary', 'mailing', 'mail addr 2' from dual
)
-- End of test data (not needed for the SQL query - reference your actual table)
select id, name, address_type,
case when address_type is not null then address end as address
from (
select id, name,
case when address_type in ('home', 'mailing')
then address_type end as address_type,
address,
row_number() over (partition by id
order by case address_type when 'mailing' then 0
when 'home' then 1 end) as rn
from students
)
where rn = 1
;
ID NAME ADDRESS_TYPE ADDRESS
--- ---- ------------ --------------
11 Andy home 123 X street
15 Eva mailing post office
18 Jim
30 Mary mailing mail addr 1
4 rows selected.
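The same row_number() pattern can be tried with Python's sqlite3 module (SQLite 3.25+ assumed for window functions). One portability caveat worth labeling: Oracle sorts NULLs last in ascending order, while SQLite sorts them first, so this sketch adds an explicit ELSE 2 to the ordering CASE. The alias kept_type is introduced here only to avoid reusing the column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT, address_type TEXT, address TEXT);
    INSERT INTO students VALUES
        (11, 'Andy', 'home',    '123 X street'),
        (11, 'Andy', 'office',  'somewhere else'),
        (15, 'Eva',  'mailing', 'post office'),
        (18, 'Jim',  'office',  '1 building');
""")

picked = conn.execute("""
    SELECT id, name, kept_type,
           CASE WHEN kept_type IS NOT NULL THEN address END AS address
    FROM (
        SELECT id, name, address,
               CASE WHEN address_type IN ('home', 'mailing')
                    THEN address_type END AS kept_type,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY CASE address_type
                                WHEN 'mailing' THEN 0
                                WHEN 'home'    THEN 1
                                ELSE 2  -- explicit, because SQLite sorts NULLs first
                            END
               ) AS rn
        FROM students
    )
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in picked:
    print(row)
```

Jim still appears, with empty address columns, because his only address is neither home nor mailing.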

SQL LISTAGG for different types of values

I have the following table
name type value
a mobile 123456
a home tel 456789
a Office addr add1
a home addr add2
b mobile 456456
b home tel 123123
b Office addr add3
b home addr add4
I want to make an SQL table like this
name phone address
a 123456; 456789 add1; add2
b 456456; 123123 add3; add4
I tried this SQL:
SELECT name,
LISTAGG(t.CONTACT_INFO,';')
WITHIN GROUP (ORDER BY type) AS phone where type in ('mobile', 'home tel'),
LISTAGG(t.CONTACT_INFO,';')
WITHIN GROUP (ORDER BY type) AS address where type in ('Office addr', 'home addr')
FROM table t
GROUP BY name
And it doesn't work. I know because the WHERE clause is not where it needs to be. But where can I insert the restrictions? If I do it like
SELECT name,
LISTAGG(t.CONTACT_INFO,';')
WITHIN GROUP (ORDER BY type) AS phone,
LISTAGG(t.CONTACT_INFO,';')
WITHIN GROUP (ORDER BY type) AS address
FROM table t
GROUP BY name
It would look like
name phone address
a 123456; 456789;add1; add2 123456; 456789;add1; add2
b 456456; 123123;add3; add4 456456; 123123;add3; add4
Maybe something like this, but I think your DB design is very bad.
with a(name,type,value ) as (select 'a','mobile','123456' from dual union all
select 'a','home tel','456789' from dual union all
select 'a','Office addr', 'add1' from dual union all
select 'a','home addr','add2' from dual union all
select 'b','mobile','456456' from dual union all
select 'b','home tel','123123' from dual union all
select 'b','Office addr','add3' from dual union all
select 'b','home addr', 'add4' from dual )
select a1.name,
Listagg(a1.value, ';') Within Group (order by a1.type desc) phone,
Listagg(a2.value, ';') Within Group (order by a2.type) address
from a a1 inner join a a2
on decode(a1.type,'mobile','Office addr','home tel','home addr') = a2.type and a1.name = a2.name
group by a1.name
output is:
a 123456;456789 add1;add2
b 456456;123123 add3;add4
This assumes the order of the phone numbers (mobile, home) is not important.
Try this code block:
select t2.name
,listagg(t2.phone
,';') within group(order by t2.phone) as phone
,listagg(t2.addresss
,';') within group(order by t2.addresss) as addresss
from (select t.name
,case
when t.type in ('mobile'
,'home tel') then
listagg(t.value
,';') within group(order by t.type)
else
null
end as phone
,case
when t.type in ('Office addr'
,'home addr') then
listagg(t.value
,';') within group(order by t.type)
else
null
end as addresss
from table t
group by t.name
,t.type) t2
group by t2.name
This gives the following result:
a 123456;456789 add1;add2
b 123123;456456 add3;add4
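Another way to put the restriction "inside" the aggregate is a CASE expression as its argument, since string aggregates generally skip NULLs. The sketch below shows the idea with SQLite's group_concat via Python's sqlite3 module; the table name contacts is an assumption, and in Oracle the analogous form would presumably be LISTAGG(CASE ... END, ';') WITHIN GROUP (ORDER BY type).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (name TEXT, type TEXT, value TEXT);
    INSERT INTO contacts VALUES
        ('a', 'mobile',      '123456'),
        ('a', 'home tel',    '456789'),
        ('a', 'Office addr', 'add1'),
        ('a', 'home addr',   'add2'),
        ('b', 'mobile',      '456456'),
        ('b', 'home tel',    '123123'),
        ('b', 'Office addr', 'add3'),
        ('b', 'home addr',   'add4');
""")

# The CASE yields NULL for rows of the wrong type, and group_concat ignores NULLs,
# so each aggregate only sees its own kind of value.
rows = conn.execute("""
    SELECT name,
           group_concat(CASE WHEN type IN ('mobile', 'home tel')
                             THEN value END, ';') AS phone,
           group_concat(CASE WHEN type IN ('Office addr', 'home addr')
                             THEN value END, ';') AS address
    FROM contacts
    GROUP BY name
    ORDER BY name
""").fetchall()

# group_concat does not guarantee element order, so compare as sets.
contacts = {name: (set(phone.split(";")), set(address.split(";")))
            for name, phone, address in rows}
print(contacts)
```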

SQL Server GROUP BY troubles!

I'm getting a frustrating error in one of my SQL Server 2008 queries. It parses fine, but crashes when I try to execute. The error I get is the following:
Msg 8120, Level 16, State 1, Line 4
Column 'customertraffic_return.company' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
SELECT *
FROM (SELECT ctr.sp_id AS spid,
Substring(ctr.company, 1, 20) AS company,
cci.email_address AS tech_email,
CASE
WHEN rating IS NULL THEN 'unknown'
ELSE rating
END AS rating
FROM customer_contactinfo cci
INNER JOIN customertraffic_return ctr
ON ctr.sp_id = cci.sp_id
WHERE cci.email_address <> ''
AND cci.email_address NOT LIKE '%hotmail%'
AND cci.email_address IS NOT NULL
AND ( region LIKE 'Europe%'
OR region LIKE 'Asia%' )
AND SERVICE IN ( '1', '2' )
AND ( rating IN ( 'Premiere', 'Standard', 'unknown' )
OR rating IS NULL )
AND msgcount >= 5000
GROUP BY ctr.sp_id,
cci.email_address) AS a
WHERE spid NOT IN (SELECT spid
FROM customer_exclude)
GROUP BY spid,
tech_email
Well, the error is pretty clear, no??
You're selecting those columns in your inner SELECT:
spid
company
tech_email
rating
and you're grouping by only two of those (GROUP BY ctr.sp_id, cci.email_address).
Either you need to group by all four of them (GROUP BY ctr.sp_id, cci.email_address, company, rating), or you need to apply an aggregate function (SUM, AVG, MIN, MAX) to the other two columns (company and rating).
Or maybe using a GROUP BY here is totally the wrong way to go. What is it you're really trying to do here?
The inner query:
SELECT ctr.sp_id AS spid,
Substring(ctr.company, 1, 20) AS company,
cci.email_address AS tech_email,
CASE
WHEN rating IS NULL THEN 'unknown'
ELSE rating
END AS rating
FROM customer_contactinfo cci
INNER JOIN customertraffic_return ctr
ON ctr.sp_id = cci.sp_id
WHERE cci.email_address <> ''
AND cci.email_address NOT LIKE '%hotmail%'
AND cci.email_address IS NOT NULL
AND ( region LIKE 'Europe%'
OR region LIKE 'Asia%' )
AND SERVICE IN ( '1', '2' )
AND ( rating IN ( 'Premiere', 'Standard', 'unknown' )
OR rating IS NULL )
AND msgcount >= 5000
GROUP BY ctr.sp_id,
cci.email_address
has 4 non-aggregate expressions in the select list (sp_id, company, email_address, rating), and you only group on two of them, so it throws an error on the first one it sees.
So you either need to group by none of them or group by all of them.
I suggest replacing the * with a fully specified column list.
You can either group by all selected columns or use the other columns (those not in the GROUP BY clause) in an aggregate function (like SUM).
You cannot: select a,b,c from bla group by a,b
But you can: select a,b,sum(c) from bla group by a,b
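The valid form can be run as a quick sketch with Python's sqlite3 module. The table bla and its rows are invented for illustration. Note that SQLite is more permissive than SQL Server and accepts non-grouped, non-aggregated columns, so it cannot reproduce Msg 8120; only the form that works everywhere is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bla (a TEXT, b TEXT, c INTEGER);
    INSERT INTO bla VALUES ('x','p',1), ('x','p',2), ('x','q',5), ('y','p',7);
""")

# Every selected column is either in the GROUP BY list (a, b) or aggregated (c).
totals = conn.execute("""
    SELECT a, b, SUM(c) AS total_c
    FROM bla
    GROUP BY a, b
    ORDER BY a, b
""").fetchall()

for row in totals:
    print(row)
```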