PostgreSQL: dynamically name columns in a CASE statement - sql

I'm looking for a way to dynamically or automatically name the columns in my case statement below. Scenario - I'm trying to find out how many different companies of various industries are found in each country. The countries are the rows while the categories are the columns.
I'm using PostgreSQL, so PIVOT won't work, and I don't have a new enough version to use crosstab.
I want to be able to replicate this for much larger scenarios where I won't have to worry about 'hardcoding' the cat_nbr and column names like I do here.
SELECT country,
count(CASE WHEN cat_nbr = 1 THEN company_code END) retail,
count(CASE WHEN cat_nbr = 2 THEN company_code END) finance,
count(CASE WHEN cat_nbr = 3 THEN company_code END) oil,
count(CASE WHEN cat_nbr = 4 THEN company_code END) tech
FROM global_companies
GROUP BY country
In case it isn't clear, the table has these columns:
country - cat_nbr - company_code - cat_desc.
cat_desc is the column holding the words 'retail', 'finance', etc., which I have hardcoded above.
Is there some way I can do this with less hardcoding of how I refer to each cat_nbr/cat_desc? There are lots and lots of cat_nbrs and cat_descs.

You cannot create a query whose number of columns (the row width) is determined dynamically. That's impossible, even with crosstab.
You can, however:
- create a query that returns a SQL statement, which you then execute in the client;
- do the pivot in the client, with something like psql's \crosstabview.
You can read more information about this in my question, "How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?".
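A minimal sketch of the first option: the client reads the distinct categories, builds the COUNT(CASE ...) query as text, and then executes it. Python's built-in sqlite3 module stands in here for PostgreSQL and the client. The sample rows and the helpers quote_ident and build_pivot_sql are hypothetical, and the sketch assumes each cat_nbr maps to exactly one cat_desc.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote a value for use as a column alias (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(conn: sqlite3.Connection) -> str:
    """Step 1: generate one COUNT(CASE ...) column per category found in the data."""
    cats = conn.execute(
        "SELECT DISTINCT cat_nbr, cat_desc FROM global_companies ORDER BY cat_nbr"
    ).fetchall()
    # int() guarantees the category number is emitted as a plain numeric literal.
    cols = ",\n  ".join(
        f"COUNT(CASE WHEN cat_nbr = {int(nbr)} THEN company_code END) AS {quote_ident(desc)}"
        for nbr, desc in cats
    )
    return f"SELECT country,\n  {cols}\nFROM global_companies\nGROUP BY country\nORDER BY country"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE global_companies (country TEXT, cat_nbr INTEGER, company_code TEXT, cat_desc TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO global_companies VALUES (?, ?, ?, ?)",
    [
        ("US", 1, "C1", "retail"),
        ("US", 2, "C2", "finance"),
        ("US", 2, "C3", "finance"),
        ("UK", 3, "C4", "oil"),
        ("UK", 4, "C5", "tech"),
    ],
)

sql = build_pivot_sql(conn)  # step 1: generate the statement
cur = conn.execute(sql)      # step 2: execute it
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)  # ['country', 'retail', 'finance', 'oil', 'tech']
print(rows)    # [('UK', 0, 0, 1, 1), ('US', 1, 2, 0, 0)]
```

The column list is only known after the first query has run, which is why the generation step has to happen outside the final statement.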

Instead of hardcoding category names, you could hardcode country names for the columns and let the rows be dynamic as usual.
SELECT cat_nbr,
COUNT(CASE WHEN Country = 'US' THEN company_code END) AS NumUS,
COUNT(CASE WHEN Country = 'UK' THEN company_code END) AS NumUK,
COUNT(CASE WHEN Country = 'FR' THEN company_code END) AS NumFR,
...
FROM global_companies
GROUP BY cat_nbr;
Another alternative is to aggregate the data into JSON or array structures.
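A sketch of the aggregate-into-one-value idea, again using Python's sqlite3 as a stand-in with hypothetical sample data. SQLite's group_concat folds each country's category counts into a single string; in PostgreSQL, json_object_agg(cat_desc, n) in the same position would return a JSON object instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE global_companies (country TEXT, cat_nbr INTEGER, company_code TEXT, cat_desc TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO global_companies VALUES (?, ?, ?, ?)",
    [
        ("US", 1, "C1", "retail"),
        ("US", 2, "C2", "finance"),
        ("US", 2, "C3", "finance"),
        ("UK", 3, "C4", "oil"),
        ("UK", 4, "C5", "tech"),
    ],
)

# Inner query: one row per (country, category) with its count.
# Outer query: one row per country, all categories folded into "desc:count" pairs.
sql = """
SELECT country, group_concat(cat_desc || ':' || n, ',') AS counts
FROM (
    SELECT country, cat_desc, COUNT(company_code) AS n
    FROM global_companies
    GROUP BY country, cat_desc
) AS per_cat
GROUP BY country
ORDER BY country
"""

result = {}
for country, counts in conn.execute(sql):
    # group_concat does not guarantee element order, so parse into a dict.
    pairs = (item.split(":") for item in counts.split(","))
    result[country] = {desc: int(n) for desc, n in pairs}

print(result)
```

The result always has two columns, so no dynamic column list is needed; the trade-off is that the client has to unpack the combined value.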

Related

Pivot aggregates in SQL

I'm trying to find a way to pivot the table below (I guess you would say it's in "long" format) into the ("wider") format where all the columns are essentially explicitly Boolean. I hope this simple example gets across what I'm trying to do.
Note there are about 74 people (so the output table will have 223 columns: 1 + 74 x 3).
I can't figure out an easy way to do it other than, horribly, with a huge number of LEFT JOINs on "Town", using statements like:
... left join(
select
town,
case when person = 'Richard' then 1 else 0 end as "Richard",
Fee as "Richard Fee"
from services
where person = 'Richard'
left join...
Can some smart person suggest a way to do this using PIVOT functions in SQL?
I am using Snowflake (and dbt so I can get some jinja into play if really necessary to loop through all the people).
Input:
Desired output:
P.S. I know this is a ridiculous SQL request, but this is the "output the client wants", so I have this undesirable task to fulfil.
If the persons are known in advance, you can use conditional aggregation:
SELECT town,
MAX(CASE WHEN person = 'Richard' THEN 1 ELSE 0 END) AS "Richard",
MAX(CASE WHEN person = 'Richard' THEN Fee END) AS "Richard Fee",
MAX(CASE WHEN person = 'Richard' THEN Service END) AS "Richard Service",
MAX(CASE WHEN person = 'Caitlin' THEN 1 ELSE 0 END) AS "Caitlin",
...
FROM services
GROUP BY town;
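When the people are not known in advance, the column list can be generated by looping over the distinct names, which is the kind of loop a Jinja block in a dbt model could perform. The sketch below does that loop in Python against SQLite, standing in for Snowflake; the table contents and the quote_ident/sql_literal helpers are hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote a column alias, escaping embedded quotes (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value: str) -> str:
    """Single-quote a string literal, escaping embedded quotes (hypothetical helper)."""
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (town TEXT, person TEXT, fee REAL, service TEXT)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?, ?)",
    [
        ("Leeds", "Richard", 50.0, "Plumbing"),
        ("Leeds", "Caitlin", 70.0, "Wiring"),
        ("York", "Caitlin", 65.0, "Wiring"),
    ],
)

people = [r[0] for r in conn.execute("SELECT DISTINCT person FROM services ORDER BY person")]

# Three columns per person: a 1/0 flag, the fee and the service.
cols = []
for p in people:
    lit = sql_literal(p)
    cols.append(f"MAX(CASE WHEN person = {lit} THEN 1 ELSE 0 END) AS {quote_ident(p)}")
    cols.append(f"MAX(CASE WHEN person = {lit} THEN fee END) AS {quote_ident(p + ' Fee')}")
    cols.append(f"MAX(CASE WHEN person = {lit} THEN service END) AS {quote_ident(p + ' Service')}")

sql = "SELECT town,\n  " + ",\n  ".join(cols) + "\nFROM services\nGROUP BY town\nORDER BY town"
cur = conn.execute(sql)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)
print(rows)
```

With 74 people the same loop yields 1 + 74 x 3 = 223 columns, matching the count in the question.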

Generate columns from values returned by SELECT

I've got a query that returns data like so:
student    course   grade
a-student  ENG-W05  100
a-student  MAT-W05  85
a-student  ENG-W06  100
b-student  MAT-W05  90
b-student  SCI-W05  75
The data is grouped by student and course. Ideally, I'd like to have the above data transformed into the below:
student    ENG-W05  MAT-W05  ENG-W06  SCI-W05
a-student  100      85       100      NULL
b-student  NULL     90       NULL     75
So, after the transformation, each student only has one record, with all of their grades (and any missing courses graded as null).
Does anyone have any ideas? Obviously, this is fairly simple to do if I take the data out and transform it in a language (like Python), but I'd love to get the data in the desired format with an SQL query.
Also, would it be possible to have the columns ordered alphabetically (ascending)? So, the final output would be:
student    ENG-W05  ENG-W06  MAT-W05  SCI-W05
a-student  100      100      85       NULL
b-student  NULL     NULL     90       75
EDIT: To clarify, the values in course aren't known in advance. The ones I provided are just examples. So ideally, if more course values found their way into that first query result (the first table), they would still be mapped to columns in the final result (without needing to change the query). In reality, I actually have >1k distinct values for the course column, so I can't manually write out each one.
You can use conditional aggregation for that:
SELECT
student,
SUM(grade) FILTER (WHERE course = 'ENG-W05') as eng_w05,
SUM(grade) FILTER (WHERE course = 'MAT-W05') as mat_w05,
SUM(grade) FILTER (WHERE course = 'ENG-W06') as eng_w06,
SUM(grade) FILTER (WHERE course = 'SCI-W05') as sci_w05
FROM mytable
GROUP BY student
The FILTER clause restricts an aggregate to the rows that satisfy its condition, so each column here aggregates only the records for one specific course.
Choosing the right aggregate function depends on your real requirement. Here SUM() does the job because there's only one value per group; MAX() or MIN() would work as well. If there really is only one value per group, it doesn't matter which you pick; you just need some aggregate.
Instead of the FILTER clause, which is part of the SQL standard but not supported by every database, you could use the more portable CASE expression:
SELECT
student,
SUM(
CASE
WHEN course = 'ENG-W05' THEN grade
END
) AS eng_w05,
...
You can use the conditional aggregation as follows:
select student,
max(case when course = 'ENG-W05' then grade end) as "ENG-W05",
max(case when course = 'MAT-W05' then grade end) as "MAT-W05",
max(case when course = 'ENG-W06' then grade end) as "ENG-W06",
max(case when course = 'SCI-W05' then grade end) as "SCI-W05"
from (your_query) t
group by student
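Because the course values aren't known in advance, the column list has to be generated from the data. This sketch reads the distinct courses in ascending order (which also gives the alphabetical column order asked for), builds the MAX(CASE ...) query and runs it. SQLite via Python's sqlite3 stands in for the real database, mytable holds the sample rows from the question, and the two quoting helpers are hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote a column alias, escaping embedded quotes (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value: str) -> str:
    """Single-quote a string literal, escaping embedded quotes (hypothetical helper)."""
    return "'" + value.replace("'", "''") + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (student TEXT, course TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("a-student", "ENG-W05", 100),
        ("a-student", "MAT-W05", 85),
        ("a-student", "ENG-W06", 100),
        ("b-student", "MAT-W05", 90),
        ("b-student", "SCI-W05", 75),
    ],
)

# ORDER BY course makes the generated columns appear alphabetically.
courses = [r[0] for r in conn.execute("SELECT DISTINCT course FROM mytable ORDER BY course")]
cols = ",\n  ".join(
    f"MAX(CASE WHEN course = {sql_literal(c)} THEN grade END) AS {quote_ident(c)}"
    for c in courses
)
sql = f"SELECT student,\n  {cols}\nFROM mytable\nGROUP BY student\nORDER BY student"

cur = conn.execute(sql)
header = [d[0] for d in cur.description]
rows = cur.fetchall()
print(header)  # ['student', 'ENG-W05', 'ENG-W06', 'MAT-W05', 'SCI-W05']
print(rows)    # [('a-student', 100, 100, 85, None), ('b-student', None, None, 90, 75)]
```

Missing courses come back as NULL (None in Python) because the CASE has no ELSE branch.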

Qualifying a column in a SQL select statement

I'm looking to generate a query that pulls from several tables. Most are rather straightforward and I can pull a value from a table directly but there is one table that is pivoted so that the value I want depends on the value in another column.
The table looks like the below:
ID Condition Value
1 Stage1 6
2 Stage2 9
3 Stage3 5
4 Stage4 2
So I'm looking to write a query that essentially "qualifies" the value I want by telling the table which condition.
An example of my SQL:
Select Attribute1, Stage1Value, Stage2Value, Stage3Value
From attribute, stage
where attribute = project1
So I can't just pull the "Value" column as it needs to know which stage in the query.
There are 30 columns I am trying to pull - of which 13 fall into this category. Thanks for any help you can provide.
So, you want conditional aggregation, something like:
select a.<col>,
sum(case when s.Condition = 'Stage1' then s.value else 0 end),
. . .
sum(case when s.Condition = 'Stage4' then s.value else 0 end)
from attribute a inner join
stage s
on s.<col> = a.<col>
group by a.<col>
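A runnable version of that template, with the <col> placeholders filled in by a hypothetical project_id join column and made-up rows (SQLite through Python's sqlite3, not the asker's database). Note that ELSE 0 turns a missing stage into 0; dropping it would give NULL instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: project_id stands in for the <col> placeholder.
conn.execute("CREATE TABLE attribute (project_id INTEGER, attribute1 TEXT)")
conn.execute("CREATE TABLE stage (project_id INTEGER, condition TEXT, value INTEGER)")
conn.executemany("INSERT INTO attribute VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany(
    "INSERT INTO stage VALUES (?, ?, ?)",
    [(1, "Stage1", 6), (1, "Stage2", 9), (2, "Stage3", 5), (2, "Stage4", 2)],
)

# One output column per stage; each SUM only picks up the rows for that stage.
sql = """
SELECT a.project_id, a.attribute1,
       SUM(CASE WHEN s.condition = 'Stage1' THEN s.value ELSE 0 END) AS stage1_value,
       SUM(CASE WHEN s.condition = 'Stage2' THEN s.value ELSE 0 END) AS stage2_value,
       SUM(CASE WHEN s.condition = 'Stage3' THEN s.value ELSE 0 END) AS stage3_value,
       SUM(CASE WHEN s.condition = 'Stage4' THEN s.value ELSE 0 END) AS stage4_value
FROM attribute a
INNER JOIN stage s ON s.project_id = a.project_id
GROUP BY a.project_id, a.attribute1
ORDER BY a.project_id
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [(1, 'A', 6, 9, 0, 0), (2, 'B', 0, 0, 5, 2)]
```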

How come my count() passes evaluation?

I am fairly new to SQL, and I am trying to create a query that determines in which countries only 3 specific languages (spanish, italian, german) are spoken and no other languages.
select country
from langusage
group by country
having count(case when language in ('spanish','german','italian') then 1 else 5 end)=3
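The output is all countries that have at least 1 of the aforementioned languages. How come they pass the '=3' test?
The reason is that count(1) = count(5): count() counts the number of non-NULL values. Your CASE expression never returns NULL, so every row is counted regardless of language, and the HAVING clause only checks that the country has three rows in total.
You want sum():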
select country
from langusage
group by country
having sum(case when language in ('spanish', 'german', 'italian') then 1 else 5 end) = 3
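A small demonstration of the difference, using Python's sqlite3 with made-up countries A to D: count() returns every country with exactly three language rows of any kind, while sum() returns only the country with exactly the three listed languages. This assumes each (country, language) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE langusage (country TEXT, language TEXT)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO langusage VALUES (?, ?)",
    [
        ("A", "spanish"), ("A", "german"), ("A", "italian"),                    # exactly the three
        ("B", "spanish"), ("B", "german"), ("B", "italian"), ("B", "english"),  # the three plus one more
        ("C", "english"), ("C", "french"), ("C", "dutch"),                      # three other languages
        ("D", "spanish"),                                                       # only one of the three
    ],
)

template = """
SELECT country
FROM langusage
GROUP BY country
HAVING {agg}(CASE WHEN language IN ('spanish', 'german', 'italian') THEN 1 ELSE 5 END) = 3
ORDER BY country
"""

# count() ignores the 1/5 values and just counts rows, so any country with 3 rows passes.
by_count = [r[0] for r in conn.execute(template.format(agg="count"))]
# sum() adds 1 per listed language and 5 per other language, so only an exact match totals 3.
by_sum = [r[0] for r in conn.execute(template.format(agg="sum"))]
print(by_count)  # ['A', 'C']
print(by_sum)    # ['A']
```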

SQL-Server - Ability to pass subset parameter in the select?

I'm trying to create a query that will output a total number, as well as a subset of the total number, in SQL Server. I can think of a way to do this via subqueries, but that seems like a ton of work. Is there a faster way to write this query?
Table name: orders
OrderID Date Type OrderSize
1 1/1/2012 Electronics $282.02
2 1/1/2012 Electronics $1,000.56
3 1/1/2012 Books $17.25
4 1/1/2012 Books $10.00
What I am trying to output would look like this:
Date ElectronicOrders ElectronicOrderSize BookOrders BookOrderSize
1/1/2012 2 $1,282.58 2 $27.25
I could create a temp table, then run 2 update queries - 1 WHERE Type = 'Electronics' and 1 WHERE Type = 'Books'.
What I have seen in some programming languages, such as R, is the ability to subset a variable. Is there a way for me to say something like:
count(OrderID, Type = 'Electronics') as ElectronicOrders, sum(OrderSize, Type = 'Electronics') as ElectronicOrderSize
Or am I stuck with subqueries and UPDATE queries?
I haven't ever gotten the new PIVOT syntax to make sense in my head, but you can build a pivot table by grouping and putting CASE expressions inside aggregate functions.
select [date], sum( case when type = 'Electronics' then (ordersize) else 0 end) AS ElectronicsSum,
sum( case when type = 'Electronics' then 1 else 0 end) AS ElectronicsCount,
sum( case when type = 'Books' then (ordersize) else 0 end) AS BooksSum,
sum( case when type = 'Books' then 1 else 0 end) AS BooksCount
from orders
group by [date]
I put up a fiddle to test it. If Aaron B. posts a solution, give him the answer credit; I might not have even recognized the pivot-ness of it.
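For a quick local check, the same grouping query can be run against SQLite through Python's sqlite3, which also accepts SQL Server-style [brackets] around identifiers. The rows are the question's sample orders, with the dates rewritten in ISO format (an assumption for this sketch).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, Date TEXT, Type TEXT, OrderSize REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2012-01-01", "Electronics", 282.02),
        (2, "2012-01-01", "Electronics", 1000.56),
        (3, "2012-01-01", "Books", 17.25),
        (4, "2012-01-01", "Books", 10.00),
    ],
)

# Each SUM only adds values (or 1s) for rows of its type; other rows contribute 0.
sql = """
select [date],
       sum(case when type = 'Electronics' then ordersize else 0 end) AS ElectronicsSum,
       sum(case when type = 'Electronics' then 1 else 0 end) AS ElectronicsCount,
       sum(case when type = 'Books' then ordersize else 0 end) AS BooksSum,
       sum(case when type = 'Books' then 1 else 0 end) AS BooksCount
from orders
group by [date]
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The money sums are floating-point here; a real SQL Server table would more likely use a decimal or money type.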