SQL Using Ungrouped Columns in SELECT statement - sql

I have a GROUP BY Query which appears to use non-aggregated data not in the GROUP BY clause, which I thought would not work.
I was asked to write a query which converted the following data:
| item | type | cost | category |
|------|------|------|----------|
| 1 | X | 10 | A |
| 1 | Y | 20 | A |
| 2 | X | 30 | B |
| 2 | Y | 40 | B |
| 3 | X | 50 | C |
| 3 | Y | 60 | C |
| 4 | X | 70 | D |
| 4 | Y | 80 | D |
into this:
| item | x | y | category |
|------|----|----|----------|
| 1 | 10 | 20 | A |
| 2 | 30 | 40 | B |
| 3 | 50 | 60 | C |
| 4 | 70 | 80 | D |
Note:
- The incoming data is clearly not normalised.
- The item is meant to be unique, but it is repeated for each type value.
- The category is the same for rows of the same item.
I ended up with the following solution:
SELECT
item,
sum(CASE WHEN type='X' THEN cost END) as X,
sum(CASE WHEN type='Y' THEN cost END) as Y,
category
FROM data
GROUP BY item,category;
What surprised me is that it worked. What surprised me more is that it works in PostgreSQL, MariaDB (ANSI mode), Microsoft SQL Server and SQLite.
Note:
- I have included category in the GROUP BY simply to allow it to appear in the SELECT clause.
- I have used the sum() function, even though there will only be one value, also simply to include it in the SELECT clause.
I thought I would not be able to use the type column in the SELECT clause because it is not in the GROUP BY and it is not aggregated. Indeed, if I try to select it by itself, the query will fail.
The question is, how is it that I can use the type column with the CASE operator, when I can’t use it by itself?

Your usage of the "ungrouped" columns is perfectly fine.
The rule is: "Every expression in the SELECT list must either be an aggregate function or be part of the GROUP BY".
The column type is used inside an aggregate. sum(CASE WHEN type='X' THEN cost END) as X is not really different to sum(cost) or max(type).
This becomes more obvious if you use the standard SQL filter option:
sum(CASE WHEN type='X' THEN cost END)
is the same as:
sum(cost) filter (where type = 'X')
However, only a few DBMSs support this standard syntax (PostgreSQL and recent SQLite versions are among them).
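To see the equivalence on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table and the variable names are only for illustration, and the FILTER variant is skipped when the bundled SQLite is older than 3.30, which is the release that added FILTER for ordinary aggregates.

```python
import sqlite3

# In-memory copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (item INTEGER, type TEXT, cost INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(1, "X", 10, "A"), (1, "Y", 20, "A"), (2, "X", 30, "B"), (2, "Y", 40, "B"),
     (3, "X", 50, "C"), (3, "Y", 60, "C"), (4, "X", 70, "D"), (4, "Y", 80, "D")],
)

# type appears only inside the aggregate, so it needs no GROUP BY entry.
case_rows = conn.execute("""
    SELECT item,
           sum(CASE WHEN type = 'X' THEN cost END) AS x,
           sum(CASE WHEN type = 'Y' THEN cost END) AS y,
           category
    FROM data
    GROUP BY item, category
    ORDER BY item
""").fetchall()

# Same pivot with the standard FILTER clause, where the engine supports it.
filter_rows = None
if sqlite3.sqlite_version_info >= (3, 30, 0):
    filter_rows = conn.execute("""
        SELECT item,
               sum(cost) FILTER (WHERE type = 'X') AS x,
               sum(cost) FILTER (WHERE type = 'Y') AS y,
               category
        FROM data
        GROUP BY item, category
        ORDER BY item
    """).fetchall()

print(case_rows)
print(filter_rows)
```

In both forms the CASE or FILTER condition is evaluated per input row before the aggregate sees the value, which is why type does not count as an ungrouped column.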

Related

SQL - How to transform a table with a range of values into another with all the numbers in that range?

I have a Table (A) with some intervals from start_val to end_val with an attribute for that range of values.
I want a Table (B) in which each row is a number in the interval of start_val to end_val with the attribute of that range.
I need to do that using SQL.
Example
Table A:
+---------+--------+----------+
|start_val| end_val| attribute|
+---------+--------+----------+
| 10 | 12 | 1 |
| 20 | 23 | 2 |
+---------+--------+----------+
Table B (Expected result):
+---------+----------+
|start_val| attribute|
|end_val | |
| interv | |
+---------+----------+
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 20 | 2 |
| 21 | 2 |
| 22 | 2 |
| 23 | 2 |
+---------+----------+
Here is a way to do this
select m.start_val + n -1 as start_val_computed
,m.attribute
from t m
join lateral generate_series(1,(m.end_val-m.start_val)+1) n
on 1=1
+--------------------+-----------+
| start_val_computed | attribute |
+--------------------+-----------+
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 20 | 2 |
| 21 | 2 |
| 22 | 2 |
| 23 | 2 |
+--------------------+-----------+
working example
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=ce9e13765b5a4c3616d95ec659c1dfc9
You may use a calendar table approach:
SELECT
t1.val,
t2.attribute
FROM generate_series(10, 23) AS t1(val)
INNER JOIN TableA t2
ON t1.val BETWEEN t2.start_val AND t2.end_val
ORDER BY
t2.attribute,
t1.val;
Note: You may expand the bounds in the above call to generate_series to cover whatever range you think your data would need.
This is a variant of George's solution, but it is a bit simpler:
select n, m.attribute
from t m cross join lateral
generate_series(m.start_val, m.end_val) n;
The changes are:
- CROSS JOIN instead of JOIN. So, no need for an ON clause.
- No arithmetic in the GENERATE_SERIES().
- No arithmetic in the SELECT.
You can just call the result of GENERATE_SERIES() whatever name you want in the result set.
Postgres actually allows you to put GENERATE_SERIES() in the SELECT:
select generate_series(m.start_val, m.end_val) as n, m.attribute
from t m;
However, I am not a fan of putting row generating functions anywhere other than the FROM clause. I just find it confusing to figure out what the query is doing.
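generate_series() as used above is a PostgreSQL function. For an engine without it, a recursive CTE can do the same expansion. This is a swapped-in technique, not what the answers above use; the sketch below runs it against SQLite through Python's sqlite3 module, with the table name t taken from the answers and the CTE name expanded chosen just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (start_val INTEGER, end_val INTEGER, attribute INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(10, 12, 1), (20, 23, 2)])

# Seed with each range's start, then keep adding 1 until end_val is reached.
expanded_rows = conn.execute("""
    WITH RECURSIVE expanded(val, end_val, attribute) AS (
        SELECT start_val, end_val, attribute FROM t
        UNION ALL
        SELECT val + 1, end_val, attribute FROM expanded WHERE val < end_val
    )
    SELECT val, attribute FROM expanded ORDER BY attribute, val
""").fetchall()

print(expanded_rows)
```

Unlike the calendar-table approach, this needs no guess about the overall bounds, because each range carries its own end_val through the recursion.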

Order Of Execution of the SQL query exception for SELECT/HAVING

I understand that the order of execution is as follows
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
from this SO Answer as well as Microsoft Documentation
However, in my query below, the column total is built on the fly which is later used in having clause. This would mean that having executes AFTER select and not before because the column 'total' does not exist in orders table.
Am I interpreting it wrong or simply missing something?
Query
select customer_id,
sum(CASE
WHEN product_name = 'A' THEN 1
WHEN product_name = 'B' THEN 1
WHEN product_name = 'C' THEN -1
ELSE 0 END
) as total
from Orders
group by customer_id
having total > 1;
Orders table
+------------+-------------+--------------+
| order_id | customer_id | product_name |
+------------+-------------+--------------+
| 10 | 1 | A |
| 20 | 1 | B |
| 30 | 1 | D |
| 40 | 1 | C |
| 50 | 2 | A |
| 60 | 3 | A |
| 70 | 3 | B |
| 80 | 3 | D |
| 90 | 4 | C |
+------------+-------------+--------------+
Result
+-------------+-------+
| customer_id | total |
+-------------+-------+
| 3 | 2 |
+-------------+-------+
What you have described is NOT the "order of execution". It is the order of scoping for identifiers defined in the query.
It is saying that an identifier defined in the from is known in the clauses beneath it. Similarly, an identifier defined in the select is not recognized in the having. I should note that many databases do allow the having clause to use column aliases defined in the select. SQL Server is not one of them.
SQL is a declarative language, not a procedural language. That means that a query describes the result set. It does not state the steps used to generate the result. The compiler and optimizer produce the execution plan, which looks nothing like the original query.
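If you want the query to work where the alias is not visible in having, two common rewrites avoid relying on it: repeat the aggregate expression in having, or compute it in a derived table and filter outside. Here is a rough sketch against SQLite via Python's sqlite3 module, using the Orders data from the question (the variable names and the derived-table alias per_customer are mine).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, product_name TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10, 1, "A"), (20, 1, "B"), (30, 1, "D"), (40, 1, "C"), (50, 2, "A"),
     (60, 3, "A"), (70, 3, "B"), (80, 3, "D"), (90, 4, "C")],
)

# The scoring expression from the question, kept in one place for reuse.
score = """sum(CASE WHEN product_name IN ('A', 'B') THEN 1
                    WHEN product_name = 'C' THEN -1
                    ELSE 0 END)"""

# Option 1: repeat the aggregate in HAVING instead of using the alias.
repeated_rows = conn.execute(f"""
    SELECT customer_id, {score} AS total
    FROM Orders
    GROUP BY customer_id
    HAVING {score} > 1
""").fetchall()

# Option 2: name the aggregate in a derived table, then filter with WHERE.
derived_rows = conn.execute(f"""
    SELECT customer_id, total
    FROM (SELECT customer_id, {score} AS total
          FROM Orders
          GROUP BY customer_id) AS per_customer
    WHERE total > 1
""").fetchall()

print(repeated_rows, derived_rows)
```

Both forms only reference names that are in scope under the listing in the question, so they do not depend on the engine's alias extension.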

How to select from a table with additional where clause on a single column

I'm having trouble formulating a SQL query in Oracle. Here's my sample table:
+----+-----------+-----------+--------+
| id | start | end | number |
+----+-----------+-----------+--------+
| 1 | 21-dec-19 | 03-jan-20 | 12 |
| 2 | 23-dec-19 | 05-jan-20 | 10 |
| 3 | 02-jan-20 | 15-jan-20 | 9 |
| 4 | 09-jan-20 | NULL | 11 |
+----+-----------+-----------+--------+
And here's what I have so far:
SELECT
SUM(number) AS total_number,
SUM(number) AS total_ended_number -- (WHERE end IS NOT NULL)
FROM table
WHERE ... -- a lot of where clauses
And the desired result:
+--------------+--------------------+
| total_number | total_ended_number |
+--------------+--------------------+
| 42 | 31 |
+--------------+--------------------+
I understand I could do a separate select inside 'total_ended_number', but the initial select has a bunch of where clauses already which would need to be applied to the internal select as well.
I'm capable of formulating it in 2 separate selects or 2 nested selects with all the where clauses duplicated, but my intended goal is to not duplicate the where clauses that would both be used on the table.
You could sum over a case expression with this logic:
SELECT
SUM(number) AS total_number,
SUM(CASE WHEN end IS NOT NULL THEN number END) AS total_ended_number
FROM table
WHERE ... -- a lot of where clauses
SUM(case when "end" is not null then number else 0 end) AS total_ended_number
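The two expressions above differ only in the ELSE 0, which matters when no row has an end value. A small sketch of both, run against SQLite through Python's sqlite3 module rather than Oracle: the table name periods is made up (table is not usable as a name), and start and end are quoted because they collide with keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE periods (id INTEGER, "start" TEXT, "end" TEXT, number INTEGER)')
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?, ?)",
    [(1, "21-dec-19", "03-jan-20", 12), (2, "23-dec-19", "05-jan-20", 10),
     (3, "02-jan-20", "15-jan-20", 9), (4, "09-jan-20", None, 11)],
)

query = """
    SELECT SUM(number) AS total_number,
           SUM(CASE WHEN "end" IS NOT NULL THEN number END) AS ended_no_else,
           SUM(CASE WHEN "end" IS NOT NULL THEN number ELSE 0 END) AS ended_else_zero
    FROM periods
    WHERE {condition}
"""

# All rows: both variants agree on the ended total.
all_rows = conn.execute(query.format(condition="1 = 1")).fetchone()

# Only the open-ended row: without ELSE the sum has no non-NULL input and is NULL.
open_only = conn.execute(query.format(condition="id = 4")).fetchone()

print(all_rows)
print(open_only)
```

The WHERE clause is written once and applies to both sums, which is the point of the conditional aggregation here.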

Make a query making groups on the same result row

I have two tables. Like this.
select * from extrafieldvalues;
+----------------------------+
| id | value | type | idItem |
+----------------------------+
| 1 | 100 | 1 | 10 |
| 2 | 150 | 2 | 10 |
| 3 | 101 | 1 | 11 |
| 4 | 90 | 2 | 11 |
+----------------------------+
select * from items
+------------+
| id | name |
+------------+
| 10 | foo |
| 11 | bar |
+------------+
I need to make a query and get something like this:
+--------------------------------------+
| idItem | valtype1 | valtype2 | name |
+--------------------------------------+
| 10 | 100 | 150 | foo |
| 11 | 101 | 90 | bar |
+--------------------------------------+
The quantity of types of extra field values is variable, but every item ALWAYS uses every extra field.
If you have only two fields, then left join is an option for this:
select i.*, efv1.value as value_1, efv2.value as value_2
from items i left join
extrafieldvalues efv1
on efv1.iditem = i.id and
efv1.type = 1 left join
extrafieldvalues efv2
on efv2.iditem = i.id and
efv2.type = 2 ;
In terms of performance, two joins are probably faster than an aggregation -- and it makes it easier to bring in more columns from items. On the other hand, conditional aggregation generalizes more easily and the performance changes little as more columns from extrafieldvalues are added to the select.
Use conditional aggregation
select iditem,
max(case when type=1 then value end) as valtype1,
max(case when type=2 then value end) as valtype2,name
from extrafieldvalues a inner join items b on a.iditem=b.id
group by iditem,name
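As a quick check that the two approaches agree on the sample data, here is a sketch that runs both against SQLite through Python's sqlite3 module. Note that in the two-join form the second join's conditions refer to efv2; the variable names are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE extrafieldvalues (id INTEGER, value INTEGER, type INTEGER, idItem INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(10, "foo"), (11, "bar")])
conn.executemany(
    "INSERT INTO extrafieldvalues VALUES (?, ?, ?, ?)",
    [(1, 100, 1, 10), (2, 150, 2, 10), (3, 101, 1, 11), (4, 90, 2, 11)],
)

# One self-join per type; each join filters on its own alias.
join_rows = conn.execute("""
    SELECT i.id, efv1.value AS valtype1, efv2.value AS valtype2, i.name
    FROM items i
    LEFT JOIN extrafieldvalues efv1 ON efv1.idItem = i.id AND efv1.type = 1
    LEFT JOIN extrafieldvalues efv2 ON efv2.idItem = i.id AND efv2.type = 2
    ORDER BY i.id
""").fetchall()

# Conditional aggregation: one pass, one max(CASE ...) per type.
agg_rows = conn.execute("""
    SELECT a.idItem,
           max(CASE WHEN a.type = 1 THEN a.value END) AS valtype1,
           max(CASE WHEN a.type = 2 THEN a.value END) AS valtype2,
           b.name
    FROM extrafieldvalues a
    INNER JOIN items b ON a.idItem = b.id
    GROUP BY a.idItem, b.name
    ORDER BY a.idItem
""").fetchall()

print(join_rows)
print(agg_rows)
```

Adding a third type means one more join in the first form, but only one more max(CASE ...) column in the second.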

SQL Group by one column and decide which column to choose

Let's say I have data like this :
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 2 | 20 | B | 20 |
| 3 | 10 | C | 30 |
| 4 | 10 | D | 80 |
I would like to group rows by code value, but get real rows back (not some aggregate function).
I know that just
select *
from table
group by code
won't work because the database doesn't know which row to return when the code is the same.
So my question is how to tell the database to select (for example) the row with the lowest number, so in my case
| id | code | name | number |
-----------------------------------------------
| 1 | 20 | A | 10 |
| 3 | 10 | C | 30 |
P.S.
I know how to do this by PARTITION but this is only allowed in Oracle databases and can't be created in JPA criteria builder (which is my ultimate goal).
Why don't you use code like this?
SELECT
id,
code,
name,
number
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY code ORDER BY number ASC) AS RowNo
FROM table
) s
WHERE s.RowNo = 1
You can look at this site;
Data Partitioning
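Since the question says window functions are not an option in the criteria builder, a correlated subquery on the minimum is one possible alternative. This is not the approach from the answer above, only a sketch of both side by side, run against SQLite via Python's sqlite3 module. The table name codes is made up, and the ROW_NUMBER() variant is skipped when the bundled SQLite predates window functions (3.25).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (id INTEGER, code INTEGER, name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO codes VALUES (?, ?, ?, ?)",
    [(1, 20, "A", 10), (2, 20, "B", 20), (3, 10, "C", 30), (4, 10, "D", 80)],
)

# Window-function form from the answer: number the rows per code, keep the first.
window_rows = None
if sqlite3.sqlite_version_info >= (3, 25, 0):
    window_rows = conn.execute("""
        SELECT id, code, name, number
        FROM (SELECT c.*,
                     ROW_NUMBER() OVER (PARTITION BY code ORDER BY number ASC) AS RowNo
              FROM codes c) s
        WHERE s.RowNo = 1
        ORDER BY id
    """).fetchall()

# Alternative without window functions: keep rows whose number is the group minimum.
# If two rows tie on the minimum, this form returns both of them.
subquery_rows = conn.execute("""
    SELECT c.id, c.code, c.name, c.number
    FROM codes c
    WHERE c.number = (SELECT min(c2.number) FROM codes c2 WHERE c2.code = c.code)
    ORDER BY c.id
""").fetchall()

print(window_rows)
print(subquery_rows)
```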