ORDER BY and UNION - sql

So I'm trying to union two queries and then order by a column. However, whenever I try to run the query, it gives an error that doesn't make sense to me.
Example data
CREATE TABLE test (
Company_code VARCHAR(120) NOT NULL,
operating_year INT NOT NULL,
Profit INT NOT NULL
);
INSERT INTO test (Company_code, operating_year, Profit)
VALUES ('A', 1999, 2000),
('A', 2000, 3000),
('B', 1999, 1600),
('B', 2000, 4000);
Query
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'A'
UNION
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'B'
ORDER BY
-- t.profit; --- Does *not* work
-- profit; --- Does work
Please ignore how basic the example is, and the fact that simply adding an OR to the WHERE clause would avoid the problem.
My question is: why does using the alias in the ORDER BY throw an error when a UNION is involved, but not when each query is run individually?

When you use a set-theoretic operator between two queries (e.g. UNION, UNION ALL, EXCEPT, INTERSECT), the ORDER BY clause applies to the end result of that operator, which is a new, anonymous derived table. There is no way to bind t in the ORDER BY, because it is now out of scope.
If you add parentheses around the derived tables it's easier to see why; your query is really like this pseudo-SQL:
SELECT
company_code,
profit
FROM
(
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'A'
) AS q1
UNION
(
SELECT
t.company_code,
t.profit
FROM
test t
WHERE
t.company_code = 'B'
) AS q2
ORDER BY
-- t.profit  -- fails: `t` is not in scope; it is masked by (or "buried inside") q1 and q2
profit       -- works: `profit` is a column in the result of the UNION
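To make the scoping point concrete, here is a sketch in fairly portable SQL (it should run on SQLite or SQL Server) using the table and rows from the question. After a UNION, ORDER BY can refer to the output columns of the first SELECT, either by name (an explicit alias there makes the name visible) or by ordinal position; it is not meant to reach the table alias `t` inside either branch.

```sql
-- Sample data from the question.
CREATE TABLE test (
    Company_code   VARCHAR(120) NOT NULL,
    operating_year INT          NOT NULL,
    Profit         INT          NOT NULL
);
INSERT INTO test (Company_code, operating_year, Profit)
VALUES ('A', 1999, 2000), ('A', 2000, 3000), ('B', 1999, 1600), ('B', 2000, 4000);

-- 1) Sort by an output column name of the UNION.
--    Output column names come from the first SELECT; the alias makes that explicit.
SELECT t.company_code AS company_code, t.profit AS profit
FROM test t
WHERE t.company_code = 'A'
UNION
SELECT t.company_code, t.profit
FROM test t
WHERE t.company_code = 'B'
ORDER BY profit;

-- 2) Sort by ordinal position (2 = the second output column).
SELECT t.company_code, t.profit FROM test t WHERE t.company_code = 'A'
UNION
SELECT t.company_code, t.profit FROM test t WHERE t.company_code = 'B'
ORDER BY 2;
```

Both forms sort the combined result, which is the only thing ORDER BY can see once the set operator has been applied.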

Because you're not sorting the separate queries, which reference their columns through a table alias; you are sorting the output of the UNION, which behaves almost like a table itself.
The ORDER BY is applied after the UNION, so this is equivalent:
SELECT * FROM (
SELECT
t.company_code,
t.profit
FROM test t
WHERE t.company_code = 'A'
UNION
SELECT
t.company_code,
t.profit
FROM test t
WHERE t.company_code = 'B'
) t
ORDER BY t.profit; --- Works: t now names the derived table
--ORDER BY profit; --- Does work


SQL - expand dataset into lookup table?

I currently have a legacy table that looks like the one below.
This is a set of rules that our business has stored over the years. The issue is that the "all" and "both" values should really be separated out into rows so they can be queried more efficiently.
For example, the contract length column can only ever be between 1 and 5, the type column can only ever be "gas" or "water", and the sales channel "internal" or "external". Instead of saying all or both, a separate row should exist for each specific rule, and the table should look like the one below.
So this will have a row for every variation in the first table.
I didn't think it would be a long task to do manually, but I was wrong :)
Does anyone have any idea how to achieve this quickly in SQL? I would say what I have tried so far, but I am completely stumped on this one, so I'm wondering whether it can even be done at all.
This could be done in a single SQL statement, but for the sake of your mental health, doing it in steps (so you can check the interim result sets before you get to the final output) is probably a lot healthier and less risky.
I would approach this with UNION queries: one set of UNIONs for each column that should be split out into more granular rows.
For instance for contractlength:
SELECT Supplier, 1, Type, SalesChannel FROM yourtable WHERE contractLength in ('1', 'All')
UNION ALL
SELECT Supplier, 2, Type, SalesChannel FROM yourtable WHERE contractLength in ('2', 'All')
UNION ALL
SELECT Supplier, 3, Type, SalesChannel FROM yourtable WHERE contractLength in ('3', 'All')
UNION ALL
SELECT Supplier, 4, Type, SalesChannel FROM yourtable WHERE contractLength in ('4', 'All')
UNION ALL
SELECT Supplier, 5, Type, SalesChannel FROM yourtable WHERE contractLength in ('5', 'All')
You can write those results out to a temp table, and then build your query for type on top of it writing to a new temp table.
SELECT Supplier, contractLength, 'Gas', SalesChannel FROM previousTempTable WHERE type in ('Gas','Both')
UNION ALL
SELECT Supplier, contractLength, 'Water', SalesChannel FROM previousTempTable WHERE type in ('Water','Both')
Rinse and repeat for SalesChannel.
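A compact sketch of the staged approach, under stated assumptions: the question's real table is not shown, so `rules` and its two rows are hypothetical, and ordinary tables `step1` and `step2` (also hypothetical names) stand in for the temp tables so the script should run on SQLite as well as SQL Server. Each stage reads the previous stage's output and expands one column.

```sql
-- Hypothetical sample rules (the real legacy table is not shown in the question).
CREATE TABLE rules (
    Supplier       VARCHAR(20),
    ContractLength VARCHAR(3),  -- '1'..'5' or 'All'
    Type           VARCHAR(5),  -- 'Gas', 'Water' or 'Both'
    SalesChannel   VARCHAR(8)   -- 'Internal', 'External' or 'Both'
);
INSERT INTO rules VALUES ('S1', 'All', 'Gas', 'Internal'), ('S2', '3', 'Both', 'External');

-- Stage 1: expand ContractLength ('All' matches every branch).
CREATE TABLE step1 (Supplier VARCHAR(20), ContractLength VARCHAR(3), Type VARCHAR(5), SalesChannel VARCHAR(8));
INSERT INTO step1
SELECT Supplier, '1', Type, SalesChannel FROM rules WHERE ContractLength IN ('1', 'All')
UNION ALL
SELECT Supplier, '2', Type, SalesChannel FROM rules WHERE ContractLength IN ('2', 'All')
UNION ALL
SELECT Supplier, '3', Type, SalesChannel FROM rules WHERE ContractLength IN ('3', 'All')
UNION ALL
SELECT Supplier, '4', Type, SalesChannel FROM rules WHERE ContractLength IN ('4', 'All')
UNION ALL
SELECT Supplier, '5', Type, SalesChannel FROM rules WHERE ContractLength IN ('5', 'All');

-- Stage 2: expand Type on top of stage 1 ('Both' matches both branches).
CREATE TABLE step2 (Supplier VARCHAR(20), ContractLength VARCHAR(3), Type VARCHAR(5), SalesChannel VARCHAR(8));
INSERT INTO step2
SELECT Supplier, ContractLength, 'Gas', SalesChannel FROM step1 WHERE Type IN ('Gas', 'Both')
UNION ALL
SELECT Supplier, ContractLength, 'Water', SalesChannel FROM step1 WHERE Type IN ('Water', 'Both');

-- Stage 3 (SalesChannel) would follow the same pattern, reading from step2.
SELECT * FROM step2 ORDER BY Supplier, ContractLength, Type;
```

With these sample rows, stage 1 yields 6 rows (five for S1, one for S2) and stage 2 yields 7, because only S2's 'Both' doubles. Checking counts like these between stages is the main benefit of not doing it all in one statement.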
There are other, more elegant ways to solve this with SELECT DISTINCT and cross joins, but your list of values for each column is limited, and the solution I'm proposing is a quick, easy way to get your data in shape. It's also easy to understand if this is auditable data or the process needs to be repeated.
You don't need to query your table multiple times, or use temp tables. You can do this pretty elegantly with conditional unpivots, by using CROSS APPLY
SELECT
t.Supplier,
c1.ContractLength,
c2.Type,
c3.SalesChannel
FROM YourTable t
CROSS APPLY (
SELECT t.ContractLength
WHERE t.ContractLength <> 'All'
UNION ALL
SELECT *
FROM (VALUES
('1'),('2'),('3'),('4'),('5') -- strings, to match the varchar ContractLength column
) v(ContractLength)
WHERE t.ContractLength = 'All'
) c1
CROSS APPLY (
SELECT t.Type
WHERE t.Type <> 'Both'
UNION ALL
SELECT *
FROM (VALUES
('Gas'),('Water')
) v(Type)
WHERE t.Type = 'Both'
) c2
CROSS APPLY (
SELECT t.SalesChannel
WHERE t.SalesChannel <> 'Both'
UNION ALL
SELECT *
FROM (VALUES
('Internal'),('External')
) v(SalesChannel)
WHERE t.SalesChannel = 'Both'
) c3;
A somewhat less efficient, but more compact, version of the same, is to use normal joins against the VALUES clauses
SELECT
t.Supplier,
c1.ContractLength,
c2.Type,
c3.SalesChannel
FROM YourTable t
JOIN (VALUES
('1'),('2'),('3'),('4'),('5') -- strings, so comparing with 'All' never forces an int conversion
) c1(ContractLength)
ON c1.ContractLength = t.ContractLength OR t.ContractLength = 'All'
JOIN (VALUES
('Gas'),('Water')
) c2(Type)
ON c2.Type = t.Type OR t.Type = 'Both'
JOIN (VALUES
('Internal'),('External')
) c3(SalesChannel)
ON c3.SalesChannel = t.SalesChannel OR t.SalesChannel = 'Both';
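The join-based idea can also be tried on engines that don't accept `(VALUES ...) alias(column)` derived tables (SQLite, for example) by putting the allowed values in small lookup tables. This is a sketch: `legacy_rules`, `lengths`, `types`, `channels` and the sample rows are hypothetical names and data, not from the question. Each join matches either the specific value or, when the rule says 'All'/'Both', every value in the lookup table.

```sql
-- Hypothetical sample rules.
CREATE TABLE legacy_rules (Supplier VARCHAR(20), ContractLength VARCHAR(3), Type VARCHAR(5), SalesChannel VARCHAR(8));
INSERT INTO legacy_rules VALUES ('S1', 'All', 'Gas', 'Internal'), ('S2', '3', 'Both', 'Both');

-- Lookup tables with every allowed value per column (kept as text, like the source columns).
CREATE TABLE lengths (ContractLength VARCHAR(3));
INSERT INTO lengths VALUES ('1'), ('2'), ('3'), ('4'), ('5');
CREATE TABLE types (Type VARCHAR(5));
INSERT INTO types VALUES ('Gas'), ('Water');
CREATE TABLE channels (SalesChannel VARCHAR(8));
INSERT INTO channels VALUES ('Internal'), ('External');

-- One output row per combination the rule covers.
SELECT r.Supplier, l.ContractLength, ty.Type, c.SalesChannel
FROM legacy_rules r
JOIN lengths  l  ON l.ContractLength = r.ContractLength OR r.ContractLength = 'All'
JOIN types    ty ON ty.Type = r.Type OR r.Type = 'Both'
JOIN channels c  ON c.SalesChannel = r.SalesChannel OR r.SalesChannel = 'Both'
ORDER BY r.Supplier, l.ContractLength, ty.Type, c.SalesChannel;
```

Here S1 expands to 5 × 1 × 1 = 5 rows and S2 to 1 × 2 × 2 = 4 rows, 9 in total.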

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use GROUP BY with a HAVING clause, like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
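The filtering approach can be checked against the question's sample data. This sketch uses a regular table named `tempx1` instead of the `#tempx1` temp table so it should also run outside SQL Server; the `DISTINCT` is an addition here (not part of the answer above) that collapses duplicate `Foo`/`OtherKeyId` pairs, mirroring the GROUP BY in the question.

```sql
-- Sample data from the question (regular table instead of #tempx1).
CREATE TABLE tempx1 (Id INT, Foo VARCHAR(10), OtherKeyId INT);
INSERT INTO tempx1 (Id, Foo, OtherKeyId) VALUES
(1, 'A', NULL), (2, 'B', NULL), (3, 'B', 1),
(4, 'C', NULL), (5, 'C', 1), (6, 'C', 2);

-- Keep every non-NULL OtherKeyId; keep a NULL row only when its Foo
-- has no non-NULL OtherKeyId at all.
SELECT DISTINCT t.Foo, t.OtherKeyId
FROM tempx1 t
WHERE t.OtherKeyId IS NOT NULL
   OR NOT EXISTS (SELECT 1
                  FROM tempx1 t2
                  WHERE t2.Foo = t.Foo AND t2.OtherKeyId IS NOT NULL)
ORDER BY t.Foo, t.OtherKeyId;
```

This returns the four rows from the desired output: A with NULL, B with 1, and C with 1 and 2.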
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution of counting the items, which leads to two inner queries over the data set with two different GROUP BYs. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Query to find if a aggregate string contains certain numbers

I am working on Big Query Standard SQL. I have a data table like shown below (using ; as separator):
id;operation
107327;-1,-1,-1,-1,5,-1,0,2,-1
108296;-1,6,2,-1,-1,-1
690481;0,-1,-1,-1,5
102643;5,-1,-1,-1,-1,-2,2,3,-1,0,-1,-1,-1,-1,-1,-1
103171;0,5
789481;0,-1,5
I would like to select the ids whose operation contains only 0,5 or 0,-1,5, so the result will show:
690481
103171
789481
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.table`
WHERE 0 = (
SELECT COUNT(1)
FROM UNNEST(SPLIT(operation)) op
WHERE NOT op IN ('0', '-1', '5')
)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 107327 id, '-1,-1,-1,-1,5,-1,0,2,-1' operation UNION ALL
SELECT 108296, '-1,6,2,-1,-1,-1' UNION ALL
SELECT 690481, '0,-1,-1,-1,5' UNION ALL
SELECT 102643, '5,-1,-1,-1,-1,-2,2,3,-1,0,-1,-1,-1,-1,-1,-1' UNION ALL
SELECT 103171, '0,5' UNION ALL
SELECT 789481, '0,-1,5'
)
SELECT *
FROM `project.dataset.table`
WHERE 0 = (
SELECT COUNT(1)
FROM UNNEST(SPLIT(operation)) op
WHERE NOT op IN ('0', '-1', '5')
)
which returns the rows for ids 690481, 103171 and 789481.
I think a regular expression does what you want:
select t.*
from t
where regexp_contains(operation, '^0,(-1,)*5$');
If you want matches to rows that contain only 0, -1, or 5, you would use:
where regexp_contains(operation, '^((0|-1|5),)*(0|-1|5)$');

use SUM on certain conditions

I have a script that extracts transactions and their details from a database. My users complain that the generated file is too large, so they asked for transactions of a certain classification, say Checking Accounts, to be summed up/consolidated instead. That means there should be only one line in the result set named "Checking", containing the sum of all transactions under Checking Accounts. Is there a way for an SQL script to do something like:
CASE
WHEN Acct_class = 'Checking'
then sum(tran_amount)
ELSE tran_amount
END
I already have the proper GROUP BY and ORDER BY statements, but I can't seem to get my desired output. There is still more than one "Checking" line in the result set. Any ideas would be very much appreciated.
Try this:
Select sum(tran_amount) From YOUR_TABLE Where Acct_class = 'Checking'
You can try to achieve this using UNION ALL
SELECT tran_amount, .... FROM table WHERE NOT Acct_class = 'Checking'
UNION ALL
SELECT SUM(tran_amount), .... FROM table WHERE Acct_class = 'Checking' GROUP BY Acct_class, ...;
You can try the SQL below:
select account_class,
case when account_class = 'saving' then listagg(trans_detail, ',') within group (order by emp_name) -- will give you all details transactions
when account_class = 'checking' then to_char(sum(trans_detail)) -- will give you only sum of transactions
end as trans_det from emp group by account_class;
Or, if your desired output is either the sum or the actual column value, depending on another column's value, the solution would be to use an analytic function to get the sum alongside the actual value:
select
decode(acct_class, 'Checking', tran_amount_sum, tran_amount)
from (
select
sum(tran_amount) over (partition by acct_class) as tran_amount_sum,
tran_amount,
acct_class
from
YOUR_TABLE
)
You can try something like the following, by keeping single rows for some classes, and aggregating for some others:
with test (id, class, amount) as
(
select 1, 'a' , 100 from dual union all
select 2, 'a' , 100 from dual union all
select 3, 'Checking', 100 from dual union all
select 4, 'Checking', 100 from dual union all
select 5, 'c' , 100 from dual union all
select 6, 'c' , 100 from dual union all
select 7, 'c' , 100 from dual union all
select 8, 'd' , 100 from dual
)
select sum(amount), class
from test
group by case
when class = 'Checking' then null /* aggregates all elements of class 'Checking' into one row */
else id /* keeps elements of other classes not aggregated */
end,
class
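The same GROUP BY trick, sketched in more portable SQL (no `dual`, so it should run on SQLite or SQL Server). `Acct_class` and `tran_amount` come from the question; the table name `transactions`, its `id` column, and the sample rows are hypothetical. Non-Checking rows each land in their own group because the CASE yields their unique id, while every Checking row gets the group key NULL and collapses into a single summed line.

```sql
-- Hypothetical transactions table with a unique id per row.
CREATE TABLE transactions (id INT, Acct_class VARCHAR(20), tran_amount INT);
INSERT INTO transactions VALUES
(1, 'Savings', 100), (2, 'Savings', 150),
(3, 'Checking', 100), (4, 'Checking', 250),
(5, 'Loan', 75);

SELECT Acct_class, SUM(tran_amount) AS tran_amount
FROM transactions
GROUP BY CASE WHEN Acct_class = 'Checking' THEN NULL  -- one shared group for Checking
              ELSE id                                  -- one group per row otherwise
         END,
         Acct_class
ORDER BY Acct_class;
```

With these rows the result has four lines: one Checking line summing to 350, the single Loan row, and the two Savings rows kept separate.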

How to write a Sql statement without using union?

I have a SQL statement like the one below. How can I add a single row (code = 0, desc = 1) to the result of this statement without using the UNION keyword? Thanks.
select code, desc
from material
where material.ExpireDate ='2010/07/23'
You could create a view over your table that itself uses the UNION keyword:
CREATE VIEW material_view AS SELECT code, desc, ExpireDate FROM material UNION SELECT '0', '1', NULL;
SELECT code, desc FROM material_view WHERE ExpireDate = '2010/07/23' OR code = '0';
WITH material AS
(
SELECT *
FROM
(VALUES (2, 'x', '2010/07/23'),
(3, 'y', '2009/01/01'),
(4, 'z', '2010/07/23')) vals (code, [desc], ExpireDate)
)
SELECT
COALESCE(m.code,x.code) AS code,
COALESCE(m.[desc],x.[desc]) AS [desc]
FROM material m
FULL OUTER JOIN (SELECT 0 AS code, '1' AS [desc] ) x ON 1=0
WHERE m.code IS NULL OR m.ExpireDate ='2010/07/23'
Gives
code desc
----------- ----
2 x
4 z
0 1
Since you don't want to use either a union or a view, I'd suggest adding a dummy row to the material table (with code = 0, desc = 1, and ExpireDate something that would never normally be selected - eg. 01 January 1900) - then use a query like the following:
select code, desc
from material
where material.ExpireDate ='2010/07/23' or
material.ExpireDate ='1900/01/01'
Normally, a Union would be my preferred option.