Using group/order by with union clause in sql query - sql

I have four SQL queries that return the same columns, so I am trying to combine them using a UNION clause. Below is what I have tried, but it gives me an error:
select clientid,
'Test1' as client_name,
client_timestamp,
sum(client_counts) as count,
processIds as contracts
from output_1
group by 1,2,3,5
order by 1
UNION
select clientid,
'Test2' as client_name,
client_timestamp,
sum(client_counts) as count,
'' as contracts
from output_2
group by 1,2,3,5
order by 1
UNION
select clientid,
'Test3' as client_name,
client_timestamp,
sum(kite_count) as count,
process_metric as contracts
from output_3
group by 1,2,3,5
order by 1
UNION
select clientid,
'Test4' as client_name,
execution_client_ts as client_timestamp,
sum(kite_count) as count,
process_data as contracts
from output_4
group by 1,2,3,5
order by 1
The error I get is "Invalid Syntax" around the UNION line. Is there anything wrong with what I am doing here?

A union query may only have one order by clause.
If you are satisfied with ordering the whole resultset, you can remove all order by clauses and just keep the very last one, at the end of the query. It applies to the entire dataset that union generates.
Note that your UNIONs are equivalent to UNION ALLs - because the client name is different in each member - and should be phrased as such.
If, on the other hand, you want to order each sub-result, then this is different. Basically you need a flag in each member that can then be used to identify each group. The client name might be a good pick, so:
order by client_name, clientid
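As a rough sketch of the single trailing ORDER BY, the snippet below runs a cut-down version of the query against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, only two of the four tables are shown, the aggregate is aliased as cnt rather than count, and SQLite is only standing in for whatever database the question uses, so details may differ there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE output_1 (clientid INTEGER, client_timestamp TEXT,
                           client_counts INTEGER, processIds TEXT);
    CREATE TABLE output_2 (clientid INTEGER, client_timestamp TEXT,
                           client_counts INTEGER);
    -- invented sample rows, for illustration only
    INSERT INTO output_1 VALUES (2, '2020-01-01', 5, 'p1'),
                                (2, '2020-01-01', 7, 'p1'),
                                (1, '2020-01-01', 3, 'p2');
    INSERT INTO output_2 VALUES (1, '2020-01-02', 4),
                                (1, '2020-01-02', 6);
""")

query = """
    SELECT clientid, 'Test1' AS client_name, client_timestamp,
           SUM(client_counts) AS cnt, processIds AS contracts
    FROM output_1
    GROUP BY clientid, client_timestamp, processIds
    UNION ALL
    SELECT clientid, 'Test2' AS client_name, client_timestamp,
           SUM(client_counts) AS cnt, '' AS contracts
    FROM output_2
    GROUP BY clientid, client_timestamp
    ORDER BY client_name, clientid   -- the only ORDER BY, after the last member
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each member keeps its own GROUP BY, while the one ORDER BY at the end sorts the combined result, first by the per-member flag and then by client.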

Related

Is it possible to UNION distinct rows but disregard one column to determine uniqueness?

select d.id, d.registration_number
from DOCUMENTS d
union
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
Would it be possible to union those results based solely on the uniqueness of the registration_number, disregarding the id of the documents?
Or, is it possible to achieve the same result in a different way?
Just to add: actually I'm unioning 5 queries, each ~20 lines long, with 4 columns that should be disregarded in determining uniqueness.
You basically need to wrap the unioned data with something else to get only the ones you want. (The derived table gets an alias here, since some databases require one.)
SELECT min(id), registration_number
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
GROUP BY registration_number
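To see the difference between a plain UNION and the wrapped UNION ALL, here is a small sketch using Python's sqlite3 module with an in-memory database. The rows are made up, and SQLite is only an assumption standing in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER, registration_number TEXT);
    CREATE TABLE different_documents (id INTEGER, registration_number TEXT);
    -- invented rows: 'A-100' appears in both tables under different ids
    INSERT INTO documents VALUES (1, 'A-100'), (2, 'B-200');
    INSERT INTO different_documents VALUES (7, 'A-100'), (8, 'C-300');
""")

# Plain UNION compares whole rows, so both 'A-100' rows survive.
plain_union = conn.execute("""
    SELECT id, registration_number FROM documents
    UNION
    SELECT id, registration_number FROM different_documents
    ORDER BY registration_number, id
""").fetchall()

# UNION ALL plus GROUP BY keeps one row per registration_number.
deduped = conn.execute("""
    SELECT MIN(id) AS id, registration_number
    FROM (SELECT id, registration_number FROM documents
          UNION ALL
          SELECT id, registration_number FROM different_documents) u
    GROUP BY registration_number
    ORDER BY registration_number
""").fetchall()

print(plain_union)
print(deduped)
```

With several disregarded columns, each one would need its own aggregate (MIN, MAX, and so on), and the aggregated values are then not guaranteed to come from the same source row.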
Union will check the combination of all the columns for uniqueness. You could, however, use union all (that does not remove duplicates) and then apply the logic yourself using the row_number window function:
SELECT id, registration_number
FROM (SELECT id, registration_number,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id) AS rn
FROM (SELECT id, registration_number
FROM documents
UNION ALL
SELECT id, registration_number
FROM different_documents) u
) r
WHERE rn = 1
Since the other answers are already correct, may I ask why you need to retrieve other columns in that query, since the primary purpose appears to be gathering unique registration numbers?
Wouldn't it be simpler to first gather the unique registration numbers and then retrieve the other info?
Or, in your actual query, first gather the info without the columns that should be disregarded, and then gather the info in those columns if need be?
For example, you could make a view with
SELECT d.registration_number
FROM DOCUMENTS d
UNION
SELECT dd.registration_number
FROM DIFFERENT_DOCUMENTS dd
and then gather the information using that view and JOINs.
Assuming registration_number is unique in each table, you can use not exists:
select d.id, d.registration_number
from DOCUMENTS d
union all
select dd.id, dd.registration_number
from DIFFERENT_DOCUMENTS dd
where not exists (select 1
from DOCUMENTS d
where dd.registration_number = d.registration_number
);

Removing duplicate lines

I have written a union query, but I need to eliminate the rows that are duplicated (rows 2 and 3 in the column 'kods') and keep only distinct values of the column 'kods'. How can that be done?
You need to decide which of the id values to discard, using either min or max, and group by the remaining columns. You don't need distinct and can use union all, since group by will perform the dedupe.
select kods, min(id) id, vards, uzvards from (
select kods, id, vards, uzvards
from dataset
union all
select kods, id, vards, uzvards
from dataset_2
)x
group by kods, vards, uzvards

group by and union in oracle

I would like to union two queries, but I am facing an error in Oracle.
select count(*) as faultCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='FAULT'
union
select count(*) as responseCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='RESPONSE'
group by COMP_IDENTIFIER
order by responseCount;
The two queries run perfectly individually, but when using union, it says ORA-00904: "RESPONSECOUNT": invalid identifier
The error you've run into
In Oracle, it's best to always name each column in each UNION subquery the same way. In your case, the following should work:
select count(*) as theCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='FAULT'
group by COMP_IDENTIFIER -- don't forget this
union
select count(*) as theCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='RESPONSE'
group by COMP_IDENTIFIER
order by theCount;
See also:
Curious issue with Oracle UNION and ORDER BY
A good workaround is, of course, to use indexed column references, as suggested by a_horse_with_no_name.
The query you really wanted
From your comments, however, I suspect you wanted to write an entirely different query, namely:
select count(case AUDIT_CONTEXT when 'FAULT' then 1 end) as faultCount,
count(case AUDIT_CONTEXT when 'RESPONSE' then 1 end) as responseCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT in ('FAULT', 'RESPONSE')
group by COMP_IDENTIFIER
order by responseCount;
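The conditional-aggregation query above can be tried on a few made-up rows. This is only a sketch using Python's sqlite3 module with an in-memory SQLite database in place of Oracle; the sample data, including the extra 'REQUEST' row, is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CORDYS_NCB_LOG (COMP_IDENTIFIER TEXT, AUDIT_CONTEXT TEXT);
    -- invented rows; 'REQUEST' is filtered out by the WHERE clause
    INSERT INTO CORDYS_NCB_LOG VALUES
        ('compA', 'FAULT'), ('compA', 'RESPONSE'), ('compA', 'RESPONSE'),
        ('compB', 'RESPONSE'), ('compB', 'REQUEST');
""")

rows = conn.execute("""
    SELECT COUNT(CASE AUDIT_CONTEXT WHEN 'FAULT' THEN 1 END) AS faultCount,
           COUNT(CASE AUDIT_CONTEXT WHEN 'RESPONSE' THEN 1 END) AS responseCount,
           COMP_IDENTIFIER
    FROM CORDYS_NCB_LOG
    WHERE AUDIT_CONTEXT IN ('FAULT', 'RESPONSE')
    GROUP BY COMP_IDENTIFIER
    ORDER BY responseCount
""").fetchall()

for fault_count, response_count, comp in rows:
    print(comp, fault_count, response_count)
```

COUNT ignores the NULL that the CASE expression yields for non-matching rows, which is why each component gets both counts side by side on one row instead of two rows from a union.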
The column names of a union are determined by the first query. So your first column is actually named FAULTCOUNT.
But the easiest way to sort the result of a union is to use the column index:
select ...
union
select ...
order by 1;
You most probably also want to use UNION ALL, which avoids removing duplicates between the two queries and is faster than a plain UNION.
In a UNION or UNION ALL query, the column names are determined by the column names of the first query.
In your query, replace "order by responseCount" with "order by faultCount".
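The naming rule can be checked from client code. The sketch below uses Python's sqlite3 module and an in-memory SQLite database with invented rows; it only illustrates that the first member supplies the result's column name and that ordering by position avoids the question altogether. It does not try to reproduce ORA-00904, which is specific to Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CORDYS_NCB_LOG (COMP_IDENTIFIER TEXT, AUDIT_CONTEXT TEXT);
    -- invented sample rows
    INSERT INTO CORDYS_NCB_LOG VALUES
        ('compA', 'FAULT'),
        ('compA', 'RESPONSE'), ('compA', 'RESPONSE'),
        ('compB', 'RESPONSE'), ('compB', 'RESPONSE'), ('compB', 'RESPONSE');
""")

cur = conn.execute("""
    SELECT COUNT(*) AS faultCount, COMP_IDENTIFIER
    FROM CORDYS_NCB_LOG WHERE AUDIT_CONTEXT = 'FAULT'
    GROUP BY COMP_IDENTIFIER
    UNION ALL
    SELECT COUNT(*) AS responseCount, COMP_IDENTIFIER
    FROM CORDYS_NCB_LOG WHERE AUDIT_CONTEXT = 'RESPONSE'
    GROUP BY COMP_IDENTIFIER
    ORDER BY 1, 2   -- positional: no dependence on either alias
""")
first_column_name = cur.description[0][0]  # taken from the first member
rows = cur.fetchall()

print(first_column_name)
print(rows)
```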

Return number of records in SQL select union

I am trying to create a select query whose output meets a certain format. I need the string "Record Count" in the first row, with the number of rows in the second column of that row. Then I need to union it with another query.
Record Count 125
2134123
Here's a sample of how I want the output to appear in a CSV:
Record Count,125
99902064
12312312
I tried the following code
SELECT 'Record Count', count(select loginid
from employees)
FROM dual
union
select loginid
from employees
When I do this, it puts the words "Record Count" in all the rows. I only want "Record Count" in row 1, with the actual number in the next column. I was also considering just changing the column header to "Record Count", but I couldn't figure out how to make the next column header a number, i.e. use count(*).
If you need rows in a particular order, then you need to use order by. Here is one method:
select loginid, cnt
FROM (SELECT 'Record Count' as loginid, (select count(loginid) from employees) as cnt, 1 as ordering
FROM dual
union all
select loginid, NULL, 2
from employees
) t
order by ordering;
The subqueries in a union should also have the same columns, and the columns should be given names. And, I'm not aware that you can use a subquery as the argument to count().
For this form, this is a better way to write the query:
select loginid, cnt
FROM (SELECT 'Record Count' as loginid, count(loginid) as cnt, 1 as ordering
FROM employees
union all
select loginid, NULL, 2
from employees
) t
order by ordering;
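Here is a minimal sketch of that ordering-column approach, run through Python's sqlite3 module on an in-memory SQLite database rather than Oracle. The loginid values are stored as text and are invented, and the extra sort on loginid is an addition made only so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (loginid TEXT);
    -- invented sample rows
    INSERT INTO employees VALUES ('99902064'), ('12312312'), ('2134123');
""")

rows = conn.execute("""
    SELECT loginid, cnt
    FROM (SELECT 'Record Count' AS loginid, COUNT(loginid) AS cnt, 1 AS ordering
          FROM employees
          UNION ALL
          SELECT loginid, NULL, 2
          FROM employees) t
    ORDER BY ordering, loginid
""").fetchall()

# Format as CSV-style lines: only the header row carries a second field.
csv_lines = [name if cnt is None else f"{name},{cnt}" for name, cnt in rows]
print("\n".join(csv_lines))
```

The ordering column exists only to pin the count row to the top; it is left out of the outer select list, so it never appears in the output.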
You can achieve the same result without the union by using SQL*Plus spool.
For an example of how to spool, see: How do I spool to a CSV formatted file using SQLPLUS?
SELECT 'Record Count' || ',' || (select count(loginid)
from employees)
FROM dual;
select loginid
from employees;

How to do a query that is agnostic of the sort field?

I have multiple tables that each have the same date_time added field. After doing a UNION of all the tables, I want to sort the rows by the most recent one. But the query tells me that I have to add a table name, like videos.date_time, rather than just ORDER BY date_time. How can I structure the query so that it is agnostic of which date_time field is used?
Unless you are using a proprietary feature such as SQL Server's TOP directive, the Order By in a Union query is always at the bottom and always applies to the entire query. E.g.
Select Col1, date_time
From Table1
Union All
Select Col1, date_time
From Table2
Order By date_time
If your query does include elements such as TOP or LIMIT, which require an Order By, and you therefore want to differentiate the Order By's, then you can encapsulate your query in a derived table:
Select Col, date_time
From (
Select Col1 As Col, date_time
From Table1
Union All
Select Col1, date_time
From Table2
) As Z
Order By Z.date_time
In SQL Server you can also order by a column number, e.g. "ORDER BY 2" in which case whatever the second column is in your union set would be the sort target.
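The derived-table form can be sketched with Python's sqlite3 module on an in-memory SQLite database. The videos table name comes from the question; the photos table, the sample rows, and the extra source column are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER, date_time TEXT);
    CREATE TABLE photos (id INTEGER, date_time TEXT);  -- hypothetical second table
    -- invented sample rows
    INSERT INTO videos VALUES (1, '2021-01-01 10:00:00'), (2, '2021-03-01 09:00:00');
    INSERT INTO photos VALUES (1, '2021-02-01 12:00:00'), (2, '2021-04-01 08:00:00');
""")

rows = conn.execute("""
    SELECT source, id, date_time
    FROM (SELECT 'videos' AS source, id, date_time FROM videos
          UNION ALL
          SELECT 'photos' AS source, id, date_time FROM photos) AS z
    ORDER BY z.date_time DESC   -- refers to the derived table, not a base table
""").fetchall()

for row in rows:
    print(row)
```

Because the ORDER BY names a column of the derived table z, no base table name such as videos.date_time is needed.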
As I understand it, you have X tables (where X > 1), every table has its own date_time column, and you want to get the most recently updated rows. If that's true, then one possible way is to do it like this:
SELECT id, date_time FROM table1
UNION ALL
SELECT id, date_time FROM table2
ORDER BY date_time DESC;
Another way I have in mind is to fetch the results, put them in an array, and do the "magic" (the sorting) inside it.