Union distinct in Hive

How do I combine rows while removing duplicate rows?
I tried UNION by itself as well as UNION DISTINCT but both returned error messages in Hue.
error while compiling statement: failed: parseexception line 5:10
mismatched input 'distinct' expecting all near 'union' in set operator
SELECT DISTINCT(product1.user)
FROM product1
UNION
SELECT DISTINCT(product2.user)
FROM product2
UNION
SELECT DISTINCT(product3.user)
FROM product3

As answered by @Andrew in the comments:
If you are on a Hive version prior to 1.2, only UNION ALL is supported. So you'll have to use UNION ALL and wrap an outer query around it for the DISTINCT:
select distinct user from (select user from product1 union all select user from product2...) t
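A minimal sketch of the UNION ALL plus outer DISTINCT pattern described above. It runs against an in-memory SQLite database purely as a stand-in for Hive, and the table contents are made up for illustration; only the shape of the query carries over.

```python
import sqlite3

# SQLite stands in for Hive here; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product1 (user TEXT);
    CREATE TABLE product2 (user TEXT);
    CREATE TABLE product3 (user TEXT);
    INSERT INTO product1 VALUES ('ann'), ('bob'), ('ann');
    INSERT INTO product2 VALUES ('bob'), ('cid');
    INSERT INTO product3 VALUES ('cid'), ('dee');
""")

# UNION ALL keeps every row; the outer DISTINCT removes the duplicates.
query = """
    SELECT DISTINCT user FROM (
        SELECT user FROM product1
        UNION ALL
        SELECT user FROM product2
        UNION ALL
        SELECT user FROM product3
    ) t
    ORDER BY user
"""
users = [row[0] for row in conn.execute(query)]
print(users)
```

The ORDER BY is only there to make the output stable; it is not needed for the de-duplication itself.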

Related

How to UNION ALL tables with tablename satisfying condition in BigQuery?

I can unify two tables via:
select * from project.dataset.ourtable1
union all
select * from project.dataset.ourtable2
But what if I have thousands of tables, and I want to unify all which have name starting with ourtable?
I can get all such tables via:
select table_id from project.dataset.__TABLES__
where starts_with(table_id,'ourtable')
which returns a column of tables with table_id starting with ourtable.
How do I perform union all on all of them?
To rephrase the question: I am looking for the equivalent of
select * from project.dataset.ourtable1
union all
select * from project.dataset.ourtable2
union all
.
.
.
union all
select * from project.dataset.ourtable9999
in BigQuery.
A similar thread exists, but it is for SQL Server, not BigQuery.
You can use a wildcard table, which reads every table whose name starts with the given prefix:
select * from `project.dataset.ourtable*`
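To make concrete what the wildcard saves you from, here is a rough sketch of the manual alternative: list the matching table names from the catalog and stitch one SELECT per table together with UNION ALL. It uses SQLite as a stand-in (its sqlite_master catalog plays the role of __TABLES__), and the tables and rows are hypothetical. In BigQuery itself the wildcard query above is the simpler route.

```python
import sqlite3

# SQLite stands in for BigQuery; tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ourtable1 (id INTEGER);
    CREATE TABLE ourtable2 (id INTEGER);
    CREATE TABLE othertable (id INTEGER);
    INSERT INTO ourtable1 VALUES (1), (2);
    INSERT INTO ourtable2 VALUES (3);
    INSERT INTO othertable VALUES (99);
""")

# Find the tables whose name starts with the prefix.
names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name LIKE 'ourtable%' ORDER BY name"
)]

# Build one SELECT per table and join them with UNION ALL.
union_sql = " UNION ALL ".join(f'SELECT * FROM "{name}"' for name in names)
rows = sorted(row[0] for row in conn.execute(union_sql))
print(names, rows)
```

As with the wildcard, this only works when all matching tables have compatible columns.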

Cross join InterBase get duplicate record based on condition

I am currently working with InterBase through Firebird. I need to return each record twice, once per condition, but every time I run the query I get an error message. Is there a way to do a cross join in InterBase, or a simpler way to add this logic to the query? It seems that Firebird won't accept the cross join.
SELECT EMPLOYEE_NAME, LAST_NAME, CACODE
FROM EMPLOYEESTABLE
(
SELECT 889 AS CACODE, 8592-265-44444 AS STANDARDCACODE
UNION ALL
SELECT 695 AS CACODE, 8554-265-44578 AS STANDARDCACODE
) C
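In Firebird, SELECT always needs a FROM clause. The system table RDB$DATABASE always returns exactly one row, so we can select the constants from it. Note that the STANDARDCACODE values must be quoted; unquoted, they are evaluated as arithmetic. I think you are looking to do this: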
SELECT EMPLOYEE_NAME, LAST_NAME, CACODE
FROM EMPLOYEESTABLE,
(
SELECT 889 AS CACODE, '8592-265-44444' AS STANDARDCACODE FROM RDB$DATABASE
UNION ALL
SELECT 695 AS CACODE, '8554-265-44578' AS STANDARDCACODE FROM RDB$DATABASE
) C
SELECT
889 AS CACODE,
/* '8592-265-44444' AS STANDARDCACODE, >>unused<< */
EMPLOYEE_NAME,
LAST_NAME
FROM EMPLOYEESTABLE
UNION ALL
SELECT
695 AS CACODE,
/* '8554-265-44578' AS STANDARDCACODE, >>unused<< */
EMPLOYEE_NAME,
LAST_NAME
FROM EMPLOYEESTABLE
Who needs joins for such a simple query? :-D
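The two approaches above (cross join against a two-row derived table, and selecting from the table twice with UNION ALL) should produce the same rows. The sketch below checks that on an in-memory SQLite database with made-up employees. SQLite allows SELECT without FROM, so the RDB$DATABASE part is omitted here; in Firebird it is still required.

```python
import sqlite3

# SQLite stands in for Firebird/InterBase; the employee rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEESTABLE (EMPLOYEE_NAME TEXT, LAST_NAME TEXT);
    INSERT INTO EMPLOYEESTABLE VALUES ('Ada', 'Lovelace'), ('Alan', 'Turing');
""")

# Approach 1: implicit cross join against a two-row derived table.
cross_join = """
    SELECT EMPLOYEE_NAME, LAST_NAME, CACODE
    FROM EMPLOYEESTABLE,
    (
        SELECT 889 AS CACODE
        UNION ALL
        SELECT 695 AS CACODE
    ) C
"""

# Approach 2: select the table twice, with a different constant each time.
union_all = """
    SELECT EMPLOYEE_NAME, LAST_NAME, 889 AS CACODE FROM EMPLOYEESTABLE
    UNION ALL
    SELECT EMPLOYEE_NAME, LAST_NAME, 695 AS CACODE FROM EMPLOYEESTABLE
"""

a = sorted(conn.execute(cross_join).fetchall())
b = sorted(conn.execute(union_all).fetchall())
print(a == b, len(a))
```

Each employee appears once per CACODE in both results, so the choice between the two is mostly a matter of readability.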

Selecting multiple IDs from a table gives error ORA-02070

When I try to select multiple IDs from a table, I get error ORA-02070.
Here is the query that I am using:
select *
from hrs_employee_store
where employee_id in (13511677, 576000);
Here is the error which I am getting:
ORA-02070: database ODS_XSTORE does not support TO_NUMBER in this context
Also, when I use this query,
select * from hrs_employee_store
where employee_id in ('13511677', '576000');
I am just getting the row for 13511677.
Is there a way to fix this issue? Thanks
I suspect that EMPLOYEE_ID is not a number. Try:
select *
from hrs_employee_store
where EMPLOYEE_ID in ('13511677', '576000');
This returns only the matching employees, which means there is no row for the second ID.
If you want a row for every requested ID, with NULL in the remaining columns where nothing matches, you can use a left join:
select *
from (select '13511677' as employee_id from dual union all
select '576000' from dual
) eid left join
hrs_employee_store es
using (employee_id);
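A small sketch of what the left join against a literal ID list returns, run on SQLite rather than Oracle (so there is no DUAL). The store_id column and the single stored row are assumptions made up for the example.

```python
import sqlite3

# SQLite stands in for Oracle; store_id and the sample row are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hrs_employee_store (employee_id TEXT, store_id INTEGER);
    INSERT INTO hrs_employee_store VALUES ('13511677', 42);
""")

# The derived table lists every requested ID; the left join keeps IDs with no match.
query = """
    SELECT employee_id, store_id
    FROM (
        SELECT '13511677' AS employee_id
        UNION ALL
        SELECT '576000'
    ) eid
    LEFT JOIN hrs_employee_store es USING (employee_id)
    ORDER BY employee_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The ID that is missing from the table still comes back, with NULL (None in Python) in the joined column.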

Custom SORT BY SQL

I'm new to the community but have referenced it many times in the past. I have an issue I'm trying to overcome in Access, specifically with a SORT BY issue in SQL.
Long story short, I need to create a report based on the results of several different queries. I used a Union query to skirt the "Query is too complex" issue. The results of the query aren't in the order I'd like them, though.
Since this UNION query is not based on one specific table, but rather on the results of many queries, I'm not able to sort by a specific column header.
I want to sort the results in the order they are written in the SQL statement. Can anyone provide some insight into how to do this? I've attempted several different ways but always end up with an error message. Here's the code, and any help is greatly appreciated.
SELECT [Aqua-Anvil_Total].Expr1
FROM [Aqua-Anvil_Total];
UNION SELECT [Aqua-Reslin_Total].Expr1
FROM [Aqua-Reslin_Total];
UNION SELECT [Aqua_Zenivex_Total].Expr1
FROM [Aqua_Zenivex_Total];
UNION SELECT [Aqualuer_20-20_Total].Expr1
FROM [Aqualuer_20-20_Total];
UNION SELECT [Avalon_Total].Expr1
FROM [Avalon_Total];
UNION SELECT [BVA_13_Total].Expr1
FROM [BVA_13_Total];
UNION SELECT [Deltagard_Total].Expr1
FROM [Deltagard_Total];
UNION SELECT [Envion_Total].Expr1
FROM [Envion_Total];
UNION SELECT [Scourge_18-54_Total].Expr1
FROM [Scourge_18-54_Total];
UNION SELECT [Zenivex_E20_Total].Expr1
FROM [Zenivex_E20_Total];
Add a constant sort column to each branch of the union and order by it in an outer query. This uses UNION ALL instead of UNION, so if you were relying on UNION to remove duplicates, you will need an extra step after this.
select Expr1
from (
select [Aqua-Anvil_Total].Expr1, 0 as sort
from [Aqua-Anvil_Total]
union all select [Aqua-Reslin_Total].Expr1, 1 as sort
from [Aqua-Reslin_Total]
union all select [Aqua_Zenivex_Total].Expr1, 2 as sort
from [Aqua_Zenivex_Total]
union all select [Aqualuer_20-20_Total].Expr1, 3 as sort
from [Aqualuer_20-20_Total]
union all select [Avalon_Total].Expr1, 4 as sort
from [Avalon_Total]
union all select [BVA_13_Total].Expr1, 5 as sort
from [BVA_13_Total]
union all select [Deltagard_Total].Expr1, 6 as sort
from [Deltagard_Total]
union all select [Envion_Total].Expr1, 7 as sort
from [Envion_Total]
union all select [Scourge_18-54_Total].Expr1, 8 as sort
from [Scourge_18-54_Total]
union all select [Zenivex_E20_Total].Expr1, 9 as sort
from [Zenivex_E20_Total]
) as u
order by u.sort
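A reduced sketch of the sort-key technique, using three hypothetical tables in SQLite instead of the ten Access queries. It also shows one possible way to handle the "more work" for duplicates: group by the value and order by the smallest sort key. The table names, the sort_key alias, and the values are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for Access; table names and totals are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anvil_total (Expr1 REAL);
    CREATE TABLE reslin_total (Expr1 REAL);
    CREATE TABLE zenivex_total (Expr1 REAL);
    INSERT INTO anvil_total VALUES (30.0);
    INSERT INTO reslin_total VALUES (10.0);
    INSERT INTO zenivex_total VALUES (30.0);
""")

# Each branch carries a constant that records its position in the statement.
inner = """
    SELECT Expr1, 0 AS sort_key FROM anvil_total
    UNION ALL SELECT Expr1, 1 FROM reslin_total
    UNION ALL SELECT Expr1, 2 FROM zenivex_total
"""

# Rows in written order, duplicates kept.
ordered = [r[0] for r in conn.execute(
    f"SELECT Expr1 FROM ({inner}) AS u ORDER BY u.sort_key")]

# Duplicates removed, ordered by the first branch each value appeared in.
deduped = [r[0] for r in conn.execute(
    f"SELECT Expr1 FROM ({inner}) AS u GROUP BY Expr1 ORDER BY MIN(u.sort_key)")]

print(ordered, deduped)
```

Without the sort key, the engine is free to return union rows in any order, which is why the written order was not preserved in the first place.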

LEFT JOIN on static list of items?

The DBMS is InterSystems Caché!
Motivation: I need to do a left join on a table so I can get the same list of message types every time, even if the result is zero or null. Unfortunately, this is a large table so including a SELECT DISTINCT() is prohibitively slow. These should never change, so I thought I'd get the list once and just join them statically.
Based on another SO question, here is what I have to replace the SELECT DISTINCT():
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
This returns results that look exactly as expected, identical to the Distinct query. However, when I plug this into my JOIN statement, all the counts come back as zero.
Failing Query
SELECT mh.MessageBodyClassName, count(l.MessageBodyClassName) as MessageCount FROM
(
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
) mh LEFT JOIN
(
SELECT messageBodyClassName FROM ens.messageheader WHERE TimeCreated > DATEADD(hh, -1, GETUTCDATE())
) l ON mh.MessageBodyClassName = l.MessageBodyClassName
GROUP BY mh.MessageBodyClassName
Failed results
MessageBodyClassName MessageCount
------------------------------------- ------------
HS.MESSAGE.GATEWAYREGISTRATIONREQUEST 0
HS.MESSAGE.MERGEPATIENTREQUEST 0
HS.MESSAGE.PATIENTSEARCHREQUEST 0
Working Query
SELECT mh.MessageBodyClassName, count(l.MessageBodyClassName) as MessageCount FROM
(
SELECT DISTINCT(MessageBodyClassName) FROM ens.messageheader
) mh LEFT JOIN
(
SELECT messageBodyClassName FROM ens.messageheader WHERE TimeCreated > DATEADD(hh, -1, GETUTCDATE())
) l ON mh.MessageBodyClassName = l.MessageBodyClassName
GROUP BY mh.MessageBodyClassName
Working and expected results
MessageBodyClassName MessageCount
------------------------------------- ------------
HS.MESSAGE.GATEWAYREGISTRATIONREQUEST 0
HS.MESSAGE.MERGEPATIENTREQUEST 0
HS.MESSAGE.PATIENTSEARCHREQUEST 54
For VKP: Why are the results different? How can I adjust the first query with literals to get the proper (same) results?
The last thing I can think of is to run your DISTINCT query once and store the result in a permanent table in your database. That way the inner SELECT in your query only has to process those three rows. The inner query would then drop the DISTINCT, like this:
SELECT MessageBodyClassName FROM ens.messageheader_permvals
EDIT: The below answer did not work
This may be a longshot, but if it doesn't work it might help you diagnose the problem. Instead of the UNION try
SELECT MessageBodyClassName FROM ens.messageheader
WHERE MessageBodyClassName in (
'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST',
'HS.MESSAGE.MERGEPATIENTREQUEST',
'HS.MESSAGE.PATIENTSEARCHREQUEST')
That should return records only if those values actually exist in the table and are compatible with the format of MessageBodyClassName, which we know works using the DISTINCT version. I don't know if the performance will be better this way, but hopefully it will shed some light on the issue.
EDIT: the below answer does not apply, as the OP was actually trying to select the literal quoted values
You don't have FROM clauses in your UNION query. Try
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
FROM ens.messageheader
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
FROM ens.messageheader
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
FROM ens.messageheader
The rest of the query looks right.
I agree with xQbert: the problem is the hard-coded values.
Try
SELECT T1.MessageBodyClassName, T2.MessageBodyClassName
FROM (
SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' as MessageBodyClassName
UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
) as T1
LEFT JOIN (
SELECT DISTINCT(MessageBodyClassName) as MessageBodyClassName
FROM ens.messageheader
) as T2
ON T1.MessageBodyClassName = T2.MessageBodyClassName
Possible solution: create a temporary table
CREATE TABLE className as
SELECT DISTINCT(MessageBodyClassName) as MessageBodyClassName
FROM ens.messageheader
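For reference, here is a sketch of what the literal-list LEFT JOIN from the question is expected to return on an engine where it behaves as intended. It runs on SQLite, not Caché, so it does not reproduce or explain the zero counts reported above; the sample rows are made up and the TimeCreated filter is left out for brevity.

```python
import sqlite3

# SQLite stands in for Caché; the two sample messages are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messageheader (MessageBodyClassName TEXT);
    INSERT INTO messageheader VALUES
        ('HS.MESSAGE.PATIENTSEARCHREQUEST'),
        ('HS.MESSAGE.PATIENTSEARCHREQUEST');
""")

# Static list of message types, left-joined to the real rows and counted.
query = """
    SELECT mh.MessageBodyClassName, COUNT(l.MessageBodyClassName) AS MessageCount
    FROM (
        SELECT 'HS.MESSAGE.GATEWAYREGISTRATIONREQUEST' AS MessageBodyClassName
        UNION SELECT 'HS.MESSAGE.MERGEPATIENTREQUEST'
        UNION SELECT 'HS.MESSAGE.PATIENTSEARCHREQUEST'
    ) mh
    LEFT JOIN messageheader l
        ON mh.MessageBodyClassName = l.MessageBodyClassName
    GROUP BY mh.MessageBodyClassName
    ORDER BY mh.MessageBodyClassName
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

If the same query shape returns all zeros in Caché, comparing the literal values against the stored ones (as the T1/T2 join above does) is a reasonable next diagnostic step.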