Math operations between queries in Impala SQL

I need to divide the results of two different queries in Impala, using the HUE editor.
The query I wrote in Oracle is shown below:
select
(select count(distinct t1.ids)
from table1 t1
where extract(year from t1.insertdate)=2020)
/
(select count(distinct t2.ids)
from table2 t2
where extract(year from t2.insertdate)=2019)
from dual
On Impala the same query does not work because of the "/" operator. Can you please explain how to do the same thing in Impala SQL?

You can join them on a dummy column and then divide the result sets.
SELECT cnt1.cnt1/cnt2.cnt2
FROM
(SELECT count(DISTINCT t1.ids) cnt1, 'dummy' dum
FROM table1 t1
WHERE YEAR (t1.insertdate)=2020) cnt1
JOIN
(SELECT count(DISTINCT t2.ids) cnt2, 'dummy' dum
FROM table2 t2
WHERE YEAR (t2.insertdate)=2019) cnt2
ON cnt1.dum= cnt2.dum -- dummy column
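Since each subquery returns exactly one row, a CROSS JOIN should give the same pairing without the dummy column. Below is a minimal sketch of that idea, run against an in-memory SQLite database from Python rather than Impala; the sample rows are invented for illustration. Two things are SQLite-specific assumptions: strftime('%Y', ...) stands in for YEAR(...), and the 1.0 * factor is there because SQLite truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ids INTEGER, insertdate TEXT);
    CREATE TABLE table2 (ids INTEGER, insertdate TEXT);
    -- Hypothetical sample rows, not from the original question.
    INSERT INTO table1 VALUES
        (1, '2020-01-05'), (2, '2020-03-01'), (2, '2020-04-01'),
        (3, '2020-07-09'), (4, '2019-12-31');
    INSERT INTO table2 VALUES
        (10, '2019-02-02'), (11, '2019-08-15'), (11, '2019-09-01'),
        (12, '2020-01-01');
""")

# Each derived table yields a single row, so CROSS JOIN pairs them 1:1.
ratio_sql = """
SELECT 1.0 * c1.cnt1 / c2.cnt2 AS ratio
FROM (SELECT COUNT(DISTINCT ids) AS cnt1
      FROM table1
      WHERE strftime('%Y', insertdate) = '2020') AS c1
CROSS JOIN
     (SELECT COUNT(DISTINCT ids) AS cnt2
      FROM table2
      WHERE strftime('%Y', insertdate) = '2019') AS c2
"""

ratio = conn.execute(ratio_sql).fetchone()[0]
print(ratio)  # 3 distinct ids in 2020 divided by 2 distinct ids in 2019
```

If the 2019 count can be zero, it is worth deciding up front what the query should return in that case.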

How to handle duplicate columns in multiple WITH clause SQL

I am trying to create a query with multiple WITH clauses in BigQuery. I am getting the error "Duplicate column names in the result are not supported. Found duplicate(s):" because some columns are repeated across the tables.
The problem is that I can't remove them: I need them displayed in my result, and they are also needed in the GROUP BY clauses of the subqueries.
My code looks somewhat like this:
WITH table0 as (## query0),
table1 AS (## query1),
table2 as (## query2),
table3 as (## query3),
table4 as (## query4),
table5 as (## query 5)
select
*
from
table0,
table1,
table2,
table3,
table4,
table5
How do I handle duplicate columns across multiple WITH clauses in SQL?
Why are you creating a Cartesian product of the subqueries?
In any case, BigQuery gives you more control over the columns than other databases. So, if col1 is common to table0 and table1, you can do:
select t1.*, t2.* except (col1)
If you want to keep both values:
select t1.*, t2.* except (col1), t2.col1 as t2_col1
or
select t1.* except (col1),
t2.* except (col1),
t1.col1 as t1_col1,
t2.col1 as t2_col1
From your previous question (which looks like it was deleted), I think I remember your use case, and there (I may be wrong) the only "duplicate" fields were the ones used for JOINs.
In such cases you can use the approach below (the example assumes that those duplicate fields are id and day):
#standardSQL
SELECT *
FROM `project.dataset.table0`
JOIN `project.dataset.table1` USING(id, day)
JOIN `project.dataset.table2` USING(id, day)
JOIN `project.dataset.table3` USING(id, day)
For example, in the super-simplified dummy example below:
#standardSQL
WITH `project.dataset.table0` AS (
SELECT 1 id, '2019-01-01' day, 0 col0
), `project.dataset.table1` AS (
SELECT 1 id, '2019-01-01' day, 1 col1
), `project.dataset.table2` AS (
SELECT 1 id, '2019-01-01' day, 2 col2
), `project.dataset.table3` AS (
SELECT 1 id, '2019-01-01' day, 3 col3
)
SELECT *
FROM `project.dataset.table0`
JOIN `project.dataset.table1` USING(id, day)
JOIN `project.dataset.table2` USING(id, day)
JOIN `project.dataset.table3` USING(id, day)
The result will be:
Row  id  day         col0  col1  col2  col3
1    1   2019-01-01  0     1     2     3
without any complaints about duplicate fields.
As you can see from the example above, using USING() instead of ON "magically" resolves the issue. But again, note that this only works when the "duplicate" fields are all JOIN fields.

SQL sum, grouping by another table

Consider I have a table t1 with a single column a, and it has values, say
a
-
5
10
15
17
I want to write a single SQL query that does the following. Basically, for each value in table t1, I want the sum of all values in t2 up to that value.
SELECT SUM(value) FROM t2 WHERE value<=5
UNION ALL
SELECT SUM(value) FROM t2 WHERE value<=10
UNION ALL
SELECT SUM(value) FROM t2 WHERE value<=15
UNION ALL
SELECT SUM(value) FROM t2 WHERE value<=17;
If someone changes the values in a, for example by deleting or inserting elements, I have to rewrite the query above. Is there a query that always works automatically?
Here is the DB fiddle link.
I think you want:
SELECT SUM(case when bound<=a[1] then value else 0 end),
. . .
FROM table t;
After the edit with the fiddle: you can use a correlated subquery instead of UNION:
select *, (select sum(t2.value) from t2 where t2.value <= t1.a)
from t1;
I think it's clearer with a join than with a subquery, but that's just personal preference.
SELECT
t1.a,
SUM(COALESCE(t2.value, 0))
FROM
t1
LEFT JOIN
t2
ON
t2.value <= t1.a
GROUP BY
t1.a
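To check that the join version really adapts when t1 changes, here is a small sketch that runs it in an in-memory SQLite database from Python. The t1 values are the ones from the question; the t2 values are made up, since the fiddle contents are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (value INTEGER);
    INSERT INTO t1 VALUES (5), (10), (15), (17);
    -- Hypothetical t2 contents, chosen only for illustration.
    INSERT INTO t2 VALUES (1), (4), (7), (12), (16), (20);
""")

# One output row per bound in t1; COALESCE turns "no matching t2 rows"
# (a NULL from the LEFT JOIN) into a sum of 0.
SUM_UP_TO_SQL = """
SELECT t1.a, SUM(COALESCE(t2.value, 0)) AS total
FROM t1
LEFT JOIN t2 ON t2.value <= t1.a
GROUP BY t1.a
ORDER BY t1.a
"""

before = conn.execute(SUM_UP_TO_SQL).fetchall()

# Adding bounds to t1 needs no change to the query text.
conn.execute("INSERT INTO t1 VALUES (0), (25)")
after = conn.execute(SUM_UP_TO_SQL).fetchall()

print(before)
print(after)
```

The bound 0 is included to show the LEFT JOIN case: no t2 row qualifies, and the row still appears with a total of 0.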

ORA-01427 - Need the counts of each value

I get "ORA-01427: single-row subquery returns more than one row" when I run the following query:
select count(*)
from table1
where to_char(timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
and attribute = (select distinct attribute from table2);
I want to get the counts of each value of attribute in the specific time frame.
I would recommend writing this as:
select count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
exists (select 1 from table2 t2 where t2.attribute = t1.attribute);
This formulation makes it easier to use indexes and statistics for optimizing the query. Also, select distinct is not appropriate with in (although I think Oracle will optimize away the distinct).
EDIT:
You appear to want to aggregate by attribute as well:
select t1.attribute, count(*)
from table1 t1
where timestamp >= trunc(sysdate-1/24, 'HH') and
timestamp < trunc(sysdate, 'HH') and
exists (select 1 from table2 t2 where t2.attribute = t1.attribute)
group by t1.attribute;
You can do it with a join and GROUP BY:
SELECT
count(*) AS Cnt
, a.attribute
FROM table1 t
JOIN table2 a ON t.attribute=a.attribute
WHERE to_char(t.timestamp,'yyyymmddhh24') = to_char(sysdate-1/24,'yyyymmddhh24')
GROUP BY a.attribute
This produces a row for each distinct attribute from table2, paired up with the corresponding count from table1.

MS SQL Server sum of sum fields

I have a SQL statement that gives me two columns from two tables using a subquery.
select
sum(field1) as f1_sum,
(select sum(field2) from table2) as f2_sum
from
table1
group by
table1.field_x
I want to get the total of f1_sum + f2_sum as a third column in this query's output. It seems simple, but I can't find a way to do it. The question is how to get the sum of the sum fields.
I am OK with writing a stored procedure or a view to do this.
Can someone assist, please?
You can use a subquery like:
SELECT t1.f1_sum+t1.f2_sum AS total_sum FROM
(select sum(field1) as f1_sum , (select sum(field2) from table2) as f2_sum
from table1
group by table1.field_x) AS t1
I would suggest doing it like this:
select t1.f1_sum, t2.f2_sum, coalesce(t1.f1_sum, 0) + coalesce(t2.f2_sum, 0)
from (select sum(field1) as f1_sum
from table1 t1
group by t1.field_x
) t1 cross join
(select sum(field2) as f2_sum from table2) t2;
When possible, I prefer to keep table references in the from clause. I added the coalesce() just in case any of the values could be NULL.
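You could also try this (note that the comma join is a cross join, so each sum is multiplied by the other table's row count unless that table has exactly one row):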
SELECT SUM(a.field1) f1_sum,
SUM(b.field2) f2_sum,
(SUM(a.field1) + SUM(b.field2)) f3_sum
from table1 a, table2 b
You can simply write:
select sum(field1) as f1_sum
, (select sum(field2) from table2) as f2_sum
, (ISNULL(sum(field1),0) + ISNULL((select sum(field2) from table2),0)) AS Total_Sum
from table1
group by table1.field_x
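For comparing these variants on real numbers, here is a rough sketch using an in-memory SQLite database from Python instead of SQL Server. The rows are invented, and the query selects field_x as well so the groups can be told apart. It runs the derived-table CROSS JOIN form, then the plain comma join over both base tables to show how that one changes the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field_x TEXT, field1 INTEGER);
    CREATE TABLE table2 (field2 INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO table1 VALUES ('a', 10), ('a', 5), ('b', 7);
    INSERT INTO table2 VALUES (3), (4);
""")

# Aggregate each table first, then pair every group with the single
# table2 total and add the two sums.
per_group = conn.execute("""
    SELECT g.field_x,
           g.f1_sum,
           s.f2_sum,
           COALESCE(g.f1_sum, 0) + COALESCE(s.f2_sum, 0) AS total_sum
    FROM (SELECT field_x, SUM(field1) AS f1_sum
          FROM table1
          GROUP BY field_x) AS g
    CROSS JOIN (SELECT SUM(field2) AS f2_sum FROM table2) AS s
    ORDER BY g.field_x
""").fetchall()

# Joining the base tables before aggregating repeats every table1 row
# once per table2 row (and vice versa), which inflates both sums.
inflated = conn.execute("""
    SELECT SUM(a.field1), SUM(b.field2)
    FROM table1 a, table2 b
""").fetchone()

print(per_group)
print(inflated)
```

With this data the true totals are 22 for field1 and 7 for field2, so the second query's output makes the inflation easy to spot.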

How to remove duplicate rows as well as old records in Oracle database

I searched the net and didn't find an answer for this case.
I have a table emp_master_data with many columns, but I want to use only a few of them to filter the data (with a select query) and then, after analyzing the result, delete those records.
The filter should be applied on three columns: emp_card_no, emp_id, and enrollment_exp_dt. An employee can be enrolled multiple times, which means there can be multiple records with the same emp_card_no and emp_id and the same or a different enrollment_exp_dt.
I need to do this:
Remove the duplicate records if there are multiple records with the same enrollment_exp_dt, emp_card_no and emp_id.
If there are multiple records for the same employee but with different enrollment_exp_dt values, remove the old records and keep only the latest one (it doesn't have to be > sysdate).
Please let me know the best way to do this. I tried the following, but it doesn't solve all the problems.
SELECT *
FROM brm_staging A
WHERE EXISTS (
SELECT 1 FROM brm_staging
WHERE enrollment_exp_dt = A.enrollment_exp_dt
and emp_id= A.emp_id
and emp_card_no =A.emp_card_no
AND ROWID < A.ROWID
);
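The attempt above only finds exact duplicates. One way to cover both rules at once is to treat a row as deletable whenever a "better" row exists for the same emp_id and emp_card_no: one with a later enrollment_exp_dt, or the same date and a higher ROWID. Below is a minimal sketch of that idea, run in an in-memory SQLite database from Python rather than Oracle. The rows are invented, dates are stored as ISO strings so they compare correctly as text, and which of two identical rows survives (the one with the higher rowid) is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_master_data (
        emp_id INTEGER, emp_card_no TEXT, enrollment_exp_dt TEXT
    );
    -- Hypothetical rows: employee 1 has an exact duplicate plus a newer
    -- enrollment; employee 2 has only an exact duplicate.
    INSERT INTO emp_master_data VALUES
        (1, 'C1', '2020-01-01'),
        (1, 'C1', '2020-01-01'),
        (1, 'C1', '2021-06-30'),
        (2, 'C2', '2019-05-05'),
        (2, 'C2', '2019-05-05');
""")

# Delete every row for which a better row exists for the same employee:
# a later expiry date, or the same date with a higher rowid.
conn.execute("""
    DELETE FROM emp_master_data
    WHERE EXISTS (
        SELECT 1
        FROM emp_master_data AS newer
        WHERE newer.emp_id = emp_master_data.emp_id
          AND newer.emp_card_no = emp_master_data.emp_card_no
          AND (newer.enrollment_exp_dt > emp_master_data.enrollment_exp_dt
               OR (newer.enrollment_exp_dt = emp_master_data.enrollment_exp_dt
                   AND newer.rowid > emp_master_data.rowid))
    )
""")

remaining = conn.execute("""
    SELECT emp_id, emp_card_no, enrollment_exp_dt
    FROM emp_master_data
    ORDER BY emp_id
""").fetchall()
print(remaining)
```

Running the same EXISTS condition in a SELECT first shows exactly which rows would be removed before anything is deleted.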
It got really complicated. Can you try the select statement first, before deleting, and see if it works? (This is for the second scenario.)
DELETE FROM YOUR_TABLE T1
INNER JOIN (
SELECT T2.* FROM YOUR_TABLE T2,
(SELECT EMP_ID, CARD_NO, COUNT(*) FROM
YOUR_TABLE
GROUP BY EMP_ID, CARD_NO
HAVING COUNT(*) > 1) T3
WHERE T2.EMP_ID=T3.EMP_ID AND T2.CARD_NO = T3.CARD_NO AND
T2.ENROLLMENT_EXP_DT NOT IN (SELECT MAX(T4.ENROLLMENT_EXP_DT)
FROM YOUR_TABLE T4)) T5 ON
T1.EMP_ID=T5.EMP_ID AND T1.CARD_NO=T5.CARD_NO AND T1.ENROLLMENT_EXP_DT=T5.ENROLLMENT_EXP_DT
(EDIT) I think this works too (more simplified):
DELETE FROM YOUR_TABLE T1
WHERE EXISTS (
SELECT T2.* FROM YOUR_TABLE T2,
(SELECT EMP_ID, CARD_NO, COUNT(*) FROM
YOUR_TABLE
GROUP BY EMP_ID, CARD_NO
HAVING COUNT(*) > 1) T3
WHERE T2.EMP_ID=T3.EMP_ID AND T2.CARD_NO = T3.CARD_NO AND
T2.ENROLLMENT_EXP_DT NOT IN (SELECT MAX(T4.ENROLLMENT_EXP_DT)
FROM YOUR_TABLE T4)