union tables where date - sql

I need to query 2 tables in an Oracle database. One is a current table, the other is historical, with virtually the same headers. I want to be able to have a union on the query, but just a single date range.
A simple example of what I'm trying to do might be a better explanation:
select order_number, insertdate
from do_table
where insertdate between '1-apr-17' and '8-apr-17'
union
select order_number, insertdate
from doi_table
where insertdate between '1-apr-17' and '8-apr-17'
Can it be written like this instead?
select order_number, insertdate
from do_table
union
select order_number, insertdate
from doi_table
where insertdate between '1-apr-17' and '8-apr-17'
The date range queried changes a lot and the query is quite big, and just for ease, I want the user running the query to be able to enter the date range once.
Any tips?
Thanks

If you want to write the WHERE clause only once, you need to wrap the union in a subquery.
SELECT *
FROM (
select order_number, insertdate
from do_table
union
select order_number, insertdate
from doi_table
) T
WHERE insertdate between '1-apr-17' and '8-apr-17'
But I don't recommend it, because the optimizer may then be unable to use the index on the insertdate column. Your first query is fine; just use the user parameter twice.
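To illustrate "use the parameter twice" while still entering the dates once, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Oracle (an assumption made so the example is runnable; the sample rows and ISO date strings are hypothetical). A named bind parameter can appear in both branches of the union and is supplied a single time. Oracle also supports named bind variables of the form :name, though how they are prompted for depends on the client tool.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table do_table  (order_number integer, insertdate text);
    create table doi_table (order_number integer, insertdate text);
    insert into do_table  values (1, '2017-04-02'), (2, '2017-04-20');
    insert into doi_table values (3, '2017-04-05'), (4, '2017-03-15');
""")

# :start_date and :end_date appear in both branches but are bound only once.
query = """
    select order_number, insertdate from do_table
    where insertdate between :start_date and :end_date
    union all
    select order_number, insertdate from doi_table
    where insertdate between :start_date and :end_date
    order by order_number
"""
rows = conn.execute(
    query, {"start_date": "2017-04-01", "end_date": "2017-04-08"}
).fetchall()
print(rows)
```

Each branch keeps its own filter, so an index on insertdate stays usable in both tables, and the person running the query changes the range in one place.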

One method for handling this is to use a params CTE:
with params as (
select date '2017-04-01' as date1, date '2017-04-08' as date2
from dual
)
select t.order_number, t.insertdate
from params cross join do_table t
where t.insertdate between params.date1 and params.date2
union all
select t.order_number, t.insertdate
from params cross join doi_table t
where t.insertdate between params.date1 and params.date2 ;
Note that I changed the union to a union all. union incurs extra overhead for removing duplicates. If you intend that, then use union. But by default, union all is better.
I should add that in my experience, such a params CTE is fine from a performance perspective, but there could be exceptions.
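The params CTE and the union / union all difference can be tried out with a small sketch. It uses Python's sqlite3 rather than Oracle (an assumption for runnability, so there is no "from dual" and dates are plain ISO strings), and the sample rows are hypothetical; order 1 is deliberately present in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table do_table  (order_number integer, insertdate text);
    create table doi_table (order_number integer, insertdate text);
    insert into do_table  values (1, '2017-04-02'), (2, '2017-04-20');
    insert into doi_table values (1, '2017-04-02'), (3, '2017-04-05');
""")

# The date range is written once, in the params CTE, and reused by both branches.
template = """
    with params as (select '2017-04-01' as date1, '2017-04-08' as date2)
    select t.order_number, t.insertdate
    from params cross join do_table t
    where t.insertdate between params.date1 and params.date2
    {set_op}
    select t.order_number, t.insertdate
    from params cross join doi_table t
    where t.insertdate between params.date1 and params.date2
    order by 1
"""

union_all_rows = conn.execute(template.format(set_op="union all")).fetchall()
union_rows = conn.execute(template.format(set_op="union")).fetchall()

print(union_all_rows)  # keeps the row that exists in both tables twice
print(union_rows)      # duplicate row removed
```

If the same order can never appear in both tables, the two results are identical and union all simply skips the duplicate-removal step.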

Related

fill rows in the query result according to Timestamp(Date)

I am using T-SQL.
I run the following script and get the result shown in the first image:
Select * from tableA
Assuming today is 2019/6/17, I would like to get a result like the one in the second image.
A naive way to do this is to insert all the missing rows one by one.
However, there is far too much data for me to do that by hand.
My aim is to get a query result like the second image.
If that is impossible, how can I add those missing rows?
I would suggest a recursive CTE, but phrased like this:
with a as (
select a.id, a.descr, a.eff_date,
dateadd(day, -1, lead(a.eff_date, 1, '2019-06-18') over (partition by id order by eff_date)) as end_date
from tablea a
),
cte as (
select id, descr, eff_date, end_date
from a
union all
select id, descr, dateadd(day, 1, eff_date), end_date
from cte
where eff_date < end_date
)
select id, descr, eff_date
from cte
order by id, eff_date;
Note that this solution requires no joining nor figuring out what the "previous" values are. This is all handled by the recursive CTE.
Here is a db<>fiddle.
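As a rough, runnable illustration of the same idea, the sketch below ports the query to SQLite through Python's sqlite3 (an assumption: date(x, '+1 day') replaces dateadd, and window functions need SQLite 3.25 or later). The column names follow the answer's query; the three sample rows are hypothetical because the original images are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tablea (id integer, descr text, eff_date text);
    insert into tablea values
        (1, 'a', '2019-06-13'),
        (1, 'b', '2019-06-15'),
        (2, 'c', '2019-06-16');
""")

rows = conn.execute("""
    with recursive a as (
        -- end_date is the day before the next row for the same id,
        -- or 2019-06-17 ("today") when there is no next row
        select id, descr, eff_date,
               date(lead(eff_date, 1, '2019-06-18')
                        over (partition by id order by eff_date), '-1 day') as end_date
        from tablea
    ),
    cte as (
        select id, descr, eff_date, end_date from a
        union all
        -- repeat the row one day later until end_date is reached
        select id, descr, date(eff_date, '+1 day'), end_date
        from cte
        where eff_date < end_date
    )
    select id, descr, eff_date from cte
    order by id, eff_date
""").fetchall()

for row in rows:
    print(row)
```

Each original row is carried forward day by day until the next row for the same id takes over, so no gaps remain up to 2019-06-17.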

Is it possible to have a where clause reference another where clause in a Union All BigQuery SQL?

I'm playing around with BQ SQL and was wondering if it was possible to have a single WHERE clause within an entire UNION ALL statement. So instead of having multiple different WHERE clauses and having to change all of them in order to run a single query, to instead have it all linked to a single WHERE clause where everything would change based off that single change. I have dummy data below as an example of what I'm doing now:
WITH Temp_A AS(
SELECT DISTINCT
Name,
Date,
Spend
FROM
Spend_Table
),
Temp_B AS(
SELECT DISTINCT
Name,
Date,
Revenue
FROM
Revenue_Table
),
Temp_C AS(
SELECT DISTINCT
Employee AS Name,
Date,
Paystub_Range,
Hourly,
Total_Amount
FROM
Employee_Pay
)
SELECT DISTINCT
a.Name,
a.Date,
a.Spend,
b.Revenue,
NULL AS Paystub_Range,
NULL AS Hourly,
NULL AS Total_Amount
FROM
Temp_A a
LEFT JOIN
Temp_B b
ON
a.Name = b.Name
AND a.Date = b.Date
WHERE
DATE BETWEEN '2020-09-01' AND '2020-09-15'
UNION ALL
SELECT DISTINCT
Name,
Date,
NULL AS Spend,
NULL AS Revenue,
Paystub_Range,
Hourly,
Total_Amount
FROM
Temp_C
WHERE
DATE BETWEEN '2020-09-01' AND '2020-09-15'
What I want to find out is whether the same query can use a single WHERE clause, or have the second WHERE clause reference the first. For example, below is the same kind of dummy data as above, showing only the final SELECT DISTINCT part; the Temp tables are the same.
)
SELECT DISTINCT
a.Name,
a.Date,
a.Spend,
b.Revenue,
NULL AS Paystub_Range,
NULL AS Hourly,
NULL AS Total_Amount
FROM
Temp_A a
LEFT JOIN
Temp_B b
ON
a.Name = b.Name
AND a.Date = b.Date
WHERE
DATE BETWEEN '2020-09-01' AND '2020-09-15'
UNION ALL
SELECT DISTINCT
Name,
Date,
NULL AS Spend,
NULL AS Revenue,
Paystub_Range,
Hourly,
Total_Amount
FROM
Temp_C
WHERE
DATE BETWEEN @DateA AND @DateB
Or is there another way of doing this that I'm overlooking that would make this simpler? Any help would be much appreciated as I'm still learning everything I can about this to make it easier in the long run. Please let me know if I need to expand on any details, again this is just dummy data.
Thank you in advance!!
You can do it in two ways: (1) scripting variables and (2) parameterized queries.
(1) scripting variables - declare variables and then use them in query
DECLARE from_date DATE DEFAULT '2020-09-01';
DECLARE to_date DATE DEFAULT '2020-09-15';
...your query...
WHERE DATE BETWEEN from_date AND to_date
...rest of the query...
(2) parameterized queries - these are not supported in the BigQuery UI, but you can use them with the BQ CLI and the client libraries.
For example, if using BQ CLI, you can do something like this:
bq query \
--use_legacy_sql=false \
--parameter='from_date:DATE:2020-09-01' \
--parameter='to_date:DATE:2020-09-15' \
'SELECT
...your query...
WHERE DATE BETWEEN @from_date AND @to_date
...rest of the query...'

group by and union in oracle

I would like to union two queries, but I am getting an error in Oracle.
select count(*) as faultCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='FAULT'
union
select count(*) as responseCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='RESPONSE'
group by COMP_IDENTIFIER
order by responseCount;
The two queries run perfectly individually, but when I use union, it says ORA-00904: "RESPONSECOUNT": invalid identifier
The error you've run into
In Oracle, it's best to always name each column in each UNION subquery the same way. In your case, the following should work:
select count(*) as theCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='FAULT'
group by COMP_IDENTIFIER -- don't forget this
union
select count(*) as theCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT='RESPONSE'
group by COMP_IDENTIFIER
order by theCount;
See also:
Curious issue with Oracle UNION and ORDER BY
A good workaround is, of course, to use indexed column references as suggested by a_horse_with_no_name
The query you really wanted
From your comments, however, I suspect you wanted to write an entirely different query, namely:
select count(case AUDIT_CONTEXT when 'FAULT' then 1 end) as faultCount,
count(case AUDIT_CONTEXT when 'RESPONSE' then 1 end) as responseCount,
COMP_IDENTIFIER
from CORDYS_NCB_LOG
where AUDIT_CONTEXT in ('FAULT', 'RESPONSE')
group by COMP_IDENTIFIER
order by responseCount;
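To see what this conditional aggregation returns, here is a small sketch run against SQLite through Python's sqlite3 (an assumption; the question targets Oracle, and the sample log rows are hypothetical). count() ignores NULLs, and a CASE without ELSE yields NULL, so each count only sees its own AUDIT_CONTEXT value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cordys_ncb_log (comp_identifier text, audit_context text);
    insert into cordys_ncb_log values
        ('A', 'FAULT'), ('A', 'RESPONSE'), ('A', 'RESPONSE'),
        ('B', 'FAULT'), ('B', 'FAULT'), ('B', 'REQUEST'),
        ('C', 'RESPONSE');
""")

# One pass over the table yields both counts side by side per component.
# comp_identifier is added to the ORDER BY only to make the output deterministic.
rows = conn.execute("""
    select count(case audit_context when 'FAULT' then 1 end)    as faultCount,
           count(case audit_context when 'RESPONSE' then 1 end) as responseCount,
           comp_identifier
    from cordys_ncb_log
    where audit_context in ('FAULT', 'RESPONSE')
    group by comp_identifier
    order by responseCount, comp_identifier
""").fetchall()

print(rows)
```

Because both counts come from a single query, responseCount is a real output column and can be used in ORDER BY without the naming problem the union ran into.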
The column names of a union are determined by the first query. So your first column is actually named FAULTCOUNT.
But the easiest way to sort the result of a union is to use the column index:
select ...
union
select ...
order by 1;
You most probably also want to use UNION ALL, which avoids removing duplicates between the two queries and is faster than a plain UNION.
In a UNION or UNION ALL query, the column names are determined by the first query.
In your query, replace "order by responseCount" with "order by faultCount".

SQL Server : UNION ALL but remove duplicate IDs by choosing first date of occurrence

I am unioning two queries but I'm getting an ID that occurs in each query. I do not know how to keep only the first time the id occurs. Everything else about the row is different. In general, it will be hard to know which of the two queries I will have to keep a duplicate on, therefore, I need a general solution.
I was thinking about creating a temp table and choosing the min date (once the date has been converted to an int).
Any ideas on the proper syntax?
You can do this using the row_number() function. This will assign a sequential number, starting with 1, to each row with the same id (based on the partition by clause). The ordering of the sequence is determined by the order by clause. So, the following assigns 1 to the earliest date for each id:
select t.*
from (select t.*,
row_number() over (partition by id order by date asc) as seqnum
from ((select *
from <subquery1>
) union all
(select *
from <subquery2>
)
) t
) t
where seqnum = 1;
The final where clause simply filters for the first occurrence.
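A minimal runnable sketch of this pattern, using Python's sqlite3 in place of SQL Server (an assumption; window functions need SQLite 3.25 or later). The table names table_1 and table_2 and the sample rows are hypothetical stand-ins for the two subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_1 (id integer, date text, otherstuf text);
    create table table_2 (id integer, date text, otherstuf text);
    insert into table_1 values (1, '2020-01-05', 'x'), (2, '2020-01-01', 'y');
    insert into table_2 values (1, '2020-01-02', 'z'), (3, '2020-01-09', 'w');
""")

rows = conn.execute("""
    select id, date, otherstuf
    from (select u.*,
                 -- number the rows for each id, earliest date first
                 row_number() over (partition by id order by date asc) as seqnum
          from (select * from table_1
                union all
                select * from table_2) u
         ) ranked
    where seqnum = 1
    order by id
""").fetchall()

print(rows)
```

Id 1 appears in both tables; only the row with the earlier date survives, regardless of which table it came from.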
If you use the keyword UNION, then it will remove duplicates from the two data sets you are working with. UNION ALL preserves duplicates.
You can view the specifics here:
http://www.w3schools.com/sql/sql_union.asp
If you want to keep only one of the two records and they are not identical, you will have to filter them yourself. You may need to do something like the following. This may be possible with a single (select union select) block, but this should get you started.
select *
from (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x1
, (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x2
where x1.id = x2.id
and x1.date < x2.date
Although, on reflection, if you go down a path like this, why bother with the UNION at all?

recursively increment a date in sql

I am working on a historical conversion of data and was wondering if there's a more efficient way to accomplish a date increment.
I receive data from a source system with a Saturday date (1-7-13) and would like to copy that data so it fills all days of the previous week (1-6-13, 1-5-13, etc.).
Currently I am doing several unions:
insert into target
(date, name)
select date,name
from
(
SELECT date as date, name FROM SOURCE
UNION
SELECT date - 1 as date, name FROM SOURCE
UNION
SELECT date -2 as date, name FROM SOURCE
)
I only ask because close to 500 million records are going to go through this SQL script. In case it matters, it is going to run in a BTEQ script in Teradata.
First, your code would be faster using union all rather than union. union removes duplicates, which does not seem to be needed in this case. If you do need them removed, then do it at the source level:
from (select distinct date, name from source)
rather than doing it implicitly with union.
You can also try a cross join approach:
select date - i, name
from source cross join
(select 0 as i union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6
) const
This might be a bit faster, because it doesn't need to set up the reads to the table multiple times.
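The cross join idea can be checked on a tiny data set. This sketch uses Python's sqlite3 instead of Teradata (an assumption, so the date arithmetic is written with SQLite's date() modifiers rather than "date - i"), with one hypothetical source row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table source (date text, name text);
    insert into source values ('2013-01-07', 'alpha');
""")

# The source table is read once; the seven-row derived table multiplies each row.
rows = conn.execute("""
    select date(s.date, '-' || const.i || ' days') as date, s.name
    from source s
    cross join (select 0 as i union all select 1 union all select 2 union all select 3
                union all select 4 union all select 5 union all select 6) const
    order by 1
""").fetchall()

for row in rows:
    print(row)
```

Every source row produces seven output rows, one for the original date and one for each of the six preceding days.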
One option is to use a recursive query, but I don't think it would be much faster -- just perhaps easier to read:
WITH RECURSIVE recursiveCTE (date, name) AS (
SELECT date, name
FROM Source
UNION ALL
SELECT r.date-1, r.name
FROM recursiveCTE R
JOIN Source T ON R.name = T.name AND T.date < r.date+6
)
INSERT INTO Target (date,name)
SELECT date,name From recursiveCTE