I am working on a historical conversion of data and was wondering if there's a more efficient way to accomplish a date increment.
I receive data from a source system on a Saturday date (1-7-13) and would like to push that data so it fills all days of the previous week (1-6-13, 1-5-13, etc.).
Currently I am doing several unions:
insert into target
(date, name)
select date,name
from
(
SELECT date as date, name FROM SOURCE
UNION
SELECT date - 1 as date, name FROM SOURCE
UNION
SELECT date -2 as date, name FROM SOURCE
)
I only ask because it looks like close to 500 million records are going to be going through this SQL script. In case it matters, it is going to be running in a BTEQ script in Teradata.
First, your code would be faster using union all rather than union. union removes duplicates, which does not seem to be needed in this case. If you do need them removed, then do it at the source level:
from (select distinct date, name from source) s
Rather than doing it implicitly with union.
You can also try a cross join approach:
select date - i, name
from source cross join
(select 0 as i union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6
) const
This might be a bit faster, because it doesn't need to set up the reads to the table multiple times.
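As a rough illustration of the cross join idea, the sketch below runs it against an in-memory SQLite database from Python. SQLite stands in for Teradata only so the example is self-contained; the sample row and the `day` column name are made up, and SQLite's `date()` function replaces Teradata's `date - i` arithmetic.

```python
import sqlite3

# Assumption: SQLite is used as a stand-in engine; table contents are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (day TEXT, name TEXT)")
conn.execute("INSERT INTO source VALUES ('2013-01-07', 'alpha')")

# One pass over source, multiplied by a 7-row constant table of day offsets.
rows = conn.execute(
    """
    SELECT date(s.day, '-' || c.i || ' days') AS day, s.name
    FROM source s
    CROSS JOIN (
        SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
        UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
    ) c
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

Each source row comes back seven times, once per offset, so the single input date is spread over the seven days ending on it.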
One option is to use a recursive query, but I don't think it would be much faster -- just perhaps easier to read:
WITH RECURSIVE recursiveCTE (date, name) AS (
SELECT date, name
FROM Source
UNION ALL
SELECT r.date-1, r.name
FROM recursiveCTE R
JOIN Source T ON R.name = T.name AND T.date < r.date+6
)
INSERT INTO Target (date,name)
SELECT date,name From recursiveCTE
Related
So I have a table with the following records:
I want to create a script that looks at the Cnt_Repeat column and inserts each record into a temp table X times, where X is the value in Cnt_Repeat, so it would look like the following table:
One method supported by most databases is the use of recursive CTEs. The exact syntax might vary, but the idea is:
with cte as (
select loannum, document, cnt_repeat, 1 as lev
from t
union all
select loannum, document, cnt_repeat, lev + 1
from cte
where lev < cnt_repeat
)
select loannum, document, cnt_repeat
from cte;
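To see the recursive CTE in action, here is a minimal sketch that runs it in SQLite from Python. The two sample rows are invented for illustration, and SQLite requires the `RECURSIVE` keyword; other databases may differ.

```python
import sqlite3

# Assumption: invented sample data; SQLite used only to make this runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (loannum INTEGER, document TEXT, cnt_repeat INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1001, "deed", 3), (1002, "note", 1)],
)

# The anchor emits every row once (lev = 1); the recursive step re-emits
# a row while its level is still below cnt_repeat.
repeated = conn.execute(
    """
    WITH RECURSIVE cte AS (
        SELECT loannum, document, cnt_repeat, 1 AS lev FROM t
        UNION ALL
        SELECT loannum, document, cnt_repeat, lev + 1 FROM cte
        WHERE lev < cnt_repeat
    )
    SELECT loannum, document, cnt_repeat FROM cte ORDER BY loannum
    """
).fetchall()

for row in repeated:
    print(row)
```

Note that a row with cnt_repeat of 0 would still appear once, because the anchor part of the CTE does not filter on cnt_repeat.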
I need to query 2 tables in an Oracle database. One is a current table, the other is historical, with virtually the same headers. I want to be able to have a union on the query, but just a single date range.
A simple example of what I'm trying to do might be a better explanation:
select order_number, insertdate
from do_table
where insertdate between '1-apr-17' and '8-apr-17'
union
select order_number, insertdate
from doi_table
where insertdate between '1-apr-17' and '8-apr-17'
Can it be written like this?
select order_number, insertdate
from do_table
union
select order_number, insertdate
from doi_table
where insertdate between '1-apr-17' and '8-apr-17'
The date range queried changes a lot and the query is quite big, and just for ease, I want the user running the query to be able to enter the date range once.
Any tips?
Thanks
If you want to use the where clause only once, you need to create a subquery.
SELECT *
FROM (
select order_number, insertdate
from do_table
union
select order_number, insertdate
from doi_table
) T
WHERE insertdate between '1-apr-17' and '8-apr-17'
But I don't recommend it, because then you won't be able to benefit from the index on the insertdate field. Your first query is OK; just use the user parameter twice.
One method for handling this is to use a params CTE:
with params as (
select date '2017-04-01' as date1, date '2017-04-08' as date2
from dual
)
select t.order_number, t.insertdate
from params cross join do_table t
where t.insertdate between params.date1 and params.date2
union all
select t.order_number, t.insertdate
from params cross join doi_table t
where t.insertdate between params.date1 and params.date2 ;
Note that I changed the union to a union all. union incurs extra overhead for removing duplicates. If you intend that, then use union. But by default, union all is better.
I should add that in my experience, such a params CTE is fine from a performance perspective, but there could be exceptions.
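A small sketch of the params CTE, run in SQLite from Python so it is self-contained. The sample rows are invented, dates are stored as ISO strings (an assumption made for SQLite, not how Oracle stores them), and `from dual` is dropped because SQLite does not need it. The date range is bound once and reused by both branches.

```python
import sqlite3

# Assumption: invented rows; ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE do_table (order_number INTEGER, insertdate TEXT)")
conn.execute("CREATE TABLE doi_table (order_number INTEGER, insertdate TEXT)")
conn.executemany("INSERT INTO do_table VALUES (?, ?)",
                 [(1, "2017-04-02"), (2, "2017-05-01")])
conn.executemany("INSERT INTO doi_table VALUES (?, ?)",
                 [(3, "2017-04-08"), (4, "2017-03-31")])

# The two placeholders are supplied once, inside the params CTE.
orders = conn.execute(
    """
    WITH params AS (SELECT ? AS date1, ? AS date2)
    SELECT t.order_number, t.insertdate
    FROM params CROSS JOIN do_table t
    WHERE t.insertdate BETWEEN params.date1 AND params.date2
    UNION ALL
    SELECT t.order_number, t.insertdate
    FROM params CROSS JOIN doi_table t
    WHERE t.insertdate BETWEEN params.date1 AND params.date2
    ORDER BY 1
    """,
    ("2017-04-01", "2017-04-08"),
).fetchall()

print(orders)
```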
I have multiple queries nested together by UNION ALLs; some of the inner queries are almost the same.
For example
select sum(x.amount) as amnt, 'txt1' as name, x.cfg as cfg from tbl1 x group by x.cfg
union all
select -sum(x.amount) as amnt, 'txt2' as name, x.cfg as cfg from tbl1 x group by x.cfg
Result:
AMNT|NAME|CFG
----+----+---
  12|txt1|Z
 -12|txt2|Z
Since the inner queries are not that small and go to a lot of tables themselves, I'm trying to save processing time and resources by combining these two inner queries into one. Take into consideration that the NAME (txt1/txt2) is defined in the inner query and does not come from a table.
For this particular example, you need to duplicate the results returned, with some conditional logic. If you put the conditional logic into a CTE and then perform a Cartesian join against your main table, every row in the main table will be duplicated once for each row in the CTE. In this case that would be 2.
with multiplier (m, name) as (
select 1, 'txt1' from dual
union all
select -1, 'txt2' from dual
)
select multiplier.m * sum(t.amount) as amnt, multiplier.name, t.cfg
from tbl1 t
cross join multiplier
group by multiplier.m, multiplier.name, t.cfg
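For a quick check of the multiplier idea, the sketch below runs it in SQLite from Python. The two sample rows are invented, `from dual` is omitted because SQLite does not need it, and a `group by` is included so the aggregate is valid alongside the non-aggregated columns.

```python
import sqlite3

# Assumption: invented sample data; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (amount INTEGER, cfg TEXT)")
conn.executemany("INSERT INTO tbl1 VALUES (?, ?)", [(5, "Z"), (7, "Z")])

# tbl1 is read once; the 2-row multiplier CTE produces both output rows.
totals = conn.execute(
    """
    WITH multiplier (m, name) AS (
        SELECT 1, 'txt1'
        UNION ALL
        SELECT -1, 'txt2'
    )
    SELECT multiplier.m * SUM(t.amount) AS amnt, multiplier.name, t.cfg
    FROM tbl1 t
    CROSS JOIN multiplier
    GROUP BY multiplier.m, multiplier.name, t.cfg
    ORDER BY multiplier.name
    """
).fetchall()

print(totals)
```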
I am unioning two queries but I'm getting an ID that occurs in each query. I do not know how to keep only the first time the id occurs. Everything else about the row is different. In general, it will be hard to know which of the two queries I will have to keep a duplicate on, therefore, I need a general solution.
I was thinking about creating a temp table and choosing the min date (once the date has been converted to an int).
Any ideas on the proper syntax?
You can do this using the row_number() function. This will assign a sequential number, starting with 1, to each row with the same id (based on the partition by clause). The ordering of the sequence is determined by the order by clause. So, the following assigns 1 to the earliest date for each id:
select t.*
from (select t.*,
row_number() over (partition by id order by date asc) as seqnum
from ((select *
from <subquery1>
) union all
(select *
from <subquery2>
)
) t
) t
where seqnum = 1;
The final where clause simply filters for the first occurrence.
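Here is a minimal runnable sketch of the same pattern using SQLite from Python. The tables `q1` and `q2` stand in for the two subqueries and their contents are invented; window functions need SQLite 3.25 or later, which is an assumption about the local build.

```python
import sqlite3

# Assumption: q1/q2 are hypothetical stand-ins for <subquery1>/<subquery2>.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q1 (id INTEGER, day TEXT, src TEXT)")
conn.execute("CREATE TABLE q2 (id INTEGER, day TEXT, src TEXT)")
conn.executemany("INSERT INTO q1 VALUES (?, ?, ?)",
                 [(1, "2020-01-05", "first"), (2, "2020-01-01", "first")])
conn.executemany("INSERT INTO q2 VALUES (?, ?, ?)",
                 [(1, "2020-01-02", "second"), (3, "2020-01-09", "second")])

# Number the rows per id by date, then keep only the earliest one.
earliest = conn.execute(
    """
    SELECT id, day, src
    FROM (
        SELECT u.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY day ASC) AS seqnum
        FROM (SELECT * FROM q1 UNION ALL SELECT * FROM q2) u
    ) ranked
    WHERE seqnum = 1
    ORDER BY id
    """
).fetchall()

print(earliest)
```

Id 1 appears in both inputs, and only its earlier row (from the second table) survives; ids 2 and 3 pass through unchanged.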
If you use the keyword UNION, then it will remove duplicates from the two data sets you are working with. UNION ALL preserves duplicates.
You can view the specifics here:
http://www.w3schools.com/sql/sql_union.asp
If you want to only have one of the 2 records and they are not identical, you will have to filter them yourself. You may need to do something like the following. This may be possible to do with a single (select union select) block, but this should get you started.
select *
from (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x1
, (
select id
, date
, otherstuf
from table_1
union all
select id
, date
, otherstuf
from table_2
) x2
where x1.id = x2.id
and x1.date < x2.date
Although, on reflection, if you go down a path like this, why bother to UNION it at all?
I have a query that has three prompts: Department, From Date, and To Date. One must select the department ID but has the option to select a date range. How can I make the date range optional? I was thinking of using the decode function but am not sure how to write it so the two date prompts can be left blank.
If you are using a stored procedure you can do something like this in your select statement:
select *
from table
where (field > inDateStart and field < inDateEnd) or
(inDateStart is null and inDateEnd is null)
Or, using coalesce:
select *
from table
where field >= coalesce(inDateStart, field) and
field <= coalesce(inDateEnd, field)
It really depends on your particular situation. Some queries lend themselves to the first, some to the second.
Assuming an unspecified date input comes across as NULL, you can do this little trick:
with
TheTable as
(select 1 dept, sysdate dt from dual
union
select 2 dept, sysdate-63 dt from dual
union
select 3 dept, sysdate-95 dt from dual
)
select *
from thetable
where coalesce(:DateFrom,dt) <= dt
and coalesce(:DateTo,dt) >= dt
;
Need a bit more info on the nature of your data to consider dept as an input... Does the table store multiple dates per dept?
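The coalesce trick can be tried out with the sketch below, which uses SQLite from Python. The table contents, the ISO-string dates, and the helper name `find_rows` are all invented for illustration; an omitted prompt is passed as `None`, which SQLite binds as NULL.

```python
import sqlite3

# Assumption: invented rows; dates stored as ISO strings for SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (dept INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?)",
    [(1, "2024-06-01"), (2, "2024-03-30"), (3, "2024-02-27")],
)


def find_rows(date_from=None, date_to=None):
    """Hypothetical helper: a bound that is None falls back to the row's own
    date, so that side of the comparison is always true."""
    return conn.execute(
        """
        SELECT dept, dt
        FROM thetable
        WHERE COALESCE(:date_from, dt) <= dt
          AND COALESCE(:date_to, dt) >= dt
        ORDER BY dept
        """,
        {"date_from": date_from, "date_to": date_to},
    ).fetchall()


print(find_rows())                       # no bounds
print(find_rows(date_from="2024-03-01"))  # lower bound only
print(find_rows(date_to="2024-03-31"))    # upper bound only
```

One caveat with this form: a row whose date column is NULL never matches, even when both prompts are left blank, because comparing NULL with NULL does not evaluate to true.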