Back fill data in table using Oracle - sql

ID Date NAME START_TIME END_TIME
1 2/15/2017 A 2/15/2017 3:40:39 PM 2/15/2017 3:41:17 PM
2 2/15/2017 B 2/15/2017 3:40:39 PM 2/15/2017 3:41:17 PM
3 2/15/2017 C 2/15/2017 3:40:39 PM 2/15/2017 3:41:17 PM
I need to backfill my table with these 3 rows, one set per day, from January 2016 to today.
One option is to write Java code that loops over the dates, builds a new entry for each date and name, and inserts it with a generated query.
But is there a way to do this in Oracle itself?

This is a commonly used way to generate dates given a start and an end date, which you can simply join to the list of your names to get what you need:
insert into yourTable ( ...)
with names as (
select 'A' as name from dual union all
select 'B' as name from dual union all
select 'C' as name from dual
),
dates as (
select date '2016-01-01' + level - 1 as yourDate
from dual
connect by date '2016-01-01' + level - 1 <= trunc(sysdate) -- from Jan 2016 up to today
)
select rownum, name, yourDate
from names
cross join dates
This has to be slightly edited to better suit the number and types of your columns. A small example of how it works:
with names as (
select 'A' as name from dual union all
select 'B' as name from dual union all
select 'C' as name from dual
),
dates as (
select date '2017-02-18' + level - 1 as yourDate,
level as lev
from dual
connect by date '2017-02-18' + level - 1 <= date '2017-02-20')
select rownum, name, yourDate, lev
from names
cross join dates
gives:
ROWNUM N YOURDATE LEV
---------- - --------- ----------
1 A 18-FEB-17 1
2 B 18-FEB-17 1
3 C 18-FEB-17 1
4 A 19-FEB-17 2
5 B 19-FEB-17 2
6 C 19-FEB-17 2
7 A 20-FEB-17 3
8 B 20-FEB-17 3
9 C 20-FEB-17 3
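To try the "date series cross joined with names" idea without an Oracle instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite has no CONNECT BY, so a recursive CTE generates the dates instead. The table name backfill, its columns, and the function name are hypothetical, and INTEGER PRIMARY KEY plays the role of rownum for the ID.

```python
import sqlite3


def backfill(conn, start, end, names):
    """Insert one (day, name) row per name for every day from start to end, inclusive.

    Sketch only: SQLite stands in for Oracle, and the "backfill" table is hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backfill "
        "(id INTEGER PRIMARY KEY, day TEXT, name TEXT)"
    )
    # One "SELECT ?" per name plays the role of the "names" subquery.
    names_sql = " UNION ALL ".join("SELECT ?" for _ in names)
    conn.execute(
        f"""
        WITH RECURSIVE dates(d) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(d, '+1 day') FROM dates WHERE d < date(?)
        ),
        names(name) AS ({names_sql})
        INSERT INTO backfill (day, name)
        SELECT d, name FROM dates CROSS JOIN names
        ORDER BY d, name
        """,
        [start, end, *names],
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    backfill(conn, "2017-02-18", "2017-02-20", ["A", "B", "C"])
    for row in conn.execute("SELECT id, name, day FROM backfill ORDER BY id"):
        print(row)
```

Running it from 2016-01-01 to 2017-02-20 inserts 417 days × 3 names = 1,251 rows.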

As a basic concept, something like this would do it. You would need to either adapt it for A, B and C, or repeat it for each of them.
insert into MyTable (ID, "DATE", Name, StartTime, EndTime) -- use your real column names; DATE is reserved in Oracle
with Numbers (NN) as
(
select 1 as NN
from dual
union all
select NN+1
from Numbers
where NN < 365
)
select NN + 3, -- If repeating, replace the 3 with the max(id) after each run
to_date('20170215','YYYYMMDD') - NN,
'A',
to_date('20170215 154039','YYYYMMDD HH24MISS') - NN,
to_date('20170215 154117','YYYYMMDD HH24MISS') - NN
from Numbers
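Here is a minimal sketch for checking the count-back-from-a-fixed-date logic, again using Python's sqlite3 as a stand-in for Oracle. SQLite writes WITH RECURSIVE and shifts dates with modifiers such as '-3 days' instead of Oracle's date - NN arithmetic. The function name rows_going_back is hypothetical, and the function only returns the rows rather than inserting them.

```python
import sqlite3


def rows_going_back(conn, name, start_ts, end_ts, days, id_offset):
    """Return (id, day, name, start, end) rows for the `days` days before start_ts.

    Mirrors the Numbers CTE: NN runs from 1 to `days`, and every timestamp is
    shifted back by NN days. SQLite stands in for Oracle here.
    """
    sql = """
        WITH RECURSIVE numbers(nn) AS (
            SELECT 1
            UNION ALL
            SELECT nn + 1 FROM numbers WHERE nn < ?
        )
        SELECT nn + ?,
               date(?, '-' || nn || ' days'),
               ?,
               datetime(?, '-' || nn || ' days'),
               datetime(?, '-' || nn || ' days')
        FROM numbers
        ORDER BY nn
    """
    params = [days, id_offset, start_ts, name, start_ts, end_ts]
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = rows_going_back(conn, "A", "2017-02-15 15:40:39", "2017-02-15 15:41:17", 365, 3)
    print(rows[0])
    print(rows[-1])
```

With 365, the series only reaches back to 2016-02-16. To reach 1 January 2016 from 15 February 2017, NN has to run up to 411.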

Related

Converting monthly to daily data

I have monthly data that I would like to transform to daily data. The data looks like this (extraction_dt is a date column):
isin extraction_dt yield
001 2013-01-31 100
001 2013-02-28 110
001 2013-03-31 105
... ... ...
002 2013-01-31 200
... ... ...
And I would like to get something like this:
isin extraction_dt yield
001 2013-01-01 100
001 2013-01-02 100
001 2013-01-03 100
... ... ...
001 2013-02-01 110
... ... ...
I tried the following code but it does not work. I get the error message AnalysisException: Could not resolve table reference: 'cte'. How would you convert monthly to daily data?
with cte as
(select isin, extraction_dt, yield
from datashop
union all
select isin, extraction_dt, dateadd(d, 1, extraction_dt) AS date_dt, yield
from cte
where datediff(m,date_dt,dateadd(d, 1, date_dt))=0
)
select isin, date_dt,
1.0*isin / count(*) over (partition by isin, date_dt) AS daily_yield
from cte
order by 1,2
I can suggest an easy solution: generate a date series, then match it with your data so each monthly value is repeated for every day of its month.
Here is the SQL you can use for Impala:
select isin, extraction_dt, a.dt AS date_dt, yield
from
datashop d,
(
select now() - INTERVAL (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) ) DAY as dt
from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
) a
WHERE
from_timestamp(a.dt,'yyyy/MM') =from_timestamp(d.extraction_dt,'yyyy/MM')
order by 1,2,3
The derived table aliased a generates a series of dates, one per day, going back up to 9,999 days from now().
The WHERE clause restricts each row to the month of its extraction_dt, so you get every day of that month.
ORDER BY just makes the output easier to read.
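You can also try the digits-cross-join trick outside Impala. Below is a minimal sketch using Python's sqlite3 as a stand-in. Four copies of a 0-9 digit table produce up to 10,000 day offsets, and each monthly row is matched to every generated date in its month. Some assumptions: a fixed anchor date replaces now() so the output is repeatable, strftime('%Y-%m', ...) plays the role of from_timestamp(..., 'yyyy/MM'), and the function name is hypothetical.

```python
import sqlite3

# A derived table with the digits 0-9, like the Impala answer's UNION ALL chains.
DIGITS = "SELECT 0 AS n" + "".join(f" UNION ALL SELECT {i}" for i in range(1, 10))


def monthly_to_daily(conn, anchor):
    """Repeat each monthly (isin, extraction_dt, yield) row for every day of its month.

    `anchor` is the latest date in the generated series (a stand-in for now()).
    """
    sql = f"""
        SELECT d.isin, s.dt, d.yield
        FROM datashop AS d
        JOIN (
            SELECT date(?, '-' || (a.n + 10 * b.n + 100 * c.n + 1000 * e.n) || ' days') AS dt
            FROM ({DIGITS}) AS a
            CROSS JOIN ({DIGITS}) AS b
            CROSS JOIN ({DIGITS}) AS c
            CROSS JOIN ({DIGITS}) AS e
        ) AS s
          ON strftime('%Y-%m', s.dt) = strftime('%Y-%m', d.extraction_dt)
        ORDER BY 1, 2
    """
    return conn.execute(sql, [anchor]).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datashop (isin TEXT, extraction_dt TEXT, yield INTEGER)")
    conn.executemany(
        "INSERT INTO datashop VALUES (?, ?, ?)",
        [("001", "2013-01-31", 100), ("001", "2013-02-28", 110), ("002", "2013-01-31", 200)],
    )
    for row in monthly_to_daily(conn, "2013-12-31")[:3]:
        print(row)
```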
Your WITH clause has a recursive (self-referencing) query. In most SQL dialects, this requires using WITH RECURSIVE, not plain WITH. According to the Impala SQL reference, Impala does not support recursive common table expressions:
The Impala WITH clause does not support recursive queries in the WITH, which is supported in some other database systems.
In other words, you cannot write this as a recursive CTE in Impala; you need a non-recursive date series instead.
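For comparison, in an engine that does support recursive CTEs, the recursive idea from the question works once it is written with WITH RECURSIVE. Here is a minimal sketch using Python's sqlite3 (SQLite supports WITH RECURSIVE; Impala does not). Each row seeds the first day of its month, and the recursive step adds one day until the month's last day. The function name is hypothetical, and date(..., 'start of month', '+1 month', '-1 day') is SQLite's way of computing the month end.

```python
import sqlite3


def monthly_to_daily_recursive(conn):
    """Expand each monthly datashop row into one row per day of its month."""
    sql = """
        WITH RECURSIVE days(isin, dt, month_end, yield) AS (
            -- anchor member: first day of each row's month
            SELECT isin,
                   date(extraction_dt, 'start of month'),
                   date(extraction_dt, 'start of month', '+1 month', '-1 day'),
                   yield
            FROM datashop
            UNION ALL
            -- recursive member: next day, until the end of the month
            SELECT isin, date(dt, '+1 day'), month_end, yield
            FROM days
            WHERE dt < month_end
        )
        SELECT isin, dt, yield FROM days ORDER BY isin, dt
    """
    return conn.execute(sql).fetchall()
```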

Running count distinct over a column - Oracle SQL

I want to aggregate by the DAYS column, computing a running count of distinct CLIENT_IDs. The catch is that a CLIENT_ID already seen on a previous day should not be counted again. How can I do this in Oracle SQL?
Based on the table below (let's call this table DAY_CLIENT):
DAY CLIENT_ID
1 10
1 11
1 12
2 10
2 11
3 10
3 11
3 12
3 13
4 10
I want to get (let's call this table DAY_AGG):
DAYS CNT_CLIENT_ID
1 3
2 3
3 4
4 4
So, on day 1 there are 3 distinct client IDs.
On day 2 there are still 3, because CLIENT_IDs 10 and 11 were already seen on day 1. On day 3 the count becomes 4, because CLIENT_ID 13 was not seen on any previous day.
Here's an alternative solution that may or may not be more performant than the other solutions:
WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
SELECT 4 DAY, 10 CLIENT_ID FROM dual)
SELECT DISTINCT DAY,
COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM (SELECT DAY,
client_id,
row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
FROM your_table)
ORDER BY DAY;
DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
1 3
2 3
3 4
4 4
I recommend you test all the solutions against your data to see which one works best for you.
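To check the first-seen logic without Oracle, here is the same query run through Python's sqlite3 as a stand-in. This assumes SQLite 3.25 or later, which added window functions. Each client's earliest day gets rn = 1, and the running COUNT over ORDER BY day adds up only those first sightings. The default window frame includes every row with the same day, so all rows of a day share one total. The helper names are hypothetical.

```python
import sqlite3

# The DAY_CLIENT rows from the question.
DAY_CLIENT = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11),
              (3, 10), (3, 11), (3, 12), (3, 13), (4, 10)]


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE day_client (day INTEGER, client_id INTEGER)")
    conn.executemany("INSERT INTO day_client VALUES (?, ?)", DAY_CLIENT)
    return conn


def running_distinct(conn):
    """Count each client only on the first day it appears, then sum those counts."""
    return conn.execute("""
        SELECT DISTINCT day,
               COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY day) AS num_distinct
        FROM (SELECT day, client_id,
                     row_number() OVER (PARTITION BY client_id ORDER BY day) AS rn
              FROM day_client)
        ORDER BY day
    """).fetchall()
```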
Another approach uses a correlated subquery:
SELECT DISTINCT
d1.DAYS,
(SELECT COUNT(DISTINCT d2.CLIENT_ID) FROM yourTable d2
WHERE d2.DAYS <= d1.DAYS) AS CNT_CLIENT_ID
FROM yourTable d1
I tested this on SQL Server, but it should also run on Oracle. I always struggle with setting up Oracle demos.
You could also use the APPLY operator if your Oracle version supports it (CROSS APPLY is available from Oracle 12c):
select day, CNT_CLIENT_ID
from DAY_CLIENT t cross apply (
select count(distinct CLIENT_ID) as CNT_CLIENT_ID
from DAY_CLIENT
where day <= t.day) tt
group by day, CNT_CLIENT_ID;
Alternatively, use a correlated subquery:
select day, (select count(distinct CLIENT_ID)
from DAY_CLIENT
where day <= t.day) as CNT_CLIENT_ID
from DAY_CLIENT t
group by day;
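The correlated-subquery version needs no window functions at all. Here it is as a runnable sketch with Python's sqlite3 standing in for Oracle; the helper names are hypothetical. For each distinct day, the subquery counts the distinct clients seen on that day or any earlier one. A plain per-day GROUP BY is included for contrast.

```python
import sqlite3

# The DAY_CLIENT rows from the question.
DAY_CLIENT = [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11),
              (3, 10), (3, 11), (3, 12), (3, 13), (4, 10)]


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE day_client (day INTEGER, client_id INTEGER)")
    conn.executemany("INSERT INTO day_client VALUES (?, ?)", DAY_CLIENT)
    return conn


def running_distinct_correlated(conn):
    """Distinct clients seen on or before each day (the correlated subquery)."""
    return conn.execute("""
        SELECT DISTINCT d1.day,
               (SELECT COUNT(DISTINCT d2.client_id)
                FROM day_client AS d2
                WHERE d2.day <= d1.day) AS cnt_client_id
        FROM day_client AS d1
        ORDER BY d1.day
    """).fetchall()


def per_day_distinct(conn):
    """Plain GROUP BY: distinct clients on each day only, not a running count."""
    return conn.execute("""
        SELECT day, COUNT(DISTINCT client_id)
        FROM day_client
        GROUP BY day
        ORDER BY day
    """).fetchall()
```

On the question's data, the correlated version gives 3, 3, 4, 4, while the per-day GROUP BY gives 3, 2, 4, 1.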
Try to keep it simple, always. The other answers are also good if you want to learn other techniques, but in this case there is no need to be fancy at all.
SELECT days
, COUNT(DISTINCT client_id) cnt
FROM
(
SELECT 1 days, 10 client_id FROM dual --1
UNION ALL
SELECT 1, 11 FROM dual --2
UNION ALL
SELECT 1, 12 FROM dual --3
UNION ALL
SELECT 1, 11 FROM dual --4
UNION ALL
SELECT 2, 10 FROM dual
UNION ALL
SELECT 2, 11 FROM dual
UNION ALL
SELECT 2, 12 FROM dual
UNION ALL
SELECT 3, 10 FROM dual
UNION ALL
SELECT 3, 11 FROM dual
UNION ALL
SELECT 3, 12 FROM dual
UNION ALL
SELECT 3, 13 FROM dual
UNION ALL
SELECT 4, 10 FROM dual
)
GROUP BY days
ORDER BY 1
/
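DAYS CNT
---------- ----------
1 3
2 3
3 4
4 1
Note that this gives the number of distinct clients per day, not the running count the question asks for: day 4 returns 1 instead of 4.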

Oracle SQL (Toad): Expand table

Suppose I have an SQL (Oracle Toad) table named "test", which has the following fields and entries (dates are in dd/mm/yyyy format):
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/06/2014 3
1 01/09/2014 6
2 01/04/2015 7
2 01/08/2015 43
2 01/09/2015 85
2 01/12/2015 4
I know from how the table has been created that, since there are value entries for id = 1 for February 2014 and June 2014, the values for March through May 2014 must be 0. The same applies to July and August 2014 for id = 1, and for May through July 2015 and October through November 2015 for id = 2.
Now, if I want to calculate, say, the median of the value column for a given id, I will not arrive at the correct result using the table as it stands - as I'm missing 5 zero entries for each id.
I would therefore like to create/use the following (potentially just a temporary) table...
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/03/2014 0
1 01/04/2014 0
1 01/05/2014 0
1 01/06/2014 3
1 01/07/2014 0
1 01/08/2014 0
1 01/09/2014 6
2 01/04/2015 7
2 01/05/2015 0
2 01/06/2015 0
2 01/07/2015 0
2 01/08/2015 43
2 01/09/2015 85
2 01/10/2015 0
2 01/11/2015 0
2 01/12/2015 4
...on which I could then compute the median by id:
select id, median(value) as med_value from test group by id
How do I do this? Or would there be an alternative way?
Many thanks,
Mr Clueless
In this solution, I build a table with all the "needed dates" and value of 0 for all of them. Then, instead of a join, I do a union all, group by id and ref_date and ADD the values in each group. If the date had a row with a value in the original table, then that's the resulting value; and if it didn't, the value will be 0. This avoids a join. In almost all cases a union all + aggregate will be faster (sometimes much faster) than a join.
I added more input data for more thorough testing. In your original question you have two ids, and for each of them you have four positive values. You are missing five values in each case, so there will be five zeros (0), which means the median is 0 in both cases. For id=3 (which I added) I have three positive values and three zeros; the median is half of the smallest positive number. For id=4 I have just one value, which should then be the median as well.
The solution includes, in particular, an answer to your specific question: how to create the temporary table (which most likely doesn't need to be a temporary table at all, but an inline view). With factored subqueries (in the WITH clause), the optimizer decides whether to treat them as temporary tables or inline views; you can see what it decided by looking at the Explain Plan.
with
inputs ( id, ref_date, value ) as (
select 1, to_date('01/01/2014', 'dd/mm/yyyy'), 20 from dual union all
select 1, to_date('01/02/2014', 'dd/mm/yyyy'), 25 from dual union all
select 1, to_date('01/06/2014', 'dd/mm/yyyy'), 3 from dual union all
select 1, to_date('01/09/2014', 'dd/mm/yyyy'), 6 from dual union all
select 2, to_date('01/04/2015', 'dd/mm/yyyy'), 7 from dual union all
select 2, to_date('01/08/2015', 'dd/mm/yyyy'), 43 from dual union all
select 2, to_date('01/09/2015', 'dd/mm/yyyy'), 85 from dual union all
select 2, to_date('01/12/2015', 'dd/mm/yyyy'), 4 from dual union all
select 3, to_date('01/01/2016', 'dd/mm/yyyy'), 12 from dual union all
select 3, to_date('01/03/2016', 'dd/mm/yyyy'), 23 from dual union all
select 3, to_date('01/06/2016', 'dd/mm/yyyy'), 2 from dual union all
select 4, to_date('01/11/2014', 'dd/mm/yyyy'), 9 from dual
),
-- the "inputs" table constructed above is for testing only,
-- it is not part of the solution.
ranges ( id, min_date, max_date ) as (
select id, min(ref_date), max(ref_date)
from inputs
group by id
),
prep ( id, ref_date, value ) as (
select id, add_months(min_date, level - 1), 0
from ranges
connect by level <= 1 + months_between( max_date, min_date )
and prior id = id
and prior sys_guid() is not null
),
v ( id, ref_date, value ) as (
select id, ref_date, sum(value)
from ( select id, ref_date, value from prep union all
select id, ref_date, value from inputs
)
group by id, ref_date
)
select id, median(value) as median_value
from v
group by id
order by id -- ORDER BY is optional
;
ID MEDIAN_VALUE
-- ------------
1 0
2 0
3 1
4 9
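Here is a minimal sketch of the same "zero rows UNION ALL real rows, then SUM per month" idea, with Python's sqlite3 standing in for Oracle. SQLite has no CONNECT BY, so the recursive CTE prep steps forward one month at a time with date(ref_date, '+1 month'). SQLite also has no MEDIAN aggregate, so the median is taken with Python's statistics.median. Dates are stored as ISO strings, and the helper names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from statistics import median

# Same test rows as the "inputs" CTE above, with ISO dates for SQLite.
INPUTS = [
    (1, "2014-01-01", 20), (1, "2014-02-01", 25), (1, "2014-06-01", 3), (1, "2014-09-01", 6),
    (2, "2015-04-01", 7), (2, "2015-08-01", 43), (2, "2015-09-01", 85), (2, "2015-12-01", 4),
    (3, "2016-01-01", 12), (3, "2016-03-01", 23), (3, "2016-06-01", 2),
    (4, "2014-11-01", 9),
]

FILLED_SQL = """
    WITH RECURSIVE
    ranges(id, min_date, max_date) AS (
        SELECT id, min(ref_date), max(ref_date) FROM inputs GROUP BY id
    ),
    prep(id, ref_date, max_date) AS (
        -- one row per month between min_date and max_date, per id
        SELECT id, min_date, max_date FROM ranges
        UNION ALL
        SELECT id, date(ref_date, '+1 month'), max_date
        FROM prep
        WHERE ref_date < max_date
    )
    -- UNION ALL the zero rows with the real rows, then add them up per month
    SELECT id, ref_date, SUM(value)
    FROM (SELECT id, ref_date, 0 AS value FROM prep
          UNION ALL
          SELECT id, ref_date, value FROM inputs)
    GROUP BY id, ref_date
    ORDER BY id, ref_date
"""


def median_by_id(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inputs (id INTEGER, ref_date TEXT, value INTEGER)")
    conn.executemany("INSERT INTO inputs VALUES (?, ?, ?)", rows)
    values = defaultdict(list)
    for id_, _ref_date, value in conn.execute(FILLED_SQL):
        values[id_].append(value)
    # SQLite has no MEDIAN aggregate, so the median is taken in Python.
    return {id_: median(vals) for id_, vals in values.items()}
```

This reproduces the medians above: 0, 0, 1 and 9 for ids 1 to 4.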
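If ref_date is a DATE column (and every value falls on the first day of the month, as in your data), you can generate the missing months and LEFT JOIN back to the table: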
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= 1 + (select max(months_between(max_date, min_date)) from int1 ) ) -- 1 + keeps the last month
select i.id
, add_months(i.min_date,s.n) as ref_date
, nvl(value,0) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
And with median
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= 1 + (select max(months_between(max_date, min_date)) from int1 ) ) -- 1 + keeps the last month
select i.id
, MEDIAN(nvl(value,0)) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
group by i.id
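Here is a minimal sketch of the "generate month offsets, then LEFT JOIN back" approach, with Python's sqlite3 standing in for Oracle. The whole-month gap is computed from year and month numbers in place of months_between. The offset series s runs from 0 to that gap inclusive, so the final month of each id is kept. The helper names are hypothetical.

```python
import sqlite3

# The original 8 rows from the question's "test" table, with ISO dates.
TEST_ROWS = [
    (1, "2014-01-01", 20), (1, "2014-02-01", 25), (1, "2014-06-01", 3), (1, "2014-09-01", 6),
    (2, "2015-04-01", 7), (2, "2015-08-01", 43), (2, "2015-09-01", 85), (2, "2015-12-01", 4),
]

FILL_SQL = """
    WITH RECURSIVE
    int1(id, min_date, max_date) AS (
        SELECT id, min(ref_date), max(ref_date) FROM test GROUP BY id
    ),
    lim(m) AS (
        -- largest whole-month gap, playing the role of max(months_between(...))
        SELECT max((CAST(strftime('%Y', max_date) AS INTEGER)
                    - CAST(strftime('%Y', min_date) AS INTEGER)) * 12
                   + CAST(strftime('%m', max_date) AS INTEGER)
                   - CAST(strftime('%m', min_date) AS INTEGER))
        FROM int1
    ),
    s(n) AS (
        -- 0 .. m inclusive, so the last month is not dropped
        SELECT 0
        UNION ALL
        SELECT n + 1 FROM s, lim WHERE n < lim.m
    )
    SELECT i.id,
           date(i.min_date, '+' || s.n || ' months'),
           COALESCE(t.value, 0)
    FROM int1 AS i
    JOIN s ON date(i.min_date, '+' || s.n || ' months') <= i.max_date
    LEFT JOIN test AS t
      ON t.id = i.id AND t.ref_date = date(i.min_date, '+' || s.n || ' months')
    ORDER BY 1, 2
"""


def fill_months(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, ref_date TEXT, value INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    return conn.execute(FILL_SQL).fetchall()
```

Because s runs from 0 to m inclusive, id 1 gets all nine months from January to September 2014, with zeros for the gaps.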

How can I find unoccupied id numbers in a table?

In my table I want to see a list of unoccupied id numbers in a certain range.
For example, there are 10 records in my table with ids "2,3,4,5,10,12,16,18,21,22", and I want to see the available ones between 1 and 25. So I want to see a list like:
1,6,7,8,9,11,13,14,15,17,19,20,23,24,25
How should I write my sql query?
Select the numbers from 1 to 25 and show only those that are not in your table:
select n from
( select rownum n from dual connect by level <= 25)
where n not in (select id from your_table where id is not null);
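Here is a minimal sketch of "generate 1..N, keep what is NOT IN the table", with Python's sqlite3 standing in for Oracle. A recursive CTE replaces CONNECT BY, and the table name your_table is hypothetical. The IS NOT NULL guard matters because NOT IN returns no rows at all if the subquery yields a NULL.

```python
import sqlite3

MISSING_SQL = """
    WITH RECURSIVE nums(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < ?
    )
    SELECT n FROM nums
    WHERE n NOT IN (SELECT id FROM your_table WHERE id IS NOT NULL)
    ORDER BY n
"""


def unused_ids(used, upper):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?)", [(i,) for i in used])
    return [n for (n,) in conn.execute(MISSING_SQL, [upper])]


if __name__ == "__main__":
    free = unused_ids([2, 3, 4, 5, 10, 12, 16, 18, 21, 22], 25)
    print(",".join(map(str, free)))
```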
Let's say you have a #numbers table with three numbers:
CREATE TABLE #numbers (num INT)
INSERT INTO #numbers (num)
SELECT 1
UNION
SELECT 3
UNION
SELECT 6
Now you can use a CTE to generate the numbers 1-25 recursively and exclude those in your #numbers table in the WHERE clause:
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 25
)
SELECT n FROM n
WHERE n NOT IN (select num from #numbers)
ORDER BY n
OPTION (MAXRECURSION 25);
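You can also use a self LEFT JOIN. Note that this returns only the first unused id after each used id (the start of each gap), not every unused id:
select
u1.user_id + 1 as gap_start
from users as u1
left outer join users as u2 on u1.user_id + 1 = u2.user_id
where
u2.user_id is null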
see also SQL query to find Missing sequence numbers
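To see what the self-join actually reports, here is a minimal sketch with Python's sqlite3 as a stand-in database. The users table with a single user_id column is hypothetical.

```python
import sqlite3

GAP_START_SQL = """
    SELECT u1.user_id + 1 AS gap_start
    FROM users AS u1
    LEFT OUTER JOIN users AS u2 ON u1.user_id + 1 = u2.user_id
    WHERE u2.user_id IS NULL
    ORDER BY gap_start
"""


def gap_starts(used):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in used])
    return [start for (start,) in conn.execute(GAP_START_SQL)]
```

For the ids in the question this yields 6, 11, 13, 17, 19 and 23. The value 1 is missing because no used id precedes it, and 23 appears even though it lies past the last used id.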
You need LISTAGG to get the output in a single row.
WITH DATA1 AS(
SELECT LEVEL rn FROM dual CONNECT BY LEVEL <=25
),
data2 AS(
SELECT 2 num FROM dual UNION ALL
SELECT 3 FROM dual UNION ALL
SELECT 4 from dual union all
SELECT 5 FROM dual UNION ALL
SELECT 10 FROM dual UNION ALL
SELECT 12 from dual union all
SELECT 16 from dual union all
SELECT 18 FROM dual UNION ALL
SELECT 21 FROM dual UNION ALL
SELECT 22 FROM dual)
SELECT listagg(rn, ',')
WITHIN GROUP (ORDER BY rn) num_list FROM data1
WHERE rn NOT IN(SELECT num FROM data2)
/
NUM_LIST
----------------------------------------------------
1,6,7,8,9,11,13,14,15,17,19,20,23,24,25

How do I select records with max from id column if two of three other fields are identical

I have a table that stores costs for consumables.
consumable_cost_id consumable_type_id from_date cost
1 1 01/01/2000 £10.95
2 2 01/01/2000 £5.95
3 3 01/01/2000 £1.98
24 3 01/11/2013 £2.98
27 3 22/11/2013 £3.98
33 3 22/11/2013 £4.98
34 3 22/11/2013 £5.98
35 3 22/11/2013 £6.98
If the same consumable is updated more than once on the same day I would like to select only the row where the consumable_cost_id is biggest on that day. Desired output would be:
consumable_cost_id consumable_type_id from_date cost
1 1 01/01/2000 £10.95
2 2 01/01/2000 £5.95
3 3 01/01/2000 £1.98
24 3 01/11/2013 £2.98
35 3 22/11/2013 £6.98
Edit:
Here is my attempt (adapted from another post I found on here):
SELECT cc.*
FROM
consumable_costs cc
INNER JOIN
(
SELECT
from_date,
MAX(consumable_cost_id) AS MaxCcId
FROM consumable_costs
GROUP BY from_date
) groupedcc
ON cc.from_date = groupedcc.from_date
AND cc.consumable_cost_id = groupedcc.MaxCcId
You were very close. This seems to work for me:
SELECT cc.*
FROM
consumable_cost AS cc
INNER JOIN
(
SELECT
Max(consumable_cost_id) AS max_id,
consumable_type_id,
from_date
FROM consumable_cost
GROUP BY consumable_type_id, from_date
) AS m
ON cc.consumable_cost_id = m.max_id
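Here is a minimal sketch that checks the "MAX id per (consumable_type_id, from_date), joined back" query against the question's data, with Python's sqlite3 standing in for Access. Dates are stored as ISO strings and costs as plain numbers, which is an assumption for the test only.

```python
import sqlite3

# The question's rows, with ISO dates and numeric costs.
COSTS = [
    (1, 1, "2000-01-01", 10.95), (2, 2, "2000-01-01", 5.95), (3, 3, "2000-01-01", 1.98),
    (24, 3, "2013-11-01", 2.98), (27, 3, "2013-11-22", 3.98), (33, 3, "2013-11-22", 4.98),
    (34, 3, "2013-11-22", 5.98), (35, 3, "2013-11-22", 6.98),
]

LATEST_SQL = """
    SELECT cc.*
    FROM consumable_cost AS cc
    INNER JOIN (
        SELECT MAX(consumable_cost_id) AS max_id, consumable_type_id, from_date
        FROM consumable_cost
        GROUP BY consumable_type_id, from_date
    ) AS m
      ON cc.consumable_cost_id = m.max_id
    ORDER BY cc.consumable_cost_id
"""


def latest_costs(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE consumable_cost "
        "(consumable_cost_id INTEGER, consumable_type_id INTEGER, from_date TEXT, cost REAL)"
    )
    conn.executemany("INSERT INTO consumable_cost VALUES (?, ?, ?, ?)", rows)
    return conn.execute(LATEST_SQL).fetchall()
```

Grouping by both consumable_type_id and from_date is what fixes the original attempt, which grouped by from_date alone.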
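A shorter variant relies on MySQL's non-standard GROUP BY handling. It fails in Oracle and MS Access (and in MySQL with ONLY_FULL_GROUP_BY enabled), and even where it runs, it returns an arbitrary row from each group rather than the one with the highest consumable_cost_id:
SELECT * FROM consumable_cost
GROUP by consumable_type_id, from_date
ORDER BY cost DESC;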
Assuming consumable_cost_id is unique.
SELECT * FROM T t1
WHERE EXISTS(
SELECT t2.consumable_type_id, t2.from_date FROM T t2
GROUP by t2.consumable_type_id, t2.from_date
HAVING MAX(t2.consumable_cost_id) = t1.consumable_cost_id);
Because a comment said this was returning an incorrect result, I created a test query for Oracle that shows the query works. As I said, it's for Oracle, but there is no reason why it should not work in MS Access. The only Oracle-specific part here is FROM DUAL, used to generate the test data.
WITH T AS
(
SELECT 1 AS consumable_cost_id,1 AS consumable_type_id, TO_DATE('01/01/2000','DD/MM/YYYY') AS FROM_DATE, '£10.95' AS COST FROM DUAL
UNION ALL
SELECT 2,2,TO_DATE('01/01/2000','DD/MM/YYYY'),'£5.95' FROM DUAL
UNION ALL
SELECT 3,3,TO_DATE('01/01/2000','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 24,3,TO_DATE('01/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 27,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 33,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 34,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
UNION ALL
SELECT 35,3,TO_DATE('22/11/2013','DD/MM/YYYY'),'£1.98' FROM DUAL
)
SELECT * FROM T t1
WHERE EXISTS(
SELECT t2.consumable_type_id, t2.from_date FROM T t2
GROUP by t2.consumable_type_id, t2.from_date
HAVING MAX(t2.consumable_cost_id) = t1.consumable_cost_id);
Result:
1 1 01-JAN-00 £10.95
2 2 01-JAN-00 £5.95
3 3 01-JAN-00 £1.98
24 3 01-NOV-13 £1.98
35 3 22-NOV-13 £1.98