Efficiently selecting random duplicate entries in Oracle SQL

I have a dataset which looks like this:
id   date
250  01-JAN-15
250  01-MAR-15
360  01-JUN-15
470  01-FEB-15
470  01-DEC-15
470  01-NOV-15
780  01-APR-15
790  01-SEP-15
790  01-MAY-15
I want to randomly select rows so that each duplicated id appears only once. For example:
id   date
250  01-MAR-15
360  01-JUN-15
470  01-FEB-15
780  01-APR-15
790  01-SEP-15
My current solution uses an analytic function, which takes a long time to run on hundreds of millions of rows:
select *
from (select aa.*,
             row_number() over (partition by id order by dbms_random.value) as random_flag
      from table aa)
where random_flag = 1
Any tips on how to get the same result without analytic functions?

You can try this query (note that rownum = 1 in the scalar subquery picks an arbitrary row per id, not a uniformly random one):
SELECT id,
       (select t2.date
        from table t2
        where t2.id = t1.id
          and rownum = 1) as date
FROM table t1
GROUP BY id;
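If the chosen row must be genuinely random rather than arbitrary, Oracle's FIRST/LAST aggregate can produce one row per id in a single GROUP BY pass, without the analytic-plus-filter step. A sketch under assumed names: my_table stands in for the real table and dt for the date column (DATE itself is a reserved word in Oracle):
-- One row per id; MIN ... KEEP takes the date from a randomly
-- ordered row within each id group.
select id,
       min(dt) keep (dense_rank first order by dbms_random.value) as dt
from my_table
group by id;
Whether this actually beats the analytic version on hundreds of millions of rows depends on the data and the optimizer, so it is worth benchmarking both.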

Related

How to get the latest records based on the max of two columns

I have a table called Inventory with the columns below:
item  warehouse  date                     sequence number  value
111   100        2019-09-25 12:29:41.000  1                10
111   100        2019-09-26 12:29:41.000  1                20
222   200        2019-09-21 16:07:10.000  1                5
222   200        2019-09-21 16:07:10.000  2                10
333   300        2020-01-19 12:05:23.000  1                4
333   300        2020-01-20 12:05:23.000  1                5
Expected output:
item  warehouse  date                     sequence number  value
111   100        2019-09-26 12:29:41.000  1                20
222   200        2019-09-21 16:07:10.000  2                10
333   300        2020-01-20 12:05:23.000  1                5
Based on item and warehouse, I need to pick the value for the latest date and, within that date, the latest sequence number.
I tried the query below:
select item, warehouse, sequencenumber, sum(value), max(date) as date1
from Inventory t1
where t1.date in (select max(date)
                  from Inventory t2
                  where t1.warehouse = t2.warehouse
                    and t1.item = t2.item
                  group by t2.item, t2.warehouse)
group by t1.item, t1.warehouse, t1.sequencenumber
It works for the latest date but not for the latest sequence number.
Can you please suggest how to write a query that returns my expected output?
You can use row_number() for this:
select *
from (select t.*,
             row_number() over (partition by item, warehouse
                                order by date desc, sequence_number desc, value desc) rn
      from mytable t
     ) t
where rn = 1
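If this is SQL Server (the millisecond timestamps in the question suggest it), the same result can be written without the derived table by putting the window function directly in ORDER BY together with TOP (1) WITH TIES. A sketch against the same hypothetical mytable:
-- WITH TIES keeps every row that ties for the lowest row_number(),
-- which is 1, so exactly one row per (item, warehouse) group survives.
select top (1) with ties t.*
from mytable t
order by row_number() over(partition by item, warehouse
                           order by date desc, sequence_number desc, value desc);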

What is an efficient alternative to cross joining two large tables to get a running total?

I have 2 tables whose schema is as follows:
table1
event_dt
6/30/2018
7/1/2018
7/2/2018
7/3/2018
7/4/2018
7/5/2018
7/6/2018
7/7/2018
7/8/2018
7/9/2018
7/10/2018
table2
event_dt   time (in seconds)
7/7/2018   144
7/8/2018   63
7/1/2018   47
7/8/2018   81
7/9/2018   263
7/7/2018   119
7/8/2018   130
7/9/2018   206
7/5/2018   134
7/1/2018   140
For each date in table 1 I want to find the cumulative sum of time up to that date, so I used a cross join:
select t1.event_dt, sum(t2.time)
from yp1 t1 cross join yp2 t2
where t1.event_dt >= t2.event_dt
group by t1.event_dt
Using this query I was able to get the cumulative running total for each date in table 1, as long as there is an event on or before that day. For example, the first event date is 7/1/2018 but the first date in table1 is 6/30/2018, so 6/30/2018 won't be present in the final output.
The problem with this method is that the cross join takes too long: I have millions of records, since an observation is taken every 6 seconds. So is there a way to get the same results without a cross join, or for that matter any approach that is more efficient?
I think the best way is to use a cumulative window sum:
select event_dt, running_time
from (select event_dt, time,
             sum(time) over (order by event_dt) as running_time
      from ((select event_dt, null as time
             from t1
            ) union all
            (select event_dt, time
             from t2
            )
           ) tt
     ) tt
where time is null;
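One caveat: for table 1 dates that precede the first event (6/30/2018 in the sample), the running sum has only NULLs behind it and therefore comes out as NULL rather than 0. If 0 is preferred, wrap the result in COALESCE; the same query with that one adjustment:
select event_dt, coalesce(running_time, 0) as running_time
from (select event_dt, time,
             sum(time) over (order by event_dt) as running_time
      from ((select event_dt, null as time from t1)
            union all
            (select event_dt, time from t2)) tt
     ) tt
where time is null;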

Select all users that have a specific date and have been recorded only once

I have this table (SQL Fiddle):
ID    Date      Value
____  ________  _____
3241  01/01/00  15456
3241  9/17/12   5
3241  9/16/12   100
3241  9/15/12   20
4355  01/01/00  01
4355  9/16/12   12
4355  9/15/12   132
4355  9/14/12   4
1001  01/01/00  456
1001  9/16/12   125
5555  01/01/00  01
1234  01/01/00  01
1234  9/16/12   45
2236  01/01/00  879
2236  9/15/12   128
2236  9/14/12   323
2002  01/01/00  567
I would like to select all the records that have 01/01/00 as the date and appear only one time.
The result I'm trying to get looks like the table below:
ID    Date      Value
____  ________  _____
5555  01/01/00  01
2002  01/01/00  567
I tried to use a HAVING clause, but because of my GROUP BY the result is wrong: the select still returns ids that have more than one record, which isn't good for my case.
My wrong attempt:
SELECT * FROM
  (SELECT *
   FROM table1
   GROUP BY id, date, value
   HAVING count(Id) = 1) t1
WHERE date = '01-01-00'
Query Result (SQL Fiddle)
I would use:
select id, max(date) as date, max(value) as value
from t1
group by id
having max(date) = '01-01-00' and count(*) = 1;
A somewhat faster method might be:
select t1.*
from t1
where date = '01-01-00' and
      not exists (select 1 from t1 tt1
                  where tt1.id = t1.id and tt1.date <> '01-01-00');
This can take advantage of indexes on t1(date) and t1(id, date).
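For concreteness, a sketch of those two indexes (the names are arbitrary):
create index t1_date_ix on t1 (date);
create index t1_id_date_ix on t1 (id, date);
The first serves the outer date = '01-01-00' filter; the second lets the NOT EXISTS probe check each id from the index alone, without touching the base rows.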
Use IN:
SELECT *
FROM table1
WHERE id IN (SELECT id
             FROM table1
             GROUP BY id
             HAVING count(Id) = 1)
  AND date = '01-01-00'
I just didn't notice that I made an error in my GROUP BY: instead of grouping only by the ID, I grouped by all the columns.
SELECT * FROM
  (SELECT *
   FROM Table1
   GROUP BY `ID`
   HAVING count(`ID`) = 1) t1
WHERE `Date` = '2000-01-01 00:00:00'
However, for my problem I took the solution from Gordon Linoff because it seems better for my case.
PS: I have 2 million records.
Change your query to:
SELECT *
FROM (SELECT *
      FROM table1
      GROUP BY id
      HAVING count(id) = 1) t1
WHERE t1.date = '01-01-00'

Find Max Value and date between date range returning multiple records

I hope someone can help me; I have been working on this all day.
I need to get the max value, plus the date and id that the max value is associated with, within a specific date range.
Here is my code. I have tried many different versions, but it still returns more than one id and date:
SELECT distinct bw_s.id, avs.carProd, cd_s.RecordDate,
       cd_s.milkProduction as MilkProd,
       cd_s.WaterProduction as WaterProd
FROM tblTest bw_s
INNER JOIN tblTestCp cd_s WITH(NOLOCK)
        ON bw_s.id = cd_s.id
       AND cd_s.recorddate BETWEEN '08/06/2014' AND '10/05/2014'
INNER JOIN (select id, max(CarVol) as carProd
            from tblTestCp
            where recorddate BETWEEN '08/06/2014' AND '10/05/2014'
            group by id) avs
        ON avs.id = bw_s.id
ORDER BY id
I have a table like this:
id     RecordDate  carProd  MilkProd  WaterProd
47790  2014-10-05  132155   0         225
47790  2014-10-01  13444    0         0
47790  2014-08-06  132111   10        100
47790  2014-09-05  10000    500       145
47790  2014-09-20  10000    800       500
47791  2014-09-20  10000    300       500
47791  2014-09-21  10001    400       500
47791  2014-08-21  20001    600       500
And the result should be (the max carProd per id):
id     RecordDate  carProd  MilkProd  WaterProd
47790  2014-10-05  132155   0         225
47791  2014-08-21  20001    600       500
I've assumed that the name of your table is "Data":
SELECT *
FROM Data
WHERE Data.RecordDate BETWEEN '2014-08-21' AND '2014-10-01'
ORDER BY Data.carProd DESC
LIMIT 1;
Make sure to change the dates to match what your particular requirements are.
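Note that LIMIT 1 returns a single row overall, while the expected output has one row per id. If the target is SQL Server (the WITH(NOLOCK) hint in the question points that way, and LIMIT would not apply there), the row_number() pattern shown earlier in this thread yields one row per id. A sketch against the question's tblTestCp, assuming all of the displayed columns live in that table:
-- For each id, keep the row with the highest CarVol in the range.
select id, RecordDate, CarVol as carProd,
       milkProduction as MilkProd, WaterProduction as WaterProd
from (select cd.*,
             row_number() over (partition by id
                                order by CarVol desc) as rn
      from tblTestCp cd
      where recorddate BETWEEN '08/06/2014' AND '10/05/2014') x
where rn = 1;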

SQL Query to continuously bucket data

I have a table as follows:
Datetime             ID  Price  Quantity
2013-01-01 13:30:00  1   139    25
2013-01-01 13:30:15  2   140    25
2013-01-01 13:30:30  3   141    15
Supposing that I wish to end up with a table like this, which buckets the data into quantities of 50:
Bucket_ID  Max  Min  Avg
1          140  139  139.5
2          141  141  141
Is there a simple query to do this? Data will constantly be added to the first table, so it would be nice if the query could avoid recalculating the completed buckets of 50 and instead automatically start averaging the next incomplete bucket. Ideas appreciated! Thanks
You may try this solution. It should work even if "number" is bigger than 50 (though it relies on the fact that avg(number) < 50).
select
    bucket_id,
    max(price),
    min(price),
    avg(price)
from
    (select
         price,
         bucket_id,
         (select sum(t2.number) from test t2 where t2.id <= t1.id) as accumulated
     from test t1
     join (select
               rowid as bucket_id,
               50 * rowid as bucket
           from test) buckets
          on (buckets.bucket - 50) < accumulated
         and buckets.bucket > (accumulated - number))
group by
    bucket_id;
You can have a look at this fiddle to check whether it does what you want: http://sqlfiddle.com/#!7/4c63c/1
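For reference, a minimal setup the query can run against (SQLite, matching the fiddle's dialect), with the question's Quantity column named number as the answer assumes:
-- Hypothetical schema: id doubles as SQLite's rowid, which the
-- bucket subquery multiplies by 50 to generate bucket boundaries.
create table test (
  id     integer primary key,
  price  integer,
  number integer
);
insert into test (price, number) values
  (139, 25),
  (140, 25),
  (141, 15);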