Update duplicate latitude values by iteratively increasing margin - sql

I have lat and long columns in an Oracle database table, stored as regular numbers, and some of the (lat, long) pairs are duplicates. I'd like a way to add a very small margin to either column to eliminate the duplication. The problem is that the number of duplicate records varies from pair to pair, so I have to adjust the margin I add iteratively within each pair.
Example:
ID | LAT | LONG
==================
1 | 1 | 1
2 | 1 | 1
3 | 1 | 1
In this case, I'd like to add a margin of .0003 to either column to eliminate the duplication. I can't just blindly add .0003 to IDs 2 and 3, because they would still be duplicates of each other, so I have to compute original_value + (margin * i) for i in 0 .. (number of duplicate rows - 1).
So I'd like to end up with something like this:
ID | LAT | LONG
1 | 1 | 1
2 | 1.0003 | 1
3 | 1.0006 | 1
How do I do this in SQL? Apparently I can mimic imperative programming with cursors, but that does not seem to be the SQL way. Can I somehow achieve this with INSERT INTO SELECT?

I don't know what your exact data looks like, but suppose you have this table, called tbl:
ID         LAT        LON
---------- ---------- ----------
1          20         25
2          30         33
3          30         33
4          55         60
5          55         60
6          55         60
You could run the following:
select id,
       case when rn > 1 then lat + rn - 1 else lat end as lat,
       lon
from (
    select t.*,
           row_number() over (partition by lat, lon order by id) as rn
    from tbl t
) x;
To get:
ID         LAT        LON
---------- ---------- ----------
1          20         25
2          30         33
3          31         33
4          55         60
5          56         60
6          57         60
Notice how IDs 2 and 3 were dups, and IDs 4, 5, and 6 were dups. They are no longer exact dups, because within each group the lat value goes up by one for each successive duplicate.
Fiddle: http://sqlfiddle.com/#!4/ef959/1/0
Edit (based on your edit)
select id,
       case when rn > .0003 then lat + rn - .0003 else lat end as lat,
       lon
from (
    select t.*,
           row_number() over (partition by lat, lon order by id) * .0003 as rn
    from tbl t
) x;
The above will ascend by .0003 rather than 1.
See new fiddle here: http://sqlfiddle.com/#!4/21506/6/0
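For anyone who wants to try the idea outside Oracle, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. It assumes an SQLite build with window functions (3.25 or later), reuses the tbl sample data from above, and writes the offset as (rn - 1) * margin, which should give the same result as the query in the edit. The MARGIN name is only for illustration.

```python
import sqlite3

# Step size taken from the question; the name MARGIN is hypothetical.
MARGIN = 0.0003

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 20, 25), (2, 30, 33), (3, 30, 33), (4, 55, 60), (5, 55, 60), (6, 55, 60)],
)

# rn numbers the rows inside each (lat, lon) group starting at 1, so
# (rn - 1) * margin leaves the first row untouched and shifts the others.
rows = conn.execute(
    """
    SELECT id, lat + (rn - 1) * ? AS lat, lon
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY lat, lon ORDER BY id) AS rn
        FROM tbl t
    ) x
    ORDER BY id
    """,
    (MARGIN,),
).fetchall()

for row in rows:
    print(row)
```

One thing this does not check: a shifted value could collide with a row that already sits at, say, 30.0003, so it may be worth re-running the duplicate check after applying the offsets.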

Related

How to combine and sum consecutive values until new value in column

I need some help with summing consecutive values of a column based on the category in another column, until that category changes to a new value. Here's what my data looks like:
id  | site_id | date_id  | hour_id | location_id | status    | status_minutes
1   | 1       | 20210101 | 1       | 1           | Offline   | 60
2   | 1       | 20210101 | 2       | 1           | Offline   | 57
3   | 1       | 20210101 | 2       | 1           | Available | 3
4   | 1       | 20210101 | 3       | 1           | Available | 20
5   | 1       | 20210101 | 3       | 1           | Offline   | 40
... | ...     | ...      | ...     | ...         | ...       | ...
25  | 1       | 20210101 | 23      | 1           | Offline   | 60
26  | 1       | 20210102 | 0       | 1           | Offline   | 23
As you can see above, the data is at an hourly level: if status_minutes equals 60, there is just one row for that hour. If not, the status minutes are spread across several rows that add up to 60, as in rows 2 and 3, and in rows 4 and 5.
My goal is to find how long each stretch of a status lasted before the next status kicked in. So the output for the example above would be:
site_id | date_id  | location_id | status    | status_minutes
1       | 20210101 | 1           | Offline   | 117
1       | 20210101 | 1           | Available | 23
1       | 20210101 | 1           | Offline   | 40
...     | ...      | ...         | ...       | ...
1       | 20210101 | 1           | Offline   | 60
1       | 20210102 | 1           | Offline   | 23
The important part is that this operation should be confined within each day, as seen in the last two rows of the example and the output. The summing happens only within a given day, and then starts again with the 0th hour of the next day.
This is a gaps-and-islands problem. The section_num column identifies each consecutive run of the same status; the query then totals status_minutes per run.
You may try the following:
SELECT
    site_id,
    date_id,
    location_id,
    status,
    SUM(status_minutes) AS status_minutes
FROM (
    SELECT
        *,
        -- id breaks ties between rows that share an hour_id
        ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id
            ORDER BY hour_id, id
        ) - ROW_NUMBER() OVER (
            PARTITION BY site_id, date_id, location_id, status
            ORDER BY hour_id, id
        ) AS section_num
    FROM
        my_table
) t
GROUP BY
    site_id,
    date_id,
    location_id,
    status,
    section_num
ORDER BY
    site_id,
    date_id,
    location_id,
    MIN(hour_id),
    MIN(id)
View working demo on db fiddle
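As a quick way to check the row-number-difference trick locally, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions). It loads only the rows the question shows explicitly, leaving out the elided rows and row 25, and it orders by hour_id and then id, on the assumption that id reflects the order of statuses within an hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE my_table (
           id INTEGER PRIMARY KEY, site_id INTEGER, date_id INTEGER,
           hour_id INTEGER, location_id INTEGER, status TEXT,
           status_minutes INTEGER)"""
)
# Rows 1-5 and 26 from the question; the elided rows are not invented here.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 20210101, 1, 1, "Offline", 60),
        (2, 1, 20210101, 2, 1, "Offline", 57),
        (3, 1, 20210101, 2, 1, "Available", 3),
        (4, 1, 20210101, 3, 1, "Available", 20),
        (5, 1, 20210101, 3, 1, "Offline", 40),
        (26, 1, 20210102, 0, 1, "Offline", 23),
    ],
)

# The two row numbers advance together while the status stays the same,
# so their difference is constant within one consecutive run (an island).
rows = conn.execute(
    """
    SELECT site_id, date_id, location_id, status,
           SUM(status_minutes) AS status_minutes
    FROM (
        SELECT m.*,
               ROW_NUMBER() OVER (PARTITION BY site_id, date_id, location_id
                                  ORDER BY hour_id, id)
             - ROW_NUMBER() OVER (PARTITION BY site_id, date_id, location_id, status
                                  ORDER BY hour_id, id) AS section_num
        FROM my_table m
    ) t
    GROUP BY site_id, date_id, location_id, status, section_num
    ORDER BY MIN(id)
    """
).fetchall()

for row in rows:
    print(row)
```

Because date_id is part of both partitions, the Offline run at the end of 20210101 and the Offline row on 20210102 stay separate, which is the per-day behaviour the question asks for.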

Aggregate Sliding Window for hive

I have a hive table which is in sorted order based on a numeric value say count.
fruit count
------ -------
apple 10
orange 8
banana 5
melon 3
pears 1
The total count is 27. I need it divided into three segments. So first 1/3 of count i.e. 1 to 9 is one, 10 to 18 is second and 19 to 27 is third.
I guess I need to do some sort of sliding window.
fruit count zone
------ ------- --------
apple 10 one
orange 8 two
banana 5 three
melon 3 three
pears 1 three
Any idea how to approach this?
The SQL way:
select *,
       (
         /* running total, largest count first */
         sum(count) over (partition by 1 order by count desc)
         /* divided by the zone size: total / 3, which is 9 in your case;
            this relies on / being integer division */
         / (sum(count) over (partition by 1) / 3)
       )
       /* add 1 to the quotient when there is a remainder,
          e.g. 11 / 9 = 1 remainder 2, so 11 belongs in zone 2 */
       + (
         case when (
                sum(count) over (partition by 1 order by count desc)
                % (sum(count) over (partition by 1) / 3) /* remainder */
              ) > 1
              then 1 else 0
         end
       ) as zone
from yourtable
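To see what the quotient-plus-remainder arithmetic produces on the sample data, here is a sketch that runs the same formula through Python's sqlite3 module (assuming SQLite 3.25+). SQLite divides integers as integers, which the formula relies on; Hive's / operator typically returns a double, so there the quotient would probably need DIV or FLOOR instead. The table name fruits and the column name cnt are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "fruits" and "cnt" are illustrative names, not from the original table.
conn.execute("CREATE TABLE fruits (fruit TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?)",
    [("apple", 10), ("orange", 8), ("banana", 5), ("melon", 3), ("pears", 1)],
)

# running   = running total, largest count first
# zone_size = total / 3 (integer division: 27 / 3 = 9)
# zone      = quotient, plus 1 when the remainder is greater than 1
rows = conn.execute(
    """
    SELECT fruit, cnt,
           running / zone_size
           + CASE WHEN running % zone_size > 1 THEN 1 ELSE 0 END AS zone
    FROM (
        SELECT fruit, cnt,
               SUM(cnt) OVER (ORDER BY cnt DESC) AS running,
               SUM(cnt) OVER () / 3 AS zone_size
        FROM fruits
    ) s
    ORDER BY cnt DESC
    """
).fetchall()

for row in rows:
    print(row)
```

On this data the zones come out as 1, 2, 3, 3, 3, matching the one/two/three labels in the question. The formula depends on where the running totals happen to land relative to the zone boundaries, so it is worth checking against other distributions before relying on it.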

sql best strategy to partition same values based on temporal sequence

I have data that looks like this, where there are multiple values for each ID that correspond to an ascending date variable:
ID LEVEL DATE
1 10 10/1/2000
1 10 11/20/2001
1 10 12/01/2001
1 30 02/15/2002
1 30 02/15/2002
1 20 05/17/2002
1 20 01/04/2003
1 30 07/20/2003
1 30 03/16/2004
1 30 04/15/2004
I want to acquire a count per each ID/LEVEL/DATE block that looks like this:
ID LEVEL COUNT
1 10 3
1 30 2
1 20 2
1 30 3
The problem is that if I use the count windows function and partition by level, it groups 30 together regardless of the temporal sequence. I want the count for level 30 both before and after 20 to be distinct. Does anyone know how to do that?
A standard gaps and islands solution using ROW_NUMBER(), if it's available on your particular DBMS...
WITH
ordered AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id        ORDER BY date) AS set_ordinal,
        ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
    FROM
        yourData
)
SELECT
    id,
    level,
    set_ordinal - grp_ordinal,
    MIN(date),
    COUNT(*)
FROM
    ordered
GROUP BY
    id,
    level,
    set_ordinal - grp_ordinal
ORDER BY
    id,
    MIN(date)
Visualising the effect of the two row numbers...
ID LEVEL DATE       set_ordinal grp_ordinal set-grp GROUP
-- ----- ---------- ----------- ----------- ------- --------
 1    10 10/01/2000           1           1       0 1,10,0
 1    10 11/20/2001           2           2       0 1,10,0
 1    10 12/01/2001           3           3       0 1,10,0
 1    30 02/15/2002           4           1       3 1,30,3
 1    30 02/15/2002           5           2       3 1,30,3
 1    20 05/17/2002           6           1       5 1,20,5
 1    20 01/04/2003           7           2       5 1,20,5
 1    30 07/20/2003           8           3       5 1,30,5
 1    30 03/16/2004           9           4       5 1,30,5
 1    30 04/15/2004          10           5       5 1,30,5
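The table above can be reproduced with a short sketch using Python's sqlite3 module (assuming SQLite 3.25+). Two things here are my own assumptions rather than part of the answer: dates are stored as ISO strings so that text ordering is chronological, and a hypothetical seq column is added as a tie-breaker, because the two rows dated 02/15/2002 are otherwise indistinguishable and the two ROW_NUMBER() calls are not guaranteed to order them the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical surrogate key used only to break ties on date.
conn.execute(
    "CREATE TABLE yourData (seq INTEGER PRIMARY KEY, id INTEGER, level INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO yourData (id, level, date) VALUES (?, ?, ?)",
    [
        (1, 10, "2000-10-01"), (1, 10, "2001-11-20"), (1, 10, "2001-12-01"),
        (1, 30, "2002-02-15"), (1, 30, "2002-02-15"),
        (1, 20, "2002-05-17"), (1, 20, "2003-01-04"),
        (1, 30, "2003-07-20"), (1, 30, "2004-03-16"), (1, 30, "2004-04-15"),
    ],
)

# set_ordinal counts rows per id; grp_ordinal counts rows per (id, level).
# Their difference stays constant inside one uninterrupted run of a level.
rows = conn.execute(
    """
    WITH ordered AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date, seq) AS set_ordinal,
               ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date, seq) AS grp_ordinal
        FROM yourData
    )
    SELECT id, level, COUNT(*) AS cnt
    FROM ordered
    GROUP BY id, level, set_ordinal - grp_ordinal
    ORDER BY id, MIN(date)
    """
).fetchall()

for row in rows:
    print(row)
```

Level 30 shows up twice, once with a count of 2 and once with 3, because the level 20 rows in between push the two runs into different groups.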

finding rows against summed value of specific id's in sql

I have a table like the one below:
Id| Amount|DateAdded |
--|-------|-----------|
1 20 20-Jun-2018
1 10 05-Jun-2018
1 4 21-May-2018
1 5 15-May-2018
1 15 05-May-2018
2 25 15-Jun-2018
2 25 12-Jun-2018
2 65 05-Jun-2018
2 65 20-May-2018
Here, if I sum up the Amount for Id = 1, I get 54. I want to find those rows of Id = 1 whose sum is not greater than 35, or any other given value.
For a given value of 35, the expected output for Id = 1 should be--
Id| Amount|DateAdded |
--|-------|-----------|
1 20 20-Jun-2018
1 10 05-Jun-2018
1 4 21-May-2018
1 5 15-May-2018
For a given value of 50, the expected output for Id = 2 should be--
Id| Amount|DateAdded |
--|-------|-----------|
2 25 15-Jun-2018
2 25 12-Jun-2018
You would use a cumulative sum. Your expected output takes the most recent rows first, so accumulate in descending date order. To get all the rows:
select t.*
from (select t.*,
             sum(amount) over (partition by id order by dateadded desc) as running_amount
      from t
     ) t
where t.running_amount - t.amount < 35;
To get just the row that passes the mark:
where t.running_amount - t.amount < 35 and
      t.running_amount >= 35
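Here is a rough sketch of the cumulative-sum filter run against the sample data with Python's sqlite3 module (assuming SQLite 3.25+). The expected outputs in the question list the newest rows first, so this sketch accumulates in descending date order. Dates are stored as ISO strings so they sort chronologically, and rows_up_to is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, amount INTEGER, dateadded TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 20, "2018-06-20"), (1, 10, "2018-06-05"), (1, 4, "2018-05-21"),
        (1, 5, "2018-05-15"), (1, 15, "2018-05-05"),
        (2, 25, "2018-06-15"), (2, 25, "2018-06-12"),
        (2, 65, "2018-06-05"), (2, 65, "2018-05-20"),
    ],
)


def rows_up_to(conn, row_id, limit):
    """Return the newest rows for row_id, up to and including the row
    whose running total first reaches the limit (hypothetical helper)."""
    return conn.execute(
        """
        SELECT id, amount, dateadded
        FROM (
            SELECT t.*,
                   SUM(amount) OVER (PARTITION BY id
                                     ORDER BY dateadded DESC) AS running_amount
            FROM t
        ) s
        WHERE id = ? AND running_amount - amount < ?
        ORDER BY dateadded DESC
        """,
        (row_id, limit),
    ).fetchall()


rows_35 = rows_up_to(conn, 1, 35)
rows_50 = rows_up_to(conn, 2, 50)
print(rows_35)
print(rows_50)
```

The condition running_amount - amount < limit compares the total before the current row, which is why the 15-May row (running total 39) is still included for a limit of 35.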

Oracle: Get the smaller values and the first greater value

I have a table like this:
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
4 Sample4 40
I would like to get all of the rows that contain smaller values, plus the first row that contains a greater value.
For example, when I pass 25 as the parameter for the Value column, I want to get the following table:
ID Name Value
1 Sample1 10
2 Sample2 20
3 Sample3 30
I'm stuck at this point, thanks in advance.
Analytic functions to the rescue!
create table your_table (
    id    number,
    value number
);
insert into your_table
select level, level * 10
from dual
connect by level <= 5;
select * from your_table;
id | value
----+------
1 | 10
2 | 20
3 | 30
4 | 40
5 | 50
Ok, now we use lag(). Specify field, offset and the default value (for the first row that has no previous one).
select id, value, lag(value, 1, value) over (order by value) previous_value
from your_table
id | value | previous_value
---+-------+---------------
1 | 10 | 10
2 | 20 | 10
3 | 30 | 20
4 | 40 | 30
5 | 50 | 40
Now apply where.
select id, value
from (
    select id, value, lag(value, 1, value) over (order by value) previous_value
    from your_table
)
where previous_value < 25
Works for me.
id | value
----+------
1 | 10
2 | 20
3 | 30
Of course you have to have some policy on ties. For example, what happens if two rows have the same value and they are both first: do you want to keep both, or only one of them? Or maybe you have some other criterion for breaking the tie (say, sort by id). But the idea is fairly simple.
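If you want to play with the lag() idea without an Oracle instance, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+). It uses COALESCE around a plain LAG(value) where the query above uses the third lag() argument; the intent is the same, namely that the first row falls back to its own value. The threshold variable is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(n, n * 10) for n in range(1, 6)],
)

threshold = 25  # hypothetical parameter name

# A row is kept when the value before it is below the threshold: that
# covers every smaller row plus the first row at or above the threshold.
rows = conn.execute(
    """
    SELECT id, value
    FROM (
        SELECT id, value,
               COALESCE(LAG(value) OVER (ORDER BY value), value) AS previous_value
        FROM your_table
    ) x
    WHERE previous_value < ?
    ORDER BY value
    """,
    (threshold,),
).fetchall()

print(rows)
```

The filter looks only at the previous value, so a row equal to the threshold is also kept as the "first greater" row; whether that is wanted is part of the tie policy mentioned above.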
You can try a query like this (TOP is SQL Server syntax):
SELECT *
FROM YourTableName
WHERE Value < 25
   OR ID IN (SELECT TOP 1 ID FROM YourTableName WHERE Value >= 25 ORDER BY Value)
In Oracle, you can try this (but see the answer by "That Young Man"; I think it's better than mine):
SELECT * FROM (
    SELECT ID, NAME, VALUE, 1 AS RN
    FROM YT
    WHERE VALUE < 25
    UNION ALL
    SELECT ID, NAME, VALUE, ROW_NUMBER() OVER (ORDER BY VALUE) AS RN
    FROM YT
    WHERE VALUE > 25
) A
WHERE RN = 1;