Calculate Returns in Oracle SQL

I have the below data available:
Date Sec ID Price
01-Jan-2014, 1, 100
02-Jan-2014, 1, 111
03-Jan-2014, 1, 90
04-Jan-2014, 1, 121
01-Jan-2014, 2, 10
02-Jan-2014, 2, 11
03-Jan-2014, 2, 9
04-Jan-2014, 2, 12
I am using the LAG function in the query below but am not getting proper results:
select sec_id, date_of_data, price,
       LAG(sec_id, 1) over (order by sec_id) as prev_sec_id,
       LAG(date_of_data, 1) over (order by sec_id) as prev_date,
       LAG(price, 1) over (order by sec_id) as prev_price,
       price / LAG(price, 1) over (order by sec_id) - 1 as price_return
from eqa.asset_mkt_price_ts
where sec_id in (1, 2)
  and date_of_data between '01-Jan-2014' and '04-Jan-2014';
Results are as below
Date Sec ID Price Prev Sec ID Prev Price
01-Jan-2014, 1, 100, NULL, NULL
02-Jan-2014, 1, 111, 1, 100
03-Jan-2014, 1, 90, 1, 111
04-Jan-2014, 1, 121, 1, 90
01-Jan-2014, 2, 10, 1, 121 ----- Issue Case
02-Jan-2014, 2, 11, 2, 10
03-Jan-2014, 2, 9, 2, 11
04-Jan-2014, 2, 12, 2, 12
As seen above, the results are not logical: for Sec ID 2, the previous price from Sec ID 1 is being used, which is not correct.
Hope an expert around here can help me.
Thanks
Hitesh

You need to replace order by sec_id with partition by sec_id order by date_of_data. Using only order by sec_id produces a single analytic window over the whole input table ordered by sec_id, which can give unpredictable results within ties and always takes the previous row, regardless of whether a new sec_id group has started.
Partitioning by sec_id gives two analytic windows, so the lag function works as you would like it to:
select x.*
from
(select sec_id, date_of_data, price,
        LAG(sec_id, 1) over (partition by sec_id order by date_of_data) as prev_sec_id,
        LAG(date_of_data, 1) over (partition by sec_id order by date_of_data) as prev_date,
        LAG(price, 1) over (partition by sec_id order by date_of_data) as prev_price,
        price / LAG(price, 1) over (partition by sec_id order by date_of_data) - 1 as price_return
 from eqa.asset_mkt_price_ts
 where sec_id in (1, 2)) x
where x.price_return < 0.3;
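The effect of adding partition by can be checked with a self-contained sketch. This is not the answer's Oracle code: SQLite's window functions (3.25+) behave the same way for LAG, and the table here is built from the question's sample data with ISO dates standing in for the DD-Mon-YYYY values, not from the real eqa.asset_mkt_price_ts table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table prices (date_of_data text, sec_id int, price real)")
con.executemany("insert into prices values (?, ?, ?)", [
    ("2014-01-01", 1, 100), ("2014-01-02", 1, 111),
    ("2014-01-03", 1, 90),  ("2014-01-04", 1, 121),
    ("2014-01-01", 2, 10),  ("2014-01-02", 2, 11),
    ("2014-01-03", 2, 9),   ("2014-01-04", 2, 12),
])

rows = con.execute("""
    select sec_id, date_of_data, price,
           -- partition by sec_id restarts the window for each security, so
           -- the first row of sec_id 2 gets NULL, not sec_id 1's last price
           lag(price) over (partition by sec_id order by date_of_data) as prev_price,
           price / lag(price) over (partition by sec_id order by date_of_data) - 1
               as price_return
    from prices
    order by sec_id, date_of_data
""").fetchall()

for row in rows:
    print(row)
```

The first row of each sec_id now has a NULL prev_price and price_return, which was exactly the issue case in the question.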

Related

How to get unique results by best position

I have a large PostgreSQL table from which I need to take rows grouped by the Car_id and position columns.
The problem is that I have a lot of duplicates and need to keep the single row with the best position.
I wrote a SQL example that gives me the correct results, but it needs to be modified. How can I do this in a cleaner way?
I need one unique car_id with the minimum position, taking the latest scrape date, across all of the passed license plate numbers; I am not interested in which particular license plate number it is.
Example of SQL:
select
"eventDate",
"Car_id",
min("position") as "carPosition",
groupArray(concat(toString("scrapedAt"), '_', toString("position"))) as "scrapedAtByPosition",
groupArray(concat("licensePlate", '_', toString("position"))) as "licensePlateByPosition",
groupArray(concat(toString("amazonChoice"), '_', toString("position"))) as "amazonChoicesByPosition",
'organic' as "matchType"
from "Car1_ScrapeHistoryLicensePlate"
inner join (
select "Car_id", max("scrapedAt") as "scrapedAt"
from "Car1_ScrapeHistoryLicensePlate"
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "Car_id", "eventDate"
) as t1 USING ("Car_id", "scrapedAt")
where "licensePlate" IN ('ALPR912', 'JGPD831') and "eventDate" between '2022-08-12' and '2022-09-12'
group by "eventDate", "Car_id"
order by "eventDate" desc;
Database records:
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, APRJSC512, 1, 1660000001
2022-09-10, 1, APRJSC512, 1, 1660000002
2022-09-10, 1, PLBQWN035, 1, 1660000003
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-10, 1, PLBQWN035, 2, 1660000002
2022-09-11, 2, APRJSC512, 1, 1660000011
2022-09-11, 2, APRJSC512, 2, 1660000022
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-11, 2, PLBQWN035, 2, 1660000044
2022-09-11, 2, PLBQWN035, 5, 1660000022
2022-09-12, 3, APRJSC512, 3, 1660000111
2022-09-12, 3, PLBQWN035, 3, 1660000222
2022-09-13, 4, PLBQWN035, 4, 1660001111
2022-09-14, 5, PLBQWN035, 5, 1660011111
Expected result:
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-12, 3, PLBQWN035, 3, 1660000222
In PostgreSQL you can use the brilliant distinct on.
The order by list of expressions determines which record is picked for each car_id: within each group of rows sharing a car_id, the first row in that order is returned.
select distinct on (car_id) * -- or the relevant expression list here
from the_table
order by car_id, position, scrapedat desc;
DB-fiddle
select eventDate
,Car_id
,licensePlate
,position
,scrapedAt
from
(
select *
,row_number() over(partition by car_id order by position, scrapedat desc) as rn
from t
) t
where rn = 1
eventDate Car_id licensePlate position scrapedAt
---------- ------ ------------ ------- ---------
2022-09-10, 1, PLBQWN035, 1, 1660000004
2022-09-11, 2, PLBQWN035, 1, 1660000033
2022-09-12, 3, PLBQWN035, 3, 1660000222
2022-09-13, 4, PLBQWN035, 4, 1660001111
2022-09-14, 5, PLBQWN035, 5, 1660011111
Fiddle
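The row_number() version above is portable across engines. As a sanity check, here is a hedged, self-contained sketch in SQLite loaded with the question's database records; it reproduces the result table above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table t (eventDate text, Car_id int, licensePlate text,
                               position int, scrapedAt int)""")
con.executemany("insert into t values (?, ?, ?, ?, ?)", [
    ("2022-09-10", 1, "APRJSC512", 1, 1660000001),
    ("2022-09-10", 1, "APRJSC512", 1, 1660000002),
    ("2022-09-10", 1, "PLBQWN035", 1, 1660000003),
    ("2022-09-10", 1, "PLBQWN035", 1, 1660000004),
    ("2022-09-10", 1, "PLBQWN035", 2, 1660000002),
    ("2022-09-11", 2, "APRJSC512", 1, 1660000011),
    ("2022-09-11", 2, "APRJSC512", 2, 1660000022),
    ("2022-09-11", 2, "PLBQWN035", 1, 1660000033),
    ("2022-09-11", 2, "PLBQWN035", 2, 1660000044),
    ("2022-09-11", 2, "PLBQWN035", 5, 1660000022),
    ("2022-09-12", 3, "APRJSC512", 3, 1660000111),
    ("2022-09-12", 3, "PLBQWN035", 3, 1660000222),
    ("2022-09-13", 4, "PLBQWN035", 4, 1660001111),
    ("2022-09-14", 5, "PLBQWN035", 5, 1660011111),
])

rows = con.execute("""
    select eventDate, Car_id, licensePlate, position, scrapedAt
    from (
        select *,
               -- rn = 1 marks the best row per car: lowest position,
               -- ties broken by the most recent scrape
               row_number() over (partition by Car_id
                                  order by position, scrapedAt desc) as rn
        from t
    )
    where rn = 1
    order by Car_id
""").fetchall()
print(rows)
```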

Cumulative sum of a column

I have a table that has the below data.
COUNTRY LEVEL NUM_OF_DUPLICATES
US 9 6
US 8 24
US 7 12
US 6 20
US 5 39
US 4 81
US 3 80
US 2 430
US 1 178
US 0 430
I wrote a query that calculates the cumulative sum and got the below output.
COUNTRY LEVEL NUM_OF_DUPLICATES POOL
US 9 6 6
US 8 24 30
US 7 12 42
US 6 20 62
US 5 39 101
US 4 81 182
US 3 80 262
US 2 130 392
US 1 178 570
US 0 254 824
Now I want to filter the data and take only the rows where POOL <= 300; if no row has POOL exactly equal to 300, then I should also take the first row after 300. In the example above, no row has POOL = 300, so we take the next value after 300, which is 392. So I need a query that pulls the records with POOL <= 392 (as per the example above), yielding this output:
COUNTRY LEVEL NUM_OF_DUPLICATES POOL
US 9 6 6
US 8 24 30
US 7 12 42
US 6 20 62
US 5 39 101
US 4 81 182
US 3 80 262
US 2 130 392
Please let me know your thoughts. Thanks in advance.
declare @t table(Country varchar(5), [Level] int, Num_of_Duplicates int);
insert into @t(Country, [Level], Num_of_Duplicates)
values
('US', 9, 6),
('US', 8, 24),
('US', 7, 12),
('US', 6, 20),
('US', 5, 39),
('US', 4, 81),
('US', 3, 80),
('US', 2, 130/*-92*/),
('US', 1, 178),
('US', 0, 430);

select *, sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) as Pool,
(sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) - Num_of_Duplicates) / 300 as flag, -- any row which starts before 300 will have flag = 0
-- or
case when sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) - Num_of_Duplicates < 300 then 1 else 0 end as startsbefore300
from @t;

select *
from
(
select *, sum(Num_of_Duplicates) over(partition by Country order by [Level] desc) as Pool
from @t
) as t
where Pool - Num_of_Duplicates < 300;
The logic here is quite simple:
Calculate the running-sum POOL value up to the current row.
Filter rows so that the running total up to the previous row is < 300; you can either subtract the current row's value or use a second windowed sum.
If the total up to the current row is exactly 300, the previous row's total is less, so this row is included.
If the current row's total is more than 300 but the previous row's total is less, it is also included.
All later rows are excluded.
It's unclear what ordering you want; I've used the NUM_OF_DUPLICATES column ascending, but you may want something else.
SELECT
COUNTRY,
LEVEL,
NUM_OF_DUPLICATES,
POOL
FROM (
SELECT *,
POOL = SUM(NUM_OF_DUPLICATES) OVER (ORDER BY NUM_OF_DUPLICATES ROWS UNBOUNDED PRECEDING)
-- alternative calculation
-- ,POOLPrev = SUM(NUM_OF_DUPLICATES) OVER (ORDER BY NUM_OF_DUPLICATES ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM YourTable
) t
WHERE POOL - NUM_OF_DUPLICATES < 300;
-- you could also use POOLPrev above
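The "previous row's running total < 300" trick these answers share can be verified with a small sketch. This uses SQLite rather than SQL Server, and the 130/254 figures from the answers' insert statements (the question's data and output tables disagree on those two rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (Country text, Level int, Num_of_Duplicates int)")
con.executemany("insert into t values (?, ?, ?)", [
    ("US", 9, 6), ("US", 8, 24), ("US", 7, 12), ("US", 6, 20),
    ("US", 5, 39), ("US", 4, 81), ("US", 3, 80), ("US", 2, 130),
    ("US", 1, 178), ("US", 0, 254),
])

rows = con.execute("""
    select Country, Level, Num_of_Duplicates, Pool
    from (
        select *,
               sum(Num_of_Duplicates) over
                   (partition by Country order by Level desc) as Pool
        from t
    )
    -- keep a row if the running total *before* it is still under 300;
    -- this keeps every row with Pool <= 300 plus the first one past 300
    where Pool - Num_of_Duplicates < 300
    order by Level desc
""").fetchall()
print(rows)
```

The last row kept is Level 2 with Pool = 392, the first total past 300, exactly as the question requires.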
I used two temp tables to get the answer.
DECLARE @t TABLE(Country VARCHAR(5), [Level] INT, Num_of_Duplicates INT)
INSERT INTO @t(Country, [Level], Num_of_Duplicates)
VALUES ('US', 9, 6),
('US', 8, 24),
('US', 7, 12),
('US', 6, 20),
('US', 5, 39),
('US', 4, 81),
('US', 3, 80),
('US', 2, 130),
('US', 1, 178),
('US', 0, 254);
SELECT
Country
, [Level]
, Num_of_Duplicates
, SUM(Num_of_Duplicates) OVER (ORDER BY id) AS [POOL]
INTO #temp_table
FROM
(
SELECT
Country,
[Level],
Num_of_Duplicates,
ROW_NUMBER() OVER (ORDER BY [Level] DESC) AS id -- running order: Level 9 down to 0
FROM @t
) AS A
SELECT
[POOL],
ROW_NUMBER() OVER (ORDER BY [POOL] ) AS [rank]
INTO #Temp_2
FROM #temp_table
WHERE [POOL] >= 300
SELECT *
FROM #temp_table WHERE
[POOL] <= (SELECT [POOL] FROM #Temp_2 WHERE [rank] = 1 )
DROP TABLE #temp_table
DROP TABLE #Temp_2

SQL Server: grouping on rows

I have data like below:
I want to group the rows for the same visitors having purchase = 1 and all their previous visits where purchase = 0. For the above data example, the rows should be grouped as:
Rows 1 and 2 should be grouped together (because visit_id 1002 has purchase = 1 and visit_id 1001 is the previous visit before the purchase having purchase = 0)
Row 3 should be grouped alone (because visit_id 1003 has purchase = 1 and there is no previous visit to visit_id 1003 having purchase = 0) (visit_id 1001 cannot be considered as the previous visit of visit_id 1003 because visit_id 1002 occurred between 1001 and 1003 and it has purchase = 1)
Row 4 should be grouped alone (because visit_id 2001 does not have any previous visit)
Rows 5,6 and 7 should be grouped together (because visit_id 2004 has purchase = 1 and visit_ids 2002 and 2003 are the previous visits which have purchase = 0)
How could this be achieved? I am using SQL Server 2012.
I am expecting output similar to below:
Code to generate the above data:
CREATE TABLE [#tmp_data]
(
[visitor] INT,
[visit_id] INT,
[visit_time] DATETIME,
[purchase] BIT
);
INSERT INTO #tmp_data( visitor, visit_id, visit_time, purchase )
VALUES( 1, 1001, '2020-01-01 10:00:00', 0 ),
( 1, 1002, '2020-01-02 11:00:00', 1 ),
( 1, 1003, '2020-01-02 14:00:00', 1 ),
( 2, 2001, '2020-01-01 10:00:00', 1 ),
( 2, 2002, '2020-01-07 11:00:00', 0 ),
( 2, 2003, '2020-01-08 14:00:00', 0 ),
( 2, 2004, '2020-01-11 14:00:00', 1 );
I'm not sure what you mean by "grouped", but your description defines a group by the number of purchase = 1 values on or after a given row. So this assigns a grouping value per visitor:
select td.*,
sum(case when purchase = 1 then 1 else 0 end) over (partition by visitor order by visit_time desc) as grouping
from #tmp_data td;
This can be simplified to:
select td.*,
sum( convert(int, purchase) ) over (partition by visitor order by visit_time desc) as grouping
from #tmp_data td
order by visitor, visit_time;
Note: This just assigns a "grouping". You can aggregate however you want after that.
Here is a db<>fiddle.
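The reverse-ordered running sum can be checked end to end with a sketch. This is SQLite rather than SQL Server (so purchase is a plain integer and no convert() is needed), loaded with the question's insert data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table tmp_data (visitor int, visit_id int,
                                      visit_time text, purchase int)""")
con.executemany("insert into tmp_data values (?, ?, ?, ?)", [
    (1, 1001, "2020-01-01 10:00:00", 0),
    (1, 1002, "2020-01-02 11:00:00", 1),
    (1, 1003, "2020-01-02 14:00:00", 1),
    (2, 2001, "2020-01-01 10:00:00", 1),
    (2, 2002, "2020-01-07 11:00:00", 0),
    (2, 2003, "2020-01-08 14:00:00", 0),
    (2, 2004, "2020-01-11 14:00:00", 1),
])

rows = con.execute("""
    select visitor, visit_id,
           -- counting purchases from this row to the *end* of the visitor's
           -- history (order by visit_time desc) puts each purchase and the
           -- zero-purchase visits before it into the same bucket
           sum(purchase) over (partition by visitor
                               order by visit_time desc) as grouping_id
    from tmp_data
    order by visitor, visit_time
""").fetchall()
print(rows)
```

Visits 1001/1002 share one id, 1003 gets its own, 2001 is alone, and 2002-2004 share one, matching the four groups the question describes.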

SQL Server: group by consecutive

I have this table:
CREATE TABLE yourtable
(
HevEvenementID INT,
HjvNumeSequJour INT,
HteTypeEvenID INT
);
INSERT INTO yourtable
VALUES (12074, 1, 66), (12074, 2, 66), (12074, 3, 5),
(12074, 4, 7), (12074, 5, 17), (12074, 6, 17),
(12074, 7, 17), (12074, 8, 17), (12074, 9, 17), (12074, 10, 5)
I need to group by consecutive HteTypeEvenID. Right now I am doing this:
SELECT
HevEvenementID,
MAX(HjvNumeSequJour) AS HjvNumeSequJour,
HteTypeEvenID
FROM
(SELECT
HevEvenementID,
HjvNumeSequJour,
HteTypeEvenID
FROM
yourtable y) AS s
GROUP BY
HevEvenementID, HteTypeEvenID
ORDER BY
HevEvenementID,HjvNumeSequJour, HteTypeEvenID
which returns this:
HevEvenementID HjvNumeSequJour HteTypeEvenID
---------------------------------------------
12074 2 66
12074 4 7
12074 9 17
12074 10 5
I need to group by consecutive HteTypeEvenID, to get this result:
HevEvenementID HjvNumeSequJour HteTypeEvenID
----------------------------------------------
12074 2 66
12074 3 5
12074 4 7
12074 9 17
12074 10 5
Any suggestions?
In SQL Server, you can do this with aggregation and difference of row numbers:
select HevEvenementID, HteTypeEvenID,
max(HjvNumeSequJour)
from (select t.*,
row_number() over (partition by HevEvenementID order by HjvNumeSequJour) as seqnum_1,
row_number() over (partition by HevEvenementID, HteTypeEvenID order by HjvNumeSequJour) as seqnum_2
from yourtable t
) t
group by HevEvenementID, HteTypeEvenID, (seqnum_1 - seqnum_2)
order by max(HjvNumeSequJour);
I think the best way to understand how this works is by staring at the results of the subquery. You will see how the difference between the two values defines the groups of adjacent values.
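To make that staring concrete, here is a self-contained run of the same difference-of-row-numbers query against the question's data, sketched in SQLite (the technique is identical in SQL Server):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table yourtable (HevEvenementID int,
                                       HjvNumeSequJour int,
                                       HteTypeEvenID int)""")
con.executemany("insert into yourtable values (?, ?, ?)", [
    (12074, 1, 66), (12074, 2, 66), (12074, 3, 5), (12074, 4, 7),
    (12074, 5, 17), (12074, 6, 17), (12074, 7, 17), (12074, 8, 17),
    (12074, 9, 17), (12074, 10, 5),
])

rows = con.execute("""
    select HevEvenementID, max(HjvNumeSequJour) as HjvNumeSequJour, HteTypeEvenID
    from (
        select y.*,
               -- seqnum_1 - seqnum_2 is constant within a run of equal
               -- HteTypeEvenID values and changes whenever the run breaks,
               -- so it separates the two runs of HteTypeEvenID = 5
               row_number() over (partition by HevEvenementID
                                  order by HjvNumeSequJour) as seqnum_1,
               row_number() over (partition by HevEvenementID, HteTypeEvenID
                                  order by HjvNumeSequJour) as seqnum_2
        from yourtable y
    ) as s
    group by HevEvenementID, HteTypeEvenID, seqnum_1 - seqnum_2
    order by max(HjvNumeSequJour)
""").fetchall()
print(rows)
```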

How to Dense Rank Sets of data

I am trying to get a dense rank that groups sets of data together. In my table I have ID, GRP_SET, SUB_SET, and INTERVAL, which simply represents a date field. When records are inserted with an ID, they are inserted as GRP_SETs of 3 rows, shown as SUB_SETs. As you can see, the interval can change slightly before a set finishes inserting.
Here is some example data and the DRANK column represents what ranking I'm trying to get.
with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)
select * from q
Example Data
ID GRP_SET SUBSET INTERVAL DRANK
1 a 1 123 1
1 a 2 123 1
1 a 3 124 1
1 b 1 234 2
1 b 3 235 2
1 b 2 235 2
1 a 1 331 3
1 a 2 331 3
1 a 3 331 3
Here is the query I Have that gets close but I seem to need something like a:
Partition By: ID
Order within partition by: ID, Interval
Change Rank when: ID, GRP_SET (change)
select
id, GRP_SET, SUB_SET, interval,
DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
id, interval
Using the MODEL clause
Behold, for you are pushing your requirements beyond the limits of what is easy to express in "ordinary" SQL. But luckily you are using Oracle, which features the MODEL clause, a device whose mystery is only exceeded by its power (excellent whitepaper here). You shall write:
SELECT
id, grp_set, sub_set, interval, drank
FROM (
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
MODEL PARTITION BY (id)
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
MEASURES (grp_set, sub_set, interval, drank)
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)
Proof on SQLFiddle
Explanation:
SELECT
id, grp_set, sub_set, interval, drank
FROM (
-- Here, we initialise your "dense rank" to 1
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)
-- We generate row numbers for all columns ordered by interval and sub_set,
-- such that we can then access row numbers in that particular order
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
-- These are the columns that we want to generate from the MODEL clause
MEASURES (grp_set, sub_set, interval, drank)
-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)
Of course, this only works if there are no interleaving events, i.e. there is no grp_set other than 'a' between intervals 123 and 124.
This might work for you. The complicating factor is that you want the same "dense rank" for intervals 123 and 124, and likewise for 234 and 235, so we truncate them to the nearest 10 for the purposes of ordering the DENSE_RANK() function:
SELECT id, grp_set, sub_set, interval, drank
, DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval, -1), grp_set ) AS drank_test
FROM q
Please see SQL Fiddle demo here.
If you want the intervals to be even closer together in order to be grouped together, then you can multiply the value before truncating. This would group them by 3s (but maybe you don't need them so granular):
SELECT id, grp_set, sub_set, interval, drank
, DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval*10/3, -1), grp_set ) AS drank_test
FROM q