Select the top 5 max records from the "High" column and the 5 min records from the "Low" column in the same query, from the same table, partitioned by stock name - sql

We have 6 months of historic data and need to find the top 2 max highs and top 2 min lows for each stock, across all stocks. Below is the sample data:
Stock  High  Low  Date        prevclose  ....
---------------------------------------------
ABB    100   75   29/12/2019  90
ABB    83    50   30/12/2019  87
ABB    73    45   30/12/2019  87
infy   1000  675  29/12/2019  900
infy   830   650  30/12/2019  810
infy   730   645  30/12/2019  788
I tried the following query, but I am not getting the expected results. I need the top 2 high rows and the top 3 min low rows in one result set:
select * into SRTrend from (
--- Resistance
select * from (Select top (5) with ties 'H' as 'Resistance', RowN=Row_Number() over(partition by name order by High desc),* from Historic
order by Row_Number() over(partition by name order by High desc))B
Union all
--Support
select * from (Select top (5) with ties 'L' as 'Support', RowN=Row_Number() over(partition by name order by Low asc),* from Historic
--where name='ABB'
order by Row_Number() over(partition by name order by Low asc))C
)D
PS: The hurdle I faced is that when I tried to export the data to another table, I got very messed-up results: instead of the top 2 max highs and top 3 min lows, I am getting single rows.

You can use rank() as follows:
select *
from (
select
t.*,
rank() over(partition by stock order by high desc) rn_high,
rank() over(partition by stock order by low asc) rn_low
from mytable t
) t
where rn_high <= 2 or rn_low <= 3
The inner query ranks records twice, by descending high and ascending low within groups of stocks. Then the outer query filters on top 2 and bottom 3 per stock (ties included).
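To see the filter in action, here is a minimal sketch that runs the same rank() query against an in-memory SQLite database from Python. The table name mytable and the column names follow the answer above; the first three ABB rows come from the question, while the other three rows are made up so that one row falls outside both the top 2 highs and the bottom 3 lows.

```python
import sqlite3


def top_highs_and_lows(rows, n_high=2, n_low=3):
    """Return rows ranked in the top n_high highs or the bottom n_low lows per stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (stock text, high integer, low integer)")
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    query = """
        select stock, high, low, rn_high, rn_low
        from (
            select t.*,
                   rank() over (partition by stock order by high desc) as rn_high,
                   rank() over (partition by stock order by low asc) as rn_low
            from mytable t
        ) ranked
        where rn_high <= ? or rn_low <= ?
        order by stock, rn_high
    """
    result = conn.execute(query, (n_high, n_low)).fetchall()
    conn.close()
    return result


# First three rows are from the question; the last three are hypothetical.
sample = [
    ("ABB", 100, 75),
    ("ABB", 83, 50),
    ("ABB", 73, 45),
    ("ABB", 95, 70),
    ("ABB", 90, 60),
    ("ABB", 70, 40),
]

picked = top_highs_and_lows(sample)
for row in picked:
    print(row)
```

With this data the row (90, 60) is dropped, because it is third by high and fourth by low. A row can qualify on both rankings at once, in which case it still appears only once in the result.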

Related

SQL Query Problem Involving (SUM, Group By, Order by, I guess? and maybe total, or even count)

Using a SQL query, find the Top 5 highest total Transaction Values. Which industries do they belong to, and how many of those stores are in each industry?
My SQL data looks like this:
Store Name  Industry  Transaction Value
---------------------------------------
Ace         A         196
Ace         A         193
Area        A         168
Apple       A         165
Boy         B         145
Boy         B         143
Bull        B         136
Bread       B         131
Cat         C         116
Cat         C         106
Cake        C         104
Candy       C         102
Dog         D         101
Dog         D         92
Door        D         80
Daddy       D         75
Egg         E         70
Egg         E         67
Earl        E         66
Eagle       E         61
This is just for your reference, Top 5 highest Transaction Value are:
No.  Store Name  Industry  Total Transaction Value
--------------------------------------------------
1    Ace         A         389
2    Boy         B         288
3    Cat         C         222
4    Dog         D         193
5    Area        A         168
SQL Query Results should look something like this:
Industry  No. of Stores
-----------------------
A         2
B         1
C         1
D         1
E         0
select a.industry, sum(case when b.name is null then 0 else 1 end) as no
from
(select distinct industry from transactions ) a
left join
(select name, industry
from transactions
group by name, industry
order by sum(transaction_value) desc limit 5) b
on a.industry = b.industry
group by a.industry
order by a.industry
I think I have a solution for you. Please check my code; I have used a Common Table Expression, CASE, SUM and GROUP BY:
WITH CTE AS
(
SELECT industry, SUM(TransactionValue) AS Transaction_Value,
COUNT(StoreName) AS StoreCount FROM MYTable
GROUP BY StoreName,industry
ORDER BY SUM(TransactionValue) DESC
Limit 5
)
SELECT T1.industry,
SUM((CASE WHEN c.industry IS NULL THEN 0
ELSE 1 END)) as CT
FROM
(SELECT DISTINCT Industry FROM MYTable) AS T1
LEFT JOIN CTE as c ON T1.industry=c.industry
GROUP BY T1.industry
Note: A subquery is not best practice, but in your case I think there will be no performance issue. Also, please check the code: I do not have a Snowflake database installed, so there may be some syntax errors.
To get a deterministic result, you must be aware of ties. Let's say the top 9 results are
Cat/A/600, Dog/A/500, Cat/B/500, Dog/B/400, Cat/C/300, Dog/C/300, Cat/D/300, Dog/D/200, Cat/E/100
Which is the top fifth? Cat/C/300 or Dog/C/300 or Cat/D/300? Or none of them? If we pick a row arbitrarily (by LIMIT 5 or FETCH FIRST 5 ROWS ONLY) we prefer one industry over another.
In standard SQL we have the clause FETCH FIRST 5 ROWS WITH TIES, but Snowflake doesn't feature this, unfortunately. It does, however, feature DENSE_RANK. It ranks my sample rows thus:
#1: Cat/A/600
#2: Dog/A/500
#2: Cat/B/500
#3: Dog/B/400
#4: Cat/C/300
#4: Dog/C/300
#4: Cat/D/300
#5: Dog/D/200
#6: Cat/E/100
because the five top values are 600, 500, 400, 300, and 200.
The query:
select industry, count(case when rnk <= 5 then 1 end) as stores
from
(
select industry, dense_rank() over (order by sum(transaction_value) desc) as rnk
from mytable
group by store_name, industry
) ranked
group by industry
order by industry;
If you only want to show top industries:
select industry, count(*) as stores
from
(
select industry, dense_rank() over (order by sum(transaction_value) desc) as rnk
from mytable
group by store_name, industry
) ranked
where rnk <= 5
group by industry
order by industry;
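As a quick check, the first DENSE_RANK query can be run against the sample data from the question. This is only a sketch using an in-memory SQLite database from Python rather than Snowflake; the names mytable, store_name, industry and transaction_value follow the answer above. As a conservative choice, the per-store totals are computed in their own subquery before ranking, which should be equivalent to ranking directly over sum(transaction_value).

```python
import sqlite3

# Sample rows from the question: (store_name, industry, transaction_value).
rows = [
    ("Ace", "A", 196), ("Ace", "A", 193), ("Area", "A", 168), ("Apple", "A", 165),
    ("Boy", "B", 145), ("Boy", "B", 143), ("Bull", "B", 136), ("Bread", "B", 131),
    ("Cat", "C", 116), ("Cat", "C", 106), ("Cake", "C", 104), ("Candy", "C", 102),
    ("Dog", "D", 101), ("Dog", "D", 92), ("Door", "D", 80), ("Daddy", "D", 75),
    ("Egg", "E", 70), ("Egg", "E", 67), ("Earl", "E", 66), ("Eagle", "E", 61),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (store_name text, industry text, transaction_value integer)"
)
conn.executemany("insert into mytable values (?, ?, ?)", rows)

query = """
    select industry, count(case when rnk <= 5 then 1 end) as stores
    from (
        -- rank the per-store totals; tied totals share a rank
        select industry, dense_rank() over (order by total desc) as rnk
        from (
            select store_name, industry, sum(transaction_value) as total
            from mytable
            group by store_name, industry
        ) totals
    ) ranked
    group by industry
    order by industry
"""

stores_per_industry = dict(conn.execute(query).fetchall())
print(stores_per_industry)
```

The sample data has no tied totals, so here DENSE_RANK picks the same five stores as LIMIT 5 would; the two only differ once ties appear.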

Can't get the cumulative sum(running total) within a group in SQL Server

I'm trying to get a running total within a group but my current code just gives me an aggregate sum.
For example, my data looks like this
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount
12542 1 Full A 1 12.5 40 500
12542 1 Full A 1 12.5 35 420
12542 2 Full A 1 10 40 400
12542 2 Full B 1.2 10 40 480
17842 1 Full A 1 11 27 297
17842 1 Full B 1.3 11 30 429
And what I want is a running total within the same ID, Shift Number, and Status. For example, I want something like this as my final result
ID ShiftNum Status Type Rate HourlyWage Hours Total_Amount Running_Tot
12542 1 Full A 1 12.5 40 500 500
12542 1 Full A 1 12.5 35 420 920
12542 2 Full A 1 10 40 400 400
12542 2 Full B 1.2 10 40 480 880
17842 1 Full A 1 11 27 297 297
17842 1 Full B 1.3 11 30 429 726
However, my current code just gives me the total sum within each group. For example, I get 920 and 920 for rows 1 and 2. Here's my code:
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY ID, ShiftNum, Status) as Running_Tot
from table a
How do I fix my code to get the final result I want?
You need an ordering column that uniquely defines each row. There is not an obvious one in your data, but you could use something like this:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY hours) as Running_Tot
Or:
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status
ORDER BY (SELECT NULL)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as Running_Tot
The problem you are facing is because the ORDER BY keys have ties. The default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Note the RANGE. That means that all rows with ties are combined.
Also note that there is no utility to including the PARTITION BY keys in the ORDER BY (well . . . there is one exception in SQL Server if you don't care about the ordering, then including a key can be a handy short-cut). The ordering occurs within a partition.
If your rows can have exact duplicates, I would first suggest that you add a primary key. But, in the meantime, you could use:
with a as (
select a.*,
row_number() over (order by id, shiftnum, status) as seqnum
from tablea a
)
Select a.*,
SUM(Hours) OVER (PARTITION BY ID, ShiftNum, Status ORDER BY seqnum) as Running_Tot
from a;
The ordering will be arbitrary, but it will at least accumulate.
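The effect of ties on the default RANGE frame can be reproduced with a small sketch in Python using an in-memory SQLite database (not SQL Server). The table name shifts and the seq column are assumptions added for illustration, and the sum is taken over Total_Amount because that is the column the desired Running_Tot in the question accumulates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table shifts (seq integer primary key, id integer, shiftnum integer, "
    "status text, total_amount integer)"
)
# Rows from the question; seq is a hypothetical unique ordering column.
conn.executemany(
    "insert into shifts values (?, ?, ?, ?, ?)",
    [
        (1, 12542, 1, "Full", 500),
        (2, 12542, 1, "Full", 420),
        (3, 12542, 2, "Full", 400),
        (4, 12542, 2, "Full", 480),
        (5, 17842, 1, "Full", 297),
        (6, 17842, 1, "Full", 429),
    ],
)


def running(order_by):
    """Sum total_amount per group using the default window frame and the given ORDER BY."""
    sql = f"""
        select sum(total_amount) over (
                   partition by id, shiftnum, status order by {order_by}
               )
        from shifts
        order by seq
    """
    return [row[0] for row in conn.execute(sql)]


# Ordering by a partition key: every row in the group is a peer, so each gets the group total.
tied_order = running("id")
# Ordering by a unique key: no ties, so the sum accumulates row by row.
unique_order = running("seq")

print(tied_order)
print(unique_order)
```

Ordering by id reproduces the 920, 920 symptom from the question, while ordering by the unique seq column gives 500, 920 for the first group.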

Getting latest price of different products from control table

I have a control table, where Prices with Item number are tracked date wise.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
My question is: is my query right or wrong, even though I am getting my desired result?
Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.
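Before benchmarking, it may be worth confirming that the three variants agree on the sample data. The sketch below does that with an in-memory SQLite database from Python; the dates are stored as ISO strings, assuming the question's 1/2/2003 means 2003-01-02, in line with the 2003-01-03 shown in its output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table prices (id integer, itemno text, price integer, date text)")
conn.executemany(
    "insert into prices values (?, ?, ?, ?)",
    [
        (1, "a001", 100, "2003-01-01"), (2, "a001", 105, "2003-01-02"),
        (3, "a001", 110, "2003-01-03"), (4, "b100", 50, "2003-01-01"),
        (5, "b100", 55, "2003-01-02"), (6, "b100", 60, "2003-01-03"),
        (7, "c501", 35, "2003-01-01"), (8, "c501", 38, "2003-01-02"),
        (9, "c501", 42, "2003-01-03"), (10, "a001", 95, "2004-01-01"),
    ],
)

QUERIES = {
    # Join against the per-item maximum date (the query from the question).
    "join": """
        select pr.id
        from prices pr
        inner join (
            select itemno, max(date) as max_date from prices group by itemno
        ) p on pr.itemno = p.itemno and pr.date = p.max_date
        order by pr.id
    """,
    # Correlated subquery filter.
    "correlated": """
        select p.id
        from prices p
        where p.date = (select max(p1.date) from prices p1 where p1.itemno = p.itemno)
        order by p.id
    """,
    # Window function: keep the rows ranked first by date within each item.
    "window": """
        select id
        from (
            select p.*, rank() over (partition by itemno order by date desc) as rn
            from prices p
        ) ranked
        where rn = 1
        order by id
    """,
}

latest_ids = {}
for name, sql in QUERIES.items():
    latest_ids[name] = [row[0] for row in conn.execute(sql)]

print(latest_ids)
```

All three return ids 6, 9 and 10 here. Note that each variant keeps every tied row if an item has two prices on its latest date.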

SQL select specific group from table

I have a table named trades like this:
id trade_date trade_price trade_status seller_name
1 2015-01-02 150 open Alex
2 2015-03-04 500 close John
3 2015-04-02 850 close Otabek
4 2015-05-02 150 close Alex
5 2015-06-02 100 open Otabek
6 2015-07-02 200 open John
I want to sum up trade_price grouped by seller_name when last (by trade_date) trade_status was 'open'. That is:
sum_trade_price seller_name
700 John
950 Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
I can get the desired output with the help of a nested select:
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE t1.seller_name NOT IN
(SELECT t2.seller_name FROM trades t2
WHERE t2.seller_name = t1.seller_name AND t2.trade_status = 'close'
ORDER BY t2.trade_date DESC LIMIT 1)
GROUP BY t1.seller_name
But it takes more than 1 minute to execute above query (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.
I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date desc) as last_trade_status
FROM trades t
) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;
This should perform reasonably with an index on seller_name.
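A minimal way to try the FIRST_VALUE approach on the sample rows is sketched below, using an in-memory SQLite database from Python instead of PostgreSQL. The table and column names come from the question; the final ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table trades (id integer, trade_date text, trade_price integer, "
    "trade_status text, seller_name text)"
)
conn.executemany(
    "insert into trades values (?, ?, ?, ?, ?)",
    [
        (1, "2015-01-02", 150, "open", "Alex"),
        (2, "2015-03-04", 500, "close", "John"),
        (3, "2015-04-02", 850, "close", "Otabek"),
        (4, "2015-05-02", 150, "close", "Alex"),
        (5, "2015-06-02", 100, "open", "Otabek"),
        (6, "2015-07-02", 200, "open", "John"),
    ],
)

query = """
    select t.seller_name, sum(t.trade_price) as sum_trade_price
    from (
        -- tag every row with the status of the seller's most recent trade
        select tr.*,
               first_value(trade_status) over (
                   partition by seller_name order by trade_date desc
               ) as last_trade_status
        from trades tr
    ) t
    where t.last_trade_status <> 'close'
    group by t.seller_name
    order by t.seller_name
"""

open_sellers = conn.execute(query).fetchall()
print(open_sellers)
```

Alex is skipped because his most recent trade (2015-05-02) is 'close', which matches the expected output in the question.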
select
sum(trade_price) as sum_trade_price,
seller_name
from
trades
inner join
(
select distinct on (seller_name) seller_name, trade_status
from trades
order by seller_name, trade_date desc
) s using (seller_name)
where s.trade_status = 'open'
group by seller_name

Grouping of Similar data by amount in Oracle

I have a txn table with columns ac_id and txn_amt. It stores transaction amounts along with account IDs. Below is an example of the data:
AC_ID TXN_AMT
10 1000
10 1000
10 1010
10 1030
10 5000
10 5010
10 10000
20 32000
20 32200
20 5000
I want to write a query in such a way that all the amounts which are within 10% range of the previous amounts should be grouped together. Output should be something like this:
AC_ID TOTAL_AMT TOTAL_CNT GROUP
10 4040 4 1
10 10010 2 2
20 64200 2 3
20 5000 1 4
I tried with LAG function but still clueless. This is the code snippet I tried:
select ac_id, txn_amt, round(((txn_amt - lag(txn_amt, 1) over (partition by ac_id order by ac_id, txn_amt)) / txn_amt) * 100, 2) as amt_diff_pct from txn;
Any clue or help will be highly appreciated.
If by previous you mean "the largest amount less than", then you can do this. You can find where the gaps are (i.e. larger than a 10% difference). Then you can assign a group by counting the number of gaps:
select ac_id, sum(txn_amt) as total_amt, count(*) as total_cnt, grp
from (select t.*,
sum(case when prev_txn_amt * 1.1 > txn_amt then 0 else 1 end) over
(partition by ac_id order by txn_amt) as grp
from (select t.*,
lag(txn_amt) over (partition by ac_id order by txn_amt) as prev_txn_amt
from txn t
) t
) t
group by ac_id, grp;
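The gap-counting query can be checked against the sample data with the sketch below, which uses an in-memory SQLite database from Python rather than Oracle. The subquery aliases (with_prev, with_grp) and the final ORDER BY are my own additions for readability; the logic is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table txn (ac_id integer, txn_amt integer)")
conn.executemany(
    "insert into txn values (?, ?)",
    [
        (10, 1000), (10, 1000), (10, 1010), (10, 1030), (10, 5000),
        (10, 5010), (10, 10000), (20, 32000), (20, 32200), (20, 5000),
    ],
)

query = """
    select ac_id, grp, sum(txn_amt) as total_amt, count(*) as total_cnt
    from (
        -- running count of gaps: a new group starts whenever the amount
        -- is not within 10% of the previous amount (or there is no previous amount)
        select with_prev.*,
               sum(case when prev_txn_amt * 1.1 > txn_amt then 0 else 1 end) over (
                   partition by ac_id order by txn_amt
               ) as grp
        from (
            select t.*,
                   lag(txn_amt) over (partition by ac_id order by txn_amt) as prev_txn_amt
            from txn t
        ) with_prev
    ) with_grp
    group by ac_id, grp
    order by ac_id, grp
"""

groups = conn.execute(query).fetchall()
for row in groups:
    print(row)
```

Two things differ from the expected output in the question: the group number restarts at 1 for each ac_id, because the window is partitioned by account, and the 10000 transaction for account 10 ends up in a group of its own.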