PostgreSQL incremental and conditional filtering

I have this table
ID  | value  | user    | stock
----|--------|---------|---------
1   | 10     | mark    | AAPL
2   | 20     | rob     | GOOG
3   | 30     | mark    | AAPL
4   | -40    | mark    | AAPL
5   | -10    | rob     | GOOG
6   | 25     | mark    | GOOG
7   | 5      | mark    | GOOG
8   | 45     | mark    | AAPL
I would like to build a query (possibly without using any PGSQL function) that returns the rows shown below. It should start in order (ID ASC) summing "value" column grouped by user,stock. If the temporary sum is 0, all the previous rows (for that group) will be discarded.
id  | value  | user    | stock
----|--------|---------|---------
2   | 20     | rob     | GOOG
5   | -10    | rob     | GOOG
6   | 25     | mark    | GOOG
7   | 5      | mark    | GOOG
8   | 45     | mark    | AAPL
I think that OVER (PARTITION BY) and a WINDOW clause should be used:
SELECT *, SUM(value) OVER w AS scm
FROM "mytable"
WINDOW w AS (PARTITION BY "user", stock ORDER BY id ASC)
This returns the following table:
ID  | value  | user    | stock   | scm
----|--------|---------|---------|-------
1   | 10     | mark    | AAPL    | 10
2   | 20     | rob     | GOOG    | 20
3   | 30     | mark    | AAPL    | 40
4   | -40    | mark    | AAPL    | 0
5   | -10    | rob     | GOOG    | 10
6   | 25     | mark    | GOOG    | 25
7   | 5      | mark    | GOOG    | 30
8   | 45     | mark    | AAPL    | 45
So this should be a good starting point, because it shows that the running sum of AAPL for mark is 0 at id=4, and for that group (AAPL, mark) I should keep only the rows that follow.
The rule is: for each group (stock,user) keep all the rows following the last row with scm=0

SQL Fiddle
with s as (
select *,
count(scm = 0 or null) over w z
from (
select *,
sum(value) over w as scm
from mytable
window w as (partition by "user", stock order by id asc)
) s
window w as (partition by "user", stock order by id asc)
)
select *
from
s
inner join
(
select max(z) z, "user", stock
from s
group by "user", stock
) z using (z, "user", stock)
where scm <> 0
order by s."user", s.stock, id

Something like the following should get you what you want. It performs these steps:
Use the SQL statement you already have to compute the cumulative sums.
For each (username, stock) group, find the highest ID at which the cumulative sum is zero or less (the query calls it minimum_id).
Select from the cumulative sums and keep only the rows whose ID is greater than that ID; groups with no such ID keep all of their rows.
WITH sums AS (
SELECT id, value, username, stock, SUM(value) OVER w AS scm
FROM "mytable"
WINDOW w AS (PARTITION BY username, stock ORDER BY id ASC)),
minimum_ids AS (
SELECT username, stock, MAX(id) as minimum_id
FROM sums
WHERE scm <= 0
GROUP BY username, stock)
SELECT sums.id, sums.value, sums.username, sums.stock, sums.scm
FROM sums
LEFT JOIN minimum_ids
ON (sums.username = minimum_ids.username
AND sums.stock = minimum_ids.stock)
WHERE (minimum_ids.minimum_id IS NULL OR sums.id > minimum_ids.minimum_id)
ORDER BY id;
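The rule from the question (per group, keep only the rows after the last zero running sum) can also be cross-checked outside the database. The following is a minimal Python sketch under that reading of the rule; the function name `rows_after_last_zero` and the in-memory row layout are assumptions made for illustration, not part of either answer.

```python
from collections import defaultdict

# Sample rows from the question: (id, value, user, stock).
ROWS = [
    (1, 10, "mark", "AAPL"),
    (2, 20, "rob", "GOOG"),
    (3, 30, "mark", "AAPL"),
    (4, -40, "mark", "AAPL"),
    (5, -10, "rob", "GOOG"),
    (6, 25, "mark", "GOOG"),
    (7, 5, "mark", "GOOG"),
    (8, 45, "mark", "AAPL"),
]


def rows_after_last_zero(rows):
    """Keep, per (user, stock), only the rows after the last zero running sum."""
    running = defaultdict(int)  # running sum per group
    kept = defaultdict(list)    # surviving rows per group
    for row in sorted(rows):    # tuples sort by id first
        _id, value, user, stock = row
        key = (user, stock)
        running[key] += value
        if running[key] == 0:
            kept[key].clear()   # a zero sum discards the group so far
        else:
            kept[key].append(row)
    return sorted(r for group in kept.values() for r in group)


result = rows_after_last_zero(ROWS)
print([r[0] for r in result])
```

On the sample data this keeps ids 2, 5, 6, 7 and 8, which matches the expected output in the question.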

Related

oracle query alternating records

I have a table that looks like this:
SEQ TICKER INDUSTRY
1 AAPL 10
1 FB 10
1 IBM 10
1 CSCO 10
1 FEYE 20
1 F 20
2 JNJ 10
2 CMPQ 10
2 CYBR 10
2 PFPT 10
2 K 20
2 PANW 20
What I need is for records with the same industry code to alternate between the 1 & 2 records, like this:
1 AAPL 10
2 IBM 10
1 FB 10
2 CSCO 10
1 FEYE 20
2 PANW 20
So basically, grouped by the same industry code, alternate between the 1 & 2 records.
Can't figure out how.
Use an analytic function to create a row number that starts over for each group (industry and sequence), then sort by that row number.
select seq, ticker, industry
,row_number() over (partition by industry, seq order by ticker) as custom_order
from stocks
order by industry, custom_order, seq;
See this SQL Fiddle for a full example. (It doesn't perfectly match your example results but either your example results are incorrect or there's something else to this question I don't understand.)
Don't see how you arrived at the example result in your question, but this result:
| SEQ | TICKER | INDUSTRY |
|-----|--------|----------|
| 1 | AAPL | 10 |
| 2 | CMPQ | 10 |
| 1 | CSCO | 10 |
| 2 | CYBR | 10 |
| 1 | FB | 10 |
| 2 | IBM | 10 |
| 1 | JNJ | 10 |
| 2 | PFPT | 10 |
| 1 | F | 20 |
| 2 | FEYE | 20 |
| 1 | K | 20 |
| 2 | PANW | 20 |
Was produced using this query, where (I assume) you want the SEQ column calculated for you:
select
1 + mod(rn,2) Seq
, ticker
, industry
from (
select
ticker
, industry
, 1+ row_number() over (partition by industry
order by ticker) rn
from stocks
)
order by industry, rn
Please note this is a derivative of the earlier answer by Jon Heller; it can be found online at http://sqlfiddle.com/#!4/088271/1
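To see what the row-number approach from the first answer produces, here is a small Python sketch that numbers rows within each (industry, seq) pair by ticker and then sorts by industry, row number and seq. The function name `alternate_by_industry` is hypothetical and the data is the sample table from the question.

```python
from collections import defaultdict

# Sample rows from the question: (seq, ticker, industry).
STOCKS = [
    (1, "AAPL", 10), (1, "FB", 10), (1, "IBM", 10), (1, "CSCO", 10),
    (1, "FEYE", 20), (1, "F", 20),
    (2, "JNJ", 10), (2, "CMPQ", 10), (2, "CYBR", 10), (2, "PFPT", 10),
    (2, "K", 20), (2, "PANW", 20),
]


def alternate_by_industry(rows):
    """Mimic row_number() over (partition by industry, seq order by ticker)."""
    counters = defaultdict(int)
    numbered = []
    # Visit rows in ticker order so the numbering follows "order by ticker".
    for seq, ticker, industry in sorted(rows, key=lambda r: r[1]):
        counters[(industry, seq)] += 1
        numbered.append((industry, counters[(industry, seq)], seq, ticker))
    numbered.sort()  # order by industry, custom_order, seq
    return [(seq, ticker, industry) for industry, _rn, seq, ticker in numbered]


ordered = alternate_by_industry(STOCKS)
for row in ordered:
    print(row)
```

Within each industry the seq values alternate 1, 2, 1, 2, although (as the answer notes) the tickers do not come out in the order shown in the question's example.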

Select Rows whose Sum Value = 80% of the Total

Here is an example of the business problem.
I have 10 sales that resulted in negative margin.
We want to review these records, and we generally use the 20/80 rule in reviews.
That is, 20 percent of the sales will likely represent 80 percent of the negative margin.
So with the records below...
+----+-------+
| ID | Value |
+----+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 20 |
| 4 | 10 |
| 5 | 5 |
| 6 | 5 |
| 7 | 2 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
+----+-------+
I would want to return...
+----+-------+
| ID | Value |
+----+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 20 |
| 4 | 10 |
+----+-------+
The Total of Value is 106, 80% is then 84.8.
I need all the records, sorted in descending order, whose summed value gets me to at least 84.8.
We use Microsoft APS PDW SQL, but can process on SMP if needed.
Assuming window functions are supported, you can use
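with cte as (select id,value
,sum(value) over(order by value desc,id) as running_sum
,sum(value) over() as total
from tbl
)
select id,value from cte where running_sum < total*0.8
union all
-- wrap TOP 1 in a derived table so that its ORDER BY applies to this branch only,
-- and order by running_sum so the first row that reaches 80% is the one picked
select id,value from (
select top 1 id,value from cte where running_sum >= total*0.8 order by running_sum
) first_over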
One way is to use running totals:
select
id,
value
from
(
select
id,
value,
sum(value) over () as total,
sum(value) over (order by value desc) as till_here,
sum(value) over (order by value desc rows between unbounded preceding and 1 preceding)
as till_prev
from mytable
) summed_up
where till_here * 1.0 / total <= 0.8
or (till_here * 1.0 / total >= 0.8 and coalesce(till_prev, 0) * 1.0 / total < 0.8)
order by value desc;
This link could be useful, it calculates running totals:
https://www.codeproject.com/Articles/300785/Calculating-simple-running-totals-in-SQL-Server
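As a cross-check of the running-total idea, the sketch below walks the rows in descending order of value and stops once the accumulated sum reaches the requested share of the total. It assumes all values are non-negative, and the name `top_share` is made up for this example.

```python
# Sample rows from the question: (id, value).
SALES = [(1, 30), (2, 30), (3, 20), (4, 10), (5, 5),
         (6, 5), (7, 2), (8, 2), (9, 1), (10, 1)]


def top_share(rows, share=0.8):
    """Return the largest rows whose running sum first reaches share * total."""
    total = sum(value for _id, value in rows)
    picked, running = [], 0
    # Largest values first; ties are broken by id to keep the order stable.
    for row in sorted(rows, key=lambda r: (-r[1], r[0])):
        if running >= share * total:
            break               # the previous row already crossed the threshold
        picked.append(row)
        running += row[1]
    return picked


picked = top_share(SALES)
print(picked)
```

For the sample data the total is 106 and the threshold is 84.8, so the loop stops after ids 1 to 4 (running sum 90).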

Sum length of overlapping intervals

I've got a table in a Redshift database that contains intervals which are grouped and that potentially overlap, like so:
| interval_id | l | u | group |
| ----------- | -- | -- | ----- |
| 1 | 1 | 10 | A |
| 2 | 2 | 5 | A |
| 3 | 5 | 15 | A |
| 4 | 26 | 30 | B |
| 5 | 28 | 35 | B |
| 6 | 30 | 31 | B |
| 7 | 44 | 45 | B |
| 8 | 56 | 58 | C |
What I would like to do is to determine the length of the union of the intervals within group. That is, for each interval take u - l, sum over all group members and then subtract off the length of the overlaps between the intervals.
Desired result:
| group | length |
| ----- | ------ |
| A | 14 |
| B | 10 |
| C | 2 |
This question has been asked before, alas it seems that all of the solutions in that thread use features that Redshift doesn't support.
This is not difficult but requires multiple steps. The key is to define the "islands" within each group and then aggregate over those. Lots of subqueries, aggregations, and window functions.
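select groupId, sum(ul) as total_length
from (select groupId, (max(u) - min(l)) as ul  -- no "+ 1": the expected lengths are u - l
from (select t.*,
-- a row starts a new island when it begins after every earlier row of its group has ended
sum(case when prev_max_u < l then 1 else 0 end)
over (partition by groupId order by l rows unbounded preceding) as grp
from (select t.*,
max(u) over (partition by groupId order by l
rows between unbounded preceding and 1 preceding) as prev_max_u
from t
) t
) t
group by groupId, grp
) g
group by groupId;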
The idea is to determine, at the beginning of each record, whether it overlaps anything that came before it. For this purpose, the query uses a cumulative max over all preceding records and compares that previous max with the current l. A record whose l lies beyond the previous max starts a new island, and a cumulative sum of these island starts assigns each record to its island (grp).
The rest is just aggregation. And more aggregation.
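The island idea can be sketched in a few lines of Python, which may help when checking the SQL against the desired result. This is only an illustration under the assumption that a length is u - l and that touching intervals merge; the name `union_lengths` is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (interval_id, l, u, group).
INTERVALS = [
    (1, 1, 10, "A"), (2, 2, 5, "A"), (3, 5, 15, "A"),
    (4, 26, 30, "B"), (5, 28, 35, "B"), (6, 30, 31, "B"), (7, 44, 45, "B"),
    (8, 56, 58, "C"),
]


def union_lengths(intervals):
    """Length of the union of the intervals within each group."""
    by_group = defaultdict(list)
    for _id, l, u, group in intervals:
        by_group[group].append((l, u))
    lengths = {}
    for group, spans in by_group.items():
        total = 0
        cur_l = cur_u = None
        for l, u in sorted(spans):           # scan in order of lower bound
            if cur_u is None or l > cur_u:   # gap: a new island starts here
                if cur_u is not None:
                    total += cur_u - cur_l   # close the previous island
                cur_l, cur_u = l, u
            else:
                cur_u = max(cur_u, u)        # overlap: extend the island
        total += cur_u - cur_l               # close the last island
        lengths[group] = total
    return lengths


lengths = union_lengths(INTERVALS)
print(lengths)
```

On the sample data this gives 14 for A, 10 for B and 2 for C, matching the desired result in the question.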

Top Dense_Rank row based on other fields

I have several tables joined together in SQL. I am trying to display only the row with the MAX value of a column computed using DENSE_RANK, but I need to take 2 other fields into account when pulling the TOP row.
Here is a sample of my result:
| sa_id | price | threshold | role_id | rk
1 | 37E41 | 40.00 | NULL | A38D67A | 1
2 | 37E41 | 40.00 | NULL | 46B9D4E | 1
3 | 1CFC1 | 40.00 | NULL | 58C1E03 | 1
4 | BF0D3 | 40.00 | NULL | 28D465B | 1
5 | F914B | 40.00 | NULL | 2920EBD | 1
6 | F3CA1 | 40.00 | NULL | D5E7584 | 1
7 | 0D8C1 | 40.00 | NULL | EECDB5A | 1
8 | A6503 | 40.00 | NULL | B680CB4 | 1
9 | 9BB96 | 40.00 | 0.01 | D66E612 | 1
10 | 9BB96 | 40.00 | 20.03 | D66E612 | 2
11 | 9BB96 | 40.00 | 40.03 | D66E612 | 3
12 | 9BB96 | 40.00 | 60.03 | D66E612 | 4
13 | 9BB96 | 40.00 | 80.03 | D66E612 | 5
What I am hoping to accomplish is to display all columns in this screenshot using the highest value for rk (calculated using DENSE RANK) where price > threshold and the sa_id & role_id are unique.
In this case I would want to display the following rows only: 1, 2, 3, 4, 5, 6, 7, 8, 10
Is this possible?
SELECT
servicerate_audit_id as sa_id
,ticket_price as price
,threshold_threshold/100.00 as threshold
,charge_role.chargerole_id as role_id
,DENSE_RANK() OVER(
PARTITION BY threshold_audit_id
ORDER BY
ISNULL(threshold_threshold,9999999),
threshold_threshold
) as rk
FROM sts_service_charge_rate
INNER JOIN ts_threshold
ON threshold_id = servicerate_threshold_id
INNER JOIN ts_charge_role as charge_role
ON chargerole_id = servicerate_charge_role_id
If you can modify your original query:
SELECT *
FROM (
SELECT
servicerate_audit_id as sa_id
,ticket_price as price
,threshold_threshold/100.00 as threshold
,charge_role.chargerole_id as role_id
,DENSE_RANK() OVER(
PARTITION BY threshold_audit_id
ORDER BY
ISNULL(threshold_threshold,9999999),
threshold_threshold
) as rk
,DENSE_RANK() OVER(
PARTITION BY threshold_audit_id
ORDER BY
ISNULL(threshold_threshold,9999999) DESC,
threshold_threshold DESC
) as rk_inverse
FROM sts_service_charge_rate
INNER JOIN ts_threshold
ON threshold_id = servicerate_threshold_id
INNER JOIN ts_charge_role as charge_role
ON chargerole_id = servicerate_charge_role_id
) t
WHERE price > COALESCE(threshold, 0)
AND t.rk_inverse = 1
Observe I just added an inverse calculation of your ranking and filtered for the top rk_inverse per partition. I'm assuming that the PARTITION BY threshold_audit_id and your requirement of having unique (sa_id, role_id) tuples are functionally dependent. Otherwise, your rk_inverse calculation would need to take into consideration a different PARTITION BY clause.
If you cannot modify your original query:
You can calculate another window function that orders your rk values in descending order (highest first) within each (sa_id, role_id) partition, and then take only the top one per partition:
SELECT sa_id, price, threshold, role_id, rk
FROM (
SELECT result.*, row_number() OVER (PARTITION BY sa_id, role_id ORDER BY rk DESC) rn
FROM (... original query ...) result
WHERE price > COALESCE(threshold, 0)
) t
WHERE rn = 1
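The second approach (filter on price > threshold, then keep the highest rk per (sa_id, role_id)) can be checked against the sample result with a short Python sketch. The name `top_rank_rows` is an assumption for illustration, and NULL thresholds are treated as 0, as in the COALESCE above.

```python
# Sample result from the question:
# (row_no, sa_id, price, threshold, role_id, rk); None stands for NULL.
RESULT = [
    (1, "37E41", 40.00, None, "A38D67A", 1),
    (2, "37E41", 40.00, None, "46B9D4E", 1),
    (3, "1CFC1", 40.00, None, "58C1E03", 1),
    (4, "BF0D3", 40.00, None, "28D465B", 1),
    (5, "F914B", 40.00, None, "2920EBD", 1),
    (6, "F3CA1", 40.00, None, "D5E7584", 1),
    (7, "0D8C1", 40.00, None, "EECDB5A", 1),
    (8, "A6503", 40.00, None, "B680CB4", 1),
    (9, "9BB96", 40.00, 0.01, "D66E612", 1),
    (10, "9BB96", 40.00, 20.03, "D66E612", 2),
    (11, "9BB96", 40.00, 40.03, "D66E612", 3),
    (12, "9BB96", 40.00, 60.03, "D66E612", 4),
    (13, "9BB96", 40.00, 80.03, "D66E612", 5),
]


def top_rank_rows(rows):
    """Highest-rk row per (sa_id, role_id) among rows with price > threshold."""
    best = {}
    for row in rows:
        _no, sa_id, price, threshold, role_id, rk = row
        if price <= (threshold if threshold is not None else 0):
            continue                      # WHERE price > COALESCE(threshold, 0)
        key = (sa_id, role_id)
        if key not in best or rk > best[key][5]:
            best[key] = row               # same effect as rn = 1 with ORDER BY rk DESC
    return sorted(best.values(), key=lambda r: r[0])


top_rows = top_rank_rows(RESULT)
print([r[0] for r in top_rows])
```

This keeps rows 1 to 8 and row 10, the set the question asks for.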

SQL query to select today and previous day's price

I have historic stock price data that looks like the below. I want to generate a new table that has one row for each ticker with the most recent day's price and its previous day's price. What would be the best way to do this? My database is Postgres.
+--------+-------+------------+
| ticker | price | date       |
+--------+-------+------------+
| AAPL   | 6     | 10-23-2015 |
| AAPL   | 5     | 10-22-2015 |
| AAPL   | 4     | 10-21-2015 |
| AXP    | 5     | 10-23-2015 |
| AXP    | 3     | 10-22-2015 |
| AXP    | 5     | 10-21-2015 |
+--------+-------+------------+
You can do something like this:
with ranking as (
select ticker, price, dt,
rank() over (partition by ticker order by dt desc) as rank
from stocks
)
select * from ranking where rank in (1,2);
Example: http://sqlfiddle.com/#!15/e45ea/3
Results for your example will look like this:
| ticker | price | dt | rank |
|--------|-------|---------------------------|------|
| AAPL | 6 | October, 23 2015 00:00:00 | 1 |
| AAPL | 5 | October, 22 2015 00:00:00 | 2 |
| AXP | 5 | October, 23 2015 00:00:00 | 1 |
| AXP | 3 | October, 22 2015 00:00:00 | 2 |
If your table is large and you run into performance issues, use a WHERE clause to restrict the data to the last 30 days or so.
Best bet is to use a window function with an aggregated case statement which is used to create a pivot on the data.
You can see more on window functions here: http://www.postgresql.org/docs/current/static/tutorial-window.html
Below is a pseudocode version of where you may need to head to answer your question (sorry, I couldn't validate it because I don't have a Postgres database set up).
Select
ticker,
SUM(CASE WHEN rank = 1 THEN price ELSE 0 END) today,
SUM(CASE WHEN rank = 2 THEN price ELSE 0 END) yesterday
FROM (
SELECT
ticker,
price,
date,
rank() OVER (PARTITION BY ticker ORDER BY date DESC) as rank
FROM your_table) p
WHERE rank in (1,2)
GROUP BY ticker;
Edit - Updated the case statement with an 'else'
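The rank-and-pivot idea can be illustrated with a small Python sketch that yields one entry per ticker holding the latest price and the price from the previous recorded day. It assumes the dates in the sample are MM-DD-YYYY and that "previous day" means the previous row for that ticker, not necessarily the calendar day before; `latest_two` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (ticker, price, date).
PRICES = [
    ("AAPL", 6, date(2015, 10, 23)),
    ("AAPL", 5, date(2015, 10, 22)),
    ("AAPL", 4, date(2015, 10, 21)),
    ("AXP", 5, date(2015, 10, 23)),
    ("AXP", 3, date(2015, 10, 22)),
    ("AXP", 5, date(2015, 10, 21)),
]


def latest_two(prices):
    """Map ticker -> (latest price, previous price or None)."""
    by_ticker = defaultdict(list)
    for ticker, price, day in prices:
        by_ticker[ticker].append((day, price))
    pivot = {}
    for ticker, history in by_ticker.items():
        history.sort(reverse=True)  # newest first, like ORDER BY date DESC
        today = history[0][1]       # rank 1
        previous = history[1][1] if len(history) > 1 else None  # rank 2
        pivot[ticker] = (today, previous)
    return pivot


pivot = latest_two(PRICES)
print(pivot)
```

A ticker with a single row gets None for the previous price here, whereas the SUM(CASE ... ELSE 0 END) version above would report 0.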