SQL Server - selecting an item based on the previous counter value (same foreign key)

I wasn't sure how to word the title, so hopefully this explains it better. I currently have a table of data, shown below, which is fetched using this query (the query reads from a view):
CODE
SELECT
AppRunningPercentages.ProjectID,
AppRunningPercentages.AppID,
AppRunningPercentages.AppCounter,
AppRunningPercentages.PercentageComplete,
RunningPercentage= NULL
from AppRunningPercentages
where ProjectID = 123
DATA
ProjectID(FK) AppID AppCounter PercentageComplete RunningPercentage
------------- ----- ---------- ------------------ -----------------
123           1     1          50%
123           4     2          40%
123           7     3          10%
My SELECT statement returns the values above, but I am unsure how to populate RunningPercentage. For the scenario above, I would like it to be calculated as follows within the same SELECT statement, and I don't know how to achieve this running total.
RunningPercentage
0%
50%
90%
When AppCounter = 1, I want RunningPercentage to display as 0, so that I can correctly calculate a value relative to the current percentage. It is effectively adding the previous percentages together, so when AppCounter = 1 it is looking for an AppCounter with the value 0, which does not exist.
When AppCounter = 2, it will add the 0% and the 50% together (50%)
When AppCounter = 3, it will add the 0%, 50% and 40% together (90%)
......And so on
Thank you for any help with this.

In SQL Server 2012+, you would use a cumulative sum:
select t.*,
(sum(PercentageComplete) over (partition by projectid
order by appcounter
) - PercentageComplete
) as RunningPercentage
from t;
Note: you can use a rows between clause instead of subtracting the value in the current row. I find subtracting the value in the current row to be simpler for this logic.
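As a rough illustration of the rows between variant mentioned in the note, here is a small Python sketch that runs the windowed query against an in-memory SQLite database. It assumes SQLite 3.25 or later for window function support, and the table name t and the integer percentages are stand-ins for the view in the question. The frame ends one row before the current row, so no subtraction is needed, and coalesce turns the empty frame of the first row into 0.

```python
import sqlite3

# In-memory stand-in for the view in the question (table name "t" is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ProjectID INT, AppID INT, AppCounter INT, PercentageComplete INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(123, 1, 1, 50), (123, 4, 2, 40), (123, 7, 3, 10)],
)

# The frame covers every earlier row in the partition but not the current one.
query = """
SELECT AppCounter,
       COALESCE(SUM(PercentageComplete) OVER (
                    PARTITION BY ProjectID
                    ORDER BY AppCounter
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                ), 0) AS RunningPercentage
FROM t
ORDER BY AppCounter
"""
running = conn.execute(query).fetchall()
print(running)
```

The same frame clause is accepted by SQL Server 2012+, so the SQL text should carry over with only the table name changed.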
In earlier versions, you can use outer apply:
select t.*, coalesce(t2.RunningPercentage, 0) as RunningPercentage
from t outer apply
(select sum(PercentageComplete) as RunningPercentage
from t t2
where t2.projectid = t.projectid and t2.appcounter < t.appcounter
) t2;

Related

Query smallest number of rows to match a given value threshold

I would like to create a query that operates similar to a cash register. Imagine a cash register full of coins of different sizes. I would like to retrieve a total value of coins in the fewest number of coins possible.
Given this table:
id value
-- -----
1  100
2  100
3  500
4  500
5  1000
How would I query for a list of rows that:
has a total value of AT LEAST a given threshold
with the minimum excess value (value above the threshold)
in the fewest possible rows
For example, if my threshold is 1050, this would be the expected result:
id value
-- -----
1  100
5  1000
I'm working with postgres and elixir/ecto. If it can be done in a single query great, if it requires a sequence of multiple queries no problem.
I had a go at this myself, using answers from previous questions:
Using ABS() to order by the closest value to the threshold
Select rows until a sum reduction of a single column reaches a threshold
Based on @TheImpaler's comment above, this prioritises the minimum number of rows over the minimum excess. It's not 100% what I was looking for, so I'm open to improvements if anyone has any, but if not I think this is going to be good enough:
-- outer query selects all rows underneath the threshold
-- inner subquery adds a running total column
-- window function orders by the difference between value and threshold
SELECT
*
FROM (
SELECT
i.*,
SUM(i.value) OVER (
ORDER BY
ABS(i.value - $THRESHOLD),
i.id
) AS total
FROM
inputs i
) t
WHERE
t.total - t.value < $THRESHOLD;
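For comparison, the ordering the question asks for (minimum excess first, then fewest rows) can be sketched outside the database with a brute-force search. This is only an illustration under stated assumptions: the function name pick_rows is hypothetical, the rows are passed in as (id, value) pairs, and the search is exponential in the number of rows, so it only suits small tables.

```python
from itertools import combinations

def pick_rows(rows, threshold):
    """Return the subset of (id, value) rows whose total reaches the threshold
    with the smallest excess, breaking ties by the fewest rows."""
    best_key = None
    best_combo = ()
    for size in range(1, len(rows) + 1):
        for combo in combinations(rows, size):
            total = sum(value for _, value in combo)
            if total < threshold:
                continue  # does not reach the threshold
            key = (total - threshold, size)  # (excess, row count)
            if best_key is None or key < best_key:
                best_key, best_combo = key, combo
    return list(best_combo)

coins = [(1, 100), (2, 100), (3, 500), (4, 500), (5, 1000)]
picked = pick_rows(coins, 1050)
print(picked)
```

With the sample table and a threshold of 1050 this selects ids 1 and 5, matching the expected result in the question, whereas the window-function query above would return the 1000 and a 500 row.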

Filtering Rows in SQL

My data looks like this: Number(String), Number2(String), Transaction Type(String), Cost(Integer)
For number 1, Cost 10 and -10 cancel out so the remaining cost is 100
For number 2, Cost 50 and -50 cancel out, Cost 87 and -87 cancel out
For number 3, Cost remains 274
For number 4, Cost 316 and -316 cancel out, 313 remains as the cost
The output I am looking for Looks like this:
How do I do this in SQL?
I have tried "sum(price)" and group by "number", but Oracle doesn't let me get results because of the other columns.
https://datascience.stackexchange.com/questions/47572/filtering-unique-row-values-in-sql
When you're doing an aggregate query, you have to pick one value for each column - either by including it in the group by, or wrapping it in an aggregate function.
It's not clear what you want to display for columns 2 and 3 in your output, but from your example data it looks like you're taking the MAX, so that's what I did here.
select number, max(number2), max(transaction_type), sum(cost)
from my_data
group by number
having sum(cost) <> 0;
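To see the group by / having idea in action, here is a hedged Python sketch using an in-memory SQLite database. The cost values are taken from the description in the question; the column name num is a hypothetical replacement for number, and the number2 and transaction_type columns are left out because their values are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "num" stands in for the question's "number" column (an assumption for this sketch).
conn.execute("CREATE TABLE my_data (num TEXT, cost INT)")
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?)",
    [
        ("1", 10), ("1", -10), ("1", 100),
        ("2", 50), ("2", -50), ("2", 87), ("2", -87),
        ("3", 274),
        ("4", 316), ("4", -316), ("4", 313),
    ],
)

# Cancelling pairs sum to zero, and HAVING drops groups with nothing left over.
remaining = conn.execute(
    """
    SELECT num, SUM(cost) AS cost
    FROM my_data
    GROUP BY num
    HAVING SUM(cost) <> 0
    ORDER BY num
    """
).fetchall()
print(remaining)
```

Number 2 disappears from the output because all of its costs cancel out.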
Oracle has very nice functionality equivalent to first() . . . but the syntax is a little cumbersome:
select number,
max(number2) keep (dense_rank first order by cost desc) as number2,
max(transaction_type) keep (dense_rank first order by cost desc) as transaction_type,
max(cost) as cost
from t
group by number;
In my experience, keep has good performance characteristics.
You're almost there... you'll need to get the sum for each number without the other columns and then join back to your table.
select * from table t
join
(select number, sum(cost) as total_cost
from table
group by number) sums on sums.number=t.number
You can use a correlated subquery:
select t.*
from table t
where t.cost = (select sum(t1.cost) from table t1 where t1.number = t.number);

Distribute records based on various percentages with tsql

I have a table of items with around 800k rows. I need to create a SQL statement that allows my users to pass in various percentages that will total 100% and be limited to 5 percentages. These are then used to group the rows by a group number of each percentage.
For example, a user may request rows to be split using the following percentages (the user decides the percentages):
1. 20%, 20%, 30%, 30%
2. 12%, 12%, 12%, 12%, 52%
3. 30%, 30%, 40%
4. 100%
Based on above percentages, I need to return the following:
Field 1 | Field 2 | Group
--------------------------------
Data | Data | 1
Data | Data | 1
The group would represent a number corresponding to the percentages. So for example percentages #1 above, there would be 4 groups with the first group's records being the 1st 20% of all items selected, group 2 being the next 20%, the 3rd group being the next 30%, and the 4th group being the last 30%. Therefore, if there were a total of 200 records, group 1 should have 40 records, group 2 have 40, group 3 have 60, and group 4 have 60.
Sorry if I'm over explaining this but trying to reduce any ambiguity in my question so it's clear.
This data is stored in Azure SQL so any solution provided can use anything Azure SQL and/or SQL 2016 (in most cases) offers.
Thanks in advance to the SQL geniuses out there that are sure to make me feel appreciative and inferior all at the same time! :)
Passing in the percentages is the hard part. The work is done by percent_rank():
with p as (
select ind, p, (sum(p) over (order by ind) - p) as cume_p
from (values (1, 0.2), (2, 0.2), (3, 0.3), (4, 0.3)) v(ind, p)
)
select t.*, v.grp
from (select t.*, percent_rank() over (order by ?) as pr
from t
) t cross apply
(select max(ind)
from p
where p.cume_p <= t.pr
) v(grp);
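The logic of the query above can be checked with a small Python sketch. It is an illustration only: the function name assign_groups is hypothetical, the rows are assumed to have distinct sort keys (so percent_rank is simply (rank - 1) / (rows - 1)), and exact fractions are used to avoid floating-point noise at the group boundaries.

```python
from fractions import Fraction
from itertools import accumulate

def assign_groups(n_rows, percentages):
    """Assign each of n_rows ordered rows to a 1-based group number."""
    shares = [Fraction(p, 100) for p in percentages]
    # cume_p: running sum of the shares minus the current share (each group's lower bound).
    lower_bounds = [total - share for total, share in zip(accumulate(shares), shares)]
    groups = []
    for i in range(n_rows):
        # percent_rank() for distinct keys; defined as 0 when there is a single row.
        pr = Fraction(i, n_rows - 1) if n_rows > 1 else Fraction(0)
        # Same rule as the cross apply: the highest group whose lower bound is <= pr.
        groups.append(
            max(ind for ind, bound in enumerate(lower_bounds, start=1) if bound <= pr)
        )
    return groups

groups = assign_groups(200, [20, 20, 30, 30])
counts = [groups.count(g) for g in (1, 2, 3, 4)]
print(counts)
```

For 200 rows and the 20/20/30/30 split this gives 40, 40, 60 and 60 rows, which is the distribution described in the question.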

SELECT DISTINCT is not working

Let's say I have a table name TableA with the below partial data:
LOOKUP_VALUE LOOKUPS_CODE LOOKUPS_ID
------------ ------------ ----------
5% 120 1001
5% 121 1002
5% 123 1003
2% 130 2001
2% 131 2002
I wanted to select only 1 row of 5% and 1 row of 2% as a view using DISTINCT, but it fails. My query is:
SELECT DISTINCT lookup_value, lookups_code
FROM TableA;
The above query gives me the result shown below.
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
5% 121
5% 123
2% 130
2% 131
But that is not my expected result; my expected result is shown below:
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
2% 130
May I know how I can achieve this without specifying any WHERE clause?
Thank you!
I think you're misunderstanding the scope of DISTINCT: it will give you distinct rows, not just rows that are distinct on the first field.
If you want one row for each distinct LOOKUP_VALUE, you either need a WHERE clause that will work out which one of them to show, or an aggregation strategy with a GROUP BY clause plus logic in the SELECT that tells the query how to aggregate the other columns (e.g. AVG, MAX, MIN)
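A quick way to see the difference is to run both forms against the sample data. The Python sketch below uses an in-memory SQLite database as a stand-in for the real one (an assumption; the question appears to target Oracle), with MIN chosen as one possible aggregation strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (lookup_value TEXT, lookups_code INT, lookups_id INT)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [
        ("5%", 120, 1001), ("5%", 121, 1002), ("5%", 123, 1003),
        ("2%", 130, 2001), ("2%", 131, 2002),
    ],
)

# DISTINCT applies to the whole selected row, so every (value, code) pair survives.
distinct_rows = conn.execute(
    "SELECT DISTINCT lookup_value, lookups_code FROM TableA"
).fetchall()

# GROUP BY collapses to one row per lookup_value; MIN decides which code to show.
grouped_rows = conn.execute(
    """
    SELECT lookup_value, MIN(lookups_code) AS lookups_code
    FROM TableA
    GROUP BY lookup_value
    ORDER BY lookup_value DESC
    """
).fetchall()

print(len(distinct_rows), grouped_rows)
```

The DISTINCT query still returns all five rows, while the grouped query returns the two rows the question expects.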
Here's my guess at your problem - when you say
"The above query give me the result as shown in the data table above."
this is simply not true - please try it and update your question accordingly.
I am speculating here: I think you are trying to use "Distinct" but also output the other fields. If you run:
select distinct Field1, Field2, Field3 ...
Then your output will be "one row per distinct combination" of the 3 fields.
Try GROUP BY instead - this will let you select the Max, Min, Sum of other fields while still yielding "one row per unique combined values" for fields included in GROUP BY
The example below uses your table to return one row per LOOKUP_VALUE, along with the max and min of the remaining fields and the count of total records:
select
LOOKUP_VALUE, min( LOOKUPS_CODE) LOOKUPS_CODE_min, max( LOOKUPS_CODE) LOOKUPS_CODE_max, min( LOOKUPS_ID) LOOKUPS_ID_min, max( LOOKUPS_ID) LOOKUPS_ID_max, Count(*) Record_Count
From TableA
Group by LOOKUP_VALUE
I wanted to select only 1 row of 5% and 1 row of 2%
This will get the lowest value lookups_code for each lookup_value:
SELECT lookup_value,
lookups_code
FROM (
SELECT lookup_value,
lookups_code,
ROW_NUMBER() OVER ( PARTITION BY lookup_value ORDER BY lookups_code ) AS rn
FROM TableA
)
WHERE rn = 1
You could also use GROUP BY:
SELECT lookup_value,
MIN( lookups_code ) AS lookups_code
FROM TableA
GROUP BY lookup_value
How about the MIN() function
I believe this works for your desired output, but am currently not able to test it.
SELECT Lookup_Value, MIN(LOOKUPS_CODE)
FROM TableA
GROUP BY Lookup_Value;
I'm going to take a total shot in the dark on this one, but because of the way you have named your fields it implies you are attempting to mimic the vlookup function within Microsoft Excel. If this is the case, the behavior when there are multiple matches is to pick the first match. As arbitrary as that sounds, it's the way it works.
If this is what you want, AND the first value is not necessarily the lowest (or highest, or best looking, or whatever), then the row_number analytic function would probably suit your needs.
I give you a caveat that my ordering criteria is based on the database row number, which could conceivably be different than what you think. If, however, you insert them into a clean table (with a reset high water mark), then I think it's a pretty safe bet it will behave the way you want. If not, then you are better off including a field explicitly to tell it what order you want the choice to occur.
with cte as (
select
lookup_value,
lookups_code,
row_number() over (partition by lookup_value order by rownum) as rn
from
TableA
)
select
lookup_value, lookups_code
from cte
where rn = 1

Split a query result based on the result count

I have a query based on basic criteria that will return X number of records on any given day.
I'm trying to check the result of the basic query then apply a percentage split to it based on the total of X and split it in 2 buckets. Each bucket will be a percentage of the total query result returned in X.
For example:
Query A returns 3500 records.
If the number of records returned from Query A is <= 3000, then split the 3500 records into a 40% / 60% split (1,400 / 2,100).
If the number of records returned from Query A is >= 3001 and <= 50,000, then split the records into a 10% / 90% split. Etc.
I want the actual records returned, and not just the math acting on the records that returns one row with a number in it (in the column).
I'm not sure how you want to display the different parts of the result set, so I've just added an extra column (part) to it: the value 1 indicates that a row belongs to the first part, and 2 that it belongs to the second part.
select z.*
, case
when cnt_all <= 3000 and cnt <= 40
then 1
when (cnt_all between 3001 and 50000) and (cnt <= 10)
then 1
else 2
end part
from (select t.*
, 100*(count(col1) over(order by col1) / count(col1) over() )cnt
, count(col1) over() cnt_all
from split_rowset t
order by col1
) z
Demo #1 number of rows 3000.
Demo #2 number of rows 3500.
For better usability you can create a view using the query above and then query that view filtering by part column.
Demo #3 using of a view.
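The CASE logic in the query above can also be sketched in a few lines of Python, which makes the thresholds easy to test. This is a hedged illustration: the function name assign_parts is hypothetical, the rows are assumed to be already sorted with distinct sort keys, and, as in the SQL, result sets above 50,000 rows fall entirely into part 2 because no rule is given for them.

```python
def assign_parts(rows):
    """Tag each row with part 1 or 2 based on the size of the whole result set."""
    total = len(rows)
    if total <= 3000:
        first_share = 40   # 40% / 60% split
    elif total <= 50000:
        first_share = 10   # 10% / 90% split
    else:
        first_share = 0    # no rule stated, so everything lands in part 2
    tagged = []
    for position, row in enumerate(rows, start=1):
        # Integer form of: 100 * position / total <= first_share
        part = 1 if 100 * position <= first_share * total else 2
        tagged.append((row, part))
    return tagged

tagged_3500 = assign_parts(list(range(3500)))
tagged_3000 = assign_parts(list(range(3000)))
first_3500 = sum(1 for _, part in tagged_3500 if part == 1)
first_3000 = sum(1 for _, part in tagged_3000 if part == 1)
print(first_3500, first_3000)
```

With 3500 rows the first part holds 350 rows (10%), and with 3000 rows it holds 1200 rows (40%).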