Mean of variable at the selection (SAS) - sql

For example, I have a table A with 2 variables: the first is a customer id, and the second is the customer's income, which ranges from 100 to 200 US dollars. The task is to create a table B containing customers whose mean income is 150 USD, with as many customers as possible. In other words, table B should hold the maximal number of customers from table A such that the mean income among the customers of table B is exactly 150. Is there an elegant approach using SAS Enterprise Guide?

Sort the records by income, low to high. Then compute the mean of all records 1 - N. Find N where mean = 150.
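/* Generate 1000 test customers with incomes between 100 and 200 */
data test;
  do id = 1 to 1000;
    income = 100 + round(ranuni(1)*100,1);
    output;
  end;
run;
/* Sort low to high */
proc sort data=test;
  by income;
run;
/* Keep rows while the running mean stays at or below 150 */
data want(where=(ave<=150));
  set test;
  retain sum 0;
  sum = sum + income;
  ave = sum / _n_;
  drop sum;
run;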
You want as many low values as possible. This then lets you add large values to get the mean to 150. So sorting by income should give you what you want.

A greedy algorithm might do the job well-enough, depending on the structure of the data. This is definitely not guaranteed to be optimal, but it can be implemented relatively fast.
The idea is:
Calculate the average of all the records
If the average is $150 then stop
1. Remove the largest/smallest value to increase or decrease the average, as appropriate
2. If the average is $150 then stop
3. Repeat (1) until finished
This should work pretty well if the values cluster around $150. If they are widely dispersed, then you might not get any records in the final bins.
If the algorithm works on your data, then there may be faster ways of implementing it.
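A minimal sketch of the greedy idea above, written in Python rather than SAS so it can be run anywhere. The function name `greedy_trim`, the `tol` parameter and the sample incomes are assumptions for illustration and are not part of the original answer. It drops the largest value when the mean is too high and the smallest when it is too low, and, as noted above, it is not guaranteed to find the largest possible subset.

```python
def greedy_trim(incomes, target=150.0, tol=1e-9):
    """Greedily drop extreme values until the mean equals target.

    Returns the remaining values (sorted), or [] if every value was
    removed without ever hitting the target mean.
    """
    vals = sorted(incomes)
    lo, hi = 0, len(vals)          # current selection is vals[lo:hi]
    total = sum(vals)
    while hi > lo:
        mean = total / (hi - lo)
        if abs(mean - target) <= tol:
            return vals[lo:hi]
        if mean > target:
            hi -= 1                # mean too high: drop the largest value
            total -= vals[hi]
        else:
            total -= vals[lo]      # mean too low: drop the smallest value
            lo += 1
    return []


# Hypothetical sample data, not taken from the question.
kept = greedy_trim([100, 150, 200, 200])
print(kept)
```

Working on a sorted list with two indices keeps each removal O(1), so the whole pass costs one sort plus a linear scan.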

Related

Query smallest number of rows to match a given value threshold

I would like to create a query that operates similar to a cash register. Imagine a cash register full of coins of different sizes. I would like to retrieve a total value of coins in the fewest number of coins possible.
Given this table:
id  value
1   100
2   100
3   500
4   500
5   1000
How would I query for a list of rows that:
has a total value of AT LEAST a given threshold
with the minimum excess value (value above the threshold)
in the fewest possible rows
For example, if my threshold is 1050, this would be the expected result:
id  value
1   100
5   1000
I'm working with postgres and elixir/ecto. If it can be done in a single query great, if it requires a sequence of multiple queries no problem.
I had a go at this myself, using answers from previous questions:
Using ABS() to order by the closest value to the threshold
Select rows until a sum reduction of a single column reaches a threshold
Based on @TheImpaler's comment above, this prioritises minimum number of rows over minimum excess. It's not 100% what I was looking for, so open to improvements if anyone can, but if not I think this is going to be good enough:
-- outer query selects all rows underneath the threshold
-- inner subquery adds a running total column
-- window function orders by the difference between value and threshold
SELECT
  *
FROM (
  SELECT
    i.*,
    SUM(i.value) OVER (
      ORDER BY
        ABS(i.value - $THRESHOLD),
        i.id
    ) AS total
  FROM
    inputs i
) t
WHERE
  t.total - t.value < $THRESHOLD;
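For checking results on small inputs, the three conditions in the question can be applied exhaustively. The sketch below is a brute-force search in Python, not a Postgres or Ecto solution; the function name `best_rows` is hypothetical, and it assumes minimum excess takes priority over row count, following the order in which the question lists them. It tries every subset, so it is only practical for a handful of rows.

```python
from itertools import combinations


def best_rows(rows, threshold):
    """rows: list of (id, value) tuples.

    Returns the subset whose total is at least threshold with the least
    excess, breaking ties by fewest rows. Returns [] if unreachable.
    """
    best_key, best = None, []
    for size in range(1, len(rows) + 1):
        for combo in combinations(rows, size):
            total = sum(value for _, value in combo)
            if total < threshold:
                continue
            key = (total - threshold, size)  # least excess, then fewest rows
            if best_key is None or key < best_key:
                best_key, best = key, list(combo)
    return best


# The table from the question.
rows = [(1, 100), (2, 100), (3, 500), (4, 500), (5, 1000)]
result = best_rows(rows, 1050)
print(result)  # [(1, 100), (5, 1000)]
```

Swapping the two elements of `key` would prioritise fewest rows over least excess instead.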

Join multiple tables in Microsoft SQL Server where there is only one line match from table 1 and multiple lines from table 2 and 3

I am stuck on something I have never needed in my 10 years of SQL, and I thought it would be useful if there was some way of doing it. Firstly, I am running SQL Server Express (latest free version) on Windows. To talk to the database I am using SSMS.
There are three tables/queries.
1 table (A) has one data value I want to pull through.
2 tables (B)/(C) have multiple values.
Column common to all tables is CAMPAIGN NAME
Column common to (B)/(C) is PRODUCT NAME
This is an example of the data:
OUTPUT GOAL
I have tried the following:
UNION ALL (but this does not help when I want to calculate AMOUNT - MARKETING - TOTAL INVESTMENT)
I tried PARTITION (but I simply could not get it to work).
If I use joins, it brings through a head count / total investment and marketing cost per product, which when using SUM brings through the incorrect values for head count / total investment and marketing cost vs total amount, quantity.
I tried splitting the costs based on Quantity / Total Quantity or Amount / Total Amount, but the cost associated with the product is not correct or directly relating to the product this way.
Am I trying to do something impossible, or is there a way to do this in SQL?
The following comes pretty close to what you want:
select . . . -- select the columns you want here
from a
join b
  on b.campaign_name = a.campaign_name
join c
  on c.campaign_name = b.campaign_name
 and c.product_name = b.product_name;
This produces a result set with a separate row for each campaign/product.
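To see what that join returns, here is a small sketch using Python's built-in `sqlite3` module instead of SQL Server. The table contents and the columns `total_investment`, `amount` and `quantity` are made up for illustration, since the example data from the question is not available. It shows one row per campaign/product, with the campaign-level value from `a` repeated on each row, which is why a plain SUM over that column overstates it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, for illustration only
    CREATE TABLE a (campaign_name TEXT, total_investment REAL);
    CREATE TABLE b (campaign_name TEXT, product_name TEXT, amount REAL);
    CREATE TABLE c (campaign_name TEXT, product_name TEXT, quantity INTEGER);
    INSERT INTO a VALUES ('spring', 1000);
    INSERT INTO b VALUES ('spring', 'widget', 300), ('spring', 'gadget', 700);
    INSERT INTO c VALUES ('spring', 'widget', 3), ('spring', 'gadget', 7);
""")

joined = conn.execute("""
    SELECT a.campaign_name, b.product_name,
           a.total_investment, b.amount, c.quantity
    FROM a
    JOIN b ON b.campaign_name = a.campaign_name
    JOIN c ON c.campaign_name = b.campaign_name
          AND c.product_name = b.product_name
    ORDER BY b.product_name
""").fetchall()

for row in joined:
    print(row)

# Summing the repeated campaign-level column counts it once per product.
naive_investment = sum(row[2] for row in joined)
print(naive_investment)  # 2000.0, although table a holds 1000
conn.close()
```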

Confused on this assignment, any guidance?

The Problem:
First create a table called amttopay that has three fields: rec_no, idno and amt (make amount a numeric field that can hold 3 decimal places). You are also going to use a copy of the donor table for this assignment. Take in a number that matches an idno on the donor table. Check the yrgoal for that record. If it is larger than 500 then double it to create a new goal and write four records on the amttopay table containing the quarterly payment number (1 through 4), the idno, and the quarterly amount to pay to achieve the new goal. If it is not larger than 500 then add 50% to the goal to make the new goal and process it by writing the four records with the same information.
I've created the table, and I understand I've gotta write PL/SQL code to accomplish this, but what I'm not understanding is how the question is worded.
"If it is larger than 500 then double it to create a new goal and write four records on the amttopay table containing the quarterly payment number (1 through 4), the idno, and the quarterly amount to pay to achieve the new goal."
What does that mean? How would I go about bringing logic into this?
Thanks so much for the help.
Assuming you are trying to actually understand the question, this is how you would do it:
Break your statement into parts:
Check the yrgoal for that record.
If it is larger than 500 then
double it to create a new goal
and write four records on the amttopay table containing the quarterly payment number (1 through 4), the idno, and the quarterly amount to pay to achieve the new goal.
If it is not larger than 500 then
add 50% to the goal to make the new goal
and process it by writing the four records with the same information.
Simplified, this gives the following:
Create new record
if yrgoal > 500 then
    double yrgoal
    Create 4 records with idno and the quarterly amount
else
    yrgoal * 1.5
    Create 4 records as before
The rest is up to you, of course …
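To make the branching concrete, here is a rough sketch of the same logic in Python rather than PL/SQL. The function name `quarterly_records` is hypothetical, and it rests on two readings of the assignment that are assumptions: that rec_no holds the quarterly payment number, and that the quarterly amount is the new goal divided by four, rounded to 3 decimal places to match the amt field.

```python
def quarterly_records(idno, yrgoal):
    """Return the four (rec_no, idno, amt) rows for one donor."""
    if yrgoal > 500:
        new_goal = yrgoal * 2      # larger than 500: double the goal
    else:
        new_goal = yrgoal * 1.5    # otherwise: add 50% to the goal
    amt = round(new_goal / 4, 3)   # assumed: four equal quarterly payments
    return [(rec_no, idno, amt) for rec_no in range(1, 5)]


# Hypothetical donor 7 with a yearly goal of 600.
for record in quarterly_records(7, 600):
    print(record)
```

In PL/SQL the list comprehension would become a loop from 1 to 4 with an INSERT into amttopay on each pass.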

UPDATE with HAVING in duplicate values in Excel

I need help with this issue. I am working on a development task: I need to find the duplicate values in SQL, then sum the INVOICE_AMOUNT per invoice and divide each individual amount by that sum. Example:
FA-0001 $25.00 BILL-0001
FA-0001 $75.00 BILL-0002
I need the total of this invoice, SUM(AMOUNT_INVOICE) = $100.00, and then each individual amount divided by that total, e.g. 25.00/100.00 = 0.25, and so on. This percentage is then multiplied by DET_SOL_AMOUNT.
I need to apply this query to the duplicate values only.
I tried this query:
UPDATE [T4DET] SET [DET_SOL]=(([LOC_AMOUNT]/SUM([LOC_AMOUNT]))*[DET_SOL_CALC]) FROM [1WEB] WHERE [1WEB].[INVOICE] IN (SELECT [T4DET].[ASSIGNMENT] FROM [T4DET] GROUP BY [T4DET].[ASSIGNMENT] HAVING COUNT(*) > 1)
Thanks for your Help.
If I understood what you want to do correctly, it is easy with Excel. You need to write formulas in 2 columns only, for example:
Group    Amount   Bill No    DET_SOL_CALC  Sum of Group  Result
FA-0001  $25.00   BILL-0001  2             100           0.5
FA-0001  $75.00   BILL-0002  2             100           1.5
FA-0002  $200.00  BILL-0001  5             600           1.666666667
FA-0002  $100.00  BILL-0002  5             600           0.833333333
FA-0002  $300.00  BILL-0003  5             600           2.5
Put your data in columns A, B and C.
Column D: DET_SOL_CALC
Column E formula should be: =SUMIF($A$2:$A$6,A2,$B$2:$B$6)
Column F formula should be: =B2/E2*D2
Row 1 holds the headers of your data.
Put these formulas in row 2 and drag them down to the last row of your data; your numbers should be calculated correctly.
Please hit the check mark if this is your answer!
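The same calculation as the column E and column F formulas can be sketched in Python, which may help when checking a SQL version against known numbers. The function name `allocate` is hypothetical; the rows are the ones from the table in this answer.

```python
from collections import defaultdict

# (group, amount, bill_no, det_sol_calc), copied from the table above
rows = [
    ("FA-0001", 25.00, "BILL-0001", 2),
    ("FA-0001", 75.00, "BILL-0002", 2),
    ("FA-0002", 200.00, "BILL-0001", 5),
    ("FA-0002", 100.00, "BILL-0002", 5),
    ("FA-0002", 300.00, "BILL-0003", 5),
]


def allocate(rows):
    """Return amount / group total * det_sol_calc for each row."""
    totals = defaultdict(float)
    for group, amount, _bill, _calc in rows:
        totals[group] += amount              # column E: sum per group
    return [amount / totals[group] * calc    # column F: B / E * D
            for group, amount, _bill, calc in rows]


results = allocate(rows)
for row, value in zip(rows, results):
    print(row[0], row[2], round(value, 9))
```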
The alternative solution is to create a temporary table with SUM and GROUP BY, and add three columns for the calculations.
Example
4DETTEMP
ASSIGNMENT NVARCHAR
DOC_AMOUNT MONEY
INSERT INTO 4DETTEMP (ASSIGNMENT,[TOTAL]) SELECT ASSIGNMENT, SUM(DOC_AMOUNT) FROM FBL5N GROUP BY ASSIGNMENT
After that, the queries are:
Obtain the DET SOL amount from the other table.
UPDATE 4BET SET DET_SOL_CAL=T2.INCOMING_AMOUNT FROM FBL5N T2 WHERE ASSIGNMENT=T2.INV_CON
Obtain the DOC AMOUNT total from the temporary table.
UPDATE 4BET SET DOC_AMNT_TOTAL=T2.[TOTAL] FROM 4DETTEMP T2 WHERE ASSIGNMENT=T2.ASSIGNMENT
Obtain the calculation percentage.
UPDATE 4BET SET PERC_CAL_AMNT=(DOC_AMNT_TOTAL/DOC_AMNT), DET_SOL=(PERC_CAL_AMNT*DET_SOL_CALC)
Afterwards, delete the temporary tables and finish.
This is my solution. Is this approach viable?

How to insert uneven data rows into matrix in SAS?

I have an originations data set with loan ids. I then have a corresponding dataset with performance data for each of these loans ids, which can be anywhere from 10-40 rows in the performance data set.
The start date of each of the performance loans is not the same either, although some do overlap. What I want to do is take every loan id group in the performance data set, and then create a row of a certain column value across all occurrences in the data set. It doesn't matter if they start on different dates, I just want to align the values as this is the first value for loan id x and y.
For example:
ID Date Val
3 201601 100
3 201602 102
3 201603 103
--> Result:
ID Val1 Val2 Val3
3 100 102 103
I'm having two issues. One is the differing size of performance data for each id. I can't construct a matrix with differing lengths of rows. I'm assuming I'll need to append 0's to the end of each row to meet a predefined width.
My second issue is that I'm not sure how to read through the performance data set to group loans, extract the value column, construct the column into a row for that id, and then insert into a matrix. I know how I would do this in Python but I need to use SAS. I can construct tables in SAS, but I'm not sure how to append rows, only columns.
If someone could provide some guidance on this it'd be a great help.
For anyone who runs into a similar issue: it ended up being only a few lines of code.
proc transpose data = new_data
               out = new_data1;
  var trans_state;
  by id;
run;
The output will be
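For comparison, the reshaping described in the question (one row per loan id, values in date order, shorter rows padded to a common width) can be sketched in Python. The function name `to_wide`, the `fill` parameter and the second loan id (7) are assumptions added to show the uneven-length case; only the rows for id 3 come from the question. To my understanding PROC TRANSPOSE leaves the extra columns as missing values rather than zeros, so the sketch pads with `None` by default.

```python
from collections import defaultdict


def to_wide(rows, fill=None):
    """rows: iterable of (id, date, val). Returns {id: [val1, val2, ...]}."""
    groups = defaultdict(list)
    # Sort by id, then date, so each id's values come out in date order.
    for loan_id, _date, val in sorted(rows, key=lambda r: (r[0], r[1])):
        groups[loan_id].append(val)
    width = max((len(vals) for vals in groups.values()), default=0)
    # Pad shorter rows so every id has the same number of columns.
    return {loan_id: vals + [fill] * (width - len(vals))
            for loan_id, vals in groups.items()}


rows = [
    (3, 201601, 100),
    (3, 201602, 102),
    (3, 201603, 103),
    (7, 201605, 50),   # hypothetical loan with a single performance row
]
wide = to_wide(rows)
print(wide)  # {3: [100, 102, 103], 7: [50, None, None]}
```

Passing `fill=0` gives the zero-padded rows the question originally had in mind.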