Count number of cases and add row to sum average cost - sql

I have a large data set that I'm trying to export in a way I've never done before. There are dozens of columns with flags (0 or 1) to indicate whether a person has that trait. At the end each record has a total cost which sums up all money associated with that person. Sample below
ID  Visit  Stay  Treatment  Total Cost
1   0      1     1          $50
2   1      0     1          $100
I'm trying to get it into a format like so:
Visit  Stay  Treatment
1      1     2
$100   $50   $75
So the number of flags is summed per column, with the average cost below it. Hence, there are two treatments with an average cost of $75, and one stay with an average cost of $50.
I've tried GROUP BY and a few other functions, but haven't been successful. Any help would be greatly appreciated!

We add up all we need and then use union all to unpivot.
select sum(visit) as visit
,sum(stay) as stay
,sum(treatment) as treatment
from t
union all
select sum(visit*total_cost)/sum(visit)
,sum(stay*total_cost)/sum(stay)
,sum(treatment*total_cost)/sum(treatment)
from t
visit  stay  treatment
1      1     2
100    50    75
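As a rough check of the query above, here is a minimal sketch that runs it against the sample rows using Python's built-in sqlite3 module. SQLite, the column types, and the in-memory table are assumptions for illustration; the question does not say which database is in use.

```python
import sqlite3

# In-memory table mirroring the two sample rows from the question.
# Column types are an assumption; the question does not give the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (id integer, visit integer, stay integer, "
    "treatment integer, total_cost real)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [(1, 0, 1, 1, 50), (2, 1, 0, 1, 100)],
)

# First branch: count of flags per column.
# Second branch: cost summed over flagged rows, divided by the flag count.
query = """
select sum(visit) as visit, sum(stay) as stay, sum(treatment) as treatment
from t
union all
select sum(visit * total_cost) / sum(visit),
       sum(stay * total_cost) / sum(stay),
       sum(treatment * total_cost) / sum(treatment)
from t
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Two things are worth keeping in mind. A union all without an order by does not guarantee which row comes first, so a label column may be needed if the order matters. And if a flag column sums to zero, the division either fails or yields NULL depending on the engine; wrapping the divisor in nullif(..., 0) is a common guard.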

Related

is there a way to calculate a transaction change in SQL?

I have a table that looks something like this:
sender  reciever  amount
1       2         10
2       1         20
3       2         20
1       3         30
The desired output should be:
user  Trans_Change
1     -20
2     10
3     10
I can't find a way to write a query for this in SQL.
The logic behind the desired output is:
1 sends 2 an amount of 10, so now 1 has -10 and 2 has +10, and so on.
Best Guess given known info:
We simply assign all senders negative transaction amounts, union all the receivers as positive amounts, and then group the data, summing the transactions:
WITH CTE AS (
    SELECT sender AS aUser, (-1 * amount) AS Trans_Change -- senders lose money
    FROM transactions -- placeholder name; the question does not name the table
    UNION ALL
    SELECT reciever AS aUser, amount -- receivers get money (column spelled as in the question)
    FROM transactions)
SELECT aUser, SUM(Trans_Change) AS Trans_Change -- aggregate transaction totals by user
FROM CTE
GROUP BY aUser
Part of addressing this is acknowledging that each amount is used twice: once for the sender as a negative, and once for the receiver as a positive (or credit/debit, if you prefer). Realizing this, I knew I needed that value on two separate rows. Two selects combined with a union all give us the value twice, and then it's a simple aggregation.
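To see the two-rows-per-transaction idea in action, here is a small sketch using Python's sqlite3 module and the sample data from the question. The table name transactions is a made-up placeholder, and SQLite is an assumption; the column name reciever is spelled as in the question.

```python
import sqlite3

# Hypothetical table name "transactions"; the question does not name the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table transactions (sender integer, reciever integer, amount integer)"
)
conn.executemany(
    "insert into transactions values (?, ?, ?)",
    [(1, 2, 10), (2, 1, 20), (3, 2, 20), (1, 3, 30)],
)

# Each transaction contributes two rows: a debit for the sender and a
# credit for the receiver. Summing per user gives the net change.
query = """
with cte as (
    select sender as a_user, -amount as trans_change from transactions
    union all
    select reciever as a_user, amount from transactions
)
select a_user, sum(trans_change) as trans_change
from cte
group by a_user
order by a_user
"""
balances = dict(conn.execute(query).fetchall())
print(balances)
```

Because every amount appears once as a debit and once as a credit, the net changes across all users should always add up to zero, which makes a handy sanity check.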

Query smallest number of rows to match a given value threshold

I would like to create a query that operates similar to a cash register. Imagine a cash register full of coins of different sizes. I would like to retrieve a total value of coins in the fewest number of coins possible.
Given this table:
id  value
1   100
2   100
3   500
4   500
5   1000
How would I query for a list of rows that:
has a total value of AT LEAST a given threshold
with the minimum excess value (value above the threshold)
in the fewest possible rows
For example, if my threshold is 1050, this would be the expected result:
id  value
1   100
5   1000
I'm working with postgres and elixir/ecto. If it can be done in a single query great, if it requires a sequence of multiple queries no problem.
I had a go at this myself, using answers from previous questions:
Using ABS() to order by the closest value to the threshold
Select rows until a sum reduction of a single column reaches a threshold
Based on @TheImpaler's comment above, this prioritises the minimum number of rows over the minimum excess. It's not 100% what I was looking for, so I'm open to improvements, but otherwise I think this is going to be good enough:
-- outer query selects all rows underneath the threshold
-- inner subquery adds a running total column
-- window function orders by the difference between value and threshold
SELECT
    *
FROM (
    SELECT
        i.*,
        SUM(i.value) OVER (
            ORDER BY
                ABS(i.value - $THRESHOLD),
                i.id
        ) AS total
    FROM
        inputs i
) t
WHERE
    t.total - t.value < $THRESHOLD;
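For comparison, the original requirement (least excess first, then fewest rows) can be sketched outside SQL as an exhaustive search. This is not the query above and not a Postgres or Ecto solution; it is a hypothetical Python helper, pick_rows, that tries every subset and is therefore only practical for a small number of rows.

```python
from itertools import combinations


def pick_rows(rows, threshold):
    """Return the (id, value) rows that reach the threshold with the least
    excess, breaking ties by the fewest rows. Returns None if the threshold
    cannot be reached. Exhaustive search: cost grows as 2**len(rows)."""
    best_key, best = None, None
    for size in range(1, len(rows) + 1):
        for combo in combinations(rows, size):
            total = sum(value for _, value in combo)
            if total < threshold:
                continue
            key = (total - threshold, size)  # (excess, row count)
            if best_key is None or key < best_key:
                best_key, best = key, list(combo)
    return best


rows = [(1, 100), (2, 100), (3, 500), (4, 500), (5, 1000)]
result = pick_rows(rows, 1050)
print(result)
```

With the sample table and a threshold of 1050, this picks ids 1 and 5, matching the expected result in the question. When several subsets tie, the first one found in id order wins.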

UPDATE with HAVING in duplicate values in Excel

I need help with this issue. I need to find the duplicate values in SQL, then sum the INVOICE_AMOUNT for each duplicated invoice and divide each individual amount by that total. Example:
FA-0001 $25.00 BILL-0001
FA-0001 $75.00 BILL-0002
I need the total for this invoice, SUM(AMOUNT_INVOICE) = $100.00, and then each individual amount divided by that total, for example 25/100.00 = 0.25, and so on. This percentage is then multiplied by DET_SOL_AMOUNT.
I need to apply this calculation only to the duplicate values.
I tried this query:
UPDATE [T4DET] SET [DET_SOL]=(([LOC_AMOUNT]/SUM([LOC_AMOUNT]))*[DET_SOL_CALC]) FROM [1WEB] WHERE [1WEB].[INVOICE] IN (SELECT [T4DET].[ASSIGNMENT] FROM [T4DET] GROUP BY [T4DET].[ASSIGNMENT] HAVING COUNT(*) > 1)
Thanks for your Help.
If I understood what you want to do correctly, it is easy with Excel. You need to write formulas in 2 columns only, for example:
Group    Amount   Bill No    DET_SOL_CALC  Sum of Group  Result
FA-0001  $25.00   BILL-0001  2             100           0.5
FA-0001  $75.00   BILL-0002  2             100           1.5
FA-0002  $200.00  BILL-0001  5             600           1.666666667
FA-0002  $100.00  BILL-0002  5             600           0.833333333
FA-0002  $300.00  BILL-0003  5             600           2.5
Put your data in columns A, B and C.
Column D: DET_SOL_CALC
Column E formula should be: =SUMIF($A$2:$A$6,A2,$B$2:$B$6)
Column F formula should be: =B2/E2*D2
Row 1 holds the headers of your data.
Put these formulas in row 2 and drag them down to the last row of your data; your numbers should then be calculated correctly.
Please hit the check mark if this is your answer!
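The same calculation as the two spreadsheet formulas can be sketched in a few lines of Python, which may help when checking the numbers by hand. The rows are the sample data from the table above; the variable names are made up for illustration.

```python
from collections import defaultdict

# (group, amount, bill number, DET_SOL_CALC), taken from the sample table.
rows = [
    ("FA-0001", 25.0, "BILL-0001", 2),
    ("FA-0001", 75.0, "BILL-0002", 2),
    ("FA-0002", 200.0, "BILL-0001", 5),
    ("FA-0002", 100.0, "BILL-0002", 5),
    ("FA-0002", 300.0, "BILL-0003", 5),
]

# Equivalent of the SUMIF column: total amount per group.
group_totals = defaultdict(float)
for group, amount, _bill, _calc in rows:
    group_totals[group] += amount

# Equivalent of =B2/E2*D2: each row's share of its group, scaled by DET_SOL_CALC.
results = [
    amount / group_totals[group] * calc for group, amount, _bill, calc in rows
]
print(results)
```

Within each group the results add up to that group's DET_SOL_CALC value, since the shares sum to one.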
An alternative solution is to create a temporary table with SUM and GROUP BY, and add three columns for the calculations.
Example
4DETTEMP
ASSIGNMENT NVARCHAR
TOTAL MONEY
INSERT INTO [4DETTEMP] (ASSIGNMENT, [TOTAL]) SELECT ASSIGNMENT, SUM(DOC_AMOUNT) FROM FBL5N GROUP BY ASSIGNMENT
After that, the queries are:
Obtain the DET SOL amount from the other table.
UPDATE [4BET] SET DET_SOL_CALC=T2.INCOMING_AMOUNT FROM FBL5N T2 WHERE [4BET].ASSIGNMENT=T2.INV_CON
Obtain the DOC AMOUNT total from the temporary table.
UPDATE [4BET] SET DOC_AMNT_TOTAL=T2.[TOTAL] FROM [4DETTEMP] T2 WHERE [4BET].ASSIGNMENT=T2.ASSIGNMENT
Obtain the calculation percentage.
UPDATE [4BET] SET PERC_CAL_AMNT=(DOC_AMNT/DOC_AMNT_TOTAL), DET_SOL=((DOC_AMNT/DOC_AMNT_TOTAL)*DET_SOL_CALC)
Afterwards, delete the temporary table.
This is my solution. Is it viable?

Mean of variable at the selection (SAS)

For example, I have a table A with 2 variables: the first is a customer id, and the second is the customer's income, which ranges from 100 to 200 US dollars. The task is to create a table B containing customers whose mean income is 150 USD, with as many customers as possible. In other words, table B should hold the maximal number of customers from table A such that the mean income among them is exactly 150. Is there an elegant approach using SAS Enterprise Guide?
Sort the records by income, low to high. Then compute the mean of all records 1 - N. Find N where mean = 150.
data test;
  do id = 1 to 1000;
    income = 100 + round(ranuni(1)*100,1);
    output;
  end;
run;
proc sort data=test;
  by income;
run;
data want(where=(ave<=150));
  set test;
  retain sum 0;
  sum = sum + income;
  ave = sum / _n_;
  drop sum;
run;
You want as many low values as possible. This then lets you add large values to get the mean to 150. So sorting by income should give you what you want.
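The sort-then-running-mean idea can also be sketched in Python for readers who want to try it outside SAS. The function name and the sample incomes are made up for illustration. Like the data step above, it keeps the rows whose running mean is at or below the target, so the final mean equals the target only when the data happen to allow it.

```python
from itertools import accumulate


def longest_prefix_at_or_below(incomes, target):
    """Sort incomes ascending and return the longest prefix whose mean is
    at or below the target (the rows the where= filter would keep)."""
    ordered = sorted(incomes)
    kept = []
    for n, running_sum in enumerate(accumulate(ordered), start=1):
        # running_sum / n <= target, written without division
        if running_sum <= target * n:
            kept = ordered[:n]
    return kept


incomes = [200, 100, 180, 150, 120, 200]  # made-up sample data
kept = longest_prefix_at_or_below(incomes, 150)
print(kept, sum(kept) / len(kept))
```

Because the values are sorted ascending, the running mean never decreases, so the rows that pass the filter always form a prefix of the sorted list.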
A greedy algorithm might do the job well-enough, depending on the structure of the data. This is definitely not guaranteed to be optimal, but it can be implemented relatively fast.
The idea is:
1. Calculate the average of all the records
2. If the average is $150 then stop
3. Remove the largest/smallest value to increase or decrease the average, as appropriate
4. If the average is $150 then stop
5. Repeat (1) until finished
This should work pretty well if the values cluster around $150. If they are widely dispersed, then you might not get any records in the final bins.
If the algorithm works on your data, then there may be faster ways of implementing it.
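A minimal Python sketch of the greedy idea described above follows. It assumes integer incomes so that the mean can be compared to the target exactly; the function name and sample data are made up. As the answer says, the result is not guaranteed to be the largest possible set.

```python
from collections import deque


def greedy_trim(incomes, target):
    """Drop the largest value while the mean is above the target, or the
    smallest while it is below, until the mean equals the target or no
    records are left. Assumes integer values for exact comparison."""
    pool = deque(sorted(incomes))
    total = sum(pool)
    while pool and total != target * len(pool):
        if total > target * len(pool):
            total -= pool.pop()      # mean too high: remove the largest
        else:
            total -= pool.popleft()  # mean too low: remove the smallest
    return list(pool)


incomes = [100, 120, 150, 180, 200, 200]  # made-up sample data
selected = greedy_trim(incomes, 150)
print(selected)
```

An empty result means the procedure ran out of records before hitting the target, which is the widely dispersed case the answer warns about.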

MDX query to count number of rows that match a certain condition (newest row for each question, client group)

I have the following fact table:
response_history_id client_id question_id answer
1 1 2 24
2 1 2 27
3 1 3 12
4 1 2 43
5 2 2 39
It holds history of client answers to some questions. The largest response_history_id for each client_id,question_id combination is the latest answer for that question and client.
What I want to do is to count the number of clients whose latest answer falls within a specific range
I have some dimensions:
question associated with question_id
client associated with client_id
response_history_id associated with response_history_id
range associated with answer: 0-20 = low, 20-40 = medium, >40 = high
and some measures:
max_history_id as max(response_history_id)
clients_count as distinct count(client_id)
Now, I want to group only the latest answers by range:
select
  [ranges].members on 0,
  {[Measures].[clients_count]} on 1
from (select [question].[All].[2] on 1 from [Cube])
What I get is:
Measures All low medium high
clients_count 2 0 2 1
But what I wanted (and I can't get) is the calculation based on the latest answer:
Measures All low medium high
clients_count 2 0 1 1
I understand why my query doesn't give me the desired result, it's more for demonstration purpose. But I have tried a lot of more complex MDX queries and still couldn't get any good result.
Also, I can't generate a static view from my fact table, because later on I would like to limit the search by another column in the fact table, a timestamp. My queries must eventually be able to get the number of clients whose latest answer to a question before a given timestamp falls within a specific range.
Can anyone help me with this please?
I can define other dimensions and measures and I am using iccube.
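To pin down the intended calculation, here is a plain Python restatement using the sample fact table. It is not MDX and not an icCube solution, only a sketch of the expected numbers. The function names are made up, and the exact range boundaries (whether 20 and 40 belong to the lower or upper bucket) are an assumption, since the question leaves them open.

```python
# (response_history_id, client_id, question_id, answer) from the fact table.
rows = [
    (1, 1, 2, 24),
    (2, 1, 2, 27),
    (3, 1, 3, 12),
    (4, 1, 2, 43),
    (5, 2, 2, 39),
]


def bucket(answer):
    # Assumed boundaries: up to 20 is low, up to 40 is medium, above is high.
    if answer <= 20:
        return "low"
    if answer <= 40:
        return "medium"
    return "high"


def count_latest_by_range(rows, question_id):
    """Count clients per range, using only each client's latest answer
    (largest response_history_id) to the given question."""
    latest = {}
    for _history_id, client_id, q_id, answer in sorted(rows):
        if q_id == question_id:
            latest[client_id] = answer  # later history ids overwrite earlier
    counts = {"low": 0, "medium": 0, "high": 0}
    for answer in latest.values():
        counts[bucket(answer)] += 1
    return counts


counts = count_latest_by_range(rows, 2)
print(counts)
```

For question 2 this gives one client in medium and one in high, which matches the desired table above. A timestamp restriction would amount to filtering the rows before picking the latest one per client.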