SUM by Combination in SAS - sql

I want to get from this table:
[ProductCode] [ClientNO] [Fund]
11 3 100
12 4 45
11 3 18
12 4 5
To this one:
[ProductCode] [ClientNO] [Fund]
11 3 118
12 4 50
So basically sum FUND when all the given variables match.
I'm almost there with this statement:
Proc sql;
create table SumByCombination as
select *, sum(Fund) as Total
from FundsData
group by ProductCode,ClientNO
;
quit;
But with this I get all the rows (duplicates) with a SUM column.
Edit: This is what I get.
[ProductCode] [ClientNO] [_SUM_]
11 3 118
12 4 50
11 3 118
12 4 50
I know this should be a no-brainer but I keep getting stuck.
What would be the easiest way to do this in Proc SQL? What about other methods?
Thanks

Stop using SELECT * in your queries; explicitly list the columns you want the SELECT to return.
Here SELECT * is what causes the duplicates: it pulls in Fund, which is neither a GROUP BY column nor an aggregate, so PROC SQL remerges the group sums back onto every original row.
Selecting only the grouping columns and the sum returns your expected result:
select ProductCode
,ClientNO
,sum(Fund) as Total
from FundsData
group by
ProductCode
,ClientNO
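As a quick check of the grouped query outside SAS, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the row-loading code are assumptions for illustration; only the table name, column names and sample rows come from the question.

```python
import sqlite3

# In-memory SQLite stand-in for the FundsData table (an assumption; the question uses SAS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FundsData (ProductCode INTEGER, ClientNO INTEGER, Fund INTEGER)")
conn.executemany(
    "INSERT INTO FundsData VALUES (?, ?, ?)",
    [(11, 3, 100), (12, 4, 45), (11, 3, 18), (12, 4, 5)],
)

# Only the grouping columns and the aggregate are selected,
# so each ProductCode/ClientNO combination yields exactly one row.
rows = conn.execute(
    """
    SELECT ProductCode, ClientNO, SUM(Fund) AS Total
    FROM FundsData
    GROUP BY ProductCode, ClientNO
    ORDER BY ProductCode, ClientNO
    """
).fetchall()

print(rows)  # [(11, 3, 118), (12, 4, 50)]
```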

You're using SAS, so do it the SAS way - PROC MEANS.
proc means data=fundsdata;
var fund;
class productcode clientno;
types productcode*clientno;
output out=sumbycombination sum(fund)=fund;
run;

Related

SQL Group by joining with time difference

I have this college project with a good focus on the frontend, but I'm struggling with a SQL query (PostgreSQL) that needs to be executed at one of the backend endpoints.
The table I'm speaking of is the following:
id todo_id column_id time_in_status
0 259190 3 0
1 259190 10300 30
2 259190 10001 60
3 259190 10600 90
4 259190 6 30
Put simply, it is a to-do organizer laid out in vertical columns: each column is identified by its column_id, and each row of the table is an event recording a task's move between columns.
What I need is a view (or a better alternative, if you can suggest one) built from this table that shows how long each task spent in each column_id. For a given todo_id, column_id is not unique, so there can be multiple events for column 10300; the view should group them and sum their times.
For example, the table above would output a view like this:
id todo_id time_in_column_3 time_in_column_10300 time_in_column_10001 ...
0 259190 0 30 60 ...
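-- crosstab() comes from the tablefunc extension
select *
from crosstab(
'select todo_id, column_id, sum(time_in_status)::int
from t
group by todo_id, column_id
order by 1',
'values (3), (10300), (10001), (10600), (6)'
)
as t(todo_id int, "time_in_column_3" int, "time_in_column_10300" int, "time_in_column_10001" int, "time_in_column_10600" int, "time_in_column_6" int )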
todo_id time_in_column_3 time_in_column_10300 time_in_column_10001 time_in_column_10600 time_in_column_6
259190 0 30 60 90 30
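Where the tablefunc extension is not available, the same pivot can be sketched with conditional aggregation, which also sums repeated events for the same column. This is a different technique from crosstab, shown here in Python's built-in sqlite3 as a stand-in for PostgreSQL. The table name t and the hard-coded list of column ids are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, todo_id INTEGER, column_id INTEGER, time_in_status INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (0, 259190, 3, 0),
        (1, 259190, 10300, 30),
        (2, 259190, 10001, 60),
        (3, 259190, 10600, 90),
        (4, 259190, 6, 30),
    ],
)

# The output columns must be known up front; this list is hard-coded for the sketch.
column_ids = [3, 10300, 10001, 10600, 6]

# One SUM(CASE ...) per board column: rows for other columns contribute 0.
select_list = ",\n".join(
    f"SUM(CASE WHEN column_id = {c} THEN time_in_status ELSE 0 END) AS time_in_column_{c}"
    for c in column_ids
)
query = f"SELECT todo_id,\n{select_list}\nFROM t\nGROUP BY todo_id\nORDER BY todo_id"

pivot = conn.execute(query).fetchall()
print(pivot)  # [(259190, 0, 30, 60, 90, 30)]
```

Because the aggregation is a plain GROUP BY on todo_id, a second event for, say, column 10300 would simply be added to that column's total.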

How to select last element for each ID

I would like to select the last row for each id.
Here is an example of what I have:
id money
1 200
1 150
1 500
3 50
4 40
4 300
5 110
Here is what I would like:
1 500
3 50
4 300
5 110
As you can see, I took the last row for each id and its corresponding money value.
I tried a group by id with order by id descending and limit 1, but limit is not available in proc sql in SAS, so it doesn't work.
Thanks in advance
Unlike SAS datasets, SQL tables represent unordered sets. In your case, it looks like you want the maximum value in the second column, in which case you can use aggregation:
proc sql;
select id, max(money) as money
from t
group by id;
quit;
If you actually mean the last row per id based on the ordering in the SAS dataset, I would suggest using a data step instead.
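To illustrate the "last row per id in stored order" reading, here is a minimal sketch in Python's built-in sqlite3. It relies on SQLite's implicit rowid as a stand-in for the physical row order of a SAS dataset; that substitution is an assumption of this sketch and is not how SAS itself would do it.

```python
import sqlite3

# In-memory SQLite table; rows are inserted in the order shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, money INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 200), (1, 150), (1, 500), (3, 50), (4, 40), (4, 300), (5, 110)],
)

# SQLite assigns an increasing rowid on insert (assumed here to mirror dataset order).
# Keep, for each id, the row with the highest rowid, i.e. the last one inserted.
last_rows = conn.execute(
    """
    SELECT id, money
    FROM t
    WHERE rowid IN (SELECT MAX(rowid) FROM t GROUP BY id)
    ORDER BY id
    """
).fetchall()

print(last_rows)  # [(1, 500), (3, 50), (4, 300), (5, 110)]
```

With this sample the result happens to match max(money) per id; the two approaches diverge as soon as the last row of a group is not its largest.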

How can i get incremental counter with sql?

Can you help me with a SQL query to get the desired result?
Database used: Redshift
The requirement is:
I have 3 columns: dish_id, category_id, counter
I want counter to increase by 1 each time a dish_id is repeated; otherwise it should remain 1.
The query should read the source table and return results like this:
dish_id category_id counter
21 4 1
21 6 2
21 6 3
12 1 1
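Unless I misunderstood your question, you can accomplish that using window functions. Add an ORDER BY inside OVER so the numbering within each dish_id is deterministic:
SELECT dish_id, category_id, row_number() OVER (PARTITION BY dish_id ORDER BY category_id) AS counter FROM my_table;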
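The window-function approach can be tried locally with Python's built-in sqlite3, used here as a stand-in for Redshift. This is a sketch under assumptions: it needs a SQLite build with window-function support (3.25 or later), and ordering by category_id inside each partition is a choice made for the example, not something the question specifies.

```python
import sqlite3

# In-memory SQLite stand-in for the Redshift source table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (dish_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(21, 4), (21, 6), (21, 6), (12, 1)],
)

# ROW_NUMBER restarts at 1 for every dish_id and increments for each repeat.
counted = conn.execute(
    """
    SELECT dish_id,
           category_id,
           ROW_NUMBER() OVER (PARTITION BY dish_id ORDER BY category_id) AS counter
    FROM my_table
    ORDER BY dish_id DESC, counter
    """
).fetchall()

print(counted)  # [(21, 4, 1), (21, 6, 2), (21, 6, 3), (12, 1, 1)]
```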

Counting instances of an entry in one table from another table in SQL

I have two SQL tables, site and sms.
The structure is ROUGHLY like this:
SENEGAL_SITE:
siteID Lon Lat
1 11.232 12.32
2 12.232 12.42
3 11.232 12.62
4 11.232 11.42
ATA_SMS_Apr:
out_going_site_id inSite no_sms
4 1 65
2 4 21
3 4 54
I want to query out a result something like this:
site id SMS_Site_count
1 5
2 3
3 1
So basically I want to count the number of SMS through each site tower.
The query I used to do this is:
select * ,
count((select *
from ATA_SMS_Apr a
where s.site_id=a.out_going_site_id))
from SENEGAL_SITE s
Doing this I get the error: Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
You need a correlated subquery to get the SMS count from the ATA_SMS_Apr table for each site id.
The count moves from outside the subquery to inside it.
select * ,
(select count(*)
from ATA_SMS_Apr a
where s.site_id=a.out_going_site_id) as SMS_Site_Count
from SENEGAL_SITE s
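Here is a minimal sketch of the correlated subquery run against the sample rows, using Python's built-in sqlite3 as a stand-in for the original database. The column name siteID is taken from the table listing in the question (the queries above write site_id), and the column types are assumptions.

```python
import sqlite3

# In-memory SQLite stand-ins for the two tables (types are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SENEGAL_SITE (siteID INTEGER, Lon REAL, Lat REAL)")
conn.execute(
    "CREATE TABLE ATA_SMS_Apr (out_going_site_id INTEGER, inSite INTEGER, no_sms INTEGER)"
)
conn.executemany(
    "INSERT INTO SENEGAL_SITE VALUES (?, ?, ?)",
    [(1, 11.232, 12.32), (2, 12.232, 12.42), (3, 11.232, 12.62), (4, 11.232, 11.42)],
)
conn.executemany(
    "INSERT INTO ATA_SMS_Apr VALUES (?, ?, ?)",
    [(4, 1, 65), (2, 4, 21), (3, 4, 54)],
)

# The subquery is re-evaluated for each site row; COUNT(*) lives inside it.
counts = conn.execute(
    """
    SELECT s.siteID,
           (SELECT COUNT(*)
            FROM ATA_SMS_Apr a
            WHERE a.out_going_site_id = s.siteID) AS SMS_Site_Count
    FROM SENEGAL_SITE s
    ORDER BY s.siteID
    """
).fetchall()

print(counts)  # [(1, 0), (2, 1), (3, 1), (4, 1)]
```

A site with no matching rows still appears, with a count of 0, because the outer query drives the result.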

SQL Creating new variables

I am fairly inexperienced with SQL, but am working to condense my code into one query so that it is more efficient. Below is a simplified example of a much more complex problem I have. I am having problems with the syntax for creating the summary groups and variables. In my case, the data are housed in several different tables, but the joins are not a problem for me, so I have only created one table here.
This is the data I have:
Name Class Wk Score ExCred X
Joe A 1 35 ? 3
Hal A 1 50 5 4
Sal A 1 45 ? 3
Kim B 1 30 5 6
Cal B 1 40 ? 6
Joe A 2 50 ? 2
Hal A 2 40 ? 3
Sal A 2 40 ? 4
Kim B 2 40 5 5
Cal B 2 40 ? 4
The table I am trying to create will look like this:
Class Wk Avg_Score Sum_X
A 1 45 10
B 1 37.5 12
A 2 43.3 9
B 2 42.5 9
So, the data are summarized by class and week. Avg_Score is the average, across students, of each student's 'Score' plus 'ExCred'. Sum_X is simply the sum of X for each class.
I have had success with this in SAS by using multiple proc means statements, but this is clunky and seems to take a really long time. There has to be a more elegant way to do this. I know it probably involves the group by statement... Help?
Thanks. Pyll
I see no particular reason not to use proc means here. It should be significantly faster than proc sql on datasets of substantial size.
proc means data=have;
class class wk;
types class*wk;
var score x;
output out=want mean(score)= sum(x)=;
run;
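First preprocess the data to add ExCred into the Score variable (treating a missing ExCred as 0); if execution time is an issue, use a view to do so.
If you did want to do it in SQL, you would indeed use a group by. Note that score+ExCred would be missing whenever ExCred is missing, so use the two-argument sum(), which PROC SQL treats as the row-wise SAS function that ignores missing values:
proc sql;
create table want as
select class, wk,
mean(sum(score, ExCred)) as Avg_Score,
sum(x) as Sum_X
from have
group by class, wk;
quit;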
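The effect of missing ExCred values on the average can be checked with a small sketch in Python's built-in sqlite3, a stand-in for SAS. SQLite has no row-wise sum(), so COALESCE(ExCred, 0) plays that role here; the table name have and the use of NULL for the "?" entries are assumptions.

```python
import sqlite3

# In-memory SQLite stand-in for the SAS dataset; None (NULL) represents the "?" entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE have (Name TEXT, Class TEXT, Wk INTEGER, Score INTEGER, ExCred INTEGER, X INTEGER)"
)
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Joe", "A", 1, 35, None, 3), ("Hal", "A", 1, 50, 5, 4), ("Sal", "A", 1, 45, None, 3),
        ("Kim", "B", 1, 30, 5, 6), ("Cal", "B", 1, 40, None, 6),
        ("Joe", "A", 2, 50, None, 2), ("Hal", "A", 2, 40, None, 3), ("Sal", "A", 2, 40, None, 4),
        ("Kim", "B", 2, 40, 5, 5), ("Cal", "B", 2, 40, None, 4),
    ],
)

# COALESCE turns a missing ExCred into 0 so every student counts toward the average.
summary = conn.execute(
    """
    SELECT Class, Wk,
           AVG(Score + COALESCE(ExCred, 0)) AS Avg_Score,
           SUM(X) AS Sum_X
    FROM have
    GROUP BY Class, Wk
    ORDER BY Wk, Class
    """
).fetchall()

for cls, wk, avg_score, sum_x in summary:
    print(cls, wk, round(avg_score, 1), sum_x)
```

Without the COALESCE, Score + ExCred would be NULL for students with no extra credit, and the average for class A in week 1 would be computed from Hal's row alone.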