I have a table, table1:
ZP age Sexe Count
A 40 0 5
A 40 1 3
C 55 1 2
I want to add a column that sums the Count column, grouped by the first two variables:
ZP age Sexe Count Sum
A 40 0 5 8
A 40 1 3 8
C 55 1 2 2
This is what I do:
CREATE TABLE table2 AS SELECT zp, age, SUM(count) FROM table1 GROUP BY zp, age
Then:
CREATE TABLE table3 AS SELECT * FROM table1 NATURAL JOIN table2
But I have a feeling this is a sloppy way to do it. Do you know a better way, for example one with no intermediate tables?
Edit: I am using SQL through PROC SQL in SAS.
I'm not sure whether it can be done in a single select statement, but the code below works without multiple create table statements:
data have;
length ZP $3 age 3 Sexe $3 Count 3;
input ZP $ age Sexe $ Count;
datalines;
A 40 0 5
A 40 1 3
C 55 1 2
;
run;
proc sql noprint;
create table WANT as
select a.*, b.SUM
from
(select * from HAVE) a,
(select ZP, age, sum(COUNT) as SUM from HAVE group by ZP, age) b
where a.ZP = b.ZP and a.age = b.age;
quit;
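For anyone who wants to try the join-to-a-grouped-subquery idea outside SAS, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the table name have matches the answer above, but the column names cnt and total are my own stand-ins for Count and Sum, and SQLite is not PROC SQL.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (zp TEXT, age INTEGER, sexe INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?, ?)",
    [("A", 40, 0, 5), ("A", 40, 1, 3), ("C", 55, 1, 2)],
)

# Join every detail row to the per-(zp, age) total computed in a subquery.
rows = conn.execute(
    """
    SELECT a.zp, a.age, a.sexe, a.cnt, b.total
    FROM have AS a
    JOIN (SELECT zp, age, SUM(cnt) AS total
          FROM have
          GROUP BY zp, age) AS b
      ON a.zp = b.zp AND a.age = b.age
    ORDER BY a.zp, a.age, a.sexe
    """
).fetchall()

for row in rows:
    print(row)
```

Joining on both zp and age matters: joining on zp alone would give the same result for this sample, but would mix totals as soon as one zp value had two different ages.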
PROC SQL does not support window functions such as SUM() OVER (PARTITION BY ...).
But it looks like you want summarized data and detail rows at the same time. If that is the question, then PROC SQL will do it for you automatically. If the select list includes variables that are neither GROUP BY variables nor summary statistics, SAS automatically remerges the summary statistics with the detail rows to produce the table you want.
proc sql;
SELECT zp, age, sexe, count, SUM(count)
FROM table1
group by zp, age
;
quit;
You can use SUM as follows with standard SQL:2003 syntax (I don't know if SAS accepts it):
SELECT zp, age, sexe, count, SUM(count) OVER (PARTITION BY zp, age)
FROM table1;
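To check what the window-function form returns, here is a small sketch run through Python's sqlite3 module. It assumes a SQLite build of version 3.25 or later (window functions were added there), and the column names cnt and total are stand-ins I chose for Count and Sum. It says nothing about whether SAS accepts this syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (zp TEXT, age INTEGER, sexe INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("A", 40, 0, 5), ("A", 40, 1, 3), ("C", 55, 1, 2)],
)

# SUM(...) OVER (PARTITION BY ...) keeps every detail row and attaches
# the total of its (zp, age) partition to it.
rows = conn.execute(
    """
    SELECT zp, age, sexe, cnt,
           SUM(cnt) OVER (PARTITION BY zp, age) AS total
    FROM table1
    ORDER BY zp, age, sexe
    """
).fetchall()

for row in rows:
    print(row)
```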
data have;
input ZP $ age Sexe Count;
datalines;
A 40 0 5
A 40 1 3
C 55 1 2
;
run;
proc sql;
create table want as select
*, sum(count) as sum
from have
group by zp, age;
quit;
I have a table like this:
NAME IDENTIFICATIONR SCORE
JOHN DB 10
JOHN IT NULL
KAL DB 9
HENRY KK 3
KAL DB 10
HENRY IP 9
ALI IG 10
ALI PA 9
With a SELECT statement, I want the result to contain only those names whose scores are all 9 or above. For example, Henry cannot be selected: he has a score of 9 in one row, but a score of 3 in another. Names with null scores should also be omitted.
My new table should look like this:
NAME
KAL
ALI
I'm using SAS. Thank you!
COUNT(name) will differ from COUNT(score) if a score is missing, because COUNT ignores missing values. Requiring equality in the HAVING clause ensures that no person with a missing score is in your result set.
proc sql;
create table want as
select distinct name from have
group by name
having count(name) = count(score) and min(score) >= 9;
quit;
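Here is a solution. Note that it filters individual rows, so it returns every name that has at least one score of 9 or above:
select name
from table name where score >= 9
and score is not null;
Select NAME from YOUR_TABLE_NAME name where SCORE >= 9 and score is not null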
You can use aggregation:
select name
from table t
group by name
having sum(case when (score < 9 or score is null) then 1 else 0 end) = 0;
If you want full rows, then you can use NOT EXISTS:
select t.*
from table t
where not exists (select 1
from table t1
where t1.name = t.name and (t1.score < 9 or t1.score is null)
);
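As a quick check of the aggregation approach against the sample data from the question, here is a sketch using Python's sqlite3 module. The table name have is an assumption (the answer uses a placeholder), and SQLite stands in for whatever engine you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (name TEXT, identificationr TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?)",
    [
        ("JOHN", "DB", 10), ("JOHN", "IT", None),
        ("KAL", "DB", 9), ("HENRY", "KK", 3),
        ("KAL", "DB", 10), ("HENRY", "IP", 9),
        ("ALI", "IG", 10), ("ALI", "PA", 9),
    ],
)

# Count the "bad" rows (score below 9 or missing) per name and keep
# only the names that have none.
rows = conn.execute(
    """
    SELECT name
    FROM have
    GROUP BY name
    HAVING SUM(CASE WHEN score < 9 OR score IS NULL THEN 1 ELSE 0 END) = 0
    ORDER BY name
    """
).fetchall()

names = [r[0] for r in rows]
print(names)
```

JOHN drops out because of the missing score and HENRY because of the 3, which leaves ALI and KAL.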
You seem to be treating NULL scores as a value less than 9. You can also just use coalesce() with min():
select name
from have
group by name
having min(coalesce(score, 0)) >= 9;
Note that select distinct is almost never useful with group by -- and SAS proc sql probably does not optimize it well.
I am trying to create a list of percentages from a dataset of transactional data using SAS/SQL to understand how a specific department contributes to overall sales count for a given quarter. For example, if there were 100 sales of Store ID 234980 and 20 of those were in department a in Q4 of 2006, then the list should output:
Store ID 234980 , 20%.
This is the code I am using to achieve this result.
data testdata;
set work.dataset;
format PostingDate yyq.;
run;
PROC SQL;
CREATE TABLE aggregatedata AS
SELECT DISTINCT testdata.ID,
SUM(CASE
WHEN testdata.Store='A' THEN 1 ELSE 0
END)/COUNT(Store) as PERCENT,
PostingDate
FROM work.testdata
group by testdata.ID, testdata.PostingDate;
QUIT;
However, the output I am receiving is more like this:
StoreID DepartmentA Quarter
100 1 2014Q1
100 0 2014Q2
100 1 2014Q2
100 0 2014Q2
100 0 2014Q2
100 0 2014Q2
101 1 2015Q3
101 0 2015Q3
101 0 2015Q4
Why does my code not aggregate to the store level?
If you want to group by QTR then you need to transform your date values into quarter values. Otherwise '01JAN2017'd and '01FEB2017'd would be seen as two distinct values even though they would both display the same using the YYQ. format.
proc sql;
create table aggregatedata as
select id
, intnx('qtr',postingdate,0,'b') as postingdate format=yyq.
, sum(store='A')/count(store) as percent
from work.testdata
group by 1,2
;
quit;
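The same idea, reducing each date to a quarter value before grouping, can be sketched outside SAS. The example below uses Python's sqlite3 module with made-up rows; the table name sales, the ISO date strings, and the strftime-based quarter expression are all assumptions for illustration and are not the SAS intnx call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, store TEXT, postingdate TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (100, "A", "2014-01-15"), (100, "B", "2014-02-20"),
        (100, "A", "2014-03-05"), (100, "B", "2014-03-09"),
        (100, "A", "2014-05-01"),
        (101, "B", "2015-07-04"), (101, "A", "2015-08-11"),
    ],
)

# Build a real quarter key such as '2014Q1' from the date, then group on it.
# (month + 2) / 3 is integer division: months 1-3 -> 1, 4-6 -> 2, and so on.
rows = conn.execute(
    """
    SELECT id,
           strftime('%Y', postingdate) || 'Q' ||
             ((CAST(strftime('%m', postingdate) AS INTEGER) + 2) / 3) AS quarter,
           SUM(store = 'A') * 1.0 / COUNT(store) AS pct
    FROM sales
    GROUP BY id, quarter
    ORDER BY id, quarter
    """
).fetchall()

for row in rows:
    print(row)
```

Grouping on the raw date would give one output row per distinct day, which is the symptom described in the question.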
You do not want to use both DISTINCT and GROUP BY.
Perhaps try:
select t.testingdate
,t.StoreID
,t.Department
,count(*) / (select count(*)
from testdata t2
where t.testingdate = t2.testingdate
and t.StoreID = t2.StoreID) AS Percentage
from testdata t
group by t.testingdate
,t.StoreID
,t.Department
Alternatively, you could use a left join, which may be more efficient. The nested select, which counts all records regardless of department, may be clearer to read.
I have multiple columns in a table, but I only want the highest value across the columns to be selected for each row in a SQL query.
Example Info:
D1 D2 D3 D4
----- ----- ----- -----
3 2 150 5
1 3 20 10
Output needs to be:
MaxPower
150
20
Does anyone know a good way to do this? A single SQL statement would be preferred, but VBA would also work.
select max(v) as maggiore from (
select id,d1 as v from table
union all
select id,d2 from table
union all
select id,d3 from table
union all
select id,d4 from table
) as t
group by id
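How about select max(max(d1,d2), max(d3,d4)) from table? This assumes your database has a row-wise max() that accepts several arguments; some databases call that function greatest() instead.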
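As one concrete case, SQLite's max() acts as a row-wise function when it is given two or more arguments, so the idea can be tried with Python's sqlite3 module. The table name power is an assumption, and other databases may need a different function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE power (d1 INTEGER, d2 INTEGER, d3 INTEGER, d4 INTEGER)")
conn.executemany(
    "INSERT INTO power VALUES (?, ?, ?, ?)",
    [(3, 2, 150, 5), (1, 3, 20, 10)],
)

# With several arguments, SQLite's MAX compares values within one row
# instead of aggregating over rows.
rows = conn.execute(
    "SELECT MAX(d1, d2, d3, d4) AS MaxPower FROM power ORDER BY rowid"
).fetchall()

max_power = [r[0] for r in rows]
print(max_power)
```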
I am trying to create a report that has a summary for each group. For example:
ID   NAME       COUNT   TOTAL   TYPE
-------------------------------------------------------------
1    Test 1     10              A
2    Test 2     8               A
                        18
7    Mr. Test   9               B
12   XYZ        4               B
                        13
25   ABC        3               C
26   DEF        5               C
19   GHIJK      1               C
                        9
I have a query that can do everything except the TOTAL column:
select sd.id ID, count(sd.DATA_DEF_ID) COUNT, defs.data_name NAME, sd.type
from some_data sd, data_defs defs
where sd.data_def_id = defs.data_def_id
group by sd.type, sd.id, defs.data_name
order by sd.id asc, count(sd.DATA_DEF_ID) desc ;
I'm just not sure how to get a summary on a group. In this case, I'm trying to get a sum of COUNT for each group of ID.
UPDATE:
Groups are by type. Forgot that in the original post.
TOTAL is SUM(COUNT) for each group.
How about using ROLLUP, like this:
select sd.id ID, count(sd.DATA_DEF_ID) COUNT, defs.data_name NAME, sd.type
from some_data sd, data_defs defs
where sd.data_def_id = defs.data_def_id
group by ROLLUP(sd.type, (sd.id, defs.data_name))
order by sd.id asc, count(sd.DATA_DEF_ID) desc ;
This works for a similar example in my database, but I only did it over two columns, not sure how it will function over more...
Hope this is helpful,
Craig...
EDIT: In a ROLLUP, columns you want to sum over but not subtotal over, like id and data_name, should be lumped together in parentheses inside the ROLLUP.
Assuming SQL*Plus, you could do something like this:
col d1 noprint
col d2 noprint
WITH q AS
(SELECT sd.id, count(sd.DATA_DEF_ID) COUNT, defs.data_name NAME, sd.type
FROM some_data sd JOIN data_defs defs ON (sd.data_def_id = defs.data_def_id)
GROUP BY sd.type, sd.id, defs.data_name)
SELECT 1 d1, type d2, id, count, name FROM q
UNION ALL
SELECT 2, type, null, SUM(count), null FROM q GROUP BY type
ORDER BY 2,1,3;
I can't make this work in PL/SQL Developer 8, only SQL*Plus. Not even the command window will work...
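The detail-rows-plus-subtotal-rows pattern does not depend on SQL*Plus itself. Here is a rough sketch of it with Python's sqlite3 module, using the rows from the example report. The table name counts and the columns cnt and sort_key are hypothetical names I made up, and the data is already aggregated for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (id INTEGER, name TEXT, cnt INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?, ?)",
    [
        (1, "Test 1", 10, "A"), (2, "Test 2", 8, "A"),
        (7, "Mr. Test", 9, "B"), (12, "XYZ", 4, "B"),
        (25, "ABC", 3, "C"), (26, "DEF", 5, "C"), (19, "GHIJK", 1, "C"),
    ],
)

# Detail rows get sort_key 1 and one subtotal row per type gets sort_key 2,
# so ordering by (type, sort_key) places each subtotal after its group.
rows = conn.execute(
    """
    SELECT type, 1 AS sort_key, id, name, cnt FROM counts
    UNION ALL
    SELECT type, 2, NULL, NULL, SUM(cnt) FROM counts GROUP BY type
    ORDER BY type, sort_key, id
    """
).fetchall()

for row in rows:
    print(row)

subtotals = [r[4] for r in rows if r[1] == 2]
```

Both halves of the UNION ALL must return the same number of columns, with the subtotal sitting in the same position as the detail count.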
Try a subquery that returns the count of all the items of the type:
select sd.id ID, count(sd.DATA_DEF_ID) COUNT,
(select count(sd2.DATA_DEF_ID)
from some_data sd2
where sd2.type = sd.type) TOTAL_FOR_TYPE,
defs.data_name NAME, sd.type
from some_data sd, data_defs defs
where sd.data_def_id = defs.data_def_id
group by sd.type, sd.id, defs.data_name
order by sd.id asc, count(sd.DATA_DEF_ID) desc ;
Given a table (mytable) containing a numeric field (mynum), how would one go about writing an SQL query which summarizes the table's data based on ranges of values in that field rather than each distinct value?
For the sake of a more concrete example, let's make it intervals of 3 and just "summarize" with a count(*), such that the results tell the number of rows where mynum is 0-2.99, the number of rows where it's 3-5.99, where it's 6-8.99, etc.
The idea is to compute some function of the field that has constant value within each group you want:
select count(*), floor(mynum/3.0) foo from mytable group by foo;
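To see the bucketing in action, here is a small sketch using Python's sqlite3 module with made-up values. Because a floor function is not guaranteed to be available in every SQLite build, this version truncates with CAST instead, which matches floor only under the assumption that mynum is never negative. The alias bin_start is my own name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mynum REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(0.5,), (2.99,), (3.0,), (4.2,), (6.1,), (7.0,), (8.9,)],
)

# Every value in [0, 3) maps to 0, [3, 6) maps to 3, [6, 9) maps to 6, etc.
# CAST(... AS INTEGER) truncates toward zero, so this assumes mynum >= 0.
rows = conn.execute(
    """
    SELECT CAST(mynum / 3.0 AS INTEGER) * 3 AS bin_start, COUNT(*) AS n
    FROM mytable
    GROUP BY bin_start
    ORDER BY bin_start
    """
).fetchall()

for row in rows:
    print(row)
```

Multiplying the bucket index back by 3 is optional; it just makes the output show the lower edge of each interval rather than its index.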
I do not know if this applies to MySQL, but in SQL Server I think you can simply use the same CASE expression in both the select list and the group by list.
Something like:
select
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END,
count(*)
from Profiles
where 1=1
group by
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END
This returns something like:
column1 column2
---------- ----------
20and30 3
lessthan20 3
morethan30 13