SQL: count the number of rows based on multiple column values

I have a table as shown below. How do I write the SQL if I want to count the number of times each row (e.g. X = A, Y = Burger) appears and return that count as column Z? Thanks
SELECT X, Y
FROM DataBase
Results:
X Y Z (to be determined)
--------------------
A Burger 2
A Burger 2
A Fries 1
B Burger 2
B Pie 1
B Burger 2
C Pie 2
C Pie 2
C Burger 1
. . .
. . .
. . .

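You can do:
select X, Y, count(*) as Z from DataBase group by X, Y
This returns one row per distinct (X, Y) pair rather than repeating the count on every original row.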

You are looking for a window function:
select t.*, count(*) over (partition by x, y) as z
from DataBase t;
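To compare the two answers side by side, here is a small sketch that runs both queries against an in-memory SQLite database through Python's standard sqlite3 module. SQLite and Python are assumptions made only so the example executes; the helper names are hypothetical, the extra id column exists only to preserve the original row order, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Rows from the question; the id column is added only to preserve row order.
ROWS = [
    ("A", "Burger"), ("A", "Burger"), ("A", "Fries"),
    ("B", "Burger"), ("B", "Pie"), ("B", "Burger"),
    ("C", "Pie"), ("C", "Pie"), ("C", "Burger"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    # "DataBase" is quoted so it cannot clash with an SQLite keyword.
    conn.execute('CREATE TABLE "DataBase" (id INTEGER PRIMARY KEY, X TEXT, Y TEXT)')
    conn.executemany('INSERT INTO "DataBase" (X, Y) VALUES (?, ?)', ROWS)
    return conn


def grouped_counts(conn):
    # GROUP BY collapses duplicates: one output row per distinct (X, Y) pair.
    return conn.execute(
        'SELECT X, Y, COUNT(*) AS Z FROM "DataBase" GROUP BY X, Y ORDER BY X, Y'
    ).fetchall()


def windowed_counts(conn):
    # COUNT(*) OVER (PARTITION BY X, Y) keeps every row and attaches its group size.
    return conn.execute(
        'SELECT t.X, t.Y, COUNT(*) OVER (PARTITION BY t.X, t.Y) AS Z '
        'FROM "DataBase" t ORDER BY t.id'
    ).fetchall()
```

windowed_counts(make_db()) returns all nine rows with Z = 2, 2, 1, 2, 1, 2, 2, 2, 1, which is the Z column the question asks for; grouped_counts returns only six rows.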

Related

SAS PROC SQL: assign a value to a column based on a condition on another column

I want new_col to hold the value of column 'ind' from the row where months = 1, repeated on every row of the same id:
idnum1 months ind new_col
1 1 X X
1 2 X X
1 3 Y X
1 4 Y X
1 5 X X
2 1 Y Y
2 2 Y Y
2 3 X Y
2 4 X Y
2 5 X Y
The query below assigns ind to new_col only on the rows where months = 1, but I want that value in new_col on all the rows for each id:
create table tmp as
select t1.*,
case when months = 1 then ind end as new_col
from table t1;
I am trying to do this in SAS using PROC SQL.
Ideally you would use RETAIN within a data step:
data want;
set have;
retain new_var;
if months = 1 then new_var = ind;
run;
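The RETAIN logic can be modelled in a few lines of Python to show why it works: a value assigned on a months = 1 row is carried forward to the following rows until the next months = 1 row. This is an illustrative sketch, not SAS; retain_first_month and HAVE are hypothetical names, and it assumes the rows arrive sorted by idnum1 and months, as the DATA step does.

```python
# Plain-Python model of the DATA step above (an illustration, not SAS).
# Without RETAIN, SAS resets new_var to missing at the top of every iteration;
# with RETAIN, the value assigned on a months = 1 row carries forward.


def retain_first_month(rows):
    """rows: (idnum1, months, ind) tuples, sorted by idnum1 and months."""
    result = []
    new_var = None  # stands in for a SAS missing value
    for idnum1, months, ind in rows:
        if months == 1:
            new_var = ind  # a new group starts: capture its month-1 ind
        result.append((idnum1, months, ind, new_var))
    return result


# The question's sample data (new_col omitted, since that is what we compute).
HAVE = [
    (1, 1, "X"), (1, 2, "X"), (1, 3, "Y"), (1, 4, "Y"), (1, 5, "X"),
    (2, 1, "Y"), (2, 2, "Y"), (2, 3, "X"), (2, 4, "X"), (2, 5, "X"),
]
```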
SQL isn't as good at this as a data step.
But assuming your ID variable (idnum1) is repeated on every row of the group, this works, because PROC SQL remerges the summary statistic back onto each row. If it's not, then you really do need the data step approach.
proc sql;
create table want as
select *, max(case when months = 1 then ind end) as new_col
from have
group by idnum1;
quit;
EDIT: If you want to retain the first value per ID, use FIRST.idnum1 instead of months = 1 (the data must be sorted by idnum1):
data want;
set have;
by idnum1;
retain new_var;
if first.idnum1 then new_var = ind;
run;
A more robust PROC SQL approach handles groups whose first month is repeated: it chooses the lowest ind within that first month and distributes it to the whole group.
data have;
input idnum1 months ind $ new_col $;
datalines;
1 1 X X
1 2 X X
1 3 Y X
1 4 Y X
1 5 X X
2 1 Y Y
2 2 Y Y
2 3 X Y
2 4 X Y
2 5 X Y
3 1 Z .
3 1 Y .
3 1 X .
3 2 A .
;
proc sql;
create table want as
select have.idnum1, months, ind, new_col, lowest_first_ind
from have
join
  ( select idnum1, min(ind) as lowest_first_ind
    from
      ( select idnum1, ind
        from have
        group by idnum1
        having months = min(months)
      )
    group by idnum1
  ) value_seeker
on have.idnum1 = value_seeker.idnum1
;
quit;
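The innermost query above relies on PROC SQL remerging (selecting ind while grouping by idnum1), which is SAS-specific. A portable sketch of the same idea, run here against SQLite from Python only so it executes, first finds each id's first month with MIN(months) and then takes MIN(ind) among the rows in that month. The table layout mirrors the have dataset without new_col; QUERY and lowest_first_ind are hypothetical names.

```python
import sqlite3

# The question's rows plus the extra id-3 rows from the datalines above (new_col omitted).
HAVE = [
    (1, 1, "X"), (1, 2, "X"), (1, 3, "Y"), (1, 4, "Y"), (1, 5, "X"),
    (2, 1, "Y"), (2, 2, "Y"), (2, 3, "X"), (2, 4, "X"), (2, 5, "X"),
    (3, 1, "Z"), (3, 1, "Y"), (3, 1, "X"), (3, 2, "A"),
]

QUERY = """
SELECT h.idnum1, h.months, h.ind, v.lowest_first_ind
FROM have h
JOIN (
    -- lowest ind among the rows in each id's first month
    SELECT f.idnum1, MIN(f.ind) AS lowest_first_ind
    FROM have f
    JOIN (SELECT idnum1, MIN(months) AS first_month
          FROM have GROUP BY idnum1) m
      ON f.idnum1 = m.idnum1 AND f.months = m.first_month
    GROUP BY f.idnum1
) v ON h.idnum1 = v.idnum1
ORDER BY h.idnum1, h.months, h.ind
"""


def lowest_first_ind(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE have (idnum1 INTEGER, months INTEGER, ind TEXT)")
    conn.executemany("INSERT INTO have VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()
```

For id 3, the three first-month rows carry Z, Y and X, so every id-3 row receives X.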
If your database supports window functions (SAS PROC SQL does not), you can use:
select t1.*,
max(case when months = 1 then ind end) over (partition by idnum1) as new_col
from t1;
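Because PROC SQL cannot run this query, here is a sketch that executes it against SQLite (3.25 or later, for window functions) from Python, assuming a table t1 with the question's columns; new_col_window and HAVE are hypothetical names. The CASE expression yields NULL on every row except the months = 1 row, and MAX ignores NULLs, so each partition's maximum is exactly its month-1 ind.

```python
import sqlite3

# The question's sample data (new_col omitted, since that is what we compute).
HAVE = [
    (1, 1, "X"), (1, 2, "X"), (1, 3, "Y"), (1, 4, "Y"), (1, 5, "X"),
    (2, 1, "Y"), (2, 2, "Y"), (2, 3, "X"), (2, 4, "X"), (2, 5, "X"),
]


def new_col_window(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (idnum1 INTEGER, months INTEGER, ind TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    # CASE gives NULL when months <> 1; MAX skips NULLs within each idnum1 partition.
    return conn.execute("""
        SELECT t1.idnum1, t1.months, t1.ind,
               MAX(CASE WHEN t1.months = 1 THEN t1.ind END)
                   OVER (PARTITION BY t1.idnum1) AS new_col
        FROM t1
        ORDER BY t1.idnum1, t1.months
    """).fetchall()
```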
If there is only one months = 1 observation per BY group, then just use a simple join:
create table WANT as
select t1.*, t2.ind as new_col
from have t1
left join (select idnum1, ind from have where months = 1) t2
on t1.idnum1 = t2.idnum1
;
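A sketch of the join approach, run against SQLite from Python only so it executes; new_col_join and HAVE are hypothetical names. The test data adds a hypothetical id 4 that has no months = 1 row, to show that the LEFT JOIN keeps such rows and leaves new_col NULL (None in Python).

```python
import sqlite3

# The question's sample data (new_col omitted, since that is what we compute).
HAVE = [
    (1, 1, "X"), (1, 2, "X"), (1, 3, "Y"), (1, 4, "Y"), (1, 5, "X"),
    (2, 1, "Y"), (2, 2, "Y"), (2, 3, "X"), (2, 4, "X"), (2, 5, "X"),
]


def new_col_join(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE have (idnum1 INTEGER, months INTEGER, ind TEXT)")
    conn.executemany("INSERT INTO have VALUES (?, ?, ?)", rows)
    # t2 holds one (idnum1, ind) pair per id, taken from its months = 1 row.
    return conn.execute("""
        SELECT t1.idnum1, t1.months, t1.ind, t2.ind AS new_col
        FROM have t1
        LEFT JOIN (SELECT idnum1, ind FROM have WHERE months = 1) t2
          ON t1.idnum1 = t2.idnum1
        ORDER BY t1.idnum1, t1.months
    """).fetchall()
```

If an id had two months = 1 rows, the join would match each of its rows twice, which is why the answer limits this to one months = 1 observation per group.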

Aggregate a column sequentially based on distinct values

I have a standard query that looks like this:
Select
X
From Y
Left Join Z
On ....
Where A and B
which gives me a table like this:
Product Type Product Sub-type Process Sequence_Number
X X.1 A 1
X X.1 C 2
X X.1 D 3
X X.2 A 1
X X.2 B 2
X X.2 C 3
X X.2 D 4
X X.2 E 5
I want to aggregate all of the product sub-type processes for a specific product type X to arrive at a consolidated process list such as:
Product Type Process
X A
X B
X C
X D
X E
As you can see, the processes common to both sub-types appear only once, and the gaps in sub-type X.1 have been filled in by X.2 processes.
Your results can be generated by using select distinct:
select distinct product_type, process
from . . .
where . . .;
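A runnable sketch of the DISTINCT answer, assuming the question's intermediate result is stored in a hypothetical table named processes (the real FROM and WHERE clauses are elided in the question) and run against SQLite from Python. DISTINCT alone does not guarantee an order, so an ORDER BY is added; ordering by process is alphabetical, which happens to match the desired list here.

```python
import sqlite3

# Rows shaped like the question's intermediate result; the table name
# "processes" and the column names are hypothetical stand-ins.
ROWS = [
    ("X", "X.1", "A", 1), ("X", "X.1", "C", 2), ("X", "X.1", "D", 3),
    ("X", "X.2", "A", 1), ("X", "X.2", "B", 2), ("X", "X.2", "C", 3),
    ("X", "X.2", "D", 4), ("X", "X.2", "E", 5),
]


def consolidated_processes(product_type):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processes (product_type TEXT, product_subtype TEXT,"
        " process TEXT, sequence_number INTEGER)"
    )
    conn.executemany("INSERT INTO processes VALUES (?, ?, ?, ?)", ROWS)
    # DISTINCT removes the processes shared by both sub-types;
    # ORDER BY makes the output order deterministic.
    return conn.execute(
        "SELECT DISTINCT product_type, process FROM processes"
        " WHERE product_type = ? ORDER BY product_type, process",
        (product_type,),
    ).fetchall()
```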

SQL query for counting sets of values

Edit: added more information to clear up some confusion. Thanks.
I am trying to group sets of values in SQL. Given the following table, I want to get the results shown in the second table. I have explored GROUPING SETS in SQL Server 2008, cubes and basic GROUP BY clauses, but I cannot figure out the SQL query. Can someone please help? You can change the format of the result table if you want; the basic idea is how to count identical sets of values. In this table the set a, b, c occurs 2 times, so its count is 2; the set x, y occurs 3 times, so its count is 3; and the set x, y, z occurs once, so its count is 1. Please help.
UserId ProductId
1 a
1 b
1 c
2 x
2 y
3 x
3 y
4 x
4 y
5 a
5 b
5 c
6 x
6 y
6 z
ProductId Count
a 2
b 2
c 2
x 3
y 3
x 1
y 1
z 1
SELECT COUNT(`ProductId`), `ProductId` FROM TABLE_NAME WHERE 1 GROUP BY `ProductId` ORDER BY `ProductId` ASC
SELECT ProductId, COUNT(UserId) AS NbrOfUsers
FROM TABLE_NAME
GROUP BY ProductId
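You're selecting ProductId and the number of UserId values that exist for that ProductId.
GROUP BY ProductId groups the rows by ProductId, and the count of UserId in each group is displayed as NbrOfUsers.
Your output will look like this:
ProductId NbrOfUsers
a 2
b 2
c 2
x 4
y 4
z 1
Note that this counts users per product, not per set of products: user 6's x and y are added to the x and y totals, so it does not produce the per-set counts the question asks for.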
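Counting users per product cannot tell a user who owns exactly {x, y} apart from one who owns {x, y, z}. What the question actually asks for is a count per complete set of products. A sketch of that, assuming a hypothetical table named user_products and using SQLite from Python: the rows are fetched with plain SQL, then each user's products are collected into a frozenset and the sets are counted with collections.Counter. Doing the set-building in Python sidesteps the fact that string aggregation functions differ between databases.

```python
import sqlite3
from collections import Counter, defaultdict

# (UserId, ProductId) rows from the question; the table name is hypothetical.
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "x"), (2, "y"), (3, "x"), (3, "y"), (4, "x"), (4, "y"),
    (5, "a"), (5, "b"), (5, "c"),
    (6, "x"), (6, "y"), (6, "z"),
]


def count_product_sets(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_products (UserId INTEGER, ProductId TEXT)")
    conn.executemany("INSERT INTO user_products VALUES (?, ?)", rows)

    # Collect each user's complete set of products.
    products_by_user = defaultdict(set)
    for user_id, product_id in conn.execute("SELECT UserId, ProductId FROM user_products"):
        products_by_user[user_id].add(product_id)

    # Count how many users have exactly each combination.
    set_counts = Counter(frozenset(p) for p in products_by_user.values())

    # Flatten to (ProductId, Count) rows, one block per distinct set.
    result = []
    for product_set, n in sorted(set_counts.items(), key=lambda kv: sorted(kv[0])):
        result.extend((product, n) for product in sorted(product_set))
    return result
```

count_product_sets(ROWS) returns the rows of the question's desired table in the same order: a, b, c with 2; x, y with 3; then x, y, z with 1.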

SQL query: same rows

I'm having trouble finding the right sql query. I want to select all the rows with a unique x value and if there are rows with the same x value, then I want to select the row with the greatest y value. As an example I've put a part of my database below.
ID x y
1 2 3
2 1 5
3 4 6
4 4 7
5 2 6
The selected rows should then be those with ID 2, 4 and 5.
This is what I've got so far
SELECT *
FROM base
WHERE x IN
(
SELECT x
FROM base
GROUP BY x
HAVING COUNT(*) > 1
)
But this only results in the rows that occur more than once. I've added the tags R, postgresql and sqldf because I'm working in R with those packages.
Here is a typical way to formulate the query in ANSI SQL:
select b.*
from base b
where not exists (select 1
from base b2
where b2.x = b.x and
b2.y > b.y
);
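Because the asker mentions sqldf, which typically runs its queries through SQLite, here is a sketch that executes the NOT EXISTS query against SQLite from Python using the question's rows; max_y_per_x and BASE are hypothetical names. A row is kept only if no other row with the same x has a larger y.

```python
import sqlite3

# (ID, x, y) rows from the question.
BASE = [(1, 2, 3), (2, 1, 5), (3, 4, 6), (4, 4, 7), (5, 2, 6)]


def max_y_per_x(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base (ID INTEGER, x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO base VALUES (?, ?, ?)", rows)
    # Keep b only if no row b2 with the same x has a strictly larger y.
    return [row[0] for row in conn.execute("""
        SELECT b.* FROM base b
        WHERE NOT EXISTS (SELECT 1 FROM base b2
                          WHERE b2.x = b.x AND b2.y > b.y)
        ORDER BY b.ID
    """)]
```

max_y_per_x(BASE) returns the IDs 2, 4 and 5, as the question expects.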
In Postgres, you would use distinct on for performance:
select distinct on (x) b.*
from base b
order by x, y desc;
You could try this query:
select x, max(y) from base group by x;
And, if you'd also like the id column in the result:
select base.*
from base join (select x, max(y) as max_y from base group by x) as maxima
on (base.x = maxima.x and base.y = maxima.max_y);
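A sketch of the join-to-maxima approach, run against SQLite from Python; max_y_join and BASE are hypothetical names. The aggregate is given an explicit alias (max_y) because the default name of an unaliased max(y) column differs between databases. A hypothetical extra row (6, 1, 5), which ties with ID 2, shows that this join keeps every row that reaches the maximum, whereas DISTINCT ON keeps exactly one row per x.

```python
import sqlite3

# (ID, x, y) rows from the question.
BASE = [(1, 2, 3), (2, 1, 5), (3, 4, 6), (4, 4, 7), (5, 2, 6)]


def max_y_join(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base (ID INTEGER, x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO base VALUES (?, ?, ?)", rows)
    # Join each row to its group's maximum y; only rows equal to it survive.
    return [row[0] for row in conn.execute("""
        SELECT base.ID
        FROM base
        JOIN (SELECT x, MAX(y) AS max_y FROM base GROUP BY x) AS maxima
          ON base.x = maxima.x AND base.y = maxima.max_y
        ORDER BY base.ID
    """)]
```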
Example:
CREATE TABLE tmp(id int, x int ,y int);
INSERT INTO .....
test=# SELECT x, max(y) AS y FROM tmp GROUP BY x;
x | y
---+---
4 | 7
1 | 5
2 | 6

PL/SQL or SSRS: how to select groups that have all values in a list?

I have a table like this.
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
4 B X
5 B Y
6 C X
7 C Z
8 D Z
9 E X
And the query:
SELECT * FROM TABLE1 T WHERE T.VALUE IN ('X', 'Z')
This query gives me
ID NAME VALUE
______________
1 A X
3 A Z
4 B X
6 C X
7 C Z
8 D Z
9 E X
But I want to see all rows for the names that have all of the parameters. Only A and C have both the X and Z values, so my desired result is:
ID NAME VALUE
______________
1 A X
2 A Y
3 A Z
6 C X
7 C Z
How can I get the desired result? Either SQL or Reporting Services is fine. Maybe a GROUP BY ... HAVING clause will help, but I'm not sure.
By the way, I don't know how many parameters will be in the list.
I really appreciate any help.
The standard approach would be something like this:
SELECT id, name, value
FROM table1 a
WHERE name IN (SELECT name
FROM table1 b
WHERE b.value IN ('X', 'Z')
GROUP BY name
HAVING COUNT(distinct value) = 2)
That requires you to know how many values are in the list, so that you use 2 in the HAVING clause if there are 2 elements, 5 if there are 5 elements, and so on. You could also use analytic functions:
SELECT id, name, value
FROM (SELECT id,
name,
value,
count(distinct case when value in ('X', 'Z') then value end)
over (partition by name) cnt
FROM table1 t1)
WHERE cnt = 2
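Since the asker does not know how many parameters will be in the list, the IN list and the HAVING count can both be derived from the list at run time. A sketch of that, using SQLite from Python with the question's rows; names_with_all is a hypothetical helper. Only "?" placeholders are spliced into the SQL text and the values themselves are bound, and the list is de-duplicated first so the count in HAVING matches.

```python
import sqlite3

# (id, name, value) rows from the question.
ROWS = [
    (1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"),
    (4, "B", "X"), (5, "B", "Y"),
    (6, "C", "X"), (7, "C", "Z"),
    (8, "D", "Z"), (9, "E", "X"),
]


def names_with_all(wanted):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    wanted = sorted(set(wanted))  # de-duplicate so the HAVING count is right
    # Build one "?" per requested value; the values are passed as parameters.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT id, name, value FROM table1
        WHERE name IN (SELECT name FROM table1
                       WHERE value IN ({placeholders})
                       GROUP BY name
                       HAVING COUNT(DISTINCT value) = ?)
        ORDER BY id
    """
    return conn.execute(sql, [*wanted, len(wanted)]).fetchall()
```

names_with_all(["X", "Z"]) returns the five rows of the desired result (ids 1, 2, 3, 6 and 7).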
I prefer to structure these "sets within sets" queries as an aggregation. I find this is the most flexible approach:
select t.*
from t
where t.name in (select name
from t
group by name
having sum(case when value = 'X' then 1 else 0 end) > 0 and
sum(case when value = 'Z' then 1 else 0 end) > 0
)
The subquery for the IN finds all names that have at least one X value and at least one Z value. Using the same logic, it is easy to adjust for other conditions (X and Y and Z; X and Z but not Y; and so on). The outer query just returns all the rows instead of the names.
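A sketch of the conditional-aggregation approach, run against SQLite from Python with the question's rows; rows_for_names and the HAVING-clause constants are hypothetical names. It also shows the flexibility the answer mentions: "X and Z but not Y" needs only one extra condition requiring the Y count to be zero.

```python
import sqlite3

# (id, name, value) rows from the question.
ROWS = [
    (1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"),
    (4, "B", "X"), (5, "B", "Y"),
    (6, "C", "X"), (7, "C", "Z"),
    (8, "D", "Z"), (9, "E", "X"),
]

# Names with at least one X row and at least one Z row.
HAS_X_AND_Z = """
    HAVING SUM(CASE WHEN value = 'X' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN value = 'Z' THEN 1 ELSE 0 END) > 0
"""
# Variant: X and Z but not Y, which only needs one more condition.
HAS_X_AND_Z_NOT_Y = HAS_X_AND_Z + """
       AND SUM(CASE WHEN value = 'Y' THEN 1 ELSE 0 END) = 0
"""


def rows_for_names(having_clause):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    # The subquery picks the names; the outer query returns all of their rows.
    sql = ("SELECT t.id FROM t WHERE t.name IN "
           "(SELECT name FROM t GROUP BY name " + having_clause + ") ORDER BY t.id")
    return [row[0] for row in conn.execute(sql)]
```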