PostgreSQL aggregate array - sql

I have two tables:
Student
--------
Id Name
1 John
2 David
3 Will
Grade
---------
Student_id Mark
1 A
2 B
2 B+
3 C
3 A
Is it possible to write a native PostgreSQL SELECT that returns results like this:
Name Array of marks
-----------------------
'John', {'A'}
'David', {'B','B+'}
'Will', {'C','A'}
and not like this:
Name Mark
----------------
'John', 'A'
'David', 'B'
'David', 'B+'
'Will', 'C'
'Will', 'A'

Use array_agg: http://www.sqlfiddle.com/#!1/5099e/1
SELECT s.name, array_agg(g.Mark) as marks
FROM student s
LEFT JOIN Grade g ON g.Student_id = s.Id
GROUP BY s.Id
By the way, if you are using Postgres 9.1 or later, you don't need to repeat the SELECT columns in GROUP BY; for example, you don't need to list the student name there. Grouping by the primary key alone is enough, because the other columns of that table are functionally dependent on it. If you remove the primary key on student, you need to add the student name to GROUP BY.
CREATE TABLE grade
(Student_id int, Mark varchar(2));
INSERT INTO grade
(Student_id, Mark)
VALUES
(1, 'A'),
(2, 'B'),
(2, 'B+'),
(3, 'C'),
(3, 'A');
CREATE TABLE student
(Id int primary key, Name varchar(5));
INSERT INTO student
(Id, Name)
VALUES
(1, 'John'),
(2, 'David'),
(3, 'Will');
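A hedged variation on the array_agg query above, for the case where student has no primary key, so that both Id and Name must be listed in GROUP BY. The ORDER BY inside array_agg is an addition that is not part of the original answer; it only makes the element order deterministic. This is a sketch against the student and grade tables defined above, not a replacement for the accepted query.

```sql
-- Sketch: same result shape as the array_agg query above, but it does not
-- rely on student.Id being declared as a primary key.
SELECT s.Name,
       array_agg(g.Mark ORDER BY g.Mark) AS marks  -- ORDER BY is an optional addition
FROM student s
LEFT JOIN grade g ON g.Student_id = s.Id
GROUP BY s.Id, s.Name;
```

Because of the LEFT JOIN, a student with no rows in grade still appears in the result, with an array that holds a single NULL.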

As I understand it, you can do something like this:
SELECT Student.Name,
STRING_AGG(Grade.Mark, ',' ORDER BY Grade.Mark) AS marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
EDIT
I am not sure, but maybe something like this then:
SELECT Student.Name,
    array_to_string(ARRAY_AGG(Grade.Mark),';') As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
Reference here

You could use the following:
SELECT Student.Name as Name,
(SELECT array(SELECT Mark FROM Grade WHERE Grade.Student_id = Student.Id))
AS ArrayOfMarks
FROM Student
As described here: http://www.mkyong.com/database/convert-subquery-result-to-array/

Michael Buen got it right. I got what I needed using array_agg.
Here just a basic query example in case it helps someone:
SELECT directory, ARRAY_AGG(file_name)
FROM table
WHERE type = 'ZIP'
GROUP BY directory;
And the result was something like:
| directory | array_agg |
+-------------------------+----------------------------------------+
| /home/postgresql/files | {zip_1.zip,zip_2.zip,zip_3.zip} |
| /home/postgresql/files2 | {file1.zip,file2.zip} |
This post also helped me a lot: "Group By" in SQL and Python Pandas.
It basically says that it is more convenient to use only PSQL when possible, but that Python Pandas can be useful to achieve extra functionalities in the filtering process.

Related

SQL select query order by on where in

How can I make the ORDER BY follow the order of the values I list in the WHERE clause?
Example query:
select * from student where stud_id in (
'5',
'3',
'4'
)
The result should be:
id| name |
5 | John |
3 | Erik |
4 | Michael |
Kindly help me thanks.
One method is with a derived table:
select s.*
from student s join
(values (5, 1), (3, 2), (4, 3)
) v(stud_id, ord)
on v.stud_id = s.stud_id
order by v.ord;
stud_id looks like a number so I dropped the single quotes. Numbers should be compared to numbers. If it is really a string, then use the single quotes.
As Gordon mentioned, you need something to provide order. An IN clause doesn't have a pre-defined order, just like a table doesn't. Rather than numbering the row order yourself, you could have a table variable do it like this:
DECLARE @StudentIDs TABLE
(
StudentIDKey int IDENTITY(1,1) PRIMARY KEY,
StudentID int
);
INSERT @StudentIDs (StudentID)
VALUES
(5),
(3),
(4);
SELECT *
FROM Student AS s
INNER JOIN @StudentIDs AS id
ON s.StudentID = id.StudentID
ORDER BY id.StudentIDKey;
That should be far easier if you have a lot of values to work with.
Hope that helps.
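For a short list, the same idea (supplying an explicit sort key, since IN itself carries no order) can be sketched with a CASE expression in the ORDER BY. This is only an illustration using the stud_id column and the values from the question, and it assumes stud_id is numeric as in the derived-table answer; it becomes unwieldy for long lists, where the table-based approaches above scale better.

```sql
-- Sketch: order rows by the position of each id in the IN list.
-- Assumes stud_id is numeric; quote the literals if it is really a string.
SELECT s.*
FROM student s
WHERE s.stud_id IN (5, 3, 4)
ORDER BY CASE s.stud_id
             WHEN 5 THEN 1
             WHEN 3 THEN 2
             WHEN 4 THEN 3
         END;
```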

Convert Comma separated ids into its assigned values

I am writing a view for the data export feature, so basically the users need to see all the columns with the data associated with them.
I have a column LanguagesSpoken in a table, where we store the values as a comma-separated list: 1,2,3, etc.,
where 1 is English, 2 is Germany, 3 is Spanish, etc. These names are stored in a different table.
StaffID LanguagesSpoken
---------- -------------
1 1,2,3
2 3,4
3 2,5
So when we query the view, the expected output should be:
StaffID LanguagesSpoken
---------- -------------
1 English, Germany, Spanish
2 Spanish,Hindi
3 Germany,Arabic
You can use the following to split the LanguagesSpoken string, then join with the Language table and use string_agg to get what you want. As others mentioned, your schema design needs to be fixed, so this will also help you move the data into the new schema:
SELECT StaffID, value
FROM StaffLanguagesSpoken
CROSS APPLY string_split(LanguagesSpoken, ',')
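The split step above can be combined with the join and the aggregation in one statement. The following is a sketch only: it assumes SQL Server 2017 or later (for STRING_AGG), a lookup table named languages with columns id and name, and that every element of the list is a valid integer. The alias parts is introduced here purely for illustration.

```sql
-- Sketch: split the list, look up each language name, and glue the names back together.
-- Assumed names: languages(id, name) as the lookup table, "parts" as the split alias.
SELECT s.StaffID,
       STRING_AGG(l.name, ', ') WITHIN GROUP (ORDER BY l.id) AS LanguagesSpoken
FROM StaffLanguagesSpoken AS s
CROSS APPLY STRING_SPLIT(s.LanguagesSpoken, ',') AS parts
JOIN languages AS l
  ON l.id = CAST(parts.value AS INT)   -- STRING_SPLIT returns strings in a column named value
GROUP BY s.StaffID;
```

The same SELECT without the GROUP BY and STRING_AGG (one row per StaffID and language id) is what you would insert into a normalized staff-to-language table.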
For a table containing the languages like this:
CREATE TABLE languages (
id INTEGER,
name VARCHAR(20)
);
INSERT INTO languages
(id, name)
VALUES
(1, 'English'),
(2, 'Germany'),
(3, 'Spanish'),
(4, 'Hindi'),
(5, 'Arabic');
you can join the tables, group by StaffID and use string_agg():
select
t.StaffID,
string_agg(l.name, ',') within group (order by l.id) LanguagesSpoken
from tablename t inner join languages l
on concat(',', t.languagesspoken, ',') like concat('%,', l.id, ',%')
group by t.StaffID
See the demo.
Results:
> StaffID | LanguagesSpoken
> ------: | :----------------------
> 1 | English,Germany,Spanish
> 2 | Spanish,Hindi
> 3 | Germany,Arabic

SQL - EXIST OR ALL?

I have two tables, student and grade;
the grade table has an attribute student_id which references student_id in the student table.
How do I find which students have every grade that exists?
In case this is not clear, here is some sample data:
Student ID Name
1 1 John
2 2 Paul
3 3 George
4 4 Mike
5 5 Lisa
Grade Student_Id Course Grade
1 1 Math A
2 1 English B
3 1 Physics C
4 2 Math A
5 2 English A
6 2 Physics B
7 3 Economics A
8 4 Art C
9 5 Biology A
Assume there are only grades A, B and C (no D, E or fail).
I want to find only John, because he has grades A, B and C, while
other students like Paul (2) should not be selected, because he does not have grade C. It does not matter which courses he took; I just need to find out whether he has all the grades available.
I feel like I should use something like EXISTS or ALL in SQL, but I'm not sure.
Please help. Thank you in advance.
I would use GROUP BY and HAVING, but like this:
SELECT s.Name
FROM Student s JOIN
Grade g
ON s.ID = g.Student_Id
GROUP BY s.id, s.Name
HAVING COUNT(DISTINCT g.Grade) = (SELECT COUNT(DISTINCT g2.grade) FROM grade g2);
You say "all the grades out there", so the query should not use a constant for that.
You can use HAVING COUNT(DISTINCT Grade) = 3 to check that the student has all 3 grades:
SELECT Name
FROM Student S
JOIN Grade G ON S.ID = G.Student_Id
GROUP BY Name
HAVING COUNT(DISTINCT Grade) = 3
Guessing at S.ID vs S.Student on the join. Not sure what the difference is there.
By using exists
select * from student s
where exists ( select 1
from grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Example
with Student as
(
select 1 as id,'John' as person
union all
select 2 as id,'Paul' as person
union all
select 3 as id,'jorge'
),
Grades as
(
select 1 as Graden, 1 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 2 as Graden, 1 as Student_Id, 'English' as Course, 'B' as Grade
union all
select 3 as Graden, 1 as Student_Id, 'Physics' as Course, 'C' as Grade
union all
select 4 as Graden, 2 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 5 as Graden, 2 as Student_Id, 'English' as Course, 'A' as Grade
union all
select 6 as Graden, 2 as Student_Id, 'Physics' as Course, 'B' as Grade
)
select * from Student s
where exists ( select 1
from Grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Note: I used having count(distinct Grade)=3 because your sample data contains 3 distinct grades.
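If a formulation without a hard-coded count is preferred, the classic "relational division" pattern uses two nested NOT EXISTS clauses: select the students for whom there is no existing grade that they lack. This is a sketch using the Student and Grade table and column names from the question's sample data, which is an assumption, since the answers above vary between Grade and Grades.

```sql
-- Sketch: students for whom no existing grade value is missing.
SELECT s.Name
FROM Student s
WHERE NOT EXISTS (
    SELECT 1
    FROM (SELECT DISTINCT g0.Grade FROM Grade g0) every_grade  -- all grades that occur
    WHERE NOT EXISTS (
        SELECT 1
        FROM Grade g
        WHERE g.Student_Id = s.ID
          AND g.Grade = every_grade.Grade   -- this student has that grade
    )
);
```

With the sample data from the question, only John has A, B and C, so only John is returned. If the Grade table is empty, the condition is vacuously true and every student is returned.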
Before delving into the answer, here's a working SQL Fiddle Example so you can see this in action.
As Gordon Linoff points out in his excellent answer, you should use GROUP BY and HAVING COUNT(DISTINCT ...) as an easy way to check.
However, I'd recommend changing your design to ensure that you have tables for each concern.
Currently your Grade table holds each student's grade per course. So it's more of a StudentCourse table (i.e. it's the combination of student and course that's unique / gives you that table's natural key). You should have an actual Grade table to give you the list of available grades; e.g.
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
This ensures that your query still works if you later add grades D and E, without amending any code. It also means you only have to query a small table to get the complete list of grades, rather than a potentially huge one, which should give better performance. Finally, it helps you maintain good data: because the validation lives in the database as a constraint, you cannot accidentally end up with a student holding grade X due to a typo.
select Name from Student s
where s.Id in
(
select sc.StudentId
from StudentCourse sc
group by sc.StudentId
having count(distinct sc.Grade) = (select count(Code) from Grade)
)
order by s.Name
Likewise, it's sensible to create a Course table. In this case holding Ids for each course; since holding the full course name in your StudentCourse table (as we're now calling it) uses up a lot more space and again lacks validation / constraints. As such, I'd propose amending your database schema to look like this:
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
create table Course
(
Id bigint not null identity(1,1) constraint PK_Course primary key clustered
, Name nvarchar(128) not null constraint UK_Course_Name unique
)
insert Course (Name) values ('Math'),('English'),('Physics'),('Economics'),('Art'),('Biology')
create table Student
(
Id bigint not null identity(1,1) constraint PK_Student primary key clustered
,Name nvarchar(128) not null constraint UK_Student_Name unique
)
set identity_insert Student on --inserting with IDs to ensure the ids of these students match data from your question
insert Student (Id, Name)
values (1, 'John')
, (2, 'Paul')
, (3, 'George')
, (4, 'Mike')
, (5, 'Lisa')
set identity_insert Student off
create table StudentCourse
(
Id bigint not null identity(1,1) constraint PK_StudentCourse primary key
, StudentId bigint not null constraint FK_StudentCourse_StudentId foreign key references Student(Id)
, CourseId bigint not null constraint FK_StudentCourse_CourseId foreign key references Course(Id)
, Grade char /* allow null in case we use this table for pre-results; otherwise make non-null */ constraint FK_StudentCourse_Grade foreign key references Grade(Code)
, Constraint UK_StudentCourse_StudentAndCourse unique clustered (StudentId, CourseId)
)
insert StudentCourse (StudentId, CourseId, Grade)
select s.Id, c.Id, x.Grade
from (values
('John', 'Math', 'A')
,('John', 'English', 'B')
,('John', 'Physics', 'C')
,('Paul', 'Math', 'A')
,('Paul', 'English', 'A')
,('Paul', 'Physics', 'B')
,('George', 'Economics','A')
,('Mike', 'Art', 'C')
,('Lisa', 'Biology', 'A')
) x(Student, Course, Grade)
inner join Student s on s.Name = x.Student
inner join Course c on c.Name = x.Course

sql select sum conditions

I'm studying SQL (by myself) and I would like to know how to solve these examples:
1- I created the 3 tables below:
CREATE TABLE Business (
Id INT,
Category INT,
Business_Name VARCHAR(30),
City_Id INT,
Billing INT
);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(1, 1, 'Bread', 1, 50);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(2, 2, 'Oreo', 2, 10);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(3, 2, 'Pizza', 3, 15);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(4, 2, 'Beer', 4, 25);
INSERT INTO business (Id, Category, Business_Name, City_Id, Billing) VALUES(5, 1, 'Steak', 1, 80);
CREATE TABLE City (
Id INT,
City_Name VARCHAR(30)
);
INSERT INTO City (Id, City_Name) VALUES(1, 'Paris');
INSERT INTO City (Id, City_Name) VALUES(2, 'New York');
INSERT INTO City (Id, City_Name) VALUES(3, 'Tokio');
INSERT INTO City (Id, City_Name) VALUES(4, 'Vancouver');
INSERT INTO City (Id, City_Name) VALUES(5, 'Cairo');
CREATE TABLE Category (
Id INT,
Category_Name VARCHAR(30)
);
INSERT INTO Category (Id, Category_Name) VALUES(1, 'Bar');
INSERT INTO Category (Id, Category_Name) VALUES(2, 'Pub');
INSERT INTO Category (Id, Category_Name) VALUES(3, 'Pizza');
2- I want to make these SQL queries:
a) Total Value of Billing (Billing) all stores, like this table:
-----------------------
|Business_Name | Total |
|--------------+-------|
|Total | 180 |
------------------------
b) All Total Billing by Category_Name like this table:
-------------------
|Category | Total |
|---------+-------|
|Bar | 130 |
|---------+-------|
|Pub | 50 |
|---------+-------|
|Pizza | 5 |
----------+--------
c)List the Business_Name with min billing, showing the: Category_Name, Business_Name, and Billing like this table:
----------------------------------------
|Category_Name | Business_Name | Total |
|--------------+---------------+-------|
|Pub | Beer | 5 |
|--------------+---------------+--------
d) All Total of Billing by City, showing the: Category_Name, Business_Name, City_Name and Billing like this table
--------------------------
|City | Total |
|----------------+-------|
|Cairo | 0 |
|----------------+-------|
|New York | 10 |
|----------------+-------|
|Paris | 130 |
|----------------+-------|
|Tokio | 15 |
-----------------+--------
|Vancouver | 25 |
-----------------+--------
Could anybody with a little more knowledge help me, please? =)
First things first: all of these are basic queries, and I have to point out that a simple Google search for tutorials (ex1, ex2, ex3) would have answered most of them. As we are here to provide help and guidance, I hope you take it to heart and read the tutorials before going over the answers.
With that said, to help you out I will walk through each query and give an overview of what is happening.
a) You need an aggregate operation here to sum up the values, so you would use the SUM function. Normally you would need a GROUP BY, but since the only other column is the hard-coded constant 'Total', it is not required. We also give each column an alias to match your table; the alias goes after the column expression.
select 'Total' as business_Name,
sum(billing) Total
from business
b) This one is almost an exact copy of a), but requires grouping. You have to list in GROUP BY every column that is not inside an aggregate; here that is only the category name. It is good practice not to use ordinal positions in GROUP BY and ORDER BY; spell out the columns you are using whenever you can.
select c.category_name,
sum(billing) total
from business b
inner join category c
on b.category = c.id
group by c.category_name
c) We continue to build on the query: add another column to the SELECT list, then add the same column to the GROUP BY.
select c.category_name,
b.business_name,
sum(billing) total
from business b
inner join category c
on b.category = c.id
group by c.category_name, b.business_name
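Note that the query above lists every business with its total; it does not by itself pick out the minimum. One way to keep only the business with the lowest billing, sketched here against the Business and Category tables from the question, is to compare each row with a scalar subquery.

```sql
-- Sketch: the business (or businesses, if there is a tie) with the lowest billing.
SELECT c.category_name,
       b.business_name,
       b.billing AS total
FROM business b
INNER JOIN category c
        ON b.category = c.id
WHERE b.billing = (SELECT MIN(billing) FROM business);
```

With the data as inserted in the question, the lowest billing is 10, so this returns the single row Pub, Oreo, 10 rather than the Beer, 5 row shown in the expected output.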
d) This query is very similar to b), but instead of category we join to city on the city id.
select c.city_name
,sum(billing) as total
from business b
inner join city c on c.id = b.city_id
group by c.city_name
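The expected output in the question also lists Cairo with a total of 0, which an inner join cannot produce because no business references that city. A sketch that starts from the City table and uses a LEFT JOIN with COALESCE keeps such cities in the result; the alias ci is introduced here only for illustration.

```sql
-- Sketch: one row per city, including cities without any business.
SELECT ci.city_name,
       COALESCE(SUM(b.billing), 0) AS total   -- SUM over no rows is NULL, so default to 0
FROM city ci
LEFT JOIN business b
       ON b.city_id = ci.id
GROUP BY ci.city_name
ORDER BY ci.city_name;
```

With the sample data this gives Cairo 0, New York 10, Paris 130, Tokio 15 and Vancouver 25.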
With all of this said, several of your expected outputs do not match the data you provided; these queries return results based on that data as given.
I really do recommend going through some tutorials to get a better grasp of the basics of SQL.
Here is an answer to one of the queries. But I'd recommend reading basic SQL tutorials online; you will then be able to write the others yourself easily.
b)
select c.category_name
,sum(billing)
from business b
join category c
on b.category = c.id
group by 1

oracle search using duplicate values - how do I return duplicate results

I have a requirement to return values from a SELECT statement, regardless of if they are duplicate or not.
Example:
SELECT * FROM PEOPLE_TABLE
WHERE PERSON_ID = 1
AND PERSON_ID = 1;
It obviously just returns a single record e.g.
(person_id, name)
"1", "Henry"
I would like my results to return
"1", "Henry"
"1", "Henry"
What's the best way to achieve this? My actual query joins a few tables and uses WHERE IN, and then specifies about 600 values (200 unique values).
Using WHERE IN will suppress duplicates. If you're doing something like this:
CREATE TABLE PEOPLE_TABLE (PERSON_ID NUMBER, NAME VARCHAR2(10));
INSERT INTO PEOPLE_TABLE VALUES (1, 'Henry');
INSERT INTO PEOPLE_TABLE VALUES (2, 'George');
INSERT INTO PEOPLE_TABLE VALUES (3, 'Jane');
SELECT PERSON_ID, NAME
FROM PEOPLE_TABLE
WHERE PERSON_ID IN (1, 1, 3);
PERSON_ID NAME
---------- ----------
1 Henry
3 Jane
Then you can join instead, against a table collection that contains the same target values:
SELECT PT.PERSON_ID, PT.NAME
FROM TABLE(SYS.ODCINUMBERLIST(1, 1, 3)) T
JOIN PEOPLE_TABLE PT
ON PT.PERSON_ID = T.COLUMN_VALUE;
PERSON_ID NAME
---------- ----------
1 Henry
1 Henry
3 Jane
SQL Fiddle demo. (Or with strings, from a comment).
You can create your own schema-level table collection type if you prefer and have privileges to do that, or use a built-in one like SYS.ODCINUMBERLIST or SYS.ODCIVARCHAR2LIST, depending on your actual data type.
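As a sketch of the schema-level option, a nested table type can be created once and then used in place of the built-in collection. The type name person_id_list is hypothetical, and creating it assumes you hold the CREATE TYPE privilege.

```sql
-- Hypothetical schema-level collection type (the name is an assumption).
CREATE TYPE person_id_list AS TABLE OF NUMBER;
/

-- Same join as above, using the custom type; duplicates in the list are preserved.
SELECT PT.PERSON_ID, PT.NAME
FROM TABLE(person_id_list(1, 1, 3)) T
JOIN PEOPLE_TABLE PT
ON PT.PERSON_ID = T.COLUMN_VALUE;
```

For string keys, the equivalent would be a type declared as TABLE OF VARCHAR2 with a suitable length.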
To repeat the result, use UNION ALL:
SELECT * FROM PEOPLE_TABLE
WHERE PERSON_ID = 1
UNION ALL
SELECT * FROM PEOPLE_TABLE
WHERE PERSON_ID = 1
Actually, I think you're asking for something else, but I can't figure out what.
I would assume your PERSON_ID is a number.
You only need one criterion:
select * from PEOPLE_TABLE
where PERSON_ID = 1
http://sqlfiddle.com/#!15/3ff40/5