Postgresql check for double entries - sql

I searched for nearly an hour to solve my problem, but I can't find anything.
So:
I created a table named s (Suppliers) that lists suppliers for parts. It looks like this:
insert into S(sno, sname, status, city)
values ('S1', 'Smith', 20, 'London'),
('S2', 'Jones', 10, 'Paris'),
('S3', 'Blake', 30, 'Paris'),
('S4', 'Clark', 20, 'London'),
('S5', 'Adams', 30, 'Athens');
Now I want to check this table for duplicate entries in the column "city" (here that would be London and Paris), sort the matching rows by sno, and print them out.
I know that it's a bit harder in Postgres than in MySQL, and I tried it like this:
SELECT sno, COUNT(city) AS NumOccurencies FROM s GROUP BY sno HAVING ( COUNT (city) > 1 );
But all I get is an empty table :(. I tried different ways, but the result is always the same, and to be honest I don't know what to do. I hope some of you can help me out here :).
Greetings Max

You're thinking about it a little backwards. By grouping by the sno you're finding all of those rows with the same sno, not the same city. Try this instead:
SELECT
city
FROM
S
GROUP BY
city
HAVING
COUNT(*) > 1
You can then use that as a subquery to find the rows that you want:
SELECT
sno, sname, status, city
FROM
S
WHERE
city IN
(
SELECT
city
FROM
S
GROUP BY
city
HAVING
COUNT(*) > 1
)
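To try the subquery approach without a database server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite standing in for PostgreSQL is an assumption made only for convenience, and the ORDER BY sno is added because the question asked for that sort order.

```python
import sqlite3

# SQLite is used here only as a stand-in for PostgreSQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO s (sno, sname, status, city) VALUES (?, ?, ?, ?)",
    [
        ("S1", "Smith", 20, "London"),
        ("S2", "Jones", 10, "Paris"),
        ("S3", "Blake", 30, "Paris"),
        ("S4", "Clark", 20, "London"),
        ("S5", "Adams", 30, "Athens"),
    ],
)

# Inner query: cities that occur more than once.
# Outer query: every supplier in one of those cities, sorted by sno.
duplicate_city_rows = conn.execute(
    """
    SELECT sno, sname, status, city
    FROM s
    WHERE city IN (SELECT city FROM s GROUP BY city HAVING COUNT(*) > 1)
    ORDER BY sno
    """
).fetchall()

for row in duplicate_city_rows:
    print(row)
```

With the sample data from the question this should list S1 to S4 and leave out S5 (Athens appears only once).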

Related

Trying to show Count of Person, Spouse, Kids

I am doing
select
count(distinct(ssn))
, count (distinct(ssn + spousename))
, count(distinct(ssn + spousename + kidname))
My problem is that if the spouse name or kid name are blank there is no spouse or kid so they should not be counted.
How would you bypass a blank value from the count?
data:
SSN | Person | Spouse | Child
:---------- | :----- | :----- | :-----
111-11.... | John | (none) | (none)
222-22..... | Jane | Jim | Jack
333-33..... | Jerry | (none) | Jack
333-33..... | Jerry | (none) | Jill
444-44..... | John | Judy | (none)
My answer should be 4 people, 2 spouses and 3 kids because I am doing a count of unique values that don't include blank.
I can't show the real SSNs and names, so it looks like fake data.
Thank you!
You may try COALESCE:
select
count(distinct(ssn))
, count (distinct(ssn + spousename))
, count(distinct(ssn + spousename + coalesce(kidname,'')))
Try this:
SELECT
COUNT(ssn) AS people,
SUM(spouses) AS spouses,
SUM(children) AS children
FROM
(
SELECT
ssn,
MAX(person) AS person,
COUNT(spouse) AS spouses,
COUNT(child) AS children
FROM
TableName
GROUP BY
ssn
) BySSN
;
Providing your data in an accurate (and of course made-up!) and consumable (DDL+DML) format is critical to getting the correct answer.
Knowing whether you use null or blank also makes a big difference.
I am a big fan of counting exactly what you want to count, rather than relying on distinct.
declare @MyData table (SSN varchar(6), Person varchar(12), Spouse varchar(12), Child varchar(12));
insert into @MyData (SSN, Person, Spouse, Child)
values
('111-11', 'John', null, null),
('222-22', 'Jane', 'Jim', 'Jack'),
('333-33', 'Jerry', null, 'Jack'),
('333-33', 'Jerry', null, 'Jill'),
('444-44', 'John', 'Judy', null);
-- count(column) skips nulls, so each SSN gets its number of non-null spouses and children
with cte as (
select count(Spouse) Spouse, count(Child) Child
from @MyData
group by SSN
)
select count(*) [Num People], sum(Spouse) [Num Spouses], sum(Child) [Num Children]
from cte;
Returns:
Num People | Num Spouses | Num Children
:--------- | :---------- | :-----------
4 | 2 | 3
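For anyone who wants to check the numbers without SQL Server, here is a rough sketch of the same idea run through Python's sqlite3 module. The table name my_data and the use of SQLite are assumptions; the rows are the sample rows from the answer above. It relies on COUNT(column) skipping NULLs, so blanks would have to be stored as NULL (or converted with NULLIF) for this to work.

```python
import sqlite3

# SQLite stands in for SQL Server here; my_data is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (ssn TEXT, person TEXT, spouse TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO my_data (ssn, person, spouse, child) VALUES (?, ?, ?, ?)",
    [
        ("111-11", "John", None, None),
        ("222-22", "Jane", "Jim", "Jack"),
        ("333-33", "Jerry", None, "Jack"),
        ("333-33", "Jerry", None, "Jill"),
        ("444-44", "John", "Judy", None),
    ],
)

# Step 1: one row per SSN with its number of non-NULL spouses and children.
# Step 2: count those rows (people) and add up the per-SSN counts.
people, spouses, children = conn.execute(
    """
    WITH by_ssn AS (
        SELECT ssn, COUNT(spouse) AS spouses, COUNT(child) AS children
        FROM my_data
        GROUP BY ssn
    )
    SELECT COUNT(*), SUM(spouses), SUM(children)
    FROM by_ssn
    """
).fetchone()

print(people, spouses, children)
```

One caveat: if the same spouse were repeated on several rows of one SSN, COUNT(spouse) would count each repeat, so COUNT(DISTINCT spouse) may be the safer choice for real data.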

How to self join only a subset of rows in PostgreSQL?

Given the following table:
CREATE TABLE people (
name TEXT PRIMARY KEY,
age INT NOT NULL
);
INSERT INTO people VALUES
('Lisa', 30),
('Marta', 27),
('John', 32),
('Sam', 41),
('Alex', 12),
('Aristides',43),
('Cindi', 1)
;
I am using a self join to compare each value of a specific column with all the other values of the same column. My query looks something like this:
SELECT DISTINCT A.name as child
FROM people A, people B
WHERE A.age + 16 < B.age;
This query aims to spot potential sons/daughters based on age difference. More specifically, my goal is to identify the set of people that may have stayed in the same house as one of their parents (ordered by name), assuming that there must be an age difference of at least 16 years between a child and their parents.
Now I would like to combine this kind of logic with the information that is in another table.
The other table looks something like that:
CREATE TABLE houses (
house_name TEXT NOT NULL,
house_member TEXT NOT NULL REFERENCES people(name)
);
INSERT INTO houses VALUES
('house Smith', 'Lisa'),
('house Smith', 'Marta'),
('house Smith', 'John'),
('house Doe', 'Lisa'),
('house Doe', 'Marta'),
('house Doe', 'Alex'),
('house Doe', 'Sam'),
('house McKenny', 'Aristides'),
('house McKenny', 'John'),
('house McKenny', 'Cindi')
;
The two tables can be joined ON houses.house_member = people.name.
More specifically I would like to spot the children only within the same house. It does not make sense to compare the age of each person with the age of all the others, but instead it would be more efficient to compare the age of each person with all the other people in the same house.
My idea is to perform the self join from above but only within a PARTITION BY house_name. However, I don't think this is a good idea since I do not have an aggregate function. The same applies to GROUP BY statements as well. What could I do here?
The expected output should be the following, ordered by house_member:
house_member
Alex
Cindi
For simplicity I have created a fiddle.
First, join the two tables to build one table that has all three pieces of information: house_name, house_member, age.
Then join it with itself just as you did originally, and add one extra filter to look only within the same household.
WITH
CTE_All
AS
(
SELECT
houses.house_name
,houses.house_member
,people.age
FROM
houses
INNER JOIN people ON people.name = houses.house_member
)
SELECT DISTINCT
Children.house_name
,Children.house_member AS child_name
FROM
CTE_All AS Children
INNER JOIN CTE_All AS Parents
ON Children.age + 16 < Parents.age
-- this is our age difference
AND Children.house_name = Parents.house_name
-- within the same house
;
All of this is a single query. You don't have to use a CTE; you can inline it as a subquery, but it is more readable with a CTE.
Result
house_name | child_name
:------------ | :---------
house Doe | Alex
house McKenny | Cindi
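The two-step join can also be tried locally. Below is a small sketch using Python's sqlite3 module with the tables and rows from the question; SQLite in place of PostgreSQL is an assumption, and the shortened aliases (c, par) are mine.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (an assumption for easy local testing).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (name TEXT PRIMARY KEY, age INT NOT NULL);
    INSERT INTO people VALUES
        ('Lisa', 30), ('Marta', 27), ('John', 32), ('Sam', 41),
        ('Alex', 12), ('Aristides', 43), ('Cindi', 1);

    CREATE TABLE houses (
        house_name TEXT NOT NULL,
        house_member TEXT NOT NULL REFERENCES people(name)
    );
    INSERT INTO houses VALUES
        ('house Smith', 'Lisa'), ('house Smith', 'Marta'), ('house Smith', 'John'),
        ('house Doe', 'Lisa'), ('house Doe', 'Marta'), ('house Doe', 'Alex'),
        ('house Doe', 'Sam'),
        ('house McKenny', 'Aristides'), ('house McKenny', 'John'),
        ('house McKenny', 'Cindi');
    """
)

# cte_all attaches each member's age to the house rows; the self join then
# only pairs people who share a house and differ by more than 16 years.
children = conn.execute(
    """
    WITH cte_all AS (
        SELECT h.house_name, h.house_member, p.age
        FROM houses AS h
        INNER JOIN people AS p ON p.name = h.house_member
    )
    SELECT DISTINCT c.house_name, c.house_member AS child_name
    FROM cte_all AS c
    INNER JOIN cte_all AS par
        ON c.house_name = par.house_name
       AND c.age + 16 < par.age
    ORDER BY child_name
    """
).fetchall()

for house_name, child_name in children:
    print(house_name, child_name)
```

The house_name equality in the join condition is what restricts the comparison to one household at a time, so no PARTITION BY or GROUP BY is needed.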

SELECT query with combined conditions

SQL noobie here, tried my luck googling around, but came up empty or didn't know the proper keyword. Still, feeling quite awkward between all those advanced questions, however still hopeful to get a solution and learn.
Let's suppose we have a table representing participants for different teams for a children sports tournament.
Participant table:
Our goal is to select out participants that have chosen a WRONG team. Let's suppose that the conditions for the teams are as such:
team Yellow = boys with age 12;
team Red = girls with age 13;
team Blue = boys with age 11;
That would mean that the incorrect registrants are Sarah (incorrect gender, correct age), Jack (incorrect gender and age) and Mary who all should therefore be included in the result of the query.
However I'm struggling with creating a SQL query that would consider conditions from multiple fields (comparing team towards gender and age at the same time) + having more than one set of comparison done at the same time (looking for incorrect participants from 3 teams at the same time).
Help is much appreciated!
You didn't state your DBMS so this is ANSI SQL:
Just select all rows that do not comply with any of the rules:
select *
from participants
where (team, gender, age) not in ( ('Yellow', 'M', 12),
('Red', 'F', 13),
('Blue', 'M', 11) );
Online example: http://rextester.com/ZTEON26060
Main thing here is you have to convert your team rules into some kind of proper data structure. You can put it into the table, or use derived table, like this:
select *
from participants as p
where
not exists (
select *
from (values
('Yellow', 'M', 12),
('Red', 'F', 13),
('Blue', 'M', 11)
) as t(Team, Gender, Age)
where
t.Team = p.Team and
t.Gender = p.Gender and
t.Age = p.Age
)
Or you can check for correct team and then compare with current team:
select
p.*, t.Team as Correct_Team
from participants as p
left join (values
('Yellow', 'M', 12),
('Red', 'F', 13),
('Blue', 'M', 11)
) as t(Team, Gender, Age) on
t.Gender = p.Gender and
t.Age = p.Age
sql fiddle demo
You can try this:
Select
t.*,
-- a NULL Ideal_team (no rule matches) also yields 'FALSE'
Case when Chosen_team = Ideal_team then 'TRUE' else 'FALSE' End as Is_correct
from (
Select
Name,
Gender,
Age,
Team AS Chosen_team,
Case when Gender='M' and Age=12 Then 'Yellow'
when Gender='F' and Age=13 then 'Red'
when gender='M' and Age=11 then 'Blue'
End as Ideal_team
from your_table
) t;
Now select the records with the value 'FALSE'. You will get your list.
You can use a combination of or and and to do this.
select * from yourtable
where (team ='Yellow' and not (gender = 'M' and age = 12))
or (team ='Red' and not (gender = 'F' and age = 13))
or (team ='Blue' and not (gender = 'M' and age = 11))
There are a few things you haven't mentioned about your team restrictions:
Can you have different combinations of Age and Gender for the same teams?
Can any of the same Age and Gender combinations match multiple teams?
Would the valid teams cover all permutations of participant ages and genders?
Since that's not stated, I'm going to provide a generic solution that makes it easy to extend the valid teams. And the query will return all participants that don't match at least one valid team. (NOTE: It may be that there is no valid team for a particular participant.)
Approach:
Put valid combinations in a temporary (or even persistent) table of some sorts (I'll use a CTE).
Select all participants where you cannot find a matching Age, Gender and Team in the "Valid Teams" table.
The CTE for ValidTeams below could easily be replaced with a table if your RDBMS doesn't support CTEs. NOTE: If there are many permutations of valid gender/age/team, a separate table will be better.
;WITH ValidTeams AS (
SELECT 'Yellow' AS Team, 'M' AS Gender, 12 AS Age
UNION ALL SELECT 'Red', 'F', 13
UNION ALL SELECT 'Blue', 'M', 11
)
SELECT Name, Gender, Age, Team AS InvalidTeam
FROM Participants p
WHERE NOT EXISTS (
SELECT *
FROM ValidTeams v
WHERE v.Gender = p.Gender
AND v.Age = p.Age
AND v.Team = p.Team
)
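Here is a minimal runnable sketch of the NOT EXISTS approach, using Python's sqlite3 module. The participant table from the question did not survive, so every row below is made up for illustration (only the names Sarah, Jack and Mary and the three team rules come from the question), and SQLite is an assumed stand-in for whatever DBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (name TEXT, gender TEXT, age INTEGER, team TEXT)")

# Hypothetical sample rows; the original table is not available.
conn.executemany(
    "INSERT INTO participants (name, gender, age, team) VALUES (?, ?, ?, ?)",
    [
        ("Tom", "M", 12, "Yellow"),    # matches the Yellow rule
        ("Sarah", "F", 12, "Yellow"),  # wrong gender, correct age
        ("Jack", "M", 12, "Red"),      # wrong gender and wrong age
        ("Mary", "F", 13, "Blue"),     # wrong gender and wrong age
        ("Lucy", "F", 13, "Red"),      # matches the Red rule
        ("Ben", "M", 11, "Blue"),      # matches the Blue rule
    ],
)

# The rules live in one place (a CTE); a participant is invalid when no
# rule matches their team, gender and age all at once.
invalid_names = [
    row[0]
    for row in conn.execute(
        """
        WITH valid_teams(team, gender, age) AS (
            VALUES ('Yellow', 'M', 12), ('Red', 'F', 13), ('Blue', 'M', 11)
        )
        SELECT p.name
        FROM participants AS p
        WHERE NOT EXISTS (
            SELECT 1
            FROM valid_teams AS v
            WHERE v.team = p.team
              AND v.gender = p.gender
              AND v.age = p.age
        )
        ORDER BY p.name
        """
    )
]

print(invalid_names)
```

Adding a fourth team then only means adding one more row to valid_teams; the WHERE clause stays untouched.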

Postgresql aggregate array

I have two tables
Student
--------
Id Name
1 John
2 David
3 Will
Grade
---------
Student_id Mark
1 A
2 B
2 B+
3 C
3 A
Is it possible to write a native PostgreSQL SELECT that returns results like the ones below:
Name Array of marks
-----------------------
'John', {'A'}
'David', {'B','B+'}
'Will', {'C','A'}
But not like below
Name Mark
----------------
'John', 'A'
'David', 'B'
'David', 'B+'
'Will', 'C'
'Will', 'A'
Use array_agg: http://www.sqlfiddle.com/#!1/5099e/1
SELECT s.name, array_agg(g.Mark) as marks
FROM student s
LEFT JOIN Grade g ON g.Student_id = s.Id
GROUP BY s.Id
By the way, if you are using Postgres 9.1 or later, you don't need to repeat every selected column in GROUP BY, e.g. you don't need to add the student name to GROUP BY. You can simply GROUP BY the primary key. If you remove the primary key on student, you need to repeat the student name in GROUP BY.
CREATE TABLE grade
(Student_id int, Mark varchar(2));
INSERT INTO grade
(Student_id, Mark)
VALUES
(1, 'A'),
(2, 'B'),
(2, 'B+'),
(3, 'C'),
(3, 'A');
CREATE TABLE student
(Id int primary key, Name varchar(5));
INSERT INTO student
(Id, Name)
VALUES
(1, 'John'),
(2, 'David'),
(3, 'Will');
As far as I understand, you can do something like this:
SELECT Student.Name,
STRING_AGG(Grade.Mark, ',' ORDER BY Grade.Mark) As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
EDIT
I am not sure. But maybe something like this then:
SELECT Student.Name,
    array_to_string(ARRAY_AGG(Grade.Mark),';') As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
Reference here
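If you just want to see the one-row-per-student shape without a PostgreSQL server, here is a rough sketch with Python's sqlite3 module. SQLite has no array type and no array_agg, so group_concat is used as a stand-in (an assumption on my side, closest to the STRING_AGG variant above) and the string is split back into a list in Python. In PostgreSQL itself, array_agg gives you the array directly.

```python
import sqlite3

# SQLite stand-in: group_concat plays the role of string_agg/array_agg here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO student (id, name) VALUES (1, 'John'), (2, 'David'), (3, 'Will');

    CREATE TABLE grade (student_id INTEGER, mark TEXT);
    INSERT INTO grade (student_id, mark) VALUES
        (1, 'A'), (2, 'B'), (2, 'B+'), (3, 'C'), (3, 'A');
    """
)

rows = conn.execute(
    """
    SELECT s.name, group_concat(g.mark, ',') AS marks
    FROM student AS s
    LEFT JOIN grade AS g ON g.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
    """
).fetchall()

# group_concat makes no promise about element order, so sort after splitting.
# A student without grades gets NULL (None) from the LEFT JOIN, hence the guard.
marks_by_student = {
    name: sorted(marks.split(",")) if marks is not None else []
    for name, marks in rows
}

print(marks_by_student)
```

The LEFT JOIN keeps students who have no grades at all; with an INNER JOIN they would disappear from the result.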
You could use the following:
SELECT Student.Name as Name,
(SELECT array(SELECT Mark FROM Grade WHERE Grade.Student_id = Student.Id))
AS ArrayOfMarks
FROM Student
As described here: http://www.mkyong.com/database/convert-subquery-result-to-array/
Michael Buen got it right. I got what I needed using array_agg.
Here just a basic query example in case it helps someone:
SELECT directory, ARRAY_AGG(file_name)
FROM table
WHERE type = 'ZIP'
GROUP BY directory;
And the result was something like:
| parent_directory | array_agg |
+-------------------------+----------------------------------------+
| /home/postgresql/files | {zip_1.zip,zip_2.zip,zip_3.zip} |
| /home/postgresql/files2 | {file1.zip,file2.zip} |
This post also helped me a lot: "Group By" in SQL and Python Pandas.
It basically says that it is more convenient to use only PSQL when possible, but that Python Pandas can be useful to achieve extra functionalities in the filtering process.

SQL - How to display the students with the same age?

The code I wrote only tells me how many students have the same age. I want their names too...
SELECT YEAR(CURRENT DATE-DATEOFBIRTH) AS AGE, COUNT(*) AS HOWMANY
FROM STUDENTS
GROUP BY YEAR(CURRENT DATE-DATEOFBIRTH);
this returns something like this:
AGE HOWMANY
--- -------
21 3
30 5
Thank you.
TABLE STUDENTS COLUMNS:
StudentID (primary key), Name(varchar), Firstname(varchar), Dateofbirth(varchar)
I was thinking of maybe using the code above and adding the concat function somewhere so that it puts the students' names on the same row, as in
Your existing SQL looks like it has errors, but you could use GROUP_CONCAT:
Add GROUP_CONCAT(colname) as another column to fetch, then split the result on , in your application.
The resulting data set does not appear useful on the surface based on the question unless you are looking for a listing of students, their age, and how many other students are of the same age:
SELECT NAME, AGE, HOWMANY
FROM STUDENTS AS S,
(SELECT YEAR(CURRENT DATE-DATEOFBIRTH) AS AGE,
COUNT(*) AS HOWMANY
FROM STUDENTS
GROUP BY YEAR(CURRENT DATE-DATEOFBIRTH)
) AS A
WHERE YEAR(CURRENT DATE-S.DATEOFBIRTH) = A.AGE
Basically perform a self-join with the age counts you have calculated.
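A small sketch of that join, run through Python's sqlite3 module. To keep it short it assumes a ready-made age column instead of computing the age from DATEOFBIRTH, and the student rows are borrowed from the temp-table example further down; both the simplified table and the use of SQLite are assumptions.

```python
import sqlite3

# Simplified, hypothetical table: age is stored directly instead of a birth date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students (name, age) VALUES (?, ?)",
    [
        ("bob", 21), ("tom", 21), ("dick", 21), ("william", 21),
        ("mark", 35), ("anthony", 35),
    ],
)

# The derived table holds one row per age with its head count; joining it
# back to students puts that count next to every student's name.
students_with_counts = conn.execute(
    """
    SELECT s.name, s.age, a.howmany
    FROM students AS s
    INNER JOIN (
        SELECT age, COUNT(*) AS howmany
        FROM students
        GROUP BY age
    ) AS a ON a.age = s.age
    ORDER BY s.age, s.name
    """
).fetchall()

for name, age, howmany in students_with_counts:
    print(name, age, howmany)
```

To show only students who actually share their age with someone, a HAVING COUNT(*) > 1 in the derived table should do it.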
What about...
SELECT name FROM students WHERE age = ENTER_AGE_HERE;
You have the names and the number of students can be found by finding the number of entries you get from the query.
For example, in PHP, you can find the length of the array.
Of course, you have to change the names in my example to the names used in your database.
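CREATE TABLE #Student
(
id int identity(1,1),
age int,
name varchar(255)
)
-- list the columns explicitly; the identity column fills itself in
INSERT INTO #Student (age, name)
VALUES(21,'bob'),
(21,'tom'),
(21,'dick'),
(21,'william'),
(35,'mark'),
(35,'anthony')
-- one row per age: the count plus a comma-separated list of names
SELECT age, COUNT(*) AS howmany, STUFF(
(
SELECT ',' + name
FROM #Student SS
WHERE SS.age = S.age
FOR XML PATH('')
), 1, 1, '') AS names
FROM #Student S
GROUP BY age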