JOIN, aggregate and convert in postgres between two tables - sql

Here are the two tables I have (all columns in both tables are of type "text"). Each listing starts with the table name, followed by the column names.
Names
--------------------------------
Name | DoB | Team |
--------------------------------
Harry | 3/12/85 | England
Kevin | 8/07/86 | England
James | 5/05/89 | England
Scores
------------------------
ScoreName | Score
------------------------
James-1 | 120
Harry-1 | 30
Harry-2 | 40
James-2 | 56
The end result I need is a table like the following:
NameScores
---------------------------------------------
Name | DoB | Team | ScoreData
---------------------------------------------
Harry | 3/12/85 | England | "{"ScoreName":"Harry-1", "Score":"30"}, {"ScoreName":"Harry-2", "Score":"40"}"
Kevin | 8/07/86 | England | null
James | 5/05/89 | England | "{"ScoreName":"James-1", "Score":"120"}, {"ScoreName":"James-2", "Score":"56"}"
I need to do this using a single SQL command, which I will use to create a materialized view.
I have gotten as far as realising that it will involve a combination of string_agg, JOIN and JSON, but haven't been able to crack it fully. Please help :)

I don't think the join is tricky. The complication is building the JSON object:
select n.name, n.dob, n.team,
       json_agg(json_build_object('ScoreName', s.ScoreName,
                                  'Score', s.score)) as ScoreData
from names n left join
     scores s
     on s.ScoreName like concat(n.name, '-', '%')
group by n.name, n.dob, n.team;
Note: json_build_object() was introduced in Postgres 9.4.
EDIT:
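I think you can add a case expression to get the simple NULL. Because s.ScoreName is not in the group by, the test has to go through an aggregate such as count():
(case when count(s.ScoreName) = 0 then NULL
      else json_agg(json_build_object('ScoreName', s.ScoreName,
                                      'Score', s.score))
 end) as ScoreData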

Use json_agg() with row_to_json() to aggregate scores data into a json value:
select n.*, json_agg(row_to_json(s)) "ScoreData"
from "Names" n
left join "Scores" s
on n."Name" = regexp_replace(s."ScoreName", '(.*)-.*', '\1')
group by 1, 2, 3;
Name | DoB | Team | ScoreData
-------+---------+---------+---------------------------------------------------------------------------
Harry | 3/12/85 | England | [{"ScoreName":"Harry-1","Score":30}, {"ScoreName":"Harry-2","Score":40}]
James | 5/05/89 | England | [{"ScoreName":"James-1","Score":120}, {"ScoreName":"James-2","Score":56}]
Kevin | 8/07/86 | England | [null]
(3 rows)
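To make the join-then-aggregate logic easier to follow outside the database, here is a minimal Python sketch of the same idea. It is not what either answer runs: the function name name_scores and the in-memory lists are made up for illustration, and it assumes every ScoreName contains a dash separating the player name from a suffix.

```python
import json
from collections import defaultdict

# Sample data copied from the question; all values are text.
names = [
    ("Harry", "3/12/85", "England"),
    ("Kevin", "8/07/86", "England"),
    ("James", "5/05/89", "England"),
]
scores = [("James-1", "120"), ("Harry-1", "30"), ("Harry-2", "40"), ("James-2", "56")]


def name_scores(names, scores):
    """Left-join names to scores and aggregate each player's scores as JSON."""
    by_player = defaultdict(list)
    for score_name, score in scores:
        # Assumption: the player name is everything before the last '-'.
        player = score_name.rpartition("-")[0]
        by_player[player].append({"ScoreName": score_name, "Score": score})

    result = []
    for name, dob, team in names:
        matched = by_player.get(name)
        # Players without scores get None (NULL) rather than an empty array.
        score_data = json.dumps(matched) if matched else None
        result.append((name, dob, team, score_data))
    return result


rows = name_scores(names, scores)
for row in rows:
    print(row)
```

Kevin has no matching scores, so his last column is None, which corresponds to the plain NULL the question asks for.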

Related

SQL split values on multiple rows

I made a select and it looks like this
SELECT name,class_name
FROM students
INNER JOIN classes on classes.id = students.id
and I get a table like
| name | class_name |
| -------- | -------------- |
| Daniel | Math |
| Johnny | Physics |
| Johnny | Math |
| Andrew | English |
...
How am I supposed to split the table to get the first two classes each student attends? (Students can attend more than two classes, or only a single class.)
Example:
| name | class_1 | class_2 |
| -------- | -------------- | ---------|
| Daniel | Math | |
| Johnny | Math | Physics |
| Andrew | English | |
...
I was thinking of transposing the table, however, I don't know how to actually do it or if it's a good approach.
I concatenated the classes of each student into a single comma-separated column (commas because class names can contain spaces, for example "Computer Science"), then split that column into the two required classes. Note that without an ORDER BY inside the aggregate, the order of the classes within each student is not guaranteed.
WITH part
AS (SELECT NAME,
class_name
FROM students
INNER JOIN classes
ON classes.id = students.id),
ans
AS (SELECT NAME,
Array_to_string(Array_agg(class_name), ',') AS cl
FROM part
GROUP BY NAME)
SELECT NAME,
Split_part(cl, ',', 1) AS class_1,
Split_part(cl, ',', 2) AS class_2
FROM ans
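As a possible alternative to splitting a concatenated string, the rows can be numbered per student and pivoted with conditional aggregation. The sketch below is an assumption-laden illustration, not the answer's method: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), uses a hypothetical table enrolment that stands in for the result of the students/classes join, and picks alphabetical order to decide which classes count as "first".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the (name, class_name) pairs from the question.
conn.execute("CREATE TABLE enrolment (name TEXT, class_name TEXT)")
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [
        ("Daniel", "Math"),
        ("Johnny", "Physics"),
        ("Johnny", "Math"),
        ("Andrew", "English"),
    ],
)

rows = conn.execute("""
    SELECT name,
           -- MAX ignores NULLs, so each CASE picks out one numbered class.
           MAX(CASE WHEN rn = 1 THEN class_name END) AS class_1,
           MAX(CASE WHEN rn = 2 THEN class_name END) AS class_2
    FROM (
        SELECT name, class_name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY class_name) AS rn
        FROM enrolment
    ) AS ranked
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```

Students with a single class get None in class_2, and any third or later class is simply ignored.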

SQL Where != stringval not filtering out stringval

I have a table (table1) that comes from HBase and contains certain rows that I would like to filter out. I have recreated the table, my SQL query, and the output I receive below. When I try to filter out a string value, it stays in the result even though I want it removed.
table1 (some positions are fully capitalized and some aren't; I want to capitalize them all and filter out certain positions)
name | company | personal_id | position
Joe | Applebees| 32 | manager
Jack | Target | 12 | CLERK
Jim | Chipotle | 22 | COOK
Ron | Starbucks| 13 | barista
query
df = sqlContext.sql("""select name, company, personal_id, UCASE(position) as position
from table1
where position != 'BARISTA'""")  # tried lower & upper case
Output received
name | company | personal_id | position
Joe | Applebees| 32 | MANAGER
Jack | Target | 12 | CLERK
Jim | Chipotle | 22 | COOK
Ron | Starbucks| 13 | BARISTA /*dont want this output*/
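Why was the row Ron | Starbucks | 13 | BARISTA not filtered out by my where clause?
The where clause is evaluated against the stored value of position ('barista'), not against the upper-cased alias from the select list, so the function has to be applied in the where clause as well. Try: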
where UCASE(position) != 'BARISTA'
Why are you grouping the result? There is no need to group unless an aggregate function is used. Try the query below:
select name, company, personal_id, UCASE(position) as position
from table1
where upper(position) != 'BARISTA'
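The difference between the two where clauses can be reproduced in a small, self-contained sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Spark SQL (an assumption made only so the example runs anywhere), and it names the alias position_uc so the stored column and the upper-cased output cannot be confused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (name TEXT, company TEXT, personal_id INTEGER, position TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("Joe", "Applebees", 32, "manager"),
        ("Jack", "Target", 12, "CLERK"),
        ("Jim", "Chipotle", 22, "COOK"),
        ("Ron", "Starbucks", 13, "barista"),
    ],
)

# WHERE compares the stored value 'barista', which is not equal to 'BARISTA',
# so Ron's row passes the filter and is upper-cased afterwards.
not_filtered = conn.execute(
    "SELECT name, UPPER(position) AS position_uc FROM table1 "
    "WHERE position != 'BARISTA' ORDER BY name"
).fetchall()

# Upper-casing inside WHERE makes the comparison independent of letter case.
filtered = conn.execute(
    "SELECT name, UPPER(position) AS position_uc FROM table1 "
    "WHERE UPPER(position) != 'BARISTA' ORDER BY name"
).fetchall()

print(not_filtered)
print(filtered)
```

The first query still returns Ron, while the second one drops him.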

How to use XML Path to generate a grid

I need to output the results of a query as a grid, rather than as a long list of values.
What I have right now is
(SELECT COLUMN1+' '+COLUMN2
FROM TABLE
FOR XML PATH) AS MyGrid
Results I have are displayed as
Bob s12345 Chuck s54321
I would like to have them displayed as
Bob s12345
Chuck s54321
Any help, please?
Added table records
CustID | CustName | StoreNumber | City
------+----------+--------------+-----------
1 | Bob | s12345 | Somewhere
2 | Chuck | s54321 | Town
3 | Paul | s19285 | BillaBong
4 | David | s65478 | North
5 | Arnold | s47381 | South
The MyGrid ALIAS is passed to Outlook as merge field.
You can use CROSS APPLY with VALUES:
select value1,value2 from table
cross apply
(values (value3 ,value4))b(v1,v2)

SQL JOIN to omit other columns after first result

Here is the result I need, simplified:
select name, phonenumber
from contacttmp
left outer join phonetmp on (contacttmp.id = phonetmp.contact_id);
name | phonenumber
-------+--------------
bob | 111-222-3333
bob | 111-222-4444
bob | 111-222-5555
frank | 111-222-6666
joe | 111-222-7777
The query, however, displays the name on every row. I'm trying to omit the name after the first result:
name | phonenumber
-------+--------------
bob | 111-222-3333
| 111-222-4444
| 111-222-5555
frank | 111-222-6666
joe | 111-222-7777
Here's how I made the example tables and the data:
create table contacttmp (id serial, name text);
create table phonetmp (phoneNumber text, contact_id integer);
select * from contacttmp;
id | name
----+-------
1 | bob
2 | frank
3 | joe
select * from phonetmp ;
phonenumber | contact_id
--------------+------------
111-222-3333 | 1
111-222-4444 | 1
111-222-5555 | 1
111-222-6666 | 2
111-222-7777 | 3
Old part of question
I'm working on a contacts program in PHP and a requirement is to display the results but omit the other fields after the first record is displayed if there are multiple results of that same record.
From the postgres tutorial join examples I'm doing something like this with a left outer join:
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
I can't figure out how to, or if it is possible to, alter the above query to not display the other fields after the first result.
For example, if we add the clause "WHERE location = '(-194,53)'" we don't want the second (and third if there is one) results to display the columns other than location, so the query (plus something extra) and the result would look like this:
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
WHERE location = '(-194,53)';
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
| | | | | | (-194,53)
Is this possible with some kind of JOIN or exclusion or other query? Or do I have to remove these fields in PHP after getting all the results (would rather not do).
To avoid confusion, I'm required to achieve a result set like:
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
| | | | | | (-19,5)
| | | | | | (-94,3)
Philadelphia | 55 | 60 | 0.1 | 1995-12-12 | Philadelphia | (-1,1)
| | | | | | (-77,55)
| | | | | | (-3,33)
Where any additional results for the same record (city) with different locations would only display the different location.
You can do this type of logic in SQL, but it is not recommended. The result set from SQL queries is in a table format. Tables represent unordered sets and generally have all columns meaning the same thing.
So, having a result set that depends on the values from the "preceding" row is not a proper way to use SQL. Although you can get this result in Postgres, I do not recommend it. Usually, this type of formatting is done on the application side.
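For the application-side route, a minimal sketch might look like the following. It is written in Python rather than the PHP mentioned in the question, the helper name blank_repeats is hypothetical, and it assumes the rows arrive already sorted so that repeated values are adjacent.

```python
def blank_repeats(rows, key_index=0):
    """Return rows with the key column blanked when it repeats the previous row."""
    result = []
    previous = object()  # sentinel that never equals a real value
    for row in rows:
        cells = list(row)
        current = cells[key_index]
        if current == previous:
            cells[key_index] = ""
        previous = current
        result.append(tuple(cells))
    return result


# Rows as the unmodified join in the question returns them.
query_rows = [
    ("bob", "111-222-3333"),
    ("bob", "111-222-4444"),
    ("bob", "111-222-5555"),
    ("frank", "111-222-6666"),
    ("joe", "111-222-7777"),
]

display_rows = blank_repeats(query_rows)
for row in display_rows:
    print(row)
```

The query itself stays a plain join with an ORDER BY, and only the display step knows about the blanking.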
If you want to avoid repeating the same information, you can use a window function that tells you the position of that row in the group (a PARTITION for this purpose, not a group in the GROUP BY sense), then hide the text for the columns you don't want to repeat if that position in the group is greater than 1.
WITH joined_results AS (
    SELECT
        w.city, c.location, w.temp_lo, w.temp_hi, w.prcp, w.date,
        ROW_NUMBER() OVER (PARTITION BY w.city, c.location ORDER BY date) AS pos
    FROM weather w
    LEFT OUTER JOIN cities c ON (w.city = c.name)
)
SELECT
    CASE WHEN pos > 1 THEN '' ELSE city END,
    CASE WHEN pos > 1 THEN '' ELSE location END,
    temp_lo, temp_hi, prcp, date
FROM joined_results
-- Sort in the outer query: an ORDER BY inside the CTE is not guaranteed to carry through.
ORDER BY city, location, pos;
This should give you this:
city | location | temp_lo | temp_hi | prcp | date
---------------+-----------+---------+---------+------+------------
Hayward | | 37 | 54 | | 1994-11-29
San Francisco | (-194,53) | 46 | 50 | 0.25 | 1994-11-27
| | 43 | 57 | 0 | 1994-11-29
To understand what ROW_NUMBER() OVER (PARTITION BY w.city, c.location ORDER BY date) AS pos does, it is probably worth looking at what you get with SELECT * FROM joined_results:
city | location | temp_lo | temp_hi | prcp | date | pos
---------------+-----------+---------+---------+------+------------+-----
Hayward | | 37 | 54 | | 1994-11-29 | 1
San Francisco | (-194,53) | 46 | 50 | 0.25 | 1994-11-27 | 1
San Francisco | (-194,53) | 43 | 57 | 0 | 1994-11-29 | 2
After that, just replace what you don't want with white space using CASE WHEN pos > 1 THEN '' ELSE ... END.
(This being said, it's something I'd generally prefer to do in the presentation layer rather than in the query.)
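The same ROW_NUMBER() idea can be tried against the contacttmp and phonetmp tables from the question. The sketch below is only an illustration under assumptions: it runs on SQLite via Python's sqlite3 module instead of Postgres (window functions need SQLite 3.25 or later), and the CTE name numbered is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacttmp (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phonetmp (phoneNumber TEXT, contact_id INTEGER);
    INSERT INTO contacttmp VALUES (1, 'bob'), (2, 'frank'), (3, 'joe');
    INSERT INTO phonetmp VALUES
        ('111-222-3333', 1), ('111-222-4444', 1), ('111-222-5555', 1),
        ('111-222-6666', 2), ('111-222-7777', 3);
""")

rows = conn.execute("""
    WITH numbered AS (
        SELECT c.name, p.phoneNumber,
               -- Position of each phone number within its contact.
               ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY p.phoneNumber) AS pos
        FROM contacttmp c
        LEFT JOIN phonetmp p ON c.id = p.contact_id
    )
    SELECT CASE WHEN pos > 1 THEN '' ELSE name END, phoneNumber
    FROM numbered
    -- Sort on the original name, not on the blanked output column.
    ORDER BY numbered.name, numbered.pos
""").fetchall()

for row in rows:
    print(row)
```

Sorting on the unblanked name and on pos keeps each contact's numbers together with the named row first.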
Consider the slightly modified test case in the fiddle below.
Simple case
For the simple case dealing with a single column from each table, comparing to the previous row with the window function lag() does the job:
SELECT CASE WHEN lag(c.contact) OVER (ORDER BY c.contact, p.phone_nr)
= c.contact THEN NULL ELSE c.contact END
, p.phone_nr
FROM contact c
LEFT JOIN phone p USING (contact_id);
You could repeat that for n columns, but that's tedious.
For many columns
SELECT c.*, p.phone_nr
FROM (
SELECT *
, row_number() OVER (PARTITION BY contact_id ORDER BY phone_nr) AS rn
FROM phone
) p
LEFT JOIN contact c ON c.contact_id = p.contact_id AND p.rn = 1;
Something like a "reverse LEFT JOIN". This assumes referential integrity (no missing rows in contact). Also, contacts without any entries in phone are not in the result. That is easy to add if needed.
SQL Fiddle.
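Since the fiddle itself is not reproduced here, the following sketch shows how the "reverse LEFT JOIN" behaves on a tiny data set. The table layouts (contact with contact_id and contact, phone with contact_id and phone_nr) are inferred from the queries above and are assumptions, as is running it on SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions). An ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, contact TEXT);
    CREATE TABLE phone (contact_id INTEGER, phone_nr TEXT);
    INSERT INTO contact VALUES (1, 'bob'), (2, 'frank');
    INSERT INTO phone VALUES
        (1, '111-222-3333'), (1, '111-222-4444'), (2, '111-222-6666');
""")

rows = conn.execute("""
    SELECT c.contact_id, c.contact, p.phone_nr
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY contact_id ORDER BY phone_nr) AS rn
        FROM phone
    ) p
    -- Contact columns are joined only to the first phone row of each contact;
    -- every later row gets NULLs for all contact columns at once.
    LEFT JOIN contact c ON c.contact_id = p.contact_id AND p.rn = 1
    ORDER BY p.contact_id, p.rn
""").fetchall()

for row in rows:
    print(row)
```

Every contact column comes back as None on the second and later phone rows, without writing one CASE per column.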
Aside, your query in the first example exhibits a rookie mistake.
SELECT * FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
WHERE location = '(-194,53)';
One does not combine a LEFT JOIN with a WHERE clause on the right table. It doesn't make sense, because the condition discards the NULL-extended rows and the join behaves like an inner join. Details:
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
Except to test for existence ...
Select rows which are not present in other table
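The effect of filtering on the right table in WHERE versus in the join condition can be checked with a short sketch. It assumes a cut-down version of the tutorial's weather and cities tables with location stored as text, and it runs on SQLite via Python's sqlite3 module rather than Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weather (city TEXT, temp_lo INTEGER);
    CREATE TABLE cities (name TEXT, location TEXT);
    INSERT INTO weather VALUES ('Hayward', 37), ('San Francisco', 46);
    INSERT INTO cities VALUES ('San Francisco', '(-194,53)');
""")

# Condition in WHERE: Hayward's NULL location fails the test, so the
# LEFT JOIN effectively turns into an inner join.
where_rows = conn.execute("""
    SELECT w.city, c.location
    FROM weather w
    LEFT JOIN cities c ON w.city = c.name
    WHERE c.location = '(-194,53)'
    ORDER BY w.city
""").fetchall()

# Condition in ON: it only decides which cities rows match, so every
# weather row is still returned.
on_rows = conn.execute("""
    SELECT w.city, c.location
    FROM weather w
    LEFT JOIN cities c ON w.city = c.name AND c.location = '(-194,53)'
    ORDER BY w.city
""").fetchall()

print(where_rows)
print(on_rows)
```

Hayward disappears from the first result but is kept, with a None location, in the second.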

SQL Server - Given a Set of Columns, finding missing combinations within the Set

I have the following table. What I want to get is the missing combinations of Student, Class, Book. I have a query below that does it, but I would like others to provide more efficient queries (i.e. possibly ones that use GROUP BY) to find the missing combinations.
SQL FIDDLE HERE - http://sqlfiddle.com/#!6/16e2b/3
StudentBook Table
+---------+---------+--------------+
| Student | Class | Book |
+---------+---------+--------------+
| Albert | Math | AlgebraBook |
| Albert | Math | FractionBook |
| Bridget | Math | AlgebraBook |
| Bridget | Math | FractionBook |
| Charles | Math | AlgebraBook |
| Charles | Math | FractionBook |
| Debbie | English | NovelBook |
| Debbie | English | PoemBook |
| Edward | English | PoemBook |
| Frank | English | PoemBook |
+---------+---------+--------------+
The following Rows in the Set are the missing combinations
Correct Result of My Query Below
+---------+---------+-----------+
| Student | Class | Book |
+---------+---------+-----------+
| Edward | English | NovelBook |
| Frank | English | NovelBook |
+---------+---------+-----------+
I can use the following query to get the missing combinations, but I want a faster, more efficient solution. Basically, I'm looking for other, more effective techniques, possibly using GROUP BY.
WITH CTE_ClassBooks AS
(
SELECT DISTINCT Class, Book FROM StudentBook
),
CTE_StudentClasses AS
(
SELECT DISTINCT Student, Class FROM StudentBook
),
CTE_CombosOfStudentClassBooks AS
(
SELECT DISTINCT b.Student, a.Class, a.Book
FROM CTE_ClassBooks a
INNER JOIN CTE_StudentClasses b ON a.Class = B.Class
)
SELECT * FROM CTE_CombosOfStudentClassBooks
EXCEPT
SELECT * FROM StudentBook
This might be a little faster; your route doesn't seem terribly inefficient, though.
;WITH cte AS (SELECT DISTINCT Class,Book FROM Table1)
SELECT b.Student,a.*
FROM cte a
JOIN Table1 b
ON a.Class = b.Class
LEFT JOIN Table1 c
ON a.Class = c.CLass
AND a.Book = c.Book
AND b.Student = c.Student
WHERE c.Class IS NULL
Demo: SQL Fiddle
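To check the anti-join idea against the sample data, here is a small sketch. It is a variation rather than the exact query above: it joins distinct (Student, Class) pairs to distinct (Class, Book) pairs so that each missing combination appears once, and it runs on SQLite through Python's sqlite3 module instead of SQL Server. The aliases sc, cb and sb are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StudentBook (Student TEXT, Class TEXT, Book TEXT)")
conn.executemany(
    "INSERT INTO StudentBook VALUES (?, ?, ?)",
    [
        ("Albert", "Math", "AlgebraBook"), ("Albert", "Math", "FractionBook"),
        ("Bridget", "Math", "AlgebraBook"), ("Bridget", "Math", "FractionBook"),
        ("Charles", "Math", "AlgebraBook"), ("Charles", "Math", "FractionBook"),
        ("Debbie", "English", "NovelBook"), ("Debbie", "English", "PoemBook"),
        ("Edward", "English", "PoemBook"), ("Frank", "English", "PoemBook"),
    ],
)

missing = conn.execute("""
    SELECT sc.Student, cb.Class, cb.Book
    FROM (SELECT DISTINCT Student, Class FROM StudentBook) sc
    JOIN (SELECT DISTINCT Class, Book FROM StudentBook) cb
      ON cb.Class = sc.Class
    LEFT JOIN StudentBook sb
      ON sb.Student = sc.Student
     AND sb.Class = cb.Class
     AND sb.Book = cb.Book
    -- Keep only expected combinations with no matching real row.
    WHERE sb.Student IS NULL
    ORDER BY sc.Student
""").fetchall()

for row in missing:
    print(row)
```

On the sample data this returns the same two rows as the EXCEPT query in the question.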
SELECT S1.STUDENT,S1.CLASS,S2.BOOK FROM
STUDENTBOOK S1,(SELECT DISTINCT CLASS,BOOK FROM STUDENTBOOK) S2
WHERE S1.CLASS = S2.CLASS
AND S1.BOOK <> S2.BOOK
EXCEPT
SELECT STUDENT,CLASS,BOOK FROM STUDENTBOOK