SQL JOIN to omit other columns after first result

Here is the result I need, simplified:
select name, phonenumber
from contacttmp
left outer join phonetmp on (contacttmp.id = phonetmp.contact_id);
name | phonenumber
-------+--------------
bob | 111-222-3333
bob | 111-222-4444
bob | 111-222-5555
frank | 111-222-6666
joe | 111-222-7777
The query, however, repeats the name on every row. I'm trying to omit the name after the first result:
name | phonenumber
-------+--------------
bob | 111-222-3333
| 111-222-4444
| 111-222-5555
frank | 111-222-6666
joe | 111-222-7777
Here's how I made the example tables and the data:
create table contacttmp (id serial, name text);
create table phonetmp (phoneNumber text, contact_id integer);
select * from contacttmp;
id | name
----+-------
1 | bob
2 | frank
3 | joe
select * from phonetmp ;
phonenumber | contact_id
--------------+------------
111-222-3333 | 1
111-222-4444 | 1
111-222-5555 | 1
111-222-6666 | 2
111-222-7777 | 3
Old part of question
I'm working on a contacts program in PHP and a requirement is to display the results but omit the other fields after the first record is displayed if there are multiple results of that same record.
From the postgres tutorial join examples I'm doing something like this with a left outer join:
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
I can't figure out how to, or if it is possible to, alter the above query to not display the other fields after the first result.
For example, if we add the clause "WHERE location = '(-194,53)'" we don't want the second (and third if there is one) results to display the columns other than location, so the query (plus something extra) and the result would look like this:
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
WHERE location = '(-194,53)';
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
| | | | | | (-194,53)
Is this possible with some kind of JOIN, exclusion, or other query? Or do I have to remove these fields in PHP after getting all the results (which I would rather not do)?
To avoid confusion, I'm required to achieve a result set like:
city | temp_lo | temp_hi | prcp | date | name | location
--------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
| | | | | | (-19,5)
| | | | | | (-94,3)
Philadelphia | 55 | 60 | 0.1 | 1995-12-12 | Philadelphia | (-1,1)
| | | | | | (-77,55)
| | | | | | (-3,33)
Where any additional results for the same record (city) with different locations would only display the different location.

You can do this type of logic in SQL, but it is not recommended. The result set of a SQL query is in table format. Tables represent unordered sets, and each column generally means the same thing in every row.
So, having a result set that depends on the values from the "preceding" row is not a proper way to use SQL. Although you can get this result in Postgres, I do not recommend it. Usually, this type of formatting is done on the application side.
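As a rough illustration of doing this on the application side, here is a minimal Python sketch (the question uses PHP, but the same loop carries over). The row data mirrors the question's example; the helper name blank_repeats is made up for illustration, and the rows are assumed to arrive already sorted by name.

```python
from itertools import groupby

# Rows as they might come back from the join, assumed sorted by name.
rows = [
    ("bob", "111-222-3333"),
    ("bob", "111-222-4444"),
    ("bob", "111-222-5555"),
    ("frank", "111-222-6666"),
    ("joe", "111-222-7777"),
]


def blank_repeats(sorted_rows):
    """Keep the name on the first row of each group and blank it on the rest."""
    out = []
    for name, group in groupby(sorted_rows, key=lambda r: r[0]):
        for i, (_, phone) in enumerate(group):
            out.append((name if i == 0 else "", phone))
    return out


display_rows = blank_repeats(rows)
for name, phone in display_rows:
    print(f"{name:<6}| {phone}")
```

The query itself stays a plain join with an ORDER BY on the name, and only the display code knows about the blanking.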

If you want to avoid repeating the same information, you can use a window function that tells you the position of that row in the group (a PARTITION for this purpose, not a group in the GROUP BY sense), then hide the text for the columns you don't want to repeat if that position in the group is greater than 1.
WITH joined_results AS (
    SELECT
        w.city, c.location, w.temp_lo, w.temp_hi, w.prcp, w.date,
        ROW_NUMBER() OVER (PARTITION BY w.city, c.location ORDER BY date) AS pos
    FROM weather w
        LEFT OUTER JOIN cities c ON (w.city = c.name)
    ORDER BY w.city, c.location
)
SELECT
    CASE WHEN pos > 1 THEN '' ELSE city END,
    CASE WHEN pos > 1 THEN '' ELSE location END,
    temp_lo, temp_hi, prcp, date
FROM joined_results;
This should give you this:
city | location | temp_lo | temp_hi | prcp | date
---------------+-----------+---------+---------+------+------------
Hayward | | 37 | 54 | | 1994-11-29
San Francisco | (-194,53) | 46 | 50 | 0.25 | 1994-11-27
| | 43 | 57 | 0 | 1994-11-29
To understand what ROW_NUMBER() OVER (PARTITION BY w.city, c.location ORDER BY date) AS pos does, it is probably worth looking at what you get with SELECT * FROM joined_results:
city | location | temp_lo | temp_hi | prcp | date | pos
---------------+-----------+---------+---------+------+------------+-----
Hayward | | 37 | 54 | | 1994-11-29 | 1
San Francisco | (-194,53) | 46 | 50 | 0.25 | 1994-11-27 | 1
San Francisco | (-194,53) | 43 | 57 | 0 | 1994-11-29 | 2
After that, just replace what you don't want with white space using CASE WHEN pos > 1 THEN '' ELSE ... END.
(This being said, it's something I'd generally prefer to do in the presentation layer rather than in the query.)

Consider the slightly modified test case in the fiddle below.
Simple case
For the simple case dealing with a single column from each table, comparing to the previous row with the window function lag() does the job:
SELECT CASE WHEN lag(c.contact) OVER (ORDER BY c.contact, p.phone_nr)
= c.contact THEN NULL ELSE c.contact END
, p.phone_nr
FROM contact c
LEFT JOIN phone p USING (contact_id);
You could repeat that for n columns, but that's tedious.
For many columns
SELECT c.*, p.phone_nr
FROM (
SELECT *
, row_number() OVER (PARTITION BY contact_id ORDER BY phone_nr) AS rn
FROM phone
) p
LEFT JOIN contact c ON c.contact_id = p.contact_id AND p.rn = 1;
Something like a "reverse LEFT JOIN". This assumes referential integrity (no missing rows in contact). Also, contacts without any entries in phone are not in the result. Easy to add if need be.
SQL Fiddle.
Aside, your query in the first example exhibits a rookie mistake.
SELECT * FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name)
WHERE location = '(-194,53)';
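One does not combine a LEFT JOIN with a WHERE clause on the right table. It doesn't make sense: the filter discards the NULL-extended rows, which effectively turns the LEFT JOIN into an inner join. Details: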
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
Except to test for existence ...
Select rows which are not present in other table

Related

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
If you want the score matched at the record level instead, change the WHERE clause inside the APPLY.
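The same "latest row on or before the logged date" lookup can be sketched with a correlated subquery, which is a different technique from OUTER APPLY but returns one score per production row in the same way. This is a minimal sketch run through Python's sqlite3 module, not SQL Server: the tables are trimmed to the relevant columns, dates are stored as ISO strings so that text comparison orders them correctly, and the sample dates are adjusted to valid calendar dates (all of these are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductionTable (RecordID INTEGER, Date TEXT, TaskID INTEGER);
    CREATE TABLE StandingDataTable (TaskID INTEGER, DateActiveFrom TEXT, Score REAL);
    INSERT INTO ProductionTable VALUES
        (1, '2020-02-27', 1), (2, '2020-02-27', 1),
        (3, '2020-02-29', 1), (4, '2020-03-01', 1);
    INSERT INTO StandingDataTable VALUES
        (1, '2020-02-01', 1.5), (1, '2020-02-28', 2.0);
""")

# For each production row, pick the single standing-data row with the
# latest DateActiveFrom that is not after the production date.
scores = conn.execute("""
    SELECT p.RecordID,
           (SELECT s.Score
              FROM StandingDataTable AS s
             WHERE s.TaskID = p.TaskID
               AND s.DateActiveFrom <= p.Date
             ORDER BY s.DateActiveFrom DESC
             LIMIT 1) AS Score
      FROM ProductionTable AS p
     ORDER BY p.RecordID
""").fetchall()

for record_id, score in scores:
    print(record_id, score)
```

The ORDER BY ... DESC with a one-row limit plays the role of TOP (1) in the APPLY version.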

SQL - UNION vs NULL functions. Which is better?

I have three tables: ACCT, PERS, ORG. Each ACCT is owned by either a PERS or ORG. The PERS and ORG tables are very similar and so are all of their child tables, but all PERS and ORG data is separate.
I'm writing a query to get PERS and ORG information for each account in ACCT and I'm curious what the best method of combining the information is. Should I use a series of left joins and NULL functions to fill in the blanks, or should I write the queries separately and use UNION to combine?
I've already written separate queries for PERS ACCT's and another for ORG ACCT's and plan on using UNION. My question more pertains to best practice in the future.
I'm expecting both to give me my desired results, but I want to find the most efficient method both in development time and run time.
EDIT: Sample Table Data
ACCT Table:
+---------+---------+--------------+-------------+
| ACCTNBR | ACCTTYP | OWNERPERSNBR | OWNERORGNBR |
+---------+---------+--------------+-------------+
| 555001 | abc | 3010 | |
| 555002 | abc | | 2255 |
| 555003 | tre | 5125 | |
| 555004 | tre | 4485 | |
| 555005 | dsa | | 6785 |
+---------+---------+--------------+-------------+
PERS Table:
+---------+--------------+---------------+----------+-------+
| PERSNBR | PHONE | STREET | CITY | STATE |
+---------+--------------+---------------+----------+-------+
| 3010 | 555-555-5555 | 1234 Main St | New York | NY |
| 5125 | 555-555-5555 | 1234 State St | New York | NY |
| 4485 | 555-555-5555 | 6542 Vine St | New York | NY |
+---------+--------------+---------------+----------+-------+
ORG Table:
+--------+--------------+--------------+----------+-------+
| ORGNBR | PHONE | STREET | CITY | STATE |
+--------+--------------+--------------+----------+-------+
| 2255 | 222-222-2222 | 1000 Main St | New York | NY |
| 6785 | 333-333-3333 | 400 4th St | New York | NY |
+--------+--------------+--------------+----------+-------+
Desired Output:
+---------+---------+--------------+-------------+--------------+---------------+----------+-------+
| ACCTNBR | ACCTTYP | OWNERPERSNBR | OWNERORGNBR | PHONE | STREET | CITY | STATE |
+---------+---------+--------------+-------------+--------------+---------------+----------+-------+
| 555001 | abc | 3010 | | 555-555-5555 | 1234 Main St | New York | NY |
| 555002 | abc | | 2255 | 222-222-2222 | 1000 Main St | New York | NY |
| 555003 | tre | 5125 | | 555-555-5555 | 1234 State St | New York | NY |
| 555004 | tre | 4485 | | 555-555-5555 | 6542 Vine St | New York | NY |
| 555005 | dsa | | 6785 | 333-333-3333 | 400 4th St | New York | NY |
+---------+---------+--------------+-------------+--------------+---------------+----------+-------+
Query Option 1: Write 2 queries and use UNION to combine them:
select a.acctnbr, a.accttyp, a.ownerpersnbr, a.ownerorgnbr, p.phone, p.street, p.city, p.state
from acct a
inner join pers p on p.persnbr = a.ownerpersnbr
UNION
select a.acctnbr, a.accttyp, a.ownerpersnbr, a.ownerorgnbr, o.phone, o.street, o.city, o.state
from acct a
inner join org o on o.orgnbr = a.ownerorgnbr
Option 2: Use NVL() or Coalesce to return a single data set:
SELECT a.acctnbr,
a.accttyp,
NVL(a.ownerpersnbr, a.ownerorgnbr) Owner,
NVL(p.phone, o.phone) Phone,
NVL(p.street, o.street) Street,
NVL(p.city, o.city) City,
NVL(p.state, o.state) State
FROM
acct a
LEFT JOIN pers p on p.persnbr = a.ownerpersnbr
LEFT JOIN org o on o.orgnbr = a.ownerorgnbr
There are way more fields in each of the 3 tables as well as many more PERS and ORG tables in my actual query. Is one way better (faster, more efficient) than another?
That depends on what you consider "better".
Assuming that you will always want to pull all rows from the ACCT table, I'd go for the LEFT OUTER JOIN rather than UNION. (If you do use UNION, prefer the UNION ALL variant.)
EDIT: As you've already shown your queries, mine is no longer required, and did not match your structures. Removing this part.
Why LEFT JOIN? Because with UNION you have to go through ACCT twice, once per "parent" table (whether the parent is filtered separately or through the INNER JOIN condition), while with a plain LEFT OUTER JOIN you'll probably get just one pass through ACCT. In both cases, rows from the "parents" will most probably be accessed by primary key.
Since you are probably thinking of performance when you say "better": as always, test your queries and look at the execution plans with adequate and fresh database statistics in place. Depending on the data "layout" (histograms, etc.), the "better" option may turn out to be something completely different.
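To check that the two shapes return the same rows before comparing plans, a small side-by-side run can help. The sketch below uses Python's sqlite3 module with cut-down versions of the ACCT, PERS and ORG tables (only the key and phone columns, which is an assumption for brevity) and uses COALESCE in place of the Oracle-specific NVL. It says nothing about performance on a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acct (acctnbr INTEGER, ownerpersnbr INTEGER, ownerorgnbr INTEGER);
    CREATE TABLE pers (persnbr INTEGER, phone TEXT);
    CREATE TABLE org  (orgnbr INTEGER, phone TEXT);
    INSERT INTO acct VALUES
        (555001, 3010, NULL), (555002, NULL, 2255), (555005, NULL, 6785);
    INSERT INTO pers VALUES (3010, '555-555-5555');
    INSERT INTO org  VALUES (2255, '222-222-2222'), (6785, '333-333-3333');
""")

# Option 1: two inner joins glued together with UNION ALL.
union_rows = conn.execute("""
    SELECT a.acctnbr, p.phone FROM acct a JOIN pers p ON p.persnbr = a.ownerpersnbr
    UNION ALL
    SELECT a.acctnbr, o.phone FROM acct a JOIN org o ON o.orgnbr = a.ownerorgnbr
    ORDER BY 1
""").fetchall()

# Option 2: one pass over acct with two left joins and COALESCE.
join_rows = conn.execute("""
    SELECT a.acctnbr, COALESCE(p.phone, o.phone) AS phone
      FROM acct a
      LEFT JOIN pers p ON p.persnbr = a.ownerpersnbr
      LEFT JOIN org  o ON o.orgnbr  = a.ownerorgnbr
     ORDER BY a.acctnbr
""").fetchall()

print(union_rows == join_rows)
```

Note that the two options only agree when every account has exactly one owner; an account with no owner appears in the LEFT JOIN result (with NULLs) but not in the UNION ALL result.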
I think you misunderstand what a Union does versus a join statement. A union takes the records from multiple tables, generally similar or the same structure and combines them into a single resultset. It is not meant to combine multiple dissimilar tables.
What I am seeing is that you have two tables PERS and ORG with some of the same data in it. In this case I suggest you union those two tables and then join to ACCT to get the sample output.
In this case to get the output as you have shown you would want to use Outer joins so that you don't drop any records without a match. That will give you nulls in some places but most of the time that is what you want. It is much easier to filter those out later.
Very rough sample code.
SELECT a.*, b.*
from Acct as a
FULL OUTER JOIN (
Select * from PERS UNION Select * from ORG
) as b
ON a.ID = b.ID

SQL Where != stringval not filtering out stringval

I have a table (table1) that comes from HBase and has certain rows that I would like to filter out. I have recreated the table, my SQL query, and the output I receive below. What happens is that when I try to filter out the string value, it stays in the result even though I want it out.
table1 (some positions are fully capitalized and some aren't; I want to make them all capitalized and filter out certain positions)
name | company | personal_id | position
Joe | Applebees| 32 | manager
Jack | Target | 12 | CLERK
Jim | Chipotle | 22 | COOK
Ron | Starbucks| 13 | barista
query
df = sqlContext.sql("""
    select name, company, personal_id, UCASE(position) as position
    from table1
    where position != 'BARISTA'
""")  # tried lower & upper case
Output received
name | company | personal_id | position
Joe | Applebees| 32 | MANAGER
Jack | Target | 12 | CLERK
Jim | Chipotle | 22 | COOK
Ron | Starbucks| 13 | BARISTA /*dont want this output*/
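Why was the row Ron | Starbucks| 13 | BARISTA not filtered out by my where clause?
Try filtering on the upper-cased value. The where clause is evaluated against the stored column (here 'barista'), not the alias in the select list:
where UCASE(position) != 'BARISTA'
Why are you grouping the result? There is no need to group unless an aggregate function is used. Try the query below: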
select name, company, personal_id, UCASE(position) as position
from table1
where upper(position) != 'BARISTA'
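The difference between filtering on the stored column and filtering on the upper-cased expression can be reproduced in a small sketch. It uses Python's sqlite3 module as a stand-in for Spark SQL (an assumption: SQLite also resolves the name in the where clause to the table column when an alias has the same name), with UPPER instead of UCASE and a trimmed table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, position TEXT);
    INSERT INTO table1 VALUES
        ('Joe', 'manager'), ('Jack', 'CLERK'), ('Jim', 'COOK'), ('Ron', 'barista');
""")

# Filter compares the stored value 'barista' with 'BARISTA', so Ron stays.
unfiltered = conn.execute("""
    SELECT name, UPPER(position) AS position
      FROM table1
     WHERE position != 'BARISTA'
     ORDER BY name
""").fetchall()

# Filter compares the upper-cased value, so Ron is removed.
filtered = conn.execute("""
    SELECT name, UPPER(position) AS position
      FROM table1
     WHERE UPPER(position) != 'BARISTA'
     ORDER BY name
""").fetchall()

print(unfiltered)
print(filtered)
```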

Using a table to lookup multiple IDs on one row

I have two tables I am using at work to help me gain experience in writing SQL queries. One table contains a list of Applications and has three columns -
Application_Name, Application_Contact_ID and Business_Contact_ID. I then have a separate table called Contacts with two columns - Contact_ID and Contact_Name. I am trying to write a query that will list the Application_Name and the Contact_Name for both the Application_Contact_ID and Business_Contact_ID columns instead of the ID numbers themselves.
I understand I need to JOIN the two tables but I haven't quite figured out how to formulate the correct statement. Help Please!
APPLICATIONS TABLE:
+------------------+------------------------+---------------------+
| Application_Name | Application_Contact_ID | Business_Contact_ID |
+------------------+------------------------+---------------------+
| Adobe | 23 | 23 |
| Word | 52 | 14 |
| NotePad++ | 44 | 989 |
+------------------+------------------------+---------------------+
CONTACTS TABLE:
+------------+--------------+
| Contact_ID | Contact_Name |
+------------+--------------+
| 23 | Tim |
| 52 | John |
| 14 | Jen |
| 44 | Carl |
| 989 | Sam |
+------------+--------------+
What I am trying to get is:
+------------------+--------------------------+-----------------------+
| Application_Name | Application_Contact_Name | Business_Contact_Name |
+------------------+--------------------------+-----------------------+
| Adobe | Tim | Tim |
| Word | John | Jen |
| NotePad++ | Carl | Sam |
+------------------+--------------------------+-----------------------+
I've tried the below but it is only returning the name for one of the columns:
SELECT Application_Name, Application_Contact_ID, Business_Contact_ID, Contact_Name
FROM Applications
JOIN Contact ON Contact_ID = Application_Contact_ID
This is a fundamental, 101-level part of SQL. Consider reading this other answer on a different question, which explains joins in more depth. The trick to your query is that you have to join the CONTACTS table twice, once for application_contact_id and once for business_contact_id, which is a bit hard to visualize at first.
There are many flavors of joins (INNER, LEFT, RIGHT, etc.), which you'll want to familiarize yourself with for future reference. Consider reading this article at the very least: https://www.techonthenet.com/sql_server/joins.php.
SELECT t1.application_name Application_Name,
t2.contact_name Application_Contact_name,
t3.contact_name Business_Contact_name
FROM applications t1
INNER JOIN contacts t2 ON t1.Application_Contact_ID = t2.contact_id -- join contacts for appName
INNER JOIN contacts t3 ON t1.business_Contact_ID = t3.contact_id; -- join contacts for busName
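To see the double join run against the sample data from the question, here is a minimal sketch using Python's sqlite3 module. The aliases ac and bc (application contact, business contact) are chosen for illustration, and the in-memory tables are an assumption standing in for the real ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (
        application_name TEXT,
        application_contact_id INTEGER,
        business_contact_id INTEGER
    );
    CREATE TABLE contacts (contact_id INTEGER, contact_name TEXT);
    INSERT INTO applications VALUES
        ('Adobe', 23, 23), ('Word', 52, 14), ('NotePad++', 44, 989);
    INSERT INTO contacts VALUES
        (23, 'Tim'), (52, 'John'), (14, 'Jen'), (44, 'Carl'), (989, 'Sam');
""")

# contacts is joined twice under two aliases, one per ID column.
rows = conn.execute("""
    SELECT a.application_name,
           ac.contact_name AS application_contact_name,
           bc.contact_name AS business_contact_name
      FROM applications a
      JOIN contacts ac ON ac.contact_id = a.application_contact_id
      JOIN contacts bc ON bc.contact_id = a.business_contact_id
     ORDER BY a.application_name
""").fetchall()

for row in rows:
    print(row)
```

Each alias behaves like an independent copy of the contacts table, which is why the same table can supply two different names on one output row.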

JOIN, aggregate and convert in postgres between two tables

Here are the two tables I have. All columns in both tables are of type "text".
Names
--------------------------------
Name | DoB | Team |
--------------------------------
Harry | 3/12/85 | England
Kevin | 8/07/86 | England
James | 5/05/89 | England
Scores
------------------------
ScoreName | Score
------------------------
James-1 | 120
Harry-1 | 30
Harry-2 | 40
James-2 | 56
The end result I need is a table that has the following:
NameScores
---------------------------------------------
Name | DoB | Team | ScoreData
---------------------------------------------
Harry | 3/12/85 | England | "{"ScoreName":"Harry-1", "Score":"30"}, {"ScoreName":"Harry-2", "Score":"40"}"
Kevin | 8/07/86 | England | null
James | 5/05/89 | England | "{"ScoreName":"James-1", "Score":"120"}, {"ScoreName":"James-2", "Score":"56"}"
I need to do this using a single SQL command, which I will use to create a materialized view.
I have gotten as far as realising that it will involve a combination of string_agg, JOIN and JSON, but haven't been able to crack it fully. Please help :)
I don't think the join is tricky. The complication is building the JSON object:
select n.name, n.dob, n.team,
       json_agg(json_build_object('ScoreName', s.scorename,
                                  'Score', s.score)) as ScoreData
from names n left join
     scores s
     on s.scorename like concat(n.name, '-', '%')
group by n.name, n.dob, n.team;
Note: json_build_object() was introduced in Postgres 9.4.
EDIT:
I think you can add a case expression to get the simple NULL:
(case when count(s.scorename) = 0 then NULL
      else json_agg(json_build_object('ScoreName', s.scorename,
                                      'Score', s.score))
 end) as ScoreData
Use json_agg() with row_to_json() to aggregate scores data into a json value:
select n.*, json_agg(row_to_json(s)) "ScoreData"
from "Names" n
left join "Scores" s
on n."Name" = regexp_replace(s."ScoreName", '(.*)-.*', '\1')
group by 1, 2, 3;
Name | DoB | Team | ScoreData
-------+---------+---------+---------------------------------------------------------------------------
Harry | 3/12/85 | England | [{"ScoreName":"Harry-1","Score":30}, {"ScoreName":"Harry-2","Score":40}]
James | 5/05/89 | England | [{"ScoreName":"James-1","Score":120}, {"ScoreName":"James-2","Score":56}]
Kevin | 8/07/86 | England | [null]
(3 rows)