Changing IS NULL with (+) Syntax - sql

I have this query:
SELECT B.id, B.name, D.id, D.name FROM TBB B, TDD D
WHERE (D.id = B.id OR D.id IS NULL)
As I understand it, (D.id = B.id OR D.id IS NULL) returns the records whose id exists in both TBB and TDD and, because of D.id IS NULL, also returns all B.id records even when the two tables have no matching id.
So, is the following query the same as the one above?
SELECT B.id, B.name, D.id, D.name FROM TBB B, TDD D
WHERE B.id = D.id (+)
Thanks in advance!

The second query returns every row in table TBB; where there is no matching row in TDD, the TDD columns are returned as NULL.
If TDD has an ID that matches no ID in TBB, that row is not returned by either of the two queries.

I am pretty sure that your top query would result in a cartesian join: because of the OR in your condition, every record in TDD that has a NULL id is joined to every record in B.
Based on that, you would be better off using the outer join.
I also wrote a lengthy Q&A that looks at joins and how to pull data from multiple tables, which you might be interested in (How can an SQL query return data from multiple tables). It covers unions, inner and outer joins, as well as subqueries, with loads of code and output results explained in detail (to the point that I hit the answer length limit and had to post a second answer).
Edit: After running a quick test, this is what I came up with:
mysql> select a.ID, a.Title, b.Name as Author
from books a join authors b
on a.authorID=b.ID or b.id=0;
+----+----------------------+-------------------+
| ID | Title                | Author            |
+----+----------------------+-------------------+
|  1 | Call of the Wild     | Fluffeh           |
|  1 | Call of the Wild     | Jack London       |
|  2 | Martin Eden          | Fluffeh           |
|  2 | Martin Eden          | Jack London       |
|  3 | Old Goriot           | Fluffeh           |
|  3 | Old Goriot           | Honore de Balzac  |
|  4 | Cousin Bette         | Fluffeh           |
|  4 | Cousin Bette         | Honore de Balzac  |
|  5 | Jew Suess            | Fluffeh           |
|  5 | Jew Suess            | Lion Feuchtwanger |
|  6 | Nana                 | Fluffeh           |
|  6 | Nana                 | Emile Zola        |
|  7 | The Belly of Paris   | Fluffeh           |
|  7 | The Belly of Paris   | Emile Zola        |
|  8 | In Cold blood        | Fluffeh           |
|  8 | In Cold blood        | Truman Capote     |
|  9 | Breakfast at Tiffany | Fluffeh           |
|  9 | Breakfast at Tiffany | Truman Capote     |
+----+----------------------+-------------------+
18 rows in set (0.00 sec)
mysql> select a.ID, a.Title, b.Name as Author
from books a right outer join authors b
on a.authorID=b.ID;
+------+----------------------+-------------------+
| ID   | Title                | Author            |
+------+----------------------+-------------------+
| NULL | NULL                 | Fluffeh           |
|    1 | Call of the Wild     | Jack London       |
|    2 | Martin Eden          | Jack London       |
|    3 | Old Goriot           | Honore de Balzac  |
|    4 | Cousin Bette         | Honore de Balzac  |
|    5 | Jew Suess            | Lion Feuchtwanger |
|    6 | Nana                 | Emile Zola        |
|    7 | The Belly of Paris   | Emile Zola        |
|    8 | In Cold blood        | Truman Capote     |
|    9 | Breakfast at Tiffany | Truman Capote     |
+------+----------------------+-------------------+
10 rows in set (0.00 sec)
The first result is certainly not the same as an outer join. As I thought, (at least in MySQL) the first statement produces a cartesian result, but the outer join does not.
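For anyone who wants to reproduce the difference without an Oracle or MySQL instance, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and since SQLite has no (+) syntax, the second query is written as the LEFT OUTER JOIN that B.id = D.id (+) stands for.

```python
import sqlite3

def compare_queries(tdd_rows):
    """Run both query shapes against tiny stand-in tables and return their rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE TBB (id INTEGER, name TEXT);
        CREATE TABLE TDD (id INTEGER, name TEXT);
        INSERT INTO TBB VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
    """)
    con.executemany("INSERT INTO TDD VALUES (?, ?)", tdd_rows)
    # Query 1: comma join filtered with "equal OR D.id IS NULL"
    or_is_null = con.execute("""
        SELECT B.id, D.id FROM TBB B, TDD D
        WHERE (D.id = B.id OR D.id IS NULL)
    """).fetchall()
    # Query 2: the outer join that the (+) notation expresses
    outer_join = con.execute("""
        SELECT B.id, D.id FROM TBB B
        LEFT OUTER JOIN TDD D ON B.id = D.id
    """).fetchall()
    con.close()
    return or_is_null, outer_join

# Case A: TDD has no NULL ids, so query 1 behaves like an inner join.
print(compare_queries([(1, "d1")]))
# Case B: one TDD row has a NULL id, and query 1 pairs it with every TBB row.
print(compare_queries([(1, "d1"), (None, "dn")]))
```

In case A the first query drops the unmatched TBB rows entirely; in case B it adds one extra row per TBB record. Neither matches the outer join, which always returns each TBB row at least once.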


Remove Duplicate Result on Query

Could you help me solve this duplication problem? My query returns more than one result for the same record. I want only one result for each id, carrying only the last history entry of each record.
My Query:
SELECT DISTINCT ON(tickets.ticket_id,ticket_histories.created_at)
ticket.id AS ticket_id,
tickets.priority,
tickets.title,
tickets.company,
tickets.ticket_statuse,
tickets.created_at AS created_ticket,
group_user.id AS group_id,
group_user.name AS user_group,
ch_history.description AS ch_description,
ch_history.created_at AS ch_history
FROM
tickets
INNER JOIN company ON (company.id = tickets.company_id)
INNER JOIN (SELECT id,
tickets_id,
description,
user_id,
MAX(tickets.created_at) AS created_ticket
FROM
ch_history
GROUP BY id,
created_at,
ticket_id,
user_id,
description
ORDER BY created_at DESC LIMIT 1) AS ch_history ON (ch_history.ticket_id = ticket.id)
INNER JOIN users ON (users.id = ch_history.user_id)
INNER JOIN group_users ON (group_users.id = users.group_user_id)
WHERE company = 15
GROUP BY
tickets.id,
ch_history.created_at DESC;
This is the result of my query; it returns 3 or 5 rows with identical ids and different histories.
I want to return only one row per ticket id, with only the last recorded history of each ticket.
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:35.724485 | SAME COMPANY | people | 5 | TEST 1 | 2019-12-10 09:31:45.780667
49706 | 2 | INCLUDE DATA | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 2 | 2019-12-10 09:38:52.769515
49706 | 2 | ANY TITLE | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 3 | 2019-12-10 09:39:22.779473
49706 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TESTE 4 | 2019-12-10 09:42:59.50332
49706 | 2 | WHITESTRIPES | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 5 | 2019-12-10 09:44:30.675434
I wanted it to return the following:
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:10.724485 | SAME COMPANY | people | 5 | TEST 1 | 2020-01-01 18:31:45.780667
49707 | 2 | INCLUDE DATA | 1 | f | 2019-12-11 19:22:21.320701 | SAME COMPANY | people | 5 | TEST 2 | 2020-02-05 16:38:52.769515
49708 | 2 | ANY TITLE | 1 | f | 2019-12-15 07:15:57.320950 | SAME COMPANY | people | 5 | TEST 3 | 2020-02-06 07:39:22.779473
49709 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-16 08:30:28.320881 | SAME COMPANY | people | 5 | TESTE 4 | 2020-01-07 11:42:59.50332
49701 | 2 | WHITESTRIPES | 1 | f | 2019-12-21 11:04:00.320450 | SAME COMPANY | people | 5 | TEST 5 | 2020-01-04 10:44:30.675434
In the result I want, the ch_description and ch_history fields hold only the most recent record for each listed ticket, without duplication. Could you help me get there?
Two things jump out at me:
1: You have listed created_at as part of your DISTINCT ON, which will inherently give you multiple rows per ticket id (unless a ticket happens to have only one history row).
2: The DISTINCT ON should make the subquery on the ticket history unnecessary. Even if you chose to do it this way, you are again grouping on the created_at column, which will give you multiple results. The ideal subquery, should you choose this approach, would group by ticket_id and only ticket_id.
Slightly related:
An alternative approach to the subquery would be an analytic function (windowing function), but I'll save that for another day.
I think the query you want, which will give you one row per ticket_id, based on the history table's created_at field would be something like this:
select distinct on (t.id)
<your fields here>
from
tickets t
join company c on t.company_id = c.id
join ch_history ch on ch.ticket_id = t.id
join users u on ch.user_id = u.id
join group_users g on u.group_user_id = g.id
where
company = 15
order by
t.id, ch.created_at desc -- this is what tells distinct on which record to choose; desc keeps the most recent history row
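The "group by ticket_id and only ticket_id" subquery mentioned above can also be tried out locally. This is only a sketch: it uses Python's sqlite3 module (SQLite has no DISTINCT ON), the tables are cut down to a few made-up columns, and the timestamps are stored as ISO text so that MAX() compares them correctly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tickets (id INTEGER, title TEXT);
    CREATE TABLE ch_history (id INTEGER, ticket_id INTEGER,
                             description TEXT, created_at TEXT);
    INSERT INTO tickets VALUES (49706, 'INCLUDE DATA'), (49713, 'REMOVE DATA');
    INSERT INTO ch_history VALUES
        (1, 49713, 'TEST 1', '2019-12-10 09:31:45'),
        (2, 49706, 'TEST 2', '2019-12-10 09:38:52'),
        (3, 49706, 'TEST 3', '2019-12-10 09:39:22'),
        (4, 49706, 'TEST 5', '2019-12-10 09:44:30');
""")

# The inner query yields one row per ticket_id (its newest timestamp);
# joining it back to ch_history keeps only that newest history row.
latest = con.execute("""
    SELECT t.id, t.title, ch.description, ch.created_at
    FROM tickets t
    JOIN ch_history ch ON ch.ticket_id = t.id
    JOIN (SELECT ticket_id, MAX(created_at) AS last_at
          FROM ch_history
          GROUP BY ticket_id) newest
      ON newest.ticket_id = ch.ticket_id AND newest.last_at = ch.created_at
    ORDER BY t.id
""").fetchall()

for row in latest:
    print(row)
```

Ticket 49706 has three history rows in this sample, but only the one with the newest created_at survives the join. If two history rows of a ticket shared the exact same timestamp, both would be returned, which DISTINCT ON would not do.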

SQL: Cascading conditions on Join

I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches, try to match on item_name
3: if neither item_id nor item_name matches, try to match on item_division
4: if item_division does not match either, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- code
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved joining the table onto itself multiple times, and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
It leaves out entries which have no owner, as in the result below (id=3 is missing because it has no owner):
Row id item_id item_name item_division units revenue owner
1 1 xyz pen UK 10 100 bob
2 2 pqr cat US 15 120 dave
3 4 xcv hat UK 11 140 fred
4 5 bnm cow UK 14 150 alice
I am using the following query (thanks @Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
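To check that the three LEFT JOINs plus COALESCE produce the desired owners, the sample data from the question can be loaded into an in-memory SQLite database. This is a sketch with Python's sqlite3 module rather than BigQuery, and it leaves out the units and revenue columns because they play no part in the match.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE datapoints (id INTEGER, item_id TEXT, item_name TEXT,
                             item_division TEXT);
    CREATE TABLE identifiers (id INTEGER, type TEXT, code TEXT, owner TEXT);
    INSERT INTO datapoints VALUES
        (1, 'xyz', 'pen', 'UK'), (2, 'pqr', 'cat', 'US'), (3, 'asd', 'dog', 'US'),
        (4, 'xcv', 'hat', 'UK'), (5, 'bnm', 'cow', 'UK');
    INSERT INTO identifiers VALUES
        (1, 'id', 'xyz', 'bob'), (2, 'name', 'cat', 'dave'),
        (3, 'division', 'UK', 'alice'), (4, 'name', 'pen', 'erica'),
        (5, 'id', 'xcv', 'fred');
""")

# COALESCE returns the first non-NULL owner, so the join order b, c, d
# encodes the priority id > name > division.
owners = con.execute("""
    SELECT a.id, COALESCE(b.owner, c.owner, d.owner) AS owner
    FROM datapoints a
    LEFT JOIN identifiers b ON a.item_id = b.code AND b.type = 'id'
    LEFT JOIN identifiers c ON a.item_name = c.code AND c.type = 'name'
    LEFT JOIN identifiers d ON a.item_division = d.code AND d.type = 'division'
    ORDER BY a.id
""").fetchall()

print(owners)
```

Row 1 matches on id, name and division at once, and the id match (bob) wins. This only stays at one output row per datapoint as long as each (type, code) pair appears once in identifiers; duplicates there would multiply rows.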
I'm not sure whether BigQuery optimizes a query like this today, but at least you would be writing a query that gives strong hints not to run the subqueries when they are not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello

Postgres Join Query is SOMETIMES taking the cartesian product

I'm attempting to join multiple tables in one query and I am getting inconsistent results from the database. I believe my query is taking the cartesian product of all the users, when I only want the users who are in the DirectConversation.
The Schema for reference:
The query is (where $id stands for the variable User.id):
SELECT c.*, count(dm.id),
u1.first_name, u1.last_name, u1.company, u1.picture,
u2.first_name, u2.last_name, u2.company, u2.picture
FROM "DirectConversation" as c, "DirectMessage" as dm, "Profile" as u1, "Profile" as u2
WHERE u1."id_User" = c."id_User1"
AND u2."id_User" = c."id_User2"
AND c.id = dm."id_DirectConversation"
AND dm.viewed = 'f' AND dm.deleted = 'f'
AND c."id_User1" = $id OR c."id_User2" = $id
GROUP BY c.id, u1.id, u2.id;
The expected result (the result when the user id = 1 ):
id | id_User1 | id_User2 | count | first_name | last_name | company | picture | first_name | last_name | company | picture
----+----------+----------+-------+------------+-----------+--------------------+-----------------------------------------------------------------------------+------------+-----------+----------------+--------------------------------------------------------------------------
1 | 1 | 2 | 3 | Albert | Einstein | alberts inventions | http://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg | Nikola | Tesla | Teslas Widgets | http://upload.wikimedia.org/wikipedia/commons/7/79/Tesla_circa_1890.jpeg
(1 row)
(END)
The error result (the result when the user id= 2):
id | id_User1 | id_User2 | count | first_name | last_name | company | picture | first_name | last_name | company | picture
----+----------+----------+-------+------------+-----------+--------------------+----------------------------------------------------------------------------------------------------------------+------------+-----------+--------------------+----------------------------------------------------------------------------------------------------------------
1 | 1 | 2 | 4 | Albert | Einstein | alberts inventions | http://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg | Albert | Einstein | alberts inventions | http://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg
1 | 1 | 2 | 4 | Albert | Einstein | alberts inventions | http://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg | Nikola | Tesla | Teslas Widgets | http://upload.wikimedia.org/wikipedia/commons/7/79/Tesla_circa_1890.jpeg
1 | 1 | 2 | 4 | Albert | Einstein | alberts inventions | http://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg | Rosalind | Franklin | DNA R US | http://upload.wikimedia.org/wikipedia/en/9/97/Rosalind_Franklin.jpg
1 | 1 | 2 | 4 | Albert | Einstein | alberts inventions | http://upload.wikimedia.org/wikipedia/commons/d/d3/Albert_Einstein_Head.jpg | Charles | Babbage | Babbages Cabbages | http://upload.wikimedia.org/wikipedia/commons/6/6b/Charles_Babbage_-_1860.jpg
... Note this was truncated for brevity. I believe this is taking the cartesian product of all the users; however, I don't know why.
The version of postgres I'm using:
version
-----------------------------------------------------------------------------------------------------
PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
I just moved the join conditions from your WHERE clause into ON clauses and made proper joins. If this doesn't work, I'll set up a sqlfiddle and see what the problem with this SQL is.
SELECT c.*, count(dm.id),
u1.first_name, u1.last_name, u1.company, u1.picture,
u2.first_name, u2.last_name, u2.company, u2.picture
FROM "DirectConversation" c
JOIN "DirectMessage" dm ON c.id = dm."id_DirectConversation"
JOIN "Profile" u1 ON u1."id_User" = c."id_User1"
JOIN "Profile" u2 ON u2."id_User" = c."id_User2"
WHERE
dm.viewed = 'f' AND dm.deleted = 'f'
AND (c."id_User1" = $id OR c."id_User2" = $id)
GROUP BY c.id, u1.id, u2.id;
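Edit: grouped the OR clause in parentheses just to be safe. In the original query, AND binds more tightly than OR, so c."id_User2" = $id on its own satisfied the WHERE clause and bypassed the join conditions, which is where the cartesian product came from.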
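The effect of the missing parentheses is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module with simplified, made-up stand-ins for the DirectConversation and Profile tables (lower-case names, no DirectMessage, no counts); only the AND/OR precedence is being demonstrated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE conversation (id INTEGER, id_user1 INTEGER, id_user2 INTEGER);
    CREATE TABLE profile (id_user INTEGER, first_name TEXT);
    INSERT INTO conversation VALUES (1, 1, 2);
    INSERT INTO profile VALUES (1, 'Albert'), (2, 'Nikola'), (3, 'Rosalind');
""")

# Parsed as (join AND join AND id_user1 = :id) OR id_user2 = :id
UNGROUPED = """
    SELECT u1.first_name, u2.first_name
    FROM conversation c, profile u1, profile u2
    WHERE u1.id_user = c.id_user1
      AND u2.id_user = c.id_user2
      AND c.id_user1 = :id OR c.id_user2 = :id
"""

# The parentheses keep the join conditions in force for both branches.
GROUPED = """
    SELECT u1.first_name, u2.first_name
    FROM conversation c, profile u1, profile u2
    WHERE u1.id_user = c.id_user1
      AND u2.id_user = c.id_user2
      AND (c.id_user1 = :id OR c.id_user2 = :id)
"""

def run(sql, user_id):
    return con.execute(sql, {"id": user_id}).fetchall()

ungrouped_user1 = run(UNGROUPED, 1)  # looks fine: the OR branch is false
ungrouped_user2 = run(UNGROUPED, 2)  # every profile pair passes the OR branch
grouped_user2 = run(GROUPED, 2)

print(len(ungrouped_user1), len(ungrouped_user2), len(grouped_user2))
```

With three profiles, the ungrouped query returns all 3 x 3 profile pairs for user 2 but a single row for user 1, which matches the "sometimes" in the question title.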

SQL one-to-many tables group by

Consider this example: I have the following tables, TableA with people and TableB containing the language skills of these people. Each row describing a person can have zero, one or more rows in TableB. Example below:
People
+-----+--------+
| pId | Name |
+-----+--------+
| 0 | Thomas |
| 1 | Henry |
| 2 | John |
+-----+--------+
Skills
+-----+-----+----------+---------------+
| lID | pId | Language | LanguageSkill |
+-----+-----+----------+---------------+
| 0 | 0 | Dutch | 0 |
| 1 | 0 | French | 4 |
| 2 | 0 | Italian | 2 |
| 3 | 2 | Italian | 2 |
+-----+-----+----------+---------------+
Thomas knows Dutch, French and Italian; Henry doesn't know any foreign language; and John knows Italian.
What I want to get is the best known language for each person from TableA:
+--------+----------+
| Name | Language |
+--------+----------+
| Thomas | French |
| Henry | NULL |
| John | Italian |
+--------+----------+
I have a feeling that this is quite an easy thing, but I have no idea how to achieve it in a simple way.
Thanks for your responses.
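You first need to get the best language for each person. Grouping by pid with HAVING languageskill = max(languageskill) is not reliable, because languageskill is not part of the GROUP BY; instead, compute the maximum skill per person and join it back to the skills table:
SELECT s.pid, s.language
from TableB s
JOIN (SELECT pid, max(languageskill) AS best
from TableB
group by pid) m
ON m.pid = s.pid AND m.best = s.languageskill
Then you join it onto the People table:
SELECT a.name, b.language
from TableA a
LEFT JOIN
(
SELECT s.pid, s.language
from TableB s
JOIN (SELECT pid, max(languageskill) AS best
from TableB
group by pid) m
ON m.pid = s.pid AND m.best = s.languageskill
) b
ON a.pid = b.pid
Note that if a person has a 'tied' best language, this method returns one row for each of the tied languages.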
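Here is a small runnable check of the "find each person's maximum skill, then LEFT JOIN" idea on the sample data from the question. It is a sketch using Python's sqlite3 module; the alias names m, s and best_skill are my own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (pId INTEGER, Name TEXT);
    CREATE TABLE TableB (lID INTEGER, pId INTEGER, Language TEXT,
                         LanguageSkill INTEGER);
    INSERT INTO TableA VALUES (0, 'Thomas'), (1, 'Henry'), (2, 'John');
    INSERT INTO TableB VALUES
        (0, 0, 'Dutch', 0), (1, 0, 'French', 4),
        (2, 0, 'Italian', 2), (3, 2, 'Italian', 2);
""")

# m holds one row per person with their highest skill value; joining back
# to TableB on (pId, skill) recovers the language that carries that value.
best = con.execute("""
    SELECT a.Name, s.Language
    FROM TableA a
    LEFT JOIN (SELECT pId, MAX(LanguageSkill) AS best_skill
               FROM TableB
               GROUP BY pId) m ON m.pId = a.pId
    LEFT JOIN TableB s ON s.pId = m.pId AND s.LanguageSkill = m.best_skill
    ORDER BY a.pId
""").fetchall()

print(best)
```

Henry has no rows in TableB, so both LEFT JOINs leave his language as NULL (None in Python) instead of dropping him from the result.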

Querying a many to many table

I'm trying to query a table and I'm having a hard time figuring out the query.
These are my tables (simplified):
Member
ID | NAME
1 | Frans
2 | Eric
3 | Stephan
4 | Kris
Evenement
ID | NAME
1 | Picknic
2 | Party
3 | Movie
Member_Evenement
ID_EVENEMENT | ID_MEMBER
1 | Kris
1 | Stephan
1 | Eric
2 | Eric
2 | Frans
3 | Frans
3 | Stephan
Alright, the query I want to do is this:
I want to
select
member_evenement.ID_MEMBER and member_evenement.ID_EVENEMENT
from
member_evenement
where
member.ID on member_evenement.ID_MEMBER
where
member_evenement.ID_MEMBER does not exist
for each member_evenement.ID_EVENEMENT
separately.
I'm using sql server 2008 R2
I hope I explained my question well enough.
If these are my base tables
Member
ID | NAME
1 | Frans
2 | Eric
3 | Stephan
4 | Kris
Evenement
ID | NAME
1 | Picknic
2 | Party
3 | Movie
Member_Evenement
ID_EVENEMENT | ID_MEMBER
1 | Kris
1 | Stephan
1 | Eric
2 | Eric
2 | Frans
3 | Frans
3 | Stephan
then the result of my query should look like this:
Evenement
ID_EVENEMENT | ID_MEMBER | MEMBER_NAME | EVENEMENT_NAME
1 | 1 | Frans | Picknic
2 | 3 | Stephan | Party
2 | 4 | Kris | Party
3 | 2 | Eric | Movie
3 | 4 | Kris | Movie
SELECT e.ID AS ID_EVENEMENT, m.ID AS ID_MEMBER
FROM Evenement e, Member m
EXCEPT
SELECT ID_EVENEMENT, ID_MEMBER
FROM member_evenement;
To return all combinations of member and evenement that are not recorded on the member_evenement table, try the following:
select e.id id_evenement,
m.id id_member,
m.name member_name,
e.name evenement_name
from member m
cross join evenement e
left join member_evenement me
on e.id = me.id_evenement and m.id = me.id_member
where me.id_evenement is null or me.id_member is null
(This assumes that id_member is actually the member's id, and not their name as in the sample data.)
Another possibility, which gives the exact same execution plan as Mark Bannister's and onedaywhen's answer is the following:
SELECT member.id AS memberid, evenement.id AS evenementid
FROM member
CROSS JOIN evenement WHERE
NOT EXISTS(
SELECT NULL AS [Empty]
FROM member_evenement
WHERE member_evenement.id_member = member.id
AND member_evenement.id_evenement = evenement.id
)
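Both anti-join shapes (LEFT JOIN ... IS NULL and NOT EXISTS) can be checked against the sample data. This is a sketch using Python's sqlite3 module instead of SQL Server, and it assumes, as noted above, that member_evenement stores member ids rather than names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE member (id INTEGER, name TEXT);
    CREATE TABLE evenement (id INTEGER, name TEXT);
    CREATE TABLE member_evenement (id_evenement INTEGER, id_member INTEGER);
    INSERT INTO member VALUES (1, 'Frans'), (2, 'Eric'), (3, 'Stephan'), (4, 'Kris');
    INSERT INTO evenement VALUES (1, 'Picknic'), (2, 'Party'), (3, 'Movie');
    INSERT INTO member_evenement VALUES
        (1, 4), (1, 3), (1, 2), (2, 2), (2, 1), (3, 1), (3, 3);
""")

# Every (member, evenement) pair, minus the pairs that are already recorded.
via_left_join = con.execute("""
    SELECT e.id, m.id, m.name, e.name
    FROM member m
    CROSS JOIN evenement e
    LEFT JOIN member_evenement me
           ON e.id = me.id_evenement AND m.id = me.id_member
    WHERE me.id_evenement IS NULL
    ORDER BY e.id, m.id
""").fetchall()

via_not_exists = con.execute("""
    SELECT e.id, m.id, m.name, e.name
    FROM member m
    CROSS JOIN evenement e
    WHERE NOT EXISTS (SELECT 1 FROM member_evenement me
                      WHERE me.id_member = m.id
                        AND me.id_evenement = e.id)
    ORDER BY e.id, m.id
""").fetchall()

for row in via_left_join:
    print(row)
```

Both queries return the same five rows as the expected result in the question, so the choice between them is mostly a matter of readability and of what the optimizer of the target database does with each form.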