How to prioritise selection of one column value over another on join?

I have two tables: offers (with columns id and user_id) and offer_maps (with columns offer_id and user_id). I want to join them on the offer ID, and the final result should have a user_id column that prioritises offer_maps.user_id over offers.user_id. For example, if offer_maps.user_id is null and offers.user_id has a value, the final user_id should be offers.user_id. But if both are populated, only offer_maps.user_id should be picked. How can I achieve this in a SQL query? Here's a sample I wrote:
select concat(offers.user_id, o.user_id) AS user_id
from offers
left join offer_maps o on offers.id::text = o.offer_id
This concatenates the values of both columns, but I need only one of them when both exist.

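You can use COALESCE in this case. It returns its first non-null argument, so list offer_maps' user_id first. (ISNULL(a, b) does the same job in SQL Server, but the ::text cast in your query suggests PostgreSQL, which has no two-argument ISNULL function.)
select COALESCE(o.user_id, offers.user_id) AS user_id
from offers
left join offer_maps o on offers.id::text = o.offer_id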
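A quick way to check the fallback behaviour the question asks for is to run the join against a few sample rows. The sketch below is not the asker's schema: it uses Python's built-in sqlite3 module, made-up user_id values, and CAST(... AS TEXT) in place of the PostgreSQL ::text cast. It relies on COALESCE, which returns its first non-null argument.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offers (id INTEGER PRIMARY KEY, user_id TEXT);
    CREATE TABLE offer_maps (offer_id TEXT, user_id TEXT);
    INSERT INTO offers VALUES
        (1, 'offer-user-1'), (2, 'offer-user-2'), (3, 'offer-user-3');
    -- Offer 1 has a mapped user, offer 2 has a map row with a NULL user,
    -- and offer 3 has no map row at all.
    INSERT INTO offer_maps VALUES ('1', 'map-user-1'), ('2', NULL);
""")

# offer_maps.user_id is listed first, so it wins whenever it is not NULL.
rows = conn.execute("""
    SELECT offers.id, COALESCE(o.user_id, offers.user_id) AS user_id
    FROM offers
    LEFT JOIN offer_maps o ON CAST(offers.id AS TEXT) = o.offer_id
    ORDER BY offers.id
""").fetchall()

for row in rows:
    print(row)
```

Only offer 1 should come back with the offer_maps value; offers 2 and 3 fall back to the value stored in offers.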

--T-SQL (SQL Server) variant. This assumes O.user_id is not nullable, but OM.user_id is:
SELECT O.id AS offer_id, ISNULL(OM.user_id, O.user_id) AS user_id
FROM offers AS O
LEFT JOIN offer_maps AS OM
ON OM.offer_id = O.id
--No extra join condition is needed: ISNULL falls back to O.user_id when OM.user_id is null or no map row matches.

Related

CASE WHEN Statement with two tables with table 2 column referencing has non-unique value

I have a query that I have worked on, and only one section is giving me trouble. I am trying to create a column within the query based on the values of two tables. I have tried CASE WHEN and it works, but because of the non-unique values involved, the row count increases compared with the original query without it. For example, this is the CASE WHEN that I have written:
Select r.Id,
r.RequiredOn AT TIME ZONE 'UTC' AT TIME ZONE 'Central Standard Time'
as RequiredDate,
Concat(vs.Salutation, ' ',vs.FirstName, ' ', vs.LastName) as Name,
oo.Name as RequestingOrganization,
o.Name as Location,
Case
When r.IntendedOutcome = '1' Then 'T'
When r.IntendedOutcome = '2' Then 'R'
End as RequestType,
etr.TypeRequested,
Case
When etr.Identifier is not null then etr.Identifier
When etr.Identifier is null then ' '
End as Identifier,
f.OfferedOn,
f.OfferResponse,
r.DestinationCountryCodes,
o.Id,
CASE
WHEN o.Id = oir.OrganizationId AND oir.OrganizationRoleId =
'de51c814-f86d-49c9-941b-999a98be4894'
THEN 1
ELSE NULL
END AS Bk1
From [Request] r
Left Join Recovered etr
on etr.DistributionRequestId = r.Id
Left Join [Offer] f
on f.Id = etr.Id
Left Join [dbo].Contact vs
on vs.Id = r.SId
Left Join [dbo].Organization o
on o.Id = r.SLocationId or o.Id = r.RLocationId
Left Join [dbo].Organization oo
on oo.Id = r.RequestingOrganizationId
Left Join dbo.OrganizationInRole oir
on oir.OrganizationId = o.Id
Where f.Response = 'Accepted' or f.Response is NULL
The picture shows that OrganizationId is not unique in this table. So when an OrganizationId is matched and the OrganizationRoleId is found, the query brings over all of that organization's OrganizationRoleId rows and adds them to the result, rather than just checking that the organization has the particular role ID and flagging the one row I need.
The Organization Role column is non-unique, and every organization can have multiple roles (sometimes 4-5). I need that if the OrganizationId is A and the matching OrganizationId in Table 2 has the identifier in the OrganizationRole column, then add a 1.
The Organization table (Table 2) has an OrganizationId column and an OrganizationRole column. The OrganizationId is non-unique: the same OrganizationId can appear in 5 consecutive rows when that organization has 5 roles.
The result is that the query pulls in every role of each organization that matches in that table. It added roughly 33% more rows compared with the original query.

When you say
... if the OrganizationId is A and the matching OrganizationId in Table 2 has the identifier in the OrganizationRole column, then add a 1.
Do you want to count the number of times this condition is true? If so, you need to wrap your CASE in an aggregate function and group by the other columns.
Alternatively, as Stu suggests in the comments, you could pre-aggregate the OrganizationInRole table, filtering for the role you are actually interested in; something like
SELECT r.Id,
...
oir.RoleCount AS Bk1
FROM [Request] r
...
LEFT JOIN [dbo].Organization o
ON o.Id = r.SLocationId or o.Id = r.RLocationId
...
LEFT JOIN (
SELECT OrganizationId, COUNT(*) AS RoleCount
FROM dbo.OrganizationInRole
WHERE OrganizationRoleId = 'de51c814-f86d-49c9-941b-999a98be4894'
GROUP BY OrganizationId) AS oir ON oir.OrganizationId = o.Id
...
You can do this for any other table which has multiple related rows, reducing them to a single row to join to and removing the need for aggregation and grouping in the main query.
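To see why pre-aggregating keeps the row count stable, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema is cut down from the question, and the IDs, names and the 'role-wanted' value are invented for illustration; they stand in for the real GUIDs.

```python
import sqlite3

# Hypothetical, cut-down schema: organization A has three roles, B has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Organization (Id TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE OrganizationInRole (OrganizationId TEXT, OrganizationRoleId TEXT);
    INSERT INTO Organization VALUES ('A', 'Org A'), ('B', 'Org B');
    INSERT INTO OrganizationInRole VALUES
        ('A', 'role-wanted'), ('A', 'role-other-1'), ('A', 'role-other-2'),
        ('B', 'role-other-1');
""")

# Joining the role table directly yields one output row per role.
naive = conn.execute("""
    SELECT o.Id,
           CASE WHEN oir.OrganizationRoleId = 'role-wanted' THEN 1 END AS Bk1
    FROM Organization o
    LEFT JOIN OrganizationInRole oir ON oir.OrganizationId = o.Id
""").fetchall()

# Filtering and grouping first leaves at most one row per organization to join to.
pre_aggregated = conn.execute("""
    SELECT o.Id, oir.RoleCount AS Bk1
    FROM Organization o
    LEFT JOIN (
        SELECT OrganizationId, COUNT(*) AS RoleCount
        FROM OrganizationInRole
        WHERE OrganizationRoleId = 'role-wanted'
        GROUP BY OrganizationId
    ) AS oir ON oir.OrganizationId = o.Id
    ORDER BY o.Id
""").fetchall()

print(len(naive), "rows from the direct join")
print(len(pre_aggregated), "rows after pre-aggregating:", pre_aggregated)
```

With this data the direct join returns four rows for two organizations, while the pre-aggregated join returns exactly two, with NULL in Bk1 for the organization that lacks the role.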

SQL: Two queries in a single set of results?

I'm using Postgres 9.6. I have three tables, like this:
Table public.user
id integer
name character varying
email character varying
Table public.project
id integer
user_id integer
Table public.sale
id integer
user_id integer
user_id is a foreign key in both the project and sale tables.
Is there a way I can get a list back of all user IDs with the number of projects and number of sales attached to them, as a single query?
So I'd like final data that looks like this:
user_id,num_projects,num_sales
121,28,1
122,43,6
123,67,2
I know how to do just the number of projects:
SELECT "user".id, COUNT(*) AS num_projects
FROM "user"
JOIN project ON project.user_id="user".id
GROUP BY "user".id
ORDER BY "user".id DESC
But I don't know how also to get the number of sales too, in a single query.

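Use subqueries for the aggregation and left joins. Note that user is a reserved word in Postgres, so the table name has to be quoted, and that COALESCE turns the NULL you get for users without any rows into 0:
select u.*, coalesce(p.num_projects, 0) as num_projects, coalesce(s.num_sales, 0) as num_sales
from "user" u left join
(select p.user_id, count(*) as num_projects
from project p
group by p.user_id
) p
on p.user_id = u.id left join
(select s.user_id, count(*) as num_sales
from sale s
group by s.user_id
) s
on s.user_id = u.id;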
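The reason for aggregating in subqueries, rather than joining project and sale directly and counting, is that a direct double join multiplies the rows of one table by the rows of the other. A small sketch with Python's built-in sqlite3 module and made-up rows (the user IDs are borrowed from the question, the counts are not) shows the difference:

```python
import sqlite3

# Hypothetical data: user 121 has 2 projects and 1 sale,
# user 122 has 1 project and 3 sales, user 123 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO "user" VALUES
        (121, 'a', 'a@example.com'), (122, 'b', 'b@example.com'), (123, 'c', 'c@example.com');
    INSERT INTO project (user_id) VALUES (121), (121), (122);
    INSERT INTO sale (user_id) VALUES (121), (122), (122), (122);
""")

# Joining both child tables before counting inflates both counts.
inflated = conn.execute("""
    SELECT u.id, COUNT(p.id), COUNT(s.id)
    FROM "user" u
    LEFT JOIN project p ON p.user_id = u.id
    LEFT JOIN sale s ON s.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Counting inside each subquery first gives one row per user to join to.
correct = conn.execute("""
    SELECT u.id,
           COALESCE(p.num_projects, 0) AS num_projects,
           COALESCE(s.num_sales, 0) AS num_sales
    FROM "user" u
    LEFT JOIN (SELECT user_id, COUNT(*) AS num_projects
               FROM project GROUP BY user_id) p ON p.user_id = u.id
    LEFT JOIN (SELECT user_id, COUNT(*) AS num_sales
               FROM sale GROUP BY user_id) s ON s.user_id = u.id
    ORDER BY u.id
""").fetchall()

print("direct double join:", inflated)
print("pre-aggregated:    ", correct)
```

COUNT(DISTINCT p.id) would also repair the direct join for this data, but the subquery form keeps working if you later need SUM or other aggregates.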

Postgres Join and return flag if a row exists

I am very certain that this is possible in SQL but I am not sure how to implement this. I am using PostgreSQL
I have 2 tables
users with columns id, name and created_date
user_docs with columns id, value
I want to write a select query which returns all users table columns, along with another column called has_docs which indicates whether the user has any document rows in the user_docs table.
Can someone help?

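You can left join the two tables and check whether the joined value is not null. This assumes that user_docs.id holds the id of the user and that each user has at most one user_docs row; otherwise the join returns one row per document: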
SELECT u.id,
u.name,
u.created_date,
CASE WHEN ud.value IS NOT NULL
THEN 'Y'
ELSE 'N'
END has_docs
FROM users u
LEFT JOIN user_docs ud
ON u.id = ud.id
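If a user can have several document rows, an EXISTS subquery avoids the duplicated user rows that a plain join would produce. The sketch below is one possible variant, run through Python's built-in sqlite3 module with invented sample rows. It keeps the assumption made above that user_docs.id stores the user's id; the question does not actually say which column links the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_date TEXT);
    -- Assumption: user_docs.id stores the owning user's id.
    CREATE TABLE user_docs (id INTEGER, value TEXT);
    INSERT INTO users VALUES (1, 'alice', '2020-01-01'), (2, 'bob', '2020-01-02');
    -- alice has two documents, bob has none.
    INSERT INTO user_docs VALUES (1, 'passport'), (1, 'licence');
""")

# EXISTS yields a single flag per user, however many documents match.
rows = conn.execute("""
    SELECT u.id, u.name, u.created_date,
           EXISTS (SELECT 1 FROM user_docs ud WHERE ud.id = u.id) AS has_docs
    FROM users u
    ORDER BY u.id
""").fetchall()

for row in rows:
    print(row)
```

SQLite reports the flag as 1 or 0; in PostgreSQL the same EXISTS expression returns a boolean, which you can wrap in CASE if you prefer 'Y' and 'N'.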

SQL LEFT JOIN - Inner select not returning columns

I have two tables called 'Customers' and 'Orders'. Tables column names are as follow:
Customers: id, name, address
Orders: id, person_id, product, price
The desired outcome is to query all customers with one of their latest purchases. I have a lot of duplicates in the 'Orders' table, where two records have the same timestamp due to some bug.
I have written the following code but the issue is that the query does not return table 2(Orders) column values. Can anyone advise what the issue is?
SELECT C.Id,C.Name, O.item, O.price, O.product
FROM Customers C
LEFT JOIN
(
SELECT TOP 1 person_id
FROM Orders
WHERE status = 'Pending'
) O ON C.ID = O.person_id
Results: O.item, O.price, O.product values are all null
Edit: Sample Data
ID/ NAME/ ADDRESS/
1/ A/ Ad1/
2/ B/ Ad2/
3/ C/ Ad3/
ID/ Person ID/ PRODUCT/ PRICE/ Created Date
ID-1234/ 1/ Book/ $5/ 26-2-2017
ID-1235/ 1/ Book/ $5/ 26-2-2017
ID-1236/ 2/ Calendar/ $10/ 4-2-2017
ID-1238/ 1/ Pen/ $2/ 1-1-2016

Assuming that the id column in Orders is a primary key autoincrement, then the following should work:
SELECT c.id,
c.name,
COALESCE(t1.price, 0.0) AS price,
COALESCE(t1.product, 'NA') AS product
FROM Customers c
LEFT JOIN
(
SELECT person_id, MAX(CAST(SUBSTRING(id, 4, LEN(id)) AS INT)) AS max_id
FROM Orders
GROUP BY person_id
) t2
ON c.id = t2.person_id
LEFT JOIN Orders t1
ON t1.person_id = t2.person_id AND
CAST(SUBSTRING(t1.id, 4, LEN(t1.id)) AS INT) = t2.max_id
This answer assumes that taking the greatest order ID per customer will yield the most recent purchase. Ideally you should have a timestamp column which captures when a transaction took place. Note that even in the query above, we still have no way of knowing when the most recent transaction took place.

So where is the timestamp column? It's not mentioned in your table schema. But your description does not mention the status column either, and that is clearly in there.
Is orders.id unique? Is it the key for the Orders table? If it is, then your schema has no way to identify "duplicate" records. You cannot mean to imply that only one order per customer is allowed, so if there are multiple orders for a single customer, how do we identify the duplicates? By the unmentioned timestamp column?
If there IS a timestamp column, and that's how you would identify dupes, then use it.
SELECT C.Id,C.Name, O.item, O.price, O.product
FROM Customers C LEFT JOIN Orders o
on o.id = (Select Min(id) from orders
where person_id = c.Id
and timestamp = o.timestamp
and status = 'Pending')
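For comparison, here is a sketch of the "one latest order per customer" idea when a creation date is available. It runs on Python's built-in sqlite3 module, so it uses ORDER BY ... LIMIT 1 where SQL Server would use TOP 1. The created_date column name and the ISO date format are assumptions based on the "Created Date" header in the sample data, and prices are stored as plain numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    -- created_date is an assumed column name; dates are stored as ISO strings.
    CREATE TABLE Orders (id TEXT PRIMARY KEY, person_id INTEGER,
                         product TEXT, price INTEGER, created_date TEXT);
    INSERT INTO Customers VALUES (1, 'A', 'Ad1'), (2, 'B', 'Ad2'), (3, 'C', 'Ad3');
    INSERT INTO Orders VALUES
        ('ID-1234', 1, 'Book', 5, '2017-02-26'),
        ('ID-1235', 1, 'Book', 5, '2017-02-26'),
        ('ID-1236', 2, 'Calendar', 10, '2017-02-04'),
        ('ID-1238', 1, 'Pen', 2, '2016-01-01');
""")

# For each customer, join only the single order that sorts first by
# newest date, using the order id to break ties between duplicates.
rows = conn.execute("""
    SELECT c.id, c.name, o.id, o.product, o.price
    FROM Customers c
    LEFT JOIN Orders o ON o.id = (
        SELECT o2.id
        FROM Orders o2
        WHERE o2.person_id = c.id
        ORDER BY o2.created_date DESC, o2.id DESC
        LIMIT 1
    )
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Because the tie-break picks exactly one id, the two duplicate 'Book' rows for customer 1 collapse to a single result row, and customer 3 still appears with NULLs.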

Problem With DISTINCT!

Here is my query:
SELECT
DISTINCT `c`.`user_id`,
`c`.`created_at`,
`c`.`body`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`,
`u`.`username`,
`u`.`avatar_path`
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id
WHERE (c.profile_id = 1) ORDER BY `u`.`id` DESC;
It works. The problem though is with the DISTINCT word. As I understand it, it should select only one row per c.user_id.
But what I get is even 4-5 rows with the same c.user_id column. Where is the problem?

Actually, DISTINCT does not limit itself to one column. Basically, when you say:
SELECT DISTINCT a, b
what you're saying is, "give me the distinct combinations of a and b", just like a multi-column UNIQUE index.

DISTINCT ensures that each combination of ALL the values in your select list is unique, not just user_id. If you want to limit the results to one row per user_id, you should group by user_id.
Perhaps what you want is:
SELECT
`c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`,
(SELECT COUNT(*) FROM profiles_comments c2 WHERE c2.user_id = c.user_id AND c2.profile_id = 1) AS `comments_count`
FROM `profiles_comments` AS `c` INNER JOIN `users` AS `u` ON u.id = c.user_id
WHERE (c.profile_id = 1)
GROUP BY `c`.`user_id`,
`u`.`username`,
`u`.`avatar_path`
ORDER BY `u`.`id` DESC;

DISTINCT works at the row level, not the column level.
If you want distinct values of only one column, group by that column and aggregate the rest of the columns returned (MIN, MAX, SUM, AVG, etc.):
SELECT Name, MIN(ID)
FROM MyTable
GROUP BY Name

Distinct will try to return only unique rows, it will not return only 1 row per user id in your example.
http://dev.mysql.com/doc/refman/5.0/en/distinct-optimization.html

You misunderstand. The DISTINCT modifier applies to the entire row — it states that no two identical ROWS will be returned in the result set.
Looking at your SQL, what value of the several available do you expect to see returned in the created_at column (for instance)? It would be impossible to predict the results of the query as written.
Also, you're using profiles_comments twice in your SELECT. It appears that you're trying to obtain a count of how many times each user has commented. If so, what you want to do is use an AGGREGATE query, grouped on user_id and including only those columns that uniquely identify a user along with a COUNT of the comments:
SELECT user_id, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
You can add the join to users to get the user name if you want but, logically, your result set cannot include other columns from profiles_comments and still produce only a single row per user_id unless those columns are also aggregated in some way:
SELECT user_id, MIN(created_at) AS Earliest, MAX(created_at) AS Latest, COUNT(*) FROM profiles_comments WHERE profile_id = 1 GROUP BY user_id
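The difference between DISTINCT and GROUP BY is easy to see on a few rows. This is a minimal sketch using Python's built-in sqlite3 module; the table name matches the question, but the rows are invented and only the columns needed for the demonstration are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles_comments (
        user_id INTEGER, profile_id INTEGER, created_at TEXT, body TEXT);
    -- User 1 commented twice on profile 1, user 2 once.
    INSERT INTO profiles_comments VALUES
        (1, 1, '2010-01-01', 'first'),
        (1, 1, '2010-01-02', 'second'),
        (2, 1, '2010-01-03', 'hello');
""")

# DISTINCT compares whole rows, so user 1 still appears twice
# because the two created_at values differ.
distinct_rows = conn.execute("""
    SELECT DISTINCT user_id, created_at
    FROM profiles_comments
    WHERE profile_id = 1
    ORDER BY user_id, created_at
""").fetchall()

# GROUP BY collapses to one row per user; every other column is aggregated.
grouped_rows = conn.execute("""
    SELECT user_id, MIN(created_at) AS earliest,
           MAX(created_at) AS latest, COUNT(*) AS comments_count
    FROM profiles_comments
    WHERE profile_id = 1
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print("DISTINCT:", distinct_rows)
print("GROUP BY:", grouped_rows)
```

The DISTINCT query returns three rows for two users, while the grouped query returns exactly one row per user_id.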
