Make a 1 to 1 multi-field SQL join where only some of the values match

I am trying to build a table that will be used as a conversion chart. I aim to make a simple join with this conversion table on multiple fields (8 in my case), and get a result. I will try to simplify the examples as much as I can because the original chart is a 40x10 matrix.
Let's say that I have these two (I know they don't make much sense and have bad design but they are just examples):
supply_conversion_chart
---
supply (integer)
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
purchases
---
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
And the conversion chart would look something like this:
| supply | customer_id | product_id | size | purchase_type |
|--------|--------------|------------|----------|---------------|
| 100 | 1 | anything | anything | online |
| 101 | 1 | anything | anything | offline |
| 102 | other than 1 | anything | anything | online |
| 103 | 1 | 5 | XXL | online |
The main goal was to get the exact supply value with a simple join, something like:
SELECT supply
FROM purchases p
JOIN supply_conversion_chart scc ON
p.customer_id = scc.customer_id AND
p.product_id = scc.product_id AND
p.size = scc.size AND
p.purchase_type = scc.purchase_type;
Let's say that these are the records on purchases table:
| customer_id | product_id | size | purchase_type |
|-------------|------------|------|---------------|
| 1 | 3 | M | online |
| 1 | 5 | S | offline |
| 12345 | 4 | XL | online |
| 1 | 5 | XXL | online |
| 4353 | null | M | online |
I would expect the first record's supply value to be 100, the second's to be 101, the third's 102, the fourth's 103, and the fifth's 102. However, as far as I know, SQL won't be able to do a proper join on any of these records except the fourth one, which fully matches supply 103 in the supply_conversion_chart table. I don't know if it is possible in the first place to join on multiple fields when some of those fields do not match exactly.
My approach is probably faulty and there are better ways to get the results I am trying to achieve but I don't even know where to start. What should I do?
Note that the original chart is much bigger than the provided example, and I will be joining on 8 different fields.

One approach is a lateral join (PostgreSQL syntax):
select p.*, scc.*
from purchases p left join lateral
     (select scc.*
      from supply_conversion_chart scc
      where (scc.customer_id = p.customer_id or scc.customer_id is null) and
            (scc.product_id = p.product_id or scc.product_id is null) and
            (scc.size = p.size or scc.size is null) and
            (scc.purchase_type = p.purchase_type or scc.purchase_type is null)
      -- prefer the most specific row: count the non-NULL (non-wildcard) columns.
      -- (scc.x = p.x)::int would be NULL for wildcard rows, and NULLs sort first in DESC.
      order by ( (scc.customer_id is not null)::int +
                 (scc.product_id is not null)::int +
                 (scc.size is not null)::int +
                 (scc.purchase_type is not null)::int
               ) desc
      limit 1
     ) scc
     on true;
Note: this represents "anything" as NULL. It doesn't have special logic for "customer other than 1", but it does show how to implement basically what you are trying to do.
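If you want to try the idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no LATERAL, so (as an assumed equivalent form) the lateral subquery becomes a correlated scalar subquery with the same ORDER BY ... LIMIT 1. "Anything" is stored as NULL, and "customer other than 1" is approximated by a NULL customer_id row that loses to the more specific customer-1 rows. The id column on purchases is hypothetical and only keeps the output order stable.

```python
import sqlite3

SCHEMA = """
CREATE TABLE supply_conversion_chart (
    supply INTEGER, customer_id INTEGER, product_id INTEGER,
    size TEXT, purchase_type TEXT
);
-- id is a hypothetical surrogate key, used only to order the output
CREATE TABLE purchases (
    id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER,
    size TEXT, purchase_type TEXT
);
-- NULL in the chart means "anything"
INSERT INTO supply_conversion_chart VALUES
    (100, 1, NULL, NULL, 'online'),
    (101, 1, NULL, NULL, 'offline'),
    (102, NULL, NULL, NULL, 'online'),
    (103, 1, 5, 'XXL', 'online');
INSERT INTO purchases (customer_id, product_id, size, purchase_type) VALUES
    (1, 3, 'M', 'online'),
    (1, 5, 'S', 'offline'),
    (12345, 4, 'XL', 'online'),
    (1, 5, 'XXL', 'online'),
    (4353, NULL, 'M', 'online');
"""

# For each purchase, keep the chart rows where every column either matches or
# is a NULL wildcard, then pick the one with the most non-NULL columns.
QUERY = """
SELECT (SELECT scc.supply
          FROM supply_conversion_chart scc
         WHERE (scc.customer_id = p.customer_id OR scc.customer_id IS NULL)
           AND (scc.product_id = p.product_id OR scc.product_id IS NULL)
           AND (scc.size = p.size OR scc.size IS NULL)
           AND (scc.purchase_type = p.purchase_type OR scc.purchase_type IS NULL)
         ORDER BY (scc.customer_id IS NOT NULL) + (scc.product_id IS NOT NULL)
                + (scc.size IS NOT NULL) + (scc.purchase_type IS NOT NULL) DESC
         LIMIT 1) AS supply
  FROM purchases p
 ORDER BY p.id
"""

def most_specific_supplies():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result

print(most_specific_supplies())  # [100, 101, 102, 103, 102]
```

Ranking by the number of non-NULL chart columns is only one possible specificity rule; if two chart rows can tie, you would need an extra tie-breaker column in the ORDER BY.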

Related

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "DateActiveFrom" dates, so the score can change at different points in time. What I am trying to do is query the Production Table so that each TaskID is looked up in the Standing Data table, using the date the task was logged to decide which score to return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the code below, but because multiple records meet the criteria, each production record is duplicated with two different scores:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return a single (scalar) Score value for each record based on its date?
You can use OUTER APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY s.DateActiveFrom DESC
) s;
You might want to match the score at the record level instead; if so, change the WHERE clause inside the APPLY.
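Here is a rough way to check the OUTER APPLY logic locally, sketched with Python's sqlite3, which has no APPLY: a correlated scalar subquery with ORDER BY ... LIMIT 1 returns the same single Score per row, or NULL when no score is active yet (the "OUTER" part). Dates are stored as ISO text so that string comparison orders them correctly. Because 30/02/2020 and 31/02/2020 do not exist, 2020-03-01 and 2020-03-02 stand in for records 3 and 4 (an assumption).

```python
import sqlite3

def scores_as_of_date():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE ProductionTable (RecordID INTEGER, Date TEXT, EmpID INTEGER,
                                  Reference INTEGER, TaskID INTEGER);
    CREATE TABLE StandingDataTable (RecordID INTEGER, TaskID INTEGER,
                                    DateActiveFrom TEXT, Score REAL);
    -- ISO dates so that text comparison orders them correctly
    INSERT INTO ProductionTable VALUES
        (1, '2020-02-27', 1, 123, 1),
        (2, '2020-02-27', 1, 123, 1),
        (3, '2020-03-01', 1, 123, 1),
        (4, '2020-03-02', 1, 123, 1);
    INSERT INTO StandingDataTable VALUES
        (1, 1, '2020-02-01', 1.5),
        (2, 1, '2020-02-28', 2.0);
    """)
    # Scalar subquery playing the role of OUTER APPLY ... TOP (1):
    # the latest score active on or before the production date, else NULL.
    rows = conn.execute("""
        SELECT p.RecordID,
               (SELECT s.Score
                  FROM StandingDataTable AS s
                 WHERE s.TaskID = p.TaskID
                   AND s.DateActiveFrom <= p.Date
                 ORDER BY s.DateActiveFrom DESC
                 LIMIT 1) AS Score
          FROM ProductionTable AS p
         ORDER BY p.RecordID
    """).fetchall()
    conn.close()
    return rows

print(scores_as_of_date())  # [(1, 1.5), (2, 1.5), (3, 2.0), (4, 2.0)]
```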

1 to Many Query: Help Filtering Results

Problem: SQL Query that looks at the values in the "Many" relationship, and doesn't return values from the "1" relationship.
Tables Example: (this shows two different tables).
+---------------+----------------------------+-------+
| Unique Number | <-- Table 1 -- Table 2 --> | Roles |
+---------------+----------------------------+-------+
| 1 | | A |
| 2 | | B |
| 3 | | C |
| 4 | | D |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
+---------------+----------------------------+-------+
When I run my query, I get each unique number multiple times, once for every role associated with it, like so:
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 4 | C |
| 4 | A |
| 5 | B |
| 5 | C |
| 5 | D |
| 6 | D |
| 6 | A |
+---------------+-------+
I would like to be able to run my query and be able to say, "When the role of A is present, don't even show me the unique numbers that have the role of A".
Maybe SQL could look at the roles and, whenever role A comes up, grab that unique number and remove it from the first column.
Based on what I would "like" to happen (I put that in quotations as this might not even be possible) the following is what I would expect my query to return.
+---------------+-------+
| Unique Number | Roles |
+---------------+-------+
| 1 | C |
| 1 | D |
| 5 | B |
| 5 | C |
| 5 | D |
+---------------+-------+
UPDATE:
Query Example: I am querying 8 tables, but I condensed it to 4 for simplicity.
SELECT
c.UniqueNumber,
cp.pType,
p.pRole,
a.aRole
FROM c
JOIN cp ON cp.uniqueVal = c.uniqueVal
JOIN p ON p.uniqueVal = cp.uniqueVal
LEFT OUTER JOIN a ON a.uniqueVal = p.uniqueVal
WHERE
--I do some basic filtering to get to the relevant clients data but nothing more than that.
ORDER BY
c.uniqueNumber
Table sizes: these tables can have anywhere from 50,000 rows to 500,000+
Pretending the table name is t and the column names are alpha and numb:
SELECT t.numb, t.alpha
FROM t
LEFT JOIN t AS s ON t.numb = s.numb
AND s.alpha = 'A'
WHERE s.numb IS NULL;
You can also do a subselect:
SELECT numb, alpha
FROM t
WHERE numb NOT IN (SELECT numb FROM t WHERE alpha = 'A');
Or use one of the following if the subselect is being materialized more than once (pick whichever is faster, i.e., the one with the smaller subtable):
SELECT t.numb, t.alpha
FROM t
JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') = 0) AS s USING (numb);
SELECT t.numb, t.alpha
FROM t
LEFT JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') > 0) AS s USING (numb)
WHERE s.numb IS NULL;
But the first one is probably faster and better[1]. Any of these methods can be folded into a larger query with multiple additional tables being joined in.
[1] Straight joins tend to be easier to read and faster to execute than queries involving subselects, and the exceptions are rare for self-referential joins because they require a large mismatch in table sizes. You might hit one of those exceptions, though, if the number of rows with the 'A' alpha value is very small and properly indexed.
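A quick way to sanity-check the anti-join and the GROUP BY ... HAVING variant against the sample data from the question, sketched with Python's sqlite3. The table t and the columns numb/alpha follow this answer's naming. SQLite, like MySQL, evaluates alpha = 'A' to 0 or 1, which is what SUM(alpha = 'A') relies on; in SQL Server you would need SUM(CASE WHEN alpha = 'A' THEN 1 ELSE 0 END) instead.

```python
import sqlite3

# (numb, alpha) pairs from the question's query output
ROWS = [(1, "C"), (1, "D"), (2, "A"), (2, "B"), (3, "A"), (3, "B"),
        (4, "C"), (4, "A"), (5, "B"), (5, "C"), (5, "D"), (6, "D"), (6, "A")]

def without_role_a():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (numb INTEGER, alpha TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    # Anti-join: s only matches numbers that have an 'A' row, so keeping the
    # rows where s.numb IS NULL drops every number that ever had role A.
    anti = conn.execute("""
        SELECT t.numb, t.alpha
          FROM t
          LEFT JOIN t AS s ON t.numb = s.numb AND s.alpha = 'A'
         WHERE s.numb IS NULL
         ORDER BY t.numb, t.alpha
    """).fetchall()
    # GROUP BY variant: keep only numbers whose count of 'A' rows is zero.
    grouped = conn.execute("""
        SELECT t.numb, t.alpha
          FROM t
          JOIN (SELECT numb FROM t GROUP BY numb HAVING SUM(alpha = 'A') = 0) AS s
         USING (numb)
         ORDER BY t.numb, t.alpha
    """).fetchall()
    conn.close()
    return anti, grouped

anti, grouped = without_role_a()
print(anti)             # [(1, 'C'), (1, 'D'), (5, 'B'), (5, 'C'), (5, 'D')]
print(anti == grouped)  # True
```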
There are many ways to do it, and the trade-offs depend on factors such as the size of the tables involved and what indexes are available. On general principles, my first instinct is to avoid a correlated subquery such as another, now-deleted answer proposed, but if the relationship table is small then it probably doesn't matter.
This version instead uses an uncorrelated subquery in the where clause, in conjunction with the not in operator:
select num, role
from one_to_many
where num not in (select otm2.num from one_to_many otm2 where otm2.role = 'A')
That form might be particularly effective if there are many rows in one_to_many, but only a small proportion have role A. Of course you can add an order by clause if the order in which result rows are returned is important.
There are also alternatives involving joining inline views or CTEs, and some of those might have advantages under particular circumstances.
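One caveat worth checking before choosing NOT IN: if the subquery can ever return a NULL num, then num NOT IN (...) is never true and the query silently returns no rows. NOT EXISTS does not have that problem (it is correlated, though many optimizers can plan it as an anti-join). The sketch below uses Python's sqlite3 with the one_to_many table from this answer; the (None, 'A') row is a hypothetical dirty row added only to trigger the NULL case.

```python
import sqlite3

ROWS = [(1, "C"), (1, "D"), (2, "A"), (2, "B"), (3, "A"), (3, "B"),
        (4, "C"), (4, "A"), (5, "B"), (5, "C"), (5, "D"), (6, "D"), (6, "A")]

def compare_not_in_and_not_exists(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE one_to_many (num INTEGER, role TEXT)")
    conn.executemany("INSERT INTO one_to_many VALUES (?, ?)", rows)
    # Uncorrelated NOT IN, as in the answer above
    not_in = conn.execute("""
        SELECT num, role FROM one_to_many
         WHERE num NOT IN (SELECT otm2.num FROM one_to_many otm2
                            WHERE otm2.role = 'A')
         ORDER BY num, role
    """).fetchall()
    # Correlated NOT EXISTS: a NULL num in the subquery never matches,
    # so it cannot turn the whole predicate into UNKNOWN
    not_exists = conn.execute("""
        SELECT num, role FROM one_to_many otm
         WHERE NOT EXISTS (SELECT 1 FROM one_to_many otm2
                            WHERE otm2.num = otm.num AND otm2.role = 'A')
         ORDER BY num, role
    """).fetchall()
    conn.close()
    return not_in, not_exists

clean_in, clean_exists = compare_not_in_and_not_exists(ROWS)
dirty_in, dirty_exists = compare_not_in_and_not_exists(ROWS + [(None, "A")])
print(clean_in == clean_exists)  # True
print(dirty_in)                  # []
```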

How to Count the same field with different criteria on the same Query

I have a table like this:
| Contact | Incident | OpenTime | Country | Product |
| C1 | | 1/1/2014 | MX | Office |
| C2 | I1 | 2/2/2014 | BR | SAP |
| C3 | | 3/2/2014 | US | SAP |
| C4 | I2 | 3/3/2014 | US | SAP |
| C5 | I3 | 3/4/2014 | US | Office |
| C6 | | 3/5/2014 | TW | SAP |
I want to run a query with criteria on Country and OpenTime, and I want to get back something like this:
| Product | Contacts with | Incidents |
| | no Incidents | |
| Office | 1 | 1 |
| SAP | 2 | 2 |
I can easily get one part to work with a query like:
SELECT Product, COUNT(Contact)
FROM database
WHERE criterias AND Incident IS NULL -- (or IS NOT NULL), depending on which count I want
GROUP BY Product
What I am struggling with is counting both Incident IS NULL and Incident IS NOT NULL in the same result, from a single query, as in the example above.
I have tried the following
SELECT Product,
(SELECT COUNT(Contact) FROM database WHERE Incident IS NULL) AS Contact,
(SELECT COUNT(Contact) FROM database WHERE Incident IS NOT NULL) AS Incident
FROM database
WHERE criterias
GROUP BY Product
The issue I have with the above statement is that whatever criteria I use in the "main" SELECT are ignored by the nested SELECTs.
I have tried using UNION ALL as well, but did not manage to make it work.
Ultimately I resolved it with this approach: I counted the total contacts per product, counted the number of incidents, and added a calculated field with the difference:
SELECT Product, COUNT(Contact) AS Total, COUNT(Incident) AS Incidents,
COUNT(Contact) - COUNT(Incident) AS [Only Contact]
FROM database
WHERE <criterias>
GROUP BY Product
Although I made it work, I am still sure there is a more elegant approach.
How can I retrieve the different counting on the same column with different count criteria in one query?
Just use conditional aggregation:
SELECT Product,
SUM(IIF(incident is not null, 1, 0)) as incidents,
SUM(IIF(incident is null, 1, 0)) as noincidents
FROM database
WHERE criterias
GROUP BY Product;
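Here is a minimal sketch of the conditional aggregation using Python's sqlite3. It uses the standard CASE expression, the portable form of Access's IIF. The contacts table name is hypothetical (the question calls it database), OpenTime is left out, and an optional Country filter stands in for "criterias". Because the filter sits in the single WHERE clause, it applies to both counts, which is exactly what the nested SELECTs in the question lost.

```python
import sqlite3

# Sample rows from the question: (Contact, Incident, Country, Product);
# an empty Incident cell is stored as NULL (None)
ROWS = [
    ("C1", None, "MX", "Office"),
    ("C2", "I1", "BR", "SAP"),
    ("C3", None, "US", "SAP"),
    ("C4", "I2", "US", "SAP"),
    ("C5", "I3", "US", "Office"),
    ("C6", None, "TW", "SAP"),
]

def incident_counts(country=None):
    """Return (Product, noincidents, incidents) rows, optionally for one country."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (Contact TEXT, Incident TEXT, Country TEXT, Product TEXT)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT Product,
               SUM(CASE WHEN Incident IS NULL THEN 1 ELSE 0 END) AS noincidents,
               SUM(CASE WHEN Incident IS NOT NULL THEN 1 ELSE 0 END) AS incidents
          FROM contacts
         WHERE :country IS NULL OR Country = :country  -- one filter, both counts
         GROUP BY Product
         ORDER BY Product
        """,
        {"country": country},
    ).fetchall()
    conn.close()
    return rows

print(incident_counts())      # [('Office', 1, 1), ('SAP', 2, 2)]
print(incident_counts("US"))  # [('Office', 0, 1), ('SAP', 1, 1)]
```

COUNT(Incident) on its own would also give the incident count, since COUNT(column) skips NULLs.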
A very MS Access-specific solution might also suit:
TRANSFORM Count(tmp.Contact) AS CountOfContact
SELECT tmp.Product
FROM tmp
GROUP BY tmp.Product
PIVOT IIf(Trim([Incident] & "")="","No Incident","Incident");
The expression IIf(Trim([Incident] & "")="", ...) covers every case: Null, a zero-length string, and a space-filled value.
tmp is the name of the table.

SQL server - advanced grouping

I have a table containing procurement contracts that looks like this:
+------+-----------+------------+---------+------------+-----------+
| type | text | date | company | supplierID | name |
+------+-----------+------------+---------+------------+-----------+
| 0 | None | 2004-03-29 | 310 | 227234 | HRC INFRA |
| 0 | None | 2007-09-30 | 310 | 227234 | HRC INFRA |
| 0 | None | 2010-11-29 | 310 | 227234 | HRC INFRA |
| 2 | Strategic | 2011-01-01 | 310 | 227234 | HRC INFRA |
| 0 | None | 2012-04-10 | 310 | 227234 | HRC INFRA |
+------+-----------+------------+---------+------------+-----------+
In this example, the contract is the same in the first three rows, so I only want the first one.
The row with type = 2 is a change in procurement contract with the given supplier. I want to select that row as well.
On the last row the contract changes back to 0, so I want to select that row as well.
Basically I want to select the first row and the rows where the contract type changes. So the result should look like this:
+------+-----------+------------+---------+------------+-----------+
| type | text | date | company | supplierID | name |
+------+-----------+------------+---------+------------+-----------+
| 0 | None | 2004-03-29 | 310 | 227234 | HRC INFRA |
| 2 | Strategic | 2011-01-01 | 310 | 227234 | HRC INFRA |
| 0 | None | 2012-04-10 | 310 | 227234 | HRC INFRA |
+------+-----------+------------+---------+------------+-----------+
Any suggestions to how I can accomplish this?
;WITH cte AS
(
SELECT ROW_NUMBER() OVER (ORDER BY date) AS Id,
type, text, date, company, supplierId, name
FROM your_table
)
SELECT c1.type, c1.text, c1.date, c1.company,
c1.supplierId, c1.name
FROM cte c1 LEFT JOIN cte c2 ON c1.id = c2.id + 1
WHERE c2.Id IS NULL OR c1.type <> c2.type
I don't have SQL Server in front of me to test it, so I'm not going to attempt the actual solution right now, but FYI there are a few things you need:
1) A way to make sure the records are ordered properly. I don't see any kind of ID here, which means you have no guarantee that they will come back in that order. I assume there is one, so just make sure you order by it.
2) You need to do an outer join of the table to itself on whatever the index is, but instead of "table1.index = table2.index" it will look like "table1.index = table2.index + 1". If your indexes aren't sequential, joining them this way becomes more complex.
3) In the where clause you'll specify something like
where table1.type <> table2.type
That will get you most of the way there. It won't pick up the very first record, though, since there is no earlier record to compare it to, so you'll need an OR condition to compensate for that. I'm also assuming that type has no NULL values.
Sorry I couldn't be more help with an actual implementation but maybe someone else will take care of that shortly.
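The self-join both answers describe can also be written with LAG, which is available in SQL Server 2012 and later. Below is a sketch using Python's sqlite3 (window functions need SQLite 3.25 or newer). Partitioning by company and supplierID is an assumption that a change should be tracked per supplier, and contracts is a hypothetical table name.

```python
import sqlite3

# Sample rows from the question: (type, text, date, company, supplierID, name)
ROWS = [
    (0, "None", "2004-03-29", 310, 227234, "HRC INFRA"),
    (0, "None", "2007-09-30", 310, 227234, "HRC INFRA"),
    (0, "None", "2010-11-29", 310, 227234, "HRC INFRA"),
    (2, "Strategic", "2011-01-01", 310, 227234, "HRC INFRA"),
    (0, "None", "2012-04-10", 310, 227234, "HRC INFRA"),
]

def contract_changes():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts (type INTEGER, text TEXT, date TEXT,"
        " company INTEGER, supplierID INTEGER, name TEXT)"
    )
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    # LAG fetches the previous row's type per supplier; keep the first row
    # (no previous type) and every row whose type differs from the last one.
    rows = conn.execute(
        """
        SELECT type, text, date
          FROM (SELECT type, text, date, company, supplierID,
                       LAG(type) OVER (PARTITION BY company, supplierID
                                       ORDER BY date) AS prev_type
                  FROM contracts) AS c
         WHERE prev_type IS NULL OR prev_type <> type
         ORDER BY company, supplierID, date
        """
    ).fetchall()
    conn.close()
    return rows

print(contract_changes())
# [(0, 'None', '2004-03-29'), (2, 'Strategic', '2011-01-01'), (0, 'None', '2012-04-10')]
```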
This might be what you want, presuming you don't have any type < 0:
SELECT *
FROM [TABLE] as ot where ot.type <>
coalesce((select top 1 it.type from [TABLE] as it where it.date < ot.date order by it.date desc), -1)
Also, take note of Brandon's point about making sure the rows are ordered, since I don't see a PK.
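The same correlated-subquery idea, sketched in Python's sqlite3 (SQLite uses LIMIT 1 instead of TOP 1, and contracts is a hypothetical table name). COALESCE has to wrap the whole subquery: for the earliest row the subquery returns no row at all, so a COALESCE inside it never runs, type <> NULL is not true, and the first row would be dropped.

```python
import sqlite3

# Sample rows from the question: (type, text, date, company, supplierID, name)
ROWS = [
    (0, "None", "2004-03-29", 310, 227234, "HRC INFRA"),
    (0, "None", "2007-09-30", 310, 227234, "HRC INFRA"),
    (0, "None", "2010-11-29", 310, 227234, "HRC INFRA"),
    (2, "Strategic", "2011-01-01", 310, 227234, "HRC INFRA"),
    (0, "None", "2012-04-10", 310, 227234, "HRC INFRA"),
]

def contract_changes_correlated():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts (type INTEGER, text TEXT, date TEXT,"
        " company INTEGER, supplierID INTEGER, name TEXT)"
    )
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    # Compare each row's type with the type of the latest earlier row;
    # COALESCE(..., -1) turns "no earlier row" into a value no type equals.
    rows = conn.execute(
        """
        SELECT ot.type, ot.text, ot.date
          FROM contracts AS ot
         WHERE ot.type <> COALESCE((SELECT it.type
                                      FROM contracts AS it
                                     WHERE it.date < ot.date
                                     ORDER BY it.date DESC
                                     LIMIT 1), -1)
         ORDER BY ot.date
        """
    ).fetchall()
    conn.close()
    return rows

print(contract_changes_correlated())
# [(0, 'None', '2004-03-29'), (2, 'Strategic', '2011-01-01'), (0, 'None', '2012-04-10')]
```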

PostgreSQL Inner Join on the same table + second table?

If this is a stupid question, forgive me, I'm not very familiar with PostgreSQL.
I've collected inventory data from used car dealerships in my area and stored it in a PostgreSQL table. I've got a second table with particular details regarding certain makes and models. For example:
The dealership table is structured like so:
-----------------------------------------
| Dealership | Make | Model | Year | ID |
----------------------------------------|
| A | Ford | F250 | 2003 | 1 |
| A | Chevy| Cobalt| 2005 | 2 |
| B | Ford | F250 | 2003 | 1 |
| B | Dodge| Chrgr | 2012 | 3 |
-----------------------------------------
The details table is structured like so:
-----------------------------------------
| ID | DetailA| DetailB| DetailC|
-----------------------------------------
| 1 | data | data | data |
| 2 | data | data | data |
| 3 | data | data | data |
| 4 | data | data | data |
-----------------------------------------
My goal is to retrieve vehicle matches from multiple dealerships and display the appropriate details. In the above example, I would like to see:
-----------------------------------------------------
| Make | Model | Year | DetailA | DetailB | DetailC |
-----------------------------------------------------
| Ford | F250 | 2003 | data | data | data |
-----------------------------------------------------
With this result, I will know that both A and B have a 2003 Ford F250 for sale, and can view the related details of the vehicle.
I've tried many different queries, but most are variations on something like this:
SELECT DISTINCT
dealership_table.make,
dealership_table.model,
dealership_table.year,
details_table.detaila,
details_table.detailb,
details_table.detailc
FROM
dealership_table
INNER JOIN
details_table
ON
dealership_table.id = details_table.id
WHERE
dealership_table.dealership = 'A'
OR
dealership_table.dealership = 'B'
However, this returns all of the distinct matches from the table where the dealership is either A or B. I've tried multiple inner joins, but I get an error complaining that details_table is specified multiple times.
If I'm doing something really silly, I apologize. Like I said before, I'm pretty much an SQL noob.
What am I doing wrong? How should I go about retrieving the desired results? Any suggestions, solutions, or advice is greatly appreciated!
You can write:
SELECT dealership_table1.make,
dealership_table1.model,
dealership_table1.year,
details_table.detaila,
details_table.detailb,
details_table.detailc
FROM dealership_table dealership_table1
JOIN dealership_table dealership_table2
ON dealership_table1.make = dealership_table2.make
AND dealership_table1.model = dealership_table2.model
AND dealership_table1.year = dealership_table2.year
JOIN details_table
ON dealership_table1.id = details_table.id
WHERE dealership_table1.dealership = 'A'
AND dealership_table2.dealership = 'B'
;
(Note that the FROM dealership_table dealership_table1 and JOIN dealership_table dealership_table2 set up distinct "aliases", so you can use the same table multiple different times in the same query without getting name-conflicts.)
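To check the self-join locally, here is a sketch with Python's sqlite3 using the two tables from the question. The aliases are shortened to d1 and d2, the WHERE clause filters d1 on the first dealership and d2 on the second, and the detail values are hypothetical placeholders for "data" so that the output shows which details row was joined.

```python
import sqlite3

def vehicles_at_both(first="A", second="B"):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dealership_table (dealership TEXT, make TEXT, model TEXT,
                                   year INTEGER, id INTEGER);
    CREATE TABLE details_table (id INTEGER, detaila TEXT, detailb TEXT,
                                detailc TEXT);
    INSERT INTO dealership_table VALUES
        ('A', 'Ford', 'F250', 2003, 1),
        ('A', 'Chevy', 'Cobalt', 2005, 2),
        ('B', 'Ford', 'F250', 2003, 1),
        ('B', 'Dodge', 'Chrgr', 2012, 3);
    -- hypothetical placeholder values standing in for the question's "data"
    INSERT INTO details_table VALUES
        (1, 'a1', 'b1', 'c1'), (2, 'a2', 'b2', 'c2'),
        (3, 'a3', 'b3', 'c3'), (4, 'a4', 'b4', 'c4');
    """)
    # d1 and d2 are two aliases of the same table; the join pairs up the
    # same make/model/year across the two dealerships.
    rows = conn.execute("""
        SELECT d1.make, d1.model, d1.year,
               details_table.detaila, details_table.detailb, details_table.detailc
          FROM dealership_table AS d1
          JOIN dealership_table AS d2
            ON d1.make = d2.make AND d1.model = d2.model AND d1.year = d2.year
          JOIN details_table ON d1.id = details_table.id
         WHERE d1.dealership = ? AND d2.dealership = ?
    """, (first, second)).fetchall()
    conn.close()
    return rows

print(vehicles_at_both())  # [('Ford', 'F250', 2003, 'a1', 'b1', 'c1')]
```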
I may be misunderstanding your table layout, but I think you should consider changing to a different structure. Here's what I would propose:
Vehicle:
----------------------------
| ID | Make | Model | Year |
----------------------------
| 1 | Ford | F250 | 2003 |
| 2 | Chevy| Cobalt| 2005 |
| 3 | Dodge| Chrgr | 2012 |
----------------------------
Dealership:
----------------------------
| Dealership | ID | Detail |
----------------------------
| A | 1 | data |
| A | 2 | data |
| B | 1 | data |
| B | 3 | data |
----------------------------
This way you're not storing vehicle information (make/model/year) in more than one place.
Here's how you would write your desired query given the above schema:
SELECT Make, Model, Year, A.Detail, B.Detail, C.Detail
FROM Vehicle V
LEFT OUTER JOIN Dealership A on A.Dealership = 'A' and A.id = V.id
LEFT OUTER JOIN Dealership B on B.Dealership = 'B' and B.id = V.id
LEFT OUTER JOIN Dealership C on C.Dealership = 'C' and C.id = V.id
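A sketch of the proposed schema in Python's sqlite3, showing only dealerships A and B (the sample has no C rows). With LEFT OUTER JOINs every vehicle comes back, with NULL for dealerships that don't carry it; adding a filter on both join keys keeps only vehicles that both A and B have. The Detail values are hypothetical placeholders, and the only_at_both flag is an illustrative addition.

```python
import sqlite3

def vehicles_with_dealer_details(only_at_both=False):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Vehicle (ID INTEGER PRIMARY KEY, Make TEXT, Model TEXT, Year INTEGER);
    CREATE TABLE Dealership (Dealership TEXT, ID INTEGER, Detail TEXT);
    INSERT INTO Vehicle VALUES
        (1, 'Ford', 'F250', 2003), (2, 'Chevy', 'Cobalt', 2005), (3, 'Dodge', 'Chrgr', 2012);
    -- hypothetical detail values, tagged with dealership and vehicle ID
    INSERT INTO Dealership VALUES
        ('A', 1, 'A-1'), ('A', 2, 'A-2'), ('B', 1, 'B-1'), ('B', 3, 'B-3');
    """)
    sql = """
        SELECT V.Make, V.Model, V.Year, A.Detail, B.Detail
          FROM Vehicle V
          LEFT OUTER JOIN Dealership A ON A.Dealership = 'A' AND A.ID = V.ID
          LEFT OUTER JOIN Dealership B ON B.Dealership = 'B' AND B.ID = V.ID
    """
    if only_at_both:
        # a NULL join key means that dealership does not carry the vehicle
        sql += " WHERE A.ID IS NOT NULL AND B.ID IS NOT NULL"
    sql += " ORDER BY V.ID"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(vehicles_with_dealer_details())
# [('Ford', 'F250', 2003, 'A-1', 'B-1'), ('Chevy', 'Cobalt', 2005, 'A-2', None), ('Dodge', 'Chrgr', 2012, None, 'B-3')]
print(vehicles_with_dealer_details(only_at_both=True))
# [('Ford', 'F250', 2003, 'A-1', 'B-1')]
```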