JOIN on varchar column with subquery

I know that this is not the recommended way to join tables, but it's only relevant for one rarely used report for one person, and I don't want to change my data model for it.
I have two tables, Model and SparePart, that are not directly linked with each other via foreign keys.
Model          SparePart
idModel        idSparePart
ModelName      SparePartDescription
               Price
In special cases a model is also a spare part (exchange unit). Then I need the price for this model from the SparePart table via its SparePartDescription column.
For example:
ModelName = C510
SparePartDescription = C510/Exchange Unit/Exch unit/Red
So I tried to join both tables to get the price with the following SQL:
SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription
FROM modModel AS m INNER JOIN
tabSparePart AS sp ON m.ModelName =
(SELECT TOP 1 LEFT(sp.SparePartDescription, CHARINDEX('/', sp.SparePartDescription) - 1)order by price desc)
WHERE (CHARINDEX('/', sp.SparePartDescription) > 0)
AND (sp.fiSparePartCategory = 6)
ORDER BY m.ModelName, sp.SparePartDescription
But I get multiple records for one model:
idModel  ModelName  Price  SparePartDescription
569      C510       70,75  C510/Exchange Unit/Exch unit/Red
569      C510       70,75  C510/Exchange Unit/Latin/Generic/Black
569      C510       70,75  C510/Exchange Unit/Latin/Generic/Silver
433      C702       80,72  C702/Exchange Unit/Latin/Generic/Black
433      C702       NULL   C702/Exchange Unit/Latin/Generic/Cyan
433      C702       80,72  C702/Exchange Unit/Orange Global/Black
I only want to select one record if there are multiple spareparts with matching SparePartDescription.

SQL Server 2005 introduced the APPLY operator, which allows you to join against a correlated subquery. Try this:
SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription
FROM modModel AS m
CROSS APPLY
(
SELECT TOP 1 * FROM tabSparePart
WHERE m.ModelName =
LEFT(SparePartDescription, LEN(m.ModelName))
AND fiSparePartCategory = 6 -- filter before TOP 1, so the chosen row is always in category 6
ORDER BY Price DESC
) sp
ORDER BY m.ModelName, sp.SparePartDescription
It inner joins the 'modModel' table with the subquery, which returns only the top matching row from 'tabSparePart' for each model.
You can also use OUTER APPLY, which emulates a LEFT JOIN on the subquery.

Try the ROW_NUMBER function. Filtering on row number 1 guarantees that you'll get only one row for each group defined in the PARTITION BY clause.
SELECT a.idModel, a.ModelName, Price, SparePartDescription
FROM modModel a
LEFT JOIN
(
SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription
, ROW_NUMBER() OVER (PARTITION BY m.idModel, m.ModelName ORDER BY sp.price DESC) AS r
FROM modModel AS m
INNER JOIN tabSparePart AS sp
ON m.ModelName = LEFT(sp.SparePartDescription, CHARINDEX('/', sp.SparePartDescription) - 1)
WHERE (CHARINDEX('/', sp.SparePartDescription) > 0)
AND (sp.fiSparePartCategory = 6)
) b
ON a.idModel = b.idModel
AND b.r = 1
ORDER BY ModelName, SparePartDescription
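
The ROW_NUMBER approach can be tried out without a SQL Server instance. Below is a minimal sketch using Python's built-in sqlite3 module (window functions assume the bundled SQLite is 3.25 or newer). The sample rows are adapted from the question; the prefix match via LIKE and the extra idSparePart tie-breaker are illustrative choices, not part of the answers above, and LIKE would misbehave if a model name contained % or _.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE modModel (idModel INTEGER PRIMARY KEY, ModelName TEXT);
    CREATE TABLE tabSparePart (
        idSparePart INTEGER PRIMARY KEY,
        SparePartDescription TEXT,
        Price REAL,
        fiSparePartCategory INTEGER
    );
    INSERT INTO modModel VALUES (569, 'C510'), (433, 'C702');
    INSERT INTO tabSparePart (SparePartDescription, Price, fiSparePartCategory) VALUES
        ('C510/Exchange Unit/Exch unit/Red', 70.75, 6),
        ('C510/Exchange Unit/Latin/Generic/Black', 70.75, 6),
        ('C702/Exchange Unit/Latin/Generic/Black', 80.72, 6),
        ('C702/Exchange Unit/Latin/Generic/Cyan', NULL, 6);
""")

# Number the matching spare parts per model (highest price first, ties broken
# by idSparePart), then keep only row number 1 for each model.
rows = conn.execute("""
    SELECT idModel, ModelName, Price, SparePartDescription
    FROM (
        SELECT m.idModel, m.ModelName, sp.Price, sp.SparePartDescription,
               ROW_NUMBER() OVER (
                   PARTITION BY m.idModel
                   ORDER BY sp.Price DESC, sp.idSparePart
               ) AS r
        FROM modModel AS m
        JOIN tabSparePart AS sp
          ON sp.SparePartDescription LIKE m.ModelName || '/%'
        WHERE sp.fiSparePartCategory = 6
    ) AS ranked
    WHERE r = 1
    ORDER BY ModelName
""").fetchall()

for row in rows:
    print(row)
```

Adding a unique column as the last ORDER BY key makes the choice between equally priced spare parts repeatable; without it, any of the tied rows may be picked.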

First, your join condition can be simplified some, and then you can use ROW_NUMBER() to specify some kind of order to your results, allowing the first result (per model) to be selected. I also changed it to a LEFT JOIN in case there was no match. If that is not required, it's simple to change back to an INNER JOIN :)
WITH
ranked_results AS
(
SELECT
m.idModel, m.ModelName, sp.Price, sp.SparePartDescription,
ROW_NUMBER() OVER (PARTITION BY m.idModel ORDER BY sp.Price DESC) AS rank
FROM
modModel AS m
LEFT JOIN
tabSparePart AS sp
ON LEFT(sp.SparePartDescription, LEN(m.ModelName)) = m.ModelName
AND (CHARINDEX('/', sp.SparePartDescription) > 0)
AND (sp.fiSparePartCategory = 6)
)
SELECT
*
FROM
ranked_results
WHERE
rank = 1
ORDER BY
ModelName,
SparePartDescription
@MattMurrell's answer just appeared while I was typing this. One difference here is that the selection criteria are applied to the whole set, rather than separately inside the CROSS APPLY. This may have a performance benefit; you'd have to try and see. CROSS APPLY with inline functions is normally more performant than correlated sub-queries, so I can't predict which is faster.

Related

How to group a sum query without using group by

I have a large query/table with many columns, and the primary key in the query/table is the id_shift. Multiple orders belong to one shift, and for each shift I want to display the value of the order with the largest ldm value (length of shipment).
I do not want to use group by because then I would need to specify all the columns in the query (which are about 50-100 columns), and it is important that the query is fast.
I have created this query (and I want to add it to the large query):
SELECT
(MAX(ldm.uvalue) OVER ()) AS [max ldm],
plannedshift.id_shift
FROM plannedshift
LEFT JOIN action ac ON plannedshift.id_shift=ac.id_shift AND ac.name = 'pickup'
JOIN [order] ord ON ac.id_order = ord.id_order AND ac.name = 'pickup'
LEFT JOIN orderamount ldm ON ord.id_order = ldm.id_order AND ldm.id_unit = 5
But this gives me multiple rows for the same id_shift, because a row is created for each order. For example:
id_shift  max ldm
62822     12.80
62822     12.80
62822     12.80
Is there something I can do to get only one row for each id_shift, with the max ldm value from all the orders that belong to that shift?

You can use row_number():
SELECT ps.*
FROM (SELECT . . ., -- whatever columns you want
ROW_NUMBER() OVER (PARTITION BY ps.id_shift ORDER BY ldm.uvalue DESC) as seqnum
FROM plannedshift ps LEFT JOIN
action ac
ON ps.id_shift = ac.id_shift AND
ac.name = 'pickup' LEFT JOIN
[order] ord
ON ac.id_order = ord.id_order AND
ac.name = 'pickup' LEFT JOIN
orderamount ldm
ON ord.id_order = ldm.id_order AND
ldm.id_unit = 5
) ps
WHERE seqnum = 1;
Note that I changed all the JOINs to LEFT JOINs to ensure that all shifts from the first table are in the result set.

You only need to include the columns you are actually displaying in a group by.
If your query above is the complete query, then this is not difficult at all:
group by ps.id_shift
Otherwise you could try a distinct after the select:
SELECT distinct
(MAX(ldm.uvalue) OVER ()) AS [max ldm],
plannedshift.id_shift
FROM plannedshift...
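
As a rough illustration of the GROUP BY route, here is a sketch using Python's sqlite3 module. The table and column names follow the question, the [order] table is left out for brevity, and the sample values (including the second shift 62823) are made up. The aggregate sits in its own small query that groups only by id_shift, so the wide main query would not need a GROUP BY of its own and could simply join to this result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plannedshift (id_shift INTEGER PRIMARY KEY);
    CREATE TABLE "action" (id_shift INTEGER, id_order INTEGER, name TEXT);
    CREATE TABLE orderamount (id_order INTEGER, id_unit INTEGER, uvalue REAL);
    INSERT INTO plannedshift VALUES (62822), (62823);
    INSERT INTO "action" VALUES
        (62822, 1, 'pickup'), (62822, 2, 'pickup'), (62822, 3, 'pickup'),
        (62823, 4, 'pickup');
    INSERT INTO orderamount VALUES (1, 5, 12.8), (2, 5, 3.2), (3, 5, 7.5), (4, 5, 4.0);
""")

# One row per shift: only id_shift appears in GROUP BY, and MAX() is a plain
# aggregate (no OVER clause), so it is computed per shift, not over all rows.
shift_max = conn.execute("""
    SELECT ps.id_shift, MAX(ldm.uvalue) AS max_ldm
    FROM plannedshift AS ps
    LEFT JOIN "action" AS ac
           ON ac.id_shift = ps.id_shift AND ac.name = 'pickup'
    LEFT JOIN orderamount AS ldm
           ON ldm.id_order = ac.id_order AND ldm.id_unit = 5
    GROUP BY ps.id_shift
    ORDER BY ps.id_shift
""").fetchall()

print(shift_max)
```

Note that MAX(...) OVER () in the question computes a single maximum across every row of the result, which is why each row shows the same value there.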

You can use a TOP (1) query that you add with OUTER APPLY to your main query in order to get the top row per shift:
select ...
from ...
outer apply
(
SELECT TOP(1) *
FROM action ac
JOIN orderamount ldm ON ldm.id_order = ac.id_order AND ldm.id_unit = 5
WHERE ac.id_shift = plannedshift.id_shift AND ac.name = 'pickup'
ORDER BY ldm.uvalue DESC
) top_order

Select min value from Tables using (sub Select )

I am trying to select a specific row from the tables. All are good, but the last select min is not working. So I need the min Leg Number after I make the whole selection.
SELECT cashier.* , legs.* ,cashier.id as cashier,
cashier.cashierNumber as cashierNum ,cashier.fullName as cashier
FROM myTable
INNER JOIN legs ON main.main= legs.legMain
INNER JOIN cashier ON legs.cashier = cashier.id
WHERE legs.RRZZFrom ='RR'
AND legs.LegNumber = (SELECT Min(legs.LegNumber) FROM legs)

where legs.RRZZFrom ='RR' and legs.LegNumber in (select min(legs.LegNumber) from legs)

You might want to try using a CTE.
with cte as (
select min(LegNumber) as minLegNum from legs
)
SELECT cashier.* , legs.* ,cashier.id as cashier,
cashier.cashierNumber as cashierNum ,cashier.fullName as cashier
FROM myTable
INNER JOIN legs ON main.main= legs.legMain
INNER JOIN cashier ON legs.cashier = cashier.id
INNER JOIN cte c on c.minLegNum = legs.LegNumber
WHERE legs.RRZZFrom ='RR'

Are you looking for the MIN leg number only from the selected data? In that case, something like this:
WITH Details
AS
(
SELECT cashier.* , legs.* ,cashier.id as cashier,
cashier.cashierNumber as cashierNum ,cashier.fullName as cashier
FROM myTable
INNER JOIN legs ON main.main= legs.legMain
INNER JOIN cashier ON legs.cashier = cashier.id
WHERE legs.RRZZFrom ='RR'
)
SELECT d.*
FROM Details AS d
WHERE d.LegNumber = (SELECT MIN(d2.LegNumber) FROM Details AS d2);
Not sure if you'd need to alias any other columns there, as I don't know the table layout.

Try this:
WITH DataSource AS
(
SELECT cashier.*
,legs.*
,cashier.id as cashier
,cashier.cashierNumber as cashierNum
,cashier.fullName as cashier
,MIN(legs.LegNumber) OVER() AS [MinLegNUmber]
FROM myTable
INNER JOIN legs
ON main.main= legs.legMain
INNER JOIN cashier
ON legs.cashier = cashier.id
WHERE legs.RRZZFrom ='RR'
)
SELECT *
FROM DataSource
WHERE LegNumber = [MinLegNUmber];
The idea is to use an OVER() clause to calculate the minimum value over the filtered rows and attach it to each row:
MIN(legs.LegNumber) OVER()
Then the outer query returns only the rows that match this value.
The OVER clause is a particularly powerful piece of syntax that lets you perform operations (ranking, aggregation) over a given set of rows. An empty OVER() means the window is the whole result set.
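
To see why the position of the MIN matters, here is a small sketch with Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The legs table is cut down to three columns and the rows are invented for illustration. The uncorrelated subquery from the question takes the minimum over all legs, while MIN(...) OVER () inside the filtered query takes it over the 'RR' rows only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legs (LegNumber INTEGER, RRZZFrom TEXT, cashier INTEGER);
    INSERT INTO legs VALUES (1, 'ZZ', 10), (2, 'RR', 11), (3, 'RR', 12), (2, 'RR', 13);
""")

# Question's approach: the subquery ignores the WHERE filter, so it returns 1
# (a 'ZZ' leg) and no 'RR' row can match it.
naive = conn.execute("""
    SELECT LegNumber, cashier
    FROM legs
    WHERE RRZZFrom = 'RR'
      AND LegNumber = (SELECT MIN(LegNumber) FROM legs)
""").fetchall()

# Window approach: the minimum is computed after the WHERE filter has been applied.
min_rr_legs = conn.execute("""
    WITH DataSource AS (
        SELECT LegNumber, cashier,
               MIN(LegNumber) OVER () AS MinLegNumber
        FROM legs
        WHERE RRZZFrom = 'RR'
    )
    SELECT LegNumber, cashier
    FROM DataSource
    WHERE LegNumber = MinLegNumber
    ORDER BY cashier
""").fetchall()

print(naive)
print(min_rr_legs)
```

Both 'RR' rows with leg number 2 come back, since the filter keeps every row tied for the minimum.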

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.

SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.

Something like:
join (
select "EsnId", "SalesOrderItemId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This selects the latest row per "EsnId" from "EsnsSalesOrderItems" based on the column "createdAt". You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an ORDER BY.

Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE

Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...

Join two tables, only use latest value of right table

I am trying to join 2 tables, but only join with the latest record in a group of records.
The left table:
Part
Part.PartNum
The right table:
Material
Material.Partnum
Material.Formula
Material.RevisionNum
The revision number starts at "A" and increases.
I would like to join the 2 tables by PartNum, but only join with the latest record from the right table. I have seen other examples on SO but am having a hard time putting it all together.
Edit:
I found out the first revision number is "New", then it increments A,B,... It will never be more than one or two revisions, so I am not worried about going over the sequence. But how do I choose the latest one with 'New' being the first revision number?

If SQL Server 2005+
;WITH m AS
(
SELECT Partnum, Formula, RevisionNum,
rn = ROW_NUMBER() OVER (PARTITION BY PartNum ORDER BY
CASE WHEN RevisionNum ='New' THEN 2 ELSE 1 END, -- 'New' is the first revision, so it sorts last
RevisionNum DESC) -- among lettered revisions, the highest letter is the latest
FROM dbo.Material
)
SELECT p.PartNum, m.Formula, m.RevisionNum
FROM dbo.Parts AS p
INNER JOIN m ON p.PartNum = m.PartNum
WHERE m.rn = 1;
Though curious, what do you do when there are more than 26 revisions (e.g. what comes after Z)?

A general SQL statement that would run this would be:
select P.PartNum, M.Formula, M.RevisionNum
from Parts P
join Material M on P.PartNum = M.PartNum
where M.RevisionNum = (select max(M2.RevisionNum) from Material M2
where M2.PartNum = P.PartNum);
Repeating the above caveats about what happens after Revision #26. The max(RevisionNum) may break depending upon what happens after #26.
EDIT:
If RevisionNum sequence always starts w/ NEW and then continues, A, B, C, etc., then the max() needs to be replaced w/ something more complicated (and messy):
select P.PartNum, M.RevisionNum
from Parts P
join Material M on P.PartNum = M.PartNum
where (
(select count(*) from Material M2
where M2.PartNum = P.PartNum) > 1
and M.RevisionNum = (select max(M3.RevisionNum) from Material M3
where M3.PartNum = P.PartNum and M3.RevisionNum <> 'NEW')
)
or (
(select count(*) from Material M4
where M4.PartNum = P.PartNum) = 1
and M.RevisionNum = 'NEW'
)
There must be a better way to do this. This works though -- will have to think about a faster solution.
SQL Fiddle: http://sqlfiddle.com/#!3/70c19/3

SQL Server 2005+ as well:
Updated to handle the OP's changing requirements
SELECT P.PartNum,
M.Formula,
M.RevisionNum
FROM Part AS P
CROSS APPLY (
SELECT TOP 1 *
FROM Material AS M
WHERE M.Partnum = P.PartNum
ORDER BY CASE WHEN RevisionNum ='New' THEN 2 ELSE 1 END,
M.RevisionNum DESC
) AS M
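
The same ordering rule ('New' is the oldest revision, then A, B, ...) can be checked quickly with Python's sqlite3 module. This is only a sketch: SQLite has no CROSS APPLY, so it uses ROW_NUMBER instead (SQLite 3.25 or newer), and the part numbers and formulas are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part (PartNum TEXT PRIMARY KEY);
    CREATE TABLE Material (PartNum TEXT, Formula TEXT, RevisionNum TEXT);
    INSERT INTO Part VALUES ('P1'), ('P2');
    INSERT INTO Material VALUES
        ('P1', 'f-new', 'New'), ('P1', 'f-a', 'A'), ('P1', 'f-b', 'B'),
        ('P2', 'g-new', 'New');
""")

# The CASE key pushes 'New' behind every lettered revision; among the lettered
# revisions the highest letter wins. Row number 1 is then the latest revision.
latest = conn.execute("""
    SELECT p.PartNum, m.Formula, m.RevisionNum
    FROM Part AS p
    JOIN (
        SELECT PartNum, Formula, RevisionNum,
               ROW_NUMBER() OVER (
                   PARTITION BY PartNum
                   ORDER BY CASE WHEN RevisionNum = 'New' THEN 1 ELSE 0 END,
                            RevisionNum DESC
               ) AS rn
        FROM Material
    ) AS m ON m.PartNum = p.PartNum AND m.rn = 1
    ORDER BY p.PartNum
""").fetchall()

print(latest)
```

A part that only has its 'New' revision still returns that row, because 'New' is merely sorted last rather than excluded.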

MSSQL Paging is returning random rows when not supposed too

I'm trying to do some basic paging in MSSQL. The problem I'm having is that I'm sorting the pages on a column that (potentially) has identical values, and the ORDER BY clause is returning "random" results, which doesn't work well.
So for example:
If I have three rows, and I'm sorting them by a "rating", and all of the ratings are '5', the rows will seemingly "randomly" order themselves. How do I make it so the rows show up in the same order every time?
I tried ordering by the datetime at which the row was last edited, but the "rating" is then sorted in reverse, and again it does not work how I expect it to.
Here is the SQL I'm using so far. I know it's somewhat confusing without the data, so any help would be greatly appreciated.
SELECT * FROM
(
SELECT
CAST(grg.defaultthumbid AS VARCHAR) + '_' +
CAST(grg.garageid AS VARCHAR) AS imagename,
(
SELECT COUNT(imageid)
FROM dbo.images im (nolock)
WHERE im.garageid = grg.garageid
) AS piccount,
(
SELECT COUNT(commentid)
FROM dbo.comments cmt (nolock)
WHERE cmt.garageid = grg.garageid
) AS commentcount,
grg.GarageID, mk.make, mdl.model, grg.year,
typ.type, usr.username, grg.content,
grg.rating, grg.DateEdit as DateEdit,
ROW_NUMBER() OVER (ORDER BY Rating DESC) As RowIndex
FROM
dbo.garage grg (nolock)
LEFT JOIN dbo.users (nolock) AS usr ON (grg.userid = usr.userid)
LEFT JOIN dbo.make (nolock) AS mk ON (grg.makeid = mk.makeid)
LEFT JOIN dbo.type (nolock) AS typ ON (typ.typeid = mk.typeid)
LEFT JOIN dbo.model (nolock) AS mdl ON (grg.modelid = mdl.modelid)
WHERE
typ.type = 'Automobile' AND
grg.defaultthumbid != 0 AND
usr.username IS NOT NULL
) As QueryResults
WHERE
RowIndex BETWEEN (2 - 1) * 25 + 1 AND 2 * 25
ORDER BY
DateEdit DESC

Try ordering by both, e.g.:
ORDER BY Rating DESC, DateEdit ASC

The query first numbers the rows by [Rating], and then re-sorts the results by [DateEdit]. Possibly not what you intended. Ordering by [RowIndex] ASC should sort it out.
ROW_NUMBER() OVER (ORDER BY [Rating] DESC) As [RowIndex]
...
ORDER BY [RowIndex]
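
Putting the two suggestions together, a stable page order needs a unique tie-breaker inside the ROW_NUMBER ordering and a final sort on the row number itself. The sketch below uses Python's sqlite3 module (SQLite 3.25 or newer) with a cut-down garage table where every rating is 5. The fetch_page helper and the choice of GarageID as tie-breaker are assumptions for illustration; any unique column would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE garage (GarageID INTEGER PRIMARY KEY, rating INTEGER)")
# Seven rows that all share the same rating, so rating alone cannot order them.
conn.executemany("INSERT INTO garage VALUES (?, ?)", [(i, 5) for i in range(1, 8)])


def fetch_page(page, page_size):
    """Return the GarageIDs on the given 1-based page (hypothetical helper)."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    query = """
        SELECT GarageID
        FROM (
            SELECT GarageID,
                   ROW_NUMBER() OVER (ORDER BY rating DESC, GarageID ASC) AS RowIndex
            FROM garage
        ) AS numbered
        WHERE RowIndex BETWEEN ? AND ?
        ORDER BY RowIndex
    """
    return [row[0] for row in conn.execute(query, (first, last))]


print(fetch_page(1, 3))
print(fetch_page(2, 3))
print(fetch_page(3, 3))
```

Because GarageID is unique, the numbering is fully determined, so repeated calls return the same rows for the same page and no row appears on two pages.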
