Combining SQL Tables for one source of information

I'm trying to combine 3 different tables into one single row per lot in a query, but I'm having some problems. Currently I have four tables: PaintSched, Painted_Log, Paint_Defect, and Paint_Inspection.
PaintSched - Single entry, when scheduler just schedules some parts to be painted
LOT QTY
1 150
2 100
Painted_Log - The paint department then takes the lot and says how many they were able to paint
LOT(FK) QTYPainted
1 145
2 100
Paint_Defect - Master List of defects for paint inspection after parts have been painted. We hand inspect all of our parts that we paint for quality.
DID Defect
1 Scratch
2 Paint Run
Paint_Inspection - Every time a defect is found, the inspector hits the corresponding button and the following gets logged. Lot is an FK and DID stands for the Defect ID from Paint_Defect. QTY is always 1
Lot(FK) DID(FK) QTY
1 1 1
1 1 1
1 2 1
1 1 1
2 2 1
1 2 1
2 1 1
What I'm trying to get is the following output:
Lot Sched Painted Scratch Paint Run
1 150 145 3 2
2 100 100 1 1
I've tried the following to no avail:
SELECT PaintSched.Scheduled, PaintSched.Lot, PaintSched.qty, PaintSched.Is_Painted, Painted_Log.falloff
FROM PaintSched
INNER JOIN Painted_Log ON PaintSched.Lot = Painted_Log.lot
INNER JOIN MPA_Desc ON MPA_Desc.MPAID = PaintSched.MPAID
inner JOIN (
SELECT lot, sum(Paint_Inspection.qty) as seds
FROM Paint_Inspection
WHERE Paint_Inspection.Status = '1'
) AS seeds ON PaintSched.Lot = Paint_Inspection.Lot

SELECT
    PS.Lot,
    PS.Qty AS Sched,
    Painted,
    Scratch,
    PaintRun
FROM PaintSched PS
LEFT JOIN (SELECT
               Lot,
               SUM(QTYPainted) AS Painted
           FROM Painted_Log GROUP BY Lot) PL
    ON PS.Lot = PL.Lot
LEFT JOIN (SELECT
               Lot,
               SUM(CASE WHEN DID = 1 THEN 1 ELSE 0 END) AS Scratch,
               SUM(CASE WHEN DID = 2 THEN 1 ELSE 0 END) AS PaintRun
           FROM Paint_Inspection GROUP BY Lot) PI
    ON PS.Lot = PI.Lot
Try that in SQL Fiddle
The code above uses a conditional sum to roll up the defect count by type for each lot before joining the result to PaintSched on Lot. If you have more than 2 defect types, you will need to add one SUM(CASE ...) column per type.
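To check the conditional-sum approach against the sample data from the question, here is a minimal sketch. It assumes an in-memory SQLite database driven from Python's sqlite3 module as a stand-in for the real server, and the column types are guesses, so treat it as an illustration rather than the production query.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PaintSched (Lot INTEGER PRIMARY KEY, Qty INTEGER);
    CREATE TABLE Painted_Log (Lot INTEGER, QTYPainted INTEGER);
    CREATE TABLE Paint_Inspection (Lot INTEGER, DID INTEGER, Qty INTEGER);

    INSERT INTO PaintSched VALUES (1, 150), (2, 100);
    INSERT INTO Painted_Log VALUES (1, 145), (2, 100);
    INSERT INTO Paint_Inspection VALUES
        (1, 1, 1), (1, 1, 1), (1, 2, 1), (1, 1, 1),
        (2, 2, 1), (1, 2, 1), (2, 1, 1);
""")

# Aggregate each child table to one row per lot first, then join to the schedule.
query = """
    SELECT PS.Lot, PS.Qty AS Sched, PL.Painted, PI.Scratch, PI.PaintRun
    FROM PaintSched AS PS
    LEFT JOIN (SELECT Lot, SUM(QTYPainted) AS Painted
               FROM Painted_Log GROUP BY Lot) AS PL ON PS.Lot = PL.Lot
    LEFT JOIN (SELECT Lot,
                      SUM(CASE WHEN DID = 1 THEN 1 ELSE 0 END) AS Scratch,
                      SUM(CASE WHEN DID = 2 THEN 1 ELSE 0 END) AS PaintRun
               FROM Paint_Inspection GROUP BY Lot) AS PI ON PS.Lot = PI.Lot
    ORDER BY PS.Lot
"""
paint_rows = conn.execute(query).fetchall()
for row in paint_rows:
    print(row)
```

Aggregating inside each derived table before joining keeps one row per lot on every side of the join, so the painted quantity is not multiplied by the number of inspection rows.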

Two things:
You should have a group by in the subquery
When you alias the subquery, you have to join on the alias: the ON clause must reference seeds.Lot, not Paint_Inspection.Lot
See the edits to your query below:
SELECT
PaintSched.Scheduled,
PaintSched.Lot,
PaintSched.qty,
PaintSched.Is_Painted,
Painted_Log.falloff
FROM PaintSched
INNER JOIN Painted_Log ON PaintSched.Lot = Painted_Log.lot
INNER JOIN MPA_Desc ON MPA_Desc.MPAID = PaintSched.MPAID
INNER JOIN (
SELECT lot, sum(Paint_Inspection.qty) as seds
FROM Paint_Inspection
WHERE Paint_Inspection.Status = '1'
GROUP BY Paint_Inspection.lot -- Missing GROUP BY
) AS seeds
ON PaintSched.Lot = seeds.Lot

Related

T-SQL subselect statement is returning all rows instead of limiting to 1 based on subselect

I am trying to return just the first row where the BLOCK_STOP_ORDER = 2. What is wrong with my SQL? Why isn't WHERE SCHEDULE.BLOCK_STOP_ORDER = (SELECT MIN(S1.BLOCK_STOP_ORDER.... working? When I run the subselect on its own it returns the value '2' - doesn't that mean it should then limit the query result to only the row(s) where BLOCK_STOP_ORDER = 2?
SELECT ROUTE.ROUTE_ABBR, SCHEDULE.ROUTE_DIRECTION_ID, SCHEDULE.PATTERN_ID, SCHEDULE.BLOCK_STOP_ORDER,
SCHEDULE.SCHEDULED_TIME, GEO_NODE.GEO_NODE_ABBR, TRIP.TRIP_SEQUENCE AS TPST
FROM SCHEDULE
INNER JOIN GEO_NODE ON SCHEDULE.GEO_NODE_ID = GEO_NODE.GEO_NODE_ID
INNER JOIN ROUTE ON SCHEDULE.ROUTE_ID = ROUTE.ROUTE_ID
INNER JOIN TRIP ON SCHEDULE.TRIP_ID = TRIP.TRIP_ID
WHERE (SCHEDULE.CALENDAR_ID = '120221024') AND ROUTE.ROUTE_ABBR = '001'
AND SCHEDULE.ROUTE_DIRECTION_ID = '2' AND SCHEDULE.PATTERN_ID = '270082'
AND TRIP.TRIP_SEQUENCE = '18600'
AND SCHEDULE.BLOCK_STOP_ORDER =
(SELECT MIN(S1.BLOCK_STOP_ORDER)
FROM SCHEDULE S1
WHERE SCHEDULE.CALENDAR_ID = S1.CALENDAR_ID
AND SCHEDULE.ROUTE_ID = S1.ROUTE_ID
AND SCHEDULE.ROUTE_DIRECTION_ID = S1.ROUTE_DIRECTION_ID
AND SCHEDULE.PATTERN_ID = S1.PATTERN_ID
AND SCHEDULE.SCHEDULED_TIME = S1.SCHEDULED_TIME
AND SCHEDULE.GEO_NODE_ID = S1.GEO_NODE_ID
AND SCHEDULE.BLOCK_STOP_ORDER = S1.BLOCK_STOP_ORDER
AND SCHEDULE.TRIP_ID = S1.TRIP_ID
)
GROUP BY ROUTE.ROUTE_ABBR, SCHEDULE.ROUTE_DIRECTION_ID,
SCHEDULE.PATTERN_ID, SCHEDULE.SCHEDULED_TIME,
GEO_NODE.GEO_NODE_ABBR, SCHEDULE.BLOCK_STOP_ORDER, TRIP.TRIP_SEQUENCE
ORDER BY ROUTE.ROUTE_ABBR, SCHEDULE.ROUTE_DIRECTION_ID, TRIP.TRIP_SEQUENCE
Results:
ROUTE_ABBR  ROUTE_DIRECTION_ID  PATTERN_ID  BLOCK_STOP_ORDER  SCHEDULED_TIME  GEO_NODE_ABBR  TPST
001         2                   270082      2                 18600           1251           18600
001         2                   270082      3                 18600           1346           18600
001         2                   270082      5                 18720           1123           18600
001         2                   270082      6                 18720           11372          18600
001         2                   270082      4                 18720           1570           18600
001         2                   270082      8                 18780           11373          18600

This is probably better solved with the row_number() windowing function:
SELECT *
FROM (
SELECT DISTINCT r.ROUTE_ABBR, s.ROUTE_DIRECTION_ID, s.PATTERN_ID, s.BLOCK_STOP_ORDER,
s.SCHEDULED_TIME, g.GEO_NODE_ABBR, t.TRIP_SEQUENCE AS TPST,
row_number() over (order by s.BLOCK_STOP_ORDER) rn
FROM SCHEDULE s
INNER JOIN GEO_NODE g ON s.GEO_NODE_ID = g.GEO_NODE_ID
INNER JOIN ROUTE r ON s.ROUTE_ID = r.ROUTE_ID
INNER JOIN TRIP t ON s.TRIP_ID = t.TRIP_ID
WHERE s.CALENDAR_ID = '120221024' AND r.ROUTE_ABBR = '001'
AND s.ROUTE_DIRECTION_ID = '2' AND s.PATTERN_ID = '270082'
AND t.TRIP_SEQUENCE = '18600'
) t1
WHERE rn=1
ORDER BY t1.ROUTE_ABBR, t1.ROUTE_DIRECTION_ID, t1.TPST
The problem with the original is the name SCHEDULE. For the full version of the query, the subquery is matching the name in the nested select with the instance of the table from the outer select. This correlates the results of the inner table with the outer, so only the item from that row of the outer table is eligible.
When you run the inner query by itself, separate from the outer query, there is only the one instance of the table. In that situation the WHERE conditions are matching the table to itself — they are always true — and you just get the smallest value of all the rows: 2.
This is why you should ALWAYS give ALL the tables in your queries an alias, and ONLY reference them by that alias (as I did in my answer). Do this, and the MIN() version can work... but will still be slower and more code than using row_number().
Finally, the use of DISTINCT / GROUP BY with every SELECT column is usually an indicator you don't fully understand the JOIN relationships used in the query, and in at least one case the join conditions are not sufficiently selective. I'd hesitate to move a query like that to production, even if it seems to be working, though I confess most of us have done it at some point anyway.
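
The correlation effect described above can be reproduced on a tiny table. This is only a sketch: it assumes an in-memory SQLite database (via Python's sqlite3 module) instead of SQL Server, the SCHEDULE table is cut down to two columns, and correlating on TRIP_ID alone in the second query is my assumption about the intended grouping.

```python
import sqlite3

# In-memory SQLite stand-in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SCHEDULE (TRIP_ID INTEGER, BLOCK_STOP_ORDER INTEGER);
    INSERT INTO SCHEDULE VALUES (1, 2), (1, 3), (1, 5), (1, 4);
""")

# Outer table left unaliased, as in the question. Inside the subquery the name
# SCHEDULE refers to the OUTER row, so each row is compared with itself and
# MIN() simply returns that row's own BLOCK_STOP_ORDER.
self_correlated = conn.execute("""
    SELECT SCHEDULE.BLOCK_STOP_ORDER
    FROM SCHEDULE
    WHERE SCHEDULE.BLOCK_STOP_ORDER = (
        SELECT MIN(S1.BLOCK_STOP_ORDER)
        FROM SCHEDULE AS S1
        WHERE SCHEDULE.TRIP_ID = S1.TRIP_ID
          AND SCHEDULE.BLOCK_STOP_ORDER = S1.BLOCK_STOP_ORDER)
    ORDER BY SCHEDULE.BLOCK_STOP_ORDER
""").fetchall()

# Both instances aliased, correlated only on the grouping key (TRIP_ID here):
# now MIN() ranges over every stop of the trip and a single row survives.
first_stop = conn.execute("""
    SELECT s.BLOCK_STOP_ORDER
    FROM SCHEDULE AS s
    WHERE s.BLOCK_STOP_ORDER = (
        SELECT MIN(s1.BLOCK_STOP_ORDER)
        FROM SCHEDULE AS s1
        WHERE s1.TRIP_ID = s.TRIP_ID)
""").fetchall()

print(self_correlated)
print(first_stop)
```

The first query keeps every row because the extra BLOCK_STOP_ORDER equality makes each outer row its own minimum; dropping that predicate is what lets the MIN() version limit the result.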

Adding a new column in this SQL Query?

I am querying data from the WIP and Employee tables:
WIP
Id,Name
Employee
Id,Name,Organization
Joining both I can query:
select w.ID,e.Organization,w.ConsultantName,e.OrganizationID, w.ConsultantID
from vwWIPRecords w
inner join vwEmployees e on w.ConsultantID=e.ID;
Results:
1 VHAA Web User 1 1
2 VHAA NZ RP 1 3
3 VHAA Ghom Mure 1 2
4 VHAA Ghom Mure 1 2
Requirement:
Add another column to the query that concatenates e.Organization and w.ConsultantName, but shows the value only for the first record of each organization/consultant pair. For later records where the name and organization are the same, it shows nothing. This column will list the unique consultants of a company. See records 3 and 4 in the second example.
1 VHAAWeb User 1 1
2 VHAANZ RP 1 3
3 VHAAGhom Mure 1 2
4 1 2
Thanks a lot for your help

Here is a start. The final column is a flag indicating the row should be blank. Let me know if this works for you so far and I can help further.
select w.ID,e.Organization, w.ConsultantName,
e.OrganizationID, w.ConsultantID, CASE WHEN D.Dup > 1 AND D.ID <> w.ID THEN 'Y'
ELSE 'N' END As HideMe
from vwWIPRecords w
inner join vwEmployees e on w.ConsultantID=e.ID
inner join
(
select MIN(w.ID) As ID, e.Organization,w.ConsultantName,
e.OrganizationID, w.ConsultantID, COUNT(*) AS Dup
from vwWIPRecords w
inner join vwEmployees e on w.ConsultantID=e.ID
group by e.Organization, w.ConsultantName,
e.OrganizationID, w.ConsultantID -- required because of MIN() and COUNT()
) D
ON D.Organization = e.Organization -- Organization columns come from vwEmployees (e)
AND D.ConsultantName = w.ConsultantName
AND D.OrganizationID = e.OrganizationID
AND D.ConsultantID = w.ConsultantID
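
Going one step further than the flag, the blank-on-repeat column itself could be built along these lines. This is a hedged sketch only: it runs on an in-memory SQLite database through Python's sqlite3 module, the two views are mocked as plain tables with guessed columns, it assumes ConsultantID identifies the organization/name pair, and it uses SQLite's || for concatenation where SQL Server would use +.

```python
import sqlite3

# In-memory SQLite stand-in; the views are mocked as tables (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vwEmployees (ID INTEGER, Organization TEXT, OrganizationID INTEGER);
    CREATE TABLE vwWIPRecords (ID INTEGER, ConsultantName TEXT, ConsultantID INTEGER);

    INSERT INTO vwEmployees VALUES (1, 'VHAA', 1), (2, 'VHAA', 1), (3, 'VHAA', 1);
    INSERT INTO vwWIPRecords VALUES
        (1, 'Web User', 1), (2, 'NZ RP', 3), (3, 'Ghom Mure', 2), (4, 'Ghom Mure', 2);
""")

# d holds the lowest WIP ID per consultant; only that row shows the
# concatenated value, every later row for the same consultant shows ''.
unique_rows = conn.execute("""
    SELECT w.ID,
           CASE WHEN w.ID = d.FirstID
                THEN e.Organization || w.ConsultantName
                ELSE '' END AS UniqueConsultant,
           e.OrganizationID,
           w.ConsultantID
    FROM vwWIPRecords AS w
    INNER JOIN vwEmployees AS e ON w.ConsultantID = e.ID
    INNER JOIN (SELECT ConsultantID, MIN(ID) AS FirstID
                FROM vwWIPRecords
                GROUP BY ConsultantID) AS d
            ON d.ConsultantID = w.ConsultantID
    ORDER BY w.ID
""").fetchall()

for row in unique_rows:
    print(row)
```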

Is it possible to get several COUNT in one single SQL request?

I need help to write a simple procedure. Let me explain what I'm trying to do.
I have 3 tables
tJobOffer
tApplication
tApplicationStatus
I would like to create a procedure that return me a list of tJobOffer with the statistics of different status of this tJobOffer. tApplicationStatus is linked to tApplication that is linked to tJobOffer. An application can be CANDIDATE / ACCEPTED / REFUSED / IGNORED / ...
I created this query :
SELECT
[T].[JobOfferId],
[T].[JobOfferTitle],
COUNT([A].[ApplicationId]) AS [CandidateCount]
FROM [tJobOffer] AS [T]
LEFT JOIN [tApplication] AS [A]
INNER JOIN [tApplicationStatus] AS [S]
ON [S].[ApplicationStatusId] = [A].[ApplicationStatusId]
AND [S].[ApplicationStatusTechnicalName] = 'CANDIDATE'
ON [A].[JobOfferId] = [T].[JobOfferId]
GROUP BY
[T].[JobOfferId],
[T].[JobOfferTitle]
ORDER BY [T].[JobOfferTitle] ;
The result is
> 52ED7C67-21E1-49BB-A1F8-0601E6EED1EA Announce a 0
> F26B228D-0C81-4DA8-A287-F8F997CC1F9C Announce b 0
> 9DA60B23-F113-4C7F-9707-2B90C1556D5D Announce c 2
> 258E11A7-79C1-47B6-8C61-413AA54E2360 Announce d 0
> DA582383-5DF4-4E1D-837C-382371BDEF57 Announce e 1
The result is correct: I get my tJobOffer rows with the count of applications in status CANDIDATE. I have 2 candidates for Announce c and 1 candidate for Announce e. If I change the string 'CANDIDATE' to 'ACCEPTED' or 'REFUSED', I get the counts for those statuses instead. Is it possible to get everything in one request?
Something like
> 52ED7C67-21E1-49BB-A1F8-0601E6EED1EA Announce a 0 0 2
> F26B228D-0C81-4DA8-A287-F8F997CC1F9C Announce b 0 0 1
> 9DA60B23-F113-4C7F-9707-2B90C1556D5D Announce c 2 0 0
> 258E11A7-79C1-47B6-8C61-413AA54E2360 Announce d 0 0 0
> DA582383-5DF4-4E1D-837C-382371BDEF57 Announce e 1 1 0

use SUM and CASE
SELECT
[T].[JobOfferId],
[T].[JobOfferTitle],
SUM(CASE WHEN [S].[ApplicationStatusTechnicalName] = 'CANDIDATE' THEN 1 ELSE 0 END) AS [CandidateCount],
SUM(CASE WHEN [S].[ApplicationStatusTechnicalName] = 'ACCEPTED' THEN 1 ELSE 0 END) AS [ACCEPTEDCount],
SUM(CASE WHEN [S].[ApplicationStatusTechnicalName] = 'REFUSED' THEN 1 ELSE 0 END) AS [REFUSEDCount]
FROM [tJobOffer] AS [T]
LEFT JOIN [tApplication] AS [A]
ON [A].[JobOfferId] = [T].[JobOfferId]
LEFT JOIN [tApplicationStatus] AS [S]
ON [S].[ApplicationStatusId] = [A].[ApplicationStatusId]
GROUP BY
[T].[JobOfferId],
[T].[JobOfferTitle]
ORDER BY [T].[JobOfferTitle] ;
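
As a quick check that the SUM/CASE version returns the layout asked for, here is a small sketch. It assumes an in-memory SQLite database through Python's sqlite3 module, integer ids in place of the GUIDs, and made-up application rows chosen to match the desired output in the question.

```python
import sqlite3

# In-memory SQLite stand-in; ids and application rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tJobOffer (JobOfferId INTEGER, JobOfferTitle TEXT);
    CREATE TABLE tApplicationStatus (ApplicationStatusId INTEGER,
                                     ApplicationStatusTechnicalName TEXT);
    CREATE TABLE tApplication (ApplicationId INTEGER, JobOfferId INTEGER,
                               ApplicationStatusId INTEGER);

    INSERT INTO tJobOffer VALUES
        (1, 'Announce a'), (2, 'Announce b'), (3, 'Announce c'),
        (4, 'Announce d'), (5, 'Announce e');
    INSERT INTO tApplicationStatus VALUES
        (1, 'CANDIDATE'), (2, 'ACCEPTED'), (3, 'REFUSED');
    INSERT INTO tApplication VALUES
        (1, 1, 3), (2, 1, 3), (3, 2, 3), (4, 3, 1),
        (5, 3, 1), (6, 5, 1), (7, 5, 2);
""")

# One pass over the joined rows; each CASE contributes 1 only for its status.
# Offers without applications survive the LEFT JOINs and sum to 0.
offer_stats = conn.execute("""
    SELECT T.JobOfferTitle,
           SUM(CASE WHEN S.ApplicationStatusTechnicalName = 'CANDIDATE' THEN 1 ELSE 0 END),
           SUM(CASE WHEN S.ApplicationStatusTechnicalName = 'ACCEPTED'  THEN 1 ELSE 0 END),
           SUM(CASE WHEN S.ApplicationStatusTechnicalName = 'REFUSED'   THEN 1 ELSE 0 END)
    FROM tJobOffer AS T
    LEFT JOIN tApplication AS A ON A.JobOfferId = T.JobOfferId
    LEFT JOIN tApplicationStatus AS S ON S.ApplicationStatusId = A.ApplicationStatusId
    GROUP BY T.JobOfferId, T.JobOfferTitle
    ORDER BY T.JobOfferTitle
""").fetchall()

for row in offer_stats:
    print(row)
```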

Yes, it is. One way to do that is to use the PIVOT operator. Another way is to add a LEFT OUTER JOIN for each count you need, something like this:
SELECT a.JobID, COUNT(b.JobID), COUNT(c.JobID)
FROM AllVacancies as a
LEFT OUTER JOIN
(SELECT JobID from AllVacancies WHERE ApplicationStatus = 'CANDIDATE') as b
ON a.JobID = b.JobID
LEFT OUTER JOIN
(SELECT JobID FROM AllVacancies WHERE ApplicationStatus = 'ACCEPTED') as c
ON a.JobID = c.JobID
GROUP BY a.JobID
as many times as the categories that you need.

Yes, you can return as many counts as you want in one query.
Try this:
SELECT COUNT(col1), COUNT(col2) FROM demoTable;
This gives the number of non-NULL values in col1 and in col2 (col1 and col2 stand for your own column names).
Usually both counts are the same, unless a column allows NULLs and actually contains some.
Note that COUNT(1) and COUNT(2) do not refer to column positions: the argument is a constant, so both simply count every row.
SELECT COUNT(*) FROM demoTable;
This also returns a count, but it counts every row of the table regardless of NULLs.
So if you want a column count to match the total number of rows, apply it to the primary key column or to a column with a NOT NULL constraint.
Moving further, you are not restricted to a single table:
SELECT COUNT(col1), COUNT(col2) FROM ( joins or any selection from any number of tables ) AS t;
Just make sure the columns you count exist in the selection set.
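
To see how COUNT treats its argument, a small experiment helps. The sketch below assumes an in-memory SQLite database through Python's sqlite3 module, and the demoTable columns (id, col1, col2) are invented for illustration; the NULL handling shown is standard SQL behaviour.

```python
import sqlite3

# In-memory SQLite stand-in; table layout is a made-up example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demoTable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO demoTable VALUES
        (1, 'a', 'x'), (2, NULL, 'y'), (3, 'c', NULL), (4, NULL, 'z');
""")

# COUNT(*) and COUNT(<constant>) count rows; COUNT(<column>) skips NULLs.
counts = conn.execute("""
    SELECT COUNT(*), COUNT(1), COUNT(2), COUNT(id), COUNT(col1), COUNT(col2)
    FROM demoTable
""").fetchone()

print(counts)
```

COUNT(1) and COUNT(2) match COUNT(*) because a constant is never NULL, while the counts on col1 and col2 drop by the number of NULLs each column holds.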

How to improve this left join query

I would like to improve this query.
With INNER JOIN it takes almost no time (less than 1 second).
But with LEFT JOIN it takes nearly 1 minute.
The result is about 17,500 records.
I don't understand why, and I want to improve it.
SELECT TOP (100) PERCENT iti.Id
, iti.TransferDate
, iti.FromSLoc AS FromSLocId
, slf.Name AS FromSLoc
, ct.Id AS CrateTypeId
, ct.Type AS CrateType
, cs.Id AS CrateSizeId
, cs.Size AS CrateSize
, itd.Amount
, iti.SenderRemark
, iti.ToSLoc AS ToSLocId
, slt.Name AS ToSLoc
, iti.StatusId, ts.Name AS Status
, iti.CreatedBy
FROM dbo.tbIntTransferInfo AS iti
INNER JOIN dbo.tbmStorageLocation AS slf
ON slf.Id = iti.FromSLoc
INNER JOIN dbo.tbmStorageLocation AS slt
ON slt.Id = iti.ToSLoc
INNER JOIN dbo.tbmTransferStatus AS ts
ON ts.Id = iti.StatusId
CROSS JOIN dbo.tbmCrateSize AS cs
INNER JOIN dbo.tbmCrateType AS ct
ON ct.Id = cs.CrateTypeId
AND cs.Cancelled = 0
LEFT JOIN dbo.tbIntTransferDetail AS itd
ON iti.Id = itd.IntTransferId
AND itd.CrateSizeId = cs.Id
ORDER BY iti.Id, CrateTypeId, CrateSizeId
In my system I have 6 crate sizes, and one transaction may transfer up to 6 of them. What I want is one record per transaction and crate size, so every transaction shows all 6 sizes. If the transaction didn't transfer a given crate size, the Amount should be NULL.
The result that i want look like this:
Id, ... , CrateType, CrateSize, Amount
1 ... X Big 100
1 ... X Small 50
1 ... Y Big NULL
1 ... Y Small NULL
1 ... Z Big 10
1 ... Z Small 20
2 ... X Big 30
2 ... X Small 40
2 ... Y Big NULL
2 ... Y Small NULL
2 ... Z Big NULL
2 ... Z Small NULL
Transaction 1 --> Transfer crate type 'X' and 'Z' with 'Big' and 'Small' size, didn't transfer crate type 'Y'.
Transaction 2 --> Transfer crate type 'X' with 'Big' and 'Small' size, didn't transfer crate type 'Y' and 'Z'.
Help me to improve please.

you need an index on dbo.tbIntTransferInfo on FromSLoc, StatusId, and Id
you need an index on dbo.tbmStorageLocation on Id
you need an index on dbo.tbmTransferStatus on Id
you need an index on dbo.tbmCrateSize on CrateTypeId, Cancelled, and Id
you need an index on dbo.tbIntTransferDetail on IntTransferId, CrateSizeId
If any of those indexes can be 'unique', it would be better.
I doubt 'TOP (100) Percent' is helping this query, I'd have to see the plan with and without to know.
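
For reference, the shape of the query (CROSS JOIN to build every transaction/crate-size pair, then LEFT JOIN the details) together with the suggested index on the detail table could be sketched as below. This assumes an in-memory SQLite database through Python's sqlite3 module with the tables cut down to the relevant columns, and the index name ix_detail is hypothetical. SQLite's planner is not SQL Server's, so this shows the result shape and the index definition, not the timings.

```python
import sqlite3

# In-memory SQLite stand-in; reduced tables and index name are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbIntTransferInfo (Id INTEGER PRIMARY KEY);
    CREATE TABLE tbmCrateType (Id INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE tbmCrateSize (Id INTEGER PRIMARY KEY, CrateTypeId INTEGER,
                               Size TEXT, Cancelled INTEGER);
    CREATE TABLE tbIntTransferDetail (IntTransferId INTEGER, CrateSizeId INTEGER,
                                      Amount INTEGER);

    -- Composite index matching the LEFT JOIN condition (hypothetical name).
    CREATE INDEX ix_detail ON tbIntTransferDetail (IntTransferId, CrateSizeId);

    INSERT INTO tbIntTransferInfo VALUES (1), (2);
    INSERT INTO tbmCrateType VALUES (1, 'X'), (2, 'Y'), (3, 'Z');
    INSERT INTO tbmCrateSize VALUES
        (1, 1, 'Big', 0), (2, 1, 'Small', 0), (3, 2, 'Big', 0),
        (4, 2, 'Small', 0), (5, 3, 'Big', 0), (6, 3, 'Small', 0);
    INSERT INTO tbIntTransferDetail VALUES
        (1, 1, 100), (1, 2, 50), (1, 5, 10), (1, 6, 20), (2, 1, 30), (2, 2, 40);
""")

# Every transaction is paired with every active crate size; sizes that were
# not transferred keep a NULL (None in Python) Amount from the LEFT JOIN.
transfer_rows = conn.execute("""
    SELECT iti.Id, ct.Type, cs.Size, itd.Amount
    FROM tbIntTransferInfo AS iti
    CROSS JOIN tbmCrateSize AS cs
    INNER JOIN tbmCrateType AS ct
            ON ct.Id = cs.CrateTypeId AND cs.Cancelled = 0
    LEFT JOIN tbIntTransferDetail AS itd
           ON itd.IntTransferId = iti.Id AND itd.CrateSizeId = cs.Id
    ORDER BY iti.Id, ct.Id, cs.Id
""").fetchall()

for row in transfer_rows:
    print(row)
```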

how to write this query using joins?

I have a table, campaign, which holds details of the campaign mails sent.
campaign_table: campaign_id campaign_name flag
1 test1 1
2 test2 1
3 test3 0
Another table, campaign_activity, holds details of campaign activities.
campaign_activity: campaign_id is_clicked is_opened
1 0 1
1 1 0
2 0 1
2 1 0
I want to get all campaigns with flag value 1, together with the number of campaign_activity rows where is_clicked is 1 and the number of rows where is_opened is 1, in a single query.
ie. campaign_id campaign_name numberofclicks numberofopens
1 test1 1 1
2 test2 1 1
I did this using sub-query with the query:
select c.campaign_id,c.campaign_name,
(SELECT count(campaign_id) from campaign_activity WHERE campaign_id=c.campaign_id AND is_clicked=1) as numberofclicks,
(SELECT count(campaign_id) from campaign_activity WHERE campaign_id=c.campaign_id AND is_opened=1) as numberofopens
FROM
campaign c
WHERE c.flag=1
But people say that using sub-queries is not good coding practice and that you should use a join instead. I don't know how to get the same result using a join. I consulted some of my colleagues, and they say it's not possible to use a join in this situation. Is it possible to get the same result using joins? If yes, please tell me how.

This should do the trick. Substitute LEFT OUTER JOIN for INNER JOIN if you want to include campaigns which have no activity.
SELECT
c.Campaign_ID
, c.Campaign_Name
, SUM(CASE WHEN a.Is_Clicked = 1 THEN 1 ELSE 0 END) AS NumberOfClicks
, SUM(CASE WHEN a.Is_Opened = 1 THEN 1 ELSE 0 END) AS NumberOfOpens
FROM
dbo.Campaign c
INNER JOIN
dbo.Campaign_Activity a
ON a.Campaign_ID = c.Campaign_ID
WHERE
c.Flag = 1 -- only flagged campaigns, as in the question
GROUP BY
c.Campaign_ID
, c.Campaign_Name

Assuming is_clicked and is_opened are only ever 1 or 0, this should work:
select c.campaign_id, c.campaign_name, sum(d.is_clicked), sum(d.is_opened)
from campaign c inner join campaign_activity d
on c.campaign_id = d.campaign_id
where c.flag = 1
group by c.campaign_id, c.campaign_name
No sub-queries.

Hmm. Is what you want as simple as this? I'm not sure I'm reading the question right...
SELECT
campaign_table.campaign_id, SUM(is_clicked), SUM(is_opened)
FROM
campaign_table
INNER JOIN campaign_activity ON campaign_table.campaign_id = campaign_activity.campaign_id
WHERE
campaign_table.flag = 1
GROUP BY
campaign_table.campaign_id
Note that with an INNER JOIN here, you won't see campaigns where there's nothing corresponding in the campaign_activity table. In that circumstance, you should use a LEFT JOIN, and convert NULL to 0 in the SUM, e.g. SUM(IFNULL(is_clicked, 0)).
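
The LEFT JOIN variant mentioned above could look like the sketch below. It assumes an in-memory SQLite database through Python's sqlite3 module, uses COALESCE around the SUM as a portable alternative to IFNULL, and adds a hypothetical campaign 4 (test4) with no activity purely to show the zero case.

```python
import sqlite3

# In-memory SQLite stand-in; campaign 4 is an invented row for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign_table (campaign_id INTEGER, campaign_name TEXT, flag INTEGER);
    CREATE TABLE campaign_activity (campaign_id INTEGER, is_clicked INTEGER,
                                    is_opened INTEGER);

    INSERT INTO campaign_table VALUES
        (1, 'test1', 1), (2, 'test2', 1), (3, 'test3', 0), (4, 'test4', 1);
    INSERT INTO campaign_activity VALUES (1, 0, 1), (1, 1, 0), (2, 0, 1), (2, 1, 0);
""")

# LEFT JOIN keeps campaigns without activity; SUM over no matches is NULL,
# so COALESCE turns it into 0.
campaign_rows = conn.execute("""
    SELECT c.campaign_id,
           c.campaign_name,
           COALESCE(SUM(a.is_clicked), 0) AS numberofclicks,
           COALESCE(SUM(a.is_opened), 0)  AS numberofopens
    FROM campaign_table AS c
    LEFT JOIN campaign_activity AS a ON a.campaign_id = c.campaign_id
    WHERE c.flag = 1
    GROUP BY c.campaign_id, c.campaign_name
    ORDER BY c.campaign_id
""").fetchall()

for row in campaign_rows:
    print(row)
```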

I suppose this should do it :
select * from campaign_table inner join campaign_activity on campaign_table.campaign_id = campaign_activity.campaign_id where campaign_table.flag = 1 and campaign_activity.is_clicked = 1 and campaign_activity.is_opened = 1
Attn : this is not tested in a live situation

The SQL in its simplest and most robust form is this (formatted for readability):
SELECT
campaign_table.campaign_ID, campaign_table.campaign_name, Sum(campaign_activity.is_clicked) AS numberofclicks, Sum(campaign_activity.is_opened) AS numberofopens
FROM
campaign_table INNER JOIN campaign_activity ON campaign_table.campaign_ID = campaign_activity.campaign_ID
GROUP BY
campaign_table.campaign_ID, campaign_table.campaign_name, campaign_table.flag
HAVING
campaign_table.flag=1;
