My inner join resulted in the repetition of many rows - sql

I wrote the following query:
SELECT
CAN.Cycle
, CAN.FECCandID
, CAN.CID
, CAN.FirstLastP
, CAN.Party
, CAN.DistIDRunFor
, CAN.DistIDCurr
, CAN.CurrCand
, CAN.CycleCand
, CAN.CRPICO
, CAN.RecipCode
, CAN.NoPacs
FROM Cands16 AS CAN
JOIN MercerRobert_Indivs AS MER
ON CAN.CID = MER.RecipID
My goal was to return every row from Cands16 in which CID = RecipID. This was the result:
While the MER table does have multiple rows with the same RecipID value, every CID value in the Cands16 table is unique. I do not want the duplicate rows that my query returns. What should I do? I am using SQL Server 2016 Management Studio.

Since you do not select any columns from MER, it seems you only need to know whether a matching ID exists in MER. The easiest fix is to replace the join with an EXISTS check:
select
CAN.Cycle
, CAN.FECCandID
, CAN.CID
, CAN.FirstLastP
, CAN.Party
, CAN.DistIDRunFor
, CAN.DistIDCurr
, CAN.CurrCand
, CAN.CycleCand
, CAN.CRPICO
, CAN.RecipCode
, CAN.NoPacs
from Cands16 CAN
where exists (
select *
from MercerRobert_Indivs MER
where CAN.CID = MER.RecipID
) ;
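To see why the plain join repeats rows while EXISTS does not, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names follow the question, but the reduced column set and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data (not the asker's real rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cands16 (CID TEXT PRIMARY KEY, FirstLastP TEXT);
    CREATE TABLE MercerRobert_Indivs (RecipID TEXT, Amount INTEGER);
    INSERT INTO Cands16 VALUES
        ('N001', 'Candidate A'), ('N002', 'Candidate B'), ('N003', 'Candidate C');
    -- N001 received three contributions, N002 one, N003 none.
    INSERT INTO MercerRobert_Indivs VALUES
        ('N001', 100), ('N001', 250), ('N001', 50), ('N002', 500);
""")

# Plain join: one output row per matching MER row, so N001 appears three times.
join_rows = conn.execute("""
    SELECT CAN.CID
    FROM Cands16 AS CAN
    JOIN MercerRobert_Indivs AS MER ON CAN.CID = MER.RecipID
    ORDER BY CAN.CID
""").fetchall()

# EXISTS: each Cands16 row is returned at most once, however many matches exist.
exists_rows = conn.execute("""
    SELECT CAN.CID
    FROM Cands16 AS CAN
    WHERE EXISTS (SELECT 1 FROM MercerRobert_Indivs AS MER
                  WHERE CAN.CID = MER.RecipID)
    ORDER BY CAN.CID
""").fetchall()

print(join_rows)    # [('N001',), ('N001',), ('N001',), ('N002',)]
print(exists_rows)  # [('N001',), ('N002',)]
```

The join multiplies each candidate by its number of matching contribution rows; EXISTS only tests whether at least one match is present.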

Your example SQL uses a simple join.
SQL Server (Transact-SQL) JOINS are used to retrieve data from multiple tables. A SQL Server JOIN is performed whenever two or more tables are joined in a SQL statement.
There are 4 different types of SQL Server joins:
SQL Server INNER JOIN (sometimes called a simple join)
SQL Server LEFT OUTER JOIN (sometimes called LEFT JOIN)
SQL Server RIGHT OUTER JOIN (sometimes called RIGHT JOIN)
SQL Server FULL OUTER JOIN (sometimes called FULL JOIN)
See this link for more details and explanations of the different joins you can use.

Try:
SELECT
CAN.Cycle
, CAN.FECCandID
, CAN.CID
, CAN.FirstLastP
, CAN.Party
, CAN.DistIDRunFor
, CAN.DistIDCurr
, CAN.CurrCand
, CAN.CycleCand
, CAN.CRPICO
, CAN.RecipCode
, CAN.NoPacs
FROM Cands16 AS CAN
-- the same RecipID can occur several times in MercerRobert_Indivs,
-- so reduce it to distinct values before joining
JOIN (SELECT DISTINCT RecipID FROM MercerRobert_Indivs) AS MER
ON CAN.CID = MER.RecipID

Related

Error in nested SQL statement

Can someone help me fix this SQL statement? I have 2 tables... trying to get a list of all records in table 1 (c) along with a count (if any) of matching records in table 2 (cp_docs).
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
OUTTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Thanks,
Tracy
Hard to say without the error message, but your outer join has a couple of issues.
OUTER is misspelled as OUTTER.
The OUTER keyword needs to be prefixed with LEFT, RIGHT, or FULL. With the logic in your query, you likely want LEFT.
Fixed SQL:
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
LEFT OUTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Now in your query, you could get NULL values in the cpd column if there are no matching rows in the cal_procedure_doc table. If you look at Max's answer, you would get 0s instead. If you want to keep your current approach but display zeros, you need to wrap cp_docs.cpd in a COALESCE function:
coalesce(cp_docs.cpd, 0)
In the end, I think Max's answer is easier to read, and it is probably how I would write this query. If the tables are huge, you may want to check how each performs to see whether one is better than the other.
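The NULL versus zero difference can be checked with a small sketch. It uses Python's sqlite3 module instead of SQL Server, keeps only the columns needed for the point, and the two sample procedures are invented for illustration.

```python
import sqlite3

# Hypothetical data: CP-1 has two documents, CP-2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cal_procedure (id INTEGER PRIMARY KEY, cal_procedure TEXT);
    CREATE TABLE cal_procedure_doc (id INTEGER PRIMARY KEY, cal_procedure TEXT);
    INSERT INTO cal_procedure VALUES (1, 'CP-1'), (2, 'CP-2');
    INSERT INTO cal_procedure_doc VALUES (10, 'CP-1'), (11, 'CP-1');
""")

# Same LEFT OUTER JOIN as in the fixed query; only the cpd expression varies.
JOIN_SQL = """
    SELECT c.cal_procedure, {cpd_expr} AS cpd
    FROM cal_procedure c
    LEFT OUTER JOIN (SELECT cal_procedure AS cp, COUNT(id) AS cpd
                     FROM cal_procedure_doc
                     GROUP BY cal_procedure) cp_docs
      ON cp_docs.cp = c.cal_procedure
    ORDER BY c.cal_procedure
"""

# Without COALESCE, a procedure with no documents gets NULL (None in Python).
raw_rows = conn.execute(JOIN_SQL.format(cpd_expr="cp_docs.cpd")).fetchall()

# With COALESCE, the missing count is replaced by 0.
zero_rows = conn.execute(
    JOIN_SQL.format(cpd_expr="COALESCE(cp_docs.cpd, 0)")
).fetchall()

print(raw_rows)   # [('CP-1', 2), ('CP-2', None)]
print(zero_rows)  # [('CP-1', 2), ('CP-2', 0)]
```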
You can just add a subquery to the SELECT clause. It's cleaner than joining a derived table. If you try to read someone else's query to figure out how a calculation is done, you'll start with the SELECT clause. If the SELECT clause points you to a table alias (e.g. cp_docs), you need to find that table in the FROM clause, and so on. The execution plans are almost identical; the proposed SELECT clause subquery actually eliminates one innocuous Compute Scalar step.
SELECT c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
(SELECT COUNT(*) FROM cal_procedure_doc where cal_procedure = c.cal_procedure) AS cpd
FROM cal_procedure c
Perhaps you want OUTER APPLY:
SELECT TOP 100 c.cal_procedure, c.description, c.active, c.create_user,
c.create_date, c.edit_user, c.edit_date, c.id, cp_docs.cpd
FROM cal_procedure c OUTER APPLY
(select count(id) as cpd
from cal_procedure_doc
where cal_procedure = c.cal_procedure
) cp_docs
ORDER BY ? ? ? ;

Remove duplicate SQL Join results

Okay, I am totally new to SQL, so bear with me. I wrote a statement that returns the results I wanted, but I want to get rid of the duplicate rows. What's an easy solution to this? Here is my statement:
SELECT
li.location,
li.logistics_unit,
li.item,
li.company,
li.item_desc,
li.on_hand_qty,
li.in_transit_qty,
li.allocated_qty,
li.lot,
i.item_category3,
l.locating_zone,
l.location_subclass,
i.item_category4
FROM
location_inventory li
INNER JOIN item i ON li.item = i.item
INNER JOIN location l ON l.location = li.location
WHERE
i.item_category3 = 'AS' AND
li.warehouse = 'river' AND
li.location NOT LIKE 'd%' AND
li.location NOT LIKE 'stg%'
ORDER BY
li.item asc
If you are confident in your JOIN then DISTINCT should do the trick like so:
select DISTINCT
location_inventory.location ,
location_inventory.logistics_unit ,
location_inventory.item ,
location_inventory.company ,
location_inventory.item_desc ,
location_inventory.on_hand_qty ,
location_inventory.in_transit_qty ,
location_inventory.allocated_qty ,
location_inventory.lot ,
item.item_category3 ,
location.locating_zone ,
location.location_subclass ,
item.item_category4
from location_inventory
INNER JOIN item
on location_inventory.item=item.item
INNER JOIN location
on location_inventory.location=location.location
where item.item_category3 = 'AS' and
location_inventory.warehouse = 'river' and
location_inventory.location not like 'd%' and
location_inventory.location not like 'stg%'
order by location_inventory.item asc
Assuming that i.item and l.location are primary or unique keys, any duplicates you see are being caused by duplicate items in your location_inventory table. That might or might not be what you want.
SELECT DISTINCT will only eliminate true duplicates (i.e. those where all of the selected columns are duplicate). If that's what you want, use it. Otherwise, you might need to make an inner select that uses SELECT DISTINCT to identify the columns in which you don't want duplicates, and join the results of the inner select back to the tables to pull out all the other data.
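A small sketch of what "true duplicates" means here, using Python's sqlite3 module. The location_inventory table is cut down to four columns and the rows are invented; the point is only that DISTINCT compares every selected column.

```python
import sqlite3

# Hypothetical inventory rows: one exact duplicate and one row differing by lot.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_inventory (
        location TEXT, item TEXT, lot TEXT, on_hand_qty INTEGER);
    INSERT INTO location_inventory VALUES
        ('A1', 'widget', 'LOT1', 5),
        ('A1', 'widget', 'LOT1', 5),   -- exact duplicate of the row above
        ('A1', 'widget', 'LOT2', 7);   -- same location and item, different lot
""")

# DISTINCT over all columns removes only the exact duplicate.
all_cols = conn.execute("""
    SELECT DISTINCT location, item, lot, on_hand_qty
    FROM location_inventory
    ORDER BY lot
""").fetchall()

# DISTINCT over fewer columns collapses rows that differ only in lot or quantity.
few_cols = conn.execute("""
    SELECT DISTINCT location, item
    FROM location_inventory
""").fetchall()

print(all_cols)  # [('A1', 'widget', 'LOT1', 5), ('A1', 'widget', 'LOT2', 7)]
print(few_cols)  # [('A1', 'widget')]
```

If the rows that look like duplicates differ in some selected column (a lot number, say), DISTINCT will keep them, and the inner select on the narrower column list is the way to go.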

How to rewrite sql with multi-subqueries in Hive

Here is a SQL query with multiple subqueries, written for Greenplum. Unfortunately, I have to migrate it to Hive, and I don't know how to deal with these subqueries in the WHERE clause.
select
t.ckid , t.prod_id , t.supp_num , t.wljhdh ,
sum(t.sssl) as zmkc , max(t.dj) as dj
from
%s t
where
exists (select 1
from dw_stage.wms_c_wlsjd w
where w.lydjh = t.wljhdh and w.lzztflag='上架确认'
and (ckid , kqid) in (select ckid , kqid
from dw_stage.jcxx_kqxx
where kqytsxid in ('2','3'))
)
and (t.ckid,t.supp_num) in (select cgck_stock_id,vndr_code from madfrog.cfg_vendor_dist where status=1 and send_method=2 and upper(purch_warehouse_type)='F')
and supp_num not in (select distinct vndr_code as supp_no from madfrog.cfg_vendor_dist where status=1 and send_method in (4,5))
group by t.ckid , t.prod_id , t.supp_num , t.wljhdh
Thank you for your tips.
You will need to convert the subquery and the IN clause to a LEFT OUTER JOIN.
Focusing on the structure:
select <cols list>
from <tabname> t
left outer join dw_stage.wms_c_wlsjd w
on w.lydjh = t.wljhdh
where w.lzztflag='上架确认'
The clauses
(t.ckid,t.supp_num) in (select .. )
and
supp_num not in (select distinct vndr_code as supp_no
will also need to be rewritten as outer joins.
You can find more information about using outer joins in my answer to this other question: Hive command to execute NOT IN clause
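As a rough illustration of the rewrite pattern, the sketch below compares the subquery form with a join form on toy data. It runs on SQLite through Python's sqlite3 module, not on Hive, and the tables stock and vendor_dist are simplified, hypothetical stand-ins for the tables in the question. It also assumes vndr_code is never NULL; with NULLs, NOT IN and the LEFT OUTER JOIN ... IS NULL pattern can give different results.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (ckid INTEGER, supp_num TEXT, sssl INTEGER);
    CREATE TABLE vendor_dist (
        cgck_stock_id INTEGER, vndr_code TEXT, send_method INTEGER);
    INSERT INTO stock VALUES (1, 'S1', 10), (1, 'S2', 20), (2, 'S3', 30);
    INSERT INTO vendor_dist VALUES
        (1, 'S1', 2), (1, 'S2', 2), (9, 'S2', 4), (2, 'S3', 5);
""")

# Original style: tuple IN plus NOT IN subqueries in the WHERE clause.
subquery_rows = conn.execute("""
    SELECT t.ckid, t.supp_num, t.sssl
    FROM stock t
    WHERE (t.ckid, t.supp_num) IN (SELECT cgck_stock_id, vndr_code
                                   FROM vendor_dist WHERE send_method = 2)
      AND t.supp_num NOT IN (SELECT vndr_code
                             FROM vendor_dist WHERE send_method IN (4, 5))
    ORDER BY t.ckid, t.supp_num
""").fetchall()

# Join style: IN becomes a join on a DISTINCT derived table (so rows are not
# multiplied); NOT IN becomes LEFT OUTER JOIN plus an IS NULL filter.
join_rows = conn.execute("""
    SELECT t.ckid, t.supp_num, t.sssl
    FROM stock t
    JOIN (SELECT DISTINCT cgck_stock_id, vndr_code
          FROM vendor_dist WHERE send_method = 2) inc
      ON inc.cgck_stock_id = t.ckid AND inc.vndr_code = t.supp_num
    LEFT OUTER JOIN (SELECT DISTINCT vndr_code
                     FROM vendor_dist WHERE send_method IN (4, 5)) exc
      ON exc.vndr_code = t.supp_num
    WHERE exc.vndr_code IS NULL
    ORDER BY t.ckid, t.supp_num
""").fetchall()

print(subquery_rows)  # [(1, 'S1', 10)]
print(join_rows)      # [(1, 'S1', 10)]
```

The DISTINCT in each derived table matters: without it, a key that appears several times on the right side would duplicate rows from t and inflate the later SUM.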

Combining sql queries

I am new to T-SQL. I have two queries that I need to combine on a common column value.
Both queries work fine individually.
First query is
SELECT t_senderTable.nameFull AS "senderName", t_recieverTable.recieverName AS "recieverName"
FROM ((dbo.t_senderTable AS t_senderTable
INNER JOIN t_senderTable AS t_senderTable ON (t_senderTable.Kd = mapTable.senderID))
INNER JOIN t_recieverTable AS t_recieverTabler ON (recieverTable.Id = mapTable.recieverID )
Second query is
SELECT t_license AS "License", t_coName AS "Company Name"
FROM (dbo.t_license AS t_license
INNER JOIN dbo. t_coName ON ( t_coName.id = t_license.senderID ))
WHERE
(
t_license.check < '2' )
Basically, I need to combine the two queries so that, using the senderID that is common to both, I get senderName, recieverName, and coName in the output.
senderID is a one-to-many relation.
I was getting ideas from this post but can't get it to work: Combining SQL Server Queries
Any ideas how to go about it? Thanks
You can use UNION (removes duplicate rows) or UNION ALL (returns all rows). For it to work, both queries must return the same number of columns, with compatible data types in each position.
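A quick sketch of the difference between the two operators, run on SQLite through Python's sqlite3 module. The tables q1 and q2 and their rows are hypothetical placeholders for the results of two queries.

```python
import sqlite3

# Two hypothetical result sets that share the value 'Acme'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (name TEXT);
    CREATE TABLE q2 (name TEXT);
    INSERT INTO q1 VALUES ('Acme'), ('Bolt');
    INSERT INTO q2 VALUES ('Acme'), ('Cord');
""")

# UNION stacks the rows and removes duplicates.
union_rows = conn.execute(
    "SELECT name FROM q1 UNION SELECT name FROM q2 ORDER BY name"
).fetchall()

# UNION ALL stacks the rows and keeps every one.
union_all_rows = conn.execute(
    "SELECT name FROM q1 UNION ALL SELECT name FROM q2 ORDER BY name"
).fetchall()

print(union_rows)      # [('Acme',), ('Bolt',), ('Cord',)]
print(union_all_rows)  # [('Acme',), ('Acme',), ('Bolt',), ('Cord',)]
```

Note that both operators stack rows on top of each other. They do not place columns from one query next to columns from the other, which is what a join does.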
You could put the two queries as subqueries, include the senderId field in the select clause of both the subqueries and join on those values. I think what you are asking for is:
SELECT q1.senderName
, q1.recieverName
, q2."Company Name"
FROM (
SELECT t_senderTable.nameFull AS "senderName", t_recieverTable.recieverName AS "recieverName"
, t_senderTable.Id
FROM ((dbo.t_senderTable AS t_senderTable
INNER JOIN t_senderTable AS t_senderTable ON (t_senderTable.Kd = mapTable.senderID))
INNER JOIN t_recieverTable AS t_recieverTabler ON (recieverTable.Id = mapTable.recieverID )
) q1
INNER JOIN (
SELECT t_license AS "License", t_coName AS "Company Name"
, t_license.senderID
FROM (dbo.t_license AS t_license
INNER JOIN dbo. t_coName ON ( t_coName.id = t_license.senderID ))
WHERE
(
t_license.check < '2' )
) q2
ON q1.Id = q2.senderId

Efficient SQL Query Design

What is generally considered the most efficient way to do this type of query?
We have a database of 10 years' worth of laboratory data, and we would like to select performance data for various tests. This query, for example, selects the number of hours it has taken to do a test, calculates an average turnaround time, and allows us to plot a sparkline of average TAT per day.
Say we have 100 test names: is it acceptable in terms of performance to iterate over the test names in a loop and fire this query off once per iteration? Or is there a more efficient way?
SELECT
Date_Authorised_Index.Date_Authorised
, Result_Set.Date_Booked_In
, avg(DATEDIFF('hh',Result_Set.Date_Time_Booked_In,Result_Set.Date_Time_Authorised)) as HrsIn
, count(Date_Authorised_Index.Date_Authorised) as numbers
, Date_Authorised_Index.Registration_Number
, Date_Authorised_Index.Request_Row_ID
, Date_Authorised_Index.Specimen_Number
, Result_Set.Authorised_By
, Result_Set.Namespace
, Result_Set.Set_Code
, Result_Set.Date_Time_Authorised
, Request.Date_Time_Received
, Request.Location
FROM
Date_Authorised_Index Date_Authorised_Index
, Result_Set Result_Set
, Request
WHERE
Date_Authorised_Index.Date_Authorised = Result_Set.Date_Authorised
AND Date_Authorised_Index.Request_Row_ID = Request.Request_Row_ID
AND Date_Authorised_Index.Request_Row_ID = Result_Set.Request_Row_ID
AND (Date_Authorised_Index.Discipline='C') AND Result_Set.Set_Code=?
GROUP BY
Result_Set.Date_Booked_In
For starters I would rewrite this query so it uses explicit join syntax.
Also, even though MySQL does not force you to restate every non-aggregate column in the GROUP BY clause, that doesn't mean omitting them is a good thing.
Unless Result_Set.Date_Booked_In uniquely identifies a row, you are selecting arbitrary values from multiple rows.
SELECT
dai.Date_Authorised
, rs.Date_Booked_In
, avg(DATEDIFF('hh',rs.Date_Time_Booked_In,rs.Date_Time_Authorised)) as HrsIn
, count(dai.Date_Authorised) as numbers
, dai.Registration_Number
, dai.Request_Row_ID
, dai.Specimen_Number
, rs.Authorised_By
, rs.Namespace
, rs.Set_Code
, rs.Date_Time_Authorised
, r.Date_Time_Received
, r.Location
FROM
Date_Authorised_Index dai
INNER JOIN Result_Set rs ON (dai.Date_Authorised = rs.Date_Authorised
AND dai.Request_Row_ID = rs.Request_Row_ID)
INNER JOIN Request R ON (dai.Request_Row_ID = r.Request_Row_ID)
WHERE
(dai.Discipline= 'C') AND rs.Set_Code=?
GROUP BY
rs.Date_Booked_In
If you want to select all 100 set codes in one go, just make a new table with the set_codes you want to select and join against that.
Make sure you index the field sc.set_code (or, better yet, make it the primary key).
SELECT lots_of_columns
FROM table1 as dai
INNER JOIN table2 as rs ON (what you joined on before)
INNER JOIN table3 as r ON (same here)
INNER JOIN Setcodes as sc ON (sc.Set_Code = rs.Set_Code) -- extra join
WHERE
dai.discipline = 'C'
GROUP BY rs.Date_Booked_In
Or you can even use an IN (...) list like below, although that will probably be slower than a join.
SELECT lots_of_columns
FROM table1 as dai
INNER JOIN table2 as rs ON (what you joined on before)
INNER JOIN table3 as r ON (same here)
WHERE
dai.discipline = 'C' AND rs.Set_Code IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
GROUP BY rs.Date_Booked_In
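To make the "one query instead of a loop" idea concrete, here is a minimal sketch on SQLite via Python's sqlite3 module. It assumes a cut-down Result_Set table with a precomputed HrsIn column and made-up set codes, both hypothetical simplifications. It also adds Set_Code to the GROUP BY, which you would need once several set codes come back in the same result.

```python
import sqlite3

# Hypothetical, simplified data: HrsIn is stored directly instead of being
# computed with DATEDIFF, and the set codes are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Result_Set (Set_Code TEXT, Date_Booked_In TEXT, HrsIn INTEGER);
    CREATE TABLE Setcodes (Set_Code TEXT PRIMARY KEY);
    INSERT INTO Result_Set VALUES
        ('FBC', '2011-05-01', 2), ('FBC', '2011-05-01', 4),
        ('UE',  '2011-05-01', 6), ('LFT', '2011-05-01', 8);
    -- Only the set codes listed here are reported on.
    INSERT INTO Setcodes VALUES ('FBC'), ('UE');
""")

# One round trip: the join restricts rows to the wanted set codes, and the
# GROUP BY yields one average per set code and day.
rows = conn.execute("""
    SELECT rs.Set_Code, rs.Date_Booked_In,
           AVG(rs.HrsIn) AS avg_hrs, COUNT(*) AS numbers
    FROM Result_Set rs
    INNER JOIN Setcodes sc ON sc.Set_Code = rs.Set_Code
    GROUP BY rs.Set_Code, rs.Date_Booked_In
    ORDER BY rs.Set_Code, rs.Date_Booked_In
""").fetchall()

print(rows)  # [('FBC', '2011-05-01', 3.0, 2), ('UE', '2011-05-01', 6.0, 1)]
```

The application can then split the single result set by Set_Code to draw each sparkline, instead of issuing 100 separate queries.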