Duplicate columns with inner Join - sql

SELECT
dealing_record.*
,shares.*
,transaction_type.*
FROM
shares
INNER JOIN shares ON shares.share_ID = dealing_record.share_id
INNER JOIN transaction_type ON transaction_type.transaction_type_id = dealing_record.transaction_type_id;
The above SQL code produces the desired output but with a couple of duplicate columns. Also, with incomplete display of the column headers. When I change the
linesize 100
the headers shows but data displayed overlaps
I have checked through similar questions but I don't seem to get how to solve this.

You have duplicate columns, because, you're asking to the SQL engine for columns that they will show you the same data (with SELECT dealing_record.* and so on) , and then duplicates.
For example, the transaction_type.transaction_type_id column and the dealing_record.transaction_type_id column will have matching rows (otherwise you won't see anything with an INNER JOIN) and you will see those duplicates.
If you want to avoid this problem or, at least, to reduce the risk of having duplicates in your results, improve your query, using only the columns you really need, as #ConradFrix already said. An example would be this:
SELECT
dealing_record.Name
,shares.ID
,shares.Name
,transaction_type.Name
,transaction_type.ID
FROM
shares
INNER JOIN shares ON shares.share_ID = dealing_record.share_id
INNER JOIN transaction_type ON transaction_type.transaction_type_id = dealing_record.transaction_type_id;

Try to join shares with dealing_record, not shares again:
select dealing_record.*,
shares.*,
transaction_type.*
FROM shares inner join dealing_record on shares.share_ID = dealing_record.share_id
inner join transaction_type on transaction_type.transaction_type_id=
dealing_record.transaction_type_id;

Related

Get full join in google big query keeping the all frequecy combination in bigquery giving me only left join for all kind of join

I am trying to join 2 table using Id such that the frequency of all combination are present. But when I am using the join (left, right) I am still getting the inner join or left join output.
these are the table a
b
I am expecting output
I tried the actual code
select
act.action,
dvr.dateofdel,
dvr.output
FROM internal.actions as act
Right join internal.deliveries as dvr
ON dvr.id= act.id
I tried multiple joins but still same outcome ..
Your query does not match your sample data.
But based on your problem statement, I suspect that you want:
select
act.ID_log,
act.ID_send_message,
act.action_date,
act.action,act.ID_email,
dvr.delivery_date,
act.email
from internal.actions as act
left join internal.deliveries as dvr
on dvr.ID_send_message= act.ID_send_message
and dvr.delivery_date >= '2017-01-01'
and dvr.delivery_date < '2018-01-01'
where act.ID_send_message != 0
This will bring all records from act that satisfy the condition in the where clause, along with information coming from dvr; when there is no match in dvr, the corresponding columns will show null values. The important part in the query is that all conditions on the left joined table should be listed in the on clause of the join (rather than in the where clause).

SQL columns that were Inner joined showing up separately?

I'm trying to join some tables together, but when I join them the columns I am joining on are showing up separately.
My code looks something like this:
Select * FROM
SimulationOutput as SO
Inner Join SimulationInput as SI
ON SO.Key = SI.Key
AND
SO.Scenario = SI.Scenario
When I do this, I get the result I expected but my result also includes duplicates of The "Key" columns and the "Scenario" columns that I just joined on.
Is this expected behavior?
https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_join_inner
On that website
I ran the following code:
Select * FROM
Orders
Inner Join Customers ON Orders.CustomerID = Customers.CustomerID
ORDER BY Orders.CustomerID
And in that output, the CustomerId column only shows up once, the Customers.CustomerID was dropped. I am trying to understand why this is happening in the latter, but not the former.
The reason I ask is because I intend to use a series of nested tables. And I'm running into duplicate column problems where SQL doesn't know which Scenario column to choose. I understand that I can specify the specific column, but I was planning on specifying my specific columns at the top fo the nested loop, and use a series of nested Select* in between.
So the structure would be something like
Select Column1, Column2, Column3 From
(Select * FROM ......
(Select * FROM.......
(Select * FROM..... ) as T1) as T2) as T3
I understand that this may not be ideal and for this question my end result is moot, I want to understand why the columns I'm joining on aren't being dropped the way they are on the w3schools website.
Is it because I am using an AND statement? Is there a better way to inner join using multiple criteria? Is it because of my version of SQL (TSQL?)
Image From W3 schools output

Duplicate results on inner join

I've written the below query but I'm getting multiple duplicate rows in the results, please can anyone see where I'm going wrong?
use Customers
select customer_details.Customer_ID,
customer_details.customer_name,
metering_point_details.MPAN_ID,
Agents.DA_DC_Charge
from Customer_Details
left join Metering_Point_Details
on customer_details.customer_id = Metering_Point_Details.Customer_ID
left join agents
on customer_details.Customer_ID = agents.customer_id
order by customer_id
It doesn't really matter, but you're not using an INNER JOIN. Regardless, your unexpected rows indicate that your JOIN criteria is not specific enough to return your expected output. You can use SELECT DISTINCT if your results are fully duplicative, and if you'd like to see why you're getting those duplicates you can just use SELECT * to see the full detail between the multiple rows that are returned using your JOIN criteria, which should help you either make your criteria more specific or show you that you've got duplicated records in one of the tables you're using in your JOIN.
With sample data we can dissect the problem more, but odds are you won't need it once you see why the rows are duplicated.

Getting way more results than expected in SQL left join query

My code is such:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even refered to the diagram in this post.
it means either your left join is using only part of foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
use COUNT(DISTINCT a.product_code)
What is the question are are trying to answer with the tsql?
instead of select count(*) try select a.product_code, b.product_code. That will show you which records match and which don't.
Should also add a where b.product_code is not null. That should exclude the records that don't match.
b is the parent table and a is the child table? try a right join instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your datamodel looks like and how it is structured, but i'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b ON a.product_code = b.product_code)

multiple sql joins not producing desired results

I'm new to sql and trying to tweak someone else's huge stored procedure to get a subset of the results. The code below is maybe 10% of the whole procedure. I added the lp.posting_date, last left join, and the where clause. Trying to get records where the posting date is between the start date and the end date. Am I doing this right? Apparently not because the results are unaffected by the change. UPDATE: I CHANGED THE LAST JOIN. The results are correct if there's only one area allocation term. If there is more than one area allocation term, the results are duplicated for each term.
SELECT Distinct
l.lease_id ,
l.property_id as property_id,
l.lease_number as LeaseNumber,
l.name as LeaseName,
lty.name as LeaseType,
lst.name as LeaseStatus,
l.possession_date as PossessionDate,
l.rent as RentCommencementDate,
l.store_open_date as StoreOpenDate,
msr.description as MeasureUnit,
l.comments as Comments ,
lat.start_date as atStartDate,
lat.end_date as atEndDate,
lat.rentable_area as Rentable,
lat.usable_area as Usable,
laat.start_date as aatStartDate,
laat.end_date as aatEndDate,
MK.Path as OrgPath,
CAST(laa.percentage as numeric(9,2)) as Percentage,
laa.rentable_area as aaRentable,
laa.usable_area as aaUsable,
laa.headcounts as Headcount,
laa.area_allocation_term_id,
lat.area_term_id,
laa.area_allocation_id,
lp.posting_date
INTO #LEASES FROM la_tbl_lease l
INNER JOIN #LEASEID on l.lease_id=#LEASEID.lease_id
INNER JOIN la_tbl_lease_term lt on lt.lease_id=l.lease_id and lt.IsDeleted=0
LEFT JOIN la_tlu_lease_type lty on lty.lease_type_id=l.lease_type_id and lty.IsDeleted=0
LEFT JOIN la_tlu_lease_status lst on lst.status_id= l.status_id
LEFT JOIN la_tbl_area_group lag on lag.lease_id=l.lease_id
LEFT JOIN fnd_tlu_unit_measure msr on msr.unit_measure_key=lag.unit_measure_key
LEFT JOIN la_tbl_area_term lat on lat.lease_id=l.lease_id and lat.isDeleted=0
LEFT JOIN la_tbl_area_allocat_term laat on laat.area_term_id=lat.area_term_id and laat.isDeleted=0
LEFT JOIN dbo.la_tbl_area_allocation laa on laa.area_allocation_term_id=laat.area_allocation_term_id and laa.isDeleted=0
LEFT JOIN vw_FND_TLU_Menu_Key MK on menu_type_id_key=2 and isActive=1 and id=laa.menu_id_key
INNER JOIN la_tbl_lease_projection lp on lp.lease_projection_id = #LEASEID.lease_projection_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
As may have already been hinted at you should be careful when using the WHERE clause with an OUTER JOIN.
The idea of the OUTER JOIN is to optionally join that table and provide access to the columns.
The JOINS will generate your set and then the WHERE clause will run to restrict your set. If you are using a condition in the WHERE clause that says one of the columns in your outer joined table must exist / equal a value then by the nature of your query you are no longer doing a LEFT JOIN since you are only retrieving rows where that join occurs.
Shorten it and copy it out as a new query in ssms or whatever you are using for testing. Use an inner join unless you want to preserve the left side set even when there is no matching lp.lease_id. Try something like
if object_id('tempdb..#leases) is not null
drop table #leases;
select distinct
l.lease_id
,l.property_id as property_id
,lp.posting_date
into #leases
from la_tbl_lease as l
inner join la_tbl_lease_projection as lp on lp.lease_id = l.lease_id
where lp.posting_date <= laat.end_date and lp.posting_date >= laat.start_date
select * from #leases
drop table #leases
If this gets what you want then you can work from there and add the other left joins to the query (getting rid of the select * and 'drop table' if you copy it back into your proc). If it doesn't then look at your Boolean date logic or provide more detail for us. If you are new to sql and its procedural extensions, try using the object explorer to examine the properties of the columns you are querying, and try selecting the top 1000 * from the tables you are using to get a feel for what the data looks like when building queries. -Mike
You can try the BETWEEN operator as well
Where lp.posting_date BETWEEN laat.start_date AND laat.end_date
Reasoning: You can have issues wheres there is no matching values in a table. In that instance on a left join the table will populate with null. Using the 'BETWEEN' operator insures that all returns have a value that is between the range and no nulls can slip in.
As it turns out, the problem was easier to solve and it was in a different place in the stored procedure. All I had to do was add one line to one of the cursors to include area term allocations by date.