query behave not as expected - sql

I have a query:
select count(*) as total
from sheet_record right join
(select * from sheet_record limit 10) as sr
on 1=1;
If i understood correct (which i think i did not), right join suppose to return all row from right table in conjunction with left table. it suppose to be at list 10 row. But query returns only 1 row with 1 column 'total' . And it doesn't matter left full inner join it will be, result is the same always.
If i reverse tables and use left join with small modification of query, then it work correct (Modifications have no matter because in this case i get exactly what i expected to get). But I am interested to find what i actually didn't understand about join and why this query works not as expected.

You are returning one column because the select contains an aggregation function, turning this into an aggregation query. The query should be returning 10 times the number of rows in the sheet_record table.
Your query is effectively a cross join. So, if you did:
select *
from sheet_record right join
(select * from sheet_record limit 10) as sr
on 1=1;
You would get 10 rows for each record in sheet_record. Each of those records would have additional columns from one of ten records from the same table.

You are using a count(*) function, without any groupings. This will pretty much will result in retrieving a single row back. Try running your query without the count() to see if you get something closer to what you expect.

Eventually with help of commentators I did understood what was wrong. Not wrong actually, but what exactly i was not catching.
// this code below is work fine. query will return page 15 with 10 records in.
select *from sheet_record inner join (select count(*) as total from sheet_record) as sr on 1=1 limit 10 offset 140;
I was thinking that join takes table from left and join with the right table. But the moment i was working on script(above) I had on right side a view(table built by subquery) instead of pure table and i was thinking that left side as well a view, made by (select * from sheet_record) which is a mistake.
Idea is to get set of records from table X with additional column having value of total number of records in table.
(This is common problem when there is a demand to show table in UI using paging. To know how many pages still should be available i need to know how many record in total so i can calculate how many pages still available)
I think it should be something
select * from (
(here is some subquery which will give a view using count(*) function on some table X and it will be used as left table)
right join
(here is some subquery which will get some set or records from table X with limit and offset)
on 1=1 //becouse i need all row from right table(view) in all cases it should be true
)
Query with right join will a bit complicated.
I am using postgres.
So eventually i managed to get result with right join
select * from (select count(*) as total from sheet_record) as srt right join (select * from sheet_record limit 10 offset 140) as sr on 1=1;

Related

Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table, and in case those additional fields are not available, they should come as NULL values, so for that im using a LEFT OUTER JOIN.
The problem is, since I have to match NroClienteDireccionEntrega and ID_SAP_BAYER__c and there's 2 rows in the salesforce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (They both have the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join has MAX and Group By, this returns 2 duplicate rows with different SalesForce information (Because of the 2 salesforce rows with the same ID_SAP_BAYER__c), which should not be possible.
What I need is for the left outer join in my query to pick only ONE of the salesforce rows to prevent duplication like its happening right now. For some reason the select max with the group by is not working.
Maybe I should try to join this tables in a different way, can anyone give me some other ideas on how to join the two tables to return just 1 row? It doesnt matter if the SalesForce row that gets picked out of the 2 isn't the correct one, I just need it to pick one of them.
Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns all possible IDSAP_BAYER__c values. (The GROUP BY says you want to return one row per unique ID_SAP_BAYER__c and then, since your MAX is operating on exactly one unique value per group, you simply return that value.)
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY).

Can't seem to get an outer join working on informix [duplicate]

I am trying to get the number of page opens on a per day basis using the following query.
SELECT day.days, COUNT(*) as opens
FROM day
LEFT OUTER JOIN tracking ON day.days = DAY(FROM_UNIXTIME(open_date))
WHERE tracking.open_id = 10
GROUP BY day.days
The output I get it is this:
days opens
1 9
9 2
The thing is, in my day table, I have a single column that contains the number 1 to 30 to represent the days in a month. I did a left outer join and I am expecting to have all days show on the days column!
But my query is doing that, why might that be?
Nanne's answer given explains why you don't get the desired result (your WHERE clause removes rows), but not how to fix it.
The solution is to change WHERE to AND so that the condition is part of the join condition, not a filter applied after the join:
SELECT day.days, COUNT(*) as opens
FROM day
LEFT OUTER JOIN tracking
ON day.days = DAY(FROM_UNIXTIME(open_date))
AND tracking.open_id = 10
GROUP BY day.days
Now all rows in the left table will be present in the result.
You specify that the connected tracking.open_id must be 10. For the other rows it will be NULL, so they'll not show up!
The condition is in the WHERE clause. After joining the tables the WHERE conditions are evaluated to filter out everything matching the criteria.Thus anything not matching tracking.open_id = 10 gets discarded.
If you want to apply this condition while joining the two tables, a better way is to use it with the ON clause (i.e. joining condition) than the entire dataset condition.

SELECT FROM inner query slowdown

We have two very similar queries, one takes 22 seconds the other takes 6 seconds. Both use an inner select, have the exact same outer columns and outer joins. The only difference is the inner select that the outer query is using to join in on.
The inner query when run alone executes in 100ms or less in both cases and returns the EXACT SAME data.
Both queries as a whole have a lot of room for improvement, but this particular oddity is really puzzling to us and we just want to understand why. To me it would seem the inner query should be executed once in 100ms then the outer stuff happens. I have a feeling the inner select may be executed multiple times.
Query that takes 6 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
ORDER BY projectItemsID ASC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
Query that takes 22 seconds:
SELECT {whole bunch of column names}
FROM (
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
) projectItems
LEFT JOIN categories
ON projectItems.fk_category = categories.categoryID
...{more joins}
For every row in your projectItems table, in the second function, you search two columns instead of one. If projectItemsID isn't the primary key or if it isn't indexed, it takes longer to parse an extra column.'
If you look at the sizes of the tables and the number of rows each query returns, you can calculate how many comparisons need to be made for each of the queries.
I believe that you're right that the inner query is being run for every single row that is being left joined with categories.
I can't find a proper source on it right now, but you can easily test this by doing something like this and comparing the run times. Here, we can at least be sure that the inner query is only running one time. (sorry if any syntax is incorrect, but you'll get the general idea):
DECLARE #innerQuery TABLE ( [all inner query columns here] )
INSERT INTO #innerQuery
SELECT projectItems.* FROM projectItems
WHERE projectItems.isActive = 1
AND projectItemsID = 6539
SELECT {whole bunch of field names}
FROM #innerQuery as IQ
LEFT JOIN categories
ON IQ.fk_category = categories.categoryID
...{more joins}

Getting way more results than expected in SQL left join query

My code is such:
SELECT COUNT(*)
FROM earned_dollars a
LEFT JOIN product_reference b ON a.product_code = b.product_code
WHERE a.activity_year = '2015'
I'm trying to match two tables based on their product codes. I would expect the same number of results back from this as total records in table a (with a year of 2015). But for some reason I'm getting close to 3 million.
Table a has about 40,000,000 records and table b has 2000. When I run this statement without the join I get 2,500,000 results, so I would expect this even with the left join, but somehow I'm getting 300,000,000. Any ideas? I even refered to the diagram in this post.
it means either your left join is using only part of foreign key, which causes row multiplication, or there are simply duplicate rows in the joined table.
use COUNT(DISTINCT a.product_code)
What is the question are are trying to answer with the tsql?
instead of select count(*) try select a.product_code, b.product_code. That will show you which records match and which don't.
Should also add a where b.product_code is not null. That should exclude the records that don't match.
b is the parent table and a is the child table? try a right join instead.
Or use the table's unique identifier, i.e.
SELECT COUNT(a.earned_dollars_id)
Not sure what your datamodel looks like and how it is structured, but i'm guessing you only care about earned_dollars?
SELECT COUNT(*)
FROM earned_dollars a
WHERE a.activity_year = '2015'
and exists (select 1 from product_reference b ON a.product_code = b.product_code)

How to fix: Only one expression can be specified in the select list when the subquery is not introduced with EXISTS

I've been through a bunch of existing posts but couldn't get this to
work. I'm trying to build a query get all the records in a table and
an extra column. The extra column is populated by this logic - the
first value represented in the row which has same session ID as the
original record and has ToolName=ReportingTool. When I try to
implement the query like this, I get this error.
I tried doing a left join but the problem there is I don't know how to
limit the left join output (from the right table's select) to 1. This
causes multiple joins on the left and the no. of records returned
changes. My query is as follows:
SELECT
*
FROM [TraceDB].[dbo].[TelemetryLogs] AS TelemetryOuter
LEFT JOIN [TraceDB].[dbo].[TelemetryLogs] AS TelemetryInner
ON
TelemetryInner.SessionID = TelemetryOuter.SessionID AND
TelemetryInner.ToolName='ReportingTool' AND
TelemetryInner.Name='Identity' AND
TelemetryInner.SessionID = (
SELECT TOP 1 *
FROM [TraceDB].[dbo].[TelemetryLogs] AS TelemtryIntInt
WHERE TelemtryIntInt.SessionID=TelemetryInner.SessionID
)
WHERE
TelemetryOuter.ToolName ='ReportingTool'
EDIT: Fixed a comma which got introduced as a copy paste type
Try
SELECT TOP 1 TelemtryIntInt.SessionID
In your inner SELECT. You're currently returning the whole row and you can't compare a scalar sessionID against a whole row.