Select distinct rows from a table with an inner join - sql

Hi, I am trying to build a query that currently looks like this:
SELECT DISTINCT cit.ComputerName
FROM ComputerInvTracking cit
INNER JOIN (
SELECT ComputerName,
DATEDIFF(day, time, GetDate()) AS time,
REPLACE(REPLACE(room, ',OU=Rooms,OU=Computers,OU=student,DC=campus,DC=ads,DC=uwe,DC=ac,DC=uk',''), 'OU=CL_','') AS room
FROM ComputerInvTracking cit2
)
ON cit.ComputerName = cit2.ComputerName
ORDER BY cit2.time
It currently fails with a syntax error near the closing bracket.
I am using SQL Server.
I am completely stuck. Any ideas?

You need a table alias on the subquery. In SQL Server, every subquery in the FROM clause needs a name:
SELECT DISTINCT cit.ComputerName
FROM ComputerInvTracking cit INNER JOIN
(SELECT ComputerName, DATEDIFF(day, time, GetDate()) AS time, REPLACE(REPLACE(room, ',OU=Rooms,OU=Computers,OU=student,DC=campus,DC=ads,DC=uwe,DC=ac,DC=uk',''), 'OU=CL_','') AS room
FROM ComputerInvTracking cit2
) cit2
--^
ON cit.ComputerName = cit2.ComputerName
ORDER BY cit2.time

As Gordon Linoff mentioned, you need to alias your subqueries. That is the cause of the Syntax error you are currently getting.
Looking at your query, it appears you are trying to get all of the unique ComputerNames in ComputerInvTracking and then track their history. I think you have your query backwards, and actually want to do the "unique ComputerName" filtering in your subquery (or in a Common Table Expression).
The way your query works right now, you will first get NxN rows for each computer (where N is the number of entries in the ComputerInvTracking table for that computer's ComputerName), and then filter out the duplicates. This is extra work, and it would remove rows where the same computer was in the same room multiple times within a day, since that would show as a duplicate row in your query.
I would recommend something similar to the following:
WITH
-- Get the unique computers from the tracking table
UniqueComputers AS (
SELECT DISTINCT ComputerName
FROM ComputerInvTracking
)
-- Match up each tracking record with its computer
SELECT HIS.ComputerName,
DATEDIFF(day, time, GetDate()) AS time,
REPLACE(REPLACE(HIS.room, ',OU=Rooms,OU=Computers,OU=student,DC=campus,DC=ads,DC=uwe,DC=ac,DC=uk',''), 'OU=CL_','') AS room
FROM UniqueComputers CMP
INNER JOIN ComputerInvTracking HIS
ON CMP.ComputerName = HIS.ComputerName
ORDER BY time
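
The NxN effect described above can be reproduced on toy data. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server; the rows (PC1, PC2 and the room names) are made up for illustration, and only the table and column names come from the question.

```python
import sqlite3

# In-memory database with made-up tracking rows: PC1 appears 3 times, PC2 twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ComputerInvTracking (ComputerName TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO ComputerInvTracking VALUES (?, ?)",
    [("PC1", "A"), ("PC1", "B"), ("PC1", "A"), ("PC2", "C"), ("PC2", "C")],
)

# Self-join on ComputerName: every row for a computer pairs with every other
# row for the same computer, giving 3*3 + 2*2 = 13 intermediate rows.
joined_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM ComputerInvTracking cit
    INNER JOIN (SELECT ComputerName, room FROM ComputerInvTracking) cit2
        ON cit.ComputerName = cit2.ComputerName
    """
).fetchone()[0]

# DISTINCT then collapses all of that work back down to one row per computer.
distinct_names = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT cit.ComputerName
        FROM ComputerInvTracking cit
        INNER JOIN (SELECT ComputerName, room FROM ComputerInvTracking) cit2
            ON cit.ComputerName = cit2.ComputerName
        ORDER BY cit.ComputerName
        """
    )
]

print(joined_rows, distinct_names)
```

The intermediate row count grows with the square of the number of entries per computer, which is the extra work the answer refers to.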

Related

Select statement for 1 table returns more rows than the table actually has

Update: the issue was in saving results into a different table. Apologies, this question should be deleted.
I got this query:
SELECT DISTINCT
SubscriberKey,
'True' as Email_Opens
FROM LN_Journey_21
WHERE SubscriberKey in(
SELECT
LN.SubscriberKey
FROM
_Job J
join _Open O on J.JobID = O.JobID
join LN_Journey_21 LN on LN.SubscriberKey = O.SubscriberKey
WHERE
J.EmailName LIKE 'IQOS_LN%'
and j.CreatedDate >= '2021-05-10'
)
SubscriberKey is a PK in LN_Journey_21.
The results have more rows than LN_Journey_21 had before running the query, how is that?
The query should be (most importantly you don't need DISTINCT anywhere):
SELECT SubscriberKey,
'True' as Email_Opens
FROM dbo.LN_Journey_21 AS LN
WHERE EXISTS
(
SELECT 1
FROM dbo._Job AS J
INNER JOIN dbo._Open AS O
ON J.JobID = O.JobID
WHERE O.SubscriberKey = LN.SubscriberKey
AND J.EmailName LIKE 'IQOS_LN%'
AND J.CreatedDate >= '20210510'
);
Extra rows could be explained by:
different query than what's posted in your question
COUNT query is more complex than just SELECT COUNT(*) FROM dbo.table;
data has actually changed between when you ran the COUNT query and when you ran this query
using NOLOCK (perhaps you've hidden it from us, or it's used on your COUNT query, or both)
you are relying on the status bar in SSMS, which shows total rows for the batch by default, and you have other queries in the batch that return those additional 500 rows
Like the comments suggest, it would be great if you could show a scenario (e.g. on db<>fiddle) where COUNT produces fewer rows than this query. With the information we have so far, it's not possible, except for situations like those I mentioned above (that list may not be exhaustive, but it probably covers the most common cases).
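
To illustrate why the EXISTS form cannot return more rows than the outer table holds, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server. The table and column names follow the question; the sample rows, the column types and the subscriber keys A, B and C are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE LN_Journey_21 (SubscriberKey TEXT PRIMARY KEY);
    CREATE TABLE _Job (JobID INTEGER, EmailName TEXT, CreatedDate TEXT);
    CREATE TABLE _Open (JobID INTEGER, SubscriberKey TEXT);

    INSERT INTO LN_Journey_21 VALUES ('A'), ('B'), ('C');
    INSERT INTO _Job VALUES (1, 'IQOS_LN_May', '2021-05-11'),
                            (2, 'Other',       '2021-05-12');
    -- Subscriber A opened the matching email three times, C once, B never.
    INSERT INTO _Open VALUES (1, 'A'), (1, 'A'), (1, 'A'), (1, 'C'), (2, 'B');
    """
)

table_rows = conn.execute("SELECT COUNT(*) FROM LN_Journey_21").fetchone()[0]

# EXISTS only asks "is there at least one match?", so each outer row
# appears at most once, no matter how many opens it has.
exists_rows = conn.execute(
    """
    SELECT LN.SubscriberKey, 'True' AS Email_Opens
    FROM LN_Journey_21 AS LN
    WHERE EXISTS (
        SELECT 1
        FROM _Job AS J
        INNER JOIN _Open AS O ON J.JobID = O.JobID
        WHERE O.SubscriberKey = LN.SubscriberKey
          AND J.EmailName LIKE 'IQOS_LN%'
          AND J.CreatedDate >= '2021-05-10'
    )
    ORDER BY LN.SubscriberKey
    """
).fetchall()

# For contrast, a plain join repeats a subscriber once per matching open.
join_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM LN_Journey_21 AS LN
    INNER JOIN _Open AS O ON O.SubscriberKey = LN.SubscriberKey
    INNER JOIN _Job AS J ON J.JobID = O.JobID
    WHERE J.EmailName LIKE 'IQOS_LN%'
    """
).fetchone()[0]

print(table_rows, exists_rows, join_rows)
```

Only a join in the outer query could multiply rows; a filter in WHERE, whether written with IN or EXISTS, can only remove them.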

SQL - Difference between FROM(subquery) and WHERE - IN(subquery)

I would like to ask about the difference between the following two SQL statements.
The first one works correctly, but the second one does not. When I "create a new table" from the subquery, the result is correct, but if I use the same subquery in a WHERE ... IN clause, I get a different result.
SELECT `T`.`city`, COUNT(*)
FROM (
SELECT `address`.`city`
FROM `address`
INNER JOIN `person` ON `person`.`address_id`=`address`.`address_id`
INNER JOIN `person_detail` ON `person_detail`.`person_detail_id`=`person`.`person_detail_id`
WHERE (`person_detail`.`phone` LIKE '%+42056%') OR (`person_detail`.`phone` LIKE '%+42057%')
) AS T
GROUP BY `T`.`city`
ORDER BY `COUNT(*)` ASC
///////////////////////////////////
SELECT `address`.`city`, COUNT(*)
FROM `address`
WHERE `address`.`city` IN (
SELECT `address`.`city`
FROM `address`
INNER JOIN `person` ON `person`.`address_id`=`address`.`address_id`
INNER JOIN `person_detail` ON `person_detail`.`person_detail_id`=`person`.`person_detail_id`
WHERE (`person_detail`.`phone` LIKE '%+42056%') OR (`person_detail`.`phone` LIKE '%+42057%')
)
GROUP BY `address`.`city`
ORDER BY `COUNT(*)`;
The first query runs against the subquery result, not against the address table itself. The subquery returns one row per person whose phone matches (the same city appears once for every such person, since there is no DISTINCT), and the outer GROUP BY counts those rows, so you get the number of matching persons per city.
The second query uses the subquery only as a filter. It goes back to the original address table, keeps every address row whose city appears anywhere in the subquery result, and counts those rows. That gives the number of address rows per city, regardless of whether the people at those addresses match the phone condition, which is why the two results differ.
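
The difference between the two forms can be seen on a small data set. This is a sketch using Python's built-in sqlite3 module rather than MySQL; the schema is simplified (the phone column is placed directly on person instead of person_detail) and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address (address_id INTEGER, city TEXT);
    CREATE TABLE person (address_id INTEGER, phone TEXT);

    -- Three addresses in Brno, one in Praha.
    INSERT INTO address VALUES (1, 'Brno'), (2, 'Brno'), (3, 'Brno'), (4, 'Praha');
    -- One matching person in Brno, two matching persons at the Praha address.
    INSERT INTO person VALUES (1, '+420561111'), (4, '+420572222'), (4, '+420563333');
    """
)

matching = """
    SELECT address.city
    FROM address
    INNER JOIN person ON person.address_id = address.address_id
    WHERE person.phone LIKE '%+42056%' OR person.phone LIKE '%+42057%'
"""

# Derived table: counts rows of the subquery, i.e. matching persons per city.
derived = conn.execute(
    f"SELECT T.city, COUNT(*) FROM ({matching}) AS T GROUP BY T.city ORDER BY T.city"
).fetchall()

# IN filter: counts rows of address whose city shows up in the subquery.
in_filter = conn.execute(
    f"""
    SELECT address.city, COUNT(*)
    FROM address
    WHERE address.city IN ({matching})
    GROUP BY address.city
    ORDER BY address.city
    """
).fetchall()

print(derived)
print(in_filter)
```

With this data the derived-table query reports one match for Brno and two for Praha, while the IN query reports three address rows for Brno and one for Praha.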

Selecting ambiguous column from subquery with postgres join inside

I have the following query:
select x.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
Since id0 is in both sessions and clicked_products, I get the expected error:
column reference "id0" is ambiguous
However, to fix this problem in the past I simply needed to specify a table. In this situation, I tried:
select sessions.id0
from (
select *
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0
) x;
However, this results in the following error:
missing FROM-clause entry for table "sessions"
How do I return just the id0 column from the above query?
Note: I realize I can trivially solve the problem by getting rid of the subquery altogether:
select sessions.id0
from sessions
inner join clicked_products on sessions.id0 = clicked_products.session_id0;
However, I need to do further aggregations and so do need to keep the subquery syntax.
The only way you can do that is by using aliases for the columns returned from the subquery so that the names are no longer ambiguous.
Qualifying the column with the table name does not work, because sessions is not visible at that point (only x is).
True, this way you cannot use SELECT *, but you shouldn't do that anyway. For a reason why, your query is a wonderful example:
Imagine that you have a query like yours that works, and then somebody adds a new column with the same name as a column in the other table. Then your query suddenly and mysteriously breaks.
Avoid SELECT *. It is ok for ad-hoc queries, but not in code.
select x.id from
(select sessions.id0 as id, clicked_products.* from sessions
inner join
clicked_products on
sessions.id0 = clicked_products.session_id0 ) x;
However, you have to list any other columns you need from the table sessions explicitly, since you cannot use SELECT * for it.
I assume:
select x.id from (select sessions.id0 id
from sessions
inner join clicked_products
on sessions.id0 = clicked_products.session_id0 ) x;
should work.
Another option is to use a Common Table Expression, which is more readable and easier to test.
You still need aliases or unique column names, though.
In general, selecting everything with * is not a good idea: reading columns you do not need is a waste of IO.

SQL Server : combine SELECT and related UDF results

I failed to google this scenario: I would like to insert new rows into (time slot) table, for some records (selected by WHERE clause) and add 3 columns as a result of User-Defined-Function (which calculates a free slot date, start and end time). This has to work, even if the UDF returns more than one row.
Based on Microsoft's suggestion about using UDF:
SELECT ContactID, FirstName, LastName, JobTitle, ContactType
FROM dbo.ufnGetContactInformation(1209);
I came up with this concept:
INSERT INTO PlanTimeSlots (........................)
SELECT
PRJ.ID as RID,
GST.SlotDate as SlotDate,
GST.SlotStart as TimeStart,
GST.SlotEnd as TimeEnd,
PRJ.WPGroupID as WPGroupID,
45 as Priority
FROM
PlanRJ as PRJ
LEFT JOIN
(SELECT
SlotDate, SlotStart, SlotEnd
FROM
dbo.GetSuitableTimeSlot(PRJ.ID, PRJ.WPGroupID,
PRJ.DateReqBy, PRJ.DurationMin)) AS GST ON GST.JID = PRJ.ID
WHERE
........;
So I redundantly pass an RID to the UDF, which is returned as GST.JID, so there's a key to join the UDF's result set to the main select.
Is this OK, or is there a better solution? It will work with hundreds to thousands of entries, and I'm not sure whether this concept will perform well.
1. The query result will also depend on your WHERE condition.
2. If you want to keep every row from PlanRJ even when the function returns nothing for it, use OUTER APPLY (the counterpart of a LEFT JOIN); otherwise use CROSS APPLY (the counterpart of an INNER JOIN).
3. Treat your user-defined function like any other table; there is no need to wrap it in a SELECT. Because its arguments come from columns of PRJ, SQL Server needs APPLY rather than JOIN here, and since the function is evaluated once per PRJ row, no ON clause (or JID column) is required.
INSERT INTO PlanTimeSlots (........................)
SELECT
PRJ.ID as RID,
GST.SlotDate as SlotDate,
GST.SlotStart as TimeStart,
GST.SlotEnd as TimeEnd,
PRJ.WPGroupID as WPGroupID,
45 as Priority
FROM
PlanRJ as PRJ
CROSS APPLY
dbo.GetSuitableTimeSlot(PRJ.ID, PRJ.WPGroupID,
PRJ.DateReqBy, PRJ.DurationMin) AS GST
WHERE
........;

Cumulative Summing Values in SQLite

I am trying to perform a cumulative sum of values in SQLite. I initially only needed to sum a single column and had the code
SELECT
t.MyColumn,
(SELECT Sum(r.KeyColumn1) FROM MyTable as r WHERE r.Date < t.Date)
FROM MyTable as t
Group By t.Date;
which worked fine.
Now I wanted to extend this to more columns KeyColumn2 and KeyColumn3 say. Instead of adding more SELECT statements I thought it would be better to use a join and wrote the following
SELECT
t.MyColumn,
Sum(r.KeyColumn1),
Sum(r.KeyColumn2),
Sum(r.KeyColumn3)
FROM MyTable as t
Left Join MyTable as r On (r.Date < t.Date)
Group By t.Date;
However this does not give me the correct answer (instead it gives values that are much larger than expected). Why is this and how could I correct the JOIN to give me the correct answer?
You are likely getting what I would call mini-Cartesian products: your Date values are probably not unique and, as a result of the self-join, you are getting matches for each of the non-unique values. After grouping by Date the results are just multiplied accordingly.
To solve this, the left side of the join must be rid of duplicate dates. One way is to derive a table of unique dates from your table:
SELECT DISTINCT Date
FROM MyTable
and use it as the left side of the join:
SELECT
t.Date,
Sum(r.KeyColumn1),
Sum(r.KeyColumn2),
Sum(r.KeyColumn3)
FROM (SELECT DISTINCT Date FROM MyTable) as t
Left Join MyTable as r On (r.Date < t.Date)
Group By t.Date;
I noticed that you used t.MyColumn in the SELECT clause, while your grouping was by t.Date. If that was intentional, you may be relying on undefined behaviour there, because the t.MyColumn value would probably be chosen arbitrarily among the (potentially) many in the same t.Date group.
For the purpose of this example, I assumed that you actually meant t.Date, so, I replaced the column accordingly, as you can see above. If my assumption was incorrect, please clarify.
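
The effect of duplicate dates can be checked directly in SQLite. The sketch below uses Python's built-in sqlite3 module with a made-up four-row MyTable (two rows share the date 2020-01-02) and only KeyColumn1, to keep it short; the same pattern applies to the other key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Date TEXT, KeyColumn1 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [
        ("2020-01-01", 1),
        ("2020-01-02", 2),  # two rows share this date
        ("2020-01-02", 3),
        ("2020-01-03", 4),
    ],
)

# Self-join with the full table on the left: each duplicate-date row on the
# left repeats the matches, so the grouped sum is multiplied.
naive = conn.execute(
    """
    SELECT t.Date, Sum(r.KeyColumn1)
    FROM MyTable AS t
    LEFT JOIN MyTable AS r ON (r.Date < t.Date)
    GROUP BY t.Date
    ORDER BY t.Date
    """
).fetchall()

# Unique dates on the left: each earlier row is counted exactly once per date.
fixed = conn.execute(
    """
    SELECT t.Date, Sum(r.KeyColumn1)
    FROM (SELECT DISTINCT Date FROM MyTable) AS t
    LEFT JOIN MyTable AS r ON (r.Date < t.Date)
    GROUP BY t.Date
    ORDER BY t.Date
    """
).fetchall()

print(naive)
print(fixed)
```

For 2020-01-02 the naive join sums the single earlier value twice (once per duplicate row) and returns 2, while the version with distinct dates returns 1. The first date has no earlier rows, so both queries return NULL for it.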
Your join is not working because it finds far more row combinations than your subselect does.
The join is exploding your table.
The subselect sums all records whose date is earlier than that of the current record.
The join matches every row once for each record with an earlier date, so a single record can be joined as many times as there are earlier records. This produces multiple rows and, in the end, a higher SUM.
If you want the sum of multiple columns, you will have to use three subqueries or make the join unique.