Join two tables and then pull Distinct Records - sql

I'm writing a VS 2010 program in C# and I've run into a SELECT statement in SQL that is taking me too long to figure out, and I could use some help.
Table 1 - mailfiles
id,fname,lname,etc...
Table 2 - details,
id,timestamp,page_id,mailfile_id(FK),campaign_id
I want to pull the unique/distinct mailfile_id sorted by the most current timestamp and then join them to the mailfiles table to get the rest of my info.
I had something like this,
SELECT mailfiles.id,mailfiles.fname,mailfiles.lname,mailfiles.company2,mailfiles.city,etc...
FROM mailfiles
JOIN
(SELECT DISTINCT(details.mailfile_id)
FROM details
GROUP BY details.mailfile_id) as TMP
ON mailfiles.id = TMP.mailfile_id
ORDER BY TMP.mailfile_id DESC
That gets me the distinct/unique records, but I don't have access to the details columns, and I want to display the timestamp.
Any help would be much appreciated.
Thanks
Nick

How about:
SELECT *
FROM
(SELECT mailfile_id, Max(timestamp) m_timestamp FROM details GROUP BY mailfile_id) AS latest
INNER JOIN details on latest.mailfile_id = details.mailfile_id AND latest.m_timestamp = details.timestamp
INNER JOIN mailfiles ON mailfiles.id = details.mailfile_id
ORDER BY details.timestamp
There's an unresolved ambiguity in the case that there are two identical timestamps, but it seems that's something that would have to be solved in any case. For a given id, which of these 2 timestamps is actually the latest?
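To see the tie problem concretely, here is a small sketch using Python's built-in sqlite3 module as a throwaway database. The table layout follows the question; the sample rows and values are invented for illustration, and only a few of the mailfiles columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mailfiles (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE details (id INTEGER PRIMARY KEY, timestamp TEXT,
                          page_id INTEGER, mailfile_id INTEGER);
    -- Invented sample data: mailfile 2 has two rows sharing its latest timestamp.
    INSERT INTO mailfiles VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO details VALUES
        (1, '2011-01-01 10:00', 7, 1),
        (2, '2011-01-02 09:00', 8, 1),
        (3, '2011-01-03 12:00', 7, 2),
        (4, '2011-01-03 12:00', 9, 2);
""")

# Latest timestamp per mailfile_id, joined back to details and mailfiles.
rows = conn.execute("""
    SELECT mailfiles.id, mailfiles.fname, details.timestamp, details.page_id
    FROM (SELECT mailfile_id, MAX(timestamp) AS m_timestamp
          FROM details GROUP BY mailfile_id) AS latest
    INNER JOIN details ON latest.mailfile_id = details.mailfile_id
                      AND latest.m_timestamp = details.timestamp
    INNER JOIN mailfiles ON mailfiles.id = details.mailfile_id
    ORDER BY details.timestamp, details.page_id
""").fetchall()

for row in rows:
    print(row)
```

Mailfile 2 comes back twice because both of its rows carry the maximum timestamp; a tie-breaker (for example the highest details.id) would be needed to get exactly one row per mailfile.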

I'm afraid what you want to do is quite tricky since, as you've found out, the DISTINCT operator works on all the fields you list in the query. If it works for you, you could return a list of IDs using your first query and then use these IDs to retrieve the details that you require in a second query. I know it's two queries, but it may end up being faster than one large, complex query anyway.

SELECT MAX (timestamp), fname, lname, ...
FROM mailfiles m, details d
WHERE m.id = d.mailfile_id
GROUP BY m.id, fname, lname, ...
Hm - yes, I see the problem arising - do you need more than the timestamp from the details? Then it gets more tricky.
SELECT m.*, d.*
FROM mailfiles m, details d
WHERE m.id = d.mailfile_id
AND (d.mailfile_id, d.timestamp) IN (
SELECT mailfile_id, MAX (timestamp)
FROM details
GROUP BY mailfile_id);
If you need details from both tables.
I don't know whether your database supports ... AND (a, b) IN (SELECT c, d FROM ...); PostgreSQL does.
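If the row-value IN is not available, a correlated subquery is one possible fallback. The sketch below, run against SQLite through Python's sqlite3 module, compares the two forms on invented sample rows; it assumes an SQLite build with row-value support (3.15 or later), and the cut-down details table is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE details (id INTEGER PRIMARY KEY, timestamp TEXT, mailfile_id INTEGER);
    -- Invented sample rows.
    INSERT INTO details VALUES
        (1, '2011-01-01', 1), (2, '2011-01-05', 1), (3, '2011-01-03', 2);
""")

# Row-value form, as in the answer above (needs row-value support).
tuple_in = conn.execute("""
    SELECT d.id FROM details d
    WHERE (d.mailfile_id, d.timestamp) IN (
        SELECT mailfile_id, MAX(timestamp) FROM details GROUP BY mailfile_id)
    ORDER BY d.id
""").fetchall()

# Fallback: a correlated subquery comparing each row against its group's maximum.
correlated = conn.execute("""
    SELECT d.id FROM details d
    WHERE d.timestamp = (SELECT MAX(d2.timestamp) FROM details d2
                         WHERE d2.mailfile_id = d.mailfile_id)
    ORDER BY d.id
""").fetchall()

print(tuple_in, correlated)
```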

Related

Long SQL subquery trouble

I just registered and want to ask.
I have not been learning SQL queries for long, and I ran into trouble when I decided to move a table to another database. I read a few articles about building long subqueries, but they didn't help me.
Everything worked perfectly before I did that.
I just moved the table and have spent the whole day trying to rewrite the query.
update [dbo].Full
set [salary] = 1000
where [dbo].Full.id in (
select distinct k1.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.S_School)
) k1
where k1.id not in (
select distinct k2.id
from (
select id, Topic, User
from Full
where User not in (select distinct topic_name from [DB_1].dbo.Shool)
) k2,
List_School t3
where charindex (t3.NameApp, k2.Topic)>5
)
)
I moved the table List_School to database [DB_1] and I can't get the query to work with it.
I can't write [DB_1].dbo.List_School. Should I use one more subquery?
I even thought about creating a few temporary tables, but that could affect the speed of execution.
SQL gurus, please spare some of your time for me. Thank you in advance.
I will be grateful for any hint you can give me.
There appear to be a number of issues. You are comparing the User column to the topic_name column. Going by the column names, those are probably not the columns you mean to compare, but that is a guess.
In the final subquery you have a comma-style join on table List_School with no join columns, which means every k2 row is paired with every List_School row (a Cartesian product, aka cross join) before the charindex filter is applied. That is not what you would want in most situations. Again, this is a guess, as no details of the actual problem data or error messages were provided.
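A rough illustration of that pairing, using Python's sqlite3 module. The posts table is a hypothetical stand-in for Full, the rows are invented, and SQLite's instr(haystack, needle) stands in for SQL Server's charindex(needle, haystack).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the Full and List_School tables in the question.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, topic TEXT);
    CREATE TABLE list_school (name_app TEXT);
    INSERT INTO posts VALUES
        (1, 'Intro to Algebra'), (2, 'World History'), (3, 'Algebra basics');
    INSERT INTO list_school VALUES ('Algebra'), ('History');
""")

# Comma join with no condition: every post is paired with every list_school row.
pairs = conn.execute("SELECT COUNT(*) FROM posts p, list_school s").fetchone()[0]

# The only thing narrowing those pairs down is the position filter in WHERE.
matches = conn.execute("""
    SELECT DISTINCT p.id FROM posts p, list_school s
    WHERE instr(p.topic, s.name_app) > 5
    ORDER BY p.id
""").fetchall()

print(pairs, matches)
```

With 3 posts and 2 list rows the unfiltered join yields 6 pairs, so on large tables the amount of work grows with the product of the two row counts.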

Specifying SELECT, then joining with another table

I just hit a wall with my SQL query fetching data from my MS SQL Server.
To simplify, say I have one table for sales, and one table for customers. They each have a corresponding userId which I can use to join the tables.
I wish to first SELECT from the sales table where, say, price is equal to 10, and then join it on the userId, in order to get access to the name, address, etc. from the customer table.
In which order should I structure the query? Do I need some sort of subquery, or what do I do?
I have tried something like this
SELECT *
FROM Sales
WHERE price = 10
INNER JOIN Customers
ON Sales.userId = Customers.userId;
Needless to say this is very simplified and not my database schema, yet it explains my problem simply.
Any suggestions ? I am at a loss here.
A SELECT has a certain order of its components
In the simple form this is:
What do I select: column list
From where: table name and joined tables
Are there filters: WHERE
How to sort: ORDER BY
So: most likely it was enough to change your statement to
SELECT *
FROM Sales
INNER JOIN Customers ON Sales.userId = Customers.userId
WHERE price = 10;
The WHERE clause must follow the joins:
SELECT * FROM Sales
INNER JOIN Customers
ON Sales.userId = Customers.userId
WHERE price = 10
This is simply the way SQL syntax works. You seem to be trying to put the clauses in the order that you think they should be applied, but SQL is a declarative language, not a procedural one - you are defining what you want to occur, not how it will be done.
You could also write the same thing like this:
SELECT * FROM (
SELECT * FROM Sales WHERE price = 10
) AS filteredSales
INNER JOIN Customers
ON filteredSales.userId = Customers.userId
This may seem like it indicates a different order for the operations to occur, but it is logically identical to the first query, and in either case, the database engine may determine to do the join and filtering operations in either order, as long as the result is identical.
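A quick way to convince yourself that the two forms return the same rows is to run both against a scratch database. This sketch uses Python's sqlite3 module; the saleId and name columns and all sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- saleId and name are invented columns; the question only mentions userId and price.
    CREATE TABLE Sales (saleId INTEGER PRIMARY KEY, userId INTEGER, price INTEGER);
    CREATE TABLE Customers (userId INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Sales VALUES (1, 1, 10), (2, 1, 25), (3, 2, 10);
""")

# Form 1: join first in the text of the query, filter in WHERE.
join_then_where = conn.execute("""
    SELECT Sales.saleId, Customers.name
    FROM Sales
    INNER JOIN Customers ON Sales.userId = Customers.userId
    WHERE Sales.price = 10
    ORDER BY Sales.saleId
""").fetchall()

# Form 2: filter inside a derived table, then join.
filter_then_join = conn.execute("""
    SELECT filteredSales.saleId, Customers.name
    FROM (SELECT * FROM Sales WHERE price = 10) AS filteredSales
    INNER JOIN Customers ON filteredSales.userId = Customers.userId
    ORDER BY filteredSales.saleId
""").fetchall()

print(join_then_where == filter_then_join, join_then_where)
```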
Sounds fine to me, did you run the query and check?
SELECT s.*, c.*
FROM Sales s
INNER JOIN Customers c
ON s.userId = c.userId
WHERE s.price = 10;

Need to make SQL subquery more efficient

I have a table that contains all the pupils.
I need to look through my registered table and find all students and see what their current status is.
If reg = 'Y', the pupil should be included in the search; however, a student may change from Y to N, so I need the most recent reg status, using start_date to determine which one is most recent.
The next step: if the latest reg is N, don't pass it through. However, if the latest reg is Y, then search the pupils table using pupilnumber; if that pupil number is in the pupils table, add it to the count.
Select Count(*)
From Pupils Partition(Pupils_01)
Where Pupilnumber in (Select t1.pupilnumber
From registered t1
Where T1.Start_Date = (Select Max(T2.Start_Date)
From registered T2
Where T2.Pupilnumber = T1.Pupilnumber)
And T1.reg = 'N');
This query works, but it is very slow as there are several records in the pupils table.
Just wondering if there is any way of making it more efficient
Worrying about query performance but not indexing your tables is, well, looking for a kind word here... ummm... daft. That's the whole point of indexes. Any variation on the query is going to be much slower than it needs to be.
I'd guess that using analytic functions would be the most efficient approach since it avoids the need to hit the table twice.
SELECT COUNT(*)
FROM( SELECT pupilnumber,
Start_Date,
reg,
rank() over (partition by pupilnumber order by Start_Date desc) rnk
FROM registered )
WHERE rnk = 1
AND reg = 'Y'
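For anyone who wants to try the analytic-function approach on a small data set, here is a sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the sample rows are invented, and the join to the pupils table is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registered (pupilnumber INTEGER, Start_Date TEXT, reg TEXT);
    -- Invented rows: pupil 1 ends on N, pupils 2 and 3 end on Y.
    INSERT INTO registered VALUES
        (1, '2012-01-01', 'Y'), (1, '2012-06-01', 'N'),
        (2, '2012-01-01', 'N'), (2, '2012-03-01', 'Y'),
        (3, '2012-02-01', 'Y');
""")

# Rank each pupil's rows newest-first, then keep the newest row if it is a 'Y'.
count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT pupilnumber, reg,
                 RANK() OVER (PARTITION BY pupilnumber
                              ORDER BY Start_Date DESC) AS rnk
          FROM registered)
    WHERE rnk = 1 AND reg = 'Y'
""").fetchone()[0]

print(count)
```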
You can look at the execution plan for this query. It will show you the high-cost operations. If you see a table scan in the execution plan, you should index the tables involved. You can also try "exists" instead of "in".
This query MIGHT be more efficient for you and hope at a minimum you have indexes per "pupilnumber" in the respective tables.
To clarify what I am doing, the first inner query is a join between the registered table and the pupil which pre-qualifies that they DO Exist in the pupil table... You can always re-add the "partition" reference if that helps. From that, it is grabbing both the pupil AND their max date so it is not doing a correlated subquery for every student... get all students and their max date first...
THEN, join that result to the registration table... again by the pupil AND the max date being the same and qualify the final registration status as YES. This should give you the count you need.
select
count(*) as RegisteredPupils
from
( select
t2.pupilnumber,
max( t2.Start_Date ) as MostRecentReg
from
registered t2
join Pupils p
on t2.pupilnumber = p.pupilnumber
group by
t2.pupilnumber ) as MaxPerPupil
JOIN registered t1
on MaxPerPupil.pupilNumber = t1.pupilNumber
AND MaxPerPupil.MostRecentReg = t1.Start_Date
AND t1.Reg = 'Y'
Note: If you have multiple records in the registration table, such as a person taking multiple classes registered on the same date, then you COULD get a false count. If that might be the case, you could change from
COUNT(*)
to
COUNT( DISTINCT T1.PupilNumber )
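The false count described in the note can be reproduced on a tiny data set. This is only a sketch using Python's sqlite3 module with invented rows; the join to Pupils is omitted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registered (pupilnumber INTEGER, Start_Date TEXT, reg TEXT);
    -- Invented rows: pupil 1 has two 'Y' rows on the same latest date.
    INSERT INTO registered VALUES
        (1, '2012-01-01', 'N'),
        (1, '2012-06-01', 'Y'),
        (1, '2012-06-01', 'Y'),
        (2, '2012-03-01', 'Y');
""")

# Same shape as the query above, returning both counts side by side.
all_rows, distinct_pupils = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT t1.pupilnumber)
    FROM (SELECT pupilnumber, MAX(Start_Date) AS MostRecentReg
          FROM registered
          GROUP BY pupilnumber) AS MaxPerPupil
    JOIN registered t1
      ON MaxPerPupil.pupilnumber = t1.pupilnumber
     AND MaxPerPupil.MostRecentReg = t1.Start_Date
     AND t1.reg = 'Y'
""").fetchone()

print(all_rows, distinct_pupils)
```

COUNT(*) counts pupil 1 twice here, while COUNT(DISTINCT ...) counts each pupil once.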

SQL query involving group by and joins

I couldn't be more specific in the title part but I want to do something a little bit complex for me. I thought I did it but it turned out that it is buggy.
I have three tables as following:
ProjectTable
idProject
title
idOwner
OfferTable
idOffer
idProject
idAccount
AccountTable
idAccount
Username
Now in one query I aim to list all the projects with most offers made, and in the query I also want to get details like the username of the owner, username of the offerer* etc. So I don't have to query again for each project.
Here is my broken query, it's my first experiment with GROUP BY and I probably didn't quite get it.
SELECT Project.addDate,Project.idOwner ,Account.Username,Project.idProject,
Project.Price,COUNT(Project.idProject) as offercount
FROM Project
INNER JOIN Offer
ON Project.idProject= Offer.idProject
INNER JOIN Account
ON Account.idAccount = Project.idOwner
GROUP BY Project.addDate,Project.idOwner,
Account.Username,Project.idProject,Project.Price
ORDER BY addDate DESC
*:I wrote that without thinking I was just trying to come up with example extra information, that is meaningless thanks to Hosam Aly.
Try this (modified for projects with no offers):
SELECT
Project.addDate,
Project.idOwner,
Account.Username,
Project.idProject,
Project.Price,
ISNULL(q.offercount, 0) AS offercount
FROM
(
SELECT
o.idProject,
COUNT(o.idProject) as offercount
FROM Offer o
GROUP BY o.idProject
) AS q
RIGHT JOIN Project ON Project.idProject = q.idProject
INNER JOIN Account ON Account.idAccount = Project.idOwner
ORDER BY addDate DESC
I might switch the query slightly to this:
select p.addDate,
p.idOwner,
a.Username,
p.idProject,
p.price,
o.OfferCount
from project p
left join
(
select count(*) OfferCount, idproject
from offer
group by idproject
) o
on p.idproject = o.idproject
left join account a
on p.idowner = a.idaccount
This way, you are getting the count by the projectid and not based on all of the other fields you are grouping by. I am also using a LEFT JOIN in the event the projectid or other id doesn't exist in the other tables, you will still return data.
Your question is a bit vague, but here are some pointers:
To list the projects "with most offers made", ORDER BY offercount DESC.
You're essentially querying for projects, so you should GROUP BY Project.idProject first before the other fields.
You're querying for the number of offers made on each project, yet you ask about offer details. It doesn't really make sense (syntax-wise) to ask for the two pieces of information together. If you want to get the total number of offers, repeated in every record of the result, along with offer information, you'll have to use an inner query for that.
An inner query can be made either in the FROM clause, as suggested by other answers, or directly in the SELECT clause, like so:
SELECT Project.idProject,
(SELECT COUNT(Offer.idOffer)
FROM Offer
WHERE Offer.idProject = Project.idProject
) AS OfferCount
FROM Project
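As a rough check that the FROM-clause and SELECT-clause variants agree, including for projects with no offers, here is a sketch using Python's sqlite3 module. The tables are cut down to the columns needed, the rows are invented, and COALESCE is used where the SQL Server answer used ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (idProject INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Offer (idOffer INTEGER PRIMARY KEY, idProject INTEGER);
    -- Invented rows: project 2 has no offers at all.
    INSERT INTO Project VALUES (1, 'Site'), (2, 'Logo'), (3, 'App');
    INSERT INTO Offer VALUES (1, 1), (2, 1), (3, 3);
""")

# Variant 1: pre-aggregated derived table, LEFT JOINed so empty projects survive.
derived = conn.execute("""
    SELECT p.idProject, COALESCE(o.cnt, 0) AS offers
    FROM Project p
    LEFT JOIN (SELECT idProject, COUNT(*) AS cnt
               FROM Offer GROUP BY idProject) o
      ON p.idProject = o.idProject
    ORDER BY offers DESC, p.idProject
""").fetchall()

# Variant 2: correlated scalar subquery in the SELECT list.
scalar = conn.execute("""
    SELECT Project.idProject,
           (SELECT COUNT(Offer.idOffer) FROM Offer
            WHERE Offer.idProject = Project.idProject) AS offers
    FROM Project
    ORDER BY offers DESC, Project.idProject
""").fetchall()

print(derived, scalar)
```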

Select rows that are different in SQL

I have a table with way too many columns and a couple million rows that I need to query for differences.
On these rows there will hopefully be only one column that is different and that should be the Auto incremented id field.
What I need to do is check to see if these rows ARE actually the same and if there are any that have any differences in any of the fields.
So for example, if the "Name" column is supposed to be "Peter, Paul and Mary" and the "Order #" column is supposed to be "132" I need to find any rows where those values aren't true, but I need to find it for every column in the table AND I don't actually know what the correct values are (meaning I can't just create a "SELECT...WHERE Name='This'" for each column).
So how can I find the rows that are different? (using straight SQL, no programming)
Is this answer what you are looking for, and would it help you? Here's a link to find the appropriate SQL query.
Let's suppose you coded a email newsletter signup form, but you forgot to double check that the email address was not a duplicate, or already in the database. We can write a query to find all the emails in our table that are duplicates, or occurs in more than one row.
The following SQL query works great for finding duplicate values in a table.
SELECT email,
COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
By using GROUP BY and then HAVING a count greater than one, we find rows with duplicate email addresses using the above SQL.
If you know the limit of the wrong results (say 10 for example) then you could order them and get only the first 11 results. You see where I am going with this, right?
I have no SQL expertise whatsoever though :)
Do you need to do this programmatically, or can you just run a few queries yourself to check it?
If the latter, I'd just do "select distinct name, order#" to start. This should return a list that includes "Peter Paul and Mary, 132" and possibly some other things.
Then find the other things by doing select ... where name = "this" as you suggest.
You could get even more info out of that first query by doing "select distinct name, order#, count(*) from ... group by name, order#". This would give you both the list of values and the frequency of a given set of values.
If I understand you correctly (your question is not 100% clear to me), you are trying to find the rows that are unnecessary duplicates? If so, try these SQL queries:
Select A.Id, B.Id
From Table A
Join Table B
On A.Id <> B.Id
And A.ColA = B.ColA
And A.ColB = B.ColB
And A.ColC = B.ColC
...
Or
Select ColA, ColB, etc.
From Table
Group By ColA, ColB, etc.
Having Count(*) > 1
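Here is a small sketch of the GROUP BY approach on the question's example, using Python's sqlite3 module. The Orders table name, the OrderNum column, and the rows (including the misspelled name) are invented; with many columns, every column except the id would go into the GROUP BY list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, Name TEXT, OrderNum INTEGER);
    -- Invented rows: three agree, one has a typo in Name, one has a different number.
    INSERT INTO Orders (Name, OrderNum) VALUES
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Marry', 132),
        ('Peter, Paul and Mary', 133);
""")

# Every distinct combination with its frequency, most common first.
groups = conn.execute("""
    SELECT Name, OrderNum, COUNT(*) AS n
    FROM Orders
    GROUP BY Name, OrderNum
    ORDER BY n DESC, Name, OrderNum
""").fetchall()

# Only the combinations that occur more than once.
duplicates = conn.execute("""
    SELECT Name, OrderNum
    FROM Orders
    GROUP BY Name, OrderNum
    HAVING COUNT(*) > 1
""").fetchall()

print(groups)
print(duplicates)
```

The low-frequency groups in the first result are the candidates for rows that differ from the rest.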
If you have a correlation between two "independent" columns where there is really only one "correct" value for column B whenever column A is a given value, then you have a broken database design, because these correlation should have been factored out as a separate table.
Try this:
SELECT T1.Name, T1.OrderNum
FROM Orders T1
LEFT JOIN (
SELECT Name, OrderNum
FROM Orders
GROUP BY Name, OrderNum
HAVING COUNT(*) > 1) T2
ON T1.Name = T2.Name
AND T1.OrderNum = T2.OrderNum
WHERE T2.Name IS NULL
The nested select identifies the duplicates, so you will need to target your common fields; the LEFT JOIN together with the IS NULL check excludes the duplicates from your result set. So essentially you are joining the table to itself to identify the duplicates and exclude them from your results. If you want only the duplicates, change the LEFT JOIN to a plain JOIN and drop the WHERE clause.
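A minimal sketch of excluding duplicated rows with an anti-join (a LEFT JOIN plus an IS NULL check), run through Python's sqlite3 module. The Orders rows are invented, and this is only one way to do it; it assumes Name is never NULL in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, Name TEXT, OrderNum INTEGER);
    -- Invented rows: the first two are duplicates of each other, the third is not.
    INSERT INTO Orders (Name, OrderNum) VALUES
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Mary', 132),
        ('Peter, Paul and Mary', 133);
""")

# Rows whose (Name, OrderNum) combination is NOT among the duplicated ones.
odd_rows = conn.execute("""
    SELECT T1.id, T1.Name, T1.OrderNum
    FROM Orders T1
    LEFT JOIN (SELECT Name, OrderNum FROM Orders
               GROUP BY Name, OrderNum
               HAVING COUNT(*) > 1) T2
      ON T1.Name = T2.Name AND T1.OrderNum = T2.OrderNum
    WHERE T2.Name IS NULL
""").fetchall()

print(odd_rows)
```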