I am new to SQL, and after writing some queries I wanted to understand how SQL processes queries "internally". I took this query from another post on Stack Overflow:
select name from contacts
group by name
having count(*) > 1
My question is: group by name merges all rows with the same name into one row, so how does count then know how many rows with the same name were merged? I am trying to split the processing of the query into steps in order to understand exactly how it works, but in this case it seems like you cannot split it. Thanks in advance.
For the SQL query you posted, the execution sequence is as follows:
from contacts
This determines which table's data you are reading. Next would be the WHERE clause, but in this case you don't have one, so processing moves on to the next step, which is
group by name
This collects all rows with the same name into one group.
Side note: the SELECT clause has not run yet at this point, so when the HAVING clause runs it can still count the rows that share each name.
Next is your
having count(*) > 1
This keeps only the groups whose count is greater than 1. The last step is the SELECT:
select name
That is the execution sequence for your example.
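The sequence above can be sketched in Python. This is only an illustration of the logical order, not of how any engine is implemented: the sample names are made up, and SQLite (through Python's sqlite3 module) is used purely as a stand-in database to check the hand-written steps against a real query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data; only the table and column names come from the question.
rows = [("Ann",), ("Bob",), ("Ann",), ("Cid",), ("Bob",), ("Ann",)]

# FROM contacts: start with every row of the table.
from_rows = list(rows)

# GROUP BY name: each group keeps its member rows, so they can still be counted.
groups = defaultdict(list)
for (name,) in from_rows:
    groups[name].append((name,))

# HAVING count(*) > 1: whole groups are filtered by their size.
kept = {name: members for name, members in groups.items() if len(members) > 1}

# SELECT name: only now is each surviving group reduced to one output row.
manual_result = sorted(kept)

# The same query run by a real engine (SQLite, used here only as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT)")
conn.executemany("INSERT INTO contacts (name) VALUES (?)", rows)
sql_result = [r[0] for r in conn.execute(
    "SELECT name FROM contacts GROUP BY name HAVING COUNT(*) > 1 ORDER BY name")]

print(manual_result)
print(sql_result)
```

The point of the sketch is that the "merge into one row" only happens at the SELECT step; until then each group still holds all of its rows.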
And this is the full logical processing sequence of a SQL query:
1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
Hope it helps.
Related
I have a table in Access that has a PartNumber, with a PriceYr and Price associated with each PartNumber. There are over 10,000 records; each PartNumber is repeated with different PriceYr and Price values. I need a query that finds the 5 most recent prices and their dates for each PartNumber.
I tried using MAX(PriceYr); however, it only returns the single most recent record for each PartNumber.
I also tried the following query, but it doesn't seem to work.
SELECT Catalogs.PartNumber,Catalogs.PriceYr, Catalogs.Price FROM Catalogs
WHERE Catalogs.PriceYr in
(SELECT TOP 5 Catalogs.PriceYr
FROM Catalogs as Temp
WHERE Temp.PartNumber = Catalogs.PartNumber
ORDER By Catalogs.PriceYr DESC)
Any help will be greatly appreciated. Thank you.
Desired result that I am trying to get.
Consider a correlated count subquery that computes a rank you can filter on. Right now, you pull the top 5 overall on the matching PartNumber, not the top 5 per PartNumber.
SELECT main.*
FROM
(SELECT c.PartNumber, c.PriceYr, c.Price,
(SELECT Count(*)
FROM Catalogs AS Temp
WHERE Temp.PartNumber = c.PartNumber
AND Temp.PriceYr >= c.PriceYr) As rank
FROM Catalogs c
) As main
WHERE main.rank <= 5
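As a rough check of the correlated-count idea, here is a minimal sketch that runs the same query shape against SQLite through Python's sqlite3 module. SQLite is only a stand-in for Access, the sample rows are invented, and the alias is spelled rnk (an assumption made here to stay clear of any keyword clash). It also assumes PriceYr is unique within a PartNumber; with ties, tied rows share a count and more or fewer than 5 rows may come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Catalogs (PartNumber TEXT, PriceYr INTEGER, Price REAL)")

# Hypothetical data: part A has 7 yearly prices, part B has only 3.
sample = [("A", yr, 10.0 + (yr - 2010)) for yr in range(2010, 2017)]
sample += [("B", yr, 50.0) for yr in range(2014, 2017)]
conn.executemany("INSERT INTO Catalogs VALUES (?, ?, ?)", sample)

# For each row, count the rows of the same part with an equal or later year.
# That count is the row's rank within its part (1 = most recent).
query = """
SELECT main.PartNumber, main.PriceYr
FROM (SELECT c.PartNumber, c.PriceYr, c.Price,
             (SELECT COUNT(*)
              FROM Catalogs AS Temp
              WHERE Temp.PartNumber = c.PartNumber
                AND Temp.PriceYr >= c.PriceYr) AS rnk
      FROM Catalogs c) AS main
WHERE main.rnk <= 5
ORDER BY main.PartNumber, main.PriceYr DESC
"""
top5 = conn.execute(query).fetchall()
for part, year in top5:
    print(part, year)
```

Part A is cut down to its five latest years, while part B, which has fewer than five rows, is returned in full.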
MAX() is an aggregate function, meaning that it groups the data and takes the maximum value in the specified column. You need a GROUP BY clause to prevent the query from collapsing the whole dataset into a single row.
On the other hand, your query seems to use a subquery needlessly. The following query should work fine:
SELECT TOP 5 c.PartNumber, c.PriceYr, c.Price
FROM Catalogs c
WHERE c.PartNumber = #partNumber -- if you want the query to
-- work on a specific part number
ORDER BY c.PriceYr DESC
(please post a table creation query to make sure this example works)
I have the following SQL. I only want the first row for each account but am receiving roughly 39 rows for each account. I have tried TOP 1, as shown below. I don't quite understand where to add the "DISTINCT TOP 1" or where to add ROW_NUMBER = 1 so as not to get duplicates. Any help appreciated.
Sample code would be:
Select (select top 1 fieldname from table1 where id=mastertable.id order by fieldnamesort) result from mastertable
This works if the rows of mastertable are unique; otherwise add DISTINCT.
Alternatively, you can use MAX instead of TOP 1, depending on the requirement.
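To make the scalar-subquery pattern concrete, here is a minimal sketch using Python's sqlite3 module. SQLite is a stand-in, so LIMIT 1 takes the place of TOP 1; the table and column names follow the answer above, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mastertable (id INTEGER PRIMARY KEY);
CREATE TABLE table1 (id INTEGER, fieldname TEXT, fieldnamesort INTEGER);
INSERT INTO mastertable (id) VALUES (1), (2), (3);
INSERT INTO table1 VALUES (1, 'b', 2), (1, 'a', 1), (2, 'z', 5), (2, 'y', 9);
""")

# One output row per master row; the subquery picks that row's first detail
# value according to fieldnamesort. LIMIT 1 is SQLite's counterpart of TOP 1.
first_rows = conn.execute("""
SELECT m.id,
       (SELECT t.fieldname
        FROM table1 t
        WHERE t.id = m.id
        ORDER BY t.fieldnamesort
        LIMIT 1) AS result
FROM mastertable m
ORDER BY m.id
""").fetchall()
print(first_rows)
```

A master row with no matching detail rows (id 3 here) still appears, with NULL as its result, because the subquery lives in the select list and not in a join.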
I know this has to be an easy select, but I am having no luck figuring it out. I've got a table that has a field of customer grouping codes, and I'm trying to get a count for each distinct set of characters 2 through 6. In my past FoxPro experience a simple
select distinct substr(custcode,2,5), count (*) from a group by 1
would work, but this doesn't appear to work in SQL Server queries. The error message indicated it didn't like the number reference in the GROUP BY, so I changed it to custcode, but the count just returns 1 for each row. I assume the count happens after the DISTINCT, so there is only one. If I change the count to count(distinct substring(custcode,2,5)) and remove the first distinct substring, I just get a count of how many different codes exist. Can someone point out what I'm doing wrong here? Thanks.
The DISTINCT and GROUP BY are redundant. You just want GROUP BY, and you want to GROUP BY the same thing you are selecting:
select substring(custcode,2,5), count (*)
from a
group by substring(custcode,2,5)
In SQL Server you can use column aliases/numbers in the ORDER BY clause, but not in GROUP BY.
I.e., ORDER BY 1 will order by the first selected column, but many consider it bad practice to use column indexes; using aliases/column names is clearer.
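A small sketch of grouping by the same expression that is selected, run against SQLite via Python's sqlite3 module. SQLite is only a stand-in for SQL Server here (it spells the function substr, where SQL Server uses SUBSTRING), and the customer codes are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (custcode TEXT)")

# Hypothetical codes: characters 2 through 6 form the grouping key.
codes = ["A12345X", "B12345Y", "C99999Z", "D12345Q"]
conn.executemany("INSERT INTO a (custcode) VALUES (?)", [(c,) for c in codes])

# GROUP BY repeats the selected expression; ORDER BY may use the column position.
counts = conn.execute("""
SELECT substr(custcode, 2, 5) AS grp, COUNT(*) AS n
FROM a
GROUP BY substr(custcode, 2, 5)
ORDER BY 1
""").fetchall()
print(counts)
```

Grouping by the full custcode instead would put every distinct code in its own group, which is why the count came back as 1 for each row in the question.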
I was wondering how would I get the data for the next row in an SQL database, assuming I know the ID for the current entry and the table is ordered by ID.
Normally, when ordering by ID, one would think that to get the prev/next entry you just need to subtract/add 1 to the variable holding the ID and run the SELECT query with the new ID, but this poses a problem when there are holes in the table, with IDs like so:
13,14,18,21...
And so on.
A way to do it would be by looping in your programming language, running a query and adding 1 every time it runs until it finds a row, but that could be potentially taxing to the database. Is there a way to find it in just a single query?
This seemed like a problem others could plausibly run into, so I thought I would share my solution here!
What I would do to solve this is write a query WHERE the id is less/greater than the current one, like so:
SELECT *
FROM myTable t
WHERE t.id > 27
ORDER BY t.id
LIMIT 1
Because the rows are ordered by id and the result is limited to 1, you are guaranteed to get the entry that comes directly after 27. For the previous entry, use t.id < 27 with ORDER BY t.id DESC.
This should also work for date orderings.
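Here is a minimal sketch of the next/previous lookup wrapped in two helper functions, using Python's sqlite3 module. The function names next_row and previous_row and the label column are hypothetical; the ids reuse the gaps from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [(i, f"row {i}") for i in (13, 14, 18, 21)])

def next_row(current_id):
    """Return the row with the smallest id above current_id, or None."""
    return conn.execute(
        "SELECT * FROM myTable t WHERE t.id > ? ORDER BY t.id LIMIT 1",
        (current_id,)).fetchone()

def previous_row(current_id):
    """Return the row with the largest id below current_id, or None."""
    return conn.execute(
        "SELECT * FROM myTable t WHERE t.id < ? ORDER BY t.id DESC LIMIT 1",
        (current_id,)).fetchone()

print(next_row(14))      # skips the hole between 14 and 18
print(previous_row(18))
print(next_row(21))      # no later row exists
```

Each lookup is a single query regardless of how large the gap between ids is, so no loop in application code is needed.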
How about this:
Select MIN(myTable.Id)
FROM myTable
WHERE myTable.Id > 27
Get the next id number with min(). The next id after, say, 21 would be given by this query.
select min(test_id) as next_test_id
from test
where test_id > 21
Join that to the original table to get the row for that id number.
select *
from test
inner join (select min(test_id) as next_test_id
from test
where test_id > 3 ) t2
on test.test_id = t2.next_test_id
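The MIN-plus-join approach can be tried out with the sketch below, which uses Python's sqlite3 module. The helper name row_after, the label column, and the sample ids are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (test_id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(3, "a"), (7, "b"), (21, "c"), (25, "d")])

def row_after(current_id):
    """Find the next id with MIN(), then join back to fetch the full row."""
    return conn.execute("""
        SELECT test.test_id, test.label
        FROM test
        INNER JOIN (SELECT MIN(test_id) AS next_test_id
                    FROM test
                    WHERE test_id > ?) t2
                ON test.test_id = t2.next_test_id
        """, (current_id,)).fetchall()

print(row_after(3))
print(row_after(21))
print(row_after(25))  # MIN() yields NULL, so the join matches nothing
```

When there is no later id, the subquery returns a single NULL, the join condition never matches, and the result is an empty list.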
I realize my title probably doesn't explain my situation very well, but I honestly have no idea how to word this.
I am using SQL to access a DB2 database.
Using screenshot image 1 below as a reference:
Column 1 has three instances of "U11124", with three different descriptions (column 2).
I would like this query to return the first instance of "U11124" and its description, but also the unique records for the other rows. Image 2 shows my desired result.
image 1
image 2
----- EDIT ----
To answer some of the questions / posts:
Technically, it does not need to be the first, just any single one of those records. The problem is that we have three descriptions and only one needs to be shown; I am now told it does not matter which one.
SELECT STVNST, MAX(STDESC) FROM MY_TABLE GROUP BY STVNST;
In SQL Server:
select stvnst, stdesc
from (
select
stvnst, stdesc,
row_number() over (partition by stvnst order by stdesc) row
from MY_TABLE
) a
where row = 1
This method has an advantage over a simple GROUP BY in that it will also work when there are more than two columns in the table.
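The difference between the two approaches shows up once a third column is involved. The sketch below runs both against SQLite through Python's sqlite3 module, assuming an SQLite build with window-function support (3.25 or later). The QTY column, the descriptions, and the second STVNST value are invented; only "U11124" comes from the question, and the alias rn is used here in place of row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (STVNST TEXT, STDESC TEXT, QTY INTEGER)")
conn.executemany("INSERT INTO MY_TABLE VALUES (?, ?, ?)", [
    ("U11124", "Widget large", 30),
    ("U11124", "Widget", 10),
    ("U11124", "Widget small", 20),
    ("U22222", "Gadget", 40),
])

# ROW_NUMBER restarts at 1 for each STVNST; keeping rn = 1 returns one whole
# row per key, so QTY comes from the same row as the chosen STDESC.
picked = conn.execute("""
SELECT stvnst, stdesc, qty
FROM (SELECT stvnst, stdesc, qty,
             ROW_NUMBER() OVER (PARTITION BY stvnst ORDER BY stdesc) AS rn
      FROM MY_TABLE) a
WHERE rn = 1
ORDER BY stvnst
""").fetchall()

# GROUP BY with MAX also yields one row per key, but only for the
# aggregated column.
grouped = conn.execute("""
SELECT STVNST, MAX(STDESC)
FROM MY_TABLE
GROUP BY STVNST
ORDER BY STVNST
""").fetchall()

print(picked)
print(grouped)
```

Both return one row per STVNST; they simply pick different descriptions (first versus last in sort order), which is fine when any single description will do.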
SELECT STVNST,FIRST(STDESC) from table group by STVNST ORDER BY what_you_want_first
All you need to do is use GROUP BY.
You say you want the first instance of the STDESC column? Well, you can't guarantee the order of the rows without another column; however, if taking the highest-sorting value is acceptable, the following will suffice:
SELECT STVNST, MAX(STDESC) FROM MY_TABLE GROUP BY STVNST;