I need a bit of explanation of how the query below finds the second-highest value in a column.
SELECT MAX( column )
FROM table
WHERE column < ( SELECT MAX( column )
FROM table )
Assume I have a table with a single column named 'column' that holds the values 10, 20, 30, 40.
My query above returns 30, which is indeed the second highest.
This is how it works, as far as I understand.
The inner query finds MAX( column ), which is always 40 in my case.
Then, for each row, the WHERE clause checks whether its value is less than that maximum.
But how is that result stored so that the outer query can compute MAX( column ) over it?
I mean, somewhere there should be a list of the values less than the actual maximum (40 in my case),
and that list would be 10, 20, 30.
Then we find the maximum of this list, which is 30.
So how and where does it store all the values less than the actual maximum
(40 in this case), which are used at the end to find
the maximum again from that list (10, 20, 30)?
Can anyone explain how this works?
This is how the query is logically evaluated.
T (c) => { 10, 20, 30, 40 }
MAX(c) => 40
SELECT c FROM T WHERE c < 40 => { 10, 20, 30 }
SELECT MAX(c) FROM T WHERE c < 40 => 30
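The logical steps above can be replayed one at a time. This is only a sketch using Python's built-in sqlite3 module with an in-memory table; the table name t and column name c are stand-ins for the question's 'table' and 'column', and a real DBMS is free to execute the query differently as long as the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INTEGER)")
conn.executemany("INSERT INTO t (c) VALUES (?)", [(10,), (20,), (30,), (40,)])

# Step 1: the inner query is evaluated once and yields a single scalar.
overall_max = conn.execute("SELECT MAX(c) FROM t").fetchone()[0]

# Step 2: the WHERE clause keeps only the rows below that scalar.
below_max = [row[0] for row in conn.execute(
    "SELECT c FROM t WHERE c < ? ORDER BY c", (overall_max,))]

# Step 3: the outer MAX aggregates over the rows that survived the filter.
second_highest = conn.execute(
    "SELECT MAX(c) FROM t WHERE c < (SELECT MAX(c) FROM t)").fetchone()[0]

print(overall_max, below_max, second_highest)
```

The intermediate list in step 2 is materialised here only for illustration; the engine can compute the running maximum while scanning and never needs to store it.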
You have two queries here:
SELECT MAX( column )
FROM table
This finds the largest value in the column and returns it.
SELECT MAX( column )
FROM table
WHERE column <...
This finds the largest value in the column that is less than some bound.
By their powers combined...
SELECT MAX( column )
FROM table
WHERE column < (SELECT MAX( column )
FROM table)
This finds the largest value in the column that is less than the largest value in the column (i.e. the second-largest distinct value).
I assume you understand this query:
SELECT MAX( column )
FROM table
It simply returns the maximum value stored in this table, so 40 in your example.
Now consider this query, which simplifies the issue a bit:
SELECT MAX( column )
FROM table
WHERE column < 40
You should have no trouble understanding this one either. It is the same query as above, but it only considers rows whose column value is less than 40. How exactly the intermediate result is held in the database (as a temporary table, etc.) is the DBMS's concern, and you don't need to trouble yourself with it.
Please specify what exactly you don't understand and what you expect us to clarify.
The inner query selects the top value from the table. The RDBMS is smart enough to hold this value in memory, then hit the table again looking for the top value from the table, except this time it's looking for the top value that's smaller than the original top value.
There's not much more to it than that. I didn't write the RDBMS, so I don't know exactly how it operates.
SELECT column FROM table ORDER BY column DESC LIMIT 2;
This returns the top two values, so you will need to pick the second one out of the result yourself.
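If you would rather have the database return only the second value, one option is to skip the first row with OFFSET. The sketch below assumes SQLite syntax (LIMIT 1 OFFSET 1) run through Python's sqlite3 module; the helper name second_highest and the table t are made up for the example. DISTINCT is there so that a duplicated maximum is not reported as the second highest.

```python
import sqlite3

def second_highest(values):
    """Return the second-largest distinct value, or None if there is none."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c INTEGER)")
    conn.executemany("INSERT INTO t (c) VALUES (?)", [(v,) for v in values])
    # Sort descending, skip the largest distinct value, take the next one.
    row = conn.execute(
        "SELECT DISTINCT c FROM t ORDER BY c DESC LIMIT 1 OFFSET 1"
    ).fetchone()
    conn.close()
    return row[0] if row else None

print(second_highest([10, 20, 30, 40]))
```

Other databases spell the paging clause differently, so treat the exact syntax as SQLite-specific.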
I am starting to learn SQL Server, and the documentation on MSDN states the following:
HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
This made me think that we can use HAVING without a GROUP BY clause, but when I try to write such a query I am not able to use it.
I have a table like this
CREATE TABLE [dbo].[_abc]
(
[wage] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50)
GO
Now when I run this query, I get an error
select *
from [dbo].[_abc]
having sum(wage) > 5
Error:
The documentation is correct; i.e. you could run this statement:
select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5
The reason your statement doesn't work is the select *, which means select every column's value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.
There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.
NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:
select *
from [dbo].[_abc]
where wage > 5
Equally, if you want the sum of all wages greater than 5 you can do this
select sum(wage) sum_of_wage_over_5
from [dbo].[_abc]
where wage > 5
Or if you wanted to compare the sum of wages over 5 with those under:
select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end
Update based on comments:
Do you need having to use aggregate functions?
No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.
The documentation merely means that, whilst normally you'd have a having clause together with a group by clause, it's OK to leave out the group by; in that case the having clause, like the select clause, treats your query as if you'd specified group by 1.
What's the point?
It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.
One use case could be that you write code to monitor your licenses for some software; if you have fewer users than per-user licenses, all's good and you don't want to see a result since you don't care. If you have more users, you want to know about it. E.g.
declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses
Isn't the select irrelevant to the having clause?
Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.
I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:
select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)
However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.
Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:
select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5
That won't run because wage is not available to the outer query; only sum_of_wage is returned.
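The "having is a where applied to a subquery" reading can be tried out directly. This is a sketch in SQLite via Python's sqlite3 module rather than SQL Server, so the table is simply called abc, the helper sum_if_over is invented for the example, and the inner query is written without group by 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (wage INTEGER)")
conn.executemany("INSERT INTO abc (wage) VALUES (?)",
                 [(4,), (8,), (15,), (30,), (50,)])

def sum_if_over(threshold):
    """Aggregate first, then filter the single aggregated row."""
    return conn.execute(
        """
        SELECT sum_of_wage
        FROM (SELECT SUM(wage) AS sum_of_wage FROM abc) AS single_row_result
        WHERE sum_of_wage > ?
        """,
        (threshold,),
    ).fetchall()

# Selecting the raw column from the aggregated subquery fails,
# because only sum_of_wage exists in the outer scope.
try:
    conn.execute(
        "SELECT wage FROM (SELECT SUM(wage) AS sum_of_wage FROM abc) "
        "WHERE sum_of_wage > 5")
    wage_visible = True
except sqlite3.OperationalError:
    wage_visible = False

print(sum_if_over(5), sum_if_over(200), wage_visible)
```

The filter either keeps the one aggregated row or drops it, which is the zero-or-one-row behaviour described for having without group by.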
HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:
The result will contain zero or one row
The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns
Which means you can do this:
SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 100
-- Zero rows otherwise
Or even this:
SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 100
-- Zero rows otherwise
This construct is often used when you're interested in checking if a match for the aggregate was found:
SELECT *
FROM departments
WHERE EXISTS (
SELECT 1
FROM employees
WHERE employees.department = departments.department
HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total
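The point that the implicit group yields exactly one row even when WHERE matches nothing is easy to check. The sketch below uses SQLite through Python's sqlite3 module; the employees table and its rows are invented for the example, and it deliberately avoids HAVING so that it does not depend on how a particular SQLite version treats HAVING without GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, wage INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 60), ("a", 70), ("b", 40)])

# No GROUP BY and no matching rows: the implicit single group still
# produces one row, with SUM as NULL and COUNT as 0.
no_match = conn.execute(
    "SELECT SUM(wage), COUNT(*) FROM employees WHERE department = 'zzz'"
).fetchall()

# Explicit GROUP BY and no matching rows: there are no groups, so no rows.
grouped_no_match = conn.execute(
    "SELECT SUM(wage), COUNT(*) FROM employees "
    "WHERE department = 'zzz' GROUP BY department"
).fetchall()

print(no_match, grouped_no_match)
```

That single all-NULL row is what a HAVING condition would then keep or discard.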
In SQL you cannot select non-aggregated columns alongside aggregate functions directly. You need to group by the non-aggregated columns,
as shown in the example below:
USE AdventureWorks2012 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
In your case the table has no identity column, so you could add one as below
ALTER TABLE [dbo].[_abc]
ADD Id_new INT IDENTITY(1, 1)
GO
I have a simple, but large, database that I need to write a SQL statement for. The statement needs to do the following:
Get the 15 most popular values for a field.
From those 15, get the count that value has appeared within a particular time period.
My table contains both a Date and a Value field. I am able to extract the 15 most popular values, or get the count for a particular value in a given time period. I do not know how to put the two together.
This is my current SQL:
SELECT
Count( Value ) AS Total,
Value AS Value
FROM
Database
GROUP BY
Value
ORDER BY
Total DESC
LIMIT 15
That gets my 15 most popular values. But for each of those, I want to display the COUNT() of rows whose Date falls between two dates.
Would this require a HAVING clause?
I simplified the previous solution (which would also do the job) a little bit:
SELECT
Value,
Count(*) as TotalInPeriod
FROM Database
WHERE Value in (SELECT Value FROM Database GROUP BY Value
ORDER BY count(*) DESC LIMIT 15)
AND date_field BETWEEN your_start_date and your_end_date
GROUP BY Value
Try something like this. Make an inner query that finds the top 15 values overall, and join it to the main set to limit it to those values.
SELECT
Count( Value) as TotalInPeriod,
Value as Value
FROM
Database a
JOIN (SELECT
Count( Value ) AS Total,
Value AS Value
FROM
Database
GROUP BY
Value
ORDER BY
Total DESC
LIMIT 15) as topValues
ON
a.Value = topValues.Value
WHERE
a.date_field BETWEEN your_start_date and your_end_date
GROUP BY
a.Value
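Here is a small end-to-end check of the "top N overall, then count within a period" idea, using the IN-subquery form. It is a sketch under assumptions: it runs on SQLite via Python's sqlite3 module (which accepts ORDER BY and LIMIT inside an IN subquery; not every database does), the table is called events rather than Database, dates are ISO strings, and N is 2 instead of 15 to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (value TEXT, date_field TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("x", "2020-01-01"), ("x", "2020-01-05"), ("x", "2020-02-01"), ("x", "2020-03-01"),
    ("y", "2020-01-02"), ("y", "2020-01-03"), ("y", "2020-04-01"),
    ("z", "2020-01-04"),
])

def top_values_in_period(top_n, start, end):
    """Count rows per value inside [start, end], restricted to the
    top_n most frequent values over the whole table."""
    return conn.execute(
        """
        SELECT value, COUNT(*) AS total_in_period
        FROM events
        WHERE value IN (SELECT value FROM events
                        GROUP BY value
                        ORDER BY COUNT(*) DESC
                        LIMIT ?)
          AND date_field BETWEEN ? AND ?
        GROUP BY value
        ORDER BY value
        """,
        (top_n, start, end),
    ).fetchall()

print(top_values_in_period(2, "2020-01-01", "2020-01-31"))
```

Note that a top value with no rows inside the period does not appear in the result at all; if you need it listed with a count of 0, you would have to outer-join from the top-values subquery instead.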
To give some context, I am using time series data (one column) and I want to study gaps in the data, represented by NULL values in the data set. However, I expect some leading NULL values that I am not interested in including in my final data set, and the number of leading NULL values will vary between data sets.
I would like to exclude the top x number of rows of my data set where the value of a particular column is NULL, without excluding NULL values that appear lower in the same column.
Any help would be much appreciated.
Thanks!
EDIT: I also know that my first record in the value column is always 1, if that helps.
Unfortunately, for SQL Server 2008, I can't think of anything cleaner than:
SELECT row_number,value FROM <table> t1
WHERE value is not NULL OR
EXISTS (select * FROM <table> t2
where t2.value is not null and
t2.row_number < t1.row_number)
Just as an aside, for SQL Server 2012, you could use MAX() with an appropriate OVER() clause such that it considers all previous rows. If that MAX() returns NULL then all preceding rows are known to be NULL, and that's what I'd recommend if/when you upgrade.
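The running-MAX idea can be sketched as follows. This is not SQL Server: it runs on SQLite through Python's sqlite3 module and assumes a SQLite build with window function support (3.25 or later). The table name series and the ordering column pos are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (pos INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO series VALUES (?, ?)",
                 [(1, None), (2, None), (3, 1), (4, None), (5, 7), (6, None)])

# running_max is NULL only while every row seen so far (including the
# current one) is NULL, i.e. exactly on the leading NULL rows.
rows = conn.execute(
    """
    SELECT pos, value
    FROM (SELECT pos, value,
                 MAX(value) OVER (ORDER BY pos
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND CURRENT ROW) AS running_max
          FROM series)
    WHERE running_max IS NOT NULL
    ORDER BY pos
    """
).fetchall()

print(rows)
```

The window function has to live in a subquery because it cannot be referenced directly in a WHERE clause; the NULLs at positions 4 and 6 survive because a non-NULL value precedes them.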
You could find the first non-null item for each data set and then just query everything after that:
WITH FirstItem AS
(
SELECT
DataSetID,
MIN(row_number) row_number
FROM Data
WHERE value IS NOT NULL
GROUP BY DataSetID
)
SELECT d.* FROM Data d
INNER JOIN FirstItem fi
ON d.DataSetID = fi.DataSetid
AND d.row_number >= fi.row_number
Here is the situation that I am trying to solve:
I have a query that could return a set of records. The field being sorted by could have a number of different values - for the sake of this question we will say that the value could be A, B, C, D, E or Z
Now depending on the results of the query, the sorting needs to behave as follows:
If only A-E records are found then sorting them "naturally" is okay. But if a Z record is in the results, then it needs to be the first result in the query, but the rest of the records should be in "natural" sort order.
For instance, if A C D are found, then the result should be
A
C
D
But if A B D E Z are found then the result should be sorted:
Z
A
B
D
E
Currently, the query looks like:
SELECT NAME, SOME_OTHER_FIELDS FROM TABLE ORDER BY NAME
I know I can code a sort function to do what I want, but I can't use it because the results are handled by a third-party library to which I just pass the SQL query. The library processes the results itself, and there seem to be no hooks that would let me sort the results before handing them over. It needs to run the SQL query itself, and I have no access to its source code.
So for all of you SQL gurus out there, can you provide a query for me that will do what I want?
How do you identify the Z record? What sets it apart? Once you understand that, add it to your ORDER BY clause.
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
(
CASE
WHEN (record matches Z) THEN 0
ELSE 1
END
),
name
This way, only the Z record will match the first ordering, and all other records will be sorted by the second-order sort (name). You can exclude the second-order sort if you really don't need it.
For example, if Z is the character string 'Bob', then your query might be:
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
(
CASE
WHEN name='Bob' THEN 0
ELSE 1
END
), name
My examples are for T-SQL, since you haven't mentioned which database you're using.
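To see the CASE ordering on the question's own sample data, here is a sketch that runs the same ORDER BY on SQLite via Python's sqlite3 module. The table name records and the helper sorted_names are made up, and the special record is assumed to be the one whose name is exactly 'Z'.

```python
import sqlite3

def sorted_names(names):
    """Return names with 'Z' first (if present) and the rest in natural order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (name TEXT)")
    conn.executemany("INSERT INTO records (name) VALUES (?)",
                     [(n,) for n in names])
    rows = conn.execute(
        """
        SELECT name
        FROM records
        ORDER BY CASE WHEN name = 'Z' THEN 0 ELSE 1 END, name
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(sorted_names(["E", "Z", "A", "D", "B"]))
print(sorted_names(["D", "A", "C"]))
```

When no Z row exists, every row gets the same first sort key, so the result falls back to plain ordering by name.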
There are a number of ways to solve this problem and the best solution depends on a number of factors that you don't discuss such as the nature of those A..Z values and what database product you're using.
If you have only a single value that has to sort on top, you can ORDER BY an expression that maps that value to the lowest possible sort value (with CASE or IIF or IFEQ, depending on your database).
If you have several different special sort values you could ORDER BY a more complicated expression or you could UNION together several SELECTs, with one SELECT for the default sorts and an extra SELECT for each special value. The SELECTs would include a sort column.
Finally, if you have quite a few values you can put the sort values into a separate table and JOIN that table into your query.
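The separate sort table option might look roughly like this. It is a sketch on SQLite via Python's sqlite3 module; the tables records and sort_order, the column priority, and the second special value 'Q' are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT)")
conn.execute("CREATE TABLE sort_order (name TEXT PRIMARY KEY, priority INTEGER)")
conn.executemany("INSERT INTO records (name) VALUES (?)",
                 [("A",), ("B",), ("Q",), ("Z",), ("D",)])
# Only the special values get an entry; lower priority sorts earlier.
conn.executemany("INSERT INTO sort_order VALUES (?, ?)", [("Z", 1), ("Q", 2)])

ordered = [row[0] for row in conn.execute(
    """
    SELECT r.name
    FROM records r
    LEFT JOIN sort_order s ON s.name = r.name
    ORDER BY (s.priority IS NULL),  -- rows with a priority come first
             s.priority,            -- then by that priority
             r.name                 -- everything else in natural order
    """
)]

print(ordered)
```

The LEFT JOIN keeps rows that have no entry in the sort table, and the IS NULL test pushes them after the special ones.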
Not sure what DB you use - the following works for Oracle:
SELECT
NAME,
SOME_OTHER_FIELDS,
DECODE (NAME, 'Z', ' ', NAME ) SORTFIELD
FROM TABLE
ORDER BY DECODE (NAME, 'Z', ' ', NAME ) ASC
What is the most efficient way to select the first and last element only, from a column in SQLite?
The first and last element from a row?
SELECT column1, columnN
FROM mytable;
I think you must mean the first and last element from a column:
SELECT MIN(column1) AS First,
MAX(column1) AS Last
FROM mytable;
See http://www.sqlite.org/lang_aggfunc.html for MIN() and MAX().
I'm using First and Last as column aliases.
if it's just one column:
SELECT min(column) as first, max(column) as last FROM table
if you want to select whole row:
SELECT * FROM (SELECT 'first', * FROM table ORDER BY column ASC LIMIT 1)
UNION ALL
SELECT * FROM (SELECT 'last', * FROM table ORDER BY column DESC LIMIT 1)
The most efficient way would be to know what those fields were called and simply select them.
SELECT `first_field`, `last_field` FROM `table`;
Probably like this:
SELECT dbo.Table.FirstCol, dbo.Table.LastCol FROM Table
You get minor efficiency enhancements from specifying the table name and schema.
First: MIN() and MAX() on a text column gives AAAA and TTTT results which are not the first and last entries in my test table. They are the minimum and maximum values as mentioned.
I tried this (with .stats on) on my table which has over 94 million records:
select * from
(select col1 from mitable limit 1)
union
select * from
(select col1 from mitable limit 1 offset
(select count(0) from mitable) -1);
But it uses up a lot of virtual machine steps (281,624,718).
Then this which is much more straightforward (which works if the table was created without WITHOUT ROWID) [sql keywords are in capitals]:
SELECT col1 FROM mitable
WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
OR ROWID = (SELECT MAX(ROWID) FROM mitable);
That ran with 55 virtual machine steps on the same table and produced the same answer.
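The difference between "minimum/maximum value" and "first/last entry" shows up even on a tiny table. The sketch below uses Python's sqlite3 module with four invented rows in a table named mitable; it assumes an ordinary rowid table with no deletes and no explicitly assigned rowids, so that rowid follows insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mitable (col1 TEXT)")
conn.executemany("INSERT INTO mitable (col1) VALUES (?)",
                 [("MMMM",), ("AAAA",), ("TTTT",), ("CCCC",)])

# MIN/MAX compare the values themselves.
min_max = conn.execute("SELECT MIN(col1), MAX(col1) FROM mitable").fetchone()

# The smallest and largest rowid identify the first and last inserted rows.
first_last = [row[0] for row in conn.execute(
    """
    SELECT col1 FROM mitable
    WHERE ROWID = (SELECT MIN(ROWID) FROM mitable)
       OR ROWID = (SELECT MAX(ROWID) FROM mitable)
    ORDER BY ROWID
    """
)]

print(min_max, first_last)
```

If rows have been deleted and rowids reused, or the table was created WITHOUT ROWID, this shortcut no longer applies and an explicit ordering column is the safer choice.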
The min()/max() approach is wrong. It is only correct if the values are strictly ascending. I needed something like this for currency rates, which rise and fall randomly.
This is my solution:
select st.*
from stats_ticker st,
(
select min(rowid) as first, max(rowid) as last --here is magic part 1
from stats_ticker
-- next line is just a filter I need in my case.
-- if you want first/last of the whole table leave it out.
where timeutc between datetime('now', '-1 days') and datetime('now')
) firstlast
WHERE
st.rowid = firstlast.first --and these two rows do magic part 2
OR st.rowid = firstlast.last
ORDER BY st.rowid;
magic part 1: the subselect returns a single row with the columns first and last, containing rowids.
magic part 2: it is then easy to filter on those two rowids.
This is the best solution I've come up with so far. Hope you like it.
We can do that with the help of SQL aggregate functions such as MAX and MIN. These two aggregate functions return the largest and smallest values in a column.
SELECT MAX(column_name), MIN(column_name) FROM table_name
MAX gives the maximum value and MIN gives the minimum value of the column; these correspond to the last and first values only when the column's values were stored in ascending order.