SQL Server : GROUP error - sql

I have a problem in SQL Server. I have a table name rTable and inside many columns of different types (Date, XML, varchar and so..)
Now I need group by one of this columns.
SELECT TOP 100 *
FROM rTable
GROUP BY integer_column
But this query give me error.
In MySQL it working normaly, but SQL Server asking me for do some operation in group with another columns, I need just show it.
What is recomendation?

You can not select all the column with group by clause in MS SQL.
SELECT integer_column, count(integer_column)
FROM rTable
GROUP BY integer_column
and also can use aggregate function like count, sum in select clause if you wanted.

Actually, this is not a limitation of MS SQL (although it has many), but it is conforming to the SQL standard.
You have to name all columns in the GROUP BY clause, except those using an aggregate function.

You are using two MySQL extensions that don't work in SQL Server.
In Standard SQL each column selected in an aggregation query must either be in the GROUP BY clause (integer_column in your case) or get aggregated (such as max(other_column)). You select * instead, so you select integer_column and other_column in this example. In MySQL you simply get a random match for other_column in the group. In SQL Server you get a syntax error instead.
Then: MySQL extends GROUP BY, so that it is guaranteed to get the records in the order specified in GROUP BY. In Standard SQL you must add an ORDER BY clause to be sure to get the records in the desired order. TOP without ORDER BY might give you 100 random records.
Something like this would work:
SELECT TOP 100
integer_column,
MIN(other_column) AS other_column
FROM rTable
GROUP BY integer_column
ORDER BY integer_column;
EDIT: I wonder whether you know what yor MySQL query is actually doing, so I think I'd better elaborate. Let's say your table only contains three columns integer_column, cola, and colb. And let's further say your table has just three records:
integer_column cola colb
1 a1 b1
1 a2 b2
2 a3 b3
Your query picks the lowest 100 integer_column. In this example exist just two: 1 and 2, so both are picked. Then per integer_column the query picks random cola and colb. Your result may look like this:
integer_column cola colb
1 a2 b1
2 a3 b3
(It usually won't. The dbms will play it easy and pick values from one record only, either the first or the last, so you probably get either 1-a1-b1 or 1-a2-b2, but there is no guarantee to that.)
Having said this, you will get the lowest integer_column along with indeterminate values. In Standard SQL you can use MIN or MAX, to get a value. It doesn't matter if you use MIN(cola) which is a1 or MAX(cola) which is a2, as you don't care anyhow.

Related

SQL for getting each category data in maria db

I need to fetch 4 random values from each category. What should be the correct sql syntax for maria db. I have attached one image of table structure.
Please click here to check the structure
Should i write some procedure or i can do it with basic sql syntax?
You can do that with a SQL statement if you only have a few rows:
SELECT id, question, ... FROM x1 ORDER BY rand() LIMIT 1
This works fine if you have only a few rows - as soon as you have thousands of rows the overhead for sorting the rows becomes important, you have to sort all rows for getting only one row.
A trickier but better solution would be:
SELECT id, question from x1 JOIN (SELECT CEIL(RAND() * (SELECT(MAX(id)) FROM x1)) AS id) as id using(id);
Running EXPLAIN on both SELECTS will show you the difference...
If you need random value for different categories combine the selects via union and add a where clause
http://mysql.rjweb.org/doc.php/groupwise_max#top_n_in_each_group
But then ORDER BY category, RAND(). (Your category is the blog's province.)
Notice how it uses #variables to do the counting.
If you have MariaDB 10.2, then use one of its Windowing functions.
SELECT column FROM table WHERE category_id = XXX
ORDER BY RAND()
LIMIT 4
do it for all categories

Get ANY(col) instead of MIN(col) from a group

I have a SQL Query (simplified from real use):
SELECT MIN(cola), colb FROM tbl GROUP BY colb;
But actually, I don't need the minimum value- any cola value will do- it's only used to show an example value from the group.
At the moment PG has to do the group and then sort each group by cola to find the minimum value in the group, but this is slow because there's a lot of records in each group.
Does Postgres have some kind of FIRST(cola) or ANY(cola) that would just return whatever cola it sees first (like MySQL does when you don't use an aggregate function) or without needing to sort / read cola from every row?
I think using DISTINCT ON() with no order by will achieve what you are after:
SELECT DISTINCT ON (ColB) ColA, ColB
FROM tbl;
Example on SQL Fiddle
The docs state
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
However, with no example data to work on I can't actually compare if this will outperform using MIN or any other aggregate function.
This statement:
At the moment PG has to do the group and then sort each group by cola
to find the minimum value in the group, but this is slow because
there's a lot of records in each group.
May logically describe what Postgres does, but it does not explain what is actually going on.
Postgres -- as with any database that I'm familiar with -- will keep a "register" for the minimum value. As new data comes in, it will compare the value in the next row to the minimum. If the new value is smaller, then it will be copied in. This, incidentally, is whay min(), max(), avg(), and count() are all faster than count(distinct). For the latter, the list of values within a group must be maintained.
The distinct on approach may be faster than the group by. The reason, however, is not because the database engine is sorting all values for a given colb to get the minimum.
Try using fetch first row at the end of your sql:
http://www.postgresql.org/docs/8.1/static/sql-fetch.html
SELECT MIN(cola), colb
FROM tbl
GROUP BY colb
FETCH FIRST ROW only;
Inspired by Gareth's answer above:
SQL Fiddle
; WITH C as (SELECT *, ROW_NUMBER() OVER (PARTITION BY ColB) as rn FROM tbl)
SELECT *
FROM c
WHERE rn = 1
Not sure if it will perform any better\worse than MIN().

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use the COUNT(*) clause, you are asking to return the complete number of rows, instead of aggregating by another condition. Your questing if GROUP BY is implicit in that case, could be answered with: "sort of": If you don't specify anything is a bit like asking: "group by nothing", which means you will get one huge aggregate, which is the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You also should take into account that the * means to count everything. Including NULLs! If you want to count a specific condition, you should use COUNT(expression)! See the docs about aggragate functions for more details on this topic.
If you don't use the Group by clause at all then all that will be returned is a count of 1 for each row, which is already assumed anyway and therefore redundant data. By adding GROUP BY 1 you have categorized the information thereby making it non-redundant even though it returns the same result in theory as the statement that creates an error.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This why it works on your server. Indeed this is not a good practice in sql.
You should mention the column name because the column order may change in the table so it will be hard to maintain this code.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;

counting rows in select clause with DB2

I would like to query a DB2 table and get all the results of a query in addition to all of the rows returned by the select statement in a separate column.
E.g., if the table contains columns 'id' and 'user_id', assuming 100 rows, the result of the query would appear in this format: (id) | (user_id) | 100.
I do not wish to use a 'group by' clause in the query. (Just in case you are confused about what i am asking) Also, I could not find an example here: http://mysite.verizon.net/Graeme_Birchall/cookbook/DB2V97CK.PDF.
Also, if there is a more efficient way of getting both these results (values + count), I would welcome any ideas. My environment uses zend framework 1.x, which does not have an ODBC adapter for DB2. (See issue http://framework.zend.com/issues/browse/ZF-905.)
If I understand what you are asking for, then the answer should be
select t.*, g.tally
from mytable t,
(select count(*) as tally
from mytable
) as g;
If this is not what you want, then please give an actual example of desired output, supposing there are 3 to 5 records, so that we can see exactly what you want.
You would use window/analytic functions for this:
select t.*, count(*) over() as NumRows
from table t;
This will work for whatever kind of query you have.

GROUP BY combined with ORDER BY

The GROUP BY clause groups the rows, but it does not necessarily sort the results in any particular order. To change the order, use the ORDER BY clause, which follows the GROUP BY clause. The columns used in the ORDER BY clause must appear in the SELECT list, which is unlike the normal use of ORDER BY. [Oracle by Example, fourth Edition, page 274]
Why is that? Why does using GROUP BY influence the required columns in the SELECT clause?
Also, in the case where I do not use GROUP BY: Why would I want to ORDER BY some columns but then select only a subset of the columns?
Actually the statement is not entirely true as Dave Costa's example shows.
The Oracle documentation says that an expression can be used but the expression must be based on the columns in the selection list.
expr - expr orders rows based on their value for expr. The expression is based on
columns in the select list or columns in the tables, views, or materialized views in the
FROM clause. Source: Oracle® Database
SQL Language Reference
11g Release 2 (11.2)
E26088-01
September 2011. Page 19-33
From the the same work page 19-13 and 19-33 (Page 1355 and 1365 in the PDF)
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#SQLRF01702
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#i2171079
The bold text from your quote is incorrect (it's probably an oversimplification that is true in many common use cases, but it is not strictly true as a requirement). For instance, this statement executes just fine, although AVG(val) is not in the select list:
WITH DATA AS (SELECT mod(LEVEL,3) grp, LEVEL val FROM dual CONNECT BY LEVEL < 100)
SELECT grp,MIN(val),MAX(val)
FROM DATA
GROUP BY grp
ORDER BY AVG(val)
The expressions in the ORDER BY clause simply have to be possible to evaluate in the context of the GROUP BY. For instance, ORDER BY val would not work in the above example, because the expression val does not have a distinct value for each row produced by the grouping.
As to your second question, you may care about the ordering but not about the value of the ordering expression. Excluding unneeded expressions from the select lists reduces the amount of data that must actually be sent from the server to the client.
First:
The implementation of group by is one which creates a new resultset that differs in structure to the original from clause (table view or some joined tables). That resultset is defined by what is selected.
Not every SQL RDBMS has this restriction, though it is a always requirement that what is ordered by be either an aggregate function of the non-grouped columns (AVG, SUM, etc) or one of the columns grouped by, or functions upon more than one of those results (like adding two columns), because this is a logical requirement of the result of the grouping operation.
Second:
Because you only care about that column for the ordering. For example, you might have a list of the top selling singles without giving their sales (the NYT Bestsellers keeps some details of their data a secret, but do have a ranked list). Of course, you can get around this by just selecting that column and then not using it.
The data is aggregated before it is sorted for the ORDER BY.
If you try to order by any other column (that is not in the group by list or an aggregation function), what value would be used? There is no single value to use for ordering.
I believe that you can use combinations of the values for sorting. So you can say:
order by a+b
If a and b are in the group by. You just cannot introduce columns not mentioned in the SELECT. I believe you can use aggregation functions not mentioned in the SELECT, however.
Sample table
sample.grades
Name Grade Score
Adam A 95
Bob A 97
Charlie C 75
First Query using GROUP BY
Select grade, count(Grade) from sample.grades GROUP BY Grade
Output
Grade Count
A 2
C 1
Second Query using order by
select Name, score from sample grades order by score
Output
Bob A 97
Adam A 95
Charlie C 75
Third Query using GROUP BY and ordering
Select grade, count(Grade) from sample.grades GROUP BY Grade desc
Output
Grade Count
A 2
C 1
Once you start using things like Count, you must have group by. You can use them together, but they have very different uses, as I hope the examples clearly show.
To try and answer the question, why does group by effect the items in the select section, because that is what group by is meant to do. You can't do the count of a column if you do not group by that column.
Second question, why would you want to order by but not select all the columns?
If I want to order by the score, but do not care about the actual grade or even the score I might do
select name from sample.grades order by score
Output
Name
Bob
Adam
Charlie
Which results do you expect to see ordering by columns not listed in the select list and not participated in group by clause? at any case all kind of sort by non-mentioned in SELECT list columns will be omitted so Oracle guys added the restriction correctly.
with c as (
select 1 id, 2 value from dual
union all
select 1 id, 3 value from dual
union all
select 2 id, 3 value from dual
)
select id
from c
group by id
order by count(*) desc
Here my inderstanding
"The GROUP BY clause groups the rows, but it does not necessarily sort the results in any particular order."
-> you can use Group by without order by
"To change the order, use the ORDER BY clause, which follows the GROUP BY clause."
-> the rows are selected by defaut with primary key, and if you add order by you must add after group by
"The columns used in the ORDER BY clause must appear in the SELECT list, which is unlike the normal use of ORDER BY."