Query works in MySQL not Oracle - sql

The following SQL statement works in MySQL but not with Oracle:
SELECT *, MAX(COLUMN_A)
FROM table_xyz
WHERE COLUMN_A <= 100
GROUP BY COLUMN_A
Oracle complaint: "FROM keyword not found where expected"
actually the statement was incorrect, we were not grouping by COLUMN_A but another column instead. actually what we want is this
SELECT *, MAX(COLUMN_A)
FROM table_xyz
WHERE COLUMN_A <= 100
GROUP BY COLUMN_B
this works but gives us only column A and B
SELECT COLUMN_B, MAX(COLUMN_A)
FROM table_xyz
WHERE COLUMN_A <= 100
GROUP BY COLUMN_B
what we want is this, but it doesn't work (group by error)
SELECT COLUMN_B, COLUMN_C .... COLUMN_X, MAX(COLUMN_A)
FROM table_xyz
WHERE COLUMN_A <= 100
GROUP BY COLUMN_B

That's because Oracle requires you to define all the columns not wrapped in an aggregate function (MIN, MAX, COUNT, etc). SQL Server would return a similar error. MySQL's behavior is documented here.
Because your query is using SELECT *, I can't re-write it properly for you. But I also can't guarantee a syntactically correct version would return the same results as you see on MySQL either. Grouping by the same column you want the MAX is quite odd...

If you want the max() for column_a you don't need the group by at all:
SELECT MAX(COLUMN_A)
FROM table_xyz
WHERE COLUMN_A <= 100

In addition to what everyone else is saying, Oracle does not allow mixing * with explicit column definitions in queries:
SQL> select *, table_name from user_tables;
select *, table_name from user_tables
*
ERROR at line 1:
ORA-00923: FROM keyword not found where expected
Oracle hasn't even looked at the fact that you are trying to get columns outside of those included in the group by clause. Which as others have stated, Oracle will not do.

This doesn't answer your MAX issue, but the only way to follow a '*' with other columns is if you use an explicit reference to a table alias - e.g.
SELECT e.*, zip_code
FROM addresses a,
employees e
WHERE e.addressId = a.Id
For the MAX value, you will either need to group by all other columns, or look into analytic functions (plenty of previous answers on Stack Overflow).

Multiple problems. Your GROUP BY clause is backwards. You need to define your GROUP BY by the columns in the *. Also what OMG Ponies said before.

Related

Select Distinct (case insensitive) on Postgres

This has been asked here and a few other places before but seems like the suggested answers either don't apply to postgres or don't work in this situation.
I'm looking to select distinct column names, eg:
SELECT DISTINCT column_name FROM table_name WHERE ... ORDER BY column_name however I'm looking to eliminate case sensitive duplicates (eg A and a should be considered the same thing)
I tried COLLATE but all available formats were case sensitive. And changing case via LOWER() OR UPPER() wont work because In this situation I need the case information.
I thought about something like this to grab unique values but still maintain the case:
SELECT DISTINCT upper(my_column) as upper_case, my_column
FROM my_table
ORDER BY upper(my_column)
But introducing my_column in the distinct query negates the whole thing.
How can I get unique values (case insensitive) without modifying the case of the results itself?
In PostgreSQL (but not many other databases), you can use a DISTINCT ON clause:
SELECT DISTINCT ON (upper(my_column)) my_column
FROM my_table
ORDER BY upper(my_column)
You can even choose which of the results you get, by adding another column to the ORDER BY clause to make the desired result appear first:
SELECT DISTINCT ON (upper(my_column)) my_column
FROM my_table
ORDER BY upper(my_column), other_column
Documentation: DISTINCT Clause
You can use an aggregation function:
SELECT MAX(my_column)
FROM my_table
GROUP BY upper(my_column);
This returns one value. If you want all the values:
SELECT ARRAY_AGG(DISTINCT my_column)
FROM my_table
GROUP BY upper(my_column);

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use the COUNT(*) clause, you are asking to return the complete number of rows, instead of aggregating by another condition. Your questing if GROUP BY is implicit in that case, could be answered with: "sort of": If you don't specify anything is a bit like asking: "group by nothing", which means you will get one huge aggregate, which is the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You also should take into account that the * means to count everything. Including NULLs! If you want to count a specific condition, you should use COUNT(expression)! See the docs about aggragate functions for more details on this topic.
If you don't use the Group by clause at all then all that will be returned is a count of 1 for each row, which is already assumed anyway and therefore redundant data. By adding GROUP BY 1 you have categorized the information thereby making it non-redundant even though it returns the same result in theory as the statement that creates an error.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This why it works on your server. Indeed this is not a good practice in sql.
You should mention the column name because the column order may change in the table so it will be hard to maintain this code.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;

Distinct with select *

Is it possible to use select * with distinct or write easily something that has the same impact?
I need to select all columns from a table with distinct value, but listing all the columns in select clause would be nerve-breaking because the number of columns is over 20!
In Microsoft SQL Server you can write:
select distinct * from MyTable
However, it is considered "best practice" to specify the columns explicitly, partly because it improves the performance of the query, but also to protect yourself from failures that would arise if the database schema were to change in the future
This should work:
SELECT DISTINCT * FROM TABLE_NAME
Use this query:
SELECT DISTINCT Employee, Rank
FROM Employees
Adding the "distinct" keyword right after "select" does the work.
For example:
SELECT DISTINCT * FROM TABLE_NAME

Counting DISTINCT over multiple columns

Is there a better way of doing a query like this:
SELECT COUNT(*)
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery
I need to count the number of distinct items from this table but the distinct is over two columns.
My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)
If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.
Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.
I believe a distinct count of the computed column would be equivalent to your query.
Edit: Altered from the less-than-reliable checksum-only query
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable
SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems
What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?
It certainly works as you might expect in Oracle.
SQL> select distinct deptno, job from emp
2 order by deptno, job
3 /
DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN
9 rows selected.
SQL> select count(*) from (
2 select distinct deptno, job from emp
3 )
4 /
COUNT(*)
----------
9
SQL>
edit
I went down a blind alley with analytics but the answer was depressingly obvious...
SQL> select count(distinct concat(deptno,job)) from emp
2 /
COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
9
SQL>
edit 2
Given the following data the concatenating solution provided above will miscount:
col1 col2
---- ----
A AA
AA A
So we to include a separator...
select col1 + '*' + col2 from t23
/
Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.
SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;
In MySQL you can do the same thing without the concatenation step as follows:
SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;
This feature is mentioned in the MySQL documentation:
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct
How about something like:
select count(*)
from
(select count(*) cnt
from DocumentOutputItems
group by DocumentId, DocumentSessionId) t1
Probably just does the same as you are already though but it avoids the DISTINCT.
Some SQL databases can work with a tuple expression so you can just do:
SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
FROM DocumentOutputItems;
If your database doesn't support this, it can be simulated as per #oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.
COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).
MySQL specifically supports COUNT(DISTINCT expr, expr, ...) which is non-SQL standard syntax. It also notes In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).
A related use of tuples is performing IN queries such as:
SELECT * FROM DocumentOutputItems
WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));
Here's a shorter version without the subselect:
SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems
It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.
Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.
I have used this approach and it has worked for me.
SELECT COUNT(DISTINCT DocumentID || DocumentSessionId)
FROM DocumentOutputItems
For my case, it provides correct result.
There's nothing wrong with your query, but you could also do it this way:
WITH internalQuery (Amount)
AS
(
SELECT (0)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
FROM internalQuery
If you're working with datatypes of fixed length, you can cast to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...
SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems
My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. And due to the complexity, statistics simply wasn't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly as a result of the various data types, and I couldn't risk its unreliability.
However, using the above solution had virtually no increase on the query time (comparing with using simply the SUM), and should be completely reliable! It should be able to help others in a similar situation so I'm posting it here.
if you had only one field to "DISTINCT", you could use:
SELECT COUNT(DISTINCT DocumentId)
FROM DocumentOutputItems
and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:
SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId))
FROM DocumentOutputItems
but you'll have issues if NULLs are involved. I'd just stick with the original query.
Hope this works i am writing on prima vista
SELECT COUNT(*)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.
At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.
Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using
SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))
but this is also not conforting.
You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.
I would suggest using
SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems
I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)
SELECT COUNT(DISTINCT DocumentID) AS Count1,
COUNT(DISTINCT DocumentSessionId) AS Count2
FROM DocumentOutputItems
How about this,
Select DocumentId, DocumentSessionId, count(*) as c
from DocumentOutputItems
group by DocumentId, DocumentSessionId;
This will get us the count of all possible combinations of DocumentId, and DocumentSessionId
It works for me. In oracle:
SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;
In jpql:
SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;
I had a similar question but the query I had was a sub-query with the comparison data in the main query. something like:
Select code, id, title, name
(select count(distinct col1) from mytable where code = a.code and length(title) >0)
from mytable a
group by code, id, title, name
--needs distinct over col2 as well as col1
ignoring the complexities of this, I realized I couldn't get the value of a.code into the subquery with the double sub query described in the original question
Select count(1) from (select distinct col1, col2 from mytable where code = a.code...)
--this doesn't work because the sub-query doesn't know what "a" is
So eventually I figured out I could cheat, and combine the columns:
Select count(distinct(col1 || col2)) from mytable where code = a.code...
This is what ended up working
This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in MySQL like a charm.
select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)
from DocumentOutputItems
group by i ,s;
You can just use the Count Function Twice.
In this case, it would be:
SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId)
FROM DocumentOutputItems

What does "select count(1) from table_name" on any database tables mean?

When we execute select count(*) from table_name it returns the number of rows.
What does count(1) do? What does 1 signify here? Is this the same as count(*) (as it gives the same result on execution)?
The parameter to the COUNT function is an expression that is to be evaluated for each row. The COUNT function returns the number of rows for which the expression evaluates to a non-null value. ( * is a special expression that is not evaluated, it simply returns the number of rows.)
There are two additional modifiers for the expression: ALL and DISTINCT. These determine whether duplicates are discarded. Since ALL is the default, your example is the same as count(ALL 1), which means that duplicates are retained.
Since the expression "1" evaluates to non-null for every row, and since you are not removing duplicates, COUNT(1) should always return the same number as COUNT(*).
Here is a link that will help answer your questions. In short:
count(*) is the correct way to write
it and count(1) is OPTIMIZED TO BE
count(*) internally -- since
a) count the rows where 1 is not null
is less efficient than
b) count the rows
Difference between count(*) and count(1) in oracle?
count(*) means it will count all records i.e each and every cell
BUT
count(1) means it will add one pseudo column with value 1 and returns count of all records
This is similar to the difference between
SELECT * FROM table_name and SELECT 1 FROM table_name.
If you do
SELECT 1 FROM table_name
it will give you the number 1 for each row in the table. So yes count(*) and count(1) will provide the same results as will count(8) or count(column_name)
There is no difference.
COUNT(1) is basically just counting a constant value 1 column for each row. As other users here have said, it's the same as COUNT(0) or COUNT(42). Any non-NULL value will suffice.
http://asktom.oracle.com/pls/asktom/f?p=100:11:2603224624843292::::P11_QUESTION_ID:1156151916789
The Oracle optimizer did apparently use to have bugs in it, which caused the count to be affected by which column you picked and whether it was in an index, so the COUNT(1) convention came into being.
SELECT COUNT(1) from <table name>
should do the exact same thing as
SELECT COUNT(*) from <table name>
There may have been or still be some reasons why it would perform better than SELECT COUNT(*)on some database, but I would consider that a bug in the DB.
SELECT COUNT(col_name) from <table name>
however has a different meaning, as it counts only the rows with a non-null value for the given column.
in oracle i believe these have exactly the same meaning
You can test like this:
create table test1(
id number,
name varchar2(20)
);
insert into test1 values (1,'abc');
insert into test1 values (1,'abc');
select * from test1;
select count(*) from test1;
select count(1) from test1;
select count(ALL 1) from test1;
select count(DISTINCT 1) from test1;
Depending on who you ask, some people report that executing select count(1) from random_table; runs faster than select count(*) from random_table. Others claim they are exactly the same.
This link claims that the speed difference between the 2 is due to a FULL TABLE SCAN vs FAST FULL SCAN.