SQL query trick - sql

I need to do
select * from xxx where name in (a,b,c...);
but I want the result set to be in the order of (a,b,c...). is this possible?

I found this question which is looks like your original question: Ordering by the order of values in a SQL IN() clause

ah - I see. you could do something horrendous with a case statement, and then order by that.. you'd effectivley be adding another column to your query to be an "order" that you could then "order by"
its ugly, but if you control the query, and the number in the 'in' clause is low, it could work (beleive an 'in' clause is limited to 255 chars)
e.g "IF name = a then 1 else if name = b then 2"
Failing that, probably best to sort in the client using a similar technique (assuming it was the client that injected the information into the 'in' clause in the first place)
-Ace

The method to do this will be DB-specific.
In Oracle, you could do something like:
SELECT * FROM xxx
where name in (a,b,c...)
ORDER BY DECODE(name,a,1,b,2,c,3);

IN statements are pretty limited, but you could get a similar effect by joining on a subquery.
here's an example:
SELECT x.*
FROM xxx as x
INNER JOIN ((select a as name, 1 as ord)
UNION
(select b as name, 2 as ord)
UNION
(select c as name, 3 as ord)) as t
ON t.name = x.name
ORDER BY t.ord
its pretty ugly, but it should work on just about any sql database. The ord field explicitly allows you to set the ordering of the result. some databases such as SqlServer support a ROWINDEX feature so you may be able to use that to clean it up a bit.

Related

SQL simple GROUP BY query

Is there a way to make a simple GROUP BY query with SQL and not use COUNT,AVG or SUM? I want to show all columns and group it with a single column.
SELECT * FROM [SPC].[dbo].[BoardSFC] GROUP BY boardsn
The query above is working on Mysql but not on SQL, is there a way to achieve this? any suggestion would be great
UPDATE: Here is my data I just need to group them by boardsn and get imulti equals to 1
I thing you just understand 'group data' in a different way than it is implemented in sql server. You simply want rows that have the same value together in the result and that would be ordering not grouping. So maybe what you need is:
SELECT *
FROM [SPC].[dbo].[BoardSFC]
WHERE imulti = 1
ORDER BY boardsn
The query above is working on Mysql but not on SQL, is there a way to achieve this? any suggestion would be great
No, there is not. MySQL only lets you do this because it violates the various SQL standards quite egregiously.
You need to name each column you want in the result-set whenever you use GROUP BY. The SELECT * feature is only provided as a convenience when working with data interactively - in production code you should never use SELECT *.
You could use a TOP 1 WITH TIES combined with a ORDER BY ROW_NUMBER.
SELECT TOP 1 WITH TIES *
FROM [SPC].[dbo].[BoardSFC]
ORDER BY ROW_NUMBER() OVER (PARTITION BY boardsn ORDER BY imulti)
Or more explicitly, use ROW_NUMBER in a sub-query
SELECT *
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY boardsn ORDER BY imulti) as RN
FROM [SPC].[dbo].[BoardSFC]
) q
where RN = 1

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Further more, if you use an order by clause on a column that appears multiple times in your resultset, the order of the result set is still not guaranteed inside groups of values in said column - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what will you get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding row count field to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
As other notified using anyway without using a sorted query will not be safe, But as you know about it and search about it, I can suggest using a query like this (But not recommended as a good way)
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (#pageNumber-1)*#pageSize+1 and #pageNumber*#pageSize
[SQL Fiddle Demo]
I finally found a simple way to do it without any order by on a specific column:
declare #start AS INTEGER = 1, #count AS INTEGER = 5;
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET #start ROWS
FETCH NEXT #count ROWS ONLY
where select * from mytable can be any query

Is it possible to use column values to determine the tablename in a from clause of a subselect?

I would like to run a query that uses a column value (in which a tablename is stored) to determine the tablename in a from clause of a subselect.
Something like that:
SELECT column_with_tablename,
(SELECT COUNT(*) FROM VALUEOF(column_with_tablename)) as
numberofitems
FROM table1
I know that this is very fragile but i need it work.
(I inherited a database that stores tablenames in a column)
Is there a way ?
for a pure sql solution, refer to this answer
select column_with_tablename,
to_number(extractvalue(xmltype
(dbms_xmlgen.getxml('select count(*) c from '||column_with_tablename)
),'/ROWSET/ROW/C')) as count
from table1

orderby in sql query

I need to order sql query by a column (the three different values in this column are C,E,T).
I want the results in order of E,C,T. So, of course I can't use ascending or descending orderby on this column.
Any suggestions how can I do this? I don't know if that matters or not but I am using sybase data server on tomcat.
You could do it by putting a conditional in your select clause. i'm not Sybase guy but it might look something like this:
SELECT col, if col = 'E' then 1 else if col = 'C' then 2 else 3 end AS sort_col
FROM some_table
ORDER BY sort_col
If your AS alias doesn't work you could sort by column 1-based index like this:
ORDER BY 2
The other methods work, but this is an often overlooked trick (in MSSQL, I'm not positive if it works in Sybase or not):
select
foo,
bar
from
bortz
order by
case foo
when 'E' then 1
when 'C' then 2
when 'T' then 3
else 4
end
You could use a per-row function to change the columns as other answers have stated but, if you expect this database to scale well, per-row functions are rarely a good idea.
Feel free to ignore this advice if your table is likely to remain small.
The advice here works because of a few general "facts" (I enclose that in quotation marks since it's not always the case but, in my experience, it mostly is):
The vast majority of databases are read far more often than they're written. That means it's usually a good idea to move the cost of calculation to the write phase rather than the read phase.
Most problems with database tend to be the "my query is slow" type rather than the "there's not enough disk space" type.
Tables always grow bigger than you thought they would :-)
If your situation is matched by those "facts", it makes sense to sacrifice a little disk space in order to speed up your queries. It's also better to incur the cost of calculation only when necessary (insert/update), not when the data hasn't actually changed (select).
To do that, you would create a new column (ect_col_sorted for example) in the table which would hold a numeric sort value (or more than one column if you want different soert orders).
The have an insert/update trigger so that, whenever a row is added to, or changed in, the table, you populate the sort field with the correct value (E = 1, C = 2, T = 3, anything else = 0). Then put an index on that column and your query becomes a much simpler (and faster):
select ect_col, other_col_1, other_col_2
from ect_table
order by ect_col_sorted;
Idea is to add subquery with condition that will return your data row plus fictive value which will be 0 if there is E, 1 for E and 2 for T. Then simply order it by this column.
Hope it helps.
psasik's solution will work, as will this one (which to use and which is faster depends on what else is going on in the query):
select *
from some_table
where col = 'E'
UNION ALL
select *
from some_table
where col = 'C'
UNION ALL
select *
from some_table
where col = 'E'
that should work, but you can also do this which will be "safer" for large dataset which may be paged...
select *, 1 as o
from some_table
where col = 'E'
UNION ALL
select *, 2 as o
from some_table
where col = 'C'
UNION ALL
select *, 3 as o
from some_table
where col = 'E'
ORDER BY o
After I wrote the above I decided this is the best solution (note, I do not know if this will work on a sybase server as I don't have access to one right now but if it does not work on there just pull the creation of the keysort memory table out to a variable or temporary table -- which ever sybase supports)
;WITH keysort (k,o) AS
(
SELECT 'E',0
UNION ALL
SELECT 'C',1
UNION ALL
SELECT 'E',2
)
SELECT *
FROM some_table
LEFT JOIN keysort ON some_table.col = keysort.k
ORDER BY keysort.o
This should be the fastest of all choices -- uses in memory table to exploit sql's optimized joining.
You can even go about using Field() function.
Order by Field(columnname, E, C, T)
Hope this helps you

Counting DISTINCT over multiple columns

Is there a better way of doing a query like this:
SELECT COUNT(*)
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery
I need to count the number of distinct items from this table but the distinct is over two columns.
My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)
If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.
Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.
I believe a distinct count of the computed column would be equivalent to your query.
Edit: Altered from the less-than-reliable checksum-only query
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable
SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems
What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?
It certainly works as you might expect in Oracle.
SQL> select distinct deptno, job from emp
2 order by deptno, job
3 /
DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN
9 rows selected.
SQL> select count(*) from (
2 select distinct deptno, job from emp
3 )
4 /
COUNT(*)
----------
9
SQL>
edit
I went down a blind alley with analytics but the answer was depressingly obvious...
SQL> select count(distinct concat(deptno,job)) from emp
2 /
COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
9
SQL>
edit 2
Given the following data the concatenating solution provided above will miscount:
col1 col2
---- ----
A AA
AA A
So we to include a separator...
select col1 + '*' + col2 from t23
/
Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.
SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;
In MySQL you can do the same thing without the concatenation step as follows:
SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;
This feature is mentioned in the MySQL documentation:
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct
How about something like:
select count(*)
from
(select count(*) cnt
from DocumentOutputItems
group by DocumentId, DocumentSessionId) t1
Probably just does the same as you are already though but it avoids the DISTINCT.
Some SQL databases can work with a tuple expression so you can just do:
SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
FROM DocumentOutputItems;
If your database doesn't support this, it can be simulated as per #oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.
COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).
MySQL specifically supports COUNT(DISTINCT expr, expr, ...) which is non-SQL standard syntax. It also notes In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).
A related use of tuples is performing IN queries such as:
SELECT * FROM DocumentOutputItems
WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));
Here's a shorter version without the subselect:
SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems
It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.
Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.
I have used this approach and it has worked for me.
SELECT COUNT(DISTINCT DocumentID || DocumentSessionId)
FROM DocumentOutputItems
For my case, it provides correct result.
There's nothing wrong with your query, but you could also do it this way:
WITH internalQuery (Amount)
AS
(
SELECT (0)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
FROM internalQuery
If you're working with datatypes of fixed length, you can cast to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...
SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems
My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. And due to the complexity, statistics simply wasn't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly as a result of the various data types, and I couldn't risk its unreliability.
However, using the above solution had virtually no increase on the query time (comparing with using simply the SUM), and should be completely reliable! It should be able to help others in a similar situation so I'm posting it here.
if you had only one field to "DISTINCT", you could use:
SELECT COUNT(DISTINCT DocumentId)
FROM DocumentOutputItems
and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:
SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId))
FROM DocumentOutputItems
but you'll have issues if NULLs are involved. I'd just stick with the original query.
Hope this works i am writing on prima vista
SELECT COUNT(*)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.
At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.
Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using
SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))
but this is also not conforting.
You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.
I would suggest using
SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems
I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)
SELECT COUNT(DISTINCT DocumentID) AS Count1,
COUNT(DISTINCT DocumentSessionId) AS Count2
FROM DocumentOutputItems
How about this,
Select DocumentId, DocumentSessionId, count(*) as c
from DocumentOutputItems
group by DocumentId, DocumentSessionId;
This will get us the count of all possible combinations of DocumentId, and DocumentSessionId
It works for me. In oracle:
SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;
In jpql:
SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;
I had a similar question but the query I had was a sub-query with the comparison data in the main query. something like:
Select code, id, title, name
(select count(distinct col1) from mytable where code = a.code and length(title) >0)
from mytable a
group by code, id, title, name
--needs distinct over col2 as well as col1
ignoring the complexities of this, I realized I couldn't get the value of a.code into the subquery with the double sub query described in the original question
Select count(1) from (select distinct col1, col2 from mytable where code = a.code...)
--this doesn't work because the sub-query doesn't know what "a" is
So eventually I figured out I could cheat, and combine the columns:
Select count(distinct(col1 || col2)) from mytable where code = a.code...
This is what ended up working
This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in MySQL like a charm.
select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)
from DocumentOutputItems
group by i ,s;
You can just use the Count Function Twice.
In this case, it would be:
SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId)
FROM DocumentOutputItems