select distinct over specific columns - sql

A query in a system I maintain returns
QID AID DATA
1 2 x
1 2 y
5 6 t
As per a new requirement, I do not want the (QID, AID) = (1, 2) pair to be repeated. We also don't care which value is selected from the "data" column; either x or y will do.
What I have done is to enclose the original query like this
SELECT * FROM (<original query text>) Results group by QID,AID
Is there a better way to go about this? The original query uses multiple joins and unions and whatnot, so I would prefer not to touch it unless it's absolutely necessary.

If you don't care which DATA value is selected, GROUP BY works well. Note, though, that selecting ungrouped and unaggregated columns in a GROUP BY query is a nonstandard extension (MySQL permits it unless ONLY_FULL_GROUP_BY is enabled) and is not portable.
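A portable variant wraps DATA in an aggregate such as MIN, so every selected column is either grouped or aggregated. The sketch below is only an illustration: it runs the SQL through Python's sqlite3 module, and the table name results is a hypothetical stand-in for the original query text.

```python
import sqlite3

# Hypothetical stand-in for the result set of the original query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (qid INTEGER, aid INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(1, 2, "x"), (1, 2, "y"), (5, 6, "t")],
)

# MIN(data) picks one value per (qid, aid) pair, so the statement stays
# valid in engines that reject bare columns in a GROUP BY query.
rows = conn.execute(
    """
    SELECT qid, aid, MIN(data) AS data
    FROM (SELECT qid, aid, data FROM results) AS r
    GROUP BY qid, aid
    ORDER BY qid, aid
    """
).fetchall()
print(rows)
```

The inner SELECT plays the role of the untouched original query, so the wrapping approach from the question still applies.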

Related

How can select different records from a list with duplicated values?

I'm new to the SQL/Oracle universe and I would like to ask for your help. This is a very simple question that I'm stuck on.
So, let me give you a picture. I have a regular table, let's call it "table1". The PK is the first column, "c1". Let's suppose that I would like to make the following select:
select (1) from table1 where c1 in ('1','2','3')
This will give me
  (1)
1 1
2 1
3 1
However, if I make the following select
select (1) from table1 where c1 in ('1','2','2')
this will give me
  (1)
1 1
2 1
My question is: why doesn't the second case return 3 records? Can I modify the second query to return 3 records? In other words, how can I prevent the selection from acting like a "distinct" clause?
I know this may be a silly question, so let me thank you all in advance.
The where clause filters rows generated by the from clause.
Conditions in the where clause only specify whether or not a given row is in the result set. They do not specify how many times a given row is in the result set.
If you want to "multiply" the number of rows, you would need to use a join with a derived table that has duplicate values.
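As a rough sketch of that idea (run here through Python's sqlite3 module rather than Oracle, with made-up rows for table1), the derived table below lists '2' twice, and the join returns one row per listed value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (c1 TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("1",), ("2",), ("3",)])

# IN only decides whether a row qualifies; the repeated '2' adds nothing.
filtered = conn.execute(
    "SELECT 1 FROM table1 WHERE c1 IN ('1', '2', '2')"
).fetchall()

# A derived table with a duplicated value multiplies the matching row.
joined = conn.execute(
    """
    SELECT 1
    FROM table1 AS t
    JOIN (SELECT '1' AS c1
          UNION ALL SELECT '2'
          UNION ALL SELECT '2') AS wanted
      ON wanted.c1 = t.c1
    """
).fetchall()
print(len(filtered), len(joined))
```

UNION ALL matters here: a plain UNION would collapse the two '2' entries back into one.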

Remove duplicate rows on a SQL query [duplicate]

I have a SQL query that returns a table with one column.
The returned data may contain duplicates. For example, my query may be something like:
SELECT item FROM myTable WHERE something = 3
and the returned data may be something like this:
item
-----
2
1
4
5
1
9
5
My question is: how do I remove duplicated items from my query result?
I mean, I want to get these results:
item
-----
2
1
4
5
9
Please note that I don't want to change or delete any rows in table. I just want to remove duplicates in that query.
How to do that?
SELECT DISTINCT item FROM myTable WHERE something = 3
As noted, the distinct keyword eliminates duplicate rows (rows with identical values in every selected column) from the result set.
However, for a non-trivial query against a properly designed database, the presence of duplicate rows in the result set, and their elimination via select distinct or select ... group by, is, IMHO, most often a "code smell" indicating improper or incorrect join criteria, or a lack of understanding of the cardinalities present in relationships between tables.
If I'm reviewing the code, a select distinct or gratuitous group by with no obvious need will get the containing query flagged and gone over with a fine-toothed comb.
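For the simple single-column case in the question, a minimal sketch of the effect looks like this. It uses Python's sqlite3 module with the sample values from the question; the column layout of myTable is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (item INTEGER, something INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, 3)",
    [(2,), (1,), (4,), (5,), (1,), (9,), (5,)],
)

plain = [r[0] for r in conn.execute(
    "SELECT item FROM myTable WHERE something = 3")]
distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT item FROM myTable WHERE something = 3")]

# DISTINCT only affects the result set; the table still holds all rows.
stored = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
print(len(plain), sorted(distinct), stored)
```

The order of the distinct rows is not guaranteed without an ORDER BY, which is why the sketch sorts them before printing.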
You need to add the DISTINCT keyword to your query.
This keyword is pretty standard and supported on all major databases.
SELECT DISTINCT item FROM myTable WHERE something = 3
You just have to use distinct

Custom Sorting in SQL order by clause?

Here is the situation that I am trying to solve:
I have a query that could return a set of records. The field being sorted by could have a number of different values - for the sake of this question we will say that the value could be A, B, C, D, E or Z
Now depending on the results of the query, the sorting needs to behave as follows:
If only A-E records are found then sorting them "naturally" is okay. But if a Z record is in the results, then it needs to be the first result in the query, but the rest of the records should be in "natural" sort order.
For instance, if A C D are found, then the result should be
A
C
D
But if A B D E Z are found then the result should be sorted:
Z
A
B
D
E
Currently, the query looks like:
SELECT NAME, SOME_OTHER_FIELDS FROM TABLE ORDER BY NAME
I know I can code a sort function to do what I want, but I can't use one here because the results are handled by a third-party library to which I just pass the SQL query. The library runs the query and processes the results itself; there seem to be no hooks that would let me sort the results before handing them over, and I have no access to the library's source code.
So for all of you SQL gurus out there, can you provide a query for me that will do what I want?
How do you identify the Z record? What sets it apart? Once you understand that, add it to your ORDER BY clause.
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
    (
        CASE
            WHEN (record matches Z) THEN 0
            ELSE 1
        END
    ),
    name
This way, only the Z record will match the first ordering, and all other records will be sorted by the second-order sort (name). You can exclude the second-order sort if you really don't need it.
For example, if Z is the character string 'Bob', then your query might be:
SELECT name, *
FROM [table]
WHERE (x)
ORDER BY
    (
        CASE
            WHEN name='Bob' THEN 0
            ELSE 1
        END
    ),
    name
My examples are for T-SQL, since you haven't mentioned which database you're using.
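To check the behaviour against the two cases in the question, here is a small sketch that runs the same CASE ordering in SQLite through Python's sqlite3 module. The table t and the helper sorted_names are hypothetical names used only for the illustration.

```python
import sqlite3

def sorted_names(names):
    """Return names ordered with 'Z' first, the rest in natural order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        """
        SELECT name
        FROM t
        ORDER BY CASE WHEN name = 'Z' THEN 0 ELSE 1 END, name
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

without_z = sorted_names(["D", "A", "C"])
with_z = sorted_names(["E", "Z", "B", "A", "D"])
print(without_z, with_z)
```

When no Z row is present, every row gets the same first sort key, so the result falls back to plain ordering by name.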
There are a number of ways to solve this problem and the best solution depends on a number of factors that you don't discuss such as the nature of those A..Z values and what database product you're using.
If you have only a single value that has to sort on top, you can ORDER BY an expression that maps that value to the lowest possible sort value (with CASE or IIF or IFEQ, depending on your database).
If you have several different special sort values you could ORDER BY a more complicated expression or you could UNION together several SELECTs, with one SELECT for the default sorts and an extra SELECT for each special value. The SELECTs would include a sort column.
Finally, if you have quite a few values you can put the sort values into a separate table and JOIN that table into your query.
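The separate-table option could look roughly like the following sketch (SQLite via Python's sqlite3 module). The table sort_priority and its priority column are invented for the example; values without an entry fall back to a default priority through COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in "ABDEZ"])

# Hypothetical lookup table: only the special values need an entry.
conn.execute(
    "CREATE TABLE sort_priority (name TEXT PRIMARY KEY, priority INTEGER)"
)
conn.execute("INSERT INTO sort_priority VALUES ('Z', 0)")

rows = conn.execute(
    """
    SELECT t.name
    FROM t
    LEFT JOIN sort_priority AS s ON s.name = t.name
    ORDER BY COALESCE(s.priority, 1), t.name
    """
).fetchall()
ordered = [r[0] for r in rows]
print(ordered)
```

Adding another special value then only means inserting a row into the lookup table, not editing the query.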
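Not sure what DB you use. The following works for Oracle, assuming a binary sort order:
SELECT
    NAME,
    SOME_OTHER_FIELDS,
    -- Map 'Z' to a space, which sorts before every letter in a binary sort
    -- (a character such as '_' would sort after the uppercase letters).
    DECODE(NAME, 'Z', ' ', NAME) SORTFIELD
FROM TABLE
ORDER BY DECODE(NAME, 'Z', ' ', NAME) ASC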

distinct values from multiple fields within one table ORACLE SQL

How can I get distinct values from multiple fields within one table with just one request?
Option 1
SELECT WM_CONCAT(DISTINCT(FIELD1)) FIELD1S,WM_CONCAT(DISTINCT(FIELD2)) FIELD2S,..FIELD10S
FROM TABLE;
WM_CONCAT is LIMITED
Option 2
select DISTINCT(FIELD1) FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT(FIELD2) FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... FIELD 10
is just too slow
If you are scanning only a small range of the data (not the whole table), you could use WITH to optimise your query, e.g.:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had
WL1 ... WL2 ... correlation
A B 0.8
B A 0.8
A C 0.9
C A 0.9
How can the symmetry be eliminated from this table?
select WL1, WL2, correlation
from table
where least(WL1,WL2)||greatest(WL1,WL2) = WL1||WL2
order by WL1
This gives
WL1 ... WL2 ... correlation
A B 0.8
A C 0.9
:)
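The concatenation test keeps a row when WL1 is not greater than WL2, so under that reading the same filter can be written as a direct comparison. This is a sketch in SQLite via Python's sqlite3 module; the table name corr is made up, and it assumes every pair is stored in both orientations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE corr (wl1 TEXT, wl2 TEXT, correlation REAL)")
conn.executemany(
    "INSERT INTO corr VALUES (?, ?, ?)",
    [("A", "B", 0.8), ("B", "A", 0.8), ("A", "C", 0.9), ("C", "A", 0.9)],
)

# Keep only the orientation where wl1 sorts first; the mirrored row
# (wl1 > wl2) is dropped.
rows = conn.execute(
    """
    SELECT wl1, wl2, correlation
    FROM corr
    WHERE wl1 <= wl2
    ORDER BY wl1, wl2
    """
).fetchall()
print(rows)
```

Comparing the columns directly also sidesteps cases where two different pairs concatenate to the same string, such as 'B' || 'BB' and 'BB' || 'B'.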
The best option in SQL is UNION, though you may be able to save some performance by taking out the distinct keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION returns the unique set of rows from the two queries, so distinct is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It has to read every row of the table twice and sort for uniqueness, which is exactly what UNION will do.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
Incidentally, your second query doesn't appear to do what you want it to. Distinct always applies to all of the columns in the select section, so your constants with the field names will cause the query to always return separate rows for the two columns.
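Both points can be seen on a tiny example. The sketch below uses SQLite through Python's sqlite3 module with invented rows: the plain UNION yields each value once, while the variant with the field-name constants keeps a value once per column it appears in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c")],
)

# UNION already removes duplicates across both branches.
union_values = sorted(r[0] for r in conn.execute(
    "SELECT field1 FROM t UNION SELECT field2 FROM t"))

# With a constant tag per branch, 'b' survives twice: once per column.
tagged = sorted(conn.execute(
    """
    SELECT DISTINCT field1, 'FIELD1' FROM t
    UNION
    SELECT DISTINCT field2, 'FIELD2' FROM t
    """
).fetchall())
print(union_values, tagged)
```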
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the union as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
FROM DUAL
CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
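CONNECT BY LEVEL is Oracle syntax, so for trying the cross-join idea elsewhere, a two-row derived table can stand in for the row generator. The sketch below does that in SQLite via Python's sqlite3 module with invented data; it says nothing about relative performance, it only shows that the result matches the UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c")],
)

# Each table row is paired with lvl 1 and lvl 2; CASE picks one column
# per copy, and DISTINCT removes the duplicates.
cross_values = sorted(r[0] for r in conn.execute(
    """
    SELECT DISTINCT CASE n.lvl WHEN 1 THEN t.field1 ELSE t.field2 END
    FROM t
    CROSS JOIN (SELECT 1 AS lvl UNION ALL SELECT 2) AS n
    """
))

union_values = sorted(r[0] for r in conn.execute(
    "SELECT field1 FROM t UNION SELECT field2 FROM t"))
print(cross_values, union_values)
```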
When you get distinct values from multiple columns, the result does not form a single table. Consider the following data:
Column A Column B
10 50
30 50
10 50
Taking the distinct values gives 2 rows from the first column and 1 row from the second column. It simply won't work.
And something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...

SQL - Use results of a query as basis for two other queries in one statement

I'm doing a probability calculation. I have a query to calculate the total number of times an event occurs. From these events, I want to get the number of times a sub-event occurs. The query to get the total events is 25 lines long and I don't want to just copy + paste it twice.
I want to do two things to this query: calculate the number of rows in it, and calculate the number of rows in the result of a query on this query. Right now, the only way I can think of doing that is this (replace #total# with the complicated query to get all rows, and #conditions# with the less-complicated conditions that rows, from #total#, must have to match the sub-event):
SELECT (SELECT COUNT(*) FROM (#total#) AS t1 WHERE #conditions#) AS suboccurs,
COUNT(*) AS totaloccurs FROM (#total#) as t2
As you notice, #total# is repeated twice. Is there any way around this? Is there a better way to do what I'm trying to do?
To re-emphasize: #conditions# does depend on what #total# returns (it does stuff like t1.foo = bar).
Some final notes: #total# by itself takes ~250ms. This more complicated query takes ~300ms, so postgres is likely doing some optimization, itself. Still, the query looks terribly ugly with #total# literally pasted in twice.
If your SQL dialect supports subquery factoring, rewriting the query with a WITH clause is an option. It lets a subquery be referenced more than once. In Oracle, WITH creates each one as either an inline view or a temporary table.
Here is a contrived example.
WITH
x AS
(
    SELECT this
    FROM THERE
    WHERE something is true
),
y AS
(
    SELECT this-other-thing
    FROM somewhereelse
    WHERE something else is true
),
z AS
(
    select count(*) k
    FROM X
)
SELECT z.k, y.*, x.*
FROM x, y, z
WHERE X.abc = Y.abc
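Applied to the question, the long query would be written once as a named subquery and then counted twice. The following is a sketch only, run in SQLite through Python's sqlite3 module; the events table and the foo = 'bar' condition are stand-ins for #total# and #conditions#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, foo TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "bar"), (2, "baz"), (3, "bar"), (4, "qux")],
)

row = conn.execute(
    """
    WITH total AS (
        SELECT id, foo FROM events      -- stands in for #total#
    )
    SELECT (SELECT COUNT(*) FROM total WHERE foo = 'bar') AS suboccurs,
           (SELECT COUNT(*) FROM total) AS totaloccurs
    """
).fetchone()
print(row)
```

The text of the long query appears only once; whether the engine also evaluates it only once depends on the database.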
SELECT COUNT(*) AS totaloccurs,
COUNT(CASE WHEN #conditions# THEN 1 END) AS suboccurs
FROM (#total#) AS t1
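One thing to watch with the single-pass approach: COUNT(expr) counts every row where expr is not NULL, so a condition that evaluates to false is still counted. Wrapping the condition in CASE makes the non-matching rows NULL. A small sketch in SQLite via Python's sqlite3 module, with an invented events table and condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, foo TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "bar"), (2, "baz"), (3, "bar"), (4, "qux")],
)

row = conn.execute(
    """
    SELECT COUNT(*)                                AS totaloccurs,
           COUNT(CASE WHEN foo = 'bar' THEN 1 END) AS suboccurs,
           COUNT(foo = 'bar')                      AS non_null_conditions
    FROM (SELECT id, foo FROM events) AS t1
    """
).fetchone()
print(row)
```

The third column equals the total because the comparison is never NULL for these rows, which is why the bare COUNT of a condition does not give the sub-event count.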
Put the reused sub-query into a temp table, then select what you need from the temp table.
@EvilTeach:
I've not seen the "with" (probably not implemented in Sybase :-(). I like it: does what you need in one chunk then goes away, with even less cruft than temp tables. Cool.