EXCEPT query in SQL - sql

I am doing an except query in SQL like this
Q1 EXCEPT Q2 EXCEPT Q3
Where Q1, Q2 and Q3 are sub-query.
I just want to know what will be its output, (Q1-Q2)-Q3 or Q1-(Q2-Q3)? and if 2nd, how to get 1st one as output?

I don't have an installation of DB2, but in PostgreSQL, Q1 except Q2 except Q3 appears to be interpolated as (Q1 except Q2) except Q3 (note: generate_series(m,n) is a function found in PostgreSQL that generates a single column of integer values from m to n (where m<n, of course)):
select generate_series(1,10) except select generate_series(5,15) except select generate_series(10,20);
generate_series
-----------------
1
2
3
4
(4 rows)
select * from (select generate_series(1,10) except select generate_series(5,15))a except select generate_series(10,20);
generate_series
-----------------
1
2
3
4
(4 rows)
select generate_series(1,10) except select * from (select generate_series(5,15) except select generate_series(10,20))a;
generate_series
-----------------
1
2
3
4
10
(5 rows)
However, it's best to use parentheses to make certain that the order of evaluation is how you want it.

The SQL-92 Standard confirms (Annex C) what should be fairly intuitive: in the absence of parentheses, both occurrences of the EXCEPT in the sample code will (obviously) be at the same level of precedence, in which case they shall be evaluated in left-to-right order.
Out of interest, the precedence of EXCEPT, INTERSECT and UNION (parentheses aside) is implementation dependent, so consult the SQL product's documentation. However, a quick google seems to suggest that most SQL products (including DB2) have EXCEPT and UNION at the same level of precedence and INTERSECT at a higher level of precedence.

Related

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, MySQL ? If not is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say 10s), your MUL(column) is going to overflow any data type! With such a high probability of mis/ab-use, and very limited scope for use, it does not need to be a SQL Standard. As others have shown there are mathematical ways of working it out, just as there are many many ways to do tricky calculations in SQL just using standard (and common-use) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Care when using EXP/LOG in SQL Server, watch the return type http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number), and in cases where the result grows too large to turn it back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query
* LOG(0) and LOG(-ve) are undefined. The below shows only how to handle this in SQL Server. Equivalents can be found for the other SQL flavours, using the same concept
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
taking the abs() of data, if the min is 0, multiplying by whatever else is futile, the result is 0
When data is 0, NULLIF converts it to null. The abs(), log() both return null, causing it to be precluded from sum()
If data is not 0, abs allows us to multiple a negative number using the LOG method - we will keep track of the negativity elsewhere
Working out the final sign
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add another 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
again use NULLIF to remove from COUNT() the 1's, since we only need to count up the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
more mathematical tricks: we take 1 or 0 off 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
we multiple this final 1/-1 against the SUM-PRODUCT value for the real result
No, but you can use Mathematics :)
if yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
2 ( select 1 yourColumn from dual union all
3 select 2 from dual union all
4 select 4 from dual union all
5 select 8 from dual
6 )
7 select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
8 /
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about mssql and oracle, but i bet they have options to create custom aggregates as well.
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of numbers <= 0 that will fail when using LOG. I wrote a solution in this question that deals with this
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn -1= cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)

Grouping sets columns in aggregate arguments and NULL replacement

There are many grouping sets examples on the internet like query Q1 in the example below. But query Q2 is different because A2 is a grouping column and it is used as the argument to SUM().
Which one of the following is correct for Q2 according to the SQL Standard (any version since 2003 that supports grouping sets)? If (1) is correct, please explain why with reference to the Standard.
A2 is replaced by NULL unless it is in an argument to an aggregate. This interpretation would give results R1 below. This is Oracle's behaviour (which seems more useful).
A2 is replaced by NULL including where it is used in an aggregate: this means that the aggregate will return NULL.
This interpretation would give results R2 below. This is how I have understood the SQL Standard (possibly incorrectly).
Example code:
-- Setup
create table A (A1 int, A2 int, A3 int);
insert into A values (1, 1, 100);
insert into A values (1, 2, 40);
insert into A values (2, 1, 70);
insert into A values (5, 1, 90);
-- Query Q1
-- Expected/Observed results:
--
-- A1 A2 SUM(A3)
-- ---------- ---------- ----------
-- 1 - 140
-- 2 - 70
-- 5 - 90
-- - 1 260
-- - 2 40
-- - - 300
select A1, A2, sum (A3)
from A
group by grouping sets ((A1), (A2), ())
order by 1, 2;
-- Query Q2
-- Results R1 (Oracle):
-- A1 A2 SUM(A2)
-- ---------- ---------- ----------
-- 1 - 3
-- 2 - 1
-- 5 - 1
-- - 1 3
-- - 2 2
-- - - 5
--
-- Results R2 (SQL Standard?):
-- A1 A2 SUM(A2)
-- ---------- ---------- ----------
-- 1 - -
-- 2 - -
-- 5 - -
-- - 1 3
-- - 2 2
-- - - - -- NULL row
select A1, A2, sum (A2)
from A
group by grouping sets ((A1), (A2), ())
order by 1, 2;
I am aware of this from SQL 2003 7.9 Syntax 17, which describes how columns are replaced with NULLs. However, I might have missed or misunderstood a rule elsewhere that excludes arguments to aggregates.
m) For each GS_i:
iii) Case:
1) If GS_i is an <ordinary grouping set>, then
A) Transform SL2 to obtain SL3, and transform HC to obtain
HC3, as follows:
II) Replace each <column reference> in SL2 and HC that
references PC_k by "CAST(NULL AS DTPCk)"
As with many difficult SQL features, it can help to look at earlier versions of the standard where the phrasing might be simpler. And it turns out that grouping sets were introduced in SQL 1999 and were then revised in SQL 2003.
SQL 1999
Syntax Rule 4 states:
Let SING be the <select list> constructed by removing from SL every <select
sublist> that is not a <derived column> that contains at least one <set
function specification>.
Then Syntax Rule 11 defines PC_k as the column references contained in the group by list. It constructs a derived table projecting the union of GSQQL_i, which are query specifications projecting the PC_k or NULL as appropriate, the PCBIT_i grouping function indicators and SING.
Thus any that contains a set function will not have its argument replaced, and its columns won't be replaced either. So answer (1) is correct.
However, in the following query the GSQQL_i corresponding to the <grand total> doesn't group by C1 so I think it will give an error rather than replacing C1 with NULL for that grouping set.
select C1 + MAX(C2) from T group by grouping sets ((C1), ());
SQL 2003 - 2011
I still don't have a definitive answer for this. It hinges on what they meant (or forgot to specify?) by "references" in the replacement rule. It would be clearer if it said one of "immediately contained", "simply contained" or "directly contained", as defined in ISO 9075-1 (SQL Part 1: Framework).
The note (number 134 in SQL 2003) at the start of General Rules says "As a
result of the syntactic transformations specified in the Syntax Rules of this
Sub-clause, only primitive <group by clause>s are left to consider." So the
aggregate argument either has or has not actually been replaced: we aren't
expected to evaluate aggregates in a special way (whereas if General Rule 3 were in effect applied before the NULL substitution of Syntax Rule 17 then answer (1) would be correct).
I found a draft of Technical Corrigendum 5 [pdf], which is a "diff" towards SQL 2003. This includes the relevant changes to on pages 80-87. Unfortunately the bulk of the change has only the brief rationale "Provide a correct, unified treatment of CUBE and ROLLUP". General Rule 3, quoted above, has the rationale "clarify the semantics of column references".

How to group by a column

Hi I know how to use the group by clause for sql. I am not sure how to explain this so Ill draw some charts. Here is my original data:
Name Location
----------------------
user1 1
user1 9
user1 3
user2 1
user2 10
user3 97
Here is the output I need
Name Location
----------------------
user1 1
9
3
user2 1
10
user3 97
Is this even possible?
The normal method for this is to handle it in the presentation layer, not the database layer.
Reasons:
The Name field is a property of that data row
If you leave the Name out, how do you know what Location goes with which name?
You are implicitly relying on the order of the data, which in SQL is a very bad practice (since there is no inherent ordering to the returned data)
Any solution will need to involve a cursor or a loop, which is not what SQL is optimized for - it likes working in SETS not on individual rows
Hope this helps
SELECT A.FINAL_NAME, A.LOCATION
FROM (SELECT DISTINCT DECODE((LAG(YT.NAME, 1) OVER(ORDER BY YT.NAME)),
YT.NAME,
NULL,
YT.NAME) AS FINAL_NAME,
YT.NAME,
YT.LOCATION
FROM YOUR_TABLE_7 YT) A
As Jirka correctly pointed out, I was using the Outer select, distinct and raw Name unnecessarily. My mistake was that as I used DISTINCT , I got the resulted sorted like
1 1
2 user2 1
3 user3 97
4 user1 1
5 3
6 9
7 10
I wanted to avoid output like this.
Hence I added the raw id and outer select
However , removing the DISTINCT solves the problem.
Hence only this much is enough
SELECT DECODE((LAG(YT.NAME, 1) OVER(ORDER BY YT.NAME)),
YT.NAME,
NULL,
YT.NAME) AS FINAL_NAME,
YT.LOCATION
FROM SO_BUFFER_TABLE_7 YT
Thanks Jirka
If you're using straight SQL*Plus to make your report (don't laugh, you can do some pretty cool stuff with it), you can do this with the BREAK command:
SQL> break on name
SQL> WITH q AS (
SELECT 'user1' NAME, 1 LOCATION FROM dual
UNION ALL
SELECT 'user1', 9 FROM dual
UNION ALL
SELECT 'user1', 3 FROM dual
UNION ALL
SELECT 'user2', 1 FROM dual
UNION ALL
SELECT 'user2', 10 FROM dual
UNION ALL
SELECT 'user3', 97 FROM dual
)
SELECT NAME,LOCATION
FROM q
ORDER BY name;
NAME LOCATION
----- ----------
user1 1
9
3
user2 1
10
user3 97
6 rows selected.
SQL>
I cannot but agree with the other commenters that this kind of problem does not look like it should ever be solved using SQL, but let us face it anyway.
SELECT
CASE main.name WHERE preceding_id IS NULL THEN main.name ELSE null END,
main.location
FROM mytable main LEFT JOIN mytable preceding
ON main.name = preceding.name AND MIN(preceding.id) < main.id
GROUP BY main.id, main.name, main.location, preceding.name
ORDER BY main.id
The GROUP BY clause is not responsible for the grouping job, at least not directly. In the first approximation, an outer join to the same table (LEFT JOIN below) can be used to determine on which row a particular value occurs for the first time. This is what we are after. This assumes that there are some unique id values that make it possible to arbitrarily order all the records. (The ORDER BY clause does NOT do this; it orders the output, not the input of the whole computation, but it is still necessary to make sure that the output is presented correctly, because the remaining SQL does not imply any particular order of processing.)
As you can see, there is still a GROUP BY clause in the SQL, but with a perhaps unexpected purpose. Its job is to "undo" a side effect of the LEFT JOIN, which is duplication of all main records that have many "preceding" ( = successfully joined) records.
This is quite normal with GROUP BY. The typical effect of a GROUP BY clause is a reduction of the number of records; and impossibility to query or test columns NOT listed in the GROUP BY clause, except through aggregate functions like COUNT, MIN, MAX, or SUM. This is because these columns really represent "groups of values" due to the GROUP BY, not just specific values.
If you are using SQL*Plus, use the BREAK function. In this case, break on NAME.
If you are using another reporting tool, you may be able to compare the "name" field to the previous record and suppress printing when they are equal.
If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
Relying on implicit GROUP BY sorting in MySQL 5.6 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause. GROUP BY sorting is a MySQL extension that may change in a future release; for example, to make it possible for the optimizer to order groupings in whatever manner it deems most efficient and to avoid the sorting overhead.
For full information - http://academy.comingweek.com/sql-groupby-clause/
SQL GROUP BY STATEMENT
SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups.
Syntax:
1. SELECT column_nm, aggregate_function(column_nm) FROM table_nm WHERE column_nm operator value GROUP BY column_nm;
Example :
To understand the GROUP BY clauserefer the sample database.Below table showing fields from “order” table:
1. |EMPORD_ID|employee1ID|customerID|shippers_ID|
Below table showing fields from “shipper” table:
1. | shippers_ID| shippers_Name |
Below table showing fields from “table_emp1” table:
1. | employee1ID| first1_nm | last1_nm |
Example :
To find the number of orders sent by each shipper.
1. SELECT shipper.shippers_Name, COUNT (orders.EMPORD_ID) AS No_of_orders FROM orders LEFT JOIN shipper ON orders.shippers_ID = shipper.shippers_ID GROUP BY shippers_Name;
1. | shippers_Name | No_of_orders |
Example :
To use GROUP BY statement on more than one column.
1. SELECT shipper.shippers_Name, table_emp1.last1_nm, COUNT (orders.EMPORD_ID) AS No_of_orders FROM ((orders INNER JOIN shipper ON orders.shippers_ID=shipper.shippers_ID) INNER JOIN table_emp1 ON orders.employee1ID = table_emp1.employee1ID)
2. GROUP BY shippers_Name,last1_nm;
| shippers_Name | last1_nm |No_of_orders |
for more clarification refer my link
http://academy.comingweek.com/sql-groupby-clause/

How can I use an alias inside a sub-query?

What I am trying to achieve is to get the rows having the maximum value from a table of the following form:
A | B | Ratio
0 | 1 | 1.0
1 | 2 | 1.0
2 | 3 | 1.0
3 | 4 | 0.33
4 | 5 | 0.25
I am trying to display only rows containing the maximum value (in this case 1.0). May be I am not doing this right. I have a query of the form:
SELECT A,B,C
FROM (---Long Sub Query--- Aliased as Full_Table)
WHERE RATIO=(SELECT MAX(RATIO) FROM Full_Table);
But Full_Table cannot be referenced from the second sub-query. There are some rows having the same maximum value which is the reason I was using this query. Is there a better construct to achieve this? In the worst case, I have to replace the second Full_Table by the entire long query but I'm hoping there is a better way to do this.
You can use a Common Table Expression:
WITH Full_Table AS (---Long Sub Query---)
SELECT A,B,C
FROM Full_Table
WHERE RATIO=(SELECT MAX(RATIO) FROM Full_Table);
Use:
SELECT full_table.a,
full_table.b,
full_table.c
FROM (SELECT ...,
RANK() OVER (ORDER BY ratio DESC) AS rank
FROM Sub Query---) full_table
WHERE full_table.rank = 1
It's not clear if there can be more than one record returned, so I used RANK() rather than ROW_NUMBER() because ROW_NUMBER would only return one record.
You could incorporate that into a WITH clause, but it's still one pass over the derived table/inline view vs your two passes...
Oracle 9i+ supports the WITH syntax, calling it "Subquery Factoring". Those coming from SQL Server 2005+ know the WITH syntax as a Common Table Expression (CTE). Unlike SQL Server's implementation, the WITH syntax on Oracle 9i - 11g is not recursive - Oracle only added recursive WITH support (now that it's ANSI) in 11g R2, in part due to Oracle supporting recursive functionality with the CONNECT BY syntax (supported since Oracle v2). WITH syntax is syntactic sugar for derived tables/inline views - the query plan isn't reused for each instance.
WITH full_table AS (
SELECT...)
SELECT x.a, x.b, x.c
FROM full_table x
JOIN (SELECT MAX(t.ratio) AS max_ratio
FROM full_table t) y ON y.max_ratio = x.ratio
...is the identical to using:
SELECT x.a, x.b, x.c
FROM (SELECT ...) x
JOIN (SELECT MAX(t.ratio) AS max_ratio
FROM (SELECT ...) t) y ON y.max_ratio = x.ratio

Is it possible to SELECT multiple constants into multiple resultset rows in SQL?

I know I can "SELECT 5 AS foo" and get the resultset:
foo
5
(1 row)
...is there any way to "SELECT 5,6,7 AS foo" and get the resultset:
foo
5
6
7
(3 rows)
...I'm well aware that this is not typical DB usage, and any conceivable usage of this is probably better off going w/ a more ordinary technique. More of a technical question.
Note: I know I could use a big gross list of UNIONs -- I'm trying to find something else.
this is easy with a number table, here is an example
select number as foo
from master..spt_values
where type = 'p'
and number between 5 and 7
or if you want to use in
select number as foo
from master..spt_values
where type = 'p'
and number in(5,6,7)
select foo
from (select 1 as n1, 2 as n2, 3 as n3) bar
unpivot (foo for x in (n1, n2, n3)) baz;
It's possible using these and other techniques (as anyone who has interviewed for a database developer's position will tell you). But it's usually easier (and the tools are more appropriate) to do this in another abstraction layer; i.e. your DAL, or beyond, where you view the data as a list of some kind. Although rdbms products provide facilitators, it's a distortion of the relational conceptual model.
Just for fun, wouldn't dream of using it for real:
WITH numbers AS
(
SELECT ROW_NUMBER() OVER (ORDER BY name) AS 'RowNumber'
FROM sys.all_objects
)
SELECT RowNumber
FROM numbers
WHERE RowNumber BETWEEN 5 AND 7;