For the following SQL:
CREATE or replace TABLE
temp.t1 ( a STRING)
;
insert into temp.t1 values ('val_a');
CREATE or replace TABLE
temp.t2 (b STRING)
;
insert into temp.t2 values ('val_b');
create or replace table `temp.a1` as
select distinct b
from temp.t2
;
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`)
;
Since there is no column a in temp.a1, I expected an error here. However, the output of BigQuery is:
Row a
1 val_a
Why does this happen?
On the other hand, when I run select distinct a from temp.a1; by itself, the error Unrecognized name: a comes up.
Your query is:
select distinct a
from `temp.t1`
where a in (select distinct a from `temp.a1`);
You think this should be:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct a1.a from `temp.a1` a1);
That query would generate an error. However, the rules of SQL interpret your query as:
select distinct t1.a
from `temp.t1` t1
where t1.a in (select distinct t1.a from `temp.a1` a1);
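This is because of SQL's scoping rules: if an unqualified column such as a is not found in the tables of the subquery, it is looked up in the outer query, which turns the subquery into a correlated subquery.
That is how SQL defines name resolution, so it is not a BigQuery bug.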
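To see the outer-scope resolution in action, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite as a stand-in for BigQuery. It assumes SQLite follows the same scoping rule and uses simplified table names t1, t2 and a1 (no temp. dataset prefix). It also shows that the unqualified query matches only because a1 has at least one row.

```python
import sqlite3

# SQLite stand-in for the BigQuery tables in the question (names simplified).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a TEXT);
    INSERT INTO t1 VALUES ('val_a');
    CREATE TABLE t2 (b TEXT);
    INSERT INTO t2 VALUES ('val_b');
    CREATE TABLE a1 AS SELECT DISTINCT b FROM t2;  -- a1 has only column b
""")

# a1 has no column "a", so the "a" inside the subquery binds to the outer t1.a:
# the subquery becomes correlated and yields t1.a once per row of a1.
query = "SELECT DISTINCT a FROM t1 WHERE a IN (SELECT DISTINCT a FROM a1)"
unqualified = conn.execute(query).fetchall()
print(unqualified)  # [('val_a',)]

# The match depends only on a1 having at least one row: empty a1 -> no rows.
conn.execute("DELETE FROM a1")
empty_a1 = conn.execute(query).fetchall()
print(empty_a1)  # []
```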
The solution? Always qualify column references. Qualify means to include the table alias in the reference.
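Qualifying the reference turns the silent match into an error. A minimal SQLite sketch of the same idea, with hypothetical tables t1 and a1 where a1 holds only column b:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a TEXT);
    INSERT INTO t1 VALUES ('val_a');
    CREATE TABLE a1 (b TEXT);
    INSERT INTO a1 VALUES ('val_b');
""")

# Every column reference carries its table alias, so a1.a cannot silently
# fall back to the outer t1.a.
qualified = """
    SELECT DISTINCT t1.a
    FROM t1
    WHERE t1.a IN (SELECT a1.a FROM a1)
"""
try:
    conn.execute(qualified)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # reports that the column a1.a does not exist
```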
Also note that SELECT DISTINCT is pointless in an IN subquery: IN only checks whether a value is present, so duplicates in the subquery do not produce duplicate rows in the result. You can drop the DISTINCT from the subquery.
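A small SQLite sketch, on hypothetical data with duplicates in the subquery's table, suggesting that DISTINCT does not change the result of IN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a TEXT);
    INSERT INTO t1 VALUES ('x'), ('y'), ('z');
    CREATE TABLE a1 (a TEXT);
    -- duplicates in the table the subquery reads from
    INSERT INTO a1 VALUES ('x'), ('x'), ('y'), ('y'), ('y');
""")

with_distinct = conn.execute(
    "SELECT t1.a FROM t1 WHERE t1.a IN (SELECT DISTINCT a1.a FROM a1) ORDER BY t1.a"
).fetchall()
without_distinct = conn.execute(
    "SELECT t1.a FROM t1 WHERE t1.a IN (SELECT a1.a FROM a1) ORDER BY t1.a"
).fetchall()
print(with_distinct)     # [('x',), ('y',)]
print(without_distinct)  # [('x',), ('y',)]
```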
Table1:
Name, Value A, Value B, Value C
I would like to find the Name with the largest Value A, the Name with the largest Value B, and the Name with the largest Value C. Does anyone have a quick way to do this? The table itself is rather large, and I would really like to avoid scanning it once for each value.
Thank you!
If you have an index on each of the columns (a), (b), and (c), you can do:
select t.*
from t
where t.a = (select max(t2.a) from t t2) or
t.b = (select max(t2.b) from t t2) or
t.c = (select max(t2.c) from t t2) ;
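A minimal sketch of the OR version, run in SQLite on a few hypothetical rows (the columns name, a, b, c are assumptions standing in for Name, Value A, Value B, Value C). Whether each MAX() subquery is answered from its index depends on the engine's optimizer; SQLite, for one, can read a bare MAX() over an indexed column directly from the index.

```python
import sqlite3

# Hypothetical sample rows for the question's table (name, a, b, c).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, a INTEGER, b INTEGER, c INTEGER);
    CREATE INDEX t_a ON t (a);
    CREATE INDEX t_b ON t (b);
    CREATE INDEX t_c ON t (c);
    INSERT INTO t VALUES ('n1', 9, 1, 1), ('n2', 1, 8, 1), ('n3', 1, 1, 7), ('n4', 2, 2, 2);
""")

# One pass over t, comparing each row against the three per-column maxima.
rows = conn.execute("""
    SELECT t.*
    FROM t
    WHERE t.a = (SELECT MAX(t2.a) FROM t t2) OR
          t.b = (SELECT MAX(t2.b) FROM t t2) OR
          t.c = (SELECT MAX(t2.c) FROM t t2)
    ORDER BY t.name
""").fetchall()
names = [r[0] for r in rows]
print(names)  # ['n1', 'n2', 'n3']
```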
The WHERE clause should be able to make use of the indexes. If it does not, you can split this into separate queries combined with UNION ALL, something like this:
select t.*
from t
where t.a = (select max(t2.a) from t t2)
union all
select t.*
from t
where t.b = (select max(t2.b) from t t2)
union all
select t.*
from t
where t.c = (select max(t2.c) from t t2) ;
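One difference worth knowing: with UNION ALL, a row that holds the maximum of more than one column comes back once per such column, whereas the OR version returns it once; UNION (without ALL) removes those duplicates. A minimal SQLite sketch on hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, a INTEGER, b INTEGER, c INTEGER);
    -- 'n1' holds the maximum of both a and b
    INSERT INTO t VALUES ('n1', 9, 9, 1), ('n2', 1, 1, 7), ('n3', 2, 2, 2);
""")

def sorted_names(sql):
    """Run a query and return the sorted list of names it produced."""
    return sorted(row[0] for row in conn.execute(sql))

# One branch per column, each selecting the rows holding that column's maximum.
branches = [
    f"SELECT t.* FROM t WHERE t.{col} = (SELECT MAX(t2.{col}) FROM t t2)"
    for col in ("a", "b", "c")
]
union_all = sorted_names(" UNION ALL ".join(branches))
union = sorted_names(" UNION ".join(branches))
print(union_all)  # ['n1', 'n1', 'n2']
print(union)      # ['n1', 'n2']
```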
If column A is not empty, I need to apply one condition, and if it is empty, I need to apply another condition. Something like this:
select *
from table t
where case when len(t.A) > 0 then t.A = (select B from anothertable )
else t.C = (select D from anothertable)
As this does not compile, and I can't use an IF statement within WHERE, is there any other way to achieve this?
We can rephrase the logic in the WHERE clause to make it work:
SELECT *
FROM table_t t
WHERE
(LEN(t.A) > 0 AND t.A IN (SELECT B FROM anothertable)) OR
((t.A IS NULL OR LEN(t.A) = 0) AND t.C IN (SELECT D FROM anothertable));
To address the comment by #HoneyBadger: if the subqueries on anothertable return more than one row, this query would error out if we used t.A = (subquery). If you intend to use equals, you have to ensure that the subquery returns at most one row. Using IN, as suggested, avoids that error.
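Here is a minimal sketch of the OR rewrite run in SQLite on hypothetical rows. SQLite spells LEN() as LENGTH(), and the sketch treats a NULL A like an empty one, which is what the ELSE branch of the original CASE idea would do.

```python
import sqlite3

# Hypothetical rows; table and column names follow the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_t (id INTEGER, A TEXT, C TEXT);
    INSERT INTO table_t VALUES (1, 'x', 'c1'), (2, '', 'd1'), (3, NULL, 'd1'), (4, 'y', 'zz');
    CREATE TABLE anothertable (B TEXT, D TEXT);
    INSERT INTO anothertable VALUES ('x', 'd1');
""")

# Non-empty A must match B; empty or NULL A falls back to matching C against D.
rows = conn.execute("""
    SELECT t.id
    FROM table_t t
    WHERE (LENGTH(t.A) > 0 AND t.A IN (SELECT B FROM anothertable))
       OR ((t.A IS NULL OR LENGTH(t.A) = 0) AND t.C IN (SELECT D FROM anothertable))
    ORDER BY t.id
""").fetchall()
print(rows)  # [(1,), (2,), (3,)]
```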
I have been stuck on an SQL statement for two days now, and I hope you can help me with it.
The result of my SELECT is a list with four attributes A, B, C and D (below is an example list of five datasets):
1. A=1 B=100 C=200 D=300
2. A=2 B=200 C=100 D=300
3. A=3 B=300 C=200 D=100
4. A=3 B=100 C=100 D=200
5. A=3 B=300 C=100 D=200
The list should be reduced so that every value of A appears in the list only once.
In the example above, datasets 1 and 2 should be in the list, because A=1 and A=2 each occur only once.
For A=3, I have to build a query to identify the dataset that will be in the final list. These rules should apply:
Take the dataset with the highest value of B; if not distinct then
Take the dataset with the highest value of C; if not distinct then
Take the dataset with the highest value of D.
In the example above, dataset 3 should be taken.
The expected result is:
1. A=1 B=100 C=200 D=300
2. A=2 B=200 C=100 D=300
3. A=3 B=300 C=200 D=100
I hope you understand my problem. I've tried various versions of SELECT statements with HAVING and EXISTS (or NOT EXISTS), but my SQL knowledge isn't enough.
Probably there is an easier way to solve this problem, but this one works:
CREATE TEMP TABLE TEST (
A INTEGER,
B INTEGER,
C INTEGER,
D INTEGER
);
insert into TEST values (1,1,1,1);
insert into TEST values (2,1,5,1);
insert into TEST values (2,2,1,1);
insert into TEST values (3,1,4,1);
insert into TEST values (3,2,1,4);
insert into TEST values (3,2,3,1);
insert into TEST values (3,3,1,5);
insert into TEST values (3,3,2,3);
insert into TEST values (3,3,2,7);
insert into TEST values (3,3,3,1);
insert into TEST values (3,3,3,2);
select distinct
t1.A,
t2.B as B,
t3.C as C,
t4.D as D
from TEST t1
join (select A ,MAX (B) as B from TEST group by A)t2 on t2.A=t1.A
join (select A, B, MAX(C) as C from TEST group by A,B)t3 on t3.A=t2.A and t3.B=t2.B
join (select A, B, C, MAX (D) as D from TEST group by A,B,C)t4 on t4.A=t3.A and t4.B=t3.B and t4.C=t3.C;
Result:
a b c d
1 1 1 1
2 2 1 1
3 3 3 2
Tested on IBM Informix Dynamic Server Version 11.10.FC3.
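For readers without Informix, the same join-based query can be tried in SQLite, assuming its behavior carries over (SQLite accepts CREATE TEMP TABLE too). The join on t1 only multiplies rows that DISTINCT then collapses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE TEST (A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 5, 1), (2, 2, 1, 1), (3, 1, 4, 1), (3, 2, 1, 4),
     (3, 2, 3, 1), (3, 3, 1, 5), (3, 3, 2, 3), (3, 3, 2, 7), (3, 3, 3, 1), (3, 3, 3, 2)],
)

# t2: max B per A; t3: max C for that (A, B); t4: max D for that (A, B, C).
rows = conn.execute("""
    SELECT DISTINCT t1.A, t2.B, t3.C, t4.D
    FROM TEST t1
    JOIN (SELECT A, MAX(B) AS B FROM TEST GROUP BY A) t2 ON t2.A = t1.A
    JOIN (SELECT A, B, MAX(C) AS C FROM TEST GROUP BY A, B) t3
      ON t3.A = t2.A AND t3.B = t2.B
    JOIN (SELECT A, B, C, MAX(D) AS D FROM TEST GROUP BY A, B, C) t4
      ON t4.A = t3.A AND t4.B = t3.B AND t4.C = t3.C
    ORDER BY t1.A
""").fetchall()
print(rows)  # [(1, 1, 1, 1), (2, 2, 1, 1), (3, 3, 3, 2)]
```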
This type of prioritization query is most easily done with row_number(), but I don't think Informix supports that.
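Where window functions are available, a ROW_NUMBER() version could look like the sketch below, run here in SQLite (version 3.25 or newer is required for window functions) on the question's example data. It numbers the rows within each A by B, then C, then D, all descending, and keeps the first one.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 100, 200, 300), (2, 200, 100, 300), (3, 300, 200, 100),
     (3, 100, 100, 200), (3, 300, 100, 200)],
)

# rn = 1 marks the top row of each A under the B, C, D tie-breaking rules.
rows = conn.execute("""
    SELECT A, B, C, D
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY A ORDER BY B DESC, C DESC, D DESC) AS rn
          FROM t) ranked
    WHERE rn = 1
    ORDER BY A
""").fetchall()
print(rows)  # [(1, 100, 200, 300), (2, 200, 100, 300), (3, 300, 200, 100)]
```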
So, one method is to enumerate the rows using a correlated subquery:
select t.*
from (select t.*,
(select count(*)
from t t2
where t2.a = t.a and
((t2.b > t.b) or
(t2.b = t.b and t2.c > t.c) or
(t2.b = t.b and t2.c = t.c and t2.d > t.d))
) as NumGreater
from t
) t
where NumGreater = 0;
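A minimal SQLite sketch of the counting approach on the question's example data. The count has to be restricted to rows with the same A (t2.A = t.A), otherwise rows from other groups are counted too; the sketch includes that condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 100, 200, 300), (2, 200, 100, 300), (3, 300, 200, 100),
     (3, 100, 100, 200), (3, 300, 100, 200)],
)

# NumGreater counts the rows of the same A that rank higher under the rules;
# the winner of each group has none.
rows = conn.execute("""
    SELECT A, B, C, D
    FROM (SELECT t.*,
                 (SELECT COUNT(*)
                  FROM t t2
                  WHERE t2.A = t.A AND
                        (t2.B > t.B OR
                         (t2.B = t.B AND t2.C > t.C) OR
                         (t2.B = t.B AND t2.C = t.C AND t2.D > t.D))
                 ) AS NumGreater
          FROM t) ranked
    WHERE NumGreater = 0
    ORDER BY A
""").fetchall()
print(rows)  # [(1, 100, 200, 300), (2, 200, 100, 300), (3, 300, 200, 100)]
```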
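I have no idea about Informix, but you can try this. The same query written with TOP 1 instead of FIRST 1 works in SQL Server; maybe this version will also work in Informix. It assumes the table has a unique id column: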
select * from tablename t1
where id = (select first 1 id from tablename t2
where t2.A = t1.A order by B desc, C desc, D desc)
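A sketch of the same idea in SQLite, which uses LIMIT 1 where Informix uses FIRST 1 and SQL Server uses TOP 1. The id column is hypothetical (the question's table has none), so the sketch adds one as an INTEGER PRIMARY KEY; column references are qualified to avoid the outer-scope surprise described earlier.

```python
import sqlite3

# Hypothetical unique id column; SQLite assigns 1, 2, 3, ... on insert.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, A INTEGER, B INTEGER, C INTEGER, D INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename (A, B, C, D) VALUES (?, ?, ?, ?)",
    [(1, 100, 200, 300), (2, 200, 100, 300), (3, 300, 200, 100),
     (3, 100, 100, 200), (3, 300, 100, 200)],
)

# For each row, keep it only if it is the top-ranked row of its A group.
rows = conn.execute("""
    SELECT t1.id, t1.A, t1.B, t1.C, t1.D
    FROM tablename t1
    WHERE t1.id = (SELECT t2.id FROM tablename t2
                   WHERE t2.A = t1.A
                   ORDER BY t2.B DESC, t2.C DESC, t2.D DESC
                   LIMIT 1)
    ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 1, 100, 200, 300), (2, 2, 200, 100, 300), (3, 3, 300, 200, 100)]
```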
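SELECT A, MAX(B) AS B, MAX(C) AS C, MAX(D) AS D
FROM table_name
GROUP BY A
Note that this takes each maximum independently, so for A=3 in the question's example it returns B=300 C=200 D=200, which is not one of the original datasets.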
Can a query that uses GROUP BY/HAVING clauses be rewritten as a query that uses only SELECT/FROM/WHERE clauses?
TABLE T(a, b, c)
SELECT a, sum(c)
FROM T
WHERE b>10
GROUP BY a
HAVING sum(c)>5
I would appreciate it if you could explain in detail why it can (or cannot) be done.
You could, of course, resort to using window functions only, if your specific database supports those:
SELECT a, s
FROM (
SELECT DISTINCT a, sum(c) OVER (PARTITION BY a) s
FROM t
WHERE b > 10
) t2
WHERE s > 5
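A minimal sketch comparing the GROUP BY/HAVING query with the window-function version in SQLite (3.25+ for window functions), on hypothetical rows of a table t(a, b, c):

```python
import sqlite3

# Hypothetical rows; (1, 5, 100) is excluded by WHERE b > 10 before summing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 11, 3), (1, 12, 4), (1, 5, 100), (2, 20, 2), (2, 30, 1), (3, 15, 9)],
)

grouped = conn.execute("""
    SELECT a, SUM(c) FROM t WHERE b > 10 GROUP BY a HAVING SUM(c) > 5 ORDER BY a
""").fetchall()

# The window sum is computed per a over the filtered rows; DISTINCT then
# collapses the repeated (a, s) pairs, and the outer WHERE replaces HAVING.
windowed = conn.execute("""
    SELECT a, s
    FROM (SELECT DISTINCT a, SUM(c) OVER (PARTITION BY a) AS s
          FROM t
          WHERE b > 10) t2
    WHERE s > 5
    ORDER BY a
""").fetchall()
print(grouped)   # [(1, 7), (3, 9)]
print(windowed)  # [(1, 7), (3, 9)]
```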
Another option is to use correlated subqueries, which practically all databases support:
SELECT a, s
FROM (
SELECT DISTINCT a, (SELECT sum(t3.c) FROM t t3 WHERE t3.a = t1.a AND t3.b > 10) s
FROM t t1
WHERE b > 10
) t2
WHERE s > 5
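And the correlated-subquery version, checked against GROUP BY/HAVING on the same hypothetical rows in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 11, 3), (1, 12, 4), (1, 5, 100), (2, 20, 2), (2, 30, 1), (3, 15, 9)],
)

grouped = conn.execute("""
    SELECT a, SUM(c) FROM t WHERE b > 10 GROUP BY a HAVING SUM(c) > 5 ORDER BY a
""").fetchall()

# For each outer row, the subquery re-sums c over the qualifying rows of the
# same a; every column in the subquery is qualified with its alias.
correlated = conn.execute("""
    SELECT a, s
    FROM (SELECT DISTINCT a,
                 (SELECT SUM(t3.c) FROM t t3 WHERE t3.a = t1.a AND t3.b > 10) AS s
          FROM t t1
          WHERE b > 10) t2
    WHERE s > 5
    ORDER BY a
""").fetchall()
print(correlated)  # [(1, 7), (3, 9)]
```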
These alternatives would yield the same result without using GROUP BY or HAVING. But either of these would be (much) slower, and I don't really see the point...