Strange WHERE statement that behaves like a GROUP BY? - sql

I've come across a simple SQL query that should return a single row, but instead returns results like a GROUP BY statement.
Here is the query:
select
column_a,
column_b
from table_1 A1
where column_b = (
select MIN(column_b)
from table_1 A2
where A1.column_a = A2.column_a
)
;
And here is a table_1, the only table the query uses:
column_a column_b
a 3
a 2
a 1
b 5
b 4
c 6
The strange thing is that the query should only return a single row, column_a = "a" and column_b = "1", because the subquery will evaluate to "1".
But the actual result is the minimum for each letter in column_a. So:
a 1
b 4
c 6
Can anyone help me understand why the query is behaving like this?
I've set up a SQL Fiddle page with the example here: http://sqlfiddle.com/#!9/4b4d6f/1

This is a correlation clause:
where column_b = (select MIN(column_b)
from table_1 A2
where A1.column_a = A2.column_a
------------------------^
)
It connects the subquery to the outer query. You can think of this as looping through the rows of the outer table and keeping a row only when its column_b equals the minimum column_b for that row's column_a value. Note that the database may not actually execute the query as such a nested loop!
If you wanted the overall minimum, you would leave out the correlation clause:
where column_b = (select MIN(column_b)
from table_1 A2
)
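To see the difference between the two WHERE clauses, here is a minimal sketch that runs both against the sample data. It assumes an in-memory SQLite database driven from Python rather than the engine behind the asker's fiddle; the table and column names come from the question, while the Python variable names are only illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (column_a TEXT, column_b INTEGER)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [("a", 3), ("a", 2), ("a", 1), ("b", 5), ("b", 4), ("c", 6)],
)

# With the correlation clause: the minimum is computed per column_a value.
correlated = conn.execute(
    """
    SELECT column_a, column_b
    FROM table_1 A1
    WHERE column_b = (SELECT MIN(column_b)
                      FROM table_1 A2
                      WHERE A1.column_a = A2.column_a)
    ORDER BY column_a
    """
).fetchall()

# Without the correlation clause: a single minimum over the whole table.
overall = conn.execute(
    """
    SELECT column_a, column_b
    FROM table_1 A1
    WHERE column_b = (SELECT MIN(column_b) FROM table_1 A2)
    """
).fetchall()

print(correlated)  # [('a', 1), ('b', 4), ('c', 6)]
print(overall)     # [('a', 1)]
```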

That is a correlated subquery: it filters using the value of A1.column_a. That means that for each different value of column_a you will get a (potentially) different result.
This is why it looks like it is grouping by column_a.
You should also note that, because you are not grouping, you could get duplicate rows where there are ties for the minimum column_b within a value of column_a.
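The point about ties can be checked with a small sketch. It assumes SQLite via Python and a modified data set in which the minimum for "b" occurs twice; that data is made up for illustration and is not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (column_a TEXT, column_b INTEGER)")
# Hypothetical data: the minimum for 'b' (4) appears in two rows.
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 4), ("b", 4), ("b", 5)],
)

# The correlated filter keeps every row that matches its group's minimum.
filtered = conn.execute(
    """
    SELECT column_a, column_b
    FROM table_1 A1
    WHERE column_b = (SELECT MIN(column_b)
                      FROM table_1 A2
                      WHERE A1.column_a = A2.column_a)
    ORDER BY column_a
    """
).fetchall()

# A real GROUP BY collapses each group to exactly one row.
grouped = conn.execute(
    """
    SELECT column_a, MIN(column_b)
    FROM table_1
    GROUP BY column_a
    ORDER BY column_a
    """
).fetchall()

print(filtered)  # [('a', 1), ('b', 4), ('b', 4)]
print(grouped)   # [('a', 1), ('b', 4)]
```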

Related

If I have two columns, how can I select all distinct values of columnA where columnB is never a specific value given that columnA is the same value?

So let's say I have two columns:
A B
1 300
1 299
2 300
2 300
3 299
3 299
I want to look for distinct values of A such that there is never a combination of A and B where B equals 300.
In my example, I would want to return the columnA value 3.
Result
A
3
How do I accomplish this with SQL?
What you are looking for is called conditional aggregation. You want to aggregate by A (i.e. show A values in your result) and have a check only applied on particular B values. For instance:
select a
from mytable
group by a
having count(case when b = 300 then 1 end) = 0;
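To make the conditional count visible, the following sketch first lists the per-group count of rows with b = 300 and then applies the HAVING filter. It assumes SQLite via Python; mytable, a and b are the names used in the query above, and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 300), (1, 299), (2, 300), (2, 300), (3, 299), (3, 299)],
)

# CASE yields 1 for b = 300 and NULL otherwise; COUNT ignores the NULLs.
hits_per_group = conn.execute(
    """
    SELECT a, COUNT(CASE WHEN b = 300 THEN 1 END) AS hits
    FROM mytable
    GROUP BY a
    ORDER BY a
    """
).fetchall()

# Keep only the groups whose conditional count is zero.
never_300 = conn.execute(
    """
    SELECT a
    FROM mytable
    GROUP BY a
    HAVING COUNT(CASE WHEN b = 300 THEN 1 END) = 0
    """
).fetchall()

print(hits_per_group)  # [(1, 1), (2, 2), (3, 0)]
print(never_300)       # [(3,)]
```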
A simple subquery will give the results.
Exclude the rows using the NOT IN keyword:
SELECT DISTINCT ColumnA
FROM TABLE
WHERE ColumnA NOT IN(
SELECT ColumnA FROM Table WHERE ColumnB=300)
You can also use NOT EXISTS
SELECT DISTINCT A
FROM tbl t1
WHERE NOT EXISTS(
SELECT 1 FROM tbl t2 WHERE t2.A=t1.A AND t2.B=300)
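As a quick check that the two subquery forms agree on the sample data, here is a sketch assuming SQLite via Python. The table name tbl and the lowercase column names are placeholders chosen for the example. Note that the NOT EXISTS subquery has to correlate on the same column on both sides (t2.a = t1.a).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 300), (1, 299), (2, 300), (2, 300), (3, 299), (3, 299)],
)

# NOT IN: drop every a that appears in at least one row with b = 300.
via_not_in = conn.execute(
    """
    SELECT DISTINCT a
    FROM tbl
    WHERE a NOT IN (SELECT a FROM tbl WHERE b = 300)
    """
).fetchall()

# NOT EXISTS: keep a row only if no row with the same a has b = 300.
via_not_exists = conn.execute(
    """
    SELECT DISTINCT a
    FROM tbl t1
    WHERE NOT EXISTS (SELECT 1 FROM tbl t2
                      WHERE t2.a = t1.a AND t2.b = 300)
    """
).fetchall()

print(via_not_in)      # [(3,)]
print(via_not_exists)  # [(3,)]
```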

Sum and Compare columns in Hive?

This may be really simple, but I am new to Hive and don't know how to do this.
I have a sample dataset that looks like this:
column_A column_B column_C
1 1 0
1 1 0
1 0 1
Now, I need to find the sum of each column and then compare the sums to get the highest.
for example:
column_A column_B column_C
3 2 1
Output should be:
column_A
3
The query that I wrote is unable to sum each column and compare the sums to find the greatest among them.
SELECT (sum(column_A) as A,sum(column_B) as B,sum(column_C) as C) as xyz
from table_name where A IN (SELECT GREATEST(A,B,C) from xyz) ;
You can use greatest() after the aggregation:
SELECT greatest(sum(column_A), sum(column_B), sum(column_C))
from table_name;
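The same idea can be tried outside Hive. The sketch below assumes SQLite via Python, where the multi-argument scalar max() stands in for Hive's greatest(). Because the desired output also names the winning column, it additionally picks the column name on the Python side; that step is an illustrative extra, not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (column_A INTEGER, column_B INTEGER, column_C INTEGER)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 1, 0), (1, 0, 1)],
)

# Aggregate first, then compare the three sums. SQLite's scalar max(x, y, z)
# is used here as a stand-in for Hive's greatest(x, y, z).
greatest_sum = conn.execute(
    """
    SELECT max(a, b, c)
    FROM (SELECT sum(column_A) AS a, sum(column_B) AS b, sum(column_C) AS c
          FROM table_name)
    """
).fetchone()[0]

# Fetch the sums and find which column produced the largest one.
sums_row = conn.execute(
    "SELECT sum(column_A), sum(column_B), sum(column_C) FROM table_name"
).fetchone()
sums = dict(zip(("column_A", "column_B", "column_C"), sums_row))
top_column = max(sums, key=sums.get)

print(sums)                      # {'column_A': 3, 'column_B': 2, 'column_C': 1}
print(top_column, greatest_sum)  # column_A 3
```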

Simple WHERE clause but keep extracted rows and fill them with null values

I have a table which basically looks like this one:
Date | Criteria
12-04-2016 123
12-05-2016 1234
...
Now I want to select those rows with values in the column 'Criteria' within a given range but I want to keep the extracted rows. The extracted rows should get the value 'null' for the column 'Criteria'. So for example, if I want to select the row with 'Criteria = 123' my result should look like this:
Date | Criteria
12-04-2016 123
12-05-2016 null
Currently I am using this query to get the result:
SELECT b.date, a.criteria
FROM (SELECT id, date, criteria FROM ABC WHERE criteria > 100 and criteria < 200) a
FULL OUTER JOIN ABC b ON a.id = b.id ORDER BY a.criteria
Someone told me that full outer joins perform very badly. Plus, my table has about 400000 records and the query is used pretty often. Does anyone have an idea how to speed up my query? By the way, I am using an Oracle 11g database.
Do you just want a case expression?
SELECT date,
(case when criteria > 100 and criteria < 200 then criteria end) as criteria
FROM ABC;
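A small sketch of the CASE approach on the two sample rows, assuming SQLite via Python rather than Oracle 11g. The id values are made up, and the date column is double-quoted only as a precaution. A CASE without an ELSE branch returns NULL when no WHEN condition matches, which is what blanks out the out-of-range rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ABC (id INTEGER, "date" TEXT, criteria INTEGER)')
# Hypothetical ids; dates and criteria values are from the question.
conn.executemany(
    "INSERT INTO ABC VALUES (?, ?, ?)",
    [(1, "12-04-2016", 123), (2, "12-05-2016", 1234)],
)

# Every row is kept; criteria outside (100, 200) becomes NULL.
rows = conn.execute(
    """
    SELECT "date",
           CASE WHEN criteria > 100 AND criteria < 200 THEN criteria END AS criteria
    FROM ABC
    ORDER BY id
    """
).fetchall()

print(rows)  # [('12-04-2016', 123), ('12-05-2016', None)]
```

Since this reads the table once and needs no join, it avoids the full outer join entirely.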

SQL : Update table_1.column_b if table_1.column_a = table_2.column_a

I have two tables like these. I want to update table_1.column_b if table_1.column_a = table_2.column_a
table_1
column_a | column_b
----------------------------
X1 | 0
X2 | 0
X3 | 0
X4 | 0
X5 | 0
table_2
column_a
--------
X1
X2
X3
the result should be:
table_1
column_a | column_b
----------------------------
X1 | 1
X2 | 1
X3 | 1
X4 | 0
X5 | 0
update table_1
set column_b =
(
select count(*)
from table_2 s
where table_1.column_a = s.column_a
)
/* oracle can bug out when a subquery returns nothing, i.e. Null */
where exists
(
select 1
from table_2 s
where table_1.column_a = s.column_a
)
;
Extra info about the select 1 in where exists. It is just something I tend to add, just in case. It is defensive programming.
I am not entirely sure it is needed in this particular case, because I assume the count(*) subquery will return 0 for your X4 and X5 entries in table_1 above.
Suppose you were doing this instead. (The 1 in the select below isn't really important; it could be any number, or a numeric column from table_2 if you had one.) The important thing is that we are not doing a count(*) this time, so X4 and X5 will not get anything from table_2.
update table_1
set column_b =
(select 1
from table_2 s
where table_1.column_a = s.column_a
)
In this case, X4 and X5 will not get a subquery result and Oracle will try to assign null to table_1.column_b for those rows. If that column is set to NOT NULL, you will get an error.
By adding the where exists at the end of the query you are telling Oracle not to try updating table_1 where there is no matching table_2 row. So this update null issue never happens.
The basic idea for the boilerplate is to first qualify your update subquery as you need to update the column. Then repeat the conditions as a where exists at the end of the query to avoid Oracle trying an update where the subquery doesn't return anything.
Note that I could have left count(*) in the where exists subquery. However, I am only looking for the existence of a matching row and am not actually interested in a count there, so I optimize a bit by asking Oracle to select a 'cheaper' result set.
If you had multiple column updates with different subquery criteria, then that won't work. You'd have to split the update up into different queries. Or use Oracle NVL (https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions105.htm) to catch the NULLs and replace them.
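The effect of the where exists guard can be reproduced with a short sketch. It assumes SQLite via Python instead of Oracle, so it shows the NULL assignment itself rather than Oracle's NOT NULL error; fresh_db, UNGUARDED and GUARDED are hypothetical helper names.

```python
import sqlite3

def fresh_db():
    """Build the two sample tables from the question in a new in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (column_a TEXT, column_b INTEGER)")
    conn.execute("CREATE TABLE table_2 (column_a TEXT)")
    conn.executemany(
        "INSERT INTO table_1 VALUES (?, 0)", [(f"X{i}",) for i in range(1, 6)]
    )
    conn.executemany("INSERT INTO table_2 VALUES (?)", [("X1",), ("X2",), ("X3",)])
    return conn

# The subquery returns no row for X4 and X5, so they are assigned NULL.
UNGUARDED = """
    UPDATE table_1
    SET column_b = (SELECT 1 FROM table_2 s
                    WHERE table_1.column_a = s.column_a)
"""

# Same update, restricted to rows that have a match in table_2.
GUARDED = UNGUARDED + """
    WHERE EXISTS (SELECT 1 FROM table_2 s
                  WHERE table_1.column_a = s.column_a)
"""

def run(update_sql):
    conn = fresh_db()
    conn.execute(update_sql)
    return conn.execute("SELECT * FROM table_1 ORDER BY column_a").fetchall()

unguarded_rows = run(UNGUARDED)
guarded_rows = run(GUARDED)

print(unguarded_rows)  # X4 and X5 end up with None (NULL)
print(guarded_rows)    # X4 and X5 keep their original 0
```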
This is one possible solution:
update table_1 a
set a.column_b = 1
where a.column_a in (select * from table_2);

Update with results of another sql

With the sql below I count how many records I have in tableB for each code. The total field is assigned the result of the count and the code the code field of the record.
SELECT
"count" (*) as total,
tableB."code" as code
FROM
tableB
WHERE
tableB.code LIKE '%1'
GROUP BY
tableB.code
In tableA I have a sequence field that I want to update with the total (obtained in the previous SQL) plus 1. I need to do this for each code.
I tried this and it did not work. Can someone help me?
UPDATE tableA
SET tableA.sequence = (tableB.total + 1) where tableA."code" = tableB.code
FROM
(
SELECT
"count" (*) as total,
tableB."code" as code
FROM
tableB
WHERE
tableB.code LIKE '%1'
GROUP BY
tableB.code
)
I have edited the question to show what my tables look like, which I believe makes my need easier to understand:
tableA
code sequence
100 null
200 null
table B
code sequence
100 1
100 2
100 3
100 4
......
100 17
200 1
200 2
200 3
200 4
......
200 23
I need to set the blank sequence field in tableA to 18 for code = 100.
I need to set the blank sequence field in tableA to 24 for code = 200.
This assumes that code is unique in table_a:
with max_seq as (
select code,
max(sequence) + 1 as max_seq
from table_b
group by code
)
update table_a
set sequence = ms.max_seq
from max_seq ms
where table_a.code = ms.code;
SQLFiddle example: http://sqlfiddle.com/#!15/745a7/1
UPDATE tbl_a a
SET sequence = b.next_seq
FROM (
SELECT code, max(sequence) + 1 AS next_seq
FROM tbl_b
GROUP BY code
) b
WHERE a.code = b.code;
Only columns of the target table can be updated, so it would not make sense to table-qualify them in the SET clause. Consequently, this is not allowed.
Every subquery must have a table alias for the derived table.
I would not use a CTE for a simple UPDATE like this. A subquery in the FROM clause is typically simpler and faster.
No point in double-quoting the aggregate function count(). No point in double-quoting perfectly legal, lower-case identifiers, either. No point in table-qualifying columns in a subquery on a single table in a plain SQL command (no harm either).
You don't need a WHERE condition, since you want to UPDATE all rows (as it seems). Note that only rows with a matching code are updated. Other rows in tbl_a remain untouched.
Basically you need to read the manual about UPDATE before you try any of this.
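For readers who want to try the update on the sample data, here is a minimal sketch assuming SQLite via Python. Instead of the UPDATE ... FROM form used in the answers above, it uses a correlated subquery, a deliberately swapped-in variant that also runs on engines and versions without UPDATE ... FROM. The table names follow the first answer (table_a, table_b).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (code INTEGER, sequence INTEGER)")
conn.execute("CREATE TABLE table_b (code INTEGER, sequence INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, NULL)", [(100,), (200,)])
# Code 100 has sequences 1..17 and code 200 has 1..23, as in the question.
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [(100, s) for s in range(1, 18)] + [(200, s) for s in range(1, 24)],
)

# Correlated-subquery variant (not the UPDATE ... FROM syntax shown above):
# each table_a row gets max(sequence) + 1 of the table_b rows with the same code.
# The EXISTS guard leaves table_a rows without any match untouched.
conn.execute(
    """
    UPDATE table_a
    SET sequence = (SELECT max(table_b.sequence) + 1
                    FROM table_b
                    WHERE table_b.code = table_a.code)
    WHERE EXISTS (SELECT 1 FROM table_b
                  WHERE table_b.code = table_a.code)
    """
)

updated = conn.execute("SELECT code, sequence FROM table_a ORDER BY code").fetchall()
print(updated)  # [(100, 18), (200, 24)]
```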