group by with where not working - sql

SELECT A.ID, A.COLUMN_B, A.COLUMN_C FROM A
WHERE A.COLUMN_A IN
(
SELECT A.COLUMN_A
FROM B
INNER JOIN A ON B."COLUMN_A" = A."COLUMN_A"
WHERE B."COLUMN_B" = 'something'
UNION
SELECT A."COLUMN_A"
FROM A
WHERE A."COLUMN_D" IN (X,Y,Z) OR A."COLUMN_D" = 'something'
)
Now I want to add a GROUP BY (A.ID) and an ORDER BY (A.COLUMN_B) DESC, and then select the first row. But the DB won't allow it. Any suggestions? I could use LINQ to solve it once the inner UNION part is returned, but I do not want to go that way.

There's a couple of things here.
First off - in DB2, when using GROUP BY, you can only select those columns listed in the grouping statement - everything else must be part of an aggregation function. So, grouping by a.Id and ordering by a.Column_B won't work - you'll need to order by SUM(a.Column_B) or something applicable.
Second... your query could use a bit of work in the general sense - specifically, you're self-joining twice, which you don't need to do at all. Try this instead:
SELECT a.Id, SUM(a.Column_B) as total, SUM(a.Column_C)
FROM a
WHERE a.Column_D in (X, Y, Z, 'Something')
OR EXISTS (SELECT '1'
FROM b
WHERE b.Column_A = a.Column_A
AND b.Column_B = 'Something')
GROUP BY a.Id
ORDER BY total DESC
FETCH FIRST 1 ROW ONLY
Swap out the SUM function for whatever is appropriate.
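As a rough check of the rewritten query, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table contents are made up for illustration, the X, Y, Z placeholders are treated as string literals, and SQLite's LIMIT 1 stands in for DB2's FETCH FIRST 1 ROW ONLY, so treat it as an approximation of the DB2 behaviour rather than a DB2 test.

```python
import sqlite3

# Hypothetical tables and rows; SQLite stands in for DB2 here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (Id INTEGER, Column_A TEXT, Column_B INTEGER,
                    Column_C INTEGER, Column_D TEXT);
    CREATE TABLE b (Column_A TEXT, Column_B TEXT);
    INSERT INTO a VALUES
        (1, 'k1', 10, 1, 'X'),
        (1, 'k1',  5, 2, 'Q'),
        (2, 'k2', 30, 3, 'Q'),
        (3, 'k3', 99, 4, 'Q');
    INSERT INTO b VALUES ('k2', 'Something'), ('k3', 'Other');
""")

# Same shape as the query above: filter, aggregate per Id, keep the top group.
top = conn.execute("""
    SELECT a.Id, SUM(a.Column_B) AS total, SUM(a.Column_C)
    FROM a
    WHERE a.Column_D IN ('X', 'Y', 'Z', 'Something')
       OR EXISTS (SELECT 1
                  FROM b
                  WHERE b.Column_A = a.Column_A
                    AND b.Column_B = 'Something')
    GROUP BY a.Id
    ORDER BY total DESC
    LIMIT 1  -- SQLite spelling of FETCH FIRST 1 ROW ONLY
""").fetchone()

print(top)  # (2, 30, 3)
```

Only the first and third rows pass the filter, so Id 2 (total 30) beats Id 1 (total 10).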

You can't use a column in the ORDER BY or SELECT that you haven't included in the GROUP BY, unless it's being aggregated (in a function like MAX(), COUNT(), or SUM()).
So, you could GROUP BY A.ID, A.COLUMN_B, and then ORDER BY COLUMN_B. Using a TOP 1 should work, too.
I just noticed that you're on DB2. I know that it will work this way on SQL Server. DB2 should be similar, although it uses FETCH FIRST 1 ROW ONLY rather than TOP 1.

It worked the other way around. I just used ORDER BY on A.ID and selected the row with the max identity column.

Related

Use value from select in subquery

I want to build query that will use parameter from query in subquery such as:
SELECT
A.Name,
CASE (SELECT SUM(A.Time) FROM A WHERE A.Name = A.Name)
FROM A
How can I do something like this? It is worth mentioning that the query will return multiple rows, and for each of them I want to take the sum based on the name from the query.
You don't need a subquery here, use a windowed SUM:
SELECT A.[Name],
CASE SUM(A.[Time]) OVER (PARTITION BY A.[Name]) WHEN ... END AS Something --How do you SUM a "time"???
FROM A;
You can use an analytic function for such requirements, but if you want to know how to use a subquery in the SELECT clause, you can give the outer table an alias as follows:
SELECT
A.Name,
(SELECT SUM(A.Time) FROM A WHERE t.Name = A.Name)
FROM A t -- t is the alias
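To see that the windowed SUM and the aliased correlated subquery give the same per-row totals, here is a small sketch using Python's sqlite3 module (it needs a SQLite build with window function support, 3.25 or later). The rows are invented, Time is stored as a plain integer, and the CASE wrapper from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Name TEXT, Time INTEGER);  -- hypothetical sample data
    INSERT INTO A VALUES ('ann', 10), ('ann', 5), ('bob', 7);
""")

# Windowed SUM: every row keeps its identity and carries its group's total.
windowed = conn.execute("""
    SELECT Name, SUM(Time) OVER (PARTITION BY Name) AS total
    FROM A
    ORDER BY Name, Time
""").fetchall()

# Correlated subquery: the alias t lets the inner query see the outer row.
correlated = conn.execute("""
    SELECT t.Name,
           (SELECT SUM(A.Time) FROM A WHERE A.Name = t.Name) AS total
    FROM A AS t
    ORDER BY t.Name, t.Time
""").fetchall()

print(windowed)    # [('ann', 15), ('ann', 15), ('bob', 7)]
print(correlated)  # same rows
```

Without the alias, the condition A.Name = A.Name compares the inner table with itself and is true for every non-NULL name, which is why the original attempt did not correlate.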

Get minimum without using row number/window function in Bigquery

I have a table like the one shown below
What I would like to do is get the minimum value for each subject. Although I am able to do this with the row_number function, I would like to do it with a GROUP BY and min() approach, but that doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just the two columns (subject_id and id) are enough to group the items together; they differentiate the groups. But why am I not able to use the other columns in the SELECT clause? If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to look like the one shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating, you want the first element of each array. The .* expands the record referred to by a into its component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like the query below?
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
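The join-back-to-the-minimum idea can be tried out with a minimal sketch in Python's sqlite3 module. The table name readings, its rows, and the reduced column list are assumptions made for illustration; BigQuery itself is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the question's table (fewer columns).
    CREATE TABLE readings (subject_id INTEGER, id INTEGER,
                           time_1 TEXT, value INTEGER);
    INSERT INTO readings VALUES
        (1, 10, '09:00', 5),
        (1, 10, '09:30', 3),
        (2, 20, '10:00', 8),
        (2, 20, '10:30', 9);
""")

# Find each subject's minimum, then join back to fetch the full row(s).
rows = conn.execute("""
    SELECT r.subject_id, r.id, r.time_1, r.value
    FROM readings AS r
    INNER JOIN (SELECT subject_id, MIN(value) AS min_value
                FROM readings
                GROUP BY subject_id) AS m
            ON r.subject_id = m.subject_id
           AND r.value = m.min_value
    ORDER BY r.subject_id
""").fetchall()

print(rows)  # [(1, 10, '09:30', 3), (2, 20, '10:00', 8)]
```

If two rows of a subject tie for the minimum value, this approach returns both, whereas the row_number version keeps only one.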
If you do not need to select the time_1 column's value, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach, if you can, is to apply something like CAST(time_1 AS DATE) to your time column. This considers only the date part and ignores the time part. The query would be:
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Check whether the CAST AS DATE syntax in BigQuery
-- is exactly as written here or slightly different.
Below is for BigQuery Standard SQL and is the most efficient way for cases like the one in your question
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a Resources exceeded error.
Note: a self join is also a very inefficient way of achieving your objective
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id

How to use two functions - MAX(smthng) and then COUNT(MAX(smthng))

I don't understand why I can't use this in my code :
SELECT MAX(SMTHNG), COUNT(MAX(SMTHNG))
FROM SomeTable;
Searched for an answer but didn't find it in documentation about these aggregate functions.
Also I get an SQL-compiler error "Invalid column name "SMTHNG"".
You want to know what the maximum SMTHNG in the table is with:
SELECT MAX(SMTHNG) FROM SomeTable;
This is an aggregation without GROUP BY and hence results in one single row containing the maximum SMTHNG.
Now you also want to know how often this SMTHNG occurs and you add COUNT(MAX(SMTHNG)). This, however, does not work, because you can not aggregate an aggregate directly.
This doesn't work either:
SELECT ANY_VALUE(max_smthng), COUNT(*)
FROM (SELECT MAX(smthng) AS max_smthng FROM sometable) t;
because the sub query only contains one row, so it's too late to count.
So, either use a sub query and select from the table again:
SELECT ANY_VALUE(smthng), COUNT(*)
FROM sometable
WHERE smthng = (SELECT MAX(smthng) FROM sometable);
Or count per SMTHNG before looking for the maximum. Here is how to get the counts:
SELECT smthng, COUNT(*)
FROM sometable
GROUP BY smthng;
And the easiest way to get the maximum from this result is:
SELECT TOP(1) smthng, COUNT(*)
FROM sometable
GROUP BY smthng
ORDER BY smthng DESC;
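Both routes to "the maximum value and how often it occurs" can be compared with a short sketch in Python's sqlite3 module. The sample values are invented, and because SQLite has neither ANY_VALUE nor TOP, MAX(smthng) and LIMIT 1 are used in their place. The data is chosen so that the most frequent value (1) differs from the largest value (7).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (smthng INTEGER);  -- hypothetical sample data
    INSERT INTO sometable VALUES (1), (1), (1), (1), (3), (7), (7);
""")

# Route 1: find the maximum in a subquery, then count matching rows.
via_subquery = conn.execute("""
    SELECT MAX(smthng), COUNT(*)
    FROM sometable
    WHERE smthng = (SELECT MAX(smthng) FROM sometable)
""").fetchone()

# Route 2: count per value first, then keep the group with the largest value.
via_grouping = conn.execute("""
    SELECT smthng, COUNT(*)
    FROM sometable
    GROUP BY smthng
    ORDER BY smthng DESC
    LIMIT 1
""").fetchone()

print(via_subquery, via_grouping)  # (7, 2) (7, 2)
```

Sorting the grouped result by the count instead would return (1, 4), the most frequent value, which answers a different question.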
First of all, please read my comment.
Depending on what you're trying to achieve, the statement has to be changed.
If you want to count the highest values in SMTHNG field, you may try this:
SELECT T1.SMTHNG, COUNT(T1.SMTHNG)
FROM SomeTable T1 INNER JOIN
(
SELECT MAX(SMTHNG) AS A
FROM SomeTable
) T2 ON T1.SMTHNG = T2.A
GROUP BY T1.SMTHNG;
Use a CTE like the one below, or a subquery:
with cte as
(
select count(*) as cnt ,col from table_name
group by col
) select max(cnt) from cte
You cannot nest one aggregate function directly inside another.

SQL combine two query results

I can't use a Union because it's not the result I want, and I can't use join because I haven't any common column. I have tried many different SQL query structures and nothing works as I want.
I need help to achieve what I believe is a really simple SQL query. What I am doing now is
select a, b
from (select top 4 a from element_type order by c) as Y,
(SELECT * FROM (VALUES (NULL), (1), (2), (3)) AS X(b)) as Z
The first is a part of a table and the second is a hand created select that gives results like this:
select a; --Give--> a,b,c,d (1 column)
select b; --Give--> 1,2,3,4 (1 column)
I need a query based on the two first that give me (2 column) :
a,1
b,2
c,3
d,4
How can I do this? With UNION, JOIN, or something else? Or maybe it can't be done.
All I can get for now is this:
a,1
a,2
a,3
a,4
b,1
b,2
...
If you want to join two tables together purely on the order in which the rows appear, then I hope your database supports analytic (window) functions:
SELECT * FROM
(SELECT t.*, ROW_NUMBER() OVER(ORDER BY x) as rown FROM table1 t) t1
INNER JOIN
(SELECT t.*, ROW_NUMBER() OVER(ORDER BY x) as rown FROM table2 t) t2
ON t1.rown = t2.rown
Essentially we invent something to join them on by numbering the rows. If one of your tables already contains incrementing integers from 1, you don't need ROW_NUMBER() OVER() on that table, because it already has suitable data to join to; you just invent a fake column of incrementing numbers in the other table and then join the two together.
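Here is a minimal sketch of the row-numbering join, run through Python's sqlite3 module (SQLite 3.25 or later for window functions). The tables letters and numbers are hypothetical stand-ins for the two unrelated result sets in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two unrelated result sets.
    CREATE TABLE letters (a TEXT);
    CREATE TABLE numbers (b INTEGER);
    INSERT INTO letters VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO numbers VALUES (1), (2), (3), (4);
""")

# Number the rows on each side, then join on the invented row number.
pairs = conn.execute("""
    SELECT l.a, n.b
    FROM (SELECT a, ROW_NUMBER() OVER (ORDER BY a) AS rown FROM letters) AS l
    INNER JOIN
         (SELECT b, ROW_NUMBER() OVER (ORDER BY b) AS rown FROM numbers) AS n
      ON l.rown = n.rown
    ORDER BY l.rown
""").fetchall()

print(pairs)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```

The pairing depends entirely on the ORDER BY inside each OVER clause, so each side needs a sort key that gives the intended order.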
Actually, even if it doesn't support analytics, there are ugly ways of doing row numbering, such as joining the table back to itself using id < id and COUNT(*) .. GROUP BY id to number the rows. I hate doing it, but if your DB doesn't support ROW_NUMBER I'll post an example.. :/
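For reference, one possible shape of that self-join numbering is sketched below with Python's sqlite3 module. The letters table is hypothetical, the join uses <= so that numbering starts at 1, and the trick assumes the ordering column has no duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE letters (a TEXT);  -- hypothetical table with unique values
    INSERT INTO letters VALUES ('c'), ('a'), ('d'), ('b');
""")

# For each row, count how many rows sort at or before it:
# that count is the row's position in the ordering.
numbered = conn.execute("""
    SELECT t1.a, COUNT(*) AS rown
    FROM letters AS t1
    INNER JOIN letters AS t2 ON t2.a <= t1.a
    GROUP BY t1.a
    ORDER BY rown
""").fetchall()

print(numbered)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```

The join compares every row with every other row, so this grows quadratically with table size and is best kept for small tables.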
Bear in mind, of course, that RDBMS have R in the name for a reason - related data is.. well.. related. They don't do so well when data is unrelated, so if your hope is to join the "chalks" table to the "cheese" table even though the two are completely unrelated, you're finding out now why it's hard work! :)
Try using row_number. I've created something that might help you. See below:
declare @tableChar table(letter varchar(1))
insert into @tableChar(letter)
select 'a';
insert into @tableChar(letter)
select 'b';
insert into @tableChar(letter)
select 'c';
insert into @tableChar(letter)
select 'd';
select letter,ROW_NUMBER() over(order by letter ) from @tableChar
You can use row_number() to achieve this:
select a,row_number() over(order by a) as b from element_type;
As you are not taking the second part from another table, you do not need a join. But if you are doing this on different tables, then you can use row_number() to create a key for both tables and join based on those keys.
Hope it helps.

SQL Server, include columns that are not in group by statement

I have a recurring problem.
Let's assume that I have the following columns:
T:A(PK), B, C, D, E
Now,
select A, MAX(B) from T group BY A
But I can't do:
select A, C, MAX(B) from T group BY A
I don't understand why. When it comes to AVG or SUM I get it. However, MAX or MIN comes from exactly one row.
How to deal with it?
You can use ROW_NUMBER() for that like this:
select A, C, B
from (
select *
, row_number() over (partition by A order by B desc) seq
-- group by ^ max(^)
from yourTable ) t
where seq = 1;
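A quick way to convince yourself that C really comes from the row holding the maximum B is the sketch below, run through Python's sqlite3 module (SQLite 3.25 or later). The table contents are invented, and A is deliberately not unique here so that the grouping has something to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; A repeats so each group has several candidates.
    CREATE TABLE T (A INTEGER, B INTEGER, C TEXT);
    INSERT INTO T VALUES (1, 5, 'x'), (1, 9, 'y'), (2, 4, 'z');
""")

# Rank rows within each A by B descending and keep the top-ranked row,
# so C is taken from the same row as the maximum B.
rows = conn.execute("""
    SELECT A, C, B
    FROM (SELECT A, B, C,
                 ROW_NUMBER() OVER (PARTITION BY A ORDER BY B DESC) AS seq
          FROM T) AS ranked
    WHERE seq = 1
    ORDER BY A
""").fetchall()

print(rows)  # [(1, 'y', 9), (2, 'z', 4)]
```

When several rows share the maximum B, ROW_NUMBER picks one of them arbitrarily unless the ORDER BY includes a tie-breaker, which is part of the reason a bare MAX(B) cannot decide which C to return.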
That's because columns included in the select list must also be part of the GROUP BY clause, unless they are aggregated. You may have columns that are part of the GROUP BY but not present in the select list, but not the other way around.
Generally, you put only those columns in the SELECT clause on which you want the grouping to happen.
Try this. It finds the MAX of one column (f3) per group (f1), and also returns the extra column you wanted (f2) without affecting the MAX operation:
SELECT m.f1, s.f2, m.maxf3 FROM
(SELECT f1, MAX(f3) maxf3 FROM t1 GROUP BY f1) m
-- match on maxf3 as well, so f2 comes from a row that holds the maximum
CROSS APPLY (SELECT TOP(1) f2 FROM t1 WHERE t1.f1 = m.f1 AND t1.f3 = m.maxf3) s
Your question isn't very clear, so we aren't sure what you are trying to do.
Assuming you don't actually want a GROUP BY in your main query but want to return the max of B for each value of column A, you can do it like so:
select A, C,(Select Max(B) from T as T2 WHERE T.A = T2.A) as MaxB from T