Get any not null value of other fields in aggregations - sql

I want to aggregate on some fields and get any not null value on others. To be more precise the query looks something like:
SELECT id, any_value(field1), any_value(field2) FROM mytable GROUP BY ID
and the columns are like:
ID | field1 | field2
-----------------
id | null | 3
id | 1 | null
id | null | null
id | 2 | 4
and the output can be like (id, 1, 4) or (id, 2, 4) or ..., but not something like (id, 1, null).
I can't find in the docs whether any_value() is guaranteed to return a non-null value if there is one (although it did so in my experiments), or whether it may return null even if some non-null values exist.
Does any_value() perform the task I described? If not, what way do you suggest for doing it?

This is sort of a guess, but have you tried:
SELECT id, MIN(field1), MAX(field2)
FROM mytable
GROUP BY id;
MIN and MAX ignore NULL values, so each column returns a non-NULL value whenever the group has one, although the two values may come from different rows.
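To check this on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for the asker's database here, and the extra all-NULL group ('other') and the function name are assumptions added for illustration.

```python
import sqlite3


def min_max_non_null():
    """Run the MIN/MAX query on the question's sample rows (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id TEXT, field1 INTEGER, field2 INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [
            ("id", None, 3),
            ("id", 1, None),
            ("id", None, None),
            ("id", 2, 4),
            # Hypothetical extra group with no non-NULL values at all.
            ("other", None, None),
        ],
    )
    # Aggregates skip NULLs; a group with only NULLs yields NULL.
    rows = conn.execute(
        "SELECT id, MIN(field1), MAX(field2) FROM mytable GROUP BY id ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(min_max_non_null())
```

A group only yields NULL for a column when every row in that group is NULL in that column.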

You can use analytical functions as well.
Below is the query (SQL Server):
select id, field1, field2
from (select id, field1, field2, row_number()
over (partition by id order by isnull(field1, 'ZZZ') asc, isnull(field2, 'ZZZ') asc) as RNK from mytable) aa
where aa.RNK = 1;
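This returns only one row per id; you can change the order in the ORDER BY clause if you are looking for the maximum value in a column. Note that it picks one whole row, so a column can still come back NULL when the chosen row has NULL in that column.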

This could be achieved by aggregating to an array with 'ignore nulls' specified and taking the first element of the resulting array (BigQuery syntax). Unlike the MIN/MAX solution, this also works with structs:
SELECT
id,
ARRAY_AGG(field1 IGNORE NULLS LIMIT 1)[SAFE_OFFSET(0)],
FROM
mytable
GROUP BY
id

Related

Delete rows based on group by

Let's say I have the following table:
Id | QuestionId
----------------
1 | 'MyQuestionId'
1 | NULL
2 | NULL
2 | NULL
It should behave like so:
Find all the results of the same Id
If ANY of them has QuestionId IS NOT NULL, do not touch any rows with that Id.
Only if ALL the results for the same Id have QuestionId IS NULL, delete all the rows with that Id.
So in this case it should only delete rows with Id=2.
I haven't found an example for such a case anywhere. I've tried some options with rank, count, group by, but nothing worked. Can you help me?
You can use an updatable CTE or derived table for this, and calculate the count using a window function.
WITH cte AS (
SELECT t.*,
CountNonNulls = COUNT(t.QuestionId) OVER (PARTITION BY t.Id)
FROM YourTable t
)
DELETE cte
WHERE CountNonNulls = 0;
Note that this query does not contain any self-joins at all.
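Updatable CTEs are specific to SQL Server. For other databases, a different technique with the same effect is a correlated NOT EXISTS. The following is a rough sketch of that alternative, not the query above, run against in-memory SQLite through Python's sqlite3 module; the function name is an assumption for illustration.

```python
import sqlite3


def delete_all_null_groups():
    """Delete every Id whose rows all have QuestionId IS NULL; return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Id INTEGER, QuestionId TEXT)")
    conn.executemany(
        "INSERT INTO YourTable VALUES (?, ?)",
        [(1, "MyQuestionId"), (1, None), (2, None), (2, None)],
    )
    # A row is deleted only if no row with the same Id has a non-NULL QuestionId.
    conn.execute(
        """
        DELETE FROM YourTable
        WHERE NOT EXISTS (
            SELECT 1
            FROM YourTable AS t2
            WHERE t2.Id = YourTable.Id
              AND t2.QuestionId IS NOT NULL
        )
        """
    )
    # Summarise the surviving rows as (Id, row count).
    remaining = conn.execute(
        "SELECT Id, COUNT(*) FROM YourTable GROUP BY Id ORDER BY Id"
    ).fetchall()
    conn.close()
    return remaining


print(delete_all_null_groups())
```

Both rows for Id=1 survive because one of them has a QuestionId, while both rows for Id=2 are removed.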

Selecting rows based on priority of a column

Please refer to the table below as my source table:
Entry ID | Name | Colour
------------------------
1 | John | Red
2 | John | null
3 | Steve | null
4 | Steve | null
I would like to select the entire row based on
Distinct value of the "name" column
If "colour" column is not null, I will select that row. Otherwise, I will accept the null value row.
Lastly, I will select based on the entry ID which is smaller (the smaller the ID, the earlier the entry ID)
The expected result should be:
Entry ID | Name | Colour
------------------------
1 | John | Red
3 | Steve | null
I am new to SQL and wondering if there's any way to achieve the expected result?
Many thanks and much appreciated.
(pardon my bad English/grammar)
You can use row_number(). This should work in almost any database:
select t.*
from (select t.*,
row_number() over (partition by name order by colour desc, id) as seqnum
from t
) t
where seqnum = 1;
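In databases that sort NULL as the lowest value (SQL Server, MySQL, SQLite), a descending ORDER BY puts NULL values last, so non-NULL rows are chosen first. PostgreSQL and Oracle sort NULLs first in descending order by default, so add NULLS LAST there.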
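Because the default position of NULLs differs between databases, a variant that spells out the NULL ordering with a CASE expression may be safer. Below is a minimal sketch of that variant (not the exact query above), run on in-memory SQLite via Python's sqlite3 module. It assumes SQLite 3.25 or later for window functions, and the column name id (for "Entry ID") and the function name are assumptions for illustration.

```python
import sqlite3


def pick_rows():
    """Pick one row per name: non-NULL colour first, then the smallest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT, colour TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, "John", "Red"), (2, "John", None), (3, "Steve", None), (4, "Steve", None)],
    )
    rows = conn.execute(
        """
        SELECT id, name, colour
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY name
                       -- 0 for rows with a colour, 1 for NULL, so NULLs rank last
                       ORDER BY CASE WHEN colour IS NULL THEN 1 ELSE 0 END, id
                   ) AS seqnum
            FROM t
        ) AS ranked
        WHERE seqnum = 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


print(pick_rows())
```

Ordering by the CASE expression first and id second also means that, among several non-NULL rows, the earliest entry wins rather than the alphabetically last colour.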
An alternative method is:
select t.*
from t
where t.colour is not null or
t.id = (select (case when count(t2.colour) = 0 then min(t2.id) end)
from t t2
where t2.name = t.name
)

SQL Order by descending total last

I would like to sort my results in descending order, but with the total row (which sums up all numerical values in a specific column) placed last. Is this possible?
The DB is Netezza, and there are only three columns: ID, name, and revenue, which is an aggregated column to begin with.
This fiddle was created in MySQL, but the query should work in any standard SQL.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE t1 ( id int, name varchar(10), revenue int ) ;
INSERT INTO t1 (id, name, revenue)
VALUES (1,'first',10), (2,'first',10), (3,'last',1), (4,'last',1) ;
Main Query:
SELECT name, sum(revenue) AS totalRevenue
FROM t1
GROUP BY name
ORDER BY totalRevenue DESC
Results:
| name | totalRevenue |
|-------|--------------|
| first | 20 |
| last | 2 |
You can accomplish this with a UNION ALL.
SELECT column_A , column_B , ... {your query}
UNION ALL
SELECT '' as Column_A , SUM(column_B) as column_B,... {your totals query}
column_B in this case can be whichever numerical column that you want totals for. column_A is a non-numerical column or any column you do not want a total for. Note that all columns selected in {your totals query} should be aggregated or NULL if you want this to return only a single row.
You will also need to include the same number of columns in both queries for the UNION method to work properly.
If you include your RDBMS, then we can tailor a more specific solution. UNION ALL is standard SQL and is also supported by SQL Server.
-- so in response to your comment 9/6, it sounds like you can run an offset fetch command.
SELECT {your query} OFFSET 1
UNION ALL
SELECT {your query} OFFSET 0 FETCH NEXT 1 ROW ONLY
Keep the ORDER BY the same for both queries or you will get unexpected results.
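A UNION ALL on its own does not guarantee that the totals row comes out last, so one option is to add an explicit sort key and order by it first. Here is a minimal sketch of that idea using the t1 table from the fiddle above, run on in-memory SQLite via Python's sqlite3 module. The is_total column, the 'Total' label, and the function name are assumptions for illustration.

```python
import sqlite3


def totals_last():
    """Return per-name revenue in descending order, with a grand total row last."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT, revenue INTEGER)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [(1, "first", 10), (2, "first", 10), (3, "last", 1), (4, "last", 1)],
    )
    rows = conn.execute(
        """
        SELECT name, totalRevenue
        FROM (
            SELECT name, SUM(revenue) AS totalRevenue, 0 AS is_total
            FROM t1
            GROUP BY name
            UNION ALL
            -- Hypothetical label for the grand total row.
            SELECT 'Total', SUM(revenue), 1
            FROM t1
        ) AS u
        ORDER BY is_total, totalRevenue DESC
        """
    ).fetchall()
    conn.close()
    return rows


print(totals_last())
```

Sorting on is_total first keeps the total at the bottom even though its value is the largest in the revenue column.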
One approach is to use grouping sets:
select id, name, sum(revenue)
from t
group by grouping sets ( (id, name), () )
order by (case when grouping(id) = 1 then 1 else 2 end) desc,
sum(revenue) desc;

SQL Query to group text based on numeric column

I have a table 'TEST' as shown below
Number | Seq | Name
-------+-------+------
123 | 1 | Hello
123 | 2 | Hi
123 | 3 | Greetings
234 | 1 | Goodbye
234 | 2 | Bye
I want to write a query, to group the table by 'Number', and select the rows with the maximum sequence number (MAX(Seq)). The output of the query would be
Number | Seq | Name
-------+-------+------
123 | 3 | Greetings
234 | 2 | Bye
How do I go about this?
EDIT: TEST is actually a table that is the result from a long query (joining multiple tables) that I have already written. I already have a (SELECT ...) statement to get the values I need. Is there a way to remove duplicate rows (with the same 'Number' as shown above) and select only the one with maximum 'Seq' value.
I am on Microsoft SQL Server 2008 (SP2)
I was hoping there would be a way to achieve this by
SELECT * FROM (SELECT ...) TEST <condition to group>
You can use a subquery with an IN clause. Row-value comparisons like this work in Oracle, MySQL, and PostgreSQL, but not in SQL Server:
select * from test
where (number, seq) in (select number, max(seq) from test group by number)
Another option is to use a windowed ROW_NUMBER() function with a partition on the number:
With Cte As
(
Select *,
Row_Number() Over (Partition By Number Order By Seq Desc) RN
From TEST
)
Select Number, Seq, Name
From Cte
Where RN = 1
SELECT *
FROM (SELECT test.*, MAX (seq) OVER (PARTITION BY num) max_seq
FROM test)
WHERE seq = max_seq
I changed the column name from number because you can't use a reserved word for a column name. This is pretty much the same as the other answers, except that it explicitly gets the maximum sequence number for each NUM.
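You want to use an ANALYTIC function together with a conditional clause to get you only the rows of TEST that you desire. A window function cannot be filtered in the WHERE clause of the same SELECT, so the ranking goes in its own CTE.
WITH TEST as (
...your really complex query that generates TEST...
),
Ranked as (
SELECT
Number, Seq, Name,
RANK() OVER (PARTITION By Number ORDER BY Seq DESC) AS aRank
FROM TEST
)
SELECT Number, Seq, Name, aRank
FROM Ranked
WHERE aRank = 1
;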
This returns the Number, Seq, Name for each Number grouping where the Seq is maximum. Yes, it also returns a column named aRank with all '1' in it...hopefully it can be ignored.
The solution to this is to do a self join on only the MAX(Seq) values.
This answer can be found at SQL Select only rows with Max Value on a Column
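The self join mentioned above could look roughly like the sketch below. It is run here on in-memory SQLite via Python's sqlite3 module rather than on SQL Server, and the derived-table alias m, the MaxSeq column, and the function name are assumptions for illustration.

```python
import sqlite3


def max_seq_rows():
    """Return the row with the highest Seq for each Number via a self join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEST (Number INTEGER, Seq INTEGER, Name TEXT)")
    conn.executemany(
        "INSERT INTO TEST VALUES (?, ?, ?)",
        [
            (123, 1, "Hello"),
            (123, 2, "Hi"),
            (123, 3, "Greetings"),
            (234, 1, "Goodbye"),
            (234, 2, "Bye"),
        ],
    )
    rows = conn.execute(
        """
        SELECT t.Number, t.Seq, t.Name
        FROM TEST AS t
        JOIN (
            -- One row per Number carrying its maximum Seq.
            SELECT Number, MAX(Seq) AS MaxSeq
            FROM TEST
            GROUP BY Number
        ) AS m
          ON m.Number = t.Number
         AND m.MaxSeq = t.Seq
        ORDER BY t.Number
        """
    ).fetchall()
    conn.close()
    return rows


print(max_seq_rows())
```

This form uses only GROUP BY and a join, so it does not depend on window functions or row-value IN support.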

SQL Query to get all rows with duplicate values but are not part of the same group

The database schema is organized as follows:
ID | GroupID | VALUE
--------------------
1 | 1 | A
2 | 1 | A
3 | 2 | B
4 | 3 | B
In this example, I want to GET all Rows with duplicate VALUE, but are not part of the same group. So the desired result set should be IDs (3, 4), because they are not in the same group (2, 3) but still have the same VALUE (B).
I'm having trouble writing a SQL Query and would appreciate any guidance. Thanks.
So far, I'm using SQL Count, but can't figure out what to do with the GroupId.
SELECT *
FROM TABLE T
HAVING COUNT(T.VALUE) > 1
GROUP BY ID, GroupId, VALUE
The simplest method for this is using EXISTS:
SELECT
ID
FROM
MyTable T1
WHERE
EXISTS (SELECT 1
FROM MyTable
WHERE Value = t1.Value
AND GroupID <> t1.GroupID)
Here is one method. First you have to identify the values that appear in more than one group and then use that information to find the right rows in the original table:
select *
from t
where value in (SELECT value
FROM t
GROUP BY value
HAVING COUNT(distinct groupid) > 1
)
order by value
Actually, I prefer a slight variant in this case, by changing the HAVING clause:
HAVING min(groupid) <> max(groupid)
This works when you are looking for more than one group and should be faster than the COUNT DISTINCT version.
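As a quick check of the MIN/MAX variant on the sample data from the question, here is a minimal sketch run on in-memory SQLite via Python's sqlite3 module. The table name MyTable is borrowed from the EXISTS answer, and the function name is an assumption for illustration.

```python
import sqlite3


def cross_group_duplicates():
    """Return IDs whose Value also appears in a different GroupID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (ID INTEGER, GroupID INTEGER, Value TEXT)")
    conn.executemany(
        "INSERT INTO MyTable VALUES (?, ?, ?)",
        [(1, 1, "A"), (2, 1, "A"), (3, 2, "B"), (4, 3, "B")],
    )
    rows = conn.execute(
        """
        SELECT ID
        FROM MyTable
        WHERE Value IN (
            SELECT Value
            FROM MyTable
            GROUP BY Value
            -- Differing min and max means at least two distinct groups.
            HAVING MIN(GroupID) <> MAX(GroupID)
        )
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(cross_group_duplicates())
```

Value A is excluded because both of its rows share GroupID 1, so its minimum and maximum group are equal.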
SELECT ALL_.*
FROM (SELECT *
FROM TABLE_
GROUP BY ID, GROUPID, VALUE
ORDER BY ID) GROUPED,
TABLE_ ALL_
WHERE GROUPED.VALUE = ALL_.VALUE
AND GROUPED.GROUPID <> ALL_.GROUPID