I have a 'vendors' table that looks like this...
**company, itemKey, itemPriceA, itemPriceB**
companyA, 203913, 20, 10
companyA, 203914, 20, 20
companyA, 203915, 25, 5
companyA, 203916, 10, 10
It has potentially millions of rows per company and I want to query it to bring back a representative delta between itemPriceA and itemPriceB for each company. I don't care which delta I bring back as long as it isn't zero/null (like row 2 or 4), so I was using ANY_VALUE like this...
SELECT company
, ANY_VALUE(CASE WHEN (itemPriceA-itemPriceB)=0 THEN null ELSE (itemPriceA-itemPriceB) END)
FROM vendors
GROUP BY 1
It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...
"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."
If ANY_VALUE returns null "when expression is NULL for all rows in the group" it should NEVER return null for companyA right (since only 2 of 4 rows are null)? But the second sentence sounds like it will indeed include the null rows.
P.s. you may be wondering why I don't simply add a WHERE clause saying "WHERE itemPriceA-itemPriceB>0" but in the event that a company has ONLY matching prices, I still want the company to be returned in my results.
Clarification
I'm afraid the accepted answer will have to show stronger evidence that contradicts the docs.
@Raul Saucedo suggests that the following BigQuery documentation is referring to WHERE clauses:
rows for which expression is NULL are considered and may be selected
This is not the case. WHERE clauses are not mentioned anywhere in the ANY_VALUE docs. (Nowhere on the page. Try to ctrl+f for it.) And the docs are clear, as I'll explain.
@d3wannabe is correct to wonder about this:
It seems to be working but I notice 2 sentences that seem contradictory from Google's documentation...
"Returns NULL when expression is NULL for all rows in the group. ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected."
But the docs are not contradictory. The 2 sentences coexist.
"Returns NULL when expression is NULL for all rows in the group." So if all rows in a column are NULL, it will return NULL.
"ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which expression is NULL are considered and may be selected." So if the column has rows mixed with NULLs and actual data, it will select anything from that column, including nulls.
How to create an ANY_VALUE without nulls in BigQuery
We can use ARRAY_AGG to turn a group of values into a list. This aggregate function has the option to IGNORE NULLS. We then select 1 item from the list after ignoring nulls.
If we have a table with 2 columns: id and mixed_data, where mixed_data has some rows with nulls:
SELECT
id,
ARRAY_AGG( -- turn the mixed_data values into a list
mixed_data -- we'll create an array of values from our mixed_data column
IGNORE NULLS -- there we go!
LIMIT 1 -- only fill the array with 1 thing
)[SAFE_OFFSET(0)] -- grab the first item in the array
AS any_mixed_data_without_nulls
FROM your_table
GROUP BY id
See similar answers here:
https://stackoverflow.com/a/53508606/6305196
https://stackoverflow.com/a/62089838/6305196
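The "one non-null value per group, NULL only when the whole group is NULL" idea can also be checked outside BigQuery. This is a minimal sketch, assuming Python's built-in sqlite3 module, the sample rows from the question, and a hypothetical companyB whose prices always match. SQLite has no ARRAY_AGG ... IGNORE NULLS, so MAX (an aggregate that skips NULLs) is swapped in here; it returns the largest non-null delta rather than an arbitrary one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendors (company TEXT, itemKey INTEGER, "
    "itemPriceA INTEGER, itemPriceB INTEGER)"
)
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?, ?, ?)",
    [
        ("companyA", 203913, 20, 10),
        ("companyA", 203914, 20, 20),
        ("companyA", 203915, 25, 5),
        ("companyA", 203916, 10, 10),
        ("companyB", 203917, 15, 15),  # hypothetical: only matching prices
    ],
)

# NULLIF turns zero deltas into NULL; MAX ignores NULLs within each group
# and yields NULL only when every delta in the group is NULL.
rows = conn.execute(
    """
    SELECT company, MAX(NULLIF(itemPriceA - itemPriceB, 0)) AS delta
    FROM vendors
    GROUP BY company
    ORDER BY company
    """
).fetchall()
deltas = dict(rows)
print(deltas)
```

companyB stays in the result with a NULL delta, which is the behaviour the question asks for.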
Update, 2022-08-12
There is evidence that the docs may be inconsistent with the actual behavior of the function. See Samuel's latest answer to explore his methodology.
However, we cannot know if the docs are incorrect and ANY_VALUE behaves as expected or if ANY_VALUE has a bug and the docs express the intended behavior. We don't know if Google will correct the docs or the function when they address this issue.
Therefore I would continue to use ARRAY_AGG to create a safe ANY_VALUE that ignores nulls until we see a fix from Google.
Please upvote the issue in Google's Issue Tracker to see this resolved.
This is an explanation of how ANY_VALUE works with NULL values.
When at least one value is not null, ANY_VALUE returns the first non-null value.
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST([null, "banana",null,null]) as fruit;
It returns null if all rows have null values. This refers to the sentence:
“Returns NULL when expression is NULL for all rows in the group”
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST([null, null, null]) as fruit
It returns null if a value is null and the WHERE clause selects that row. This refers to these sentences:
“ANY_VALUE behaves as if RESPECT NULLS is specified; rows for which
expression is NULL are considered and may be selected.”
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST(["apple", "banana", null]) as fruit
where fruit is null
The result always depends on which filter you use and on the field inside ANY_VALUE.
In this example, only the two rows whose delta is different from 0 are considered:
SELECT e.company, ANY_VALUE(itemPriceA-itemPriceB) as value
FROM `vendors` e
where (itemPriceA-itemPriceB)!=0
group by e.company
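Note that a WHERE filter of this kind drops any company whose deltas are all zero, which is the case the question wants to keep. A small sketch of that effect, assuming Python's built-in sqlite3 module and a hypothetical companyB with only matching prices; MIN stands in for ANY_VALUE because SQLite does not have it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendors (company TEXT, itemKey INTEGER, "
    "itemPriceA INTEGER, itemPriceB INTEGER)"
)
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?, ?, ?)",
    [
        ("companyA", 203913, 20, 10),
        ("companyA", 203914, 20, 20),
        ("companyB", 203917, 15, 15),  # hypothetical: only matching prices
    ],
)

# The filter runs before grouping, so companyB has no rows left to group.
filtered = conn.execute(
    """
    SELECT company, MIN(itemPriceA - itemPriceB) AS value
    FROM vendors
    WHERE (itemPriceA - itemPriceB) != 0
    GROUP BY company
    """
).fetchall()
companies = [company for company, _ in filtered]
print(filtered)
```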
The documentation says that rows for which the expression is NULL "are considered and may be" returned by ANY_VALUE. However, I am quite sure the documentation is wrong here. In the current implementation, which was tested on 13th August 2022, ANY_VALUE returns the first non-null value of that column. However, if no ORDER BY is specified for the table, the row order may be random because the data is processed on several nodes.
Testing this needs a large table of nulls, and GENERATE_ARRAY comes in handy for that. The array starts out with the value zero standing in for null. First, 1 million entries with value zero are generated in the table tmp. Then the table tbl places those 1 million zeros before and after the values [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9]. Finally, NULLIF(x,0) AS x replaces all zeros with null.
Several tests of ANY_VALUE are run against the test table tbl. If the table is not sorted further, the first non-null value of that column is returned: -100.
WITH
tmp AS (SELECT ARRAY_AGG(0) AS tmp0 FROM UNNEST(GENERATE_ARRAY(1,1000*1000))),
tbl AS (
SELECT
NULLIF(x,0) AS x,
IF(x!=0,x,NULL) AS y,
rand() AS rand
FROM
tmp,
UNNEST(ARRAY_CONCAT(tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0)) AS x )
SELECT "count rows", COUNT(1) FROM tbl
UNION ALL SELECT "count items not null", COUNT(x) FROM tbl
UNION ALL SELECT "any_value(x): (returns first non null element in list: -100)", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "2nd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "3rd run", ANY_VALUE(x) FROM tbl
UNION ALL SELECT "any_value(y)", ANY_VALUE(y) FROM tbl
UNION ALL SELECT "order asc", ANY_VALUE(x) FROM (Select * from tbl order by x asc)
UNION ALL SELECT "order desc (returns largest element: 9)", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order desc", ANY_VALUE(x) FROM (Select * from tbl order by x desc)
UNION ALL SELECT "order abs(x) desc", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) desc )
UNION ALL SELECT "order abs(x) asc (smallest number: 3)", ANY_VALUE(x) FROM (Select * from tbl order by abs(x) asc )
UNION ALL SELECT "order rand asc", ANY_VALUE(x) FROM (Select * from tbl order by rand asc )
UNION ALL SELECT "order rand desc", ANY_VALUE(x) FROM (Select * from tbl order by rand desc )
This gives the following result:
The first non-null entry, -100, is returned.
Sorting the table by this column causes ANY_VALUE to always return the first entry.
In the last two examples, the table is ordered by random values, so ANY_VALUE returns random entries.
If the dataset is larger than 2 million rows, the table may be split internally for processing; this results in an unordered table. Without an ORDER BY, the first entry of the table, and thus the result of ANY_VALUE, cannot be predicted.
To test this, replace the 10th line with
UNNEST(ARRAY_CONCAT(tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0,tmp0, [0,0,0,0,0,-100,0,-90,-80,3,4,5,6,7,8,9] , tmp0,tmp0)) AS x )
I am getting a blank value with this query in SQL Server:
SELECT TOP 1 Amount from PaymentDetails WHERE Id = '5678'
There is no matching row, which is why it returns blank. If there is no row, I want it to return 0.
I already tried COALESCE, but it's not working.
How can I solve this?
You are selecting an arbitrary amount, so one method is aggregation:
SELECT COALESCE(MAX(Amount), 0)
FROM PaymentDetails
WHERE Id = '5678';
Note that if id is a number, then don't use single quotes for the comparison.
To be honest, I would expect SUM() to be more useful than an arbitrary value:
SELECT COALESCE(SUM(Amount), 0)
FROM PaymentDetails
WHERE Id = '5678';
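The reason aggregation helps is that an aggregate query without GROUP BY always produces exactly one row, even when no rows match, so COALESCE has a NULL to replace. A minimal sketch of that behaviour, assuming Python's built-in sqlite3 module rather than SQL Server (LIMIT 1 stands in for TOP 1) and a hypothetical one-row PaymentDetails table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PaymentDetails (Id TEXT, Amount INTEGER)")
conn.execute("INSERT INTO PaymentDetails VALUES ('1234', 50)")  # hypothetical row

# Plain row lookup: no matching row means an empty result set,
# so COALESCE inside the select list would never be evaluated.
plain = conn.execute(
    "SELECT Amount FROM PaymentDetails WHERE Id = '5678' LIMIT 1"
).fetchall()

# Aggregates without GROUP BY yield one row; COALESCE maps its NULL to 0.
with_max = conn.execute(
    "SELECT COALESCE(MAX(Amount), 0) FROM PaymentDetails WHERE Id = '5678'"
).fetchall()
with_sum = conn.execute(
    "SELECT COALESCE(SUM(Amount), 0) FROM PaymentDetails WHERE Id = '5678'"
).fetchall()
print(plain, with_max, with_sum)
```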
You can wrap the subquery in an ISNULL:
SELECT ISNULL((SELECT TOP 1 Amount from PaymentDetails WHERE Id = '5678' ORDER BY ????),0) AS Amount;
Don't forget to add a column (or columns) to your ORDER BY as otherwise you will get inconsistent results when more than one row has the same value for Id. If Id is unique, however, then remove both the TOP and ORDER BY as they aren't needed.
You should never, however, use TOP without an ORDER BY unless you are "happy" with inconsistent results.
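The scalar-subquery approach can be sketched the same way. This assumes Python's built-in sqlite3 module, so COALESCE and LIMIT 1 replace SQL Server's ISNULL and TOP 1, and the ORDER BY Amount DESC is only an illustrative choice for the column left open above. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PaymentDetails (Id TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO PaymentDetails VALUES (?, ?)",
    [("1234", 50), ("1234", 75)],  # hypothetical rows sharing one Id
)

# A scalar subquery with no rows evaluates to NULL, which COALESCE turns into 0.
# The explicit ORDER BY makes the chosen row deterministic when Ids repeat.
query = """
    SELECT COALESCE(
        (SELECT Amount FROM PaymentDetails
         WHERE Id = ? ORDER BY Amount DESC LIMIT 1),
        0
    ) AS Amount
"""
missing = conn.execute(query, ("5678",)).fetchone()[0]
present = conn.execute(query, ("1234",)).fetchone()[0]
print(missing, present)
```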
I found this piece of code a while ago, and I don't seem to find any explanation about how it really works:
SELECT account_id from accounts order by account_id = 100;
So, I know what order by [column] desc|asc does to the result set. But I don't seem to find the explanation for giving a value to the [column] and how that affects the result set. It's clearly affected, but I don't seem to find a pattern.
Try rewriting your query using an explicit CASE expression in the ORDER BY clause:
SELECT account_id
FROM accounts
ORDER BY CASE WHEN account_id = 100 THEN 1 ELSE 0 END;
You will observe that all records having account_id != 100 appear before the records where account_id = 100. When you use:
ORDER BY account_id = 100
Then you are ordering by the boolean equality itself. So, when not true, it would evaluate to zero, and when true would evaluate to one.
Postgres supports boolean types, which take on two values "true" and "false" (plus NULL, of course). A boolean value is being used as the order by key.
The expression account_id = 100 evaluates to "true" for account_id 100 and false for others (or NULL).
What does this do? Well, "true" > "false" and the ordering is ascending. Hence, the true value is ordered after the false values; account_id 100 goes at the end. Well, not quite the end. NULL values sort last, so they would go at the very end.
More commonly, this is done with a descending sort:
order by (account_id = 100) desc
This puts account 100 first in the list.
Note: I put the expression in parentheses in such cases to make it clear that the intent really is to order by the expression. That is, there is no typo.
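Both orderings are easy to try on a toy table. This is a sketch assuming Python's built-in sqlite3 module and a hypothetical accounts table without NULLs; SQLite evaluates account_id = 100 to the integers 0 and 1 rather than a true boolean type, but for non-null values the resulting order matches the explanation above. A second sort key is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)", [(50,), (100,), (7,), (200,)]
)

# Ascending on the comparison: false (0) rows first, the true (1) row last.
ascending = [
    row[0]
    for row in conn.execute(
        "SELECT account_id FROM accounts ORDER BY account_id = 100, account_id"
    )
]

# Descending on the comparison: account 100 comes first.
descending = [
    row[0]
    for row in conn.execute(
        "SELECT account_id FROM accounts ORDER BY (account_id = 100) DESC, account_id"
    )
]
print(ascending, descending)
```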
ORDER BY supports two sort directions, ASC and DESC.
The default is ASC.
Let us take an example:
SELECT * from Person
ORDER BY Age DESC
The above query returns the rows of Person ordered by Age in descending order.
I have a small piece of data (15 records) where part of it I want ordered in alphabetical order, and part of it ordered by ID
Image 1 shows my data in the original order
After doing the query SELECT * FROM tableName ORDER BY code;
Image 2 shows my data now in alphabetical order, which is great however I would like the top 2 records to be ordered by id
Image 3 shows how I would like my data to look
Could someone help with my query please?
I assumed id is an integer. You can use a conditional CASE expression in the ORDER BY clause.
Note that the first expression, case when code in ('LUX-INT', 'LUX-CONT') then -id end desc, returns either -id or NULL. Since NULL comes first in a default (ascending) ORDER BY, I use DESC and negate the id value, so that the NULL rows come last and id is still in ascending order.
order by case when code in ('LUX-INT', 'LUX-CONT') then -id end desc, code
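A quick way to see this ordering in action, as a sketch assuming Python's built-in sqlite3 module (which, like the behaviour described above, puts NULLs last under DESC) and hypothetical sample rows, since the original data is only shown in the images:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (id INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO tableName VALUES (?, ?)",
    [
        (1, "LUX-INT"),
        (2, "LUX-CONT"),
        (3, "BETA"),   # hypothetical codes
        (4, "ALPHA"),
        (5, "GAMMA"),
    ],
)

# Special codes get -id, everything else gets NULL. DESC on -id means
# ascending id, and the NULL rows fall to the end, where code orders them.
ordered_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id, code FROM tableName
        ORDER BY CASE WHEN code IN ('LUX-INT', 'LUX-CONT') THEN -id END DESC,
                 code
        """
    )
]
print(ordered_ids)
```

NULL placement under DESC differs between database systems, so it is worth checking on the target system before relying on it.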
Use CASE WHEN:
SELECT * FROM tableName ORDER BY case when code in ('LUX-INT','LUX-CONT') then id else code end
I would write this as:
order by (case when code in ('LUX-INT', 'LUX-CONT') then 1 else 2 end), -- put the special codes first
(case when code in ('LUX-INT', 'LUX-CONT') then code end), -- order them alphabetically
id -- order the rest by id
This works regardless of the types and collations of the underlying columns.
I want to retrieve a full table with some of the values sorted. The sorted values should all appear before the unsorted values. I thought I could pull this off with a UNION, but ORDER BY is only valid after unioning the tables, and my data isn't set up such that that is useful in this case. I want rows with a column value of 0-6 to show up sorted in DESC order, and then the rest of the results to show up after that. Is there some way to specify a condition in the ORDER BY clause? I saw something that looked close to what I wanted to do, but I couldn't get the equality condition working in SQL. I'm going to try to make a query using WHEN cases, but I'm not sure if there's a way to specify a case like currentValue <= 6. If anyone has any suggestions, that would be awesome.
You could do something like this:
order by (case when currentValue <= 6 then 1 else 0 end) desc,
(case when currentValue <= 6 then column end) desc
The first puts the values you care about first. The second puts them in sorted order. The rest will be ordered arbitrarily.
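The two sort keys can be tried on a small table. This is a sketch assuming Python's built-in sqlite3 module and a hypothetical yourdata table; currentValue plays the role of the placeholder column in the second key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourdata (currentValue INTEGER)")
conn.executemany(
    "INSERT INTO yourdata VALUES (?)", [(3,), (10,), (6,), (0,), (8,), (5,)]
)

# Key 1 moves rows with currentValue <= 6 to the front.
# Key 2 sorts only those rows (descending); the rest share NULL for this key.
values = [
    row[0]
    for row in conn.execute(
        """
        SELECT currentValue FROM yourdata
        ORDER BY (CASE WHEN currentValue <= 6 THEN 1 ELSE 0 END) DESC,
                 (CASE WHEN currentValue <= 6 THEN currentValue END) DESC
        """
    )
]
sorted_part, rest = values[:4], values[4:]
print(sorted_part, rest)
```

The order of the remaining rows (10 and 8 here) is not guaranteed, so only the first part should be relied on.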
Try this:
SELECT *
FROM yourdata
ORDER BY CASE WHEN yourColumn BETWEEN 0 AND 6 THEN yourColumn ELSE -1 End Desc
One RDBMS-agnostic solution would be to add a second field that takes the same value as the field you wish to sort when that field is less than or equal to six. Then just sort by that field.