I have a function named get_open_profit that calculates some data.
Passing input to this function does not work properly.
I have a table named results; querying it returns the following:
select sum_buy_trades from results order by sum_buy_trades limit 1 : 274
select total_avg_buy from results order by sum_buy_trades limit 1 : 2019746
When I call the function like this
select get_open_profit(274, 2019746) the result is: 30192700
But if I write it like this, I get an error
select get_open_profit(select sum_buy_trades from results order by sum_buy_trades limit 1, select total_avg_buy from results order by sum_buy_trades limit 1
Why does it not work?
If you want to use scalar subqueries (that is, subqueries that return one value), then each needs its own parentheses:
select get_open_profit( (select sum_buy_trades
from results
order by sum_buy_trades
limit 1
),
(select total_avg_buy
from results
order by sum_buy_trades
limit 1
)
);
In this case, though, the query might be more naturally written as:
select get_open_profit( r.sum_buy_trades, r.total_avg_buy )
from (select sum_buy_trades, total_avg_buy
from results
order by sum_buy_trades
limit 1
) r;
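As a rough check of both forms, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than the asker's database, and the body of get_open_profit is a hypothetical stand-in (a simple sum), since the real function is not shown in the question.

```python
import sqlite3


def get_open_profit(sum_buy_trades, total_avg_buy):
    # Hypothetical stand-in: the real function body is not shown in the question.
    return sum_buy_trades + total_avg_buy


conn = sqlite3.connect(":memory:")
conn.create_function("get_open_profit", 2, get_open_profit)
conn.execute("CREATE TABLE results (sum_buy_trades INTEGER, total_avg_buy INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?)", [(274, 2019746), (300, 5)])

# Subqueries without their own parentheses: the parser rejects the statement.
bad_sql = """
SELECT get_open_profit(
    SELECT sum_buy_trades FROM results ORDER BY sum_buy_trades LIMIT 1,
    SELECT total_avg_buy FROM results ORDER BY sum_buy_trades LIMIT 1)
"""
try:
    conn.execute(bad_sql)
    bad_failed = False
except sqlite3.OperationalError:
    bad_failed = True

# Each scalar subquery wrapped in its own parentheses.
good_sql = """
SELECT get_open_profit(
    (SELECT sum_buy_trades FROM results ORDER BY sum_buy_trades LIMIT 1),
    (SELECT total_avg_buy FROM results ORDER BY sum_buy_trades LIMIT 1))
"""
scalar_result = conn.execute(good_sql).fetchone()[0]

# Derived-table form: pick the row once, then pass both columns.
derived_sql = """
SELECT get_open_profit(r.sum_buy_trades, r.total_avg_buy)
FROM (SELECT sum_buy_trades, total_avg_buy
      FROM results
      ORDER BY sum_buy_trades
      LIMIT 1) r
"""
derived_result = conn.execute(derived_sql).fetchone()[0]

print(bad_failed, scalar_result, derived_result)
```

The derived-table form also guarantees that both arguments come from the same row, because the row is selected only once.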
We would expect this Google BigQuery query to remove at most 10 rows from the results, but it returns zero rows, even though table A has thousands of rows, all with unique ENCNTR_IDs.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_B`
LIMIT 10
)
If we make the query self-referential, it behaves as expected: we get thousands of results with just 10 rows removed.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_A` # <--- same table name
LIMIT 10
)
What are we doing wrong? Why does the first query give us zero results rather than just remove 10 rows of results?
Solution: Use NOT EXISTS instead of NOT IN when dealing with possible nulls:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE NOT EXISTS (SELECT * FROM UNNEST([2,3,null]) i2 WHERE i=i2)
# 1
Previous guess - which turned out to be the cause:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3])
# 1
vs
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3,null])
# This query returned no results.
Are there any nulls in that project.dataset.table_B?
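The same NULL behaviour can be reproduced outside BigQuery. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in engine, and the tiny table_a and table_b tables are made up for illustration. `x NOT IN (2, 3, NULL)` evaluates to NULL rather than true for x = 1, so the row is filtered out, whereas NOT EXISTS only checks whether a matching row was found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (encntr_id INTEGER)")
conn.execute("CREATE TABLE table_b (encntr_id INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?)", [(1,), (2,), (3,)])
# table_b contains a NULL, which is what breaks NOT IN.
conn.executemany("INSERT INTO table_b VALUES (?)", [(2,), (3,), (None,)])

# NOT IN: the comparison against NULL is unknown, so no row qualifies.
not_in_rows = conn.execute(
    """
    SELECT encntr_id
    FROM table_a
    WHERE encntr_id NOT IN (SELECT encntr_id FROM table_b)
    """
).fetchall()

# NOT EXISTS: the NULL row never matches, so encntr_id = 1 survives.
not_exists_rows = conn.execute(
    """
    SELECT a.encntr_id
    FROM table_a a
    WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.encntr_id = a.encntr_id)
    """
).fetchall()

print(not_in_rows)
print(not_exists_rows)
```

Filtering the subquery with `WHERE encntr_id IS NOT NULL` is another way to keep NOT IN usable when the column may contain nulls.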
I have the following query:
select count(*), cvc_date, cvc_product_id, cvc_seller_product_id, competitor_wid
from dw.f_catalogvscompetitors
group by cvc_date, cvc_product_id, cvc_seller_product_id, competitor_wid
having count(*) > 1
When I execute it, I get 0 rows as a result. However, when I run this:
select count(*) from (
select count(*), cvc_date, cvc_product_id, cvc_seller_product_id, competitor_wid
from dw.f_catalogvscompetitors
group by cvc_date, cvc_product_id, cvc_seller_product_id, competitor_wid
having count(*) > 1)
I get the number 22584. How is this possible?
order by uses only one reducer, so it is slow. I'm trying to find a faster way. sort by sorts within each reducer, so how can we get a global ordering?
I found this via a search engine:
select * from
(select title,cast(price as FLOAT) p from tablename
distribute by time
sort by p desc
limit 10 ) t
order by t.p desc
limit 10;
Then I tried to validate it.
1. Get the right answer from my Hive table. There are 215666 records in the table named tablename.
SELECT title,cast(price as FLOAT) p
from tablename
WHERE dt='2020-03-08'
and price IS NOT NULL
ORDER BY p DESC
LIMIT 10
;
2. Use the clause I found.
set hive.execution.engine=mr;
set mapred.reduce.tasks=5;
SELECT title,cast(price as FLOAT) p
from tablename
WHERE dt='2020-03-08'
and price IS NOT NULL
DISTRIBUTE BY title
SORT BY p desc
LIMIT 10
;
The result is the same as the right answer!
Here are my questions:
1. Why does it return only 10 rows? There are 5 reducers and each returns 10, so shouldn't there be 5*10=50?
2. If it should return 10 rows, why is the result globally ordered? Do these 10 rows not come from a single reducer? If the limit picks rows arbitrarily, it cannot produce a global order across 5 reducers.
3. If it should return 10 rows, is the outer part of the clause I found redundant?
select * from
(
) t
order by t.p desc
limit 10;
Consider using a total order partitioner; see https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad#HBaseBulkLoad-PrepareRangePartitioning for details (just ignore the HBase part).
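For intuition about the two-stage query from the question (sort and limit inside each reducer, then an outer order by with limit), here is a small Python simulation. It is only a sketch: the rows are random, the round-robin split is an assumed stand-in for distribute by, and it does not model how Hive itself applies limit after sort by. It shows that the global top 10 is always contained in the union of the per-partition top 10s, so the outer order by only has to sort 50 candidate rows.

```python
import heapq
import random

random.seed(0)
NUM_REDUCERS = 5
K = 10

# Made-up data: (title, price) pairs standing in for the Hive table.
rows = [(f"title{i}", random.uniform(0, 1000)) for i in range(1000)]

# Stand-in for "distribute by": assign every row to one of the reducers.
partitions = [[] for _ in range(NUM_REDUCERS)]
for index, row in enumerate(rows):
    partitions[index % NUM_REDUCERS].append(row)

# Stand-in for "sort by p desc limit 10" inside each reducer.
local_top = [heapq.nlargest(K, part, key=lambda r: r[1]) for part in partitions]

# Stand-in for the outer "order by p desc limit 10" over the candidates.
candidates = [row for part in local_top for row in part]
merged_top = sorted(candidates, key=lambda r: r[1], reverse=True)[:K]

# Reference answer: a single global sort over all rows.
expected_top = sorted(rows, key=lambda r: r[1], reverse=True)[:K]

print(len(candidates))
print([price for _, price in merged_top] == [price for _, price in expected_top])
```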
Objective: Count the users that have event_name = 'Wallet'.
Problem: I limited the query's result to 100 as a check, so the expected result is 100, but count(params.value.string_value) returns 124.
Code : SELECT count(params.value.string_value) FROM "myproj.analytics_197163127.events_20190528",UNNEST(event_params) as params where event_name ='Wallet' and params.key = 'UserId' limit 100
Expected result: if the query returns 100 records, the count should be 100, so why does it show 124?
I hope the question is clear.
limit is applied to the result set produced by the query.
Your query is an aggregation query with no group by. Such an aggregation always returns one row. So, the limit does not affect the results.
If you want to see 100 for the result set, use a CTE or subquery:
SELECT count(params.value.string_value)
FROM (SELECT params
FROM `myproj.analytics_197163127.events_20190528` e CROSS JOIN
UNNEST(e.event_params) params
WHERE e.event_name ='Wallet' AND params.key = 'UserId'
LIMIT 100
) ep
The query shows 100 records because of the limit 100 at the end:
SELECT event_date,event_timestamp,event_name, params.value.string_value
FROM myproj.analytics_197163127.events_20190528, UNNEST(event_params) as params
where event_name ='Wallet' and params.key = 'UserId'
limit 100
Remove that and check again.
LIMIT 100 specifies the number of rows to be returned from the SQL statement. It does not influence the COUNT() in your query. So there is a difference between:
select count(*) from table limit 100
This returns a single value with the number of rows in the table. On the other hand:
select count(*) from (select * from table limit 100)
This returns 100 (if the table has more than 100 rows; otherwise it returns the number of rows in the table).
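The difference is easy to reproduce. The sketch below assumes SQLite via Python's sqlite3 module instead of BigQuery, with a made-up events table holding 124 rows to mirror the numbers in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT)")
# 124 made-up rows, matching the count reported in the question.
conn.executemany(
    "INSERT INTO events VALUES (?)", [(f"user{i}",) for i in range(124)]
)

# LIMIT applies to the single aggregated row, so it changes nothing.
outer_limit = conn.execute(
    "SELECT count(user_id) FROM events LIMIT 100"
).fetchall()

# LIMIT inside the subquery restricts the rows before they are counted.
inner_limit = conn.execute(
    "SELECT count(user_id) FROM (SELECT user_id FROM events LIMIT 100)"
).fetchall()

print(outer_limit)
print(inner_limit)
```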
In my application I use a SELECT TOP 12 * clause to select the top 12 records from the database and show them to the user. In another case I have to show the same results one by one, so I use a SELECT TOP 1 * clause; the rest of the query is the same. I use the SQL row_number() function to select items one by one, serially.
The problem is that SELECT TOP 1 * doesn't return the same row that I get from SELECT TOP 12 *. Also, the result set of SELECT TOP 12 * changes each time I execute the query.
Can anybody explain why the results of SELECT TOP 12 * and SELECT TOP 1 * are not the same?
FYI: here is my SQL
select distinct top 1 * from(
select row_number() over ( ORDER BY Ratings desc ) as Row, * from(
SELECT vw.IsHide, vw.UpdateDate, vw.UserID, vw.UploadPath, vw.MediaUploadID, vw.Ratings, vw.Caption, vw.UserName, vw.BirthYear, vw.BirthDay, vw.BirthMonth, vw.Gender, vw.CityProvince, vw.Approved
FROM VW_Media as vw ,Users as u WITH(NOLOCk)
WHERE vw.IsHide='false' and
GenderNVID=5 and
vw.UserID=u.UserID and
vw.UserID not in(205092) and
vw.UploadTypeNVID=1106 and
vw.IsDeleted='false' and
vw.Approved = 1 and
u.HideProfile=0 and
u.StatusNVID=126 and
vw.UserID not in(Select BlockedToUserID from BlockList WITH(NOLOCk) where UserID=205092)) a) totalres where row >0
Thanks in Advance
Sachin
When you use SELECT TOP, you must also use an ORDER BY clause to avoid getting different results every time.
For performance reasons, the database is free to return the records in any order it likes if you don't specify any ordering.
So you always have to specify the order in which you want the records if you want them in any specific order.
Up to some version of SQL Server (7 IIRC) the natural order of the table was preserved in the result if you didn't specify any ordering, but this feature was removed in later versions.
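A minimal sketch of a deterministic ordering, assuming SQLite via Python's sqlite3 module (so LIMIT stands in for SQL Server's TOP) and a made-up media table. Many rows share the same rating, so ordering by ratings alone leaves the order of tied rows unspecified; adding a unique column as a tie-breaker pins it down, and the first row of the top 12 then matches the top 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media (media_upload_id INTEGER PRIMARY KEY, ratings INTEGER)"
)
# Made-up data with many tied ratings (0, 1 or 2).
conn.executemany(
    "INSERT INTO media VALUES (?, ?)", [(i, i % 3) for i in range(1, 31)]
)

# The unique media_upload_id breaks ties, so the order is fully determined.
ORDERING = "ORDER BY ratings DESC, media_upload_id ASC"

top_12 = conn.execute(
    f"SELECT media_upload_id, ratings FROM media {ORDERING} LIMIT 12"
).fetchall()
top_1 = conn.execute(
    f"SELECT media_upload_id, ratings FROM media {ORDERING} LIMIT 1"
).fetchall()

print(top_1[0] == top_12[0])
```

In the query from the question, the equivalent change would be an ORDER BY on the outer SELECT TOP statement, with a unique column such as MediaUploadID (an assumption about that schema) added as a tie-breaker after Ratings.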