Where clause with dates in hive - hive

The where clause in the below hive query is not working
select
e.num as badge
from dbo.events as e
where TO_DATE(e.event_time_utc) > TO_DATE(select event_date from DL_EDGE_LRF_facilities.card_swipes_lastpulldate)
both event_time_utc and event_date fields are defined as strings and event_time_utc has timestamp values like '2017-09-18 20:10:19.000000' and event_date has only one date value like '2018-01-25'
i am getting an error like "cannot recognize input near 'select' 'event_date' 'from' in function specification " when i run the query, Please help

#user86683; hive does not recognize the syntax since it does not allow in-query in the inequality condition (>). You may try this query and let me know the result.
select e.num as badge
from dbo.events as e, DL_EDGE_LRF_facilities.card_swipes_lastpulldate c
where TO_DATE(e.event_time_utc) > TO_DATE(c.event_date)
You will get a warning but you may ignore it since the table for event_date has only one record.
Warning: Map Join MAPJOIN[10][bigTable=e] in task 'Map 1' is a cross product
Query ID = xxx_20180201102128_aaabb2235-ee69275cbec1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_09fdf345)
Hope this helps. Thanks.

Related

Build query that brings only sessions that have only errors?

I have a table with sessions events names. Each session can have 3 different types of events.
There are sessions that have only error type event and I need to identify them by getting a list those session.
I tried the following code:
SELECT
test.SessionId, SS.RequestId
FROM
(SELECT DISTINCT
SSE.SessionId,
SSE.type,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId, SSE.type) AS total_XSESIONID_TYPE,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId) AS total_XSESIONID
FROM
[CMstg].SessionEvents SSE
-- WHERE SSE.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
) AS test
WHERE
test.total_XSESIONID_TYPE = test.total_XSESIONID
AND test.type = 'Errors'
-- AND test.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
Each session can have more than one type, and I need to count only the sessions that have only type 'errors'. I don't want to include sessions that have additional types of events in the count
While I'm running the first query I'm getting a count of 3 error event per session, but while running the all procedure the number is multiplied to 90?
Sample table :
sessionID
type
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76
NonError
00c896a0-dccc-41bf-8dff-a5cd6856bb76
Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76
Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76
Errors
In this case I should get
sessionid = fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Please advice - hope this is clearer now, thanks!
It's been a long time but I think something like this should get you the desired results:
SELECT securemeSessionId
FROM <TableName> -- replace with actual table name
GROUP BY securemeSessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'errors' THEN 1 END)
And a pro tip: When asking sql-server questions, it's best to follow these guidelines
SELECT *
FROM NameOfDataBase
WHERE type!= 'errors'
Is it what you wanted to do?

SQL BigQuery - Error that variable is not grouped by even though it is

SQL Code:
SELECT community_table.community_name,
community_table.id,
DATE(timestamp) as date,
ifnull(COUNT(distinct app_opened.user_id), 0) as num_opened_DAU,
lag(COUNT(distinct app_opened.user_id)) OVER
(ORDER BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
FROM *** app_opened
LEFT JOIN (
SELECT DISTINCT id, community_id_2, context_traits_first_name, context_traits_last_name
FROM (
SELECT *
FROM ***,
UNNEST (JSON_EXTRACT_ARRAY(context_traits_community_ids, "$")) as community_id_2
)
GROUP by community_id_2, id, context_traits_first_name, context_traits_last_name) as community_id_table
ON community_id_table.id = app_opened.user_id
LEFT JOIN (
SELECT DISTINCT id, name as community_name
FROM ***) as community_table
ON TO_JSON_STRING(community_table.id) = community_id_table.community_id_2
WHERE app_opened.user_id is not null AND
EXTRACT(DAYOFWEEK FROM DATE(timestamp)) = 2 AND
community_table.community_name is not null
GROUP BY community_table.community_name, community_table.id, DATE(timestamp)
Error Message:
I am quite confused on what could be going wrong here, as the error says that timestamp is not grouped, even though I have grouped it at the bottom. I tried including just timestamp rather than Date(timestamp), but that ruins the table data that I am trying to create, where I find the number of users on a single day. Does anyone have any other ideas? My goal is for a single row, get the previous row's data, but because I am grouping by specific metrics, I need to make sure they are ordered by them as well. Thank you so much!
I think you simply need to modify OVER part as:
OVER (PARTITION BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
UPDATE. Seems that the problem was caused by using DATE() function within OVER so it can be solved by using DATE(timestamp) inside of subquery and passing alias to OVER

Case Statement in SQL Query Issue

I'm trying to run a SQL query but an error happens when I run it.
Error:
[Code: -811, SQL State: 21000]
The result of a scalar fullselect, SELECT INTO statement, or VALUES INTO statement is more than one row..
SQLCODE=-811, SQLSTATE=21000, DRIVER=4.19.49
This is the SQL query that I am trying to run, I believe there is a problem with my CASE statement, I'm running out of solution. Please help, thanks a lot!
SELECT
ES.SHPMNT_REF,
(CASE
WHEN (ES.SERVICE_PROVIDER_NAME) IS NULL
THEN (SELECT BRDB.EXPORT_ONHAND.SERVICE_PROVIDER_NAME
FROM BRDB.EXPORT_ONHAND
WHERE BRDB.EXPORT_ONHAND.SHPMNT_REF = ES.SHPMNT_REF)
ELSE (ES.SERVICE_PROVIDER_NAME)
END) AS SP
FROM
BRDB.EXPORT_SHIPMENT ES
WHERE
ES.DATE_CREATE > CURRENT TIMESTAMP - 30 DAYS
I think this is what you are after. Joining on the table data you might need, then letting COALESCE check for null and get the other data if it is.
SELECT
ES.SHPMNT_REF,
COALESCE(ES.SERVICE_PROVIDER_NAME, OH.SERVICE_PROVIDER_NAME) AS SP
FROM BRDB.EXPORT_SHIPMENT ES
LEFT JOIN BRDB.EXPORT_ONHAND AS OH
ON ES.SHPMNT_REF = OH.SHPMNT_REF
WHERE
ES.DATE_CREATE > CURRENT TIMESTAMP - 30 DAYS
Maybe you should put inside the case:
THEN (SELECT TOP 1 BRDB.EXPORT_ONHAND.SERVICE_PROVIDER_NAME
The error is thrown because your subquery returns multiple values.
The best way would be joining the two table first and then get the value you need when the value is NULL.
That should work:
SELECT
ES.SHPMNT_REF
,CASE
WHEN ES.SERVICE_PROVIDER_NAME IS NULL
THEN EO.SERVICE_PROVIDER_NAME
ELSE ES.SERVICE_PROVIDER_NAME
END AS 'SP'
FROM BRDB.EXPORT_SHIPMENT ES
LEFT JOIN BRDB.EXPORT_ONHAND EO
ON ES.SHPMNT_REF = EO.SHPMNT_REF
WHERE ES.DATE_CREATE > CURRENT TIMESTAMP - 30 DAYS

Google BiqQuery Internal Error

Edit: Tidied up the query a bit. Checked running on one day (versus the 27 I need) and the query runs. With 27 days of data it's trying to process 5.67TB. Could this be the issue?
Latest ID of error run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and Bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
COUNT(UNIQUE(msisdn_token)) AS users,
(SUM(up_link_data_bytes) + SUM(down_link_data_bytes))/1000000 AS tot_data_mb
FROM (
SELECT
request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
WHERE SUBSTR(http_status_code,1,1) IN ('1',
'2',
'3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON
a.request_domain = d.request_domain
WHERE
DATE(timestamp) >= '2016-01-01'
AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY
1
Is there something I'm doing wrong?
The problem seems to be coming from UNIQUE() - it returns repeated field with too many elements in it. The error could be improved, but workaround for you would be to use explicit GROUP BY and then run COUNT on top of it.
If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or a higher approximation parameter than the default 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.

Only a single result allowed for a SELECT that is part of an expression

I have the following SQL statement. It's throws the following error: "Only a single result allowed for a SELECT that is part of an expression". The goal of my sql statement is to get the name of the employee who made the 'cheapest' bribe.
The part between the brackets return the employee_id and the money it costs a day (of the relative cheapest bribe). These are two results while I only want the employee_id. So I just want to use the MIN part to get the right employee_id. How can I do this?
SELECT Voornaam, Achternaam
FROM Medewerker m JOIN
(
SELECT Medewerker_id
FROM Steekpenning
ORDER BY -1*Bedrag/(julianday(Begindatum) - julianday(Einddatum))
limit 1
) s
on m.Medewerker_id = s.Medewerker_id;
EDITED the answer. How can I expand this query to only show the bribes started this month? I think I need to use something like this? (julianday(Begindatum) - julianday('now')) > 31 but where?
Regards.
Cas
I think the following will work in SQLite:
select Firstname, Surname
from Employee e join
(select employee_id
from bribe
order by -1*Amount/(julianday(Startdate) - julianday(Enddate))
limit 1
) b
on e.employee_id = b.employee_id;