Google BigQuery Internal Error - google-bigquery

Edit: Tidied up the query a bit. Checked running on one day (versus the 27 I need) and the query runs. With 27 days of data it's trying to process 5.67TB. Could this be the issue?
Latest ID of error run:
Job ID: ee-corporate:bquijob_3f47d425_1530e03af64
I keep getting this error message when trying to run a query in BigQuery, both through the UI and through bigrquery.
Query Failed
Error: An internal error occurred and the request could not be completed.
Job ID: ee-corporate:bquijob_6b9bac2e_1530dba312e
Code below:
SELECT
  CASE WHEN d.category_grouped IS NULL THEN 'N/A' ELSE d.category_grouped END AS category_grouped_cleaned,
  COUNT(UNIQUE(msisdn_token)) AS users,
  (SUM(up_link_data_bytes) + SUM(down_link_data_bytes))/1000000 AS tot_data_mb
FROM (
  SELECT
    request_domain, up_link_data_bytes, down_link_data_bytes, msisdn_token, timestamp
  FROM (TABLE_DATE_RANGE([helpful-skyline-97216:WEBLOG_Staging.WEBLOG_], TIMESTAMP('20160101'), TIMESTAMP('20160127')))
  WHERE SUBSTR(http_status_code,1,1) IN ('1', '2', '3')) a
LEFT JOIN EACH web_usage_201601.domain_to_cat_lookup_27JAN_with_groups d
ON
  a.request_domain = d.request_domain
WHERE
  DATE(timestamp) >= '2016-01-01'
  AND DATE(timestamp) <= '2016-01-27'
GROUP EACH BY
  1
Is there something I'm doing wrong?

The problem seems to be coming from UNIQUE(): it returns a repeated field with too many elements in it. The error message could be improved, but a workaround for you would be to use an explicit GROUP BY and then run COUNT on top of it.
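As a rough illustration of the GROUP BY-then-COUNT workaround, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The `weblog` table, its `category` and `bytes` columns, and the sample rows are all hypothetical; only `msisdn_token` comes from the question. The inner query collapses the data to one row per (category, user) pair, so the outer query can use a plain COUNT(*) instead of UNIQUE().

```python
import sqlite3

# Hypothetical miniature of the weblog data; the table name, the "category"
# and "bytes" columns, and the rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (category TEXT, msisdn_token TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO weblog VALUES (?, ?, ?)",
    [
        ("news", "u1", 100), ("news", "u1", 50), ("news", "u2", 25),
        ("video", "u2", 10), ("video", "u3", 20), ("video", "u3", 30),
    ],
)

# Inner query: explicit GROUP BY yields one row per (category, user).
# Outer query: COUNT(*) over those rows is the number of distinct users,
# and summing the per-user byte totals gives the category total.
query = """
SELECT category, COUNT(*) AS users, SUM(bytes) AS total_bytes
FROM (
    SELECT category, msisdn_token, SUM(bytes) AS bytes
    FROM weblog
    GROUP BY category, msisdn_token
)
GROUP BY category
ORDER BY category
"""
users_per_category = conn.execute(query).fetchall()
print(users_per_category)
```

The same two-level shape should carry over to the original query, with the join and byte sums placed in the inner query, though that has not been run against BigQuery here.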

If you are okay with an approximation, you can also use
COUNT(DISTINCT msisdn_token) AS users
or, for exact results up to a larger count, pass a threshold higher than the default of 1000,
COUNT(DISTINCT msisdn_token, 5000) AS users
GROUP BY is the most general approach, but these can be faster if they do what you need.

Related

Build query that brings only sessions that have only errors?

I have a table with sessions events names. Each session can have 3 different types of events.
There are sessions that have only error-type events, and I need to identify them by getting a list of those sessions.
I tried the following code:
SELECT
test.SessionId, SS.RequestId
FROM
(SELECT DISTINCT
SSE.SessionId,
SSE.type,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId, SSE.type) AS total_XSESIONID_TYPE,
COUNT(SSE.SessionId) OVER (ORDER BY SSE.SessionId) AS total_XSESIONID
FROM
[CMstg].SessionEvents SSE
-- WHERE SSE.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
) AS test
WHERE
test.total_XSESIONID_TYPE = test.total_XSESIONID
AND test.type = 'Errors'
-- AND test.SessionId IN ('fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb' )
Each session can have more than one type, and I need to count only the sessions that have only type 'errors'. I don't want to include sessions that have additional types of events in the count.
When I run the first query I get a count of 3 error events per session, but when I run the whole procedure the number is multiplied to 90.
Sample table:
sessionID                               type
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb    Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb    Errors
fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb    Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76    NonError
00c896a0-dccc-41bf-8dff-a5cd6856bb76    Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76    Errors
00c896a0-dccc-41bf-8dff-a5cd6856bb76    Errors
In this case I should get
sessionid = fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb
Please advise. I hope this is clearer now, thanks!
It's been a long time but I think something like this should get you the desired results:
SELECT securemeSessionId
FROM <TableName> -- replace with actual table name
GROUP BY securemeSessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'errors' THEN 1 END)
And a pro tip: When asking sql-server questions, it's best to follow these guidelines
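To check the HAVING approach against the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server. The table name `SessionEvents` and column `SessionId` follow the question's query, not the answer's `securemeSessionId`; that substitution is an assumption. SQLite compares text case-sensitively by default, so the literal is written as 'Errors' to match the sample data.

```python
import sqlite3

ERR_ONLY = "fa3ed523-60f9-4af0-a85f-1dec9e9d2cdb"
MIXED = "00c896a0-dccc-41bf-8dff-a5cd6856bb76"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SessionEvents (SessionId TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO SessionEvents VALUES (?, ?)",
    [
        (ERR_ONLY, "Errors"), (ERR_ONLY, "Errors"), (ERR_ONLY, "Errors"),
        (MIXED, "NonError"), (MIXED, "Errors"), (MIXED, "Errors"), (MIXED, "Errors"),
    ],
)

# COUNT(*) counts every event in the session, while COUNT(CASE ...) counts
# only the error events (the CASE yields NULL otherwise, which COUNT skips).
# The two are equal exactly when every event in the session is an error.
query = """
SELECT SessionId
FROM SessionEvents
GROUP BY SessionId
HAVING COUNT(*) = COUNT(CASE WHEN type = 'Errors' THEN 1 END)
"""
error_only_sessions = [row[0] for row in conn.execute(query)]
print(error_only_sessions)
```

On this sample the only session returned is the one whose three events are all errors, which matches the result the question asks for.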
SELECT *
FROM NameOfDataBase
WHERE type!= 'errors'
Is it what you wanted to do?

Netezza Box reboots when following query is executed

When I run the following query, my Netezza NPS reboots. Would someone please let me know what is causing this behaviour?
select avg ( bse.WEEKS_BETWEEN_RESPONSES_HR ) as g_AVG
, sqlext.median( bse.WEEKS_BETWEEN_RESPONSES_HR ) as g_med
from (
select WEEKS_BETWEEN_RESPONSES_HR
FROM (
select distinct LOYALTY_ACCOUNT_CARD_ID
, BONUS_END_DATE
, LAG(BONUS_END_DATE,1) OVER (partition by LOYALTY_ACCOUNT_CARD_ID order by BONUS_END_DATE) as PRIOR_BONUS_END_DATE
,(( BONUS_END_DATE - PRIOR_BONUS_END_DATE)/7) as WEEKS_BETWEEN_RESPONSES_HR
from JO_ACT_PTD_STEP_1 bse
where upper ( bonus_desc ) like '%SPEND%'
and redemption = 1
) BSE
where WEEKS_BETWEEN_RESPONSES_HR is not null and WEEKS_BETWEEN_RESPONSES_HR > 0
) bse limit 500
You need to call IBM support.
There is probably a stack trace or a dump file somewhere that will tell them what happened.
If I were experiencing your problem, I would remove the function calls one by one, making the SQL simpler and simpler until the error disappeared.
But of course you will need to do that in the middle of the night, or at some other time when nobody else is bothered by the constant reboots.

Where clause with dates in hive

The WHERE clause in the Hive query below is not working:
select
e.num as badge
from dbo.events as e
where TO_DATE(e.event_time_utc) > TO_DATE(select event_date from DL_EDGE_LRF_facilities.card_swipes_lastpulldate)
Both the event_time_utc and event_date fields are defined as strings. event_time_utc has timestamp values like '2017-09-18 20:10:19.000000', and event_date has only one date value, like '2018-01-25'.
I am getting an error like "cannot recognize input near 'select' 'event_date' 'from' in function specification" when I run the query. Please help.
@user86683: Hive does not recognize this syntax because it does not allow a subquery in an inequality condition (>). You may try this query and let me know the result.
select e.num as badge
from dbo.events as e, DL_EDGE_LRF_facilities.card_swipes_lastpulldate c
where TO_DATE(e.event_time_utc) > TO_DATE(c.event_date)
You will get a warning but you may ignore it since the table for event_date has only one record.
Warning: Map Join MAPJOIN[10][bigTable=e] in task 'Map 1' is a cross product
Query ID = xxx_20180201102128_aaabb2235-ee69275cbec1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_09fdf345)
Hope this helps. Thanks.
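For a quick check of the logic behind the join rewrite, here is a minimal sketch in Python's built-in sqlite3 module, used as a stand-in for Hive. The sample rows are made up, the schema prefixes are dropped, and SQLite's DATE() plays the role of Hive's TO_DATE(); all of these are assumptions for illustration. Because the lookup table holds a single row, the cross join simply attaches that one date to every event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (num INTEGER, event_time_utc TEXT)")
conn.execute("CREATE TABLE card_swipes_lastpulldate (event_date TEXT)")

# Hypothetical sample rows in the string formats described in the question.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2017-09-18 20:10:19.000000"),
        (2, "2018-01-25 23:59:59.000000"),
        (3, "2018-01-26 08:00:00.000000"),
    ],
)
conn.execute("INSERT INTO card_swipes_lastpulldate VALUES ('2018-01-25')")

# The one-row table is joined to every event, so the inequality can be
# written as an ordinary column comparison instead of a subquery.
query = """
SELECT e.num AS badge
FROM events AS e
CROSS JOIN card_swipes_lastpulldate AS c
WHERE DATE(e.event_time_utc) > DATE(c.event_date)
ORDER BY e.num
"""
badges = [row[0] for row in conn.execute(query)]
print(badges)
```

If the lookup table ever held more than one row, each event would be compared against every date and could appear several times, which is why the single-row condition matters.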

IBM Informix-SQL syntax error, basic query from Microsoft BIDS to Cisco UCCX database

I'm running the below query against an IBM Informix database and getting an ERROR 42000: A syntax error has occurred. The FROM and WHERE clauses run fine in other queries, so I'm looking at the SELECT and GROUP BY portions. Any ideas what's wrong with the syntax?
SELECT COUNT(DISTINCT "informix".agentconnectiondetail.sessionid) AS calls_abandoned,
DAY("informix".agentconnectiondetail.startdatetime) AS Expr2
FROM "informix".agentconnectiondetail, "informix".contactqueuedetail, "informix".contactservicequeue
WHERE "informix".agentconnectiondetail.sessionid = "informix".contactqueuedetail.sessionid AND
"informix".contactqueuedetail.targetid = "informix".contactservicequeue.recordid AND "informix".contactqueuedetail.disposition = 1 AND
"informix".agentconnectiondetail.startdatetime BETWEEN '2016-10-1 00:00:00' AND CURRENT
GROUP BY DAY("informix".agentconnectiondetail.startdatetime)
The goal btw is to find the total number of unique calls (calls_abandoned) that occur on each day of the month (1-31).
Replace the
GROUP BY DAY("informix".agentconnectiondetail.startdatetime)
by
GROUP BY 2
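To show what the positional form means, here is a small sketch in Python's built-in sqlite3 module, standing in for Informix. The sample rows are invented, the schema is reduced to one table, and SQLite's strftime('%d', ...) replaces Informix's DAY(); these are assumptions for illustration only. GROUP BY 2 groups by the second expression in the SELECT list, here the day of the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agentconnectiondetail (sessionid INTEGER, startdatetime TEXT)"
)
# Hypothetical rows: session 1 appears twice on the 1st, session 2 once on
# the 1st, and session 3 once on the 2nd.
conn.executemany(
    "INSERT INTO agentconnectiondetail VALUES (?, ?)",
    [
        (1, "2016-10-01 09:00:00"),
        (1, "2016-10-01 09:05:00"),
        (2, "2016-10-01 10:00:00"),
        (3, "2016-10-02 11:00:00"),
    ],
)

# "GROUP BY 2" refers to the second select-list column (day_of_month),
# so the day expression does not have to be repeated in the GROUP BY.
query = """
SELECT COUNT(DISTINCT sessionid) AS calls_abandoned,
       CAST(strftime('%d', startdatetime) AS INTEGER) AS day_of_month
FROM agentconnectiondetail
GROUP BY 2
ORDER BY 2
"""
calls_per_day = conn.execute(query).fetchall()
print(calls_per_day)
```

Each result row is (distinct sessions, day of month), so the duplicate rows for session 1 are counted once.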

if clause in bigquery

I have a (join) query that works as expected. But as soon as I add the following column, it neither shows any results nor completes (the query-running counter keeps growing).
IF((d.network_type contains '_user' AND d.is_network=1),s.impressions,0) AS effimp
Is there any other way to optimize this?
The full query is as follows and it was working when I tried it in the last month.
SELECT s.date_time AS date_time
, s.requests AS requests, s.impressions AS impressions
, s.clicks AS clicks, s.conversions AS conversions
, IF((d.network_type contains '_user'
AND d.is_network=1),s.impressions,0) AS effimp
, s.total_revenue AS total_revenue
, s.total_basket_value AS total_basket_value
, s.total_num_items AS total_num_items
, s.zone_id as zone_id
FROM company.ox_data_summary s
INNER JOIN company.ox_banners1 AS d ON d.bannerid=s.ad_id
limit 100
Query Failed
Error: Unexpected. Please try again.
If I remove the IF clause, it does work.
Looks like you're hitting a query processing bug. We're investigating.