Arithmetic operation overflow in Azure Stream Analytics with reference data

I have an Azure Stream Analytics job that enriches incoming data with reference data. Some of that reference data, e.g. physical addresses, changes only rarely.
A generic script looks like this:
SELECT
a.input1,
a.input2,
b.input1 AS ref_input1
INTO output
FROM a
LEFT JOIN metadata AS b ON a.input1 = b.input1
If I try to run this script, I get an error:
System Exception: Arithmetic operation resulted in an overflow.
at Microsoft.Streaming.StreamingNode.ReferenceData.DiscoveryCursor.get_NextDiscoveryDueTimeInMillisecond()
at Microsoft.Streaming.StreamingNode.ReferenceData.ReferenceDataDiscoveryStrategyBase.Discover()
at Microsoft.Streaming.StreamingNode.ReferenceData.ReferenceDataDiscoveryProcessor.Process(Boolean isInitialization)
at Microsoft.Streaming.StreamingNode.ReferenceData.ReferenceDataDiscoveryProcessor.InitializeInput()
at Microsoft.Streaming.StreamingNode.ProcessorScheduler.InitializeTopology(CancellationToken cToken)
at Microsoft.Streaming.StreamingNode.ProcessorScheduler.ThreadProc()
I simplified the script step by step. Eliminating every reference to b in the SELECT list was not sufficient; I even had to delete the JOIN clause.
What troubles me is that there is no arithmetic operation in the query, so I have no idea where the overflow could come from. And the reference data in my database is ridiculously simple.

The solution to the above problem struck me by chance. Because the reference data changes so rarely, I had set its refresh rate to something like 30 days.
That is what leads to the overflow error. After reducing the refresh interval to 10 days I encountered no further issues and my ASA query worked.
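The numbers make this plausible: the stack trace mentions get_NextDiscoveryDueTimeInMillisecond, so the refresh interval is presumably tracked in milliseconds, and if that counter is a signed 32-bit integer (my assumption, not documented anywhere), 30 days no longer fits. A quick T-SQL check, purely illustrative:
SELECT 30 * 24 * 60 * 60 * 1000
-- fails with an arithmetic overflow: 2,592,000,000 exceeds the 32-bit INT maximum of 2,147,483,647
SELECT 10 * 24 * 60 * 60 * 1000
-- returns 864,000,000, which fits comfortably in 32 bits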
Hope this helps somebody and saves some time.

Related

Resources Exceeded on simple query when using DAY() or DATE() functions

This query fails with resources exceeded:
SELECT
*,
DAY(event_timestamp) as whywontitwork,
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
But this one works fine:
SELECT
*
FROM
looker_scratch.LR_78W8A60O4MQ20L2U6OA5B_events_sql_doctor_activity
The source table is 14M rows, but I've run similar queries on much larger datasets before. We have large results enabled and have tried both flattened results and not (though there are no nested fields anyway). The error also occurs if you use the DATE() function instead of DAY(), or a REGEXP_EXTRACT() function.
The job id is realself-main:bquijob_69e3a888_152f1fdc205.
You've hit an internal error in BigQuery. We tweaked our query engine's configuration at around 3pm (US Pacific Time) in an effort to prevent the error.
Update: After observing the error rate, it looks like this change has fixed the problem. If you see any other issues, please let us know. Note that StackOverflow is best for usage questions, but if you suspect a bug, you can file an issue at our public issue tracker.

How to use BigQuery Slots

Hi there.
Recently I tried to run a query in the BigQuery web UI using GROUP BY over some tables (their names follow the pattern xxx_mst_yyyymmdd). The result will be over 10 million rows. Unhappily, the query failed with this error:
Query Failed
Error: Resources exceeded during query execution.
I made some improvements to my query, and the error may not happen this time. But as my data grows, the error will reappear in the future. So I checked the latest BigQuery release notes; maybe there are two ways to solve this:
1. After 2016/01/01, BigQuery will change its query pricing tiers to include "High Compute Tiers", so that the resourcesExceeded error will not happen again.
2. BigQuery Slots.
I checked some documents from Google but didn't find anything on how to use BigQuery Slots. Is there any sample or use case for BigQuery Slots? Or do I have to contact the BigQuery team to enable the feature?
I hope someone can help me answer this question, thanks very much!
A couple of points:
I'm surprised that a GROUP BY with a cardinality of 10M failed with resources exceeded. Can you provide the job id of the failed query so we can investigate? You mention that you're concerned about hitting these errors more often as your data size increases; you should be able to increase your data size by a few more orders of magnitude without seeing this. Most likely you've encountered a bug, or something was strange about your query or your data.
"High Compute Tiers" won't necessarily get rid of resourcesExceeded. For the most part, resourcesExceeded means that BigQuery ran into memory limitations; high compute tiers only address CPU usage. (and note, they haven't been enabled yet).
BigQuery slots enable you to process data faster and with more reliable performance. For the most part, they also wouldn't help prevent resourcesExceeded errors.
There is currently (as of Nov 5) a bug where you may need to provide an EACH keyword with a GROUP BY. Recent changes should enable BigQuery to automatically select the execution strategy, so EACH shouldn't be needed, but there are a couple of cases where it doesn't pick the right one. When in doubt, add an EACH to your JOIN and GROUP BY operations (see the sketch after this list).
To get your project eligible for using slots you need to contact support.
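For illustration, a minimal legacy-SQL sketch of the EACH workaround; the dataset, table, and column names here are made up:
SELECT customer_id, COUNT(*) AS cnt
FROM [mydataset.orders_mst_20151105]
GROUP EACH BY customer_id
-- and likewise for joins:
SELECT a.customer_id, b.region
FROM [mydataset.orders_mst_20151105] a
JOIN EACH [mydataset.customer_dim] b
ON a.customer_id = b.customer_id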

Way to optimise a mapping in Informatica

I would like to optimise a mapping developed by one of my colleagues, where the "loading part" (into a flat file) is really, really slow - 12 rows per second.
Currently it takes about 2 hours to get to the point where I start writing to the file, so I would like to know where I should start looking first; otherwise I will need at least 2 hours between each improvement, which is not really efficient.
OK, so to describe simply what is done:
Oracle table (with a big query inside - takes about 2 hours to get a result)
SQ
2 lookups on reference tables (should not be heavy)
update strategy
1 transformer
2 lookups (on big tables - that should be one obvious optimisation point I guess: change them to joiners)
6 stored procedures (these also seem a bit heavy, what do you think?)
another transformer
load into the flat file
Can you confirm that either the lookups or the stored procedures could be the reason why it is so slow?
Do you think I should look somewhere else to optimise? I was thinking maybe only 1 transformer.
First, check the session logs carefully. Look at the timestamps. They should give you an initial idea of which part causes the delay.
Lookups on big tables are not recommended. Joiners are a better way, but they still need to cache data. Can you limit the data for the cache, perhaps (see the sketch below)? It'll be very hard to advise without seeing it.
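One common way to limit a lookup cache is a lookup SQL override that selects only the rows and columns the mapping actually needs (keeping the column list in step with the lookup ports); the table and column names below are hypothetical:
SELECT key_col, value_col
FROM big_ref_table
WHERE active_flag = 'Y'
-- caches only active rows and two columns instead of the entire table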
Which leads us to the Stored Procedures: it's simply impossible to tell anything about them just like that.
So: first collect the stats and do the log analysis. Next, read some tuning guides on the net - there are plenty. Here's a comprehensive one, but it's large, so you might like to look for some shorter ones as well.
PowerCenter Performance Tuning Guide

Out of memory during SQL execution

I have the following script:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
The tables are huge. When I execute the script in SQL Server directly it takes around 4 minutes, but when I run it in the third-party program (SMS LOC, based on Delphi) it gives me the error
<msg> out of memory</msg> <sql> the code </sql>
Is there any way I can lighten the script so it can be executed? Or did anyone have the same problem and solve it somehow?
I remember once having had to resort to the ROBUST PLAN query hint on a query where the query optimizer kind of lost track and tried to work it out in a way that the hardware couldn't handle.
=> http://technet.microsoft.com/en-us/library/ms181714.aspx
But I'm not sure I understand why it would work for one 'technology' and not another.
Then again, the error message might not be from SQL but rather from the 3rd-party program that gathers the output and does so in a 'less than ideal' way.
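If you want to try it, the hint goes in an OPTION clause at the very end of the statement - here appended to the query from the question:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
OPTION (ROBUST PLAN) -- ask the optimizer for a plan that tolerates maximum-size intermediate rows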
Consider adding paging to the user edit screen and the underlying data call. The point being, you don't need to see all the rows at one time, but they are available to the user upon request.
This will alleviate much of your performance problem.
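A minimal sketch of server-side paging, assuming SQL Server 2012 or later (which introduced OFFSET/FETCH) and a page size of 100 picked purely for illustration:
SELECT
DEPT.F03 AS F03, DEPT.F238 AS F238, SDP.F04 AS F04, SDP.F1022 AS F1022,
CAT.F17 AS F17, CAT.F1023 AS F1023, CAT.F1946 AS F1946
FROM
DEPT_TAB DEPT
LEFT OUTER JOIN
SDP_TAB SDP ON SDP.F03 = DEPT.F03,
CAT_TAB CAT
ORDER BY
DEPT.F03
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY -- advance the OFFSET by 100 for each subsequent page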
I had a project where I had to add over 7 million individual lines of T-SQL code via batch (I couldn't figure out how to programmatically leverage the new SEQUENCE command). The problem was that there was a limited amount of memory available on my VM (I was allocated the maximum amount of memory for this VM). Because of the large number of lines of T-SQL code, I first had to test how many lines the server could take before it crashed. For whatever reason, SQL (2012) doesn't release the memory it uses for large batch jobs such as mine (we're talking around 12 GB of memory), so I had to reboot the server every million or so lines. This is what you may have to do if resources are limited for your project.

NHibernate taking a long time to run query

This is being done using Fluent NHibernate
I've got an NHibernate lookup that retrieves data from one table. If I take the generated SQL and run it through Query Analyzer, it takes ~18 ms to run.
Using NHProfiler, I'm getting the duration of this query as ~1800 ms - 100 times longer than the raw SQL!
Query duration
- Database only: 1800ms
- Total: 1806ms
The object that is being populated contains a child class, but this child is being loaded from the NHibernate 2nd-level cache.
The data that is being returned is paged (50 per query), although as far as I can tell, this shouldn't make any difference.
I've also got a count query running, and again, this takes ~4 ms in Query Analyzer and ~1800 ms according to NHProfiler.
Is NH Profiler displaying the query execution time, or the complete time to retrieve, map the classes and construct the object graph? And if it's the former - why's it taking so much longer than running the query directly?
EDIT: Just found this post by Ayende about the Query Duration value given in NH Profiler: http://ayende.com/Blog/archive/2009/06/28/nh-prof-query-duration.aspx - so it is definitely the query of the database that is taking a long time
Finally managed to track down the problem.
The primary key for the object is a varchar in the database. NHibernate was converting the value to an nvarchar when it ran the query. Unfortunately this wasn't obvious when looking at the generated SQL in NH Profiler. The slowdown was caused by SQL Server converting between the nvarchar parameter and the varchar column on every comparison.
I've specified the mapping to use a custom type
map.Id(x => x.Id).CustomType("AnsiString");
and the problem is solved.
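For illustration, the difference shows up in the parameterised SQL that reaches the server; the table name and values below are made up:
-- before: an nvarchar parameter against a varchar column forces a conversion on every row, so no index seek
exec sp_executesql N'SELECT * FROM MyEntity WHERE Id = @p0', N'@p0 nvarchar(4000)', @p0 = N'ABC123'
-- after the AnsiString mapping: a varchar parameter, and the primary key index can be used
exec sp_executesql N'SELECT * FROM MyEntity WHERE Id = @p0', N'@p0 varchar(100)', @p0 = 'ABC123'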
Cheers for all the help people :)
Generally these problems come down to the network between you and your database. Query Analyzer usually connects directly to the database, and all it has to send back is the raw data, which is formatted client-side. Your app is probably converting your result set into a DataSet or similar construct. To prove this, change a bit of code (not your entire data layer) to use a SqlDataReader to read your data. Just read all of the records without trying to parse out all of the columns and save the data. It will likely perform as fast as your network will let it.