Is there any other way to get the aggregation query execution time metrics in MongoDB Atlas?

Having read that MongoDB Atlas is highly performant, I'm eager to know the difference in execution time between my aggregation queries run on my local MongoDB server and on my MongoDB Atlas instance. Hence I have set up an M0 (Free Tier) MongoDB Atlas database.
In my local MongoDB Server, I can see the aggregation query execution time using the following command:
db.collection.explain("executionStats").aggregate([
  { $match: ... },
  { $facet: ... },
  ...
])
My collection contains 10k documents with nested arrays, and the aggregation query is complex. Execution time is almost 30 seconds.
But when it comes to MongoDB Atlas, the documentation specifies that Atlas M0 (Free Tier) clusters do not support helper methods such as explain() for the aggregate command, so I cannot see the execution time.
Is there any other way to get the aggregation query execution time metrics without upgrading?

It can be done with a little JavaScript in the shell:
var d = new Date();            // record the start time
db.coll.aggregate(...);        // run the aggregation
print(new Date() - d + 'ms');  // elapsed wall-clock time in milliseconds
Note that this measures round-trip time as seen by the client, not pure server-side execution time, and that aggregate() returns a cursor, so to include the time taken to fetch all results you should exhaust the cursor (for example with .itcount() or .toArray()) before taking the second timestamp.

Related

Hasura querying becomes slower for tracked SQL function

For a tracked SQL function, the Hasura query takes a lot of time, but when it is executed directly in SQL it takes only a few milliseconds to return the data. We are not able to figure out the actual problem. We are using a PostgreSQL DB.
We followed some steps to reduce the response time:
Applying indexes on the DB
Analysing the query plan to reduce the cost
Querying only a limited set of data to reduce the response size
We tried running the query directly in SQL, which took only a few milliseconds, but when we ran it through Hasura with the same parameters it took a lot of time.
I suspect this is due to permissions being evaluated when you run the function through Hasura.
When you were analysing the query plan, did you also make sure that you were passing in roles to ensure that the plan captures any additional changes to the query that are required in order for the permissions to be evaluated?

Bad performance on simple SQL update on Azure DB

I have a table with about 4 million rows. What I'd like to do is to add two more columns and then update the values of these two columns based on the third column in this same table. Basically I'm trying to set IsoWeek and IsoYear based on ReportDate.
I've added the columns (all values are NULL for now) and started with a simple update-all script like the one below:
UPDATE Report
SET IsoWeek = DATEPART(ISO_WEEK, ReportDate), IsoYear = dbo.ISO_YEAR(ReportDate)
It took 5 sec locally, but over 10 min on the Azure test DB, so I cancelled it and reimplemented the query with batches. It was around the same 5 sec locally, but on the Azure test DB it was still very slow. This time I waited longer and it completed in about 45 minutes.
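A typical way to batch such an update looks roughly like this (a sketch; the batch size and the IsoWeek IS NULL predicate are assumptions, not the exact script):
DECLARE @batchSize int = 50000;
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- update one chunk per iteration; already-updated rows are skipped
    UPDATE TOP (@batchSize) Report
    SET IsoWeek = DATEPART(ISO_WEEK, ReportDate),
        IsoYear = dbo.ISO_YEAR(ReportDate)
    WHERE IsoWeek IS NULL;
    SET @rows = @@ROWCOUNT;
END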
I have to run a similar script on PROD Azure DB, so now I'm trying to find ways to optimize this update.
I've added WHERE Id <= 50000 to update only one chunk:
UPDATE Report
SET IsoWeek = DATEPART(ISO_WEEK, ReportDate), IsoYear = dbo.ISO_YEAR(ReportDate)
WHERE Id <= 50000
This query executed locally in 0 sec and in about 7 sec on the Azure test DB. This seems like a good comparison test, so I started comparing execution plans.
(Execution plan screenshots for the local DB and the Azure test DB are not reproduced here.)
So I'm not sure why it is different on local and on the Azure test DB, and how I can make it faster on Azure.
Any ideas?
UPD:
When I removed dbo.ISO_YEAR, the execution plan got better, but execution time only went down from 7 sec to 6 sec.
Looks like you have a scalar UDF in your query, causing a table spool plus a lot of context switching. Azure will not inline these UDFs.
The table spool might be removed by changing the UDF to use SCHEMABINDING, but you're best off inlining it yourself, either directly in the query or as an inline table-valued function (iTVF); see the sketch below.
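For example, assuming dbo.ISO_YEAR computes the ISO-8601 year, one common way to inline it exploits the fact that shifting a date by 26 - ISO_WEEK days always lands inside its ISO year (a sketch, not necessarily equivalent to your UDF):
UPDATE Report
SET IsoWeek = DATEPART(ISO_WEEK, ReportDate),
    IsoYear = YEAR(DATEADD(DAY, 26 - DATEPART(ISO_WEEK, ReportDate), ReportDate))
WHERE Id <= 50000;
The same expression wrapped as an iTVF, which the optimizer can inline (function name is hypothetical):
CREATE FUNCTION dbo.IsoYearTVF (@d date)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN (
    SELECT YEAR(DATEADD(DAY, 26 - DATEPART(ISO_WEEK, @d), @d)) AS IsoYear
);

UPDATE r
SET IsoWeek = DATEPART(ISO_WEEK, r.ReportDate),
    IsoYear = t.IsoYear
FROM Report AS r
CROSS APPLY dbo.IsoYearTVF(r.ReportDate) AS t
WHERE r.Id <= 50000;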
Here is a request to add scalar UDF inlining to Azure:
https://feedback.azure.com/forums/217321-sql-database/suggestions/38955436-bring-scalar-udf-inlining-to-azure-sql-database
There are many things that could be different on Azure SQL vs SQL Server on-premises and that may affect performance. For example:
are you using the Simple recovery model on SQL Server? Azure SQL always runs in Full recovery
are you using ADR on SQL Server? Azure SQL always runs with ADR on
are you using TDE on SQL Server? Azure SQL has TDE enabled by default
Also, you don't mention which Azure SQL tier you are using. Standard/General Purpose or Premium/Business Critical? Or Hyperscale? How many cores or DTUs? You can check the three settings above on your on-premises server with the query below.
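A quick way to compare those settings on the on-premises side (the ADR column requires SQL Server 2019 or later):
SELECT name,
       recovery_model_desc,                  -- Simple vs Full recovery
       is_accelerated_database_recovery_on,  -- ADR on/off (SQL Server 2019+)
       is_encrypted                          -- TDE on/off
FROM sys.databases;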

How to optimize the downloading of data to the server in SSIS

Good day.
I need to get records from an Oracle database into a SQL Server database. The data source (ODBC) is read using a SQL command, and I make use of all the indexes that fit my requirements. The process runs fine; the problem is that it takes a long time and I need it to be quick. The process cannot be done with a lookup; it requires a merge or merge join, simply loading a table from Oracle into SQL Server under certain conditions.
Thank you for your help.
Check what your limiting factor is. Generally there are 3 points to check:
1. The remote (source) server is slow. The source DB can run low on memory, read speed, or free CPU. Substitute your query with a straight SELECT statement with no WHERE clause or JOINs and see if your SSIS package runs faster.
2. The target DB. You may have indexes enabled, high write latency on the HDD, or not enough CPU. Run a plain INSERT into your target table and see how long it takes.
3. The problem may be in the middle: the transfer between the two servers. The network is usually the main bottleneck. Is SSIS hosted on the same server as SQL Server? If not, you have 2 network connections plus a possible hardware bottleneck on the dedicated SSIS machine.
Depending on the bottleneck there are different solutions.
If you have network capacity and the bottleneck is 1 CPU per query on Oracle, then you can partition your data horizontally (IDs 1 to 100, 101 to 200, etc.), establish multiple connections to Oracle, and load data in several streams, one source query per stream as sketched below. The number of streams is 1 less than the number of CPUs on Oracle, SSIS, or SQL Server (whichever is smaller).
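A minimal sketch of such per-stream source queries (table, column, and range boundaries are hypothetical):
-- Stream 1 source query
SELECT * FROM src_table WHERE id BETWEEN 1 AND 100000;
-- Stream 2 source query
SELECT * FROM src_table WHERE id BETWEEN 100001 AND 200000;
-- ...one ODBC/OLE DB source per stream, all feeding the same SQL Server destination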

Need nosql database for queries with bitwise condition

I am currently using the Apache Cassandra database for storing information.
But Cassandra does not allow queries with bitwise operations.
I need to execute query:
select count(*) from table where field1 = ? and BIT_COUNT(field2 ^ ?) <= 10;
But Cassandra does not allow it.
Can you advise a NoSQL or fast embedded SQL solution?
The database contains more than 1 million rows.
If you're happy with Cassandra otherwise, you could add Spark and use Spark SQL to do queries like that; see the sketch below. Spark has an open-source connector to use Cassandra as its distributed database.
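Assuming the Cassandra table has been registered with Spark SQL through the connector (the table name here is hypothetical, and bit_count requires Spark 3.0+), the query could stay close to the original:
SELECT COUNT(*)
FROM my_cassandra_table
WHERE field1 = 42                      -- placeholder literal for the first ?
  AND bit_count(field2 ^ 1234) <= 10;  -- placeholder literal for the second ?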
There's also DataStax Enterprise which would allow you to integrate with Hadoop/Hive and get similar analytic capabilities. (DataStax Enterprise is also an easy way to get Spark functionality.)

Retrieving billions of rows from remote server?

I am trying to retrieve around 200 billion rows from a remote SQL Server. To optimize this, I have limited my query to use only an indexed column as a filter and am selecting only a subset of columns to make the query look like this:
SELECT ColA, ColB, ColC FROM <Database> WHERE RecordDate BETWEEN '' AND ''
But it looks like unless I limit my query to a time window of a few hours, the query fails in all cases with the following error:
OLE DB provider "SQLNCLI10" for linked server "<>" returned message "Query timeout expired".
Msg 7399, Level 16, State 1, Server M<, Line 1
The OLE DB provider "SQLNCLI10" for linked server "<>" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7421, Level 16, State 2, Server <>, Line 1
Cannot fetch the rowset from OLE DB provider "SQLNCLI10" for linked server "<>".
The timeout is probably an issue because of the time it takes to execute the query plan. As I do not have control over the server, I was wondering if there is a good way of retrieving this data beyond the simple SELECT I am using. Are there any SQL Server specific tricks that I can use? Perhaps tell the remote server to paginate the data instead of issuing multiple queries or something else? Any suggestions on how I could improve this?
This is more the kind of job SSIS is suited for. Even a simple flow like ReadFromOleDbSource -> WriteToOleDbSource would handle this, creating the necessary batching for you.
Why read 200 billion rows all at once?
You should page them, reading, say, a few thousand rows at a time.
Even if you genuinely need to read all 200 billion rows, you should still consider using paging to break the read up into shorter queries; that way, if a failure happens, you just continue reading where you left off.
See "efficient way to implement paging" for at least one method of implementing paging using ROW_NUMBER; a sketch follows below.
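A minimal sketch of ROW_NUMBER paging over the query from the question (the date window and batch boundaries are placeholder values):
DECLARE @from datetime = '20200101', @to datetime = '20200201';
DECLARE @first bigint = 1, @pageSize bigint = 100000;

WITH numbered AS (
    SELECT ColA, ColB, ColC,
           -- ORDER BY must be deterministic; add a unique column if RecordDate has ties
           ROW_NUMBER() OVER (ORDER BY RecordDate) AS rn
    FROM RemoteTable  -- hypothetical name for the remote table
    WHERE RecordDate BETWEEN @from AND @to
)
SELECT ColA, ColB, ColC
FROM numbered
WHERE rn BETWEEN @first AND @first + @pageSize - 1;
-- advance @first by @pageSize for each subsequent batch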
If you are doing data analysis then I suspect you are either using the wrong storage (SQL Server isn't really designed for processing of large data sets), or you need to alter your queries so that the analysis is done on the Server using SQL.
Update: I think the last paragraph was somewhat misinterpreted.
Storage in SQL Server is primarily designed for online transaction processing (OLTP): efficient querying of massive datasets in massively concurrent environments (for example, reading or updating a single customer record in a database of billions at the same time that thousands of other users are doing the same for other records). Typically the goal is to minimise the amount of data read, reducing the amount of IO needed and also reducing contention.
The analysis you are talking about is almost the exact opposite of this - a single client actively trying to read pretty much all records in order to perform some statistical analysis.
Yes SQL Server will manage this, but you have to bear in mind that it is optimised for a completely different scenario. For example data is read from disk a page (8 KB) at a time, despite the fact that your statistical processing is probably only based on 2 or 3 columns. Depending on row density and column width you may only be using a tiny fraction of the data stored on an 8 KB page - most of the data that SQL Server had to read and allocate memory for wasn't even used. (Remember that SQL Server also had to lock that page to prevent other users from messing with the data while it was being read).
If you are serious about processing / analysis of massive datasets then there are storage formats that are optimised for exactly this sort of thing. SQL Server also has an add-on service called Microsoft Analysis Services that adds online analytical processing (OLAP) and data mining capabilities, using storage modes better suited to this sort of processing.
Personally, if I were trying to pull that much data at once, I would use a data extraction tool such as BCP to get the data into a local file before trying to manipulate it.
http://msdn.microsoft.com/en-us/library/ms162802.aspx
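A hypothetical bcp invocation for this case (server, database, table, and file names are made up; -n writes native format, -T uses Windows authentication):
bcp "SELECT ColA, ColB, ColC FROM MyDb.dbo.RemoteTable WHERE RecordDate BETWEEN '20200101' AND '20200201'" queryout report.dat -n -S myserver -T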
This isn't a SQL Server-specific answer, but even when the RDBMS supports server-side cursors, it's considered poor form to use them. Doing so means that you are consuming resources on the server while the server waits for you to request more data.
Instead you should reformulate your query usage so that the server can transmit the entire result set as soon as it can, and then completely forget about you and your query to make way for the next one. When the result set is too large to process all in one go, keep track of the last row returned by the current batch so that you can fetch the next batch starting at that position, as in the sketch below.
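A minimal keyset-paging sketch, assuming the table has an indexed, monotonically increasing key (the Id column and table name are hypothetical):
DECLARE @lastId bigint = 0;  -- 0 before the first batch; afterwards, the max Id already fetched

SELECT TOP (100000) Id, ColA, ColB, ColC
FROM RemoteTable
WHERE Id > @lastId
ORDER BY Id;
-- after each batch, set @lastId to the largest Id returned and rerun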
Odds are the remote server has the "Remote Query Timeout" set. How long does it take for the query to fail?
Just ran into the same problem; I got the error message 10:01 after starting the query.
Check this link. There's a remote query timeout setting under Connections that is set to 600 seconds by default, and you need to change it to zero (unlimited) or whatever value you think is right.
Try changing the remote server's connection timeout property.
To do that in SSMS, connect to the server, right-click the server's name in Object Explorer, select Properties -> Connections, and change the value in the "Remote query timeout (in seconds, 0 = no timeout)" text box.
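The same setting can also be changed in T-SQL via sp_configure (the value is in seconds; 0 means no timeout):
EXEC sp_configure 'remote query timeout', 0;
RECONFIGURE;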