Streaming query with mssql and node, very slow the first time - sql

I am using Node.js 10.16.0 and the node-mssql module to connect to a DB. Everything works and my simple queries run fine.
If I try to stream data from a query, using the node-mssql example, the first execution is very slow. It doesn't throw a timeout error, but it takes a minute or more to complete.
According to the console log, it fetches the first 55 rows and then stalls for a while. It seems to take a long time between the "sets" of data as I divide them in my code below. If I execute the same query a second or third time, it takes only a second to complete. The total number of rows is about 25,000 or more.
How can I make my streaming queries faster, at least the first time?
Here is my code.
Following the example, the idea is: start streaming, collect 1000 rows, pause the stream, process those rows, send them back over WebSockets, empty all the arrays, then resume streaming until done.
// These three were missing from the snippet; declared here so it is self-contained.
let rowc = 0;
let rowsToProcess = [];
let measurementsData = [];

let skate = [];
let leather = [];
let waterproof = [];
let stream_start = new Date();

const request = new sql.Request(pool);
request.stream = true;
request
    .input('id_param', sql.Int, parseInt(id))
    .input('start_date_param', sql.VarChar(50), startDate)
    .input('stop_date_param', sql.VarChar(50), stopDate)
    .query('SELECT skate, leather, waterproof FROM shoes WHERE id = @id_param AND CAST(startTime AS date) BETWEEN @start_date_param AND @stop_date_param');

request.on('row', row => {
    rowc++; console.log(rowc);
    rowsToProcess.push(row);
    if (rowsToProcess.length >= 1000) {
        request.pause();
        processRows();
    }
});

const processRows = () => {
    rowsToProcess.forEach((item, index) => {
        skate.push(item.skate);
        leather.push(item.leather);
        waterproof.push(item.waterproof);
    });
    measurementsData.push(
        {title: 'Skate shoes', data: skate},
        {title: 'Leather shoes', data: leather},
        {title: 'Waterproof shoes', data: waterproof}
    );
    console.log('another processRows done');
    //ws.send(JSON.stringify({ message: measurementsData }));
    rowsToProcess = [];
    skate = [];
    leather = [];
    waterproof = [];
    measurementsData = [];
    request.resume();
};

request.on('done', () => {
    console.log('rowc , ', rowc);
    console.log('stream start , ', stream_start);
    console.log('stream done , ', new Date());
    processRows();
});

I would try to improve the indexing of the shoes table. From what I can see, there are two possible issues with your query/indexing:
You filter on the datetime startTime column, but there is an index only on the id column (according to the comments).
You cast the datetime to date within the WHERE clause of the query.
Indexes
Since you filter only on the date without the time part, I'd suggest creating a new column startDate, which is startTime converted to date, and creating an index on it. Then use this indexed column in the query.
Also, since you select only the skate, leather and waterproof columns, including them in the index could give better performance. Read about indexes with included columns; a sketch follows below.
If you always select data that is newer or older than a certain date, you may also look into filtered indexes.
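As a rough sketch of that setup (not tested against your schema; the computed column and index names are invented for illustration), the T-SQL could be run once through the same node-mssql pool used in the question, or directly in SSMS:
// Sketch only: adds a persisted computed date column and a covering index.
// IX_shoes_id_startDate is a made-up name; run inside an async function.
await pool.request()
    .batch('ALTER TABLE shoes ADD startDate AS CAST(startTime AS date) PERSISTED');
await pool.request()
    .batch(`CREATE NONCLUSTERED INDEX IX_shoes_id_startDate
            ON shoes (id, startDate)
            INCLUDE (skate, leather, waterproof)`);
The point is that the query then filters on an indexed column whose value is not computed at query time, and the index alone can serve the selected columns.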
Avoid CAST in the WHERE clause
A cast is generally cheap, but using it inside the WHERE clause can keep SQL Server from making efficient use of the indexes, so you should avoid it.
If you create a new column with just the date part and index it as described above, you don't need the cast here:
WHERE id = @id_param AND startDate BETWEEN @start_date_param AND @stop_date_param

When a query runs slow the first time but fast on subsequent executions, as someone suggested earlier, it's generally due to caching. The performance is quite likely related to the storage device the database is operating on.
I expect the execution plan does not change between executions.

You should remove the cast in the WHERE clause, or create an index on a computed column (if possible in your DB).
Operations on a column can always hurt your query; avoid them if possible.
Try just setting your WHERE parameters:
@start_date_param to the datetime yyyy-mm-dd 00:00:00
@stop_date_param to the datetime yyyy-mm-dd 23:59:59
AND startTime BETWEEN @start_date_param AND @stop_date_param
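A minimal sketch of that with the node-mssql request from the question (assuming startDate and stopDate are 'yyyy-mm-dd' strings, as in the original code):
// Sketch only: bind full datetime boundaries so the raw startTime column
// can be compared directly, without CAST, and its index can be used.
request
    .input('id_param', sql.Int, parseInt(id))
    .input('start_date_param', sql.DateTime, new Date(`${startDate}T00:00:00`))
    .input('stop_date_param', sql.DateTime, new Date(`${stopDate}T23:59:59.997`))
    .query('SELECT skate, leather, waterproof FROM shoes WHERE id = @id_param AND startTime BETWEEN @start_date_param AND @stop_date_param');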

Filter a Postgres Table based on multiple columns

I'm working on a shopping website. The user selects multiple filters and sends the request to the backend, which is in Node.js and uses Postgres as the DB.
I want to fetch the required data in a single query.
I have a JSON object containing all the filters that the user selected. I want to use them in a Postgres query and return the results to the user.
I have a Postgres table that contains a few products.
name          Category   Price
------------------------------
LOTR          Books      50
Harry Potter  Books      30
Iphone13      Mobile     1000
SJ8           Cameras    200
I want to filter the table using any number of filters in a single query.
I have to make it work for multiple filter shapes such as the ones below, so that I don't have to write a separate query for each one.
{ category: 'Books', price: '50' }
{ category: 'Books' }
{category : ['Books', 'Mobiles']}
I can query the table using
SELECT * FROM products WHERE category='Books' AND price=50
SELECT * FROM products WHERE category='Books'
SELECT * FROM products WHERE category='Books' OR category='Mobiles'
respectively.
But I want to write my query in such a way that it populates the keys and values dynamically, so I don't have to write a separate query for every filter.
I have obtained the key and value pairs from req.query and saved them:
const params = req.query;
const keys: string = Object.keys(params).join(",");
const values: string[] = Object.values(params);
const indices = Object.keys(params).map((obj, i) => {
    return "$" + (i + 1);
});
But I'm unable to pass them into the query correctly.
Does anybody have a suggestion? I'd highly appreciate any help.
Thank you in advance.
This is not the way you filter data from a SQL database table.
You need to use the Node.js pg driver to connect to the database and then write a SQL query. I recommend parameterized (prepared) statements.
A query would look like:
SELECT * FROM my_table WHERE price < ...
At least based on your question, it is unclear to me why you would want to do these manipulations in JavaScript, or what you really want to accomplish.
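As a hedged sketch of how the dynamic part could be built on top of parameterized queries with node-postgres (the pg Pool, the filterProducts name and the whitelist of allowed columns are assumptions for illustration, not code from the question):
const { Pool } = require('pg');
const pool = new Pool(); // connection settings come from the environment

// Only whitelisted keys may become column names; values always go through
// query parameters, never string concatenation.
const ALLOWED_COLUMNS = ['category', 'price'];

async function filterProducts(params) {
    const clauses = [];
    const values = [];
    for (const [key, value] of Object.entries(params)) {
        if (!ALLOWED_COLUMNS.includes(key)) continue;
        values.push(value);
        if (Array.isArray(value)) {
            // { category: ['Books', 'Mobiles'] }  ->  category = ANY($1)
            clauses.push(`${key} = ANY($${values.length})`);
        } else {
            clauses.push(`${key} = $${values.length}`);
        }
    }
    const where = clauses.length ? `WHERE ${clauses.join(' AND ')}` : '';
    const { rows } = await pool.query(`SELECT * FROM products ${where}`, values);
    return rows;
}
With this shape, filterProducts({ category: 'Books', price: '50' }) and filterProducts({ category: ['Books', 'Mobiles'] }) cover the example filters with a single code path.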

Laravel, query to show big data is slow

I have a page where the user searches data in a specific date range, and the results are shown in a datatable (e.g. ID, audit type, user, new value, old value, etc.) built from a two-table relationship.
Here is my query:
$audits = \OwenIt\Auditing\Models\Audit::with('user')
    ->orderBy('updated_at', 'DESC')
    ->where('created_at', '>=', $date1)
    ->where('created_at', '<=', $date2)
    ->get();
The problem is that when the amount of data is big, the process is very slow. How can I optimize the query?
I've tried paginate(10) and take(10), but they only show 10 rows, not all the data.
To improve performance at the database level, create an index on the columns that appear in the WHERE constraints.
So create an index on the created_at column.
Also, to compare dates, why not use whereDate rather than comparing string literals:
$audits = \OwenIt\Auditing\Models\Audit::with('user')
    ->orderBy('updated_at', 'DESC')
    ->whereDate('created_at', '>=', $date1)
    ->whereDate('created_at', '<=', $date2)
    ->paginate(25);
Once the query returns paginated records, the view can provide pagination links for visitors/users to step through the paginated result sets:
{{ $audits->links() }}
The links() call renders pagination link buttons in the view, which users/visitors can click to move through the result sets.
Laravel docs: https://laravel.com/docs/8.x/pagination#displaying-pagination-results
For large datasets/record counts in database tables, it's always wise to query paginated result sets, to reduce memory usage.

Results DataSet from DynamoDB Query using GSI is not returning correct results

I have a DynamoDB table where I currently store all the events that happen in my system for every product. The main table has a partition key that is a hash combination of productid, eventtype and eventcategory, and CreationTime as the sort key.
Later I added a new GSI on the table whose partition key is a secondary hash (the combination of eventcategory and eventtype, excluding productid) and whose sort key is CreationTime. This was added so that I can query for multiple products at once.
The GSI seems to work fine; however, I later realized the data being returned is incorrect.
Here is the scenario (I am running all these queries against the newly created index):
I was querying for products within the last 30 days and the query returned 312 records. However, when I ran the same query for the last 90 days, it returned only 128 records (which is wrong; it should be at least equal to or greater than the 30-day count).
I already have pagination logic embedded in my code, so that LastEvaluatedKey is checked every time in order to loop and fetch the next set of records; after the loop, all the results are combined.
Not sure if I am missing something.
Any suggestions would be appreciated.
var limitPtr *int64
if limit > 0 {
    limit64 := int64(limit)
    limitPtr = &limit64
}
input := dynamodb.QueryInput{
    ExpressionAttributeNames: map[string]*string{
        "#sch": aws.String("SecondaryHash"),
        "#pkr": aws.String("CreationTime"),
    },
    ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
        ":sch": {
            S: aws.String(eventHash),
        },
        ":pkr1": {
            N: aws.String(strconv.FormatInt(startTime, 10)),
        },
        ":pkr2": {
            N: aws.String(strconv.FormatInt(endTime, 10)),
        },
    },
    KeyConditionExpression: aws.String("#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2"),
    ScanIndexForward:       &scanForward,
    Limit:                  limitPtr,
    TableName:              aws.String(ddbTableName),
    IndexName:              aws.String(ddbIndexName),
}
You reached the maximum amount of data to evaluate in a single request (not necessarily the number of matching items); that limit is 1 MB.
The response then contains a LastEvaluatedKey parameter, which is the key of the last evaluated item. You have to perform a new query with an extra ExclusiveStartKey parameter (ExclusiveStartKey should be set to LastEvaluatedKey's value).
When LastEvaluatedKey is empty, you have reached the end of the result set.
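A minimal sketch of that loop, shown with the AWS SDK for JavaScript v3 rather than the question's Go code (the table and index names are placeholders):
const { DynamoDBClient, QueryCommand } = require('@aws-sdk/client-dynamodb');

const client = new DynamoDBClient({});

async function queryAllEvents(eventHash, startTime, endTime) {
    const items = [];
    let exclusiveStartKey; // undefined on the first request
    do {
        const out = await client.send(new QueryCommand({
            TableName: 'events',                            // placeholder
            IndexName: 'SecondaryHash-CreationTime-index',  // placeholder
            KeyConditionExpression: '#sch = :sch AND #pkr BETWEEN :pkr1 AND :pkr2',
            ExpressionAttributeNames: { '#sch': 'SecondaryHash', '#pkr': 'CreationTime' },
            ExpressionAttributeValues: {
                ':sch':  { S: eventHash },
                ':pkr1': { N: String(startTime) },
                ':pkr2': { N: String(endTime) },
            },
            ExclusiveStartKey: exclusiveStartKey,
        }));
        items.push(...(out.Items || []));
        exclusiveStartKey = out.LastEvaluatedKey;
    } while (exclusiveStartKey); // loop until DynamoDB stops returning a key
    return items;
}
Note that if you also pass a Limit, it caps the number of items evaluated per page, not the total, so the loop must still run until LastEvaluatedKey is absent.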

How to remove collection from model [duplicate]

I have a query which selects documents to be removed. Right now I remove them manually, like this (using Python):
for id in mycoll.find(query, fields={}):
    mycoll.remove(id)
This does not seem to be very efficient. Is there a better way?
EDIT
OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete Python code:
def reduce_duplicates(mydb, max_group_size):
    # 1. Count the group sizes
    res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response=True)
    # 2. For each entry from the filter scratch collection having count > max_group_size
    deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
    for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
        key = entry['_id']
        group_size = int(entry['value'])
        # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
        for id in mydb.static.find(key, limit=group_size - max_group_size, **deleteFindArgs):
            mydb.static.remove(id)
    return res['counts']['input']
So, what does it do? It reduces the number of duplicates to at most max_group_size per key value, leaving only the newest records. It works like this:
MR the data to (key, count) pairs.
Iterate over all the pairs with count > max_group_size
Query the data by key, while sorting it ascending by the timestamp (the oldest first) and limiting the result to the count - max_group_size oldest records
Delete each and every found record.
As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. So, the last two steps are foreach-found-remove, and this is the important detail of my question that changes everything; I should have been more specific about it, sorry.
Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do that with remove? Well, I have tried:
mydb.static.find(key, limit=group_size - max_group_size, sort=[('test_date', ASCENDING)])
This attempt fails miserably. Moreover, it seems to mess up Mongo. Observe:
C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database
Needless to say, the foreach-found-remove approach works and yields the expected results.
Now, I hope I have provided enough context and (hopefully) restored my lost honour.
You can use a query to remove all matching documents
var query = {name: 'John'};
db.collection.remove(query);
Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.
Let's say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than one query that deletes all 100k documents, as sketched below.
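A minimal sketch of that chunked approach with the Node.js MongoDB driver (the URI, database and collection names are placeholders):
const { MongoClient } = require('mongodb');

// Deletes matching documents in batches instead of one huge remove.
async function removeInChunks(uri, query, chunkSize = 1000) {
    const client = await MongoClient.connect(uri);
    try {
        const coll = client.db('mydb').collection('mycoll'); // placeholders
        while (true) {
            // Grab the next batch of _ids that still match the query.
            const ids = await coll.find(query, { projection: { _id: 1 } })
                .limit(chunkSize)
                .map(doc => doc._id)
                .toArray();
            if (ids.length === 0) break;
            await coll.deleteMany({ _id: { $in: ids } });
        }
    } finally {
        await client.close();
    }
}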
You can remove documents directly from the MongoDB shell:
db.mycoll.remove({_id:'your_id_here'});
Would deleteMany() be more efficient? I've recently found that remove() is quite slow for 6M documents in a 100M-document collection. Documentation is at https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany
db.collection.deleteMany(
    <filter>,
    {
        writeConcern: <document>,
        collation: <document>
    }
)
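For example, reusing the filter from the earlier answer:
// Deletes every document whose name is 'John' in a single command.
db.mycoll.deleteMany({ name: 'John' });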
I would recommend paging if there is a large number of records.
First, get the count of the data you want to delete:
// -------------------------- COUNT --------------------------
var query = {"FIELD": "XYZ", 'date': {$lt: new ISODate("2019-11-10")}};
db.COL.aggregate([
    {$match: query},
    {$count: "all"}
])
Second: start deleting chunk by chunk:
// -------------------------- DELETE --------------------------
var query = {"FIELD": "XYZ", 'date': {$lt: new ISODate("2019-11-10")}};
var cursor = db.COL.aggregate([
    {$match: query},
    {$limit: 5}
])
cursor.forEach(function (doc) {
    db.COL.remove({"_id": doc._id});
});
and this should be faster:
var query = {"FIELD": "XYZ", 'date': {$lt: new ISODate("2019-11-10")}};
var ids = db.COL.find(query, {_id: 1}).limit(5);
db.COL.deleteMany({"_id": {"$in": ids.map(r => r._id)}});
Run this query in the mongo shell:
db.users.remove({"_id": ObjectId("5a5f1c472ce1070e11fde4af")});
If you are using Node.js, write this code:
User.remove({ _id: req.body.id }, function(err) { ... });

SQL select Count with where very slow

I use the Count() method many times in my program:
var housewith2floor = from qry in houses
                      where qry.floor == 2
                      select qry;
var counthousewith2floor = housewith2floor.Count();

var housecolorwhite = from qry in houses
                      where qry.color == "white"
                      select qry;
var countwhitehouse = housecolorwhite.Count();
Each Count() call takes a lot of time to execute. The database has 2 million rows of data. I have already put a non-clustered index on the floor and color columns, but the counts still take too long. Is there any other way I can make the counts run much faster?
It's not so much the Count() that takes the time. The initial statement won't actually execute until it needs to (this is called deferred execution). So it's the query generated by
var housewith2floor = from qry in houses
                      where qry.floor == 2
                      select qry;
that takes the time.
Edit: removed the statement about indexes, as I see you have already created them.
Are there any tables which reference or are referenced by "houses", and do they use Lazy Loading?