Instead of doing an each loop over a JSON file containing a list of SQL statements and passing them one at a time, is it possible with Safari client-side storage to simply wrap the data in "BEGIN TRANSACTION" / "COMMIT TRANSACTION" and pass that to the database in a single call? Looping over 1,000+ statements takes too much time.
Currently iterating one transaction at a time:
$j.getJSON("update1.json",
function(data){
$j.each(data, function(i,item){
testDB.transaction(
function (transaction) {
transaction.executeSql(data[i], [], nullDataHandler, errorHandler);
}
);
});
});
Trying to figure out how to make just one call:
$j.getJSON("update1.json",
function(data){
testDB.transaction(
function (transaction) {
transaction.executeSql(data, [], nullDataHandler, errorHandler);
}
);
});
Has anybody tried this yet and succeeded?
Every example I could find in the documentation seems to show only one SQL statement per executeSql call. I would just suggest showing an "ajax spinner" loading graphic and executing your SQL in a loop. You can keep it all within one transaction, but the loop still needs to be there:
$j.getJSON("update1.json",
function(data){
testDB.transaction(
function (transaction) {
for(var i = 0; i < data.length; i++){
transaction.executeSql(data[i], [], nullDataHandler, errorHandler);
}
}
);
}
);
Moving the loop inside the transaction and using a plain for loop should help you get a little more speed out of it. $.each is fine for fewer than 1,000 iterations; beyond that, the native for (var i = 0; ...) loop will probably be faster.
Note: with my code, if any of your SQL statements throws an error, the entire transaction will fail. If that is not your intention, you will need to keep the loop outside the transaction.
I haven't ever messed with HTML5 database storage (though I have with local/sessionStorage), and I would assume it's possible to run one huge string of statements. Use data.join() with a separator to get a string representation of the data array.
Yes, it is possible to process a whole group of statements within a single transaction with Web SQL. You actually don't even need to use BEGIN or COMMIT; that is taken care of for you automatically as long as you make all your executeSql calls from the same transaction. As long as you do this, every statement gets included within the transaction.
This makes the process much faster and also means that if one of your statements has an error, the entire transaction is rolled back.
I use Spring Boot with Spring Data JPA, Hibernate and Oracle.
In my table I have around 10 million records. For each one I need to do some processing, write info to a file and then delete the record.
It's a basic SQL query:
select * from zzz where status = 2;
I did a test without doing the processing or deleting the records:
long start = System.nanoTime();
int page = 0;
Pageable pageable = PageRequest.of(page, LIMIT);
Page<Billing> pageBilling = billingRepository.findAllByStatus(pageable);
while (true) {
    for (Billing billing : pageBilling.getContent()) {
        // process
        // write to file
        // delete element
    }
    if (!pageBilling.hasNext()) {
        break;
    }
    pageable = pageBilling.nextPageable();
    pageBilling = billingRepository.findAllByStatus(pageable);
}
long end = System.nanoTime();
long microseconds = (end - start) / 1000;
System.out.println(microseconds + " to write");
The results are bad: with a limit of 10,000 it took 157 minutes, with 100,000 it took 28 minutes, and with millions it took 19 minutes.
Is there a better solution to increase performance?
The following are likely to improve the performance significantly:
You should not iterate past the first page. Instead, delete the processed data and select the first page again. Actually, you don't need a Pageable for that; you can encode the limit in the method name. Selecting late pages is rather inefficient.
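For example (just a sketch, not from the original code: the method name findFirst1000ByStatus and the Long id type are assumptions), a derived query method on the repository can encode both the filter and the limit:

import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;

public interface BillingRepository extends JpaRepository<Billing, Long> {

    // Hypothetical derived-query method: Spring Data reads "the first 1000
    // rows with the given status" from the name, so no Pageable is needed.
    List<Billing> findFirst1000ByStatus(int status);
}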
The loading, processing and deleting of one batch of items should happen in a separate transaction. Otherwise the EntityManager will hold all the entities ever loaded, which will make things really slow.
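A minimal sketch of how the loop could then look, with one transaction per batch. The split into two beans is only so that the @Transactional annotation is actually applied through the Spring proxy; the class and method names are assumptions:

import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
class BillingBatchProcessor {

    private final BillingRepository billingRepository;

    BillingBatchProcessor(BillingRepository billingRepository) {
        this.billingRepository = billingRepository;
    }

    // One transaction per batch, so the persistence context never holds
    // more than one batch of entities at a time.
    @Transactional
    public void processBatch(List<Billing> batch) {
        for (Billing billing : batch) {
            // process the record and write it to the file
        }
        billingRepository.deleteAll(batch);
    }
}

@Service
class BillingCleanupService {

    private final BillingRepository billingRepository;
    private final BillingBatchProcessor processor;

    BillingCleanupService(BillingRepository billingRepository, BillingBatchProcessor processor) {
        this.billingRepository = billingRepository;
        this.processor = processor;
    }

    public void run() {
        // Always re-select the first batch: the previous batch has been
        // deleted, so there is no need to page past it.
        List<Billing> batch = billingRepository.findFirst1000ByStatus(2);
        while (!batch.isEmpty()) {
            processor.processBatch(batch);
            batch = billingRepository.findFirst1000ByStatus(2);
        }
    }
}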
If that still isn't sufficient, you may look into the following:
Inspect the SQL being executed. Does it look sensible? If not, consider switching to JdbcTemplate or NamedParameterJdbcTemplate. With a query method that takes a RowCallbackHandler you should be able to load and process all rows with a single select statement, and at the end issue one delete statement to remove them all. This requires that the status you use for filtering does not change in the meantime.
What do the execution plans look like? If they seem off, inspect your indices.
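If you go the JdbcTemplate route, a rough sketch could look like the following (the table and column names come from the query in the question; everything else, including the class name, is an assumption):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class BillingExporter {

    private final JdbcTemplate jdbcTemplate;

    public BillingExporter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void exportAndDelete() {
        // Invoked once per row as the result set is read, so rows are
        // processed as a stream instead of being collected into a list.
        RowCallbackHandler writeRow = rs -> {
            // process the current row and write it to the file
        };

        jdbcTemplate.query("select * from zzz where status = ?", writeRow, 2);

        // One statement removes everything that was just exported. This assumes
        // no other process sets rows to status = 2 while the export is running.
        jdbcTemplate.update("delete from zzz where status = ?", 2);
    }
}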
I wonder if it's a good or bad idea to use deepstream record.getList for storing a lot of unique values, for example emails or any other unique identifiers. The main purpose is to be able to answer quickly whether we already have, say, a user with a given email (email in use), or to find another record by a specific unique field.
I made a few experiments today and hit two problems:
1) When I tried to populate the list with a few thousand values I got
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
and my deepstream server went down. I was able to work around it by giving the server's Node process more memory with this flag:
--max-old-space-size=5120
That doesn't look great, but it allowed me to create a list with more than 5,000 items.
2) That wasn't enough for my tests, so I pre-created a list with 50,000 items by putting the data directly into the RethinkDB table, and got another issue when getting or modifying the list:
RangeError: Maximum call stack size exceeded
I was able to fix it with another flag:
--stack-size=20000
It helps, but I believe it's only a matter of time before one of those errors appears in production once the list reaches a certain size. I don't really know whether it's a Node.js, JavaScript, deepstream or RethinkDB issue. All of this made me think that I'm using deepstream lists the wrong way. Please let me know. Thank you in advance!
Whilst you can use lists to store arrays of strings, they are actually intended as collections of record names: the actual data would be stored in the records themselves, and the list would only manage the order of the records.
Having said that, there are two open GitHub issues to improve performance for very long lists, by sending more efficient deltas and by introducing a pagination option.
Interesting results regarding memory though; that's definitely something that needs to be handled more gracefully. In the meantime you could drastically improve performance by combining updates into one:
var myList = ds.record.getList( 'super-long-list' );

// Sends 10,000 messages
for( var i = 0; i < 10000; i++ ) {
    myList.addEntry( 'something-' + i );
}

// Sends 1 message
var entries = [];
for( var i = 0; i < 10000; i++ ) {
    entries.push( 'something-' + i );
}
myList.setEntries( entries );
Developers, I need your help again.
I work with a very big MS SQL database; it has more than a million records.
So when I try to run the query, I get an error about the 2100-parameter binding limit. I also tried the query with DB::raw, but it doesn't help.
I don't want to write a raw SQL query. Any solutions?
$product_codes = array_slice($product_codes, 0, 1999); // the original $product_codes contains about 4k codes
$wheels = Wheel::select(array('Wheel_Size'))->whereIn('Product_code', $product_codes)->lists('Wheel_Size');
Write the following code inside vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php, in the whereIn method, after the check that determines whether $values is an instance of Closure (line 672 in my project):
if (count($values) > 800)
{
    return $this->whereRaw($column . ' IN (' . implode(',', $values) . ')');
}
I am using Postgres as the main DB and Redis for caching. I am working on a caching mechanism for one DB query which takes too much time (it's about 5-6 JOINs plus nested SELECTs). For now I am caching the results of this query using SET 'some key' JSON.stringify(query.result). This works fine, however there is one column that cannot be cached - it is called commentsCount. It has to be always up to date. As a temporary solution, I am querying the db just for this one particular field, like this:
app.get('/post/getBySlug/:slug', function(req, res, next){
    var cacheKey = req.params.slug + '|' + req.params.language; // "my-post-slug|en-us" for example
    cache.get(cacheKey, function(err, post){
        if (err) throw err;
        if (post) {
            db.getPostCommentsCount({ where: { id: post.id }}).done(function(err, commentsCount){
                if (err) throw err;
                post.commentsCount = commentsCount;
                res.json(post);
                next();
            });
        } else {
            db.getFullPostBySlug(req.params.slug, req.params.language).done(function(err, post){
                if (err) throw err;
                cache.set(cacheKey, post);
                res.json(post);
                next();
            });
        }
    });
});
But it is still not what I want, because the main DB is still queried. Is there any standard/good practice for storing counters in Redis? My comment insert function looks like this:
START TRANSACTION;
INSERT INTO "Comments" VALUES (...); -- insert the comment
UPDATE "Posts" SET "commentsCount" = "commentsCount" + 1 WHERE "Posts"."id" = 123456; -- update the counter on the post
COMMIT TRANSACTION;
I am using a transaction because I don't want a comment to be inserted without incrementing the comments count. As a "side" question: is it better to make two SQL queries in a transaction, or to write a trigger to handle incrementing the counter?
Regarding my query (I posted a link to a gist in the comments):
We don't plan on more than 2 languages (though it is possible).
I made those counters because I have to keep counters separate per language, be able to order by those separate counters, and also be able to order by the sum of the counters (the total for all languages). I found it hard to write a query that would order by the sum of columns from separate rows while still returning those rows... (At the beginning the counters were stored in the language translations.)
Generally this query looks for a post where a translation exists with a specific 'slug' and 'language' (slug+language on a post translation is a unique index). Moreover, the post has to be published (isPublished = boolean) and post.status has to be 'published' (status = enum), or post.isComingSoon has to be true (isComingSoon = boolean). Do you have an idea what index/ordering I could add to this query? Or should I just remove the limit?
In every translation table I keep the language as TEXT. It can be, for example, en-us or zh-cn etc. Do you think I should make it an enum, or maybe I should make another table to store languages and just keep a language_id in the translations?
The author actually can be null :)
I have a package which looks as follows:
Notice the two areas which I have marked with a red rectangle: they are identical in every way. Can I make changes to the package so I can avoid this duplication? It seems to me I cannot move them to a Data Flow Task, since loops and File System Tasks do not exist there.
You can create a sub-package for the loop logic as #Bill mentioned, and the way to 'pass the resultset of a query on to another package' is shown below (I use SSIS 2012 as the example; I did similar work in SSIS 2005, so you would only need to change the C# code to VB.NET).
In your parent package, create a variable to hold the name of the resultSet variable in the parent package:
In your sub package, create a string variable parentResultSetName:
In your sub package, add a package configuration that maps parentResultSetName to the parent package variable resultSetVariableName:
Now we can read the resultSet variable by name in the script task of the sub package:
public void Main()
{
    var dsName = Dts.Variables["parentResultSetName"].Value.ToString();
    Variables variables = null;
    DataSet resultSet = null;
    Dts.VariableDispenser.LockForRead(dsName);
    Dts.VariableDispenser.GetVariables(ref variables);
    try
    {
        resultSet = variables[dsName].Value as DataSet;
        if (resultSet != null)
        {
            MessageBox.Show("Sub package get: " + resultSet.Tables[0].Rows[0][0].ToString());
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception e)
    {
        Dts.Events.FireError(-1, "", e.Message, "", 0);
    }
}
Here is the result:
Just place both your queries from the "select batch logins" tasks into a recordset and make another foreach loop over that recordset, executing those two queries from a variable.
I can see that in your second foreach loop there are some additional tasks, so you'll have to incorporate them into the "Select all batch logins" task somehow, or create constraints that would match in both loops.
An alternative approach is to add an 'action' column to your 'batches' table (or add an outrigger table). Work out beforehand what you want to do to the records, then delete the records and files just once.
It looks like you are doing RBAR (row-by-agonizing-row) operations here.
So, for example, you run a couple of UPDATE statements against your tables that leave each record flagged for deletion or not.
Then you go through the loop and delete based on what is in the table.
It would make your package a lot simpler, and you can incorporate some 'commit' logic that makes sure a record and its file are always deleted at the same time.