How to ignore DUPLICATE ENTRY error when updating multiple records at once using TypeORM

I am trying to update hundreds of database records using the TypeORM library. The problem is that sometimes SQL returns a DUPLICATE ENTRY error during the bulk upload, which stops the whole operation. Is it possible to set up TypeORM so that duplicate entries are ignored and the insert is still performed?
The table uses a composite primary key (two primary key columns).
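For illustration only, a TypeORM entity with a composite primary key can be declared with two @PrimaryColumn() decorators (the column names below are hypothetical, not the real ones):
import { Entity, PrimaryColumn, Column } from 'typeorm';

@Entity()
export class HistoricalPrice {
  // Hypothetical composite key; the real column names are not shown here
  @PrimaryColumn()
  symbol: string;

  @PrimaryColumn()
  date: string;

  @Column('decimal')
  price: number;
}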
This is my insert command (TypeORM + Nestjs):
public async saveBulk(historicalPrices: IHistoricalPrice[]) {
  if (!historicalPrices.length) {
    return;
  }
  const repoPrices = historicalPrices.map((p) => this.historicalPricesRepository.create(p));
  await this.historicalPricesRepository.save(repoPrices, { chunk: 200 });
}
Thanks in advance

You will have to use InsertQueryBuilder to save the entities instead of the repository.save method. InsertQueryBuilder allows you to call an additional method, orIgnore(), which adds the IGNORE keyword to your MySQL INSERT statement. From the official MySQL documentation:
When INSERT IGNORE is used, the insert operation fails silently for rows containing the unmatched value, but inserts rows that are matched.
One drawback is that you will now have to chunk the rows yourself, since InsertQueryBuilder doesn't provide any option to chunk the entities. Your code should look like this:
const targetEntity = this.historicalPricesRepository.target;
// insert in chunks of 200 rows, ignoring duplicate-key errors
for (let i = 0; i < historicalPrices.length; i += 200) {
  const chunk = historicalPrices.slice(i, i + 200);
  await this.historicalPricesRepository
    .createQueryBuilder()
    .insert()
    .into(targetEntity)
    .values(chunk)
    .orIgnore()
    .execute();
}
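Putting it together with the saveBulk method from the question, the whole method could look roughly like this (a sketch only; the 200-row chunk size is carried over from the original save call):
public async saveBulk(historicalPrices: IHistoricalPrice[]): Promise<void> {
  if (!historicalPrices.length) {
    return;
  }
  const targetEntity = this.historicalPricesRepository.target;
  for (let i = 0; i < historicalPrices.length; i += 200) {
    const chunk = historicalPrices.slice(i, i + 200);
    await this.historicalPricesRepository
      .createQueryBuilder()
      .insert()
      .into(targetEntity)
      .values(chunk)
      .orIgnore()
      .execute();
  }
}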

Related

How can I validate if a record can be added at the SQL level using Entity Framework Core

If I want to make some checks before inserting a row into the database, I know that I can run the following code:
public bool BookSeat(int userId, string seatNumber)
{
    if (IsSeatStillAvailable(seatNumber))
    {
        var ticket = new Ticket(userId, seatNumber);
        _dbContext.Tickets.Add(ticket);
        _dbContext.SaveChanges();
        return true;
    }
    return false;
}

private bool IsSeatStillAvailable(string seatNumber)
{
    var seatTaken = _dbContext.Tickets.Any(w => w.SeatNumber == seatNumber);
    return !seatTaken;
}
This will do a call to the database to see if the seat is taken and then do a second call to book the seat. BUT in that time it might have already been booked.
I know in simple examples, I can create an index, but my use case is much more complex.
Is there a way that I can generate the SQL statement on the insert so that it can produce an ATOMIC transaction?
To produce something like (excuse the clumsy SQL):
IF (SELECT COUNT(*) FROM Tickets WHERE SeatNumber = @SeatNumber) = 0
BEGIN
    INSERT INTO Tickets (UserId, SeatNumber)
    VALUES (@UserId, @SeatNumber);
    RETURN true
END;
RETURN false
What you are looking for is concurrency handling and optimistic locking:
https://learn.microsoft.com/en-us/ef/core/saving/concurrency?tabs=data-annotations

API returns array of JSON for only one result

Currently working on an API which, given an address, returns information about it. Some of the rows in our tables are duplicates; however, since there are over 15 million rows, I can't go and find the duplicates manually. Instead I have opted to use
var query = `SELECT TOP 1 * from my_TABLE where..conditions`;
This ensures that only one of the duplicate rows is returned.
The problem is that when this is sent back as JSON, it comes as an array with one object.
In the Server.js file
// create Request object
var request = new sql.Request();

// query the database
request.query(query, function (err, result) {
    if (err) {
        console.log("Error while querying database :- " + err);
        res.send(err);
    }
    else {
        res.send(result);
    }
});
Returns this:
[{
Address:'our info'
}]
is there a way to have it respond with
{
Address:'our info'
}
That's because the database gives you a list of objects anyway, even when there is only one item.
It works as you expect when you return the JSON with just the first element of the array.
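For example (a minimal sketch, assuming result here is the array of rows shown above):
// result is an array even when only one row matches, so send just the first element
res.send(result[0]);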

Sqlite3 - node js insert multiple rows in two tables, lastID not working

I know there are many solutions for multiple insertion in sqlite3, but I am looking for an efficient method where data is inserted into two tables and the data of the second table depends on the first table. This is a Node.js application.
I have two tables, programs and tests, in sqlite3. The tests table contains the id of a program, i.e. one program can contain multiple tests.
The method suggested on the official page of the sqlite3 module is as follows:
var sqlite3 = require('sqlite3').verbose();
var db = new sqlite3.Database(':memory:');

db.serialize(function() {
  db.run("CREATE TABLE lorem (info TEXT)");

  var stmt = db.prepare("INSERT INTO lorem VALUES (?)");
  for (var i = 0; i < 10; i++) {
    stmt.run("Ipsum " + i);
  }
  stmt.finalize();

  db.each("SELECT rowid AS id, info FROM lorem", function(err, row) {
    console.log(row.id + ": " + row.info);
  });
});

db.close();
Since my requirement is to insert data into two tables, I am using the following code:
var programData = resp.program_info; // contains complete data of programs and tests

db1.run("INSERT INTO programs (`parent_prog_id`, `prog_name`, `prog_exercises`, `prog_orgid`, `prog_createdby`, `prog_created`, `prog_modified`, `prog_status`) VALUES (?,?,?,?,?,?,?,?)",
  prog.parent_program_id, prog.programName, JSON.stringify(prog), req.session.org_id, prog.created_by, prog.created_at, prog.updated_at, prog.program_status,
  function (err) {
    if (err) {
      throw err;
    } else {
      var count = 1;
      var step2PostedData = prog;
      for (i in step2PostedData.testsInfo) {
        var c = 0;
        for (j in step2PostedData.testsInfo[i]) {
          var obj = Object.keys(step2PostedData.testsInfo[i])[c];
          db1.prepare("INSERT INTO `tests` (`parent_name`, `test_name`, `test_alias`, `sequences`, `duration`, `prog_id`, `org_id`, `test_createdby`, `test_created`, `test_modified`, `test_status`) VALUES (?,?,?,?,?,?,?,?,?,?,?)")
            .run(count, obj,
              step2PostedData.testsInfo[i][j].alias,
              step2PostedData.testsInfo[i][j].sequences,
              step2PostedData.testsInfo[i][j].duration,
              this.lastID, // using this I am getting the program id
              req.session.org_id,
              prog.created_by,
              prog.created_at,
              prog.updated_at,
              prog.program_status);
          c++;
          count++;
        }
      }
    }
  });
Now, my question is: if I use the suggested method (without a callback), I do not get the last inserted program id from the programs table.
For example, if I use the following code:
var stmt = db1.prepare("INSERT INTO programs (`parent_prog_id`, `prog_name`, `prog_exercises`, `prog_orgid`, `prog_createdby`, `prog_created`, `prog_modified`, `prog_status`) VALUES (?,?,?,?,?,?,?,?)";
var stmt2 = db1.prepare("INSERT INTO `tests` ( `parent_name`,`test_name`, `test_alias`, `sequences`, `duration`, `prog_id`, `org_id`, `test_createdby`, `test_created`, `test_modified`, `test_status`) VALUES(?,?,?,?,?,?,?,?,?,?,?)")
programData.forEach(function(prog){
// Inserts data in programs table
stmt.run(prog.parent_program_id,prog.programName, JSON.stringify(prog), req.session.org_id, prog.created_by, prog.created_at, prog.updated_at, prog.program_status);
for (i in step2PostedData.testsInfo) {
var c = 0;
for(j in step2PostedData.testsInfo[i]){
var obj = Object.keys(step2PostedData.testsInfo[i])[c];
stmt2.run(
count,
obj,
step2PostedData.testsInfo[i][j].alias,
step2PostedData.testsInfo[i][j].sequences,
step2PostedData.testsInfo[i][j].duration,
'what should be there',// How to get last program inserted ID there
req.session.org_id,
prog.created_by,
prog.created_at,
prog.updated_at,
prog.program_status);
} // inner for loop ends
} // outer for loop ends
stmt.finalize();
stmt2.finalize();
});
If I use this.lastID, it returns null, obviously, since there is no callback now.
If I use sqlite3_last_insert_rowid(), I get a "sqlite3_last_insert_rowid is not defined" error.
If I use last_insert_rowid(), I get "last_insert_rowid() is not defined".
Question:
How can I get the last inserted program id in there? Currently I am getting it as null.
Edit:
If using a callback is the only way to get the last ID, then I will keep my code as it currently is. In that case, can anyone please suggest how I can increase the speed of insertion?
Thank you!
sqlite3_last_insert_rowid() is a part of the C API.
last_insert_rowid() is an SQL function.
If the documentation tells you that lastID is valid inside the callback, then you must use the callback.
Just move all the child INSERTs into the completion callback of the parent INSERT.
(The suggested code is not intended to show how to get the last inserted ID; it just demonstrates that the values actually have been inserted.)
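A minimal sketch of that approach (table and column names simplified for illustration), with the child INSERTs inside the parent INSERT's callback so that this.lastID is available:
var sqlite3 = require('sqlite3').verbose();
var db = new sqlite3.Database(':memory:');

db.serialize(function () {
  db.run("CREATE TABLE programs (prog_name TEXT)");
  db.run("CREATE TABLE tests (prog_id INTEGER, test_name TEXT)");

  // Parent INSERT; use a regular function (not an arrow) so `this` is the Statement
  db.run("INSERT INTO programs (prog_name) VALUES (?)", "Program A", function (err) {
    if (err) throw err;
    var programId = this.lastID; // only valid inside this callback

    // Child INSERTs reference the id of the row inserted above
    var stmt = db.prepare("INSERT INTO tests (prog_id, test_name) VALUES (?, ?)");
    ["Test 1", "Test 2"].forEach(function (testName) {
      stmt.run(programId, testName);
    });
    stmt.finalize();
  });
});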

Prevent Ember Data bulk commit for single item?

I need Ember Data to update one of my models in bulk. So I set
bulkCommit: true
in the DS.RESTAdapter. But now it even uses bulk commits for updates to a single record!
This is very unexpected behaviour.
So how do I modify Ember Data to only use bulk commits when more than 1 item is committed?
Here is what I've done now:
updateRecords: function(store, type, records) {
  var arr = records.list;
  if (arr.length === 1) {
    return this.updateRecord(store, type, arr[0]);
  } else {
    return this._super(store, type, records);
  }
}
This checks whether records consists of a single item and, if so, calls updateRecord instead.
createRecords and deleteRecords are changed accordingly.

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an expression tree (Expression<Func<T, bool>>) instead of a Func, so that it's treated as IQueryable instead of IEnumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it together with Skip(n) to retrieve everything:
RavenQueryStatistics stats;
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;

do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);

return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even fewer than the default of 128) for human consumption, then something is wrong, or your problem requires different thinking than pulling a truckload of documents out of the data store.
RavenDB indexing is quite sophisticated. There is a good article about indexing here and one about facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
// map
from post in docs.Posts
select new { post.Author, Count = 1 }

// reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").Where(x => x.Author == "some-author").ToList();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
// either:
var query = session.Query<User, MyUserIndex>();
// or, with a Where clause on an indexed field:
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);

using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session: Querying: How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.