SQLDelight batch insert list - sqldelight

I'm having a problem inserting many rows with SQLDelight; I'm not sure if the query is deadlocked or just taking many minutes. I'm inserting about 10k rows into two different tables.
Both have a .sq file like:
save:
INSERT OR REPLACE INTO TableA VALUES ?;
my insert:
val tableAQueries = database.tableAQueries
withContext(Dispatchers.IO) {
    tableAQueries.transaction {
        rows.forEach { tableAQueries.save(it) }
    }
}

Related

How can I validate if a record can be added at the SQL level using Entity Framework Core

If I want to make some checks before inserting a row into the database, I know that I can run the following code:
public bool BookSeat(int userId, string seatNumber)
{
    if (IsSeatStillAvailable(seatNumber))
    {
        var ticket = new Ticket(userId, seatNumber);
        _dbContext.Tickets.Add(ticket);
        _dbContext.SaveChanges();
        return true;
    }
    return false;
}

private bool IsSeatStillAvailable(string seatNumber)
{
    var seatTaken = _dbContext.Tickets.Any(w => w.SeatNumber == seatNumber);
    return !seatTaken;
}
This does one call to the database to see if the seat is taken and then a second call to book the seat. BUT in that time it might already have been booked.
I know that in simple examples I can create an index, but my use case is much more complex.
Is there a way that I can generate the SQL statement on the insert so that it produces an ATOMIC transaction?
To produce something like (excuse the clumsy SQL):
IF (SELECT COUNT(*) FROM Tickets WHERE SeatNumber = @SeatNumber) = 0
BEGIN
    INSERT INTO Tickets (UserId, SeatNumber)
    VALUES (@UserId, @SeatNumber);
    RETURN true
END;
RETURN false
What you are looking for is concurrency handling and optimistic locking:
https://learn.microsoft.com/en-us/ef/core/saving/concurrency?tabs=data-annotations

How to ignore DUPLICATE ENTRY error when updating multiple records at once using TypeORM

I am trying to update hundreds of database records using the TypeORM library. The problem is that sometimes a DUPLICATE ENTRY error is returned from SQL when the bulk upload is performed, which stops the whole operation. Is it possible to set up TypeORM in a way so that duplicate entries are ignored and the insert is performed?
The table uses a composite primary key made up of two columns.
This is my insert command (TypeORM + Nestjs):
public async saveBulk(historicalPrices: IHistoricalPrice[]) {
    if (!historicalPrices.length) {
        return;
    }
    const repoPrices = historicalPrices.map((p) => this.historicalPricesRepository.create(p));
    await this.historicalPricesRepository.save(repoPrices, { chunk: 200 });
}
Thanks in advance
You will have to use InsertQueryBuilder to save the entities instead of the repository.save method. InsertQueryBuilder lets you call an additional method, orIgnore(), which adds the IGNORE keyword to your MySQL INSERT statement. From the official MySQL docs:
When INSERT IGNORE is used, the insert operation fails silently for rows containing the unmatched value, but inserts rows that are matched.
One drawback is that you'll now have to chunk the rows on your own, since InsertQueryBuilder doesn't provide any option to chunk the entities. Your code should look like this:
for (let i = 0; i < historicalPrices.length; i += 200) {
    const chunk = historicalPrices.slice(i, i + 200);
    const targetEntity = this.historicalPricesRepository.target;
    await this.historicalPricesRepository
        .createQueryBuilder()
        .insert()
        .into(targetEntity)
        .values(chunk)
        .orIgnore()
        .execute();
}

BigQuery: Split table based on a column

Short question: I would like to split a BQ table into multiple small tables based on the distinct values of a column. So, if the column country has 10 distinct values, it should split the table into 10 individual tables, each holding the data for its respective country. Ideally this would be done from within a BQ query (using INSERT, MERGE, etc.).
What I am doing right now is exporting the data to Google Cloud Storage -> local storage -> splitting it locally and then pushing the splits into tables (which is a very time-consuming process).
Thanks.
If the data has the same schema, just leave it in one table and use the clustering feature: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#creating_a_clustered_table
#standardSQL
CREATE TABLE mydataset.myclusteredtable
PARTITION BY dateCol
CLUSTER BY country
OPTIONS (
  description = "a table clustered by country"
) AS (
  SELECT ....
)
https://cloud.google.com/bigquery/docs/clustered-tables
The feature is in beta though.
You can use Dataflow for this. This answer gives an example of a pipeline that queries a BigQuery table, splits the rows based on a column and then outputs them to different PubSub topics (which could be different BigQuery tables instead):
Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

PCollection<TableRow> weatherData = p.apply(
    BigQueryIO.Read.named("ReadWeatherStations").from("clouddataflow-readonly:samples.weather_stations"));

final TupleTag<String> readings2010 = new TupleTag<String>() {
};
final TupleTag<String> readings2000plus = new TupleTag<String>() {
};
final TupleTag<String> readingsOld = new TupleTag<String>() {
};

PCollectionTuple collectionTuple = weatherData.apply(ParDo.named("tablerow2string")
    .withOutputTags(readings2010, TupleTagList.of(readings2000plus).and(readingsOld))
    .of(new DoFn<TableRow, String>() {
        @Override
        public void processElement(DoFn<TableRow, String>.ProcessContext c) throws Exception {
            if (c.element().getF().get(2).getV().equals("2010")) {
                c.output(c.element().toString());
            } else if (Integer.parseInt(c.element().getF().get(2).getV().toString()) > 2000) {
                c.sideOutput(readings2000plus, c.element().toString());
            } else {
                c.sideOutput(readingsOld, c.element().toString());
            }
        }
    }));

collectionTuple.get(readings2010)
    .apply(PubsubIO.Write.named("WriteToPubsub1").topic("projects/fh-dataflow/topics/bq2pubsub-topic1"));
collectionTuple.get(readings2000plus)
    .apply(PubsubIO.Write.named("WriteToPubsub2").topic("projects/fh-dataflow/topics/bq2pubsub-topic2"));
collectionTuple.get(readingsOld)
    .apply(PubsubIO.Write.named("WriteToPubsub3").topic("projects/fh-dataflow/topics/bq2pubsub-topic3"));

p.run();
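As a variation (not part of the original answer): if the DoFn above emitted TableRow elements instead of strings, each branch could be written to its own BigQuery table rather than a Pub/Sub topic, using BigQueryIO.Write from the same pre-Beam Dataflow SDK. The destination table and schema below are made up for illustration:
// Hypothetical sketch: write one branch to its own BigQuery table instead of Pub/Sub.
// Assumes the side outputs above carry TableRow elements rather than strings.
TableSchema readingsSchema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("station").setType("STRING"),
        new TableFieldSchema().setName("year").setType("STRING"),
        new TableFieldSchema().setName("reading").setType("FLOAT")));

collectionTuple.get(readings2010)
    .apply(BigQueryIO.Write.named("WriteReadings2010")
        .to("my-project:mydataset.readings_2010")   // hypothetical destination table
        .withSchema(readingsSchema)                 // hypothetical schema matching the rows
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));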

Bigquery: getNumRows() returns null after tables.get?

// Retrieve the specified table resource
public Table getTable(String tableId) {
    Tables tableRequest = sBIGQUERY.tables();
    Table table = null;
    try {
        table = tableRequest.get(mProjectId, mDataset, tableId).execute();
    } catch (IOException e) {
        logger.error(e);
    }
    return table;
}
after this, in my main function:
Table info = mBigquery.getTable(tableId);
logger.info(tableId + "#" + info.getCreationTime() + "#" + info.getLastModifiedTime()+"#"+info.getNumBytes()+"#"+info.getNumRows());
Sometimes info.getNumBytes() and info.getNumRows() return null, but info.getCreationTime() and info.getLastModifiedTime() are just fine. Why is that?
BigQuery will not return the number of rows in a table if the number of rows is not known ahead of time. The primary way this happens is with tables that have recently been written to via streaming operations (tabledata.insertAll()). It can also happen with certain types of tables that are links to Google-internal data (this is rare), or with views, where the number of rows is not computed when the underlying tables change.
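For illustration (a sketch, not part of the original answer), reusing the getTable() helper and logger from the question: the table resource returned by tables.get also exposes the streaming buffer, so the two cases can be told apart. Rows still in the buffer only show up as an estimate:
Table info = mBigquery.getTable(tableId);
// getNumRows()/getNumBytes() may come back null while recently streamed rows
// are still sitting in the streaming buffer.
if (info.getNumRows() != null) {
    logger.info(tableId + " committed rows: " + info.getNumRows());
}
if (info.getStreamingBuffer() != null) {
    // estimatedRows is only an estimate of rows not yet committed to managed storage
    logger.info(tableId + " rows pending in streaming buffer: ~"
            + info.getStreamingBuffer().getEstimatedRows());
}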

Prevent Ember Data bulk commit for single item?

I need Ember Data to update one of my models in bulk. So I set
bulkCommit: true
in the DS.RESTAdapter. But now it even uses bulk commits for updates to a single record!
This is very unexpected behaviour.
So how do I modify Ember Data to only use bulk commits when more than 1 item is committed?
Here is what I've done now:
updateRecords: function(store, type, records) {
    var arr = records.list;
    if (arr.length === 1) {
        return this.updateRecord(store, type, arr[0]);
    } else {
        return this._super(store, type, records);
    }
}
This checks whether records consists of a single item and, if so, calls updateRecord instead.
createRecords and deleteRecords are changed accordingly.