JRuby and HSQLDB: randomly missing rows - sql

I have the following codes:
# my db connection is here:
# #connection = DriverManager.getConnection("jdbc:hsqldb:file:../db/rdata", "user", "user")
#the method recieves a array of hash {:rev,:time,:path,:result}
def create(list)
st=#connection.createStatement()
sql=""
list.each do |i|
sql.concat("INSERT INTO RESULTS_LOGIN (REVISION, TIMESTAMP, ARCHIVEPATH, RESULTS) VALUES (#{i[:rev]},'#{i[:time]}','#{i[:path]}','#{i[:result]}');\n")
end
p sql.lines.to_a.size
st.execute(sql)
st.close
end
Problem:
The sql.lines.to_a.size is 128, which means I have 128 rows to be inserted. However, after rerunning for many times, I found that everytime I got different rows in the db. For example, this time the RESULT_LOGIN table has 114 rows, and next time it was 99.... There are also successful runs, which made 128 rows in the table.
I can't understand why such thing happens. No errors, no exceptions, and what's strange is that in my code there is a method right after the "create" method, whcih executes "select * from RESULTS_LOGIN" and print it, like this:
create(list) # insert rows
read_all # print all rows
Everytime the "read_all" right after "create" prints out 128 rows and the correct data; However, if I run read_all seperately after closing and recreate the connection, or browse the db using HSQLDB's GUI management tool, there will be some rows missing....

For tests such as this, you need to add an extra setting either to URL or as an executed SQL statement. This is to ensure all unwritten data is persisted at the end of the test. See my answer to this question:
something funny with embedded hsql

Related

GtkTreeView stops updating unless I change the focus of the window

I have a GtkTreeView object that uses a GtkListStore model that is constantly being updated as follows:
Get new transaction
Feed data into numpy array
Convert numbers to formatted strings, store in pandas dataframe
Add updated token info to GtkListStore via GtkListStore.set(titer, liststore_cols, liststore_data), where liststore_data is the updated info, liststore_cols is the name of the columns (both are lists).
Here's the function that updates the ListStore:
# update ListStore
titer = ls_full.get_iter(row)
liststore_data = []
[liststore_data.append(df.at[row, col])
for col in my_vars['ls_full'][3:]]
# check for NaN value, add a (space) placeholder is necessary
for i in range(3, len(liststore_data)):
if liststore_data[i] != liststore_data[i]:
liststore_data[i] = " "
liststore_cols = []
[liststore_cols.append(my_vars['ls_full'].index(col) + 1)
for col in my_vars['ls_full'][3:]]
ls_full.set(titer, liststore_cols, liststore_data)
Class that gets the messages from the websocket:
class MyWebsocketClient(cbpro.WebsocketClient):
# class exceptions to WebsocketClient
def on_open(self):
# sets up ticker Symbol, subscriptions for socket feed
self.url = "wss://ws-feed.pro.coinbase.com/"
self.channels = ['ticker']
self.products = list(cbp_symbols.keys())
def on_message(self, msg):
# gets latest message from socket, sends off to be processed
if "best_ask" and "time" in msg:
# checks to see if token price has changed before updating
update_needed = parse_data(msg)
if update_needed:
update_ListStore(msg)
else:
print(f'Bad message: {msg}')
When the program first starts, the updates are consistent. Each time a new transaction comes in, the screen reflects it, updating the proper token. However, after a random amount of time - seen it anywhere from 5 minutes to over an hour - the screen will stop updating, unless I change the focus of the window (either activate or inactive). This does not last long, though (only enough to update the screen once). No other errors are being reported, memory usage is not spiking (constant at 140 MB).
How can I troubleshoot this? I'm not even sure where to begin. The data back-ends seem to be OK (data is never corrupted nor lags behind).
As you've said in the comments that it is running in a separate thread then i'd suggest wrapping your "update liststore" function with GLib.idle_add.
from gi.repository import GLib
GLib.idle_add(update_liststore)
I've had similar issues in the past and this fixed things. Sometimes updating liststore is fine, sometimes it will randomly spew errors.
Basically only one thread should update the GUI at a time. So by wrapping in GLib.idle_add() you make sure your background thread does not intefer with the main thread updating the GUI.

Snowflake COPY INTO from JSON - ON_ERROR = CONTINUE - Weird Issue

I am trying to load JSON file from Staging area (S3) into Stage table using COPY INTO command.
Table:
create or replace TABLE stage_tableA (
RAW_JSON VARIANT NOT NULL
);
Copy Command:
copy into stage_tableA from #stgS3/filename_45.gz file_format = (format_name = 'file_json')
Got the below error when executing the above (sample provided)
SQL Error [100069] [22P02]: Error parsing JSON: document is too large, max size 16777216 bytes If you would like to continue loading
when an error is encountered, use other values such as 'SKIP_FILE' or
'CONTINUE' for the ON_ERROR option. For more information on loading
options, please run 'info loading_data' in a SQL client.
When I had put "ON_ERROR=CONTINUE" , records got partially loaded, i.e until the record with more than max size. But no records after the Error record was loaded.
Was "ON_ERROR=CONTINUE" supposed to skip only the record that has max size and load records before and after it ?
Yes, the ON_ERROR=CONTINUE skips the offending line and continues to load the rest of the file.
To help us provide more insight, can you answer the following:
How many records are in your file?
How many got loaded?
At what line was the error first encountered?
You can find this information using the COPY_HISTORY() table function
Try setting the option strip_outer_array = true for file format and attempt the loading again.
The considerations for loading large size semi-structured data are documented in the below article:
https://docs.snowflake.com/en/user-guide/semistructured-considerations.html
I partially agree with Chris. The ON_ERROR=CONTINUE option only helps if the there are in fact more than 1 JSON objects in the file. If it's 1 massive object then you would simply not get an error or the record loaded when using ON_ERROR=CONTINUE.
If you know your JSON payload is smaller than 16mb then definitely try the strip_outer_array = true. Also, if your JSON has a lot of nulls ("NULL") as values use the STRIP_NULL_VALUES = TRUE as this will slim your payload as well. Hope that helps.

synchronization between 2 applications pooling a SQL table

I have 2 instances of a VB.NET application each running on their own dedicated servers. The said application runs a While true loop with a 5s sleep on IDLE (IDLE is when the Table doesn't have any ProcessQuery to be treated). On each iteration, the application questions a table in the SQL Database to know if there is anything it could process.
The problem is that i sometimes encounter the problem where both of the instances are "taking" the same ProcessQuery.
I'm using EntityFramework6. I have looked into EntityState but i don't think it does exactly what i'm trying to accomplish.
I was wondering what would be my solution to have perfect parallel instances. It's not impossible at some point i have 12 instances running on 12 machines.
Thanks!
Dim conn As New Info_IndusEntities()
Dim DemandeWilma As WilmaDemandes = conn.WilmaDemandes.Where(Function(x) x.Site = 'LONDON' AndAlso x.Statut = 'toProcess').OrderBy(Function(x) x.RequestDate).FirstOrDefault
If Not IsNothing(DemandeWilma) Then
DemandeWilma.Statut = Statuts.EnTraitement.ToString
DemandeWilma.ServerName = Environment.MachineName
DemandeWilma.ProcessDate = DateTime.Now
conn.SaveChanges()
Return DemandeWilma
end if
UPDATE (21/06/19)
I found an article that I find interesting.
I started by adding a column to my Table :
UPDATED (21/06/19)
I then refreshed my model and changed the Concurrency Check property of RowVersion column in my ORM :
When I tested the update, here's the log of EF6 :
UPDATE [dbo].[WilmaDemandes] SET [Statut] = #0, [ServerName] = #1,
[DateDebut] = #2 WHERE (([ID] = #3) AND ([RowVersion] = #4)) SELECT
[RowVersion] FROM [dbo].[WilmaDemandes] WHERE ##ROWCOUNT > 0 AND [ID]
= #3
-- #0: 'EnTraitement' (Type = String, Size = 20)
-- #1: 'TRB5995' (Type = String, Size = 20)
-- #2: '2019-06-25 7:31:01 AM' (Type = DateTime2)
-- #3: '124373' (Type = Int32)
-- #4: 'System.Byte[]' (Type = Binary, Size = 8)
-- Executing at 2019-06-25 7:31:24 AM -04:00
-- Completed in 95 ms with result: SqlDataReader
Closed connection at 2019-06-25 7:31:24 AM -04:00
Exception thrown:
'System.Data.Entity.Infrastructure.DbUpdateConcurrencyException' in
EntityFramework.dll
UPDATED (25/06/19)
The problems, as explained in this post, starts when you are using DB-First instead of Code-First. Your property will get overwritten silently as soon as you update the model. Some people back then coded a console app workaround that they run on pre-build. I'm not sure i'm quite ready to take this solution as final solution.
Interesting tutorial on how to test optimistic concurrency and ways to resolve such an exception.
Add an "owner" column to your queue table
Your application updates one record (TOP 1) and sets the owner value to their identifier (WHERE Owner IS NULL)
Now your application goes back and reads their owned rows and processes them
It's a simple pattern and it works great. If any processes happen to take ownership 'simultaneously', only one will actually get the reservation.
I'm not very good at LINQ so here's a brute force method, multiline for clarity:
// First try reserving a row
conn.Database.ExecuteSqlCommand(
"WITH UpdateTop1 AS
(SELECT TOP 1 * FROM WilmaDemandes
WHERE Owner IS NULL
AND Site = 'LONDON'
ORDER BY RequestDate)
UPDATE UpdateTop1 SET Owner='ThisApplication'"
);
// See if we got one
Dim DemandeWilma As WilmaDemandes =
conn.WilmaDemandes.
Where(x => x.Owner=='ThisApplication').FirstOrDefault
// If we got a row, process it. Otherwise Idle and repeat
There's also no reason that you must reserve one row. You could reserve all the free rows and work your way through them. Meanwhile other processes will pick up any subsequently arriving rows
Personally I would refactor your status column and make it NULL for new records ready to be processed, otherwise it's the worker ID that has reserved it.
It also helps to add things like timestamp columns to record when the row was reserved etc.

Use Multiple DBs With One Redis Lua Script?

Is it possible to have one Redis Lua script hit more than one database? I currently have information of one type in DB 0 and information of another type in DB 1. My normal workflow is doing updates on DB 1 based on an API call along with meta information from DB 0. I'd love to do everything in one Lua script, but can't figure out how to hit multiple dbs. I'm doing this in Python using redis-py:
lua_script(keys=some_keys,
args=some_args,
client=some_client)
Since the client implies a specific db, I'm stuck. Ideas?
It is usually a wrong idea to put related data in different Redis databases. There is almost no benefit compared to defining namespaces by key naming conventions (no extra granularity regarding security, persistence, expiration management, etc ...). And a major drawback is the clients have to manually handle the selection of the correct database, which is error prone for clients targeting multiple databases at the same time.
Now, if you still want to use multiple databases, there is a way to make it work with redis-py and Lua scripting.
redis-py does not define a wrapper for the SELECT command (normally used to switch the current database), because of the underlying thread-safe connection pool implementation. But nothing prevents you to call SELECT from a Lua script.
Consider the following example:
$ redis-cli
SELECT 0
SET mykey db0
SELECT 1
SET mykey db1
The following script displays the value of mykey in the 2 databases from the same client connection.
import redis
pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)
lua1 = """
redis.call("select", ARGV[1])
return redis.call("get",KEYS[1])
"""
script1 = r.register_script(lua1)
lua2 = """
redis.call("select", ARGV[1])
local ret = redis.call("get",KEYS[1])
redis.call("select", ARGV[2])
return ret
"""
script2 = r.register_script(lua2)
print r.get("mykey")
print script2( keys=["mykey"], args = [1,0] )
print r.get("mykey"), "ok"
print
print r.get("mykey")
print script1( keys=["mykey"], args = [1] )
print r.get("mykey"), "misleading !!!"
Script lua1 is naive: it just selects a given database before returning the value. Its usage is misleading, because after its execution, the current database associated to the connection has changed. Don't do this.
Script lua2 is much better. It takes the target database and the current database as parameters. It makes sure that the current database is reactivated before the end of the script, so that next command applied on the connection still run in the correct database.
Unfortunately, there is no command to guess the current database in the Lua script, so the client has to provide it systematically. Please note the Lua script must reset the current database at the end whatever happens (even in case of previous error), so it makes complex scripts cumbersome and awkward.

Log every SQL query to database in Rails

I want to save to a log file some SQL query rails performs, (namely the CREATE, UPDATE and DELETE ones)
therefore I need to intercept all queries and then filter them maybe with some regexp and log them as needed.
Where would I put such a thing in the rails code?
Here a simplified version of what c0r0ner linked to, to better show it:
connection = ActiveRecord::Base.connection
class << connection
alias :original_exec :execute
def execute(sql, *name)
# try to log sql command but ignore any errors that occur in this block
# we log before executing, in case the execution raises an error
begin
File.open(Rails.root.join("/log/sql.txt"),'a'){|f| f.puts Time.now.to_s+": "+sql}
rescue Exception => e
;
end
# execute original statement
original_exec(sql, *name)
end
end
SQL logging in rails -
In brief - you need to override ActiveRecord execute method. There you can add any logic for logging.
As a note for followers, you can "log all queries" like Rails - See generated SQL queries in Log files and then grep the files for the ones you want, if desired.
If you are using mysql I would look into mysqlbinlog . It is going to track everything that potentially updates data. you can grep out whatever you need from that log easily.
http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html
http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
SQL Server? If so...
Actually, I'd do this at the SQL end. You could set up a trace, and collect every query that comes through a connection with a particular Application Name. If you save it to a table, you can easily query that table later.
Slightly updated version of #luca's answer for at least Rails 4 (and probably Rails 5)
Place this in config/initializers/sql_logger.rb:
connection = ActiveRecord::Base.connection
class << connection
alias :original_exec :execute
def execute(sql, *name)
# try to log sql command but ignore any errors that occur in this block
# we log before executing, in case the execution raises an error
begin
File.open(Rails.root.join("log/sql.log"), 'a') do |file|
file.puts Time.now.to_s + ": " + sql
end
rescue Exception => e
"Error logging SQL: #{e}"
end
# execute original statement
original_exec(sql, *name)
end
end