Synchronize Data Access - sql

Some years ago I built an Excel file for my colleagues that displays a lot of data from an external ODBC data source. The data is partitioned into many data tables on different sheets. The file also contains a button that allows the user to update the data.
Since accessing the data from the external source was very slow, I implemented some caching logic that stored the parts of the results that were unlikely to change in tables on our own SQL Server, and did some magic to keep the data synchronized. The Excel file itself only accesses the SQL Server; every data table uses a stored procedure (SPROC) to get its part of the data.
Fast forward 5 years. The Excel file has grown so large, and contains so many sheets and so much data, that our Excel (still version 2003) has problems with it. So my colleagues split the file into two halves.
The problem now is that both Excel files contain the logic to update the data, so it can happen that a user clicks the update button in file no. 1 while another user is already updating file no. 2.
That's the point where the updating logic goes berserk and produces garbage.
The update run is only required once for both Excel files because it updates all the data displayed in both. It's quite expensive and takes 5 to 15 minutes.
I could split the update run into two halves as well, but that wouldn't make it any faster and updating the two files would take twice as long.
What I have in mind is some kind of mutex: user A clicks the update button and the update run starts. User B wants to update too, but the (VBA/SPROC) logic detects that an update is already running and waits until it finishes.

You could perform the updates in a transaction with the SERIALIZABLE isolation level; your update code would need to detect and handle SQL Server error 1205 (deadlock victim) and report to the user that another update is in progress.
Alternatively, add a rowversion timestamp to each row and only update a row if it hasn't been changed since you loaded it.
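A minimal T-SQL sketch of the serializable-transaction approach, assuming the expensive update run is wrapped in a stored procedure (dbo.usp_RefreshCache is a placeholder name) and that the server is SQL Server 2012 or later so THROW is available:
BEGIN TRY
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;

    EXEC dbo.usp_RefreshCache;   -- placeholder for the expensive update run

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

    IF ERROR_NUMBER() = 1205     -- deadlock victim: the other user's update won
        RAISERROR('Another update is already running; please try again later.', 16, 1);
    ELSE
        THROW;                   -- re-raise anything unexpected
END CATCH;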

But when A has finished, B will run the update 'for nothing'.
Instead: when A clicks update, call a stored proc that fires the update asynchronously.
When the update starts, it checks the last time it ran and exits if that was less than X minutes ago.
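A sketch of that guard, assuming a one-row tracking table dbo.UpdateLog (LastRun DATETIME) and a placeholder procedure dbo.usp_RefreshCache for the real update run:
CREATE PROCEDURE dbo.usp_RunUpdateIfStale
    @MinMinutes INT = 15
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @LastRun DATETIME;
    SELECT @LastRun = LastRun FROM dbo.UpdateLog;

    -- Exit quietly if the previous run started less than @MinMinutes ago
    IF @LastRun IS NOT NULL AND DATEDIFF(MINUTE, @LastRun, GETDATE()) < @MinMinutes
        RETURN;

    -- Record the start of this run, then do the expensive work
    UPDATE dbo.UpdateLog SET LastRun = GETDATE();
    EXEC dbo.usp_RefreshCache;
END;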

Related

How to avoid running query twice - ODBC Excel (new query or editing existing query)

I have an ODBC connection to an AWS MySQL database instance. It's extremely frustrating that the Excel UI appears to obligate me to run the query twice.
First, I have to run the query. After this runs and returns a limited number of rows, I then have to run it again to load the data into Excel.
My question is, is there any possible way to skip step 1 or step 2, so that I can input my query and have it load directly into the workbook?
I'm not understanding the problem. You are configuring a Query Connection. The first execution returns a preview and the "Transform data" option (if you want to further tailor the query). The second execution loads it. From that point on the query is set up; it only needs to be configured once.
To get new/changed data, you just need to do a "Refresh All" or configure the connection to refresh automatically when the workbook is opened.
If you are adding a query to many workbooks, you could probably set one up and then code a query-substitution script.

Will BigQuery finish long running jobs with a destination table if my browser crashes / computer turns off?

I frequently run BigQuery jobs in the web gui that take 30 minutes or more, saving the results into another table to view later.
Since I'm not waiting for the results right away, and not storing them in my computer's memory, it would be great if I could start a query, turn off my computer, and come back the next day to look at the results in the destination table.
Will this work?
The same applies if my computer crashes, the browser runs out of memory, or anything else causes me to lose my connection to BigQuery while the job is running.
The simple answer is yes: the processing takes place in the cloud, not in your browser. As long as you set a destination table, the results will be saved there; if they are not, you can check the query history to see whether any issue stopped them from being produced.
If you don't set a destination table, the results are saved to a temporary table, which may no longer be available if you don't return in time.
I'm sure someone can give you a much more detailed answer.
Even if you have not defined a destination table, you can still access the result of the query by checking the Query History. Locate your query in the list, expand the respective item, and note the value of Destination Table.
Note: this is not a regular table but a so-called anonymous table that remains available for about 24 hours after the query was executed.
Knowing that table, you can use it however you want; for example, simply query it as shown below:
SELECT *
FROM `yourproject._1e65a8880ba6772f612fbe6ff0eee22c939f1a47.anon9139110fa21b95d8c8729cf0bb6e4bb6452946d4`
Note: the anonymous table is "saved" in a "system" dataset whose name starts with an underscore, so you will not be able to see it in the UI. The table name also starts with 'anon', which presumably stands for 'anonymous'.

How to sync/update a database connection from MS Access to SQL Server

Problem:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible. The data sets update daily into the same CSV files on my local hard drive. Currently using MS Access 2010 (v14.0) as a middleman to aggregate the CSV files into linked tables.
Using the solutions below, the data transfers perfectly into SQL Server and does exactly what I want. But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
Solutions:
Upsizing Wizard in MS Access - This works best, transferring all the tables perfectly to SQL Server databases. I cannot figure out, though, how to update the tables without deleting them and repeating the same steps each day. None of the solutions or links that I have tried have panned out.
SQL Server Import/Export Wizard - This also works fine for getting the data over to SQL Server one time. But again, I cannot figure out how to update/sync this data with the new tables. Another issue is that choosing Microsoft Access as the data source through this method requires a .mdb file. The latest MS Access file format is .accdb, so I have to save the database in the older .mdb format in order to export it to SQL Server.
Constraints:
I have no loyalty towards MS Access. I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them. From all I have read, MS Access seems like the best way to do that.
I also have limited coding knowledge so more advanced VBA/C++ solutions will probably go over my head.
TLDR:
Trying to get several different daily updating local CSV files into a program where I can run SQL queries on them without having to do a full delete and re-import each day. Currently using MS Access 2010 to SQL Server Express (SSMS v17.6) which fulfills my needs, but does not update daily with the new data without re-importing everything.
Thank you!
You can use a staging table strategy to solve this problem.
When it's time to perform the daily update, import all of the data into one or more staging tables. Then execute SQL statements to insert rows that exist in the imported data but not in the base data, delete rows from the base data that no longer exist in the imported data, and update base rows whose values have changed in the imported data.
Use your data dependencies to determine in which order tables should be modified.
I would run all deletes first, then inserts, and finally all updates.
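A minimal sketch of those three steps, assuming a single-column key; dbo.Base, dbo.Staging, Id, and SomeValue are placeholder names:
-- 1. Deletes: remove base rows that no longer exist in the import
DELETE b
FROM dbo.Base AS b
WHERE NOT EXISTS (SELECT 1 FROM dbo.Staging AS s WHERE s.Id = b.Id);

-- 2. Inserts: add rows that are new in the import
INSERT INTO dbo.Base (Id, SomeValue)
SELECT s.Id, s.SomeValue
FROM dbo.Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Base AS b WHERE b.Id = s.Id);

-- 3. Updates: overwrite base rows whose values changed (extend the comparison if columns are NULL-able)
UPDATE b
SET b.SomeValue = s.SomeValue
FROM dbo.Base AS b
JOIN dbo.Staging AS s ON s.Id = b.Id
WHERE b.SomeValue <> s.SomeValue;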
This should be a fun challenge!
EDIT
You said:
I need to get data sets from CSV files into SQL Server Express (SSMS v17.6) as efficiently as possible.
The most efficient way to put data into SQL Server tables is using SQL Bulk Copy. This can be implemented from the command line, an SSIS job, or through ADO.Net via any .Net language.
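For example, BULK INSERT is the T-SQL flavor of bulk loading and can pull a CSV straight into a staging table; the file path, table name, and CSV layout below are assumptions:
BULK INSERT dbo.Staging
FROM 'C:\Data\daily_export.csv'
WITH (
    FIRSTROW = 2,            -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',    -- adjust to the file's actual line endings
    TABLOCK                  -- take a table lock for a faster load
);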
You state:
But I cannot figure out how to refresh/update/sync the data at the end of each day with the newly added CSV data without having to re-import the entire data set each time.
It seems you have two choices:
Toss the old data and replace it with the new data
Modify the old data so that it comes into alignment with the new data
In order to do number 1 above, you'd simply replace all the existing data with the new data, which you've already said you don't want to do, or at least you don't think you can do this efficiently. In order to do number 2 above, you have to compare the old data with the new data. In order to compare two sets of data, both sets of data have to be accessible wherever the comparison is to take place. So, you could perform the comparison in SQL Server, but the new data will need to be loaded into the database for comparison purposes. You can then purge the staging table after the process completes.
In thinking further about your issue, it seems the underlying issue is:
I really am just looking for the most efficient way to get these CSV files consistently into a format where I can perform SQL queries on them.
There exist applications built specifically to allow you to query this type of data.
You may want to have a look at Log Parser Lizard or Splunk. These are great tools for querying and digging into data hidden inside flat data files.
An Append Query can incrementally add new records to an existing table. However, the question is whether your starting data set (the CSV) contains only new records or also includes records that are already in the table.
This is a classic dilemma that needs to be managed in the Append Query setup.
If the CSV includes prior records, then you have to establish the 'new records' subset inside the CSV and append just those. For instance, if you have a sequencing field, you can filter on values greater than the existing table's maximum. If no such field exists, you would need to do a NOT-EXISTS comparison of the table data against the CSV data to identify which CSV records are not already in the table.
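Expressed as SQL, the sequencing-field variant looks roughly like this (dbo.ExistingTable, dbo.CsvImport, Id, and SomeValue are placeholder names):
-- Append only the CSV rows whose sequencing value is beyond the current maximum
INSERT INTO dbo.ExistingTable (Id, SomeValue)
SELECT c.Id, c.SomeValue
FROM dbo.CsvImport AS c
WHERE c.Id > (SELECT COALESCE(MAX(Id), 0) FROM dbo.ExistingTable);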
You state you seek something 'more efficient' - but in truth there is nothing more efficient than a wholesale delete of all records and write of all records. Most of the time one can't do that - but if you can I would just stick with it.

MS Access event when new table is created

I would need to come up with a robust solution to detect when a new table is created in MS Access and then copy its content to a master table.
One application is writing new data as new data tables into MS Access. This part can't be changed. Now these tables have to be copied into a master table to be picked up by an interface.
Is there a trigger in MS Access when a new table is created?
I was also thinking about using a timer and then looking up all the tables.
Any ideas or suggestions?
Access does not expose an event for table creation. So you will have to check whether a new table has been created.
If you're not deleting tables, you could examine whether CurrentDb.TableDefs.Count has increased since the last time you checked.
You need to trap the COMPLETION of putting all the data into the table, not the initial creation, and not some point during the load when the table count may already be greater than before.
The copy operation cannot start until all the data is in the table, so the creating program needs to send a signal when it's done.

How to resume data migration from the point where error happened in ssis?

I am migrating data from an Oracle database to a SQL Server 2008 R2 database using SSIS. My problem is that at a certain point the package fails, say at some 40,000 rows out of 100,000. What can I do so that the next time I run the package, after correcting the errors, it restarts from the 40,001st row, i.e., the row where the error occurred?
I have tried using checkpoints in SSIS, but the problem is that they only work between control flow tasks. I want something that works on the rows being transferred.
There's no native magic I'm aware of that is going to "know" that it failed on row 40,000 and when it restarts, it should start streaming row 40,001. You are correct that checkpoints are not the answer and have plenty of their own issues (can't serialize Object types, loops restart, etc).
How you can address the issue is through good design. If your package is created with the expectation that it's going to fail, then you should be able to handle these scenarios.
There are two approaches I'm familiar with. The first is to add a Lookup Transformation in the Data Flow between your source and your destination. The goal is to identify which records already exist in the target system; only the rows with no match are sent on to the destination. This is a very common pattern and also lets you detect changes between source and destination (if that is a need). The downside is that you will always be transferring the full data set out of the source system and then filtering rows in the data flow. If it failed on row 99,999 out of 1,000,000, you will still need to stream all 1,000,000 rows back to SSIS for it to find the one that hasn't been sent.
The other approach is to use a dynamic filter in the WHERE clause of your source query. If you can assume that rows are inserted in order, you can structure your SSIS package with an Execute SQL Task that runs a query like SELECT COALESCE(MAX(SomeId), 0) AS startingPoint FROM dbo.MyTable against the destination database and assigns the result to an SSIS variable (@[User::StartingId]). You then use an expression on your source's select statement, something like "SELECT * FROM dbo.MyTable T WHERE T.SomeId > " + (DT_WSTR, 10) @[User::StartingId]. Now when the data flow begins, it will start where it last loaded data. The challenge with this approach is being certain that data is never inserted out of order.
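Spelled out (using the same placeholder table and column names as above), the two pieces of that pattern are just:
-- Execute SQL Task: run against the destination; the result goes into @[User::StartingId]
SELECT COALESCE(MAX(SomeId), 0) AS startingPoint FROM dbo.MyTable;

-- Source query after the SSIS expression substitutes the variable, e.g. when startingPoint = 40000
SELECT * FROM dbo.MyTable AS T WHERE T.SomeId > 40000;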
Let me know if you have questions or need things explained better, pictures, etc. Also, the code above is freehand, so there could be syntax errors, but the logic should be correct.