How to import complex relational data into SQL Server from Excel


We have business users who are entering product information into Excel spreadsheets. I have been tasked with coming up with a way of entering this information into our SQL Server DB. The problem is that the Excel spreadsheets aren't just a flat table; they're hierarchical. They look something like this:
-[Product 1] [Other fields]...
    -[Maintenance item 1] [Other fields]...
        -[Maintenance task 1] [other fields]...
    -[Maintenance item 2] [Other fields]...
        -[Maintenance task 2] [other fields]...
        -[Maintenance task 3] [other fields]...
-[Product 2] [Product Description] [Other fields]...
ETC.......
So there can be 0-many maintenance items for a product and 0-many maintenance tasks for a maintenance item. This is how the database is structured. I need to come up with a standard Excel template I can send out to our business users so they can input this information, and then figure out how to export it into SQL Server. The volume is going to be high, so I need the import to be somewhat automated. How should I do this?

Welcome to the worst possible way to store data and try to import it into a database. If at all possible do not let them create garbage Excel spreadsheets like that. That method is bound to create very many bugs in the data imports and you will hate your life forever if you have to support this mess.
I can't believe I'm even suggesting this, but can you get them to use a simple Access database instead? It could even link directly to the SQL server database and store the data correctly. By using Access forms, the users will find it relatively easy to add and maintain information and you will have far fewer problems than trying to import Excel data in the form you described. It would be a far less expensive and far less error prone solution to your problem.
If you are stuck with the format, the best way I have found to do something like this is to import it as-is into a staging table, add the parent IDs to every subordinate row (you may end up looping to do this), then pull the information out into relational staging tables, and finally import it into the production database.
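A minimal Python sketch of the "add the IDs to every subordinate row" step, assuming each staged row already carries a level marker ("product", "item" or "task") and that the spreadsheet row number is good enough as a temporary key. The StagedRow class, its field names and assign_parent_ids are hypothetical, not part of any import tool:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StagedRow:
    row_number: int                   # spreadsheet row, used as a temporary key
    level: str                        # assumed marker: "product", "item" or "task"
    name: str
    product_id: Optional[int] = None  # filled in for items and tasks
    item_id: Optional[int] = None     # filled in for tasks only


def assign_parent_ids(rows: List[StagedRow]) -> List[StagedRow]:
    """Walk the rows top to bottom and stamp each subordinate row with its parents."""
    current_product = None
    current_item = None
    for row in rows:
        if row.level == "product":
            current_product = row.row_number
            current_item = None       # a new product starts with no items
        elif row.level == "item":
            if current_product is None:
                raise ValueError(f"Row {row.row_number}: item before any product")
            current_item = row.row_number
            row.product_id = current_product
        elif row.level == "task":
            if current_item is None:
                raise ValueError(f"Row {row.row_number}: task without an item")
            row.product_id = current_product
            row.item_id = current_item
        else:
            raise ValueError(f"Row {row.row_number}: unknown level {row.level!r}")
    return rows


rows = assign_parent_ids([
    StagedRow(1, "product", "Product 1"),
    StagedRow(2, "item", "Maintenance item 1"),
    StagedRow(3, "task", "Maintenance task 1"),
    StagedRow(4, "item", "Maintenance item 2"),
    StagedRow(5, "task", "Maintenance task 2"),
    StagedRow(6, "task", "Maintenance task 3"),
    StagedRow(7, "product", "Product 2"),
])
```

A task that shows up before any maintenance item raises an error rather than being attached to the wrong parent, which fits the advice to reject sheets that stray from the structure.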
You can create all this using SSIS, but it won't be easy, it won't be quick, and it will be very prone to bugs if users aren't disciplined about exactly how they enter data (and they never are without a set of forms to fill out). Make sure you reject the Excel spreadsheet completely and send it back to the user if it strays at all from the prescribed structure. Trust me on this.
I'd estimate the Access solution to take about a month and the Excel solution to take at least six months of development. Really, that's how bad this is going to be.

I don't believe you'll find an import tool that will do this for you. Instead, you're going to have to write a script to ETL the spreadsheet files. I do a lot of this in Python (I'm doing it today, in fact).
Make sure that you handle exceptions on a per-cell level, reporting to the user exactly which cell had unexpected information. With spreadsheets created by hand, it is guaranteed that you will have to handle this on a regular basis.
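As a rough illustration of per-cell error reporting, here is a small Python sketch. It assumes the sheet has already been read into a list of rows of strings and that each column has one converter function; validate_rows, column_letter and the sample data are made up for the example:

```python
def column_letter(index):
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def validate_rows(rows, converters, first_row=2):
    """Convert every cell; collect (cell address, raw value, message) for failures."""
    clean, errors = [], []
    for row_offset, row in enumerate(rows):
        converted = []
        for col_index, (value, convert) in enumerate(zip(row, converters)):
            try:
                converted.append(convert(value))
            except (ValueError, TypeError) as exc:
                cell = f"{column_letter(col_index)}{first_row + row_offset}"
                errors.append((cell, value, str(exc)))
                converted.append(None)  # placeholder so columns stay aligned
        clean.append(converted)
    return clean, errors


# Assumed layout: name (text), interval in days (int), cost (float); row 1 is a header.
sample = [["Pump", "12", "3.5"], ["Valve", "twelve", "1.0"]]
clean, errors = validate_rows(sample, [str, int, float])
for cell, value, message in errors:
    print(f"{cell}: could not read {value!r} ({message})")
```

Collecting all errors instead of stopping at the first one lets you send the user a complete list of cells to fix in one go.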
That said, if this is coming to you as XLSX it might be possible to develop an XML translation to convert it to some more tractable XML document.

It probably makes more sense to break it up into several Excel sheets...one for product, but then another for maintenance items, and another for maintenance tasks. For each one, they'll have to enter some kind of ID to link them back together (ex: maintenance_task_id=1 links to maintenance_item_id=4). That can be a pain for business users to remember, but the only alternative is to enter lots of redundant data for each line.
Next, create a normalized database model (to avoid storing redundant data) and fill it by writing an app or script to parse through your Excel sheets. Vague and high-level, but that's how I'd do it.
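To make the normalized model a bit more concrete, here is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names are assumptions based on the IDs mentioned above, not the asker's real schema:

```python
import sqlite3

# Hypothetical three-table model: product -> maintenance_item -> maintenance_task
SCHEMA = """
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE maintenance_item (
    maintenance_item_id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    name TEXT NOT NULL
);
CREATE TABLE maintenance_task (
    maintenance_task_id INTEGER PRIMARY KEY,
    maintenance_item_id INTEGER NOT NULL
        REFERENCES maintenance_item(maintenance_item_id),
    name TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript(SCHEMA)

# One row from each sheet, linked by the IDs the users typed in
conn.execute("INSERT INTO product VALUES (1, 'Product 1')")
conn.execute("INSERT INTO maintenance_item VALUES (4, 1, 'Maintenance item 1')")
conn.execute("INSERT INTO maintenance_task VALUES (1, 4, 'Maintenance task 1')")

# A task pointing at a maintenance item that does not exist is rejected
try:
    conn.execute("INSERT INTO maintenance_task VALUES (2, 99, 'Orphan task')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The foreign keys are what catch a mistyped ID on one of the sheets, so the import fails loudly instead of silently storing an orphan row.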

I agree with the previous posts in general...
My suggestion: avoid the spreadsheet entirely. Spend your time making a simple front-end form, preferably a web-based one. Capture the data as cleanly as possible (ANYTHING here will be cleaner than the spreadsheet, including just having named fields).
You will spend less time in the end.

I would add VBA code to the template to add as much structure and intelligence as possible to the user data entry and validation.
In the extreme case of this you make the user enter all data via Forms which put all the validated data on the sheet, and then have an overall validation routine built into the Save or Close event.
Less extreme would be to add 3 command buttons driving code for
- add product
- add maintenance item
- add maintenance task
and some overall validation code at save/close.
This way you add as much smarts as possible to the data entry tasks.
Use Named Cells or other hidden metadata created by the VBA code as markers so that your DB update routine can make better sense of the data.
The last one I did like this took 3-4 man-weeks including the DB update routines, but I think it was probably more complicated than your example. But if you are not experienced with VBA and the Excel object model and events, it would obviously take much longer.

Related

How to force a cache refresh in MS Access

I am working on migrating a MS Access Database over to a newer SQL platform.
But, with all of the users who are currently using it, we're migrating slowly/carefully.
The first step is that we are re-writing the VBA code into C#, which is then deployed in a .dll along with the database.
Now, the VBA code calls into the C# to do the business logic, then the VBA continues to do the displays/UI, while Access still hosts the database.
The problem comes in that I have a report that is being run after the business logic from the C# in one place, and apparently MS Access has a cache, which clears every 5 seconds. So, the transaction that occurs in the C# code writes to the database, but the VBA code is still using the cache. This is causing errors, as the records added to the database (which the VBA report is trying to report on) don't exist in the cache yet...
I'm guessing that the C# .dll must be getting treated as a "second connection" to the MS Access database, which is what seems to typically cause this error in my searches (it thinks that one process is writing and the other is reading).
Since the cache is cleared out every 5 seconds, we can just put the process to sleep, and wake it up after 5 seconds, and then run the report, but that's pretty terrible for an end user.
And, making things difficult, the cache seems like it only gets used in the deployed version (so, when running from source / in debug mode, the error never happens).
Doing some searches, there seem to be plenty of people who have said "just refresh the cache." But the question is: within VBA, how do you refresh the cache?
Any advice would be welcome.
Thanks

I've been fighting the same issue for years as I write a lot of tools around an old Powerbuilder application that has an Access MDB back end.
The cache does exist and it is VERY real. When data is inserted on a different connection than it is queried on, the cache can be directly observed and measured. It was also documented by Microsoft before they blackholed a bunch of their old articles...
Microsoft Jet has a read-cache that is updated every PageTimeout milliseconds (default is 5000ms = 5 seconds). It also has a lazy-write mechanism that operates on a separate thread to main processing and thus writes changes to disk asynchronously. These two mechanisms help boost performance, but in certain situations that require high concurrency, they may create problems.
I've found a couple of workarounds that are not the best, but somewhat make do until I find something better or can rewrite the app with a better back-end database.
The seemingly best answer I've found (that may actually work for you since you say you need VBA) is to use JRO.RefreshCache. I've been trying to figure out how to implement this using C# or VB.net without any luck. Below is a link to a code example where you execute the RefreshCache method on your 2nd connection that needs to pull the data. I have not tested this myself.
https://documentation.help/MSJRO/jrmthrefreshcachex.htm
A workaround I've found that will deliver the query results within 500ms to 1000ms of insert time (instead of anywhere between 500 and 5000 ms - or more):
Use System.Data.Odbc instead of OleDb, with connection string: Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=;
If someone knows how to use the JRO.RefreshCache method with OLEDB and C# or VB.net, I'd be forever grateful. I believe the issue is it's looking for an ADO connection to be passed in, not an OLEDB connection.

I am not aware of ANY documentation suggesting that some 5 second cache exists. Where did this idea come from?
Furthermore, if you have 5 users, then you are not going to be able to update their caches, are you?
In other words, refreshing some cache for one user is still not going to solve the problem for multiple users anyway, is it?
The simple matter is that if you load up a form with 100 records, and other users are ALSO working on those 100 rows, then users will not see each other's changes until you tell Access to re-load the form.
You can do this with a Me.Refresh in the form, and then it will show changes made by other users (or even your C# code!).
However, that is not really the solution here.
How does near EVERY system deal with this issue?
Answer:
You don't, you "design" the software to take the user work flow into account.
So, in place of loading up a form with 100 rows of data (which you should not do, unless some SUPER DUPER reason exists for doing that), you provide a UI in which the user FIRST searches for whatever it is they want to work on.
In other words, say you just booked a user on a tour. Now they call the office back and want to change some details of that tour. But a different staff member might pick up the phone. So now a 2nd user opens the tour?
You solve that issue by NOT loading all the tours into that form in the first place.
You provide a search screen, so they can search for the user, find the user, maybe type in an invoice number or whatever.
You display the results in a pick list, and then launch the form to the ONE record (and perhaps detail records from child tables).
So there is no concept of a cache in Access any more than there is in C#.
However, what if you load up a datatable in C# and then display that data?
Well, what about the other users on that system? They will not see changes to that data ANY MORE than they would in the current Access form.
So, if you want to update some data in C#, then fine, but you need to do two things:
First, before you call any C# code that may update the current form record, you need to FORCE a save of that current record. This applies BEFORE you call any code, be it VBA or C#, that is going to update the record the user is working on.
Second, after that code has run, refresh the form so that it displays the changes.
You can save the current record in Access in MANY different ways, but the typical approach is:
' single record save - current record
If Me.Dirty Then Me.Dirty = False
' VBA or C# code goes here.
' optional: refresh the current form to reflect changes
Me.Refresh
So, in most cases, it is the "design" of your software that will solve this issue.
For example, in the tour example, or in fact in ANY system, the user can't work, can't update, and can't do their job UNLESS they first search and have a means to bring up that form + record data in the first place.
So, ANY typical good design will:
- Ask the user for that name, invoice number or whatever.
- Display the results of the search, and THEN allow the user to pick the record/data to work on. When they are done, they close that form and are RIGHT BACK at the search form to do battle with the next customer or task or phone call or whatever.
So, a search form might look like this:
In the above, I typed in smi, and a pick list was displayed.
The user can further type in, say, part of the first name, and thus now get this:
So, maybe they type in an invoice number, customer number, booking number or whatever.
You display the results, and then they can select the row or "thing" to work on.
Thus, we click on the row (or the glasses button above), and then jump to the ONE record.
The user does whatever they have to do with the customer. When done, they close the ONE thing, the ONE main record.
This saves the data (so others in the office can now use that booking data), and they are NOW right back at the search screen, ready to do battle with the next customer.
So, not only does this mean we have a VERY bandwidth-friendly design (we only pull the one main record into that form), but it is also better for work flow.
The Access form's cache thus becomes a non-issue, since we are only dealing with the one record.
And as I pointed out, if the system is multi-user, then you are NOT going to be able to update and deal with multiple users' cached data anyway, are you?
Think of ANY system you EVER used from a software point of view.
When you use Google, does it download the WHOLE internet, and then you use ctrl-f to search megs and megs of data in the browser?
Nope!
You search first, get a list of results, and THEN pick one!
And while that list is displayed, maybe others on the internet are updating and adding new data. If all of that were cached in your browser, it would not work!
The same goes for a desktop accounting system. You don't load up all accounts and THEN have the user hit ctrl-f to search all the data. You search for the customer or invoice number and PICK ONE to work on.
It does not make sense to load up a form with 1000 customers and then hit ctrl-f to find that customer. The same goes for an instant banking machine. It does not download ALL customers and THEN let you search. It asks you FIRST what you need. That holds whether it is browser based, desktop based, or JUST ABOUT ANY software you use.
You pretty much eliminate the cache issue, since you are not pre-loading boatloads of data, but asking and letting the user search for the data they need.
So, in regards to the Access form data and cache:
Say you are on a form and call VBA code, or C# code, or whatever.
If that code updates the current form's record, you have NO MORE OR LESS of an issue when calling VBA code than when calling C# code! If that code updates the current record, and the record is dirty (has pending edits), then you get that message about the current form's record having been updated by another user!
So, your cache issue does NOT IN ANY WAY exist MORE or LESS as an issue than in typical Access software.
As a general rule, say you are on a form with pending edits and want to pop up some form to edit related data.
You have to ensure that pending edits are SAVED before you launch a form that can edit the same data, or run code that may edit that data.
As a result, ZERO cache issues should exist, and they exist no more and no less when calling SQL or VBA update code in a form than when calling some C# code from that form.
So, write the pending update for that form.
Then run your VBA, SQL, or C# code.
And then do a Me.Refresh to display any changes made by those external routines.
There is no documentation, or ANY article I can find, that suggests some kind of 5 second cache or update. It is an urban myth. Your software challenge here is the same whether you use C#, VBA, or even SQL Server stored procedures.
They are all the same issue, and I dare say that Access is often used as a front end to SQL Server, and ALL of the SAME issues exist when using SQL Server with MS Access.

Best way to export data from other company's SAP

I need to extract some data from my client's SAP ECC (SUIM -> Users by Complex Selection Criteria, program RSUSR002).
Normally I give them a table of values that they have to enter into certain fields to extract what I need.
They have to make 63 different extractions (with different values of objects, for example, but inside the same transaction, as you can see in the print) from their SAP, and later send me all the extracted files.
Do you know if there is an automated way to extract that, so they don't have to make 63 extractions?
My biggest problem is that they make mistakes every time. There are a lot of things to fill in.
Can I create a variant and send it to them? Is it possible to export my variant so they can import it without having to fill in 63 different sets of data?
Thank you.

When this is a task which takes considerable effort by multiple people each year, then it is something which might be worth automating.
First you need to find out where that transaction gets its data from. If you spend some time analyzing and debugging the program behind the transaction, you will surely find which SELECTs on which database table(s) provide that data. If you are lucky, there might even be a function module for it.
Then you just need to write your own ABAP program which performs the same selections.
Now about the interesting part: How to get that data to you. There are several approaches here. The best one depends on your requirements and your technical infrastructure. Some possibilities are:
- Let users run the program in foreground, use the method cl_gui_frontend_services=>gui_download to save the data to a file on the user's PC, and ask them to send it to you via email.
- Run the program in background and save the file on the application server. Then ask your sysadmins how to get that file from their application server to you. The simplest way would be to just map a network fileserver so they all write to the same place, but there might be some organizational hurdles in the way which prevent that. (Our security people would call me crazy if I proposed to allow access to SMB shares from outside of our network, but your mileage may vary.)
- Have the program send the data to you directly via email. You can send emails from an SAP system using the function module SO_NEW_DOCUMENT_ATT_SEND_API1. This of course requires that the system was configured to be able to send emails (which you can do with transaction code SCOT). Again, security considerations apply. When it's PII or other confidential data, you should not send it in an unencrypted email.
- Use an RFC call to send the data to your own SAP system, which aggregates the data.
- Use a webservice call to send the data to your own non-SAP system, which aggregates the data.

You can create a recording in transaction SM35.
There you enter a tcode (SUIM), start recording, make some input in transaction SUIM and then press 'Execute'. Then you can go back to the recording (F3 multiple times) and the system will generate a table of commands (the structure is BDCDATA). You can delete unnecessary parts (e.g. the BACK button click) and save it to use as a 'macro'. Then you can replay this recording and it will do exactly what you did.
It is also possible to export/import the recording to a text file, so you can explore its structure, write some VBA script to create such a recording from your parameters, and send it to users. But keep in mind that blanks are meaningful.
These are standard tools, so no coding in the system is needed.

You can save the selection as a variant.
Fill in the selection criteria and press Save.
It can be reused.
You can also transport variants if they have a special name.

Automating the process of creating doc word

I have a .doc template I use for building CVs for many friends.
I'm trying to automate this process using a simple library/program that, for example, can accept data like name, email, phone number, and job title, and create the .doc automatically.
What framework can I use to get this done as fast as possible?
Thanks,
Tal

Where exactly are you keeping this template, and are your friends plugging in the data or are you doing it all yourself?
No matter what, you're basically looking to do a data merge. An example of a data merge is a mail merge:
https://support.microsoft.com/en-us/help/294683/how-to-use-mail-merge-to-create-form-letters-in-word
The same thing really applies to what you're trying to accomplish.
You can take a template, specify the fields that require variable data (i.e. the information that changes from one CV to the next), and then just use a spreadsheet to pull the data from and plug it in.
Now the question you'll probably be wondering next is how data merges use spreadsheets. The way data merges work is that each column of data in the spreadsheet corresponds to one of the changing fields in your template, and each row produces one finished document. I strongly recommend you read up on this further. It's not that difficult to do once you get the hang of it.
The last question is probably how you'll compile the data into this spreadsheet. Are your friends going to fill out an online form, perhaps? If so, you'll need to use some PHP, have a database to store the information from the form, and then just go to the table and export the information as a .csv file once you have enough data in your database table to do a data merge.
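To show the idea of a data merge in code, here is a small Python sketch using only the standard library. It merges CSV rows into a plain-text template rather than a real .doc file, and the column names (name, job_title, email, phone) and sample people are made up for the example:

```python
import csv
import io
from string import Template

# The template marks each variable field with a $placeholder
# that matches a column name in the spreadsheet export.
CV_TEMPLATE = Template("$name\n$job_title\nEmail: $email\nPhone: $phone\n")

# Stand-in for the .csv file exported from the spreadsheet or database
CSV_TEXT = (
    "name,job_title,email,phone\n"
    "Tal,Developer,tal@example.com,555-0100\n"
    "Dana,Designer,dana@example.com,555-0101\n"
)


def merge(template, csv_text):
    """Return one filled-in document per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # substitute() raises KeyError if a column the template needs is missing
    return [template.substitute(row) for row in reader]


documents = merge(CV_TEMPLATE, CSV_TEXT)
print(documents[0])
```

Word's mail merge does the same thing with merge fields in the document in place of the $placeholders.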
If you don't have access to MS Office, I'm sure you can accomplish this in OpenOffice.org instead (which is free/open-source).
Hope this helps.
At my job we do data merges all the time: for mail merges, for letters that need to be personally addressed to individual recipients, and for people who need to print dozens of different business cards for different employees. We take their business card template and just do a data merge from a spreadsheet to save time on setting up individual files. P.S. You can also use Adobe InDesign for this, if you know how to use it.

MS ACCESS TransferSpreadsheet VBA to include extra information in import data

I am building an Access 2010 db which will store and query information relating to time spent by users in our team. Part of the reporting needs to include whether timesheets have been submitted on time.
The process is currently being managed in Excel but is becoming cumbersome due to the growing size of the consolidated data. In the current process, the flag on whether someone is late with their timesheet is applied manually.
Instead of manually adding a Yes / No value to the Excel data, I wondered whether it was possible to set up separate TransferSpreadsheet processes in Access to upload the Excel data (and attach them to separate command buttons) such that, depending on which one is executed, the import process adds a Yes or a No value to the last column in the data as it's being uploaded.
That way we can import the Excel data for those who submitted their timesheets on time (and 'stamp' them Yes for being on time), and then any subsequently late submitted timesheet data can be imported later (and 'stamped' with a No).
I have spent several hours looking at online forums and instruction pages but cannot find anything close to what I am trying to achieve, hence the reason for posting this here.
This is just one of the options I am considering but my VBA skills are insufficient to establish whether such a process could be handled in VBA. All help appreciated. Thanks.

Solved this one myself with a bit of perseverance. I ended up running a few DoCmd.RunSQL commands to alter / delete from / insert into the tables I had. I used a 'join' table to load the data from Excel, and then ran a command to append the data from the 'join' table to the main table. I just invoke slightly different commands to update the table field based on whether the data has been submitted late or on time.
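The same load-then-append pattern can be sketched in Python with the built-in sqlite3 module standing in for Access. The table names (staging, timesheets), the columns and append_batch are assumptions for illustration, not the poster's actual tables or commands:

```python
import sqlite3


def append_batch(conn, rows, on_time):
    """Load rows into the staging table, then append them to the main table
    with a constant Yes/No flag for the whole batch."""
    conn.execute("DELETE FROM staging")  # clear the previous import
    conn.executemany(
        "INSERT INTO staging (employee, hours) VALUES (?, ?)", rows
    )
    conn.execute(
        "INSERT INTO timesheets (employee, hours, on_time) "
        "SELECT employee, hours, ? FROM staging",
        ("Yes" if on_time else "No",),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (employee TEXT, hours REAL)")
conn.execute("CREATE TABLE timesheets (employee TEXT, hours REAL, on_time TEXT)")

# The first button would import the on-time batch, the second the late batch
append_batch(conn, [("Ann", 37.5), ("Bob", 40.0)], on_time=True)
append_batch(conn, [("Cid", 35.0)], on_time=False)

result = conn.execute(
    "SELECT employee, on_time FROM timesheets ORDER BY employee"
).fetchall()
print(result)
```

In Access, the staging load would be the TransferSpreadsheet call and the append would be an INSERT INTO ... SELECT run through DoCmd.RunSQL, with the flag value differing per command button.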

Auto-generating reports in Microsoft Access

Currently I am in the process of building an updated Microsoft Access database for our affiliated hospital to keep a timeline of documentation for potential medical studies. The database itself is standalone, so there is no real back end to speak of (which hurts me, but I have no control over how they handle their data). It currently holds only one table with all the necessary fields. I'm looking to have reports generate automatically every week, month, quarter, and year. However, I must admit my VBA coding abilities are spotty at best. My questions are these:
What code would I need to use to generate said reports?
Would I need to create separate queries just for each type of report based on when it needs to be generated, or can I code it all in one query without the different codes interfering with each other?
If any other information or clarification is needed, I'd be happy to give you what I can.

When calling a report from VBA or from a macro, two things could be useful to you:
- providing a condition, like
DoCmd.OpenReport "someReport", acViewPreview, , "[examDate] > Date() - 30"
- providing an OpenArgs value, which can be used by VBA in your report to change the periodicity, for example ("M" for Monthly and "W" for Weekly), like
DoCmd.OpenReport "someReport", acViewPreview, , "[examDate] > Date() - 30", , "W"
Those tricks should allow you to design a report ONCE, then use it with dynamic time range and grouping.
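As a rough sketch of the "design once, vary the range" idea, the filter string can be built from a periodicity code. This is written in Python only for illustration; the codes "Q" and "Y", the day counts (approximations, not calendar-exact periods) and the function name where_condition are assumptions, and the result is simply the kind of string you would pass as the WhereCondition argument:

```python
# Approximate look-back window, in days, for each periodicity code (assumed values)
PERIOD_DAYS = {"W": 7, "M": 30, "Q": 90, "Y": 365}


def where_condition(field, period_code):
    """Build an Access-style filter such as "[examDate] > Date() - 30"."""
    try:
        days = PERIOD_DAYS[period_code.upper()]
    except KeyError:
        raise ValueError(f"Unknown period code: {period_code!r}") from None
    return f"[{field}] > Date() - {days}"


for code in ("W", "M", "Q", "Y"):
    print(code, "->", where_condition("examDate", code))
```

One report plus one small lookup like this covers all four schedules, instead of four near-identical reports and queries.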

If you don't want to use VBA, then to print or export reports automatically you'll need to:
- Create reports which don't require user entry; use date functions to query the required date intervals.
- Create macros for starting the reports and closing the database; no VBA knowledge required.
- Create tasks in Windows Task Scheduler for starting MS Access at the required days/times. The command line should contain parameters for opening the specified database and starting the specified macro (/x macro_name). The parameters are described in Microsoft's documentation of the Access command-line switches.

The comments say it all.
Access is designed to make report writing of the nature you describe very easy, and a lot can be achieved without using VBA at all or macros for that matter.
If you want to use the same report to display the same sets of data for different date ranges (or perhaps with other changes in criteria), you should prompt the user to enter a date range or the other criteria that will cause the data displayed to be changed. (ie the parameters they entered are used in the query to change the data returned).
This is infinitely preferable to having many many reports and queries that are all the same except they get data for a different set of criteria.
There are many Access Programming books that will help you re-learn these skills. I have found this channel useful https://www.youtube.com/user/ProgrammingMadeEZ/search?query=reports for other Access videos, but it's not as useful for reports.
Harvey
