I'm wondering what the community suggests for extracting data from an OData API to SQL 2008 R2. I need to create a nightly job that imports the data to SQL. Should I create a simple console app that iterates through the OData API and imports to SQL? Or can I create some type of SQL Server BI app? Or is there a better way to do this?
This is going to be sooo slow. OData is not an API for bulk operations. It is designed for clients to access individual entities and navigate the relations between them, or at most to paginate through some filtered lists.
Extracting an entire dump via OData is not going to make anybody happy. The OData API owner will have to investigate who is doing all these nightly crawls over his API, discover it is you, and likely cut you off. You, on the other hand, will discover that OData is not an efficient bulk transport format, and that marshaling HTTP-encoded entities back and forth is not exactly the best way to spend your bandwidth. And crawling the entire database every time, as opposed to just pulling the deltas since the last crawl, is only going to work until the database reaches that critical size S at which the update takes longer than the interval at which you're polling!
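If you do control both ends and still have to use OData, at least do an incremental pull instead of a full crawl. A minimal sketch in Python, assuming the entities expose a LastModified property and the service speaks OData v2 JSON (the URL and property names are hypothetical):

    import requests  # pip install requests

    # Hypothetical feed URL -- adjust to your service.
    FEED_URL = "https://example.com/service.svc/Orders"

    def fetch_changed_since(last_run_iso):
        """Pull only the entities modified since the previous crawl,
        following the server's paging links until they run out."""
        url = FEED_URL
        params = {
            # OData v2 filter syntax; assumes a LastModified property exists
            "$filter": "LastModified gt datetime'%s'" % last_run_iso,
            "$format": "json",
        }
        while url:
            resp = requests.get(url, params=params, timeout=30)
            resp.raise_for_status()
            payload = resp.json()["d"]      # OData v2 JSON wrapper
            for entity in payload["results"]:
                yield entity
            url = payload.get("__next")     # server paging link, if any
            params = None                   # __next already embeds the query

Store the timestamp of each successful run and pass it in the next night; the job then scales with the volume of changes rather than with the size of the database.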
Besides, if it is not your data, it is extremely likely that the terms of use for the OData API explicitly prohibit such bulk crawls.
Get a dump of the data, archive it, and copy it using FTP.
I'm using a tool called Teamwork to manage my team's projects.
They have an online API that consists of JSON files that are accessible with authorisation:
https://developer.teamwork.com/projects/introduction/welcome-to-the-teamwork-projects-api
I would like to be able to convert this online data to a SQL DB so I can create custom reports for my management.
I can't seem to find anything ready to do that.
I need a strategy to do this.
If you know how to program, this should be pretty straightforward.
In Python, for example, you could:
Come up with a SQL schema that maps to the JSON data objects you want to store. Create it in a database of your choice.
Use the Requests library to download the JSON resources, if you don't already have them on your system.
Convert each JSON resource to a Python data structure using json.loads.
Connect to your database server using the appropriate Python library for your database. e.g., PyMySQL.
Iterate over the Python data, inserting rows into the database as appropriate. This is essentially the JSON-to-tables mapping from step 1 made procedural.
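To make those steps concrete, here is a minimal sketch. The endpoint and field names are only illustrative (check Teamwork's documentation for the exact resources you need), and the database columns are made up:

    import json
    import pymysql    # pip install pymysql
    import requests   # pip install requests

    API_KEY = "your_teamwork_api_key"       # Teamwork takes the key as the basic-auth user
    SITE = "https://yoursite.teamwork.com"  # your own Teamwork site

    # Step 2: download a JSON resource over HTTP (tasks, in this example).
    resp = requests.get(SITE + "/tasks.json", auth=(API_KEY, "x"), timeout=30)
    resp.raise_for_status()

    # Step 3: parse the body into Python data structures.
    tasks = json.loads(resp.text).get("todo-items", [])

    # Step 4: connect to the database you created in step 1.
    conn = pymysql.connect(host="localhost", user="report",
                           password="secret", database="teamwork_reports")
    cur = conn.cursor()

    # Step 5: map each JSON object onto a row (columns are illustrative).
    for t in tasks:
        cur.execute(
            "INSERT INTO tasks (id, name, status) VALUES (%s, %s, %s)",
            (t["id"], t.get("content"), t.get("status")),
        )
    conn.commit()
    conn.close()

Schedule that script nightly and you have the basic pipeline; add upserts and error handling once the happy path works.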
If you are not looking to do this in code, you should be able to use an open-source ETL tool to do this transformation. At LinkedIn, a coworker of mine used to use Talend Data Integration for solid ETL work of a very similar nature (JSON to SQL). He was very fond of it and I respected his opinion, so I figured I should mention it, although I have zero experience with it myself.
I'm looking for some resources to get me started on how to design and implement an API for SQL.
Is this done by writing a series of functions and/or stored procedures to process your transactions on the SQL server (T-SQL)?
I have read a bit about Transaction APIs vs. Table APIs. While you don't have to choose one or the other, I would prefer to avoid the Table APIs and focus more on Transaction APIs to keep performance high and avoid using cursors.
Also, from what I understand, RESTful APIs just make the requests through HTTP (using JSON) rather than connecting to the DB directly.
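To show what I mean, here is roughly how I picture the two styles in Python; all the names here are made up:

    import pyodbc    # direct connection: call a stored procedure (Transaction API)
    import requests  # RESTful style: the same operation over HTTP/JSON

    # Direct: the client holds a DB connection and invokes T-SQL itself.
    conn = pyodbc.connect("DRIVER={SQL Server};SERVER=myserver;"
                          "DATABASE=shop;Trusted_Connection=yes")
    cur = conn.cursor()
    cur.execute("EXEC dbo.usp_PlaceOrder ?, ?", (42, 3))  # made-up procedure
    conn.commit()

    # RESTful: the client never touches the DB; a web service wraps the same call.
    resp = requests.post("https://api.example.com/orders",
                         json={"customer_id": 42, "quantity": 3})
    resp.raise_for_status()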
If my understanding is completely wrong on this subject please correct me as I am trying to learn.
Thanks
My question is not about a specific piece of code. I am trying to automate a business data governance data flow using a SQL backend. I have put a lot of time into searching the internet and reaching out to people for the right direction, but unfortunately I have not yet found anything promising, so I have a lot of hope that I will find some people here to save me from a big headache.
Assume that we have a flow (a semi-static/dynamic flow) for our business process. We have different departments owning portions of the data. We need to take different actions during the flow, such as data entry, data validation, data export, approvals, rejections, notes, etc., and also automatically define deadlines, create reports of overdue tasks and the people accountable for them, and so on.
I guess the data management part would not be extremely difficult, but how to write an application (code) to run the flow (a workflow engine) is where I struggle. Should I use triggers, or should I write code that frequently runs queries to push completed steps to the next step? How can I use SQL tables to keep track of the flow?
If someone could give me some hints on this matter, it would be greatly appreciated.
I would suggest using SQL Server Integration Services (SSIS). You can easily manage the scripts and workflow based on some lookup selections, and you can also schedule the SSIS package to trigger on a timely basis and do the job.
It's a hard task to implement an application server on SQL Server. It would also be a very vendor-dependent solution. The best way, I think, is to use SQL Server as the data storage and some application server for the business logic over that storage.
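To illustrate the polling alternative the asker mentions: a minimal sketch, assuming a hypothetical workflow_steps table with (task_id, step_no, status) columns and pyodbc for connectivity:

    import time
    import pyodbc  # pip install pyodbc

    CONN_STR = ("DRIVER={SQL Server};SERVER=myserver;"
                "DATABASE=governance;Trusted_Connection=yes")

    def advance_completed_steps():
        """Activate the successor of every step marked complete.
        Table and column names are hypothetical."""
        conn = pyodbc.connect(CONN_STR)
        cur = conn.cursor()
        cur.execute("""
            UPDATE nxt
               SET nxt.status = 'active'
              FROM workflow_steps AS nxt
              JOIN workflow_steps AS prev
                ON prev.task_id = nxt.task_id
               AND prev.step_no = nxt.step_no - 1
             WHERE prev.status = 'complete'
               AND nxt.status  = 'pending'
        """)
        conn.commit()
        conn.close()

    while True:                  # the "frequently run queries" approach
        advance_completed_steps()
        time.sleep(60)           # poll once a minute

Triggers would fire the same UPDATE synchronously inside the transaction that marks a step complete; polling keeps the engine out of your write path at the cost of up to one interval of latency.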
I'm currently developing a service for an app with WCF. I want to host this data on Windows Azure, and it should host data from different users. I'm searching for the right design for my database. In my opinion, there are only two different possibilities:
Create a new database for every customer
Store a customer ID in every table (or in the main table, when every table is connected via entities)
The first approach gives very good speed and isolation, but it's very expensive on Windows Azure (or am I misunderstanding something about the Azure pricing?). Also, I don't know how to configure a WCF service so that it always uses a different database.
The second approach is slower and the isolation is poor, but it's easy to implement and cheaper.
Now to my question:
Is there any other way to get high isolation of data and also easy integration into a WCF service using Azure?
What design should I use and why?
You have two additional options: build multiple schema containers within a database (see my blog post about this technique), or even better use SQL Database Federations (you can use my open-source project called Enzo SQL Shard to access federations). The links I am providing give you access to other options as well.
In the end it's a rather complex decision that involves a tradeoff between performance, security, and manageability. I usually recommend Federations, even though it has its own set of limitations, because it is a flexible multitenant option for the cloud with the option to filter data automatically. Check out the open-source project; you will see how to implement good separation of customer data independently of the physical storage.
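For reference, whichever container option you pick, the shared-schema variant (option 2 in the question) boils down to parameterising every query on the customer ID. A minimal sketch with hypothetical table and connection details, not the Federations API itself:

    import pyodbc  # pip install pyodbc

    CONN_STR = ("DRIVER={SQL Server};SERVER=myserver.database.windows.net;"
                "DATABASE=appdb;UID=user;PWD=secret")

    def get_orders_for_customer(customer_id):
        """Shared-schema multitenancy: every table carries a CustomerId
        column and every query filters on it, so tenants never see each
        other's rows."""
        conn = pyodbc.connect(CONN_STR)
        cur = conn.cursor()
        cur.execute("SELECT OrderId, Total FROM Orders WHERE CustomerId = ?",
                    customer_id)
        rows = cur.fetchall()
        conn.close()
        return rows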
I have a program that uploads about 1 GB of data to a SQL Azure database.
I use SqlBulkCopy to upload this data. I upload about 8,000,000 entities, on average 32,000 entities at a time, with a maximum of about 1,200,000 at one time.
I am receiving a lot of SqlExceptions, with error code 4815.
At first I thought this might be due to me uploading too many at a time and Azure throttling my connection or employing DDoS defense, but I allowed my program to submit only 25,000 entities with each SqlBulkCopy, and I got even more errors! A lot more!
I have had good results using BCP to move large amounts of data into SQL Azure. The SQL Azure migration wizard uses this approach behind the scenes. This blog post is a bit dated, but the concepts are sound when it comes to importing a lot of data:
Brute Force Migration of Existing SQL Server Databases to SQL Azure
The question did not specify the source of the data, so obviously this will not work for you if you are not importing from another database.
In my case, I got a 4815 when the data I was sending in one of the fields was larger than the field size in the table definition... sending 13 characters into a VARCHAR(11).
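A cheap way to catch this before the upload is to validate string lengths against the table definition. A minimal sketch, with the column limits hard-coded for illustration:

    # Maximum lengths copied from the table definition -- hypothetical columns.
    MAX_LEN = {"code": 11, "name": 50, "city": 30}

    def find_oversized(rows):
        """Yield (row_index, column, value) for every value that would
        overflow its VARCHAR column -- the condition behind error 4815."""
        for i, row in enumerate(rows):
            for col, limit in MAX_LEN.items():
                value = row.get(col)
                if value is not None and len(value) > limit:
                    yield i, col, value

    rows = [{"code": "ABCDEFGHIJKLM", "name": "ok", "city": "ok"}]  # 13 chars
    for idx, col, val in find_oversized(rows):
        print("row %d, column %s: %r is %d chars" % (idx, col, val, len(val)))

Running that over the batch before handing it to the bulk loader points you straight at the offending rows instead of leaving you to decode the exception.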