I am trying to copy a dataset from one region to another. I would like to understand whether each data transfer job run copies the entire dataset again or only the delta (the changes made after the previous transfer run). Kindly help me understand.
Our nonprofit uses Google Sheets to transform data. The first file has the raw data, which comes to us in a CSV. Data gets passed from one file to another with =importrange. Intermediate files transform various parts of it with a lot of Google Sheets formulas such as =split, =vlookup, =if, =textjoin, =concatenate, etc. The final file has the data in the form that we can use to create pages on our website.
The first file has about 150 columns. The new 10M cell limit should let us get about 60k rows, but even that number freezes up, and we need to get up into the millions of rows. All of the transformer files, together, add up to about 3k columns.
We assume that the ultimate solution is to re-create it all in a SQL database, but we do not have any expertise of that type, nor the funding to hire someone.
Is there an easy way to transform a Google Sheet (with formulas) into a SQL file?
Is there an easy interim solution, which we can use for a while?
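For a rough sense of what the eventual SQL version could look like, here is a minimal sketch (T-SQL flavoured) of how a couple of the formulas above might translate; the table and column names (raw_data, lookup_table, full_name, code) are made up for illustration and are not from our actual files:

    -- =VLOOKUP roughly becomes a JOIN; =SPLIT and =CONCATENATE become string functions.
    SELECT
        r.id,
        l.category,                                                               -- ~ =VLOOKUP
        LEFT(r.full_name, CHARINDEX(' ', r.full_name + ' ') - 1) AS first_name,   -- ~ =SPLIT
        CONCAT(r.city, ', ', r.state) AS location                                 -- ~ =CONCATENATE
    FROM raw_data AS r
    LEFT JOIN lookup_table AS l
        ON l.code = r.code;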
I need to create a pipeline to read CSVs from a folder and load the data from row 8 onward into an Azure SQL table; the first 5 rows will go into a different table ([tblMetadata]).
So far I have done it using a Lookup activity, and it works fine, but one of the files is bigger than 6 MB and it fails.
I checked all the options in Lookup and read everything about the Copy activity (which I am using to load the main data, skipping 7 rows). The pipeline is created using the GUI.
The output from the Lookup is used as parameters for a stored procedure that inserts into tblMetadata.
Can someone advise me how to deal with this? At the moment I am in training and no one can help me on site.
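For context, here is a minimal sketch of the kind of stored procedure the Lookup output feeds; the procedure name, parameters, and columns are assumptions, not the real definition of tblMetadata:

    -- Hypothetical shape of the metadata insert; the real tblMetadata columns may differ.
    CREATE PROCEDURE dbo.usp_InsertMetadata
        @FileName   NVARCHAR(260),
        @MetaValue1 NVARCHAR(200),
        @MetaValue2 NVARCHAR(200)
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.tblMetadata (FileName, MetaValue1, MetaValue2)
        VALUES (@FileName, @MetaValue1, @MetaValue2);
    END;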
You could probably do this with a single Data Flow activity that has a couple of transformations.
You would use a Source transformation that reads from a folder using folder paths and wildcards, then add a conditional split transformation to send different rows to different sinks.
I worked around it in a different way: I modified the CSVs being imported so that the whole metadata is in the first row (this was part of a different project of mine), and then used First row only in the Lookup.
I wanted to get rid of one data source in Tableau, so instead of using two different data sources for one dashboard, I wanted to copy all the relevant fields from one data source to the other. Is there any way in Tableau to copy-paste those fields from one data source to the other?
In the attached screenshot, I wanted to copy the advisor sales field in the data source biadvisorSalesmonth24 to bitransactionPartnerDay365:
You cannot make schema or structure changes to a table / datasource from within Tableau. If advisor sales is not in the bitransactionPartnerDay365 data source, then you will have to keep both data sources in the workbook and join them together.
Now, if you are familiar with the datasets and know the necessary table layout, you could write a custom SQL command and use that SQL command to retrieve the desired data as a single data source.
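As a rough illustration only, the custom SQL could look something like the sketch below; the join keys (partner_id, a month column) and the column name advisor_sales are pure assumptions, since the real layout of the two tables isn't shown:

    -- Hypothetical join of the tables behind the two data sources;
    -- the key columns are guesses and must match your actual schema.
    SELECT
        t.*,
        a.advisor_sales
    FROM bitransactionPartnerDay365 AS t
    LEFT JOIN biadvisorSalesmonth24 AS a
        ON  a.partner_id  = t.partner_id
        AND a.sales_month = t.transaction_month;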
I've created a piece of code that allows me to populate a data table after looping through some information. I would then like to compare it with the information that was gathered the last time the tool was executed and copy the information into a new data table. Finally, the code will take a copy of the new information gathered so that it can be checked against next time. The system should basically work like this:
1) Get new information
2) Compare against last time's information
3) Copy the information from step 1 ready for the next time step 2 is done
I've done some reading up and a lot of inner joins are being thrown about, but my understanding is that an inner join returns what is the same in both, not what is different.
How would I go about attempting this?
Update
I forgot to mention that I've already achieved steps 1 and 3: I can store the data and copy it for the next run, but I can't do step 2, comparing the data.
How about using a SQL "Describe table" query to get the structure?
If you are talking about differences in the records contained, then you will need to keep an "old data" table and your current data table and do a right or left join to find data that is in one but not the other.
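For example, a minimal sketch of that join pattern, with made-up table and key names (OldData, NewData, Id), just to show how rows present in one table but missing from the other fall out of a left join:

    -- Rows that are in NewData but not in OldData (newly gathered this run).
    SELECT n.*
    FROM NewData AS n
    LEFT JOIN OldData AS o ON o.Id = n.Id
    WHERE o.Id IS NULL;

    -- Rows that were in OldData but are missing from NewData.
    SELECT o.*
    FROM OldData AS o
    LEFT JOIN NewData AS n ON n.Id = o.Id
    WHERE n.Id IS NULL;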
I'm having trouble solving this little problem; hopefully someone can help me.
In my SSIS package I have a data flow task.
There are several different transforms, merges and conversions that happen in here.
At the end of the data flow task there are two datasets: one contains two numbers that need to be compared, and the other contains a bunch of records.
Ideally, I would like to have these passed onto a whole new data flow task (separate sequence container) where I can do some validation work on it and separate the logic.
I can't for the life of me figure out how to do it. I've tried looking into scripting and storing the datasets as variables, but I'm not sure this is the right way to do it.
The next step is to export the large dataset as a spreadsheet, but before this happens I need to compare the two numbers from the other dataset and ensure they're correct.
To pass data flowing in one data flow to another, you have to have a temporary location.
This means that you have to put the data into a destination in one data flow and then read that data in another data flow.
You can put the data into a number of destinations:
database table
raw file
flat file
dataset variable (recordset destination)
any other destination component that you can read from with a corresponding source component, or by writing a script
Raw files are meant to be used for cases like this. They are binary and as such they are extremely fast to write to and read from.
In case you insist on using the Recordset destination, take a look at http://consultingblogs.emc.com/jamiethomson/archive/2006/01/04/SSIS_3A00_-Recordsets-instead-of-raw-files.aspx because there is no Recordset source component.
A Data Flow Task needs to have a destination; a Data Flow Task itself is NOT a destination, and without one the data doesn't go anywhere in the pipeline. From my experience, your best bets are to:
1) Pump the data into staging tables in SQL Server, and then pick up the validations from there (a sketch of that check follows below).
2) Do the validations in the same Data Flow task.
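For option 1, the check over the staging tables could be as simple as the sketch below; the table and column names (stg_ControlNumbers, Number1, Number2) are assumptions, not part of the original package:

    -- Compare the two staged control numbers before exporting the spreadsheet.
    DECLARE @number1 INT, @number2 INT;

    SELECT @number1 = Number1, @number2 = Number2
    FROM dbo.stg_ControlNumbers;     -- the small dataset holding the two numbers

    IF @number1 <> @number2
        THROW 50000, 'Control numbers do not match; aborting the export.', 1;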