We need to migrate the data from the old GCP instance to a new instance (under a new organization node). I am using the "share dataset" option to move the data, which is a very convenient approach. Do you think this is a good way to migrate the data, or should we create new tables and then load the data into them?
Thanks in advance!
It depends on what you want to achieve. The shared dataset feature allows others to access the data because you have granted them permission.
However, the data doesn't move and still belongs to the old GCP project. If you delete that project, you delete the data. In addition, it is still the old project that pays for the data storage; the new one only pays for the data processing.
If you plan to shut down the old project, you have to copy the data: either automatically with the BigQuery Data Transfer Service, or by querying it if you want to filter/transform the existing data before storing it in the new project.
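If you do copy, here is a minimal sketch of copying every table of a dataset across projects with the BigQuery Python client. The project and dataset IDs are placeholders, and both datasets are assumed to be in the same location; the Data Transfer Service dataset-copy feature in the console achieves the same thing without code.

```python
# Sketch: copy all tables from a dataset in the old project to the new project.
# Project and dataset IDs are placeholders; adjust to your environment.
from google.cloud import bigquery

OLD_PROJECT = "old-project-id"   # assumption: source project
NEW_PROJECT = "new-project-id"   # assumption: destination project
DATASET = "analytics"            # assumption: dataset to copy

client = bigquery.Client(project=NEW_PROJECT)

# Create the destination dataset if it does not exist yet.
client.create_dataset(f"{NEW_PROJECT}.{DATASET}", exists_ok=True)

# Copy every table from the old project's dataset into the new one.
for table in client.list_tables(f"{OLD_PROJECT}.{DATASET}"):
    source = f"{OLD_PROJECT}.{DATASET}.{table.table_id}"
    destination = f"{NEW_PROJECT}.{DATASET}.{table.table_id}"
    job = client.copy_table(source, destination)
    job.result()  # wait for the copy job to finish
    print(f"Copied {source} -> {destination}")
```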
Related
I'm using a web API to import data into Power BI. After every refresh, the old data is replaced by the new data from the web API, so my question is: how can I keep that old data in Power BI?
Power BI will not keep the old data unless you have a query source that supports incremental refresh.
https://learn.microsoft.com/en-us/power-bi/admin/service-premium-incremental-refresh
It would be best to use a tool like Azure Functions, Azure Logic Apps or Power Automate to get the data and save it as a file to a folder, then import the data from the folder. Another option would be to move the data into a database table to preserve the history.
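For illustration, a minimal sketch of the file-per-refresh idea, e.g. run on a schedule by one of those tools; the API URL and output folder are hypothetical.

```python
# Sketch: pull the web API and write a timestamped snapshot file so old data
# is kept rather than overwritten. URL and folder are placeholders.
import json
import pathlib
from datetime import datetime, timezone

import requests

API_URL = "https://example.com/api/data"   # assumption: your web API endpoint
OUTPUT_DIR = pathlib.Path("./snapshots")   # assumption: folder Power BI imports from

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

response = requests.get(API_URL, timeout=60)
response.raise_for_status()

# Name each snapshot by timestamp so every refresh is preserved.
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
outfile = OUTPUT_DIR / f"snapshot_{stamp}.json"
outfile.write_text(json.dumps(response.json(), indent=2))
print(f"Saved {outfile}")
```

Power BI can then use the Folder connector to combine all snapshot files into one table.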
Alright, so I am working on a project at work and I need to append data to a new history table every time the data in our other table is updated or deleted. However, we get access to our SQL tables from another company, they only gave us read-only privileges, and we can only view them through Microsoft Power BI and Excel.
So I wanted to see if there was any way of creating some sort of trigger.
Thank You
From your question, you are trying to do an incremental load of data so you can append new data to a table. You are also looking for some sort of archive process into a history table, via a trigger. Incremental refresh is a Power BI Premium-only feature, and moving data based on a trigger is not supported in Power BI at all.
I would recommend trying to get better access to the SQL database, or using Excel to get the data and dump it into Excel/CSV files, then creating a process that loads the new file(s), works out the changes using some other database/ETL process, and outputs the results to a file/table that Power BI can read from, as sketched below.
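For example, a rough sketch of that change-detection step, assuming full CSV extracts and comparing each new extract against the previous one; the file names are placeholders.

```python
# Sketch: compare today's extract with the previous one and append the
# new/changed rows to a history file that Power BI reads. File names are
# placeholders; deleted rows are not detected by this simple comparison.
import os

import pandas as pd

new = pd.read_csv("extract_new.csv")           # today's read-only export
try:
    old = pd.read_csv("extract_previous.csv")  # the previous export
except FileNotFoundError:
    old = new.iloc[0:0]                        # first run: nothing to compare against

# Rows whose content is not present in the previous extract are new or changed.
merged = new.merge(old, how="left", indicator=True)
changed = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
changed["captured_at"] = pd.Timestamp.now(tz="UTC")

# Append the changes to the history file.
changed.to_csv(
    "history.csv",
    mode="a",
    header=not os.path.exists("history.csv"),
    index=False,
)

# Keep today's extract as the baseline for the next run.
new.to_csv("extract_previous.csv", index=False)
```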
Hope that helps
I need to create a database solely for analytical purposes. The idea is for it to start off as a 1:1 replica of a current SQL Server database, to which we will then add additional tables. The goal is to have read-write access to a database without inadvertently dropping anything in production.
We would ideally like to set a daily refresh schedule to update all tables in the new db to match the tables in the live environment.
In terms of the DBMS for the new database, I am flexible: MySQL, SQL Server or PostgreSQL would be great. I am not hugely familiar with the Google Cloud Storage/BigQuery stack, but if that is an easy option, I'm open to it.
You could use a standard HA/DR solution with a readable secondary (Availability Groups/mirroring/log shipping), then have a second database on the new server for your additional tables.
Cloud Storage and BigQuery are not RDBMS services themselves, but they could be used in this case to store the backups/exports/dumps from the replica, with the analytical work then performed on that data.
Here is an example workflow:
1. Perform a backup and restore into a different database
2. Add the new tables in the new database
3. Export the tables as CSV files on your local machine
4. Either load the CSV files directly into BigQuery, or upload them to a previously created Cloud Storage bucket
5. Query the data
I suggest taking a look at the various methods for loading data into BigQuery, as well as the methods for querying external data sources, which may help you determine which database replication/export method is best for your use case.
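As an illustration of the load step, a minimal sketch of loading an exported CSV from a Cloud Storage bucket into BigQuery with the Python client; the bucket, project, dataset and table names are placeholders.

```python
# Sketch: load an exported CSV from Cloud Storage into BigQuery with schema
# auto-detection. Bucket, project, dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="analytics-project")  # assumption: target project

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(
    "gs://my-export-bucket/replica/orders.csv",     # assumption: exported CSV
    "analytics-project.analytics_replica.orders",   # assumption: target table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
print(f"Loaded {load_job.output_rows} rows")
```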
This is the current situation:
I've created an external table in BigQuery against JSON files in Cloud Storage.
I'm testing how it works with regard to schema auto-detection.
When I created the table, there were 2 JSON files with different schemas, and BigQuery handled them well.
When I load a new file with a new schema (adding a new attribute to a record field), BigQuery recognizes the new record, but the new field doesn't appear. So schema auto-detection doesn't work as I expected.
How can I get schema auto-detection when new files arrive in my Cloud Storage folder?
Any help?
Culprit: AFAIK, schema auto-detection happens when you create the table and is not updated as you add new files.
Possible solution:
Re-create the tables when new files arrive.
Straightforward implementation:
Add a Pub/Sub notification on GCS for newly arriving files, and have a Google Cloud Function, triggered by it, re-create the table.
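A hedged sketch of such a function, assuming the Python runtime with a Cloud Storage trigger and newline-delimited JSON source files; the bucket, project, dataset and table names are placeholders.

```python
# Sketch: Cloud Function triggered when a new file lands in the bucket.
# It drops and re-creates the external table so schema auto-detection runs
# over all files currently in the folder. Names are placeholders.
from google.cloud import bigquery


def recreate_external_table(event, context):
    """Triggered by a Cloud Storage object-finalize event."""
    client = bigquery.Client()

    external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
    external_config.source_uris = ["gs://my-landing-bucket/events/*.json"]  # assumption
    external_config.autodetect = True

    table = bigquery.Table("my-project.my_dataset.events_external")  # assumption
    table.external_data_configuration = external_config

    # Drop and re-create so the schema reflects every file, including the new one.
    client.delete_table(table, not_found_ok=True)
    client.create_table(table)
    print(f"Re-created external table after {event.get('name')}")
```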
I'm working on a BI project and I want to retrieve data from ServiceNow and load it with Pentaho Data Integration so I can record it in my data warehouse. I want to do this regularly; in other words, I want to retrieve only the new records from ServiceNow, the ones that haven't been loaded into the data warehouse yet. Does anyone know how I can achieve my goal? Help me please.
The question is too vague.
You need to set up an ETL job that incrementally loads data. That will require you to define a timestamp or incremental key to identify which records are more recent than the ones already loaded.
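To make that concrete, here is a minimal sketch of such an incremental pull against the ServiceNow Table API, with the watermark kept in a local file. The instance URL, table name and credentials are placeholders, and in a real project this logic would typically live in a PDI transformation rather than a standalone script.

```python
# Sketch: fetch only ServiceNow records updated since the last run, using
# sys_updated_on as the incremental key. Instance, table and credentials are
# placeholders.
import pathlib

import requests

INSTANCE = "https://yourinstance.service-now.com"   # assumption
TABLE = "incident"                                   # assumption
WATERMARK_FILE = pathlib.Path("last_loaded.txt")

last_loaded = (
    WATERMARK_FILE.read_text().strip()
    if WATERMARK_FILE.exists()
    else "1970-01-01 00:00:00"
)

resp = requests.get(
    f"{INSTANCE}/api/now/table/{TABLE}",
    params={
        "sysparm_query": f"sys_updated_on>{last_loaded}^ORDERBYsys_updated_on",
        "sysparm_limit": "1000",
    },
    auth=("etl_user", "password"),   # assumption: basic-auth service account
    headers={"Accept": "application/json"},
    timeout=60,
)
resp.raise_for_status()
records = resp.json().get("result", [])

# ... load `records` into the data warehouse here ...

if records:
    # Remember the newest timestamp seen so the next run only fetches newer rows.
    WATERMARK_FILE.write_text(records[-1]["sys_updated_on"])
print(f"Fetched {len(records)} new/updated records")
```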
You will need to schedule that job, e.g., using crontab and calling kitchen from the command line.
Your question pretty much translates to "please develop my ETL project". Too wide in scope.