The SchemaRefreshTime in icCube shows the date/time a schema was last refreshed, including a load from an offline cube.
This can lead to the strange situation where you "restore" an old state (using an offline cube), but end users see the date the cube was loaded as the last refresh time. Not desirable. I would like it to show that the data is in fact old (namely the date/time of the data in the offline cube).
Is it possible to show the time the data was actually refreshed from a "real" source, and, when loading data from an offline cube, to show the date the contents of the offline cube were created?
Unfortunately, this information is not available in the snapshot. You can follow this issue (www).
Related
I am trying to create a dashboard app for my company that displays data from a few different sources that they use. I am starting with an in-house system that stores data in MSSQL. I'm struggling to decide how I can display real-time (or at least regularly updated) data based on this database.
I was thinking of writing a Node server to poll the company database, check for updates, and store a copy of the relevant tables in my own database; then creating another Node server that computes metrics (average delivery time, turnover, etc.) from my database, and then a frontend (probably React) to display these metrics nicely and trigger the logic in the backend whenever the page is loaded by a user.
This is my first project, so I just need some guidance on whether this is the right way to go about it or whether I'm overcomplicating it.
Thanks
One solution is to implement a cron job in Node.js (or on your front-end side); that way you can periodically retrieve new data inserted into your database.
You can refer to this link for more information about the cron job:
https://www.npmjs.com/package/cron
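As a rough illustration of that polling approach, here is a minimal TypeScript sketch using the cron package together with the mssql driver; the table name, the UpdatedAt column, the schedule, and the connection details are all assumptions about the in-house system, not something taken from the question:

```typescript
// Rough sketch of polling an MSSQL database on a schedule (npm install cron mssql).
// Table/column names and connection settings below are assumptions.
import { CronJob } from 'cron';
import sql from 'mssql';

const config = {
  server: 'company-db-host',            // assumed hostname
  database: 'CompanyDb',                // assumed database name
  user: 'dashboard_reader',
  password: process.env.DB_PASSWORD ?? '',
  options: { trustServerCertificate: true },
};

async function main(): Promise<void> {
  const pool = await sql.connect(config);
  let lastPolledAt = new Date(0);       // first run pulls everything

  async function pollForUpdates(): Promise<void> {
    // Assumes the source table has an UpdatedAt column to detect changes.
    const result = await pool
      .request()
      .input('since', sql.DateTime2, lastPolledAt)
      .query('SELECT Id, OrderedAt, DeliveredAt FROM dbo.Deliveries WHERE UpdatedAt > @since');

    lastPolledAt = new Date();
    // Copy result.recordset into your own database / cache here, then
    // recompute metrics such as average delivery time and turnover.
    console.log(`Fetched ${result.recordset.length} changed rows`);
  }

  // Poll every 5 minutes.
  new CronJob('*/5 * * * *', () => {
    pollForUpdates().catch((err) => console.error('Poll failed', err));
  }).start();
}

main().catch(console.error);
```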
If you are using MySQL, you can use the mysql-events listener; it watches a MySQL database and runs callbacks on matched events.
https://www.npmjs.com/package/mysql-events
Basically, does anyone know how to ask for delta changes that happened after a certain time? I am saving all the changes a user has made to Planner objects to the database, but I know the delta changes for hundreds of plans will eventually become insanely large. GET /me/planner/all/delta, GET /users/{id}/planner/all/delta. Does anyone know how to filter the delta response by a given time? My plan is to query the delta after a certain time.
It could be for any object that delta works with. Right now I can bring back all the delta changes, but I do not see how I can ask for changes that happened after a certain time.
Delta only works with the tokens presented in the links; it is not time-based (we do not store it based on time internally). It is also best-effort, which means that at some point the delta changes will be cleared and clients will be forced to read the objects again to get back in sync. So even if there were a time-based query, there wouldn't be a guarantee that you could access older data.
What is your scenario? Some kind of history tracking or auditing?
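For what it's worth, here is a minimal TypeScript sketch of that token-based flow: walk the @odata.nextLink pages, persist the final @odata.deltaLink, and start from that link on the next run to get only subsequent changes. How the access token is obtained and where the deltaLink is stored are left out and would be your own code:

```typescript
// Sketch of the Planner delta flow (token-based, not time-based).
// Assumes you already have a valid Graph access token.
const GRAPH = 'https://graph.microsoft.com/v1.0';

async function runDelta(accessToken: string, startUrl: string): Promise<string> {
  let url = startUrl;
  let deltaLink = '';

  while (url) {
    const res = await fetch(url, { headers: { Authorization: `Bearer ${accessToken}` } });
    if (!res.ok) throw new Error(`Graph call failed: ${res.status}`);
    const page = await res.json();

    for (const change of page.value ?? []) {
      // Store the changed Planner object (or the change itself) in your own database here.
      console.log(change['@odata.type'], change.id);
    }

    if (page['@odata.nextLink']) {
      url = page['@odata.nextLink'];        // more pages in this round
    } else {
      deltaLink = page['@odata.deltaLink']; // save this; it yields only future changes
      url = '';
    }
  }
  return deltaLink;
}

// First run: start from the delta endpoint itself.
// const deltaLink = await runDelta(token, `${GRAPH}/me/planner/all/delta`);
// Later runs: start from the saved deltaLink to get only changes since the last run.
// await runDelta(token, deltaLink);
```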
As far as I know, nope. I have to cycle through all Planner plans and the tasks in them to get the details. I am currently saving the Planner task details to SharePoint, and instead of updating them I am just deleting all the old records and recreating them.
That makes sense. I was saving the deltas so that in the future I could tell which user modified which Planner objects, since Microsoft has not implemented an audit trail for Planner objects yet. Storing the deltaLink was just for possible future rollback processes.
I realized the deltaLink does not expire; it just uses the delta token to find future changes from the time the delta was queried. Basically, I am requesting that Microsoft Teams provide some kind of audit trail for changes to Planner objects (at least who changed what, and when) so we can query those activities and hold the specific individuals responsible for unwanted changes they made, for instance changing the due date of Planner tasks.
I want to know how to save BigQuery data capacity by changing settings in Data Portal (the Google BI tool, previously named Data Studio).
The reason is that I can't keep executing SQL, or cover the high cost, if I don't reduce my BigQuery data usage.
I want a way that does not involve changing BigQuery settings (including changing the SQL code), but only Data Portal settings.
Because the dashboard in Data Portal continues to consume BigQuery data capacity, I can't solve my problem even if I change the SQL code.
My situation is below:
1. I made a "view" in my BigQuery environment. I tried to make the query not use a lot of BigQuery data capacity; for example, I didn't use "SELECT * FROM ...".
2. I set the view as the data source in Data Portal, and I made the dashboard using that data source.
3. When someone opens the dashboard, the view I made is executed, so BigQuery data capacity is consumed every time the dashboard is opened.
If I'm understanding correctly, you want to reduce the amount of data processed in BigQuery by your Data Studio (or, in Japan, Data Portal) reports.
There are a few ways to do this:
Make sure that the "Enable Cache" option is checked in the report settings.
Avoid using BigQuery views as a query source, as these aren't cached at the BigQuery level (the view query is run every time, and likely many times per report for the various charts). Instead, use a Custom Query connection or pull the table data directly to allow caching. Another option (which we use heavily) is to run a scheduled query that saves the output of a view as a table and replaces it regularly (or is triggered when the underlying data is refreshed); see the sketch after this list. This way your queries can be cached, but the business logic can still exist within the view.
Create a BI Engine reservation in BigQuery. This adds another level of caching to Data Studio reports, and may give you better results for things that can't be query-cached or cached in Data Studio. (While there will be a cost to the service in the future based on the size of instance you reserve, it's free during their beta period.)
Don't base your queries on tables with a streaming buffer attached (even if it hasn't received rows recently), don't use wildcard tables in the query, and don't query external data sources (e.g. a file in Cloud Storage or Bigtable). See Caching Exceptions for details.
Pull as little data as possible by using the new Data Source Parameters. This means you can pass the values of your date range or other filters directly to BigQuery and filter the data before it reaches your report. This is especially helpful if you have a date-partitioned table, as you can scan only the needed partitions (which greatly reduces processing and the amount of data returned).
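To make the "scheduled query that materializes a view" idea from the second point above concrete, here is a minimal TypeScript sketch using the @google-cloud/bigquery client; the project, dataset, view, and table names are placeholders, and in practice you would more likely configure this as a BigQuery scheduled query rather than run it from your own service:

```typescript
// Sketch: periodically replace a table with the output of a view so that
// Data Studio / Data Portal reads the (cacheable) table instead of the view.
// Project, dataset, view, and table names below are placeholders.
import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery();

async function materializeView(): Promise<void> {
  const ddl = `
    CREATE OR REPLACE TABLE \`my_project.reporting.sales_snapshot\` AS
    SELECT *
    FROM \`my_project.reporting.sales_view\`  -- business logic stays in the view
  `;
  await bigquery.query({ query: ddl });
  console.log('sales_snapshot refreshed');
}

materializeView().catch(console.error);
```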
Also, sometimes it seems like you're moving a lot of data, but that doesn't always translate into a high cost. Check your cost breakdowns, or look at the logging filtered to the user your data source authenticates as, and see how much cost that has actually incurred. Certain operations fall under a free tier, and others don't result in cost for non-egress use cases like Data Studio. All of that is to say you may want to make sure there's a cost problem at the BigQuery level in the first place before killing yourself trying to optimize the usage.
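As a quick way to check whether there is actually a data-volume problem before optimizing, a dry run reports how many bytes a query would scan without executing it. A minimal sketch with the Node.js client, where the query text is just a placeholder:

```typescript
// Sketch: estimate bytes scanned by a query without executing it (dry run).
import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery();

async function estimateBytes(query: string): Promise<number> {
  const [job] = await bigquery.createQueryJob({ query, dryRun: true });
  const bytes = Number(job.metadata.statistics.totalBytesProcessed);
  console.log(`Would process ~${(bytes / 1024 ** 3).toFixed(2)} GiB`);
  return bytes;
}

// Placeholder query; substitute whatever your Data Studio data source runs.
estimateBytes('SELECT order_date, region, amount FROM `my_project.reporting.sales_view`')
  .catch(console.error);
```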
I've been looking around for quite a while to see if someone could provide me with any directions and/or tests to fix this issue. Unsuccessful so far.
I'm working on a client's multidimensional cube (they have several in the same warehouse), and I have created my own development copy of that exact cube so I don't break anything in production while developing.
The issue is that whenever I edit my cube and then deploy it, the data is removed from the cube, and in some programs the cube disappears altogether. The cube itself is still visible in SSMS but contains no data.
I then have to do a full process of the entire database to get the data back, which is rather annoying given that it takes around 30-40 minutes during which I cannot work on it, and it's a minor change I've made (such as changing the Order property of a dimension from Name to Key, or creating a measure group).
Some settings/extra info:
When I deploy, I have set the cube's processing option to Do Not Process, due to some prior processing issues when processing from BIDS
I have a delta process that keeps the data up to date; it runs continuously and doesn't fail. However, it moves no data into the failed cube, while the other cubes present work just fine.
In script view, the first MDX statement under Calculations is a CALCULATE statement, as some sources suggested this could be an issue if it weren't.
It is deployed from VS 2008 (the client's version)
Deploying to Localhost
The views upon which some dimensions are built contain UNION statements, but only hold a few records
Scenarios where it fails:
Refresh data source view
Create new dimension
Change dimension properties
Create measure groups
Updating dimensions
Probably more that I either haven't tested or can't remember
Does anyone have any idea what the issue is and how to fix it? I would really appreciate it if someone could point me in the right direction; I haven't found a solution yet.
Well, this is expected behaviour. SSAS creates aggregations during processing; if the structure of the cube or a dimension is changed, the existing aggregations become invalid and the entire cube goes into the "Unprocessed" state. As you have found out yourself, you then need to do a full process to be able to browse the cube.
Here's a blog post with the list of actions and their effect on the state of the cube: http://bimic.blogspot.com/2011/08/ssas-which-change-makes-cubedimension.html
I suggest you create a small data set for development purposes and test the cube on that data before moving to production. You can also limit the data loaded into the cube by switching to a query binding (instead of a table binding) in the partition designer; in the query you can then use a WHERE condition to limit the records loaded into the cube and make processing faster.
I need to synchronize between two data sources:
I have a web service running on the net. It continuously gathers data from the net and stores it in the database. It also provides the data to clients based on their requests. I want to keep a repository of the data as objects for faster service.
On the client side, there is a Windows service that calls the web service mentioned previously and synchronizes its local database with the server.
A few of my restrictions:
The web service has a very small buffer limit and can only transfer fewer than 200 records per call, which is not enough for the data collected in a day.
I also can't copy the database files, since the database structures are very different (one is SQL and the other is Access).
The data is updated on an hourly basis, and there will be a large amount of data that needs to be transferred.
Syncing by date or another grouping is not possible with the size limitation. Paging can be done, but the remote repository keeps changing (and I don't know how to take a chunk of data from the middle of a table in a SQL database).
How do I keep the repository (for recent data updates, or the full database) in sync with this limitation?
A better approach to the problem, or an improvement of the current approach, will be accepted as the right answer.
You mentioned that syncing by date or by group wouldn't work because the number of records would be too big, but what about syncing by date (or group or whatever) and then paging within that? The benefit is that you will have a defined batch of records, and you can now page over it because that group won't change.
For example, if you need to pull data hourly, then as each hour elapses (so, when it goes from 8:59 am to 9:00 am) you begin pulling down the data that was added between 8 am and 9 am, in chunks of 200 or whatever size the service can handle.
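To illustrate, here is a minimal TypeScript sketch of that pattern; the endpoint URL, its query parameters, and the 200-record page size are assumptions about your web service rather than a known API:

```typescript
// Sketch: sync one closed hourly window in pages of up to 200 records.
// The endpoint and its query parameters are hypothetical.
const PAGE_SIZE = 200;

interface SyncRecord { id: string; updatedAt: string; payload: unknown }

async function fetchPage(from: Date, to: Date, offset: number): Promise<SyncRecord[]> {
  const url =
    `https://example.com/api/records?from=${from.toISOString()}` +
    `&to=${to.toISOString()}&offset=${offset}&limit=${PAGE_SIZE}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Sync call failed: ${res.status}`);
  return (await res.json()) as SyncRecord[];
}

async function syncHour(windowStart: Date): Promise<void> {
  const windowEnd = new Date(windowStart.getTime() + 60 * 60 * 1000);
  let offset = 0;

  // The window is closed (in the past), so its contents no longer change
  // and paging over it is stable.
  while (true) {
    const page = await fetchPage(windowStart, windowEnd, offset);
    // Upsert `page` into the local (Access or other) database here.
    if (page.length < PAGE_SIZE) break; // last page of this window
    offset += PAGE_SIZE;
  }
}

// Example: once 9:00 has passed, sync everything added between 8:00 and 9:00.
// syncHour(new Date('2024-05-01T08:00:00Z'));
```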