Does Fauna have any function to find out the size of a child database?
I want to display the storage usage of my database in Fauna. Is this possible?
There is no FQL function to acquire a database's size.
However, the Dashboard does present size metrics for child databases. From the Dashboard "Home" page, click on a database that contains child databases, and you'll see in the lower metrics panel a list of child databases and their storage usage.
Related
I've enabled BigQuery Export in Google Analytics 4, but on inspection I noticed that roughly half of the events were missing from the raw data (in my case those were sign_up events that also had user_id as a parameter).
When inspecting the event stats in the standard GA4 reports, I noticed that "Data Thresholding" was applied to the stats pertaining to that event. My understanding is that when thresholding is applied, GA4 omits certain events from the export, although I can't be sure of that.
Is there a way to make sure all the data gets exported to BigQuery?
I want to know how to reduce the amount of BigQuery data my reports consume by changing settings in Data Portal (Google's BI tool, formerly known as Data Studio).
The reason is that if I don't reduce my BigQuery data usage, I can't afford to keep executing the SQL.
I'm looking for a way that uses Data Portal settings, not changes on the BigQuery side (including changes to the SQL code).
This is because the dashboard in Data Portal keeps consuming BigQuery data whenever it is used, so changing the SQL code alone doesn't solve my problem.
My situation is as follows:
1. I made a view in my BigQuery environment. I tried to write the query so that it doesn't process a lot of data; for example, I didn't use "SELECT * FROM ...".
2. I set the view as the data source in Data Portal.
3. I made the dashboard using that data source.
4. Whenever someone opens the dashboard, the view I made is executed, so BigQuery data is processed every time the dashboard is opened.
If I'm understanding correctly, you want to reduce the amount of data processed in BigQuery by your Data Studio (or, in Japan, Data Portal) reports.
There are a few ways to do this:
Make sure that the "Enable Cache" option is checked in the report settings.
Avoid using BigQuery views as a query source, as these aren't cached at the BigQuery level (the view query is run every time, and likely many times per report for various charts). Instead, use a Custom Query connection or pull the table data directly to allow caching. Another option (which we use heavily) is to run a scheduled query that saves the output of a view as a table and replaces it regularly (or is triggered when the underlying data is refreshed). This way your queries can be cached, but the business logic can still exist within the view.
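For example, a scheduled query that materializes a view into a table might look roughly like the following (the project, dataset, and view names here are made up; the general idea is BigQuery's CREATE OR REPLACE TABLE ... AS SELECT run on a schedule):

```sql
-- Hypothetical scheduled query: replace the materialized table with the
-- current contents of the view. Run it daily (or whenever the source data
-- is refreshed) using BigQuery scheduled queries.
CREATE OR REPLACE TABLE `my_project.reporting.daily_summary` AS
SELECT *
FROM `my_project.reporting.daily_summary_view`;
```

Then point the Data Studio data source at `my_project.reporting.daily_summary` instead of the view, so repeated report loads can hit the cache instead of re-running the view's query.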
Create a BI Engine reservation in BigQuery. This adds another level of caching to Data Studio reports, and may give you better results for things that can't be query-cached or cached in Data Studio. (While there will be a cost to the service in the future based on the size of instance you reserve, it's free during their beta period.)
Don't base your queries on tables that have a streaming buffer attached (even if it hasn't received rows recently), use wildcard tables in the query, or rely on an external data source (e.g. a file in Cloud Storage or Bigtable). See Caching Exceptions for details.
Pull as little data as possible by using the new Data Source Parameters. This means you can pass the values of your date range or other filters directly to BigQuery and filter the data before it reaches your report. This is especially helpful if you have a date-partitioned table, as you can scan only the needed partitions (which greatly reduces processing and the amount of data returned).
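As a rough illustration, a custom-query data source against a date-partitioned table could pass the report's date range through like this (the table name and the event_date partition column are assumptions; @DS_START_DATE / @DS_END_DATE are the date-range parameters the BigQuery connector supplies as YYYYMMDD strings):

```sql
-- Hypothetical Data Studio / Data Portal custom query.
-- Only the partitions inside the report's date range are scanned.
SELECT
  event_date,
  user_id,
  event_name
FROM `my_project.analytics.events`   -- assumed to be partitioned on event_date
WHERE event_date BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE)
                     AND PARSE_DATE('%Y%m%d', @DS_END_DATE);
```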
Also, sometimes it seems like you're moving a lot of data but that doesn't always relate to a high cost. Check your cost breakdowns or look at the logging filtered to the user your data source authenticates as, then see how much cost that's incurred. Certain operations fall under a free tier, and others don't result in cost for non-egress use cases like Data Studio. All that to say that you may want to make sure there's a cost problem at the BigQuery level in the first place before killing yourself trying to optimize the usage.
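One way to check this (an assumption on my part that it fits your setup: it uses BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view, so adjust `region-us` to your project's location) is to total the billed bytes per user over the last month:

```sql
-- Rough cost check: billed bytes per user over the last 30 days.
SELECT
  user_email,
  COUNT(*) AS query_count,
  ROUND(SUM(total_bytes_billed) / POW(2, 40), 2) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_email
ORDER BY tib_billed DESC;
```

If the user your data source authenticates as barely shows up here, the optimization may not be worth the effort.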
Let's assume we have a ticketing-system web page that displays tickets (the tickets are spread across multiple pages). On the same page there is also a search form that allows filtering.
Those tickets can be modified at any time (delete, update, insert).
So I'm a bit confused. How should the internal architecture look? I've been thinking about it for a while and I haven't found a clear path.
From my point of view there are 2 ways:
Use something like an in-memory database and store all the data there. That makes it very easy to filter content and display the requested items. But this solution implies storing a lot of useless data in RAM, like closed or resolved tickets. And those tickets should be there because they can be requested.
Use the database for every search, page display, etc. That means a lot of queries: every search and every page view (per user) results in a database query. Isn't this a bit too much?
Which solution is better? Are there any better solutions? Are my concerns unfounded?
You said "But this solution implies storing a lot of useless data in ram. Like tickets closed or resolved. And those tickets should be there because they can be requested."
If those tickets should be there because they can be requested, then it's not really useless data, is it?
It sounds like a good use case for a hybrid in-memory/persistent database. Keep the open/displayed tickets in in-memory tables. When closed, move them to persistent tables.
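To make that concrete, here is a minimal sketch of one way this could look, assuming SQL Server's In-Memory OLTP (2014+) and made-up table names; any engine with a similar in-memory/persistent split would work just as well:

```sql
-- Open (frequently listed/filtered) tickets live in a memory-optimized table.
-- Note: the database needs a MEMORY_OPTIMIZED_DATA filegroup before this works.
CREATE TABLE dbo.OpenTickets
(
    TicketId  INT           NOT NULL PRIMARY KEY NONCLUSTERED,
    Title     NVARCHAR(200) NOT NULL,
    Status    NVARCHAR(20)  NOT NULL,
    UpdatedAt DATETIME2     NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Closed/resolved tickets live in an ordinary disk-based table.
CREATE TABLE dbo.ClosedTickets
(
    TicketId  INT           NOT NULL PRIMARY KEY,
    Title     NVARCHAR(200) NOT NULL,
    Status    NVARCHAR(20)  NOT NULL,
    ClosedAt  DATETIME2     NOT NULL
);

-- When a ticket is closed, copy it to the disk-based table and remove it
-- from the in-memory one (shown as plain statements for brevity; in practice
-- you'd wrap this in a transaction).
INSERT INTO dbo.ClosedTickets (TicketId, Title, Status, ClosedAt)
SELECT TicketId, Title, N'Closed', SYSUTCDATETIME()
FROM dbo.OpenTickets
WHERE TicketId = 42;

DELETE FROM dbo.OpenTickets
WHERE TicketId = 42;
```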
I am working on a web app and would like to be able to display some computed statistics about different objects to the user. Some good examples would be: the "Top Questions" page on this site (SO), which lists "Votes", "Answers", and "Views", or the "Comment" and "Like" counts for a list of posts in the Facebook News Feed. Computed values like these are used all over the web, and I am not sure of the best way to implement them.
I will describe a generic view of the problem in greater detail. I have a single parent table in my database (you can visualize it as a blog post). This table has a one-to-many relationship with 5 other tables (visualize them as comments, likes, etc.). I need to display a list of ~20 parent-table objects with the counts of each related child object (visualize it as a list of blog posts, each displaying the total number of comments and the total number of likes). I can think of multiple ways to tackle the problem, but I am not sure which would be the FASTEST and most ACCURATE.
Here are a number of options I have come up with...
A) SQL Trigger - Create a trigger to increment and decrement a computed count column on the parent table as the child tables have inserts and deletes performed (a rough sketch of what I mean is below, after this list). I'm not sure about the performance tradeoffs of running the trigger every time a small child object is created or deleted. I am also unsure about potential concurrency issues (although in my current architecture each row in the db can only be added or deleted by the row creator).
B) SQL View - Just an easier way to query and will yield accurate results, but I am worried about the performance implications for this type of view.
C) SQL Indexed View - An indexed view would be accurate and potentially faster, but as each child table has rows that can be added or removed, the view would constantly have to be recalculated.
D) Cached Changes - Some kind of interim in-process solution that would cache changes to the child tables, compute the net changes to the counts, and flush them to the db based on some parameter. This could potentially be coupled with a process that checks for accuracy every so often.
E) Something awesome I haven't thought of yet :) How does SO keep track of soooo many stats??
Using SQL Server 2008 R2.
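For reference, here's roughly what I have in mind for option A (table and column names are made up; the real schema is more involved):

```sql
-- Hypothetical schema: dbo.Posts (PostId, CommentCount, ...) and
-- dbo.Comments (CommentId, PostId, ...).
CREATE TRIGGER dbo.trg_Comments_Count
ON dbo.Comments
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Net change per post: +1 for every inserted row, -1 for every deleted row.
    WITH delta AS
    (
        SELECT PostId, SUM(n) AS net
        FROM (
            SELECT PostId, 1  AS n FROM inserted
            UNION ALL
            SELECT PostId, -1 AS n FROM deleted
        ) AS x
        GROUP BY PostId
    )
    UPDATE p
    SET p.CommentCount = p.CommentCount + d.net
    FROM dbo.Posts AS p
    JOIN delta AS d ON d.PostId = p.PostId;
END;
```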
Please keep in mind that I am building something custom and it is not a blog/FB/SO; I am just using them as examples of a similar problem, so suggesting that I just use those sites is unhelpful ;) I would love to hear how this is accomplished in live web apps that handle a decent volume of requests.
THANKS in Advance
What is the best way to run MDX queries for each drill-down of a BI dashboard chart? As an example, if you have four drill levels, should we execute an MDX query on every drill-down, or execute only one query at the initial load and keep the data for all four drill levels in an object collection? If you can, please explain with an example.
This depends a lot on what tool you are using to display the BI Dashboard. Is it SSRS, PerformancePoint, something else?
Option 1: Pull all the data in the initial MDX query, configure the dashboard software to display the top level of detail, and provide the user with options for drilldown. As users drill down, unhide the next level of detail. This option only requires one roundtrip to the database, so initially loading the dashboard may be a bit slower, but the drilldown experience will be very fast (since the data has already been retrieved).
Option 2: Pull just the top level of detail in the initial MDX query, configure the dashboard software to display the results, and provide users with options for drilldown. As users drill down, the dashboard software will send another MDX query to retrieve the next level of detail from your data source. This option will require multiple roundtrips to the database: one for the initial top level of detail when the user first loads the dashboard, and another each time the user drills down.
Either option will work but you'll need to make the call on which option best suits your needs after weighing the pros and cons...
how fast is the network between your dashboard and the datasource?
how much concurrency can your data-source handle?
how "big" is the query to pull everything?
how important is speed to your users?
Be sure to test each approach if you are unsure.