Our database has a table for Training Programs, a Training Events table linked to Training Programs, and a Training Event Activities table linked to Training Events. There's a lot more, but those are the basics.
I joined the training event and activity tables together, then the training program to the training event tables.
In my select statement I did two counts. The first is of TrainingEvent.guTrainingProgramId (the linked column that relates to TrainingProgram.rowguid). If I remove all instances of Training Event Activity, I get what seems to be an accurate count of the number of events for each entry in the Training Program table. However, once I add the count of TrainingEventActivity.guTrainingEventId (the link to the Training Event table), I get an accurate count of the total number of activities, but the TrainingEvent count changes to reflect the count of activities.
I essentially want to know a count of the number of times that TrainingEvent.guTrainingProgramId = TrainingProgram.rowguid, and a second count where TrainingEventActivity.guTrainingEventId = TrainingEvent.rowguid.
What am I doing wrong?
Thanks!
Without seeing your SQL, it's hard to tell. But my initial thought is that you'll need a count distinct instead of a plain count on the id. Oh well. Let's see the SQL.
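To illustrate the count distinct idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and link column names come from the question; the sample rows, the `rowguid` column on TrainingEventActivity, and the use of LEFT JOIN are assumptions, since the real query was not posted. Joining activities repeats each event row once per activity, so a plain count of events is inflated, while counting distinct event ids is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrainingProgram (rowguid TEXT PRIMARY KEY);
CREATE TABLE TrainingEvent (rowguid TEXT PRIMARY KEY, guTrainingProgramId TEXT);
-- rowguid on the activity table is assumed for illustration
CREATE TABLE TrainingEventActivity (rowguid TEXT PRIMARY KEY, guTrainingEventId TEXT);
INSERT INTO TrainingProgram VALUES ('P1'), ('P2');
INSERT INTO TrainingEvent VALUES ('E1', 'P1'), ('E2', 'P1'), ('E3', 'P2');
INSERT INTO TrainingEventActivity VALUES
    ('A1', 'E1'), ('A2', 'E1'), ('A3', 'E1'), ('A4', 'E2'), ('A5', 'E3');
""")

query = """
SELECT p.rowguid,
       COUNT(e.guTrainingProgramId) AS naive_event_count,  -- inflated by the activity join
       COUNT(DISTINCT e.rowguid)    AS event_count,        -- each event counted once
       COUNT(a.guTrainingEventId)   AS activity_count
FROM TrainingProgram p
LEFT JOIN TrainingEvent e ON e.guTrainingProgramId = p.rowguid
LEFT JOIN TrainingEventActivity a ON a.guTrainingEventId = e.rowguid
GROUP BY p.rowguid
ORDER BY p.rowguid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data, program P1 has 2 events and 4 activities; the naive event count comes out as 4 because it simply mirrors the number of joined rows.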
Good afternoon. Here is the problem: a train has a geotag that determines its position in space, and the location data is written to a table. I need to calculate how many times the train was at a certain point. The trouble is that while the train stays at a point, the geotag leaves several entries in the table over time. I wrote a query that counts the number of arrivals, but it only works if a single train is selected; if I select several trains, the query counts incorrectly. Below I attach the table and the query.
table
query1
If you select several train numbers, the values get mixed up and are counted as one.
Now I have this query. It counts the number of arrivals, but it counts incorrectly when several trains are selected; if I specify only one train instead, everything is correct. What is my mistake?
query2
I think you should use group by for address and zone.
By "train" do you mean address?
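Since the table and queries are not visible here, the following is only a sketch of one way to keep trains apart, run through Python's sqlite3 module. The table `positions` and its columns `train_no`, `zone`, and `ts` are hypothetical. It goes a step beyond a plain group by: a window function partitioned by train marks a row as an arrival only when that train's previous row was in a different zone, and the arrivals are then grouped by train and zone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical schema: one row per geotag reading
CREATE TABLE positions (train_no TEXT, zone TEXT, ts INTEGER);
INSERT INTO positions VALUES
    ('T1', 'A', 1), ('T1', 'A', 2), ('T1', 'B', 3), ('T1', 'A', 4),
    ('T2', 'A', 1), ('T2', 'A', 3), ('T2', 'B', 5);
""")

query = """
WITH marked AS (
    SELECT train_no, zone,
           -- previous zone of the SAME train, in time order
           LAG(zone) OVER (PARTITION BY train_no ORDER BY ts) AS prev_zone
    FROM positions
)
SELECT train_no, zone, COUNT(*) AS arrivals
FROM marked
WHERE prev_zone IS NULL OR prev_zone <> zone  -- first reading or a zone change
GROUP BY train_no, zone
ORDER BY train_no, zone
"""
arrivals = conn.execute(query).fetchall()
for row in arrivals:
    print(row)
```

Without the PARTITION BY, readings from different trains would interleave in time order and repeated entries could no longer be collapsed per train, which may be what goes wrong when several trains are selected.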
Imagine you have to display information about rainfall based on cities over time.
You have tables that provide the details on how much it rains in a specific city every hour. There is an endpoint that returns the average amount of rainfall for the timeframe/city requested.
(so imagine a table called rainfall_california, rainfall_texas, etc... I realize this schema isn't ideal for rainfall, but using it for an example.)
So instead of calculating the average on each request, I set up a continuous aggregate to calculate the average into a new view and have a policy to refresh the last hour of data once every hour.
ca_texas_rainfall_1_day
ca_texas_rainfall_7_day
ca_texas_rainfall_30_day
ca_california_rainfall_1_day
ca_california_rainfall_7_day
ca_california_rainfall_30_day
This works great and is super fast, but I'm a little confused on the best way to set it up. Should I have a different view for each continuous aggregate and each city? Wouldn't that result in a ton of different views? Or should I consolidate the average of each table into a single view?
Recently I created an automated production scheduling tool through Excel that assigns a rank to items being produced in the same process, and then uses that rank in combination with the workload to create a schedule.
It functions exactly the way it is intended to, but due to the large amount of data and the fact that it runs in Excel, performance is very slow, which is why I am looking to move the calculations over to SQL.
The general logic is like this:
-Always produce everything from the first day before the second day
-Always produce items from an earlier rank before items from a later rank
You can see how this plays out in the image below, where the line has 21.5 hours today, so items will be produced on day 1 until it equals 21.5, where the remainder is then carried over to day 2 and so on.
I was able to do this in Excel using lengthy position-based formulas, but I am trying to think of a way to get the same result in SQL without having to rely on looking at the row above.
I am not sure how to express something like 'subtract from the available time the production time of higher-priority items produced on the same day'.
I apologize if the question is unclear, but any advice would be appreciated.
Image of Production Hours Cascading by Priority and Day
Example of Position-Based Formula
Thanks to shawnt00, who put me in the right direction. Ultimately I had to modify the case statements a bit to go off of the cumulative total instead, but I was able to get the desired results using a SUM() OVER (PARTITION BY ... ORDER BY ...) statement.
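For anyone landing here later, this is a rough sketch of the cumulative total idea, not the actual query I ended up with. It runs through Python's sqlite3 module; the table `items`, its columns, and the sample rows are made up for illustration, and it only splits each item into hours produced today versus hours carried over, given the 21.5 hours available on the line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- hypothetical schema: hours of work per item, ordered by day then rank
CREATE TABLE items (line TEXT, item TEXT, due_day INTEGER, rank INTEGER, hours REAL);
INSERT INTO items VALUES
    ('L1', 'A', 1, 1, 10.0),
    ('L1', 'B', 1, 2, 8.0),
    ('L1', 'C', 1, 3, 6.0),
    ('L1', 'D', 2, 1, 5.0);
""")

query = """
WITH running AS (
    SELECT item, due_day, rank, hours,
           -- cumulative hours on the line up to and including this item
           SUM(hours) OVER (PARTITION BY line
                            ORDER BY due_day, rank
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_end
    FROM items
)
SELECT item,
       -- part of the item that fits inside today's capacity
       MAX(0, MIN(cum_end, :cap) - (cum_end - hours)) AS hours_today,
       -- remainder pushed to a later day
       hours - MAX(0, MIN(cum_end, :cap) - (cum_end - hours)) AS hours_carried
FROM running
ORDER BY due_day, rank
"""
schedule = conn.execute(query, {"cap": 21.5}).fetchall()
for row in schedule:
    print(row)
```

The point is that the running total replaces the "look at the row above" logic: each row knows how many hours of higher-priority work come before it, so the split can be computed per row.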
Well, I'm an absolute newbie in Google Data Studio, but for some reason my grand total row is not working.
I'm learning to use this tool, and I made an easy table with just countries and sessions.
Piece of cake. Now I just want to add a total row that sums all sessions. That's all. I activated the Show Summary Row option but it shows nothing.
Things I've tried that did not work:
Update and refresh
Changed time period and tried different dates just in case.
Delete and create again full table.
Checked connection. I get data and the data is right, I just cannot sum it.
Changed the size and format of the table, just in case it was a problem with margins or font color.
And I know it can be done, because I've seen it in different sources. I've read this question here:
Grand Total is wrong in Google Data Studio
But it did not help. In that question, a user posted an image in the comments:
As you can see, he managed to get what I'm trying to do.
So I must be doing something wrong, and I do not know why.
UPDATE 2: If I apply a filter, I get no totals. You can see my config in the right side of image.
Can anybody give me a clue of how to make a grand totals row in Google Data Studio?
Thanks
Sounds like a bug. It should be a case of selecting that tick box. Strangely, I looked at an existing table I have with totals and when I unticked the box and then ticked again, the totals didn't reappear and disappeared off another table on the page (like your example). They did reappear eventually with some refreshing of the data and page but seems like there's something wrong with them.
I don't think this is a bug; I think it is part of the design.
I actually just discovered the reason this is happening, at least for me. The grand total summary of a table doesn't sum the values shown in the table; it is a total of whatever metric is being used, not of the actual rows shown in the chart. So if you have a dimension (like age / gender) where Google applies data thresholding internally, but you are using a metric such as users, you will see the grand total from the metric value without the dimension's thresholding applied.
Proof below
You can see that the grand total for column 2 should be 453.6, not 953.6, and if I look at a non-thresholded dimension (country)
you can see where the 953.6 comes from: the data source supplied to the table uses 80% of all users, and 1192 * 0.8 gives 953.6, which is what the grand total displays. Conclusion: when a table combines a thresholded dimension with a metric, there will be a discrepancy, because the grand total does not come from the table's values but from the metric's source data, which for some odd reason does not have the table's dimension applied.
We have a system that records data to an SQL Server DB captured from field equipment every minute. This data is used for a number of purposes, one of which is for charting in reports via SSRS.
The issue is that with such a high volume of data, when a report is run for a period of, for example, 3 months, the amount of data returned causes excessive report rendering times.
I've been thinking of finding a way of dynamically reducing the amount of data returned, based on the start and end time periods chosen. Something along the lines of a sliding scale where from the duration between the start and end period, I can apply different levels of filtering so that where larger periods are chosen, more filtering occurs while for smaller periods less or no filtering occurs.
There is still a need to be able to produce higher resolution (as in more data points returned) reports for troubleshooting purposes.
For example:
Scenario 1:
User is executing a report for a period of 3 months. Result set returned by the query is reduced for performance reasons without adversely affecting what information the user wants to see (the chart is still representative of the changes over time).
Scenario 2:
User executes the report for a period of 1 hour, in order to look for potential indicator(s) of problems with field devices while troubleshooting the system. For this short time period, no filtering is applied.
My first thought was to use a modulo operation on the primary key of the data (which is an identity field), whereby the divisor is chosen depending on the difference between the start and end dates.
For example, if the difference between the start and end dates of the report execution period is 5 weeks, choose a divisor of 5 and apply a mod to the PK, selecting rows where the result is equal to zero.
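To make the idea concrete, here is a minimal sketch of what I have in mind, written against SQLite through Python's sqlite3 module rather than SQL Server and SSRS. The table `readings`, its columns, and the `pick_divisor` rule (one step per whole week, never below 1) are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime, timedelta

def pick_divisor(start, end):
    """Hypothetical rule: divisor = whole weeks in the period, at least 1."""
    weeks = (end - start).days // 7
    return max(1, weeks)

def fetch(conn, start, end):
    """Return readings in [start, end), keeping only ids divisible by the divisor."""
    divisor = pick_divisor(start, end)
    return conn.execute(
        "SELECT id, ts, value FROM readings "
        "WHERE ts >= ? AND ts < ? AND id % ? = 0 ORDER BY id",
        (start.isoformat(sep=" "), end.isoformat(sep=" "), divisor),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT, value REAL)")
base = datetime(2024, 1, 1)
# 100 sample readings, one per minute, with ids 1..100
conn.executemany(
    "INSERT INTO readings (ts, value) VALUES (?, ?)",
    [((base + timedelta(minutes=i)).isoformat(sep=" "), float(i)) for i in range(100)],
)

long_rows = fetch(conn, base, base + timedelta(weeks=5))   # divisor 5, every 5th row
short_rows = fetch(conn, base, base + timedelta(hours=1))  # divisor 1, no filtering
print(len(long_rows), len(short_rows))
```

One thing I am aware of: identity values can have gaps, so a mod on the PK would not always give evenly spaced samples.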
I would love to get feedback as to whether this sounds like a valid approach or whether there is a better way to do this.
Thanks.