While querying the sum of a set of data, only 30 of the 31 rows are counted; what am I missing here? - google-bigquery

There is a set of consumer data covering 31 days.
I queried the sum of the consumer data against the consumer id, which is common to all rows, but the output is not correct: only 30 of the 31 rows contribute to the sum.
I tried messing around with the query, but it did not work.
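A minimal sketch of that kind of query, plus a diagnostic for the missing row. The table and column names (consumer_data, consumer_id, event_date, amount) and the date range are assumptions, not taken from the question:

-- Assumed schema: consumer_data(consumer_id, event_date, amount).
-- Sum per consumer id, counting the rows that actually contribute.
SELECT
  consumer_id,
  COUNT(*)    AS rows_summed,   -- should be 31 if every day is present
  SUM(amount) AS total_amount
FROM `project.dataset.consumer_data`
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY consumer_id;

-- Diagnostic: list the dates that survive the filter. A missing date
-- usually points at an off-by-one range (e.g. BETWEEN on a TIMESTAMP
-- column cutting off the last day) or a NULL in the filter column.
SELECT DISTINCT event_date
FROM `project.dataset.consumer_data`
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY event_date;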

Related

IBM Cognos Analytics Selecting top 2 from a dataset

I'm working on a report where I need data for the current week and the week before, and to compare the two. I have a week column in my data, which consists of transactions, so my data looks something like:
Amount - Week
13 - 01
19 - 01
11 - 02
10 - 02
13 - 02
12 - 03
18 - 03
15 - 04
And I want this as a result: the two most recent weeks and the sum of Amount:
Week 03: 30
Week 04: 15
Now it is easy to get the most recent week, just take a maximum (Week for report), but when I want to select the 2nd largest I get stuck.
I've tried a filter that is basically "Maximum( case when week = maximum(week) then null else week)", but either I have not figured out the syntax or this approach does not work.
Another alternative I tried was the rank() feature, followed by a query that selects rank in (1, 2), but for whatever reason I couldn't get that approach to work and only got the error:
The function "to_char" is being used for local processing but is not available as a built-in function, or at least one of its parameters is not supported.
I believe this has something to do with the aggregation (multiple records per occurrence of week). Anyway, I'm kind of stuck and the error messages aren't giving me any clues. I would very much appreciate some help!
RANK should work fine, but it may not work well if you try to get Cognos to do all of the work in one place. I thought I could filter on the ranked data item and set the Application property to After auto aggregation, but I got strange results.
Rather than trying to create one complicated solution, try breaking the problem into smaller, simpler components.
Define Query1
Data Items:
Week = [namespace].[query subject].[Week]
Amount = [namespace].[query subject].[Amount] with the detail aggregation set to Total
Rank = rank([namespace].[query subject].[Week])
Create Query2 and set Query1 as its source.
Data Items:
[Query1].[Week]
[Query1].[Amount]
Detail Filters:
[Query1].[Rank] <= 2
Use Query2 as the source for your list.
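For comparison, here is the same two-step idea sketched in plain SQL (the table and column names are placeholders, not your Cognos model):

-- Step 1: aggregate Amount per Week and rank the weeks, newest first.
-- Step 2: keep only the two most recent weeks.
WITH weekly AS (
  SELECT week, SUM(amount) AS amount
  FROM transactions
  GROUP BY week
),
ranked AS (
  SELECT week, amount,
         DENSE_RANK() OVER (ORDER BY week DESC) AS rnk
  FROM weekly
)
SELECT week, amount
FROM ranked
WHERE rnk <= 2
ORDER BY week;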

SUM of last 24 hour scores within a specific range in a sorted set (Redis)

Is there a way to calculate the SUM of the scores saved in the last 24 hours while respecting the performance of the Redis server? (Around 1 million new rows are added per day.)
What is the right format for storing users' timestamps and scores in a sorted set?
Currently I am using this command:
ZADD allscores 1570658561 20
The sorted-set score is the actual time in seconds, and the other field (the member) is the real score.
But there is a problem here! When another user gets the same score (20), it is not added, since that member is already present. Any solution for this problem?
I am thinking of using a Lua script, but there are two headaches:
First, the Lua script will block other commands until it finishes, which is not good practice in my case: the script has to run 24/7 while many users fetch data from the Redis cache at the same time (user scores, history info, etc.). On top of that, each day the script has to process the many records saved under a specific key, so while it runs in its loop, users cannot fetch data.
Second, and related to the first problem: using the timestamp as the score prevents me from storing duplicate score values, which I need in order to return 24 hours of data.
If you were in my situation, how would you deal with this? Thanks
Considering that the data is needed for the last 24 hours (a sliding window) and that about a million rows arrive per day, the sorted set data structure cannot compute the sum with high performance.
Here is a high-performance design that also solves your duplicate-score issue:
By accepting a small trade-off in accuracy, you can have a highly performant system that crunches the data within fixed windows.
Sample Input data:
input 1: user 1 wants to add time: 11:10:01 score: 20
input 2: user 2 wants to add time: 11:11:02 score: 20
input 3: user 1 wants to add time: 11:17:04 score: 50
You can have 1-minute, 5-minute, or 1-hour accuracy and choose the window size based on that.
If you accept an approximation at 1-hour granularity, insertion looks like this:
for input 1 :
INCRBY SCORES_11_hour 20
for input 2:
INCRBY SCORES_11_hour 20
for input 3:
INCRBY SCORES_11_hour 50
To get the data for the last 24 hours, you only need to sum up 24 hourly keys:
MGET SCORES_previous_day_12_hour SCORES_previous_day_13_hour SCORES_previous_day_14_hour .... SCORES_current_day_10_hour SCORES_current_day_11_hour
If you accept an approximation at 5-minute granularity, then on insertion, along with incrementing the hourly key, you also store the 5-minute window data:
for input 1:
INCRBY SCORES_11_hour 20
INCRBY SCORES_11_hour_10_minutes 20
for input 2:
INCRBY SCORES_11_hour 20
INCRBY SCORES_11_hour_10_minutes 20
for input 3:
INCRBY SCORES_11_hour 50
INCRBY SCORES_11_hour_15_minutes 50
(11:10:01 and 11:11:02 both fall in the 10-14 minute bucket; 11:17:04 falls in the 15-19 bucket, and its score is 50.)
To get the data for the last 24 hours, you sum up only 23 hourly keys (the whole hours) plus the 5-minute window keys covering the partial hour (up to 12 of them).
If the inserted time is always the current time, you can optimize further, since once it is the 11th hour, the data for the 10th, 9th, and earlier hours will not change at all.
As you said this runs 24/7, you can also reuse values computed in previous iterations.
Say the sum is computed at the 11th hour; you then have the values for the past 24 hours. If it is computed again at the 12th hour, you can reuse the sum of the 22 intermediate hours, whose data is unchanged, and fetch only the missing 2 hours from Redis.
Further optimisations can be applied similarly, based on your needs.

Postgresql- divide sum by total already in table

I have a table with several time intervals as rows, plus one "total" row. I have four columns: car, bus, truck, and total, which give the number of vehicles leaving a warehouse in each time interval by category, and the total number of vehicles in each interval. My table looks like this:
time      car  truck  bus  total
12-6am    10   15     10   35
7am-12pm   8   12      8   28
Total     18   27     18   63
I want to create a percent-of-total row that takes the total value in each interval row (35 and 28) and divides it by the grand total in the Total row (63).
How do I do this?
If you look at the schema of your table, it doesn't make sense to have an extra row in it; at most an extra column.
However, even that is a bad idea. A database is not a spreadsheet, where you have largely free-form data; it's a collection of tables. Total rows should be calculated with SELECT statements, not stored in the table itself. Unlike a spreadsheet, Postgres won't auto-update such a row as rows are added and deleted. (Note: yes, sometimes you need to materialize this summary data for efficiency, but that's the advanced course.)
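A sketch of the SELECT-based approach, assuming the base table is called vehicles, holds only the interval rows (no Total row), and has columns time, car, truck, and bus:

-- Assumed table: vehicles("time", car, truck, bus), interval rows only.
-- The column name is quoted because time is also a type name in Postgres.
SELECT
  "time",
  car + truck + bus AS total,
  ROUND(100.0 * (car + truck + bus)
        / SUM(car + truck + bus) OVER (), 1) AS pct_of_total
FROM vehicles;

The window function SUM(...) OVER () computes the grand total (63) on the fly, so there is no stored Total row to keep in sync.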

Reuse logic to query data based on date filter

I have logic in place to pull records based on a date. For example, I have to check whether a record that appeared in a given week also occurred within the next 14 days; if so, that record needs to be flagged. Basically I have a self join in place to get those records.
Now I have to pull records for 3 months and see whether each record appeared again, with the same logic (within the next 14 days). Ideally I would have to change the date filter in the query for every week and re-run it; is there a way I can do it in the same query and get the full 3 months of data?
Let me know if more clarification is required.
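One way to avoid re-running the query with a different filter per week is to anchor the 14-day window on each record's own date, so a single pass covers the whole 3-month range. A sketch in generic SQL; the table and column names (records, record_key, record_date) are assumptions, and the interval syntax varies by dialect:

-- Assumed table: records(id, record_key, record_date).
-- Flag every record that reappears within 14 days of its own date,
-- across the whole 3-month range in one query.
SELECT r.id, r.record_key, r.record_date,
       CASE WHEN EXISTS (
         SELECT 1
         FROM records r2
         WHERE r2.record_key  = r.record_key
           AND r2.record_date >  r.record_date
           AND r2.record_date <= r.record_date + INTERVAL '14' DAY
       ) THEN 1 ELSE 0 END AS flagged
FROM records r
WHERE r.record_date >= DATE '2024-01-01'
  AND r.record_date <  DATE '2024-04-01';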

SQL Query to find instances within 60 days of an earlier instance

I have a table with result dates, result values, and user IDs. I am looking for a way to find the people (both how many and which ones) who had a result value greater than a number (20) and ALSO had a follow-up result within 60 days of that initial result.
I have no problem getting the list of people with a result greater than 20; I just don't know how to also find the people who were re-tested within 60 days and have a result.
I am very, very new to SQL, so I am not sure what else is needed to help... thanks!!!
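A common pattern for this is a self join, here written with EXISTS, pairing each high initial result with any later result for the same user within 60 days. The table and column names (results, user_id, result_date, result_value) are assumptions, and the date arithmetic varies by SQL dialect:

-- Assumed table: results(user_id, result_date, result_value).
-- People (and their initial tests) with a value > 20 that were
-- followed by another result within 60 days.
SELECT DISTINCT r.user_id, r.result_date, r.result_value
FROM results r
WHERE r.result_value > 20
  AND EXISTS (
        SELECT 1
        FROM results f                 -- candidate follow-up test
        WHERE f.user_id     = r.user_id
          AND f.result_date >  r.result_date
          AND f.result_date <= r.result_date + INTERVAL '60' DAY
      );

Wrapping this in SELECT COUNT(DISTINCT user_id) gives the head count instead of the detail rows.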