There is a table of M rows, with a column which records the date and time the row was updated and a numeric column to record the SCORE of this row.
QUERY:
Find the X most recently updated rows and compute the average SCORE.
EDITED:
Further, let's break it up into 2 queries:
1) Find the X most recently updated rows.
2) Compute the average SCORE of the rows returned in 1.
What is the time-complexity of query 1 and query 2?
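A minimal sketch of the two queries, using Python's built-in sqlite3 module rather than any particular production database. The table name scores, its columns, the index name, and the sample data are all assumptions made for illustration. As a rough guide rather than a guarantee for any specific engine: with an index on the update-time column, query 1 can usually read just the X newest index entries instead of sorting all M rows, and query 2 then only has to touch those X rows.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, updated_at TEXT, score REAL)"
)
rows = [(i, f"2024-01-{i:02d} 12:00:00", float(i)) for i in range(1, 11)]
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)

# An index on the update time lets the engine walk the newest entries
# directly instead of sorting the whole table (engine-dependent).
conn.execute("CREATE INDEX idx_scores_updated_at ON scores (updated_at)")

X = 3

# Query 1: the X most recently updated rows.
recent = conn.execute(
    "SELECT id, score FROM scores ORDER BY updated_at DESC LIMIT ?", (X,)
).fetchall()

# Query 2: the average SCORE over exactly those X rows.
avg_score = conn.execute(
    "SELECT AVG(score) FROM "
    "(SELECT score FROM scores ORDER BY updated_at DESC LIMIT ?)",
    (X,),
).fetchone()[0]

print(recent, avg_score)
```

Running EXPLAIN QUERY PLAN on query 1 in your own database is a reasonable way to check whether the index is actually used.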
I have a set of records in BigQuery with a variable (CPIRating) that I would like to use to select a subset.
CPIRating is a numeric value ranging from 0.1 to 250. I have over 10,000 records. What I am trying to create is a single subset/dataset built as follows:
1) It selects all records that have a CPIRating of 3.0 or greater.
2) It counts the number of records that have a CPIRating of 3.0 or greater.
3) It takes 4x that count and adds that many records from those with a CPIRating below 3.0, starting from the lowest CPIRating value.
For example, if the dataset has 1000 records with a CPIRating of 3.0 or greater, the query finds those, but also adds a further 4000 records (4x) that are below 3.0. These 4000 records start with the lowest CPIRating value (closest to 0.0) and are added in ascending order until 4000 is reached.
Any ideas on how to structure that query in BigQuery?
First, we generate some dummy data in the table demo_tbl. Since rand() yields uniformly distributed values, CPIRating in this example lies between zero and a maximum of 3.2.
In the table help, we count the rows that have a CPIRating of 3 or higher. The clause from demo_tbl,help joins both tables, which gives every row an additional column CPIRating_count. We number the rows by ascending CPIRating to create row_id. Since row_number() is a window function, its result cannot be filtered in a where clause, so a qualify clause is needed to filter the rows. In this filter, the condition CPIRating<3.0 is not strictly needed, but I find it easier to read.
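-- Dummy data: 101 ids with a random CPIRating between 0 and 3.2
with demo_tbl as (
    select *, rand() * 3.2 as CPIRating
    from unnest(generate_array(0, 1*100)) id
),
-- Number of rows with a CPIRating of 3.0 or greater
help as (
    select count(1) as CPIRating_count
    from demo_tbl
    where CPIRating >= 3.0
)
select *,
    row_number() over (order by CPIRating) as row_id
from demo_tbl, help
-- row_id starts at 1, so <= keeps exactly 4x the count of low-rated rows
qualify (row_id <= 4 * help.CPIRating_count and CPIRating < 3.0) or CPIRating >= 3.0
order by row_id desc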
The column CPIRating_count can also be generated by a window function instead of a join.
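A rough sketch of that window-function variant, run through Python's built-in sqlite3 module so it can be tried locally. SQLite has no qualify clause, so the filter moves into an outer query; in BigQuery the same idea could presumably be written with COUNTIF(CPIRating >= 3.0) OVER () and qualify. The sample ratings are made up, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_tbl (id INTEGER PRIMARY KEY, CPIRating REAL)")

# Made-up sample: 2 rows at or above 3.0 and 10 rows below it (0.1 .. 1.0).
ratings = [3.0, 3.1] + [k / 10 for k in range(1, 11)]
conn.executemany(
    "INSERT INTO demo_tbl (CPIRating) VALUES (?)", [(r,) for r in ratings]
)

query = """
SELECT id, CPIRating
FROM (
    SELECT id,
           CPIRating,
           -- count of high-rated rows, repeated on every row (no join needed)
           SUM(CASE WHEN CPIRating >= 3.0 THEN 1 ELSE 0 END) OVER ()
               AS CPIRating_count,
           -- position of each row when sorted by ascending CPIRating
           ROW_NUMBER() OVER (ORDER BY CPIRating) AS row_id
    FROM demo_tbl
)
WHERE CPIRating >= 3.0
   OR row_id <= 4 * CPIRating_count
ORDER BY CPIRating
"""
selected = conn.execute(query).fetchall()

# 2 high-rated rows plus the 4 * 2 = 8 lowest-rated rows.
print(len(selected))
```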
How do I take the following table:
and make it so the Amount 3 column subtracts from the remaining amount in the row above?
Basically, I know I can do Amount 1 - Amount 2 to get the difference, but if I have multiple values I am trying to subtract from an original value, how can I write a SQL function so the Amount 2 column is added to the cumulative remaining balance in the above Amount 3 column to have a new cumulative remaining amount?
I'm assuming it's some sort of LAG function, but I still need help.
I have a table as below.
I want to calculate the idle time between the End Date of one row and the Start Date of the next.
Example: the idle time between 11:38:30 and 11:40:08, and between 11:49:35 and 12:00:19.
The problem is station index number 5. It has a null value in 2 columns, so I want to calculate based on the previous row.
If you only need one value for each station, then grouping will work:
select
    StationIndex,
    max(Station_End_Date) - min(Station_Start_Date) as Duration
from table_name
group by StationIndex
Otherwise, please explain the logic in greater detail.
I am trying to calculate the average since the last time stamp and pull all records where the average is greater than 3. My current query is:
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
where Multiple > 3
and SqlUnixTime > 1492225582
group by ID_BB_RT;
I have a table cdsData, and the Unix time is April 15th converted to a timestamp. Finally, I want the average calculated within each ID, as the group by shows. I'm not sure why it's failing, but it says that the field Multiple is unknown in the where clause.
I am trying to calculate the average since the last time stamp and pull all records where the average is greater than 3.
I think your intention is correctly stated as follows, "I am trying to calculate the average since the last time stamp and select all rows where the average is greater than 3 times the individual bid".
In fact, a still better restatement of your objective would be, "I want to select all rows since the last time stamp, where the bid is less than one third of the average bid".
For this, the steps are as follows:
1) A sub-query finds the average bid divided by 3, of rows since the last time stamp.
2) The outer query selects rows since the last time stamp, where the individual bid is < the value returned by the sub-query.
The following SQL statement does that:
SELECT BID
FROM cdsData
WHERE SqlUnixTime > 1492225582
  AND BID <
      (
          SELECT AVG(BID) / 3
          FROM cdsData
          WHERE SqlUnixTime > 1492225582
      )
ORDER BY BID;
1)
SQL clauses are not evaluated in the order in which they are written. Logically, the where clause is evaluated before the select clause. Because of this, the aliasing of AVG(BID)/BID to Multiple has not yet occurred when the where clause is processed.
You can try this.
SELECT AVG(BID)/BID AS Multiple
FROM cdsData
WHERE SqlUnixTime > 1492225582
GROUP BY ID_BB_RT
HAVING (AVG(BID)/BID) > 3;
Or
SELECT Multiple
FROM (
    SELECT AVG(BID)/BID AS Multiple
    FROM cdsData
    WHERE SqlUnixTime > 1492225582
    GROUP BY ID_BB_RT
) X
WHERE Multiple > 3;
2)
Once you have corrected the above error, you will run into one more error:
Column 'BID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
To correct this, you have to add the BID column to the GROUP BY clause.
I have a problem in PostgreSQL.
This is my table (the image does not show all of the data).
My requirement is:
Step 1: find the count of each value (this is a column in the table) for today's date, ordered by value. It will look like this, and I have already done it.
Step 2: find the count of each value for the last 30 days, starting from today. I am stuck here. One more thing is included in this step:
Example: today has a count of 10 for the value kash, so this will be 10x30;
yesterday had a count of 4 for the same value, so this will be 4x29; so the total sum would be
(10x30) + (4x29) = 416.
This calculation is done for each and every value.
The loop executes 30 times (as I said before, the last 30 days starting from today). Take today as the thirtieth day.
The query just needs to return two columns, value and sum, ordered by the sum.
Add a WHERE clause to your existing query:
WHERE Timestamp > current_date - interval '30' day;
To order by the sum, add an ORDER BY clause:
ORDER BY COUNT(*) DESC
I do not believe that you will need a loop (CURSOR) for this query.
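The per-day weighting from the question (today counts x30, yesterday x29, and so on) can also be done without a loop, by multiplying inside the aggregate. Below is a hedged sketch using Python's built-in sqlite3 module, not PostgreSQL itself: the table name events, the column day, the fixed "today" date, and the sample rows are all assumptions. In PostgreSQL, subtracting two dates yields an integer number of days, so the weight could presumably be written as 30 - (current_date - day) there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (value TEXT, day TEXT)")

# A fixed "today" keeps the example deterministic (hypothetical date).
today = "2024-03-31"

rows = (
    [("kash", "2024-03-31")] * 10   # today: 10 rows, weight 30
    + [("kash", "2024-03-30")] * 4  # yesterday: 4 rows, weight 29
    + [("other", "2024-03-31")] * 2  # today: 2 rows, weight 30
    + [("kash", "2024-01-01")]      # older than 30 days, excluded
)
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

query = """
SELECT value,
       -- each row contributes 30 minus its age in whole days
       SUM(30 - CAST(julianday(:today) - julianday(day) AS INTEGER))
           AS weighted_sum
FROM events
WHERE julianday(:today) - julianday(day) BETWEEN 0 AND 29
GROUP BY value
ORDER BY weighted_sum DESC
"""
result = conn.execute(query, {"today": today}).fetchall()
print(result)
```

For kash this reproduces the worked example from the question: (10x30) + (4x29) = 416.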