how would i get the average of a previous date and update it? - sql

I want to write a query that will have the average(that wont be hard) but when I get that average I want to save it somewhere. Let's I have a average save from last month table_a.last_month_average. And now I run the query again and this would be the current_month_average. I want to compare this two columns and see if the current_month_average increase from last_month_average.
After I compare I would like to output the biggest average number from those two. After I do this I would like to move the current_month_average to last_month_average so that one becomes the old average when next month the query runs.
Is this possible in sql? or maybe there is a better way to do this?any suggestions will help.

After I compare I would like to output the biggest average number from those two. After I do this I would like to move the current_month_average to last_month_average so that one becomes the old average when next month the query runs.
By my understanding, this operation is to select maximum month_average from all history records. So you don't need to keep a record of current_month_average and last_month_average. Instead, a table of all history month average is helpful. Assume there is table named monthaverage with columns (Id, Month, Average), you can query
SELECT TOP 1 T1.*
, CASE WHEN
T1.Average > (SELECT TOP 1 T2.Average
FROM monthaverage T2
WHERE T2.Month < T1.Month
ORDER BY Month DESC)
THEN 'Increased'
ELSE 'Not Increased'
END
FROM monthaverage T1
ORDER BY T1.Average DESC
If you have chance to run it from SQL SERVER 2012, you can leverage LAST_VALUE function. Query is like
SELECT TOP 1 *, CASE WHEN Average > LAST_VALUE(Average) OVER (ORDER BY Month) THEN 'Increased' ELSE 'Not Increased' END
FROM monthaverage
ORDER BY Average DESC

Related

How to write SQL statement to select for data broken up for each month of the year?

I am looking for a way to write an SQL statement that selects data for each month of the year, separately.
In the SQL statement below, I am trying to count the number of instances in the TOTAL_PRECIP_IN and TOTAL_SNOWFALL_IN columns when either column is greater than 0. In my data table, I have information for those two columns ("TOTAL_PRECIP_IN" and "TOTAL_SNOWFALL_IN") for each day of the year (365 total entries).
I want to break up my data by each calendar month, but am not sure of the best way to do this. In the statement below, I am using a UNION statement to break up the months of January and February. If I keep using UNION statements for the remaining months of the year, I can get the answer I am looking for. However, using 11 different UNION statements cannot be the optimal solution.
Can anyone give me a suggestion how I can edit my SQL statement to measure from the first day of the month, to the last day of the month for every month of the year?
select monthname(OBSERVATION_DATE) as "Month", sum(case when TOTAL_PRECIP_IN or TOTAL_SNOWFALL_IN > 0 then 1 else 0 end) AS "Days of Rain" from EMP_BASIC
where OBSERVATION_DATE between '2019-01-01' and '2019-01-31'
and CITY = 'Olympia'
group by "Month"
UNION
select monthname(OBSERVATION_DATE) as "Month", sum(case when TOTAL_PRECIP_IN or TOTAL_SNOWFALL_IN > 0 then 1 else 0 end) from EMP_BASIC
where OBSERVATION_DATE between '2019-02-01' and '2019-02-28'
and CITY = 'Olympia'
group by "Month"```
Your table structure is too unclear to tell you the exact query you will need. But a general easy idea is to build the sum of your value and then group by monthname and/or by month. Sice you wrote you only want sum values greater 0, you can just put this condition in the where clause. So your query will be something like this:
SELECT MONTHNAME(yourdate) AS month,
MONTH(yourdate) AS monthnr,
SUM(yourvalue) AS yoursum
FROM yourtable
WHERE yourvalue > 0
GROUP BY MONTHNAME(yourdate), MONTH(yourdate)
ORDER BY MONTH(yourdate);
I created an example here: db<>fiddle
You might need to modify this general construct for your concrete purpose (maybe take care of different years, of NULL values etc.). And note this is an example for a MYSQL DB because you wrote about MONTHNAME() which is in most cases used in MYSQL databases. If you are using another DB type, maybe you need to do some modifications. To make sure that answers match your DB type, tag it in your question, please.

How to flag consecutive shifts in SQL efficiently

I have a dataset that contains 250,000 rows and is expected to grow at around 100,000 rows a month.
I have data that contains the following columns:
ShiftDate (Day a shift occurred on),
Shift Start Time,
Shift End Time
and Employee Number.
I would like to flag consecutive shifts with a 1 when an Employees Shift End Time was within 4 hours of the start time of their next shift, otherwise flag it with a 0.
my data table
I have tried running a query that joins the table to itself but the run time is too long. I was planning to create the flag based on a case statement using 'NextStart':
select shiftdate,
shiftstarttime,
shiftendtime,
EmployeeID,
(select min(t2.shiftstarttime) from TABLE t2 where t1.EmployeeID=t2.EmployeeID and T2.shiftstarttime > t1.Shiftendtime) as NextStart
from
TABLE t1
I would love to know a more efficient way of trying to do this.
Thanks!
Select shiftdate, shiftstarttime, shiftendtime, employeeid,
(Case when
lead(shiftstarttime, 1) over (partition by employeeid order by shiftdate, shiftstarttime) - shiftendtime < 4 then 1 else 0 end) as consecutive_shift_flag
from table_name
In this query lead() window function is used to get the next shift start time for that employee
lead(shiftstarttime, 1) over (partition by employeeid order by shiftdate, shiftstarttime)
In case this is not what you are looking for then please share the Sample of correct output based on input data for couple of cases.

Returning single records per month

I have a use case function that needs to returns a single row only for every end of month.
I tried using select distinct and it is showing multiple records for the same end of month
SELECT DISTINCT CASE
WHEN eff_interest_balance < 0.01 THEN trial_balance_date
WHEN date_paid < trial_balance_date THEN date_paid
END as A
, period
FROM dbo.Intpayments[enter image description here][1]
WHERE loan_number = 60023
ORDER BY period ASC
Each row should return single date for each month
Distinct is returning unique rows, not grouping them. You are looking to aggregate rows. This means using some combination of aggregate functions and group by.
What your current query is missing is some sort of logic for aggregating the rows that are in the same period. Do you want to compare the sum of these values? The min, the max?
In any case, the basic idea of aggregating and grouping would look like this - I don't think this summing is what you want, but the query shows the basic idea of aggregating and grouping:
SELECT
period
, SUM(eff_interest_balance) AS SumOfBalance
FROM dbo.Intpayments
WHERE loan_number = 60023
GROUP BY period

SQL: Average value per day

I have a database called ‘tweets’. The database 'tweets' includes (amongst others) the rows 'tweet_id', 'created at' (dd/mm/yyyy hh/mm/ss), ‘classified’ and 'processed text'. Within the ‘processed text’ row there are certain strings such as {TICKER|IBM}', to which I will refer as ticker-strings.
My target is to get the average value of ‘classified’ per ticker-string per day. The row ‘classified’ includes the numerical values -1, 0 and 1.
At this moment, I have a working SQL query for the average value of ‘classified’ for one ticker-string per day. See the script below.
SELECT Date( `created_at` ) , AVG( `classified` ) AS Classified
FROM `tweets`
WHERE `processed_text` LIKE '%{TICKER|IBM}%'
GROUP BY Date( `created_at` )
There are however two problems with this script:
It does not include days on which there were zero ‘processed_text’s like {TICKER|IBM}. I would however like it to spit out the value zero in this case.
I have 100+ different ticker-strings and would thus like to have a script which can process multiple strings at the same time. I can also do them manually, one by one, but this would cost me a terrible lot of time.
When I had a similar question for counting the ‘tweet_id’s per ticker-string, somebody else suggested using the following:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG,
coalesce(BAC, 0) AS BAC
FROM dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
This script worked perfectly for counting the tweet_ids per ticker-string. As I however stated, I am not looking to find the average classified scores per ticker-string. My question is therefore: Could someone show me how to adjust this script in such a way that I can calculate the average classified scores per ticker-string per day?
SELECT d.date, t.ticker, COALESCE(COUNT(DISTINCT tweet_id), 0) AS tweets
FROM dates d
LEFT JOIN
(SELECT DATE(created_at) AS date,
SUBSTR(processed_text,
LOCATE('{TICKER|', processed_text) + 8,
LOCATE('}', processed_text, LOCATE('{TICKER|', processed_text))
- LOCATE('{TICKER|', processed_text) - 8)) t
ON d.date = t.date
GROUP BY d.date, t.ticker
This will put each ticker on its own row, not a column. If you want them moved to columns, you have to pivot the result. How you do this depends on the DBMS. Some have built-in features for creating pivot tables. Others (e.g. MySQL) do not and you have to write tricky code to do it; if you know all the possible values ahead of time, it's not too hard, but if they can change you have to write dynamic SQL in a stored procedure.
See MySQL pivot table for how to do it in MySQL.

Get a Row if within certain time period of other row

I have a SQL statement that I am currently using to return a number of rows from a database:
SELECT
as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE
(ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
I am wanting to do a query that will get the row with the oldest DateScanned from this query and also get another row from the database if there was one that was within a certain period of time from this row (say 5 seconds for an example). The oldest record would be relatively simple by selecting the first record in a descending sort, but how would I also get the second record if it was within a certain time period of the first?
I know I could do this process with multiple queries, but is there any way to combine this process into one query?
The database that I am using is SQL Server 2008 R2.
Also please note that the DateScanned times are just placeholders and I am taking care of that in the application that will be using this query.
Here is a fairly general way to approach it. Get the oldest scan date using min() as a window function, then use date arithmetic to get any rows you want:
select t.* -- or whatever fields you want
from (SELECT as1.AssetTagID, as1.TagID, as1.CategoryID,
as1.Description, as1.HomeLocationID, as1.ParentAssetTagID,
min(DateScanned) over () as minDateScanned, DateScanned
FROM Assets AS as1
INNER JOIN AssetsReads AS ar ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
) t
where datediff(second, minDateScanned, DateScanned) <= 5;
I am not really sure of sql server syntax, but you can do something like this
SELECT * FROM (
SELECT
TOP 2
as1.AssetTagID,
as1.TagID,
as1.CategoryID,
as1.Description,
as1.HomeLocationID,
as1.ParentAssetTagID ,
ar.DateScanned,
LAG(ar.DateScanned) OVER (order by ar.DateScanned desc) AS lagging
FROM
Assets AS as1
INNER JOIN AssetsReads AS ar
ON as1.AssetTagID = ar.AssetTagID
WHERE (ar.ReadPointLocationID='Readpoint1' OR ar.ReadPointLocationID='Readpoint2')
AND (ar.DateScanned between 'LastScan' AND 'Now')
AND as1.TagID!='000000000000000000000000'
ORDER BY
ar.DateScanned DESC
)
WHERE
lagging IS NULL or DateScanned - lagging < '5 SECONDS'
I have tried to sort the results by DateScanned desc and then just the top most 2 rows. I have then used the lag() function on DateScanned field, to get the DateScanned value for the previous row. For the topmost row the DateScanned shall be null as its the first record, but for the second one it shall be value of the first row. You can then compare both of these values to determine whether you wish to display the second row or not
more info on the lagging function: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/