I was searching the web and Stack Overflow but didn't find an answer. :( So please help me; I am still learning and reading, but I am not yet thinking in SQL terms, where there are no IFs and FOR loops to do stuff. :)
I have table1:
id | date       | state_on_date | year_quantity
1  | 30.12.2013 | 23            | 100
1  | 31.12.2013 | 25            | 100
1  | 1.1.2014   | 35            | 150
1  | 2.1.2014   | 12            | 150
2  | 30.12.2013 | 34            | 200
2  | 31.12.2013 | 65            | 200
2  | 1.1.2014   | 43            | 300
I am trying to get:
table2:
id | date       | state_on_date | year_quantity | state_on_date_compare
1  | 30.12.2013 | 23            | 100           | 23
1  | 31.12.2013 | 25            | 100           | -2
1  | 1.1.2014   | 35            | 150           | -10
1  | 2.1.2014   | 12            | 150           | 23
2  | 30.12.2013 | 34            | 200           | 34
2  | 31.12.2013 | 65            | 200           | -31
2  | 1.1.2014   | 43            | 300           | 22
Rules to get numbers:
id | date       | state_on_date | year_quantity | state_on_date_compare
1  | 30.12.2013 | 23            | 100           | 23  (first state_on_date for id 1, i.e. on its earliest date)
1  | 31.12.2013 | 25            | 100           | -2  (23-25)
1  | 1.1.2014   | 35            | 150           | -10 (25-35)
1  | 2.1.2014   | 12            | 150           | 23  (35-12)
2  | 30.12.2013 | 34            | 200           | 34  (first state_on_date for id 2, i.e. on its earliest date)
2  | 31.12.2013 | 65            | 200           | -31 (34-65)
2  | 1.1.2014   | 43            | 300           | 22  (65-43)
Thanks in advance for every suggestion or solution you can offer.
You have to understand that SQL is misleading because of presentation issues. Like in The Matrix ("there is no spoon"), in a query there is no previous record.
SQL is based on set theory, for which there IS NO ORDER of records. All records are just set members. The theory behind SQL is that anything you do normally should be considered as though you are doing it to ALL RECORDS AT THE SAME TIME! The fact that a datasheet view of a SELECT query shows record A before record B is an artifact of presentation - not of actual record order.
In practice, the records returned by a query often come back in the order they happen to be stored UNLESS you have included a GROUP BY or ORDER BY clause, but that order is an accident of storage, not a guarantee. And the order of record appearance in a table is usually the order in which the records were created UNLESS there is a functional primary key on that table.
However, both of these statements leave you with the same problem. There is no SYNTAX for the concepts of NEXT and PREVIOUS because it is the CONCEPT of order that doesn't exist in SQL.
VBA recordsets, though based on SQL as recordsources, create an extra context that encapsulates the SQL context. That is why VBA can do what you want and SQL itself cannot. It is the "extra" context in which VBA can define variables holding what you wanted to remember until another record comes along.
Having now rained on your parade, here are some thoughts that MIGHT help.
When you want to see "previous record" data, there MUST be a way for Access to find what you consider to be the "previous record." Therefore, if you have not allowed for this situation, it is a design flaw. (Based on you not realizing the implications of SET theory, which is eminently forgivable for new Access users, so don't take it too hard.) This is based on the "Old Programmer's Rule" that says "Access can't tell you anything you didn't tell it first." Which means - in practical terms - that if order means something to you, you must give Access the data required to remember and later impose that order. If you have no variable to identify proper order with respect to your data set, you cannot impose the desired order later. In this case, it looks like a combination of id and date together will give you an ordering variable.
You can SOMETIMES do something like a DLookup in a query where you look for the record that would precede the current one based on some order identifier.
e.g. if you were ordering by date/time fields and meant "previous" to imply the record with the next earlier time than the record in focus, you would choose the record with the maximum date less than the date in focus. Look at the DMax function. Also notice I said "record in focus" not "current record." This is a fine point, but "Current" also implies ordering by connotation. ("Previous" and "Next" imply order by denotation, a stronger definition.)
Anyway, contemplate this little jewel:
DLookup( "[myvalue]", "mytable", "[mytable]![mydate] = #" & CStr( DMax( "[mydate]", "mytable", "[mytable]![mydate] < #" & CStr( [mydate] ) & "# )" ) & "#" )
I don't guarantee that the syntax is exactly right for your locale. Use Access Help on DLookup, DMax, CStr, and on strings (in functions) in order to verify the exact syntax. The idea is to use a query (implied by DMax) to find the largest date less than the date in focus in order to feed a query (implied by DLookup) to find the value for the record having that date. And the CStr converts the date/time variable to a string so you can use the "#" signs as date-string brackets.
IF you are dealing with different dates for records with different qualifiers, you will also have to include the rest of the qualifiers in BOTH the DMax and DLookup functions. That syntax gets awfully nasty awfully fast. Which is why folks take up VBA in the first place.
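To make that concrete against the question's table1, a sketch of the whole expression as a query column might look like the following. This is only a sketch: previous_state is just an illustrative name, the first row for each id has no earlier date (so the inner DMax returns Null and a real version needs an Nz() or IIf() guard), and CStr must produce a date string that the # delimiters will parse in your locale.
SELECT
    t.id,
    t.[date],
    t.state_on_date,
    DLookUp("[state_on_date]", "table1",
        "[id] = " & t.id & " AND [date] = #" &
        CStr(DMax("[date]", "table1",
            "[id] = " & t.id & " AND [date] < #" & CStr(t.[date]) & "#")) & "#") AS previous_state
FROM table1 AS t;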
Johnny Bones makes some good points in his answer, but in fact there is a way to have Access SQL perform the required calculations in this case. Our sample data is in a table named [table1]:
id  date        state_on_date  year_quantity
--  ----------  -------------  -------------
1   2013-12-30  23             100
1   2013-12-31  25             100
1   2014-01-01  35             150
1   2014-01-02  12             150
2   2013-12-30  34             200
2   2013-12-31  65             200
2   2014-01-01  43             300
Step 1: Determining the initial rows for each [id]
We start by creating a saved query in Access named [StartDatesById] to give us the earliest date for each [id]
SELECT id, MIN([date]) AS MinOfDate
FROM table1
GROUP BY id
That gives us
id MinOfDate
-- ----------
1 2013-12-30
2 2013-12-30
Now we can use that in another query to give us the initial rows for each [id]
SELECT
table1.id,
table1.date,
table1.state_on_date,
table1.year_quantity,
table1.state_on_date AS state_on_date_compare
FROM
table1
INNER JOIN
StartDatesById
ON table1.id = StartDatesById.id
AND table1.date = StartDatesById.MinOfDate
which gives us
id date state_on_date year_quantity state_on_date_compare
-- ---------- ------------- ------------- ---------------------
1 2013-12-30 23 100 23
2 2013-12-30 34 200 34
Step 2: Calculating the subsequent rows
This step begins with creating a saved query named [PreviousDates] that uses a self-join on [table1] to give us the previous dates for each row in [table1] that is not the first row for that [id]
SELECT
t1a.id,
t1a.date,
MAX(t1b.date) AS previous_date
FROM
table1 AS t1a
INNER JOIN
table1 AS t1b
ON t1a.id = t1b.id
AND t1a.date > t1b.date
GROUP BY
t1a.id,
t1a.date
That query gives us
id date previous_date
-- ---------- -------------
1 2013-12-31 2013-12-30
1 2014-01-01 2013-12-31
1 2014-01-02 2014-01-01
2 2013-12-31 2013-12-30
2 2014-01-01 2013-12-31
Once again, we can use that query in another query to derive the subsequent records for each [id]
SELECT
curr.id,
curr.date,
curr.state_on_date,
curr.year_quantity,
prev.state_on_date - curr.state_on_date AS state_on_date_compare
FROM
(
table1 AS curr
INNER JOIN
PreviousDates
ON curr.id = PreviousDates.id
AND curr.date = PreviousDates.date
)
INNER JOIN
table1 AS prev
ON prev.id = PreviousDates.id
AND prev.date = PreviousDates.previous_date
which returns
id date state_on_date year_quantity state_on_date_compare
-- ---------- ------------- ------------- ---------------------
1 2013-12-31 25 100 -2
1 2014-01-01 35 150 -10
1 2014-01-02 12 150 23
2 2013-12-31 65 200 -31
2 2014-01-01 43 300 22
Step 3: Combining the results of steps 1 and 2
To combine the results from the previous two steps we just include them both in a UNION query and sort by the first two columns
SELECT
table1.id,
table1.date,
table1.state_on_date,
table1.year_quantity,
table1.state_on_date AS state_on_date_compare
FROM
table1
INNER JOIN
StartDatesById
ON table1.id = StartDatesById.id
AND table1.date = StartDatesById.MinOfDate
UNION ALL
SELECT
curr.id,
curr.date,
curr.state_on_date,
curr.year_quantity,
prev.state_on_date - curr.state_on_date AS state_on_date_compare
FROM
(
table1 AS curr
INNER JOIN
PreviousDates
ON curr.id = PreviousDates.id
AND curr.date = PreviousDates.date
)
INNER JOIN
table1 AS prev
ON prev.id = PreviousDates.id
AND prev.date = PreviousDates.previous_date
ORDER BY 1, 2
returning
id date state_on_date year_quantity state_on_date_compare
-- ---------- ------------- ------------- ---------------------
1 2013-12-30 23 100 23
1 2013-12-31 25 100 -2
1 2014-01-01 35 150 -10
1 2014-01-02 12 150 23
2 2013-12-30 34 200 34
2 2013-12-31 65 200 -31
2 2014-01-01 43 300 22
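For comparison, in a dialect with window functions (SQL Server 2012+, PostgreSQL, and others, though not Access), the whole thing collapses to a single LAG over the same table1. A minimal sketch in T-SQL flavor; only the bracket-quoting of the reserved word date would change elsewhere:
SELECT
    id,
    [date],
    state_on_date,
    year_quantity,
    COALESCE(LAG(state_on_date) OVER (PARTITION BY id ORDER BY [date]) - state_on_date,
             state_on_date) AS state_on_date_compare
FROM table1
ORDER BY id, [date];
The COALESCE falls back to the row's own state_on_date on the first date of each id, matching Step 1 above.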
I hope this is helpful:
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/calculating-mean-median-and-mode-with-sq
You can use SELECT * INTO table2 FROM table1 WHERE <your conditions>; I am not sure whether this would work.
Right now I'm in the testing phase of this query, so I'm only testing it on two queries. I've gotten stuck on the final part where I want to left join everything (this will have to be extended to 12 separate queries). The problem is basically as the title suggests: I want to join 12 queries on the created Row_Num column using the WITH() statement, instead of creating 12 separate tables and saving them as tables in a database.
WITH Jan_Table AS
(SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jan_Rev
FROM ba.SALE_TABLE a
WHERE a.SALE_DATE BETWEEN '2015-01-01' and '2015-01-31'
GROUP BY a.SALE_DATE)
SELECT ROW_NUMBER() OVER (ORDER BY a.SALE_DATE) as Row_ID, a.SALE_DATE, sum(a.revenue) as Jun_Rev, j.Jan_Rev
FROM ba.SALE_TABLE a
LEFT JOIN Jan_Table j
on "j.Row_ID" = a.Row_ID
WHERE a.SALE_DATE BETWEEN '2015-06-01' and '2015-06-30'
GROUP BY a.SALE_DATE
And then I get this error message:
ERROR: column "j.Row_ID" does not exist
I put in the "j.Row_ID" because the previous message was:
ERROR: column a.row_id does not exist Hint: Perhaps you meant to
reference the column "j.row_id".
Each query works individually without the JOIN and the WITH clause. I have one for every month of the year and want to join all 12 of these together eventually.
The output should be a single table with a ROW_NUM column and 12 monthly revenue columns. Each row should be a day of the month. I know not every month has 31 days; so, for example, Feb only has 28 days, meaning I'd want days 29, 30, and 31 as NULLs. The query above still has the dates, but I will remove the SALE_DATE column once I can just get these two queries to join.
My initial thought was just to create 12 tables, but I think that'd be a really bad use of space and not the most logical solution to this problem if I were to extend it.
edit
Below are the separate outputs of the two queries above, and the third table is what I'm trying to make. I can't give you the raw data; everything above has been altered from the actual column names and purposes of the data that I'm using. And I don't know how to create a sample dataset; that's above my head in SQL.
Jan_Table (first five lines)
Row_Num Date Jan_Rev
1 2015-01-01 20
2 2015-01-02 20
3 2015-01-03 20
4 2015-01-04 20
5 2015-01-05 20
Jun_Table (first five lines)
Row_Num Date Jun_Rev
1 2015-06-01 30
2 2015-06-02 30
3 2015-06-03 30
4 2015-06-04 30
5 2015-06-05 30
JOINED_TABLE (first five lines)
Row_Num Date Jun_Rev Date Jan_Rev
1 2015-06-01 30 2015-01-01 20
2 2015-06-02 30 2015-01-02 20
3 2015-06-03 30 2015-01-03 20
4 2015-06-04 30 2015-01-04 20
5 2015-06-05 30 2015-01-05 20
It seems like you can just use group by and conditional aggregation for your full query:
select day(sale_date),
max(case when month(sale_date) = 1 then sale_date end) as jan_date,
max(case when month(sale_date) = 1 then revenue end) as jan_revenue,
max(case when month(sale_date) = 2 then sale_date end) as feb_date,
max(case when month(sale_date) = 2 then revenue end) as feb_revenue,
. . .
from sale_table s
group by day(sale_date)
order by day(sale_date);
You haven't specified the database you are using. DAY() is a common function to get the day of the month; MONTH() is a common function to get the months of the year. However, those particular functions might be different in your database.
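For example, PostgreSQL has no DAY() or MONTH(), but the standard EXTRACT function plays the same role there. A sketch of just the January pair, under the same assumed column names:
select extract(day from sale_date) as day_of_month,
       max(case when extract(month from sale_date) = 1 then sale_date end) as jan_date,
       max(case when extract(month from sale_date) = 1 then revenue end) as jan_revenue
from sale_table
group by extract(day from sale_date)
order by day_of_month;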
I have data (support ticket statistics, to be exact), that I am trying to get the median of, over time.
Right now, I have a query that calculates the difference between the date the ticket was opened and the date it was closed. I take that data and pivot the average of the days to close over a series of months and years, using SQL Server's built-in AVG function. This works well; however, I am finding that the metrics are skewed due to outliers in the data.
What I really want, is the Median of the data, pivoted over months and years. I am having trouble achieving what I am after, and I am not positive if this is even possible.
The query I have right now, using the AVG function, is:
SELECT 'Support - Days To Close Escalation', *
FROM
(
SELECT
DATEDIFF(HOUR, e.CreatedDate, e.Escalation_Close_Date_Time__c) AS DaysToCloseEscalation,
LEFT(CONVERT(CHAR(10), e.CreatedDate,126), 7) AS EscalationCreateDate
FROM [dbo].[Escalations] AS e WITH(NOLOCK)
LEFT JOIN [dbo].[Case] AS c WITH(NOLOCK)
ON e.Case__c = c.Id
WHERE e.Escalation_Queue__c IN ('PM 10 Tier 2 Support', 'PM 11 Tier 2 Support')
AND e.CreatedDate BETWEEN '2017-04-01 00:00:00.000' AND '2018-04-01 00:00:00.000'
AND e.Escalation_Close_Date_Time__c IS NOT NULL
) AS SupportEscalationVolume
PIVOT
(
AVG(SupportEscalationVolume.DaysToCloseEscalation) FOR SupportEscalationVolume.EscalationCreateDate IN ([2017-04],[2017-05],[2017-06],[2017-07],[2017-08],[2017-09],[2017-10],[2017-11],[2017-12],[2018-01],[2018-02],[2018-03])
) AS SupportEscalationVolumePivot
The result of this query is something along the lines of (except all in a single row, since the data is pivoted):
StatDescription | Support - Days To Close Escalation
----------------------------------------------------
2017-04 | 107
2017-05 | 52
2017-06 | 101
2017-07 | 106
2017-08 | 69
2017-09 | 54
2017-10 | 49
2017-11 | 42
2017-12 | 51
2018-01 | 31
2018-02 | 23
2018-03 | 15
After some research on how to pull off a median in SQL, I have resorted to using DENSE_RANK(), as shown in the query below. I started with ROW_NUMBER(), but that gave me a counter for ALL records, where what I really want is a median of the time to close the ticket for each month/year grouping.
;WITH SupportDaysToClose(HoursToCloseEscalation, EscalationCreateDate, RowNumber)
AS
(
SELECT
DATEDIFF(HOUR, e.CreatedDate, e.Escalation_Close_Date_Time__c) AS HoursToCloseEscalation,
LEFT(CONVERT(CHAR(10), e.CreatedDate,126), 7) AS EscalationCreateDate,
DENSE_RANK() OVER(ORDER BY LEFT(CONVERT(CHAR(10), e.CreatedDate,126), 7) ASC) AS RowNumber
FROM [dbo].[Escalations] AS e WITH(NOLOCK)
LEFT JOIN [dbo].[Case] AS c WITH(NOLOCK)
ON e.Case__c = c.Id
WHERE e.Escalation_Queue__c IN ('PM 10 Tier 2 Support', 'PM 11 Tier 2 Support')
AND e.CreatedDate BETWEEN '2017-04-01 00:00:00.000' AND '2018-04-01 00:00:00.000'
AND e.Escalation_Close_Date_Time__c IS NOT NULL
)
SELECT *
FROM SupportDaysToClose
ORDER BY RowNumber,HoursToCloseEscalation
A sample of this data looks like
HoursToClose|CreateDate|RowNumber
---------------------------------
0 |2017-04 |1
7 |2017-08 |5
27 |2017-12 |9
Each RowNumber correlates to a given month and year, the maximum being 12.
At this point, I am not really sure where to go.
Has anyone ever done anything like this before? I am not sure if I am on the right track or if I need to rethink the whole strategy. I apologize in advance if the syntax is difficult to follow.
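One possible direction, assuming SQL Server 2012 or later: PERCENTILE_CONT(0.5) computes a median directly as a window function, which removes the need for DENSE_RANK entirely. A sketch against the same columns (untested; the unused join to [Case] is dropped for brevity):
SELECT DISTINCT
    LEFT(CONVERT(CHAR(10), e.CreatedDate, 126), 7) AS EscalationCreateDate,
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY DATEDIFF(HOUR, e.CreatedDate, e.Escalation_Close_Date_Time__c)
    ) OVER (
        PARTITION BY LEFT(CONVERT(CHAR(10), e.CreatedDate, 126), 7)
    ) AS MedianHoursToClose
FROM [dbo].[Escalations] AS e WITH(NOLOCK)
WHERE e.Escalation_Queue__c IN ('PM 10 Tier 2 Support', 'PM 11 Tier 2 Support')
  AND e.CreatedDate BETWEEN '2017-04-01 00:00:00.000' AND '2018-04-01 00:00:00.000'
  AND e.Escalation_Close_Date_Time__c IS NOT NULL;
Because each month then carries a single value, the result can be fed into the same PIVOT already used above, with MAX in place of AVG.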
I am about to create what I assume will be 2 new tables in SQL. The idea is for one to be the "live" data and a second which would hold all the changes. Dates are in DD/MM/YYYY format.
Active
ID | Name | Start Date | End Date
1 Zac 1/1/2016 -
2 John 1/5/2016 -
3 Sam 1/6/2016 -
4 Joel 1/7/2016 -
Changes
CID | UID | Name | Start Date | End Date
1 1 Zac 1/1/2016 -
2 4 Joel 1/1/2016 -
3 4 Joel - 1/4/2016
4 2 John 1/5/2016 -
5 3 Sam 1/6/2016 -
6 4 Joel 1/7/2016 -
In the above situation you can see that Joel worked from the 1/1/2016 until the 1/4/2016, took 3 months off and then worked from the 1/7/2016.
I need to build a query whereby I can pick a date in time and report on who was working at that time. The above table only lists the name, but there will be many more columns to report on for a point in time.
What would be the best way to structure the tables to be able to achieve this query?
I started writing this last night and am finally coming back to it. Basically you would have to use your change table to create a Slowly Changing Dimension and then generate a row number to match your starts and ends. This assumes, however, that your DB will never get out of sync by adding 2 start records or 2 end records in a row.
This also assumes you are using an RDBMS that supports common table expressions and window functions, such as SQL Server, Oracle, PostgreSQL, or DB2 (the query below uses SQL Server's ISNULL and GETDATE; substitute COALESCE and CURRENT_TIMESTAMP elsewhere).
WITH cte AS (
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY UID ORDER BY ISNULL(StartDate,EndDate)) As RowNum
FROM
Changes c
)
SELECT
s.UID
,s.Name
,s.StartDate
,COALESCE(e.EndDate,GETDATE()) as EndDate
FROM
cte s
LEFT JOIN cte e
ON s.UID = e.UID
AND s.RowNum + 1 = e.RowNum
WHERE
s.StartDate IS NOT NULL
AND '2016-05-05' BETWEEN s.StartDate AND COALESCE(e.EndDate,GETDATE())
For this problem, I'm using Access as a front end for SQL Server, and calling Access through Excel VBA, although I can use a direct ADO connection if there are some T-SQL specific functions that would be more useful here.
I have a table that logs state changes for a set of items, e.g.:
+-------+-------+------------+
| docID | state | date |
+-------+-------+------------+
| 103 | 5 | 10/15/2013 |
| 103 | 6 | 10/18/2013 |
| 102 | 3 | 10/22/2013 |
| 103 | 2 | 11/1/2013 |
| 102 | 7 | 11/8/2013 |
+-------+-------+------------+
For each unique docID, I want to figure out whether its state is only increasing from first date to last date, or if it ever decreases. In the above data set, 103 decreases and 102 only increases. We can assume that the entries will be in date order.
One way to find this would be to create an object for each docID and add these objects to a dictionary, loading each state change into a list and checking to see whether the state has decreased, something like this:
Function IsDecreasing(cl As ChangeList) As Boolean
    ' Walk the state changes in date order; any drop means "decreasing".
    Dim c As Long
    For c = 2 To cl.Count
        If cl.Item(c).State < cl.Item(c - 1).State Then
            IsDecreasing = True
            Exit Function
        End If
    Next c
    IsDecreasing = False
End Function
But this will slow my query down a lot because I'll have to convert all the table data into objects. It also means I'll have to write a lot of additional code to create and manage the objects for calculating and generating reports.
Is there any way to write a query in SQL Server or Access that can perform the same type of analysis on the whole data set?
In his otherwise excellent answer, Gordon Linoff said:
You have a problem using Access-only functionality
Really?
For the given data, which I've put in a table called [StateChanges]:
docID state date
----- ----- ----------
103 5 2013-10-15
103 6 2013-10-18
102 3 2013-10-22
103 2 2013-11-01
102 7 2013-11-08
I can create the following saved query in Access named [PreviousDates]
SELECT t1.docID, t1.date, MAX(t2.date) AS PreviousDate
FROM
StateChanges t1
INNER JOIN
StateChanges t2
ON t2.docID = t1.docID
AND t2.date < t1.date
GROUP BY t1.docID, t1.date
It returns
docID date PreviousDate
----- ---------- ------------
102 2013-11-08 2013-10-22
103 2013-10-18 2013-10-15
103 2013-11-01 2013-10-18
Then I can use the following query to identify the [docID]'s where the [state] went down
SELECT DISTINCT curr.docID
FROM
(
PreviousDates pd
INNER JOIN
StateChanges curr
ON curr.docID = pd.docID
AND curr.date = pd.date
)
INNER JOIN
StateChanges prev
ON prev.docID = pd.docID
AND prev.date = pd.PreviousDate
WHERE curr.state < prev.state
It returns
docID
-----
103
In fact, both queries are so simple that we can combine them into a single query that does the whole thing in one shot:
SELECT DISTINCT curr.docID
FROM
(
(
SELECT t1.docID, t1.date, MAX(t2.date) AS PreviousDate
FROM
StateChanges t1
INNER JOIN
StateChanges t2
ON t2.docID = t1.docID
AND t2.date < t1.date
GROUP BY t1.docID, t1.date
) PreviousDates
INNER JOIN
StateChanges curr
ON curr.docID = PreviousDates.docID
AND curr.date = PreviousDates.date
)
INNER JOIN
StateChanges prev
ON prev.docID = PreviousDates.docID
AND prev.date = PreviousDates.PreviousDate
WHERE curr.state < prev.state
So where's the problem?
You have a problem using Access-only functionality. But, if you have SQL Server 2012, you can use lead()/lag() functionality. There is another way, just using row_number(), which is available since SQL Server 2005.
Here is the idea. Enumerate the rows within each docId first by state and also by date. If the enumerations are the same, then the sequence is non-decreasing (essentially increasing). If different, then there is a bump in the road. Here is the code:
select docid,
(case when sum(case when rn_ds <> rn_sd then 1 else 0 end) = 0 then 'Increasing'
else 'Decreasing'
end) as SequenceType
from (select d.*,
row_number() over (partition by docId order by date, state) as rn_ds,
row_number() over (partition by docId order by state, date) as rn_sd
from StateChanges d
) d
group by docid;
Note that I've made the sort a little more complicated by using both fields. This handles the case when two dates in a row have the same state (probably not allowed, but might as well make the technique more stable).
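To see why the mismatch test works, trace docID 103 through the question's sample data:
docID  date        state  rn_ds  rn_sd
-----  ----------  -----  -----  -----
103    2013-10-15  5      1      2
103    2013-10-18  6      2      3
103    2013-11-01  2      3      1
The two enumerations disagree, so the inner SUM is nonzero and 103 comes out 'Decreasing'. For docID 102 the states 3 then 7 give rn_ds = rn_sd = 1, 2, so it stays 'Increasing'.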
Question:
For each unique docID, I want to figure out whether its state is only increasing from first date to last date, or if it ever decreases.
So what you want to know is, for a given record a, does there exist a later record b for the same docID whose state is lower.
So just ask that.
select distinct docID
from T a
where
exists (
select 1 from T b
where b.docID = a.docID and b.date > a.date and b.state < a.state
)
I have a simple SQLite table that records energy consumption all day long. It looks like that:
rowid amrid timestamp value
---------- ---------- ---------- ----------
1 1 1372434068 5720
2 2 1372434075 0
3 3 1372434075 90
4 1 1372434078 5800
5 2 1372434085 0
6 3 1372434085 95
I would like to build a simplified history of the consumption of the last day by getting the closest value for every 10 minutes to build a CSV which would look like:
date value
---------------- ----------
2013-07-01 00:00 90
2013-07-01 00:10 100
2013-07-01 00:20 145
For now I have a query that gets me the closest value for one timestamp:
SELECT *
FROM indexes
WHERE amrid=3
ORDER BY ABS(timestamp - strftime('%s','2013-07-01 00:20:00'))
LIMIT 1;
How can I build a query that would do the trick for the whole day?
Thanks,
Let me define "closest value" as the first value after each 10-minute interval begins. You can generalize the idea to other definitions, but I think this is easiest to explain the idea.
Convert the timestamp to a string value of the form "yyyy-mm-dd hhMM". This string is 15 characters long. The 10-minute interval would be the first 14 characters. So, using this as an aggregation key, calculate the min(id) and use this to join back to the original data.
Here is the idea:
select i.*                        -- return the full closest-reading rows
from indexes i join
     (select substr(strftime('%Y-%m-%d %H%M', timestamp, 'unixepoch'), 1, 14) as yyyymmddhhm,
             min(rowid) as whichid    -- first reading in each 10-minute bucket
      from indexes
      where amrid = 3                 -- same meter as in the example above
      group by substr(strftime('%Y-%m-%d %H%M', timestamp, 'unixepoch'), 1, 14)
     ) isum
     on i.rowid = isum.whichid
Note the 'unixepoch' modifier: the timestamp column stores Unix epoch seconds, which strftime would otherwise misread, and rowid stands in for the id column the table does not have.
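If the end goal is the CSV file itself, the sqlite3 command-line shell can write one directly with its dot-commands; a minimal sketch (history.csv is a placeholder name):
.headers on
.mode csv
.output history.csv
-- now run the join query above; its rows land in history.csv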