Getting the delta between rows in SQL

I need to get the difference between rows in the same column. I can use lag() over, but how can I get the result like the pic? If the name changes, the counting should start over.

Something like this:
select t.*,
       (call_received -
        lag(call_received, 1, 0) over (partition by agent_name order by id)
       ) as delta
from t;
Note that this uses the rarely seen three-argument form of lag().
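The same idea can be sketched locally with Python's sqlite3 (SQLite 3.25+ supports window functions, including the three-argument lag()); the table name and data below are invented for illustration:

```python
import sqlite3

# Hypothetical call-center data; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, agent_name TEXT, call_received INTEGER);
INSERT INTO t VALUES
  (1, 'alice', 10), (2, 'alice', 15), (3, 'alice', 22),
  (4, 'bob',    5), (5, 'bob',    9);
""")

# The three-argument LAG returns 0 (instead of NULL) for the first row
# of each partition, so the delta restarts whenever the name changes.
rows = conn.execute("""
SELECT t.*,
       call_received
         - LAG(call_received, 1, 0) OVER (PARTITION BY agent_name ORDER BY id)
         AS delta
FROM t
ORDER BY id;
""").fetchall()

for row in rows:
    print(row)
```

The default of 0 means the first row of each agent reports its full call_received value as the delta, which matches the "start counting as new" requirement.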

Related

Get first record based on time in PostgreSQL

Do we have a way to get the first record for each day, considering the time?
Example:
get the first record today, the first record yesterday, the first record the day before yesterday, ...
Note: I want to get all such records, taking the time into account.
The sample expected output should be:
first_record_today,
first_record_yesterday, ...
As I understand the question, the "first" record per day is the earliest one.
For that, we can use RANK and PARTITION BY the day only, truncating away the time.
In the ORDER BY clause, we will sort by the time:
SELECT sub.yourdate FROM (
  SELECT yourdate,
         RANK() OVER (PARTITION BY DATE_TRUNC('DAY', yourdate)
                      ORDER BY DATE_TRUNC('SECOND', yourdate)) AS rk
  FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
In the main query, we will sort the data beginning with the latest date, meaning today's one, if available.
We can try out here: db<>fiddle
If this understanding of the question is incorrect, please let us know what to change by editing your question.
A note: according to your description, a window function is not strictly necessary. A shorter GROUP BY, as shown in the other answer, can produce the correct result too and might be absolutely fine. I like the window function approach because it makes it easy to add or change conditions that might not fit into a simple GROUP BY, which is why I chose this way.
EDIT because the question's author provided further information:
Here is the query that also fetches the first message:
SELECT sub.yourdate, sub.message FROM (
  SELECT yourdate, message,
         RANK() OVER (PARTITION BY DATE_TRUNC('DAY', yourdate)
                      ORDER BY DATE_TRUNC('SECOND', yourdate)) AS rk
  FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
Or if only the message without the date should be selected:
SELECT sub.message FROM (
  SELECT yourdate, message,
         RANK() OVER (PARTITION BY DATE_TRUNC('DAY', yourdate)
                      ORDER BY DATE_TRUNC('SECOND', yourdate)) AS rk
  FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
Updated fiddle here: db<>fiddle
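For a quick local check, the same pattern can be sketched with Python's sqlite3. SQLite has no DATE_TRUNC, so date(yourdate) stands in for DATE_TRUNC('DAY', ...) and the raw timestamp replaces the second-level truncation in the ORDER BY; the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (yourdate TEXT, message TEXT);
INSERT INTO yourtable VALUES
  ('2021-05-03 08:15:00', 'first of day 3'),
  ('2021-05-03 17:40:00', 'later on day 3'),
  ('2021-05-04 06:05:00', 'first of day 4'),
  ('2021-05-04 09:30:00', 'later on day 4');
""")

# date() truncates to the day, mimicking DATE_TRUNC('DAY', yourdate),
# so RANK() restarts at 1 for the earliest row of every day.
rows = conn.execute("""
SELECT sub.yourdate, sub.message FROM (
  SELECT yourdate, message,
         RANK() OVER (PARTITION BY date(yourdate) ORDER BY yourdate) AS rk
  FROM yourtable
) AS sub
WHERE sub.rk = 1
ORDER BY sub.yourdate DESC;
""").fetchall()

for row in rows:
    print(row)
```

Each day contributes exactly one row, sorted with the most recent day first, as in the PostgreSQL version.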

Group by and calculation from value on the next row

I'm quite new to SQL Server and can't seem to figure this out. I have a table that looks like this.
I need to be able to calculate the percentage change in the number for each name, for each year, in the column p. So the end result should look like this.
You can easily calculate the % difference using lag():
select name, date, number,
       cast(((number * 1.0) - lag(number, 1) over (partition by name order by date))
            / lag(number, 1) over (partition by name order by date)
            * 100 as int) as p
from yourtable;
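A minimal sketch of that calculation, run in SQLite through Python's sqlite3 (table name and numbers are invented; the expression itself is the same apart from the alias):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (name TEXT, date TEXT, number INTEGER);
INSERT INTO sales VALUES
  ('A', '2019', 100),
  ('A', '2020', 110),
  ('A', '2021',  99);
""")

# (current - previous) / previous * 100, per name, ordered by date.
# The first row of each name has no previous value, so p is NULL.
rows = conn.execute("""
SELECT name, date, number,
       CAST(((number * 1.0)
             - LAG(number, 1) OVER (PARTITION BY name ORDER BY date))
            / LAG(number, 1) OVER (PARTITION BY name ORDER BY date)
            * 100 AS INT) AS p
FROM sales
ORDER BY name, date;
""").fetchall()
```

The `* 1.0` forces floating-point division before the cast back to int, avoiding integer truncation of the ratio.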

Stuck on what seems like a simple SQL dense_rank task

Been stuck on this issue and could really use a suggestion or help.
What I have in a table is basic user flow on a website. For every Session ID, there's a page visited from start (lands on homepage) to finish (purchase). This has been ordered by timestamp to get a count of pages visited during this process. This 'page count' has also been partitioned by Session ID to go back to 1 every time the ID changes.
What I need to do now is assign a step count (highlighted is what I'm trying to achieve). This should assign a similar count, but shouldn't keep counting at duplicate steps (i.e. someone visited multiple product pages: that's multiple pages, but still only one 'product view' step).
You'd think this would be done using a dense rank, partitioned by session ID, but that's where I get stuck. You can't order on page count, because that assigns a unique number to each step count. You can't order by step, because that orders alphabetically.
What could I do to achieve this?
Screenshot of desired outcome:
Many thanks!
Use lag() to see whether two adjacent values are the same, then take a cumulative sum:
select t.*,
       sum(case when prev_cs = custom_step then 0 else 1 end)
         over (partition by session_id order by timestamp) as steps_count
from (select t.*,
             lag(custom_step) over (partition by session_id order by timestamp) as prev_cs
      from t
     ) t
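The lag-plus-cumulative-sum approach can be sketched self-contained in SQLite via Python's sqlite3, with an invented session table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (session_id INTEGER, timestamp INTEGER, custom_step TEXT);
INSERT INTO t VALUES
  (1, 1, 'home'), (1, 2, 'product'), (1, 3, 'product'),
  (1, 4, 'cart'), (1, 5, 'purchase'),
  (2, 1, 'home'), (2, 2, 'home'),    (2, 3, 'product');
""")

# The inner query pairs each row with the previous step; the outer
# cumulative sum adds 1 only when the step changes, so consecutive
# duplicates share one steps_count value.
rows = conn.execute("""
SELECT t.*,
       SUM(CASE WHEN prev_cs = custom_step THEN 0 ELSE 1 END)
         OVER (PARTITION BY session_id ORDER BY timestamp) AS steps_count
FROM (SELECT t.*,
             LAG(custom_step) OVER (PARTITION BY session_id
                                    ORDER BY timestamp) AS prev_cs
      FROM t
     ) t
ORDER BY session_id, timestamp;
""").fetchall()
```

Note that the first row of each session has a NULL prev_cs; the equality check is then NULL, so the CASE falls through to the ELSE branch and the count correctly starts at 1.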
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(flag),
COUNTIF(IFNULL(flag, TRUE)) OVER(PARTITION BY session_id ORDER BY timestamp) AS steps_count
FROM (
SELECT *,
custom_step != LAG(custom_step) OVER(PARTITION BY session_id ORDER BY timestamp) AS flag
FROM `project.dataset.table`
)
-- ORDER BY timestamp

SQL: Grabbing unique counts per category

I'm pretty new to SQL and Redshift, but there is a weird problem I'm getting.
So my data looks like below. Ignore the actual id and date_time values; I just put in random info, but it's the same format.
id   date_time (varchar(255))
1    2019-01-11T05:01:59
1    2019-01-11T05:01:59
2    2019-01-11T05:01:59
3    2019-01-11T05:01:59
1    2019-02-11T05:01:59
2    2019-02-11T05:01:59
I'm trying to get the number of counts of unique ID's per month.
I've tried the following command below. Given the amount of data, I just tried to do a demo of the first 10 rows of my table...
SELECT COUNT(DISTINCT id),
LEFT(date_time,7)
FROM ( SELECT top 10*
FROM myTable.ME )
GROUP BY LEFT(date_time, 7), id
I expect something like below.
count left
3 2019-01
2 2019-02
But I'm instead getting something similar to what's below.
I then tried the command below, which seems correct.
SELECT COUNT(DISTINCT id),
LEFT(date_time,7)
FROM ( SELECT top 1000000*
FROM myTable.ME )
GROUP BY LEFT(date_time, 7)
However, if you remove the DISTINCT portion, you get the results below. It seems like it is only looking at a certain month (2019-01), rather than other months.
If anyone can tell me what is wrong with the commands I'm using or can give me the correct command, I'll be very grateful. Thank you.
EDIT: Could it possibly be because maybe my data isn't clean?
Why are you using a string for the date? That is simply wrong; there are built-in date/time types. But assuming you have some reason or cannot change it, use string functions:
select left(date_time, 7) as yyyymm,
count(distinct id)
from t
group by yyyymm
order by yyyymm;
In your first query you have id in the GROUP BY, which does not do what you want: each (month, id) group contains only one distinct id, so every count comes back as 1.
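For reference, a runnable version of this in SQLite via Python's sqlite3 (SQLite lacks LEFT(), so substr(date_time, 1, 7) does the same job; ids and dates mirror the sample above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, date_time TEXT);
INSERT INTO t VALUES
  (1, '2019-01-11T05:01:59'),
  (1, '2019-01-11T05:01:59'),
  (2, '2019-01-11T05:01:59'),
  (3, '2019-01-11T05:01:59'),
  (1, '2019-02-11T05:01:59'),
  (2, '2019-02-11T05:01:59');
""")

# One row per month, counting each id at most once within that month.
rows = conn.execute("""
SELECT substr(date_time, 1, 7) AS yyyymm,
       COUNT(DISTINCT id) AS n
FROM t
GROUP BY yyyymm
ORDER BY yyyymm;
""").fetchall()
```

This yields one row per month with the distinct-id count, matching the expected output in the question.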

Calculated column reference in DB2

I have a table with the columns date1, name and price. What I want to do is add two columns containing the minimum and maximum dates among the consecutive rows with the same name.
I have written the following query that explains the rule:
select date1, name, price,
       case
         when lag(name, 1) over (order by date1 asc, name asc) = name
           then lag(minDate, 1) over (order by date1 asc, name asc)
         else date1
       end as minDate,
       case
         when lag(name, 1) over (order by date1 desc, name desc) = name
           then lag(maxDate, 1) over (order by date1 desc, name desc)
         else date1
       end as maxDate
from MyTable
order by date1 asc, name asc
My problem is that I get an "invalid context for minDate/maxDate" error (SQLCODE=-206, SQLSTATE=42703).
Why can't I refer to a calculated column? Is there any other way?
It's complaining about the lag(maxDate,1), because maxDate isn't defined in that scope; it's not a column in MyTable, and aliases aren't available until after the SELECT list finishes in DB2 (they become available by pushing this into a subquery, or in clauses like HAVING).
Incidentally, your query can be better written as the following:
SELECT date1, name, price,
LAG(date1, 1, date1) OVER(PARTITION BY name ORDER BY date1) AS minDate,
LEAD(date1, 1, date1) OVER(PARTITION BY name ORDER BY date1) AS maxDate
FROM MyTable
ORDER BY date1, name
(I've left out ASC, as it's the default for all ordering)
LEAD(...) is the opposite function to LAG(...), and looks one row ahead. Using both functions this way allows the optimizer to compute just one window (what's specified in the OVER(...)).
The third parameter to the windowing functions here is a default value - in the case there wasn't a next/previous row, it returns date1 from the current row.
PARTITION BY is essentially a grouping for windowing functions, and takes the place of the CASE ... WHEN ... in your original query. (In general, I find that such constructs reflect a mentality common to imperative programming, which tends to run counter to the set-based nature of SQL. There are usually better ways to do things.)
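The default-value behavior of LAG/LEAD can be checked locally; here is a sketch in SQLite via Python's sqlite3 (SQLite also supports the three-argument forms; dates and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (date1 TEXT, name TEXT, price REAL);
INSERT INTO MyTable VALUES
  ('2021-01-01', 'a', 1.0),
  ('2021-01-02', 'a', 2.0),
  ('2021-01-03', 'a', 3.0),
  ('2021-01-01', 'b', 9.0);
""")

# At the edges of each name's partition there is no previous/next row,
# so the third argument makes LAG/LEAD fall back to the current date1.
rows = conn.execute("""
SELECT date1, name, price,
       LAG(date1, 1, date1)  OVER (PARTITION BY name ORDER BY date1) AS minDate,
       LEAD(date1, 1, date1) OVER (PARTITION BY name ORDER BY date1) AS maxDate
FROM MyTable
ORDER BY date1, name;
""").fetchall()
```

Because both functions share the same OVER clause, the engine can compute a single window pass, which is the optimization mentioned above.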