How to get data from the previous year? - sql

Here is my base sample.
I need to get the data from the previous period, using LAG, into the Hello column.
Could you help me?
+------+--------+------+-------+
| Year | Animal | Plus | Hello |
+------+--------+------+-------+
| 2    | Cat    | 3    |       |
| 2    | Dog    | 4    |       |
| 2    | Mouse  | 5    |       |
| 3    | Cat    | 5    | 3     |
| 3    | Dog    | 6    | 4     |
| 3    | Mouse  | 6    | 5     |
| 3    | Horse  | 6    |       |
| 3    | Pig    | 6    |       |
| 3    | Goose  | 6    |       |
| 4    | Cat    |      | 5     |
| 4    | Dog    |      | 6     |
| 4    | Mouse  |      | 6     |
| 4    | Horse  |      | 6     |
| 4    | Pig    |      | 6     |
+------+--------+------+-------+

You are looking for LAG. This function looks into previous rows.
select
  year, animal, plus,
  lag(plus) over (partition by animal order by year) as hello
from mytable
order by year, animal;
The "previous" row is the closest preceding row for the same animal: if, for 'Goose', there are rows for years 3 and 5 and none for year 4, then year 3 is considered the previous row for year 5, and LAG shows that value.
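To try the LAG query on the question's sample data, here is a minimal sketch that runs it in SQLite through Python's built-in `sqlite3` module. This is an assumption: the question does not name its database, and SQLite needs version 3.25 or later for window functions. The table name `mytable` comes from the answer, empty Plus cells are loaded as NULL, and the helper `lag_hello` is hypothetical.

```python
import sqlite3

# Sample rows from the question as (year, animal, plus); None = empty Plus cell.
ROWS = [
    (2, "Cat", 3), (2, "Dog", 4), (2, "Mouse", 5),
    (3, "Cat", 5), (3, "Dog", 6), (3, "Mouse", 6),
    (3, "Horse", 6), (3, "Pig", 6), (3, "Goose", 6),
    (4, "Cat", None), (4, "Dog", None), (4, "Mouse", None),
    (4, "Horse", None), (4, "Pig", None),
]


def lag_hello(rows):
    """Return {(year, animal): hello}, where hello is LAG(plus) per animal."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (year integer, animal text, plus integer)")
    con.executemany("insert into mytable values (?, ?, ?)", rows)
    result = con.execute(
        """
        select year, animal,
               lag(plus) over (partition by animal order by year) as hello
        from mytable
        order by year, animal
        """
    ).fetchall()
    con.close()
    return {(year, animal): hello for year, animal, hello in result}


hello = lag_hello(ROWS)
print(hello[(3, "Cat")], hello[(4, "Cat")], hello[(3, "Goose")])  # 3 5 None
```

The returned values match the Hello column in the question; Goose in year 3 gets NULL because it has no earlier row.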
If you really want the adjacent previous year, i.e. year - 1, you can select that year's value with a correlated subquery:
select
  year, animal, plus,
  (
    select prev_year.plus
    from mytable prev_year
    where prev_year.animal = mytable.animal
      and prev_year.year = mytable.year - 1
  ) as hello
from mytable
order by year, animal;
The same thing with an outer join:
select
  t.year, t.animal, t.plus, prev_year.plus as hello
from mytable t
left join mytable prev_year
  on prev_year.animal = t.animal
 and prev_year.year = t.year - 1
order by t.year, t.animal;
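The difference between LAG and the exact `year - 1` lookup can be checked with a small sketch, again assuming SQLite 3.25+ via Python's `sqlite3`. The Goose rows are hypothetical (the year-5 value 8 is made up), and `previous_values` is an illustrative helper, not part of either answer.

```python
import sqlite3

# Hypothetical rows: Goose has years 3 and 5, but no row for year 4.
GOOSE_ROWS = [(3, "Goose", 6), (5, "Goose", 8)]


def previous_values(rows):
    """Return (lag_result, join_result) as lists of (year, hello) for Goose."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (year integer, animal text, plus integer)")
    con.executemany("insert into mytable values (?, ?, ?)", rows)
    # LAG: the closest earlier row for the same animal, whatever its year.
    lag_result = con.execute(
        """
        select year, lag(plus) over (partition by animal order by year)
        from mytable
        where animal = 'Goose'
        order by year
        """
    ).fetchall()
    # Outer join: only a row from exactly year - 1 qualifies, otherwise NULL.
    join_result = con.execute(
        """
        select t.year, prev_year.plus
        from mytable t
        left join mytable prev_year
          on prev_year.animal = t.animal
         and prev_year.year = t.year - 1
        where t.animal = 'Goose'
        order by t.year
        """
    ).fetchall()
    con.close()
    return lag_result, join_result


lag_result, join_result = previous_values(GOOSE_ROWS)
print(lag_result)   # [(3, None), (5, 6)]
print(join_result)  # [(3, None), (5, None)]
```

LAG reaches back over the missing year, while the join leaves year 5 empty because there is no year-4 row.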

Related

Rolling Average in SQLite

I want to calculate a rolling average in a table and keep track of the starting time of each calculated window frame.
My problem is that I expect fewer result rows than there are rows in the table, but my query returns exactly the same number of rows. I think I understand why it does not work, but I don't know the remedy.
Let's say I have a table with example data that looks like this:
+------+-------+
| Tick | Value |
+------+-------+
| 1 | 1 |
| 2 | 3 |_
| 3 | 5 |
| 4 | 7 |_
| 5 | 9 |
| 6 | 11 |_
| 7 | 13 |
| 8 | 15 |_
| 9 | 17 |
| 10 | 19 |_
+------+-------+
I want to calculate the average of every n rows, for example every two rows (see the marks above), so that I get a result of:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 3 | 6 |
| 5 | 10 |
| 7 | 14 |
| 9 | 18 |
+--------------+--------------+
I tried that with
SELECT
FIRST_VALUE(Tick) OVER (
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
) OccurredTick,
AVG(Value) OVER (
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
) ValueAverage
FROM TableName;
What I get in return is:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
| 7 | 14 |
| 8 | 16 |
| 9 | 18 |
| 10 | 19 |
+--------------+--------------+
You could use aggregation. If tick always increases by 1 with no gaps, starting at 1:
select min(tick), avg(value) avg_value
from mytable
group by cast((tick - 1) / 2 as integer)
You can change 2 to whatever group size suits you best.
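As a quick check of the grouping query, the sketch below runs it against the question's ticks in SQLite via Python's `sqlite3` (an assumption, since the question names SQLite but not a driver). The group size is passed as a bound parameter so it can be varied; the `occurred_tick` alias and the helper name `bucket_averages` are illustrative.

```python
import sqlite3


def bucket_averages(rows, size=2):
    """Average `value` over consecutive groups of `size` ticks (ticks 1, 2, 3, ...)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (tick integer, value integer)")
    con.executemany("insert into mytable values (?, ?)", rows)
    result = con.execute(
        """
        select min(tick) as occurred_tick, avg(value) as avg_value
        from mytable
        group by cast((tick - 1) / ? as integer)  -- integer division buckets ticks
        order by occurred_tick
        """,
        (size,),
    ).fetchall()
    con.close()
    return result


# The question's data: ticks 1..10 with values 1, 3, 5, ..., 19.
ROWS = [(tick, 2 * tick - 1) for tick in range(1, 11)]
print(bucket_averages(ROWS))  # [(1, 2.0), (3, 6.0), (5, 10.0), (7, 14.0), (9, 18.0)]
```

With `size=5` the same data gives two buckets, `(1, 5.0)` and `(6, 15.0)`. Note that `avg()` returns a floating-point value in SQLite.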
If the tick values are not consecutive, you can generate a gap-free sequence with row_number():
select min(tick), avg(value) avg_value
from (
select t.*, row_number() over(order by tick) rn
from mytable t
) t
group by cast((rn - 1) / 2 as integer)
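For ticks with gaps, here is a sketch of the row_number() variant under the same SQLite/`sqlite3` assumption; the gapped ticks are hypothetical data and `bucket_averages_by_row` is an illustrative helper. Grouping by the raw tick here would produce uneven buckets, whereas numbering the rows first always pairs adjacent rows.

```python
import sqlite3


def bucket_averages_by_row(rows, size=2):
    """Bucket rows by row_number() so that gaps in tick do not matter."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (tick integer, value integer)")
    con.executemany("insert into mytable values (?, ?)", rows)
    result = con.execute(
        """
        select min(tick) as occurred_tick, avg(value) as avg_value
        from (
            select t.*, row_number() over (order by tick) as rn
            from mytable t
        ) t
        group by cast((rn - 1) / ? as integer)
        order by occurred_tick
        """,
        (size,),
    ).fetchall()
    con.close()
    return result


# Hypothetical ticks with gaps (3, 4, 6, 9, 10 and 11 are missing).
GAPPY = [(1, 1), (2, 3), (5, 5), (7, 7), (8, 9), (12, 11)]
print(bucket_averages_by_row(GAPPY))  # [(1, 2.0), (5, 6.0), (8, 10.0)]
```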

Is there an easier way to find the row with a max value?

I have a schema where these two tables exist (among others):
participation
+------+--------+------------------+
| movie| person | role |
+------+--------+------------------+
| 1 | 1 | "Regisseur" |
| 1 | 1 | "Schauspieler" |
| 1 | 2 | "Schauspielerin" |
| 2 | 3 | "Regisseur" |
| 3 | 4 | "Regisseur" |
| 3 | 5 | "Schauspieler" |
| 3 | 6 | "Schauspieler" |
| 4 | 7 | "Schauspielerin" |
| 4 | 8 | "Schauspieler" |
| 5 | 1 | "Schauspieler" |
| 5 | 8 | "Schauspieler" |
| 5 | 14 | "Schauspieler" |
+------+--------+------------------+
movie
+----+------------------------------+------+-----+
| id | title | year | fsk |
+----+------------------------------+------+-----+
| 1 | "Die Bruecke am Fluss" | 1995 | 12 |
| 2 | "101 Dalmatiner" | 1961 | 0 |
| 3 | "Vernetzt - Johnny Mnemonic" | 1995 | 16 |
| 4 | "Waehrend Du schliefst..." | 1995 | 6 |
| 5 | "Casper" | 1995 | 6 |
| 6 | "French Kiss" | 1995 | 6 |
| 7 | "Stadtgespraech" | 1995 | 12 |
| 8 | "Apollo 13" | 1995 | 6 |
| 9 | "Schlafes Bruder" | 1995 | 12 |
| 10 | "Assassins - Die Killer" | 1995 | 16 |
| 11 | "Braveheart" | 1995 | 16 |
| 12 | "Das Netz" | 1995 | 12 |
| 13 | "Free Willy 2" | 1995 | 6 |
+----+------------------------------+------+-----+
I want to get the movie with the highest number of participating people. I figured out an SQL statement that actually does this, but it looks very complicated. It looks like this:
SELECT title
FROM movie.movie
JOIN (SELECT *
FROM (SELECT Max(count_person) AS max_count_person
FROM (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons) AS
maxCountPersons
JOIN (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons
ON maxCountPersons.max_count_person =
countPersons.count_person)
AS maxPersonsmovie
ON maxPersonsmovie.movie = movie.id
The main problem is that I can't find an easier way to select the row with the highest value. If I could simply select the row with the highest count_person from the inner table without losing the information about the movie itself, this would look much simpler. Is there a way to simplify this, or is this really the easiest way to do it?
Here is a way without subqueries:
SELECT m.title
FROM movie.movie m JOIN
movie.participation p
ON m.id = p.movie
GROUP BY m.title
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
You can use LIMIT 1 instead of FETCH, if you prefer.
Note: In the event of ties, this only returns one value. That seems consistent with your question.
You can use the rank() window function to do this. Unlike LIMIT or FETCH FIRST, it returns every movie tied for the highest count:
SELECT title
FROM (SELECT m.title,rank() over(order by count(p.person) desc) as rnk
FROM movie.movie m
LEFT JOIN movie.participation p ON m.id=p.movie
GROUP BY m.title
) t
WHERE rnk=1
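The tie caveat matters for the sample data: COUNT counts participation rows, so movies 1, 3 and 5 each have 3 rows (person 1's two roles in movie 1 count twice). The sketch below compares the rank() approach with ORDER BY ... LIMIT 1, assuming SQLite 3.25+ via Python's `sqlite3`, plain table names instead of the `movie.` schema prefix, and only the first six movies. It computes the counts in an inner subquery and ranks them in an outer one, which should be equivalent to ranking on `count(p.person)` directly.

```python
import sqlite3

# A subset of the question's movie table (movie 6 has no participations).
MOVIES = [
    (1, "Die Bruecke am Fluss"), (2, "101 Dalmatiner"),
    (3, "Vernetzt - Johnny Mnemonic"), (4, "Waehrend Du schliefst..."),
    (5, "Casper"), (6, "French Kiss"),
]
PARTICIPATION = [
    (1, 1, "Regisseur"), (1, 1, "Schauspieler"), (1, 2, "Schauspielerin"),
    (2, 3, "Regisseur"), (3, 4, "Regisseur"), (3, 5, "Schauspieler"),
    (3, 6, "Schauspieler"), (4, 7, "Schauspielerin"), (4, 8, "Schauspieler"),
    (5, 1, "Schauspieler"), (5, 8, "Schauspieler"), (5, 14, "Schauspieler"),
]


def top_movies():
    """Return (all titles with rank 1, the single title picked by LIMIT 1)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table movie (id integer primary key, title text)")
    con.execute("create table participation (movie integer, person integer, role text)")
    con.executemany("insert into movie values (?, ?)", MOVIES)
    con.executemany("insert into participation values (?, ?, ?)", PARTICIPATION)
    # Count per movie first, then rank the counts; tied movies share rank 1.
    ranked = con.execute(
        """
        select title
        from (select title, rank() over (order by cnt desc) as rnk
              from (select m.title as title, count(p.person) as cnt
                    from movie m
                    left join participation p on m.id = p.movie
                    group by m.title) counts) t
        where rnk = 1
        order by title
        """
    ).fetchall()
    # ORDER BY ... LIMIT 1 keeps only one of the tied movies.
    limited = con.execute(
        """
        select m.title
        from movie m
        join participation p on m.id = p.movie
        group by m.title
        order by count(*) desc
        limit 1
        """
    ).fetchall()
    con.close()
    return [row[0] for row in ranked], [row[0] for row in limited]


ranked, limited = top_movies()
print(ranked)   # ['Casper', 'Die Bruecke am Fluss', 'Vernetzt - Johnny Mnemonic']
print(limited)  # one of the three tied titles
```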
Another option uses a subquery (in the event of ties, this also returns only one movie):
SELECT title
FROM movie.movie
WHERE id = (SELECT movie
FROM movie.participation
GROUP BY movie
ORDER BY count(*) DESC
LIMIT 1);

SQL Windowing accumulative sum with grouping

I've got a table like this:
|week_no|value|attribute|
-------------------------
| 1 | 3 | a |
| 2 | 3 | a |
| 3 | 3 | a |
| 1 | 4 | b |
| 2 | 4 | b |
| 3 | 4 | b |
I'd like to have a cumulative sum of the value column for each attribute:
|week_no|value|attribute|accum_value|
-------------------------------------
| 1 | 3 | a | 3 |
| 2 | 3 | a | 6 |
| 3 | 3 | a | 9 |
| 1 | 4 | b | 4 |
| 2 | 4 | b | 8 |
| 3 | 4 | b | 12 |
I've tried doing this with the following window function, but it doesn't seem to return the correct values:
SUM(value) OVER(ORDER BY 1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accum_value
The correct window function uses PARTITION BY attribute, so that the running sum restarts for each attribute, and orders by week_no:
SUM(value) OVER (PARTITION BY attribute ORDER BY week_no
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS accum_value
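A runnable sketch of the corrected window function, assuming SQLite 3.25+ via Python's `sqlite3` and a hypothetical table name `weekly_values` (the question does not name the table). The result reproduces the desired accum_value column: 3, 6, 9 for attribute a and 4, 8, 12 for attribute b.

```python
import sqlite3

# The question's rows as (week_no, value, attribute).
ROWS = [(1, 3, "a"), (2, 3, "a"), (3, 3, "a"), (1, 4, "b"), (2, 4, "b"), (3, 4, "b")]


def running_totals(rows):
    """Cumulative sum of value per attribute, ordered by week_no."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table weekly_values (week_no integer, value integer, attribute text)"
    )
    con.executemany("insert into weekly_values values (?, ?, ?)", rows)
    result = con.execute(
        """
        select week_no, value, attribute,
               sum(value) over (partition by attribute order by week_no
                                rows between unbounded preceding and current row)
                 as accum_value
        from weekly_values
        order by attribute, week_no
        """
    ).fetchall()
    con.close()
    return result


print(running_totals(ROWS))
```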

Looping through a table to create a new table in SQL together with GROUP BY in Postgres

Suppose a table has the following structure:
product | day | transactionid | saleprice |
------------------------------------------------ |
Apple | 1 | 239849248 | 10 |
Apple | 2 | 239834328 | 10 |
Apple | 2 | 239849249 | 10 |
Apple | 3 | 239849234 | 11 |
Banana | 1 | 239843244 | 2 |
Banana | 2 | 239843244 | 2 |
Banana | 3 | 239843244 | 3 |
Banana | 4 | 239843244 | 3 |
Orange | 1 | 239234238 | 25 |
Orange | 2 | 239234238 | 25 |
Orange | 3 | 239234238 | 25 |
Orange | 3 | 239234238 | 26 |
Orange | 3 | 239234238 | 26 |
Orange | 4 | 239234238 | 27 |
A number of products are sold every day, with multiple transactions at different prices. For each product, I am interested in a change log of MIN(saleprice) (a change log because this value rarely changes in my data). For a particular product (say Orange), the following query:
SELECT max(product), day, min(saleprice)
FROM tableabove
where product = 'Orange'
group by day
order by day asc;
gives me:
product | day | minsaleprice |
Orange | 1 | 25 |
Orange | 2 | 25 |
Orange | 3 | 25 |
Orange | 4 | 27 |
So I have what I need for a product I specify, but not in the form I need it. For Orange, for example, I only need the days when the price changed (plus day 1), which means there should be only two rows: day 1 and day 4. I also do not know how to repeat this for every product in the table to generate a new table that looks as follows:
product | day | minsaleprice |
Apple | 1 | 10 |
Apple | 3 | 11 |
Banana | 1 | 2 |
Banana | 3 | 3 |
Orange | 1 | 25 |
Orange | 4 | 27 |
Any help is appreciated. Thanks.
I think you just want lag():
select t.*
from (select t.*,
lag(saleprice) over (partition by product order by day) as prev_saleprice
from tableabove t
) t
where prev_saleprice is null or prev_saleprice <> saleprice;
EDIT:
If you only want changes day by day, then the same idea works with an additional aggregation:
select t.*
from (select t.product, t.day, min(t.saleprice) as min_saleprice,
             lag(min(t.saleprice)) over (partition by t.product order by t.day) as prev_min_saleprice
      from tableabove t
      group by t.product, t.day
     ) t
where prev_min_saleprice is null or prev_min_saleprice <> min_saleprice;
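To check the day-by-day version, here is a sketch on the question's data, assuming SQLite 3.25+ via Python's `sqlite3` instead of Postgres. It computes the daily minimum in an inner subquery and applies LAG in an outer one, which should be equivalent to `lag(min(saleprice))` in the answer; the helper name `price_changes` is illustrative.

```python
import sqlite3

# The question's rows as (product, day, transactionid, saleprice).
ROWS = [
    ("Apple", 1, 239849248, 10), ("Apple", 2, 239834328, 10),
    ("Apple", 2, 239849249, 10), ("Apple", 3, 239849234, 11),
    ("Banana", 1, 239843244, 2), ("Banana", 2, 239843244, 2),
    ("Banana", 3, 239843244, 3), ("Banana", 4, 239843244, 3),
    ("Orange", 1, 239234238, 25), ("Orange", 2, 239234238, 25),
    ("Orange", 3, 239234238, 25), ("Orange", 3, 239234238, 26),
    ("Orange", 3, 239234238, 26), ("Orange", 4, 239234238, 27),
]


def price_changes(rows):
    """Return (product, day, min_saleprice) for day 1 and each day the daily minimum changes."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table tableabove "
        "(product text, day integer, transactionid integer, saleprice integer)"
    )
    con.executemany("insert into tableabove values (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        select product, day, min_saleprice
        from (select daily.*,
                     lag(min_saleprice) over (partition by product order by day)
                       as prev_min_saleprice
              from (select product, day, min(saleprice) as min_saleprice
                    from tableabove
                    group by product, day) daily) t
        where prev_min_saleprice is null or prev_min_saleprice <> min_saleprice
        order by product, day
        """
    ).fetchall()
    con.close()
    return result


print(price_changes(ROWS))
```

It returns exactly the six rows of the desired table: Apple on days 1 and 3, Banana on days 1 and 3, Orange on days 1 and 4.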
Following Gordon Linoff's guidance, I was able to write the query as follows:
SELECT table2.*
FROM (SELECT table1.*,
             lag(table1.minsaleprice) OVER (PARTITION BY product ORDER BY day) AS prev_price
      FROM (SELECT product, day, MIN(saleprice) AS minsaleprice
            FROM tableabove
            GROUP BY product, day) AS table1
     ) AS table2
WHERE prev_price IS NULL OR prev_price <> minsaleprice;

Oracle rank function issue

I am experiencing an issue with Oracle analytic functions.
I want the rank to be assigned sequentially, but in a cyclic fashion, and the ranking should happen within a group.
Say I have 10 groups.
Within each group, the rows must be ranked from 1 up to 9. After 9, the rank must start again from 1 and continue for however many rows remain.
emp id  date1       date2       Rank
123     13/6/2012   13/8/2021   1
123     14/2/2012   12/8/2014   2
...
123     9/10/2013   12/12/2015  9
123     16/10/2013  15/10/2013  1
123     16/3/2014   15/9/2015   2
In the example above, I have split the ranking for the rows of empid 123 into two subgroups: rows 1 to 9 are numbered sequentially as one group, and for the remaining rows the rank starts again from 1. How can I achieve this with Oracle's ranking functions?
As per the suggestion from Egor Skriptunoff above (to restart the numbering for each employee, add partition by empid to both over() clauses):
select
  empid, date1, date2
, row_number() over(order by date1, date2) as rn
, mod(row_number() over(order by date1, date2)-1, 9)+1 as ranked
from yourtable
Example result:
| empid | date1 | date2 | rn | ranked |
|-------|----------------------|----------------------|----|--------|
| 72232 | 2016-10-26T00:00:00Z | 2017-03-07T00:00:00Z | 1 | 1 |
| 04365 | 2016-11-03T00:00:00Z | 2017-07-29T00:00:00Z | 2 | 2 |
| 79203 | 2016-12-15T00:00:00Z | 2017-05-16T00:00:00Z | 3 | 3 |
| 68638 | 2016-12-18T00:00:00Z | 2017-02-08T00:00:00Z | 4 | 4 |
| 75784 | 2016-12-24T00:00:00Z | 2017-11-18T00:00:00Z | 5 | 5 |
| 72836 | 2016-12-24T00:00:00Z | 2018-09-10T00:00:00Z | 6 | 6 |
| 03679 | 2017-01-24T00:00:00Z | 2017-10-14T00:00:00Z | 7 | 7 |
| 43527 | 2017-02-12T00:00:00Z | 2017-01-15T00:00:00Z | 8 | 8 |
| 03138 | 2017-02-26T00:00:00Z | 2017-01-30T00:00:00Z | 9 | 9 |
| 89758 | 2017-03-29T00:00:00Z | 2018-04-12T00:00:00Z | 10 | 1 |
| 86377 | 2017-04-14T00:00:00Z | 2018-10-07T00:00:00Z | 11 | 2 |
| 49169 | 2017-04-28T00:00:00Z | 2017-04-21T00:00:00Z | 12 | 3 |
| 45523 | 2017-05-03T00:00:00Z | 2017-05-07T00:00:00Z | 13 | 4 |
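The answer's query numbers all rows together, while the question asks for the cycle to restart within each group. Here is a sketch that adds `partition by empid`, assuming SQLite 3.25+ via Python's `sqlite3` instead of Oracle. It uses the `%` operator because SQLite only provides `mod()` in builds compiled with its optional math functions; in Oracle you would keep `mod()`. The employee ids and dates are hypothetical and stored as ISO-8601 text so that they sort chronologically.

```python
import sqlite3


def cyclic_ranks(rows, cycle=9):
    """Number rows per empid by (date1, date2), restarting at 1 after `cycle` rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table yourtable (empid integer, date1 text, date2 text)")
    con.executemany("insert into yourtable values (?, ?, ?)", rows)
    result = con.execute(
        """
        select empid,
               row_number() over (partition by empid order by date1, date2) as rn,
               (row_number() over (partition by empid order by date1, date2) - 1) % ? + 1
                 as ranked
        from yourtable
        order by empid, rn
        """,
        (cycle,),
    ).fetchall()
    con.close()
    return result


# Hypothetical data: employee 123 has 11 rows, employee 456 has 3.
ROWS = [(123, f"2012-01-{d:02d}", "2015-01-01") for d in range(1, 12)]
ROWS += [(456, f"2013-05-{d:02d}", "2015-01-01") for d in range(1, 4)]
ranks = cyclic_ranks(ROWS)
print([r[2] for r in ranks if r[0] == 123])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
print([r[2] for r in ranks if r[0] == 456])  # [1, 2, 3]
```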