Unable to use the LAG function correctly in SQL

I have created a table from multiple tables like this:
Week | Cid | CustId | L1
10 | 1 | 1 | 2
10 | 2 | 1 | 2
10 | 5 | 1 | 2
10 | 4 | 1 | 1
10 | 3 | 2 | 1
4 | 6 | 1 | 2
4 | 7 | 1 | 2
I want the output as:
Repeat
0
1
1
0
0
0
1
So, basically, what I want is: for each week, if a person (custid) comes in again with the same L1, the value in the Repeat column should become 1, otherwise 0. For example, in rows 2 and 3, custid 1 came with L1=2 again, so those rows get 1 in the "Repeat" column; however, in row 4, custid 1 came with L1=1 for the first time, so it gets the value 0.
By the way, the table isn't ordered (as I've shown).
I'm trying to do it as follows:
select t.*,
lag(0, 1, 0) over (partition by week, custid, L1 order by cid) as repeat
from
table;
But this is not giving the expected output; it returns an empty result.

I think you need a case, but I would use row_number() for this:
select t.*,
       (case when row_number() over (partition by week, custid, l1 order by cid) = 1
             then 0 else 1
        end) as repeat
from mytable t;
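A quick way to check this against the sample data (table name mytable and integer types are assumed):

create table mytable (week int, cid int, custid int, l1 int);
insert into mytable values
  (10, 1, 1, 2), (10, 2, 1, 2), (10, 5, 1, 2), (10, 4, 1, 1),
  (10, 3, 2, 1), (4, 6, 1, 2), (4, 7, 1, 2);

Running the query above with a final "order by week desc, custid, l1 desc, cid" reproduces the 0, 1, 1, 0, 0, 0, 1 Repeat column from the question.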

This can also be computed without window functions, using a self-join:
SELECT a.week, a.cid, a.custid, a.l1,
       CASE WHEN b IS NULL THEN 1 ELSE 0 END AS repeat
FROM mytable a NATURAL LEFT JOIN
     (SELECT week, min(cid) AS cid, custid, l1
      FROM mytable
      GROUP BY week, custid, l1) b
ORDER BY week DESC, custid, l1 DESC, cid;
Here b holds the first (lowest cid) row of each (week, custid, l1) group, so any row that fails to match it is a repeat; b IS NULL tests the whole joined row, which PostgreSQL supports.

It can be done simply by using count(*) as an analytic function. No CASE expression or self-join is needed, and the query is portable across databases that support analytic functions:
SELECT cust.*,
       least(count(*) OVER (PARTITION BY Week, CustId, L1 ORDER BY Cid
                            ROWS UNBOUNDED PRECEDING) - 1, 1) AS repeat
FROM cust
ORDER BY Week DESC, custId, L1 DESC;
Executing the query on your data results in the following output (the last column is the repeat column):
Week | Cid | CustId | L1 | repeat
10   | 1   | 1      | 2  | 0
10   | 2   | 1      | 2  | 1
10   | 5   | 1      | 2  | 1
10   | 4   | 1      | 1  | 0
10   | 3   | 2      | 1  | 0
4    | 6   | 1      | 2  | 0
4    | 7   | 1      | 2  | 1
Tested on Oracle 11g and PostgreSQL 9.4. Note that the final ORDER BY is optional. See the Oracle Language Reference, Analytic Functions, for more details.
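The LAG approach from the question can also be made to work. The original attempt lags the constant 0, so it returns 0 for every row and can never flag a repeat. Lagging an actual column and testing for NULL does the job; a minimal sketch (table name mytable assumed):

select t.*,
       (case when lag(cid) over (partition by week, custid, l1 order by cid) is null
             then 0 else 1
        end) as repeat
from mytable t;

The first row of each (week, custid, l1) partition has no previous cid, so it gets 0; every subsequent row is a repeat and gets 1.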

Related

SQL: assign a unique key to groups having a particular pattern

Hi, I am trying to group data based on a particular pattern.
I have a table with two columns, as below:
Name rollingsum
A 5
A 10
A 0
A 5
A 0
B 6
B 0
I need to generate a key column that increments only after a rollingsum of 0 is encountered, as given below:
Name rollingsum key
A 5 1
A 10 1
A 0 1
A 5 2
A 0 2
B 6 3
B 0 3
I am using Postgres. I tried to increment a variable in a CASE statement, as below:
Declare a int;
a:=1;
........etc
Case when rolling sum =0 then a:=a+1 else a end as key
But I am getting an error near ':'.
Thanks in advance for all the help.
You need an ordering column, because the results depend on the ordering of the rows -- and SQL tables represent unordered sets.
Then do a cumulative sum of the 0 counts from the end of the data. That is in reverse order, so subtract it from the total:
select t.*,
       (1 + sum( (rollingsum = 0)::int ) over () -
            sum( (rollingsum = 0)::int ) over (order by ordercol desc)
       ) as key
from t;
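A quick trace against the sample rows (ordercol is an assumed ordering column; there are three zeros in total, so the full-window sum is 3):

-- name  rollingsum  zeros from this row to the end  key = 1 + 3 - zeros
-- A     5           3                               1
-- A     10          3                               1
-- A     0           3                               1
-- A     5           2                               2
-- A     0           2                               2
-- B     6           1                               3
-- B     0           1                               3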
Assuming that you have a column called id to order the rows, here is one option using a cumulative count and a window frame:
select name, rollingsum,
       1 + count(*) filter (where rollingsum = 0) over (
             order by id
             rows between unbounded preceding and 1 preceding
           ) as key
from mytable;
Demo on DB Fiddle:
name | rollingsum | key
:--- | ---------: | --:
A | 5 | 1
A | 10 | 1
A | 0 | 1
A | 5 | 2
A | 0 | 2
B | 6 | 3
B | 0 | 3

Is it possible to do projection in Google BigQuery?

I have a query (due to restrictions, it is using Legacy SQL) that produces a column that is the rolling average of the last 3 days of sales (excluding today):
SELECT
  id, date, sales,
  AVG(sales) OVER (PARTITION BY id ORDER BY date
                   RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING) AS projected_sale
FROM tableA
tableA
+-------+---------+---------+
| id | date | sales |
+-------+---------+---------+
| 1 | 01-01-17| 5 |
| 1 | 01-02-17| 6 |
| 1 | 01-03-17| 7 |
| 1 | 01-04-17| 10 |
+-------+---------+---------+
The query produces
+-------+---------+---------+--------------+
| id | date | sales |projected_sale|
+-------+---------+---------+--------------+
| 1 | 01-01-17| 5 | . |
| 1 | 01-02-17| 6 | . |
| 1 | 01-03-17| 7 | . |
| 1 | 01-04-17| 10 | 6 |
+-------+---------+---------+--------------+
Since the average excludes the current row, theoretically I can project the sale for 01-05-17 using the sales from 01-02 to 01-04. However, since tableA doesn't actually have an entry with date 01-05-17, my query stops at 01-04-17 as the last row.
Is what I am trying to do possible in Big Query?
Thank you
First, I think using RANGE is incorrect here; it should be ROWS instead.
Anyway, below is an example for BigQuery Legacy SQL that demonstrates how to achieve the result you need.
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM tableA, (SELECT 1 id, '01-05-17' dt, 0 sales)
As you can see, here you are simply adding (via UNION ALL, written as a comma in Legacy SQL) that missing day. Of course, you can transform this so that it adds such a missing row for all ids, as sketched below.
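A minimal sketch of that transformation, assuming the projection date ('01-05-17') is known up front; it unions one placeholder row per id onto the table:

#legacySQL
SELECT
  id, dt, sales,
  AVG(sales) OVER (
    PARTITION BY id ORDER BY dt
    ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
  ) AS projected_sale
FROM tableA, (
  # one placeholder row per id, with zero sales on the projection date
  SELECT id, '01-05-17' dt, 0 sales
  FROM (SELECT id FROM tableA GROUP BY id)
)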
Nevertheless, I hope this is a good starting point for you.
You can test and play with it using the dummy data from your question:
#legacySQL
SELECT
id, dt, sales,
AVG(sales) OVER (
PARTITION BY id ORDER BY dt
ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING
) AS projected_sale
FROM (
SELECT * FROM
(SELECT 1 id, '01-01-17' dt, 5 sales),
(SELECT 1 id, '01-02-17' dt, 6 sales),
(SELECT 1 id, '01-03-17' dt, 7 sales),
(SELECT 1 id, '01-04-17' dt, 10 sales)
) tableA, (SELECT 1 id, '01-05-17' dt, 0 sales)
with the result as:
Row id dt sales projected_sale
1 1 01-01-17 5 null
2 1 01-02-17 6 5.0
3 1 01-03-17 7 5.5
4 1 01-04-17 10 6.0
5 1 01-05-17 0 7.0

How to calculate the value of a previous row from the count of another column

I want to create an additional column that adds each row's value in the count column to the previous row's value in the sum column (a running total). Below is the query. I tried using ROLLUP, but it does not serve the purpose.
select to_char(register_date,'YYYY-MM') as "registered_in_month"
,count(*) as Total_count
from CMSS.USERS_PROFILE a
where a.pcms_db != '*'
group by (to_char(register_date,'YYYY-MM'))
order by to_char(register_date,'YYYY-MM')
This is what I get:
registered_in_month TOTAL_COUNT
-------------------------------------
2005-01 1
2005-02 3
2005-04 8
2005-06 4
But what I would like to display is below, including the months which have a count of 0:
registered_in_month TOTAL_COUNT SUM
------------------------------------------
2005-01 1 1
2005-02 3 4
2005-03 0 4
2005-04 8 12
2005-05 0 12
2005-06 4 16
To include missing months in your result, first you need a complete list of months. To do that, find the earliest and latest months and then use a hierarchical query to generate the complete list.
SQL Fiddle
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
)
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date;
Once you have all the months, you can outer join them to your table. To get the cumulative sum, simply add up the counts using SUM as an analytic function.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
),
y(all_months) as (
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date
)
select to_char(a.all_months,'yyyy-mm') registered_in_month,
count(b.register_date) total_count,
sum(count(b.register_date)) over (order by a.all_months) "sum"
from y a left outer join users_profile b
on a.all_months = trunc(b.register_date,'month')
group by a.all_months
order by a.all_months;
Output:
| REGISTERED_IN_MONTH | TOTAL_COUNT | SUM |
|---------------------|-------------|-----|
| 2005-01 | 1 | 1 |
| 2005-02 | 3 | 4 |
| 2005-03 | 0 | 4 |
| 2005-04 | 8 | 12 |
| 2005-05 | 0 | 12 |
| 2005-06 | 4 | 16 |

Grouping SQL Results based on order

I have a table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers so that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient, since the table I need to do this on has 52 million rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row, and set a flag to 1 if the current row is the start of a new group, or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index, the plan for this has one scan against YourTable and avoids any sort operations.
If the ids are truly sequential (and RowNumber also increases by 1 within each group), you can do:
select t.*,
       (id - rowNumber) as grp
from t;
The difference id - rowNumber is constant within a group and jumps whenever RowNumber resets, so it identifies each group (though the grp values are not consecutive).
You can also use a recursive CTE:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle
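One caveat, assuming SQL Server: recursive CTEs are capped at 100 recursion levels by default, so on a 52-million-row table the final SELECT needs the limit lifted, and the row-at-a-time recursion will still be far slower than the window-function approaches:

SELECT *
FROM cte
OPTION (MAXRECURSION 0); -- 0 removes the default limit of 100 recursion levels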
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
  select *,
         (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
  from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

SQL AVG(COUNT(*))?

I'm trying to find out the average number of times a value appears in a column, group it based on another column and then perform a calculation on it.
I have 3 tables, a little like this:
DVD
ID | NAME
1 | 1
2 | 1
3 | 2
4 | 3
COPY
ID | DVDID
1 | 1
2 | 1
3 | 2
4 | 3
5 | 1
LOAN
ID | DVDID | COPYID
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
4 | 3 | 4
5 | 1 | 5
6 | 1 | 5
7 | 1 | 5
8 | 1 | 2
etc
Basically, I'm trying to find all the copy ids that appear in the loan table FEWER times than the average for all copies of that DVD.
So in the example above, copy 5 of dvd 1 appears 3 times, copy 2 twice, and copy 1 once, so the average for that DVD is 2. I want to list all the copies of that (and each other) dvd that appear less than that number of times in the Loan table.
I hope that makes a bit more sense...
Thanks
Similar to dotjoe's solution, but using an analytic function to avoid the extra join. May be more or less efficient.
with
loan_copy_total as
(
select dvdid, copyid, count(*) as cnt
from loan
group by dvdid, copyid
),
loan_copy_avg as
(
select dvdid, copyid, cnt, avg(cnt) over (partition by dvdid) as copy_avg
from loan_copy_total
)
select *
from loan_copy_avg lca
where cnt <= copy_avg;
This should work in Oracle:
create view dvd_count_view as
select dvdid, count(1) as howmanytimes
from loan
group by dvdid;

select avg(howmanytimes) from dvd_count_view;
Untested...
with
loan_copy_total as
(
select dvdid, copyid, count(*) as cnt
from loan
group by dvdid, copyid
),
loan_copy_avg as
(
select dvdid, avg(cnt) as copy_avg
from loan_copy_total
group by dvdid
)
select lct.*, lca.copy_avg
from loan_copy_avg lca
inner join loan_copy_total lct on lca.dvdid = lct.dvdid
and lct.cnt <= lca.copy_avg;
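Note that both queries use <=, while the question asks for copies that appear strictly FEWER times than the average; if that is the intent, change the final comparison to <. A hand trace against the sample LOAN rows:

-- dvd 1: copy 1 -> 1 loan, copy 2 -> 2 loans, copy 5 -> 3 loans; avg = 2
--        cnt < avg keeps only copy 1; cnt <= avg keeps copies 1 and 2
-- dvd 2: copy 3 -> 1 loan; avg = 1 (kept only with <=)
-- dvd 3: copy 4 -> 1 loan; avg = 1 (kept only with <=)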