ETL query needs some changes to get it right

Hello guys, I have a query that works, but it stops working as expected once I remove two filters (the two WHERE clauses at the end), and those filters still have to be removed from the query.
I have accounts 1000001, 1000002, 1000003, 1000004 and 1000005.
Without the filters I only get account 1000005. I'm pretty sure it's about the window MAX function, but I still can't fix it.
I want to get all the values for all the accounts.
SELECT a12.month_id,
a12.populate_id AS account_id,
LAST_VALUE(current_bal IGNORE NULLS) OVER
(PARTITION BY Populate_id ORDER BY date_id ASC ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avg_dly_bal
FROM (SELECT TO_CHAR(date_id, 'YYYYMM') AS month_id,
date_id,
account_id AS "account_id",
MAX(account_id) OVER (PARTITION by TO_CHAR(date_id, 'YYYYMM')) as populate_id,
current_bal
FROM (SELECT t.date_id, ad.account_id, ad.current_bal
FROM timedate t
FULL OUTER JOIN (SELECT src_extract_dt, account_id, current_bal
FROM account_dly
WHERE account_id = 1000001) ad
on t.date_id = ad.src_extract_dt
WHERE TO_CHAR(date_id, 'YYYYMM') = '201908'
order by t.date_id)) a12;
https://i.stack.imgur.com/xphVh.png

Related

SQL Server LEAD function

-- FIRST LOGIN DATE
WITH CTE_FIRST_LOGIN AS
(
SELECT
PLAYER_ID, EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE ASC) AS RN
FROM
ACTIVITY
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS
(
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM
ACTIVITY A
JOIN
CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE
NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY
A.PLAYER_ID
)
-- FRACTION
SELECT
NULLIF(ROUND(1.00 * COUNT(CTE_CONSEC.PLAYER_ID) / COUNT(DISTINCT PLAYER_ID), 2), 0) AS FRACTION
FROM
ACTIVITY
JOIN
CTE_CONSEC_PLAYERS CTE_CONSEC ON CTE_CONSEC.PLAYER_ID = ACTIVITY.PLAYER_ID
I am getting the following error when I run this query.
[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'NEXT_DATE'. (207) (SQLExecDirectW)
This is LeetCode medium question 550, Game Play Analysis IV. I wanted to know why it can't identify the column NEXT_DATE here. What am I missing? Thanks!

The problem is in this CTE:
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY A.PLAYER_ID
)
Note that you are creating NEXT_DATE as a column alias in this CTE but also referring to it in the WHERE clause of the same CTE. This is invalid because, by SQL's clause-ordering rules, WHERE is evaluated before the SELECT list, so the NEXT_DATE alias does not exist yet when WHERE runs. Within the same query level an alias only becomes visible in the ORDER BY clause, which is the last clause evaluated in a SQL query or subquery. You don't have an ORDER BY clause in this subquery, so technically the NEXT_DATE column alias only exists for [sub]queries that both come after and reference your CTE_CONSEC_PLAYERS CTE.
To fix this you'd probably want two CTEs like this (untested):
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS_pre AS (
SELECT
A.PLAYER_ID,
C.RN,
A.EVENT_DATE,
-- partition by player so LEAD returns that player's next login date
LEAD(A.EVENT_DATE,1) OVER (PARTITION BY A.PLAYER_ID ORDER BY A.EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
-- join on both columns so each activity row matches exactly one numbered row
-- (assumes at most one row per player per date)
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID AND A.EVENT_DATE = C.EVENT_DATE
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
MAX(NEXT_DATE) AS NEXT_DATE
FROM CTE_CONSEC_PLAYERS_pre
WHERE NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE) AND RN = 1
GROUP BY PLAYER_ID
)

You gave every table an alias (for example JOIN CTE_FIRST_LOGIN C has the alias C), and every column access is via the alias. You need to add the correct alias from the correct table to NEXT_DATE.

Your primary issue is that NEXT_DATE is the result of a window function, and therefore cannot be referred to in the WHERE clause of the same query level: in SQL's order of operations, window functions are computed after WHERE.
But it seems this query is over-complicated.
The problem to be solved appears to be: how many players logged in the day after they first logged in, as a percentage of all players.
This can be done in a single pass (no joins), by using multiple window functions together:
WITH CTE_FIRST_LOGIN AS (
SELECT
PLAYER_ID,
EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
-- if EVENT_DATE is a datetime and can have multiple per day then group by CAST(EVENT_DATE AS date) first
LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NextDate
FROM ACTIVITY
),
BY_PLAYERS AS (
SELECT
c.PLAYER_ID,
SUM(CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
THEN 1 END) AS IsConsecutive
FROM CTE_FIRST_LOGIN AS c
GROUP BY c.PLAYER_ID
)
SELECT ROUND(
1.00 *
COUNT(c.IsConsecutive) /
NULLIF(COUNT(*), 0)
,2) AS FRACTION
FROM BY_PLAYERS AS c;
You could theoretically merge BY_PLAYERS into the outer query and use COUNT(DISTINCT ...), but splitting them feels cleaner.
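The single-pass idea can be sketched end to end. This is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), the sample rows are made up, and SQLite's date(x, '+1 day') stands in for SQL Server's DATEADD. The window results are computed in one CTE and only filtered in a later one, which is what avoids the "Invalid column name" error.

```python
import sqlite3

# In-memory database with a tiny, invented ACTIVITY table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (player_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [
        (1, "2016-03-01"),
        (1, "2016-03-02"),  # player 1 returns the day after the first login
        (2, "2017-06-25"),
        (3, "2016-03-02"),
        (3, "2018-07-03"),
    ],
)

query = """
WITH logins AS (
    -- window functions are computed here ...
    SELECT player_id,
           event_date,
           ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY event_date) AS rn,
           LEAD(event_date) OVER (PARTITION BY player_id ORDER BY event_date) AS next_date
    FROM activity
),
by_player AS (
    -- ... and only referenced one level later, where the aliases exist
    SELECT player_id,
           MAX(CASE WHEN rn = 1 AND next_date = date(event_date, '+1 day')
                    THEN 1 ELSE 0 END) AS is_consecutive
    FROM logins
    GROUP BY player_id
)
SELECT ROUND(1.0 * SUM(is_consecutive) / COUNT(*), 2) FROM by_player
"""

fraction = conn.execute(query).fetchone()[0]
print(fraction)
```

With these rows only player 1 logs in on the day after the first login, so one of three players qualifies.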

Sum for a rolling total

I have the following query:
select b.month_date,total_signups,active_users from
(
SELECT date_trunc('month',confirmed_at) as month_date
, count(distinct id) as total_signups
FROM follower.users
WHERE confirmed_at::date >= dateadd(day,-90,getdate())::date
and (deleted_at is null or deleted_at > date_trunc('month',confirmed_at))
group by 1
) a ,
(
SELECT date_trunc('month', inv.created_at) AS month_date
,COUNT(DISTINCT em.user_id) AS active_users
FROM follower.invitees inv
INNER JOIN follower.events em
ON inv.event_id = em.event_id
where inv.created_at::date >= dateadd(day,-90,getdate())::date
GROUP BY 1
) b
where a.month_date=b.month_date
This returns three columns: month date, total signups and active users. What I need is a rolling total of signups in a fourth column. I've tried OVER and PARTITION BY with no luck. Could someone help? I'd appreciate it very much.

Try adding this column definition to your first Select:
SUM(total_signups)
OVER (ORDER BY b.month_date ASC rows between unbounded preceding and current row)
AS running_total
Here's a mini-demo
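As a separate, self-contained sketch of the same running total: it uses SQLite through Python's sqlite3 module rather than the asker's database, and the table name monthly and its rows are invented stand-ins for the result of the joined subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the per-month result of the original query.
conn.execute("CREATE TABLE monthly (month_date TEXT, total_signups INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [("2019-06-01", 10), ("2019-07-01", 25), ("2019-08-01", 5)],
)

rows = conn.execute(
    """
    SELECT month_date,
           total_signups,
           -- running sum from the first month up to the current row
           SUM(total_signups) OVER (
               ORDER BY month_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM monthly
    ORDER BY month_date
    """
).fetchall()

for row in rows:
    print(row)
```

Each row's running_total is its own total_signups plus everything from earlier months.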

How to generate session_id by sql?

My tracking system does not generate session IDs.
I have user_id and event_date_time.
I need a new session_id for each user session that starts 30 minutes or more after that user's last event_date_time.
My final goal is to calculate the median session time.
I tried to generate session_id=1 and session_id=2 once event_date_time-next_event_time>30 and guid=guid, but I'm stuck from here:
select a.*,
case when (a.next_event_date-a.event_date)*24*60<30 and userID=next_userID
then 1
when (a.next_event_date-a.event_date)*24*60>=30 and userID=next_userID then
2
end session_id
from
(select f.userID,
lead(f.userID) over (partition by f.guid order by f.event_date)
next_guid,
f.event_date,
lead(f.event_date) over (partition by f.guid order by f.event_date)
next_event_date
from event_table f
)a
where next_event_date is not null

If I understood correctly, you could generate IDs this way:
select id, guid, event_date,
sum(chg) over (partition by guid order by event_date) session_id
from (
select id, guid, event_date,
case when lag(guid) over (partition by guid order by event_date) = guid
and 24 * 60 * (event_date -lag(event_date)
over (partition by guid order by event_date) ) < 30
then 0 else 1
end chg
from event_table ) a
dbfiddle demo
Compare neighbouring rows: if the guids differ or the time difference is 30 minutes or more, assign 1, otherwise assign 0. Then sum these values analytically (a running SUM per guid) to get the session ID.
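The flag-and-sum approach can be tried on a few rows. This is a sketch under assumptions, not the linked demo: it uses SQLite through Python's sqlite3 module, the sample events are invented, and because SQLite stores the timestamps as text the gap is measured in epoch seconds via strftime('%s', ...) instead of Oracle date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_table (guid TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO event_table VALUES (?, ?)",
    [
        ("u1", "2019-08-01 10:00:00"),
        ("u1", "2019-08-01 10:10:00"),  # 10 minutes later: same session
        ("u1", "2019-08-01 10:45:00"),  # 35 minutes later: new session
        ("u2", "2019-08-01 09:00:00"),
        ("u2", "2019-08-01 09:30:00"),  # exactly 30 minutes: new session
    ],
)

rows = conn.execute(
    """
    SELECT guid,
           event_date,
           -- the running sum of the flags numbers the sessions per guid
           SUM(chg) OVER (PARTITION BY guid ORDER BY event_date) AS session_id
    FROM (
        SELECT guid,
               event_date,
               -- flag 0 when the previous event of this guid is under 30 minutes away;
               -- the first event has no predecessor (NULL), so it falls to ELSE 1
               CASE WHEN CAST(strftime('%s', event_date) AS INTEGER)
                         - CAST(strftime('%s', LAG(event_date) OVER (
                               PARTITION BY guid ORDER BY event_date)) AS INTEGER)
                         < 1800
                    THEN 0 ELSE 1 END AS chg
        FROM event_table
    )
    ORDER BY guid, event_date
    """
).fetchall()

for row in rows:
    print(row)
```

Session lengths for the median can then be taken as MAX(event_date) minus MIN(event_date) per (guid, session_id).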

I think you're on the right track using lead or lag. My recommendation would be to break this into steps and create a temp table to work against:
With the first query, assign every record its own unique ID, either a sequence number or GUID. You could also capture some of the lagged data in this step.
With a second query, find the overlaps (< 30 minutes) and make the overlapping records all the same -- either the same as the earliest or latest in that grouping, doesn't matter as long as it's consistent.
Something like this:
create table events_temp as (
select f.*,
row_number() over (partition by f.userID order by f.event_date) as user_row,
lag(f.userID) over (partition by f.userID order by f.event_date) as prev_userID,
lag(f.event_date) over (partition by f.userID order by f.event_date) as prev_event_date
from event_table f
order by f.userId, f.event_date
);
-- Rows within 30 minutes of the previous event take the previous row's number.
-- This links a row only to its immediate predecessor; a chain of three or more
-- events needs a further pass to collapse to a single ID.
select a.*,
case when a.prev_userID = a.userID
and 24 * 60 * (a.event_date - a.prev_event_date) < 30
then lag(a.user_row) over (partition by a.userID order by a.user_row)
else a.user_row
end as session_id
from events_temp a

SQL - values from two rows into new two rows

I have a query that gives the summed quantity of items on working days. On weekends and holidays the quantity and item values are empty.
On those empty days I would like to show the last known quantity and item.
My query is like this:
select a.dt,b.zaliha as quantity,b.artikal as item
from
(select to_date('01-01-2017', 'DD-MM-YYYY') + rownum -1 dt
from dual
connect by level <= to_date(sysdate) - to_date('01-01-2017', 'DD-MM-YYYY') + 1
order by 1)a
LEFT OUTER JOIN
(select kolicina,sum(kolicina)over(partition by artikal order by datum_do) as zaliha,datum_do,artikal
from
(select sum(vv.kolicinaulaz-vv.kolicinaizlaz)kolicina,vz.datum as datum_do,vv.artikal
from vlpzaglavlja vz, vlpvarijante vv
where vz.id=vv.vlpzaglavlje
and vz.orgjed='01006'
and vv.skladiste='01006'
and vv.artikal in (3069,6402)
group by vz.datum,vv.artikal
order by vv.artikal,vz.datum asc)
order by artikal,datum_do asc)b
on a.dt=b.datum_do
where a.dt between to_date('12102017','ddmmyyyy') and to_date('16102017','ddmmyyyy')
order by a.dt
and my output is like this:
and I want this:

In short, if quantity is null use lag(... ignore nulls) and coalesce or nvl:
select dt, item,
nvl(quantity, lag(quantity ignore nulls) over (partition by item order by dt))
from t
order by dt, item
Here is the full query, I cannot test it, but it is something like:
with t as (
select a.dt, b.zaliha as quantity, b.artikal as item
from (
select date '2017-10-10' + rownum - 1 dt
from dual
connect by date '2017-10-10' + rownum - 1 <= date '2017-10-16' ) a
left join (
select kolicina, datum_do, artikal,
sum(kolicina) over(partition by artikal order by datum_do) as zaliha
from (
select sum(vv.kolicinaulaz-vv.kolicinaizlaz) kolicina,
vz.datum as datum_do, vv.artikal
from vlpzaglavlja vz
join vlpvarijante vv on vz.id = vv.vlpzaglavlje
where vz.orgjed = '01006' and vv.skladiste='01006'
and vv.artikal in (3069,6402)
group by vz.datum, vv.artikal)) b
on a.dt = b.datum_do)
select *
from (
select dt, item,
nvl(quantity, lag(quantity ignore nulls)
over (partition by item order by dt)) qty
from t)
where dt >= date '2017-10-12'
order by dt, item
There are several issues in your query, major and minor:
in the date generator (subquery a) you select dates from a long period, January to September, then join with the main tables and sum the data, and only then select a small part. Why not filter the dates first?
to_date(sysdate) is redundant, sysdate is already a date,
use ANSI joins,
do not use ORDER BY in subqueries, it has no impact, only the last ordering matters,
use date literals when defining dates, they are more readable.
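The carry-forward step can also be sketched on an engine without IGNORE NULLS. This is an assumption-heavy illustration: it uses SQLite through Python's sqlite3 module, which as far as I know does not accept IGNORE NULLS, so it swaps in a different technique, a running COUNT(quantity) that puts every empty day in the same group as the last non-empty day before it. The table t and its rows are invented, and each gap row is assumed to already carry its item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical calendar-joined result: weekend rows have no quantity.
conn.execute("CREATE TABLE t (dt TEXT, item INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2017-10-12", 3069, 40),
        ("2017-10-13", 3069, 35),
        ("2017-10-14", 3069, None),  # Saturday
        ("2017-10-15", 3069, None),  # Sunday
        ("2017-10-16", 3069, 50),
    ],
)

rows = conn.execute(
    """
    SELECT dt,
           item,
           -- each group holds exactly one non-null quantity, so MAX returns it
           MAX(quantity) OVER (PARTITION BY item, grp) AS qty
    FROM (
        SELECT dt, item, quantity,
               -- COUNT skips NULLs, so the counter only advances on days with data
               COUNT(quantity) OVER (PARTITION BY item ORDER BY dt) AS grp
        FROM t
    )
    ORDER BY dt, item
    """
).fetchall()

for row in rows:
    print(row)
```

On Oracle the nvl(quantity, lag(quantity ignore nulls) over (...)) form shown above is shorter; the grouping trick is only useful where IGNORE NULLS is unavailable.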

SQL Oracle/Aggregation query

I'm trying to run a query on Oracle. I've a table of settled payments for accounts, I have a query which pulls through the last three settled amounts, plus any amount which was written off, for any account I need this info for.
However, some of the accounts are weekly based, and for these I would like to aggregate their weekly settlements into their monthly groups. Here is the code I have so far:
SELECT *
FROM (
SELECT *
FROM (
SELECT gwod.account_id,
gwod.charge_period_start,
SUM(gwod.total_due_on_charge) total_due_on_charge,
SUM(gwod.amount_written_off) amount_written_off,
DENSE_RANK() over (PARTITION BY gwod.account_id
ORDER BY charge_period_start DESC) rownumber
FROM report.accounts_write_off gwod
WHERE account_id IN ('account_number')
GROUP BY gwod.account_id,
gwod.charge_period_start
HAVING SUM (gwod.total_due_on_charge) <> 0) t1
WHERE t1.rownumber <=3)
PIVOT (MAX(charge_period_start) charge_period,
MAX(total_due_on_charge) total_due_on_charge,
MAX(amount_written_off) amount_written_off
FOR rownumber IN (1,2,3))
ORDER BY account_id
This works perfectly except for the weekly-based accounts. For those, rather than pulling through the last three weekly amounts which were settled, i.e. 25-09-17, 18-09-17, 11-09-2017, I'd like to pull through the aggregated payments for September, August, and July.
I hope all this makes sense.

Simply change your aggregation from the current unit level (i.e., weekly) to month level with EXTRACT(month ...) in the inner query's SELECT and GROUP BY, as well as in the window's ORDER BY and the PIVOT clause:
SELECT *
FROM (
SELECT *
FROM (
SELECT gwod.account_id,
EXTRACT(month FROM gwod.charge_period_start) charge_period_month,
SUM(gwod.total_due_on_charge) total_due_on_charge,
SUM(gwod.amount_written_off) amount_written_off,
DENSE_RANK() over (PARTITION BY gwod.account_id
ORDER BY EXTRACT(month FROM gwod.charge_period_start) DESC) rownumber
FROM report.accounts_write_off gwod
WHERE account_id IN ('account_number')
GROUP BY gwod.account_id,
EXTRACT(month FROM gwod.charge_period_start)
HAVING SUM (gwod.total_due_on_charge) <> 0) t1
WHERE t1.rownumber <=3)
PIVOT (MAX(charge_period_month) charge_period,
MAX(total_due_on_charge) total_due_on_charge,
MAX(amount_written_off) amount_written_off
FOR rownumber IN (1,2,3))
ORDER BY account_id
DEMO (with random data):
http://rextester.com/UJK84858
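One thing to watch, assuming the last three months can straddle a year boundary: the month number alone ranks December (12) above the following January (1) and merges the same month of different years. A sketch of grouping on year and month together is below. It uses SQLite through Python's sqlite3 module with invented rows and leaves out the PIVOT step. In Oracle the equivalent key would be something like TRUNC(charge_period_start, 'MM').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts_write_off ("
    "account_id TEXT, charge_period_start TEXT, total_due_on_charge INTEGER)"
)
# Invented weekly settlements running from October 2017 into January 2018.
conn.executemany(
    "INSERT INTO accounts_write_off VALUES (?, ?, ?)",
    [
        ("A1", "2017-10-02", 5),
        ("A1", "2017-11-06", 10),
        ("A1", "2017-11-13", 10),
        ("A1", "2017-12-04", 20),
        ("A1", "2017-12-11", 20),
        ("A1", "2018-01-01", 30),
        ("A1", "2018-01-08", 30),
    ],
)

rows = conn.execute(
    """
    SELECT account_id, charge_month, total_due, rn
    FROM (
        SELECT account_id,
               -- 'YYYY-MM' keeps the year, so January 2018 sorts after December 2017
               strftime('%Y-%m', charge_period_start) AS charge_month,
               SUM(total_due_on_charge) AS total_due,
               DENSE_RANK() OVER (
                   PARTITION BY account_id
                   ORDER BY strftime('%Y-%m', charge_period_start) DESC
               ) AS rn
        FROM accounts_write_off
        GROUP BY account_id, strftime('%Y-%m', charge_period_start)
    )
    WHERE rn <= 3
    ORDER BY account_id, rn
    """
).fetchall()

for row in rows:
    print(row)
```

The three most recent months come back newest first, with October dropped by the rn <= 3 filter.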
