Filling in missing balance and dates in table to track balance - sql

I hope you can help me with this problem. I have just started with SQL on BigQuery, so my question may seem a bit basic.
So I have a table that basically records the date and balance whenever the balance changes. It looks somewhat like this:
+------------+-----------+------+---------+
| Date | seller_ID | Name | Balance |
+------------+-----------+------+---------+
| 2020-09-10 | 1 | John | 10 |
| 2020-09-13 | 1 | John | 8 |
| 2020-09-15 | 1 | John | 6 |
+------------+-----------+------+---------+
However, I need to create a new table with the daily balances that looks like this:
+------------+-----------+------+---------+
| Date | seller_ID | Name | Balance |
+------------+-----------+------+---------+
| 2020-09-10 | 1 | John | 10 |
| 2020-09-11 | 1 | John | 10 |
| 2020-09-12 | 1 | John | 10 |
| 2020-09-13 | 1 | John | 8 |
| 2020-09-14 | 1 | John | 8 |
| 2020-09-15 | 1 | John | 6 |
+------------+-----------+------+---------+
I tried creating a separate table of all the dates between the first and final date, and then LEFT JOIN the original table with it but the resulting table isn't very helpful to draw from.
Does anyone have an idea of what to do in this case?

To fill a NULL value with the previous non-NULL value in BigQuery, you can use LAST_VALUE with IGNORE NULLS:
WITH test_table AS (
SELECT DATE '2020-09-10' AS Date, 1 AS seller_Id, 'John' AS Name, 10 AS Balance UNION ALL
SELECT '2020-09-13', 1, 'John' AS Name, 8 UNION ALL
SELECT '2020-09-15', 1, 'John' AS Name, 6
)
SELECT Date,
LAST_VALUE(seller_Id IGNORE NULLS) OVER (ORDER BY Date) AS seller_Id,
LAST_VALUE(Name IGNORE NULLS) OVER (ORDER BY Date) AS Name,
LAST_VALUE(Balance IGNORE NULLS) OVER (ORDER BY Date) AS Balance
FROM UNNEST(GENERATE_DATE_ARRAY('2020-09-10', '2020-09-15')) AS Date
LEFT JOIN test_table USING (Date)
ORDER BY Date

You can do this without using window functions for the balance. The key is to use a window function only to find the next date:
WITH t AS (
SELECT DATE '2020-09-10' AS Date, 1 AS seller_Id, 'John' AS Name, 10 AS Balance UNION ALL
SELECT '2020-09-13', 1, 'John' AS Name, 8 UNION ALL
SELECT '2020-09-15', 1, 'John' AS Name, 6
),
tt as (
SELECT t.*, LEAD(date) OVER (PARTITION BY name ORDER BY date) as next_date
FROM t
)
SELECT dte AS Date, tt.seller_Id, tt.name, tt.balance
FROM tt LEFT JOIN
UNNEST(GENERATE_DATE_ARRAY(tt.date, COALESCE(DATE_ADD(tt.next_date, INTERVAL - 1 DAY), DATE '2020-09-15'))) dte
ON true;
(Note: The ON clause is optional in this case. However, I am not a fan of having joins without ON -- unless it is a CROSS JOIN.)
This has two important advantages over the LAST_VALUE solution above. The most important is that it will work for multiple names with different time periods.
The second advantage is that it is more efficient, because it is not using window functions to fetch values from previous rows.
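To illustrate the multiple-names point, here is a minimal sketch in BigQuery Standard SQL. The second seller ('Mary') and her rows are made-up sample data, and ending each seller's series at that seller's own last change date (instead of a fixed end date) is an assumption, not something the question specifies:

```sql
-- Hypothetical sample data: two sellers with different date ranges.
WITH t AS (
  SELECT DATE '2020-09-10' AS date, 1 AS seller_id, 'John' AS name, 10 AS balance UNION ALL
  SELECT DATE '2020-09-13', 1, 'John', 8 UNION ALL
  SELECT DATE '2020-09-15', 1, 'John', 6 UNION ALL
  SELECT DATE '2020-09-12', 2, 'Mary', 20 UNION ALL
  SELECT DATE '2020-09-14', 2, 'Mary', 17
),
tt AS (
  -- For each row, find the date of the same seller's next balance change.
  SELECT t.*,
         LEAD(date) OVER (PARTITION BY seller_id ORDER BY date) AS next_date
  FROM t
)
-- Expand each row into one row per day, up to the day before the next change.
-- Assumption: a seller's last row is not extended past its own date.
SELECT dte AS date, tt.seller_id, tt.name, tt.balance
FROM tt
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(
  tt.date,
  COALESCE(DATE_SUB(tt.next_date, INTERVAL 1 DAY), tt.date)
)) AS dte
ORDER BY tt.seller_id, dte;
```

With this sample data, John's rows run from 2020-09-10 through 2020-09-15 and Mary's from 2020-09-12 through 2020-09-14, each carrying the most recent balance forward. To extend every seller to a common end date, replace the final tt.date in the COALESCE with that date.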

Related

How do I repeat a value in a dataset until the next value appears - SQL (Teradata)

I have a dataset based on products that change on certain days and some products that change value daily. However, it's possible for customers to purchase a product up until the date it changes. So when I pull the data through, it looks like this,
e.g.:
+---------+-------+------------+
| Product | Value | Date |
+---------+-------+------------+
| B | 5 | 21/05/2022 |
| A | 1 | 27/05/2022 |
| B | 2 | 28/05/2022 |
| C | 3 | 27/05/2022 |
| C | 4 | 28/05/2022 |
| A | 7 | 29/05/2022 |
| C | 5 | 29/05/2022 |
+---------+-------+------------+
I am trying to get it into this format:
+------------+---+---+---+
| Date | A | B | C |
+------------+---+---+---+
| 27/05/2022 | 1 | 5 | 3 |
| 28/05/2022 | 1 | 2 | 4 |
| 29/05/2022 | 7 | 2 | 5 |
+------------+---+---+---+
What's the best way to do this in Teradata SQL?
(Note: the example is a bit small; the minimum I would likely need to repeat certain products for is 7 days.)
You could try using PIVOT, e.g.:
SEL date, a, b, c
FROM your_table
PIVOT (
MAX(value)
FOR product IN ('a','b','c')
) piv;
Pivot over all dates (or at least include the previous x days/weeks to get rows for products like 'B'), apply LAST_VALUE IGNORE NULLS to each product, and then filter the range of dates:
with cte as
(
select
date
,last_value(a ignore nulls) over (order by date) as a
,last_value(b ignore nulls) over (order by date) as b
,last_value(c ignore nulls) over (order by date) as c
from tab
PIVOT (
MAX(value_)
FOR product IN ('a' as a
,'b' as b
,'c' as c)
) as pvt
)
select *
from cte
where date between date '2022-05-27'
and date '2022-05-29'
But using old-style MAX(CASE) will probably get a slightly better plan, and it's easier to create dynamically if needed:
select
date
,last_value(max(case when product = 'a' then value end) ignore nulls) over (order by date) as a
,last_value(max(case when product = 'b' then value end) ignore nulls) over (order by date) as b
,last_value(max(case when product = 'c' then value end) ignore nulls) over (order by date) as c
from tab
group by 1
qualify date between date '2022-05-27'
and date '2022-05-29'

Select the highest value of column 2 per column 1

Given the following table P_PROV
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 1 |19/06/2019 | 1 |
| 2 |18/07/2010 | 2 |
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
I want this output
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
Putting this in words, I want to return per person the maximum date. I tried something like this
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This one only returns one row (of course, because MAX returns only the overall maximum date), but I really don't know how to approach this issue. Any kind of help would be appreciated.
ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM yourTable t
) t
WHERE rn = 1;
Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
Given that your ids are increasing, this probably also does what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;

SELECT based on multiple fields in MS-SQL

I have a table with 4 columns:
AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType
There are multiple records for each AcctNumb, with the date that each record was recorded.
What I want to do is grab the most recent date, consumption reading, and reading type for each account.
I have tried using MAX(PeriodEndingDate) and GROUP BY AcctNumb, but I would need to aggregate all the other values, and none of the aggregate functions help me for the WaterConsumption, etc.
Can anyone point me in the right direction?
Thanks
EDIT
Here is a sample table
+----------+------------------+------------------+-------------+
| AcctNumb | PeriodEndingDate | WaterConsumption | ReadingType |
+----------+------------------+------------------+-------------+
| 1000 | 2018-03-31 | 122230 | A |
| 1001 | 2018-03-31 | 24850 | A |
| 1002 | 2018-03-31 | 88540 | A |
| 1000 | 2017-12-31 | 123800 | A |
| 1001 | 2017-12-31 | 3000 | E |
+----------+------------------+------------------+-------------+
The ReadingType is whether it's an actual (A) reading, or an estimate (E).
Try this:
SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType
FROM (SELECT
AcctNumb,
PeriodEndingDate,
WaterConsumption,
ReadingType,
ROW_NUMBER() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS MostrecentRecord
FROM <TableName>) dt
WHERE MostrecentRecord= 1
This can be done using ROW_NUMBER. It has been asked and answered thousands of times, but the query is easier to write than to find a duplicate.
select *
from
(
select *
, RowNum = ROW_NUMBER() over(partition by AcctNumb order by PeriodEndingDate desc)
from YourTable
) x
where x.RowNum = 1
SELECT DQ.* FROM
(SELECT *,
Row_Number() OVER (PARTITION BY AcctNumb ORDER BY PeriodEndingDate DESC) AS RN
FROM YourTable
) AS DQ
WHERE DQ.RN = 1

Changing a Select Query to a Count Distinct Query

I am using a Select query to select Members, a variable that serves as a unique identifier, and transaction date, a Date format (MM/DD/YYYY).
Select Members, transaction_date
FROM table WHERE Criteria = 'xxx'
Group by Members, transaction_date;
My ultimate aim is to count the # of unique members by month (i.e., a unique member in day 3, 6, 12 of a month is only counted once). I don't want to select any data, but rather run this calculation (count distinct by month) and output the calculation.
This will give distinct count per month.
select month,count(*) as distinct_Count_month
from
(
select members,to_char(transaction_date, 'YYYY-MM') as month
from table1
/* add your where condition */
group by members,to_char(transaction_date, 'YYYY-MM')
) a
group by month
So for this input
+---------+------------------+
| members | transaction_date |
+---------+------------------+
| 1 | 12/23/2015 |
| 1 | 11/23/2015 |
| 1 | 11/24/2015 |
| 2 | 11/24/2015 |
| 2 | 10/24/2015 |
+---------+------------------+
You will get this output
+----------+----------------------+
| month | distinct_count_month |
+----------+----------------------+
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
+----------+----------------------+
For SQL Server, you might want to try this. It converts transaction_date to a YYYY-MM string and counts each member once per month:
SELECT CONVERT(CHAR(7), CONVERT(DATE, transaction_date, 101), 126) AS [MONTH], COUNT(DISTINCT MEMBERS) AS [NO OF MEMBERS]
FROM BAR
GROUP BY CONVERT(CHAR(7), CONVERT(DATE, transaction_date, 101), 126)
ORDER BY [MONTH]
Use COUNT(DISTINCT members) and date_trunc('month', transaction_date) to retain timestamps for most calculations (and this can also help with ordering the result). to_char() can then be used to control the display format but it isn't required elsewhere.
SELECT
to_char(date_trunc('month', transaction_date), 'YYYY-MM')
, COUNT(DISTINCT members) AS distinct_Count_month
FROM table1
GROUP BY
date_trunc('month', transaction_date)
;
result sample:
| to_char | distinct_count_month |
|---------|----------------------|
| 2015-10 | 1 |
| 2015-11 | 2 |
| 2015-12 | 1 |
see: http://sqlfiddle.com/#!15/57294/2

Select rows which repeat every month

I am trying to solve a task that looks simple at first glance.
I have a transactions table.
| name |entity_id| amount | date |
|--------|---------|--------|------------|
| Github | 1 | 4.80 | 01/01/2014 |
| itunes | 2 | 2.80 | 22/01/2014 |
| Github | 1 | 4.80 | 01/02/2014 |
| Foods | 3 | 24.80 | 01/02/2014 |
| amazon | 4 | 14.20 | 01/03/2014 |
| amazon | 4 | 14.20 | 01/04/2014 |
I have to select the rows that repeat every month on the same day with the same amount for an entity_id (i.e. subscriptions). Thanks for your help.
If your date column is created as a date type, you could use a recursive CTE to collect continuations and then eliminate duplicate rows with DISTINCT ON. (You should also rename that column, because date is a reserved name in SQL.)
with recursive recurring as (
select name, entity_id, amount, date as first_date, date as last_date, 0 as lvl
from transactions
union all
select r.name, r.entity_id, r.amount, r.first_date, t.date, r.lvl + 1
from recurring r
join transactions t
on row(t.name, t.entity_id, t.amount, t.date - interval '1' month)
= row(r.name, r.entity_id, r.amount, r.last_date)
)
select distinct on (name, entity_id, amount) *
from recurring
order by name, entity_id, amount, lvl desc
Group by the day of the month, for example:
select entity_id, amount, max(date), min(date), count(*)
from transactions
group by entity_id, amount, date_part('day', date)
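The grouping above returns every (entity_id, amount, day-of-month) group, including one-off payments. As a minimal sketch in PostgreSQL, assuming the date column is of type date and that "repeating" means at least two charges in the same group, a HAVING clause narrows it down. The output aliases (first_date, last_date, occurrences) are made up for readability:

```sql
-- Keep only groups that occur more than once, i.e. likely subscriptions.
-- Note: this does not verify that the charges fall in consecutive months.
select entity_id,
       amount,
       min(date) as first_date,
       max(date) as last_date,
       count(*)  as occurrences
from transactions
group by entity_id, amount, date_part('day', date)
having count(*) > 1
order by entity_id;
```

With the sample table from the question, this keeps the Github (entity_id 1) and amazon (entity_id 4) groups and drops itunes and Foods. If gaps between months matter, the recursive CTE approach is the stricter check.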