SQL to Generate Periodic Snapshots from Transactions Table

I'm trying to create a periodic snapshot view from a database's transaction table after the fact. The transaction table has the following fields:
account_id (foreign key)
event_id
status_dt
status_cd
Every time an account changes status in the application, a new row is added to the transaction table with the new status. I'd like to produce a view that shows the count of accounts by status on every date; it should have the following fields:
snapshot_dt
status_cd
count_of_accounts
This will get the count for any given day, but not for all days:
SELECT status_cd, COUNT(account_id) AS count_of_accounts
FROM transactions
JOIN (
SELECT account_id, MAX(event_id) AS event_id
FROM transactions
WHERE status_dt <= DATE '2014-12-05'
GROUP BY account_id) latest
USING (account_id, event_id)
GROUP BY status_cd
Thank you!

Okay, this is going to be hard to explain.
On each date for each status, you should count up two values:
The number of customers who start with that status.
The number of customers who leave with that status.
The first value is easy. It is just the aggregation of the transactions by the date and the status.
The second value is almost as easy. You get the previous status code and count the number of times that that status code "leaves" on that date.
Then, the key is the cumulative sum of the first value minus the cumulative sum of the second value.
I freely admit that the following code is not tested (if you had a SQL Fiddle, I'd be happy to test it). But this is what the resulting query looks like:
select status_dt, status_cd,
(sum(inc_cnt) over (partition by status_cd order by status_dt) -
sum(dec_cnt) over (partition by status_cd order by status_dt)
) as dateamount
from ((select t.status_dt, t.status_cd, count(*) as inc_cnt, 0 as dec_cnt
from transactions t
group by t.status_dt, t.status_cd
) union all
(select t.status_dt, prev_status_cd, 0, count(*)
from (select t.*,
lag(t.status_cd) over (partition by t.account_id order by status_dt) as prev_status_cd
from transactions t
) t
where prev_status_cd is not null
group by t.status_dt, prev_status_cd
)
) t;
If you have dates where there is no change for one or more statuses and you want to include those in the output, then the above query would need to use cross join to first create the rows in the result set. It is unclear if this is a requirement, so I'm leaving out that complication.
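The running-sum idea above can be tried on a small data set. What follows is only a sketch, not the answer's exact query: it assumes SQLite 3.25 or newer (for window functions) driven from Python's sqlite3 module, invents four sample rows, stores dates as ISO text, and folds the "enter" and "leave" counts into one net change per date and status before taking the cumulative sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (account_id INTEGER, event_id INTEGER,
                           status_dt TEXT, status_cd TEXT);
-- Sample rows are made up for illustration
INSERT INTO transactions VALUES
  (1, 1, '2014-12-01', 'NEW'),
  (2, 2, '2014-12-01', 'NEW'),
  (1, 3, '2014-12-03', 'ACTIVE'),
  (2, 4, '2014-12-05', 'CLOSED');
""")

snapshot_sql = """
WITH changes AS (
    -- +1 for the status an account enters on a date
    SELECT status_dt, status_cd, 1 AS delta FROM transactions
    UNION ALL
    -- -1 for the status the same account leaves on that date
    SELECT status_dt, prev_status_cd, -1
    FROM (SELECT status_dt,
                 LAG(status_cd) OVER (PARTITION BY account_id
                                      ORDER BY status_dt, event_id) AS prev_status_cd
          FROM transactions)
    WHERE prev_status_cd IS NOT NULL
),
daily AS (
    -- net change per date and status
    SELECT status_dt, status_cd, SUM(delta) AS net
    FROM changes
    GROUP BY status_dt, status_cd
)
SELECT status_dt AS snapshot_dt, status_cd,
       SUM(net) OVER (PARTITION BY status_cd ORDER BY status_dt) AS count_of_accounts
FROM daily
ORDER BY snapshot_dt, status_cd
"""

rows = conn.execute(snapshot_sql).fetchall()
for row in rows:
    print(row)
```

As noted above, dates on which a status has no change do not appear in the output; filling them in would still need a calendar of dates cross joined with the statuses.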

Related

How to find difference in date between each unique ID across multiple rows when not ordered? (PostgreSQL)

I have a table with id, order sequence and date, and I am trying to add two columns, one with a difference in date function, and another with a status function that is reliant on the value of the difference in date.
Table looks like this:
The issue I am having is that, when I try to find the difference between the dates of each unique id, so that if it's the first order sequence, it should be null, if it's any subsequent order sequence, let's say 3, it will be the 3rd date - 2nd date. Now this all works with the code I have:
case
when ord_seq = 1 then null
else ord_date - lag(ord_date) over (order by id)
end as date_diff,
However, this only works when the table is already ordered. If I jumble up the order that I input the table in, the values come out a little different. I figured it might be because "lag" function only takes the previous row's value, so if the previous row does not belong to the same id, and is not in chronological order, the dates won't subtract well.
My code looks like this at the moment:
select
id,
ord_seq,
ord_date,
case
when ord_seq = 1 then null
else ord_date - lag(ord_date) over (order by id)
end as date_diff,
case
when ord_seq = 1 then 'New'
when ord_date - lag(ord_date) over (order by id, ord_seq) between 1 and 200 then 'Retain'
when ord_date - lag(ord_date) over (order by id, ord_seq) > 200 then 'Reactivated'
end as status
from t1
order by id, ord_seq, ord_date
My db<>fiddle
Am I using the correct function here? How do I find the difference in date between one unique ID, regardless of the order of the table?
Any help would be much appreciated.
In case you want to see end table result (error is on id 'ddd', ord seq '2' and '3'):
Ordered Input:
Not Ordered Input:
When using this:
You are missing the partition by clause in your window definition. Here it is, working regardless of any table order:
select *,
ord_date - lag(ord_date) over (partition by id order by ord_seq) as date_diff
from t1;
Please note, however, that database tables have no natural order that you can rely upon and cannot be considered ordered, no matter in what sequence the records have been inserted. You must explicitly specify an order by clause if you need a specific order.
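To see that partitioning makes the result independent of insertion order, here is a small sketch. It is an assumption-laden stand-in for the PostgreSQL query: it runs on SQLite through Python's sqlite3 module, the sample rows are invented and deliberately inserted out of order, and because SQLite has no date type the day difference is computed with julianday() instead of subtracting dates directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id TEXT, ord_seq INTEGER, ord_date TEXT);
-- Made-up rows, inserted in jumbled order on purpose
INSERT INTO t1 VALUES
  ('ddd', 3, '2021-09-01'),
  ('aaa', 1, '2021-01-01'),
  ('ddd', 1, '2021-01-10'),
  ('aaa', 2, '2021-01-31'),
  ('ddd', 2, '2021-01-20');
""")

query = """
SELECT id, ord_seq,
       -- LAG only looks at earlier rows of the same id, ordered by ord_seq
       CAST(julianday(ord_date)
            - julianday(LAG(ord_date) OVER (PARTITION BY id ORDER BY ord_seq))
            AS INTEGER) AS date_diff
FROM t1
ORDER BY id, ord_seq
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first order of each id has no previous row in its partition, so LAG returns NULL there and the explicit check for ord_seq = 1 is no longer needed for date_diff.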

Best approach to display all the users who have more than 1 purchases in a month in SQL

I have two tables in an Oracle Database, one of which is all the purchases done by all the customers over many years (purchase_logs). It has a unique purchase_id that is paired with a customer_id. The other table contains the user info of all the customers. Both have a common key of customer_id.
I want to display the user info of customers who have more than 1 unique item (NOT the item quantity) purchased in any month (i.e. if a customer bought 4 unique items in February 2020 they would be valid, as well as someone who bought 2 items in June). I was wondering what the correct approach would be and also how to correctly execute that approach.
The two approaches that I can see are
Approach 1
Count the overall number of purchases done by all customers, filter the ones that are greater than 1 and then check if any of them were done within a month.
Use this as a subquery in the where clause of the main query for retrieving the customer info for all the customer_id which match this condition.
This is what I've done so far. It retrieves the customer ids of all the customers who have more than 1 purchase in total, but I do not understand how to filter out the purchases that did not occur within a single arbitrary month.
SELECT * FROM customer_details
WHERE customer_id IN (
SELECT cust_id from purchase_logs
group by cust_id
having count(*) >= 2);
Approach 2
Create a temporary table to Count the number of monthly purchases of a specific user_id then find the MAX() of the whole table and check if that MAX value is bigger than 1 or not. Then if it is provide it as true for the main query's where clause for the customer_info.
Approach 2 feels like the more logical option but I cannot seem to understand how to write the proper subquery for it as the command MAX(COUNT(customer_id)) from purchase_logs does not seem to be a valid query.
This is the DDL diagram.
This is the Sample Data of Purchase_logs
Customer_info
and Item_info
and the expected output for this sample data would be
It is certainly possible that there is a simpler approach that I am not seeing right now.
Would appreciate any suggestions and tips on this.
You need this query:
SELECT DISTINCT cust_id
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1;
to get all the cust_ids of the customers who have more than 1 unique item purchased in any month, and you can use it with the IN operator:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT DISTINCT cust_id -- here DISTINCT may be removed as it does not make any difference when the result is used with IN
FROM purchase_logs
GROUP BY cust_id, TO_CHAR(purchase_date, 'YYYY-MON')
HAVING COUNT(DISTINCT item_id) > 1
);
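The grouping by customer and month can be checked on a few rows. This is a sketch under assumptions rather than the Oracle query itself: it uses SQLite through Python's sqlite3 module, so strftime('%Y-%m', ...) stands in for TO_CHAR(purchase_date, 'YYYY-MON'), and the name column and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_details (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase_logs (purchase_id INTEGER PRIMARY KEY, cust_id INTEGER,
                            item_id INTEGER, purchase_date TEXT);
INSERT INTO customer_details VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO purchase_logs VALUES
  (1, 1, 10, '2020-02-03'),
  (2, 1, 11, '2020-02-20'),   -- Ann: two distinct items in February 2020
  (3, 2, 10, '2020-06-01'),
  (4, 2, 10, '2020-06-15'),   -- Bob: same item twice, only one distinct item
  (5, 3, 10, '2020-01-31'),
  (6, 3, 11, '2020-02-01');   -- Cy: two items, but in different months
""")

query = """
SELECT c.customer_id, c.name
FROM customer_details c
WHERE c.customer_id IN (
    SELECT cust_id
    FROM purchase_logs
    -- one group per customer per calendar month (year included)
    GROUP BY cust_id, strftime('%Y-%m', purchase_date)
    HAVING COUNT(DISTINCT item_id) > 1
)
ORDER BY c.customer_id
"""

qualifying = conn.execute(query).fetchall()
print(qualifying)
```

Only the first customer qualifies here: the second bought one item twice, and the third bought two items that fall in different months.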
One approach might be to try
with multiplepurchase as (
-- one row per customer and calendar month with at least two purchases
select customer_id, trunc(purchasedate, 'MM') as purchase_month, count(*) as order_count
from purchase_logs
group by customer_id, trunc(purchasedate, 'MM')
having count(*) >= 2)
select distinct a.customer_id, b.username, b.usercategory
from multiplepurchase a
left join userinfo b
on a.customer_id = b.customer_id
Expanding on @MT0's answer:
SELECT *
FROM customer_details CD
WHERE exists (
SELECT cust_id
FROM purchase_logs PL
where CD.customer_id = PL.cust_id
GROUP BY cust_id, item_id, to_char(purchase_date,'YYYYMM')
HAVING count(*) >= 2
);
I want to display the user info of customers who have more than 1 purchases in a single arbitrary month.
Just add a WHERE filter to your sub-query.
So assuming that you wanted the month of July 2021 and you had a purchase_date column (with a DATE or TIMESTAMP data type) in your purchase_logs table then you can use:
SELECT *
FROM customer_details
WHERE customer_id IN (
SELECT cust_id
FROM purchase_logs
WHERE DATE '2021-07-01' <= purchase_date
AND purchase_date < DATE '2021-08-01'
GROUP BY cust_id
HAVING count(*) >= 2
);
If you want the users where they have bought two-or-more items in any single calendar month then:
SELECT *
FROM customer_details c
WHERE EXISTS (
SELECT 1
FROM purchase_logs p
WHERE c.customer_id = p.cust_id
GROUP BY cust_id, TRUNC(purchase_date, 'MM')
HAVING count(*) >= 2
);

SQL get latest availability per member

I have a situation where I store in a table each member's availability.
It's a simple table with 4 columns.
CREATE TABLE availablities (
availablity_id serial PRIMARY KEY,
member_id serial,
availablity_status_id serial,
start_date timestamp
);
Each member can have multiple records in the table, and to get the current status
I take, for each member, the record with the most recent start_date that is not later than now().
I first tried with a naive Max() and Group by query
select
status_code, max(start_date) start_date,availablities.member_id
from
availablities
join
availablity_status on availablity_status.availablity_status_id = availablities.availablity_status_id
where
start_date <= now()
group by
status_code,availablities.member_id;
But this returns multiple records per user, as I get the most recent record by user and by status.
I finally came up with a query that gives me the expected result.
select status_code,start_date,a2.member_id from availablities a2
join availablity_status on availablity_status.availablity_status_id = a2.availablity_status_id
where a2.availablity_id in(
select
max(availablity_id)
from availablities a
where
a.member_id = a2.member_id and
start_date in(
select
max(start_date) start_date
from availablities
where
start_date <= now()
and a.member_id = availablities.member_id
)
);
But this query takes 60 times longer to execute and doesn't feel right.
I'm pretty sure there must be a better solution but I can't get my hands on it.
What is the correct way to get the expected result?
I've created a DB-fiddle to make it easier to see. Query 1 is incorrect and Query 2 becomes much slower once there is a bit more data.
https://www.db-fiddle.com/f/iWgvuj8kcms9F5CKuoKsny/2
It looks like you need to use a simple row_number window function here:
with a as (
select *, Row_Number() over(partition by member_id order by start_date desc, availablity_id desc) rn
from availablities
where start_date<now()
)
select s.status_code, a.start_date, a.member_id
from a join availablity_status s on s.availablity_status_id=a.availablity_status_id
where rn=1
Note your data is not selective enough, so for member_id 3, is it available or not? What is the most recent date when there are two identical dates?
I added a tie-breaker to also sort by availability_id to get your expected results
Actually it's availablity_id - you seem to have a common typo here!
See your updated Fiddle
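For anyone who wants to try the row_number approach without the fiddle, here is a minimal sketch. It assumes SQLite 3.25 or newer driven from Python's sqlite3 module instead of PostgreSQL, so datetime('now') replaces now() and timestamps are stored as text; the status codes and rows are invented, including one tie on start_date and one future-dated row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE availablity_status (availablity_status_id INTEGER PRIMARY KEY,
                                 status_code TEXT);
CREATE TABLE availablities (availablity_id INTEGER PRIMARY KEY, member_id INTEGER,
                            availablity_status_id INTEGER, start_date TEXT);
INSERT INTO availablity_status VALUES (1, 'AVAILABLE'), (2, 'AWAY');
INSERT INTO availablities VALUES
  (1, 1, 1, '2020-01-01 00:00:00'),
  (2, 1, 2, '2020-03-01 00:00:00'),
  (3, 2, 2, '2020-02-01 00:00:00'),
  (4, 2, 1, '2020-02-01 00:00:00'),   -- same start_date as row 3: needs a tie-breaker
  (5, 2, 2, '2999-01-01 00:00:00');   -- starts in the future, must be ignored
""")

query = """
WITH ranked AS (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY member_id
                              ORDER BY start_date DESC, availablity_id DESC) AS rn
    FROM availablities a
    WHERE start_date <= datetime('now')
)
SELECT s.status_code, r.start_date, r.member_id
FROM ranked r
JOIN availablity_status s ON s.availablity_status_id = r.availablity_status_id
WHERE r.rn = 1
ORDER BY r.member_id
"""

current = conn.execute(query).fetchall()
for row in current:
    print(row)
```

Each member gets exactly one row, and the tie on start_date is resolved in favour of the higher availablity_id.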

select multiple records based on order by

i have a table with a bunch of customer IDs. in a customer table is also these IDs but each id can be on multiple records for the same customer. i want to select the most recently used record which i can get by doing order by <my_field> desc
say i have 100 customer IDs in this table, and the customers table has 120 records with these IDs (some are duplicates). how can i apply my order by condition to get only the most recent matching record for each ID?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY rec_alt_ts DESC) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 l.LocationKey from location_table l where l.cust_id = t1.cust_id order by l.some_field desc)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
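If the whole latest record is needed rather than just the maximum timestamp, one option that avoids window functions (and so should suit older servers) is to join the table back to its per-customer maximum. The sketch below is only an illustration: it runs on SQLite via Python's sqlite3 module, the table name customers is hypothetical, the rows are invented, and it assumes rec_alt_ts is unique within a customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "customers" is a hypothetical table name; columns follow the question
CREATE TABLE customers (loc_nbr INTEGER, cust_nbr INTEGER, customer_id INTEGER,
                        rec_alt_ts TEXT, PRIMARY KEY (loc_nbr, cust_nbr));
INSERT INTO customers VALUES
  (1, 1, 1, '2010-01-01 09:00:00'),
  (2, 1, 1, '2010-02-01 09:00:00'),   -- same customer, shopped later at location 2
  (1, 2, 2, '2010-01-15 09:00:00');
""")

query = """
SELECT c.loc_nbr, c.cust_nbr, c.customer_id, c.rec_alt_ts
FROM customers c
JOIN (SELECT customer_id, MAX(rec_alt_ts) AS latest_ts
      FROM customers
      GROUP BY customer_id) m
  ON m.customer_id = c.customer_id
 AND m.latest_ts = c.rec_alt_ts      -- keep only each customer's newest row
ORDER BY c.customer_id
"""

latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

If two records of one customer could share the same rec_alt_ts, both would be returned, so a further tie-breaker would be needed in that case.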
You can use the date and time to pick out the most recent record for the customer.
Order by the column that holds the customer's date and time.
e.g.:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Remove duplicate batches of data

Due to a bug in my application, a table that was built to carry daily records of each delivery period was populated many times.
Let's say I have a delivery from the 1st of June to the 5th of June. My table should be populated with 5 records, one for each day. Now I have havoc, because I have many "batches" of the same content.
The table layout is as:
dummy_id -- identity column
delivery_id -- id of the delivery
on_date -- the day
charge -- the daily cost
Is there an elegant way to keep only the first batch of records and delete the batches that were inserted by mistake for all the deliveries?
To delete all dupes for delivery_id, on_date, charge, keeping the one with the lowest dummy_id:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY delivery_id,
on_date,
charge
ORDER BY dummy_id) RN
FROM YourTable)
DELETE FROM cte
WHERE RN > 1
You can try:
This shows which rows will be deleted (grouping by both delivery and day, so one row per day is kept):
SELECT * FROM YOUR_TABLE WHERE DUMMY_ID NOT IN (
SELECT MIN(DUMMY_ID) FROM YOUR_TABLE GROUP BY DELIVERY_ID, ON_DATE)
This will delete these rows:
DELETE FROM YOUR_TABLE WHERE DUMMY_ID NOT IN (
SELECT MIN(DUMMY_ID) FROM YOUR_TABLE GROUP BY DELIVERY_ID, ON_DATE)
Try
DELETE FROM your_table WHERE dummy_id NOT IN (SELECT MIN(dummy_id) FROM your_table GROUP BY delivery_id, on_date)
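Before running a delete like this on real data, it may help to rehearse it on a copy. The sketch below is one way to do that under stated assumptions: it uses SQLite via Python's sqlite3 module, the table name deliveries_daily is hypothetical, and the same invented batch is inserted three times to mimic the bug. It keeps the row with the lowest dummy_id for each delivery and day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE deliveries_daily (          -- hypothetical table name
    dummy_id INTEGER PRIMARY KEY AUTOINCREMENT,
    delivery_id INTEGER, on_date TEXT, charge REAL)
""")

# The same batch inserted three times, as the buggy application would do
batch = [(1, '2012-06-01', 10.0), (1, '2012-06-02', 10.0), (2, '2012-06-01', 7.5)]
for _ in range(3):
    conn.executemany(
        "INSERT INTO deliveries_daily (delivery_id, on_date, charge) VALUES (?, ?, ?)",
        batch)

# Keep only the first (lowest dummy_id) row per delivery and day
conn.execute("""
DELETE FROM deliveries_daily
WHERE dummy_id NOT IN (SELECT MIN(dummy_id)
                       FROM deliveries_daily
                       GROUP BY delivery_id, on_date)
""")

remaining = conn.execute(
    "SELECT dummy_id, delivery_id, on_date FROM deliveries_daily ORDER BY dummy_id"
).fetchall()
print(remaining)
```

Grouping by delivery_id alone, or by on_date alone, would collapse distinct days or distinct deliveries into a single surviving row, which is why both columns appear in the GROUP BY.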