Teradata recurring payment patterns - sql

I am using customer period-level data to find recurring patterns in the transactions each customer has made.
I want to find out whether a customer is making recurring payments of similar amounts (within ±10%) weekly, fortnightly, or monthly.
I am restricted to Teradata for this, so I am self-joining the table on account and on the channel the transaction goes through. The channel condition in the join is required because the channel can vary and there can be multiple transactions in a day. I also use a loose date window, because people may pay a day early or a day late.
The query I am using currently is like this:
`
create multiset table weekly as
(
sel a.*, b.*, c.*
from table1 a
join table1 b
on a.account = b.account and a.channel = b.channel
join table1 c
on a.account = c.account and a.channel = c.channel
where c.date between b.date + 6 and b.date + 9 and b.amount between 0.9*(c.amount) and 1.1*(c.amount)
and b.date between a.date + 6 and a.date + 9 and a.amount between 0.9*(b.amount) and 1.1*(b.amount)
)
with data
primary index(account)
`
Is there a more efficient way of doing this? I can chain at most 3 self joins, and I need more than 3 recurrences to confirm that a pattern is weekly. I could migrate this data to R and build an algorithm on it, but the data size is around 40 million.
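One possible alternative to chained self joins, sketched here under stated assumptions, is to compare each transaction with the previous one on the same account and channel using a window function, then count how many consecutive pairs fall in the weekly window. The sketch runs against SQLite through Python only so that it is self-contained; the sample rows, the txn_date column name, and the threshold of 3 matching pairs are hypothetical. On Teradata the date difference would be plain date subtraction instead of julianday, and LAG is only available in recent releases (older ones need an equivalent windowed aggregate over the preceding row), so treat this as an illustration of the idea rather than a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (account INTEGER, channel TEXT, txn_date TEXT, amount REAL)"
)
# Hypothetical sample data: account 1 pays roughly 100 every 7 days, account 2 does not.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "web", "2024-01-01", 100.0),
        (1, "web", "2024-01-08", 105.0),
        (1, "web", "2024-01-15", 98.0),
        (1, "web", "2024-01-22", 101.0),
        (2, "atm", "2024-01-01", 50.0),
        (2, "atm", "2024-01-04", 500.0),
        (2, "atm", "2024-01-20", 20.0),
    ],
)

query = """
WITH gaps AS (
    SELECT account, channel, txn_date, amount,
           -- days since the previous transaction on the same account and channel
           julianday(txn_date) - julianday(
               LAG(txn_date) OVER (PARTITION BY account, channel ORDER BY txn_date)
           ) AS day_gap,
           LAG(amount) OVER (PARTITION BY account, channel ORDER BY txn_date) AS prev_amount
    FROM table1
)
SELECT account, channel, COUNT(*) AS weekly_hits
FROM gaps
WHERE day_gap BETWEEN 6 AND 9                               -- same loose window as the self join
  AND prev_amount BETWEEN 0.9 * amount AND 1.1 * amount     -- same +-10% amount rule
GROUP BY account, channel
HAVING COUNT(*) >= 3                                        -- hypothetical threshold
"""
weekly = conn.execute(query).fetchall()
print(weekly)
```

Note that this only compares adjacent rows, so an unrelated payment between two weekly payments on the same channel would break the chain, and the count does not require the matching pairs to be consecutive. Both points would need handling on real data.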

Related

Finding days when users haven't created any entries

I have 2 tables: users and time_entries; time_entries has a foreign key to the users table. Users may create time entries with some amount of time in them. I want to write a query that returns summed amounts of time over an arbitrary date range, grouped by user and date. That part is easy, but I also need to include days when nobody entered any time entry. I tried creating an additional table called calendar with dates and left joining time_entries to it, but I couldn't retrieve a list of users that haven't entered any time entry. Here is my query:
SELECT te.date, SUM(te.amount), user_name
FROM calendar c
LEFT JOIN time_entries te on c.date = te.date
RIGHT JOIN asp_net_users anu on te.user_id = anu.id
GROUP BY user_name, te.date
If you just want the days on which no user made any entry, you can use NOT EXISTS and a correlated subquery.
SELECT c.date
FROM calendar c
WHERE NOT EXISTS (SELECT *
FROM time_entries te
WHERE te.date = c.date);
If you want all users along with the days on which they haven't made any entry, cross join the users and the days and then also use NOT EXISTS.
SELECT anu.user_name,
c.date
FROM asp_net_users anu
CROSS JOIN calendar c
WHERE NOT EXISTS (SELECT *
FROM time_entries te
WHERE te.user_id = anu.id
AND te.date = c.date);
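To see the cross join plus NOT EXISTS pattern on concrete data, here is a small sketch run against an in-memory SQLite database from Python. The table and column names follow the queries above; the sample users, dates, and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asp_net_users (id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE calendar (date TEXT PRIMARY KEY);
CREATE TABLE time_entries (user_id INTEGER, date TEXT, amount REAL);
-- Hypothetical sample data: bob has no entry on 2019-10-01.
INSERT INTO asp_net_users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO calendar VALUES ('2019-10-01'), ('2019-10-02');
INSERT INTO time_entries VALUES
    (1, '2019-10-01', 4.0), (1, '2019-10-02', 2.5), (2, '2019-10-02', 8.0);
""")

# Every (user, day) pair from the cross join, minus the pairs that have an entry.
missing = conn.execute("""
    SELECT anu.user_name, c.date
    FROM asp_net_users anu
    CROSS JOIN calendar c
    WHERE NOT EXISTS (SELECT 1
                      FROM time_entries te
                      WHERE te.user_id = anu.id
                        AND te.date = c.date)
    ORDER BY anu.user_name, c.date
""").fetchall()
print(missing)
```

The cross join produces one row per user per calendar day, so the result grows with users times days; restricting the calendar to the date range of interest first keeps it manageable.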
Thanks to sticky bit's examples, I was able to write the following query, which solves my problem:
SELECT c.date, a.id, COALESCE(sum(te.amount), 0)
FROM asp_net_users a
CROSS JOIN (SELECT *
FROM calendar
WHERE date BETWEEN '2019-10-01 00:00:00'::timestamp AND '2019-10-31 00:00:00'::timestamp) c
LEFT JOIN time_entries te on a.id = te.user_id AND c.date = te.date
WHERE a.department_guid = '95b7538d-3830-48d7-ba06-ad7c51a57191'
GROUP BY c.date, a.id
ORDER BY c.date

Transpose only certain data in SQL

My data looks like this:
Company  Year        Total  Comment
Comp A   01-01-2000  5,000  Checked
Comp A   01-01-2001  6,000  Checked
Comp B   05-05-2007  3,000  Not checked completely
Comp B   05-05-2008  4,000  Checked
Comp C   18-01-2003  1,500  Not checked completely
Comp C   18-01-2002  3,500  Not checked completely
I've been asked to transpose certain data, but I do not believe this can be done using SQL (Server) so that it looks like this:
Company  Base Date   Base Date-1  Comment Base Date       Comment Base Date-1
Comp A   01-01-2001  01-01-2000   Checked                 Checked
Comp B   05-05-2008  05-05-2007   Checked                 Not completely checked
Comp C   18-01-2003  18-01-2002   Not completely checked  Not completely checked
I have never built anything like this. Would Excel be a better alternative? How should I tackle this?
Is it possible using SELECT MAX(Base Date) and MIN(Base Date)? And how would I then handle the string columns?
You can use a self join to do this. However, you should think about dates like February 29 as they only occur in leap years.
select t1.company,t1.year as basedate,t2.year as basedate_1,
t1.comment as comment_basedate,t2.comment as comment_basedate_1
from t t1
left join t t2 on t1.company=t2.company and dateadd(year,1,t2.year)=t1.year
Change the left join to an inner join if you only need results where both the date values exist for a company. This solution assumes there can only be one comment per day.
I'd assign a row number to each record, partitioned by company and ordered by year descending, through an analytic function in a common table expression, then use a left self join on the row number + 1 and company.
This assumes you only want 1 record per company using the 2 most recent years, and that if only 1 record exists for a company, null values are acceptable for the second year. If not, we can change the left join to an inner join and eliminate those records.
We use a common table expression (though an inline view would work as well) to assign a row number to each record. That value is then available in our self join, so we don't have to worry about different dates and max values. We then use the row number (RN) and company to join the 2 desired records together. To save some work, we limit one side to RN 1 and the other to RN 2.
WITH CTE AS (
SELECT *, Row_Number() over (Partition by Company Order by Year Desc) RN FROM TABLE)
SELECT A.Company
, A.Year as Base_Date
, B.Year as Base_Date1
, A.comment as Base_Date_Comment
, B.Comment as Base_Date1_Comment
FROM CTE A
LEFT JOIN CTE B
on A.RN+1 = B.RN
and A.Company = B.Company
and B.RN = 2
WHERE A.RN = 1
Note that the RN = 2 limit must be in the join condition, since it's an outer join; in the WHERE clause it would eliminate the companies without 2 years (in essence turning the left join into an inner join).
This approach makes all columns of the data available for each row.
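The row-number self join can be tried out on a small data set as follows. This is a sketch run against in-memory SQLite from Python, not SQL Server itself; the table name company_totals is hypothetical, the dates are stored in ISO format so that ordering works on text, and Comp D is an invented single-row company added to show the NULL case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_totals (company TEXT, year TEXT, total INTEGER, comment TEXT);
INSERT INTO company_totals VALUES
    ('Comp A', '2000-01-01', 5000, 'Checked'),
    ('Comp A', '2001-01-01', 6000, 'Checked'),
    ('Comp B', '2007-05-05', 3000, 'Not checked completely'),
    ('Comp B', '2008-05-05', 4000, 'Checked'),
    ('Comp D', '2010-03-03',  900, 'Checked');  -- hypothetical company with one row
""")

pairs = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY year DESC) AS rn
        FROM company_totals
    )
    SELECT a.company,
           a.year    AS base_date,
           b.year    AS base_date1,
           a.comment AS base_date_comment,
           b.comment AS base_date1_comment
    FROM cte a
    LEFT JOIN cte b
           ON a.rn + 1 = b.rn
          AND a.company = b.company
          AND b.rn = 2          -- kept in the ON clause so single-row companies survive
    WHERE a.rn = 1
    ORDER BY a.company
""").fetchall()
for row in pairs:
    print(row)
```

Companies with two or more rows get their two most recent dates side by side, while the single-row company comes back with None in the second date and comment columns.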
If there are only two rows each, then that's pretty simple. If there are more than two rows, you could do something like this: essentially join all rows, then make sure A represents the latest row and B represents the earliest row, so that Base Date is the most recent date, as in the desired output.
SELECT A.Company, A.Year AS [Base Date], B.Year AS [Base Date 1],
A.Comment AS [Comment Base Date], B.Comment AS [Comment Base Date 1]
FROM MyTable A
INNER JOIN MyTable B ON A.Company = B.Company
WHERE A.Year = (SELECT MAX(C.YEAR) FROM MyTable C WHERE C.Company = A.Company)
AND B.Year = (SELECT MIN(C.YEAR) FROM MyTable C WHERE C.Company = B.Company)
There might be a more efficient way to do this with Row_Number or something.

Match two tables based on minimum dates efficiently

I have two tables: one contains quarterly data and the other contains daily data. I would like to join the two tables so that, for each day in the daily data, the quarterly data for that quarter is selected and returned on every day. I am working with Postgres 9.3.
The current query is as follows:
select
a.ID,
a.datadate,
b.*,
case when a.datadate = b.rdq then 1 else 0 end as VALID
from proj_data a, proj_rat b
where a.id = b.id
and b.rdq = (select min(rdq)
from proj_rat c
where a.id = c.id and a.datadate >= c.rdq);
But it is excruciatingly slow and I need to do this for several thousand IDs. Can anyone suggest a more efficient solution?
This eliminates the need for a subquery in the WHERE clause:
select
ID,
a.datadate,
b.*,
(a.datadate = b.rdq)::integer as VALID
from
proj_data a
inner join
(
select distinct on (id, rdq) *
from proj_rat
order by id, rdq
) b using(id)
where a.datadate >= b.rdq;

SQL join: selecting last record that meets a condition from the original table

I am new to SQL, so excuse any lapse of notation. A much simplified version of my problem is as follows. I have hospital admissions in table ADMISSIONS and need to collect the most recent outpatient claim of a certain type from table CLAIMS prior to the admission date:
SELECT a.ID , a.date, b.claim_date
FROM admissions as a
LEFT JOIN claims b on (a.ID=b.ID) and (a.date>b.claim_date)
LEFT JOIN claims c on ((a.ID=c.ID) and (a.date>c.claim_date))
and (b.claim_date<c.claim_date or b.claim_date=c.claim_date and b.ID<c.ID)
WHERE c.ID is NULL
The problem is that for some IDs I get many records with duplicate a.date, c.claim_date values.
My problem is similar to one discussed here
SQL join: selecting the last records in a one-to-many relationship
and elaborated on here
SQL Left join: selecting the last records in a one-to-many relationship
However, there is the added wrinkle of looking only for records in CLAIMS that occur prior to a.date and I think that is causing the problem.
Update
Times are not stored, just dates, and since a patient can have multiple records on the same day, it's an issue. There is another wrinkle, which is that I only want to look at a subset of CLAIMS (let's say claims.flag=TRUE). Here's what I tried last:
SELECT a.ID , a.date, b.claim_date
FROM admissions as a
LEFT JOIN (
select d.ID , max(d.claim_date) cdate
from claims as d
where d.flag=TRUE
group by d.ID
) as b on (a.ID=b.ID) and (b.claim_date < a.date)
LEFT JOIN claims c on ((a.ID=c.ID) and (c.claim_date < a.claim_date))
and c.flag=TRUE
and (b.claim_date<c.claim_date or b.claim_date=c.claim_date and b.ID<c.ID)
WHERE c.ID is NULL
However, this ran for a couple of hours before aborting (typically takes about 30 mins with LIMIT 10).
You may want to try using a subquery to solve this problem:
SELECT a.ID, a.date, b.claim_date
FROM admissions as a
LEFT JOIN claims b ON (a.ID = b.ID)
WHERE b.claim_date = (
SELECT MAX(c.claim_date)
FROM claims c
WHERE c.id = a.id -- Assuming that c.id is a foreign key to a.id
AND c.claim_date < a.date -- Claim date is less than admission date
);
An attempt to clarify with different IDs, and using an additional subquery to account for duplicate dates:
SELECT a.ID, a.patient_id, a.date, b.claim_id, b.claim_date
FROM admissions as a
LEFT JOIN claims b ON (a.patient_ID = b.patient_ID)
WHERE b.claim_id = (
SELECT MAX(c.claim_id) -- Max claim identifier (likely most recent if sequential)
FROM claims c
WHERE c.patient_ID = a.patient_ID
AND c.flag = TRUE
AND c.claim_date = (
SELECT MAX(d.claim_date)
FROM claims d
WHERE d.patient_id = c.patient_id
AND d.claim_date < a.date -- Claim date is less than admission date
AND d.flag = TRUE
)
)
AND b.flag = TRUE;
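As a variation on the nested MAX subqueries above, the latest flagged claim before each admission can also be picked with a single correlated subquery that orders by claim date and then by claim identifier. This is a different technique from the one in the answer, sketched here on in-memory SQLite from Python. The sample rows are invented, flag is stored as 0/1, and claim_id is assumed to be a unique key that can break ties between claims on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (id INTEGER PRIMARY KEY, patient_id INTEGER, date TEXT);
CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, patient_id INTEGER,
                     claim_date TEXT, flag INTEGER);
-- Hypothetical data: patient 10 has two flagged claims on the same day, patient 11 has none.
INSERT INTO admissions VALUES (1, 10, '2020-03-01'), (2, 11, '2020-03-01');
INSERT INTO claims VALUES
    (100, 10, '2020-01-15', 1),
    (101, 10, '2020-02-10', 1),
    (102, 10, '2020-02-10', 1),
    (103, 10, '2020-02-20', 0),   -- later, but not flagged
    (104, 10, '2020-03-05', 1);   -- flagged, but after the admission
""")

latest = conn.execute("""
    SELECT a.id, a.patient_id, a.date, b.claim_id, b.claim_date
    FROM admissions a
    LEFT JOIN claims b
           ON b.claim_id = (SELECT c.claim_id
                            FROM claims c
                            WHERE c.patient_id = a.patient_id
                              AND c.flag = 1
                              AND c.claim_date < a.date
                            ORDER BY c.claim_date DESC, c.claim_id DESC
                            LIMIT 1)
    ORDER BY a.id
""").fetchall()
for row in latest:
    print(row)
```

Because the condition sits in the ON clause rather than in WHERE, admissions with no qualifying claim are kept with NULLs instead of being dropped, and the tie on 2020-02-10 resolves to a single row.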

sqlite select query

I am working in Android with SQLite.
My db has 2 tables:
Table 1: cars
_id(int,pri-key)
reg(text)
type(text)
Table 2: jobs
_id(int,foreign-key)
date(date)
I need a SQLite statement which will get me all cars which have NOT had a job in the past 3 weeks. I am actually porting the app from C#, and the statement I use for this (in MySQL) is
SELECT c.id, c.reg, c.type FROM buses c WHERE NOT EXISTS (SELECT NULL FROM jobs j WHERE j.id = c.id AND j.date >= CURRENT_DATE - INTERVAL 21 DAY)
But the SQLiteDatabase object I am working with in Android takes a different format. How would I run this query?
Thanks in advance.
I would try something like this:
SELECT * from cars A LEFT OUTER JOIN jobs B on A._id = B._id WHERE B._id IS NULL OR B.date < date('now', '-21 days');
The LEFT OUTER JOIN ensures all rows from the cars table are shown in the output, including the ones that don't match the join criteria (i.e. ones that don't have an entry in the jobs table). The WHERE clause keeps either cars that don't have an entry in the jobs table (B._id IS NULL) or cars whose job is more than 21 days old (B.date < date('now', '-21 days')).
Of course, I am assuming there will be only 1 entry in the jobs table for each car. If there can be more, you probably want to use MAX to get the latest date.
WORKING SOLUTION: SELECT * from cars A LEFT OUTER JOIN jobs B on A._id = B._id GROUP BY A._id HAVING B._id IS NULL OR MAX(B.date) < date('now', '-21 days');
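For comparison, here is a sketch that runs both a direct SQLite port of the original NOT EXISTS query and a GROUP BY / HAVING MAX variant on the same data, using Python's sqlite3 module purely as a test harness. The sample cars and jobs are invented, job dates are generated relative to today so the result does not depend on when it runs, and dates are assumed to be stored as ISO text (YYYY-MM-DD) so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (_id INTEGER PRIMARY KEY, reg TEXT, type TEXT);
CREATE TABLE jobs (_id INTEGER, date TEXT);
-- Hypothetical data: car 1 had a recent job, car 2 only an old one, car 3 none at all.
INSERT INTO cars VALUES (1, 'AAA111', 'bus'), (2, 'BBB222', 'van'), (3, 'CCC333', 'bus');
INSERT INTO jobs VALUES
    (1, date('now', '-40 days')),
    (1, date('now', '-2 days')),
    (2, date('now', '-30 days'));
""")

# Port of the MySQL query: INTERVAL 21 DAY becomes date('now', '-21 days').
idle_not_exists = conn.execute("""
    SELECT c._id, c.reg, c.type
    FROM cars c
    WHERE NOT EXISTS (SELECT 1
                      FROM jobs j
                      WHERE j._id = c._id
                        AND j.date >= date('now', '-21 days'))
    ORDER BY c._id
""").fetchall()

# Aggregate variant: keep cars whose latest job is missing or older than 21 days.
idle_having = conn.execute("""
    SELECT a._id, a.reg, a.type
    FROM cars a
    LEFT OUTER JOIN jobs b ON a._id = b._id
    GROUP BY a._id
    HAVING MAX(b.date) IS NULL OR MAX(b.date) < date('now', '-21 days')
    ORDER BY a._id
""").fetchall()

print(idle_not_exists)
print(idle_having)
```

Both queries return the same cars here. On Android, either statement could be passed as a plain SQL string to SQLiteDatabase.rawQuery.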