Normalize monthly payments - sql

First, sorry for my bad English. I'm trying to normalize a table in a pension system where subscribers are paid monthly. I need to know who has been paid and who has not and how much they've been paid. I believe I'm using SQL Server. Here's an example:
id_subscriber id_receipt year month pay_value payment type_pay
12 1 2016 January 100 80 1
13 1 2016 January 100 100 1
14 1 2016 January 100 100 1
12 2 2016 February 100 100 2
13 2 2016 February 100 80 1
But I'm not happy repeating the year and the month for every single subscriber. It doesn't seem right. Is there a better way to store this data?
EDIT:
The case is as follows: this company has many subscribers who must pay monthly and payment can be in various ways. They produce a single receipt for many customers, and each customer that receipt may be paying one or more installments.
These are my other tables:
tbl_subscriber
id_suscriber(PK) first_name last_name address tel_1 tel_2
12 Juan Perez xxx xxx xxx
13 Pedro Lainez xxx xxx xxx
14 Maria Lopez xxx xxx xxx
tbl_receipt
id_receipt(PK) value elaboration_date deposit_date
1 1,000.00 2015-09-16 2015-09-20
2 890.00 2015-12-01 2015-12-18
tbl_type_paym
id type description
1 bank xxxx
2 ventanilla xxx

This basically seems fine. You could split dates out into a separate table and reference that, but that strikes me as a kind of silly way to do it. I would recommend storing the month as an integer instead of a varchar column though. Besides not storing the same string over and over you can more reasonably do comparisons.
You could also use date values, although that might not be worth the trouble when you don't want greater granularity than the month.

Related

SQL Query - Identifying entries between payment dates greater than 6 years

I have this table (in reality it has more fields but for simplicity, it will demonstrate what I'm after)
Payment_Type
Person ID
Payment_date
Payment_Amount
Normal
1
2015-01-01
£1.00
Normal
1
2017-01-01
£2.00
Reversal
1
2022-01-09
£3.00
Normal
2
2016-12-29
£3.00
Reversal
2
2022-01-02
£4.00
I need 2 specific things from this:
I need all entries where there is over 6 years difference between any given payment dates (when its been greater than or equal to 6 years from the date of the latest payment date). I don't need to count them, I just need it to return all the entries that meet this criteria.
I also need it to specify where a normal payment hasn't been made for 6 years or more from todays date but a reversal has however occurred within the last 6 years. (This might need to be a separate query but will take suggestions)
I'm using Data Lake (Hue).
Thank you.
I've tried to run a sub query with join and union but I'm not getting the desired results so will need to start from scratch. Any advice/insight on this is greatly appreciated.
Ideally, query one will show:
Payment_Type
Person ID
Payment_date
Payment_Amount
Normal
1
2015-01-01
£1.00
Normal
1
2017-01-01
£2.00
Normal
2
2016-12-29
£3.00
Query 2 results should show:
Payment_Type
Person ID
Payment_date
Payment_Amount
Normal
1
2017-01-01
£2.00
Reversal
1
2022-01-09
£3.00
Normal
2
2016-12-29
£3.00
Reversal
2
2022-01-02
£4.00

What logic should be used to label customers (monthly) based on the categories they bought more often in the preceding 4 calendar months?

I have a table that looks like this:
user
type
quantity
order_id
purchase_date
john
travel
10
1
2022-01-10
john
travel
15
2
2022-01-15
john
books
4
3
2022-01-16
john
music
20
4
2022-02-01
john
travel
90
5
2022-02-15
john
clothing
200
6
2022-03-11
john
travel
70
7
2022-04-13
john
clothing
70
8
2022-05-01
john
travel
200
9
2022-06-15
john
tickets
10
10
2022-07-01
john
services
20
11
2022-07-15
john
services
90
12
2022-07-22
john
travel
10
13
2022-07-29
john
services
25
14
2022-08-01
john
clothing
3
15
2022-08-15
john
music
5
16
2022-08-17
john
music
40
18
2022-10-01
john
music
30
19
2022-11-05
john
services
2
20
2022-11-19
where i have many different users, multiple types making purchases daily.
I want to end up with a table of this format
user
label
month
john
travel
2022-01-01
john
travel
2022-02-01
john
clothing
2022-03-01
john
travel-clothing
2022-04-01
john
travel-clothing
2022-05-01
john
travel-clothing
2022-06-01
john
travel
2022-07-01
john
travel
2022-08-01
john
services
2022-10-01
john
music
2022-11-01
where the label would record the most popular type (based on % of quantity sold) for each user in a timeframe of the last 4 months (including the current month). So for instance, for March 2022 john ordered 200/339 clothing (Jan to and including Mar) so his label is clothing. But for months where two types are almost even I'd want to use a double label like for April (185 travel 200 clothing out of 409). In terms of rules this is not set in stone yet but it's something like, if two types are around even (e.g. >40%) then use both types in the label column; if three types are around even (e.g. around 30% each) use three types as label; if one label is 40% but the rest is made up of many small % keep the first label; and of course where one is clearly a majority use that. One other tricky bit is that there might be missing months for a user.
I think regarding the rules I need to just compare the % of each type, but I don't know how to retrieve the type as label afterwards. In general, I don't have the SQL/BigQuery logic very clearly in my head. I have done somethings but nothing that comes close to the target table.
Broken down in steps, I think I need 3 things:
group by user, type, month and get the partial and total count (I have done this)
then retrieve the counts for the past 4 months (have done something but it's not exactly accurate yet)
compare the ratios and make the label column
I'm not very clear on the sql/bigquery logic here, so please advise me on the correct steps to achieve the above. I'm working on bigquery but sql logic will also help
Consider below approach. It looks a little bit messy and has a room to optimize but hope you get some idea or a direction to address your problem.
WITH aggregation AS (
SELECT user, type, DATE_TRUNC(purchase_date, MONTH) AS month, month_no,
SUM(quantity) AS net_qty,
SUM(SUM(quantity)) OVER w1 AS rolling_qty
FROM sample_table, UNNEST([EXTRACT(YEAR FROM purchase_date) * 12 + EXTRACT(MONTH FROM purchase_date)]) month_no
GROUP BY 1, 2, 3, 4
WINDOW w1 AS (
PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
)
),
rolling AS (
SELECT user, month, ARRAY_AGG(STRUCT(type, net_qty)) OVER w2 AS agg, rolling_qty
FROM aggregation
QUALIFY ROW_NUMBER() OVER (PARTITION BY user, month) = 1
WINDOW w2 AS (PARTITION BY user ORDER BY month_no RANGE BETWEEN 3 PRECEDING AND CURRENT ROW)
)
SELECT user, month, ARRAY_TO_STRING(ARRAY(
SELECT type FROM (
SELECT type, SUM(net_qty) / SUM(SUM(net_qty)) OVER () AS pct,
FROM r.agg GROUP BY 1
) QUALIFY IFNULL(FIRST_VALUE(pct) OVER (ORDER BY pct DESC) - pct, 0) < 0.10 -- set threshold to 0.1
), '-') AS label
FROM rolling r
ORDER BY month;
Query results

SQL Query for a table

I’m looking for a little assistance. I have a table called equipment. One row is an order of some type of equipment.
Here are the fields:
num_id date player_id order_id active jersey comment
BIGINT DATE BIGINT BIGINT CHAR(1) CHAR(3) VARCHAR(1024)
11 2018-01-01 123 1 Y XL
11 2018-01-01 123 2 Y M Purple
11 2018-01-01 123 3 Y L White, Red
13 2018-01-11 456 1 N S Yellow, Light Blue
14 2018-02-01 789 1 Y M Orange, Black
15 2018-02-02 101 1 Y XL Shield
15 2018-02-02 101 2 Y XL Light Green, Grey
I need to write a query that shows one row for each month with the columns
Month
Total Orders
Total Products ordered
And one extra column for a total count of each size sold.
Is this easy? Any help would be appreciated.
EDIT: To answer people's questions below, SQL Server is the dbms. My apologies. As well, I am struggling as I don't know how to get the month from a date. And then adding the column for size counts has me baffled, but I haven't fully investigated that portion. I feel like the rest I have done individually, just never did it in one succinct query.
It looks weird here and I don't know how to add a table to stackoverflow, so I'll try to make it a little more visually appealing here:
The end goal I think would be like this:
Month Total Orders Total Products Ordered Size Count
January 1 3 S-0, M-1, L-1, XL-2
February 3 6 S–1, M–2, L–1, XL–3
Or this:
Month Total Orders Total Products Ordered S Count M Count L Count XL Count
January 1 3 0 1 1 2
February 3 6 1 2 1 3
You need PIVOT.
It basicly turns rows into columns, which exactly is your case.
https://www.codeproject.com/Tips/500811/Simple-Way-To-Use-Pivot-In-SQL-Query

DAX. Previous year Amount

I have a single table in Powerpivot.
My columns are Account, Amount and Date. I want to calculate PrevYearAmount, but I can't fin the correct formula.
Sample data:
Account Amount Date PrevYearAmount
1 100 01/01/2016 90
1 120 02/01/2016 200
2 130 01/01/2016 108
2 103 01/01/2015
2 105 01/01/2015
1 90 01/01/2015
1 200 02/01/2015
tried
=CALCULATE(SUM(Hoja1[Amount]);FILTER(Hoja1;DATEADD(Hoja1[Date];-1;YEAR));FILTER(Hoja1;Hoja1[Account]))
But this returns 350 for all rows.
Also tried:
=CALCULATE(SUM(Hoja1[Importe]);DATESYTD(SAMEPERIODLASTYEAR(Hoja1[Fecha])))
but returns blank
this should do the trick:
calculate(sum('Table1'[Amount]);SAMEPERIODLASTYEAR('Table1'[Date]))
Hope this helps.
But, please consider to create a date table, as it is always a good idea, to use relationships, this makes the expanding/collapsing part of DAX much easier.

SQL: Display joined data on a day to day basis anchored on a start date

Perhaps my title is misleading, but I am not sure how else to phrase this. I have two tables, tblL and tblDumpER. They are joined based on the field SubjectNumber. This is a one (tblL) to many (tblDumpER) relationship.
I need to write a query that will give me, for all my subjects, a value from tblDumpER associated with a date in tblL. This is to say:
SELECT tblL.SubjectNumber, tblDumpER.ER_Q1
FROM tblL
LEFT JOIN tblDumpER ON tblL.SubjectNumber=tblDumpER.SubjectNumber
WHERE tblL.RandDate=tblDumpER.ER_DATE And tblDumpER.ER_Q1 Is Not Null
This is straightforward enough. My problem is the value RandDate from tblL is different for every subject. However, it needs to be displayed as Day1 so I can have tblDumpER.ER_Q1 as Day1 for every subject. Then I need RandDate+1 As Day2, etc until I hit either null or Day84. The 'dumb' solution is to write 84 queries. This is obviously not practical. Any advice would be greatly appreciated!
I appreciate the responses so far but I don't think that I'm explaining this correctly so here is some example data:
SubjectNumber RandDate
1001 1/1/2013
1002 1/8/2013
1003 1/15/2013
SubjectNumber ER_DATE ER_Q1
1001 1/1/2013 5
1001 1/2/2013 6
1001 1/3/2013 2
1002 1/8/2013 1
1002 1/9/2013 10
1002 1/10/2013 8
1003 1/15/2013 7
1003 1/16/2013 4
1003 1/17/2013 3
Desired outcome:
(Where Day1=RandDate, Day2=RandDate+1, Day3=RandDate+2)
SubjectNumber Day1_ER_Q1 Day2_ER_Q1 Day3_ER_Q1
1001 5 6 2
1002 1 10 8
1003 7 4 3
This data is then going to be plotted on a graph with Day# on the X-axis and ER_Q1 on the Y-axis
I would do this in two steps:
Create a query that gets the MIN date for each SubjectNumber
Join this query to your existing query, so you can perform a DATEDIFF calculation on the MIN date and the date of the current record.
I'm not entirely sure of what it is that you need, but perhaps a calendar table would be of help. Just create a local table that contains all of the days of the year in it, then use that table to JOIN your dates up?