Join on elements that aren't in a table, or bring back elements that aren't in a table - sql

I have two tables: a month table and a product table. The product table will be updated in the future with new prices (I will insert rows with new 'valid_from' dates).
I would like to join the two tables to return month, product, price_rate, initial_price, hire_price, other and connection under specific conditions.
I want to return the months from the month table that fall within the date range between valid_from and the next value of valid_from for the same product and price rate. The following query begins to return the dates and columns I need:
SELECT
month,
product,
price_rate,
initial_price,
hire_price,
other,
connection
FROM w.products pc
RIGHT JOIN m.month m ON m.month >= pc.valid_from
However, I also need to bring back all months from the month table for rows where valid_from is null in the products table, up to the next valid_from (for the same product and price rate).
I also want to bring back the connection value. How can I do this, and what do I join on? I'm currently joining on date, but the row that holds the connection value has no date.
id  product   price_rate  valid_from  initial_price  hire_price  other  connection
--  --------  ----------  ----------  -------------  ----------  -----  ----------
1   computer  100                     154.75         115.5       0.015
2   computer  100         01/01/2021  154.75         115.5       0.015
3   computer  1000                    154.75         135         0.015
4   computer  1000        01/01/2021  154.75         135         0.015
5   computer  10000       01/01/2020  453.41         345.5       0.015
6   mouse     100                     154.75         142.5       0.015
7   mouse     100         01/01/2021  154.75         142.5       0.015
8   mouse     1000        01/01/2020  154.75         162         0.015
9   mouse     10000       01/01/2020  450.91         415         0.015
10  keyboard  100                     163.08         142.5       0.015
11  keyboard  100         01/01/2021  163.08         142.5       0.015
12  keyboard  1000        01/01/2020  163.08         162         0.015
13                                                                      121
month
01/01/2019
01/02/2019
01/03/2019
01/04/2019
01/05/2019
01/06/2019
01/07/2019
01/08/2019
01/09/2019
01/10/2019
01/11/2019
01/12/2019
01/01/2020
01/02/2020
01/03/2020
01/04/2020
01/05/2020
01/06/2020
01/07/2020
01/08/2020
01/09/2020
01/10/2020
01/11/2020
01/12/2020
01/01/2021
01/02/2021
01/03/2021
01/04/2021
01/05/2021
01/06/2021
01/07/2021
01/08/2021
01/09/2021
01/10/2021
01/11/2021
01/12/2021
01/01/2022
01/02/2022
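
One possible reading of the requirement is that each products row applies from its valid_from (or from the very beginning when valid_from is null) until the next valid_from for the same product and price_rate. Under that assumption, a minimal sketch using Python's sqlite3 module and a LEAD() window function (SQLite 3.25 or later) might look like the following. The table names `products` and `months`, the ISO date strings and the trimmed column list are assumptions made for illustration, and the connection row (id 13) is not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER, product TEXT, price_rate INTEGER,
                       valid_from TEXT, hire_price REAL);
-- rows 1 and 2 of the sample data, with dates rewritten as ISO strings
INSERT INTO products VALUES
  (1, 'computer', 100, NULL,         115.5),
  (2, 'computer', 100, '2021-01-01', 115.5);
CREATE TABLE months (month TEXT);
INSERT INTO months VALUES
  ('2020-11-01'), ('2020-12-01'), ('2021-01-01'), ('2021-02-01');
""")

query = """
WITH ranges AS (
  SELECT id, product, price_rate, hire_price,
         -- assumption: a NULL valid_from means "valid from the beginning"
         COALESCE(valid_from, '0000-01-01') AS start_date,
         -- the next valid_from for the same product and price_rate
         LEAD(valid_from) OVER (
           PARTITION BY product, price_rate
           ORDER BY COALESCE(valid_from, '0000-01-01')
         ) AS end_date
  FROM products
)
SELECT m.month, r.id, r.product, r.price_rate, r.hire_price
FROM months AS m
JOIN ranges AS r
  ON m.month >= r.start_date
 AND (r.end_date IS NULL OR m.month < r.end_date)
ORDER BY r.product, r.price_rate, m.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With these four months, the 2020 months pair with row 1 and the 2021 months pair with row 2. The half-open range (start inclusive, end exclusive) keeps a month from matching two price rows at once.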

Related

Calculate Average for Amount for certain date range in a year based on month

I have a table like below :
ID  Amount  Date
--  ------  ----------
1   500     2022-01-03
1   200     2022-01-04
1   500     2022-01-05
1   340     2022-01-06
1   500     2022-01-25
1   500     2022-01-26
1   567     2022-01-27
1   500     2022-01-28
1   598     2022-01-31
1   500     2022-02-01
1   787     2022-02-02
1   500     2022-02-03
1   5340    2022-02-04
PROBLEM:
I have to calculate the average of the Amount column starting from StartDate = 03/01/2022 (3rd Jan 2022). For each month the range runs from StartDate to a cutoff date: for January it is the average of Amount from StartDate to 25th Jan, for February from StartDate to 22nd Feb, and so on. The cutoff date comes from this logic:
SET @Last = (SELECT DATEADD(DAY, CASE DATENAME(WEEKDAY, @Date)
    WHEN 'Sunday' THEN -6
    WHEN 'Saturday' THEN -5
    ELSE -7 END, DATEDIFF(DAY, 0, @Date)))
RETURN @Last
ID  Amount  Date        Last
--  ------  ----------  ----------
1   500     2022-01-03  2022-01-25
1   500     2022-01-04  2022-01-25
1   340     2022-01-05  2022-01-25
1   500     2022-01-06  2022-01-25
1   567     2022-01-25  2022-01-25
1   500     2022-01-26  2022-01-25
1   500     2022-01-27  2022-01-25
1   40      2022-01-28  2022-01-25
1   500     2022-01-31  2022-01-25
1   589     2022-02-01  2022-02-22
1   540     2022-02-02  2022-02-22
1   500     2022-02-03  2022-02-22
1   5340    2022-02-04  2022-02-22
Like the table above.
Now, if I calculate Avg(Amount) from 3rd Jan to 25th Jan for January, from 3rd Jan to 22nd Feb for February, and so on, it does not give the correct average: it also includes the amounts for the remaining days. Also, GROUP BY groups by month rather than by the range in the WHERE clause.
Select Avg(Amount) from Table
where Date BETWEEN @StartDate AND Last
StartDate is fixed at 3rd Jan.
This is not giving the correct average. Is there any other way I could get the required data?
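
One way to read the requirement is as a cumulative average: for each distinct cutoff date, average every Amount dated between the fixed start date and that cutoff. A minimal sketch of that idea follows, using Python's sqlite3 module rather than SQL Server. The table name `payments` and the column names `txn_date` and `cutoff` (standing in for Date and Last) are hypothetical, and only a few of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (id INTEGER, amount REAL, txn_date TEXT, cutoff TEXT);
-- a subset of the sample rows; cutoff plays the role of the Last column
INSERT INTO payments VALUES
  (1, 500,  '2022-01-03', '2022-01-25'),
  (1, 340,  '2022-01-05', '2022-01-25'),
  (1, 567,  '2022-01-25', '2022-01-25'),
  (1, 500,  '2022-01-26', '2022-01-25'),
  (1, 589,  '2022-02-01', '2022-02-22'),
  (1, 5340, '2022-02-04', '2022-02-22');
""")

start_date = "2022-01-03"  # fixed start date from the question

# Join each distinct cutoff to every row between the start date and that
# cutoff, then average per cutoff instead of per calendar month.
query = """
SELECT c.cutoff, AVG(p.amount) AS avg_amount, COUNT(*) AS n_rows
FROM (SELECT DISTINCT cutoff FROM payments) AS c
JOIN payments AS p
  ON p.txn_date BETWEEN ? AND c.cutoff
GROUP BY c.cutoff
ORDER BY c.cutoff
"""
result = conn.execute(query, (start_date,)).fetchall()
for row in result:
    print(row)
```

Joining on the range rather than filtering with a single WHERE clause is what lets the February average include the January rows, while the January average stops at 25th Jan.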

SQL query for getting data for the last 6 months grouped by month?

I know a basic query to get some results for the last 6 months. Let's say like this:
SELECT *
FROM RANDOM_TABLE
WHERE Date_Column >= DATEADD(MONTH, -6, GETDATE())
But what if I'd like to get results grouped by month - each month looking back 6 months into the past?
The first three rows of a result could ideally look like this (count of IDs is random):
Month_and_year  COUNT(ID)
--------------  ---------
January 2017    120
February 2017   160
March 2017      240
The last three rows:
Month_and_year  COUNT(ID)
--------------  ---------
November 2021   80
December 2021   350
January 2022    260
Hope it's understandable.
Thanks in advance!
EDIT:
Over the hours I made a few corrections. Most notably I corrected the self join query to reflect my intentions and also added more details to better explain what is going on.
To my knowledge there are two ways to go about it (which are probably the same under the hood).
Also, please note that these solutions assume you have a month field already in place. If you have a date or timestamp field, you should take one extra preparation step.
[Addendum] To be more precise, I'd say that the ideal would be to have a date/timestamp field that is truncated/flattened to the first day of the month.
As an example,
month       amount
----------  ------
2021-01-01  50
2021-02-01  20
2021-03-01  10
2021-04-01  100
2021-05-01  20
2021-06-01  40
2021-07-01  80
2021-08-01  50
The first is to use a "self-non-equi join"
SELECT
a.month,
SUM(b.amount) AS amount_over_6_months
FROM table AS a
INNER JOIN table AS b ON a.month BETWEEN b.month AND DATEADD(MONTH, 5, b.month)
WHERE a.month >= DATEADD(MONTH, -5, GETDATE())
GROUP BY a.month
What happens here is that you are joining the table with itself. Specifically, for each row in the (a) alias, you join up to six rows from the (b) alias: the row for the same month plus the rows for each of the five preceding months. So...
a.month     b.month     a.amount  b.amount
----------  ----------  --------  --------
2021-01-01  2021-01-01  50        50
2021-02-01  2021-01-01  20        50
2021-02-01  2021-02-01  20        20
2021-03-01  2021-01-01  10        50
2021-03-01  2021-02-01  10        20
2021-03-01  2021-03-01  10        10
2021-04-01  2021-01-01  100       50
2021-04-01  2021-02-01  100       20
2021-04-01  2021-03-01  100       10
2021-04-01  2021-04-01  100       100
2021-05-01  2021-01-01  20        50
2021-05-01  2021-02-01  20        20
2021-05-01  2021-03-01  20        10
2021-05-01  2021-04-01  20        100
2021-05-01  2021-05-01  20        20
2021-06-01  2021-01-01  40        50
2021-06-01  2021-02-01  40        20
2021-06-01  2021-03-01  40        10
2021-06-01  2021-04-01  40        100
2021-06-01  2021-05-01  40        20
2021-06-01  2021-06-01  40        40
2021-07-01  2021-02-01  80        20
2021-07-01  2021-03-01  80        10
2021-07-01  2021-04-01  80        100
2021-07-01  2021-05-01  80        20
2021-07-01  2021-06-01  80        40
2021-07-01  2021-07-01  80        80
...         ...         ...       ...
Then it's just a matter of grouping based on the month in the (a) alias, and summing the amounts coming from the (b) alias.
The advantage of this approach is that it should be vendor and generation agnostic, save for the DATEADD() function.
The second solution would be to use window functions. I cannot comment on whether this would work with your vendor and the specific version.
SELECT
month,
SUM(amount) OVER (ORDER BY month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
FROM table
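
To compare the two approaches on the example data above, here is a minimal sketch using Python's sqlite3 module. The table name `monthly` is an assumption, SQLite's date(x, '+5 months') stands in for DATEADD(), the GETDATE() filter is left out so that every month is shown, and the window query needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly (month TEXT, amount INTEGER);
INSERT INTO monthly VALUES
  ('2021-01-01', 50),  ('2021-02-01', 20), ('2021-03-01', 10),
  ('2021-04-01', 100), ('2021-05-01', 20), ('2021-06-01', 40),
  ('2021-07-01', 80),  ('2021-08-01', 50);
""")

# Self non-equi join: pair each month with itself and the five months before it.
self_join = conn.execute("""
SELECT a.month, SUM(b.amount)
FROM monthly AS a
JOIN monthly AS b
  ON a.month BETWEEN b.month AND date(b.month, '+5 months')
GROUP BY a.month
ORDER BY a.month
""").fetchall()

# Window function: sum the current row and the five rows before it.
windowed = conn.execute("""
SELECT month,
       SUM(amount) OVER (ORDER BY month
                         ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
FROM monthly
ORDER BY month
""").fetchall()

for row in self_join:
    print(row)
```

Both queries return the same sums here because the sample has exactly one row per month with no gaps. The window version counts rows rather than months, so a missing month would make it reach further back than six months, while the self join would not.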

How to merge records with aggregate historical data?

I have a table with individual records and another that holds historical information about the individuals in the former.
I want to extract information about the individuals from the second table. Both tables have a timestamp. It is very important that the historical information happened before the record in the first table.
Date_Time name
0 2021-09-06 10:46:00 Leg It Liam
1 2021-09-06 10:46:00 Hollyhill Island
2 2021-09-06 10:46:00 Shani El Bolsa
3 2021-09-06 10:46:00 Kilbride Fifi
4 2021-09-06 10:46:00 Go
2100 2021-10-06 11:05:00 Slaneyside Babs
2101 2021-10-06 11:05:00 Hillview Joe
2102 2021-10-06 11:05:00 Fairway Flyer
2103 2021-10-06 11:05:00 Whiteys Surprise
2104 2021-10-06 11:05:00 Astons Lucy
The name is the variable by which you connect the two tables:
Date_Time name cc
13 2021-09-15 12:16:00 Hollyhill Island 6.00
14 2021-09-06 10:46:00 Hollyhill Island 4.50
15 2021-05-30 18:28:00 Hollyhill Island 3.50
16 2021-05-25 10:46:00 Hollyhill Island 2.50
17 2021-05-18 12:46:00 Hollyhill Island 2.38
18 2021-04-05 12:31:00 Hollyhill Island 3.50
19 2021-04-28 12:16:00 Hollyhill Island 3.75
I want to add aggregated data from this table to the first, such as the cc mean and count.
Date_Time name
1 2021-09-06 10:46:00 Hollyhill Island
To this line I would add 5 for the cc count and 3.126 for the cc mean. Remember that the historical records need to be before the date-time of the individual record.
I am a bit confused about how to do this efficiently. I know I need to groupby the historical data.
Also, the individual records usually come in groups that share a Date_Time, if that makes it any easier.
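IIUC, try:
# merge both DataFrames on name; df2's Date_Time becomes Date_Time_y
out = df1.merge(df2, on='name', suffixes=('', '_y'))
# keep only the historical rows that are strictly earlier than the individual record
out = out[out['Date_Time'] > out['Date_Time_y']]
# count and average cc for each individual record
out = out.groupby(['Date_Time', 'name'])['cc'].agg(['count', 'mean']).reset_index()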
output of out:
Date_Time name count mean
0 2021-09-06 10:46:00 Hollyhill Island 5 3.126

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
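
For readers without an Oracle instance, the same lead() idea can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not the query above: the table name `balances` is hypothetical, julianday() differences stand in for Oracle date subtraction, the median is computed in Python because SQLite has no built-in median(), and only the rows around each zero balance for ID 20 are loaded. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3
from statistics import median

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balances (id INTEGER, date_ TEXT, amount REAL, running_amount REAL);
INSERT INTO balances VALUES
  (10, '2019-06-27 14:30:00',  100,    100),
  (10, '2019-06-29 15:26:00', -100,    0),
  (10, '2019-07-03 01:56:00',  83,     83),
  (10, '2019-07-04 17:53:00',  15,     98),
  (10, '2019-07-05 15:09:00', -98,     0),
  (10, '2019-07-05 15:53:00',  98.98,  98.98),
  (10, '2019-07-05 19:54:00', -98.98,  0),
  (10, '2019-07-07 01:36:00',  90.97,  90.97),
  (10, '2019-07-07 13:02:00', -90.97,  0),
  (10, '2019-07-07 16:32:00',  39.88,  39.88),
  (20, '2019-01-10 16:18:00', -861.88, 0),
  (20, '2019-01-12 16:46:00',  894.49, 894.49);
""")

# For every row, compute the days until the next row of the same id,
# then keep only the rows where the running balance is zero.
rows = conn.execute("""
SELECT id, diff FROM (
  SELECT id, running_amount,
         julianday(LEAD(date_) OVER (PARTITION BY id ORDER BY date_))
           - julianday(date_) AS diff
  FROM balances
)
WHERE running_amount = 0 AND diff IS NOT NULL
""").fetchall()

# Group the gaps by id and take the median of each group in Python.
diffs = {}
for id_, diff in rows:
    diffs.setdefault(id_, []).append(diff)
median_age = {id_: median(values) for id_, values in diffs.items()}
print(median_age)
```

The window function runs in the inner query so that lead() sees the non-zero rows too; filtering on running_amount = 0 first would make each zero row point at the next zero row instead of the next transaction.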

How do you summarize row data in a Sybase table

I have this table in sybase:
Date               File_name  File_Size  customer  Id
1/1/2015 11:00:00  temp.csv   100000     ESPN      1111
1/1/2015 11:10:00  temp.csv   200000     ESPN      1122
1/1/2015 11:20:00  temp.csv   400000     ESPN      1456
1/1/2015 11:30:00  temp.csv   400000     ESPN      2345
1/2/2015 11:00:00  llc.csv    100000     LLC       445
1/2/2015 11:10:00  llc1.txt   200000     LLC       677
1/2/2015 11:20:00  dtt.txt    500000     LLC       76
1/2/2015 11:30:00  jpp.txt    400000     LLC       666
I need to come up with a query to summarize this data by day, with the day formatted as month/day/year.
Date total_file_size number_of_unique_customers number_unique_id
1/1/2015 110,000 1 4
1/2/2015 120,000 1 4
How would I do this in a SQL query? I tried this:
select convert(varchar,arrived_at,110) as Date
sum(File_Size),
count(distinct(customer)),
count(distinct(id))
group by Date
Does not seem to be working, any ideas?
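Try the following. Your query is missing the comma after the first select item and has no from clause; this version also groups by the full convert() expression instead of the alias:
select
convert(varchar,arrived_at,110) as Date,
SUM(File_Size) as total_file_size,
count(distinct customer) as number_of_unique_customers,
count(distinct id ) as number_unique_id
from your_table -- placeholder: the table name is not given in the question
group by convert(varchar,arrived_at,110)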
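
The same grouping can be tried outside Sybase with a small sketch in Python's sqlite3 module. The table name `files` is hypothetical, the timestamps are rewritten as ISO strings on the assumption that the sample dates are in 2015, and SQLite's date() stands in for convert(varchar, arrived_at, 110).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (arrived_at TEXT, file_name TEXT, file_size INTEGER,
                    customer TEXT, id INTEGER);
INSERT INTO files VALUES
  ('2015-01-01 11:00:00', 'temp.csv', 100000, 'ESPN', 1111),
  ('2015-01-01 11:10:00', 'temp.csv', 200000, 'ESPN', 1122),
  ('2015-01-01 11:20:00', 'temp.csv', 400000, 'ESPN', 1456),
  ('2015-01-01 11:30:00', 'temp.csv', 400000, 'ESPN', 2345),
  ('2015-01-02 11:00:00', 'llc.csv',  100000, 'LLC',  445),
  ('2015-01-02 11:10:00', 'llc1.txt', 200000, 'LLC',  677),
  ('2015-01-02 11:20:00', 'dtt.txt',  500000, 'LLC',  76),
  ('2015-01-02 11:30:00', 'jpp.txt',  400000, 'LLC',  666);
""")

# Group by the day part of the timestamp and aggregate per day.
summary = conn.execute("""
SELECT date(arrived_at) AS day,
       SUM(file_size) AS total_file_size,
       COUNT(DISTINCT customer) AS number_of_unique_customers,
       COUNT(DISTINCT id) AS number_unique_id
FROM files
GROUP BY date(arrived_at)
ORDER BY day
""").fetchall()
for row in summary:
    print(row)
```

Grouping by the date expression itself, rather than by the raw timestamp, is what collapses the four rows of each day into one.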