I have this table in Sybase:
Date                File_name  File_Size  customer  Id
1/1/2015 11:00:00   temp.csv   100000     ESPN      1111
1/1/2015 11:10:00   temp.csv   200000     ESPN      1122
1/1/2015 11:20:00   temp.csv   400000     ESPN      1456
1/1/2015 11:30:00   temp.csv   400000     ESPN      2345
1/2/2015 11:00:00   llc.csv    100000     LLC       445
1/2/2015 11:10:00   llc1.txt   200000     LLC       677
1/2/2015 11:20:00   dtt.txt    500000     LLC       76
1/2/2015 11:30:00   jpp.txt    400000     LLC       666
I need a query that summarizes this data by day, with the day formatted as month/day/year:
Date       total_file_size  number_of_unique_customers  number_unique_id
1/1/2015   1,100,000        1                           4
1/2/2015   1,200,000        1                           4
How would I do this in a SQL query? I tried this:
select convert(varchar,arrived_at,110) as Date
sum(File_Size),
count(distinct(customer)),
count(distinct(id))
group by Date
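It does not seem to be working. Any ideas?
Try this. It adds the missing comma after the first column and groups by the full expression rather than by the alias:
select
    convert(varchar,arrived_at,110) as Date,
    SUM(File_Size) as total_file_size,
    count(distinct customer) as number_of_unique_customers,
    count(distinct id) as number_unique_id
from your_table -- placeholder: the question does not name the table
group by convert(varchar,arrived_at,110)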
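To sanity-check the grouping logic without a Sybase server, here is a minimal sketch in Python using the standard-library sqlite3 module. Several things here are assumptions and not part of the question: SQLite has no convert(varchar, ..., 110), so date() stands in for it; the table name files is made up; and the timestamps are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name and ISO-formatted timestamps (assumptions for this sketch).
conn.execute(
    "CREATE TABLE files (arrived_at TEXT, file_name TEXT, "
    "file_size INTEGER, customer TEXT, id INTEGER)"
)
rows = [
    ("2015-01-01 11:00:00", "temp.csv", 100000, "ESPN", 1111),
    ("2015-01-01 11:10:00", "temp.csv", 200000, "ESPN", 1122),
    ("2015-01-01 11:20:00", "temp.csv", 400000, "ESPN", 1456),
    ("2015-01-01 11:30:00", "temp.csv", 400000, "ESPN", 2345),
    ("2015-01-02 11:00:00", "llc.csv", 100000, "LLC", 445),
    ("2015-01-02 11:10:00", "llc1.txt", 200000, "LLC", 677),
    ("2015-01-02 11:20:00", "dtt.txt", 500000, "LLC", 76),
    ("2015-01-02 11:30:00", "jpp.txt", 400000, "LLC", 666),
]
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)

# Group by the day expression itself; date() truncates the timestamp to a day.
daily = conn.execute(
    """
    SELECT date(arrived_at) AS day,
           SUM(file_size),
           COUNT(DISTINCT customer),
           COUNT(DISTINCT id)
    FROM files
    GROUP BY date(arrived_at)
    ORDER BY day
    """
).fetchall()

for row in daily:
    print(row)
```

The point carried over from the answer is that the GROUP BY repeats the same day expression that appears in the select list.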
Let's say I have the following two tables. The first is invoice data.
customer_id  scheduled_payment_date  scheduled_total_payment
1004         2021-04-08 00:00:00     1300
1004         2021-04-29 00:00:00     1300
1004         2021-05-13 00:00:00     1300
1004         2021-06-11 00:00:00     1300
1004         2021-06-26 00:00:00     1300
1004         2021-07-12 00:00:00     1300
1004         2021-07-26 00:00:00     1300
1003         2021-04-05 00:00:00     2012
1003         2021-04-21 00:00:00     2012
1003         2021-05-05 00:00:00     2012
1003         2021-05-17 00:00:00     2012
1003         2021-06-02 00:00:00     2012
1003         2021-06-17 00:00:00     2012
The second is payment data.
customer_id  payment_date         total_payment
1003         2021-04-06 00:00:00  2012
1003         2021-04-16 00:00:00  2012
1003         2021-05-03 00:00:00  2012
1003         2021-05-18 00:00:00  2012
1003         2021-06-01 00:00:00  2012
1003         2021-06-17 00:00:00  2012
1004         2021-04-06 00:00:00  1300
1004         2021-04-22 00:00:00  200
1004         2021-04-27 00:00:00  2600
1004         2021-06-11 00:00:00  1300
I want to allocate the payments to the invoices in the correct order, i.e. payments are allocated to the earliest charge first and then when that is paid start allocating to the next earliest charge. The results should look like:
customer_id  payment_date         scheduled_payment_date  total_payment  payment_allocation  scheduled_total_payment
1004         2021-04-06 00:00:00  2021-04-08 00:00:00     1300           1300                1300
1004         2021-04-22 00:00:00  2021-04-29 00:00:00     200            200                 1300
1004         2021-04-27 00:00:00  2021-04-29 00:00:00     2600           1100                1300
1004         2021-04-27 00:00:00  2021-05-13 00:00:00     2600           1300                1300
1004         2021-04-27 00:00:00  2021-06-11 00:00:00     2600           200                 1300
1004         2021-06-11 00:00:00  2021-06-11 00:00:00     1300           1100                1300
1004         2021-06-11 00:00:00  2021-06-26 00:00:00     1300           200                 1300
1003         2021-04-06 00:00:00  2021-04-05 00:00:00     2012           2012                2012
1003         2021-04-16 00:00:00  2021-04-21 00:00:00     2012           2012                2012
1003         2021-05-03 00:00:00  2021-05-05 00:00:00     2012           2012                2012
1003         2021-05-18 00:00:00  2021-05-17 00:00:00     2012           2012                2012
1003         2021-06-01 00:00:00  2021-06-02 00:00:00     2012           2012                2012
1003         2021-06-17 00:00:00  2021-06-17 00:00:00     2012           2012                2012
How can I do this in SQL?
When I was searching for the answer to this question I couldn't find a good solution anywhere so I figured out my own that I think can be understood and adapted for similar situations.
WITH payments_data AS (
    SELECT
        *,
        SUM(total_payment) OVER (
            PARTITION BY customer_id ORDER BY payment_ind ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS total_payment_cum,
        COALESCE(SUM(total_payment) OVER (
            PARTITION BY customer_id ORDER BY payment_ind ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ), 0) AS prev_total_payment_cum
    FROM (
        SELECT
            customer_id,
            payment_date,
            ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY payment_date ASC) AS payment_ind,
            total_payment
        FROM
            payments
    ) AS payments_ind
), charges_data AS (
    SELECT
        customer_id,
        scheduled_payment_date,
        scheduled_total_payment,
        SUM(scheduled_total_payment) OVER (
            PARTITION BY customer_id ORDER BY scheduled_payment_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS scheduled_total_payment_cum,
        COALESCE(SUM(scheduled_total_payment) OVER (
            PARTITION BY customer_id ORDER BY scheduled_payment_date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ), 0) AS prev_scheduled_total_payment_cum
    FROM
        charges
)
SELECT
    *,
    CASE
        WHEN current_balance >= 0 THEN IIF(
            updated_charges >= total_payment,
            total_payment,
            updated_charges
        )
        WHEN current_balance < 0 THEN IIF(
            scheduled_total_payment >= updated_payments,
            updated_payments,
            scheduled_total_payment
        )
        ELSE 0
    END AS payment_allocation
FROM (
    SELECT
        pd.customer_id,
        pd.payment_ind,
        payment_date,
        scheduled_payment_date,
        total_payment,
        scheduled_total_payment,
        total_payment_cum,
        scheduled_total_payment_cum,
        prev_total_payment_cum,
        prev_scheduled_total_payment_cum,
        prev_total_payment_cum - prev_scheduled_total_payment_cum AS current_balance,
        IIF(
            prev_total_payment_cum - prev_scheduled_total_payment_cum >= 0,
            scheduled_total_payment - (prev_total_payment_cum - prev_scheduled_total_payment_cum),
            NULL
        ) AS updated_charges,
        IIF(
            prev_total_payment_cum - prev_scheduled_total_payment_cum < 0,
            total_payment + (prev_total_payment_cum - prev_scheduled_total_payment_cum),
            NULL
        ) AS updated_payments
    FROM
        payments_data AS pd
        JOIN charges_data AS cd
            ON pd.customer_id = cd.customer_id
    WHERE
        prev_total_payment_cum < scheduled_total_payment_cum
        AND total_payment_cum > prev_scheduled_total_payment_cum
) data
There is a lot going on here so I wrote up an article explaining it in detail. You can find it on Medium here.
The basic idea is to track the cumulative amount of payment and charge through each record (the payments_data and charges_data CTEs) and then use this information to identify whether the charge and payment match each other (the WHERE statement that generates the "data" subquery). If they match then identify how much of the payment should be allocated to the charge (all the calculations related to the "current_balance").
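To make the cumulative-total idea concrete, here is a condensed sketch in Python with the standard-library sqlite3 module, run on customer 1004 from the question. It is not the query above: each payment and each charge is treated as an interval on the cumulative axis, the WHERE condition is the same overlap test, and the allocation is written as the length of the overlap (MIN of the two upper bounds minus MAX of the two lower bounds) in place of the CASE/IIF branches. The short column names cum and prev_cum are mine, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (customer_id INTEGER, payment_date TEXT, total_payment INTEGER);
    CREATE TABLE charges (customer_id INTEGER, scheduled_payment_date TEXT,
                          scheduled_total_payment INTEGER);
    INSERT INTO payments VALUES
        (1004, '2021-04-06', 1300), (1004, '2021-04-22', 200),
        (1004, '2021-04-27', 2600), (1004, '2021-06-11', 1300);
    INSERT INTO charges VALUES
        (1004, '2021-04-08', 1300), (1004, '2021-04-29', 1300),
        (1004, '2021-05-13', 1300), (1004, '2021-06-11', 1300),
        (1004, '2021-06-26', 1300), (1004, '2021-07-12', 1300),
        (1004, '2021-07-26', 1300);
    """
)

query = """
WITH p AS (  -- each payment covers (prev_cum, cum] of the running payment total
    SELECT customer_id, payment_date,
           SUM(total_payment) OVER (PARTITION BY customer_id ORDER BY payment_date) AS cum,
           SUM(total_payment) OVER (PARTITION BY customer_id ORDER BY payment_date)
               - total_payment AS prev_cum
    FROM payments
),
c AS (  -- each charge covers (prev_cum, cum] of the running charge total
    SELECT customer_id, scheduled_payment_date,
           SUM(scheduled_total_payment) OVER (
               PARTITION BY customer_id ORDER BY scheduled_payment_date) AS cum,
           SUM(scheduled_total_payment) OVER (
               PARTITION BY customer_id ORDER BY scheduled_payment_date)
               - scheduled_total_payment AS prev_cum
    FROM charges
)
SELECT p.payment_date, c.scheduled_payment_date,
       MIN(p.cum, c.cum) - MAX(p.prev_cum, c.prev_cum) AS payment_allocation
FROM p
JOIN c ON p.customer_id = c.customer_id
WHERE p.prev_cum < c.cum      -- same overlap test as the WHERE clause above
  AND p.cum > c.prev_cum
ORDER BY p.payment_date, c.scheduled_payment_date
"""
allocations = conn.execute(query).fetchall()
for row in allocations:
    print(row)
```

The running totals rely on the dates being unique per customer, which holds for the sample data. On these rows the allocations come out as 1300, 200, 1100, 1300, 200, 1100, 200, matching the payment_allocation column in the expected result for customer 1004.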
I have two tables: a month table and a product table. The product table will be updated in the future with new prices (I will insert rows with new 'valid_from' dates).
I would like to join the two tables to return month, product, price_rate, initial_price, hire_price, other and connection, subject to the following conditions:
I want to return the months from the month table that fall between valid_from and the next value of valid_from for the same product and price rate. The following query begins to return the dates and columns I need:
SELECT
month,
product,
price_rate,
initial_price,
hire_price,
other,
connection
FROM w.products pc
RIGHT JOIN m.month m ON m.month >= pc.valid_from
However, I also need to bring back all months from the month table for rows where valid_from is null in the product table, up to the next valid_from for the same product and price rate.
I also want to bring back the connection value. How can I do this, and what do I join on? I'm currently joining on the date, but the row that holds the connection value has no date.
id  product   price_rate  valid_from  initial_price  hire_price  other  connection
1   computer  100         NULL        154.75         115.5       0.015  NULL
2   computer  100         01/01/2021  154.75         115.5       0.015  NULL
3   computer  1000        NULL        154.75         135         0.015  NULL
4   computer  1000        01/01/2021  154.75         135         0.015  NULL
5   computer  10000       01/01/2020  453.41         345.5       0.015  NULL
6   mouse     100         NULL        154.75         142.5       0.015  NULL
7   mouse     100         01/01/2021  154.75         142.5       0.015  NULL
8   mouse     1000        01/01/2020  154.75         162         0.015  NULL
9   mouse     10000       01/01/2020  450.91         415         0.015  NULL
10  keyboard  100         NULL        163.08         142.5       0.015  NULL
11  keyboard  100         01/01/2021  163.08         142.5       0.015  NULL
12  keyboard  1000        01/01/2020  163.08         162         0.015  NULL
13  NULL      NULL        NULL        NULL           NULL        NULL   121
month
01/01/2019
01/02/2019
01/03/2019
01/04/2019
01/05/2019
01/06/2019
01/07/2019
01/08/2019
01/09/2019
01/10/2019
01/11/2019
01/12/2019
01/01/2020
01/02/2020
01/03/2020
01/04/2020
01/05/2020
01/06/2020
01/07/2020
01/08/2020
01/09/2020
01/10/2020
01/11/2020
01/12/2020
01/01/2021
01/02/2021
01/03/2021
01/04/2021
01/05/2021
01/06/2021
01/07/2021
01/08/2021
01/09/2021
01/10/2021
01/11/2021
01/12/2021
01/01/2022
01/02/2022
I am trying to query the IDs whose last entry falls within January (01/01/2020 to 31/01/2020).
The data is as below:
ID DATE
123 25/01/2020
123 27/01/2020
123 30/01/2020
123 02/02/2020
456 17/01/2020
456 18/01/2020
456 19/01/2020
456 22/01/2020
789 30/01/2020
789 01/01/2020
654 03/01/2020
654 08/01/2020
654 10/01/2020
654 25/01/2020
Expected Output
ID DATE
456 22/01/2020
654 25/01/2020
Thank you
You can use group by and having:
select id, max(date)
from t
group by id
having max(date) >= date '2020-01-01' and
max(date) < date '2020-02-01'
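Here is a small runnable sketch of the same GROUP BY / HAVING filter, using Python's standard-library sqlite3 module. Assumptions for the sketch: the table is called t as in the answer, the date column is renamed entry_date, and dates are stored as ISO strings so that MAX() and the comparisons work on plain text (SQLite does not accept the date '...' literal syntax). Note that on the sample rows the query also returns 789, whose latest entry is 30/01/2020.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# entry_date is a made-up column name; ISO strings sort chronologically.
conn.execute("CREATE TABLE t (id INTEGER, entry_date TEXT)")
rows = [
    (123, "2020-01-25"), (123, "2020-01-27"), (123, "2020-01-30"), (123, "2020-02-02"),
    (456, "2020-01-17"), (456, "2020-01-18"), (456, "2020-01-19"), (456, "2020-01-22"),
    (789, "2020-01-30"), (789, "2020-01-01"),
    (654, "2020-01-03"), (654, "2020-01-08"), (654, "2020-01-10"), (654, "2020-01-25"),
]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Keep only the ids whose latest entry lies in the half-open range
# [2020-01-01, 2020-02-01).
last_in_january = conn.execute(
    """
    SELECT id, MAX(entry_date) AS last_entry
    FROM t
    GROUP BY id
    HAVING MAX(entry_date) >= '2020-01-01'
       AND MAX(entry_date) <  '2020-02-01'
    ORDER BY id
    """
).fetchall()

for row in last_in_january:
    print(row)
```

ID 123 drops out because its latest entry is in February, even though it has several January rows. That is the difference between filtering in HAVING and filtering in WHERE.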
I have monthly revenue per account. What I am looking for is to view the earnings for each account in descending order, keeping only the rows where earnings dropped below the previous lowest value.
Here is the query:
SELECT account_id,
monthly_date,
earnings
FROM accounts_revenue
GROUP BY account_id,
monthly_date
The data looks something like this:
account_id  monthly_date  earnings
55          2017-01-01    2000
55          2017-02-01    1950
55          2017-10-01    2000
55          2018-02-01    1500
55          2018-05-01    1200
55          2018-12-01    3000
55          2019-01-01    900
55          2019-02-01    810
55          2019-04-01    1000
55          2019-05-01    600
55          2020-01-01    800
55          2020-02-01    100
122         2020-01-01    800
122         2020-02-01    100
So the result should look like this:
account_id  monthly_date  earnings
55          2017-01-01    2000
55          2017-02-01    1950
55          2018-02-01    1500
55          2018-05-01    1200
55          2019-01-01    900
55          2019-02-01    810
55          2019-05-01    600
55          2020-02-01    100
122         2020-01-01    800
122         2020-02-01    100
Any idea how to achieve this?
Use NOT EXISTS:
SELECT ar1.*
FROM accounts_revenue ar1
WHERE NOT EXISTS (
SELECT 1
FROM accounts_revenue ar2
WHERE ar2.account_id = ar1.account_id
AND ar2.monthly_date < ar1.monthly_date
AND ar2.earnings <= ar1.earnings
)
ORDER BY ar1.account_id, ar1.monthly_date;
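As a quick check, the sketch below runs this NOT EXISTS query against the sample data from the question, using Python's standard-library sqlite3 module (SQLite is my choice for the sketch; the question does not name a database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts_revenue (account_id INTEGER, monthly_date TEXT, earnings INTEGER)"
)
rows = [
    (55, "2017-01-01", 2000), (55, "2017-02-01", 1950), (55, "2017-10-01", 2000),
    (55, "2018-02-01", 1500), (55, "2018-05-01", 1200), (55, "2018-12-01", 3000),
    (55, "2019-01-01", 900), (55, "2019-02-01", 810), (55, "2019-04-01", 1000),
    (55, "2019-05-01", 600), (55, "2020-01-01", 800), (55, "2020-02-01", 100),
    (122, "2020-01-01", 800), (122, "2020-02-01", 100),
]
conn.executemany("INSERT INTO accounts_revenue VALUES (?, ?, ?)", rows)

# A row survives only if no earlier month of the same account had
# earnings less than or equal to it, i.e. it sets a new running minimum.
kept = conn.execute(
    """
    SELECT ar1.account_id, ar1.monthly_date, ar1.earnings
    FROM accounts_revenue ar1
    WHERE NOT EXISTS (
        SELECT 1
        FROM accounts_revenue ar2
        WHERE ar2.account_id = ar1.account_id
          AND ar2.monthly_date < ar1.monthly_date
          AND ar2.earnings <= ar1.earnings
    )
    ORDER BY ar1.account_id, ar1.monthly_date
    """
).fetchall()

for row in kept:
    print(row)
```

The ten rows it prints are the ones listed in the expected result.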
You can use the lag() window function and a CTE (Or subquery if you prefer) to filter out rows you don't want:
WITH revenue AS
(SELECT account_id, monthly_date, earnings,
lag(earnings) OVER (PARTITION BY account_id ORDER BY monthly_date) AS prev_earnings
FROM accounts_revenue)
SELECT account_id, monthly_date, earnings
FROM revenue
WHERE earnings < prev_earnings OR prev_earnings IS NULL
ORDER BY account_id, monthly_date;
For efficiency, you'll want an index on accounts_revenue(account_id, monthly_date).
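One thing worth knowing is that lag() compares each row only with the month immediately before it, while the NOT EXISTS query compares it with every earlier month. The two agree on the sample data in the question but can differ in general. The sketch below shows this on a hypothetical three-row series (account 1 and its values are made up for illustration), using Python's standard-library sqlite3 module and assuming SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts_revenue (account_id INTEGER, monthly_date TEXT, earnings INTEGER)"
)
# Hypothetical data: earnings rise and then fall, but never below the first month.
conn.executemany(
    "INSERT INTO accounts_revenue VALUES (?, ?, ?)",
    [(1, "2020-01-01", 100), (1, "2020-02-01", 300), (1, "2020-03-01", 200)],
)

# lag(): keep a row when it is lower than the previous month (or is the first row).
lag_rows = conn.execute(
    """
    WITH revenue AS (
        SELECT account_id, monthly_date, earnings,
               lag(earnings) OVER (PARTITION BY account_id ORDER BY monthly_date)
                   AS prev_earnings
        FROM accounts_revenue
    )
    SELECT earnings FROM revenue
    WHERE earnings < prev_earnings OR prev_earnings IS NULL
    ORDER BY account_id, monthly_date
    """
).fetchall()

# NOT EXISTS: keep a row only when it is lower than every earlier month.
not_exists_rows = conn.execute(
    """
    SELECT ar1.earnings
    FROM accounts_revenue ar1
    WHERE NOT EXISTS (
        SELECT 1 FROM accounts_revenue ar2
        WHERE ar2.account_id = ar1.account_id
          AND ar2.monthly_date < ar1.monthly_date
          AND ar2.earnings <= ar1.earnings
    )
    ORDER BY ar1.account_id, ar1.monthly_date
    """
).fetchall()

print("lag:", [r[0] for r in lag_rows])
print("not exists:", [r[0] for r in not_exists_rows])
```

Here the lag() version keeps the 200 row because it is lower than 300, while NOT EXISTS drops it because 100 came earlier. Which behaviour is wanted depends on what "decrease" should mean for your data.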
I have the following table:
City date count
Seattle 2016-07-14 10
Seattle 2016-07-15 20
Seattle 2016-07-16 30
Seattle 2016-07-18 40
Seattle 2016-07-19 50
Seattle 2016-07-20 60
Seattle 2016-07-25 70
Seattle 2016-07-26 80
Bellevue 2016-07-21 90
Bellevue 2016-07-22 100
Bellevue 2016-07-23 110
Bellevue 2016-07-25 120
Bellevue 2016-07-26 130
Bellevue 2016-07-27 140
Bellevue 2016-08-10 150
Bellevue 2016-08-11 160
Bellevue 2016-08-12 170
I want to summarize this table into date intervals, with one row per run of consecutive days. Whenever there is a break in the days, I want to start another row. My sample output should be as follows:
City min_date max_date sum_count
Seattle 2016-07-14 2016-07-16 60
Seattle 2016-07-18 2016-07-20 150
Seattle 2016-07-25 2016-07-26 150
Bellevue 2016-07-21 2016-07-23 300
Bellevue 2016-07-25 2016-07-27 390
Bellevue 2016-08-10 2016-08-12 480
As you can see, whenever there is a break in the dates, a new row is created and the count is summed over that run of consecutive days.
I tried:
select city, min(date), max(date) , sum(count) from table
group by city
but that gives only two rows, one per city.
Can anybody help me in doing this in Hive?
This is a "gaps-and-islands" problem. Subtracting the row number from the date gives a value that stays constant within each run of consecutive dates, so you can group by it:
select city, min(date), max(date), sum(count)
from (select t.*,
row_number() over (partition by city order by date) as seqnum
from t
) t
group by city, date_sub(date, seqnum);
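The same row-number trick can be tried locally with Python's standard-library sqlite3 module. This is only a sketch under a few assumptions: SQLite stands in for Hive, so date(day, '-N days') replaces date_sub(); the columns are renamed day and cnt to avoid clashing with function names; and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (city TEXT, day TEXT, cnt INTEGER)")
rows = [
    ("Seattle", "2016-07-14", 10), ("Seattle", "2016-07-15", 20),
    ("Seattle", "2016-07-16", 30), ("Seattle", "2016-07-18", 40),
    ("Seattle", "2016-07-19", 50), ("Seattle", "2016-07-20", 60),
    ("Seattle", "2016-07-25", 70), ("Seattle", "2016-07-26", 80),
    ("Bellevue", "2016-07-21", 90), ("Bellevue", "2016-07-22", 100),
    ("Bellevue", "2016-07-23", 110), ("Bellevue", "2016-07-25", 120),
    ("Bellevue", "2016-07-26", 130), ("Bellevue", "2016-07-27", 140),
    ("Bellevue", "2016-08-10", 150), ("Bellevue", "2016-08-11", 160),
    ("Bellevue", "2016-08-12", 170),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# day minus seqnum is identical for all rows in one run of consecutive days,
# so it works as a grouping key for each island.
islands = conn.execute(
    """
    SELECT city, MIN(day), MAX(day), SUM(cnt)
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY city ORDER BY day) AS seqnum
        FROM t
    ) AS s
    GROUP BY city, date(day, '-' || seqnum || ' days')
    ORDER BY city, MIN(day)
    """
).fetchall()

for row in islands:
    print(row)
```

For Seattle, 2016-07-14 to 2016-07-16 have row numbers 1 to 3, so all three map to 2016-07-13. The next run starts at row number 4 on 2016-07-18 and maps to 2016-07-14, which puts it in a new group.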