How to delete records of orders that are canceled within 5 minutes in a database? - sql

I have a record of users' purchasing behavior. However, it is long and includes a lot of redundant data. I want to delete orders that were purchased and then canceled within 5 minutes.
My query so far:
--TABLE 3 COD
select z.user_id,
       z.date,
       z.actions,
       x.name,
       x.email,
       x.address,
       sum(z.credit) over (partition by z.seller_id order by z.created_at) -
       sum(z.debit)  over (partition by z.seller_id order by z.created_at)
           as balance
from table_1 z
left join table_2 x
  on z.seller_id = x.uid
order by z.seller_id, z.created_at
For simplicity, this is the result I get:
user  actions      credit  debit  balance  date
1     do_action_A  5000    0      5000     2020-01-01 1:00:00   # these two rows
1     cancel_A     0       5000   0        2020-01-01 1:03:00   # should be removed
1     do_action_A  5000    0      5000     2020-01-01 1:10:00
1     do_action_b  3000    0      8000     2020-01-01 1:20:00
1     do_action_c  0       7000   1000     2020-01-01 1:30:00
2     do_action_A  5000    0      5000     2020-01-01 1:00:00
2     do_action_B  3000    0      8000     2020-01-01 1:10:00
We know that users can only cancel their orders within 5 minutes; unfortunately, there are a lot of cancels. I need to make this data table simple and short so it is easy to track and visualize.
Here is my expectation:
user  actions      credit  debit  balance  date
1     do_action_A  5000    0      5000     2020-01-01 1:10:00
1     do_action_b  3000    0      8000     2020-01-01 1:20:00
1     do_action_c  0       7000   1000     2020-01-01 1:30:00
2     do_action_A  5000    0      5000     2020-01-01 1:00:00
2     do_action_B  3000    0      8000     2020-01-01 1:10:00

You can try using lead():
select *
from (
    select z.user_id, z.date, z.actions, x.name,
           x.email, x.address, debit, credit, balance,
           lead(z.actions) over (partition by z.user_id order by z.created_at) as next_action
    from table_1 z
    left join table_2 x
      on z.seller_id = x.uid
) a
where (next_action is null or next_action not like '%cancel%')
  and actions not like '%cancel%'
The next_action is null check keeps the last row of each user, where lead() returns NULL.
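The lead() trick can be sanity-checked with an in-memory SQLite database (SQLite ≥ 3.25 is needed for window functions). The table and column names below are simplified from the question; only the columns the filter touches are kept.

```python
import sqlite3

# Minimal sketch of the lead() cancel-filtering approach, using the
# sample data from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
create table table_1 (user_id int, actions text, credit int, debit int, created_at text);
insert into table_1 values
  (1, 'do_action_A', 5000, 0,    '2020-01-01 01:00:00'),
  (1, 'cancel_A',    0,    5000, '2020-01-01 01:03:00'),
  (1, 'do_action_A', 5000, 0,    '2020-01-01 01:10:00'),
  (1, 'do_action_b', 3000, 0,    '2020-01-01 01:20:00'),
  (1, 'do_action_c', 0,    7000, '2020-01-01 01:30:00'),
  (2, 'do_action_A', 5000, 0,    '2020-01-01 01:00:00'),
  (2, 'do_action_B', 3000, 0,    '2020-01-01 01:10:00');
""")
rows = con.execute("""
select user_id, actions, created_at
from (
    select user_id, actions, created_at,
           lead(actions) over (partition by user_id order by created_at) as next_action
    from table_1
) a
-- the IS NULL guard keeps the last row of each user (lead() is NULL there)
where (next_action is null or next_action not like '%cancel%')
  and actions not like '%cancel%'
order by user_id, created_at
""").fetchall()
for r in rows:
    print(r)
```

The canceled do_action_A at 1:00 and the cancel_A row itself are dropped; the other five rows survive, matching the expected output.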

How can I create a column which computes only the change of other column on redshift?

I have this dataset:
product  customer  date                 value  buyer_position
A        123455    2020-01-01 00:01:01  100    1
A        123456    2020-01-02 00:02:01  100    2
A        523455    2020-01-02 00:02:05  100    NULL
A        323455    2020-01-03 00:02:07  100    NULL
A        423455    2020-01-03 00:09:01  100    3
B        100455    2020-01-01 00:03:01  100    1
B        999445    2020-01-01 00:04:01  100    NULL
B        122225    2020-01-01 00:04:05  100    2
B        993848    2020-01-01 10:04:05  100    3
B        133225    2020-01-01 11:04:05  100    NULL
B        144225    2020-01-01 12:04:05  100    4
The dataset has the products the company sells and the customers who saw each product. A customer can see more than one product, but the combination product + customer is never repeated. I want to get, for each row, how many people had already bought the product before the customer saw it.
This would be the perfect output:
product  customer  date                 value  buyer_position  people_before
A        123455    2020-01-01 00:01:01  100    1               0
A        123456    2020-01-02 00:02:01  100    2               1
A        523455    2020-01-02 00:02:05  100    NULL            2
A        323455    2020-01-03 00:02:07  100    NULL            2
A        423455    2020-01-03 00:09:01  100    3               2
B        100455    2020-01-01 00:03:01  100    1               0
B        999445    2020-01-01 00:04:01  100    NULL            1
B        122225    2020-01-01 00:04:05  100    2               1
B        993848    2020-01-01 10:04:05  100    3               2
B        133225    2020-01-01 11:04:05  100    NULL            3
B        144225    2020-01-01 12:04:05  100    4               3
As you can see, when customer 122225 saw the product he wanted, two people had already bought it. In the case of customer 323455, two people had already bought product A.
I think I should use some window function, like lag(), but lag() alone won't give this "cumulative" information, so I'm kind of lost here.
This looks like a window count of non-null values of buyer_position over the preceding rows:
select t.*,
       coalesce(count(buyer_position) over (
           partition by product
           order by date
           rows between unbounded preceding and 1 preceding
       ), 0) as people_before
from mytable t
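The frame-limited COUNT can be verified on the product-A rows with in-memory SQLite (≥ 3.25 for window functions); only the columns the query touches are included, and COUNT skips the NULL buyer_position rows exactly as described.

```python
import sqlite3

# Sketch of the windowed count answer on the product-A sample rows.
con = sqlite3.connect(":memory:")
con.executescript("""
create table mytable (product text, customer int, date text, buyer_position int);
insert into mytable values
  ('A', 123455, '2020-01-01 00:01:01', 1),
  ('A', 123456, '2020-01-02 00:02:01', 2),
  ('A', 523455, '2020-01-02 00:02:05', null),
  ('A', 323455, '2020-01-03 00:02:07', null),
  ('A', 423455, '2020-01-03 00:09:01', 3);
""")
rows = con.execute("""
select customer,
       coalesce(count(buyer_position) over (
           partition by product
           order by date
           rows between unbounded preceding and 1 preceding
       ), 0) as people_before
from mytable
order by date
""").fetchall()
print(rows)
```

Because COUNT(column) ignores NULLs, the two NULL buyer_position rows add nothing, giving people_before = 0, 1, 2, 2, 2 as in the expected output.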
Hmm . . . if I understand correctly, you want the max of the buyer position for the customer/product minus 1:
select t.*,
       max(buyer_position) over (partition by customer, product
                                 order by date
                                 rows between unbounded preceding and current row) - 1
from t;

SQL query: advance salary counting with carry forward of monthly balance

I am displaying the total balance, with carry forward, across all months by selecting a month.
Suppose I have the following data:
emp_id ==== bal_amt ==== advance_sal ==== date ==== basic_salary
-----------------------------------------------------------------
1 48000 2000 2019-01-10 50000
1 46000 2000 2019-01-11 50000
2 78000 2000 2019-01-11 80000
2 75000 3000 2019-01-11 80000
1 49000 1000 2019-02-10 50000
2 74000 6000 2019-02-11 80000
If I select February, I want the last balance row of each emp_id for every month up to and including February, plus the total of those balance amounts.
Please see the selected rows below:
emp_id ==== bal_amt ==== advance_sal ==== date ==== basic_salary
-----------------------------------------------------------------
1 48000 2000 2019-01-10 50000
1 46000 2000 2019-01-11 50000 -- select
2 78000 2000 2019-01-11 80000
2 75000 3000 2019-01-11 80000 -- select
1 49000 1000 2019-02-10 50000 -- select
2 74000 6000 2019-02-11 80000 -- select
SELECT x.*
FROM advance_sal x
JOIN (SELECT emp_id, MAX(id) AS id
      FROM advance_sal
      WHERE MONTH(`date`) <= 2
        AND YEAR(`date`) = 2019
      GROUP BY emp_id) y
  ON y.emp_id = x.emp_id
 AND y.id = x.id
ORDER BY x.id
So total result would be
emp_id ==== bal_amt ==== advance_sal ==== date ==== basic_salary
----------------------------------------------------------------
1 46000 2000 2019-01-11 50000
2 75000 3000 2019-01-11 80000
1 49000 1000 2019-02-10 50000
2 74000 6000 2019-02-11 80000
emp_id ==== total_bal_amount less than February
----------------------------------------------------------------
1 95000
2 149000
Any help writing this SQL query would be appreciated.
I'm guessing you need the query below. I'm just not sure how you decide which "last" balance to pick when an emp_id has several rows on the same date, since your date column has no time part.
SELECT *, SUM(bal_amt) OVER (PARTITION BY emp_id) AS total_bal_amount
FROM (
    SELECT emp_id, bal_amt, advance_sal, date, basic_salary,
           ROW_NUMBER() OVER (PARTITION BY emp_id, DATE_FORMAT(date, '%Y-%m')
                              ORDER BY date DESC) AS rn
    FROM advance_sal
    WHERE MONTH(date) <= 2
      AND YEAR(date) = 2019
) a
WHERE rn = 1
Partitioning by emp_id and month keeps the last row of each month, which is what your expected output shows.
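The last-row-per-month-then-total idea can be demonstrated with in-memory SQLite. strftime('%Y-%m', date) stands in for MySQL's MONTH()/YEAR() filters, and rowid is used as a tie-breaker for the two same-date rows of emp_id 2, an assumption made only because the sample dates carry no time part.

```python
import sqlite3

# Sketch: pick the last balance per employee per month, then sum per employee.
con = sqlite3.connect(":memory:")
con.executescript("""
create table advance_sal (emp_id int, bal_amt int, advance_sal int, date text, basic_salary int);
insert into advance_sal values
  (1, 48000, 2000, '2019-01-10', 50000),
  (1, 46000, 2000, '2019-01-11', 50000),
  (2, 78000, 2000, '2019-01-11', 80000),
  (2, 75000, 3000, '2019-01-11', 80000),
  (1, 49000, 1000, '2019-02-10', 50000),
  (2, 74000, 6000, '2019-02-11', 80000);
""")
rows = con.execute("""
select emp_id, sum(bal_amt) as total_bal_amount
from (
    select emp_id, bal_amt,
           row_number() over (partition by emp_id, strftime('%Y-%m', date)
                              order by date desc, rowid desc) as rn
    from advance_sal
    where date < '2019-03-01'   -- everything before March, i.e. up to February
) t
where rn = 1
group by emp_id
order by emp_id
""").fetchall()
print(rows)
```

This picks 46000 + 49000 for emp_id 1 and 75000 + 74000 for emp_id 2, matching the 95000 and 149000 totals in the question.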

check if any three consecutive transactions are over a specified amount

There is a table which has data about the transactions of all the customers in a bank.
In the case of a stolen card (say, a debit card), the miscreant is likely to withdraw as much money as he can in small chunks to avoid suspicion. So the bank wants to flag it as fraud if any three consecutive transactions are each 10000 or more.
There can be 100 transactions per customer per day, but only three consecutive ones (any three, e.g. rows 1,2,3 or 7,8,9) per customer per day are considered.
The table has data as following:
cid  trans_date  trans_time  amount
1    1/15/2018   9:21:33     4000
1    1/15/2018   9:21:34     12000
1    1/15/2018   9:35:33     11000
1    1/15/2018   10:21:33    10000
2    2/17/2018   11:21:33    10000
2    2/18/2018   9:21:33     10000
2    3/18/2018   9:21:33     10000
3    1/15/2018   9:23:33     4000
3    1/15/2018   9:24:33     9000
3    1/15/2018   9:25:33     10000
3    1/15/2018   9:26:33     14000
3    1/15/2018   9:27:33     4000
3    2/18/2018   9:21:33     10000
3    2/18/2018   9:22:33     13000
4    1/15/2018   9:21:33     4000
4    1/15/2018   9:22:33     10000
4    1/15/2018   9:23:33     12000
4    1/15/2018   9:24:33     4000
4    1/15/2018   9:25:33     2000
4    1/15/2018   9:26:33     60000
4    1/15/2018   9:27:33     10000
The output should be the cid values which satisfy the above condition: cid 1. cid 2 shouldn't be flagged, since its transactions fall on different days; 3 and 4 of course don't satisfy the condition.
I tried the following query but it didn't give the desired result.
select distinct bb.cid
from (
    select cid,
           count(case when amount >= 10000 then 1 else 0 end)
               over (partition by cid, trans_date
                     order by trans_time
                     rows between current row and 2 following) cn
    from testtable
) bb
where cn >= 3
I am using Teradata v16 database.
COUNT counts non-NULL values, but your CASE always returns a value (0 or 1), so every row in the frame is counted. Either remove the ELSE 0 or switch to SUM. Additionally, you can get rid of the derived table using QUALIFY:
select distinct cid
from testtable
qualify
SUM(case when amount>=10000 then 1 else 0 end)
over(partition by cid, trans_date order by trans_time
rows 2 preceding) = 3
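The corrected SUM-over-frame logic can be checked with in-memory SQLite (≥ 3.25). SQLite has no QUALIFY, so the windowed SUM goes back into a derived table; a subset of the sample rows is enough to show cid 1 flagged and cids 2 and 4 not.

```python
import sqlite3

# Sketch of the three-consecutive-large-transactions check.
con = sqlite3.connect(":memory:")
con.executescript("""
create table testtable (cid int, trans_date text, trans_time text, amount int);
insert into testtable values
  (1, '2018-01-15', '09:21:33', 4000),
  (1, '2018-01-15', '09:21:34', 12000),
  (1, '2018-01-15', '09:35:33', 11000),
  (1, '2018-01-15', '10:21:33', 10000),
  (2, '2018-02-17', '11:21:33', 10000),
  (2, '2018-02-18', '09:21:33', 10000),
  (2, '2018-03-18', '09:21:33', 10000),
  (4, '2018-01-15', '09:21:33', 4000),
  (4, '2018-01-15', '09:22:33', 10000),
  (4, '2018-01-15', '09:23:33', 12000),
  (4, '2018-01-15', '09:24:33', 4000);
""")
rows = con.execute("""
select distinct cid
from (
    select cid,
           sum(case when amount >= 10000 then 1 else 0 end) over (
               partition by cid, trans_date
               order by trans_time
               rows between 2 preceding and current row
           ) as cn
    from testtable
) t
where cn = 3
""").fetchall()
print(rows)
```

Partitioning by cid and trans_date keeps each day separate, so cid 2's three large transactions on three different days never form a window of 3.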

MS SQL: how to get a summary parameter by time period (hours) for each work day

MS SQL 2014.
The plant runs 2 work shifts of 12 hours each. I need to create a statistics table with columns for the time, work shift, bunker number, and the weight of products in each bunker (kg).
For example:
DateTime                     Shift  Bunker  Weight
2018-02-25 12:43:50.9480000  1      1       123
2018-02-25 13:57:49.3300000  1      2       200
2018-02-25 15:21:15.2970000  1      2       100
2018-02-25 01:57:49.3300000  2      1       345
2018-02-25 02:21:15.2970000  2      1       55
2018-02-26 13:56:02.5570000  1      1       561
2018-02-26 14:57:49.3300000  1      2       254
2018-02-26 03:57:49.3300000  2      2       400
2018-02-26 05:57:49.3300000  2      2       200
How to make a query to output the total weight of products in each bunker for each working shift, for each day? Like this:
DateTime    Shift  Bunker  Weight
2018-02-25  1      1       123
2018-02-25  1      2       300
2018-02-25  2      1       400
2018-02-26  1      1       561
2018-02-26  1      2       254
2018-02-26  2      2       600
This is beyond my capabilities in SQL. Thanks.
select CONVERT(date, [DateTime]), shift, bunker, sum(Weight) as Weight
from table1
group by CONVERT(date, [DateTime]), shift, bunker
You need to GROUP BY the date part of the DateTime column, along with Shift and Bunker.
The following query should give you the desired output:
SELECT CAST([DATETIME] AS DATE) AS [DateTime], [Shift],[Bunker] ,SUM([Weight]) AS [Weight]
FROM [TABLE_NAME]
GROUP BY CAST([DATETIME] AS DATE), [Shift], [Bunker]
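The group-by-date-part idea is portable; here is a sketch with in-memory SQLite, where date() plays the role of CAST(... AS DATE) / CONVERT(date, ...) in SQL Server. Table and column names are simplified from the question.

```python
import sqlite3

# Sketch: total weight per day, shift, and bunker.
con = sqlite3.connect(":memory:")
con.executescript("""
create table stats (dt text, shift int, bunker int, weight int);
insert into stats values
  ('2018-02-25 12:43:50', 1, 1, 123),
  ('2018-02-25 13:57:49', 1, 2, 200),
  ('2018-02-25 15:21:15', 1, 2, 100),
  ('2018-02-25 01:57:49', 2, 1, 345),
  ('2018-02-25 02:21:15', 2, 1, 55),
  ('2018-02-26 13:56:02', 1, 1, 561);
""")
rows = con.execute("""
select date(dt) as day, shift, bunker, sum(weight) as weight
from stats
group by date(dt), shift, bunker
order by day, shift, bunker
""").fetchall()
print(rows)
```

The two shift-1/bunker-2 rows on 2018-02-25 collapse to 300 and the two shift-2/bunker-1 rows to 400, as in the expected table.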

Postgres: running balance and days since the previous transaction

Say I have the following data in my table;
tran_date   withdraw  deposit
25/11/2010  0         500
2/12/2010   100       0
15/12/2010  0         300
18/12/2010  0         200
25/12/2010  200       0
Suppose I want to get the following for date range between 1/12/2010 and 31/12/2010.
tran_date   withdraw  deposit  balance  days_since_last_tran
1/12/2010   0         0        500      0
2/12/2010   100       0        400      1
15/12/2010  0         300      700      13
18/12/2010  0         200      900      3
25/12/2010  200       0        700      7
31/12/2010  0         0        700      6
Is this doable in PostgreSQL 8.4?
Use:
SELECT t.tran_date,
t.withdraw,
t.deposit,
(SELECT SUM(y.deposit) - SUM(y.withdraw)
FROM YOUR_TABLE y
WHERE y.tran_date <= t.tran_date) AS balance,
t.tran_date - COALESCE(LAG(t.tran_date) OVER(ORDER BY t.tran_date),
t.tran_date) AS days_since_last
FROM YOUR_TABLE t
8.4+ is nice, providing access to analytic/windowing functions like LAG.
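The same shape of query can be sketched with in-memory SQLite (≥ 3.25 for LAG). julianday() differences replace Postgres date subtraction, ISO dates replace the d/m/y format, and the synthetic boundary rows (1/12 and 31/12) from the question are not generated here.

```python
import sqlite3

# Sketch: running balance via a correlated subquery, plus days since
# the previous transaction via lag().
con = sqlite3.connect(":memory:")
con.executescript("""
create table trans (tran_date text, withdraw int, deposit int);
insert into trans values
  ('2010-11-25', 0,   500),
  ('2010-12-02', 100, 0),
  ('2010-12-15', 0,   300),
  ('2010-12-18', 0,   200),
  ('2010-12-25', 200, 0);
""")
rows = con.execute("""
select t.tran_date,
       t.withdraw,
       t.deposit,
       (select sum(y.deposit) - sum(y.withdraw)
        from trans y
        where y.tran_date <= t.tran_date) as balance,
       cast(julianday(t.tran_date) -
            julianday(coalesce(lag(t.tran_date) over (order by t.tran_date),
                               t.tran_date)) as int) as days_since_last
from trans t
order by t.tran_date
""").fetchall()
print(rows)
```

The COALESCE falls back to the row's own date for the first transaction, so days_since_last starts at 0 instead of NULL.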