Hive conditional count by resetting counter? - sql

I have two Hive tables, customer and transaction.
customer table
---------------------------------
customer_id | account_threshold
---------------------------------
101 | 200
102 | 500
transaction table
-------------------------------------------
transaction_date | customer_id | amount
-------------------------------------------
07/01/2018 | 101 | 250
07/01/2018 | 102 | 450
07/02/2018 | 101 | 500
07/03/2018 | 102 | 100
07/04/2018 | 102 | 50
Result:
------------------------------
customer_id | breach_count
------------------------------
101 | 2
102 | 1
I need to count, per customer, how many times the running sum of amount in the transaction table exceeds the account_threshold in the customer table.
When a breach is detected, the running sum is reset to 0.
For customer 101, the first transaction (250) is above the threshold of 200, so the breach count is 1 and the sum resets. The customer's next transaction (500, the third row of the table) breaches again, so the total breach count for 101 is 2.
For customer 102, the first transaction (450) is below the threshold of 500. The next transaction (100) brings the running sum to 550, which breaches the threshold, so breach_count is 1.
I have tried window functions, but I cannot work out how to proceed when joining the two tables.

You can write a subquery that computes a running total of amount per customer_id (ordered by amount), OUTER JOIN it to the customer table, and then COUNT the rows where the total exceeds the threshold:
SELECT t.customer_id, COUNT(t.total) breach_count
FROM customer c
LEFT JOIN
(
    SELECT t1.*,
           SUM(t1.amount) OVER (PARTITION BY t1.customer_id ORDER BY t1.amount) AS total
    FROM transaction1 t1
) t ON c.customer_id = t.customer_id
WHERE c.account_threshold < t.total
GROUP BY t.customer_id
Here is a SQL Fiddle run on SQL Server. It is a different DBMS, but the window function syntax is the same.
[Results]:
| customer_id | breach_count |
|-------------|--------------|
| 101 | 2 |
| 102 | 1 |
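The running total in the query above never resets, although it reproduces the expected counts on the sample data. As a rough illustration of the reset rule the question describes (accumulate amounts in date order, count a breach when the running sum exceeds the threshold, then start again from zero), here is a minimal Python sketch. The function name `count_breaches` and the in-memory tuples are assumptions made for illustration; this is not Hive code.

```python
from datetime import datetime


def count_breaches(thresholds, transactions):
    """Count breaches per customer, resetting the running sum after each breach.

    thresholds:   {customer_id: account_threshold}
    transactions: iterable of (transaction_date "MM/DD/YYYY", customer_id, amount)
    """
    running = {cid: 0 for cid in thresholds}
    breaches = {cid: 0 for cid in thresholds}
    # Process transactions in date order (the sort is stable for equal dates).
    ordered = sorted(transactions, key=lambda t: datetime.strptime(t[0], "%m/%d/%Y"))
    for _, cid, amount in ordered:
        running[cid] += amount
        if running[cid] > thresholds[cid]:
            breaches[cid] += 1
            running[cid] = 0  # reset after a breach
    return breaches


# Sample data copied from the question.
thresholds = {101: 200, 102: 500}
transactions = [
    ("07/01/2018", 101, 250),
    ("07/01/2018", 102, 450),
    ("07/02/2018", 101, 500),
    ("07/03/2018", 102, 100),
    ("07/04/2018", 102, 50),
]

print(count_breaches(thresholds, transactions))  # {101: 2, 102: 1}
```

A reset like this depends on the previous row's outcome, which is why a single plain window sum cannot express it in general.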

To reset count/rank/sum whenever value changes
Input table :-
Time | value
12   | A
13   | A
14   | C
15   | C
16   | B
17   | B
18   | A
First take the lag of the column to get the previous row's value:
Step 1. select *, lag(value) over (order by time) as lagval
Then compare the lag value with the current value: if they differ (or there is no previous row), set a flag of 1, else 0:
Step 2. select *, case when lagval is null or lagval <> value then 1 else 0 end as flag
Now take a running sum over the flag. Each group gets a different sum, where a group is a run of rows that starts whenever the value changes:
Step 3. select *, sum(flag) over (order by time) as flag_sum
Finally, number the rows within each group:
Step 4. select *, row_number() over (partition by flag_sum order by time) as rownumber
Final result
Time | value | lagval | flag | flag_sum | rownumber
12   | A     | null   | 1    | 1        | 1
13   | A     | A      | 0    | 1        | 2
14   | C     | A      | 1    | 2        | 1
15   | C     | C      | 0    | 2        | 2
16   | B     | C      | 1    | 3        | 1
17   | B     | B      | 0    | 3        | 2
18   | A     | B      | 1    | 4        | 1
You can use sum or count in place of row_number, depending on what you want to reset whenever the value changes.
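The four steps can be chained into one query. The sketch below runs them against an in-memory SQLite database from Python so the result can be checked. The table name `events` and the helper `reset_row_numbers` are placeholders, and the sketch assumes a SQLite build with window functions (3.25 or later); the same window syntax should carry over to Hive, but that is not tested here.

```python
import sqlite3

# One CTE per step; `events` is a hypothetical table name.
QUERY = """
WITH lagged AS (              -- step 1: previous row's value
    SELECT time, value,
           LAG(value) OVER (ORDER BY time) AS lagval
    FROM events
),
flagged AS (                  -- step 2: 1 where a new run starts
    SELECT *,
           CASE WHEN lagval IS NULL OR lagval <> value THEN 1 ELSE 0 END AS flag
    FROM lagged
),
grouped AS (                  -- step 3: running sum of flags labels each run
    SELECT *, SUM(flag) OVER (ORDER BY time) AS flag_sum
    FROM flagged
)
SELECT time, value, lagval, flag, flag_sum,
       ROW_NUMBER() OVER (PARTITION BY flag_sum ORDER BY time) AS rownumber  -- step 4
FROM grouped
ORDER BY time
"""


def reset_row_numbers(rows):
    """Load (time, value) rows into SQLite and return the step-4 result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (time INTEGER, value TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


sample = [(12, "A"), (13, "A"), (14, "C"), (15, "C"), (16, "B"), (17, "B"), (18, "A")]
for row in reset_row_numbers(sample):
    print(row)
```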

Related

Trying to get positive or negative difference in window in postgresql query

I have a table, daily, as follows:
|date|high|low|
I am attempting to return the max positive or negative difference for each N day window of the data. For example, the following query gets me very close for a 5 day window:
SELECT date, high, low, (high - low) AS diff
FROM (
SELECT dd.date AS date,
MAX(dd.high)
OVER(ORDER BY dd.date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS high,
MIN(dd.low)
OVER(ORDER BY dd.date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS low
FROM daily dd
) AS win
ORDER BY date
However, this query is not correct because the result will always be positive. If the high occurred before the low, the result should be negative. Is there a way to accomplish this with a query?
EDIT: Adding examples and expected result
EDIT2: Modified with better example
|date |high|low|
|01-01-2001|20 |10 |
|01-02-2001|30 |20 |
|01-03-2001|40 |30 |
|01-04-2001|30 |25 |
|01-05-2001|35 |25 |
Result for 5 day period should be:
|date |high|low|diff|
|01-01-2001|20 |10 |10 |
|01-02-2001|30 |10 |20 |
|01-03-2001|40 |10 |30 |
|01-04-2001|40 |10 |30 |
|01-05-2001|40 |10 |30 |
Result for 3 day period should be:
|date |high|low|diff|
|01-01-2001|20 |10 |10 |
|01-02-2001|30 |10 |20 |
|01-03-2001|40 |10 |30 |
|01-04-2001|40 |20 |20 |
|01-05-2001|40 |25 |-15 |
You can use a subquery to get the highest and lowest values from the daily table, then self-join back to daily and use CASE WHEN to pick the sign:
CREATE TABLE daily(
date date,
high int,
low int
);
INSERT INTO daily VALUES ('01-01-2001',40 ,30);
INSERT INTO daily VALUES ('01-02-2001',30 ,25);
INSERT INTO daily VALUES ('01-03-2001',35 ,25);
INSERT INTO daily VALUES ('01-04-2001',20 ,10);
INSERT INTO daily VALUES ('01-05-2001',30 ,20);
Query #1
SELECT t1.*,
CASE WHEN highdt.date > lowdt.date
THEN highest - lowest
ELSE lowest - highest
END diff
FROM (
select MAX(date) dates,
MAX(high) highest,
MIN(low) lowest
from daily
) t1
JOIN daily highdt ON t1.highest = highdt.high
JOIN daily lowdt ON t1.lowest = lowdt.low;
| dates | highest | lowest | diff |
| ------------------------ | ------- | ------ | ---- |
| 2001-01-05T00:00:00.000Z | 40 | 10 | -30 |
View on DB Fiddle
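The query above works over the whole table rather than over a sliding window. As a hedged sketch of the per-window rule the question asks for (the sign is negative when the window's high occurs before its low), here is a small Python version. The function name `signed_window_diff` is hypothetical, the dates are assumed to be MM-DD-YYYY, and ties are resolved by taking the earliest date, which is an assumption the question does not settle.

```python
from datetime import date


def signed_window_diff(rows, n):
    """rows: list of (date, high, low) sorted by date; n: window size in rows.

    Returns (date, window_high, window_low, diff) per row, where diff is
    negative if the window's highest high is dated before its lowest low.
    """
    out = []
    for i in range(len(rows)):
        window = rows[max(0, i - n + 1): i + 1]
        hi_row = max(window, key=lambda r: r[1])  # earliest row with the max high
        lo_row = min(window, key=lambda r: r[2])  # earliest row with the min low
        diff = hi_row[1] - lo_row[2]
        if hi_row[0] < lo_row[0]:                 # high came first: falling window
            diff = -diff
        out.append((rows[i][0], hi_row[1], lo_row[2], diff))
    return out


# Sample rows from the question.
daily = [
    (date(2001, 1, 1), 20, 10),
    (date(2001, 1, 2), 30, 20),
    (date(2001, 1, 3), 40, 30),
    (date(2001, 1, 4), 30, 25),
    (date(2001, 1, 5), 35, 25),
]

print([r[3] for r in signed_window_diff(daily, 3)])  # [10, 20, 30, 20, -15]
print([r[3] for r in signed_window_diff(daily, 5)])  # [10, 20, 30, 30, 30]
```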

Max value from joined table

I have two tables:
Operations (op_id,super,name,last)
Orders (or_id,number)
Operations:
+-------+-------+-------------+------+
| op_id | super | name        | last |
+-------+-------+-------------+------+
| 1     | 1     | OperationXX | 1    |
| 2     | 1     | OperationXY | 2    |
| 3     | 1     | OperationXC | 4    |
| 4     | 1     | OperationXZ | 3    |
| 5     | 2     | OperationXX | 1    |
| 6     | 3     | OperationXY | 2    |
| 7     | 4     | OperationXC | 1    |
| 8     | 4     | OperationXZ | 2    |
+-------+-------+-------------+------+
Orders:
+-------+--------+
| or_id | number |
+-------+--------+
| 1     | 2UY    |
| 2     | 23X    |
| 3     | xx2    |
| 4     | 121    |
+-------+--------+
I need a query that returns this table:
+-------+--------+-----------+-------------+
| or_id | number | max(last) | name        |
+-------+--------+-----------+-------------+
| 1     | 2UY    | 4         | OperationXC |
| 2     | 23X    | 1         | OperationXX |
| 3     | xx2    | 2         | OperationXY |
| 4     | 121    | 2         | OperationXZ |
+-------+--------+-----------+-------------+
Use a correlated subquery and a join:
select o.*, a.last, a.name
from
(
    select super, name, last
    from operations t
    where last = (select max(last) from operations t2 where t2.super = t.super)
) a
join orders o on a.super = o.or_id
You can use row_number as well:
with cte as
(
select * from
(
select * , row_number() over(partition by super order by last desc) rn
from operations
) tt where rn=1
) select o.*,cte.last,cte.name from Orders o join cte on o.or_id=cte.super
SELECT Orders.or_id, Orders.number, Operations.name, Operations.last AS max
FROM Orders
INNER JOIN Operations on Operations.super = Orders.or_id
GROUP BY Orders.or_id, Orders.number, Operations.name;
I don't have a way of testing this right now, but I think this is it.
Also, you didn't specify the foreign key, so the join might be wrong.
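To check the correlated-subquery approach against the sample data, here is a small sketch that runs it in an in-memory SQLite database from Python. It assumes, as the answers do, that Operations.super refers to Orders.or_id; the helper name `max_last_per_order` is hypothetical, and `last` is quoted only as a precaution because it is a keyword in some SQL dialects.

```python
import sqlite3

QUERY = """
SELECT o.or_id, o.number, a."last" AS max_last, a.name
FROM orders o
JOIN operations a ON a.super = o.or_id
WHERE a."last" = (SELECT MAX(t2."last")
                  FROM operations t2
                  WHERE t2.super = a.super)   -- keep only the row with the max per super
ORDER BY o.or_id
"""

OPERATIONS = [
    (1, 1, "OperationXX", 1), (2, 1, "OperationXY", 2),
    (3, 1, "OperationXC", 4), (4, 1, "OperationXZ", 3),
    (5, 2, "OperationXX", 1), (6, 3, "OperationXY", 2),
    (7, 4, "OperationXC", 1), (8, 4, "OperationXZ", 2),
]
ORDERS = [(1, "2UY"), (2, "23X"), (3, "xx2"), (4, "121")]


def max_last_per_order(operations, orders):
    """Return (or_id, number, max_last, name) for each order."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE operations (op_id INTEGER, super INTEGER, name TEXT, "last" INTEGER)')
    conn.execute("CREATE TABLE orders (or_id INTEGER, number TEXT)")
    conn.executemany("INSERT INTO operations VALUES (?, ?, ?, ?)", operations)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in max_last_per_order(OPERATIONS, ORDERS):
    print(row)
```

If two operations share the maximum `last` for the same `super`, this form returns both rows, whereas the row_number variant keeps exactly one.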

Better way of writing my SQL query with conditional group by

Here's my data
|vendorname |total|
---------------------
|Najla |10 |
|Disney |20 |
|Disney |10 |
|ToysRus |5 |
|ToysRus |1 |
|Gap |1 |
|Gap |2 |
|Gap |3 |
|Najla |2 |
Here's the resultset I want
|vendorname |grandtotal|
---------------------
|Disney |30 |
|Gap |6 |
|ToysRus |6 |
|Najla |2 |
|Najla |10 |
If the vendorname = 'Najla' I want individual rows with their respective total otherwise I would like to group them and return a sum of their totals.
This is my query--
select *
from
(
select vendorname, sum(total) grandtotal
from vendor
where vendorname<>'Najla'
group by vendorname
union all
select vendorname, total grandtotal
from vendor
where vendorname='Najla'
) A
I was wondering if there's a better way to write this query instead of repeating it twice and performing a union. Is there a condensed way to group some rows "conditionally"?
Honestly, I think the union all version is going to be the best performing and easiest to read option if it has appropriate indexes.
You could, however, do something like this (assuming you have a unique id on your table):
select vendorname, sum(total) grandtotal
from t
group by
vendorname
, case when vendorname = 'Najla' then id else null end
rextester demo: http://rextester.com/OGZQ33364
returns
+------------+------------+
| vendorname | grandtotal |
+------------+------------+
| Disney | 30 |
| Gap | 6 |
| ToysRus | 6 |
| Najla | 10 |
| Najla | 2 |
+------------+------------+
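The effect of the extra CASE expression in the GROUP BY can be checked with a short sketch. It runs the same grouping in an in-memory SQLite database from Python; the `id` column is the unique id the answer assumes, and the helper name `conditional_totals` is made up for the example.

```python
import sqlite3

QUERY = """
SELECT vendorname, SUM(total) AS grandtotal
FROM vendor
GROUP BY vendorname,
         CASE WHEN vendorname = 'Najla' THEN id END  -- unique per Najla row, NULL otherwise
ORDER BY vendorname, grandtotal
"""

DATA = [
    ("Najla", 10), ("Disney", 20), ("Disney", 10), ("ToysRus", 5), ("ToysRus", 1),
    ("Gap", 1), ("Gap", 2), ("Gap", 3), ("Najla", 2),
]


def conditional_totals(rows):
    """Sum totals per vendor, but keep every 'Najla' row separate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vendor (id INTEGER PRIMARY KEY, vendorname TEXT, total INTEGER)")
    conn.executemany("INSERT INTO vendor (vendorname, total) VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(conditional_totals(DATA))
# [('Disney', 30), ('Gap', 6), ('Najla', 2), ('Najla', 10), ('ToysRus', 6)]
```

For every vendor other than Najla the CASE yields NULL, so those rows collapse into one group per vendorname, while each Najla row keeps its own id and stays separate.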

SQL Insert Query For Multiple Max IDs

Table w:
|ID|Comment|SeqID|
|1 |bajg | 1 |
|1 |2423 | 2 |
|2 |ref | 1 |
|2 |comment| 2 |
|2 |juk | 3 |
|3 |efef | 1 |
|4 | hy | 1 |
|4 | 6u | 2 |
How do I insert a standard new comment for each ID, with a new SeqID equal to that ID's current highest SeqID plus 1?
The query below returns only the row with the highest SeqID overall:
Select *
From w
Where SEQID =
(select max(seqid)
from w)
Table w:
|2 |juk | 3 |
Expected Result
Table w:
|ID|Comment|SeqID|
|1 |sqc | 3 |
|2 |sqc | 4 |
|3 |sqc | 2 |
|4 |sqc | 3 |
Will I have to go through and insert all the values (new comment as sqc) I want into the table using the below, or is there a faster way?
INSERT INTO table_name
VALUES (value1,value2,value3,...);
Try this:
INSERT INTO mytable (ID, Comment, SeqID)
SELECT ID, 'sqc', MAX(SeqID) + 1
FROM mytable
GROUP BY ID
Demo here
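As a quick check of the INSERT ... SELECT approach, the sketch below applies it to the sample table in an in-memory SQLite database from Python. It uses the question's table name `w`; the helper name `append_comment` is an assumption for illustration.

```python
import sqlite3

ROWS = [
    (1, "bajg", 1), (1, "2423", 2),
    (2, "ref", 1), (2, "comment", 2), (2, "juk", 3),
    (3, "efef", 1),
    (4, "hy", 1), (4, "6u", 2),
]


def append_comment(rows, comment):
    """Insert one new row per ID with SeqID = max(SeqID) + 1; return the new rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE w (ID INTEGER, Comment TEXT, SeqID INTEGER)")
    conn.executemany("INSERT INTO w VALUES (?, ?, ?)", rows)
    # One grouped SELECT feeds the INSERT, so every ID gets exactly one new row.
    conn.execute(
        "INSERT INTO w (ID, Comment, SeqID) "
        "SELECT ID, ?, MAX(SeqID) + 1 FROM w GROUP BY ID",
        (comment,),
    )
    new_rows = conn.execute(
        "SELECT ID, Comment, SeqID FROM w WHERE Comment = ? ORDER BY ID", (comment,)
    ).fetchall()
    conn.close()
    return new_rows


print(append_comment(ROWS, "sqc"))
# [(1, 'sqc', 3), (2, 'sqc', 4), (3, 'sqc', 2), (4, 'sqc', 3)]
```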
You are probably better off just calculating the value when you query. Define an identity column on the table, say CommentId, and run a query like:
select id, comment,
row_number() over (partition by id order by CommentId) as SeqId
from t;
What is nice about this approach is that the sequence numbers are always sequential, there is no opportunity for duplicates, the table does not have to be locked when inserting, and the numbering stays correct after updates and deletes.

Sql Server Aggregation or Pivot Table Query

I'm trying to write a query that will tell me the number of customers who had a certain number of transactions each week. I don't know where to start with the query, but I'd assume it involves an aggregate or pivot function. I'm working in SQL Server Management Studio.
Currently the data looks like this, where the first column is the customer id and each subsequent column is a week:
|Customer| 1 | 2| 3 |4 |
----------------------
|001 |1 | 0| 2 |2 |
|002 |0 | 2| 1 |0 |
|003 |0 | 4| 1 |1 |
|004 |1 | 0| 0 |1 |
I'd like to see a return like the following:
|Visits |1 | 2| 3 |4 |
----------------------
|0 |2 | 2| 1 |1 |
|1 |2 | 0| 2 |2 |
|2 |0 | 1| 1 |1 |
|4 |0 | 1| 0 |0 |
What I want is to get the count of customer transactions per week. E.g. during the 1st week 2 customers (i.e. 002 and 003) had 0 transactions, 2 customers (i.e. 001 and 004) had 1 transaction, whereas zero customers had more than 1 transaction
The query below will get you the result you want, but note that it has the column names hard coded. It's easy to add more week columns, but if the number of columns is unknown then you might want to look into a solution using dynamic SQL (which would require accessing the information schema to get the column names). It's not that hard to turn it into a fully dynamic version though.
select
Visits
, coalesce([1],0) as Week1
, coalesce([2],0) as Week2
, coalesce([3],0) as Week3
, coalesce([4],0) as Week4
from (
select *, count(*) c from (
select '1' W, week1 Visits from t union all
select '2' W, week2 Visits from t union all
select '3' W, week3 Visits from t union all
select '4' W, week4 Visits from t ) a
group by W, Visits
) x pivot ( max (c) for W in ([1], [2], [3], [4]) ) as pvt;
In the query your table is called t and the output is:
Visits Week1 Week2 Week3 Week4
0 2 2 1 1
1 2 0 2 2
2 0 1 1 1
4 0 1 0 0
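The same unpivot-then-count idea can be checked outside SQL Server. The sketch below runs it in an in-memory SQLite database from Python; because SQLite has no PIVOT operator, conditional aggregation (SUM over CASE) is swapped in for the pivot step. The column names week1 to week4 follow the answer's query, and the helper name `visit_counts` is hypothetical.

```python
import sqlite3

QUERY = """
SELECT Visits,
       SUM(CASE WHEN W = 1 THEN 1 ELSE 0 END) AS Week1,
       SUM(CASE WHEN W = 2 THEN 1 ELSE 0 END) AS Week2,
       SUM(CASE WHEN W = 3 THEN 1 ELSE 0 END) AS Week3,
       SUM(CASE WHEN W = 4 THEN 1 ELSE 0 END) AS Week4
FROM (
    -- unpivot: one row per (week, customer) with that week's visit count
    SELECT 1 AS W, week1 AS Visits FROM t UNION ALL
    SELECT 2, week2 FROM t UNION ALL
    SELECT 3, week3 FROM t UNION ALL
    SELECT 4, week4 FROM t
) a
GROUP BY Visits
ORDER BY Visits
"""

CUSTOMERS = [
    ("001", 1, 0, 2, 2),
    ("002", 0, 2, 1, 0),
    ("003", 0, 4, 1, 1),
    ("004", 1, 0, 0, 1),
]


def visit_counts(rows):
    """Return (Visits, Week1..Week4): customers per visit count, per week."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (customer TEXT, week1 INTEGER, week2 INTEGER, "
        "week3 INTEGER, week4 INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in visit_counts(CUSTOMERS):
    print(row)
```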