Find the first order of a supplier in a day using SQL - sql

I am trying to write a query to return supplier ID (sup_id), order date and the order ID of the first order (based on earliest time).
+--------+--------+------------+--------+-----------------+
|orderid | sup_id | items | sales | order_ts |
+--------+--------+------------+--------+-----------------+
|1111132 | 3 | 1 | 27,0 | 24/04/17 13:00 |
|1111137 | 3 | 2 | 69,0 | 02/02/17 16:30 |
|1111147 | 1 | 1 | 87,0 | 25/04/17 08:25 |
|1111153 | 1 | 3 | 82,0 | 05/11/17 10:30 |
|1111155 | 2 | 1 | 29,0 | 03/07/17 02:30 |
|1111160 | 2 | 2 | 44,0 | 30/01/17 20:45 |
|....... | ... | ... | ... | ... ... |
+--------+--------+------------+--------+-----------------+
Output I am looking for:
+--------+--------+------------+
| sup_id | date | order_id |
+--------+--------+------------+
|....... | ... | ... |
+--------+--------+------------+
I tried using a subquery in the join clause as below but didn't know how to join it without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN
(
SELECT sup_id, date(order_ts) as date, min(time(order_ts))
FROM sales
GROUP BY sup_id, date
) m
on ...
Kindly assist.

You can use not exists:
select *
from sales
where not exists (
-- find sales for same supplier, earlier time, same day
select *
from sales as older
where older.sup_id = sales.sup_id
and older.order_ts < sales.order_ts
and older.order_ts >= cast(sales.order_ts as date)
)
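To try the NOT EXISTS idea locally, here is a minimal sketch using Python's built-in sqlite3 module. The rows are hypothetical (two suppliers get a second order on the same day so the filter has something to do), timestamps are assumed to be stored as ISO-8601 text, and SQLite's date() stands in for the cast to date used above.

```python
import sqlite3

# Hypothetical sample rows: (orderid, sup_id, order_ts).
# Timestamps are ISO-8601 text, so string order equals chronological order.
ROWS = [
    (1111132, 3, "2017-04-24 13:00"),
    (1111133, 3, "2017-04-24 09:15"),
    (1111137, 3, "2017-02-02 16:30"),
    (1111147, 1, "2017-04-25 08:25"),
    (1111148, 1, "2017-04-25 18:40"),
]

FIRST_ORDER_SQL = """
SELECT s.sup_id, date(s.order_ts) AS order_date, s.orderid
FROM sales AS s
WHERE NOT EXISTS (
    -- an earlier order from the same supplier on the same calendar day
    SELECT 1
    FROM sales AS older
    WHERE older.sup_id = s.sup_id
      AND date(older.order_ts) = date(s.order_ts)
      AND older.order_ts < s.order_ts
)
ORDER BY s.sup_id, order_date
"""


def first_orders(rows):
    """Return (sup_id, order_date, orderid) for each supplier's first order per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (orderid INTEGER, sup_id INTEGER, order_ts TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(FIRST_ORDER_SQL).fetchall()
    conn.close()
    return result


result = first_orders(ROWS)
print(result)
```

If two orders share exactly the same timestamp, both survive the filter, so a tie-breaker such as the order ID would be needed in that case.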

The query below might not be the fastest in the world, but it returns every column of the first order for each supplier on each day.
select orderid, sup_id, items, sales, order_ts
from sales s
where order_ts = (
select min(order_ts)
from sales m
where m.sup_id = s.sup_id
and cast(m.order_ts as date) = cast(s.order_ts as date)
)

select sup_id, min(order_ts), min(orderid) from sales
where cast(order_ts as date) = '2022-03-15'
group by sup_id
This assumes orderid is an identity/auto-increment column, so the lowest orderid of the day is also the earliest order.

Related

Removing Duplicates only when criteria are met

Last question in regards to duplication. I understand how to select duplicate records using COUNT(*) with a HAVING clause (> 1), but I'm faced with the challenge of removing duplicates only when certain criteria are met.
I asked one part of this yesterday, about removing duplicates when the bill amounts cancel out. Now I have to add a further criterion: when a positive and a negative bill amount cancel out, the date and the code must also be the same for both records.
For example, record 1 has a bill amount of $250 with code "JUN" and a date of 03/02/2020, record 2 has a bill amount of $250 with code "PII" and a date of 03/07/2020, and record 3 has a bill amount of -$250 with code "PII" and a date of 03/07/2020. The result I would like to see in this example is only record 1; records 2 and 3 would be considered duplicates under the criteria I stated.
Table Creation:
CREATE TABLE Billing (
BillId varchar(10),
SerialNo varchar(10),
BillAmt MONEY,
Code varchar(5),
DispenseDt DATE
);
Data Entry:
INSERT INTO Billing (BillId, SerialNo, BillAmt, Code, DispenseDt)
VALUES ('BL_001','aaa-111',250,'AAP','20200503')
,('BL_002','aab-112',250,'ADD','20200309')
,('BL_003','aab-112',-250,'ADD','20200309')
,('BL_004','aba-120',700,'YED','20200503')
,('BL_005','aba-120',370,'TPP','20200822')
,('BL_006','aba-120',370,'TPP','20201003')
,('BL_007','aba-120',400,'TPP','20200822')
,('BL_008','aba-120',-370,'TPP','20200822')
,('BL_009','aba-120',-700,'YED','20200503')
,('BL_010','baa-201',1000,'TOK','20200927')
,('BL_011','baa-201',-1000,'TOK','20200927')
,('BL_012','bab-210',1000,'TOK','20200927');
Sample Data:
+----------+-----------+---------+------+------------+
| BillId | SerialNo | BillAmt | Code | DispenseDt |
+----------+-----------+---------+------+------------+
| BL_001 | aaa-111 | $250 | AAP | 20200503 |
| BL_002 | aab-112 | $250 | ADD | 20200309 |
| BL_003 | aab-112 |-$250 | ADD | 20200309 |
| BL_004 | aba-120 | $700 | YED | 20200503 |
| BL_005 | aba-120 | $370 | TPP | 20200822 |
| BL_006 | aba-120 | $370 | TPP | 20201003 |
| BL_007 | aba-120 | $400 | TPP | 20200822 |
| BL_008 | aba-120 |-$370 | TPP | 20200822 |
| BL_009 | aba-120 |-$700 | YED | 20200503 |
| BL_010 | baa-201 | $1000 | TOK | 20200927 |
| BL_011 | baa-201 |-$1000 | TOK | 20200927 |
| BL_012 | bab-210 | $1000 | TOK | 20200927 |
+----------+-----------+---------+------+------------+
Desired Results:
+----------+-----------+---------+------+------------+
| BillId | SerialNo | BillAmt | Code | DispenseDt |
+----------+-----------+---------+------+------------+
| BL_001 | aaa-111 | $250 | AAP | 20200503 |
| BL_006 | aba-120 | $370 | TPP | 20201003 |
| BL_007 | aba-120 | $400 | TPP | 20200822 |
| BL_012 | bab-210 | $1000 | TOK | 20200927 |
+----------+-----------+---------+------+------------+
My Code:
select a.SerialNo, a.BillAmt, a.Code, a.DispenseDt
from (
select *,
count(SerialNo) over(partition by SerialNo, DispenseDt) b
from Billing ) a
where b = 1
AND
InvoiceDt >= '20200601' And InvoiceDt <= '20200630'
AND
FacID IN ('IND600','IND605','IND610','IND620','IND630','IND640','IND650','IND660','IND670','IND680','IND690','IND695')
ORDER BY a.SerialNo;
I tried to solve this but got stuck myself. The idea was to compute ranks and then filter on rows that share the same rank, but the ranks my code creates (two of them, using rank() and row_number()) would remove some of the rows you need in the output. Maybe someone else can edit this code; that would be great.
Fiddle link:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e0c990d3694ad99b628b3e05a5de624f
select
BillId,
Code,
DispenseDt,
new_bill_amt,
rank()
over(partition by new_bill_amt,DispenseDt, code) as rank_,
row_number()
over(partition by new_bill_amt,DispenseDt, code) as rank_2
from (
select
*,
replace(billamt,'-','') as new_bill_amt
from Billing
) as f
I think this might work.
(I used a CTE, but you could convert that into a subquery.)
WITH base_cte AS (
SELECT
B1.SerialNo
, SUM(B1.BillAmt) AS [TotAmt]
, B1.Code
, B1.DispenseDt
FROM #Billing AS B1
GROUP BY
B1.SerialNo
, B1.Code
, B1.DispenseDt
)
SELECT
B.BillId
, B.SerialNo
, B.BillAmt
, B.code
, B.DispenseDt
FROM #Billing AS B
INNER JOIN base_cte AS X ON X.SerialNo = B.SerialNo
AND X.Code = B.Code
AND X.DispenseDt = B.DispenseDt
WHERE X.TotAmt = B.BillAmt
Output:
BillId SerialNo BillAmt code DispenseDt
BL_001 aaa-111 250.00 AAP 2020-05-03
BL_006 aba-120 370.00 TPP 2020-10-03
BL_007 aba-120 400.00 TPP 2020-08-22
BL_012 bab-210 1000.00 TOK 2020-09-27
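For anyone who wants to check the group-and-sum idea without a SQL Server instance, here is a rough sketch that loads the sample data from the question into an in-memory SQLite database through Python's sqlite3 module. It is an adaptation, not the exact query above: the table is a plain Billing table instead of #Billing, amounts are stored as integers, the dates are written as ISO text, and the join matches on SerialNo, Code and DispenseDt together (my assumption about the intended key).

```python
import sqlite3

# Sample rows from the question: (BillId, SerialNo, BillAmt, Code, DispenseDt).
BILLING = [
    ("BL_001", "aaa-111", 250, "AAP", "2020-05-03"),
    ("BL_002", "aab-112", 250, "ADD", "2020-03-09"),
    ("BL_003", "aab-112", -250, "ADD", "2020-03-09"),
    ("BL_004", "aba-120", 700, "YED", "2020-05-03"),
    ("BL_005", "aba-120", 370, "TPP", "2020-08-22"),
    ("BL_006", "aba-120", 370, "TPP", "2020-10-03"),
    ("BL_007", "aba-120", 400, "TPP", "2020-08-22"),
    ("BL_008", "aba-120", -370, "TPP", "2020-08-22"),
    ("BL_009", "aba-120", -700, "YED", "2020-05-03"),
    ("BL_010", "baa-201", 1000, "TOK", "2020-09-27"),
    ("BL_011", "baa-201", -1000, "TOK", "2020-09-27"),
    ("BL_012", "bab-210", 1000, "TOK", "2020-09-27"),
]

KEEP_SQL = """
WITH totals AS (
    -- net amount per serial number, code and dispense date
    SELECT SerialNo, Code, DispenseDt, SUM(BillAmt) AS TotAmt
    FROM Billing
    GROUP BY SerialNo, Code, DispenseDt
)
SELECT b.BillId
FROM Billing AS b
JOIN totals AS t
  ON t.SerialNo = b.SerialNo
 AND t.Code = b.Code
 AND t.DispenseDt = b.DispenseDt
-- keep a row only if it equals what is left after offsetting rows cancel
WHERE b.BillAmt = t.TotAmt
ORDER BY b.BillId
"""


def surviving_bills(rows):
    """Return the BillIds that remain once cancelling pairs are dropped."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Billing (BillId TEXT, SerialNo TEXT, BillAmt INTEGER,"
        " Code TEXT, DispenseDt TEXT)"
    )
    conn.executemany("INSERT INTO Billing VALUES (?, ?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(KEEP_SQL)]
    conn.close()
    return ids


kept = surviving_bills(BILLING)
print(kept)
```

Note that this relies on the net amount of a group matching exactly one remaining row; a group such as 370, 370, -370 on the same date would keep both positive rows.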
EDIT: Here's a different method with OVER().
SELECT
Y.BillId
, Y.SerialNo
, Y.BillAmt
, Y.Code
, Y.DispenseDt
FROM (
SELECT
B.BillId
, B.SerialNo
, B.BillAmt
, B.code
, B.DispenseDt
, [TotAmt] = SUM(B.BillAmt) OVER(PARTITION BY B.SerialNo, B.code, B.DispenseDt)
FROM #Billing AS B
) AS Y
WHERE Y.BillAmt = Y.TotAmt
ORDER BY Y.BillId

Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I calculate it? I could pivot the date column and subtract that way, but maybe there's a better way.
I was able to get this to work using lag() with partition by and order by:
select
date,
product,
count,
count - lag(count) over (partition by product order by date, product) as difference
from(
select
date,
product,
count(*) as count
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
) t
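Here is a small sketch of the same lag() pattern that can be run locally with Python's sqlite3 module. To keep it short it starts from a hypothetical pre-aggregated table named counts (date, product, cnt) holding the numbers from the question, instead of repeating the count(*) step. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical pre-aggregated rows: (date, product, cnt), taken from the
# example table in the question.
COUNTS = [
    ("2020-03-31", "p1", 100),
    ("2020-07-31", "p1", 1000),
    ("2020-09-30", "p1", 900),
    ("2020-12-31", "p1", 1100),
    ("2020-03-31", "p2", 200),
    ("2020-07-31", "p2", 210),
]

DIFF_SQL = """
SELECT date, product, cnt,
       -- previous row of the same product, ordered by date
       cnt - LAG(cnt) OVER (PARTITION BY product ORDER BY date) AS difference
FROM counts
ORDER BY product, date
"""


def with_difference(rows):
    """Return (date, product, cnt, difference) with a per-product delta."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (date TEXT, product TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)
    result = conn.execute(DIFF_SQL).fetchall()
    conn.close()
    return result


diffs = with_difference(COUNTS)
for row in diffs:
    print(row)
```

The first row of each product has no predecessor, so LAG returns NULL and the difference comes back as None in Python.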

How to add records for each user based on another existing row in BigQuery?

Posting here in case someone with more knowledge than me may be able to help me with some direction.
I have a table like this:
| Row | date |user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201125 | 1 | 0 |
-----------------------------------
| 4 | 20201114 | 2 | 32 |
-----------------------------------
| 5 | 20201116 | 2 | 0 |
-----------------------------------
| 6 | 20201120 | 2 | 23 |
-----------------------------------
However, from this, I need a record for each user for each day. If a day is missing for a user, the last recorded score should be carried forward, so I would have something like this:
| Row | date |user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201122 | 1 | 14 |
-----------------------------------
| 4 | 20201123 | 1 | 14 |
-----------------------------------
| 5 | 20201124 | 1 | 14 |
-----------------------------------
| 6 | 20201125 | 1 | 0 |
-----------------------------------
| 7 | 20201114 | 2 | 32 |
-----------------------------------
| 8 | 20201115 | 2 | 32 |
-----------------------------------
| 9 | 20201116 | 2 | 0 |
-----------------------------------
| 10 | 20201117 | 2 | 0 |
-----------------------------------
| 11 | 20201118 | 2 | 0 |
-----------------------------------
| 12 | 20201119 | 2 | 0 |
-----------------------------------
| 13 | 20201120 | 2 | 23 |
-----------------------------------
I'm trying to do this in BigQuery using Standard SQL. I have an idea of how to keep the same score across the following empty dates, but I really don't know how to add new rows for the missing dates for each user. Also, just to keep in mind, this example only has 2 users, but in my data I have more than 1500.
My end goal would be to show something like the average of the score per day. For background, because of our logic, if the score wasn't recorded in a specific day, this means that the user is still in the last score recorded which is why I need a score for every user every day.
I'd really appreciate any help I could get! I've been trying different options without success
Below is for BigQuery Standard SQL
#standardSQL
select date, user_id,
last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
select user_id, format_date('%Y%m%d', day) date,
from (
select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
from `project.dataset.table`
group by user_id
) a, unnest(generate_date_array(min_date, max_date)) day
)
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date
One option uses generate_date_array() to create the series of dates for each user, then brings in the table with a left join.
-- assumes date is a DATE column
select d.date, d.user_id,
last_value(t.score ignore nulls) over(partition by d.user_id order by d.date) as score
from (
select u.user_id, dt as date
from (
select user_id, min(date) as min_date, max(date) as max_date
from mytable
group by user_id
) u
cross join unnest(generate_date_array(u.min_date, u.max_date, interval 1 day)) as dt
) d
left join mytable t on t.user_id = d.user_id and t.date = d.date
I think the most efficient method is to use generate_date_array() but in a very particular way:
with t as (
select t.*,
date_add(lead(date) over (partition by user_id order by date), interval -1 day) as next_date
from t
)
select row_number() over (order by t.user_id, dte) as id,
t.user_id, dte, t.score
from t cross join
unnest(generate_date_array(date,
coalesce(next_date, date),
interval 1 day
)
) dte;
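The gap-filling logic can also be tried outside BigQuery. The sketch below uses Python's sqlite3 module, and since SQLite has no generate_date_array(), it swaps in a recursive CTE to build each user's calendar and a correlated subquery to pick the last recorded score. The table name scores and the column score_date are placeholders of mine, and dates are assumed to be ISO text (YYYY-MM-DD).

```python
import sqlite3

# Sample rows from the question: (score_date, user_id, score).
SCORES = [
    ("2020-11-20", 1, 26),
    ("2020-11-21", 1, 14),
    ("2020-11-25", 1, 0),
    ("2020-11-14", 2, 32),
    ("2020-11-16", 2, 0),
    ("2020-11-20", 2, 23),
]

FILL_SQL = """
WITH RECURSIVE
bounds AS (
    -- first and last recorded day per user
    SELECT user_id, MIN(score_date) AS min_date, MAX(score_date) AS max_date
    FROM scores
    GROUP BY user_id
),
calendar(user_id, day, max_date) AS (
    -- one row per user per day between the two bounds
    SELECT user_id, min_date, max_date FROM bounds
    UNION ALL
    SELECT user_id, date(day, '+1 day'), max_date
    FROM calendar
    WHERE day < max_date
)
SELECT c.day, c.user_id,
       -- most recent score recorded on or before this day
       (SELECT s.score
        FROM scores AS s
        WHERE s.user_id = c.user_id AND s.score_date <= c.day
        ORDER BY s.score_date DESC
        LIMIT 1) AS score
FROM calendar AS c
ORDER BY c.user_id, c.day
"""


def fill_gaps(rows):
    """Return (day, user_id, score) with missing days carried forward."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (score_date TEXT, user_id INTEGER, score INTEGER)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    result = conn.execute(FILL_SQL).fetchall()
    conn.close()
    return result


filled = fill_gaps(SCORES)
for row in filled:
    print(row)
```

On a large table the generate_date_array() versions above are likely the better fit; this is only meant to make the carry-forward behaviour easy to inspect.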

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
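As a quick check of the UNION ALL approach, here is a sketch that loads the question's sample rows into SQLite via Python's sqlite3 module. It uses standard `AS` aliases instead of the SQL Server `alias = expression` form, and quotes the Time column name to be safe; the table name t1 follows the question.

```python
import sqlite3

# Sample rows from the question: (ID, Time, Total, Weekday, Weekend).
TASKS = [
    (1001, "AM", 5, 5, 0),
    (1001, "AM", 2, 0, 2),
    (1001, "AM", 4, 1, 3),
    (1001, "AM", 5, 3, 2),
    (1001, "PM", 5, 3, 2),
    (1001, "PM", 5, 5, 0),
    (1002, "PM", 4, 2, 2),
    (1002, "PM", 3, 3, 0),
    (1002, "PM", 1, 0, 1),
]

UNPIVOT_SQL = """
SELECT ID, 'Weekday' AS DayType, "Time", SUM(Weekday) AS Tasks
FROM t1
GROUP BY ID, "Time"
UNION ALL
SELECT ID, 'Weekend', "Time", SUM(Weekend)
FROM t1
GROUP BY ID, "Time"
-- sort by ID, then Time, then DayType
ORDER BY 1, 3, 2
"""


def tasks_by_day_type(rows):
    """Return (ID, DayType, Time, Tasks), one row per ID, Time and day type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE t1 (ID INTEGER, "Time" TEXT, Total INTEGER,'
        " Weekday INTEGER, Weekend INTEGER)"
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(UNPIVOT_SQL).fetchall()
    conn.close()
    return result


summary = tasks_by_day_type(TASKS)
for row in summary:
    print(row)
```

UNION ALL matters here: plain UNION removes duplicate rows, which is also why the attempt in the question returned fewer values than the table has rows.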
Try with this
select ID, 'Weekday' as DayType, Time, sum(Weekday) as Tasks
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query that counts the number of nulls for a given product, in date order, until there is a yes value. In the above example the count would be 2, as there are 2 nulls before the first yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
This should do it:
select count(1) from products where product = 1 and valid is null and date > (select max(date) from products where product = 1 and valid = 'yes')
I'm not sure the logic you provided covers every edge case, but the following query does what you are after:
select a.product,
count(IIF(a.valid is null and a.date >maxdate,a.date,null)) as total
from sometable a
inner join (
select product, max(date) as Maxdate
from sometable where valid='yes' group by product
) b
on a.product=b.product group by a.product
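To compare the answers against the three examples in the question, here is a sketch that runs a per-product variant in SQLite through Python's sqlite3 module. It combines the correlated MAX(date) subquery with a conditional sum, which is my own rearrangement of the ideas above, and it treats a product with no 'yes' row as a count of 0 (an assumption, since the question does not cover that case).

```python
import sqlite3

# The three example tables from the question, merged:
# (product, valid, date). None stands for SQL NULL.
PRODUCTS = [
    (1, None, "2016-05-10"),
    (1, None, "2016-05-09"),
    (1, "yes", "2016-05-08"),
    (2, None, "2016-05-10"),
    (2, "yes", "2016-05-09"),
    (2, None, "2016-05-08"),
    (3, "yes", "2016-05-10"),
    (3, "yes", "2016-05-09"),
    (3, None, "2016-05-08"),
]

COUNT_SQL = """
SELECT p.product,
       SUM(CASE
             WHEN p.valid IS NULL
              AND p.date > (SELECT MAX(y.date)      -- latest 'yes' date
                            FROM products AS y
                            WHERE y.product = p.product
                              AND y.valid = 'yes')
             THEN 1 ELSE 0
           END) AS nulls_before_yes
FROM products AS p
GROUP BY p.product
ORDER BY p.product
"""


def nulls_before_latest_yes(rows):
    """Return (product, count of NULL rows newer than the latest 'yes')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product INTEGER, valid TEXT, date TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    result = conn.execute(COUNT_SQL).fetchall()
    conn.close()
    return result


counts = nulls_before_latest_yes(PRODUCTS)
print(counts)
```

Comparing against the latest 'yes' date (MAX) is what makes the second example return 1: the older null on 2016-05-08 is ignored because it comes after the yes when reading newest first.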