How to pull data based on current and last update? - sql

Our data table looks like this:
Machine Name
Lot Number
Qty
Load TxnDate
Unload TxnDate
M123
ABC
500
10/1/2020
10/2/2020
M741
DEF
325
10/1/2020
M123
ZZZ
100
10/5/2020
10/7/2020
M951
AAA
550
10/5/2020
10/9/2020
M123
BBB
550
10/7/2020
I need to create an SQL query that shows the currently loaded Lot number - Machines with no Unload TxnDate - and the last loaded Lot number based on the unload TxnDate.
So in the example, when I run a query for M123, the result will show:
Machine Name
Lot Number
Qty
Load TxnDate
Unload TxnDate
M123
ZZZ
100
10/5/2020
10/7/2020
M123
BBB
550
10/7/2020
As you can see although Machine Name has 3 records, the results only show the currently loaded and the last loaded. Is there anyway to replicate this? The Machine Name is dynamic, so my user can enter the Machine Name and see the results the machine based on the missing Unload TxnDate and the last Unload Txn Date

You seem to want the last two rows. That would be something like this:
select t.*
from t
where machine_name = 'M123'
order by load_txn_date desc
fetch first 2 rows only;
Note: not all databases support the first first clause. Some spell it limit, or select top, or even something else.

If you want two rows per machine, one option uses window functions:
select *
from (
select t.*,
row_number() over(
partition by machine_name, (case when unload_txn_date is null then 0 else 1 end)
order by coalesce(unload_txn_date, load_txn_date) desc
) rn
from mytable t
) t
where rn = 1
The idea is to separate rows between those that have an unload date, and those that do not. We can then bring the top record per group.
For your sample data, this returns:
Machine_Name | Lot_Number | Qty | Load_Txn_Date | Unload_Txn_Date | rn
:----------- | :--------- | --: | :------------ | :-------------- | -:
M123 | BBB | 550 | 2020-10-07 | null | 1
M123 | ZZZ | 100 | 2020-10-05 | 2020-10-07 | 1
M741 | DEF | 325 | 2020-10-01 | null | 1
M951 | AAA | 550 | 2020-10-05 | 2020-10-09 | 1

You might use the following query, presuming that you're on a database having Window(or Analytic) Function
WITH t AS
(
SELECT COALESCE(Unload_Txn_Date -
LAG(Load_Txn_Date) OVER
(PARTITION BY Machine_Name ORDER BY Load_Txn_Date DESC),0) AS lg,
MAX(CASE WHEN Unload_Txn_Date IS NULL THEN Load_Txn_Date END) OVER
(PARTITION BY Machine_Name) AS mx,
t.*
FROM tab t
), t2 AS
(
SELECT DENSE_RANK() OVER (ORDER BY mx DESC NULLS LAST) AS dr, t.*
FROM t
WHERE mx IS NOT NULL
)
SELECT Machine_Name,Lot_Number,Qty,Load_Txn_Date,Unload_Txn_Date
FROM t2
WHERE dr = 1 AND lg = 0
ORDER BY Load_Txn_Date
where if previous row's Unload_Txn_Date is equal to the current Load_Txn_Date, then it's accepted that there's no interruption will occur for the job, while determining the last Unload Txn Dates with no unload date values per each machine. And then, the result set returns through filtering by the values derived from the window functions within the penultimate query.
Demo

Related

Stop SQL Select After Sum Reached

My database is Db2 for IBM i.
I have read-only access, so my query must use only basic SQL select commands.
==============================================================
Goal:
I want to select every record in the table until the sum of the amount column exceeds the predetermined limit.
Example:
I want to match every item down the table until the sum of matched values in the "price" column >= $9.00.
The desired result:
Is this possible?
You may use sum analytic function to calculate running total of price and then filter by its value:
with a as (
select
t.*,
sum(price) over(order by salesid asc) as price_rsum
from t
)
select *
from a
where price_rsum <= 9
SALESID | PRICE | PRICE_RSUM
------: | ----: | ---------:
1001 | 5 | 5
1002 | 3 | 8
1003 | 1 | 9
db<>fiddle here

Self join to create a new column with updated records

I am trying to write a SQL query to get the start date for employees in a store. As seen in the first screenshot, employee number 5041 had the number A0EH but as the number got updated, it updated the start date for the employee as well. This effects the metric of total duration in the store.
I am trying to get to the output below but haven't been able to figure out how to get this view.
This is the code I was trying but I am not getting the correct output.
select
esd.employee_number,
(case when esd.old_employee_number is null then es.employee_number else es.old_employee_number end) as old_employee_number,
esd.entity_id,
esd.original_start_date
from earliest_start_date as esd
left join earliest_start_date as es
on (es.employee_number = esd.old_employee_number)
How do I solve this on SQL?
Redshift reportedly supports recursion via WITH clause. Here's an example:
MariaDB 10.5 has similar support. Test case is here:
Fully working test case (via MariaDB 10.5) (Updated)
Link to Amazon Redshift detail for WITH clause and window functions:
Amazon Redshift - WITH clause
Amazon redshift - Window functions
WITH RECURSIVE cte (employee_number, original_no, entity_id, original_start_date, n) AS (
SELECT employee_number, employee_number, entity_id, original_start_date, 1 FROM earliest_start_date WHERE old_employee_number IS NULL UNION ALL
SELECT new_tbl.employee_number, cte.original_no, cte.entity_id, cte.original_start_date, n+1
FROM earliest_start_date new_tbl
JOIN cte
ON cte.employee_number = new_tbl.old_employee_number
)
, xrows AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY n DESC) AS rn
FROM cte
)
SELECT * FROM xrows WHERE rn = 1
;
Result:
+-----------------+-------------+-----------+---------------------+------+----+
| employee_number | original_no | entity_id | original_start_date | n | rn |
+-----------------+-------------+-----------+---------------------+------+----+
| XXXX | XXXX | 88 | 2021-09-02 | 1 | 1 |
| 5041 | A0EH | 96 | 2021-09-05 | 2 | 1 |
+-----------------+-------------+-----------+---------------------+------+----+
2 rows in set
Raw test data:
SELECT * FROM earliest_start_date;
+-----------------+---------------------+-----------+---------------------+
| employee_number | old_employee_number | entity_id | original_start_date |
+-----------------+---------------------+-----------+---------------------+
| 5041 | A0EH | 96 | 2021-09-10 |
| A0EH | NULL | 96 | 2021-09-05 |
| XXXX | NULL | 88 | 2021-09-02 |
+-----------------+---------------------+-----------+---------------------+
Note that the logic makes assumption about uniqueness of the employee_number and, in the current form, can't handle cases where the employee_number is reused by the same employee or used again with a different employee without adjusting prior data. There may not be enough detail in the current structure to handle those cases.

SQL: Calculate number of days since last success

Following table represents results of given test.
Every result for the same test is either pass ( error_id=0) or fail ( error_id <> 0)
I need help to write a query, that returns the number of runs since last good run ( error_id= 0) and the date.
| Date | test_id | error_id |
-----------------------------------
| 2019-12-20 | 123 | 23
| 2019-12-19 | 123 | 23
| 2019-12-17 | 123 | 22
| 2019-12-18 | 123 | 0
| 2019-12-16 | 123 | 11
| 2019-12-15 | 123 | 11
| 2019-12-13 | 123 | 11
| 2019-12-12 | 123 | 0
So the result for this example should be:
| 2019-12-18 | 123 | 4
as the test 123 was PASS on 2019-12-18 and this happened 4 runs ago.
I have a query to determine whether given run is error or not, but I have trouble applying appropriate window function to it to get the wanted result
select test_id, Date, error_id, (CASE WHEN error_id 0 THEN 1 ELSE 0 END) as is_error
from testresults
You can generate a row number, in reverse order from the sorting of the query itself:
SELECT test_date, test_id, error_code,
(row_number() OVER (ORDER BY test_date asc) - 1) as runs_since_last_pass
FROM tests
WHERE test_date >= (SELECT MAX(test_date) FROM tests WHERE error_code=0)
ORDER BY test_date DESC
LIMIT 1;
Note that this will run into issues if test_date is not unique. Better use a timestamp (precise to the millisecond) instead of a date.
Here's a DBFiddle: https://www.db-fiddle.com/f/8gSHVcXMztuRiFcL8zLeEx/0
If there's more than one test_id, you'll want to add a PARTITION BY clause to the row number function, and the subquery would become a bit more complex. It may be more efficient to come up with a way to do this by a JOIN instead of a subquery, but it would be more cognitively complex.
I think you just want aggregation and some filtering:
select id, count(*),
max(date) over (filter where error_id = 0) as last_success_date
from t
where date > (select max(t2.date) from t t2 where t2.error_id = 0);
group by id;
You have to use the Maximum date of the good runs for every test_id in your query. You can try this query:
select tr2.Date_error, tr.test_id, count(tr.error_id) from
testresults tr inner join (select max(Date_error), test_id
from testresult where error_id=0 group by test_id) tr2 on
tr.test_id=tr2.test_id and tr.date_error >=tr2.date_error
group by test_id
This should do the trick:
select count(*) from table t,
(select max(date) date from table where error_id = 0) good
where t.date >= good.date
Basically you are counting the rows that have a date >= the date of the last success.
Please note: If you need the number of days, it is a complete different query:
select now()::date - max(test_date) last_valid from tests
where error_code = 0;

How I calculate gaps between rows

Now I include some parallelism to my app ProcessXX, I'm not sure the data can be process in the right order. So Im working in a query to return the lower and upperbound to pass to ProcessZZ.
My table avl_pool has avl_id and has_link and some other fields and a steady flow of data, when new data arrive they start with has_link=null, when ProcessX finish with the rows has_link have the link value xxxx is some number.
Now on the next step I have to process only those rows with links, but I cant skip rows, because order is very important.
In this case I need ProcessZZ(23561211, 23561219)
rn | avl_id | has_link
1 | 23561211 | xxxx -- start
2 | 23561212 | xxxx
3 | 23561213 | xxxx
4 | 23561214 | xxxx
5 | 23561215 | xxxx
6 | 23561216 | xxxx
7 | 23561217 | xxxx
8 | 23561218 | xxxx
9 | 23561219 | xxxx -- end
10 | 23561220 | null
11 | 23561221 | xxxx
12 | 23561222 | xxxx
13 | 23561223 | xxxx
Currently I have:
-- starting avl_id need to be send to ProcessZZ
SELECT MIN(avl_id) as min_avl_id
FROM avl_db.avl_pool
WHERE NOT has_link IS NULL
-- first avl_id still on hands of ProcessXX ( but can be null )
SELECT MIN(avl_id) as max_avl_id -- here need add a LAG
FROM avl_db.avl_pool
WHERE has_link IS NULL
AND avl_id > (SELECT MIN(avl_id)
FROM avl_db.avl_pool
WHERE NOT has_link IS NULL)
-- In case everyone has_link already the upper limit is the last one on the table.
SELECT MAX(avl_id) as max_avl_id
FROM avl_db.avl_pool
I can put everthing in muliple CTE and return both result, but I think this can be handle like some island, but not sure how.
So the query should looks like
SELECT min_avl_id, min_avl_id
FROM cte
min_avl_id | min_avl_id
23561211 | 23561219
If I understand correctly, you want to assign a sequential number to each block. This number is demarcated by the NULL values in has_link.
If this is the problem, then a cumulative sum solves the problem:
select p.*,
sum(case when has_link is null then 1 else 0 end) over (order by rn) as grp
from avl_db.avl_pool p;
This actually includes the NULL values in the output. The simplest method is probably then a subquery:
select p.*
from (select p.*,
sum(case when has_link is null then 1 else 0 end) over (order by rn) as grp
from avl_db.avl_pool p
) p
where has_link is not null;

How to search max value from group in sql

I am just learning some SQL, so I have a question.
-I have a table with name TABL
-a variable :ccname which has a value "Bottle"
The table is as follows:
+----------+---------+-------+--------+
| Name | Price | QTY | CODE |
+----------+---------+-------+--------+
| Rope | 3.6 | 35 | 236 |
| Chain | 2.8 | 15 | 237 |
| Paper | 1.6 | 45 | 124 |
| Bottle | 4.5 | 41 | 478 |
| Bottle | 1.8 | 12 | 123 |
| Computer | 1450.75 | 71 | 784 |
| Spoon | 0.7 | 10 | 412 |
| Bottle | 1.3 | 15 | 781 |
| Rope | 0.9 | 14 | 965 |
+----------+---------+-------+--------+
Now I want to find the CODE from the variable :ccname with the higher quantity! So I translated like this:
SELECT CODE
FROM TABL
GROUP BY :ccname
WHERE QTY=MAX(QTY)
In a perfect world that would turn as a result 478.
In the SQL world what should I write in order to get 478?
You probably want something like that:
SELECT code
FROM TABL
WHERE Name=:ccname
ORDER BY QTY DESC
LIMIT 1
The idea is we find all rows of the table whose Name column is the same as the contents of the variable :ccname, then order them by the quantity in descending order, and filally we select first one, which has to be the one with the largest quantity because they are sorted in descending order.
Try this
SELECT CODE
FROM TABLENAme
WHERE QTY = (SELECT MAX(QTY) FROM TablName WHERE Name = :ccname)
Use ORDER BY, a proper WHERE, and the something to limit the result set to one row:
SELECT CODE
FROM TABL
WHERE name = :ccname
ORDER BY QTY DESC
FETCH FIRST 1 ROW ONLY;
Note: Some databases spell the ANSI standard FETCH FIRST 1 ROW ONLY as LIMIT or as SELECT TOP 1.
Depending on your specific database, you can use one of the following options to restrict your result set to a single value after ordering your existing columns through an ORDER BY clause:
SELECT TOP 1
LIMIT 1
FETCH FIRST 1 ROW ONLY
Syntax Examples
SELECT TOP 1 Code
FROM TABL
WHERE Name = :ccname
ORDER BY QTY DESC
or
SELECT Code
FROM TABL
WHERE Name = :ccname
ORDER BY QTY DESC
LIMIT 1
or
SELECT CODE
FROM TABL
WHERE Name = :ccname
ORDER BY QTY DESC
FETCH FIRST 1 ROW ONLY;
Using join can also effectively solve the question:
Select t1.Code
From TABL As t1 Join (
Select Name, Max(table.QTY) as MaxQTY
From TABL
Where Name = :ccname
Group by Name
) As t2
Where t1.QTY = t2.MaxQTY And t1.Name = t2.Name
Explanation:
You first calculate the maximum value for "Bottle" using the subquery and then join the two tables to select corresponding row with MaxQTY and same name.