SQL: Trying to flag the earliest shipment date within the same customer ID and order number across all rows of a database

Think of an order fulfillment database where each customer ID can have the same Order Number for a product shipment and its refills. I am trying to flag the refills by adding 'Y' to a new column for refills. The first shipment is identified by the earliest ship date in the database for the same customer ID and order number. The shipments after the first shipment date with the same customer ID and order number would be the refills.
Customer # and Order # are varchars. Date is a date type.
This is the table I currently have. I want to fill a new column called "Refill" with Y or N:
Customer # Order # ShipDate Refill <---New Column I want to create
1234 2124 5/25/2015 Y
1234 2124 3/25/2015 N
1234 2124 4/25/2015 Y
5678 4439 12/25/2014 Y
5678 4439 2/20/2015 Y
5678 4439 9/10/2014 N
6666 5920 1/12/2012 Y
6666 5920 5/12/2011 N
6666 6053 6/12/2016 Y
6666 6053 4/12/2016 N
6666 6053 8/12/2016 Y

It appears that the logic for the update is that the initial shipping record for a given customer and order is "No" but all subsequent records are "Yes".
In the update query below, I join your original table to a subquery that finds the initial shipping record for each customer/order group. A record in your original table that matches must then be a No, while a record that does not match must be a Yes.
UPDATE t1
SET Refill = CASE WHEN t2.Customer IS NULL THEN 'Yes' ELSE 'No' END
FROM yourTable t1
LEFT JOIN
(
SELECT Customer, [Order], MIN(ShipDate) AS ShipDate -- this query finds
FROM yourTable -- the original
GROUP BY Customer, [Order] -- ship date
) t2
ON t1.Customer = t2.Customer AND
t1.[Order] = t2.[Order] AND
t1.ShipDate = t2.ShipDate
WHERE t1.[Order] IS NOT NULL OR t1.ShipDate IS NOT NULL
This answer also assumes that you already have a varchar column called Refill defined. If you don't, then go ahead and create one.
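To check the logic without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It expresses the same rule with a correlated MIN() subquery instead of the LEFT JOIN (SQLite only gained UPDATE ... FROM in version 3.33), and writes 'N'/'Y' as the question asks. The table name shipments and the columns Customer/OrderNo (standing in for "Customer #"/"Order #") are hypothetical, and dates are stored as ISO yyyy-mm-dd text so they compare correctly as strings.

```python
import sqlite3

# Hypothetical table/column names; ISO date strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (Customer TEXT, OrderNo TEXT, ShipDate TEXT, Refill TEXT)"
)
conn.executemany(
    "INSERT INTO shipments (Customer, OrderNo, ShipDate) VALUES (?, ?, ?)",
    [
        ("1234", "2124", "2015-05-25"),
        ("1234", "2124", "2015-03-25"),
        ("1234", "2124", "2015-04-25"),
        ("5678", "4439", "2014-12-25"),
        ("5678", "4439", "2015-02-20"),
        ("5678", "4439", "2014-09-10"),
        ("6666", "5920", "2012-01-12"),
        ("6666", "5920", "2011-05-12"),
        ("6666", "6053", "2016-06-12"),
        ("6666", "6053", "2016-04-12"),
        ("6666", "6053", "2016-08-12"),
    ],
)

# The earliest ShipDate per Customer/OrderNo is the first shipment ('N');
# every other row in the group is a refill ('Y').
conn.execute(
    """
    UPDATE shipments
    SET Refill = CASE
        WHEN ShipDate = (SELECT MIN(s2.ShipDate)
                         FROM shipments AS s2
                         WHERE s2.Customer = shipments.Customer
                           AND s2.OrderNo = shipments.OrderNo)
        THEN 'N' ELSE 'Y' END
    """
)

result = conn.execute(
    "SELECT Customer, OrderNo, ShipDate, Refill FROM shipments "
    "ORDER BY Customer, OrderNo, ShipDate"
).fetchall()
for row in result:
    print(row)
```

One design consequence worth knowing: if two shipments share the earliest date, both match the MIN() and both are marked 'N'.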

First, add the new column:
ALTER TABLE MyTable Add Refill char(1) Not Null Default 'N';
Then update the rows to set Refill = 'Y' wherever they are not the oldest ShipDate for that Customer and Order combination:
With cte As (
SELECT Customer, [Order], ShipDate, Refill, RowNo = Row_Number() Over (Partition By Customer, [Order] Order By ShipDate Asc)
From MyTable)
Update cte
Set Refill = 'Y'
Where RowNo <> 1;
Note that this has the advantage of handling cases where there were two shipments on the first date. We can't distinguish between them to say which one was first, but we will only mark one of them as 'N'.
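The ROW_NUMBER() version can be tried with Python's sqlite3 module (window functions need SQLite 3.25 or later). SQLite cannot UPDATE through a CTE, so this sketch updates by rowid instead. The table and column names are hypothetical, and the 9999/0001 rows are a made-up order with two shipments on its first date, added only to show that exactly one of the tied rows stays 'N'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Refill defaults to 'N', mirroring the ALTER TABLE ... DEFAULT 'N' step.
conn.execute(
    "CREATE TABLE shipments (Customer TEXT, OrderNo TEXT, ShipDate TEXT, "
    "Refill TEXT NOT NULL DEFAULT 'N')"
)
conn.executemany(
    "INSERT INTO shipments (Customer, OrderNo, ShipDate) VALUES (?, ?, ?)",
    [
        ("1234", "2124", "2015-05-25"),
        ("1234", "2124", "2015-03-25"),
        ("1234", "2124", "2015-04-25"),
        # Hypothetical order with two shipments on its first date (a tie).
        ("9999", "0001", "2020-01-01"),
        ("9999", "0001", "2020-01-01"),
        ("9999", "0001", "2020-02-01"),
    ],
)

# Number the rows within each Customer/OrderNo by ShipDate; every row
# except number 1 is a refill. Tied rows still get distinct numbers, so
# only one of the tied first-date rows keeps 'N'.
conn.execute(
    """
    UPDATE shipments SET Refill = 'Y'
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY Customer, OrderNo
                                      ORDER BY ShipDate) AS rn
            FROM shipments
        )
        WHERE rn <> 1
    )
    """
)

counts = conn.execute(
    "SELECT Customer, SUM(Refill = 'N'), SUM(Refill = 'Y') "
    "FROM shipments GROUP BY Customer ORDER BY Customer"
).fetchall()
print(counts)
```

Which of the two tied rows keeps 'N' is arbitrary; add a tiebreaker column to the ORDER BY if that matters.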

Related

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have the customer email, first and last name, and the date of the transaction.
The second table, which comes from a second system, contains all users, their sensitive data, and when they were registered in our system.
Simplified Structure:
Email | InstertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (the first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows, and I want to find the row inserted nearest in time, because sometimes the difference is a few minutes (a delay, or the opposite of a delay) and sometimes it can be a few days.
For example, for one email I have this row in the first table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But in the second table I have 3 rows, and the latest one is the one I want to compute the difference against:
Email InstertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote the following query, but I have no idea how to match the nearest row in the second table:
SELECT DISTINCT TOP (100)
a.[E_MAIL]
,a.[NAME]
,a.[TRAN_DATETIME]
,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
,ABS(CONVERT(INT, CONVERT(Datetime,LEFT(a.[TRAN_DATETIME],10),120)) - CONVERT(INT, CONVERT(Datetime,LEFT(b.[INSERTDATE],10),120))) as 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
left join [crm].[SensitiveTable] b on a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and the database platform. The suspect areas are the date casting and the date math; since I don't know which RDBMS and version this is, consider the following "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the two dates; the row with RN = 1 (the smallest difference, i.e. the nearest row) wins.
WITH CTE AS (
SELECT a.*, b.*, row_number() over (PARTITION BY a.E_MAIL
ORDER BY abs(datediff(second, cast(a.TRAN_DATETIME as datetime2), cast(b.INSERTDATE as datetime2))) asc) RN
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b
on a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
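Here is a runnable version of the same idea using Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER()). The table and column names are hypothetical stand-ins for the crm tables, julianday() replaces DATEDIFF, and substr(..., 1, 19) trims the seven-digit fractional seconds much like the LEFT() in the question. Ordering by the absolute difference ascending is what makes RN = 1 the nearest row; partitioning by both email and transaction time is a choice of this sketch so that each sale gets its own match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical names standing in for [crm].[SalesSampleTable] / [crm].[SensitiveTable].
conn.execute("CREATE TABLE sales (e_mail TEXT, name TEXT, tran_datetime TEXT)")
conn.execute("CREATE TABLE sensitive (email TEXT, insertdate TEXT)")
conn.execute(
    "INSERT INTO sales VALUES ('p****#****.eu', 'xxx xxx', '2021-10-04 00:03:09.0000000')"
)
conn.executemany(
    "INSERT INTO sensitive VALUES (?, ?)",
    [
        ("p****#****.eu", "2021-05-20 19:12:07"),
        ("p****#****.eu", "2021-05-20 19:18:48"),
        ("p****#****.eu", "2021-10-03 18:32:30"),
    ],
)

nearest = conn.execute(
    """
    WITH ranked AS (
        SELECT a.e_mail, a.tran_datetime, b.insertdate,
               -- signed difference in minutes (positive: sale after registration)
               (julianday(substr(a.tran_datetime, 1, 19)) - julianday(b.insertdate))
                   * 1440.0 AS diff_minutes,
               ROW_NUMBER() OVER (
                   PARTITION BY a.e_mail, a.tran_datetime
                   ORDER BY ABS(julianday(substr(a.tran_datetime, 1, 19))
                                - julianday(b.insertdate))
               ) AS rn
        FROM sales AS a
        LEFT JOIN sensitive AS b ON a.e_mail = b.email
    )
    SELECT e_mail, insertdate, diff_minutes FROM ranked WHERE rn = 1
    """
).fetchall()
print(nearest)
```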

Compare Two Different Fields In Oracle SQL

I have a requirement in which I have two main fields Amount CR and Amount DR.
The two amounts are stored on different records, with different values for Trx Number, Bank Name, etc., but they share a common Reference Number.
For every Reference Number there is exactly one record with a CR Amount and one record with a DR Amount.
For details, see the table below:
Transaction Number | Bank Name | Reference Number | CR Amount | DR Amount
1 | XYZ | 1234 | 1000 |
2 | ABC | 1234 | | 1000
3 | DEF | 1111 | 1000 |
4 | TEST | 1111 | | 2300
So basically I want to compare CR and DR Amount based on the Reference Number. In the example Reference Number 1234 is ok and Reference Number 1111 should be listed.
How can I achieve this by an Oracle query?
Since there is exactly one record with a DR amount and one with a CR amount, you can self-join on the reference number.
The two transactions for a Reference Number will then be listed in one row:
select * from your_table t1
inner join your_table t2 on t1.referencenumber = t2.referencenumber
and t1.cr_amount is not null
and t2.dr_amount is not null
where t1.cr_amount <> t2.dr_amount
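A quick way to sanity-check the self-join is Python's sqlite3 module. This sketch assumes that transactions 1 and 3 carry the CR amounts and 2 and 4 the DR amounts, with NULL in the empty cell; the table name tab and the column names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (transaction_number INTEGER, bank_name TEXT, "
    "reference_number TEXT, cr_amount INTEGER, dr_amount INTEGER)"
)
# Assumed placement of the amounts: NULL marks the empty CR/DR cell.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [
        (1, "XYZ", "1234", 1000, None),
        (2, "ABC", "1234", None, 1000),
        (3, "DEF", "1111", 1000, None),
        (4, "TEST", "1111", None, 2300),
    ],
)

# Pair each CR row (t1) with the DR row (t2) sharing its reference
# number, keeping only the pairs whose amounts differ.
mismatches = conn.execute(
    """
    SELECT t1.reference_number, t1.transaction_number, t1.cr_amount,
           t2.transaction_number, t2.dr_amount
    FROM tab AS t1
    INNER JOIN tab AS t2
        ON t1.reference_number = t2.reference_number
       AND t1.cr_amount IS NOT NULL
       AND t2.dr_amount IS NOT NULL
    WHERE t1.cr_amount <> t2.dr_amount
    """
).fetchall()
print(mismatches)
```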
Add two analytic functions that calculate the sums of CR and DR per reference_number, and compare them:
case when
sum(cr_amount) over (partition by reference_number) =
sum(dr_amount) over (partition by reference_number) then 'Y' else 'N' end is_equ
This identifies the rows whose reference_number has unequal sums.
In an outer query, simply filter to the rows where the sums are not equal:
with test as (
select a.*,
case when
sum(cr_amount) over (partition by reference_number) =
sum(dr_amount) over (partition by reference_number) then 'Y' else 'N' end is_equ
from tab a)
select
TRANSACTION_NUMBER, BANK_NAME, REFERENCE_NUMBER, CR_AMOUNT, DR_AMOUNT
from test
where is_equ = 'N'
TRANSACTION_NUMBER BANK REFERENCE_NUMBER CR_AMOUNT DR_AMOUNT
------------------ ---- ---------------- ---------- ----------
3 DEF 1111 1000
4 TEST 1111 2300
You can aggregate and use a case expression:
select reference_number,
sum(cr_amount), sum(dr_amount),
(case when sum(cr_amount) = sum(dr_amount)
then 'same' else 'different'
end)
from t
group by reference_number;
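The aggregate approach can be checked with Python's sqlite3 module. This sketch assumes transactions 1 and 3 hold the CR amounts and 2 and 4 the DR amounts (NULL in the empty cell), and uses tab as a placeholder table name. The second query adds a HAVING filter; the COALESCE is a choice of this sketch so that a reference missing one side entirely is still reported instead of being dropped by a NULL comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (transaction_number INTEGER, bank_name TEXT, "
    "reference_number TEXT, cr_amount INTEGER, dr_amount INTEGER)"
)
# Assumed placement of the amounts: NULL marks the empty CR/DR cell.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?)",
    [
        (1, "XYZ", "1234", 1000, None),
        (2, "ABC", "1234", None, 1000),
        (3, "DEF", "1111", 1000, None),
        (4, "TEST", "1111", None, 2300),
    ],
)

# One row per reference number with both totals and a verdict.
summary = conn.execute(
    """
    SELECT reference_number,
           SUM(cr_amount) AS total_cr,
           SUM(dr_amount) AS total_dr,
           CASE WHEN SUM(cr_amount) = SUM(dr_amount)
                THEN 'same' ELSE 'different' END AS verdict
    FROM tab
    GROUP BY reference_number
    ORDER BY reference_number
    """
).fetchall()

# Only the mismatched references; a missing side counts as 0.
flagged = conn.execute(
    """
    SELECT reference_number
    FROM tab
    GROUP BY reference_number
    HAVING COALESCE(SUM(cr_amount), 0) <> COALESCE(SUM(dr_amount), 0)
    """
).fetchall()
print(summary, flagged)
```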

Is there SQL Logic to reduce type 2 table along a dimension

I have a slowly changing type 2 price change table which I need to reduce the size of to improve performance. Often rows are written to the table even if no price change occurred (when some other dimensional field changed) and the result is that for any product the table could be 3-10x the size it needs to be if it were including only changes in price.
I'd like to compress the table so that it only contains the first effective date and the last expiration date for each price, until that price changes. The solution also needs to:
Deal with an unknown number of rows of the same price
Deal with products going back to an old price
For example, if I have this raw data:
Product | Price Effective Date | Price Expiration Date | Price
123456 | 6/22/18 | 9/19/18 | 120
123456 | 9/20/18 | 11/8/18 | 120
123456 | 11/9/18 | 11/29/18 | 120
123456 | 11/30/18 | 12/6/18 | 120
123456 | 12/7/18 | 12/19/18 | 85
123456 | 12/20/18 | 1/1/19 | 85
123456 | 1/2/19 | 2/19/19 | 85
123456 | 2/20/19 | 2/20/19 | 120
123456 | 2/21/19 | 3/19/19 | 85
123456 | 3/20/19 | 5/22/19 | 85
123456 | 5/23/19 | 10/10/19 | 85
123456 | 10/11/19 | 6/19/19 | 80
123456 | 6/20/20 | 12/31/99 | 80
I need to transform it into this:
Product | Price Effective Date | Price Expiration Date | Price
123456 | 6/22/18 | 12/6/18 | 120
123456 | 12/7/18 | 2/19/19 | 85
123456 | 2/20/19 | 2/20/19 | 120
123456 | 2/21/19 | 10/10/19 | 85
123456 | 10/11/19 | 12/31/99 | 80
You can first find the intervals where the price does not change, and then group on those intervals:
with to_r as (select row_number() over (order by t.product, t.effective) r, t.* from data_table t),
to_group as (select t.*, (select sum(t1.product = t.product and t1.r < t.r and t1.price != t.price) from to_r t1) c from to_r t)
select t.product, min(t.effective), max(t.expiration), max(t.price) from to_group t group by t.product, t.c order by min(t.r);
Output:
Product | Price Effective Date | Price Expiration Date | Price
123456 | 6/22/18 | 12/6/18 | 120
123456 | 12/7/18 | 2/19/19 | 85
123456 | 2/20/19 | 2/20/19 | 120
123456 | 2/21/19 | 10/10/19 | 85
123456 | 10/11/19 | 12/31/99 | 80
This is a type of gaps-and-islands problem. I would recommend reconstructing the data, saving it in a temporary table, and then reloading the existing table.
The code to reconstruct the data is:
select product, price, min(effective_date) as effective_date, max(expiration_date) as expiration_date
from (select t.*,
sum(case when prev_expiration_date = effective_date - interval '1 day' then 0 else 1 end) over (partition by product order by effective_date) as grp
from (select t.*,
lag(expiration_date) over (partition by product, price order by effective_date) as prev_expiration_date
from t
) t
) t
group by product, price, grp;
Note that the logic for date arithmetic varies depending on the database.
Save this result into a temporary table, temp_t or whatever, using select into, create table as, or whatever your database supports.
Then empty the current table and reload it:
truncate table t;
insert into t (product, price, effective_date, expiration_date)
select product, price, effective_date, expiration_date
from temp_t;
Notes:
Validate the data before using truncate table!
If there are triggers or columns with default values, you might want to be careful.
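The gaps-and-islands query can be tried end to end with Python's sqlite3 module (SQLite 3.25+ for LAG() and windowed SUM()), where date(x, '-1 day') plays the role of effective_date - interval '1 day'. This is a sketch under assumptions: the table name prices and the ISO date strings are mine, and only the first eleven sample rows are loaded, because the 10/11/19 row in the question expires on 6/19/19, before it takes effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (product TEXT, effective_date TEXT, "
    "expiration_date TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO prices VALUES ('123456', ?, ?, ?)",
    [
        ("2018-06-22", "2018-09-19", 120),
        ("2018-09-20", "2018-11-08", 120),
        ("2018-11-09", "2018-11-29", 120),
        ("2018-11-30", "2018-12-06", 120),
        ("2018-12-07", "2018-12-19", 85),
        ("2018-12-20", "2019-01-01", 85),
        ("2019-01-02", "2019-02-19", 85),
        ("2019-02-20", "2019-02-20", 120),
        ("2019-02-21", "2019-03-19", 85),
        ("2019-03-20", "2019-05-22", 85),
        ("2019-05-23", "2019-10-10", 85),
    ],
)

islands = conn.execute(
    """
    SELECT product, price,
           MIN(effective_date) AS first_effective,
           MAX(expiration_date) AS last_expiration
    FROM (
        SELECT s.*,
               -- running count of island starts: a row starts a new island
               -- unless the previous same-price row ended the day before
               SUM(CASE WHEN prev_expiration_date = date(effective_date, '-1 day')
                        THEN 0 ELSE 1 END)
                   OVER (PARTITION BY product ORDER BY effective_date) AS grp
        FROM (
            SELECT p.*,
                   LAG(expiration_date) OVER (PARTITION BY product, price
                                              ORDER BY effective_date)
                       AS prev_expiration_date
            FROM prices AS p
        ) AS s
    ) AS g
    GROUP BY product, price, grp
    ORDER BY first_effective
    """
).fetchall()
for row in islands:
    print(row)
```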
It sounds like you are asking for a temporal schema, i.e. one where, for a given date, you can know the price of an asset?
This is done with two tables; price_current and price_history.
price_current:
price_id | item_id | price | rec_created
1 | 1 | 100 | '2015-04-18'
price_history:
price_id | item_id | from | to | price
1 | 1 | '2001-01-01' | '2004-05-01' | 114
1 | 1 | '2004-05-01' | '2015-04-18' | 102
i.e. for any item, you can ascertain the date its price was set without polluting your "current" table. For this to work effectively, you will need an UPDATE trigger on price_current: when you update a record, the trigger inserts the previous details into the history table, together with the period during which they were valid.
CREATE OR ALTER TRIGGER trg_price_current_update
ON price_current
AFTER UPDATE
AS
BEGIN
-- deleted holds each updated row as it was before the update
INSERT INTO price_history (price_id, item_id, [from], [to], price)
SELECT price_id, item_id, rec_created, GETDATE(), price
FROM deleted;
END
Now you have a distinction between current and historical data, without your current table (presumably the busier table) getting out of hand because it maintains historical state. I hope I understood the question.
To ignore 'dummy' updates, just alter the trigger to ignore empty changes (if that's not handled by the DBMS anyway). To be honest, this could easily be done on the application side, but to manage it via the trigger:
CREATE OR ALTER TRIGGER trg_price_current_update
ON price_current
AFTER UPDATE
AS
BEGIN
INSERT INTO price_history (price_id, item_id, [from], [to], price)
SELECT d.price_id, d.item_id, d.rec_created, GETDATE(), d.price
FROM deleted d
INNER JOIN inserted i ON i.price_id = d.price_id
WHERE i.price <> d.price;
END
i.e. deleted contains each row as it was before the update and inserted contains it as it is after; we insert the previous row into the history table, provided its price is different from the new price.
(Edited to include the new trigger. I also changed the date held in rec_created: it must be the date the row was created, not the first time that product had a price assigned to it; that was a mistake. Regarding the dates, I was too lazy to write the full DD-MM-YYYY hh:mm:ss:zzz, but full timestamps would generally be useful in these queries.)
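For a runnable illustration of the trigger idea, here is a sketch in SQLite via Python's sqlite3 module, where OLD and NEW play the part of SQL Server's deleted and inserted. The column names valid_from/valid_to are hypothetical (from and to are reserved words), and, as a design choice of this sketch, the archived row's validity ends at the new row's rec_created rather than at the current time. The data mirrors the example tables: price 102 from 2004-05-01, replaced by 100 on 2015-04-18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE price_current (
        price_id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL,
        price INTEGER NOT NULL,
        rec_created TEXT NOT NULL
    );
    -- valid_from/valid_to are hypothetical names ("from"/"to" are reserved words).
    CREATE TABLE price_history (
        price_id INTEGER, item_id INTEGER,
        valid_from TEXT, valid_to TEXT, price INTEGER
    );
    -- Archive the old version only when the price really changes,
    -- so "dummy" updates leave no history row.
    CREATE TRIGGER trg_price_current_update
    AFTER UPDATE ON price_current
    FOR EACH ROW
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_history (price_id, item_id, valid_from, valid_to, price)
        VALUES (OLD.price_id, OLD.item_id, OLD.rec_created, NEW.rec_created, OLD.price);
    END;
    """
)

conn.execute("INSERT INTO price_current VALUES (1, 1, 102, '2004-05-01')")
# Dummy update: the price does not change, so the trigger does nothing.
conn.execute("UPDATE price_current SET price = 102 WHERE price_id = 1")
# Real change: the new price gets a new rec_created and the old one is archived.
conn.execute(
    "UPDATE price_current SET price = 100, rec_created = '2015-04-18' WHERE price_id = 1"
)

history = conn.execute("SELECT * FROM price_history").fetchall()
current = conn.execute("SELECT * FROM price_current").fetchall()
print(history, current)
```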
What you are asking for is a versioning system. Many RDBMS platforms support this out of the box (system-versioned temporal tables are part of the SQL:2011 standard), which may be suitable, depending on your requirements.
You have not tagged a specific platform, so it's not possible to be specific to your situation. I use system versioning regularly in SQL Server, where you would implement it like this.
Assuming a schema named History already exists:
alter table dbo.MyTable add
ValidFrom datetime2 generated always as row start hidden constraint DF_MyTableSysStart default sysutcdatetime(),
ValidTo datetime2 generated always as row end hidden constraint DF_MyTableSysEnd default convert(datetime2, '9999-12-31 23:59:59.9999999'),
period for system_time (ValidFrom, ValidTo);
alter table MyTable set (system_versioning = on (history_table = History.MyTable));
create clustered index ix_MyTable on History.MyTable (ValidTo, ValidFrom) with (data_compression = page,drop_existing=on) on History;
A number of syntax extensions exist to aid querying the temporal data for example to find historical data at a point in time.
Alternatively, to use a single table but handle the duplication, you could create an INSTEAD OF trigger.
The idea here is that the trigger intercepts the data before it is inserted, so you can check whether the value is different from the last value and then discard or insert the row as appropriate.
Something along the lines of:
WITH keeps AS
(
SELECT p.product_id, p.effective, p.expires, p.price,
CASE WHEN EXISTS(SELECT 1 FROM prices p1 WHERE p1.product_id = p.product_id AND p1.effective = DATEADD(DAY, 1, p.expires) AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_after,
CASE WHEN EXISTS(SELECT 1 FROM prices p1 WHERE p1.product_id = p.product_id AND p1.expires = DATEADD(DAY, -1, p.effective) AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_before
FROM prices p
)
SELECT product_id, effective, expires, price FROM keeps
WHERE has_after = 1
OR has_before = 1
UNION
SELECT p.product_id, p.effective, p.expires, p.price
FROM prices p
WHERE p.effective = (SELECT MIN(effective) FROM prices p1 WHERE p1.product_id = p.product_id)
What's it doing:
Find all the entries for which another entry of the same product starts the day after this entry expires (or ends the day before this entry takes effect) at a different price. This gives us all the actual changes in price. But we miss the first price entry, so we simply include that in the results.
e.g.:
product_id | effective | expires | price | has_before | has_after
123456 | 6/22/18 | 9/19/18 | 120 | 0 | 0
123456 | 9/20/18 | 11/8/18 | 120 | 0 | 0
123456 | 11/9/18 | 11/29/18 | 120 | 0 | 0
123456 | 11/30/18 | 12/6/18 | 120 | 0 | 1
123456 | 12/7/18 | 12/19/18 | 85 | 1 | 0
123456 | 12/20/18 | 1/1/19 | 85 | 0 | 0
123456 | 2/1/19 | 2/19/19 | 85 | 0 | 1
123456 | 2/20/19 | 2/20/19 | 120 | 1 | 1
123456 | 2/21/19 | 3/19/19 | 85 | 1 | 0
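The insert-time filtering idea (intercept the row and discard it if the price has not changed) can be sketched in SQLite via Python's sqlite3 module. SQLite only allows INSTEAD OF triggers on views, so this sketch swaps in a BEFORE INSERT trigger that uses RAISE(IGNORE) to drop the incoming row; when the new price equals the product's latest price, it first extends the latest row's expiry, so duplicates never accumulate. The table layout, ISO dates, and the assumption that rows arrive in effective-date order are mine; the data is the first eleven sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prices (
        product_id TEXT, effective TEXT, expires TEXT, price INTEGER
    );
    -- If the incoming row repeats the product's latest price, stretch the
    -- latest row's expiry and drop the insert. RAISE(IGNORE) abandons the
    -- INSERT but keeps the UPDATE already made by this trigger.
    CREATE TRIGGER trg_prices_skip_same_price
    BEFORE INSERT ON prices
    FOR EACH ROW
    WHEN NEW.price = (SELECT price FROM prices
                      WHERE product_id = NEW.product_id
                      ORDER BY effective DESC LIMIT 1)
    BEGIN
        UPDATE prices
        SET expires = NEW.expires
        WHERE product_id = NEW.product_id
          AND effective = (SELECT MAX(effective) FROM prices
                           WHERE product_id = NEW.product_id);
        SELECT RAISE(IGNORE);
    END;
    """
)

# Rows must arrive in effective-date order for "latest price" to make sense.
conn.executemany(
    "INSERT INTO prices VALUES ('123456', ?, ?, ?)",
    [
        ("2018-06-22", "2018-09-19", 120),
        ("2018-09-20", "2018-11-08", 120),
        ("2018-11-09", "2018-11-29", 120),
        ("2018-11-30", "2018-12-06", 120),
        ("2018-12-07", "2018-12-19", 85),
        ("2018-12-20", "2019-01-01", 85),
        ("2019-01-02", "2019-02-19", 85),
        ("2019-02-20", "2019-02-20", 120),
        ("2019-02-21", "2019-03-19", 85),
        ("2019-03-20", "2019-05-22", 85),
        ("2019-05-23", "2019-10-10", 85),
    ],
)

stored = conn.execute(
    "SELECT effective, expires, price FROM prices ORDER BY effective"
).fetchall()
for row in stored:
    print(row)
```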

Last record per transaction

I am trying to select the last record per sales order.
My query in SQL Server Management Studio is simple:
SELECT *
FROM DOCSTATUS
The problem is that this database has tens of thousands of records, as it tracks all SO steps.
ID SO SL Status Reason Attach Name Name Systemdate
22 951581 3 Processed Customer NULL NULL BW 2016-12-05 13:33:27.857
23 951581 3 Submitted Customer NULL NULL BW 2016-17-05 13:33:27.997
24 947318 1 Hold Customer NULL NULL bw 2016-12-05 13:54:27.173
25 947318 1 Invoices Submit Customer NULL NULL bw 2016-13-05 13:54:27.300
26 947318 1 Ship Customer NULL NULL bw 2016-14-05 13:54:27.440
I would like to see the most recent record per SO:
ID SO SL Status Reason Attach Name Name Systemdate
23 951581 4 Submitted Customer NULL NULL BW 2016-17-05 13:33:27.997
26 947318 1 Ship Customer NULL NULL bw 2016-14-05 13:54:27.440
Well I'm not sure how that table has two Name columns, but one easy way to do this is with ROW_NUMBER():
;WITH cte AS
(
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC)
FROM dbo.DOCSTATUS
)
SELECT ID, SO, SL, Status, Reason, ..., Systemdate
FROM cte WHERE rn = 1;
Also please always reference the schema, even if today everything is under dbo.
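The ROW_NUMBER() approach can be tried with Python's sqlite3 module (SQLite 3.25+). Only a few DOCSTATUS columns are modelled, and Systemdate is kept as the question's text. Values like 2016-17-05 are not valid dates, so this sketch relies on plain string ordering, which happens to give the right answer for these rows; with a real date column the ordering would be chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only some of the question's columns are modelled here.
conn.execute(
    "CREATE TABLE DOCSTATUS (ID INTEGER, SO TEXT, SL INTEGER, Status TEXT, Systemdate TEXT)"
)
conn.executemany(
    "INSERT INTO DOCSTATUS VALUES (?, ?, ?, ?, ?)",
    [
        (22, "951581", 3, "Processed", "2016-12-05 13:33:27.857"),
        (23, "951581", 3, "Submitted", "2016-17-05 13:33:27.997"),
        (24, "947318", 1, "Hold", "2016-12-05 13:54:27.173"),
        (25, "947318", 1, "Invoices Submit", "2016-13-05 13:54:27.300"),
        (26, "947318", 1, "Ship", "2016-14-05 13:54:27.440"),
    ],
)

# Number each SO's rows from newest to oldest and keep the first.
latest = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC) AS rn
        FROM DOCSTATUS
    )
    SELECT ID, SO, Status FROM cte WHERE rn = 1 ORDER BY ID
    """
).fetchall()
print(latest)
```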
I think you can keep it this simple:
SELECT *
FROM DOCSTATUS
WHERE ID IN (SELECT MAX(ID)
FROM DOCSTATUS
GROUP BY SO)
You want only the maximum ID from each SO.
An efficient method with the right index is a correlated subquery:
select t.*
from t
where t.systemdate = (select max(t2.systemdate) from t t2 where t2.so = t.so);
The index to support this is on (so, systemdate).
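Here is a sketch comparing the MAX(ID) and correlated-subquery answers in SQLite via Python's sqlite3 module, including the suggested (so, systemdate) index; the index name ix_docstatus_so_date is hypothetical. The MAX(ID) version assumes IDs only ever increase over time; on this sample the two agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docstatus (id INTEGER, so TEXT, systemdate TEXT)")
conn.executemany(
    "INSERT INTO docstatus VALUES (?, ?, ?)",
    [
        (22, "951581", "2016-12-05 13:33:27.857"),
        (23, "951581", "2016-17-05 13:33:27.997"),
        (24, "947318", "2016-12-05 13:54:27.173"),
        (25, "947318", "2016-13-05 13:54:27.300"),
        (26, "947318", "2016-14-05 13:54:27.440"),
    ],
)
# Lets the correlated MAX(systemdate) lookup seek within each SO.
conn.execute("CREATE INDEX ix_docstatus_so_date ON docstatus (so, systemdate)")

# Answer using the highest ID per SO.
by_max_id = conn.execute(
    "SELECT id FROM docstatus "
    "WHERE id IN (SELECT MAX(id) FROM docstatus GROUP BY so) ORDER BY id"
).fetchall()

# Answer using the latest systemdate per SO (correlated subquery).
by_date = conn.execute(
    """
    SELECT t.id FROM docstatus AS t
    WHERE t.systemdate = (SELECT MAX(t2.systemdate)
                          FROM docstatus AS t2
                          WHERE t2.so = t.so)
    ORDER BY t.id
    """
).fetchall()
print(by_max_id, by_date)
```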

How to loop through a table and look for adjacent rows with identical values in one field and update another column conditionally in SQL?

I have a table with a field called ‘group_quartile’ that uses the SQL NTILE() function to calculate which quartile each customer lies in, based on their activity scores. However, using NTILE() I find that some customers with the same activity score are in different quartiles. I need to modify the ‘group_quartile’ column so that all customers with the same activity score lie in the same group_quartile.
A view of the table values:
Customer_id Product Activity_Score Group_Quartile
CH002 T 2328 1
CR001 T 268 1
CN001 T 178 1
MS006 T 45 2
ST001 T 21 2
CH001 T 0 2
CX001 T 0 3
KH001 T 0 3
MH002 T 0 4
SJ003 T 0 4
CN001 S 439 1
AC002 S 177 1
SC001 S 91 2
PV001 S 69 3
TS001 S 0 4
I used a CTE, but it did not work.
My query only updates (from the above example):
CX001 T 0 3
modified to
CX001 T 0 2
So only the first repeating activity score is checked and that row’s group_quartile is updated to 2.
I need to update all the below rows as well.
CX001 T 0 3
KH001 T 0 3
MH002 T 0 4
SJ003 T 0 4
I cannot use DENSE_RANK() instead of quartiles to segregate the records, as arranging the customers per product into approximately 4 quartiles is a business requirement.
From my understanding, I need to loop through the table:
Find a row that has the same activity score and the same product as its predecessor, but a different group_quartile.
Update the selected row's group_quartile to its predecessor's quartile value.
Then loop through the updated table again to look for any row matching the above condition, and update that row similarly.
The loop continues until all rows with the same activity score (for the same product) are in the same group_quartile.
--
This is the table structure I am working on:
CREATE TABLE #custs
(
customer_id NVARCHAR(50),
PRODUCT NVARCHAR(50),
ACTIVITYSCORE INT,
GROUP_QUARTILE INT,
RANKED int,
rownum int
)
INSERT INTO #custs
-- adding a column to give row numbers(unique id) for each row
SELECT customer_id, PRODUCT, ACTIVITYSCORE,GROUP_QUARTILE,RANKED,
Row_Number() OVER(partition by product ORDER BY activityscore desc) N
FROM
-- rows derived from a parent table based on 'segmentation' column value
(SELECT customer_id, PRODUCT, ACTIVITYSCORE,
DENSE_RANK() OVER (PARTITION BY PRODUCT ORDER BY ACTIVITYSCORE DESC) AS RANKED,
NTILE(4) OVER(PARTITION BY PRODUCT ORDER BY ACTIVITYSCORE DESC) AS GROUP_QUARTILE
FROM #parent_score_table WHERE (SEGMENTATION = 'Large')
) as temp
ORDER BY PRODUCT
The method I used to achieve this partially is as follows :
-- The query finds the rows which have the same activity score as the previous row but a different GROUP_QUARTILE value.
-- I need to use a query to update this row.
-- Next, find any rows in the newly updated table that have the same activity score as the previous row but a different GROUP_QUARTILE value.
-- Continue to update the table in the above manner until all rows with the same activity score have the same quartile value.
I managed to find only the rows which have the same activity score as their previous row but a different GROUP_QUARTILE value, but I cannot loop through to find new rows that may match the updated row.
select t1.customer_id,t1.ACTIVITYSCORE,t1.PRODUCT, t1.RANKED, t1.GROUP_QUARTILE, t2.GROUP_QUARTILE as modified_quartile
from #custs t1, #custs t2
where (
t1.rownum = t2.rownum + 1
and t1.ACTIVITYSCORE = t2.ACTIVITYSCORE
and t1.PRODUCT = t2.PRODUCT
and not(t1.GROUP_QUARTILE = t2.GROUP_QUARTILE))
Can anyone help with what should be the t-sql statement for the above?
Cheers!
Assuming you've already worked out a basis Group_Quartile as indicated above, you can update the table with a query similar to the following:
update a
set Group_Quartile = coalesce(topq.Group_Quartile, a.Group_Quartile)
from activityScores a
outer apply
(
select top 1 Group_Quartile
from activityScores topq
where a.Product = topq.Product
and a.Activity_Score = topq.Activity_Score
order by Group_Quartile
) topq
Edit after comment:
I think you did a lot of the work already by getting the Group_Quartile working.
For each row in the table, the statement above will join another row to it using the outer apply statement. Only one row will be joined back to the original table due to the top 1 clause.
So for each row, we are returning one more row. The extra row is matched on Product and Activity_Score, and is the row with the lowest Group_Quartile (order by Group_Quartile). Finally, we update the original row with this lowest Group_Quartile value, so each row with the same Product and Activity_Score will now have the same, lowest possible Group_Quartile.
So SJ003, MH002, etc will all be matched to CH001 and be updated with the Group_Quartile value of CH001, i.e. 2.
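SQLite has no OUTER APPLY, so this sketch (Python's sqlite3 module) expresses the same rule with a correlated MIN() subquery: each row takes the lowest Group_Quartile among rows with the same Product and Activity_Score, which is the row that TOP 1 ... ORDER BY Group_Quartile picks. The table is loaded with the sample values from the question, and the result confirms that the zero-score T customers all end up in quartile 2 in a single pass, with no loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activityScores "
    "(Customer_id TEXT, Product TEXT, Activity_Score INTEGER, Group_Quartile INTEGER)"
)
conn.executemany(
    "INSERT INTO activityScores VALUES (?, ?, ?, ?)",
    [
        ("CH002", "T", 2328, 1), ("CR001", "T", 268, 1), ("CN001", "T", 178, 1),
        ("MS006", "T", 45, 2), ("ST001", "T", 21, 2), ("CH001", "T", 0, 2),
        ("CX001", "T", 0, 3), ("KH001", "T", 0, 3), ("MH002", "T", 0, 4),
        ("SJ003", "T", 0, 4), ("CN001", "S", 439, 1), ("AC002", "S", 177, 1),
        ("SC001", "S", 91, 2), ("PV001", "S", 69, 3), ("TS001", "S", 0, 4),
    ],
)

# Give every row the lowest quartile among rows with the same
# Product and Activity_Score (what TOP 1 ... ORDER BY Group_Quartile picks).
conn.execute(
    """
    UPDATE activityScores
    SET Group_Quartile = (
        SELECT MIN(q.Group_Quartile)
        FROM activityScores AS q
        WHERE q.Product = activityScores.Product
          AND q.Activity_Score = activityScores.Activity_Score
    )
    """
)

quartiles = {
    (cid, prod): q
    for cid, prod, q in conn.execute(
        "SELECT Customer_id, Product, Group_Quartile FROM activityScores"
    )
}
print(quartiles)
```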
It's hard to explain code! Another thing that might help is looking at the join without the update statement:
select a.*
, TopCustomer_id = topq.Customer_Id
, NewGroup_Quartile = topq.Group_Quartile
from activityScores a
outer apply
(
select top 1 *
from activityScores topq
where a.Product = topq.Product
and a.Activity_Score = topq.Activity_Score
order by Group_Quartile
) topq