Spark SQL query to assign join date by next closest one

CREATE TABLE TABLE_1(
CALL_ID INT,
CALL_DATE DATE);
INSERT INTO TABLE_1(CALL_ID, CALL_DATE)
VALUES (1, '2022-10-22'),
(2, '2022-10-31'),
(3, '2022-11-04');
CREATE TABLE TABLE_2(
PROD_ID INT,
PROD_DATE DATE);
INSERT INTO TABLE_2(PROD_ID, PROD_DATE)
VALUES (1, '2022-10-25'),
(2, '2022-11-17');
CREATE TABLE TABLE_RESULT(
CALL_ID INT,
CALL_DATE DATE,
PROD_ID INT,
PROD_DATE DATE);
INSERT INTO TABLE_RESULT(CALL_ID, CALL_DATE, PROD_ID, PROD_DATE)
VALUES (1, '2022-10-22', 1, '2022-10-25'),
(2, '2022-10-31', NULL, NULL),
(3, '2022-11-04', 2, '2022-11-17');
Can you help me create TABLE_RESULT with a join in an elegant way? This is a very small example.
Thanks

I solved it. Thanks anyway.
SELECT *
FROM (
    SELECT *,
           COALESCE(LEAD(CALL_DATE) OVER (PARTITION BY 1 ORDER BY CALL_DATE), CURRENT_DATE) AS CALL_DATE_NEXT
    FROM TABLE_1
) AS A
LEFT JOIN TABLE_2 AS B
  ON A.CALL_DATE <= B.PROD_DATE
 AND A.CALL_DATE_NEXT > B.PROD_DATE
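For reference, an equivalent formulation (a minimal sketch, assuming Spark SQL and the tables above) pairs each product with the latest call on or before its PROD_DATE and then left joins the calls back in, so calls without a product still appear with NULLs:
WITH MATCHED AS (
    -- for each product, keep only the most recent call on or before PROD_DATE
    SELECT CALL_ID, CALL_DATE, PROD_ID, PROD_DATE
    FROM (
        SELECT A.CALL_ID, A.CALL_DATE, B.PROD_ID, B.PROD_DATE,
               ROW_NUMBER() OVER (PARTITION BY B.PROD_ID ORDER BY A.CALL_DATE DESC) AS RN
        FROM TABLE_2 B
        JOIN TABLE_1 A ON A.CALL_DATE <= B.PROD_DATE
    ) X
    WHERE X.RN = 1
)
SELECT T.CALL_ID, T.CALL_DATE, M.PROD_ID, M.PROD_DATE
FROM TABLE_1 T
LEFT JOIN MATCHED M ON M.CALL_ID = T.CALL_ID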

Related

Most efficient way to update table column based on sum

I am looking for the most efficient / minimal code way to update a table column based on the sum of another value in the same table. A method which works and the temp table are shown below.
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) as sum_price_bystate
from #t1 t2_in
group by t2_in.id, t2_in.astate
) t2
on t1.id = t2.id
and t1.astate = t2.astate
update t1
set total_id_price = sum_price
from #t1 t1
inner join (
select t3_in.Id,
sum(t3_in.price) as sum_price
from #t1 t3_in
group by t3_in.id
) t3
on t1.id = t3.id
select * from #t1
The main thing I don't like about my method is that it requires an inner join with a subquery over the same table. So I am looking for a way that might avoid this, although I don't think my method is overly complicated; maybe there isn't anything much more efficient.
I am also wondering about the best way to combine the two updates, since they are very similar and differ only in the GROUP BY clause.
As pointed out in the comments, this is not a good way to store data, as it violates the basic principles of normalisation:
- you are storing data that you can compute;
- you are storing the same data multiple times, i.e. duplicates;
- you need to re-calculate the totals whenever any individual value changes;
- it's possible to update a single row and create a data contradiction.
Pre-calculating aggregations is not necessarily a bad thing, especially in a data warehouse scenario, but you would still only store the value once per unique key. Normalisation prevents these issues.
That said, you can utilise analytic window functions to compute your values in a single pass over the table:
select *,
Sum(price) over(partition by id, astate) total_id_price_bystate,
Sum(price) over(partition by id) total_id_price
from #t1;
If you really want the data in this format you could create a view and query it:
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
select *
from Totals where id = 100;
And to answer your specific question, a view (or a CTE) that touches a single base table can be updated so you can accomplish what you are doing like so:
drop view Totals;
create view Totals as
select id, astate, code, price, total_id_price_bystate, total_id_price,
Sum(price) over(partition by id, astate) total_bystate,
Sum(price) over(partition by id) total
from t1;
update totals set
total_id_price_bystate = total_bystate,
total_id_price = total;
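A view cannot reference a temporary table, but the same trick works with an updatable CTE directly against the #t1 table from the question (a sketch of the same idea, no view needed):
;with Totals as
(
    -- target columns must be selected plainly so they can be updated through the CTE
    select total_id_price_bystate, total_id_price,
           Sum(price) over(partition by id, astate) as total_bystate,
           Sum(price) over(partition by id) as total
    from #t1
)
update Totals set
    total_id_price_bystate = total_bystate,
    total_id_price = total;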
You can use PARTITION BY to get the two different aggregated values:
if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (id nvarchar(max), astate varchar(16), code varchar(16), price decimal(16,2), total_id_price_bystate decimal(16,2), total_id_price decimal(16,2))
INSERT into #t1 VALUES
(100, 'CA', '0123', 123.01, null, null),
(100, 'CA', '0124', 0.00, null, null),
(100, 'PA', '0256', 12.10, null, null),
(200, 'MA', '0452', 145.00, null, null),
(300, 'MA', '0578', 134.23, null, null),
(400, 'CA', '1111', 94.12, null, null),
(600, 'CA', '0000', 86.34, null, null),
(500, 'CO', '1111', 0.00, null, null);
update t1
set total_id_price_bystate = sum_price_bystate,total_id_price=sum_price
from #t1 t1
inner join (
select t2_in.Id,
t2_in.astate,
sum(t2_in.price) over(partition by t2_in.id, t2_in.astate) as sum_price_bystate,
sum(t2_in.price) over(partition by t2_in.id) as sum_price
from #t1 t2_in
) t2
on t1.id = t2.id
and t1.astate = t2.astate
select * from #t1

SQL Query to get the value of a product given a date

I have a table which gives the rate of a product on a particular date, #tableA.
create table #tableA
(
Id int not null,
ValueDate date,
Price decimal(9,2)
)
insert into #tableA (Id, ValueDate, Price)
values
(1, '2020-08-01', 100),
(1, '2020-08-05', 110),
(1, '2020-08-07', 50)
My other table has the id and the date the product is active.
create table #tableB
(
Id int not null,
Dates date
)
insert into #tableB (Id, Dates)
values
(1, '2020-08-01'),
(1, '2020-08-02'),
(1, '2020-08-03'),
(1, '2020-08-04'),
(1, '2020-08-05'),
(1, '2020-08-06'),
(1, '2020-08-07'),
(1, '2020-08-04')
I cannot find an efficient query where my resulting table gives the rate of the product on a given date.
I am expecting this result.
Id Dates ValueDate Price
-------------------------------------
1, '2020-08-01', '2020-08-01', 100
1, '2020-08-02', '2020-08-01', 100
1, '2020-08-03', '2020-08-01', 100
1, '2020-08-04', '2020-08-01', 100
1, '2020-08-05', '2020-08-05', 110
1, '2020-08-06', '2020-08-05', 110
1, '2020-08-07', '2020-08-07', 50
Something like this:
SELECT DISTINCT B.[id]
,B.[Dates]
,DS.*
FROM #tableB B
CROSS APPLY
(
SELECT TOP 1 *
FROM #tableA A
WHERE B.[Id] = A.[Id]
AND B.[Dates] >= A.[ValueDate]
AND A.[Price] IS NOT NULL
ORDER BY A.[ValueDate] DESC
) DS;
or this:
WITH DataSource AS
(
SELECT DISTINCT B.[ID]
,B.[Dates]
,A.[ValueDate]
,A.[Price]
,SUM(IIF(A.[ID] IS NOT NULL, 1, 0)) OVER (ORDER BY B.[Dates]) AS [GroupID]
FROM #tableB B
LEFT JOIN #tableA A
ON B.[Id] = A.[Id]
AND B.[Dates] = A.[ValueDate]
AND A.[Price] IS NOT NULL
)
SELECT [ID]
,[Dates]
,MAX([ValueDate]) OVER (PARTITION BY [GroupID]) AS [ValueDate]
,MAX([Price]) OVER (PARTITION BY [GroupID]) AS [Price]
FROM DataSource;
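Note that the CROSS APPLY version above drops any date that falls before the first ValueDate; if that matters, an OUTER APPLY variant (a sketch against the same temp tables) keeps such rows with a NULL price. DISTINCT is still useful here because #tableB contains a duplicate date:
SELECT DISTINCT B.[Id]
      ,B.[Dates]
      ,DS.[ValueDate]
      ,DS.[Price]
FROM #tableB B
OUTER APPLY
(
    -- latest price on or before the given date, if any
    SELECT TOP 1 A.[ValueDate], A.[Price]
    FROM #tableA A
    WHERE B.[Id] = A.[Id]
      AND B.[Dates] >= A.[ValueDate]
    ORDER BY A.[ValueDate] DESC
) DS;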

Is there any way to insert more than one row with a trigger, grouped by some result?

CREATE TABLE TABLE_1 (
ID NUMBER(10),
ID_DOCUMENT NUMBER(10),
ITEM_ID NUMBER(10),
SUPLLIER NUMBER(10)
);
Insert into TABLE_1 (ID, ID_DOCUMENT, ITEM_ID, SUPLLIER) Values (1, 1, 11, 25);
Insert into TABLE_1 (ID, ID_DOCUMENT, ITEM_ID, SUPLLIER) Values (2, 1, 87, 31);
Insert into TABLE_1 (ID, ID_DOCUMENT, ITEM_ID, SUPLLIER) Values (3, 1, 93, 31);
Insert into TABLE_1 (ID, ID_DOCUMENT, ITEM_ID, SUPLLIER) Values (4, 1, 41, 25);
Insert into TABLE_1 (ID, ID_DOCUMENT, ITEM_ID, SUPLLIER) Values (5, 1, 58, 40);
When I insert into table_1 I have to insert the result into two other tables:
create table doc
(
id number(10),
suplier number(10),
date_doc date
);
create table doc_rows
(
id number(10),
id_doc number(10), -- (references doc.id)
item_id number(10)
);
I want to create 3 new records in table doc (because table_1 has 3 unique suppliers), and for every new doc I have to insert its items into table doc_rows.
-- Create tables
CREATE TABLE TABLE_1 (
ID int identity (1,1) ,
ID_DOCUMENT int,
ITEM_ID int,
SUPLLIER int
);
create table doc
(
id int identity (1,1),
suplier int,
date_doc date
);
create table doc_rows
(
id int identity (1,1),
id_doc int, -- (references doc.id)
item_id int
);
--Create triggers
GO
CREATE TRIGGER trgTABLE_1_Insert ON TABLE_1
FOR INSERT
AS
declare #SUPLLIER int
select #SUPLLIER = SUPLLIER from inserted
if not exists (select 1 from doc where suplier=#SUPLLIER )
Begin
insert into doc (suplier,date_doc)
select #SUPLLIER,GETDATE()
insert into doc_rows (id_doc,item_id)
SELECT (select id from doc where suplier = #SUPLLIER) id_doc ,
(SELECT top 1 ITEM_ID from TABLE_1 where SUPLLIER = #SUPLLIER) item_id
End
Go
-- insert statement
Insert into TABLE_1 ( ID_DOCUMENT, ITEM_ID, SUPLLIER) Values ( 1, 11, 25);
Insert into TABLE_1 ( ID_DOCUMENT, ITEM_ID, SUPLLIER) Values ( 1, 87, 31);
Insert into TABLE_1 ( ID_DOCUMENT, ITEM_ID, SUPLLIER) Values ( 1, 93, 31);
Insert into TABLE_1 ( ID_DOCUMENT, ITEM_ID, SUPLLIER) Values ( 1, 41, 25);
Insert into TABLE_1 ( ID_DOCUMENT, ITEM_ID, SUPLLIER) Values ( 1, 58, 40);
-- final output
SELECT * from TABLE_1
SELECT * from doc
SELECT * from doc_rows
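Note that the trigger above reads a single SUPLLIER value from inserted, so it only behaves as intended for one-row INSERT statements, and it adds at most one item per supplier to doc_rows. A set-based sketch (trigger name is illustrative; it would replace the trigger above rather than run alongside it) that handles multi-row inserts and adds every inserted item could look like this:
GO
CREATE TRIGGER trgTABLE_1_InsertSet ON TABLE_1
FOR INSERT
AS
BEGIN
    -- create one doc per inserted supplier that does not have one yet
    INSERT INTO doc (suplier, date_doc)
    SELECT DISTINCT i.SUPLLIER, GETDATE()
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM doc d WHERE d.suplier = i.SUPLLIER);

    -- add every inserted item to the rows of its supplier's doc
    INSERT INTO doc_rows (id_doc, item_id)
    SELECT d.id, i.ITEM_ID
    FROM inserted i
    INNER JOIN doc d ON d.suplier = i.SUPLLIER;
END
GO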

SQL statement to get all customers with no orders TODAY (current date)

The question is: how do I write a statement that returns all customers with NO orders TODAY, using a SQL join?
Tables: tbl_member, tbl_order
tbl_member consists of id, name
tbl_order consists of id, date, foodOrdered
If you LEFT JOIN and then select only the rows where the table on the right is NULL, you limit the result to the rows that do NOT meet the join condition:
select t1.*
from tbl_member t1
left join tbl_order t2
on t1.id = t2.id -- assuming that tbl_order.id references the member id
and t2.date = current_date() -- today's date in mysql
where t2.id is null
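If tbl_order.date is actually a DATETIME, the equality test above only matches orders placed exactly at midnight; a variant of the same anti-join (a sketch, still assuming MySQL and that tbl_order.id references the member) compares against a date range instead, which also keeps the condition index-friendly:
select m.*
from tbl_member m
left join tbl_order o
  on o.id = m.id                              -- assuming tbl_order.id references the member id
 and o.date >= curdate()                      -- today at 00:00
 and o.date < curdate() + interval 1 day      -- before tomorrow at 00:00
where o.id is null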
Assuming tbl_order.date is a datetime (it probably should be), for SQL Server you could use something like:
declare #tbl_member table
(
id int,
fullname varchar(50)
)
declare #tbl_order table
(
id int,
orderdate datetime,
foodOrdered varchar(50)
)
INSERT INTO #tbl_member VALUES (1, 'George Washington')
INSERT INTO #tbl_member VALUES (2, 'Abraham Lincoln')
INSERT INTO #tbl_member VALUES (3, 'Mickey Mouse')
INSERT INTO #tbl_member VALUES (4, 'Donald Duck')
INSERT INTO #tbl_order VALUES (1, '2017-07-01 13:00:00', 'Fish and Chips')
INSERT INTO #tbl_order VALUES (2, '2017-07-03 08:00:00', 'Full English')
INSERT INTO #tbl_order VALUES (3, '2017-07-25 08:00:00', 'Veggie Burger')
INSERT INTO #tbl_order VALUES (3, '2017-07-25 12:00:00', 'Bangers and Mash')
SELECT id, fullname FROM #tbl_member WHERE id NOT IN
(SELECT id FROM #tbl_order
WHERE CAST(orderDate as date) = CAST(GETDATE() as Date))
It helps if you specify what flavour database you are using as the syntax is often subtly different.
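One caveat on the NOT IN version above: if the subquery can ever return a NULL id, NOT IN returns no rows at all. A NOT EXISTS form of the same check (a sketch against the same temp tables) avoids that:
SELECT m.id, m.fullname
FROM #tbl_member m
WHERE NOT EXISTS
(
    -- any order placed by this member today
    SELECT 1
    FROM #tbl_order o
    WHERE o.id = m.id
      AND CAST(o.orderdate AS date) = CAST(GETDATE() AS date)
)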

How to use DATEDIFF in only one SELECT statement?

I want to make a shorter version of my DATEDIFF query. In my code, I create two temporary tables and then select from them using the DATEDIFF function.
I would like this code to be simplified to use only ONE SELECT statement that provides the same results. Is that possible?
This is my SQL query:
DECLARE #Temp TABLE (ID int, Stamp datetime)
INSERT INTO #Temp (ID, Stamp) VALUES (1, '2016-08-17')
INSERT INTO #Temp (ID, Stamp) VALUES (1, GETDATE())
INSERT INTO #Temp (ID, Stamp) VALUES (1, GETDATE()+0.5)
INSERT INTO #Temp (ID, Stamp) VALUES (2, '2016-08-16')
INSERT INTO #Temp (ID, Stamp) VALUES (2, GETDATE())
INSERT INTO #Temp (ID, Stamp) VALUES (2, GETDATE()+3)
SELECT ROW_NUMBER() OVER (ORDER BY ID) as c, ID, Stamp INTO #Temp2
FROM #Temp
SELECT ROW_NUMBER() OVER (ORDER BY ID) as d, ID, Stamp INTO #Temp3
FROM #Temp
SELECT temp2.ID, temp2.Stamp, ISNULL(DATEDIFF(day, temp3.Stamp, temp2.Stamp),0) as DateDiff
FROM #Temp2 as temp2
LEFT JOIN #Temp3 as temp3 on temp2.ID = temp3.ID and temp2.c = temp3.d + 1
Thanks!
If you are using SQL Server 2012:
select * ,isnull(datediff(day,lag(stamp) over(partition by id order by stamp),stamp) ,0)
from #temp t1
Otherwise, use this:
;with cte
as
(select * ,row_number() over (partition by id order by stamp ) as rownum
from #temp t1
)
select c1.id,c1.stamp,isnull(datediff(day,c2.stamp,c1.stamp),0) as datee
from cte c1
left join
cte c2
on c1.id=c2.id and c1.rownum=c2.rownum+1
You could remove the inserts into the temp tables and use subselects within the final query:
DECLARE #Temp TABLE (ID int, Stamp datetime)
INSERT INTO #Temp (ID, Stamp) VALUES (1, '2016-08-17')
INSERT INTO #Temp (ID, Stamp) VALUES (1, GETDATE())
INSERT INTO #Temp (ID, Stamp) VALUES (1, GETDATE()+0.5)
INSERT INTO #Temp (ID, Stamp) VALUES (2, '2016-08-16')
INSERT INTO #Temp (ID, Stamp) VALUES (2, GETDATE())
INSERT INTO #Temp (ID, Stamp) VALUES (2, GETDATE()+3)
SELECT temp2.ID, temp2.Stamp, ISNULL(DATEDIFF(day, temp3.Stamp, temp2.Stamp),0) as DateDiff
FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) as c, ID, Stamp FROM #Temp) as temp2
LEFT JOIN (SELECT ROW_NUMBER() OVER (ORDER BY ID) as d, ID, Stamp FROM #Temp) as temp3
on temp2.ID = temp3.ID and temp2.c = temp3.d + 1
In SQL Server 2012+, you would just use lag():
select t.*,
isnull(datediff(day, lag(stamp) over (partition by id order by stamp), stamp), 0)
from #temp t;
In earlier versions, I would use outer apply:
select t.*,
isnull(datediff(day, t2.stamp, t.stamp), 0)
from #temp t outer apply
(select top 1 t2.*
from #temp t2
where t2.id = t.id and t2.stamp < t.stamp
order by t2.stamp desc
) t2;
Try a CTE:
DECLARE #Temp TABLE (ID int, Stamp datetime)
INSERT INTO #Temp (ID, Stamp) VALUES (1, '2016-08-17')
INSERT INTO #Temp (ID, Stamp) VALUES (1, GETDATE())
INSERT INTO #Temp (ID, Stamp) VALUES (1, GETDATE()+0.5)
INSERT INTO #Temp (ID, Stamp) VALUES (2, '2016-08-16')
INSERT INTO #Temp (ID, Stamp) VALUES (2, GETDATE())
INSERT INTO #Temp (ID, Stamp) VALUES (2, GETDATE()+3)
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (ORDER BY ID) as RowNo, ID, Stamp
FROM #Temp
)
SELECT temp2.ID, temp2.Stamp, ISNULL(DATEDIFF(day, temp3.Stamp, temp2.Stamp),0) as DateDiff
FROM CTE as temp2
LEFT JOIN CTE as temp3 on temp2.ID = temp3.ID
AND temp2.RowNo = temp3.RowNo + 1