Divide window values by a reference row

I would like some guidance on the following problem.
I have the following data in a Spark DataFrame.
I would like to create a window of n days preceding and succeeding a reference record, and then divide the values in the window by the reference value.
However, I have not figured out how to do this kind of operation; everything I find covers only mean, count, or sum operations over the window.
The original data looks like this:
| symbol_id | date | close | is_reference |
|----------|------------|----------|--------------|
| XXXX | 2000-01-19 | 809.9644 | FALSE |
| XXXX | 2000-01-20 | 784.274 | FALSE |
| XXXX | 2000-01-21 | 774.2831 | FALSE |
| XXXX | 2000-01-24 | 760.0106 | FALSE |
| XXXX | 2000-01-25 | 750.7335 | FALSE |
| XXXX | 2000-01-26 | 750.7335 | TRUE |
| XXXX | 2000-01-27 | 742.17 | FALSE |
| XXXX | 2000-01-28 | 749.3063 | FALSE |
| XXXX | 2000-01-31 | 750.02 | FALSE |
| XXXX | 2000-02-01 | 762.8653 | FALSE |
| XXXX | 2000-02-02 | 749.3063 | FALSE |
Expected output looks like this:
| symbol_id | date | close | is_reference | reference_change |
|----------|------------|----------|--------------|-------------------|
| XXXX | 2000-01-19 | 809.9644 | FALSE | 1.07889737170381 |
| XXXX | 2000-01-20 | 784.274 | FALSE | 1.04467697258748 |
| XXXX | 2000-01-21 | 774.2831 | FALSE | 1.03136878799201 |
| XXXX | 2000-01-24 | 760.0106 | FALSE | 1.0123573811479 |
| XXXX | 2000-01-25 | 750.7335 | FALSE | 1 |
| XXXX | 2000-01-26 | 750.7335 | TRUE | 1 |
| XXXX | 2000-01-27 | 742.17 | FALSE | 0.988593155893536 |
| XXXX | 2000-01-28 | 749.3063 | FALSE | 0.99809892591712 |
| XXXX | 2000-01-31 | 750.02 | FALSE | 0.999049596161621 |
| XXXX | 2000-02-01 | 762.8653 | FALSE | 1.01615992892285 |
| XXXX | 2000-02-02 | 749.3063 | FALSE | 0.99809892591712 |
I'm currently partitioning by symbol_id using the following snippet:
val window = Window.partitionBy(SYMBOL_ID)
  .orderBy(col(DATE).desc)
  .rowsBetween(5, 0) // rangeBetween looks better, but I'm just trying rowsBetween for now
And I'm trying to do something like this for the reference_change column:
df
  .withColumn("close_movement", $"close" / lit(col("close")
    .where(col("is_reference") === true)).over(window)) // This command is wrong, but it's the closest to what I have in mind.
In the end, I want to divide each close in the window by the close of the row WHERE is_reference = true, as in the reference_change column of the expected output.
Thank you for your help!

I would just use a simple join:
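import org.apache.spark.sql.functions.{abs, col, datediff}
// Alias both sides: in a self-join, df.col(...) and ref.col(...) resolve to the same attributes
val d = df.as("d")
val ref = df.filter($"is_reference").as("ref")
// Keep rows whose date is within 5 calendar days of the symbol's reference date
d.join(ref, col("d.symbol_id") === col("ref.symbol_id") &&
    abs(datediff(col("d.date"), col("ref.date"))) <= 5)
  .select(col("d.symbol_id"), col("d.date"), col("d.close"), col("d.is_reference"),
    (col("d.close") / col("ref.close")).as("reference_change"))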
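To sanity-check the arithmetic outside Spark, here is a minimal plain-Python sketch of the same division. It assumes a single symbol with exactly one reference row, and it uses a row-based window (n rows either side), because the expected output in the question includes dates up to 7 calendar days from the reference. The function name `reference_change` and the parameter `n` are my own choices for illustration.

```python
from datetime import date

# Sample rows from the question: (symbol_id, date, close, is_reference)
rows = [
    ("XXXX", date(2000, 1, 19), 809.9644, False),
    ("XXXX", date(2000, 1, 20), 784.274, False),
    ("XXXX", date(2000, 1, 21), 774.2831, False),
    ("XXXX", date(2000, 1, 24), 760.0106, False),
    ("XXXX", date(2000, 1, 25), 750.7335, False),
    ("XXXX", date(2000, 1, 26), 750.7335, True),
    ("XXXX", date(2000, 1, 27), 742.17, False),
    ("XXXX", date(2000, 1, 28), 749.3063, False),
    ("XXXX", date(2000, 1, 31), 750.02, False),
    ("XXXX", date(2000, 2, 1), 762.8653, False),
    ("XXXX", date(2000, 2, 2), 749.3063, False),
]


def reference_change(rows, n=5):
    """Divide each close by the reference close for rows within n positions of the reference row."""
    ordered = sorted(rows, key=lambda r: r[1])
    # Assumption: exactly one reference row in the input (single symbol).
    ref_idx = next(i for i, r in enumerate(ordered) if r[3])
    ref_close = ordered[ref_idx][2]
    return [
        r + (r[2] / ref_close,)
        for i, r in enumerate(ordered)
        if abs(i - ref_idx) <= n
    ]


result = reference_change(rows)
for symbol, day, close, is_ref, change in result:
    print(symbol, day, close, is_ref, round(change, 6))
```

With n=5 this keeps all eleven sample rows. A calendar-day condition of 5 days, as in the join, would keep only 2000-01-21 through 2000-01-31.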

Related

Selecting records when a criteria is met

I am trying to write a SQL query that excludes records when the Error value is "True" and another record has the same User_ID and the same Error_Dt. I thought of a main query using IN, with a subquery that identifies the duplicates for User_ID and Error_Dt. For example:
Sample Data:
+----------+-------+---------+----------+
| Error_ID | Error | User_ID | Error_Dt |
+----------+-------+---------+----------+
| Err_A_01 | True | JP_123 | 20200307 |
| Err_A_02 | True | DF_455 | 20200605 |
| Err_A_03 | True | DF_455 | 20200605 |
| Err_A_04 | False | DF_455 | 20200703 |
| Err_B_01 | False | BH_135 | 20200219 |
| Err_B_02 | True | DP_246 | 20200310 |
| Err_B_03 | True | DP_246 | 20200310 |
| Err_B_04 | True | DP_246 | 20200509 |
| Err_B_05 | False | DP_246 | 20200601 |
| Err_B_06 | True | KS_159 | 20200120 |
| Err_B_07 | True | KS_159 | 20200120 |
| Err_B_08 | True | KS_159 | 20200310 |
| Err_C_01 | False | JH_123 | 20200702 |
+----------+-------+---------+----------+
Desired Results:
+----------+-------+---------+----------+
| Error_ID | Error | User_ID | Error_Dt |
+----------+-------+---------+----------+
| Err_A_01 | True | JP_123 | 20200307 |
| Err_A_04 | False | DF_455 | 20200703 |
| Err_B_01 | False | BH_135 | 20200219 |
| Err_B_04 | True | DP_246 | 20200509 |
| Err_B_05 | False | DP_246 | 20200601 |
| Err_B_08 | True | KS_159 | 20200310 |
| Err_C_01 | False | JH_123 | 20200702 |
+----------+-------+---------+----------+

Select the rows whose Error + User_ID + Error_Dt combination is unique, plus every row whose Error is not 'True'.
select Error_ID, Error, User_ID, Error_Dt
from (
    select *,
           count(*) over(partition by Error, User_ID, Error_Dt) cnt
    from tbl ) t
where Error <> 'True' OR cnt = 1
order by Error_ID;
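As a quick check, the query can be run against the sample data with Python's built-in sqlite3 module. This is only a sketch: the question does not say which database is in use, so SQLite stands in here (window functions need SQLite 3.25 or later), and I am assuming Error and Error_Dt are stored as text.

```python
import sqlite3

# Sample data from the question; Error is assumed to be stored as text.
rows = [
    ("Err_A_01", "True", "JP_123", "20200307"),
    ("Err_A_02", "True", "DF_455", "20200605"),
    ("Err_A_03", "True", "DF_455", "20200605"),
    ("Err_A_04", "False", "DF_455", "20200703"),
    ("Err_B_01", "False", "BH_135", "20200219"),
    ("Err_B_02", "True", "DP_246", "20200310"),
    ("Err_B_03", "True", "DP_246", "20200310"),
    ("Err_B_04", "True", "DP_246", "20200509"),
    ("Err_B_05", "False", "DP_246", "20200601"),
    ("Err_B_06", "True", "KS_159", "20200120"),
    ("Err_B_07", "True", "KS_159", "20200120"),
    ("Err_B_08", "True", "KS_159", "20200310"),
    ("Err_C_01", "False", "JH_123", "20200702"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Error_ID TEXT, Error TEXT, User_ID TEXT, Error_Dt TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)

# Same query as above: count rows per (Error, User_ID, Error_Dt) and keep
# rows that are not 'True' or whose combination occurs only once.
query = """
SELECT Error_ID, Error, User_ID, Error_Dt
FROM (
    SELECT *, COUNT(*) OVER (PARTITION BY Error, User_ID, Error_Dt) AS cnt
    FROM tbl
) t
WHERE Error <> 'True' OR cnt = 1
ORDER BY Error_ID
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)
```

The IDs that come back are the seven listed in the desired results.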

VBA Copy & Paste Loop ( Generate Field Number)

Right now I'm working on generating labels in Excel based on a quantity. I managed to copy and paste the label based on a value from a cell, but I don't know how to make a cell change on each iteration of the loop.
Below is as example :
Current result :
| A | B | C | D | E |
|------------------------------- |----- |-------------------- |----- |----- |
| NMB IN DIA | | MADE IN THAILAND | | |
| INVOICE NO | : | MM035639 | | |
| C/NO | : | 1 | / | 2 |
| SHIP TO | : | A | | |
| QTY | : | 100 | | |
| NMB PARTS NO | : | SFASDF234 | | |
| | | *SFASDF234* | | |
| CUST PARTS NO | : | SFASDF234 | | |
| CUST ORDER NO | : | | | |
| ----------------------------- | --- | ------------------ | --- | --- |
| NMB IN DIA | | MADE IN THAILAND | | |
| INVOICE NO | : | MM035639 | | |
| C/NO | : | 1 | / | 2 |
| SHIP TO | : | A | | |
| QTY | : | 100 | | |
| NMB PARTS NO | : | SFASDF234 | | |
| | | *SFASDF234* | | |
| CUST PARTS NO | : | | | |
| CUST ORDES NO | : | | | |
Expected result :
| A | B | C | D | E |
|------------------------------- |----- |-------------------- |----- |----- |
| NMB IN DIA | | MADE IN THAILAND | | |
| INVOICE NO | : | MM035639 | | |
| C/NO | : | 1 | / | 2 |
| SHIP TO | : | A | | |
| QTY | : | 100 | | |
| NMB PARTS NO | : | SFASDF234 | | |
| | | *SFASDF234* | | |
| CUST PARTS NO | : | SFASDF234 | | |
| CUST ORDER NO | : | | | |
| ----------------------------- | --- | ------------------ | --- | --- |
| NMB IN DIA | | MADE IN THAILAND | | |
| INVOICE NO | : | MM035639 | | |
| C/NO | : | 2 | / | 2 |
| SHIP TO | : | A | | |
| QTY | : | 100 | | |
| NMB PARTS NO | : | SFASDF234 | | |
| | | *SFASDF234* | | |
| CUST PARTS NO | : | | | |
| CUST ORDES NO | : | | | |
As you can see in the expected result, the C/NO value should increase with each copy, up to the quantity, rather than just being copied and pasted. Is there anything I can add?
Below is my current code:
Private Sub CommandButton1_Click()
    Dim i As Long
    For i = 2 To Worksheets("Sheet3").Range("E3").Value
        Range("A1:A9", Range("E9")).Copy Sheet3.Range("A65536").End(xlUp)(2)
    Next i
End Sub

Just set the value of the relevant cell to i:
Private Sub CommandButton1_Click()
    Dim i As Long
    Dim NewLoc As Range
    For i = 2 To Worksheets("Sheet3").Range("E3").Value
        'Decide where to copy the output to: the first empty row below the data in column A
        Set NewLoc = Sheet3.Cells(Sheet3.Rows.Count, "A").End(xlUp).Offset(1, 0)
        'Copy the label template
        Range("A1:E9").Copy NewLoc
        'Change the C/NO cell, 2 rows down and 2 columns to the right of the copy's top-left cell
        NewLoc.Offset(2, 2).Value = i
    Next i
End Sub

Crosstab query to show the working hours per day for each Vessel

I have made a crosstab query that should show the total working hours per day for every vessel in our small harbor.
My query:
TRANSFORM Sum(Main.WorkingH) AS SumOfWorkingH
SELECT DateValue([DeptDate]) AS [Date]
FROM Vessels INNER JOIN Main ON Vessels.ID = Main.VesselID
GROUP BY DateValue([DeptDate])
ORDER BY DateValue([DeptDate])
PIVOT Vessels.Vessel;
The problem is that this query attributes each trip's total working hours to its departure date:
| Date      | A1 | A2 | A3 | F3 | F4 | F5 | F6 |
|-----------|----|----|----|----|----|----|----|
| 26-May-17 |    |    | 32 | 29 |    |    |    |
| 27-May-17 | 3  | 13 |    |    |    |    |    |
| 28-May-17 |    |    |    |    |    |    | 73 |
| 29-May-17 |    |    |    | 12 | 6  | 27 |    |
| 01-Jun-17 |    |    | 10 |    | 7  | 41 |    |
| 02-Jun-17 |    | 2  | 15 | 5  |    |    |    |
| 03-Jun-17 |    | 4  |    |    |    |    |    |
The desired result
When a vessel leaves on 6/1 at 9pm and arrives back on 6/3 at 10am, it should appear as follows:
6/1-->3Hours
6/2-->24Hours
6/3-->10Hours
**NOT** 6/1-->37Hours as in the previous table.
This is how it should look:
| Date      | A1 | A2 | A3 | F3 | F4 | F5 | F6 |
|-----------|----|----|----|----|----|----|----|
| 26-May-17 |    |    | 5  | 7  |    |    |    |
| 27-May-17 | 3  | 13 | 24 | 21 |    |    |    |
| 28-May-17 |    |    | 2  |    |    |    | 9  |
| 29-May-17 |    |    |    | 12 | 6  | 8  | 24 |
| 30-May-17 |    |    |    |    |    | 18 | 24 |
| 31-May-17 |    |    |    |    |    |    | 15 |
| 01-Jun-17 |    |    | 10 |    | 7  | 0  |    |
| 02-Jun-17 |    | 2  | 15 | 5  | 24 |    |    |
| 03-Jun-17 |    | 4  |    |    |    | 16 |    |
These values are not accurate (I wrote them by hand), but I think you get the idea.
The suggested solution
While trying to fix this problem I wrote the following function, which takes a trip's start and end times:
Public Function HoursByDate1(stTime, EndTime)
    For dayloop = Int(EndTime) To Int(stTime) Step -1
        If dayloop = Int(stTime) Then
            WorkingHours = Hour(dayloop + 1 - stTime)
        ElseIf dayloop = Int(EndTime) Then
            WorkingHours = Hour(EndTime - dayloop)
        Else
            WorkingHours = 24
        End If
        HoursByDate1 = WorkingHours
        Debug.Print "StartDate: " & stTime & ", EndDate:" & EndTime & ", The day:" & dayloop & " --> " & WorkingHours & " hours."
    Next dayloop
End Function
It prints the hours for each day of the trip, which is exactly what I want.
But when I call this function from my query, I get only the last value for each trip, as follows:
| Date      | A1 | A2 | A3 | F3 | F4 | F5 | F6 |
|-----------|----|----|----|----|----|----|----|
| 5/26/2017 |    |    | 5  | 7  |    |    |    |
| 5/27/2017 | 15 | 19 |    |    |    |    |    |
| 5/28/2017 |    |    |    |    |    |    | 9  |
| 5/29/2017 |    |    |    | 8  | 7  | 8  |    |
| 6/1/2017  |    |    | 3  |    | 6  | 0  |    |
| 6/2/2017  |    | 8  | 8  | 19 |    |    |    |
| 6/3/2017  |    | 9  |    |    |    |    |    |
I am looking for any solution, whether on the VBA side or the SQL query side.
Sorry for the very long question, but I wanted to show my effort, because I am often told that I have not provided enough information.
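The function above is called once per trip and can return only one value, so the last assignment in the loop (the departure day's hours) is what the query sees. The per-day split itself can be sketched as a function that returns one entry per calendar day. The following is Python for illustration only, not VBA or Access SQL; the name `hours_by_date` is hypothetical, and it works in fractional hours rather than whole hours.

```python
from datetime import datetime, timedelta


def hours_by_date(start, end):
    """Split the interval [start, end) into hours per calendar day."""
    hours = {}
    cursor = start
    while cursor < end:
        # Midnight that closes the current calendar day
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        # The segment ends at midnight or at the arrival time, whichever comes first
        segment_end = min(next_midnight, end)
        hours[cursor.date()] = (segment_end - cursor).total_seconds() / 3600
        cursor = segment_end
    return hours


# The trip from the question: leaves 6/1 at 9pm, arrives back 6/3 at 10am
trip = hours_by_date(datetime(2017, 6, 1, 21, 0), datetime(2017, 6, 3, 10, 0))
for day, h in trip.items():
    print(day, h)
```

For the example trip this gives 3, 24 and 10 hours for 6/1, 6/2 and 6/3. To feed a crosstab, each (trip, day) pair would need to become its own row before the pivot; how best to produce those rows in Access is left open here.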

Need to shift the data to next column, unfortunately added data in wrong column

I have a table named test:
+----+---------+-------+
| ID | Name1   | Name2 |
+----+---------+-------+
| 1  | Andy    | NULL  |
| 2  | Kevin   | NULL  |
| 3  | Phil    | NULL  |
| 4  | Maria   | NULL  |
| 5  | Jackson | NULL  |
+----+---------+-------+
I am expecting output like this:
+----+-------+---------+
| ID | Name1 | Name2   |
+----+-------+---------+
| 1  | NULL  | Andy    |
| 2  | NULL  | Kevin   |
| 3  | NULL  | Phil    |
| 4  | NULL  | Maria   |
| 5  | NULL  | Jackson |
+----+-------+---------+
I accidentally inserted the data into the wrong column, and now I want to shift it to the next column.

You can use an UPDATE statement with no WHERE condition, to cover the entire table.
UPDATE test
SET Name2 = Name1,
    Name1 = NULL;
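Here is a small sketch that runs the statement against the sample rows using Python's built-in sqlite3 module. SQLite is only a stand-in, since the question does not name the database. In SQLite the right-hand sides of SET are evaluated against the row's values from before the update, so Name2 receives the old Name1 even though Name1 is set to NULL in the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER PRIMARY KEY, Name1 TEXT, Name2 TEXT)")
# Names were inserted into Name1 by mistake; Name2 is left NULL
conn.executemany(
    "INSERT INTO test (ID, Name1) VALUES (?, ?)",
    [(1, "Andy"), (2, "Kevin"), (3, "Phil"), (4, "Maria"), (5, "Jackson")],
)

# No WHERE clause, so every row is updated
conn.execute("UPDATE test SET Name2 = Name1, Name1 = NULL")

shifted = conn.execute("SELECT ID, Name1, Name2 FROM test ORDER BY ID").fetchall()
print(shifted)
```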

Hiding inside group columns from other columns that don't have values

I'm working on a report. How do I get the date columns that sit outside the matrix group to appear next to the matching value column inside the matrix?
For example, it is set up like this:
| HiredDt | TermDt | [Type] | LicDt | MedDt |
---------------------------------------------------------------------------------
ID | [HiredDt] | [TermDt] | SUM([Count_of_Type]) | [LicDt] | [MedDt] |
---------------------------------------------------------------------------------
And looks like this:
| HiredDt | TermDt | Lic | Med | App | LicDt | MedDt |
----------------------------------------------------------------------------------------
1 | 1/31/12 | 1/31/14 | 1 | 1 | 12 | 6/1/15 | 9/1/14 |
2 | 2/19/12 | 9/18/14 | 1 | 1 | 12 | 3/2/15 | 9/1/14 |
But when I use inside grouping to match up the date next to the associated document type I get:
| HiredDt | TermDt | Lic | | | Med | | | App | | |
----------------------------------------------------------------------------------------------------------------------------
1 | 1/31/12 | 1/31/14 | 1 | 6/1/15 | | 1 | | 9/1/2014 | 12 | | |
2 | 2/19/12 | 9/18/14 | 1 | 3/2/15 | | 1 | | 9/1/2014 | 12 | | |
What I'm trying to get is this:
| HiredDt | TermDt | Lic | LicDt | Med | MedDt | App |
--------------------------------------------------------------------------------------
1 | 1/31/12 | 1/31/14 | 1 | 6/1/15 | 1 | 9/1/14 | 12 |
2 | 2/19/12 | 9/18/14 | 1 | 3/2/15 | 1 | 9/1/14 | 12 |
Is this possible?

I would right-click on the cell you have labelled SUM([Count_of_Type]) and choose Insert Column - Inside Group - Right.
In that new cell I would set the expression to: = Max ( [LicDt] )
