R equivalent of case when in SQL? - sql

I have been using T-SQL for a while but I now have to make transition to R...
select case when date_column_1 < '20160801' then date_column_1 else '20160801' end as date_column_1, case when date_column_2 < '99991231' then '20190701' else date_column_2 end as date_column_2 from table
Also, apologies in advance with my crappy stackoverflow formatting skills.

Since you have only one condition to check for each column, we can use ifelse here. If there are multiple conditions you could check case_when from dplyr.
transform(table, date_column_1 = ifelse(date_column_1 < as.Date('2016-08-01'),
date_column_1, as.Date('2016-08-01')),
date_column_2 = ifelse(date_column_2 < as.Date('9999-12-31'),
as.Date('2019-07-01'), date_column_2))
Assuming date_column_1 and date_column_2 are of Date class, you cannot directly compare dates with a literal string ("20160801") in R, so we convert the comparison values to Date class as well. Moreover, for date_column_2, do you mean to replace all values with the date "2019-07-01"?
As mentioned by @G. Grothendieck, ifelse would convert the dates to numeric. We can use if_else from dplyr, which is type-strict and will preserve the Date class.
library(dplyr)
table %>%
mutate(date_column_1 = if_else(date_column_1 < as.Date('2016-08-01'),
date_column_1, as.Date('2016-08-01')),
date_column_2 = if_else(date_column_2 < as.Date('9999-12-31'),
as.Date('2019-07-01'), date_column_2))

Related

Spark SQL is interpreting a datetime.date object as a mathematical formula or integer in statement

I've encountered a problem in Spark SQL. It is interpreting a datetime.date object as a mathematical formula, or integer, in a SQL statement I am writing.
from datetime import datetime, date
currentDateAndTime = datetime.now()
current_month = currentDateAndTime.strftime("%m")
current_year = currentDateAndTime.strftime("%Y")
first_day_of_month = date(int(current_year), int(current_month), 1)
print(first_day_of_month)
type(first_day_of_month)
and you get:
2022-10-01
datetime.date
Then when I do
df = spark.sql("""
SELECT * FROM table_A
WHERE IncidentCreatedDate < {}
""".format(first_day_of_month))
I get an error that says AnalysisException: cannot resolve '(table_A.IncidentCreatedDate < ((2022 - 10) - 1' due to data type mismatch: differing types in '(tableA.IncidentCreatedDate < ((2022 - 10 - 1))' (date and int).;......
(There might be typos in the code above: I had to retype everything on another laptop, because the original is on my work laptop and my employer doesn't like me sending anything from it.)
pyspark doesn't support prepared statements.
format will replace the placeholder, but string literals must be in single quotes, so simply add them:
df = spark.sql("""
SELECT * FROM table_A
WHERE IncidentCreatedDate < '{}'
""".format(first_day_of_month))

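To see why the unquoted placeholder in the Spark question above fails, it can help to print the SQL text that format produces. This is a minimal sketch in plain Python (no Spark session needed); the date is hard-coded to match the output shown in the question, and build_query is a hypothetical helper name, not part of PySpark.

```python
from datetime import date

def build_query(cutoff: date, quoted: bool) -> str:
    """Render the query text with or without quotes around the date."""
    placeholder = "'{}'" if quoted else "{}"
    template = "SELECT * FROM table_A WHERE IncidentCreatedDate < " + placeholder
    return template.format(cutoff)

first_day_of_month = date(2022, 10, 1)

# Unquoted: the SQL text contains 2022-10-01, which parses as integer subtraction.
print(build_query(first_day_of_month, quoted=False))
# Quoted: the SQL text contains the string literal '2022-10-01'.
print(build_query(first_day_of_month, quoted=True))
```

The first line printed ends in `< 2022-10-01`, which matches the `((2022 - 10) - 1)` expression in the error message; the second ends in `< '2022-10-01'`.
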
Using BETWEEN operator in a WHERE clause with dates from an internal table

I have an internal table populated with start and end dates for each type of period. I want to use this internal table in a WHERE clause of an SQL query to select items whose start and end dates are within the open period of their respective type.
TYPES: BEGIN OF s_openprd,
TETXT TYPE TETXT,
fromdate TYPE d,
todate TYPE d,
END OF s_openprd.
DATA: it_openprd TYPE TABLE OF s_openprd WITH KEY TETXT.
SELECT * FROM FPLT
INNER JOIN @it_openprd AS OP ON FPLT~TETXT = OP~TETXT
WHERE FPLT~FKDAT BETWEEN OP~fromdate AND OP~todate
AND FPLT~NFDAT BETWEEN OP~fromdate AND OP~todate
However I get the error saying that OP~fromdate should be of a compatible type to be used as an operator with BETWEEN. The types listed include the date type d.
I've tried replacing BETWEEN with regular >= and <= operators:
SELECT * FROM FPLT
INNER JOIN @it_openprd AS OP ON FPLT~TETXT = OP~TETXT
WHERE FPLT~FKDAT >= OP~fromdate AND FPLT~FKDAT <= OP~todate
AND FPLT~NFDAT >= OP~fromdate AND FPLT~NFDAT <= OP~todate
But the query returns incorrect results.
I assume the ABAP type d is incompatible with SQL type d ?
How can I use an internal table to restrict the selection in this way ?
Nothing prevents you from using FOR ALL ENTRIES instead of joining with the internal table, if your ABAP version does not support the join.
Regarding "incorrect results" I agree with Sandra, BETWEEN and LT/GT have totally identical sense, so it is more a matter of what you expect than correctness. I'd rather utilize standard logic for dealing with the issue that bothers you:
The problem with FPLT-NFDAT and FPLT-FKDAT is that their order is not consistent. In one entry, the value of NFDAT may be anterior to FKDAT and in another entry it's the opposite.
Following the same approach for your SQL query, you can write something like this:
TYPES: BEGIN OF ty_fplt,
fplnr TYPE fplnr,
fkdat TYPE fkdat,
nfdat TYPE nfdat,
END OF ty_fplt,
tt_fplt TYPE STANDARD TABLE OF ty_fplt WITH NON-UNIQUE KEY fkdat nfdat.
DATA(lt_fplt_base) = VALUE tt_fplt( ).
SELECT fplnr, CASE WHEN nfdat < fkdat THEN nfdat ELSE fkdat END AS fkdat,
CASE WHEN nfdat < fkdat THEN fkdat ELSE nfdat END AS nfdat
FROM fplt
INTO TABLE @lt_fplt_base.
SELECT *
FROM fplt AS f
INTO TABLE @DATA(result)
FOR ALL ENTRIES IN @lt_fplt_base
WHERE f~fplnr = @lt_fplt_base-fplnr
AND f~fkdat >= @lt_fplt_base-fkdat
AND f~nfdat <= @lt_fplt_base-nfdat.
Don't take it as a rule of thumb, it is just a quick suggestion.
P.S. Joining by a text field, INNER JOIN @it_openprd AS OP ON FPLT~TETXT = OP~TETXT, does not make sense in any context. Text/string fields are often ambiguous and often contain control characters, whitespace, etc., which makes them useless as a primary key.
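The point that BETWEEN and the >= / <= pair mean the same thing can be checked quickly outside ABAP. This is a minimal sketch using Python's built-in sqlite3 module rather than ABAP Open SQL; the table plan, its columns, and the sample rows are made up for illustration, and the dates are stored as YYYYMMDD text (the layout ABAP type d uses), so string comparison follows date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; fkdat holds dates as YYYYMMDD text.
conn.execute("CREATE TABLE plan (id INTEGER, fkdat TEXT)")
conn.executemany(
    "INSERT INTO plan VALUES (?, ?)",
    [(1, "20230101"), (2, "20230615"), (3, "20231231"), (4, "20240101")],
)

lo, hi = "20230101", "20231231"

# Same range expressed with BETWEEN and with explicit comparisons.
between_ids = [r[0] for r in conn.execute(
    "SELECT id FROM plan WHERE fkdat BETWEEN ? AND ? ORDER BY id", (lo, hi))]
range_ids = [r[0] for r in conn.execute(
    "SELECT id FROM plan WHERE fkdat >= ? AND fkdat <= ? ORDER BY id", (lo, hi))]

print(between_ids, range_ids)
```

Both lists come out as [1, 2, 3]: BETWEEN includes both endpoints, exactly like the >= / <= pair, so swapping one form for the other should not change the result set.
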

Can't explain this SQL query

I'm sure these kinds of questions are frowned upon, but I really need some help. For the past 3 hours I have been staring at this SQL query and I just can't explain some of the logic in it. Normally I wouldn't ask, but I'm reaching a deadline.
In the WHERE clause you will see an IIf construction. From what I can see, it checks whether or not the picking date was valid.
However, the way it's phrased just looks weird. Doesn't it always result in 'Between'? (If x > y AND x < z then 'Between', else 'Not Between') = 'Between'.
And to what object does the result of this 'if' apply? The way I interpret it, the end result becomes WHERE 'Between' AND 'Between', which just doesn't make sense...
Any help is appreciated (P.S. The query is written for Access)
SELECT
DWH_PickOrderLines_Temp.*,
IIf(DWH_PickOrderLines_Temp.WayOfTransport IN ("ON", "PD"), "Kitting " & Mid(tbl_District_Activiteit.Activiteit, 8), IIf(DWH_PickOrderLines_Temp.PickMethode IN ("K", "V"), "Picking Bulk", tbl_District_Activiteit.Activiteit)) AS Activiteit,
IIf(DWH_PickOrderLines_Temp.WayOfTransport IN ("ON", "PD"), "Kitting", tbl_District_Activiteit.[Activiteit groep]) AS [Activiteit groep],
R14_Distinct_Warehouse_Location.Proces,
R14_Distinct_Warehouse_Location.Gebouw
FROM DWH_PickOrderLines_Temp
LEFT JOIN R14_Distinct_Warehouse_Location
ON DWH_PickOrderLines_Temp.PickLocation = R14_Distinct_Warehouse_Location.PickLocation
WHERE (((IIf([DWH_PickOrderLines_Temp].[PickDateTime] >= [R14_Distinct_Warehouse_Location].[tbl_Location_Zone_District.ValidFrom]
AND [DWH_PickOrderLines_Temp].[PickDateTime] < [R14_Distinct_Warehouse_Location].[tbl_Location_Zone_District.ValidTo], "Between", "Not Between")) = "Between")
AND ((IIf([DWH_PickOrderLines_Temp].[PickDateTime] >= [R14_Distinct_Warehouse_Location].[tbl_District_Activiteit.ValidFrom]
AND [DWH_PickOrderLines_Temp].[PickDateTime] < [R14_Distinct_Warehouse_Location].[tbl_District_Activiteit.ValidTo], "Between", "Not Between")) = "Between"));
The WHERE never returns a value, and it is duplicated...
declare @pickdate int = 1
declare @validForm int = 0
print 'start'
if(((IIf(@pickdate >= @validForm AND @pickdate < @validForm, 'Between', 'Not Between')) = 'Between')
AND ((IIf(@pickdate >= @validForm AND @pickdate < @validForm, 'Between', 'Not Between')) = 'Between'))
print 'test'
I think the query is missing a part (references to two tables, see my comment above). Anyway, I think you can simplify the WHERE condition:
WHERE [DWH_PickOrderLines_Temp].[PickDateTime] >= [R14_Distinct_Warehouse_Location].[tbl_Location_Zone_District.ValidFrom]
AND [DWH_PickOrderLines_Temp].[PickDateTime] < [R14_Distinct_Warehouse_Location].[tbl_Location_Zone_District.ValidTo]
AND [DWH_PickOrderLines_Temp].[PickDateTime] >= [R14_Distinct_Warehouse_Location].[tbl_District_Activiteit.ValidFrom]
AND [DWH_PickOrderLines_Temp].[PickDateTime] < [R14_Distinct_Warehouse_Location].[tbl_District_Activiteit.ValidTo]
Why do you think if statement always results in 'Between'?
Basically, the WHERE condition boils down to:
where
pickDateTime >= validFrom1 and pickDateTime < validTo1
and
pickDateTime >= validFrom2 and pickDateTime < validTo2
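That wrapping a condition in IIf(..., "Between", "Not Between") = "Between" gives the same filter as the bare condition can be checked exhaustively on a small range of values. This is a minimal Python sketch: iif is a hypothetical stand-in for Access's IIf, integers stand in for the dates, and Null values are not modelled.

```python
def iif(condition, when_true, when_false):
    """Stand-in for Access IIf: pick one of two values based on a condition."""
    return when_true if condition else when_false

def original_filter(pick, valid_from, valid_to):
    # Mirrors: IIf(pick >= from AND pick < to, "Between", "Not Between") = "Between"
    label = iif(pick >= valid_from and pick < valid_to, "Between", "Not Between")
    return label == "Between"

def simplified_filter(pick, valid_from, valid_to):
    # Mirrors: pick >= from AND pick < to
    return valid_from <= pick < valid_to

values = range(5)
# Collect every combination on which the two filters disagree.
mismatches = [
    (p, f, t)
    for p in values for f in values for t in values
    if original_filter(p, f, t) != simplified_filter(p, f, t)
]
print(mismatches)
```

The list is empty, so the two forms agree on every combination tried. The IIf result is "Between" only when the date condition holds, which is why it does not always yield "Between", and why each half of the WHERE clause is a plain true/false test on the row.
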

Update Query depending on Date

RN_TESTCYCL_ID
RN_CYCLE_ID
RN_TEST_ID
RN_RUN_ID
RN_RUN_NAME
RN_EXECUTION_DATE (if RN_VTS Null, compare with this Date, format: DDMMYY)
RN_EXECUTION_TIME
RN_HOST
RN_STATUS
RN_DURATION
RN_TESTER_NAME
RN_PATH
RN_USER_01
RN_USER_02
RN_USER_03
RN_USER_04
RN_USER_05
RN_USER_06
RN_USER_07
RN_USER_08
RN_USER_09
RN_USER_10
RN_USER_11
RN_USER_12
RN_TEST_VERSION
RN_ATTACHMENT
RN_RUN_VER_STAMP
RN_VTS (compare with current Date, sometimes Null, Format: YYYYMMDDHH24MISS)
RN_CYCLE
RN_TEST_INSTANCE
RN_OS_NAME
RN_OS_SP
RN_OS_BUILD
RN_VC_LOKEDBY
RN_VC_STATUS
RN_VC_VERSION
RN_OS_CONFIG
RN_ASSIGN_RCYC
RN_BPTA_CHANGE_DETECTED
RN_BPTA_CHANGE_AWARENESS
RN_VC_VERSION_NUMBER
RN_PINNED_BASELINE
RN_TEST_CONFIG_ID
RN_DRAFT
RN_ITERS_PARAMS_VALUES
RN_ITERS_SUM_STATUS
RN_BPT_STRUCTURE
RN_STATE
RN_COMMENTS
RN_SUBTYPE_ID
RN_TEXT_SYNC
RN_ENVIRONMENT
RN_BUILD_REVISION
RN_DETAIL
RN_JENKINS_URL
RN_JENKINS_JOB_NAME
RN_RESULTS_FILES_NETWORK_PATH
I need to update every RN_Tester_name to "anonymous" if RN_VTS < DateX and if RN_VTS is Null then compare RN_EXECUTION_DATE < DateX
Can someone figure out a query to update RN_Tester_Name, i'm kinda stuck?
Use the coalesce SQL function to select the first non-null column. Here is an example Oracle SQL update statement:
update X set
rn_tester_name = 'anonymous'
where coalesce(rn_vts, rn_execution_date) < Y
I don't know how you can use that in your vb.net code, though.
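The behaviour of the coalesce-based update can be tried out with Python's built-in sqlite3 module. This is only a sketch, not Oracle: the table name run and the sample rows are invented, and both date columns are assumed to hold YYYYMMDD text so that they compare correctly. In the real table the two columns use different formats, so they would need converting to a common type first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down version of the table from the question.
conn.execute(
    "CREATE TABLE run (rn_run_id INTEGER, rn_tester_name TEXT, "
    "rn_vts TEXT, rn_execution_date TEXT)"
)
conn.executemany(
    "INSERT INTO run VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "20150101", "20150101"),  # rn_vts before the cutoff
        (2, "bob", None, "20140101"),          # rn_vts NULL, execution date before the cutoff
        (3, "carol", "20200101", "20100101"),  # rn_vts after the cutoff; execution date ignored
        (4, "dave", None, "20200101"),         # rn_vts NULL, execution date after the cutoff
    ],
)

cutoff = "20160101"  # plays the role of DateX
# COALESCE returns rn_vts when it is not NULL, otherwise rn_execution_date.
conn.execute(
    "UPDATE run SET rn_tester_name = 'anonymous' "
    "WHERE COALESCE(rn_vts, rn_execution_date) < ?",
    (cutoff,),
)

names = [row[0] for row in conn.execute(
    "SELECT rn_tester_name FROM run ORDER BY rn_run_id")]
print(names)
```

This prints ['anonymous', 'anonymous', 'carol', 'dave']: rows 1 and 2 are anonymised, row 3 is kept because its non-null rn_vts takes precedence over the older execution date, and row 4 is kept because its fallback date is after the cutoff.
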

SQL check for NULLs in WHERE clause (ternary operator?)

What would the SQL equivalent to this C# statement be?
bool isInPast = (x != null ? x < DateTime.Now() : true)
I need to construct a WHERE clause that checks that x < NOW() only if x IS NOT NULL. x is a datetime that needs to be null sometimes, and not null other times, and I only want the WHERE clause to consider non-null values, and consider null values to be true.
Right now the clause is:
dbo.assignments.[end] < { fn NOW() }
Which works for the non-NULL cases, but NULL values always seem to make the expression evaluate to false. I tried:
dbo.assignments.[end] IS NOT NULL AND dbo.assignments.[end] < { fn NOW() }
And that seems to have no effect.
For use in a WHERE clause, you have to test separately
where dbo.assignments.[end] is null or dbo.assignments.[end] < GetDate()
or you can turn the nulls into a date (that will always be true)
where isnull(dbo.assignments.[end],0) < GetDate()
or you can do the negative test against the bit flag derived from the below
where case when dbo.assignments.[end] >= GetDate() then 0 else 1 end = 1
The below is explanation and how you would derive isInPast for a SELECT clause.
bool isInPast = (x != null ? x < DateTime.Now() : true)
A bool can only have one of two results, true or false.
Looking closely at your criteria, the ONLY condition for false is when
x != null && x >= now
Given that fact, it becomes an easy translation: in SQL, x >= now can only evaluate to true when x is not null, so only one condition is needed
isInPast = case when dbo.assignments.[end] >= { fn NOW() } then 0 else 1 end
(1 being true and 0 being false)
Not sure what { fn NOW() } represents, but if you want SQL Server to provide the current time, use either GETDATE() or, if you are working with UTC data, GETUTCDATE()
isInPast = case when dbo.assignments.[end] >= GetDate() then 0 else 1 end
The one you are looking for is probably the CASE statement
You need something like
WHERE X IS NULL
OR X < NOW()
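The effect of NULL on the comparison can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions, not SQL Server: the column is called end_date here, the rows are invented, and a fixed string stands in for the current time so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one past date, one future date, one NULL.
conn.execute("CREATE TABLE assignments (id INTEGER, end_date TEXT)")
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?)",
    [(1, "2000-01-01"), (2, "2999-01-01"), (3, None)],
)

now = "2020-01-01"  # fixed stand-in for the current time

def ids(where):
    """Return the ids of the rows that satisfy the given WHERE condition."""
    sql = f"SELECT id FROM assignments WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, {"now": now})]

# NULL < now evaluates to NULL, which WHERE treats as not true.
print(ids("end_date < :now"))
# The explicit IS NULL test lets the NULL row through.
print(ids("end_date IS NULL OR end_date < :now"))
```

The first query returns [1] and the second returns [1, 3], which is the "treat NULL as true" behaviour the question asks for.
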
Have two separate queries, one for when x is null and one for when it is not. Trying to mix the two distinct conditions is a sure way to get a bad plan. Remember that the generated plan must work for all values of x, so any optimization based on it (a range scan on an index) is no longer possible.