Snowflake ignores a WHERE clause condition comparing timestamps - sql

So I'm building an SCD type 2 in Snowflake, but it ignores the part of the WHERE clause that compares "to_timestamp" with "expiry_date". EXPIRY_DATE is a variable set to '9999-08-17 07:31:29.901000000' (as infinity), and TO_TIMESTAMP is a column in the table. I want to touch only the rows whose to_timestamp is set to infinity (they are still active), but Snowflake seems to ignore this part of the WHERE clause. Below is some of the code. It should update the rows that are expired, i.e. change their "to_timestamp" to the current time, and it does - but it does so to rows with timestamps of all kinds; it ignores the last line.
SET EXPIRY_DATE_NTZ = '9999-08-17 07:31:29.901000000';
SET CURRENT_DATE_NTZ = TO_TIMESTAMP_NTZ(CURRENT_TIMESTAMP());
UPDATE CUSTOMER_TARGET CT
SET CT.TO_TIMESTAMP = $CURRENT_DATE_NTZ
FROM POC.SNOWFLAKE_POC.CUSTOMER_STAGE CS
WHERE CT.C_CUSTOMER_ID = CS.C_CUSTOMER_ID
AND (CT.C_FIRST_NAME <> CS.C_FIRST_NAME OR CT.C_LAST_NAME <> CS.C_LAST_NAME OR CT.C_BIRTH_YEAR
<> CS.C_BIRTH_YEAR OR CT.C_BIRTH_COUNTRY <> CS.C_BIRTH_COUNTRY OR CT.C_LAST_REVIEW_DATE<>CS.C_LAST_REVIEW_DATE)
AND CT.TO_TIMESTAMP = $EXPIRY_DATE_NTZ;
I have two of these UPDATE statements (one for updates and one for deletes) and a MERGE statement for inserts, and it ignores the comparison in every single one, updating rows that have "to_timestamp" set to something like "2021-08-24 07:11:53.510000000". I've tried every combination possible (BETWEEN ... AND ..., >= ... <=, <=, >=, comparing in a CASE expression of the UPDATE, ...) - nothing. What could be the cause/solution?

As we do not know the structure of CUSTOMER_TARGET, I would suggest explicitly setting the data type of the EXPIRY_DATE_NTZ variable to match the column's data type. Change:
SET EXPIRY_DATE_NTZ = '9999-08-17 07:31:29.901000000';
SELECT $EXPIRY_DATE_NTZ;
DESCRIBE RESULT LAST_QUERY_ID();
to:
-- TIMESTAMP_NTZ as an example
SET EXPIRY_DATE_NTZ = '9999-08-17 07:31:29.901000000'::TIMESTAMP_NTZ;
SELECT $EXPIRY_DATE_NTZ;
DESCRIBE RESULT LAST_QUERY_ID();
Done that way, there are no "implicit conversions" involved in the process.
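If in doubt, SHOW VARIABLES lists each session variable along with its value and type, which makes an accidentally string-typed variable easy to spot:
SHOW VARIABLES;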
Another piece of advice is to use IS DISTINCT FROM instead of <>. IS DISTINCT FROM is NULL-safe, which is important if the columns are defined as nullable.
UPDATE CUSTOMER_TARGET CT
SET CT.TO_TIMESTAMP = $CURRENT_DATE_NTZ
FROM POC.SNOWFLAKE_POC.CUSTOMER_STAGE CS
WHERE CT.C_CUSTOMER_ID = CS.C_CUSTOMER_ID
AND (CT.C_FIRST_NAME IS DISTINCT FROM CS.C_FIRST_NAME
OR CT.C_LAST_NAME IS DISTINCT FROM CS.C_LAST_NAME
OR CT.C_BIRTH_YEAR IS DISTINCT FROM CS.C_BIRTH_YEAR
OR CT.C_BIRTH_COUNTRY IS DISTINCT FROM CS.C_BIRTH_COUNTRY
OR CT.C_LAST_REVIEW_DATE IS DISTINCT FROM CS.C_LAST_REVIEW_DATE)
AND CT.TO_TIMESTAMP = $EXPIRY_DATE_NTZ;

Your SQL does not have any issues with the filters (the ORs are properly surrounded by brackets, etc.). I assume that you have checked the execution profile and saw that your filter (CT.TO_TIMESTAMP = '9999-08-17 07:31:29.901000000') did not eliminate any rows. In that case, all rows in the target table presumably have this value in the TO_TIMESTAMP column.
I highly recommend you check the data first. If you are running multiple UPDATE/MERGE commands, you may have missed that the data has already been updated with this value.
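For example, a quick sanity check (using the table and column names from your statement):
SELECT TO_TIMESTAMP, COUNT(*) AS CNT
FROM CUSTOMER_TARGET
GROUP BY TO_TIMESTAMP
ORDER BY CNT DESC;
If every row already carries '9999-08-17 07:31:29.901000000', the filter is working correctly and the UPDATE simply matches all rows.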

Related

MS Access Query date range when last date not found

I have the following update query in MS Access 2013
UPDATE WXObs SET WXObs.SnowFlag = 1
WHERE (((WXObs.StationID) ="451409") And(
(WxObs.ObsDate) Between #1/3/2003# AND #3/29/2003# OR
(WxObs.ObsDate) Between #11/16/2003# AND #5/7/2004# OR
(WxObs.ObsDate) Between #10/30/2004# AND #4/30/2005#));
This works until an end date in one of the ranges is not found in the data. For instance, if 5/7/2004 is not in the data set, then the update continues through to the next end date, in this case 4/30/2005.
I would prefer it ended on the last date present in the range. For instance, if the data ended on 4/21/2004, that would be the last record updated for the 11/16/2003 to 5/7/2004 range. The query would then continue to update again beginning on 10/30/2004.
I have tried < and <=
Thanks
You're missing some parentheses, which affects the evaluation order and causes the behavior you're reporting.
What you want is for each of the BETWEEN portions to be evaluated completely before the ORs are evaluated, and you guarantee that order by surrounding each BETWEEN expression with parentheses.
This should correct it (untested, as you've not provided the test data necessary to create a test case).
UPDATE WXObs SET WXObs.SnowFlag = 1
WHERE
(WXObs.StationID ="451409")
And
(
(WxObs.ObsDate Between #1/3/2003# AND #3/29/2003#) OR
(WxObs.ObsDate Between #11/16/2003# AND #5/7/2004#) OR
(WxObs.ObsDate Between #10/30/2004# AND #4/30/2005#)
);

How to concatenate date variables to create between in where condition

While trying to restrict a value to between a user-given start date and end date in a query, I am running into run-time error 3071 (The expression is typed incorrectly or it is too complex).
This is being used to pass user-given values from a form to a query.
Please see below
WHERE (
IIf([Forms]![Find]![Entity]<>"",DB.Entity=[Forms]![Find]![Entity],"*")
AND IIf([Forms]![Find]![AEPS]<>"",DB.AEPSProgram=[Forms]![Find]![AEPS],"*")
AND IIf([Forms]![Find]![DeliveryType]<>"",DB.DeliveryType=[Forms]![Find]![DeliveryType],"*")
AND IIf([Forms]![Find]![ReportingYear] is not Null,DB.ReportingYear=[Forms]![Find]![ReportingYear],"*")
AND IIf([Forms]![Find]![Price] is not Null,DB.Price=[Forms]![Find]![Price],"*")
AND IIf([Forms]![Find]![Volume]is not Null,DB.Volume=[Forms]![Find]![Volume],"*")
AND IIf([Forms]![Find]![sDate] is not Null AND [Forms]![Find]![eDate] is not Null,DB.TransactionDate= ">" & [Forms]![Find]![sdate] & " and <" & [Forms]![Find]![edate],"*")
);
If I set it equal to one of the dates it works as expected. I assume I am missing something with how I am joining the dates
Thank you
This (for the date field only) works:
WHERE (((DB.TransactionDate)>[Forms]![Find]![sdate] And (DB.TransactionDate)<[Forms]![Find]![edate])) OR ((([Forms]![Find]![sDate]+[Forms]![Find]![eDate]) Is Null));
Don't use *; it is for use with Like.
Consider assigning the NULL condition to the corresponding column via NZ(), which sets the column equal to itself and so avoids filtering any rows by that respective condition. An asterisk alone does not evaluate to a boolean condition unless it is used with the LIKE operator.
WHERE DB.Entity = NZ([Forms]![Find]![Entity], DB.Entity)
AND DB.AEPSProgram = NZ([Forms]![Find]![AEPS], DB.AEPSProgram)
AND DB.DeliveryType = NZ([Forms]![Find]![DeliveryType], DB.DeliveryType)
AND DB.ReportingYear = NZ([Forms]![Find]![ReportingYear], DB.ReportingYear)
AND DB.Price = NZ([Forms]![Find]![Price], DB.Price)
AND DB.Volume = NZ([Forms]![Find]![Volume], DB.Volume)
AND DB.TransactionDate >= NZ([Forms]![Find]![sdate], DB.TransactionDate)
AND DB.TransactionDate <= NZ([Forms]![Find]![edate], DB.TransactionDate)
;

SQL regex and field

I want to change the query to return multiple values in extra_fields - how can I change the regex? Also, I don't understand what extra_fields is - is it a field? If so, why is it not referenced with the table prefix, like i.extra_fields?
SELECT i.*,
CASE WHEN i.modified = 0 THEN i.created ELSE i.modified END AS lastChanged,
c.name AS categoryname,
c.id AS categoryid,
c.alias AS categoryalias,
c.params AS categoryparams
FROM #__k2_items AS i
LEFT JOIN #__k2_categories AS c ON c.id = i.catid
WHERE i.published = 1
AND i.access IN(1,1)
AND i.trash = 0
AND c.published = 1
AND c.access IN(1,1)
AND c.trash = 0
AND (i.publish_up = '0000-00-00 00:00:00'
OR i.publish_up <= '2013-06-12 22:45:19'
)
AND (i.publish_down = '0000-00-00 00:00:00'
OR i.publish_down >= '2013-06-12 22:45:19'
)
AND extra_fields REGEXP BINARY '(.*{"id":"2","value":\["[^\"]*1[^\"]*","[^\"]*2[^\"]*","[^\"]*3[^\"]*"\]}.*)'
ORDER BY i.id DESC
extra_fields is a column of the #__k2_items table. The table qualifier can be omitted because it is not ambiguous in this query. The column is JSON encoded; that is a serialization format used to store information which is not searchable by design. Applying a RegExp may work one day but fail another day, since there is no guarantee of id preceding value (as in your example).
The right way
The right way to filter this is to drop the extra_fields condition from the SQL query and evaluate it on the result set instead. Example:
$rows = $db->loadObjectList('id');
foreach ($rows as $id => $row) {
    // decode the JSON-encoded extra_fields column and filter in PHP
    $extra_fields = json_decode($row->extra_fields);
    if ($extra_fields->id != 2) {
        unset($rows[$id]);
    }
}
The short way
If you can't change the database layout (which is true for extensions you want to keep updateable), you must split the condition in two, because there is no guarantee of a certain order of the subfields; some day, value may occur before id. So change your query to
...
AND extra_fields LIKE '%"id":"2"%'
AND extra_fields REGEXP BINARY '"value":\[("[^\"]*[123][^\"]*",?)+\]'
Alternatively, prepare an intermediate table to hold the contents of extra_fields, with each extra_fields value converted into a series of records, and then do a join (see the sketch below).
Create a trigger and a cron job to keep the intermediate table in sync.
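A minimal sketch of that approach (the table and column names here are assumptions, not part of K2):
CREATE TABLE k2_extra_fields_index (
  item_id  INT          NOT NULL,
  field_id INT          NOT NULL,
  value    VARCHAR(255) NOT NULL,
  KEY idx_field_value (field_id, value)
);
-- populated from extra_fields by the sync trigger/cron job; then filter with a join
-- instead of a REGEXP over the JSON blob:
SELECT i.*
FROM #__k2_items AS i
JOIN k2_extra_fields_index AS x ON x.item_id = i.id
WHERE x.field_id = 2
  AND x.value IN ('1', '2', '3');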
Another way is to write a UDF in Perl that decodes the field, but AFAIK that is not indexable in MySQL.
Using an external search engine is out of scope here.
OK, I didn't want to change the DB structure; I got some help and changed the regex to
AND extra_fields REGEXP BINARY '(.*{"id":"2","value":\[("[^\"]*[123][^\"]*",?)+\]}.*)'
and I got the right results.
Thanks

Getting table (records) to update properly using the MERGE statement

Good morning everyone!
Below is a piece of code I stitched together: I used a CTE to grab the records (data) from a linked table and then convert strings to dates, then used a MERGE statement to get the data into a local table.
I am having a problem with the column (field) LAST_RACE_DATE. This field is nullable and not required, but it does not update with my current setup. What I am trying to accomplish is for this field to populate when data is entered, but also to update - meaning it should also update to NULL.
So if the field has a specific date and a new date is entered in the remote database, this field should update as well; even if the data is deleted in the back end, it should also clear the local table's data for this field.
WITH CTE AS(
SELECT MEMBER_ID
,[MEMBER_DATE] = MAX(CONVERT(DATE, MEMBER_DATE))
,RACE_DATE = MAX(CONVERT(DATE, RACE_DATE))
,LAST_RACE_DATE = MAX(CONVERT(DATE, LAST_RACE_DATE))
FROM [EXAMPLE].[dbo].[LINKED_MEMBER_DATA]
WHERE (MEMBER_DATE IS NOT NULL) AND (ISDATE(MEMBER_DATE)<> 0) AND (RACE_DATE IS NOT NULL) AND (ISDATE(RACE_DATE)<> 0)
AND (LAST_RACE_DATE IS NULL) OR (ISDATE(LAST_RACE_DATE)<> 0)
GROUP BY MEMBER_ID)
MERGE dbo.LINKED_MEMBER_DATA AS Target
USING (SELECT
MEMBER_ID, MEMBER_DATE, RACE_DATE, LAST_RACE_DATE
FROM CTE
GROUP BY MEMBER_ID, RACE_DATE, LAST_RACE_DATE)AS SOURCE ON (Target.MEMBER_ID = SOURCE.MEMBER_ID)
WHEN MATCHED AND
(Target.MEMBER_DATE) <> (SOURCE.MEMBER_DATE)
OR (Target.RACE_DATE) <> (SOURCE.RACE_DATE)
OR ISNULL(TARGET.LAST_RACE_DATE , Target.LAST_RACE_DATE) <> ISNULL(SOURCE.LAST_RACE_DATE, SOURCE.LAST_RACE_DATE)
THEN UPDATE SET
Target.MEMBER_DATE = SOURCE.MEMBER_DATE
,Target.RACE_DATE = SOURCE.RACE_DATE
,Target.LAST_RACE_DATE = SOURCE.LAST_RACE_DATE
WHEN NOT MATCHED BY TARGET THEN
INSERT(
MEMBER_ID, MEMBER_DATE, RACE_DATE, LAST_RACE_DATE)
VALUES (Source.MEMBER_ID, Source.MEMBER_DATE, Source.RACE_DATE, Source.LAST_RACE_DATE);
I also tried this:
ISNULL(Target.LAST_RACE_DATE,'N/A') <> ISNULL(SOURCE.LAST_RACE_DATE,'N/A')
But it generates the below error for dates conversion:
Conversion failed when converting date and/or time from character string.
Thanks a Million!!
Your current statement is failing because the ISNULLs that you have don't do anything: ISNULL(Target.LAST_RACE_DATE, Target.LAST_RACE_DATE) simply returns the column itself, and a comparison where one side is NULL never evaluates to true. Your second attempt doesn't work because ISNULL casts the replacement value to the data type of the first argument, and 'N/A' cannot be converted to a date; you could instead try e.g. ISNULL(Target.LAST_RACE_DATE, '1970-01-01') <> ISNULL(Source.LAST_RACE_DATE, '1970-01-01').
Another option is to simply enumerate the different cases, e.g. ((Source.LAST_RACE_DATE IS NULL AND Target.LAST_RACE_DATE IS NOT NULL) OR (Source.LAST_RACE_DATE IS NOT NULL AND Target.LAST_RACE_DATE IS NULL) OR (Source.LAST_RACE_DATE <> Target.LAST_RACE_DATE)). Enumerating the different situations makes the code a bit more verbose, but it can result in better performance (whether it is measurably better really depends on how much data you are processing).
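If your SQL Server version accepts subqueries in the WHEN clause, a more compact NULL-safe idiom is the EXISTS/EXCEPT trick (a common alternative, not part of the answer above); EXCEPT treats NULLs as equal, so no per-column NULL handling is needed:
WHEN MATCHED AND EXISTS (
    SELECT Source.MEMBER_DATE, Source.RACE_DATE, Source.LAST_RACE_DATE
    EXCEPT
    SELECT Target.MEMBER_DATE, Target.RACE_DATE, Target.LAST_RACE_DATE
)
THEN UPDATE SET ...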

What is the best way to run N independent column updates in PostgreSQL? What is the best way to do it in the SQL spec?

I'm looking for a more efficient way to run many column updates on the same table, like this:
UPDATE my_table
SET col = regexp_replace( col, 'foo', 'bar' )
WHERE col ~ 'foo';
Such that foo and bar will be a combination of 40 different regex replaces. I doubt even 25% of the dataset needs to be updated at all, but what I want to know is whether it is possible to cleanly achieve the following in SQL:
A single pass update
A single match of the regex, triggers a single replace
Not running all possible regexp_replaces if only one matches
Not updating all columns if only one needs the update
Not updating a row if no column has changed
I'm also curious: I know that in MySQL (bear with me)
UPDATE foo SET bar = 'baz'
has an implicit WHERE bar != 'baz' clause, i.e. rows whose value would not change are not written.
However, I know this doesn't exist in PostgreSQL, and I think I could at least answer one of my questions if I knew how to skip a single row's update when the target columns weren't changed.
Something like
UPDATE my_table
SET col = *temp_var* = regexp_replace( col, 'foo', 'bar' )
WHERE col != *temp_var*
Do it in code. Open up a cursor, then: grab a row, run it through the 40 regular expressions, and if it changed, save it back. Repeat until the cursor doesn't give you any more rows.
Whether you do it that way or come up with the magical SQL expression, it's still going to be a row scan of the entire table, but the code will be much simpler.
Experimental Results
In response to criticism, I ran an experiment. I inserted 10,000 lines from a documentation file into a table with a serial primary key and a varchar column. Then I tested two ways to do the update. Method 1:
in a transaction:
    open up a cursor (select for update)
    while reading 100 rows from the cursor returns any rows:
        for each row:
            for each regular expression:
                do the gsub on the text column
            update the row
This takes 1.16 seconds with a locally connected database.
Then the "big replace," a single mega-regex update:
update foo set t =
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(t,
E'\bcommit\b', E'COMMIT'),
E'\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b',
E'9ACF10762B5F3D3B1B33EA07792A936A25E45010'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b<cworth@cworth.org>\b',
E'<CWORTH@CWORTH.ORG>'), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:53:13\b', E'04:53:13'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bUpdate\b', E'UPDATE'),
E'\bversion\b', E'VERSION'),
E'\bto\b', E'TO'), E'\b2.9.1\b',
E'2.9.1'), E'\bcommit\b', E'COMMIT'),
E'\b61c89e56f361fa860f18985137d6bf53f48c16ac\b',
E'61C89E56F361FA860F18985137D6BF53F48C16AC'),
E'\bAuthor:\b', E'AUTHOR:'),
E'\bCarl\b', E'CARL'), E'\bWorth\b',
E'WORTH'), E'\b<cworth@cworth.org>\b',
E'<CWORTH@CWORTH.ORG>'), E'\bDate:\b',
E'DATE:'), E'\bMon\b', E'MON'),
E'\bOct\b', E'OCT'), E'\b26\b',
E'26'), E'\b04:51:58\b', E'04:51:58'),
E'\b2009\b', E'2009'), E'\b-0700\b',
E'-0700'), E'\bNEWS:\b', E'NEWS:'),
E'\bAdd\b', E'ADD'), E'\bnotes\b',
E'NOTES'), E'\bfor\b', E'FOR'),
E'\bthe\b', E'THE'), E'\b2.9.1\b',
E'2.9.1'), E'\brelease.\b',
E'RELEASE.'), E'\bThanks\b',
E'THANKS'), E'\bto\b', E'TO'),
E'\beveryone\b', E'EVERYONE'),
E'\bfor\b', E'FOR')
The mega-regex update takes 0.94 seconds.
At 0.94 seconds compared to 1.16, it's true that the mega-regex update is faster, running in 81% of the time of doing it in code. It is not, however, a lot faster. And ye Gods, look at that update statement. Do you want to write that, or try to figure out what went wrong when Postgres complains that you dropped a parenthesis somewhere?
Code
The code used was:
def stupid_regex_replace
  sql = Select.new
  sql.select('id')
  sql.select('t')
  sql.for_update
  sql.from(TABLE_NAME)
  Cursor.new('foo', sql, {}, @db) do |cursor|
    until (rows = cursor.fetch(100)).empty?
      for row in rows
        # run every regex over the text column
        for regex, replacement in regexes
          row['t'] = row['t'].gsub(regex, replacement)
        end
        # write the (possibly) modified row back
        sql = Update.new(TABLE_NAME, @db)
        sql.set('t', row['t'])
        sql.where(['id = %s', row['id']])
        sql.exec
      end
    end
  end
end
I generated the regular expressions dynamically by taking words from the file; for each word "foo", its regular expression was "\bfoo\b" and its replacement string was "FOO" (the word uppercased). I used words from the file to make sure that replacements did happen. I made the test program print the regexes so you can see them. Each pair is a regex and the corresponding replacement string:
[[/\bcommit\b/, "COMMIT"],
[/\b9acf10762b5f3d3b1b33ea07792a936a25e45010\b/,
"9ACF10762B5F3D3B1B33EA07792A936A25E45010"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth@cworth.org>\b/, "<CWORTH@CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:53:13\b/, "04:53:13"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bUpdate\b/, "UPDATE"],
[/\bversion\b/, "VERSION"],
[/\bto\b/, "TO"],
[/\b2.9.1\b/, "2.9.1"],
[/\bcommit\b/, "COMMIT"],
[/\b61c89e56f361fa860f18985137d6bf53f48c16ac\b/,
"61C89E56F361FA860F18985137D6BF53F48C16AC"],
[/\bAuthor:\b/, "AUTHOR:"],
[/\bCarl\b/, "CARL"],
[/\bWorth\b/, "WORTH"],
[/\b<cworth@cworth.org>\b/, "<CWORTH@CWORTH.ORG>"],
[/\bDate:\b/, "DATE:"],
[/\bMon\b/, "MON"],
[/\bOct\b/, "OCT"],
[/\b26\b/, "26"],
[/\b04:51:58\b/, "04:51:58"],
[/\b2009\b/, "2009"],
[/\b-0700\b/, "-0700"],
[/\bNEWS:\b/, "NEWS:"],
[/\bAdd\b/, "ADD"],
[/\bnotes\b/, "NOTES"],
[/\bfor\b/, "FOR"],
[/\bthe\b/, "THE"],
[/\b2.9.1\b/, "2.9.1"],
[/\brelease.\b/, "RELEASE."],
[/\bThanks\b/, "THANKS"],
[/\bto\b/, "TO"],
[/\beveryone\b/, "EVERYONE"],
[/\bfor\b/, "FOR"]]
If this were a hand-generated list of regexes, and not automatically generated, my question is still appropriate: which would you rather have to create or maintain?
For the skip-update part, look at the suppress_redundant_updates_trigger() function - see http://www.postgresql.org/docs/8.4/static/functions-trigger.html.
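A sketch of its usage, assuming a table named my_table (the z_ prefix makes the trigger sort last among BEFORE triggers, as the docs suggest; it silently skips updates that would not change the row):
CREATE TRIGGER z_suppress_redundant
BEFORE UPDATE ON my_table
FOR EACH ROW
EXECUTE PROCEDURE suppress_redundant_updates_trigger();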
This is not necessarily a win - but it might well be in your case.
Or perhaps you can just add that implicit check as an explicit one?
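For example, with the same hypothetical table as in the question:
UPDATE my_table
SET col = regexp_replace(col, 'foo', 'bar')
WHERE col IS DISTINCT FROM regexp_replace(col, 'foo', 'bar');
This skips rows the replacement would leave unchanged, at the price of evaluating the regex twice for rows that do change.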