I am new to Snowflake and exploring new things every day. I am stuck on the scenario below:
SELECT
'{COL}' AS field_name,
a.{COL} AS old_value,
b.{COL} AS new_value
FROM A JOIN B ON (...)
WHERE a.{COL} != b.{COL}
I want to parameterize COL, which may have multiple values, e.g. COL=col1,col2,col3. I then want three queries combined with UNION ALL, as shown below:
SELECT
'col1' AS field_name,
a.col1 AS old_value,
b.col1 AS new_value FROM A JOIN B ON(...)
WHERE a.col1 != b.col1
UNION ALL
SELECT
'col2' AS field_name,
a.col2 AS old_value,
b.col2 AS new_value FROM A JOIN B ON(...)
WHERE a.col2 != b.col2
UNION ALL
SELECT
'col3' AS field_name,
a.col3 AS old_value,
b.col3 AS new_value FROM A JOIN B ON(...)
WHERE a.col3 != b.col3
Is there any way to achieve this in SnowSQL (Snowflake)?
It's certainly possible to create dynamic SQL in Snowflake. The most common way is to use stored procedures written in JavaScript, which generate the SQL and then execute it (JavaScript UDFs can help build strings, but cannot execute SQL themselves).
In the JavaScript you can use string replacement, loops, and so on to build lists of parameters, join conditions, etc.
The general documentation is linked at the end of this answer. Here are a couple of useful JavaScript snippets:
// Set up a multi-column join condition based on the columns in ColumnList
// (ColumnList is an array of column names, e.g. ["column1", "column2"])
// Example output: "a.column1 = b.column1 AND a.column2 = b.column2"
var conditionArray = [];
ColumnList.forEach(function (column) {
    conditionArray.push("a." + column + " = b." + column);
});
var joinCondition = conditionArray.join(" AND ");
// Executing a statement and returning the output from a ResultSet
// You can chain these calls for cleaner/shorter code
var SQLQuery = "SELECT 1;";
var SQLStatement = snowflake.createStatement({sqlText: SQLQuery});
var SQLResultSet = SQLStatement.execute();
SQLResultSet.next();
SQLResultSet.getColumnValue(1);
You can then use loops to execute several similar queries, or combine them with UNION ALL into a single statement and execute that.
https://docs.snowflake.net/manuals/sql-reference/stored-procedures-overview.html
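As a minimal sketch of the string-building step for the question above, the following JavaScript expands the question's template once per column and joins the pieces with UNION ALL. The function name `buildColumnDiffQuery`, the identifier check, and the join condition `a.id = b.id` are hypothetical placeholders; inside a Snowflake JavaScript stored procedure you would pass the resulting text to `snowflake.createStatement({sqlText: ...})` and call `execute()` on it.

```javascript
// Hypothetical helper: expand the per-column template from the question
// once for each column name and combine the SELECTs with UNION ALL.
function buildColumnDiffQuery(columns, joinCondition) {
  // Column names are spliced into SQL text, so accept plain identifiers only
  // (an assumption; adjust the pattern to your own naming rules).
  const identifier = /^[A-Za-z_][A-Za-z0-9_$]*$/;
  const selects = columns.map(function (col) {
    if (!identifier.test(col)) {
      throw new Error("Unexpected column name: " + col);
    }
    return [
      "SELECT",
      "'" + col + "' AS field_name,",
      "a." + col + " AS old_value,",
      "b." + col + " AS new_value",
      "FROM A JOIN B ON (" + joinCondition + ")",
      "WHERE a." + col + " != b." + col,
    ].join("\n");
  });
  return selects.join("\nUNION ALL\n");
}

// Usage: the column list could arrive as a procedure argument such as "col1,col2,col3".
// The join condition here is a made-up example.
const sql = buildColumnDiffQuery("col1,col2,col3".split(","), "a.id = b.id");
console.log(sql);
```

If the columns have different data types, the old_value/new_value columns of the UNION ALL branches may need to be cast to a common type such as VARCHAR.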
Related
I have the query below for negative testing, but I want to replace the UNION ALL if possible.
select A.*
from A
join B
on A.COL1=B.COL1
where B.COL3 is null
union all
select A.*
from A
join B
on A.COL2=B.COL4
where B.COL5 is null;
I need to get the data from both queries without using UNION ALL.
You could combine the two queries into a single join and fold the WHERE conditions into the join condition:
select A.*
from A
join B on (A.COL1 = B.COL1 and B.COL3 is null) or
(A.COL2 = B.COL4 and B.COL5 is null)
Since you only need data from table A, you don't need the join to table B at all and can rewrite this with EXISTS:
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1
FROM B
WHERE A.COL1=B.COL1 and B.COL3 is null)
OR EXISTS (SELECT 1
FROM B
WHERE A.COL2=B.COL4 and B.COL5 is null)
But this likely has two issues:
1) If you look at the execution plans for both, I'm pretty sure you'll find that the UNION ALL is more efficient, because it operates at the set level instead of the row level, and the OR needed here is slower.
2) This returns one record from A where the UNION ALL returns two. Had the original been a UNION, this should return the same results while avoiding the union; but simply put, with UNION ALL you want the same data from A twice (or more, depending on the cardinality of the joins).
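To make the row-count difference concrete, here is a small in-memory simulation in JavaScript. Tables A and B are hypothetical arrays of objects, and `sqlEq` mimics SQL's rule that a comparison involving NULL is never true. With one B row that satisfies both branches, the UNION ALL returns the A row twice, while the OR join and the EXISTS version return it once.

```javascript
// SQL equality: NULL compared with anything (even NULL) is not true.
function sqlEq(x, y) {
  return x !== null && y !== null && x === y;
}

// Hypothetical sample data: one A row, one B row that satisfies both branches.
const A = [{ id: 1, COL1: "x", COL2: "y" }];
const B = [{ COL1: "x", COL3: null, COL4: "y", COL5: null }];

const branch1 = (a, b) => sqlEq(a.COL1, b.COL1) && b.COL3 === null;
const branch2 = (a, b) => sqlEq(a.COL2, b.COL4) && b.COL5 === null;

// Inner join of A and B on an arbitrary predicate, returning the A side (A.*).
function joinA(pred) {
  const out = [];
  for (const a of A) {
    for (const b of B) {
      if (pred(a, b)) out.push(a);
    }
  }
  return out;
}

const unionAll = joinA(branch1).concat(joinA(branch2));
const orJoin = joinA((a, b) => branch1(a, b) || branch2(a, b));
const existsVersion = A.filter(
  (a) => B.some((b) => branch1(a, b)) || B.some((b) => branch2(a, b))
);

console.log(unionAll.length, orJoin.length, existsVersion.length); // 2 1 1
```

If the two matches came from two different B rows instead, the OR join would also return two rows; only the EXISTS version collapses them to one.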
SELECT A.*
FROM A
JOIN B ON (A.COL1 = B.COL1 AND B.COL3 IS NULL)
OR (A.COL2 = B.COL4 AND B.COL5 IS NULL);
I want to update the fields of a table WHERE the combination of three other attributes is IN another table. I am having some difficulties with the syntax, so any help is appreciated.
You would normally use EXISTS for this:
SELECT *
FROM a
WHERE EXISTS (
SELECT 1
FROM b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 = b.col3
)
To turn this into an UPDATE, keep the same EXISTS predicate in the UPDATE statement's WHERE clause.
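A minimal sketch of that conversion, written as a JavaScript function that generates the statement from a list of key columns. The function name `buildExistsUpdate` and the SET clause `status = 'matched'` are hypothetical; the table names a and b and the three key columns follow the answer above.

```javascript
// Hypothetical helper: build "UPDATE a ... WHERE EXISTS (...)" that matches
// rows of a against b on every listed key column.
function buildExistsUpdate(setClause, keyColumns) {
  const match = keyColumns
    .map((col) => "a." + col + " = b." + col)
    .join(" AND ");
  return (
    "UPDATE a\n" +
    "SET " + setClause + "\n" +
    "WHERE EXISTS (\n" +
    "  SELECT 1\n" +
    "  FROM b\n" +
    "  WHERE " + match + "\n" +
    ")"
  );
}

// Usage with the three key columns from the answer; the SET clause is made up.
const updateSql = buildExistsUpdate("status = 'matched'", ["col1", "col2", "col3"]);
console.log(updateSql);
```

The correlated subquery refers to the target table as a, so only rows of a whose (col1, col2, col3) combination exists in b are updated.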
I would like to find records that differ from each other across datasets in the same table that were loaded on different dates.
That is, I want the records where one or more attributes (apart from the key) differ between dataset x, loaded on 1-1-2018, and dataset y, loaded on 31-12-2018.
How do I achieve this in SQL?
The key for the comparison is ZIP_CODE + House_ID.
Greets,
You can get the previous zip code with LAG:
SELECT ZipCode, HouseId,
LAG(ZipCode, 1, 0) OVER (ORDER BY LoadDate) AS ZipCodeMinus1,
LAG(HouseId, 1, 0) OVER (ORDER BY LoadDate) AS HouseIdMinus1
FROM Addresses;
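For readers unfamiliar with LAG, the following JavaScript mimics what LAG(column, 1, 0) OVER (ORDER BY LoadDate) returns: each row gets the value from the previous row in load-date order, and the first row gets the default 0. The helper name and sample rows are hypothetical. Note that without a PARTITION BY, the "previous" row may belong to a different house.

```javascript
// Mimic LAG(column, offset, defaultValue) OVER (ORDER BY orderKey).
function lag(rows, column, offset, defaultValue, orderKey) {
  // Sort a copy of the rows by the ORDER BY key.
  const sorted = rows.slice().sort((r1, r2) =>
    r1[orderKey] < r2[orderKey] ? -1 : r1[orderKey] > r2[orderKey] ? 1 : 0
  );
  // Each row gets the value from `offset` rows earlier, or the default.
  return sorted.map((row, i) => ({
    ...row,
    [column + "Minus" + offset]:
      i - offset >= 0 ? sorted[i - offset][column] : defaultValue,
  }));
}

// Hypothetical rows; ISO date strings sort correctly as text.
const addresses = [
  { ZipCode: "1234AB", HouseId: 7, LoadDate: "2018-12-31" },
  { ZipCode: "1234AB", HouseId: 5, LoadDate: "2018-01-01" },
];
const result = lag(addresses, "ZipCode", 1, 0, "LoadDate");
console.log(result);
```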
A simple way to compare sets is:
select ... a
EXCEPT
select ... b
but you also need the reverse direction:
select ... b
EXCEPT
select ... a
and this still doesn't tell you which columns are different.
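The two-direction EXCEPT check can be sketched in JavaScript using sets of serialized rows as a stand-in for EXCEPT's duplicate-eliminating set semantics. The datasets are hypothetical; the result shows that the row with key (1234AB, 7) changed, but not which column changed.

```javascript
// EXCEPT: distinct rows of left that do not appear in right.
// Rows are serialized to JSON strings so they can be compared as set members
// (this assumes both sides list their properties in the same order).
function except(left, right) {
  const rightKeys = new Set(right.map((r) => JSON.stringify(r)));
  const result = new Set();
  for (const row of left) {
    const key = JSON.stringify(row);
    if (!rightKeys.has(key)) result.add(key);
  }
  return [...result].map((key) => JSON.parse(key));
}

// Hypothetical datasets x (loaded 1-1-2018) and y (loaded 31-12-2018).
const x = [
  { ZIP_CODE: "1234AB", House_ID: 7, street: "Main St" },
  { ZIP_CODE: "5678CD", House_ID: 1, street: "Elm St" },
];
const y = [
  { ZIP_CODE: "1234AB", House_ID: 7, street: "Main Street" },
  { ZIP_CODE: "5678CD", House_ID: 1, street: "Elm St" },
];

const onlyInX = except(x, y); // old version of the changed row
const onlyInY = except(y, x); // new version of the changed row
console.log(onlyInX, onlyInY);
```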
Alternatively, use a full outer join:
select
coalesce(a.ZIP_CODE, b.ZIP_CODE)
,coalesce(a.House_ID, b.House_ID)
,case when a.col1 <> b.col1 then 'a:' || a.col1 || ' b:' || b.col1 end
...
from
( select ....) as a
full join
( select ....) as b
on a.ZIP_CODE = b.ZIP_CODE
and a.House_ID = b.House_ID
where a.ZIP_CODE is null -- row exists only in b
or b.ZIP_CODE is null -- row exists only in a
or a.col1 <> b.col1
or a.col2 <> b.col2
or a.col3 <> b.col3
or ...
If the columns are nullable, you need additional conditions that check whether exactly one of the two columns is NULL. Of course, this comparison syntax can be generated automatically from the existing metadata.
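As a sketch of generating that comparison from metadata, the JavaScript below turns a list of column names into a NULL-aware "differs" predicate, including the extra NULL checks. The function names are hypothetical, and the column list is hard-coded here; in practice it could be read from INFORMATION_SCHEMA.COLUMNS, excluding the key columns.

```javascript
// Build a NULL-aware "a.col differs from b.col" test: NULL vs non-NULL counts
// as a difference, NULL vs NULL does not.
function differs(col) {
  const a = "a." + col;
  const b = "b." + col;
  return (
    "(" + a + " <> " + b +
    " OR (" + a + " IS NULL AND " + b + " IS NOT NULL)" +
    " OR (" + a + " IS NOT NULL AND " + b + " IS NULL))"
  );
}

// Combine the per-column tests: a row qualifies if any listed column differs.
function buildDiffPredicate(columns) {
  return columns.map(differs).join("\n OR ");
}

// Hypothetical column list standing in for the table's metadata.
const predicate = buildDiffPredicate(["col1", "col2", "col3"]);
console.log(predicate);
```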
So I have some stored procedures I inherited that I am trying to clean up. One pattern I see over and over in them is the following:
Update Table_A
Set A.ColX = B.Colx
From Table_A A
Join Table_B B on B.col1 =A.col1
and B.col2 = A.col2
Update Table_A
Set A.ColX = B.Colx
From Table_A A
Join Table_B B on a.col1 =b.col1
and B.col2 is null
Now, I have tried to combine these into a single query using each of the following final lines (not at the same time!):
1) and (B.col2 = A.col2 or B.col2 is null)
2) and (isnull(B.col2,'') = COALESCE(a.col2, ''))
However, it always seems to perform one of the updates, not both. I feel like I am missing something rather obvious. Is there a good way to combine these two queries?
Thanks
This query should work:
Update Table_A
Set A.ColX = B.Colx
From Table_A A
Join Table_B B on B.col1 = A.col1
and (B.col2 = A.col2 OR B.col2 is null)
This is what you said you tried, but you might run it as a SELECT first and look at the results. That may shed some light on why you're not getting the results you expect.
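One possible explanation that such a SELECT can reveal is sketched below as an in-memory JavaScript simulation with hypothetical rows. When a Table_A row matches both a Table_B row with an equal col2 and a Table_B row with a NULL col2, the combined join yields two candidate values for the same target row. SQL Server documents that an UPDATE ... FROM whose join supplies more than one value per updated row has undefined results, so only one of them is applied; with the two separate UPDATEs, the second one simply overwrote the first.

```javascript
// Hypothetical rows for Table_A and Table_B.
const tableA = [{ col1: 1, col2: "p", ColX: null }];
const tableB = [
  { col1: 1, col2: "p", Colx: "from exact match" },
  { col1: 1, col2: null, Colx: "from NULL match" },
];

// Equivalent of:
//   SELECT A.col1, A.col2, B.Colx FROM Table_A A JOIN Table_B B
//   ON B.col1 = A.col1 AND (B.col2 = A.col2 OR B.col2 IS NULL)
function previewJoin(as, bs) {
  const rows = [];
  for (const a of as) {
    for (const b of bs) {
      // SQL equality is never true when either side is NULL.
      const col1Match = a.col1 !== null && a.col1 === b.col1;
      const col2Match =
        (b.col2 !== null && a.col2 !== null && b.col2 === a.col2) ||
        b.col2 === null;
      if (col1Match && col2Match) {
        rows.push({ col1: a.col1, col2: a.col2, newColX: b.Colx });
      }
    }
  }
  return rows;
}

const preview = previewJoin(tableA, tableB);
console.log(preview); // two candidate values for the single Table_A row
```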
I would expect the following query to work in SQL Server:
Update A
Set ColX = B.Colx
From Table_A A Join
Table_B B
on a.col1 = b.col1 and
(B.col2 = A.col2 or B.col2 is null);
Notes:
After UPDATE, you should use the alias defined in the FROM clause. My understanding is that if you use the table name, and the table does not also appear in the FROM clause without an alias, then all rows will be updated.
Although I was pretty sure that SQL Server does not support table aliases in the SET clause, I appear to be wrong about that, as this simple SQL Fiddle shows. Perhaps this was not allowed in some ancient version of SQL Server, and the limitation just stuck with me.
I have been searching for a while on a particular problem, but I can't find this exact question.
I have a rather unusual task to achieve in SQL:
I have two tables, say A and B, which have exactly the same column names, of the following form:
id | column_1 | ... | column_n
Both tables have the same number of rows, with the same ids, but for a given id there is a chance that the rows from tables A and B differ in one or more of the other columns.
I already have a query which returns all rows from table A for which the corresponding row in table B is not identical, but what I need is a query which returns something of the form:
id | differing_column
----------------------
1 | column_1
3 | column_6
meaning that the row with id '1' has different 'column_1' values in tables A and B, and the row with id '3' has different 'column_6' values in tables A and B.
Is this at all achievable? I imagine it might require some sort of pivot in order to get the column names as values, but I might be wrong. Any help/suggestions much appreciated.
Yes, you can do that with a query like this:
WITH Diffs (Id, Col) AS (
SELECT
a.Id,
CASE
WHEN a.Col1 <> b.Col1 THEN 'Col1'
WHEN a.Col2 <> b.Col2 THEN 'Col2'
-- ...and so on
ELSE NULL
END as Col
FROM TableOne a
JOIN TableTwo b ON a.Id=b.Id
)
SELECT Id, Col
FROM Diffs
WHERE Col IS NOT NULL
Note that the query above does not return all the columns with differences, only the first one it finds for each row.
You can do this with an unpivot, assuming that the values in the columns are of the same type.
If your data is not too big, I would just recommend using a bunch of union all statements instead:
select a.id, 'Col1' as differing_column
from a join b on a.id = b.id
where a.col1 <> b.col1 or (a.col1 is null and b.col1 is not null) or (a.col1 is not null and b.col1 is null)
union all
select a.id, 'Col2' as differing_column
from a join b on a.id = b.id
where a.col2 <> b.col2 or (a.col2 is null and b.col2 is not null) or (a.col2 is not null and b.col2 is null)
. . .
This avoids potential type-conversion problems.
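Since the per-column branches differ only in the column name, the `. . .` can be generated rather than typed out. A sketch in JavaScript; the function names and column list are hypothetical, and the output alias `differing_column` is chosen to match the output the question asks for.

```javascript
// NULL-aware "differs" test for one column, following the answer's condition.
function differs(col) {
  return (
    "a." + col + " <> b." + col +
    " or (a." + col + " is null and b." + col + " is not null)" +
    " or (a." + col + " is not null and b." + col + " is null)"
  );
}

// One SELECT per column, combined with UNION ALL.
function buildDiffColumnsQuery(columns) {
  return columns
    .map(
      (col) =>
        "select a.id, '" + col + "' as differing_column\n" +
        "from a join b on a.id = b.id\n" +
        "where " + differs(col)
    )
    .join("\nunion all\n");
}

// Hypothetical column list (column_1 ... column_n in the question).
const diffSql = buildDiffColumnsQuery(["column_1", "column_2", "column_3"]);
console.log(diffSql);
```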
If you don't mind having the results on one row, you can do:
select a.id,
(case when a.col1 <> b.col1 or a.col1 is null and b.col1 is not null or a.col1 is not null and b.col1 is null
then 'Col1;'
else ''
end) +
(case when a.col2 <> b.col2 or a.col2 is null and b.col2 is not null or a.col2 is not null and b.col2 is null
then 'Col2;'
else ''
end) +
. . .
from a join b on a.id = b.id;
If your columns are of the same type, there is a slick method:
SELECT id,col
FROM (SELECT * FROM A UNION ALL SELECT * FROM B) t1
UNPIVOT (value for col in (column_1,column_2,column_3,column_4)) t2
GROUP BY id,col
HAVING COUNT(DISTINCT value) > 1
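If you need to treat NULL as a distinct value, note that UNPIVOT drops NULL values before the grouping, so wrapping value in ISNULL inside the HAVING clause never sees them. Instead, replace SELECT * in the derived table with ISNULL(column_n, X) for each column, with X being a value that doesn't occur in your data, or (assuming id is unique in each table) extend the condition to HAVING COUNT(DISTINCT value) > 1 OR COUNT(*) = 1.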
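The logic of the UNPIVOT method can be simulated in JavaScript to check the NULL handling. The data and the helper name are hypothetical; the simulation follows UNPIVOT in dropping NULL values, so a NULL on one side leaves a single value in that (id, col) group, which COUNT(DISTINCT value) > 1 misses and COUNT(*) = 1 catches.

```javascript
// Simulate:
//   SELECT id, col FROM (SELECT * FROM A UNION ALL SELECT * FROM B) t1
//   UNPIVOT (value FOR col IN (...)) t2
//   GROUP BY id, col HAVING COUNT(DISTINCT value) > 1 [OR COUNT(*) = 1]
function unpivotDiff(tableA, tableB, columns, includeOneSided) {
  const groups = new Map(); // "id|col" -> values that survived UNPIVOT
  for (const row of tableA.concat(tableB)) {
    for (const col of columns) {
      if (row[col] === null) continue; // UNPIVOT drops NULL values
      const key = row.id + "|" + col;
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(row[col]);
    }
  }
  const result = [];
  for (const [key, values] of groups) {
    const distinct = new Set(values).size;
    if (distinct > 1 || (includeOneSided && values.length === 1)) {
      result.push(key);
    }
  }
  return result;
}

// Hypothetical data: id 1 differs in column_1, and column_2 is NULL only in A.
const dataA = [
  { id: 1, column_1: 10, column_2: null },
  { id: 3, column_1: 5, column_2: 6 },
];
const dataB = [
  { id: 1, column_1: 11, column_2: 4 },
  { id: 3, column_1: 5, column_2: 6 },
];
const cols = ["column_1", "column_2"];
console.log(unpivotDiff(dataA, dataB, cols, false)); // [ '1|column_1' ]
console.log(unpivotDiff(dataA, dataB, cols, true)); // [ '1|column_1', '1|column_2' ]
```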