conditional column mapping in ssis - sql

I am a newbie on SSIS. I have a SOURCE table with columns s.CASH , s.ACC_ID and s.ADDITIONAL_NUM and a TARGET table with column t.ACCT_NUM in my SSIS package. Here is the mapping logic -
If s.CASH > 0 , map s.ACC_ID to t.ACCT_NUM
else map s.ADDITIONAL_NUM to t.ACCT_NUM.
If s.ADDITIONAL_NUM is empty, then t.ACCT_NUM = null
How can I implement this in SSIS?

@billinkc - Thanks for your suggestion. I chose to create a single Derived Column and applied the following condition:
(CASH > 0) ? [ACC_ID] : ([ADDITIONAL_NUM] == "" ? (DT_WSTR,255)NULL(DT_WSTR,255) : [ADDITIONAL_NUM])
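For readers who want to check the branching outside SSIS, here is a minimal Python sketch of the same mapping rule. The function name `map_acct_num` is hypothetical, and treating a missing or whitespace-only value as empty is an assumption that goes slightly beyond the `== ""` test in the expression.

```python
from typing import Optional


def map_acct_num(cash: float, acc_id: str, additional_num: Optional[str]) -> Optional[str]:
    """Mirror the derived-column rule: CASH > 0 wins, otherwise use ADDITIONAL_NUM."""
    if cash > 0:
        return acc_id
    # Assumption: None and whitespace-only values count as "empty".
    if additional_num is None or additional_num.strip() == "":
        return None
    return additional_num


print(map_acct_num(10, "A1", "X9"))  # CASH > 0, so ACC_ID is used
print(map_acct_num(0, "A1", "X9"))   # CASH not > 0, so ADDITIONAL_NUM is used
print(map_acct_num(0, "A1", ""))     # empty ADDITIONAL_NUM maps to None (NULL)
```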

Related

How to store a value from source into a parameter and use it in data flow transformations?

I have a source table which just has one row:
So I stored the value from Values_per_Country in a parameter:
I want to use this parameter in my Select transformation (schema modifier),
but this error comes up:
Is there a way around this, so I can use values from the source table?
You can create a Lookup activity to get the column value from the source table, and then pass it to the parameter in the Data Flow. After that, your expression type == 'double' && position > 0 && position <= $parameter3 will work.
Expression used for the parameter (shown in the original screenshot): @activity('Lookup1').output.firstRow['Values_per_Country']

how to delete data from a delta file in databricks?

I want to delete data from a delta file in databricks.
I'm using these commands, for example:
PR=spark.read.format('delta').options(header=True).load('/mnt/landing/Base_Tables/EventHistory/')
PR.write.format("delta").mode('overwrite').saveAsTable('PR')
spark.sql('delete from PR where PR_Number=4600')
This deletes the data from the table but not from the actual delta file. I want to delete the data in the file without using a merge operation, because the join condition does not match. Can anyone please help me resolve this issue?
Thanks
Please remember: subqueries are not supported in DELETE in Delta.
Issue link: https://github.com/delta-io/delta/issues/730
From the documentation itself, an alternative is as follows.
For example:
DELETE FROM tdelta.productreferencedby_delta
WHERE id IN (SELECT KEY
FROM delta.productreferencedby_delta_dup_keys)
AND srcloaddate <= '2020-04-15'
This can be written as follows for Delta:
MERGE INTO delta.productreferencedby_delta AS d
using (SELECT KEY FROM tdatamodel_delta.productreferencedby_delta_dup_keys) AS k
ON d.id = k.KEY
AND d.srcloaddate <= '2020-04-15'
WHEN MATCHED THEN DELETE
Using Spark SQL functions in python would be:
from delta.tables import DeltaTable
from pyspark.sql.functions import col

dt_path = "/mnt/landing/Base_Tables/EventHistory/"
my_dt = DeltaTable.forPath(spark, dt_path)
seq_keys = ["4600"]  # You could add here as many as you want
my_dt.delete(col("PR_Number").isin(seq_keys))
And in scala:
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

val dt_path = "/mnt/landing/Base_Tables/EventHistory/"
val my_dt : DeltaTable = DeltaTable.forPath(spark, dt_path)
val seq_keys = Seq("4600") // You could add here as many as you want
my_dt.delete(col("PR_Number").isin(seq_keys:_*))
You can remove data that matches a predicate from a Delta table
https://docs.delta.io/latest/delta-update.html#delete-from-a-table
It worked like this:
delete from delta.`/mnt/landing/Base_Tables/EventHistory/` where PR_Number=4600
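The path-based DELETE above can also be issued from a Python notebook cell. The sketch below is only an illustration: the helper name `delete_from_delta_path` is hypothetical, and it assumes a `spark` session is available, as it is on Databricks. It builds the same statement and passes it to `spark.sql`.

```python
def delete_from_delta_path(spark, path, predicate):
    """Run a path-based DELETE against the Delta files at `path`.

    `spark` is assumed to be an active SparkSession with Delta support.
    Returns the statement that was executed, which is handy for logging.
    """
    statement = f"DELETE FROM delta.`{path}` WHERE {predicate}"
    spark.sql(statement)
    return statement


# Example call (needs a real Spark session, so it is left commented out):
# delete_from_delta_path(spark, "/mnt/landing/Base_Tables/EventHistory/", "PR_Number = 4600")
```

Note that Delta records the delete as a new table version; the older data files stay on disk until the table is vacuumed.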

Data processing in Pig with a tab-separated file

I am very new to Pig and am facing some issues while trying to perform very basic processing. I need to:
1- Load a tab-separated file using Pig.
2- Write processing logic to filter records based on date. For example, the lines have 2 columns, col_1 and col_2 (assume the columns are chararray), and I need only the records that have a 1-day difference between col_1 and col_2.
3- Finally, store the filtered records in a Hive table.
Input file ( tab separated ) :-
2016-01-01T16:31:40.000+01:00 2016-01-02T16:31:40.000+01:00
2017-01-01T16:31:40.000+01:00 2017-01-02T16:31:40.000+01:00
When I try
A = LOAD '/user/inp.txt' USING PigStorage('\t') as (col_1:chararray,col_2:chararray);
The result I am getting like below :-
DUMP A;
(,2016-01-03T19:28:58.000+01:00,2016-01-02T16:31:40.000+01:00)
(,2017-01-03T19:28:58.000+01:00,2017-01-02T16:31:40.000+01:00)
Not sure why.
Can someone please help me with how to parse a tab-separated file, how to convert the chararray values to dates, and how to filter based on the day difference?
Thanks
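Convert the columns to datetime objects using ToDate and use DaysBetween. This gives the difference in days; keep the rows where the difference == 1. Finally, store the result in Hive. The input values are ISO 8601 strings, so a format such as 'yyyy-MM-dd HH:mm:ss' will not match them; the single-argument ToDate parses ISO 8601 directly.
A = LOAD '/user/inp.txt' USING PigStorage('\t') AS (col_1:chararray, col_2:chararray);
-- DaysBetween takes the later datetime first to give a positive result, so col_2 goes first.
-- col_1 and col_2 are kept so the stored rows still carry the original values.
B = FOREACH A GENERATE col_1, col_2, DaysBetween(ToDate(col_2), ToDate(col_1)) AS day_diff;
C = FILTER B BY (day_diff == 1);
STORE C INTO 'your_hive_partition' USING org.apache.hive.hcatalog.pig.HCatStorer();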
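To sanity-check the filter logic on a few lines before running the Pig job, the same steps can be sketched in plain Python. This is only an illustration: the function name `rows_one_day_apart` is hypothetical, and the third sample row is made up to show a record being dropped.

```python
from datetime import datetime


def rows_one_day_apart(lines):
    """Yield (col_1, col_2) pairs whose difference, truncated to whole days, is 1."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip rows that do not have exactly two tab-separated columns
        start = datetime.fromisoformat(fields[0])
        end = datetime.fromisoformat(fields[1])
        if (end - start).days == 1:
            yield fields[0], fields[1]


sample = [
    "2016-01-01T16:31:40.000+01:00\t2016-01-02T16:31:40.000+01:00",
    "2017-01-01T16:31:40.000+01:00\t2017-01-02T16:31:40.000+01:00",
    "2018-01-01T16:31:40.000+01:00\t2018-01-03T16:31:40.000+01:00",  # made-up row, 2 days apart
]
for pair in rows_one_day_apart(sample):
    print(pair)
```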

NHibernate native SQL in WHERE clause

I have a Geography column in a table in SQL Server and would like to filter rows with a specific geometry type, e.g. all records where geometry type is 'Point'
The SQL query would look like
select * from GeometryTable g where g.Geography.STGeometryType() = 'Point'
How can I create a criterion for that? It is going to be used together with other criteria:
criteria.Add(Restrictions.Add(<Geography.STGeometryType()>, some.Value)
Thanks
Use this syntax:
var criteria = session.CreateCriteria<Geometry>();
criteria.Add
(
    Expression.Sql(" {alias}.[Geography].STGeometryType() = ? "
        , "Point" // a place for your parameter
        , NHibernate.NHibernateUtil.String)
);
var list = criteria.List<Geometry>();

How to make Linq to SQL translate to a derived column?

I have a table with a 'Wav' column that is of type 'VARBINARY(max)' (storing a wav file) and would like to be able to check if there is a wav from Linq to SQL.
My first approach was to do the following in Linq:
var result = from row in dc.Table
select new { NoWav = row.Wav != null };
The problem with the code above is that it will retrieve all the binary content into RAM, which isn't good (slow and memory hungry).
Any idea how to get the Linq query to translate into something like the SQL below?
SELECT (CASE WHEN Wav IS NULL THEN 1 ELSE 0 END) As NoWav FROM [Update]
Thanks for all the replies. They all make sense. Indeed, Linq should translate the != null correctly, but it didn't seem to do so effectively: running my code was very slow, so my only explanation is that the binary data got transferred over to RAM... but maybe I'm wrong.
I think I found a workaround anyway elsewhere on Stack Overflow: Create a computed column on a datetime
I ran the following query against my table:
ALTER TABLE [Table]
ADD WavIsNull AS (CASE WHEN [Wav] IS NULL Then (1) ELSE (0) END)
Now I'll update my DBML to reflect that computed column and see how it goes.
Are you sure that this code will retrieve the data to RAM?
I did some testing using LINQPad and the generated SQL was optimized as you suggest:
from c in Categories
select new
{
Description = c.Description != null
}
SELECT
(CASE
WHEN [t0].[description] IS NOT NULL THEN 1
ELSE 0
END) AS [Description]
FROM [Category] AS [t0]
What about this query:
var result = from row in dc.Table where row.Wav == null
select row.PrimaryKey
for a list of keys where your value is null. For listing of null/not null you could do this:
var result = from row in db.Table
select new
{ Key = row.Key, NoWav = (row.Wav == null ? true : false) };
That will generate SQL code similar to this:
SELECT [t0].[WavID] AS [Key],
(CASE
WHEN [t0].[Wav] IS NULL THEN 1
ELSE 0
END) AS [NoWav]
FROM [tblWave] AS [t0]
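The effect of that generated SQL can be tried outside Linq. Below is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for SQL Server and made-up table contents. It shows that the CASE expression returns only the key and the flag, so the binary column itself never reaches the client.

```python
import sqlite3

# In-memory database with a table shaped like the one in the generated SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblWave (WavID INTEGER PRIMARY KEY, Wav BLOB)")
conn.executemany(
    "INSERT INTO tblWave (WavID, Wav) VALUES (?, ?)",
    [(1, b"RIFF-fake-wav-bytes"), (2, None)],  # made-up sample rows
)

# Only the key and a 0/1 flag are selected; the blob is evaluated server-side.
rows = conn.execute(
    "SELECT WavID AS [Key], "
    "CASE WHEN Wav IS NULL THEN 1 ELSE 0 END AS NoWav "
    "FROM tblWave ORDER BY WavID"
).fetchall()
print(rows)  # [(1, 0), (2, 1)]
conn.close()
```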
I'm not clear here: your SQL code is going to return a list of 1s and 0s from your database. Is that what you are looking for? If you have an ID for your record, then you could just retrieve that single record with a condition on the Wav field; a null result would indicate no wav, i.e.
var result = from row in dc.Table
where (row.ID == id) && (row.Wav != null)
select new { row.Wav };