Sequence generation based on a condition using Informatica - sql

I need to achieve the following data transformation using Informatica.
The first picture is the sample input data.
The data after the transformation should look like this:
For a different type the sequence starts from 200. In this case bfd should be considered one group and (klm, kln) together another group. The new id is id + the sequence number.
Should I do this using a UDF, by creating a procedure, or using some set of transformations?
I am new to Informatica and am confused about which approach to follow.
Thanks for the help in advance!

You can do this using two separate Sequence Generator transformations.
The first one starts from 0 and increments by 1.
The second one starts from 200 and increments by 1.
Then use an IIF condition to generate the new_id column. If you have only a few different types of sequence, you can do it with IIF. But if you have hundreds, you will probably need a UDF or something similar.
seq_1 = attach NEXTVAL from sequence generator 1
seq_2 = attach NEXTVAL from sequence generator 2
new_id = TO_INTEGER( IIF( group = 'bfd', id || seq_1,
             IIF( group = 'klm' OR group = 'kln', id || seq_2 )
         )
)
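The intended effect of the two sequences plus the IIF can be sketched outside Informatica. The Python below is only an illustration: the sample rows and the `COUNTER_FOR_GROUP` mapping are hypothetical, and it assumes each counter advances only for rows of its own group (in a mapping, that may require routing the groups separately, since a sequence attached to every row would advance on every row).

```python
from itertools import count

# Hypothetical mapping: which counter each group value draws from.
COUNTER_FOR_GROUP = {"bfd": "seq_1", "klm": "seq_2", "kln": "seq_2"}


def assign_new_ids(rows):
    """Return (id, group, new_id) where new_id is id concatenated with the sequence."""
    # seq_1 starts at 0, seq_2 starts at 200, both increment by 1.
    counters = {"seq_1": count(0), "seq_2": count(200)}
    result = []
    for row_id, group in rows:
        seq = next(counters[COUNTER_FOR_GROUP[group]])
        # Mirrors TO_INTEGER(id || seq): string concatenation, then cast.
        result.append((row_id, group, int(f"{row_id}{seq}")))
    return result


sample = [(1, "bfd"), (2, "klm"), (3, "bfd"), (4, "kln")]  # made-up input
for row in assign_new_ids(sample):
    print(row)
# (1, 'bfd', 10)
# (2, 'klm', 2200)
# (3, 'bfd', 31)
# (4, 'kln', 4201)
```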

You can write a query to do this:
select t.*,
(case when group = 'bfd'
then id * 10 + row_number() over (partition by group order by id)
when group in ('klm', 'kln')
then id * 100 + row_number() over (partition by case when group in ('klm', 'kln') then 1 else 2 end order by id)
end) as new_id
from t;

Related

Postgresql query to check string break

I need to detect a break in a numerical sequence in PostgreSQL. If my seq column is out of sequence, I want to be notified by email. For example, 340, 341, 342 is a valid sequence, while 340, 342, 343 has a gap and counts as a sequence failure.
You can use the following query to find anomalies:
select count(*) from (
select id - (lag(id) over (order by id)) diff from tbl
) t where diff is not null and diff <> 1;
If the result of the query is not 0, an anomaly was found.
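The same "difference from the previous value" check can be tried on a plain list before wiring it to a database. This is a minimal Python sketch with a hypothetical helper name, `find_gaps`; unlike the query, it returns the offending pairs rather than just a count, which may be handy for the notification email.

```python
def find_gaps(ids):
    """Return (previous, current) pairs whose difference is not exactly 1."""
    ordered = sorted(ids)
    # zip pairs each value with its successor, like lag() over (order by id).
    return [(prev, cur) for prev, cur in zip(ordered, ordered[1:]) if cur - prev != 1]


print(find_gaps([340, 341, 342]))  # []
print(find_gaps([340, 342, 343]))  # [(340, 342)]
```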

How to force output to show only 1 row if multiple values appear

I have the following query:
SELECT
item_code,
description_global,
dept_item || class1_item AS division,
view_year_code AS year,
ssn AS season
FROM
us_raw.l_ims_sum_code_master
WHERE
ssn IN (3,4)
AND view_year_code = 9
AND description_global IS NOT NULL
AND item_code = 418251
Here's what it looks like in the output:
I only want my data to pull with one of those description_global values. I know I can just do LIMIT 1, but I need another way to do it because this is just one example, and I want my output to force only 1 row for each time there are multiple description_global values.
Thank you,
Z
You can use window functions:
SELECT *
FROM (SELECT item_code, description_global, dept_item || class1_item AS division,
view_year_code AS year, ssn AS season,
ROW_NUMBER() OVER (PARTITION BY item_code ORDER BY item_code) as seqnum
FROM us_raw.l_ims_sum_code_master
WHERE ssn IN (3,4) AND
view_year_code = 9 AND
description_global IS NOT NULL AND
item_code = 418251
) scm
WHERE seqnum = 1;
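To see the `seqnum = 1` filter in action, here is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name `code_master` and the sample rows are made up for illustration. It orders by `description_global` inside the window so that the surviving row is deterministic, which is a small change from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down table with two descriptions for the same item.
conn.execute("CREATE TABLE code_master (item_code INTEGER, description_global TEXT)")
conn.executemany(
    "INSERT INTO code_master VALUES (?, ?)",
    [(418251, "desc A"), (418251, "desc B"), (500000, "desc C")],
)

query = """
SELECT item_code, description_global
FROM (SELECT item_code, description_global,
             ROW_NUMBER() OVER (PARTITION BY item_code
                                ORDER BY description_global) AS seqnum
      FROM code_master) AS scm
WHERE seqnum = 1
ORDER BY item_code
"""
one_per_item = conn.execute(query).fetchall()
print(one_per_item)  # [(418251, 'desc A'), (500000, 'desc C')]
```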
I was able to resolve this issue by using an Index in Tableau. Here are the steps for anyone that is using SQL+Tableau:
1.) Create a calculated field
2.) Simply enter Index()
3.) Click OK
4.) Drag the newly created field into your rows
5.) Right-click on the Index pill (whatever you had named it) and click compute on
6.) Select whichever field that has multiple values (item description for me)
7.) Right-click again and click filter
8.) Filter to only show 1s
9.) Remove from rows (if you don't want it on your report)
10.) Done
Best,
Z

Calculate Rank Pattern without any order value

My data looks like this:
Look at 3 columns: jil_equipment_id, req_group, operand.
Based on these 3 columns I have to generate a new "Pattern" column.
The pattern column starts from 2 and increases by 1 for each repeated combination of jil_equipment_id, req_group, operand.
The final data will look like this.
Please suggest any possible approach. I am not able to use the RANK()/DENSE_RANK() functions for this.
You can use row_number(). You want to use the partition by as well:
select t.*,
(1 + row_number() over (partition by jil_equipment_id, req_group, operand
order by content_id
)
) as pattern
from t;
select *,Row_Number() over(partition by jil_equipment_id,req_group,operand order by jil_equipment_id,req_group,operand) + 1 as pattern
from tab
You can use the row_number() function for this.
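For anyone who wants to check the expected numbering by hand, the logic of `row_number() + 1` per combination can be sketched in Python. The function name and sample rows are hypothetical, and input order stands in for the ORDER BY column.

```python
from collections import defaultdict


def add_pattern(rows):
    """Append a pattern value that starts at 2 and grows by 1 per repeated combination."""
    seen = defaultdict(int)  # how many times each combination has appeared so far
    out = []
    for equipment_id, req_group, operand in rows:
        key = (equipment_id, req_group, operand)
        seen[key] += 1
        out.append((*key, seen[key] + 1))  # row_number() + 1
    return out


sample = [(10, "A", "AND"), (10, "A", "AND"), (10, "B", "OR"), (10, "A", "AND")]
for row in add_pattern(sample):
    print(row)
# (10, 'A', 'AND', 2)
# (10, 'A', 'AND', 3)
# (10, 'B', 'OR', 2)
# (10, 'A', 'AND', 4)
```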

Sql -after group by I need to take rows with newest date

I need to write a query in SQL and I can't get it right. I have a table with 7 columns: 1st_num, 2nd_num, 3rd_num, opening_Date, Amount, code, cancel_Flag.
For every combination of 1st_num, 2nd_num, 3rd_num I want to take only the record with the minimum cancel_flag, and if there is more than one such row, take the one with the newest opening date.
But when I do a GROUP BY and choose MIN and MAX for the relevant fields, I get a mix of the rows. For example, given:
1. 12,130,45678,2015-01-01,2005,333,0
2. 12,130,45678,2015-01-09,105,313,0
The result will be:
12,130,45678,2015-01-09,2005,333,0
which mixes the two rows into one.
I am on Microsoft SQL Server 2008, using SSIS in Visual Studio 2008.
My code is:
SELECT
1st_num,
2nd_num,
3rd_num,
MAX(opening_date),
MAX (Amount),
code,
MIN(cancel_flag)
FROM dbo.tablename
GROUP BY
1st_num,
2nd_num,
3rd_num,
code
HAVING COUNT(*) > 1
How do I take the row with the max date or min cancel flag as it is, without mixing values?
I can't really post my actual code for security reasons, but I'm sure you can help.
thank you,
Oren
It is difficult to answer without knowing the DBMS, because each one has different syntax.
Anyway, this should work for most DBMSs. Use the row_number() function to rank the rows and keep only the first one according to your conditions:
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY t.1st_num,t.2nd_num,t.3rd_num order by t.cancel_flag asc,t.opening_date desc) as row_num
FROM YourTable t ) as tableTempName
WHERE row_num = 1
Use NOT EXISTS to return a row as long as no other row with same 1st_num, 2nd_num, 3rd_num has a lower cancel_flag value, or same cancel_flag but a higher opening_Date.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.1st_num = t1.1st_num
and t2.2nd_num = t1.2nd_num
and t2.3rd_num = t1.3rd_num
and (t2.cancel_flag < t1.cancel_flag
or (t2.cancel_flag = t1.cancel_flag and
t2.opening_Date > t1.opening_Date)))
Core ANSI SQL-99, expected to work with (almost) any dbms.
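A quick way to check the NOT EXISTS logic is to run it against the two rows from the question. The sketch below uses Python's sqlite3 module. The column names `num1`, `num2`, `num3` are stand-ins (identifiers that start with a digit need quoting in most engines), and the third row is made up to show that a newer row with a higher cancel_flag does not win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (num1 INT, num2 INT, num3 INT, "
    "opening_date TEXT, amount INT, code INT, cancel_flag INT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (12, 130, 45678, "2015-01-01", 2005, 333, 0),  # from the question
        (12, 130, 45678, "2015-01-09", 105, 313, 0),   # from the question
        (12, 130, 45678, "2015-02-01", 50, 300, 1),    # hypothetical: newer, higher flag
    ],
)

# Keep a row only if no sibling has a lower cancel_flag,
# or the same cancel_flag with a later opening_date.
query = """
SELECT * FROM tablename t1
WHERE NOT EXISTS (
    SELECT 1 FROM tablename t2
    WHERE t2.num1 = t1.num1 AND t2.num2 = t1.num2 AND t2.num3 = t1.num3
      AND (t2.cancel_flag < t1.cancel_flag
           OR (t2.cancel_flag = t1.cancel_flag
               AND t2.opening_date > t1.opening_date))
)
"""
winners = conn.execute(query).fetchall()
print(winners)  # [(12, 130, 45678, '2015-01-09', 105, 313, 0)]
```

The whole row comes back intact, so Amount and code are no longer mixed between rows. If two rows tie on both cancel_flag and opening_date, both are returned.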

How to sum only the first row for each group in a result set

Ok, I will try to explain myself the best I can, but I have the following:
I have a datasource that basically is a dynamic query. The query in itself shows 3 fields, Name, Amount1, Amount2.
Now, I could have rows with the same Name. The idea is to sum Amount1+Amount2 only WHEN Name is different from the previous one I saved. If I did this in C# it could be something like this:
foreach (DataRow dr in repDset.Dataset.Rows)
{
    total = (long)dr["Amount1"] + (long)dr["Amount2"];
    if (thisconditiontrue)
    {
        if (PreviousName == "" || PreviousName != dr["Name"].ToString())
        {
            TotalName = TotalName + total;
        }
        PreviousName = dr["Name"].ToString();
    }
}
The idea is to grab this and make a Reporting Services expression using the methods RS can give me, for example:
IIF(Fields!Name.Value<>Previous(Fields!Name.Value),Fields!Amount1.Value + Fields!Amount2.Value,False)
Something like that but that stores the amount of the previous one.
Maybe creating another field? a calculated one?
I can clarify further and edit if needed.
*EDIT for visual clarification:
As an example, it is something like this:
This query assumes you're working with SQL Server. You will also need something to order the query results by; otherwise, how do you know which row is the first one?
SELECT SUM(NameTotal) AS Total
FROM (
    SELECT Name, Amount1 + Amount2 AS NameTotal,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
    FROM srcTable
) AS a
WHERE rowNum = 1;
This uses the analytical window function ROW_NUMBER() to number each row, and the PARTITION BY clause tells it to reset the numbering for every different value of Name in the result set. You do need a field that you can order the results by, though, or this won't work. If you really just want a random order you can use ORDER BY NEWID(), but that will give you a non-deterministic result.
This syntax is particular to SQL Server, but the same result can usually be achieved in other databases.
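As a sanity check of the "first row per Name" idea, the sketch below runs an equivalent query through Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample rows and the `OrderField` values are invented for illustration. Inside OVER, PARTITION BY comes before ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE srcTable (OrderField INT, Name TEXT, Amount1 INT, Amount2 INT)"
)
conn.executemany(
    "INSERT INTO srcTable VALUES (?, ?, ?, ?)",
    [
        (1, "A", 10, 5),   # first A row: counted (15)
        (2, "A", 99, 99),  # repeated A row: ignored
        (3, "B", 20, 1),   # first B row: counted (21)
        (4, "C", 7, 3),    # first C row: counted (10)
        (5, "C", 50, 50),  # repeated C row: ignored
    ],
)

query = """
SELECT SUM(NameTotal) AS Total
FROM (SELECT Name, Amount1 + Amount2 AS NameTotal,
             ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
      FROM srcTable) AS a
WHERE rowNum = 1
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 46
```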
If you're looking to display the output like you've shown in your example you could use two queries and reference the other one by passing it as the scope to an aggregate function in an SSRS expression like this:
=MAX(Fields!Total.Value, "TotalQueryDataset")
Where your dataset is called "TotalQueryDataset".
Otherwise you can achieve the output using pure SQL like this:
WITH nameTotals AS (
    SELECT Name, Amount1, Amount2,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
    FROM srcTable
)
SELECT Name, Amount1, Amount2
FROM nameTotals
UNION ALL
SELECT 'Total', SUM(Amount1 + Amount2), NULL
FROM nameTotals
WHERE rowNum = 1;