Informatica: show the latest status based on 2 attributes

I need to show the latest status based on 2 attributes (LAST_UPDATE and STATUS).
How can I do it in Informatica? The source is a flat file.
Example:
NUMBER  LAST_UPDATE  STATUS
------  -----------  --------------
1       01/26/2015   CREATED
1       01/27/2015   UNDER_PROCCESS
1       01/28/2015   COMPLETED
2       01/28/2015   CREATED
3       01/28/2015   UNDER_PROCCESS
Result should be
NUMBER  LAST_UPDATE  STATUS          LAST_STATUS
------  -----------  --------------  --------------
1       01/26/2015   CREATED         COMPLETED
1       01/27/2015   UNDER_PROCCESS  COMPLETED
1       01/28/2015   COMPLETED       COMPLETED
2       01/28/2015   CREATED         CREATED
3       01/28/2015   UNDER_PROCCESS  UNDER_PROCCESS

You can either use an Aggregator transformation or do it in an Expression transformation using variable ports.
Using Aggregator
In a Sorter transformation, sort on NUMBER and LAST_UPDATE, both ascending.
In the Aggregator, group by NUMBER. By default the Aggregator outputs the STATUS value from the last row of each group, which is the latest status given the sort order. Optionally use the LAST function to make that explicit.
Use a Joiner to join the output of the Aggregator back to the output of the Sorter on NUMBER.
SQ ----> Sorter ----> Agg ----> Joiner ----> Target
           |____________________^
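Using Expression
Sort the data on NUMBER (ascending) and LAST_UPDATE (descending), so that the first row of each NUMBER group carries the latest status.
In the Expression transformation:
NUMBER (i/o)
LAST_UPDATE (i/o)
STATUS (i/o)
v_LAST_STATUS (v) = IIF(NUMBER <> v_PREV_NUMBER, STATUS, v_LAST_STATUS)
LAST_STATUS (o) = v_LAST_STATUS
v_PREV_NUMBER (v) = NUMBER
Make sure the order of ports is correct: variable ports are evaluated top to bottom, so v_PREV_NUMBER must come after v_LAST_STATUS in order to still hold the previous row's NUMBER when the comparison runs.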
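The Aggregator-plus-Joiner idea can be sketched outside Informatica to check the expected result. This is only an illustration in Python, not Informatica code; the function names `parse` and `add_last_status` are made up for the example, and the rows are the sample data from the question.

```python
from datetime import datetime

# Sample rows from the question: (NUMBER, LAST_UPDATE, STATUS)
rows = [
    (1, "01/26/2015", "CREATED"),
    (1, "01/27/2015", "UNDER_PROCCESS"),
    (1, "01/28/2015", "COMPLETED"),
    (2, "01/28/2015", "CREATED"),
    (3, "01/28/2015", "UNDER_PROCCESS"),
]


def parse(date_text):
    """Parse the MM/DD/YYYY dates used in the flat file."""
    return datetime.strptime(date_text, "%m/%d/%Y")


def add_last_status(rows):
    # "Aggregator" step: remember the status of the newest row per NUMBER.
    latest = {}
    for number, last_update, status in rows:
        seen = latest.get(number)
        if seen is None or parse(last_update) >= seen[0]:
            latest[number] = (parse(last_update), status)
    # "Joiner" step: attach that status to every original row.
    return [(n, d, s, latest[n][1]) for n, d, s in rows]


result = add_last_status(rows)
for row in result:
    print(row)
```

Each output tuple carries the original three fields plus LAST_STATUS, which mirrors the join of the Aggregator output back to the sorted rows.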

Related

QlikView/SQL - If statement w/ IsNull

I am trying to resolve an issue where a certain field returns a null value. I would like to fill it automatically with the value from another row whose other relevant fields are the same.
Example (for both results):
*GradYear: 2018 ----
StudentName: Jake ----
*SchoolNumber: 54 ----
*StateCode: NA11 ----
CountyCode: MA02 ----
*SchoolName: Hillsburn ----
*GradYear: 2018 ----
StudentName: Sarah ----
*SchoolNumber: 54 ----
*StateCode: NA11 ----
CountyCode: NULL ----
*SchoolName: Hillsburn ----
As seen above, the CountyCode for Sarah returns a null value. I am trying to fill in CountyCode automatically when the other fields match between students. (The fields that must match are marked with a '*'.)
Also, I am attempting to solve this without using the "Previous" feature or hard-coded information so that it may be accomplished with any data.
My original attempt was to use a simple if/IsNull statement along with the Peek function, but the field still returned a null value.
if((isnull(CountyCode)), Peek(CountyCode), CountyCode) as CountyCode
Any help with this would be greatly appreciated! Thank you in advance.
I would use ApplyMap for this.
Let's say the SchoolNumber is unique to CountyCode.
So first, let's load our mapping table:
CountyCode_Map:
mapping load distinct SchoolNumber, CountyCode
from Data.qvd (qvd) where len(CountyCode)>0;
Now, when loading your data, use this for CountyCode:
applymap('CountyCode_Map',SchoolNumber) as CountyCode
In case SchoolNumber is not unique to CountyCode, you can use any other field or a concatenation of fields.
For more information, see the ApplyMap documentation.
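The mapping idea itself is independent of QlikView and can be sketched in Python to show what the lookup does. This is an illustration only: the names `county_map` and `filled` are invented for the example, and unlike the ApplyMap line above, this variant keeps an existing CountyCode and only fills in the missing ones.

```python
# Sample rows from the question, reduced to the fields that matter here.
rows = [
    {"StudentName": "Jake", "SchoolNumber": 54, "CountyCode": "MA02"},
    {"StudentName": "Sarah", "SchoolNumber": 54, "CountyCode": None},
]

# Mapping table: the first non-empty CountyCode seen for each SchoolNumber
# (the role played by the "mapping load ... where len(CountyCode)>0" step).
county_map = {}
for row in rows:
    if row["CountyCode"]:
        county_map.setdefault(row["SchoolNumber"], row["CountyCode"])

# Lookup step: keep a known CountyCode, otherwise take it from the map.
filled = [
    {**row, "CountyCode": row["CountyCode"] or county_map.get(row["SchoolNumber"])}
    for row in rows
]

for row in filled:
    print(row["StudentName"], row["CountyCode"])
```

If SchoolNumber alone is not a reliable key, the dictionary key could be a tuple of several fields, which corresponds to the concatenation of fields mentioned above.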

Custom order in DataTables (jQuery plug-in)

Each row in my dataset has a unique identifier. I want the rows to be ordered by my own custom order sequence. Here's an example:
I have my raw data:
ID Name
-------
1 Peter
2 John
3 Steve
And my order sequence, e.g. 3,1,2.
When I initialize the DataTable I want my entries to show up like this (according to my pre-computed order sequence):
ID Name
-------
3 Steve
1 Peter
2 John
Your code seems to work just fine. There were a couple of issues, though:
The RowReorder plugin requires an order column in order to work correctly.
You need to handle the row-reorder event and change your URL hash accordingly.
Sorting on the top table needs to be disabled unless you want to handle the order event and adjust the URL hash accordingly.
See this jsFiddle for code and demonstration.
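One way to obtain the order column mentioned above is to compute it from the order sequence before the data reaches the table. The sketch below does this in Python purely as an illustration, assuming the rows are prepared server-side; the field name `order` and the variable names are hypothetical and not part of the DataTables API.

```python
# Raw data and the pre-computed order sequence from the question.
rows = [
    {"id": 1, "name": "Peter"},
    {"id": 2, "name": "John"},
    {"id": 3, "name": "Steve"},
]
order_sequence = [3, 1, 2]

# Map each ID to its position in the custom sequence.
position = {row_id: index for index, row_id in enumerate(order_sequence)}

# Add the position as an explicit "order" field and sort by it.
ordered = sorted(
    ({**row, "order": position[row["id"]]} for row in rows),
    key=lambda row: row["order"],
)

for row in ordered:
    print(row["order"], row["id"], row["name"])
```

The resulting `order` values could then back a column that the table sorts on initially.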

Selecting rows using multiple LIKE conditions from a table field

I created a table out of a CSV file which is produced by an external software.
Amongst the other fields, this table contains one field called "CustomID".
Each row on this table must be linked to a customer using the content of that field.
Every customer may have one or more sets of CustomIDs at their own discretion, as long as each sequence starts with the same prefix.
So for example:
Customer 1 may use "cust1_n" and "cstm01_n" (where n is a number)
Customer 2 may use "customer2_n"
ImportedRows
PKID  CustomID         Description
----  ---------------  -----------
1     cust1_001        Something
2     cust1_002        ...
3     cstm01_000001    ...
4     customer2_00001  ...
5     cstm01_000232    ...
..
Now I have created 2 support tables as follows:
Customers
PKID  Name
----  ----------
1     Customer 1
2     Customer 2
and
CustomIDs
PKID  FKCustomerID  SearchPattern
----  ------------  -------------
1     1             cust1_*
2     1             cstm01_*
3     2             customer2_*
What I need to achieve is the retrieval of all rows for a given customer using all the LIKE conditions found on the CustomIDs tables for that customer.
I have failed miserably so far.
Any clues, please?
Thanks in advance.
Silver.
To use LIKE you must replace the * with % in the pattern. Different DBMSs use different functions for string manipulation. Let's assume there is a REPLACE function available:
SELECT ir.*
FROM ImportedRows ir
JOIN CustomIDs c ON ir.CustomID LIKE REPLACE(c.SearchPattern, '*', '%')
WHERE c.FKCustomerID = 1;
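The query above can be tried against the sample data with SQLite, which happens to provide REPLACE. This is a sketch using Python's sqlite3 module; the helper name `rows_for_customer` is made up for the example. One caveat worth knowing: `_` is itself a LIKE wildcard that matches any single character, so a pattern such as `cust1_%` is slightly looser than it looks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ImportedRows (PKID INTEGER PRIMARY KEY, CustomID TEXT, Description TEXT);
CREATE TABLE CustomIDs (PKID INTEGER PRIMARY KEY, FKCustomerID INTEGER, SearchPattern TEXT);
INSERT INTO ImportedRows VALUES
    (1, 'cust1_001', 'Something'),
    (2, 'cust1_002', '...'),
    (3, 'cstm01_000001', '...'),
    (4, 'customer2_00001', '...'),
    (5, 'cstm01_000232', '...');
INSERT INTO CustomIDs VALUES
    (1, 1, 'cust1_*'),
    (2, 1, 'cstm01_*'),
    (3, 2, 'customer2_*');
""")


def rows_for_customer(customer_id):
    """Return the PKIDs of imported rows matching any pattern of the customer."""
    sql = """
        SELECT ir.PKID
        FROM ImportedRows ir
        JOIN CustomIDs c ON ir.CustomID LIKE REPLACE(c.SearchPattern, '*', '%')
        WHERE c.FKCustomerID = ?
        ORDER BY ir.PKID
    """
    return [row[0] for row in conn.execute(sql, (customer_id,))]


print(rows_for_customer(1))
print(rows_for_customer(2))
```

If two patterns of the same customer can match one row, adding DISTINCT to the SELECT avoids returning that row twice.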

SQL - ordering results by parent child

I have entries in my table of products and categories with columns id and parent.
Let's say I have the following:
id  parent  title
--  ------  -----
0   0       home
1   4       PD1
2   0       CAT1
3   2       PD2
4   2       CAT2
The first column is the id, the second is the parent, and the title is at the end.
Is there a way (using ORDER BY or some other method) of returning the results in the following order?
id  parent  title
--  ------  -----
0   0       home
2   0       CAT1
3   2       PD2
4   2       CAT2
1   4       PD1
Try this:
SELECT id, parent, title
FROM yourtable
ORDER BY parent, id
Try this:
SELECT * FROM yourtablename ORDER BY parentfieldname
Could it be as simple as:
ORDER BY ParentID, ID
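The short answers above all amount to sorting by parent and then by id. A quick check against the sample data, sketched with Python's sqlite3 module (the table name `yourtable` follows the first answer), shows that this reproduces the requested order for this particular data set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (id INTEGER PRIMARY KEY, parent INTEGER, title TEXT);
INSERT INTO yourtable VALUES
    (0, 0, 'home'),
    (1, 4, 'PD1'),
    (2, 0, 'CAT1'),
    (3, 2, 'PD2'),
    (4, 2, 'CAT2');
""")

# Sort by parent first, then by id to break ties deterministically.
ordered = conn.execute(
    "SELECT id, parent, title FROM yourtable ORDER BY parent, id"
).fetchall()

for row in ordered:
    print(row)
```

This works here because parent IDs happen to increase with depth; it is not a general depth-first ordering of a tree.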
Firstly, if you want to order in a custom way (not by primary key or alphabetically on a name field), you need to add a field that defines the ordering weight of the various objects. I would add a field to the table called something like ordering_weight. Avoid naming it order, which is a reserved SQL word, or sequence, which is reserved in some databases.
Secondly, you need an ORDER BY clause: ORDER BY top_level.ordering_weight, next_level.ordering_weight, ..., deepest_level.ordering_weight. Notice that the clause orders first by the highest level of the tree and last by the lowest or deepest level.
Of course, disregard the above if what you are looking for is a dynamic level of recursion.
Generally, when I see a parent-child relationship like this, I see people wanting to do more than 1 level of recursion. The problem with your schema is that it doesn't support dynamic levels of recursion as it is. You can only fetch the top-level parent's children; every additional level requires another join (there are some clever ways to get past this, but they still require additional SQL per level).
I think it would be more useful to look into the Nested Set Model, which allows querying arbitrarily deep levels of recursion.
See: http://en.wikipedia.org/wiki/Nested_set_model
For example, the following tree of parent-child relationships is extremely difficult to query using standard joins, but very easy using a model like nested set.
Category A
- Category B
- - Category D
- Category E
Category F
- Category G
- - Category H
- - - Category I
- - - - Category J
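To make the nested set idea concrete, here is a minimal sketch in Python with sqlite3 that numbers the tree above and queries it. The table name `categories`, the column names `lft` and `rgt`, and the helper functions are assumptions chosen for the example, not a prescribed schema.

```python
import sqlite3

# The category tree from the answer, as (name, children) pairs.
tree = [
    ("Category A", [
        ("Category B", [("Category D", [])]),
        ("Category E", []),
    ]),
    ("Category F", [
        ("Category G", [
            ("Category H", [
                ("Category I", [("Category J", [])]),
            ]),
        ]),
    ]),
]


def number_nodes(nodes, out, counter=1):
    """Depth-first walk assigning (lft, rgt) to each node; returns next free number."""
    for name, children in nodes:
        lft = counter
        counter = number_nodes(children, out, counter + 1)
        out.append((name, lft, counter))  # counter is now this node's rgt
        counter += 1
    return counter


numbered = []
number_nodes(tree, numbered)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", numbered)


def descendants(name):
    """All nodes below `name`, at any depth, with a single join."""
    sql = """
        SELECT child.name
        FROM categories parent
        JOIN categories child ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


# Ordering by lft yields the whole tree in depth-first order.
full_order = [row[0] for row in conn.execute("SELECT name FROM categories ORDER BY lft")]
print(full_order)
print(descendants("Category F"))
```

The trade-off is that inserting or moving a node requires renumbering part of the tree, so this model suits data that is read far more often than it changes.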

SQL Alternative to performing an INNER JOIN on a single table

I have a large table (TokenFrequency) which has millions of rows in it. The TokenFrequency table is structured like this:
Table - TokenFrequency
id - int, primary key
source - int, foreign key
token - char
count - int
My goal is to select all of the rows in which two sources have the same token in it. For example if my table looked like this:
id  source  token  count
--  ------  -----  -----
1   1       dog    1
2   2       cat    2
3   3       cat    2
4   4       pig    5
5   5       zoo    1
6   5       cat    1
7   5       pig    1
I would want a SQL query to give me source 1, source 2, and the sum of the counts. For example:
source1  source2  token  count
-------  -------  -----  -----
2        3        cat    4
2        5        cat    3
3        5        cat    3
4        5        pig    6
I have a query that looks like this:
SELECT F.source AS source1, S.source AS source2, F.token,
(F.count + S.count) AS sum
FROM TokenFrequency F
INNER JOIN TokenFrequency S ON F.token = S.token
WHERE F.source <> S.source
This query works, but I have two problems with it:
I have a TokenFrequency table that has millions of rows and therefore need a faster alternative to obtain this result.
The current query returns duplicates. For example, it selects:
source1=2, source2=3, token=cat, count=4
source1=3, source2=2, token=cat, count=4
That isn't too much of a problem, but if there is a way to eliminate those and gain speed in the process, it would be very useful.
The main issue is speed: my current query takes hours to complete. I believe the INNER JOIN of the table to itself is the problem. I'm sure there has to be a way to eliminate the inner join and get similar results using just one instance of the TokenFrequency table. Fixing the second problem might also speed up the query.
I need a way to restructure this query to provide the same results in a faster, more efficient manner.
Thanks.
I'd need a little more info to diagnose the speed issue, but to remove the dups, add this to the WHERE:
AND F.source<S.source
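To see the effect of that extra condition, the query can be run against the sample rows from the question. The sketch below uses Python's sqlite3 module purely for illustration; the variable names are made up, and the `count` column is quoted defensively since it is also a function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TokenFrequency (
    id INTEGER PRIMARY KEY, source INTEGER, token TEXT, "count" INTEGER
);
INSERT INTO TokenFrequency VALUES
    (1, 1, 'dog', 1),
    (2, 2, 'cat', 2),
    (3, 3, 'cat', 2),
    (4, 4, 'pig', 5),
    (5, 5, 'zoo', 1),
    (6, 5, 'cat', 1),
    (7, 5, 'pig', 1);
""")

# F.source < S.source keeps each unordered pair of sources exactly once
# and also excludes a row pairing with itself.
pairs = conn.execute("""
    SELECT F.source AS source1, S.source AS source2, F.token,
           (F."count" + S."count") AS sum
    FROM TokenFrequency F
    INNER JOIN TokenFrequency S ON F.token = S.token
    WHERE F.source < S.source
    ORDER BY F.token, source1, source2
""").fetchall()

for row in pairs:
    print(row)
```

The four rows returned correspond to the expected result table in the question, with no mirrored duplicates.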
Try this:
SELECT token, GROUP_CONCAT(source), SUM(count)
FROM TokenFrequency
GROUP BY token;
This should run a lot faster and also eliminate the duplicates. But the sources will be returned in a comma-separated list, so you'll have to explode that in your application.
You might also try creating a compound index over the columns token, source, count (in that order) and analyze with EXPLAIN to see if MySQL is smart enough to use it as a covering index for this query.
Update: I seem to have misunderstood your question. You don't want the sum of counts per token; you want the sum of counts for every pair of sources for a given token.
I believe the inner join is the best solution for this. An important guideline for SQL is that if you need to calculate an expression with respect to two different rows, then you need to do a join.
However, one optimization technique that I mentioned above is to use a covering index so that all the columns you need are included in an index data structure. The benefit is that all your lookups are O(log n), and the query doesn't need to do a second I/O to read the physical row to get other columns.
In this case, you should create the covering index over columns token, source, count as I mentioned above. Also try to allocate enough cache space so that the index can be cached in memory.
If token isn't indexed, it certainly should be.
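The index suggestion can be tried out on a small scale. The sketch below uses SQLite through Python's sqlite3 module rather than MySQL, so the plan output looks different from MySQL's EXPLAIN (where a covering index shows up as "Using index" in the Extra column); the index name `idx_token_source_count` is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TokenFrequency (
    id INTEGER PRIMARY KEY, source INTEGER, token TEXT, "count" INTEGER
);
-- Compound index over token, source, count (in that order), as suggested.
CREATE INDEX idx_token_source_count ON TokenFrequency (token, source, "count");
INSERT INTO TokenFrequency VALUES
    (1, 1, 'dog', 1), (2, 2, 'cat', 2), (3, 3, 'cat', 2), (4, 4, 'pig', 5),
    (5, 5, 'zoo', 1), (6, 5, 'cat', 1), (7, 5, 'pig', 1);
""")

pair_query = """
    SELECT F.source AS source1, S.source AS source2, F.token,
           (F."count" + S."count") AS sum
    FROM TokenFrequency F
    INNER JOIN TokenFrequency S ON F.token = S.token
    WHERE F.source < S.source
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + pair_query)]
for step in plan:
    print(step)

# The index changes how rows are found, not which rows come back.
pair_rows = sorted(conn.execute(pair_query).fetchall())
```

Reading the printed plan shows whether the join looks rows up through the index instead of scanning the whole table for every outer row; on the real table, the same check should be repeated with MySQL's own EXPLAIN.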