How can we implement the First() function used in Informatica in SQL?

I have an aggregate transformation in Informatica where Description1 = First(Description).
I want to implement the same logic in a SQL query. Can anyone suggest how to do this?
Sample dataset (table name: ABC)
Name          Expression
ID            ID
DESCRIPTION
DESCRIPTION1  FIRST(DESCRIPTION)
INSERT_DATE
INSERT_DATE1  FIRST(INSERT_DATE)
RANK
RANK1         FIRST(RANK)

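Please use the query below:
select max(Description1) from Router_Transform;
If you are using a Sorter transformation in your mapping, please add an ORDER BY clause:
select max(Description1) from Router_Transform order by column_name;
Note that MAX() returns the largest value rather than the value from the first row, and an outer ORDER BY does not change which value an aggregate returns.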

If you want the row with the smallest id, you can sort the result set and retain just one row. In standard SQL, you would typically use a row-limiting clause for this:
select t.*
from mytable t
order by id
fetch first row only
Note that not all databases support this syntax (but almost all have an alternative, such as LIMIT or TOP).
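A minimal sketch of the row-limiting approach, run through Python's built-in sqlite3 module against an in-memory database. SQLite is one of the databases that does not accept FETCH FIRST, so the sketch uses its LIMIT alternative; the table contents are invented for illustration.

```python
import sqlite3

# In-memory database with a small, made-up version of "mytable".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(3, "gamma"), (1, "alpha"), (2, "beta")],
)

# SQLite's equivalent of FETCH FIRST ROW ONLY is LIMIT 1.
first_row = conn.execute(
    "SELECT t.* FROM mytable t ORDER BY id LIMIT 1"
).fetchone()
print(first_row)  # (1, 'alpha')
```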
On the other hand, if you want to add columns to each row that display the "first" value of each column, use the window function first_value():
select
t.*,
first_value(description) over(order by id) description1,
first_value(insert_date) over(order by id) insert_date1,
first_value(rank) over(order by id) rank1
from mytable t
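The first_value() version can be tried the same way. This sketch assumes Python's sqlite3 module is linked against SQLite 3.25 or newer (the release that added window functions); the rows are hypothetical. Every output row should carry the description, insert_date and rank of the row with the smallest id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rank" is quoted defensively, since RANK is also the name of a window function.
conn.execute(
    'CREATE TABLE mytable (id INTEGER, description TEXT, insert_date TEXT, "rank" INTEGER)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (2, "beta", "2020-01-02", 20),
        (1, "alpha", "2020-01-01", 10),
        (3, "gamma", "2020-01-03", 30),
    ],
)

# Each first_value() looks at the row that sorts first by id.
rows = conn.execute(
    """
    SELECT t.*,
           first_value(description) OVER (ORDER BY id) AS description1,
           first_value(insert_date) OVER (ORDER BY id) AS insert_date1,
           first_value("rank")      OVER (ORDER BY id) AS rank1
    FROM mytable t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'alpha', '2020-01-01', 10, 'alpha', '2020-01-01', 10)
# (2, 'beta', '2020-01-02', 20, 'alpha', '2020-01-01', 10)
# (3, 'gamma', '2020-01-03', 30, 'alpha', '2020-01-01', 10)
```

If the Aggregator also has group-by ports, adding PARTITION BY on those columns inside each OVER clause should give the first value per group instead of one value for the whole table.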

Related

Why is order by required inside OVER when using LEAD in SQL?

SELECT seller_name, sale_value,
LEAD(sale_value) OVER(ORDER BY sale_value) as next_sale_value
FROM sale
ORDER BY sale_value
Am I right to understand that LEAD must always have OVER (ORDER BY ...) because the SELECT is evaluated before the final ORDER BY clause?
The ORDER BY is required in the OVER clause, not in the outer query. So this is fine:
SELECT seller_name, sale_value,
LEAD(sale_value) OVER (ORDER BY sale_value) as next_sale_value
FROM sale;
However, the rows may then be returned in any order.
Why does LEAD() require the ORDER BY? By definition, LEAD() pulls the value from the "next" row. However, SQL tables represent unordered (multi)sets, so there is no next row unless a column or expression defines the order, and that is exactly what the ORDER BY inside OVER (...) defines.
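A small sketch (Python's sqlite3 with SQLite 3.25+ assumed; the sale rows are made up) showing that the ORDER BY inside OVER alone determines which row counts as "next". Dropping the outer ORDER BY only changes the order in which rows come back, not the pairs LEAD() produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (seller_name TEXT, sale_value INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("ann", 300), ("bob", 100), ("cy", 200)],
)

lead_sql = """
    SELECT seller_name, sale_value,
           LEAD(sale_value) OVER (ORDER BY sale_value) AS next_sale_value
    FROM sale
"""

# With an outer ORDER BY, rows come back sorted by sale_value.
ordered = conn.execute(lead_sql + " ORDER BY sale_value").fetchall()
# Without it, the row order is unspecified, but each row's LEAD value is the same.
unordered = conn.execute(lead_sql).fetchall()

print(ordered)  # [('bob', 100, 200), ('cy', 200, 300), ('ann', 300, None)]
```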

Copy column values using a partition by statement in BigQuery

In BigQuery, I am trying to copy column values into other rows using a PARTITION BY statement anchored to a particular ID number.
Here is an example:
Right now, I am trying to use:
MIN(col_a) OVER (PARTITION BY CAST(id AS STRING), date ORDER BY date) AS col_b
It doesn't seem like the aggregate function is working properly: col_b still contains null values when I try this method. Am I misunderstanding how aggregate functions work?
You can use this:
MIN(col_a) OVER (PARTITION BY id) AS col_b
If you have one value per id, this will return that value.
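Note that converting id to a string is unnecessary. Also, you don't need a cumulative minimum, hence no ORDER BY. Partitioning by date as well is what leaves the nulls: each (id, date) combination is its own partition, so a row whose partition has no non-null col_a gets NULL.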
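The difference between the two partitionings can be sketched with Python's sqlite3 (SQLite 3.25+ assumed). The table and its rows are hypothetical, and the CAST is left out because SQLite has no STRING type and the cast is not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT, col_a INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", 5),
        (1, "2021-01-02", None),
        (2, "2021-01-01", None),
        (2, "2021-01-02", 7),
    ],
)

# Partitioning by (id, date): every row here is alone in its partition,
# so a NULL col_a stays NULL in col_b.
by_id_and_date = conn.execute(
    """
    SELECT id, date,
           MIN(col_a) OVER (PARTITION BY id, date ORDER BY date) AS col_b
    FROM my_table
    ORDER BY id, date
    """
).fetchall()

# Partitioning by id only: MIN() skips NULLs and spreads the id's value to every row.
by_id = conn.execute(
    """
    SELECT id, date,
           MIN(col_a) OVER (PARTITION BY id) AS col_b
    FROM my_table
    ORDER BY id, date
    """
).fetchall()

print(by_id_and_date)
# [(1, '2021-01-01', 5), (1, '2021-01-02', None), (2, '2021-01-01', None), (2, '2021-01-02', 7)]
print(by_id)
# [(1, '2021-01-01', 5), (1, '2021-01-02', 5), (2, '2021-01-01', 7), (2, '2021-01-02', 7)]
```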
Another option, using COALESCE:
select *, coalesce(col_a, (select min(col_a) from my_table b where a.id=b.id)) col_b
from my_table a;
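The COALESCE variant needs no window functions. A sketch with Python's sqlite3 and a hypothetical my_table: existing col_a values are kept, and only the NULLs are filled with the smallest col_a for the same id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT, col_a INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", 5),
        (1, "2021-01-02", None),
        (2, "2021-01-01", None),
        (2, "2021-01-02", 7),
    ],
)

# The correlated subquery runs per row and looks up the minimum for that row's id.
rows = conn.execute(
    """
    SELECT *,
           coalesce(col_a, (SELECT min(col_a) FROM my_table b WHERE a.id = b.id)) AS col_b
    FROM my_table a
    ORDER BY a.id, a.date
    """
).fetchall()
print(rows)
# [(1, '2021-01-01', 5, 5), (1, '2021-01-02', None, 5), (2, '2021-01-01', None, 7), (2, '2021-01-02', 7, 7)]
```

One design difference worth noting: unlike MIN() OVER (PARTITION BY id), this form leaves a non-null col_a untouched even when it is not the minimum for its id.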

How to use DISTINCT while selecting all columns, including a sequence-number column?

My goal is to avoid duplicates in one particular column while selecting all columns, but DISTINCT does not work because the seq_num column is also being selected.
Any idea how to make the query work?
In the example query below, seq_num is a unique key.
Edit: including sample data in a picture
select DISTINCT(name), seq_num from table_1;
Sample data image: https://i.stack.imgur.com/Y3NYn.jpg
For two columns, this query is enough:
SELECT name, min(seq_num)
FROM table_1
GROUP BY name
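A quick sketch of the two-column GROUP BY query, using Python's sqlite3 and a hypothetical table_1 with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (seq_num INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "ann"), (4, "bob"), (5, "cy")],
)

# One row per name, keeping the smallest seq_num seen for that name.
rows = conn.execute(
    "SELECT name, min(seq_num) FROM table_1 GROUP BY name ORDER BY name"
).fetchall()
print(rows)  # [('ann', 1), ('bob', 2), ('cy', 5)]
```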
For more columns, use the row_number() analytic function:
SELECT name, col1, col2, .... col500, seq_num
FROM (
SELECT t.*, row_number() over (partition by name order by seq_num) AS rn
FROM table_1 t
) x
WHERE rn = 1
The queries above return only one row per name: the one with the smallest seq_num.
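A sketch of the row_number() approach with Python's sqlite3 (SQLite 3.25+ assumed). The extra city column stands in for the "more columns" case and, like the data, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (seq_num INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "ann", "Oslo"),
        (2, "bob", "Rome"),
        (3, "ann", "Lima"),
        (4, "bob", "Pune"),
        (5, "cy", "Kyiv"),
    ],
)

# Number the rows within each name by seq_num and keep the first one,
# so every column of that row comes along, not just the aggregated ones.
rows = conn.execute(
    """
    SELECT name, city, seq_num
    FROM (
        SELECT t.*,
               row_number() OVER (PARTITION BY name ORDER BY seq_num) AS rn
        FROM table_1 t
    ) x
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()
print(rows)  # [('ann', 'Oslo', 1), ('bob', 'Rome', 2), ('cy', 'Kyiv', 5)]
```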
You cannot do what you want with DISTINCT. DISTINCT removes duplicate rows by comparing all selected columns, so once the unique seq_num is in the select list, every row is already distinct. If you provide some sample data showing what you have and what the query should return, we can help further.
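The point about DISTINCT comparing whole rows can be checked with a short sketch (Python's sqlite3, hypothetical table_1). The parentheses in DISTINCT(name) only wrap an expression; they do not limit DISTINCT to that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (seq_num INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "ann"), (4, "bob"), (5, "cy")],
)

# DISTINCT applies to the whole (name, seq_num) row, and seq_num is unique.
with_seq = conn.execute("SELECT DISTINCT(name), seq_num FROM table_1").fetchall()
# Without the unique column, duplicate names do collapse.
names_only = conn.execute("SELECT DISTINCT name FROM table_1").fetchall()

print(len(with_seq))    # 5
print(len(names_only))  # 3
```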

Selecting the first result from the output of a subquery

I want to select the first and last outcome from a subquery in Oracle.
I can't use ROWNUM, since I am using ORDER BY, which completely changes the sequence of ROWNUM.
Please suggest some solutions.
Thanks for the help.
Use keep if you have an aggregation query. That is what it is designed for. It looks something like this:
select x,
max(outcome) keep (dense_rank first order by datetime asc) as first_outcome,
max(outcome) keep (dense_rank first order by datetime desc) as last_outcome
from t
group by x;
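KEEP is Oracle-specific. For readers on other engines, the sketch below is a swapped-in emulation, not Oracle's KEEP: it computes DENSE_RANK() in both directions per group and then takes MAX over the rows ranked 1, which should mirror max(...) keep (dense_rank first ...), including taking the MAX among tied rows. It runs on SQLite 3.25+ via Python's sqlite3; table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, datetime TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "2021-01-01", "win"),
        ("a", "2021-01-02", "loss"),
        ("a", "2021-01-03", "draw"),
        ("b", "2021-02-01", "loss"),
        ("b", "2021-02-05", "win"),
    ],
)

# Rank rows oldest-first and newest-first within each x, then aggregate
# only the rows ranked 1 in each direction.
rows = conn.execute(
    """
    SELECT x,
           MAX(CASE WHEN dr_asc = 1 THEN outcome END)  AS first_outcome,
           MAX(CASE WHEN dr_desc = 1 THEN outcome END) AS last_outcome
    FROM (
        SELECT x, outcome,
               DENSE_RANK() OVER (PARTITION BY x ORDER BY datetime ASC)  AS dr_asc,
               DENSE_RANK() OVER (PARTITION BY x ORDER BY datetime DESC) AS dr_desc
        FROM t
    ) ranked
    GROUP BY x
    ORDER BY x
    """
).fetchall()
print(rows)  # [('a', 'win', 'draw'), ('b', 'loss', 'win')]
```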
Use first_value() and last_value() if there is no aggregation:
select t.*,
first_value(outcome) over (partition by x order by datetime) as first_outcome,
last_value(outcome) over (partition by x order by datetime rows between unbounded preceding and unbounded following) as last_outcome
from t;
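Why last_value() needs an explicit frame: when OVER has an ORDER BY, the default frame ends at the current row, so last_value() without a frame just returns the current row's value. A sketch with Python's sqlite3 (SQLite 3.25+ assumed, hypothetical rows) shows both forms side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, datetime TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "2021-01-01", "win"),
        ("a", "2021-01-02", "loss"),
        ("a", "2021-01-03", "draw"),
        ("b", "2021-02-01", "loss"),
        ("b", "2021-02-05", "win"),
    ],
)

rows = conn.execute(
    """
    SELECT x, datetime, outcome,
           first_value(outcome) OVER (PARTITION BY x ORDER BY datetime) AS first_outcome,
           -- Default frame stops at the current row, so this echoes outcome.
           last_value(outcome) OVER (PARTITION BY x ORDER BY datetime) AS last_default_frame,
           -- Widening the frame to the whole partition gives the real last value.
           last_value(outcome) OVER (
               PARTITION BY x ORDER BY datetime
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_outcome
    FROM t
    ORDER BY x, datetime
    """
).fetchall()

for row in rows:
    print(row)
# ('a', '2021-01-01', 'win', 'win', 'win', 'draw')
# ('a', '2021-01-02', 'loss', 'win', 'loss', 'draw')
# ('a', '2021-01-03', 'draw', 'win', 'draw', 'draw')
# ('b', '2021-02-01', 'loss', 'loss', 'loss', 'win')
# ('b', '2021-02-05', 'win', 'loss', 'win', 'win')
```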
You can't simply use ROWNUM because you want both the first and the last values; otherwise you could put your query in a subquery and filter by ROWNUM in the outer query. As it is, you need the ROW_NUMBER() analytic function, used twice (once with ORDER BY ... and once with ORDER BY ... DESC), so you can get both the first and the last outcome in a single outer query.
If ties are possible, you may prefer DENSE_RANK() to get all rows tied for first (or for last); ROW_NUMBER(), by contrast, returns just one of the tied rows, and which one is arbitrary (not guaranteed to be the same from run to run).
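A sketch of the double ROW_NUMBER() approach, plus a tie check, using Python's sqlite3 (SQLite 3.25+ assumed). Table t and its rows are hypothetical; group "b" deliberately has two rows sharing its earliest datetime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT, datetime TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "2021-01-01", "win"),
        ("a", "2021-01-02", "loss"),
        ("a", "2021-01-03", "draw"),
        ("b", "2021-02-01", "loss"),  # tied with the next row on datetime
        ("b", "2021-02-01", "draw"),
        ("b", "2021-02-05", "win"),
    ],
)

# Number rows in both directions within each x, then keep the first
# (rn_asc = 1) and last (rn_desc = 1) row of every group in one outer query.
first_and_last = conn.execute(
    """
    SELECT x, datetime, outcome
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY x ORDER BY datetime)      AS rn_asc,
               ROW_NUMBER() OVER (PARTITION BY x ORDER BY datetime DESC) AS rn_desc
        FROM t
    ) numbered
    WHERE rn_asc = 1 OR rn_desc = 1
    ORDER BY x, datetime
    """
).fetchall()

# With ties, ROW_NUMBER() marks exactly one tied row as 1; DENSE_RANK() marks all of them.
tie_counts = conn.execute(
    """
    SELECT SUM(rn = 1), SUM(dr = 1)
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY datetime) AS rn,
               DENSE_RANK() OVER (ORDER BY datetime) AS dr
        FROM t
        WHERE x = 'b'
    ) ranked
    """
).fetchone()

print(first_and_last)  # the "b" row dated 2021-02-01 may be 'loss' or 'draw'
print(tie_counts)      # (1, 2)
```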
If you want to see an example, provide sample data for your problem.
I solved this by using ROW_NUMBER() function with OVER(order by..).

Return only the newest rows from a BigQuery table with duplicate items

I have a table with many duplicate items: many rows with the same id, perhaps differing only in a requested_at column.
I'd like to do a select * from the table, but return only one row per id: the most recently requested one.
I've looked into group by id, but then I need an aggregate for each column. This is easy for requested_at (max(requested_at) as requested_at), but the others are tough.
How do I make sure I get the values for title, etc., that correspond to the most recently updated row?
I suggest a similar form that avoids a sort in the window function:
SELECT *
FROM (
SELECT
*,
MAX(<timestamp_column>)
OVER (PARTITION BY <id_column>)
AS max_timestamp
FROM <table>
)
WHERE <timestamp_column> = max_timestamp
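The MAX() OVER (PARTITION BY ...) filter can be sketched outside BigQuery with Python's sqlite3 (SQLite 3.25+ assumed). The requests table, its columns and rows are hypothetical stand-ins for <table>, <id_column> and <timestamp_column>; id 2 deliberately has two rows with the same latest timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, requested_at TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", "old"),
        (1, "2021-03-01", "new"),
        (2, "2021-02-01", "only"),
        (2, "2021-02-01", "dup"),  # same id and same timestamp: a tie
    ],
)

# Attach each id's latest timestamp to its rows, then keep the rows matching it.
rows = conn.execute(
    """
    SELECT id, requested_at, title
    FROM (
        SELECT *,
               MAX(requested_at) OVER (PARTITION BY id) AS max_timestamp
        FROM requests
    ) latest
    WHERE requested_at = max_timestamp
    ORDER BY id, title
    """
).fetchall()
print(rows)
# [(1, '2021-03-01', 'new'), (2, '2021-02-01', 'dup'), (2, '2021-02-01', 'only')]
```

Because the filter is an equality on the maximum, rows tied on the latest timestamp all survive, so this form can return more than one row per id.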
Try something like this:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY <id_column>
ORDER BY <timestamp_column> DESC)
row_number
FROM <table>
)
WHERE row_number = 1
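The ROW_NUMBER() form, sketched with Python's sqlite3 (SQLite 3.25+ assumed) on a hypothetical requests table. The helper column is named rn here, and listing columns by name in the outer SELECT drops it. If two rows for the same id share the latest timestamp, this form keeps exactly one of them, and which one is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, requested_at TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", "old"),
        (1, "2021-03-01", "new"),
        (2, "2021-02-01", "only"),
        (2, "2021-02-01", "dup"),  # tie on the latest timestamp for id 2
    ],
)

# Number each id's rows newest-first and keep row 1; naming the outer
# columns explicitly leaves the rn helper column out of the result.
rows = conn.execute(
    """
    SELECT id, requested_at, title
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY requested_at DESC) AS rn
        FROM requests
    ) numbered
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()
print(rows)  # id 2's row may carry either 'only' or 'dup'
```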
Note that this adds a row_number column, which you might not want. To drop it, select the individual columns by name in the outer SELECT statement (in BigQuery standard SQL, SELECT * EXCEPT (row_number) also works).
In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.
You may also need to enable allow_large_results, set a destination table, and disable flattening of results (if your schema has repeated fields).