Copy column values using a PARTITION BY statement in BigQuery

In BigQuery, I am trying to copy column values into other rows using a PARTITION BY statement anchored to a particular ID number.
Here is an example:
Right now, I am trying to use:
MIN(col_a) OVER (PARTITION BY CAST(id AS STRING), date ORDER BY date) AS col_b
It doesn't seem like the aggregate function is working properly: col_b still has NULL values when I try this method. Am I misunderstanding how aggregate functions work?

You can use this:
MIN(col_a) OVER (PARTITION BY id) AS col_b
If you have one value per id, this will return that value.
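Note that converting id to a string is unnecessary. Also, you don't need a cumulative minimum, hence no ORDER BY. Partitioning by date as well puts each (id, date) combination in its own partition, so a row with a NULL col_a never sees the value stored on another date.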
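To see the difference between the two partitions, here is a minimal sketch that runs both window expressions side by side. It uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption; it needs SQLite 3.25 or later for window functions), and the table name my_table and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: one non-NULL col_a per id, spread over several dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, date TEXT, col_a TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "x"),
        (1, "2020-01-02", None),
        (1, "2020-01-03", None),
        (2, "2020-01-01", None),
        (2, "2020-01-02", "y"),
    ],
)

# per_id_and_date mirrors the query from the question,
# per_id mirrors the query from the answer.
query = """
SELECT id, date,
       MIN(col_a) OVER (PARTITION BY id, date ORDER BY date) AS per_id_and_date,
       MIN(col_a) OVER (PARTITION BY id) AS per_id
FROM my_table
ORDER BY id, date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, per_id_and_date stays NULL wherever col_a was NULL, while per_id carries the single value of each id to all of its rows.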

Another option uses COALESCE with a correlated subquery:
select *, coalesce(col_a, (select min(col_a) from my_table b where a.id=b.id)) col_b
from my_table a;

Related

How can we implement First() function used in Informatica in SQL?

I have an aggregate transformation in Informatica where Description1 column=First(Description).
I want to implement the same in a SQL query. Can anyone suggest how to do this?
Sample dataset
Table name: ABC
Name          Expression
ID            ID
DESCRIPTION
DESCRIPTION1  FIRST(DESCRIPTION1)
INSERT_DATE
INSERT_DATE1  FIRST(INSERT_DATE)
RANK
RANK1         FIRST(RANK)
Please use the query below:
select max(Description1) from Router_Transform;
If you are using a sorter transformation in your mapping, add an ORDER BY clause:
select max(Description1) from Router_Transform order by column_name;
If you want the row with the smallest id, then you can sort the resultset and retain just one row. In standard SQL, you would typically use a row-limiting clause for this:
select t.*
from mytable t
order by id
fetch first row only
Note that not all databases support this syntax (but almost all have alternatives for it).
On the other hand, if you want to add more columns to each row that display the "first" value for each column, then you would use the window function first_value():
select
t.*,
first_value(description) over(order by id) description1,
first_value(insert_date) over(order by id) insert_date1,
first_value(rank) over(order by id) rank1
from mytable t
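As a quick check of the first_value() approach, the sketch below runs it against an in-memory SQLite database through Python's sqlite3 module (an assumption standing in for the asker's database; window functions need SQLite 3.25 or later). The sample rows are hypothetical, and the rank column is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, description TEXT, insert_date TEXT)")
# Hypothetical rows, deliberately inserted out of id order.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (3, "third", "2021-03-01"),
        (1, "first", "2021-01-01"),
        (2, "second", "2021-02-01"),
    ],
)

# Every row receives the values taken from the row with the smallest id.
rows = conn.execute(
    """
    SELECT t.id,
           first_value(description) OVER (ORDER BY id) AS description1,
           first_value(insert_date) OVER (ORDER BY id) AS insert_date1
    FROM mytable t
    ORDER BY t.id
    """
).fetchall()
for row in rows:
    print(row)
```

Each of the three rows comes back with description1 = 'first' and insert_date1 = '2021-01-01', regardless of insertion order.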

How to use DISTINCT while selecting all columns including a sequence number column?

My query needs to avoid duplicates in a particular column while selecting all columns, but DISTINCT is not working because the sequence number column is also being selected.
Any idea how to make the query work?
In the example query below, seq_num is a unique key.
select DISTINCT(name), seq_num from table_1;
Edit: sample data is included in this picture: https://i.stack.imgur.com/Y3NYn.jpg
For two columns this query will be enough:
SELECT name, min(seq_num)
FROM table_1
GROUP BY name
For more columns, use the row_number analytic function:
SELECT name, col1, col2, .... col500, seq_num
FROM (
SELECT t.*, row_number() over (partition by name order by seq_num ) As rn
FROM table_1 t
)
WHERE rn = 1
The above queries pick only one row with a given name and the smallest seq_num value for each name.
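A minimal sketch of the row_number() variant, run through Python's sqlite3 module (an assumption; the original question does not say which database is used, and window functions need SQLite 3.25 or later). The column col1 and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (name TEXT, col1 TEXT, seq_num INTEGER)")
# Hypothetical rows: names repeat, seq_num is unique.
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [("a", "p", 1), ("a", "q", 2), ("b", "r", 3), ("b", "s", 4), ("c", "t", 5)],
)

# Number the rows of each name by seq_num and keep only the first one.
rows = conn.execute(
    """
    SELECT name, col1, seq_num
    FROM (
        SELECT t.*, row_number() OVER (PARTITION BY name ORDER BY seq_num) AS rn
        FROM table_1 t
    )
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()
for row in rows:
    print(row)
```

One row per name survives, and the other columns come from the same row as the smallest seq_num, which a plain GROUP BY with min() on every column would not guarantee.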
You cannot do what you want with DISTINCT. Please read more about the DISTINCT clause and query result sets, and you will see that DISTINCT is not suitable for this problem. If you provide some sample data and the output the query should show, we can help further.

Selecting distinct values from database

I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1 A1 T1
2 A2 T2
1 A1 T1
1 A1 T5
I want to select unique ParentActivityID's along with Timestamp. The time stamp can be the most recent one or the first one as is occurring in the table.
I tried to use DISTINCT but I came to realise that it doesn't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.
DISTINCT applies to the whole select list rather than to a single column. When you have multiple columns, use GROUP BY:
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually, I want only one row per ParentActivityID. Your solution will give each pair of ParentActivityID and Timestamp. For example, if I have [1, T1], [2, T2], [1, T3], then I want the values [1, T3] and [2, T2].
You need to decide what of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID
Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This will give you the first timestamp and the most recent timestamp for each ParentActivityId present in your table. You can pick whichever one you need.
GROUP BY is what you need here. Group by ParentActivityID and state that you want the most recent timestamp across all rows with the same ParentActivityID:
SELECT ParentActivityID, MAX(Timestamp) FROM MyTable GROUP BY ParentActivityID
The GROUP BY operator is like taking rows from a table and putting them in a map whose key is defined in the GROUP BY clause (ParentActivityID in this example). You have to define how the grouping handles rows with duplicate keys. For this there are various aggregate functions, which you apply to the columns you want to select that are not part of the key (not listed in the GROUP BY clause; think of them as the values in a map).
Some databases (like MySQL) also allow you to select columns that are not part of the GROUP BY clause (not in the key) without applying an aggregate function to them. In that case you get an arbitrary value for the column (this is like blindly overwriting the value in a map with a new value every time). Still, the SQL standard and most databases will not allow this. In that case you can use an aggregate function such as min(), max(), first() or last() (where the database provides them) to work around it.
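The map analogy can be made concrete with a small sketch. It builds the "map" by hand as a Python dict and compares it with the GROUP BY result from an in-memory SQLite database (SQLite is an assumption here; the rows are the ones from the question, with T1, T2 and T5 treated as plain strings).

```python
import sqlite3

rows = [(1, "A1", "T1"), (2, "A2", "T2"), (1, "A1", "T1"), (1, "A1", "T5")]

# Map analogy: the GROUP BY key is the dict key, and MAX decides
# which value survives when the same key shows up again.
latest = {}
for parent_id, _activity_id, ts in rows:
    latest[parent_id] = ts if parent_id not in latest else max(latest[parent_id], ts)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ParentActivityID INTEGER, ActivityID TEXT, Timestamp TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)

grouped = conn.execute(
    """
    SELECT ParentActivityID, MAX(Timestamp)
    FROM MyTable
    GROUP BY ParentActivityID
    ORDER BY ParentActivityID
    """
).fetchall()

print(latest)
print(grouped)
```

Both routes end up with one entry per ParentActivityID holding the largest timestamp; swapping max for min gives the earliest one instead.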
Use a CTE to get the latest row from your table for each parent ID; you can then choose any columns from the full row in the output.
;With cte_parent
As
(SELECT ParentActivityId,ActivityId,TimeStamp
, ROW_NUMBER() OVER(PARTITION BY ParentActivityId ORDER BY TimeStamp desc) RNO
FROM YourTable )
SELECT *
FROM cte_parent
WHERE RNO =1

Return only the newest rows from a BigQuery table with duplicate items

I have a table with many duplicate items – Many rows with the same id, perhaps with the only difference being a requested_at column.
I'd like to do a select * from the table, but only return one row with the same id – the most recently requested.
I've looked into group by id but then I need to do an aggregate for each column. This is easy with requested_at – max(requested_at) as requested_at – but the others are tough.
How do I make sure I get the value for title, etc that corresponds to that most recently updated row?
I suggest a similar form that avoids a sort in the window function:
SELECT *
FROM (
SELECT
*,
MAX(<timestamp_column>)
OVER (PARTITION BY <id_column>)
AS max_timestamp,
FROM <table>
)
WHERE <timestamp_column> = max_timestamp
Try something like this:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY <id_column>
ORDER BY <timestamp_column> DESC)
row_number,
FROM <table>
)
WHERE row_number = 1
Note it will add a row_number column, which you might not want. To fix this, you can select individual columns by name in the outer select statement.
In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.
And, you will also want to use allow_large_results, set a destination table, and specify no flattening of results (if you have a schema with repeated fields).
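One practical difference between the MAX(...) OVER form and the ROW_NUMBER() form shows up when two rows of the same id share the latest timestamp. The sketch below illustrates this with Python's sqlite3 module as a stand-in for BigQuery (an assumption; window functions need SQLite 3.25 or later). The table name requests and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, title TEXT, requested_at TEXT)")
# Hypothetical rows: id 1 has two rows tied on the latest requested_at.
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        (1, "a", "2020-01-01 10:00"),
        (1, "b", "2020-01-02 10:00"),
        (1, "c", "2020-01-02 10:00"),
        (2, "d", "2020-01-01 10:00"),
    ],
)

# Form 1: compare each row with the per-id maximum (keeps all tied rows).
max_rows = conn.execute(
    """
    SELECT id, title, requested_at
    FROM (
        SELECT *, MAX(requested_at) OVER (PARTITION BY id) AS max_timestamp
        FROM requests
    )
    WHERE requested_at = max_timestamp
    ORDER BY id, title
    """
).fetchall()

# Form 2: number the rows per id, newest first (keeps exactly one row per id).
rn_rows = conn.execute(
    """
    SELECT id, title, requested_at
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY requested_at DESC) AS rn
        FROM requests
    )
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(max_rows)
print(rn_rows)
```

The MAX form returns both tied rows for id 1, while the ROW_NUMBER form returns a single row per id and picks one of the tied rows arbitrarily. If the timestamps are unique per id, the two forms return the same rows.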

Populate the maximum value without a join oracle

I have a table with 3 columns, VALUE1, step and date, with values as given below. Now I want a view with a 4th column like in the picture below. For example, the maximum value of date for value1 '1' is 13.3.2014 and its corresponding step is C. So, the value of max(step-date) for '1' should be 'C', and so on. I want to do this without performing a join on the table itself. I hope my requirement is clear. Thanks in advance.
You want to use analytical functions:
first_value(step) over (partition by value1 order by date_ desc)
first_value(step) tells Oracle that you want to get the first value of a list of steps. The elements and their order are specified in the parentheses after the over clause.
The "lists" are created with partition by value1. Since there are two different values of value1, two lists are created. The list belonging to the first value consists of the elements A, B and C, the list belonging to the second of the elements A and B. These lists are ordered with the order by date_ desc clause.
Then, Oracle can "return" the first element of these lists.
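The partitioned first_value() call can be tried out with a small sketch. It runs on SQLite through Python's sqlite3 module rather than on Oracle (an assumption made so the example is self-contained; window functions need SQLite 3.25 or later). The table name t and the dates are hypothetical, chosen so that step C has the latest date for value1 = 1 as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value1 INTEGER, step TEXT, date_ TEXT)")
# Hypothetical rows: A, B, C for value1 = 1 and A, B for value1 = 2.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", "2014-03-11"),
        (1, "B", "2014-03-12"),
        (1, "C", "2014-03-13"),
        (2, "A", "2014-03-11"),
        (2, "B", "2014-03-12"),
    ],
)

# For each value1, take the step of the row with the latest date_.
rows = conn.execute(
    """
    SELECT value1, step, date_,
           first_value(step) OVER (PARTITION BY value1 ORDER BY date_ DESC) AS max_step
    FROM t
    ORDER BY value1, date_
    """
).fetchall()
for row in rows:
    print(row)
```

Every row of value1 = 1 gets max_step 'C' and every row of value1 = 2 gets 'B', without joining the table to itself.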
You want the last value based on date. You can do that with Oracle using an analytic function:
select value1, step, date_,
max(step) keep (dense_rank last order by date_) over (partition by value1) as maxval
from your_table t;
The important part here is the keep part and the part after that. The keep (dense_rank last order by date_) says to get the last value by date. The over (partition by value1) says to do that within groups where value1 has the same value. (The column is written as date_ and the table as your_table here because DATE and TABLE are reserved words in Oracle.)
Sigh, they beat me to the punch....