ordering string before concatenation in hive - hive

We have 3 columns in a table. Id, timestamp and comments. one Id can have several comments associated with it. We have a requirement to pick up top three comments based on timestamp in desc format, which we have done by using Rank() function. Next requirement was to concatenate those 3 top comments against an ID with pipe separation. We have used concat_ws to achieve this. However, we see that those comments are not getting concatenated with desc order of timestamp. They are getting concatenated randomly.
is there a way to do the concatenation is the same order for desc timestamp order without using a custom udf?

Use row_number() over (partition by id order by timestamp desc) and give a number to top three comments for each id. Then write query to concatenate comments based on rno you gave earlier.

Related

What else do I need to add to my SQL query to bring related information in other columns if using MIN() GROUP BY

There is a table with the following column headers: indi_cod, ries_cod, date, time and level. Each ries_cod contains more than one indi_cod, and these indi_cod are random consecutive numbers.
Which SQL query would be appropriate to build if the aim is to find the smallest ID of each ries_cod, and at the same time bring its related information corresponding to date, time and level?
I tried the following query:
SELECT MIN (indi_cod) AS min_indi_cod
FROM my-project-01-354113.indi_cod.second_step
GROUP BY ries_cod
ORDER BY ries_cod
And, indeed, it presented me with the minimum value of indi_cod for each group of ries_cod, but I couldn't write the appropriate query to bring me the information from the date, time and level columns corresponding to each indi_cod.
I usually use some kind of ranking for this type of thing. you can use row_number, rank, or dense_rank depending on your rdbms. here is an example.
with t as(select a.*,
row_number() over (partition by ries_cod, order by indi_cod) as rn
from mytable)
select * from t where rn = 1
in addition if you are using oracle you can do this without two queries by using keep.
https://renenyffenegger.ch/notes/development/databases/SQL/select/group-by/keep-dense_rank/index
I think you just need to group by with the other columns
SELECT MIN (indi_cod), ries_cod, date, time, level AS min_indi_cod
FROM mytavke p
GROUP BY ries_cod, date, time, level
ORDER BY ries_cod

Redshift Postgres 8

I'm trying to write a query to solve a logical problem using Redshift Postgres 8.
Input column is a bunch of Order IDs and Step Group IDs and desired output is basically a sequence of the IDs as you can see in the screenshot.
If you could help me answer this question, that would be great, thanks!
This is a follow up question from
SQL Server - logical question - Get rank from IDs
Assuming you want to partition the table by the first two columns and then to number the rows in each partition, you can use this query:
SELECT *, ROW_NUMBER() OVER (PARTITION BY order_item_id, id ORDER BY order_item_id, id) FROM table;

Selecting distinct values from database

I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1 A1 T1
2 A2 T2
1 A1 T1
1 A1 T5
I want to select unique ParentActivityID's along with Timestamp. The time stamp can be the most recent one or the first one as is occurring in the table.
I tried to use DISTINCT but i came to realise that it dosen't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.
DISTINCT is a shorthand that works for a single column. When you have multiple columns, use GROUP BY:
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually i want only one one ParentActivityID. Your solution will give each pair of ParentActivityID and Timestamp. For e.g , if i have [1, T1], [2,T2], [1,T3], then i wanted the value as [1,T3] and [2,T2].
You need to decide what of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID
Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This will provide you the first timestamp and the most recent timestamp for each ParentActivityId that is present in your table. You can choose the ones you need as per your need.
"Group by" is what you need here. Just do "group by ParentActivityID" and tell that most recent timestamp along all rows with same ParentActivityID is needed for you:
SELECT ParentActivityID, MAX(Timestamp) FROM Table GROUP BY ParentActivityID
"Group by" operator is like taking rows from a table and putting them in a map with a key defined in group by clause (ParentActivityID in this example). You have to define how grouping by will handle rows with duplicate keys. For this you have various aggregate functions which you specify on columns you want to select but which are not part of the key (not listed in group by clause, think of them as a values in a map).
Some databases (like mysql) also allow you to select columns which are not part of the group by clause (not in a key) without applying aggregate function on them. In such case you will get some random value for this column (this is like blindly overwriting value in a map with new value every time). Still, SQL standard together with most databases out there will not allow you to do it. In such case you can use min(), max(), first() or last() aggregate function to work around it.
Use CTE for getting the latest row from your table based on parent id and you can choose the columns from the entire row of the output .
;With cte_parent
As
(SELECT ParentActivityId,ActivityId,TimeStamp
, ROW_NUMBER() OVER(PARTITION BY ParentActivityId ORDER BY TimeStamp desc) RNO
FROM YourTable )
SELECT *
FROM cte_parent
WHERE RNO =1

How to read the maximum date in this SQL query?

Below is the image of the query result. I want to show Tucson/Boulder only once based on maximum 'addressvalidfrom'. How can I create/modify the query?
If you do not want to use grouping (to persist the rest of the query) you can add a ROW_NUMBER column and filter it where it is 1.
Example
SELECT * FROM
( -- insert your query here with new line below in the select fields
, ROW_NUMBER() OVER (PARTITION BY CUST_RETAIL_CHANNEL_NAME ORDER BY addressvalidfrom DESC) AS Rnk
) D
WHERE D.Rnk=1
use a max for the addressvalidfrom field, and a group by for the other fields.
I can show you if you post the actual query.
http://www.w3schools.com/sql/sql_groupby.asp
where the aggregate is your max(addressvalidfrom)
Can you also post what you want to get as a result if possible.

SQL select first records of rows for specific column

I realize my title probably doesnt explain my situation very well, but I honestly have no idea how to word this.
I am using SQL to access a DB2 database.
Using my screenshot image 1 below as a reference:
column 1 has three instances of "U11124", with three different descriptions (column 2)
I would like this query to return the first instance of "U11124" and its description, but then also unique records for the other rows. image 2 shows my desired result.
image 1
image 2
----- EDIT ----
to answer some of the questions / posts:
technically, it does not need to be the first , just any single one of those records. the problem is that we have three descriptions, and only one needs to be shown, i am now told it does not matter which one.
SELECT STVNST, MAX(STDESC) FROM MY_TABLE GROUP BY STVNST;
In SQL Server:
select stvnst, stdesc
from (
select
stvnst, stdesc
row_number() over (order by stdesc partition by stvnst) row
from table
) a
where row = 1
This method has an advantage over a simple group by, in that it will also work when there's more than two columns in the table.
SELECT STVNST,FIRST(STDESC) from table group by STVNST ORDER BY what_you_want_first
All you need to do is use GROUP BY.
You say you want the first instance of the STDESC column? Well you can't guarntee the order of the rows without another column, however if you want to order by the highest ordered value the following will suffice:
SELECT STVNST, MAX(STDESC) FROM MY_TABLE GROUP BY STVNST;