Redshift Postgres 8 - sql

I'm trying to write a query to solve a logical problem using Redshift (Postgres 8).
The input columns are Order IDs and Step Group IDs, and the desired output is basically a sequence of the IDs, as you can see in the screenshot.
If you could help me answer this question, that would be great, thanks!
This is a follow-up question to
SQL Server - logical question - Get rank from IDs

Assuming you want to partition the table by the first two columns and then number the rows in each partition, you can use this query:
SELECT *, ROW_NUMBER() OVER (PARTITION BY order_item_id, id ORDER BY order_item_id, id) FROM table;
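As a rough illustration of how ROW_NUMBER() restarts in each partition, here is a minimal sketch run against an in-memory SQLite database rather than Redshift (it assumes SQLite 3.25 or later for window functions). The table name steps and the step_time column are hypothetical: the original table is only visible in the screenshot, and some column other than the partition keys is needed to make the numbering deterministic.

```python
import sqlite3

# Hypothetical data: the real table is only shown in the screenshot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steps (order_item_id INTEGER, id INTEGER, step_time INTEGER)"
)
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?)",
    [(1, 10, 105), (1, 10, 100), (1, 20, 50), (2, 10, 7), (2, 10, 9), (2, 10, 8)],
)

# Number the rows inside each (order_item_id, id) partition.
# step_time is an assumed ordering column, not part of the original question.
numbered = conn.execute(
    """
    SELECT order_item_id, id, step_time,
           ROW_NUMBER() OVER (
               PARTITION BY order_item_id, id
               ORDER BY step_time
           ) AS rn
    FROM steps
    ORDER BY order_item_id, id, rn
    """
).fetchall()

for row in numbered:
    print(row)
```

Ordering by the same columns used in PARTITION BY leaves the order inside each partition up to the engine, which is why the sketch orders by a separate column.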

Related

SQL Query - Rank showing only 1 rank for all records

I am trying to rank rows based on a calculation over existing columns. I tried the SQL RANK() function, but it returns 1 for every row even though the values of the ORDER BY column (score) differ. Details:
1. qu_point and ti_point are calculated columns.
2. score is also a derived column: simply the sum of the two columns mentioned in point 1.
I used the following SQL query:
use EFR_DB
GO
select d.serial, d.question_set_id, d.correct_answers, d.total_questions, d.time_taken_seconds, q.total_time_in_secs,
(cast(d.correct_answers as float)/d.total_questions) as qu_point, ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs) as ti_point,
(((cast(d.correct_answers as float)/d.total_questions)*2) + ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs)) as score,
rank() over (partition by d.question_set_id order by score)
from daily_quiz_record d join Question_set q
on q.question_set_id=d.question_set_id
Please help me do the ranking, partitioned by question_set_id and ranked on the basis of the score.
Screenshot attached for your reference.
You can't use an alias defined in the select clause within that same clause. I suppose one of your tables has a column called score, otherwise your query would raise an error, so that existing column is being used for ordering instead of the computed value.
Since your expression is lengthy, it is simpler to turn the query into a subquery and rank in the outer query:
select
t.*,
rank() over(partition by question_set_id order by score) rn
from (
-- your existing query (without rank)
) t
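To make the subquery pattern concrete, here is a minimal sketch using an in-memory SQLite database instead of SQL Server (assumes SQLite 3.25 or later). The sample rows are invented, and the score expression is cut down to the correct-answers part only, so treat it as an illustration of the pattern rather than the asker's full query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily_quiz_record (
        serial INTEGER,
        question_set_id INTEGER,
        correct_answers INTEGER,
        total_questions INTEGER
    )
    """
)
# Invented sample rows for illustration only.
conn.executemany(
    "INSERT INTO daily_quiz_record VALUES (?, ?, ?, ?)",
    [(1, 1, 8, 10), (2, 1, 5, 10), (3, 1, 8, 10), (4, 2, 3, 10)],
)

# The inner query computes score; the outer query can then rank on it,
# because the alias is an ordinary column of the derived table t.
ranked = conn.execute(
    """
    SELECT t.serial, t.question_set_id,
           RANK() OVER (PARTITION BY t.question_set_id ORDER BY t.score) AS rn
    FROM (
        SELECT serial, question_set_id,
               CAST(correct_answers AS REAL) / total_questions * 2 AS score
        FROM daily_quiz_record
    ) t
    ORDER BY t.question_set_id, rn, t.serial
    """
).fetchall()

for row in ranked:
    print(row)
```

The sketch keeps the ascending ORDER BY score from the answer, so the lowest score gets rank 1; adding DESC would put the highest score first. Rows with equal scores share a rank.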

Deduplicate rows in complex schema in a bigquery partition

I have read some threads but I know too little sql to solve my problem.
I have a table with a complex schema with records and nested fields.
Below you see a query which finds the exact row that I need to deduplicate.
SELECT *
FROM `my-data-project-214805.rfid_data.rfid_data_table`
WHERE DATE(_PARTITIONTIME) = "2020-02-07"
AND DetectorDataMessage.Header.MessageID ='478993053'
DetectorDataMessage.Header.MessageID is supposed to be unique.
How can I delete one of these rows? (There are two.)
If possible, I would like to deduplicate the whole table, but it's partitioned and I can't get it right. I tried the suggestions in the threads below, but I get this error: Column DetectorDataMessage of type STRUCT cannot be used in...
Threads of interest:
Deduplicate rows in a BigQuery partition
Delete duplicate rows from a BigQuery table
Any suggestions? Can you guide me in the right direction?
Try using a MERGE to delete the existing duplicate rows and insert a single identical one. In this case I'm going for a specific date and ID, as in the question:
MERGE `temp.many_random` t
USING (
  # choose a single row to replace the duplicates
  SELECT a.*
  FROM (
    SELECT ANY_VALUE(a) a
    FROM `temp.many_random` a
    WHERE DATE(_PARTITIONTIME)='2018-10-01'
    AND DetectorDataMessage.Header.MessageID ='478993053'
    GROUP BY _PARTITIONTIME, DetectorDataMessage.Header.MessageID
  )
)
ON FALSE
WHEN NOT MATCHED BY SOURCE
  # delete the duplicates
  AND DATE(_PARTITIONTIME)='2018-10-01'
  AND DetectorDataMessage.Header.MessageID ='478993053'
THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
Based on this answer:
Deduplicate rows in a BigQuery partition
If all of the values in the duplicate rows are the same, just use 'SELECT distinct'.
If not, I would use the ROW_NUMBER() function to create a rank for each unique index, and then just choose the first rank.
I don't know what your columns are, but here's an example:
WITH subquery AS (
  SELECT
    MessageId,
    ROW_NUMBER() OVER (PARTITION BY MessageId ORDER BY MessageId ASC) AS rank
  FROM your_table  -- placeholder: replace with your table name
)
SELECT *
FROM subquery
WHERE rank = 1
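The ROW_NUMBER() idea above can be tried locally with a small sketch. It uses an in-memory SQLite database (3.25 or later), not BigQuery, so nested STRUCT fields and _PARTITIONTIME are not modelled. The flat table messages and the columns payload and received_at are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table standing in for the nested BigQuery schema.
conn.execute(
    "CREATE TABLE messages (message_id TEXT, payload TEXT, received_at INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [("478993053", "a", 1), ("478993053", "a", 2), ("111", "b", 1)],
)

# Number the rows per message_id and keep only the first one in each group.
deduplicated = conn.execute(
    """
    WITH numbered AS (
        SELECT message_id, payload, received_at,
               ROW_NUMBER() OVER (
                   PARTITION BY message_id
                   ORDER BY received_at
               ) AS rn
        FROM messages
    )
    SELECT message_id, payload, received_at
    FROM numbered
    WHERE rn = 1
    ORDER BY message_id
    """
).fetchall()

for row in deduplicated:
    print(row)
```

This only selects the deduplicated rows. Writing them back to the table is a separate step, which is what the MERGE statement earlier in the thread does.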

Hive where clause on Nth field

I'm writing a simple select query in Hive.
What I'm trying to do is filter by the value of the Nth (e.g. 5th) field. I'm not sure what its name will be, but I do know its position.
Thanks!
I have no knowledge of Hive, but as I understood from this answer, it would be somewhat similar to SQL Server.
SELECT field
FROM (
  SELECT
    field,
    row_number() OVER (ORDER BY [your order clause]) AS rn
  FROM table_name
) t
WHERE rn = 5

How to read the maximum date in this SQL query?

Below is the image of the query result. I want to show Tucson/Boulder only once based on maximum 'addressvalidfrom'. How can I create/modify the query?
If you do not want to use grouping (so the rest of the query stays as it is), you can add a ROW_NUMBER column and keep only the rows where it is 1.
Example
SELECT * FROM
( -- insert your query here with new line below in the select fields
, ROW_NUMBER() OVER (PARTITION BY CUST_RETAIL_CHANNEL_NAME ORDER BY addressvalidfrom DESC) AS Rnk
) D
WHERE D.Rnk=1
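Since the original query is only available as an image, here is a self-contained sketch of the same pattern on made-up data, using an in-memory SQLite database (3.25 or later). The table customer_address and the street column are hypothetical, and treating Tucson and Boulder as CUST_RETAIL_CHANNEL_NAME values is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real query from the question was not posted as text.
conn.execute(
    """
    CREATE TABLE customer_address (
        cust_retail_channel_name TEXT,
        street TEXT,
        addressvalidfrom TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO customer_address VALUES (?, ?, ?)",
    [
        ("Tucson", "1 Old Rd", "2019-01-01"),
        ("Tucson", "2 New Rd", "2020-06-01"),
        ("Boulder", "3 Pine St", "2018-03-15"),
        ("Boulder", "4 Oak St", "2017-01-01"),
    ],
)

# Rank rows per channel name, newest addressvalidfrom first, and keep rank 1.
latest = conn.execute(
    """
    SELECT cust_retail_channel_name, street, addressvalidfrom
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (
                   PARTITION BY cust_retail_channel_name
                   ORDER BY addressvalidfrom DESC
               ) AS Rnk
        FROM customer_address a
    ) D
    WHERE D.Rnk = 1
    ORDER BY cust_retail_channel_name
    """
).fetchall()

for row in latest:
    print(row)
```

The dates are stored as ISO 8601 text here, which sorts correctly as strings. A real date column would be ordered the same way.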
Use MAX on the addressvalidfrom field and GROUP BY the other fields.
I can show you how if you post the actual query.
http://www.w3schools.com/sql/sql_groupby.asp
Here the aggregate is MAX(addressvalidfrom).
Can you also post the result you want to get, if possible?

SQL select first records of rows for specific column

I realize my title probably doesn't explain my situation very well, but I honestly have no idea how to word this.
I am using SQL to access a DB2 database.
Using my screenshot image 1 below as a reference:
column 1 has three instances of "U11124", with three different descriptions (column 2)
I would like this query to return the first instance of "U11124" and its description, but then also unique records for the other rows. image 2 shows my desired result.
image 1
image 2
----- EDIT ----
to answer some of the questions / posts:
Technically, it does not need to be the first, just any single one of those records. The problem is that we have three descriptions and only one needs to be shown; I am now told it does not matter which one.
SELECT STVNST, MAX(STDESC) FROM MY_TABLE GROUP BY STVNST;
In SQL Server:
select stvnst, stdesc
from (
  select
    stvnst, stdesc,
    row_number() over (partition by stvnst order by stdesc) rn
  from my_table
) a
where rn = 1
This method has an advantage over a simple GROUP BY in that it also works when there are more than two columns in the table.
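A small sketch of why the extra columns matter, run on an in-memory SQLite database (3.25 or later) rather than DB2 or SQL Server. The third column stqty and the sample values are hypothetical. With ROW_NUMBER(), every value in a result row comes from one source row, while aggregating each column separately can combine values from different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# stqty is a hypothetical third column added for illustration.
conn.execute("CREATE TABLE my_table (stvnst TEXT, stdesc TEXT, stqty INTEGER)")
source_rows = [
    ("U11124", "Bolt", 5),
    ("U11124", "Anchor", 9),
    ("U11124", "Clamp", 1),
    ("U22222", "Washer", 3),
]
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", source_rows)

# ROW_NUMBER keeps one whole row per stvnst.
row_based = conn.execute(
    """
    SELECT stvnst, stdesc, stqty
    FROM (
        SELECT stvnst, stdesc, stqty,
               ROW_NUMBER() OVER (PARTITION BY stvnst ORDER BY stdesc) AS rn
        FROM my_table
    ) a
    WHERE rn = 1
    ORDER BY stvnst
    """
).fetchall()

# GROUP BY with one aggregate per column picks each value independently.
grouped = conn.execute(
    """
    SELECT stvnst, MAX(stdesc), MAX(stqty)
    FROM my_table
    GROUP BY stvnst
    ORDER BY stvnst
    """
).fetchall()

print(row_based)
print(grouped)
```

In this data the grouped result for U11124 pairs the description Clamp with the quantity 9, a combination that does not exist in any single source row.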
SELECT STVNST,FIRST(STDESC) from table group by STVNST ORDER BY what_you_want_first
All you need to do is use GROUP BY.
You say you want the first instance of the STDESC column? Well, you can't guarantee the order of the rows without another column. However, if you want the highest ordered value, the following will suffice:
SELECT STVNST, MAX(STDESC) FROM MY_TABLE GROUP BY STVNST;