I have this table t1 in Databricks, as below.
Can you help me write a query to get this result?
Sort the table by row_num descending first, then look at the "Event Label" column: if "Event Label" is "Hire", the Result is 0, and it stays 0 for the following rows until a row with "Event Label" = "Rehire" is encountered; there the Result increases by 1 to become 1, and stays 1 for the following rows until the next "Rehire", where it increases by 1 again to become 2, and so on.
I tried a few approaches, but no luck.
In pure SQL, a cumulative sum is sufficient:
SELECT
  *,
  SUM(CASE WHEN `Event Label` = 'Rehire' THEN 1 ELSE 0 END)
    OVER (PARTITION BY user_id ORDER BY row_num DESC)
    AS Result
FROM t1
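Since I can't run this against your exact table, here is the same pattern sanity-checked with Python's sqlite3 (the column names user_id / row_num / event_label and the sample rows are my guesses from the question, not your real schema):

```python
import sqlite3

# Hypothetical single-employee event history, to be read in row_num
# descending order, as described in the question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (user_id INT, row_num INT, event_label TEXT)")
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 5, "Hire"),
        (1, 4, "Promotion"),
        (1, 3, "Rehire"),
        (1, 2, "Transfer"),
        (1, 1, "Rehire"),
    ],
)

# Cumulative count of 'Rehire' rows, walking row_num from high to low:
# the counter bumps on each 'Rehire' row and carries forward after it.
rows = con.execute("""
    SELECT row_num,
           event_label,
           SUM(CASE WHEN event_label = 'Rehire' THEN 1 ELSE 0 END)
             OVER (PARTITION BY user_id ORDER BY row_num DESC) AS result
    FROM t1
    ORDER BY row_num DESC
""").fetchall()

results = [r[2] for r in rows]
print(results)  # [0, 0, 1, 1, 2]
```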
I was able to remove the duplicate rows, but I would like to remove them based on one more constraint: among the duplicates, I want to keep the row with the fewest NULL values.
Original Table
I ran this SQL Server query:
WITH CTE AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (PARTITION BY Premise_ID ORDER BY Premise_ID)
    FROM sde.Premise_Test
)
DELETE FROM CTE WHERE RN > 1;
Result:
But I want to get this result
I modified the SQL script as per Aaron's comment, but the result is still the same. DB Fiddle is highlighting the NULL in IS NULL.
Update the ROW_NUMBER() function like this (no, there is no shorter way):
RN = ROW_NUMBER() OVER (
PARTITION BY Premise_ID
ORDER BY Premise_ID,
CASE WHEN Division IS NULL THEN 1 ELSE 0 END
+ CASE WHEN InstallationType IS NULL THEN 1 ELSE 0 END
+ CASE WHEN OtherColumn IS NULL THEN 1 ELSE 0 END
...
)
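To see the "fewest NULLs first" ordering in action, here is a small sketch using Python's sqlite3. SQLite cannot DELETE through a CTE (that is SQL Server-specific), so it selects the surviving rows instead; the column names and data are illustrative, not your real table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE premise_test (premise_id INT, division TEXT, installation_type TEXT)"
)
con.executemany(
    "INSERT INTO premise_test VALUES (?, ?, ?)",
    [
        (1, None, None),       # 2 NULLs -> loses
        (1, "North", None),    # 1 NULL  -> survives
        (2, "South", "Gas"),   # 0 NULLs -> survives
        (2, None, "Gas"),      # 1 NULL  -> loses
    ],
)

# Rank each duplicate group by its NULL count; rn = 1 is the keeper.
survivors = con.execute("""
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY premise_id
                   ORDER BY (CASE WHEN division IS NULL THEN 1 ELSE 0 END)
                          + (CASE WHEN installation_type IS NULL THEN 1 ELSE 0 END)
               ) AS rn
        FROM premise_test
    )
    SELECT premise_id, division, installation_type
    FROM cte
    WHERE rn = 1
    ORDER BY premise_id
""").fetchall()
print(survivors)  # [(1, 'North', None), (2, 'South', 'Gas')]
```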
I have a numeric column and all I want is the maximum value in that column that does NOT exceed a certain number. I am doing this along with a group by statement, using the MAX function.
So basically if for each group, the column is 1, 2, 3, 4, 5, and I want the maximum that does not exceed 4, then in this case, the maximum for this group would be 4.
However, if the column equals 5, 6, 7, 8, then since all values exceed 4, I … actually don't care, this won't end up being displayed, so just return anything.
How do I do this? Using SQL/Oracle.
You can use conditional aggregation as follows:
SELECT CASE WHEN COUNT(CASE WHEN col > 4 THEN 1 END) = COUNT(*)
            THEN MAX(col)
            ELSE MAX(CASE WHEN col <= 4 THEN col END)
       END AS res_
FROM your_table t
select max(case when col > 4 then 0 else col end) from table1
(This returns 0 for a group where every value exceeds 4, which is acceptable here since you said you don't care what is returned in that case.)
If that is all that the query needs to do, it is best to filter the rows with col > 4 before aggregation, in a where clause. This will reduce the amount of work done by the aggregation itself, which is the most expensive operation in the whole query.
As a side effect, "groups" where all values in the column are > 4 will not be included at all in the output. For some reporting tasks this would be a problem, but you said in your case you wouldn't show anything in the output for those groups anyway.
So, you could do something like this:
select agg_col1, agg_col2, ..., max(col) as max_col_up_to_4
from your_table
WHERE col <= 4 -- DO THE FILTERING HERE!
group by agg_col1, agg_col2, ...
;
(Here agg_col1, agg_col2, ... are, obviously, the columns by which you group for your aggregation.)
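A quick sanity check of the filter-then-aggregate approach, sketched with Python's sqlite3 (the grouping column grp and the data are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (grp TEXT, col INT)")
con.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", 1), ("a", 3), ("a", 5), ("b", 5), ("b", 6)],
)

# Filter out col > 4 BEFORE aggregating; groups where every value
# exceeds 4 (here, 'b') simply vanish from the output.
rows = con.execute("""
    SELECT grp, MAX(col) AS max_col_up_to_4
    FROM your_table
    WHERE col <= 4
    GROUP BY grp
    ORDER BY grp
""").fetchall()
print(rows)  # [('a', 3)]
```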
I have this SQL challenge
I have a table which looks like that (I've just got a message from the site saying I can't post the image right here, so please use the link)
The challenge is to identify the nearest previous row where NEW_BATCH_FLAG = 1 and to "spread" that row's transaction date over all the following rows where NEW_BATCH_FLAG = 0 (until we meet the next row with NEW_BATCH_FLAG = 1).
As you can (hopefully :)) see in the image, the first row has NEW_BATCH_FLAG = 1, so I should "spread" its date (1-Jun-2020) over the next 2 rows where NEW_BATCH_FLAG = 0; the problem is that for the first of those I have to go one row back, and for the second one, two rows back.
So the challenge is to calculate, for each given row, how many rows I have to go back until I hit the nearest NEW_BATCH_FLAG = 1 row.
This "distance" would then be used as the offset argument of the LAG function.
One method is to use a scalar sub-query, but it won't be very efficient:
select t1.transaction_date,
t1.new_batch_flag,
case new_batch_flag
when 1 then t1.transaction_date
else (select max(t2.transaction_date)
from the_table t2
where t2.new_batch_flag = 1
and t2.transaction_date < t1.transaction_date)
end as start_date_of_batch
from the_table t1;
Online example
With Postgres I would do it like this:
select transaction_date,
new_batch_flag,
case new_batch_flag
when 1 then transaction_date
else max(transaction_date) filter (where new_batch_flag = 1) over (order by transaction_date)
end as start_of_batch
from the_table;
Online example
I think you just want a cumulative max with some conditional logic:
select t.*,
max(case when new_batch_flag = 1 then transaction_date end) over (order by transaction_date) as start_date_of_batch
from t;
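All of the answers above hinge on the same idea, a cumulative conditional MAX, so no explicit LAG offset is needed. A minimal sketch with Python's sqlite3 (table name and dates invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (transaction_date TEXT, new_batch_flag INT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2020-06-01", 1),
        ("2020-06-02", 0),
        ("2020-06-03", 0),
        ("2020-06-04", 1),
        ("2020-06-05", 0),
    ],
)

# Running MAX of the date, counting only flag = 1 rows: each row picks up
# the date of the most recent batch-start row at or before it.
rows = con.execute("""
    SELECT transaction_date,
           new_batch_flag,
           MAX(CASE WHEN new_batch_flag = 1 THEN transaction_date END)
             OVER (ORDER BY transaction_date) AS start_date_of_batch
    FROM t
    ORDER BY transaction_date
""").fetchall()

starts = [r[2] for r in rows]
print(starts)
# ['2020-06-01', '2020-06-01', '2020-06-01', '2020-06-04', '2020-06-04']
```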
I have some problems in creating a new table from an old one with new columns defined by case statements.
I need to add to a new table three columns, where I compute the maximum based on different conditions. Specifically,
if time is between 1 and 3, I define a variable max_var_1_3 as max((-1)*var),
if time is between 1 and 6, I define a variable max_var_1_6 as max((-1)*var),
if time is between 1 and 12, I define a variable max_var_1_12 as max((-1)*var).
The max function needs to take the maximum value of the variable var in the window between 1 and 3, 1 and 6, 1 and 12 respectively.
I wrote this
create table new as(
select t1.*,
(case when time between 1 and 3 then MAX((-1)*var)
else var
end) as max_var_1_3,
(case when time between 1 and 6 then MAX((-1)*var)
else var
end) as max_var_1_6,
(case when time between 1 and 12 then MAX((-1)*var)
else var
end) as max_var_1_12
from old_table t1
group by time
) with data primary index time
but unfortunately it is not working. The old_table already has some columns, and I would like to carry all of them over and then compare the old table with the new one. I get an error saying something is expected between ')' and ',', but I cannot understand what. I am using Teradata SQL.
Could you please help me?
Many thanks
The problem is that you have GROUP BY time in your query while trying to return all the other values with your SELECT t1.*. To make your query work as-is, you'd need to add each column from t1.* to your GROUP BY clause.
If you want to find the MAX value within the different time ranges AND also return all the rows, then you can use a window function. Something like this:
CREATE TABLE new AS (
SELECT
t1.*,
CASE
WHEN t1.time BETWEEN 1 AND 3 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 3 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_3,
CASE
WHEN t1.time BETWEEN 1 AND 6 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 6 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_6,
CASE
WHEN t1.time BETWEEN 1 AND 12 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 12 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_12
FROM old_table t1
) WITH DATA PRIMARY INDEX (time)
;
Here's the logic:
check if a row falls in the range
if it does, return the desired MAX value for rows in that range
otherwise, just return that given row's default value (var)
return all rows along with the three new columns
If you have performance issues, you could also move the max_var calculations to a CTE, since they only need to be calculated once. Also to avoid confusion, you may want to explicitly specify the values in your SELECT instead of using t1.*.
I don't have a TD system to test, but try it out and see if that works.
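Lacking a Teradata system as well, the same window pattern can at least be sanity-checked in SQLite via Python. This sketch shows only the max_var_1_3 column (table contents invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE old_table (time INT, var REAL)")
con.executemany(
    "INSERT INTO old_table VALUES (?, ?)",
    [(1, -5.0), (2, -2.0), (4, -7.0), (10, -1.0)],
)

# Rows with time in [1, 3] get the in-range maximum of -1 * var (computed
# once over the whole table); rows outside the range keep their own var.
rows = con.execute("""
    SELECT time, var,
           CASE
               WHEN time BETWEEN 1 AND 3 THEN
                   MAX(CASE WHEN time BETWEEN 1 AND 3 THEN -1 * var END) OVER ()
               ELSE var
           END AS max_var_1_3
    FROM old_table
    ORDER BY time
""").fetchall()
print(rows)
# [(1, -5.0, 5.0), (2, -2.0, 5.0), (4, -7.0, -7.0), (10, -1.0, -1.0)]
```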
I cannot help with the CREATE TABLE AS, but the query you want is this:
SELECT
t.*,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 3) AS max_var_1_3,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 6) AS max_var_1_6,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 12) AS max_var_1_12
FROM old_table t;
Hi, I have a table named "T1" with a column named "Con". I need to flag Unique and Repeat rows through two additional columns, Unique and Repeat, by comparing consecutive rows. If a row matches the immediately following row, the Repeat column should show "1" (else "0"); if it doesn't match the immediately following row, the Unique column should show "1" (else "0").
Given your sample data, you don't need to look at the "next" row. This logic does what you want:
select t1.con,
       iif(tt1.cnt = 1, 1, 0) as is_unique,
       iif(tt1.cnt > 1, 1, 0) as is_repeat
from t1
inner join (select con, count(*) as cnt
            from t1
            group by con) as tt1
    on t1.con = tt1.con;
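A quick check of this count-based approach with Python's sqlite3 (sample data invented), using CASE instead of IIF for portability:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (con TEXT)")
con.executemany(
    "INSERT INTO t1 VALUES (?)",
    [("a",), ("a",), ("b",), ("c",), ("c",), ("c",)],
)

# Join each row to its group's total count: cnt = 1 means unique,
# cnt > 1 means the value repeats somewhere in the table.
rows = con.execute("""
    SELECT t1.con,
           CASE WHEN tt1.cnt = 1 THEN 1 ELSE 0 END AS is_unique,
           CASE WHEN tt1.cnt > 1 THEN 1 ELSE 0 END AS is_repeat
    FROM t1
    INNER JOIN (SELECT con, COUNT(*) AS cnt
                FROM t1
                GROUP BY con) AS tt1
        ON t1.con = tt1.con
    ORDER BY t1.con
""").fetchall()
print(rows)
# [('a', 0, 1), ('a', 0, 1), ('b', 1, 0), ('c', 0, 1), ('c', 0, 1), ('c', 0, 1)]
```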