Merge two rows in sql and create new columns - sql

My Input data (data in table is):
I have this requirement where I need to find first and last entry as well as first entered and last entered/modified.
When I have a query like (Not exact query - this query is BIG MESS, I am simplifying for sake of PoC):
Select field1, field2, min(EnteredDate), max(EnteredDate) from table where field = <fieldvalue> group by field1 AND EnteredBy
Don't ask me why we are grouping by "EnteredBy", this is the mess I am dealing and done by former team.
Right now the results are something like
Actually the output of the query should be like below:
Can someone please provide guidance on how I can achieve this? Thank you!
Expected output in text format:
Field1 Field2 First entered by First entered date Last entered By Last Entered Date
1 hello User1 3/9/2022 User2 3/11/2022
2 somenew User2 3/10/2022 User1 3/11/2022

To get the first / last rows' values with some specified ordering, use the window functions first_value / last_value
The window frame clause must be specified explicitly for LAST_VALUE to work since the default frame is RANGE UNBOUNDED PRECEDING AND CURRENT ROW, which always yields the current row for LAST_VALUE.
DISTINCT drops all duplicate rows in the following query and returns only 1 row for each combination of field1, field2
SELECT DISTINCT
field1
, field2
, FIRST_VALUE(entered_by) OVER w first_entered_by
, FIRST_VALUE(entered_date) OVER w first_entered_date
, LAST_VALUE(entered_by) OVER w last_entered_by
, LAST_VALUE(entered_date) OVER w last_entered_date
FROM mytable
WINDOW w AS (
PARTITION by field1, field2
ORDER BY entered_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY field1
Here's a DB Fiddle illustrating how to do it with sample data & set up code

Related

need to pull a specific record

There is 1 record having duplicate values except in 1 column having x and y
record status
XXXXXXXXXX A
XXXXXXXXXX B
Need to pull A only and remove the other duplicate B
Select record
case
when status in ("'a', 'b'") then ('a')
from xyz
Let suppose you have data as below where Status is repeating for First column
but you are interesting in the status which is of having lower value as given below:
In this case following SQL may help. Here, we are partitioning on key field and ordering the Status so that we can apply filter on rank to get desired result.
WITH sampleData AS
 (SELECT '1234' as Field1,  'A' as STATUS UNION ALL 
  SELECT '1234',  'C' UNION ALL
  SELECT '5678', 'A' UNION ALL 
  SELECT '5678',  'B' )
 select * except(rank) from (
 select *, rank() over (partition by Field1 order by STATUS ASC) rank from sampleData)
 where rank = 1
 order by Field1
Consider below approach
select * from sampledata
qualify 1 = row_number() over win
window win as (partition by field1 order by if(status='A',1,2) )
if applied to sample data in your question - output is

How to check max from range in cursor?

I have a problem with transferring an Excel formula to SQL. My excel formula is: =IF(P2<(MAX($P$2:P2));"Move";"").
The P column in excel is a sequence of numbers.
a | b
------
1
2
7
3 MOVE
4 MOVE
8
9
5 MOVE
10
You can find more example on this screenshot:
I created a cursor with a loop but I don't know how to check max from range.
For example when I iterate for fourth row, I have to check max from 1-4 row etc.
No need for a cursor and a loop. Assuming that you have a column that defines the ordering of the rows (say, id), you can use window functions:
select t.*,
case when a < max(a) over(order by id) then 'MOVE' end as b
from mytable t
One option would be using MAX() Analytic function . But in any case, you'd have an extra column such as id for ordering in order to determine the max value for the current row from the first row, since SQL statements represent unordered sets. If you have that id column with values ordered as in your sample data, then consider using
WITH t2 AS
(
SELECT MAX(a) OVER (ORDER BY id ROWS BETWEEN
UNBOUNDED PRECEDING
AND
CURRENT ROW) AS max_upto_this_row,
t.*
FROM t
)
SELECT a, CASE WHEN max_upto_this_row > a THEN 'Move' END AS b
FROM t2
ORDER BY id;
Demo

Find preceding and following rows for a matching row in BigQuery?

Is it possible to find rows preceding and following a matching rows in a BigQuery query? For example if I do:
select textPayload from logs.logs_20160709 where textPayload like "%something%"
and say that I get these results back:
something A
something B
How can I also show the 3 rows preceding and following the matching rows? Something like this:
some text 1
some text 2
some text 3
something A
some text 4
some text 5
some text 6
some text 90
some text 91
some text 92
something B
some text 93
some text 94
some text 95
Is this possible and if so how?
While on Zuma Beach - I was thinking of avoiding CROSS JOIN in my original answer.
Check below - should be much cheaper especially for big set
SELECT textPayload
FROM (
SELECT textPayload,
SUM(match) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS flag
FROM (
SELECT textPayload, ts, IF(textPayload CONTAINS 'something', 1, 0) AS match
FROM YourTable
)
)
WHERE flag > 0
Of course another way to avoid cross join is to use BigQuery Standard SQL. But still - above solution with no joins at all is better than my original answer
I think, one piece is missing in your example - extra field that will define the order, so I added ts field for this in my answer. This mean I assume your table has two fields involved : textPayload and ts
Try below. Should give you exactly what you need
SELECT
all.textPayload
FROM (
SELECT start, finish
FROM (
SELECT textPayload,
LAG(ts, 3) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS start,
LEAD(ts, 3) OVER(ORDER BY ts ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS finish
FROM YourTable
)
WHERE textPayload CONTAINS 'something'
) AS matches
CROSS JOIN YourTable AS all
WHERE all.ts BETWEEN matches.start AND matches.finish
Please note: depends on type of your ts field - you might need to do some data casting in query for this field. hope not

SELECT DISTINCT is not working

Let's say I have a table name TableA with the below partial data:
LOOKUP_VALUE LOOKUPS_CODE LOOKUPS_ID
------------ ------------ ----------
5% 120 1001
5% 121 1002
5% 123 1003
2% 130 2001
2% 131 2002
I wanted to select only 1 row of 5% and 1 row of 2% as a view using DISTINCT but it fail, my query is:
SELECT DISTINCT lookup_value, lookups_code
FROM TableA;
The above query give me the result as shown below.
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
5% 121
5% 123
2% 130
2% 131
But that is not my expected result, mt expected result is shown below:
LOOKUP_VALUE LOOKUPS_CODE
------------ ------------
5% 120
2% 130
May I know how can I achieve this without specifying any WHERE clause?
Thank you!
I think you're misunderstanding the scope of DISTINCT: it will give your distinct rows, not just distinct on the first field.
If you want one row for each distinct LOOKUP_VALUE, you either need a WHERE clause that will work out which one of them to show, or an aggregation strategy with a GROUP BY clause plus logic in the SELECT that tells the query how to aggregate the other columns (e.g. AVG, MAX, MIN)
Here's my guess at your problem - when you say
"The above query give me the result as shown in the data table above."
this is simply not true - please try it and update your question accordingly.
I am speculating here: I think you are trying to use "Distinct" but also output the other fields. If you run:
select distinct Field1, Field2, Field3 ...
Then your output will be "one row per distinct combination" of the 3 fields.
Try GROUP BY instead - this will let you select the Max, Min, Sum of other fields while still yielding "one row per unique combined values" for fields included in GROUP BY
example below uses your table to return one row per LOOKUP_VALUE and then the max and min of the remaining fields and the count of total records using your data:
select
LOOKUP_VALUE, min( LOOKUPS_CODE) LOOKUPS_CODE_min, max( LOOKUPS_CODE) LOOKUPS_CODE_max, min( LOOKUPS_ID) LOOKUPS_ID_min, max( LOOKUPS_ID) LOOKUPS_ID_max, Count(*) Record_Count
From TableA
Group by LOOKUP_VALUE
I wanted to select only 1 row of 5% and 1 row of 2%
This will get the lowest value lookups_code for each lookup_value:
SELECT lookup_value,
lookups_code
FROM (
SELECT lookup_value,
lookups_code,
ROW_NUMBER() OVER ( PARTITION BY lookup_value ORDER BY lookups_code ) AS rn
FROM TableA
)
WHERE rn = 1
You could also use GROUP BY:
SELECT lookup_value,
MIN( lookups_code ) AS lookups_code
FROM TableA
GROUP BY lookup_value
How about the MIN() function
I believe this works for your desired output, but am currently not able to test it.
SELECT Lookup_Value, MIN(LOOKUPS_CODE)
FROM TableA
GROUP BY Lookup_Value;
I'm going to take a total shot in the dark on this one, but because of the way you have named your fields it implies you are attempting to mimic the vlookup function within Microsoft Excel. If this is the case, the behavior when there are multiple matches is to pick the first match. As arbitrary as that sounds, it's the way it works.
If this is what you want, AND the first value is not necessarily the lowest (or highest, or best looking, or whatever), then the row_number aggregate function would probably suit your needs.
I give you a caveat that my ordering criteria is based on the database row number, which could conceivably be different than what you think. If, however, you insert them into a clean table (with a reset high water mark), then I think it's a pretty safe bet it will behave the way you want. If not, then you are better off including a field explicitly to tell it what order you want the choice to occur.
with cte as (
select
vlookup_value,
vlookups_code,
row_number() over (partition by vlookup_value order by rownum) as rn
from
TableA
)
select
vlookup_value, vlookups_code
from cte
where rn = 1

SQL Combing the top 2 field values into 1 value

I have a very simple query that returns the Notes field. Since there can be multiple notes, I only want the top 2. No problem. However, I'm going to be using the sql within another query. I really don't want 2 lines in my results. I would like to combine the results into 1 field value so I only have 1 result line in the results. Is this possible?
For example, I currently get the following:
12345 1001 500.00 "Note 1"
12345 1001 500.00 "Note 2"
What I would like to see is this:
12345 1001 500.00 "Note 1 AND Note 2"
Following is the sql:
select top 2 rcai.field_value
from rnt_agrs ra
inner join rnt_agr_inv_notes rain on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
inner join RNT_CUST_ADDNL_INFO rcai on rain.rea_rnt_agr_nbr=rcai.rea_rnt_agr_nbr and rain.bac_acc_id=rcai.bac_acct_id
where ra.rnt_agr_nbr=128260511
Thanks for your help. I appreciate this forum for help with these issues.....
Get the next row's value and filter all but the first row:
select ..., rcai.field_value || ' AND '
min(rcai.field_value) -- next row's value (same as LEAD in Standard SQL)
over (partition by ra.rnt_agr_nbr
order by rcai.field_value
rows between 1 following and 1 following) as next_field_value
from rnt_agrs ra
inner join rnt_agr_inv_notes rain on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
inner join RNT_CUST_ADDNL_INFO rcai on rain.rea_rnt_agr_nbr=rcai.rea_rnt_agr_nbr and rain.bac_acc_id=rcai.bac_acct_id
where ra.rnt_agr_nbr=128260511
qualify
row_number() -- only the first row
over (partition by ra.rnt_agr_nbr
order by rcai.field_value) = 1
If there might be only a single row you need to add a COALESCE(min...,'') to get rid of the NULL.
Both OLAP functions specify the same PARTITION and ORDER, so this is a single working step.
select *,(SELECT top 2 rcai.field_value + ' AND ' AS [text()]
FROM RNT_CUST_ADDNL_INFO rcai
WHERE rcai.rea_rnt_agr_nbr = rain.rea_rnt_agr_nbr
AND rcai.bac_acct_id=rain.bac_acc_id
FOR XML PATH('')) AS Notes
from
rnt_agrs ra inner join rnt_agr_inv_notes rain
on ra.rnt_agr_nbr=rain.rea_rnt_agr_nbr
I had something like this, where there was a 1 to many, and I wanted a semicolon delimited set of values in a single column with the main record.
You could use PIVOT to transform the two note rows into two note columns based on row number, then concatenate them. Here's an example:
SELECT pvt.[1] + ' and ' + pvt.[2]
FROM
( --the selection of your table data, including a row-number column
SELECT Msg, ROW_NUMBER() OVER(ORDER BY Id)
--sample data shown here, but this would be your real table
FROM (VALUES(1, 'Note 1'), (2, 'Note 2'), (3, 'Note 3')) Note(Id, Msg)
) Data (Msg, Row)
PIVOT (MAX(Msg) FOR Row IN ([1], [2])) pvt
Note that MAX is used for the aggregate in the PIVOT since an aggregate is required, but since ROW_NUMBER is unique, you're only aggregating a single value.
This could also be easily extended to the first N rows - just include the row numbers you want in the pivot and combine them as desired in the select statement.