Change string-split SELECT SQL query into an UPDATE query - sql

I am trying to split a column in a SQL table into two columns, where the data in the column is separated by a "-". I have managed to edit a query I found online to do that.
The issue is that it only returns the data for viewing, as it is a SELECT query.
How can I change this query to update the table?
SELECT
    CASE WHEN CHARINDEX('-', ProjectNumber) > 0
         THEN SUBSTRING(ProjectNumber, 1, CHARINDEX('-', ProjectNumber) - 1)
         ELSE ProjectNumber END AS ProjectNumber,
    CASE WHEN CHARINDEX('-', ProjectNumber) > 0
         THEN SUBSTRING(ProjectNumber, CHARINDEX('-', ProjectNumber) + 1, LEN(ProjectNumber))
         ELSE NULL END AS Vessel
FROM dbo.Stock
EDIT:
I have tried this:
update dbo.stock set vessel =
case when CHARINDEX('-',ProjectNumber)>0
then SUBSTRING(ProjectNumber,1,CHARINDEX('-',ProjectNumber)-1)
else ProjectNumber end **ProjectNumber,**
CASE WHEN CHARINDEX('-',ProjectNumber)>0
THEN SUBSTRING(ProjectNumber,CHARINDEX('-',ProjectNumber)+1,len(ProjectNumber))
ELSE NULL END as Vessel
But it is telling me I have a syntax error near ProjectNumber - it's the one I put stars around.

Why not use the STRING_SPLIT function? (https://learn.microsoft.com/es-es/sql/t-sql/functions/string-split-transact-sql?view=sql-server-2017#examples):
UPDATE B
SET vessel = A.id1,
    vessel2 = A.id2
FROM (
    SELECT id,
           (SELECT value FROM STRING_SPLIT(id, '-', 1) WHERE ordinal = 1) AS id1,
           (SELECT value FROM STRING_SPLIT(id, '-', 1) WHERE ordinal = 2) AS id2
    FROM dbo.Stock
) A
INNER JOIN dbo.Stock B
    ON A.id = B.id;
Note: the enable_ordinal argument to STRING_SPLIT requires SQL Server 2022 or later. On earlier versions STRING_SPLIT does not guarantee the order of its output rows, so ordering tricks such as ORDER BY ROW_NUMBER() OVER (ORDER BY 1) are unreliable (and SQL Server rejects a constant in a window ORDER BY anyway).
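Alternatively, the asker's own CHARINDEX logic can be turned directly into an UPDATE. The syntax error in the attempt above comes from keeping the SELECT-list aliases: an UPDATE needs one SET assignment per column, separated by commas, with no column aliases. A sketch, assuming a Vessel column has already been added to dbo.Stock:

```sql
UPDATE dbo.Stock
SET Vessel = CASE WHEN CHARINDEX('-', ProjectNumber) > 0
                  THEN SUBSTRING(ProjectNumber, CHARINDEX('-', ProjectNumber) + 1, LEN(ProjectNumber))
                  ELSE NULL END,
    ProjectNumber = CASE WHEN CHARINDEX('-', ProjectNumber) > 0
                         THEN LEFT(ProjectNumber, CHARINDEX('-', ProjectNumber) - 1)
                         ELSE ProjectNumber END;
```

Both SET expressions read the pre-update value of ProjectNumber, so the order of the two assignments does not matter.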


first_value over non null values not working in Spark SQL

I am trying to run a query on Spark SQL, where I want to fill the missing (NULL) average_price values with the next non-null average price.
(The problem data, the desired result, and the result actually returned by the query below were shown as screenshots.)
Here is the query I am using:
spark.sql("""
select *,
CASE
WHEN average_price IS NULL AND store_id = 0 THEN
first_value(average_price, yes)
OVER
(
PARTITION BY product_id
ORDER BY cast(purchase_dt as int) asc
range between current row and 3 following
)
ELSE 0
END AS new_av_price
from table
""")
What am I doing wrong here?
I understand that your Spark SQL version does not support the IGNORE NULLS syntax; see https://issues.apache.org/jira/browse/SPARK-30789.
You can go with this:
select
t1.*,
(select min(t2.average_price)
from Tbl t2
where t1.product_id=t2.product_id
and t2.purchase_dt=(select min(t3.purchase_dt)
from Tbl t3
where t3.product_id = t1.product_id
and t3.purchase_dt >= t1.purchase_Dt
and t3.average_price is not null
)
) as new_average_price
from Tbl t1
or this:
select
t1.*,
t2.average_price
from
Tbl t1
left join
Tbl t2
on t2.product_id = t1.product_id
and t2.average_price is not null
and t2.purchase_dt = (select min(t3.purchase_dt)
from Tbl t3
where t3.product_id=t1.product_id
and t3.purchase_dt>=t1.purchase_dt
and t3.average_price is not null)
These assume that you have only one row per product_id, purchase_dt. If you can have more than one row, you need to add additional logic to get rid of all but one row.
UPDATE 20220405:
If you can't use a JOIN, but you know that the non-NULL value is only up to 3 rows away, could you use:
COALESCE(
average_price
, first_value(average_price) OVER (
PARTITION BY product_id
ORDER BY cast(purchase_dt as int) asc
rows between 1 following and 1 following
)
, first_value(average_price) OVER (
PARTITION BY product_id
ORDER BY cast(purchase_dt as int) asc
rows between 2 following and 2 following
)
/* , ... */
) as new_average_price
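For completeness: the yes in the question's first_value(average_price, yes) is not a valid literal. In Spark versions that accept the two-argument ignore-nulls form of first_value (or, once SPARK-30789 is resolved, the IGNORE NULLS keyword), the original query can be written as follows - a sketch, not verified against any particular Spark release:

```sql
SELECT *,
       CASE WHEN average_price IS NULL AND store_id = 0 THEN
            -- second argument true = ignore nulls, replacing the invalid "yes"
            first_value(average_price, true) OVER (
                PARTITION BY product_id
                ORDER BY CAST(purchase_dt AS INT) ASC
                RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING
            )
       ELSE 0 END AS new_av_price
FROM table
```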

SQL Case depending on previous status of record

I have a table containing the status of records. Something like this:
ID STATUS TIMESTAMP
1 I 01-01-2016
1 A 01-03-2016
1 P 01-04-2016
2 I 01-01-2016
2 P 01-02-2016
3 P 01-01-2016
I want to take the newest row for each ID, and any 'P' that has at some point been an 'I' should be returned as 'G' instead of 'P'.
When I try to do something like
Select case when ID in (select ID from TABLE where ID = 'I') else ID END as status)
From TABLE
where ID in (select max(ID) from TABLE)
I get an error that this isn't possible using IN when casing.
So my question is, how do I do it then?
Want to end up with:
ID STATUS TIMESTAMP
1 G 01-04-2016
2 G 01-02-2016
3 P 01-01-2016
DBMS is IBM DB2
Have a derived table which returns each id with its newest timestamp. Join with that result:
select t1.ID, t1.STATUS, t1.TIMESTAMP
from tablename t1
join (select id, max(timestamp) as max_timestamp
from tablename
group by id) t2
ON t1.id = t2.id and t1.TIMESTAMP = t2.max_timestamp
Will return both rows in case of a tie (two rows with the same newest timestamp).
Note that ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP".
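Combining this derived table with the question's "has ever been I" rule gives the requested G status in one query - a sketch, assuming the same tablename:

```sql
select t1.ID,
       -- a newest-row 'P' becomes 'G' when any earlier row for the id was 'I'
       case when t1.STATUS = 'P'
                 and exists (select 1 from tablename i
                             where i.id = t1.id and i.status = 'I')
            then 'G' else t1.STATUS end as STATUS,
       t1."TIMESTAMP"
from tablename t1
join (select id, max("TIMESTAMP") as max_timestamp
      from tablename
      group by id) t2
  on t1.id = t2.id and t1."TIMESTAMP" = t2.max_timestamp
```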
You can do this by using a common table expression to find all IDs that have had a status of 'I', and then using an outer join with your table to determine which IDs have had a status of 'I' at some point.
To get the final result (with only the newest record) you can use the row_number() OLAP function and select only the "newest" record (this is shown in the ranked common table expression below):
with irecs (ID) as (
select distinct
ID
from
TABLE
where
status = 'I'
),
ranked as (
select
rownumber() over (partition by t.ID order by t.timestamp desc) as rn,
t.id,
case when i.id is null then t.status else 'G' end as status,
t.timestamp
from
TABLE t
left outer join irecs i
on t.id = i.id
)
select
id,
status,
timestamp
from
ranked
where
rn = 1;
Another solution:
with yourtableranked as (
select f1.id,
case when (select count(*) from yourtable f2 where f2.ID=f1.ID and f2."TIMESTAMP"<f1."TIMESTAMP" and f2.STATUS='I')>0 then 'G' else f1.STATUS end as STATUS,
rownumber() over(partition by f1.id order by f1.TIMESTAMP desc, rrn(f1) desc) rang,
f1."TIMESTAMP"
from yourtable f1
)
select * from yourtableranked f0
where f0.rang=1
Try this:
select distinct f1.id, f4.*
from yourtable f1
inner join lateral
(
select
case when (select count(*) from yourtable f3 where f3.ID=f2.ID and f3."TIMESTAMP"<f2."TIMESTAMP" and f3.STATUS='I')>0 then 'G' else f2.STATUS end as STATUS,
f2."TIMESTAMP"
from yourtable f2 where f2.ID=f1.ID
order by f2."TIMESTAMP" desc, rrn(f2) desc
fetch first 1 rows only
) f4 on 1=1
The rrn(f2) ordering is there to break ties when two rows share the same last date.

LAG functions and NULLS

How can I tell the LAG function to get the last "not null" value?
For example, see my table below, where I have a few NULL values in columns B and C.
I'd like to fill the nulls with the last non-null value. I tried to do that by using the LAG function, like so:
case when B is null then lag (B) over (order by idx) else B end as B,
but that doesn't quite work when I have two or more nulls in a row (see the NULL value on column C row 3 - I'd like it to be 0.50 as the original).
Any idea how can I achieve that?
(it doesn't have to be using the LAG function, any other ideas are welcome)
A few assumptions:
The number of rows is dynamic;
The first value will always be non-null;
Once I have a NULL, it is NULL all the way up to the end - so I want to fill it with the latest value.
Thanks
You can do it with outer apply operator:
select t.id,
t1.colA,
t2.colB,
t3.colC
from table t
outer apply(select top 1 colA from table where id <= t.id and colA is not null order by id desc) t1
outer apply(select top 1 colB from table where id <= t.id and colB is not null order by id desc) t2
outer apply(select top 1 colC from table where id <= t.id and colC is not null order by id desc) t3;
This will work, regardless of the number of nulls or null "islands". You may have values, then nulls, then again values, again nulls. It will still work.
If, however the assumption (in your question) holds:
Once I have a NULL, is NULL all up to the end - so I want to fill it with the latest value.
there is a more efficient solution. We only need to find the latest (when ordered by idx) values. Modifying the above query, removing the where id <= t.id from the subqueries:
select t.id,
colA = coalesce(t.colA, t1.colA),
colB = coalesce(t.colB, t2.colB),
colC = coalesce(t.colC, t3.colC)
from table t
outer apply (select top 1 colA from table
where colA is not null order by id desc) t1
outer apply (select top 1 colB from table
where colB is not null order by id desc) t2
outer apply (select top 1 colC from table
where colC is not null order by id desc) t3;
You could make a change to your ORDER BY, to force the NULLs to be first in your ordering, but that may be expensive...
lag(B) over (order by CASE WHEN B IS NULL THEN -1 ELSE idx END)
Or, use a sub-query to calculate the replacement value once. Possibly less expensive on larger sets, but very clunky.
- Relies on all the NULLs coming at the end
- The LAG doesn't rely on that
COALESCE(
B,
(
SELECT
sorted_not_null.B
FROM
(
SELECT
table.B,
ROW_NUMBER() OVER (ORDER BY table.idx DESC) AS row_id
FROM
table
WHERE
table.B IS NOT NULL
)
sorted_not_null
WHERE
sorted_not_null.row_id = 1
)
)
(This should be faster on larger data-sets than LAG or OUTER APPLY with correlated sub-queries, simply because the value is calculated once. For tidiness, you could calculate and store the [last_known_value] for each column in variables, then just use COALESCE(A, @last_known_A), COALESCE(B, @last_known_B), etc.)
If it is NULL all the way up to the end, then you can take a shortcut:
declare @b varchar(20) = (select top 1 b from table where b is not null order by id desc);
declare @c varchar(20) = (select top 1 c from table where c is not null order by id desc);
select id, isnull(b, @b) as b, isnull(c, @c) as c
from table;
select max(diff) from (
    select
        case when lag(a) over (order by b) is not null
             then (a - lag(a) over (order by b)) end as diff
    from <tbl_name>
    where <relevant conditions>
    order by b) k
Works fine in DB Visualizer.
UPDATE table
SET B = (@n := COALESCE(B, @n))
WHERE B is null;
(This uses MySQL user-variable syntax; @n must be initialized before the update.)
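For the general fill-down case (NULLs that are later followed by more values), the standard workaround in SQL Server, which has no IGNORE NULLS option for LAG/LAST_VALUE, is a two-step window trick: a running COUNT of the column assigns every row to the "island" started by the last non-null value, and MAX within that island recovers it. A sketch, assuming a table t with ordering column idx:

```sql
SELECT idx,
       MAX(B) OVER (PARTITION BY grpB) AS B_filled
FROM (
    SELECT idx, B,
           -- COUNT(B) skips NULLs, so the running count only advances on
           -- non-null rows; the NULL rows after one share its group number
           COUNT(B) OVER (ORDER BY idx ROWS UNBOUNDED PRECEDING) AS grpB
    FROM t
) s;
```

The same subquery pattern is repeated per column (a grpC for column C, and so on).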

Duplicate Counts - TSQL

I want to get all records that have duplicate values for SOME of the fields (i.e. the key columns).
My code:
CREATE TABLE #TEMP (ID int, Descp varchar(5), Extra varchar(6))
INSERT INTO #Temp
SELECT 1,'One','Extra1'
UNION ALL
SELECT 2,'Two','Extra2'
UNION ALL
SELECT 3,'Three','Extra3'
UNION ALL
SELECT 1,'One','Extra4'
SELECT ID, Descp, Extra FROM #TEMP
;WITH Temp_CTE AS
(SELECT *
, ROW_NUMBER() OVER (PARTITION BY ID, Descp ORDER BY (SELECT 0))
AS DuplicateRowNumber
FROM #TEMP
)
SELECT * FROM Temp_cte
DROP TABLE #TEMP
The last column tells me how many times each row has appeared based on the ID and Descp values.
I want that, but I ALSO need another column that indicates that both rows for ID = 1 and Descp = 'One' have shown up more than once.
So an extra column (i.e. MultipleOccurances (bool)) which has 1 for the two rows with ID = 1 and Descp = 'One', and 0 for the other rows as they only show up once.
How can I achieve that? (I want to avoid using Count(1) > 1 or something, if possible.)
Edit:
Desired output:
ID Descp Extra DuplicateRowNumber IsMultiple
1 One Extra1 1 1
1 One Extra4 2 1
2 Two Extra2 1 0
3 Three Extra3 1 0
SQL Fiddle
You say "I want to avoid using Count", but it is probably the best way. It uses the partitioning you already have on the row_number:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, Descp
ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE
WHEN COUNT(*) OVER (PARTITION BY ID, Descp) > 1 THEN 1
ELSE 0
END AS IsMultiple
FROM #Temp
And the execution plan just shows a single sort
Well, I have this solution, but using a Count...
SELECT T1.*,
ROW_NUMBER() OVER (PARTITION BY T1.ID, T1.Descp ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE WHEN T2.C = 1 THEN 0 ELSE 1 END MultipleOccurrences FROM #temp T1
INNER JOIN
(SELECT ID, Descp, COUNT(1) C FROM #TEMP GROUP BY ID, Descp) T2
ON T1.ID = T2.ID AND T1.Descp = T2.Descp

Pivot using single column and no aggregate function

I have a question. I have a column called "ID". In that column I have values like "FieldName" followed by "FromDate", "Value" followed by "2012/12/01", "FieldName" followed by "ToDate", "Value" followed by "2013/12/01", etc.
**ID column**
FieldName
FromDt
Value
2010/12/01
FieldName
ToDt
Value
2013/12/21
FieldName
CreatedDt
Value
2012/10/01
FieldName
ModifyDt
Value
2013/01/02
Now I want a table like
**FieldName Value**
FromDt 2010/12/01
ToDt 2013/12/21
CreatedDt 2012/10/01
ModifyDt 2013/01/02
Is it possible to use PIVOT with a single column and without an aggregate function? Kindly suggest how I can do this (either using PIVOT or some other method).
Regards,
T.N.Nagasundar
You should have another column to ORDER BY, as @Magnus suggested.
Otherwise, try this. SQLFiddle
WITH cte AS
(
SELECT ID,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS row_num
FROM tbl
)
SELECT c1.ID AS [field_name], c2.ID AS [value]
FROM cte c1
INNER JOIN cte c2
ON c2.row_num = c1.row_num + 2
WHERE (c1.row_num % 4) = 2
I have no idea about using pivot in this situation. It is possible to split original-order rows into groups of 4 and select every 2nd and 4th row from each group like this:
http://sqlfiddle.com/#!3/b2af8/13/0
with t2 as (
select
id,
(row_number() over(order by (select 0)) - 1) / 4 grp,
(row_number() over(order by (select 0)) - 1) % 4 row
from t
)
select a.id FieldName,
b.id Value
from t2 a
join t2 b
on a.grp = b.grp
and a.row = 1 -- 2nd row numbering from 0
and b.row = 3
(inspired by https://stackoverflow.com/a/6390282/1176601)