Populate blank values in a Field with number from last populated value - sql

There is no specific number of blank values. It can be none or many. Here is the current result (screenshot omitted); the blank cells are to be populated.

You can use analytic functions. I think this will work:
select t.*,
       coalesce(coil, lag(coil ignore nulls) over (order by datetime)) as coil
from t;
Oracle has supported IGNORE NULLS for a long, long time, though I can't say off-hand exactly which ancient versions first supported it.
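Where IGNORE NULLS isn't available (or to see the mechanics spelled out), the same "carry the last populated value forward" idea can be emulated with a cumulative count of non-NULL values. A minimal sketch in SQLite; the table and sample data are made up for illustration:

```python
# Emulating "fill blanks with the last populated value" without IGNORE NULLS.
# A cumulative COUNT of non-NULL values assigns each NULL run to the group
# started by the last non-NULL row; MAX per group recovers that value.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (dt INTEGER, coil TEXT);
INSERT INTO t VALUES (1, 'A'), (2, NULL), (3, NULL), (4, 'B'), (5, NULL);
""")

rows = conn.execute("""
WITH g AS (
  SELECT dt, coil,
         COUNT(coil) OVER (ORDER BY dt) AS grp   -- increments only on non-NULL rows
  FROM t
)
SELECT dt,
       MAX(coil) OVER (PARTITION BY grp) AS coil_filled  -- the group's leading value
FROM g
ORDER BY dt
""").fetchall()
print(rows)  # -> [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'B'), (5, 'B')]
```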

The approach below should work (or will hopefully give you enough to go on). The idea is to join the table to itself, matching each row to the most recent earlier row in which the column you want to populate is not NULL, and take the value from there.
SELECT YT1.ID, YT2.COIL
FROM Your_Table YT1
INNER JOIN Your_Table YT2
    ON YT2.ID = (SELECT TOP 1 ID
                 FROM Your_Table
                 WHERE [start_date] < YT1.[start_date]
                   AND COIL IS NOT NULL
                 ORDER BY [start_date] DESC)
WHERE YT1.COIL IS NULL OR LEN(YT1.COIL) = 0
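The same "most recent earlier non-NULL row" lookup ports to other engines by replacing TOP 1 with LIMIT 1 and using a scalar subquery. A minimal sketch in SQLite; table and data are invented:

```python
# Correlated-subquery version of the TOP 1 lookup: for each blank row,
# fetch the coil value from the latest earlier row where coil is populated.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (id INTEGER, start_date INTEGER, coil TEXT);
INSERT INTO your_table VALUES (1, 10, 'A'), (2, 20, NULL), (3, 30, NULL), (4, 40, 'B');
""")

rows = conn.execute("""
SELECT yt1.id,
       (SELECT yt2.coil
        FROM your_table yt2
        WHERE yt2.start_date < yt1.start_date
          AND yt2.coil IS NOT NULL
        ORDER BY yt2.start_date DESC
        LIMIT 1) AS filled_coil
FROM your_table yt1
WHERE yt1.coil IS NULL
ORDER BY yt1.id
""").fetchall()
print(rows)  # -> [(2, 'A'), (3, 'A')]
```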

Get a new column with updated values, where each row change in time depending on the actual column?

I have some data whose columns include an ID, a Date, and a Place denoted by a number. I need to simulate a real-time update by creating a new column that shows how many different places have appeared so far; each time a new place appears, the new column's value increases.
This is just a little piece of the original table with hundreds of millions of rows.
Here is an example, the left table is the original one and the right table is what I need.
I tried to do it with this piece of code, but I cannot use DISTINCT with the OVER clause.
SELECT ID, Dates, Place,
       COUNT(DISTINCT Place) OVER (PARTITION BY Place ORDER BY Dates) AS DiffPlaces
FROM #informacion_prendaria_muestra
ORDER BY ID;
I think this is possible by using DENSE_RANK() in SQL Server. You can try this:
SELECT ID, Dates, Place,
       DENSE_RANK() OVER (ORDER BY Place) AS DiffPlaces
FROM #informacion_prendaria_muestra
I think you can use a self join query like this - without using window functions:
select t.ID, t.[Date], t.Place,
       count(distinct tt.Place) as diffPlace
from yourTable t
left join yourTable tt
    on t.ID = tt.ID and t.[Date] >= tt.[Date]
group by t.ID, t.[Date], t.Place
order by ID, [Date];
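Since COUNT(DISTINCT ...) OVER is rejected by most engines, another common workaround is to flag the first appearance of each place with ROW_NUMBER and take a running sum of the flags. A sketch in SQLite with invented sample data:

```python
# Running count of distinct places per ID: mark each (id, place) pair's first
# occurrence with ROW_NUMBER, then take a cumulative SUM of those marks.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (id INTEGER, dt INTEGER, place INTEGER);
INSERT INTO visits VALUES (1, 1, 10), (1, 2, 20), (1, 3, 10), (1, 4, 30);
""")

rows = conn.execute("""
WITH firsts AS (
  SELECT id, dt, place,
         ROW_NUMBER() OVER (PARTITION BY id, place ORDER BY dt) AS rn
  FROM visits
)
SELECT id, dt, place,
       SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END)
         OVER (PARTITION BY id ORDER BY dt) AS diff_places
FROM firsts
ORDER BY id, dt
""").fetchall()
print(rows)  # place 10 repeats at dt=3, so the count stays at 2 there
```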

Lag Function to skip over previous record where there is a null

I am trying to get a previous value using the LAG function, but it only works when the previous record is populated. What I want is to skip the previous record if it is NULL and look back to the most recent record before it that is not NULL.
SELECT LAG(previous_reference_no) OVER (ORDER BY createdon) FROM TableA
So say I am at record 5; record 4 is NULL, but record 3 is not NULL. From record 5 I would want to display the value of record 3.
Hope this makes sense; please help.
Add a PARTITION BY clause?
SELECT LAG(previous_reference_no) OVER (
           PARTITION BY CASE WHEN previous_reference_no IS NULL THEN 0 ELSE 1 END
           ORDER BY createdon)
FROM TableA
Standard SQL has the syntax for this:
SELECT LAG(previous_reference_no IGNORE NULLS) OVER (ORDER BY createdon)
FROM TableA
Unfortunately SQL Server does not support this. One method uses two levels of window functions and some logic:
SELECT (CASE WHEN previous_reference_no IS NULL
             THEN MAX(previous_reference_no) OVER (PARTITION BY grp)
             ELSE LAG(previous_reference_no) OVER (
                      PARTITION BY (CASE WHEN previous_reference_no IS NOT NULL THEN 1 ELSE 0 END)
                      ORDER BY createdon)
        END)
FROM (SELECT a.*,
             COUNT(previous_reference_no) OVER (ORDER BY a.createdon) AS grp
      FROM TableA a
     ) a;
The logic is:
Create a grouping that has a given reference number and all following NULL values in one group.
If the reference number is NULL, then get the first value for the start of the group. This would be the previous non-NULL value.
If the reference number is not NULL then use partition by to look at the last not-NULL value.
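The grouping logic above can be sketched in SQLite (which also lacks IGNORE NULLS). This is a slight variant: it first carries the last non-NULL value forward per group, then takes LAG of that filled column for the non-NULL rows. Table name and sample data are invented:

```python
# Emulating LAG(... IGNORE NULLS): grp counts non-NULLs cumulatively, so each
# NULL run belongs to the group of the preceding non-NULL row. "locf" is the
# last non-NULL value at or before each row; LAG(locf) handles non-NULL rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (createdon INTEGER, previous_reference_no TEXT);
INSERT INTO TableA VALUES (1, 'R1'), (2, 'R2'), (3, NULL), (4, NULL), (5, 'R3');
""")

rows = conn.execute("""
WITH g AS (
  SELECT createdon, previous_reference_no,
         COUNT(previous_reference_no) OVER (ORDER BY createdon) AS grp
  FROM TableA
), filled AS (
  SELECT createdon, previous_reference_no,
         MAX(previous_reference_no) OVER (PARTITION BY grp) AS locf
  FROM g
)
SELECT createdon,
       CASE WHEN previous_reference_no IS NULL
            THEN locf                                   -- last non-NULL before this row
            ELSE LAG(locf) OVER (ORDER BY createdon)    -- previous row's carried value
       END AS prev_non_null
FROM filled
ORDER BY createdon
""").fetchall()
print(rows)  # -> [(1, None), (2, 'R1'), (3, 'R2'), (4, 'R2'), (5, 'R2')]
```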
Another method -- which is likely to be much slower -- uses APPLY:
select a.*, aprev.previous_reference_no
from TableA a outer apply
     (select top (1) aprev.previous_reference_no
      from TableA aprev
      where aprev.createdon < a.createdon and
            aprev.previous_reference_no is not null
      order by aprev.createdon desc
     ) aprev;
For a small table, the performance hit might be worth the simplicity of the code.

How can I overwrite a column from a column in another table without a join?

I want to simply overwrite values in a column of a table with values from a column in another table.
I have a table based off another table, without a unique identifier in any of the columns, so I don't want to use joins; I just want to update the values, since the rows are in the same order. How do I do that? So far I have tried two approaches: Approach A only puts the value from the first row into every row of the updated table, and Approach B does not work at all.
Approach A:
Update Transactions
SET Transactions.Amount = Transactions_raw.Amount
FROM Transactions_raw
Approach B:
UPDATE Transactions
SET Amount = (SELECT Amount FROM Transactions_raw)
You need some kind of join to match the tables -- even if on an artificial key:
update t
set Amount = tr.Amount
from (select t.*, row_number() over (order by (select null)) as seqnum
from Transactions t
) t join
(select tr.*, row_number() over (order by (select null)) as seqnum
from Transactions_raw tr
) tr
on t.seqnum = tr.seqnum;
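The row-number pairing can be sketched in SQLite. Because older SQLite (before 3.33) has no UPDATE ... FROM, this sketch fetches the matched pairs and applies them with executemany; the ordering keys (txid, rowid) are assumptions standing in for whatever rule actually defines "the same order":

```python
# Pair the two tables by ROW_NUMBER over an artificial ordering key, then
# copy Amount across. Table and column names are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (txid INTEGER, Amount INTEGER);
CREATE TABLE Transactions_raw (Amount INTEGER);
INSERT INTO Transactions VALUES (101, 0), (102, 0), (103, 0);
INSERT INTO Transactions_raw VALUES (10), (20), (30);
""")

pairs = conn.execute("""
SELECT t.txid, tr.Amount
FROM (SELECT txid, ROW_NUMBER() OVER (ORDER BY txid) AS seqnum
      FROM Transactions) t
JOIN (SELECT Amount, ROW_NUMBER() OVER (ORDER BY rowid) AS seqnum
      FROM Transactions_raw) tr
  ON t.seqnum = tr.seqnum
""").fetchall()

# Apply the matched amounts row by row.
conn.executemany("UPDATE Transactions SET Amount = ? WHERE txid = ?",
                 [(amt, txid) for txid, amt in pairs])
rows = conn.execute("SELECT txid, Amount FROM Transactions ORDER BY txid").fetchall()
print(rows)  # -> [(101, 10), (102, 20), (103, 30)]
```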
Your assumption that the rows are in the same order may mislead you. If you run a SELECT without ORDER BY and see the same order in both tables, that is not something you can rely on: this apparent order is not guaranteed. Instead, you need some rule for ordering. Once you have a rule you can put in ORDER BY, you can add an ID column to both tables according to that order.
You can assign the ID values with a statement like this (note that SQL Server does not allow a window function directly in an UPDATE's SET clause, so route it through an updatable CTE):
with numbered as (
    select Id, row_number() over (order by ...) as rn
    from Transactions
)
update numbered set Id = rn;
Then you can use regular inner join.

How to set a row's field to the value of the closest row by date?

I have a huge table with 2m+ rows.
The structure is like that:
ThingName (STRING),
Date (DATE),
Value (INT64)
Sometimes Value is NULL, and I need to fix that by setting it to the non-NULL Value of the row closest to it by Date for the same ThingName...
And I am totally not a SQL guy.
I tried to describe my task with this query (simplified a lot: it only looks at previous dates, but actually I need to check future dates too):
update my_tbl as SDP
set SDP.Value = (select SDPI.Value
                 from my_tbl as SDPI
                 where SDPI.Date < SDP.Date
                   and SDP.ThingName = SDPI.ThingName
                   and SDPI.Value is not null
                 order by SDPI.Date desc
                 limit 1)
where SDP.Value is null;
There I try to set the updating row's Value to one selected from the same table for the same ThingName, and with LIMIT 1 I keep only a single result.
But query editor tell me this:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Actually, I am not sure at all that my task can be solved just with query.
So, can anyone help me? If this is impossible, then tell me this, if it possible, tell me what SQL constructions may help me.
Below is for BigQuery Standard SQL
In many (if not most) cases you don't want to update your table (as it incurs the extra cost and limitations associated with DML statements); rather, you can adjust the "missing" values in-query, as in the example below:
#standardSQL
SELECT
ThingName,
date,
IFNULL(value,
LAST_VALUE(value IGNORE NULLS)
OVER(PARTITION BY thingname ORDER BY date)
) AS value
FROM `project.dataset.my_tbl`
If for some reason you actually need to update the table, the statement above will not help, as DML's UPDATE does not allow the use of analytic functions, so you need another approach. For example, the one below:
#standardSQL
SELECT
t1.ThingName, t1.date,
ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] AS value
FROM `project.dataset.my_tbl` AS t1
LEFT JOIN `project.dataset.my_tbl` AS t2
ON t2.ThingName = t1.ThingName
AND t2.date <= t1.date
GROUP BY t1.ThingName, t1.date, t1.value
and now you can use it to update your table, as in the example below
#standardSQL
UPDATE `project.dataset.my_tbl` t
SET value = new_value
FROM (
SELECT TO_JSON_STRING(t1) AS id,
ARRAY_AGG(t2.Value IGNORE NULLS ORDER BY t2.date DESC LIMIT 1)[OFFSET(0)] new_value
FROM `project.dataset.my_tbl` AS t1
LEFT JOIN `project.dataset.my_tbl` AS t2
ON t2.ThingName = t1.ThingName
AND t2.date <= t1.date
GROUP BY id
)
WHERE TO_JSON_STRING(t) = id
In BigQuery, updates are rather rare. The logic you seem to want is:
select t.*,
       coalesce(value,
                last_value(value ignore nulls) over (partition by thingname order by date)
       ) as value
from my_tbl t;
I don't really see a reason to save this back in the table.

A SQL query to work on a table with subgroups

I have a large Oracle DB table which contains nearly 200 million rows. It has only three columns: a subscriber id, a date, and an offer id.
For each row in this table, I need to find whether this row has any corresponding rows in the table such that:
1) They belong to the same subscriber (same subscriber id)
2) They are within a certain distance in the past of the current row (for example, if our current row is A, a row B with the same subscriber id should satisfy A.date > B.date >= A.date - 30 (days))
3) In addition to 2), we also have to filter on a specific offer id: (A.date > B.date >= A.date - 30 AND B.offerid = some_id)
I am aware of the Oracle analytic functions LAG and LEAD, and I plan to use them for this purpose. These functions return field values from rows above or below the current row in the ordered table, according to some given fields. The trouble is that the number of rows with the same subscriber id varies, up to 84. If I ORDER BY (SUBSCRIBER_ID, DATE) and use LAG, then for each row I would need to check up to 84 rows above the current one just to make sure they share the current row's SUBSCRIBER_ID. Since some subscriber id subgroups have only around 3-4 rows, this amount of unnecessary row access is wasteful.
How can I accomplish this job, without being in need to check 84 rows each time, for each row? Does Oracle support any methods which work solely on subgroups generated by the GROUP BY statement?
One option is to use a self-join like this:
SELECT t1.*, NVL2(t2.subscriber_id, 'Yes', 'No') AS match_found
FROM myTable t1
LEFT JOIN myTable t2
    ON t1.subscriber_id = t2.subscriber_id
   AND t1.date > t2.date AND t2.date >= t1.date - 30
   AND t2.offerid = <filter_offer_id>
Actually the analytic function COUNT(*) in Oracle did the necessary work for me. I used the following structure:
SELECT
    SUBSCRIBER_ID,
    SEGMENTATION_DATE,
    OFFER_ID,
    COUNT(*) OVER (PARTITION BY SUBSCRIBER_ID ORDER BY SEGMENTATION_DATE
                   RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS SENDEVER,
    COUNT(*) OVER (PARTITION BY SUBSCRIBER_ID ORDER BY SEGMENTATION_DATE
                   RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING) AS SEND30D,
    COUNT(CASE WHEN OFFER_ID = 580169 THEN 1 ELSE NULL END)
          OVER (PARTITION BY SUBSCRIBER_ID ORDER BY SEGMENTATION_DATE
                RANGE BETWEEN 180 PRECEDING AND 1 PRECEDING) AS SEND6M580169
FROM myTable
PARTITION BY groups the table by the SUBSCRIBER_ID field, and proper RANGE BETWEEN clauses on each group's rows pick only the ones whose dates fall in the desired interval.
The CASE WHEN on the OFFER_ID field further filters the rows within the current SUBSCRIBER_ID group, throwing out all rows with other offer ids.
The nice thing is that no self join is needed here, reducing the cost of the operation by an order of magnitude.
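The RANGE-frame counting above can be sketched in SQLite, with dates represented as integer day numbers (RANGE frames there need numeric ordering keys); subscriber ids, days, and offer ids are invented:

```python
# Count, for each row, how many earlier offers the same subscriber received
# in the preceding 30 days, using a RANGE frame that excludes the current day.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offers (subscriber_id INTEGER, seg_day INTEGER, offer_id INTEGER);
INSERT INTO offers VALUES
    (1, 1, 580169), (1, 15, 111), (1, 40, 580169), (1, 45, 222);
""")

rows = conn.execute("""
SELECT subscriber_id, seg_day, offer_id,
       COUNT(*) OVER (PARTITION BY subscriber_id ORDER BY seg_day
                      RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING) AS send30d
FROM offers
ORDER BY subscriber_id, seg_day
""").fetchall()
print(rows)  # day 45 sees the offers on days 15 and 40, hence count 2
```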