Delete null values until first value is not null - sql

I have daily timeseries for companies in my dataset and use PostgreSQL.
For every company all rows with NULL in column3 shall be deleted until the first NOT NULL entry in this column for this company. Then all consecutive missing values are filled in with the value of the last observable value for this company that is NOT NULL.
You can imagine the following example data:
date company column3
1 2004-01-01 A 5
2 2004-01-01 B NULL
3 2004-01-01 C NULL
4 2004-01-02 A NULL
5 2004-01-02 B 7
6 2004-01-02 C NULL
7 2004-01-03 A 6
8 2004-01-03 B 7
9 2004-01-03 C 9
10 2004-01-04 A NULL
11 2004-01-04 B NULL
12 2004-01-04 C NULL
It would be great if I manage to write a query that delivers
date company column3
1 2004-01-01 A 5
2 2004-01-02 A 5
3 2004-01-02 B 7
4 2004-01-03 A 6
5 2004-01-03 B 7
6 2004-01-03 C 9
7 2004-01-04 A 6
8 2004-01-04 B 7
9 2004-01-04 C 9
I tried:
SELECT a.date, a.company, COALESCE(a.column3, (SELECT b.column3 FROM mytable b
WHERE b.company=a.company AND b.column3 IS NOT NULL ORDER BY b.company=a.company
DESC LIMIT 1)) FROM mytable a;
There are two problems with the code:
1. It does not delete all records with NULL values until the first NOT NULL value, but fills in all missing values.
2. It fills them in with the first observation in the column, not with the last observation before the missing value.

I suggest two subquery levels with window functions instead of correlated subqueries:
SELECT *
FROM  (
   SELECT the_date, company, max(col3) OVER (PARTITION BY company, grp) AS col3
   FROM  (
      SELECT *, count(col3) OVER (PARTITION BY company ORDER BY the_date) AS grp
      FROM   tbl
      ) sub1
   ) sub2
WHERE  col3 IS NOT NULL
ORDER  BY the_date, company;
Produces the requested result.
This assumes unique entries per (company, the_date). It should be much faster than correlated subqueries for tables with more than just a few rows. An index would help performance a lot (make it UNIQUE to also enforce the uniqueness assumption):
CREATE INDEX tbl_company_date_idx ON tbl (company, the_date);
How?
The aggregate function count() ignores NULL values when counting. Used as an aggregate window function, it computes the running count of the column according to the default window frame, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. As a result, the count stays "stuck" on rows with NULL values, so each non-null row and the NULL rows that follow it form a group that shares the same (non-null) value.
In the second window function, the only non-null value per group is easily extracted with max(). The group before the first non-null value retains NULL, which is easily eliminated in the final SELECT.
See:
Retrieve last known value for each column of a row
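To see the grouping at work, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and runs the same two steps from Python. SQLite is only a stand-in chosen for illustration (the answer targets PostgreSQL), and it needs version 3.25 or later for window functions. The table and column names (tbl, the_date, col3) follow the answer.

```python
import sqlite3

# Sample rows from the question, using the answer's column names.
rows = [
    ("2004-01-01", "A", 5), ("2004-01-01", "B", None), ("2004-01-01", "C", None),
    ("2004-01-02", "A", None), ("2004-01-02", "B", 7), ("2004-01-02", "C", None),
    ("2004-01-03", "A", 6), ("2004-01-03", "B", 7), ("2004-01-03", "C", 9),
    ("2004-01-04", "A", None), ("2004-01-04", "B", None), ("2004-01-04", "C", None),
]

con = sqlite3.connect(":memory:")  # SQLite is an assumption, not the asker's DBMS
con.execute("CREATE TABLE tbl (the_date TEXT, company TEXT, col3 INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# Step 1 only, for company A: the running count of non-null values.
groups = con.execute("""
    SELECT the_date, company, col3,
           count(col3) OVER (PARTITION BY company ORDER BY the_date) AS grp
    FROM tbl
    WHERE company = 'A'
    ORDER BY the_date
""").fetchall()

# Both steps: spread the single non-null value over each group,
# then drop the leading groups that never saw a value.
filled = con.execute("""
    SELECT *
    FROM (
        SELECT the_date, company,
               max(col3) OVER (PARTITION BY company, grp) AS col3
        FROM (
            SELECT *, count(col3) OVER (PARTITION BY company ORDER BY the_date) AS grp
            FROM tbl
        ) sub1
    ) sub2
    WHERE col3 IS NOT NULL
    ORDER BY the_date, company
""").fetchall()

for row in groups:
    print(row)
for row in filled:
    print(row)
```

For company A, grp comes out as 1, 1, 2, 2: the NULL rows inherit the count of the preceding non-null row, which is what lets max() pick up the right value in the second step.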

Try:
SELECT *
FROM (
    SELECT id,
           date,
           company,
           case when column3 is not null
                then column3
                else (
                    SELECT column3
                    FROM mytable t1
                    WHERE t1.company = t.company
                      AND t1.date < t.date
                      AND t1.column3 IS NOT NULL
                    ORDER BY t1.date DESC LIMIT 1
                )
           end column3
    FROM mytable T
) AS subq
WHERE column3 IS NOT NULL;
demo: http://sqlfiddle.com/#!15/0cdce/12

Related

Select rows where all values within group are NULL

How to:
Select rows where all values within a group are NULL. Below, the group is defined by Id. I
wish to select the rows of groups where all values are NULL.
Id Var1
1 NULL
1 NULL
1 NULL
2 10
2 20
2 30
3 NULL
3 30
What I have tried:
select id
from table
where all var1 is null
group by id
Desired result:
id var1
1 NULL
1 NULL
1 NULL
Use having instead of where. It filters after aggregation:
select id
from table
group by id
having max(var1) is null;
A similar method uses:
having count(var1) = 0
You can try with this query:
select id,Var1
from table1 as a
where not exists (select id
from table1 as b
where a.id=b.id and b.Var1 is not null)
The subquery finds the ids that have non-NULL values, so those ids are excluded from the main query.
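The two approaches return different shapes, which a small run makes visible. The sketch below uses an in-memory SQLite database from Python, purely as an assumption for illustration, with the sample data from the question and the table name table1 from the second answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite is a stand-in; the question names no DBMS
con.execute("CREATE TABLE table1 (id INTEGER, var1 INTEGER)")
con.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, None), (1, None), (1, None), (2, 10), (2, 20), (2, 30), (3, None), (3, 30)],
)

# HAVING variant: one row per id whose var1 values are all NULL.
all_null_ids = con.execute(
    "SELECT id FROM table1 GROUP BY id HAVING count(var1) = 0"
).fetchall()

# NOT EXISTS variant: the individual rows belonging to those ids.
all_null_rows = con.execute("""
    SELECT id, var1
    FROM table1 AS a
    WHERE NOT EXISTS (SELECT 1
                      FROM table1 AS b
                      WHERE a.id = b.id AND b.var1 IS NOT NULL)
""").fetchall()

print(all_null_ids)
print(all_null_rows)
```

The HAVING query yields only the id 1, while the NOT EXISTS query yields the three original rows for id 1, which matches the desired result in the question more closely.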

SQL query to find counts of numbers in running total

Suppose the table has 1 column ID and the values are as below:
ID
5
5
5
6
5
5
6
6
the output should be
ID count
5 3
6 1
5 2
6 2
How can we do that in a single SQL query.
If you only want the total count of the records, you can write:
select count(*) from table_name;
In relational databases, the rows of a table have no inherent order; see https://en.wikipedia.org/wiki/Table_(database):
the database system does not guarantee any ordering of the rows unless
an ORDER BY clause is specified in the SELECT statement that queries
the table.
Therefore, to get the desired result, the table must have an additional column that defines the order of the rows (and can be used in an ORDER BY clause).
In the example below, the rn column defines such an order:
select * from tab123 ORDER BY rn;
RN ID
---------- -------
1 5
2 5
3 5
4 6
5 5
6 5
7 6
8 6
Starting with Oracle 12c, the new MATCH_RECOGNIZE clause can be used:
select * from tab123
match_recognize(
    order by rn
    measures
        strt.id as id,
        count(*) as cnt
    one row per match
    after match skip past last row
    pattern( strt ss* )
    define ss as ss.id = prev( ss.id )
);
On earlier versions that support window functions (Oracle 10 and above), you can combine two window functions, LAG ... OVER and SUM ... OVER, like this:
select max( id ) as id, count(*) as cnt
FROM (
    select id, sum( xxx ) over (order by rn ) as yyy
    from (
        select t.*,
               case lag( id ) over (order by rn )
                    when id then 0 else 1 end as xxx
        from tab123 t
    )
)
GROUP BY yyy
ORDER BY yyy;
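The LAG/SUM approach is easier to follow with the intermediate columns visible. The sketch below runs it against an in-memory SQLite database from Python; SQLite (3.25 or later) is an assumption made for illustration since the answer is written for Oracle, and the subquery aliases flagged and numbered are added only for readability.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stand-in for the Oracle table tab123
con.execute("CREATE TABLE tab123 (rn INTEGER, id INTEGER)")
con.executemany(
    "INSERT INTO tab123 VALUES (?, ?)",
    list(enumerate([5, 5, 5, 6, 5, 5, 6, 6], start=1)),  # (rn, id) pairs
)

# Intermediate view: xxx flags the first row of each run,
# yyy is the running sum of those flags, i.e. a run number.
steps = con.execute("""
    SELECT rn, id, xxx, sum(xxx) OVER (ORDER BY rn) AS yyy
    FROM (
        SELECT t.*,
               CASE lag(id) OVER (ORDER BY rn) WHEN id THEN 0 ELSE 1 END AS xxx
        FROM tab123 t
    ) AS flagged
    ORDER BY rn
""").fetchall()

# Final step: one row per run, in run order.
runs = con.execute("""
    SELECT max(id) AS id, count(*) AS cnt
    FROM (
        SELECT id, sum(xxx) OVER (ORDER BY rn) AS yyy
        FROM (
            SELECT t.*,
                   CASE lag(id) OVER (ORDER BY rn) WHEN id THEN 0 ELSE 1 END AS xxx
            FROM tab123 t
        ) AS flagged
    ) AS numbered
    GROUP BY yyy
    ORDER BY yyy
""").fetchall()

for row in steps:
    print(row)
print(runs)
```

On the first row lag() returns NULL, so the CASE falls through to ELSE 1 and the first run starts correctly.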

Query to group based on the sorted table result

Below is my table
a 1
a 2
a 1
b 1
a 2
a 2
b 3
b 2
a 1
My Expected output is
a 4
b 1
a 4
b 5
a 1
I want them to be grouped if they are in sequence.
If your dbms supports window functions, you can use the row_number difference to assign the same group to consecutive values (which are the same) in one column. After assigning the groups, it is easy to sum the values for each group.
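select col1, sum(col2)
from (select t.*,
             row_number() over(order by someid)
           - row_number() over(partition by col1 order by someid) as grp
      from tablename t
     ) x
group by col1, grp
order by min(someid)
Replace tablename, col1, col2 and someid with the appropriate table and column names. someid should be the column that defines the row order. The ORDER BY min(someid) returns the groups in their original sequence; without it, the output order is not guaranteed.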
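As a check of the row_number difference technique, the sketch below runs it on the question's data in an in-memory SQLite database from Python. SQLite (3.25 or later) and the someid ordering column are assumptions: the question's table has no ordering column, so one is generated here from the row position.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite is a stand-in; the question names no DBMS
con.execute("CREATE TABLE tablename (someid INTEGER, col1 TEXT, col2 INTEGER)")
data = [("a", 1), ("a", 2), ("a", 1), ("b", 1), ("a", 2),
        ("a", 2), ("b", 3), ("b", 2), ("a", 1)]
# someid is hypothetical: it records the order in which the rows were listed.
con.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(i, c, v) for i, (c, v) in enumerate(data, start=1)],
)

sums = con.execute("""
    SELECT col1, sum(col2) AS total
    FROM (
        SELECT t.*,
               row_number() OVER (ORDER BY someid)
             - row_number() OVER (PARTITION BY col1 ORDER BY someid) AS grp
        FROM tablename t
    ) x
    GROUP BY col1, grp
    ORDER BY min(someid)
""").fetchall()

print(sums)
```

Within one run of equal col1 values both row numbers advance by one per row, so their difference stays constant; it changes only when another value interrupts the run. Note that grp alone is not unique across different col1 values, which is why the grouping uses both col1 and grp.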

how to select a value based on unique id column

Please help me.
I have a table with 3 columns. When I select from it, I need to duplicate the values based on the Id:
Id Days Values
1 5 7
1 NULL NULL
1 NULL NULL
2 7 25
2 NULL NULL
2 8 274
2 NULL NULL
I need a Result as
Id Days Values
1 5 7
1 5 7
1 5 7
2 7 25
2 7 25
2 8 274
2 8 274
Generate a set of data with the desired repeating values (B). Then join back to the base set (A) containing the # of record to repeat. This assumes that each ID will only have one record populated. If this is not the case, then you will not get desired results.
SELECT B.ID, B.MDays as Days, B.Mvalues as values
FROM TABLE A
INNER JOIN (SELECT ID, max(days) mDays, Max(values) Mvalues
FROM Table
GROUP BY ID) B
on A.ID = B.ID
After the question was updated:
This will get you close, but without a way to define groups within an ID I can't subdivide the records into 2 and 2:
SELECT B.ID, B.Days as Days, B.values as values
FROM TABLE A
INNER JOIN (SELECT Distinct ID, days, values
            FROM Table) B
on A.ID = B.ID
and A.days is null
This still isn't close enough, because we don't know how to order the rows.
It assumes an order within the table, which can't be trusted. We generate a row number for each row using the Row_Number() Over syntax, partitioned by ID and Days and ordered by ID and Days (which doesn't work properly because of the NULL values).
We then join this data set back to a distinct set of values on ID and Days
to get close, but we still need grouping logic beyond ID that handles the NULL records and the lack of order.
With CTE AS (
    SELECT ID, Days, Values, Row_Number() Over (partition by ID, Days ORDER BY ID, Days) RN
    FROM Table)
SELECT *
FROM (SELECT Distinct ID, Days, Values, max(RN) mRN FROM CTE GROUP BY ID, Days, Values) A
INNER JOIN CTE B
   ON A.ID = B.ID
  and A.Days = B.Days
  and mRN <= B.RN
Order by B.RN

SQL to get next not null value in column

How can I get next not null value in column? I have MSSQL 2012 and table with only one column. Like this:
rownum Orig
------ ----
1 NULL
2 NULL
3 9
4 NULL
5 7
6 4
7 NULL
8 9
and I need this data:
Rownum Orig New
------ ---- ----
1 NULL 9
2 NULL 9
3 9 9
4 NULL 7
5 7 7
6 4 4
7 NULL 9
8 9 9
Code to start:
create table #t (rownum int, orig int);
insert into #t values (1,NULL),(2,NULL),(3,9),(4,NULL),(5,7),(6,4),(7,NULL),(8,9);
select rownum, orig from #t;
One method is to use outer apply:
select t.*, t2.orig as newval
from #t t outer apply
     (select top 1 t2.*
      from #t t2
      where t2.rownum >= t.rownum and t2.orig is not null
      order by t2.rownum
     ) t2;
One way to do this with window functions (in SQL Server 2012+) is to use a cumulative min on rownum, in inverse order:
select t.*, max(orig) over (partition by nextid) as newval
from (select t.*,
             min(case when orig is not null then rownum end) over (order by rownum desc) as nextid
      from #t t
     ) t;
The subquery gets the rownum of the next non-NULL value. The outer query then spreads the orig value over all the rows with the same nextid (remember, in a group of rows with the same nextid, only one will have a non-NULL value for orig).
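A quick run of the window-function variant on the question's data is sketched below, using an in-memory SQLite database from Python. This is an assumption for illustration only: the question is about SQL Server, the plain table t stands in for the temp table #t, and the ordering column is taken to be rownum as in the question's table definition.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite (3.25+) stand-in for SQL Server
con.execute("CREATE TABLE t (rownum INTEGER, orig INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, None), (2, None), (3, 9), (4, None), (5, 7), (6, 4), (7, None), (8, 9)],
)

result = con.execute("""
    SELECT rownum, orig, max(orig) OVER (PARTITION BY nextid) AS newval
    FROM (
        -- Walking backwards, remember the smallest rownum seen so far
        -- that carries a non-NULL value: that is the "next" non-NULL row.
        SELECT t.*,
               min(CASE WHEN orig IS NOT NULL THEN rownum END)
                   OVER (ORDER BY rownum DESC) AS nextid
        FROM t
    ) AS s
    ORDER BY rownum
""").fetchall()

for row in result:
    print(row)
```

Rows 1 to 3 share nextid 3, rows 4 and 5 share nextid 5, row 6 stands alone, and rows 7 and 8 share nextid 8, so each group contains exactly one non-NULL orig for max() to pick up.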