Difference two rows in a single SQL SELECT statement - sql

I have a database table that has a structure like the one shown below:
CREATE TABLE dated_records (
recdate DATE NOT NULL,
col1 DOUBLE NOT NULL,
col2 DOUBLE NOT NULL,
col3 DOUBLE NOT NULL,
col4 DOUBLE NOT NULL,
col5 DOUBLE NOT NULL,
col6 DOUBLE NOT NULL,
col7 DOUBLE NOT NULL,
col8 DOUBLE NOT NULL
);
I want to write an SQL statement that returns a single record containing the changes between two supplied dates for specified columns, e.g. col1, col2 and col3.
For example, I might want to see how much the values in col1, col2 and col3 changed during the interval between two dates. A dumb way of doing this would be to select the row for each date separately and then difference the fields outside the db server:
SQL1 = "SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-01-01'";
SQL2 = "SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-02-01'";
However, I'm sure there is a smarter way of performing the differencing using pure SQL. I am guessing that it will involve a self join (and possibly a nested subquery), but I may be overcomplicating things, so I decided to ask the SQL experts here how they would solve this problem in the most efficient way.
Ideally the SQL should be DB agnostic, but if it needs to be tied to a particular db, then it would have to be PostgreSQL.

Just select the two rows, join them into one, and subtract the values:
select d1.recdate, d2.recdate,
(d2.col1 - d1.col1) as delta_col1,
(d2.col2 - d1.col2) as delta_col2,
...
from (select *
from dated_records
where recdate = <date1>
) d1 cross join
(select *
from dated_records
where recdate = <date2>
) d2
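A minimal runnable sketch of the cross-join approach above. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, a cut-down two-column version of the table, and made-up sample values; the helper name delta_between is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dated_records ("
    " recdate TEXT NOT NULL, col1 REAL NOT NULL, col2 REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO dated_records VALUES (?, ?, ?)",
    [("2001-01-01", 10.0, 100.0), ("2001-02-01", 12.5, 90.0)],
)


def delta_between(conn, date1, date2):
    """Return (date1, date2, delta_col1, delta_col2) in a single query."""
    sql = """
        SELECT d1.recdate, d2.recdate,
               d2.col1 - d1.col1 AS delta_col1,
               d2.col2 - d1.col2 AS delta_col2
        FROM (SELECT * FROM dated_records WHERE recdate = ?) d1
        CROSS JOIN
             (SELECT * FROM dated_records WHERE recdate = ?) d2
    """
    return conn.execute(sql, (date1, date2)).fetchone()


row = delta_between(conn, "2001-01-01", "2001-02-01")
print(row)
```

If either date has no row, the cross join produces nothing and the helper returns None, so a caller would need to handle that case.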

I think that if what you want is to get the rows that the two SELECT queries do not have in common, you can use the EXCEPT operator:
The EXCEPT operator returns the rows that are in the first result set
but not in the second.
So your two queries become one single query, with the EXCEPT operator joining them:
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-01-01'
EXCEPT
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-02-01'

SELECT
COALESCE
(a.col1 -
(
SELECT b.col1
FROM dated_records b
WHERE b.id = a.id + 1
),
a.col1)
FROM dated_records a
WHERE recdate='2001-01-01';

You could use window functions plus DISTINCT:
SELECT DISTINCT
first_value(recdate) OVER () AS date1
,last_value(recdate) OVER () AS date2
,last_value(col1) OVER () - first_value(col1) OVER () AS delta1
,last_value(col2) OVER () - first_value(col2) OVER () AS delta2
...
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-01-03')
For any two days. Uses a single index or table scan, so it should be fast.
I did not order the window, but all calculations use the same window, so the values are consistent.
This solution can easily be generalized for calculations between n rows. You may want to use nth_value() from the Postgres arsenal of window functions in this case.
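A hedged sketch of the window-function idea, run against in-memory SQLite (an assumption; it needs SQLite 3.25 or newer for window functions). Unlike the query above, this variant orders the window by recdate and spans the whole partition, so that first_value and last_value are deterministic; the sample values are made up.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for PostgreSQL; values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dated_records (recdate TEXT NOT NULL, col1 REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO dated_records VALUES (?, ?)",
    [("2001-01-01", 10.0), ("2001-01-02", 11.0), ("2001-01-03", 14.0)],
)

sql = """
    SELECT DISTINCT
           first_value(recdate) OVER w AS date1,
           last_value(recdate)  OVER w AS date2,
           last_value(col1) OVER w - first_value(col1) OVER w AS delta1
    FROM dated_records
    WHERE recdate IN ('2001-01-01', '2001-01-03')
    WINDOW w AS (ORDER BY recdate
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
"""
# Both selected rows yield the same values, so DISTINCT leaves one row.
window_rows = conn.execute(sql).fetchall()
print(window_rows)
```

The explicit frame matters: with an ORDER BY and the default frame, last_value would only see rows up to the current one.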

This seemed a quicker way to write this if you are looking for a simple delta.
SELECT first(col1) - last(col1) AS delta_col1
, first(col2) - last(col2) AS delta_col2
FROM dated_records WHERE recdate IN ('2001-02-01', '2001-01-01')
You may not know whether the first row or the second row comes first, but you can always wrap the answer in abs(first(col1)-last(col1))

Related

Postgresql subtract comma separated string in one column from another column

The format is like:
col1               col2
V1,V2,V3,V4,V5,V6  V4,V1,V6
V1,V2,V3           V2,V3
I want to create another column called col3 which contains the subtraction of two columns.
What I have tried:
UPDATE myTable
SET col3=(replace(col1,col2,''))
This works for rows like row 2, but only because the values in col2 appear there as one contiguous substring of col1; with replace(), the order of the values matters.
I was wondering if there's a reliable way to achieve the same goal for rows like row 1.
So the desired output would be:
col1               col2      col3
V1,V2,V3,V4,V5,V6  V4,V1,V6  V2,V3,V5
V1,V2,V3           V2,V3     V1
Any suggestions would be appreciated!
Split the values into rows, subtract the sets, and then assemble the result back into a string. All of this can be done in an expression that defines a new query column.
with t (col1,col2) as (values
('V1,V2,V3,V4,V5,V6','V4,V1,V6'),
('V1,V2,V3','V2,V3')
)
select col1,col2
, (
select string_agg(v,',')
from (
select v from unnest(string_to_array(t.col1,',')) as a1(v)
except
select v from unnest(string_to_array(t.col2,',')) as a2(v)
) x
)
from t
You will have to unnest the elements then apply an EXCEPT clause on the "unnested" rows and aggregate back:
select col1,
col2,
(select string_agg(item,',' order by item)
from (
select *
from string_to_table(col1, ',') as c1(item)
except
select *
from string_to_table(col2, ',') as c2(item)
) t)
from the_table;
I wouldn't store that result in a separate column, but if you really need to introduce even more problems by storing another comma-separated list, you can use an UPDATE:
update the_table
set col3 = (select string_agg(item,',' order by item)
from (
select *
from string_to_table(col1, ',') as c1(item)
except
select *
from string_to_table(col2, ',') as c2(item)
) t)
;
string_to_table() requires Postgres 14 or newer. If you are using an older version, you need to use unnest(string_to_array(col1, ',')) instead.
If you need that a lot, consider creating a function:
create function remove_items(p_one text, p_other text)
returns text
as
$$
-- items of p_one that do not appear in p_other, re-joined in sorted order
select string_agg(item,',' order by item)
from (
select *
from string_to_table(p_one, ',') as c1(item)
except
select *
from string_to_table(p_other, ',') as c2(item)
) t;
$$
language sql
immutable;
Then the above can be simplified to:
select col1, col2, remove_items(col1, col2)
from the_table;
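The split / EXCEPT / string_agg logic of the answers above can be sketched outside the database as well. This is only an illustration in Python, assuming plain comma-separated values without surrounding spaces; it mirrors the remove_items name for readability but is not the SQL function itself.

```python
def remove_items(p_one: str, p_other: str) -> str:
    """Return the items of p_one that are not in p_other, sorted and comma-joined.

    Like EXCEPT, this treats both lists as sets, so duplicates collapse.
    """
    to_remove = set(p_other.split(","))
    kept = {item for item in p_one.split(",") if item not in to_remove}
    return ",".join(sorted(kept))


print(remove_items("V1,V2,V3,V4,V5,V6", "V4,V1,V6"))
print(remove_items("V1,V2,V3", "V2,V3"))
```

One difference to keep in mind: when nothing is left, this sketch returns an empty string, whereas string_agg over zero rows returns NULL.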
Note: PostgreSQL is not my forte, but I thought I'd have a go at it. Try:
SELECT col1, col2, RTRIM(REGEXP_REPLACE(Col1,CONCAT('\m(?:', REPLACE(Col2,',','|'),')\M,?'),'','g'), ',') as col3 FROM myTable
The idea is to use a regular expression to replace all values, based on the following pattern:
\m - Word-boundary at start of word;
(?:V4|V1|V6) - A non-capture group that holds the alternatives from col2;
\M - Word-boundary at end of word;
,? - Optional comma.
When the matches are replaced with nothing, a trailing comma may be left over, which RTRIM() cleans up. In the online demo I had to replace \m and \M with the generic \b word boundary to showcase the outcome.
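The same pattern can be tried out in Python's re module, which is roughly what the demo mentioned above does. This is a sketch under assumptions: Python has no \m and \M, so \b is used for both boundaries, and the helper name remove_by_regex is hypothetical.

```python
import re


def remove_by_regex(col1: str, col2: str) -> str:
    """Strip every value listed in col2 from col1, then drop a trailing comma."""
    # Build the alternation (?:V4|V1|V6) from col2, escaping each value.
    alternatives = "|".join(re.escape(value) for value in col2.split(","))
    pattern = r"\b(?:" + alternatives + r")\b,?"
    return re.sub(pattern, "", col1).rstrip(",")


print(remove_by_regex("V1,V2,V3,V4,V5,V6", "V4,V1,V6"))
print(remove_by_regex("V1,V2,V3", "V2,V3"))
```

The word boundaries keep V1 from matching inside a longer value such as V10, but they only behave as intended when the values consist of word characters.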

NOT IN vs concatenate columns

Aren't the two SQL statements below the same? I mean, functionality-wise, shouldn't they do the same thing?
I was expecting the first SQL to return a result as well.
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 +COL2 FROM #TEST_1)
CREATE TABLE #TEST
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')
CREATE TABLE #TEST_1
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')
--No result
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:
--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--WHERE 2
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
As you might have noticed, these do not behave the same. In fact, both logically and in the way the database engine handles them, they are completely different.
To start with, WHERE 2 is not SARGable. This means that any indexes on your tables cannot be used, and the data engine has to scan the entire table. WHERE 1, however, is SARGable, so if you had any indexes they could be used to perform seeks, likely helping performance.
From the point of view of logic, let's look at WHERE 2 first. This requires that the concatenated value of COL1 and COL2 not match the concatenated value of COL1 and COL2 from any row of the other table, which means the two values being compared must be on the same row. So '123456' would match when Col1 has the value '123' and Col2 the value '456' on a single row. (Strictly speaking, any other split that concatenates to the same string, such as '12' and '3456', would match as well, which is one more reason not to compare concatenated columns.)
For WHERE 1, however, the value of Col1 must not be found anywhere in the other table, and the value of Col2 must not be found either, but the two can be on different rows. This is where things differ. As '123' is the only value of Col1 and appears in both tables, the first NOT IN is never fulfilled and no rows are returned.
If you wanted a SARGable version of WHERE 2, I would suggest using a NOT EXISTS:
--1 row
SELECT T.COL1, --Don't use *, specify your columns
T.COL2, --Qualifying your columns is important!
T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
FROM #TEST_1 T1
WHERE T1.COL1 = T.COL1
AND T1.COL2 = T.COL2);
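To see the three predicates side by side, here is a small runnable sketch using in-memory SQLite rather than SQL Server (an assumption). The temp tables become ordinary tables named test and test_1, and the concatenation is written with ||, because in SQLite + would add the strings numerically.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; || replaces + for concatenation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test   (col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE test_1 (col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO test   VALUES ('123', '321', 'ABC'), ('123', '436', 'ABC');
    INSERT INTO test_1 VALUES ('123', '532', 'ABC'), ('123', '436', 'ABC');
""")

# WHERE 1: each column is checked independently, across all rows.
where_1 = conn.execute("""
    SELECT col1, col2 FROM test
    WHERE col1 NOT IN (SELECT col1 FROM test_1)
      AND col2 NOT IN (SELECT col2 FROM test_1)
""").fetchall()

# WHERE 2: the concatenated pair is checked, so both values share a row.
where_2 = conn.execute("""
    SELECT col1, col2 FROM test
    WHERE col1 || col2 NOT IN (SELECT col1 || col2 FROM test_1)
""").fetchall()

# NOT EXISTS: same row-wise logic as WHERE 2, without concatenation.
not_exists = conn.execute("""
    SELECT t.col1, t.col2 FROM test t
    WHERE NOT EXISTS (SELECT 1 FROM test_1 t1
                      WHERE t1.col1 = t.col1 AND t1.col2 = t.col2)
""").fetchall()

print(where_1, where_2, not_exists)
```

With the question's data, the first query returns nothing while the other two return the ('123', '321') row.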
When you use + on two varchar values in SQL Server, it concatenates them; it does not add them numerically (that would only happen if one of the operands had a numeric type).
In the first query you are not concatenating anything, so what it does is:
Select all rows from #TEST whose COL1 value is not in #TEST_1 and whose COL2 value is not in #TEST_1.
And actually, the first condition alone cuts everything out, since COL1 is '123' in every row of both tables.
The second query compares the concatenated strings.
So the second query does:
Select all rows from #TEST where COL1 + COL2 ('123321' for the first row and '123436' for the second row) is not in #TEST_1.
And if you concatenate the columns in #TEST_1, the values are:
For the first row COL1 + COL2 = '123532'
For the second row COL1 + COL2 = '123436'
So only the row with '123321' is not in #TEST_1; that's why you get 1 row as the result.
To sum up:
That's why you see only 1 row from the second query and no records from the first. In the first query, the first condition alone removes everything. In the second query, the SQL engine compares the concatenated values row by row.
So '123' + '321' is '123321', not '444'.

How to select a value from different row if the column is null in the current row?

I have a decode statement in my select SQL like this -
...
decode(instr(col1,'str1'), 0, 'STR1', 'STR2') as NAME,
...
The problem is the col1 could be null. So I thought I could use an inner decode like the following -
decode(instr(
decode(col1, null, (
select unique col1 from SAMETABLE st where st.pid = pid) as col2, col1), 'str1'), 0, 'STR1', 'STR2') as NAME,
But it failed.
Here is a possible snapshot of what is in the DB:
      col1       pid
row1  null       1
row2  somevalue  1
I would like to use the value of col1 in row2 to replace the value in row1 when col1 is null in row1 and the two records' pid values are equal.
Can anyone point out if I'm doing something impossible?
There are the following issues with your code:
You give the inner table the alias st and then write where st.pid = pid, but the unqualified pid also resolves to the table of the inner query, so the condition compares the column with itself. Instead, give the table in the main query an alias and reference that.
You give the outcome of the inner query an alias (as col2), but aliases are not allowed inside expressions, so that needs to be removed.
The inner query selects unique col1, but that can still return multiple rows, which raises an error. The inner query must never return more than one row (whether there are several different non-null values or none at all), so you should use an aggregate function, like min.
decode(a, null, b, a) is a long way to write nvl(a, b).
So you could use this:
select decode(
instr(
nvl(col1, (select min(col1) from mytable where pid = t1.pid)),
'str1'
),
0, 'STR1', 'STR2'
) as NAME
from mytable t1
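A hedged, runnable approximation of that query in SQLite (an assumption, since SQLite has neither decode nor nvl): COALESCE plays the role of nvl and a CASE expression plays the role of decode. The sample rows and the inner alias t2 are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; COALESCE ~ nvl, CASE ~ decode.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, pid INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(None, 1), ("has str1 inside", 1), ("other", 2)],  # made-up rows
)

sql = """
    SELECT t1.pid,
           CASE instr(
                    COALESCE(t1.col1,
                             (SELECT min(t2.col1)
                              FROM mytable t2
                              WHERE t2.pid = t1.pid)),
                    'str1')
                WHEN 0 THEN 'STR1'
                ELSE 'STR2'
           END AS name
    FROM mytable t1
    ORDER BY t1.rowid
"""
# The NULL col1 in the first row borrows the value from its pid sibling.
names = conn.execute(sql).fetchall()
print(names)
```

Because min() ignores NULLs and always yields a single row, the scalar subquery cannot fail with a multiple-rows error.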
I have tried this in Oracle 11g and it works pretty well. I have also tried changing the starting value of col1, and it still works. So I guess you have some other issue that is related to the field type, not to how DECODE works.
DECLARE
col1 VARCHAR(10);
result VARCHAR2(10);
BEGIN
col1:=null;
select DECODE(
instr(DECODE(col1, null, (select 'HELLO' from DUAL),
col1),'str1'), 0, 'STR1', 'STR2') into result
from DUAL;
dbms_output.PUT_LINE(result);
END;
I guess you have to change the subquery:
select unique col1 from SAMETABLE st where st.pid = pid
to something like
select unique col1 from SAMETABLE st where st.pid = pid and col1 is not null

SQL _ wildcard not working as expected. Why?

So I have this query:
select id, col1, len(col1)
from tableA
From there I wanted to grab all rows where col1 has exactly 5 characters and starts with 15:
select id, col1, len(col1)
from tableA
where col1 like '15___' -- underscore 3 times
Now col1 is an nvarchar(192), and there are values that start with 15 and have a length of 5. But the second query always returns no rows.
Why is that?
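Could it be that the field contains trailing spaces, such as "15123 "? LEN() ignores trailing spaces, which would explain why such values report a length of 5 yet fail to match '15___'.
You could also try another solution: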
select id, col1, len(col1)
from tableA
where col1 like '15%' AND Len(col1)=5
EDIT - FOR FUTURE REFERENCE:
For the sake of completeness: char and nchar columns always use the full declared size, so in a char(10) column the value '15' is stored as '15' followed by 8 padding spaces, whereas a varchar column stores only what it is given, so '15' is simply '15'.
To get around this you could:
A) Do an LTRIM/RTRIM to cut off all extra spaces
select id, col1, len(col1)
from tableA
where rtrim(ltrim(col1)) like '15___'
B) Do a LEFT() to only grab the left 5 characters
select id, col1, len(col1)
from tableA
where left(col1,5) like '15___'
C) Cast as a varchar, a rather sloppy approach
select id, col1, len(col1)
from tableA
where CAST(col1 AS Varchar(192)) like '15___'
Does this query return anything?
select id, col1, len(col1)
from tableA
where len(col1) = 5 and
left(col1, 2) = '15';
If not, then there are no values that match that pattern. And, my best guess would be spaces, in which case, this might work:
select id, col1, len(col1)
from tableA
where ltrim(rtrim(col1)) like '15___';
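The effect of trailing spaces on the underscore wildcard can be reproduced with a small sketch in in-memory SQLite (an assumption; note that SQLite's length() counts trailing spaces, unlike SQL Server's LEN(), so only the LIKE behaviour is shown). The sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, col1 TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [(1, "15123"), (2, "15456   "), (3, "1599999"), (4, "25123")],
)

# Each underscore matches exactly one character, so padded values fail.
plain = [r[0] for r in conn.execute(
    "SELECT id FROM tableA WHERE col1 LIKE '15___' ORDER BY id")]

# Trimming first lets the padded value match as well.
trimmed = [r[0] for r in conn.execute(
    "SELECT id FROM tableA WHERE rtrim(ltrim(col1)) LIKE '15___' ORDER BY id")]

print(plain, trimmed)
```

Row 2 only shows up once the padding is trimmed, which is the situation the answers above suspect.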

Multiple data row columns per line

I am trying to display a single column from a data set but spread out across a single row. For example:
[Row1] [Row2] [Row3]
[Row4] [Row5] [Row6]
Instead of:
[Row1]
[Row2]
[Row3] etc.
The data set needs to be joined with another table based on a column from an outer table, which means that, AFAIK, cross tabs are out of the question, as you can't use data set parameters with them. There is no limit to how many rows a single data set may contain, but I want to show 3 of them as columns per line.
I can modify the data set query however I can only use plain old SQL in those queries except for creating temporary tables or creating anything "new" on the server side - a BIRT-only solution would be more desirable however.
If you can change the query to output
1 1 [Row1]
1 2 [Row2]
1 3 [Row3]
2 1 [Row4]
2 2 [Row5]
2 3 [Row6]
into a temporary table tmp, then you could query that using something like
select col1, col3 into tmp1 from tmp where col2 = 1;
select col1, col3 into tmp2 from tmp where col2 = 2;
select col1, col3 into tmp3 from tmp where col2 = 3;
select tmp1.col3, tmp2.col3, tmp3.col3 from tmp1, tmp2, tmp3 where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
You could generate col1 and col2 using rownum, but it's non-standard, and it requires the output of the original query to be sorted properly.
Edit:
If you can't use a temporary table, I assume you can use subqueries:
select tmp1.col3, tmp2.col3, tmp3.col3 from
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 1) as tmp1,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 2) as tmp2,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 3) as tmp3
where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
and hope the optimizer is smart.
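A runnable sketch of the subquery variant, assuming in-memory SQLite and a table named numbered that plays the role of ORIGINAL_QUERY. Here the line number (col1) and the position within the line (col2) are computed in Python for simplicity rather than with rownum.

```python
import sqlite3

# Assumption: SQLite, and a hypothetical table standing in for ORIGINAL_QUERY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbered (col1 INTEGER, col2 INTEGER, col3 TEXT)")
values = ["Row1", "Row2", "Row3", "Row4", "Row5", "Row6"]
conn.executemany(
    "INSERT INTO numbered VALUES (?, ?, ?)",
    # col1 = output line (1-based), col2 = position within the line (1..3)
    [(i // 3 + 1, i % 3 + 1, v) for i, v in enumerate(values)],
)

sql = """
    SELECT tmp1.col3, tmp2.col3, tmp3.col3
    FROM (SELECT col1, col3 FROM numbered WHERE col2 = 1) AS tmp1,
         (SELECT col1, col3 FROM numbered WHERE col2 = 2) AS tmp2,
         (SELECT col1, col3 FROM numbered WHERE col2 = 3) AS tmp3
    WHERE tmp1.col1 = tmp2.col1 AND tmp1.col1 = tmp3.col1
    ORDER BY tmp1.col1
"""
lines = conn.execute(sql).fetchall()
print(lines)
```

Because the three subqueries are inner-joined, a final line with fewer than three values would be dropped; outer joins from tmp1 would be needed to keep it.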