Multiple data row columns per line - SQL

I am trying to display a single column from a data set, but spread out across rows - several values per line. For example:
[Row1] [Row2] [Row3]
[Row4] [Row5] [Row6]
Instead of:
[Row1]
[Row2]
[Row3] etc.
The data set needs to be joined with another table based on a column from an outer table, which means, AFAIK, cross tabs are out of the question, as you can't use data set parameters with them. There is no limit to how many rows there will be in a single data set, but I want three row columns per line.
I can modify the data set query, but I can only use plain old SQL in those queries - no creating temporary tables or anything "new" on the server side. A BIRT-only solution would be more desirable, however.

If you can change the query to output
1 1 [Row1]
1 2 [Row2]
1 3 [Row3]
2 1 [Row4]
2 2 [Row5]
2 3 [Row6]
into a temporary table tmp, then you could query that using something like
select col1, col3 into tmp1 from tmp where col2 = 1;
select col1, col3 into tmp2 from tmp where col2 = 2;
select col1, col3 into tmp3 from tmp where col2 = 3;
select tmp1.col3, tmp2.col3, tmp3.col3 from tmp1, tmp2, tmp3 where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
You could generate col1 and col2 using rownum, but it's non-standard, and it requires the output of the original query to be sorted properly.
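For instance, if the database supports window functions, a sketch of generating those two numbering columns portably with ROW_NUMBER() (sort_col and val are hypothetical stand-ins for the original query's ordering column and displayed value; MOD() and integer-division syntax vary by dialect - SQL Server uses %, for example):
select (row_number() over (order by sort_col) + 2) / 3 as col1, -- line number
       mod(row_number() over (order by sort_col) - 1, 3) + 1 as col2, -- position within the line
       val as col3
from (ORIGINAL_QUERY) q;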
Edit:
If you can't use a temporary table, I assume you can use subqueries:
select tmp1.col3, tmp2.col3, tmp3.col3 from
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 1) as tmp1,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 2) as tmp2,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 3) as tmp3
where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
and hope the optimizer is smart.

NOT IN vs concatenate columns

Aren't both of the SQL statements below the same? I mean, functionality-wise, shouldn't they do the same thing?
I was expecting the first SQL to return a result as well.
CREATE TABLE #TEST
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')
CREATE TABLE #TEST_1
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')
--No result
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:
--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--WHERE 2
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
As you might have noticed, these do not behave the same. In fact, from a logic point of view and in the way the database engine handles them, they are completely different.
WHERE 2, to start with, is not SARGable. This means that any indexes on your tables could not be used, and the database engine would have to scan the entire table. WHERE 1, however, is SARGable, and if you had any indexes, they could be used to perform seeks, likely helping with performance.
From the point of view of logic, let's look at WHERE 2 first. This requires that the concatenated value of COL1 and COL2 not match any concatenated value of COL1 and COL2 from the other table, which means the values must come from the same row. So '123456' would match only when Col1 has the value '123' and Col2 the value '456' on the same row.
For WHERE 1, however, the value of Col1 must not be found in the other table, and the value of Col2 must not be found either, but they can be on different rows. This is where things differ. As '123' appears in Col1 of both tables (and is the only value), the NOT IN isn't fulfilled and no rows are returned.
If you wanted a SARGable version of WHERE 2, I would suggest using NOT EXISTS:
--1 row
SELECT T.COL1, --Don't use *, specify your columns
       T.COL2, --Qualifying your columns is important!
       T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
                  FROM #TEST_1 T1
                  WHERE T1.COL1 = T.COL1
                    AND T1.COL2 = T.COL2);
When you use + between two VARCHAR values like this, it concatenates them; it does not add them numerically. (In MySQL, + would indeed treat the strings as numbers and add them, but with SQL Server temp tables like #TEST, + on VARCHAR operands means concatenation; implicit conversion to a number only happens when one operand already has a numeric type.)
The first query checks each column independently:
Select all rows from #TEST whose COL1 value is not in #TEST_1 and whose COL2 value is not in #TEST_1.
Here the first condition alone cuts everything out, since the value '123' appears in COL1 of both tables.
The second query compares the concatenated values row by row:
Select all rows from #TEST where COL1 + COL2 (that's '123321' for the first row and '123436' for the second) is not in the set of concatenated values from #TEST_1 ('123532' and '123436').
Only '123436' has a match, so the first row ('123', '321', 'ABC') is the one record returned.
To sum up:
That's why you see one row from the second query and no rows from the first. In the first query only the first condition does any real work, and it cuts everything. And in the second query the concatenation ties COL1 and COL2 together per row: '123' + '321' is '123321', not 444.
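You can verify the concatenation behaviour quickly (a small T-SQL sketch, matching the question's temp-table dialect):
-- VARCHAR + VARCHAR concatenates
SELECT '123' + '321' AS concatenated; -- '123321'
-- addition only happens once the operands are numeric
SELECT CAST('123' AS INT) + CAST('321' AS INT) AS added; -- 444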

Optimize SQL to compare records between two tables and update one table

I need to compare the data between two tables and update one table. For example, the data in DEV (dev_tbl) should be updated from the work area table (work_tbl) if any columns (except the primary/unique columns) in work_tbl were modified.
I'm using the UPDATE SQL below, which works fine when there are fewer than 20K records. The UPDATE statement is slower when the record count is > 20K. The records to update will not exceed 300K.
Is there an efficient way to write the UPDATE statement? I'm open to a PL/SQL FORALL BULK UPDATE as well if the SQL cannot be tuned.
UPDATE dev_tbl t
SET (t.col1,
     t.col2,
     t.col3,
     t.col4) =
    (SELECT col1, col2, col3, col4
     FROM work_tbl s
     WHERE t.join_col = s.join_col
       AND t.pk_col = s.pk_col)
WHERE EXISTS
    (SELECT 1
     FROM work_tbl ss
     WHERE ss.join_col = t.join_col
       AND ss.pk_col = t.pk_col
       AND (ss.col1 || ss.col2 || ss.col3 || ss.col4) NOT IN
           (SELECT d.col1 || d.col2 || d.col3 || d.col4
            FROM dev_tbl d
            WHERE ss.join_col = d.join_col))
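One alternative worth benchmarking (a sketch only, assuming an Oracle environment given the PL/SQL mention, and that the compared columns are NOT NULL) is a single MERGE, which reads each table once instead of re-probing work_tbl for every row:
MERGE INTO dev_tbl t
USING work_tbl s
ON (t.pk_col = s.pk_col AND t.join_col = s.join_col)
WHEN MATCHED THEN
  UPDATE SET t.col1 = s.col1,
             t.col2 = s.col2,
             t.col3 = s.col3,
             t.col4 = s.col4
  -- only touch rows where something actually changed; wrap the
  -- comparisons in NVL() if the compared columns can hold NULLs
  WHERE t.col1 <> s.col1
     OR t.col2 <> s.col2
     OR t.col3 <> s.col3
     OR t.col4 <> s.col4;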

Unable to write exact SQL query to get result set

I have the below type of data set:
Base Col1 Col2 Col3
1000 0 10 1100
1100 0 10 1210
1210 0 10 1331
For deriving col3, I use a formula like:
col3 = (base - col1) * (1 + col2 / 100)
If you observe the above data set, the first row's col3 value is the second row's base value, and the col2 value is the same for all records.
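For the first row, for example: (1000 - 0) * (1 + 10 / 100) = 1000 * 1.1 = 1100, which then becomes the base of the second row.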
So now my problem: at a later point in time the col1 row values will be updated (col1 is part of the formula), and based on this I need to recalculate the col3 values using the formula above.
See the below data set for example: if the col1 values have been updated, then we need to recalculate the col3 values like below, using the formula col3 = (base - col1) * (1 + col2 / 100).
Base Col1 Col2 Col3
1000 10 10 1089
1089 20 10 1175.9
1175.9 30 10 1293.4
To get the above data set, I have tried the following.
SELECT
col1, col2,
col3 - SUM(col1 * (Power((1 + COL2 / 100.00), RNO)))
OVER(ORDER BY RNO ROWS UNBOUNDED PRECEDING)
FROM
(SELECT
row_number() OVER(ORDER BY col1) rno,
*
FROM
#TABLE1) A
But I am not getting the correct results.
Please use the below script to create the table and populate the data.
CREATE TABLE #Table1
(
[col1] INT,
[col2] INT,
[col3] INT
);
INSERT INTO #Table1
([col1],
[col2],
[col3])
VALUES (10,10, 1100),
(20,10,1210),
(30,10,1331);
Note: in my example the base value always depends on the previous row's col3 value.
Please help me.
You should not store calculation results in your table. This is redundant and can lead to wrong data, as you have noticed. Your table also lacks an order. So first things first: give the records a timestamp or a number. Then remove col3 and base. (Well, you must have the initial base value of course, so either keep the base column and make all values null except for the first one, or store the value somewhere else, or use a fixed value in your query.)
Rno Col1 Col2
1 0 10
2 0 10
3 0 10
To get the results you need a recursive query. The below query assumes the RNOs are adjacent (with non-adjacent numbers or dates, you'd have to use row_number to number your rows first). Here I just use 1000 as the base. If this is variable, store it somewhere and take it from there. (Note the division by 100.0 rather than 100: with integer columns, col2 / 100 would be integer division and yield 0. The casts keep the anchor and the recursive part on identical types, which SQL Server requires.)
with cte(rno, base, col1, col2, col3) as
(
  select rno, cast(1000 as decimal(18, 4)) as base, col1, col2,
         cast((1000 - col1) * (1 + col2 / 100.0) as decimal(18, 4)) as col3
  from mytable
  where rno = 1
  union all
  select m.rno, cte.col3 as base, m.col1, m.col2,
         cast((cte.col3 - m.col1) * (1 + m.col2 / 100.0) as decimal(18, 4))
  from mytable m
  join cte on m.rno = cte.rno + 1
)
select * from cte
order by rno;
You can create a view for this of course.
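If your ordering column is not an adjacent integer sequence, a sketch of the same query with the rows numbered first (ts is a hypothetical name for whatever column defines the order):
with numbered as
(
  select row_number() over (order by ts) as rno, col1, col2
  from mytable
),
cte(rno, base, col1, col2, col3) as
(
  -- same anchor and recursive members as above, reading from numbered
  select rno, cast(1000 as decimal(18, 4)) as base, col1, col2,
         cast((1000 - col1) * (1 + col2 / 100.0) as decimal(18, 4)) as col3
  from numbered
  where rno = 1
  union all
  select n.rno, cte.col3, n.col1, n.col2,
         cast((cte.col3 - n.col1) * (1 + n.col2 / 100.0) as decimal(18, 4))
  from numbered n
  join cte on n.rno = cte.rno + 1
)
select * from cte
order by rno;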
When col1 changes you need to update col3 of the same row;
when col3 changes you need to update the base of the next row;
when the base changes you need to update col3 of the same row...
and so on.
At every update of Base, col1, or col3 run this loop:
declare #i int = 1
while @i<>0 begin
update t set Col3 = newCol3
from (
select top 1 base, col1, col2, col3, (base - col1) * (1 + col2 / 100.0) newCol3
from #t
where col3 <> (base - col1) * (1 + col2 / 100.0)
order by base
) t
update t set base = newbase
from (
select top 1 base, col1, col2, col3, newbase
from (
select base, col1, col2, col3, LAG(col3,1,null) over (order by base) newbase
from #t
) t
where base <> newbase
order by base
) t
if @@ROWCOUNT=0 set @i=0
end
output
base col1 col2 col3
1000 10 10 1089
1089 20 10 1175.9
1175.9 30 10 1260.49 -- I think you have an error in your example

Difference two rows in a single SQL SELECT statement

I have a database table that has a structure like the one shown below:
CREATE TABLE dated_records (
recdate DATE NOT NULL,
col1 DOUBLE PRECISION NOT NULL,
col2 DOUBLE PRECISION NOT NULL,
col3 DOUBLE PRECISION NOT NULL,
col4 DOUBLE PRECISION NOT NULL,
col5 DOUBLE PRECISION NOT NULL,
col6 DOUBLE PRECISION NOT NULL,
col7 DOUBLE PRECISION NOT NULL,
col8 DOUBLE PRECISION NOT NULL
);
I want to write an SQL statement that will return a record containing the changes between two supplied dates for specified columns - e.g. col1, col2 and col3 - showing how much each value has changed during the interval between the two dates. A dumb way of doing this would be to select the rows (separately) for each date and then difference the fields outside the db server -
SQL1 = "SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-01-01'";
SQL2 = "SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-02-01'";
however, I'm sure there is a smarter way of performing the differencing using pure SQL. I am guessing that it will involve a self join (and possibly a nested subquery), but I may be overcomplicating things - I decided it would be better to ask the SQL experts on here to see how they would solve this problem in the most efficient way.
Ideally the SQL should be DB agnostic, but if it needs to be tied to be a particular db, then it would have to be PostgreSQL.
Just select the two rows, join them into one, and subtract the values:
select d1.recdate, d2.recdate,
(d2.col1 - d1.col1) as delta_col1,
(d2.col2 - d1.col2) as delta_col2,
...
from (select *
from dated_records
where recdate = <date1>
) d1 cross join
(select *
from dated_records
where recdate = <date2>
) d2
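With the question's dates plugged in, that would look something like this (a sketch; extend the column list as needed):
select d1.recdate, d2.recdate,
       (d2.col1 - d1.col1) as delta_col1,
       (d2.col2 - d1.col2) as delta_col2,
       (d2.col3 - d1.col3) as delta_col3
from (select * from dated_records where recdate = '2001-01-01') d1
     cross join
     (select * from dated_records where recdate = '2001-02-01') d2;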
I think that if what you want is a result set containing the rows that don't intersect between the two select queries, you can use the EXCEPT operator:
The EXCEPT operator returns the rows that are in the first result set
but not in the second.
So your two queries will become one single query, with the EXCEPT operator combining them:
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-01-01'
EXCEPT
SELECT col1, col2, col3 FROM dated_records WHERE recdate='2001-02-01'
SELECT
COALESCE
(a.col1 -
(
SELECT b.col1
FROM dated_records b
WHERE b.id = a.id + 1
),
a.col1)
FROM dated_records a
WHERE recdate='2001-01-01';
You could use window functions plus DISTINCT:
SELECT DISTINCT
first_value(recdate) OVER () AS date1
,last_value(recdate) OVER () AS date2
,last_value(col1) OVER () - first_value(col1) OVER () AS delta1
,last_value(col2) OVER () - first_value(col2) OVER () AS delta2
...
FROM dated_records
WHERE recdate IN ('2001-01-01', '2001-01-03')
For any two days. Uses a single index or table scan, so it should be fast.
I did not order the window, but all calculations use the same window, so the values are consistent.
This solution can easily be generalized for calculations between n rows. You may want to use nth_value() from the Postgres arsenal of window functions in this case.
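For example, comparing the first and second of three days might look like this (a sketch in Postgres syntax; the WINDOW clause gives every calculation the same ordered, full frame):
SELECT DISTINCT
       first_value(recdate) OVER w AS date1
      ,nth_value(recdate, 2) OVER w AS date2
      ,nth_value(col1, 2) OVER w - first_value(col1) OVER w AS delta1
      ,nth_value(col2, 2) OVER w - first_value(col2) OVER w AS delta2
FROM   dated_records
WHERE  recdate IN ('2001-01-01', '2001-01-02', '2001-01-03')
WINDOW w AS (ORDER BY recdate
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);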
This seemed a quicker way to write this if you are looking for a simple delta.
SELECT first(col1) - last(col1) AS delta_col1
, first(col2) - last(col2) AS delta_col2
FROM dated_records WHERE recdate IN ('2001-02-01', '2001-01-01')
You may not know which of the two rows comes first, but you can always wrap the answer in abs(first(col1) - last(col1)).

SQL sort/paging question

Let's say I have a pivoted, sorted data set like this:
ID Col1 Col2
1 a 11
2 b 22
3 c 33
4 d 44
5 e 55
When I make a paging call by returning two records at a time I would get the first two rows.
Let's say I want to return the same data, but without pivoting it, so my data set looks like:
ID Col Val
1 Col1 a
2 Col1 b
3 Col1 c
4 Col1 d
5 Col1 e
1 Col2 11
2 Col2 22
3 Col2 33
4 Col2 44
5 Col2 55
I would like to write an SQL statement that would return the same data as in the first example, but without pivoting the data first.
Some additional challenges:
1) There could be n columns, not just two.
2) It should also support a filter on all the columns. This part I have solved already; see below.
Filter on pivoted data
WHERE Col1 in ('a', 'b', 'c')
AND Col2 in ('11', '22')
Filter on unpivoted data
WHERE ((Col = 'Col1' and Val in ('a', 'b', 'c')) or Col != 'Col1')
AND ((Col = 'Col2' and Val in ('11', '22')) or Col != 'Col2')
Both filters return the same results.
The filter part I have figured out already; I am stuck on the sorting and paging.
SQL, as a standard, doesn't support such operations. If you want it to handle arbitrarily many columns for your reformatting of the data, then use something like Perl's DBI interface which can tell you the names of the columns for any table. From there you can generate your table create.
To create your second table the insert will take the form:
INSERT INTO newtable (id, col, val)
SELECT id, 'Col1', Col1 FROM oldtable
UNION ALL
SELECT id, 'Col2', Col2 FROM oldtable;
Just create an additional UNION ALL SELECT... for each column you want to include.
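For the sorting and paging part, one sketch (assuming a dialect with window functions and the newtable layout above) is to rank the unpivoted rows by the pivoted row's id and page on that rank, so a page of two pivoted rows brings along all of its unpivoted rows:
SELECT id, col, val
FROM (SELECT id, col, val,
             DENSE_RANK() OVER (ORDER BY id) AS row_rank
      FROM newtable) t
WHERE row_rank BETWEEN 1 AND 2 -- page 1 with a pivoted page size of 2
ORDER BY row_rank, col;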
As for your filter query, you're making it unnecessarily complicated. Your query of:
SELECT * FROM newtable
WHERE ((Col = 'Col1' and Val in ('a', 'b', 'c')) or Col != 'Col1')
AND ((Col = 'Col2' and Val in ('11', '22')) or Col != 'Col2')
Can be rewritten as
SELECT * from newtable
WHERE ( Col = 'Col1' and Val in ('a','b','c') )
OR ( Col = 'Col2' and Val in ('11','22') )
Each separately OR'd clause doesn't interfere with the others.
I also don't understand why people try to work such travesties in SQL. It appears that you're trying to turn a reasonable schema into something akin to a key/value store, which may currently be all the rage with the kids nowadays, but you should really try to learn how to use the full power of SQL with good data modeling.