Let's say I have a pivoted, sorted dataset like this:
ID Col1 Col2
1 a 11
2 b 22
3 c 33
4 d 44
5 e 55
When I make a paging call that returns two records at a time, I get the first two rows.
Let's say I want to return the same data without pivoting it, so my data set looks like this:
ID Col Val
1 Col1 a
2 Col1 b
3 Col1 c
4 Col1 d
5 Col1 e
1 Col2 11
2 Col2 22
3 Col2 33
4 Col2 44
5 Col2 55
I would like to write an SQL statement that returns the same data as in the first example, but without pivoting the data first.
Some additional challenges:
1) There could be n columns, not just two.
2) It should also support a filter on all the columns. This part I have already solved; see below.
Filter on pivoted data
WHERE Col1 in ('a', 'b', 'c')
AND Col2 in ('11', '22')
Filter on unpivoted data
WHERE ((Col = 'Col1' and Val in ('a', 'b', 'c')) or Col != 'Col1')
AND ((Col = 'Col2' and Val in ('11', '22')) or Col != 'Col2')
Both filters return the same results.
I have already figured out the filter part; I am stuck on the sorting and paging.
SQL, as a standard, doesn't support such operations. If you want it to handle arbitrarily many columns for your reformatting of the data, then use something like Perl's DBI interface, which can tell you the names of the columns for any table. From there you can generate the statements that create and fill your new table.
To create your second table the insert will take the form:
INSERT INTO newtable (id, col, val)
SELECT id, 'Col1', Col1 from oldtable
UNION
SELECT id, 'Col2', Col2 from oldtable;
Just create an additional UNION SELECT... for each column you want to include.
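As a rough sketch of generating one SELECT per column from the table's metadata, here is the same idea using Python's built-in sqlite3 module in place of Perl's DBI. The `unpivot` helper is hypothetical, the column names are assumed to come from a trusted schema (they are pasted into the SQL text), and UNION ALL is used instead of UNION because no duplicates can arise when the IDs are unique.

```python
import sqlite3

def unpivot(conn, source, target, id_col="ID"):
    """Copy every non-ID column of `source` into `target` as (id, col, val) rows."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{source}")')
            if r[1] != id_col]
    # One SELECT per column, glued together with UNION ALL.
    selects = [f'SELECT "{id_col}", \'{c}\', "{c}" FROM "{source}"' for c in cols]
    conn.execute(f'INSERT INTO "{target}" (id, col, val) '
                 + " UNION ALL ".join(selects))
    return cols

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oldtable (ID INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT);
    INSERT INTO oldtable VALUES (1,'a','11'), (2,'b','22'), (3,'c','33'),
                                (4,'d','44'), (5,'e','55');
    CREATE TABLE newtable (id INTEGER, col TEXT, val TEXT);
""")

columns = unpivot(conn, "oldtable", "newtable")
rows = conn.execute(
    "SELECT id, col, val FROM newtable ORDER BY col, id").fetchall()
```

Adding a Col3 to oldtable would need no change to the helper, since the column list is read from the table each time.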
As for your filter query, you're making it unnecessarily complicated. Your query of:
SELECT * FROM newtable
WHERE ((Col = 'Col1' and Val in ('a', 'b', 'c')) or Col != 'Col1')
AND ((Col = 'Col2' and Val in ('11', '22')) or Col != 'Col2')
With only Col1 and Col2 in the table, this can be rewritten as
SELECT * from newtable
WHERE ( Col = 'Col1' and Val in ('a','b','c') )
OR ( Col = 'Col2' and Val in ('11','22') )
Each separate ORed clause doesn't interfere with the others. Note that rows for any other Col value pass your original filter but not this one, so add a clause for them if you have more columns.
I also don't understand why people try to work such travesties in SQL. It appears that you're trying to make a reasonable schema into something akin to a key/value store. Which may currently be all the rage with the kids nowadays, but you should really try to learn how to use the full power of SQL with good data modeling.
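For the sorting and paging part of the question, one possible approach is to page over the IDs first and then fetch every key/value row for the IDs on that page. The sketch below uses Python's sqlite3 module; `fetch_page` is a hypothetical helper, LIMIT/OFFSET is SQLite syntax, and it assumes every ID has a row for the column being sorted on. The filter from the question is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newtable (id INTEGER, col TEXT, val TEXT)")
data = {"Col1": ["a", "b", "c", "d", "e"],
        "Col2": ["11", "22", "33", "44", "55"]}
conn.executemany(
    "INSERT INTO newtable VALUES (?, ?, ?)",
    [(i, col, v) for col, vals in data.items()
     for i, v in enumerate(vals, start=1)])

def fetch_page(conn, sort_col, page_no, page_size, descending=False):
    """Return all (id, col, val) rows for one page of IDs, sorted by sort_col."""
    direction = "DESC" if descending else "ASC"  # fixed choice, never user text
    sql = f"""
        SELECT n.id, n.col, n.val
        FROM newtable AS n
        JOIN (SELECT id, val AS sort_val      -- the IDs that make up this page
              FROM newtable
              WHERE col = ?
              ORDER BY val {direction}, id
              LIMIT ? OFFSET ?) AS p ON p.id = n.id
        ORDER BY p.sort_val {direction}, n.id, n.col
    """
    offset = (page_no - 1) * page_size
    return conn.execute(sql, (sort_col, page_size, offset)).fetchall()

first_page = fetch_page(conn, "Col1", 1, 2)
third_page = fetch_page(conn, "Col1", 3, 2)
top_by_col2 = fetch_page(conn, "Col2", 1, 2, descending=True)
```

Here `first_page` holds the four key/value rows for IDs 1 and 2, which is the unpivoted form of the first two pivoted rows.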
I'm trying to summarize data in a table:
counting total rows
counting values on specific fields
getting the distinct values on specific fields
and, more importantly, I'm struggling with:
getting the count for each field nested in an object
given this data
COL1 COL2
A 0
null 1
B null
B null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL  COL1  DIST_COL1   COL1_OBJECT_COUNT          COL2  DIST_COL2  COL2_OBJECT_COUNT
4      3     ["A", "B"]  {"A": 1, "B": 2, null: 1}  2     [0, 1]     {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all of them failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you try a select statement inside, it fails.
Another issue is that analytical functions are not easily accepted by the object or array functions in Snowflake.
You could use Snowflake Scripting or Snowpark for this but here's a solution that is somewhat flexible so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for table and colnames.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total,
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
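The core of the view above is "unpivot each column into name/value rows, then count per value". As a minimal, portable sketch of just that step, the snippet below runs it with Python's sqlite3 module rather than Snowflake, and assembles the per-column objects in Python instead of with OBJECT_AGG. The explicit UNION ALL per column stands in for the flatten over row objects and is an assumption of this sketch.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dummy (col1 TEXT, col2 INTEGER);
    INSERT INTO dummy VALUES ('A', 0), (NULL, 1), ('B', NULL), ('B', NULL);
""")

sql = """
    WITH unpivoted AS (
        SELECT 'COL1' AS col_name, col1 AS col_value FROM dummy
        UNION ALL
        SELECT 'COL2', col2 FROM dummy
    )
    SELECT col_name, col_value, COUNT(*) AS col_value_count
    FROM unpivoted
    GROUP BY col_name, col_value   -- NULLs form their own group
"""

# Build {column: {value: count}}; SQL NULL arrives in Python as None.
summary = defaultdict(dict)
for col_name, col_value, n in conn.execute(sql):
    summary[col_name][col_value] = n

total = conn.execute("SELECT COUNT(*) FROM dummy").fetchone()[0]
```

The resulting `summary` carries the same counts as the COL1_OBJECT_COUNT and COL2_OBJECT_COUNT columns in the expected output, with None in place of null.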
Aren't both SQL statements below the same? I mean, functionality-wise, shouldn't they do the same thing?
I was expecting the first SQL statement to return a result as well.
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 +COL2 FROM #TEST_1)
CREATE TABLE #TEST
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST VALUES ('123', '321', 'ABC')
INSERT INTO #TEST VALUES ('123', '436', 'ABC')
CREATE TABLE #TEST_1
(
COL1 VARCHAR(10),
COL2 VARCHAR(10),
COL3 VARCHAR(10)
)
INSERT INTO #TEST_1 VALUES ( '123','532','ABC')
INSERT INTO #TEST_1 VALUES ( '123','436','ABC')
--No result
SELECT *
FROM #TEST
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--1 record
SELECT *
FROM #TEST
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
Let's put this into a bit more context and look at your 2 WHERE clauses, which I'm going to call "WHERE 1" and "WHERE 2" respectively:
--WHERE 1
WHERE COL1 NOT IN (SELECT COL1 FROM #TEST_1)
AND COL2 NOT IN (SELECT COL2 FROM #TEST_1)
--WHERE 2
WHERE COL1 + COL2 NOT IN (SELECT COL1 + COL2 FROM #TEST_1)
As you might have noticed, these do not behave the same. In fact, from a logic point of view, and in the way the database engine handles them, they are completely different.
WHERE 2, to start with, is not SARGable. This means that any indexes on your tables could not be used, and the data engine would have to scan the entire table. WHERE 1, however, is SARGable, and if you had any indexes, they could be used to perform seeks, likely helping with performance.
From the point of view of logic, let's look at WHERE 2 first. This requires that the concatenated value of COL1 and COL2 not match any concatenated value of COL1 and COL2 from the other table, which means the two values being compared must come from the same row. So '123456' is meant to match when Col1 has the value '123' and Col2 the value '456' (although, because the strings are simply glued together, '12' and '3456' would produce the same match).
For WHERE 1, however, the value of Col1 needs to be not found in the other table, and Col2 needs to be not found as well, but they can be on different rows. This is where things differ. As '123' in Col1 appears in both tables (and is the only value), the NOT IN isn't fulfilled and no rows are returned.
If you wanted a SARGable version of WHERE 2, I would suggest using an EXISTS:
--1 row
SELECT T.COL1, --Don't use *, specify your columns
T.COL2, --Qualifying your columns is important!
T.COL3
FROM #TEST T --Aliasing is important!
WHERE NOT EXISTS (SELECT 1
FROM #TEST_1 T1
WHERE T1.COL1 = T.COL1
AND T1.COL2 = T.COL2);
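To see the three predicates side by side, here is a small sketch that replays the sample data with Python's sqlite3 module. It is not SQL Server: the temp tables become ordinary tables named TEST and TEST_1, and SQLite's `||` operator stands in for the VARCHAR `+` concatenation. The extra row added at the end is made up for this sketch, to show where comparing concatenated strings and comparing column by column can disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST   (COL1 TEXT, COL2 TEXT, COL3 TEXT);
    CREATE TABLE TEST_1 (COL1 TEXT, COL2 TEXT, COL3 TEXT);
    INSERT INTO TEST   VALUES ('123','321','ABC'), ('123','436','ABC');
    INSERT INTO TEST_1 VALUES ('123','532','ABC'), ('123','436','ABC');
""")

WHERE_1 = """SELECT COL1, COL2 FROM TEST
             WHERE COL1 NOT IN (SELECT COL1 FROM TEST_1)
               AND COL2 NOT IN (SELECT COL2 FROM TEST_1)"""
WHERE_2 = """SELECT COL1, COL2 FROM TEST
             WHERE COL1 || COL2 NOT IN (SELECT COL1 || COL2 FROM TEST_1)"""
NOT_EXISTS = """SELECT T.COL1, T.COL2 FROM TEST T
                WHERE NOT EXISTS (SELECT 1 FROM TEST_1 T1
                                  WHERE T1.COL1 = T.COL1
                                    AND T1.COL2 = T.COL2)"""

def run(sql):
    return conn.execute(sql).fetchall()

where_1_rows = run(WHERE_1)        # each column is checked on its own
where_2_rows = run(WHERE_2)        # the pair is checked as one string
not_exists_rows = run(NOT_EXISTS)  # the pair is checked column by column

# Hypothetical extra row: '12' || '3436' collides with '123' || '436'.
conn.execute("INSERT INTO TEST VALUES ('12', '3436', 'ABC')")
where_2_after = run(WHERE_2)
not_exists_after = sorted(run(NOT_EXISTS))
```

After the extra row is inserted, the concatenated comparison treats it as already present in TEST_1, while NOT EXISTS still returns it.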
When both operands are VARCHAR, as they are here, + in SQL Server concatenates the strings; it does not add them as numbers. Numeric addition only happens when at least one operand has a numeric type.
In the first query you are not combining the columns at all, so what you did is:
Select all rows from #TEST whose COL1 value is not in #TEST_1 and whose COL2 value is not in #TEST_1.
And actually, the first condition alone cuts everything out, since COL1 is '123' in every row of both tables.
The second query joins the two strings together.
No conversion to numbers happens behind the scenes.
So the second query does:
Select all rows from #TEST where COL1 + COL2 ('123321' in the first row, '123436' in the second row) is not in #TEST_1.
And for the rows in #TEST_1, the values are:
For the first row COL1 + COL2 = '123532'
For the second row COL1 + COL2 = '123436'
So only the row with '123321' is not in #TEST_1; that's why you get 1 row as the result.
To sum up:
That's why you see only 1 row from the second query, and you don't see any records from your first query. In the first query, the first condition alone removes every row. In the second query, the SQL engine compares the concatenated strings.
So '123' + '321' is '123321', not '444'.
I have two tables.
Col1 of the SRC table determines which columns
in the target table the SRC data values should be inserted into.
SRC TARGET
Col1 Col2 Col3 Col4 Tcol1 Tcol2 Tcol3 Tcol4
Test1 A B C Test1 A B C
Test2 X Y Z Test2 Z X Y
Test3 L M N Test3 M L N
Test3 L M N Test3 M L N
Test2 D E F Test2 F D E
I want to insert the data as shown above: depending on Col1 in the src table, the target columns should be mapped accordingly.
Insert into TARGET(Tcol1,Tcol2,Tcol3)
select Col1 , Col2, Col3
from src;
but here I don't know how to handle this situation, since the target table is fixed.
In the first scenario, the first row from the src table maps as is, as shown in the SQL above. But when it comes to the 2nd row, I have to insert the value of the first column into the 2nd column of the target table, and similarly for the 3rd row.
I'm writing a procedure, but it will only work for fixed target and fixed source tables. How could I write an SQL script for this scenario?
Thanks in advance.
Insert into TARGET(Tcol1,Tcol2,Tcol3)
select Tcol1,Tcol2,Tcol3
from (
select Col1 as Tcol1,
Col2 as Tcol2,
Col3 as Tcol3
from src
);
Is there any way I can use a single function to map the values based on Col1 in the SRC table?
Not an answer, but trying to understand the need.
This exactly gives your expected result, with hardcoded values based on the value of Col1:
with src (Col1, Col2, Col3, Col4) as (
select 'Test1', 'A', 'B', 'C' from dual union all
select 'Test2', 'X', 'Y', 'Z' from dual union all
select 'Test3', 'L', 'M', 'N' from dual union all
select 'Test3', 'L', 'M', 'N' from dual union all
select 'Test2', 'D', 'E', 'F' from dual
)
select Col1 as Tcol1,
case (Col1)
when 'Test1' then Col2
when 'Test2' then Col4
when 'Test3' then Col3
end as Tcol2,
case (Col1)
when 'Test1' then Col3
when 'Test2' then Col2
when 'Test3' then Col2
end as Tcol3,
case (Col1)
when 'Test1' then Col4
when 'Test2' then Col3
when 'Test3' then Col4
end as Tcol4
from src
TCOL1 TCOL2 TCOL3 TCOL4
----- ----- ----- -----
Test1 A B C
Test2 Z X Y
Test3 M L N
Test3 M L N
Test2 F D E
Is this correct? Does this logic apply to all the rows of your table? If not, how should it be changed?
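If the hardcoded CASE branches are the sticking point, one possible variation is to keep the mapping in a small lookup table and let the CASE expressions read from it. The sketch below uses Python's sqlite3 module rather than Oracle; the `col_map` table, its column names, and the `pick` helper are all hypothetical names invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT);
    INSERT INTO src VALUES ('Test1','A','B','C'), ('Test2','X','Y','Z'),
                           ('Test3','L','M','N'), ('Test3','L','M','N'),
                           ('Test2','D','E','F');
    CREATE TABLE target (Tcol1 TEXT, Tcol2 TEXT, Tcol3 TEXT, Tcol4 TEXT);
    -- Hypothetical mapping: for each Col1 value, which source column
    -- feeds each target column.
    CREATE TABLE col_map (src_key TEXT PRIMARY KEY,
                          tcol2_from TEXT, tcol3_from TEXT, tcol4_from TEXT);
    INSERT INTO col_map VALUES ('Test1','Col2','Col3','Col4'),
                               ('Test2','Col4','Col2','Col3'),
                               ('Test3','Col3','Col2','Col4');
""")

def pick(map_col):
    """Build a CASE expression returning the source column named in map_col."""
    return (f"CASE m.{map_col} WHEN 'Col2' THEN s.Col2 "
            f"WHEN 'Col3' THEN s.Col3 WHEN 'Col4' THEN s.Col4 END")

conn.execute(f"""
    INSERT INTO target (Tcol1, Tcol2, Tcol3, Tcol4)
    SELECT s.Col1, {pick('tcol2_from')}, {pick('tcol3_from')}, {pick('tcol4_from')}
    FROM src AS s
    JOIN col_map AS m ON m.src_key = s.Col1
    ORDER BY s.rowid
""")

result = conn.execute("SELECT * FROM target ORDER BY rowid").fetchall()
```

With this layout, supporting a new Col1 value means inserting a row into the mapping table instead of editing the query.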
I am trying to display a single column from a data set but spread out across a single row. For example:
[Row1] [Row2] [Row3]
[Row4] [Row5] [Row6]
Instead of:
[Row1]
[Row2]
[Row3] etc.
The data set needs to be joined with another table based on column from an outer table which means, AFAIK, cross tabs are out of the question as you can't use data set parameters with them. There is not a limit to how many rows there will be in a single data set but I want to have 3 row columns per line.
I can modify the data set query; however, I can only use plain old SQL in those queries, without creating temporary tables or anything "new" on the server side. A BIRT-only solution would be preferable, though.
If you can change the query to output
1 1 [Row1]
1 2 [Row2]
1 3 [Row3]
2 1 [Row4]
2 2 [Row5]
2 3 [Row6]
into a temporary table tmp, then you could query that using something like
select col1, col3 into tmp1 from tmp where col2 = 1;
select col1, col3 into tmp2 from tmp where col2 = 2;
select col1, col3 into tmp3 from tmp where col2 = 3;
select tmp1.col3, tmp2.col3, tmp3.col3 from tmp1, tmp2, tmp3 where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
You could generate col1 and col2 using rownum, but it's non-standard, and it requires the output of the original query to be sorted properly.
Edit:
If you can't use a temporary table, I assume you can use subqueries:
select tmp1.col3, tmp2.col3, tmp3.col3 from
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 1) as tmp1,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 2) as tmp2,
(select col1, col3 from (ORIGINAL_QUERY) where col2 = 3) as tmp3
where tmp1.col1 = tmp2.col1 and tmp1.col1 = tmp3.col1;
and hope the optimizer is smart.
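A related way to get three values per line in a single pass, without the three-way self-join, is conditional aggregation: group the rows by line number and pick one value per position. This is a different technique from the joins above, sketched here with Python's sqlite3 module. The `items` table is hypothetical, and its ids are assumed to be consecutive and to start at 1, so that the line and position can be derived with integer division and modulo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"Row{i}") for i in range(1, 8)])  # Row1 .. Row7

PER_LINE = 3

# One output column per position within a line: MAX() keeps the single
# label whose position matches, because the CASE yields NULL for the rest.
cells = ",\n       ".join(
    f"MAX(CASE WHEN (id - 1) % {PER_LINE} = {pos} THEN label END)"
    for pos in range(PER_LINE))

sql = f"""
    SELECT {cells}
    FROM items
    GROUP BY (id - 1) / {PER_LINE}      -- integer division gives the line number
    ORDER BY (id - 1) / {PER_LINE}
"""

lines = conn.execute(sql).fetchall()
```

Unlike the inner joins, this keeps an incomplete last line, with NULL in the empty positions.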
Just curious about the IN statement in SQL.
I know I can search multiple columns with one value by doing
'val1' IN (col1,col2)
And can search a column for multiple values
col1 IN ('val1','val2')
But is there a way to do both of these simultaneously, without resorting to repeated AND / OR in the SQL? I am looking to do this in the most scalable way, independent of how many vals / cols I need to search in.
So essentially:
('val1','val2') IN (col1,col2)
but valid.
You could do something like this (which I've also put on SQLFiddle):
-- Test data:
WITH t(col1, col2) AS (
SELECT 'val1', 'valX' UNION ALL
SELECT 'valY', 'valZ'
)
-- Solution:
SELECT *
FROM t
WHERE EXISTS (
SELECT 1
-- Join all columns with all values to see if any column matches any value
FROM (VALUES(t.col1),(t.col2)) t1(col)
JOIN (VALUES('val1'),('val2')) t2(val)
ON col = val
)
Of course, one could argue about which version is more concise.
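A variation on the same "any column matches any value" idea is to keep the search values in a table and test each one against the column list with a plain IN. The sketch below uses Python's sqlite3 module; the `wanted` temp table and the `rows_matching` helper are hypothetical names, and a third test row is added to the test data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 TEXT, col2 TEXT);
    INSERT INTO t VALUES ('val1','valX'), ('valY','valZ'), ('valQ','val2');
""")

def rows_matching(conn, values):
    """Return rows of t where any listed column equals any of the values."""
    # The search values live in a temp table, so their number can grow
    # without changing the query text.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (val TEXT)")
    conn.execute("DELETE FROM wanted")
    conn.executemany("INSERT INTO wanted VALUES (?)", [(v,) for v in values])
    return conn.execute("""
        SELECT col1, col2
        FROM t
        WHERE EXISTS (SELECT 1
                      FROM wanted w
                      WHERE w.val IN (t.col1, t.col2))
        ORDER BY col1
    """).fetchall()

matches = rows_matching(conn, ["val1", "val2"])
```

More columns are handled by extending the IN list, and more values by adding rows to the temp table.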
Yes, for example you can do this in Oracle:
select x, y from (select 1 as x, 2 as y from dual)
where (x,y) in (select 1 as p, 2 as q from dual)