SQL: Unpivot with dynamic column list

I am looking to do some unpivoting in SQL, converting key-value pairs into a list of IDs. The columns of the table are dynamic and will change weekly (perhaps even daily),
so there may be more KEY columns and more possible values.
DATABASE.PUBLIC.IMPORT_TABLE
+-----------+--------+--------+--------+
| SOURCE_ID | KEY1 | KEY2 | [KEY3] |
+-----------+--------+--------+--------+
| 0001 | TRUE | FALSE | [TBD] |
| 0002 | TRUE | FALSE | [TBD] |
| 0003 | TRUE | TRUE | [TBD] |
| 0004 | TRUE | TRUE | [TBD] |
| 0005 | FALSE | TRUE | [TBD] |
+-----------+--------+--------+--------+
To manage the possible combinations, there is a table that holds the key-value pairs and gives each one an ID. This table also defines which columns I am interested in: if a column is added to the import table but is not listed here, it is excluded.
DATABASE.PUBLIC.KEY_VALUE_TABLE
+-----------+--------+--------+
| KEY | VALUE | KV_ID |
+-----------+--------+--------+
| KEY1 | TRUE | AAA |
| KEY1 | FALSE | BBB |
| KEY2 | TRUE | CCC |
| KEY2 | FALSE | DDD |
| [KEY3] | [TRUE] | [EEE] |
+-----------+--------+--------+
Using the following query, I am able to achieve the unpivot I am looking for. However, it does not take into account changes to KEY_VALUE_TABLE.
CREATE OR REPLACE TABLE DATABASE.PUBLIC.FINAL_TABLE AS
SELECT SOURCE_ID
,LISTAGG(DISTINCT b.KV_ID, ',') within group (order by b.KV_ID desc) AS KV_ID
FROM (
SELECT DISTINCT SOURCE_ID, KEY, VALUE from DATABASE.PUBLIC.IMPORT_TABLE
UNPIVOT(VALUE for KEY in (KEY1,KEY2))
) a
LEFT JOIN DATABASE.PUBLIC.KEY_VALUE_TABLE b
ON (a.KEY=b.KEY AND a.VALUE=b.VALUE)
GROUP BY 1 -- group by SOURCE_ID only; position 2 is the LISTAGG aggregate
Where the results look like the following
DATABASE.PUBLIC.FINAL_TABLE
+-----------+---------+
| SOURCE_ID | KV_ID   |
+-----------+---------+
| 0001      | AAA,DDD |
| 0002      | AAA,DDD |
| 0003      | AAA,CCC |
| 0004      | AAA,CCC |
| 0005      | BBB,CCC |
+-----------+---------+
Is there a way to change UNPIVOT(VALUE for KEY in (KEY1,KEY2))
to something like UNPIVOT(VALUE for KEY in (SELECT DISTINCT KEY FROM KEY_VALUE_TABLE))? Note that I already tried the latter and it does not work.
SQL does not accept a SELECT inside the UNPIVOT column list.
I have also tried using UNPIVOT(VALUE for KEY in ($KEY_LIST))
Where my variable is SET KEY_LIST = '[KEY1,KEY2]'
=> SET KEY_LIST = (SELECT listagg(KEY, ',') as my_strings FROM DATABASE.PUBLIC.KEY_VALUE_TABLE)
SQL has a limit on the size of variables.
Any advice would be appreciated.
My next step is to create a stored procedure or function that takes in external variables, via JavaScript or something similar.
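One way to sidestep both limitations is to build the statement text outside SQL and then execute it, which is roughly what a stored procedure would do as well. The sketch below is only an illustration in Python, not a tested Snowflake solution: `build_unpivot_sql` is a hypothetical helper, and it assumes the KEY values have already been read from KEY_VALUE_TABLE and the column names from the import table's metadata. Keys that are not columns of IMPORT_TABLE are dropped, since an UNPIVOT list can only name existing columns.

```python
import re

# Accept plain unquoted identifiers only; anything else is rejected, not escaped.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_unpivot_sql(keys, import_columns):
    """Return the CREATE TABLE statement with the UNPIVOT list filled in.

    keys           -- KEY values read from KEY_VALUE_TABLE (assumed input)
    import_columns -- column names of IMPORT_TABLE (assumed input)
    """
    available = {c.upper() for c in import_columns}
    # De-duplicate, keep only keys that exist as columns, sort for stable output.
    wanted = sorted({k.upper() for k in keys} & available)
    if not wanted:
        raise ValueError("no key in KEY_VALUE_TABLE matches a column")
    for name in wanted:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe column name: {name!r}")
    column_list = ", ".join(wanted)
    return (
        "CREATE OR REPLACE TABLE DATABASE.PUBLIC.FINAL_TABLE AS\n"
        "SELECT a.SOURCE_ID,\n"
        "       LISTAGG(DISTINCT b.KV_ID, ',') WITHIN GROUP (ORDER BY b.KV_ID DESC) AS KV_ID\n"
        "FROM (\n"
        "    SELECT DISTINCT SOURCE_ID, KEY, VALUE\n"
        "    FROM DATABASE.PUBLIC.IMPORT_TABLE\n"
        f"    UNPIVOT(VALUE FOR KEY IN ({column_list}))\n"
        ") a\n"
        "LEFT JOIN DATABASE.PUBLIC.KEY_VALUE_TABLE b\n"
        "  ON a.KEY = b.KEY AND a.VALUE = b.VALUE\n"
        "GROUP BY a.SOURCE_ID"
    )


# KEY3 is listed in the key table but is not (yet) a column, so it is skipped.
sql = build_unpivot_sql(
    keys=["KEY1", "KEY1", "KEY2", "KEY2", "KEY3"],
    import_columns=["SOURCE_ID", "KEY1", "KEY2"],
)
print(sql)
```

The generated text can then be sent through whatever client or procedure runs the weekly load. Because the column list ends up inside the statement itself, no session variable has to hold it.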

Related

How to use wm_concat on a column that already exists in the query?

So... I am currently using Oracle 11.1g and I need to create a query that uses the ID and CusCODE from Table_with_value and checks Table_with_status using the ID to find active CO_status but on different CusCODE.
This is what I have so far - obviously does not work as it should unless CusCODE and ID are provided manually:
SELECT wm_concat(CoID) as active_CO_Status_for_same_ID_but_different_CusCODE
FROM Table_with_status
WHERE
CoID IN (SELECT CoID FROM Table_with_status WHERE ID = Table_with_value.ID AND CusCODE != Table_with_value.CusCODE)) AND Co_status = 'active';
Table_with_value:
|CoID | CusCODE | ID | Value |
|--------|---------|----------|----|
|354223 | 1.432 | 0784296L | 99 |
|321232 | 4.212321.22 | 0432296L | 32 |
|938421 | 3.213 | 0021321L | 93 |
Table_with_status:
|CoID | CusCODE | ID | Co_status|
|--------|--------------|----------|--------|
|354223 | 1.432 | 0784296L | active|
|354232 | 1.432 | 0784296L | inactive |
|666698 | 1.47621 | 0784296L | active |
|666700 | 1.5217 | 0784296L | active |
|938421 | 3.213 | 0021321L | active |
|938422 | 3.213 | 0021321L | active |
|938423 | 3.213 | 0021321L | active |
|321232 | 4.212321.22 | 0432296L | active |
|321232 | 4.212321.22 | 0432296L | active |
|321232 | 1.689 | 0432296L | inactive |
Expected output:
|CoID | active_CO_Status_for_same_ID_but_different_CusCODE | CusCODE | ID | Value |
|--------|---------|---------|----------|----|
|354223 | 666698,666700 | 1.432 | 0784296L | 99 |
|321232 | N/A | 4.212321.22 | 0432296L | 32 |
|938421 | N/A | 3.213 | 0021321L | 93 |
Any idea how this can be implemented? Ideally without any PL/SQL FOR loops, although those would be fine as well, since the output dataset is expected to be < 300 IDs.
I apologize in advance for the cryptic nature in which I structured the question :) Let me know if something is not clear.
From your description and expected output, it looks like you need a left outer join, something like:
SELECT v.CoID,
wm_concat(s.CoID) as other_active_CusCODE, -- active_CO_Status_for_same_ID_but_different_CusCODE
v.CusCODE,
v.ID,
v.value
FROM Table_with_value v
LEFT JOIN Table_with_status s
ON s.ID = v.ID
AND s.CusCODE != v.CusCODE
AND s.Co_status = 'active'
GROUP BY v.CoID, v.CusCODE, v.ID, v.value;
SQL Fiddle using listagg() instead of the never-supported and now-removed wm_concat(); with a couple of different approaches if the logic isn't quite what I interpreted. With your sample data they all get:
COID OTHER_ACTIVE_CUSCODE CUSCODE ID VALUE
------ -------------------- ----------- -------- -----
321232 (null) 4.212321.22 0432296L 32
354223 666698,666700 1.432 0784296L 99
938421 (null) 3.213 0021321L 93
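For anyone who wants to try the left-join-and-aggregate idea without an Oracle instance, here is a rough reproduction using Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite's `group_concat()` stands in for `wm_concat()`/`listagg()`, all columns except Value are stored as text, and the concatenated IDs are sorted in Python because `group_concat()` does not guarantee an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_with_value  (CoID TEXT, CusCODE TEXT, ID TEXT, Value INTEGER);
CREATE TABLE Table_with_status (CoID TEXT, CusCODE TEXT, ID TEXT, Co_status TEXT);
INSERT INTO Table_with_value VALUES
  ('354223', '1.432',       '0784296L', 99),
  ('321232', '4.212321.22', '0432296L', 32),
  ('938421', '3.213',       '0021321L', 93);
INSERT INTO Table_with_status VALUES
  ('354223', '1.432',       '0784296L', 'active'),
  ('354232', '1.432',       '0784296L', 'inactive'),
  ('666698', '1.47621',     '0784296L', 'active'),
  ('666700', '1.5217',      '0784296L', 'active'),
  ('938421', '3.213',       '0021321L', 'active'),
  ('938422', '3.213',       '0021321L', 'active'),
  ('938423', '3.213',       '0021321L', 'active'),
  ('321232', '4.212321.22', '0432296L', 'active'),
  ('321232', '4.212321.22', '0432296L', 'active'),
  ('321232', '1.689',       '0432296L', 'inactive');
""")

# All filters on the status table live in the ON clause, so a value row
# without a matching status row survives with a NULL aggregate.
rows = conn.execute("""
SELECT v.CoID, group_concat(s.CoID) AS other_active, v.CusCODE, v.ID, v.Value
FROM Table_with_value v
LEFT JOIN Table_with_status s
       ON s.ID = v.ID
      AND s.CusCODE != v.CusCODE
      AND s.Co_status = 'active'
GROUP BY v.CoID, v.CusCODE, v.ID, v.Value
ORDER BY v.CoID
""").fetchall()

# Map each CoID to the sorted list of other active CoIDs (empty when NULL).
result = {r[0]: sorted(r[1].split(",")) if r[1] else [] for r in rows}
print(result)
```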
Your code looks like it should work, assuming you are referring to the correct tables:
SELECT wm_concat(s.CoID) as active_CO_Status_for_same_ID_but_different_CusCODE
FROM Table_with_status s
WHERE s.CoID IN (SELECT v.CoID
FROM Table_with_value v
WHERE v.ID = s.ID AND
v.CusCODE <> s.CusCODE
) AND
s.Co_status = 'active';

UPDATE 2 columns using MERGE having source conditions

SQL SERVER 2014
I need to update two columns in TargetTable with values from SourceTable
SourceTbl
PersonNr | Block | BlockReason |
---------|----------|---------------|
000001 | 1 | abuse |
000001 | 1 | age |
000001 | 0 | memo |
000002 | 1 | age |
000002 | 0 | |
000003 | 0 | |
000003 | 0 | |
000004 | 1 | behaviour |
000005 | 0 | |
TargetTable
PersonNr | Block | BlockReason |
---------|----------|---------------|
000001 | 0 | |
000001 | 0 | |
000002 | 0 | |
000002 | 0 | |
000004 | 1 | |
000005 | 0 | |
Result needed:
PersonNr | Block | BlockReason |
---------|----------|---------------|
000001 | 1 | abuse |
000001 | 1 | abuse |
000002 | 1 | age |
000002 | 1 | age |
000004 | 1 | behaviour |
000005 | 0 | |
It is not relevant which BlockReason Person 1 gets,
as long as it's one from a row where Block = '1'.
I've tried this pretty straight-forward update :
UPDATE
src
SET
src.Block = '1',
src.BlockReason = targ.BlockReason
FROM
SourceTbl src
INNER JOIN
TargetTable targ
ON
src.PersonNr= targ.PersonNr
WHERE src.Block = '1'
But ended up with faulty result-rows where Block and Reason are updated separately :
PersonNr | Block | BlockReason |
---------|----------|---------------|
000001 | 1 | memo |
Next I've tried :
MERGE INTO TargetTable AS TGT
USING
(
SELECT Block, BlockReason, PersonNr
FROM SourceTbl
GROUP BY Block, BlockReason, PersonNr
) AS SRC
ON
SRC.PersonNr= TGT.PersonNr AND
SRC.Block= '1'
WHEN MATCHED THEN
UPDATE SET TGT.Block= SRC.Block, TGT.BlockReason= SRC.BlockReason;
Got the error
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
Any help? Hugely appreciated! Truly. Totally.
The problem with your query is that it produces duplicate values, so it tries to update the same record more than once. The GROUP BY in the subquery also doesn't make any sense, since you are not using any aggregate function.
Let's take one id (say 000001) and check what goes wrong with your query.
src.PersonNr | src.Block | src.BlockReason | tgt.PersonNr | tgt.Block | tgt.BlockReason |
-------------|-----------|-----------------|--------------|-----------|-----------------|
000001 | 1 | abuse | 000001 | 0 | |
000001 | 1 | age | 000001 | 0 | |
000001 | 1 | abuse | 000001 | 0 | |
000001 | 1 | age | 000001 | 0 | |
Your query produces the result above and tries to update TargetTable twice for each record: once with abuse and then with age.
You can try the below query:
MERGE INTO TargetTable AS TGT
USING
(
SELECT Block, BlockReason, PersonNr
FROM(
SELECT Block, BlockReason, PersonNr,ROW_NUMBER() OVER (PARTITION BY PersonNr ORDER BY [YourPrimaryKey]) RN
FROM SourceTbl
WHERE Block = '1' -- rank only blocked rows, so RN = 1 is never a Block = 0 row
) X
WHERE X.RN=1
) AS SRC
ON
SRC.PersonNr= TGT.PersonNr AND
SRC.Block= '1'
WHEN MATCHED THEN
UPDATE SET TGT.Block= SRC.Block, TGT.BlockReason= SRC.BlockReason;
You have duplicates in your data. Add another (or more than one) column to the ON clause of the MERGE that will help identify exactly one record or find a way to remove duplicates before merging.
The UPDATE should be something like this:
UPDATE
targ
SET
Block = '1',
BlockReason = src.BlockReason
FROM
SourceTbl src
INNER JOIN
TargetTable targ
ON
src.PersonNr= targ.PersonNr
WHERE src.Block = '1'
Since we're only using rows from SourceTbl where Block is 1, it should not be possible for a row affected by this update to end up with a reason which had a Block of 0.
There is still the general issue that this is non-deterministic in cases where multiple rows from SourceTbl are joined to one row in TargetTable, but since you've indicated that determinism isn't required here, it shouldn't result in a problem.
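If a repeatable result is wanted after all, one option is to pick the reason with an aggregate instead of relying on join order. The sketch below swaps in a different technique from the UPDATE ... FROM above: a correlated subquery with MIN(), run against SQLite through Python's sqlite3 module purely for illustration. It assumes Block is stored as an integer and empty reasons as empty strings; the same statement shape should carry over to SQL Server, but that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SourceTbl   (PersonNr TEXT, Block INTEGER, BlockReason TEXT);
CREATE TABLE TargetTable (PersonNr TEXT, Block INTEGER, BlockReason TEXT);
INSERT INTO SourceTbl VALUES
  ('000001', 1, 'abuse'), ('000001', 1, 'age'), ('000001', 0, 'memo'),
  ('000002', 1, 'age'),   ('000002', 0, ''),
  ('000003', 0, ''),      ('000003', 0, ''),
  ('000004', 1, 'behaviour'),
  ('000005', 0, '');
INSERT INTO TargetTable VALUES
  ('000001', 0, ''), ('000001', 0, ''),
  ('000002', 0, ''), ('000002', 0, ''),
  ('000004', 1, ''),
  ('000005', 0, '');
""")

# MIN() picks one reason per person among the blocked source rows, so the
# outcome no longer depends on which joined row happens to be applied last.
# The EXISTS guard leaves persons without a blocked source row untouched.
conn.execute("""
UPDATE TargetTable
SET Block = 1,
    BlockReason = (SELECT MIN(s.BlockReason)
                   FROM SourceTbl s
                   WHERE s.PersonNr = TargetTable.PersonNr AND s.Block = 1)
WHERE EXISTS (SELECT 1
              FROM SourceTbl s
              WHERE s.PersonNr = TargetTable.PersonNr AND s.Block = 1)
""")

rows = conn.execute(
    "SELECT PersonNr, Block, BlockReason FROM TargetTable ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```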

SQL for calculated column that chooses from value in own row

I have a table in which several identifiers of a person may be stored. In this table I would like to create a single calculated identifier column that stores the best identifier for that record depending on what identifiers are available.
For example (some fictional sample data) ....
Table = "Citizens"
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | ?
2 | Jones | JG892435262 | NULL | NULL | ?
3 | Trask | TSK73948379 | NULL | 9276542119 | ?
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | ?
I know the order in which I would like to choose identifiers ...
Drivers-License-No
Social-Security-No
State-Id-No
So I would like the calculated identifier column to be part of the table schema. The desired results would be ...
Id | LastName | DL-No | SS-No | State-Id-No | Calculated
------------------------------------------------------------------------
1 | Smith | NULL | 374-784-8888 | 7383204848 | 374-784-8888
2 | Jones | JG892435262 | NULL | 4537409273 | JG892435262
3 | Trask | NULL | NULL | 9276542119 | 9276542119
4 | Clinton | CL231429888 | 543-123-5555 | 1840430324 | CL231429888
Is this possible? If so, what SQL would I use to calculate what goes in the "Calculated" column?
I was thinking of something like ..
SELECT
CASE
WHEN ([DL-No] is NOT NULL) THEN [DL-No]
WHEN ([SS-No] is NOT NULL) THEN [SS-No]
WHEN ([State-Id-No] is NOT NULL) THEN [State-Id-No]
AS "Calculated"
END
FROM Citizens
The easiest solution is to use coalesce():
select c.*,
coalesce([DL-No], [SS-No], [State-ID-No]) as calculated
from citizens c
However, I think your case statement will also work if you fix the syntax: END has to come before the AS "Calculated" alias.
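To see that the two forms agree, here is a small check using Python's sqlite3 module with the sample rows from the question's first table. It is only a sketch: the column names are double-quoted because of the hyphens (the question uses SQL Server style brackets), and the CASE version is written with END before the alias and an ELSE for the last choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Citizens (Id INTEGER, LastName TEXT, '
    '"DL-No" TEXT, "SS-No" TEXT, "State-Id-No" TEXT)'
)
conn.executemany(
    "INSERT INTO Citizens VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Smith", None, "374-784-8888", "7383204848"),
        (2, "Jones", "JG892435262", None, None),
        (3, "Trask", "TSK73948379", None, "9276542119"),
        (4, "Clinton", "CL231429888", "543-123-5555", "1840430324"),
    ],
)

# COALESCE returns its first non-NULL argument, in the stated priority order.
coalesce_rows = conn.execute(
    'SELECT Id, COALESCE("DL-No", "SS-No", "State-Id-No") AS Calculated '
    "FROM Citizens ORDER BY Id"
).fetchall()

# The equivalent CASE expression: END closes the expression, then the alias.
case_rows = conn.execute(
    """
    SELECT Id,
           CASE
               WHEN "DL-No" IS NOT NULL THEN "DL-No"
               WHEN "SS-No" IS NOT NULL THEN "SS-No"
               ELSE "State-Id-No"
           END AS Calculated
    FROM Citizens ORDER BY Id
    """
).fetchall()

print(coalesce_rows)
```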

Joining two tables and show data from one if there is any

I have these two tables that I need to join
fields_data fields
+------------+-----------+------+ +------+-------------+----------+
| relationid | fieldname | data | | name | displayname | position |
+------------+-----------+------+ +------+-------------+----------+
| 2 | ftp | test | | user | Username | top |
| 2 | other | 1234 | | pass | Password | top |
+------------+-----------+------+ | ftp | FTP | top |
| log | Log | top |
| txt | Text | mid |
+------+-------------+----------+
I want to get all the rows from the "fields" table if they have the position "top" AND if a row has a match on name = fieldname from fields_data it should also show the data. This is my join
SELECT
fd.`data`,
fd.`relationid`,
fd.`fieldname`,
f.`name`,
f.`displayname`
FROM `fields` AS f
LEFT OUTER JOIN `fields_data` AS fd
ON fd.`fieldname` = f.`name`
WHERE f.`position`='top' AND (fd.`relationid`='3' OR fd.`relationid` IS NULL)
My problem is that the above query only gives me this result:
+------+------------+-----------+------+-------------+
| data | relationid | fieldname | name | displayname |
+------+------------+-----------+------+-------------+
| NULL | NULL | NULL | user | Username |
| NULL | NULL | NULL | pass | Password |
| NULL | NULL | NULL | log | Log |
+------+------------+-----------+------+-------------+
The field called "ftp" is missing because it is related to relationid "2". However, I still want it in the result, with NULLs like the others. And if the SQL query had "fd.relationid='2'" instead of 3, it should give the same rows, but with the ftp row holding data in the three fields_data columns.
I hope you get what I mean; my English is not the best. Here's the result I want:
with above query containing fd.`relationid`='3'
+------+------------+-----------+------+-------------+
| data | relationid | fieldname | name | displayname |
+------+------------+-----------+------+-------------+
| NULL | NULL | NULL | user | Username |
| NULL | NULL | NULL | pass | Password |
| NULL | NULL | NULL | ftp | FTP |
| NULL | NULL | NULL | log | Log |
+------+------------+-----------+------+-------------+
with above query containing fd.`relationid`='2'
+------+------------+-----------+------+-------------+
| data | relationid | fieldname | name | displayname |
+------+------------+-----------+------+-------------+
| NULL | NULL | NULL | user | Username |
| NULL | NULL | NULL | pass | Password |
| test | 2 | ftp | ftp | FTP |
| NULL | NULL | NULL | log | Log |
+------+------------+-----------+------+-------------+
You want to move the condition to the on clause:
SELECT fd.`data`, fd.`relationid`, fd.`fieldname`, f.`name`, f.`displayname`
FROM `fields` f LEFT OUTER JOIN
`fields_data` fd
ON fd.`fieldname` = f.`name` AND fd.`relationid` = '3'
WHERE f.`position`='top' ;
It is interesting that the semantics of your query and this query are different -- and you found the exact situation: when there is a match on another value, the where clause form filters out the row. This will still keep everything.
As a note, the following also does what you want:
SELECT fd.`data`, fd.`relationid`, fd.`fieldname`, f.`name`, f.`displayname`
FROM `fields` f LEFT OUTER JOIN
(SELECT fd.*
FROM `fields_data` fd
WHERE fd.`relationid` = '3'
) fd
ON fd.`fieldname` = f.`name`
WHERE f.`position` = 'top' ;
I wouldn't recommend writing the query this way, particularly in MySQL (because the subquery is materialized). However, understanding why your version is different from these versions (and why these are the same) is a big step forward in mastering outer joins.
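The difference between the two placements is easy to reproduce. The sketch below loads the question's sample rows into SQLite via Python's sqlite3 module (an assumption for convenience; the question is about MySQL) and runs both forms with the relation id passed as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fields (name TEXT, displayname TEXT, position TEXT);
CREATE TABLE fields_data (relationid INTEGER, fieldname TEXT, data TEXT);
INSERT INTO fields VALUES
  ('user', 'Username', 'top'), ('pass', 'Password', 'top'),
  ('ftp', 'FTP', 'top'), ('log', 'Log', 'top'), ('txt', 'Text', 'mid');
INSERT INTO fields_data VALUES (2, 'ftp', 'test'), (2, 'other', '1234');
""")

# Filter applied after the join: a field matched by a different relationid
# is joined first and then thrown away by the WHERE clause.
WHERE_FORM = """
SELECT f.name, fd.data
FROM fields f
LEFT OUTER JOIN fields_data fd ON fd.fieldname = f.name
WHERE f.position = 'top' AND (fd.relationid = ? OR fd.relationid IS NULL)
"""

# Filter applied while joining: a non-matching relationid simply means
# "no match", so the field row is kept with NULLs.
ON_FORM = """
SELECT f.name, fd.data
FROM fields f
LEFT OUTER JOIN fields_data fd
       ON fd.fieldname = f.name AND fd.relationid = ?
WHERE f.position = 'top'
"""


def run(sql, relationid):
    """Run one of the queries and return rows sorted by field name."""
    return sorted(conn.execute(sql, (relationid,)).fetchall(), key=lambda r: r[0])


where_3 = run(WHERE_FORM, 3)  # ftp is missing
on_3 = run(ON_FORM, 3)        # ftp is present with NULL data
on_2 = run(ON_FORM, 2)        # ftp is present with its data
print(where_3, on_3, on_2, sep="\n")
```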

Eliminate full table scan due to BETWEEN (and GROUP BY)

Description
According to the explain command, there is a range that is causing a query to perform a full table scan (160k rows). How do I keep the range condition and reduce the scanning? I expect the culprit to be:
Y.YEAR BETWEEN 1900 AND 2009 AND
Code
Here is the code that has the range condition (the STATION_DISTRICT is likely superfluous).
SELECT
COUNT(1) as MEASUREMENTS,
AVG(D.AMOUNT) as AMOUNT,
Y.YEAR as YEAR,
MAKEDATE(Y.YEAR,1) as AMOUNT_DATE
FROM
CITY C,
STATION S,
STATION_DISTRICT SD,
YEAR_REF Y FORCE INDEX(YEAR_IDX),
MONTH_REF M,
DAILY D
WHERE
-- For a specific city ...
--
C.ID = 10663 AND
-- Find all the stations within a specific unit radius ...
--
6371.009 *
SQRT(
POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
(COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND
-- Get the station district identification for the matching station.
--
S.STATION_DISTRICT_ID = SD.ID AND
-- Gather all known years for that station ...
--
Y.STATION_DISTRICT_ID = SD.ID AND
-- The data before 1900 is shaky; insufficient after 2009.
--
Y.YEAR BETWEEN 1900 AND 2009 AND
-- Filtered by all known months ...
--
M.YEAR_REF_ID = Y.ID AND
-- Whittled down by category ...
--
M.CATEGORY_ID = '003' AND
-- Into the valid daily climate data.
--
M.ID = D.MONTH_REF_ID AND
D.DAILY_FLAG_ID <> 'M'
GROUP BY
Y.YEAR
Update
The SQL is performing a full table scan, which results in MySQL performing a "copy to tmp table", as shown here:
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | Y | range | YEAR_IDX | YEAR_IDX | 4 | NULL | 160422 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.Y.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | S | eq_ref | PRIMARY | PRIMARY | 4 | climate.SD.ID | 1 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
Answer
After using the STRAIGHT_JOIN:
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| 1 | SIMPLE | C | const | PRIMARY | PRIMARY | 4 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | S | ALL | PRIMARY | NULL | NULL | NULL | 7795 | Using where |
| 1 | SIMPLE | SD | eq_ref | PRIMARY | PRIMARY | 4 | climate.S.STATION_DISTRICT_ID | 1 | Using index |
| 1 | SIMPLE | Y | ref | PRIMARY,STAT_YEAR_IDX | STAT_YEAR_IDX | 4 | climate.S.STATION_DISTRICT_ID | 1650 | Using where |
| 1 | SIMPLE | M | ref | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8 | climate.Y.ID | 54 | Using where |
| 1 | SIMPLE | D | ref | INDEX | INDEX | 8 | climate.M.ID | 11 | Using where |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
Related
http://dev.mysql.com/doc/refman/5.0/en/how-to-avoid-table-scan.html
http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
Optimize SQL that uses between clause
Thank you!
One request: it looks like you KNOW your data. Add the keyword "STRAIGHT_JOIN" and see the results...
SELECT STRAIGHT_JOIN ... the rest of your query...
STRAIGHT_JOIN tells MySQL to join the tables in the order you have listed them. Your CITY table is first in the FROM list, indicating you expect it to be the driving table, and your WHERE clause on CITY is the immediate filter. With that in place, it will probably fly through the rest of the query...
Hope it helps... It has worked for me on government data with millions of records, queried and joined to 10+ lookup tables, where MySQL was trying to think for me.
To do efficient BETWEEN queries, you are going to want a B-tree index on your YEAR column. For example:
CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);
BTREE indexes allow for efficient range queries. If this is in fact the root problem, then an index like this should get rid of the full table scan, so that only the part of the table within the range is scanned. Read more about B-trees on Wikipedia.
However, as with any optimisation advice, you should measure to make sure that you don't do more harm than good.
Can you change from searching within a radius to search in a bounding box?
You know the city so you can calculate a bounding box in your application.
Perhaps this
S.LATITUDE_DECIMAL >= latitude_lower and
S.LATITUDE_DECIMAL <= latitude_upper and
S.LONGITUDE_DECIMAL >= longitude_lower and
S.LONGITUDE_DECIMAL <= longitude_upper
could be a little faster?
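Computing the box in the application could look roughly like this. It is a sketch, not a drop-in: `bounding_box` is a hypothetical helper, the city coordinates are made up for illustration, and the maths ignores the poles and the 180° meridian. The box is slightly larger than the circle, so the original distance condition would stay in the query as the exact filter, with the box only cutting down the candidates.

```python
import math

EARTH_RADIUS_KM = 6371.009  # same constant as in the question's distance formula


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (lat_lower, lat_upper, lon_lower, lon_upper) in decimal degrees."""
    # Degrees of latitude covered by radius_km along a meridian.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks by cos(latitude) away from the equator,
    # so the same distance spans more degrees of longitude.
    dlon = math.degrees(
        radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat_deg)))
    )
    return (lat_deg - dlat, lat_deg + dlat, lon_deg - dlon, lon_deg + dlon)


# Hypothetical city coordinates and the 50-unit radius from the question.
latitude_lower, latitude_upper, longitude_lower, longitude_upper = bounding_box(
    49.25, -123.1, 50
)
print(latitude_lower, latitude_upper, longitude_lower, longitude_upper)
```

The four values can then be bound as parameters in the latitude and longitude comparisons above, which gives MySQL plain range conditions to work with if those columns are indexed.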