Combining rows to create two columns of data - sql

I'm a bit confused on how to do this query properly. I have a table that looks like this. Where district 0 represent a value that should be applied to all district (global).
[ district ] [ code ] [ value ]
1 A 11
1 C 12
2 A 13
2 B 14
0 B 15
I have built a query (below) to combine the "global value" on each district.
[ district ] [ code ] [ district value ] [ global value ]
1 A 11 null -> row 1
1 B null 15 -> row 2
1 C 12 null -> row 3
2 A 13 null -> row 4
2 B 14 15 -> row 5
2 C null null -> row 6 (optional)
I did it by joining on the list of all possible district/code.
select all_code.district, all_code.code, table_d.value, table_g.value
from (select distinct b.district, a.code
from temp_table a
inner join (select distinct district
from temp_table
where district <> 0) b
on 1 = 1) all_code
left join temp_table table_d
on table_d.code = all_code.code
and table_d.district = all_code.district
left join temp_table table_g
on table_g.code = all_code.code
and table_g.district = 0
This query works great but seems pretty ugly. Is there a better way of doing this? (note that I don't care if row #6 is there or not).
Here's a script if needed.
create table temp_table
(
district VARCHAR2(5) not null,
code VARCHAR2(5) not null,
value VARCHAR2(5) not null
);
insert into temp_table (district, code, value)
values ('1', 'A', '11');
insert into temp_table (district, code, value)
values ('1', 'C', '12');
insert into temp_table (district, code, value)
values ('2', 'A', '13');
insert into temp_table (district, code, value)
values ('2', 'B', '14');
insert into temp_table (district, code, value)
values ('0', 'B', '15');

Here is one of the options. Since you are on 10g you can make use of partition outer join(partition by() clause) to fill the gaps:
with DCodes(code) as(
select 'A' from dual union all
select 'B' from dual union all
select 'C' from dual
),
DGlobal(code, value1) as(
select code
, value
from temp_table
where district = 0
)
select tt.district
, dc.code
, tt.value
, dg.value1 as global_value
from temp_table tt
partition by(tt.district)
right join DCodes dc
on (dc.code = tt.code)
left join DGlobal dg
on (dg.code = dc.code)
where tt.district != 0
order by 1, 2
Result:
DISTRICT CODE VALUE GLOBAL_VALUE
-------- ---- ----- ------------
1 A 11
1 B 15
1 C 12
2 A 13
2 B 14 15
2 C

I would argue that a lot of the "ugliness" comes from a lack of lookup tables for district and code. Without an authoritative source for those, you have to fabricate one from the values that are in use (hence the sub-queries with distinct).
In terms of cleaning up the query you have, the best I can come up with is to remove an unnecessary sub-query and use the proper syntax for the cross join:
SELECT a.district,
b.code,
c.value1,
d.value1
FROM (SELECT DISTINCT district FROM temp_table WHERE district <> 0) a
CROSS JOIN (SELECT DISTINCT code FROM temp_table) b
LEFT JOIN temp_table c
ON b.code = c.code AND a.district = c.district
LEFT JOIN temp_table d
ON b.code = d.code AND d.district = 0
ORDER BY district, code

Related

Find matching first 7 chars to identify duplicates

I'm trying to identify duplicate state_num that are failing validation. The R is causing issues with validation, but I want to just search the first 7 characters and find the duplicate values, so that it returns the row that has an R in the string and the row that doesn't. The column is a type: char(15) But when trying to run a query it is not finding the matching 7 characters. My table only showing how it should look, its not showing what is actually being returned. It basically is just finding the state and only finding non R state_num in results. It should be returning around 480 rows but is returning like 20k rows and not just showing the duplicates
I've tried querying a bunch of different ways but i've spen the last hour only being able to return the R row if i ad AND state_num[8] = 'R' to the end of the query. Which defeats what I'm trying to find the duplicate first 7 characters. This is an informix db.
My Query:
SELECT id_ref, cont_ref, formatted, state_num, type, state
FROM state_form sf1
WHERE EXISTS (select cont_ref, san
FROM state_form sf2
WHERE sf1.cont_ref = sf2.cont_ref and left(sf1.state_num,7) = LEFT(sf2.state_num,7)
GROUP BY cont_ref, state_num
HAVING COUNT(state_num) > 1)
AND state = 'MT';
This is what I'd like my results to return:
id_ref
cont_ref
formatted
state_num
type
state
658311
5237
71-75011R
7175011R
Y
MT
1459
5237
71-75011
7175011
I
MT
7501
555678
99-67894
9967894
I
MT
345443
555678
99-67894R
9967894R
Y
MT
Here are a couple options producing the same results. This may need to be changed if you need to identify the 8th character as something such as a Letter. That is, this will also catch 12345678 and 1234567.
create table my_data (
id_ref integer,
cont_ref integer,
state_num varchar(20),
type varchar(5),
state varchar(5)
);
insert into my_data values
(1, 5237, '7175011R', 'Y', 'MT'),
(2, 5237, '7175011', 'I', 'MT'),
(3, 6789, '7878787', 'Y', 'CA'),
(4, 6789, '7878787R', 'I', 'CA'),
(5, 555678, '9967894', 'I', 'MT'),
(6, 555678, '9967894R', 'Y', 'MT'),
(7, 98765, '123456', 'I', 'MT');
Query #1
with dupes as (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
)
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;
Query #2
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
where m.cont_ref in (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
);
id_ref
cont_ref
state_num
type
state
1
5237
7175011R
Y
MT
2
5237
7175011
I
MT
5
555678
9967894
I
MT
6
555678
9967894R
Y
MT
View on DB Fiddle
UPDATE
If Informix does not want to group by left(column, 7), then you could get the target cont_ref values using this. Here's the CTE method, but you could also do with sub-query.
with dupes as (
select cont_ref
from (
select cont_ref, left(state_num, 7) as left_seven
from my_data
where state = 'MT'
)z
group by cont_ref
having count(*) > 1
)
select m.*
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;

Value need from range, based on Priority in SQL ,

There is 3 cases: (In all cases value needs to be picked up based on priority)
case 1 : zip exist between range
case 2: zip does not exist between range
case 3 : overlap range
Table
Temp1
state
zip_start
zip_end
Priority
Value
NY
100
200
1
A
NY
150
250
3
c
NY
null
null
2
B
Data
state
zip
NY
201
NY
400
OUTPUT :
state
zip_start
zip_end
Priority
Value
zip
NY
null
null
2
B
201
NY
null
null
2
B
400
I am trying with below code , but It's not picking the data based on priority:
SELECT ZIP,ZIP_START,ZIP_END,VALUE,PRIORITY,STATE,IX FROM
(
SELECT TMP1.*,
ROW_NUMBER () OVER (PARTITION BY STATE,ZIP ORDER BY PRIORITY ) IX
FROM
(
WITH CASE_1 AS
( SELECT
temp1.*
,DATA.ZIP
FROM TEMP1
LEFT JOIN
"DATA" ON DATA.STATE = temp1.STATE
WHERE DATA.ZIP BETWEEN TEMP1.ZIP_START AND TEMP1 .ZIP_END
),
CASE_2 AS
(
SELECT
temp1.*
,DATA.ZIP
FROM "DATA"
LEFT JOIN
TEMP1 ON DATA.STATE = temp1.STATE
WHERE (ZIP_START IS NULL OR ZIP_START = '')
AND (ZIP_END IS NULL OR ZIP_END = '')
AND Not EXISTS
(SELECT 1 FROM CASE_1 WHERE CASE_1.zip=DATA.zip
AND CASE_1.STATE=DATA.STATE)
)
SELECT * FROM CASE_1
UNION
SELECT * FROM CASE_2
)TMP1
) TMP2
WHERE TMP2.IX = 1;
From Oracle 12, you can use a LATERAL join and filter when the zip is within range or when one-or-other end of the range is NULL the ORDER BY priority and FETCH the FIRST matched ROW ONLY:
SELECT t.*, d.zip
FROM data d
CROSS JOIN LATERAL (
SELECT *
FROM temp1 t
WHERE d.state = t.state
AND (t.zip_start <= d.zip OR t.zip_start IS NULL)
AND (t.zip_end >= d.zip OR t.zip_end IS NULL)
ORDER BY priority
FETCH FIRST ROW ONLY
) t
In earlier versions, you can join the two tables and then use the ROW_NUMBER analytic function to find the best match:
SELECT state, zip_start, zip_end, priority, value, zip
FROM (
SELECT t.*,
d.zip,
ROW_NUMBER() OVER (PARTITION BY d.ROWID ORDER BY t.priority) AS rn
FROM data d
INNER JOIN temp1 t
ON ( d.state = t.state
AND (t.zip_start <= d.zip OR t.zip_start IS NULL)
AND (t.zip_end >= d.zip OR t.zip_end IS NULL))
)
WHERE rn = 1;
Which, for the sample data:
CREATE TABLE Temp1 (state, zip_start, zip_end, Priority, Value) AS
SELECT 'NY', 100, 200, 1, 'A' FROM DUAL UNION ALL
SELECT 'NY', 150, 250, 3, 'c' FROM DUAL UNION ALL
SELECT 'NY', null, null, 2, 'B' FROM DUAL;
CREATE TABLE Data (state, zip) AS
SELECT 'NY', 201 FROM DUAL UNION ALL
SELECT 'NY', 400 FROM DUAL;
Both output:
STATE
ZIP_START
ZIP_END
PRIORITY
VALUE
ZIP
NY
null
null
2
B
201
NY
null
null
2
B
400
db<>fiddle here
CREATE TABLE TEMP1 (
STATE VARCHAR(10),
ZIP_START NUMBER,
ZIP_END NUMBER,
PRIORITY NUMBER,
VAL VARCHAR(10));
INSERT INTO TEMP1 VALUES ('NY', 100,200,1,'A');
INSERT INTO TEMP1 VALUES ('NY', 150,250,3,'C');
INSERT INTO TEMP1 VALUES ('NY', null,null,2,'B');
CREATE TABLE DATATABLE (
STATE VARCHAR(10),
ZIP NUMBER
);
INSERT INTO DATATABLE VALUES ('NY', 201);
INSERT INTO DATATABLE VALUES ('NY', 400);
The main idea is to figure out first how many times your condition (zip in range between zip_start and end) is met. This is why we use count_match variable.
Once you get if your data is priority 1,2 or 3, you match your data table with the temp table to get the value associated with that priority.
SELECT
t0.STATE,
t0.ZIP_START,
t0.ZIP_END,
CASE WHEN t0.COUNT_MATCH > 1 THEN 3
WHEN t0.COUNT_MATCH = 1 THEN 1
WHEN t0.COUNT_MATCH = 0 THEN 2 END AS PRIORITY,
t.VAL,
t0.ZIP
FROM
(
SELECT
t1.STATE,
MIN(t2.ZIP_START) AS ZIP_START,
MAX(t2.ZIP_END) AS ZIP_END,
COUNT(t2.STATE) AS COUNT_MATCH,
t1.ZIP
FROM DATATABLE t1
LEFT JOIN TEMP1 t2 ON (t1.STATE = t2.STATE AND t1.ZIP>=t2.ZIP_START AND t1.ZIP <= t2.ZIP_END)
GROUP BY
t1.STATE, t1.ZIP) t0
LEFT JOIN TEMP1 t ON (t0.STATE = t.STATE AND CASE WHEN t0.COUNT_MATCH > 1 THEN 3
WHEN t0.COUNT_MATCH = 1 THEN 1
WHEN t0.COUNT_MATCH = 0 THEN 2 END = t.PRIORITY)
;

Excluding records from table based on rules from another table

I'm using Oracle SQL and I have a product table with diffrent attributes and sales volume for each product and another table with certain exclusion rules for different level of aggregation. Let's look at the example:
Here is our main table with sales data on which we want to perform some calculations:
And the other table contains diffrent rules which are supposed to exclude certain rows from table above:
When there is an "x", this column shouldn't be considered so our rules are:
1. exclude all rows with ATTR_3 = 'no'
2. exlcude all rows with ATTR_1 = 'Europe' and ATTR_2 = 'snacks' and ATTR_3 = 'no'
3. exlcude all rows with ATTR_1 = 'Africa'
And based on that our final output should be like that:
How this could be achived in SQL? I was thinking about join but I have no idea how to handle different levels of aggregation for exclusions.
I think your expected output is wrong. None of the rules excludes the 2nd row (Europe - snacks - yes).
SQL> with
2 -- sample data
3 test (product_id, attr_1, attr_2, attr_3) as
4 (select 81928 , 'Europe', 'beverages', 'yes' from dual union all
5 select 16534 , 'Europe', 'snacks' , 'yes' from dual union all
6 select 56468 , 'USA' , 'snacks' , 'no' from dual union all
7 select 129921, 'Africa', 'drinks' , 'yes' from dual union all
8 select 123021, 'Africa', 'snacks' , 'yes' from dual union all
9 select 165132, 'USA' , 'drinks' , 'yes' from dual
10 ),
11 rules (attr_1, attr_2, attr_3) as
12 (select 'x' , 'x' , 'no' from dual union all
13 select 'Europe', 'snacks', 'no' from dual union all
14 select 'Africa', 'x' , 'x' from dual
15 )
16 -- query you need
17 select t.*
18 from test t
19 where (t.attr_1, t.attr_2, t.attr_3) not in
20 (select
21 decode(r.attr_1, 'x', t.attr_1, r.attr_1),
22 decode(r.attr_2, 'x', t.attr_2, r.attr_2),
23 decode(r.attr_3, 'x', t.attr_3, r.attr_3)
24 from rules r
25 );
PRODUCT_ID ATTR_1 ATTR_2 ATT
---------- ------ --------- ---
81928 Europe beverages yes
16534 Europe snacks yes
165132 USA drinks yes
SQL>
You can use the join using CASE .. WHEN statement as follows:
SELECT P.*
FROM PRODUCT P
JOIN RULESS R ON
(R.ATTR_1 ='X' OR P.ATTR_1 <> R.ATTR_1)
AND (R.ATTR_2 ='X' OR P.ATTR_2 <> R.ATTR_2)
AND (R.ATTR_3 ='X' OR P.ATTR_3 <> R.ATTR_3)
You can use NOT EXISTS
SELECT *
FROM sales s
WHERE NOT EXISTS (
SELECT 0
FROM attributes a
WHERE ( ( a.attr_1 = s.attr_1 AND a.attr_1 IS NOT NULL )
OR a.attr_1 IS NULL )
AND ( ( a.attr_2 = s.attr_2 AND a.attr_2 IS NOT NULL )
OR a.attr_2 IS NULL )
AND ( ( a.attr_3 = s.attr_3 AND a.attr_3 IS NOT NULL )
OR a.attr_3 IS NULL )
)
where I considered the x values within the attributes table as NULL. If you really have x characters, then you can use :
SELECT *
FROM sales s
WHERE NOT EXISTS (
SELECT 0
FROM attributes a
WHERE ( ( NVL(a.attr_1,'x') = s.attr_1 AND NVL(a.attr_1,'x')!='x' )
OR NVL(a.attr_1,'x')='x' )
AND ( ( NVL(a.attr_2,'x') = s.attr_2 AND NVL(a.attr_2,'x')!='x' )
OR NVL(a.attr_2,'x')='x' )
AND ( ( NVL(a.attr_3,'x') = s.attr_3 AND NVL(a.attr_3,'x')!='x' )
OR NVL(a.attr_3,'x')='x' )
)
instead.
Demo
I would do this with three different not exists:
select p.*
from product p
where not exists (select 1
from rules r
where r.attr_1 = p.attr_1 and r.attr_1 <> 'x'
) and
not exists (select 1
from rules r
where r.attr_2 = p.attr_2 and r.attr_2 <> 'x'
) and
not exists (select 1
from rules r
where r.attr_3 = p.attr_3 and r.attr_3 <> 'x'
) ;
In particular, this can take advantage of indexes on (attr_1), (attri_2) and (attr_3) -- something that is quite handy if you have a moderate number of rules.

How can I Pivot a table in DB2? [duplicate]

This question already has answers here:
Pivoting in DB2
(3 answers)
Closed 5 years ago.
I have table A, below, where for each unique id, there are three codes with some value.
ID Code Value
---------------------
11 1 x
11 2 y
11 3 z
12 1 p
12 2 q
12 3 r
13 1 l
13 2 m
13 3 n
I have a second table B with format as below:
Id Code1_Val Code2_Val Code3_Val
Here there is just one row for each unique id. I want to populate this second table B from first table A for each id from the first table.
For the first table A above, the second table B should come out as:
Id Code1_Val Code2_Val Code3_Val
---------------------------------------------
11 x y z
12 p q r
13 l m n
How can I achieve this in a single SQL query?
select Id,
max(case when Code = '1' then Value end) as Code1_Val,
max(case when Code = '2' then Value end) as Code2_Val,
max(case when Code = '3' then Value end) as Code3_Val
from TABLEA
group by Id
SELECT Id,
max(DECODE(Code, 1, Value)) AS Code1_Val,
max(DECODE(Code, 2, Value)) AS Code2_Val,
max(DECODE(Code, 3, Value)) AS Code3_Val
FROM A
group by Id
If your version doesn't have DECODE(), you can also use this:
INSERT INTO B (id, code1_val, code2_val, code3_val)
WITH Ids (id) as (SELECT DISTINCT id
FROM A) -- Only to construct list of ids
SELECT Ids.id, a1.value, a2.value, a3.value
FROM Ids -- or substitute the actual id table
JOIN A a1
ON a1.id = ids.id
AND a1.code = 1
JOIN A a2
ON a2.id = ids.id
AND a2.code = 2
JOIN A a3
ON a3.id = ids.id
AND a3.code = 3
(Works on my V6R1 DB2 instance, and have an SQL Fiddle Example).
Here is a SQLFiddle example
insert into B (ID,Code1_Val,Code2_Val,Code3_Val)
select Id, max(V1),max(V2),max(V3) from
(
select ID,Value V1,'' V2,'' V3 from A where Code=1
union all
select ID,'' V1, Value V2,'' V3 from A where Code=2
union all
select ID,'' V1, '' V2,Value V3 from A where Code=3
) AG
group by ID
Here is the SQL Query:
insert into pivot_insert_table(id,code1_val,code2_val, code3_val)
select * from (select id,code,value from pivot_table)
pivot(max(value) for code in (1,2,3)) order by id ;
WITH Ids (id) as
(
SELECT DISTINCT id FROM A
)
SELECT Ids.id,
(select sub.value from A sub where Ids.id=sub.id and sub.code=1 fetch first rows only) Code1_Val,
(select sub.value from A sub where Ids.id=sub.id and sub.code=2 fetch first rows only) Code2_Val,
(select sub.value from A sub where Ids.id=sub.id and sub.code=3 fetch first rows only) Code3_Val
FROM Ids
You want to pivot your data. Since DB2 has no pivot function, yo can use Decode (basically a case statement.)
The syntax should be:
SELECT Id,
DECODE(Code, 1, Value) AS Code1_Val,
DECODE(Code, 2, Value) AS Code2_Val,
DECODE(Code, 3, Value) AS Code3_Val
FROM A

SELECT DISTINCT for data groups

I have following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would resemble data (A,B), DataID 103 would resemble data (C), etc. In order to be able to rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'
In my opinion You have to create a custom aggregate that concatenates data (in case of strings CLR approach is recommended for perf reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number()function or add a dense_rank() your choice. Anyway it should look like this
with groupings as (
select concat(data) groups
from Table1
group by ID
)
select groups, rownumber() over () from groupings
The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM t1
GROUP BY id
ID DataGroups
1 A
2 AB
3 AB
4 C
5 D
6 AB
However, this kind of logic will only work in case you the "Data" values are both fixed and known before hand.
In your case, you do say that is the case. However, considering that you also say that they are 1000 of them, this will be frankly, a ridiculous looking query for sure :-)
LuckyLuke's suggestion above would, frankly, be the more generic way and probably saner way to go about implementing the solution though in your case.
From your sample data (having added the missing 2,'A' tuple, the following gives the renumbered (and uniqueified) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1 A
2 A
2 B
3 C
4 D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.
Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
SELECT CAST(Data AS VARCHAR) + ','
FROM #t1 t2
WHERE t2.id = t1.id
ORDER BY Data ASC
FOR XML PATH('') )
D ( Data )
And then going analog to LuckyLuke's solution.