How to handle duplicate columns in SQL with multiple WITH clauses - sql

I am trying to create a query with multiple WITH clauses in BigQuery. I am getting the error "Duplicate column names in the result are not supported. Found duplicate(s):" because some columns are repeated across the tables.
The problem is that I can't remove them: I need them displayed in my result, and they are also needed in the GROUP BY clauses of the subqueries.
My code looks roughly like this:
WITH table0 as (## query0),
table1 AS (## query1),
table2 as (## query2),
table3 as (## query3),
table4 as (## query4),
table5 as (## query 5)
select
*
from
table0,
table1,
table2,
table3,
table4,
table5
How do I handle duplicate columns across multiple WITH clauses in SQL?

Why are you creating a Cartesian product of the subqueries?
In any case, BigQuery gives you more control over the columns than other databases. So, if col1 is common to table0 and table1 (aliased here as t1 and t2), you can do:
select t1.*, t2.* except (col1)
If you want to keep both values:
select t1.*, t2.* except (col1), t2.col1 as t2_col1
or
select t1.* except (col1),
t2.* except (col1),
t1.col1 as t1_col1,
t2.col1 as t2_col1
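If you want to try the renaming idea without a BigQuery project, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite has no `* EXCEPT`, so the column list is written out by hand; the tables, the extra columns `a` and `b`, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table0 (col1 INTEGER, a TEXT);
    CREATE TABLE table1 (col1 INTEGER, b TEXT);
    INSERT INTO table0 VALUES (1, 'x');
    INSERT INTO table1 VALUES (1, 'y');
""")

# SELECT * over both tables produces two result columns named col1.
# SQLite tolerates that; BigQuery reports the error quoted in the question.
cur = conn.execute("SELECT * FROM table0 AS t1, table1 AS t2")
star_names = [d[0] for d in cur.description]

# Hand-written stand-in for:
#   select t1.* except (col1), t2.* except (col1),
#          t1.col1 as t1_col1, t2.col1 as t2_col1
cur = conn.execute("""
    SELECT t1.a, t2.b, t1.col1 AS t1_col1, t2.col1 AS t2_col1
    FROM table0 AS t1, table1 AS t2
""")
aliased_names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(star_names)
print(aliased_names)
print(rows)
```

Every output column now has a unique name, which is what BigQuery requires of the final result.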

From your previous question (which looks like it was deleted), I think I remember your use case: there (I may be wrong) the only "duplicate" fields were the ones used for JOINs.
In such cases you can use the approach below (the example assumes those duplicate fields are id and day):
#standardSQL
SELECT *
FROM `project.dataset.table0`
JOIN `project.dataset.table1` USING(id, day)
JOIN `project.dataset.table2` USING(id, day)
JOIN `project.dataset.table3` USING(id, day)
For example, in the super-simplified dummy example below:
#standardSQL
WITH `project.dataset.table0` AS (
SELECT 1 id, '2019-01-01' day, 0 col0
), `project.dataset.table1` AS (
SELECT 1 id, '2019-01-01' day, 1 col1
), `project.dataset.table2` AS (
SELECT 1 id, '2019-01-01' day, 2 col2
), `project.dataset.table3` AS (
SELECT 1 id, '2019-01-01' day, 3 col3
)
SELECT *
FROM `project.dataset.table0`
JOIN `project.dataset.table1` USING(id, day)
JOIN `project.dataset.table2` USING(id, day)
JOIN `project.dataset.table3` USING(id, day)
the result will be:
Row id day col0 col1 col2 col3
1 1 2019-01-01 0 1 2 3
without any complaints about duplicate fields.
As you can see from the example above, using USING() instead of ON "magically" resolves the issue. But again, note that this only works when the "duplicate" fields are all JOIN fields.
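The same behaviour can be checked locally. This is a rough sketch using Python's sqlite3 module as a stand-in for BigQuery (an assumption made only so the example is runnable anywhere); SQLite also merges USING columns when expanding `SELECT *`. The CTE names are shortened to table0..table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

query = """
WITH table0 AS (SELECT 1 AS id, '2019-01-01' AS day, 0 AS col0),
     table1 AS (SELECT 1 AS id, '2019-01-01' AS day, 1 AS col1),
     table2 AS (SELECT 1 AS id, '2019-01-01' AS day, 2 AS col2),
     table3 AS (SELECT 1 AS id, '2019-01-01' AS day, 3 AS col3)
SELECT *
FROM table0
JOIN table1 USING (id, day)
JOIN table2 USING (id, day)
JOIN table3 USING (id, day)
"""

cur = conn.execute(query)
columns = [d[0] for d in cur.description]  # result column names
rows = cur.fetchall()

print(columns)
print(rows)
```

The join columns id and day appear once in the output instead of once per table.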

Related

Create a view of a table with a column that has multiple values

I have a table (Table1) like the following:
Col1    Col2
First   Code1,Code2,Code3
Second  Code2
So Col2 can contain multiple comma-separated values. I have another table (Table2) that contains this:
ColA    ColB
Code1   Value1
Code2   Value2
Code3   Value3
I need to create a view that joins the two tables (Table1 and Table2) and returns something like this:
Col1    Col2
First   Value1,Value2,Value3
Second  Value2
Is that possible? (I'm on Oracle DB if that helps.)
It's a violation of first normal form to have a list in a column value like that. It causes a lot of difficulties in a relational database, like the one you are encountering now.
However, you can get what you want by using the LIKE operator to find colA values that are substrings of the Col2 column. Add delimiters before and after to catch the first and last ones. Then aggregate back up to a single list using LISTAGG.
SELECT table1.col1,
LISTAGG(table2.colB,',') WITHIN GROUP (ORDER BY table2.colB) value_list
FROM table1,
table2
WHERE ','||table1.col2||',' LIKE '%,'||table2.colA||',%'
GROUP BY table1.col1
This will not perform well on large volumes, because without an equijoin it's going to use nested loops, and you can't use an index on a LIKE predicate with % at the beginning. The combination of nested loops plus full table scans (FTS) is not pleasant with large volumes of data. Therefore, if this is your situation, you will need to fix the 1NF problem by transforming table1 into normal relational format, and then join it to table2 with an equijoin, which will enable it to use a hash join instead. So:
SELECT table1.col1,
LISTAGG(table2.colB,',') WITHIN GROUP (ORDER BY table2.colB) value_list
FROM (SELECT t.col1,
SUBSTR(t.col2,INSTR(t.col2,',',1,seq)+1,INSTR(t.col2,',',1,seq+1)-(INSTR(t.col2,',',1,seq)+1)) col2_piece
FROM (SELECT col1,
','||col2||',' col2
FROM table1) t,
(SELECT ROWNUM seq FROM dual CONNECT BY LEVEL < 10) x) table1,
table2
WHERE table1.col2_piece IS NOT NULL
AND table1.col2_piece = table2.colA
GROUP BY table1.col1
If you want the values in the same order in the list as the terms then you can use:
SELECT t1.col1,
LISTAGG(t2.colb, ',') WITHIN GROUP (
ORDER BY INSTR(','||t1.col2||',', ','||t2.colA||',')
) AS value2
FROM table1 t1
INNER JOIN table2 t2
ON INSTR(','||t1.col2||',', ','||t2.colA||',') > 0
GROUP BY
t1.col1
Which, for the sample data:
CREATE TABLE Table1 (Col1, Col2) AS
SELECT 'First', 'Code1,Code2,Code3' FROM DUAL UNION ALL
SELECT 'Second', 'Code2' FROM DUAL;
CREATE TABLE Table2 (ColA, ColB) AS
SELECT 'Code1', 'XXXX' FROM DUAL UNION ALL
SELECT 'Code2', 'ZZZZ' FROM DUAL UNION ALL
SELECT 'Code3', 'YYYY' FROM DUAL;
Outputs:
COL1    VALUE2
First   XXXX,ZZZZ,YYYY
Second  ZZZZ
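To see why the answers wrap both sides in commas before matching, here is a small sketch in Python with sqlite3 (SQLite rather than Oracle, so LISTAGG is replaced by grouping in Python). The 'Third' row with 'Code10' is a hypothetical addition that is not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (colA TEXT, colB TEXT);
    INSERT INTO table1 VALUES
        ('First', 'Code1,Code2,Code3'),
        ('Second', 'Code2'),
        ('Third', 'Code10');  -- hypothetical row to expose false matches
    INSERT INTO table2 VALUES
        ('Code1', 'Value1'), ('Code2', 'Value2'), ('Code3', 'Value3');
""")

# Naive substring match: 'Code1' is found inside 'Code10'.
naive = conn.execute("""
    SELECT t1.col1, t2.colA
    FROM table1 t1
    JOIN table2 t2 ON t1.col2 LIKE '%' || t2.colA || '%'
""").fetchall()

# Delimiter-wrapped match: only whole list items match.
# Rows are ordered by the position of each code in the list.
delimited = conn.execute("""
    SELECT t1.col1, t2.colB
    FROM table1 t1
    JOIN table2 t2
      ON ',' || t1.col2 || ',' LIKE '%,' || t2.colA || ',%'
    ORDER BY t1.col1, instr(',' || t1.col2 || ',', ',' || t2.colA || ',')
""").fetchall()

# Rebuild the comma-separated lists (the job LISTAGG does in Oracle).
grouped = {}
for col1, col_b in delimited:
    grouped.setdefault(col1, []).append(col_b)
value_lists = {col1: ",".join(values) for col1, values in grouped.items()}

print(naive)
print(value_lists)
```

With the delimiters in place, 'Third' gets no match at all, whereas the naive version wrongly pairs it with Code1.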

SQL sum, grouping by another table

Consider I have a table t1 with a single column a, and it has values, say
a
-
5
10
15
17
I want to write a single SQL query that does the following. Basically, for each value in t1, I want the sum of all values in a second table t2 that are less than or equal to it.
SELECT SUM(value) FROM t2 WHERE value<=5
UNION ALL
SELECT SUM(value) FROM t2 WHERE value<=10
UNION ALL
SELECT SUM(value) FROM t2 WHERE value<=15
UNION ALL
SELECT SUM(value) FROM t2 WHERE value<=17;
If someone changes the values in a, for example by deleting or inserting rows, I have to rewrite the query above. Is there a query that always works automatically?
Here is the DB fiddle link.
I think you want:
SELECT SUM(case when bound<=a[1] then value else 0 end),
. . .
FROM table t;
After your edit with the fiddle: you can use a correlated subquery instead of UNION:
select *, (select sum(t2.value) from t2 where t2.value <= t1.a)
from t1;
I think it's clearer with a join than a subquery, but that's just personal preference.
SELECT
t1.a,
SUM(COALESCE(t2.value, 0))
FROM
t1
LEFT JOIN
t2
ON
t2.value <= t1.a
GROUP BY
t1.a
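A quick way to convince yourself that the join version keeps working when t1 changes is to run it twice. This is a sketch with Python's sqlite3 module; the contents of t2 are invented here, since the question only shows t1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (value INTEGER);
    INSERT INTO t1 VALUES (5), (10), (15), (17);
    INSERT INTO t2 VALUES (3), (7), (12), (16), (20);  -- assumed sample data
""")

QUERY = """
    SELECT t1.a, SUM(COALESCE(t2.value, 0)) AS total
    FROM t1
    LEFT JOIN t2 ON t2.value <= t1.a
    GROUP BY t1.a
    ORDER BY t1.a
"""

before = conn.execute(QUERY).fetchall()

# Add a new bound; the query text itself does not change.
conn.execute("INSERT INTO t1 VALUES (1)")
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```

The new bound 1 has no matching rows in t2, so the LEFT JOIN plus COALESCE reports 0 for it rather than dropping the row.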

Oracle - Check duplicates in two columns in same table

I want to find the duplicates of two columns in the same table.
Example data set is as follows.
Column_1 Column_2
**15440100000220** 15440300002980
15440100000150 **15440100000220**
15440100000170 **15440300002160**
**15440300002160** 15440100006170
As you can see, I have duplicates in the two columns. Records in the first column are present in the second column and records in the second are present in the first.
I looked for a solution but only came across examples comparing duplicates of two tables.
Is there a way to get these duplicates into a select query? If a record in column 2 is present in column1, then that record in column 2 should be captured in the query.
Another approach to just list the column_2 values that also appear in column_1 is to use exists:
select column_2
from your_table yt
where exists (
select null
from your_table yt2
where yt2.column_1 = yt.column_2
);
I think the intent of this is clearer, but you should check the performance of the various approaches.
You could use a HAVING clause on a UNION ALL subselect:
select column_1, count(*) from (
select column_1 as column_1
from my_table
union all
select column_2
from my_table
) t
group by column_1
having count(*) > 1
You could self-join the table:
SELECT t1.column_1 AS col1, t1.column_2 AS col2,
t2.column_1 AS duplicate_col1, t2.column_2 AS duplicate_col2
FROM mytable t1
JOIN mytable t2 ON t1.column_1 = t2.column_2
You just want those duplicated ids? Do a self join:
select distinct t1.column_1
from tablename t1
join tablename t2 on t1.column_1 = t2.column_2
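For comparison, here are the EXISTS and self-join variants run against the question's four sample rows. This is only a sketch with Python's sqlite3 module rather than Oracle; the values are stored as text and the table name your_table is borrowed from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (column_1 TEXT, column_2 TEXT);
    INSERT INTO your_table VALUES
        ('15440100000220', '15440300002980'),
        ('15440100000150', '15440100000220'),
        ('15440100000170', '15440300002160'),
        ('15440300002160', '15440100006170');
""")

# column_2 values that also occur somewhere in column_1
via_exists = conn.execute("""
    SELECT column_2
    FROM your_table yt
    WHERE EXISTS (
        SELECT NULL
        FROM your_table yt2
        WHERE yt2.column_1 = yt.column_2
    )
    ORDER BY column_2
""").fetchall()

# Same values found through a self join
via_join = conn.execute("""
    SELECT DISTINCT t1.column_1
    FROM your_table t1
    JOIN your_table t2 ON t1.column_1 = t2.column_2
    ORDER BY t1.column_1
""").fetchall()

print(via_exists)
print(via_join)
```

On this data both queries return the two highlighted values.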

Redundancy in doing sum()

table1 -> id, time_stamp, value
This table contains 10 ids. Each id has a value for each hour of the day.
So for 1 day, there would be 240 records in this table.
table2 -> id
table2 contains a dynamically changing subset of the ids present in table1.
At any given moment, the intention is to get sum(value) from table1, considering only the ids in table2,
grouping by each hour of the day, ranking the summed values, and repeating this for each day.
The query currently looks like this:
select time_stamp, sum(value),
rank() over (partition by trunc(time_stamp) order by sum(value) desc) rn
from table1
where exists (select t2.id from table2 t2 where id=t2.id)
and
time_stamp >= to_date('05/04/2010 00','dd/mm/yyyy hh24') and
time_stamp <= to_date('25/04/2010 23','dd/mm/yyyy hh24')
group by time_stamp
order by time_stamp asc
If the query is correct, can it be made more efficient, considering that table1 will actually contain thousands of ids instead of 10?
EDIT: I am using sum(value) twice in the query, and I have not found a way to compute the sum only once. Please help with this.
from table1
where exists (select t2.id from table2 t2 where value=t2.value)
table2 doesn't have a value field. Why does the query above reference t2.value?
You could use a join here
from table1 t1 join table2 t2 on t1.id = t2.id
EDIT: It's been a while since I worked with Oracle. Pardon me if my comment on t2.value doesn't make sense.
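One more reason to prefer the join with qualified columns: an unqualified id inside the EXISTS subquery resolves to the subquery's own table first, so `id = t2.id` compares t2.id with itself and filters nothing. Below is a rough sketch with Python's sqlite3 module; the rows are invented, and SQLite is used only as a convenient stand-in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, time_stamp TEXT, value INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES
        (1, '2010-04-05 00', 10),
        (2, '2010-04-05 00', 20),
        (3, '2010-04-05 00', 40);
    INSERT INTO table2 VALUES (2), (3);  -- only ids 2 and 3 should count
""")

# Unqualified id binds to t2.id, so the condition is always true.
unqualified = conn.execute("""
    SELECT SUM(value) FROM table1
    WHERE EXISTS (SELECT t2.id FROM table2 t2 WHERE id = t2.id)
""").fetchone()[0]

# Qualifying the outer column makes the subquery correlated.
qualified = conn.execute("""
    SELECT SUM(t1.value) FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t1.id = t2.id)
""").fetchone()[0]

# The join suggested above, grouped per hour.
joined = conn.execute("""
    SELECT t1.time_stamp, SUM(t1.value)
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id
    GROUP BY t1.time_stamp
""").fetchall()

print(unqualified, qualified, joined)
```

The unqualified version sums every row of table1, while the qualified EXISTS and the join both restrict the sum to the ids present in table2.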

SQL create a temporary 'mapping' table in a select statement

I'm building up results by joining tables
select t1.*, t2.col2 from t1, t2 where t1.col1=t2.col1
Is there a way to create a temporary 'mapping' table 'inline' in a select statement for instances where the t2 table doesn't exist?
So something like
select t1.*, tempt2.col2 from t1, (<create temp table>) tempt2 where ...
I'm using Oracle
Table with 1 row:
SELECT t1.*, t2.col2
FROM t1,
(
SELECT 1 AS col2
FROM dual
) t2
Table with 0 rows:
SELECT t1.*, t2.col2
FROM t1,
(
SELECT 1 AS col2
FROM dual
WHERE 1 = 0
) t2
Table with N rows:
SELECT t1.*, t2.col2
FROM t1,
(
SELECT 1 AS col2
FROM dual
CONNECT BY
level <= :N
) t2
I'm not sure this is what you're looking for, but you can do multiple SELECTs with UNIONs to get a derived table
SELECT t1.*, t2.col2
FROM t1, (
SELECT 1 as Id, 'Foo' as Col2 FROM dual
UNION ALL
SELECT 2 as Id, 'Bar' as Col2 FROM dual
UNION ALL
SELECT 3 as Id, 'FooBar' as Col2 FROM dual
) t2
WHERE
t1.Id = t2.Id
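Here is a runnable version of the derived-table idea, sketched with Python's sqlite3 module. SQLite accepts SELECT without a FROM clause, whereas Oracle has traditionally needed FROM dual on each branch. The t1 contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, col1 TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (4, 'd');
""")

# The mapping "table" exists only inside the query.
rows = conn.execute("""
    SELECT t1.id, t1.col1, t2.col2
    FROM t1
    JOIN (
        SELECT 1 AS id, 'Foo' AS col2
        UNION ALL
        SELECT 2, 'Bar'
        UNION ALL
        SELECT 3, 'FooBar'
    ) AS t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

print(rows)
```

Rows of t1 without an entry in the inline mapping (id 4 here) drop out of the inner join; a LEFT JOIN would keep them with a NULL col2.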
I know that this question is very old. But just to complete the answer, what about the DECODE() function?
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions040.htm#i1017437
SELECT product_id,
DECODE (warehouse_id, 1, 'Southlake',
2, 'San Francisco',
3, 'New Jersey',
4, 'Seattle',
'Non domestic')
"Location of inventory" FROM inventories
WHERE product_id < 1775;
As long as the mapping table is reasonably short this seems like a pretty elegant solution.
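DECODE is Oracle-specific; a simple CASE expression expresses the same mapping and is also accepted by Oracle. A small sketch with Python's sqlite3 module follows, where the inventories rows are hypothetical and the column alias is shortened to location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventories (product_id INTEGER, warehouse_id INTEGER);
    INSERT INTO inventories VALUES
        (1750, 1), (1760, 2), (1770, 9), (1800, 3);  -- made-up rows
""")

# CASE used in place of DECODE: the ELSE branch plays the role of
# DECODE's trailing default argument.
rows = conn.execute("""
    SELECT product_id,
           CASE warehouse_id
               WHEN 1 THEN 'Southlake'
               WHEN 2 THEN 'San Francisco'
               WHEN 3 THEN 'New Jersey'
               WHEN 4 THEN 'Seattle'
               ELSE 'Non domestic'
           END AS location
    FROM inventories
    WHERE product_id < 1775
    ORDER BY product_id
""").fetchall()

print(rows)
```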
Don't know whether it works in Oracle, but are you looking for something like the following pseudocode?
select t1.*, tempt2.col2 from t1 inner join (select col2, foo, bar from t2 where bar = ?) tempt2 on t1.foo = tempt2.foo where . . .
I guess that doesn't really solve the problem, since you said that table2 (t2) doesn't really exist. I'm not sure what you'd have in your mapping table or where you'd get that data if not from a table.
I'm not totally sure what you're getting at here.
Either what you want is the WITH clause e.g.
WITH tempt2 AS
(SELECT col1, col2 FROM y)
SELECT t1.*, tempt2.col2
FROM t1, tempt2
WHERE ...
Or else, if you're running the same SQL on different DBs and you can't be sure that the table actually exists, you would probably be better off testing for its presence and reacting accordingly.
Does it need to be pure SQL, or can you use PL/SQL?