Select column value if column exists in that table else create that column and set its value to null in BigQuery - dynamic

I want to select a fixed set of 450 columns from a table which may or may not always have all 450 columns. When it doesn't have all the columns, the query should create the missing columns and set their values to null.
In SQL there is a construct,
if exists()
but in BigQuery I am unable to use it for this.
Any suggestion would help a lot.

I assume in the following that you have a source table (the one with potentially "missing" columns) and an existing target table (with the desired schema).
To get the column information for these tables, you just need to look into INFORMATION_SCHEMA.COLUMNS.
The solution below uses dynamic SQL to 1) generate the desired SQL, then 2) run it.
DECLARE column_selection STRING;

-- Build the column list: use the column if the source has it, otherwise "NULL AS `column`"
SET column_selection = (
  WITH column_table AS (
    SELECT
      source.column_name AS source_column,
      tgt.column_name AS target_column
    FROM
      (SELECT column_name
       FROM `<yourproject>.<target_dataset>.INFORMATION_SCHEMA.COLUMNS`
       WHERE table_name = '<target_table>') tgt
    LEFT JOIN
      (SELECT column_name
       FROM `<yourproject>.<source_dataset>.INFORMATION_SCHEMA.COLUMNS`
       WHERE table_name = '<source_table>') source
    ON source.column_name = tgt.column_name
  )
  SELECT STRING_AGG(COALESCE(source_column,
                             CONCAT("NULL AS `", target_column, "`")), ", \n") AS col_selection
  FROM column_table
);

-- Run the generated statement
EXECUTE IMMEDIATE
  FORMAT("SELECT %s FROM `<yourproject>.<source_dataset>.<source_table>`", column_selection);
Explanation of the steps
Build a column_table for the columns we want to query:
a. first column containing the columns of the target table,
b. second one containing the corresponding source columns if they exist, or NULL if they don't
Once we have this table, we can build the desired SELECT statement: the selected expression is the column name if it is present in the source table, or, if it is NOT present, we want " NULL AS `column_name_in_target` " in our query.
This is expressed in the
coalesce(source_column, CONCAT("NULL AS `", target_column, "`"))
We aggregate all these expressions with STRING_AGG into the desired column selection.
Final step: put together the rest of the query ("SELECT " + <column_selection_string> + " FROM <your_source_table>" + ...) and run it with EXECUTE IMMEDIATE.
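For illustration (a sketch with hypothetical columns): if the target schema has columns a, b and c, but the source table only has a and b, the generated statement would look like:
SELECT a,
b,
NULL AS `c` FROM `<yourproject>.<source_dataset>.<source_table>`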

Related

ERROR: more than one row returned by a subquery used as an expression when updating field

I have a table where for the same ID I have different information. Ex.
ID  Activity
1   12
1   15
2   15
3   20
I want to update the field "Activity", joining all the different values of activity for each id into a single row.
When I do the select with the following code, the result is what I want:
SELECT string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20_2
GROUP BY id_local;
The result being:
However, when I try to use that query in my update query, PostgreSQL (version 13) gives me the following error: ERROR: more than one row returned by a subquery used as an expression.
The code I am using to try to update the field is:
UPDATE febrero20_2
SET id_epigrafe = (SELECT string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20_2
GROUP BY id_local);
I have tried to create a new table and in that case I can do it correctly with the following code:
CREATE TABLE febrero20_3
AS
SELECT id_local, string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20
GROUP BY id_local
ORDER BY id_local;
Could anyone help me understand why I am getting that error? I am new to PostgreSQL, so I am sorry if it is just some simple error, but I could not find an answer.
You have to join the table you update with the table from which you take the values, so that you don't end up with a subquery that has multiple result rows:
UPDATE febrero20_2 AS f_1
SET id_epigrafe = f_2.epigrafe_list
FROM (SELECT id_local,
string_agg(id_epigrafe, ', ') AS epigrafe_list
FROM febrero20_2
GROUP BY id_local) AS f_2
WHERE f_1.id_local = f_2.id_local;
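For illustration (assuming ID in the sample corresponds to id_local and Activity to id_epigrafe): after this update, both rows with id_local = 1 would hold '12, 15' in id_epigrafe, the row with id_local = 2 would hold '15', and the row with id_local = 3 would hold '20'. Note that string_agg without an ORDER BY does not guarantee the order of the concatenated values.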
Let me remark that this seems to be a strange statement: you essentially destroy information. Wouldn't it be better to perform such an aggregation when you query the table?

Required to create an empty table by joining multiple tables in Oracle DB

I got an error while creating an empty table by joining two tables.
I have two tables, tab1 and tab2, and there are many common column names in both tables.
I tried this:
create table tab3 as
select * from tab1 a, tab2 b where a.id = b.id and 1=2;
This gave ORA-00957: duplicate column name. As I mentioned above, there are many common column names between these two tables. If I prepare a create table statement by writing around 500 column names one by one, it will consume a lot of time. Please help me out here.
The simple answer is, don't use *. Or is that the whole point, to avoid writing five lines of column names?
One way to avoid these conflicts, but that assumes that you are joining on all columns with the same name in both tables and on no other columns, is to do something like
create table new_table as
select *
from table_a natural join table_b
where null is not null
;
(As you can guess, as an aside, I prefer null is not null to 1 = 2; the parser seems to prefer it too, as it will rewrite 1 = 2 as null is not null anyway.)
Will you need to control the order of the columns in the new table? If you do, you will need to write them out completely in the select clause, regardless of which join syntax you choose to use.
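A minimal sketch of that explicit form, assuming (hypothetically) that both tables have columns ID and NAME; the aliases also resolve the ORA-00957 conflict for the shared name:
create table tab3 as
select a.id,
       a.name as name_tab1,
       b.name as name_tab2
from   tab1 a
       join tab2 b on a.id = b.id
where  null is not null;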
That's an interesting question.
The only idea I have to offer is to let another query compose the query you need:
select 'select ' || listagg(col_name, ', ') within group (order by 1)
       || ' from tab1 a, tab2 b where (a.id = b.id) and 1=2'
from (select 'a.' || column_name as col_name from user_tab_cols where table_name = 'TAB1'
      union all
      select 'b.' || column_name from user_tab_cols where table_name = 'TAB2');
Please be aware that in the subqueries you need to specify the table names in upper case.
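For illustration, if both tables hypothetically had only the columns ID and NAME, the generated text would look something like:
select a.ID, a.NAME, b.ID, b.NAME from tab1 a, tab2 b where (a.id = b.id) and 1=2
Note that before running it as a create table ... as select, you would still have to alias the names that exist in both tables (for example b.ID as ID_2, b.NAME as NAME_2), otherwise ORA-00957 comes back.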

SQL - Finding duplicates based on 3 columns with different data types

SQL noob here, let me know if I'm not wording anything right. I'm trying to find all entries where there is more than one instance of the same data in 3 columns. Below is some sample data from the 3 columns.
formatid type_from call_desc_code
20 002694W0:USAGE V9
20 013030W0:USAGE OM
20 013030W0:USAGE NULL
From what I understand, CHECKSUM can be used for this, but the output from the query below doesn't seem right. The first part of the query, which I'm putting into the #temp table, returns 29824 rows, which tells me there should be only 29824 unique combinations of the 3 columns. But when I run the full query and then try removing duplicates in Excel based on only those 3 columns to sanity check the results, I have a whole lot more than 29824 entries left.
The formatid is a smallint data type, so when I tried just concatenating the columns with +, it returned a conversion failed error. I'm running SQL Server 2012, but I don't think the database is on the same version, as it doesn't recognise the CONCAT function.
select checksum(formatid,type_from,call_desc_code) & checksum(reverse(formatid),reverse(type_from),reverse(call_desc_code)) as [checksum], count(*) as [Blah]
into #temp
from Table
group by checksum(formatid,type_from,call_desc_code) & checksum(reverse(formatid),reverse(type_from),reverse(call_desc_code))
having count(*) > 1
select * from
Table
where checksum(formatid,type_from,call_desc_code) & checksum(reverse(formatid),reverse(type_from),reverse(call_desc_code)) in (select [checksum] from #temp)
drop table #temp
This will get you everything from your source table which has duplicates:
select *
from [Table] t
inner join
    (select formatid, type_from, call_desc_code
     from [Table]
     group by formatid, type_from, call_desc_code
     having count(*) > 1) dup
  on  dup.formatid = t.formatid
  and dup.type_from = t.type_from
  and dup.call_desc_code = t.call_desc_code
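A sketch of an alternative that avoids both CHECKSUM and the extra join, using a window function (available on SQL Server 2012) and the same placeholder table and column names:
select *
from (
    select *,
           count(*) over (partition by formatid, type_from, call_desc_code) as dup_count
    from [Table]
) x
where x.dup_count > 1;
Unlike a join on call_desc_code, the PARTITION BY treats the NULL values in call_desc_code (which appear in the sample data) as one group, so those duplicates are not silently dropped.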

Compare comma separated list with individual row in table

I have to compare comma separated values with a column in the table and find out which values are not in the database [kind of master data validation]. Please have a look at the sample data below:
table data in database:
id name
1 abc
2 def
3 ghi
SQL part:
Here I am getting a comma separated list like ('abc','def','ghi','xyz').
Now xyz is an invalid value, so I want to take that value and return it as output saying "invalid value".
It is possible if I split those values, put them in a temp table, and loop through each value to compare one by one.
But is there a more optimal way to do this?
I'm not sure if I got the question right; however, I would personally be trying to get to something like this:
SELECT
    D.id,
    CASE
        WHEN B.Name IS NULL THEN D.name
        ELSE 'invalid value'
    END
FROM data AS D
LEFT JOIN badNames AS B ON B.Name = D.Name
-- as SQL is case insensitive by default, the equal sign should work
There is one table with bad names, or invalid values if you prefer. This can be a temporary table as well, depending on usage (black-listed words should be a table, ad hoc invalid values provided by a service should be a temp table, etc.).
NOTE: The select above can be nested in a view, so the data remain as they were, yet you gain the correctness information. Otherwise I would create a cursor inside a function that would go through a select like the one above and alter the original data, if that is the goal...
It sounds like you just need a NOT EXISTS / LEFT JOIN, as in:
SELECT tmp.InvalidValue
FROM dbo.HopeThisIsNotAWhileBasedSplit(@CSVlist) tmp
WHERE NOT EXISTS (
    SELECT *
    FROM dbo.[Table] tbl
    WHERE tbl.Field = tmp.InvalidValue
);
Of course, depending on the size of the CSV list coming in, the number of rows in the table you are checking, and the style of splitter you are using, it might be better to dump the CSV to a temp table first (as you mentioned doing in the question).
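If you happen to be on SQL Server 2016 or later, the built-in STRING_SPLIT function can play the role of the splitter. A minimal sketch, assuming a plain comma separated list in @CSVlist (without quotes around the values) and the same placeholder table and column names as above:
DECLARE @CSVlist VARCHAR(MAX) = 'abc,def,ghi,xyz';

SELECT LTRIM(RTRIM(s.value)) AS InvalidValue
FROM STRING_SPLIT(@CSVlist, ',') AS s
WHERE NOT EXISTS (
    SELECT *
    FROM dbo.[Table] AS tbl
    WHERE tbl.Field = LTRIM(RTRIM(s.value))
);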
Try the following query:
SELECT SplitedValues.name,
       CASE WHEN YourTable.Id IS NULL THEN 'invalid value' ELSE NULL END AS Result
FROM SplitedValues
LEFT JOIN YourTable ON SplitedValues.name = YourTable.name

SQL statement to return data from a table in a different form

What would the SQL statement look like to return the bottom result from the upper table?
The last letter of the key should be removed; it stands for the language. The EXP column should be split into 5 columns, one per language, with the right value.
I'm weak at writing more complex SQL statements, so any help would be appreciated!
The Microsoft Access equivalent of a PIVOT in SQL Server is known as a CROSSTAB. The following query will work for Microsoft Access 2010.
TRANSFORM First(table1.Exp) AS FirstOfEXP
SELECT Left([KEY],Len([KEY])-2) AS [XKEY]
FROM table1
GROUP BY Left([KEY],Len([KEY])-2)
PIVOT Right([KEY],1);
Access will throw a circular field reference error if you try to name the row heading with KEY since that is also the name of the original table field that you are deriving it from. If you do not want XKEY as the field name, then you would need to break apart the above query into two separate queries as shown below:
qsel_table1:
SELECT Left([KEY],Len([KEY])-2) AS XKEY
, Right([KEY],1) AS [Language]
, Table1.Exp
FROM Table1
ORDER BY Left([KEY],Len([KEY])-2), Right([KEY],1);
qsel_table1_Crosstab:
TRANSFORM First(qsel_table1.Exp) AS FirstOfEXP
SELECT qsel_table1.XKEY AS [KEY]
FROM qsel_table1
GROUP BY qsel_table1.XKEY
PIVOT qsel_table1.Language;
In order to always output all language columns regardless of whether there is a value or not, you need to break those values out into a separate table. That table will then supply the row and column values for the crosstab, and the original table will supply the value expression. Using the two query solution above, we would instead need to do the following:
table2:
This is a new table with a BASE_KEY TEXT*255 column and a LANG TEXT*1 column. Together these two columns will define the primary key. Populate this table with the following rows:
"AbstractItemNumberReportController.SelectPositionen", "D"
"AbstractItemNumberReportController.SelectPositionen", "E"
"AbstractItemNumberReportController.SelectPositionen", "F"
"AbstractItemNumberReportController.SelectPositionen", "I"
"AbstractItemNumberReportController.SelectPositionen", "X"
qsel_table1:
This query remains unchanged.
qsel_table1_crosstab:
The new table2 is added to this query with an outer join to qsel_table1 (and thus to the original table1). The outer join will allow all rows to be returned from table2 regardless of whether there is a matching row in table1. Table2 now supplies the values for the row and column headings.
TRANSFORM First(qsel_table1.Exp) AS FirstOfEXP
SELECT Table2.Base_KEY AS [KEY]
FROM Table2 LEFT JOIN qsel_table1 ON (Table2.BASE_KEY = qsel_table1.XKEY)
AND (Table2.LANG = qsel_table1.Language)
GROUP BY Table2.Base_KEY
PIVOT Table2.LANG;
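For illustration (assuming table2 is populated as above), this crosstab will always return the columns KEY, D, E, F, I and X, with Null in any language column that has no matching row in qsel_table1.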
Try something like this:
select *
from
(
select 'abcd' as [key], right([key], 1) as id, expression
from table1
) x
pivot
(
max(expression)
for id in ([D], [E])
) p
Demo Fiddle