How to select the nth column, and order columns' selection in BigQuery - sql

I have this huge table upon which I apply a lot of processing (using CTEs), and I want to perform a UNION ALL on 2 particular CTEs.
SELECT *
, 0 AS orders
, 0 AS revenue
, 0 AS units
FROM secondary_prep_cte WHERE purchase_event_flag IS FALSE
UNION ALL
SELECT *
FROM results_orders_and_revenues_cte
I get a "Column 1164 in UNION ALL has incompatible types : STRING,DATE at [97:5]
Obviously I don't know the name of the column, and I'd like to debug this but I feel like I'm going to waste a lot of time if I can't pin-point which column is 1164.
I also think this is a problem of the order of columns between the CTEs, so I have 2 questions:
How do I identify the 1164th column
How do I order my columns before performing the UNION ALL
I found this similar question but it is for MSSQL, I am using BigQuery

You can get information from INFORMATION_SCHEMA.COLUMNS but you'll need to create a table or view from the CTE:
CREATE OR REPLACE VIEW `project.dataset.secondary_prep_view` as select * from (select 1 as id, "a" as name, "b" as value)
Then:
SELECT * FROM dataset.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'secondary_prep_view';

Related

Want to concatenate column of the second query to the first query but getting errors such as "query block has incorrect number of result columns"

SELECT
ID, PRIM_EMAIL, SEC_EMAIL, PHONE
FROM
STUDENTS.RECORDS
WHERE
ID IN (SELECT ID FROM STUDENTS.INFO WHERE ROLL_NO = '554')
UNION
SELECT NAME
FROM STUDENTS.INFO
WHERE ROLL_NO = '554';
Here Roll_No is a user inserted data so for now I have hard coded it. Basically with the help of ROLL_NO I sort the STUDENTS_INFO table from where I get the ID and based on that I try to get PRIM_EMAIL, SEC_EMAIL, PHONE from the STUDENTS.RECORDS table while matching the foreign keys of both the tables. In addition to the current result set I also want to have the prov_name column.
Any help is very much appreciated. Thank you!
I suspect that you want to put all this information on the same row, which suggests a join rather than union all:
select
r.ID,
r.PRIM_EMAIL,
r.SEC_EMAIL,
r.PHONE,
r.NAME
from STUDENTS.RECORDS r
inner join STUDENTS.INFO i ON i.ID = r.ID
where I.ROLL_NO = '554';
I think the source of your error query block has incorrect number of result columns is coming from trying to union together a table with 4 columns (id, prim_email, sec_email, phone) with 1 column (name).
From your question, I am gathering that you want a single table of id, prim_email, sec_email, phone from students.records and name from students.info.
I think the following query using CTE's might get you (partially) to your final result. You may want to refactor for optimizing performance.
with s_records as ( select * from students.records ),
s_info as ( select * from students.info ),
joined as (
select
s_records.id,
s_records.prim_email,
s_records.sec_email,
s_records.phone,
s_info.name
from s_records
left join s_info
on s_records.roll_no = s_info.roll_no
where roll_np = '554' )
select * from joined
Overall, I think that a join will be part of your solution rather than a union :-)

Using a column in a CTE that does not exist in another table

I am new to writing recursive queries and I am trying to write my first one. I am trying to determine levels of a corporate hierarchy and have the result table include two columns- the consultant ID and the level that has been determined by the recursive query. In the CTE, I can't figure out how to reference "level" since it is a column that doesn't yet exist in a table. What am I doing wrong?
WITH consultantsandlevels
(c."ConsultantDisplayID",
consultantsandlevels."level"
)
AS
(
SELECT c."ConsultantDisplayID",
0
FROM flight_export_consultant AS c
WHERE c."ParentPersonDisplayID" IS NULL
UNION all
SELECT c."ConsultantDisplayID",
c."ParentPersonDisplayID",
consultantsandlevels."level" + 1
FROM flight_export_consultant
JOIN consultantsandlevels ON c."ParentPersonDisplayID" = consultantsandlevels."ConsultantDisplayID"
)
SELECT *
FROM consultantsandlevels;
I think you want:
WITH RECURSIVE consultantsandlevels(ConsultantDisplayID, level) AS (
SELECT "ConsultantDisplayID", 0
FROM flight_export_consultant
WHERE "ParentPersonDisplayID" IS NULL
UNION ALL
SELECT fec."ConsultantDisplayID", cal.level + 1
FROM flight_export_consultant fec
INNER JOIN consultantsandlevels cal
ON fec."ParentPersonDisplayID" = cal.ConsultantDisplayID
)
SELECT * FROM consultantsandlevels;
Rationale:
The WITH clause must start with keyword RECURSIVE
The declaration of the comon-table-expression just enumerates the column names (no table prefix should appear there).
The level is initially set to 0, and you can then increment it at each iteration by refering to corresponding common-table-expression column.
Queries on both sides of UNION ALL must return the same count of columns (that correspond to the declaration of the cte) with corresponding datatypes.

Column Mismatch in UNION queries

I have columns A,B in Table 1 and columns B,C in Table 2.Need to perform Union between them .
For example : select A,B from Table 1 UNION select 0,B from Table 2.
I dont need this zero to solve the column mismatch.Instead is there any other solution?
Am asking the question by providing simple example . But in my case the table structure is very large and the queries are already built.Now I need to fix this union query by replacing this zero.(due to DB2 upgrade)
Can anyone help?
For two legs A and B in a union to be union compatible it is required that:
a) A and B have the same number of columns
b) The types for each column in A is compatible with the corresponding column in B
In your query you can use null that is part of every type:
select a, b from T1
UNION
select null, b from T2
Under certain circumstances, you may have to explicitly cast null to the same type as A has (probably not in this case):
select a, b from Table 1
UNION
select cast(null as ...), b from Table 2
A column returned in a SQL result set can only have one data type.
A union or union all results in rows from the first query and the rows of the second query (in case of union they are deduplicated).
So the first column of the first query needs to match the data type of the first column of the second query.
You can check this by running a describe:
describe select a,b from t1
If you work within a GUI (JDBC Connection) you could also use
call admin_cmd('describe select a,b from t1')
So if the some column does not match you have to explicitly cast the data types.

Average count of elements in a set in hive?

I have two columns id and segment. Segment is comma separated set of strings. I need to find average number of segments in all the table. One way to do it is by using two separate queries -
A - select count(*) from table_name;
B - select count(*) from table_name LATERAL VIEW explode(split(segment, ',') lTable AS singleSegment where segment != ""
avg = B/A
Answer would be 8/4 = 2 in the above case.
Is there a better way to achieve this ?
Try:
select sum(CASE segment
WHEN '' THEN 0
ELSE size(split(segment,','))
END
)*1.0/count(*) from table_name;
If your id field is unique, and you want to add a filter to the segment part, or protect against other malformed segment values like a,b, and a,,b, you could do:
SELECT SUM(seg_size)*1.0/count(*) FROM (
SELECT count(*) as seg_size from table_name
LATERAL VIEW explode(split(segment, ',')) lTable AS singleSegment
WHERE trim(singleSegment) != ""
GROUP BY id
) sizes
Then you can add other stuff into the where clause.
But this query takes two Hive jobs to run, compared to one for the simpler query, and requires the id field to be unique.

SQL Server : compare two tables with UNION and Select * plus additional label column

I've been playing around with the sample on Jeff' Server blog to compare two tables to find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match and I can see from which table they come.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow Answer gives me a much cleaner SQL query but does not tell me from which table the rows come.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version but I have many tables that I need to compare and I think that there has to be an easy way to add the source table column to the second query. I've tried several things and googled to no avail. I suspect that maybe I'm just not searching for the correct thing since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?
Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b
You can't add the table name there because union, except, and intersection all compare all columns. This means you can't differentiate between them by adding the table name to the query. A group by gives you control over what columns are considered in finding duplicates so you can exclude the table name.
To help you with the large number of tables you need to compare you could write a sql query off the metadata tables that hold table names and columns and generate the sql commands dynamically off those values.
Derive one column using table names like below
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
HAVING COUNT(*) = 1