Querying Column Headers in GBQ - sql

Is it possible to do a query to provide me an output with the column headers of a specific table? I'm uploading multiple files into our server via GBQ and while it auto-detects the headers, I would like to list out the headers either in rows or as a comma separated cell.
Thank you

I am assuming your files are in CSV format, so the table schema has no repeated fields. With that in mind, the query below is for BigQuery Standard SQL and requires only the fully qualified table name:
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"') cols_as_array,
ARRAY_TO_STRING(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"'), ',') cols_as_string
FROM (SELECT 1) LEFT JOIN
(SELECT * FROM `project.dataset.table` WHERE FALSE) t
ON TRUE
Applied to a real table, as in the example below:
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"') cols_as_array,
ARRAY_TO_STRING(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"'), ',') cols_as_string
FROM (SELECT 1) LEFT JOIN
(SELECT * FROM `bigquery-public-data.utility_us.us_states_area` WHERE FALSE) t
ON TRUE
the result will be:
Row  cols_as_array            cols_as_string
1    region_code              region_code,division_code,state_fips_code,state_gnis_code,state_geo_id,state_abbreviation,state_name,legal_area_code,feature_class_code,functional_status_code,area_land_meters,area_water_meters,internal_point_lat,internal_point_lon,state_geom
     division_code
     state_fips_code
     state_gnis_code
     state_geo_id
     state_abbreviation
     state_name
     legal_area_code
     feature_class_code
     functional_status_code
     area_land_meters
     area_water_meters
     internal_point_lat
     internal_point_lon
     state_geom
You can choose which version to use: the list as an array or as a comma-separated string.
Also note that the above query does not incur any cost at all!
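To see why the regular expression recovers the column names: the row passed to TO_JSON_STRING has no real data, so the only quoted strings in the JSON text should be the keys. Here is a minimal Python sketch of just that extraction step. The JSON literal is a hypothetical three-column example, not actual BigQuery output.

```python
import re

# Hypothetical JSON text for a row whose columns are all NULL
# (an assumption for illustration, not captured from BigQuery).
row_json = '{"region_code":null,"division_code":null,"state_name":null}'

# Same pattern as in the query: lazily match whatever sits between double quotes.
cols_as_array = re.findall(r'"(.+?)"', row_json)

# Equivalent of ARRAY_TO_STRING(..., ',').
cols_as_string = ",".join(cols_as_array)

print(cols_as_array)
print(cols_as_string)
```

Note that this only works because every value is null. A quoted string value would be picked up by the pattern as well.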

Related

How to group rows overwriting column value in Big Query SQL

I want to know how to group rows while keeping the value from one of the columns, as follows:
tabla_1
url                 tags          pvs
www.helloworld.com  bigquery,sql  200
www.helloworld.com  -             100
www.byeworld.com    python,java   250
www.byeworld.com    -             150
and the desired result:
url                 tags          pvs
www.helloworld.com  bigquery,sql  300
www.byeworld.com    python,java   400
I have tried creating two different tables (using filters) and then joining them. I suppose there's a much easier way, but I'm not able to find it:
SELECT url, tags, sum(pvs)
FROM
(
SELECT * FROM tabla_1 WHERE tags != '-'
LEFT JOIN
(SELECT * FROM tabla_1 WHERE tags '-')
ON url
)
You can convert the '-' to NULL and pick up the non-null value with the ARRAY_AGG function:
SELECT
url,
SUM(pvs) AS total_pvs,
ARRAY_AGG(IF(tags = '-', NULL, tags) IGNORE NULLS)[OFFSET(0)] AS tags
FROM tabla_1
GROUP BY url
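The same grouping logic, sketched in Python on the sample rows from the question. The function name group_rows is made up for illustration. One difference to keep in mind: the query has no ORDER BY inside ARRAY_AGG, so which non-null value ends up first is not guaranteed there, whereas this sketch simply uses input order.

```python
from collections import defaultdict

# Sample rows from the question: (url, tags, pvs).
rows = [
    ("www.helloworld.com", "bigquery,sql", 200),
    ("www.helloworld.com", "-", 100),
    ("www.byeworld.com", "python,java", 250),
    ("www.byeworld.com", "-", 150),
]

def group_rows(rows):
    """Sum pvs per url and keep the first tags value that is not '-'."""
    totals = defaultdict(int)
    tags_by_url = {}
    for url, tags, pvs in rows:
        totals[url] += pvs
        # Treat '-' as NULL and ignore it, keeping the first real value seen.
        if tags != "-" and url not in tags_by_url:
            tags_by_url[url] = tags
    # A url with only '-' rows gets None for tags.
    return [(url, totals[url], tags_by_url.get(url)) for url in totals]

result = group_rows(rows)
print(result)
```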

check if all elements in hive array contain a string pattern

I have two columns in a hive table that look something like this:
code codeset
AB AB123,MU124
LM LM123,LM234
I need to verify that all elements in codeset column contain the value in code column so in the above example the first row would be false and the second row would be true.
Is there a simple way to do this that I am missing? I already read about array_contains but that returns true if just one element matches, I need all elements to contain what's in the code column.
Thanks in advance.
Split the string, then use explode with a lateral view to unpivot the data into one row per element. Then use locate to check whether each split element contains the code, and keep only the rows where every element matches (this is what the group by and having do).
select code,codeset
from tbl
lateral view explode(split(codeset,',')) t as split_codeset
group by code,codeset
having sum(cast(locate(code,split_codeset)>0 as int))=count(*)
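The having clause compares the number of matching elements with the total number of elements, which amounts to an "all elements match" test. A small Python sketch of that check on the sample rows from the question (the helper name all_contain is made up for illustration):

```python
def all_contain(code, codeset):
    """True if every comma-separated element of codeset contains code."""
    items = codeset.split(",")
    # Same idea as sum(locate(code, item) > 0) = count(*).
    return all(code in item for item in items)

# Sample rows from the question: (code, codeset).
rows = [("AB", "AB123,MU124"), ("LM", "LM123,LM234")]

# Keep only the rows where all elements match, like the HAVING filter.
matches = [(code, codeset) for code, codeset in rows if all_contain(code, codeset)]
print(matches)
```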
select a.pattern, b.listOfInputs
from (
  select *
  from (
    select a.pattern,
           case when inputCount = sumPatternMatchResult then true else false end finalResult
    from (
      select pattern, sum(patternMatchResult) as sumPatternMatchResult
      from (
        select pattern,
               case when locate(pattern,input) != 0 then 1 else 0 end patternMatchResult
        from (select pattern, explode(split(listOfInputs,',')) as input from tbl) a
      ) b
      group by pattern
    ) a
    join (
      select pattern, count(input) inputCount
      from (select pattern, explode(split(listOfInputs,',')) as input from tbl) a
      group by pattern
    ) b on a.pattern = b.pattern
  ) c
  where finalResult = true
) a
join (select * from tbl) b on a.pattern = b.pattern
This works too.
column mapping details for your table:
code -> pattern
codeset -> listOfInputs

How to Take two SQL queries of a column of substrings to search database using the appended result of the other

I have two select statements that each return a column of substrings derived from a database table. The substrings come from a varchar that should be XML, but the values were saved as varchars because they might not be well-formed and could be invalid.
I am trying to take the table that results from the 1st query, a list of 50 varchars, and search the database with it using the 2nd query. I could get from 0 to n SQLRelatesMessageID sets from each SQLmessageID if I use each row in the first query and append a string to get the node ("z4480" is an example here).
I have tried a cursor implementation, but the performance deterred me from finishing it. A join doesn't work if you try to reference the substring column by its "as" alias. What steps should I take to get the overall list of SQLRelatesMessageIDs? My goal is to get all MessageLogId values (3 in picture) given an NCPDPID.
I am using SQL Server Manager 2012.
--1--Receives a list based on a given NCPDPID node value
select substring(m.message, charindex('<MessageID>', m.message)+11, charindex('</MessageID>', m.message)-charindex('<MessageID>', m.message)-11) as
SQLmessageID from messagelog m where message like '%<NCPDPID>'+'1234567'+'</NCPDPID>%'
--2--Selects messageID from top select and searches RelatesToMessageID node
select substring(r.message, charindex('<RelatesToMessageID>', r.message)+20, charindex('</RelatesToMessageID>', r.message)-charindex('<RelatesToMessageID>', r.message)-20) as SQLRelatesMessageID, * from messagelog r
where message like ('%<RelatesToMessageID>'+'z4480'+'</RelatesToMessageID>%')
This works for this case:
---main
SELECT * FROM
(
select substring(m.message, charindex('<MessageID>', m.message)+11, charindex('</MessageID>', m.message)-charindex('<MessageID>', m.message)-11) as SQLmessageID from messagelog m
where message like '%<NCPDPID>1234567</NCPDPID>%' and dateTime > '3/01/2016'
) a JOIN
(
select
substring(r.message, charindex('<RelatesToMessageID>', r.message)+20, charindex('</RelatesToMessageID>', r.message)-charindex('<RelatesToMessageID>', r.message)-20) as SQLRelatesMessageID,
message,
messagelogid from messagelog r
where
dateTime > '3/01/2016' AND
message LIKE ('%<RelatesToMessageID>%</RelatesToMessageID>%')
) b ON b.SQLRelatesMessageID = a.SQLmessageID
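To make the shape of that join easier to follow, here is a rough Python sketch of the same logic: pull MessageID out of the messages for the given NCPDPID, pull RelatesToMessageID out of every message, and match the two. The helper name extract and the three log rows are hypothetical, and the date filter is left out.

```python
def extract(message, tag):
    """Return the text between <tag> and </tag>, or None if either is missing."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = message.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = message.find(close_tag, start)
    if end == -1:
        return None
    return message[start:end]

# Hypothetical log rows: (messagelogid, message).
messagelog = [
    (1, "<NCPDPID>1234567</NCPDPID><MessageID>z4480</MessageID>"),
    (2, "<RelatesToMessageID>z4480</RelatesToMessageID>"),
    (3, "<RelatesToMessageID>other</RelatesToMessageID>"),
]

# Subquery "a": MessageID values of messages for the given NCPDPID.
wanted = {
    extract(m, "MessageID")
    for _, m in messagelog
    if "<NCPDPID>1234567</NCPDPID>" in m
}
wanted.discard(None)

# Subquery "b" joined on SQLRelatesMessageID = SQLmessageID.
related_ids = [
    log_id for log_id, m in messagelog
    if extract(m, "RelatesToMessageID") in wanted
]
print(related_ids)
```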

single query to find difference in values in the three tables in oracle

I have three similar tables:
test_dev
test_qmg
test_prod
All the tables have the same columns. I want a single query to find the differences in values across the three tables.
example:
select * from test_dev
minus
select * from test_qmg
minus
select * from test_prod
The column names are the same in all three tables. I want to find the differences in the column values.
select VALIDITY_DAYS_BEFORE_ENTRY,VALIDITY_DAYS_AFTER_ENTRY from visa_type_lk where visa_type_id=1
select VALIDITY_DAYS_BEFORE_ENTRY,VALIDITY_DAYS_AFTER_ENTRY from visa_type_lk_qmg where visa_type_id=1
select VALIDITY_DAYS_BEFORE_ENTRY,VALIDITY_DAYS_AFTER_ENTRY from visa_type_lk_prod where visa_type_id=1
Here the validity_days_before_entry and validity_days_after_entry columns will change. I want to find that difference.
I believe this is what you are looking for:
SELECT dev.visa_type_id,
(dev.VALIDITY_DAYS_BEFORE_ENTRY - qmg.VALIDITY_DAYS_BEFORE_ENTRY - prod.VALIDITY_DAYS_BEFORE_ENTRY) as difference_before,
(dev.VALIDITY_DAYS_AFTER_ENTRY - qmg.VALIDITY_DAYS_AFTER_ENTRY - prod.VALIDITY_DAYS_AFTER_ENTRY) as difference_after
FROM visa_type_lk dev INNER JOIN visa_type_lk_qmg qmg ON dev.visa_type_id = qmg.visa_type_id
INNER JOIN visa_type_lk_prod prod ON qmg.visa_type_id = prod.visa_type_id
WHERE dev.visa_type_id =1
Here's a link to an SQL fiddle to demonstrate: http://sqlfiddle.com/#!2/56e16/2
Are you sure this is really what you want to do though? I can't imagine how this data would be useful. By the way, all of these tables are in one database, right?
SELECT
MIN(environment_name) as environment_name,VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
FROM
(
SELECT
'development' as environment_name, VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
FROM visa_type_lk A
where visa_type_id in (select visa_type_id from visa_type_lk_prod)
UNION ALL
SELECT
'production' as environment_name, VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
FROM visa_type_lk_prod B
)
tmp
GROUP BY VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
HAVING COUNT(*) = 1
order by visa_type_id,environment_name
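The idea behind UNION ALL followed by GROUP BY on every column and HAVING COUNT(*) = 1 is that a row identical in both environments is counted twice and dropped, while a row that differs shows up once per environment. A minimal Python sketch of that idea follows. The three-column rows and the function name rows_in_one_env are made up for illustration.

```python
from collections import defaultdict

def rows_in_one_env(tables):
    """tables maps an environment name to a list of row tuples.

    Returns (row, environment) pairs for rows that occur exactly once overall.
    """
    seen = defaultdict(list)
    for env, rows in tables.items():
        for row in rows:
            seen[row].append(env)
    # HAVING COUNT(*) = 1: keep rows that were seen in a single environment.
    # Sorting by the row tuple, then environment, stands in for the ORDER BY.
    return sorted((row, envs[0]) for row, envs in seen.items() if len(envs) == 1)

# Hypothetical rows: (visa_type_id, days_before_entry, days_after_entry).
tables = {
    "development": [(1, 30, 60), (2, 10, 20)],
    "production": [(1, 30, 90), (2, 10, 20)],
}

diff = rows_in_one_env(tables)
print(diff)
```

Here visa type 2 is identical in both environments and disappears, while visa type 1 is listed once per environment so the differing values can be compared side by side.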

Help with a complex join query

Keep in mind I am using SQL 2000
I have two tables.
tblAutoPolicyList contains a field called PolicyIDList.
tblLossClaims contains two fields called LossPolicyID & PolicyReview.
I am writing a stored proc that will get the distinct PolicyID from PolicyIDList field, and loop through LossPolicyID field (if match is found, set PolicyReview to 'Y').
Sample table layout:
PolicyIDList  LossPolicyID                                           PolicyReview
9651XVB19     5021WWA85, 4421WWA20, 3314WWA31, 1121WAW11, 2221WLL99  Y
5021WWA85     3326WAC35, 1221AXA10, 9863AAA44, 5541RTY33, 9651XVB19  Y
0151ZVB19     4004WMN63, 1001WGA42, 8587ABA56, 8541RWW12, 9329KKB08  N
How would I go about writing the stored proc (looking for logic more than syntax)?
Keep in mind I am using SQL 2000.
Select LossPolicyID, * from tableName where charindex('PolicyID',LossPolicyID,1)>0
Basically, the idea is this:
'Unroll' tblLossClaims and return two columns: a tblLossClaims key (you didn't mention any, so I guess it's going to be LossPolicyID) and Item = a single item from LossPolicyID.
Find matches of unrolled.Item in tblAutoPolicyList.PolicyIDList.
Find matches of distinct matched.LossPolicyID in tblLossClaims.LossPolicyID.
Update tblLossClaims.PolicyReview accordingly.
The main UPDATE can look like this:
UPDATE claims
SET PolicyReview = 'Y'
FROM tblLossClaims claims
JOIN (
SELECT DISTINCT unrolled.LossPolicyID
FROM (
SELECT LossPolicyID, Item = itemof(LossPolicyID)
FROM unrolling_join
) unrolled
JOIN tblAutoPolicyList
ON unrolled.Item = tblAutoPolicyList.PolicyIDList
) matched
ON matched.LossPolicyID = claims.LossPolicyID
You can take advantage of the fixed item width and the fixed list format and thus easily split LossPolicyID without a UDF. I can see this done with the help of a number table and SUBSTRING(). unrolling_join in the above query is actually tblLossClaims joined with the number table.
Here's the definition of unrolled 'zoomed in':
...
(
SELECT LossPolicyID,
       -- each item is followed by a two-character separator (', '),
       -- so consecutive items start @ItemLength + 2 characters apart
       Item = SUBSTRING(LossPolicyID,
                        (v.number - 1) * (@ItemLength + 2) + 1,
                        @ItemLength)
FROM tblLossClaims c
JOIN master..spt_values v ON v.type = 'P'
AND v.number BETWEEN 1 AND (LEN(c.LossPolicyID) + 2) / (@ItemLength + 2)
) unrolled
...
master..spt_values is a system table that is used here as the number table. Filter v.type = 'P' gives us a rowset with number values from 0 to 2047, which is narrowed down to the list of numbers from 1 to the number of items in LossPolicyID. Eventually v.number serves as an array index and is used to cut out single items.
@ItemLength is of course simply LEN(tblAutoPolicyList.PolicyIDList). I would probably also declare @ItemLength2 = @ItemLength + 2 so that it isn't recalculated every time the filter is applied.
Basically, that's it, if I haven't missed anything.
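Since the question asked for logic more than syntax, here is the unrolling idea as a short Python sketch, run on the sample data from the question. It assumes fixed-width items separated by a two-character ", " and steps through the list by item length plus separator length. The function name unroll and its parameters are made up for illustration.

```python
def unroll(loss_policy_id, item_length, sep_length=2):
    """Split a fixed-width, fixed-separator list into its items."""
    stride = item_length + sep_length
    # Number of items: the last item has no trailing separator, hence the + sep_length.
    count = (len(loss_policy_id) + sep_length) // stride
    items = []
    for n in range(1, count + 1):  # n plays the role of v.number (1-based)
        start = (n - 1) * stride   # Python slices are 0-based
        items.append(loss_policy_id[start:start + item_length])
    return items

# Sample data from the question.
policy_ids = {"9651XVB19", "5021WWA85", "0151ZVB19"}
claims = [
    "5021WWA85, 4421WWA20, 3314WWA31, 1121WAW11, 2221WLL99",
    "3326WAC35, 1221AXA10, 9863AAA44, 5541RTY33, 9651XVB19",
    "4004WMN63, 1001WGA42, 8587ABA56, 8541RWW12, 9329KKB08",
]

# Set the review flag to 'Y' when any unrolled item is a known policy ID.
review = {
    claim: "Y" if any(item in policy_ids for item in unroll(claim, 9)) else "N"
    for claim in claims
}
print(list(review.values()))
```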
If the PolicyIDList field is a delimited list, you first have to separate the individual policy IDs and create a temporary table with all of the results. Next, use an update query on tblLossClaims with 'where exists (select * from #temptable tt where tt.PolicyID = LossPolicyID)'.
Depending on the size of the table/data, you might wish to add an index to your temporary table.