Check if all elements in a Hive array contain a string pattern

I have two columns in a hive table that look something like this:
code codeset
AB AB123,MU124
LM LM123,LM234
I need to verify that all elements in the codeset column contain the value in the code column. In the example above, the first row would be false and the second row would be true.
Is there a simple way to do this that I am missing? I already read about array_contains, but that returns true if just one element matches. I need all elements to contain what's in the code column.
Thanks in advance.

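Split the string, then use explode with a lateral view to turn each element of codeset into its own row. Then use locate to check whether each split element contains the code. The group by and having keep only the rows where the number of matching elements equals the total number of elements.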
select code,codeset
from tbl
lateral view explode(split(codeset,',')) t as split_codeset
group by code,codeset
having sum(cast(locate(code,split_codeset)>0 as int))=count(*)
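If you want to sanity-check the rule outside Hive, the same "every element must contain the code" test can be sketched in Python. This is only an illustration: the function name all_elements_contain is made up, and the sample rows are the two from the question.

```python
def all_elements_contain(code: str, codeset: str) -> bool:
    """Return True only if every comma-separated element of codeset contains code."""
    # Mirrors split(codeset, ',') followed by locate(code, element) > 0 for each element.
    return all(code in element for element in codeset.split(","))


# The two sample rows from the question.
rows = [("AB", "AB123,MU124"), ("LM", "LM123,LM234")]
matching = [(code, codeset) for code, codeset in rows if all_elements_contain(code, codeset)]
print(matching)  # [('LM', 'LM123,LM234')]
```

Only the LM row survives, which matches what the having clause is meant to keep.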

select a.pattern, b.listOfInputs
from (
  select *
  from (
    select a.pattern,
           case when inputCount = sumPatternMatchResult then true else false end finalResult
    from (
      -- number of inputs per pattern that contain the pattern
      select pattern, sum(patternMatchResult) as sumPatternMatchResult
      from (
        select pattern,
               case when locate(pattern, input) != 0 then 1 else 0 end patternMatchResult
        from (
          -- Hive does not allow explode() next to other columns in a select list,
          -- so the split values are unpivoted with lateral view
          select pattern, input
          from tbl lateral view explode(split(listOfInputs, ',')) t as input
        ) a
      ) b
      group by pattern
    ) a
    join (
      -- total number of inputs per pattern
      select pattern, count(input) inputCount
      from (
        select pattern, input
        from tbl lateral view explode(split(listOfInputs, ',')) t as input
      ) a
      group by pattern
    ) b on a.pattern = b.pattern
  ) c
  where finalResult = true
) a
join (select * from tbl) b on a.pattern = b.pattern
This works too.
Column mapping to your table:
code -> pattern
codeset -> listOfInputs

Related

How to extract a json value inside an array inside a json in SQL

I have a JSON being returned in my query called MetaDataJSON, inside which is an array of JSONs called Results. Inside each JSON in Results are two values, Chronic and Probability. There are a couple other tables that have been joined too. Is there a way to get Chronic in a column by itself? Right now I have gotten this far (table and variable names have been made generic):
SELECT DISTINCT
JSON_QUERY(mdj.value, '$.Results[0]') [Results]
FROM table1 t1
JOIN table2 t2 ON t2.parameter1 = t1.parameter1
AND t2.parameter2 = 'ASDF'
JOIN table3 t3 ON oad.parameter3 = oa.parameter3
AND t3.parameter4 = 11
CROSS APPLY OPENJSON(t3.MetaDataJSON) as mdj
This gets me a column called Results where each entry looks like:
{"Chronic": 0, "Probability": 0.0016}
Is there an efficient way to get Chronic in a column by itself? Thanks!
You can do it like this:
WITH jsons (x)
AS
(
-- replace this part with your query
Select a.x from (select '{"Chronic": 0, "Probability": 0.0016}' x ) as a, (SELECT 1 as y union select 2 as y ) as b
)
select
(SELECT value FROM OPENJSON(x,'$') where [key]='Chronic') as "Chronic",
(SELECT value FROM OPENJSON(x,'$') where [key]='Probability') as "Probability"
from jsons;
I think you can replace the body of the jsons CTE with your own query, but I could not try that since I don't have your tables.
BTW, I assume you are using SQL Server.
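If the extraction can also happen on the client side, here is a minimal Python sketch of the same idea. The shape of MetaDataJSON is an assumption inferred from the question (a Results array of objects), and the helper name chronic_values is made up.

```python
import json

# Assumed shape of MetaDataJSON, inferred from the question: a "Results" array of objects.
meta_data_json = '{"Results": [{"Chronic": 0, "Probability": 0.0016}]}'


def chronic_values(raw: str) -> list:
    """Return the Chronic value of every entry in the Results array."""
    results = json.loads(raw).get("Results", [])
    return [entry.get("Chronic") for entry in results]


print(chronic_values(meta_data_json))  # [0]
```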

SQL Statement Select only when 1 State is used

I have 3 columns
ProjektNummer, DokumentNummer, DokumentType
In DokumentType there are 3 possible values: Angebot, Rechnung, Lieferschein.
I want to select only those projects which have only Angebot and no other values.
As you can see in the picture, I only want the ProjektNummer (17011) where DokumentType = Angebot and there are no other entries with other values of DokumentType.
So it should NOT select ProjektNummer = 17016, because there are other entries with different values.
I hope you know what I mean.
I already tried if conditions and other stuff but I can't get it done.
Thanks for your help
You can use NOT EXISTS:
SELECT t.*
FROM table t
WHERE NOT EXISTS (SELECT 1
                  FROM table t1
                  WHERE t1.ProjektNummer = t.ProjektNummer AND
                        t1.DokumentType <> t.DokumentType
                 )
      AND t.DokumentType = 'Angebot';
As I read the question, you only want ProjektNummers that meet the criteria -- not the individual rows. If so, then aggregation is a simple solution:
select ProjektNummer
from t
group by ProjektNummer
having min(DokumentType) = max(DokumentType) and
min(DokumentType) = 'Angebot'
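To compare the two approaches on a small data set, here is a sketch that runs them against an in-memory SQLite database from Python. The table name dokumente and the sample rows are made up for illustration (only the project numbers 17011 and 17016 come from the question), and SQLite stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dokumente (ProjektNummer INTEGER, DokumentNummer INTEGER, DokumentType TEXT)"
)
# Hypothetical sample rows: 17011 has only an Angebot, 17016 has two document types.
conn.executemany(
    "INSERT INTO dokumente VALUES (?, ?, ?)",
    [(17011, 1, "Angebot"), (17016, 2, "Angebot"), (17016, 3, "Rechnung")],
)

# NOT EXISTS variant: keep Angebot rows whose project has no row of another type.
not_exists_rows = conn.execute(
    """
    SELECT d.ProjektNummer
    FROM dokumente d
    WHERE d.DokumentType = 'Angebot'
      AND NOT EXISTS (SELECT 1
                      FROM dokumente d1
                      WHERE d1.ProjektNummer = d.ProjektNummer
                        AND d1.DokumentType <> d.DokumentType)
    """
).fetchall()

# Aggregation variant: min = max means the project has a single distinct type.
aggregated_rows = conn.execute(
    """
    SELECT ProjektNummer
    FROM dokumente
    GROUP BY ProjektNummer
    HAVING min(DokumentType) = max(DokumentType)
       AND min(DokumentType) = 'Angebot'
    """
).fetchall()

print(not_exists_rows, aggregated_rows)  # [(17011,)] [(17011,)]
```

Note that the NOT EXISTS variant returns one row per matching document, while the aggregation returns one row per project.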

Querying Column Headers in GBQ

Is it possible to do a query to provide me an output with the column headers of a specific table? I'm uploading multiple files into our server via GBQ and while it auto-detects the headers, I would like to list out the headers either in rows or as a comma separated cell.
Thank you
I am assuming your files are in CSV format, so the table schema does not have repeated fields. With this in mind, the query below is for BigQuery Standard SQL and requires just the fully qualified table name.
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"') cols_as_array,
ARRAY_TO_STRING(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"'), ',') cols_as_string
FROM (SELECT 1) LEFT JOIN
(SELECT * FROM `project.dataset.table` WHERE FALSE) t
ON TRUE
If you apply it to a real table, as in the example below
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"') cols_as_array,
ARRAY_TO_STRING(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"'), ',') cols_as_string
FROM (SELECT 1) LEFT JOIN
(SELECT * FROM `bigquery-public-data.utility_us.us_states_area` WHERE FALSE) t
ON TRUE
the result will be
Row  cols_as_array           cols_as_string
1    region_code             region_code,division_code,state_fips_code,state_gnis_code,state_geo_id,state_abbreviation,state_name,legal_area_code,feature_class_code,functional_status_code,area_land_meters,area_water_meters,internal_point_lat,internal_point_lon,state_geom
     division_code
     state_fips_code
     state_gnis_code
     state_geo_id
     state_abbreviation
     state_name
     legal_area_code
     feature_class_code
     functional_status_code
     area_land_meters
     area_water_meters
     internal_point_lat
     internal_point_lon
     state_geom
You can choose which version to use: the list as an array or the list as a comma-separated string.
Also note that the query above does not incur any cost at all!
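To see why the regular expression picks up only column names, here is a small Python sketch of the same extraction. The JSON string is an assumed example of what TO_JSON_STRING returns for the all-NULL row produced by the LEFT JOIN (I shortened it to three columns). Since every value is null, the only quoted strings are the keys.

```python
import re

# Assumed TO_JSON_STRING output for the all-NULL row that the LEFT JOIN produces.
row_json = '{"region_code":null,"division_code":null,"state_name":null}'

# Same pattern as in the query: lazily capture everything between double quotes.
cols_as_array = re.findall(r'"(.+?)"', row_json)
cols_as_string = ",".join(cols_as_array)

print(cols_as_array)   # ['region_code', 'division_code', 'state_name']
print(cols_as_string)  # region_code,division_code,state_name
```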

How to group by more than one row value?

I am working with PostgreSQL and I can't figure out how to solve a problem. I have a model called Foobar. Some of its attributes are:
FOOBAR
check_in:datetime
qr_code:string
city_id:integer
In this table there is a lot of redundancy (qr_code is not unique) but that is not my problem right now. What I am trying to get are the foobars that have same qr_code and have been in a well known group of cities, that have checked in at different moments.
I got this by querying:
SELECT * FROM foobar AS a
WHERE a.city_id = 1
AND EXISTS (
SELECT * FROM foobar AS b
WHERE a.check_in < b.check_in
AND a.qr_code = b.qr_code
AND b.city_id = 2
AND EXISTS (
SELECT * FROM foobar as c
WHERE b.check_in < c.check_in
AND c.qr_code = b.qr_code
AND c.city_id = 3
AND EXISTS(...)
)
)
where '...' represents more queries to get more persons with the same qr_code, different check_in date and those well known cities.
My problem is that I want to group this by qr_code, and I want to show the check_in fields of each qr_code like this:
2015-11-11 14:14:14 => [2015-11-11 14:14:14, 2015-11-11 16:16:16, 2015-11-11 17:18:20] (this for each different qr_code)
where the data at the left is the 'smaller' date for that qr_code, and the right part are all the other dates for that qr_code, including the first one.
Is this possible to do with a sql query only? I am asking this because I am actually doing this app with rails, and I know that I can make a different approach with array methods of ruby (a solution with this would be well received too)
You could solve that with a recursive CTE - if I interpret your question correctly:
I am assuming you have a given list of cities that must be visited in order by the same qr_code. Your text doesn't say so, but your query indicates as much.
WITH RECURSIVE
c AS (SELECT '{1,2,3}'::int[] AS cities) -- your list of city_id's here
, route AS (
SELECT f.check_in, f.qr_code, 2 AS idx
FROM foobar f
JOIN c ON f.city_id = c.cities[1]
UNION ALL
SELECT f.check_in, f.qr_code, r.idx + 1
FROM route r
JOIN foobar f USING (qr_code)
JOIN c ON f.city_id = c.cities[r.idx]
WHERE r.check_in < f.check_in
)
SELECT qr_code, array_agg(check_in) AS check_in_list
FROM (
SELECT *
FROM route
ORDER BY qr_code, idx -- or check_in
) sub
GROUP BY 1
HAVING count(*) = (SELECT array_length(cities, 1) FROM c);
Provide the list as array in the first (non-recursive) CTE c.
In the recursive part start with any rows in the first city and travel along your array until the last element.
In the final SELECT aggregate your check_in column in order. Only return qr_code that have visited all cities of the array.
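Since the question also welcomes an application-side solution, here is a rough Python sketch of the same idea. It is not a port of the recursive query: it uses a greedy scan that, per qr_code, picks the earliest check-in in each city that is later than the previous pick. The function name routes_in_order and the sample rows are made up; the timestamps are the ones from the question.

```python
from collections import defaultdict


def routes_in_order(rows, cities):
    """Map each qr_code that visited all cities in order to its list of check-ins.

    rows: iterable of (check_in, qr_code, city_id); cities: ordered list of city ids.
    """
    by_code = defaultdict(list)
    for check_in, qr_code, city_id in rows:
        by_code[qr_code].append((check_in, city_id))

    result = {}
    for qr_code, visits in by_code.items():
        visits.sort()  # chronological order
        route = []
        for check_in, city_id in visits:
            idx = len(route)  # position of the next city we are waiting for
            if idx == len(cities):
                break
            # Accept the visit only if it is the expected city and strictly later.
            if city_id == cities[idx] and (not route or check_in > route[-1]):
                route.append(check_in)
        if len(route) == len(cities):
            result[qr_code] = route
    return result


# Hypothetical sample data: "A" visits cities 1, 2, 3 in order; "B" skips city 2.
rows = [
    ("2015-11-11 14:14:14", "A", 1),
    ("2015-11-11 16:16:16", "A", 2),
    ("2015-11-11 17:18:20", "A", 3),
    ("2015-11-11 10:00:00", "B", 1),
    ("2015-11-11 11:00:00", "B", 3),
]
routes = routes_in_order(rows, [1, 2, 3])
print(routes)
```

The first element of each list is the smallest date for that qr_code, which is the key the question asks for.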
Similar:
Recursive query used for transitive closure

How to use a variable AS a where clause?

I have one where clause which I have to use multiple times. I am quite new to Oracle SQL, so please forgive my newbie mistakes. I have read this website, but could not find the answer. Here's the SQL statement:
var condition varchar2(100)
exec :condition := 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from table_name
where category = Y AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
The content field is a CLOB field and unfortunately all values needed are in the same column. My query does not work, of course.
You can't use a bind variable for that much of a where clause, only for specific values. You could use a substitution variable if you're running this in SQL*Plus or SQL Developer (and maybe some other clients):
define condition = 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND &condition
...
From other places, including JDBC and OCI, you'd need to have the condition as a variable and build the query string using that, so it's repeated in the code that the parser sees. From PL/SQL you could use dynamic SQL to achieve the same thing. I'm not sure why just repeating the conditions is a problem though, binding arguments if values are going to change. Certainly with two clauses like this it seems a bit pointless.
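As a rough illustration of building the query string in client code, here is a Python sketch. It runs against an in-memory SQLite database as a stand-in for Oracle, so substr replaces DBMS_LOB.SUBSTR, and the table, columns and rows are made up. The condition text is repeated in the SQL that the parser sees, while the values inside it are still bound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (category TEXT, content TEXT, column1 INTEGER, column2 INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [("X", "abcdef", 1, 2), ("Y", "uvwxyz", 1, 2), ("Y", "ignored", 9, 9)],
)

# The shared condition is spliced in as text, so it must come from trusted code,
# never from user input. The values inside it remain bind parameters.
condition = "column1 = ? AND column2 = ?"

query = f"""
    SELECT a.content, b.content
    FROM (SELECT DISTINCT substr(content, 1, 3) AS content
          FROM my_table
          WHERE category = 'X' AND {condition}) a,
         (SELECT DISTINCT substr(content, 1, 100) AS content
          FROM my_table
          WHERE category = 'Y' AND {condition}) b
"""

# The condition appears twice in the text, so its bind values are supplied twice.
rows = conn.execute(query, (1, 2, 1, 2)).fetchall()
print(rows)  # [('abc', 'uvwxyz')]
```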
But maybe you could approach this from a different angle and remove the need to repeat the where clause. Querying the table twice might not be efficient anyway. You could apply your condition once as a subquery, but without knowing your indexes or the selectivity of the conditions this could be worse:
with sub_table as (
select category, content
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from sub_table
where category = X
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from sub_table
where category = Y
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
I'm not sure what the grouping is for - to eliminate duplicates? This only really makes sense if you have a single X and Y record matching the other conditions, doesn't it? Maybe I'm not following it properly.
You could also use a case expression:
select max(content_x), max(content_y)
from (
  select
    case when category = X
      then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3) end as content_x,
    case when category = Y
      then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100) end as content_y
  from my_table
  where category in (X, Y)
  and column 1 = 1 AND column2 = 2, etc.
)
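Here is a small sketch of that conditional aggregation, run from Python against an in-memory SQLite database rather than Oracle. The table contents are made up, substr stands in for DBMS_LOB.SUBSTR, and the extra conditions are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (category TEXT, content TEXT)")
# Hypothetical sample rows: one X record, one Y record, one unrelated record.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("X", "abcdef"), ("Y", "uvwxyz"), ("Z", "other")],
)

# Each case expression yields NULL for the other category; max() ignores NULLs,
# so a single pass over the table returns both values in one row.
row = conn.execute(
    """
    SELECT max(content_x), max(content_y)
    FROM (
        SELECT CASE WHEN category = 'X' THEN substr(content, 1, 3) END AS content_x,
               CASE WHEN category = 'Y' THEN substr(content, 1, 100) END AS content_y
        FROM my_table
        WHERE category IN ('X', 'Y')
    ) AS pivoted
    """
).fetchone()

print(row)  # ('abc', 'uvwxyz')
```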