Impala AnalysisException: Subqueries are not supported in the HAVING clause

Using Impala, I have a query that selects destination host names for a given user agent string, groups by destination host name, and keeps only the groups whose row count equals the number of distinct srchostname values.
select desthostname
from proxy_table
where useragentstring = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/538.1 (KHTML, like Gecko) Google Earth Pro/7.3.2.5491 Safari/538.1"
group by desthostname
having count(*) = (select count(distinct srchostname) from proxy_table);
But I am running into the error:
AnalysisException: Subqueries are not supported in the HAVING clause.
Do you know how I can fix this?

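Move the aggregate into a derived table and filter on it in the outer WHERE clause, where Impala does allow the subquery: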
select desthostname from
(select desthostname,count(*) as cnt
from proxy_table
where useragentstring = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/538.1 (KHTML, like Gecko) Google Earth Pro/7.3.2.5491 Safari/538.1"
group by desthostname) A where A.cnt in (select count(distinct srchostname) from proxy_table);
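One way to sanity-check the rewrite is to run both forms against a tiny in-memory table. The sketch below uses Python's sqlite3 module purely as a stand-in (SQLite, unlike Impala, accepts the subquery in HAVING, so the two forms can be compared side by side). The sample rows and the shortened user agent value 'GE' are made up for illustration.

```python
import sqlite3

# In-memory stand-in for proxy_table; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table proxy_table "
    "(srchostname text, desthostname text, useragentstring text)"
)
conn.executemany(
    "insert into proxy_table values (?, ?, ?)",
    [
        ("src1", "a.example", "GE"),
        ("src2", "a.example", "GE"),
        ("src1", "b.example", "GE"),
        ("src2", "c.example", "other"),
    ],
)

# Original form: subquery inside HAVING (rejected by Impala, accepted by SQLite).
having_form = """
select desthostname
from proxy_table
where useragentstring = 'GE'
group by desthostname
having count(*) = (select count(distinct srchostname) from proxy_table)
"""

# Rewritten form: aggregate in a derived table, subquery in the outer WHERE.
derived_form = """
select desthostname from
(select desthostname, count(*) as cnt
 from proxy_table
 where useragentstring = 'GE'
 group by desthostname) A
where A.cnt in (select count(distinct srchostname) from proxy_table)
"""

having_result = sorted(conn.execute(having_form).fetchall())
derived_result = sorted(conn.execute(derived_form).fetchall())
print(having_result, derived_result)
```

Both queries should return the same hosts on this sample data, which suggests the derived-table form preserves the intent of the original HAVING condition.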

Extracting data from unstructured JSON in BigQuery

I have JSON stored as a string in a BigQuery field:
[{"name":"user_group","value":"regular"},{"name":"checkout_version","value":"2.2"},{"name":"currency","value":"EUR"},{"name":"currency_exchange_rate","value":"1"},{"name":"currency_symbol","value":"€"},{"name":"variant","value":"default"},{"name":"snowplow_id","value":"XXXXXXX"},{"name":"ip_address","value":"XXXX"},{"name":"user_agent","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"},{"name":"is_test_order","value":"false"}]
I'm going crazy trying to extract the value ("default") from this section:
{"name":"variant","value":"default"}
The part I want always follows "name":"variant","value":" and ends with a closing ".
I have tried json_extract, but regexp_extract seems the best option. I tried this: (
select REGEXP_EXTRACT(json_string_field, r'/\{"value":"([^"]+)"/') as variant
from source_table
)
But I'm just getting nulls back. I would appreciate any ideas.
Consider the query below, which parses the string as a JSON array and keeps the element whose name is 'variant':
WITH json_data AS (
SELECT '[{"name":"user_group","value":"regular"},{"name":"checkout_version","value":"2.2"},{"name":"currency","value":"EUR"},{"name":"currency_exchange_rate","value":"1"},{"name":"currency_symbol","value":"€"},{"name":"variant","value":"default"},{"name":"snowplow_id","value":"XXXXXXX"},{"name":"ip_address","value":"XXXX"},{"name":"user_agent","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36"},{"name":"is_test_order","value":"false"}]' json
)
SELECT JSON_VALUE(kv, '$.value') AS value
FROM json_data, UNNEST(JSON_QUERY_ARRAY(json)) kv
WHERE JSON_VALUE(kv, '$.name') = 'variant';
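For comparison, the REGEXP_EXTRACT attempt in the question returns NULL because BigQuery patterns take no surrounding / delimiters (the slashes are matched literally) and because the pattern looks for {"value" rather than the "name":"variant" pair. The Python sketch below illustrates this on a shortened sample string, with Python's re module standing in for BigQuery's regular expression engine, next to a structured lookup that mirrors the answer above.

```python
import json
import re

# Shortened version of the JSON string from the question.
json_string_field = (
    '[{"name":"currency","value":"EUR"},'
    '{"name":"variant","value":"default"},'
    '{"name":"is_test_order","value":"false"}]'
)

# Pattern from the question: the slashes are literal characters, so no match.
original = re.search(r'/\{"value":"([^"]+)"/', json_string_field)

# Anchor on the name/value pair instead, without delimiters.
fixed = re.search(r'"name":"variant","value":"([^"]+)"', json_string_field)
variant_by_regex = fixed.group(1) if fixed else None

# Structured alternative: parse the array and pick the element by name.
variant_by_json = next(
    (kv["value"] for kv in json.loads(json_string_field) if kv["name"] == "variant"),
    None,
)

print(original, variant_by_regex, variant_by_json)
```

In BigQuery the corrected call would look like REGEXP_EXTRACT(json_string_field, r'"name":"variant","value":"([^"]+)"'), although the JSON functions are more robust if the key order or spacing ever changes.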

Apache access log is not parsed as desired in AWS Athena

The current SQL to create a table is as follows.
CREATE EXTERNAL TABLE apache_logs(
client_ip string,
client_id string,
user_id string,
request_received_time string,
client_request string,
server_status string,
returned_obj_size string,
referrer string
)
ROW FORMAT SERDE
'com.amazonaws.glue.serde.GrokSerDe'
WITH SERDEPROPERTIES (
'input.format'='^%{IPV4:client_ip} %{DATA:client_id} %{USERNAME:user_id} %{GREEDYDATA:request_received_time} %{QUOTEDSTRING:client_request} %{DATA:server_status} %{DATA: returned_obj_size} %{DATA:referrer}'
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://s3-bucket/logs/';
If you run the following query against the table above, some columns are filled with values that do not belong in them:
SELECT * FROM apache_logs;
The log format is as follows:
178.247.80.196 - - [23/Jan/2022:15:21:25 +0900] "POST /app/main/posts HTTP/1.0" 200 5006 "http://hicks.com/blog/tags/home.asp" "Opera/8.83.(X11; Linux i686; ia-FR) Presto/2.9.162 Version/10.00"
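The sample line has nine fields (the last one being the quoted user agent), while the table declares eight columns and the pattern relies on the unbounded GREEDYDATA and DATA matchers, which may be why values land in the wrong columns. As a hedged illustration of where the field boundaries fall, the Python sketch below splits the sample line with an explicit regular expression. This is not Grok syntax, and the user_agent group name is an assumption that does not appear in the original table.

```python
import re

# Sample line from the question.
LOG_LINE = (
    '178.247.80.196 - - [23/Jan/2022:15:21:25 +0900] '
    '"POST /app/main/posts HTTP/1.0" 200 5006 '
    '"http://hicks.com/blog/tags/home.asp" '
    '"Opera/8.83.(X11; Linux i686; ia-FR) Presto/2.9.162 Version/10.00"'
)

# One named group per field; delimiters (brackets, quotes) stay outside groups.
PATTERN = re.compile(
    r'^(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<client_id>\S+) '
    r'(?P<user_id>\S+) '
    r'\[(?P<request_received_time>[^\]]+)\] '
    r'"(?P<client_request>[^"]*)" '
    r'(?P<server_status>\d{3}) '
    r'(?P<returned_obj_size>\S+) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"$'  # hypothetical ninth column
)

match = PATTERN.match(LOG_LINE)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value}")
```

A Grok pattern that bounds each field in the same way (for example by matching the brackets and quotes explicitly) and a ninth column for the user agent would be the corresponding change to try in the table definition.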

Karate: Is there a way to pass a variable as a string in a Scenario Outline Examples table?

I am using the latest Karate, v1.1.0. My feature looks like this:
Feature: Scenario outline with examples table
Background:
* url 'http://localhost:8080'
* def dummy = Java.type(karatetests.attributes)
* def test = new attributes()
* def userid = test.getuserid()
Scenario Outline: pass userid with string as parameter
Given path '<path>'
And Header Host = 'hostname'
And Header User-Agent = '<Ua-string>'
When method POST
Then status 200
Examples:
| path | Ua-string |
| api | AppleWebKit/537.36 (KHTML, like Gecko, userid)|
In Cucumber, I was able to substitute the value of the userid variable into the Ua-string column with AppleWebKit/537.36 (KHTML, like Gecko, ${userid}).
In Karate, I tried userid, 'userid', "userid", "#(userid)", and '#(userid)', but unfortunately none of them worked:
Examples:
| path | Ua-string |
| api | AppleWebKit/537.36 (KHTML, like Gecko, userid)| => Result: userid string is passed not its value
| api | AppleWebKit/537.36 (KHTML, like Gecko, 'userid')| => Result: syntax error
| api | AppleWebKit/537.36 (KHTML, like Gecko, "userid")| => Result: "userid" string is passed not its value
| api | AppleWebKit/537.36 (KHTML, like Gecko, "#(userid)")| => Result: "#(userid)" string is passed not its value
| api | AppleWebKit/537.36 (KHTML, like Gecko, '#(userid)')| => Result: '#(userid)' syntax error
How can I replace the userid with its value, while passing it to Ua-string header?
Thanks
Works perfectly for me. Try this example that you can cut and paste into any feature and see that it works for yourself:
Scenario Outline:
* url 'https://httpbin.org/anything'
* header User-Agent = userAgent
* method get
Examples:
| userAgent |
| foo |
| bar |
But I think you are expecting functions and variables to work within Examples:, and sorry, that is not supported: https://stackoverflow.com/a/60358535/143475
EDIT: I still don't understand what you mean by "variable" in your comment, but if you are seeing problems with commas, try this, and refer to the docs: https://github.com/intuit/karate#scenario-outline-enhancements
Scenario Outline:
* url 'https://httpbin.org/anything'
* header User-Agent = userAgent
* method get
Examples:
| userAgent! |
| 'foo, bar' |
| 'baz, ban' |
And finally, if you really want to do some "magic" using variables, you can always do that in the scenario body as shown here: https://stackoverflow.com/a/60358535/143475
Or just use string concatenation. Is that so hard?

Grouping multiple 'AND x not like y' In SQL statement

I currently have an SQL statement that is too long for my program (I'm using an SCCM report, which limits the number of characters I can use). The problem is that my SQL statement looks like this:
Select distinct v_GS_ADD_REMOVE_PROGRAMS_64.DisplayName0, v_GS_ADD_REMOVE_PROGRAMS_64.Publisher0, v_GS_ADD_REMOVE_PROGRAMS_64.Version0
FROM v_GS_ADD_REMOVE_PROGRAMS_64
JOIN v_R_System ON v_GS_ADD_REMOVE_PROGRAMS_64.ResourceID = v_R_System.ResourceID
WHERE (v_R_System.Netbios_Name0 = #computername)
AND
DisplayName0 NOT LIKE 'hpp%'
AND
DisplayName0 NOT LIKE 'Logitech SetPoint%'
AND
DisplayName0 NOT LIKE 'HP Document Manager%'
AND
DisplayName0 NOT LIKE 'HP Imaging Device Functions%'
AND
DisplayName0 NOT LIKE 'PyQt4 - PyQwt5%'
And it goes on and on for 20 pages. How can I reduce the amount of code in this query? Is there a way to group all the DisplayName0 NOT LIKE conditions with something like NOT IN (value1, value2, ...)?
If you can do without the trailing % in your patterns (that is, exact matches are enough), you can replace the conditions with:
SELECT ... WHERE DisplayName0 NOT IN ('hpp','Logitech SetPoint','HP Document Manager',...)
That would make it somewhat shorter.
But it seems to me that the proper solution would be to create a [temp] table with all the names you need to filter against and then join to it.
Could you store the values in a separate table and then reference it in your query like this?
SELECT DISTINCT v_GS_ADD_REMOVE_PROGRAMS_64.DisplayName0
,v_GS_ADD_REMOVE_PROGRAMS_64.Publisher0
,v_GS_ADD_REMOVE_PROGRAMS_64.Version0
FROM v_GS_ADD_REMOVE_PROGRAMS_64
JOIN v_R_System ON v_GS_ADD_REMOVE_PROGRAMS_64.ResourceID = v_R_System.ResourceID
WHERE (v_R_System.Netbios_Name0 = #computername) AND DisplayName0 NOT IN (
SELECT DisplayName0
FROM < NewTableName >
)
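Note that NOT IN compares whole values, so it only replaces the NOT LIKE list when exact names are acceptable. If the prefix matching must be kept, the lookup table can hold the patterns themselves and be matched with LIKE. A minimal sketch of that idea follows, using Python's sqlite3 as a stand-in for SQL Server; the table names programs and excluded_patterns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the SCCM view and the new lookup table.
conn.execute("create table programs (DisplayName0 text)")
conn.execute("create table excluded_patterns (pattern text)")
conn.executemany(
    "insert into programs values (?)",
    [("hpp driver",), ("Logitech SetPoint 6.5",), ("Mozilla Firefox",), ("7-Zip",)],
)
# Patterns keep their trailing %, so prefix matching still applies.
conn.executemany(
    "insert into excluded_patterns values (?)",
    [("hpp%",), ("Logitech SetPoint%",)],
)

# Keep a program only if no stored pattern matches its name.
query = """
select p.DisplayName0
from programs p
where not exists (
    select 1
    from excluded_patterns e
    where p.DisplayName0 like e.pattern
)
order by p.DisplayName0
"""
kept = [row[0] for row in conn.execute(query)]
print(kept)
```

The query text stays the same length no matter how many patterns are stored, which is what matters for the report's character limit.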

SQL 2005: How to use GROUP BY with a sub query

The following very simple query
select distinct guid, browser_agent
from tblMyGlossary
where browser_agent is not null
provides the following results:
guid browser_agent
367DE2B8-88A5-4DA9-ACBB-C0864493DC1F Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
5DCB918E-DA56-4545-A4E3-D09B1B803422 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
998B8F37-2C9A-49EB-AA0B-CF88C4CC7BDF Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
A0DD3BCB-E8A9-4434-A869-C343FB21F993 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
I want to count how many times each unique browser_agent string occurs, so I am running the following query:
select browser_agent, count(browser_agent) as 'count'
from
(
select distinct guid, browser_agent
from tblMyGlossary
where browser_agent is not null
)
group by browser_agent
order by 'count' desc;
The problem is that SQL Server 2005 complains:
Msg 156, Level 15, State 1, Line 8
Incorrect syntax near the keyword 'group'.
Can anyone shed any light on how to resolve this please? I've run out of ideas.
Many thanks,
Mark
You need to alias your derived table. SQL Server requires every derived table in a FROM clause to have a name (here, a):
select browser_agent, count(browser_agent) as 'count'
from
(
select distinct guid, browser_agent
from tblMyGlossary
where browser_agent is not null
) a
group by browser_agent
order by 'count' desc;