I'm trying to find IP addresses that match a range of hosts (172.24.12.???), but none of the following queries work:
select * from pg_catalog.pg_stat_activity
--where client_addr <> E'(?|172\.24\.12\.)'::inet;
--where client_addr <> E'(://|^)172\\.24\\.12\\.[0-9]'::inet
I'm getting two different errors.
SQL Error [22P02]: ERROR: invalid input syntax for type inet: "(?|172.24.12.)" and
SQL Error [22P02]: ERROR: invalid input syntax for type inet: "(^)172.24.12.[0-9]"
What am I doing wrong here? Thanks!
PostgreSQL has native network address types and operators, so you don't need string manipulation as a workaround. The << operator tests whether an address is contained in a subnet:
WHERE client_addr << '172.24.12/24'
Demo code:
WITH fake_pg_stat_activity (client_addr) AS (
SELECT inet '172.24.12.20'
UNION ALL SELECT inet '192.168.0.1'
)
SELECT *, CASE WHEN client_addr << '172.24.12/24' THEN TRUE ELSE FALSE END AS belongs_to_subnet
FROM fake_pg_stat_activity;
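If it helps to see what << is checking, the same containment test can be sketched outside the database with Python's standard ipaddress module. This is only an illustration (in_subnet is a hypothetical helper name); note that Python needs the full four-octet network 172.24.12.0/24 rather than PostgreSQL's abbreviated 172.24.12/24.

```python
import ipaddress


def in_subnet(addr, network):
    """Return True if addr lies inside network, similar to PostgreSQL's inet << cidr."""
    # strict=False accepts a network written with host bits set, e.g. 172.24.12.5/24
    return ipaddress.ip_address(addr) in ipaddress.ip_network(network, strict=False)


# Same two sample addresses as the fake_pg_stat_activity CTE
for ip in ["172.24.12.20", "192.168.0.1"]:
    print(ip, in_subnet(ip, "172.24.12.0/24"))
```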
To answer this, I did the following (all of the code below was tested on a fiddle):
CREATE TABLE test
(
IP INET
);
and
INSERT INTO test VALUES
('134.34.34.34'::INET),
('172.24.12.20'::INET);
Now, you appear to be treating your IP addresses as strings. This is not the best idea: it's always best to use the appropriate data type (which gives you operators, comparisons, sorting, indexing, and automatic enforcement of valid values).
As pointed out by @ÁlvaroGonzález, this works nicely with IP addresses:
SELECT
*
FROM test
WHERE ip <<= '172.24.12/24'::INET;
Result:
ip
172.24.12.20
If you really do have to work with strings, use the PostgreSQL cast operator (::) to convert the addresses to text:
SELECT
ip
FROM test
WHERE ip::TEXT ~ '172\.24\.12\.[0-2]{1}[0-9]{1,2}'
Result:
ip
172.24.12.20
The regex above isn't perfect (it misses addresses such as 172.24.12.5 or 172.24.12.50), and you could spend all day searching for better regexes. For example, this:
SELECT
ip
FROM test
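WHERE ip::TEXT ~ '^172\.24\.12\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(/[0-9]+)?$';
will also work and is more thorough, because it is anchored with ^ and $ (the optional /nn suffix is there because ip::TEXT includes the netmask, e.g. 172.24.12.20/32). It's up to you to choose which regex covers your needs.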
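Anchoring matters for patterns like this one: without ^ and $, the [0-9] branch can match just the first digit of a longer run, so the alternation alone does not reject invalid octets. A quick check using Python's re module (the test strings are made up, and Python's engine stands in for PostgreSQL's ~ here; for whether this simple pattern matches at all, the two should behave the same):

```python
import re

# One octet, 0-255, as in the PostgreSQL pattern
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"

unanchored = re.compile(r"172\.24\.12\." + OCTET)
# ^ ... $ force the whole text to match; (/[0-9]+)? allows the /32 that ip::TEXT adds
anchored = re.compile(r"^172\.24\.12\." + OCTET + r"(/[0-9]+)?$")

for text in ["172.24.12.20/32", "172.24.12.5/32", "172.24.12.999/32"]:
    print(text, bool(unanchored.search(text)), bool(anchored.search(text)))
```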
BigQuery provides an updated GeoIP2 public dataset [bigquery-public-data -> geolite2 -> ipv4_city_blocks], which contains a network column with IPv4 CIDR values.
How do I convert the CIDR values in the network column into start and end IP address values using BigQuery SQL (not a utility outside BigQuery), so that I can check whether an IP address is within a range? It would be helpful if you could provide the query to obtain the IP range for a CIDR value in the table.
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION cidrToRange(CIDR STRING)
RETURNS STRUCT<start_IP STRING, end_IP STRING>
LANGUAGE js AS """
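// Network address: everything before the '/'
var beg = CIDR.substr(0, CIDR.indexOf('/'));
// Offset from the first to the last address in the block: 2^(32 - prefix) - 1
var off = (1<<(32-parseInt(CIDR.substr(CIDR.indexOf('/')+1))))-1;
var sub = beg.split('.').map(function(a){return parseInt(a)});
var buf = new ArrayBuffer(4);
var i32 = new Uint32Array(buf);
// Storing into a Uint32Array wraps the sum modulo 2^32, so sign overflow from << is harmless
i32[0] = (sub[0]<<24) + (sub[1]<<16) + (sub[2]<<8) + (sub[3]) + off;
// Read the bytes back (little-endian on typical hosts) and reverse them into dotted-quad order
var end = Array.apply([],new Uint8Array(buf)).reverse().join('.');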
return {start_IP: beg, end_IP: end};
""";
SELECT network, IP_range.*
FROM `bigquery-public-data.geolite2.ipv4_city_blocks`,
UNNEST([cidrToRange(network)]) IP_range
It took about 60 seconds to process all 3,037,858 rows.
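For reference, here is a minimal sketch of the same CIDR-to-range conversion in Python, using the standard ipaddress module instead of the bit arithmetic in the UDF. Unlike the UDF, IPv4Network(..., strict=False) also clears any host bits in the input; the function names are hypothetical. The integer form is what makes a simple start <= ip <= end range check possible:

```python
import ipaddress


def cidr_to_range(cidr):
    """Return (start_ip, end_ip) as dotted strings for an IPv4 CIDR block."""
    # strict=False masks off any host bits, e.g. 1.0.0.7/24 -> 1.0.0.0/24
    net = ipaddress.IPv4Network(cidr, strict=False)
    return str(net.network_address), str(net.broadcast_address)


def cidr_to_int_range(cidr):
    """Same range as integers, which makes range checks simple comparisons."""
    net = ipaddress.IPv4Network(cidr, strict=False)
    return int(net.network_address), int(net.broadcast_address)


def ip_in_cidr(ip, cidr):
    """True if ip falls between the first and last address of the block."""
    start, end = cidr_to_int_range(cidr)
    return start <= int(ipaddress.IPv4Address(ip)) <= end


print(cidr_to_range("172.24.12.0/22"))
print(ip_in_cidr("172.24.13.9", "172.24.12.0/22"))
```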
This query will do the job:
# replace with your source of IP addresses
# here I'm using the same Wikipedia set from the previous article
WITH source_of_ip_addresses AS (
SELECT REGEXP_REPLACE(contributor_ip, 'xxx', '0') ip, COUNT(*) c
FROM `publicdata.samples.wikipedia`
WHERE contributor_ip IS NOT null
GROUP BY 1
)
SELECT city_name, SUM(c) c, ST_GeogPoint(AVG(longitude), AVG(latitude)) point
FROM (
SELECT ip, city_name, c, latitude, longitude, geoname_id
FROM (
SELECT *, NET.SAFE_IP_FROM_STRING(ip) & NET.IP_NET_MASK(4, mask) network_bin
FROM source_of_ip_addresses, UNNEST(GENERATE_ARRAY(9,32)) mask
WHERE BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(ip)) = 4
)
JOIN `fh-bigquery.geocode.201806_geolite2_city_ipv4_locs`
USING (network_bin, mask)
)
WHERE city_name IS NOT null
GROUP BY city_name, geoname_id
ORDER BY c DESC
LIMIT 5000
Find more details on:
https://towardsdatascience.com/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds-e9e652480bd2
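As I read the query above, the trick is to avoid a range join: for every IP it tries each prefix length from 9 to 32, masks the address with NET.IP_NET_MASK, and then does an equality join on (network_bin, mask) against the locations table. A minimal Python sketch of the same idea, with a tiny made-up LOCS table standing in for fh-bigquery.geocode.201806_geolite2_city_ipv4_locs (the city names and helper names are hypothetical, and integers stand in for BigQuery's BYTES values):

```python
import ipaddress

# Hypothetical stand-in for the locations table: (network as int, mask) -> city
LOCS = {
    (int(ipaddress.IPv4Address("1.0.0.0")), 24): "City A",
    (int(ipaddress.IPv4Address("10.1.0.0")), 16): "City B",
}


def net_mask(mask):
    """Integer form of an IPv4 netmask with `mask` leading one bits."""
    return (0xFFFFFFFF << (32 - mask)) & 0xFFFFFFFF


def lookup(ip):
    """Try every prefix length 9..32 and keep the (network, mask) pairs that exist."""
    ip_int = int(ipaddress.IPv4Address(ip))
    hits = []
    for mask in range(9, 33):                   # mirrors UNNEST(GENERATE_ARRAY(9, 32))
        key = (ip_int & net_mask(mask), mask)   # mirrors ip & NET.IP_NET_MASK(4, mask)
        if key in LOCS:                         # mirrors JOIN ... USING (network_bin, mask)
            hits.append(LOCS[key])
    return hits


print(lookup("1.0.0.7"), lookup("10.1.2.3"), lookup("8.8.8.8"))
```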
The first thing to check is whether such a function already exists; see the BigQuery Functions and Operators documentation.
If it doesn't, you need to use Standard SQL user-defined functions (UDFs), which let you create a function using another SQL expression or another programming language, such as JavaScript.
Keep in mind that when you use a JavaScript UDF, BigQuery initializes a JavaScript environment with the function's contents on every shard of execution. There is no optimization to avoid loading the environment, so it can slow down the query.
According to the GeoIP2 City and Country CSV Databases site, there is a utility to convert the 'network' column to start/end IPs or start/end integers. Refer to its GitHub site for details.
January 2023 solution
Just wanted to respond to Felipe's comment here. I'm not sure why he is suggesting an alternate solution using Snowflake, as his existing solution works just fine. The only difference is that you need to create the dataset yourself.
I managed to solve this by going through the exact same steps listed in Felipe's very helpful original blog article:
Sign up for MaxMind and download the GeoLite2 databases.
Download the two CSV files GeoLite2-City-Blocks-IPv4.csv and GeoLite2-City-Locations-en.csv, upload them to a GCP bucket, and create tables from them. I lazily used BigQuery's schema auto-detection and it worked just fine :)
Create a geolite2_locs table using a query similar to the one below (keep or drop columns as required for your use case):
CREATE OR REPLACE TABLE `dataset.geolite2_locs` OPTIONS() AS (
SELECT
ip_ref.network,
NET.IP_FROM_STRING(REGEXP_EXTRACT(ip_ref.network, r'(.*)/' )) network_bin,
CAST(REGEXP_EXTRACT(ip_ref.network, r'/(.*)' ) AS INT64) mask,
ip_ref.geoname_id,
city_ref.continent_name as continent_name,
city_ref.country_name as country_name,
city_ref.city_name as city_name,
city_ref.subdivision_1_name as subdivision_1_name,
city_ref.subdivision_2_name as subdivision_2_name,
ip_ref.latitude as latitude,
ip_ref.longitude as longitude,
FROM `geolite2`.`geolite2-ipv4` ip_ref LEFT JOIN `geolite2`.`geolite2-city-en` city_ref USING (geoname_id)
);
Adapt the query in Felipe's guide, or just replace fh-bigquery.geocode.201806_geolite2_city_ipv4_locs in his answer above with your new table.
It should take you an hour at most to get this going. Hope it helps.
I have a log message from which I need to extract columns with a SQL query.
This is what the message looks like:
"device=EOHCS-ZA-JIS-FW severity=high from=EOHCloudFAZ(FL1KVM0000005594) trigger=Syslog Critical System Alerts log="logver=54 itime=1528457940 devid=FG1K5D3I13800425 devname=FWJIS01 vd=95_LHC date=2018-06-08 time=13:34:55 logid=0100044546 type=event subtype=system level=information logdesc="Attribute configured" user="JoshuaK" ui="ha_daemon" action=Edit cfgtid=701760128 cfgpath="system.settings" cfgattr="gui-allow-unnamed-policy[disable->enable]" msg="Edit system.settings """
Can someone give me an idea?
Here is a solution for SQL Server: you can use PATINDEX to locate the keys and extract their values from the log message.
Below is the code to extract the from value:
declare @input nVarchar(max),@from nVarchar(MAX)
declare @FromStart int,@FromEnd int
set @input='device=EOHCS-ZA-JIS-FW severity=high from=EOHCloudFAZ(FL1KVM0000005594) trigger=Syslog Critical System Alerts log="logver=54 itime=1528457940 devid=FG1K5D3I13800425 devname=FWJIS01 vd=95_LHC date=2018-06-08 time=13:34:55 logid=0100044546 type=event subtype=system level=information logdesc="Attribute configured" user="JoshuaK" ui="ha_daemon" action=Edit cfgtid=701760128 cfgpath="system.settings" cfgattr="gui-allow-unnamed-policy[disable->enable]" msg="Edit system.settings';
-- The value starts right after "from=" (5 characters)
SET @FromStart=PATINDEX('%from=%',@input)+5;
-- Length of the value: up to the space that precedes "trigger="
SET @FromEnd=PATINDEX('% trigger=%',@input)-@FromStart;
SELECT @from=SUBSTRING(@input,@FromStart,@FromEnd)
SELECT @from
Note: use the equivalent of PATINDEX for your database server. Also note that this works only if the parameters in the input string always appear in the same order.
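If the keys are not always in the same order, one alternative (a sketch in Python rather than SQL, assuming every field has the form key=value and that a value runs until the next whitespace-separated key= token) is to split the message into a key/value map with a regular expression. The parse_fields name is hypothetical, and the nested log="..." section simply comes out flattened into top-level keys:

```python
import re

# A key is a word followed by "="; its value runs lazily until the next
# whitespace-separated "key=" token or the end of the string.
FIELD_RE = re.compile(r'(\w+)=(.*?)(?=\s+\w+=|$)')

MESSAGE = (
    'device=EOHCS-ZA-JIS-FW severity=high from=EOHCloudFAZ(FL1KVM0000005594) '
    'trigger=Syslog Critical System Alerts log="logver=54 itime=1528457940 '
    'devid=FG1K5D3I13800425 devname=FWJIS01 vd=95_LHC date=2018-06-08 '
    'time=13:34:55 logid=0100044546 type=event subtype=system level=information '
    'logdesc="Attribute configured" user="JoshuaK" ui="ha_daemon" action=Edit '
    'cfgtid=701760128 cfgpath="system.settings" '
    'cfgattr="gui-allow-unnamed-policy[disable->enable]" msg="Edit system.settings'
)


def parse_fields(message):
    """Return a dict of key -> value; the first occurrence of a key wins."""
    fields = {}
    for key, value in FIELD_RE.findall(message):
        # Drop surrounding double quotes from quoted values such as user="JoshuaK"
        fields.setdefault(key, value.strip('"'))
    return fields


fields = parse_fields(MESSAGE)
print(fields["from"], "|", fields["trigger"], "|", fields["user"])
```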
I have the following table:
EstimatedCurrentRevenue -- Revenue column value of the current day
EstimatedPreviousRevenue -- Revenue column value of yesterday
crmId
OwnerId
PercentageChange.
I am querying two snapshots of similarly structured data in Azure Data Lake and trying to compute the percentage change in revenue.
Here is my query; I am joining on OpportunityId to get the difference between the revenue values:
@opportunityRevenueData = SELECT (((opty.EstimatedCurrentRevenue - optyPrevious.EstimatedPreviousRevenue)*100)/opty.EstimatedCurrentRevenue) AS PercentageRevenueChange, optyPrevious.EstimatedPreviousRevenue,
opty.EstimatedCurrentRevenue, opty.crmId, opty.OwnerId From @opportunityCurrentData AS opty JOIN @opportunityPreviousData AS optyPrevious on opty.OpportunityId == optyPrevious.OpportunityId;
But I get the following error:
E_CSC_USER_SYNTAXERROR: syntax error. Expected one of: AS EXCEPT FROM
GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE ';' ')'
','
at token 'From', line 40
near the ###:
I know this expression is causing the problem, but I'm not sure how to fix it:
(((opty.EstimatedCurrentRevenue - optyPrevious.EstimatedPreviousRevenue)*100)/opty.EstimatedCurrentRevenue)
Please help; I am completely new to U-SQL.
U-SQL is case-sensitive, and all SQL reserved words must be in UPPER CASE. So you should capitalise the FROM and ON keywords in your statement, like this:
@opportunityRevenueData =
SELECT (((opty.EstimatedCurrentRevenue - optyPrevious.EstimatedPreviousRevenue) * 100) / opty.EstimatedCurrentRevenue) AS PercentageRevenueChange,
optyPrevious.EstimatedPreviousRevenue,
opty.EstimatedCurrentRevenue,
opty.crmId,
opty.OwnerId
FROM @opportunityCurrentData AS opty
JOIN
@opportunityPreviousData AS optyPrevious
ON opty.OpportunityId == optyPrevious.OpportunityId;
Also, if you are completely new to U-SQL, you should consider working through some tutorials to establish the basics of the language, including case-sensitivity. Start at http://usql.io/.
This same crazy-sounding error message can occur for (almost?) any U-SQL syntax error. The answer above was clearly correct for the provided code.
However, since many people will probably land on this page after searching for 'AS EXCEPT FROM GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE', I'd say the best way to handle these errors is to look closely at the snippet of your code that the error message has marked with '###'.
For example, I got to this page after getting a syntax error in a long query, and it turned out I didn't have a casing issue, just a malformed query with parentheses around the wrong thing. Once I looked more closely at where the ### marker was in the snippet, the error became clear.
I have a table in which there is a column of NVARCHAR2 datatype which holds a string.
The string contains some email IDs, which I need to extract as a comma-separated list.
Below is the test data:
create table nvarchar2_email (email_reject nvarchar2(1000));
insert into nvarchar2_email values ('com.wm.app.b2b.server.ServiceException: javax.mail.SendFailedException: Invalid Addresses; nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <manoj.dalai@gmail.com>: Recipient address rejected: User unknown in virtual alias table;
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <santoshi.k@gmail.com>: Recipient address rejected: User unknown in virtual alias table
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <biswajit-kumar.p@gmail.com>: Recipient address rejected: User unknown in virtual alias table');
insert into nvarchar2_email values ('com.wm.app.b2b.server.ServiceException: javax.mail.SendFailedException: Invalid Addresses; nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <manoj.dalai@gmail.com>: Recipient address rejected: User unknown in virtual alias table;
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <santoshi.k@gmail.com>: Recipient address rejected: User unknown in virtual alias table');
I am trying to use the SQL below, but it repeats the email IDs!
select email_reject, listagg(REGEXP_substr (email_reject,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}', 1,level), ',') within group (order by email_reject) invalid_email
from nvarchar2_email
connect by level <= REGEXP_count (email_reject,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}')
group by EMAIL_REJECT
The required output is:
manoj.dalai@gmail.com,santoshi.k@gmail.com,biswajit-kumar.p@gmail.com
The number of email addresses can vary from row to row.
My database is:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
select (select listagg (regexp_substr(cast(e.email_reject as varchar2(1000)),'<(.*?@.*?)>',1,level,'',1),',')
within group (order by e.email_reject)
from dual
connect by level <= regexp_count (e.email_reject,'<.*?@.*?>')
) as emails
from nvarchar2_email e
;
P.S.
There seems to be an issue with regexp_substr and NVARCHAR2 that causes each character in the result to be preceded by \0, which is why the query casts email_reject to VARCHAR2 before calling regexp_substr.
Tested on Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
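Most likely, the repeats in the original query come from CONNECT BY LEVEL running across all rows of the table at once, so rows get combined with each other; the correlated subquery on dual above runs the level generator separately for each row. The per-row logic is roughly the following Python sketch (the sample rows are shortened versions of the test data, written with the usual @ sign, and rejected_emails is a hypothetical name):

```python
import re

# Like the Oracle pattern '<(.*?@.*?)>', but [^<>] keeps a match inside one pair of brackets
EMAIL_IN_BRACKETS = re.compile(r"<([^<>]*?@[^<>]*?)>")


def rejected_emails(text):
    """Comma-separated addresses found in ONE row, in order of appearance."""
    return ",".join(EMAIL_IN_BRACKETS.findall(text))


rows = [
    "550 5.1.1 <manoj.dalai@gmail.com>: Recipient address rejected; "
    "550 5.1.1 <santoshi.k@gmail.com>: Recipient address rejected; "
    "550 5.1.1 <biswajit-kumar.p@gmail.com>: Recipient address rejected",
    "550 5.1.1 <manoj.dalai@gmail.com>: Recipient address rejected; "
    "550 5.1.1 <santoshi.k@gmail.com>: Recipient address rejected",
]

# Process each row on its own, just as the correlated subquery does
for row in rows:
    print(rejected_emails(row))
```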
Judging by your example, the e-mail address always appears as <aaaa@bbbb>: a <, a string with an @ in the middle, and a >.
You could try something like this (I can't check the syntax, so you may need to test it):
SUBSTR(<input string>,
INSTR(<input string>, '<') + 1,
INSTR(<input string>, '>') - INSTR(<input string>, '<') - 1
)
This will yield the FIRST e-mail address within the string. You can apply the same idea in a loop (each time passing the remainder of the string after the address you just extracted) to pull out the other addresses in the same string.
I can't see a way to do this through a single "SELECT" statement, because each string may contain several addresses and not all strings contain the same number of them.
One option to investigate is to implement a recursive select (Oracle supports this), but it will be much more complex.
Personally, I would go with the approach suggested above.
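The SUBSTR/INSTR loop described above could look like this, sketched in Python rather than PL/SQL (extract_bracketed is a hypothetical name; Python's str.find is 0-based and returns -1, where INSTR is 1-based and returns 0):

```python
def extract_bracketed(text):
    """Repeatedly take the text between the next '<' and '>' and continue after it."""
    found = []
    rest = text
    while True:
        start = rest.find("<")             # like INSTR(rest, '<')
        if start == -1:
            break
        end = rest.find(">", start + 1)    # first '>' after that '<'
        if end == -1:
            break
        found.append(rest[start + 1:end])  # like SUBSTR(rest, lt + 1, gt - lt - 1)
        rest = rest[end + 1:]              # continue with the unprocessed remainder
    return found


print(",".join(extract_bracketed("a <x@y.com> b <p@q.org> c")))
```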
How can I capture the name of the (linked) server (in this case Morpheus) as a column of the result? I do not want to hard-code the server name in the query itself.
exec("
select
COMNO
,T$CPLS ""Catalog""
,T$CUNO ""Customer ID.""
,T$CPGS ""Price Group""
,T$ITEM ""Item Code""
,T$UPCD UPC
,T$DSCA ""Description""
,T$WGHT ""Weight""
,T$SHIP ""Shipping Indicator""
,nvl(T$STDT,to_char(sysdate,'YYYY-MM-DD')) ""From""
,nvl(case T$TDAT
when '4712-01-01' then ' '
when null then ' '
else t$tdat
end,' ') ""To""
,nvl(t$qanp,99999999) ""Qty.""
,T$PRIC ""List Price""
,T$DISC ""Discount""
,to_char(round(t$pric * (1-t$disc/100),2),99999.99) ""Net""
,Source ""Source""
from Table(edi.ftCompositCatalog(?,?,?)) --where trim(t$item)='105188-041'
order by Source,t$cpgs,t$item",'010','145','000164') at morpheus
If, when running your query, you already know what linked server you are pointing to, then just include that as a string literal in your result:
exec("
select
'morpheus' ""Server Name""
,T$CPLS ""CATALOG""
...
Even if the linked server name is being stored in a variable, you can do this easily since you're building your query string dynamically.
If, as you say, you don't want to define it as a string literal, here is a standard way to get the host (server) name in Oracle:
SELECT SYS_CONTEXT ('USERENV', 'SERVER_HOST') FROM DUAL;
I think it would work if you embed this as a scalar subquery or inline view in your query.
*Please note that some organizations and DBAs do not want you to know anything about the back-end environment for security reasons, but assuming you have no roadblock there, this should work.