I'm having trouble parsing URLs in Postgres. I have a database full of customers and URLs associated with them. I need an array of the unique domains associated with each customer. I'd love to be able to do the parsing in my query instead of dumping my results to Python and parsing it there.
In the Postgres docs I found this, but can't figure out how to incorporate it into my query:
SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.html');
alias | description | token
----------+---------------+------------------------------
protocol | Protocol head | http://
url | URL | example.com/stuff/index.html
host | Host | example.com
url_path | URL path | /stuff/index.html
(http://www.postgresql.org/docs/9.3/static/textsearch-parsers.html)
I'm starting with a table, like this:
customer_id | url
-------------+--------------------
000001 | www.example.com/fish
000001 | www.example.com/potato
000001 | www.potato.com/artichoke
000002 | www.otherexample.com
My code so far:
SELECT customer_id, array_agg(url) AS unique_domains
FROM customer_url_table
GROUP BY customer_id;
Which gives me:
customer_id | unique_domains
-----------------------------
000001 | {www.example.com/fish, www.example.com/potato, www.potato.com/artichoke}
000002 | {www.otherexample.com}
I want a table like this:
customer_id | unique_domains
-----------------------------
000001 | {example.com, potato.com}
000002 | {otherexample.com}
Working on a PostgreSQL 9.3.3 database that lives on AWS.
The document you linked above is for use with a Postgres text search parser. That requires a separate configuration to set up, and may be more overhead and/or a different sort of thing than you are looking for.
If you do want to go that route, you can find more info on setting up a text search parser here:
http://www.postgresql.org/docs/9.3/static/sql-createtsconfig.html
However, if you want to do the parsing inline in Postgres, I would recommend using a procedural Postgres language, where you can import parsing libraries in that language.
You mentioned Python, so you could use PL/Python and a URL parsing library such as urlparse (called urllib.parse in Python 3).
The urlparse documentation includes this example code:
>>> from urlparse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
Going beyond that example, you can get the hostname with the hostname attribute:
>>> print o.hostname
www.cwi.nl
If you want to properly parse out just the domain name, minus the www and any other assorted parts that may be there (there are lots of edge cases and variants), an approach such as the one in this answer would be best.
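For illustration, one such option is the third-party tldextract package (an assumption on my part, not something from the Postgres docs), which knows the public-suffix edge cases:
>>> import tldextract  # pip install tldextract
>>> ext = tldextract.extract('http://www.example.com/fish')
>>> ext.domain + '.' + ext.suffix
'example.com'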
For more information about setting up PL/Python, you can go here:
http://www.postgresql.org/docs/9.3/static/plpython.html
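Putting the pieces together, a minimal sketch might look like this (assuming the plpythonu extension is available on your instance, which some managed hosts disallow, and that stripping a leading www. is enough for your data; the function name is mine):
CREATE EXTENSION IF NOT EXISTS plpythonu;

CREATE OR REPLACE FUNCTION extract_domain(url text)
RETURNS text AS $$
    from urlparse import urlparse
    # copy the argument before modifying it (PL/Python passes args as globals)
    u = url
    # urlparse only recognizes the host when a scheme is present,
    # so prepend one for bare URLs like www.example.com/fish
    if '//' not in u:
        u = 'http://' + u
    host = urlparse(u).hostname or ''
    # naive www-stripping; a public-suffix-aware library handles the edge cases
    if host.startswith('www.'):
        host = host[4:]
    return host
$$ LANGUAGE plpythonu IMMUTABLE STRICT;

SELECT customer_id,
       array_agg(DISTINCT extract_domain(url)) AS unique_domains
FROM customer_url_table
GROUP BY customer_id;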
So, that's how you could do the parsing in Postgres instead of "dumping my results to Python and parsing it there."
It ends up coming a bit full circle with PL/Python, but if you really want to do the parsing within SQL (especially for performance reasons, say, across a large data set), it may be worth the extra effort.
I'm reading an XML file and building it dynamically using a table.
Scenario: Create subscriber base
* table data
|PaidMode|BillCycleCredit|BillCycleType|PrimaryOffer|SubscriberNo |
|'0' |'100000' |'1' |'500033' |'#(postPaidSubscriber1)'|
|'0' |'100001' |'1' |'500035' |'#(postPaidSubscriber2)'|
* def Request = read('newSubscriber.xml')
And request Request
* print Request
My XML file (excerpt):
<NewSubscriberRequest>
<new1:Customer>
<shar:FirstName>#(FirstName)</shar:FirstName>
<shar:LastName>#(LastName)</shar:LastName>
<shar:LangType>#(LangType)</shar:LangType>
</new1:Customer>
<new1:Account>
<shar:PaidMode>#(PaidMode)</shar:PaidMode>
<shar:BillCycleCredit>#(BillCycleCredit)</shar:BillCycleCredit>
<shar:CreditCtrlMode>#(CreditCtrlMode)</shar:CreditCtrlMode>
<new1:BillCycleType>#(BillCycleType)</new1:BillCycleType>
</new1:Account>
I've had a look at this solution, How to use dynamic values for Karate Features; he seems to want to do the same thing, but it doesn't work for me. I get the below error:
feature call loop failed at index: 0, match failed: EQUALS
$ | not equal | match failed for name: 'SubscriberNo' (MAP:MAP)
{"PaidMode":"0","BillCycleCredit":"100000","BillCycleType":"1","PrimaryOffer":"500033","SubscriberNo":"#(postPaidSubscriber1)"}
{"PaidMode":"0","BillCycleCredit":"100000","BillCycleType":"1","PrimaryOffer":"500033","SubscriberNo":"#(postPaidSubscriber1)"}
$.SubscriberNo | not equal (STRING:STRING)
'#(postPaidSubscriber1)'
'699111115'
I'm now here https://github.com/karatelabs/karate/blob/v1.2.0/karate-junit4/src/test/java/com/intuit/karate/junit4/xml/xml.feature again, trying to figure out how best to do this. This question is similar to my previous question, How to set a variable in the xml per scenario; I just want to read from a table to create my subscriber base, basically for readability and usage reasons. Sorry Peter, I see you get a lot of these types of questions (the second from me alone, lol). I just need a little nudge, I think.
I really think you don't need to do this in a table: '#(postPaidSubscriber1)' - just use the variable name as-is. Refer to the docs: https://github.com/karatelabs/karate#table
* table data
|PaidMode|BillCycleCredit|BillCycleType|PrimaryOffer|SubscriberNo |
|0 |100000 |1 |500033 |postPaidSubscriber1|
|0 |100001 |1 |500035 |postPaidSubscriber2|
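For completeness, a rough sketch of how the table could then drive the request once per row; this assumes the request steps live in a called feature and that the subscriber variables are already defined (the feature file name here is mine):
* def postPaidSubscriber1 = '699111115'
* def postPaidSubscriber2 = '699111116'
# with the table above, call the feature once per row:
* def result = call read('newSubscriber.feature') data
Each row's columns become variables inside the called feature, so a read('newSubscriber.xml') there will resolve the embedded expressions such as #(PaidMode) and #(SubscriberNo).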
I'm trying to write a query that returns the vulnerabilities found by "Built-in Qualys vulnerability assessment" in Log Analytics.
It was all going smoothly: I was getting the values from the properties JSON and turning them into separate strings, but I found out that some of the fields hold more than one value, and I need to get all of them in a single cell.
My query is like this right now:
securityresources | where type =~ "microsoft.security/assessments/subassessments"
| extend assessmentKey=extract(@"(?i)providers/Microsoft.Security/assessments/([^/]*)", 1, id), IdAzure=tostring(properties.id)
| extend IdRecurso = tostring(properties.resourceDetails.id)
| extend NomeVulnerabilidade=tostring(properties.displayName),
Correcao=tostring(properties.remediation),
Categoria=tostring(properties.category),
Impacto=tostring(properties.impact),
Ameaca=tostring(properties.additionalData.threat),
severidade=tostring(properties.status.severity),
status=tostring(properties.status.code),
Referencia=tostring(properties.additionalData.vendorReferences[0].link),
CVE=tostring(properties.additionalData.cve[0].link)
| where assessmentKey == "1195afff-c881-495e-9bc5-1486211ae03f"
| where status == "Unhealthy"
| project IdRecurso, IdAzure, NomeVulnerabilidade, severidade, Categoria, CVE, Referencia, status, Impacto, Ameaca, Correcao
Ignore the awkward names of the columns, for they are in Portuguese.
As you can see in the "Referencia" and "CVE" columns, I'm able to extract the value at a specific index of the array, but I want all the links in the whole array.
Without sample input and expected output it's hard to understand what you need, so trying to guess here...
I think that summarize make_list(...) by ... will help you (see the make_list documentation to learn how to use it).
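For example, a rough sketch for the Referencia column (a guess, untested; it assumes properties.additionalData.vendorReferences is an array of objects with a link field, which the [0].link access in your query suggests):
securityresources
| where type =~ "microsoft.security/assessments/subassessments"
| extend assessmentKey = extract(@"(?i)providers/Microsoft.Security/assessments/([^/]*)", 1, id)
| where assessmentKey == "1195afff-c881-495e-9bc5-1486211ae03f"
| mv-expand ref = properties.additionalData.vendorReferences
| summarize Referencia = make_list(tostring(ref.link)) by IdAzure = tostring(properties.id)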
If this is not what you're looking for, please delete the question and post a new one with minimal sample input (using the datatable operator) and expected output, and we'll gladly help.
I have a list of usernames that I have to monitor and the list is growing every day. I read Splunk documentation and it seems like lookup is the best way to handle this situation.
The goal is for my query to leverage the lookup and print out all the download events from the users in the list.
Sample logs
index=proxy123 activity="download"
{
"machine":"1.1.1.1",
"username":"ABC@xyz.com",
"activity":"download"
}
{
"machine":"2.2.2.2",
"username":"ASDF@xyz.com",
"activity":"download"
}
{
"machine":"3.3.3.3",
"username":"GGG@xyz.com",
"activity":"download"
}
Sample Lookup (username.csv)
users
ABC@xyz.com
ASDF@xyz.com
BBB@xyz.com
Current query:
index=proxy123 activity="download" | lookup username.csv users OUTPUT users | where not isnull(users)
Result: 0 (which is not correct)
I probably don't understand lookup correctly. Can someone correct me and teach me the correct way?
In the lookup file, the name of the field is users, whereas in the event, it is username. Fortunately, the lookup command has a mechanism for renaming the fields during the lookup. Try the following
index=proxy123 activity="download" | lookup username.csv users AS username OUTPUT users | where isnotnull(users)
Now, depending on the volume of data you have in your index and how much data is being discarded when not matching a username in the CSV, there may be alternate approaches you can try, for example, this one using a subsearch.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS username | return 999 username ]
What happens here in the subsearch (the bit in the []) is that the subsearch is expanded first; in this case, to (username="ABC@xyz.com" OR username="ASDF@xyz.com" OR username="BBB@xyz.com"). (The 999 raises return's default row limit of 1 so that every user in the lookup is included.) So your main search will turn into
index=proxy123 activity="download" (username="ABC@xyz.com" OR username="ASDF@xyz.com" OR username="BBB@xyz.com")
which may be more efficient than returning all the data in the index, then discarding anything that doesn't match the list of users.
This approach assumes that you have the username field extracted in the first place. If you don't, you can try the following.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS search | format ]
This expanded search will be
index=proxy123 activity="download" ( "ABC@xyz.com" OR "ASDF@xyz.com" OR "BBB@xyz.com" )
which may be more suitable to your data.
I have a situation where I need multiple scenarios within the same feature file, and I need them to share the data table so that the user does not have to enter the same test data in all the relevant data tables in that feature.
Eg:
Feature: ABC
Scenario Outline: 1
<<Steps of Scenario>>
Enter the data here:
|fieldNickName|fieldValue|
|ABC | <AAA> |
<<Steps of Scenario>>
Examples:
|AAA|
|111|
Scenario Outline: 2
<<Steps of Scenario>>
Enter the data here:
|fieldNickName|fieldValue|
|ABC | <AAA> |
|DEF | <BBB> |
<<Steps of Scenario>>
|HIJ | <CCC> |
<<Steps of Scenario>>
Examples:
|AAA|BBB|CCC|
|111|232|AJ|
Here, as you can see, "ABC" is a shared parameter between both scenarios, and the AAA column (value 111) feeds it in each. Is there a way I can have a "COMMON" Examples section for a Feature which can feed all the scenarios in it?
The way to do this is to take the examples out of the feature and push them down into the step definitions. I could explain this in greater detail if you provided the actual scenarios with their steps and explained the business context behind them.
Your cuking can be much simpler if you avoid using examples and outlines. There really is no need to make things so complicated. Scenarios should be clear, simple and descriptive. They should talk about WHAT you are doing, not HOW it is done.
I do not think there is a way to have common Examples parameters. I am not sure about your scenario, but if you are using the same step with the same data in all the scenarios, you can make them part of a Background.
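A minimal sketch of the Background idea, reusing the values from your Examples tables (the step wording is illustrative, not from your feature):
Feature: ABC
Background:
Given the shared data is entered:
|fieldNickName|fieldValue|
|ABC | 111 |
Scenario: 1
<<rest of the steps of Scenario 1>>
Scenario: 2
When the additional data is entered:
|fieldNickName|fieldValue|
|DEF | 232 |
|HIJ | AJ |
<<rest of the steps of Scenario 2>>
The Background runs before each scenario, so the shared row is written once in the feature but applies to both.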
I have a list of IDs and values in a file and would like to use Postman to connect to an API and update the records found in this file.
I tried with the Runner but am stuck on the syntax.
The answer is pretty simple and well explained in the Postman documentation on working with data files.
You can start with a basic PUT/POST: try to modify one single data set with static values to determine how the final query needs to be built. In my case the API accepted only raw JSON-formatted payloads.
As soon as you have your static Postman query running, you can start automating it by determining which parts should be replaced. This data should come from a data file (JSON or CSV); the schema is important for Postman to understand the data. As a reference, the example below replaces an ID and a value. My data document has one more column, which is not a problem.
+--------+--------+--------+
| id | email | value |
+--------+--------+--------+
| data 1 | data 1 | data 1 |
+--------+--------+--------+
| data 2 | data 2 | data 2 |
+--------+--------+--------+
| data 3 | data 3 | data 3 |
+--------+--------+--------+
Column two (email) will simply be ignored. Notice how "id" and "value" are written in the header.
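If you prefer JSON over CSV, the equivalent data file is an array of objects whose keys match the header names:
[
    { "id": "data 1", "email": "data 1", "value": "data 1" },
    { "id": "data 2", "email": "data 2", "value": "data 2" },
    { "id": "data 3", "email": "data 3", "value": "data 3" }
]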
I would like to replace the ID, which needs to be attached to the API endpoint, and to update a value within the dataset of that ID. Replacing the static parts with variables like {{variable}} lets Postman know that it needs to fill in dynamic data there.
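For illustration, the parameterized request could then look like this (the endpoint and body field are hypothetical, not from any real API):
PUT https://api.example.com/records/{{id}}
{
    "value": "{{value}}"
}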
Notice that Postman flags the variable attached to the URL as not defined in the environment; if you did not set it up in the environment, this is expected, and it will still work with data files.
I used simple tests to confirm that the data from the file made it into my query:
tests["URL has ID"] = request.url.has(data.id);
tests["Body contains value"] = responseBody.has(data.value);
If you reach this point, all that is left to do is go to the Runner page, select the query to run, add the data file (preview it to check that everything looks okay), and run it.