How to display only searched string in column in postgresql - sql

I want to display only the searched string from a table. For example, this is my table:
Table name: guidelines
id content
1 An individual is accused “of” a crime, not “with” or “for” a crime. Accused, often as “the accused”, refers to the individual or individuals standing trial. EXAMPLES: The prosecutor accused the politician of bribery. The accused politician stood trial for bribery. See alleged, charged, suspected.
2 There were a lot of people getting accused on this particular town.
If I search for "accused", the query returns the full content:
SELECT content FROM "guidelines" WHERE "content" ILIKE '%accused%';
Result:
content
An individual is accused “of” a crime, not “with” or “for” a crime. Accused, often as “the accused”, refers to the individual or individuals standing trial. EXAMPLES: The prosecutor accused the politician of bribery. The accused politician stood trial for bribery. See alleged, charged, suspected.
There were a lot of people getting accused on this particular town.
How can I get only the text starting at the first match, followed by the rest of the column? For example, this is my goal:
content
Accused, often as “the accused”, refers to th...
accused on this particular to...
Update: I renamed the table and column to make them easier to tell apart.

In PostgreSQL, you can do that with the position and substring functions. See the following query as an example:
SELECT
id,
substring(content, position ('accused' in content)) as matched
FROM
guidelines
WHERE
content LIKE '%accused%'
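The same idea (find the first match, then keep everything from that point on) can be sketched outside the database. This is only a minimal Python illustration, not PostgreSQL code: the function name snippet_from_match and the max_len cut-off are assumptions added for the example, and the match is case-insensitive, like ILIKE in the question.

```python
import re


def snippet_from_match(text, term, max_len=30):
    """Return the text from the first match of `term` onward, truncated.

    Returns None when `term` does not occur in `text`.
    """
    # re.escape makes the search term literal; IGNORECASE mimics ILIKE.
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    rest = text[match.start():]
    if len(rest) <= max_len:
        return rest
    # Cut the remainder and mark the truncation, as in the desired output.
    return rest[:max_len] + "..."
```

With the second row of the table and max_len=29, this gives "accused on this particular to...", which is the shape of the goal shown in the question.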

Try this:
SELECT substring(content from '%#"accused%#"%' for '#') from guidelines;
Here # is the escape character, declared by the trailing for '#', and each #" marks one end of the part to return.
The function returns whatever the pattern matches between the two #" markers. In this case that is accused followed by %, i.e. accused and the rest of the string after it.

Related

filter params for import users from AD

I import users with this filter:
(&(objectClass=user)(objectCategory=PERSON))
I want to add the RealName parameter to the filter.
RealName should contain exactly three words (any words).
For example, if RealName contains "name middle_name surname", that is good and the user should be imported.
If RealName contains "name surname" (only two words), that is wrong and the user should not be imported.
Can you help me with the filter?
LDAP queries can only use attributes that exist in Active Directory, and there is no attribute called "RealName".
You will have to split the input string yourself. So, for example, if you were given the string "Necro The Human", you would have to split that into 3 strings using whatever programming language you're using.
Then you will have to insert those into an LDAP query that matches the three name attributes: givenName, initials, and sn (surname)
Your finished query would look something like this:
(&(objectClass=user)(objectCategory=person)(givenName=Necro)(initials=The)(sn=Human))
Check if you're using initials or the middleName attribute for the middle name. It's the initials attribute that is labelled as "Initials" in Active Directory Users and Computers, so that may be what's used, even though the documentation says it's just for the initials of the full name, or middle initials (not the full middle name). It's also limited to only 6 characters, so you may be using middleName if you're storing full middle names.
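As a rough sketch of the split-and-insert step, here is how it might look in Python. The function names are made up for this example, the middle-name attribute is a parameter because it depends on your directory, and the escaping follows the special characters listed in RFC 4515; treat it as an illustration rather than a finished import script.

```python
def escape_ldap_value(value):
    """Escape characters that have a special meaning inside an LDAP filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_name_filter(full_name, middle_attr="initials"):
    """Build a filter matching given name, middle name and surname.

    Raises ValueError unless `full_name` has exactly three words.
    """
    parts = full_name.split()
    if len(parts) != 3:
        raise ValueError("expected exactly three words: given, middle, surname")
    given, middle, surname = (escape_ldap_value(p) for p in parts)
    return (
        "(&(objectClass=user)(objectCategory=person)"
        f"(givenName={given})({middle_attr}={middle})(sn={surname}))"
    )
```

Calling build_name_filter("Necro The Human") produces the query shown above; pass middle_attr="middleName" if that is where your directory stores middle names.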
If your company has the standard of setting the displayName to the user's full name, including middle name, then you could just match against that. But I think it would be pretty rare that the middle name would be in the display name.
(&(objectClass=user)(objectCategory=person)(displayName=Necro The Human))
There is also ambiguous name resolution, but it searches other attributes (not just the first/last name) and it does not include initials or middleName. I mention it only because it's not well known and you may find some other use for it one day.

extract text from documents like PAN and Aadhaar

I am using the Google Cloud Vision API to extract text from Aadhaar and PAN cards. How can I get exact user details like name, father's name, and address?
Raw Data
ଭାରତ ସରକାର
Government of India
ଜିତ୍ୟାନନ୍ଦ ଖେମୁକୁ
NITYANANDA KHEMUDU
ପିତା : ସୀତାରାମ ଖେମୁକୁ
Father: Sitaram Khemudu
ଜନ୍ମ ତାରିଖ / DOB : 01.07.1999
ପୁରୁଷ / Male
ମୋ ଆଧାର, ମୋ ପରିଚୟ
I have built 5-6 OCR pipelines to date (Aadhaar, PAN, ITR, driving licence, etc.) using the Google Cloud Vision API. I think you are looking for a response like
{"pan_card_no":"ECXXXXXX123",
"name":"fshksj"
}
To get such a response you need to build your own logic. Here is some logic I can share with you:
Perform OCR on your document using the Google Cloud Vision API and store the response in a list (Google returns the text line by line).
In the case above, to grab the DOB first, you can build logic like: if "DOB" is in a list item, grab the numeric values from it.
To get the name, drop the unnecessary items from the list with conditions such as (if "India" in i) or (if i.isdigit()); once those items are dropped from the main list, the name remains.
To grab the address: about 95% of the time the address ends with the pincode, so treat the pincode as the last index of the address, look for an "Address"-like keyword, and join all the elements from the keyword's index to the pincode's index (this is easy to do with a list). To check whether the pincode is valid you can use a library like Pyzipin.
These are only the most basic conditions; there are many more you can use. If you need any specific logic, you can ask me.
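Here is a minimal Python sketch of the DOB, father's name and name rules above, applied to the raw data from the question. The function name, the regular expressions and the rule of skipping non-ASCII lines are my own assumptions for this sample; real cards will need more conditions, and the address logic is left out.

```python
import re

# OCR lines copied from the raw data in the question.
SAMPLE_LINES = [
    "ଭାରତ ସରକାର",
    "Government of India",
    "ଜିତ୍ୟାନନ୍ଦ ଖେମୁକୁ",
    "NITYANANDA KHEMUDU",
    "ପିତା : ସୀତାରାମ ଖେମୁକୁ",
    "Father: Sitaram Khemudu",
    "ଜନ୍ମ ତାରିଖ / DOB : 01.07.1999",
    "ପୁରୁଷ / Male",
    "ମୋ ଆଧାର, ମୋ ପରିଚୟ",
]


def parse_aadhaar_lines(lines):
    """Pick name, father's name and DOB out of OCR lines with keyword rules."""
    result = {"name": None, "father_name": None, "dob": None}
    candidates = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        # Rule 1: a line containing "DOB" carries the date of birth.
        dob = re.search(r"DOB\s*:?\s*(\d{2}[./-]\d{2}[./-]\d{4})", text)
        if dob:
            result["dob"] = dob.group(1)
            continue
        # Rule 2: a line starting with "Father:" carries the father's name.
        father = re.match(r"Father\s*:\s*(.+)", text, re.IGNORECASE)
        if father:
            result["father_name"] = father.group(1).strip()
            continue
        # Rule 3: drop boilerplate, pure numbers and regional-script lines.
        if "India" in text or text.isdigit() or not text.isascii():
            continue
        candidates.append(text)
    # Whatever survives first is taken as the card holder's name.
    if candidates:
        result["name"] = candidates[0]
    return result
```

For the sample lines this yields the name NITYANANDA KHEMUDU, the father's name Sitaram Khemudu and the DOB 01.07.1999.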

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times but still here I am. Data processing location seems consistent (dataset, US; query: US) and I am using backticks & long format in the FROM clause
Below are two sequences of code. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, thus, is why do the first lines of text work while the second doesn't?
You need to understand the different parts of the backticks:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
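A two-part name is read as dataset.table inside your current project, so the shorter format would be austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare). That short form only resolves when the dataset lives in your own project; for a public dataset like this one, keep the full three-part name from your first query.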
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare , when this is not true.
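To make that reading concrete, here is a small Python sketch of how a dotted path can be split into project, dataset and table. It is a simplified model written for this answer, not BigQuery's actual resolver; the names TableRef and resolve_table_path are hypothetical, and one-part names are not handled.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    project: str
    dataset: str
    table: str


def resolve_table_path(path, default_project):
    """Split a dotted table path the way the error message suggests it is read."""
    parts = path.split(".")
    if len(parts) == 3:
        # project.dataset.table: nothing to fill in.
        return TableRef(*parts)
    if len(parts) == 2:
        # dataset.table: the project defaults to the one running the query.
        return TableRef(default_project, parts[0], parts[1])
    raise ValueError("expected project.dataset.table or dataset.table")
```

With the default project glassy-droplet-347618, the path bigquery-public-data.austin_bikeshare resolves to a dataset named bigquery-public-data in your own project, which is exactly the dataset the error message says it cannot find.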

NYT article search API not returning results for certain queries

I have a set of queries and I am trying to get web_urls using the NYT article search API. But I am seeing that it works for q2 below but not for q1.
q1: Seattle+Jacob Vigdor+the University of Washington
q2: Seattle+Jacob Vigdor+University of Washington
If you paste the url below with your API key in the web browser, you get an empty result.
Search request for q1
api.nytimes.com/svc/search/v2/articlesearch.json?q=Seattle+Jacob%20Vigdor+the%20University%20of%20Washington&begin_date=20170626&api-key=XXXX
Empty results for q1
{"response":{"meta":{"hits":0,"time":27,"offset":0},"docs":[]},"status":"OK","copyright":"Copyright (c) 2013 The New York Times Company. All Rights Reserved."}
Instead if you paste the following in your web browser (without the article 'the' in the query) you get non-empty results
Search request for q2
api.nytimes.com/svc/search/v2/articlesearch.json?q=Seattle+Jacob%20Vigdor+University%20of%20Washington&begin_date=20170626&api-key=XXXX
Non-empty results for q2
{"response":{"meta":{"hits":1,"time":22,"offset":0},"docs":[{"web_url":"https://www.nytimes.com/aponline/2017/06/26/us/ap-us-seattle-minimum-wage.html","snippet":"Seattle's $15-an-hour minimum wage law has cost the city jobs, according to a study released Monday that contradicted another new study published last week....","lead_paragraph":"Seattle's $15-an-hour minimum wage law has cost the city jobs, according to a study released Monday that contradicted another new study published last week.","abstract":null,"print_page":null,"blog":[],"source":"AP","multimedia":[],"headline":{"main":"New Study of Seattle's $15 Minimum Wage Says It Costs Jobs","print_headline":"New Study of Seattle's $15 Minimum Wage Says It Costs Jobs"},"keywords":[],"pub_date":"2017-06-26T15:16:28+0000","document_type":"article","news_desk":"None","section_name":"U.S.","subsection_name":null,"byline":{"person":[],"original":"By THE ASSOCIATED PRESS","organization":"THE ASSOCIATED PRESS"},"type_of_material":"News","_id":"5951255195d0e02550996fb3","word_count":643,"slideshow_credits":null}]},"status":"OK","copyright":"Copyright (c) 2013 The New York Times Company. All Rights Reserved."}
Interestingly, both queries work fine on the api test page
http://developer.nytimes.com/article_search_v2.json#/Console/
Also, if you look at the article below returned by q2, you see that the query term in q1, 'the University of Washington' does occur in it and it should have returned this article.
https://www.nytimes.com/aponline/2017/06/26/us/ap-us-seattle-minimum-wage.html
I am confused about this behaviour of the API. Any ideas what's going on? Am I missing something?
Thank you for all the answers. Below I am pasting the answer I received from NYT developers.
NYT's Article Search API uses Elasticsearch. There are lots of docs online about the query syntax of Elasticsearch (it is based on Lucene).
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
If you want articles that contain "Seattle", "Jacob Vigdor" and "University of Washington", do
"Seattle" AND "Jacob Vigdor" AND "University of Washington"
or
+"Seattle" +"Jacob Vigdor" +"University of Washington"
I think you need to encode the + separators as %2B and the spaces as + (rather than %20):
In your example,
q=Seattle+Jacob%20Vigdor+the%20University%20of%20Washington
When I submit from the page on the site, it uses %2B:
q=Seattle%2BJacob+Vigdor%2Bthe+University+of+Washington
How are you URL encoding? One way to fix it would be to replace your spaces with + before URL encoding.
Also, you may need to replace %20 with +. There are various schemes for URL encoding, so the best way would depend on how you are doing it.
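If you happen to be building the URL in Python, the standard library already produces that encoding. The sketch below is only an illustration: the helper name build_search_query is made up, and whether this encoding returns the results you expect from the API is something to verify yourself.

```python
from urllib.parse import quote_plus, urlencode


def build_search_query(q, begin_date, api_key):
    """Return the query string for an article search request.

    urlencode applies quote_plus to every value, so a literal '+'
    becomes %2B and a space becomes '+'.
    """
    return urlencode({"q": q, "begin_date": begin_date, "api-key": api_key})


# The raw query from the question, before any encoding.
RAW_Q1 = "Seattle+Jacob Vigdor+the University of Washington"
ENCODED_Q1 = quote_plus(RAW_Q1)
```

Here ENCODED_Q1 is Seattle%2BJacob+Vigdor%2Bthe+University+of+Washington, the same string the page on the site submits. The quoted form suggested by the NYT developers can be passed through the same function unchanged.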

Amazon CloudSearch returns false results

I have a DB of articles, and I would like to search for all the articles that:
1. contain the word 'RIO' in either the title or the excerpt
2. contain the word 'BRAZIL' in the parent_post_content
3. and in a certain time range
The query I search with (structured) was:
(and (phrase field=parent_post_content 'BRAZIL') (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (or (phrase field=title 'RIO') (phrase field=excerpt 'RIO')))
But for some reason I get results that contain 'RIO' in the title but do not contain 'BRAZIL' in the parent_post_content.
This is especially weird because I tried to condition only on the title (and not the excerpt) with this query:
(and (phrase field=parent_post_content 'BRAZIL') (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (phrase field=name 'RIO'))
and the results seem OK.
I'm fairly new to CloudSearch, so I very likely have syntax errors, but I can't seem to find them. Help?
You're using the phrase operator but not actually searching for a phrase; it would be best to use the term operator (or no operator) instead. I can't see why it should matter but using something outside of how it was intended to be used can invite unintended consequences.
Here is how I'd re-write your queries:
Using term (mainly just used if you want to boost fields):
(and (term field=parent_post_content 'BRAZIL') (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (or (term field=title 'RIO') (term field=excerpt 'RIO')))
Without an operator (I find this simplest):
(and parent_post_content:'BRAZIL' (range field=post_date ['2016-02-16T08:13:26Z','2016-09-16T08:13:26Z'}) (or title:'RIO' excerpt:'RIO'))
If that fails, can you post the complete query? I'd like to check that, for example, you're using the structured query parser since you mentioned you're new to CloudSearch.
Here are some relevant docs from Amazon:
Compound queries for more on the various operators
Searching text for specifics on the phrase operator
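If you build these structured queries in code, small helpers keep the parentheses and quoting consistent. The following Python sketch is only an illustration: the helper names are invented for this answer, and the quoting rule (backslash-escaping single quotes and backslashes inside a string) is my reading of the structured query syntax, so double-check it against the docs.

```python
def quote(value):
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def term(field, value):
    """A single term match on one field."""
    return "(term field=" + field + " " + quote(value) + ")"


def date_range(field, start, end):
    """A range that includes `start` ('[') and excludes `end` ('}')."""
    return "(range field=" + field + " [" + quote(start) + "," + quote(end) + "})"


def combine(op, *clauses):
    """Join clauses under a compound operator such as 'and' or 'or'."""
    return "(" + op + " " + " ".join(clauses) + ")"


QUERY = combine(
    "and",
    term("parent_post_content", "BRAZIL"),
    date_range("post_date", "2016-02-16T08:13:26Z", "2016-09-16T08:13:26Z"),
    combine("or", term("title", "RIO"), term("excerpt", "RIO")),
)
```

QUERY comes out identical to the term-based query above.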
Apparently the problem was not with the query, but with the displayed content. I foolishly trusted that the content displayed on the CloudSearch site was complete, and so concluded that it did not contain Brazil. But alas, it is not the full content, and when I checked the full content, Brazil was there.
Sorry for the foolery.