I am using the Domino Data Service to access documents based on certain search criteria. One of my documents is:
{
"#href":"/rrdb.nsf/api/data/documents/unid/2FC3551DC5266A5088257E35001D5D2C",
"#unid":"2FC3551DC5266A5088257E35001D5D2C",
"#noteid":"922",
"#created":"2015-04-28T05:20:43Z",
"#modified":"2015-04-28T05:20:47Z",
"#authors":
["CN=domain/O=test",""
],
"#form":"Reservation",
"ApptUNID":"B0E582BBA2A39B5988257E35001D5D29",
"From":"CN=ram/O=test",
"AltFrom":"CN=ram/O=test",
"Chair":"CN=ram/O=test",
"AltChair":"CN=ram/O=test",
"Principal":"CN=ram/O=cisco",
"SequenceNum":1,
"ORGState":"5",
"ResourceType":"1",
"ResourceName":"Sedna/B17",
"Room":"Sedna/B17#test",
"Capacity":1,
"_ViewIcon":133,
"AppointmentType":"3",
"StartTimeZone":"Z=-3005$DO=0$ZN=India",
"EndTimeZone":"Z=-3005$DO=0$ZN=India",
"Topic":"2 hour meeting with sendna conference room",
"SendTo":"CN=Sedna/O=B17",
"PostedDate":"2015-04-28T05:20:43Z",
"Encrypt":"0",
"Categories":"",
"RouteServers":"CN=B16-PF-QA-055/O=test",
"RouteTimes":
["2015-04-28T05:20:43Z","2015-04-28T05:20:44Z"
],
"DeliveredDate":"2015-04-28T05:20:44Z",
"StartDate":"2015-04-28T05:15:00Z",
"StartTime":"2015-04-28T05:15:00Z",
"StartDateTime":"2015-04-28T05:15:00Z",
"EndDate":"2015-04-28T07:15:00Z",
"EndTime":"2015-04-28T07:15:00Z",
"EndDateTime":"2015-04-28T07:15:00Z",
"UpdateSeq":1,
"Author":"CN=ram/O=test",
"ResourceOwner":"",
"ReservedFor":"CN=ram/O=cisco",
"ReservedBy":"CN=ram/O=cisco",
"RQStatus":"A",
"Purpose":"2 hour meeting with sendna conference room",
"NoticeType":"A",
"Step":3,
"Site":"B17",
"ReserveDate":"2015-04-28T05:15:00Z"
}
I am using http://{host}/rrdb.nsf/api/data/collections/name/$Calendar?search=([SendTo] CONTAINS "CN=Sedna") to fetch this document, but it does not return the record. However, if I use CONTAINS "Sedna" it works.
[edited]
The internal representation of the SendTo field appears to be [ABBREVIATE] rather than [CANONICALIZE]. Searching for CN=... therefore returns no results, since the CN= and O= prefixes are not part of the stored data.
Replace the search with:
[SendTo]="Sedna/B17"
Or optionally search for "Sedna/" if you only want to test that the common name is exactly Sedna.
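For example, a minimal sketch of the corrected request using Python's requests library (host and credentials are placeholders, not from the original question):

import requests

# Hypothetical host and credentials; adjust to your environment.
url = "http://myhost/rrdb.nsf/api/data/collections/name/$Calendar"

# Search against the abbreviated name stored in SendTo.
params = {"search": '[SendTo]="Sedna/B17"'}

resp = requests.get(url, params=params, auth=("user", "password"))
resp.raise_for_status()
for entry in resp.json():  # one JSON object per matching view entry
    print(entry)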
Related
I am using the Google Cloud Vision API to extract text from Aadhaar and PAN cards. How can I get exact user details like name, father's name, and address?
Raw Data
ଭାରତ ସରକାର
Government of India
ଜିତ୍ୟାନନ୍ଦ ଖେମୁକୁ
NITYANANDA KHEMUDU
ପିତା : ସୀତାରାମ ଖେମୁକୁ
Father: Sitaram Khemudu
ଜନ୍ମ ତାରିଖ / DOB : 01.07.1999
ପୁରୁଷ / Male
ମୋ ଆଧାର, ମୋ ପରିଚୟ
I have built 5-6 OCR pipelines to date (Aadhaar, PAN, ITR, driving licence, etc.) using the Google Cloud Vision API. I think you are looking for a response like:
{"pan_card_no":"ECXXXXXX123",
"name":"fshksj"
}
To get such a response you need to build your own logic. Here are some approaches I can share:
Perform OCR on your document using the Google Cloud Vision API and store the response in a list (Google returns the text line by line).
For example, in the case above, if you want to grab the DOB you can build logic like: if "DOB" appears in a line, grab the numeric/date value from that line.
To get the name, drop the unnecessary items from the list using conditions such as (if "India" in i) or (if i.isdigit()); by removing the noise lines from the main list this way you are left with the name.
To grab the address: 95% of the time the address ends with a pincode, so treat the pincode as the last index of the address, look for an "Address"-like keyword, and join all the elements from the keyword index to the pincode index (this is easy to do with a list). To validate the pincode you can use a library like Pyzipin.
There are many more conditions you can use; the above are just the basic ones. If you need any specific logic, feel free to ask. A rough sketch of the basic line-filtering idea is shown below.
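A minimal sketch of that filtering approach, assuming the OCR output has already been split into a list of lines (the sample lines and patterns here are illustrative, not the actual Vision API response):

import re

# Example OCR output, one entry per detected line.
lines = [
    "Government of India",
    "NITYANANDA KHEMUDU",
    "Father: Sitaram Khemudu",
    "DOB : 01.07.1999",
    "Male",
]

details = {}

# DOB: find the line containing "DOB" and pull out the date-like token.
for line in lines:
    if "DOB" in line:
        match = re.search(r"\d{2}[./-]\d{2}[./-]\d{4}", line)
        if match:
            details["dob"] = match.group()

# Name: drop obvious noise lines; the first remaining line is usually the name.
noise = ("India", "Father", "DOB", "Male", "Female")
candidates = [l for l in lines if not any(n in l for n in noise) and not l.isdigit()]
if candidates:
    details["name"] = candidates[0]

print(details)  # {'dob': '01.07.1999', 'name': 'NITYANANDA KHEMUDU'}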
I am using a premium account (not sandbox) for data collection.
I want to collect:
All tweets in English that contain ‘china’ or ‘chinese’ that are user geolocated to US and not geolocated at tweet level, excluding all retweets
All tweets in English that contain ‘china’ or ‘chinese’ that are user geolocated to ‘Minnesota’ and not geolocated at tweet level, excluding all retweets
The code is as follows:
from searchtweets import load_credentials, gen_rule_payload, ResultStream

premium_search_args = load_credentials('twitter_API.yaml',
                                       yaml_key='search_tweets_premium_api',
                                       env_overwrite=False)
# keywords for the search
# keyword 1: US users, not geolocated at tweet level, no retweets
keywords = '(China OR Chinese) lang:en profile_country:US -place_country:US -is:retweet'
# keyword 2: Minnesota users, not geolocated at tweet level, no retweets
keywords = '(China OR Chinese) lang:en -place_country:US profile_region:"Minnesota" -is:retweet'
# define the search rule
rule = gen_rule_payload(keywords, from_date='2019-12-01',
                        to_date='2019-12-10', results_per_call=500)
# create the result stream
rs = ResultStream(rule_payload=rule, max_results=1250000,
                  **premium_search_args)
My problems are:
For the first rule, a large portion of the results does not satisfy the query. First, some tweets don't have the Profile Geo enrichment, i.e. the user.derived.locations attribute is not present in the user object. Second, when it is present, many don't have country code US, i.e. the users are identified as being in other countries.
For the second rule, the result I get from this method is a smaller subset of the results I can get from 1). That is, when I filter all tweets user-geolocated to Minnesota (by user.derived.locations.region) out of the profile_country:US results, I get a larger sample than when using profile_region:"Minnesota" directly. A considerable amount of data is missing with this method.
I have tried several times, but it seems that the user geolocation operators don't work the way I expect. Does anyone have any idea why this is the case? I would very much appreciate any answers/suggestions/comments.
Thank you!
How can I programmatically list the available Google BigQuery locations? I need a result similar to the table on this page: https://cloud.google.com/bigquery/docs/locations.
As #shollyman has mentioned
The BigQuery API does not expose the equivalent of a list locations call at this time.
So, you should consider filing a feature request on the issue tracker.
Meanwhile, I wanted to add Option 3 to the two already proposed by #Tamir.
This is a slightly naïve option with its pros and cons, but depending on your specific use case it can be useful and easily adapted to your application.
Step 1 - load the page (https://cloud.google.com/bigquery/docs/locations) HTML
Step 2 - parse and extract the needed info
Obviously, this is super simple to implement in any client of your choice; a rough sketch follows below.
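For illustration, a minimal Python sketch of those two steps (the page's table markup is an assumption and may change at any time, so treat this as a starting point only):

import re
import urllib.request

# Step 1 - load the page HTML
url = "https://cloud.google.com/bigquery/docs/locations"
html = urllib.request.urlopen(url).read().decode("utf-8")

# Step 2 - parse and extract the needed info (crude: pull the cells out of each table row)
rows = re.findall(r"<tr>(.*?)</tr>", html, flags=re.S)
for row in rows:
    cells = [re.sub(r"<[^>]+>", "", c).strip()
             for c in re.findall(r"<td>(.*?)</td>", row, flags=re.S)]
    if cells:
        print(cells)  # e.g. ['Tokyo', 'asia-northeast1', ...]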
As I am a huge BigQuery fan, I went through a proof of concept using the BigQuery tool Magnus.
I've created a workflow with just two tasks:
API Task - to load the page's HTML into the variable var_payload
and
BigQuery Task - to parse and extract the wanted info out of the HTML
The "whole" workflow consists of just those two tasks.
The query I used in the BigQuery Task is:
CREATE TEMP FUNCTION decode(x STRING) RETURNS STRING
LANGUAGE js
OPTIONS (library="gs://my_bucket/he.js")
AS """
return he.decode(x);
""";
WITH t AS (
SELECT html,
REGEXP_EXTRACT_ALL(
REGEXP_REPLACE(html,
r'\n|<strong>|</strong>|<code>|</code>', ''),
r'<table>(.*?)</table>'
)[OFFSET(0)] x
FROM (SELECT '''<var_payload>''' AS html)
)
SELECT pos,
line[SAFE_OFFSET(0)] Area,
line[SAFE_OFFSET(1)] Region_Name,
decode(line[SAFE_OFFSET(2)]) Region_Description
FROM (
SELECT
pos, REGEXP_EXTRACT_ALL(line, '<td>(.*?)</td>') line
FROM t,
UNNEST(REGEXP_EXTRACT_ALL(x, r'<tr>(.*?)</tr>')) line
WITH OFFSET pos
WHERE pos > 0
)
As you can see, I used the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would ...
After the workflow is executed and those two steps are done, the result lands in project.dataset.location_extraction, and we can query this table to make sure we got what is expected.
Note: obviously, the parsing and extraction of the locations info is quite simplified and can surely be improved to be more resilient to changes in the source page layout.
Unfortunately, there is no API that provides the list of locations supported by BigQuery.
I see two options which might be good for you:
Option 1
You can manually maintain a list and expose it to your clients via an API or any other means your application supports (you will need to follow BigQuery product updates to keep this list current).
Option 2
If your use case is to provide a list of the locations where your own data is stored, you can call datasets.list, read each dataset's location, and display/use it in your app. A sample dataset resource and a sketch of this approach follow below:
{
"kind": "bigquery#dataset",
"id": "id1",
"datasetReference": {
"datasetId": "datasetId",
"projectId": "projectId"
},
"location": "US"
}
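A minimal sketch of Option 2 using the google-cloud-bigquery Python client (the project id is a placeholder); note that this only surfaces the locations you already use, not every location BigQuery supports:

from google.cloud import bigquery

# Collect the distinct locations of the datasets in your own project.
client = bigquery.Client(project="my-project")  # hypothetical project id

locations = set()
for item in client.list_datasets():
    dataset = client.get_dataset(item.reference)  # full resource includes 'location'
    locations.add(dataset.location)

print(sorted(locations))  # e.g. ['EU', 'US']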
While reading the Knowledge Center, I came across the following:
The TTL properties are not applied to data that already exists in the
Analytics Platform. You must set the TTL properties before you add
data.
So how can I remove existing logs before setting those properties?
You must use the Elastic Search delete APIs to remove existing documents from Worklight Analytics.
Before using any of the Elastic Search delete APIs it is advised to back up your data first, as misuse of the APIs or an undesired query will result in permanent data loss.
Below is an example of how to delete client logs in a specified date range, assuming your instance of Elastic Search is running on http://localhost:9500. This specific example deletes all client logs between October 1st and October 15th 2014.
curl -XDELETE 'http://localhost:9500/worklight/client_logs/_query' -d
'
{
"query": {
"range": {
"timestamp": {
"gt" : 1412121600000,
"lt" : 1413331200000
}
}
}
}
'
You can delete any type of document using the path http://localhost:9500/worklight/{document_type}. The types of documents are app_activities, network_activities, notification_activities, client_logs and server_logs.
When deleting documents, you can filter on two properties: "timestamp" or "daystamp", both represented in epoch time in milliseconds. Please note that "daystamp" is simply the first timestamp of the given day (i.e. 12:00 AM). The range query also accepts the following parameters (an additional example using "daystamp" follows the list):
gte - greater than or equal to
gt - greater than
lte - less than or equal to
lt - less than
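For instance, a minimal Python sketch (using the requests library, with an illustrative daystamp value) that deletes all server logs stamped with a single day via gte/lte:

import requests

# Hypothetical example: delete all server logs whose daystamp is October 1st 2014 (epoch millis).
day_start = 1412121600000  # 2014-10-01T00:00:00Z

payload = {
    "query": {
        "range": {
            "daystamp": {
                "gte": day_start,
                "lte": day_start
            }
        }
    }
}

resp = requests.delete("http://localhost:9500/worklight/server_logs/_query", json=payload)
print(resp.status_code, resp.text)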
For more information, refer to the Elastic Search delete and query APIs:
Delete by Query API
Queries
Range Query
I am trying to use RavenDB (build 960) multi GET to get the results of several queries.
I am posting to /multi_get with:
[
{"Url":"/databases/myDb/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
{"Url":"/databases/myDb/indexes/products?query=title:beethoven&fetch=title&fetch=price"}
]
The server responds with results for each query; however, it returns EVERY document for each index. It looks like neither the query nor the fetch parameters are being used.
Is there something I am doing wrong here?
Multi GET assumes all the URLs are local to the current database; you can't specify URLs starting with /databases/foo.
You specify the database in the multi GET URL itself.
Change your code to generate:
[
{"Url":"/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
{"Url":"/indexes/products?query=title:beethoven&fetch=title&fetch=price"}
]
And make sure that your multi GET goes to:
/databases/mydb/multi_get
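A minimal sketch of the corrected request using Python's requests library (the host/port and database name are placeholders):

import json
import requests

# The database is specified in the multi_get URL; the inner URLs stay database-relative.
queries = [
    {"Url": "/indexes/composers?query=title:beethoven&fetch=title&fetch=biography"},
    {"Url": "/indexes/products?query=title:beethoven&fetch=title&fetch=price"},
]

resp = requests.post("http://localhost:8080/databases/myDb/multi_get",
                     data=json.dumps(queries))
print(resp.json())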