Searching date and time in a Lucene query string in Cloudant

I am trying to write an index in a Cloudant NoSQL database and search it by date and time.
When I pass only the date in the query string, it works fine:
created_date:[2015-08-16 TO 2015-08-27]
This returns the correct results, but when I include the time in the parameter:
created_date:[2015-08-16 07:38:00 TO 2015-08-27 07:38:02]
I get an error:
Cannot parse 'created_date:[2015-08-16 07:38:00 TO 2015-08-27 07:38:02]': Encountered " "TO" "TO "" at line 1, column 50. Was expecting one of: "]" ... "}"
I have some more query parameters before this, but the above is the gist of the error.
This is an Apache Lucene query string. What is causing this?

According to the Solr DateField Javadoc, the date format should look like this:
A date field shall be of the form 1995-12-31T23:59:59Z. The trailing "Z" designates UTC time and is mandatory.
This format was derived to be standards compliant (ISO 8601) and is a more restricted form of the canonical representation of dateTime from XML Schema part 2. Examples...
1995-12-31T23:59:59Z
1995-12-31T23:59:59.9Z
1995-12-31T23:59:59.99Z
1995-12-31T23:59:59.999Z
So you are missing the 'T' between the date and the time (and the trailing 'Z', which marks UTC).
For more information: https://lucene.apache.org/solr/4_10_4/solr-core/org/apache/solr/schema/DateField.html
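The space is what breaks the unquoted version: an unquoted range endpoint ends at whitespace, so the parser reaches TO where it expects ] or }, which matches the error above. Below is a minimal Python sketch of the conversion this answer suggests. It assumes the stored and queried times are already in UTC, the function names are hypothetical, and whether a given Cloudant index accepts this exact form is also an assumption. Inside the range brackets the colons in the time need no escaping.

```python
from datetime import datetime


def to_solr_date(value: str) -> str:
    """Turn 'YYYY-MM-DD HH:MM:SS' into 'YYYY-MM-DDTHH:MM:SSZ'.

    Assumes the input is already UTC; no timezone conversion is done.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_query(field: str, start: str, end: str) -> str:
    """Build an inclusive range query over ISO 8601 timestamps (no spaces)."""
    return f"{field}:[{to_solr_date(start)} TO {to_solr_date(end)}]"


print(date_range_query("created_date", "2015-08-16 07:38:00", "2015-08-27 07:38:02"))
# created_date:[2015-08-16T07:38:00Z TO 2015-08-27T07:38:02Z]
```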

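I solved it the following way:
created_date:["2015-08-16 07:38:00" TO "2015-08-27 07:38:02"]
and used the keyword analyzer in Cloudant, so each created_date value is indexed as a single token and the quoted endpoints are compared as whole strings.
This link explains the query syntax:
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html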
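A small Python sketch of the quoted approach (the helper name is hypothetical): it wraps each endpoint in double quotes and backslash-escapes any embedded quotes or backslashes. It also illustrates why string comparison is enough here, assuming the keyword analyzer stores each value as one token: zero-padded 'YYYY-MM-DD HH:MM:SS' strings sort in chronological order.

```python
def quoted_range_query(field: str, start: str, end: str) -> str:
    """Build field:["start" TO "end"] with each endpoint as a quoted string.

    Quoting stops the space between date and time from ending the endpoint.
    Backslashes and double quotes inside a value are backslash-escaped.
    """
    def quote(value: str) -> str:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{field}:[{quote(start)} TO {quote(end)}]"


# Assuming each value is one token (keyword analyzer), the range compares
# whole strings; zero-padded timestamps sort chronologically.
stamps = ["2015-08-27 07:38:03", "2015-08-16 07:38:00",
          "2015-08-20 23:59:59", "2015-08-15 12:00:00"]
in_range = sorted(s for s in stamps
                  if "2015-08-16 07:38:00" <= s <= "2015-08-27 07:38:02")

print(quoted_range_query("created_date", "2015-08-16 07:38:00", "2015-08-27 07:38:02"))
print(in_range)
```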

Related

Splunk field extractor unable to extract all values

I want to extract 4 values out of one field, called msg, in a Splunk query. The msg is in the form:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
The keys are always the same, but the values are not: for instance, v2 could be XXX or XXYYZZ, and values for v3 have unpredictable length.
I queried some sample results, hoping to use the Field Extractor to generate a regex, but the generated regex can't get all the values out. I guess this is because the values don't all have the same length?
Do I need to change my logging format by separating each key=value with a comma? Or am I not using the field extractor correctly?
[Update 1]: Some sample data:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if it makes it easier to query and extract fields. Would adding a separator like a dot or a pipe help?
Normally Splunk will pull key-value pairs out automatically.
However, when it doesn't, try your regular expression(s) on regex101; the field extractor is often a good[ish] start, but rarely creates efficient (or complete) regular expressions.
An inline version of this would be as follows (presuming the "value" half of each key-value pair is a run of contiguous non-whitespace characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls, in case something's out of order or missing, but if your data's consistent, this will work.
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype.
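To try the inline pattern outside Splunk, here is a rough Python equivalent. Python spells named groups (?P<name>...), whereas rex accepts (?<name>...). It extracts all five values from the single-token sample in the question, and finds no match at all on the updated sample, where k3 contains spaces that \S+ cannot span.

```python
import re

# Python equivalent of the inline rex; every value must be non-whitespace.
INLINE = re.compile(
    r"k1=(?P<k1>\S+)\s+k2=(?P<k2>\S+)\s+k3=(?P<k3>\S+)"
    r"\s+k4=(?P<k4>\S+)\s+k5=(?P<k5>\S+)"
)


def extract_inline(msg):
    """Return all five fields as a dict, or None if the pattern fails."""
    match = INLINE.search(msg)
    return match.groupdict() if match else None


simple = ("Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 "
          "something else can be ignored")
spaced = ("Service call successful k1=XXX k2=BBBB k3=Something I made up "
          "k4=YYYNNN k5=do not need to retrieve this value")

print(extract_inline(simple))
print(extract_inline(spaced))  # None: "Something I made up" has spaces
```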
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...
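The sequential rex calls can be mirrored in Python the same way. Each value pattern runs up to the next key, so k3 and k4 may contain spaces and commas, and a missing key simply leaves that field unset. The k1 and k2 patterns are an assumption borrowed from the inline version (\S+), since the edit above elides them with "...".

```python
import re

# One pattern per field, applied independently like sequential rex calls.
# k1/k2 use \S+ (an assumption); k3/k4 capture up to the next key.
PATTERNS = {
    "k1": re.compile(r"k1=(?P<k1>\S+)"),
    "k2": re.compile(r"k2=(?P<k2>\S+)"),
    "k3": re.compile(r"k3=(?P<k3>.+)\s+k4="),
    "k4": re.compile(r"k4=(?P<k4>.+)\s+k5="),
}


def extract_sequential(msg):
    """Apply each pattern separately; skip fields whose pattern fails."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(msg)
        if match:
            fields[name] = match.group(name)
    return fields


msg = ("Service call successful k1=SSSSSS k2=AAA k3=This could contain "
       "space and comma, like this one k4=YYYNNM k5=can be ignored")
print(extract_sequential(msg))
```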

Convert Integer to Date in Snowflake using Error Handling

I have a requirement where an integer value should be converted to a date type in Snowflake. Initially I used the following query to get the desired result:
SELECT TO_DATE(TO_varchar(19000000+1190709),'YYYYMMDD')
2019-07-09
Now, when I use the same query for the input "20200034", I get the following error:
select TO_DATE(TO_varchar(19000000+1200034),'YYYYMMDD')
Can't parse '20200034' as date with format 'YYYYMMDD'
"20200034" actually comes from one of the columns in a Snowflake table. To resolve this issue I tried using the TRY_TO_DATE function, but its output is incorrect. Details below:
select TRY_TO_DATE(TO_varchar(19000000+1200034))
1970-08-22
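The 1970-08-22 result looks consistent with Snowflake treating a string that contains only an integer as a count of seconds since the Unix epoch when no format is given. The short Python check below is an illustration, not Snowflake itself: it shows that 20200034 seconds after 1970-01-01 UTC falls on that date.

```python
from datetime import datetime, timezone

# The question's input: 19000000 + 1200034 == 20200034.
value = 19000000 + 1200034

# Hypothesis: without a format, the digit string is read as epoch seconds.
as_epoch_date = datetime.fromtimestamp(value, tz=timezone.utc).date()
print(as_epoch_date)  # 1970-08-22
```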
According to the Snowflake documentation, this error-handling function does not support the optional format argument that TO_DATE and DATE support.
https://docs.snowflake.com/en/sql-reference/functions/try_to_date.html
You can set the DATE_INPUT_FORMAT for the session before calling the TRY_TO_DATE function.
I suggest you contact Snowflake support and ask them to enable TRY_TO_DATE with a format string; it's available but needs to be enabled manually.
However, be aware that TRY_TO_DATE on '20200034' will resolve to NULL, because 00 is not a valid month (and 34 is not a valid day).
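As a rough illustration of the behavior being asked for (strict YYYYMMDD parsing that yields NULL instead of an error), here is a Python analogue. It is not Snowflake code and the function names are hypothetical; None plays the role of NULL, and the 19000000 offset is taken from the question as-is.

```python
from datetime import date, datetime
from typing import Optional


def try_to_date_yyyymmdd(text: str) -> Optional[date]:
    """Parse exactly 8 digits as YYYYMMDD; return None on any bad input."""
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:  # e.g. month 00 or day 34
        return None


def from_offset_integer(n: int) -> Optional[date]:
    """Apply the question's 19000000 + n step, then parse the result."""
    return try_to_date_yyyymmdd(str(19000000 + n))


print(from_offset_integer(1190709))  # 2019-07-09
print(from_offset_integer(1200034))  # None
```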

How can I fetch a dynamically named file, generated every day, from an FTP server?

Transactional input CSV files arrive daily at an FTP location. I need to read these input files and process them in a daily batch run. The file names stay the same every day, except that the date is appended at the end of each file name:
Ex:
Day1
General_Ledger1_2020-07-01,
General_Ledger2_2020-07-01,
General_Ledger3_2020-07-01,
General_Ledger4_2020-07-01,
General_Ledger5_2020-07-01
Day2
General_Ledger1_2020-07-02,
General_Ledger2_2020-07-02,
General_Ledger3_2020-07-02,
General_Ledger4_2020-07-02,
General_Ledger5_2020-07-02
How can I append this date information to the input file name every time the job runs?
I faced a similar problem earlier; it can be solved using a calculated parameter in the file path. There you can create expressions that retrieve the file dynamically.
Example:
CONCAT( UPPER(lit('$(Prefix)')), ADD_DAYS( TODATE(lit('$(currentTime)'), 'yyyy-mm-dd'), 'yyyy-mm-dd' ,-1),'.csv')
Breakdown of the expression:
$(currentTime): this system parameter returns the current date (including the time).
TODATE(lit('$(currentTime)'), 'yyyy-mm-dd'): TODATE extracts only the date from the whole timestamp, in the format 'yyyy-mm-dd'.
ADD_DAYS(TODATE(lit('$(currentTime)'), 'yyyy-mm-dd'), 'yyyy-mm-dd' ,-1): ADD_DAYS adds -1 day to the date returned by TODATE(). Hence (2020-04-24) + (-1) gives 2020-04-23.
$(Prefix): a user-defined input parameter of type String, which the user provides at runtime, since the prefix is always dynamic.
CONCAT(): finally, CONCAT() combines all the results to form the exact file path. Some static string is also added in between, as it is always fixed for every file to be read.
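The same file-name arithmetic can be sketched in Python to check which names a given run date produces. This only mirrors CONCAT(prefix, run date minus one day as yyyy-mm-dd, '.csv'); it does not use the ETL tool itself, the function name and prefixes are hypothetical, and unlike the expression it does not upper-case the prefix, because the question's file names are mixed case.

```python
from datetime import date, timedelta


def previous_day_filename(prefix: str, run_date: date, extension: str = ".csv") -> str:
    """prefix + (run_date - 1 day) formatted as yyyy-mm-dd + extension."""
    previous = run_date - timedelta(days=1)
    return f"{prefix}{previous.isoformat()}{extension}"


# Hypothetical prefixes built from the question's file names.
prefixes = [f"General_Ledger{i}_" for i in range(1, 6)]
names = [previous_day_filename(p, date(2020, 7, 2)) for p in prefixes]
print(names)
```

Because the date comes from real date arithmetic rather than string manipulation, month and leap-year boundaries are handled (a run on 2020-03-01 picks up the 2020-02-29 files).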

How to create a correct filter string with OR and AND operators for django?

My app has a Vue.js frontend and a Django REST framework backend. I need to build a filter string in Vue that does something like this:
((status=closed) | (status=canceled)) & (priority=middle)
but I get an error in response:
["Invalid querystring operator. Matched: ') & '."]
After encoding, my string looks like this:
?filters=((status%3D%D0%97%D0%B0%D0%BA%D1%80%D1%8B%D1%82)%20%7C%20(status%3D%D0%9E%D1%82%D0%BA%D0%BB%D0%BE%D0%BD%D0%B5%D0%BD))%20%26%20(priority%3D%D0%A1%D1%80%D0%B5%D0%B4%D0%BD%D0%B8%D0%B9)
which corresponds to
?filters=((status=closed)|(status=canceled))&(priority=middle)
What should a correct filter string for Django look like?
I have no problem if the statement includes only | or only &. For example, a filter string like this one works perfectly:
?filters=(status%3D%D0%97%D0%B0%D0%BA%D1%80%D1%8B%D1%82)%20%7C%20(status%3D%D0%9E%D1%82%D0%BA%D0%BB%D0%BE%D0%BD%D0%B5%D0%BD)
a.k.a. ?filters=(status=closed)|(status=canceled). But if I add an & after it, with additional brackets to specify the order in which the conditions are evaluated, it fails with an error.
As an experiment, I also tried using fewer brackets, with a string like this:
?filters=(status%3D%D0%97%D0%B0%D0%BA%D1%80%D1%8B%D1%82%20%7C%20status%3D%D0%9E%D1%82%D0%BA%D0%BB%D0%BE%D0%BD%D0%B5%D0%BD)
a.k.a. ?filters=(status=closed | status=canceled). This one doesn't work either: I get neither an error nor any data.
In my case I need combined results: either status (closed or canceled) together with priority=middle, but the string format isn't correct. Which format would work?
That doesn't look like a very URI-friendly syntax you're trying to use there.
Try doing this instead:
?status[]=closed&status[]=canceled&priority=middle
Then use request.GET.getlist('status[]') to get back the list, and use the values for logical OR queryset filtering:
qs = qs.filter(status__in=request.GET.getlist('status[]', []))
and then add any additional filtering, which works as logical AND.
If you're using axios, it should automatically serialize a JS array param such as status into this format.
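To see the round trip without a Django project, here is a standard-library sketch. urlencode with doseq=True emits one status[] pair per list item, parse_qs reads them back as a list (the job request.GET.getlist does in Django), and filter_rows is a hypothetical in-memory stand-in for the queryset: a row passes if its status is any of the listed values (OR) and its priority matches (AND). The sample rows are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Build the suggested query string; the brackets get percent-encoded.
params = {"status[]": ["closed", "canceled"], "priority": "middle"}
query = urlencode(params, doseq=True)


def filter_rows(rows, query):
    """In-memory stand-in for the Django filtering (not Django itself).

    Roughly analogous to:
        qs.filter(status__in=request.GET.getlist("status[]", [])).filter(priority=...)
    """
    args = parse_qs(query)
    statuses = set(args.get("status[]", []))  # any listed status: OR
    priority = args.get("priority", [None])[0]
    matches = []
    for row in rows:
        if row["status"] not in statuses:
            continue
        if priority is not None and row["priority"] != priority:  # AND
            continue
        matches.append(row)
    return matches


rows = [  # hypothetical sample data
    {"id": 1, "status": "closed", "priority": "middle"},
    {"id": 2, "status": "canceled", "priority": "high"},
    {"id": 3, "status": "open", "priority": "middle"},
    {"id": 4, "status": "canceled", "priority": "middle"},
]
print(query)
print([r["id"] for r in filter_rows(rows, query)])
```

In Django itself, the same condition can also be written with Q objects, for example qs.filter(Q(status="closed") | Q(status="canceled"), priority="middle").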

Splunk search query that returns entries with a field value greater than some number

I have this log entry:
"2014-11-22 02:42:10,545 .. - average:2.74425 , min:1.43 , max:4.007..."
I want to create a search query that returns all log entries with "average > 5".
I want to select the date of the log entry and the average value.
Can this be done? How can I do this?
Thanks,
It is quite simple to do in Splunk and you'll have to do it in two steps:
Parse your log to get each of the fields in your log files. To do this, use the props.conf and transforms.conf files on your indexer server, or on your client if you are using the heavy forwarder. Another option is to send your fields in the key=value format, which Splunk knows how to parse by default. Example: "2014-11-22 02:42:10,545 .. - average=2.74425 min=1.43 max=4.007..."
After getting your fields in Splunk just search for average>5 and you'll get all these search results easily.
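As a rough illustration of why the key=value format helps: once each pair is written as key=value, one generic pattern recovers every field, and the comparison becomes a simple numeric test. This Python sketch only approximates Splunk's automatic extraction; the pattern is an assumption, and the sample line is adapted from the example above with its trailing "..." dropped.

```python
import re

# Hypothetical approximation of automatic key=value extraction:
# a word, an '=', then a run of non-whitespace characters.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_pairs(line):
    """Return every key=value pair in the line as a dict of strings."""
    return dict(PAIR.findall(line))


line = "2014-11-22 02:42:10,545 .. - average=2.74425 min=1.43 max=4.007"
fields = parse_pairs(line)
is_high = float(fields["average"]) > 5
print(fields, is_high)
```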
Answer from Splunk:
Did you already extract the average field?
If not, go to Settings -> Fields -> Field Extractions -> New, enter "average" as name, fill in your sourcetype, and use this as inline extraction:
average:(?<average>\d+\.?\d*)
it worked. :)
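For testing the extraction outside Splunk, here is a Python sketch of the same search. It applies the inline extraction above (in Python's (?P<name>...) spelling), converts the value to a number, and keeps the timestamp and average for events above the threshold, which is what the question asks for. It assumes every event starts with a 'YYYY-MM-DD HH:MM:SS,mmm' timestamp; that pattern and the second sample event are hypothetical.

```python
import re

# Python spelling of the inline extraction average:(?<average>\d+\.?\d*)
AVERAGE = re.compile(r"average:(?P<average>\d+\.?\d*)")
# Assumption: each event starts with a 'YYYY-MM-DD HH:MM:SS,mmm' timestamp.
TIMESTAMP = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def high_averages(lines, threshold=5.0):
    """Return (timestamp, average) for events whose average exceeds threshold."""
    hits = []
    for line in lines:
        avg = AVERAGE.search(line)
        ts = TIMESTAMP.search(line)
        if avg and ts and float(avg.group("average")) > threshold:
            hits.append((ts.group("ts"), float(avg.group("average"))))
    return hits


events = [
    "2014-11-22 02:42:10,545 .. - average:2.74425 , min:1.43 , max:4.007...",
    "2014-11-22 02:43:10,100 .. - average:6.5 , min:1.2 , max:9.9",  # hypothetical
]
print(high_averages(events))
```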