Specify wildcard in JSON? - sql

I have searched the web, but I cannot find an answer (or a duplicate question for that matter).
I am POSTing JSON via Jersey REST services.
I usually POST something like this, where each value is specified:
{
"w":"val1",
"x":"val2",
"y":"val3",
"z":{
"zz":{
"za":"val9",
"zb":"val8",
"zc":"val7"
}
}
}
I would like to POST something like this, where the asterisk is a wildcard.
{
"w":"val1",
"x":"val2",
"y":"val3",
"z":{
"zz":{
"za":"val9",
"zb":"*",
"zc":"val7"
}
}
}
The JSON values will ultimately be passed as parameters to a Sybase stored procedure, but in this case I do not know any valid values for "zb".
For example, "zb" may be a primary key ID, but I do not know any of the 10-digit IDs. So rather than repeatedly trying random combinations of 10 digits until I get a result back, I would like to specify that ANY existing primary key in the table would suffice.
Is this possible? If so, how?

For the sake of having a complete Q&A repository on this website, I am posting what I have come up with, although it is an unsatisfactory solution.
My workaround combines the two comments left on the question (at the time of this writing).
That being said, I did not specify a "wildcard" character. I POSTed empty strings for data I did not have and, of course, received a broader data set in the response. For example, my JSON looked something like this:
{
"w":"val1",
"x":"",
"y":"val3",
"z":{
"zz":{
"za":"",
"zb":"",
"zc":"val7"
}
}
}
Ultimately, I could not do exactly what I described in the question. I still do not know whether that is because 1) it is currently impossible to do this in JSON, or 2) it is possible, but I do not know how.
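JSON itself has no wildcard value, so any "match anything" meaning has to be implemented by whatever consumes the parameters. Below is a minimal sketch of a common pattern, assuming the receiving code treats an empty string or a hypothetical "*" marker as "no filter": map it to NULL and write each condition as `(param IS NULL OR column = param)`. It uses Python's built-in `sqlite3` and a made-up `items` table rather than Sybase; a real stored procedure would need its own equivalent of this `WHERE` clause.

```python
import sqlite3

WILDCARD = "*"  # hypothetical marker; JSON itself gives "*" no special meaning


def to_param(value):
    """Map the wildcard marker or an empty string to None, meaning "no filter"."""
    return None if value in (WILDCARD, "") else value


def find_rows(conn, za, zb, zc):
    params = [to_param(v) for v in (za, zb, zc)]
    sql = (
        "SELECT id, za, zb, zc FROM items "
        "WHERE (? IS NULL OR za = ?) "
        "AND (? IS NULL OR zb = ?) "
        "AND (? IS NULL OR zc = ?) "
        "ORDER BY id"
    )
    # Each parameter is bound twice: once for the NULL test, once for the comparison.
    args = [p for p in params for _ in range(2)]
    return conn.execute(sql, args).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, za TEXT, zb TEXT, zc TEXT)")
conn.executemany(
    "INSERT INTO items (za, zb, zc) VALUES (?, ?, ?)",
    [
        ("val9", "1234567890", "val7"),
        ("val9", "0987654321", "val7"),
        ("val1", "1111111111", "val2"),
    ],
)
print(find_rows(conn, "val9", "*", "val7"))  # both rows with za="val9" and zc="val7"
```

The design choice is that the wildcard lives in the server-side query, so the client can keep POSTing plain JSON such as "zb":"*" or "zb":"".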

Related

How to filter by tag in Jaeger

When trying to filter by tag, there is a small popup:
I have looked around for logfmt documentation, but all I can find is the key=value format.
My questions are:
Is there a way to do something more sophisticated (starts_with, not equal, contains, etc.)?
I am trying to filter by url using http.url="http://example.com?bla=bla&foo=bar". I am pretty sure the value exists because I am copy/pasting from my trace. I am getting no results. Do I need to escape characters or do something else for this to work?
I did some research on logfmt as well. Based on the documentation of the original implementation and on the Python implementation of the parser (and its tests), I would say that it doesn't support anything more sophisticated (like starts_with, not equal, or contains). This is because the output of the parser is a simple dictionary (with no regex involved in the values).
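To illustrate the point about the parser's output, here is a minimal, hypothetical logfmt-style parser (not the `logfmt` package's code and not Jaeger's): whatever the query string contains, it comes back as a flat key-to-string dictionary, so a filter built on it can only express exact equality per key. The `matches` helper is likewise an assumption about how such a dictionary would be applied to span tags.

```python
def parse_logfmt(line):
    """Parse 'key=value key="quoted value"' pairs into a flat dict of strings.

    Simplification (hypothetical): bare keys without '=' map to an empty string.
    """
    result = {}
    i, n = 0, len(line)
    while i < n:
        while i < n and line[i] == " ":  # skip separators
            i += 1
        if i >= n:
            break
        start = i
        while i < n and line[i] not in "= ":
            i += 1
        key = line[start:i]
        value = ""
        if i < n and line[i] == "=":
            i += 1
            if i < n and line[i] == '"':  # quoted value, may contain spaces
                i += 1
                chars = []
                while i < n and line[i] != '"':
                    if line[i] == "\\" and i + 1 < n:  # keep the escaped character
                        i += 1
                    chars.append(line[i])
                    i += 1
                i += 1  # step past the closing quote
                value = "".join(chars)
            else:  # bare value runs until the next space
                start = i
                while i < n and line[i] != " ":
                    i += 1
                value = line[start:i]
        result[key] = value
    return result


def matches(span_tags, query):
    """Exact-equality filter: every parsed key must be present with the same value."""
    wanted = parse_logfmt(query)
    return all(span_tags.get(k) == v for k, v in wanted.items())
```

Because `matches` compares whole strings, a `contains` or `starts_with` query cannot be expressed through the parsed dictionary alone.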
As for the second question, using the same mentioned Python parser, I was able to double-check that your filter looks fine:
from logfmt import parse_line
parse_line('http.url="http://example.com?bla=bla&foo=bar"')
Output:
{'http.url': 'http://example.com?bla=bla&foo=bar'}
This makes me suspect an issue on the Jaeger side, but this is as far as I could go.

How can I write a ravenDB query in the studio that finds all fields that are not empty

I'm new to Lucene, day-1 new. I've read a tutorial on Lucene and spent a while trying to work out how to find a non-null value.
So I have a document called Inspect
The document has two fields I'm interested in: Inspect and Direct.
{
"Inspect": "Feather",
"Direct": {}
}
I want to find all documents where Inspect = "Feather" and Direct is not empty.
I am also interested in finding documents where Direct is empty.
I am doing this in the RavenDB Studio, so I am using Lucene syntax. I have tried a few things like
Inspect: Feather
And NOT
Direct: [[NULL_VALUE]]
However this doesn't seem to work. Any advice or some direction would be much appreciated.
Cheers
You need to run a query like this:
Inspect: Feather AND NOT Direct.Count: 0
Comparing against null fails because Direct is not null; it is an empty object. With .Count you are counting the number of properties in the object, which seems to be what you want.
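The difference between "null" and "empty object" can be illustrated in plain Python. This is a hypothetical data set and only mirrors the semantics of the two checks; it is not RavenDB's API or how RavenDB evaluates `Direct.Count`. By the same reasoning, presumably `Inspect: Feather AND Direct.Count: 0` would find the documents whose Direct is empty.

```python
# Hypothetical documents shaped like the question's Inspect document.
docs = [
    {"Id": "inspects/1", "Inspect": "Feather", "Direct": {}},
    {"Id": "inspects/2", "Inspect": "Feather", "Direct": {"Target": "Wing"}},
    {"Id": "inspects/3", "Inspect": "Stone", "Direct": {"Target": "Rock"}},
]

feathers = [d for d in docs if d["Inspect"] == "Feather"]

# A null check keeps both feathers: an empty object {} is not null.
not_null = [d["Id"] for d in feathers if d["Direct"] is not None]

# Counting properties separates empty objects from non-empty ones.
non_empty = [d["Id"] for d in feathers if len(d["Direct"]) != 0]
empty = [d["Id"] for d in feathers if len(d["Direct"]) == 0]

print(not_null, non_empty, empty)
```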
@stacka Hi! I'm also rather new to RavenDB, but I have some ideas that may help. First, use the '-' (minus) character instead of NOT; it's a convention. Second, you may find that a query cannot be run against the database when a property it uses is not indexed. In that case, create an index that includes the field you want to query against. Hope this helps.

JSON issue with SQL lines return

I have an issue when I try to parse my JSON. I create my JSON "by hand" like this in PHP:
$outp ='{"records":['.$outp.']}';
I build it this way so I can take fields from my database and show them on the page. The thing is, my database has a "description" field where people can describe something, and some people put line breaks in it, for example:
Interphone
Equipe:
Canape-lit
Autre:
Local
When I try to parse my JSON, these line breaks cause an error: "SyntaxError: Unexpected token".
Here's an example of my JSON :
{"records":[{"Parking":"Aucun","Description":"Interphone
Equipé :
Canapé-lit
","Chauffage":"Fioul"}]}
Can someone help me, please?
You've really dug yourself into a very bad hole here.
The problem
The problem you're running into is that raw newline characters (line feed and carriage return) are not valid inside JSON strings. They must be escaped as \n and \r. The full grammar is in the JSON standard (ECMA-404, also published as RFC 8259).
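A quick way to see this rule in action, sketched with Python's standard `json` module rather than PHP (the sample strings are made up): a raw line feed inside a string value is rejected, while the same value written with the two-character escape `\n` parses.

```python
import json


def is_valid_json(text):
    """Return True if Python's (strict) JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A raw line feed inside a string value, as produced by pasting text in by hand.
raw = '{"Description":"Interphone\nCanape-lit"}'
# The same value with the line feed written as the two characters \ and n.
escaped = '{"Description":"Interphone\\nCanape-lit"}'

print(is_valid_json(raw))      # False
print(is_valid_json(escaped))  # True
```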
You need to do two things.
Fix your code
Even though the JSON standard is comparatively simple, you should not create your JSON by hand. You already know why: you have to handle several edge cases and the like. Your users could enter anything on the page, and you need to make sure that it gets properly encoded no matter what.
You need to use a JSON serialization tool. json_encode is built into PHP as of version 5.2. If you can't use it for any reason, find an existing, widely used (and therefore heavily tested) third-party library with a JSON serializer.
If you're asking, "Why can't I create my own serializer?", you could, in theory. Realistically, there is no point. Yours won't be better than existing ones. It will be much more likely to have bugs and to perform worse than something a lot of people have used in production. It will also take much longer to create and test than using an existing one.
If you need this data in code after you pull it back out of the database, then you need a JSON deserializer. json_decode should also be fine, but again, if you can't use it, look for a widely used third party library.
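To make the contrast concrete, here is a sketch in Python (the document's answer is about PHP, where the analogous call would be something like `json_encode(['records' => $rows])`). `build_by_hand` is a hypothetical imitation of the question's string concatenation; `build_with_serializer` hands the whole structure to the standard `json` serializer, which escapes the line breaks, so the output parses back to the original data.

```python
import json

# Sample row taken from the question's example.
rows = [
    {"Parking": "Aucun", "Description": "Interphone\nEquipé :\nCanapé-lit\n", "Chauffage": "Fioul"},
]


def build_by_hand(rows):
    """Mimics the question's string concatenation: values are pasted in unescaped."""
    records = []
    for row in rows:
        pairs = ",".join('"%s":"%s"' % (key, value) for key, value in row.items())
        records.append("{" + pairs + "}")
    return '{"records":[' + ",".join(records) + "]}"


def build_with_serializer(rows):
    """Lets the serializer escape newlines, quotes, backslashes and control characters."""
    return json.dumps({"records": rows})
```

Round-tripping `json.loads(build_with_serializer(rows))` gives back exactly `{"records": rows}`, while `json.loads(build_by_hand(rows))` fails on the first raw line break.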
Fix your data
If you haven't hit production yet, you have really dodged a bullet here, and you can skip this whole section. If you have gone to production and you have data from users, you've got a major problem.
Even after you fix your code, you still have bad data in your production database that won't parse correctly. You have to do something to make this data usable. Unfortunately, it is impossible to automatically recover the original data in every possible case. This is because users might have entered the same characters/substrings that your code added to turn the data into "JSON"; for example, they might have entered a comma-separated list of quoted words: "dog","cat","pig", and "cow". That is an intractable problem, since you know for a fact you didn't properly serialize all your incoming input: there's no way to tell the difference between text your code generated and text the user entered. You're going to have to settle for a best effort, throw errors when you can't figure it out in code, and accept that it might mess up a user's data in some special cases. You might have to fix some things manually.
Start by discussing this with your manager, team lead, whoever you answer to. Assuming that you can't lose the data, this is the most sound process to follow for creating a fix for your data:
1. Create a database dump of your production data.
2. Import that dump into a development database.
3. Develop and test your method of repairing this data against the development database from the last step.
4. Ensure you have a recovery plan for deployments gone wrong. Test this plan in your testing environment.
5. Once you've gone through your typical release process, it's time to release the fixed code and the data update together:
   a. Take the website offline.
   b. Back up the database.
   c. Update the website with the new code.
   d. Implement your data fix.
   e. Verify that it worked.
   f. Bring the site online.
If your data fix doesn't work (possibly because you didn't think of an edge case), you have a nice backup you can restore, and you can cancel the release. Then go back to step 1.
As for how you can fix the data, I don't recommend queries here. I recommend a little script tool. It would have to load the data from the database, pull the string apart, try to identify all the pieces, build up an object from those pieces, and finally serialize them to JSON correctly, and put them back into the database.
Here's an example function of how you might go about pulling the string apart:
const ELEMENT_SEPARATOR = '","';
const KEY_VALUE_SEPARATOR = '":"';

function recover_object_from_malformed_json($malformed_json, $known_keys) {
    $tempData = substr($malformed_json, 14); // Removes {"records":[{" prefix
    $tempData = substr($tempData, 0, -4); // Removes "}]} suffix
    $tempData = explode(ELEMENT_SEPARATOR, $tempData); // Split into what we think are pairs
    $data = array();
    $lastKey = NULL;
    foreach ($tempData as $i) {
        $explodedI = explode(KEY_VALUE_SEPARATOR, $i, 2); // Split what we think is a key/value into key and value
        // Only treat it as a key if it is a known key AND a value follows the separator
        if (count($explodedI) === 2 && in_array($explodedI[0], $known_keys, true)) {
            // It's a key
            $lastKey = $explodedI[0];
            if (array_key_exists($lastKey, $data)) {
                throw new RuntimeException('Duplicate key: ' . $lastKey);
            }
            // Assign the value to the key
            $data[$lastKey] = $explodedI[1];
        }
        else {
            // This isn't a key/value pair, near as we can tell,
            // so it must actually be part of the last value:
            // the user entered the delimiter as part of the value.
            if (is_null($lastKey)) {
                // This one is REALLY messed up
                throw new RuntimeException('Does not begin with a known key');
            }
            // Put back the separator that explode() removed, then the rest of the text
            $data[$lastKey] .= ELEMENT_SEPARATOR . $i;
        }
    }
    return $data;
}
Note that I'm assuming that your "list" is a single element. This gets much harder and much messier if you have more than one. You'll also need to know ahead of time what keys you expect to have. The bottom line is that you have to undo whatever your code did to create the "JSON", and you have to do everything you can to try to not mess up a user's data.
You would use it something like this:
$knownKeys = ["Parking", "Description", "Chauffage"];
// Fetch your rows and loop over them
foreach ($dbRows as $row) {
    try {
        $dataFromDb = $row['myData']; // or however you would pull out this string
        $recoveredData = recover_object_from_malformed_json($dataFromDb, $knownKeys);
        // Serialize it properly, then save it back to the DB
        $fixedJson = json_encode($recoveredData);
        // Write $fixedJson back to this row here, then make sure to commit.
    }
    catch (Exception $e) {
        // Log the row's ID, the content that couldn't be fixed, and the exception
        // Make sure to roll back here
    }
}
(Forgive me if the database stuff looks really wonky. I don't do PHP, so I have no idea how that code should look. Hopefully, you can at least get the concept.)
Why I don't recommend trying to parse your data as JSON to recover it
The bottom line is that your data in the database is not JSON. If you try to parse it as such, all the other edge cases you didn't handle properly will get screwed up in the process. You'll see bad things like:
\\ becomes \
\j becomes j
\t becomes a tab character
In the end, it will just mess up your data even more.
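A small Python illustration of that point (the input text is made up): even a lenient parser that accepts the raw line feed will reinterpret the user's backslash as an escape sequence. Python's `json.loads(..., strict=False)` is used only because it tolerates control characters inside strings; PHP's parser is not shown here and may behave differently.

```python
import json

# Hypothetical user input: a line break plus a Windows-style path.
user_text = "Interphone\nC:\\temp"  # contains a real line feed and a real backslash

# What the hand-built "JSON" stored: the text pasted in with no escaping.
stored = '{"records":[{"Description":"' + user_text + '"}]}'

# strict=False tolerates the raw line feed, but "\t" is now read as an escape.
recovered = json.loads(stored, strict=False)["records"][0]["Description"]

print(repr(user_text))
print(repr(recovered))
print(recovered == user_text)  # False: the backslash and "t" became a tab
```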
Conclusion
This is a huge mess, and you should never try to convert something into a standard format without using a properly built, well tested serializer. Fixing the data is going to be hard, and it's going to take time. I also seriously doubt you have a lot of background in text processing techniques, and lacking that knowledge is going to make this harder. You can get some good info on text processing by studying how compilers are made. Good luck.

Fastest key/value pair container in Objective-C

I am creating a syntax highlighting engine. My need is very specific. Keywords will be associated with their respective attribute array via a pointer. The data structure will look something like:
dict = {
"printf": keyword_attr_ptr
, "sprintf": keyword_attr_ptr
, "#import": special_attr_ptr
, "string": lib_attr_ptr
}
The lookup needs to be very fast, as I will be iterating over this list on every keypress.
I'm asking this question because I cannot find any good documentation on how NSDictionary caches (if it does) and looks up values by their keys (does it use a map? a hashmap?). Can I rely on NSDictionary to be optimized to search for keys by strings?
When I did something similar a long while ago, I used the MFC CMap class with very good results. NSDictionary appears to be the equivalent of CMap, but the key type isn't specified, and the NSDictionary documentation clearly states that a key can be any type of object. I just want to make sure I can rely on it to return results extremely fast before I put a lot of energy into this problem.
UPDATE 1
After a day of research, I asked the question on SO and found the answer immediately after... go figure.
This is the documentation related to Dictionaries:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Collections/Articles/Dictionaries.html
It uses a hash table to manage its storage. I guess the short answer is that it's almost equivalent to CMap.
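As a language-neutral sketch of why a hash table suits this workload (written in Python, whose `dict` is also a hash table; this is not Objective-C or NSDictionary code): each token is looked up once at average constant-time cost, and several keywords share the same attribute object, mirroring the shared attribute pointers in the question. The attribute values and the token regex are made up for illustration.

```python
import re

# Hypothetical attribute objects; sharing one object mimics sharing one pointer.
KEYWORD_ATTR = {"color": "blue"}
SPECIAL_ATTR = {"color": "purple"}
LIB_ATTR = {"color": "teal"}

ATTRS = {
    "printf": KEYWORD_ATTR,
    "sprintf": KEYWORD_ATTR,
    "#import": SPECIAL_ATTR,
    "string": LIB_ATTR,
}

TOKEN = re.compile(r"#?\w+")  # rough tokenizer: optional '#' followed by word characters


def highlight(line):
    """Return (start, end, attrs) spans for the tokens found in the table."""
    spans = []
    for m in TOKEN.finditer(line):
        attrs = ATTRS.get(m.group())  # one hash lookup per token
        if attrs is not None:
            spans.append((m.start(), m.end(), attrs))
    return spans
```

For example, `highlight('#import <stdio.h> printf("%s", string);')` returns spans for `#import`, `printf` and `string` only.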

TSearch2 - dots explosion

The following conversion
SELECT to_tsvector('english', 'Google.com');
returns this:
'google.com':1
Why doesn't the TSearch2 engine return something like this?
'google':2, 'com':1
Or, how can I make the engine return the split-up string shown above?
I just need "Google.com" to be findable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct in that the parser is recognizing it as a hostname, which is why it doesn't break it up.
There are 3 other things you can do, off the top of my head:
1. You can disable the host parsing in the database. See the PostgreSQL documentation for details. E.g. something like:
   ALTER TEXT SEARCH CONFIGURATION your_parser_config
   DROP MAPPING FOR url, url_path
2. You can write your own custom dictionary.
3. You can pre-parse your data in some manner before it's inserted into the database (maybe splitting all domains before they go into the database).
I had a similar issue to you last year and opted for solution (2), above.
My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C, though :)
The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1 for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).
The trouble with a custom dictionary is that you will no longer get stemming (i.e., www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries, which would solve this problem.
First off, in case you're not aware, tsearch2 is deprecated in favor of the built-in functionality:
http://www.postgresql.org/docs/9/static/textsearch.html
As for your actual question, google.com gets recognized as a host by the parser:
http://www.postgresql.org/docs/9.0/static/textsearch-parsers.html
If you don't want this to occur, you'll need to pre-process your text accordingly (or use a custom parser).
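A minimal sketch of the pre-processing route, assuming you control the text before it is stored: append the dot-separated pieces of every host-like token, so the parser sees `google` and `com` as ordinary words in addition to the `google.com` host token. The `HOST` regex is a rough assumption and will also match things like version numbers or decimals (`3.14`).

```python
import re

# Rough, hypothetical pattern for host-like tokens: word chunks joined by dots.
HOST = re.compile(r"\b[\w-]+(?:\.[\w-]+)+\b")


def expand_hosts(text):
    """Append the dot-separated parts of each host-like token to the text."""
    extra = []
    for m in HOST.finditer(text):
        extra.extend(m.group().split("."))
    return text if not extra else text + " " + " ".join(extra)


print(expand_hosts("Visit Google.com today"))
```

You would then index the expanded text, e.g. `to_tsvector('english', expanded_text)`, and a query such as `to_tsquery('english', 'google')` should match it, since the appended pieces are parsed as plain words.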