I am pretty new to PostgreSQL and not too familiar with SQL yet, but I'm trying to learn.
In my database I want to store huge JSON files (~2 million lines, 40 MB) and later query them as fast as possible. Right now it is too slow, so I figured indexing should do the trick.
The problem is that I don't know how to index the file, since it is a bit tricky. I have been working on it the whole day now and am starting to get desperate.
My table is called "replays" and the JSON column "replay_file".
So my files look like this:
"replay": [
{
"data": {
"posX": 182,
"posY": 176,
"hero_name": "CDOTA_Unit_Hero_EarthSpirit"
},
"tick": 2252,
"type": "entity"
},
{
"data": {
"posX": 123,
"posY": 186,
"hero_name": "CDOTA_Unit_Hero_Puck"
},
"tick": 2252,
"type": "entity"
}, ...alot more lines... ]}
I tried to get all the entries with, say, hero_name: Puck.
So I tried this:
SELECT * FROM replays r, json_array_elements(r.replay_file#>'{replay}') obj WHERE obj->'data'->>'hero_name' = 'CDOTA_Unit_Hero_Puck';
This works, but only for smaller files.
So I wanted to create an index like this:
CREATE INDEX hero_name_index ON
replays ((json_array_elements(r.replay_file#>'{replay}')->'data'->'hero_name);
But it doesn't work. I have no idea how to reach that deep into the document and get this stuff indexed.
I hope you understand my problem, since my English isn't the best, and can help me out here. I just don't know what else to try.
Kind regards, and thanks a lot in advance,
Peter
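For reference, here is one way this kind of lookup can be made indexable. This is only a sketch, and it assumes the column is (or can be converted to) jsonb rather than json, which goes beyond what the original post says; it also needs PostgreSQL 9.4 or later. An index cannot be built over a set-returning function such as json_array_elements, but a GIN index over a whole jsonb document supports the @> containment operator:

-- Sketch only: assumes replay_file is jsonb (plain json columns
-- cannot be indexed with GIN).
CREATE INDEX replay_file_gin_idx ON replays USING GIN (replay_file);

-- This containment query can use the index to find matching rows:
-- it asks whether the "replay" array contains an element whose
-- data.hero_name is CDOTA_Unit_Hero_Puck.
SELECT *
FROM replays
WHERE replay_file @> '{"replay": [{"data": {"hero_name": "CDOTA_Unit_Hero_Puck"}}]}';

Once the index has narrowed things down to the matching rows, jsonb_array_elements can still be used to pull the individual entries out of each document.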
My Splunk data looks like this:
{
"name": "john",
"foo": []
}
Sometimes foo is empty, and sometimes it has data in it. I want to query for all the events where it is EMPTY, using SPL2.
I tried foo=[] and I tried foo="[]" but neither works.
You can try the following syntax:
<your_search>
| where isnull('foo{}')
I've spent hours on this without a solution. I'm having a terrible time identifying and correcting an error when validating in the Google Structured Data Testing Tool (SDTT). After dozens of revisions, I continue to get a "Missing ',' or ']' in array declaration" error. I'd appreciate it if someone would take a look, make the needed corrections, or show me what I'm overlooking. Here's the code snippet: https://drive.google.com/drive/folders/1HNJgZrGa7_F6-7FuGCbL2Y0vFPGsX7MQ
Your top sameAs is a bit mixed up. I suspect you wanted to quote every URL and drop the last comma, e.g.:
"sameAs" : [ "https://plus.google.com/100804793716209856515", "https://plus.google.com/115455274861158767219", "https://www.facebook.com/pg/TallentRoofingInc/about/", "https://www.yelp.com/biz/tallent-roofing-mckinney-2", "https://www.yelp.com/biz/tallent-roofing-melissa", "https://www.yelp.com/biz/tallent-roofing-el-paso-2", "https://www.yelp.com/biz/tallent-roofing-alpine", "https://www.yelp.com/biz/tallent-roofing-artesia" ]
Your @graph is an array, but due to some bad closing of }s it directly includes properties. Remove the } on the line before "aggregateRating" and add one to the line after "reviewCount", i.e.:
,
"aggregateRating": {
  "@type": "AggregateRating",
  "ratingValue": "4.9",
  "ratingCount": "57",
  "reviewCount": "53"
}
},
I have a SQL column filled with JSON documents, one per row:
[{
"ID":"TOT",
"type":"ABS",
"value":"32.0"
},
{
"ID":"T1",
"type":"ABS",
"value":"9.0"
},
{
"ID":"T2",
"type":"ABS",
"value":"8.0"
},
{
"ID":"T3",
"type":"ABS",
"value":"15.0"
}]
How is it possible to transform it into tabular form? I tried with the Redshift json_extract_path_text and json_extract_array_element_text functions, and I also tried json_each and json_each_text (on Postgres), but didn't get what I expected... any suggestions?
The desired results should appear like this:
T1 T2 T3 TOT
9.0 8.0 15.0 32.0
I assume you printed 4 rows. In PostgreSQL,
SELECT this_column->'ID'
FROM that_table;
will return a column of JSON strings. Use ->> if you want a text column. More info here: https://www.postgresql.org/docs/current/static/functions-json.html
In case you are using an old PostgreSQL (before 9.3), this gets harder :)
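To get the single-row layout shown in the question, the unnested elements can be pivoted with conditional aggregation. This is only a sketch for PostgreSQL 9.3+, reusing the placeholder names that_table and this_column from above; with more than one source row you would also group by some row key:

-- Unnest the JSON array into one row per element, then fold the
-- elements back into columns keyed on their "ID" field.
SELECT
  MAX(CASE WHEN elem->>'ID' = 'T1'  THEN elem->>'value' END) AS "T1",
  MAX(CASE WHEN elem->>'ID' = 'T2'  THEN elem->>'value' END) AS "T2",
  MAX(CASE WHEN elem->>'ID' = 'T3'  THEN elem->>'value' END) AS "T3",
  MAX(CASE WHEN elem->>'ID' = 'TOT' THEN elem->>'value' END) AS "TOT"
FROM that_table,
     json_array_elements(this_column) AS elem;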
Your best option is to use COPY from JSON Format. This will load the JSON directly into a normal table format. You then query it as normal data.
However, I suspect that you will need to slightly modify the format of the file by removing the outer [...] square brackets and also the commas between records, e.g.:
{
"ID": "TOT",
"type": "ABS",
"value": "32.0"
}
{
"ID": "T1",
"type": "ABS",
"value": "9.0"
}
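For illustration, the load might then look like the following. This is only a sketch: the table definition, S3 path, and IAM role are placeholders, and depending on the casing of your JSON keys you may need a JSONPaths file (or JSON 'auto ignorecase' on newer Redshift) instead of plain 'auto':

-- Placeholder target table matching the JSON fields.
CREATE TABLE totals (
    id    VARCHAR(8),
    type  VARCHAR(8),
    value VARCHAR(8)
);

-- 'auto' maps JSON field names to column names.
COPY totals
FROM 's3://my-bucket/data.json'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS JSON 'auto';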
If, however, your data is already loaded and you cannot re-load the data, you could either extract the data into a new table, or add additional columns to the existing table and use an UPDATE command to extract each field into a new column.
Or, in the very worst case, you can use one of the JSON Functions to access the information in a JSON field, but this is very inefficient for large requests (e.g. in a WHERE clause).
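As a sketch of that extraction route (the table and column names here are placeholders, and it assumes the array elements stay in the order shown in the question, with T1 at index 1):

-- Add a column for one extracted field...
ALTER TABLE loaded_table ADD COLUMN t1_value VARCHAR(8);

-- ...then pull element 1 out of the array and read its "value" key.
UPDATE loaded_table
SET t1_value = json_extract_path_text(
        json_extract_array_element_text(json_col, 1), 'value');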
This should be a very simple one (I've been searching for a solution all day and read a thousand and a half posts).
I put a test row in my HBase table via the hbase shell:
put 'iEngine','testrow','SVA:SourceName','Journal of Fun'
I can get the value for a column family using the REST API in DHC Chrome:
https://ienginemaster.azurehdinsight.net/hbaserest/iEngine/testrow/SVA
I can't seem to get it for the specific cell: https://ienginemaster.azurehdinsight.net/hbaserest/iEngine/testrow/SVA:SourceName
I get back a 400 error.
When successfully asking for just the family, I get back:
{
  "Row": [{
    "key": "dGVzdHJvdw==",
    "Cell": [{
      "column": "U1ZBOlNvdXJjZU5hbWU=",
      "timestamp": 1440602453975,
      "$": "Sm91cm5hbCBvZiBGdW4="
    }]
  }]
}
I tried replacing the encoded value for SVA:SourceName, and a thousand other things. I'm assuming I'm missing something simple.
Also, the following works:
hbase(main):012:0> get 'iEngine', 'testrow', 'SVA:SourceName'
COLUMN CELL
SVA:SourceName timestamp=1440602453975, value=Journal of Fun
1 row(s) in 0.0120 seconds
hbase(main):013:0>
I opened a case with Microsoft support and received confirmation that it is a bug (IIS and the colon separator not working). They are working on a fix; they are slightly delayed as they decide on the "best" way to fix it.
I have a file.csv file with over 180,000 lines in it. I need to pick out only about 8 lines from it. Each of these lines has got the same id, so this is what the file would look like:
"id", "name", "subid"
"1", "Entry no 1", "4234"
"1", "Entry no 2", "5233"
"1", "Entry no 3", "2523"
. . .
"1", "Entry no 8", "2322"
"2", "Entry no 1", "2344"
Is there a way for me to pick out just the data with id 1 (or any other number) without indexing the whole file into a database (either SQLite or Core Data)? Having to index 180,000 records would cause major performance issues for the app. This is all for the iPhone, on iOS 5.
Thanks for the help.
Just parse the CSV and store the values in a local variable. For parsing CSV via Objective-C, check out the following tutorials:
http://www.macresearch.org/cocoa-scientists-part-xxvi-parsing-csv-data
http://cocoawithlove.com/2009/11/writing-parser-using-nsscanner-csv.html
Kind regards,
Bo
I would strongly recommend putting that in Core Data. Sure, it will be indexed, but that is actually a good thing, since your lookups will be far faster. Parsing that document every time is going to be much more demanding than looking it up in Core Data; the overhead is a small price to pay.
Sounds like a good job for Dave DeLong's CHCSVParser.
It works a bit like NSXMLParser, so you can just skip all the lines you don't want and keep the 8 lines you do want.