I have a file.csv file with over 180,000 lines in it. I need to pick out only about 8 lines from it. Each of these lines has the same id, so this is what the file would look like:
"id", "name", "subid"
"1", "Entry no 1", "4234"
"1", "Entry no 2", "5233"
"1", "Entry no 3", "2523"
. . .
"1", "Entry no 8", "2322"
"2", "Entry no 1", "2344"
Is there a way for me to pick out all the data with the id 1 (or another number) without indexing the whole file into a database (either SQLite or Core Data)? Indexing 180,000 records would cause major performance issues for the app. This is all for the iPhone and is on iOS 5.
Thanks for the help.
Just parse the CSV and store the values in a local variable. For parsing CSV in Objective-C, check out the following tutorials:
http://www.macresearch.org/cocoa-scientists-part-xxvi-parsing-csv-data
http://cocoawithlove.com/2009/11/writing-parser-using-nsscanner-csv.html
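For example, a minimal Foundation-only sketch (the file name and column layout come from the question; the early break assumes rows with the same id are contiguous, as in the sample):

// Sketch: scan line by line and keep only rows whose first field matches
// the wanted id. No database, no full index. Note this naive split does
// not handle quoted fields that contain commas or newlines.
NSString *path = [[NSBundle mainBundle] pathForResource:@"file" ofType:@"csv"];
NSString *contents = [NSString stringWithContentsOfFile:path
                                               encoding:NSUTF8StringEncoding
                                                  error:NULL];
NSString *wantedPrefix = @"\"1\","; // rows for id 1 start with "1",
NSMutableArray *matches = [NSMutableArray array];

for (NSString *line in [contents componentsSeparatedByString:@"\n"]) {
    if ([line hasPrefix:wantedPrefix]) {
        [matches addObject:line];
    } else if ([matches count] > 0) {
        break; // same-id rows are contiguous, so we can stop early
    }
}
// matches now holds the ~8 raw CSV lines for id 1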
Kind regards,
Bo
I would strongly recommend putting that in Core Data. Sure, it will be indexed, but that is actually a good thing, since your lookups will be way faster. Parsing that document every time is going to be far more demanding than looking it up in Core Data; the overhead is a small price to pay.
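If you do go the Core Data route, the lookup itself is short. A minimal sketch, assuming the rows were imported as a hypothetical Entry entity with an indexed entryID attribute (these names are illustrations, not from the original posts):

// "Entry" and "entryID" are hypothetical names for the imported CSV rows.
NSFetchRequest *request = [[NSFetchRequest alloc] init];
[request setEntity:[NSEntityDescription entityForName:@"Entry"
                               inManagedObjectContext:managedObjectContext]];
[request setPredicate:[NSPredicate predicateWithFormat:@"entryID == %@", @"1"]];

NSError *error = nil;
NSArray *rows = [managedObjectContext executeFetchRequest:request error:&error];
// with an indexed attribute this is a fast lookup, not a scan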
Sounds like a good job for Dave DeLong's CHCSVParser.
It works a bit like NSXMLParser, so you can just skip all the lines you don't want, and keep the 8 lines you do want.
I have a dataframe output as a result of running some code, like so:
df = pd.DataFrame({
    "i": self.direct_hit_i,
    "domain name": self.domain_list,
    "j": self.direct_hit_j,
    "domain name 2": self.domain_list2,
    "domain name cleaned": self.clean_domain_list,
    "domain name cleaned 2": self.clean_domain_list2
})
All I was really looking for was a way to save these data to some file, e.g. txt or csv, but in a way where the columns of data align with the header. I was using df.to_csv() with a \t delimiter, but because the strings and numbers have different lengths, the elements within each row never quite line up as a column with the corresponding header. So I resorted to using
with open('./filename.txt', 'w') as fo:
    fo.write(df.__repr__())
But bear in mind that the columns in the dataframe are really long lists. So for small lists it returns
which is exactly what I want. However, when I have very big lists it gives me
So as seen below, the output is truncated. I would like it not to be truncated, since I'll need to manually scroll down and verify things.
Try the syntax:
with open('./filename.txt', 'w') as fo:
    fo.write(f'{df!r}')
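Note that f'{df!r}' is still subject to pandas' display limits, just like __repr__(). If the output still truncates, DataFrame.to_string() renders every row with aligned columns regardless of those limits. A short sketch (df and the file path are from the question):

import pandas as pd

# to_string() formats the full frame (no truncation) with aligned columns
with open('./filename.txt', 'w') as fo:
    fo.write(df.to_string())

# or keep repr() but lift pandas' display limits first:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)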
Another way of doing this export to CSV would be to use a tool like Mito, which, full disclosure, I'm the author of. It should allow you to export to CSV more easily than the process here!
Currently I am only able to translate single words; in the case of wanting to translate an entire sentence, I don't know how to do it.
Taking the following example JSON:
{
    "Hello": "Hola",
    "how": "como",
    "You go": "te va",
    "text 4": "texto 4",
    "text 5": "texto 5"
}
So when entering all the text "Hola como te va texto 4 texto 5", I should get "Hello how do you go text 4 text 5" as the translation, but I only manage to do it word by word, for example:
<p>{{$t("Hola")}}</p>
which indeed gets 'Hola' in response in the browser.
Lucky you, I just learned this a few weeks ago.
So what you need actually exists in the documentation. I believe it is called Linked Locale Messages.
So here's an example for you..
{
    "Hello": "Hola",
    "how": "como",
    "You_go": "te va",
    "text4": "texto 4",
    "text5": "texto 5",
    "sentence": "@:Hello @:how @:You_go @:text4 @:text5"
}
Then try
$t('sentence')
So it will actually link to the "Hello" part of your JSON. If you only have @:Hello in sentence, it will result in Hola. It's kind of hard to read, but that's the syntax.
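For context, a minimal self-contained sketch of how this wires together (the createI18n setup, the en locale name, and the inline template are assumptions for illustration, not the original fiddle):

// Vue 3 + vue-i18n v9; assumes a Vue build that compiles templates
import { createApp } from 'vue'
import { createI18n } from 'vue-i18n'

const i18n = createI18n({
  locale: 'en',
  messages: {
    en: {
      "Hello": "Hola",
      "how": "como",
      "You_go": "te va",
      "text4": "texto 4",
      "text5": "texto 5",
      // each @:key is replaced by the value of that key
      "sentence": "@:Hello @:how @:You_go @:text4 @:text5"
    }
  }
})

const app = createApp({ template: '<p>{{ $t("sentence") }}</p>' })
app.use(i18n)
app.mount('#app') // renders: <p>Hola como te va texto 4 texto 5</p>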
And if you didn't understand, I just made a fiddle for you to experiment with the feature. It has two languages, en and test, which represent whatever languages you use in the example. Here is the link to jsFiddle
Hope it helps, feel free to ask as well :D
I am pretty new to PostgreSQL and not too familiar with SQL yet, but I'm trying to learn.
In my database I want to store huge JSON files (~2 million lines, 40 MB) and later query them as fast as possible. Right now it is too slow, so I figured indexing should do the trick.
The problem is I do not know how to index the file, since it is a bit tricky. I have been working on it the whole day now and am starting to get desperate.
My table is called "replays" and the JSON column "replay_file".
So my files look like this:
"replay": [
{
"data": {
"posX": 182,
"posY": 176,
"hero_name": "CDOTA_Unit_Hero_EarthSpirit"
},
"tick": 2252,
"type": "entity"
},
{
"data": {
"posX": 123,
"posY": 186,
"hero_name": "CDOTA_Unit_Hero_Puck"
},
"tick": 2252,
"type": "entity"
}, ...alot more lines... ]}
I tried to get all the entries with, say, hero_name: Puck.
So I tried this:
SELECT * FROM replays r, json_array_elements(r.replay_file#>'{replay}') obj WHERE obj->'data'->>'hero_name' = 'CDOTA_Unit_Hero_Puck';
which works, but only for smaller files.
So I want to create an index like this:
CREATE INDEX hero_name_index ON
replays ((json_array_elements(r.replay_file#>'{replay}')->'data'->'hero_name);
But it doesn't work. I have no idea how to reach that deep into the file and get this stuff indexed.
I hope you understand my problem, since my English isn't the best, and can help me out here. I just don't know what else to try.
Kind regards and thanks a lot in advance,
Peter
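A hedged sketch of one possible direction, not from the original post: it assumes the column can be converted to jsonb (which PostgreSQL requires for GIN indexing) and that a containment query fits the use case.

-- Store the document as jsonb so it can be GIN-indexed.
ALTER TABLE replays ALTER COLUMN replay_file TYPE jsonb USING replay_file::jsonb;

-- jsonb_path_ops builds a compact index for @> containment queries.
CREATE INDEX replay_hero_idx ON replays
    USING gin ((replay_file -> 'replay') jsonb_path_ops);

-- The containment operator @> can use that index: this finds rows whose
-- "replay" array contains an element with the given hero_name...
SELECT *
FROM replays
WHERE replay_file -> 'replay' @>
      '[{"data": {"hero_name": "CDOTA_Unit_Hero_Puck"}}]';

-- ...and the per-element expansion then only runs on the matching rows:
SELECT obj
FROM replays r,
     jsonb_array_elements(r.replay_file -> 'replay') obj
WHERE r.replay_file -> 'replay' @>
      '[{"data": {"hero_name": "CDOTA_Unit_Hero_Puck"}}]'
  AND obj -> 'data' ->> 'hero_name' = 'CDOTA_Unit_Hero_Puck';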
This should be a very simple one (been searching for a solution all day - read a thousand and a half posts).
I put a test row in my HBase table in the hbase shell:
put 'iEngine','testrow','SVA:SourceName','Journal of Fun'
I can get the value for a column family using the REST API in DHC Chrome:
https://ienginemaster.azurehdinsight.net/hbaserest/iEngine/testrow/SVA
I can't seem to get it for the specific cell: https://ienginemaster.azurehdinsight.net/hbaserest/iEngine/testrow/SVA:SourceName
I get back a 400 error.
When successfully asking for just the family, I get back:
{
    "Row": [{
        "key": "dGVzdHJvdw==",
        "Cell": [{
            "column": "U1ZBOlNvdXJjZU5hbWU=",
            "timestamp": 1440602453975,
            "$": "Sm91cm5hbCBvZiBGdW4="
        }]
    }]
}
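For reference, the fields in that response are base64-encoded. A quick sketch (Python, just for illustration) to decode them:

import base64

# decode the base64-encoded fields from the HBase REST response
for encoded in ("dGVzdHJvdw==", "U1ZBOlNvdXJjZU5hbWU=", "Sm91cm5hbCBvZiBGdW4="):
    print(base64.b64decode(encoded).decode("utf-8"))

# prints:
# testrow
# SVA:SourceName
# Journal of Fun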
I tried replacing the encoded value for SVA:SourceName, and a thousand other things. I'm assuming I'm missing something simple.
Also, the following works:
hbase(main):012:0> get 'iEngine', 'testrow', 'SVA:SourceName'
COLUMN CELL
SVA:SourceName timestamp=1440602453975, value=Journal of Fun
1 row(s) in 0.0120 seconds
hbase(main):013:0>
I opened a case with Microsoft support. I received confirmation that it is a bug (IIS and the colon separator not working). They are working on a fix; they are slightly delayed as they decide on the "best" way to fix it.
Started using Cypher about a week ago (really like it). In the 'browser' interface I'm running two queries:
1) start n=node:Node(name="foo") match (n)-[r*..4]-(m) return n,m
2) match (n{name:"foo"})-[r*..4]-(m) return n,m
The first query returns almost immediately; the second has taken more than an hour and counting. Naively I would think these would be equivalent, but clearly they are not. I ran a 'smaller' (path just up to 1) version of both in the neo4j-shell so I could profile them.
profile start n=node:Node(name="foo") match (n)-[r*..1]-(m) return n,m;
ColumnFilter(symKeys=["m", "n", " UNNAMED51", "r"], returnItemNames=["n", "m"], _rows=4, _db_hits=0)
TraversalMatcher(start={"expr": "Literal(foo)", "identifiers": ["n"], "key": "Literal(name)",
"idxName": "Node", "producer": "NodeByIndex"}, trail="(n)-[*1..1]-(m)", _rows=4, _db_hits=5)
profile match (n{name:"foo"})-[r*..1]-(m) return n,m
ColumnFilter(symKeys=["n", "m", " UNNAMED33", "r"], returnItemNames=["n", "m"], _rows=4, _db_hits=0)
Filter(pred="Property(n,name(0)) == Literal(foo)", _rows=4, _db_hits=196870)
TraversalMatcher(start={"producer": "AllNodes", "identifiers": ["m"]},
trail="(m)-[*1..1]-(n)", _rows=196870, _db_hits=396980)
From other Stack Overflow questions I understand db_hits is good to look at, so it looks like the second query has basically done a linear scan (my db has almost 400k nodes). This seems to be confirmed by the "producer" value of "AllNodes" instead of "NodeByIndex".
Obviously I need to specify the match (predicate) differently so that it hits the index. The index is called 'Node' on the property 'name'. My Google and Stack Overflow searching is failing me... how do I specify the condition in the match so that it hits the index?
Update:
After some poking around, it appears I'm using a 'legacy' index and then trying to hit it with the new-style (no start clause) query... (kinda extrapolating here). So I can do the following:
create index ON :label(name)
and that would provide an index for a particular label on the name property, but I really want an index (I guess non-legacy index) on ALL the node names. I have use cases where that's important (user may not know the label but does know the name).
Any suggestions or guidance is much appreciated.
Right now there is no global schema index, so you would probably want to pick a generic label like Entity or Node, apply it to all your nodes, and create an index like this:
create index on :Entity(name);
And add that Entity label to all your nodes.
match (n) set n:Entity;
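Once the label and index are in place, the original query can target them directly. A sketch, using the Entity label assumed above:

// the planner can now use the :Entity(name) schema index for the lookup
match (n:Entity {name:"foo"})-[r*..4]-(m) return n,m;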