Compare good YOLOv3 results - yolo

I tried to compare my training/testing results with other people's results. I have googled and tried for more than a week to find log files or a results.csv, but cannot find anything. Does anyone have an idea? The code I used for the comparison follows:

Related

How to partition by in pandas and output to a word doc?

I have a table I have filtered from data: my highlights from across the web. Ultimately, I want to output these to a doc file I have, grouped by the page they came from.
I have the API data filtered down to two columns:
url | quote
How do I, for each url, output the quotes to a doc file? Or, just for starters, how do I iterate through the set of quotes for each url?
In SQL it would be something like this:
SELECT quote,
       ROW_NUMBER() OVER (PARTITION BY url) AS sub_header
FROM table
url | quote
https://jotengine.com/transcriptions/WIUL8HBabqxffIDOkUA9Dg | I actually think that the bigger problem is not necessarily having the ideas. I think everyone has lots of interesting ideas. I think the bigger problem is not killing the bad ideas fast enough. I have the most respect for the Codecademy founders in this respect. I think they tried 12 ideas in seven weeks or something like that, in the summer of YC.
https://jotengine.com/transcriptions/WIUL8HBabqxffIDOkUA9Dg | We were like what the heck is going on here so we went and visited five of our largest customers in New York, this was about three years ago and we said okay, you're using the S3 integration but what the heck are you using it for? For five out of five customers in a row, they said well we have a data engineering team that's taking data from the S3 bucket, converting it into CSV files and managing all the schema-translations and now they're uploading it into a data warehouse like Redshift. The first time I heard that from a customer, I was like okay, that's interesting
I want to output a url header followed by all the quotes I've highlighted. Ideally my final product will be in docx
It would be great if you could provide some source code to help explain your problem. From looking at your question, I would say all you need to do is put your columns into a DataFrame, then export it to Excel:
import pandas as pd  # needed for the DataFrame below

df = pd.DataFrame({"url": url, "quote": quote})
df.to_excel("filename.xlsx", index=False)
Hope this helps.
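If the end goal really is a .docx grouped by page, here is a minimal sketch using groupby together with the python-docx package (python-docx, the output file name and the heading level are my assumptions, since the question only says the final product should be docx):
import pandas as pd
from docx import Document  # python-docx is assumed here, since the question asks for a .docx

df = pd.DataFrame({"url": url, "quote": quote})  # same url/quote columns as above

doc = Document()
for page_url, group in df.groupby("url"):   # one block per url, like PARTITION BY url
    doc.add_heading(page_url, level=2)      # the url header
    for q in group["quote"]:
        doc.add_paragraph(q)                # each highlighted quote under it
doc.save("highlights.docx")                 # hypothetical output file name
Each url then appears once as a heading, with all of its quotes listed underneath.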

Pentaho "Return value id can't be found in the input row"

I have a Pentaho transformation which reads a text file and checks some conditions (which can produce errors, for example a number that should be positive). From these errors I create an Excel file, and in my job I need the number of lines in this error file, plus a log of which lines had problems.
The problem is that sometimes I get the error "The return value id can't be found in the input row".
The error does not occur every time. The job runs every night; it can work without any problems for a month, and then one sunny day I just get this error.
I don't think it is caused by the file, because if I execute the job again with the same file, it works. I can't understand why it fails: the message mentions a value "id", but I don't have such a value/column. Why is it looking for a value that doesn't exist?
Another strange thing is that the step that fails should normally not be executed at all (as far as I know), because no errors were found, so no rows reach that step.
Maybe the problem is connected with the "Prioritize Stream" step? That is where I collect all the errors (which use exactly the same columns). I tried putting a sort before the grouping steps, but it didn't help. Now I'm thinking of trying a "Blocking step".
The problem is that I don't know why this happens or how to fix it. Any suggestions?
Check that all the aggregates in your Group by step have a name.
However, sometimes the error actually comes from a previous step: the Group by (count...) requests data from the Prioritize Stream, and if that step has an error, the error gets reported mistakenly as coming from the Group by rather than from the Prioritize Stream.
Also, you mention a step which should not be executed because there is no data: I do not see any Filter that would prevent rows with a missing id from flowing from the Prioritize Stream to the count.
This is a bug. It happens randomly in one of my transformations, which often ends up with an empty stream (no rows). It mostly works, but once in a while it gives this error. It seems to fail only when the stream is empty, though.

Too many errors [invalid] encountered when loading data into BigQuery

I enriched a public dataset of Reddit comments with data from LIWC (Linguistic Inquiry and Word Count). I have 60 files of about 600 MB each. The idea is to upload them to BigQuery, put them together, and analyze the results. Alas, I ran into some problems.
For a first test I used a sample with 200 rows and 114 columns. Here is a link to the csv I used.
I first asked on Reddit, and fhoffa provided a really good answer. The problem seems to be the newlines (\n) in the body_raw column, as redditors often include them in their text. It seems BigQuery cannot process them.
I tried to load the original data, which I had transferred to Storage, back into BigQuery, unedited and untouched, but hit the same problem. BigQuery cannot even process the original data, which came from BigQuery in the first place...?
Anyway, I can open the csv without problems in other programs such as R, which means the csv itself is not damaged and the schema is not inconsistent. So fhoffa's command should get rid of the problem:
bq load --allow_quoted_newlines --allow_jagged_rows --skip_leading_rows=1 \
tt.delete_201607a myproject.newtablename gs://my_testbucket/dat2.csv \
body_raw,score_hidden,archived,name,author,author_flair_text,downs,created_utc,subreddit_id,link_id,parent_id,score,retrieved_on,controversiality,gilded,id,subreddit,ups,distinguished,author_flair_css_class,WC,Analytic,Clout,Authentic,Tone,WPS,Sixltr,Dic,function,pronoun,ppron,i,we,you,shehe,they,ipron,article,prep,auxverb,adverb,conj,negate,verb,adj,compare,interrog,number,quant,affect,posemo,negemo,anx,anger,sad,social,family,friend,female,male,cogproc,insight,cause,discrep,tentat,certain,differ,percept,see,hear,feel,bio,body,health,sexual,ingest,drives,affiliation,achieve,power,reward,risk,focuspast,focuspresent,focusfuture,relativ,motion,space,time,work,leisure,home,money,relig,death,informal,swear,netspeak,assent,nonflu,filler,AllPunc,Period,Comma,Colon,SemiC,QMark,Exclam,Dash,Quote,Apostro,Parenth,OtherP
The output was:
Too many positional args, still have ['body_raw,score_h...]
If I take away "tt.delete_201607a" from the command, I get the same error message I have now seen many times:
BigQuery error in load operation: Error processing job 'xx': Too many errors encountered.
So I do not know what to do here. Should I get rid of the \n characters with Python? That would probably take days (although I'm not sure, I am not a programmer), as my complete data set is around 55 million rows.
Or do you have any other ideas?
I checked again, and I was able to load the file you left on Dropbox without a problem.
First I made sure to download your original file:
wget https://www.dropbox.com/s/5eqrit7mx9sp3vh/dat2.csv?dl=0
Then I ran the following command:
bq load --allow_quoted_newlines --allow_jagged_rows --skip_leading_rows=1 \
tt.delete_201607b dat2.csv\?dl\=0 \
body_raw,score_hidden,archived,name,author,author_flair_text,downs,created_utc,subreddit_id,link_id,parent_id,score,retrieved_on,controversiality,gilded,id,subreddit,ups,distinguished,author_flair_css_class,WC,Analytic,Clout,Authentic,Tone,WPS,Sixltr,Dic,function,pronoun,ppron,i,we,you,shehe,they,ipron,article,prep,auxverb,adverb,conj,negate,verb,adj,compare,interrog,number,quant,affect,posemo,negemo,anx,anger,sad,social,family,friend,female,male,cogproc,insight,cause,discrep,tentat,certain,differ,percept,see,hear,feel,bio,body,health,sexual,ingest,drives,affiliation,achieve,power,reward,risk,focuspast,focuspresent,focusfuture,relativ,motion,space,time,work,leisure,home,money,relig,death,informal,swear,netspeak,assent,nonflu,filler,AllPunc,Period,Comma,Colon,SemiC,QMark,Exclam,Dash,Quote,Apostro,Parenth,OtherP,oops
As mentioned on Reddit, you need the following options:
--allow_quoted_newlines: There are newlines inside some strings, hence the CSV is not strictly newline delimited.
--allow_jagged_rows: Not every row has the same number of columns.
,oops: There is an extra column in some rows. I added this column to the list of columns.
When it says "too many positional arguments", it's because your command says:
tt.delete_201607a myproject.newtablename
Well, tt.delete_201607a is how I named my table. myproject.newtablename is how you named your table. Choose one, not both.
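Putting the two fixes together, and assuming the file in your bucket has the same extra column as the Dropbox sample, your own command would presumably become:
bq load --allow_quoted_newlines --allow_jagged_rows --skip_leading_rows=1 \
myproject.newtablename gs://my_testbucket/dat2.csv \
body_raw,score_hidden,archived,name,author,author_flair_text,downs,created_utc,subreddit_id,link_id,parent_id,score,retrieved_on,controversiality,gilded,id,subreddit,ups,distinguished,author_flair_css_class,WC,Analytic,Clout,Authentic,Tone,WPS,Sixltr,Dic,function,pronoun,ppron,i,we,you,shehe,they,ipron,article,prep,auxverb,adverb,conj,negate,verb,adj,compare,interrog,number,quant,affect,posemo,negemo,anx,anger,sad,social,family,friend,female,male,cogproc,insight,cause,discrep,tentat,certain,differ,percept,see,hear,feel,bio,body,health,sexual,ingest,drives,affiliation,achieve,power,reward,risk,focuspast,focuspresent,focusfuture,relativ,motion,space,time,work,leisure,home,money,relig,death,informal,swear,netspeak,assent,nonflu,filler,AllPunc,Period,Comma,Colon,SemiC,QMark,Exclam,Dash,Quote,Apostro,Parenth,OtherP,oops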
Are you sure you are not able to load the sample file you left on Dropbox? Or are you getting errors from rows I can't find in that file?
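And if you did still want to strip the \n characters yourself with Python, as considered in the question, it should not take days: a single streaming pass with the csv module (which understands quoted newlines) is enough. This is only a sketch, and the file names are placeholders:
import csv

# Sketch only: rewrite the file with embedded newlines replaced by spaces.
# "dat2.csv" and "dat2_clean.csv" are placeholder file names.
with open("dat2.csv", newline="", encoding="utf-8") as src, \
     open("dat2_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([field.replace("\n", " ").replace("\r", " ") for field in row])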

Pig: how to loop through all fields/columns?

I'm new to Pig. I need to do some calculations on all the fields/columns in a table, but I can't find a way to do it by searching online. It would be great if someone here could give me some help!
For example: I have a table with 100 fields/columns, most of them numeric. I need to find the average of each field/column. Is there an elegant way to do it without repeating AVG(column_xxx) 100 times?
If there are just one or two columns, then I can do
B = group A all;
C = foreach B generate AVG(A.column_1), AVG(A.column_2);
However, if there are 100 fields, it's really tedious to write AVG 100 times, and it's easy to make errors.
One way I can think of is to embed Pig in Python and use Python to generate the script string and compile it. However, that still sounds awkward, even if it works.
Thank you in advance for your help!
I don't think there is a nice way to do this in Pig. However, this should work well enough and can be done in 5 minutes:
Describe the table (or alias) in question.
Copy the output and reorganize it manually into the script fragment you need (for example with Excel).
Finish and store the script.
If you need to cope with columns that can change unexpectedly, there is probably no good way to do it in Pig. Perhaps you could read in all the columns (in R, for example) and do your operation there.
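As a sketch of the code-generation idea mentioned in the question (the relation name A and the column names here are placeholders; paste the real names from DESCRIBE A), a few lines of Python can write the FOREACH statement for you:
# Placeholder column names; replace with the list taken from DESCRIBE A
columns = ["column_1", "column_2", "column_3"]
avgs = ", ".join("AVG(A.{0}) AS avg_{0}".format(c) for c in columns)
script = "B = group A all;\nC = foreach B generate " + avgs + ";"
print(script)  # paste the printed statement into your .pig script, or feed it to embedded Pig
That keeps the 100 hand-written AVG(...) calls out of the script and avoids the typo problem mentioned above.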

Beyond Compare: How to compare specific SQL files?

I need to compare two *.sql files which have several changes. There are some changes, like date and time, which should be ignored. BC shows differences when just the time changes, for example:
File one: 13.06.14, 10:42  File two: 13.06.14, 10:43.
How can I configure BC so that it ignores the date and time when comparing the two files?
I hope this will help; you can find an explanation here:
Define Unimportant Text in Beyond Compare
There is also a very useful video explaining the steps to define unimportant text in Beyond Compare on that link.
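For the timestamp format shown in the question, the grammar element you mark as unimportant could be defined with a regular expression along these lines (assuming the values always look like 13.06.14, 10:42):
\d{2}\.\d{2}\.\d{2}, \d{2}:\d{2}
Once a grammar element using such a pattern is added to the file format and flagged as unimportant in the session's Importance settings (the linked page walks through the exact dialogs), differences consisting only of those date/time values should no longer be reported.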