Automatic test data generation - testing

I need to prepare sample test data with 5 million rows of different employees.
Each row should contain relevant information such as:
First Name
Last Name
Address-1
Address-2
Zip code
State
County
Country
...etc
Is there any tool that I can use to generate it?

I have found the site http://www.generatedata.com/ to be good for this kind of thing. It offers a range of data types to generate and outputs in a number of different formats that can either be read in by your code (e.g. from CSV) or easily translated into code using your favorite Unix text manipulation tools.

Either try a webservice, like:
http://www.generatedata.com/
http://www.mockaroo.com/
or try one of the following utilities for fake data generation:
PHP "Faker" - https://github.com/fzaninotto/Faker
Perl's Data::Faker - http://metacpan.org/pod/Data::Faker
Ruby "faker" - http://faker.rubyforge.org/
http://paulthedutchman.nl/datagenerator/

I would like to suggest a modern PHP fake data generator that can also fake a whole entity.
Fakerino: https://github.com/niklongstone/Fakerino
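If none of the tools above fits, a dependency-free generator is also easy to sketch. The example below is only a minimal illustration using the Python standard library; the name pools, column names and the helper functions generate_employees and write_employees_csv are made up for this sketch and are not part of any tool mentioned in this thread.

```python
import csv
import random

# Small illustrative pools (assumptions); use larger lists for realistic data.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Eve", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Miller", "Garcia", "Chen", "Khan"]
STREETS = ["Main St", "Oak Ave", "Pine Rd", "Maple Dr"]
STATES = ["CA", "NY", "TX", "WA"]
COLUMNS = ["first_name", "last_name", "address_1", "address_2",
           "zip_code", "state", "county", "country"]


def generate_employees(row_count, seed=0):
    """Yield row_count fake employee rows, each a list matching COLUMNS."""
    rng = random.Random(seed)  # seeded so the same data can be regenerated
    for _ in range(row_count):
        yield [
            rng.choice(FIRST_NAMES),
            rng.choice(LAST_NAMES),
            f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
            f"Apt {rng.randint(1, 500)}",
            f"{rng.randint(0, 99999):05d}",  # zero-padded 5-digit zip code
            rng.choice(STATES),
            f"County {rng.randint(1, 50)}",
            "US",
        ]


def write_employees_csv(path, row_count, seed=0):
    """Stream the rows to a CSV file without holding them all in memory."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(generate_employees(row_count, seed))
```

Calling write_employees_csv("employees.csv", 5_000_000) would produce the 5 million rows asked for; because the rows are generated lazily, memory use stays flat.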

Asp.NET database multi-lang design [Include HTML]

I have read lots of answers here and learned something about this topic, but I need more help. Some pages in my project get their HTML (page body) from the database. For my controls I use resx files, which works great, but now I need to store the HTML values in multiple languages in my database.
I have an admin panel for my project, where I sometimes edit the HTML, text, or pictures in my editor and save them back to the DB. I cache the HTML values in my website for better performance. But the text parts of the HTML are still only in my default language, and I want to make them multilingual.
I thought of a few possible solutions.
My first solution was to copy all of the HTML data and create one copy per language.
The problem with the first solution: this doesn't make sense, because copying the values for each language would needlessly duplicate the HTML in my database.
My second solution was like this:
Normally I store it like this in my DB (only a part of the HTML in one column):
<li>Hello World</li>
<li>Hello Me</li>
Second solution, with placeholders:
<li>[HelloWorld]</li>
<li>[HelloMe]</li>
string newHtmlValue = oldValue.ReplaceByLang("[HelloWorld]", GetCulture(), "Hallo Welt");
// GetCulture() returns German in this example.
I create a new table for the replacement text, keyed by the placeholder (e.g. [HelloWorld]):
Row 1: [HelloWorld] eng Hello World
Row 2: [HelloWorld] german Hallo Welt
Row 3: [HelloWorld] fr bonjour tout le monde
In my project I then select the right HTML value by culture.
Now my question is: do you have any better idea?
I hope I've been clear, but if you need more information I'll be glad to tell you more. Sorry for my English.
I would prefer to store all your localized strings in a single repository, such as a DB or a satellite assembly.
For example, if you choose a DB as the repository, define 3 tables (minimal structure):
1. Locale - defines your locales.
2. ResourceMaster - defines your source string and a reference key. This key should be unique in your application and follow a standard format like Module_Control_Section.
3. LocalizedResource - defines the localized string for a ResourceMaster entry and a Locale, with foreign keys to ResourceMaster and Locale.
In the front end you can resolve any string, such as a control label or an HTML string, to its localized string using the unique reference key.
Also implement UI caching / API caching for better performance.
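To make the three-table idea concrete, here is a rough sketch in Python with an in-memory SQLite database (not ASP.NET, purely to keep the example self-contained). The column names, the helper functions build_demo_db and localize_html, and the fallback to the source string are assumptions made for this illustration; only the table names and the [HelloWorld] placeholder come from the thread.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Locale (
    locale_id INTEGER PRIMARY KEY,
    code      TEXT UNIQUE NOT NULL
);
CREATE TABLE ResourceMaster (
    resource_id  INTEGER PRIMARY KEY,
    resource_key TEXT UNIQUE NOT NULL,  -- e.g. Module_Control_Section
    source_text  TEXT NOT NULL
);
CREATE TABLE LocalizedResource (
    resource_id INTEGER NOT NULL REFERENCES ResourceMaster(resource_id),
    locale_id   INTEGER NOT NULL REFERENCES Locale(locale_id),
    text        TEXT NOT NULL,
    PRIMARY KEY (resource_id, locale_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding the HelloWorld example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Locale VALUES (?, ?)",
                     [(1, "en"), (2, "de"), (3, "fr")])
    conn.execute(
        "INSERT INTO ResourceMaster VALUES (1, 'HelloWorld', 'Hello World')")
    conn.executemany(
        "INSERT INTO LocalizedResource VALUES (1, ?, ?)",
        [(1, "Hello World"), (2, "Hallo Welt"), (3, "bonjour tout le monde")])
    return conn


def localize_html(conn, html, locale_code):
    """Replace each [Key] placeholder with its text for locale_code.

    Falls back to the source string when no translation exists.
    """
    rows = conn.execute(
        """SELECT m.resource_key, COALESCE(l.text, m.source_text)
           FROM ResourceMaster m
           LEFT JOIN LocalizedResource l
             ON l.resource_id = m.resource_id
            AND l.locale_id = (SELECT locale_id FROM Locale WHERE code = ?)""",
        (locale_code,),
    ).fetchall()
    for key, text in rows:
        html = html.replace(f"[{key}]", text)
    return html
```

With this layout the HTML is stored once with placeholders, and localize_html(conn, "<li>[HelloWorld]</li>", "de") resolves it per culture; the resolved result is what you would cache.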
Regards
Abdul

cTAKES UMLS ICD10 codes lookup

I created a cTAKES custom dictionary from UMLS database with ICD10 codes.
Right now I am able to analyze the text by, for example, disease name, such as Asthma, and the annotation index will contain the matching ICD10 code, code = "J45.90".
Is it possible to configure cTAKES to reverse this process, so that it looks for ICD10 codes appearing in the text instead?
The XML output contains the start and end offsets of each matched concept in the original corpus. I personally find it easier to convert the XML to a simple JSON format and then loop through it as needed.
I have been working on an open source solution for parsing out the data and displaying the corpus with the matches in HTML: https://github.com/GoTeamEpsilon/ctakes-friendly-web-ui#demonstration - let me know if you'd like to contribute.
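As a rough illustration of the XML-to-JSON step, the sketch below collects every element that carries begin and end attributes. The sample XML is a hypothetical, heavily simplified stand-in for real cTAKES output, and the assumption that the document text sits in a sofaString attribute should be checked against your own files; real output also uses XML namespaces, which would show up in the element tags.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for cTAKES XML output (an assumption).
SAMPLE_XML = """<CAS>
  <Sofa sofaString="Patient has asthma, coded J45.90."/>
  <DiseaseDisorderMention begin="12" end="18"/>
</CAS>"""


def annotations_to_json(xml_text):
    """Return a JSON list of all elements that have begin/end offsets."""
    root = ET.fromstring(xml_text)

    # Assumption: the original document text is stored in a sofaString attribute.
    text = ""
    for element in root.iter():
        if "sofaString" in element.attrib:
            text = element.attrib["sofaString"]
            break

    matches = []
    for element in root.iter():
        if "begin" in element.attrib and "end" in element.attrib:
            begin = int(element.attrib["begin"])
            end = int(element.attrib["end"])
            matches.append({
                "type": element.tag,
                "begin": begin,
                "end": end,
                "covered_text": text[begin:end],
            })
    return json.dumps(matches, indent=2)
```

The resulting list can then be looped over to inspect which spans of the text were matched.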

How to store data from Google Ngram API?

I need to store the data presented in the graphs on the Google Ngram website. For example, I want to store the occurrences of "it's" as a percentage from 1800-2008, as presented in the following link: https://books.google.com/ngrams/graph?content=it%27s&year_start=1800&year_end=2008&corpus=0&smoothing=3&share=&direct_url=t1%3B%2Cit%27s%3B%2Cc0.
The data I want is the data you're able to scroll over on the graph. How can I extract this for about 140 different terms (e.g. "it's", "they're", "she's", etc.)?
econpy wrote a nice little module in Python that you can use through a command-line interface.
For your "it's" example, you would need to type this command in a terminal / Windows console (the term is quoted so that a Unix shell does not treat the apostrophe as the start of a string):
python getngrams.py "it's" -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3
This will automatically save the query result in a CSV file named after your query parameters.
econpy's package, in @HugoMailhot's answer, no longer works (2021) and does not seem to be maintained.
Here's an updated version, with some improvements for easier integration into Python code:
https://gitlab.com/cpbl/google-ngrams
You can call this from the command line (as in econpy's) to create a CSV file, e.g.
getngrams.py "it's" -startYear=1800 -endYear=2008 -corpus=eng_2009 -smoothing=3
or call it from Python to get (and plot) data directly in Python, e.g.:
from getngrams import ngrams
df = ngrams('bells and whistles -startYear=1900 -endYear=2018 -smoothing=2')
df.plot()
The xkcd functionality is still there too.
(Issues / bug fix pull requests /etc welcome there)
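For the roughly 140 terms in the question, the command-line call can be wrapped in a loop. This is only a sketch: it assumes getngrams.py sits in the current directory and accepts the flags shown above, and the helper names build_command and fetch_all are made up here. Passing the arguments as a list sidesteps shell quoting problems with apostrophes.

```python
import subprocess
import sys

# Extend this list to all the terms you need.
TERMS = ["it's", "they're", "she's"]


def build_command(term, start_year=1800, end_year=2008,
                  corpus="eng_2009", smoothing=3):
    """Build the argument list for one getngrams.py call."""
    return [
        sys.executable, "getngrams.py", term,
        f"-startYear={start_year}",
        f"-endYear={end_year}",
        f"-corpus={corpus}",
        f"-smoothing={smoothing}",
    ]


def fetch_all(terms):
    """Run getngrams.py once per term; each run saves its own CSV file."""
    for term in terms:
        # check=True stops the loop if a query fails.
        subprocess.run(build_command(term), check=True)
```

Calling fetch_all(TERMS) then produces one CSV file per term.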

Custom naming of Aeroo report filename in Odoo

Is there any way to get the report output from Aeroo named with a custom naming pattern?
For example, for the invoice: [year]_[invoice number]...
@Raffaele, I'd recommend taking a look here and at this forum post.
You'll need to use some basic Python logic in the report_custom_filename module to create the file name you need according to your requirements.
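Using the following example, I can generate a filename for a Sales Order/Quotation (the parentheses around the and/or expression make sure '.xls' is always appended):
${(object.name or '').replace('/','_')}_${(object.state == 'draft' and 'draft' or '') + '.xls'}
For an order that is not in the draft state, the result looks like this:
SO039_.xls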
You can add another field from the document/report you're printing out by adding another section, for example:
${(object.client_order_ref or '').replace('/','_')}_
This adds the field client_order_ref in front of the document name, like this:
[Here's your client order reference]_SO039.xls
Have a look at which fields are available in the model you're trying to get this information from (e.g. sale.order in my case), and I think you'll find roughly what you need there.
I have still not figured out how to add a date/timestamp like the one you are requesting (e.g. the year); someone else may be able to offer advice on this.
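For reference, the naming logic for a [year]_[invoice number] pattern can be sketched in plain Python as below. This is not wired into report_custom_filename; the function name report_filename and its parameters are hypothetical, and it assumes the document date is available as an ISO-formatted string (YYYY-MM-DD), which may differ in your Odoo version.

```python
def report_filename(number, state, date_string, extension="xls"):
    """Build '<year>_<number>[_draft].<extension>' from plain values.

    Assumes date_string is ISO formatted (YYYY-MM-DD) or empty.
    """
    year = (date_string or "")[:4]
    safe_number = (number or "").replace("/", "_")  # slashes are not filename-safe
    parts = [part for part in (year, safe_number) if part]
    if state == "draft":
        parts.append("draft")
    return "_".join(parts) + "." + extension
```

Whether the same slicing can be used inside the filename template depends on what the template context exposes, so treat this only as a description of the intended result.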

How to get with Mediawiki API all images in a category which are not in another one?

I am entirely new to APIs, so sorry if the question is silly.
I would like to get all images in a category on Commons, let's say X, but exclude those that are also in another one (Y). I do not understand whether I can actually do this.
https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtype=file&cmtitle=Category:X
will get all of them, but how do I exclude some?
Moreover, I would like the result to include the description of each image, not just the file name. Is that possible?
By default, MediaWiki has no built-in support for building or querying category intersections. Accomplishing this requires extensions, external tools, or multiple API queries followed by result processing.
CirrusSearch API
On Wikimedia Commons, as on the whole Wikimedia wiki farm, CirrusSearch powers filtered search, including search for category intersections, and is also available through the API (action=query&list=search&srsearch=incategory:A+-incategory:B, which is Category:A minus Category:B).
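A small sketch of how such a search URL could be assembled in Python follows. The function name category_difference_url is made up for this example; restricting the search to namespace 6 (File) and quoting the category names so that names containing spaces work are additions of this sketch rather than part of the query shown above.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def category_difference_url(include, exclude, limit=50):
    """Build a search URL for files in `include` but not in `exclude`.

    Category names are given without the "Category:" prefix.
    """
    # Quoting the names lets categories with spaces work in incategory:.
    search = f'incategory:"{include}" -incategory:"{exclude}"'
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search,
        "srnamespace": 6,   # 6 is the File namespace
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

For example, category_difference_url("X", "Y") yields a URL that can be fetched with any HTTP client; further pages of results are requested with the continuation values the API returns.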
FastCCI
One tool I can recommend (because it is a dedicated high-performance solution that is actually running) is fastcci, developed by Daniel Schwen. For Wikimedia Commons, a database is already maintained and a web service is running. It can also be set up for any other wiki, provided there is a host to run it on with database access.
Query
Consider the following query URL:
https://fastcci.wmflabs.org/?c1=3302993&c2=15516712&d1=0&d2=0&s=200&a=not&t=js
https://fastcci.wmflabs.org/ - host that the Wikimedia Commons fastcci instance runs on
c1 - ID of category 1
c2 - ID of category 2
d1 - depth of category 1 to search in (fastcci by default considers sub-categories)
d2 - depth of category 2 to search in (fastcci by default considers sub-categories)
s - number of results to return
o - offset
a - operation combining the two categories (not in this example, i.e. category 1 minus category 2)
t - connection type (t=js for a JSONP response; otherwise the request is assumed to come over a websocket)
Response
fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0', 'OUTOF 10', 'DBAGE 378310', 'DONE'] );
RESULT is followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet stands for one image or category.
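The RESULT line described above can be split into triplets with a few lines of Python. This sketch works on the list of strings passed to fastcciCallback in the sample response; the function name parse_fastcci_result is made up for this example, and extracting that list from the JSONP wrapper is left out.

```python
def parse_fastcci_result(lines):
    """Turn fastcci response lines into (page_id, depth, tag) tuples."""
    triplets = []
    for line in lines:
        if not line.startswith("RESULT "):
            continue  # skip OUTOF, DBAGE, DONE and similar status lines
        for item in line[len("RESULT "):].split("|"):
            if not item:
                continue  # tolerate an empty RESULT line
            page_id, depth, tag = (int(value) for value in item.split(","))
            triplets.append((page_id, depth, tag))
    return triplets


# The sample response from above, as a list of strings.
SAMPLE_LINES = [
    "RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|"
    "27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0",
    "OUTOF 10",
    "DBAGE 378310",
    "DONE",
]
```

The page IDs in the first position of each triplet are what you would then resolve to page titles.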
Resources
Sample client side implementation - to see it in action, visit any category page and look next to the Good pictures button.
Example is FilesOf('Category:Saaleck') - FilesOf('Category:Rapeseed fields in Saxony-Anhalt')
Server application
Presentation on YouTube
Slides
A note on pageIDs
page IDs → page titles: GET /w/api.php?action=query&pageids=page_IDs_separated_by_pipe
page titles → page IDs: GET /w/api.php?action=query&titles=Titles_separated_by_pipe
AFAIK, there is no way to get that directly using the API. But, assuming both categories are reasonably small, you could get all images from both of them and then compute the complement in your code.
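A minimal sketch of that approach, using only the Python standard library, might look like this. The function names and the User-Agent string are placeholders chosen for this example; the loop passes the API's continue values back unchanged to page through larger categories.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://commons.wikimedia.org/w/api.php"


def fetch_category_files(category):
    """Return the set of file titles in a category, e.g. "Category:X"."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtype": "file",
        "cmtitle": category,
        "cmlimit": "max",
        "format": "json",
    }
    titles = set()
    while True:
        request = Request(
            API_URL + "?" + urlencode(params),
            headers={"User-Agent": "category-diff-sketch/0.1"},  # placeholder
        )
        with urlopen(request) as response:
            data = json.load(response)
        members = data.get("query", {}).get("categorymembers", [])
        titles.update(member["title"] for member in members)
        if "continue" not in data:
            return titles
        params.update(data["continue"])  # ask for the next batch


def files_only_in(first, second):
    """Return the sorted titles that are in `first` but not in `second`."""
    return sorted(set(first) - set(second))
```

Usage would be files_only_in(fetch_category_files("Category:X"), fetch_category_files("Category:Y")).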
To retrieve the description, you can use prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription.
In the context of your example query, it would look like this:
https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtype=file&gcmtitle=Category:X&prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription
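Once the JSON for a query like this has been fetched (with format=json added), the descriptions can be pulled out as sketched below. The sample response is hand-written for illustration and shows only the fields the code reads; the function name descriptions_by_title is made up here. Note that the description value may contain HTML markup.

```python
# Hand-written sample with only the fields used below (an illustration).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "123": {
                "pageid": 123,
                "title": "File:Example.jpg",
                "imageinfo": [
                    {"extmetadata": {
                        "ImageDescription": {"value": "An example image"}
                    }}
                ],
            }
        }
    }
}


def descriptions_by_title(response):
    """Map each file title to its ImageDescription value (or None)."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = (page.get("imageinfo") or [{}])[0]
        description = info.get("extmetadata", {}).get("ImageDescription", {})
        result[page["title"]] = description.get("value")
    return result
```

Files without a description simply map to None, so they can be filtered or reported separately.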