Which tools or methods would you suggest for creating large amounts of SQL test data? [duplicate] - sql

This question already has answers here:
Closed 14 years ago.
I'd like to stress test some of my SQL queries and find out about bad query plans and bottlenecks. I plan to fill some tables with random test data.
Are there tools or a set of scripts available for this purpose, preferably for SQL Server?
Thanks!
UPDATE: Sorry, didn't know these two question already existed:
Data generators for SQL server?
Creating test data in a database

This website will generate reams of customized data for you.
From that site:
Ever needed custom formatted sample / test data, like, bad? Well, that's the idea of the Data Generator. It's a free, open source script written in JavaScript, PHP and MySQL that lets you quickly generate large volumes of custom data in a variety of formats for use in testing software, populating databases, and scoring with girls.
This site offers an online demo where
you're welcome to tinker around to get
a sense of what the script does, what
features it offers and how it works.
Then, once you've whet your appetite,
there's a free, fully functional,
GNU-licensed version available for
download.

I've use this data generator with success in the past - may not be big enough for your needs though.

Related

Alter data source

Power BI can connect to various data sources and run SELECT queries.
Is it possible to run also other queries (INSERT INTO, UPDATE...)?
Now I need it for a postgresql database, but could use also for others in the future.
No, you can't run directly INSERT/UPDATE queries from Power BI. This isn't the idea of the tool. If you find you need it, then probably there is a major flaw in your design, or you are not using the right tool for this job. But there are few ways to workaround this (again, I'm not saying that you SHOULD do it). Usually this is done in a combination with custom written Power App, embedded in your report in Power Apps visual. The idea is that the app will write to the database, and will refresh your report after that (if needed).
You can start here and I will recommend you to look at this in-depth session - Writing back data to PowerBI from your reports.
The answer is No if I am very straight forward. PBI is a analysis platform for data. There are probably some advance way to do that but, this is not logical or good idea to think about manipulating data from report or from any BI tools. You can search answers from different blog where the same questions asked. For more details, you can check below links-
help link 1
help link 2

How to migrate data from MongoDB to SQL-Server? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I searched around I found that there are ways to transfer/sync data from sql-server to mongodb.
I also know that Mongodb contains collections instead of table and the data is stored differently.
I want to know whether it is possible to move data from mongodb to sql-server. If yes, then how and what are the tools/topics should I use?
Of course it's possible, but you will need to find a way to force the flexibility of a document db like MongoDB into a RDBMS like SQL Server.
It means that you need to define how you want to handle missing fields (will it be a NULL in the db column? or a default value?) and other things that usually don't fit well in a relational database.
Said do, you can use an ETL tool able to connect to both databases, SSIS can be an example if you want to stay in the MicroSoft world (you can check this Importing MongoDB Data Using SSIS 2012 to have an idea) or you can go for an open source tool like Talend Big Data Integration which has a connector to MongoDB (and of course to SQL Server).
There is no way to directly move data from MongoDB to SQL Server. Because MongoDB data is non-relational, any such movement must involve defining a target relational data model in SQL Server, and then developing a transformation that can take the data in MongoDB and transform it into the target data model.
Most ETL tools such as Kettle or Talend can help you with this process, or if you're a glutton for punishment, you can just write gobs of code.
Keep in mind that if you need this transformation process to be online, or applied more than once, you may need to tweak it for any small changes in the structure or types of the data stored in MongoDB. As an example, if a developer adds a new field to a document inside a collection, your ETL process will need rethinking (possibly new data model, new transformation process, etc.).
If you are not sold on SQL Server, I'd suggest you consider Postgres, because there is a widely-used open source tool called MoSQL that has been developed expressly for the purpose of syncing a Postgres database with a MongoDB database. It's primarily used for reporting purposes (getting data out of MongoDB and into an RDBMS so one can layer analytical or reporting tools on top).
MoSQL enjoys wide adoption and is well supported, and for badly tortured data, you always have the option of using the Postgres JSON data type, which is not supported by any analytics or reporting tools, but at least allows you to directly query the data in Postgres. Also, and now my own personal bias is showing through, Postgres is 100% open source, while SQL Server is 100% closed source. :-)
Finally, if you are only extracting the data from MongoDB to make analytics or reporting easier, you should consider SlamData, an open source project I started last year that makes it possible to execute ANSI SQL on MongoDB, using 100% in-database execution (it's basically a SQL-to-MongoDB API compiler). Most people using the project seem to be using it for analytics or reporting use cases. The advantage is that it works with the data as it is, so you don't have to perform ETL, and of course it's always up to date because it runs directly on MongoDB. A disadvantage is that no one has yet built an ODBC / JDBC driver for it, so you can't directly connect BI tools to SlamData.
Good luck!
There is a tool provided by MongoDB called mongoexport and it's capable of exporting csv files. These csv files can be easily imported into MySQL. Good luck!

What to use, Excel or database for data driven framework in Selenium?

I am working on Selenium automation using WebDriver. It is keyword and data driven approach. I am handling all the inputs of objects,data and test configuration from Microsoft Excel.
Now client want to use database. He is asking me which one is more good to use in framework. Either database or Microsoft Excel utility? I have to reply to him with valid points.
Which one is the better to use in framework and also why second one is not good to use?
This question really requires an opinionated answer, but because there are some important benefits from one or the other I will still answer it!
With Excel, data entry is easy and accessible to non-technical testers. It has good sorting ability and the data can be organised in a pretty intuitive fashion. However, you cannot add or alter data during a test easily (I understand you could do this but I would this this is overboard). This means data must be previously organised and specific to the test requirements.
In a database data can come from queries and be altered on the fly. You could write helper functions that populate required data if it can't be found in the db. This means tests can run without worrying about what data is currently in the db (to an extent) and can alleviate some project issues. However, with a database there can be issues importing/exporting and there could be a lot of coding overhead. Also, obviously non-technical testers will find it hard to change the data.
As I said this is really an opinion in the end and is very much a project by project decision.
With my experience efficiency plays an integral role comparing these too approaches. Using a database tend to slow down the performance unlike excel sheets. Using a file(excel) increases the efficiency. And in our company automations suits we rarely use databases. Instead will push data in to files and use them as the repository.

What data generators?

I'm about to release a FOSS data generator that can generate random yet meaningful data in CSV format. Rather belatedly, I guess, I need to poll the state of the art for such products - because if there is a well known and useful existing tool, I can write my work off to experience. I am aware of of a couple of SQL Server specific tools, but mine is not database specific.
So, links? And if you have used such a product,
what features did you find it was missing?
Edit: To add a bit more info on my tool (Ooh, Matron!) it is intended to allow generation of any kind of random data from existing data files, and
supports weighting. It is XML based (sorry, folks) and lets you say things like:
<pick distribute="20,80" >
<datafile file="femalenames.dat"/>
<datafile file="malenames.dat"/>
<pick/>
to select female names about 20% of the time and male names 80% of the time.
But the purpose of this question is not to describe my product but to get info on other tools.
Latest: If anyone is interested, they can get the alpha of my data generator at http://code.google.com/p/csvtest
That can be a one-liner in R where I use the littler scripting front-end:
# generate the data as a one-liner from the command-line
# we set the RNG seed, and draw from a bunch of distributions
# indented just to fit the box here
edd#ron:~$ r -e'set.seed(42); write.csv(data.frame(y=runif(10), x1=rnorm(10),
x2=rt(10,4), x3=rpois(10, 0.4)), file="/tmp/neil.csv",
quote=FALSE, row.names=FALSE)'
edd#ron:~$ cat /tmp/neil.csv
y,x1,x2,x3
0.914806043496355,-0.106124516091484,0.830735621223563,0
0.937075413297862,1.51152199743894,1.6707628713402,0
0.286139534786344,-0.0946590384130976,-0.282485683052060,0
0.830447626067325,2.01842371387704,0.714442314565005,0
0.641745518893003,-0.062714099052421,-1.08008578470128,0
0.519095949130133,1.30486965422349,2.28674786332467,0
0.736588314641267,2.28664539270111,-0.73270267483628,1
0.134666597237810,-1.38886070111234,-1.45317770550920,1
0.656992290401831,-0.278788766817371,-1.01676025893376,1
0.70506478403695,-0.133321336393658,0.404860813371462,0
edd#ron:~$
You have not said anything about your data-generating process, but rest assured that R can probably cope with just about any requirement, including multivariate normal, t, skew-t, and more. The (six different) random-number generators in R are also of very high quality.
R can also write to DBs, or read parameters from it, and if it needs to be on Windoze then the Rscript front-end could be used instead of littler.
I asked a similar question some months ago:
Tools for Generating Mock Data?
I got some sincere suggestions, but most were not suitable for my needs. Either expensive (non-free) software, or else not flexible enough w.r.t. data types and database structure, or range of mock data, or way too slow (e.g. the Rails ActiveRecord solution).
Features I was looking for were:
Generate mock data to fill existing database tables
Quick to generate > 1 million rows
Produce either SQL script format or flat file suitable for importing
Scriptable command-line interface, not a GUI
Not dependent on Microsoft Windows environment
Nice-to-have features:
Extensible/configurable
Open-source, free license
Written in a dynamic language like Perl/PHP/Python
Point it at a database and let it "discover" the metadata
Integrated with testing tools (e.g. DbUnit)
Option to fill directly into the database as it generates data
The answer I accepted as Databene Benerator. Though since asking the question, I admit I haven't used it very much.
I was surprised that even when asking the community, the range of tools for generating mock data was so thin. This seems like a niche waiting to be filled! I'll be interested to see what you release.

Tools for Generating Mock Data? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I'm looking for recommendations of a good, free tool for generating sample data for the purpose of loading into test databases. By analogy, something that produces "lorem ipsum" text for any RDBMS. Features I'm looking for include:
Flexibility to generate data for an existing table definition.
Ability to generate small and large data sets (> 1 million rows or more).
Generate in SQL script format (INSERT statements) or else in a flat file format suitable for bulk import (which is usually faster).
A command-line interface for easy scripting.
Extensible, open source, written in a dynamic language (these are nice-to-haves, not strong requirements).
PS: I did search for a duplicate question on StackOverflow, but I didn't find one. If there is one, I'll be grateful to get a pointer to it.
Thanks for the great responses everyone! I should amend my requirements that I use Mac OS X as my primary development environment, not Windows (though I did say command-line interface is desirable, and that practically rules out Windows). The Windows-specific suggestions will no doubt be useful to other readers of this question, though, so thanks.
Here is my conclusion:
GenerateData:
PHP web app interface, not command line
limited to generating 200 records (or pay $20 for license to generating 5,000 records)
RedGate SQL Data Generator
not free, price $295
requires Windows, .NET, SQL Server
Visual Studio 2008 Database Edition
requires Windows
requires costly MSDN or ISV subscription
Banner Datadect
not free, price $595
requires Windows (?)
no support for MySQL (?)
GUI, not command line or scriptable
Ruby Faker gem
way too slow to use ActiveRecord for bulk data load
Super Smack
chiefly a load-testing tool, with a random data generator built in
pretty simple to use nevertheless
overall a good runner-up tool
Databene Benerator
best solution for my needs
XML scripts, compatible with DbUnit
open source (GPL) Java code
command-line usage
access many databases directly via JDBC
Take a look at databene benerator, a test data generator that looks close to your requirements.
it can generate data for an existing table definition (or even anonymize production data)
it can generate larges data set (unlimited size)
it supports various input (CSV, Flat Files, DBUnit) and output format (CSV, Flat Files, DBUnit, XML, Excel, Scripts)
it can be used on the command line or through a maven plugin
it's open source and customizable
I would give it a try.
BTW, a list of similar products is available on databene benerator's web site.
This looks quite promising: generatedata.com. Open-source, has lots of built-in data types.
There are several others listed here: Test (Sample) Data Generators. I don't have experience with any of them, but a few on that list look like they could be pretty decent.
Try http://www.mockaroo.com
This is a tool my company made to help test our own applications. We've made it free for anyone to use. It's basically the Forgery ruby gem with a web app wrapped around it. You can generate data in CSV, txt, or SQL formats. Hope this helps.
I know you said you were looking for a free tool, but this is one case where I would suggest that spending $295 will pay you back quickly in time saved. I've been using the RedGate tool SQL Data Generator for the last year and it is, to be short, an awesome tool. It allows for setting dependencies between columns, generates realistic data for business objects such as phone numbers, urls, names, etc. I can honestly state that this tool has paid for itself time and time again.
If you are looking or willing to use something MySQL-specific, you could take a look at Super Smack. It is currently maintained by Tony Bourke.
Super Smack allows you to generate random data to insert into your database tables. It is customizable, allowing you to use the packaged words.dat file, or any test data of your choice.
One of the nice things about it is that it is command-line is highly customizable. There is some fairly decent examples of usage in the book High Performance MySQL which is also excerpted here.
Not sure if that is along the lines of what you are looking for, but just a thought.
A Ruby script with one of the available fake data generators should do you just fine.
http://faker.rubyforge.org/ is one such gem. Unfortunately, this doesn't fulfill all your requirements.
Here is another: http://random-data.rubyforge.org/
And a tutorial for using Faker: http://www.rubyandhow.com/how-to-generate-fake-names-addresses-in-ruby/
RE: Flexibility to generate data for an existing table definition. Combine the Faker gem with one of the available ORMs. ActiveRecord would probably be easiest.
Normally very costly, but if you are a small ISV you can get Visual Studio 2008 Database Edition very cheaply, see the empower and bizspark promotions. It provides a lot more functionality then just generating test data (Integration with SCC, Unit Testing, DB Refactoring, etc.)
As I like the fact that Red-Grate tools are so easy to learn, I would still look at SQL Data Generator
a tool that really should not be missing from the list is the Data Generator from Datanamic that populates databases directly or generates insert scripts, has a large collection of pre-installed generators ( and supports multiple databases...
http://www.datanamic.com/datagenerator/index.html
I know you're not looking for actual lorem ipsum text; but in case anyone else searches for an actual lorem ipsum generator and finds this thread: lipsum.com does a great job of it.
Not free, but Visual Studio 2008 Database Edition is a good alternative and it provides a lot more functionality (Integration with SCC, Unit Testing, DB Refactoring, etc...)
I use a tool called Datatect:
Generates data to flat files or any ODBC compliant database.
Extensible via VBScript.
Referentially aware; will populate foreign keys with values from parent table.
Data is context aware; city, state and phone numbers for given zip codes, first names and titles with gender.
Can create custom, complex data types.
Generate over 2 billion proper names, business names, street addresses, cities, states, and zip codes.
I've used this tool to generate as many as 40,000,000 rows of data to a SQLServer database, and 8,000,000 rows of data to an Oracle database.
I am in no way affiliated with Banner Systems, just a satisfied customer.
Here is the list of such tools (both free and commercial):
http://c2.com/cgi/wiki?TestDataGenerator
For OS X there is Data Creator (US $ 7). Download is free for test purpose. You can use it to evaluate the software and its features.
It requires OS X Lion or successive. It can generate a lot of different field type and has a custom export mode plus some pre-set (TSV, CSV, Html table, web page with table inside).
http://www.tensionsoftware.com/osx/datacreator/
here at the App Store:
https://itunes.apple.com/us/app/data-creator/id491686136?mt=12
You can use DbSchema, www.dbschema.com it's a database management tool and it has a Random Data Generator to populate your database.
Not direct answer to your question but this can be helpful for certain kind of data :
Fake Name Generator can be useful - http://www.fakenamegenerator.com/ , not for everything but user accounts or stuff like that. AFAIK They provide support for bulk order.
+1 for Benerator: I tried 3 or 4 of the other tools on offer (including dbmonster) but found Benerator to be very quick, to deliver realistic data and to be flexible. I also got very quick & helpful feedback from the tool's creator when I posted on the forum.