What is the quickest way to fill a SQL table with dummy data?
I have a wide table with about 40 fields of different kinds (int, bit, varchar, etc.) and need to do some performance testing. I'm using SQL Server 2008.
You only need to put GO 1000 after your INSERT statement to run it 1000 times, like this:
INSERT INTO dbo.Customers(Id, FirstName, LastName) VALUES(1, 'Mohamed', 'Mousavi')
GO 1000
This fills the table with 1000 identical rows.
Another solution is to populate the first rows of your table with some data and then fill the following rows by repeating those first rows over and over; in other words, you fill the table from itself:
INSERT INTO dbo.Customers
SELECT * FROM dbo.Customers
GO 10
If one or more columns are identity columns (i.e. auto-incremented, unique values), just leave them out of the query. For instance, if Id in dbo.Customers is an identity column, the query goes like this:
INSERT INTO dbo.Customers
SELECT FirstName, LastName FROM dbo.Customers
GO 10
Instead of:
INSERT INTO dbo.Customers
SELECT Id, FirstName, LastName FROM dbo.Customers
GO 10
Otherwise you'll encounter this error:
An explicit value for the identity column in table 'dbo.Customers' can only be specified when a column list is used and IDENTITY_INSERT is ON.
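If you really do need to insert explicit Id values, a minimal sketch of what the error message suggests (the Id value and names here are made up for illustration):
SET IDENTITY_INSERT dbo.Customers ON
INSERT INTO dbo.Customers(Id, FirstName, LastName) VALUES(1000001, 'Test', 'User')
SET IDENTITY_INSERT dbo.Customers OFF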
Note:
Because each pass inserts a copy of everything already in the table, the row count doubles on every run (a geometric progression, not an arithmetic one): starting from 10 rows, GO 10 leaves you with 10 × 2^10 = 10,240 rows. It can take a while, so don't use a big number after GO.
If you want a table filled with somewhat more varied data, you can achieve that the same way, this time by executing a simple query and following these steps:
Choose one of your tables that already has a decent number of rows, say dbo.Customers
Right click on it and select Script Table as > Create To > New Query Editor Window
Name your new table something else, like dbo.CustomersTest. Now you can execute the query to get a new table with a structure similar to dbo.Customers.
Note: keep in mind that if the table has an identity field, change its Identity Specification to No, since you are going to fill the new table with the data of the original one repeatedly.
Run the following query. It's going to be run 1000 times; you can change that to more or less, but be aware that it might take minutes depending on your hardware:
INSERT INTO [dbo].[CustomersTest] SELECT * FROM [dbo].[Customers]
GO 1000
After a while you have a table with dummy rows in it!
As @SQLMenace mentioned, RedGate Data Generator is a very good tool for this. It costs $369, though there is a 14-day trial.
The good point is that RedGate identifies foreign keys so you can apply JOIN in your queries.
You get a bunch of options that let you decide how every column is supposed to be populated. Each column is interpreted semantically so that related data is suggested: for instance, a column named 'Department' isn't filled with random characters but with values like "Technical", "Web", "Customer", etc. You can even use regular expressions to restrict the generated values.
I populated my tables with over 10,000,000 records which was an awesome simulation.
Late answer but can be useful to other readers of this thread.
Besides the other solutions, I can recommend importing data from a .csv file using SSMS or custom SQL import scripts or programs. There is a step-by-step tutorial on how to do this, so you might want to check it out: http://solutioncenter.apexsql.com/how-to-generate-randomized-test-data-from-a-csv-file/
Be aware that importing a .csv file using SSMS or custom SQL import scripts is easier than creating SQL inserts manually, but there are some limitations, as explained in the tutorial:
If thousands of rows need to be populated and the .csv file contains only a few hundred rows of data, that is just not enough. The workaround is to re-import the same .csv file over and over until you have as many rows as you need. The drawback of this method is that it inserts large blocks of rows with the same data, without randomizing them.
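For the "custom SQL import script" route, a minimal BULK INSERT sketch (the file path, target table, and delimiter settings here are assumptions about your setup):
BULK INSERT dbo.CustomersTest
FROM 'C:\data\customers.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
Re-running that statement is an easy way to import the same file repeatedly, as described above.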
The tutorial also explains how to use a 3rd-party SQL data generator called ApexSQL Generate. The tool has an integrated function to generate large amounts of randomized data from the imported .csv file. The application offers a fully functional free trial, so you can download it and try it to see if it works for you.
http://filldb.info/dummy/ works best. It offers complete settings, choice of how many rows to generate, "real" dummy data, all for free.
I've never seen anything more effective or better suited to these conditions.
You can generate a whole database or just a table with an easy to use GUI. It is also very elaborate in its settings and options, allowing you to generate dummy data with basically no effort. The GUI has no limits in size and is very extensive in data type options.
To use it, navigate to the link and insert a SQL command that defines your tables, or use their dummy tables. Then click next and fill out the data types and settings for each column for dummy data population.
Then click next and generate the data. Wait. Once done, download the database and import it to your own database server.
Related
I need some help, and I know I am not the only one dealing with this issue, but I am wondering if you might have some ideas on how to handle comparing two rows of data and filling out start and end dates.
To give you some context, we have a huge hierarchy (approx 8,000 rows and about 12 columns wide) that is updated each year. Sometimes the values change and sometimes they don’t. When the values don’t change, then I don’t need to adjust the dates. When the values do change and a new row is added, I need to change the data.
I have attached some fake data to try and illustrate my data. I am building this in MS Access, so I think this is more of a DBA type question that is going to be manipulated via a recordset type method.
In my example I have two tables – Old Table and New Table. In each table there is a routing code field that represents my join field and primary key for this table.
The Old table represents existing data - tblMain. The New Table represents the data to be appended - tblTemp.
To append the data, I have an append query set up in Access. I perform a left join between the Old and New tables, joining on every field and append the rows that are null in the Old table. That’s fine and that is not where my issue is.
What is causing me issue is how to fill out the start and end dates.
So as you can see from my tables, we are running a zoo. Let’s just say for the sake of the argument, our zoo started off pretty simple and has become more sophisticated. We now want our hierarchy to expand out and become a bit more detailed as we are now capturing the type of animal (Level 4) and the native location (Level 5).
As you can see when comparing one table to another the routing codes are the same, so the append query has to have a join on each field. When you do this, you return the Result Table which is essentially the Old and New tables stacked on top of each other. You might think about a Union query but this is going to give me duplicates and I don’t want that.
If you notice in the Result Table there is a Start and End Date. Let’s just say I get the start and end dates via message box that pops up upon the import of the data and is held in a variable. I think there are dates in my real data but still trying to verify this.
So how do I do the comparison? Pseudocode for the logic needed:
• For each routing code:
    compare Levels 1-5
    if the routing code is the same but Levels 1-5 are not the same:
        fill out the end date of the old record
        fill out the start date of the new record
This idea of comparing two records and filling out a date is quite prevalent in my organization, but I haven't found a way of creating the logic that consistently works, so any help or suggestions would be appreciated.
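A hedged sketch of the compare-and-fill step described above, written as an Access SQL update (the field names RoutingCode, Level1 through Level5, and EndDate are assumptions based on the description, and [pEndDate] stands in for the date captured from the message box):
UPDATE tblMain INNER JOIN tblTemp
    ON tblMain.RoutingCode = tblTemp.RoutingCode
SET tblMain.EndDate = [pEndDate]
WHERE tblMain.Level1 <> tblTemp.Level1
   OR tblMain.Level2 <> tblTemp.Level2
   OR tblMain.Level3 <> tblTemp.Level3
   OR tblMain.Level4 <> tblTemp.Level4
   OR tblMain.Level5 <> tblTemp.Level5;
The newly appended rows coming from tblTemp would get their StartDate stamped the same way, either in the append query itself or in a second update run right after the append. Note that if the old rows have no values yet in Level4/Level5, the <> comparison won't catch them, so NULL handling (e.g. with Nz()) may be needed.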
Old Table
New Table
Result Table
Recently I have been using Microsoft SQL Server for creating databases that are referred to from an Excel document. There have been a number of instances when I needed to make a small change to my tables and ended up DROP-ing all my current tables and re-creating them with an updated query. I understand you can use UPDATE to change the values of records within a table, but I'm looking to change a column's data type so that I can go from 2 to 3 decimal places in one column of my tables. The code for creating the table looks something like this:
CREATE TABLE WIRE_INDEX
--"Field" "Data Type" "Null or Not"
(...
...
DENSITY decimal(18,2) Not Null);
I don't know if the solution is something obvious, but I have been unable to find anything useful. I'm not sure how to refer to the data type of a field in SQL.
When I populate the database I use numbers like 0.283 and 0.164, but when I SELECT the record I only get the first two decimals. I'd like the first 3 decimals to appear in the way I enter them into the table.
(Not sure if I'm supposed to post my solution, but credit to TEEKAY and Apurav for answering my question.) I used the code posted by Apurav, which looks like this:
ALTER TABLE WIRE_INDEX
ALTER COLUMN DENSITY decimal(18,3) Not Null
When I pulled the table using a SELECT statement, the precision showed three decimal places, but the precision of my original input was gone (the values had already been rounded to two decimals when they were stored in the decimal(18,2) column), so I had to re-enter my values using UPDATE. Not sure if this is more effective than just starting over, but it worked for me and now I know.
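A small hedged demonstration of why the re-entry was necessary (the temp table and values are made up for illustration):
CREATE TABLE #Demo (DENSITY decimal(18,2) NOT NULL)
INSERT INTO #Demo VALUES (0.283)   -- stored as 0.28: rounded when it hits the (18,2) column
ALTER TABLE #Demo ALTER COLUMN DENSITY decimal(18,3) NOT NULL
SELECT DENSITY FROM #Demo          -- returns 0.280, not 0.283
DROP TABLE #Demo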
I need to set up a new company for automated data import. The utility has provided the data in a spreadsheet. (Image 1)
Based on this data, I need to create a stored procedure that will identify the correct meter, if it exists, and perform either an insert or update to the monthly data table. For automated utility data import, I want to make sure I restrict everything to a particular utility company.
The steps are the following (I am having a hard time converting this to SQL):
1- I just want a script that identifies the correct meter and checks whether it exists, basically comparing the Meter# column in the Excel file with the MeterNumber column in the Meters table.
2- The next step is to perform either an insert or an update to the MonthlyData table. This is a screenshot of all its columns.
3- Then I just want to make sure that I am restricting everything to the particular company, which in this case is Site1, since 2 different companies might have the same meter#. The UtilityCompany table contains 3 columns: ID, Name, UtilityType.
I honestly do not know where to get started; could anybody help me with the script? Thank you
You will want to:
perform a Bulk Insert operation to take your data from the Excel file into a staging table.
write a query that selects ALL rows for the corresponding utility company (notice I didn't say iterate over each row...). This select could be an update in which you set an additional column to mark each row as an INSERT or an UPDATE.
Then the last step (2 parts): retrieve all of the rows that were marked as INSERT and insert those into your table. Then grab all rows that were marked as UPDATE and update their corresponding values based on your matching criteria.
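A hedged sketch of that flow in T-SQL (only the table names and Meters.MeterNumber come from the question; the staging table dbo.UtilityStaging, the @UtilityCompanyID parameter, and columns such as MeterID, ReadingMonth, and Usage are assumptions):
-- step 1: the spreadsheet has been loaded into dbo.UtilityStaging (BULK INSERT or the import wizard)

-- step 2a: update monthly rows that already exist for a matching meter of the target company
UPDATE md
SET md.Usage = st.Usage
FROM dbo.MonthlyData md
JOIN dbo.Meters m ON m.ID = md.MeterID
JOIN dbo.UtilityStaging st ON st.MeterNumber = m.MeterNumber
WHERE m.UtilityCompanyID = @UtilityCompanyID
  AND md.ReadingMonth = st.ReadingMonth

-- step 2b: insert rows for matching meters that have no row for that month yet
INSERT INTO dbo.MonthlyData (MeterID, ReadingMonth, Usage)
SELECT m.ID, st.ReadingMonth, st.Usage
FROM dbo.UtilityStaging st
JOIN dbo.Meters m ON m.MeterNumber = st.MeterNumber
WHERE m.UtilityCompanyID = @UtilityCompanyID
  AND NOT EXISTS (
      SELECT 1 FROM dbo.MonthlyData md
      WHERE md.MeterID = m.ID AND md.ReadingMonth = st.ReadingMonth)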
Ok, I've got a database table where data gets dumped by this horrid little program that I despise, but can't change at the moment. It has merchant data in there, names, addresses, and a set of categories that are pipe-delimited. What I need is a clean way to split these out, so I have one row for each merchant/category pair. From there, I can easily get it into the new data structure. This will need to be a repeatable process for a short period of time. I realize the optimal solution is to rid myself of this structure, but I've wracked my brain trying to figure out how to do this cleanly in sql.
I already have a function in the database that will split a delimited string and return a table.
This is in sql server 2008, btw.
Edit (for clarity):
Basically, the following might be a merchant (with the categories attached; other fields redacted for simplicity, and commas used as field delimiters here).
Jimbo's Bait Shoppe, Bait|Sports Gear|Sandwiches
What I need is:
Jimbo's Bait Shoppe, Bait
Jimbo's Bait Shoppe, Sports Gear
Jimbo's Bait Shoppe, Sandwiches
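With a split function like the one mentioned already in place, a minimal sketch of the per-pair query (the function name dbo.SplitString, its output column Value, and the merchant table/column names are assumptions):
SELECT m.MerchantName, s.Value AS Category
FROM dbo.Merchants m
CROSS APPLY dbo.SplitString(m.Categories, '|') s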
If you have already written a function that splits the string and returns a table, you can use a trigger.
Create a trigger on INSERT on the table where the "horrid" program spits the data. The trigger then takes the unformatted data and populates two clean tables (I think in your case you should have two tables: one for merchants and another for their products/categories, linked in a one-to-many relationship via MerchantID).
In this case you can treat the table with unformatted data as a "dirty" staging table and cleanse it straight after the "horrid" program has imported a file.
Please comment if you need help with the triggers
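A hedged sketch of that trigger (every table, column, and function name here is an assumption; dbo.SplitString stands in for the existing split function, and matching back on the merchant name is only a sketch-level shortcut):
CREATE TRIGGER trgSplitMerchantCategories ON dbo.RawMerchantDump
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- copy the merchant itself into the clean merchants table
    INSERT INTO dbo.Merchants (MerchantName, Address)
    SELECT i.MerchantName, i.Address
    FROM inserted i;

    -- one row per merchant/category pair in the child table
    INSERT INTO dbo.MerchantCategories (MerchantID, Category)
    SELECT m.MerchantID, s.Value
    FROM inserted i
    JOIN dbo.Merchants m ON m.MerchantName = i.MerchantName
    CROSS APPLY dbo.SplitString(i.Categories, '|') s;
END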
I'm trying to split a table into multiple tables based on the value of a given column using Talend Open Studio. Let's say this column can contain any of the integer values of 1, 2, 3, etc. then according to this value, these rows should go to table_1, table_2, table_3 etc.
It would be best if I could solve this when the number of different values in that column is not known in advance, but for now we can assume that all these output tables exists already. The bottom line is that the number of different values and therefore the number of different tables are high enough that setting up the individual filters manually is not an option.
Is it possible to solve this using Talend Open Studio or any similar open-source ETL tools like Pentaho Kettle?
Of course, I could just write a simple script myself, but I would prefer to use a proper ETL tool since the complete ETL process is quite complex.
In PDI (Pentaho Kettle) you could do this with partitioning (a right-click option on the step, IIRC). Partitioning in PDI is designed for exactly this sort of problem.
Yes, it's possible to split the data into different tables based on a single column, but for that you need to create the table names dynamically:
tFileInputDelimited -> tFlowToIterate -> tFixedFlowInput -> then use globalMap() to get the column values and use them to separate the data into the different tables, i.e. use globalMap(<column used to separate the data>) in the table name.
The first solution that came to my mind was using the replicator to send the current row to three filters which act as guards and only let rows through with either 1, 2 or 3 in the given column. Pic: http://i.imgur.com/FmvwU.png
But you could also build the table name dynamically, if that is what you want. Pic: http://i.imgur.com/8LR7Q.png