How do I stitch together tables using SQL? - sql

Ok, I am learning SQL and just installed SQL Server. I've read about outer joins and inner joins but am not sure either is what I want. Basically, I want to reconstruct a text file that has been "chopped" up into 5 smaller text files. The columns are the same across all 5 text files, e.g. name, age, telephone #, etc. The only difference is that they have different numbers of rows of data.
What I'd like to do is "append" the data from each file into one "mega-file". Should I create a table containing all of the data, or just create a view? Then, how do I implement this...do I use union? Any guidance would be appreciated, thanks.

Beyond your immediate goal of merging the five files it sounds like you want the data contained in your text files to be generally available for more flexible analysis.
An example of why you might require this is if you need to merge other data with the data in your text files. (If this is not the case then Oded is right on the money, and you should simply use logparser or Visual Log Parser.)
Since your text files all contain the same columns you can insert them into one table*.
1. Issue a CREATE statement defining your table
2. Insert data into your newly created table**
3. Create an index on field(s) which might often be used in query predicates
4. Write a query or create a view to provide the data you need
*Once you have your data in a table you can think about creating views on the table, but to start you might just run some ad hoc queries.
**Note that Step 2 can be accomplished in other ways: for example, with BULK INSERT, or by programmatically constructing and issuing your INSERT statements.
Examples of each of the above steps are included below, and a tested example can be found at: http://sqlfiddle.com/#!6/432f7/1
-- 1.
CREATE TABLE mytable
(
id int identity primary key,
person_name varchar(200),
age integer,
tel_num varchar(20)
);
-- 2. or look into BULK INSERT option https://stackoverflow.com/q/11016223/42346
INSERT INTO mytable
(person_name, age, tel_num)
VALUES
('Jane Doe', 31, '888-888-8888'),
('John Smith', 24, '888-555-1234');
-- 3.
CREATE INDEX mytable_age_idx ON mytable (age); -- not UNIQUE: several people can share the same age
-- 4.
SELECT id, person_name, age, tel_num
FROM mytable
WHERE age < 30;

You need to look into using UNION. Note that UNION removes duplicate rows; use UNION ALL if every row from every file should be kept.
SELECT *
FROM TABLE1
UNION
SELECT *
FROM TABLE2
And I would just create a View -- no need to have a stored table especially if the data ever changes.
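As a rough sketch of the view approach for all five pieces, assuming each file has already been loaded into its own table: the names part1 through part5 and all_parts are made up for illustration, and the columns are borrowed from the example table in the other answer.

```sql
-- Hypothetical names: part1..part5 hold the five imported files,
-- all_parts is the combined "mega-file" view.
CREATE VIEW all_parts AS
SELECT person_name, age, tel_num FROM part1
UNION ALL
SELECT person_name, age, tel_num FROM part2
UNION ALL
SELECT person_name, age, tel_num FROM part3
UNION ALL
SELECT person_name, age, tel_num FROM part4
UNION ALL
SELECT person_name, age, tel_num FROM part5;
```

UNION ALL is used here because the goal is to append; plain UNION would silently drop any row that appears in more than one file. Once the view exists it can be queried like a table, for example SELECT COUNT(*) FROM all_parts.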

Related

Import a single column dataset (CSV or TXT or XLSX) to act as a list in SQL WHERE IN clause

I have a dataset that I receive on a weekly basis: a single column of unique identifiers. Currently this dataset is gathered manually by our support staff. I am trying to use this dataset (a CSV file) in the WHERE clause of a SQL query.
To add this dataset to my query I do some data transformation to tweak the formatting; the reformatted data is then pasted directly into the WHERE IN part of my query. Ideally I would be able to import this list into the SQL query directly, bypassing the manual effort involved in formatting the data and swapping between programs.
I am just wondering if this is possible. I have tried my best to scour the internet and have had no luck finding any reference to this functionality.
Using WHERE IN makes this more complex than it needs to be. Store the IDs you want to filter on in a table called MyTableFilters, with a single column of ID values, and join from MyTable to MyTableFilters on ID. The join will cause MyTable to return only rows whose ID is also in MyTableFilters.
select * from MyTable A join MyTableFilters F on A.ID = F.ID
Since you don't really need to do any transformations or data manipulation on what you want to ETL, you could also easily truncate the table and use BULK INSERT to keep MyTableFilters up to date:
truncate table dbo.MyTableFilters
BULK INSERT dbo.MyTableFilters
FROM 'X:\MyFilterTableIDSourceFile.csv'
WITH
(
FIRSTROW = 1,
DATAFILETYPE='widechar', -- UTF-16
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK,
KEEPNULLS -- Treat empty fields as NULLs.
)
I'm guessing that you currently have something like the following:
SELECT *
FROM MyTable t
WHERE t.UniqueID in ('ID12','ID345','ID84')
My recommendation would be to create a table in which to store the IDs referenced in the WHERE clause. So for the above, your table would look like this:
UniqueID
========
ID12
ID345
ID84
Supposing the table is called UniqueIDs the original query then becomes:
SELECT *
FROM MyTable t
WHERE t.UniqueID in (SELECT u.UniqueID FROM UniqueIDs u)
The question you're asking is then how to populate the UniqueIDs table. You need some means to expose that table to your users. There are several ways you could go about that. A lazy but relatively effective solution would be a simple MS Access database with that table as a "linked" table. You may need to be careful about permissions.
Alternatively, assuming you're wedded to the CSV, set up an SSIS job which clears down the table and then imports from that CSV into the UniqueIDs table.
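A minimal sketch of the lookup table itself, in case it helps. The varchar(50) length is an assumption and should match the type of MyTable.UniqueID; the three sample IDs are the ones from the query above.

```sql
-- Assumed column type: adjust varchar(50) to match MyTable.UniqueID.
CREATE TABLE UniqueIDs
(
    UniqueID varchar(50) NOT NULL PRIMARY KEY
);

-- Sample rows; in practice these come from the weekly CSV import.
INSERT INTO UniqueIDs (UniqueID)
VALUES ('ID12'), ('ID345'), ('ID84');

-- Same filter as the IN (SELECT ...) form, written with EXISTS.
SELECT t.*
FROM MyTable t
WHERE EXISTS (SELECT 1 FROM UniqueIDs u WHERE u.UniqueID = t.UniqueID);
```

The primary key keeps the weekly list free of duplicate IDs and gives the lookup an index to use.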

Matrix table index SQL Server 2008

I have a table with two columns built from another table of names, one identity and one a name like this:
ID---Name
1----Mike
2----Jeff
3----Robert
...down to however many
Could be 10 rows, could be 100. This will vary depending on input from other tables that are always changing but never be over 160 or so.
Now, pairings of names will have some meaning and thus a decimal data type score will be associated with said pairing (how at this point doesn’t matter, just need to build it for now...numbers just illustrative). I envision a matrix kind of like this:
ID------Name------Mike-------Jeff--------Robert-------- ...out to however many
1 -------Mike-------NULL------100.1------5.4-------- ...out to however many
2 -------Jeff---------100.1------NULL-----21.23--------- ...out to however many
3 ------Robert-------5.4--------21.23-----NULL---------...out to however many
…down to however many happen to be in the first table…
Maybe this isn’t quite the most optimal way to go (yes, I know there are duplicates in the table, but I plan to structure the queries such that the duplicates are ignored), but at this point I am not aware of many viable options. After searching around, I thought maybe I wanted a pivot, but that doesn’t seem to fit what I have here because I’m leaving the names in the column and associating them as column heads for a paired score. Then I thought maybe I wanted to store a variable as the value of each row and then add them as the columns. That was no help. My latest iteration was creating a temp table as an exact copy with an identity column, then trying to select the specific name by the identity and looping through them, but I can’t even seem to grab the first name and make it a column name in addition to a row value under the name column...see below
--create a table of names with an identity column
CREATE TABLE myTable2
(
ID INT IDENTITY(1,1),
Name VARCHAR(5),
);
--add names to the table from a different table
INSERT INTO myTable2 (Name)
SELECT Name
FROM myTable1
--create a temp table with the same values
SELECT ID, Name
INTO #new
FROM myTable2
GROUP BY ID, Name
--insert name from first row as a column head
INSERT INTO myTable2 (SELECT Number FROM #new WHERE ID =1)
So, in the last bit there, “INSERT INTO”, I want to copy the name, in this instance “Mike”, and make it ALSO a column head in the same table where it is a row (like in my second table). I get an error message that the syntax is not correct for the statement. Why isn’t this allowed? How can I get it to do what I want? It also has been suggested by someone who knows way more about this stuff than me that maybe, instead of building the table as a matrix, I should build it as below. It is possible to get rid of the duplicates this way, and I would, except I have no idea where to even begin doing this…
Name1-----------Name2-----------Calculated Value
Mike--------------Mike-------------NULL
Jeff---------------Mike-------------100.1
Robert-------------Mike-------------5.4
Mike--------------Jeff-------------100.1
Jeff----------------Jeff-------------NULL
Robert------------Jeff-------------21.23
Mike--------------Robert-----------5.4
Jeff---------------Robert-----------21.23
Robert------------Robert-----------NULL
...etc
Any help, suggestions, or pointing me in the right and most appropriate direction would be greatly appreciated!
EDIT: Here's how I solved my problem. Looks like the Cartesian product was the way to go. Thanks @Alex Kudryashev
--create a table of cross joined names
CREATE TABLE cartNames
(
Name1 VARCHAR(10), --wide enough for 'Robert'; VARCHAR(5) would not hold it
Name2 VARCHAR(10)
);
--create two temporary tables from a source table of names
SELECT Name AS Name1
INTO #Name1
FROM names
GROUP BY Name;
SELECT Name AS Name2
INTO #Name2
FROM names
GROUP BY Name;
--populate the Cartesian table
INSERT INTO cartNames (Name1, Name2)
SELECT Name1, Name2 FROM #Name1 CROSS JOIN #Name2;
--get rid of the temp tables
DROP TABLE #Name1;
DROP TABLE #Name2;
--add columns and populate calculated scores
---
It looks like you want to create a Cartesian product. There is a very easy way to do so.
declare @tbl table(name varchar(10))
insert @tbl(name) values('Mike'),('Jeff'),('Robert')
--dbo.some_udf is a placeholder for whatever scalar function computes the score
select t1.name name1,t2.name name2, dbo.some_udf(t1.name,t2.name) calc_value
from @tbl t1 cross join @tbl t2
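If the mirrored pairs (Mike/Jeff and Jeff/Mike) and the self-pairs are not wanted, one possible variation is a self-join on an inequality instead of a full cross join. This is only a sketch with the three sample names; it assumes names are unique, and the score column is left out because the scoring function is not specified.

```sql
-- Table variable holding the sample names from the question.
DECLARE @tbl TABLE (name varchar(10));
INSERT @tbl (name) VALUES ('Mike'), ('Jeff'), ('Robert');

-- t1.name < t2.name keeps each unordered pair exactly once
-- and drops the pairs where a name is matched with itself.
SELECT t1.name AS name1, t2.name AS name2
FROM @tbl t1
JOIN @tbl t2 ON t1.name < t2.name;
```

For n distinct names this returns n(n-1)/2 rows rather than the n*n rows of the full Cartesian product.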

How do you copy a SQL table from one database to another database with different field names

I have a database name "EmpOld" with a table name "Employee" and a database name "EmpNew" with a table name "Employee".
The table structures are identical on both database tables except for the names in the table.
Here is a definition of the tables:
Database "EmpOld" table name "Employee" has following field names:
int e_id
char(20) e_fname
char(25) e_lname
Database "EmpNew" table "Employee" has following field names:
int id
char(20) fname
char(25) lname
Notice, the only difference in the tables is that the "e_" prefix is removed from the field names in the EmpNew Employee table.
How do I move data from EmpOld database to EmpNew database?
Is there a code that maps these field respectively.
Thanks community,
Nick
Well, you could just name them manually:
INSERT EmpNew.dbo.Employee(fname, lname) SELECT e_fname, e_lname FROM EmpOld.dbo.Employee;
If you want to do this without manually typing out the column names, there is magic in SSMS - just drag the Columns folder from each table into the appropriate spot in your query window (and then manually remove identity columns, timestamp, computed columns etc. if relevant).
There is no automatic way of mapping fields, short of writing some code.
This can be done in two ways:
Using the SQL Server Import and Export Wizard
This is the easiest way to do this, and here is an article that gives the steps. The key is to change the mapping between the source and destination fields while importing the data.
Writing a SQL statement
This method requires both databases to be accessible. It can be achieved with a simple insert statement, as follows:
insert into EmpNew.dbo.Employee(id, fname, lname)
select
e_id, e_fname, e_lname
from
EmpOld.dbo.Employee
If they are on the same SQL Server instance, the above will work as is. If they are on different servers, you may have to add a linked server and prefix the table names with its name.
Is there a code that maps these field respectively.
No - you'll need to provide the mapping. If you're using an ETL tool like SSIS there may be a way to programmatically map columns based on some criteria, but nothing built into SQL.
Maybe you can generate code with help from the tables sys.columns and other system tables so that you can make the copy-process run automatically.
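A rough sketch of what that could look like for the tables in the question. It assumes every source column in EmpOld.dbo.Employee carries the "e_" prefix and that the target name is simply the source name with the first two characters removed; it only lists the mapping and does not copy anything.

```sql
-- List each source column next to the target name it would map to.
-- STUFF(name, 1, 2, '') strips the leading 'e_'; [_] matches a literal underscore.
SELECT c.name                  AS source_column,
       STUFF(c.name, 1, 2, '') AS target_column
FROM EmpOld.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'EmpOld.dbo.Employee')
  AND c.name LIKE 'e[_]%'
ORDER BY c.column_id;
```

The two resulting lists can then be pasted (or concatenated) into the column list and the SELECT list of an INSERT ... SELECT statement.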
I think you can't use:
insert into (...) (...)
because you have two databases. So just generate insert statements like:
insert into table (...) VALUES (...)
Please correct me if i misunderstood the question.
There are 2 ways you can do this without data loss.
1) You can use an INSERT statement:
Insert into EmpNew.dbo.Employee (id, fname, lname)
Select e_id, e_fname, e_lname
from EmpOld.dbo.Employee
2) You can simply use the Import and Export Wizard:
Go to Start Menu > SQL Server 2008/2008R2/2012 > Import and Export
This will take you to the wizard.
Select Source: the data source (server name) and database you are extracting data from
Select Destination: the data source (server name) and database you are loading data into
Map the table
BE AWARE of PK/FK/Identity
You are good to go

Setting field size (per column) while generating table in Access

I am trying to export my database as a .dbf file by using a VBA script, but the DBF format requires the columns to have certain sizes.
When I leave the columns as they are in Access, I get an error saying
field will not fit in record
How can I set the column size for each column separately? Preferably while generating the table, so I don't have to do it manually every time I generate a new table with queries.
And where do I set them? (in a Query or in SQL?)
Thanks in advance!
Edit:
I have made sure that its the field size value that is giving me the error. I changed all the field size values manually by opening the table in Design View.
So now the second part of my question is becoming more crucial: whether or not it is possible to set the field size while generating the table.
Edit2:
I am currently using SQL in a query to create the table as follows:
SELECT * INTO DB_Total
FROM Tags_AI_DB;
After the initial DB_Total is made, I use several Insert into queries to add other rows:
INSERT INTO DB_TOTAL
SELECT a.*
FROM Tags_STS_ENA_DB AS a
LEFT JOIN DB_TOTAL AS b
ON a.NAME = b.NAME
WHERE b.NAME IS NULL;
If I set the column values in the DB_Total table while generating it with the Select into query, will they still have those values after using the Insert Into queries to insert more rows?
Edit3:
I decided (after a few of your suggestions and some pointers from colleagues) that it would be better to first make my table and afterwards update this table with queries.
However, it seems like I have run into a dead end with Access, this is the code I am using:
CREATE TABLE DB_Total ("NAME" char(79),"TYPE" char(16), "UNIT" char(31),
"ADDR" char(254), "RAW_ZERO" char(11), "RAW_FULL" char(11), "ENG_ZERO" char(11),
"ENG_FULL" char(11), "ENG_UNIT" char(8), "FORMAT" char(11), "COMMENT" char(254),
"EDITCODE" char(8), "LINKED" char(1), "OID" char(10), "REF1" char(11), "REF2" char(11),
"DEADBAND" char(11), "CUSTOM" char(128), "TAGGENLINK" char(32), "CLUSTER" char(16),
"EQUIP" char(254), "ITEM" char(63), "HISTORIAN" char(6),
"CUSTOM1" char(254), "CUSTOM2" char(254), "CUSTOM3" char(254), "CUSTOM4" char(254),
"CUSTOM5" char(254), "CUSTOM6" char(254), "CUSTOM7" char(254), "CUSTOM8" char(254))
These are all the columns required for me to make a DBF file that is accepted by the application we are using it with.
You'll understand my sadness when this generated the following error:
Record is too large
Is there anything I can do to make this table work?
UPDATE
The maximum record size for Access 2007 is around 2kB (someone will no doubt correct that value)
When you create CHAR(255) it will use 255 bytes of space regardless as to what is in the field.
By contrast, VARCHARs do not use up space (only enough to define them) until you put something in the field, they grow dynamically.
By changing the CHAR(x)s to VARCHAR(x)s you will shrink the record length of your table to within permitted values. Be aware that you may still run into trouble if a row you are trying to insert is larger than the 2kB limit.
Previous
The way to specify column lengths when generating the table is to use a CREATE TABLE statement instead of a SELECT * INTO.
CREATE TABLE DB_Total
(
Column1Name NVARCHAR(255) --Use whatever datatype and length you need
,Column2Name NUMERIC(18,0) --Use whatever datatype and length you need
,...
) ;
INSERT INTO DB_Total
....
If you use a SELECT * INTO statement, SQL will use whatever field lengths and types it finds in the existing data.
It is also better practice to list the column names in your insert statement, so instead of
INSERT INTO DB_TOTAL
SELECT a.*
You should put:
INSERT INTO DB_Total
(
Column1Name
,Column2Name
,...
)
SELECT a.Column1Name
,a.Column2Name
,...
FROM ...
WHERE ... ;
In Edit2, you indicated your process starts with a "make table" (SELECT INTO) query which creates DB_Total and loads it with data from Tags_AI_DB. Then you run a series of "append" (INSERT) queries to add data from other tables.
Now your problem is that you need specific field size settings for DB_Total, but it is impossible to define those sizes with a "make table" query.
I think you should create DB_Total one time and set the field sizes as you wish. Do that manually with the table in Design View, or execute a CREATE TABLE statement if you prefer.
Then forget about the "make table" query and use only "append" queries to add the data.
If the issue is that this is a recurring operation and you want to discard previous data before importing the new, execute DELETE FROM DB_Total instead of DROP TABLE DB_Total. That will allow you to preserve the structure of the (now empty) DB_Total table so you needn't fiddle with setting the field sizes again.
Seems to me the only potential issue then might be if the structure of the source tables changes. If that happens, revise the structure of DB_Total so that it's compatible again.
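Put together, the create-once, empty, then append cycle might look roughly like the statements below. Only three of the columns from the question are shown to keep the sketch short, VARCHAR is used on the assumption that variable-length text is acceptable for the export, and the names are bracketed in case Access treats them as reserved words. Access runs one statement per query, so each statement would be saved and executed as its own query: the CREATE TABLE once, the rest on every refresh.

```sql
CREATE TABLE DB_Total ([NAME] VARCHAR(79), [TYPE] VARCHAR(16), [UNIT] VARCHAR(31));

DELETE FROM DB_Total;

INSERT INTO DB_Total ([NAME], [TYPE], [UNIT])
SELECT a.[NAME], a.[TYPE], a.[UNIT]
FROM Tags_AI_DB AS a;

INSERT INTO DB_Total ([NAME], [TYPE], [UNIT])
SELECT a.[NAME], a.[TYPE], a.[UNIT]
FROM Tags_STS_ENA_DB AS a
LEFT JOIN DB_Total AS b ON a.[NAME] = b.[NAME]
WHERE b.[NAME] IS NULL;
```

Because the table is emptied with DELETE rather than dropped, the field sizes set by the CREATE TABLE statement survive every refresh.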

Should I create a new DB column or not?

I don't know if it is better for me to create a new column in my MySQL database or not.
I have a table :
calculated_data
(id, date, the_value, status)
status is a boolean.
I need an extra value named : the_filtered_value
I can get it easily like this :
SELECT IF(status IS FALSE, 0, the_value) AS the_filtered_value FROM calculated_data
The calculated_data table has millions of entries and I display the_value and the_filtered_value in charts and data tables (using php).
Is it better to create a new column the_filtered_value in the calculated_data table or just use the SELECT IF query?
In "better" I see :
better in performance
better in DB design
easier to maintain
...
Thanks for your help!
Do not add a column. Instead, create a VIEW based on the original data table and in the view add a "virtual" calculated column called the_filtered_value based on your expression.
In this way you will have easy access to the filtered value without having to copy the "logic" of that expression to different places in your code, while at the same time not storing any derived data. In addition, you will be able to operate directly on the view as if it were a table in most circumstances.
CREATE VIEW calculated_data_ex (id, date, the_value, status, the_filtered_value)
AS SELECT id, date, the_value, status, IF(status IS FALSE, 0, the_value)
FROM calculated_data
Adding the extra field adds complexity to your app but makes queries easier (especially when joined on other tables).
I personally always try to keep the data as separated as possible in the database, and I handle these cases in my application. Using an MVC pattern makes this task easier.
This works in MS SQL but I do not know if MySQL will support the syntax.
declare @deleteme table (value int, flag bit)
insert @deleteme
Values
(1,'False')
,(2,'true')
Select *, (flag*value) AS the_filtered_value from @deleteme
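For a form that should not depend on the dialect, a CASE expression is one option. This sketch uses the table and column names from the question and is written to return the same values as the IF(status IS FALSE, 0, the_value) expression there, including passing the_value through when status is NULL.

```sql
-- status = 0 is true only for an explicit FALSE;
-- TRUE and NULL both fall through to ELSE, as in the original IF().
SELECT id,
       date,
       the_value,
       CASE WHEN status = 0 THEN 0 ELSE the_value END AS the_filtered_value
FROM calculated_data;
```

The same expression can be placed in a view definition, so the logic lives in one place.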