Transforming a sheet into a table with column names as values in SQL Server - sql

I've been given the task of turning the following Excel table into a database table in SQL Server (I have shortened the row count, of course).
A car has to go in for service every 10.000 kilometers, but for some models there is a quick service that applies only at certain mileages (I don't know what mileage is called when it's measured in kilometers, lol).
The table shows car brands and models, and each following column represents the next maintenance service (i.e. column [10] represents the first service, performed at 10.000km, column [20] represents car service performed at 20.000km, etc.).
The values inside the mileage column will indicate if quick service "applies" to the corresponding model and mileage. (i.e. Quick service applies to [Changan A500] at 10.000km and 20.000km, but not at 30.000 or 40.000)
As mentioned before, I need to transform this table into a database table in SQL Server, with the following format.
In this format there will be a row for every model and the mileage at which quick service applies. I hope this clarifies the requirement.
I can load the source table into a new SQL table, then extract the data, transform it, and insert it into the required table (I assume there is no easy way of getting the information into the new format straight from the source Excel file).
Right now I'm thinking of using pointers in order to turn this data into a table, but I'm not very good at using pointers and I wanted to know if there might be an easier way before trying the hard way.
The point is to make this scalable, so they can keep adding models and their respective mileages.
How would you do it? Am I overcomplicating things by using pointers, or is it a good idea?
Thanks, and sorry for using so many pictures; I just thought they might make things clearer. The reason I haven't posted any SQL is that I can't yet figure out how I plan to transform the data.

I'm not sure you can have a column named 10, 20, 30, 40, etc, but this is how I would solve this kind of problem.
SELECT *
INTO unpvt
FROM (VALUES ('chengan', 'a500', 'applies', 'applies', '', ''),
             ('renault', 'kwid', 'applies', 'applies', 'applies', 'applies')
     ) v (brand, model, ten, twenty, thirty, fourty)

SELECT *
FROM unpvt

SELECT YT.brand,
       YT.model,
       V.number
FROM dbo.unpvt YT
CROSS APPLY (VALUES (ten),
                    (twenty),
                    (thirty),
                    (fourty)) V (number)
Result:
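To get all the way to the target shape described in the question (one row per model and the mileage at which quick service applies), a hedged variation of the last query could pair each column with its mileage and keep only the 'applies' rows; the literal mileage numbers here are my assumption based on the column headers:
SELECT YT.brand,
       YT.model,
       V.mileage
FROM dbo.unpvt YT
CROSS APPLY (VALUES (10000, ten),
                    (20000, twenty),
                    (30000, thirty),
                    (40000, fourty)) V (mileage, applies)
WHERE V.applies = 'applies'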

Related

Creating a Table Using Previous Values (Iterative Process)

I'm completely new to Visual FoxPro (9.0) and I'm having trouble creating a table which uses previous values to generate new values. What I mean by this is that I have a given table with two columns, age and probability of death. Using this I need to create a survival table which has the columns Age, l(x), d(x), q(x), m(x), L(x), T(x), and e(x), where:
l(x): Survivorship Function; Defined as l(x+1) = l(x) * EXP(-m(x))
d(x): Number of Deaths; Defined as l(x) - l(x+1)
q(x): Probability of Death; This is given to me already
m(x): Mortality Rate; Defined as -LN(1-q(x))
L(x): Total Person-Years of Cohorts in the Interval (x, x+1); Defined as l(x+1) + (0.5 * d(x))
T(x): Total Person-Years of all Cohorts in the Interval (x, N); Defined as SUM(L(x)) [from x to N]
e(x): Remaining Life Expectancy; Defined as T(x) / l(x)
Now I'm not asking how to get all of these values, I just need help getting started and maybe pointed in the right direction. As far as I can tell, in VFP there is no way to point to a specific row in a data-table, so I can't do what I normally do in R and just make a loop. I.E. I can't do something like:
for (i in 1:length(given_table$Age))
{
new_table$mort_rate[i] <- -log(1 - given_table$death_prop[i])
}
It's been a little while so I'm not sure that's 100% correct anyway, but my point is I'm used to being able to create a table, and alter the values individually by pointing to a specific row and/or column using a loop with a simple counter variable. However, from what I've read there doesn't seem to be a way to do this in VFP, and I'm completely lost.
I've tried to make a cursor, populating it with dummy values and trying to update each value sequentially using a SCATTER NAME and SCAN/REPLACE approach, but I don't really understand what's happening or how to fine-tune it for each calculation/entry that I need. (This is the post I was referencing when I tried this: Multiply and subtract values in previous row for new row in FoxPro.)
So, how do I go about making a table that relies on an iterative process to calculate subsequent values in Visual FoxPro? Are there any good resources that explain cursors and the SCATTER/SCAN approach I was trying? (I couldn't find any resources that explained it in terms I could understand.)
Sorry if I've worded things poorly, I'm fairly new to programming in general. Thank you.
You absolutely can write a loop through an existing table in VFP. Use the SCAN command. However, if you're planning to add records to the same table as you go, you're going to run into some issues. Is that what you meant here? If so, my suggestion is to put the new records into a cursor as you create them and then APPEND them to the original table after you've processed all the records that were there when you started.
If you're putting records into a different table as you loop through the original, this is straightforward:
* Assumes you've already created the table or cursor to hold the result
SELECT YourOriginalTable && substitute in the alias/name for the original table
SCAN
* Do your calculations
* Substitute appropriately for YourNewTable and the two lists
INSERT INTO YourNewTable (<list of fields>) VALUES (<list of values>)
ENDSCAN
In the INSERT command, if you refer to any fields of the original table, you need to alias them, like this: YourOriginalTable.YourField, again substituting appropriately.
A bit late, but maybe it still helps.
The steps to achieve what you want are:
0. Close the tables, just in case (see CLOSE DATABASES).
1. Open the Age table (see USE in the VFP help).
2. Create the Survival table structure (see CREATE TABLE). For this you need to know the field type for each of your l(x), d(x), etc. values. Let's say you name the fields after your functions (i.e. lx, dx, etc.).
3. Select the Age table (see SELECT).
4. Loop through the Age table (see SCAN).
5. Pass each record into variables (see SCATTER).
6. Make your calculations starting from the Age table data (variables) using the l(x), d(x), etc. formulas, and store the results in variables named after your Survival table fields, e.g. m.mx = -LOG(1 - m.qx), where qx is the probability-of-death field (see LOG). Note: in these calculations you can use any mix of Age table variables and the newly created variables.
7. After you have calculated all the Survival fields, write them to the table (see the APPEND and GATHER commands).
8. Close the tables (see CLOSE DATABASES).
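Putting those steps together, a minimal VFP sketch of the forward pass (the table name AgeTable, the field names age and qx, and the 100,000 starting cohort are all assumptions for illustration, not from the original post):
* Hedged sketch only: table, field, and variable names are assumptions.
CLOSE DATABASES
USE AgeTable IN 0 ALIAS AgeTable          && source: age, qx (given probability of death)

* Survival table; B(6) = double with 6 decimal places
CREATE TABLE Survival (age I, lx B(6), dx B(6), qx B(6), mx B(6), bigLx B(6), Tx B(6), ex B(6))

SELECT AgeTable
m.lx = 100000                             && l(0): assumed radix
SCAN
    m.age   = AgeTable.age
    m.qx    = AgeTable.qx
    m.mx    = -LOG(1 - m.qx)              && m(x) = -LN(1 - q(x))
    m.lx1   = m.lx * EXP(-m.mx)           && l(x+1) = l(x) * EXP(-m(x))
    m.dx    = m.lx - m.lx1                && d(x) = l(x) - l(x+1)
    m.bigLx = m.lx1 + 0.5 * m.dx          && L(x) = l(x+1) + 0.5 * d(x)
    INSERT INTO Survival (age, lx, dx, qx, mx, bigLx) ;
        VALUES (m.age, m.lx, m.dx, m.qx, m.mx, m.bigLx)
    m.lx = m.lx1                          && carry l(x+1) forward to the next record
ENDSCAN

* T(x) and e(x) need the running sum of L(x) from the bottom of the table up,
* so fill those two columns in a second, backwards pass over Survival.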

Reading Max value from the XML file VIA SQL

My data is stored in a table Transactiontable. I have columns in the table, one of which is PostIT that stores XML data:
(table image omitted)
(XML image omitted)
From the xml file I have to read the max or the most recent transaction datetime.
I am able to read the first node:
select
    convert(xml, PostIT).value('(Transaction/DateTime)[2]', 'DATETIME')
from
    dbo.transactiontable;
I have to read all the transaction datetimes and then be able to select the max, i.e. the latest transaction.
I tried CROSS APPLY and it did not work:
select
T.N.value ('Transaction/DateTime[2]','DATETIME')
from
dbo.transactiontable
cross apply
XXX.nodes('TRANSACTION') as T(N)
There are several things to say first:
Please do not paste your sample data as a picture. Nobody wants to type this in.
Best would be to provide an MCVE or a fiddle with DDL, sample data and your own attempt, together with the expected output.
Always tag as specifically as possible. Tag with your tools and provide vendor and version.
Now to your question:
Again there is something to say first:
Your XML contains date-time values in a culture-dependent format. This is dangerous: some systems will take "01/05/2019" as the first of May, others will read it as the 5th of January.
The XML is hard to read. If I understand the picture correctly, you have alternating <DateTime> and <TRANSACTION_KEY> elements, and it seems to be their sort order that ties them together. This is a bad design...
Try this as a first step:
SELECT dt.value('text()[1]','nvarchar(100)') AS SomeDateTimeValueAsString
FROM YourTable t
CROSS APPLY t.TheXmlColumn.nodes('/DATA/CO/TRANSACTION/DateTime') A(dt);
If this is not enough to get you running, you'll have to invest more effort to your question.
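If you then need the latest transaction per row, a hedged second step might look like this (it reuses the placeholder names YourTable and TheXmlColumn from the query above; SomeId stands for whatever key column identifies a row, and style 103 assumes the dates are stored as dd/mm/yyyy):
SELECT t.SomeId,
       MAX(CONVERT(datetime, dt.value('text()[1]', 'nvarchar(100)'), 103)) AS LatestTransaction
FROM YourTable t
CROSS APPLY t.TheXmlColumn.nodes('/DATA/CO/TRANSACTION/DateTime') A(dt)
GROUP BY t.SomeId;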

How to implement a matrix concept in SQL Server

I am stuck on the concept of creating, in SQL Server, a matrix that is currently maintained in Excel. I couldn't find a good answer online. The room numbers are in the first row, and the first column lists the functional requirements. So, for example, when a camera is needed in one of the rooms, I place an X mark at the desired row/column coordinate to indicate that the room contains one. I attached a sample of the Excel sheet to explain better: Excel Matrix.png
Rather than having multiple columns for every possible functional requirement, use proper relational methods for a many-to-many relationship:
Rooms
------
Id
RoomName
Functions
---------
Id
FunctionName
RoomFunctions
-------------
RoomId
FunctionId
Then you can relate one room to a variable number of functions, and can add functions easily without changing your data structure.
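A hedged T-SQL sketch of that structure (the column types and constraints are my assumptions):
CREATE TABLE dbo.Rooms (
    Id       INT IDENTITY(1,1) PRIMARY KEY,
    RoomName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.Functions (
    Id           INT IDENTITY(1,1) PRIMARY KEY,
    FunctionName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.RoomFunctions (
    RoomId     INT NOT NULL REFERENCES dbo.Rooms (Id),
    FunctionId INT NOT NULL REFERENCES dbo.Functions (Id),
    PRIMARY KEY (RoomId, FunctionId)
);

-- An 'X' in the spreadsheet becomes one row here, e.g. "this room needs a camera":
-- INSERT INTO dbo.RoomFunctions (RoomId, FunctionId) VALUES (@RoomId, @CameraFunctionId);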
Without having data to work with, it's hard to give you an example.
With that said, the pivot method may help you out. You can just have a dummy column with a 1 or 0 based on whether or not it has an 'X' in your data. Then in the pivot you would just take the MAX of that for the various values.
It may require massaging your data into a better format, but should be doable.
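For illustration, a hedged sketch of that PIVOT idea, assuming a normalized source like the RoomFunctions design above and a hard-coded list of function names ([Camera] and [Projector] are made-up examples):
SELECT RoomName, [Camera], [Projector]
FROM (
    SELECT r.RoomName, f.FunctionName, 1 AS HasFunction
    FROM dbo.RoomFunctions rf
    JOIN dbo.Rooms     r ON r.Id = rf.RoomId
    JOIN dbo.Functions f ON f.Id = rf.FunctionId
) src
PIVOT (MAX(HasFunction) FOR FunctionName IN ([Camera], [Projector])) p;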

Scalable approach to splitting and validating a fixed width or CSV datarow in SQL A.K.A. How can I avoid SSIS?

I'm currently playing with different ways of getting data into SQL, and yesterday I hit a problem using BCP which, although I solved it, reminded me of working with SSIS packages because of the not-very-useful error information. I feel that, for the way I like to work, I would be much happier loading entire data rows (whether fixed-width or delimited) into a staging table using BCP or BULK INSERT, and then operating on the data rows rather than trying to force them into typed columns on the way into SQL.
As such I would like to find an approach that would allow me to split and validate (check datatype) the data before I insert it into its destination and also write out any bad datarows to another table so I can then decide what to do with them.
I've knocked together a script to simulate the scenario, the importedData table would be the output of my BCP or BULK INSERT. All the data from ImportedData needs to end up either in the Presenters or the RejectedData tables.
I need an approach that can scale reasonably well; a real-life situation might be more like 40 columns across with 20 million rows of data, so I'm thinking I'll have to do something like process 10,000 rows at a time.
SQL Server 2012 has the new try_parse function which would probably help but I need to be able to do this on 2005 and 2008 machines.
IF OBJECT_ID (N'ImportedData', N'U') IS NOT NULL DROP TABLE dbo.ImportedData
CREATE TABLE dbo.ImportedData (RowID INT IDENTITY(1,1), DataRow VARCHAR(MAX))
IF OBJECT_ID (N'Presenters', N'U') IS NOT NULL DROP TABLE dbo.Presenters
CREATE TABLE dbo.Presenters (PresenterID INT, FirstName VARCHAR(10), LastName VARCHAR(10))
IF OBJECT_ID (N'RejectedData', N'U') IS NOT NULL DROP TABLE dbo.RejectedData
CREATE TABLE dbo.RejectedData (DataRow VARCHAR(MAX))
-- insert as fixed-width
INSERT INTO dbo.ImportedData(DataRow)
SELECT '1 Bruce Forsythe '
UNION ALL SELECT '2 David Dickinson '
UNION ALL SELECT 'X BAD DATA'
UNION ALL SELECT '3 Keith Chegwin '
-- insert as CSV
/*INSERT INTO dbo.ImportedData(DataRow)
SELECT '1,Bruce,Forsythe'
UNION ALL SELECT '2,David,Dickinson'
UNION ALL SELECT 'X,BAD,DATA'
UNION ALL SELECT '3,Keith,Chegwin'
*/
---------- DATA PROCESSING -------------------------------
SELECT
SUBSTRING(DataRow,1,3) AS ID,
SUBSTRING(DataRow,4,10) AS FirstName,
SUBSTRING(DataRow,14,10) AS LastName
FROM
ImportedData
---------- DATA PROCESSING -------------------------------
SELECT * FROM ImportedData
SELECT * FROM Presenters
SELECT * FROM RejectedData
For your 20M row scenario and concerns over performance, let's dive into that.
Step 1, load the big file into the database. The file system is going to go to disk and read all that data. Maybe you're sitting on banks of Fusion-io drives and IOPS is not a concern, but barring that unlikely scenario, you will spend X amount of time reading that data off disk via bcp/bulk insert/SSIS/.NET/etc. You then get to spend time writing all of that same data back to disk in the form of the table insert(s).
Step 2, parse that data. Before we spend any CPU time running those substring operations, we'll need to identify the data rows. If your machine is well provisioned on RAM, then the data pages for ImportedData might be in memory and it will be far less costly to access them. Odds are though, they aren't all in memory so a combination of logical and physical reads will occur to get that data. You've now effectively read that source file in twice for no gain.
Now it's time to start splitting your data. Out of the box, T-SQL will give you trims, LEFT, RIGHT, and SUBSTRING. With CLR, you can get wrapper methods around the .NET string library to help simplify the coding effort, but you'll trade some coding efficiency for instantiation costs. Last I read on the matter (T-SQL vs. CLR), the answer was "it depends." Shocking, if you know the community, but it really does depend on your string lengths and a host of other factors.
Finally, we're ready to parse the values and see whether each one is legit. As you say, with SQL 2012 we have try_parse as well as try_convert. Parse is completely new, but if you need to deal with locale-aware data (01-02-05: in GB it's 1 Feb 2005, in the US it's 2 Jan 2005, in JP it's 5 Feb 2001) it's invaluable. If you're not on 2012, you could roll your own versions with a CLR wrapper.
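For illustration, a hedged example of those 2012-only functions (not available on the 2005/2008 machines mentioned in the question); the expected results in the comments assume default .NET culture parsing:
SELECT TRY_PARSE('01-02-2005' AS date USING 'en-GB') AS ReadAsGb,  -- 1 February 2005
       TRY_PARSE('01-02-2005' AS date USING 'en-US') AS ReadAsUs,  -- 2 January 2005
       TRY_CONVERT(int, 'X') AS BadInt;                            -- NULL instead of an error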
Step 3, Errors! Someone slipped in a bad date or whatever and your cast fails. What happens? Since the query either succeeds or it doesn't, all of the rows fail and you get ever-so-helpful error messages like "string or binary data would be truncated" or "conversion failed when converting datetime from character string." Which row out of your N-sized slice? You won't know until you go looking for it, and this is when folks usually devolve into an RBAR approach, further degrading performance. Or they try to stay set-based but run repeated queries against the source set, filtering out scenarios that will fail the conversion before attempting the insert.
You can see by my tags, I'm an SSIS kind of guy but I am not such a bigot to think it's the only thing that can work. If there's an approach for ETL, I like to think I've tried it. In your case, I think you will get much better performance and scalability by developing your own ETL code/framework or using an existing one (rhino or reactive)
Finally finally, be aware of the implications of varchar(max). It has a performance cost associated to it.
varchar(max) everywhere?
is there an advantage to varchar(500) over varchar(8000)?
Also, as described, your process would only allow for a single instance of the ETL to be running at once. Maybe that covers your use case but in companies where we did lots of ETL, we couldn't force client B to wait for client A's etl to finish processing before starting their work or we'd be short of clients in no time at all.
There is no simple way of doing it in T-SQL. In this case you need ISDATE(), ISNUMERIC() and similar UDF-style checks for all the data types you will try to parse. Then you can move the rejected rows to the rejected table, delete those rows from ImportedData, and continue with your load:
SELECT
    RowID,
    SUBSTRING(DataRow,1,3)   AS ID,
    SUBSTRING(DataRow,4,10)  AS FirstName,
    SUBSTRING(DataRow,14,10) AS LastName,
    SUBSTRING(DataRow,24,8)  AS DOB,
    SUBSTRING(DataRow,32,10) AS Amount
INTO RejectedData
FROM ImportedData
WHERE ISDATE(SUBSTRING(DataRow,24,8)) = 0
   OR ISNUMERIC(SUBSTRING(DataRow,32,10)) = 0
Then delete from the imported data:
DELETE FROM ImportedData WHERE RowID IN (SELECT RowID FROM RejectedData)
And then insert into Presenters:
INSERT INTO Presenters
SELECT
    RowID,
    SUBSTRING(DataRow,1,3)   AS ID,
    SUBSTRING(DataRow,4,10)  AS FirstName,
    SUBSTRING(DataRow,14,10) AS LastName,
    CONVERT(DATE, SUBSTRING(DataRow,24,8))           AS DOB,
    CONVERT(DECIMAL(18,2), SUBSTRING(DataRow,32,10)) AS Amount
FROM ImportedData
And for managing batched inserts, this is a very good article:
http://sqlserverplanet.com/data-warehouse/transferring-large-amounts-of-data-using-batch-inserts
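As a rough, hedged sketch of what a batched load against the tables above might look like (the 10,000 batch size comes from the question; the crude ISNUMERIC filter and everything else here is illustrative, written in 2005/2008-compatible syntax):
DECLARE @BatchSize INT, @LastId INT, @MaxId INT;
SET @BatchSize = 10000;
SET @LastId = 0;
SELECT @MaxId = MAX(RowID) FROM dbo.ImportedData;

WHILE @LastId < @MaxId
BEGIN
    INSERT INTO dbo.Presenters (PresenterID, FirstName, LastName)
    SELECT CONVERT(int, SUBSTRING(DataRow, 1, 3)),
           RTRIM(SUBSTRING(DataRow, 4, 10)),
           RTRIM(SUBSTRING(DataRow, 14, 10))
    FROM dbo.ImportedData
    WHERE RowID > @LastId
      AND RowID <= @LastId + @BatchSize
      AND ISNUMERIC(SUBSTRING(DataRow, 1, 3)) = 1;  -- skip rows whose ID column is not numeric

    SET @LastId = @LastId + @BatchSize;
END;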

using MS SQL I need to select into a table while casting a whole load of strings to ints! can it be done?

Basically, I am the new IT-type guy; the old guy left a right mess for me! We have an MS Access DB which stores the answers to an online questionnaire; this particular DB has about 45,000 records and each questionnaire has 220 questions. The old guy, in his wisdom, decided to store the answers to the questionnaire questions as text even though the answers are 0-5 integers!
Anyway, we now need to add a load of new questions to the questionnaire, taking it up to 240 questions. The 255-field limit in Access, plus the 30-ish columns of biographical data also stored in this database, means that I need to split the DB.
So, I have managed to get all the bioinfo quite happily into a new table with:
SELECT id,[all bio column names] INTO resultsBioData FROM results;
This didn't cause too much of a problem as I am not casting anything, but for the question data I want to convert it all to integers. At the moment I have:
SELECT id,CInt(q1) AS nq1.......CInt(q220) AS nq220 INTO resultsItemData FROM results;
This seems to work fine for about 400 records but then just stops. I thought it might be because it hit something it can't convert to an integer to start with, so I wrote a little Java program that deleted any record where any of the 220 answers wasn't 0, 1, 2, 3, 4 or 5, and it still gives up at around 400 records (never the same record, though!).
Anyone got any ideas? I am doing this on my test system at the moment and would really like something robust before I do it to our live system!
Sorry for the long-winded question, but it's doing my head in!
I'm unsure whether you're talking about doing the data transformation in Access or SQL Server. Either way, since you're redesigning the schema, now is the time to consider whether you really want the resultsItemData table to include 200+ fields, from nq1 through nq220 (or ultimately nq240). Any future question additions would require changing the table structure again.
The rule of thumb is "columns are expensive; rows are cheap". That applies whether the table is in Access or SQL Server.
Consider one row per id/question combination.
id q_number answer
1 nq1 3
1 nq2 1
I don't understand why your current approach crashes at 400 rows. I wouldn't even worry about that, though, until you are sure you have the optimal table design.
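A hedged sketch of how that reshaping could look as an Access union query (the column names are taken from the question; only the first three of the 220 SELECTs are shown):
SELECT id, 'nq1' AS q_number, CInt(q1) AS answer FROM results
UNION ALL
SELECT id, 'nq2', CInt(q2) FROM results
UNION ALL
SELECT id, 'nq3', CInt(q3) FROM results;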
Edit: Since you're stuck with the approach you described, I wonder if it might work with an "append" query instead of a "make table" query. Create resultsItemData table structure and append to it with a query which transforms the qx values to numeric.
INSERT INTO resultsItemData (id, nq1, nq2, ... nq220)
SELECT id, CInt(q1), CInt(q2), ... CInt(q220) FROM results;
Try this solution:
select * into #tmp from bad_table
truncate table bad_table
alter table bad_table alter column silly_column int
insert bad_table
select cast(silly_column as int), other_columns
from #tmp
drop table #tmp
Reference: Change type of a column with numbers from varchar to int
In the end I just wrote a small Java program that created the new table and went through each record individually, casting the fields to integers. It takes about an hour and a half to do the whole thing, though, so I am still after a better solution for when I come to do this with the live system.