My data is stored in a table, Transactiontable. One of its columns, PostIT, stores XML data:
[screenshot of the table]
[screenshot of the XML data]
From the XML I have to read the max, i.e. the most recent, transaction datetime.
I am able to read the first node:
select
    convert(xml, PostIT).value('(Transaction/DateTime)[2]', 'DATETIME')
from
    dbo.Transactiontable;
I have to read all the transaction datetimes and I should be able to select the max, or the latest transaction.
I tried cross apply and it did not work:
select
    T.N.value('Transaction/DateTime[2]', 'DATETIME')
from
    dbo.transactiontable
    cross apply XXX.nodes('TRANSACTION') as T(N);
There are several things to say first:
Please do not paste your sample data as a picture. Nobody wants to type this in.
Best would be to provide an MCVE or a fiddle with DDL, sample data, and your own attempt, together with the expected output.
Always tag as specifically as possible. Tag with your tools and provide vendor and version.
Now to your question:
Again there is something to say first:
Your XML contains date-time values in a culture-dependent format. This is dangerous: some systems will take "01/05/2019" as the first of May, others will read it as the 5th of January.
The XML is hard to read. If I read the picture correctly, you have alternating <DateTime> and <TRANSACTION_KEY> elements, and it seems to be their sort order that ties them together. This is a bad design...
Try this as a first step:
SELECT dt.value('text()[1]','nvarchar(100)') AS SomeDateTimeValueAsString
FROM YourTable t
CROSS APPLY t.TheXmlColumn.nodes('/DATA/CO/TRANSACTION/DateTime') A(dt);
If this is not enough to get you running, you'll have to invest more effort to your question.
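If it does run, a possible next step is to take the maximum per row; a sketch, assuming a hypothetical key column YourKeyColumn and assuming your strings are dd/mm/yyyy (CONVERT style 103) - both are assumptions, not facts from your post:

SELECT t.YourKeyColumn, -- hypothetical key column
       MAX(CONVERT(datetime,
                   dt.value('text()[1]', 'nvarchar(100)'),
                   103)) AS LatestTransaction -- 103 = dd/mm/yyyy; use 101 for mm/dd/yyyy
FROM YourTable t
CROSS APPLY t.TheXmlColumn.nodes('/DATA/CO/TRANSACTION/DateTime') A(dt)
GROUP BY t.YourKeyColumn;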
I've been given the task of turning the following Excel table into a database table in SQL Server (I have shortened the row count, of course).
A car has to go to service every 10.000 kilometers, but for some models there is a fast service that applies only to certain mileages (I don't know what mileage is called in kilometers lol).
The table shows car brands and models, and each following column represents the next maintenance service (i.e. column [10] represents the first service, performed at 10.000km, column [20] represents car service performed at 20.000km, etc.).
The values inside the mileage column will indicate if quick service "applies" to the corresponding model and mileage. (i.e. Quick service applies to [Changan A500] at 10.000km and 20.000km, but not at 30.000 or 40.000)
As mentioned before, I need to transform this table into a database table in SQL Server, with the following format.
In this format, there will be a row for every model and the mileage at which quick service corresponds. I hope this clarifies the requirement:
I can make a new SQL table from the source table, and then extract the data and insert it into the required table after transforming it (I assume there is no easy way of putting the information into the new format straight from the source Excel file).
Right now I'm thinking of using pointers in order to turn this data into a table, but I'm not very good at using pointers and I wanted to know if there might be an easier way before trying the hard way.
The point is to make this scalable, so they can keep adding models and their respective mileages.
How would you do it? Am I complicating myself too much by using pointers or is it a good idea?
Thanks, and sorry I used so many pictures, just thought it might clarify better, and the reason I haven't uploaded any SQL is because I just can't figure out yet how I plan to transform the data.
I'm not sure you can have a column named 10, 20, 30, 40, etc., but this is how I would solve this kind of problem.
SELECT *
INTO unpvt
FROM (VALUES
         ('chengan', 'a500', 'applies', 'applies', '', ''),
         ('renault', 'kwid', 'applies', 'applies', 'applies', 'applies')
     ) v (brand, model, ten, twenty, thirty, fourty);

SELECT *
FROM unpvt;
SELECT YT.brand,
       YT.model,
       V.number
FROM dbo.unpvt YT
CROSS APPLY (VALUES (ten),
                    (twenty),
                    (thirty),
                    (fourty)) V (number);
Result:

brand   | model | number
--------|-------|--------
chengan | a500  | applies
chengan | a500  | applies
chengan | a500  |
chengan | a500  |
renault | kwid  | applies
renault | kwid  | applies
renault | kwid  | applies
renault | kwid  | applies
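To get all the way to the target shape (one row per model plus the mileage where quick service applies), here is a sketch extending the same CROSS APPLY pattern; the numeric mileage values are my assumption based on the column names:

SELECT YT.brand,
       YT.model,
       V.mileage
FROM dbo.unpvt YT
CROSS APPLY (VALUES (10000, ten),
                    (20000, twenty),
                    (30000, thirty),
                    (40000, fourty)) V (mileage, applies) -- assumed km values
WHERE V.applies = 'applies';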
I only found answers about how to import CSV files into the database, for example as a blob or as a 1:1 representation of the table you are importing into.
What I need is a little different: my team and I are tracking everything we do in a database. A lot of these tasks produce log files, benchmark results, etc., which are stored in CSV format. The number of columns is far from consistent, and the data can be completely different from file to file, e.g. it could be a log from Fraps with frame times in it, or a log of CPU temperatures over a period of time, or even something completely different.
Long story short, I came up with an idea, but - being far from a sql pro - I am not sure if it makes sense or if there is a more elegant solution.
Does this make sense to you:
We also need to deal with a lot of data, so please also give me your opinion on whether this is feasible with around 200 files per day, each of which can easily have a couple of thousand rows.
The purpose of all this is that we can generate reports from the stored data and perform analysis on it, e.g. view it on a webpage in a graph or do calculations with it.
I'm limited to MS-SQL in this case, because that's what the current (quite complex) database is and I'm just adding a new schema with that functionality to it.
Currently we just archive the files on a raid and store a link to it in the database. So everyone who wants to do magic with the data needs to download every file he needs and then use R or Excel to create a visualization of the data.
Have you considered a column of XML data type for the file data, as an alternative to the ColumnId -> Data structure? SQL Server provides a special dedicated XML index (over the entire XML structure), so your data can be fully indexed no matter what CSV columns you have. You will have far fewer records in the database to handle (an entire CSV file becomes a single XML field value). There are good XML query options to search by the values and attributes of the XML type.
For that you will need to translate the CSV to XML, but you would have to parse it either way...
Not that your plan won't work, I am just giving an idea :)
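A minimal sketch of the idea, assuming a hypothetical CsvImport table and a made-up <file>/<row>/<field> shape for the converted CSV:

-- Hypothetical table: one row per imported CSV file, the whole file as XML
CREATE TABLE dbo.CsvImport
(
    CsvImportId int IDENTITY PRIMARY KEY,
    FileName    nvarchar(260) NOT NULL,
    FileData    xml NOT NULL
);

-- Dedicated XML index over the entire XML structure
CREATE PRIMARY XML INDEX PXML_CsvImport_FileData
    ON dbo.CsvImport (FileData);

-- Example query: pull one named "column" out of every row of every file
SELECT c.CsvImportId,
       f.value('.', 'nvarchar(100)') AS fps
FROM dbo.CsvImport c
CROSS APPLY c.FileData.nodes('/file/row/field[@name="fps"]') A(f);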
=========================================================
Update with some online info:
An article from Simple Talk: The XML Methods in SQL Server
Microsoft documentation for nodes() with various use case samples: nodes() Method (xml Data Type)
Microsoft documentation for value() with various use case samples: value() Method (xml Data Type)
I have a table containing a series of survey responses, structured like this:
Question Category | Question Number | Respondent ID | Answer
This seemed the most logical storage for the data. The combination of Question Category, Question Number, and Respondent ID is unique.
Problem is, I've been asked for a report with the Respondent ID as the columns, with Answer as the rows. Since Answer is free-text, the numeric-expecting PIVOT command doesn't help. It would be great if each row was a specific Question Category / Question Number pairing so that all of the information is displayed in a single table.
Is this possible? I'm guessing a certain amount of dynamic SQL will be required, especially with the expected 50-odd surveys to display.
I think this task should be done by your client code - trying to do this transposing on the SQL side is not a very good idea. Such SQL (even if it can be constructed) will likely be extremely complicated and fragile.
First of all, you should count how many distinct answers there are - you probably don't want to create a report 1000 columns wide if all the answers are different. Also, you probably want to make sure that the answers are narrow - what if someone gave a really bad 1KB-wide answer?
Then, you should construct your report (be that HTML or something else) dynamically, based on the results of your standard, non-transposed SQL.
For example, in HTML you can create as many columns as you want using <th>column</th> for a table header cell and <td>value</td> for a data cell - as long as you already know how many columns there will be in your output result.
To me, it seems that the problem is the number of columns. You don't know how many respondents there are.
One idea would be to concatenate the respondent ids. You can do this in SQL Server as:
select distinct Answer,
       (select cast(RespondentId as varchar(255)) + '; '
        from Responses r2
        where r2.Answer = r.Answer
        for xml path ('')
       ) AllResponders
from Responses r
(This is untested so may have syntax errors.)
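On SQL Server 2017 and later, STRING_AGG expresses the same idea more directly; a sketch against the same Responses table:

SELECT Answer,
       STRING_AGG(CAST(RespondentId AS varchar(255)), '; ') AS AllResponders
FROM Responses
GROUP BY Answer;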
If Reporting Services or Excel Power Pivot are possibilities for the report, then they could both probably accomplish what you want more easily than a straight SQL query. In RS you can use a tablix, and in Power Pivot a pivot table. Both avoid having to define your pivot columns in a SQL PIVOT statement, and both can dynamically determine the column names from a tabular result set.
I have a Post model that has one huge column (full_html). So instead of doing a select "posts".* or whatever, I want to select every field except full_html by default (and only grab it when I actually try accessing the attribute).
My current solution is:
Post.select(Post.column_names.map(&:to_sym) - [:full_html]).where(...)
but it's pretty gross
Here is a similar SO Question regarding blobs. The last two answers open up a couple of alternatives that you might want to check out. I was going to suggest something similar to the second to the last where you store the full html in a different model and then associate the two together but that may open up other performance issues.
Say that I needed to share a database with a partner. Obviously I have customer information in that database. Short of going through and identifying every column that contains private information, and writing a custom script to 'scrub' the data, is there any tool or script which can scrub the data but keep the format intact (for example, if a string is 5 characters, it would stay 5 characters, only scrubbed)?
If not, how would you accomplish something like this, preferably in TSQL?
You may consider sharing only views: create VIEWs that hide the data you don't want to share.
Example:
CREATE VIEW v_customer
AS
SELECT
    NAME,
    LEFT(CreditCard, 5) + '****' AS CreditCard -- or don't show this column at all
    ....
FROM customer
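You can then give your partner's login access to the view only (partner_login here is a made-up principal name):

GRANT SELECT ON v_customer TO partner_login;
DENY SELECT ON customer TO partner_login;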
Firstly, I need to state a professional interest: I work for IBM, which has tools that do exactly this.
Step 1. Ensure you identify all the PII (Personally Identifiable Information). When sharing database information it is typical that obvious column names like "name" are found, but you also need to find the "hidden" data, where either the data is embedded in a standard format (e.g. string-name-string) under a column name like "reference code", or it sits in free-format text fields. As you have seen, this is not going to be an easy job unless you automate it. The tool for this is InfoSphere Discovery.
Step 2. Decide what context the "scrubbed" data needs to be in. Changing name fields to random characters causes problems when testing, as users focus on text errors rather than functional failures; therefore change names to real but fictitious ones. Credit card information often needs to be "valid": by that I mean it needs to have a valid prefix, say 49XX, but the rest an invalid sequence. Finally, you need to ensure that every instance of the change is propagated through the database to maintain consistency. The tool for this is Optim Test Data Management with the Data Privacy option.
The two tools integrate to give a full data privacy solution.
Based on the original question, it seems you need the fields to be the same length, but not in a "valid" format? How about:
UPDATE customers
SET email = REPLICATE('z', LEN(email))
-- additional fields as needed
Copy/paste and rename tables/fields as appropriate. I think you're going to have a hard time finding a tool that's less work, unless your schema is very complicated, or my formatting assumptions are incorrect.
I don't have an MS SQL database in front of me right now, but you can also find all of the string-like columns with something like:
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('...', '...')
I don't remember the exact values you need to compare for, but if you run the query and see what's there, they should be pretty self-explanatory.
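If you want to combine the two ideas, here is a hedged sketch that generates (but does not immediately run) one REPLICATE-style UPDATE per string column of a given table; the table name 'customers' and the column filter are assumptions:

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'UPDATE ' + QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME)
             + N' SET ' + QUOTENAME(COLUMN_NAME)
             + N' = REPLICATE(''z'', LEN(' + QUOTENAME(COLUMN_NAME) + N'));' + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'customers' -- assumed table name
  AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar');

PRINT @sql; -- review the generated statements first
-- EXEC sp_executesql @sql; -- then execute once you are happy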