I'm loading some quite messy data through Azure Data Factory.
This is how the data looks after being loaded; it consists of two parts:
1. Metadata of a test
2. Actual measurements of the test -> the measurement is numeric
Imagine I have about 10 such 'packages' of 1. Metadata + 2. Measurements.
What I would like it to look like (what I'm looking for) is the following:
The number column with 1, 2, ... is what I'm looking for!
My screenshot stops early, but imagine this continues up to id = 10.
I guess a while loop is necessary here...
Query before:
SELECT Field1 FROM Input
Query after:
SELECT GeneratedId, Field1 FROM Input
Thanks a lot in advance!
EDIT: added a hint:
Here is a solution; it requires SQL Server 2012 or later.
Start by adding an Id column to your data. If you can do this before running the script, even better; if not, try something like this...
CREATE TABLE #InputTable (
Id INT IDENTITY(1, 1),
TestData NVARCHAR(MAX) )
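-- Caution: without an ORDER BY, SQL Server does not guarantee that the
-- IDENTITY values follow the order in which the rows appear in Input.
INSERT INTO #InputTable (TestData)
SELECT Field1 FROM Input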
Now create a query to get the GeneratedId of each package, as well as the Ids where it starts and ends. You can do this by selecting all the records LIKE 'title%', since that is the first record of each package, and then using ROW_NUMBER, Id, and LEAD for the GeneratedId, StartId, and EndId respectively. EndId is the Id of the next package's title row, so it is an exclusive upper bound (and NULL for the last package).
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%'
Lastly, join this to the input in order to get all the records, with the correct GeneratedId.
SELECT
package.GeneratedId, i.TestData
FROM (
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%' ) package
INNER JOIN #InputTable i
ON i.Id >= package.StartId
AND (package.EndId IS NULL OR i.Id < package.EndId)
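Below is a minimal sketch of the same package-numbering query, run against SQLite through Python's sqlite3 module. It assumes SQLite 3.25+ for ROW_NUMBER and LEAD. The table name input_table and the sample rows are hypothetical stand-ins for #InputTable and the loaded data. Because WHERE is applied before the window functions, ROW_NUMBER numbers only the title rows.

```python
import sqlite3

def number_packages(conn):
    """Tag every row with the number of the package whose 'title' row starts it."""
    sql = """
    WITH package AS (
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS generated_id,
               id AS start_id,
               LEAD(id) OVER (ORDER BY id) AS end_id  -- title row of the next package
        FROM input_table
        WHERE test_data LIKE 'title%'
    )
    SELECT p.generated_id, i.test_data
    FROM package AS p
    JOIN input_table AS i
      ON i.id >= p.start_id
     AND (p.end_id IS NULL OR i.id < p.end_id)
    ORDER BY i.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_table (id INTEGER PRIMARY KEY, test_data TEXT)")
# Hypothetical sample: each package is a 'title' row, metadata, then measurements.
rows = ["title test 1", "operator: A", "1.25", "1.50",
        "title test 2", "operator: B", "2.75"]
conn.executemany("INSERT INTO input_table VALUES (?, ?)",
                 list(enumerate(rows, start=1)))

for generated_id, test_data in number_packages(conn):
    print(generated_id, test_data)
```

No loop is needed. The range join assigns every row between two title rows to the earlier title's package.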
I have a table, say MyTable, with two columns, Id and Data, containing the following records:
Id  Data
--  -------
1   ABCDE00
2   DEFGH11
3   CCCCC21
4   AAAAA00
5   BBBBB10
6   vvvvv00
7   xxxxx88
Now I want all the records whose Data ends with the string 00 and whose subsequent row does not have Data ending with 11.
So, using this condition, my output should look like this:
1. AAAAA00
2. vvvvv00
Any help would be appreciated.
This answer makes some assumptions:
You have a column specifying the ordering. Let me call it id.
By "subsequent row" you mean the row with the next highest id.
You are using SQL Server 2012+.
In that case, lead() does what you want:
select t.*
from (select t.*, lead(data) over (order by id) as next_data
from t
) t
where data like '%00' and (next_data not like '%11' or next_data is null);
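Below is a minimal sketch of the lead() answer, checked against the question's sample data. It runs on SQLite (3.25+ assumed for window functions) through Python's sqlite3 module. The name my_table is a hypothetical stand-in for MyTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    list(enumerate(["ABCDE00", "DEFGH11", "CCCCC21", "AAAAA00",
                    "BBBBB10", "vvvvv00", "xxxxx88"], start=1)),
)

def ends_00_not_followed_by_11(conn):
    # LEAD() looks one row ahead in id order; the last row gets NULL.
    return conn.execute("""
        SELECT id, data
        FROM (SELECT id, data,
                     LEAD(data) OVER (ORDER BY id) AS next_data
              FROM my_table) AS t
        WHERE data LIKE '%00'
          AND (next_data NOT LIKE '%11' OR next_data IS NULL)
        ORDER BY id
    """).fetchall()

print(ends_00_not_followed_by_11(conn))  # [(4, 'AAAAA00'), (6, 'vvvvv00')]
```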
Earlier versions of SQL Server have alternative methods for calculating next_data.
If you are not using SQL Server 2012, you can try this:
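declare @t table(id int identity(1,1),col1 varchar(100))
insert into @t values
('ABCDE00')
,('DEFGH11')
,('CCCCC21')
,('AAAAA00')
,('BBBBB10')
,('vvvvv00')
,('xxxxx88')
-- End00 = 1 when col1 ends with '00' (the reversed string starts with '00')
;With CTE as
(
select *,case when CHARINDEX('00',reverse(col1))=1 then 1 end
End00 from @t
)
-- CTE1: rows whose previous row ends with '00' and which do not themselves end with '11'
,CTE1 as
(
select a.id,a.col1 from cte A
where exists
(select id from cte b where a.id=b.id+1 and b.end00 is not null)
and CHARINDEX('11',reverse(a.col1))<>1
)
-- Return the '00' rows immediately followed by such a row
-- (a '00' row that is the very last row is not returned)
select a.id,a.col1 from cte A
where exists
(select id from cte1 b where a.id=b.id-1 )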
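A simpler pre-2012 alternative is to left-join each row to its successor. This sketch makes the same assumption as the CTE answer: ids are contiguous, so the next row is id + 1. Unlike the CTE version, it also keeps a trailing '00' row that has no next row. It runs here on SQLite via Python's sqlite3, with my_table as a hypothetical name for MyTable. The query uses only LEFT JOIN and LIKE, which SQL Server 2008 also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    list(enumerate(["ABCDE00", "DEFGH11", "CCCCC21", "AAAAA00",
                    "BBBBB10", "vvvvv00", "xxxxx88"], start=1)),
)

def ends_00_not_followed_by_11_no_lead(conn):
    # Assumes contiguous ids: the "subsequent row" is simply id + 1.
    # A missing successor shows up as n.data IS NULL and is kept.
    return conn.execute("""
        SELECT t.id, t.data
        FROM my_table AS t
        LEFT JOIN my_table AS n ON n.id = t.id + 1
        WHERE t.data LIKE '%00'
          AND (n.data IS NULL OR n.data NOT LIKE '%11')
        ORDER BY t.id
    """).fetchall()

print(ends_00_not_followed_by_11_no_lead(conn))  # [(4, 'AAAAA00'), (6, 'vvvvv00')]
```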
Imagine a table like the one below:
create table test (
id int identity(1, 1),
some int,
columns int
)
This table then gets used a lot. Rows are inserted and deleted, and over time gaps may appear in the numbers that were once auto-incremented. For example, if at some point I run the following query:
select top 10 id from test order by id
I might get something like
3
4
6
7
9
10
13
14
18
19
How do I design a query that returns the missing values 1, 2, 5, 8, etc.?
The easiest way is to get ranges of missing values:
select (id + 1) as firstmissing, (nextid - 1) as lastmissing
from (select t.id, lead(id) over (order by id) as nextid
from test t
) t
where nextid is not null and nextid <> id + 1;
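Below is a minimal sketch of the lead() gap query, run on SQLite (3.25+ assumed) through Python's sqlite3 and loaded with the ids from the question. As in the original query, ids below the first existing one (1 and 2 here) are not reported, because no row precedes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
# The surviving ids from the question's example
conn.executemany("INSERT INTO test VALUES (?)",
                 [(i,) for i in (3, 4, 6, 7, 9, 10, 13, 14, 18, 19)])

def missing_ranges(conn):
    # Compare each id with the next one; any jump larger than 1 is a gap.
    return conn.execute("""
        SELECT id + 1 AS first_missing, next_id - 1 AS last_missing
        FROM (SELECT id, LEAD(id) OVER (ORDER BY id) AS next_id
              FROM test) AS t
        WHERE next_id IS NOT NULL AND next_id <> id + 1
        ORDER BY id
    """).fetchall()

print(missing_ranges(conn))  # [(5, 5), (8, 8), (11, 12), (15, 17)]
```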
Note this uses the lead() function, which is available in SQL Server 2012+. You can do something similar with apply or a subquery in earlier versions. Here is an example:
select (id + 1) as firstmissing, (nextid - 1) as lastmissing
from (select t.id, tt.id as nextid
from test t cross apply
(select top 1 id
from test t2
where t2.id > t.id
order by id
) tt
) t
where nextid is not null and nextid <> id + 1;
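SQLite has no CROSS APPLY, so this sketch uses a correlated scalar subquery instead: the smallest id greater than the current one. That variant also works on SQL Server versions without lead(). It runs on SQLite through Python's sqlite3 with the question's ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test VALUES (?)",
                 [(i,) for i in (3, 4, 6, 7, 9, 10, 13, 14, 18, 19)])

def missing_ranges_without_lead(conn):
    # next_id = smallest id greater than the current one (NULL for the last row)
    return conn.execute("""
        SELECT id + 1 AS first_missing, next_id - 1 AS last_missing
        FROM (SELECT t.id,
                     (SELECT MIN(t2.id) FROM test AS t2 WHERE t2.id > t.id) AS next_id
              FROM test AS t) AS g
        WHERE next_id IS NOT NULL AND next_id <> id + 1
        ORDER BY id
    """).fetchall()

print(missing_ranges_without_lead(conn))  # [(5, 5), (8, 8), (11, 12), (15, 17)]
```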
A simple way is to use a recursive CTE:
;WITH cte
AS (SELECT 1 id
UNION ALL
SELECT id + 1 id from cte
WHERE id < (SELECT Max(id)
FROM tablename))
SELECT *
FROM cte
WHERE id NOT IN(SELECT id
FROM tablename)
OPTION (MAXRECURSION 0) -- lift the default limit of 100 recursion levels for larger id ranges
Note: this starts from 1. If you want to start from the minimum value in your table, just replace
"SELECT 1 id" with "SELECT Min(id) id FROM tablename".
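Below is a minimal sketch of the recursive-CTE approach on SQLite through Python's sqlite3, using the question's ids. SQLite has no MAXRECURSION option. As a design choice, the sketch computes MAX(id) once in a separate non-recursive CTE, rather than using a subquery inside the recursive member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test VALUES (?)",
                 [(i,) for i in (3, 4, 6, 7, 9, 10, 13, 14, 18, 19)])

def missing_ids(conn):
    rows = conn.execute("""
        WITH RECURSIVE
          bounds(max_id) AS (SELECT MAX(id) FROM test),
          seq(id) AS (
            SELECT 1                        -- anchor: start counting at 1
            UNION ALL
            SELECT seq.id + 1 FROM seq, bounds
            WHERE seq.id < bounds.max_id    -- stop at the largest existing id
          )
        SELECT id FROM seq
        WHERE id NOT IN (SELECT id FROM test)
        ORDER BY id
    """).fetchall()
    return [r[0] for r in rows]

print(missing_ids(conn))  # [1, 2, 5, 8, 11, 12, 15, 16, 17]
```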
Why does it matter? I'm not trying to be snarky, but this question is usually asked in the context of "I want to fill in the gaps" or "I want to compress my id values to be contiguous". In either case, the answer is "don't do it". In your example, there was at some point a row with id = 5. If you do either of the above, you'll be assigning a different, unrelated set of business data to that id. If anything external to your database references the id, you've just invented a problem you didn't have before. The id should be treated as immutable and arbitrary for all intents and purposes. If you really require it to be gapless, don't use identity and never do a hard delete (i.e. if you need to deactivate a row, use a column that says whether it's active or not).
I have a log table where one of the fields is a filename. These filenames are versioned with a suffix at the end of filename. Say we made file SampleName.xml but later had to revise this -- the new version would appear in the log as SampleName_V2.xml (and this could continue increasing indefinitely, but the most I've seen is V8).
I need a way to SELECT every entry in this log, but only keep the entry with the latest version number on the filename.
I feel like there's got to be an easy answer to this, but I've been trying to think of it all day and can't come to it.
Anyone have any ideas?
EDIT: We do have a DateTime field in every row as well, if that helps.
Here is something that will do the job for you. The idea is to use a temp table that also holds each file name without the _v suffix.
I’ve probably made this more complex than needed but you’ll be able to see the point
IF OBJECT_ID('tempdb..#TmpResults') IS NOT NULL DROP TABLE #TmpResults
CREATE TABLE #TmpResults
(
Original nvarchar(100),
WO_Version nvarchar(100),
Last_Update datetime
)
INSERT INTO #TmpResults
(Original, WO_Version, Last_Update)
VALUES
('file1.xml', 'file1.xml', '01/01/2013'),
('file2.xml', 'file2.xml', '02/01/2013'),
('file2_v2.xml', 'file2.xml', '03/01/2013'),
('file3.xml', 'file3.xml', '01/01/2013'),
('file3_v2.xml', 'file3.xml', '01/02/2013'),
('file3_v3.xml', 'file3.xml', '01/03/2013'),
('file4.xml', 'file4.xml', '05/01/2013'),
('file5.xml', 'file5.xml', '06/01/2013'),
('file5_v2.xml', 'file5.xml', '06/02/2013'),
('file5_v3.xml', 'file5.xml', '06/03/2013'),
('file5_v4.xml', 'file5.xml', '06/04/2013')
SELECT
P.WO_Version,
(SELECT MAX(Last_Update) FROM #TmpResults T WHERE T.WO_Version =
P.WO_Version) as Last_Update,
(SELECT TOP 1 Original
FROM #TmpResults T
WHERE T.WO_Version = P.WO_Version -- other files can share the same date
AND T.Last_Update =
( SELECT MAX(Last_Update)
FROM #TmpResults Tm
WHERE Tm.WO_Version = P.WO_Version) ) as Last_FileVersion
FROM
(
SELECT DISTINCT WO_Version
FROM #TmpResults
GROUP BY WO_Version
) P
Here is a SELECT query you can use to fill the temp table with SELECT INTO:
SELECT
Original_File_Name,
-- Strip the '_vN' suffix but keep the extension, e.g. 'file2_v2.xml' -> 'file2.xml'
-- (a case-insensitive collation is assumed, so '_V2' is matched too)
CASE WHEN CHARINDEX('v_', REVERSE(Original_File_Name)) > 0
THEN LEFT(Original_File_Name, LEN(Original_File_Name) - CHARINDEX('v_', REVERSE(Original_File_Name)) - 1)
+ RIGHT(Original_File_Name, CHARINDEX('.', REVERSE(Original_File_Name)))
ELSE Original_File_Name
END as WO_Version,
Last_Update
FROM OriginalDataTable
I think this will give you the result:
SELECT TOP(1) filename
FROM table
ORDER BY datetime_field DESC
If you are sure that your version numbers run from _V1 to _V8 (a single digit), this will help you:
SELECT TOP(1) filename
FROM table
ORDER BY CAST(RIGHT(SUBSTRING([Filename],1,LEN(SUBSTRING([Filename], 0,
PATINDEX('%.%',[Filename])) + '.') - 1),1) AS INT) DESC
UPDATED
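Here is another method that returns every file name with its latest version.
;WITH cte AS
(
SELECT
ROW_NUMBER() OVER (
-- base name: everything before '_V', or before the extension if there is no suffix
-- (assumes base names contain no '_V' and every name has an extension)
PARTITION BY CASE WHEN CHARINDEX('_V', filename) > 0
THEN LEFT(filename, CHARINDEX('_V', filename) - 1)
ELSE LEFT(filename, CHARINDEX('.', filename) - 1) END
ORDER BY date_field DESC
/*OR order by CAST(RIGHT(SUBSTRING([Filename],1,LEN(SUBSTRING([Filename], 0,
PATINDEX('%.%',[Filename])) + '.') - 1),1) AS INT) DESC*/
) AS rno,
filename
FROM table
)
SELECT * FROM cte WHERE rno=1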
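Below is the same partition-and-pick-the-latest idea, sketched in plain Python rather than SQL. Several choices here are assumptions, not anything the thread specifies:

- The regular expression and its "<base>[_V<n>].<ext>" pattern.
- Treating an unsuffixed name as version 1.
- Skipping names without an extension.

Because the version is parsed as an integer, _V10 correctly outranks _V8, which the single-digit RIGHT(...) trick cannot handle. Ordering by the DateTime column, as the answer does, would be equally valid.

```python
import re

# Hypothetical pattern: "<base>[_V<n>].<ext>", e.g. SampleName_V2.xml
VERSIONED = re.compile(r"^(?P<base>.+?)(?:_[Vv](?P<ver>\d+))?(?P<ext>\.[^.]+)$")

def latest_versions(filenames):
    """Keep only the highest-versioned file name per base name."""
    best = {}  # partition key (base + extension) -> (version, file name)
    for name in filenames:
        m = VERSIONED.match(name)
        if m is None:
            continue  # assumption: names without an extension are ignored
        key = m.group("base") + m.group("ext")
        version = int(m.group("ver")) if m.group("ver") else 1  # unsuffixed = V1
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())

print(latest_versions(["SampleName.xml", "SampleName_V2.xml",
                       "Other.xml", "Other_V10.xml", "Other_V8.xml"]))
# ['Other_V10.xml', 'SampleName_V2.xml']
```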
Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principle: money always moves between accounts, and a transaction is two or more TransactionParts rows, decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and ran a query to see if anything else was missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results, not even for Transaction 30! Thinking I must be an idiot, I ran the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans:
select *
select [Transaction]
(They are 1,000 lines each, hence hosting them elsewhere.)
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
The ROW_NUMBER() ordering among duplicate NAME values isn't specified, so SQL Server simply uses whichever order falls out of the best execution plan for the required output columns. In the second query this is the same for both CTE references; in the first, it chooses a different access path, which results in different row numbering.
Suggested Solution
You are self-joining the CTE on ROW_NUMBER() over (order by t.[Date]).
Contrary to what you might expect, the CTE will likely not be materialised (which would have ensured consistency for the self-join). Each reference to it is evaluated separately, so you are assuming a correlation between the ROW_NUMBER() values on both sides that may well not exist for records with duplicate [Date] values.
What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)
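Below is a minimal sketch of the tie-breaker idea, run on SQLite (3.25+ assumed) through Python's sqlite3. CondensedEntryView returns one row per (Transaction, Account), so ordering by Date and Id alone still leaves ties between the parts of a single transaction. The sketch therefore also orders by Account, which makes the numbering unique. Table and column names (transactions, condensed_entries, tx_date, txn) are hypothetical simplifications of the question's schema. Amounts are integers to keep the arithmetic exact. SQLite plans queries differently from SQL Server, so this illustrates the query shape rather than reproducing the bug.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, tx_date TEXT NOT NULL);
    -- One row per (txn, account), like CondensedEntryView
    CREATE TABLE condensed_entries (txn INTEGER, account INTEGER, amount INTEGER);
    INSERT INTO transactions VALUES (1, '2013-01-01'), (2, '2013-01-01'), (3, '2013-01-02');
    INSERT INTO condensed_entries VALUES
        (1, 1, 100), (1, 2, -100),
        (2, 1, 50),  (2, 3, -50),
        (3, 1, -30), (3, 2, 30);
""")

def transaction_balances(conn):
    # (tx_date, id, account) is unique, so both CTE references number rows identically.
    return conn.execute("""
        WITH cte AS (
            SELECT ROW_NUMBER() OVER (ORDER BY t.tx_date, t.id, p.account) AS rn,
                   t.id AS txn, p.account, p.amount
            FROM transactions AS t
            JOIN condensed_entries AS p ON p.txn = t.id
        )
        SELECT b.rn, b.txn, a.account, SUM(a.amount) AS balance
        FROM cte AS a
        JOIN cte AS b ON a.rn <= b.rn AND a.account = b.account
        GROUP BY b.rn, b.txn, a.account
        ORDER BY b.rn
    """).fetchall()

for row in transaction_balances(conn):
    print(row)  # (row number, transaction, account, running balance)
```

Transactions 1 and 2 share a date here, yet each still appears once per account with a consistent running balance.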
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you are reading dirty entries (someone else is deleting or inserting data).
Try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
I've tried this code (it seems equivalent to yours):
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber Transaction Account
1 2 70
2 3 70
Microsoft SQL Server 2008 (SP1), getting an unexpected 'Conversion failed' error.
Not quite sure how to describe this problem, so below is a simple example. The CTE extracts the numeric portion of certain IDs using a search condition to ensure a numeric portion actually exists. The CTE is then used to find the lowest unused sequence number (kind of):
CREATE TABLE IDs (ID CHAR(3) NOT NULL UNIQUE);
INSERT INTO IDs (ID) VALUES ('A01'), ('A02'), ('A04'), ('ERR');
WITH ValidIDs (ID, seq)
AS
(
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
)
SELECT MIN(V1.seq) + 1 AS next_seq
FROM ValidIDs AS V1
WHERE NOT EXISTS (
SELECT *
FROM ValidIDs AS V2
WHERE V2.seq = V1.seq + 1
);
The error is, 'Conversion failed when converting the varchar value 'RR' to data type int.'
I can't understand why the value ID = 'ERR' is being considered for conversion at all, because the predicate ID LIKE 'A[0-9][0-9]' should have removed the invalid row from the result set.
When the base table is replaced with an equivalent CTE, the problem goes away, i.e.
WITH IDs (ID)
AS
(
SELECT 'A01'
UNION ALL
SELECT 'A02'
UNION ALL
SELECT 'A04'
UNION ALL
SELECT 'ERR'
),
ValidIDs (ID, seq)
AS
(
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
)
SELECT MIN(V1.seq) + 1 AS next_seq
FROM ValidIDs AS V1
WHERE NOT EXISTS (
SELECT *
FROM ValidIDs AS V2
WHERE V2.seq = V1.seq + 1
);
Why would a base table cause this error? Is this a known issue?
UPDATE @sgmoore: no, doing the filtering in one CTE and the casting in another CTE still results in the same error, e.g.:
WITH FilteredIDs (ID)
AS
(
SELECT ID
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
),
ValidIDs (ID, seq)
AS
(
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM FilteredIDs
)
SELECT MIN(V1.seq) + 1 AS next_seq
FROM ValidIDs AS V1
WHERE NOT EXISTS (
SELECT *
FROM ValidIDs AS V2
WHERE V2.seq = V1.seq + 1
);
It's a bug, and it has already been reported by Erland Sommarskog as "SQL Server should not raise illogical errors" (as I said, it's hard to describe this one!).
The response from the SQL Server Programmability Team is, "the issue is that SQL Server raises errors [too] eagerly due to pushing of predicates/expressions during query execution without considering the logical result of the query."
I've now voted for a fix, everyone do the same please :)
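A commonly suggested workaround is to guard the conversion itself, e.g. CAST(CASE WHEN ID LIKE 'A[0-9][0-9]' THEN RIGHT(ID, 2) END AS INTEGER). Even if the optimizer evaluates the expression before the filter, a non-matching row then yields NULL instead of an error.

Below is a sketch of that query shape in SQLite through Python's sqlite3. SQLite uses GLOB for the character classes and substr(ID, -2) in place of RIGHT. It also does not raise on failed casts, so this only confirms the intended result, not the SQL Server bug.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IDs (ID CHAR(3) NOT NULL UNIQUE)")
conn.executemany("INSERT INTO IDs VALUES (?)", [("A01",), ("A02",), ("A04",), ("ERR",)])

def next_sequence(conn):
    return conn.execute("""
        WITH ValidIDs (ID, seq) AS (
            SELECT ID,
                   -- Guarded conversion: CAST only ever receives NULL for non-matching IDs
                   CAST(CASE WHEN ID GLOB 'A[0-9][0-9]' THEN substr(ID, -2) END AS INTEGER)
            FROM IDs
            WHERE ID GLOB 'A[0-9][0-9]'
        )
        SELECT MIN(V1.seq) + 1 AS next_seq
        FROM ValidIDs AS V1
        WHERE NOT EXISTS (SELECT * FROM ValidIDs AS V2 WHERE V2.seq = V1.seq + 1)
    """).fetchone()[0]

print(next_sequence(conn))  # 3
```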
What if you replace the section
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM IDs
WHERE ID LIKE 'A[0-9][0-9]'
With
SELECT ID, CAST(RIGHT(ID, 2) AS INTEGER)
FROM
(
select ID from IDs
WHERE ID LIKE 'A[0-9][0-9]'
) AS FilteredIDs
This happened to me because I used a UNION and wasn't careful to make sure both queries had their columns in the same order. Once I fixed that, it was fine.