I get the following error when I want to execute a SQL query:
"Msg 209, Level 16, State 1, Line 9
Ambiguous column name 'i_id'."
This is the SQL query I want to execute:
SELECT DISTINCT x.*
FROM items x LEFT JOIN items y
ON y.i_id = x.i_id
AND x.last_seen < y.last_seen
WHERE x.last_seen > '4-4-2017 10:54:11'
AND x.spot = 'spot773'
AND (x.technology = 'Bluetooth LE' OR x.technology = 'EPC Gen2')
AND y.id IS NULL
GROUP BY i_id
This is what my table looks like:
CREATE TABLE [dbo].[items] (
[id] INT IDENTITY (1, 1) NOT NULL,
[i_id] VARCHAR (100) NOT NULL,
[last_seen] DATETIME2 (0) NOT NULL,
[location] VARCHAR (200) NOT NULL,
[code_hex] VARCHAR (100) NOT NULL,
[technology] VARCHAR (100) NOT NULL,
[url] VARCHAR (100) NOT NULL,
[spot] VARCHAR (200) NOT NULL,
PRIMARY KEY CLUSTERED ([id] ASC));
I've tried a couple of things but I'm not an SQL expert:)
Any help would be appreciated
EDIT:
I do get duplicate rows when I remove the GROUP BY line as you can see:
I'm adding another answer to show how you'd typically select the latest record per group without getting duplicates. You'd use ROW_NUMBER for this, marking the last record per i_id with row number 1.
SELECT *
FROM
(
SELECT
i.*,
ROW_NUMBER() over (PARTITION BY i_id ORDER BY last_seen DESC) as rn
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
) ranked
WHERE rn = 1;
(You'd use RANK or DENSE_RANK instead of ROW_NUMBER if you wanted to keep duplicates, i.e. all rows tied for the latest last_seen.)
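To see the difference on tied data, here is a small sketch for SQL Server using a hypothetical #items temp table with made-up rows, where two rows for i_id 'A' share the latest last_seen. ROW_NUMBER still gives exactly one of them rn = 1, while RANK gives both rank 1. The extra id DESC in the ROW_NUMBER ordering is my assumption; it only makes the choice among tied rows deterministic.

```sql
-- Hypothetical demo table with only the columns needed for ranking
IF OBJECT_ID('tempdb..#items') IS NOT NULL DROP TABLE #items;
IF OBJECT_ID('tempdb..#ranked') IS NOT NULL DROP TABLE #ranked;

CREATE TABLE #items (
    id        INT           NOT NULL PRIMARY KEY,
    i_id      VARCHAR (100) NOT NULL,
    last_seen DATETIME2 (0) NOT NULL
);

INSERT INTO #items (id, i_id, last_seen) VALUES
    (1, 'A', '2017-04-04 11:00:00'),
    (2, 'A', '2017-04-04 12:00:00'),
    (3, 'A', '2017-04-04 12:00:00'),  -- ties with id 2 on the latest last_seen
    (4, 'B', '2017-04-04 11:30:00');

SELECT id, i_id, last_seen,
       -- exactly one row per i_id gets rn = 1; id DESC decides between ties
       ROW_NUMBER() OVER (PARTITION BY i_id ORDER BY last_seen DESC, id DESC) AS rn,
       -- every row tied for the latest last_seen gets rnk = 1
       RANK() OVER (PARTITION BY i_id ORDER BY last_seen DESC) AS rnk
INTO #ranked
FROM #items;

SELECT * FROM #ranked ORDER BY i_id, id;
```

Filtering on rn = 1 keeps ids 3 and 4; filtering on rnk = 1 keeps ids 2, 3 and 4.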
You forgot the table alias in GROUP BY i_id.
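For comparison, here is a sketch of the question's LEFT JOIN anti-join with DISTINCT and GROUP BY simply dropped, run against a hypothetical #items temp table with made-up rows. (Writing GROUP BY x.i_id would clear Msg 209, but SQL Server would then reject x.* because its other columns are neither grouped nor aggregated.) Rows that tie on the latest last_seen, ids 3 and 4 below, both survive the anti-join, which may be where the duplicates mentioned in the question's edit come from.

```sql
-- Hypothetical demo table with the columns the query touches
IF OBJECT_ID('tempdb..#items') IS NOT NULL DROP TABLE #items;
IF OBJECT_ID('tempdb..#latest') IS NOT NULL DROP TABLE #latest;

CREATE TABLE #items (
    id         INT           NOT NULL PRIMARY KEY,
    i_id       VARCHAR (100) NOT NULL,
    last_seen  DATETIME2 (0) NOT NULL,
    technology VARCHAR (100) NOT NULL,
    spot       VARCHAR (200) NOT NULL
);

INSERT INTO #items (id, i_id, last_seen, technology, spot) VALUES
    (1, 'A', '2017-04-04 11:00:00', 'EPC Gen2',     'spot773'),
    (2, 'A', '2017-04-04 12:00:00', 'EPC Gen2',     'spot773'),
    (3, 'B', '2017-04-04 11:30:00', 'Bluetooth LE', 'spot773'),
    (4, 'B', '2017-04-04 11:30:00', 'Bluetooth LE', 'spot773'),  -- tie with id 3
    (5, 'C', '2017-04-04 09:00:00', 'EPC Gen2',     'spot773');  -- before the cutoff

SELECT x.*
INTO #latest
FROM #items x
LEFT JOIN #items y
    ON  y.i_id = x.i_id
    AND x.last_seen < y.last_seen   -- y = any newer row for the same i_id
WHERE x.last_seen > '2017-04-04 10:54:11'
  AND x.spot = 'spot773'
  AND x.technology IN ('Bluetooth LE', 'EPC Gen2')
  AND y.id IS NULL;                 -- keep x only if no newer row exists

SELECT * FROM #latest ORDER BY i_id, id;
```

The result contains ids 2, 3 and 4.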
Anyway, why are you writing an anti join query and then trying to get rid of duplicates with both DISTINCT and GROUP BY? Did you have issues with a straightforward NOT EXISTS query? You are making things way more complicated than they actually are.
SELECT *
FROM items i
WHERE last_seen > '2017-04-04 10:54:11'
AND spot = 'spot773'
AND technology IN ('Bluetooth LE', 'EPC Gen2')
AND NOT EXISTS
(
SELECT *
FROM items other
WHERE i.i_id = other.i_id
AND i.last_seen < other.last_seen
);
(There are other techniques of course to get the last seen record per i_id. This is one; another is to compare with MAX(last_seen); another is to use ROW_NUMBER.)
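A sketch of the MAX(last_seen) variant, against a hypothetical #items temp table with made-up rows: a derived table finds the latest last_seen per i_id over all rows (as the NOT EXISTS version does), and the join back keeps only rows carrying that value. Like NOT EXISTS, it returns every row tied on the maximum.

```sql
-- Hypothetical demo table
IF OBJECT_ID('tempdb..#items') IS NOT NULL DROP TABLE #items;
IF OBJECT_ID('tempdb..#latest_max') IS NOT NULL DROP TABLE #latest_max;

CREATE TABLE #items (
    id         INT           NOT NULL PRIMARY KEY,
    i_id       VARCHAR (100) NOT NULL,
    last_seen  DATETIME2 (0) NOT NULL,
    technology VARCHAR (100) NOT NULL,
    spot       VARCHAR (200) NOT NULL
);

INSERT INTO #items (id, i_id, last_seen, technology, spot) VALUES
    (1, 'A', '2017-04-04 11:00:00', 'EPC Gen2',     'spot773'),
    (2, 'A', '2017-04-04 12:00:00', 'EPC Gen2',     'spot773'),
    (3, 'B', '2017-04-04 11:30:00', 'Bluetooth LE', 'spot773'),
    (4, 'B', '2017-04-04 11:30:00', 'Bluetooth LE', 'spot773');

SELECT i.*
INTO #latest_max
FROM #items i
JOIN (
    -- latest last_seen per i_id, computed over all rows
    SELECT i_id, MAX(last_seen) AS max_seen
    FROM #items
    GROUP BY i_id
) m ON m.i_id = i.i_id AND m.max_seen = i.last_seen
WHERE i.last_seen > '2017-04-04 10:54:11'
  AND i.spot = 'spot773'
  AND i.technology IN ('Bluetooth LE', 'EPC Gen2');

SELECT * FROM #latest_max ORDER BY i_id, id;
```

Here that is id 2 for 'A' and both tied ids 3 and 4 for 'B'.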
I have this table:
CREATE TABLE [dbo].[Phrase] (
[PhraseId] UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL,
[English] NVARCHAR (MAX) NOT NULL,
[Romaji] NVARCHAR (MAX) NULL,
[EnglishAscii] AS (ascii([English])) PERSISTED,
PRIMARY KEY CLUSTERED ([PhraseId] ASC)
);
What I would like to do is to get a report that looks something like this:
A 25
B 35
C 10
D 99
...
Y 3
All the strings in the English column have a first character that is uppercase.
Can someone give me some hints as to how I can do this kind of a report?
Use the LEFT string function:
Select Left([English],1), Count(1)
From Yourtable
Group by Left([English],1)
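A runnable sketch of the LEFT variant against a hypothetical #Phrase temp table with a few made-up rows. The ORDER BY is my addition, so the report comes out alphabetically like the A, B, C ... layout in the question.

```sql
-- Hypothetical demo table with just the English column
IF OBJECT_ID('tempdb..#Phrase') IS NOT NULL DROP TABLE #Phrase;
IF OBJECT_ID('tempdb..#report') IS NOT NULL DROP TABLE #report;

CREATE TABLE #Phrase ([English] NVARCHAR (MAX) NOT NULL);

INSERT INTO #Phrase ([English]) VALUES
    (N'Apple'), (N'Angle'), (N'Banana'), (N'Cherry'), (N'Cat'), (N'Cup');

-- One row per first letter with the number of phrases starting with it
SELECT LEFT([English], 1) AS Letter, COUNT(*) AS Phrases
INTO #report
FROM #Phrase
GROUP BY LEFT([English], 1);

SELECT Letter, Phrases FROM #report ORDER BY Letter;
```

With these rows the report is A 2, B 1, C 3.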
Or you can use the SUBSTRING string function:
Select Substring([English],1,1), Count(1)
From Yourtable
Group by Substring([English],1,1)
Use the UPPER function to convert the first character of [English] to uppercase, in case it is not already, and group by the same expression:
SELECT UPPER(Left([English],1)) Col,
Count(1) CNT
FROM Yourtable
GROUP BY UPPER(Left([English],1))
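Why the GROUP BY should use the same UPPER expression as the select list: under a case-sensitive collation, LEFT alone puts 'a' and 'A' into separate groups. A sketch with a hypothetical #Words temp table whose column is declared with Latin1_General_CS_AS (an assumption for the demo; under the usual case-insensitive default collation both forms group together anyway):

```sql
-- Hypothetical demo table with an explicitly case-sensitive column
IF OBJECT_ID('tempdb..#Words') IS NOT NULL DROP TABLE #Words;
IF OBJECT_ID('tempdb..#by_left') IS NOT NULL DROP TABLE #by_left;
IF OBJECT_ID('tempdb..#by_upper') IS NOT NULL DROP TABLE #by_upper;

CREATE TABLE #Words (
    [English] NVARCHAR (100) COLLATE Latin1_General_CS_AS NOT NULL
);

INSERT INTO #Words ([English]) VALUES (N'apple'), (N'Apple'), (N'Banana');

-- Case-sensitive grouping: 'a' and 'A' become different groups
SELECT LEFT([English], 1) AS Letter, COUNT(*) AS CNT
INTO #by_left
FROM #Words
GROUP BY LEFT([English], 1);

-- Grouping on the uppercased character merges them
SELECT UPPER(LEFT([English], 1)) AS Letter, COUNT(*) AS CNT
INTO #by_upper
FROM #Words
GROUP BY UPPER(LEFT([English], 1));

SELECT * FROM #by_left;   -- contains a 1, A 1, B 1
SELECT * FROM #by_upper;  -- contains A 2, B 1
```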
Have a derived table which simply returns that first character, and GROUP BY its result:
select letter, count(*)
from
(
select substring([English], 1, 1) as letter
from [dbo].[Phrase]
) dt
group by letter
This way you only have to write the substring expression once. Easier to write without errors, and easier and safer to maintain/update.
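Since the question's table already has the persisted computed column EnglishAscii (the ASCII code of the first character), another option is to group on it and turn the code back into a letter with CHAR. A sketch against a hypothetical #Phrase temp table mirroring that definition; it assumes the first characters are plain ASCII letters, since ASCII() returns a single-byte code.

```sql
-- Hypothetical copy of the Phrase table, including the computed column
IF OBJECT_ID('tempdb..#Phrase') IS NOT NULL DROP TABLE #Phrase;
IF OBJECT_ID('tempdb..#ascii_report') IS NOT NULL DROP TABLE #ascii_report;

CREATE TABLE #Phrase (
    [PhraseId]     UNIQUEIDENTIFIER DEFAULT (NEWID()) NOT NULL PRIMARY KEY,
    [English]      NVARCHAR (MAX) NOT NULL,
    [EnglishAscii] AS (ASCII([English])) PERSISTED  -- code of the first character
);

INSERT INTO #Phrase ([English]) VALUES
    (N'Apple'), (N'Angle'), (N'Banana'), (N'Cherry'), (N'Cat'), (N'Cup');

-- Group on the precomputed code, convert it back to a letter for display
SELECT CHAR([EnglishAscii]) AS Letter, COUNT(*) AS Phrases
INTO #ascii_report
FROM #Phrase
GROUP BY [EnglishAscii];

SELECT Letter, Phrases FROM #ascii_report ORDER BY Letter;
```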
This question already has answers here:
GROUP BY / aggregate function confusion in SQL
(5 answers)
Closed 8 years ago.
This is my table creation code
CREATE TABLE "transactions" (
"id" INT NOT NULL,
"tno" INT NOT NULL,
"pno" INT NOT NULL,
"sno" INT NOT NULL,
"accode" INT NOT NULL,
"acname" VARCHAR(50) NOT NULL,
"date" DATE NOT NULL,
"truck" VARCHAR(50) NULL,
"weight" DECIMAL NULL,
"quality" INT NULL,
"debit" MONEY NOT NULL,
"credit" MONEY NOT NULL,
"amount" MONEY NOT NULL,
"comment" TEXT NULL,
PRIMARY KEY ("id")
);
I want to get some data from this table with this query:
Select * from transactions GROUP BY tno
but it gives me this error:
Msg 8120, Level 16, State 1, Line 1
Column 'transactions.tno' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Finally I got a solution. My answer is:
select yt.id, yt.tno, yt.accode, yt.acname, yt.pno, yt.comment, yt.amount, yt.date
from transactions yt
inner join (
select tno, max(id) id
from transactions ss
group by tno
) ss on yt.id = ss.id and yt.tno = ss.tno
order by yt.tno desc
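The same "row with the highest id per tno" result can also be sketched with ROW_NUMBER instead of the join to max(id); this is an alternative technique, not the poster's query. The #transactions temp table below is a hypothetical, trimmed-down version with made-up rows.

```sql
-- Hypothetical, trimmed-down transactions table
IF OBJECT_ID('tempdb..#transactions') IS NOT NULL DROP TABLE #transactions;
IF OBJECT_ID('tempdb..#latest_per_tno') IS NOT NULL DROP TABLE #latest_per_tno;

CREATE TABLE #transactions (
    id     INT   NOT NULL PRIMARY KEY,
    tno    INT   NOT NULL,
    amount MONEY NOT NULL
);

INSERT INTO #transactions (id, tno, amount) VALUES
    (1, 100, 10), (2, 100, 20), (3, 101, 5), (4, 100, 30), (5, 101, 7);

-- rn = 1 marks the row with the highest id within each tno
SELECT id, tno, amount
INTO #latest_per_tno
FROM (
    SELECT t.id, t.tno, t.amount,
           ROW_NUMBER() OVER (PARTITION BY t.tno ORDER BY t.id DESC) AS rn
    FROM #transactions t
) ranked
WHERE rn = 1;

SELECT * FROM #latest_per_tno ORDER BY tno DESC;
```

With these rows it returns id 5 for tno 101 and id 4 for tno 100.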
GROUP BY collapses the rows that share the same values in the grouped columns into one row, so every other column in the select list has to go through an aggregate function (SUM, COUNT, MAX and so on).
For example
SELECT truck, SUM(tno)
FROM transactions
GROUP BY truck
Would give you the total of the tno values for each "truck" in your table. The aggregate function here is SUM.
So if your table contained this:
truck tno
------------
dodge 44
viper 33
dodge 22
The above query would return
truck tno
------------
dodge 66
viper 33
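The truck example above as a runnable sketch, using a hypothetical #truck_demo temp table holding exactly the three sample rows:

```sql
-- Hypothetical demo table with the three sample rows
IF OBJECT_ID('tempdb..#truck_demo') IS NOT NULL DROP TABLE #truck_demo;
IF OBJECT_ID('tempdb..#truck_totals') IS NOT NULL DROP TABLE #truck_totals;

CREATE TABLE #truck_demo (truck VARCHAR (50) NULL, tno INT NOT NULL);

INSERT INTO #truck_demo (truck, tno) VALUES ('dodge', 44), ('viper', 33), ('dodge', 22);

-- truck is grouped and tno is aggregated, so every selected column is legal
SELECT truck, SUM(tno) AS tno
INTO #truck_totals
FROM #truck_demo
GROUP BY truck;

SELECT * FROM #truck_totals ORDER BY truck;  -- dodge 66, viper 33
```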
You're using GROUP BY with SELECT * and no aggregate function, so every column other than tno is neither grouped nor aggregated, and the query fails. To just get your data, use
Select * from transactions
Or if you do want to aggregate some data, tell us more clearly what you're trying to achieve rather than just posting the table structure and a failing query.
I have to maintain a scary legacy database that is very poorly designed. All the tables have more than 100 columns - one has 650. The database is very denormalized and I have found that often the same data is expressed in several columns in the same row.
For instance, here is a sample of columns for one of the tables:
[MEMBERADDRESS] [varchar](331) NULL,
[DISPLAYADDRESS] [varchar](max) NULL,
[MEMBERINLINEADDRESS] [varchar](max) NULL,
[DISPLAYINLINEADDRESS] [varchar](250) NULL,
[__HISTISDN] [varchar](25) NULL,
[HISTISDN] [varchar](25) NULL,
[MYDIRECTISDN] [varchar](25) NULL,
[MYISDN] [varchar](25) NULL,
[__HISTALT_PHONE] [varchar](25) NULL,
[HISTALT_PHONE] [varchar](25) NULL,
It turns out that MEMBERADDRESS and DISPLAYADDRESS have the same value for all rows in the table. The same is true for the other clusters of fields I have shown here.
It will be very difficult and time-consuming to identify all cases like this manually. Is it possible to create a query that would identify whether two fields have the same value in every row of a table?
If not, are there any existing tools that will help me identify these sorts of problems?
I see two approaches to simplify this task:
Write a script that generates your queries - feed your script the name of the table and the suspected columns, and let it produce a query that checks each pair of columns for equality. This is the fastest approach to implement in a one-off situation like yours.
Write a query that "normalizes" your data, and search against it - self-join the query to itself, then filter out the duplicates.
Here is a quick illustration of the second approach:
SELECT first.id, first.name, second.name AS same_as, first.val FROM (
SELECT id, MEMBERADDRESS as val,'MEMBERADDRESS' as name FROM MyTable
UNION ALL
SELECT id, DISPLAYADDRESS as val,'DISPLAYADDRESS' as name FROM MyTable
UNION ALL
SELECT id, MEMBERINLINEADDRESS as val,'MEMBERINLINEADDRESS' as name FROM MyTable
UNION ALL
...
) first
JOIN (
SELECT id, MEMBERADDRESS as val,'MEMBERADDRESS' as name FROM MyTable
UNION ALL
SELECT id, DISPLAYADDRESS as val,'DISPLAYADDRESS' as name FROM MyTable
UNION ALL
SELECT id, MEMBERINLINEADDRESS as val,'MEMBERINLINEADDRESS' as name FROM MyTable
UNION ALL
...
) second ON first.id=second.id AND first.val=second.val AND first.name < second.name
There is a lot of manual work for 100 columns (at least it does not grow as N^2, as in the first approach, but it is still a lot of manual typing). You may be better off generating the selects connected with UNION ALL using a small script.
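The first approach (generating the comparisons with a script) can also be sketched in T-SQL itself, by reading column names from sys.columns and building one query per column pair. This is a sketch under several assumptions: SQL Server 2017 or later (for STRING_AGG), only pairs with the same base type are compared, two NULLs count as equal, and the demo table dbo.DupDemo with its columns A, B and C is hypothetical (it is dropped again at the end).

```sql
-- Hypothetical demo table: A and B always hold the same value, C does not
IF OBJECT_ID('dbo.DupDemo') IS NOT NULL DROP TABLE dbo.DupDemo;
IF OBJECT_ID('tempdb..#equal_pairs') IS NOT NULL DROP TABLE #equal_pairs;

CREATE TABLE dbo.DupDemo (
    id INT NOT NULL PRIMARY KEY,
    A  VARCHAR (331) NULL,
    B  VARCHAR (MAX) NULL,
    C  VARCHAR (25)  NULL
);
INSERT INTO dbo.DupDemo (id, A, B, C) VALUES
    (1, 'x', 'x', 'x'),
    (2, 'y', 'y', 'z'),
    (3, NULL, NULL, 'w');

CREATE TABLE #equal_pairs (col1 SYSNAME, col2 SYSNAME);

DECLARE @tableName NVARCHAR (300) = N'dbo.DupDemo';
DECLARE @sql NVARCHAR (MAX);

-- One SELECT per pair of same-typed columns; it returns the pair's names only
-- when the two columns agree (or are both NULL) in every row of the table.
SELECT @sql = STRING_AGG(
    CAST(N'SELECT ' + QUOTENAME(c1.name, '''') + N', ' + QUOTENAME(c2.name, '''')
       + N' FROM ' + @tableName
       + N' HAVING COUNT(*) = SUM(CASE WHEN ' + QUOTENAME(c1.name) + N' = ' + QUOTENAME(c2.name)
       + N' OR (' + QUOTENAME(c1.name) + N' IS NULL AND ' + QUOTENAME(c2.name) + N' IS NULL)'
       + N' THEN 1 ELSE 0 END)' AS NVARCHAR (MAX)),
    N' UNION ALL ')
FROM sys.columns c1
JOIN sys.columns c2
  ON  c2.object_id = c1.object_id
  AND c2.column_id > c1.column_id          -- each pair once
  AND c2.system_type_id = c1.system_type_id
WHERE c1.object_id = OBJECT_ID(@tableName);

INSERT INTO #equal_pairs (col1, col2)
EXEC sys.sp_executesql @sql;

SELECT * FROM #equal_pairs;  -- A, B
DROP TABLE dbo.DupDemo;
```

On the real table you would point @tableName at it and skip the demo part; for 100+ columns the generated string gets long, so printing @sql first to inspect it may be wise.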
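The following approach uses unpivot to create triples. It makes some assumptions: values are not null; each row has an id; and the columns have the same type (SQL Server's UNPIVOT rejects columns whose types or lengths differ, so cast them first if needed).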
select t.which, t2.which
from (select id, which, value
from <your table>
unpivot (value for which in (<list of columns here>)) up
) t full outer join
(select id, which, value
from <your table>
unpivot (value for which in (<list of columns here>)) up
) t2
on t.id = t2.id and t.which <> t2.which
group by t.which, t2.which
having sum(case when t.value = t2.value then 1 else 0 end) = count(*)
It works by creating a new table with three columns: id, which column, and the value in the column. It then does a self join on id (to keep comparisons within one row), pairing each column with every other column (t.which <> t2.which). This self-join should always match, because the columns are the same in the two halves of the query.
The having then counts the number of values that are the same on both sides for a given pair of columns. When all these are the same, then the match is successful.
You can also leave out the having clause and change the select list to something like:
select t.which, t2.which, sum(case when t.value = t2.value then 1 else 0 end) as Nummatchs,
count(*) as NumRows
to get more complete information.
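A self-contained sketch of the unpivot approach under the answer's assumptions (non-null values, an id per row). The #Members temp table and its rows are hypothetical. Because UNPIVOT insists on identical column types, the derived table casts every column to varchar(max) first. The join uses t.which < t2.which instead of <> so each pair is reported once rather than in both orders, and an inner join is enough here because every id appears on both sides.

```sql
-- Hypothetical demo table with columns of differing lengths
IF OBJECT_ID('tempdb..#Members') IS NOT NULL DROP TABLE #Members;
IF OBJECT_ID('tempdb..#equal_columns') IS NOT NULL DROP TABLE #equal_columns;

CREATE TABLE #Members (
    id             INT           NOT NULL PRIMARY KEY,
    MEMBERADDRESS  VARCHAR (331) NOT NULL,
    DISPLAYADDRESS VARCHAR (MAX) NOT NULL,
    MYISDN         VARCHAR (25)  NOT NULL
);

INSERT INTO #Members (id, MEMBERADDRESS, DISPLAYADDRESS, MYISDN) VALUES
    (1, '1 Main St', '1 Main St', '555-0100'),
    (2, '2 Oak Ave', '2 Oak Ave', '555-0101');

WITH triples AS (
    -- (id, column name, value) triples
    SELECT up.id, up.which, up.value
    FROM (
        SELECT id,
               CAST(MEMBERADDRESS  AS VARCHAR (MAX)) AS MEMBERADDRESS,
               CAST(DISPLAYADDRESS AS VARCHAR (MAX)) AS DISPLAYADDRESS,
               CAST(MYISDN         AS VARCHAR (MAX)) AS MYISDN
        FROM #Members
    ) src
    UNPIVOT (value FOR which IN (MEMBERADDRESS, DISPLAYADDRESS, MYISDN)) up
)
SELECT t.which AS col1, t2.which AS col2
INTO #equal_columns
FROM triples t
JOIN triples t2
  ON  t.id = t2.id          -- compare only within the same row
  AND t.which < t2.which    -- each unordered pair of columns once
GROUP BY t.which, t2.which
HAVING SUM(CASE WHEN t.value = t2.value THEN 1 ELSE 0 END) = COUNT(*);

SELECT * FROM #equal_columns;  -- DISPLAYADDRESS, MEMBERADDRESS
```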
I've got a table that looks like this (I wasn't sure what all might be relevant, so I had Toad dump the whole structure)
CREATE TABLE [dbo].[TScore] (
[CustomerID] int NOT NULL,
[ApplNo] numeric(18, 0) NOT NULL,
[BScore] int NULL,
[OrigAmt] money NULL,
[MaxAmt] money NULL,
[DateCreated] datetime NULL,
[UserCreated] char(8) NULL,
[DateModified] datetime NULL,
[UserModified] char(8) NULL,
CONSTRAINT [PK_TScore]
PRIMARY KEY CLUSTERED ([CustomerID] ASC, [ApplNo] ASC)
);
And when I run the following query (on a database with 3 million records in the TScore table), it takes about a second to run. Yet if I just do Select BScore from CustomerDB..TScore WHERE CustomerID = 12345, it is instant (and only returns 10 records). It seems like there should be some efficient way to get the Max(ApplNo) effect in a single query, but I'm a relative noob to SQL Server and not sure. I'm thinking I may need a separate key for ApplNo, but I'm not sure how clustered keys work.
SELECT BScore
FROM CustomerDB..TScore (NOLOCK)
WHERE ApplNo = (SELECT Max(ApplNo)
FROM CustomerDB..TScore sc2 (NOLOCK)
WHERE sc2.CustomerID = 12345)
Thanks much for any tips (pointers on where to look for optimization of sql server stuff appreciated as well)
When you filter by ApplNo alone, you are using only part of the key, and not its left-hand (leading) column. This means the index has to be scanned (look at all rows) rather than seeked (drill down to a row) to find the values.
If you are looking for ApplNo values for the same CustomerID:
Quick way. Use the full clustered index:
SELECT BScore
FROM CustomerDB..TScore
WHERE ApplNo = (SELECT Max(ApplNo)
FROM CustomerDB..TScore sc2
WHERE sc2.CustomerID = 12345)
AND CustomerID = 12345
This can be changed into a JOIN
SELECT BScore
FROM
CustomerDB..TScore T1
JOIN
(SELECT Max(ApplNo) AS MaxApplNo, CustomerID
FROM CustomerDB..TScore sc2
WHERE sc2.CustomerID = 12345
GROUP BY CustomerID
) T2 ON T1.CustomerID = T2.CustomerID AND T1.ApplNo= T2.MaxApplNo
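Another way to express "the BScore with the highest ApplNo for this customer" is TOP (1) with ORDER BY ApplNo DESC; this is an alternative I'm suggesting, not part of the answer above. Because both the filter and the ordering line up with the clustered key (CustomerID, ApplNo), SQL Server can typically satisfy it with a single seek that reads just one row, but check the actual execution plan. The #TScore temp table and its rows are hypothetical.

```sql
-- Hypothetical cut-down TScore table with the same clustered key
IF OBJECT_ID('tempdb..#TScore') IS NOT NULL DROP TABLE #TScore;

CREATE TABLE #TScore (
    [CustomerID] INT             NOT NULL,
    [ApplNo]     NUMERIC (18, 0) NOT NULL,
    [BScore]     INT             NULL,
    PRIMARY KEY CLUSTERED ([CustomerID] ASC, [ApplNo] ASC)
);

INSERT INTO #TScore (CustomerID, ApplNo, BScore) VALUES
    (12345, 1, 600), (12345, 7, 640), (12345, 3, 610),
    (99999, 9, 700);  -- another customer with a higher ApplNo

DECLARE @CustomerID INT = 12345;
DECLARE @BScore INT;

-- Seek to this customer's rows and take the one with the largest ApplNo
SELECT TOP (1) @BScore = BScore
FROM #TScore
WHERE CustomerID = @CustomerID
ORDER BY ApplNo DESC;

SELECT @BScore AS BScore;  -- 640
```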
If you are looking for ApplNo values independent of CustomerID, then I'd look at a separate index. This matches what your current code actually does:
CREATE INDEX IX_ApplNo ON TScore (ApplNo) INCLUDE (BScore);
Reversing the key order won't help, because then your WHERE sc2.CustomerID = 12345 will scan, not seek.
Note: using NOLOCK everywhere is a bad practice
Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principle -- money always moves between accounts, so a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and ran a query to see if there's anything else missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results -- not even for Transaction 30! Thinking I must be an idiot, I ran the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans:
select *
select [Transaction]
They are 1000 lines each, hence hosted elsewhere.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
The order in which row_number() numbers duplicate NAMEs isn't specified, so SQL Server just uses whichever order falls out of the best execution plan for the required output columns. In the second query this is the same for both CTE invocations; in the first one it chooses a different access path, with a resulting different row numbering.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])
Contrary to what may have been expected, the CTE will likely not be materialised (materialising it would have ensured consistency for the self join). You are therefore assuming a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.
Try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that, in the event of tied dates, the row numbering is in a guaranteed consistent order (or some other column/combination of columns that can differentiate records if id won't do it).
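A reduced, self-contained sketch of this fix, with hypothetical #Transactions/#TransactionParts temp tables and made-up rows (transactions 1 and 2 deliberately share a date). Note that the CTE has one row per transaction and account, so [Date] plus ID still leaves ties between the accounts of a single transaction; the sketch therefore adds Account as a final tie-breaker, which is one instance of the "other combination of columns" mentioned above. For comparison, and assuming SQL Server 2012 or later, the second query computes the same running balance with a windowed SUM, which avoids the self-join altogether; that swap is my suggestion, not part of the answer.

```sql
-- Hypothetical, reduced versions of the question's tables
IF OBJECT_ID('tempdb..#Transactions') IS NOT NULL DROP TABLE #Transactions;
IF OBJECT_ID('tempdb..#TransactionParts') IS NOT NULL DROP TABLE #TransactionParts;
IF OBJECT_ID('tempdb..#SelfJoin') IS NOT NULL DROP TABLE #SelfJoin;
IF OBJECT_ID('tempdb..#Windowed') IS NOT NULL DROP TABLE #Windowed;

CREATE TABLE #Transactions (ID INT NOT NULL PRIMARY KEY, [Date] DATE NOT NULL);
CREATE TABLE #TransactionParts (
    [Transaction] INT NOT NULL, Account SMALLINT NOT NULL, Amount MONEY NOT NULL
);

INSERT INTO #Transactions (ID, [Date]) VALUES
    (1, '2017-01-01'), (2, '2017-01-01'), (3, '2017-01-02');

INSERT INTO #TransactionParts ([Transaction], Account, Amount) VALUES
    (1, 10, 100), (1, 20, -100),
    (2, 10, -30), (2, 20, 30),
    (3, 10, 40), (3, 10, 10), (3, 30, -50);  -- two parts for account 10, like VAT rows

-- Self-join version with a fully deterministic ROW_NUMBER
WITH condensed AS (
    SELECT [Transaction], Account, SUM(Amount) AS Amount
    FROM #TransactionParts
    GROUP BY [Transaction], Account
), cte AS (
    SELECT ROW_NUMBER() OVER (ORDER BY t.[Date], t.ID, p.Account) AS RowNumber,
           t.ID AS [Transaction], p.Amount, p.Account
    FROM #Transactions t
    JOIN condensed p ON p.[Transaction] = t.ID
)
SELECT b.RowNumber, b.[Transaction], a.Account, SUM(a.Amount) AS Balance
INTO #SelfJoin
FROM cte a
JOIN cte b ON a.RowNumber <= b.RowNumber AND a.Account = b.Account
GROUP BY b.RowNumber, b.[Transaction], a.Account;

-- Windowed running total per account, no self-join needed
WITH condensed AS (
    SELECT [Transaction], Account, SUM(Amount) AS Amount
    FROM #TransactionParts
    GROUP BY [Transaction], Account
)
SELECT p.[Transaction], p.Account,
       SUM(p.Amount) OVER (PARTITION BY p.Account
                           ORDER BY t.[Date], t.ID
                           ROWS UNBOUNDED PRECEDING) AS Balance
INTO #Windowed
FROM condensed p
JOIN #Transactions t ON t.ID = p.[Transaction];

SELECT * FROM #SelfJoin ORDER BY Account, RowNumber;
SELECT * FROM #Windowed ORDER BY Account, [Transaction];
```

Both versions return six rows, one per transaction and account; account 10, for example, ends at a balance of 120 after transaction 3.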
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you read dirty entries (someone else deletes/inserts new data).
Try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
I've tried this code (it seems equivalent to yours)
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber Transaction Account
1 2 70
2 3 70