With my SQL getting rusty I'm trying hard to get this working.
I have a table with multiple entries as shown in the attached image.
I need to write a query that returns the number of entries, i.e. 7 in this case, where an entry is a distinct combination of MainID, ItemID, DeviceID, and ChannelID.
Thanks for your help!!
Try the following:
Select Count(*)
From (
    Select Distinct MainID, ItemID, DeviceID, ChannelID
    From yourtable
) X;
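One quick way to check the COUNT-over-DISTINCT pattern is to run it against an in-memory SQLite database from Python. This is a sketch, not SQL Server itself: it reuses the 19 sample rows posted in the #T answer further down this page, and the table name `t` and the helper `count_distinct` are hypothetical. It counts distinct combinations for both the four-column and the three-column case.

```python
import sqlite3

# Sample rows (MainID, ItemID, DeviceID, ChannelID) taken from the #T example on this page.
ROWS = [
    (1, 1, 2, 1), (1, 1, 2, 2), (1, 1, 2, 3), (1, 1, 22, 0), (1, 2, 1, 1), (1, 2, 1, 2),
    (1, 2, 1, 3), (1, 2, 2, 1), (1, 2, 2, 2), (1, 2, 2, 3), (1, 2, 3, 1), (1, 2, 3, 2),
    (1, 2, 3, 3), (1, 2, 4, 1), (1, 2, 4, 2), (1, 2, 4, 3), (1, 2, 5, 1), (1, 2, 5, 2),
    (1, 2, 5, 3),
]


def make_db():
    """Create an in-memory table t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (MainID INT, ItemID INT, DeviceID INT, ChannelID INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    return conn


def count_distinct(conn, columns):
    """COUNT(*) over a DISTINCT derived table, as in the query above."""
    cols = ", ".join(columns)  # column names are trusted constants in this sketch
    sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM t) X"
    return conn.execute(sql).fetchone()[0]


conn = make_db()
print(count_distinct(conn, ["MainID", "ItemID", "DeviceID", "ChannelID"]))  # 19
print(count_distinct(conn, ["MainID", "ItemID", "DeviceID"]))               # 7
```

With this particular data every four-column combination is unique, so only the three-column version yields 7.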
Just use DISTINCT, since you are looking for distinct combinations, i.e.:
select distinct MainID, ItemID, DeviceID, ChannelID from Yourtable
ROW_NUMBER()
There may be many other solutions, but to speed up your query, it's best to use ROW_NUMBER().
Sample code:
select
MainID,
ItemID,
DeviceID,
ChannelID,
ROW_NUMBER() OVER(ORDER BY MainID,ItemID,DeviceID,ChannelID) as COUNT_VAR
from Your_table;
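Without a PARTITION BY, ROW_NUMBER() simply numbers every row, so the largest COUNT_VAR equals the total row count rather than the number of distinct combinations. A common ROW_NUMBER-based way to count combinations is to partition by the key columns and keep only the rows numbered 1. The sketch below checks both behaviours in SQLite (3.25 or later, for window function support) from Python, using a small hypothetical table `t` that contains one duplicated combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (MainID INT, ItemID INT, DeviceID INT, ChannelID INT)")
# Hypothetical data: 4 rows, one combination appears twice -> 3 distinct combinations.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 1), (1, 1, 2, 1), (1, 1, 2, 2), (1, 2, 1, 1)],
)

# Unpartitioned ROW_NUMBER(): one number per row, so MAX equals the row count.
max_row_number = conn.execute(
    """SELECT MAX(COUNT_VAR) FROM (
           SELECT ROW_NUMBER() OVER (ORDER BY MainID, ItemID, DeviceID, ChannelID) AS COUNT_VAR
           FROM t)"""
).fetchone()[0]

# Partitioned ROW_NUMBER(): numbering restarts for each combination,
# so counting the rows numbered 1 counts the distinct combinations.
distinct_combos = conn.execute(
    """SELECT COUNT(*) FROM (
           SELECT ROW_NUMBER() OVER (
                      PARTITION BY MainID, ItemID, DeviceID, ChannelID
                      ORDER BY MainID) AS rn
           FROM t)
       WHERE rn = 1"""
).fetchone()[0]

print(max_row_number, distinct_combos)  # 4 3
```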
It is faster than a GROUP BY query because GROUP BY generates sequence numbers at run time, whereas ROW_NUMBER() uses SQL Server's default indexes.
Hope this will help you
IF OBJECT_ID('tempdb..#T') IS NOT NULL
BEGIN
DROP TABLE #T
END
CREATE TABLE #T (
MainId INT
,ItemId INT
,DeviceId INT
,ChannelId INT
)
INSERT INTO #T VALUES (1,1,2,1),(1,1,2,2),(1,1,2,3),(1,1,22,0),(1,2,1,1),(1,2,1,2)
,(1,2,1,3),(1,2,2,1),(1,2,2,2),(1,2,2,3),(1,2,3,1),(1,2,3,2),(1,2,3,3),(1,2,4,1)
,(1,2,4,2),(1,2,4,3),(1,2,5,1),(1,2,5,2),(1,2,5,3)
SELECT MainId, ItemId, DeviceId, COUNT(*) as RecordCount
FROM
#T
GROUP BY
MainId, ItemId, DeviceId
SELECT DISTINCT MainId, ItemId, DeviceId
FROM
#T
SELECT *, DENSE_RANK() OVER (ORDER BY MainId, ItemId, DeviceId) as GroupNumber
FROM
#t
As I stated previously, the reason you are having difficulty is that you are including ChannelId in your query, as are the other answers. If ChannelId is included, there will be more than 7 distinct combinations of records. And no, my original query did not return more than 7 records; it returned exactly 7 records, each with the count of records for that combination of columns.
I have updated my answer with 3 examples, including sample data. Next time, please do this yourself: you are asking us for help, so make it easy and provide DML statements rather than an image of your test case!
Example 1 simply adds the GROUP BY columns to my original query (a little oversight on my part) so you can see what I was talking about.
Example 2 uses DISTINCT, which is basically the same as #1.
Example 3: if you want all of the other data, as you specified in other comments, you would actually want to use a windowed function (if your DBMS supports it), though of course this doesn't translate perfectly to LINQ. You would need DENSE_RANK() to get the grouping the way you want it.
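To see the three examples side by side, here is a sketch that runs them in SQLite (3.25 or later, for DENSE_RANK()) from Python. It assumes an ordinary in-memory table named T in place of the #T temp table, and adds ORDER BY clauses only so the output is deterministic.

```python
import sqlite3

# Same sample rows as the #T example (MainId, ItemId, DeviceId, ChannelId).
ROWS = [
    (1, 1, 2, 1), (1, 1, 2, 2), (1, 1, 2, 3), (1, 1, 22, 0), (1, 2, 1, 1), (1, 2, 1, 2),
    (1, 2, 1, 3), (1, 2, 2, 1), (1, 2, 2, 2), (1, 2, 2, 3), (1, 2, 3, 1), (1, 2, 3, 2),
    (1, 2, 3, 3), (1, 2, 4, 1), (1, 2, 4, 2), (1, 2, 4, 3), (1, 2, 5, 1), (1, 2, 5, 2),
    (1, 2, 5, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (MainId INT, ItemId INT, DeviceId INT, ChannelId INT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", ROWS)

# Example 1: GROUP BY with the grouping columns in the SELECT list, plus a count.
grouped = conn.execute(
    """SELECT MainId, ItemId, DeviceId, COUNT(*) AS RecordCount
       FROM T GROUP BY MainId, ItemId, DeviceId
       ORDER BY MainId, ItemId, DeviceId"""
).fetchall()

# Example 2: DISTINCT returns the same 7 combinations, without counts.
distinct = conn.execute(
    "SELECT DISTINCT MainId, ItemId, DeviceId FROM T ORDER BY MainId, ItemId, DeviceId"
).fetchall()

# Example 3: DENSE_RANK() keeps every row and tags it with its group number.
ranked = conn.execute(
    """SELECT *, DENSE_RANK() OVER (ORDER BY MainId, ItemId, DeviceId) AS GroupNumber
       FROM T"""
).fetchall()

print(len(grouped), len(distinct), len(ranked), max(r[-1] for r in ranked))  # 7 7 19 7
```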
Related
I have user_id in one column, attempt_id in another, along with various other data. The attempt_id is unique but user_id is not.
I would like to insert a column that records the number of times a user_id has appeared up to that point. So the first time it appears, 1; the second time, 2; etc.
I have attempted to do this with basic count functions, but that returns the total count in each instance, rather than a running count.
Is there a simple formula/trick that solves this issue? Running sums aren't really what I want; I want to search the column and see if this is the first instance of that user_id appearing.
If you just want a count in a query, use row_number(). You need something that specifies the ordering of the rows; that is presumably attempt_id:
select t.*,
row_number() over (partition by user_id order by attempt_id) as seqnum
from t;
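Here is a small sketch of that query running in SQLite (3.25 or later) from Python, with hypothetical attempts for two users. The numbering restarts for each user_id and follows attempt_id order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INT, attempt_id INT PRIMARY KEY)")
# Hypothetical attempts: user 10 appears three times, user 20 twice.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(10, 1), (20, 2), (10, 3), (20, 4), (10, 5)])

rows = conn.execute(
    """SELECT t.*,
              ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY attempt_id) AS seqnum
       FROM t
       ORDER BY attempt_id"""
).fetchall()

for row in rows:
    print(row)
# (10, 1, 1)
# (20, 2, 1)
# (10, 3, 2)
# (20, 4, 2)
# (10, 5, 3)
```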
Assuming the attempt_id increases with occurrence for a given user, you may try using COUNT as an analytic function:
WITH cte AS (
SELECT user_id, attempt_id,
COUNT(*) OVER (PARTITION BY user_id ORDER BY attempt_id) cnt
FROM yourTable
)
SELECT user_id, attempt_id
FROM cte
WHERE cnt = 1;
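A sketch of the analytic COUNT in SQLite (3.25 or later) from Python, using the same kind of hypothetical data. With an ORDER BY inside OVER(), the default window frame ends at the current row, so COUNT(*) becomes a running count per user_id; filtering on cnt = 1 then keeps each user's first attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (user_id INT, attempt_id INT)")
# Hypothetical attempts: user 10 appears three times, user 20 twice.
conn.executemany("INSERT INTO yourTable VALUES (?, ?)",
                 [(10, 1), (20, 2), (10, 3), (20, 4), (10, 5)])

# Running count: the default frame is UNBOUNDED PRECEDING .. CURRENT ROW.
running = conn.execute(
    """SELECT user_id, attempt_id,
              COUNT(*) OVER (PARTITION BY user_id ORDER BY attempt_id) AS cnt
       FROM yourTable
       ORDER BY attempt_id"""
).fetchall()

# Keeping cnt = 1 returns each user's first attempt, as in the answer.
first_attempts = conn.execute(
    """WITH cte AS (
           SELECT user_id, attempt_id,
                  COUNT(*) OVER (PARTITION BY user_id ORDER BY attempt_id) AS cnt
           FROM yourTable)
       SELECT user_id, attempt_id FROM cte WHERE cnt = 1 ORDER BY user_id"""
).fetchall()

print(running)         # [(10, 1, 1), (20, 2, 1), (10, 3, 2), (20, 4, 2), (10, 5, 3)]
print(first_attempts)  # [(10, 1), (20, 2)]
```

Dropping the WHERE clause gives the running count the question asks for, provided attempt_id is unique within each user.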
Going out on a limb here, but based on your description above, I assume you want to increment the attempt_id as you insert new rows, and that you will pass in user_id as a variable. If so, you may consider this pattern:
CREATE TABLE #tmp(userid int, attempt_id int)
DECLARE @userid int
SET @userid = 1
INSERT INTO #tmp
SELECT @userid, (SELECT ISNULL(MAX(attempt_id),0) + 1 from #tmp where userid = @userid)
select * from #tmp
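The MAX + 1 pattern can be tried in SQLite from Python as well. In this sketch COALESCE stands in for SQL Server's ISNULL, an ordinary table `tmp` replaces the #tmp temp table, and the helper `add_attempt` is hypothetical. Note that MAX + 1 is not safe under concurrent inserts for the same user without additional locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (userid INT, attempt_id INT)")


def add_attempt(conn, userid):
    """Insert the next attempt number for userid (MAX + 1, starting at 1).

    COALESCE plays the role of SQL Server's ISNULL here.
    """
    conn.execute(
        """INSERT INTO tmp
           SELECT ?, (SELECT COALESCE(MAX(attempt_id), 0) + 1 FROM tmp WHERE userid = ?)""",
        (userid, userid),
    )


for uid in (1, 1, 2, 1):
    add_attempt(conn, uid)

rows = conn.execute("SELECT userid, attempt_id FROM tmp ORDER BY rowid").fetchall()
print(rows)  # [(1, 1), (1, 2), (2, 1), (1, 3)]
```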
I would like to be able to extract one field from multiple records from within a single table. For example, assuming I have a schema as follows
userId, eventTimestamp, theField
And what I want to do is concatenate all instances of the field 'theField' into a single string for a given userId, ordered by eventTimestamp. And for an extra wrinkle, let's say I only want to include the 50 oldest records.
My first attempt was to try something like:
SELECT
userId,
eventTimestamp,
LEAD(theField,0) OVER (PARTITION BY userId ORDER BY eventTimestamp) AS step0,
LEAD(theField,1) OVER (PARTITION BY userId ORDER BY eventTimestamp) AS step1,
....,
LEAD(theField,50) OVER (PARTITION BY userId ORDER BY eventTimestamp) AS step50,
And then the next step was to wrap that first step up in another SELECT statement as follows:
SELECT userId, eventTimestamp, CONCAT(STRING(step0), STRING(step1),...,STRING(step50)) as concatenatedString
FROM [whateverDataset.whateverTable],
GROUP BY
userId, eventTimestamp
This approach doesn't work, though, because if I have more than 50 steps (which I do), I end up getting multiple rows from the outer SELECT statement, basically N-50 rows, where N = the total number of records for a particular userId. A 'solution' would be a HAVING clause in the inner SELECT statement that limits it to reporting only the first 50 records, but overall this seems rather cumbersome. In non-BigQuery variants of SQL, GROUP_CONCAT seems to be a good way forward, but it either doesn't work here or I lack the creativity to get it to work. Anyone have any suggestions?
Thanks,
Brad
For BigQuery Legacy SQL:
SELECT
userid, GROUP_CONCAT(theField) AS Fields
FROM (
SELECT
userid, eventTimestamp, theField,
ROW_NUMBER() OVER(PARTITION BY userid ORDER BY eventTimestamp DESC) AS pos
FROM YourTable
ORDER BY eventTimestamp
)
WHERE pos < 51
GROUP BY userid
Please note: the inner ORDER BY does not guarantee the order of theField in GROUP_CONCAT. That said, in all practical cases I have seen so far, the order carries through. So test carefully.
For BigQuery Standard SQL:
Don't forget to uncheck the Use Legacy SQL checkbox under Show Options.
SELECT
  userid,
  (SELECT STRING_AGG(fields) FROM t.fields) AS fields
FROM (
  SELECT
    userid,
    ARRAY(SELECT theField FROM t.fields ORDER BY eventTimestamp) fields
  FROM (
    SELECT
      userid,
      ARRAY_AGG(STRUCT(theField, eventTimestamp)) fields
    FROM (
      SELECT
        userid,
        eventTimestamp,
        theField,
        ROW_NUMBER() OVER(PARTITION BY userid ORDER BY eventTimestamp DESC) AS pos
      FROM YourTable
    )
    WHERE pos < 51
    GROUP BY userid
  ) t
) t
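The pattern behind both answers (number the rows per user, keep the first N, then concatenate in timestamp order) can also be sketched outside BigQuery. This Python/SQLite sketch (SQLite 3.25 or later) uses hypothetical table contents and a hypothetical helper `concat_oldest`. It keeps the N oldest records per user, as the question describes; the BigQuery queries above order by eventTimestamp DESC, which keeps the N most recent instead. Because aggregate concatenation order is not guaranteed here either, the final join is done in Python over an explicitly ordered result.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (userid INT, eventTimestamp INT, theField TEXT)")
# Hypothetical events: user 1 has four, user 2 has two (inserted out of order on purpose).
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 40, "d"), (1, 10, "a"), (1, 30, "c"), (1, 20, "b"), (2, 5, "x"), (2, 6, "y")],
)


def concat_oldest(conn, limit):
    """Concatenate theField per userid, oldest first, keeping at most `limit` rows."""
    rows = conn.execute(
        """SELECT userid, theField FROM (
               SELECT userid, eventTimestamp, theField,
                      ROW_NUMBER() OVER (PARTITION BY userid ORDER BY eventTimestamp) AS pos
               FROM YourTable)
           WHERE pos <= ?
           ORDER BY userid, eventTimestamp""",
        (limit,),
    ).fetchall()
    # Join in Python so the concatenation order is fixed by the ORDER BY above.
    return {uid: ",".join(field for _, field in grp)
            for uid, grp in groupby(rows, key=lambda r: r[0])}


print(concat_oldest(conn, 3))  # {1: 'a,b,c', 2: 'x,y'}
```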
I have a query that partitions and ranks "Note" records, grouping them by ID_Task (users add notes for each task). I want to rank the notes by date, but I also want to restrict it so they're ranked between two dates.
I'm using SQL Server 2008. So far my SELECT looks like this:
SELECT Note.ID,
Note.ID_Task,
Note.[Days],
Note.[Date],
ROW_NUMBER() OVER (PARTITION BY ID_Task ORDER BY CAST([Date] AS DATE), Edited ASC) AS Rank
FROM
Note
WHERE
Note.Locked = 1 AND Note.Deleted = 0
Now, I assume that if I put the WHERE clause at the bottom, although the rows will still have ranks, I might or might not get an item with rank 1, as it might get filtered out. So is there a way I can only partition records WHERE , ignoring all of the others? I could partition a sub-query, I guess.
The intention is to use the rank number to find the most recent note for each task, in another query. So in that query I'll join with this result WHERE rank = 1.
row_number() is evaluated after WHERE, so you'll always get a row numbered 1.
For example:
declare @t table (id int)
insert @t values (3), (1), (4)
select row_number() over (order by id)
from @t
where id > 1
This prints:
1
2
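The same behaviour can be reproduced in SQLite (3.25 or later) from Python. In this sketch an ordinary table `t` stands in for the table variable: the WHERE clause removes id = 1 first, and ROW_NUMBER() then numbers only the remaining rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (1,), (4,)])

# WHERE filters first; ROW_NUMBER() then numbers the surviving rows 3 and 4.
numbers = [r[0] for r in conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY id) FROM t WHERE id > 1 ORDER BY 1")]
print(numbers)  # [1, 2]
```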
Title says it all, why can't I use a windowed function in a where clause in SQL Server?
This query makes perfect sense:
select id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
where 1 = row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc)
But it doesn't work. Is there a better way than a CTE/Subquery?
EDIT
For what its worth this is the query with a CTE:
with Best_Sales as (
select id, sales_person_id, product_type, product_id, sale_amount, row_number() over (partition by sales_person_id, product_type, product_id order by sale_amount desc) rank
from Sales_log
)
select id, sales_person_id, product_type, product_id, sale_amount
from Best_Sales
where rank = 1
EDIT
+1 for the answers showing with a subquery, but really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.
why can't I use a windowed function in a where clause in SQL Server?
One answer, though not particularly informative, is because the spec says that you can't.
See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the SELECT on the result set remaining after all the WHERE/JOIN/GROUP BY/HAVING clauses have been dealt with (step 5.1).
really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.
The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)
Suppose your table was
CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)
INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')
And your query
SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'
What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?
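The two possible readings can be made explicit with subqueries, since SQLite (like SQL Server) rejects the window function in WHERE. This Python/SQLite sketch (SQLite 3.25 or later) uses the T1 data above and shows that the two evaluation orders give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (col1 CHAR(1) PRIMARY KEY)")
conn.executemany("INSERT INTO T1 VALUES (?)", [(c,) for c in "ABCDEF"])

# Reading 1: apply col1 > 'B' first, then number the survivors (C, D, E, F).
filter_then_number = [r[0] for r in conn.execute(
    """SELECT col1 FROM (
           SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) AS rn
           FROM T1 WHERE col1 > 'B')
       WHERE rn <= 3 ORDER BY col1""")]

# Reading 2: number all rows first (A..F = 1..6), then apply both predicates.
number_then_filter = [r[0] for r in conn.execute(
    """SELECT col1 FROM (
           SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) AS rn FROM T1)
       WHERE rn <= 3 AND col1 > 'B' ORDER BY col1""")]

print(filter_then_number)  # ['C', 'D', 'E']
print(number_then_filter)  # ['C']
```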
There is no need for a CTE; just use the windowing function in a subquery:
select id, sales_person_id, product_type, product_id, sale_amount
from
(
select id, sales_person_id, product_type, product_id, sale_amount,
row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc) rn
from Sales_Log
) sl
where rn = 1
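A runnable sketch of the subquery approach in SQLite (3.25 or later) from Python. The Sales_Log rows are hypothetical: two (sales_person_id, product_type, product_id) groups whose largest sales are ids 2 and 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Sales_Log (id INT, sales_person_id INT, product_type TEXT,
                                        product_id INT, sale_amount REAL)""")
# Hypothetical sales: the biggest sale in each group is id 2 and id 4.
conn.executemany("INSERT INTO Sales_Log VALUES (?, ?, ?, ?, ?)", [
    (1, 7, "hat", 100, 10.0),
    (2, 7, "hat", 100, 25.0),
    (3, 8, "shoe", 200, 40.0),
    (4, 8, "shoe", 200, 55.0),
    (5, 8, "shoe", 200, 15.0),
])

# The window function lives in the derived table; the outer WHERE filters on its alias.
best = conn.execute(
    """SELECT id, sales_person_id, product_type, product_id, sale_amount
       FROM (SELECT id, sales_person_id, product_type, product_id, sale_amount,
                    ROW_NUMBER() OVER (PARTITION BY sales_person_id, product_type, product_id
                                       ORDER BY sale_amount DESC) AS rn
             FROM Sales_Log) sl
       WHERE rn = 1
       ORDER BY id"""
).fetchall()
print([row[0] for row in best])  # [2, 4]
```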
Edit, moving my comment to the answer.
Windowing functions are not computed until the data is actually selected, which is after the WHERE clause. So if you try to use row_number() in a WHERE clause, the value has not been assigned yet.
"All-at-once operation" means that all expressions in the same logical query process phase are evaluated logically at the same time.
And a great chapter, "Impact on Window Functions":
Suppose you have:
CREATE TABLE #Test ( Id INT) ;
INSERT INTO #Test VALUES ( 1001 ), ( 1002 ) ;
SELECT Id
FROM #Test
WHERE Id = 1002
AND ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-once operations tell us these two conditions are evaluated logically at the same point in time. Therefore, SQL Server can evaluate the conditions in the WHERE clause in arbitrary order, based on the estimated execution plan. So the main question here is which condition is evaluated first.
Case 1:
If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )
Result: 1002
Case 2:
If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )
Result: empty
So we have a paradox.
This example shows why we cannot use Window Functions in WHERE clause.
Think about this a bit more and you will see why window functions are allowed only in the SELECT and ORDER BY clauses!
Addendum
Teradata supports QUALIFY clause:
Filters results of a previously computed ordered analytical function according to user‑specified search conditions.
SELECT Id
FROM #Test
WHERE Id = 1002
QUALIFY ROW_NUMBER() OVER(ORDER BY Id) = 1;
Snowflake - Qualify
QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses.
In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Typically, a SELECT statement’s clauses are evaluated in the order shown below:
From
Where
Group by
Having
Window
QUALIFY
Distinct
Order by
Limit
Databricks - QUALIFY clause
Filters the results of window functions. To use QUALIFY, at least one window function is required to be present in the SELECT list or the QUALIFY clause.
You don't necessarily need a CTE; you can query the result set after using row_number():
select row, id, sales_person_id, product_type, product_id, sale_amount
from (
select
row_number() over(partition by sales_person_id,
product_type, product_id order by sale_amount desc) AS row,
id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
) a
where row = 1
It's an old thread, but I'll try to answer specifically the question expressed in the topic.
Why no windowed functions in where clauses?
The SELECT statement has the following main clauses, in keyed-in order:
SELECT DISTINCT TOP list
FROM JOIN ON / APPLY / PIVOT / UNPIVOT
WHERE
GROUP BY WITH CUBE / WITH ROLLUP
HAVING
ORDER BY
OFFSET-FETCH
Logical Query Processing Order, or Binding Order, is conceptual interpretation order, it defines the correctness of the query. This order determines when the objects defined in one step are made available to the clauses in subsequent steps.
----- Relational result
1. FROM
1.1. ON JOIN / APPLY / PIVOT / UNPIVOT
2. WHERE
3. GROUP BY
3.1. WITH CUBE / WITH ROLLUP
4. HAVING
---- After the HAVING step the Underlying Query Result is ready
5. SELECT
5.1. SELECT list
5.2. DISTINCT
----- Relational result
----- Non-relational result (a cursor)
6. ORDER BY
7. TOP / OFFSET-FETCH
----- Non-relational result (a cursor)
For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.
Conversely, all clauses preceding the SELECT clause cannot reference any column aliases or derived columns defined in SELECT clause. However, those columns can be referenced by subsequent clauses such as the ORDER BY clause.
OVER clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within an Underlying Query Result set and window function computes result against that window.
Msg 4108, Level 15, State 1, …
Windowed functions can only appear in the SELECT or ORDER BY clauses.
The reason is the way Logical Query Processing works in T-SQL. Since the underlying query result is established only when logical query processing reaches SELECT step 5.1 (that is, after processing the FROM, WHERE, GROUP BY and HAVING steps), window functions are allowed only in the SELECT and ORDER BY clauses of the query.
Note that window functions are still part of the relational layer, even though the relational model doesn't deal with ordered data. The result after SELECT step 5.1, with any window function, is still relational.
Also, strictly speaking, the reason window functions are not allowed in the WHERE clause is not that it would create ambiguity, but the order in which Logical Query Processing processes a SELECT statement in T-SQL.
Links: here, here and here
Finally, there's the old-fashioned, pre-SQL Server 2005 way, with a correlated subquery:
select *
from Sales_Log sl
where sl.id = (
Select Top 1 id
from Sales_Log sl2
where sales_person_id = sl.sales_person_id
and product_type = sl.product_type
and product_id = sl.product_id
order by sale_amount desc
)
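The correlated-subquery approach can be run in SQLite from Python too. SQLite has no TOP, so this sketch substitutes ORDER BY ... LIMIT 1, and the Sales_Log rows are hypothetical. As with TOP 1, if two sales tie for the highest amount in a group, which id gets picked is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Sales_Log (id INT, sales_person_id INT, product_type TEXT,
                                        product_id INT, sale_amount REAL)""")
# Hypothetical sales: the biggest sale in each group is id 2 and id 4.
conn.executemany("INSERT INTO Sales_Log VALUES (?, ?, ?, ?, ?)", [
    (1, 7, "hat", 100, 10.0),
    (2, 7, "hat", 100, 25.0),
    (3, 8, "shoe", 200, 40.0),
    (4, 8, "shoe", 200, 55.0),
    (5, 8, "shoe", 200, 15.0),
])

# For each outer row, the subquery finds the id of the top sale in the same group.
# SQLite has no TOP, so LIMIT 1 stands in for "Select Top 1".
best = conn.execute(
    """SELECT * FROM Sales_Log sl
       WHERE sl.id = (
           SELECT sl2.id FROM Sales_Log sl2
           WHERE sl2.sales_person_id = sl.sales_person_id
             AND sl2.product_type = sl.product_type
             AND sl2.product_id = sl.product_id
           ORDER BY sl2.sale_amount DESC
           LIMIT 1)
       ORDER BY sl.id"""
).fetchall()
print([row[0] for row in best])  # [2, 4]
```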
I give you this for completeness, merely.
Basically, SQL evaluates the WHERE clause condition first and looks up that column/value in the table, but at that point row_num = 1 does not exist yet. Hence it will not work.
That's the reason we first wrap the query in parentheses (a subquery) and only then write the WHERE clause.
Yes, unfortunately, when you use a windowed function in a WHERE clause, SQL Server gets mad at you even if your predicate is legitimate. Instead, make a CTE or nested select that computes the value in its SELECT list, then reference that value from the CTE or nested select later. Here is a simple example that should be self-explanatory. If you really HATE CTEs because of performance issues on a large data set, you can always drop the intermediate results into a temp table or table variable.
declare @Person table ( PersonID int identity, PersonName varchar(8));
insert into @Person values ('Brett'),('John');
declare @Orders table ( OrderID int identity, PersonID int, OrderName varchar(8));
insert into @Orders values (1, 'Hat'),(1,'Shirt'),(1, 'Shoes'),(2,'Shirt'),(2, 'Shoes');
--Select
-- p.PersonName
--, o.OrderName
--, row_number() over(partition by o.PersonID order by o.OrderID)
--from @Person p
-- join @Orders o on p.PersonID = o.PersonID
--where row_number() over(partition by o.PersonID order by o.orderID) = 2
-- yields:
--Msg 4108, Level 15, State 1, Line 15
--Windowed functions can only appear in the SELECT or ORDER BY clauses.
;
with a as
(
Select
p.PersonName
, o.OrderName
, row_number() over(partition by o.PersonID order by o.OrderID) as rnk
from @Person p
join @Orders o on p.PersonID = o.PersonID
)
select *
from a
where rnk >= 2 -- only orders after the first one.
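The same CTE can be tried in SQLite (3.25 or later) from Python. In this sketch ordinary tables named Person and Orders replace the table variables, and INTEGER PRIMARY KEY auto-assigns 1, 2, ... in the role of the identity columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, PersonID INT, OrderName TEXT)")
# INTEGER PRIMARY KEY auto-assigns 1, 2, ... like the identity columns in the T-SQL example.
conn.executemany("INSERT INTO Person (PersonName) VALUES (?)", [("Brett",), ("John",)])
conn.executemany("INSERT INTO Orders (PersonID, OrderName) VALUES (?, ?)",
                 [(1, "Hat"), (1, "Shirt"), (1, "Shoes"), (2, "Shirt"), (2, "Shoes")])

# Compute rnk inside the CTE, then filter on it in the outer query.
later_orders = conn.execute(
    """WITH a AS (
           SELECT p.PersonName, o.OrderName,
                  ROW_NUMBER() OVER (PARTITION BY o.PersonID ORDER BY o.OrderID) AS rnk
           FROM Person p JOIN Orders o ON p.PersonID = o.PersonID)
       SELECT PersonName, OrderName FROM a
       WHERE rnk >= 2  -- only orders after the first one
       ORDER BY PersonName, rnk"""
).fetchall()
print(later_orders)  # [('Brett', 'Shirt'), ('Brett', 'Shoes'), ('John', 'Shoes')]
```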
I have a table that I wish to select a subset of columns from but also add on the end a computed column based upon where you are located in a queue. There are the following fields (that are pertinent):
id: int, auto increment, primary key
answertime: datetime, nullable
By default, when something is submitted to the queue, its answertime is NULL. So, I wish to select the ID of the thing in the queue as well as its rank in the queue (i.e. rank 1 is the next item that is unanswered, etc). Here's what I was thinking:
rank = id - COUNT(ids below my id where answertime is not null). However, I'm having an issue with the syntax of this query:
SELECT id AS outerid, COUNT(
SELECT * FROM tablename WHERE id<outerid AND answertime IS NOT NULL
)
FROM tablename
WHERE answertime IS NULL;
Now, obviously, this is wrong because I'm fairly confident you can't embed a select inside of an aggregate function, likewise flipping the SELECT and COUNT doesn't work as you can't embed a SELECT at that point in the code (it can only be used in a WHERE clause).
Is this even possible to do with just SQL or do I need to add some logic on the program end?
If it helps, I'm doing this on SQL Server 2008, although I doubt that would add any value.
You can do that; you just can't wrap COUNT() around a bare SELECT. Try this instead, which computes the COUNT value in a scalar subquery:
SELECT
id AS outerid,
(SELECT COUNT(Id) FROM tablename
WHERE id<outie.id AND answertime IS NOT NULL)
FROM tablename outie
WHERE answertime IS NULL;
You may need to choose for yourself between using COUNT(*), COUNT(Id) or some other column depending on what you're really after.
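A sketch of the correlated scalar subquery in SQLite from Python, with a hypothetical queue in which ids 2 and 3 have been answered. The last line applies the question's formula, rank = id - count, which assumes ids are contiguous and start at 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, answertime TEXT)")
# Hypothetical queue: ids 2 and 3 are answered; 1, 4 and 5 are still waiting.
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [
    (1, None), (2, "2024-01-01 10:00"), (3, "2024-01-01 11:00"), (4, None), (5, None),
])

# The inner id/answertime refer to the inner tablename; outie.id is the outer row.
counts = conn.execute(
    """SELECT id AS outerid,
              (SELECT COUNT(id) FROM tablename
               WHERE id < outie.id AND answertime IS NOT NULL) AS answered_before
       FROM tablename outie
       WHERE answertime IS NULL
       ORDER BY id"""
).fetchall()
print(counts)  # [(1, 0), (4, 2), (5, 2)]

# Queue position per the question's formula (assumes contiguous ids starting at 1).
positions = [(i, i - c) for i, c in counts]
print(positions)  # [(1, 1), (4, 2), (5, 3)]
```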
SELECT t.id AS outerid,
(SELECT COUNT(*) FROM tablename WHERE id < t.id AND answertime IS NOT NULL) AS othercol
FROM tablename t -- ?
WHERE t.answertime IS NULL;
Also, where's the FROM clause?
As suggested by @HLGEM, you could use ROW_NUMBER() to obtain your results. The method involves ranking the rows in tablename by id twice: once without partitioning, and once partitioned by answertime. For every row where answertime is NULL, the difference between the two rankings gives the same value as the one you are calculating with COUNT() in the subquery.
Here's an implementation of the method:
;
WITH ranked AS (
SELECT
*,
Rnk = ROW_NUMBER() OVER ( ORDER BY id),
PartRnk = ROW_NUMBER() OVER (PARTITION BY answertime ORDER BY id)
FROM tablename
)
SELECT
id, /* AS outerid, if you like */
Cnt = Rnk - PartRnk
FROM ranked
WHERE answertime IS NULL
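The claim that the two rankings' difference matches the COUNT() subquery can be checked in SQLite (3.25 or later) from Python. This sketch uses the same hypothetical queue as before and swaps T-SQL's `Rnk = ROW_NUMBER() ...` alias syntax for the standard `AS` form. Because all NULL answertime values fall into one partition, PartRnk for an unanswered row is its position among unanswered rows, so Rnk - PartRnk counts the answered rows with a smaller id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, answertime TEXT)")
# Hypothetical queue: ids 2 and 3 are answered; 1, 4 and 5 are still waiting.
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [
    (1, None), (2, "2024-01-01 10:00"), (3, "2024-01-01 11:00"), (4, None), (5, None),
])

# ROW_NUMBER() method: overall rank minus rank within the answertime partition.
window_counts = conn.execute(
    """WITH ranked AS (
           SELECT *,
                  ROW_NUMBER() OVER (ORDER BY id) AS Rnk,
                  ROW_NUMBER() OVER (PARTITION BY answertime ORDER BY id) AS PartRnk
           FROM tablename)
       SELECT id, Rnk - PartRnk AS Cnt FROM ranked
       WHERE answertime IS NULL ORDER BY id"""
).fetchall()

# Correlated COUNT() subquery for comparison.
subquery_counts = conn.execute(
    """SELECT id,
              (SELECT COUNT(*) FROM tablename
               WHERE id < outie.id AND answertime IS NOT NULL)
       FROM tablename outie
       WHERE answertime IS NULL ORDER BY id"""
).fetchall()

print(window_counts)                     # [(1, 0), (4, 2), (5, 2)]
print(window_counts == subquery_counts)  # True
```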