A SQL puzzle (find the first occurrence of a column value) - sql

Is there an easy way to find the first occurrence of a row that has a particular value in a column? For example, suppose I have these two tables:
Alphabet
A
B
C
D
Alphabet Usage
A Apple
B Bat
D Dog
A Amateur
A Arsenal
C Cat
B Ball
D Drum
What would be an easy way to select everything in the first table and the first usage of it in the second table?
Expected Output:
Alphabet Usage
A Apple
B Bat
C Cat
D Dog

You should be able to apply row_number(). However, row_number() requires an ordering to be specified.
This first example orders by usage, but the problem is that the result will not follow the order in which the rows appear in the table; it will be in alphabetical order:
select alphabet, usage
from
(
select t1.alphabet,
t2.usage,
row_number() over(partition by t1.alphabet order by t2.usage) rn
from table1 t1
inner join table2 t2
on t1.alphabet = t2.alphabet
) src
where rn =1
See SQL Fiddle with Demo.
If you do not have a numeric id field to guarantee the order in which rows were entered, you might be able to use:
select alphabet, usage
from
(
select t1.alphabet,
t2.usage,
row_number() over(partition by t1.alphabet order by (select 1)) rn
from table1 t1
inner join table2 t2
on t1.alphabet = t2.alphabet
) src
where rn =1
See SQL Fiddle with Demo.
As @Aaron pointed out in the comments, the order is not guaranteed with this method and the behavior can change.
Ideally, you should have some type of column that lets you distinguish the first occurrence of your data, e.g. a datetime or an id. Since there is no inherent order to the data in a table, you apply the order using order by.
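A minimal sketch of the recommended approach, assuming the usage table has an auto-incrementing id column that records insert order. The id column and the table names alphabet and alphabet_usage are hypothetical, and the query runs on SQLite through Python's sqlite3 module rather than on SQL Server (window functions need SQLite 3.25 or later):

```python
import sqlite3


def first_usage_rows():
    """Return (alphabet, usage) pairs holding the first usage of each letter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE alphabet (alphabet TEXT PRIMARY KEY);
        -- id is an assumed column; it is what makes "first" well defined
        CREATE TABLE alphabet_usage (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            alphabet TEXT,
            usage TEXT
        );
        INSERT INTO alphabet VALUES ('A'), ('B'), ('C'), ('D');
        INSERT INTO alphabet_usage (alphabet, usage) VALUES
            ('A', 'Apple'), ('B', 'Bat'), ('D', 'Dog'), ('A', 'Amateur'),
            ('A', 'Arsenal'), ('C', 'Cat'), ('B', 'Ball'), ('D', 'Drum');
    """)
    rows = conn.execute("""
        SELECT alphabet, usage
        FROM (
            SELECT a.alphabet,
                   u.usage,
                   -- number the usages of each letter in insert order
                   ROW_NUMBER() OVER (PARTITION BY a.alphabet ORDER BY u.id) AS rn
            FROM alphabet a
            JOIN alphabet_usage u ON u.alphabet = a.alphabet
        ) src
        WHERE rn = 1
        ORDER BY alphabet
    """).fetchall()
    conn.close()
    return rows


print(first_usage_rows())
```

Ordering by the id instead of by usage is what yields Apple and Bat rather than Amateur and Ball for A and B.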

You cannot, unless you have an ordering on the second table. SQL Tables are inherently unordered, so you would need a column that specifies an insert time or an auto-incrementing id.
If you happen to be running SQL Server with no parallelism and the data is stored in a single file or if the data in the second table fits on one page, then the following will probably work (but no guarantees):
select au.*
from (select au.Alphabet, min(seqnum) as minseqnum
from (select au.*, row_number() over (order by (select NULL)) as seqnum
from AlphabetUsage au
) au
group by au.Alphabet
) ausum join
(select au.*, row_number() over (order by (select NULL)) as seqnum
from AlphabetUsage au
) au
on ausum.minseqnum = au.seqnum
In my experience on SQL Server, row_number() over (order by (select NULL)) assigns a row number without ordering the data. However, this is not documented and not guaranteed.
I highly, highly recommend that you add additional columns to the table, including an identity column to identify each row.

Related

Select two columns just side by side as join is resulting in multiple entries

The output should be just the two columns along side. In case one column has more values, output should just show null in the other column.
Facing this difficulty as this somehow is on the fringe of SQL table definition itself.
You can use row_number() and full join:
select me.me, pn.pnname
from (select me.*,
row_number() over (order by (select null)) as seqnum
from me
) me full join
(select pn.*,
row_number() over (order by (select null)) as seqnum
from pn
) pn
on me.seqnum = pn.seqnum;
Note that SQL tables represent unordered sets, so the results are in an arbitrary order. If you want a particular order in each column, then put that information in the order by.

select rows in sql with latest date from 3 tables in each group

I'm creating PREDICATE system for my application.
Please see image that I already
My question is how to select the rows with the latest date in the "Taken On" column for each "QuizESId" across these tables. I already understand how to do it with a single table; I learned that from this question:
select rows in sql with latest date for each ID repeated multiple times
Here is what I have already tried
SELECT tt.*
FROM myTable tt
INNER JOIN
(SELECT ID, MAX(Date) AS MaxDateTime
FROM myTable
GROUP BY ID) groupedtt ON tt.ID = groupedtt.ID
AND tt.Date = groupedtt.MaxDateTime
What I am confused about is how to select from 3 tables. I hope you can guide me; ideally I need a clean query with efficient performance.
Thanks
This is for SQL Server (you didn't specify exactly what RDBMS you're using):
if you want to get the "latest row for each QuizId" - this sounds like you need a CTE (Common Table Expression) with a ROW_NUMBER() value - something like this (updated: you obviously want to "partition" not just by QuizId, but also by UserName):
WITH BaseData AS
(
SELECT
mAttempt.Id AS Id,
mAttempt.QuizModelId AS QuizId,
mAttempt.StartedAt AS StartsOn,
mUser.UserName,
mDetail.Score AS Score,
RowNum = ROW_NUMBER() OVER (PARTITION BY mAttempt.QuizModelId, mUser.UserName
ORDER BY mAttempt.TakenOn DESC)
FROM
UserQuizAttemptModels mAttempt
INNER JOIN
AspNetUsers mUser ON mAttempt.UserId = mUser.Id
INNER JOIN
QuizAttemptDetailModels mDetail ON mDetail.UserQuizAttemptModelId = mAttempt.Id
)
SELECT *
FROM BaseData
WHERE QuizId = 10053
AND RowNum = 1
The BaseData CTE basically selects the data (as you did) - but it also adds a ROW_NUMBER() column. This will "partition" your data into groups of data - based on the QuizModelId and UserName - and it will number all the rows inside each data group, starting at 1, and ordered by the second condition - the ORDER BY clause. You said you want to order by "Taken On" date - but there's no such date visible in your query - so I just guessed it might be on the UserQuizAttemptModels table - change and adapt as needed.
Now you can select from that CTE with your original WHERE condition - and you specify, that you want only the first row for each data group (for each "QuizId") - the one with the most recent "Taken On" date value.
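The partition-and-number idea can be tried on a small scale. This is only a sketch: it uses a single hypothetical attempts table (columns id, quiz_id, user_name, taken_on, score) instead of the three joined tables, and it runs on SQLite via Python's sqlite3 module (SQLite 3.25 or later for window functions) rather than on SQL Server:

```python
import sqlite3


def latest_attempts():
    """Return the most recent attempt per (quiz_id, user_name) pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- hypothetical single-table stand-in for the joined quiz tables
        CREATE TABLE attempts (
            id INTEGER PRIMARY KEY,
            quiz_id INTEGER,
            user_name TEXT,
            taken_on TEXT,   -- ISO dates, so text order equals date order
            score INTEGER
        );
        INSERT INTO attempts VALUES
            (1, 10, 'ann', '2020-01-01', 50),
            (2, 10, 'ann', '2020-02-01', 70),
            (3, 10, 'bob', '2020-01-15', 60),
            (4, 11, 'ann', '2020-03-01', 90),
            (5, 10, 'bob', '2020-01-10', 40);
    """)
    rows = conn.execute("""
        WITH base AS (
            SELECT id, quiz_id, user_name, score,
                   -- newest attempt in each group gets row_num = 1
                   ROW_NUMBER() OVER (PARTITION BY quiz_id, user_name
                                      ORDER BY taken_on DESC) AS row_num
            FROM attempts
        )
        SELECT id, quiz_id, user_name, score
        FROM base
        WHERE row_num = 1
        ORDER BY quiz_id, user_name
    """).fetchall()
    conn.close()
    return rows


print(latest_attempts())
```

Each (quiz, user) pair contributes exactly one row, so ann's second attempt at quiz 10 wins over her first, and bob's attempt from 2020-01-15 wins over the one from 2020-01-10.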

Get a new column with updated values, where each row change in time depending on the actual column?

I have some data whose columns include an ID, a Date, and a Place denoted by a number. I need to simulate a real-time update by creating a new column that shows how many different places have appeared so far, so that each time a new place appears in the Place column, the new column changes its value to reflect it.
This is just a little piece of the original table with hundreds of millions of rows.
Here is an example, the left table is the original one and the right table is what I need.
I tried to do it with this piece of code but I cannot use the function DISTINCT with the OVER clause.
SELECT ID, Dates, Place,
count (distinct(Place)) OVER (PARTITION BY Place ORDER BY Dates) AS
DiffPlaces
FROM #informacion_prendaria_muestra
order by ID;
I think this should be possible using DENSE_RANK() in SQL Server.
You can try this:
SELECT ID, Dates, Place,
DENSE_RANK() OVER(ORDER BY Place) AS
DiffPlaces
FROM #informacion_prendaria_muestra
I think you can use a self-join query like this, without using window functions:
select
t.ID, t.[Date], t.Place,
count(distinct tt.Place) diffPlace
from
yourTable t left join
yourTable tt on t.ID = tt.ID and t.[Date] >= tt.[Date]
group by
t.ID, t.[Date], t.Place
order by
Id, [Date];
SQL Fiddle Demo
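A small sketch of the self-join approach, run on SQLite through Python's sqlite3 module. The visits table and its sample rows are made up for illustration; as in the query above, the count is kept per ID, which assumes ID identifies an entity rather than a single row:

```python
import sqlite3


def running_distinct_places():
    """Return each row with the number of distinct places seen so far for its id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- hypothetical sample data: (id, dates, place)
        CREATE TABLE visits (id INTEGER, dates TEXT, place INTEGER);
        INSERT INTO visits VALUES
            (1, '2020-01-01', 10),
            (1, '2020-01-02', 10),
            (1, '2020-01-03', 20),
            (1, '2020-01-04', 10),
            (2, '2020-01-01', 30),
            (2, '2020-01-02', 40);
    """)
    rows = conn.execute("""
        SELECT t.id, t.dates, t.place,
               COUNT(DISTINCT tt.place) AS diff_places
        FROM visits t
        -- pair every row with all rows of the same id up to and including its date
        LEFT JOIN visits tt ON t.id = tt.id AND t.dates >= tt.dates
        GROUP BY t.id, t.dates, t.place
        ORDER BY t.id, t.dates
    """).fetchall()
    conn.close()
    return rows


for row in running_distinct_places():
    print(row)
```

The join pairs each row with every earlier row, so its cost grows roughly quadratically within each id; on a table with hundreds of millions of rows that is worth measuring before relying on it.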

How can I get certain rows and the "previous" rows from a table that includes a datetime column?

I have two tables, Alpha & Bravo. Bravo has a column id (integer, primary key) and some other columns that are not relevant for this question. Alpha has columns id (integer, primary key), bravo_id (foreign key to table Bravo), special (a single char, null for most rows but has a value for certain important rows), created (a DATETIME), and some others not relevant to this question.
I would like to get all the special rows from Alpha, plus for each special row I would like to get the "previous" non-special row from Alpha associated with the same row of Bravo (that is, I would like to get the Alpha row with the same bravo_id and the most recent created that is older than the created of the special row), and I need to keep the special row & its previous row linked.
Currently I'm doing this with n+1 queries:
SELECT id, bravo_id, created FROM Alpha WHERE special IS NOT NULL
followed by a query like this for each result in the initial query:
SELECT id, created FROM Alpha
WHERE special IS NULL AND bravo_id = BrvN AND created < CrtN ORDER BY created DESC
Obviously that's wildly inefficient. Is there a way I can retrieve this information in a single query that will put each special row & its previous non-special row in a single row of the result?
Our product supports both SQL Server (2008 R2 if relevant) and Oracle (11g if relevant) so a query that works for both of those would be ideal, but a query for only one of the two would be fine.
EDIT: "Created" is perhaps a misnomer. The datetime in that column is when the referenced object was created, and not when it was entered into the database (which could be anywhere from seconds to years later). An ordering of the rows of Alpha based on the created column would have little or no correlation to an ordering based on the id column (which is a traditional incrementing identity/sequence).
SELECT a.Id, a.Bravo_Id, a.Created, d.Id, d.Created FROM #Alpha a
OUTER APPLY
(
SELECT TOP 1 da.id, da.Created
FROM #Alpha da
WHERE da.Special IS NULL
AND da.Bravo_Id = a.Bravo_Id
AND da.Created < a.Created
ORDER BY da.Created DESC
) d
WHERE a.Special IS NOT NULL
You can combine both queries with APPLY (this is a SQL Server query).
This works in both SQL Server & Oracle:
select A.id, A.bravo_id, A.created, B.id, B.created
from Alpha A
left join Alpha B on A.bravo_id = B.bravo_id
and B.created < A.created
and B.special is null
where A.special is not null
and (B.created is null or
B.created = (select max(S.created)
from Alpha S
where S.special is null
and S.bravo_id = A.bravo_id
and S.created < A.created))
It left joins in all rows with the same foreign key and a lower/older created, then uses the where clause to filter them out (being careful not to exclude A rows that have no older row).
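The left-join query can be checked against a few rows. The sketch below uses made-up sample data and runs on SQLite via Python's sqlite3 module instead of SQL Server or Oracle; the query itself uses only standard joins and a correlated subquery:

```python
import sqlite3


def special_with_previous():
    """Return each special row paired with its previous non-special row, if any."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- hypothetical sample rows: (id, bravo_id, special, created)
        CREATE TABLE alpha (id INTEGER PRIMARY KEY, bravo_id INTEGER,
                            special TEXT, created TEXT);
        INSERT INTO alpha VALUES
            (1, 100, NULL, '2020-01-01'),
            (2, 100, NULL, '2020-01-05'),
            (3, 100, 'X',  '2020-01-10'),
            (4, 100, NULL, '2020-01-20'),
            (5, 200, 'Y',  '2020-02-01'),
            (6, 100, 'Z',  '2020-01-25');
    """)
    rows = conn.execute("""
        SELECT a.id, a.bravo_id, a.created, b.id, b.created
        FROM alpha a
        LEFT JOIN alpha b
               ON a.bravo_id = b.bravo_id
              AND b.created < a.created
              AND b.special IS NULL
        WHERE a.special IS NOT NULL
          -- keep only the newest older row, or the NULL row when none exists
          AND (b.created IS NULL
               OR b.created = (SELECT MAX(s.created)
                               FROM alpha s
                               WHERE s.special IS NULL
                                 AND s.bravo_id = a.bravo_id
                                 AND s.created < a.created))
        ORDER BY a.id
    """).fetchall()
    conn.close()
    return rows


for row in special_with_previous():
    print(row)
```

Row 5 has no older non-special row for its bravo_id, so it comes back with NULLs (None in Python) in the last two columns rather than being dropped. If two non-special rows share the same maximum created value, the special row would appear once for each of them.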
Unfortunately, SQL Server 2008 doesn't support cumulative sum. Here is an approach to solving the problem.
For each row in Alpha, count the number of special rows at or after that row. This will assign a grouping. Within each group, use row_number() to enumerate the rows, and choose the first two.
select a.*
from (select a.*,
row_number() over (partition by bravo, grp order by id desc) as seqnum
from (select a.*,
(select count(*)
from alpha a2
where a2.bravo = a.bravo and a2.special = 1 and
a2.id >= a.id
) as grp
from alpha a
) a
) a
where seqnum <= 2;
In Oracle (or SQL Server 2012), you would write this as:
select a.*
from (select a.*,
row_number() over (partition by bravo, grp order by id desc) as seqnum
from (select a.*,
sum(case when special = 1 then 1 else 0 end) over (partition by bravo order by id desc
) as grp
from alpha a
) a
) a
where seqnum <= 2;

Select all but last row in Oracle SQL

I want to pull all rows except the last one in Oracle SQL
My database is like this
Prikey - Auto_increment
common - varchar
miles - int
So I want to sum all rows except the last row ordered by primary key grouped by common. That means for each distinct common, the miles will be summed (except for the last one)
Note: the question was changed after this answer was posted. The first two queries work for the original question. The last query (in the addendum) works for the updated question.
This should do the trick, though it will be a bit slow for larger tables:
SELECT prikey, authnum FROM myTable
WHERE prikey <> (SELECT MAX(prikey) FROM myTable)
ORDER BY prikey
This query is longer, but for a large table it should be faster. I'll leave it to you to decide:
SELECT * FROM (
SELECT
prikey,
authnum,
ROW_NUMBER() OVER (ORDER BY prikey DESC) AS RowRank
FROM myTable)
WHERE RowRank <> 1
ORDER BY prikey
Addendum: There was an update to the question; here's the updated answer.
SELECT
common,
SUM(miles)
FROM (
SELECT
common,
miles,
ROW_NUMBER() OVER (PARTITION BY common ORDER BY prikey DESC) AS RowRank
FROM myTable
)
WHERE RowRank <> 1
GROUP BY common
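A quick way to see what the addendum query returns, sketched with made-up rows and run on SQLite through Python's sqlite3 module rather than Oracle (SQLite 3.25 or later is assumed for ROW_NUMBER):

```python
import sqlite3


def miles_except_last():
    """Sum miles per common value, skipping the row with the highest prikey."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- hypothetical sample rows: (prikey, common, miles)
        CREATE TABLE my_table (prikey INTEGER PRIMARY KEY, common TEXT, miles INTEGER);
        INSERT INTO my_table VALUES
            (1, 'x', 10),
            (2, 'x', 20),
            (3, 'y', 5),
            (4, 'x', 30),
            (5, 'y', 7),
            (6, 'z', 99);
    """)
    rows = conn.execute("""
        SELECT common, SUM(miles)
        FROM (
            SELECT common, miles,
                   -- the last row of each group (highest prikey) gets rank 1
                   ROW_NUMBER() OVER (PARTITION BY common
                                      ORDER BY prikey DESC) AS row_rank
            FROM my_table
        ) ranked
        WHERE row_rank <> 1
        GROUP BY common
        ORDER BY common
    """).fetchall()
    conn.close()
    return rows


print(miles_except_last())
```

Note that a group with only one row, such as z here, disappears from the result entirely, because its only row is also its last row.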
Looks like I am a little too late but here is my contribution, similar to Ed Gibbs' first solution but instead of calculating the max id for each value in the table and then comparing I get it once using an inline view.
SELECT d1.prikey,
d1.authnum
FROM myTable d1,
(SELECT MAX(prikey) prikey FROM myTable) d2
WHERE d1.prikey != d2.prikey
At least I think this is more efficient if you want to go without the use of Analytics.
Query to retrieve all the records in the table except the first row and the last row:
select * from table_name
where primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column asc
)
and
primary_id_column not in
(
select top 1 primary_id_column from table_name order by primary_id_column desc
)