Using SQL to get the last item before n - sql

I am not quite sure how to ask this so I will start off with an example. Let's say I have a table in my database that looks like this:
id | time | event | pnumber
---------------------------
1 | 1200 | foo | 23
2 | 1130 | bar | 52
3 | 1045 | bat | 13
...
n | 0 | baz | 7
Now say I wanted to get the last known pnumber after a certain time. For example, at time = 1135 it would have to go back and find the last known time in the table (1130) and then return that pnumber. So for t = 1130, it would return pnumber = 52, but as soon as t = 1045 it would return pnumber = 13. (Time counts down in this context, from 1200 to 0.)
Here's what I have so far.
SELECT pnumber FROM table WHERE time = (SELECT time FROM table WHERE time <= '1135' ORDER BY time LIMIT 1)
Is there an easier way to do this, without using multiple statements? I am using sqlite3.

Sure. You can condense that query by doing:
SELECT pnumber FROM table WHERE time >= 1135 ORDER BY time DESC LIMIT 1;
There is no need to nest the select to get a specific time first; this should work.
EDIT: Got the inequality sign mixed around -- if you're looking for the first record AFTER a specific time, you'll want time >= 1135 and order by time descending with a limit of one.

Why do you need the second query? Could you do something like this:
SELECT TOP 1 pnumber FROM table WHERE time >= '1135' ORDER BY TIME DESC

I'm a bit confused. You are asking that 1135 would return the value for 1130, yet you are using greater than or equal to instead of less than. If your example is what you are looking for, try this.
SELECT PNUMBER FROM TABLE WHERE TIME<=1135 ORDER BY TIME DESC LIMIT 1
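A minimal sketch of the last query above, run against the sample rows from the question with Python's built-in sqlite3 module. The table name `events` and the helper `last_known_pnumber` are made up for illustration (`table` itself is a reserved word in SQLite), and `time` is assumed to be stored as an integer. It follows the "largest time at or below t" reading of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, time INTEGER, event TEXT, pnumber INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO events (id, time, event, pnumber) VALUES (?, ?, ?, ?)",
    [(1, 1200, "foo", 23), (2, 1130, "bar", 52), (3, 1045, "bat", 13)],
)


def last_known_pnumber(conn, t):
    """Return the pnumber of the row with the largest time <= t, or None."""
    row = conn.execute(
        "SELECT pnumber FROM events WHERE time <= ? ORDER BY time DESC LIMIT 1",
        (t,),
    ).fetchone()
    return row[0] if row else None


print(last_known_pnumber(conn, 1135))  # 52, taken from the row with time = 1130
```

Ordering descending and taking one row picks the closest time at or below t, so no nested select is needed.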

SQL Having/Where clause to compare MAX from current/another table

I have a table that has date information and is being copied to another table and trying to perform an incremental load.
date = date format
hour = int
| person | date | hour |
|--------|------------|------|
| bob | 2023-01-01 | 1 |
| bill | 2023-01-02 | 2 |
select * into test.person_copy from
(select * from original.person)
My thought process for the incremental load is to compare max(date) and max(hour) from the original table against the copied table, to identify the gap between the max values of the two tables. However, I'm not entirely sure how to implement the logic, as it doesn't seem straightforward with a WHERE clause. A HAVING clause might make more sense, but that doesn't seem correct either?
select * into test.person_copy from
(select * from original.person org
Having max(org.date, org.hour) > (select max(copy.date,copy.hour) from test.person_copy copy)
)
The other variation I had in mind was to use HAVING NOT IN
Having max(org.date, org.hour) NOT IN (select max(copy.date,copy.hour) from test.person_copy copy)
I wasn't sure if the logic is correct. The hour field is important, but I can live with just the date field.
The expected behaviour is that the logic checks the existing max(date) and only inserts rows that are not there yet. In the example below, that is the 2023-01-03 row:
| person | date | hour |
|--------|------------|------|
| bob | 2023-01-01 | 1 |
| bill | 2023-01-02 | 2 |
| test | 2023-01-03 | 2 |
I don't have access to a Redshift environment, but the following query should work:
insert into test.person_copy
select *
from original.person org
where dateadd(hrs, org.hour, org.date) >
(select max(dateadd(hrs, cpy.hour, cpy.date))
from test.person_copy cpy
)
This assumes that when the previous hour's copy was made, the entire set of source rows for that date and hour was copied (so the new incremental load picks up all rows for the dates and hours not already copied). This means you need an additional criterion in the select to make sure you include only completed date-hours (i.e. make sure you don't include the rows with hour = 10 while the time is still 10:30).
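The same idea can be tried locally. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 rather than Redshift, drops the schema prefixes, and replaces dateadd with a sortable "YYYY-MM-DD HH" text key, since SQLite has no dateadd. The helper name `incremental_load` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person TEXT, date TEXT, hour INTEGER);
CREATE TABLE person_copy (person TEXT, date TEXT, hour INTEGER);
INSERT INTO person VALUES ('bob', '2023-01-01', 1), ('bill', '2023-01-02', 2);
""")

# Copy only rows whose date/hour key is greater than the newest key already
# copied. COALESCE makes the very first load (empty copy table) copy everything.
LOAD_SQL = """
INSERT INTO person_copy
SELECT org.person, org.date, org.hour
FROM person AS org
WHERE org.date || ' ' || printf('%02d', org.hour) >
      (SELECT COALESCE(MAX(cpy.date || ' ' || printf('%02d', cpy.hour)), '')
       FROM person_copy AS cpy)
"""


def incremental_load(conn):
    """Run one incremental load and return the number of rows inserted."""
    before = conn.total_changes
    conn.execute(LOAD_SQL)
    return conn.total_changes - before


print(incremental_load(conn))  # first run copies both sample rows
```

Running it again without new source rows inserts nothing, and a later row such as ('test', '2023-01-03', 2) would be picked up by the next run.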

How do I get all entries 5 minutes after a certain condition in SQL?

I am trying to solve the following problem using SQL:
I have a table (example shown below) with action items per user, the timestamp when the action happened and a unique identifier for each entry.
I want to find out what actions each user takes in the 5 minutes after a specific action occurs. For example, I want to see for all users with the action item "sit" what happens in the 5 minutes after that, so to see all entries starting with the "sit" action item.
I hope someone can help!!
Thank you!
table example
I started using ROW_NUMBER, partitioning by user and ordering by time, but after that I don't know how to continue.
Your question is not entirely clear; however, as I understand it, it is easier to use a JOIN:
create table log(UserName varchar(20),ActionTime datetime,ActionItem varchar(10),ActionId varchar(26));
insert into log values
('Anna' ,cast('2022-07-30 13:17:22' as datetime),'walk' ,'uid_1')
,('Peter' ,cast('2022-07-30 15:39:46' as datetime),'drive' ,'uid_2')
,('Sarah' ,cast('2022-07-30 09:07:53' as datetime),'stand' ,'uid_3')
,('Kurt' ,cast('2022-07-30 00:56:14' as datetime),'sit' ,'uid_4')
,('Deborah' ,cast('2022-07-30 15:26:02' as datetime),'lie' ,'uid_5')
,('Michelle',cast('2022-07-30 15:26:03' as datetime),'scratch','uid_6')
,('Sven' ,cast('2022-07-30 15:26:04' as datetime),'run' ,'uid_7')
,('Sarah' ,cast('2022-07-30 15:28:06' as datetime),'swim' ,'uid_8')
,('Peter' ,cast('2022-07-30 13:17:22' as datetime),'look' ,'uid_9')
;
select a.ActionId,a.UserName,a.ActionItem,a.ActionTime
,b.ActionTime,b.UserName,b.ActionItem,b.ActionId
from log a left join log b
on b.ActionId<>a.ActionId
and b.ActionTime>=a.ActionTime
and datediff(mi,a.ActionTime,b.ActionTime)<5
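A self-join along these lines can also be narrowed to the case asked about: only "sit" rows on the left side, and only later rows of the same user on the right. The sketch below uses Python's built-in sqlite3 instead of SQL Server, so it compares against an exact five-minute upper bound via datetime(..., '+5 minutes') rather than counting minute boundaries with datediff. All rows except Kurt's "sit" entry are hypothetical, added so that the window has something to match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (UserName TEXT, ActionTime TEXT, ActionItem TEXT, ActionId TEXT)"
)
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", [
    ("Kurt", "2022-07-30 00:56:14", "sit", "uid_4"),
    ("Kurt", "2022-07-30 00:58:00", "walk", "uid_10"),    # hypothetical: inside the window
    ("Kurt", "2022-07-30 01:10:00", "run", "uid_11"),     # hypothetical: too late
    ("Sarah", "2022-07-30 00:57:00", "stand", "uid_12"),  # hypothetical: different user
])

# a = the triggering "sit" rows, b = what the same user did in the next 5 minutes.
FOLLOW_UPS = """
SELECT a.UserName, a.ActionTime AS sit_time, b.ActionItem, b.ActionTime
FROM log AS a
JOIN log AS b
  ON  b.UserName = a.UserName
  AND b.ActionId <> a.ActionId
  AND b.ActionTime >= a.ActionTime
  AND b.ActionTime <= datetime(a.ActionTime, '+5 minutes')
WHERE a.ActionItem = 'sit'
ORDER BY a.UserName, b.ActionTime
"""

rows = conn.execute(FOLLOW_UPS).fetchall()
for row in rows:
    print(row)
```

The text comparison works here because the timestamps are stored in the sortable "YYYY-MM-DD HH:MM:SS" form. Dropping the `b.ActionId <> a.ActionId` condition would also return the "sit" row itself.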
I guess this problem cannot be solved with a single query, but you can use a series of queries.
In this answer I will use the MySQL dialect of SQL; I believe it doesn't matter.
As a first step, let's assume that we are only interested in the last "sit" action. In this case we can run this query:
SELECT * FROM user_actions WHERE ACTION_ITEM = "sit" ORDER BY TIMESTAMP DESC LIMIT 1;
So the result is
+------+---------------------+-------------+-------------------+
| USER | TIMESTAMP | ACTION_ITEM | UNIQUE_IDENTIFIER |
+------+---------------------+-------------+-------------------+
| Kurt | 2022-07-30 00:56:14 | sit | 4 |
+------+---------------------+-------------+-------------------+
Then save the timestamp value in a user variable:
SELECT TIMESTAMP INTO @reason_ts FROM user_actions WHERE ACTION_ITEM = "sit" ORDER BY TIMESTAMP DESC LIMIT 1;
Now we need to get the further actions in the next 5 minutes (I actually took 12 hours, because 5 minutes is not enough for your example). Let's do this:
SELECT csq.* FROM user_actions AS csq WHERE TIMESTAMP BETWEEN @reason_ts AND ADDTIME(@reason_ts, '12:00:00');
The result is:
+-------+---------------------+-------------+-------------------+
| USER | TIMESTAMP | ACTION_ITEM | UNIQUE_IDENTIFIER |
+-------+---------------------+-------------+-------------------+
| Sarah | 2022-07-30 09:07:53 | stand | 3 |
| Kurt | 2022-07-30 00:56:14 | sit | 4 |
+-------+---------------------+-------------+-------------------+
If you need all further actions, modify the query:
SELECT csq.* FROM user_actions AS csq WHERE TIMESTAMP >= @reason_ts;
If you need more than just the last "sit" action, it will be more difficult. I think you would need to write some kind of script or SQL function, but it is still doable.

Why does adding a SUM(column) throw a group by error [SQL]

I found some similar questions, but none of the solutions would work, nor did they explain what was causing the issue.
I have a working query
SELECT pages.pageString pageName, timeSpent
FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent
FROM `pageViews`
WHERE `time_spent` > 0
GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id
ORDER BY timeSpent DESC
LIMIT 5
This returns results that look like
+------------------------------+-----------+
| pageName | timeSpent |
+------------------------------+-----------+
| page 1 | 394292 |
| page 2 | 66990 |
| page 3 | 53896 |
| page 4 | 37796 |
| page 5 | 14982 |
+------------------------------+-----------+
I'd like to add a column containing the percentage of timeSpent relative to the other pages. To start, I added a SUM(timeSpent) to my query, but that throws an error:
In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'pages.pageString'
I'm not sure why this column is affected by adding the new column to the select statement.
Sadly any solution involving changing sql settings won't work due to company policy.
I appreciate any advice
UPDATE
The failing sql statement is
SELECT pages.pageString pageName, timeSpent FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent FROM
`pageViews` WHERE `time_spent` > 0 GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id ORDER BY timeSpent DESC LIMIT 5
As per the first answer I added a groupBy which solves the error
SELECT pages.pageString pageName, timeSpent, SUM(timeSpent) FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent FROM `pageViews` WHERE `time_spent` > 0 GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id GROUP BY pageName ORDER BY timeSpent DESC LIMIT 5
This however does not give the proper output
+------------------------------+-----------+----------------+
| pageName | timeSpent | SUM(timeSpent) |
+------------------------------+-----------+----------------+
| page 1. | 390210 | 390210 |
| page 2 | 66972 | 66972 |
| page3 | 52332 | 52332 |
| page4 | 25454 | 25454 |
| page5 | 13552 | 13552 |
+------------------------------+-----------+----------------+
Ideally this SUM(timeSpent) would be 390210+ 66972 + 52332 + 25454 + 13552 so that I may do timeSpent / SUM(timeSpent)
You did not say where you tried to put the sum(timeSpent) but I believe one can try to reconstruct with the error message:
In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'pages.pageString'
It says what the problem is. You added sum(timeSpent) to the projection, but the SQL statement does not have a GROUP BY. In particular, it mentions the first select item that is neither aggregated nor grouped, pages.pageString.
It would mention the other ones too, once you fix this one.
On the other hand, please make sure you post exactly the failing SQL statement instead of trying to describe how to get the error you have. It's better for us who try to help.
Update:
You have two tables/views pages and pageViews. The first one is used to get the page name. I would just focus on the time calculation to make things easier. Figuring out the name afterwards is simple, because it is directly connected to the page_id.
1. The first piece of information you want is the sum of all times spent, so that you can calculate the ratio to this sum. This is simply an aggregation where you sum the times over all pages.
2. The second piece of information you want is the sum of the times per page_id. You already know how to do that: you group by the page_id while aggregating the sums of each.
3. Try to put those two together now. You have the first statement, whose result shall be applied to each row of the second statement, so that you get a table of the form page_id, time_spent_page, time_spent_all.
4. When you have step 3, it is easy to add the page_name, since you have the page_id, which is all that is required for a simple join.
I tried not to give away the solution. Maybe you would like to try again following the steps above. If you have difficulties, simply leave a comment (maybe showing how far you got).
It might look complex in the beginning, but once you have done that successfully I hope you'll see that it can be simple.
Adding a column containing the percentage of timeSpent relative to the sum of all pages, using a window function (available in MySQL 8.0 and later):
SELECT pages.pageString pageName, timeSpent
, timeSpent / sum(timeSpent) over() * 100 p
FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent
FROM `pageViews`
WHERE `time_spent` > 0
GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id
ORDER BY timeSpent DESC
LIMIT 5
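Where window functions are not available, the same percentage can be obtained by joining the per-page sums with a one-row total. This is a sketch under assumptions, run with Python's built-in sqlite3 rather than MySQL; the sample rows and the aliases `per_page`, `total` and `timeSpentAll` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, pageString TEXT);
CREATE TABLE pageViews (page_id INTEGER, time_spent INTEGER);
INSERT INTO pages VALUES (1, 'page 1'), (2, 'page 2');
INSERT INTO pageViews VALUES (1, 50), (1, 25), (2, 25), (2, 0);
""")

# per_page: one row per page with its summed time.
# total:    a single row holding the sum over all pages.
# The cross join attaches that single total to every per-page row.
PERCENT_SQL = """
SELECT pages.pageString AS pageName,
       per_page.timeSpent,
       per_page.timeSpent * 100.0 / total.timeSpentAll AS pct
FROM (SELECT page_id, SUM(time_spent) AS timeSpent
      FROM pageViews
      WHERE time_spent > 0
      GROUP BY page_id) AS per_page
CROSS JOIN (SELECT SUM(time_spent) AS timeSpentAll
            FROM pageViews
            WHERE time_spent > 0) AS total
JOIN pages ON pages.id = per_page.page_id
ORDER BY per_page.timeSpent DESC
LIMIT 5
"""

rows = conn.execute(PERCENT_SQL).fetchall()
for row in rows:
    print(row)
```

Because the total is computed in its own subquery, the LIMIT only trims the rows shown; the percentages are still relative to all pages.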

Case statement logic and substring

Say I have the following data:
Passes
ID | Pass_code
-----------------
100 | 2xBronze
101 | 1xGold
102 | 1xSilver
103 | 2xSteel
Passengers
ID | Passengers
-----------------
100 | 2
101 | 5
102 | 1
103 | 3
I want to count the passes and then create a ticket output like this:
ID 100 | 2 pass (bronze)
ID 101 | 5 pass (because it is gold, we count all passengers)
ID 102 | 1 pass (silver)
ID 103 | 2 pass (steel)
I was thinking of something like the code below; however, I am unsure how to finish my CASE statement. I want to substring pass_code so that we get the number of passes, e.g. '2xBronze' should give me 2. Then for ID 103 we have 2 passes and 3 customers, so we should output 2.
Also, is there a way to first find '2xbronze' if the pass_code contained lots of other things, such as '101001, 1xbronze, FirstClass'? This may change, so I don't want to substring by position. Could we search for '2xbronze' and then pull out the 2?
SELECT
CASE
WHEN Passes.pass_code like '%gold%' THEN Passengers.passengers
WHEN Passes.pass_code like '%steel%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%bronze%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%silver%' THEN SUBSTRING(passes.pass_code, 1,1)
else 0 end as no,
Passes.ID,
Passes.Pass_code,
Passengers.Passengers
FROM Passes
JOIN Passengers ON Passes.ID = Passengers.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=db698e8562546ae7658270e0ec26ca54
So, assuming you are indeed using Oracle (as your DB fiddle implies):
You can do some string magic by finding the position of a splitter character (in your case the x) and then substringing based on that. Obviously this has its problems, and x is a bad separator character as well, but it works for your current set.
WITH PASSCODESPLIT AS
(
SELECT PASSES.ID,
TO_Number(SUBSTR(PASSES.PASS_CODE, 0, (INSTR(PASSES.PASS_CODE, 'x')) - 1)) AS NrOfPasses,
SUBSTR(PASSES.PASS_CODE, (INSTR(PASSES.PASS_CODE, 'x')) + 1) AS PassType
FROM Passes
)
SELECT
PASSCODESPLIT.ID,
CASE
WHEN PASSCODESPLIT.PassType = 'gold' THEN Passengers.Passengers
ELSE PASSCODESPLIT.NrOfPasses
END AS NrOfPasses,
PASSCODESPLIT.PassType,
Passengers.Passengers
FROM PASSCODESPLIT
INNER JOIN Passengers ON PASSCODESPLIT.ID = Passengers.ID
ORDER BY PASSCODESPLIT.ID ASC
Gives the result of:
ID | NROFPASSES | PASSTYPE | PASSENGERS
----------------------------------------
100 | 2 | bronze | 2
101 | 5 | gold | 5
102 | 1 | silver | 1
103 | 2 | steel | 3
As can also be seen in this fiddle
But I would strongly advise you to fix your table design. Having multiple attributes in the same column leads to troubles like these. And the more variables/variations you start storing, the more 'magic' you need to keep doing.
In this particular example I see no reason why you don't simply have three columns in Passes, which would also give you the opportunity to add new columns going forward, e.g. to keep track of first class.
You can extract the numbers using regexp_substr(). So I think this does what you want:
SELECT (CASE WHEN p.pass_code LIKE '%gold%'
THEN pp.passengers
ELSE TO_NUMBER(REGEXP_SUBSTR(p.pass_code, '^[0-9]+'))
END) as num,
p.ID, p.Pass_code, pp.Passengers
FROM Passes p JOIN
Passengers pp
ON p.ID = pp.ID;
Here is a db<>fiddle.
This converts the leading digits in the code to a number. Also note the use of table aliases to simplify the query.
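The intended rule (gold counts every passenger, any other pass uses the number in front of the x) can be checked against the sample data from the question. The sketch below is an assumption-laden stand-in: it runs on Python's built-in sqlite3 rather than Oracle, so it uses instr/substr and CAST in place of REGEXP_SUBSTR and TO_NUMBER, and it relies on SQLite's LIKE being case-insensitive for ASCII to match '1xGold'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Passes (ID INTEGER PRIMARY KEY, Pass_code TEXT);
CREATE TABLE Passengers (ID INTEGER PRIMARY KEY, Passengers INTEGER);
INSERT INTO Passes VALUES
    (100, '2xBronze'), (101, '1xGold'), (102, '1xSilver'), (103, '2xSteel');
INSERT INTO Passengers VALUES (100, 2), (101, 5), (102, 1), (103, 3);
""")

# Gold: one ticket per passenger. Otherwise: the digits before the first 'x'.
TICKET_SQL = """
SELECT p.ID,
       CASE
           WHEN p.Pass_code LIKE '%gold%' THEN pp.Passengers
           ELSE CAST(substr(p.Pass_code, 1, instr(p.Pass_code, 'x') - 1) AS INTEGER)
       END AS tickets
FROM Passes AS p
JOIN Passengers AS pp ON pp.ID = p.ID
ORDER BY p.ID
"""

tickets = conn.execute(TICKET_SQL).fetchall()
print(tickets)  # [(100, 2), (101, 5), (102, 1), (103, 2)]
```

This only covers codes of the plain form '2xBronze'; a code embedded in a longer string would need a pattern search instead of a fixed starting position.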

SQL: SUM of MAX values WHERE date1 <= date2 returns "wrong" results

Hi stackoverflow users
I'm having a bit of a problem trying to combine SUM, MAX and WHERE in one query and after an intense Google search (my search engine skills usually don't fail me) you are my last hope to understand and fix the following issue.
My goal is to count people in a certain period of time and because a person can visit more than once in said period, I'm using MAX. Due to the fact that I'm defining people as male (m) or female (f) using a string (for statistic purposes), CHAR_LENGTH returns the numbers I'm in need of.
SELECT SUM(max_pers) AS "People"
FROM (
SELECT "guests"."id", MAX(CHAR_LENGTH("guests"."gender")) AS "max_pers"
FROM "guests"
GROUP BY "guests"."id")
So far, so good. But now, as stated before, I'd like to only count the guests which visited in a certain time interval (for statistic purposes as well).
SELECT "statistic"."id", SUM(max_pers) AS "People"
FROM (
SELECT "guests"."id", MAX(CHAR_LENGTH("guests"."gender")) AS "max_pers"
FROM "guests"
GROUP BY "guests"."id"),
"statistic", "guests"
WHERE ( "guests"."arrival" <= "statistic"."from" AND "guests"."departure" >= "statistic"."to")
GROUP BY "statistic"."id"
This query returns the following, x = desired result:
x * (x+1)
So if the result should be 3, it's 12. If it should be 5, it's 30 etc.
I could probably solve this algebraically, but I'd rather understand what I'm doing wrong and learn from it.
Thanks in advance and I'm certainly going to answer all further questions.
PS: I'm using LibreOffice Base.
EDIT: An example
guests table:
ID | arrival | departure | gender |
10 | 1.1.14 | 10.1.14 | mf |
10 | 15.1.14 | 17.1.14 | m |
11 | 5.1.14 | 6.1.14 | m |
12 | 10.2.14 | 24.2.14 | f |
13 | 27.2.14 | 28.2.14 | mmmmmf |
statistic table:
ID | from | to | name |
1 | 1.1.14 | 31.1.14 |January | expected result: 3
2 | 1.2.14 | 28.2.14 |February| expected result: 7
MAX(...) is the wrong function: You want COUNT(DISTINCT ...).
Add proper join syntax, simplify (and remove unnecessary quotes) and this should work:
SELECT s.id, COUNT(DISTINCT g.id) AS People
FROM statistic s
LEFT JOIN guests g ON g.arrival <= s."from" AND g.departure >= s."to"
GROUP BY s.id
Note: Using a LEFT join means you'll get a result of zero for statistic ids that have no guests. If you would rather get no row at all, remove the LEFT keyword.
You have a very strange data structure. In any case, I think you want:
SELECT t.stat_id, sum(t.numpersons) AS People
FROM (select s.id as stat_id, g.id, max(char_length(g.gender)) as numpersons
from guests g join
statistic s
on g.arrival <= s."from" AND g.departure >= s."to"
group by s.id, g.id
) t
GROUP BY t.stat_id;
Thanks for all your inputs. I wasn't familiar with JOIN but it was necessary to solve my problem.
Since my database is designed in German, I made quite a big mistake while translating it, and I'm sorry if this caused confusion.
Selecting guests.id and later grouping by guests.id wouldn't make any sense, since the id is unique. What I actually wanted to do is select and group by guests.adr_id, which links a visiting guest to an address database.
The correct solution to my problem is the following code:
SELECT statname, SUM(numpers) FROM (
SELECT statistic.name AS statname, guests.adr_id, MAX( CHAR_LENGTH( guests.gender ) ) AS numpers
FROM guests
JOIN statistic ON (guests.arrival <= statistic."to" AND guests.departure >= statistic."from" )
GROUP BY guests.adr_id, statistic.name )
GROUP BY statname
I also noted that my database structure is a mess, but I created it while learning by doing and haven't found the time to rewrite it yet. Next time I post, I'll try to do better.
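The shape of this final query can be checked against the example tables from the question. The sketch below makes several assumptions: it runs on Python's built-in sqlite3 rather than LibreOffice Base, so length() stands in for CHAR_LENGTH; the dates are rewritten in ISO form so that they compare correctly as text; and the sample table's `id` column is used where the real database has `adr_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE guests (id INTEGER, arrival TEXT, departure TEXT, gender TEXT);
CREATE TABLE statistic (id INTEGER, "from" TEXT, "to" TEXT, name TEXT);
INSERT INTO guests VALUES
    (10, '2014-01-01', '2014-01-10', 'mf'),
    (10, '2014-01-15', '2014-01-17', 'm'),
    (11, '2014-01-05', '2014-01-06', 'm'),
    (12, '2014-02-10', '2014-02-24', 'f'),
    (13, '2014-02-27', '2014-02-28', 'mmmmmf');
INSERT INTO statistic VALUES
    (1, '2014-01-01', '2014-01-31', 'January'),
    (2, '2014-02-01', '2014-02-28', 'February');
""")

# Inner query: largest party size per guest id within each period
# (a stay counts if it overlaps the period).
# Outer query: add those per-guest maxima up for each period.
PEOPLE_SQL = """
SELECT statname, SUM(numpers) AS people
FROM (SELECT s.name AS statname, g.id, MAX(length(g.gender)) AS numpers
      FROM guests AS g
      JOIN statistic AS s
        ON g.arrival <= s."to" AND g.departure >= s."from"
      GROUP BY s.id, s.name, g.id) AS per_guest
GROUP BY statname
"""

people = dict(conn.execute(PEOPLE_SQL).fetchall())
print(people["January"], people["February"])  # 3 7
```

Grouping the inner query by both period and guest is what keeps each guest's maximum from being repeated once per matching row, which is where the inflated totals in the original attempt came from.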