Create a sequential number column (index) in a table that already has data - SQL

I wanted to do something like this post, so I tried:
SELECT
ROW_NUMBER() OVER(ORDER BY t.[Data Saida] ) AS id,
t.codigo, t.[Data Saida], t.Entidade, t.DataEnt,
t.[GTEntT Nº], t.Estado, t.[GTSaida Nº], t.[Observações1],
t.Eequisitante, t.Certificado, T.Resultado, T.Seleccionar, t.[Tipo de Intervenção]
FROM
[Movimento ferramentas] t;
However I ended up getting something like
Syntax error (missing operator) in query expression ROW_NUMBER() OVER(ORDER BY t.[Data Saida] )
Is it because ROW_NUMBER() OVER() is SQL Server only or am I doing something wrong?
I'm working with MS Access 2010.
Here's a row from that table:

To add an AutoNumber field to an existing table, open the table in Design View, type in the Field Name, and select "AutoNumber" from the drop-down list for the Data Type.
Access will populate the new field with AutoNumber values for any existing records in the table.
Edit re: influencing the order in which AutoNumber values are applied to existing records
As with many other database operations, there is essentially no guarantee that Access will use any particular order when assigning the AutoNumber values to existing records. However, if we look at a couple of examples we can see how Access will likely do it.
Consider the following sample table named [Events]. The rows were entered in random order and there is no primary key:
EventDate Event
---------- --------------
2012-06-01 it's June
2012-10-01 it's October
2012-09-01 it's September
2012-12-01 it's December
2012-11-01 it's November
2012-07-01 it's July
2012-04-01 it's April
2012-08-01 it's August
2012-02-01 it's February
2012-01-01 it's January
2012-03-01 it's March
2012-05-01 it's May
Now we'll simply add an AutoNumber field named [ID] using the procedure above. After that has been done,
SELECT * FROM Events ORDER BY ID
returns
EventDate Event ID
---------- -------------- --
2012-06-01 it's June 1
2012-10-01 it's October 2
2012-09-01 it's September 3
2012-12-01 it's December 4
2012-11-01 it's November 5
2012-07-01 it's July 6
2012-04-01 it's April 7
2012-08-01 it's August 8
2012-02-01 it's February 9
2012-01-01 it's January 10
2012-03-01 it's March 11
2012-05-01 it's May 12
Now let's revert to the old copy of the table and see if the existence of a primary key makes a difference. We'll make [EventDate] the primary key, save the changes to the table, and then add the [ID] AutoNumber field. After that is done, the select statement above gives us
EventDate Event ID
---------- -------------- --
2012-06-01 it's June 1
2012-10-01 it's October 2
2012-09-01 it's September 3
2012-12-01 it's December 4
2012-11-01 it's November 5
2012-07-01 it's July 6
2012-04-01 it's April 7
2012-08-01 it's August 8
2012-02-01 it's February 9
2012-01-01 it's January 10
2012-03-01 it's March 11
2012-05-01 it's May 12
Hmmm, same thing. So it looks like the AutoNumber values get assigned to the table in natural order (the order in which the records were added to the table) even if there is a primary key.
Okay, if that's the case, then let's use a make-table query to create a new copy of the table in a different order:
SELECT Events.EventDate, Events.Event
INTO Events2
FROM Events
ORDER BY Events.EventDate;
Now let's add the [ID] AutoNumber field to that new table and see what we get:
SELECT * FROM Events2 ORDER BY ID
returns
EventDate Event ID
---------- -------------- --
2012-01-01 it's January 1
2012-02-01 it's February 2
2012-03-01 it's March 3
2012-04-01 it's April 4
2012-05-01 it's May 5
2012-06-01 it's June 6
2012-07-01 it's July 7
2012-08-01 it's August 8
2012-09-01 it's September 9
2012-10-01 it's October 10
2012-11-01 it's November 11
2012-12-01 it's December 12
If that is the order we want then we can just delete the [Events] table and rename [Events2] to [Events].
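If all you need is a sequential number in the query output (as in the question) rather than a stored column, a common workaround in engines without ROW_NUMBER() is a correlated COUNT(*) subquery. The sketch below is not the AutoNumber approach described above. It runs against SQLite through Python's sqlite3 module purely so it can be executed as-is. The shortened [Events] data is a stand-in, and I'm assuming the same subquery pattern carries over to Access with its own naming syntax.

```python
import sqlite3

# Stand-in for the [Events] table; SQLite is used only because it ships
# with Python, not because it behaves like Access.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (EventDate TEXT, Event TEXT)")
conn.executemany(
    "INSERT INTO Events VALUES (?, ?)",
    [
        ("2012-06-01", "it's June"),
        ("2012-10-01", "it's October"),
        ("2012-01-01", "it's January"),
        ("2012-03-01", "it's March"),
    ],
)

# Number each row by counting the rows whose EventDate is not later than
# its own. This only yields 1, 2, 3, ... when EventDate values are unique.
numbered = conn.execute(
    """
    SELECT (SELECT COUNT(*)
            FROM Events AS e2
            WHERE e2.EventDate <= e1.EventDate) AS id,
           e1.EventDate,
           e1.Event
    FROM Events AS e1
    ORDER BY e1.EventDate
    """
).fetchall()

for row in numbered:
    print(row)
```

The subquery is re-evaluated for every row, so this tends to be slow on large tables, and ties in the ordering column produce repeated numbers.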

Related

How do you get the last entry for each month in SQL?

I am looking to filter very large tables to the latest entry per user per month. I'm not sure if I found the best way to do this. I know I "should" trust the SQL engine (Snowflake), but there is a part of me that does not like the join on three columns.
Note that this is a very common operation on many big tables, and I want to use it in dbt views, which means it will get run all the time.
To illustrate, my data is of this form:
mytable
userId loginDate  year month value
------ ---------- ---- ----- -----
1      2021-01-04 2021 1     41.1
1      2021-01-06 2021 1     411.1
1      2021-01-25 2021 1     251.1
2      2021-01-05 2021 1     4369
2      2021-02-06 2021 2     32
2      2021-02-14 2021 2     731
3      2021-01-20 2021 1     258
3      2021-02-19 2021 2     4251
3      2021-03-15 2021 3     171
And I'm trying to use SQL to get the last value (by loginDate) for each month.
I'm currently doing a GROUP BY and a join as follows:
WITH latest_entry_by_month AS (
SELECT "userId", "year", "month", max("loginDate") AS "loginDate"
FROM mytable
GROUP BY "userId", "year", "month"
)
SELECT * FROM mytable NATURAL JOIN latest_entry_by_month
The above results in my desired output:
userId loginDate  year month value
------ ---------- ---- ----- -----
1      2021-01-25 2021 1     251.1
2      2021-01-05 2021 1     4369
2      2021-02-14 2021 2     731
3      2021-01-20 2021 1     258
3      2021-02-19 2021 2     4251
3      2021-03-15 2021 3     171
But I'm not sure if it's optimal.
Any guidance on how to do this faster? Note that I am not materializing the underlying data, so it is effectively un-clustered (I'm getting it from a vendor via the Snowflake marketplace).
Use QUALIFY with a window function (ROW_NUMBER). The column names are quoted here to match the case-sensitive identifiers in the question:
SELECT *
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY "userId", "year", "month"
ORDER BY "loginDate" DESC) = 1
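For anyone who wants to check the logic without a Snowflake account, here is a rough equivalent that runs on SQLite through Python's sqlite3 module. SQLite has no QUALIFY, so the sketch computes ROW_NUMBER() in a subquery and filters on it in the outer query, which should be the same idea expressed the long way. The table is rebuilt from the sample data in the question. The `rn` alias and the final ORDER BY are my additions, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mytable ("userId" INTEGER, "loginDate" TEXT, '
    '"year" INTEGER, "month" INTEGER, "value" REAL)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2021-01-04", 2021, 1, 41.1),
        (1, "2021-01-06", 2021, 1, 411.1),
        (1, "2021-01-25", 2021, 1, 251.1),
        (2, "2021-01-05", 2021, 1, 4369),
        (2, "2021-02-06", 2021, 2, 32),
        (2, "2021-02-14", 2021, 2, 731),
        (3, "2021-01-20", 2021, 1, 258),
        (3, "2021-02-19", 2021, 2, 4251),
        (3, "2021-03-15", 2021, 3, 171),
    ],
)

# Rank the rows of each (user, year, month) group from newest to oldest,
# then keep only the newest row of each group (rn = 1).
latest = conn.execute(
    """
    SELECT "userId", "loginDate", "year", "month", "value"
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY "userId", "year", "month"
                   ORDER BY "loginDate" DESC
               ) AS rn
        FROM mytable
    )
    WHERE rn = 1
    ORDER BY "userId", "loginDate"
    """
).fetchall()

for row in latest:
    print(row)
```

Unlike the join on max("loginDate"), ROW_NUMBER() returns exactly one row per group even when two rows share the same latest loginDate. Which of the tied rows is kept is not defined unless you add a tie-breaker to the ORDER BY.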

Aggregate multiple invoice numbers and invoice amount rows into one row

I have the following:
budget_id invoice_number April June August
--------- -------------- ----- ---- ------
004       11             NULL  690  NULL
004       12             1820  NULL NULL
004       13             NULL  NULL 890
What I want is the following:
budget_id invoice_number April June August
--------- -------------- ----- ---- ------
004       11, 12, 13     1820  690  890
However, when I try to do the following:
SELECT budget_id,
STRING_AGG(invoice_number, ',') AS invoice_number,
April,
June,
August
FROM invoice_table
GROUP BY budget_id,
April,
June,
August
Nothing happens. The table stays exactly the same. The code above works if I comment out the months: it aggregates the invoice numbers without the months. But once I include the months, I still get 3 separate rows. I need the invoice amounts to be included with the months. Is it possible to get the invoice numbers aggregated as well as the invoice amounts in one row? I'm using BigQuery, if that helps.
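Grouping by the month columns keeps the rows apart, because each row has a different combination of month values. Group by budget_id only and aggregate the month columns instead; SUM ignores NULL values. (If invoice_number is not a STRING, cast it first, since STRING_AGG in BigQuery expects STRING or BYTES.)
SELECT budget_id,
STRING_AGG(invoice_number, ', ') invoice_number,
SUM(April) April,
SUM(June) June,
SUM(August) August
FROM invoice_table
GROUP BY 1;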
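To see the aggregation above work on the sample rows, here is a small sketch that runs on SQLite through Python's sqlite3 module. It is not BigQuery: SQLite's GROUP_CONCAT stands in for STRING_AGG, and the column types (TEXT for the ids, INTEGER for the amounts) are assumptions on my part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_table (budget_id TEXT, invoice_number TEXT, "
    "April INTEGER, June INTEGER, August INTEGER)"
)
conn.executemany(
    "INSERT INTO invoice_table VALUES (?, ?, ?, ?, ?)",
    [
        ("004", "11", None, 690, None),
        ("004", "12", 1820, None, None),
        ("004", "13", None, None, 890),
    ],
)

# Group by budget_id only. SUM skips NULLs, so each month column collapses
# to its single non-NULL amount; GROUP_CONCAT joins the invoice numbers.
row = conn.execute(
    """
    SELECT budget_id,
           GROUP_CONCAT(invoice_number, ', ') AS invoice_number,
           SUM(April)  AS April,
           SUM(June)   AS June,
           SUM(August) AS August
    FROM invoice_table
    GROUP BY budget_id
    """
).fetchone()

print(row)
```

If a budget_id can have more than one invoice in the same month, SUM adds the amounts together, which may or may not be what you want.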

T-SQL checking if a date in 1 table is between 2 dates in another table

I have a main table CleanData and a reference table TIME_TABLE_VERSION ttv. In CleanData there is no primary key but each row has a date column called CALENDAR_DATE1.
I want to return all rows from CleanData where CleanData.CALENDAR_DATE1 is between ttv.ACTIVATION_DATE and ttv.DEACTIVATION_DATE.
What's tricky is the timing of when ttv table gets updated. If you look below you'll see that the last record in ttv has a deactivation date of July 16, 2022. However, this table will get updated in the future by truncating the previous row deactivation date and a new row gets inserted. So for example the next time ttv gets updated will be around June 30th and there will be a new row for TIME_TABLE_VERSION_ID = 191 with an activation date of July 3, 2022; the deactivation date for TIME_TABLE_VERSION_ID = 190 will get truncated from July 16, 2022 to July 2, 2022 upon update. Note that this ttv table update will happen in advance, when CleanData.CALENDAR_DATE1 is still less than the ttv.ACTIVATION_DATE in TIME_TABLE_VERSION_ID = 191. If I simply select MAX TIME_TABLE_VERSION_ID then there will be a few days of missing data returned from CleanData between June 30th and July 3rd.
I'm trying to write a query that will factor in when CleanData.CALENDAR_DATE1 is less than the most recent ttv.ACTIVATION_DATE, and give all the rows in CleanData with a CALENDAR_DATE1 that falls within TIME_TABLE_VERSION_ID - 1, until CleanData.CALENDAR_DATE1 is >= the ttv.ACTIVATION_DATE in TIME_TABLE_VERSION_ID + 0 (the most recent).
Any ideas how to fix my query?
SELECT
CleanData.*
FROM
TIME_TABLE_VERSION AS ttv
INNER JOIN
CleanData ON CleanData.CALENDAR_DATE1 BETWEEN ttv.ACTIVATION_DATE AND ttv.DEACTIVATION_DATE
AND (CASE
WHEN (CleanData.CALENDAR_DATE1 < (SELECT ttv1.ACTIVATION_DATE FROM TIME_TABLE_VERSION ttv1 WHERE ttv.TIME_TABLE_VERSION_ID = ttv1.TIME_TABLE_VERSION_ID))
THEN (ttv.TIME_TABLE_VERSION_ID = (SELECT MAX (ttv1.TIME_TABLE_VERSION_ID)-1 FROM TIME_TABLE_VERSION ttv1))
ELSE (ttv.TIME_TABLE_VERSION_ID = (SELECT MAX(ttv1.TIME_TABLE_VERSION_ID) FROM TIME_TABLE_VERSION ttv1)) END)
ORDER BY
CleanData.CALENDAR_DATE1
Here's what table ttv looks like:
TIME_TABLE_VERSION_ID TIME_TABLE_VERSION_NAME ACTIVATION_DATE         DEACTIVATION_DATE
--------------------- ----------------------- ----------------------- -----------------------
184                   Feb22_01                2022-02-06 00:00:00.000 2022-02-26 23:59:59.000
185                   Feb22_02                2022-02-27 00:00:00.000 2022-03-19 23:59:59.000
186                   Feb22_03                2022-03-20 00:00:00.000 2022-04-09 23:59:59.000
187                   Feb22_04                2022-04-10 00:00:00.000 2022-04-23 23:59:59.000
188                   Apr22_01                2022-04-24 00:00:00.000 2022-05-14 23:59:59.000
189                   Apr22_02                2022-05-15 00:00:00.000 2022-05-28 23:59:59.000
190                   Apr22_03                2022-05-29 00:00:00.000 2022-07-16 23:59:59.000
Note there is no TIME_TABLE_VERSION_ID or TIME_TABLE_VERSION_NAME in CleanData to join to. I only have the CALENDAR_DATE1 in CleanData and the ACTIVATION_DATE and DEACTIVATION_DATE in ttv.
Note also that I have no ability to change the structure of either table, I have to work with what's there in both.
Thanks so much for any help you can offer!

How to manage complex sorting in SQL?

I need to have date sorting with the partial dates. I have a table with the following columns.
Day  Month Year
---- ----- ----
NULL 03    1990
26   10    1856
03   07    NULL
31   NULL  2018
NULL NULL  NULL
I have a grid in which one of the columns is Date, where I combine the above three columns and display the result.
Now I want to sort on this date column in the grid. The sort order of the dates should be as follows:
[blank date]
22 [day]
March
April 12
May
July 29
August
September
September 14
October
1948
October 1948
October 1 1948
July 1976
1977
July 1977
July 23 1977
December 1981
December 29 1981
I have tried various ways to achieve this, but I have not been able to get the desired result. Here is what I have tried:
I tried a stored procedure that combines the three columns into a complete date, converts it to a standard date format, and compares the values. I also tried creating a computed property in the model and sorting on it.
How can I do this in SQL?
I think you could do:
order by coalesce(year, '0000'), coalesce(month, '00'), coalesce(day, '')
You can be more explicit, but this puts the NULL values before the other values in each column.
Note: The default values assume the three columns are stored as zero-padded strings. If they are numeric, you might need to tweak the code for your database, for example by using 0 as the default.
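Here is a quick way to try the ORDER BY expression above on the sample rows from the question. It runs on SQLite through Python's sqlite3 module. The table name `partial_dates` is made up, and I'm assuming the three columns are zero-padded text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; day/month/year are stored as zero-padded strings.
conn.execute("CREATE TABLE partial_dates (day TEXT, month TEXT, year TEXT)")
conn.executemany(
    "INSERT INTO partial_dates VALUES (?, ?, ?)",
    [
        (None, "03", "1990"),
        ("26", "10", "1856"),
        ("03", "07", None),
        ("31", None, "2018"),
        (None, None, None),
    ],
)

# Sort by year, then month, then day. COALESCE replaces each missing part
# with a value that sorts before any real one, so partial dates come first.
ordered = conn.execute(
    """
    SELECT day, month, year
    FROM partial_dates
    ORDER BY COALESCE(year, '0000'),
             COALESCE(month, '00'),
             COALESCE(day, '')
    """
).fetchall()

for row in ordered:
    print(row)
```

On this data the fully blank row comes first, then the row without a year, then 1856, 1990 and 2018. A year-only or month-and-year row sorts ahead of the more complete dates in the same year or month, which matches the order asked for in the question.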

Using SQL Server 2012 LAG

I am trying to write a query using the SQL Server 2012 LAG function to retrieve data from my [Order] table where the datetime difference between a row and the previous row is less than or equal to 2 minutes.
The result I'm expecting is
1234 April, 28 2012 09:00:00
1234 April, 28 2012 09:01:00
1234 April, 28 2012 09:03:00
5678 April, 28 2012 09:40:00
5678 April, 28 2012 09:42:00
5678 April, 28 2012 09:44:00
but I'm seeing
1234 April, 28 2012 09:00:00
1234 April, 28 2012 09:01:00
1234 April, 28 2012 09:03:00
5678 April, 28 2012 09:40:00
5678 April, 28 2012 09:42:00
5678 April, 28 2012 09:44:00
91011 April, 28 2012 10:00:00
The last row should not be returned. Here is what I have tried: SQL Fiddle
Anyone have any ideas?
Okay first of all I added a row to show you where someone else's answer doesn't work but they deleted it now.
Now for the logic in my query. You said you want each row that is within two minutes of another row. That means you have to look not only backwards, but also forwards with LEAD(). In your query, you returned when previous time was NULL so it simply returned the first value of each OrderNumber regardless if it was right or wrong. By chance, the first values of each of your OrderNumbers needed to be returned until you get to the last OrderNumber where it broke. My query corrects that and should work for all your data.
CREATE TABLE [Order]
(
OrderNumber VARCHAR(20) NOT NULL
, OrderDateTime DATETIME NOT NULL
);
INSERT [Order] (OrderNumber, OrderDateTime)
VALUES
('1234', '2012-04-28 09:00:00'),
('1234', '2012-04-28 09:01:00'),
('1234', '2012-04-28 09:03:00'),
('5678', '2012-04-28 09:40:00'),
('5678', '2012-04-28 09:42:00'),
('5678', '2012-04-28 09:44:00'),
('91011', '2012-04-28 10:00:00'),
('91011', '2012-04-28 10:25:00'),
('91011', '2012-04-28 10:27:00');
with Ordered as (
select
OrderNumber,
OrderDateTime,
LAG(OrderDateTime,1) over (
partition by OrderNumber
order by OrderDateTime
) as prev_time,
LEAD(OrderDateTime,1) over (
partition by OrderNumber
order by OrderDateTime
) as next_time
from [Order]
)
SELECT OrderNumber,
OrderDateTime
FROM Ordered
WHERE DATEDIFF(MINUTE,OrderDateTime,next_time) <= 2 --this says if the next value is less than or equal to two minutes away return it
OR DATEDIFF(MINUTE,prev_time,OrderDateTime) <= 2 --this says if the prev value is less than or equal to 2 minutes away return it
Results (remember, I added a row):
OrderNumber OrderDateTime
-------------------- -----------------------
1234 2012-04-28 09:00:00.000
1234 2012-04-28 09:01:00.000
1234 2012-04-28 09:03:00.000
5678 2012-04-28 09:40:00.000
5678 2012-04-28 09:42:00.000
5678 2012-04-28 09:44:00.000
91011 2012-04-28 10:25:00.000
91011 2012-04-28 10:27:00.000
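One thing to keep in mind: DATEDIFF(MINUTE, ...) in SQL Server counts the minute boundaries crossed, not elapsed time, so 09:00:01 to 09:02:59 counts as 2 even though nearly three minutes have passed. If that matters for your data, you can compare elapsed seconds instead. Below is a rough sketch of the same LAG/LEAD idea using a 120-second threshold. It runs on SQLite (3.25 or later) through Python's sqlite3 module rather than SQL Server. The table is named `orders` here because ORDER is a reserved word, and `ts`, `prev_ts` and `next_ts` are names I made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderNumber TEXT NOT NULL, OrderDateTime TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("1234", "2012-04-28 09:00:00"),
        ("1234", "2012-04-28 09:01:00"),
        ("1234", "2012-04-28 09:03:00"),
        ("5678", "2012-04-28 09:40:00"),
        ("5678", "2012-04-28 09:42:00"),
        ("5678", "2012-04-28 09:44:00"),
        ("91011", "2012-04-28 10:00:00"),
        ("91011", "2012-04-28 10:25:00"),
        ("91011", "2012-04-28 10:27:00"),
    ],
)

# Convert each timestamp to Unix seconds, look one row back and one row
# forward within the same OrderNumber, and keep rows that have a neighbour
# at most 120 seconds away. A NULL neighbour never satisfies the comparison.
close_rows = conn.execute(
    """
    WITH Ordered AS (
        SELECT OrderNumber,
               OrderDateTime,
               CAST(strftime('%s', OrderDateTime) AS INTEGER) AS ts,
               LAG(CAST(strftime('%s', OrderDateTime) AS INTEGER)) OVER (
                   PARTITION BY OrderNumber ORDER BY OrderDateTime
               ) AS prev_ts,
               LEAD(CAST(strftime('%s', OrderDateTime) AS INTEGER)) OVER (
                   PARTITION BY OrderNumber ORDER BY OrderDateTime
               ) AS next_ts
        FROM orders
    )
    SELECT OrderNumber, OrderDateTime
    FROM Ordered
    WHERE next_ts - ts <= 120
       OR ts - prev_ts <= 120
    ORDER BY OrderNumber, OrderDateTime
    """
).fetchall()

for row in close_rows:
    print(row)
```

On the sample data this returns the same eight rows as the query above, because every timestamp falls exactly on a minute. In SQL Server itself, the analogous change would be DATEDIFF(SECOND, ...) <= 120.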