Keep a table created from another table updated - sql

In CrateDB, after creating a table from the data of another table, is it possible to keep the new table updated as new rows are inserted into the original table?
Query to create the new_table:
CREATE TABLE "schema"."new_table" AS
SELECT
state,
time,
time - LAG(time, -1, time) OVER (ORDER BY time DESC) AS duration
FROM "schema"."original_table"
ORDER BY time DESC;
Query I run periodically to keep new_table updated, and which I would like to avoid using:
INSERT INTO "schema"."new_table"
SELECT
process,
time,
time - LAG(time, -1, time) OVER (ORDER BY time DESC) AS duration
FROM "mtopcua_car"."original_table" newDataTable
WHERE NOT EXISTS (SELECT time FROM "schema"."new_table" WHERE time = newDataTable.time);
Thanks.

Depending on how expensive the query is, a view might just do the job:
CREATE VIEW "schema"."new_view" AS
SELECT
state,
time,
time - LAG(time, -1, time) OVER (ORDER BY time DESC) AS duration
FROM "schema"."original_table"
ORDER BY time DESC;
CrateDB documentation: https://crate.io/docs/crate/reference/en/5.1/general/ddl/views.html
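A minimal sketch of why a view stays current without a periodic INSERT. It uses Python's built-in sqlite3 module as a stand-in for CrateDB, which is an assumption made only so the example runs anywhere. The table contents are made up, times are plain integers, and because SQLite requires a non-negative LAG offset, the sketch uses LEAD(time, 1, time), which I assume reads the same neighbouring row as the negative-offset LAG in the view above.

```python
import sqlite3

# In-memory SQLite database standing in for CrateDB (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (state TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?)",
    [("a", 10), ("b", 25), ("c", 45)],  # made-up sample rows
)

# The view stores only the query, so it is re-evaluated on every read.
conn.execute(
    """
    CREATE VIEW new_view AS
    SELECT state,
           time,
           time - LEAD(time, 1, time) OVER (ORDER BY time DESC) AS duration
    FROM original_table
    """
)


def read_view():
    """Return the current contents of the view, newest row first."""
    return conn.execute(
        "SELECT state, time, duration FROM new_view ORDER BY time DESC"
    ).fetchall()


before = read_view()
conn.execute("INSERT INTO original_table VALUES ('d', 60)")
after = read_view()  # the new row shows up without touching the view

print(before)
print(after)
```

The trade-off is that the window function is recomputed on each read, which is why the cost of the query matters.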

Related

SQL specific column update time in system-versioned temporal table

Is there any easy way to get information about the last update date of a selected column using system-versioned temporal table?
I have a table with columns A, B, C. Each of them is updated randomly and separately, and I am interested in whether it is possible to easily extract the date of the last update of column B.
I added a photo for the sake of simplicity. I need to find out when the value in column A last changed (in the photo I marked the last change in this column).
Use the lag() window function to look for changes in B, summarize that set to find max(StartTime), and use that in a Where filter to select your latest record.
Select * From history.table
Where StartTime=(Select max(StartTime) from
( Select *,
B<>lag(B) Over (Order By StartTime) as B_Changed
From history.table
)
Where B_Changed
)
I was able to find a simple solution that covers every case. Here it is:
SELECT TOP (1) * FROM (
SELECT
ID,
A,
LAG(A) OVER(PARTITION BY ID ORDER BY StartTime) AS PreviousA,
UserID,
StartTime
FROM dbo.Table FOR SYSTEM_TIME ALL
)t
WHERE t.A <> t.PreviousA
ORDER BY t.StartTime desc
The query returns the last modification of the column. If the table was never modified, or only another column was modified, it correctly returns an empty result, indicating that there were no changes. Maybe someone will need it in the future. Thank you for your help.
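A small sketch of the same LAG-based check, run against Python's built-in sqlite3 module. The plain `history` table here is a hypothetical stand-in for `dbo.Table FOR SYSTEM_TIME ALL`, since SQLite has no system-versioned tables, and `LIMIT 1` replaces `TOP (1)`. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the temporal table's full history.
conn.execute(
    "CREATE TABLE history (ID INTEGER, A INTEGER, B TEXT, StartTime INTEGER)"
)

LAST_CHANGE_OF_A = """
SELECT ID, A, PreviousA, StartTime FROM (
    SELECT ID,
           A,
           LAG(A) OVER (PARTITION BY ID ORDER BY StartTime) AS PreviousA,
           StartTime
    FROM history
) AS t
WHERE t.A <> t.PreviousA      -- NULL for the first version, so it is skipped
ORDER BY t.StartTime DESC
LIMIT 1
"""


def last_change_of_a():
    """Return the latest row version in which A differs from its predecessor."""
    return conn.execute(LAST_CHANGE_OF_A).fetchone()


# Only B changes between these two versions, so A has no recorded change.
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [(1, 1, "x", 1), (1, 1, "y", 2)],
)
no_change = last_change_of_a()

# A changes at StartTime 3; the later version at 4 only changes B again.
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [(1, 2, "y", 3), (1, 2, "z", 4)],
)
latest = last_change_of_a()

print(no_change, latest)
```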

Creating a partitioned table from query in Big Query does not yield same as without partitioning

When I create a table, let's say "orders", with partitioning in the following way, my result gets truncated compared to creating it without partitioning (that is, commenting and uncommenting lines 5 and 6 of the script below).
I suspect that it might have something to do with the BQ limits (found here) but I can't figure out what. The ts is a timestamp field and order_id is a UUID string.
That is, the COUNT(DISTINCT order_id) on the last line yields very different results: when partitioned, it returns far fewer order_ids than without partitioning.
DROP TABLE IF EXISTS
`project.dataset.orders`;
CREATE OR REPLACE TABLE
`project.dataset.orders`
-- PARTITION BY
-- DATE(ts)
AS
SELECT
ts,
order_id,
SUM(order_value) AS order_value
FROM
`project.dataset.raw_orders`
GROUP BY
1, 2;
SELECT COUNT(DISTINCT order_id) FROM `project.dataset.orders`;
(This is not a valid 'answer', I just need a better place to write SQL than the comment box, I don't mind if moderator convert this answer into a comment AFTER it serves its purpose)
What is the number you'd get if you do query below, and which one does it align with (partitioned or non-partitioned)?
SELECT COUNT(DISTINCT order_id) FROM (
SELECT
ts,
order_id,
SUM(order_value) AS order_value
FROM
`project.dataset.raw_orders`
GROUP BY
1, 2
) t;
It turns out that there's a 60 day partition expiration!
https://cloud.google.com/bigquery/docs/managing-partitioned-tables#partition-expiration
So by updating the partition expiration I could get the full range.

Hive query results to new table

I have a very simple query below, which counts the number of transactions that happen each hour on our platform.
The numbers are in the billions so the query takes some time.
As such, I'd like to be able to run the query hourly, appending the results to another table - so we can have less latency & less load on the cluster.
I have access to Hue to do this, and I am using Hive. Is one of the statements below the correct way to do this?
INSERT INTO table udsuser.healthcheck
SELECT dt, hour, count(*) as transactions, 'dpi_datasum' as feed, 'FULL' as environment
FROM dpi_datasum
WHERE hour=hour(from_unixtime(unix_timestamp()))-2
Group by dt, hour
or
INSERT OVERWRITE table udsuser.healthcheck
SELECT dt, hour, count(*) as transactions, 'dpi_datasum' as feed, 'FULL' as environment
FROM dpi_datasum
WHERE hour=hour(from_unixtime(unix_timestamp()))-2
Group by dt, hour

SQL Eliminate Duplicates with NO ID

I have a table with the following Columns...
Node, Date_Time, Market, Price
I would like to delete all but one record for each combination of Node and Date_Time.
SELECT Node, Date_Time, MAX(Price)
FROM Hourly_Data
Group BY Node, Date_Time
That gets the results I would like to see, but I can't figure out how to remove the other records.
Note - There is no ID for this table
Here are steps that are more of a workaround than a simple one-command solution, but they will work in any relational database:
1. Create a new table that looks just like the one you already have.
2. Insert the data computed by your group-by query into the newly created table.
3. Drop the old table.
4. Rename the new table to the name the old one used to have.
Just remember that locking takes place and you need to have some maintenance time to perform this action.
There are simpler ways to achieve this, but they are DBMS specific.
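The four steps above can be sketched as follows, using Python's built-in sqlite3 module so it runs anywhere. The sample rows and the `Hourly_Data_new` name are made up. One assumption to note: carrying `Market` along next to `MAX(Price)` relies on SQLite's documented behaviour of taking bare columns from the row that holds the maximum; other databases would need a join or a window function for that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Hourly_Data (Node TEXT, Date_Time TEXT, Market TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Hourly_Data VALUES (?, ?, ?, ?)",
    [  # made-up sample rows with duplicates per (Node, Date_Time)
        ("N1", "2020-01-01 00:00", "DA", 10.0),
        ("N1", "2020-01-01 00:00", "RT", 12.5),
        ("N1", "2020-01-01 01:00", "DA", 11.0),
        ("N2", "2020-01-01 00:00", "DA", 9.0),
        ("N2", "2020-01-01 00:00", "DA", 9.0),
    ],
)

# Step 1: create a new table with the same structure (hypothetical name).
conn.execute(
    "CREATE TABLE Hourly_Data_new (Node TEXT, Date_Time TEXT, Market TEXT, Price REAL)"
)
# Step 2: fill it from the group-by query (Market comes from the max-price row,
# which is SQLite-specific behaviour).
conn.execute(
    """
    INSERT INTO Hourly_Data_new (Node, Date_Time, Market, Price)
    SELECT Node, Date_Time, Market, MAX(Price)
    FROM Hourly_Data
    GROUP BY Node, Date_Time
    """
)
# Step 3: drop the old table.
conn.execute("DROP TABLE Hourly_Data")
# Step 4: give the new table the old name.
conn.execute("ALTER TABLE Hourly_Data_new RENAME TO Hourly_Data")
conn.commit()

deduplicated = conn.execute(
    "SELECT Node, Date_Time, Market, Price FROM Hourly_Data ORDER BY Node, Date_Time"
).fetchall()
print(deduplicated)
```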
Here is an easy SQL Server method that creates a row number within a CTE and deletes from it. I believe this method also works for most RDBMS that support window functions and Common Table Expressions.
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY Node, Date_Time ORDER BY Price DESC)
FROM
Hourly_Data
)
DELETE
FROM
cte
WHERE
RowNum > 1
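For a database that cannot delete through a CTE, here is a sketch of the same ROW_NUMBER idea in SQLite, run from Python's built-in sqlite3 module. Note the swapped-in technique: instead of deleting from the CTE, it deletes by SQLite's implicit `rowid`, which is specific to SQLite. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Hourly_Data (Node TEXT, Date_Time TEXT, Market TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO Hourly_Data VALUES (?, ?, ?, ?)",
    [  # made-up sample rows with duplicates per (Node, Date_Time)
        ("N1", "2020-01-01 00:00", "DA", 10.0),
        ("N1", "2020-01-01 00:00", "RT", 12.5),
        ("N1", "2020-01-01 01:00", "DA", 11.0),
        ("N2", "2020-01-01 00:00", "DA", 9.0),
        ("N2", "2020-01-01 00:00", "DA", 9.0),
    ],
)

# Number the rows inside each (Node, Date_Time) group, highest price first,
# then delete everything except row number 1 via SQLite's implicit rowid.
cursor = conn.execute(
    """
    DELETE FROM Hourly_Data
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY Node, Date_Time ORDER BY Price DESC
                   ) AS RowNum
            FROM Hourly_Data
        )
        WHERE RowNum > 1
    )
    """
)
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT Node, Date_Time, Market, Price FROM Hourly_Data ORDER BY Node, Date_Time"
).fetchall()
print(deleted, remaining)
```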

Query to get the duration and details from a table

I have a scenario and not quite sure how to query it. As a sample, I have following table structure and want to get the history of the action for bus:
ID-----TIME---------BUSID----OPID----MOVING----STOPPED----PARKED----COUNT
1------10:10:10-----101------1101-----1---------0----------0---------15
2------10:10:11-----102------1102-----0---------1----------0---------5
3------10:11:10-----101------1101-----1---------0----------0---------15
4------10:12:10-----101------1101-----0---------1----------0---------15
5------10:13:10-----101------1101-----1---------0----------0---------19
6------10:14:10-----101------1101-----1---------0----------0---------19
7------10:15:10-----101------1101-----0---------1----------0---------19
8------10:16:10-----101------1101-----0---------0----------1---------0
9------10:17:10-----101------1101-----0---------0----------1---------0
I want to write a query to get the status of a bus like:
BUSID----OPID----STATUS-----TIME---------DURATION---COUNT
101------1101----MOVING-----10:10:10-----2-----------15
101------1101----STOPPED----10:12:10-----1-----------15
101------1101----MOVING-----10:13:10-----2-----------19
101------1101----STOPPED----10:15:10-----1-----------19
101------1101----PARKED-----10:16:10-----2-----------0
I am using SQL Server 2008.
Thanks for your help.
You can use Common Table Expressions to calculate the duration between the different rows.
WITH cte_log AS
(
SELECT
-- number the rows from newest to oldest
ROW_NUMBER() OVER (ORDER BY time DESC) AS row_id,
time, busid, opid, moving, stopped, parked, count
FROM
log_table
WHERE
busid = 101
)
SELECT
current_rows.busid,
current_rows.opid,
current_rows.time,
-- seconds elapsed since the previous (earlier) row
DATEDIFF(second, previous_rows.time, current_rows.time) AS duration,
current_rows.count
FROM
cte_log AS current_rows
LEFT OUTER JOIN
cte_log AS previous_rows ON ((current_rows.row_id + 1) = previous_rows.row_id)
WHERE
current_rows.busid = 101
ORDER BY
current_rows.time DESC;
The WITH statement creates a temporary result set that is defined within the execution scope of this query. We are using it to fetch the previous record of each row and to calculate the time difference between the current and the previous record.
This example was not tested, and it may not work perfectly, but I hope it gets you going in the correct direction. Feel free to leave feedback.
You may also want to check the following external links on how to use Common Table Expressions:
SQL Select Next Row and SQL Select Previous Row with Current Row using T-SQL CTE
Calculate Difference between current and previous rows... CTE and Row_Number() rocks!
4 Guys From Rolla: Common Table Expressions (CTE) in SQL Server 2005
MSDN: Using Common Table Expressions
Personally, I would denormalize the data so that you have start_time and end_time in one row. This will make the query much more efficient.
I don't have access to SQL Server at the moment, so there may be syntax errors in the following:
SELECT
BUSID,
OPID,
CASE WHEN MOVING = 1 THEN 'MOVING' WHEN STOPPED = 1 THEN 'STOPPED' ELSE 'PARKED' END AS STATUS,
TIME,
COUNT
FROM BUS_DATA_TABLE
ORDER BY BUSID, TIME
You'll note that this does not include duration. Until you order your data, you don't know which is the previous entry. Once the data is ordered you can calculate the duration as the difference between the times in consecutive records. You could do this by SELECTing into a new table and then running a second query.
Ordering by BUSID first should give you the report for all buses.
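One way to get STATUS, start time and a per-status duration in a single pass is the row-number-difference ("gaps and islands") technique. This is a swapped-in technique, not something from the answers here, sketched with Python's built-in sqlite3 module and the sample rows from the question. Assumptions: the table is called `bus_log`, the COUNT column is renamed `cnt`, DURATION means the number of consecutive rows with the same status (which matches the expected output in the question), and `MAX(cnt)` is used as the value reported for each run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; cnt stands for the COUNT column in the question.
conn.execute(
    """CREATE TABLE bus_log (id INTEGER, time TEXT, busid INTEGER, opid INTEGER,
                             moving INTEGER, stopped INTEGER, parked INTEGER,
                             cnt INTEGER)"""
)
conn.executemany(
    "INSERT INTO bus_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [  # sample rows from the question
        (1, "10:10:10", 101, 1101, 1, 0, 0, 15),
        (2, "10:10:11", 102, 1102, 0, 1, 0, 5),
        (3, "10:11:10", 101, 1101, 1, 0, 0, 15),
        (4, "10:12:10", 101, 1101, 0, 1, 0, 15),
        (5, "10:13:10", 101, 1101, 1, 0, 0, 19),
        (6, "10:14:10", 101, 1101, 1, 0, 0, 19),
        (7, "10:15:10", 101, 1101, 0, 1, 0, 19),
        (8, "10:16:10", 101, 1101, 0, 0, 1, 0),
        (9, "10:17:10", 101, 1101, 0, 0, 1, 0),
    ],
)

STATUS_RUNS = """
WITH labelled AS (
    SELECT busid, opid, time, cnt,
           CASE WHEN moving = 1 THEN 'MOVING'
                WHEN stopped = 1 THEN 'STOPPED'
                ELSE 'PARKED' END AS status
    FROM bus_log
),
numbered AS (
    -- The difference of the two row numbers is constant within each
    -- unbroken run of the same status for a bus.
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY busid ORDER BY time)
         - ROW_NUMBER() OVER (PARTITION BY busid, status ORDER BY time) AS island
    FROM labelled
)
SELECT busid, opid, status,
       MIN(time) AS start_time,
       COUNT(*)  AS duration,   -- number of consecutive rows in the run
       MAX(cnt)  AS cnt
FROM numbered
GROUP BY busid, opid, status, island
ORDER BY busid, start_time
"""

status_runs = conn.execute(STATUS_RUNS).fetchall()
for row in status_runs:
    print(row)
```

ROW_NUMBER with PARTITION BY is available in SQL Server 2008, so the same idea should carry over, though the T-SQL version is untested here.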
Making certain assumptions about column type, etc:
SELECT
BUSID,
OPID,
STATUS,
TIME,
DURATION,
COUNT
FROM
TABLENAME
WHERE
BUSID = 101
ORDER BY
TIME
;