Select into with max() - sql

I have a basic query I use to determine the max value of a column in a table:
select A.revenue_code_id, max(A.revenue_code_version) from rev_code_lookup A
group by A.revenue_code_id
This results in ~580 rows (the entire table has over 2400 rows).
This works just fine for my query results, but what I don't know is how to insert the 580 rows into a new table based on the max value. I realize this isn't the right code, but what I am thinking of would look something like this:
select * into new_table from rev_code_lookup where max(revenue_code_version)

You can use the row_number() function to get the data you want. Combine with the other answer to insert the results into a table (I've made up a couple of extra columns as an example):
Select
    x.revenue_code_id,
    x.revenue_code_version,
    x.update_timestamp,
    x.updated_by
From (
    Select
        revenue_code_id,
        revenue_code_version,
        update_timestamp,
        updated_by,
        row_number() over (partition by revenue_code_id Order By revenue_code_version Desc) as rn
    From
        revenue_code_lookup
) x
Where
    x.rn = 1
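To materialize this result in one step, SQL Server's SELECT ... INTO (which the question's own attempt uses) can wrap the same query. A minimal sketch, reusing the made-up extra columns from above:
select x.revenue_code_id, x.revenue_code_version, x.update_timestamp, x.updated_by
into new_table
from (
    select revenue_code_id, revenue_code_version, update_timestamp, updated_by,
           row_number() over (partition by revenue_code_id order by revenue_code_version desc) as rn
    from revenue_code_lookup
) x
where x.rn = 1;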

Inserting into another table always works the same way, no matter how complex your select is:
insert into table
[unbelievablycomplicatedselecthere]
So in your case:
insert into new_table
select A.revenue_code_id, max(A.revenue_code_version) from rev_code_lookup A
group by A.revenue_code_id
Similarly, if you need to create a brand-new table, you can do it in one step:
CREATE TABLE new_table
AS
select A.revenue_code_id, max(A.revenue_code_version) from rev_code_lookup A
group by A.revenue_code_id
This creates the table with the corresponding schema and populates it with the query results in the same statement.
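Note that CREATE TABLE ... AS SELECT is not T-SQL syntax; if you are on SQL Server (as the question's select * into suggests), the one-step equivalent is SELECT ... INTO. A minimal sketch using the question's tables:
select A.revenue_code_id, max(A.revenue_code_version) as revenue_code_version
into new_table
from rev_code_lookup A
group by A.revenue_code_id;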

Related

Handling duplicates in BigQuery (Nested Table)

I think this is a very simple question, but I would like some guidance: I don't want to have to drop the table and create a new one just to keep the deduplicated records. Can I deduplicate in place, for example with DELETE FROM based on the query below, in BigQuery? PS: This is a nested table!
SELECT
  *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id, date_register) row_number
  FROM
    dataset.table)
WHERE
  row_number = 1
ORDER BY id, date_register
To de-duplicate in place, without re-creating the table - use MERGE:
MERGE `temp.many_random` t
USING (
SELECT DISTINCT *
FROM `temp.many_random`
)
ON FALSE
WHEN NOT MATCHED BY SOURCE THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
It's simpler than the current accepted answer, as it won't ask you to match the current partitioning or clustering - it will just respect it.
Update: please also check Felipe Hoffa's answer which is simpler, and learn more on this post: BigQuery Deduplication.
You need to exclude row_number from output and overwrite your table using CREATE OR REPLACE TABLE:
CREATE OR REPLACE TABLE your_table
PARTITION BY DATE(date_register)
AS
SELECT
  * EXCEPT(row_number)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id, date_register) row_number
  FROM your_table)
WHERE
  row_number = 1
If you don't have a partition field defined at the source, I recommend that you create a new table with a partition field so that this query works and you can automate the process.
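A hedged sketch of that one-time migration in BigQuery DDL (the new table name is made up, and DATE(date_register) assumes date_register is a TIMESTAMP or DATETIME):
CREATE TABLE dataset.table_partitioned
PARTITION BY DATE(date_register)
AS
SELECT * FROM dataset.table;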

SQL call Max row number from a temp table

In the temp table there are only two columns available. I would like to get the most recent ID for each load, as shown in the picture below.
I have tried this but it doesn't give me the answer I need.
select max(rn_plus_1), a.load, a.id
from (
    select a.load,
           a.id,
           ROW_NUMBER() over (order by a.id desc) rn
    from max_num a
    group by load, id
) a
TEMP_TABLE lacks a sequential primary key or any other indicator for order of insertion. So it is not possible to get the latest ID for a LOAD using the columns of the table itself.
However, there is one option: ORA_ROWSCN. This is a pseudo-column which exposes the System Change Number (SCN) of the transaction which last changed each row. So we can reconstruct the order of insertion by sorting the table on ORA_ROWSCN.
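For illustration, ORA_ROWSCN can be selected like any other column; a quick sketch against the TEMP_TABLE from the question:
select t.load, t.id, ora_rowscn
from temp_table t
order by ora_rowscn;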
There are some caveats:
By default the SCN applies to the block level. Consequently all the rows in a block have the same SCN. It's a good enough approximation for wide tables but hopeless for a two-column toy like TEMP_TABLE. We can track SCN at the row level but only if the table is created with ROWDEPENDENCIES. The default is NOROWDEPENDENCIES. Unfortunately, we cannot use ALTER TABLE here. You will need to drop and recreate the table (*) to enable ROWDEPENDENCIES.
The SCN applies to the transaction. This means the solution will only work if each row in TEMP_TABLE is inserted in a separate transaction.
Obviously this is only possible if TEMP_TABLE is an actual table and not a view or some other construct.
Given all these criteria are satisfied, here is a query which will give you the result set you want:
select load, id
from ( select load
, id
, row_number() over (partition by load order by ora_rowscn desc) as rn
from temp_table
)
where rn = 1
There is a demo on db<>fiddle. Also, the same demo except TEMP_TABLE defined with NOROWDEPENDENCIES, which produces the wrong result.
(*) If you need to keep the data in TEMP_TABLE the steps are:
rename TEMP_TABLE to whatever;
create table TEMP_TABLE rowdependencies as select * from whatever;
drop table whatever;
However, the SCN will be the same for the existing rows. If that matters you'll have to insert each row one at a time, in the order you wish to preserve, and commit after each insert.
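A hedged PL/SQL sketch of that row-at-a-time reload, assuming the renamed copy is called WHATEVER and that sorting by ID approximates the order you want to preserve (both are assumptions):
begin
    -- one transaction per row, so each row receives its own SCN
    for r in (select load, id from whatever order by id) loop
        insert into temp_table (load, id) values (r.load, r.id);
        commit;
    end loop;
end;
/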

Fetch the number of rows that can be returned by a select query

I'm trying to fetch data and show it in a table with pagination, so I use limit and offset for that, but I also need to show the number of rows that could be fetched by that query. Is there any way to get that?
I tried
resultset.last() and getRow()
select count(*) from(query) myNewTable;
In both cases I get the correct answer, but is this the right way to do it? Performance is a concern.
We can get the limited records using the code below.
First, we need to set how many records we want, like below:
var limit = 10;
Then pass this limit into the statement below:
WITH
Temp AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY primaryKey DESC) AS RowNumber,
        *
    FROM
        myNewTable
),
Temp2 AS (
    SELECT COUNT(*) AS TotalCount FROM Temp
)
SELECT TOP (:limit) * FROM Temp, Temp2 WHERE RowNumber > :offset ORDER BY RowNumber
This works in MSSQL; for MySQL, replace TOP with a LIMIT clause.
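For MySQL specifically, a hedged equivalent (window functions require MySQL 8+, and primaryKey is the same stand-in column name as above); the window COUNT is computed before LIMIT applies, so it returns the full row count alongside each page:
SELECT t.*,
       COUNT(*) OVER () AS TotalCount
FROM myNewTable t
ORDER BY t.primaryKey DESC
LIMIT 10 OFFSET 0;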
There is no easy way of doing this.
1. As you found out, it usually boils down to executing 2 queries:
Executing SELECT with limit and offset in order to fetch the data that you need.
Executing a COUNT(*) in order to count the total number of pages.
This approach might work for tables that don't have a lot of rows, or when you filter the data (in the COUNT and SELECT queries) on a column that is indexed.
2. If your table is large, but the data that you need to show represents a smaller percentage of the data from the table and the data shares a common trait (for example, the data in all of your pages is created on a single day), you can use partitioning. Executing COUNT and SELECT on a single partition will be much faster than executing them on the whole table.
3. You can create another table which will store the value of the COUNT query.
For example, let's say that your big_table table looks like this:
id | user_id | timestamp_column | text_column | another_text_column
Now, your SELECT query looks like this:
SELECT * FROM big_table WHERE user_id = 4 ORDER BY timestamp_column LIMIT 20 OFFSET 20;
And your count query:
SELECT COUNT(*) FROM big_table WHERE user_id = 4;
You could create a count_table that will have the following format:
user_id | count
Once you fill this table with the current data in the system, you will create a trigger which will update this table on every insert or update of the big_table.
This way, the count query will be really fast, because it will be executed on the count_table, for example:
SELECT count FROM count_table WHERE user_id = 4
The drawback of this approach is that the insert in the big_table will be slower, since the trigger will fire and update the count_table on every insert.
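A hedged sketch of that trigger in MySQL syntax, assuming count_table(user_id, count) with user_id as its primary key (the names come from the description above):
CREATE TRIGGER big_table_count_trg
AFTER INSERT ON big_table
FOR EACH ROW
    INSERT INTO count_table (user_id, `count`)
    VALUES (NEW.user_id, 1)
    ON DUPLICATE KEY UPDATE `count` = `count` + 1;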
These are the approaches that you can try, but in the end it all depends on the size and type of your data.

Insert duplicate rows from temporary table

I am confused about how to insert values from the declared table into the selected table. I used EXCEPT to prevent inserting the first duplicate, but now I want the second row of each duplicate to be inserted as well.
How can I put the second value into the table above?
What I want to achieve is to insert the second value into the primary table so that I can manipulate its time_mode value.
This is my query
INSERT INTO temp_time(SwipeID,tdate,ttime,time_mode,raw_data,[Shift],eid,machineip)
SELECT a.SwipeID,a.tdate,a.ttime,a.time_mode,a.raw_data,1,a.eid,a.machineip FROM #temp_time a
EXCEPT
SELECT SwipeID,tdate,ttime,time_mode,raw_data,Shift,eid,machineip from temp_time
From the query above, only one copy of each row is inserted. My clients changed their minds: they now want the duplicated values to be reflected as well. The values in time_mode can then be changed by the system I made. If I run the insert again without the EXCEPT clause, there would be 3 rows in the primary table, which causes a problem because what I want to reflect is only the 2 rows.
I think I finally understood your problem correctly: assuming you have already run your first query that inserted the data without duplicates into your second table, you now want to insert the rest of the original duplicates.
In that case here's how you may do it, by eliminating the previous rows that you already have inserted:
WITH dupes AS (
SELECT *, ROW_NUMBER() OVER(
PARTITION BY SwipeID, tdate, ttime, time_mode, raw_data,[Shift], eid, machineip
ORDER BY (SELECT(0))
) AS row_num
FROM SourceTable
)
INSERT INTO DestinationTable (/*columns*/)
SELECT /*values you need*/
FROM dupes
WHERE row_num > 1;
Assuming you are using MySQL (LIMIT 1, 1 skips the first row and returns only the second, i.e. the duplicate):
INSERT INTO your_first_table(SwipeID,tdate,ttime,time_mode,raw_data,shift,eid,machineip)
SELECT a.SwipeID,a.tdate,a.ttime,a.time_mode,a.raw_data,1,a.eid,a.machineip FROM your_second_table a ORDER BY a.SwipeID DESC LIMIT 1, 1

Merge 2 Tables from different Databases

Hypothetically I want to merge 2 tables from different databases into one table, which includes all the data from the 2 tables:
The result would look something like this:
Aren't the entries in the result table redundant, because there are 2 entries with Porsche and VW? Or can I just add the values in the column 'stock' because the column 'Mark' is explicit?
You need to create a database link to the other database. Here is an example of how to create a database link: http://psoug.org/definition/create_database_link.htm
After creating it, your select statement from the other database should look like: select * from tableA@database_link_name
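For reference, a minimal sketch of the link itself (the user, password, and TNS alias are placeholders):
create database link database_link_name
connect to remote_user identified by remote_password
using 'remote_tns_alias';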
Then you need to use the MERGE statement to push the data from the other database. You can read about the MERGE statement here: https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm#SQLRF01606
The merge statement should look something like this:
merge into result_table res
using (select mark, stock, some_unique_id
from result_table res2
union all
select mark, stock, some_unique_id
from tableA@database_link_name) diff
on (res.some_unique_id = diff.some_unique_id )
when matched then
update set res.mark = diff.mark,
res.stock = diff.stock
when not matched then
insert
(res.mark,
res.stock,
res.some_unique_id)
values
(diff.mark,
diff.stock,
diff.some_unique_id);
I hope this will help you
SELECT ROW_NUMBER() OVER (ORDER BY Mark) AS new_ID, Mark, SUM(Stock) AS Stock
FROM
(
SELECT Mark,Stock FROM Database1.dbo.table1
UNION ALL
SELECT Mark,Stock FROM Database2.dbo.table2
) RESULT
GROUP BY Mark
Try this:
Select Mark, Stock, row_number() over(order by Mark desc) from table1
union all
Select Mark, Stock, row_number() over(order by Mark desc) from table2
Regardless of the data redundancy, you could use a UNION ALL clause to achieve this, like:
Select * From tableA
UNION ALL
Select * From tableB
Make sure the number of columns and their datatypes match between the selects.
Don't forget to use fully qualified table names, as the tables are in different databases:
SELECT
Mark
,Stock
FROM Database1.dbo.table1
UNION ALL
SELECT
Mark
,Stock
FROM Database2.dbo.table2
If these are 2 live databases and you need to constantly include rows from both in your new database, consider creating the table in your third database as a view instead.
This way you can also add a column specifying which system each row comes from. Summing the values is an option; however, if you ever get a query about an incorrect summed value, how would you know which system is the culprit?
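A minimal sketch of such a view, using the SQL Server-style three-part names from the answer above (the view name and the source_system column are made up):
CREATE VIEW dbo.merged_stock AS
SELECT Mark, Stock, 'Database1' AS source_system FROM Database1.dbo.table1
UNION ALL
SELECT Mark, Stock, 'Database2' AS source_system FROM Database2.dbo.table2;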