I need to copy a huge table in SAP ERP. Usually I split the table up with modulo 100 and copy it step by step. This only works if there is a numeric field in the primary key that I can use.
But the table COSS has no such field. Is there a way to generate a number out of a string in SQL, or is there another way to split a table into smaller pieces?
Thanks
Example:
method get_coep.
  select * from coep using client @gv_client
    where mod( cast( cast( belnr as numc( 10 ) ) as dec( 10, 0 ) ), 100 ) = @iv_mod
    into table @et_return
    connection (gv_conn).
endmethod.
MERGE PFM_EventPerformance_MetaData AS TARGET
USING
(
    SELECT
        [InheritanceMeterID] = @InheritanceMeterPointID
        ,[SubHourlyScenarioResourceID] = @SubHourlyScenarioResourceID
        ,[MeterID] = @MeterID --internal ID
        ,[BaselineID] = @BaselineID --internal ID
        ,[UpdateUtc] = GETUTCDATE()
)
AS SOURCE ON
    TARGET.[SubHourlyScenarioResourceID] = SOURCE.[SubHourlyScenarioResourceID]
    AND TARGET.[MeterID] = SOURCE.[MeterID] --internal ID
    AND TARGET.[BaselineID] = SOURCE.[BaselineID] --internal ID
WHEN MATCHED THEN UPDATE SET
    @MetaDataID = TARGET.ID --get preexisting ID when exists (must populate one row at a time)
    ,InheritanceMeterID = SOURCE.InheritanceMeterID
    ,[UpdateUtc] = SOURCE.[UpdateUtc]
WHEN NOT MATCHED
THEN INSERT
(
    [InheritanceMeterID]
    ,[SubHourlyScenarioResourceID]
    ,[MeterID] --internal ID
    ,[BaselineID] --internal ID
)
VALUES
(
    SOURCE.[InheritanceMeterID]
    ,SOURCE.[SubHourlyScenarioResourceID]
    ,SOURCE.[MeterID] --internal ID
    ,SOURCE.[BaselineID] --internal ID
);
In the above query I do not want to update the values in the target table if there is no change in the old values. I am not sure how to achieve this, as I have rarely used the MERGE statement. Please help me with a solution. Thanks in advance
This is best done in two stages.
Stage 1: Merge update on condition
SO answer from before (thanks to @Laurence!)
Stage 2: hash key condition to compare
Limits: max 4000 characters, including column separator characters
A rather simple way to compare multiple columns in one condition is to use a computed column on both sides, generated with HASHBYTES( <algorithm>, <columns> ).
This moves a lot of code out of the MERGE statement and into the table definition.
Quick example:
CREATE TABLE dbo.Test
(
    id_column int NOT NULL,
    dsc_name1 varchar(100),
    dsc_name2 varchar(100),
    num_age tinyint,
    flg_hash AS HashBytes( 'SHA1',
        Cast( dsc_name1 AS nvarchar(4000) )
        + N'•' + dsc_name2 + N'•' + Cast( num_age AS nvarchar(3) )
    ) PERSISTED
);
Comparing columns flg_hash between source and destination will make the comparison quick, as it is just a comparison between two 20-byte varbinary columns.
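For instance, the hash then lets a MERGE skip no-op updates with a matched-plus-changed condition. A minimal sketch, assuming hypothetical dbo.Target and dbo.Source tables that both define the computed flg_hash column from the example above:
MERGE dbo.Target AS TARGET
USING dbo.Source AS SOURCE
    ON TARGET.id_column = SOURCE.id_column
-- only touch rows whose hashed payload actually differs; note that <>
-- treats NULL hashes as "no difference", so NULLable inputs deserve care
WHEN MATCHED AND TARGET.flg_hash <> SOURCE.flg_hash THEN
    UPDATE SET dsc_name1 = SOURCE.dsc_name1,
               dsc_name2 = SOURCE.dsc_name2,
               num_age   = SOURCE.num_age
WHEN NOT MATCHED THEN
    INSERT (id_column, dsc_name1, dsc_name2, num_age)
    VALUES (SOURCE.id_column, SOURCE.dsc_name1, SOURCE.dsc_name2, SOURCE.num_age);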
A couple of caveats for working with HashBytes:
The function only works on a total of 4000 nvarchar characters.
The trade-off for the short comparison code is having to maintain the correct column order in the tables and views that generate the hash.
There is a duplicate collision chance on the order of 2^50+ for SHA1; as a security mechanism it is now considered insecure, and a few years ago Microsoft tried to drop SHA1 as an algorithm.
Columns later added to tables and views can be silently left out of the comparison if the HashBytes expression is not amended along with them.
Overall I found that comparing many columns directly can overload my server engines, but I have never had an issue with hash key comparisons.
I am using the function below to generate a random number between 0 and 99999999999.
CREATE VIEW [dbo].[rndView]
AS
SELECT RAND() rndResult
GO

ALTER function [dbo].[RandomPass]()
RETURNS NUMERIC(18,0)
as
begin
    DECLARE @RETURN NUMERIC(18,0)
    DECLARE @Upper NUMERIC(18,0);
    DECLARE @Lower NUMERIC(18,0);
    DECLARE @Random float;

    SELECT @Random = rndResult
    FROM rndView

    SET @Lower = 0
    SET @Upper = 99999999999
    SET @RETURN = (ROUND(((@Upper - @Lower - 1) * @Random + @Lower), 0))
    return @RETURN
end;
However, I need to make sure that the returned number has never been used before in the same app. In .NET I would create a while loop and keep looping until the returned value is not found in a table that stores previously used values. Is there a way to achieve the same result directly in SQL, ideally without loops? If not, I think it would still be more efficient to loop in an SQL function rather than have a .NET loop issue a number of query requests.
You will need to store the used values in a table and use a recursive query to generate the next value.
The answer depends on the RDBMS you are using.
Below are two examples, in PostgreSQL and MS SQL Server, that would solve your problem.
PostgreSQL
First, create a table that will hold your consumed ids:
CREATE TABLE consumed_ids (
    id BIGINT PRIMARY KEY NOT NULL
);
The PRIMARY KEY is not strictly necessary, but it will generate an index (which speeds up the next query) and ensure that two equal ids are never generated.
Then, use the following query to obtain a new id:
WITH RECURSIVE T AS (
    SELECT 1 AS n, FLOOR(RANDOM() * 100000000000) AS v
    UNION ALL
    SELECT n + 1, FLOOR(RANDOM() * 100000000000)
    FROM T
    WHERE EXISTS (SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids
SELECT v
FROM T
ORDER BY n DESC
LIMIT 1
RETURNING id;
The logic is that as long as the (last) id generated is already consumed, we generate a new id. The column n of the CTE is there only to retrieve the last generated id at the end, but you may also use it to limit the number of generated random numbers (for example, give up if n > 10).
(tested using PostgreSQL 12.4)
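For instance, a variant of the same query that gives up after 10 draws might look like this (a sketch; if every draw collides, the PRIMARY KEY rejects the final insert, so the failure is explicit rather than silent):
WITH RECURSIVE T AS (
    SELECT 1 AS n, FLOOR(RANDOM() * 100000000000) AS v
    UNION ALL
    SELECT n + 1, FLOOR(RANDOM() * 100000000000)
    FROM T
    WHERE n < 10 -- stop retrying after 10 draws
      AND EXISTS (SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids
SELECT v
FROM T
ORDER BY n DESC
LIMIT 1
RETURNING id;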
MS SQL Server
First, create a table that will hold your consumed ids:
CREATE TABLE consumed_ids (
    id BIGINT PRIMARY KEY NOT NULL
);
Then, use the following query to obtain a new id:
WITH T AS (
    SELECT 1 AS n, FLOOR(RAND() * 100000000000) AS v
    UNION ALL
    SELECT n + 1, FLOOR(RAND() * 100000000000)
    FROM T
    WHERE EXISTS (SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids (id)
OUTPUT Inserted.id
SELECT TOP 1 v
FROM T
ORDER BY n DESC;
(tested using MS SQL Server 2019).
Note however that MS SQL Server will give up after 100 tries by default, which is the default MAXRECURSION limit.
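If more than 100 attempts might be needed, the limit can be raised with a query hint. A sketch of the same query with a higher ceiling:
WITH T AS (
    SELECT 1 AS n, FLOOR(RAND() * 100000000000) AS v
    UNION ALL
    SELECT n + 1, FLOOR(RAND() * 100000000000)
    FROM T
    WHERE EXISTS (SELECT * FROM consumed_ids WHERE id = v)
)
INSERT INTO consumed_ids (id)
OUTPUT Inserted.id
SELECT TOP 1 v
FROM T
ORDER BY n DESC
OPTION (MAXRECURSION 1000); -- raise the retry ceiling; 0 removes it entirely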
There is no such thing as a random number without replacement.
You need to store the numbers that have already been used in a table, which I would define as:
create table used_random_numbers (
    number decimal(11, 0) primary key
);
Then, when you create a new number, insert it into the table.
In the part of the code that generates a number, use a while loop: inside the loop, draw a candidate and check that it doesn't already exist, as sketched below.
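A minimal T-SQL sketch of that loop, reusing the used_random_numbers table defined above (under concurrency, the INSERT should additionally be prepared to retry on a primary key violation):
DECLARE @candidate decimal(11, 0);
WHILE 1 = 1
BEGIN
    SET @candidate = FLOOR(RAND() * 100000000000);
    -- keep the candidate only if it has never been handed out before
    IF NOT EXISTS (SELECT 1 FROM used_random_numbers WHERE number = @candidate)
    BEGIN
        INSERT INTO used_random_numbers (number) VALUES (@candidate);
        BREAK;
    END;
END;
SELECT @candidate AS new_number;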
Now, there are some things you can do to make this more efficient as the numbers grow larger -- and ways that don't require remembering all the previous values.
First, perhaps UUID/GUID is sufficient. This is the industry standard for a "random" id -- although it is a HEX string rather than a number in most databases. The exact syntax depends on the database.
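For example, in SQL Server a GUID comes straight from NEWID() (a sketch; PostgreSQL offers gen_random_uuid() as an equivalent):
-- collisions are negligible in practice, so no table of
-- previously used values is required
SELECT NEWID() AS random_id;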
Another approach is to have an 11 digit number. The first or last 10 digits could be the Unix epoch time (seconds since 1970-01-01) -- either explicitly or under some transformation so the value "looks" random. The additional digit would then be a random digit. Of course, you could extend this to minutes or days so you have more random digits.
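A minimal T-SQL sketch of that idea, assuming the layout "10 epoch digits + 1 random digit" (the alias is illustrative):
-- Seconds since 1970-01-01 (10 digits for current dates), shifted one
-- decimal place, plus a random digit 0-9; collisions are then only
-- possible between ids generated within the same second.
-- (DATEDIFF_BIG avoids int overflow for dates beyond 2038.)
SELECT CAST(DATEDIFF(SECOND, '1970-01-01', GETUTCDATE()) AS bigint) * 10
     + ABS(CHECKSUM(NEWID())) % 10 AS epoch_based_id;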
I have a .csv file that includes two columns: an ID and its related segments.
It has nearly 600 thousand rows. In some rows the segmentID column has just one value, but in other rows there is more than one value, and the count varies.
I have tried to copy the values to my database with this code:
COPY FROM 'C:/User/Local/intersectionsegments.csv' DELIMITER ',' CSV HEADER
But this takes just one value for segmentID and one value for IntersectingSegments.
How can I load my .csv file into the database with a query?
This is why CSVs are poor sources of data (unavoidable, but still poor). The easiest approach here is to create an intermediate staging table containing as many columns as the CSV. Use COPY to populate this table, then SQL to copy from the staging table to your production table. Something along the lines of:
create table intersecting_segments_stage
( segment_id integer
, segment_1 integer
, segment_2 integer
, ...
, segment_n integer
);
Now with that in place, run the following as a script:
truncate intersecting_segments_stage;
copy intersecting_segments_stage from 'csv_table_name' ...;
insert into intersecting_segments (segment_id, intersecting_segment)
select segment_id, intersecting_segment
from (select segment_id, segment_1 as intersecting_segment
      from intersecting_segments_stage
      union all
      select segment_id, segment_2
      from intersecting_segments_stage
      union all
      ...
      select segment_id, segment_n
      from intersecting_segments_stage
     ) t -- postgres requires an alias on a derived table
where intersecting_segment is not null;
Using CTAS we can leverage the parallelism that Polybase provides to load data into a new table in a highly scalable and performant way.
Is there a way to use a similar approach to load data into an existing table? The table might even be empty.
Creating an external table and using INSERT INTO ... SELECT * FROM ... - I would assume this goes through the head node and is therefore not parallel?
I know that I could also drop the table and use CTAS to recreate it but then I have to deal with all the metadata again (column names, data types, distributions, ...).
You could use partition switching to do this, although remember not to use too many partitions with Azure SQL Data Warehouse; see the 'Partition Sizing Guidance' section of the documentation.
Bear in mind that check constraints are not supported, so the source table has to use the same partition scheme as the target table.
Full example with partitioning and switch syntax:
-- Assume we have a file with the values 1 to 100 in it.
-- Create an external table over it; will have all records in
IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'ext' )
EXEC ( 'CREATE SCHEMA ext' )
GO
-- DROP EXTERNAL TABLE ext.numbers
IF NOT EXISTS ( SELECT * FROM sys.external_tables WHERE object_id = OBJECT_ID('ext.numbers') )
CREATE EXTERNAL TABLE ext.numbers (
number INT NOT NULL
)
WITH (
LOCATION = 'numbers.csv',
DATA_SOURCE = eds_yourDataSource,
FILE_FORMAT = ff_csv
);
GO
-- Create a partitioned, internal table with the records 1 to 50
IF OBJECT_ID('dbo.numbers') IS NOT NULL DROP TABLE dbo.numbers
CREATE TABLE dbo.numbers
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED INDEX ( number ),
PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
)
AS
SELECT *
FROM ext.numbers
WHERE number Between 1 And 50;
GO
-- DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers')
-- CTAS the second half of the external table, records 51-100, into an internal one.
-- As check constraints are not available in SQL Data Warehouse, ensure the switch table
-- uses the same partition scheme as the original table.
IF OBJECT_ID('dbo.numbers_part2') IS NOT NULL DROP TABLE dbo.numbers_part2
CREATE TABLE dbo.numbers_part2
WITH (
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED INDEX ( number ),
PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
)
AS
SELECT *
FROM ext.numbers
WHERE number > 50
GO
-- Partition switch it into the original table
ALTER TABLE dbo.numbers_part2 SWITCH PARTITION 2 TO dbo.numbers PARTITION 2;
SELECT *
FROM dbo.numbers
ORDER BY 1;
I am writing some SQL code to be run in MapBasic (MapInfo's Programming language). The best way to describe the question is with an example:
I want to select all records where ShipType="Barge" into a query named Barges and I want all the remaining records to be put in a query OtherShips.
I could simply use the following SQL commands:
select * from ShipsTable where ShipType = "Barge" into Barges
select * from ShipsTable where ShipType <> "Barge" into OtherShips
That's fine, but I can't help feeling that this is inefficient. Won't SQL be searching through the database twice? Won't it already find the rows that fit the second query while processing the first?
Instead, it would be faster if there was a command like:
select * from ShipsTable where ShipType = "Barge" into Barges ELSE into OtherShips
My question is, can you do this? Is there a command that fits this spec?
Thanks,
You could do this quite easily in SSIS with a conditional split and two different destinations.
But not really in T-SQL.
However, for "fun", some possibilities are looked at below.
You could create a partitioned view, but the requirements you need to meet for this are quite arduous, and the execution plan just loads everything into a spool and then reads the spool twice with two different filters anyway.
CREATE TABLE Barges
(
    Id INT,
    ShipType VARCHAR(50) NOT NULL CHECK (ShipType = 'Barge'),
    PRIMARY KEY (Id, ShipType)
)

CREATE TABLE OtherShips
(
    Id INT,
    ShipType VARCHAR(50) NOT NULL CHECK (ShipType <> 'Barge'),
    PRIMARY KEY (Id, ShipType)
)

CREATE TABLE ShipsTable
(
    ShipType VARCHAR(50) NOT NULL
)

GO

CREATE VIEW ShipsView
AS
SELECT *
FROM Barges
UNION ALL
SELECT *
FROM OtherShips

GO

INSERT INTO ShipsView (Id, ShipType)
SELECT ROW_NUMBER() OVER (ORDER BY @@SPID), ShipType
FROM ShipsTable
Or you could use the OUTPUT clause and composable DML, but that would require inserting both sets of rows into the first table and then cleaning out the unwanted rows afterwards (the second table would only get the correct rows and not need any clean-up).
CREATE TABLE Barges2
(
    ShipType VARCHAR(50) NOT NULL
)

CREATE TABLE OtherShips2
(
    ShipType VARCHAR(50) NOT NULL
)

CREATE TABLE ShipsTable2
(
    ShipType VARCHAR(50) NOT NULL
)

INSERT INTO Barges2
SELECT *
FROM
(
    INSERT INTO OtherShips2
    OUTPUT INSERTED.*
    SELECT *
    FROM ShipsTable2
) D
WHERE D.ShipType = 'Barge';

DELETE FROM OtherShips2 WHERE ShipType = 'Barge';
MapBasic does provide access to MapInfo's 'Invert Selection', which gives you everything that was not selected by your first query (assuming the first query returns results). You can call it by using its menu ID (found in Menu.def), which is 311, or, if you include menu.def at the top of the file, you can reference it through the constant M_QUERY_INVERTSELECT.
eg.
Select * from ShipsTable where ShipType = "Barge" into Barges
Run Menu Command 311
or
Run Menu Command M_QUERY_INVERTSELECT if you have included the menu definitions file.
I believe this would give you better performance than doing a second selection as per your example, but you wouldn't then be able to name the results table with an alias without doing another selection. Whether this is worth using depends on your use case; for a large query that takes quite a while, it could well save some processing time.