I have the following query:
Select Name,
case when charindex('I',a.S_Data) > 0 then 1 else 0 end as Illustrated,
case when charindex('FP',a.S_Data) > 0 then 1 else 0 end as FrontPage,
case when charindex('BP',a.S_Data) > 0 then 1 else 0 end as BackPage,
case when charindex('ELP',a.S_Data) > 0 then 1 else 0 end as EDLP,
case when charindex('PR',a.S_Data) > 0 then 1 else 0 end as SpecialPromo
From Table1 a
What I would like to do is to store those filter values in some sort of lookup table or a settings table.
I am struggling with how to draw the values from a lookup table to use with this query.
I can think of at least two options...
CREATE TABLE constants (
id INT,
Illustrated VARCHAR(3),
FrontPage VARCHAR(3),
BackPage VARCHAR(3),
EDLP VARCHAR(3),
SpecialPromo VARCHAR(3)
)
INSERT INTO constants SELECT 1, 'I', 'FP', 'BP', 'ELP', 'PR'
SELECT
Name,
CASE WHEN CHARINDEX(constants.Illustrated, data.S_Data) > 0 THEN 1 ELSE 0 END AS Illustrated,
etc, etc
FROM
data
INNER JOIN
constants
ON constants.id = 1
Or...
CREATE TABLE constants (
constant_set_id INT,
constant_name VARCHAR(16),
value VARCHAR(3)
)
INSERT INTO constants SELECT 1, 'Illustrated', 'I'
INSERT INTO constants SELECT 1, 'FrontPage', 'FP'
INSERT INTO constants SELECT 1, 'BackPage', 'BP'
INSERT INTO constants SELECT 1, 'EDLP', 'ELP'
INSERT INTO constants SELECT 1, 'SpecialPromo', 'PR'
SELECT
Name,
MAX(CASE WHEN constants.constant_name = 'Illustrated' AND CHARINDEX(constants.value, data.S_Data) > 0 THEN 1 ELSE 0 END) AS Illustrated,
etc, etc
FROM
data
INNER JOIN
constants
ON constants.constant_set_id = 1
GROUP BY
data.name
Both let you have multiple different sets of constants. The second is expandable without changing the schema, though the query would still need to change.
The main advantage of either approach is that you can re-use the constants elsewhere while storing them once in a centralised location. That only matters if/when the values of the constants need updating. Re-use through indirection.
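For reference, here is a minimal sketch of the first option written out in full, in T-SQL. It assumes the source table is Table1 with columns Name and S_Data, as in the question; the NOT NULL and PRIMARY KEY constraints are additions for illustration, not something the question specifies.

```sql
-- One row per set of constants; set 1 holds the codes from the question.
CREATE TABLE constants (
    id           INT        NOT NULL PRIMARY KEY,
    Illustrated  VARCHAR(3) NOT NULL,
    FrontPage    VARCHAR(3) NOT NULL,
    BackPage     VARCHAR(3) NOT NULL,
    EDLP         VARCHAR(3) NOT NULL,
    SpecialPromo VARCHAR(3) NOT NULL
);

INSERT INTO constants (id, Illustrated, FrontPage, BackPage, EDLP, SpecialPromo)
VALUES (1, 'I', 'FP', 'BP', 'ELP', 'PR');

-- Joining on a fixed id attaches the single constants row to every data row.
SELECT a.Name,
       CASE WHEN CHARINDEX(c.Illustrated,  a.S_Data) > 0 THEN 1 ELSE 0 END AS Illustrated,
       CASE WHEN CHARINDEX(c.FrontPage,    a.S_Data) > 0 THEN 1 ELSE 0 END AS FrontPage,
       CASE WHEN CHARINDEX(c.BackPage,     a.S_Data) > 0 THEN 1 ELSE 0 END AS BackPage,
       CASE WHEN CHARINDEX(c.EDLP,         a.S_Data) > 0 THEN 1 ELSE 0 END AS EDLP,
       CASE WHEN CHARINDEX(c.SpecialPromo, a.S_Data) > 0 THEN 1 ELSE 0 END AS SpecialPromo
FROM Table1 AS a
INNER JOIN constants AS c
        ON c.id = 1;
```

Switching to a different set of codes then only means inserting another row and changing the id in the join.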
At the moment, your table apparently violates First Normal Form, since a single field can hold many values for a single record.
There are at least two ways this could be resolved:
(1) if the only values that can be stored in this field are the five specified in the query, it might make sense to replace the character field with five integer fields, each a flag for the specified condition - ie:
...
Illustrated int,
FrontPage int,
BackPage int,
EDLP int,
SpecialPromo int,
...
(2) If a variety of different conditions are to be stored, then I would suggest adding a lookup table for conditions, and a link table between the conditions and the original table - like so:
Conditions
----------
Condition_id
Description
Link_Table
----------
Table1_id
Condition_id
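As a rough T-SQL sketch of option (2): the tables below follow the outline above, while the column types, the constraints, and a key column Table1.id are assumptions made for illustration.

```sql
CREATE TABLE Conditions (
    Condition_id INT         NOT NULL PRIMARY KEY,
    Description  VARCHAR(16) NOT NULL UNIQUE
);

CREATE TABLE Link_Table (
    Table1_id    INT NOT NULL,  -- assumed to reference Table1's key
    Condition_id INT NOT NULL REFERENCES Conditions (Condition_id),
    PRIMARY KEY (Table1_id, Condition_id)
);

-- Rebuild the flag columns of the original query from the link rows.
-- LEFT JOINs keep Table1 rows that have no conditions at all (all flags 0).
SELECT t.id,
       t.Name,
       MAX(CASE WHEN c.Description = 'Illustrated'  THEN 1 ELSE 0 END) AS Illustrated,
       MAX(CASE WHEN c.Description = 'FrontPage'    THEN 1 ELSE 0 END) AS FrontPage,
       MAX(CASE WHEN c.Description = 'BackPage'     THEN 1 ELSE 0 END) AS BackPage,
       MAX(CASE WHEN c.Description = 'EDLP'         THEN 1 ELSE 0 END) AS EDLP,
       MAX(CASE WHEN c.Description = 'SpecialPromo' THEN 1 ELSE 0 END) AS SpecialPromo
FROM Table1 AS t
LEFT JOIN Link_Table AS l ON l.Table1_id    = t.id
LEFT JOIN Conditions AS c ON c.Condition_id = l.Condition_id
GROUP BY t.id, t.Name;
```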
First, it would appear that Table1 is not in first normal form (it is NFNF, i.e. non-first normal form) because it violates the requirement that each tuple has exactly one value for each attribute, of that attribute's declared type; i.e. S_Data holds multiple scalar values. You will suffer update anomalies, e.g. deleting a setting presumably involves an UPDATE with text manipulation. Consider that SQL doesn't have operators that handle this kind of (non-relational) data very well.
Second, your output table is suboptimal because it returns the same type as multiple columns, i.e. it looks more like a report.
Consider that the unit of work in SQL is the row:
CREATE TABLE Settings
(
Setting VARCHAR(15) NOT NULL UNIQUE
);
INSERT INTO Settings VALUES ('Illustrated'), ('FrontPage'), ('BackPage'),
('EDLP'), ('SpecialPromo');
CREATE TABLE Table1
(
Name VARCHAR(20) NOT NULL,
Setting VARCHAR(15) NOT NULL
REFERENCES Settings (Setting)
ON DELETE CASCADE
ON UPDATE CASCADE,
UNIQUE (Name, Setting)
);
I am trying to write a migration script using SQL.
I have the DDL part pretty well covered, but now I have to migrate the existing data as shown below:
Before Migration:
Table BOX
uuid                                 | active_service_number | other BOX columns...
2869c64f-8ecb-4296-8c3b-1c72b308d59f | 2                     | ...
After Migration:
Table BOX
uuid                                 | other BOX columns...
2869c64f-8ecb-4296-8c3b-1c72b308d59f | ...
Table BOX_SERVICE
uuid                                 | number | state    | box_uuid
6a33d57f-e02b-4d0a-b258-3cef0bb3dff7 | 0      | INACTIVE | 2869c64f-8ecb-4296-8c3b-1c72b308d59f
...                                  | 1      | INACTIVE | 2869c64f-8ecb-4296-8c3b-1c72b308d59f
...                                  | 2      | ACTIVE   | 2869c64f-8ecb-4296-8c3b-1c72b308d59f
...                                  | ...    | ...      | ...
...                                  | N      | INACTIVE | 2869c64f-8ecb-4296-8c3b-1c72b308d59f
To sum it up
For each BOX row, I want to create exactly N BOX_SERVICE rows, numbered from 0 to N. (N is a fixed number predetermined)
Only one BOX_SERVICE should have its state set to ACTIVE: the one corresponding to the active_service_number (2 in the example above) of the corresponding BOX.
I got the pseudo-code figured out, but can't seem to translate it into SQL. I tried to do it with a single request, with multiple requests, cursors.
Here is what my DDL is looking like:
CREATE TABLE BOX_SERVICE (uuid varchar(255) NOT NULL,
number int,
state varchar(255) DEFAULT 'INACTIVE',
box_uuid varchar(255),
PRIMARY KEY(uuid),
CONSTRAINT FK_BOX_BOX_SERVICE FOREIGN KEY (box_uuid)
REFERENCES BOX(uuid)
);
-- Migrate existing data
ALTER TABLE BOX DROP active_service_number
You can generate the rows using generate_series() and cross join:
select <uuid function>,
gs.n as number,
(case when gs.n = b.active_service_number then 'ACTIVE' else 'INACTIVE' end),
b.uuid as box_uuid
from box b cross join
generate_series(0, n, 1) as gs(n);
Your version of Postgres only offers UUID generation via an extension, so for the first column use whatever method you already use to generate UUIDs.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SELECT uuid_generate_v1() AS uuid,
t.sequence AS number,
(CASE WHEN b.active_service_number = t.sequence THEN 'ACTIVE' ELSE 'INACTIVE' END) AS state,
b.uuid AS box_uuid
FROM box b
CROSS JOIN (SELECT generate_series(0, 5) AS sequence) t;
Output:
UUID |Number|Status |Box_UUID
4b247576-8e42-11eb-80ae-1831bf6eced8|0 |INACTIVE|2869c64f-8ecb-4296-8c3b-1c72b308d59f
4b247577-8e42-11eb-80af-1831bf6eced8|1 |INACTIVE|2869c64f-8ecb-4296-8c3b-1c72b308d59f
4b247578-8e42-11eb-80b0-1831bf6eced8|2 |ACTIVE |2869c64f-8ecb-4296-8c3b-1c72b308d59f
4b247579-8e42-11eb-80b1-1831bf6eced8|3 |INACTIVE|2869c64f-8ecb-4296-8c3b-1c72b308d59f
4b24757a-8e42-11eb-80b2-1831bf6eced8|4 |INACTIVE|2869c64f-8ecb-4296-8c3b-1c72b308d59f
4b24757b-8e42-11eb-80b3-1831bf6eced8|5 |INACTIVE|2869c64f-8ecb-4296-8c3b-1c72b308d59f
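Putting the pieces together, the data-migration step could be sketched as below for PostgreSQL. N = 5 is a placeholder value, and uuid_generate_v4() from the "uuid-ossp" extension is just one possible UUID source; neither choice comes from the question.

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

BEGIN;

-- One BOX_SERVICE row per (box, service number) pair, numbers 0..5 here.
INSERT INTO box_service (uuid, number, state, box_uuid)
SELECT uuid_generate_v4()::text,          -- the target column is varchar(255)
       gs.n,
       CASE WHEN gs.n = b.active_service_number
            THEN 'ACTIVE' ELSE 'INACTIVE' END,
       b.uuid
FROM box AS b
CROSS JOIN generate_series(0, 5) AS gs(n);

-- Drop the old column only after its data has been copied.
ALTER TABLE box DROP COLUMN active_service_number;

COMMIT;
```

Running the insert and the column drop in one transaction means a failure in either step leaves the BOX table untouched.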
I have a table that I am trying to insert multiple records into using a select statement.
The ID field is an INT and is not auto-incremented, but I do need to increment it in the INSERT.
The table belongs to a third party product we use for our ERP so I cannot change the property of the ID.
The insert is supposed to create a record in the EXT01100 table for each line item on a particular sales order.
Here is the code I am using:
INSERT INTO EXT01100 (Extender_Record_ID, Extender_Window_ID, Extender_Key_Values_1 , Extender_Key_Values_2, Extender_Key_Values_3)
SELECT (SELECT MAX(EXTENDER_RECORD_ID) + 1 FROM EXT01100), 'ECO_FEE_DIGIT', SOL.LNITMSEQ, SOL.SOPNUMBE, SOL.SOPTYPE
FROM SOP10200 SOL WITH(NOLOCK)
WHERE SOL.SOPTYPE = @InTYPE AND SOL.SOPNUMBE = @INNUMBE AND SOL.LNITMSEQ <> 0
This works on a single line order, but multiple line orders will produce a Primary Key duplicate error so I don't think I can use (SELECT MAX(EXTENDER_RECORD_ID) + 1 FROM EXT01100) in this case.
This is in SQL server.
Any help is greatly appreciated!
You can use ROW_NUMBER() to give each inserted row a unique ID. You also need to take an exclusive lock on the table you read the current maximum from (EXT01100), and you need to remove the NOLOCK hint.
INSERT INTO EXT01100 (Extender_Record_ID, Extender_Window_ID, Extender_Key_Values_1 , Extender_Key_Values_2, Extender_Key_Values_3)
SELECT (SELECT MAX(EXTENDER_RECORD_ID) FROM EXT01100 WITH (TABLOCKX)) + ROW_NUMBER() OVER (ORDER BY SOL.LNITMSEQ)
, 'ECO_FEE_DIGIT', SOL.LNITMSEQ, SOL.SOPNUMBE, SOL.SOPTYPE
FROM SOP10200 SOL
WHERE SOL.SOPTYPE = @InTYPE AND SOL.SOPNUMBE = @INNUMBE AND SOL.LNITMSEQ <> 0;
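A variation on the same idea, sketched in T-SQL under a few assumptions: the parameter types and sample values are made up, and ISNULL is added so that an empty EXT01100 starts numbering at 1 instead of producing NULL IDs. Reading the maximum inside an explicit transaction keeps the exclusive lock until the insert has committed.

```sql
-- Hypothetical parameter values; in the question these are supplied by the caller.
DECLARE @InTYPE  SMALLINT = 2;
DECLARE @INNUMBE CHAR(21) = 'ORDER0001';
DECLARE @BaseID  INT;

BEGIN TRANSACTION;

-- Current maximum, read under an exclusive table lock held until COMMIT.
SELECT @BaseID = ISNULL(MAX(Extender_Record_ID), 0)
FROM EXT01100 WITH (TABLOCKX, HOLDLOCK);

INSERT INTO EXT01100 (Extender_Record_ID, Extender_Window_ID,
                      Extender_Key_Values_1, Extender_Key_Values_2, Extender_Key_Values_3)
SELECT @BaseID + ROW_NUMBER() OVER (ORDER BY SOL.LNITMSEQ),
       'ECO_FEE_DIGIT', SOL.LNITMSEQ, SOL.SOPNUMBE, SOL.SOPTYPE
FROM SOP10200 AS SOL
WHERE SOL.SOPTYPE = @InTYPE
  AND SOL.SOPNUMBE = @INNUMBE
  AND SOL.LNITMSEQ <> 0;

COMMIT TRANSACTION;
```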
Seconding a recommendation from the comments above, we use Sequences in our production system with no problem. Here's how it looks:
create sequence SQ_Extender_Record_ID
minvalue 1
start with 1
cache 100;
INSERT INTO EXT01100 (Extender_Record_ID, Extender_Window_ID, Extender_Key_Values_1 , Extender_Key_Values_2, Extender_Key_Values_3)
SELECT (next value for SQ_Extender_Record_ID), 'ECO_FEE_DIGIT', SOL.LNITMSEQ, SOL.SOPNUMBE, SOL.SOPTYPE
FROM SOP10200 SOL
WHERE SOL.SOPTYPE = @InTYPE AND SOL.SOPNUMBE = @INNUMBE AND SOL.LNITMSEQ <> 0
Obviously, adjust the min/start values as appropriate for your situation.
If you want, you could add a default constraint to the table/column with this:
alter table EXT01100 add constraint DF_EXT01100__Extender_Record_ID
default (next value for SQ_Extender_Record_ID)
for Extender_Record_ID
You mention that this is in a database whose schema you don't control, so that may not be an option; I mention it for the sake of completeness.
I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET @LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 @LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 @LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 @LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top (1) from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...@LicenseNumber1, @LicenseIssuer1, @LicenseNumber2, @LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(@LicenseNumber1, @LicenseIssuer1),
(@LicenseNumber2, @LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT @LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
(1) Break up one large staging table into a set of normalized tables, retrieving the primary key/identity from one table and using it as the foreign key in the others
(2) Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discrete answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As @Shnugo suggested before me, calculating the personId in the staging table will ease the next steps.
Use a sequence for the personID
From SQL Server 2012 onwards you can define sequences. If you use one for every person insert, you'll never risk overlapping IDs. If (as it seems) you already have personIds that were loaded before the sequence existed, create the sequence with the first free personID as its starting value.
Create a numbers table
Create a utility table holding the numbers from 1 to n (n needs to be at least 50; you can look at this question for some implementations)
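One possible way to build such a table in T-SQL is sketched below. The table name numbers and column n match the queries used later in this answer; filling it from sys.all_objects is just a convenient row source, not something the answer prescribes.

```sql
-- Utility table holding the integers 1..50.
CREATE TABLE numbers (n INT NOT NULL PRIMARY KEY);

-- sys.all_objects is used only as a source of at least 50 rows;
-- ROW_NUMBER() turns those rows into consecutive integers.
INSERT INTO numbers (n)
SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;
```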
Use set logic to do the insert
I'd avoid cursors and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access per target table.
You could proceed like this:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) AS unpivoted WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS unpivoted WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS unpivoted WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
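For column pairs such as the licenses, a different technique from the UNION ALL above is CROSS APPLY with a VALUES list, which reads the staging table once. The T-SQL sketch below assumes the staging table is named staging and already has the PersonID column described above; only the first three of the 50 license pairs are written out.

```sql
INSERT INTO Licenses (personID, number, issuer)
SELECT s.PersonID, v.number, v.issuer
FROM staging AS s
-- Each staging row is expanded into one row per (number, issuer) pair.
CROSS APPLY (VALUES
    (s.licenseNumber1, s.licenseIssuer1),
    (s.licenseNumber2, s.licenseIssuer2),
    (s.licenseNumber3, s.licenseIssuer3)
) AS v (number, issuer)
-- Unused pairs are NULL in the staging data and are skipped.
WHERE v.number IS NOT NULL;
```

The same shape would apply to Specialties, and to Identifiers with a one-column VALUES list.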
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell us anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
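-- Window functions are not allowed directly in SET, so rank in a CTE and update through it
WITH Ranked AS (
SELECT PersonID, DENSE_RANK() OVER(...) AS NewRank
FROM StagingTable WHERE PersonID IS NULL
)
UPDATE Ranked SET PersonID = NewRank + MaxExistingID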
to set new IDs for PersonIDs still being NULL.
I have three certain columns in a table I am trying to query, say ID(char), Amount(bigint) and Reference(char). Here is a sample of a few entries from this table. The first two rows have no entry in the third column.
ID | Amount | Reference
16266| 24000|
16267| -12500|
16268| 25000| abc:185729000003412
16269| 25000| abc:185730000003412
What I am trying to get is a query or a function that will return the ids of the duplicate rows that have the same amount and the same modulus (%100000000) of the number in the string in the reference column.
The only cells in the reference column I am interested in will all have 'abc:' before the whole number, and nothing after the number. I need some way to convert that final field (string) into an integer so I can take the modulus of that number.
Here is the script I will run once I get the reference field converted into a number without the 'abc:'
CREATE TEMP TABLE tableA (
id int,
amount int,
referenceNo bigint)
INSERT INTO tableA (id, amount, referenceNo) SELECT id, net_amount, longnumber%100000000 AS referenceNo FROM deposit_item
SELECT DISTINCT * FROM tableA WHERE referenceNo > 1 AND amount > 1
Basically, how do I convert the reference field (abc:185729000003412) to an integer in PSQL (185729000003412 or 3412)?
Assuming that reference id is always delimited by :
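split_part(Reference, ':', 2)::bigint
should work. Cast to bigint rather than integer: a value such as 185729000003412 does not fit in a 4-byte integer. Note that rows with an empty Reference yield an empty string, which cannot be cast, so they need to be filtered out or handled separately.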
Edit:
If you want to match abc: specifically - try this:
CASE
WHEN position('abc:' in Reference) > 0
THEN split_part(Reference, 'abc:', 2)::bigint
ELSE 0
END
But you should indeed consider storing the xxx: prefix separately.
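Going one step further towards the stated goal, the following PostgreSQL sketch lists the rows that share both the amount and the reference number modulo 100000000. It assumes the table and columns are deposit_item(id, net_amount, reference), as in the question's script, and it swaps split_part for a regular-expression substring so that rows without a well-formed 'abc:' reference become NULL instead of raising a cast error.

```sql
WITH parsed AS (
    SELECT id,
           net_amount AS amount,
           -- Digits after 'abc:' (NULL when the pattern does not match),
           -- reduced modulo 100000000.
           substring(reference from '^abc:([0-9]+)$')::bigint % 100000000 AS ref_mod
    FROM deposit_item
)
SELECT id, amount, ref_mod
FROM (
    SELECT p.*,
           count(*) OVER (PARTITION BY amount, ref_mod) AS dup_count
    FROM parsed AS p
    WHERE ref_mod IS NOT NULL
) AS d
WHERE dup_count > 1
ORDER BY amount, ref_mod, id;
```

With the sample rows from the question, ids 16268 and 16269 both have amount 25000 and reduce to 3412, so they would be reported as duplicates of each other. References with more than 18 digits would overflow bigint and would need numeric instead.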
I am working on a legacy system, and many of the database structures are horrendous (key / value pairs).
I have the following select statement:
(
SELECT D.F_VALUE
FROM T_WEB_QUOTES_DATA D
WHERE
D.F_QUOTE_ID = TO_CHAR(VR_RENTAL.QUOTEID)
AND D.F_KEY = 'Secondary_Driver_Forename'
) AS "SECONDARY_DRIVER_FORENAME"
So as you can see it is looking for a record where the F_Key column has a value of Secondary_Driver_Forename. The problem is there is another F_Key that holds the same exact information and I need to check for both keys.
So what I want to do is:
If there are no records where F_Key = Secondary_Driver_Forename, or if such a record exists but the value is an empty string or null, then I would like to go and look for a record where the F_Key is 2ndary_Driver_FirstName, and if that does not exist (or is null), I would like to return a string saying No Key
How can I achieve this in Oracle SQL?
I am thinking of something like this:
(
SELECT (case when max(case when D.F_KEY = 'Secondary_Driver_Forename' then 1 else 0 end) = 1
then max(case when D.F_KEY = 'Secondary_Driver_Forename' then D.F_VALUE end)
else max(D.F_Value)
end)
FROM T_WEB_QUOTES_DATA D
WHERE
D.F_QUOTE_ID = TO_CHAR(VR_RENTAL.QUOTEID)
AND D.F_KEY in ('Secondary_Driver_Forename', '2ndary_Driver_FirstName')
) AS "SECONDARY_DRIVER_FORENAME";
That is, do a conditional aggregation of the values. If the primary value is present, then use it. Otherwise, just choose the value that is there (either NULL or the value from the second key).
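If the fallback should also apply when the primary key is present but empty, and a literal 'No Key' is wanted when neither key has a value, a COALESCE over two conditional aggregates is one way to sketch it in Oracle SQL. This assumes F_VALUE is a VARCHAR2 column (Oracle treats an empty string in such a column as NULL) and that VR_RENTAL is the outer table or view the subquery is correlated with.

```sql
SELECT VR_RENTAL.QUOTEID,
       (
         -- First non-NULL of: primary key's value, secondary key's value, 'No Key'.
         SELECT COALESCE(
                  MAX(CASE WHEN D.F_KEY = 'Secondary_Driver_Forename' THEN D.F_VALUE END),
                  MAX(CASE WHEN D.F_KEY = '2ndary_Driver_FirstName'   THEN D.F_VALUE END),
                  'No Key')
         FROM T_WEB_QUOTES_DATA D
         WHERE D.F_QUOTE_ID = TO_CHAR(VR_RENTAL.QUOTEID)
           AND D.F_KEY IN ('Secondary_Driver_Forename', '2ndary_Driver_FirstName')
       ) AS "SECONDARY_DRIVER_FORENAME"
FROM VR_RENTAL;
```

Because the subquery aggregates without a GROUP BY, it returns exactly one row even when no matching key exists, so the outer query gets 'No Key' in that case.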