5-character string ID like Northwind CustomerID - SQL

I am looking to generate a 5-character string code from a name string, just like in the MS Northwind sample database (Customers.CustomerID):
CustomerID CompanyName
----- ----------------------------------
ALFKI Alfreds Futterkiste
ANATR Ana Trujillo Emparedados y helados
ANTON Antonio Moreno Taquería
AROUT Around the Horn
BERGS Berglunds snabbköp
BLAUS Blauer See Delikatessen
BLONP Blondesddsl pere et fils
BOLID Bólido Comidas preparadas
BONAP Bon app'
BOTTM Bottom-Dollar Markets
BSBEV B's Beverages
... ...
So the ID (code) is not totally random; it somehow resembles the content of the name. Obviously, duplicates are forbidden: the code has to be unique. Let's assume:
DECLARE @StrCode NCHAR(5)
I'd start by thinking along these lines: I would create a scalar function to check for duplicates, which would return 0 when no duplicate is found and 1 when one is; the core check would simply count rows matching the provided string code:
SELECT COUNT(ID) FROM MyTable WHERE StrCode = @StrCode
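For illustration, that duplicate-check function could be sketched like this (a minimal sketch; MyTable, ID, and StrCode are the placeholder names from the question, and EXISTS is used instead of COUNT so the scan can stop at the first match):

```sql
CREATE FUNCTION dbo.CheckForDuplicity (@StrCode NCHAR(5))
RETURNS BIT
AS
BEGIN
    -- 1 = a row already uses this code, 0 = the code is free
    IF EXISTS (SELECT 1 FROM dbo.MyTable WHERE StrCode = @StrCode)
        RETURN 1;
    RETURN 0;
END
```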
And then, in another function (input = @StringName, output = @StrCode), I'd try a few approaches, from the most readable to the least readable, checking each attempt for duplicates along the way:
SELECT @StrCode =
    CASE
        WHEN dbo.CheckForDuplicity(SUBSTRING(@StringName, 1, 5)) = 0
            THEN SUBSTRING(@StringName, 1, 5) -- first 5 chars of the name
        WHEN CHARINDEX(' ', @StringName) > 0
             AND dbo.CheckForDuplicity(CONCAT(SUBSTRING(@StringName, 1, 3), SUBSTRING(@StringName, CHARINDEX(' ', @StringName) + 1, 2))) = 0
            THEN CONCAT(SUBSTRING(@StringName, 1, 3), SUBSTRING(@StringName, CHARINDEX(' ', @StringName) + 1, 2)) -- 3 chars from the 1st word and 2 from the 2nd
        WHEN dbo.CheckForDuplicity(CONCAT(SUBSTRING(@StringName, 1, 3), SUBSTRING(@StringName, LEN(@StringName) - 1, 2))) = 0
            THEN CONCAT(SUBSTRING(@StringName, 1, 3), SUBSTRING(@StringName, LEN(@StringName) - 1, 2)) -- first 3 chars and last 2 chars
        -- possibly other approaches...
    END
So the first one that passes would get accepted. But there may be far better approaches I'm not aware of...
This is only a concept. Is this a good idea?
And what about using this string code as a primary key, as they do in Northwind? Wouldn't a string key slow things down compared to an INT key?

Related

I get the error message "Invalid length parameter passed to the LEFT or SUBSTRING function."

This code should return the street address without the street number. These EU addresses have their street numbers at the end of the address. I am not sure why the error is happening.
UPDATE STAGING_1_1_FACT_CUSTOMERS_B2B_LGP
SET [StreetAddress] = SUBSTRING([Address], 1, PATINDEX('%[1-9]%', [Address])-1)
FROM [dbo].[STAGING_1_1_FACT_CUSTOMERS_B2B_LGP]
WHERE [Country Code] IN ('NL','DE','LT','AT','BE','ES','DK','IT', 'SE', 'CZ', 'SI', 'SUI', 'EE','PL','HU','LIE','FI','LV')
Identify rows without a number in the address:
SELECT * FROM dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP
WHERE PATINDEX('%[1-9]%', [Address]) = 0;
To get the entire address when a number doesn't occur, you can use:
SUBSTRING(Address, 1, COALESCE(NULLIF(
PATINDEX('%[1-9]%', [Address]), 0),LEN(Address)+1)-1)
Which - finding no number - will add 1 to the length so you can still subtract 1 to get the whole string. That's assuming you want the whole string in that case.
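As a quick, hypothetical illustration of that fallback (the VALUES rows are made-up sample input):

```sql
SELECT v.Address,
       SUBSTRING(v.Address, 1,
                 COALESCE(NULLIF(PATINDEX('%[1-9]%', v.Address), 0),
                          LEN(v.Address) + 1) - 1) AS StreetAddress
FROM (VALUES (N'Hauptstr 113A'),
             (N'No digits here')) AS v(Address);
```

The first row returns everything before the first digit; the second has no digit, so PATINDEX returns 0, the NULLIF/COALESCE pair substitutes LEN + 1, and the whole string comes back.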
In order to perform the update you're still going to have to prepare for garbage data that you obviously have (or you wouldn't be here) but that you didn't include in your screenshot (also don't post data as screenshots). Given this sample data:
CREATE TABLE dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP
(
Address nvarchar(255),
StreetNumber nvarchar(255),
StreetAddress nvarchar(255)
);
INSERT dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP(Address)
VALUES(N'Gewerbegebiet 5'),
(N'Spännigweg 1'),
(N'Hauptstr 113A'),
(N'Viale Francesco Redi 39'),
(N'Garbage your code does not handle.'),
(N'More garbage 20th promenade 225 W');
You can run the following update:
; /* <--- ensure previous statement terminated */
WITH src AS
(
SELECT *, FirstNumber =
COALESCE(NULLIF(PATINDEX('%[1-9]%', [Address]), 0),LEN(Address)+1)
FROM dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP
-- WHERE CountryCode IN ('some', 'list');
)
UPDATE src SET
StreetNumber = SUBSTRING(Address, FirstNumber, 255),
StreetAddress = LEFT(Address, FirstNumber-1);
Output (which shows what happens to garbage):
Address                             StreetNumber          StreetAddress
----------------------------------  --------------------  ----------------------------------
Gewerbegebiet 5                     5                     Gewerbegebiet
Spännigweg 1                        1                     Spännigweg
Hauptstr 113A                       113A                  Hauptstr
Viale Francesco Redi 39             39                    Viale Francesco Redi
Garbage your code does not handle.                        Garbage your code does not handle.
More garbage 20th promenade 225 W   20th promenade 225 W  More garbage
Example db<>fiddle
Also, you don't need the FROM line in the update; you're updating the same table.
Finally, the requirement makes little sense to me.
Why do you want StreetAddress to be everything up to but not including the number?
What happens if there is a number in a street name?
If you're trying to clean up address data, there is very expensive software that does this and still isn't perfect, so trying to re-invent the wheel is going to lead to lots of little frustrating issues like this one.

xmlquery returns all values as one long line instead of separate entities

I am trying to query the telephone numbers from the following XML:
<xmlPhoneEntity>
<TelephoneEntity>
<number>123</number>
</TelephoneEntity>
<TelephoneEntity>
<number>456</number>
</TelephoneEntity>
<TelephoneEntity>
<number>789</number>
</TelephoneEntity>
</xmlPhoneEntity>
This XML is stored in my DB; the table looks like this:
id customer_id telephone blabla
1 111 xmlfile
2 222 xmlfile
etc
My SQL query looks like this:
select xmlserialize(xmlquery('/xmlPhoneEntity/TelephoneEntity/number/text()'
    passing telephone) ...)
the response: 123456789
I tried using /nodes() instead of /text() and the result is the same.
How do I separate the values?
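The usual fix is to produce one row per TelephoneEntity rather than serializing the whole sequence of text nodes. Assuming a DB2-style SQL/XML dialect (which the XMLSERIALIZE/XMLQUERY syntax suggests), XMLTABLE can do this; phone_table here is a hypothetical stand-in for the question's table, and the exact PASSING syntax varies slightly between products:

```sql
SELECT t.customer_id, p.phone
FROM phone_table t,
     XMLTABLE('$d/xmlPhoneEntity/TelephoneEntity'
              PASSING t.telephone AS "d"
              COLUMNS phone VARCHAR(20) PATH 'number') AS p;
```

Each TelephoneEntity element becomes its own result row, so 123, 456, and 789 come back as three separate values instead of one concatenated string.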

Very long execution time for queries with IS NOT NULL expression in Derby DB

I have a table named ATTACHMENT with the following columns:
COLUMN_NAME TYPE_NAME COLUMN_SIZE
---------------------------------------------------
DTYPE VARCHAR 31
ID VARCHAR 36
VERSION BIGINT 19
TYPE INTEGER 10
FILENAME VARCHAR 100
DATA BLOB 9437211
SIZE INTEGER 10
CHECKSUM BIGINT 19
AUTHOR VARCHAR 36
FILEDATE DATE 10
FILETIME TIME 8
CREATIONDATE DATE 10
CREATIONTIME TIME 8
FILETYPE INTEGER 10
SYSTEM SMALLINT 5
ORIGINALPICTUREID VARCHAR 36
COMPRESSEDPICTUREID VARCHAR 36
FIRSTUSE VARCHAR 120
And when I run this simple test SQL query:
SELECT ID FROM ATTACHMENT WHERE ORIGINALPICTUREID IS NOT NULL;
this query takes a very long time to execute (about 30 seconds).
But when I run the next test SQL query, with IS NULL instead of IS NOT NULL:
SELECT ID FROM ATTACHMENT WHERE ORIGINALPICTUREID IS NULL;
this query executes in only 2 seconds.
In the real system I have this script:
select ATTACHMENT.ID,
ATTACHMENT.SIZE,
ATTACHMENT.AUTHOR,
ATTACHMENT.FILENAME,
ATTACHMENT.FILETIME,
ATTACHMENT.FILEDATE,
ATTACHMENT.CREATIONDATE,
ATTACHMENT.CREATIONTIME,
ATTACHMENT.FILETYPE,
ATTACHMENT.COMPRESSEDPICTUREID,
ATTACHMENT.ORIGINALPICTUREID,
ATTACHMENT.FIRSTUSE
from ATTACHMENT,
MESSAGECONTENT_ATTACHMENT,
MESSAGECONTENT
where ATTACHMENT.ID not in (select distinct ATTACHMENT.ORIGINALPICTUREID
from ATTACHMENT
where ATTACHMENT.ORIGINALPICTUREID is not null)
and ATTACHMENT.ID not in (select distinct COMPRESSEDPICTUREID
from ATTACHMENT
where ORIGINALPICTUREID is not null)
and MESSAGECONTENT_ATTACHMENT.MESSAGECONTENT_ID = MESSAGECONTENT.ID
and MESSAGECONTENT_ATTACHMENT.ATTACHMENTS_ID = ATTACHMENT.ID
and ATTACHMENT.DTYPE = 'P'
and MESSAGECONTENT.PERSONIDPATIENT = '0584393a-0955-4c9b-98f7-d31c991d22a3'
and (ATTACHMENT.FILENAME like '%jpeg'
or ATTACHMENT.FILENAME like '%jpg'
or ATTACHMENT.FILENAME like '%tiff'
or ATTACHMENT.FILENAME like '%tif'
or ATTACHMENT.FILENAME like '%bmp'
or ATTACHMENT.FILENAME like '%gif'
or ATTACHMENT.FILENAME like '%png'
or ATTACHMENT.FILENAME like '%ser')
and this script takes a very, very long time to execute.
Could you please help me solve this problem with the IS NOT NULL expression in my Derby DB SQL query?
Thank you very much!
You are killing yourself on this query primarily due to your DISTINCT of not-null IDs: you are scanning ALL attachments TWICE (for original and compressed pictures respectively), yet you are only interested in a single patient. I've restructured the query to START with the WHO you want, the PersonIDPatient. From that, join to the message attachments; you only care about what is attached to this ONE PERSON, which should result in a very small set of records. Only for THOSE records do you then look at the attachment table itself and see if any qualify for your DTYPE, LIKE, and NULL conditions.
I would ensure you have an index on your MESSAGECONTENT table on (PersonIDPatient) at a minimum; if the index has other columns after the first position, that's no problem. The joins to the other tables appear to be on their respective primary ID columns, and I assume you have indexes on those.
SELECT
atch.ID,
atch.SIZE,
atch.AUTHOR,
atch.FILENAME,
atch.FILETIME,
atch.FILEDATE,
atch.CREATIONDATE,
atch.CREATIONTIME,
atch.FILETYPE,
atch.COMPRESSEDPICTUREID,
atch.ORIGINALPICTUREID,
atch.FIRSTUSE
FROM
MESSAGECONTENT msgCont
JOIN MESSAGECONTENT_ATTACHMENT msgAtt
ON msgCont.ID = msgAtt.MESSAGECONTENT_ID
JOIN ATTACHMENT atch
ON msgAtt.ATTACHMENTS_ID = atch.ID
AND atch.DTYPE = 'P'
AND atch.ORIGINALPICTUREID IS NOT NULL
AND atch.CompressedPictureID IS NOT NULL
AND ( atch.FILENAME LIKE '%jpeg'
OR atch.FILENAME LIKE '%jpg'
OR atch.FILENAME LIKE '%tiff'
OR atch.FILENAME LIKE '%tif'
OR atch.FILENAME LIKE '%bmp'
OR atch.FILENAME LIKE '%gif'
OR atch.FILENAME LIKE '%png'
OR atch.FILENAME LIKE '%ser')
WHERE
msgCont.PersonIDPatient = '0584393a-0955-4c9b-98f7-d31c991d22a3'
The NOT IN operator generally cannot make use of indexes, so avoid it in your queries where possible.
To find results that do NOT meet a certain criterion, the engine has to check ALL the records against the condition, which makes the presence of indexes largely irrelevant.
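As a sketch of applying that advice to the original script, the first NOT IN subquery could be rewritten with NOT EXISTS (which also sidesteps NOT IN's NULL pitfalls, making the inner IS NOT NULL filter unnecessary):

```sql
select ATTACHMENT.ID
from ATTACHMENT
where not exists (select 1
                  from ATTACHMENT pic
                  where pic.ORIGINALPICTUREID = ATTACHMENT.ID)
```

NOT EXISTS stops at the first matching row per outer record, and many optimizers can turn it into an anti-join against an index on ORIGINALPICTUREID.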
Also, instead of leading-wildcard LIKE '%...' patterns, try making use of full-text indexes and query the database with something like:
SELECT Col1, Col2, ...
FROM Table
WHERE CONTAINS(Col1, 'Search') AND CONTAINS(Col1, 'Search2');

SAS table string length (limit)

I am creating a SAS table in which one of the fields has to hold a huge string.
Following is my table (TABLE name = MKPLOGS4):
OBS  RNID  DESCTEXT
---  ----  --------
1 123 This is some text which is part of the record. I want this to appear
2 123 concatenated kinda like concat_group() from MYSQL but for some reason
3 123 SAS does not have such functionality. Now I am having trouble with
4 123 String concatenation.
5 124 Hi there old friend of mine, hope you are enjoying the weather
6 124 Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
and I have to get a Table similar to this (table name = MKPLOGSA):
OBS  RNID  DESCTEXT
---  ----  --------
1 123 This is some text which is part of the record. I want this to appear concatenated kinda like concat_group() from MYSQL but for some reason SAS does not have such functionality. Now I am having trouble with String concatenation.
2 124 Hi, there old friend of mine, hope you are enjoying the weather Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
So, after trying unsuccessfully with SQL, I came up with the following SAS code (please note I am very new to SAS):
DATA MKPLOGSA (DROP = DTEMP DTEXT);
SET MKPLOGS4;
BY RNID;
RETAIN DTEXT;
IF FIRST.RNID THEN
DO;
DTEXT = DESCTEXT;
DELETE;
END;
ELSE IF LAST.RNID THEN
DO;
DTEMP = CATX(' ',DTEXT,DESCTEXT);
DESCTEXT = DTEMP;
END;
ELSE
DO;
DTEMP = CATX(' ',DTEXT,DESCTEXT);
DTEXT = DTEMP;
DELETE;
END;
The SAS log is producing this warning message:
WARNING: IN A CALL TO THE CATX FUNCTION, THE BUFFER ALLOCATED
FOR THE RESULT WAS NOT LONG ENOUGH TO CONTAIN THE CONCATENATION
OF ALL THE ARGUMENTS. THE CORRECT RESULT WOULD CONTAIN 229 CHARACTERS,
BUT THE ACTUAL RESULT MAY EITHER BE TRUNCATED TO 200 CHARACTER(S) OR
BE COMPLETELY BLANK, DEPENDING ON THE CALLING ENVIRONMENT. THE
FOLLOWING NOTE INDICATES THE LEFT-MOST ARGUMENT THAT CAUSED TRUNCATION.
Followed by the message (for the SAS data step I posted here):
NOTE: ARGUMENT 3 TO FUNCTION CATX AT LINE 100 COLUMN 15 IS INVALID.
Please note that in my sample data table (MKPLOGS4), each line of the DESCTEXT field can be up to 116 characters, and there is no limit on the number of description lines per record ID.
The output I am getting has only the last line of description:
OBS  RNID  DESCTEXT
---  ----  --------
1 123 String concatenation.
2 124 Are you sure this is not your jacket, okay then. Will give charity.
. . .
. . .
. . .
I have the following questions:
. is there something wrong with my code?
. is there a limit to SAS string concatenation? Can I override this? If yes, please provide code.
If you have a suggestion, I would really appreciate it if you could post your version of the code. This is not school work/homework.
Since SAS stores character data as blank-padded fixed length strings, it is usually not a good idea to store a large amount of text in the dataset. However, if you must, then you can create a character type variable with a length of up to 32767 characters. If you don't mind doing some extra I/O, here is an easy way.
/* test data -- same id repeated over multiple observations i.e., in a "long-format" */
data one;
input rnid desctext & :$200.;
cards;
123 This is some text which is part of the record. I want this to appear
123 concatenated kinda like concat_group() from MYSQL but for some reason
123 SAS does not have such functionality. Now I am having trouble with
123 String concatenation.
124 Hi there old friend of mine, hope you are enjoying the weather
124 Are you sure this is not your jacket, okay then. Will give charity.
;
run;
/* re-shape from the long to the wide-format. assumes that data are sorted by rnid. */
proc transpose data=one out=two;
by rnid;
var desctext;
run;
/* concatenate col1, col2, ... vars into single desctext */
data two;
length rnid 8 desctext $1000;
keep rnid desctext;
set two;
desctext = catx(' ', of col:);
run;
The documentation for the catx function specifies that it will (by default) only return 200 characters unless you have already specified a length for the string you are storing the result to.
All you need to do is add either a length or an attrib statement somewhere in your datastep.
Here is how I would have coded it (untested):
data mkplogsa (rename=dtext=desctext);
length dtext $32767 ;
set mkplogs4;
by rnid;
retain dtext;
if first.rnid then do;
dtext = "";
end;
dtext = catx(' ',dtext,desctext);
if last.rnid then do;
output;
end;
keep rnid dtext;
run;
Note that 32767 is the largest string size for a character value in a SAS dataset. If your string is larger than that you're out of luck.
Cheers
Rob
Thanks guys, I was able to solve this problem by using PROC TRANSPOSE and then using concatenation. Here is the code:
/*
THIS TRANSPOSE STEP TAKES THE MKPLOGS4 TABLE AND
CREATES A NEW TEMPORARY TABLE CALLED MKPLOGSA. SINCE
THE DESCRIPTION TEXT IS STORED IN MULTIPLE LINES (OBSERVATIONS)
IN THE ITEXT FILE, IN ORDER TO COMBINE THEM TO A SINGLE ROW,
WE USE TRANSPOSE. HOWEVER, AFTER THIS STEP, THE DESCRIPTION TEXT
SPREAD OVER MULTIPLE LINES ALTHOUGH ON SAME ROW (OBSERVATION)
ARE STILL SEPARATED INTO MULTIPLE COLUMNS (ON THE SAME ROW)
ALL PREFIXED IN THIS CASE BY 'DESCTEXT'. WE DROP THE AUTO-CREATED
COLUMN _NAME_
*/
PROC TRANSPOSE DATA = MKPLOGS4 OUT = MKPLOGSA (DROP = _NAME_)
PREFIX = DESCTEXT;
VAR DESCTEXT;
BY PLOG;
RUN;
/*
THIS DATA STEP CREATES A NEW TABLE CALLED MKPLOGSB WHICH
TAKES ALL THE SEPARATED DESCRIPTION TEXT COLUMNS AND
CONCATENATES THEM INTO A SINGLE COLUMN - LONG_DESCRIPTION.
*/
DATA MKPLOGSB (DROP = DESCTEXT:);
SET MKPLOGSA;
/* CONCATENATED DESC. TEXT SET TO MAX. 27000 CHARS. */
LENGTH LONG_DESCRIPTION $27000;
LONG_DESCRIPTION = CATX(' ',OF DESCTEXT:);
RUN;

VFP insert, index updating

So the main program is in C#, inserting new records into a VFP database table. It was taking too long to generate the next ID for the record via
select max(id)+1 from table
, so I put that code into a compiled DLL in VFP and am calling that COM object from C#.
The COM object returns the new ID in about 250ms. I then just do an update through OLEDB. The problem I am having is that after the COM object returns the newly inserted ID, I cannot immediately find it from C# via the OLEDB
select id from table where id = *newlyReturnedID*
returns 0 rows. If I wait for some unknown period, the query will return 1 row. I can only assume it returns 0 rows immediately because the newly minted ID has not yet been added to the index, so the select cannot find it.
Has anyone else ever run into something similar? If so, how did you handle it?
DD
Warning: your code is flawed in a multi-user environment. Two users could run the query at the same time and get the same ID; one of them will then fail on the INSERT if the column has a primary or candidate key, which is a best practice for key fields.
My recommendation is to either make the ID an auto-incrementing integer field (I'm not a fan of them) or, even better, create a table of keys, with one record per table that gets keys assigned. I use a structure similar to this:
Structure for: countergenerator.dbf
Database Name: conferencereg.dbc
Long table name: countergenerator
Number of records: 0
Last updated: 11/08/2008
Memo file block size: 64
Code Page: 1252
Table Type: Visual FoxPro Table
Field Name Type Size Nulls Next Step Default
----------------------------------------------------------------------------------------------------------------
1 ccountergenerator_pk Character 36 N guid(36)
2 ckey Character (Binary) 50 Y
3 ivalue Integer 4 Y
4 mnote Memo 4 Y "Automatically created"
5 cuserid Character 30 Y
6 tupdated DateTime 8 Y DATETIME()
Index Tags:
1. Tag Name: PRIMARY
- Type: primary
- Key Expression: ccountergenerator_pk
- Filter: (nothing)
- Order: ascending
- Collate Sequence: machine
2. Tag Name: CKEY
- Type: regular
- Key Expression: lower(ckey)
- Filter: (nothing)
- Order: ascending
- Collate Sequence: machine
Now the code for the stored procedure in the DBC (or in another program) is this:
FUNCTION NextCounter(tcAlias)
LOCAL lcAlias, ;
lnNextValue, ;
lnOldReprocess, ;
lnOldArea
lnOldArea = SELECT()
IF PARAMETERS() < 1
lcAlias = ALIAS()
IF CURSORGETPROP("SOURCETYPE") = DB_SRCLOCALVIEW
*-- Attempt to get base table
lcAlias = LOWER(CURSORGETPROP("TABLES"))
lcAlias = SUBSTR(lcAlias, AT("!", lcAlias) + 1)
ENDIF
ELSE
lcAlias = LOWER(tcAlias)
ENDIF
lnOrderNumber = 0
lnOldReprocess = SET('REPROCESS')
*-- Lock until user presses Esc
SET REPROCESS TO AUTOMATIC
IF !USED("countergenerator")
USE EventManagement!countergenerator IN 0 SHARED ALIAS countergenerator
ENDIF
SELECT countergenerator
IF SEEK(LOWER(lcAlias), "countergenerator", "ckey")
IF RLOCK()
lnNextValue = countergenerator.iValue
REPLACE countergenerator.iValue WITH countergenerator.iValue + 1
UNLOCK
ENDIF
ELSE
* Create the new record with the starting value.
APPEND BLANK IN countergenerator
SCATTER MEMVAR MEMO
m.cKey = LOWER(lcAlias)
m.iValue = 1
m.mNote = "Automatically created by stored procedure."
m.tUpdated = DATETIME()
GATHER MEMVAR MEMO
IF RLOCK()
lnNextValue = countergenerator.iValue
REPLACE countergenerator.iValue WITH countergenerator.iValue + 1
UNLOCK
ENDIF
ENDIF
SELECT (lnOldArea)
SET REPROCESS TO lnOldReprocess
RETURN lnNextValue
ENDFUNC
The RLOCK() ensures there is no contention for the records and is fast enough not to bottleneck the process. This is far safer than the approach you are currently taking.
Rick Schummer
VFP MVP
VFP needs to FLUSH its work areas.