How to get a specific word from a string column using SQL Server

I have a table.
create table tblProduct
(
ProductID int primary key identity(1000,1),
ProductName varchar(100),
ProductDescription nvarchar(max)
)
In this table, there are 1000 records like this...
ProductID=1001
ProductName='Apple i6'
ProductDescription='Lorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of USA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'
ProductID=1002
ProductName='Micromax Canvas'
ProductDescription='Scrambled it to make a type specimen bookLorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of INDIA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'
ProductID=1003
ProductName='Oppo Z3'
ProductDescription='Lorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of INDIA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'
and so on....
Now I want to extract only the country name (no duplicates and no other words) from the ProductDescription column of tblProduct.
Output must be something like this:
Country Name - Total(Group By)
USA - 1
INDIA - 2
Note: "Product of XXX*" appears in almost all rows of the ProductDescription column.
*XXX is the country name.

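You can use the string manipulation functions LEFT/RIGHT/PATINDEX/CHARINDEX:
WITH cte AS
(
    -- r = everything after 'Product of ' (11 characters, hence the -10
    -- combined with the 1-based position returned by PATINDEX)
    SELECT
        r = RIGHT(ProductDescription, LEN(ProductDescription) -
                  PATINDEX('%Product of%', ProductDescription) - 10)
    FROM #tblProduct
    WHERE PATINDEX('%Product of%', ProductDescription) > 0
)
-- the country is the first word of r; appending ' ' keeps CHARINDEX from
-- returning 0 (and LEFT from failing) when the country is the last word
SELECT country = LEFT(r, CHARINDEX(' ', r + ' ') - 1), COUNT(*) AS Total
FROM cte
GROUP BY LEFT(r, CHARINDEX(' ', r + ' ') - 1);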
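The position arithmetic can be checked outside the database with a minimal Python sketch of the same steps. Python is only a stand-in here, and `country_after_marker` and `count_countries` are hypothetical helper names. Note that `str.find` is 0-based and case-sensitive. PATINDEX is 1-based and, under SQL Server's default case-insensitive collations, case-insensitive.

```python
from collections import Counter

# "Product of " is 11 characters long; the SQL subtracts 10 because PATINDEX is 1-based
MARKER = "Product of "


def country_after_marker(description):
    """Return the first word after 'Product of ', or None if the marker is absent."""
    pos = description.find(MARKER)          # 0-based; -1 when not found
    if pos == -1:
        return None
    rest = description[pos + len(MARKER):]  # plays the role of r = RIGHT(...)
    # like LEFT(r, CHARINDEX(' ', r + ' ') - 1): the text up to the first space
    return (rest + " ").split(" ", 1)[0]


def count_countries(descriptions):
    """Group by country and count, like GROUP BY ... COUNT(*)."""
    # skip rows without the marker (None) and empty results
    return Counter(c for c in map(country_after_marker, descriptions) if c)
```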
But you have to think about corner cases:
- the country name contains multiple words, like Russian Federation
- the same country appears under multiple names (U.S./USA/United States of America...), so you will get multiple groups and need data cleansing
Regarding the note that "Product of XXX" appears in almost all rows of the ProductDescription column: if you know all the countries in advance, it is much easier. Just create a countries table:
name master
'U.S.' 'USA'
'USA' 'USA'
'India' 'India'
SELECT c.master, COUNT(*) AS total
FROM #tblProduct p
JOIN countries c
ON p.ProductDescription LIKE '%Product of ' + c.name + '%'
GROUP BY c.master;
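Here is the same join logic as a Python sketch. It assumes the countries table is given as a list of (name, master) pairs, and `count_by_master` is a hypothetical name. As with the JOIN, every matching (row, country name) pair adds one to the total, so a description that matched two names would be counted twice. The comparison is case-folded to approximate a case-insensitive collation.

```python
from collections import Counter


def count_by_master(descriptions, countries):
    """Count descriptions per master country name.

    countries: list of (name, master) pairs, mirroring the countries table.
    Each (description, name) pair that matches adds one, as the JOIN does.
    """
    totals = Counter()
    for desc in descriptions:
        text = desc.casefold()  # approximate a case-insensitive collation
        for name, master in countries:
            if ("product of " + name).casefold() in text:
                totals[master] += 1
    return totals


# example rows of the countries table
countries = [("U.S.", "USA"), ("USA", "USA"), ("India", "India")]
```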
EDIT: if the country name is wrapped in <strong> tags (Product of <strong>XXX</strong>):
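WITH cte AS
(
    -- 'Product of <strong>' is 19 characters, hence the -18
    SELECT r = RIGHT(ProductDescription, LEN(ProductDescription) - PATINDEX('%Product of <strong>%', ProductDescription) - 18)
    FROM #tblProduct
    WHERE PATINDEX('%Product of <strong>%', ProductDescription) > 0
)
-- the country is everything up to the closing '</strong>' tag;
-- group by the country only (adding r would give one group per row)
SELECT country = LEFT(r, CHARINDEX('<', r) - 1), COUNT(*) AS Total
FROM cte
GROUP BY LEFT(r, CHARINDEX('<', r) - 1);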
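With the tags in place, the country can also be taken as everything between `Product of <strong>` and `</strong>`, which copes with multi-word names such as Russian Federation. The Python sketch below does this with a regular expression. It assumes the markup is exactly as in the query above, and `count_strong_countries` is a hypothetical name.

```python
import re
from collections import Counter

# Assumes the markup is exactly "Product of <strong>XXX</strong>";
# the non-greedy (.*?) stops at the first closing tag.
STRONG_RE = re.compile(r"Product of <strong>(.*?)</strong>", re.IGNORECASE)


def count_strong_countries(descriptions):
    """Count rows per country; rows without the pattern are skipped."""
    totals = Counter()
    for desc in descriptions:
        match = STRONG_RE.search(desc)
        if match:
            totals[match.group(1).strip()] += 1
    return totals
```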

Related

How to extract 4 words before each word of a given list in sql

I have got a table with a column containing text (the column name is 'Text'). There are some acronyms in brackets, so I would like to extract them along with the five words appearing before them.
I have already extracted the rows that contain all the acronyms of my list using the like operator:
select Text from table
where Text like '%(NASA)%'
or Text like '%(NBA)%'
Instead of getting the whole text of each row as output:
Text
He works for the National Aeronautics and Space Administration (NASA).
He played basketball for the National Basketball Association (NBA) from 2000 to 2002.
I would like to get two columns: one for the acronym and another for its meaning (the five words before the acronym):
Acronym Meaning
(NASA) National Aeronautics and Space Administration
(NBA) for the National Basketball Association
Without actually seeing your data, I will assume that all the acronyms follow the same pattern, but you should be able to adapt the code if your strings are structured differently. Here, '(Acronym) meaning' is the structure I'm going to work with.
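-- sample data in the '(Acronym) meaning' layout
select '(NASA) National Aeronautics and Space Administration' as text
into #temp1
union all
select '(FBI) Federal Bureau of Investigation' as text;
-- Acronym: the characters between '(' and ')'
-- meaning: everything from two characters after ')' (skipping the space)
select SUBSTRING(text, CHARINDEX('(', text) + 1, CHARINDEX(')', text) - CHARINDEX('(', text) - 1) as Acronym,
       SUBSTRING(text, CHARINDEX(')', text) + 2, LEN(text) - CHARINDEX(')', text) + 1) as meaning
from #temp1;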
This code takes substrings of the original string using character positions. The characters between the brackets give the acronym, and the characters starting after the closing bracket give the meaning.
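The answer above handles the '(Acronym) meaning' layout. The question's layout puts the meaning first and then the acronym. For that case, one approach is to take the N words just before each acronym. Below is a minimal Python sketch of this. It assumes words are whitespace-separated and that only the first occurrence of each acronym matters; `acronym_meanings` is a hypothetical name.

```python
def acronym_meanings(text, acronyms, n_words=5):
    """Return (acronym, preceding words) pairs for each acronym found in text.

    Only the first occurrence of each acronym is used, and "words" are
    whitespace-separated chunks, so punctuation stays attached to them.
    """
    results = []
    for acronym in acronyms:
        marker = f"({acronym})"
        idx = text.find(marker)
        if idx == -1:
            continue
        words_before = text[:idx].split()
        results.append((marker, " ".join(words_before[-n_words:])))
    return results
```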

Comparing data in 2 tables and printing all columns of the 2nd table using Java

I have 2 tables in SQL Server. My objective is to compare the 1st table's column (TCODES) with the 2nd table's column (ST_Description). If a match is found, I want to pick the data of all the remaining columns in the 2nd table. I am able to get all the data in the console, but I am not able to proceed further. Your earliest response will help me.
1st table data in DB:
#TCODES(column name)#
[SAT,
ZN4963_PROM_01,
/LCLCDP/BVOUTPUT,
/LCLCDP/CSV_IDOC_CRE,
/lclcdp/export_settl,
/lclcdp/itemization,
/lclcdp/TUD,
/LCLCDP/UPLOAD_CSV,
/N/LCLCDP/CSV_IDOC_CRE,
/n/posdw/mon0 ,
/posdw/mon0,
AL11,
ARCU_COIT1,
AS01,
AS02,
2nd table data in DB (columns: Release_Name, Cycle_Name, ST_Description):
#Release_Name# ##Cycle_Name## ###ST_DESCRIPTION ###
-------------------------------------------------------------------------------
Oct Release |SAP Regression
February Release |Non SAP - SIT - Feb2016 |Navigate to SAT Inquiry page
February Release |Non SAP - SIT - Feb2016 |Navigate ARCU_COIT1 inuiry page
February Release |Non SAP - SIT - Feb2016 |Type ASN pertaining to the PO
February Release |Non SAP - SIT - Feb2016 |Select AS01 option in UI
February Release |Non SAP - SIT - Feb2016 |Enter the dock and click next
February Release |Non SAP - SIT - Feb2016 |Type AL11 number ad qty in the
February Release |Non SAP - SIT - Feb2016 | /lclcdp/itemization
February Release |Non SAP - SIT - Feb2016 |Navigate to ASN Inquiry page
February Release |Non SAP - SIT - Feb2016 |Validate the /posdw/mon0
First join the two tables on their common unique column, then use a WHERE clause to apply your condition.
SELECT TABLE2.*
FROM TABLE2
JOIN TABLE1 ON TABLE1.ID = TABLE2.ID
WHERE TABLE1.TCODES = TABLE2.ST_DESCRIPTION
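In the sample data, the TCODE appears as one word inside ST_Description rather than as the whole value, so an equality test may not find it. Below is a minimal Python sketch of a word-level match. It assumes the rows of the 2nd table are available as (Release_Name, Cycle_Name, ST_Description) tuples, and `matching_rows` is a hypothetical name. The same idea could be ported to the Java side.

```python
def matching_rows(tcodes, rows):
    """Return the rows of the 2nd table whose ST_Description contains a TCODE.

    tcodes: values of the TCODES column (surrounding spaces are ignored).
    rows: (Release_Name, Cycle_Name, ST_Description) tuples.
    A row matches when one whitespace-separated word of ST_Description is
    exactly equal to a TCODE, so "SAT" does not match "SATURDAY".
    """
    codes = {code.strip() for code in tcodes if code.strip()}
    return [row for row in rows if codes.intersection(row[2].split())]
```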

Can I sort a query by one column or another?

I have a table with two string columns:
- title can be NULL
- content is required
If I use ORDER BY title, content, the data will be sorted first by title and then by content.
I want to know if it's possible to sort by the two columns "at the same time".
It's hard to describe (that's why I cannot find an answer), so here is an example:
Title | Content
-------|---------
NULL | lorem
-------|---------
NULL | ipsum
-------|---------
dolor | test
-------|---------
sit | test
This should result in:
Title | Content
-------|---------
dolor | test
-------|---------
NULL | ipsum
-------|---------
NULL | lorem
-------|---------
sit | test
In fact, my problem is that in my view, title and content are displayed in the same column (if title is not null it is shown, otherwise I use content). There is a sort feature on this column, and I cannot find a way to make it sort correctly.
Use COALESCE: it returns the first non-null value among its arguments.
order by coalesce(title, content)
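For illustration only, here is the same ordering as a Python sketch. The sort key is the title when it is not None and the content otherwise. Python compares strings by code point, so unlike a case-insensitive SQL collation it would put uppercase before lowercase.

```python
def sort_by_title_or_content(rows):
    """Sort (title, content) pairs like ORDER BY COALESCE(title, content)."""
    return sorted(rows, key=lambda row: row[0] if row[0] is not None else row[1])


rows = [(None, "lorem"), (None, "ipsum"), ("dolor", "test"), ("sit", "test")]
print(sort_by_title_or_content(rows))
# [('dolor', 'test'), (None, 'ipsum'), (None, 'lorem'), ('sit', 'test')]
```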

How to order String based on the first letter of each word in string using SQL query

I want to sort strings based on words that start with the letter T.
First, all strings whose first word starts with T should be displayed, then those in which a word other than the first starts with T, each group in alphabetical order.
Input (strings that contain a word starting with T):
Catering Truck
Ice Cream Truck
Tank Hauler
Trade Contractor
Pizza Time
Expected output after sorting:
Tank Hauler
Trade Contractor
Catering Truck
Ice Cream Truck
Pizza Time
select *
from tableName
order by
    case when columnName like 'T%' then 0 else 1 end,
columnName
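The values in the CASE expression matter because ORDER BY sorts ascending by default, so the group that should come first must get the lower value. Here is a Python sketch of the same two-part key. It upper-cases the string to mimic a case-insensitive LIKE, and `sort_t_first` is a hypothetical name.

```python
def sort_t_first(values):
    """Strings starting with T first, then the rest, each group alphabetically."""
    # 0 sorts before 1, so the T group gets 0 (same role as the CASE expression)
    return sorted(values, key=lambda s: (0 if s.upper().startswith("T") else 1, s))


items = ["Catering Truck", "Ice Cream Truck", "Tank Hauler", "Trade Contractor", "Pizza Time"]
print(sort_t_first(items))
# ['Tank Hauler', 'Trade Contractor', 'Catering Truck', 'Ice Cream Truck', 'Pizza Time']
```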

Sort Postcode for menu/list

I need to sort a list of UK postcodes into order.
Is there a simple way to do it?
UK postcodes are made up of letters and numbers; for full details of the format, see:
http://en.wikipedia.org/wiki/UK_postcodes
My problem is that a simple alphabetical sort doesn't work. Each code starts with one or two letters, immediately followed by a number of up to two digits, then a space, another number and then letters, e.g. LS1 1AA or ls28 1AA. There is also another case where, once the numbers in the first section exceed 99, it continues 9A etc.
An alphabetical sort causes the 10s to immediately follow the 1:
...
LS1 9ZZ
LS10 1AA
...
LS2
I'm looking at creating a SQL function to convert the printable postcode into a sortable one (e.g. 'LS1 9ZZ' would become 'LS01 9ZZ') and then using this function in the ORDER BY clause.
Has anybody done this or anything similar already?
You need to think of this as a tokenization issue, so SW1A 1AA should tokenize to:
SW
1
A
1AA
(although you could break the inward part down into 1 and AA if you wanted to)
and G12 8QT should tokenize to:
G
12
(empty string)
8QT
Once you have broken the postcode down into those component parts, sorting should be easy enough. There is an exception with the GIR 0AA postcode, but you can just hardcode a test for that one.
edit: some more thoughts on tokenization
For the sample postcode SW1A 1AA, SW is the postcode area, 1A is the postcode district (which we'll break into two parts for sorting purposes), 1 is the postcode sector and AA is the unit postcode.
These are the valid postcode formats (source: Royal Mail PAF user guide, page 8):
AN NAA
AAN NAA
ANN NAA
ANA NAA
AAA NAA (only for GIR 0AA code)
AANN NAA
AANA NAA
So a rough algorithm would be as follows, assuming we want to separate the sector and unit postcode. Lengths are counted without the space.
code = GIR 0AA? Tokenize to GI/R/ /0/AA (treating R as the district simplifies things)
code 5 characters long, e.g. G1 3AF? Tokenize to G/1/ /3/AF
code 6 characters long with the 3rd character a letter, e.g. W1P 1HQ? Tokenize to W/1/P/1/HQ
code 6 characters long with the 2nd character a letter, e.g. CR2 6XH? Tokenize to CR/2/ /6/XH
code 7 characters long with the 4th character a letter, e.g. EC1A 1BB? Tokenize to EC/1/A/1/BB
otherwise, e.g. TW14 2ZZ, tokenize to TW/14/ /2/ZZ
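The rules above translate fairly directly into code. Below is a minimal Python sketch that assumes the input is already a valid postcode, since no validation is done. `tokenize_postcode` and `postcode_sort_key` are hypothetical names. The sort key compares the district number as an integer so that W2 comes before W10.

```python
def tokenize_postcode(postcode):
    """Split a UK postcode into (area, district number, district letter, sector, unit).

    Follows the length rules above; lengths are counted without the space.
    Assumes the input is already a valid postcode (no validation is done).
    """
    code = postcode.upper().replace(" ", "")
    if code == "GIR0AA":                      # special case: R treated as the district
        return ("GI", "R", "", "0", "AA")
    outward, sector, unit = code[:-3], code[-3], code[-2:]
    n = len(code)
    if n == 5:                                # AN NAA, e.g. G1 3AF
        return (outward[0], outward[1], "", sector, unit)
    if n == 6 and outward[2].isalpha():       # ANA NAA, e.g. W1P 1HQ
        return (outward[0], outward[1], outward[2], sector, unit)
    if n == 6 and outward[1].isalpha():       # AAN NAA, e.g. CR2 6XH
        return (outward[:2], outward[2], "", sector, unit)
    if n == 7 and outward[3].isalpha():       # AANA NAA, e.g. EC1A 1BB
        return (outward[:2], outward[2], outward[3], sector, unit)
    # otherwise ANN NAA or AANN NAA, e.g. TW14 2ZZ: the area is the leading letters
    area_len = 2 if outward[1].isalpha() else 1
    return (outward[:area_len], outward[area_len:], "", sector, unit)


def postcode_sort_key(postcode):
    area, number, letter, sector, unit = tokenize_postcode(postcode)
    # compare the district number numerically so W2 sorts before W10;
    # GIR 0AA's "R" district has no number, so it gets -1 (an arbitrary choice)
    num = int(number) if number.isdigit() else -1
    return (area, num, letter, sector, unit)
```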
If the purpose is to display a list of postcodes for the user to choose from, I would adopt Neil Butterworth's suggestion of storing a 'sortable' version of the postcode in the database. The easiest way to create a sortable version is to pad all entries to nine characters:
two characters for the area (right-pad if shorter)
two for the district number (left-pad if shorter)
one for the district letter (pad if missing)
space
one for the sector
two for the unit
GIR 0AA is a slight exception again. If you pad with spaces, the sort order should be correct. Examples, using # to represent a space:
W1#1AA => W##1##1AA
WC1#1AA => WC#1##1AA
W10#1AA => W#10##1AA
W1W#1AA => W##1W#1AA
GIR#0AA => GI#R##0AA
WC10#1AA => WC10##1AA
WC1W#1AA => WC#1W#1AA
You need to right-pad the area if it's too short, because left-padding produces the wrong sort order. All of the single-letter areas (B, E, G, L, M, N, S, W) would sort before all of the two-letter areas (AB, AL, ..., ZE) if you left-padded.
The district number needs to be left-padded to ensure that the natural W1, W2, ..., W9, W10 order remains intact.
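Here is a minimal Python sketch of the nine-character padding scheme, assuming the formats listed above. The regular expression is deliberately loose and does not reject every invalid postcode. `sortable_postcode` is a hypothetical name. In a database, its result would be the 'sortable' column to store and index.

```python
import re

# area (1-2 letters), district number (1-2 digits), optional district letter,
# optional space, sector (1 digit), unit (2 letters)
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d{1,2})([A-Z]?) ?(\d)([A-Z]{2})$")


def sortable_postcode(postcode):
    """Return the nine-character, space-padded form described above."""
    pc = " ".join(postcode.upper().split())      # normalise case and spacing
    if pc.replace(" ", "") == "GIR0AA":           # the special case
        return "GI R  0AA"
    match = POSTCODE_RE.match(pc)
    if match is None:
        raise ValueError(f"unrecognised postcode: {postcode!r}")
    area, number, letter, sector, unit = match.groups()
    # right-pad the area, left-pad the district number, pad a missing district letter
    return f"{area:<2}{number:>2}{letter:<1} {sector}{unit}"


print(sorted(["LS10 1AA", "LS1 9ZZ", "LS2 1AA"], key=sortable_postcode))
# ['LS1 9ZZ', 'LS2 1AA', 'LS10 1AA']
```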
I know this is a couple of years late, but I have just run into this problem too.
I managed to overcome it with the following code, so I thought I would share it, as I searched the internet and could not find anything!
mysql_query("SELECT SUBSTRING_INDEX(postcode,' ',1) as p1, SUBSTRING_INDEX(postcode,' ',-1) as p2 from `table` ORDER BY LENGTH(p1), p1, p2 ASC");
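This code takes a full UK postcode and splits it into two parts at the space.
It then orders by the length of the first part, then by the first part itself, and then by the second part. Because LENGTH(p1) comes first, shorter outward codes sort before longer ones even across different areas (e.g. W1 before AB1), so it sorts as intended mainly when the postcodes share an area.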
I'd be tempted to store the normalised postcode in the database along with the real postcode - that way you only do the string manipulation once, and you can use an index to help you with the sort.