Sort varchar with alphanumeric and special character values - SQL

I have an invoice_number field of type varchar(20).
I have the select query as
SELECT Row_Number() OVER(ORDER BY case isnumeric(invoice_number)
when 1 then convert(bigint,invoice_number)
else 99999999999999999999
end) As id,
name,
submit_date,
invoice_number,
invoice_total,
currency_code
FROM vw_invoice_report
which works fine for a few scenarios, but I couldn't make it work for all of the invoice_number values below:
f8ad2a28ddad4f6aa4df
0B849D69741145379079
20190313176617593442
ATOctober2000Promise
00100001010000000061
E285567EF0D0885E9160
SC1805000123000293
1999bernstyin2010
20600006307FFGMG
REVISED INVOICE F...
1111-2222(changzhou)
667339, 667340, 6...
18.12733562GAGA L...
IN-US01235055 ...
SSR-USD/426/2019 - 2
Nanny; Park Doug
184034
376840
376847-1
72692
72691
72690
72689
I am getting "Error converting data type varchar to bigint." for some of the above data. Can someone please help me make it work for this test data?

Your problem is that some of your invoice numbers (for example 20190313176617593442) are too large for the BIGINT data type. You can work around this by keeping the values as strings, left-padding the numeric ones with 0 out to 20 digits for sorting. For example:
SELECT Row_Number() OVER(ORDER BY case isnumeric(invoice_number)
when 1 then REPLACE(STR(invoice_number, 20), ' ', '0')
else '99999999999999999999'
end) As id,
Demo (also showing converted invoice numbers) on SQLFiddle
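Outside SQL, the padding trick is easy to sanity-check. Below is a rough Python analogue of the sort key (note that Python's str.isdigit is only an approximation of T-SQL's ISNUMERIC, which also accepts signs, decimal points, and currency symbols):

```python
SENTINEL = "9" * 20  # non-numeric invoice numbers all sort last

def sort_key(invoice_number: str) -> str:
    # Numeric values sort by magnitude once zero-padded to a fixed width.
    if invoice_number.isdigit():
        return invoice_number.zfill(20)
    return SENTINEL

rows = ["376840", "ATOctober2000Promise", "72692", "20190313176617593442"]
# numeric values come first in numeric order; the rest keep their relative order
print(sorted(rows, key=sort_key))
```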
Update
Based on OP comments and additional values to be sorted, this query should satisfy that requirement:
SELECT Row_Number() OVER(ORDER BY case
when isnumeric(invoice_number) = 1 then RIGHT('00000000000000000000' + REPLACE(invoice_number, '.', ''), 20)
when invoice_number like '%[0-9]-[0-9]%' and invoice_number not like '%[^0-9]' then REPLACE(STR(REPLACE(invoice_number, '-', '.'), 20), ' ', '0')
else '99999999999999999999'
end) As id,
invoice_number
FROM vw_invoice_report
Demo on SQLFiddle
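The extra branch in the updated query effectively sorts hyphenated values such as 376847-1 as the decimal 376847.1. A simplified Python sketch of that two-branch key (it ignores ISNUMERIC's edge cases, so treat it as an illustration, not an exact port):

```python
import re

def sort_key(s: str) -> str:
    if s.isdigit():
        # plain numeric: zero-pad to 20 digits, no fractional part
        return s.zfill(20) + ".0"
    m = re.fullmatch(r"(\d+)-(\d+)", s)
    if m:
        # e.g. "376847-1" sorts just after "376847", as 376847.1 would
        return m.group(1).zfill(20) + "." + m.group(2)
    return "9" * 20  # everything else sorts last

print(sorted(["376847-1", "376847", "376840"], key=sort_key))
```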

Hmmm. I am thinking this might do what you want:
row_number() over (order by (case when isnumeric(invoicenumber) = 1
then len(invoicenumber)
else 99999
end
),
invoicenumber
)

How to write a BigQuery query that produces the count of the unique transactions and the combination of column names populated

I’m trying to write a query in BigQuery that produces the count of the unique transactions and the combination of column names populated.
I have a table:
I have a table with the columns: TRAN CODE, Full Name, Given Name, Surname, DOB, Phone.
The result set I’m after is:
TRAN CODE | UNIQUE TRANSACTIONS | NAME OF POPULATED COLUMNS
A         | 3                   | Full Name
A         | 4                   | Full Name,Phone
B         | 5                   | Given Name,Surname
B         | 10                  | Given Name,Surname,DOB,Phone
The result set shows that for TRAN CODE A
3 distinct customers provided Full Name
4 distinct customers provided Full Name and Phone #
For TRAN CODE B
5 distinct customers provided Given Name and Surname
10 distinct customers provided Given Name, Surname, DOB, Phone #
Currently to produce my results I’m doing it manually.
I tried using ARRAY_AGG but couldn’t get it working.
Any advice would be appreciated.
Thank you.
I think you want something like this:
select tran_code,
array_to_string(array[case when full_name is not null then 'full_name' end,
case when given_name is not null then 'given_name' end,
case when surname is not null then 'surname' end,
case when dob is not null then 'dob' end,
case when phone is not null then 'phone' end
], ','),
count(*)
from t
group by 1, 2
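The logic of this query, build a comma-joined label of the non-null columns per row and then count rows per (tran_code, label), can be sketched in Python (hypothetical column names, mirroring the example data; a set of customer ids would replace the Counter if you need distinct customers rather than rows):

```python
from collections import Counter

COLS = ["Full Name", "Given Name", "Surname", "DOB", "Phone"]

def populated_columns(row: dict) -> str:
    # comma-joined names of the non-null columns, in fixed column order
    return ",".join(c for c in COLS if row.get(c) is not None)

rows = [
    {"TRAN CODE": "A", "Full Name": "Sam"},
    {"TRAN CODE": "A", "Full Name": "Kim", "Phone": "555"},
    {"TRAN CODE": "B", "Given Name": "Ana", "Surname": "Lee"},
]
counts = Counter((r["TRAN CODE"], populated_columns(r)) for r in rows)
for (code, cols), n in sorted(counts.items()):
    print(code, n, cols)
```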
Consider the approach below - no dependency on column names other than TRAN_CODE - quite generic!
select TRAN_CODE,
count(distinct POPULATED_VALUES) as UNIQUE_TRANSACTIONS,
POPULATED_COLUMNS
from (
select TRAN_CODE,
( select as struct
string_agg(col, ', ' order by offset) POPULATED_COLUMNS,
string_agg(val order by offset) POPULATED_VALUES,
string_agg(cast(offset as string) order by offset) pos
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+?)":')) col with offset
join unnest(regexp_extract_all(to_json_string(t), r'"[^"]+?":("[^"]+?"|null)')) val with offset
using(offset)
where val != 'null'
and col != 'TRAN_CODE'
).*
from `project.dataset.table` t
)
group by TRAN_CODE, POPULATED_COLUMNS
order by TRAN_CODE, any_value(pos)
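The trick here is serializing each row to JSON and pairing keys with values via two parallel regexp_extract_all calls. The same extraction can be sketched in Python to see what the regexes do (json.dumps stands in for BigQuery's TO_JSON_STRING):

```python
import json
import re

row = {"TRAN_CODE": "A", "Full_Name": "Sam", "Phone": None, "DOB": "1990"}
j = json.dumps(row)  # like TO_JSON_STRING(t)

cols = re.findall(r'"([^"]+?)":', j)
vals = re.findall(r'"[^"]+?":\s*("[^"]+?"|null)', j)

# keep only populated, non-key columns (mirrors the WHERE in the subquery)
populated = [c for c, v in zip(cols, vals) if v != "null" and c != "TRAN_CODE"]
print(",".join(populated))
```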
(The example output was shown as a screenshot.)
@Gordon_Linoff's solution is the best, but an alternative would be to do the following:
SELECT
TRAN_CODE,
COUNT(TRAN_ROW) AS unique_transactions,
populated_columns
FROM (
SELECT
TRAN_CODE,
TRAN_ROW,
# COUNT(value) AS unique_transactions,
STRING_AGG(field, ",") AS populated_columns
FROM (
SELECT
* EXCEPT(DOB),
CAST(DOB AS STRING ) AS DOB,
ROW_NUMBER() OVER () AS TRAN_ROW
FROM
sample) UNPIVOT(value FOR field IN (Full_name,
Given_name,
Surname,
DOB,
Phone))
GROUP BY
TRAN_CODE,
TRAN_ROW )
GROUP BY
TRAN_CODE,
populated_columns
But this should be more expensive...

Sql Server Splitting column value into email and name

I am new to SQL. I need help separating two values from a column value.
Example column value:
Sam Taylor <Sam.Taylor@gmail.com>
I need 2 columns from that column:
Name       | Email
Sam Taylor | Sam.Taylor@gmail.com
TIA
https://www.db-fiddle.com/f/beu4tdDo4WFAwKXtt9KL8A/0
DECLARE @yourField nvarchar(100) = 'Sam Taylor <Sam.Taylor@gmail.com>'
SELECT SUBSTRING(@yourField, 0, CHARINDEX('<', @yourField)) AS Name,
       SUBSTRING(@yourField, CHARINDEX('<', @yourField) + 1,
                 CHARINDEX('>', @yourField) - CHARINDEX('<', @yourField) - 1) AS Email
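For reference, the same CHARINDEX/SUBSTRING arithmetic in Python: partition on '<' and strip the trailing '>' (assuming every value has the "Name <email>" shape):

```python
def split_contact(value: str) -> tuple[str, str]:
    # "Sam Taylor <Sam.Taylor@gmail.com>" -> ("Sam Taylor", "Sam.Taylor@gmail.com")
    name, _, rest = value.partition("<")
    return name.strip(), rest.rstrip(">").strip()

print(split_contact("Sam Taylor <Sam.Taylor@gmail.com>"))
```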
One way to do this is shown below, assuming you do not have < or > as data in the name.
select datastring,
name=max(case when row=1 then value else null end),
email=max(case when row=2 then value else null end)
from
(
select
datastring,
value=REPLACE(value,'>',''),
row=row_number() over (partition by datastring order by datastring)
from yourtable
cross apply STRING_SPLIT(datastring,'<')
)t
group by datastring
You can use string_split(), but like this:
select t.*, v.*
from t cross apply
(select max(case when s.value not like '%@%>' then trim(s.value) end) as name,
max(case when s.value like '%@%>' then replace(s.value, '>', '') end) as email
from string_split(t.full_email, '<') s
) v;
In older versions, you can use:
select ltrim(rtrim(left(full_email, charindex('<', full_email) - 1))) as name,
replace(stuff(full_email, 1, charindex('<', full_email), ''), '>', '') as email
from t;
Here is a db<>fiddle.

Why does a null value in the table cause an error when using the LEAD function

I am getting this error - Error converting data type nvarchar to numeric.
I have data coming from a table and I need only two values from it, filtering to numeric-only codes (no alphanumeric values, hence isnumeric(covrg_cd) = 1). The input data looks like the first picture. Row 1 will always be null, and the other rows may or may not have data. Because row 1 is always null, the LEAD function throws the error "Error converting data type nvarchar to numeric", even though the rate column is always nvarchar. I am using the LEAD function in SQL to get paybandfrom and paybandto from the Rate column of the input table, and row_number() to get the tier value.
The input table and the expected output were shown as pictures in the original post.
I have my query like this
SELECT a.payband , a.[from] as pybdnfrom, (RIGHT('00000000000000000000' + CAST(A.[TO] AS VARCHAR),20)) AS pybndto , a.tier
FROM (SELECT DISTINCT A.RATE as payband, A.RATE as [from], CASE WHEN TIER <> 4 THEN A.[TO] ELSE 100000000.000 END AS [to], ROW_NUMBER() OVER(ORDER BY RATE) AS TIER
FROM(SELECT DISTINCT A.RATE, LEAD(SUM((CONVERT(NUMERIC(20,3), (A.RATE)))-0.010)) OVER(ORDER BY A.RATE) AS [TO], ROW_NUMBER() OVER(ORDER BY A.RATE) AS TIER
FROM (SELECT DISTINCT BN_RATE_KEY02 as RATE, COVRG_CD AS COVERAGE
from #tmppsRateCost
WHERE ISNUMERIC(COVRG_CD) = 1 AND COVRG_CD = '1')A GROUP BY A.RATE)A)A
ORDER BY 1
Any help would be appreciated.
The error is because the empty string '' cannot be parsed as a number. It is not related to the LEAD.
If you want to keep that approach you can modify your query in this way (I just commented the parts I replaced):
SELECT a.payband
,a.[from] AS pybdnfrom
--,(RIGHT('00000000000000000000' + CAST(A.[TO] AS VARCHAR), 20)) AS pybndto
,CASE WHEN payband = '' THEN '' ELSE (RIGHT('00000000000000000000' + CAST(A.[TO] AS VARCHAR), 20)) END AS pybndto
,a.tier
FROM (
SELECT DISTINCT A.RATE AS payband
,A.RATE AS [from]
,CASE
WHEN TIER <> 5
THEN A.[TO]
ELSE 100000000.000
END AS [to]
,ROW_NUMBER() OVER (
ORDER BY RATE
) AS TIER
FROM (
SELECT DISTINCT A.RATE
--,LEAD(SUM((CONVERT(NUMERIC(20, 3), (A.RATE))) - 0.010)) OVER (
,LEAD(SUM((CONVERT(NUMERIC(20, 3), (CASE WHEN A.RATE = '' THEN '0.010' ELSE A.RATE END))) - 0.010)) OVER (
ORDER BY A.RATE
) AS [TO]
,ROW_NUMBER() OVER (
ORDER BY A.RATE
) AS TIER
FROM (
SELECT DISTINCT BN_RATE_KEY02 AS RATE
,COVRG_CD AS COVERAGE
FROM #tmppsRateCost
WHERE ISNUMERIC(COVRG_CD) = 1
AND COVRG_CD = '1'
) A
GROUP BY A.RATE
) A
) A
ORDER BY 1
Anyway I guess you might have a cleaner approach just by removing the empty line in the initial table.
Get rid of ISNUMERIC() and use TRY_CONVERT() instead of CONVERT(). In this condition:
WHERE ISNUMERIC(COVRG_CD) = 1 AND COVRG_CD = '1'
The ISNUMERIC() is just unneeded because you have an exact string comparison.
SELECT a.payband , a.[from] as pybdnfrom,
(RIGHT('00000000000000000000' + CAST(A.[TO] AS VARCHAR),20)) AS pybndto ,
a.tier
FROM (SELECT DISTINCT A.RATE as payband, A.RATE as [from],
CASE WHEN TIER <> 4 THEN A.[TO] ELSE 100000000.000 END AS [to],
ROW_NUMBER() OVER (ORDER BY RATE) AS TIER
FROM (SELECT DISTINCT A.RATE,
LEAD(SUM((TRY_CONVERT(NUMERIC(20,3), (A.RATE)))-0.010)) OVER (ORDER BY A.RATE) AS [TO],
ROW_NUMBER() OVER (ORDER BY A.RATE) AS TIER
FROM (SELECT DISTINCT BN_RATE_KEY02 as RATE, COVRG_CD AS COVERAGE
FROM #tmppsRateCost
WHERE COVRG_CD = '1'
)A
GROUP BY A.RATE
) A
) A
ORDER BY 1;
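TRY_CONVERT returns NULL instead of raising on an unconvertible string, which is exactly what makes the empty-string row harmless. A Python analogue of that behavior, with Decimal standing in for NUMERIC(20,3):

```python
from decimal import Decimal, InvalidOperation

def try_convert_numeric(s: str):
    # like TRY_CONVERT(NUMERIC(20,3), s): None instead of an error
    try:
        return Decimal(s)
    except InvalidOperation:
        return None

print(try_convert_numeric(""))      # the row that broke plain CONVERT
print(try_convert_numeric("12.5"))
```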

Removing delimiters from a single column of a csv file without opening the file

Here are the contents of the CSV file:
Date_Added|this_flag|Name|DOB|SSN|ID
May 1st, 2015|Y|Jingle|heimerscmidt|19901002|123456789|3
May 1st, 2015|N|Jingleheimerscmidt|19901002|123456789|3
May 5th, 2015|Y|Jon|19901001|012345678|1
May 1st, 2015|N|Jon|19901002|012345678|1
May 1st, 2015|Y|Jacob|19901001|234567890|2
May 5th, 2015|N|Jingleheimerscmidt|19901001|123456789|3
May 1st, 2015|Y|Jingleheimerscmidt|19901001|123456789|3
As you can see in the first data row (Jingle|heimerscmidt), there is a pipe character inside the content, in addition to the pipe-separated columns. I want to remove that pipe from the text without opening the CSV file. Is there a way to solve this problem, either by writing code or by some other approach?
Okay, I know you tagged Oracle, so perhaps you or another Oracle guru can migrate this solution from SQL Server; Oracle is capable of each of these operations.
Normally I would say you want a fast/fancy way of splitting a string, but in this case you need to maintain the ordinal position of the strings between delimiters. So I thought of a way you could do this.
1) First import the CSV into a temp table as all one column. This will be an issue if your CRLF is also found within the Name column... but we will assume it isn't, because you didn't specify it.
2) Build a ROW_NUMBER() on that table to use as a fake primary key, and determine when there are more delimiters than there should be.
3) Use a recursive CTE to split the string into rows, maintaining the ordinal position of each substring in the original string, by which to concatenate later.
4) Determine which rows to group by altering OrdinalPosition by MergeXPositions and generating a DENSE_RANK() based on it.
5) Use conditional aggregation with OrdinalGroup as the column number, then use a concatenation method to combine all OrdinalGroup 3 rows.
DECLARE @CSV as TABLE (LumpedColumns NVARCHAR(MAX))
INSERT INTO @CSV VALUES
('May 1st, 2015|Y|Jingle|he|imerscmidt|19901002|123456789|3')
,('May 1st, 2015|N|Jingleheimerscmidt|19901002|123456789|3')
,('May 5th, 2015|Y|Jon|19901001|012345678|1')
,('May 1st, 2015|N|Jon|19901002|012345678|1')
,('May 1st, 2015|Y|Jacob|19901001|234567890|2')
,('May 5th, 2015|N|Jingleheimerscmidt|19901001|123456789|3')
,('May 1st, 2015|Y|Jingleheimerscmidt|19901001|123456789|3')
;WITH cteFakePrimaryKey AS (
SELECT
LumpedColumns
,CASE WHEN LEN(LumpedColumns) - LEN(REPLACE(LumpedColumns,'|','')) > 5 THEN
LEN(LumpedColumns) - LEN(REPLACE(LumpedColumns,'|','')) - 5 ELSE 0 END as MergeXPositions
,ROW_NUMBER() OVER (ORDER BY (SELECT 0)) as PK
FROM
@CSV
)
, cteRecursive AS (
SELECT
PK
,LumpedColumns
,MergeXPositions
,LEFT(LumpedColumns,CHARINDEX('|',LumpedColumns)-1) as ColValue
,RIGHT(LumpedColumns,LEN(LumpedColumns) - CHARINDEX('|',LumpedColumns)) as Remaining
,1 As OrdinalPosition
FROM
cteFakePrimaryKey
UNION ALL
SELECT
PK
,LumpedColumns
,MergeXPositions
,LEFT(Remaining,CHARINDEX('|',Remaining)-1)
,RIGHT(Remaining,LEN(Remaining) - CHARINDEX('|',Remaining))
,OrdinalPosition + 1
FROM
cteRecursive
WHERE
Remaining IS NOT NULL AND CHARINDEX('|',Remaining) > 0
UNION ALL
SELECT
PK
,LumpedColumns
,MergeXPositions
,Remaining
,NULL
,OrdinalPosition + 1
FROM
cteRecursive
WHERE Remaining IS NOT NULL AND CHARINDEX('|',Remaining) = 0
)
, cteOrdinalGroup AS (
SELECT
PK
,LumpedColumns
,ColValue
,OrdinalPosition
,DENSE_RANK() OVER (PARTITION BY PK ORDER BY
CASE
WHEN OrdinalPosition < 3 THEN OrdinalPosition
WHEN OrdinalPosition > (3 + MergeXPositions) THEN OrdinalPosition
ELSE 3 END ) as OrdinalGRoup
FROM
cteRecursive
)
SELECT
PK
,LumpedColumns
,MAX(CASE WHEN OrdinalGRoup = 1 THEN ColValue END) as Date_Added
,MAX(CASE WHEN OrdinalGRoup = 2 THEN ColValue END) as this_flag
,STUFF(
(SELECT '|' + ColValue
FROM
cteOrdinalGroup g2
WHERE
g1.PK = g2.PK
AND g2.OrdinalGroup = 3
ORDER BY
g2.OrdinalPosition
FOR XML PATH(''))
,1,1,'') as name
,MAX(CASE WHEN OrdinalGRoup = 4 THEN ColValue END) as DOB
,MAX(CASE WHEN OrdinalGRoup = 5 THEN ColValue END) as SSN
,MAX(CASE WHEN OrdinalGRoup = 6 THEN ColValue END) as ID
FROM
cteOrdinalGroup g1
GROUP BY
PK
,LumpedColumns
ORDER BY
PK
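Outside the database the repair is much simpler. A rough Python sketch, under the same assumption the recursive CTE makes, namely that only the Name column (index 2) can contain stray pipes:

```python
EXPECTED_FIELDS = 6  # Date_Added|this_flag|Name|DOB|SSN|ID

def repair_row(line: str) -> list[str]:
    parts = line.split("|")
    extra = len(parts) - EXPECTED_FIELDS
    if extra > 0:
        # fold the surplus pieces back into the Name column, dropping the stray pipes
        parts[2:3 + extra] = ["".join(parts[2:3 + extra])]
    return parts

print(repair_row("May 1st, 2015|Y|Jingle|heimerscmidt|19901002|123456789|3"))
```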
If that's just one case you need to handle and you don't want to modify the file and you need to do that in Informatica, you can change the Input Type session property for this Source Qualifier from File to Command and use sed to do the replacement, like:
cat $$FileName | sed -e 's/Jingle|heimerscmidt/Jingleheimerscmidt/g'
It's not a very nice solution, nor a generic one. But perhaps this will do, or at least give you some ideas.

SQL statement to count each individual NPANXX series

I am writing a query where one field is DIAL_NUMBER.
Some values in that field are 11 digits and some are 10 digits. Where it is 11 digits I need the 2nd to 7th characters, and where it is 10 digits I need the 1st to 6th characters.
Then I need the count of each individual series. I tried the approach below, which gives an error.
Please help me identify the solution.
select dialled number, case
when length(Dialled_Number) = '11' then Substr(Dialled_Number, 2, 7)
else Substr(Dialled_Number, 1, 6)
end
count(*)
from Error_Event
Without knowing your expected results, I imagine you need to use group by with your query.
Perhaps something like this:
select case
when length(Dialled_Number) = 11 then Substr(Dialled_Number, 2, 7)
else Substr(Dialled_Number, 1, 6)
end,
count(*)
from Error_Event
group by case
when length(Dialled_Number) = 11 then Substr(Dialled_Number, 2, 7)
else Substr(Dialled_Number, 1, 6)
end
SQL Fiddle Demo
If it is MS SQL, you can use ROW_NUMBER() and COUNT function to get desired output.
DECLARE @TABLE TABLE(DIAL_NUMBER VARCHAR(20))
INSERT INTO @TABLE
SELECT '81243193812' UNION
SELECT '1829321874' UNION
SELECT '182932'
SELECT NPANXX, [Count] FROM
(
SELECT NPANXX,
COUNT(NPANXX) OVER (PARTITION BY NPANXX) AS [Count], DIAL_NUMBER,
ROW_NUMBER() OVER (PARTITION BY NPANXX ORDER BY DIAL_NUMBER) RN
FROM
(
SELECT DIAL_NUMBER,
CASE
WHEN LEN(DIAL_NUMBER) = 11 THEN SUBSTRING(DIAL_NUMBER,2, 7)
ELSE SUBSTRING(DIAL_NUMBER,1, 6)
END AS NPANXX
FROM @TABLE
) Tmp
)FTMp
WHERE RN = 1
Sql Fiddle
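The extraction rule itself is tiny; here it is in Python with a Counter for the per-series counts. This follows the question's prose (chars 2-7 of an 11-digit number, chars 1-6 of a 10-digit one); note the SQL snippets above take seven characters from position 2, one more than the prose asks for:

```python
from collections import Counter

def npanxx(dial_number: str) -> str:
    # 11 digits: skip the leading country code, take the next six
    if len(dial_number) == 11:
        return dial_number[1:7]
    return dial_number[:6]

numbers = ["81243193812", "1829321874", "182932"]
print(Counter(npanxx(n) for n in numbers))
```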