Extracting Multiple Numerical Values from Text - sql

048(70F-Y),045(DDI-Y),454(CMDE-Y)
I have the above data in a column field. I need to extract each number that comes before a (, so in the above example I would want to see 048, 045, 454.
Note that the data in the field changes from record to record: the example above has 3 sets of numbers, but sometimes there is just one set, or 6 sets. I just need to capture every set of numbers that sits to the left of a (.
Ideally I would like the results to show in a new column like below. I have tried a few things and gotten nowhere; any help would be greatly appreciated.
I would expect the result to look like the below:
+----------+-----------------------------------+---------------+
| EventId | PAEditTypes | Edits |
+----------+-----------------------------------+---------------+
| 6929107 | 082(SPA-Y),177(QL-Y) | 082, 177 |
| 26534980 | 048(70F-Y),045(DDI-Y),454(CMDE-Y) | 045, 048, 454 |
+----------+-----------------------------------+---------------+

You can get the desired output with the following steps:
1. use STRING_SPLIT with CROSS APPLY to isolate each item
2. use LEFT to keep only the first part of each item, together with CHARINDEX to know where to stop
3. use STRING_AGG to build the final result, adding a WITHIN GROUP clause to enforce ordering (if ordering is not important, just remove the WITHIN GROUP clause)
Note that STRING_SPLIT requires SQL Server 2016 or later and STRING_AGG requires SQL Server 2017 or later.
This is a TSQL sample that should work:
declare @tmp table ( EventId varchar(50), PAEditTypes varchar(200) )
insert into @tmp values
('6929107' ,'082(SPA-Y),177(QL-Y)' )
,('26534980','048(70F-Y),045(DDI-Y),454(CMDE-Y)')
select
EventId
, PAEditTypes
-- keep the text before '(' in each item and join the pieces in order
, STRING_AGG(left(value,CHARINDEX('(',value)-1),', ') WITHIN GROUP (ORDER BY value ASC) as Edits
from
@tmp
cross apply
string_split(PAEditTypes, ',')
group by
EventId
, PAEditTypes
order by
EventId desc
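For checking the expected output outside SQL Server, here is a minimal Python sketch of the same three steps (split on commas, keep the text left of each (, aggregate in ascending order). The function name extract_edits is hypothetical, and the sketch assumes the codes are fixed-width, so that sorting the codes gives the same order as the T-SQL ordering by the full item.

```python
def extract_edits(pa_edit_types):
    """Return the codes left of each '(' as a sorted, comma-separated string."""
    codes = []
    for item in pa_edit_types.split(","):   # step 1: like STRING_SPLIT(..., ',')
        item = item.strip()
        paren = item.find("(")              # like CHARINDEX('(', value), but -1 if absent
        # step 2: like LEFT(value, CHARINDEX('(', value) - 1); keep whole item if no '('
        codes.append(item[:paren] if paren >= 0 else item)
    return ", ".join(sorted(codes))         # step 3: like STRING_AGG ... WITHIN GROUP


rows = [
    ("6929107", "082(SPA-Y),177(QL-Y)"),
    ("26534980", "048(70F-Y),045(DDI-Y),454(CMDE-Y)"),
]
for event_id, types in rows:
    print(event_id, types, extract_edits(types))
```

One design difference: in T-SQL, LEFT(value, CHARINDEX('(', value) - 1) raises an error when an item has no (, because the length becomes -1; this sketch keeps such an item whole instead, which is an assumption about the desired behavior.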

Related

Get Distinct value from a list in SQL Server

I have a DB column that has a comma delimited list:
VALUES   | ID
---------+----
1,11,32  | A
11,12,28 | B
1        | C
32,12,1  | D
When I run my SQL statement, I have tried IN, CONTAINS and LIKE in my WHERE clause, with varying degrees of errors and success, but none returns exactly what I need.
What I need is a WHERE clause that finds all IDs whose list contains the value '1' as a whole item (NOT the digit 1 inside another number).
Example of problem:
WHERE values like (1)
This will return A,B,C,D because 1 is included in the value (11). I would expect IDs (A,C,D).
WHERE values like (2)
This will return A,B,D because 2 is included in the values (32,28,12). I would expect zero records.
Thanks in advance for your help!
I will begin my answer by quoting the spot-on comment given by @Jarlh above:
Never, ever store data as comma separated items. It will only cause you lots of trouble.
That being said, if you're really stuck with this design, you could use:
SELECT *
FROM yourTable
WHERE ',' + [VALUES] + ',' LIKE '%,1,%';
The trick here is to convert every VALUES entry into something like:
,11,12,28,
Then we can search for a target number with comma delimiters on both sides. Since we placed commas at both ends, every number in the CSV list is now guaranteed to have commas around it.
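To see the comma-wrapping trick in action without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. It is SQLite, not SQL Server: concatenation is || instead of +, and the table name lists and column name vals are hypothetical stand-ins for the question's table and its VALUES column.

```python
import sqlite3

# In-memory stand-in for the question's table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lists (vals TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO lists VALUES (?, ?)",
    [("1,11,32", "A"), ("11,12,28", "B"), ("1", "C"), ("32,12,1", "D")],
)


def ids_containing(item):
    # Wrap both the stored list and the search item in commas, so only whole
    # items match: ',1,' is found in ',1,11,32,' but not in ',11,12,28,'.
    sql = (
        "SELECT id FROM lists "
        "WHERE ',' || vals || ',' LIKE '%,' || ? || ',%' "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (item,))]


print(ids_containing("1"))  # ['A', 'C', 'D']
print(ids_containing("2"))  # []
```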
If you are stuck with such a poor data model, I would suggest:
select t.*
from t
where exists (select 1
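from string_split(t.[values], ',') s  -- VALUES is a reserved word, so bracket it
where s.value = '1'                   -- compare as a string to avoid int conversion errors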
);
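SQLite has no STRING_SPLIT, so as a rough stand-in for the EXISTS approach above, this sketch (Python's sqlite3 module, hypothetical table lists with columns vals and id) splits each list with a recursive CTE and then checks for an exact item match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lists (vals TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO lists VALUES (?, ?)",
    [("1,11,32", "A"), ("11,12,28", "B"), ("1", "C"), ("32,12,1", "D")],
)

QUERY = """
WITH RECURSIVE split(id, item, rest) AS (
    -- seed row: nothing split yet; a trailing comma simplifies the recursion
    SELECT id, '', vals || ',' FROM lists
    UNION ALL
    -- peel off the text before the first comma and keep the remainder
    SELECT id,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT l.id
FROM lists l
WHERE EXISTS (SELECT 1 FROM split s WHERE s.id = l.id AND s.item = ?)
ORDER BY l.id
"""


def ids_with_item(item):
    return [row[0] for row in conn.execute(QUERY, (item,))]


print(ids_with_item("1"))  # ['A', 'C', 'D']
print(ids_with_item("2"))  # []
```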
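I echo exactly what jarlh and Tim say: a relational model is not the right place to store comma-delimited strings in a table.
Here is an approach using CROSS APPLY with STRING_SPLIT. Note that the filter runs on the split values, so an index on column x can at best be scanned, not used to seek directly to the matching rows: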
select distinct x
from t
cross apply string_split(t.x,',')
where value = '1' /* you may parameterize this value */
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9b3084f52b0f42ffd17d90427016999
--Older SQL Server versions (before 2016, no STRING_SPLIT): split the list via XML instead
with data
as (
SELECT t.c.value('.', 'VARCHAR(1000)') as val
,y
,x
FROM (
SELECT x1 = CAST('<t>' +
REPLACE(x , ',', '</t><t>') + '</t>' AS XML)
,y
,x
FROM t
) a
CROSS APPLY x1.nodes('/t') t(c)
)
select distinct x
from data
where val = '1'
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=011a096bbdd759ea5fe3aa74b08bc895

How to retrieve records using comma separated values with IN clause?

I would like to retrieve certain records from a table. Here I am using comma-separated values with an IN clause. The table rows look like this:
Here is my SQL query, but it completes with an empty result set:
DECLARE @input VARCHAR(1000) = '2,3,17,10,16'
SELECT * FROM locations
WHERE
east_zone in (SELECT VALUE FROM string_split(@input,','))
OR
west_zone in (SELECT VALUE FROM string_split(@input,','))
Appreciate your help!
While this can be accomplished, I would urge you to rethink your data model. It's a bad idea to store a comma-separated list of IDs/references in your database. I strongly agree with the comments of Tim Biegeleisen.
An alternative would be to store the list of zones in a separate table, one row per zone.
Here is a way to accomplish this:
with data
as (select 'model_check_holding' as col1,'1,2,3,4,5' as str union all
select 'model_cash_holding' as col1,'5,8,9' as str
)
,split_data
as (select *
from data
cross apply string_split(str,',')
)
,user_input
as(select '2,8,1' as input_val)
select *
from split_data
where value in (select x.value
from user_input
cross apply string_split(input_val,',') x
)
+---------------------+-----------+-------+
| col1 | str | value |
+---------------------+-----------+-------+
| model_check_holding | 1,2,3,4,5 | 1 |
| model_check_holding | 1,2,3,4,5 | 2 |
| model_cash_holding | 5,8,9 | 8 |
+---------------------+-----------+-------+
dbfiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1cc9b224e443369744df19c1d7a7d789
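The logic of the CTE above (split each stored list, split the user input, keep the matching items) can be mirrored in a few lines of plain Python, which makes it easy to check against the result table. This is only an illustration of the matching step with the same sample rows; the SQL query itself guarantees no row order without an ORDER BY.

```python
# Same sample rows as the "data" CTE, plus the "user_input" string.
data = [("model_check_holding", "1,2,3,4,5"), ("model_cash_holding", "5,8,9")]
user_input = "2,8,1"

# Split the input once, like the STRING_SPLIT inside the IN (...) subquery.
wanted = set(user_input.split(","))

result = [
    (col1, s, value)
    for col1, s in data
    for value in s.split(",")   # like CROSS APPLY string_split(str, ',')
    if value in wanted          # like WHERE value IN (...)
]
for row in result:
    print(row)
```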
Tim is 110% correct. Your data model is totally messed up: it is not only storing multiple values in a delimited string, but also storing numbers as strings. Wrong, wrong, wrong.
But if you are stuck with someone else's really, really, really bad design choices, you do have an option:
DECLARE @input VARCHAR(1000) = '2,3,17,10,16';
SELECT l.*
FROM locations l
WHERE EXISTS (SELECT 1
FROM string_split(@input, ',') s1 JOIN
string_split(concat(l.east_zone, ',', l.west_zone), ',') s2
ON s1.value = s2.value
);
I do not recommend this approach. I merely suggest it as a stop-gap until you can fix the data model.

Why doesn't this SQL sub query statement work?

The statement produces the following error.
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
I presume I somehow need to concatenate the field names in the subquery?
SELECT (
SELECT COALESCE(Table_Field, Field) AS Fields
FROM API_Objects_Fields
WHERE Field IN (
'fullname'
,'confirmed'
,'primary_email'
,'location_short'
)
)
FROM user_basics U
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
WHERE PM.PodID = 164
ORDER BY U.Ctime DESC
The subquery specifies the fields to be returned from the table.
DECLARE @Name VARCHAR(1000)
-- append each matching Table_Field, separated by ';'
Select @Name =
COALESCE(@Name,'') + Table_Field + ';'
FROM API_Objects_Fields
WHERE Field IN
( 'fullname' ,'confirmed' ,'primary_email' ,'location_short' )
Select @Name As FieldName
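The variable-concatenation answer above collapses several rows into one string; in SQL Server 2017 and later, STRING_AGG does the same in a set-based way. As a sketch of the idea, here it is with SQLite's group_concat via Python's sqlite3 module, using made-up sample rows. Because the aggregated subquery yields a single value, it is a valid scalar subquery in the SELECT list. The order of the concatenated items is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE API_Objects_Fields (Table_Field TEXT, Field TEXT)")
conn.executemany(
    "INSERT INTO API_Objects_Fields VALUES (?, ?)",
    [("Alan Turing", "fullname"), ("True", "confirmed"),
     ("MN", "location_short"), ("123-456-7890", "phone_number")],
)

# group_concat turns the matching rows into ONE value, so the subquery
# returns a single value instead of several rows.
row = conn.execute("""
    SELECT (SELECT group_concat(COALESCE(Table_Field, Field), ';')
            FROM API_Objects_Fields
            WHERE Field IN ('fullname', 'confirmed',
                            'primary_email', 'location_short')) AS Fields
""").fetchone()
fields = row[0]
print(fields)  # the three matching values joined with ';' (order not guaranteed)
```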
@akfkmupiwu, for the comment above you need to do it like this:
WITH CTE AS
(SELECT (
SELECT DISTINCT TOP 1 COALESCE(Table_Field, Field)
FROM API_Objects_Fields F
WHERE F.UserID = PM.UserID AND F.Field IN (
'fullname'
,'confirmed'
,'primary_email'
,'location_short'
)
)AS Fields,
U.Ctime,
ROW_NUMBER()OVER (PARTITION BY Table_Field ORDER BY FIELD)AS RN
FROM user_basics U
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
WHERE PM.PodID = 164
)
-- ORDER BY is not allowed inside a CTE (without TOP/OFFSET), so sort the final result
Select * from CTE WHERE RN = 1
ORDER BY Ctime DESC
This query is based on assumptions drawn from your question.
What the error is telling you
The problem with your query is exactly what the error says: it brings back more than one result. Since your subquery is in the SELECT portion of the outer query (as opposed to the FROM or the WHERE), SQL is looking for exactly one value to populate that column. Think of it in terms of filling in an Excel spreadsheet: you cannot put two separate values into one cell. Instead, the data needs to go into separate rows.
On another note, COALESCE checks whether the first value is null; if it is, it returns the second value, and if it is not, the first value is returned. It sounds to me like this is not the behavior you are looking for.
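A quick sqlite3 sketch of that COALESCE behavior, with hypothetical rows: it picks one value per row (the first non-NULL argument) rather than combining or listing several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: the second one has no Table_Field.
conn.execute("CREATE TABLE API_Objects_Fields (Table_Field TEXT, Field TEXT)")
conn.executemany(
    "INSERT INTO API_Objects_Fields VALUES (?, ?)",
    [("Alan Turing", "fullname"), (None, "confirmed")],
)

# COALESCE returns Table_Field when it is not NULL, otherwise Field.
picked = [r[0] for r in conn.execute(
    "SELECT COALESCE(Table_Field, Field) FROM API_Objects_Fields ORDER BY rowid"
)]
print(picked)  # ['Alan Turing', 'confirmed']
```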
How to fix this
You need to either change your query to pull back different rows for each of the possible values that Fields can be or you need to find a way to specify only one value to return for Fields. Since I am unsure what you are looking for, I am going to demonstrate the first way of solving this.
Data
Your question does not provide any data for API_Objects_Fields, so I am going to make some up. Let's assume the columns in this table are Field_ID, Table_Field, and Field and let's say that your table looks like this:
Field_ID | Table_Field | Field
1 | Alan Turing | fullname
2 | Catherine Zeta Jones | fullname
3 | True | confirmed
4 | MN | location_short
5 | 123-456-7890 | phone_number
As I mentioned before, right now your query would try to pull back all of the rows where the field is fullname, confirmed, or location_short into a single value. Instead of trying to stuff 4 results into one column of one row, let's change your query to bring back 4 rows:
The Query
SELECT f.Table_Field, Field
FROM user_basics U
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
INNER JOIN (
SELECT Table_Field, Field
FROM API_Objects_Fields
WHERE Field IN (
'fullname'
,'confirmed'
,'primary_email'
,'location_short'
)
) f ON 1 = 1 -- placeholder: nothing relates f to the other tables yet
WHERE PM.PodID = 164
ORDER BY U.Ctime DESC
What will happen
This query will now pull back data that looks more like this:
Table_Field | Field
Alan Turing | fullname
Catherine Zeta Jones | fullname
True | confirmed
MN | location_short
However, I think you will be surprised by the results you actually end up with. Since the query does not connect the data in API_Objects_Fields with any other table, you would get the values from the results table above over and over again. In fact, you would get the values above for every single row returned by
Select *
From user_basics u
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
WHERE PM.PodID = 164
If this query returns 12 results, you would end up with 12 Alan Turings, 12 Catherine Zeta Jones, 12 Trues, and 12 MNs. If this is not the result you are looking for, you will need to add an ON portion to the inner join so the results from f are connected with the other tables.
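The row multiplication described above can be reproduced with a small sqlite3 sketch. The table pod_rows is a hypothetical stand-in for the 12 rows returned by the user_basics/Pod_Membership query, and f holds the four made-up API_Objects_Fields rows; a join with no condition relating them behaves like a cross join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pod_rows (UserID INTEGER)")      # hypothetical: 12 user rows
conn.execute("CREATE TABLE f (Table_Field TEXT, Field TEXT)")
conn.executemany("INSERT INTO pod_rows VALUES (?)", [(i,) for i in range(12)])
conn.executemany(
    "INSERT INTO f VALUES (?, ?)",
    [("Alan Turing", "fullname"), ("Catherine Zeta Jones", "fullname"),
     ("True", "confirmed"), ("MN", "location_short")],
)

# Nothing links the two tables, so every user row pairs with every f row.
total = conn.execute("SELECT COUNT(*) FROM pod_rows CROSS JOIN f").fetchone()[0]
turings = conn.execute(
    "SELECT COUNT(*) FROM pod_rows CROSS JOIN f WHERE f.Table_Field = 'Alan Turing'"
).fetchone()[0]
print(total, turings)  # 48 12
```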

SQL Query to remove cyclic redundancy

I have a table that looks like this:
Column A | Column B | Counter
---------------------------------------------
A | B | 53
B | C | 23
A | D | 11
C | B | 22
I need to remove the last row because it's cyclic to the second row. Can't seem to figure out how to do it.
EDIT
There is an indexed date field. This is for Sankey diagram. The data in the sample table is actually the result of a query. The underlying table has:
date | source node | target node | path count
The query to build the table is:
SELECT source_node, target_node, COUNT(1)
FROM sankey_table
WHERE TO_CHAR(data_date, 'yyyy-mm-dd')='2013-08-19'
GROUP BY source_node, target_node
In the sample, the last row C to B is going backwards and I need to ignore it or the Sankey won't display. I need to only show forward path.
Removing all edges from your graph where the tuple (source_node, target_node) is not ordered alphabetically and the symmetric row exists should give you what you want:
DELETE
FROM sankey_table t1
WHERE source_node > target_node
AND EXISTS (
SELECT NULL from sankey_table t2
WHERE t2.source_node = t1.target_node
AND t2.target_node = t1.source_node)
If you don't want to DELETE them, just use this WHERE clause in your query for generating the input for the diagram.
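Here is the same delete run against the sample rows from the question, using Python's sqlite3 module as a stand-in for Oracle. The table name edges is hypothetical, and the delete is applied to the aggregated rows rather than to the underlying sankey_table. Without an alias on the DELETE target, the subquery refers to the outer row by its table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source_node TEXT, target_node TEXT, counter INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [("A", "B", 53), ("B", "C", 23), ("A", "D", 11), ("C", "B", 22)],
)

# Delete an edge only if it is "backwards" (source sorts after target)
# AND its mirror-image edge exists.
conn.execute("""
    DELETE FROM edges
    WHERE source_node > target_node
      AND EXISTS (SELECT NULL FROM edges t2
                  WHERE t2.source_node = edges.target_node
                    AND t2.target_node = edges.source_node)
""")
remaining = conn.execute(
    "SELECT source_node, target_node, counter FROM edges "
    "ORDER BY source_node, target_node"
).fetchall()
print(remaining)  # [('A', 'B', 53), ('A', 'D', 11), ('B', 'C', 23)]
```

A backwards edge with no mirror (say D to A on its own) is kept, because the EXISTS condition fails; only genuine two-way cycles are trimmed.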
If you can adjust how your table is populated, you can change the query you're using to retrieve only the values for the first direction (for that date) in the first place, with a little analytic manipulation:
SELECT source_node, target_node, counter FROM (
SELECT source_node,
target_node,
COUNT(*) OVER (PARTITION BY source_node, target_node) AS counter,
RANK () OVER (PARTITION BY GREATEST(source_node, target_node),
LEAST(source_node, target_node), TRUNC(data_date)
ORDER BY data_date) AS rnk
FROM sankey_table
WHERE TO_CHAR(data_date, 'yyyy-mm-dd')='2013-08-19'
)
WHERE rnk = 1;
The inner query gets the same data you collect now but adds a ranking column, which will be 1 for the first row for any source/target pair in any order for a given day. The outer query then just ignores everything else.
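Here is a sketch of the ranking query on made-up timestamped rows, again via Python's sqlite3 module. SQLite has no GREATEST/LEAST or TRUNC, so the multi-argument scalar max()/min() and date() stand in for them (an adaptation, not the original Oracle syntax), and window functions need SQLite 3.25 or later. The later C to B row shares a partition with B to C, ranks 2, and is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sankey_table (data_date TEXT, source_node TEXT, target_node TEXT)")
conn.executemany(
    "INSERT INTO sankey_table VALUES (?, ?, ?)",
    [
        ("2013-08-19 09:00", "A", "B"),
        ("2013-08-19 09:30", "A", "B"),
        ("2013-08-19 09:05", "B", "C"),
        ("2013-08-19 09:10", "A", "D"),
        ("2013-08-19 10:00", "C", "B"),  # later, reversed direction of B -> C
    ],
)

rows = conn.execute("""
    SELECT source_node, target_node, counter FROM (
        SELECT source_node, target_node,
               COUNT(*) OVER (PARTITION BY source_node, target_node) AS counter,
               -- same partition for A->B and B->A on the same day
               RANK() OVER (PARTITION BY max(source_node, target_node),
                                         min(source_node, target_node),
                                         date(data_date)
                            ORDER BY data_date) AS rnk
        FROM sankey_table
        WHERE date(data_date) = '2013-08-19'
    )
    WHERE rnk = 1
    ORDER BY source_node, target_node
""").fetchall()
print(rows)  # [('A', 'B', 2), ('A', 'D', 1), ('B', 'C', 1)]
```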
This might be a candidate for a materialised view if you're truncating and repopulating it daily.
If you can't change your intermediate table but can still see the underlying table you could join back to it using the same kind of idea; assuming the table you're querying from is called sankey_agg_table:
SELECT sat.source_node, sat.target_node, sat.counter
FROM sankey_agg_table sat
JOIN (SELECT source_node, target_node,
RANK () OVER (PARTITION BY GREATEST(source_node, target_node),
LEAST(source_node, target_node), TRUNC(data_date)
ORDER BY data_date) AS rnk
FROM sankey_table) st
ON st.source_node = sat.source_node
AND st.target_node = sat.target_node
AND st.rnk = 1;
DELETE FROM yourTable
where [Column A]='C'
This assumes these are all of your rows.
EDIT
I would recommend that you clean up your source data if you can, i.e. delete the rows that you call backwards, if those rows are incorrect as you state in your comments.

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to write a SELECT statement that retrieves only the first row of each set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data:
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like a SELECT statement that fetches the first row for each chip_id (1, 2, 3),
like this:
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
I'd probably:
1. set a variable = 0
2. order your table by chip_id
3. read the table row by row
4. if the row's chip_id > variable, store the row in a result array and set the variable to that chip_id
5. loop until done
6. return your result array
Note, though, that depending on your DB, query and version, which row comes "first" for each chip_id may be unpredictable/unreliable.
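The steps above translate fairly directly into Python. This is a minimal sketch over an in-memory list of (chip_id, sample_id) tuples, assuming chip_id values are positive integers so that starting the variable at 0 works; first_per_chip is a hypothetical name.

```python
def first_per_chip(rows):
    """Keep the first row seen for each chip_id after ordering by chip_id."""
    result = []
    last_chip = 0                      # the "variable", starts at 0
    # order by chip_id only; rows with equal chip_id keep their input order
    for chip_id, sample_id in sorted(rows, key=lambda r: r[0]):
        if chip_id > last_chip:        # first row of a new chip_id
            result.append((chip_id, sample_id))
            last_chip = chip_id        # later rows of this chip_id are skipped
    return result


data = [(1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9)]
print(first_per_chip(data))  # [(1, 45), (2, 453), (3, 4567)]
```

Because Python's sorted() is stable, the row kept for each chip_id is whichever came first in the input, which is exactly the unpredictable part when the rows come from an unordered table.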
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
row_number() over (partition by chip_id order by rand()) as seqnum
from your_table
) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
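If sample_id itself is an acceptable definition of "first" (an assumption for illustration only), the row_number() query becomes deterministic. Here it is in SQLite through Python's sqlite3 module, with a hypothetical your_table holding the question's rows; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (chip_id INTEGER, sample_id INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9)],
)

# Number the rows within each chip_id by sample_id and keep row 1.
rows = conn.execute("""
    SELECT chip_id, sample_id
    FROM (SELECT chip_id, sample_id,
                 ROW_NUMBER() OVER (PARTITION BY chip_id ORDER BY sample_id) AS seqnum
          FROM your_table) t
    WHERE seqnum = 1
    ORDER BY chip_id
""").fetchall()
print(rows)  # [(1, 45), (2, 12), (3, 9)]
```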
Provided I understood your output, if you are using PostgreSQL 9.0 or later, you can use this:
SELECT chip_id ,
string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, you generally want the max, the min, or some other value to represent each group. You can do sums, counts, and all kinds of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one:
SELECT chip_id, MAX(sample_id)
FROM your_table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.
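Running the GROUP BY query against the question's sample rows in SQLite (via Python's sqlite3 module; your_table is a placeholder name) shows what it returns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (chip_id INTEGER, sample_id INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9)],
)

# One output row per chip_id, represented by its largest sample_id.
rows = conn.execute(
    "SELECT chip_id, MAX(sample_id) FROM your_table GROUP BY chip_id ORDER BY chip_id"
).fetchall()
print(rows)  # [(1, 5986), (2, 453), (3, 4567)]
```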