T-SQL How to translate multiple sub-strings to new values - sql

Apologies if the title is unclear; I wasn't sure how to describe my problem.
I have one lookup table in this format:
+----+-----------+------------+
| ID | Fruit | Color |
+----+-----------+------------+
| 1 | Banana | Yellow |
| 2 | Apple | Red |
| 3 | Blueberry | NotYetBlue |
+----+-----------+------------+
And my main table is like this:
+-------+------------------------+------------+
| MixID | Contains | MixedColor |
+-------+------------------------+------------+
| 1 | Banana | |
| 2 | Apple:Blueberry | |
| 3 | Banana:Apple:Blueberry | |
+-------+------------------------+------------+
I want to make a look-up on the first table and fill in the MixedColor column as below:
+-------+------------------------+-----------------------+
| MixID | Contains | MixedColor |
+-------+------------------------+-----------------------+
| 1 | Banana | Yellow |
| 2 | Apple:Blueberry | Red:NotYetBlue |
| 3 | Banana:Apple:Blueberry | Yellow:Red:NotYetBlue |
+-------+------------------------+-----------------------+
Any help would be much appreciated.
Thank you.

I agree that ideally your table structure should be altered. But, you can get what you want with:
SELECT MIXID, [CONTAINS],
STUFF((
SELECT ':' + Color
FROM Table1 a
WHERE ':'+b.[Contains]+':' LIKE '%:'+a.Fruit+':%'
FOR XML PATH('')
), 1, 1, '') AS Color
FROM Table2 b
GROUP BY MIXID, [CONTAINS]
Demo: SQL Fiddle

As "Charles Bretana" suggested, it would be best to modify your schema to something like this:
+--------+-------+----------+
| RowID | MixID | FruitID |
+--------+-------+----------+
| 0 | 1 | 1 |
| 1 | 2 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 1 |
| 4 | 3 | 2 |
| 5 | 3 | 3 |
+--------+-------+----------+
Now, using a simple inner join, you can match each fruit and select the correct color.
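To illustrate that inner join, here is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server. The table names `Fruits` and `MixItems` and the column name `ItemID` (standing in for RowID) are assumptions made for this example; the rows mirror the lookup table from the question and the junction table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fruits (ID INTEGER PRIMARY KEY, Fruit TEXT, Color TEXT);
    INSERT INTO Fruits VALUES (1, 'Banana', 'Yellow'), (2, 'Apple', 'Red'),
                              (3, 'Blueberry', 'NotYetBlue');
    -- Hypothetical junction table: one row per fruit in a mix.
    CREATE TABLE MixItems (ItemID INTEGER PRIMARY KEY, MixID INTEGER, FruitID INTEGER);
    INSERT INTO MixItems VALUES (0, 1, 1), (1, 2, 2), (2, 2, 3),
                                (3, 3, 1), (4, 3, 2), (5, 3, 3);
""")

# The inner join resolves every FruitID to its color; ItemID keeps the list order.
rows = conn.execute("""
    SELECT m.MixID, f.Color
    FROM MixItems AS m
    INNER JOIN Fruits AS f ON f.ID = m.FruitID
    ORDER BY m.MixID, m.ItemID
""").fetchall()

# Rebuild the colon-separated MixedColor value per mix on the client side.
mixed_color = {}
for mix_id, color in rows:
    mixed_color.setdefault(mix_id, []).append(color)
mixed_color = {mix_id: ":".join(colors) for mix_id, colors in mixed_color.items()}
print(mixed_color)
```

With one row per fruit, the lookup becomes an ordinary join, and the delimited string is only assembled for display.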
If you cannot restructure the table that way, you could use a recursive query, as described in "Turning a Comma Separated string into individual rows", to reshape your data into that form.
Here is a SQL Fiddle: http://sqlfiddle.com/#!3/8d68f/12
table data :
create table Mixses(MixID int, ContainsData varchar(max))
insert Mixses select 1, '10:11:12'
insert Mixses select 2, '10:11'
insert Mixses select 3, '10'
insert Mixses select 4, '11:12'
create table Fruits(FruitID int, Name varchar(200), Color varchar(200))
insert Fruits select 10, 'Bannana' , 'Yellow'
insert Fruits select 11, 'Apple' , 'Red'
insert Fruits select 12, 'BlueBerry' , 'Blue'
insert Fruits select 13, 'Pineapple' , 'Brown'
Query:
;with tmp(MixID, DataItem, Data) as
(
select
MixID,
LEFT(ContainsData, CHARINDEX(':',ContainsData+':')-1),
STUFF(ContainsData, 1, CHARINDEX(':',ContainsData+':'), '')
from Mixses
union all
select MixID,
LEFT(Data, CHARINDEX(':',Data+':')-1),
STUFF(Data, 1, CHARINDEX(':',Data+':'), '')
from tmp
where Data > ''
)
select t.MixID, t.DataItem, f.Color
from tmp t
inner join Fruits f on f.FruitID=t.DataItem
order by MixID
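The same recursive split can be tried outside SQL Server. The sketch below is an assumption-laden port to SQLite through Python's sqlite3 module: `instr` and `substr` stand in for `CHARINDEX`, `LEFT` and `STUFF`, and the `pos` column is an addition (not in the query above) that records each item's position so the colors can be reassembled in list order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Mixses (MixID INTEGER, ContainsData TEXT);
    INSERT INTO Mixses VALUES (1, '10:11:12'), (2, '10:11'), (3, '10'), (4, '11:12');
    CREATE TABLE Fruits (FruitID INTEGER, Name TEXT, Color TEXT);
    INSERT INTO Fruits VALUES (10, 'Bannana', 'Yellow'), (11, 'Apple', 'Red'),
                              (12, 'BlueBerry', 'Blue'), (13, 'Pineapple', 'Brown');
""")

split_sql = """
WITH RECURSIVE parts(MixID, pos, item, rest) AS (
    -- Anchor row: nothing split off yet; a trailing ':' terminates the last item.
    SELECT MixID, 0, '', ContainsData || ':' FROM Mixses
    UNION ALL
    -- Each step peels the text before the first ':' off the remaining string.
    SELECT MixID, pos + 1,
           substr(rest, 1, instr(rest, ':') - 1),
           substr(rest, instr(rest, ':') + 1)
    FROM parts
    WHERE rest <> ''
)
SELECT p.MixID, f.Color
FROM parts AS p
INNER JOIN Fruits AS f ON f.FruitID = CAST(p.item AS INTEGER)
WHERE p.pos > 0
ORDER BY p.MixID, p.pos
"""

# Group the split rows back into one colon-separated string per mix.
colors_by_mix = {}
for mix_id, color in conn.execute(split_sql):
    colors_by_mix.setdefault(mix_id, []).append(color)
colors_by_mix = {mix_id: ":".join(c) for mix_id, c in colors_by_mix.items()}
print(colors_by_mix)
```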

Related

Replace value in column based on another column

I have the following table:
+----+--------+------------+----------------------+
| ID | Name | To_Replace | Replaced |
+----+--------+------------+----------------------+
| 1 | Fruits | 1 | Fruits |
| 2 | Apple | 1-2 | Fruits-Apple |
| 3 | Citrus | 1-3 | Fruits-Citrus |
| 4 | Orange | 1-3-4 | Fruits-Citrus-Orange |
| 5 | Empire | 1-2-5 | Fruits-Apple-Empire |
| 6 | Fuji | 1-2-6 | Fruits-Apple-Fuji |
+----+--------+------------+----------------------+
How can I create the column Replaced ? I thought of creating 10 maximum columns (I know there are no more than 10 nested levels) and fetch the ID from every substring split by '-', and then concatenating them if not null into Replaced, but I think there is a simpler solution.
While what you ask for is technically feasible (probably using a recursive query or a tally), I will take a different stance and suggest that you fix your data model instead.
You should not be storing multiple values as a delimited list in a single database column. This defeats the purpose of a relational database, and makes simple things both unnecessarily complicated and inefficient.
Instead, you should have a separate table to store that data, with each replacement id on a separate row, and possibly a column that indicates the sequence of each element in the list.
For your sample data, this would look like:
id replace_id seq
1 1 1
2 1 1
2 2 2
3 1 1
3 3 2
4 1 1
4 3 2
4 4 3
5 1 1
5 2 2
5 5 3
6 1 1
6 2 2
6 6 3
Now you can efficiently generate the expected result with either a join, a subquery, or a lateral join. Assuming that your table is called mytable and that the mapping table is mymapping, the lateral join solution would be:
select t.*, x.*
from mytable t
outer apply (
select string_agg(t1.name, '-') within group(order by m.seq) replaced
from mymapping m
inner join mytable t1 on t1.id = m.replace_id
where m.id = t.id
) x
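For readers who want to try the normalized layout without a SQL Server instance, here is a rough sketch using Python's sqlite3 module. It reuses the `mytable` and `mymapping` names assumed in the answer, loads the mapping rows listed above, and joins the names in Python instead of using `string_agg`, so it illustrates the data model rather than reproducing the exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO mytable VALUES (1, 'Fruits'), (2, 'Apple'), (3, 'Citrus'),
                               (4, 'Orange'), (5, 'Empire'), (6, 'Fuji');
    -- One row per (id, replacement id); seq is the position within the list.
    CREATE TABLE mymapping (id INTEGER, replace_id INTEGER, seq INTEGER);
    INSERT INTO mymapping VALUES
        (1, 1, 1),
        (2, 1, 1), (2, 2, 2),
        (3, 1, 1), (3, 3, 2),
        (4, 1, 1), (4, 3, 2), (4, 4, 3),
        (5, 1, 1), (5, 2, 2), (5, 5, 3),
        (6, 1, 1), (6, 2, 2), (6, 6, 3);
""")

# Resolve each replacement id to its name, ordered by position within each list.
rows = conn.execute("""
    SELECT m.id, t1.name
    FROM mymapping AS m
    INNER JOIN mytable AS t1 ON t1.id = m.replace_id
    ORDER BY m.id, m.seq
""")

replaced = {}
for row_id, name in rows:
    replaced.setdefault(row_id, []).append(name)
replaced = {row_id: "-".join(names) for row_id, names in replaced.items()}
print(replaced)
```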
You can try something like this:
DECLARE @Data TABLE ( ID INT, [Name] VARCHAR(10), To_Replace VARCHAR(10) );
INSERT INTO @Data ( ID, [Name], To_Replace ) VALUES
( 1, 'Fruits', '1' ),
( 2, 'Apple', '1-2' ),
( 3, 'Citrus', '1-3' ),
( 4, 'Orange', '1-3-4' ),
( 5, 'Empire', '1-2-5' ),
( 6, 'Fuji', '1-2-6' );
SELECT
*
FROM @Data AS d
OUTER APPLY (
SELECT STRING_AGG ( [Name], '-' ) AS Replaced FROM @Data WHERE ID IN (
SELECT CAST ( [value] AS INT ) FROM STRING_SPLIT ( d.To_Replace, '-' )
)
) List
ORDER BY ID;
Returns
+----+--------+------------+----------------------+
| ID | Name | To_Replace | Replaced |
+----+--------+------------+----------------------+
| 1 | Fruits | 1 | Fruits |
| 2 | Apple | 1-2 | Fruits-Apple |
| 3 | Citrus | 1-3 | Fruits-Citrus |
| 4 | Orange | 1-3-4 | Fruits-Citrus-Orange |
| 5 | Empire | 1-2-5 | Fruits-Apple-Empire |
| 6 | Fuji | 1-2-6 | Fruits-Apple-Fuji |
+----+--------+------------+----------------------+
UPDATE
Ensure the id list order is maintained when aggregating names.
DECLARE @Data TABLE ( ID INT, [Name] VARCHAR(10), To_Replace VARCHAR(10) );
INSERT INTO @Data ( ID, [Name], To_Replace ) VALUES
( 1, 'Fruits', '1' ),
( 2, 'Apple', '1-2' ),
( 3, 'Citrus', '1-3' ),
( 4, 'Orange', '1-3-4' ),
( 5, 'Empire', '1-2-5' ),
( 6, 'Fuji', '1-2-6' ),
( 7, 'Test', '6-2-7' );
SELECT
*
FROM @Data AS d
OUTER APPLY (
-- WITHIN GROUP states the aggregation order explicitly; an ORDER BY combined with
-- TOP 100 PERCENT inside a derived table is not guaranteed to be honored.
SELECT STRING_AGG ( Names.[Name], '-' ) WITHIN GROUP ( ORDER BY ids.row_id ) AS Replaced
FROM ( SELECT CAST ( '<ids><id>' + REPLACE ( d.To_Replace, '-', '</id><id>' ) + '</id></ids>' AS XML ) AS id_list ) AS xIds
CROSS APPLY (
SELECT
x.f.value('.', 'INT' ) AS name_id,
ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) AS row_id
FROM xIds.id_list.nodes('//ids/id') x(f)
) AS ids
INNER JOIN @Data AS Names ON Names.ID = ids.name_id
) List
ORDER BY ID;
Returns
+----+--------+------------+----------------------+
| ID | Name | To_Replace | Replaced |
+----+--------+------------+----------------------+
| 1 | Fruits | 1 | Fruits |
| 2 | Apple | 1-2 | Fruits-Apple |
| 3 | Citrus | 1-3 | Fruits-Citrus |
| 4 | Orange | 1-3-4 | Fruits-Citrus-Orange |
| 5 | Empire | 1-2-5 | Fruits-Apple-Empire |
| 6 | Fuji | 1-2-6 | Fruits-Apple-Fuji |
| 7 | Test | 6-2-7 | Fuji-Apple-Test |
+----+--------+------------+----------------------+
I'm sure this can be optimized, but this solution seems to keep the list order.

I want to write a sqlcmd script that pulls data from a database and then manipulates that data

I'm trying to find examples of sqlcmd script files that run a select statement, return the values to the script, and place them in a variable. I then want to iterate over the returned values, run some if statements on them, and then run some SQL insert statements. I'm using SQL Server Management Studio, so I thought I could run the scripts in the SQLCMD mode of the Query Editor. Maybe there's a better way to do it, but that seemed like a good solution.
I've looked on the Microsoft website for sqlcmd and T-SQL examples that might help, and I've also searched the web more generally, but all the examples that come up are too simplistic to be helpful. Any help would be appreciated.
Here is how I understand your starting position:
create table #data
(
id int,
column1 varchar(100),
column2 varchar(100),
newcolumn int
)
create table #lookup
(
id int,
column1 varchar(100),
column2 varchar(100)
)
insert into #data
values
(1, 'black', 'duck', NULL),
(2, 'white', 'panda', NULL),
(3, 'yellow', 'dog', NULL),
(4, 'orange', 'cat', NULL),
(5, 'blue', 'lemur', NULL)
insert into #lookup
values
(1, 'white', 'panda'),
(2, 'orange', 'cat'),
(3, 'black', 'duck'),
(4, 'blue', 'lemur'),
(5, 'yellow', 'dog')
select * from #data
select * from #lookup
Output:
select * from #data
/------------------------------------\
| id | column1 | column2 | newcolumn |
|----|---------|---------|-----------|
| 1 | black | duck | NULL |
| 2 | white | panda | NULL |
| 3 | yellow | dog | NULL |
| 4 | orange | cat | NULL |
| 5 | blue | lemur | NULL |
\------------------------------------/
select * from #lookup
/------------------------\
| id | column1 | column2 |
|----|---------|---------|
| 1 | white | panda |
| 2 | orange | cat |
| 3 | black | duck |
| 4 | blue | lemur |
| 5 | yellow | dog |
\------------------------/
From this starting point, you can achieve what you are asking for as follows:
update d set d.newcolumn = l.id
from #data d
left join #lookup l on d.column1 = l.column1 and d.column2 = l.column2
alter table #data
drop column column1, column2
This will leave the tables in the desired state, with the varchar values moved out into the lookup table:
select * from #data
/----------------\
| id | newcolumn |
|----|-----------|
| 1 | 3 |
| 2 | 1 |
| 3 | 5 |
| 4 | 2 |
| 5 | 4 |
\----------------/
select * from #lookup
/------------------------\
| id | column1 | column2 |
|----|---------|---------|
| 1 | white | panda |
| 2 | orange | cat |
| 3 | black | duck |
| 4 | blue | lemur |
| 5 | yellow | dog |
\------------------------/

BigQuery Standard SQL Group by aggregate multiple columns

Sample dataset:
|ownerId|category|aggCategory1|aggCategory2|
--------------------------------------------
| 1 | dog | animal | dogs |
| 1 | puppy | animal | dogs |
| 2 | daisy | flower | ignore |
| 3 | rose | flower | ignore |
| 4 | cat | animal | cats |
...
I'm looking to do a group by that gives the number of owners for each value of category, aggCategory1, and aggCategory2, for example outputting:
|# of owners|summaryCategory|
-----------------------------
| 1 | dog |
| 1 | puppy |
| 1 | daisy |
| 1 | rose |
| 1 | cat |
| 2 | animal |
| 2 | flower |
| 1 | dogs |
| 2 | ignore |
| 1 | cats |
Doesn't have to be that format but looking to get the above data points.
Thanks!
One method is to use union all to unpivot the data and then aggregation in an outer query:
SELECT category, COUNT(DISTINCT ownerID)
FROM (SELECT ownerID, category
FROM t
UNION ALL
SELECT ownerID, aggCategory1
FROM t
UNION ALL
SELECT ownerID, aggCategory2
FROM t
) t
GROUP BY category
The more BigQuery'ish way to write this uses arrays:
SELECT cat, COUNT(DISTINCT ownerId)
FROM t CROSS JOIN
UNNEST(ARRAY[category, aggcategory1, aggcategory2]) cat
GROUP BY cat;
SELECT COUNT(T.ownerID), T.category
FROM (
SELECT ownerID, category
FROM table
UNION
SELECT ownerID, aggCategory1
FROM table
UNION
SELECT ownerID, aggCategory2
FROM table
) AS T
GROUP BY T.category
Combining a GROUP BY with a UNION over all of your category columns gives the counts you want.
Use union (rather than union all) inside a CTE, so that each owner is counted only once per category:
with cte as
(
SELECT ownerID, category as summaryCategory
FROM table
UNION
SELECT ownerID, aggCategory1 as summaryCategory
FROM table
UNION
SELECT ownerID, aggCategory2 as summaryCategory
FROM table
) select count(ownerID),summaryCategory from cte group by summaryCategory
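To check the unpivot-and-count idea against the sample rows, here is a small sketch run on SQLite through Python's sqlite3 module instead of BigQuery. The table name `t` follows the first answer; `COUNT(DISTINCT ownerId)` is used so that owner 1, who has both dog and puppy, is counted once under animal and dogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ownerId INTEGER, category TEXT, aggCategory1 TEXT, aggCategory2 TEXT);
    INSERT INTO t VALUES
        (1, 'dog',   'animal', 'dogs'),
        (1, 'puppy', 'animal', 'dogs'),
        (2, 'daisy', 'flower', 'ignore'),
        (3, 'rose',  'flower', 'ignore'),
        (4, 'cat',   'animal', 'cats');
""")

# Stack the three category columns into one column, then count owners per value.
owner_counts = dict(conn.execute("""
    SELECT summaryCategory, COUNT(DISTINCT ownerId)
    FROM (
        SELECT ownerId, category AS summaryCategory FROM t
        UNION ALL
        SELECT ownerId, aggCategory1 FROM t
        UNION ALL
        SELECT ownerId, aggCategory2 FROM t
    )
    GROUP BY summaryCategory
"""))
print(owner_counts)
```

A plain `COUNT(*)` over `UNION ALL` would count rows rather than owners, which gives 3 for animal on this sample instead of the 2 shown in the question.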

Conditional Deleting in SQL?

See the SQL table below:
+------------+---------+
| Category | RevCode |
+------------+---------+
| 100.10.10 | 2 |
| 100.10.10 | 3 |
| 100.50.10 | 2 |
| 100.50.15 | 2 |
| 100.50.15 | 3 |
| 1000.80.10 | 3 |
| 200.10.10 | 3 |
| 200.50.10 | 3 |
| 200.80.10 | 3 |
| 2000.20.10 | 2 |
| 2000.20.10 | 3 |
| 2000.20.20 | 2 |
| 2000.20.20 | 3 |
| 2000.20.30 | 2 |
+------------+---------+
How can I delete all line items with the Rev Code of 3 where the following condition is met:
A Category has a Rev Code of both '2' and '3'.
For example:
+-----------+---------+
| Category | RevCode |
+-----------+---------+
| 100.10.10 | 2 |
| 100.10.10 | 3 |
+-----------+---------+
The above table will become:
+-----------+---------+
| Category | RevCode |
+-----------+---------+
| 100.10.10 | 2 |
+-----------+---------+
You can use a subquery with a HAVING clause, like this:
delete from del_table
where RevCode = '3'
and Category in
(select Category from del_table
where RevCode in ('2','3')
group by Category
having count(distinct RevCode) =2 )
This statement may not be efficient; you can use an EXISTS clause instead of the IN clause.
Thanks to Charlesliam for the comment. I tested the two cases below in SQL Fiddle.
Case 1:
create table del_table(Category varchar(20),RevCode Int);
INSERT INTO del_table VALUES
('100.10.10',2 ),
('100.10.10',3 ),
('100.50.10',2 ),
('100.50.15',3 )
result after deletion:
CATEGORY REVCODE
100.10.10 2
100.50.10 2
100.50.15 3
Case 2 (a Category has more than two rows, with duplicate RevCode values):
create table del_table(Category varchar(20),RevCode Int);
INSERT INTO del_table VALUES
('100.10.10',2 ),
('100.10.10',2 ),
('100.10.10',3 ),
('100.10.10',3 ),
('100.50.10',2 ),
('100.50.15',3 )
result after deletion:
CATEGORY REVCODE
100.10.10 2
100.10.10 2
100.50.10 2
100.50.15 3
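The EXISTS variant mentioned above could look roughly like the following. This is a sketch run on SQLite through Python's sqlite3 module, not the exact statement the answer had in mind; the alias `other` is an assumption, and the data is the second test case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE del_table (Category TEXT, RevCode INTEGER);
    INSERT INTO del_table VALUES
        ('100.10.10', 2), ('100.10.10', 2),
        ('100.10.10', 3), ('100.10.10', 3),
        ('100.50.10', 2), ('100.50.15', 3);

    -- Delete a RevCode 3 row only when the same Category also has a RevCode 2 row.
    DELETE FROM del_table
    WHERE RevCode = 3
      AND EXISTS (
          SELECT 1
          FROM del_table AS other
          WHERE other.Category = del_table.Category
            AND other.RevCode = 2
      );
""")

remaining = conn.execute(
    "SELECT Category, RevCode FROM del_table ORDER BY Category, RevCode"
).fetchall()
print(remaining)
```

Because the subquery only asks whether a matching RevCode 2 row exists, duplicates need no special handling and no grouping is required.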
See whether this helps you.
DECLARE @A TABLE (ID INT IDENTITY(1,1) PRIMARY KEY, CATEGORY VARCHAR(20),REVCODE INT)
INSERT INTO @A VALUES
('100.10.10',2 ),
('100.10.10',3 ),
('100.50.10',2 ),
('100.50.15',2 ),
('100.50.15',3 ),
('1000.80.10',3),
('200.10.10',3 ),
('200.50.10',3 ),
('200.80.10',3 ),
('2000.20.10',2),
('2000.20.10',3),
('2000.20.20',2),
('2000.20.20',3),
('2000.20.30',2)
SELECT * FROM @A
Table:
Query:
DELETE LU
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY CATEGORY ORDER BY REVCODE) ROW
FROM @A A
WHERE A.REVCODE IN (2,3)
) LU
WHERE LU.ROW = 2
SELECT * FROM @A
Result:

SQL query to search by multiple tags with relevance sorting

I've got a set of cities that have a many-to-many relationship with a set of tags. The user gives me a collection of tags (which may contain duplicates!), and I need to return a list of matching entries, sorted by relevance.
The Data
Here's some sample data to illustrate the problem:
Cities:
--------------------
| id | city |
--------------------
| 1 | Atlanta |
| 2 | Baltimore |
| 3 | Cleveland |
| 4 | Denver |
| 5 | Eugene |
--------------------
Tags:
------
| id |
------
| 1 |
| 2 |
| 3 |
| 4 |
------
The cities are tagged like this:
Atlanta: 1, 2
Baltimore: 3
Cleveland: 1, 3, 4
Denver: 2, 3
Eugene: 1, 4
...so the CityTags table looks like:
------------------------
| city_id | tag_id |
------------------------
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 3 |
| 3 | 4 |
| 4 | 2 |
| 4 | 3 |
| 5 | 1 |
| 5 | 4 |
------------------------
Example 1
If the user gives me the tag ids: [1, 3, 3, 4], I want to count how many matches I have for each of the tags, and return a relevance-sorted result like:
------------------------
| city | matches |
------------------------
| Cleveland | 4 |
| Baltimore | 2 |
| Denver | 2 |
| Eugene | 2 |
| Atlanta | 1 |
------------------------
Since Cleveland matched all four tags, it's first, followed by Baltimore, Denver, and Eugene, which each had two matches (tag 3 appears twice in the search, so it counts twice for Baltimore and Denver), etc.
Example 2
One more example for good measure. For the search [2, 2, 2, 3, 4], we'd get:
------------------------
| city | matches |
------------------------
| Denver | 4 |
| Atlanta | 3 |
| Cleveland | 2 |
| Baltimore | 1 |
| Eugene | 1 |
------------------------
SQL
If I ignore the repeated tags, then it's trivial:
SELECT name,COUNT(name) AS relevance FROM
(SELECT name FROM cities,citytags
WHERE id=city_id AND tag_id IN (1,3,3,4)) AS matches
GROUP BY name ORDER BY relevance DESC;
But that's not what I need. I need to respect the duplicates. Can someone suggest how I might accomplish this?
Solution in PostgreSQL
Aha! A temporary table is what I needed. PostgreSQL lets me do this with its WITH syntax. Here's the solution:
WITH search(tag) AS (VALUES (1), (3), (3), (4))
SELECT name, COUNT(name) AS relevance FROM cities
INNER JOIN citytags ON cities.id=citytags.city_id
INNER JOIN search ON citytags.tag_id=search.tag
GROUP BY name ORDER BY relevance DESC;
Thank you very much to those that answered.
If the user list comes in as a comma-separated list, you could try turning it into a temp table and joining on that instead. I don't know the relevant syntax for PostgreSQL, so here is the idea in MySQL:
create temporary table usertags (tag_id int);
insert usertags values (1),(3),(3),(4);
SELECT name, COUNT(name) AS relevance
FROM cities
JOIN citytags on cities.id = citytags.city_id
JOIN usertags on citytags.tag_id = usertags.tag_id
GROUP BY name ORDER BY relevance DESC;
Converting the comma-separated list to the above code would be as simple as doing a replace all of , to ),( using your server-side language, and then embedding that into a VALUES statement to populate the temp table.
Demo (MySql): http://www.sqlize.com/1qNThhD9tC
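As an alternative to building the VALUES list by string replacement, the tags can be inserted with bound parameters. The sketch below assumes SQLite via Python's sqlite3 module; the `search` helper is a hypothetical name introduced for this example, and the data is the sample from the question. The secondary sort on name is also an addition, used only to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO cities VALUES (1, 'Atlanta'), (2, 'Baltimore'), (3, 'Cleveland'),
                              (4, 'Denver'), (5, 'Eugene');
    CREATE TABLE citytags (city_id INTEGER, tag_id INTEGER);
    INSERT INTO citytags VALUES (1, 1), (1, 2), (2, 3), (3, 1), (3, 3),
                                (3, 4), (4, 2), (4, 3), (5, 1), (5, 4);
    CREATE TEMP TABLE usertags (tag_id INTEGER);
""")

def search(tags):
    """Return (city, relevance) pairs; duplicate tags in the input count repeatedly."""
    conn.execute("DELETE FROM usertags")
    # One bound parameter per tag, so no user text is spliced into the SQL.
    conn.executemany("INSERT INTO usertags (tag_id) VALUES (?)", [(t,) for t in tags])
    return conn.execute("""
        SELECT name, COUNT(*) AS relevance
        FROM cities
        JOIN citytags ON cities.id = citytags.city_id
        JOIN usertags ON citytags.tag_id = usertags.tag_id
        GROUP BY name
        ORDER BY relevance DESC, name
    """).fetchall()

print(search([2, 2, 2, 3, 4]))
```

Each duplicate in the input becomes its own row in `usertags`, so the join naturally produces one match per repetition, which is what the relevance count relies on.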
Stick all the tags into a table and then JOIN instead of including them in an IN list.
CREATE TABLE #input (
tag_id INT NOT NULL
)
;
INSERT INTO #input
SELECT 1
UNION ALL SELECT 3
UNION ALL SELECT 3
UNION ALL SELECT 4
;
SELECT
city.name,
search.relevance
FROM
city
INNER JOIN
(
SELECT
city_id,
COUNT(*) AS relevance
FROM
citytags
INNER JOIN
#input
ON #input.tag_id = citytags.tag_id
GROUP BY
city_id
)
AS search
ON search.city_id = city.id
ORDER BY
search.relevance DESC
;