Count number of occurrences of keyword in comma separated column? - sql

I have a column which stores data like this:
Product:
product1,product2,product5
product5,product7
product1
I would like to count the number of occurrences of product1, product2, etc. Where a record contains multiple products, every product in that record should be counted.
So for the above example the totals would be:
product1: 2
product2: 1
product5: 2
product7: 1
How can I achieve this?
I was trying something like this:
select count(case when prodcolumn like '%product1%' then 'product1' end) from myTable
This gets me the count of rows in which product1 appears, but how do I extend this to go through each product?
I also tried something like this:
select new_productvalue, count(new_productvalue) from OpportunityExtensionBase
group by new_ProductValue
But that lists all different combinations of the products which were found and how many times they were found...
These products don't change so hard coding it is ok...
EDIT: here is what worked for me.
WITH Product_CTE (prod) AS
(SELECT
n.q.value('.', 'varchar(50)')
FROM (SELECT cast('<r>'+replace(new_productvalue, ';', '</r><r>')+'</r>' AS xml) FROM table) AS s(XMLCol)
CROSS APPLY s.XMLCol.nodes('r') AS n(q)
WHERE n.q.value('.', 'varchar(50)') <> '')
SELECT prod, count(*) AS [Num of Opps.] FROM Product_CTE GROUP BY prod

You have a lousy, lousy data structure, but sometimes one must make do with that. You should have a separate table storing each product/whatever pair -- that is the relational way.
with prodref as (
select 'product1' as prod union all
select 'product2' as prod union all
select 'product5' as prod union all
select 'product7' as prod
)
select pr.prod, count(p.col)
from prodref pr left outer join
product p
on ','+p.col+',' like '%,'+pr.prod+',%'
group by pr.prod;
This will be quite slow on a large table. And, the query cannot make use of standard indexes. But, it should work. If you can restructure the data, then you should.
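The delimiter-wrapping trick above can be tried out with a small sketch. It assumes SQLite (through Python's sqlite3 module) rather than SQL Server, so string concatenation is written as || instead of +; the table name product and column name col follow the query above, and the three rows are the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (col TEXT)")
conn.executemany(
    "INSERT INTO product (col) VALUES (?)",
    [("product1,product2,product5",), ("product5,product7",), ("product1",)],
)

# Wrapping both sides in commas means 'product1' cannot match inside a
# longer name such as 'product10'.
query = """
WITH prodref(prod) AS (
    VALUES ('product1'), ('product2'), ('product5'), ('product7')
)
SELECT pr.prod, COUNT(p.col)
FROM prodref pr
LEFT JOIN product p
  ON ',' || p.col || ',' LIKE '%,' || pr.prod || ',%'
GROUP BY pr.prod
ORDER BY pr.prod
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Counting p.col rather than * keeps a product that never appears at 0 instead of 1, because the outer join still produces one row for it.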

Never mind the rest; all you need is a split function. See:
SQL query to split column data into rows
Hopefully you can manage from there.
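As a rough illustration of the split-then-count idea, here is a sketch that assumes SQLite, which has no built-in split function, so a recursive CTE peels one item off the list per step. The table and column names (myTable, prodcolumn) are taken from the question; the CTE name split and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (prodcolumn TEXT)")
conn.executemany(
    "INSERT INTO myTable (prodcolumn) VALUES (?)",
    [("product1,product2,product5",), ("product5,product7",), ("product1",)],
)

# Each recursive step moves the text before the first comma into prod
# and keeps the remainder in rest, until rest is empty.
query = """
WITH RECURSIVE split(prod, rest) AS (
    SELECT '', prodcolumn || ',' FROM myTable
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT prod, COUNT(*)
FROM split
WHERE prod <> ''
GROUP BY prod
ORDER BY prod
"""
split_counts = dict(conn.execute(query).fetchall())
print(split_counts)
```

Unlike the hard-coded product list, this picks up whatever product names occur in the data.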

Related

counting values in two columns at the same time? SQL/SQLITE

I am trying to find which two people go on the most trips together. I created a table that holds the name of a person, the name of somebody who went on a hike with them, the name of the peak, and the date. I want to count how often each combination of name1 and name2 occurs. For example, in my dataset below I want to count how many times 'Mary' and 'Patricia' appear side by side. I tried COUNT(name1, name2) AS numPairs with GROUP BY (name1, name2), but SQLite says count() can only take one parameter.
If my query looks off, it is mainly because I am more comfortable using relational algebra selections/projections to get my data. I am open to any other solutions that may help.
mary,patricia
brad,steven
cherry,rick
brad,steven
mary,patricia
mary,patricia | 2
brad,steven | 2
cherry,rick | 1
Do you simply want aggregation?
select name1, name2, count(*)
from t
group by name1, name2;
If you want distinct pairs, so the ordering doesn't matter, you can use least() and greatest():
select least(name1, name2), greatest(name1, name2), count(*)
from t
group by least(name1, name2), greatest(name1, name2);
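Since the question mentions SQLite, note that least() and greatest() are not built in there; the multi-argument scalar forms of min() and max() fill the same role. The sketch below assumes a table t with only the two name columns, and it deliberately stores two of the pairs in reversed order (an assumption, not the asker's data) to show that the ordering no longer matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name1 TEXT, name2 TEXT)")
conn.executemany(
    "INSERT INTO t (name1, name2) VALUES (?, ?)",
    [
        ("mary", "patricia"),
        ("brad", "steven"),
        ("cherry", "rick"),
        ("steven", "brad"),     # reversed on purpose
        ("patricia", "mary"),   # reversed on purpose
    ],
)

# min(a, b) / max(a, b) with two arguments are scalar functions in SQLite,
# so each pair is normalised to (smaller name, larger name) before grouping.
query = """
SELECT min(name1, name2) AS a, max(name1, name2) AS b, COUNT(*) AS numPairs
FROM t
GROUP BY min(name1, name2), max(name1, name2)
ORDER BY numPairs DESC, a
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Adding LIMIT 1 to the query would return only the pair with the most trips, although ties would then need separate handling.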
Alternatively, you can use a nested query, as below:
select count(*) as Num_of_times, both_names, peak1 from (select concat(name1,',',name2)
as both_names, peak1 from your_table) as tbl group by both_names;

Grouping a percentage calculation in postgres/redshift

I keep running into the same problem over and over again, and I'm hoping someone can help...
I have a large table with a category column that has 28 entries for donkey breed, then I'm counting two specific values grouped by each of those categories in subqueries like this:
WITH totaldonkeys AS (
SELECT donkeybreed,
COUNT(*) AS total
FROM donkeytable1
GROUP BY donkeybreed
)
,
sickdonkeys AS (
SELECT donkeybreed,
COUNT(*) AS totalsick
FROM donkeytable1
JOIN donkeyhealth on donkeytable1.donkeyid = donkeyhealth.donkeyid
WHERE donkeyhealth.sick IS TRUE
GROUP BY donkeybreed
)
,
My goal is to end up with a table that shows the percentage of sick donkeys for each breed, but I always end up struggling with not being able to GROUP BY without using an aggregate function, which I cannot do here:
SELECT (CAST(sickdonkeys.totalsick AS float) / totaldonkeys.total) * 100 AS percentsick,
totaldonkeys.donkeybreed
FROM totaldonkeys, sickdonkeys
GROUP BY totaldonkeys.donkeybreed
When I run this I end up with 28 results for each breed of donkey: one of them correct, I believe, but alongside hundreds of useless data points.
I know I'm probably being really dumb here, but I keep hitting this same problem again and again with new donkey data. I should obviously be structuring the whole thing differently, because you just can't do this final query without an aggregate function. I think I must be missing something significant.
You can compute the proportion of sick donkeys per breed directly by averaging the sick flag from the donkeyhealth table:
SELECT d.donkeybreed,
AVG( (dh.sick)::int ) AS proportion_sick
FROM donkeytable1 d JOIN
donkeyhealth dh
ON d.donkeyid = dh.donkeyid
GROUP BY d.donkeybreed
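For context, the 28 rows per breed in the question come from FROM totaldonkeys, sickdonkeys having no join condition, so every breed total is paired with every sick count; joining the two CTEs on donkeybreed would also remove the duplicates. The averaging approach avoids the two CTEs altogether. Below is a sketch of it, assuming SQLite, where the boolean is stored as 0/1 and needs no cast; the breed names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donkeytable1 (donkeyid INTEGER PRIMARY KEY, donkeybreed TEXT);
CREATE TABLE donkeyhealth (donkeyid INTEGER, sick INTEGER);  -- 1 = sick, 0 = healthy
INSERT INTO donkeytable1 VALUES
    (1, 'poitou'), (2, 'poitou'), (3, 'poitou'), (4, 'poitou'),
    (5, 'mammoth'), (6, 'mammoth');
INSERT INTO donkeyhealth VALUES (1, 1), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1);
""")

# The average of a 0/1 flag is the fraction of rows where it is 1.
query = """
SELECT d.donkeybreed, AVG(dh.sick) * 100.0 AS percentsick
FROM donkeytable1 d
JOIN donkeyhealth dh ON d.donkeyid = dh.donkeyid
GROUP BY d.donkeybreed
ORDER BY d.donkeybreed
"""
percent_sick = dict(conn.execute(query).fetchall())
print(percent_sick)
```

The inner join means a donkey without a donkeyhealth row is left out of both the numerator and the denominator.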

How to force Total Column to appear at last in a TRANSFORM SQL statement

I have the following SQL statement to create a summary view
TRANSFORM Sum(Amount) AS CurrAmount
SELECT Currency, Sum(Amount) AS TotalAmount
From MyTable
GROUP BY Currency
ORDER BY Currency
PIVOT Source
It creates the following view
Currency  TotalAmount  Retail  Corporate  Others
EUR             7,071     585      2,345   4,141
GBP            10,444   2,322      4,889   3,233
JPY             7,050   1,295      4,500   1,255
USD             1,625     250        450     925
I am looking for help wherein I need the 'TotalAmount' field to appear as the last column. Much appreciated
Niz
I think your requirement for column ordering/sequence can be handled by the IN clause that is supported by the TRANSFORM operator. Have a look at this:
TRANSFORM <aggregate-function-expression>
<select-statement>
PIVOT <expression>
[IN (<column-value-list>)]
where <aggregate-function-expression> is an expression created with one of the aggregate functions, <select-statement> contains a GROUP BY clause, and <column-value-list> is a list of required values expected to be returned by the PIVOT expression, enclosed in quotes and separated by commas. (You can use the IN clause to force the output sequence of the columns.)
In other words, just use IN and put your quote/comma delimited list of columns in the desired order (e.g. IN ("Currency", "Retail", "Corporate", "Others", "TotalAmount"))
Seems a little complicated to me, but it appears to be supported by Access.
Note: this info was grabbed from the following article:
TRANSFORM Statement
The solution using the IN clause somewhat does the job but behaves quite strangely at times. Moreover, the departments (Retail, Corporate) are not fixed for me. New departments keep coming, so I cannot hard-code them every time. I resolved it in a simpler way.
SELECT pos.Currency,
(SELECT SUM(t1.Amount) FROM MyTable t1 WHERE t1.Source = 'Retail' AND t1.Currency = pos.Currency) Retail,
(SELECT SUM(t1.Amount) FROM MyTable t1 WHERE t1.Source = 'Corporate' AND t1.Currency = pos.Currency) Corporate,
(SELECT SUM(t1.Amount) FROM MyTable t1 WHERE t1.Source NOT IN ('Retail','Corporate') AND t1.Currency = pos.Currency) Others,
(SELECT SUM(t1.Amount) FROM MyTable t1 WHERE t1.Currency = pos.Currency) Total
FROM MyTable pos
GROUP BY pos.Currency
ORDER BY pos.Currency
Even with the above, if new departments appear (they land in Others) I will have to change the SQL whenever I need to list one separately, but that is still under my control, unlike the TRANSFORM statement.
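The same column layout can also be produced in a single pass with conditional aggregation instead of one correlated subquery per column. This is a different technique from the query above, sketched here under the assumption of SQLite; Access SQL has no CASE expression, so IIf(...) would have to play that role there. The Treasury source is invented to stand in for a department that falls under Others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Currency TEXT, Source TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("EUR", "Retail", 585), ("EUR", "Corporate", 2345), ("EUR", "Treasury", 4141),
        ("USD", "Retail", 250), ("USD", "Corporate", 450), ("USD", "Treasury", 925),
    ],
)

# Each SUM only adds the rows whose Source matches; Total comes last
# simply because it is listed last in the SELECT.
query = """
SELECT Currency,
       SUM(CASE WHEN Source = 'Retail' THEN Amount ELSE 0 END) AS Retail,
       SUM(CASE WHEN Source = 'Corporate' THEN Amount ELSE 0 END) AS Corporate,
       SUM(CASE WHEN Source NOT IN ('Retail', 'Corporate') THEN Amount ELSE 0 END) AS Others,
       SUM(Amount) AS Total
FROM MyTable
GROUP BY Currency
ORDER BY Currency
"""
summary = conn.execute(query).fetchall()
print(summary)
```

As with the subquery version, a new department still has to be added by hand if it needs its own column.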
Thanks guys.

Case Statement with Text Search

I have a data set (using SQL Server Management Studio) that is used for sales analysis. For this example, when an agent fulfills a Sales Call or Account Review, they list (via a drop-down) the topics they discussed in the call/review. There is a corresponding column of the products that the client purchased after the fact (in this example, I'm using automobiles). I'm thinking a CASE statement may be the way to go, but in essence I need to figure out whether any of the makers the person suggested exists in the products column:
So in this example, in line 1, they had suggested a Mazda and a Toyota (separated by ";") and Mazda appears in the products line, so that would be marked as effective. In line 3, they suggested Honda but the person ended up getting a Jeep, so that is not effective. And so on.
I'd like it to be dynamic (maybe an EXISTS?) so that I don't have to write/maintain something like 'Effective'=CASE WHEN Topic like '%Mazda%' and Products like '%Mazda%', "Yes", "No" WHEN.....
Thoughts?
If you have a Product table, then you might be able to get away with something like this:
select RowId, Topic, Products,
(case when exists (select 1
from Products p
where t.Topic like '%'+p.brand+'%' and
t.Products like '%'+p.brand+'%'
)
then 'Yes' else 'No'
end) as Effective
from t;
This is based on the fact that the "brand" seems to be mentioned in both the topic and products fields. If you don't have such a table, you could do something like:
with products as (
select 'Mercedes' as brand union all
select 'Mazda' union all
select 'Toyota' . . .
)
select RowId, Topic, Products,
(case when exists (select 1
from Products p
where t.Topic like '%'+p.brand+'%' and
t.Products like '%'+p.brand+'%'
)
then 'Yes' else 'No'
end) as Effective
from t;
However, this may not work, because in the real world, text is more complicated. It has misspellings, abbreviations, and synonyms. There is no guarantee that there is even a matching word on both lists, and so on. But, if your text is clean enough, this approach might be helpful.
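A small sketch of the EXISTS approach, assuming SQLite (so || replaces + for concatenation) and a hypothetical brands lookup table. Rows 1 and 3 follow the description in the question; row 2 is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brands (brand TEXT);  -- hypothetical lookup table
CREATE TABLE t (RowId INTEGER, Topic TEXT, Products TEXT);
INSERT INTO brands VALUES ('Mazda'), ('Toyota'), ('Honda'), ('Jeep');
INSERT INTO t VALUES
    (1, 'Mazda;Toyota', 'Mazda'),
    (2, 'Toyota', 'Toyota'),
    (3, 'Honda', 'Jeep');
""")

# A row is effective when at least one brand occurs in both columns.
query = """
SELECT t.RowId,
       CASE WHEN EXISTS (SELECT 1
                         FROM brands b
                         WHERE t.Topic LIKE '%' || b.brand || '%'
                           AND t.Products LIKE '%' || b.brand || '%')
            THEN 'Yes' ELSE 'No'
       END AS Effective
FROM t
ORDER BY t.RowId
"""
effective = dict(conn.execute(query).fetchall())
print(effective)
```

Adding a maker then only means inserting a row into the lookup table, with no change to the query.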

Doing Math with 2 Subquerys

I have two subqueries, both calculating sums. I would like to subtract (arithmetic minus, -) the result of one query from the other, e.g. Query1: 400, Query2: 300, result should be 100.
Obviously a plain - in the query does not work; the minus is treated as MINUS on sets. How can I solve this? Do you have any ideas?
SELECT CustumersNo FROM Custumers WHERE
(
SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo
-
SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo
)
> 500
Using Informix - sorry missed that point
To get the original syntax to work, you would need to surround the sub-selects in parentheses:
SELECT CustumersNo
FROM Custumers
WHERE ((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo)
-
(SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo)
) > 500
Note that aggregates are defined to ignore nulls in the values they aggregate in standard SQL. However, the SUM of an empty set of rows is NULL, not zero.
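The NULL behaviour just described can be checked with a small sketch. It assumes SQLite rather than Informix, and the three roe rows are invented: customer 3 has a credit entry but no debit entry, so the debit SUM is NULL and the whole subtraction becomes NULL. COALESCE is used here as SQLite provides it; whether a given Informix version offers COALESCE or only NVL is something to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Custumers (CustumersNo INTEGER);
CREATE TABLE roe (credit INTEGER, debit INTEGER, value INTEGER);
INSERT INTO Custumers VALUES (1), (2), (3);
INSERT INTO roe VALUES (1, 2, 900), (2, 1, 300), (3, 2, 700);
""")

# Parenthesised scalar subqueries, as in the corrected query above.
plain = """
SELECT CustumersNo FROM Custumers
WHERE ((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo)
     - (SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo)) > 500
ORDER BY CustumersNo
"""
# Same query, but an empty SUM (NULL) is treated as zero.
guarded = """
SELECT CustumersNo FROM Custumers
WHERE (COALESCE((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo), 0)
     - COALESCE((SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo), 0)) > 500
ORDER BY CustumersNo
"""
without_coalesce = [row[0] for row in conn.execute(plain)]
with_coalesce = [row[0] for row in conn.execute(guarded)]
print(without_coalesce, with_coalesce)
```

Customer 3 has a net credit of 700 but only shows up in the guarded version.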
You can get inventive and devise ways to always have a value for each customer listed in the roe table, such as:
SELECT CustomersNo
FROM (SELECT CustomersNo, SUM(value) AS net_credit
FROM (SELECT credit AS CustomersNo, +value AS value FROM roe
UNION ALL
SELECT debit AS CustomersNo, -value AS value FROM roe
) AS x
GROUP BY CustomersNo
) AS y
WHERE net_credit > 500;
You can also do that with an appropriate HAVING clause if you wish. Note that this avoids issues with customers who have credit entries but no debit entries or vice versa; all the entries that are present are treated appropriately.
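A sketch of the HAVING variant, assuming SQLite and the same kind of invented roe rows (customer 3 has credits only). It uses UNION ALL rather than UNION, since a plain UNION would collapse two identical entries into one and change the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roe (credit INTEGER, debit INTEGER, value INTEGER);
INSERT INTO roe VALUES (1, 2, 900), (2, 1, 300), (3, 2, 700);
""")

# Credits count positively and debits negatively; a customer with only
# one kind of entry still gets a (non-NULL) sum.
query = """
SELECT CustomersNo, SUM(value) AS net_credit
FROM (SELECT credit AS CustomersNo, value FROM roe
      UNION ALL
      SELECT debit AS CustomersNo, -value FROM roe) AS x
GROUP BY CustomersNo
HAVING SUM(value) > 500
ORDER BY CustomersNo
"""
net = conn.execute(query).fetchall()
print(net)
```

Here the filter sits in HAVING, so the outer derived table from the query above is no longer needed.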
Your misspelling (or unorthodox spelling) of 'customers' is nearly as good as 'costumers'.
Something like what you tried should work. It may be a syntax problem, and it may depend on what type of SQL you are using. However, an approach like this would be more efficient:
Update: I see you were having a problem with nulls, so I updated it to handle nulls properly.
select CustumersNo from (
select c.CustumersNo,
coalesce(cr.total, 0) - coalesce(db.total, 0) as balance
FROM Custumers c
-- sum credits and debits separately first, so the two joins cannot multiply rows
left join (select credit, sum(value) as total from roe group by credit) cr
on cr.credit = c.CustumersNo
left join (select debit, sum(value) as total from roe group by debit) db
on db.debit = c.CustumersNo
) b
where balance > 500
Caveat: I don't have experience with Informix specifically.