Group columns that don't exactly match in sql server


I am working on a query to pull a list of Merchants and get a count of transactions for that merchant. Here's an example (Note: My table has more columns for description, location, status, amount, date, etc, but these are the important ones).
TransactionID MerchantName
1 MERCHANTA #123
2 MERCHANTA #541
3 MERCHANTA #456
4 MERCHANTB #123
5 MERCHANTB
6 SOME MERCHANTC #123
Now, I want to group these merchants together, but since each merchant can have more than one store, the merchant name doesn't always match across transactions.
The only way I know to group them together is the following standard query, but it's never going to work for the different store numbers.
SELECT MerchantName, COUNT(*)
FROM Transactions
GROUP BY MerchantName
My goal is to use Regex to replace the store number with a wildcard or blank string so I can group them together by merchant, regardless of store numbers. Here is my pattern: [#*]\s?[a-zA-Z\d]?
Expected output:
MerchantName TransactionCount
MERCHANTA 3
MERCHANTB 2
SOME MERCHANTC 1
Is this even possible? If so, what is a good way of doing this? Thanks in advance.

Just another option.
No need for the IIF() or a CASE. We just add a "fail-safe" in the charindex()
Example
Declare @YourTable Table ([TransactionID] int,[MerchantName] varchar(50))
Insert Into @YourTable Values
(1,'MERCHANTA #123')
,(2,'MERCHANTA #541')
,(3,'MERCHANTA#456') -- << made ugly
,(4,' MERCHANTB #123') -- << made ugly
,(5,'MERCHANTB')
,(6,'SOME MERCHANTC #123')
Select [MerchantName]
,TransCount = count(*)
From (
-- Appending '#' guarantees charindex() finds a match, so left() never gets a negative length
Select [MerchantName] = ltrim(rtrim(left([MerchantName],charindex('#',[MerchantName]+'#')-1)))
From @YourTable
) A
Group By [MerchantName]
Returns
MerchantName TransCount
MERCHANTA 3
MERCHANTB 2
SOME MERCHANTC 1
> EDIT for the *
...
Select [MerchantName] = ltrim(rtrim(left([MerchantName],charindex('#',replace([MerchantName],'*','#')+'#')-1)))
From @YourTable
...
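For readers who want to check the "fail-safe" idea outside SQL Server, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the answer above: the helper name `merchant_key` is hypothetical, and the sample names are the "made ugly" rows from the example.

```python
from collections import Counter

def merchant_key(name: str) -> str:
    # Treat '*' like '#', then append a sentinel '#' so the search always succeeds
    # (same trick as charindex('#', MerchantName + '#') in the T-SQL above).
    normalized = name.replace("*", "#") + "#"
    # replace() keeps the length unchanged, so the index is valid for the original string.
    return name[: normalized.index("#")].strip()

names = [
    "MERCHANTA #123",
    "MERCHANTA #541",
    "MERCHANTA#456",      # no space before '#'
    " MERCHANTB #123",    # leading space
    "MERCHANTB",          # no store number at all
    "SOME MERCHANTC #123",
]

counts = Counter(merchant_key(n) for n in names)
for merchant, n in sorted(counts.items()):
    print(merchant, n)
```

Because the sentinel is appended only to the searched copy, a name without a store number is returned whole instead of raising an error.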

Consider:
with cte as (
select
TransactionID,
iif(
charindex(' #', MerchantName) > 0,
left(MerchantName, charindex(' #', MerchantName) - 1),
MerchantName
) MerchantName
from mytable
)
select MerchantName, count(*) TransactionCount
from cte
group by MerchantName
In the common table expression, we modify the merchant name by removing everything that is after ' #' (a space, then the hash sign). Then all that is left to do is aggregate.
Demo on DB Fiddle:
MerchantName | TransactionCount
:------------- | ---------------:
MERCHANTA | 3
MERCHANTB | 2
SOME MERCHANTC | 1
Note: this assumes that ' #' always represents the splitting pattern.

Using stuff and Patindex
DECLARE @MYTAB AS TABLE(transactionId int IDENTITY(1,1),MerchantName nvarchar(50))
insert into @MYTAB(MerchantName) values('MERCHANTA #123')
insert into @MYTAB(MerchantName) values('MERCHANTA #541')
insert into @MYTAB(MerchantName) values('MERCHANTA #456')
insert into @MYTAB(MerchantName) values('MERCHANTB #123')
insert into @MYTAB(MerchantName) values('MERCHANTB')
insert into @MYTAB(MerchantName) values('SOME MERCHANTC #123')
-- Note: this removes exactly 4 characters starting at '#', so it assumes suffixes like '#123'
;with cte as(
select
case
-- stuff() returns NULL when patindex() finds no '#' (start position 0)
when stuff(MerchantName,patindex('%#%',MerchantName),4,'') is not null then
stuff(MerchantName,patindex('%#%',MerchantName),4,'') else MerchantName end [customer] from @MYTAB )
select [customer],count(1) transactionCount from cte group by [customer]

Related

Combine multiple value into 1 for Impala SQL

I want to combine multiple product entries into 1 and also sum their price. Currently, the database looks like this :
Name Product Price
Zack Vanilla Twist 1
Jane Lolipop 0.5
Zack Lolipop 0.5
Zack Candymint 0.5
Jane ChocoLoco LM 1.5
I want to change the look of this into something like this:
Name Product sum(Price)
Zack Vanilla Twist, Lolipop, Candymint 2
Jane Lolipop, ChocoLoco LM 2
How to do this using Impala SQL?

This query works for MySQL, this might help you.
select Name, group_concat(`product` separator ', ') Product, sum(Price)
from tempt
group by Name
order by Name desc

declare @temp table (Name varchar(50), product varchar(50), Price decimal(3,1))
insert into @temp values ('Zack','Vanilla Twist',1)
insert into @temp values ('Jane','Lolipop',0.5)
insert into @temp values ('Zack','Lolipop',0.5)
insert into @temp values ('Zack','Candymint',0.5)
insert into @temp values ('Jane','ChocoLoco LM',1.5)
-- No cursor, WHILE loop, or user-defined function:
SELECT
Name,
STUFF((
SELECT ', ' + product
FROM @temp
WHERE (name = Results.name)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS Product
,sum(Price) as [Sum(Price)]
FROM @temp Results
GROUP BY name
Output:
Name Product Sum(Price)
Jane Lolipop, ChocoLoco LM 2
Zack Vanilla Twist, Lolipop, Candymint 2
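As a cross-check of what both queries compute, here is a small Python sketch of "concatenate the products and sum the prices per name". It is an illustration only, using the sample rows from the question; the variable names are made up for the example.

```python
from collections import defaultdict
from decimal import Decimal

rows = [
    ("Zack", "Vanilla Twist", Decimal("1")),
    ("Jane", "Lolipop", Decimal("0.5")),
    ("Zack", "Lolipop", Decimal("0.5")),
    ("Zack", "Candymint", Decimal("0.5")),
    ("Jane", "ChocoLoco LM", Decimal("1.5")),
]

products = defaultdict(list)    # name -> products in input order
totals = defaultdict(Decimal)   # name -> running sum, Decimal() starts at 0

for name, product, price in rows:
    products[name].append(product)
    totals[name] += price

# One row per name: (joined product list, summed price)
result = {name: (", ".join(products[name]), totals[name]) for name in products}

for name, (product_list, total) in result.items():
    print(name, "|", product_list, "|", total)
```

Note that the Python list keeps input order; in SQL the order inside the concatenated string is not guaranteed unless the aggregate or subquery specifies one.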

Sum records and add note what was summed up in sql

I have a simple table looks like this one:
company_Id user_Id price sub_price
123456 11111 200 NULL
123456 11111 500 NULL
456789 22222 300 NULL
And I want to consolidate records which has count(*) >= 2 into one row by summing up the price but with note what was summed up in column sub_price. Desired output should look like this one:
company_Id user_Id price sub_price
123456 11111 700 200,500
456789 22222 300 300
Is there any simple approach how to achieve desired output? Many thanks for your help in advance.

You can use listagg to turn the elements of a group into a string:
SELECT ...
, LISTAGG(price, ',') WITHIN GROUP (ORDER BY price) sub_price
FROM ...
Although listagg is standard SQL, it is not yet supported by all databases. However, most databases offer similar functionality under a different name, e.g. string_agg in PostgreSQL and SQL Server (since 2017) or group_concat in MySQL.
More info: http://modern-sql.com/feature/listagg (also showing alternatives if listagg is not supported)

This is one possible solution;
More info about concatenating multiple rows into single row you can find here
DECLARE @tbl AS table (
company_Id int
,user_Id int
,price int
,sub_price varchar(25)
)
INSERT INTO @tbl values (123456, 11111, 200, NULL)
INSERT INTO @tbl values (123456, 11111, 500, NULL)
INSERT INTO @tbl values (456789, 22222, 300, NULL)
SELECT
company_Id
,user_Id
,SUM(price) AS price
,STUFF(
(SELECT ',' + cast(price as varchar)
FROM @tbl
WHERE company_Id = a.company_id
AND user_Id = a.user_Id
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'') AS sub_price
FROM @tbl a
GROUP BY company_Id, user_Id

Sum the number of occurrence by id

Is it possible to COUNT the number of times a value occurs in a table, however, use the count of 1 if the value appears more than once for each id.
Take the below table as an example. We want to see if either {5,6} occurred for p_id. If more than 1 occurrence of {5,6} is found, treat it as 1. For eg. p_id 1, the total count is 1.
p_id status
1 5
1 6
1 2
2 5
2 5
3 4
3 2
4 6
4 2
4 5
..transforms to..
p_id count
1 1
2 1
3 0
4 1
COUNT(CASE WHEN status IN (5,6) THEN 1 END) does an overall count.

Use the CASE...WHEN... as follows:
SELECT a.id, ISNULL(b.cnt, 0)
FROM
(
SELECT DISTINCT id FROM tab
) a
LEFT JOIN
(
SELECT id, CASE WHEN COUNT(*) >= 1 THEN 1 ELSE 0 END 'cnt'
FROM tab WHERE val in (5, 6) GROUP BY id
) b
ON a.id = b.id

This solution provides a quick setup and a simple two-step explanation of how I do this, using your example. The second query provides the desired result:
CREATE TABLE #temp (p_id INT, [status] INT);
INSERT #temp VALUES (1,5);
INSERT #temp VALUES (1,6);
INSERT #temp VALUES (1,2);
INSERT #temp VALUES (2,5);
INSERT #temp VALUES (2,5);
INSERT #temp VALUES (3,4);
INSERT #temp VALUES (3,2);
INSERT #temp VALUES (4,6);
INSERT #temp VALUES (4,2);
INSERT #temp VALUES (4,5);
-- Simple two-step tutorial
-- First, group by p_id so that all p_id's will be shown
-- run this to see...
SELECT A.p_id
FROM #temp A
GROUP BY A.p_id;
-- Now expand your query
-- Next, for each p_id row found, perform sub-query to see if 1 or more exist with status=5 or 6
SELECT A.p_id
,CASE WHEN EXISTS(SELECT 1 FROM #temp B WHERE B.p_id=A.p_id AND [status] IN (5,6)) THEN 1 ELSE 0 END AS [Count]
FROM #temp A
GROUP BY A.p_id;

Use the SIGN() function. It is exactly what you are looking for.
SELECT
[p_id],
SIGN(COUNT(CASE WHEN [status] IN (5,6) THEN 1 END)) AS [count]
FROM #temp
GROUP BY p_id

You can translate 5,6 = 1 and rest to 0 then do max()
with cte as (
select p_id, case when status in (5,6) then 1 else 0 end status
from #temp)
select p_id, max(status) status
from cte
group by p_id
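The "flag each row as 0 or 1, then take the maximum per id" idea from the last answer can be sketched in Python as follows. This is only an illustration on the question's sample data; the names `rows` and `flags` are made up for the example.

```python
# (p_id, status) pairs from the question
rows = [
    (1, 5), (1, 6), (1, 2),
    (2, 5), (2, 5),
    (3, 4), (3, 2),
    (4, 6), (4, 2), (4, 5),
]

flags = {}
for p_id, status in rows:
    hit = 1 if status in (5, 6) else 0              # same as the CASE expression
    flags[p_id] = max(flags.get(p_id, 0), hit)      # same as MAX(...) GROUP BY p_id

for p_id in sorted(flags):
    print(p_id, flags[p_id])
```

Taking the maximum of 0/1 flags gives 1 as soon as any matching row exists, no matter how many there are, which is the same effect SIGN(COUNT(...)) has in the previous answer.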

How to select info from row above?

I want to add a column to my table that is like the following:
This is just an example of how the table is structured, the real table is more than 10.000 rows.
No_ Name Account_Type Subgroup (New_Column)
100 Sales 3
200 Underwear 0 250 *100
300 Bikes 0 250 *100
400 Profit 3
500 Cash 0 450 *400
So for every time there is a value in 'Subgroup' I want the (New_Column) to get the value [No_] from the row above
No_ Name Account_Type Subgroup (New_Column)
100 Sales 3
150 TotalSales 3
200 Underwear 0 250 *150
300 Bikes 0 250 *150
400 Profit 3
500 Cash 0 450 *400
There are cases where the table is like the above, where two "Headers" are above. And in that case I also want the first above row (150) in this case.
Is this a case for a cursor or what do you recommend?
The data is ordered by No_
--EDIT--
Starting from the first line and then running through the whole table:
Is there a way I can store the value for [No_] where [Subgroup] is ''?
And following that insert this [No_] value in the (New_Column) in each row below having value in the [Subgroup] row.
And when the [Subgroup] row is empty the process will keep going, inserting the next [No_] value in (New_Column), that is if the next line has a value in [Subgroup]

SQL Server 2012 introduced window offset functions.
In this case: LAG
Something like this:
SELECT [No_]
,[Name]
,[Account_Type]
,[Subgroup]
,LAG([No_]) OVER(PARTITION BY [Subgroup]
ORDER BY [No_]) as [PrevValue]
FROM table
Here is an example from MS:
http://technet.microsoft.com/en-us/library/hh231256.aspx

The ROW_NUMBER function will allow you to find out what number the row is, but because it is a windowed function, you will have to use a common table expression (CTE) to join the table with itself.
WITH cte AS
(
SELECT [No_], Name, Account_Type, Subgroup, [Row] = ROW_NUMBER() OVER (ORDER BY [No_])
FROM table
)
SELECT t1.*, t2.[No_]
FROM cte t1
LEFT JOIN cte t2 ON t2.[Row] = t1.[Row] - 1 -- t2 is the row directly above t1
Hope this helps.

Next query will return Name of the parent row instead of the row itself, i.e. Sales for both Sales, Underwear, Bikes; and Profit for Profit, Cash:
select ISNULL(t2.Name, t1.Name)
from table t1
left join table t2 on t1.NewColumn = t2.No

In SQL Server 2008 I created a test table with 3 values in it:
create table #ttable
(
id int primary key identity,
number int,
number_prev int
)
Go
Insert Into #ttable (number)
Output inserted.id
Values (10), (20), (30);
An insert that does what you need (at least if I understood correctly) looks like this:
declare @new_value int;
set @new_value = 13; -- NEW value
Insert Into #ttable (number, number_prev)
Values (@new_value,
(Select Max(number) From #ttable t Where t.number < @new_value))
[This part added] And to work with subgroup, just modify the inner select to filter it out:
Select Max(number) From #ttable t
Where t.number < @new_value And Subgroup != @Subgroup

SELECT
No_
, Name
, Account_Type
, Subgroup
, ( SELECT MAX(above.No_)
FROM TableX AS above
WHERE above.No_ < a.No_
AND above.Account_Type = 3
AND a.Account_Type <> 3
) AS NewColumn
FROM
TableX AS a
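To make the intended result concrete, here is a Python sketch of the "remember the nearest header row above" logic that the correlated subquery expresses. It assumes, as the question's example suggests, that header rows are the ones with an empty Subgroup; the tuple layout and variable names are made up for the illustration.

```python
# (No_, Name, Account_Type, Subgroup); Subgroup is None on header rows
rows = [
    (100, "Sales", 3, None),
    (150, "TotalSales", 3, None),
    (200, "Underwear", 0, 250),
    (300, "Bikes", 0, 250),
    (400, "Profit", 3, None),
    (500, "Cash", 0, 450),
]

last_header = None
result = []
for no, name, account_type, subgroup in sorted(rows, key=lambda r: r[0]):
    if subgroup is None:
        last_header = no                      # remember the most recent header
        result.append((no, name, None))       # headers get no value themselves
    else:
        result.append((no, name, last_header))

for row in result:
    print(row)
```

A single ordered pass is enough here, whereas a plain "previous row" lookup would return 200 for Bikes instead of the header 150.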

Trying to find the total number of, distinct, clients logged on weekdays, weekends and both weekday+weekend

My table structure is as follows:
node_id | client_id | timestamp
--------+-----------+-----------
1 | 102 | 2012-02-01 (weekday)
--------+-----------+-----------
2 | 104 | 2012-02-01 (weekday)
--------+-----------+-----------
2 | 106 | 2012-02-02 (weekday)
--------+-----------+-----------
1 | 106 | 2012-02-02 (weekend)
--------+-----------+-----------
(added fake weekday/weekend to simplify things)
I need to find the total number of, distinct, client_id's logged on:
A weekday
The weekend
Both a weekday and the weekend
Is it possible to do this in MSSQL? Or will I have to resort to simply dumping all the data and parsing it in my program?
EDIT:
From the above table, the desired output would tell me that:
3 people were logged on Mon-Fri by nodes 1 & 2
1 person was logged on Sat-Sun by nodes 1
1 person was logged on Mon-Sun by nodes 1 & 2
Basically, I need to know how many clients were logged on Mon-Fri, Sat-Sun, Mon-Sun and by which nodes.

You can use datepart(weekday,?) to find the day-of-the-week for a date (see http://msdn.microsoft.com/en-us/library/ms174420.aspx); then it's just a matter of specifying which of those you're interested in grouping together.

If I understand you correctly you can try this
SET DATEFIRST 1
declare @tbl table (node_id int identity(1,1), client_id int, dtm datetime)
insert into @tbl (client_id,dtm) values (1,'20111001'), (1,'20111001'),(1,'20111002'),(1,'20111003'),(1,'20111004')
,(2,'20111001'), (2,'20111003'),(2,'20111003'),(2,'20111003'),(2,'20111004')
--weekday
select client_id, COUNT(*)
FROM @tbl
WHERE DATEPART(DW,dtm)<6
GROUP BY client_id
--weekend
select client_id, COUNT(*)
FROM @tbl
WHERE DATEPART(DW,dtm)>5
GROUP BY client_id
This solution uses DATEPART function.
To set the first day of the week use SET DATEFIRST.
EDITED
In SQL-Server 2005+ you can do it this way:
SET DATEFIRST 1
DECLARE @tbl table (node_id int, client_id int, dtm datetime)
INSERT INTO @tbl (node_id,client_id,dtm) VALUES (1,102,'20120201'),(2,104,'20120201'),(2,106,'20120202'),(1,106,'20120204')
--weekday
SELECT CAST(COUNT(*) as varchar)+' people were logged on Mon-Fri by nodes '+
(select cast(node_id as varchar)+',' as 'data()' from @tbl WHERE DATEPART(DW,dtm)<6 GROUP BY node_id for xml path(''))
FROM @tbl
WHERE DATEPART(DW,dtm)<6
--weekend
SELECT CAST(COUNT(*) as varchar)+' people were logged on Sat-Sun by nodes '+
(select cast(node_id as varchar)+',' as 'data()' from @tbl WHERE DATEPART(DW,dtm)>5 GROUP BY node_id for xml path(''))
FROM @tbl
WHERE DATEPART(DW,dtm)>5
--all week
SELECT CAST(COUNT(*) as varchar)+' people were logged on Mon-Sun by nodes '+
(select cast(node_id as varchar)+',' as 'data()' from @tbl GROUP BY node_id for xml path(''))
FROM @tbl

WITH myCTE AS (
SELECT node_id, client_id, datepart(weekday, [timestamp]) wkday
FROM YourTable
)
SELECT T1.node_id, count(distinct T1.client_id) total, 'weekdays' category
FROM myCTE T1
WHERE wkday in (2,3,4,5,6) --weekdays (assumes the default DATEFIRST 7, i.e. Sunday = 1)
GROUP BY T1.node_id
UNION ALL
SELECT T1.node_id, count(distinct T1.client_id) total, 'weekends' category
FROM myCTE T1
WHERE wkday in (1,7) --weekends
GROUP BY T1.node_id
UNION ALL
SELECT T1.node_id, count(distinct T1.client_id) total, 'mon-sat' category
FROM myCTE T1
WHERE wkday in (2,3,4,5,6,7) --mon-sat
GROUP BY T1.node_id
Hope this helps... using the script above, you can follow that if you need more groupings of different combination of days, just keep adding a union all and change the where clause criteria.
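None of the queries above directly produces the "both a weekday and the weekend" count, so here is a hedged Python sketch of the set logic behind the three numbers the question asks for. It is an illustration only: the fourth row uses 2012-02-04 (a Saturday) as one of the answers above does, since the question's weekend label was fake, and the variable names are made up.

```python
from datetime import date

# (node_id, client_id, day)
log = [
    (1, 102, date(2012, 2, 1)),   # Wednesday
    (2, 104, date(2012, 2, 1)),   # Wednesday
    (2, 106, date(2012, 2, 2)),   # Thursday
    (1, 106, date(2012, 2, 4)),   # Saturday (assumed weekend date)
]

def is_weekend(d: date) -> bool:
    # date.weekday(): Monday = 0 ... Sunday = 6
    return d.weekday() >= 5

weekday_clients = {c for _, c, d in log if not is_weekend(d)}
weekend_clients = {c for _, c, d in log if is_weekend(d)}
both_clients = weekday_clients & weekend_clients   # seen on both kinds of day

weekday_nodes = sorted({n for n, _, d in log if not is_weekend(d)})
weekend_nodes = sorted({n for n, _, d in log if is_weekend(d)})
both_nodes = sorted({n for n, c, _ in log if c in both_clients})

print(len(weekday_clients), "clients Mon-Fri via nodes", weekday_nodes)
print(len(weekend_clients), "clients Sat-Sun via nodes", weekend_nodes)
print(len(both_clients), "clients on both via nodes", both_nodes)
```

In SQL the same intersection could be expressed with INTERSECT or with a HAVING clause over two conditional counts; the sets here just make the distinct-client semantics explicit.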
