SQL group by number and replace characters

I have data stored in my database for mobile numbers.
I want to group by the column number in the database.
For example, some numbers may show as 44123456789 and 0123456789, which are the same number. How can I group these together?

SELECT DIGITS(column_name) FROM table_name
You can use this form (DIGITS is a DB2 function); assign the result to a variable, and then you can match its digits against the other numbers.
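A rough sketch of that idea, hedged: it assumes DB2 (where DIGITS applies to a numeric column and pads it with leading zeros) and that the trailing nine digits are enough to identify a number, as in the 44123456789 / 0123456789 example; column_name and table_name are the placeholders from above:
SELECT RIGHT(DIGITS(column_name), 9) AS short_nbr, COUNT(*) AS how_many
FROM table_name
GROUP BY RIGHT(DIGITS(column_name), 9)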

Not sure it really suits you, but you could build this kind of subquery:
SELECT ta.`phone_nbr`,
COALESCE(list.`normalized_nbr`,ta.`phone_nbr`) AS nbr
FROM `your_table` ta
LEFT OUTER JOIN (
SELECT
t.`phone_nbr`,
SUBSTRING(t.`phone_nbr`,2) AS normalized_nbr
FROM `your_table` t
WHERE LEFT(t.`phone_nbr`,1) = '0'
UNION
SELECT
t.`phone_nbr`,
sub.`filter_nbr` AS normalized_nbr
FROM `your_table` t,
( SELECT
SUBSTRING(t2.`phone_nbr`,2) AS filter_nbr
FROM `your_table` t2
WHERE LEFT(t2.`phone_nbr`,1) = '0') sub
WHERE LEFT(t.`phone_nbr`,1) != '0'
AND t.`phone_nbr` LIKE CONCAT('%',sub.`filter_nbr`)
) list
ON list.`phone_nbr` = ta.`phone_nbr`
It will return a list of phone numbers with their "normalized" number, i.e. with the 0 or international prefix removed when there is a duplicate match, and the raw number otherwise.
You can then use a GROUP BY clause on the nbr field and join on phone_nbr for the rest of your query.
It has some limits, as it can unfortunately group similar stripped numbers: +49123456789, +44123456789 and 0123456789 will all end up with the same normalized number.
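For example, the grouping step could be layered on top like this (schematic only; the body of the derived table is the full SELECT above, abbreviated here to keep the sketch short):
SELECT norm.nbr, COUNT(*) AS numbers_in_group
FROM (
/* paste the full SELECT ... LEFT OUTER JOIN ... query from above here */
) norm
GROUP BY norm.nbr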

Related

Is there a way to check if any items in a string array are in a string in Snowflake/Redshift?

I am looking for a way to check if a string contains any words in another field which is a single string that holds a list of items. Something like this...
id items (STRING)
1 burger;hotdog
I have a second dataset that might look like...
transaction_id description amount
10 cheeseburger 10
Now I need to grab the amount if the description matches any of the items in the first table; in this case it does match on the string burger. However, I can't seem to get the SQL right, since if I were to use LIKE ANY in Snowflake I'd need to pass in ('%burger%', '%hotdog%'), which are two separate strings, and I can't make those calls explicit because each id/item permutation may be different in the first table. In Redshift, when I try to use
CASE WHEN lower(t.description) SIMILAR TO '%(' || replace(items,';','|') || ')%' then amount END
I get the following error: Specified types or functions (one per INFO message) not supported on Redshift tables.
Thanks in advance!
If you're wanting a Snowflake answer:
WITH keys AS (
SELECT * FROM VALUES (1,'burger;hotdog') a(id,items)
), data AS (
SELECT * FROM VALUES (10,'cheeseburger',10) b(transaction_id, description, amount)
), seq_keys AS (
SELECT s.seq_id, f.value as key
FROM (
SELECT seq8() as seq_id, k.*
FROM keys AS k
) AS s
,lateral flatten(input=>split(s.items,';')) F
)
SELECT d.*, sk.*
FROM data d
JOIN seq_keys sk ON d.description ILIKE '%'||sk.key||'%'
gives:
TRANSACTION_ID DESCRIPTION AMOUNT SEQ_ID KEY
10 cheeseburger 10 0 "burger"
If you then take the distinct rows on the SEQ_ID, you can de-dupe when multiple keys match. I would also be inclined to add an ID to the "data" table.
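One hedged way to do that de-duplication in Snowflake is a QUALIFY clause on the final SELECT (this reuses the keys/data/seq_keys CTEs above; the tie-break on sk.key is arbitrary):
SELECT d.*, sk.*
FROM data d
JOIN seq_keys sk ON d.description ILIKE '%'||sk.key||'%'
QUALIFY ROW_NUMBER() OVER (PARTITION BY d.transaction_id ORDER BY sk.key) = 1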

Teradata column not sorting correctly

I'm trying to join two tables by a column, and sort a table by the same column.
Here is some example data from the two tables:
table.x
state
00039
01156
table.y
state
39
1156
How do I join and sort the tables in SQL assistant?
The simplest solution would be to cast both sides to integer, as @Andrew mentioned. You could use a simple cast, or trycast(...), which will try to cast the value and, if that fails, return NULL instead of an error:
select *
from x
inner join y on
trycast(x.state as integer) = trycast(y.state as integer)
order by y.state
Old answer (leaving this here for sake of future readers and what you can / can't do):
If you have a recent version of Teradata (you didn't specify it), you would also have the LPAD function. Assuming that y.state is not text but a number, we'd also need to cast it, as LPAD takes a string argument; if it is already text, omit the cast(...):
select *
from x
inner join y on
x.state = lpad(cast(y.state as varchar(5)), 5, '0')
order by y.state
If you don't have an LPAD function, then some dirty code with substring might come in handy:
select *
from x
inner join y on
x.state = substring('00000' from char_length(cast(y.state as varchar(5))) + 1) || cast(y.state as varchar(5))
order by y.state
The above assumes that you store numbers of at most 5 digits. If they can be longer (your sample data says 5), you need to adjust the code.
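As a quick sanity check of the padding (hedged; the exact behaviour of numeric-to-VARCHAR casts can vary by Teradata version), LPAD should turn the sample y values into the x format:
SELECT LPAD(CAST(39 AS VARCHAR(5)), 5, '0') AS padded_39      -- '00039'
, LPAD(CAST(1156 AS VARCHAR(5)), 5, '0') AS padded_1156       -- '01156'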

Emulate subquery with no main table in access

I can do this in SQL Server:
SELECT 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
(select sum(empid) from HR.employees) STOCK
but in Access the same query show me the next error:
Query input must contain at least one table or query
So which could be the best form to emulate this? Make a query with any other table looks dirty for me.
EDIT 1: HR.employees may have no data, but I still want to show the constants ('HERRAMIENTA ELÉCTRICA', 0) and 0 in the third column, maybe using ISNULL; that is not the problem here.
Why not select directly:
select 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
IIF(ISNULL(sum(empid)), 0, sum(empid)) AS STOCK
from HR.employees
This simply doesn't work in Access. You need a FROM clause.
So you need to have a dummy table with one record, even if you don't use a single field from that table.
SELECT 'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
(select sum(empid) from HR.employees) STOCK
FROM Dummy_Table
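If you don't already have a suitable one-row table, a minimal setup could be the following (the names are just placeholders, and Access runs one statement per query, so execute these separately and insert the single row only once):
CREATE TABLE Dummy_Table (ID INTEGER);
INSERT INTO Dummy_Table (ID) VALUES (1);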
Using this example as empty table:
with employ as
(select 2 as col from dual
minus
select 2 as col from dual)
The query is this one:
select 'HERRAM' as tipo,
0 as deprec,
coalesce(sum(col), 0) as STOCK
from employ;
coalesce(x, value) sets the column to value when X is null
In Access, you can use a system table, and Val and Nz for the zero value:
SELECT TOP 1
'HERRAMIENTA ELÉCTRICA' AS TIPO_PRODUCTO,
0 AS DEPRECIACION,
Val(Nz((select sum(empid) from HR.employees), 0)) AS STOCK
FROM
MSysObjects

Doing Math with 2 Subquerys

I have two subqueries, both calculating sums. I would like to do an arithmetic minus (-) on the results of the two queries, e.g. Query1: 400, Query2: 300, result should be 100.
Obviously a basic - in the query does not work; the minus works as MINUS on sets. How can I solve this? Do you have any ideas?
SELECT CustumersNo FROM Custumers WHERE
(
SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo
-
SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo
)
> 500
Using Informix - sorry missed that point
To get the original syntax to work, you would need to surround the sub-selects in parentheses:
SELECT CustumersNo
FROM Custumers
WHERE ((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo)
-
(SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo)
) > 500
Note that aggregates are defined to ignore nulls in the values they aggregate in standard SQL. However, the SUM of an empty set of rows is NULL, not zero.
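One minimal way of coping with that NULL directly, sketched against the corrected query above (NVL is the traditional Informix spelling; COALESCE also works on recent versions):
SELECT CustumersNo
FROM Custumers
WHERE (NVL((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo), 0)
-
NVL((SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo), 0)
) > 500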
You can get inventive and devise ways to always have a value for each customer listed in the roe table, such as:
SELECT CustomersNo
FROM (SELECT CustomersNo, SUM(value) AS net_credit
FROM (SELECT credit AS CustomersNo, +value AS value
FROM roe
UNION ALL
SELECT debit AS CustomersNo, -value AS value
FROM roe
) AS x
GROUP BY CustomersNo
) AS y
WHERE net_credit > 500;
You can also do that with an appropriate HAVING clause if you wish. Note that this avoids issues with customers who have credit entries but no debit entries or vice versa; all the entries that are present are treated appropriately.
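For reference, a sketch of that HAVING variant (the amount alias exists only for the sketch, and UNION ALL is used so that duplicate amounts are not collapsed):
SELECT CustomersNo
FROM (SELECT credit AS CustomersNo, +value AS amount FROM roe
UNION ALL
SELECT debit AS CustomersNo, -value AS amount FROM roe
) AS x
GROUP BY CustomersNo
HAVING SUM(amount) > 500;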
Your misspelling (or unorthodox spelling) of 'customers' is nearly as good as 'costumers'.
Something like what you tried should work. It may be a syntax problem, and it may depend on what type of SQL you are using. However, an approach like this would be more efficient:
Update: I see you were having a problem with nulls, so I updated it to handle nulls properly.
select CustumersNo from (
select CustumersNo,
sum(coalesce(roecredit.value,0)) - sum(coalesce(roedebit.value,0))
as balance
FROM Custumers
left join roe roecredit on roecredit.credit = Custumers.CustumersNo
left join roe roedebit on roedebit.debit = Custumers.CustumersNo
group by CustumersNo
) as t
where balance > 500
Caveat: I don't have experience with Informix specifically.

Aggregate SQL Function to grab only the first from each group

I have 2 tables - an Account table and a Users table. Each account can have multiple users. I have a scenario where I want to execute a single query/join against these two tables, but I want all the Account data (Account.*) and only the first set of user data (specifically their name).
Instead of doing a "min" or "max" on my aggregated group, I wanted to do a "first". But, apparently, there is no "First" aggregate function in TSQL.
Any suggestions on how to go about getting this query? Obviously, it is easy to get the cartesian product of Account x Users:
SELECT User.Name, Account.* FROM Account, User
WHERE Account.ID = User.Account_ID
But how might I go about only getting the first user from the product, based on the order of their User.ID?
Rather than grouping, go about it like this...
select
*
from account a
join (
select
account_id,
row_number() over (order by account_id, id)
- rank() over (order by account_id) as row_num
from user
) first on first.account_id = a.id and first.row_num = 0
I know my answer is a bit late, but it might help others. There is a way to achieve a First() and Last() in SQL Server, and here it is:
Stuff(Min(Convert(Varchar, DATE_FIELD, 126) + Convert(Varchar, DESIRED_FIELD)), 1, 23, '')
Use Min() for First() and Max() for Last(). The DATE_FIELD should be the date that determines if it is the first or last record. The DESIRED_FIELD is the field you want the first or the last value. What it does is :
Add the date in ISO format at the start of the string (23 characters long)
Append the DESIRED_FIELD to that string
Get the MIN/MAX value for that field (since it starts with the date, you will get the first or last record)
Stuff that concatenated string to remove the first 23 characters (the date part)
Here you go!
EDIT: I had problems with the first formula: when the DATE_FIELD has .000 as its milliseconds, SQL Server returns the date string with no milliseconds at all, thus removing the first 4 characters from the DESIRED_FIELD. I simply changed the format to "20" (without milliseconds) and it all works great. The only downside is that if you have two records created within the same second, the sort can possibly be messy... in which case you can revert to "126" for the format.
Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + Convert(Varchar, DESIRED_FIELD)), 1, 19, '')
EDIT 2: My original intent was to return the last (or first) NON NULL row. I was asked how to return the last or first row, whether it be null or not. Simply add an ISNULL to the DESIRED_FIELD: when you concatenate two strings with the + operator and one of them is NULL, the result is NULL. So use the following:
Stuff(Max(Convert(Varchar, DATE_FIELD, 20) + IsNull(Convert(Varchar, DESIRED_FIELD), '')), 1, 19, '')
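To make the mechanics concrete, here is a tiny self-contained example with invented data; the row with the latest date wins, and STUFF then strips the 19-character date prefix:
SELECT Stuff(Max(Convert(Varchar, d, 20) + IsNull(Convert(Varchar, n), '')), 1, 19, '') AS last_n
FROM (VALUES (CAST('20200101 09:00:00' AS datetime), 'Bob'),
(CAST('20210315 10:30:00' AS datetime), 'Alice')) AS t(d, n)
-- returns 'Alice'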
Select *
From Accounts a
Left Join (
Select u.*,
row_number() over (Partition By u.AccountKey Order By u.UserKey) as Ranking
From Users u
) as UsersRanked
on UsersRanked.AccountKey = a.AccountKey and UsersRanked.Ranking = 1
This can be simplified by using the Partition By clause. In the above, if an account has three users, then the subquery numbers them 1, 2, and 3, and for a different AccountKey it will reset the numbering. This means that for each unique AccountKey there will always be a 1, and potentially 2, 3, 4, etc.
Thus you filter on Ranking=1 to grab the first from each group.
This will give you one row per account, and if there is at least one user for that account, it will give you the user with the lowest key (because I use a left join, you will always get an account listing even if no user exists). Replace Order By u.UserKey with another field if you prefer that the first user be chosen alphabetically or by some other criterion.
I've benchmarked all the methods; the simplest and fastest way to achieve this is by using OUTER/CROSS APPLY:
SELECT u.Name, Account.* FROM Account
OUTER APPLY (SELECT TOP 1 * FROM User WHERE Account.ID = Account_ID ) as u
CROSS APPLY works just like INNER JOIN and fetches the rows where both tables are related, while OUTER APPLY works like LEFT OUTER JOIN and fetches all rows from the left table (Account here)
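So, if you only want accounts that have at least one user and "first" means the lowest User.ID, a CROSS APPLY sketch would be:
SELECT u.Name, Account.*
FROM Account
CROSS APPLY (SELECT TOP 1 Name
FROM [User]
WHERE [User].Account_ID = Account.ID
ORDER BY [User].ID) AS u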
You can use OUTER APPLY, see documentation.
SELECT User1.Name, Account.* FROM Account
OUTER APPLY
(SELECT TOP 1 Name
FROM [User]
WHERE Account.ID = [User].Account_ID
ORDER BY Name ASC) User1
SELECT (SELECT TOP 1 Name
FROM User
WHERE Account_ID = a.AccountID
ORDER BY UserID) [Name],
a.*
FROM Account a
The STUFF response from Dominic Goulet is slick. But, if your DATE_FIELD is SMALLDATETIME (instead of DATETIME), then the ISO 8601 length will be 19 instead of 23 (because SMALLDATETIME has no milliseconds) - so adjust the STUFF parameter accordingly or the return value from the STUFF function will be incorrect (missing the first four characters).
First and Last do not exist in SQL Server 2005 or 2008, but SQL Server 2012 has FIRST_VALUE and LAST_VALUE functions. I tried to implement the aggregates First and Last for SQL Server 2005 and hit the obstacle that SQL Server does not guarantee that the aggregate is calculated in a defined order. (See the SqlUserDefinedAggregateAttribute.IsInvariantToOrder property, which is not implemented.) This might be because the query analyser tries to execute the calculation of the aggregate on multiple threads and combine the results, which speeds up the execution but does not guarantee an order in which the elements are aggregated.
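If you are on SQL Server 2012 or later, a hedged sketch of FIRST_VALUE applied to the original question could look like this (DISTINCT collapses the per-user duplicates, since every selected column is constant within an account):
SELECT DISTINCT
a.*,
FIRST_VALUE(u.Name) OVER (PARTITION BY a.ID ORDER BY u.ID) AS FirstUserName
FROM Account a
JOIN [User] u ON u.Account_ID = a.ID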
Define "First". What you think of as first is a coincidence that normally has to do with clustered index order but should not be relied on (you can contrive examples that break it).
You are right not to use MAX() or MIN(). While tempting, consider the scenario where the first name and the last name are in separate fields. You might get names from different records.
Since it sounds like all your really care is that you get exactly one arbitrary record for each group, what you can do is just MIN or MAX an ID field for that record, and then join the table into the query on that ID.
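A minimal sketch of that approach, assuming the user table has ID and Account_ID columns as described in the question:
SELECT a.*, u.Name
FROM Account a
JOIN (SELECT Account_ID, MIN(ID) AS FirstUserID
FROM [User]
GROUP BY Account_ID) f ON f.Account_ID = a.ID
JOIN [User] u ON u.ID = f.FirstUserID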
There are a number of ways of doing this; here is a quick and dirty one.
Select (SELECT TOP 1 U.Name FROM Users U WHERE U.Account_ID = A.ID) AS [Name],
A.*
FROM Account A
(Slightly off-topic, but) I often run aggregate queries to list exception summaries, and then I want to know WHY a customer is in the results, so I use MIN and MAX to give two semi-random samples that I can look at in detail, e.g.
SELECT Customer.Id, COUNT(*) AS ProblemCount
, MIN(Invoice.Id) AS MinInv, MAX(Invoice.Id) AS MaxInv
FROM Customer
INNER JOIN Invoice on Invoice.CustomerId = Customer.Id
WHERE Invoice.SomethingHasGoneWrong=1
GROUP BY Customer.Id
Create and join with a subselect 'FirstUser' that returns the first user for each account
SELECT User.Name, Account.*
FROM Account, User,
(select min(user.id) id,account_id from User group by user.account_id) as firstUser
WHERE Account.ID = User.Account_ID
and User.id = firstUser.id and Account.ID = firstUser.account_id