SQL: Concatenate all fields matching a given key?


Suppose I have a SQL query like this:
SELECT
tickets.TicketNumber, history.remarks
FROM
AT_DeviceReplacement_Tickets tickets
INNER JOIN
AT_DeviceReplacement_Tickets_History history
ON tickets.TicketNumber = history.TicketNumber;
I get a table like this in response:
ticketNumber | remarks
-------------+------------
1 | "Hello, there is a problem."
1 | "Did you check the power cable?"
1 | "We plugged it in and now it works. Thank you!"
2 | "Hello, this is a new ticket."
Suppose that I want to write a query that will concatenate the remarks for each ticket and return a table like this:
ticketNumber | remarks
-------------+------------
1 | "Hello, there is a problem.Did you check the power cable?We plugged it in and now it works. Thank you!"
2 | "Hello, this is a new ticket."
Yes, in the real code, I've actually got these sorted by date, among other things, but just for the sake of discussion, how would I edit the above query to get the result I described?

Have a look at the following questions:
Can I Comma Delimit Multiple Rows Into One Column?
Is it possible to concatenate column values into a string using CTE?

The cleanest solution to this problem is DB dependent. Lentine's links show very ugly solutions for Oracle and SQL Server and a clean one for MySQL. The answer in PostgreSQL is also very short and easy.
SELECT tickets.ticket_number, string_agg(history.remarks, ', ')
FROM
AT_DeviceReplacement_Tickets tickets
INNER JOIN
AT_DeviceReplacement_Tickets_History history
ON tickets.Ticket_Number = history.Ticket_Number
GROUP BY tickets.ticket_number;
(Note you have both ticket_number and TicketNumber in your sample code.)
My guess is that Oracle and SQL Server either (1) have a similar aggregate function or (2) have the capability of defining your own aggregate functions. [For MySQL the equivalent aggregate is called GROUP_CONCAT.] What DB are you using?
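For anyone who wants to try the aggregate approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite, whose counterpart of string_agg / GROUP_CONCAT is group_concat(value, separator), and it shortens the table names to tickets and history for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (TicketNumber INTEGER PRIMARY KEY);
    CREATE TABLE history (TicketNumber INTEGER, remarks TEXT);
    INSERT INTO tickets VALUES (1), (2);
    INSERT INTO history VALUES
        (1, 'Hello, there is a problem.'),
        (1, 'Did you check the power cable?'),
        (1, 'We plugged it in and now it works. Thank you!'),
        (2, 'Hello, this is a new ticket.');
""")

# group_concat(value, separator) plays the role of string_agg here.
# An empty separator glues the remarks together as in the question.
rows = conn.execute("""
    SELECT tickets.TicketNumber, group_concat(history.remarks, '')
    FROM tickets
    INNER JOIN history ON tickets.TicketNumber = history.TicketNumber
    GROUP BY tickets.TicketNumber
    ORDER BY tickets.TicketNumber
""").fetchall()

for ticket_number, remarks in rows:
    print(ticket_number, remarks)
```

SQLite does not promise the order in which group_concat joins the rows of a group, so if the remarks must appear by date, the ordering has to be stated explicitly in whatever way your database supports.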

Related

SQL different null values in different rows

I have a quick question regarding writing a SQL query to obtain a complete entry from two or more entries where the data is missing in different columns.
This is the example, suppose I have this table:
Client Id | Name | Email
1234 | John | (null)
1244 | (null) | john#example.com
Would it be possible to write a query that would return the following?
Client Id | Name | Email
1234 | John | john#example.com
I am finding this particularly hard because these are 2 entries in the same table.
I apologize if this is trivial. I am still studying and learning SQL, and I wasn't able to come up with a solution for this. Although I've tried looking online, I couldn't phrase the question properly, so I couldn't find the answer I was after.
Many thanks in advance for the help!

Yes, but actually no.
It is possible to write a query that works with your example data.
But only under the assumption that the first part of the email address always equals the name.
SELECT clients.id,clients.name,bclients.email FROM clients
JOIN clients bclients ON upper(clients.name) = upper(substring(bclients.email from 0 for position('#' in bclients.email)));
db<>fiddle
Explanation:
We join the table onto itself, to get the information into one row.
For this we first find the position of the '#' in the email, then take the substring from the start (0) of the string up to, but not including, the # (the result of position).
To avoid case problems, the name and the substring are converted to uppercase for comparison.
(lowercase would work the same)
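The same self-join can be tried locally with a small sketch in Python's sqlite3 module. It assumes SQLite, so substr and instr stand in for substring and position, and it keeps the '#' separator used in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER, name TEXT, email TEXT);
    INSERT INTO clients VALUES
        (1234, 'John', NULL),
        (1244, NULL, 'john#example.com');
""")

# Join the table to itself: row "a" supplies id and name, row "b" the email.
# instr() finds the separator; substr() keeps everything before it.
merged = conn.execute("""
    SELECT a.id, a.name, b.email
    FROM clients a
    JOIN clients b
      ON upper(a.name) = upper(substr(b.email, 1, instr(b.email, '#') - 1))
""").fetchall()

print(merged)
```

Rows whose name or email is NULL never satisfy the join condition, which is why only the combined row comes back.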
The design is flawed
How can a client have multiple ids and different kinds of information about the same user at the same time?
I think you want to split the table between clients and users, so that a user can have multiple clients.
I recommend that you read up on database normalization, as it provides the knowledge necessary for successful database design.

SQL Query - More options and suggestions apart from pivoting

I'm new to SQL, so please don't mind if this is a silly question.
My table looks like this
I want to send only one email to each manager, telling him that these employees in his group failed to fill in their timesheets.
Currently I have pivoted the above table so that it looks like this
and I am sending emails by concatenating firstemp+secondemp+thirdemp+------
Can this be done in an easier way?

You can use the CONCAT() function to combine a single row's data into one column:
SELECT M_EMAIL,
CONCAT(FIRSTEMP, SECONDEMP, THIRDEMP, FOURTHEMP, FIFTHEMP...)
FROM your_table;
In SQL Server and PostgreSQL, CONCAT() treats NULL arguments as empty strings; in MySQL it returns NULL if any argument is NULL.

Please don't pivot, as the concat is really ugly to maintain (and will break if a more capable manager pops up with more subordinates than your concat columns).
The syntax depends on which database you use. For example, in MSSQL (SQL Server 2017 or later) you could do this:
select manager, m_email, STRING_AGG(employee, ', ') as subordinates
from Employee
group by manager, m_email
This result has only 1 row per manager and fixed number of columns regardless how many subordinates the manager has:
manager | m_email | subordinates
----------------------------------
A | A#A.COM | b, D
D | D#D.COM | e, h
I | I#I.COM | j
Play with the example here: http://sqlfiddle.com/#!18/73bb5/5
Another option is to just query the relevant data into the application layer and do the grouping there.
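As a rough illustration of the application-layer option, the sketch below groups unpivoted (manager, email, employee) rows in Python. The rows are hypothetical and modelled on the result table above; group_by_manager is an illustrative name, not part of any library.

```python
from collections import defaultdict

# Hypothetical rows as (manager, manager_email, employee), one per employee.
rows = [
    ("A", "A#A.COM", "b"),
    ("A", "A#A.COM", "D"),
    ("D", "D#D.COM", "e"),
    ("D", "D#D.COM", "h"),
    ("I", "I#I.COM", "j"),
]


def group_by_manager(rows):
    """Collect employees per (manager, email) and join them into one string."""
    grouped = defaultdict(list)
    for manager, email, employee in rows:
        grouped[(manager, email)].append(employee)
    return {key: ", ".join(names) for key, names in grouped.items()}


subordinates = group_by_manager(rows)
for (manager, email), names in subordinates.items():
    print(f"{email}: {names}")
```

Each dictionary entry corresponds to one email to send, regardless of how many employees a manager has.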

Transposing a field into fields

I have a query that produces a 2 field result: Email and Interest.
The result is millions of records. But there are about 100 distinct Interests.
I would like to run the query to produce a result that is 101 fields wide like this:
Email | Books | Cats | Dogs | ETC
Where the metric is the count of each.
With my knowledge of SQL thus far I'd have to use CASE WHEN. But I'd have to write 100 lines of code.
Is there a better way?

You could use the PIVOT statement, but it sounds like Teradata does not support that. PIVOT would also require typing in all the column names, so I don't think you can avoid that.
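One way to avoid typing the 100 expressions by hand is to generate them from the distinct interests. The sketch below does this in Python against an in-memory SQLite table; the table name email_interest, the sample rows, and the two quoting helpers are assumptions made for illustration, and the generated SUM(CASE WHEN ...) text has not been tried on Teradata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE email_interest (Email TEXT, Interest TEXT);
    INSERT INTO email_interest VALUES
        ('a@example.com', 'Books'), ('a@example.com', 'Cats'),
        ('a@example.com', 'Cats'),  ('b@example.com', 'Dogs');
""")


def quote_ident(name):
    """Quote a column alias, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


# Step 1: read the distinct interests that will become columns.
interests = [row[0] for row in conn.execute(
    "SELECT DISTINCT Interest FROM email_interest ORDER BY Interest")]

# Step 2: build one counting expression per interest.
columns = ",\n    ".join(
    f"SUM(CASE WHEN Interest = {quote_literal(i)} THEN 1 ELSE 0 END) AS {quote_ident(i)}"
    for i in interests
)
pivot_sql = (
    f"SELECT Email,\n    {columns}\n"
    "FROM email_interest\nGROUP BY Email\nORDER BY Email"
)

# Step 3: run the generated statement.
pivot_rows = conn.execute(pivot_sql).fetchall()
print(pivot_sql)
print(pivot_rows)
```

The generated statement can also be printed once and pasted into a query tool if running it from a script is not an option.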

SQL: Select distinct based on regular expression

Basically, I'm dealing with a horribly set up table that I'd love to rebuild, but am not sure I can at this point.
So, the table is of addresses, and it has a ton of similar entries for the same address. But there are sometimes slight variations in the address (i.e., a room # is tacked on IN THE SAME COLUMN, ugh).
Like this:
id | place_name | place_street
1 | Place Name One | 1001 Mercury Blvd
2 | Place Name Two | 2388 Jupiter Street
3 | Place Name One | 1001 Mercury Blvd, Suite A
4 | Place Name, One | 1001 Mercury Boulevard
5 | Place Nam Two | 2388 Jupiter Street, Rm 101
What I would like to do in SQL (this is mssql), if possible, is a query like:
SELECT DISTINCT place_name, place_street where [the first 4 letters of the place_name are the same] && [the first 4 characters of the place_street are the same].
to, I guess at this point, get:
Plac | 1001
Plac | 2388
Basically, then I can figure out what are the main addresses I have to break out into another table to normalize this, because the rest are just slight derivations.
I hope that makes sense.
I've done some research and I see people using regular expressions in SQL, but a lot of them seem to be using C scripts or something. Do I have to write regex functions and save them into the SQL Server before executing any regular expressions?
Any direction on whether I can just write them in SQL or if I have another step to go through would be great.
Or on how to approach this problem.
Thanks in advance!

Use the SQL function LEFT:
SELECT DISTINCT LEFT(place_name, 4), LEFT(place_street, 4)
FROM your_table;

I don't think you need regular expressions to get the results you describe. You just want to truncate the columns and group by the results, which will effectively give you distinct values.
SELECT left(place_name, 4), left(place_street, 4), count(*)
FROM AddressTable
GROUP BY left(place_name, 4), left(place_street, 4)
The count(*) column isn't necessary, but it gives you some idea of which values might have the most (possibly) duplicate address rows in common.
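To see what the grouped query returns for the sample rows in the question, here is a small sketch using Python's sqlite3 module. SQLite is an assumption here, and since it has no LEFT function, substr(col, 1, 4) takes its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AddressTable (id INTEGER, place_name TEXT, place_street TEXT);
    INSERT INTO AddressTable VALUES
        (1, 'Place Name One', '1001 Mercury Blvd'),
        (2, 'Place Name Two', '2388 Jupiter Street'),
        (3, 'Place Name One', '1001 Mercury Blvd, Suite A'),
        (4, 'Place Name, One', '1001 Mercury Boulevard'),
        (5, 'Place Nam Two', '2388 Jupiter Street, Rm 101');
""")

# Group on the first four characters of each column and count the rows
# that fall into each (name prefix, street prefix) bucket.
prefix_counts = conn.execute("""
    SELECT substr(place_name, 1, 4), substr(place_street, 1, 4), count(*)
    FROM AddressTable
    GROUP BY substr(place_name, 1, 4), substr(place_street, 1, 4)
    ORDER BY 1, 2
""").fetchall()

print(prefix_counts)
```

The five sample rows collapse into the two prefix pairs the asker expected, with the count showing how many variants each one covers.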

I would recommend you look into Fuzzy Search Operations in SQL Server. You can match the results much better than what you are trying to do. Just google sql server fuzzy search.

Assuming at least SQL Server 2005 for the CTE:
;with cteCommonAddresses as (
select left(place_name, 4) as LeftName, left(place_street,4) as LeftStreet
from Address
group by left(place_name, 4), left(place_street,4)
having count(*) > 1
)
select a.id, a.place_name, a.place_street
from cteCommonAddresses c
inner join Address a
on c.LeftName = left(a.place_name,4)
and c.LeftStreet = left(a.place_street,4)
order by a.place_name, a.place_street, a.id

SQL Query with multiple values in one column

I've been beating my head on the desk trying to figure this one out. I have a table that stores job information, and reasons for a job not being completed. The reasons are numeric: 01, 02, 03, etc. You can have two reasons for a pending job. If you select two reasons, they are stored in the same column, separated by a comma. This is an example from the JOBID table:
Job_Number User_Assigned PendingInfo
1 user1 01,02
There is another table named Pending, that stores what those values actually represent. 01=Not enough info, 02=Not enough time, 03=Waiting Review. Example:
Pending_Num PendingWord
01 Not Enough Info
02 Not Enough Time
What I'm trying to do is query the database to give me all the job numbers, users, pendinginfo, and pending reason. I can break out the first value, but can't figure out how to do the second. What my limited skills have so far:
select Job_number,user_assigned,SUBSTRING(pendinginfo,0,3),pendingword
from jobid,pending
where
SUBSTRING(pendinginfo,0,3)=pending.pending_num and
pendinginfo!='00,00' and
pendinginfo!='NULL'
What I would like to see for this example would be:
Job_Number User_Assigned PendingInfo PendingWord PendingInfo PendingWord
1 User1 01 Not Enough Info 02 Not Enough Time
Thanks in advance

You really shouldn't store multiple items in one column if your SQL is ever going to want to process them individually. The "SQL gymnastics" you have to perform in those cases are both ugly hacks and performance degraders.
The ideal solution is to split the individual items into separate columns and, for 3NF, move those columns to a separate table as rows if you really want to do it properly (but baby steps are probably okay if you're sure there will never be more than two reasons in the short-medium term).
Then your queries will be both simpler and faster.
However, if that's not an option, you can use the afore-mentioned SQL gymnastics to do something like:
where find ( ',' || fld || ',', ',02,' ) > 0
assuming your SQL dialect has a string search function (find in this case, but I think charindex for SQLServer).
This will ensure all sub-columns begin and end with a comma (comma plus field plus comma) and look for a specific desired value (with the commas on either side to ensure it's a full sub-column match).
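A minimal sketch of that comma-wrapping search, run through Python's sqlite3 module: it assumes SQLite, where instr(haystack, needle) is the string search function, and jobs_with_reason is a hypothetical helper name. The third row is made up to show that 102 is not mistaken for 02.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobid (Job_Number INTEGER, User_Assigned TEXT, PendingInfo TEXT);
    INSERT INTO jobid VALUES
        (1, 'user1', '01,02'),
        (2, 'user2', '03'),
        (3, 'user3', '102,01');
""")


def jobs_with_reason(code):
    """Return job numbers whose comma-separated PendingInfo contains code."""
    needle = "," + code + ","
    return conn.execute(
        """
        SELECT Job_Number
        FROM jobid
        WHERE instr(',' || PendingInfo || ',', ?) > 0
        ORDER BY Job_Number
        """,
        (needle,),
    ).fetchall()


print(jobs_with_reason("02"))
print(jobs_with_reason("01"))
```

Because the column is wrapped in an expression, an ordinary index on PendingInfo cannot help here, which is the performance cost mentioned above.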
If you can't control what the application puts in that column, I would opt for the DBA solution - DBA solutions are defined as those a DBA has to do to work around the inadequacies of their users :-).
Create two new columns in that table and make an insert/update trigger which will populate them with the two reasons that a user puts into the original column.
Then query those two new columns for specific values rather than trying to split apart the old column.
This means that the cost of splitting is only on row insert/update, not on every single select, amortising that cost efficiently.
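A rough sketch of the trigger idea, written for SQLite through Python's sqlite3 module. The column names Reason1 and Reason2 and the trigger names are hypothetical, and the substr offsets assume two-digit codes with at most two reasons, as in the question; trigger syntax differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobid (
        Job_Number  INTEGER PRIMARY KEY,
        PendingInfo TEXT,
        Reason1     TEXT,   -- hypothetical new column
        Reason2     TEXT    -- hypothetical new column
    );

    -- Split the combined column whenever a row is inserted.
    CREATE TRIGGER jobid_split_insert AFTER INSERT ON jobid
    BEGIN
        UPDATE jobid
        SET Reason1 = substr(NEW.PendingInfo, 1, 2),
            Reason2 = nullif(substr(NEW.PendingInfo, 4, 2), '')
        WHERE Job_Number = NEW.Job_Number;
    END;

    -- Split it again whenever the combined column is changed.
    CREATE TRIGGER jobid_split_update AFTER UPDATE OF PendingInfo ON jobid
    BEGIN
        UPDATE jobid
        SET Reason1 = substr(NEW.PendingInfo, 1, 2),
            Reason2 = nullif(substr(NEW.PendingInfo, 4, 2), '')
        WHERE Job_Number = NEW.Job_Number;
    END;
""")

conn.execute(
    "INSERT INTO jobid (Job_Number, PendingInfo) VALUES (1, '01,02'), (2, '03'), (3, '01')"
)
conn.execute("UPDATE jobid SET PendingInfo = '02,03' WHERE Job_Number = 2")

split_rows = conn.execute(
    "SELECT Job_Number, Reason1, Reason2 FROM jobid ORDER BY Job_Number"
).fetchall()
print(split_rows)
```

Selects can then filter on Reason1 and Reason2 directly, and the application keeps writing PendingInfo as before.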
Still, my answer is to re-do the schema. That will be the best way in the long term in terms of speed, readable queries and maintainability.

I hope you are just maintaining the code and it's not a brand new implementation.
Please consider using a different approach, with a support table like this:
JOBS TABLE
jobID | userID
--------------
1 | user13
2 | user32
3 | user44
--------------
PENDING TABLE
pendingID | pendingText
---------------------------
01 | Not Enough Info
02 | Not Enough Time
---------------------------
JOB_PENDING TABLE
jobID | pendingID
-----------------
1 | 01
1 | 02
2 | 01
3 | 03
3 | 01
-----------------
You can easily query these tables using JOINs or subqueries.
If you need backward compatibility for your software, you can add a view to achieve it.
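Here is a small sketch of that layout with a join and a compatibility view, using Python's sqlite3 module. SQLite and the view name jobid_legacy are assumptions; the view rebuilds the old comma-separated column with group_concat so existing code could keep reading it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (jobID INTEGER PRIMARY KEY, userID TEXT);
    CREATE TABLE pending (pendingID TEXT PRIMARY KEY, pendingText TEXT);
    CREATE TABLE job_pending (jobID INTEGER, pendingID TEXT);

    INSERT INTO jobs VALUES (1, 'user13'), (2, 'user32');
    INSERT INTO pending VALUES ('01', 'Not Enough Info'), ('02', 'Not Enough Time');
    INSERT INTO job_pending VALUES (1, '01'), (1, '02'), (2, '01');

    -- Hypothetical view that mimics the old one-column layout.
    CREATE VIEW jobid_legacy AS
    SELECT j.jobID AS Job_Number,
           j.userID AS User_Assigned,
           group_concat(jp.pendingID, ',') AS PendingInfo
    FROM jobs j
    JOIN job_pending jp ON jp.jobID = j.jobID
    GROUP BY j.jobID, j.userID;
""")

# One row per (job, reason), with the reason text looked up by a plain join.
reasons = conn.execute("""
    SELECT j.jobID, j.userID, p.pendingID, p.pendingText
    FROM jobs j
    JOIN job_pending jp ON jp.jobID = j.jobID
    JOIN pending p ON p.pendingID = jp.pendingID
    ORDER BY j.jobID, p.pendingID
""").fetchall()

legacy = conn.execute("SELECT * FROM jobid_legacy ORDER BY Job_Number").fetchall()
print(reasons)
print(legacy)
```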

I have tables like:
Events
---------
eventId int
eventTypeIds nvarchar(50)
...
EventTypes
--------------
eventTypeId
Description
...
Each Event can have multiple eventtypes specified.
All I do is write 2 procedures in my site code, not SQL code.
One procedure converts the table field (eventTypeIds) value like "3,4,15,6" into a ViewState array, so I can use it anywhere in code.
The other procedure does the opposite: it collects any options you checked and converts it in
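The answer's own site code is not shown, so the following is only a guess at what such a round trip could look like, written in Python. The function names parse_ids and join_ids are hypothetical.

```python
def parse_ids(field):
    """Turn a stored value such as "3,4,15,6" into a list of ints."""
    return [int(part) for part in field.split(",") if part.strip()]


def join_ids(ids):
    """Turn a list of ids back into the comma-separated stored form."""
    return ",".join(str(i) for i in ids)


event_type_ids = parse_ids("3,4,15,6")
print(event_type_ids)
print(join_ids(event_type_ids))
```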

If changing the schema is an option (which it probably should be) shouldn't you implement a many-to-many relationship here so that you have a bridging table between the two items? That way, you would store the number and its wording in one table, jobs in another, and "failure reasons for jobs" in the bridging table...

Have a look at a similar question I answered here
;WITH Numbers AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS N
FROM JobId
),
Split AS
(
SELECT JOB_NUMBER, USER_ASSIGNED, SUBSTRING(PENDING_INFO, Numbers.N, CHARINDEX(',', PENDING_INFO + ',', Numbers.N) - Numbers.N) AS PENDING_NUM
FROM JobId
JOIN Numbers ON Numbers.N <= DATALENGTH(PENDING_INFO) + 1
AND SUBSTRING(',' + PENDING_INFO, Numbers.N, 1) = ','
)
SELECT *
FROM Split JOIN Pending ON Split.PENDING_NUM = Pending.PENDING_NUM
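The basic idea is that you have to multiply each row as many times as there are PENDING_NUMs, then extract the appropriate part of the string. Note that the Numbers CTE yields only as many numbers as JobId has rows, so the table needs at least as many rows as the longest PENDING_INFO has characters.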
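The same "one output row per list item" idea can be tried in SQLite with a recursive CTE, sketched below through Python's sqlite3 module. This swaps the numbers-table technique above for recursion and is not the answer's query; the PENDING_WORD column name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JobId (JOB_NUMBER INTEGER, USER_ASSIGNED TEXT, PENDING_INFO TEXT);
    CREATE TABLE Pending (PENDING_NUM TEXT, PENDING_WORD TEXT);
    INSERT INTO JobId VALUES (1, 'user1', '01,02'), (2, 'user2', '03');
    INSERT INTO Pending VALUES
        ('01', 'Not Enough Info'),
        ('02', 'Not Enough Time'),
        ('03', 'Waiting Review');
""")

split_rows = conn.execute("""
    WITH RECURSIVE Split(JOB_NUMBER, USER_ASSIGNED, PENDING_NUM, rest) AS (
        -- Seed: nothing extracted yet; a trailing comma terminates the last item.
        SELECT JOB_NUMBER, USER_ASSIGNED, NULL, PENDING_INFO || ','
        FROM JobId
        UNION ALL
        -- Step: peel off the text before the first comma, keep the remainder.
        SELECT JOB_NUMBER, USER_ASSIGNED,
               substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
        FROM Split
        WHERE rest <> ''
    )
    -- The seed rows have a NULL PENDING_NUM and drop out of the join.
    SELECT s.JOB_NUMBER, s.USER_ASSIGNED, s.PENDING_NUM, p.PENDING_WORD
    FROM Split s
    JOIN Pending p ON p.PENDING_NUM = s.PENDING_NUM
    ORDER BY s.JOB_NUMBER, s.PENDING_NUM
""").fetchall()

print(split_rows)
```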

While I agree with the DBA perspective of not storing multiple values in a single field, it is doable, as shown below, and can be practical for application logic and for some performance issues. Let's say you have 10000 user groups, each with an average of 1000 members. You may want to have a table user_groups with columns such as groupID and membersID. Your membersID column could be populated like this:
(',10,2001,20003,333,4520,') each number being a memberID, all separated with a comma. Also add a comma at the start and end of the data. Then your select would use LIKE '%,someID,%'.
If you cannot change your data ('01,02,03' or similar) and, let's say, you want rows containing 01, you can still use " select ... WHERE col LIKE '01,%' OR col LIKE '%,01' OR col LIKE '%,01,%' OR col = '01' ", which ensures a match at the start, at the end, inside, or as the only value, while avoiding similar numbers (e.g. 101).
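A quick sketch of the delimiter-wrapped LIKE search, using Python's sqlite3 module. SQLite, the three sample groups, and the helper name groups_with_member are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_groups (groupID INTEGER, membersID TEXT);
    INSERT INTO user_groups VALUES
        (1, ',10,2001,20003,'),
        (2, ',333,4520,'),
        (3, ',101,20,');
""")


def groups_with_member(member_id):
    """Return groupIDs whose comma-wrapped membersID contains member_id."""
    # The surrounding commas keep 10 from matching 101 or 2001.
    pattern = f"%,{int(member_id)},%"
    return [row[0] for row in conn.execute(
        "SELECT groupID FROM user_groups WHERE membersID LIKE ? ORDER BY groupID",
        (pattern,),
    )]


print(groups_with_member(10))
print(groups_with_member(20))
```

A pattern that starts with % generally forces a full scan of the column, so this stays practical only while the table is small enough for that.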
