Conditional update column B with modified value based on column A - sql

I am facing a large table with data that was imported from a CSV. However, the delimiters in the CSV were not sanitized, so the input data looked something like this:
alex#mail.com:Alex
dummy#mail.com;Bob
foo#bar.com:Foo
spam#yahoo.com;Spam
whatever#mail.com:Whatever
During the import, : was defined as the delimiter, so each row that used ; as its delimiter was not imported properly. This resulted in a table structured like this:
| ID | MAIL                | USER     |
|----|---------------------|----------|
| 1  | alex#mail.com       | ALEX     |
| 2  | dummy#mail.com;Bob  | NULL     |
| 3  | foo#bar.com         | Foo      |
| 4  | spam#yahoo.com;Spam | NULL     |
| 5  | whatever#mail.com   | Whatever |
As reimporting is not an option, I was thinking about manually sanitizing the data in the affected rows using SQL queries. So I tried to combine SELECT and UPDATE statements, filtering rows WHERE USER IS NULL and updating both columns with the correct values where applicable.

What you need are string functions. Reading a bit, I find that Google BigQuery has STRPOS() and SUBSTR().
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#substr
https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#strpos
An update query to fix the situation you are describing looks like this:
update table_name
set mail = SUBSTR(mail, 1, STRPOS(mail, ';') - 1),
    user = SUBSTR(mail, STRPOS(mail, ';') + 1)
where user is null
The idea here is to split mail into its two parts: the part before the ; and the part after. Note that in standard SQL both expressions in the SET clause are evaluated against the row's original mail value, so user still receives the correct suffix even though mail is updated in the same statement. Hope this helps.
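It may be worth previewing the split with a plain SELECT before running the UPDATE; a quick sketch against the same placeholder table name:
-- preview the repaired values for the affected rows only (table_name is a placeholder)
SELECT mail,
       SUBSTR(mail, 1, STRPOS(mail, ';') - 1) AS new_mail,
       SUBSTR(mail, STRPOS(mail, ';') + 1) AS new_user
FROM table_name
WHERE user IS NULL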

Related

Access text count in query design

I am new to Access and am trying to develop a query that will allow me to count the number of occurrences of one word in each field from a table with 15 fields.
The table simply stores test results for employees. There is one table that stores the employee identification - id, name, etc.
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Is there an answer through Query Design, or is code required?
The solution, whether Query Design, or code, would be greatly appreciated!
Firstly, one of the reasons you are struggling to obtain the desired result for what should be a relatively straightforward request is that your data does not follow database normalisation rules; consequently, you are working against the natural operation of an RDBMS when querying your data.
From your description, I assume that the fields A1 through A15 are answers to questions on a test.
By representing these as separate fields within your database, aside from the inherent difficulty in querying the resulting data (as you have discovered), if ever you wanted to add or remove a question to/from the test, you would be forced to restructure your entire database!
Instead, I would suggest structuring your table in the following way:
Results
+------------+------------+-----------+
| EmployeeID | QuestionID | Result    |
+------------+------------+-----------+
| 1          | 1          | correct   |
| 1          | 2          | incorrect |
| ...        | ...        | ...       |
| 1          | 15         | correct   |
| 2          | 1          | correct   |
| 2          | 2          | correct   |
| ...        | ...        | ...       |
+------------+------------+-----------+
This table would be a junction table (a.k.a. linking / cross-reference table) in your database, supporting a many-to-many relationship between the tables Employees & Questions, which might look like the following:
Employees
+--------+-----------+-----------+------------+------------+-----+
| Emp_ID | Emp_FName | Emp_LName | Emp_DOB    | Emp_Gender | ... |
+--------+-----------+-----------+------------+------------+-----+
| 1      | Joe       | Bloggs    | 01/01/1969 | M          | ... |
| ...    | ...       | ...       | ...        | ...        | ... |
+--------+-----------+-----------+------------+------------+-----+
Questions
+-------+------------------------------------------------------------+--------+
| Qu_ID | Qu_Desc                                                    | Qu_Ans |
+-------+------------------------------------------------------------+--------+
| 1     | What is the meaning of life, the universe, and everything? | 42     |
| ...   | ...                                                        | ...    |
+-------+------------------------------------------------------------+--------+
With this structure, if ever you wish to add or remove a question from the test, you can simply add or remove a record from the table, without needing to restructure your database or rewrite any of the queries, forms, or reports which depend upon the existing structure.
Furthermore, since the result of an answer is likely to be a binary correct or incorrect, this would be better (and far more efficiently) represented using a Boolean True/False data type, e.g.:
Results
+------------+------------+--------+
| EmployeeID | QuestionID | Result |
+------------+------------+--------+
| 1          | 1          | True   |
| 1          | 2          | False  |
| ...        | ...        | ...    |
| 1          | 15         | True   |
| 2          | 1          | True   |
| 2          | 2          | True   |
| ...        | ...        | ...    |
+------------+------------+--------+
Not only does this consume less space in your database, but it may be indexed far more efficiently (yielding faster queries), and it removes all ambiguity and potential for error surrounding typos and case sensitivity.
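As a rough sketch in Access SQL (the table and field names are the illustrative ones used above; in practice you might equally build this in the table designer, making Result a Yes/No field):
CREATE TABLE Results (
    EmployeeID LONG,
    QuestionID LONG,
    Result YESNO,
    CONSTRAINT pk_Results PRIMARY KEY (EmployeeID, QuestionID)
)
The composite primary key ensures each employee can have only one result per question.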
With this new structure, if you wanted to see the number of correct answers for each employee, the query can be something as simple as:
select results.employeeid, count(*)
from results
where results.result = true
group by results.employeeid
Alternatively, if you wanted to view the number of employees answering each question correctly (for example, to understand which questions most employees got wrong), you might use something like:
select results.questionid, count(*)
from results
where results.result = true
group by results.questionid
The above are obviously very basic example queries, and you would likely want to join the Results table to an Employees table and a Questions table to obtain richer information about the results.
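For example, a sketch of such a join against the illustrative tables above, listing each incorrect answer together with the employee's name and the question text (the parentheses around the first join are required by Access when chaining joins):
select e.Emp_FName, e.Emp_LName, q.Qu_Desc
from (Results as r
inner join Employees as e on e.Emp_ID = r.EmployeeID)
inner join Questions as q on q.Qu_ID = r.QuestionID
where r.Result = False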
Contrast the above with your current database structure -
Per your original question:
The second table has 15 fields - A1 through A15 with the words correct or incorrect in each field. I need the total number of incorrect occurrences for each field, not for the entire table.
Assuming that you want to view the number of incorrect answers by employee, you are forced to use an incredibly messy query such as the following:
select
employeeid,
iif(A1='incorrect',1,0)+
iif(A2='incorrect',1,0)+
iif(A3='incorrect',1,0)+
iif(A4='incorrect',1,0)+
iif(A5='incorrect',1,0)+
iif(A6='incorrect',1,0)+
iif(A7='incorrect',1,0)+
iif(A8='incorrect',1,0)+
iif(A9='incorrect',1,0)+
iif(A10='incorrect',1,0)+
iif(A11='incorrect',1,0)+
iif(A12='incorrect',1,0)+
iif(A13='incorrect',1,0)+
iif(A14='incorrect',1,0)+
iif(A15='incorrect',1,0) as IncorrectAnswers
from
YourTable
Here, notice that the answer numbers are also hard-coded into the query, meaning that if you decide to add a new question or remove an existing question, not only would you need to restructure your entire database, but queries such as the above would also need to be rewritten.
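And if you literally want the number of incorrect answers per field, as the original question asks, the current structure forces something just as repetitive (again assuming the table is called YourTable):
select
sum(iif(A1='incorrect',1,0)) as A1_Incorrect,
sum(iif(A2='incorrect',1,0)) as A2_Incorrect,
sum(iif(A3='incorrect',1,0)) as A3_Incorrect,
sum(iif(A4='incorrect',1,0)) as A4_Incorrect,
sum(iif(A5='incorrect',1,0)) as A5_Incorrect,
sum(iif(A6='incorrect',1,0)) as A6_Incorrect,
sum(iif(A7='incorrect',1,0)) as A7_Incorrect,
sum(iif(A8='incorrect',1,0)) as A8_Incorrect,
sum(iif(A9='incorrect',1,0)) as A9_Incorrect,
sum(iif(A10='incorrect',1,0)) as A10_Incorrect,
sum(iif(A11='incorrect',1,0)) as A11_Incorrect,
sum(iif(A12='incorrect',1,0)) as A12_Incorrect,
sum(iif(A13='incorrect',1,0)) as A13_Incorrect,
sum(iif(A14='incorrect',1,0)) as A14_Incorrect,
sum(iif(A15='incorrect',1,0)) as A15_Incorrect
from YourTable
With the normalised Results table, the same information is a simple GROUP BY on QuestionID.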

Generate rows from input array

Let's assume I have a table with many records called comments, and each record includes only a text body:
CREATE TABLE comments(id INT NOT NULL, body TEXT NOT NULL, PRIMARY KEY(id));
INSERT INTO comments VALUES (generate_series(1,100), md5(random()::text));
Now, I have an input array with N substrings, with arbitrary length. For example:
abc
xyzw
123456
not_found
For each input value, I want to return all rows that match a certain condition.
For example, given that the table includes the following records:
| id | body        |
| -- | ----------- |
| 11 | abcd1234567 |
| 22 | unkown12    |
| 33 | abxyzw      |
| 44 | 12345abc    |
| 55 | found       |
I need a query that returns the following result:
| substring | comments.id | comments.body |
| --------- | ----------- | ------------- |
| abc       | 11          | abcd1234567   |
| abc       | 44          | 12345abc      |
| xyzw      | 33          | abxyzw        |
| 123456    | 11          | abcd1234567   |
So far, I have this SQL query:
SELECT substrings, comments.id, comments.body
FROM unnest(ARRAY[
'abc',
'xyzw',
'123456',
'not_found'
]) AS substrings
JOIN comments ON comments.id IN (
SELECT id
FROM comments as inner_comments
WHERE inner_comments.body LIKE ('%' || substrings || '%')
);
But the database client gets stuck for more than 10 minutes. Am I missing something about joins?
Please note that this is a simplified example of my problem. My current check on the comment is not a LIKE statement, but a complex switch-case statement of different functions (fuzzy matching).
The detour through IN is unnecessary and, unless the optimizer can rewrite it (which it likely cannot), adds overhead. Try whether it gets better without it.
SELECT un.substring,
       comments.id,
       comments.body
FROM unnest(ARRAY['abc',
                  'xyzw',
                  '123456',
                  'not_found']) un (substring)
INNER JOIN comments
        ON comments.body LIKE ('%' || un.substring || '%');
But indexes still cannot be used here because of the leading wildcard. You might want to look at Full Text Search and see what options it gives you to improve the situation.
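If the matching rules can be expressed in terms of words rather than arbitrary substrings, a minimal PostgreSQL full-text sketch would look like this (assuming the comments table above and English text; note that full-text search matches lexemes, not substrings, so whether it fits depends on your fuzzy-matching logic):
-- expression index so the @@ match below can use it
CREATE INDEX comments_body_fts ON comments USING gin (to_tsvector('english', body));

-- one output row per (search term, matching comment) pair
SELECT un.term, c.id, c.body
FROM unnest(ARRAY['abc', 'xyzw', '123456', 'not_found']) AS un(term)
JOIN comments c
  ON to_tsvector('english', c.body) @@ plainto_tsquery('english', un.term);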
Basically you are performing a FULLTEXT search on a column that most likely doesn't have a FULLTEXT index.
A first step would be to add a FULLTEXT index to your body column (see details here) and then perform the search using CONTAINS. But, quite honestly, since you want to perform fuzzy matching, you cannot rely on SQL Server to perform the search - it would just not work properly. You will need an indexing service such as ElasticSearch, CloudSearch, Azure Search, etc.

Replacing data from one column in a table with another using an unrelated table

I am working on updating an existing system which displays a table on a web form to show who has called out of work. The data is pulled from a database consisting of two tables. One contains the actual data recorded from the call, and the other is a 'Legend' of sorts to convert a single digit in a column to a plain-text description. For example:
Options Table
|-----------|----------|-------------|
| LevelName | OptionNo | Description |
|-----------|----------|-------------|
| LorA      | 1        | Late        |
|-----------|----------|-------------|
| LorA      | 2        | Absent      |
|-----------|----------|-------------|
| TorT      | 1        | Today       |
|-----------|----------|-------------|
| TorT      | 2        | Tomorrow    |
|-----------|----------|-------------|
The second table has a layout like this (only showing relevant fields):
CallData Table
|--------------|---------------|----------------|
| EmployeeName | Todaytomorrow | Late_Or_Absent |
|--------------|---------------|----------------|
| John         | 1             | 1              |
|--------------|---------------|----------------|
| Jane         | 2             | 2              |
|--------------|---------------|----------------|
What I am having trouble with is how to relate the two tables in order to replace the integer representation in the CallData table with the corresponding value in the Options table.
I have tried creating a temporary table by selecting all from both tables; however, when I attempt to replace the values, only those which happen to align after the join are replaced.
Here is the SQL I currently have:
/*Create TempTable to be updated*/
SELECT * INTO #tempStore FROM CallData, [Options]
WHERE CallData.EmpName IS NOT NULL AND call_completed = '1'
/*Query Pertinant Info*/
SELECT [CallinID], [EmpName], [TodayTomorrow] AS [When], [Late_Or_Absent] AS [Type], [Late_Reason] AS [Reason],
[Absent_Reason], [Early_Reason], [Contact_Number], [Comment], [created_dt] FROM [#tempStore]
/*Update Type to varChar*/
ALTER TABLE [#tempStore]
ALTER COLUMN [TodayTomorrow] nvarchar(20)
/*Perform TorT Update*/
UPDATE [#tempStore] SET [#tempStore].TodayTomorrow = [#tempStore].[Description]
WHERE [#tempStore].LevelName = 'TorT' AND [#tempStore].TodayTomorrow = CONVERT(nvarchar(1), [#tempStore].[OptionNo])
Any and all help is much appreciated.
For this example, I think the following will work:
select cd.EmployeeName
,optTT.Description TodayTomorrow
,optLE.Description Late_or_Absent
from CallData as cd
join [Options] as optTT on optTT.OptionNo = cd.TodayTomorrow and optTT.LevelName = 'TorT'
join [Options] as optLE on optLE.OptionNo = cd.Late_Or_Absent and optLE.LevelName = 'LorA'
Edited: added joining on different levels. If there are more than two levels, a more general solution is required.
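If some CallData rows could hold a code (or a NULL) that has no matching Options entry, LEFT JOINs would keep those rows visible instead of silently dropping them; a sketch under that assumption:
select cd.EmployeeName
,optTT.Description TodayTomorrow
,optLE.Description Late_or_Absent
from CallData as cd
left join [Options] as optTT on optTT.OptionNo = cd.TodayTomorrow and optTT.LevelName = 'TorT'
left join [Options] as optLE on optLE.OptionNo = cd.Late_Or_Absent and optLE.LevelName = 'LorA'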

How can I get a pivot table with concatenated values?

I have the following data:
| ID       | TYPE     | USER_ID  |
|----------|----------|----------|
| 1        | A        | 7        |
| 1        | A        | 8        |
| 1        | B        | 6        |
| 2        | A        | 9        |
| 2        | B        | 5        |
I'm trying to create a query to return
| ID       | RESULT   |
|----------|----------|
| 1        | 7, 8, 6  |
| 2        | 9, 5     |
The USER_ID values must be ordered by the TYPE attribute.
Since I'm using MS ACCESS, I'm trying to pivot. What I've tried:
TRANSFORM first(user_id)
SELECT id, type
FROM mytable
GROUP BY id, type
ORDER BY type
PIVOT user_id
Error:
Too many crosstab column headers (4547).
I'm probably missing something in the syntax. However, the approach itself seems wrong, since the first() aggregate would need to be replaced with something that concatenates the results.
PS: I'm using MS-ACCESS 2007. If you know a solution for SQL-Server or Oracle using only SQL (without vendor functions or stored procedures), I'll probably accept your answer since it will help me to find a solution for this problem.
You don't want to use PIVOT here. PIVOT creates one crosstab column for each distinct USER_ID value, which is why you hit the "too many crosstab column headers" error. Your TYPE field doesn't seem to do anything either.
Unfortunately, doing this in SQL Server requires the use of a construct (FOR XML PATH) that's not available in Access.
Here's a link to an Access function that does something similar.
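Since you mentioned that a SQL Server-only solution would also be acceptable, here is a sketch of the FOR XML PATH approach (assuming the table is called mytable, as in your crosstab attempt):
-- one row per ID, with USER_ID values concatenated in TYPE order
SELECT t.ID,
       STUFF((SELECT ', ' + CAST(t2.USER_ID AS varchar(10))
              FROM mytable AS t2
              WHERE t2.ID = t.ID
              ORDER BY t2.TYPE, t2.USER_ID
              FOR XML PATH('')), 1, 2, '') AS RESULT
FROM mytable AS t
GROUP BY t.ID;
On SQL Server 2017+ the same result can be written more directly with STRING_AGG, and Oracle has LISTAGG.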

How do I subtract all prices in a MySQL table with SQL only?

I was playing with the following, but it's not there just yet.
ALTER TABLE `product_price` CHANGE `price` = `price` - 20;
What you're looking for is this:
UPDATE product_price SET price = price - 20;
So if your data looks like this:
| id | price         |
|----|---------------|
| 1  | 25.20         |
| 2  | 26.50         |
| 3  | 27.00         |
| 4  | 24.25         |
It will turn it to this:
| id | price         |
|----|---------------|
| 1  | 5.20          |
| 2  | 6.50          |
| 3  | 7.00          |
| 4  | 4.25          |
As tehvan pointed out in the comments, ALTER is used when you want to change the structure of the table. From the docs:
ALTER TABLE enables you to change the structure of an existing table. For example, you can add or delete columns, create or destroy indexes, change the type of existing columns, or rename columns or the table itself. You can also change the comment for the table and type of the table.
If you want to change the data itself rather than the structure, you want the UPDATE statement.
As Paolo Bergantino mentioned, you tried to alter the structure of the table rather than the data contained in it. SQL is made up of different parts, each responsible for something different: for defining your data structures (tables, views, etc.) you use DDL (Data Definition Language), and for manipulating data you use DML (Data Manipulation Language).
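To make the distinction concrete on your table (a sketch; the column type in the ALTER is only an example):
-- DDL: changes the table's structure, e.g. the column's type
ALTER TABLE product_price MODIFY price DECIMAL(10,2);

-- DML: changes the data stored in the table
UPDATE product_price SET price = price - 20;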
This site shows the different parts of SQL along with examples.