I have a column:
| Duration |
| -------- |
| 32 minutes|
| 27minutes |
| 20 mins |
| 15 |
I want to remove the text so that only the numbers remain, but as the text is varied I'm at a loss how to do so. I've reviewed multiple solutions and none seem to accomplish the job in an elegant way.
I had another column that was distance, and every row contained 'km' at the end so I was able to use replace.
UPDATE runner_orders
SET distance = REPLACE(distance,'km','')
I tried doing the same with a wildcard, but this didn't work:
UPDATE runner_orders
SET duration = REPLACE(duration, 'min%','')
Any input is well appreciated.
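You can achieve this with CAST, provided your database converts a string by reading its leading digits (SQLite does, for example; SQL Server instead raises a conversion error for a value such as '32 minutes'):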
UPDATE runner_orders SET duration = CAST(duration as INT);
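As a rough check of the CAST approach, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in database. That choice is an assumption for illustration only; the question does not say which engine runner_orders lives in, and other engines may reject the same cast. SQLite reads the longest numeric prefix of a string when casting, which is why the trailing text is dropped here.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in
# (assumption: the real engine may behave differently).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runner_orders (duration TEXT)")
conn.executemany(
    "INSERT INTO runner_orders (duration) VALUES (?)",
    [("32 minutes",), ("27minutes",), ("20 mins",), ("15",)],
)

# SQLite's CAST keeps the leading digits and ignores the trailing text.
conn.execute("UPDATE runner_orders SET duration = CAST(duration AS INTEGER)")

durations = [
    int(row[0])
    for row in conn.execute("SELECT duration FROM runner_orders ORDER BY rowid")
]
print(durations)
```

If your engine does not accept this cast, stripping the non-numeric characters explicitly is the safer route.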
I am looking to extract the $revenue and $price values from a column in Google BigQuery that holds values like the ones below. I'm not sure how to use REGEXP_EXTRACT or any other function to do so. The position of revenue and price varies within the string, so I can't rely on a specific position.
Any thoughts on how I can do that?
Value 1 -
{"utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_initial_medium":"direct"}
Value 2 -
{"utm_initial_source":"none",utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_source":"none","Device":"Desktop"}
You can try the query below.
The JSON_VALUE() function uses double quotes to escape invalid JSONPath characters, such as the $ in these key names.
See the BigQuery documentation on JSON functions for more information.
WITH sample_data AS (
SELECT '{"utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_initial_medium":"direct"}' json UNION ALL
SELECT '{"utm_initial_source":"none","utm_medium":"direct","utm_amplitude_user_id":"1580904318308","$quantity":1,"Locale":"English","$revenue":56.49,"Source":"App","utm_date":"2020-02-05","AppVersion":"Mac","$price":56.49,"utm_source":"none","Device":"Desktop"}'
)
SELECT JSON_VALUE(json, '$."$revenue"') AS revenue,
JSON_VALUE(json, '$."$price"') AS price,
FROM sample_data;
+---------+-------+
| revenue | price |
+---------+-------+
| 56.49 | 56.49 |
| 56.49 | 56.49 |
+---------+-------+
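For comparison outside BigQuery, the same extraction can be sketched in Python with the standard json module. This is only an illustration of why parsing beats position-based regex matching here; the function name extract_revenue_and_price and the shortened sample string are made up for the example.

```python
import json

# Shortened sample record (assumption: trimmed from the question's data for brevity).
raw = '{"utm_medium":"direct","$quantity":1,"$revenue":56.49,"Source":"App","$price":56.49}'


def extract_revenue_and_price(text):
    """Parse a JSON string and return its "$revenue" and "$price" values.

    Keys are looked up by name, so their position in the string does not matter.
    A missing key yields None.
    """
    record = json.loads(text)
    return record.get("$revenue"), record.get("$price")


revenue, price = extract_revenue_and_price(raw)
print(revenue, price)
```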
I am using MAXIFS (or similar) to identify the wanted line in a table, but I do not need the max value; I need data from an adjacent column. Example:
=MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[#Number])
Basically, in this example I am searching for the lines matching "Number" and taking the latest date. In the next step, I need the row number of that date so that I can use INDEX to return the appropriate column (TableComments1[Comment]).
I tried different approaches - no success.
PS: performance is also important here.
UPDATE: example lookup table "TableComments1":
T.Number | Comment | CommentDate
==============+==============+===========
SCTASK0073347 | correction | 22/07/2018
SCTASK0073347 | update 11 | 25/07/2018
SCTASK0073347 | update 2 | 21/07/2018
PS: sorting "CommentDate" is not an option here.
After days of dabbling, and finally posting the question above, I found a solution myself. I'm not sure it is the best, but performance seems okay.
Be aware: a simpler solution is possible if the table is sorted by "CommentDate". Sorting could not be guaranteed and was not desired in this use case, as stated in the question.
Recap: in table TableView1, we want to add the most recent comment for column "Number", looked up from TableComments1, which contains the comment history.
I got the idea from another post to use a helper column that combines the two criteria. New table layout:
T.Number | Comment | CommentDate | Helper1
==============+==============+=============+===================
SCTASK0073347 | correction | 22/07/2018 | 43303SCTASK0073347
SCTASK0073347 | find this! | 25/07/2018 | 43306SCTASK0073347
SCTASK0073347 | update 2 | 21/07/2018 | 43302SCTASK0073347
TASK9999 | comment | 25/07/2018 | 43306TASK9999
Formula breakdown
The formula for the Helper column simply concatenates two columns:
=[#CommentDate]&[#[T.Number]]
Let's say we want: SCTASK0073347
Note: in the helper column we have value "43306SCTASK0073347";
where "43306" is the numerical representation of date "25/07/2018".
This will search for a match of "Number" and return the most recent "CommentDate":
=MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[#Number])
This returns "25/07/2018". Let's abbreviate the formula above to <<MostRecentDate>> for readability in the next steps.
This step searches the Helper column for the combination of <<MostRecentDate>> and "Number":
=MATCH(<<MostRecentDate>>&TableView1[#Number];TableComments1[Helper1];0)
...returning the row number (2) that matches the helper value "43306SCTASK0073347".
From here, we combine MATCH (which now returns the wanted row) with INDEX, much as VLOOKUP would work:
=INDEX(TableComments1[Comment];MATCH(<<MostRecentDate>>&TableView1[#Number];TableComments1[Helper1];0))
...returning the wanted column with the desired comment "find this!".
The full, final formula includes the IFNA function to return a blank when a lookup finds no comments:
=IFNA(INDEX(TableComments1[Comment];MATCH(MAXIFS(TableComments1[CommentDate];TableComments1[T.Number];TableView1[#Number])&TableView1[#Number];TableComments1[Helper1];0));"")
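The logic of the formula can be sketched in Python to make the steps easier to follow. This is an illustration under assumptions, not a replacement for the worksheet formula: the names excel_serial, helper_index and latest_comment are made up, and the serial number is computed from the 1899-12-30 epoch, which matches Excel's default date system for modern dates.

```python
from datetime import date

# Day zero that reproduces Excel's 1900-system serial numbers for modern dates.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d):
    """Return the Excel serial number for a date (e.g. 25/07/2018 -> 43306)."""
    return (d - EXCEL_EPOCH).days


# Same rows as the example table: (T.Number, Comment, CommentDate).
comments = [
    ("SCTASK0073347", "correction", date(2018, 7, 22)),
    ("SCTASK0073347", "find this!", date(2018, 7, 25)),
    ("SCTASK0073347", "update 2", date(2018, 7, 21)),
    ("TASK9999", "comment", date(2018, 7, 25)),
]

# Equivalent of the Helper1 column: date serial concatenated with the number.
helper_index = {
    f"{excel_serial(d)}{number}": comment for number, comment, d in comments
}


def latest_comment(number):
    """Mimic IFNA(INDEX(...MATCH(MAXIFS(...) & number ...)), "")."""
    dates = [d for n, _, d in comments if n == number]
    if not dates:
        return ""  # IFNA branch: no comments for this number
    key = f"{excel_serial(max(dates))}{number}"  # MAXIFS result & number
    return helper_index[key]  # MATCH + INDEX on the helper column


print(latest_comment("SCTASK0073347"))
```

As in the worksheet version, the table does not need to be sorted by date.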
How can I implement an SQL query like this in Spark SQL 2.0, using DataFrames and Scala? I've read a lot of posts, but none of them seems to achieve what I need; if you can point me to one, that would do. Here's the problem:
UPDATE table SET value = 100 WHERE id = 2
UPDATE table SET value = 70 WHERE id = 4
.....
Suppose that you have a table named table with two columns like this:
id | value
--- | ---
1 | 1
2 | null
3 | 3
4 | null
5 | 5
Is there a way to implement the above query using map, match cases, UDFs, or if-else statements? The values that I need to store in the value field are not sequential, so I have specific values to put there. I'm also aware that DataFrames are immutable and cannot be modified in place. I have no code to share because I can't get it to work, nor can I reproduce any errors.
Yes you can, it's very simple. You can use when and otherwise.
import org.apache.spark.sql.functions.{lit, when} // the $"..." syntax also needs import spark.implicits._
val pf = df.select($"id", when($"id" === 2, lit(100)).when($"id" === 4, lit(70)).otherwise($"value").as("value"))
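The idea behind when/otherwise is that nothing is updated in place; a new column is derived from the old one. A minimal plain-Python sketch of that logic follows. It does not use Spark at all, and the names overrides and apply_overrides are hypothetical; it only shows the id-to-value mapping that the conditional expression encodes.

```python
# Rows of the example table as (id, value) pairs; None stands in for null.
rows = [(1, 1), (2, None), (3, 3), (4, None), (5, 5)]

# The specific, non-sequential replacements from the UPDATE statements.
overrides = {2: 100, 4: 70}


def apply_overrides(rows, overrides):
    """Build a new list of rows, replacing the value only where the id matches.

    The input is left untouched, mirroring how a DataFrame transformation
    returns a new DataFrame instead of mutating the original.
    """
    return [(row_id, overrides.get(row_id, value)) for row_id, value in rows]


updated = apply_overrides(rows, overrides)
print(updated)
```

With many id/value pairs, keeping them in a lookup structure like this is usually easier to maintain than a long chain of conditions.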
First: I'm using Access 2010.
What I need to do is pull everything in a field out that is NOT a certain string. Say for example you have this:
00123457*A8V*
Those last 3 characters that are bolded are just an example; that portion can be any combination of numbers/letters and from 2-4 characters long. The 00123457 portion will always be the same. So what I would need to have returned by my query in the example above is the "A8V".
I have a vague idea of how to do this, which involved using the Right function, with (field length - the last position in that string). So what I had was
SELECT Right(Facility.ID, (Len([ID) - InstrRev([ID], "00123457")))
FROM Facility;
Logically, in my mind, it would work; however, Access 2010 complains that I am using the Right function incorrectly. Can someone here help me figure this out?
Many thanks!
Why not use a replace function?
REPLACE(Facility.ID, "00123457", "")
You are missing a closing square bracket here: Len([ID)
You also need to reverse this "00123457" in InStrRev(), but you don't need InStrRev(), just InStr().
If I understand correctly, you want the last three characters of the string.
The simple syntax: Right([string],3) will yield the results you desire.
(http://msdn.microsoft.com/en-us/library/ms177532.aspx)
For example:
(TABLE1)
| ID | STRING |
------------------------
| 1 | 001234567A8V |
| 2 | 008765432A8V |
| 3 | 005671234A8V |
So then you'd run this query:
SELECT Right([Table1].[STRING],3) AS Result FROM Table1;
And the Query returns:
(QUERY)
| RESULT |
---------------
| A8V |
| A8V |
| A8V |
EDIT:
After seeing the need for the end string to be 2-4 characters while the original, left portion of the string is 00123457 (8 characters), try this:
SELECT Right([Table1].[string],(Len([Table1].[string])-8)) AS Result
FROM table1;
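The same "drop the fixed 8-character prefix" logic can be sketched in Python to sanity-check the 2-4 character cases. The function name strip_prefix is made up for the example, and returning the value unchanged when the prefix is absent is an assumption about the desired behaviour.

```python
PREFIX = "00123457"  # the fixed leading portion described in the question


def strip_prefix(value, prefix=PREFIX):
    """Return everything after the fixed prefix, whatever its length.

    Values that do not start with the prefix are returned unchanged
    (an assumption; the question does not cover that case).
    """
    return value[len(prefix):] if value.startswith(prefix) else value


for sample in ("00123457A8V", "00123457Z9", "00123457AB12"):
    print(strip_prefix(sample))
```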
I'm currently using emacs sql-mode as my sql shell, a (simplified) query response is below:
my_db=# select * from visit limit 4;
num | visit_key | created | expiry
----+-----------------------------+----------------------------+------------
1 | 0f6fb8603f4dfe026d88998d81a | 2008-03-02 15:17:56.899817 | 2008-03-02
2 | 7c389163ff611155f97af692426 | 2008-02-14 12:46:11.02434 | 2008-02-14
3 | 3ecba0cfb4e4e0fdd6a8be87b35 | 2008-02-14 16:33:34.797517 | 2008-02-14
4 | 89285112ef2d753bd6f5e51056f | 2008-02-21 14:37:47.368657 | 2008-02-21
(4 rows)
If I want to then formulate another query based on that data, e.g.
my_db=# select visit_key, created from visit where expiry = '2008-03-02'
and num > 10;
You'll see that I have to add the comma between visit_key and created, and surround the expiry value with quotes.
Is there a SQL DB shell that shows its content more homoiconically, so that I could minimise this sort of editing? e.g.
num, visit_key, created, expiry
(1, '0f6fb8603f4dfe026d88998d81a', '2008-03-02 15:17:56.899817', '2008-03-02')
or
(num=1, visit_key='0f6fb8603f4dfe026d88998d81a',
created='2008-03-02 15:17:56.899817', expiry='2008-03-02')
I'm using postgresql btw.
Here's one idea, which is similar to what I do sometimes, though I'm not sure that it's exactly what you're asking for:
Run a Lisp compiler (like SBCL) in SLIME. Then load CLSQL. It has a "Functional Data Manipulation Language" (SELECT documentation) which might help you do something like you want, perhaps in conjunction with SLIME's autocompletion capabilities. If not, it's easy to define Lisp functions and macros (assuming you know Lisp, but you're already an Emacser!).
Out-of-the-box, it doesn't give the nicely formatted tables that most SQL interfaces have, but even that isn't too hard to add. And Lisp is certainly powerful enough to let one easily come up with ways to make your common operations easier.
I've found the following changes in psql go some way to giving me homoiconicity:
=# select remote_ip, referer, http_method, time from hit limit 1;
remote_ip | referer | http_method | time
-----------------+---------+-------------+---------------------------
213.233.132.148 | | GET | 2013-08-27 08:01:42.38808
(1 row)
=# \a
Output format is unaligned.
=# \f ''', '''
Field separator is "', '".
=# \t
Showing only tuples.
=# select remote_ip, referer, http_method, time from hit limit 1;
213.233.132.148', '', 'GET', '2013-08-27 08:01:42.38808
Caveats: everything is a string, and the output is missing the start and end quotes.
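If the rows are fetched through a script instead of psql, both caveats can be avoided by formatting each value according to its type. Below is a minimal Python sketch of that idea; sql_literal and format_row are hypothetical helpers and not part of psql or any driver, and the output is meant for pasting into interactive queries only, not for building SQL from untrusted input.

```python
def sql_literal(value):
    """Render one Python value as a SQL-style literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is quoted; embedded single quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"


def format_row(row):
    """Render a row as a parenthesised, comma-separated tuple of literals."""
    return "(" + ", ".join(sql_literal(v) for v in row) + ")"


row = (1, "0f6fb8603f4dfe026d88998d81a", "2008-03-02 15:17:56.899817", "2008-03-02")
print(format_row(row))
```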