I have a row of data that needs some "scrubbing" before it's usable. Since the data enters
from an external source I can't control what enters the table. Instead I need to do some
extensive "scrubbing" a few times a week.
Example data:
Upptäck tron: [har livet mening?] : [vad påstår Jesus?] : [är tron till för alla?]
I want to remove the brackets and all text in between them. (The colons are removed later.)
I was trying this command
UPDATE Table SET Table=REPLACE(Field,[.?],'');
and
UPDATE Table SET Table=REPLACE(Field,'[.?]','');
but it doesn't seem to work.
Since I'm new to SQLite I feel a bit lost. The problem is similar to
"Remove everything between specific brackets", but I need a pure SQL query for SQLite.
Has anybody got an idea of how to tackle this problem? (An example would be much
appreciated.)
Don't do it in SQL; do it at the application layer. SQLite's REPLACE() only substitutes literal substrings, so '[.?]' is never treated as a pattern.
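A minimal sketch of what "at the application layer" could look like, assuming Python and its standard sqlite3 module are available. The table name `items` and column name `title` are hypothetical stand-ins for the real ones; the regular expression removes each bracketed group and leaves the colons for the later cleanup step.

```python
import re
import sqlite3

# Matches one bracketed group such as "[har livet mening?]".
# Non-greedy, so each pair of brackets is removed separately.
BRACKETED = re.compile(r"\[.*?\]")

def strip_brackets(text):
    """Remove every [...] group from text; NULL (None) passes through."""
    if text is None:
        return None
    return BRACKETED.sub("", text)

def scrub_table(conn, table, field):
    """Rewrite `field` in every row of `table` with bracketed text removed.

    `table` and `field` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    # Expose the Python function to SQL for this connection only.
    conn.create_function("strip_brackets", 1, strip_brackets)
    conn.execute(f'UPDATE "{table}" SET "{field}" = strip_brackets("{field}")')
    conn.commit()

# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.execute(
    "INSERT INTO items VALUES (?)",
    ("Upptäck tron: [har livet mening?] : [vad påstår Jesus?] : [är tron till för alla?]",),
)
scrub_table(conn, "items", "title")
result = conn.execute("SELECT title FROM items").fetchone()[0]
```

Registering the function on the connection keeps the UPDATE as a single statement, so the scrubbing can be run a few times a week from a small script.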
I used a simple ETL tool to import QuickBooks data into Google BigQuery. Great! The only notable limitation of this step is that I can't do any transformation, so it's really more of an EL tool.
That said, now I want to query the imported table. It's no problem at all for correctly named fields in BigQuery (like txndate). However, some of the fields are of the format abc.xyz (e.g., deposittoaccountref.value) and can't be queried. The "." in the name is apparently confusing BigQuery.
If I dump the whole table, I can see the "." name fields and the associated values.
However, I can't create a custom query against those fields. They don't show up in the auto-generated schema that allows one to drag and drop field names into the query.
Also, I tried to manually type the field name in and received the following error message: Missing column alias. Any expression in a SELECT statement that is not a column from the original data source must be followed by an alias, for example: AS my_alias.
I've tried quoting the field name and bracketing the field name but they still throw the same error.
I traced back to QB API documentation and this is indeed how Intuit labels the fields.
Finally, as long as I can query these fields at all, I can rename them to eliminate the "." problem.
Please advise and thank you!
OK, I solved this myself.
The way to fix this within the BigQuery query editor is to type the field name in manually (it is not available in the auto-generated schema) and wrap it in parentheses.
E.g., deposittoaccountref.value becomes (deposittoaccountref.value)
Now, this will label the column in the result set as "value", so you may want to relabel the field to something without the ".". For example, I took the original
deposittoaccountref.value and modified it to
(deposittoaccountref.value) as deposittoaccountref_value
Hopefully, this will help someone else in the future!
The above answer works when there is a single dot in the name, as in the example.
However, if there are multiple dots, e.g., "line.value.amount", then the parentheses trick doesn't work.
I've tried nesting the parentheses in different ways, to no avail.
E.g., (line.value.amount) = error, ((line.value).amount) = error, (line.(value.amount)) = error
Below is a simplified example of my data. As you can see, there are just two rows here.
So I run the query below and suddenly get an unexpected result.
What I expected was something like:
Why am I getting the wrong result?
Moreover, when I run the query below, I get only one row. Why is the second row with id=1 not showing?
Is this a BigQuery bug, or what?
Disclaimer: I was asked exactly this type of question a few times offline (outside of StackOverflow) and recently saw the very same question on SO (I can't understand this BigQuery magic. find string with LIKE), but unfortunately it was deleted, so I decided to post this on my own.
The reason GROUP BY does not group those two rows is that the str values in those rows are actually different. Unfortunately, the BigQuery Web UI collapses spaces in the result panel when it is in Table mode. To see the real, original values you can switch to JSON mode.
The same reason explains the unexpected result with LIKE.
As for how to deal with this: it depends!
For example, you can normalize your strings yourself by collapsing the runs of spaces.
P.S. In our internal tools, we fixed the issue with suppressed spaces and simply show all spaces.
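The effect can be reproduced outside BigQuery. The following is a rough sketch in Python rather than BigQuery SQL; the two rows are hypothetical, since the original sample data is not shown here. It illustrates why two strings that look identical once spaces are collapsed still land in separate groups, and how normalizing them first merges the groups.

```python
import re
from collections import Counter

def normalize_spaces(s):
    """Collapse each run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", s).strip()

# Hypothetical rows: (id, str). The first value has two spaces, the second one.
rows = [(1, "abc  xyz"), (2, "abc xyz")]

# Grouping on the raw value keeps the rows apart...
raw_groups = Counter(s for _, s in rows)

# ...while grouping on the normalized value merges them.
normalized_groups = Counter(normalize_spaces(s) for _, s in rows)

# repr() plays the role of the JSON view: it shows every space.
for row_id, s in rows:
    print(row_id, repr(s))
```

Whether normalizing is the right fix depends on whether the extra spaces carry meaning in the data.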
I was bored and looking at old code that runs like molasses on a cold day. I found a group of tables in our accounting system, each with 500,000 records of ~20 data points, that use a single column of concatenated, fixed-width values instead of separate columns. (Fixing the tables isn't an option.) An old .NET ETL project grabs all records, does a bunch of substrings on each record to set an object's corresponding attributes, then sends the object to merge with production data via a stored proc.
The way it is working is fine. It works. And, to be perfectly honest, I doubt I'll be given the go-ahead to fix it even if I come up with a better solution, but I was curious to see if anyone knew of a better way of doing this, because it's not entirely unlikely that I'll face a situation like this in the future.
I was thinking that if there was a way to use the TextFieldParser to parse a static string instead of a file/stream that might be a valid idea. Or, instead, I could write the entire table to a text file and then use the TextFieldParser to send data to the SProc. http://www.dotnetperls.com/textfieldparser does show that TextFieldParser is quite a bit faster than split, which I would assume is tantamount to the string manipulation our project is currently doing with substring. So there may be something to that idea.
Or perhaps the whole, old project should be dumped for a shiny new SSIS project. Would it also have to write the records to a flat file before importing into SQL? Or can it import directly from the table?
Thank you in advance!
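For reference, the substring approach described above boils down to slicing each record according to a fixed layout. A minimal sketch in Python (not the project's .NET code); the field names, offsets and widths are hypothetical, since the real layout is not given:

```python
from typing import NamedTuple

class FieldSpec(NamedTuple):
    name: str    # attribute the slice is assigned to
    start: int   # zero-based offset into the record
    width: int   # number of characters in the field

# Hypothetical layout; the real tables have roughly 20 fields per record.
LAYOUT = [
    FieldSpec("account", 0, 6),
    FieldSpec("amount", 6, 10),
    FieldSpec("memo", 16, 12),
]

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of trimmed field values."""
    return {f.name: line[f.start:f.start + f.width].strip() for f in layout}

record = parse_record("000123    450.75Office rent ")
```

Keeping the layout in one table means a change to the column widths touches a single list rather than every substring call.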
First of all, sorry for my English; it is not my native language.
I want to execute a SQL query in a script to get some data. I don't know if it's possible and, if so, how to do it. To summarize:
The script adds a button in M3 Smart Office (an ERP). I have already done that.
When I select a row in an M3 function (such as an article or a client), I want to take its ID (and some other data) and send it to a website.
There are a lot of functions in M3. In each function there are fields that contain data, and one of them contains the ID of the object (an article, a client, ...). What I want to do is get this ID. The problem is that the field that contains the ID doesn't have the same name in every function. So I have two solutions:
Write a lot of if/else-if branches, like "if it's this function, take this field". But if I (or somebody else) want to add a function/field combination later, I (or somebody else ;) ) need to do that in the script. It's not practical.
Create a SQL table which contains all the function/field combinations. Then, in the script, I run a SQL query and get all the data that the script needs.
So that is the situation. If you have ideas for doing this another way (without SQL), I'll take them!
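The second option (a lookup table) can be sketched as follows. This is an illustration in Python with SQLite, not an M3 Smart Office script; the table name `id_fields` and the function and field names are made up for the example.

```python
import sqlite3

def build_lookup(conn, mappings):
    """Create the lookup table if needed and store (function, field) pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_fields ("
        "function_name TEXT PRIMARY KEY, field_name TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO id_fields VALUES (?, ?)", mappings)
    conn.commit()

def id_field_for(conn, function_name):
    """Return the name of the ID field for a function, or None if unknown."""
    row = conn.execute(
        "SELECT field_name FROM id_fields WHERE function_name = ?",
        (function_name,),
    ).fetchone()
    return row[0] if row else None

# Hypothetical function/field combinations.
conn = sqlite3.connect(":memory:")
build_lookup(conn, [("ARTICLES", "ItemNumber"), ("CLIENTS", "CustomerNumber")])
```

Adding a new combination then means inserting one row, with no change to the script itself.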
Please see this in-depth tutorial from the 4guysfromrolla site:
Server-Side JScript Objects
We have recently had to do some work with an OpenEdge database that a third party product uses, and today (after much hair-pulling), we finally identified why a view was returning no results.
This view in question combines about 100 separate tables, and is then queried against (we have limited rights to this database). One of the fields returned by this view is a hard-coded string literal, along the lines of
'John Smith' AS TheName
We were having difficulty running queries that included this string, which we were trying to RTrim (the view returned a lot of trailing spaces) and then concatenate with another field.
However, if we used RTrim on this field then, instead of returning an error message, or a null or something like that, the row simply wasn't returned. We weren't trying to use it in a WHERE clause or JOIN, this was simply part of the SELECT ... FROM VIEWNAME. After reviewing the view, it seemed that the view had erroneously detected the length of the string as 9 characters (no length was specified in the definition), and RTrim just didn't work.
Now, I could understand why this might lead to an error message, or a NULL value in the SELECT, but why would the row simply not be returned at all? This doesn't seem like good SQL behaviour and I've never seen it happen with any other RDBMS.
Other info : we're test querying via ODBC and WinSQL, with a view to this being included in an existing ASP.NET app. We don't have access to the backend except via this, although we do have rights to create views.
Update : As a freaky follow-up, we have now discovered that if we attempt to query this view without any WHERE clause, no records are returned. This may have the same cause.
This sounds like it could be related to the SQL-WIDTH setting in the Progress database. One problem with Progress is that if the content of a field exceeds its SQL-WIDTH, you get strange SQL behaviour (sometimes the driver fails, other times you get no results).
To identify this, use the dbtool command to check for SQL-WIDTH values that may be exceeded.
Make sure you don't have blanks. Trimming removes only spaces, not blanks, and blanks are not nulls either. They are different characters in the character set, even though they do not look different in your editor.
I have run into this with a few databases: DB2, Oracle, PostgreSQL. Check the character set of your editor and try viewing the tables; you might see nothing, or you might see big rectangles.
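One way to check for such characters is to list whatever is left at the end of a value after ordinary spaces are trimmed. A small sketch in Python, assuming the value has already been fetched into the application; the sample strings are invented:

```python
import unicodedata

def trailing_invisibles(value):
    """List trailing characters that survive a plain-space trim but are
    still whitespace, control or format characters.

    Returns (code point, Unicode name) pairs in their original order.
    """
    stripped = value.rstrip(" ")  # removes only U+0020, like a basic trim
    tail = []
    for ch in reversed(stripped):
        if ch.isspace() or unicodedata.category(ch) in ("Cc", "Cf", "Zs"):
            tail.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
        else:
            break
    return list(reversed(tail))

# A no-break space hides behind an ordinary trailing space.
print(trailing_invisibles("John Smith\u00a0 "))
```

An empty list means only ordinary spaces were present, so the trim itself is not the problem.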
That sounds like very strange behavior. Just code around it: do the trim and/or string manipulation in the application and go on your way.